How to Install xCAT in the HPC Environment 

Installing xCAT (Extreme Cloud Administration Toolkit) on AlmaLinux or Rocky Linux involves setting up the necessary repositories, installing required dependencies, and configuring xCAT to manage your cluster. Here's a step-by-step guide to installing xCAT from scratch.

[Figure: simple xCAT architecture diagram]



Hardware Requirements for xCAT Installation

Depending on the size of your cluster, you need to meet certain hardware requirements to install and use xCAT effectively. Below are the general recommendations for management and compute nodes.


1. Management Node (Head Node) Requirements

The management node is the central controller that manages the compute nodes via PXE boot, DHCP, and other services.

Component | Minimum Requirement         | Recommended for Large Clusters
CPU       | 4-core x86_64 processor     | 8+ core Xeon/EPYC
RAM       | 8 GB                        | 32+ GB
Disk      | 100 GB SSD/HDD              | 500 GB SSD (RAID recommended)
Network   | 1 Gbps Ethernet             | 10/40/100 Gbps Ethernet/InfiniBand
Other     | Static IP, Internet access  | Redundant PSU for reliability

🔹 For clusters with 100+ nodes, a high-performance management node with SSDs and 10Gbps+ networking is recommended.
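
The minimums in the table can be checked on a candidate head node before anything is installed. The following is a minimal sketch, not part of xCAT: the function names `check_min` and `preflight` are made up for illustration, the thresholds are the table's minimum column, and the disk figure is only an approximation based on the size of the root filesystem.

```shell
# check_min LABEL ACTUAL MINIMUM
# Prints an "ok" or "LOW" line and returns 0 only if ACTUAL >= MINIMUM.
check_min() {
    if [ "$2" -ge "$3" ]; then
        echo "ok   $1: $2 (minimum $3)"
        return 0
    fi
    echo "LOW  $1: $2 (minimum $3)"
    return 1
}

# Measure this host (Linux only) and compare against the management node minimums.
preflight() {
    status=0
    cores=$(nproc)
    # MemTotal is in kB; round to the nearest GiB so an "8 GB" machine reports 8.
    ram_gib=$(awk '/^MemTotal:/ { printf "%d", ($2 + 524288) / 1048576 }' /proc/meminfo)
    # Root filesystem size in decimal GB (df -Pk reports 1024-byte blocks).
    disk_gb=$(df -Pk / | awk 'NR == 2 { printf "%d", $2 * 1024 / 1000000000 }')
    check_min "CPU cores" "$cores" 4 || status=1
    check_min "RAM (GiB)" "$ram_gib" 8 || status=1
    check_min "Root disk (GB)" "$disk_gb" 100 || status=1
    return "$status"
}
```

Run `preflight` on the head node; a non-zero exit status means at least one minimum was not met.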


2. Compute Node Requirements

Compute nodes run workloads and are managed by the head node.

Component | Minimum Requirement | Recommended
CPU       | 2-core x86_64       | 8+ core Xeon/EPYC
RAM       | 4 GB                | 16+ GB
Disk      | 50 GB HDD           | 250 GB SSD
Network   | 1 Gbps              | 10/100 Gbps Ethernet/InfiniBand

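🔹 Diskless compute nodes are supported. Depending on the provisioning mode they run from an image loaded into RAM or from an NFS-mounted root, so any persistent data needs NFS or another shared storage system.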


3. Network Requirements

  • Management Node ↔ Compute Nodes: Minimum 1 Gbps; preferably 10 Gbps or InfiniBand.
  • PXE Boot Support: Compute nodes should support PXE boot for automated provisioning.
  • IPMI (Optional): If available, can be used for power management (rpower in xCAT).

4. Additional Considerations

  • RAID Storage: If managing a large cluster, consider RAID 10 for redundancy.
  • High-Availability (HA): Use two management nodes in an HA setup for critical systems.
  • GPU Support: If using GPUs, ensure xCAT is configured to handle GPU-based provisioning.



Step 1: Prepare the System

Before installing xCAT, ensure your system meets the following requirements:

  • A fresh AlmaLinux 9 or Rocky Linux 9 installation.
  • A static IP address for the management node.
  • Internet access for package installation.
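
The OS requirement can be checked from a script by reading /etc/os-release. This is a small sketch under the assumption that AlmaLinux reports ID=almalinux and Rocky Linux reports ID=rocky; the function names are hypothetical.

```shell
# is_supported_os ID VERSION_ID
# Returns 0 for AlmaLinux 9.x or Rocky Linux 9.x, non-zero otherwise.
is_supported_os() {
    case "$1" in
        almalinux|rocky) ;;
        *) return 1 ;;
    esac
    case "$2" in
        9|9.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the local machine. os-release is sourced in a subshell so its
# variables do not leak into the calling shell.
check_this_host() {
    (
        . /etc/os-release
        is_supported_os "$ID" "$VERSION_ID"
    )
}
```

For example, `check_this_host || echo "unsupported OS for this guide"` prints a warning on anything other than the two distributions covered here.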

1.1 Update System and Install Required Dependencies

Run the following commands to update your system and install the necessary packages:

dnf update -y
dnf install -y epel-release
dnf install -y perl bzip2 net-tools wget tar mlocate

Step 2: Install xCAT

2.1 Add xCAT Repository

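xCAT provides its own repositories as .repo files. These are repository definitions, not RPM packages, so download them into /etc/yum.repos.d/ instead of passing them to dnf install. The paths on xcat.org are organized by release and by OS/architecture, so confirm the exact URLs on the xCAT download page before running these:

wget -P /etc/yum.repos.d/ https://xcat.org/files/xcat/repos/yum/xcat-core/xcat-core.repo
wget -P /etc/yum.repos.d/ https://xcat.org/files/xcat/repos/yum/xcat-dep/xcat-dep.repo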

2.2 Install xCAT Packages

Now install xCAT along with dependencies:

dnf install -y xCAT

2.3 Verify Installation

Check if xCAT is installed correctly by running:

lsxcatd -v

You should see the installed xCAT version.
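
If the shell reports that lsxcatd cannot be found right after installation, the usual cause is that the current session predates the install: xCAT adds its commands to PATH through /etc/profile.d/xcat.sh, which is only read by new login shells. A minimal sketch of a check that loads that file first (the helper names `load_xcat_env` and `have_cmd` are illustrative):

```shell
# Load xCAT's environment (PATH, XCATROOT) into the current shell
# if the profile script is present.
load_xcat_env() {
    if [ -r /etc/profile.d/xcat.sh ]; then
        . /etc/profile.d/xcat.sh
    fi
}

# have_cmd NAME -> returns 0 if NAME resolves on the current PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

load_xcat_env
if have_cmd lsxcatd; then
    lsxcatd -v
else
    echo "lsxcatd not found on PATH; open a new login shell or re-check the install"
fi
```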


Step 3: Start xCAT Services

After installation, start and enable the xcatd service:


systemctl enable xcatd
systemctl start xcatd

Check if the service is running:

systemctl status xcatd
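
In an automated setup script it can help to wait for the daemon instead of reading the status output by eye. The sketch below is a generic retry helper, not an xCAT feature; the name `retry` and the attempt and delay values in the usage comment are assumptions.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        # Sleep only if another attempt is coming.
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# On the management node, for example:
#   retry 10 3 systemctl is-active --quiet xcatd || echo "xcatd did not start"
```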

Step 4: Configure Network for Node Provisioning

The management node should handle DHCP, DNS, and PXE boot for compute nodes.

4.1 Set Management Node and Domain

Replace yourdomain.com and your-mgmt-node-ip with your actual domain and IP:

chtab key=domain site.value=yourdomain.com
chtab key=master site.value=your-mgmt-node-ip
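
A common slip at this step is running the command with the placeholder still in place. As a hedged sketch, the values can be validated before they are written to the site table; `is_ipv4` is a hypothetical helper, the domain and IP below are example values, and the chtab commands are printed for review instead of executed.

```shell
# is_ipv4 ADDRESS
# Returns 0 if ADDRESS is four dot-separated decimal octets, each 0-255.
is_ipv4() {
    echo "$1" | awk -F. '
        NF != 4 { exit 1 }
        {
            for (i = 1; i <= 4; i++) {
                if ($i !~ /^[0-9]+$/ || $i > 255) exit 1
            }
        }
    '
}

# Example values only; substitute your own.
domain="cluster.example.com"
mgmt_ip="10.0.0.1"

if is_ipv4 "$mgmt_ip"; then
    # Printed, not executed, so the commands can be reviewed first.
    echo "chtab key=domain site.value=$domain"
    echo "chtab key=master site.value=$mgmt_ip"
else
    echo "refusing to continue: '$mgmt_ip' is not an IPv4 address" >&2
fi
```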

4.2 Configure DHCP and DNS

Generate DHCP and DNS configurations:

makedhcp -n
makedns -n

Verify settings:

lsdef -t site

Step 5: Add Compute Nodes

Each compute node needs to be defined in xCAT.

5.1 Define Compute Node

Replace 10.0.0.101 with the node's IP and XX:XX:XX:XX:XX:XX with its MAC address:

mkdef -t node node01 groups=compute ip=10.0.0.101 mac=XX:XX:XX:XX:XX:XX
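
With more than a handful of nodes, typing one mkdef per node gets error-prone. The sketch below generates the same mkdef command as above from a "name ip mac" list and skips entries whose MAC address is malformed (including an unreplaced XX:XX:XX:XX:XX:XX). The helper names and the sample addresses are hypothetical, and the commands are only printed so they can be reviewed before being run.

```shell
# is_mac ADDRESS -> returns 0 for six colon-separated hex pairs.
is_mac() {
    echo "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Reads "name ip mac" lines on stdin and prints one mkdef command
# per valid line. Invalid entries are reported on stderr.
gen_mkdef() {
    while read -r name ip mac; do
        # Skip blank lines and comments.
        case "$name" in
            ''|'#'*) continue ;;
        esac
        if is_mac "$mac"; then
            echo "mkdef -t node $name groups=compute ip=$ip mac=$mac"
        else
            echo "skipping $name: invalid MAC '$mac'" >&2
        fi
    done
}

gen_mkdef <<'EOF'
# name  ip          mac   (sample values)
node01  10.0.0.101  52:54:00:12:34:01
node02  10.0.0.102  52:54:00:12:34:02
node03  10.0.0.103  XX:XX:XX:XX:XX:XX
EOF
```

Once the output looks right, it can be piped to `sh` on the management node to create the definitions.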

5.2 Verify Node Configuration

List all defined nodes:

lsdef -t node

Check details of a specific node:

lsdef node01

Step 6: Setup PXE Boot and OS Provisioning

6.1 Copy OS Installation Media

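Download the OS ISO. copycds takes the ISO file itself (or a DVD device) and mounts it internally, so no manual mount is needed:

copycds /path/to/your-os.iso

This copies the distribution into xCAT's install directory and creates default osimage definitions for it.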

6.2 Assign OS Image to Compute Nodes

Check available OS images:

lsdef -t osimage
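
The list usually contains several images per distribution, one per provisioning type. As a rough sketch, assuming the usual "name  (osimage)" output format and image names that contain the provisioning type (install, netboot, statelite), the diskful images can be filtered out like this. The function name and the sample image names are illustrative only.

```shell
# Reads `lsdef -t osimage` output on stdin and prints only the names
# of diskful ("install") images.
list_install_images() {
    awk '$2 == "(osimage)" && $1 ~ /-install-/ { print $1 }'
}

# Sample input standing in for real lsdef output (image names are made up).
printf '%s\n' \
    'rocky9.3-x86_64-install-compute  (osimage)' \
    'rocky9.3-x86_64-netboot-compute  (osimage)' \
    'rocky9.3-x86_64-statelite-compute  (osimage)' |
    list_install_images
```

On the management node the real call would be `lsdef -t osimage | list_install_images`.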

Assign an OS image to a node (replace your-os-image with an actual image name):

nodeset node01 osimage=your-os-image

Verify PXE boot setup:

nodeset node01 stat

Step 7: Power On and Boot Compute Nodes

Turn on the compute node (via IPMI, if available):

rpower node01 on

Check node status:

nodestat node01
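
An OS install takes a while, so a single nodestat call often shows an intermediate state. The following sketch assumes nodestat prints one "node: status" line per node and that a fully booted node reports sshd; the function name `all_nodes_up` and the 30-second interval in the usage comment are arbitrary choices.

```shell
# Reads nodestat output on stdin.
# Returns 0 only if at least one node is listed and every node reports "sshd".
all_nodes_up() {
    awk -F': *' '
        { seen = 1 }
        $2 != "sshd" { bad = 1 }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}

# On a real cluster, poll the whole compute group until it is up:
#   until nodestat compute | all_nodes_up; do sleep 30; done
```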

Step 8: Verify Cluster Setup

Once the compute node has booted, verify connectivity:

xdsh node01 "hostname && uptime"
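
When the same check is run against a whole group, the nodes that matter are the ones that did not answer. A minimal sketch, assuming xdsh prefixes each output line with "nodename: "; the helper `missing_nodes` and the sample output are hypothetical.

```shell
# missing_nodes NODE...
# Reads xdsh output on stdin and prints each expected NODE that has
# no "NODE: ..." line in that output.
missing_nodes() {
    output=$(cat)
    for node in "$@"; do
        echo "$output" | grep -q "^$node: " || echo "$node"
    done
}

# Sample xdsh output in which node02 never responded.
printf '%s\n' \
    'node01: node01' \
    'node01:  10:15:01 up 5 min,  0 users,  load average: 0.00, 0.01, 0.00' \
    'node03: node03' |
    missing_nodes node01 node02 node03
```

On the management node this would look like `xdsh compute "hostname" | missing_nodes node01 node02 node03`.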

Step 9: Enable Monitoring (Optional)

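xcatmon is xCAT's built-in monitoring plug-in, not a standalone command. Register it with monadd and start it with monstart:

monadd xcatmon
monstart xcatmon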

Conclusion

This guide walked through installing and configuring xCAT on AlmaLinux 9 or Rocky Linux 9. You now have a working xCAT environment for managing HPC clusters.

