InfiniBand: Overview, Working, Installation & Configuration

1. What is InfiniBand?

InfiniBand (IB) is a high-speed, low-latency networking technology designed for HPC (High-Performance Computing), AI, and data centers. It provides high bandwidth (up to 400 Gb/s per port with NDR InfiniBand, and 800 Gb/s with the newer XDR generation) and is commonly used in supercomputers and clusters.


2. How Does InfiniBand Work?

InfiniBand creates a switched fabric topology, where each node (compute or storage) connects to the fabric through one or more InfiniBand switches. Unlike traditional Ethernet, IB uses RDMA (Remote Direct Memory Access), which lets one node read and write another node's memory directly, bypassing the remote CPU and kernel network stack (both properties can be checked with the commands shown after the feature list below).

Key Features of InfiniBand:

  • High Bandwidth (up to 400 Gb/s per port with NDR, 800 Gb/s with XDR).

  • Low Latency (sub-microsecond).

  • RDMA Support (direct memory access, reducing CPU load).

  • Lossless Networking (credit-based flow control prevents the packet drops common on Ethernet).

  • Supports Multi-Pathing for better reliability.
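
Both the fabric topology and the RDMA capability are easy to confirm from the command line once the software stack described below is installed (ibnetdiscover and ibv_devinfo ship with the standard InfiniBand diagnostics and verbs utility packages; output varies per fabric):

ibnetdiscover   # walks the switched fabric and lists the switches and HCAs it finds
ibv_devinfo     # lists local RDMA-capable devices and their port state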


3. InfiniBand Architecture Diagram

The sketch below illustrates how InfiniBand connects HPC nodes.
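
Every compute and storage node has an HCA cabled into one or more switches, and any pair of nodes can exchange traffic across the fabric:

  [Compute Node 1]   [Compute Node 2]   [Storage Node]
         |                  |                 |
         +-------[ InfiniBand Switch ]--------+
                           |
                 (uplinks to other switches)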



4. Installing and Configuring the Mellanox InfiniBand Driver

To use InfiniBand (IB) networking, you must install the Mellanox OFED (OpenFabrics Enterprise Distribution) driver stack, which provides the kernel modules, RDMA libraries, and diagnostic tools used in the steps below.


Step 1: Verify InfiniBand Hardware

Before installation, check if the InfiniBand adapter is detected:

lspci | grep Mellanox

If you see output like:


07:00.0 Network controller: Mellanox Technologies MT27800 Family [ConnectX-5]

Then your IB card is detected.
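
Optionally, check which kernel driver currently claims the adapter (the mlx5_core name below is what ConnectX-4 and newer cards typically bind to):

lspci -k | grep -A 3 Mellanox

A line such as "Kernel driver in use: mlx5_core" means the inbox driver is already active; the MLNX_OFED installation replaces it.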


Step 2: Download Mellanox OFED Driver

Get the latest Mellanox OFED driver from NVIDIA's official website:
🔗 Mellanox OFED Downloads

Alternatively, use wget:

wget https://www.mellanox.com/downloads/ofed/MLNX_OFED-23.07-0.5.2.0-rhel9.1-x86_64.tgz

(Adjust the version string and OS tag in the URL to match your distribution.)
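
If the download page publishes a checksum for the bundle, verify the archive before extracting (filename from the wget example above; compare the output against the published value):

sha256sum MLNX_OFED-23.07-0.5.2.0-rhel9.1-x86_64.tgz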


Step 3: Install the Mellanox OFED Driver

1️⃣ Extract the downloaded file:

tar -xvzf MLNX_OFED-23.07-0.5.2.0-rhel9.1-x86_64.tgz
cd MLNX_OFED-23.07-0.5.2.0-rhel9.1-x86_64

2️⃣ Run the installer:

sudo ./mlnxofedinstall --without-fw-update

(The --without-fw-update flag prevents accidental firmware updates.)

3️⃣ Reboot the system:

reboot
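
Two optional extras for this step: if your running kernel is newer than the modules prebuilt in the bundle, the installer can rebuild them, and after the reboot you can confirm that the MLNX_OFED driver service started:

sudo ./mlnxofedinstall --without-fw-update --add-kernel-support   # rebuild modules against the running kernel
sudo systemctl status openibd                                     # driver service installed by MLNX_OFED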

Step 4: Verify Installation

Check if the InfiniBand modules are loaded:

lsmod | grep mlx

You should see output like (mlx5_* modules for ConnectX-4 and newer adapters, mlx4_* for older ConnectX-3 cards):

mlx5_core 200704 0
mlx4_en 53248 0

Check the status of the IB interfaces:

ibstat

If the driver is properly installed, it displays details about the InfiniBand HCA (Host Channel Adapter), including each port's state and link rate.
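
A quick health check on that output: a working port reports "State: Active" and "Physical state: LinkUp". If the state stays at Initializing, no subnet manager is active on the fabric; small clusters without a managed switch can run opensm on any one node (package and service names as shipped in the standard RHEL/AlmaLinux RDMA packages):

ibstat | grep -iE 'state|rate'         # show only port state and link speed
sudo dnf install -y opensm             # install the subnet manager
sudo systemctl enable --now opensm     # start it now and at every boot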


Step 5: Configure the InfiniBand Network

Enable IB Interface

1️⃣ Map the IB devices to their network interfaces:

ibdev2netdev

Example output:

mlx5_0 port 1 ==> ib0 (Up)
mlx5_1 port 1 ==> ib1 (Down)

2️⃣ Assign an IP address (if using IPoIB):
Edit /etc/sysconfig/network-scripts/ifcfg-ib0 (for RHEL/AlmaLinux) or the YAML files under /etc/netplan/ (for Ubuntu):

DEVICE=ib0
TYPE=InfiniBand
BOOTPROTO=dhcp
ONBOOT=yes

Then restart networking (on RHEL/AlmaLinux 8 and later these files are managed by NetworkManager; on Ubuntu run netplan apply instead):

sudo systemctl restart NetworkManager
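
DHCP assumes a DHCP server answers on the IB fabric itself; many clusters instead assign static addresses on a dedicated IPoIB subnet. A minimal static variant of the same ifcfg file (the addresses below are placeholders, choose ones for your fabric):

DEVICE=ib0
TYPE=InfiniBand
BOOTPROTO=none
IPADDR=192.168.100.10
NETMASK=255.255.255.0
ONBOOT=yes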

Step 6: Test InfiniBand Communication

1️⃣ Check the InfiniBand link status:

iblinkinfo

2️⃣ Run a bandwidth test (start a server instance on one node, then connect to it from another):

ib_send_bw -d mlx5_0

(on another node)

ib_send_bw -d mlx5_0 <IB-IP-of-Server>

3️⃣ Run an RDMA ping test (note that ibping addresses the remote port by LID or GUID rather than IP; find the server port's LID with ibstat):

ibping -S

(on another node)

ibping <server-port-LID>
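
The perftest suite installed with MLNX_OFED also includes latency tests that use the same server/client pattern as ib_send_bw (device name as in the earlier examples):

ib_send_lat -d mlx5_0                     # on the server
ib_send_lat -d mlx5_0 <IB-IP-of-Server>   # on the client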

Conclusion

  • Mellanox OFED installation enables InfiniBand communication in HPC systems.
  • RDMA provides low-latency, high-speed networking between nodes.
  • Verify that IB devices are configured and active using ibstat, ibping, and ib_send_bw.
