InfiniBand Configuration
Last updated: 2024-08-22 10:34:08
Scenarios
If your GPU cloud server includes an InfiniBand network card, follow this article to configure and deploy it.
Directions
Check Network
First, check whether the server has an InfiniBand controller.
$ lspci | grep -i infiniband
00:07.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
If output like the above is returned, the InfiniBand (IB) network card has been allocated to the instance. If nothing is returned, the card was not recognized or its allocation failed.
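The check above can be wrapped in a small reusable function. This is a minimal sketch, assuming Bash and that `lspci` (package `pciutils`) is installed; the function name `has_ib_controller` is hypothetical. It reads `lspci` output from standard input, prints any matching lines, and succeeds only if at least one InfiniBand controller is listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads lspci output on stdin, prints the lines that
# describe an InfiniBand controller, and exits 0 only if at least one exists.
has_ib_controller() {
  grep -i 'infiniband controller'
}

# Example usage on the server:
#   if lspci | has_ib_controller; then
#     echo "InfiniBand network card detected"
#   else
#     echo "No InfiniBand network card found; check the instance allocation" >&2
#   fi
```

Reading from standard input keeps the logic separate from the hardware, so the same function can be tried against saved `lspci` output.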
Install Driver
- Download the Mellanox GPG key and convert it into an APT keyring file. Writing to /usr/share/keyrings requires root privileges.
$ wget -qO - http://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | sudo gpg --dearmor -o /usr/share/keyrings/GPG-KEY-Mellanox.gpg
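- Add the repository configuration. The following command downloads NVIDIA's repository file archive and extracts it into the root directory (/), which adds the repository definitions under /etc/apt/.
$ curl https://repo.download.nvidia.com/baseos/ubuntu/jammy/dgx-repo-files.tgz | sudo tar xzf - -C /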
- Update the package index.
$ sudo apt update
- Install the nvidia-manage-ofed software package.
$ sudo apt install -y nvidia-manage-ofed
- Remove the currently installed (inbox) OFED components.
$ sudo /usr/sbin/nvidia-manage-ofed.py -r ofed
- Install the Mellanox OFED components.
$ sudo /usr/sbin/nvidia-manage-ofed.py -i mofed
- Restart the system for the changes to take effect.
$ sudo reboot
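The installation steps above can be collected into one function so they run in order and stop at the first failure. This is a minimal sketch, assuming Ubuntu 22.04 (jammy, as in the repository URL), Bash, and sudo rights; the helper names `run` and `install_mofed` and the `DRY_RUN` variable are hypothetical, and the `-fsSL` flags added to `curl` (fail on HTTP errors, quiet, follow redirects) are an assumption not present in the original command. It does not reboot; do that manually afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs a command string with pipefail enabled,
# or only prints it (prefixed with "+ ") when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '+ %s\n' "$1"
  else
    bash -c "set -o pipefail; $1"
  fi
}

# Hypothetical function: the driver installation steps, in document order.
install_mofed() {
  # Import the Mellanox GPG key as an APT keyring file.
  run 'wget -qO - http://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | sudo gpg --dearmor -o /usr/share/keyrings/GPG-KEY-Mellanox.gpg' || return 1
  # Extract NVIDIA's repository configuration archive into /.
  run 'curl -fsSL https://repo.download.nvidia.com/baseos/ubuntu/jammy/dgx-repo-files.tgz | sudo tar xzf - -C /' || return 1
  run 'sudo apt update' || return 1
  run 'sudo apt install -y nvidia-manage-ofed' || return 1
  # Remove the inbox OFED components, then install Mellanox OFED.
  run 'sudo /usr/sbin/nvidia-manage-ofed.py -r ofed' || return 1
  run 'sudo /usr/sbin/nvidia-manage-ofed.py -i mofed' || return 1
  echo 'Done. Reboot the system (sudo reboot) for the changes to take effect.'
}
```

Running `DRY_RUN=1 install_mofed` prints the six commands without executing them, which is a convenient way to review the sequence before running `install_mofed` for real.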
Configure Network
- Check the status of the IB network.
$ ibstat
CA 'mlx5_0'
CA type: MT4129
Number of ports: 1
Firmware version: 28.39.3004
Hardware version: 0
Node GUID: 0xa088c20300d6d136
System image GUID: 0xa088c20300d6d136
Port 1:
State: Active
Physical state: LinkUp
Rate: 400
Base lid: 189
LMC: 0
SM lid: 1
Capability mask: 0xa751e848
Port GUID: 0xa088c20300d6d136
Link layer: InfiniBand
If State is "Active" and Physical state is "LinkUp", the port is up. If not, restart the system and check again.
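The State/Physical state check can be automated by parsing the `ibstat` output. This is a minimal sketch, assuming Bash and `awk`; the function name `ib_ports_up` is hypothetical. It succeeds only if every port listed reports both "State: Active" and "Physical state: LinkUp", and fails if no port is found at all.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads ibstat output on stdin and exits 0 only if
# every "State:" line is Active and every "Physical state:" line is LinkUp.
ib_ports_up() {
  awk '
    /^[ \t]*State:/          { total++; if ($2 == "Active") active++ }
    /^[ \t]*Physical state:/ { if ($3 == "LinkUp") linkup++ }
    END { exit !(total > 0 && active == total && linkup == total) }
  '
}

# Example usage on the server:
#   ibstat | ib_ports_up && echo "IB port is up" || echo "IB port is not up" >&2
```

The regular expressions allow leading tabs or spaces, since `ibstat` indents the port fields.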
- Find the network interface that corresponds to the IB port.
$ ibdev2netdev
mlx5_0 port 1 ==> ibs7 (Down)
$ ibstatus mlx5_0
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:a088:c203:00d6:d136
base lid: 0xbd
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 400 Gb/sec (4X NDR)
link_layer: InfiniBand
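The output shows that the network interface for the InfiniBand port is ibs7 (the "(Down)" state refers to this interface, which is brought up later). All subsequent configuration steps use this interface name; if your instance shows a different name, use that name instead.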
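Because the later steps depend on the interface name, it can help to extract it from `ibdev2netdev` instead of copying it by hand. This is a minimal sketch, assuming Bash, `awk`, and the output format shown above (`mlx5_0 port 1 ==> ibs7 (Down)`); the function name `ib_netdev` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads ibdev2netdev output on stdin and prints the
# network interface mapped to an IB device (default mlx5_0) and port
# (default 1). Exits 1 if no matching line is found.
ib_netdev() {
  local dev="${1:-mlx5_0}" port="${2:-1}"
  awk -v dev="$dev" -v port="$port" '
    $1 == dev && $3 == port && $4 == "==>" { print $5; found = 1 }
    END { exit !found }
  '
}

# Example usage on the server:
#   IFACE="$(ibdev2netdev | ib_netdev mlx5_0 1)" && echo "IB interface: $IFACE"
```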
- View the InfiniBand network segment allocation.
Open the instance details page and check the network segment configuration rule shown at the "Device" prompt. You can refer to the screenshot.
For example, if the rule is 100.0.n.4/24, set n according to your requirements.
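For the example rule 100.0.n.4/24, the address can be derived from n with a small helper that also rejects values that cannot appear in an IPv4 octet. This is a minimal sketch, assuming Bash; the function name `ib_address` is hypothetical, and the 100.0.n.4/24 pattern is only the example from this article, so adapt it to the rule shown for your instance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the IB address for the example rule
# 100.0.n.4/24 from n, accepting only integers 0-255.
ib_address() {
  local n="$1"
  if [[ ! "$n" =~ ^[0-9]{1,3}$ ]] || (( 10#$n > 255 )); then
    echo "invalid n: '$n' (expected 0-255)" >&2
    return 1
  fi
  # 10# forces base 10, so a value such as "05" is not read as octal.
  printf '100.0.%d.4/24\n' "$((10#$n))"
}

# Example usage:
#   ib_address 5    # prints 100.0.5.4/24
```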
- Create the interface configuration.
Check the configuration file /etc/network/interfaces. If it doesn't exist, create it.
$ cd /etc/network/
$ cat interfaces
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
Check the /etc/network/interfaces.d directory. If it doesn't exist, create it.
$ cd /etc/network/
$ ls -alF
total 40
drwxr-xr-x 7 root root 4096 Mar 8 09:37 ./
drwxr-xr-x 152 root root 12288 Feb 19 07:46 ../
drwxr-xr-x 2 root root 4096 Dec 10 22:00 if-down.d/
drwxr-xr-x 2 root root 4096 Dec 10 22:00 if-post-down.d/
drwxr-xr-x 2 root root 4096 Dec 10 22:00 if-pre-up.d/
drwxr-xr-x 2 root root 4096 Jan 31 02:53 if-up.d/
-rw-r--r-- 1 root root 241 Jan 31 02:50 interfaces
drwxr-xr-x 2 root root 4096 Mar 8 09:40 interfaces.d/
Enter /etc/network/interfaces.d and create the file ifcfg-ibs7. Both the file name and the network-card-related names in the configuration content must be ibs7. Check carefully to keep them consistent.
The network configuration is 100.0.n.4/24. If n = 5, configure the IP address as address 100.0.5.4. The subnet mask corresponding to /24 is netmask 255.255.255.0.
$ cd /etc/network/interfaces.d
$ vim ifcfg-ibs7
auto ibs7
iface ibs7 inet static
address 100.0.5.4
netmask 255.255.255.0
pre-up echo datagram > /sys/class/net/ibs7/mode || :
pre-up /sbin/ifconfig ibs7 mtu 1500 || :
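The /24 to 255.255.255.0 conversion follows from setting the top 24 bits of a 32-bit mask. The sketch below shows that arithmetic and generates a stanza equivalent to the ifcfg-ibs7 file above, so the interface name only has to be typed once. It assumes Bash; the function names `prefix_to_netmask` and `ifcfg_ib` are hypothetical, and the 4-space indentation of the option lines is a conventional choice, not a requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts a prefix length (0-32) to a dotted-quad
# netmask, e.g. 24 -> 255.255.255.0.
prefix_to_netmask() {
  local prefix="$1" mask i octets=()
  if [[ ! "$prefix" =~ ^[0-9]{1,2}$ ]] || (( 10#$prefix > 32 )); then
    echo "invalid prefix: '$prefix' (expected 0-32)" >&2
    return 1
  fi
  # Set the top <prefix> bits of a 32-bit value.
  mask=$(( (0xFFFFFFFF << (32 - 10#$prefix)) & 0xFFFFFFFF ))
  # Split the mask into four octets, most significant first.
  for i in 24 16 8 0; do
    octets+=( $(( (mask >> i) & 0xFF )) )
  done
  local IFS=.
  echo "${octets[*]}"
}

# Hypothetical helper: prints an ifupdown stanza like ifcfg-ibs7 above
# for a given interface and address in CIDR form (e.g. 100.0.5.4/24).
ifcfg_ib() {
  local iface="$1" cidr="$2"
  local addr="${cidr%/*}" prefix="${cidr#*/}" netmask
  netmask="$(prefix_to_netmask "$prefix")" || return 1
  cat <<EOF
auto ${iface}
iface ${iface} inet static
    address ${addr}
    netmask ${netmask}
    pre-up echo datagram > /sys/class/net/${iface}/mode || :
    pre-up /sbin/ifconfig ${iface} mtu 1500 || :
EOF
}
```

For example, `ifcfg_ib ibs7 100.0.5.4/24 | sudo tee /etc/network/interfaces.d/ifcfg-ibs7` writes the file, and the output can be reviewed first by running `ifcfg_ib` on its own.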
Enter the /etc/systemd/network/ directory and create the file 10-ibs7.network. Both the file name and the network-card-related names in the configuration content must be ibs7. Check carefully to keep them consistent.
For the network card configuration 100.0.n.4/24, if n = 5, configure the IP address as Address=100.0.5.4/24.
$ cd /etc/systemd/network/
$ vim 10-ibs7.network
[Match]
Name=ibs7
[Network]
Address=100.0.5.4/24
[Link]
MTUBytes=1500
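The 10-ibs7.network file can be generated the same way, keeping the interface name and address consistent with the ifupdown file. This is a minimal sketch, assuming Bash; the function name `networkd_ib` is hypothetical, and the optional third argument simply overrides the MTU of 1500 used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a systemd-networkd .network file like
# 10-ibs7.network above. Arguments: interface, address in CIDR form,
# optional MTU (default 1500, as in this article).
networkd_ib() {
  local iface="$1" cidr="$2" mtu="${3:-1500}"
  cat <<EOF
[Match]
Name=${iface}

[Network]
Address=${cidr}

[Link]
MTUBytes=${mtu}
EOF
}

# Example usage on the server:
#   networkd_ib ibs7 100.0.5.4/24 | sudo tee /etc/systemd/network/10-ibs7.network
```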
- Update the network configuration and enable the interface.
$ netplan apply
$ ifconfig ibs7 up
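- Verify that ibs7 has the configured address, then test it with ping. Pinging the instance's own IB address, as below, only confirms the local configuration; to test connectivity over the IB network, ping the IB address of another instance in the same network segment.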
$ ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP 172.16.11.253/24 metric 100 fe80::222:10ff:fe94:6982/64
ibs7 UP 100.0.5.4/24 fe80::a288:c203:d6:d136/64
$ ping 100.0.5.4
PING 100.0.5.4 (100.0.5.4) 56(84) bytes of data.
64 bytes from 100.0.5.4: icmp_seq=1 ttl=64 time=0.129 ms
64 bytes from 100.0.5.4: icmp_seq=2 ttl=64 time=0.101 ms
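The address check can also be scripted against the `ip -br a` output. This is a minimal sketch, assuming Bash and `awk`; the function name `iface_has_addr` is hypothetical, and `<peer IB address>` in the usage comment is a placeholder for the IB address of another instance in the same segment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ip -br a` output on stdin and exits 0 only if
# the given interface is UP and carries the given address (CIDR form).
iface_has_addr() {
  local iface="$1" cidr="$2"
  awk -v ifc="$iface" -v want="$cidr" '
    $1 == ifc && $2 == "UP" { for (i = 3; i <= NF; i++) if ($i == want) ok = 1 }
    END { exit !ok }
  '
}

# Example usage on the server:
#   ip -br a | iface_has_addr ibs7 100.0.5.4/24 && echo "ibs7 is configured"
#   ping -c 3 <peer IB address>   # placeholder: another instance's IB address
```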