Installation of RDMA stack manually
It is possible to install the RDMA packages in a convenient and simple way, using the OFED distribution or the packages shipped with the Linux distributions. However, sometimes there is a need to build and install the packages manually. The reasons for doing this can be:
- Use the latest code to enjoy new features/bug fixes
- Debug the code
- Develop new features
The RDMA stack consists of both kernel and userspace code. One can use the following instructions to install only the Linux kernel or only the userspace code, and use the other one from the Linux distribution. This works because the developers of the RDMA software work hard to keep it working (i.e. keep the ABI compatible).
Linux Kernel
Downloading the kernel sources
The Linux kernel source tree can be taken from many places. The best location is Linus Torvalds' git tree. However, one can take it from any other location and the instructions below will still be the same.
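For example, to clone Linus Torvalds' mainline tree (the exact URL may differ if another tree is chosen):
[root@localhost] # git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git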
[root@localhost] # cd linux
Configuring the Linux kernel
The Linux kernel can be configured to support various features and hardware devices. The following command will open a text-based menu which allows one to configure the kernel.
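(With the standard kernel build system this is typically done, from the kernel source directory, with:)
[root@localhost] # make menuconfig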
Here are the modules that are relevant for RDMA. If one needs to enable more options, one should do it before/after enabling the relevant RDMA options, and then save the current configuration before exiting the menu.
Enable RDMA core and low-level drivers
Enter the menu: Device Drivers -> InfiniBand support
(This option name is misleading; it enables kernel support for all RDMA transports (InfiniBand, iWARP, and RoCE) and not only InfiniBand).
Enable the following options:
- InfiniBand userspace MAD support
- InfiniBand userspace access (verbs and CM)
- IP-over-InfiniBand
- IP-over-InfiniBand Connected Mode support
- InfiniBand SCSI RDMA Protocol
- iSCSI Extensions for RDMA (iSER)
- The low-level drivers for the RDMA devices that one may have on his computer
Enable RDS
Enter the menu: Networking support -> Networking options:
Enable the following options:
- The RDS Protocol
- RDS over Infiniband and iWARP
Enable NFS over RDMA
Enter the menu: File Systems -> Network File Systems:
Enable the following options:
- RPC over RDMA Client Support
- RPC over RDMA Server Support
Building the Linux kernel
Now that the kernel is configured, it can be built.
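(With the standard kernel build system this is typically just the following command; a -j<number of jobs> flag can be added to build in parallel:)
[root@localhost] # make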
Installing the Linux kernel
To install the newly compiled kernel on the local machine, the following commands will install it and add a new kernel entry to the computer's boot loader.
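(Assuming the standard kernel build flow, run as root from the kernel source tree:)
[root@localhost] # make modules_install
[root@localhost] # make install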
More information
Detailed information on how to compile the Linux kernel can be found at the following URL:
http://kernelnewbies.org/KernelBuild
More information can be found in many locations on the internet.
Userspace code
There are many source-code packages and libraries. The minimum packages that are needed for working with RDMA are: libibverbs, librdmacm and the userspace low-level library for the RDMA device.
The following instructions are relevant for all the packages.
Downloading the userspace sources
The userspace sources can be downloaded from several locations:
Kernel.org git repositories - contain libibverbs and some low-level drivers:
https://git.kernel.org/cgit/
OpenFabrics repositories - host all the rest of the packages that are shipped in the OFED and in the Linux distributions:
http://git.openfabrics.org/
All the packages above are maintained in git repositories. One can clone them using the 'git clone <URL>' command line.
Configuring a package
Every package supports the GNU build system (i.e. Autotools), and it needs to be configured before it can be built. The following command lines will configure a package and create the Makefile and the spec file of the package (which is needed to build an RPM for it), after checking that all prerequisites are met.
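(When building from a freshly cloned git repository, the configure script may not exist yet; if the package provides an autogen.sh script, which varies per package, run it first to generate the configure script:)
[root@localhost] # ./autogen.sh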
[root@localhost] # ./configure
Now one has two options to install the package:
- Install the package in the computer file system (without any package-based control system)
- Generate an RPM and install it
Build and install a package to the file system
Building a userspace package
The following command line will compile the package:
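(For these Autotools-based packages this is typically:)
[root@localhost] # make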
Installing a userspace package
The following command line will install the package; note that there is no trivial way to uninstall or remove it afterwards.
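(Typically, run as root:)
[root@localhost] # make install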
Build and install a package using RPM
Creating a tarball
Before creating an RPM based on a repository, one needs to create a directory with the package name and version, and then compress it into a tarball. The package version can be found in the spec file (the line that starts with 'Version').
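(For example, assuming the repository was cloned into a directory named after the package, one way to create such a directory is simply to copy it; the exact step depends on how the sources were obtained:)
[root@localhost] # cp -r <package name> <package name>-<version>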
[root@localhost] # tar czvf ~/rpmbuild/SOURCES/<package name>-<version>.tar.gz <package name>-<version> --exclude .git
Building an SRPM
Now that the tarball is ready, the following command line will build an SRPM (Source RPM):
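(Assuming the tarball was placed in ~/rpmbuild/SOURCES as shown above, this is typically done from the package source directory:)
[root@localhost] # rpmbuild -bs <package name>.spec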
The SRPM will be built at ~/rpmbuild/SRPMS.
Building binary RPM(s)
When there is an SRPM, binary RPMs can be built; one SRPM can produce one or more binary RPMs. The following command line will build the binary RPM(s):
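(Typically, where <release> is the release field from the spec file:)
[root@localhost] # rpmbuild --rebuild ~/rpmbuild/SRPMS/<package name>-<version>-<release>.src.rpm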
The binary RPM(s) will be built in ~/rpmbuild/RPMS/<arch>.
Installing binary RPM(s)
The binary RPMs can be installed locally, and later be removed/upgraded easily. The following command line will install a binary RPM:
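(Typically, run as root:)
[root@localhost] # rpm -ivh ~/rpmbuild/RPMS/<arch>/<package name>-<version>-<release>.<arch>.rpm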
Comments
Hi Dotan,
can you please throw some light on the technique/feature which improves latency in RDMA?
The only reason I can imagine is that one DMA is saved, which is otherwise needed to fetch the receive WQE in the case of send/recv.
Hi.
Did you look at the following post: http://www.rdmamojo.com/2013/06/08/tips-and-tricks-to-optimize-your-rdma-code/ ?
Thanks
Dotan
Hi Dotan,
but this thread does not explain the edge RDMA has over send/recv,
i.e. all these optimizations apply irrespective of channel or memory semantics.
In simpler words, with all these optimizations applied, why does RDMA perform better than send/recv?
Hi.
I don't quite understand your request:
do you want to understand why RDMA is better than send/recv?
(by RDMA you are referring to RDMA Write/Read?)
Or is there anything else?
Sorry, but I failed to understand...
Dotan
Yes,
you got it right.
I wanted to understand why RDMA write/read is better than send/recv.
OSU's benchmarks show better latency for RDMA read/write than for send/recv, so I am wondering what justifies this.
Hi.
Let me try to explain why Write is better than Send/Recv.
In Send:
data travels over the network, and when it reaches the remote side,
a Receive Request is fetched and the device scatters/writes the data to those buffers,
according to the S/G list.
In Write:
data travels over the network, and when it reaches the remote side,
the information about where this data will be written is already known (the remote address is known to the sender),
and the data is written to a contiguous memory block (no extra Receive Request fetch is required).
So, RDMA Write is better than Send/Recv because:
* An extra Work Request fetch is not being done
* Only a contiguous memory write (on the remote side) is performed
Read is similar, although it requires some work from the remote side.
I hope that I answered your question
:)
Dotan
Thanks for your reply
that's exactly what my question was.
But somehow I find it hard to agree with your justification,
because what if:
* all work requests are onboard (NIC memory), so there is no extra fetch
* generally, latency is measured with a 1-byte operation, thus bypassing the contiguous memory requirement
Hi.
It is fine to disagree
:)
Actually, there are adapters whose Work Queues are onboard,
but more and more adapters now use host memory
(lower cost, no need for different adapters with different amounts of memory, etc.).
So, in those adapters there will be an extra PCI access.
I gave a general answer, although you are right that latency is most likely measured with a 1-byte message.
Thanks
Dotan
Hi Dotan,
It would be good if you wrote an article on upstream submission related to RDMA, in both userspace and kernel space.
Thanks, Alok
Hi.
Kernel verbs are a completely different beast (memory registration is different, it is fully asynchronous, there are more verbs, etc.).
I wrote a chapter in a book that explains the kernel verbs in detail
(at least, the ones that were available when I wrote that chapter).
For more information, look at the post:
https://www.rdmamojo.com/2013/12/07/book-linux-kernel-networking-implementation-and-theory/
Thanks
Dotan