Installation of RDMA stack manually
It is possible to install the RDMA packages in a convenient and simple way, using the OFED distribution or the packages shipped with the Linux distributions. However, sometimes there is a need to build and install the packages manually. The reasons for doing this can be:
- Use the latest code to enjoy new features/bug fixes
- Debug the code
- Develop new features
The RDMA stack consists of both kernel and userspace code. One can use the following instructions to install only the Linux kernel or only the userspace code, and use the other one from the Linux distribution. This works because the developers of the RDMA software work hard to keep it working (i.e. keep the ABI compatible).
Linux Kernel
Downloading the kernel sources
The Linux kernel source tree can be taken from many places. The best location is Linus Torvalds' git tree. However, one can take it from any other location and the instructions below will still be the same.
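For example, to clone Linus Torvalds' mainline tree (the exact URL may differ if another tree is chosen):
[root@localhost] # git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git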
[root@localhost] # cd linux
Configuring the Linux kernel
The Linux kernel can be configured to support various features and hardware devices. The following command will open a text-based menu which allows one to configure the kernel.
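(With the standard kernel build system this is typically done, from the kernel source directory, with:)
[root@localhost] # make menuconfig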
Here are the modules that are relevant for RDMA. If one needs to enable more options, one should do it before/after enabling the relevant RDMA options, and then save the current configuration before exiting the menu.
Enable RDMA core and low-level drivers
Enter the menu: Device Drivers -> InfiniBand support
(This option name is misleading; it enables kernel support for all RDMA transports (InfiniBand, iWARP, and RoCE) and not only InfiniBand).
Enable the following options:
- InfiniBand userspace MAD support
- InfiniBand userspace access (verbs and CM)
- IP-over-InfiniBand
- IP-over-InfiniBand Connected Mode support
- InfiniBand SCSI RDMA Protocol
- iSCSI Extensions for RDMA (iSER)
- The low-level drivers for the RDMA devices that one may have on his computer
Enable RDS
Enter the menu: Networking support -> Networking options:
Enable the following options:
- The RDS Protocol
- RDS over Infiniband and iWARP
Enable NFS over RDMA
Enter the menu: File Systems -> Network File Systems:
Enable the following options:
- RPC over RDMA Client Support
- RPC over RDMA Server Support
Building the Linux kernel
Now that the kernel is configured, it can be built.
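(With the standard kernel build system this is typically just the following command; a -j<number of jobs> flag can be added to build in parallel:)
[root@localhost] # make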
Installing the Linux kernel
To install the newly compiled kernel on the local machine, the following commands will install it and add a new kernel entry to the computer's boot loader.
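(Assuming the standard kernel build flow, run as root from the kernel source tree:)
[root@localhost] # make modules_install
[root@localhost] # make install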
More information
Detailed information on how to compile the Linux kernel can be found at the following URL:
http://kernelnewbies.org/KernelBuild
More information can be found in many locations on the internet.
Userspace code
There are many source-code packages and libraries. The minimum packages that are needed for working with RDMA are: libibverbs, librdmacm and the userspace low-level library for the RDMA device.
The following instructions are relevant for all the packages.
Downloading the userspace sources
The userspace sources can be downloaded from several locations:
Kernel.org git repositories - contain libibverbs and some low-level drivers:
https://git.kernel.org/cgit/
OpenFabrics repositories - host all the rest of the packages that are shipped in the OFED and in the Linux distributions:
http://git.openfabrics.org/
All the packages above are maintained in git repositories. One can clone them using the 'git clone <URL>' command line.
Configuring a package
Every package supports the GNU build system (i.e. Autotools), and it needs to be configured before it can be built. The following command lines will configure a package and create the Makefile and the spec file of the package (which is needed to build an RPM for it), after checking that all prerequisites are met.
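(When building from a freshly cloned git repository, the configure script may not exist yet; if the package provides an autogen.sh script, which varies per package, run it first to generate the configure script:)
[root@localhost] # ./autogen.sh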
[root@localhost] # ./configure
Now one has two options to install the package:
- Install the package in the computer file system (without any package-based control system)
- Generate an RPM and install it
Build and install a package to the file system
Building a userspace package
The following command line will compile the package:
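(For these Autotools-based packages this is typically:)
[root@localhost] # make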
Installing a userspace package
The following command line will install the package; note that there is no trivial way to uninstall or remove it afterwards.
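(Typically, run as root:)
[root@localhost] # make install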
Build and install a package using RPM
Creating a tarball
Before creating an RPM based on a repository, one needs to create a directory with the package name and version, and then compress it into a tarball. The package version can be found in the spec file (the line that starts with 'Version').
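(For example, assuming the repository was cloned into a directory named after the package, one way to create such a directory is simply to copy it; the exact step depends on how the sources were obtained:)
[root@localhost] # cp -r <package name> <package name>-<version>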
[root@localhost] # tar czvf ~/rpmbuild/SOURCES/<package name>-<version>.tar.gz <package name>-<version> --exclude .git
Building an SRPM
Now that the tarball is ready, the following command line will build an SRPM (Source RPM):
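(Assuming the tarball was placed in ~/rpmbuild/SOURCES as shown above, this is typically done from the package source directory:)
[root@localhost] # rpmbuild -bs <package name>.spec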
The SRPM will be built at ~/rpmbuild/SRPMS.
Building binary RPM(s)
When there is an SRPM, binary RPMs can be built; one SRPM can produce one or more binary RPMs. The following command line will build the binary RPM(s):
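(Typically, where <release> is the release field from the spec file:)
[root@localhost] # rpmbuild --rebuild ~/rpmbuild/SRPMS/<package name>-<version>-<release>.src.rpm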
The binary RPM(s) will be built in ~/rpmbuild/RPMS/<arch>.
Installing binary RPM(s)
The binary RPMs can be installed locally, and later be removed/upgraded easily. The following command line will install a binary RPM:
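(Typically, run as root:)
[root@localhost] # rpm -ivh ~/rpmbuild/RPMS/<arch>/<package name>-<version>-<release>.<arch>.rpm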
Comments
Hi Dotan,
can you please throw some light on the technique/feature which improves latency in RDMA?
The only reason I can imagine is that one DMA is saved, which is otherwise needed to fetch the receive WQE in the case of send/recv.
Hi.
Did you look at the following post: http://www.rdmamojo.com/2013/06/08/tips-and-tricks-to-optimize-your-rdma-code/ ?
Thanks
Dotan
Hi Dotan,
but this thread does not explain the edge RDMA has over send/recv,
i.e. all these optimizations apply irrespective of channel or memory semantics.
In simpler words, with all these optimizations applied, why does RDMA perform better than send/recv?
Hi.
I don't quite understand your request:
do you want to understand why RDMA is better than send/recv?
(by RDMA you are referring to RDMA Write/Read?)
Or is there anything else?
Sorry, but I failed to understand...
Dotan
Yes,
you got it right.
I wanted to understand why RDMA write/read is better than send/recv.
OSU's benchmarks show better latency for RDMA read/write than for send/recv, so I am wondering what justifies this.
Hi.
Let me try to explain why Write is better than Send/Recv.
In Send:
data travels over the network, and when it reaches the remote side,
a Receive Request is fetched and the device scatters/writes the data to those buffers,
according to the S/G list.
In Write:
data travels over the network, and when it reaches the remote side,
the information about where this data will be written is already known (the remote address is known to the sender),
and the data is written to a contiguous memory block (no extra Receive Request fetch is required).
So, RDMA Write is better than Send/Recv because:
* An extra Work Request fetch is not being done
* Only a contiguous memory write (on the remote side) is performed
Read is similar, although it requires some work from the remote side.
I hope that I answered your question
:)
Dotan
Thanks for your reply
that's exactly what my question was.
But somehow I find it hard to agree with your justification,
because what if:
* all work requests are onboard (NIC memory), so there is no extra fetch
* generally, latency is measured with a 1-byte operation, thus bypassing the contiguous memory requirement
Hi.
It is fine to disagree
:)
Actually, there are adapters whose Work Queues are onboard,
but more and more adapters now use host memory
(lower cost, no need for different adapters with different amounts of memory, etc.).
So, in those adapters there will be an extra PCI access.
I gave a general answer, although you are right that latency is most likely measured with a 1-byte message.
Thanks
Dotan
Hi Dotan,
It would be good if you wrote an article on upstream submission related to RDMA, in both userspace and kernel space.
Thanks, Alok
Hi.
Kernel verbs are a completely different beast (memory registration is different, it is fully asynchronous, there are more verbs, etc.).
I wrote a chapter in a book that explains the kernel verbs in detail
(at least, the ones that were available when I wrote that chapter).
For more information, look at the post:
https://www.rdmamojo.com/2013/12/07/book-linux-kernel-networking-implementation-and-theory/
Thanks
Dotan