libibverbs
General
libibverbs is an implementation of the RDMA verbs for both InfiniBand (according to the InfiniBand specifications) and iWARP (according to the iWARP verbs specifications). It handles the control path of creating, modifying, querying and destroying resources such as Protection Domains (PD), Completion Queues (CQ), Queue Pairs (QP), Shared Receive Queues (SRQ), Address Handles (AH) and Memory Regions (MR). It also handles the data path: sending and receiving data posted to QPs and SRQs, and getting completions from CQs using polling and completion events.
The control path is implemented through system calls to the uverbs kernel module, which in turn calls the low-level HW driver. The data path is implemented through calls to the low-level HW library which, in most cases, interacts directly with the HW. This provides kernel and network stack bypass (saving context/mode switches) along with zero copy and an asynchronous I/O model.
Typically, in network and RDMA programming, there are operations which involve interaction with remote peers (such as address resolution and connection establishment) and remote entities (such as route resolution and joining a multicast group under IB), where a resource managed through IB verbs, such as a QP or an AH, is eventually created or affected by this interaction. In such cases, applications whose addressing semantics are based on IP can use librdmacm, which works in conjunction with libibverbs.
Thread safe
This library is thread safe, and verbs can be called from any thread in the process. The same resource can even be handled from different threads (the atomicity of the operations is guaranteed). However, it is up to the user to stop working with a resource after it has been destroyed (by the same thread or by any other thread); not doing so may result in a segmentation fault.
Fork safe
As a general rule of thumb, fork() should be avoided when using libibverbs, whether it is called explicitly or implicitly (by calling other functions that call it, such as system(), popen(), etc.).
However, if one must use fork() please read the documentation of ibv_fork_init().
Library API
The API of the library is declared as functions, although some of them may be implemented as macros.
In order to use libibverbs, the following line must be included in the source code:
#include <infiniband/verbs.h>
Library functions
int ibv_fork_init(void);
Device functions
struct ibv_device **ibv_get_device_list(int *num_devices);
void ibv_free_device_list(struct ibv_device **list);
const char *ibv_get_device_name(struct ibv_device *device);
uint64_t ibv_get_device_guid(struct ibv_device *device);
Context functions
struct ibv_context *ibv_open_device(struct ibv_device *device);
int ibv_close_device(struct ibv_context *context);
Queries
int ibv_query_device(struct ibv_context *context, struct ibv_device_attr *device_attr);
int ibv_query_port(struct ibv_context *context, uint8_t port_num, struct ibv_port_attr *port_attr);
int ibv_query_pkey(struct ibv_context *context, uint8_t port_num, int index, uint16_t *pkey);
int ibv_query_gid(struct ibv_context *context, uint8_t port_num, int index, union ibv_gid *gid);
Asynchronous events
int ibv_get_async_event(struct ibv_context *context, struct ibv_async_event *event);
void ibv_ack_async_event(struct ibv_async_event *event);
Protection Domains
struct ibv_pd *ibv_alloc_pd(struct ibv_context *context);
int ibv_dealloc_pd(struct ibv_pd *pd);
Memory Regions
struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length, enum ibv_access_flags access);
int ibv_dereg_mr(struct ibv_mr *mr);
Address Handles
struct ibv_ah *ibv_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr);
int ibv_init_ah_from_wc(struct ibv_context *context, uint8_t port_num, struct ibv_wc *wc, struct ibv_grh *grh, struct ibv_ah_attr *ah_attr);
struct ibv_ah *ibv_create_ah_from_wc(struct ibv_pd *pd, struct ibv_wc *wc, struct ibv_grh *grh, uint8_t port_num);
int ibv_destroy_ah(struct ibv_ah *ah);
Completion event channels
struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context);
int ibv_destroy_comp_channel(struct ibv_comp_channel *channel);
Completion Queues control
struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe, void *cq_context, struct ibv_comp_channel *channel, int comp_vector);
int ibv_destroy_cq(struct ibv_cq *cq);
int ibv_resize_cq(struct ibv_cq *cq, int cqe);
Shared Receive Queues control
struct ibv_srq *ibv_create_srq(struct ibv_pd *pd, struct ibv_srq_init_attr *srq_init_attr);
int ibv_destroy_srq(struct ibv_srq *srq);
int ibv_modify_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr, enum ibv_srq_attr_mask srq_attr_mask);
int ibv_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr);
Queue Pair control
struct ibv_qp *ibv_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr);
int ibv_destroy_qp(struct ibv_qp *qp);
int ibv_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask);
int ibv_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask, struct ibv_qp_init_attr *init_attr);
Posting Work Requests to QPs/SRQs
int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr, struct ibv_send_wr **bad_wr);
int ibv_post_recv(struct ibv_qp *qp, struct ibv_recv_wr *wr, struct ibv_recv_wr **bad_wr);
int ibv_post_srq_recv(struct ibv_srq *srq, struct ibv_recv_wr *recv_wr, struct ibv_recv_wr **bad_recv_wr);
Reading Completions from CQ
int ibv_poll_cq(struct ibv_cq *cq, int num_entries, struct ibv_wc *wc);
Requesting / Managing CQ events
int ibv_req_notify_cq(struct ibv_cq *cq, int solicited_only);
int ibv_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq, void **cq_context);
void ibv_ack_cq_events(struct ibv_cq *cq, unsigned int nevents);
Multicast group
int ibv_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid);
int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid);
General functions
int ibv_rate_to_mult(enum ibv_rate rate);
enum ibv_rate mult_to_ibv_rate(int mult);
const char *ibv_node_type_str(enum ibv_node_type node_type);
const char *ibv_port_state_str(enum ibv_port_state port_state);
const char *ibv_event_type_str(enum ibv_event_type event);
const char *ibv_wc_status_str(enum ibv_wc_status status);
Resource creation dependency
Typical error messages
Here is a list of the typical error messages, which may be printed to stderr when executing a libibverbs application, and how to solve them:
- libibverbs: Fatal: couldn't read uverbs abi version
  - Reason: libibverbs failed to find the file (/sys/class/infiniband_verbs/abi_version) that indicates the ABI (Application Binary Interface) version between the kernel and libibverbs.
  - Cause: this usually happens when the module ib_uverbs isn't loaded.
  - Solution: if the RDMA package (OFED) was installed, reboot the machine. Otherwise, load the RDMA stack drivers using the proper service file.
- libibverbs: Fatal: kernel ABI version X doesn't match library version Y
  - Reason: the available RDMA kernel stack isn't supported by libibverbs (this is what a mismatched ABI version means).
  - Cause: this usually happens when the kernel part and libibverbs don't come from the same source (e.g. OFED, inbox or built manually).
  - Solution: uninstall the current RDMA packages that one may have and install a fresh OFED distribution or the packages that come with the Linux distribution.
- libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'
  - Reason: libibverbs failed to open the directory that holds information about the installed userspace low-level driver libraries.
  - Cause: this usually happens when libibverbs was configured and compiled with different parameters (the --sysconfdir that was provided to "configure") than the userspace low-level driver libraries.
  - Solution: uninstall the userspace low-level drivers and libibverbs and install them from a consistent source, or recompile all those libraries with the same parameters.
- libibverbs: Warning: fork()-safety requested but init failed
  - Reason: libibverbs tried to work in fork()-safe mode, according to the user's request, but failed.
  - Cause: this usually happens in old Linux kernels (older than 2.6.12).
  - Solution: move to a newer Linux kernel or disable the fork() request (environment variable or verb).
- libibverbs: Warning: no userspace device-specific driver found
  - Reason: libibverbs failed to find a userspace low-level driver for a specific RDMA device.
  - Cause: the userspace low-level driver for this RDMA device is missing.
  - Solution: install the missing low-level driver, according to the HW that exists in your computer (lspci may be handy).
- libibverbs: Warning: couldn't load driver
  - Reason: libibverbs failed to load the userspace low-level driver library for a specific RDMA device.
  - Cause: this usually happens when the userspace low-level driver library (.so file) for this RDMA device is missing, corrupted or inconsistent with libibverbs (in terms of supported features).
  - Solution: if the userspace low-level driver library for this RDMA device is missing, install it. If it is already installed, uninstall it and reinstall it from the same source that libibverbs came from.
- libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes
  - Reason: libibverbs checked the amount of memory that can be locked by the running process, and detected that this value is 32KB or less.
  - Cause: working with RDMA requires pinning (i.e. locking) system memory. A low limit on the amount of memory which can be locked will cause failures when creating a Completion Queue, Queue Pair, Shared Receive Queue or Memory Region.
  - Solution: increase the amount of memory which can be locked by any process to a higher value ("unlimited" is preferred).
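Two of the conditions above can be checked by an application before it touches any verb. The following sketch, which uses only standard POSIX calls, tests whether the uverbs ABI file exists and whether the soft RLIMIT_MEMLOCK is above the 32KB threshold mentioned in the last warning. The helper names and the UVERBS_ABI_PATH macro are hypothetical; the path itself is the one quoted in the first error message.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Path quoted in the "couldn't read uverbs abi version" message above. */
#define UVERBS_ABI_PATH "/sys/class/infiniband_verbs/abi_version"

/* Hypothetical helper: returns 1 if `path` can be opened for reading
 * (which suggests that ib_uverbs is loaded), 0 otherwise. */
int uverbs_abi_present(const char *path)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return 0;
    fclose(f);
    return 1;
}

/* Hypothetical helper: returns 1 if the soft RLIMIT_MEMLOCK is above 32KB
 * (or unlimited), 0 if it is 32KB or less, -1 if the limit can't be read. */
int memlock_limit_ok(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return rl.rlim_cur > 32768;
}
```

A program could call uverbs_abi_present(UVERBS_ABI_PATH) and memlock_limit_ok() at startup and print its own, more specific hint before libibverbs prints the messages listed above.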
Summary
In this post, we described libibverbs.
In the next posts, we will cover the API in detail.
Comments
Hello, I'm looking for a document that explains RDMA event error codes. I have this message that I'm trying to identify (142800.654819Z NASDIBFD_Q_K_R_1 FATAL int getrdmaevent(b9::Session*): cmevent->status != 0 RDMA CM event error -22). What does this mean? Thanks for your help in advance
Hi.
I think that there isn't enough information here.
I believe that the error -22 is an errno value (-EINVAL),
since there aren't any negative RDMA_CM error values.
Thanks
Dotan
Dotan,
Thank you for your feedback..
Could I get the source code of rdmav2.so? I have been looking for it for a long time, with no result. Could you tell me whether I can get the source code?
Hi.
Which package/library are you referring to?
Thanks
Dotan
Hi,
I am a student, starting to learn about RDMA. Firstly, I really appreciate your blog. Can you please let me know the answers to the following questions? They may be helpful for others:
I totally confused with different libraries for RDMA.
Is libibvers the same as libvers? and is it different from the library of OFA because I read that it is developed by Roland Dreier, maybe he is a member of OFA. I see also lrdmacm (it is the same as other libraries) Is there any other libraries for RDMA. Actually, I dont know what is the meaning of verbs, is it a abbreviation?
I am really looking forward for your response!!
Thank you very much,
Hi.
Welcome to the RDMA scene :)
libibverbs is a low-level layer that allows one to use RDMA in his code,
it uses RDMA technology primitives that were defined in the InfiniBand specifications
(in chapter 11: the verbs layer description).
Roland Dreier started the development of this library,
but now it is maintained by other people..
librdmacm is an attempt to make the RDMA programming closer to socket programming.
Personally, I prefer to program over libibverbs - but this is only me ....
I hope that I answered your question..
Thanks
Dotan
Thank you very much Dotan!
Just one more question.
Is libibverbs the same as OpenFabric Alliance present or it is something else? Is it mandatory to install OpenFabric Alliance library to communicate RDMA in Infiniband or it is like a library like libibverbs ?
Thank you very much!!
Hi.
Which "OpenFabrics Alliance library that was presented" are you referring to?
Libibverbs is a fully supported library which is part of the OFED/Linux distributions.
Thanks
Dotan
Hi,
i have experienced a problem when running HPC, and i am pretty sure it is just the one described in OFED release note known issues "Internal Ref 781383, Creating Address Handler (AH) may run slow or may hang under a heavy load on all nodes cores (for example: MPI All2All cases)". can u pls tell me why it happens ? Does MELLANOX have any plans to fix it since i noticed that the issue 781383 has existed for a long time.
Thank you
Hi.
Currently, I'm a Mellanox employee. But this is my blog and not Mellanox's blog.
I'm sorry, but I don't answer Mellanox-specific related issues here ...
Please contact Mellanox support or the developers in another way.
Thanks
Dotan
Hello! Dotan.
I have read all of your articles and recently find that it lacks some introduction of exp_api(such as ibv_exp_post_send). In fact, I want to try the DCT functionality but haven't found any detailed reference. Can you help me with it?
Hi.
I'm sorry, I can't document things that aren't really official and available for everyone.
For more information, I suggest that you contact Mellanox Technologies support.
Thanks
Dotan
Got it.
Thanks for replying.
hello Dotan,
as libibverbs and librdmacm were updated, some structs, function parameters and return values were changed, but the documentation is old, both in your blog and in the rdma-core project.
Hi.
Thanks for the feedback.
I maintain this blog and support people in my (very limited) free time,
without any help or sponsor from anyone or any company/foundation.
Updating all the blog will require much time, which I don't have.
And I won't pay someone to do it.
The bottom line is that no one will pay me for updating the blog (as you can see, all the content is free of charge).
So, I'm sorry - I don't have a good solution for this right now
Dotan