ibv_get_cq_event()
int ibv_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq, void **cq_context);
Description
ibv_get_cq_event() waits for the next Completion event on a specific Completion Event Channel, according to the Completion event type that was requested using ibv_req_notify_cq().
By default, ibv_get_cq_event() is a blocking function: if there isn't any Completion event to read, it waits until the next Completion event is generated. It can be useful to have a dedicated thread that waits for the next Completion event to occur. However, the event can also be read in a non-blocking way. One can configure the file descriptor of the event file in the Completion Event Channel to be non-blocking using fcntl(), and then use read()/poll()/epoll()/select() on this file descriptor to determine whether any Completion event is waiting to be read. An example of this appears in the Examples section of this post.
All of the Completion events which were received using ibv_get_cq_event() must be acknowledged using ibv_ack_cq_events().
A typical flow for working with Completion events is the following:
Stage I: Preparation
1. Create a CQ, associated with a Completion Event Channel
2. Request notification upon a new (first) completion event
Stage II: Completion Handling Routine
3. Wait for the completion event and ack it
4. Request notification upon the next completion event
5. Empty the CQ
Note that an extra event may be triggered without having a corresponding Work Completion entry in the CQ. This occurs if a completion entry is added to the CQ between Step 4 and Step 5, and the CQ is then emptied (polled) in Step 5.
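The extra event described in the note can be reproduced with a small model. The sketch below is not libibverbs code: `struct model_cq` and the `model_*` functions are hypothetical stand-ins, and the model assumes only what the text states, namely that a notification request arms the CQ once and that the next completion added to an armed CQ queues one event on the channel.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a CQ bound to a completion channel (not the libibverbs types). */
struct model_cq {
    int  entries;        /* Work Completions currently in the CQ */
    bool armed;          /* set by a notification request, cleared when an event fires */
    int  channel_events; /* events queued on the channel and not yet read */
};

/* Step 2 / Step 4: request notification for the next completion. */
void model_req_notify(struct model_cq *cq)
{
    cq->armed = true;
}

/* A Work Completion is added; an armed CQ emits exactly one event. */
void model_add_completion(struct model_cq *cq)
{
    cq->entries++;
    if (cq->armed) {
        cq->armed = false;
        cq->channel_events++;
    }
}

/* Step 3: read one event (the model requires that one is pending). */
void model_get_event(struct model_cq *cq)
{
    assert(cq->channel_events > 0);
    cq->channel_events--;
}

/* Step 5: empty the CQ and return how many completions were found. */
int model_drain(struct model_cq *cq)
{
    int n = cq->entries;
    cq->entries = 0;
    return n;
}

/*
 * Runs the flow with a completion arriving between Step 4 and Step 5.
 * Returns the number of completions found after reading the second event.
 */
int model_spurious_event_demo(void)
{
    struct model_cq cq = { 0, false, 0 };

    model_req_notify(&cq);      /* Step 2 */
    model_add_completion(&cq);  /* first completion fires an event */

    model_get_event(&cq);       /* Step 3 */
    model_req_notify(&cq);      /* Step 4 */
    model_add_completion(&cq);  /* arrives between Step 4 and Step 5 */
    (void)model_drain(&cq);     /* Step 5 removes both completions */

    model_get_event(&cq);       /* a second event is already queued */
    model_req_notify(&cq);
    return model_drain(&cq);    /* nothing is left: the event was "extra" */
}
```

In this model `model_spurious_event_demo()` returns 0, which is the case a completion handler has to tolerate: an event was delivered, yet polling finds an empty CQ.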
Parameters
Name | Direction | Description |
---|---|---|
channel | in | Completion event channel that was returned from ibv_create_comp_channel() |
cq | out | A CQ that got the Completion event |
cq_context | out | The CQ context of the CQ that got the Completion event |
Return Values
Value | Description |
---|---|
0 | On success |
-1 | On failure |
Examples
1) Waiting for a Completion event for the next Work Completion (in a blocking way):

    #include <stdio.h>
    #include <infiniband/verbs.h>

    struct ibv_context *context;        /* device context */
    struct ibv_comp_channel *channel;   /* returned from ibv_create_comp_channel() */
    struct ibv_cq *cq;
    void *cq_context = NULL; /* can be initialized with other values for the CQ context */
    int ret;

    /* Create a CQ, which is associated with a Completion Event Channel */
    cq = ibv_create_cq(context, 1, cq_context, channel, 0);
    if (!cq) {
        fprintf(stderr, "Failed to create CQ\n");
        return -1;
    }

    /* Request notification before any completion can be created (to prevent races) */
    ret = ibv_req_notify_cq(cq, 0);
    if (ret) {
        fprintf(stderr, "Couldn't request CQ notification\n");
        return -1;
    }
    .
    .
    /* Perform an operation that will eventually end with Work Completion */
    .
    /* The following code will be called each time you need to read a Work Completion */
    struct ibv_cq *ev_cq;
    void *ev_ctx;
    struct ibv_wc wc;
    int ne;

    /* Wait for the Completion event */
    ret = ibv_get_cq_event(channel, &ev_cq, &ev_ctx);
    if (ret) {
        fprintf(stderr, "Failed to get CQ event\n");
        return -1;
    }

    /* Ack the event */
    ibv_ack_cq_events(ev_cq, 1);

    /* Request notification upon the next completion event */
    ret = ibv_req_notify_cq(ev_cq, 0);
    if (ret) {
        fprintf(stderr, "Couldn't request CQ notification\n");
        return -1;
    }

    /* Empty the CQ: poll all of the completions from the CQ (if any exist) */
    do {
        ne = ibv_poll_cq(ev_cq, 1, &wc);
        if (ne < 0) {
            fprintf(stderr, "Failed to poll completions from the CQ: ret = %d\n", ne);
            return -1;
        }
        /* there may be an extra event with no completion in the CQ */
        if (ne == 0)
            continue;

        if (wc.status != IBV_WC_SUCCESS) {
            fprintf(stderr, "Completion with status 0x%x was found\n", wc.status);
            return -1;
        }
    } while (ne);
2) Waiting for a Completion event for the next Work Completion (in a non-blocking way):

    #include <stdio.h>
    #include <fcntl.h>
    #include <poll.h>
    #include <infiniband/verbs.h>

    struct ibv_context *context;        /* device context */
    struct ibv_comp_channel *channel;   /* returned from ibv_create_comp_channel() */
    struct ibv_cq *cq;
    void *cq_context = NULL; /* can be initialized with other values for the CQ context */
    int flags;
    int rc;
    int ret;

    /* Create a CQ, which is associated with a Completion Event Channel */
    cq = ibv_create_cq(context, 1, cq_context, channel, 0);
    if (!cq) {
        fprintf(stderr, "Failed to create CQ\n");
        return -1;
    }

    /* Request notification before any completion can be created (to prevent races) */
    ret = ibv_req_notify_cq(cq, 0);
    if (ret) {
        fprintf(stderr, "Couldn't request CQ notification\n");
        return -1;
    }

    /* The following code will be called only once, after the Completion Event Channel was created */
    printf("Changing the mode of Completion events to be read in non-blocking\n");

    /* change the blocking mode of the completion channel */
    flags = fcntl(channel->fd, F_GETFL);
    rc = fcntl(channel->fd, F_SETFL, flags | O_NONBLOCK);
    if (rc < 0) {
        fprintf(stderr, "Failed to change file descriptor of Completion Event Channel\n");
        return -1;
    }
    .
    .
    /* Perform an operation that will eventually end with Work Completion */
    .
    /* The following code will be called each time you need to read a Work Completion */
    struct pollfd my_pollfd;
    struct ibv_cq *ev_cq;
    void *ev_ctx;
    struct ibv_wc wc;
    int ne;
    int ms_timeout = 10;

    /*
     * poll the channel until it has an event and sleep ms_timeout
     * milliseconds between any iteration
     */
    my_pollfd.fd = channel->fd;
    my_pollfd.events = POLLIN;
    my_pollfd.revents = 0;
    do {
        rc = poll(&my_pollfd, 1, ms_timeout);
    } while (rc == 0);
    if (rc < 0) {
        fprintf(stderr, "poll failed\n");
        return -1;
    }

    /* Get the completion event that poll() reported */
    ret = ibv_get_cq_event(channel, &ev_cq, &ev_ctx);
    if (ret) {
        fprintf(stderr, "Failed to get cq_event\n");
        return -1;
    }

    /* Ack the event */
    ibv_ack_cq_events(ev_cq, 1);

    /* Request notification upon the next completion event */
    ret = ibv_req_notify_cq(ev_cq, 0);
    if (ret) {
        fprintf(stderr, "Couldn't request CQ notification\n");
        return -1;
    }

    /* Empty the CQ: poll all of the completions from the CQ (if any exist) */
    do {
        ne = ibv_poll_cq(ev_cq, 1, &wc);
        if (ne < 0) {
            fprintf(stderr, "Failed to poll completions from the CQ: ret = %d\n", ne);
            return -1;
        }
        /* there may be an extra event with no completion in the CQ */
        if (ne == 0)
            continue;

        if (wc.status != IBV_WC_SUCCESS) {
            fprintf(stderr, "Completion with status 0x%x was found\n", wc.status);
            return -1;
        }
    } while (ne);
FAQs
Do I have to work with Completion events?
No. The Completion events mechanism is a way to reduce the CPU consumption of reading Work Completions. You don't have to use it, even if a Completion event was requested using ibv_req_notify_cq().
Can I read the Completion events once in a while (for example, every few seconds or minutes)?
Yes, you can. The downside is that you won't know when the Completion event happened, and by the time you read it this information may no longer be relevant.
Is this verb thread-safe?
Yes, this verb is thread-safe (just like the rest of the verbs).
I got a CQ Completion event. Will other processes get this event too?
No. A Completion event is generated for the CQ that it was requested for. Other CQs won't even know that this event occurred.
I got a Completion event, but there weren't any Work Completions in the CQ. Why?
Requesting a Completion event using ibv_req_notify_cq() means that a Completion event will be generated for the next Work Completion that enters the CQ after this request. However, if one polls all of the Work Completions from the CQ (including the one that caused the Completion event) before waiting for the Completion event, the CQ may be empty when the event is read. This isn't a bug, just a flow that one needs to be prepared to handle.
I called ibv_get_cq_event() and I didn't get any Completion event. Why?
If one calls ibv_req_notify_cq(), the next Work Completion (according to the requested solicited notification) that is added to the CQ will generate a Completion event. If there is already a Work Completion in the CQ and no new Work Completion is added to it after ibv_req_notify_cq() was called, then ibv_get_cq_event() will block when working in blocking mode.
Comments
Why is it that when I send a message with the opcode IBV_WR_RDMA_WRITE_WITH_IMM, the sending side cannot get a success signal for the send through ibv_get_cq_event()?
Hi.
The opcode used is irrelevant to whether the completions can be read with ibv_get_cq_event().
Does this SR create a completion in the Send side?
Please answer the following to make things clear for me:
* When you created the QP, what was the value of sq_sig_all?
* When you post the SR, do you set the IBV_SEND_SIGNALED in the SR.send_flags?
Thanks
Dotan
Hi Dotan,
Is it correct to state that for every ibv_req_notify_cq() the channel eventually generates exactly 1 event that can be consumed with ibv_get_cq_event() - regardless of CQ contents at that point? In other words, can one assume that the following pseudo-code won't block forever:
ibv_req_notify_cq();
ibv_req_notify_cq();
/* *now* someone adds at least 2 WRs that produce CQEs */
empty_all_the_cq();
ibv_get_cq_event();
ibv_get_cq_event();
Thanks.
Hi Igor.
No it isn't...
Calling ibv_req_notify_cq() changes the (internal) CQ state so that an event is generated when the next Work Completion is enqueued to it. Calling it once or several times before the CQ event was generated has the same effect and generates only one CQ event.
Thanks
Dotan
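The answer above can be pictured as a single flag. This is only a sketch under that assumption: `struct armed_cq`, `arm()` and `complete()` are hypothetical names that model the described behavior, not libibverbs internals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the CQ remembers only whether it is armed. */
struct armed_cq {
    bool armed;   /* set by a notification request */
    int  events;  /* events generated so far */
};

/* Stands in for a notification request; arming twice is the same as arming once. */
void arm(struct armed_cq *cq)
{
    cq->armed = true;
}

/* Stands in for a Work Completion being enqueued. */
void complete(struct armed_cq *cq)
{
    if (cq->armed) {
        cq->armed = false;  /* the request is consumed by the first completion */
        cq->events++;
    }
}

/* Arms twice, then enqueues the given number of completions. */
int events_after_double_arm(int completions)
{
    struct armed_cq cq = { false, 0 };
    int i;

    assert(completions >= 0);
    arm(&cq);
    arm(&cq);
    for (i = 0; i < completions; i++)
        complete(&cq);
    return cq.events;
}
```

With two completions the model produces one event, so the second ibv_get_cq_event() call in the pseudo-code from the question would have nothing to read.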
I see. Good to know!
Thanks.
Hi Dotan
Is it possible to miss a completion when using the blocking version? Can a completion be added to the CQ without us getting a notification? If so, our program would wait endlessly in ibv_get_cq_event(). How do we ensure that no completion is ever missed? This is a hypothetical question.
Hi Omar.
Yes, it is possible:
If a Work Completion was added to the CQ and only then you called ibv_req_notify_cq(),
an event won't be generated and ibv_get_cq_event() will be blocked forever.
Please follow the flow that I suggested in the post and call ibv_req_notify_cq() when you know for sure that there isn't a Work Completion in the CQ (to prevent the race that may happen).
Thanks
Dotan
Hi Dotan,
I'm trying to implement the non-blocking polling method you describe above.
If I understand correctly I must still call ibv_req_notify_cq otherwise the fd in the associated channel may never be signaled. (BTW it seems that the example code above for non-blocking poll is missing a call to ibv_req_notify_cq before polling for the first time.)
My own testing seems to support this.
In order to retrieve the completions I then call ibv_poll_cq. However I NEVER call ibv_get_cq_event! But afterwards when I call ibv_destroy_cq the call never returns. What might cause this behavior?
Thanks,
Ariel
Hi Ariel.
Your comment was moderated, just like the rest of the comments (to prevent spam), so I'm sorry that it took me some time to approve and answer it.
Thanks for the comment about the missing ibv_req_notify_cq(), I fixed it.
And yes, not calling ibv_get_cq_event() and the matching ibv_ack_cq_events() may cause the destroy CQ to never return.
Thanks
Dotan
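Since every event has to be acknowledged before the CQ can be destroyed, it can help to keep a per-CQ count of events that still need an ack. The libibverbs manual page for ibv_get_cq_event() also suggests acknowledging several events in one ibv_ack_cq_events() call to amortize its cost. The sketch below shows only the counting; `struct ack_batch` and both functions are hypothetical helpers, and the caller is assumed to pass any non-zero return value to ibv_ack_cq_events() for that CQ.

```c
#include <assert.h>

/* Hypothetical per-CQ bookkeeping of events that were read but not yet acked. */
struct ack_batch {
    unsigned int unacked;    /* events read from the channel and not yet acked */
    unsigned int threshold;  /* ack once this many events have accumulated */
};

/*
 * Call after each successful event read for this CQ.
 * Returns the number of events to acknowledge now, or 0 to keep batching.
 */
unsigned int ack_batch_on_event(struct ack_batch *b)
{
    unsigned int n;

    assert(b->threshold > 0);
    b->unacked++;
    if (b->unacked < b->threshold)
        return 0;
    n = b->unacked;
    b->unacked = 0;
    return n;
}

/*
 * Call before destroying the CQ.
 * Returns the number of events that still have to be acknowledged.
 */
unsigned int ack_batch_flush(struct ack_batch *b)
{
    unsigned int n = b->unacked;

    b->unacked = 0;
    return n;
}
```

With a threshold of 3, the first two events return 0 and the third returns 3. Calling `ack_batch_flush()` before ibv_destroy_cq() covers whatever is left over, so the destroy call is not left waiting for acks.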
Hi Dotan,
just a quick question:
Is there any way to come out of a blocking ibv_get_cq_event()-call besides a completion?
For example: A server-application has a dedicated thread that simply waits for completions, lets say only for incoming messages. Now the main thread of the server wants to shut down, how could it ever join the waiting thread?
Even if everything is disconnected and the QP and CQs are destroyed, the waiter thread just waits. Can this blocking ibv_get_cq_event() be interrupted by any means?
Thank you!
Martin
In general - no (I mean, once you called ibv_get_cq_event(), it can't be canceled).
However, I would suggest working in non-blocking mode, using poll() and defining a timeout to sleep.
For more information, please check the post on ibv_get_cq_event().
Thanks
Dotan
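One way to apply this suggestion to the shutdown problem is to wake up from poll() periodically and check a stop flag that the main thread sets. The sketch below is an assumption about how such a loop could look, not code from the post: `wait_readable_or_stop()` and `demo_wait_on_pipe()` are hypothetical names, and an ordinary pipe stands in for `channel->fd` so that the example runs without RDMA hardware.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Waits until fd is readable or *stop becomes true.
 * Returns 1 if fd is readable, 0 if a stop was requested, -1 on error.
 */
int wait_readable_or_stop(int fd, atomic_bool *stop, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    /* A negative timeout would make poll() block forever and defeat the purpose. */
    assert(timeout_ms >= 0);

    while (!atomic_load(stop)) {
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc > 0)
            return (pfd.revents & POLLIN) ? 1 : -1;
        if (rc < 0 && errno != EINTR)
            return -1;
        /* rc == 0 (timeout) or EINTR: re-check the stop flag and retry */
    }
    return 0;
}

/* Demo with a pipe in place of the channel's file descriptor. Returns 1 on success. */
int demo_wait_on_pipe(void)
{
    int fds[2];
    atomic_bool stop = false;
    int got_data, got_stop;

    if (pipe(fds) != 0)
        return 0;
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }

    got_data = wait_readable_or_stop(fds[0], &stop, 10); /* data is pending */
    atomic_store(&stop, true);                           /* main thread asks to stop */
    got_stop = wait_readable_or_stop(fds[0], &stop, 10); /* returns without waiting */

    close(fds[0]);
    close(fds[1]);
    return got_data == 1 && got_stop == 0;
}
```

In a real waiter thread, a return value of 1 would be followed by ibv_get_cq_event() on the non-blocking channel, and a return value of 0 lets the thread exit so that the main thread can join it.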
Hi Dotan,
Thank you very much for the wonderful article but I have an amateur question for you. I want to know exactly how the code for blocking and non-blocking are different in terms of their behavior and what is the advantage of one over the other? For example, wouldn't the while loop which polls in the non-blocking example block the same way as the blocking ibv_get_cq_event() function?
Thanks in advance.
Kaushal
Hi Kaushal.
The only difference between the blocking code and the non-blocking code is the way they wait for the Completion Event to occur.
Instead of calling ibv_get_cq_event(), which blocks until a Completion Event is generated,
the code can check whether a Completion Event was generated, and if it wasn't, the code
can do whatever else it needs to do (for example, calculate or whatever your code does).
The while loop in both examples is the same because the flow is the same.
One may ask: if I'm using non-blocking mode, why should I work with Completion Events?
The answer is that when working with Completion Events you can use poll() and sleep for a while.
Furthermore, if you are using several Completion Queues, working with Completion Events will inform
you about the Completion Queue that got the event
(instead of polling all of the Completion Queues and searching for the one that got the Work Completion).
I hope that I answered your question
:)
Thanks
Dotan
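The point about several Completion Queues relies on the cq_context value that ibv_get_cq_event() returns for the CQ that got the event. A minimal sketch of dispatching on it follows, assuming the application stored a pointer to its own per-CQ structure as the CQ context when creating each CQ. `struct cq_handler` and `on_cq_event()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CQ state that the application registers as the CQ context. */
struct cq_handler {
    const char *name;  /* label for logging, e.g. "send" or "recv" */
    int events_seen;   /* how many events this CQ has received */
};

/*
 * Called with the cq_context value reported for an event.
 * Returns the handler, so the caller knows which CQ to poll.
 */
struct cq_handler *on_cq_event(void *cq_context)
{
    struct cq_handler *h = cq_context;

    assert(h != NULL);
    h->events_seen++;
    return h;
}
```

For example, with one `struct cq_handler` per CQ, each event goes straight to the right handler and only that CQ needs to be polled.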
Thank you Dotan.
Regards,
Kaushal
Hi Dotan,
I have some issues with getting completion events. I am using the code from your examples, and it seems that once in a while an event is missing. We write in 2 portions each time, 1048512 and 64 bytes, and for roughly 1% of operations the completion event is missing.
Do you have any suggestion?
Thank you, also thank you very much for your very useful and well written blog!
Regards,
Lilia
Hi Lilia.
Thanks for the compliments
:)
* Do you have this problem during work (i.e. after you got some events) or in the first event?
* What is the configuration that you are using (native/SR-IOV, which OS)?
* Can you share the code with me?
Thanks
Dotan
Hi Dotan,
1. It starts to happen after approximately 80 successful operations/events.
2. Redhat 6.5 64 bit, I guess it's native, how can I check?
3. Here is the code of the thread dedicated for receiving events.
    /* Wait for the Completion event */
    ret = ibv_get_cq_event(completionChannel, &pCq, &pCtx);
    if (ret)
    {
        goto Exit;
    }

    /* Ack the event */
    ibv_ack_cq_events(completionQueue, 1);

    /* Request notification upon the next completion event */
    ret = ibv_req_notify_cq(completionQueue, 0);
    if (ret)
    {
        goto Exit;
    }

    /* Empty the CQ: poll all of the completions from the CQ (if any exist) */
    do
    {
        num = ibv_poll_cq(completionQueue, 1, &wc);
        if (num < 0)
        {
            goto Exit;
        }
        /* there may be an extra event with no completion in the CQ */
        if (num == 0)
            continue;

        if (wc.status != IBV_WC_SUCCESS)
        {
            goto Exit;
        }

        switch (wc.opcode)
        {
        case IBV_WC_SEND:
            opCode = SY_RDMA_SEND;
            break;
        case IBV_WC_RECV:
            opCode = SY_RDMA_RECV;
            break;
        case IBV_WC_RDMA_WRITE:
            opCode = SY_RDMA_WRITE;
            break;
        case IBV_WC_RDMA_READ:
            opCode = SY_RDMA_READ;
            break;
        default:
            continue;
        }

        (* onOperationCompleteCallback)(opCode, wc.byte_len, (SYRdmaContext)wc.wr_id);
    } while (num);
Tried to play with completion queue length - 1000 and max_cqe - same result.
Thanks,
Lilia
Hi Lilia.
Do you check the completion status? (that there isn't any completion with error)
Do you check if there is a CQ overrun?
Thanks
Dotan
Hi Dotan,
I have a check for errors (just posted cropped version of code), there are no errors and no unexpected opcodes. I have a thread for async events, but nothing there as well. How else do I check for CQ overrun?
Thanks,
Lilia
Hi Lilia.
CQ overrun is reported using asynchronous events
(to both CQ and QP).
You have the thread that you mentioned - so I guess this isn't the case.
I wonder, did you try using the non-blocking mode that I wrote about in the post?
This way, even if (in some mysterious way) an event disappears,
your program won't be blocked forever.
Can you check the QP state when you don't have an event?
Did you check if there is anything in the CQ?
Maybe the lack of a Completion event means that no new Work Completion was added to the CQ.
Thanks
Dotan
Hi Dotan,
I have been studying the implementation of the verb functions as part of my research. I understand that ibv_get_cq_event() reads on a file descriptor until a completion event is added to it and the associated file descriptor is probably "/dev/infiniband/rdma_cm". But what I am unable to figure out is which entity and which part of the code in the OFED library writes to this file descriptor.
It would be great if you could point that out to me.
Thank You,
Ace
Hi Ace.
Actually, ibv_get_cq_event() waits for data which is sent by the kernel part of the RDMA stack
("/dev/infiniband/rdma_cm" is used by librdmacm functions and not libibverbs functions).
When creating a Completion Event Channel, the RDMA stack (kernel part) allocates a File Descriptor that will be used to communicate (one way: from kernel -> user) about Completion Events.
For more information, please refer to:
1) libibverbs sources: file: src/verbs.c, function: ibv_create_comp_channel
2) Linux kernel: file: infiniband.git/tree/drivers/infiniband/core/uverbs_cmd.c, function: ib_uverbs_get_context
I hope that this helped you.
Dotan
Thank you very much for the reply Dotan.
So the file descriptor is allocated by ib_uverbs_get_context() when there is a call to create a completion event channel. Assume that ibv_create_comp_channel() and ibv_req_notify_cq() have already been called and ibv_get_cq_event() is currently waiting in the read() function. If there is a completion event during this time, will ib_uverbs_get_context() be responsible for updating the file descriptor?
Sorry for asking so many questions. Feel free to reply at your convenience.
Thank You,
Ace
Hi Ace.
ib_uverbs_get_context() allocates file descriptors for roles other than Completion Events.
But, anyway, the answer is yes:
Assuming that the verbs that you mentioned were called, if there is a Completion Event, the kernel part will write this information to the file descriptor of the Completion Events. When the userspace process calls read(), it will read this data.
Thanks
Dotan
Let's say there is a receiver that handles incoming data on CQs A, B, C... bound to the same completion channel and ibv_get_cq_event() returns an event for CQ A. Emptying CQ A may take a long time if the sender is streaming data in long bursts at a high rate. Receiver needs to handle other senders and it may need to stop polling CQ A if it is still not empty after a certain time. However if the receiver stops polling and the sender on CQ A stops the burst exactly between steps 3 and 4 of Stage II then there will not be any more events for CQ A until more data comes in. Which may not happen if it depends on a response to the last data stuck in CQ A. Hence polling can stop without emptying the CQ only if it's possible to find out that a new completion event is pending for this CQ or if the handler can forcibly generate a "dummy" completion to make sure there is a new event. But is any of this possible to do?
Thanks,
Al
Hi.
IMHO, this is a SW engineering problem and not an RDMA problem.
What you can do is have several threads, one of which will wake up once you have a completion event.
This way, the other CQs won't be starved, and you'll still empty the CQ.
BTW, there isn't any way to add a dummy completion; sorry.
Thanks
Dotan
I got a completion event. How can I check whether it is a solicited event or not?
Hi.
Once you have a completion event, you don't have the information about whether it is solicited or not.
The only place where you set different behavior according to whether the completion event is solicited or not is when you call ibv_req_notify_cq().
Thanks
Dotan
Hi Dotan.
Is there a way to generate EPOLLOUT and EPOLLIN events on one completion channel file descriptor which is associated with two completion queues (Send & Recv queues)?
For now, I could only detect the EPOLLIN event on the completion channel file descriptor.
Thanks
Haonan
Hi.
Upon creation of a CQ you provide the completion channel.
Providing the same completion channel for both CQs and using its file descriptor with the poll() family should do the trick.
Thanks
Dotan
Sorry,
I found that my previous explanation was messy.
There is a server side and a client side.
At the server side:
1. First, create a CQ with cqe = 100 and a QP with both max_send_wr and max_recv_wr set to 8.
2. Call pthread_create.
3. There are 4 threads at the server side; each thread pre-posts a recv and calls ibv_get_cq_event and ibv_poll_cq like the example above, in the blocking way.
4. Post a send with the IBV_SEND_SIGNALED flag and the IBV_WR_SEND opcode.
5. Call ibv_get_cq_event and ibv_poll_cq again.
6. Call pthread_exit.
At the client side, just like the server side:
1. First, create a CQ and a QP with the same parameters.
2. Call pthread_create to create 4 threads, and then post a recv and post a send twice
(the opcode of the first post send is IBV_WR_RDMA_WRITE, the second is IBV_WR_SEND; both are with the IBV_SEND_SIGNALED flag).
3. Call pthread_exit and then call ibv_get_cq_event and ibv_poll_cq like the example above.
The problem is that every time I execute my code, I get a segmentation fault or a completion error at ibv_get_cq_event or ibv_poll_cq, on either the client or the server.
Thanks
Hi.
Assuming that all threads are working with the same QP and CQ,
every thread may get the completions of a different thread.
And you may call ibv_get_cq_event() multiple times for the *same* next work completion;
it may or may not work (since there isn't any sync between the flows in the threads).
However, I don't expect that you'll get a segmentation fault in the scenario you described.
Thanks
Dotan