ibv_post_srq_recv()
int ibv_post_srq_recv(struct ibv_srq *srq, struct ibv_recv_wr *recv_wr, struct ibv_recv_wr **bad_recv_wr);
Description
ibv_post_srq_recv() posts a linked list of Work Requests (WRs) to a Shared Receive Queue (SRQ).
ibv_post_srq_recv() goes over the entries in the linked list one by one: it checks that each entry is valid, generates a HW-specific Receive Request out of it, and adds it to the tail of the SRQ, all without performing any context switch. The RDMA device will take one of those Work Requests as soon as an incoming opcode to a QP that is associated with this SRQ needs to consume a Receive Request (RR). If one of the WRs fails because the SRQ is full or because one of its attributes is bad, ibv_post_srq_recv() stops immediately and returns a pointer to that WR in bad_recv_wr.
ibv_post_srq_recv() can be called whether or not any QP is associated with the SRQ, and regardless of the state of the QPs that are associated with it.
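Because processing stops at the first failing WR, the caller usually wants to know how many WRs ahead of it were queued. The following is a minimal sketch of that bookkeeping. It assumes that the WRs preceding the reported one were already added to the SRQ, and it uses a hypothetical stand-in struct (`recv_wr_sketch`) so that it compiles without libibverbs; real code would walk `struct ibv_recv_wr` from `<infiniband/verbs.h>` in the same way.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ibv_recv_wr (only the fields used here). */
struct recv_wr_sketch {
    uint64_t wr_id;
    struct recv_wr_sketch *next;
};

/*
 * Count the WRs that precede bad_wr in the list starting at head.
 * If bad_wr is NULL (no failure was reported), the whole list is counted.
 */
static size_t wrs_posted_before(const struct recv_wr_sketch *head,
                                const struct recv_wr_sketch *bad_wr)
{
    size_t n = 0;
    const struct recv_wr_sketch *wr;

    for (wr = head; wr != NULL && wr != bad_wr; wr = wr->next)
        n++;
    return n;
}
```

The WRs starting at the failing one can then be fixed or retried later, while the counted ones are treated as outstanding.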
A QP, which is associated with an SRQ, will handle Work Requests in the Receive queue according to the following rules:
- If the QP is in RESET, INIT or ERROR state, incoming messages won't be processed and no RR will be fetched from the SRQ.
- If the QP is in RTR, RTS, SQD or SQE state, incoming messages will be processed and RRs will be fetched from the SRQ if needed.
- If the SRQ enters the ERROR state (an internal state), the affiliated asynchronous event IBV_EVENT_SRQ_ERR will be generated.
- If the SRQ is in ERROR state, or no QP is associated with this SRQ, no more RRs will be fetched from this SRQ.
If a QP is associated with an SRQ, one must call ibv_post_srq_recv(), and not ibv_post_recv(), since the QP's own receive queue will not be used.
If a RR is consumed by a UD QP, the Global Routing Header (GRH) of the incoming message will be placed in the first 40 bytes of the buffer(s) in the scatter list. If no GRH is present in the incoming message, the content of those first 40 bytes will be undefined. This means that in all cases, the actual data of the incoming message will start at an offset of 40 bytes into the buffer(s) in the scatter list.
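The 40-byte rule above translates into a little pointer arithmetic. The helpers below are a sketch with hypothetical names; they assume a single contiguous receive buffer and that the byte count reported for the completed receive includes the 40 bytes reserved for the GRH.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes reserved for the GRH at the start of every UD receive buffer. */
#define UD_GRH_BYTES 40u

/* Buffer length to register and post so that max_payload bytes of data fit. */
static size_t ud_recv_buf_len(size_t max_payload)
{
    return max_payload + UD_GRH_BYTES;
}

/* Start of the message data inside a completed UD receive buffer. */
static uint8_t *ud_payload(uint8_t *buf)
{
    return buf + UD_GRH_BYTES;
}

/* Payload length, given the byte count reported for the completed receive. */
static size_t ud_payload_len(size_t bytes_received)
{
    return bytes_received >= UD_GRH_BYTES ? bytes_received - UD_GRH_BYTES : 0;
}
```

For example, to receive messages of up to 4096 bytes on a UD QP, each posted buffer would need to be at least `ud_recv_buf_len(4096)` bytes long.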
The struct ibv_recv_wr describes the Work Request to the Shared Receive Queue, i.e. Receive Request.
    struct ibv_recv_wr {
        uint64_t wr_id;
        struct ibv_recv_wr *next;
        struct ibv_sge *sg_list;
        int num_sge;
    };
Here is the full description of struct ibv_recv_wr:
Field | Description |
---|---|
wr_id | A 64-bit value associated with this WR. When this Work Request ends, a Work Completion containing this value will be generated |
next | Pointer to the next WR in the linked list. NULL indicates that this is the last WR |
sg_list | Scatter/Gather array, as described in the table below. It specifies the buffers that incoming data will be written to. The entries in the list can specify memory blocks that were registered by different Memory Regions. The maximum message size that it can serve is the sum of the lengths of all the memory buffers in the scatter/gather list |
num_sge | Size of the sg_list array. This number must be less than or equal to the number of scatter/gather entries that the SRQ was created to support (srq_init_attr.attr.max_sge). A size of 0 indicates that the message size is 0 |
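Since the verb accepts a linked list, several Receive Requests can be handed over in one call by chaining them through `next`. The sketch below links an array of WRs, one scatter/gather entry each. The structs are hypothetical stand-ins that mirror the fields described in this section so the example compiles without libibverbs, and the wr_id numbering scheme is only an illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring struct ibv_sge and struct ibv_recv_wr. */
struct sge_sketch {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

struct wr_sketch {
    uint64_t wr_id;
    struct wr_sketch *next;
    struct sge_sketch *sg_list;
    int num_sge;
};

/*
 * Link wrs[0..count-1] into a NULL-terminated list where wrs[i] uses
 * sges[i] as its only scatter/gather entry and gets wr_id base_id + i.
 * Returns the head of the list, or NULL when count is 0.
 */
static struct wr_sketch *chain_recv_wrs(struct wr_sketch *wrs,
                                        struct sge_sketch *sges,
                                        size_t count, uint64_t base_id)
{
    size_t i;

    if (count == 0)
        return NULL;
    for (i = 0; i < count; i++) {
        wrs[i].wr_id = base_id + i;
        wrs[i].sg_list = &sges[i];
        wrs[i].num_sge = 1;
        wrs[i].next = (i + 1 < count) ? &wrs[i + 1] : NULL;
    }
    return &wrs[0];
}
```

With the real structures, the returned head would be the second argument of ibv_post_srq_recv().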
struct ibv_sge describes a scatter/gather entry. The memory buffer that this entry describes must remain registered until no posted Work Request that uses it is considered outstanding anymore. The order in which the RDMA device accesses the memory in a scatter/gather list isn't defined. This means that if some of the entries overlap the same memory address, the content of this address is undefined.
    struct ibv_sge {
        uint64_t addr;
        uint32_t length;
        uint32_t lkey;
    };
Here is the full description of struct ibv_sge:
Field | Description |
---|---|
addr | The address of the buffer to read from or write to |
length | The length of the buffer in bytes. The value 0 is a special value and is equal to 2^31 bytes (and not zero bytes, as one might imagine) |
lkey | The Local key of the Memory Region that this memory buffer was registered with |
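The largest message a Receive Request can hold follows from the two tables above: it is the sum of the entry lengths, with a length of 0 counted as 2^31 bytes. A minimal sketch, with a hypothetical function name and taking the lengths as a plain array rather than real `struct ibv_sge` entries:

```c
#include <stdint.h>

/*
 * Total number of bytes a scatter/gather list can hold.
 * A length of 0 is treated as 2^31 bytes, as described in the table above.
 */
static uint64_t sg_list_capacity(const uint32_t *lengths, int num_sge)
{
    uint64_t total = 0;
    int i;

    for (i = 0; i < num_sge; i++)
        total += lengths[i] ? (uint64_t)lengths[i] : (UINT64_C(1) << 31);
    return total;
}
```

A check like this can be used before posting to verify that every RR is large enough for the biggest message expected on any QP that shares the SRQ.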
While a WR is considered outstanding, the content of its local memory buffers shouldn't be read, since one doesn't know when the RDMA device will stop writing new content to them.
If the SRQ isn't in ERROR state and one of the QPs that are associated with the SRQ receives an incoming message that should consume a RR, a RR will be fetched from the head of the SRQ in an atomic way. Since one cannot control or predict in advance which WR will be fetched from the SRQ by which QP, it is highly advised that every WR in the SRQ be able to hold the maximum message that any of the QPs may receive.
Parameters
Name | Direction | Description |
---|---|---|
srq | in | Shared Receive Queue that was returned from ibv_create_srq() |
recv_wr | in | Linked list of Work Requests to be posted to the Shared Receive Queue |
bad_recv_wr | out | A pointer that will be filled with the address of the first Work Request whose processing failed |
Return Values
Value | Description |
---|---|
0 | On success |
errno | On failure. bad_recv_wr points to the RR that failed to be posted; neither it nor the RRs that follow it in the list are added to the SRQ |
EINVAL | Invalid value provided in recv_wr |
ENOMEM | SRQ is full or not enough resources to complete this operation |
EFAULT | Invalid value provided in srq |
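The return values in the table above can be turned into a small diagnostic helper. This is only a sketch with hypothetical function names; the decision to treat ENOMEM as retryable is an assumption that fits the "SRQ is full" case (the queue may drain as RRs are consumed) but not a genuine lack of resources.

```c
#include <errno.h>

/* Short human-readable reason for a return value of the post call. */
static const char *post_srq_recv_reason(int rc)
{
    switch (rc) {
    case 0:
        return "posted";
    case EINVAL:
        return "invalid value in the Work Request";
    case ENOMEM:
        return "SRQ is full or out of resources";
    case EFAULT:
        return "invalid queue handle";
    default:
        return "other error";
    }
}

/* Assumption: only ENOMEM is worth retrying once RRs have been consumed. */
static int srq_post_should_retry(int rc)
{
    return rc == ENOMEM;
}
```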
Examples
Posting a RR to an SRQ:
    struct ibv_sge sg;
    struct ibv_recv_wr wr;
    struct ibv_recv_wr *bad_wr;

    memset(&sg, 0, sizeof(sg));
    sg.addr   = (uintptr_t)buf_addr;
    sg.length = buf_size;
    sg.lkey   = mr->lkey;

    memset(&wr, 0, sizeof(wr));
    wr.wr_id   = 0;
    wr.sg_list = &sg;
    wr.num_sge = 1;

    if (ibv_post_srq_recv(srq, &wr, &bad_wr)) {
        fprintf(stderr, "Error, ibv_post_srq_recv() failed\n");
        return -1;
    }
FAQs
Does ibv_post_srq_recv() cause a context switch?
No. Posting a RR doesn't cause a context switch at all; this is why RDMA technologies can achieve very low latency (below 1 usec).
How many WRs can I post?
There is a limit to the maximum number of outstanding WRs for an SRQ. This value was specified when the SRQ was created.
Can I know how many WRs are outstanding in a Work Queue?
No, you can't. You should keep track of the number of outstanding WRs according to the number of posted WRs and the number of Work Completions that you polled. However, for an SRQ you can use the LIMIT mechanism, which generates an affiliated asynchronous event when the number of WRs in the SRQ drops below a specific value.
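The bookkeeping described in this answer can be as simple as two counters. The sketch below uses hypothetical names and assumes that `max_wr` holds the number of outstanding WRs the SRQ supports, that the counters are updated by a single thread (or under a lock), and that every consumed RR eventually shows up as a polled Work Completion.

```c
#include <stdint.h>

/* Application-side accounting for one SRQ. */
struct srq_tracker {
    uint32_t max_wr;    /* capacity of the SRQ */
    uint64_t posted;    /* RRs successfully posted so far */
    uint64_t completed; /* receive Work Completions polled so far */
};

/* Record n successfully posted RRs. */
static void srq_note_posted(struct srq_tracker *t, uint32_t n)
{
    t->posted += n;
}

/* Record n polled receive Work Completions. */
static void srq_note_completed(struct srq_tracker *t, uint32_t n)
{
    t->completed += n;
}

/* RRs currently outstanding in the SRQ. */
static uint32_t srq_outstanding(const struct srq_tracker *t)
{
    return (uint32_t)(t->posted - t->completed);
}

/* How many more RRs can be posted without overflowing the SRQ. */
static uint32_t srq_free_slots(const struct srq_tracker *t)
{
    return t->max_wr - srq_outstanding(t);
}
```

Checking `srq_free_slots()` before each post avoids the ENOMEM failure that a full SRQ would produce.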
Can I know which QP will fetch a specific WR from the SRQ?
No, you can't. This is the reason that all of the WRs in the SRQ should be able to hold the maximum message that any of the QPs associated with the SRQ may receive.
Which operations will consume RRs?
If the remote side posts a Send Request with one of the following opcodes, a RR will be consumed:
- Send
- Send with Immediate
- RDMA Write with immediate
What will happen if I deregister an MR that is used by an outstanding WR?
When processing a WR, if one of the MRs that are specified in the WR isn't valid, a Work Completion with error will be generated.
I called ibv_post_srq_recv() and I got a segmentation fault, what happened?
There may be several reasons for this to happen:
1) At least one of the sg_list entries resides at an invalid address
2) The value of next points to an invalid address
3) An error occurred in one of the posted RRs (bad value in the RR or full Work Queue) and the variable bad_recv_wr is NULL
Help, I've posted a Receive Request and it wasn't completed with a corresponding Work Completion. What happened?
In order to debug this kind of problem, one should do the following:
- Verify that a Send Request was actually posted in the remote QP
- Verify that a Receive Request was actually posted to the local SRQ
- Wait enough time, maybe a Work Completion will eventually be generated
- Verify that the logical port state of the RDMA device is IBV_PORT_ACTIVE
- Verify that the QP is in one of the following states: RTR, RTS, SQD, SQE, ERROR
I had code that worked with UC or RC QPs and I added support for UD QPs, but I keep getting Work Completions with error. What happened?
For a UD QP, an extra 40 bytes should be added to the RR buffers (to allow saving the GRH, if one exists in the message).
Can I (re)use the Receive Request after ibv_post_srq_recv() returned?
Yes. This verb translates the Receive Request from the libibverbs abstraction to a HW-specific Receive Request and you can (re)use both the Receive Request and the s/g list within it.
Comments
Hi,
If we have a program that handles different message sizes (from 4 KB to 16 MB or even more), what's the best practice for posting buffers?
For example, if the server side posts ten 4 KB RRs and ten 4 MB RRs, can each incoming message be matched with the RR that best fits its payload size?
What if the client sends a message larger than the server-side RRs? Is it recommended to split the payload on the application side?
Hi.
Since per Receive Request one can't predict which message size will consume it,
IMHO there are two options for handling this:
1) Be prepared to receive the maximum incoming message size (4 MB in your example)
2) Work with two SRQs: one will handle 4 KB messages, and the second one will handle 4 MB messages
Thanks
Dotan
If I want to use different message sizes and use 2 SRQs as you suggested, is it enough? How does it work? For example, if I have small buffers of 4k and large buffers of 256k, will sending a buffer of less than 4k consume the small buffers and any buffer larger than 4k consume the large buffers (256k)?
Hi.
What do you mean by "is it enough?"?
To use big and small buffers you should associate the QPs with the small and big SRQs
(i.e. the SRQ that will accept big messages and the one that will accept small messages).
You need to know to which QPs to send the messages,
otherwise the buffers in the Receive Requests won't be enough ...
Thanks
Dotan
Hi Dotan,
I didn't understand this properly: "Work with two SRQs: one will handle 4 KB messages, and the second one will handle 4 MB messages"
How can I decide, from the message, which SRQ should process it?
Hi.
At the creation of the QP, one should choose the SRQ that the QP will be associated with.
Thanks
Dotan
Hello,
I just wanted to clarify how the wr_id posted to an SRQ has to be managed:
is it unique per SRQ?
is it unique per RR of a QP posting to the SRQ?
Can one thread post a RR to the SRQ and another one consume it for a received untagged message?
Hi Elena.
The wr_id of any Work Request (Send or Receive Request) is completely application specific;
it can be any value.
Different threads can use different values or the same value;
it can help the application to understand which Work Request was completed
(if this is needed).
Thanks
Dotan
Thanks Dotan,
so this means the usage of wr_id becomes meaningless in case two threads (two different QPs) post RRs to the same SRQ and every thread manages its own wr_id assignment rules, since a RR posted by one thread can be consumed by another one.
Hi Elena.
I would say it is less useful rather than meaningless in your scenario;
if this is important to you, you can change it ..
For example, since the wr_id is a 64-bit value: use the higher 32 bits as the thread id,
and the lower 32 bits as an identifier.
Just an idea (if wr_id is important for you).
Thanks
Dotan
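The wr_id layout suggested in this reply (thread id in the upper 32 bits, an identifier in the lower 32 bits) could be sketched as follows. The function names are hypothetical, and what counts as a "thread id" is up to the application.

```c
#include <stdint.h>

/* Build a wr_id with thread_id in the upper 32 bits and ident in the lower 32. */
static uint64_t wr_id_pack(uint32_t thread_id, uint32_t ident)
{
    return ((uint64_t)thread_id << 32) | ident;
}

/* Extract the thread id from a packed wr_id. */
static uint32_t wr_id_thread(uint64_t wr_id)
{
    return (uint32_t)(wr_id >> 32);
}

/* Extract the per-thread identifier from a packed wr_id. */
static uint32_t wr_id_ident(uint64_t wr_id)
{
    return (uint32_t)(wr_id & 0xffffffffu);
}
```

Whichever thread polls the Work Completion can then recover who posted the Receive Request from the wr_id it carries.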
Hi Dotan,
You mentioned "At the creation of the QP, one should choose the SRQ that the QP will be associated with". But I still don't get how to choose the SRQ that I want the new QP associated with, could you help me with that?
Hi.
Before creating a QP, you need to create an SRQ (using ibv_create_srq),
and set it in the QP init attributes.
Is this clear now?
Thanks
Dotan