ibv_create_qp()
struct ibv_qp *ibv_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr);
Description
ibv_create_qp() creates a Queue Pair (QP) associated with a Protection Domain.
The user defines the minimum requested attributes of the QP: the number of Work Requests and the number of scatter/gather entries per Work Request for the Send and Receive Queues. The actual attributes can be equal to or higher than those values.
The struct ibv_qp_init_attr describes the requested attributes of the newly created QP.
struct ibv_qp_init_attr {
    void             *qp_context;
    struct ibv_cq    *send_cq;
    struct ibv_cq    *recv_cq;
    struct ibv_srq   *srq;
    struct ibv_qp_cap cap;
    enum ibv_qp_type  qp_type;
    int               sq_sig_all;
};
Here is the full description of struct ibv_qp_init_attr:
qp_context | (optional) User defined value which will be available in qp->qp_context
send_cq | A Completion Queue, returned from ibv_create_cq(), to be associated with the Send Queue
recv_cq | A Completion Queue, returned from ibv_create_cq(), to be associated with the Receive Queue
srq | (optional) A Shared Receive Queue, returned from ibv_create_srq(), that this Queue Pair will be associated with. Otherwise, NULL
cap | Attributes of the Queue Pair size, as described in the table below. Upon a successful Queue Pair creation, this structure will hold the actual Queue Pair attributes
qp_type | Requested Transport Service Type of this QP: IBV_QPT_RC (Reliable Connection), IBV_QPT_UC (Unreliable Connection) or IBV_QPT_UD (Unreliable Datagram)
sq_sig_all | The Signaling level of Work Requests that will be posted to the Send Queue in this QP. If it is set to 1, every Send Request posted to this QP will generate a Work Completion when it ends. If it is set to 0, only Send Requests posted with the IBV_SEND_SIGNALED flag will generate a Work Completion
The InfiniBand specification also defines the Reliable Datagram (RD) QP transport type; however, neither the RDMA software stack nor any RDMA device currently supports it.
send_cq and recv_cq can be the same CQ or different CQs.
RC and UD QPs can always be associated with an SRQ. There are RDMA devices which allow a UC QP to be associated with an SRQ as well; however, there currently isn't any indication for knowing whether the RDMA device supports this.
struct ibv_qp_cap describes the size of the Queue Pair (for both Send and Receive Queues).
struct ibv_qp_cap {
    uint32_t max_send_wr;
    uint32_t max_recv_wr;
    uint32_t max_send_sge;
    uint32_t max_recv_sge;
    uint32_t max_inline_data;
};
Here is the full description of struct ibv_qp_cap:
max_send_wr | The maximum number of outstanding Work Requests that can be posted to the Send Queue in that Queue Pair. Value can be [0..dev_cap.max_qp_wr]. There may be RDMA devices that, for specific transport types, support fewer outstanding Work Requests than the maximum reported value
max_recv_wr | The maximum number of outstanding Work Requests that can be posted to the Receive Queue in that Queue Pair. Value can be [0..dev_cap.max_qp_wr]. There may be RDMA devices that, for specific transport types, support fewer outstanding Work Requests than the maximum reported value. This value is ignored if the Queue Pair is associated with an SRQ
max_send_sge | The maximum number of scatter/gather elements in any Work Request that can be posted to the Send Queue in that Queue Pair. Value can be [0..dev_cap.max_sge]. There may be RDMA devices that, for specific transport types, support fewer scatter/gather elements than the maximum reported value
max_recv_sge | The maximum number of scatter/gather elements in any Work Request that can be posted to the Receive Queue in that Queue Pair. Value can be [0..dev_cap.max_sge]. There may be RDMA devices that, for specific transport types, support fewer scatter/gather elements than the maximum reported value. This value is ignored if the Queue Pair is associated with an SRQ
max_inline_data | The maximum message size (in bytes) that can be posted inline to the Send Queue. 0, if no inline message is requested
Sending inline data is an implementation extension that isn't defined in any RDMA specification: it allows placing the data itself in the Work Request that is posted to the RDMA device (instead of pointing to it with scatter/gather entries). The memory that holds this message doesn't have to be registered. There isn't any verb that reports the maximum message size that can be sent inline in a QP, and only some RDMA devices support this feature. In some RDMA devices, creating a QP will set the value of max_inline_data to the size of messages that can be sent inline using the requested number of scatter/gather elements of the Send Queue. In others, one should explicitly specify the required inline message size before the creation of the QP; for those devices, it is advised to try to create the QP with the required message size and keep decreasing it if the QP creation fails.
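For those devices, a minimal sketch of such a trial-and-error loop might look like this (the helper name and the halving policy are illustrative only; any decreasing policy will do):

/* Hypothetical helper: try to create a QP with the biggest supported
 * inline size, starting from the size the application wants. */
static struct ibv_qp *create_qp_with_inline(struct ibv_pd *pd,
                                            struct ibv_qp_init_attr *attr,
                                            uint32_t wanted_inline)
{
    struct ibv_qp *qp = NULL;

    attr->cap.max_inline_data = wanted_inline;
    while (1) {
        qp = ibv_create_qp(pd, attr);
        if (qp)
            break;                       /* attr->cap now holds the actual values */
        if (attr->cap.max_inline_data == 0)
            break;                       /* failure isn't related to the inline size */
        attr->cap.max_inline_data /= 2;  /* retry with a smaller inline size */
    }

    return qp;
}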
Parameters
Name | Direction | Description |
---|---|---|
pd | in | Protection Domain that was returned from ibv_alloc_pd() |
qp_init_attr | in/out | Requested attributes for the Queue Pair. After the QP creation, it will hold the actual attributes of the QP |
Return Values
Value | Description |
---|---|
QP | A pointer to the newly allocated Queue Pair. This pointer also contains the following fields: qp_num, the number of this Queue Pair, and qp_context, the value that was provided in qp_init_attr.qp_context |
NULL | On failure, errno indicates the failure reason: EINVAL (invalid pd, send_cq, recv_cq, srq or invalid value provided in max_send_wr, max_recv_wr, max_send_sge, max_recv_sge or max_inline_data), ENOMEM (not enough resources to complete this operation) or ENOSYS (QP with this Transport Service Type isn't supported by this RDMA device) |
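For example, a minimal sketch of reporting the failure reason (assuming pd and qp_init_attr were prepared as in the examples below, and that <errno.h>, <stdio.h> and <string.h> are included):

struct ibv_qp *qp;

qp = ibv_create_qp(pd, &qp_init_attr);
if (!qp) {
    /* errno was set by the failed verb */
    fprintf(stderr, "Error, ibv_create_qp() failed: %s (errno=%d)\n",
            strerror(errno), errno);
    return -1;
}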
Examples
1) Create a QP that uses the same CQ for both the Send and Receive Queues, and then destroy it:
struct ibv_pd *pd;
struct ibv_cq *cq;
struct ibv_qp *qp;
struct ibv_qp_init_attr qp_init_attr;

memset(&qp_init_attr, 0, sizeof(qp_init_attr));
qp_init_attr.send_cq = cq;
qp_init_attr.recv_cq = cq;
qp_init_attr.qp_type = IBV_QPT_RC;
qp_init_attr.cap.max_send_wr = 2;
qp_init_attr.cap.max_recv_wr = 2;
qp_init_attr.cap.max_send_sge = 1;
qp_init_attr.cap.max_recv_sge = 1;

qp = ibv_create_qp(pd, &qp_init_attr);
if (!qp) {
    fprintf(stderr, "Error, ibv_create_qp() failed\n");
    return -1;
}

if (ibv_destroy_qp(qp)) {
    fprintf(stderr, "Error, ibv_destroy_qp() failed\n");
    return -1;
}
2) Create a QP with different CQs in the Send and Receive Queues:
struct ibv_pd *pd;
struct ibv_cq *send_cq;
struct ibv_cq *recv_cq;
struct ibv_qp *qp;
struct ibv_qp_init_attr qp_init_attr;

memset(&qp_init_attr, 0, sizeof(qp_init_attr));
qp_init_attr.send_cq = send_cq;
qp_init_attr.recv_cq = recv_cq;
qp_init_attr.qp_type = IBV_QPT_RC;
qp_init_attr.cap.max_send_wr = 2;
qp_init_attr.cap.max_recv_wr = 2;
qp_init_attr.cap.max_send_sge = 1;
qp_init_attr.cap.max_recv_sge = 1;

qp = ibv_create_qp(pd, &qp_init_attr);
if (!qp) {
    fprintf(stderr, "Error, ibv_create_qp() failed\n");
    return -1;
}
3) Create a QP, which is associated with an SRQ:
struct ibv_pd *pd;
struct ibv_cq *cq;
struct ibv_srq *srq;
struct ibv_qp *qp;
struct ibv_qp_init_attr qp_init_attr;

memset(&qp_init_attr, 0, sizeof(qp_init_attr));
qp_init_attr.send_cq = cq;
qp_init_attr.recv_cq = cq;
qp_init_attr.srq = srq;
qp_init_attr.qp_type = IBV_QPT_RC;
qp_init_attr.cap.max_send_wr = 2;
qp_init_attr.cap.max_send_sge = 1;

qp = ibv_create_qp(pd, &qp_init_attr);
if (!qp) {
    fprintf(stderr, "Error, ibv_create_qp() failed\n");
    return -1;
}
4) Create a QP that supports inline messages:
struct ibv_pd *pd;
struct ibv_cq *cq;
struct ibv_qp *qp;
struct ibv_qp_init_attr qp_init_attr;

memset(&qp_init_attr, 0, sizeof(qp_init_attr));
qp_init_attr.send_cq = cq;
qp_init_attr.recv_cq = cq;
qp_init_attr.qp_type = IBV_QPT_RC;
qp_init_attr.cap.max_send_wr = 2;
qp_init_attr.cap.max_recv_wr = 2;
qp_init_attr.cap.max_send_sge = 1;
qp_init_attr.cap.max_recv_sge = 1;
qp_init_attr.cap.max_inline_data = 512;

qp = ibv_create_qp(pd, &qp_init_attr);
if (!qp) {
    fprintf(stderr, "Error, ibv_create_qp() failed\n");
    return -1;
}
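To actually send a message inline on such a QP (once it has been connected and moved to the RTS state), one posts a Send Request with the IBV_SEND_INLINE flag set. A minimal sketch, assuming buf holds a message smaller than the actual max_inline_data value and that the remote side posted a matching Receive Request:

struct ibv_sge sge;
struct ibv_send_wr wr;
struct ibv_send_wr *bad_wr;
char buf[128];            /* doesn't have to be registered memory */

memset(&sge, 0, sizeof(sge));
sge.addr = (uintptr_t)buf;
sge.length = sizeof(buf);
sge.lkey = 0;             /* typically ignored for inline sends */

memset(&wr, 0, sizeof(wr));
wr.wr_id = 1;
wr.sg_list = &sge;
wr.num_sge = 1;
wr.opcode = IBV_WR_SEND;
wr.send_flags = IBV_SEND_INLINE | IBV_SEND_SIGNALED;

if (ibv_post_send(qp, &wr, &bad_wr)) {
    fprintf(stderr, "Error, ibv_post_send() failed\n");
    return -1;
}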
FAQs
What is a QP good for, anyway?
A QP is the actual object that sends and receives data in the RDMA architecture (something like a socket).
Are socket and QP equivalent?
Not exactly. A socket is an abstraction, which is maintained by the network stack and doesn't have a physical resource behind it. A QP is a resource of an RDMA device, and a QP number can be used by only one process at the same time (similar to a socket that is associated with a specific TCP or UDP port number).
Can I associate several QPs with the same SRQ?
Yes, you can.
Which QP Transport Types can be associated with an SRQ?
RC and UD QPs can be associated with an SRQ on all RDMA devices. On some RDMA devices, you can associate a UC QP with an SRQ as well.
Do I need to set the Receive Queue attributes if I associate a QP with an SRQ?
No, you don't have to do it. The Receive Queue attributes are completely ignored if the QP is being associated with an SRQ.
Can I use the same CQ for both the Send and Receive Queues?
Yes, you can.
Can I use one CQ in the Send Queue and another CQ in the Receive Queue?
Yes, you can.
How can I know what is the maximum message size that can be sent inline in a QP?
There is no way to query this information directly; you have to find it by trial and error.
I created a QP with transport type X and the QP was created successfully. I tried to create a QP with transport type Y and the QP creation failed. What happened?
The values in dev_cap.max_sge and dev_cap.max_qp_wr report the numbers of scatter/gather entries and Work Requests that are supported by any QP transport type. However, for a specific RDMA device, there may be QP transport types that cannot be created with those maximum values. Using trial and error, one should find the right attributes for this specific RDMA device.
The device capabilities reported that max_qp_wr/max_sge is X, but when I tried to create a QP with those attributes it failed. What happened?
The values in dev_cap.max_sge and dev_cap.max_qp_wr report the maximum numbers of scatter/gather entries and Work Requests that are supported by any Work Queue (Send or Receive). However, for a specific RDMA device, there may be other considerations for the Send or Receive Queue that prevent a QP from being created with those maximum values. Using trial and error, one should find the right attributes for this specific RDMA device.
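For example, a minimal sketch of such a trial-and-error probe for the Send Queue depth (the helper name, the step size and the fixed Receive Queue attributes are illustrative only; ctx is an already-opened device context):

/* Find a Send Queue depth that this device/transport type actually accepts,
 * starting from the device-reported maximum. */
static uint32_t probe_max_send_wr(struct ibv_context *ctx, struct ibv_pd *pd,
                                  struct ibv_cq *cq, enum ibv_qp_type type)
{
    struct ibv_device_attr dev_attr;
    struct ibv_qp_init_attr attr;
    struct ibv_qp *qp;
    uint32_t depth;

    if (ibv_query_device(ctx, &dev_attr))
        return 0;

    depth = dev_attr.max_qp_wr;
    while (depth) {
        memset(&attr, 0, sizeof(attr));
        attr.send_cq = cq;
        attr.recv_cq = cq;
        attr.qp_type = type;
        attr.cap.max_send_wr = depth;
        attr.cap.max_recv_wr = 1;
        attr.cap.max_send_sge = 1;
        attr.cap.max_recv_sge = 1;

        qp = ibv_create_qp(pd, &attr);
        if (qp) {
            ibv_destroy_qp(qp);   /* the probe QP isn't needed anymore */
            break;
        }
        depth--;                  /* or decrease by a bigger delta */
    }

    return depth;
}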
Comments
Tell us what you think.
How do you associate each QP with a CQ?
When creating a QP, one fills the structure ibv_qp_init_attr:
* The field send_cq is the CQ that is associated with the QP's Send Queue.
* The field recv_cq is the CQ that is associated with the QP's Receive Queue.
You can use the same CQ for both Send and Receive Queues or use different CQs.
When you call ibv_create_qp(), the newly created QP is associated with the CQs that you specified.
Thanks
Dotan
Hi Dotan!
Can I use the same CQ for different QPs?
Unfortunately I can't create a QP with a Shared Receive Queue in Windows OFED, but this works in the Mellanox OFED (WinOF, which I can't use)
Hi Max.
Yes, you can use the same CQ for different QPs.
Thanks
Dotan
Dotan, thanks for the info. I'm getting an ENOMEM on the third call to ibv_create_qp() (in a server program; one context for each client) with the following parameters: 1 page size for the memory region, shared send & recv CQ of depth 10, max send/recv sge = 1. strace indicates the create QP verb failing on a write to the verbs device with ENOMEM. All the other settings are defaults. Any pointers on how to proceed will be much appreciated.
Thanks
Sara
A quick update: if I run it as superuser I don't hit this issue. But "ulimit -a" shows the same values for both user & superuser. What could be the difference between the two scenarios?
Thanks
Sara
Please ignore my comments :)
The /etc/security/limits.conf values were not properly propagated due to incorrect pam config. I fixed that and now it works.
This is great that you managed to solve it, thanks for the update.
When I finish covering all of the verbs descriptions,
I plan to write about the memory locking issues...
I hope that you find this blog useful..
Dotan
I'm able to create a QP with qp_type IBV_QPT_RC. However, creating a QP with type IBV_QPT_UD and with the same parameters that I used for IBV_QPT_RC returns NULL with an "Invalid argument" error. I could not figure out which parameter could be invalid; any suggestions? I'm trying to create a UD QP.
Thanks,
Jeff
Hi.
If you specify the attributes that you are using for creating the QP,
maybe I'll be able to provide a tip on this...
There may be some HCAs that have different attributes for RC and UD QPs,
so decreasing the number of s/g entries, the number of WRs or the inline data size may fix this issue.
Thanks
Dotan
Hi,
I'm using UD communication. At the "server" side I call ibv_create_qp() every time that a client "connects" (with quotes because there is no connection in the traditional sense). However, since there is no real connection, how can I know that the client disconnected, in order to release the QP created with ibv_create_qp()?
Thank you very much for maintaining this great, and resourceful, website!
Hi.
First of all, thanks for the compliments, I'm trying to do my best
:)
In order to know when to destroy the QP, you have several options:
1) Use the CM libraries (libibcm/librdmacm) for connection establishment and teardown
2) Handle this within your application: maintain a "keep alive" messages and/or "leaving" message
The question is, do you really need several QPs?
You can use the same QP to handle all the communications...
(only a different Address Handle is needed per destination)
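For example, here is a minimal sketch of creating a per-destination Address Handle for a shared UD QP (remote_lid is assumed to have been learned out-of-band or from a received Work Completion; pd is the same Protection Domain the QP uses):

struct ibv_ah_attr ah_attr;
struct ibv_ah *ah;

memset(&ah_attr, 0, sizeof(ah_attr));
ah_attr.dlid          = remote_lid;   /* destination LID, learned out-of-band */
ah_attr.sl            = 0;
ah_attr.src_path_bits = 0;
ah_attr.port_num      = 1;            /* local port this QP uses */

ah = ibv_create_ah(pd, &ah_attr);
if (!ah) {
    fprintf(stderr, "Error, ibv_create_ah() failed\n");
    return -1;
}
/* ah is later set in wr.wr.ud.ah of every Send Request to this destination */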
I hope that my answer helped you..
Thanks
Dotan
Thank you for the answer. It indeed helps.
I guess that reusing the QP is the easiest solution. But that brings me two doubts:
- Does a single QP scale well?
- Is it expensive to create&destroy an address handle every time the server receives a message?
Thanks
Hi Lluis.
Those are good questions:
* The question is whether one UD QP will scale to your needs
(IMHO, one UD QP can't reach full line rate, but I don't know what your application's needs are)
* Creating and destroying an Address Handle is relatively cheap compared to creating and destroying a QP
(an AH can be created without a context switch - depending on the low-level driver -
and it has a small footprint compared to a QP)
Thanks
Dotan
Hi, Dotan,
I've got a question regarding the max_send_wr/max_recv_wr QP attributes. While max_recv_wr is clear to me (I can pre-post as many WRs to the Receive Queue as were specified with this attribute value), there is still some ambiguity with the max_send_wr parameter. Suppose, for example, I set max_send_wr=5, and I'm doing ibv_post_send() calls in a loop (each time posting a single WR). Is it correct to say that proper code has to wait for 5 completions after each 5 WRs posted (assuming all are signaled)? Or are the Work Requests consumed when they are posted? Will the code work if max_send_wr=1 and I only check for Send Completion Queue overflow (and not the Send QP depth itself)? Thanks in advance for your help!
Hi.
A Send Request (like any other Work Request) is considered outstanding until there is a Work Completion for it, or for a Send Request that was posted after it
(if you are using unsignaled Send Requests).
The attribute max_send_wr specifies how many Send Requests can be outstanding.
So, if all Send Requests are signaled - you must poll for the corresponding Work Completions.
If, for example, you set 5 in max_send_wr (assuming that the low-level driver didn't increase this value)
and you posted 5 Send Requests, posting a 6th Send Request will fail, and you'll be able to post more Send Requests
only after at least one Work Completion (generated by a Send Request that ended) is polled from the Completion Queue.
You can look at it this way: polling the Work Completion of a completed Send Request consumes that Send Request from the Send Queue.
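Here is a minimal sketch of that accounting (assuming every Send Request is posted signaled; cq, qp, wr and bad_wr are assumed to be prepared elsewhere, and the counter is illustrative only):

int outstanding = 0;            /* signaled Send Requests not yet completed */
const int max_send_wr = 5;      /* the depth the QP was created with */
struct ibv_wc wc;
int n;

/* before posting another Send Request */
while (outstanding == max_send_wr) {
    /* the Send Queue is full: reclaim room by polling the Send CQ */
    n = ibv_poll_cq(cq, 1, &wc);
    if (n < 0)
        return -1;              /* polling error */
    if (n == 1) {
        if (wc.status != IBV_WC_SUCCESS)
            return -1;          /* completed with error */
        outstanding--;
    }
}

if (ibv_post_send(qp, &wr, &bad_wr))
    return -1;
outstanding++;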
Is it clearer now?
Thanks
Dotan
Oh, I see now. Thanks a lot! BTW, does the same hold for SEND_INLINE? I mean there is no additional semantics with respect to completion right (only buffering)?
Yes.
SEND_INLINE is yet another feature of ibv_post_send(), and the semantics that I described are relevant to it as well.
Thanks
Dotan
Ok, I see. Thanks again for doing a great job with this blog! It's been extremely helpful for me!
hi,
I am getting errno 12 while trying to create the Queue Pair, and when I reduce the size of max_send_wr there are no issues in creating the Queue Pair. Earlier I used the maximum device limit, which I found via devattr->max_qp_wr. Is the issue because of the reason that you mention above?
The maximum number of outstanding Work Requests that can be posted to the Send Queue in that Queue Pair. Value can be [0..dev_cap.max_qp_wr]. There may be RDMA devices that for specific transport types may support less outstanding Work Requests than the maximum reported value.
And if that is the reason, is there any other way to find out the maximum Send Queue limit?
Thanks
Hi.
I have some questions to be able to answer:
1) Under which user name are you working?
2) What is the value of 'ulimit -l'?
3) Which RDMA device are you using?
thanks
Dotan
Hi,
ulimit -l is unlimited.
$ ibv_devinfo -v
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.7.710
node_guid: f04d:a290:9779:10e0
sys_image_guid: f04d:a290:9779:10e3
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id: DEL08F0120009
phys_port_cnt: 2
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffe00
max_qp: 65456
max_qp_wr: 16384
For the Receive Queue it allows 16384, but for the Send Queue a maximum of 16351 is allowed.
Thanks,
I added the following note to the Q&A at this post:
The value in dev_cap.max_sge and dev_cap.max_qp_wr reports the maximum supported values of scatter/gather entries and Work Requests that are supported by any Work Queue (Send and Receive). However, for a specific RDMA device, there may be other considerations for the Send or Receive Queue that prevent a QP to be created with those maximum values. Using trial and error one should get the right attributes for this specific RDMA device.
This answers your question...
Thanks
Dotan
Hi,
thanks for the reply, but it's applicable only if we always use the same machine. Actually, in my case the software has to run on the client side, and there we can't use the trial-and-error method. So is there any alternative, e.g. is the maximum Send Queue depth guaranteed to be half of max_qp_wr?
Sorry, I can't give you a recipe that will work for all RDMA devices;
try using the reported values minus a delta, and increase the delta until it fits all the devices that you are working with...
Thanks
Dotan
Hi, Dotan:
As the QP and CQ consume resources in the RNIC: if dev_cap.max_qp_wr = 1024, what is the meaning of max_send_wr/max_recv_wr in one connection? Do all the connections share the same dev_cap.max_qp_wr? Do more connections (one QP per connection) mean that max_send_wr will be smaller, or can every Queue Pair's max_send_wr be as large as dev_cap.max_qp_wr, no matter how many Queue Pairs there are?
Hi.
Every QP can have a Work Queue whose depth is at most max_qp_wr.
How the RDMA device handles it depends on its internal implementation.
For example, if there are 100 QPs, each of which has 1024 WRs,
and the application, in a magical way, posts 1024 SRs to every one of them,
how those SRs will be processed depends on the RDMA device.
You only mentioned the number-of-messages aspect, but what if one QP sends
several 2 GB messages and the rest send only 1 B messages?
Bottom line: the scheduling policy of Send Request processing is an internal attribute of the RDMA device.
Thanks
Dotan
Thanks! So this means that posting 1024 SRs can be blocked or go into an error state?
Does this also apply to the Completion Queue?
If one fills the Send Queue with 1024 unsignaled Send Requests, no more Send Requests can be posted to this Send Queue.
Theoretically, nothing should happen to the Completion Queue.
However, I can think about implementations that may add Work Completions to the Completion Queue.
One should avoid getting into this state, since there isn't any indication of when the Send Request processing has ended.
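A common way to avoid it (just a sketch; the batch size is arbitrary and must be smaller than the Send Queue depth) is to post a signaled Send Request every N unsignaled ones, so that its Work Completion reclaims the whole batch:

#define SIGNAL_BATCH 64        /* arbitrary; must be smaller than the Send Queue depth */
static unsigned posted;        /* Send Requests posted since the last signaled one */

wr.send_flags = 0;             /* unsignaled by default (QP created with sq_sig_all = 0) */
if (++posted == SIGNAL_BATCH) {
    wr.send_flags |= IBV_SEND_SIGNALED;
    posted = 0;
    /* the Work Completion of this SR also releases the preceding unsignaled SRs */
}

if (ibv_post_send(qp, &wr, &bad_wr))
    return -1;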
Thanks
Dotan
Thanks for your answer!
I have encountered a very strange problem. I have a cluster of 6 machines, and each one may use RDMA Read to read data from the other machines at the same time. I find that an RDMA Read takes about 1 s to fetch data whose size is about 10 MB. My environment is 10 Gb/s and uses RoCE; PFC is configured correctly using priority 3.
This is too long for an RDMA Read, but I cannot do anything about it, because all of this is done by the RNIC. My max_send_wr && max_recv_wr is 200, max_send_sge && max_recv_sge is 4. Is there anything wrong with my configuration?
Hi.
Did you execute the RDMA Read/Write benchmarks on your setup?
This will help you understand if you have a configuration problem.
Maybe there is a retry and this causes a delay.
Which values did you configure for the retry in the QP?
(I mean the 'timeout' attribute).
Thanks
Dotan
Thanks!
The retry is 6; maybe I can set this lower to demonstrate whether it is caused by retries?
I am not sure whether initiator_depth and responder_resources have an effect on the RDMA Read, because I set both of them to 2.
Hi.
1) I would set the retry_count value to 1, to see if there are errors.
2) If you want more parallel RDMA Reads to be initiated,
you need to increase the initiator_depth (and make sure the responder_resources on the other side can accept this value).
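For example, with librdmacm a minimal sketch of setting those values at connection time looks like this (the numbers are illustrative; cm_id is assumed to be an already-resolved rdma_cm_id):

#include <rdma/rdma_cma.h>

struct rdma_conn_param conn_param;

memset(&conn_param, 0, sizeof(conn_param));
conn_param.initiator_depth     = 16;  /* outstanding RDMA Reads/Atomics this side may initiate */
conn_param.responder_resources = 16;  /* incoming RDMA Reads/Atomics this side can serve */
conn_param.retry_count         = 7;
conn_param.rnr_retry_count     = 7;

if (rdma_connect(cm_id, &conn_param)) {
    perror("rdma_connect");
    return -1;
}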
Thanks
Dotan
Yes, I use the async mode; there are some parallel RDMA Reads.
I wonder whether there is any detailed data about how the values of initiator_depth & responder_resources impact parallel RDMA Reads? I found many resources, and few of them talk about these two values; max_send_wr/max_recv_wr is discussed much more.
Hi.
I don't know if there is any detailed information on this; try the InfiniBand spec.
Anyway, those values specify the number of in-flight RDMA Read/Atomic operations that a QP can handle in parallel as a requestor/responder.
Thanks
Dotan
Thanks, Dotan, I will try to find out.
Hi Dotan,
I have a question. When we create a qp using the RDMA verbs, are the queues physically created in the RNIC memory (RNIC DDR) or the HOST system memory? Thanks in advance for your help!
Hi and welcome to the RDMA world
:)
When calling RDMA verbs, actual HW resources are created.
Their location (RNIC memory or host memory) depends on the RDMA device technology (it is device specific):
* Some of them will create the resources in the attached memory (if such memory exists)
* Some of them will create the resources in the host memory
You should ask your HW vendor how his device behaves (in this respect).
Thanks
Dotan
Hi!
Firstly, I'd like to thank you for this great overview. When I try to create a *lot* of QPs (>100k), I run into ENOMEM errors. Is there some way to assign more memory to the HCA or is there also some HW limitation to the number of available QPs?
Hi Mandrake.
I can't answer without knowing which HW you are using.
In general, RDMA supports up to 16M QPs (since there are 24 bits for QP numbers).
Possible solutions/ideas:
* Maybe you need to load the driver with different parameters to allow support for many QPs
* Maybe the problem is lack of memory in your host
Thanks
Dotan
Hi Dotan. Thanks a lot for your answer. We are using Mellanox Connect-IB cards running with the mlx5 driver. I could not find any module options for the kernel module. Host memory should be no problem, as the machines have 32GB of which only 4 are in use.
May I ask where the 24 bits for QP numbers are specified? I have a hard time finding any reliable information about the hardware. Even identifying the exact HCA seems to be non-trivial, as "Connect-IB" and "MT27600" seem to refer to a variety of cards.
Hi.
The 24 bits are coming from the RDMA spec headers, for example: look at the BTH, it has 24 bits for encoding the destination QP number.
Identifying a PCI device should be easy, using the PCI.ids repository: https://pci-ids.ucw.cz/
Thanks
Dotan
Please help me with this. When I use ib_create_qp() it gives me an "Invalid argument" error. The code which I am using is in this stackoverflow page: http://stackoverflow.com/questions/34788781/cannot-create-queue-pair-with-ib-create-qp. All other functions, like creating a CQ, work fine.
Hi Mark.
Which device are you using?
Thanks
Dotan
Hi Dotan,
Is there any way to know when one side of a queue pair goes down without having to constantly send "keep alive" messages? For example, if I have client and server applications running and the server crashes, is there any way for the client to know that the remote side of the connection is down before trying to send to it? I guess I'm looking for something similar to a TCP RST that could be used to automatically re-establish a connection, perhaps at the subnet manager level. Any advice would be greatly appreciated!
Thanks,
David
Hi.
If you are using the QPs directly (i.e. without CM), then the answer is: No.
If you are using CM for connecting and managing the QP connection,
you should get an event when the remote QP goes down.
Thanks
Dotan
Hi Dotan,
What is the total number of outstanding RDMA Read/Write Requests that can be performed simultaneously? I have a ConnectX-4 card and find that there is a problem when I go to a queue depth of more than 64. Is there any limit?
Thanking You,
Param.
Hi.
RDMA Write messages don't require any special resources, but RDMA Reads do, so:
* The total number of outstanding RDMA Write messages is limited in the requestor by HCA_CAP.max_qp_wr
* The total number of outstanding RDMA Read messages is limited in the requestor by HCA_CAP.max_qp_init_rd_atom
* The total number of outstanding RDMA Read messages is limited in the responder by HCA_CAP.max_qp_rd_atom
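For example, a minimal sketch of reading those limits with ibv_query_device() (ctx is an assumed, already-opened device context; field names follow struct ibv_device_attr):

struct ibv_device_attr dev_attr;

if (ibv_query_device(ctx, &dev_attr)) {
    fprintf(stderr, "Error, ibv_query_device() failed\n");
    return -1;
}

printf("max WRs per Work Queue:                  %d\n", dev_attr.max_qp_wr);
printf("max outstanding RDMA Reads as requestor: %d\n", dev_attr.max_qp_init_rd_atom);
printf("max outstanding RDMA Reads as responder: %d\n", dev_attr.max_qp_rd_atom);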
Thanks
Dotan
Hi Dotan, is there a way to attribute a queue pair to a specific traffic class?
Hi.
What do you mean by "traffic class"?
This information exists in the IPv6/GRH header;
Do you refer to Infiniband or RoCE?
Thanks
Dotan
Hi Dotan,
If I send several IBV_WR_RDMA_WRITEs and then I send an IBV_WR_RDMA_WRITE_WITH_IMM, when the completion of the IBV_WR_RDMA_WRITE_WITH_IMM appears at the destination server, is there a guarantee that all the previously sent IBV_WR_RDMA_WRITEs were written to the RAM of the destination and hence completed?
Hi.
This is a great question, but I don't consider myself an expert in this area;
however, let's try to analyze it.
Let's assume that all the messages are sent from one QP to a destination QP:
All the RDMA Writes are accepted by the destination QP and now
the last RDMA Write is accepted as well, and a Completion is generated.
So, the content of all the previous RDMA Writes plus the last incoming one is DMA'ed to the RAM,
and only then the information that there is a Completion is DMA'ed to the Completion Queue memory.
The big question is: "Was the content of the RDMA Write messages actually written before the DMA of the Completion?"
I have a feeling that the answer is "it depends on the runtime memory ordering".
However, since, AFAIK, there is a read barrier when one polls for a Completion, I would expect all the DMA operations to be finished,
so the RAM should contain the data of the incoming messages before the knowledge that there is a Work Completion becomes available.
As I said, I'm not a memory or PCI sub-system expert, but those are my 2 cents.
Thanks
Dotan
Thanks
Dotan
Yes, the GRH has a field referred to as traffic class. Mellanox defines it as "Traffic class (or class of service) is a group of all flows that receive the same service characteristics (e.g. buffer size, scheduling). It is possible that some flow with different priorities will be mapped to the same traffic class." link:https://community.mellanox.com/docs/DOC-2022
I am interested in RoCEv1. Is it only useful in switches, or can we also divide traffic between 2 HCAs using the traffic class (by associating a QP with a specific traffic class, I suppose)?
Hi.
As a former employee of Mellanox, I don't want to respond to things that it publishes;
however, AFAIK the traffic class should hint at the expected type of service,
and in theory this can affect how the packet is handled in switches, routers and adapters.
Those components, if this is supported, can have different buffers for different types of service,
thus providing different handling for different classes.
In InfiniBand there is an SL/VL mechanism for this; I believe that the traffic class is the equivalent mechanism for IPv4/6 packets.
Thanks
Dotan
Hi Dotan,
I found this comment interesting and somewhat related to my particular problem.
I am interfacing a CX5 ASIC to an FPGA. I need to create the memory region and queue pairs in FPGA physical memory, which I can map into Linux space. I see that I can use PA-MR for the data buffer, but how do I create a QP in physical memory?
Hi.
This is a vendor-specific question, so I won't answer your specific question; I'll give a more general answer.
The data buffers can be located anywhere in the system (even in memory that is attached to a PCI card), as long as they can be mapped by Linux.
To create a QP, one needs to register this QP number and provide memory for the Work Queues.
AFAIK, the caller cannot control the origin of the Work Queue memory; this would have to be supported by the low-level driver.
I hope that I answered.
Thanks
Dotan
Hi Dotan,
When we create a QP using the RDMA verbs, do we get the base address of the QP somehow? My question is related to how the hardware knows where to read the posted WRs of that QP from.
Hi.
When one creates RDMA resources (CQ, QP, SRQ, etc.),
the internal buffers of those resources aren't (easily) exposed to the user.
Thanks
Dotan
Hi Dotan,
What could cause a failure to create a QP?
Hi.
QP creation can fail if there aren't enough resources (no more QPs, or not enough memory in the host),
or
because of a bad configuration of the ulimit for locked memory (the number of memory pages that may be locked).
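For example, a minimal sketch of checking (and, when permitted, raising) the locked-memory limit from within the application, using the standard getrlimit()/setrlimit() calls:

#include <stdio.h>
#include <sys/resource.h>

struct rlimit rl;

if (getrlimit(RLIMIT_MEMLOCK, &rl)) {
    perror("getrlimit");
    return -1;
}
printf("locked memory limit: soft=%lu, hard=%lu\n",
       (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

/* the soft limit can be raised up to the hard limit without extra privileges */
rl.rlim_cur = rl.rlim_max;
if (setrlimit(RLIMIT_MEMLOCK, &rl))
    perror("setrlimit");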
Thanks
Dotan