ibv_get_async_event()
int ibv_get_async_event(struct ibv_context *context, struct ibv_async_event *event);
Description
ibv_get_async_event() reads the next asynchronous event for an RDMA device context context.
After ibv_open_device() is called, all of the asynchronous events are enqueued to the returned context, and each call to ibv_get_async_event() reads one of them, in the order in which they were generated. Even if ibv_get_async_event() is called a long time after the events were generated, it still reads the oldest events first. Unfortunately, the events carry no timestamp, so the user can't know when they occurred.
By default, ibv_get_async_event() is a blocking function: if there isn't any asynchronous event to read, it waits until the next event is generated. It can be useful to have a dedicated thread that waits for the next event to occur. However, the events can also be read in a non-blocking way: configure the file descriptor of the event file in the device context context to be non-blocking using fcntl(), then monitor this file descriptor using poll()/epoll()/select() to determine whether there is an event waiting to be read, and only then call ibv_get_async_event(). There is an example of how to do this in this post.
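The non-blocking setup described above can be reduced to two small POSIX helpers. This is a minimal sketch, not part of libibverbs: set_nonblocking() and wait_readable() are hypothetical names, and they work on any file descriptor. With an RDMA device context one would pass ctx->async_fd.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>

/* Switch a file descriptor to non-blocking mode.
 * Returns 0 on success, -1 on failure (errno is set by fcntl()). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ret;

    do {
        /* a signal restarts the wait with the full timeout */
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    /* poll() reported something other than readable data (error/hangup) */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

When wait_readable(ctx->async_fd, 100) returns 1, a following call to ibv_get_async_event() is expected to find an event without blocking.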
Calling ibv_get_async_event() is atomic: even if it is called from more than one thread, it is guaranteed that the same event won't be read by more than one thread.
Each event that was received using ibv_get_async_event() must be acknowledged using ibv_ack_async_event().
Here is the full description of struct ibv_async_event:
Name | Description |
---|---|
element | A union of several fields, only one of which is valid, depending on the event type: CQ events: element.cq is valid; QP events: element.qp is valid; SRQ events: element.srq is valid; Port events: element.port_num is valid; RDMA device events: no field is valid |
event_type | Enumerated value which describes the type of the event |
Here is a full description of the possible events:
QP events
Here is the description of the affiliated events that may occur for QPs. For these events, the field event->element.qp contains the handle of the QP that got the asynchronous event. These events are generated only in the context that this QP belongs to.
IBV_EVENT_COMM_EST
A QP whose state is IBV_QPS_RTR received the first packet in its Receive Queue, and the packet was processed without any error.
This event is mainly relevant to connection-oriented QPs, i.e. RC and UC QPs. It may be generated for UD QPs as well; this is driver implementation specific.
IBV_EVENT_SQ_DRAINED
A QP whose state was changed from IBV_QPS_RTS to IBV_QPS_SQD completed sending all of the outstanding messages that were in progress in its Send Queue when the state change was requested. For RC QPs, this means that all of those messages received acknowledgments, if applicable.
Most of the time, this event is generated when the (internal) QP state changes from SQD.draining to SQD.drained. However, this event may also be generated if the transition to the state IBV_QPS_SQD was aborted because of a transition (either by the RDMA device or by the user) into the IBV_QPS_SQE, IBV_QPS_ERR or IBV_QPS_RESET QP states.
After this event, if the QP is in the IBV_QPS_SQD state, it is safe for the user to start modifying the Send Queue attributes, since no send messages are in progress.
IBV_EVENT_PATH_MIG
Indicates the connection has migrated to the alternate path. This event is relevant only to connection oriented QPs, i.e. RC and UC QPs.
This means that the alternate path attributes are now being used as the primary path attributes. If another alternate path needs to be loaded, the user can now set its attributes.
IBV_EVENT_QP_LAST_WQE_REACHED
A QP, which is associated with an SRQ, was transitioned to the IBV_QPS_ERR state, either automatically by the RDMA device or explicitly by the user, and one of the following occurred:
- A completion with error was generated for the last WQE
- The QP transitioned to the IBV_QPS_ERR state and there are no more WQEs on the Receive Queue of that QP
This event actually means that this QP won't consume WQEs from the SRQ anymore.
If a QP experienced an error and this event wasn't generated, the user must destroy all of the QPs that are associated with this SRQ and the SRQ itself in order to reclaim all of the WQEs associated with that QP.
IBV_EVENT_QP_FATAL
A QP experienced an error that prevents the generation of completions while accessing or processing the Work Queue, either Send or Receive Queue.
If the problem that caused this event is in the CQ of that Work Queue, the appropriate CQ will get the IBV_EVENT_CQ_ERR event too.
IBV_EVENT_QP_REQ_ERR
The transport layer of the RDMA device detected a transport error violation on the responder side. This error may be one of the following:
- Unsupported or reserved opcode
- Out of sequence opcode
These errors are rare and may happen when there are problems in the subnet or when an RDMA device sends illegal packets.
When this happens, the QP is automatically transitioned to the IBV_QPS_ERR state by the RDMA device.
This event is relevant only to RC QPs.
IBV_EVENT_QP_ACCESS_ERR
The transport layer of the RDMA device detected a request error violation on the responder side. This error may be one of the following:
- Misaligned atomic request
- Too many RDMA Read or Atomic requests
- R_Key violation
- Length errors without immediate data
These errors usually happen due to bugs in the user code.
When this happens, the QP is automatically transitioned to the IBV_QPS_ERR state by the RDMA device.
This event is relevant only to RC QPs.
IBV_EVENT_PATH_MIG_ERR
A QP that has alternate path attributes loaded tried to perform a path migration, triggered either by the RDMA device or explicitly by the user, and an error prevented it from moving to that alternate path.
This error usually happens when the alternate path attributes on the two sides aren't consistent.
CQ events
Here is the description of the affiliated events that may occur for CQs. For these events, the field event->element.cq contains the handle of the CQ that got the asynchronous event. These events are generated only in the context that this CQ belongs to.
IBV_EVENT_CQ_ERR
An error occurred when writing a completion to the CQ. This event may occur when there is a protection error (a rare condition) or when there is a CQ overrun (most likely).
When the CQ has an error, it isn't guaranteed that completions can still be pulled from that CQ. All of the QPs that are associated with this CQ, either in their RQ or in their SQ, will get the IBV_EVENT_QP_FATAL event too.
SRQ events
Here is the description of the affiliated events that may occur for SRQs. For these events, the field event->element.srq contains the handle of the SRQ that got the asynchronous event. These events are generated only in the context that this SRQ belongs to.
IBV_EVENT_SRQ_LIMIT_REACHED
The number of Receive Requests (RRs) in an SRQ that was armed dropped below the limit value of that SRQ. When this event is generated, the limit value of the SRQ is set to zero.
Most likely, when this event happens, the user will post more RRs to that SRQ and rearm it.
IBV_EVENT_SRQ_ERR
An error occurred that prevents the RDMA device from dequeuing RRs from that SRQ and from reporting receive completions.
If an SRQ experiences an error, all of the QPs that are associated with this SRQ will be transitioned to the IBV_QPS_ERR state, and the IBV_EVENT_QP_FATAL asynchronous event will be generated for them.
Port events
Here is the description of the unaffiliated events that may occur for RDMA device ports. For these events, the field event->element.port_num contains the number of the port that got the asynchronous event. These events are generated for all of the contexts that use the RDMA device whose port got the event.
IBV_EVENT_PORT_ACTIVE
The link became active and is now available to send/receive packets.
The port_attr.state was in one of the following states: IBV_PORT_DOWN, IBV_PORT_INIT, IBV_PORT_ARMED, and it moved to one of the following states: IBV_PORT_ACTIVE or IBV_PORT_ACTIVE_DEFER. This can happen when the SM configures the port.
This event will be generated by the device only if IBV_DEVICE_PORT_ACTIVE_EVENT is set in dev_cap.device_cap_flags.
IBV_EVENT_LID_CHANGE
The LID was changed on a port by the SM. If this is not the first time that the SM configures the port LID, this may indicate that there is a new SM in the subnet, or that the SM is reconfiguring the subnet. QPs which send/receive data may experience connection failures (if the LIDs in the subnet were changed).
IBV_EVENT_PKEY_CHANGE
The P_Key table was changed on a port by the SM. Since QPs use P_Key table indexes rather than absolute values, it is suggested that the client check that the P_Key indexes which its QPs use weren't changed.
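The suggested check can be sketched as a lookup over a snapshot of the P_Key table. The helpers below are hypothetical and contain no verbs calls; they assume the caller filled table[] by calling ibv_query_pkey() for every index of the port and converted each value to host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index that holds pkey in the snapshot, or -1 if it is absent. */
static int find_pkey_index(const uint16_t *table, size_t len, uint16_t pkey)
{
    for (size_t i = 0; i < len; i++) {
        if (table[i] == pkey)
            return (int)i;
    }
    return -1;
}

/* Return 1 if the index a QP uses still holds the expected P_Key, else 0. */
static int pkey_index_still_valid(const uint16_t *table, size_t len,
                                  size_t index, uint16_t expected)
{
    return index < len && table[index] == expected;
}
```

If pkey_index_still_valid() returns 0 after an IBV_EVENT_PKEY_CHANGE event, find_pkey_index() tells whether the P_Key moved to another index or disappeared from the table.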
IBV_EVENT_GID_CHANGE
The GID table was changed on a port by the SM. Since QPs use GID table indexes rather than absolute values (as the source GID), it is suggested that the client check that the GID indexes which its QPs use weren't changed.
IBV_EVENT_SM_CHANGE
There is a new SM in the subnet to which the port belongs, and the client should reregister all of the subscriptions previously requested from this port, for example (but not limited to) joining a multicast group.
IBV_EVENT_CLIENT_REREGISTER
The SM requests that the client reregister all of the subscriptions previously requested from this port, for example (but not limited to) joining a multicast group. This event may be generated when the SM suffered a failure which caused it to lose its records, or when there is a new SM in the subnet.
This event will be generated by the device only if the bit that indicates that client reregister is supported is set in port_attr.port_cap_flags.
IBV_EVENT_PORT_ERR
The link became inactive and is now unavailable to send/receive packets.
The port_attr.state was in either the IBV_PORT_ACTIVE or the IBV_PORT_ACTIVE_DEFER state and it moved to one of the following states: IBV_PORT_DOWN, IBV_PORT_INIT, IBV_PORT_ARMED. This can happen when there are problems with the link (for example: the cable was removed).
This does not change the states of the QPs that are associated with this port. However, if they are reliable and try to send data, they may experience retry exceeded errors.
Device events
Here are the unaffiliated events that may occur in RDMA devices. Those events will be generated for all of the contexts that use the RDMA device that got the events.
IBV_EVENT_DEVICE_FATAL
The RDMA device suffered from an error which isn't related to one of the above asynchronous events. When this event occurs, the behavior of the RDMA device isn't determined and it is highly recommended to close the process immediately since the attempt to destroy the RDMA resources may fail.
Summary
The following table summarizes the behavior of the asynchronous events:
Event name | Element type | Event type | Protocol |
---|---|---|---|
IBV_EVENT_COMM_EST | QP | Info | IB, RoCE |
IBV_EVENT_SQ_DRAINED | QP | Info | IB, RoCE |
IBV_EVENT_PATH_MIG | QP | Info | IB, RoCE |
IBV_EVENT_QP_LAST_WQE_REACHED | QP | Info | IB, RoCE |
IBV_EVENT_QP_FATAL | QP | Error | IB, RoCE, iWARP |
IBV_EVENT_QP_REQ_ERR | QP | Error | IB, RoCE, iWARP |
IBV_EVENT_QP_ACCESS_ERR | QP | Error | IB, RoCE, iWARP |
IBV_EVENT_PATH_MIG_ERR | QP | Error | IB, RoCE |
IBV_EVENT_CQ_ERR | CQ | Error | IB, RoCE, iWARP |
IBV_EVENT_SRQ_LIMIT_REACHED | SRQ | Info | IB, RoCE, iWARP |
IBV_EVENT_SRQ_ERR | SRQ | Error | IB, RoCE, iWARP |
IBV_EVENT_PORT_ACTIVE | Port | Info | IB, RoCE, iWARP |
IBV_EVENT_LID_CHANGE | Port | Info | IB |
IBV_EVENT_PKEY_CHANGE | Port | Info | IB |
IBV_EVENT_GID_CHANGE | Port | Info | IB, RoCE |
IBV_EVENT_SM_CHANGE | Port | Info | IB |
IBV_EVENT_CLIENT_REREGISTER | Port | Info | IB |
IBV_EVENT_PORT_ERR | Port | Error | IB, RoCE, iWARP |
IBV_EVENT_DEVICE_FATAL | Device | Error | IB, RoCE, iWARP |
Parameters
Name | Direction | Description |
---|---|---|
context | in | RDMA device context that was returned from ibv_open_device() |
event | out | The asynchronous event that occurred |
Return Values
Value | Description |
---|---|
0 | On success |
-1 | On failure |
Examples
1) Reading asynchronous events (in a blocking way) and printing their content:

    #include <stdio.h>
    #include <infiniband/verbs.h>

    /* helper function to print the content of the async event */
    static void print_async_event(struct ibv_context *ctx,
                                  struct ibv_async_event *event)
    {
        switch (event->event_type) {
        /* QP events */
        case IBV_EVENT_QP_FATAL:
            printf("QP fatal event for QP with handle %p\n", (void *)event->element.qp);
            break;
        case IBV_EVENT_QP_REQ_ERR:
            printf("QP invalid request error event for QP with handle %p\n", (void *)event->element.qp);
            break;
        case IBV_EVENT_QP_ACCESS_ERR:
            printf("QP access error event for QP with handle %p\n", (void *)event->element.qp);
            break;
        case IBV_EVENT_COMM_EST:
            printf("QP communication established event for QP with handle %p\n", (void *)event->element.qp);
            break;
        case IBV_EVENT_SQ_DRAINED:
            printf("QP Send Queue drained event for QP with handle %p\n", (void *)event->element.qp);
            break;
        case IBV_EVENT_PATH_MIG:
            printf("QP Path migration loaded event for QP with handle %p\n", (void *)event->element.qp);
            break;
        case IBV_EVENT_PATH_MIG_ERR:
            printf("QP Path migration error event for QP with handle %p\n", (void *)event->element.qp);
            break;
        case IBV_EVENT_QP_LAST_WQE_REACHED:
            printf("QP last WQE reached event for QP with handle %p\n", (void *)event->element.qp);
            break;

        /* CQ events */
        case IBV_EVENT_CQ_ERR:
            printf("CQ error for CQ with handle %p\n", (void *)event->element.cq);
            break;

        /* SRQ events */
        case IBV_EVENT_SRQ_ERR:
            printf("SRQ error for SRQ with handle %p\n", (void *)event->element.srq);
            break;
        case IBV_EVENT_SRQ_LIMIT_REACHED:
            printf("SRQ limit reached event for SRQ with handle %p\n", (void *)event->element.srq);
            break;

        /* Port events */
        case IBV_EVENT_PORT_ACTIVE:
            printf("Port active event for port number %d\n", event->element.port_num);
            break;
        case IBV_EVENT_PORT_ERR:
            printf("Port error event for port number %d\n", event->element.port_num);
            break;
        case IBV_EVENT_LID_CHANGE:
            printf("LID change event for port number %d\n", event->element.port_num);
            break;
        case IBV_EVENT_PKEY_CHANGE:
            printf("P_Key table change event for port number %d\n", event->element.port_num);
            break;
        case IBV_EVENT_GID_CHANGE:
            printf("GID table change event for port number %d\n", event->element.port_num);
            break;
        case IBV_EVENT_SM_CHANGE:
            printf("SM change event for port number %d\n", event->element.port_num);
            break;
        case IBV_EVENT_CLIENT_REREGISTER:
            printf("Client reregister event for port number %d\n", event->element.port_num);
            break;

        /* RDMA device events */
        case IBV_EVENT_DEVICE_FATAL:
            printf("Fatal error event for device %s\n", ibv_get_device_name(ctx->device));
            break;

        default:
            printf("Unknown event (%d)\n", event->event_type);
        }
    }

    /*
     * the actual code that reads the events in the loop and prints them;
     * this fragment runs inside a function that returns int, and ctx is
     * the context that was returned from ibv_open_device()
     */
    struct ibv_async_event event;
    int ret;

    while (1) {
        /* wait for the next async event */
        ret = ibv_get_async_event(ctx, &event);
        if (ret) {
            fprintf(stderr, "Error, ibv_get_async_event() failed\n");
            return -1;
        }

        /* print the event */
        print_async_event(ctx, &event);

        /* ack the event */
        ibv_ack_async_event(&event);
    }

2) Reading asynchronous events (in a non-blocking way) and printing their content:

    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <infiniband/verbs.h>

    /*
     * this fragment runs inside a function that returns int; ctx is the
     * context that was returned from ibv_open_device() and
     * print_async_event() is the helper function from the first example
     */
    struct ibv_async_event event;
    int flags;
    int ret;

    printf("Changing the mode of events read to be non-blocking\n");

    /* change the blocking mode of the async event queue */
    flags = fcntl(ctx->async_fd, F_GETFL);
    ret = (flags < 0) ? -1 : fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
    if (ret < 0) {
        fprintf(stderr, "Error, failed to change file descriptor of async event queue\n");
        return -1;
    }

    while (1) {
        struct pollfd my_pollfd;
        int ms_timeout = 100;

        /*
         * poll the queue until it has an event and sleep ms_timeout
         * milliseconds between any iteration
         */
        my_pollfd.fd      = ctx->async_fd;
        my_pollfd.events  = POLLIN;
        my_pollfd.revents = 0;

        do {
            ret = poll(&my_pollfd, 1, ms_timeout);
        } while (ret == 0);

        if (ret < 0) {
            fprintf(stderr, "poll failed\n");
            return -1;
        }

        /* we know that there is an event, so we just need to read it */
        ret = ibv_get_async_event(ctx, &event);
        if (ret) {
            fprintf(stderr, "Error, ibv_get_async_event() failed\n");
            return -1;
        }

        /* print the event */
        print_async_event(ctx, &event);

        /* ack the event */
        ibv_ack_async_event(&event);
    }

async_event.c
async_event_nonblocking.c
FAQs
Do I have to read the asynchronous events?
No. The asynchronous events mechanism is a way to provide extra information about things that happen in the CQs, QPs, SRQs, ports and devices. The user doesn't have to use it, but it is highly recommended to do so.
Can I read the events once in a while (for example, every few minutes)?
Yes, you can. The downside is that you won't know when the event happened, and by the time you read it, the information may no longer be relevant.
Is this verb thread-safe?
Yes, this verb is thread-safe (just like the rest of the verbs).
I got a QP/CQ/SRQ event. Will other processes get this event too?
No. Affiliated events will be generated only to the context that this resource belongs to. Other contexts won't even know that this event occurred.
Comments
Hi Dotan, how can another node polling cq for recv event tell if its peer process has crashed?
I ran a quick experiment and it seems the alive node is simply polling the CQ without any erroneous work completion. I made it sidetrack to calling ibv_get_async_event occasionally and no new event from that fd either.
Hi.
If you are using only the verbs - you won't be able to know that the remote side crashed.
If you'll use CM/CMA you'll get an event about it.
However, you can implement a simple keep alive mechanism:
RDMA Write of zero-bytes message (assuming that this is an RC QP).
Thanks
Dotan
What are the possible reasons to get async event IBV_EVENT_PATH_MIG. I am handling high number of traffic in one path. In that case I am getting IBV_EVENT_PATH_MIG. Are those correlated ?
Regards,
Kethiri
Hi.
Automatic Path Migration (APM) starts when there is a transport error with a connection
(on a reliable Connection).
I must admit that I don't understand what "High number of traffic" means;
But maybe now (when there are many packets in the path), the QP timeout isn't enough - and you should increase it.
Thanks
Dotan