Zero byte messages


RDMA supports zero byte messages. To send one, post a Send Request whose scatter/gather list has zero entries.

Zero byte messages can be sent with the following opcodes:

Send
Send with Immediate
RDMA Write
RDMA Write with Immediate
RDMA Read

For the RDMA operations, the remote address and remote key are not actually used or validated, so those values do not have to describe a valid remote Memory Region.

What are zero byte messages good for?

Zero byte messages can be useful in the following scenarios:

When only the immediate data is used - This can be useful to signal a directive or a status update.
For keep alive messages in a reliable QP - A zero byte RDMA Write or RDMA Read is a good, non-intrusive keep alive message in a reliable QP: it verifies that the remote QP is still alive and functioning. If the remote QP is unavailable (for example, because it was transitioned to the Error or Reset state, the process was terminated, or the node itself was rebooted), the sender gets a Work Completion with Retry Exceeded status. The other opcodes listed above (Send, Send with Immediate and RDMA Write with Immediate) consume a Receive Request on the remote QP, so they are less suitable for this purpose.

Written by: Dotan Barak on September 20, 2013. Last updated on February 19, 2015.

Comments


Hiroyuki Sato(@hiroysato) says: September 25, 2013

This is a nice article!

By the way, one of my followers reported that sending a zero byte message to a peer on a ConnectX-3 card causes IBV_WC_LOC_LEN_ERR.
The source is here.
http://www.nminoru.jp/~nminoru/data/201309/libibverbs-1.1.6-zero-length-send-test.diff

Could you please tell me what is missing?

- Dotan Barak says: September 25, 2013
  
  Thanks.
  
  As I wrote, in order to send zero byte messages, the s/g list should have zero entries, but in your example it has one entry:
  
  struct ibv_send_wr wr = {
      .wr_id = PINGPONG_SEND_WRID,
      .sg_list = &list,
      .num_sge = 1, <-------------------- this should be zero
  -   .opcode = IBV_WR_SEND,
  +   .opcode = IBV_WR_SEND_WITH_IMM,
  +   .imm_data = 0,
      .send_flags = IBV_SEND_SIGNALED,
  };
  
  Sending a scatter/gather entry with the value zero in its length member actually means sending 2 GB...
  
  Thanks
  Dotan
  
  - Hiroyuki Sato(@hiroysato) says: September 25, 2013
    
    Hello Dotan.
    
    Thank you for your reply. I'll check it and report back later.
  - rsai says: April 18, 2014
    
    From what I understand, this should actually have caused a 2 GB data transfer, but it causes IBV_WC_LOC_LEN_ERR. Why?
    I ask because I am facing the same problem: setting .num_sge = 1 and .length = 0 causes IBV_WC_LOC_LEN_ERR.
  - Dotan Barak says: April 18, 2014
    
    The question is whether the S/G entry that you described points to a valid Memory Region space?
    
    Thanks
    Dotan
    
Hiroyuki Sato(@hiroysato) says: September 28, 2013

Hello Dotan.

Thank you for your advice. It worked properly.

- Dotan Barak says: September 28, 2013
  
  Great!
  
Anuj Kalia says: November 11, 2013

Hey Dotan.

I was wondering if you could help me understand the interaction between RDMA and CPU caches. I had the following specific question:

When a remote host reads from a server via RDMA, where does the read actually come from? I read that writes go to L3 cache. Do the reads come from L3 cache too? If so, what happens if something is in a modified state in L1 or L2 cache? Is L3 always up-to-date with L1/L2?

Thanks a lot for your time!

- Dotan Barak says: November 30, 2013
  
  Hi Anuj and sorry for the late response.
  
  Whether the L3 cache is up-to-date with L1/L2 is a question that you should ask
  the chipset/CPU guys.
  
  But IMHO, the answer is yes.
  
  Thanks
  Dotan
  
rsai says: April 14, 2014

Hey Dotan,

I noticed this from your post on ibv_post_send about sge.length:
The length of the buffer in bytes. The value 0 is a special value and is equal to 2^{31} bytes (and not zero bytes, as one might imagine)

Is this Mellanox specific or generic?
Is there a spec document which describes this? I could not find any which states this, except for a couple of forum posts on zero-byte messages.
Is there a reason why 0 is expected to mean 2 GB? If so, how would my CI interpret sge.length set to 2147483648 (bytes), which can also be stored in a uint32_t?

Thanks,
--

- Dotan Barak says: April 15, 2014
  
  Hi.
  
  This is a good question.
  I searched for an answer in the InfiniBand spec, and couldn't find one.
  So, I can't give you a quick answer about the origin of this behavior.
  
  I can think of one reason: what is the meaning of a scatter/gather entry with zero bytes?
  If it is zero bytes, why did you add it in the first place?
  
  Another reason is that 0 is actually 2 GB modulo 2 GB: if you take any scatter/gather entry length modulo 2 GB (the maximum size of a message in RDMA), a length of 2 GB becomes 0.
  
  I'll investigate it further, but it will take some time.
  (BTW, all posts are moderated to prevent SPAM, so they are visible only once they are approved by me)
  
  Dotan
  
  - rsai says: April 15, 2014
    
    Dotan,
    
    Thanks for your response.
    And no problem. Please take your time. I shall keep my eyes open on this thread. :-)
Parthiban says: September 23, 2014

Hi Dotan,
I have the same question as "rsai". Did you find an answer to it?
Also, I'm trying to do an RDMA Write of 5 GB of memory, but I see that only 1 GB reaches the remote buffer. I assigned the allocated buffer to the sge and its length to sge.length, and I use only one sge. Could you suggest what the issue might be?
Thanks,

- Dotan Barak says: September 23, 2014
  
  Hi Parthiban.
  
  In general, RDMA (the protocol itself) can support up to 2 GB in one message.
  RDMA devices may have a lower limit.
  
  If you need to send more data than the maximum supported value (1 GB in your example),
  you can use several RDMA writes to send the local (big) buffer to the remote buffer.
  
  Thanks
  Dotan
  
  - Parthiban says: September 23, 2014
    
    Hi Dotan,
    Thanks a lot for your response. I'm new to RDMA; is there any program that I can refer to in order to implement this?
    Thanks.
  - Dotan Barak says: September 23, 2014
    
    Hi Parthiban.
    
    I haven't published a "hello world" post yet.
    A good example can be the examples/rping.c in the following URL:
    .
    
    Thanks
    Dotan
  - rsai says: September 25, 2014
    
    Just to point out: I looked through the driver source code of different vendors, and what I found is that Mellanox is the only one that treats 0 as 2 GB.
  - Dotan Barak says: September 25, 2014
    
    Hi.
    
    Here is a text from the InfiniBand specifications:
    
    9.3.3.3 DMA LENGTH (DMALEN) - 32 BITS
    
    This field indicates the length, in bytes, of the remote DMA operation.
    
    C9-9: For an HCA performing RDMA operations, the minimum length
    specified in the DMALen field is 0; the maximum length is 2^31.
    
    So, the value zero (in packet headers) means 2^31.
    If one wishes to send a message with zero bytes, he can use a Send Request with no scatter/gather elements at all.
    
    Thanks
    Dotan
Parthiban says: September 25, 2014

Hi Dotan,
Thanks for the pointers. I'm not able to see the link you posted, but I referred to the program at the link below for my implementation:
http://web.mit.edu/freebsd/head/contrib/ofed/libibverbs/examples/rc_pingpong.c
Thanks.
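
The suggestion earlier in this thread, to split a buffer that exceeds the device limit into several RDMA Writes, can be sketched in plain C as follows. The `struct chunk` type and the `split_transfer` function are hypothetical names used only for illustration; the sketch assumes that the per-message limit is the `max_msg_sz` value reported by `ibv_query_port()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one RDMA Write: the byte offset inside
 * the big buffer and the number of bytes to transfer. */
struct chunk {
    uint64_t offset;
    uint32_t length;
};

/* Hypothetical helper: split `total` bytes into pieces of at most
 * `max_msg` bytes. Up to `max_chunks` pieces are stored in `out`;
 * the return value is the number of pieces that are needed. */
size_t split_transfer(uint64_t total, uint32_t max_msg,
                      struct chunk *out, size_t max_chunks)
{
    size_t n = 0;
    uint64_t off = 0;

    assert(max_msg > 0);
    while (off < total) {
        uint64_t left = total - off;
        uint32_t len = left < max_msg ? (uint32_t)left : max_msg;

        if (out != NULL && n < max_chunks) {
            out[n].offset = off;
            out[n].length = len;
        }
        n++;
        off += len;
    }
    return n;
}
```

For each chunk, the caller would post one RDMA Write with `sge.addr` set to the local buffer address plus `offset`, `sge.length` set to `length`, and `wr.wr.rdma.remote_addr` set to the remote buffer address plus `offset`.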
