[for-next,v5,00/11] RDMA/rxe: Add RDMA FLUSH operation

Message ID 20220927055337.22630-1-lizhijian@fujitsu.com
Series RDMA/rxe: Add RDMA FLUSH operation

Message

Zhijian Li (Fujitsu) Sept. 27, 2022, 5:53 a.m. UTC
Hey folks,

Firstly, I want to say thank you to all of you, especially Bob, who,
over the past month or more, gave me lots of ideas and inspiration.

With your help, some changes were made in the 5th version, such as:
- new names and a new patch-split scheme, suggested by Bob
- bugfix: set is_pmem to true only if the whole MR is pmem; it's possible for
  one MR to contain both PMEM and DRAM.
- introduce the feth structure instead of a u32
- new bugfix to rxe_lookup_mw() and lookup_mr(), see (RDMA/rxe: make sure requested access is a subset of {mr,mw}->access);
  with this fix, we remove check_placement_type(), since lookup_mr() already does that check.
- enable the QP flushable attribute
These change logs also appear in the patches they belong to.
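The two access-related bullets above reduce to simple predicates. A minimal C sketch, assuming a hypothetical per-page pmem flag and hypothetical access bits (these are not the rxe structures or the real IB_ACCESS_* values): a requested access is accepted only if it is a subset of the access granted to the MR/MW, and an MR is treated as pmem only if every one of its pages is pmem.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access bits, for illustration only. */
#define ACC_REMOTE_READ       (1u << 0)
#define ACC_REMOTE_WRITE      (1u << 1)
#define ACC_FLUSH_GLOBAL      (1u << 2)
#define ACC_FLUSH_PERSISTENT  (1u << 3)

/* Allowed only if every requested bit is also granted to the MR/MW. */
static bool access_is_subset(uint32_t requested, uint32_t granted)
{
	return (requested & ~granted) == 0;
}

/* Hypothetical MR model: one flag per page saying whether it is pmem. */
struct mr_model {
	const bool *page_is_pmem;
	size_t npages;
};

/* is_pmem only if the *whole* MR is pmem: a single DRAM page makes the
 * MR non-pmem. An empty MR is treated as non-pmem in this sketch. */
static bool mr_is_pmem(const struct mr_model *mr)
{
	if (mr->npages == 0)
		return false;
	for (size_t i = 0; i < mr->npages; i++)
		if (!mr->page_is_pmem[i])
			return false;
	return true;
}
```

Checking every page, rather than a single one, is what lets the check reject an MR that mixes PMEM and DRAM.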

These patches implement a *NEW* RDMA opcode, "RDMA FLUSH".
In IB SPEC 1.5[1], two new opcodes, ATOMIC WRITE and RDMA FLUSH, were
added in the MEMORY PLACEMENT EXTENSIONS section.

This patchset makes SoftRoCE support the new RDMA FLUSH operation on the RC service.
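The FLUSH request carries a small extension header, the feth mentioned in the v5 notes, which encodes the placement type and the selectivity level. A minimal userspace sketch, assuming a 32-bit big-endian field with the placement type in bits 3-0 and the selectivity level in bits 5-4; the masks and the FLUSH_* names are illustrative assumptions, not copied from this series.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Assumed layout of the FLUSH extended transport header (feth). */
struct feth {
	uint32_t bits;            /* kept in network (big-endian) byte order */
};

#define FETH_PLT_MASK   0x0000000fu   /* placement type, bits 3-0 */
#define FETH_SEL_MASK   0x00000030u   /* selectivity level, bits 5-4 */
#define FETH_SEL_SHIFT  4u

/* Hypothetical names for the values carried in the header. */
enum flush_type  { FLUSH_GLOBAL = 1u << 0, FLUSH_PERSISTENT = 1u << 1 };
enum flush_level { FLUSH_RANGE = 0, FLUSH_MR = 1 };

/* Pack placement type and selectivity level, then convert to big-endian. */
static void feth_set(struct feth *h, uint32_t plt, uint32_t sel)
{
	uint32_t v = (plt & FETH_PLT_MASK) |
		     ((sel << FETH_SEL_SHIFT) & FETH_SEL_MASK);
	h->bits = htonl(v);
}

static uint32_t feth_plt(const struct feth *h)
{
	return ntohl(h->bits) & FETH_PLT_MASK;
}

static uint32_t feth_sel(const struct feth *h)
{
	return (ntohl(h->bits) & FETH_SEL_MASK) >> FETH_SEL_SHIFT;
}
```

Wrapping the field in a struct with accessors, instead of passing a bare u32 around, keeps the byte-order conversion and masking in one place.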

You can verify the patchset by building and running the rdma_flush example[2].
server:
$ ./rdma_flush_server -s [server_address] -p [port_number]
client:
$ ./rdma_flush_client -s [server_address] -p [port_number]

Corresponding pyverbs support and tests (tests.test_qpex.QpExTestCase.test_qp_ex_rc_rdma_flush)
are also added to rdma-core.

[1]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
[2]: https://github.com/zhijianli88/rdma-core/tree/rdma-flush-v5

CC: Xiao Yang <yangx.jy@fujitsu.com>
CC: "Gotou, Yasunori" <y-goto@fujitsu.com>
CC: Jason Gunthorpe <jgg@ziepe.ca>
CC: Zhu Yanjun <zyjzyj2000@gmail.com>
CC: Leon Romanovsky <leon@kernel.org>
CC: Bob Pearson <rpearsonhpe@gmail.com>
CC: Mark Bloch <mbloch@nvidia.com>
CC: Wenpeng Liang <liangwenpeng@huawei.com>
CC: Tom Talpey <tom@talpey.com>
CC: "Gromadzki, Tomasz" <tomasz.gromadzki@intel.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: linux-rdma@vger.kernel.org
CC: linux-kernel@vger.kernel.org

The kernel source is also available at:
https://github.com/zhijianli88/linux/tree/rdma-flush-v5

Change log
V4:
- rework responder process
- rebase to v5.19+
- remove [7/7]: RDMA/rxe: Add RD FLUSH service support since RD is not really supported

V3:
- just a rebase, plus commit log and comment updates
- delete patch-1: "RDMA: mr: Introduce is_pmem", which is combined into "Allow registering persistent flag for pmem MR only"
- delete patch-7

V2:
RDMA: mr: Introduce is_pmem
   check the 1st byte to avoid crossing a page boundary
   new scheme to check is_pmem # Dan

RDMA: Allow registering MR with flush access flags
   combine [03/10] RDMA/rxe: Allow registering FLUSH flags for supported device only into this patch # Jason
   split RDMA_FLUSH into 2 capabilities

RDMA/rxe: Allow registering persistent flag for pmem MR only
   update commit message, get rid of confusing ib_check_flush_access_flags() # Tom

RDMA/rxe: Implement RC RDMA FLUSH service in requester side
   extend flush to include a length field. # Tom and Tomasz

RDMA/rxe: Implement flush execution in responder side
   adjust start for WHOLE MR level # Tom
   don't support DMA mr for flush # Tom
   check flush return value

RDMA/rxe: Enable RDMA FLUSH capability for rxe device
   adjust the patch order; move it here from [04/10]
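The V2 notes "extend flush to include length field" and "adjust start for WHOLE MR level" concern how the responder chooses the region to flush. A hedged sketch, assuming a hypothetical MR described only by its iova and length (all names are made up for illustration): at whole-MR level the requested address and length are ignored and the whole MR is flushed, while at range level the request must lie inside the MR.

```c
#include <errno.h>
#include <stdint.h>

enum flush_level { FLUSH_RANGE = 0, FLUSH_MR = 1 };   /* hypothetical names */

struct mr_range {            /* hypothetical, minimal MR description */
	uint64_t iova;
	uint64_t length;
};

/* Compute the region to flush. Returns 0 on success, -EINVAL if a
 * range-level request falls outside the MR or the level is unknown. */
static int flush_region(const struct mr_range *mr, enum flush_level level,
			uint64_t va, uint64_t len,
			uint64_t *start, uint64_t *nbytes)
{
	switch (level) {
	case FLUSH_MR:
		/* Whole-MR level: start from the beginning of the MR. */
		*start = mr->iova;
		*nbytes = mr->length;
		return 0;
	case FLUSH_RANGE:
		/* Range level: [va, va + len) must be inside the MR.
		 * Written this way to avoid overflow in va + len. */
		if (va < mr->iova || len > mr->length ||
		    va - mr->iova > mr->length - len)
			return -EINVAL;
		*start = va;
		*nbytes = len;
		return 0;
	}
	return -EINVAL;
}
```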

Li Zhijian (11):
  RDMA/rxe: make sure requested access is a subset of {mr,mw}->access
  RDMA: Extend RDMA user ABI to support flush
  RDMA: Extend RDMA kernel verbs ABI to support flush
  RDMA/rxe: Extend rxe user ABI to support flush
  RDMA/rxe: Allow registering persistent flag for pmem MR only
  RDMA/rxe: Extend rxe packet format to support flush
  RDMA/rxe: Implement RC RDMA FLUSH service in requester side
  RDMA/rxe: Implement flush execution in responder side
  RDMA/rxe: Implement flush completion
  RDMA/cm: Make QP FLUSHABLE
  RDMA/rxe: Enable RDMA FLUSH capability for rxe device

 drivers/infiniband/core/cm.c            |   3 +-
 drivers/infiniband/sw/rxe/rxe_comp.c    |   4 +-
 drivers/infiniband/sw/rxe/rxe_hdr.h     |  47 +++++++
 drivers/infiniband/sw/rxe/rxe_loc.h     |   1 +
 drivers/infiniband/sw/rxe/rxe_mr.c      |  81 ++++++++++-
 drivers/infiniband/sw/rxe/rxe_mw.c      |   3 +-
 drivers/infiniband/sw/rxe/rxe_opcode.c  |  17 +++
 drivers/infiniband/sw/rxe/rxe_opcode.h  |  16 ++-
 drivers/infiniband/sw/rxe/rxe_param.h   |   4 +-
 drivers/infiniband/sw/rxe/rxe_req.c     |  15 +-
 drivers/infiniband/sw/rxe/rxe_resp.c    | 180 +++++++++++++++++++++---
 drivers/infiniband/sw/rxe/rxe_verbs.h   |   6 +
 include/rdma/ib_pack.h                  |   3 +
 include/rdma/ib_verbs.h                 |  20 ++-
 include/uapi/rdma/ib_user_ioctl_verbs.h |   2 +
 include/uapi/rdma/ib_user_verbs.h       |  16 +++
 include/uapi/rdma/rdma_user_rxe.h       |   7 +
 17 files changed, 389 insertions(+), 36 deletions(-)

Comments

Jason Gunthorpe Oct. 28, 2022, 5:44 p.m. UTC | #1
On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
> These patches implement a *NEW* RDMA opcode, "RDMA FLUSH".
> In IB SPEC 1.5[1], two new opcodes, ATOMIC WRITE and RDMA FLUSH, were
> added in the MEMORY PLACEMENT EXTENSIONS section.

This doesn't apply anymore. I did try to fix it, but it ended up not
compiling, so it is better if you handle it and repost.

Thanks,
Jason
Jason Gunthorpe Oct. 28, 2022, 5:57 p.m. UTC | #2
On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
> Hey folks,
> 
> Firstly, I want to say thank you to all of you, especially Bob, who,
> over the past month or more, gave me lots of ideas and inspiration.

I would like someone familiar with rxe to give a Reviewed-by for the
protocol parts.

Jason
Zhu Yanjun Nov. 11, 2022, 2:49 a.m. UTC | #3
On 2022/10/29 1:57, Jason Gunthorpe wrote:
> On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
>> Hey folks,
>>
>> Firstly, I want to say thank you to all of you, especially Bob, who,
>> over the past month or more, gave me lots of ideas and inspiration.
> 
> I would like someone familiar with rxe to give a Reviewed-by for the
> protocol parts.

Hi, Jason

I reviewed these patches, and I am fine with them.

Hi, Zhijian

I noticed the following:
"
$ ./rdma_flush_server -s [server_address] -p [port_number]
client:
$ ./rdma_flush_client -s [server_address] -p [port_number]
"
Can you merge the server and the client into rdma-core?

Thanks,
Zhu Yanjun

Zhijian Li (Fujitsu) Nov. 11, 2022, 5:10 a.m. UTC | #4
On 11/11/2022 10:49, Yanjun Zhu wrote:
> Hi, Zhijian
> 
> I noticed the following:
> "
> $ ./rdma_flush_server -s [server_address] -p [port_number]
> client:
> $ ./rdma_flush_client -s [server_address] -p [port_number]
> "
> Can you merge the server and the client into rdma-core?

Yanjun,

Yes, there is already a draft PR here:
https://github.com/linux-rdma/rdma-core/pull/1181, but it cannot go
ahead until the kernel patches are merged.

I will post a new version in the next few days. Would you mind if I add your
"Reviewed-by" to the next version?
Zhu Yanjun Nov. 11, 2022, 5:52 a.m. UTC | #5
On 2022/11/11 13:10, lizhijian@fujitsu.com wrote:
> Yanjun,
> 
> Yes, there is already a draft PR here:
> https://github.com/linux-rdma/rdma-core/pull/1181, but it cannot go
> ahead until the kernel patches are merged.
> 
> I will post a new version in the next few days. Would you mind if I add your
> "Reviewed-by" to the next version?

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Thanks.

Another question: normally rxe should be able to connect to physical IB devices,
such as an mlx IB device. That is, one host uses rxe, the other host uses an mlx
IB device, and the RDMA connection is created between the two hosts.

Have you connected to an mlx IB device with this RDMA FLUSH operation?
And what was the test result?

Thanks a lot.
Zhu Yanjun

Zhijian Li (Fujitsu) Nov. 11, 2022, 6:10 a.m. UTC | #6
On 11/11/2022 13:52, Yanjun Zhu wrote:
> 
> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> Thanks.
> 
> Another question: normally rxe should be able to connect to physical IB devices,
> such as an mlx IB device. That is, one host uses rxe, the other host uses an mlx
> IB device, and the RDMA connection is created between the two hosts.

It's fully compatible with the old operations.


> 
> Have you connected to an mlx IB device with this RDMA FLUSH operation?
> And what was the test result?

Yes, I tested it.

After these patches, only the RXE device can register *FLUSHABLE* MRs
successfully. If mlx tries that, EOPNOTSUPP will be returned.

Similarly, since other hardware (MLX, for example) does not support the
FLUSH operation yet, EOPNOTSUPP will be returned if users try to do that.

In short, with an RXE requester, an MLX responder will return an error for the
request. An MLX requester is not able to issue a FLUSH operation.
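The registration behaviour described above can be pictured as a capability check on the access flags. A minimal sketch with hypothetical capability bits, access bits and function name (the real checks live in the verbs core and the drivers touched by this series): a device that does not advertise a FLUSH capability rejects the corresponding access flag with -EOPNOTSUPP.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical device capability and MR access bits. */
#define CAP_FLUSH_GLOBAL         (1u << 0)
#define CAP_FLUSH_PERSISTENT     (1u << 1)
#define ACCESS_FLUSH_GLOBAL      (1u << 8)
#define ACCESS_FLUSH_PERSISTENT  (1u << 9)

/* Reject flush access flags the device does not advertise.
 * Returns 0 if registration may proceed, -EOPNOTSUPP otherwise. */
static int check_flush_access(uint32_t dev_caps, uint32_t access)
{
	if ((access & ACCESS_FLUSH_GLOBAL) && !(dev_caps & CAP_FLUSH_GLOBAL))
		return -EOPNOTSUPP;
	if ((access & ACCESS_FLUSH_PERSISTENT) &&
	    !(dev_caps & CAP_FLUSH_PERSISTENT))
		return -EOPNOTSUPP;
	return 0;
}
```

In this model rxe advertises both capabilities, while a device that does not support FLUSH yet advertises neither, so any flush-enabled registration on it fails and ordinary registrations are unaffected.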

Thanks
Zhijian


Zhu Yanjun Nov. 11, 2022, 6:30 a.m. UTC | #7
On 2022/11/11 14:10, lizhijian@fujitsu.com wrote:
> Yes, I tested it.
>
> After these patches, only the RXE device can register *FLUSHABLE* MRs
> successfully. If mlx tries that, EOPNOTSUPP will be returned.
>
> Similarly, since other hardware (MLX, for example) does not support the
> FLUSH operation yet, EOPNOTSUPP will be returned if users try to do that.
>
> In short, with an RXE requester, an MLX responder will return an error for the
> request. An MLX requester is not able to issue a FLUSH operation.

Thanks. Do you mean that the FLUSH operation is only supported in RXE? ^_^

And MLX does not support the FLUSH operation currently?

Zhu Yanjun

Zhijian Li (Fujitsu) Nov. 11, 2022, 6:38 a.m. UTC | #8
On 11/11/2022 14:30, Yanjun Zhu wrote:
>>
>> After these patches, only the RXE device can register *FLUSHABLE* MRs
>> successfully. If mlx tries that, EOPNOTSUPP will be returned.
>>
>> Similarly, since other hardware (MLX, for example) does not support the
>> FLUSH operation yet, EOPNOTSUPP will be returned if users try to do that.
>>
>> In short, with an RXE requester, an MLX responder will return an error for the
>> request. An MLX requester is not able to issue a FLUSH operation.
> 
> Thanks. Do you mean that the FLUSH operation is only supported in RXE? ^_^
> 
> And MLX does not support the FLUSH operation currently?

IMO, FLUSH and Atomic Write were newly introduced by IBA spec 1.5,
published in 2021, so hardware/drivers (MLX) need to do some work to
support them.
Zhu Yanjun Nov. 11, 2022, 7:08 a.m. UTC | #9
On 2022/11/11 14:38, lizhijian@fujitsu.com wrote:
>
> On 11/11/2022 14:30, Yanjun Zhu wrote:
>>> After these patches, only the RXE device can register *FLUSHABLE* MRs
>>> successfully. If mlx tries that, EOPNOTSUPP will be returned.
>>>
>>> Similarly, since other hardware (MLX, for example) does not support the
>>> FLUSH operation yet, EOPNOTSUPP will be returned if users try to do that.
>>>
>>> In short, with an RXE requester, an MLX responder will return an error for the
>>> request. An MLX requester is not able to issue a FLUSH operation.
>> Thanks. Do you mean that the FLUSH operation is only supported in RXE? ^_^
>>
>> And MLX does not support the FLUSH operation currently?
> IMO, FLUSH and Atomic Write were newly introduced by IBA spec 1.5,
> published in 2021, so hardware/drivers (MLX) need to do some work to
> support them.

Thanks.

If I understand you correctly, FLUSH and Atomic Write are new features, and
from the test result, they are not currently supported by the MLX driver.

Let's wait for MLX engineers to give updates about FLUSH and Atomic Write.

IMO, it would be better to make rxe successfully connect to a physical IB
device, such as MLX or others, with FLUSH and Atomic Write.

Zhu Yanjun