mbox series

[net-next,0/4] Make SMC-R can work with rxe devices

Message ID 20240809083148.1989912-1-liujian56@huawei.com (mailing list archive)
Headers show
Series Make SMC-R can work with rxe devices | expand

Message

Liu Jian Aug. 9, 2024, 8:31 a.m. UTC
Make SMC-R can work with rxe devices. This allows us to easily test and
learn the SMC-R protocol without relying on a physical RoCE NIC.

Liu Jian (4):
  rdma/device: export ib_device_get_netdev()
  net/smc: use ib_device_get_netdev() helper to get netdev info
  net/smc: fix one NULL pointer dereference in smc_ib_is_sg_need_sync()
  RDMA/rxe: Set queue pair cur_qp_state when being queried

 drivers/infiniband/core/core_priv.h   |  3 ---
 drivers/infiniband/core/device.c      |  1 +
 drivers/infiniband/sw/rxe/rxe_verbs.c |  2 ++
 include/rdma/ib_verbs.h               |  2 ++
 net/smc/smc_ib.c                      | 10 +++++-----
 net/smc/smc_pnet.c                    |  6 +-----
 6 files changed, 11 insertions(+), 13 deletions(-)

Comments

Jan Karcher Aug. 11, 2024, 6:05 a.m. UTC | #1
On 09/08/2024 10:31, Liu Jian wrote:
> Make SMC-R can work with rxe devices. This allows us to easily test and
> learn the SMC-R protocol without relying on a physical RoCE NIC.

Hi Liu,

thank you for your contribution. Please give me some time to review and 
test it in the next couple of days.

Thanks
- J

> 
> Liu Jian (4):
>    rdma/device: export ib_device_get_netdev()
>    net/smc: use ib_device_get_netdev() helper to get netdev info
>    net/smc: fix one NULL pointer dereference in smc_ib_is_sg_need_sync()
>    RDMA/rxe: Set queue pair cur_qp_state when being queried
> 
>   drivers/infiniband/core/core_priv.h   |  3 ---
>   drivers/infiniband/core/device.c      |  1 +
>   drivers/infiniband/sw/rxe/rxe_verbs.c |  2 ++
>   include/rdma/ib_verbs.h               |  2 ++
>   net/smc/smc_ib.c                      | 10 +++++-----
>   net/smc/smc_pnet.c                    |  6 +-----
>   6 files changed, 11 insertions(+), 13 deletions(-)
>
Jan Karcher Aug. 20, 2024, 1:16 p.m. UTC | #2
On 09/08/2024 10:31, Liu Jian wrote:
> Make SMC-R can work with rxe devices. This allows us to easily test and
> learn the SMC-R protocol without relying on a physical RoCE NIC.

Hi Liu,

sorry for taking quite some time to answer.

Looking into this i cannot accept this series at the given point of time.

FWIU, RXE is mainly for testing and development and i agree that it 
would be a nice thing to have for SMC-R.
The problem is that there is no clean layer for different RoCE devices 
currently. Adding RXE to it works but isn't clean.
Also we have no way to do a "test" build which would have such a device 
supported and a "prod" build which would not support it.

Please give us time to investigate how to solve this in a neat way 
without building up to much technical debt.

Thanks for your contribution and making us aware of this area of improvment.
- Jan

> 
> Liu Jian (4):
>    rdma/device: export ib_device_get_netdev()
>    net/smc: use ib_device_get_netdev() helper to get netdev info
>    net/smc: fix one NULL pointer dereference in smc_ib_is_sg_need_sync()
>    RDMA/rxe: Set queue pair cur_qp_state when being queried
> 
>   drivers/infiniband/core/core_priv.h   |  3 ---
>   drivers/infiniband/core/device.c      |  1 +
>   drivers/infiniband/sw/rxe/rxe_verbs.c |  2 ++
>   include/rdma/ib_verbs.h               |  2 ++
>   net/smc/smc_ib.c                      | 10 +++++-----
>   net/smc/smc_pnet.c                    |  6 +-----
>   6 files changed, 11 insertions(+), 13 deletions(-)
>
Dust Li Aug. 21, 2024, 1:03 a.m. UTC | #3
On 2024-08-20 15:16:57, Jan Karcher wrote:
>
>
>On 09/08/2024 10:31, Liu Jian wrote:
>> Make SMC-R can work with rxe devices. This allows us to easily test and
>> learn the SMC-R protocol without relying on a physical RoCE NIC.
>
>Hi Liu,
>
>sorry for taking quite some time to answer.
>
>Looking into this i cannot accept this series at the given point of time.
>
>FWIU, RXE is mainly for testing and development and i agree that it would be
>a nice thing to have for SMC-R.
>The problem is that there is no clean layer for different RoCE devices
>currently. Adding RXE to it works but isn't clean.

Hi jan,

>Also we have no way to do a "test" build which would have such a device
>supported and a "prod" build which would not support it.

I don't quite understand what you mean here, Maybe I missed something ?
IIUC, we can control whether to use RXE by simpling insmod or rmmod rdma_rxe.ko

I believe having RXE support is beneficial for testing, especially in
simple physical networking setups where many corner cases are unlikely
to occur. By using RXE, we can easily configure unusual scenarios with
the existing iptables/netfilter infrastructure to simulate real-world
situations, such as packet dropping or network retransmission. This
approach can be advantageous for finding hidden bugs.

Best regards,
Dust


>
>Please give us time to investigate how to solve this in a neat way without
>building up to much technical debt.
>
>Thanks for your contribution and making us aware of this area of improvment.
>- Jan
>
>> 
>> Liu Jian (4):
>>    rdma/device: export ib_device_get_netdev()
>>    net/smc: use ib_device_get_netdev() helper to get netdev info
>>    net/smc: fix one NULL pointer dereference in smc_ib_is_sg_need_sync()
>>    RDMA/rxe: Set queue pair cur_qp_state when being queried
>> 
>>   drivers/infiniband/core/core_priv.h   |  3 ---
>>   drivers/infiniband/core/device.c      |  1 +
>>   drivers/infiniband/sw/rxe/rxe_verbs.c |  2 ++
>>   include/rdma/ib_verbs.h               |  2 ++
>>   net/smc/smc_ib.c                      | 10 +++++-----
>>   net/smc/smc_pnet.c                    |  6 +-----
>>   6 files changed, 11 insertions(+), 13 deletions(-)
>>
Liu Jian Aug. 24, 2024, 10:04 a.m. UTC | #4
在 2024/8/21 9:03, Dust Li 写道:
> On 2024-08-20 15:16:57, Jan Karcher wrote:
>>
>>
>> On 09/08/2024 10:31, Liu Jian wrote:
>>> Make SMC-R can work with rxe devices. This allows us to easily test and
>>> learn the SMC-R protocol without relying on a physical RoCE NIC.
>>
>> Hi Liu,
>>
>> sorry for taking quite some time to answer.
>>
>> Looking into this i cannot accept this series at the given point of time.
>>
>> FWIU, RXE is mainly for testing and development and i agree that it would be
>> a nice thing to have for SMC-R.
>> The problem is that there is no clean layer for different RoCE devices
>> currently. Adding RXE to it works but isn't clean.
> 
> Hi jan,
> 
>> Also we have no way to do a "test" build which would have such a device
>> supported and a "prod" build which would not support it.
>  > I don't quite understand what you mean here, Maybe I missed something ?
> IIUC, we can control whether to use RXE by simpling insmod or rmmod rdma_rxe.ko
> 
Yes, in the "prod" environment, we can completely turn off CONFIG_RDMA_RXE.

> I believe having RXE support is beneficial for testing, especially in
> simple physical networking setups where many corner cases are unlikely
> to occur. By using RXE, we can easily configure unusual scenarios with
> the existing iptables/netfilter infrastructure to simulate real-world
> situations, such as packet dropping or network retransmission. This
> approach can be advantageous for finding hidden bugs.
> 
Yes, one of my main original intentions was to make testing smc-r 
easier. This change is relatively simple, mainly patch2 and patch4, and 
there are no logical changes.
> Best regards,
> Dust
> 
> 
>>
>> Please give us time to investigate how to solve this in a neat way without
>> building up to much technical debt.
>>
>> Thanks for your contribution and making us aware of this area of improvment.
>> - Jan
>>
>>>
>>> Liu Jian (4):
>>>     rdma/device: export ib_device_get_netdev()
>>>     net/smc: use ib_device_get_netdev() helper to get netdev info
>>>     net/smc: fix one NULL pointer dereference in smc_ib_is_sg_need_sync()
>>>     RDMA/rxe: Set queue pair cur_qp_state when being queried
>>>
>>>    drivers/infiniband/core/core_priv.h   |  3 ---
>>>    drivers/infiniband/core/device.c      |  1 +
>>>    drivers/infiniband/sw/rxe/rxe_verbs.c |  2 ++
>>>    include/rdma/ib_verbs.h               |  2 ++
>>>    net/smc/smc_ib.c                      | 10 +++++-----
>>>    net/smc/smc_pnet.c                    |  6 +-----
>>>    6 files changed, 11 insertions(+), 13 deletions(-)
>>>
Jan Karcher Aug. 26, 2024, 7:04 p.m. UTC | #5
On 24/08/2024 12:04, liujian (CE) wrote:
> 
> 
> 在 2024/8/21 9:03, Dust Li 写道:
>> On 2024-08-20 15:16:57, Jan Karcher wrote:
>>>
>>>
>>> On 09/08/2024 10:31, Liu Jian wrote:
>>>> Make SMC-R can work with rxe devices. This allows us to easily test and
>>>> learn the SMC-R protocol without relying on a physical RoCE NIC.
>>>
>>> Hi Liu,
>>>
>>> sorry for taking quite some time to answer.
>>>
>>> Looking into this i cannot accept this series at the given point of 
>>> time.
>>>
>>> FWIU, RXE is mainly for testing and development and i agree that it 
>>> would be
>>> a nice thing to have for SMC-R.
>>> The problem is that there is no clean layer for different RoCE devices
>>> currently. Adding RXE to it works but isn't clean.
>>
>> Hi jan,
>>
>>> Also we have no way to do a "test" build which would have such a device
>>> supported and a "prod" build which would not support it.
>>  > I don't quite understand what you mean here, Maybe I missed 
>> something ?
>> IIUC, we can control whether to use RXE by simpling insmod or rmmod 
>> rdma_rxe.ko

Hi,

Yes that enables RXE in general, but not the use of RXE in SMC.

>>
> Yes, in the "prod" environment, we can completely turn off CONFIG_RDMA_RXE.

Same as above + this is a compile time switch that is enabled for 
distros like rh. Simply disabling it won't work here.

> 
>> I believe having RXE support is beneficial for testing, especially in
>> simple physical networking setups where many corner cases are unlikely
>> to occur. By using RXE, we can easily configure unusual scenarios with
>> the existing iptables/netfilter infrastructure to simulate real-world
>> situations, such as packet dropping or network retransmission. This
>> approach can be advantageous for finding hidden bugs.
>>
> Yes, one of my main original intentions was to make testing smc-r 
> easier. This change is relatively simple, mainly patch2 and patch4, and 
> there are no logical changes.

I agree with you. It would be beneficial for testing.
This is not a never, this is a not right now.

If you want to push this forward as something you need now, feel free to 
encapsulate it and introduce a vendor specific experimental option as 
defined in the v2.1 protocol version [1] for it. This would be 
compromise for me at the current time.

Thanks
- Jan

[1] 
https://www.ibm.com/support/pages/system/files/inline-files/IBM%20Shared%20Memory%20Communications%20Version%202.1%20Emulated-ISM_0.pdf

>> Best regards,
>> Dust
>>
>>
>>>
>>> Please give us time to investigate how to solve this in a neat way 
>>> without
>>> building up to much technical debt.
>>>
>>> Thanks for your contribution and making us aware of this area of 
>>> improvment.
>>> - Jan
>>>
>>>>
>>>> Liu Jian (4):
>>>>     rdma/device: export ib_device_get_netdev()
>>>>     net/smc: use ib_device_get_netdev() helper to get netdev info
>>>>     net/smc: fix one NULL pointer dereference in 
>>>> smc_ib_is_sg_need_sync()
>>>>     RDMA/rxe: Set queue pair cur_qp_state when being queried
>>>>
>>>>    drivers/infiniband/core/core_priv.h   |  3 ---
>>>>    drivers/infiniband/core/device.c      |  1 +
>>>>    drivers/infiniband/sw/rxe/rxe_verbs.c |  2 ++
>>>>    include/rdma/ib_verbs.h               |  2 ++
>>>>    net/smc/smc_ib.c                      | 10 +++++-----
>>>>    net/smc/smc_pnet.c                    |  6 +-----
>>>>    6 files changed, 11 insertions(+), 13 deletions(-)
>>>>
Liu Jian Aug. 28, 2024, 10:29 a.m. UTC | #6
在 2024/8/27 3:04, Jan Karcher 写道:
> 
> 
> On 24/08/2024 12:04, liujian (CE) wrote:
>>
>>
>> 在 2024/8/21 9:03, Dust Li 写道:
>>> On 2024-08-20 15:16:57, Jan Karcher wrote:
>>>>
>>>>
>>>> On 09/08/2024 10:31, Liu Jian wrote:
>>>>> Make SMC-R can work with rxe devices. This allows us to easily test 
>>>>> and
>>>>> learn the SMC-R protocol without relying on a physical RoCE NIC.
>>>>
>>>> Hi Liu,
>>>>
>>>> sorry for taking quite some time to answer.
>>>>
>>>> Looking into this i cannot accept this series at the given point of 
>>>> time.
>>>>
>>>> FWIU, RXE is mainly for testing and development and i agree that it 
>>>> would be
>>>> a nice thing to have for SMC-R.
>>>> The problem is that there is no clean layer for different RoCE devices
>>>> currently. Adding RXE to it works but isn't clean.
>>>
>>> Hi jan,
>>>
>>>> Also we have no way to do a "test" build which would have such a device
>>>> supported and a "prod" build which would not support it.
>>>  > I don't quite understand what you mean here, Maybe I missed 
>>> something ?
>>> IIUC, we can control whether to use RXE by simpling insmod or rmmod 
>>> rdma_rxe.ko
> 
> Hi,
> 
> Yes that enables RXE in general, but not the use of RXE in SMC.
> >>>
>> Yes, in the "prod" environment, we can completely turn off 
>> CONFIG_RDMA_RXE.
> 
> Same as above + this is a compile time switch that is enabled for 
> distros like rh. Simply disabling it won't work here.
> 
Got it, I misunderstood what you meant above. Thank you.
>>
>>> I believe having RXE support is beneficial for testing, especially in
>>> simple physical networking setups where many corner cases are unlikely
>>> to occur. By using RXE, we can easily configure unusual scenarios with
>>> the existing iptables/netfilter infrastructure to simulate real-world
>>> situations, such as packet dropping or network retransmission. This
>>> approach can be advantageous for finding hidden bugs.
>>>
>> Yes, one of my main original intentions was to make testing smc-r 
>> easier. This change is relatively simple, mainly patch2 and patch4, 
>> and there are no logical changes.
> 
> I agree with you. It would be beneficial for testing.
> This is not a never, this is a not right now.
> 
> If you want to push this forward as something you need now, feel free to 
> encapsulate it and introduce a vendor specific experimental option as 
> defined in the v2.1 protocol version [1] for it. This would be 
> compromise for me at the current time.
> 
Got it, thanks.
> Thanks
> - Jan
> 
> [1] 
> https://www.ibm.com/support/pages/system/files/inline-files/IBM%20Shared%20Memory%20Communications%20Version%202.1%20Emulated-ISM_0.pdf
> >>> Best regards,
>>> Dust
>>>
>>>
>>>>
>>>> Please give us time to investigate how to solve this in a neat way 
>>>> without
>>>> building up to much technical debt.
>>>>
>>>> Thanks for your contribution and making us aware of this area of 
>>>> improvment.
>>>> - Jan
>>>>
>>>>>
>>>>> Liu Jian (4):
>>>>>     rdma/device: export ib_device_get_netdev()
>>>>>     net/smc: use ib_device_get_netdev() helper to get netdev info
>>>>>     net/smc: fix one NULL pointer dereference in 
>>>>> smc_ib_is_sg_need_sync()
>>>>>     RDMA/rxe: Set queue pair cur_qp_state when being queried
>>>>>
>>>>>    drivers/infiniband/core/core_priv.h   |  3 ---
>>>>>    drivers/infiniband/core/device.c      |  1 +
>>>>>    drivers/infiniband/sw/rxe/rxe_verbs.c |  2 ++
>>>>>    include/rdma/ib_verbs.h               |  2 ++
>>>>>    net/smc/smc_ib.c                      | 10 +++++-----
>>>>>    net/smc/smc_pnet.c                    |  6 +-----
>>>>>    6 files changed, 11 insertions(+), 13 deletions(-)
>>>>>