mbox series

[0/3] patches and bug report for rxe

Message ID 20220210073655.42281-1-guoqing.jiang@linux.dev (mailing list archive)
Headers show
Series patches and bug report for rxe | expand

Message

Guoqing Jiang Feb. 10, 2022, 7:36 a.m. UTC
Hi,

Recently, I am trying to run rnbd/rtrs on top of rxe, and get several
calltraces with 5.17-rc3 kernel (CONFIG_PROVE_LOCKING is enabled).

However, seems rnbd/rtrs over rxe still can't work with 5.17-rc3 kernel,
dmesg reports below.

1. server side

[  440.723182] rdma_rxe: qp#17 moved to error state
[  440.725300] rtrs_server L1205: <bla>: remote access error (wr_cqe: 000000003b14397c, type: 0, vendor_err: 0x0, len: 0)
[  440.845926] rnbd_server L256: RTRS Session bla disconnected

2. client side

[  997.817536] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
[  998.968810] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
[  999.017988] rtrs_client L610: <bla>: RDMA failed: remote access error
[ 1029.836943] rtrs_client L353: <bla>: Failed IB_WR_LOCAL_INV: WR flushe    

Then I tried 5.16 and 5.15 version, seems 5.15 does work as follows.

1. server side

[  333.076482] rnbd_server L800: </dev/loop1@bla>: Opened device 'loop1'

2. client side

[ 1584.325825] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
[ 1585.268291] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
[ 1585.349300] rnbd_client L1607: </dev/loop1@bla> map_device: Device mapped as rnbd0 (nsectors: 0, logical_block_size: 512, physical_block_size: 512, max_write_same_sectors: 0, max_discard_sectors: 0, discard_granularity: 0, discard_alignment: 0, secure_discard: 0, max_segments: 128, max_hw_sectors: 248, rotational: 1, wc: 0, fua: 0)

I would appreciate if someone shed light on why it doesn't work after 5.15,
And I am happy to test potential patch for it.

Guoqing Jiang (3):
  RDMA/rxe: Replace write_{lock,unlock}_bh with
    write_lock_irq{save,restore} in __rxe_add_index
  RDMA/rxe: Replace write_{lock,unlock}_bh with
    write_lock_irq{save,restore} in __rxe_drop_index
  RDMA/rxe: Replace write_{lock,unlock}_bh with
    write_lock_irq{save,restore} in post_one_send

 drivers/infiniband/sw/rxe/rxe_pool.c  | 10 ++++++----
 drivers/infiniband/sw/rxe/rxe_verbs.c |  5 +++--
 2 files changed, 9 insertions(+), 6 deletions(-)

Comments

Guoqing Jiang Feb. 22, 2022, 9:50 a.m. UTC | #1
On 2/10/22 3:36 PM, Guoqing Jiang wrote:
> However, seems rnbd/rtrs over rxe still can't work with 5.17-rc3 kernel,
> dmesg reports below.
>
> 1. server side
>
> [  440.723182] rdma_rxe: qp#17 moved to error state
> [  440.725300] rtrs_server L1205: <bla>: remote access error (wr_cqe: 000000003b14397c, type: 0, vendor_err: 0x0, len: 0)
> [  440.845926] rnbd_server L256: RTRS Session bla disconnected
>
> 2. client side
>
> [  997.817536] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
> [  998.968810] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
> [  999.017988] rtrs_client L610: <bla>: RDMA failed: remote access error
> [ 1029.836943] rtrs_client L353: <bla>: Failed IB_WR_LOCAL_INV: WR flushe
>
> Then I tried 5.16 and 5.15 version, seems 5.15 does work as follows.
>
> 1. server side
>
> [  333.076482] rnbd_server L800: </dev/loop1@bla>: Opened device 'loop1'
>
> 2. client side
>
> [ 1584.325825] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
> [ 1585.268291] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
> [ 1585.349300] rnbd_client L1607: </dev/loop1@bla> map_device: Device mapped as rnbd0 (nsectors: 0, logical_block_size: 512, physical_block_size: 512, max_write_same_sectors: 0, max_discard_sectors: 0, discard_granularity: 0, discard_alignment: 0, secure_discard: 0, max_segments: 128, max_hw_sectors: 248, rotational: 1, wc: 0, fua: 0)
>
> I would appreciate if someone shed light on why it doesn't work after 5.15,
> And I am happy to test potential patch for it.

After investigation, seems the culprit is commit 647bf13ce944 ("RDMA/rxe:
Create duplicate mapping tables for FMRs"). The problem is mr_check_range
returns -EFAULT after find iova and length are not valid, so connection 
between
two VMs can't be established.

Revert the commit manually or apply below temporary change,  rxe works again
with rnbd/rtrs though I don't think it is the right thing to do. Could 
experts provide
a proper solution? Thanks.

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c 
b/drivers/infiniband/sw/rxe/rxe_mr.c
index 453ef3c9d535..4a2fc4d5809d 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -652,7 +652,7 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct 
rxe_send_wqe *wqe)
         mr->state = RXE_MR_STATE_VALID;

         set = mr->cur_map_set;
-       mr->cur_map_set = mr->next_map_set;
+       //mr->cur_map_set = mr->next_map_set;
         mr->cur_map_set->iova = wqe->wr.wr.reg.mr->iova;
         mr->next_map_set = set;

@@ -662,7 +662,7 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct 
rxe_send_wqe *wqe)
  int rxe_mr_set_page(struct ib_mr *ibmr, u64 addr)
  {
         struct rxe_mr *mr = to_rmr(ibmr);
-       struct rxe_map_set *set = mr->next_map_set;
+       struct rxe_map_set *set = mr->cur_map_set;
         struct rxe_map *map;
         struct rxe_phys_buf *buf;

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c 
b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 80df9a8f71a1..e41d2c8612d8 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -992,7 +992,7 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct 
scatterlist *sg,
                          int sg_nents, unsigned int *sg_offset)
  {
         struct rxe_mr *mr = to_rmr(ibmr);
-       struct rxe_map_set *set = mr->next_map_set;
+       struct rxe_map_set *set = mr->cur_map_set;

And the test is pretty simple.

1.  VM (server)

modprobe rdma_rxe
rdma link add rxe0 type rxe netdev ens3
modprobe rnbd-server

2.  VM (client)

modprobe rdma_rxe
rdma link add rxe0 type rxe netdev ens3
modprobe rnbd-client
echo "sessname=bla path=ip:$serverip 
device_path=$block_device_in_server" > 
/sys/devices/virtual/rnbd-client/ctl/map_device

BTW, I tried wip/jgg-for-next branch with commit 3810c1a1cbe8f.

Guoqing
Zhu Yanjun Feb. 22, 2022, 10:04 a.m. UTC | #2
On Tue, Feb 22, 2022 at 5:50 PM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
>
>
> On 2/10/22 3:36 PM, Guoqing Jiang wrote:
> > However, seems rnbd/rtrs over rxe still can't work with 5.17-rc3 kernel,
> > dmesg reports below.
> >
> > 1. server side
> >
> > [  440.723182] rdma_rxe: qp#17 moved to error state
> > [  440.725300] rtrs_server L1205: <bla>: remote access error (wr_cqe: 000000003b14397c, type: 0, vendor_err: 0x0, len: 0)
> > [  440.845926] rnbd_server L256: RTRS Session bla disconnected
> >
> > 2. client side
> >
> > [  997.817536] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
> > [  998.968810] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
> > [  999.017988] rtrs_client L610: <bla>: RDMA failed: remote access error
> > [ 1029.836943] rtrs_client L353: <bla>: Failed IB_WR_LOCAL_INV: WR flushe
> >
> > Then I tried 5.16 and 5.15 version, seems 5.15 does work as follows.
> >
> > 1. server side
> >
> > [  333.076482] rnbd_server L800: </dev/loop1@bla>: Opened device 'loop1'
> >
> > 2. client side
> >
> > [ 1584.325825] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
> > [ 1585.268291] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
> > [ 1585.349300] rnbd_client L1607: </dev/loop1@bla> map_device: Device mapped as rnbd0 (nsectors: 0, logical_block_size: 512, physical_block_size: 512, max_write_same_sectors: 0, max_discard_sectors: 0, discard_granularity: 0, discard_alignment: 0, secure_discard: 0, max_segments: 128, max_hw_sectors: 248, rotational: 1, wc: 0, fua: 0)
> >
> > I would appreciate if someone shed light on why it doesn't work after 5.15,
> > And I am happy to test potential patch for it.
>
> After investigation, seems the culprit is commit 647bf13ce944 ("RDMA/rxe:
> Create duplicate mapping tables for FMRs"). The problem is mr_check_range
> returns -EFAULT after find iova and length are not valid, so connection
> between
> two VMs can't be established.
>
> Revert the commit manually or apply below temporary change,  rxe works again
> with rnbd/rtrs though I don't think it is the right thing to do. Could
> experts provide
> a proper solution? Thanks.
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c
> b/drivers/infiniband/sw/rxe/rxe_mr.c
> index 453ef3c9d535..4a2fc4d5809d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -652,7 +652,7 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct
> rxe_send_wqe *wqe)
>          mr->state = RXE_MR_STATE_VALID;
>
>          set = mr->cur_map_set;
> -       mr->cur_map_set = mr->next_map_set;
> +       //mr->cur_map_set = mr->next_map_set;
>          mr->cur_map_set->iova = wqe->wr.wr.reg.mr->iova;
>          mr->next_map_set = set;
>
> @@ -662,7 +662,7 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct
> rxe_send_wqe *wqe)
>   int rxe_mr_set_page(struct ib_mr *ibmr, u64 addr)
>   {
>          struct rxe_mr *mr = to_rmr(ibmr);
> -       struct rxe_map_set *set = mr->next_map_set;
> +       struct rxe_map_set *set = mr->cur_map_set;
>          struct rxe_map *map;
>          struct rxe_phys_buf *buf;
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c
> b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 80df9a8f71a1..e41d2c8612d8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -992,7 +992,7 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct
> scatterlist *sg,
>                           int sg_nents, unsigned int *sg_offset)
>   {
>          struct rxe_mr *mr = to_rmr(ibmr);
> -       struct rxe_map_set *set = mr->next_map_set;
> +       struct rxe_map_set *set = mr->cur_map_set;

Thanks a lot. Please file a patch for the above changes.

Zhu Yanjun

>
> And the test is pretty simple.
>
> 1.  VM (server)
>
> modprobe rdma_rxe
> rdma link add rxe0 type rxe netdev ens3
> modprobe rnbd-server
>
> 2.  VM (client)
>
> modprobe rdma_rxe
> rdma link add rxe0 type rxe netdev ens3
> modprobe rnbd-client
> echo "sessname=bla path=ip:$serverip
> device_path=$block_device_in_server" >
> /sys/devices/virtual/rnbd-client/ctl/map_device
>
> BTW, I tried wip/jgg-for-next branch with commit 3810c1a1cbe8f.
>
> Guoqing
Pearson, Robert B Feb. 22, 2022, 4:58 p.m. UTC | #3
>
> After investigation, seems the culprit is commit 647bf13ce944 ("RDMA/rxe:
> Create duplicate mapping tables for FMRs"). The problem is 
> mr_check_range returns -EFAULT after find iova and length are not 
> valid, so connection between two VMs can't be established.
>
> Revert the commit manually or apply below temporary change,  rxe works 
> again with rnbd/rtrs though I don't think it is the right thing to do. 
> Could experts provide a proper solution? Thanks.
>
This patch fixed failures in blktests and srp which were discussed at length. See e.g.

https://lore.kernel.org/linux-rdma/20210907163939.GW1200268@ziepe.ca/

and related messages. The conclusion was that two mappings were required. One owned by the
driver and one by the 'hardware', i.e. bottom half in the rxe case, allowing updating a new mapping
while the old one is still active and then switching them.

If this case has iova and length not valid as indicated is there a problem with the test case?

Bob
Guoqing Jiang Feb. 23, 2022, 4:43 a.m. UTC | #4
On 2/23/22 12:58 AM, Pearson, Robert B wrote:
>> After investigation, seems the culprit is commit 647bf13ce944 ("RDMA/rxe:
>> Create duplicate mapping tables for FMRs"). The problem is
>> mr_check_range returns -EFAULT after find iova and length are not
>> valid, so connection between two VMs can't be established.
>>
>> Revert the commit manually or apply below temporary change,  rxe works
>> again with rnbd/rtrs though I don't think it is the right thing to do.
>> Could experts provide a proper solution? Thanks.
>>
> This patch fixed failures in blktests and srp which were discussed at length. See e.g.
>
> https://lore.kernel.org/linux-rdma/20210907163939.GW1200268@ziepe.ca/

Thanks for the link, which reminds me the always_invalidate parameter in 
rtrs_server.

> and related messages. The conclusion was that two mappings were required. One owned by the
> driver and one by the 'hardware', i.e. bottom half in the rxe case, allowing updating a new mapping
> while the old one is still active and then switching them.
>
> If this case has iova and length not valid as indicated is there a problem with the test case?

And after disable always_invalidate (which is enabled by default), 
rnbd/rtrs over roce
works either. So I suppose there might be potential issue for 
always_invalidate=Y in
rtrs server side since invalidate works for srp IIUC, @Jinpu.

Thanks,
Guoqing
Bob Pearson Feb. 23, 2022, 5:01 a.m. UTC | #5
On 2/22/22 22:43, Guoqing Jiang wrote:
> 
> 
> On 2/23/22 12:58 AM, Pearson, Robert B wrote:
>>> After investigation, seems the culprit is commit 647bf13ce944 ("RDMA/rxe:
>>> Create duplicate mapping tables for FMRs"). The problem is
>>> mr_check_range returns -EFAULT after find iova and length are not
>>> valid, so connection between two VMs can't be established.
>>>
>>> Revert the commit manually or apply below temporary change,  rxe works
>>> again with rnbd/rtrs though I don't think it is the right thing to do.
>>> Could experts provide a proper solution? Thanks.
>>>
>> This patch fixed failures in blktests and srp which were discussed at length. See e.g.
>>
>> https://lore.kernel.org/linux-rdma/20210907163939.GW1200268@ziepe.ca/
> 
> Thanks for the link, which reminds me the always_invalidate parameter in rtrs_server.
> 
>> and related messages. The conclusion was that two mappings were required. One owned by the
>> driver and one by the 'hardware', i.e. bottom half in the rxe case, allowing updating a new mapping
>> while the old one is still active and then switching them.
>>
>> If this case has iova and length not valid as indicated is there a problem with the test case?
> 
> And after disable always_invalidate (which is enabled by default), rnbd/rtrs over roce
> works either. So I suppose there might be potential issue for always_invalidate=Y in
> rtrs server side since invalidate works for srp IIUC, @Jinpu.
> 
> Thanks,
> Guoqing

It would help to understand what you are running. My main concern is that mr_check_range
shouldn't fail unless there is something very wrong.
Guoqing Jiang Feb. 23, 2022, 5:50 a.m. UTC | #6
On 2/23/22 1:01 PM, Bob Pearson wrote:
> On 2/22/22 22:43, Guoqing Jiang wrote:
>>
>> On 2/23/22 12:58 AM, Pearson, Robert B wrote:
>>>> After investigation, seems the culprit is commit 647bf13ce944 ("RDMA/rxe:
>>>> Create duplicate mapping tables for FMRs"). The problem is
>>>> mr_check_range returns -EFAULT after find iova and length are not
>>>> valid, so connection between two VMs can't be established.
>>>>
>>>> Revert the commit manually or apply below temporary change,  rxe works
>>>> again with rnbd/rtrs though I don't think it is the right thing to do.
>>>> Could experts provide a proper solution? Thanks.
>>>>
>>> This patch fixed failures in blktests and srp which were discussed at length. See e.g.
>>>
>>> https://lore.kernel.org/linux-rdma/20210907163939.GW1200268@ziepe.ca/
>> Thanks for the link, which reminds me the always_invalidate parameter in rtrs_server.
>>
>>> and related messages. The conclusion was that two mappings were required. One owned by the
>>> driver and one by the 'hardware', i.e. bottom half in the rxe case, allowing updating a new mapping
>>> while the old one is still active and then switching them.
>>>
>>> If this case has iova and length not valid as indicated is there a problem with the test case?
>> And after disable always_invalidate (which is enabled by default), rnbd/rtrs over roce
>> works either. So I suppose there might be potential issue for always_invalidate=Y in
>> rtrs server side since invalidate works for srp IIUC, @Jinpu.
>>
>> Thanks,
>> Guoqing
> It would help to understand what you are running. My main concern is that mr_check_range
> shouldn't fail unless there is something very wrong.

Let me paste it again, I just try to map server block device to client 
side, specifically the bold
line.

1.  VM (server)

modprobe rdma_rxe
rdma link add rxe0 type rxe netdev ens3
modprobe rnbd-server

2.  VM (client)

modprobe rdma_rxe
rdma link add rxe0 type rxe netdev ens3
modprobe rnbd-client
*echo "sessname=bla path=ip:$serverip 
device_path=$block_device_in_server" > 
/sys/devices/virtual/rnbd-client/ctl/map_device*



Thanks,
Guoqing