Message ID | 8992bd28-667f-94b1-e582-106e6b41aa4b@sandisk.com |
---|---|
State | Accepted |
----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Doug Ledford" <dledford@redhat.com>
> Cc: "Max Gurtovoy" <maxg@mellanox.com>, "Sagi Grimberg" <sagi@grimberg.me>, "Leon Romanovsky" <leonro@mellanox.com>,
> "Israel Rukshin" <israelr@mellanox.com>, "Laurence Oberman" <loberman@redhat.com>, linux-rdma@vger.kernel.org
> Sent: Monday, April 24, 2017 6:15:28 PM
> Subject: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>
> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> than what fits into a single MR. .map_mr_sg() must not attempt to
> map more SG-list elements than what fits into a single MR.
> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> the MR klms[] array.
>
> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support")
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Leon Romanovsky <leonro@mellanox.com>
> Cc: Israel Rukshin <israelr@mellanox.com>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/infiniband/hw/mlx5/mr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/mr.c
> b/drivers/infiniband/hw/mlx5/mr.c
> index d9c6c0ea750b..99beacfc4716 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1777,7 +1777,7 @@ mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
>  	mr->ndescs = sg_nents;
>
>  	for_each_sg(sgl, sg, sg_nents, i) {
> -		if (unlikely(i > mr->max_descs))
> +		if (unlikely(i >= mr->max_descs))
>  			break;
>  		klms[i].va = cpu_to_be64(sg_dma_address(sg) + sg_offset);
>  		klms[i].bcount = cpu_to_be32(sg_dma_len(sg) - sg_offset);
> --
> 2.12.2
>
>

Thanks Bart as always.
Will get this tested this week,

Regards
Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On Mon, 2017-04-24 at 18:39 -0400, Laurence Oberman wrote:
> Will get this tested this week,
Thanks Laurence. BTW, if you want to test this patch with the SRP protocol
you will also have to revert commit d6c58dc40fec ("IB/SRP: Avoid using
IB_MR_TYPE_SG_GAPS"). The code path touched by this patch is only
relevant for IB_MR_TYPE_SG_GAPS memory regions. Currently the SRP initiator
driver does not use that MR type. Reverting the aforementioned commit will
make the SRP initiator driver use that MR type.
Please also apply Sagi's "mlx5: Fix mlx5_ib_map_mr_sg mr length" patch
before starting any tests.
Thanks,
Bart.
----- Original Message -----
> From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
> To: loberman@redhat.com
> Cc: maxg@mellanox.com, israelr@mellanox.com, leonro@mellanox.com, linux-rdma@vger.kernel.org, dledford@redhat.com,
> sagi@grimberg.me
> Sent: Monday, April 24, 2017 6:46:30 PM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>
> [...]

Understood

Regards
Laurence
On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote:
> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
> than what fits into a single MR. .map_mr_sg() must not attempt to
> map more SG-list elements than what fits into a single MR.
> Hence make sure that mlx5_ib_sg_to_klms() does not write outside
> the MR klms[] array.
>
> [...]

Bart,

Thanks a lot, it indeed looks right.
Acked-by: Leon Romanovsky <leonro@mellanox.com>

Thanks
----- Original Message -----
> From: "Leon Romanovsky" <leonro@mellanox.com>
> To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> Cc: "Doug Ledford" <dledford@redhat.com>, "Max Gurtovoy" <maxg@mellanox.com>, "Sagi Grimberg" <sagi@grimberg.me>,
> "Israel Rukshin" <israelr@mellanox.com>, "Laurence Oberman" <loberman@redhat.com>, linux-rdma@vger.kernel.org
> Sent: Tuesday, April 25, 2017 1:58:49 PM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>
> [...]
> Acked-by: Leon Romanovsky <leonro@mellanox.com>

Hello Bart, Leon, Max and Israel.

I cloned off Bart's tree:

git clone https://github.com/bvanassche/linux
cd linux
git checkout block-scsi-for-next

I checked that all patches were in for this test:

a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt

Built and tested the kernel.
However this issue is not resolved :(

[ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817edca86b0
[ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
[ 2708.121342] 00000000 00000000 00000000 00000000
[ 2708.147104] 00000000 00000000 00000000 00000000
[ 2708.172633] 00000000 00000000 00000000 00000000
[ 2708.198702] 00000000 0f007806 2500002a 14a527d0
[ 2732.434127] scsi host1: ib_srp: reconnect succeeded
[ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9c30
[root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
[ 2746.443240] 00000000 00000000 00000000 00000000
[ 2746.469323] 00000000 00000000 00000000 00000000
[ 2746.495310] 00000000 00000000 00000000 00000000
[ 2746.521407] 00000000 0f007806 25000032 003c7ad0
[ 2752.445899] scsi host1: ib_srp: reconnect succeeded
[ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0
[ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe
[ 2763.297826] 00000000 00000000 00000000 00000000
[ 2763.323352] 00000000 00000000 00000000 00000000
[ 2763.348722] 00000000 00000000 00000000 00000000
[ 2763.374681] 00000000 0f007806 2500003a 00084bd0
[root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP port-1:1 / host1.
[ 2769.415956] scsi host1: ib_srp: reconnect succeeded
[ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0
[ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
[ 2780.093520] 00000000 00000000 00000000 00000000
[ 2780.120067] 00000000 00000000 00000000 00000000
[ 2780.145575] 00000000 00000000 00000000 00000000
[ 2780.171153] 00000000 0f007806 25000042 000833d0
[ 2785.923399] scsi host1: ib_srp: reconnect succeeded
[ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0
[ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe
[ 2796.495257] 00000000 00000000 00000000 00000000
[ 2796.521506] 00000000 00000000 00000000 00000000
[ 2796.547640] 00000000 00000000 00000000 00000000
[ 2796.573120] 00000000 0f007806 2500004a 00083bd0
[ 2802.562578] scsi host1: ib_srp: reconnect succeeded
[ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9cf0

Regards
Laurence
On Tue, 2017-04-25 at 16:37 -0400, Laurence Oberman wrote:
> Hello Bart, Leon, Max and Israel.
>
> [...]
>
> However this issue is not resolved :(
>
> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817edca86b0
> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> [...]
> [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
> [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9c30

Hello Laurence,

Thank you for having run this test. But are you aware that a flush error
reported at the initiator side does not necessarily mean that there is a
bug at the initiator side? If, e.g., the target system initiated a
disconnect, that would also trigger this kind of flush error. What kind of
SRP target system was used in this test? Were the clocks of initiator and
target system synchronized? Are the logs of the target system available? If
so, can you have a look whether anything interesting can be found in the
target log around the time the initiator reported the flush error?

Thanks,

Bart.
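Bart's point can be made concrete: once a QP enters the error state, every work request still outstanding on it completes with a flush status, whatever put the QP into that state. The sketch below is a hypothetical helper, not code from ib_srp or mlx5: it scans completion statuses, skipping successes and flushes, to find the first completion that reports something else. The numeric values are an assumption that they mirror the ordering of enum ib_wc_status in include/rdma/ib_verbs.h, under which 5 is IB_WC_WR_FLUSH_ERR (the "(5)" in the initiator log) and 6 is IB_WC_MW_BIND_ERR. It also assumes a single sequence already in completion order, which console output from separate completion queues does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of work completion statuses; values assumed to mirror
 * enum ib_wc_status in include/rdma/ib_verbs.h. */
enum wc_status {
	WC_SUCCESS = 0,
	WC_WR_FLUSH_ERR = 5,
	WC_MW_BIND_ERR = 6,
};

/*
 * Hypothetical helper: return the index of the first completion whose
 * status is neither success nor a flush, or -1 if every failure was a
 * flush. Flushed work requests only say that the QP entered the error
 * state; they do not identify what caused it.
 */
static long first_non_flush_error(const int *status, size_t n)
{
	size_t i;

	assert(status != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (status[i] != WC_SUCCESS && status[i] != WC_WR_FLUSH_ERR)
			return (long)i;
	return -1;
}
```

For a made-up sequence {success, flush, MW bind error, flush} the helper returns index 2, pointing at the bind failure rather than the first flush.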
On Tue, Apr 25, 2017 at 04:37:35PM -0400, Laurence Oberman wrote:
> [...]
>
> However this issue is not resolved :(
>
> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817edca86b0
> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> [ 2708.121342] 00000000 00000000 00000000 00000000
> [ 2708.147104] 00000000 00000000 00000000 00000000
> [ 2708.172633] 00000000 00000000 00000000 00000000
> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0

Parsed version:
hw_error_syndrome     : 0xf
hw_syndrome_type      : 0x0
vendor_error_syndrome : 0x78
syndrome              : MEMORY_WINDOW_BIND_ERROR (0x6)
s_wqe_opcode          : UMR (0x25)
opcode                : REQUESTOR_ERROR (0xd)
cqe_format            : NO_INLINE_DATA (0x0)
owner                 : 0x0

Description:
umr.klm_octoword_count > mkey.mtt_octoword_count

Sagi, Max,
Any idea where it can be?

Thanks
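Leon's parse can be reproduced from the last line of the dump. The sketch below assumes the byte layout his output implies for the tail of the 64-byte error CQE (hardware syndrome, syndrome type, vendor syndrome and syndrome in bytes 52 to 55; WQE opcode and QP number in bytes 56 to 59; WQE counter, signature and op_own in bytes 60 to 63) and that dump_cqe prints the CQE as big-endian 32-bit words, four per line. The struct and function names are hypothetical; the driver's own error-CQE definition is the authoritative layout. The QPN and WQE counter fields are decoded under the same assumption but do not appear in Leon's parse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields in Leon's parse (plus QPN and
 * WQE counter), decoded from bytes 48..63 of a 64-byte error CQE. */
struct err_cqe_tail {
	uint8_t hw_err_synd;     /* byte 52 */
	uint8_t hw_synd_type;    /* byte 53, taken as a whole byte here */
	uint8_t vendor_err_synd; /* byte 54 */
	uint8_t syndrome;        /* byte 55 */
	uint8_t s_wqe_opcode;    /* byte 56 */
	uint32_t qpn;            /* bytes 57..59 */
	uint16_t wqe_counter;    /* bytes 60..61 */
	uint8_t signature;       /* byte 62 */
	uint8_t opcode;          /* byte 63, bits 7..4 */
	uint8_t cqe_format;      /* byte 63, bits 3..2 */
	uint8_t owner;           /* byte 63, bit 0 */
};

/* w[0..3] are the four words of the last dump_cqe line, as printed. */
static struct err_cqe_tail decode_err_cqe_tail(const uint32_t w[4])
{
	struct err_cqe_tail t;
	uint8_t op_own;

	assert(w != NULL);
	t.hw_err_synd = (uint8_t)(w[1] >> 24);
	t.hw_synd_type = (uint8_t)(w[1] >> 16);
	t.vendor_err_synd = (uint8_t)(w[1] >> 8);
	t.syndrome = (uint8_t)w[1];
	t.s_wqe_opcode = (uint8_t)(w[2] >> 24);
	t.qpn = w[2] & 0xffffffu;
	t.wqe_counter = (uint16_t)(w[3] >> 16);
	t.signature = (uint8_t)(w[3] >> 8);
	op_own = (uint8_t)w[3];
	t.opcode = op_own >> 4;
	t.cqe_format = (op_own >> 2) & 0x3;
	t.owner = op_own & 0x1;
	return t;
}
```

Feeding it the words 00000000 0f007806 2500002a 14a527d0 gives syndrome 0x6, vendor syndrome 0x78, WQE opcode 0x25 and CQE opcode 0xd, matching the parse above.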
On 4/26/2017 9:16 AM, Leon Romanovsky wrote:
> On Tue, Apr 25, 2017 at 04:37:35PM -0400, Laurence Oberman wrote:
>> [...]
>> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
>
> Parsed version:
> hw_error_syndrome     : 0xf
> hw_syndrome_type      : 0x0
> vendor_error_syndrome : 0x78
> syndrome              : MEMORY_WINDOW_BIND_ERROR (0x6)
> s_wqe_opcode          : UMR (0x25)
> opcode                : REQUESTOR_ERROR (0xd)
> cqe_format            : NO_INLINE_DATA (0x0)
> owner                 : 0x0
>
> Description:
> umr.klm_octoword_count > mkey.mtt_octoword_count
>
> Sagi, Max,
> Any idea where it can be?

Sagi,

I see this code in drivers/infiniband/hw/mlx5/mr.c:

"
...
	else if (mr_type == IB_MR_TYPE_SG_GAPS) {
		mr->access_mode = MLX5_MKC_ACCESS_MODE_KLMS;

		err = mlx5_alloc_priv_descs(pd->device, mr, ndescs, sizeof(struct mlx5_klm));
		if (err)
			goto err_free_in;
		mr->desc_size = sizeof(struct mlx5_klm);
		mr->max_descs = ndescs;
"

while in the past it was:

"
	} else if (mr_type == IB_MR_INDIRECT_REG) {
		MLX5_SET(mkc, mkc, translations_octword_size, ALIGN(max_num_sg + 1, 4));
		mr->access_mode = MLX5_MKC_ACCESS_MODE_KLMS | MLX5_PERM_UMR_EN;
		mr->max_descs = ndescs;
"

In INDIRECT_REG it was + 1... maybe this is the issue?

Max.
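Max's question turns on a small piece of arithmetic. A struct mlx5_klm entry is 16 bytes (32-bit byte count, 32-bit key, 64-bit address), i.e. one octoword, so a translation table sized in octowords holds one KLM per octoword. The sketch below compares the old IB_MR_INDIRECT_REG sizing, ALIGN(max_num_sg + 1, 4), with the same rounding without the + 1. The second function is only a hypothetical comparison point: the SG_GAPS hunk Max quotes does not show how translations_octword_size is set for that MR type. align_up() is an assumed user-space equivalent of the kernel's ALIGN() for power-of-two alignments.

```c
#include <assert.h>

/* Round x up to a multiple of a, where a is a power of two; this mirrors
 * what the kernel's ALIGN(x, a) computes for such alignments. */
static unsigned int align_up(unsigned int x, unsigned int a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Octowords reserved by the old IB_MR_INDIRECT_REG path: one per KLM,
 * plus one spare, rounded up to a multiple of 4. */
static unsigned int old_indirect_octowords(unsigned int max_num_sg)
{
	return align_up(max_num_sg + 1, 4);
}

/* Hypothetical comparison point: the same rounding without the "+ 1". */
static unsigned int unpadded_octowords(unsigned int max_num_sg)
{
	return align_up(max_num_sg, 4);
}
```

The two sizes differ only when max_num_sg is a multiple of 4 (128 gives 132 versus 128); for other values the rounding already leaves at least one spare slot. Whether a missing slot explains umr.klm_octoword_count > mkey.mtt_octoword_count is exactly what Max is asking, and the sketch does not settle it.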
----- Original Message -----
> From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
> To: leonro@mellanox.com, loberman@redhat.com
> Cc: maxg@mellanox.com, israelr@mellanox.com, linux-rdma@vger.kernel.org, dledford@redhat.com, sagi@grimberg.me
> Sent: Tuesday, April 25, 2017 11:39:12 PM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>
> [...]
>
> Hello Laurence,
>
> Thank you for having run this test. But are you aware that if a flush error
> is reported at the initiator side that does not necessarily mean that there
> is a bug at the initiator side? [...] Are the logs of the target system
> available? If so, can you have a look whether anything interesting can be
> found in the target log around the time the initiator reported the flush
> error?
>
> Thanks,
>
> Bart.

Hi Bart

It's the same target that is stable for all other tests.
This is the same issue I originally reported, when we then reverted the SG+GAPS.
Remember, when I reverted that we were stable again.

This happens on the initiator first:

[root@localhost ~]# [ 512.375904] mlx5_0:dump_cqe:262:(pid 4653): dump error cqe
[ 512.376648] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817c596f770
[ 512.454276] 00000000 00000000 00000000 00000000
[ 512.478734] 00000000 00000000 00000000 00000000
[ 512.504170] 00000000 00000000 00000000 00000000
[ 512.529457] 00000000 0f007806 2500002a 0548e2d0
[ 532.128455] scsi host2: ib_srp: reconnect succeeded
[ 532.232126] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bf2bb3bf0
[ 532.780107] mlx5_0:dump_cqe:262:(pid 511): dump error cqe
[ 532.811863] 00000000 00000000 00000000 00000000
[ 532.837984] 00000000 00000000 00000000 00000000
[ 532.863955] 00000000 00000000 00000000 00000000
[ 532.889885] 00000000 0f007806 25000032 00683bd0

Only afterwards do I see the target complain:

[root@fedstorage ~]# [ 537.105985] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-48.
[ 537.152767] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-47.
[ 537.200585] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-46.
[ 537.247864] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-45.
[ 537.296822] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-44.
[ 537.345001] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-43.
[ 537.394146] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-42.
[ 537.442148] ib_srpt Received CM TimeWait exit for ch 0x4e6e72000390fe7c7cfe900300726ed2-41.
[ 537.490011] ib_srpt sending response for ioctx 0xffff8800951ed800 failed with status 5
[ 539.774018] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[ 539.887987] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[ 540.001241] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[ 540.111455] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[ 540.224780] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[ 540.340522] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[ 540.453736] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)
[ 540.567043] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x4e6e72000390fe7c:0x7cfe900300726ed2, t_port_id 0x7cfe900300726e4e:0x7cfe900300726e4e and it_iu_len 4148 on port 1 (guid=0xfe80000000000000:0x7cfe900300726e4e)

Thanks
Laurence
Looks good Bart,
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
On Wed, 2017-04-26 at 07:46 -0400, Laurence Oberman wrote:
> It's the same target that is stable for all other tests.
> This is the same issue I originally reported, when we then reverted the SG+GAPS.
> Remember, when I reverted that we were stable again.
>
> This happens on the initiator first
>
> [...]
>
> Only afterwards do I see the target complain
>
> [...]

Thanks Laurence. I think this confirms that we have to continue analyzing
the initiator side.

Bart.
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index d9c6c0ea750b..99beacfc4716 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1777,7 +1777,7 @@ mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
 	mr->ndescs = sg_nents;
 
 	for_each_sg(sgl, sg, sg_nents, i) {
-		if (unlikely(i > mr->max_descs))
+		if (unlikely(i >= mr->max_descs))
 			break;
 		klms[i].va = cpu_to_be64(sg_dma_address(sg) + sg_offset);
 		klms[i].bcount = cpu_to_be32(sg_dma_len(sg) - sg_offset);
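The off-by-one fixed by this patch can be reproduced outside the kernel. An array with max_descs entries has valid indices 0 to max_descs - 1, so the old test (i > max_descs) lets one extra iteration through and writes klms[max_descs]. The sketch below is a user-space model, not the driver code: struct klm and fill_klms() are hypothetical stand-ins for struct mlx5_klm and the loop in mlx5_ib_sg_to_klms(). It ignores the sg_offset adjustment, and the buggy flag selects between the pre-patch and the fixed comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mlx5_klm: only the fields the loop fills. */
struct klm {
	unsigned long long va;
	unsigned int bcount;
};

/*
 * Model of the bounds check in mlx5_ib_sg_to_klms(): copy up to max_descs
 * (address, length) pairs into klms[]. If buggy is nonzero, the pre-patch
 * comparison (i > max_descs) is used, otherwise the fixed one
 * (i >= max_descs). Returns the number of entries written.
 */
static size_t fill_klms(struct klm *klms, size_t max_descs,
			const unsigned long long *addr,
			const unsigned int *len, size_t nents, int buggy)
{
	size_t i;

	assert(klms != NULL);
	for (i = 0; i < nents; i++) {
		if (buggy ? i > max_descs : i >= max_descs)
			break;
		klms[i].va = addr[i];
		klms[i].bcount = len[i];
	}
	return i;
}
```

To observe the overrun safely, give the array one extra guard entry: with max_descs = 4 and six SG entries, the fixed check stops after 4 entries and leaves klms[4] untouched, while the old check writes 5 entries and overwrites klms[4].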