diff mbox

v4.10-rc SRP + mlx5 regression

Message ID 568916592.30910570.1487038794766.JavaMail.zimbra@redhat.com (mailing list archive)
State Superseded
Headers show

Commit Message

Laurence Oberman Feb. 14, 2017, 2:19 a.m. UTC
----- Original Message -----
> From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
> To: leon@kernel.org, loberman@redhat.com
> Cc: hch@lst.de, maxg@mellanox.com, israelr@mellanox.com, linux-rdma@vger.kernel.org, dledford@redhat.com
> Sent: Monday, February 13, 2017 4:52:28 PM
> Subject: Re: v4.10-rc SRP + mlx5 regression
> 
> On Mon, 2017-02-13 at 16:46 -0500, Laurence Oberman wrote:
> > I will have to run through this again and see where the bisect went wrong.
> 
> Hello Laurence,
> 
> If you would be considering to repeat the bisect, did you know that a bisect
> can be sped up by specifying the names of the files and/or directories that
> are suspected? An example:
> 
> git bisect start */infiniband */net
> 
> Bart.--
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Hello Bart, 

Much better news this time :), worked late on this but got it figured out.

OK, so we got to this one, which makes a lot more sense and is right in the area where we are having issues.
I must have answered wrong to one of the steps the first time I did the bisect.

Reverted this in the master tree of rc8 and rebuilt the kernel
Now all tests pass on Linus's tree - 4.10.0_rc8+

The interesting point here is that this commit is in rc5 but rc5 was not failing so we have an interoperability issue with this commit


[loberman@ibclient linux]$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[ad8e66b4a80182174f73487ed25fd2140cf43361] IB/srp: fix mr allocation when the device supports sg gaps

[loberman@ibclient linux]$ git show ad8e66b4a80182174f73487ed25fd2140cf43361
commit ad8e66b4a80182174f73487ed25fd2140cf43361
Author: Israel Rukshin <israelr@mellanox.com>
Date:   Wed Dec 28 12:48:28 2016 +0200

    IB/srp: fix mr allocation when the device supports sg gaps
    
    If the device support arbitrary sg list mapping (device cap
    IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
    IB_MR_TYPE_SG_GAPS.
    
    Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
    Cc: <stable@vger.kernel.org> # 4.7+
    Signed-off-by: Israel Rukshin <israelr@mellanox.com>
    Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    Reviewed-by: Mark Bloch <markb@mellanox.com>
    Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
    Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>




Now moving on to what got me here in the first place.
Bart, let me know if the 7 of the 8 patches in your most recent series are all still valid after this revert 
Otherwise let me know which ones you want me to apply.

patch 6 - I am thinking i sno longer valid.
"
If a HCA supports the SG_GAPS_REG feature then a single memory
region of type IB_MR_TYPE_SG_GAPS is sufficient. This patch
reduces the number of memory regions that is allocated per SRP
session.
"

Thanks
Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Leon Romanovsky Feb. 14, 2017, 6:39 a.m. UTC | #1
On Mon, Feb 13, 2017 at 09:19:54PM -0500, Laurence Oberman wrote:
>
>
> ----- Original Message -----
> > From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
> > To: leon@kernel.org, loberman@redhat.com
> > Cc: hch@lst.de, maxg@mellanox.com, israelr@mellanox.com, linux-rdma@vger.kernel.org, dledford@redhat.com
> > Sent: Monday, February 13, 2017 4:52:28 PM
> > Subject: Re: v4.10-rc SRP + mlx5 regression
> >
> > On Mon, 2017-02-13 at 16:46 -0500, Laurence Oberman wrote:
> > > I will have to run through this again and see where the bisect went wrong.
> >
> > Hello Laurence,
> >
> > If you would be considering to repeat the bisect, did you know that a bisect
> > can be sped up by specifying the names of the files and/or directories that
> > are suspected? An example:
> >
> > git bisect start */infiniband */net
> >
> > Bart.--
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
>
> Hello Bart,
>
> Much better news this time :), worked late on this but got it figured out.
>
> OK, so we got to this one, which makes a lot more sense and is right in the area where we are having issues.
> I must have answered wrong to one of the steps the first time I did the bisect.
>
> Reverted this in the master tree of rc8 and rebuilt the kernel
> Now all tests pass on Linus's tree - 4.10.0_rc8+
>
> The interesting point here is that this commit is in rc5 but rc5 was not failing so we have an interoperability issue with this commit
>
>
> [loberman@ibclient linux]$ git bisect good
> Bisecting: 0 revisions left to test after this (roughly 1 step)
> [ad8e66b4a80182174f73487ed25fd2140cf43361] IB/srp: fix mr allocation when the device supports sg gaps
>
> [loberman@ibclient linux]$ git show ad8e66b4a80182174f73487ed25fd2140cf43361
> commit ad8e66b4a80182174f73487ed25fd2140cf43361
> Author: Israel Rukshin <israelr@mellanox.com>
> Date:   Wed Dec 28 12:48:28 2016 +0200
>
>     IB/srp: fix mr allocation when the device supports sg gaps
>
>     If the device support arbitrary sg list mapping (device cap
>     IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
>     IB_MR_TYPE_SG_GAPS.
>
>     Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
>     Cc: <stable@vger.kernel.org> # 4.7+
>     Signed-off-by: Israel Rukshin <israelr@mellanox.com>
>     Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>     Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
>     Reviewed-by: Mark Bloch <markb@mellanox.com>
>     Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
>     Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
>     Signed-off-by: Doug Ledford <dledford@redhat.com>
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 8ddc071..0f67cf9 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -371,6 +371,7 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
>         struct srp_fr_desc *d;
>         struct ib_mr *mr;
>         int i, ret = -EINVAL;
> +       enum ib_mr_type mr_type;
>
>         if (pool_size <= 0)
>                 goto err;
> @@ -384,9 +385,13 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
>         spin_lock_init(&pool->lock);
>         INIT_LIST_HEAD(&pool->free_list);
>
> +       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
> +               mr_type = IB_MR_TYPE_SG_GAPS;
> +       else
> +               mr_type = IB_MR_TYPE_MEM_REG;
> +
>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> -               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
> -                                max_page_list_len);
> +               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);

First, ib_alloc_mr receives u32 as a third parameter, but int was
supplied. Second (I can be wrong here), shouldn't max_page_list_len be
replaced with max_fast_reg_page_list_len?

Thanks

>                 if (IS_ERR(mr)) {
>                         ret = PTR_ERR(mr);
>                         if (ret == -ENOMEM)
> (END)
>
>
> So here is the revert patch, but you need to decide how you want to deal with this.
>
>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
>     Laurence Oberman
>     Traced after bisection to a cause for this failure
>
> Tested-by:     Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Laurence Oberman <loberman@redhat.com>
>
> commit 90d169d312a173d5350c1bb36d6daab04c592127
> Author: Laurence Oberman <loberman@redhat.com>
> Date:   Mon Feb 13 20:33:32 2017 -0500
>
>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
>     Laurence Oberman
>     Traced after bisection to a cause for this failure
>
>     [  130.437603] mlx5_0:dump_cqe:262:(pid 3812): dump error cqe
>     [  130.437682] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f0edbfb0
>     [  130.510899] 00000000 00000000 00000000 00000000
>     [  130.536455] 00000000 00000000 00000000 00000000
>     [  130.561878] 00000000 00000000 00000000 00000000
>     [  130.585904] 00000000 0f007806 2500002a db0ec4d0
>     [  145.842925] fast_io_fail_tmo expired for SRP port-1:1 / host1.
>     [  146.530439] scsi host1: ib_srp: reconnect succeeded
>     [  146.566629] mlx5_0:dump_cqe:262:(pid 3293): dump error cqe
>     [  146.597635] 00000000 00000000 00000000 00000000
>     [  146.623545] 00000000 00000000 00000000 00000000
>     [  146.649599] 00000000 00000000 00000000 00000000
>     [  146.673938] 00000000 0f007806 25000032 000c46d0
>     [  146.697969] scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff88
>     [  162.225247] fast_io_fail_tmo expired for SRP port-1:1 / host1.
>     [  162.256337] scsi host1: ib_srp: reconnect succeeded
>     [  162.293396] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f0412ef0`
>
>     This reverts commit ad8e66b4a80182174f73487ed25fd2140cf43361.
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 79bf484..01338c8 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -371,7 +371,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
>         struct srp_fr_desc *d;
>         struct ib_mr *mr;
>         int i, ret = -EINVAL;
> -       enum ib_mr_type mr_type;
>
>         if (pool_size <= 0)
>                 goto err;
> @@ -385,13 +384,9 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
>         spin_lock_init(&pool->lock);
>         INIT_LIST_HEAD(&pool->free_list);
>
> -       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
> -               mr_type = IB_MR_TYPE_SG_GAPS;
> -       else
> -               mr_type = IB_MR_TYPE_MEM_REG;
> -
>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> -               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
> +               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
> +                                max_page_list_len);
>                 if (IS_ERR(mr)) {
>                         ret = PTR_ERR(mr);
>                         if (ret == -ENOMEM)
>
>
>
> Now moving on to what got me here in the first place.
> Bart, let me know if the 7 of the 8 patches in your most recent series are all still valid after this revert
> Otherwise let me know which ones you want me to apply.
>
> patch 6 - I am thinking i sno longer valid.
> "
> If a HCA supports the SG_GAPS_REG feature then a single memory
> region of type IB_MR_TYPE_SG_GAPS is sufficient. This patch
> reduces the number of memory regions that is allocated per SRP
> session.
> "
>
> Thanks
> Laurence
Max Gurtovoy Feb. 14, 2017, 10 a.m. UTC | #2
Hi Laurence,
can you specify the test that repro these failures ?
have you tried running with CX5 HCA or only CX4 ?
I think this commit is right and we have issues in other places.


On 2/14/2017 8:39 AM, Leon Romanovsky wrote:
> On Mon, Feb 13, 2017 at 09:19:54PM -0500, Laurence Oberman wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
>>> To: leon@kernel.org, loberman@redhat.com
>>> Cc: hch@lst.de, maxg@mellanox.com, israelr@mellanox.com, linux-rdma@vger.kernel.org, dledford@redhat.com
>>> Sent: Monday, February 13, 2017 4:52:28 PM
>>> Subject: Re: v4.10-rc SRP + mlx5 regression
>>>
>>> On Mon, 2017-02-13 at 16:46 -0500, Laurence Oberman wrote:
>>>> I will have to run through this again and see where the bisect went wrong.
>>>
>>> Hello Laurence,
>>>
>>> If you would be considering to repeat the bisect, did you know that a bisect
>>> can be sped up by specifying the names of the files and/or directories that
>>> are suspected? An example:
>>>
>>> git bisect start */infiniband */net
>>>
>>> Bart.--
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>> Hello Bart,
>>
>> Much better news this time :), worked late on this but got it figured out.
>>
>> OK, so we got to this one, which makes a lot more sense and is right in the area where we are having issues.
>> I must have answered wrong to one of the steps the first time I did the bisect.
>>
>> Reverted this in the master tree of rc8 and rebuilt the kernel
>> Now all tests pass on Linus's tree - 4.10.0_rc8+
>>
>> The interesting point here is that this commit is in rc5 but rc5 was not failing so we have an interoperability issue with this commit
>>
>>
>> [loberman@ibclient linux]$ git bisect good
>> Bisecting: 0 revisions left to test after this (roughly 1 step)
>> [ad8e66b4a80182174f73487ed25fd2140cf43361] IB/srp: fix mr allocation when the device supports sg gaps
>>
>> [loberman@ibclient linux]$ git show ad8e66b4a80182174f73487ed25fd2140cf43361
>> commit ad8e66b4a80182174f73487ed25fd2140cf43361
>> Author: Israel Rukshin <israelr@mellanox.com>
>> Date:   Wed Dec 28 12:48:28 2016 +0200
>>
>>     IB/srp: fix mr allocation when the device supports sg gaps
>>
>>     If the device support arbitrary sg list mapping (device cap
>>     IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
>>     IB_MR_TYPE_SG_GAPS.
>>
>>     Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
>>     Cc: <stable@vger.kernel.org> # 4.7+
>>     Signed-off-by: Israel Rukshin <israelr@mellanox.com>
>>     Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>     Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
>>     Reviewed-by: Mark Bloch <markb@mellanox.com>
>>     Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
>>     Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>     Signed-off-by: Doug Ledford <dledford@redhat.com>
>>
>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
>> index 8ddc071..0f67cf9 100644
>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -371,6 +371,7 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
>>         struct srp_fr_desc *d;
>>         struct ib_mr *mr;
>>         int i, ret = -EINVAL;
>> +       enum ib_mr_type mr_type;
>>
>>         if (pool_size <= 0)
>>                 goto err;
>> @@ -384,9 +385,13 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
>>         spin_lock_init(&pool->lock);
>>         INIT_LIST_HEAD(&pool->free_list);
>>
>> +       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
>> +               mr_type = IB_MR_TYPE_SG_GAPS;
>> +       else
>> +               mr_type = IB_MR_TYPE_MEM_REG;
>> +
>>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
>> -               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
>> -                                max_page_list_len);
>> +               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
>
> First, ib_alloc_mr receives u32 as a third parameter, but int was
> supplied. Second (I can be wrong here), shouldn't max_page_list_len be
> replaced with max_fast_reg_page_list_len?
>
> Thanks

there is a statement that:

	if (srp_dev->use_fast_reg) {
                 srp_dev->max_pages_per_mr =
                         min_t(u32, srp_dev->max_pages_per_mr,
                               attr->max_fast_reg_page_list_len);
         }

so we take the max_fast_reg_page_list_len in this case.

>
>>                 if (IS_ERR(mr)) {
>>                         ret = PTR_ERR(mr);
>>                         if (ret == -ENOMEM)
>> (END)
>>
>>
>> So here is the revert patch, but you need to decide how you want to deal with this.
>>
>>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
>>     Laurence Oberman
>>     Traced after bisection to a cause for this failure
>>
>> Tested-by:     Laurence Oberman <loberman@redhat.com>
>> Signed-off-by: Laurence Oberman <loberman@redhat.com>
>>
>> commit 90d169d312a173d5350c1bb36d6daab04c592127
>> Author: Laurence Oberman <loberman@redhat.com>
>> Date:   Mon Feb 13 20:33:32 2017 -0500
>>
>>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
>>     Laurence Oberman
>>     Traced after bisection to a cause for this failure
>>
>>     [  130.437603] mlx5_0:dump_cqe:262:(pid 3812): dump error cqe
>>     [  130.437682] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f0edbfb0
>>     [  130.510899] 00000000 00000000 00000000 00000000
>>     [  130.536455] 00000000 00000000 00000000 00000000
>>     [  130.561878] 00000000 00000000 00000000 00000000
>>     [  130.585904] 00000000 0f007806 2500002a db0ec4d0
>>     [  145.842925] fast_io_fail_tmo expired for SRP port-1:1 / host1.
>>     [  146.530439] scsi host1: ib_srp: reconnect succeeded
>>     [  146.566629] mlx5_0:dump_cqe:262:(pid 3293): dump error cqe
>>     [  146.597635] 00000000 00000000 00000000 00000000
>>     [  146.623545] 00000000 00000000 00000000 00000000
>>     [  146.649599] 00000000 00000000 00000000 00000000
>>     [  146.673938] 00000000 0f007806 25000032 000c46d0
>>     [  146.697969] scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff88
>>     [  162.225247] fast_io_fail_tmo expired for SRP port-1:1 / host1.
>>     [  162.256337] scsi host1: ib_srp: reconnect succeeded
>>     [  162.293396] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f0412ef0`
>>
>>     This reverts commit ad8e66b4a80182174f73487ed25fd2140cf43361.
>>
>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
>> index 79bf484..01338c8 100644
>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -371,7 +371,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
>>         struct srp_fr_desc *d;
>>         struct ib_mr *mr;
>>         int i, ret = -EINVAL;
>> -       enum ib_mr_type mr_type;
>>
>>         if (pool_size <= 0)
>>                 goto err;
>> @@ -385,13 +384,9 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
>>         spin_lock_init(&pool->lock);
>>         INIT_LIST_HEAD(&pool->free_list);
>>
>> -       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
>> -               mr_type = IB_MR_TYPE_SG_GAPS;
>> -       else
>> -               mr_type = IB_MR_TYPE_MEM_REG;
>> -
>>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
>> -               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
>> +               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
>> +                                max_page_list_len);
>>                 if (IS_ERR(mr)) {
>>                         ret = PTR_ERR(mr);
>>                         if (ret == -ENOMEM)
>>
>>
>>
>> Now moving on to what got me here in the first place.
>> Bart, let me know if the 7 of the 8 patches in your most recent series are all still valid after this revert
>> Otherwise let me know which ones you want me to apply.
>>
>> patch 6 - I am thinking i sno longer valid.
>> "
>> If a HCA supports the SG_GAPS_REG feature then a single memory
>> region of type IB_MR_TYPE_SG_GAPS is sufficient. This patch
>> reduces the number of memory regions that is allocated per SRP
>> session.
>> "
>>
>> Thanks
>> Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laurence Oberman Feb. 14, 2017, 1:31 p.m. UTC | #3
----- Original Message -----
> From: "Max Gurtovoy" <maxg@mellanox.com>
> To: "Leon Romanovsky" <leon@kernel.org>, "Laurence Oberman" <loberman@redhat.com>
> Cc: "Bart Van Assche" <Bart.VanAssche@sandisk.com>, hch@lst.de, israelr@mellanox.com, linux-rdma@vger.kernel.org,
> dledford@redhat.com
> Sent: Tuesday, February 14, 2017 5:00:04 AM
> Subject: Re: v4.10-rc SRP + mlx5 regression
> 
> Hi Laurence,
> can you specify the test that repro these failures ?
> have you tried running with CX5 HCA or only CX4 ?
> I think this commit is right and we have issues in other places.
> 
> 
> On 2/14/2017 8:39 AM, Leon Romanovsky wrote:
> > On Mon, Feb 13, 2017 at 09:19:54PM -0500, Laurence Oberman wrote:
> >>
> >>
> >> ----- Original Message -----
> >>> From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
> >>> To: leon@kernel.org, loberman@redhat.com
> >>> Cc: hch@lst.de, maxg@mellanox.com, israelr@mellanox.com,
> >>> linux-rdma@vger.kernel.org, dledford@redhat.com
> >>> Sent: Monday, February 13, 2017 4:52:28 PM
> >>> Subject: Re: v4.10-rc SRP + mlx5 regression
> >>>
> >>> On Mon, 2017-02-13 at 16:46 -0500, Laurence Oberman wrote:
> >>>> I will have to run through this again and see where the bisect went
> >>>> wrong.
> >>>
> >>> Hello Laurence,
> >>>
> >>> If you would be considering to repeat the bisect, did you know that a
> >>> bisect
> >>> can be sped up by specifying the names of the files and/or directories
> >>> that
> >>> are suspected? An example:
> >>>
> >>> git bisect start */infiniband */net
> >>>
> >>> Bart.--
> >>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >>
> >> Hello Bart,
> >>
> >> Much better news this time :), worked late on this but got it figured out.
> >>
> >> OK, so we got to this one, which makes a lot more sense and is right in
> >> the area where we are having issues.
> >> I must have answered wrong to one of the steps the first time I did the
> >> bisect.
> >>
> >> Reverted this in the master tree of rc8 and rebuilt the kernel
> >> Now all tests pass on Linus's tree - 4.10.0_rc8+
> >>
> >> The interesting point here is that this commit is in rc5 but rc5 was not
> >> failing so we have an interoperability issue with this commit
> >>
> >>
> >> [loberman@ibclient linux]$ git bisect good
> >> Bisecting: 0 revisions left to test after this (roughly 1 step)
> >> [ad8e66b4a80182174f73487ed25fd2140cf43361] IB/srp: fix mr allocation when
> >> the device supports sg gaps
> >>
> >> [loberman@ibclient linux]$ git show
> >> ad8e66b4a80182174f73487ed25fd2140cf43361
> >> commit ad8e66b4a80182174f73487ed25fd2140cf43361
> >> Author: Israel Rukshin <israelr@mellanox.com>
> >> Date:   Wed Dec 28 12:48:28 2016 +0200
> >>
> >>     IB/srp: fix mr allocation when the device supports sg gaps
> >>
> >>     If the device support arbitrary sg list mapping (device cap
> >>     IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
> >>     IB_MR_TYPE_SG_GAPS.
> >>
> >>     Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
> >>     Cc: <stable@vger.kernel.org> # 4.7+
> >>     Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> >>     Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> >>     Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> >>     Reviewed-by: Mark Bloch <markb@mellanox.com>
> >>     Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> >>     Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> >>     Signed-off-by: Doug Ledford <dledford@redhat.com>
> >>
> >> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
> >> b/drivers/infiniband/ulp/srp/ib_srp.c
> >> index 8ddc071..0f67cf9 100644
> >> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> >> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> >> @@ -371,6 +371,7 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
> >> ib_device *device,
> >>         struct srp_fr_desc *d;
> >>         struct ib_mr *mr;
> >>         int i, ret = -EINVAL;
> >> +       enum ib_mr_type mr_type;
> >>
> >>         if (pool_size <= 0)
> >>                 goto err;
> >> @@ -384,9 +385,13 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
> >> ib_device *device,
> >>         spin_lock_init(&pool->lock);
> >>         INIT_LIST_HEAD(&pool->free_list);
> >>
> >> +       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
> >> +               mr_type = IB_MR_TYPE_SG_GAPS;
> >> +       else
> >> +               mr_type = IB_MR_TYPE_MEM_REG;
> >> +
> >>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> >> -               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
> >> -                                max_page_list_len);
> >> +               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
> >
> > First, ib_alloc_mr receives u32 as a third parameter, but int was
> > supplied. Second (I can be wrong here), shouldn't max_page_list_len be
> > replaced with max_fast_reg_page_list_len?
> >
> > Thanks
> 
> there is a statement that:
> 
> 	if (srp_dev->use_fast_reg) {
>                  srp_dev->max_pages_per_mr =
>                          min_t(u32, srp_dev->max_pages_per_mr,
>                                attr->max_fast_reg_page_list_len);
>          }
> 
> so we take the max_fast_reg_page_list_len in this case.
> 
> >
> >>                 if (IS_ERR(mr)) {
> >>                         ret = PTR_ERR(mr);
> >>                         if (ret == -ENOMEM)
> >> (END)
> >>
> >>
> >> So here is the revert patch, but you need to decide how you want to deal
> >> with this.
> >>
> >>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
> >>     Laurence Oberman
> >>     Traced after bisection to a cause for this failure
> >>
> >> Tested-by:     Laurence Oberman <loberman@redhat.com>
> >> Signed-off-by: Laurence Oberman <loberman@redhat.com>
> >>
> >> commit 90d169d312a173d5350c1bb36d6daab04c592127
> >> Author: Laurence Oberman <loberman@redhat.com>
> >> Date:   Mon Feb 13 20:33:32 2017 -0500
> >>
> >>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
> >>     Laurence Oberman
> >>     Traced after bisection to a cause for this failure
> >>
> >>     [  130.437603] mlx5_0:dump_cqe:262:(pid 3812): dump error cqe
> >>     [  130.437682] scsi host1: ib_srp: failed RECV status WR flushed (5)
> >>     for CQE ffff8817f0edbfb0
> >>     [  130.510899] 00000000 00000000 00000000 00000000
> >>     [  130.536455] 00000000 00000000 00000000 00000000
> >>     [  130.561878] 00000000 00000000 00000000 00000000
> >>     [  130.585904] 00000000 0f007806 2500002a db0ec4d0
> >>     [  145.842925] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> >>     [  146.530439] scsi host1: ib_srp: reconnect succeeded
> >>     [  146.566629] mlx5_0:dump_cqe:262:(pid 3293): dump error cqe
> >>     [  146.597635] 00000000 00000000 00000000 00000000
> >>     [  146.623545] 00000000 00000000 00000000 00000000
> >>     [  146.649599] 00000000 00000000 00000000 00000000
> >>     [  146.673938] 00000000 0f007806 25000032 000c46d0
> >>     [  146.697969] scsi host1: ib_srp: failed FAST REG status memory
> >>     management operation error (6) for CQE ffff88
> >>     [  162.225247] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> >>     [  162.256337] scsi host1: ib_srp: reconnect succeeded
> >>     [  162.293396] scsi host1: ib_srp: failed RECV status WR flushed (5)
> >>     for CQE ffff8817f0412ef0`
> >>
> >>     This reverts commit ad8e66b4a80182174f73487ed25fd2140cf43361.
> >>
> >> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
> >> b/drivers/infiniband/ulp/srp/ib_srp.c
> >> index 79bf484..01338c8 100644
> >> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> >> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> >> @@ -371,7 +371,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
> >> ib_device *device,
> >>         struct srp_fr_desc *d;
> >>         struct ib_mr *mr;
> >>         int i, ret = -EINVAL;
> >> -       enum ib_mr_type mr_type;
> >>
> >>         if (pool_size <= 0)
> >>                 goto err;
> >> @@ -385,13 +384,9 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
> >> ib_device *device,
> >>         spin_lock_init(&pool->lock);
> >>         INIT_LIST_HEAD(&pool->free_list);
> >>
> >> -       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
> >> -               mr_type = IB_MR_TYPE_SG_GAPS;
> >> -       else
> >> -               mr_type = IB_MR_TYPE_MEM_REG;
> >> -
> >>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> >> -               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
> >> +               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
> >> +                                max_page_list_len);
> >>                 if (IS_ERR(mr)) {
> >>                         ret = PTR_ERR(mr);
> >>                         if (ret == -ENOMEM)
> >>
> >>
> >>
> >> Now moving on to what got me here in the first place.
> >> Bart, let me know if the 7 of the 8 patches in your most recent series are
> >> all still valid after this revert
> >> Otherwise let me know which ones you want me to apply.
> >>
> >> patch 6 - I am thinking i sno longer valid.
> >> "
> >> If a HCA supports the SG_GAPS_REG feature then a single memory
> >> region of type IB_MR_TYPE_SG_GAPS is sufficient. This patch
> >> reduces the number of memory regions that is allocated per SRP
> >> session.
> >> "
> >>
> >> Thanks
> >> Laurence
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Hello Max,

I only have CX4 and CX3 in my lab, this test bed only has CX4.

CA 'mlx5_0'
	CA type: MT4115
	Number of ports: 1
	Firmware version: 12.14.2036
	Hardware version: 0
	Node GUID: 0x7cfe900300726ed2
	System image GUID: 0x7cfe900300726ed2
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 3
		LMC: 0
		SM lid: 3
		Capability mask: 0x2651e84a
		Port GUID: 0x7cfe900300726ed2
		Link layer: InfiniBand

The test is simple, it's the same one I start with every time because it always
brings out issues with mapping for large I/O sizes and mem registration if such issues exist.

I have a server running LIO with memory backed LUNS.
These are served via a dual port mlx5 (CX4) over ib_srpt

The client mounts these LUNS via ib_srp (mlx5) and device-mapper-multipath 
and I run a simple dd on the XFS file system.

#!/bin/bash
while true
do
	dd if=/dev/zero of=/data-$1/bigfile bs=4096k count=900 
	sync;
	rm -rf /data-$1/bigfile
done

Once this passes I run a suite of other tests read/write, direct and buffered.

Thanks
Laurence

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laurence Oberman Feb. 14, 2017, 4:21 p.m. UTC | #4
----- Original Message -----
> From: "Laurence Oberman" <loberman@redhat.com>
> To: "Max Gurtovoy" <maxg@mellanox.com>
> Cc: "Leon Romanovsky" <leon@kernel.org>, "Bart Van Assche" <Bart.VanAssche@sandisk.com>, hch@lst.de,
> israelr@mellanox.com, linux-rdma@vger.kernel.org, dledford@redhat.com
> Sent: Tuesday, February 14, 2017 8:31:02 AM
> Subject: Re: v4.10-rc SRP + mlx5 regression
> 
> 
> 
> ----- Original Message -----
> > From: "Max Gurtovoy" <maxg@mellanox.com>
> > To: "Leon Romanovsky" <leon@kernel.org>, "Laurence Oberman"
> > <loberman@redhat.com>
> > Cc: "Bart Van Assche" <Bart.VanAssche@sandisk.com>, hch@lst.de,
> > israelr@mellanox.com, linux-rdma@vger.kernel.org,
> > dledford@redhat.com
> > Sent: Tuesday, February 14, 2017 5:00:04 AM
> > Subject: Re: v4.10-rc SRP + mlx5 regression
> > 
> > Hi Laurence,
> > can you specify the test that repro these failures ?
> > have you tried running with CX5 HCA or only CX4 ?
> > I think this commit is right and we have issues in other places.
> > 
> > 
> > On 2/14/2017 8:39 AM, Leon Romanovsky wrote:
> > > On Mon, Feb 13, 2017 at 09:19:54PM -0500, Laurence Oberman wrote:
> > >>
> > >>
> > >> ----- Original Message -----
> > >>> From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
> > >>> To: leon@kernel.org, loberman@redhat.com
> > >>> Cc: hch@lst.de, maxg@mellanox.com, israelr@mellanox.com,
> > >>> linux-rdma@vger.kernel.org, dledford@redhat.com
> > >>> Sent: Monday, February 13, 2017 4:52:28 PM
> > >>> Subject: Re: v4.10-rc SRP + mlx5 regression
> > >>>
> > >>> On Mon, 2017-02-13 at 16:46 -0500, Laurence Oberman wrote:
> > >>>> I will have to run through this again and see where the bisect went
> > >>>> wrong.
> > >>>
> > >>> Hello Laurence,
> > >>>
> > >>> If you would be considering to repeat the bisect, did you know that a
> > >>> bisect
> > >>> can be sped up by specifying the names of the files and/or directories
> > >>> that
> > >>> are suspected? An example:
> > >>>
> > >>> git bisect start */infiniband */net
> > >>>
> > >>> Bart.--
> > >>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > >>> in
> > >>> the body of a message to majordomo@vger.kernel.org
> > >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >>>
> > >>
> > >> Hello Bart,
> > >>
> > >> Much better news this time :), worked late on this but got it figured
> > >> out.
> > >>
> > >> OK, so we got to this one, which makes a lot more sense and is right in
> > >> the area where we are having issues.
> > >> I must have answered wrong to one of the steps the first time I did the
> > >> bisect.
> > >>
> > >> Reverted this in the master tree of rc8 and rebuilt the kernel
> > >> Now all tests pass on Linus's tree - 4.10.0_rc8+
> > >>
> > >> The interesting point here is that this commit is in rc5 but rc5 was not
> > >> failing so we have an interoperability issue with this commit
> > >>
> > >>
> > >> [loberman@ibclient linux]$ git bisect good
> > >> Bisecting: 0 revisions left to test after this (roughly 1 step)
> > >> [ad8e66b4a80182174f73487ed25fd2140cf43361] IB/srp: fix mr allocation
> > >> when
> > >> the device supports sg gaps
> > >>
> > >> [loberman@ibclient linux]$ git show
> > >> ad8e66b4a80182174f73487ed25fd2140cf43361
> > >> commit ad8e66b4a80182174f73487ed25fd2140cf43361
> > >> Author: Israel Rukshin <israelr@mellanox.com>
> > >> Date:   Wed Dec 28 12:48:28 2016 +0200
> > >>
> > >>     IB/srp: fix mr allocation when the device supports sg gaps
> > >>
> > >>     If the device support arbitrary sg list mapping (device cap
> > >>     IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
> > >>     IB_MR_TYPE_SG_GAPS.
> > >>
> > >>     Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
> > >>     Cc: <stable@vger.kernel.org> # 4.7+
> > >>     Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> > >>     Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> > >>     Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> > >>     Reviewed-by: Mark Bloch <markb@mellanox.com>
> > >>     Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> > >>     Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> > >>     Signed-off-by: Doug Ledford <dledford@redhat.com>
> > >>
> > >> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
> > >> b/drivers/infiniband/ulp/srp/ib_srp.c
> > >> index 8ddc071..0f67cf9 100644
> > >> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> > >> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> > >> @@ -371,6 +371,7 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
> > >> ib_device *device,
> > >>         struct srp_fr_desc *d;
> > >>         struct ib_mr *mr;
> > >>         int i, ret = -EINVAL;
> > >> +       enum ib_mr_type mr_type;
> > >>
> > >>         if (pool_size <= 0)
> > >>                 goto err;
> > >> @@ -384,9 +385,13 @@ static struct srp_fr_pool
> > >> *srp_create_fr_pool(struct
> > >> ib_device *device,
> > >>         spin_lock_init(&pool->lock);
> > >>         INIT_LIST_HEAD(&pool->free_list);
> > >>
> > >> +       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
> > >> +               mr_type = IB_MR_TYPE_SG_GAPS;
> > >> +       else
> > >> +               mr_type = IB_MR_TYPE_MEM_REG;
> > >> +
> > >>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> > >> -               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
> > >> -                                max_page_list_len);
> > >> +               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
> > >
> > > First, ib_alloc_mr receives u32 as a third parameter, but int was
> > > supplied. Second (I can be wrong here), shouldn't max_page_list_len be
> > > replaced with max_fast_reg_page_list_len?
> > >
> > > Thanks
> > 
> > there is a statement that:
> > 
> > 	if (srp_dev->use_fast_reg) {
> >                  srp_dev->max_pages_per_mr =
> >                          min_t(u32, srp_dev->max_pages_per_mr,
> >                                attr->max_fast_reg_page_list_len);
> >          }
> > 
> > so we take the max_fast_reg_page_list_len in this case.
> > 
> > >
> > >>                 if (IS_ERR(mr)) {
> > >>                         ret = PTR_ERR(mr);
> > >>                         if (ret == -ENOMEM)
> > >> (END)
> > >>
> > >>
> > >> So here is the revert patch, but you need to decide how you want to deal
> > >> with this.
> > >>
> > >>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
> > >>     Laurence Oberman
> > >>     Traced after bisection to a cause for this failure
> > >>
> > >> Tested-by:     Laurence Oberman <loberman@redhat.com>
> > >> Signed-off-by: Laurence Oberman <loberman@redhat.com>
> > >>
> > >> commit 90d169d312a173d5350c1bb36d6daab04c592127
> > >> Author: Laurence Oberman <loberman@redhat.com>
> > >> Date:   Mon Feb 13 20:33:32 2017 -0500
> > >>
> > >>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
> > >>     Laurence Oberman
> > >>     Traced after bisection to a cause for this failure
> > >>
> > >>     [  130.437603] mlx5_0:dump_cqe:262:(pid 3812): dump error cqe
> > >>     [  130.437682] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > >>     for CQE ffff8817f0edbfb0
> > >>     [  130.510899] 00000000 00000000 00000000 00000000
> > >>     [  130.536455] 00000000 00000000 00000000 00000000
> > >>     [  130.561878] 00000000 00000000 00000000 00000000
> > >>     [  130.585904] 00000000 0f007806 2500002a db0ec4d0
> > >>     [  145.842925] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> > >>     [  146.530439] scsi host1: ib_srp: reconnect succeeded
> > >>     [  146.566629] mlx5_0:dump_cqe:262:(pid 3293): dump error cqe
> > >>     [  146.597635] 00000000 00000000 00000000 00000000
> > >>     [  146.623545] 00000000 00000000 00000000 00000000
> > >>     [  146.649599] 00000000 00000000 00000000 00000000
> > >>     [  146.673938] 00000000 0f007806 25000032 000c46d0
> > >>     [  146.697969] scsi host1: ib_srp: failed FAST REG status memory
> > >>     management operation error (6) for CQE ffff88
> > >>     [  162.225247] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> > >>     [  162.256337] scsi host1: ib_srp: reconnect succeeded
> > >>     [  162.293396] scsi host1: ib_srp: failed RECV status WR flushed (5)
> > >>     for CQE ffff8817f0412ef0`
> > >>
> > >>     This reverts commit ad8e66b4a80182174f73487ed25fd2140cf43361.
> > >>
> > >> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
> > >> b/drivers/infiniband/ulp/srp/ib_srp.c
> > >> index 79bf484..01338c8 100644
> > >> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> > >> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> > >> @@ -371,7 +371,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
> > >> ib_device *device,
> > >>         struct srp_fr_desc *d;
> > >>         struct ib_mr *mr;
> > >>         int i, ret = -EINVAL;
> > >> -       enum ib_mr_type mr_type;
> > >>
> > >>         if (pool_size <= 0)
> > >>                 goto err;
> > >> @@ -385,13 +384,9 @@ static struct srp_fr_pool
> > >> *srp_create_fr_pool(struct
> > >> ib_device *device,
> > >>         spin_lock_init(&pool->lock);
> > >>         INIT_LIST_HEAD(&pool->free_list);
> > >>
> > >> -       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
> > >> -               mr_type = IB_MR_TYPE_SG_GAPS;
> > >> -       else
> > >> -               mr_type = IB_MR_TYPE_MEM_REG;
> > >> -
> > >>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> > >> -               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
> > >> +               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
> > >> +                                max_page_list_len);
> > >>                 if (IS_ERR(mr)) {
> > >>                         ret = PTR_ERR(mr);
> > >>                         if (ret == -ENOMEM)
> > >>
> > >>
> > >>
> > >> Now moving on to what got me here in the first place.
> > >> Bart, let me know if the 7 of the 8 patches in your most recent series
> > >> are
> > >> all still valid after this revert
> > >> Otherwise let me know which ones you want me to apply.
> > >>
> > >> patch 6 - I am thinking i sno longer valid.
> > >> "
> > >> If a HCA supports the SG_GAPS_REG feature then a single memory
> > >> region of type IB_MR_TYPE_SG_GAPS is sufficient. This patch
> > >> reduces the number of memory regions that is allocated per SRP
> > >> session.
> > >> "
> > >>
> > >> Thanks
> > >> Laurence
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> Hello Max,
> 
> I only have CX4 and CX3 in my lab, this test bed only has CX4.
> 
> CA 'mlx5_0'
> 	CA type: MT4115
> 	Number of ports: 1
> 	Firmware version: 12.14.2036
> 	Hardware version: 0
> 	Node GUID: 0x7cfe900300726ed2
> 	System image GUID: 0x7cfe900300726ed2
> 	Port 1:
> 		State: Active
> 		Physical state: LinkUp
> 		Rate: 100
> 		Base lid: 3
> 		LMC: 0
> 		SM lid: 3
> 		Capability mask: 0x2651e84a
> 		Port GUID: 0x7cfe900300726ed2
> 		Link layer: InfiniBand
> 
> The test is simple, it's the same one I start with every time because it
> always
> brings out issues with mapping for large I/O sizes and mem registration if
> such issues exist.
> 
> I have a server running LIO with memory backed LUNS.
> These are served via a dual port mlx5 (CX4) over ib_srpt
> 
> The client mounts these LUNS via ib_srp (mlx5) and device-mapper-multipath
> and I run a simple dd on the XFS file system.
> 
> #!/bin/bash
> while true
> do
> 	dd if=/dev/zero of=/data-$1/bigfile bs=4096k count=900
> 	sync;
> 	rm -rf /data-$1/bigfile
> done
> 
> Once this passes I run a suite of other tests read/write, direct and
> buffered.
> 
> Thanks
> Laurence
> 
> 

Max, Leon, Israel, Bart and Doug

We should consider reverting that commit for now until we figure out what specifically brings this out unless a quick fix is
forthcoming.
I have been running since last night with that commit reverted and 7 of Bart's latest patches and its been rock solid stable.
Its also shown no issues performance wise.

Tests included read/writes, large/small I/O sizes, buffered and unbuffered, XFS file-system and direct I/O.

Thanks
Laurence

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche Feb. 14, 2017, 4:53 p.m. UTC | #5
On Tue, 2017-02-14 at 12:00 +0200, Max Gurtovoy wrote:
> can you specify the test that repro these failures ?
> have you tried running with CX5 HCA or only CX4 ?
> I think this commit is right and we have issues in other places.

My proposal is to proceed as Laurence proposed - modify the SRP initiator
driver such that it doesn't use gaps registration anymore. However, an
additional change is needed in addition to the patch Laurence proposed,
namely to call blk_queue_virt_boundary() unconditionally. I'm currently
testing this approach against mlx4.

Bart.--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Max Gurtovoy Feb. 14, 2017, 5:15 p.m. UTC | #6
On 2/14/2017 3:31 PM, Laurence Oberman wrote:
>
>
> ----- Original Message -----
>> From: "Max Gurtovoy" <maxg@mellanox.com>
>> To: "Leon Romanovsky" <leon@kernel.org>, "Laurence Oberman" <loberman@redhat.com>
>> Cc: "Bart Van Assche" <Bart.VanAssche@sandisk.com>, hch@lst.de, israelr@mellanox.com, linux-rdma@vger.kernel.org,
>> dledford@redhat.com
>> Sent: Tuesday, February 14, 2017 5:00:04 AM
>> Subject: Re: v4.10-rc SRP + mlx5 regression
>>
>> Hi Laurence,
>> can you specify the test that repro these failures ?
>> have you tried running with CX5 HCA or only CX4 ?
>> I think this commit is right and we have issues in other places.
>>
>>
>> On 2/14/2017 8:39 AM, Leon Romanovsky wrote:
>>> On Mon, Feb 13, 2017 at 09:19:54PM -0500, Laurence Oberman wrote:
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
>>>>> To: leon@kernel.org, loberman@redhat.com
>>>>> Cc: hch@lst.de, maxg@mellanox.com, israelr@mellanox.com,
>>>>> linux-rdma@vger.kernel.org, dledford@redhat.com
>>>>> Sent: Monday, February 13, 2017 4:52:28 PM
>>>>> Subject: Re: v4.10-rc SRP + mlx5 regression
>>>>>
>>>>> On Mon, 2017-02-13 at 16:46 -0500, Laurence Oberman wrote:
>>>>>> I will have to run through this again and see where the bisect went
>>>>>> wrong.
>>>>>
>>>>> Hello Laurence,
>>>>>
>>>>> If you would be considering to repeat the bisect, did you know that a
>>>>> bisect
>>>>> can be sped up by specifying the names of the files and/or directories
>>>>> that
>>>>> are suspected? An example:
>>>>>
>>>>> git bisect start */infiniband */net
>>>>>
>>>>> Bart.--
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>> Hello Bart,
>>>>
>>>> Much better news this time :), worked late on this but got it figured out.
>>>>
>>>> OK, so we got to this one, which makes a lot more sense and is right in
>>>> the area where we are having issues.
>>>> I must have answered wrong to one of the steps the first time I did the
>>>> bisect.
>>>>
>>>> Reverted this in the master tree of rc8 and rebuilt the kernel
>>>> Now all tests pass on Linus's tree - 4.10.0_rc8+
>>>>
>>>> The interesting point here is that this commit is in rc5 but rc5 was not
>>>> failing so we have an interoperability issue with this commit
>>>>
>>>>
>>>> [loberman@ibclient linux]$ git bisect good
>>>> Bisecting: 0 revisions left to test after this (roughly 1 step)
>>>> [ad8e66b4a80182174f73487ed25fd2140cf43361] IB/srp: fix mr allocation when
>>>> the device supports sg gaps
>>>>
>>>> [loberman@ibclient linux]$ git show
>>>> ad8e66b4a80182174f73487ed25fd2140cf43361
>>>> commit ad8e66b4a80182174f73487ed25fd2140cf43361
>>>> Author: Israel Rukshin <israelr@mellanox.com>
>>>> Date:   Wed Dec 28 12:48:28 2016 +0200
>>>>
>>>>     IB/srp: fix mr allocation when the device supports sg gaps
>>>>
>>>>     If the device support arbitrary sg list mapping (device cap
>>>>     IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
>>>>     IB_MR_TYPE_SG_GAPS.
>>>>
>>>>     Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
>>>>     Cc: <stable@vger.kernel.org> # 4.7+
>>>>     Signed-off-by: Israel Rukshin <israelr@mellanox.com>
>>>>     Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>>>     Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
>>>>     Reviewed-by: Mark Bloch <markb@mellanox.com>
>>>>     Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
>>>>     Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>>>     Signed-off-by: Doug Ledford <dledford@redhat.com>
>>>>
>>>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
>>>> b/drivers/infiniband/ulp/srp/ib_srp.c
>>>> index 8ddc071..0f67cf9 100644
>>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>>>> @@ -371,6 +371,7 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
>>>> ib_device *device,
>>>>         struct srp_fr_desc *d;
>>>>         struct ib_mr *mr;
>>>>         int i, ret = -EINVAL;
>>>> +       enum ib_mr_type mr_type;
>>>>
>>>>         if (pool_size <= 0)
>>>>                 goto err;
>>>> @@ -384,9 +385,13 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
>>>> ib_device *device,
>>>>         spin_lock_init(&pool->lock);
>>>>         INIT_LIST_HEAD(&pool->free_list);
>>>>
>>>> +       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
>>>> +               mr_type = IB_MR_TYPE_SG_GAPS;
>>>> +       else
>>>> +               mr_type = IB_MR_TYPE_MEM_REG;
>>>> +
>>>>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
>>>> -               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
>>>> -                                max_page_list_len);
>>>> +               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
>>>
>>> First, ib_alloc_mr receives u32 as a third parameter, but int was
>>> supplied. Second (I can be wrong here), shouldn't max_page_list_len be
>>> replaced with max_fast_reg_page_list_len?
>>>
>>> Thanks
>>
>> there is a statement that:
>>
>> 	if (srp_dev->use_fast_reg) {
>>                  srp_dev->max_pages_per_mr =
>>                          min_t(u32, srp_dev->max_pages_per_mr,
>>                                attr->max_fast_reg_page_list_len);
>>          }
>>
>> so we take the max_fast_reg_page_list_len in this case.
>>
>>>
>>>>                 if (IS_ERR(mr)) {
>>>>                         ret = PTR_ERR(mr);
>>>>                         if (ret == -ENOMEM)
>>>> (END)
>>>>
>>>>
>>>> So here is the revert patch, but you need to decide how you want to deal
>>>> with this.
>>>>
>>>>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
>>>>     Laurence Oberman
>>>>     Traced after bisection to a cause for this failure
>>>>
>>>> Tested-by:     Laurence Oberman <loberman@redhat.com>
>>>> Signed-off-by: Laurence Oberman <loberman@redhat.com>
>>>>
>>>> commit 90d169d312a173d5350c1bb36d6daab04c592127
>>>> Author: Laurence Oberman <loberman@redhat.com>
>>>> Date:   Mon Feb 13 20:33:32 2017 -0500
>>>>
>>>>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
>>>>     Laurence Oberman
>>>>     Traced after bisection to a cause for this failure
>>>>
>>>>     [  130.437603] mlx5_0:dump_cqe:262:(pid 3812): dump error cqe
>>>>     [  130.437682] scsi host1: ib_srp: failed RECV status WR flushed (5)
>>>>     for CQE ffff8817f0edbfb0
>>>>     [  130.510899] 00000000 00000000 00000000 00000000
>>>>     [  130.536455] 00000000 00000000 00000000 00000000
>>>>     [  130.561878] 00000000 00000000 00000000 00000000
>>>>     [  130.585904] 00000000 0f007806 2500002a db0ec4d0
>>>>     [  145.842925] fast_io_fail_tmo expired for SRP port-1:1 / host1.
>>>>     [  146.530439] scsi host1: ib_srp: reconnect succeeded
>>>>     [  146.566629] mlx5_0:dump_cqe:262:(pid 3293): dump error cqe
>>>>     [  146.597635] 00000000 00000000 00000000 00000000
>>>>     [  146.623545] 00000000 00000000 00000000 00000000
>>>>     [  146.649599] 00000000 00000000 00000000 00000000
>>>>     [  146.673938] 00000000 0f007806 25000032 000c46d0
>>>>     [  146.697969] scsi host1: ib_srp: failed FAST REG status memory
>>>>     management operation error (6) for CQE ffff88
>>>>     [  162.225247] fast_io_fail_tmo expired for SRP port-1:1 / host1.
>>>>     [  162.256337] scsi host1: ib_srp: reconnect succeeded
>>>>     [  162.293396] scsi host1: ib_srp: failed RECV status WR flushed (5)
>>>>     for CQE ffff8817f0412ef0`
>>>>
>>>>     This reverts commit ad8e66b4a80182174f73487ed25fd2140cf43361.
>>>>
>>>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
>>>> b/drivers/infiniband/ulp/srp/ib_srp.c
>>>> index 79bf484..01338c8 100644
>>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>>>> @@ -371,7 +371,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
>>>> ib_device *device,
>>>>         struct srp_fr_desc *d;
>>>>         struct ib_mr *mr;
>>>>         int i, ret = -EINVAL;
>>>> -       enum ib_mr_type mr_type;
>>>>
>>>>         if (pool_size <= 0)
>>>>                 goto err;
>>>> @@ -385,13 +384,9 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
>>>> ib_device *device,
>>>>         spin_lock_init(&pool->lock);
>>>>         INIT_LIST_HEAD(&pool->free_list);
>>>>
>>>> -       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
>>>> -               mr_type = IB_MR_TYPE_SG_GAPS;
>>>> -       else
>>>> -               mr_type = IB_MR_TYPE_MEM_REG;
>>>> -
>>>>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
>>>> -               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
>>>> +               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
>>>> +                                max_page_list_len);
>>>>                 if (IS_ERR(mr)) {
>>>>                         ret = PTR_ERR(mr);
>>>>                         if (ret == -ENOMEM)
>>>>
>>>>
>>>>
>>>> Now moving on to what got me here in the first place.
>>>> Bart, let me know if the 7 of the 8 patches in your most recent series are
>>>> all still valid after this revert
>>>> Otherwise let me know which ones you want me to apply.
>>>>
>>>> patch 6 - I am thinking i sno longer valid.
>>>> "
>>>> If a HCA supports the SG_GAPS_REG feature then a single memory
>>>> region of type IB_MR_TYPE_SG_GAPS is sufficient. This patch
>>>> reduces the number of memory regions that is allocated per SRP
>>>> session.
>>>> "
>>>>
>>>> Thanks
>>>> Laurence
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> Hello Max,
>
> I only have CX4 and CX3 in my lab, this test bed only has CX4.
>
> CA 'mlx5_0'
> 	CA type: MT4115
> 	Number of ports: 1
> 	Firmware version: 12.14.2036
> 	Hardware version: 0
> 	Node GUID: 0x7cfe900300726ed2
> 	System image GUID: 0x7cfe900300726ed2
> 	Port 1:
> 		State: Active
> 		Physical state: LinkUp
> 		Rate: 100
> 		Base lid: 3
> 		LMC: 0
> 		SM lid: 3
> 		Capability mask: 0x2651e84a
> 		Port GUID: 0x7cfe900300726ed2
> 		Link layer: InfiniBand
>
> The test is simple, it's the same one I start with every time because it always
> brings out issues with mapping for large I/O sizes and mem registration if such issues exist.
>
> I have a server running LIO with memory backed LUNS.
> These are served via a dual port mlx5 (CX4) over ib_srpt
>
> The client mounts these LUNS via ib_srp (mlx5) and device-mapper-multipath
> and I run a simple dd on the XFS file system.
>
> #!/bin/bash
> while true
> do
> 	dd if=/dev/zero of=/data-$1/bigfile bs=4096k count=900
> 	sync;
> 	rm -rf /data-$1/bigfile
> done
>
> Once this passes I run a suite of other tests read/write, direct and buffered.

Laurence,
this is 4MB transactions. can you increase the cmd_sg_entries to the 
maximum and run the test again ?


>
> Thanks
> Laurence
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Max Gurtovoy Feb. 14, 2017, 5:15 p.m. UTC | #7
On 2/14/2017 3:31 PM, Laurence Oberman wrote:
>
>
> ----- Original Message -----
>> From: "Max Gurtovoy" <maxg@mellanox.com>
>> To: "Leon Romanovsky" <leon@kernel.org>, "Laurence Oberman" <loberman@redhat.com>
>> Cc: "Bart Van Assche" <Bart.VanAssche@sandisk.com>, hch@lst.de, israelr@mellanox.com, linux-rdma@vger.kernel.org,
>> dledford@redhat.com
>> Sent: Tuesday, February 14, 2017 5:00:04 AM
>> Subject: Re: v4.10-rc SRP + mlx5 regression
>>
>> Hi Laurence,
>> can you specify the test that repro these failures ?
>> have you tried running with CX5 HCA or only CX4 ?
>> I think this commit is right and we have issues in other places.
>>
>>
>> On 2/14/2017 8:39 AM, Leon Romanovsky wrote:
>>> On Mon, Feb 13, 2017 at 09:19:54PM -0500, Laurence Oberman wrote:
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
>>>>> To: leon@kernel.org, loberman@redhat.com
>>>>> Cc: hch@lst.de, maxg@mellanox.com, israelr@mellanox.com,
>>>>> linux-rdma@vger.kernel.org, dledford@redhat.com
>>>>> Sent: Monday, February 13, 2017 4:52:28 PM
>>>>> Subject: Re: v4.10-rc SRP + mlx5 regression
>>>>>
>>>>> On Mon, 2017-02-13 at 16:46 -0500, Laurence Oberman wrote:
>>>>>> I will have to run through this again and see where the bisect went
>>>>>> wrong.
>>>>>
>>>>> Hello Laurence,
>>>>>
>>>>> If you would be considering to repeat the bisect, did you know that a
>>>>> bisect
>>>>> can be sped up by specifying the names of the files and/or directories
>>>>> that
>>>>> are suspected? An example:
>>>>>
>>>>> git bisect start */infiniband */net
>>>>>
>>>>> Bart.--
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>> Hello Bart,
>>>>
>>>> Much better news this time :), worked late on this but got it figured out.
>>>>
>>>> OK, so we got to this one, which makes a lot more sense and is right in
>>>> the area where we are having issues.
>>>> I must have answered wrong to one of the steps the first time I did the
>>>> bisect.
>>>>
>>>> Reverted this in the master tree of rc8 and rebuilt the kernel
>>>> Now all tests pass on Linus's tree - 4.10.0_rc8+
>>>>
>>>> The interesting point here is that this commit is in rc5 but rc5 was not
>>>> failing so we have an interoperability issue with this commit
>>>>
>>>>
>>>> [loberman@ibclient linux]$ git bisect good
>>>> Bisecting: 0 revisions left to test after this (roughly 1 step)
>>>> [ad8e66b4a80182174f73487ed25fd2140cf43361] IB/srp: fix mr allocation when
>>>> the device supports sg gaps
>>>>
>>>> [loberman@ibclient linux]$ git show
>>>> ad8e66b4a80182174f73487ed25fd2140cf43361
>>>> commit ad8e66b4a80182174f73487ed25fd2140cf43361
>>>> Author: Israel Rukshin <israelr@mellanox.com>
>>>> Date:   Wed Dec 28 12:48:28 2016 +0200
>>>>
>>>>     IB/srp: fix mr allocation when the device supports sg gaps
>>>>
>>>>     If the device support arbitrary sg list mapping (device cap
>>>>     IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
>>>>     IB_MR_TYPE_SG_GAPS.
>>>>
>>>>     Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
>>>>     Cc: <stable@vger.kernel.org> # 4.7+
>>>>     Signed-off-by: Israel Rukshin <israelr@mellanox.com>
>>>>     Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>>>     Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
>>>>     Reviewed-by: Mark Bloch <markb@mellanox.com>
>>>>     Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
>>>>     Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>>>     Signed-off-by: Doug Ledford <dledford@redhat.com>
>>>>
>>>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
>>>> b/drivers/infiniband/ulp/srp/ib_srp.c
>>>> index 8ddc071..0f67cf9 100644
>>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>>>> @@ -371,6 +371,7 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
>>>> ib_device *device,
>>>>         struct srp_fr_desc *d;
>>>>         struct ib_mr *mr;
>>>>         int i, ret = -EINVAL;
>>>> +       enum ib_mr_type mr_type;
>>>>
>>>>         if (pool_size <= 0)
>>>>                 goto err;
>>>> @@ -384,9 +385,13 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
>>>> ib_device *device,
>>>>         spin_lock_init(&pool->lock);
>>>>         INIT_LIST_HEAD(&pool->free_list);
>>>>
>>>> +       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
>>>> +               mr_type = IB_MR_TYPE_SG_GAPS;
>>>> +       else
>>>> +               mr_type = IB_MR_TYPE_MEM_REG;
>>>> +
>>>>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
>>>> -               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
>>>> -                                max_page_list_len);
>>>> +               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
>>>
>>> First, ib_alloc_mr receives u32 as a third parameter, but int was
>>> supplied. Second (I can be wrong here), shouldn't max_page_list_len be
>>> replaced with max_fast_reg_page_list_len?
>>>
>>> Thanks
>>
>> there is a statement that:
>>
>> 	if (srp_dev->use_fast_reg) {
>>                  srp_dev->max_pages_per_mr =
>>                          min_t(u32, srp_dev->max_pages_per_mr,
>>                                attr->max_fast_reg_page_list_len);
>>          }
>>
>> so we take the max_fast_reg_page_list_len in this case.
>>
>>>
>>>>                 if (IS_ERR(mr)) {
>>>>                         ret = PTR_ERR(mr);
>>>>                         if (ret == -ENOMEM)
>>>> (END)
>>>>
>>>>
>>>> So here is the revert patch, but you need to decide how you want to deal
>>>> with this.
>>>>
>>>>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
>>>>     Laurence Oberman
>>>>     Traced after bisection to a cause for this failure
>>>>
>>>> Tested-by:     Laurence Oberman <loberman@redhat.com>
>>>> Signed-off-by: Laurence Oberman <loberman@redhat.com>
>>>>
>>>> commit 90d169d312a173d5350c1bb36d6daab04c592127
>>>> Author: Laurence Oberman <loberman@redhat.com>
>>>> Date:   Mon Feb 13 20:33:32 2017 -0500
>>>>
>>>>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
>>>>     Laurence Oberman
>>>>     Traced after bisection to a cause for this failure
>>>>
>>>>     [  130.437603] mlx5_0:dump_cqe:262:(pid 3812): dump error cqe
>>>>     [  130.437682] scsi host1: ib_srp: failed RECV status WR flushed (5)
>>>>     for CQE ffff8817f0edbfb0
>>>>     [  130.510899] 00000000 00000000 00000000 00000000
>>>>     [  130.536455] 00000000 00000000 00000000 00000000
>>>>     [  130.561878] 00000000 00000000 00000000 00000000
>>>>     [  130.585904] 00000000 0f007806 2500002a db0ec4d0
>>>>     [  145.842925] fast_io_fail_tmo expired for SRP port-1:1 / host1.
>>>>     [  146.530439] scsi host1: ib_srp: reconnect succeeded
>>>>     [  146.566629] mlx5_0:dump_cqe:262:(pid 3293): dump error cqe
>>>>     [  146.597635] 00000000 00000000 00000000 00000000
>>>>     [  146.623545] 00000000 00000000 00000000 00000000
>>>>     [  146.649599] 00000000 00000000 00000000 00000000
>>>>     [  146.673938] 00000000 0f007806 25000032 000c46d0
>>>>     [  146.697969] scsi host1: ib_srp: failed FAST REG status memory
>>>>     management operation error (6) for CQE ffff88
>>>>     [  162.225247] fast_io_fail_tmo expired for SRP port-1:1 / host1.
>>>>     [  162.256337] scsi host1: ib_srp: reconnect succeeded
>>>>     [  162.293396] scsi host1: ib_srp: failed RECV status WR flushed (5)
>>>>     for CQE ffff8817f0412ef0`
>>>>
>>>>     This reverts commit ad8e66b4a80182174f73487ed25fd2140cf43361.
>>>>
>>>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
>>>> b/drivers/infiniband/ulp/srp/ib_srp.c
>>>> index 79bf484..01338c8 100644
>>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>>>> @@ -371,7 +371,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
>>>> ib_device *device,
>>>>         struct srp_fr_desc *d;
>>>>         struct ib_mr *mr;
>>>>         int i, ret = -EINVAL;
>>>> -       enum ib_mr_type mr_type;
>>>>
>>>>         if (pool_size <= 0)
>>>>                 goto err;
>>>> @@ -385,13 +384,9 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
>>>> ib_device *device,
>>>>         spin_lock_init(&pool->lock);
>>>>         INIT_LIST_HEAD(&pool->free_list);
>>>>
>>>> -       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
>>>> -               mr_type = IB_MR_TYPE_SG_GAPS;
>>>> -       else
>>>> -               mr_type = IB_MR_TYPE_MEM_REG;
>>>> -
>>>>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
>>>> -               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
>>>> +               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
>>>> +                                max_page_list_len);
>>>>                 if (IS_ERR(mr)) {
>>>>                         ret = PTR_ERR(mr);
>>>>                         if (ret == -ENOMEM)
>>>>
>>>>
>>>>
>>>> Now moving on to what got me here in the first place.
>>>> Bart, let me know if the 7 of the 8 patches in your most recent series are
>>>> all still valid after this revert
>>>> Otherwise let me know which ones you want me to apply.
>>>>
>>>> patch 6 - I am thinking i sno longer valid.
>>>> "
>>>> If a HCA supports the SG_GAPS_REG feature then a single memory
>>>> region of type IB_MR_TYPE_SG_GAPS is sufficient. This patch
>>>> reduces the number of memory regions that is allocated per SRP
>>>> session.
>>>> "
>>>>
>>>> Thanks
>>>> Laurence
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> Hello Max,
>
> I only have CX4 and CX3 in my lab, this test bed only has CX4.
>
> CA 'mlx5_0'
> 	CA type: MT4115
> 	Number of ports: 1
> 	Firmware version: 12.14.2036
> 	Hardware version: 0
> 	Node GUID: 0x7cfe900300726ed2
> 	System image GUID: 0x7cfe900300726ed2
> 	Port 1:
> 		State: Active
> 		Physical state: LinkUp
> 		Rate: 100
> 		Base lid: 3
> 		LMC: 0
> 		SM lid: 3
> 		Capability mask: 0x2651e84a
> 		Port GUID: 0x7cfe900300726ed2
> 		Link layer: InfiniBand
>
> The test is simple, it's the same one I start with every time because it always
> brings out issues with mapping for large I/O sizes and mem registration if such issues exist.
>
> I have a server running LIO with memory backed LUNS.
> These are served via a dual port mlx5 (CX4) over ib_srpt
>
> The client mounts these LUNS via ib_srp (mlx5) and device-mapper-multipath
> and I run a simple dd on the XFS file system.
>
> #!/bin/bash
> while true
> do
> 	dd if=/dev/zero of=/data-$1/bigfile bs=4096k count=900
> 	sync;
> 	rm -rf /data-$1/bigfile
> done
>
> Once this passes I run a suite of other tests read/write, direct and buffered.

Laurence,
this is 4MB transactions. can you increase the cmd_sg_entries to the 
maximum and run the test again ?


>
> Thanks
> Laurence
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche Feb. 14, 2017, 5:29 p.m. UTC | #8
On 02/14/2017 09:15 AM, Max Gurtovoy wrote:
> this is 4MB transactions. can you increase the cmd_sg_entries to the
> maximum and run the test again ?

How could that affect the error message Laurence reported? If
cmd_sg_entries is too low then the block layer refuses direct I/O
requests that are too large. From __scsi_init_queue():

	blk_queue_max_segments(q, min_t(unsigned short,
					shost->sg_tablesize,
					SG_MAX_SEGMENTS));

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laurence Oberman Feb. 14, 2017, 5:31 p.m. UTC | #9
----- Original Message -----
> From: "Max Gurtovoy" <maxg@mellanox.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: "Leon Romanovsky" <leon@kernel.org>, "Bart Van Assche" <Bart.VanAssche@sandisk.com>, hch@lst.de,
> israelr@mellanox.com, linux-rdma@vger.kernel.org, dledford@redhat.com
> Sent: Tuesday, February 14, 2017 12:15:20 PM
> Subject: Re: v4.10-rc SRP + mlx5 regression
> 
> 
> 
> On 2/14/2017 3:31 PM, Laurence Oberman wrote:
> >
> >
> > ----- Original Message -----
> >> From: "Max Gurtovoy" <maxg@mellanox.com>
> >> To: "Leon Romanovsky" <leon@kernel.org>, "Laurence Oberman"
> >> <loberman@redhat.com>
> >> Cc: "Bart Van Assche" <Bart.VanAssche@sandisk.com>, hch@lst.de,
> >> israelr@mellanox.com, linux-rdma@vger.kernel.org,
> >> dledford@redhat.com
> >> Sent: Tuesday, February 14, 2017 5:00:04 AM
> >> Subject: Re: v4.10-rc SRP + mlx5 regression
> >>
> >> Hi Laurence,
> >> can you specify the test that repro these failures ?
> >> have you tried running with CX5 HCA or only CX4 ?
> >> I think this commit is right and we have issues in other places.
> >>
> >>
> >> On 2/14/2017 8:39 AM, Leon Romanovsky wrote:
> >>> On Mon, Feb 13, 2017 at 09:19:54PM -0500, Laurence Oberman wrote:
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
> >>>>> To: leon@kernel.org, loberman@redhat.com
> >>>>> Cc: hch@lst.de, maxg@mellanox.com, israelr@mellanox.com,
> >>>>> linux-rdma@vger.kernel.org, dledford@redhat.com
> >>>>> Sent: Monday, February 13, 2017 4:52:28 PM
> >>>>> Subject: Re: v4.10-rc SRP + mlx5 regression
> >>>>>
> >>>>> On Mon, 2017-02-13 at 16:46 -0500, Laurence Oberman wrote:
> >>>>>> I will have to run through this again and see where the bisect went
> >>>>>> wrong.
> >>>>>
> >>>>> Hello Laurence,
> >>>>>
> >>>>> If you would be considering to repeat the bisect, did you know that a
> >>>>> bisect
> >>>>> can be sped up by specifying the names of the files and/or directories
> >>>>> that
> >>>>> are suspected? An example:
> >>>>>
> >>>>> git bisect start */infiniband */net
> >>>>>
> >>>>> Bart.--
> >>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> >>>>> in
> >>>>> the body of a message to majordomo@vger.kernel.org
> >>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>
> >>>>
> >>>> Hello Bart,
> >>>>
> >>>> Much better news this time :), worked late on this but got it figured
> >>>> out.
> >>>>
> >>>> OK, so we got to this one, which makes a lot more sense and is right in
> >>>> the area where we are having issues.
> >>>> I must have answered wrong to one of the steps the first time I did the
> >>>> bisect.
> >>>>
> >>>> Reverted this in the master tree of rc8 and rebuilt the kernel
> >>>> Now all tests pass on Linus's tree - 4.10.0_rc8+
> >>>>
> >>>> The interesting point here is that this commit is in rc5 but rc5 was not
> >>>> failing so we have an interoperability issue with this commit
> >>>>
> >>>>
> >>>> [loberman@ibclient linux]$ git bisect good
> >>>> Bisecting: 0 revisions left to test after this (roughly 1 step)
> >>>> [ad8e66b4a80182174f73487ed25fd2140cf43361] IB/srp: fix mr allocation
> >>>> when
> >>>> the device supports sg gaps
> >>>>
> >>>> [loberman@ibclient linux]$ git show
> >>>> ad8e66b4a80182174f73487ed25fd2140cf43361
> >>>> commit ad8e66b4a80182174f73487ed25fd2140cf43361
> >>>> Author: Israel Rukshin <israelr@mellanox.com>
> >>>> Date:   Wed Dec 28 12:48:28 2016 +0200
> >>>>
> >>>>     IB/srp: fix mr allocation when the device supports sg gaps
> >>>>
> >>>>     If the device support arbitrary sg list mapping (device cap
> >>>>     IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
> >>>>     IB_MR_TYPE_SG_GAPS.
> >>>>
> >>>>     Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
> >>>>     Cc: <stable@vger.kernel.org> # 4.7+
> >>>>     Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> >>>>     Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> >>>>     Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> >>>>     Reviewed-by: Mark Bloch <markb@mellanox.com>
> >>>>     Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> >>>>     Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> >>>>     Signed-off-by: Doug Ledford <dledford@redhat.com>
> >>>>
> >>>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> b/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> index 8ddc071..0f67cf9 100644
> >>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> @@ -371,6 +371,7 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
> >>>> ib_device *device,
> >>>>         struct srp_fr_desc *d;
> >>>>         struct ib_mr *mr;
> >>>>         int i, ret = -EINVAL;
> >>>> +       enum ib_mr_type mr_type;
> >>>>
> >>>>         if (pool_size <= 0)
> >>>>                 goto err;
> >>>> @@ -384,9 +385,13 @@ static struct srp_fr_pool
> >>>> *srp_create_fr_pool(struct
> >>>> ib_device *device,
> >>>>         spin_lock_init(&pool->lock);
> >>>>         INIT_LIST_HEAD(&pool->free_list);
> >>>>
> >>>> +       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
> >>>> +               mr_type = IB_MR_TYPE_SG_GAPS;
> >>>> +       else
> >>>> +               mr_type = IB_MR_TYPE_MEM_REG;
> >>>> +
> >>>>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> >>>> -               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
> >>>> -                                max_page_list_len);
> >>>> +               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
> >>>
> >>> First, ib_alloc_mr receives u32 as a third parameter, but int was
> >>> supplied. Second (I can be wrong here), shouldn't max_page_list_len be
> >>> replaced with max_fast_reg_page_list_len?
> >>>
> >>> Thanks
> >>
> >> there is a statement that:
> >>
> >> 	if (srp_dev->use_fast_reg) {
> >>                  srp_dev->max_pages_per_mr =
> >>                          min_t(u32, srp_dev->max_pages_per_mr,
> >>                                attr->max_fast_reg_page_list_len);
> >>          }
> >>
> >> so we take the max_fast_reg_page_list_len in this case.
> >>
> >>>
> >>>>                 if (IS_ERR(mr)) {
> >>>>                         ret = PTR_ERR(mr);
> >>>>                         if (ret == -ENOMEM)
> >>>> (END)
> >>>>
> >>>>
> >>>> So here is the revert patch, but you need to decide how you want to deal
> >>>> with this.
> >>>>
> >>>>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
> >>>>     Laurence Oberman
> >>>>     Traced after bisection to a cause for this failure
> >>>>
> >>>> Tested-by:     Laurence Oberman <loberman@redhat.com>
> >>>> Signed-off-by: Laurence Oberman <loberman@redhat.com>
> >>>>
> >>>> commit 90d169d312a173d5350c1bb36d6daab04c592127
> >>>> Author: Laurence Oberman <loberman@redhat.com>
> >>>> Date:   Mon Feb 13 20:33:32 2017 -0500
> >>>>
> >>>>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
> >>>>     Laurence Oberman
> >>>>     Traced after bisection to a cause for this failure
> >>>>
> >>>>     [  130.437603] mlx5_0:dump_cqe:262:(pid 3812): dump error cqe
> >>>>     [  130.437682] scsi host1: ib_srp: failed RECV status WR flushed (5)
> >>>>     for CQE ffff8817f0edbfb0
> >>>>     [  130.510899] 00000000 00000000 00000000 00000000
> >>>>     [  130.536455] 00000000 00000000 00000000 00000000
> >>>>     [  130.561878] 00000000 00000000 00000000 00000000
> >>>>     [  130.585904] 00000000 0f007806 2500002a db0ec4d0
> >>>>     [  145.842925] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> >>>>     [  146.530439] scsi host1: ib_srp: reconnect succeeded
> >>>>     [  146.566629] mlx5_0:dump_cqe:262:(pid 3293): dump error cqe
> >>>>     [  146.597635] 00000000 00000000 00000000 00000000
> >>>>     [  146.623545] 00000000 00000000 00000000 00000000
> >>>>     [  146.649599] 00000000 00000000 00000000 00000000
> >>>>     [  146.673938] 00000000 0f007806 25000032 000c46d0
> >>>>     [  146.697969] scsi host1: ib_srp: failed FAST REG status memory
> >>>>     management operation error (6) for CQE ffff88
> >>>>     [  162.225247] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> >>>>     [  162.256337] scsi host1: ib_srp: reconnect succeeded
> >>>>     [  162.293396] scsi host1: ib_srp: failed RECV status WR flushed (5)
> >>>>     for CQE ffff8817f0412ef0`
> >>>>
> >>>>     This reverts commit ad8e66b4a80182174f73487ed25fd2140cf43361.
> >>>>
> >>>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> b/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> index 79bf484..01338c8 100644
> >>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> @@ -371,7 +371,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
> >>>> ib_device *device,
> >>>>         struct srp_fr_desc *d;
> >>>>         struct ib_mr *mr;
> >>>>         int i, ret = -EINVAL;
> >>>> -       enum ib_mr_type mr_type;
> >>>>
> >>>>         if (pool_size <= 0)
> >>>>                 goto err;
> >>>> @@ -385,13 +384,9 @@ static struct srp_fr_pool
> >>>> *srp_create_fr_pool(struct
> >>>> ib_device *device,
> >>>>         spin_lock_init(&pool->lock);
> >>>>         INIT_LIST_HEAD(&pool->free_list);
> >>>>
> >>>> -       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
> >>>> -               mr_type = IB_MR_TYPE_SG_GAPS;
> >>>> -       else
> >>>> -               mr_type = IB_MR_TYPE_MEM_REG;
> >>>> -
> >>>>         for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> >>>> -               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
> >>>> +               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
> >>>> +                                max_page_list_len);
> >>>>                 if (IS_ERR(mr)) {
> >>>>                         ret = PTR_ERR(mr);
> >>>>                         if (ret == -ENOMEM)
> >>>>
> >>>>
> >>>>
> >>>> Now moving on to what got me here in the first place.
> >>>> Bart, let me know if the 7 of the 8 patches in your most recent series
> >>>> are
> >>>> all still valid after this revert
> >>>> Otherwise let me know which ones you want me to apply.
> >>>>
> >>>> patch 6 - I am thinking i sno longer valid.
> >>>> "
> >>>> If a HCA supports the SG_GAPS_REG feature then a single memory
> >>>> region of type IB_MR_TYPE_SG_GAPS is sufficient. This patch
> >>>> reduces the number of memory regions that is allocated per SRP
> >>>> session.
> >>>> "
> >>>>
> >>>> Thanks
> >>>> Laurence
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> > Hello Max,
> >
> > I only have CX4 and CX3 in my lab, this test bed only has CX4.
> >
> > CA 'mlx5_0'
> > 	CA type: MT4115
> > 	Number of ports: 1
> > 	Firmware version: 12.14.2036
> > 	Hardware version: 0
> > 	Node GUID: 0x7cfe900300726ed2
> > 	System image GUID: 0x7cfe900300726ed2
> > 	Port 1:
> > 		State: Active
> > 		Physical state: LinkUp
> > 		Rate: 100
> > 		Base lid: 3
> > 		LMC: 0
> > 		SM lid: 3
> > 		Capability mask: 0x2651e84a
> > 		Port GUID: 0x7cfe900300726ed2
> > 		Link layer: InfiniBand
> >
> > The test is simple, it's the same one I start with every time because it
> > always
> > brings out issues with mapping for large I/O sizes and mem registration if
> > such issues exist.
> >
> > I have a server running LIO with memory backed LUNS.
> > These are served via a dual port mlx5 (CX4) over ib_srpt
> >
> > The client mounts these LUNS via ib_srp (mlx5) and device-mapper-multipath
> > and I run a simple dd on the XFS file system.
> >
> > #!/bin/bash
> > while true
> > do
> > 	dd if=/dev/zero of=/data-$1/bigfile bs=4096k count=900
> > 	sync;
> > 	rm -rf /data-$1/bigfile
> > done
> >
> > Once this passes I run a suite of other tests read/write, direct and
> > buffered.
> 
> Laurence,
> this is 4MB transactions. can you increase the cmd_sg_entries to the
> maximum and run the test again ?
> 
> 
> >
> > Thanks
> > Laurence
> >
> 

Hello Max,

Yes 4MB is very important for one of our biggest RHEL customers and I worked many hours with Bart last year to stabilize large 4MB
buffered and direct I/O for ib_srp/ib_srpt.

I am already running with:
options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048

Regards and thanks for your assistance

Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8ddc071..0f67cf9 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -371,6 +371,7 @@  static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
        struct srp_fr_desc *d;
        struct ib_mr *mr;
        int i, ret = -EINVAL;
+       enum ib_mr_type mr_type;
 
        if (pool_size <= 0)
                goto err;
@@ -384,9 +385,13 @@  static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
        spin_lock_init(&pool->lock);
        INIT_LIST_HEAD(&pool->free_list);
 
+       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
+               mr_type = IB_MR_TYPE_SG_GAPS;
+       else
+               mr_type = IB_MR_TYPE_MEM_REG;
+
        for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
-               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
-                                max_page_list_len);
+               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
                if (IS_ERR(mr)) {
                        ret = PTR_ERR(mr);
                        if (ret == -ENOMEM)
(END)


So here is the revert patch, but you need to decide how you want to deal with this.

    Revert "IB/srp: fix mr allocation when the device supports sg gaps"
    Laurence Oberman
    Traced after bisection to a cause for this failure

Tested-by:     Laurence Oberman <loberman@redhat.com>
Signed-off-by: Laurence Oberman <loberman@redhat.com>

commit 90d169d312a173d5350c1bb36d6daab04c592127
Author: Laurence Oberman <loberman@redhat.com>
Date:   Mon Feb 13 20:33:32 2017 -0500

    Revert "IB/srp: fix mr allocation when the device supports sg gaps"
    Laurence Oberman
    Traced after bisection to a cause for this failure
    
    [  130.437603] mlx5_0:dump_cqe:262:(pid 3812): dump error cqe
    [  130.437682] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f0edbfb0
    [  130.510899] 00000000 00000000 00000000 00000000
    [  130.536455] 00000000 00000000 00000000 00000000
    [  130.561878] 00000000 00000000 00000000 00000000
    [  130.585904] 00000000 0f007806 2500002a db0ec4d0
    [  145.842925] fast_io_fail_tmo expired for SRP port-1:1 / host1.
    [  146.530439] scsi host1: ib_srp: reconnect succeeded
    [  146.566629] mlx5_0:dump_cqe:262:(pid 3293): dump error cqe
    [  146.597635] 00000000 00000000 00000000 00000000
    [  146.623545] 00000000 00000000 00000000 00000000
    [  146.649599] 00000000 00000000 00000000 00000000
    [  146.673938] 00000000 0f007806 25000032 000c46d0
    [  146.697969] scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff88
    [  162.225247] fast_io_fail_tmo expired for SRP port-1:1 / host1.
    [  162.256337] scsi host1: ib_srp: reconnect succeeded
    [  162.293396] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f0412ef0`
    
    This reverts commit ad8e66b4a80182174f73487ed25fd2140cf43361.

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 79bf484..01338c8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -371,7 +371,6 @@  static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
        struct srp_fr_desc *d;
        struct ib_mr *mr;
        int i, ret = -EINVAL;
-       enum ib_mr_type mr_type;
 
        if (pool_size <= 0)
                goto err;
@@ -385,13 +384,9 @@  static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
        spin_lock_init(&pool->lock);
        INIT_LIST_HEAD(&pool->free_list);
 
-       if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
-               mr_type = IB_MR_TYPE_SG_GAPS;
-       else
-               mr_type = IB_MR_TYPE_MEM_REG;
-
        for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
-               mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
+               mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
+                                max_page_list_len);
                if (IS_ERR(mr)) {
                        ret = PTR_ERR(mr);
                        if (ret == -ENOMEM)