diff mbox

[11/11] IB/srp: Prevent mapping failures

Message ID 572A8975.9060606@sandisk.com (mailing list archive)
State Superseded
Headers show

Commit Message

Bart Van Assche May 4, 2016, 11:44 p.m. UTC
On 05/03/2016 02:33 AM, Christoph Hellwig wrote:
> On Fri, Apr 22, 2016 at 02:16:31PM -0700, Bart Van Assche wrote:
>> If both max_sectors and the queue_depth are high enough it can
>> happen that the MR pool is depleted temporarily. This causes
>> the SRP initiator to report mapping failures. Although the SRP
> initiator recovers from such mapping failures, prevent this
> from happening by limiting max_sectors.
> 
> FYI, even with this patch I see tons of errors like:
> 
> [ 2237.161106] scsi host7: ib_srp: Failed to map data (-12)

Hello Christoph,

The patch below makes these messages disappear on my setup. Are
you OK with including this change in patch 11/11?


Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Christoph Hellwig May 5, 2016, 2:06 p.m. UTC | #1
On Wed, May 04, 2016 at 04:44:53PM -0700, Bart Van Assche wrote:
> > FYI, even with this patch I see tons of errors like:
> > 
> > [ 2237.161106] scsi host7: ib_srp: Failed to map data (-12)
> 
> Hello Christoph,
> 
> The patch below makes these messages disappear on my setup. Are
> you OK with including this change in patch 11/11?

Looks fine to me.

Btw, while we're at it - any chance you could move the fixes before
the cleanups in the series so that they are more easily backportable to
-stable?
Bart Van Assche May 5, 2016, 7:50 p.m. UTC | #2
On 05/05/2016 07:06 AM, Christoph Hellwig wrote:
> Btw, while we're at it - any chance you could move the fixes before
> the cleanups in the series so that they are more easily backportable to
> -stable?

That sounds like a good idea to me. I will do that.

Bart.
Laurence Oberman May 5, 2016, 7:58 p.m. UTC | #3
----- Original Message -----
> From: "Christoph Hellwig" <hch@lst.de>
> To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> Cc: "Christoph Hellwig" <hch@lst.de>, "Doug Ledford" <dledford@redhat.com>, "Sagi Grimberg" <sagi@grimberg.me>,
> "Laurence Oberman" <loberman@redhat.com>, linux-rdma@vger.kernel.org
> Sent: Thursday, May 5, 2016 10:06:16 AM
> Subject: Re: [PATCH 11/11] IB/srp: Prevent mapping failures
> 
> On Wed, May 04, 2016 at 04:44:53PM -0700, Bart Van Assche wrote:
> > > FYI, even with this patch I see tons of errors like:
> > > 
> > > [ 2237.161106] scsi host7: ib_srp: Failed to map data (-12)
> > 
> > Hello Christoph,
> > 
> > The patch below makes these messages disappear on my setup. Are
> > you OK with including this change in patch 11/11?
> 
> Looks fine to me.
> 
> Btw, while we're at it - any chance you could move the fixes before
> the cleanups in the series so that they are more easily backportable to
> -stable?

I added and tested this patch.
I left the warn message in as I need it.

[root@srptest ~]# dmesg | grep Reducing
[  551.526015] scsi host4: ib_srp: Reducing max_sectors from 8192 to 4080
[  559.367609] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
[  570.449121] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
[  577.831703] scsi host6: ib_srp: Reducing max_sectors from 8192 to 4080

It's stable, but only because we constrain max_sectors.
I am writing 4 MB buffered I/O to 10 file systems (ext3); however, max_sectors_kb is constrained to 2040 and it's stable.

I have been told that RHEL6 and the Mofed stack happily do 4 MB buffered I/O without mapping failures, but I need to check that out.
That would make no sense to me unless the Mofed stack has code that differs enough from what we have upstream to make a difference.

I am about to install RHEL6 and Mofed to compare and see what happens with 4MB buffered.

Tested-by: Laurence Oberman <loberman@redhat.com>
Bart Van Assche May 5, 2016, 8:14 p.m. UTC | #4
On 05/05/2016 12:58 PM, Laurence Oberman wrote:
> I added and tested this patch.
> I left the warn message in as I need it.
>
> [root@srptest ~]# dmesg | grep Reducing
> [  551.526015] scsi host4: ib_srp: Reducing max_sectors from 8192 to 4080
> [  559.367609] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
> [  570.449121] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
> [  577.831703] scsi host6: ib_srp: Reducing max_sectors from 8192 to 4080
>
> It's stable, but only because we constrain max_sectors.
> I am writing 4 MB buffered I/O to 10 file systems (ext3); however, max_sectors_kb is constrained to 2040 and it's stable.
>
> I have been told that RHEL6 and the Mofed stack happily do 4 MB buffered I/O without mapping failures, but I need to check that out.
> That would make no sense to me unless the Mofed stack has code that differs enough from what we have upstream to make a difference.
>
> I am about to install RHEL6 and Mofed to compare and see what happens with 4MB buffered.
>
> Tested-by: Laurence Oberman <loberman@redhat.com>

Thanks for the help with testing :-)

Can you try to increase SRP_MAX_PAGES_PER_MR in ib_srp.h from 512 to 
1024 and see whether that allows 4MB I/O?

Thanks,

Bart.
Bart Van Assche May 5, 2016, 8:17 p.m. UTC | #5
On 05/05/2016 01:14 PM, Bart Van Assche wrote:
> On 05/05/2016 12:58 PM, Laurence Oberman wrote:
>> I added and tested this patch.
>> I left the warn message in as I need it.
>>
>> [root@srptest ~]# dmesg | grep Reducing
>> [  551.526015] scsi host4: ib_srp: Reducing max_sectors from 8192 to 4080
>> [  559.367609] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
>> [  570.449121] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
>> [  577.831703] scsi host6: ib_srp: Reducing max_sectors from 8192 to 4080
>>
>> It's stable, but only because we constrain max_sectors.
>> I am writing 4 MB buffered I/O to 10 file systems (ext3); however, max_sectors_kb is constrained to 2040 and it's stable.
>>
>> I have been told that RHEL6 and the Mofed stack happily do 4 MB buffered I/O without mapping failures, but I need to check that out.
>> That would make no sense to me unless the Mofed stack has code that differs enough from what we have upstream to make a difference.
>>
>> I am about to install RHEL6 and Mofed to compare and see what happens with 4MB buffered.
>>
>> Tested-by: Laurence Oberman <loberman@redhat.com>
>
> Thanks for the help with testing :-)
>
> Can you try to increase SRP_MAX_PAGES_PER_MR in ib_srp.h from 512 to
> 1024 and see whether that allows 4MB I/O?

(replying to my own e-mail)

That was a typo - I meant 1025 instead of 1024.

Bart.

Laurence Oberman May 5, 2016, 8:55 p.m. UTC | #6
----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>, "Christoph Hellwig" <hch@lst.de>
> Cc: "Doug Ledford" <dledford@redhat.com>, "Sagi Grimberg" <sagi@grimberg.me>, linux-rdma@vger.kernel.org
> Sent: Thursday, May 5, 2016 4:17:51 PM
> Subject: Re: [PATCH 11/11] IB/srp: Prevent mapping failures
> 
> On 05/05/2016 01:14 PM, Bart Van Assche wrote:
> > On 05/05/2016 12:58 PM, Laurence Oberman wrote:
> >> I added and tested this patch.
> >> I left the warn message in as I need it.
> >>
> >> [root@srptest ~]# dmesg | grep Reducing
> >> [  551.526015] scsi host4: ib_srp: Reducing max_sectors from 8192 to 4080
> >> [  559.367609] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
> >> [  570.449121] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
> >> [  577.831703] scsi host6: ib_srp: Reducing max_sectors from 8192 to 4080
> >>
> >> It's stable, but only because we constrain max_sectors.
> >> I am writing 4 MB buffered I/O to 10 file systems (ext3); however,
> >> max_sectors_kb is constrained to 2040 and it's stable.
> >>
> >> I have been told that RHEL6 and the Mofed stack happily do 4 MB buffered
> >> I/O without mapping failures, but I need to check that out.
> >> That would make no sense to me unless the Mofed stack has code that
> >> differs enough from what we have upstream to make a difference.
> >>
> >> I am about to install RHEL6 and Mofed to compare and see what happens with
> >> 4MB buffered.
> >>
> >> Tested-by: Laurence Oberman <loberman@redhat.com>
> >
> > Thanks for the help with testing :-)
> >
> > Can you try to increase SRP_MAX_PAGES_PER_MR in ib_srp.h from 512 to
> > 1024 and see whether that allows 4MB I/O?
> 
> (replying to my own e-mail)
> 
> That was a typo - I meant 1025 instead of 1024.
> 
> Bart.
> 
> 
Set at 1025, rebuilt ib_srp module, rebooted client and:

Still get:

[root@srptest ~]# dmesg | grep Reducing
[  227.076550] scsi host4: ib_srp: Reducing max_sectors from 8192 to 4080
[  235.316995] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
[  245.726223] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
[  253.197573] scsi host6: ib_srp: Reducing max_sectors from 8192 to 4080

And then, of course, we are constrained:

[root@srptest queue]# cat max_sectors_kb
2040
[root@srptest queue]# echo 4096 > max_sectors_kb
-bash: echo: write error: Invalid argument

Bart Van Assche May 5, 2016, 9:40 p.m. UTC | #7
On 05/05/2016 01:56 PM, Laurence Oberman wrote:
> Set at 1025, rebuilt ib_srp module, rebooted client and:
>
> Still get:
>
> [root@srptest ~]# dmesg | grep Reducing
> [  227.076550] scsi host4: ib_srp: Reducing max_sectors from 8192 to 4080
> [  235.316995] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
> [  245.726223] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
> [  253.197573] scsi host6: ib_srp: Reducing max_sectors from 8192 to 4080
>
> And then of course we are constrained

Sorry. I forgot that the maximum FR page list length for mlx4 is 511. I 
will come up with another solution.

Bart.

Laurence Oberman May 5, 2016, 10:01 p.m. UTC | #8
----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: "Christoph Hellwig" <hch@lst.de>, "Doug Ledford" <dledford@redhat.com>, "Sagi Grimberg" <sagi@grimberg.me>,
> linux-rdma@vger.kernel.org
> Sent: Thursday, May 5, 2016 5:40:16 PM
> Subject: Re: [PATCH 11/11] IB/srp: Prevent mapping failures
> 
> On 05/05/2016 01:56 PM, Laurence Oberman wrote:
> > Set at 1025, rebuilt ib_srp module, rebooted client and:
> >
> > Still get:
> >
> > [root@srptest ~]# dmesg | grep Reducing
> > [  227.076550] scsi host4: ib_srp: Reducing max_sectors from 8192 to 4080
> > [  235.316995] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
> > [  245.726223] scsi host5: ib_srp: Reducing max_sectors from 8192 to 4080
> > [  253.197573] scsi host6: ib_srp: Reducing max_sectors from 8192 to 4080
> >
> > And then of course we are constrained
> 
> Sorry. I forgot that the maximum FR page list length for mlx4 is 511. I
> will come up with another solution.
> 
> Bart.
> 
> 

Hi Bart

Yep, something has changed upstream here to bring this out.
This time back to testing on mlx5.
I installed RHEL6 + Mofed stack to validate what I was told about RHEL6 + Mofed being stable.

kmod-mlnx-ofa_kernel-3.1-OFED.3.1.1.0.3.1.g9032737.rhel6u7.x86_64
libibverbs-devel-1.1.8mlnx1-OFED.3.1.1.0.0.x86_64
libibcm-1.0.5mlnx2-OFED.3.0.11.gd7d485d.x86_64
librdmacm-1.0.21mlnx-OFED.3.0.1.5.2.x86_64
mlnx-ofa_kernel-3.1-OFED.3.1.1.0.3.1.g9032737.rhel6u7.x86_64
libmlx4-1.0.6mlnx1-OFED.3.1.1.0.0.x86_64
mlnx-ofa_kernel-devel-3.1-OFED.3.1.1.0.3.1.g9032737.rhel6u7.x86_64
mlnx-ofa_kernel-modules-3.1-2.6.32_573.12.1.el6.x86_64_OFED.3.1.1.0.3.1.g9032737.x86_64
libibcm-devel-1.0.5mlnx2-OFED.3.0.11.gd7d485d.x86_64
libmlx5-devel-1.0.2mlnx1-OFED.3.1.1.0.3.x86_64
librdmacm-devel-1.0.21mlnx-OFED.3.0.1.5.2.x86_64
dapl-devel-2.1.5mlnx-OFED.3.0.0.3.3.x86_64
ibacm-1.0.9mlnx1-OFED.3.0.10.ga105e8b.x86_64
libibverbs-utils-1.1.8mlnx1-OFED.3.1.1.0.0.x86_64
libibverbs-1.1.8mlnx1-OFED.3.1.1.0.0.x86_64
kmod-knem-mlnx-1.1.2.90mlnx-OFED.3.1.0.0.7.1.g4b084fc.rhel6u7.x86_64
libmlx4-devel-1.0.6mlnx1-OFED.3.1.1.0.0.x86_64
dapl-2.1.5mlnx-OFED.3.0.0.3.3.x86_64
dapl-utils-2.1.5mlnx-OFED.3.0.0.3.3.x86_64
ofed-scripts-3.1-OFED.3.1.1.0.3.x86_64
librdmacm-utils-1.0.21mlnx-OFED.3.0.1.5.2.x86_64
libmlx5-1.0.2mlnx1-OFED.3.1.1.0.3.x86_64

[root@dhcp40-159 ~]# uname -a
Linux xxxxxxxx 2.6.32-573.12.1.el6.x86_64 #1 SMP Mon Nov 23 12:55:32 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.7 (Santiago)

I can set max_sectors_kb to 4096.

Setting max_sectors_kb to 4096 (8192 sectors x 512 bytes) on all devices:
4096
4096
..
..
4096
4096

When doing buffered I/O I see the request size periodically reach 4096 KB, and what is most important is that we have no mapping errors so far.
So what I have been told about RHEL 6.7 + the Mofed stack being stable seems to be valid.

### RECORD  336 >>> dhcp40-159 <<< (1462485288.001) (Thu May  5 21:54:48 2016) ###
# DISK STATISTICS (/sec)
#                   <---------reads---------><---------writes---------><--------averages--------> Pct
#Time     Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
22:00:35 sdk              0      0    0    0  167936      0   41 4096    4096    12   314     24   986
22:00:35 sdg              0      0    0    0   69660      0   18 3870    3870     5   374     25   457
22:00:35 sdl              0      0    0    0  364544      0   92 3962    3962    48   358     10   997
22:00:35 sdc              0      0    0    0  184320      0   47 3922    3921    16   298     21   995

We need to figure out what is different upstream that gets us into trouble with 4 MB buffered I/O.
Of course, the kernel etc. is totally different from what we have in RHEL 6.7.

Thanks
Laurence
Bart Van Assche May 5, 2016, 10:46 p.m. UTC | #9
On 05/05/2016 03:01 PM, Laurence Oberman wrote:
> Yep, something has changed upstream here to bring this out.

I will modify the SRP initiator such that more MRs are allocated if 
max_sectors > max_sectors_per_mr. That should allow 4 MB I/O even with 
mlx4 and fast registration, and without the risk of encountering a 
mapping failure.

Bart.

Laurence Oberman May 9, 2016, 10:50 p.m. UTC | #10
----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: "Christoph Hellwig" <hch@lst.de>, "Doug Ledford" <dledford@redhat.com>, "Sagi Grimberg" <sagi@grimberg.me>,
> linux-rdma@vger.kernel.org
> Sent: Thursday, May 5, 2016 6:46:49 PM
> Subject: Re: [PATCH 11/11] IB/srp: Prevent mapping failures
> 
> On 05/05/2016 03:01 PM, Laurence Oberman wrote:
> > Yep, something has changed upstream here to bring this out.
> 
> I will modify the SRP initiator such that more MRs are allocated if
> max_sectors > max_sectors_per_mr. That should allow 4 MB I/O even with
> mlx4 and fast registration and without the risk of encountering a
> mapping failure.
> 
> Bart.
> 

Hello Bart

I spent the weekend testing the RHEL6 kernel + Mellanox Ofed. It's rock solid.

On both the latest upstream kernel and the RHEL 7.2 kernel we have mapping failures with 4 MB max_sectors_kb sizes.

In addition, the latest RHEL 7.2 kernel + Mellanox Ofed sees the same mapping issues.
This seems to suggest it's not confined only to the code changes Mellanox has in the Ofed ib_srp driver.

Are we looking at a combination of kernel changes and ib_srp changes here to correct this?
Or do you think this will be resolved in the end by changes to the ib_srp driver code alone?

If there is any other data you want me to capture or debug please do let me know.

I have started investigating the full flow of I/O from the upper layers through the kernel to the ib_srp layer to see what is different in RHEL6 and why this 
issue is not apparent.

Thanks as always for your assistance here.

Laurence
Bart Van Assche May 9, 2016, 10:55 p.m. UTC | #11
On 05/09/2016 03:50 PM, Laurence Oberman wrote:
> Are we looking at a combination of kernel changes and ib_srp changes here to correct this?

Hello Laurence,

I am working on a patch series that will support a max_sectors size of 
4 MB and that will also prevent mapping failures. I hope to finish and 
repost that patch series this week. I will CC you on that patch series.

Bart.
diff mbox

Patch

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index fc1e84b..53c8e08 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -3409,6 +3409,12 @@  static ssize_t srp_create_target(struct device *dev,
 
 	target_host->sg_tablesize = target->sg_tablesize;
 	target->mr_pool_size = target->scsi_host->can_queue;
+	/*
+	 * The indirect data buffer descriptor is contiguous so the memory for
+	 * that buffer will only be registered if register_always is true.
+	 */
+	if (register_always)
+		target->mr_pool_size *= 2;
 	target->indirect_size = target->sg_tablesize *
 				sizeof (struct srp_direct_buf);
 	target->max_iu_len = sizeof (struct srp_cmd) +