mbox series

[0/3] sd: Rely on the driver core for asynchronous probing

Message ID 20181002215223.20453-1-bvanassche@acm.org (mailing list archive)
Headers show
Series sd: Rely on the driver core for asynchronous probing | expand

Message

Bart Van Assche Oct. 2, 2018, 9:52 p.m. UTC
Hello Martin,

During the 2018 edition of LSF/MM there was a session about increasing SCSI
disk probing concurrency. This patch series implements what has been proposed
during that session, namely:
- Make sure that the driver core is aware of asynchronous SCSI LUN probing.
- Avoid unnecessary serialization between sd_probe() and sd_remove() because
  this could lead to a deadlock.

Please consider this patch series for kernel v4.20.

Thanks,

Bart.

Bart Van Assche (3):
  sd: Rely on the driver core for asynchronous probing
  sd: Inline sd_probe_part2()
  __device_release_driver(): Do not wait for asynchronous probing

 drivers/base/dd.c        |   3 --
 drivers/scsi/scsi.c      |  14 -----
 drivers/scsi/scsi_pm.c   |  22 +-------
 drivers/scsi/scsi_priv.h |   3 --
 drivers/scsi/sd.c        | 110 ++++++++++++++++-----------------------
 5 files changed, 46 insertions(+), 106 deletions(-)

Comments

Bart Van Assche Oct. 14, 2018, 7:43 p.m. UTC | #1
On 10/2/18 2:52 PM, Bart Van Assche wrote:
> During the 2018 edition of LSF/MM there was a session about increasing SCSI
> disk probing concurrency. This patch series implements what has been proposed
> during that session, namely:
> - Make sure that the driver core is aware of asynchronous SCSI LUN probing.
> - Avoid unnecessary serialization between sd_probe() and sd_remove() because
>    this could lead to a deadlock.

Has anyone already had the time to have a closer look at this patch series?

Thanks,

Bart.
Martin K. Petersen Oct. 16, 2018, 5:19 a.m. UTC | #2
Bart,

> During the 2018 edition of LSF/MM there was a session about increasing
> SCSI disk probing concurrency. This patch series implements what has
> been proposed during that session, namely: - Make sure that the driver
> core is aware of asynchronous SCSI LUN probing.  - Avoid unnecessary
> serialization between sd_probe() and sd_remove() because this could
> lead to a deadlock.

I like it.

What kind of testing have you done with $BIGNUM devices? Got any numbers
to share?
Bart Van Assche Oct. 16, 2018, 6:29 p.m. UTC | #3
On Tue, 2018-10-16 at 01:19 -0400, Martin K. Petersen wrote:
> Bart,
> 
> > During the 2018 edition of LSF/MM there was a session about increasing
> > SCSI disk probing concurrency. This patch series implements what has
> > been proposed during that session, namely: - Make sure that the driver
> > core is aware of asynchronous SCSI LUN probing.  - Avoid unnecessary
> > serialization between sd_probe() and sd_remove() because this could
> > lead to a deadlock.
> 
> I like it.
> 
> What kind of testing have you done with $BIGNUM devices? Got any numbers
> to share?

Hi Martin,

For the following test:

modprobe -r scsi_debug; time modprobe scsi_debug delay=0 max_luns=256

I obtained the following time measurements:
* Kernel 4.15.0: 2.2s.
* Kernel 4.19-rc6 with this patch series applied: 2.3s.

This is not what I had expected. I think this small increase is because the
current sd code scans multiple LUNs associated with the same SCSI host
concurrently while apparently the device core scans such LUNs sequentially.
From the driver core:

void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
{
	[ ... ]
	bus_for_each_drv(dev->bus, NULL, &data, __device_attach_driver);
	[ ... ]
}

static int __device_attach(struct device *dev, bool allow_async)
{
	[ ... ]
			async_schedule(__device_attach_async_helper, dev);
	[ ... ]
}

Do you want me to look into modifying __device_attach_async_helper() such
that it calls __device_attach_driver() concurrently instead of sequentially
in case of multiple LUNs?

Thanks,

Bart.
Hannes Reinecke Oct. 26, 2018, 1:58 p.m. UTC | #4
On 10/2/18 11:52 PM, Bart Van Assche wrote:
> Hello Martin,
> 
> During the 2018 edition of LSF/MM there was a session about increasing SCSI
> disk probing concurrency. This patch series implements what has been proposed
> during that session, namely:
> - Make sure that the driver core is aware of asynchronous SCSI LUN probing.
> - Avoid unnecessary serialization between sd_probe() and sd_remove() because
>    this could lead to a deadlock.
> 
> Please consider this patch series for kernel v4.20.
> 
> Thanks,
> 
> Bart.
> 
> Bart Van Assche (3):
>    sd: Rely on the driver core for asynchronous probing
>    sd: Inline sd_probe_part2()
>    __device_release_driver(): Do not wait for asynchronous probing
> 
>   drivers/base/dd.c        |   3 --
>   drivers/scsi/scsi.c      |  14 -----
>   drivers/scsi/scsi_pm.c   |  22 +-------
>   drivers/scsi/scsi_priv.h |   3 --
>   drivers/scsi/sd.c        | 110 ++++++++++++++++-----------------------
>   5 files changed, 46 insertions(+), 106 deletions(-)
> 
Initially it looks ok-ish.
But I've has so many issues with asynchronous probing that I'd be really 
careful here.
However, inlining sd_probe_async() should amount for some contention to 
be removed; but I'd rather give it some more testing here.
I've found that testing on large configurations on iSER or SRP tend to 
excite the issues.
Can you check against those, too?
I'll see to give it a spin on my test bed.

Cheers,

Hannes
Bart Van Assche Oct. 26, 2018, 3:13 p.m. UTC | #5
On Fri, 2018-10-26 at 15:58 +0200, Hannes Reinecke wrote:
> On 10/2/18 11:52 PM, Bart Van Assche wrote:
> > Hello Martin,
> > 
> > During the 2018 edition of LSF/MM there was a session about increasing SCSI
> > disk probing concurrency. This patch series implements what has been proposed
> > during that session, namely:
> > - Make sure that the driver core is aware of asynchronous SCSI LUN probing.
> > - Avoid unnecessary serialization between sd_probe() and sd_remove() because
> >    this could lead to a deadlock.
> > 
> > Please consider this patch series for kernel v4.20.
> > 
> > Thanks,
> > 
> > Bart.
> > 
> > Bart Van Assche (3):
> >    sd: Rely on the driver core for asynchronous probing
> >    sd: Inline sd_probe_part2()
> >    __device_release_driver(): Do not wait for asynchronous probing
> > 
> >   drivers/base/dd.c        |   3 --
> >   drivers/scsi/scsi.c      |  14 -----
> >   drivers/scsi/scsi_pm.c   |  22 +-------
> >   drivers/scsi/scsi_priv.h |   3 --
> >   drivers/scsi/sd.c        | 110 ++++++++++++++++-----------------------
> >   5 files changed, 46 insertions(+), 106 deletions(-)
> > 
> 
> Initially it looks ok-ish.
> But I've has so many issues with asynchronous probing that I'd be really 
> careful here.
> However, inlining sd_probe_async() should amount for some contention to 
> be removed; but I'd rather give it some more testing here.
> I've found that testing on large configurations on iSER or SRP tend to 
> excite the issues.
> Can you check against those, too?
> I'll see to give it a spin on my test bed.

Hi Hannes,

As you may have noticed the build bot was not very happy with the patches in
v2 of this series for the device driver core. Additional testing is welcome
but please start from the patches in the following branch for any tests:
https://github.com/bvanassche/linux/tree/for-next. On that branch my patches
for the device driver core have been replaced with patches from Alexander Duyck.

Thanks,

Bart.
Laurence Oberman Oct. 28, 2018, 2:04 p.m. UTC | #6
On Tue, 2018-10-02 at 14:52 -0700, Bart Van Assche wrote:
> Hello Martin,
> 
> During the 2018 edition of LSF/MM there was a session about
> increasing SCSI
> disk probing concurrency. This patch series implements what has been
> proposed
> during that session, namely:
> - Make sure that the driver core is aware of asynchronous SCSI LUN
> probing.
> - Avoid unnecessary serialization between sd_probe() and sd_remove()
> because
>   this could lead to a deadlock.
> 
> Please consider this patch series for kernel v4.20.
> 
> Thanks,
> 
> Bart.
> 
> Bart Van Assche (3):
>   sd: Rely on the driver core for asynchronous probing
>   sd: Inline sd_probe_part2()
>   __device_release_driver(): Do not wait for asynchronous probing
> 
>  drivers/base/dd.c        |   3 --
>  drivers/scsi/scsi.c      |  14 -----
>  drivers/scsi/scsi_pm.c   |  22 +-------
>  drivers/scsi/scsi_priv.h |   3 --
>  drivers/scsi/sd.c        | 110 ++++++++++++++++---------------------
> --
>  5 files changed, 46 insertions(+), 106 deletions(-)
> 

I ran multiple tests for two days using the series from Bart's tree on
the SRP test bed here.

Multiple disconnects/reconnects during heavy IO activity on 32 SRP 64
LUNS/ 32 mpaths with no issues seen with probes and reconnects.

For the series.

Tested-by: Laurence Oberman <loberman@redhat.com>
Bart Van Assche Oct. 28, 2018, 9:05 p.m. UTC | #7
On 10/28/18 7:04 AM, Laurence Oberman wrote:
> I ran multiple tests for two days using the series from Bart's tree on
> the SRP test bed here.
> 
> Multiple disconnects/reconnects during heavy IO activity on 32 SRP 64
> LUNS/ 32 mpaths with no issues seen with probes and reconnects.
> 
> For the series.
> 
> Tested-by: Laurence Oberman <loberman@redhat.com>

Thanks Laurence for the testing!

In case this would not be clear, Laurence tested commit 55d2bab790e2 
("sd: Inline sd_probe_part2()") from my github repository.

Bart.