Message ID | 20181002215223.20453-1-bvanassche@acm.org (mailing list archive) |
---|---|
Headers | show |
Series | sd: Rely on the driver core for asynchronous probing | expand |
On 10/2/18 2:52 PM, Bart Van Assche wrote: > During the 2018 edition of LSF/MM there was a session about increasing SCSI > disk probing concurrency. This patch series implements what has been proposed > during that session, namely: > - Make sure that the driver core is aware of asynchronous SCSI LUN probing. > - Avoid unnecessary serialization between sd_probe() and sd_remove() because > this could lead to a deadlock. Has anyone already had the time to have a closer look at this patch series? Thanks, Bart.
Bart, > During the 2018 edition of LSF/MM there was a session about increasing > SCSI disk probing concurrency. This patch series implements what has > been proposed during that session, namely: - Make sure that the driver > core is aware of asynchronous SCSI LUN probing. - Avoid unnecessary > serialization between sd_probe() and sd_remove() because this could > lead to a deadlock. I like it. What kind of testing have you done with $BIGNUM devices? Got any numbers to share?
On Tue, 2018-10-16 at 01:19 -0400, Martin K. Petersen wrote: > Bart, > > > During the 2018 edition of LSF/MM there was a session about increasing > > SCSI disk probing concurrency. This patch series implements what has > > been proposed during that session, namely: - Make sure that the driver > > core is aware of asynchronous SCSI LUN probing. - Avoid unnecessary > > serialization between sd_probe() and sd_remove() because this could > > lead to a deadlock. > > I like it. > > What kind of testing have you done with $BIGNUM devices? Got any numbers > to share? Hi Martin, For the following test: modprobe -r scsi_debug; time modprobe scsi_debug delay=0 max_luns=256 I obtained the following time measurements: * Kernel 4.15.0: 2.2s. * Kernel 4.19-rc6 with this patch series applied: 2.3s. This is not what I had expected. I think this small increase is because the current sd code scans multiple LUNs associated with the same SCSI host concurrently while apparently the device core scans such LUNs sequentially. From the driver core: void __device_attach_async_helper(void *_dev, async_cookie_t cookie) { [ ... ] bus_for_each_drv(dev->bus, NULL, &data, __device_attach_driver); [ ... ] } static int __device_attach(struct device *dev, bool allow_async) { [ ... ] async_schedule(__device_attach_async_helper, dev); [ ... ] } Do you want me to look into modifying __device_attach_async_helper() such that it calls __device_attach_driver() concurrently instead of sequentially in case of multiple LUNs? Thanks, Bart.
On 10/2/18 11:52 PM, Bart Van Assche wrote: > Hello Martin, > > During the 2018 edition of LSF/MM there was a session about increasing SCSI > disk probing concurrency. This patch series implements what has been proposed > during that session, namely: > - Make sure that the driver core is aware of asynchronous SCSI LUN probing. > - Avoid unnecessary serialization between sd_probe() and sd_remove() because > this could lead to a deadlock. > > Please consider this patch series for kernel v4.20. > > Thanks, > > Bart. > > Bart Van Assche (3): > sd: Rely on the driver core for asynchronous probing > sd: Inline sd_probe_part2() > __device_release_driver(): Do not wait for asynchronous probing > > drivers/base/dd.c | 3 -- > drivers/scsi/scsi.c | 14 ----- > drivers/scsi/scsi_pm.c | 22 +------- > drivers/scsi/scsi_priv.h | 3 -- > drivers/scsi/sd.c | 110 ++++++++++++++++----------------------- > 5 files changed, 46 insertions(+), 106 deletions(-) > Initially it looks ok-ish. But I've has so many issues with asynchronous probing that I'd be really careful here. However, inlining sd_probe_async() should amount for some contention to be removed; but I'd rather give it some more testing here. I've found that testing on large configurations on iSER or SRP tend to excite the issues. Can you check against those, too? I'll see to give it a spin on my test bed. Cheers, Hannes
On Fri, 2018-10-26 at 15:58 +0200, Hannes Reinecke wrote: > On 10/2/18 11:52 PM, Bart Van Assche wrote: > > Hello Martin, > > > > During the 2018 edition of LSF/MM there was a session about increasing SCSI > > disk probing concurrency. This patch series implements what has been proposed > > during that session, namely: > > - Make sure that the driver core is aware of asynchronous SCSI LUN probing. > > - Avoid unnecessary serialization between sd_probe() and sd_remove() because > > this could lead to a deadlock. > > > > Please consider this patch series for kernel v4.20. > > > > Thanks, > > > > Bart. > > > > Bart Van Assche (3): > > sd: Rely on the driver core for asynchronous probing > > sd: Inline sd_probe_part2() > > __device_release_driver(): Do not wait for asynchronous probing > > > > drivers/base/dd.c | 3 -- > > drivers/scsi/scsi.c | 14 ----- > > drivers/scsi/scsi_pm.c | 22 +------- > > drivers/scsi/scsi_priv.h | 3 -- > > drivers/scsi/sd.c | 110 ++++++++++++++++----------------------- > > 5 files changed, 46 insertions(+), 106 deletions(-) > > > > Initially it looks ok-ish. > But I've has so many issues with asynchronous probing that I'd be really > careful here. > However, inlining sd_probe_async() should amount for some contention to > be removed; but I'd rather give it some more testing here. > I've found that testing on large configurations on iSER or SRP tend to > excite the issues. > Can you check against those, too? > I'll see to give it a spin on my test bed. Hi Hannes, As you may have noticed the build bot was not very happy with the patches in v2 of this series for the device driver core. Additional testing is welcome but please start from the patches in the following branch for any tests: https://github.com/bvanassche/linux/tree/for-next. On that branch my patches for the device driver core have been replaced with patches from Alexander Duyck. Thanks, Bart.
On Tue, 2018-10-02 at 14:52 -0700, Bart Van Assche wrote: > Hello Martin, > > During the 2018 edition of LSF/MM there was a session about > increasing SCSI > disk probing concurrency. This patch series implements what has been > proposed > during that session, namely: > - Make sure that the driver core is aware of asynchronous SCSI LUN > probing. > - Avoid unnecessary serialization between sd_probe() and sd_remove() > because > this could lead to a deadlock. > > Please consider this patch series for kernel v4.20. > > Thanks, > > Bart. > > Bart Van Assche (3): > sd: Rely on the driver core for asynchronous probing > sd: Inline sd_probe_part2() > __device_release_driver(): Do not wait for asynchronous probing > > drivers/base/dd.c | 3 -- > drivers/scsi/scsi.c | 14 ----- > drivers/scsi/scsi_pm.c | 22 +------- > drivers/scsi/scsi_priv.h | 3 -- > drivers/scsi/sd.c | 110 ++++++++++++++++--------------------- > -- > 5 files changed, 46 insertions(+), 106 deletions(-) > I ran multiple tests for two days using the series from Bart's tree on the SRP test bed here. Multiple disconnects/reconnects during heavy IO activity on 32 SRP 64 LUNS/ 32 mpaths with no issues seen with probes and reconnects. For the series. Tested-by: Laurence Oberman <loberman@redhat.com>
On 10/28/18 7:04 AM, Laurence Oberman wrote: > I ran multiple tests for two days using the series from Bart's tree on > the SRP test bed here. > > Multiple disconnects/reconnects during heavy IO activity on 32 SRP 64 > LUNS/ 32 mpaths with no issues seen with probes and reconnects. > > For the series. > > Tested-by: Laurence Oberman <loberman@redhat.com> Thanks Laurence for the testing! In case this would not be clear, Laurence tested commit 55d2bab790e2 ("sd: Inline sd_probe_part2()") from my github repository. Bart.