[v2] scsi: sr: get rid of sr global mutex

Message ID: 20200218143918.30267-1-merlijn@archive.org
State: Mainlined
Commit: 51a858817dcdbbdee22cb54b0b2b26eb145ca5b6

Commit Message

Merlijn B.W. Wajer Feb. 18, 2020, 2:39 p.m. UTC

When replacing the Big Kernel Lock in commit
2a48fc0ab24241755dc93bfd4f01d68efab47f5a ("block: autoconvert trivial
BKL users to private mutex"), the lock was replaced with a sr-wide lock.

This causes very poor performance when using multiple sr devices, as the
sr driver was not able to execute more than one command to one drive at
any given time, even when there were many CD drives available.

Replace the global mutex with per-sr-device mutex.

Someone tried this patch at the time, but it never made it upstream
because of concerns about possible race conditions. It is not clear
that the patch actually caused those:

https://www.spinics.net/lists/linux-scsi/msg63706.html
https://www.spinics.net/lists/linux-scsi/msg63750.html

Also see

http://lists.xiph.org/pipermail/paranoia/2019-December/001647.html

Signed-off-by: Merlijn Wajer <merlijn@archive.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/scsi/sr.c | 20 +++++++++++---------
 drivers/scsi/sr.h |  2 ++
 2 files changed, 13 insertions(+), 9 deletions(-)

Comments

Christoph Hellwig Feb. 18, 2020, 5:12 p.m. UTC | #1
On Tue, Feb 18, 2020 at 03:39:17PM +0100, Merlijn Wajer wrote:
> When replacing the Big Kernel Lock in commit
> 2a48fc0ab24241755dc93bfd4f01d68efab47f5a ("block: autoconvert trivial
> BKL users to private mutex"), the lock was replaced with a sr-wide lock.
> 
> This causes very poor performance when using multiple sr devices, as the
> sr driver was not able to execute more than one command to one drive at
> any given time, even when there were many CD drives available.
> 
> Replace the global mutex with per-sr-device mutex.

Do we actually need the lock at all?  What is protected by it?

James Bottomley Feb. 18, 2020, 5:20 p.m. UTC | #2
On Tue, 2020-02-18 at 09:12 -0800, Christoph Hellwig wrote:
> On Tue, Feb 18, 2020 at 03:39:17PM +0100, Merlijn Wajer wrote:
> > When replacing the Big Kernel Lock in commit
> > 2a48fc0ab24241755dc93bfd4f01d68efab47f5a ("block: autoconvert
> > trivial BKL users to private mutex"), the lock was replaced with a
> > sr-wide lock.
> > 
> > This causes very poor performance when using multiple sr devices,
> > as the sr driver was not able to execute more than one command to
> > one drive at any given time, even when there were many CD drives
> > available.
> > 
> > Replace the global mutex with per-sr-device mutex.
> 
> Do we actually need the lock at all?  What is protected by it?

We do at least for cdrom_open.  It modifies the cdi structure with no
other protection and concurrent modification would at least screw up
the use counter which is not atomic.  Same reasoning for cdrom_release.

I think the ioctls don't need the mutex (not looked deeply enough) and
certainly the probe only requires it for the idr allocation which has
its own lock, so I don't believe the mutex additions are needed there.

James
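
The concern raised here, a plain integer use counter modified by concurrent opens and releases, can be illustrated outside the kernel. What follows is a minimal userspace sketch using POSIX threads, not the sr or cdrom code: `struct fake_cd`, `open_close_loop()` and `hammer_device()` are hypothetical names, and `pthread_mutex_t` only stands in for the kernel's `struct mutex`.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS      16
#define OPENS_PER_THREAD 100000

/* Hypothetical stand-in for struct scsi_cd: one lock per device. */
struct fake_cd {
	pthread_mutex_t lock;	/* plays the role of cd->lock */
	int use_count;		/* plain int, deliberately not atomic */
};

/* Models repeated open/release pairs: ++ then --, each under the lock. */
static void *open_close_loop(void *arg)
{
	struct fake_cd *cd = arg;

	for (int i = 0; i < OPENS_PER_THREAD; i++) {
		pthread_mutex_lock(&cd->lock);
		cd->use_count++;		/* "open" */
		pthread_mutex_unlock(&cd->lock);

		pthread_mutex_lock(&cd->lock);
		cd->use_count--;		/* "release" */
		pthread_mutex_unlock(&cd->lock);
	}
	return NULL;
}

/*
 * Runs nthreads openers against a single device.
 * Returns the final use count, or -1 if setup failed.
 */
int hammer_device(int nthreads)
{
	struct fake_cd cd = { .use_count = 0 };
	pthread_t tid[MAX_THREADS];
	int started = 0;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		return -1;
	if (pthread_mutex_init(&cd.lock, NULL) != 0)
		return -1;

	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[i], NULL, open_close_loop, &cd) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	pthread_mutex_destroy(&cd.lock);
	return started == nthreads ? cd.use_count : -1;
}
```

With the lock in place, `hammer_device(4)` returns 0 because every increment and decrement is serialized. Dropping the lock/unlock calls turns the counter updates into a data race, which is the failure mode described for `cdrom_open` and `cdrom_release`.
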
Christoph Hellwig Feb. 18, 2020, 5:23 p.m. UTC | #3
On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
> > > Replace the global mutex with per-sr-device mutex.
> > 
> > Do we actually need the lock at all?  What is protected by it?
> 
> We do at least for cdrom_open.  It modifies the cdi structure with no
> other protection and concurrent modification would at least screw up
> the use counter which is not atomic.  Same reasoning for cdrom_release.

Wouldn't the right fix be to add locking to cdrom_open/release instead of
having an undocumented requirement for the callers?

James Bottomley Feb. 18, 2020, 5:28 p.m. UTC | #4
On Tue, 2020-02-18 at 09:23 -0800, Christoph Hellwig wrote:
> On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
> > > > Replace the global mutex with per-sr-device mutex.
> > > 
> > > Do we actually need the lock at all?  What is protected by it?
> > 
> > We do at least for cdrom_open.  It modifies the cdi structure with
> > no other protection and concurrent modification would at least
> > screw up the use counter which is not atomic.  Same reasoning for
> > cdrom_release.
> 
> > Wouldn't the right fix be to add locking to cdrom_open/release instead
> of having an undocumented requirement for the callers?

Yes ... but that's somewhat of a bigger patch because you now have to
reason about the callbacks within cdrom.  There's also the question of
whether you can assume ops->generic_packet() has its own concurrency
protections ... it's certainly true for SCSI, but is it for anything
else?  Although I suppose you can just not care and run the internal
lock over it anyway.

James

Christoph Hellwig Feb. 18, 2020, 5:31 p.m. UTC | #5
On Tue, Feb 18, 2020 at 09:28:34AM -0800, James Bottomley wrote:
> On Tue, 2020-02-18 at 09:23 -0800, Christoph Hellwig wrote:
> > On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
> > > > > Replace the global mutex with per-sr-device mutex.
> > > > 
> > > > Do we actually need the lock at all?  What is protected by it?
> > > 
> > > We do at least for cdrom_open.  It modifies the cdi structure with
> > > no other protection and concurrent modification would at least
> > > screw up the use counter which is not atomic.  Same reasoning for
> > > cdrom_release.
> > 
> > > Wouldn't the right fix be to add locking to cdrom_open/release instead
> > of having an undocumented requirement for the callers?
> 
> Yes ... but that's somewhat of a bigger patch because you now have to
> reason about the callbacks within cdrom.  There's also the question of
> whether you can assume ops->generic_packet() has its own concurrency
> protections ... it's certainly true for SCSI, but is it for anything
> else?  Although I suppose you can just not care and run the internal
> lock over it anyway.

We have 4 instances of struct cdrom_device_ops in the kernel, one of
which has a no-op generic_packet.  So I don't think this should be a
huge project.
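
The alternative discussed in this exchange, taking the lock inside the open/release helpers so that callers carry no undocumented locking requirement, might look roughly like the sketch below. It is a userspace model under stated assumptions, not the cdrom layer: `struct fake_cdi`, `struct fake_cdrom_ops`, `fake_cdrom_open()`, `fake_cdrom_release()` and `demo_open_release()` are hypothetical names, and the driver callback is simply run under the internal lock, as suggested in the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct fake_cdi;

/* Hypothetical driver callbacks; either pointer may be NULL. */
struct fake_cdrom_ops {
	int  (*open)(struct fake_cdi *cdi);
	void (*release)(struct fake_cdi *cdi);
};

/* Hypothetical stand-in for struct cdrom_device_info. */
struct fake_cdi {
	pthread_mutex_t lock;	/* internal lock, owned by the helpers */
	int use_count;
	const struct fake_cdrom_ops *ops;
};

static int fake_cdi_init(struct fake_cdi *cdi,
			 const struct fake_cdrom_ops *ops)
{
	cdi->use_count = 0;
	cdi->ops = ops;
	return pthread_mutex_init(&cdi->lock, NULL);
}

/* Callers need no lock of their own: the helper serializes itself. */
static int fake_cdrom_open(struct fake_cdi *cdi)
{
	int ret = 0;

	pthread_mutex_lock(&cdi->lock);
	if (cdi->ops && cdi->ops->open)
		ret = cdi->ops->open(cdi);	/* callback runs under the lock */
	if (ret == 0)
		cdi->use_count++;
	pthread_mutex_unlock(&cdi->lock);
	return ret;
}

static void fake_cdrom_release(struct fake_cdi *cdi)
{
	pthread_mutex_lock(&cdi->lock);
	assert(cdi->use_count > 0);
	cdi->use_count--;
	/* Only the last user triggers the driver's release callback. */
	if (cdi->use_count == 0 && cdi->ops && cdi->ops->release)
		cdi->ops->release(cdi);
	pthread_mutex_unlock(&cdi->lock);
}

/* Opens twice, releases once; returns the remaining use count. */
int demo_open_release(void)
{
	struct fake_cdi cdi;
	int users;

	if (fake_cdi_init(&cdi, NULL) != 0)
		return -1;
	if (fake_cdrom_open(&cdi) != 0 || fake_cdrom_open(&cdi) != 0)
		return -1;
	fake_cdrom_release(&cdi);

	users = cdi.use_count;
	pthread_mutex_destroy(&cdi.lock);
	return users;
}
```

The trade-off mentioned in the thread shows up in `fake_cdrom_open()`: once the callback runs under the internal lock, every implementation of the callbacks has to be checked for code that might try to take the same lock again.
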
Merlijn B.W. Wajer Feb. 18, 2020, 7:21 p.m. UTC | #6
Hi,

On 18/02/2020 18:31, Christoph Hellwig wrote:
> On Tue, Feb 18, 2020 at 09:28:34AM -0800, James Bottomley wrote:
>> On Tue, 2020-02-18 at 09:23 -0800, Christoph Hellwig wrote:
>>> On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
>>>>>> Replace the global mutex with per-sr-device mutex.
>>>>>
>>>>> Do we actually need the lock at all?  What is protected by it?
>>>>
>>>> We do at least for cdrom_open.  It modifies the cdi structure with
>>>> no other protection and concurrent modification would at least
>>>> screw up the use counter which is not atomic.  Same reasoning for
>>>> cdrom_release.
>>>
>>> Wouldn't the right fix be to add locking to cdrom_open/release instead
>>> of having an undocumented requirement for the callers?
>>
>> Yes ... but that's somewhat of a bigger patch because you now have to
>> reason about the callbacks within cdrom.  There's also the question of
>> whether you can assume ops->generic_packet() has its own concurrency
>> protections ... it's certainly true for SCSI, but is it for anything
>> else?  Although I suppose you can just not care and run the internal
>> lock over it anyway.
> 
> We have 4 instances of struct cdrom_device_ops in the kernel, one of
> which has a no-op generic_packet.  So I don't think this should be a
> huge project.

There are two reasons I decided to make minor changes to fix the
performance regression.

First, being able to send the patch to the various stable branches once
merged. For people working with many CD drives attached to one station,
this is a pretty big deal, so I tried to keep the patch simple. It fixes
the regression introduced in another commit.

Secondly, I don't have the hardware to test sophisticated or old setups,
like some of the issues linked from my patch. I have SATA CD drives with
USB->SATA bridges, no IDE, no PATA, etc. So the testing I can do is
relatively limited.

Perhaps I or someone else can work on removing the usage of the locks,
but as it stands I think this addresses the performance issue present in
the current kernel, and removing locks and the associated testing
required with that is something I am not entirely comfortable doing.

Cheers,
Merlijn

Arnd Bergmann Feb. 18, 2020, 7:46 p.m. UTC | #7
On Tue, Feb 18, 2020 at 8:20 PM Merlijn B.W. Wajer <merlijn@archive.org> wrote:
> On 18/02/2020 18:31, Christoph Hellwig wrote:
> > On Tue, Feb 18, 2020 at 09:28:34AM -0800, James Bottomley wrote:
> >> On Tue, 2020-02-18 at 09:23 -0800, Christoph Hellwig wrote:
> >>> On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
> >>>>>> Replace the global mutex with per-sr-device mutex.
> >>>>>
> >>>>> Do we actually need the lock at all?  What is protected by it?
> >>>>
> >>>> We do at least for cdrom_open.  It modifies the cdi structure with
> >>>> no other protection and concurrent modification would at least
> >>>> screw up the use counter which is not atomic.  Same reasoning for
> >>>> cdrom_release.
> >>>
> >>> Wouldn't the right fix be to add locking to cdrom_open/release instead
> >>> of having an undocumented requirement for the callers?
> >>
> >> Yes ... but that's somewhat of a bigger patch because you now have to
> >> reason about the callbacks within cdrom.  There's also the question of
> >> whether you can assume ops->generic_packet() has its own concurrency
> >> protections ... it's certainly true for SCSI, but is it for anything
> >> else?  Although I suppose you can just not care and run the internal
> >> lock over it anyway.
> >
> > We have 4 instances of struct cdrom_device_ops in the kernel, one of
> > which has a no-op generic_packet.  So I don't think this should be a
> > huge project.
>
> There are two reasons I decided to make minor changes to fix the
> performance regression.
>
> First, being able to send the patch to the various stable branches once
> merged. For people working with many CD drives attached to one station,
> this is a pretty big deal, so I tried to keep the patch simple. It fixes
> the regression introduced in another commit.
>
> Secondly, I don't have the hardware to test sophisticated or old setups,
> like some of the issues linked from my patch. I have SATA CD drives with
> USB->SATA bridges, no IDE, no PATA, etc. So the testing I can do is
> relatively limited.
>
> Perhaps I or someone else can work on removing the usage of the locks,
> but as it stands I think this addresses the performance issue present in
> the current kernel, and removing locks and the associated testing
> required with that is something I am not entirely comfortable doing.

I think this is entirely reasonable. There is a good chance that the
per-device lock is not needed, but there is an even higher chance
that there is never any contention, because the normal use case
for a CDROM driver is to have only one process working on it at
a time using ioctl.

        Arnd

Merlijn B.W. Wajer Feb. 24, 2020, 9:20 p.m. UTC | #8
Hi Martin,

Just wanted to check if you planned to apply this v2 (you tried to apply
v1 but it didn't compile, so I rebased it onto 5.7/scsi-queue as you
requested). Please let me know if there's anything you'd like to see
changed.

Regards,
Merlijn

On 18/02/2020 20:21, Merlijn B.W. Wajer wrote:
> Hi,
> 
> On 18/02/2020 18:31, Christoph Hellwig wrote:
>> On Tue, Feb 18, 2020 at 09:28:34AM -0800, James Bottomley wrote:
>>> On Tue, 2020-02-18 at 09:23 -0800, Christoph Hellwig wrote:
>>>> On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
>>>>>>> Replace the global mutex with per-sr-device mutex.
>>>>>>
>>>>>> Do we actually need the lock at all?  What is protected by it?
>>>>>
>>>>> We do at least for cdrom_open.  It modifies the cdi structure with
>>>>> no other protection and concurrent modification would at least
>>>>> screw up the use counter which is not atomic.  Same reasoning for
>>>>> cdrom_release.
>>>>
>>>> Wouldn't the right fix be to add locking to cdrom_open/release instead
>>>> of having an undocumented requirement for the callers?
>>>
>>> Yes ... but that's somewhat of a bigger patch because you now have to
>>> reason about the callbacks within cdrom.  There's also the question of
>>> whether you can assume ops->generic_packet() has its own concurrency
>>> protections ... it's certainly true for SCSI, but is it for anything
>>> else?  Although I suppose you can just not care and run the internal
>>> lock over it anyway.
>>
>> We have 4 instances of struct cdrom_device_ops in the kernel, one of
>> which has a no-op generic_packet.  So I don't think this should be a
>> huge project.
> 
> There are two reasons I decided to make minor changes to fix the
> performance regression.
> 
> First, being able to send the patch to the various stable branches once
> merged. For people working with many CD drives attached to one station,
> this is a pretty big deal, so I tried to keep the patch simple. It fixes
> the regression introduced in another commit.
> 
> Secondly, I don't have the hardware to test sophisticated or old setups,
> like some of the issues linked from my patch. I have SATA CD drives with
> USB->SATA bridges, no IDE, no PATA, etc. So the testing I can do is
> relatively limited.
> 
> Perhaps I or someone else can work on removing the usage of the locks,
> but as it stands I think this addresses the performance issue present in
> the current kernel, and removing locks and the associated testing
> required with that is something I am not entirely comfortable doing.
> 
> Cheers,
> Merlijn
>
Patch

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 0fbb8fe6e521..fe0e1c721a99 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -79,7 +79,6 @@  MODULE_ALIAS_SCSI_DEVICE(TYPE_WORM);
 	 CDC_CD_R|CDC_CD_RW|CDC_DVD|CDC_DVD_R|CDC_DVD_RAM|CDC_GENERIC_PACKET| \
 	 CDC_MRW|CDC_MRW_W|CDC_RAM)
 
-static DEFINE_MUTEX(sr_mutex);
 static int sr_probe(struct device *);
 static int sr_remove(struct device *);
 static blk_status_t sr_init_command(struct scsi_cmnd *SCpnt);
@@ -536,9 +535,9 @@  static int sr_block_open(struct block_device *bdev, fmode_t mode)
 	scsi_autopm_get_device(sdev);
 	check_disk_change(bdev);
 
-	mutex_lock(&sr_mutex);
+	mutex_lock(&cd->lock);
 	ret = cdrom_open(&cd->cdi, bdev, mode);
-	mutex_unlock(&sr_mutex);
+	mutex_unlock(&cd->lock);
 
 	scsi_autopm_put_device(sdev);
 	if (ret)
@@ -551,10 +550,10 @@  static int sr_block_open(struct block_device *bdev, fmode_t mode)
 static void sr_block_release(struct gendisk *disk, fmode_t mode)
 {
 	struct scsi_cd *cd = scsi_cd(disk);
-	mutex_lock(&sr_mutex);
+	mutex_lock(&cd->lock);
 	cdrom_release(&cd->cdi, mode);
 	scsi_cd_put(cd);
-	mutex_unlock(&sr_mutex);
+	mutex_unlock(&cd->lock);
 }
 
 static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
@@ -565,7 +564,7 @@  static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	void __user *argp = (void __user *)arg;
 	int ret;
 
-	mutex_lock(&sr_mutex);
+	mutex_lock(&cd->lock);
 
 	ret = scsi_ioctl_block_when_processing_errors(sdev, cmd,
 			(mode & FMODE_NDELAY) != 0);
@@ -595,7 +594,7 @@  static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	scsi_autopm_put_device(sdev);
 
 out:
-	mutex_unlock(&sr_mutex);
+	mutex_unlock(&cd->lock);
 	return ret;
 }
 
@@ -608,7 +607,7 @@  static int sr_block_compat_ioctl(struct block_device *bdev, fmode_t mode, unsign
 	void __user *argp = compat_ptr(arg);
 	int ret;
 
-	mutex_lock(&sr_mutex);
+	mutex_lock(&cd->lock);
 
 	ret = scsi_ioctl_block_when_processing_errors(sdev, cmd,
 			(mode & FMODE_NDELAY) != 0);
@@ -638,7 +637,7 @@  static int sr_block_compat_ioctl(struct block_device *bdev, fmode_t mode, unsign
 	scsi_autopm_put_device(sdev);
 
 out:
-	mutex_unlock(&sr_mutex);
+	mutex_unlock(&cd->lock);
 	return ret;
 
 }
@@ -745,6 +744,7 @@  static int sr_probe(struct device *dev)
 	disk = alloc_disk(1);
 	if (!disk)
 		goto fail_free;
+	mutex_init(&cd->lock);
 
 	spin_lock(&sr_index_lock);
 	minor = find_first_zero_bit(sr_index_bits, SR_DISKS);
@@ -1055,6 +1055,8 @@  static void sr_kref_release(struct kref *kref)
 
 	put_disk(disk);
 
+	mutex_destroy(&cd->lock);
+
 	kfree(cd);
 }
 
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index a2bb7b8bace5..339c624e04d8 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -20,6 +20,7 @@ 
 
 #include <linux/genhd.h>
 #include <linux/kref.h>
+#include <linux/mutex.h>
 
 #define MAX_RETRIES	3
 #define SR_TIMEOUT	(30 * HZ)
@@ -51,6 +52,7 @@  typedef struct scsi_cd {
 	bool ignore_get_event:1;	/* GET_EVENT is unreliable, use TUR */
 
 	struct cdrom_device_info cdi;
+	struct mutex lock;
 	/* We hold gendisk and scsi_device references on probe and use
 	 * the refs on this kref to decide when to release them */
 	struct kref kref;
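
The effect of the patch, a lock that lives and dies with each device so that one drive no longer blocks another, can be modelled in userspace. The sketch below assumes POSIX threads rather than kernel mutexes; `struct fake_cd`, `fake_probe()`, `fake_release()` and `drives_are_independent()` are hypothetical names that loosely mirror `sr_probe()` and `sr_kref_release()` in the diff.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct scsi_cd with its new lock member. */
struct fake_cd {
	pthread_mutex_t lock;
};

/* Like the probe path: allocate the device and initialize its lock. */
static struct fake_cd *fake_probe(void)
{
	struct fake_cd *cd = malloc(sizeof(*cd));

	if (!cd)
		return NULL;
	if (pthread_mutex_init(&cd->lock, NULL) != 0) {
		free(cd);
		return NULL;
	}
	return cd;
}

/* Like the release path: destroy the lock before freeing the device. */
static void fake_release(struct fake_cd *cd)
{
	pthread_mutex_destroy(&cd->lock);
	free(cd);
}

/*
 * Returns 1 if the second drive's lock can be taken while the first
 * drive's lock is held, 0 otherwise. With one shared global mutex the
 * trylock would fail instead.
 */
int drives_are_independent(void)
{
	struct fake_cd *a = fake_probe();
	struct fake_cd *b = fake_probe();
	int independent = 0;

	if (a && b) {
		pthread_mutex_lock(&a->lock);
		if (pthread_mutex_trylock(&b->lock) == 0) {
			independent = 1;
			pthread_mutex_unlock(&b->lock);
		}
		pthread_mutex_unlock(&a->lock);
	}
	if (a)
		fake_release(a);
	if (b)
		fake_release(b);
	return independent;
}
```

`drives_are_independent()` returns 1 when both allocations succeed, which corresponds to the behaviour the commit message asks for: commands to one drive proceed while another drive's lock is held.
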