Message ID | 20200218143918.30267-1-merlijn@archive.org (mailing list archive) |
---|---|
State | Mainlined |
Commit | 51a858817dcdbbdee22cb54b0b2b26eb145ca5b6 |
Series | [v2] scsi: sr: get rid of sr global mutex |

On Tue, Feb 18, 2020 at 03:39:17PM +0100, Merlijn Wajer wrote:
> When replacing the Big Kernel Lock in commit
> 2a48fc0ab24241755dc93bfd4f01d68efab47f5a ("block: autoconvert trivial
> BKL users to private mutex"), the lock was replaced with a sr-wide lock.
>
> This causes very poor performance when using multiple sr devices, as the
> sr driver was not able to execute more than one command to one drive at
> any given time, even when there were many CD drives available.
>
> Replace the global mutex with per-sr-device mutex.

Do we actually need the lock at all? What is protected by it?
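
As an editorial illustration of the change described in the quoted commit message, here is a minimal user-space sketch using POSIX threads rather than the kernel mutex API. The names `struct drive`, `drive_init`, `drive_try_command` and `drive_destroy` are hypothetical and do not appear in the patch. The sketch only shows that once each device owns its lock, one busy drive no longer blocks commands to another.

```c
#include <pthread.h>

/* Hypothetical user-space stand-in for struct scsi_cd (not kernel code). */
struct drive {
	pthread_mutex_t lock;	/* per-device lock instead of one global lock */
	int commands_run;
};

/* Returns 0 on success, like pthread_mutex_init(). */
static int drive_init(struct drive *d)
{
	d->commands_run = 0;
	return pthread_mutex_init(&d->lock, NULL);
}

static void drive_destroy(struct drive *d)
{
	pthread_mutex_destroy(&d->lock);
}

/*
 * Runs one "command" if this drive's lock is free.
 * Returns 0 if the command ran, or the trylock error (EBUSY) if this
 * particular drive is busy. Other drives are never consulted.
 */
static int drive_try_command(struct drive *d)
{
	int err = pthread_mutex_trylock(&d->lock);

	if (err)
		return err;
	d->commands_run++;
	pthread_mutex_unlock(&d->lock);
	return 0;
}
```

With two drives `a` and `b`, holding `a.lock` makes `drive_try_command(&a)` report busy while `drive_try_command(&b)` still succeeds. With a single shared lock both calls would have been blocked.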

On Tue, 2020-02-18 at 09:12 -0800, Christoph Hellwig wrote:
> On Tue, Feb 18, 2020 at 03:39:17PM +0100, Merlijn Wajer wrote:
> > When replacing the Big Kernel Lock in commit
> > 2a48fc0ab24241755dc93bfd4f01d68efab47f5a ("block: autoconvert
> > trivial BKL users to private mutex"), the lock was replaced with a
> > sr-wide lock.
> >
> > This causes very poor performance when using multiple sr devices,
> > as the sr driver was not able to execute more than one command to
> > one drive at any given time, even when there were many CD drives
> > available.
> >
> > Replace the global mutex with per-sr-device mutex.
>
> Do we actually need the lock at all? What is protected by it?

We do at least for cdrom_open. It modifies the cdi structure with no
other protection and concurrent modification would at least screw up
the use counter which is not atomic. Same reasoning for cdrom_release.

I think the ioctls don't need the mutex (not looked deeply enough) and
certainly the probe only requires it for the idr allocation which has
its own lock, so I don't believe the mutex additions are needed there.

James
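
To make the point about the non-atomic use counter concrete, here is a hedged user-space sketch with POSIX threads. Every name in it (`struct cd_info`, `cd_info_open`, `cd_info_release`, `struct drive`, `drive_open`, `drive_release`, `opener`, `run_openers`) is hypothetical and only loosely modelled on the roles of `cdi`, `cdrom_open` and `cdrom_release` in the message above. The inner functions have no protection of their own, so the caller serialises them, which is the arrangement the sr driver relies on.

```c
#include <pthread.h>

/* Hypothetical stand-in for the cdi structure: it has no lock of its own. */
struct cd_info {
	int use_count;	/* plain int, so ++ and -- are not atomic */
};

/* Unprotected read-modify-write, safe only if the caller serialises it. */
static void cd_info_open(struct cd_info *cdi)
{
	cdi->use_count++;
}

static void cd_info_release(struct cd_info *cdi)
{
	cdi->use_count--;
}

/* Hypothetical caller-side wrapper: the driver owns the lock. */
struct drive {
	pthread_mutex_t lock;
	struct cd_info cdi;
};

static void drive_open(struct drive *d)
{
	pthread_mutex_lock(&d->lock);
	cd_info_open(&d->cdi);
	pthread_mutex_unlock(&d->lock);
}

static void drive_release(struct drive *d)
{
	pthread_mutex_lock(&d->lock);
	cd_info_release(&d->cdi);
	pthread_mutex_unlock(&d->lock);
}

/* Thread body: many open/release pairs, then one open left outstanding. */
static void *opener(void *arg)
{
	struct drive *d = arg;
	int i;

	for (i = 0; i < 100000; i++) {
		drive_open(d);
		drive_release(d);
	}
	drive_open(d);
	return NULL;
}

/* Runs up to 16 opener threads on one drive and returns the final count. */
static int run_openers(int nthreads)
{
	struct drive d = { .cdi = { .use_count = 0 } };
	pthread_t tids[16];
	int i, started = 0;

	if (nthreads > 16)
		nthreads = 16;
	pthread_mutex_init(&d.lock, NULL);
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, opener, &d) != 0)
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&d.lock);
	return d.cdi.use_count;
}
```

Because every update happens under `d->lock`, the final count equals the number of threads that were started. Calling `cd_info_open()` and `cd_info_release()` directly from the threads would be a data race, and the count could end up anywhere.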

On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
> > > Replace the global mutex with per-sr-device mutex.
> >
> > Do we actually need the lock at all? What is protected by it?
>
> We do at least for cdrom_open. It modifies the cdi structure with no
> other protection and concurrent modification would at least screw up
> the use counter which is not atomic. Same reasoning for cdrom_release.

Wouldn't the right fix be to add locking to cdrom_open/release instead
of having an undocumented requirement for the callers?

On Tue, 2020-02-18 at 09:23 -0800, Christoph Hellwig wrote:
> On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
> > > > Replace the global mutex with per-sr-device mutex.
> > >
> > > Do we actually need the lock at all? What is protected by it?
> >
> > We do at least for cdrom_open. It modifies the cdi structure with
> > no other protection and concurrent modification would at least
> > screw up the use counter which is not atomic. Same reasoning for
> > cdrom_release.
>
> Wouldn't the right fix be to add locking to cdrom_open/release instead
> of having an undocumented requirement for the callers?

Yes ... but that's somewhat of a bigger patch because you now have to
reason about the callbacks within cdrom. There's also the question of
whether you can assume ops->generic_packet() has its own concurrency
protections ... it's certainly true for SCSI, but is it for anything
else? Although I suppose you can just not care and run the internal
lock over it anyway.

James
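
The alternative discussed here, taking the lock inside the open/release helpers and simply running it over the driver callback, could look roughly like the user-space sketch below. It is an assumption-laden illustration, not the cdrom layer's actual code. `struct cd_ops`, `struct cd_info`, `cd_info_init`, `cd_info_open`, `cd_info_release` and the two sample callbacks are all hypothetical. Only the idea of a `generic_packet` callback comes from the thread, and the real `cdrom_open` need not call it at this point.

```c
#include <pthread.h>
#include <stddef.h>

struct cd_info;

/* Hypothetical, trimmed-down analogue of a driver ops table. */
struct cd_ops {
	int (*generic_packet)(struct cd_info *cdi, int cmd);
};

struct cd_info {
	pthread_mutex_t lock;	/* internal lock: callers never take it */
	int use_count;
	const struct cd_ops *ops;
};

static int cd_info_init(struct cd_info *cdi, const struct cd_ops *ops)
{
	cdi->use_count = 0;
	cdi->ops = ops;
	return pthread_mutex_init(&cdi->lock, NULL);
}

/*
 * Open with internal locking. The callback runs under the lock whether or
 * not the driver has its own protection, so it must not call back into
 * cd_info_open() or cd_info_release(), or it would deadlock.
 */
static int cd_info_open(struct cd_info *cdi)
{
	int ret = 0;

	pthread_mutex_lock(&cdi->lock);
	if (cdi->ops && cdi->ops->generic_packet)
		ret = cdi->ops->generic_packet(cdi, 0);
	if (ret == 0)
		cdi->use_count++;
	pthread_mutex_unlock(&cdi->lock);
	return ret;
}

static void cd_info_release(struct cd_info *cdi)
{
	pthread_mutex_lock(&cdi->lock);
	if (cdi->use_count > 0)
		cdi->use_count--;
	pthread_mutex_unlock(&cdi->lock);
}

/* Sample callbacks: one that does nothing, one that always fails. */
static int noop_packet(struct cd_info *cdi, int cmd)
{
	(void)cdi;
	(void)cmd;
	return 0;
}

static int failing_packet(struct cd_info *cdi, int cmd)
{
	(void)cdi;
	(void)cmd;
	return -1;
}
```

The trade-off raised in the thread shows up in the comment on `cd_info_open()`: callers no longer carry an undocumented locking requirement, but every callback now has to be checked for re-entry into the locked helpers.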

On Tue, Feb 18, 2020 at 09:28:34AM -0800, James Bottomley wrote:
> On Tue, 2020-02-18 at 09:23 -0800, Christoph Hellwig wrote:
> > On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
> > > > > Replace the global mutex with per-sr-device mutex.
> > > >
> > > > Do we actually need the lock at all? What is protected by it?
> > >
> > > We do at least for cdrom_open. It modifies the cdi structure with
> > > no other protection and concurrent modification would at least
> > > screw up the use counter which is not atomic. Same reasoning for
> > > cdrom_release.
> >
> > Wouldn't the right fix be to add locking to cdrom_open/release instead
> > of having an undocumented requirement for the callers?
>
> Yes ... but that's somewhat of a bigger patch because you now have to
> reason about the callbacks within cdrom. There's also the question of
> whether you can assume ops->generic_packet() has its own concurrency
> protections ... it's certainly true for SCSI, but is it for anything
> else? Although I suppose you can just not care and run the internal
> lock over it anyway.

We have 4 instances of struct cdrom_device_ops in the kernel, one of
which has a no-op generic_packet. So I don't think this should be a
huge project.

Hi,

On 18/02/2020 18:31, Christoph Hellwig wrote:
> On Tue, Feb 18, 2020 at 09:28:34AM -0800, James Bottomley wrote:
>> On Tue, 2020-02-18 at 09:23 -0800, Christoph Hellwig wrote:
>>> On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
>>>>>> Replace the global mutex with per-sr-device mutex.
>>>>>
>>>>> Do we actually need the lock at all? What is protected by it?
>>>>
>>>> We do at least for cdrom_open. It modifies the cdi structure with
>>>> no other protection and concurrent modification would at least
>>>> screw up the use counter which is not atomic. Same reasoning for
>>>> cdrom_release.
>>>
>>> Wouldn't the right fix be to add locking to cdrom_open/release instead
>>> of having an undocumented requirement for the callers?
>>
>> Yes ... but that's somewhat of a bigger patch because you now have to
>> reason about the callbacks within cdrom. There's also the question of
>> whether you can assume ops->generic_packet() has its own concurrency
>> protections ... it's certainly true for SCSI, but is it for anything
>> else? Although I suppose you can just not care and run the internal
>> lock over it anyway.
>
> We have 4 instances of struct cdrom_device_ops in the kernel, one of
> which has a no-op generic_packet. So I don't think this should be a
> huge project.

There are two reasons I decided to make minor changes to fix the
performance regression.

First, being able to send the patch to the various stable branches once
merged. For people working with many CD drives attached to one station,
this is a pretty big deal, so I tried to keep the patch simple. It fixes
the regression introduced in another commit.

Secondly, I don't have the hardware to test sophisticated or old setups,
like some of the issues linked from my patch. I have SATA CD drives with
USB->SATA bridges, no IDE, no PATA, etc. So the testing I can do is
relatively limited.

Perhaps I or someone else can work on removing the usage of the locks,
but as it stands I think this addresses the performance issue present in
the current kernel, and removing locks and the associated testing
required with that is something I am not entirely comfortable doing.

Cheers,
Merlijn

On Tue, Feb 18, 2020 at 8:20 PM Merlijn B.W. Wajer <merlijn@archive.org> wrote:
> On 18/02/2020 18:31, Christoph Hellwig wrote:
> > On Tue, Feb 18, 2020 at 09:28:34AM -0800, James Bottomley wrote:
> >> On Tue, 2020-02-18 at 09:23 -0800, Christoph Hellwig wrote:
> >>> On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
> >>>>>> Replace the global mutex with per-sr-device mutex.
> >>>>>
> >>>>> Do we actually need the lock at all? What is protected by it?
> >>>>
> >>>> We do at least for cdrom_open. It modifies the cdi structure with
> >>>> no other protection and concurrent modification would at least
> >>>> screw up the use counter which is not atomic. Same reasoning for
> >>>> cdrom_release.
> >>>
> >>> Wouldn't the right fix be to add locking to cdrom_open/release instead
> >>> of having an undocumented requirement for the callers?
> >>
> >> Yes ... but that's somewhat of a bigger patch because you now have to
> >> reason about the callbacks within cdrom. There's also the question of
> >> whether you can assume ops->generic_packet() has its own concurrency
> >> protections ... it's certainly true for SCSI, but is it for anything
> >> else? Although I suppose you can just not care and run the internal
> >> lock over it anyway.
> >
> > We have 4 instances of struct cdrom_device_ops in the kernel, one of
> > which has a no-op generic_packet. So I don't think this should be a
> > huge project.
>
> There are two reasons I decided to make minor changes to fix the
> performance regression.
>
> First, being able to send the patch to the various stable branches once
> merged. For people working with many CD drives attached to one station,
> this is a pretty big deal, so I tried to keep the patch simple. It fixes
> the regression introduced in another commit.
>
> Secondly, I don't have the hardware to test sophisticated or old setups,
> like some of the issues linked from my patch. I have SATA CD drives with
> USB->SATA bridges, no IDE, no PATA, etc. So the testing I can do is
> relatively limited.
>
> Perhaps I or someone else can work on removing the usage of the locks,
> but as it stands I think this addresses the performance issue present in
> the current kernel, and removing locks and the associated testing
> required with that is something I am not entirely comfortable doing.

I think this is entirely reasonable. There is a good chance that the
per-device lock is not needed, but there is an even higher chance that
there is never any contention, because the normal use case for a CDROM
driver is to only have one process working on it at a time using ioctl.

Arnd

Hi Martin,

Just wanted to check if you planned to apply this v2 (you tried to apply
v1 but it didn't compile, so I rebased it onto 5.7/scsi-queue as you
requested).

Please let me know if there's anything you'd like to see changed.

Regards,
Merlijn

On 18/02/2020 20:21, Merlijn B.W. Wajer wrote:
> Hi,
>
> On 18/02/2020 18:31, Christoph Hellwig wrote:
>> On Tue, Feb 18, 2020 at 09:28:34AM -0800, James Bottomley wrote:
>>> On Tue, 2020-02-18 at 09:23 -0800, Christoph Hellwig wrote:
>>>> On Tue, Feb 18, 2020 at 09:20:28AM -0800, James Bottomley wrote:
>>>>>>> Replace the global mutex with per-sr-device mutex.
>>>>>>
>>>>>> Do we actually need the lock at all? What is protected by it?
>>>>>
>>>>> We do at least for cdrom_open. It modifies the cdi structure with
>>>>> no other protection and concurrent modification would at least
>>>>> screw up the use counter which is not atomic. Same reasoning for
>>>>> cdrom_release.
>>>>
>>>> Wouldn't the right fix be to add locking to cdrom_open/release instead
>>>> of having an undocumented requirement for the callers?
>>>
>>> Yes ... but that's somewhat of a bigger patch because you now have to
>>> reason about the callbacks within cdrom. There's also the question of
>>> whether you can assume ops->generic_packet() has its own concurrency
>>> protections ... it's certainly true for SCSI, but is it for anything
>>> else? Although I suppose you can just not care and run the internal
>>> lock over it anyway.
>>
>> We have 4 instances of struct cdrom_device_ops in the kernel, one of
>> which has a no-op generic_packet. So I don't think this should be a
>> huge project.
>
> There are two reasons I decided to make minor changes to fix the
> performance regression.
>
> First, being able to send the patch to the various stable branches once
> merged. For people working with many CD drives attached to one station,
> this is a pretty big deal, so I tried to keep the patch simple. It fixes
> the regression introduced in another commit.
>
> Secondly, I don't have the hardware to test sophisticated or old setups,
> like some of the issues linked from my patch. I have SATA CD drives with
> USB->SATA bridges, no IDE, no PATA, etc. So the testing I can do is
> relatively limited.
>
> Perhaps I or someone else can work on removing the usage of the locks,
> but as it stands I think this addresses the performance issue present in
> the current kernel, and removing locks and the associated testing
> required with that is something I am not entirely comfortable doing.
>
> Cheers,
> Merlijn
>

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 0fbb8fe6e521..fe0e1c721a99 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -79,7 +79,6 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_WORM);
 	 CDC_CD_R|CDC_CD_RW|CDC_DVD|CDC_DVD_R|CDC_DVD_RAM|CDC_GENERIC_PACKET| \
 	 CDC_MRW|CDC_MRW_W|CDC_RAM)
 
-static DEFINE_MUTEX(sr_mutex);
 static int sr_probe(struct device *);
 static int sr_remove(struct device *);
 static blk_status_t sr_init_command(struct scsi_cmnd *SCpnt);
@@ -536,9 +535,9 @@ static int sr_block_open(struct block_device *bdev, fmode_t mode)
 	scsi_autopm_get_device(sdev);
 	check_disk_change(bdev);
 
-	mutex_lock(&sr_mutex);
+	mutex_lock(&cd->lock);
 	ret = cdrom_open(&cd->cdi, bdev, mode);
-	mutex_unlock(&sr_mutex);
+	mutex_unlock(&cd->lock);
 
 	scsi_autopm_put_device(sdev);
 	if (ret)
@@ -551,10 +550,10 @@ static int sr_block_open(struct block_device *bdev, fmode_t mode)
 static void sr_block_release(struct gendisk *disk, fmode_t mode)
 {
 	struct scsi_cd *cd = scsi_cd(disk);
-	mutex_lock(&sr_mutex);
+	mutex_lock(&cd->lock);
 	cdrom_release(&cd->cdi, mode);
 	scsi_cd_put(cd);
-	mutex_unlock(&sr_mutex);
+	mutex_unlock(&cd->lock);
 }
 
 static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
@@ -565,7 +564,7 @@ static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	void __user *argp = (void __user *)arg;
 	int ret;
 
-	mutex_lock(&sr_mutex);
+	mutex_lock(&cd->lock);
 
 	ret = scsi_ioctl_block_when_processing_errors(sdev, cmd,
 			(mode & FMODE_NDELAY) != 0);
@@ -595,7 +594,7 @@ static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	scsi_autopm_put_device(sdev);
 
 out:
-	mutex_unlock(&sr_mutex);
+	mutex_unlock(&cd->lock);
 	return ret;
 }
 
@@ -608,7 +607,7 @@ static int sr_block_compat_ioctl(struct block_device *bdev, fmode_t mode, unsign
 	void __user *argp = compat_ptr(arg);
 	int ret;
 
-	mutex_lock(&sr_mutex);
+	mutex_lock(&cd->lock);
 
 	ret = scsi_ioctl_block_when_processing_errors(sdev, cmd,
 			(mode & FMODE_NDELAY) != 0);
@@ -638,7 +637,7 @@ static int sr_block_compat_ioctl(struct block_device *bdev, fmode_t mode, unsign
 	scsi_autopm_put_device(sdev);
 
 out:
-	mutex_unlock(&sr_mutex);
+	mutex_unlock(&cd->lock);
 	return ret;
 
 }
@@ -745,6 +744,7 @@ static int sr_probe(struct device *dev)
 	disk = alloc_disk(1);
 	if (!disk)
 		goto fail_free;
+	mutex_init(&cd->lock);
 
 	spin_lock(&sr_index_lock);
 	minor = find_first_zero_bit(sr_index_bits, SR_DISKS);
@@ -1055,6 +1055,8 @@ static void sr_kref_release(struct kref *kref)
 
 	put_disk(disk);
 
+	mutex_destroy(&cd->lock);
+
 	kfree(cd);
 }
 
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index a2bb7b8bace5..339c624e04d8 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -20,6 +20,7 @@
 
 #include <linux/genhd.h>
 #include <linux/kref.h>
+#include <linux/mutex.h>
 
 #define MAX_RETRIES	3
 #define SR_TIMEOUT	(30 * HZ)
@@ -51,6 +52,7 @@ typedef struct scsi_cd {
 	bool ignore_get_event:1;	/* GET_EVENT is unreliable, use TUR */
 
 	struct cdrom_device_info cdi;
+	struct mutex lock;
 	/* We hold gendisk and scsi_device references on probe and use
 	 * the refs on this kref to decide when to release them */
 	struct kref kref;