diff mbox series

[v2,1/2] spi: spidev: fix a race condition when accessing spidev->spi

Message ID 20230106100719.196243-1-brgl@bgdev.pl (mailing list archive)
State Accepted
Commit 1f4d2dd45b6ef9c047f620d2812326a7813d2354
Headers show
Series [v2,1/2] spi: spidev: fix a race condition when accessing spidev->spi | expand

Commit Message

Bartosz Golaszewski Jan. 6, 2023, 10:07 a.m. UTC
From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

There's a spinlock in place that is taken in file_operations callbacks
whenever we check if spidev->spi is still alive (not null). It's also
taken when spidev->spi is set to NULL in remove().

This however doesn't protect the code against driver unbind event while
one of the syscalls is still in progress. To that end we need a lock taken
continuously as long as we may still access spidev->spi. As both the file
ops and the remove callback are never called from interrupt context, we
can replace the spinlock with a mutex.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
---
v1 -> v2:
- use a mutex instead of an RW semaphore (but for the record: I believe that
  the semaphore is the better solution here)

 drivers/spi/spidev.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

Comments

Mark Brown Jan. 6, 2023, 2:13 p.m. UTC | #1
On Fri, Jan 06, 2023 at 11:07:18AM +0100, Bartosz Golaszewski wrote:

> - use a mutex instead of an RW semaphore (but for the record: I believe that
>   the semaphore is the better solution here)

Why?  Like I said in my original reply I'm not clear what the extra
complication is buying us.
Bartosz Golaszewski Jan. 6, 2023, 2:27 p.m. UTC | #2
On Fri, Jan 6, 2023 at 3:13 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Jan 06, 2023 at 11:07:18AM +0100, Bartosz Golaszewski wrote:
>
> > - use a mutex instead of an RW semaphore (but for the record: I believe that
> >   the semaphore is the better solution here)
>
> Why?  Like I said in my original reply I'm not clear what the extra
> complication is buying us.

Typically, we'd want to keep locking as fine-grained as possible.
Logically, there's no reason to exclude concurrent execution of
file_operations callbacks. There's a bunch of code in there that could
run at the same time that we're now covering by the mutex' critical
section. We should only be protecting spidev->spi here so any other
locking should be handled elsewhere.

IMO the complication of using an RW semaphore is insignificant and
maybe a comment next to its declaration in struct spidev would
suffice?

Bart
Mark Brown Jan. 6, 2023, 4:16 p.m. UTC | #3
On Fri, Jan 06, 2023 at 03:27:57PM +0100, Bartosz Golaszewski wrote:

> IMO the complication of using an RW semaphore is insignificant and
> maybe a comment next to its declaration in struct spidev would
> suffice?

The complication is using a semaphore at all, it's a pretty unusual
locking construct and the whole up/down thing isn't clear to people
who aren't familiar with it.  Given that there's no clounting being
used rwlock would be a much more obvious choice if the microseconds
of extra concurrency is meaningful somehow.
Bartosz Golaszewski Jan. 6, 2023, 4:27 p.m. UTC | #4
On Fri, Jan 6, 2023 at 5:16 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Jan 06, 2023 at 03:27:57PM +0100, Bartosz Golaszewski wrote:
>
> > IMO the complication of using an RW semaphore is insignificant and
> > maybe a comment next to its declaration in struct spidev would
> > suffice?
>
> The complication is using a semaphore at all, it's a pretty unusual
> locking construct and the whole up/down thing isn't clear to people
> who aren't familiar with it.  Given that there's no clounting being
> used rwlock would be a much more obvious choice if the microseconds
> of extra concurrency is meaningful somehow.

I don't have any numbers, it's just that in this case the rwsem feels
more correct. My opinion is not very strong so you can apply v2.

Bart
Mark Brown Jan. 6, 2023, 4:34 p.m. UTC | #5
On Fri, Jan 06, 2023 at 05:27:34PM +0100, Bartosz Golaszewski wrote:
> On Fri, Jan 6, 2023 at 5:16 PM Mark Brown <broonie@kernel.org> wrote:

> > The complication is using a semaphore at all, it's a pretty unusual
> > locking construct and the whole up/down thing isn't clear to people
> > who aren't familiar with it.  Given that there's no clounting being
> > used rwlock would be a much more obvious choice if the microseconds
> > of extra concurrency is meaningful somehow.

> I don't have any numbers, it's just that in this case the rwsem feels
> more correct. My opinion is not very strong so you can apply v2.

Like I say the semaphore in particular feels wrong when we don't need
the counting, we have an explicit reader/writer lock if that's what
you're trying to accomplish.
Bartosz Golaszewski Jan. 6, 2023, 4:36 p.m. UTC | #6
On Fri, Jan 6, 2023 at 5:34 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Jan 06, 2023 at 05:27:34PM +0100, Bartosz Golaszewski wrote:
> > On Fri, Jan 6, 2023 at 5:16 PM Mark Brown <broonie@kernel.org> wrote:
>
> > > The complication is using a semaphore at all, it's a pretty unusual
> > > locking construct and the whole up/down thing isn't clear to people
> > > who aren't familiar with it.  Given that there's no clounting being
> > > used rwlock would be a much more obvious choice if the microseconds
> > > of extra concurrency is meaningful somehow.
>
> > I don't have any numbers, it's just that in this case the rwsem feels
> > more correct. My opinion is not very strong so you can apply v2.
>
> Like I say the semaphore in particular feels wrong when we don't need
> the counting, we have an explicit reader/writer lock if that's what
> you're trying to accomplish.

Let's go with a mutex and see if anyone complains, if so, we can rethink it.

Bart
Mark Brown Jan. 6, 2023, 9:56 p.m. UTC | #7
On Fri, 06 Jan 2023 11:07:18 +0100, Bartosz Golaszewski wrote:
> There's a spinlock in place that is taken in file_operations callbacks
> whenever we check if spidev->spi is still alive (not null). It's also
> taken when spidev->spi is set to NULL in remove().
> 
> This however doesn't protect the code against driver unbind event while
> one of the syscalls is still in progress. To that end we need a lock taken
> continuously as long as we may still access spidev->spi. As both the file
> ops and the remove callback are never called from interrupt context, we
> can replace the spinlock with a mutex.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/2] spi: spidev: fix a race condition when accessing spidev->spi
      commit: 1f4d2dd45b6ef9c047f620d2812326a7813d2354
[2/2] spi: spidev: remove debug messages that access spidev->spi without locking
      commit: 50028988403ab8e031093655af9c6919ecba3aa6

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
diff mbox series

Patch

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index 6313e7d0cdf8..42aaaca67265 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -68,7 +68,7 @@  static_assert(N_SPI_MINORS > 0 && N_SPI_MINORS <= 256);
 
 struct spidev_data {
 	dev_t			devt;
-	spinlock_t		spi_lock;
+	struct mutex		spi_lock;
 	struct spi_device	*spi;
 	struct list_head	device_entry;
 
@@ -95,9 +95,8 @@  spidev_sync(struct spidev_data *spidev, struct spi_message *message)
 	int status;
 	struct spi_device *spi;
 
-	spin_lock_irq(&spidev->spi_lock);
+	mutex_lock(&spidev->spi_lock);
 	spi = spidev->spi;
-	spin_unlock_irq(&spidev->spi_lock);
 
 	if (spi == NULL)
 		status = -ESHUTDOWN;
@@ -107,6 +106,7 @@  spidev_sync(struct spidev_data *spidev, struct spi_message *message)
 	if (status == 0)
 		status = message->actual_length;
 
+	mutex_unlock(&spidev->spi_lock);
 	return status;
 }
 
@@ -359,12 +359,12 @@  spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	 * we issue this ioctl.
 	 */
 	spidev = filp->private_data;
-	spin_lock_irq(&spidev->spi_lock);
+	mutex_lock(&spidev->spi_lock);
 	spi = spi_dev_get(spidev->spi);
-	spin_unlock_irq(&spidev->spi_lock);
-
-	if (spi == NULL)
+	if (spi == NULL) {
+		mutex_unlock(&spidev->spi_lock);
 		return -ESHUTDOWN;
+	}
 
 	/* use the buffer lock here for triple duty:
 	 *  - prevent I/O (from us) so calling spi_setup() is safe;
@@ -508,6 +508,7 @@  spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 
 	mutex_unlock(&spidev->buf_lock);
 	spi_dev_put(spi);
+	mutex_unlock(&spidev->spi_lock);
 	return retval;
 }
 
@@ -529,12 +530,12 @@  spidev_compat_ioc_message(struct file *filp, unsigned int cmd,
 	 * we issue this ioctl.
 	 */
 	spidev = filp->private_data;
-	spin_lock_irq(&spidev->spi_lock);
+	mutex_lock(&spidev->spi_lock);
 	spi = spi_dev_get(spidev->spi);
-	spin_unlock_irq(&spidev->spi_lock);
-
-	if (spi == NULL)
+	if (spi == NULL) {
+		mutex_unlock(&spidev->spi_lock);
 		return -ESHUTDOWN;
+	}
 
 	/* SPI_IOC_MESSAGE needs the buffer locked "normally" */
 	mutex_lock(&spidev->buf_lock);
@@ -561,6 +562,7 @@  spidev_compat_ioc_message(struct file *filp, unsigned int cmd,
 done:
 	mutex_unlock(&spidev->buf_lock);
 	spi_dev_put(spi);
+	mutex_unlock(&spidev->spi_lock);
 	return retval;
 }
 
@@ -640,10 +642,10 @@  static int spidev_release(struct inode *inode, struct file *filp)
 	spidev = filp->private_data;
 	filp->private_data = NULL;
 
-	spin_lock_irq(&spidev->spi_lock);
+	mutex_lock(&spidev->spi_lock);
 	/* ... after we unbound from the underlying device? */
 	dofree = (spidev->spi == NULL);
-	spin_unlock_irq(&spidev->spi_lock);
+	mutex_unlock(&spidev->spi_lock);
 
 	/* last close? */
 	spidev->users--;
@@ -776,7 +778,7 @@  static int spidev_probe(struct spi_device *spi)
 
 	/* Initialize the driver data */
 	spidev->spi = spi;
-	spin_lock_init(&spidev->spi_lock);
+	mutex_init(&spidev->spi_lock);
 	mutex_init(&spidev->buf_lock);
 
 	INIT_LIST_HEAD(&spidev->device_entry);
@@ -821,9 +823,9 @@  static void spidev_remove(struct spi_device *spi)
 	/* prevent new opens */
 	mutex_lock(&device_list_lock);
 	/* make sure ops on existing fds can abort cleanly */
-	spin_lock_irq(&spidev->spi_lock);
+	mutex_lock(&spidev->spi_lock);
 	spidev->spi = NULL;
-	spin_unlock_irq(&spidev->spi_lock);
+	mutex_unlock(&spidev->spi_lock);
 
 	list_del(&spidev->device_entry);
 	device_destroy(spidev_class, spidev->devt);