diff mbox

PCI: ensure the PCI device is locked over ->reset_notify calls

Message ID 20170523054202.7985-2-hch@lst.de (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Christoph Hellwig May 23, 2017, 5:42 a.m. UTC
Without this ->notify_reset instance may race with ->remove calls,
which can be easily triggered in NVMe.

Reported-by: Rakesh Pandit <rakesh@tuxera.com>
Tested-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/pci/pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Rakesh Pandit May 29, 2017, 9:19 a.m. UTC | #1
Hi,

On Tue, May 23, 2017 at 07:42:02AM +0200, Christoph Hellwig wrote:
> Without this ->notify_reset instance may race with ->remove calls,
> which can be easily triggered in NVMe.
>

Any input on this sometime before next -rc release would be great ?

> Reported-by: Rakesh Pandit <rakesh@tuxera.com>
> Tested-by: Rakesh Pandit <rakesh@tuxera.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/pci/pci.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b01bd5bba8e6..b61ad77dc322 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4275,11 +4275,13 @@ int pci_reset_function(struct pci_dev *dev)
>  	if (rc)
>  		return rc;
>  
> +	pci_dev_lock(dev);
>  	pci_dev_save_and_disable(dev);
>  
> -	rc = pci_dev_reset(dev, 0);
> +	rc = __pci_dev_reset(dev, 0);
>  
>  	pci_dev_restore(dev);
> +	pci_dev_unlock(dev);
>  
>  	return rc;
>  }
> -- 
> 2.11.0
>
Bjorn Helgaas May 30, 2017, 10:28 p.m. UTC | #2
[+cc Alex]

On Tue, May 23, 2017 at 07:42:02AM +0200, Christoph Hellwig wrote:
> Without this ->notify_reset instance may race with ->remove calls,

Do you mean the .reset_notify() method in struct pci_error_handlers?
I don't see a "notify_reset" symbol.

Can you elaborate on exactly how this race happens?  I'm trying to
figure out whether this is also a problem or potential problem with
other reset paths like pci_try_reset_function(), pci_reset_bus(),
pci_try_reset_bus(), pci_reset_slot(), and pci_try_reset_slot().

What does the race look like when it happens?  Oops, panic, etc?

Can this also be triggered via the sysfs "reset" file?

> which can be easily triggered in NVMe.
> 
> Reported-by: Rakesh Pandit <rakesh@tuxera.com>
> Tested-by: Rakesh Pandit <rakesh@tuxera.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/pci/pci.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b01bd5bba8e6..b61ad77dc322 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4275,11 +4275,13 @@ int pci_reset_function(struct pci_dev *dev)
>  	if (rc)
>  		return rc;
>  
> +	pci_dev_lock(dev);
>  	pci_dev_save_and_disable(dev);
>  
> -	rc = pci_dev_reset(dev, 0);
> +	rc = __pci_dev_reset(dev, 0);
>  
>  	pci_dev_restore(dev);
> +	pci_dev_unlock(dev);
>  
>  	return rc;
>  }
> -- 
> 2.11.0
>
Christoph Hellwig May 31, 2017, 4:58 a.m. UTC | #3
On Tue, May 30, 2017 at 05:28:44PM -0500, Bjorn Helgaas wrote:
> [+cc Alex]
> 
> On Tue, May 23, 2017 at 07:42:02AM +0200, Christoph Hellwig wrote:
> > Without this ->notify_reset instance may race with ->remove calls,
> 
> Do you mean the .reset_notify() method in struct pci_error_handlers?
> I don't see a "notify_reset" symbol.

Yes.

> Can you elaborate on exactly how this race happens?  I'm trying to
> figure out whether this is also a problem or potential problem with
> other reset paths like pci_try_reset_function(), pci_reset_bus(),
> pci_try_reset_bus(), pci_reset_slot(), and pci_try_reset_slot().
> 
> What does the race look like when it happens?  Oops, panic, etc?
> 
> Can this also be triggered via the sysfs "reset" file?

Here is the original bug report:

http://lists.infradead.org/pipermail/linux-nvme/2017-May/010345.html

the issue is triggered through sysfs.
Bjorn Helgaas May 31, 2017, 4:51 p.m. UTC | #4
On Wed, May 31, 2017 at 06:58:48AM +0200, Christoph Hellwig wrote:
> On Tue, May 30, 2017 at 05:28:44PM -0500, Bjorn Helgaas wrote:
> > [+cc Alex]
> > 
> > On Tue, May 23, 2017 at 07:42:02AM +0200, Christoph Hellwig wrote:
> > > Without this ->notify_reset instance may race with ->remove calls,
> > 
> > Do you mean the .reset_notify() method in struct pci_error_handlers?
> > I don't see a "notify_reset" symbol.
> 
> Yes.
> 
> > Can you elaborate on exactly how this race happens?  I'm trying to
> > figure out whether this is also a problem or potential problem with
> > other reset paths like pci_try_reset_function(), pci_reset_bus(),
> > pci_try_reset_bus(), pci_reset_slot(), and pci_try_reset_slot().
> > 
> > What does the race look like when it happens?  Oops, panic, etc?
> > 
> > Can this also be triggered via the sysfs "reset" file?
> 
> Here is the original bug report:
> 
> http://lists.infradead.org/pipermail/linux-nvme/2017-May/010345.html
> 
> the issue is triggered through sysfs.

Thanks.

Do you have any thoughts on whether the paths I mentioned above are
also susceptible?

I don't want to apply a patch that fixes one issue while leaving
similar ones unfixed.

Bjorn
Christoph Hellwig June 1, 2017, 10:46 a.m. UTC | #5
On Wed, May 31, 2017 at 11:51:31AM -0500, Bjorn Helgaas wrote:
> Do you have any thoughts on whether the paths I mentioned above are
> also susceptible?

At least in theory ever method call that doesn't have the device
locked is susceptible, so yes.
diff mbox

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b01bd5bba8e6..b61ad77dc322 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4275,11 +4275,13 @@  int pci_reset_function(struct pci_dev *dev)
 	if (rc)
 		return rc;
 
+	pci_dev_lock(dev);
 	pci_dev_save_and_disable(dev);
 
-	rc = pci_dev_reset(dev, 0);
+	rc = __pci_dev_reset(dev, 0);
 
 	pci_dev_restore(dev);
+	pci_dev_unlock(dev);
 
 	return rc;
 }