[11/12] PCI/AER: Use managed resource allocations

Message ID 20180918235848.26694-12-keith.busch@intel.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Series error handling and pciehp maintenance

Commit Message

Keith Busch Sept. 18, 2018, 11:58 p.m. UTC
This uses the managed device resource allocations for the service data
so that the aer driver doesn't need to manage it, further simplifying
this driver.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

Comments

Sinan Kaya Sept. 19, 2018, 4:29 p.m. UTC | #1
On 9/18/2018 7:58 PM, Keith Busch wrote:
>   	if (status) {
>   		dev_printk(KERN_DEBUG, device, "request AER IRQ %d failed\n",
>   			   dev->irq);
> -		aer_remove(dev);
>   		return status;
>   	}
>   

Don't we still need to call aer_remove() here?

Old code would call aer_disable_rootport(rpc) via aer_remove() on IRQ allocation
failure. We are no longer doing this.
Keith Busch Sept. 19, 2018, 5:25 p.m. UTC | #2
On Wed, Sep 19, 2018 at 12:29:15PM -0400, Sinan Kaya wrote:
> On 9/18/2018 7:58 PM, Keith Busch wrote:
> >   	if (status) {
> >   		dev_printk(KERN_DEBUG, device, "request AER IRQ %d failed\n",
> >   			   dev->irq);
> > -		aer_remove(dev);
> >   		return status;
> >   	}
> 
> Don't we still need to call aer_remove() here?
> 
> Old code would call aer_disable_rootport(rpc) via aer_remove() on IRQ allocation
> failure. We are no longer doing this.

We need to call aer_disable_rootport only if aer_enable_rootport was
called, but that happens *after* irq allocation.
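
The ordering argument above can be sketched in plain userspace C. This is only a model under stated assumptions, not the driver code: struct fake_rpc and every fake_* function are hypothetical stand-ins for struct aer_rpc, aer_enable_rootport(), aer_disable_rootport(), the IRQ request, aer_probe() and aer_remove(). It shows that when the IRQ request fails, the root port was never enabled, so there is nothing to disable on that error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct aer_rpc: tracks only root port state. */
struct fake_rpc {
	bool rootport_enabled;
	int disable_calls;
};

/* Models aer_enable_rootport(): turns on reporting at the root port. */
static void fake_enable_rootport(struct fake_rpc *rpc)
{
	rpc->rootport_enabled = true;
}

/* Models aer_disable_rootport(): undoes fake_enable_rootport(). */
static void fake_disable_rootport(struct fake_rpc *rpc)
{
	rpc->rootport_enabled = false;
	rpc->disable_calls++;
}

/* Models the IRQ request; the caller decides whether it fails. */
static int fake_request_irq(bool should_fail)
{
	return should_fail ? -EBUSY : 0;
}

/* Same ordering as described above: request the IRQ first, enable after. */
static int fake_probe(struct fake_rpc *rpc, bool irq_fails)
{
	int status;

	assert(rpc);
	status = fake_request_irq(irq_fails);
	if (status)
		return status;	/* root port never enabled: nothing to undo */

	fake_enable_rootport(rpc);
	return 0;
}

/* Models aer_remove() after the patch: only disables the root port. */
static void fake_remove(struct fake_rpc *rpc)
{
	fake_disable_rootport(rpc);
}
```

Calling fake_probe() with irq_fails set returns the error while rootport_enabled stays false and disable_calls stays 0. Only a successful fake_probe() needs a matching fake_remove().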
Sinan Kaya Sept. 19, 2018, 5:36 p.m. UTC | #3
On 9/19/2018 1:25 PM, Keith Busch wrote:
> On Wed, Sep 19, 2018 at 12:29:15PM -0400, Sinan Kaya wrote:
>> On 9/18/2018 7:58 PM, Keith Busch wrote:
>>>    	if (status) {
>>>    		dev_printk(KERN_DEBUG, device, "request AER IRQ %d failed\n",
>>>    			   dev->irq);
>>> -		aer_remove(dev);
>>>    		return status;
>>>    	}
>>
>> Don't we still need to call aer_remove() here?
>>
>> Old code would call aer_disable_rootport(rpc) via aer_remove() on IRQ allocation
>> failure. We are no longer doing this.
> 
> We need to call aer_disable_rootport only if aer_enable_rootport was
> called, but that happens *after* irq allocation.
> 

I see. Thanks for clarification.
Benjamin Herrenschmidt Sept. 25, 2018, 1:13 a.m. UTC | #4
On Tue, 2018-09-18 at 17:58 -0600, Keith Busch wrote:
> This uses the managed device resource allocations for the service data
> so that the aer driver doesn't need to manage it, further simplifying
> this driver.

Just be careful (it might be ok, I haven't audited everything, but I got
bitten by something like that in the past) that the devm stuff will get
disposed of in two cases:

 - The owner device going away (so far so good)

 - The owner device's driver being unbound

The latter is something not completely obvious, i.e., even if the owner
device still has held references, the successful completion of
->remove() on the driver will be followed by a cleanup of the managed
stuff.

As I said, it might be ok in the AER case, but you might want to at
least keep the set_service_data(dev, NULL) to make sure you don't leave
a stale pointer there.
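
The lifetime concern above can be illustrated with a small userspace model. It is a sketch under assumptions, not the kernel's devres implementation: struct fake_device, struct fake_devres and all fake_* helpers are hypothetical, and a fixed slot array with a "live" flag stands in for real allocation so that the model never touches freed memory. It shows managed resources being released at driver unbind while the device still has references, which leaves a service data pointer stale unless it is cleared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FAKE_MAX_RES 4

/* Hypothetical managed resource: "live" is false once it is released. */
struct fake_devres {
	bool live;
	char data[64];
};

/* Hypothetical owner device with a reference count and managed slots. */
struct fake_device {
	int refcount;
	struct fake_devres res[FAKE_MAX_RES];
	struct fake_devres *service_data;
};

/* Models a devm_kzalloc()-style call: zeroed memory tied to the device. */
static struct fake_devres *fake_devm_alloc(struct fake_device *dev)
{
	for (size_t i = 0; i < FAKE_MAX_RES; i++) {
		if (!dev->res[i].live) {
			memset(dev->res[i].data, 0, sizeof(dev->res[i].data));
			dev->res[i].live = true;
			return &dev->res[i];
		}
	}
	return NULL;
}

/*
 * Models driver unbind: once ->remove() has completed, every managed
 * resource is released, regardless of dev->refcount. Returns how many
 * resources were released.
 */
static int fake_unbind(struct fake_device *dev)
{
	int released = 0;

	for (size_t i = 0; i < FAKE_MAX_RES; i++) {
		if (dev->res[i].live) {
			dev->res[i].live = false;
			released++;
		}
	}
	return released;
}

/* True when service_data still points at an already released resource. */
static bool fake_service_data_is_stale(const struct fake_device *dev)
{
	return dev->service_data && !dev->service_data->live;
}
```

With refcount held at 2, fake_unbind() still releases the allocation, and fake_service_data_is_stale() reports true until service_data is reset to NULL.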

Cheers,
Ben.

> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/pcie/aer.c | 17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1878d9d7760b..7ecad011458d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1366,11 +1366,7 @@ static void aer_remove(struct pcie_device *dev)
>  {
>  	struct aer_rpc *rpc = get_service_data(dev);
>  
> -	if (rpc) {
> -		aer_disable_rootport(rpc);
> -		kfree(rpc);
> -		set_service_data(dev, NULL);
> -	}
> +	aer_disable_rootport(rpc);
>  }
>  
>  /**
> @@ -1383,10 +1379,9 @@ static int aer_probe(struct pcie_device *dev)
>  {
>  	int status;
>  	struct aer_rpc *rpc;
> -	struct device *device = &dev->port->device;
> +	struct device *device = &dev->device;
>  
> -	/* Alloc rpc data structure */
> -	rpc = kzalloc(sizeof(struct aer_rpc), GFP_KERNEL);
> +	rpc = devm_kzalloc(device, sizeof(struct aer_rpc), GFP_KERNEL);
>  	if (!rpc) {
>  		dev_printk(KERN_DEBUG, device, "alloc AER rpc failed\n");
>  		return -ENOMEM;
> @@ -1394,13 +1389,11 @@ static int aer_probe(struct pcie_device *dev)
>  	rpc->rpd = dev->port;
>  	set_service_data(dev, rpc);
>  
> -	/* Request IRQ ISR */
> -	status = request_threaded_irq(dev->irq, aer_irq, aer_isr,
> -				      IRQF_SHARED, "aerdrv", dev);
> +	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> +					   IRQF_SHARED, "aerdrv", dev);
>  	if (status) {
>  		dev_printk(KERN_DEBUG, device, "request AER IRQ %d failed\n",
>  			   dev->irq);
> -		aer_remove(dev);
>  		return status;
>  	}
>
Keith Busch Sept. 25, 2018, 2:17 p.m. UTC | #5
On Tue, Sep 25, 2018 at 11:13:42AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2018-09-18 at 17:58 -0600, Keith Busch wrote:
> > This uses the managed device resource allocations for the service data
> > so that the aer driver doesn't need to manage it, further simplifying
> > this driver.
> 
> Just be careful (it might be ok, I haven't audited everything, but I got
> bitten by something like that in the past) that the devm stuff will get
> disposed of in two cases:
> 
>  - The owner device going away (so far so good)
> 
>  - The owner device's driver being unbound
> 
> The latter is something not completely obvious, i.e., even if the owner
> device still has held references, the successful completion of
> ->remove() on the driver will be followed by a cleanup of the managed
> stuff.
> 
> As I said, it might be ok in the AER case, but you might want to at
> least keep the set_service_data(dev, NULL) to make sure you don't leave
> a stale pointer there.

Yes, these resource methods should be considered carefully. I think
we're okay here, and didn't want to set service data to NULL for a
couple reasons:

 1. The service data and its device are released together, so the device
    is already out of scope before it could hold a stale pointer.

 2. It is possible the IRQ handler may be invoked after 'remove', but
    before the managed irq is torn down. Leaving the service data set
    while it is allocated removes a requirement to check for NULL on
    each interrupt.
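
The second reason can be sketched as a userspace model. Everything here is hypothetical (struct fake_pcie_device and the fake_* functions are not kernel APIs), and it assumes that managed resources are released after 'remove' in reverse order of acquisition, so the IRQ goes away before the service data memory. Under that assumption, a handler that runs between 'remove' and the managed teardown can use the service data without a NULL check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct aer_rpc. */
struct fake_rpc {
	int irqs_handled;
};

/* Hypothetical stand-in for the service device and its managed state. */
struct fake_pcie_device {
	struct fake_rpc rpc_storage;	/* models the managed allocation */
	bool rpc_live;			/* allocation not yet released */
	bool irq_registered;		/* managed IRQ not yet torn down */
	struct fake_rpc *service_data;
};

/* Models probe: managed allocation first, managed IRQ second. */
static void fake_probe(struct fake_pcie_device *dev)
{
	dev->rpc_storage.irqs_handled = 0;
	dev->rpc_live = true;
	dev->service_data = &dev->rpc_storage;
	dev->irq_registered = true;
}

/* Models the handler: uses the service data without a NULL check. */
static void fake_aer_irq(struct fake_pcie_device *dev)
{
	struct fake_rpc *rpc = dev->service_data;

	assert(dev->rpc_live);	/* invariant of this model, see teardown */
	rpc->irqs_handled++;
}

/* Models interrupt delivery: handlers run only while registered. */
static bool fake_raise_irq(struct fake_pcie_device *dev)
{
	if (!dev->irq_registered)
		return false;
	fake_aer_irq(dev);
	return true;
}

/* Models 'remove': the service data is deliberately left set. */
static void fake_remove(struct fake_pcie_device *dev)
{
	(void)dev;
}

/* Models the managed teardown that follows 'remove', in reverse order. */
static void fake_devres_release(struct fake_pcie_device *dev)
{
	dev->irq_registered = false;	/* acquired last, released first */
	dev->rpc_live = false;		/* acquired first, released last */
}
```

An interrupt raised after fake_remove() but before fake_devres_release() is still handled against valid data. After the teardown, fake_raise_irq() no longer reaches the handler at all.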

Patch

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 1878d9d7760b..7ecad011458d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1366,11 +1366,7 @@ static void aer_remove(struct pcie_device *dev)
 {
 	struct aer_rpc *rpc = get_service_data(dev);
 
-	if (rpc) {
-		aer_disable_rootport(rpc);
-		kfree(rpc);
-		set_service_data(dev, NULL);
-	}
+	aer_disable_rootport(rpc);
 }
 
 /**
@@ -1383,10 +1379,9 @@ static int aer_probe(struct pcie_device *dev)
 {
 	int status;
 	struct aer_rpc *rpc;
-	struct device *device = &dev->port->device;
+	struct device *device = &dev->device;
 
-	/* Alloc rpc data structure */
-	rpc = kzalloc(sizeof(struct aer_rpc), GFP_KERNEL);
+	rpc = devm_kzalloc(device, sizeof(struct aer_rpc), GFP_KERNEL);
 	if (!rpc) {
 		dev_printk(KERN_DEBUG, device, "alloc AER rpc failed\n");
 		return -ENOMEM;
@@ -1394,13 +1389,11 @@ static int aer_probe(struct pcie_device *dev)
 	rpc->rpd = dev->port;
 	set_service_data(dev, rpc);
 
-	/* Request IRQ ISR */
-	status = request_threaded_irq(dev->irq, aer_irq, aer_isr,
-				      IRQF_SHARED, "aerdrv", dev);
+	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
+					   IRQF_SHARED, "aerdrv", dev);
 	if (status) {
 		dev_printk(KERN_DEBUG, device, "request AER IRQ %d failed\n",
 			   dev->irq);
-		aer_remove(dev);
 		return status;
 	}