
pci/iov: return a reference to PF on destroying VF

Message ID 1428655984-26903-1-git-send-email-weiyang@linux.vnet.ibm.com
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Wei Yang April 10, 2015, 8:53 a.m. UTC
Each VF takes a reference to its PF, but that reference is not returned in
all cases, which leaves a removed PF's pci_dev unreleased.

As commit ac205b7b ("PCI: make sriov work with hotplug remove") indicates,
when removing devices on a bus, we do it in the reverse order. This means
we would remove VFs first, then PFs. After doing so, VF's removal is done
with pci_stop_and_remove_bus_device() instead of virtfn_remove().
virtfn_remove() returns the reference of its PF, while
pci_stop_and_remove_bus_device() doesn't.

This patch moves the release of the PF's reference into pci_destroy_dev()
to make sure the PF's pci_dev is released in all cases.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 drivers/pci/iov.c    |    1 -
 drivers/pci/remove.c |    5 +++++
 2 files changed, 5 insertions(+), 1 deletion(-)
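To make the imbalance concrete, here is a minimal userspace model of the reference counting described above. It is a sketch under stated assumptions, not kernel code: struct model_dev and the model_* helpers are hypothetical stand-ins for struct pci_dev, pci_dev_get()/pci_dev_put(), virtfn_add(), virtfn_remove() and the pre-patch pci_destroy_dev(). Only the PF and VF counts are modeled; the balanced get/put around pci_get_domain_bus_and_slot() is left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the pre-patch PF reference accounting.
 * None of these names are kernel APIs; they only mimic the get/put calls.
 */
struct model_dev {
	int refcount;
	bool present;             /* still on the bus? */
	struct model_dev *physfn; /* set for VFs only */
};

static void model_get(struct model_dev *d)
{
	d->refcount++;
}

static void model_put(struct model_dev *d)
{
	assert(d->refcount > 0);  /* a put without a matching get is a bug */
	d->refcount--;
}

/* Like virtfn_add(): the VF starts with its own reference and pins the PF. */
static void model_vf_add(struct model_dev *pf, struct model_dev *vf)
{
	vf->refcount = 1;
	vf->present = true;
	vf->physfn = pf;
	model_get(pf);
}

/* Like pci_destroy_dev() before the patch: only the VF's own ref is dropped. */
static void model_destroy_old(struct model_dev *vf)
{
	vf->present = false;
	model_put(vf);
}

/* Driver path (pci_disable_sriov() -> virtfn_remove()): also puts the PF. */
static void model_virtfn_remove_old(struct model_dev *pf, struct model_dev *vf)
{
	model_destroy_old(vf);
	model_put(pf);
}

/* Hot-unplug path: pci_stop_and_remove_bus_device(VF) skips the PF put. */
static void model_unplug_vf_old(struct model_dev *vf)
{
	model_destroy_old(vf);
}
```

With a PF holding one reference and two VFs added, removing one VF through model_virtfn_remove_old() and the other through model_unplug_vf_old() leaves the PF one reference too high, which is the leak the commit message describes.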

Comments

Bjorn Helgaas May 5, 2015, 9:29 p.m. UTC | #1
On Fri, Apr 10, 2015 at 04:53:04PM +0800, Wei Yang wrote:
> Each VF takes a reference to its PF, but that reference is not returned in
> all cases, which leaves a removed PF's pci_dev unreleased.
> 
> As commit ac205b7b ("PCI: make sriov work with hotplug remove") indicates,
> when removing devices on a bus, we do it in the reverse order. This means
> we would remove VFs first, then PFs. After doing so, VF's removal is done
> with pci_stop_and_remove_bus_device() instead of virtfn_remove().
> virtfn_remove() returns the reference of its PF, while
> pci_stop_and_remove_bus_device() doesn't.

Please use conventional citation style (12-char SHA1).

ac205b7bb72f appeared in v3.4.  Did that commit cause a regression?
Should this patch be marked for stable?

"After doing so, VF removal is done with pci_stop_and_remove_device() ..."
After doing what?  After removing the VFs and PFs?  After commit
ac205b7bb72f?

Prior to your patch, the VF reference was released in virtfn_remove(),
which is only called via pci_disable_sriov().  This typically happens in
a driver .remove() method.  The reference is *not* released if we call
pci_stop_and_remove_bus_device(VF) directly, as we would via the
remove_store() (sysfs "remove" file) or hot unplug paths, e.g.,
pciehp_unconfigure_device().

After your patch, the VF reference is released in pci_destroy_dev().  This
is called from pci_disable_sriov(), so it happens in that path as before.
But pci_destroy_dev() is called from pci_stop_and_remove_bus_device(), so
the reference is now released for all the paths that use
pci_stop_and_remove_bus_device().
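A minimal userspace model of the patched behavior, as a sketch only: the names are hypothetical and not kernel APIs. model_destroy_new() plays the role of the patched pci_destroy_dev(), so both the driver path and the direct-removal path drop the PF reference exactly once per VF.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the patched accounting: the PF put lives
 * in the destroy step, so every path that destroys a VF drops it once.
 * Names are illustrative only, not kernel APIs.
 */
struct model_dev {
	int refcount;
	bool is_virtfn;
	struct model_dev *physfn; /* set for VFs only */
};

static void model_put(struct model_dev *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* Like virtfn_add(): the VF holds its own ref and pins its PF. */
static void model_vf_add(struct model_dev *pf, struct model_dev *vf)
{
	vf->refcount = 1;
	vf->is_virtfn = true;
	vf->physfn = pf;
	pf->refcount++;
}

/* Mirrors the patched pci_destroy_dev(): drop the PF ref, then our own. */
static void model_destroy_new(struct model_dev *dev)
{
	if (dev->is_virtfn)
		model_put(dev->physfn);
	model_put(dev);
}

/* Driver path: virtfn_remove() no longer puts the PF itself. */
static void model_virtfn_remove_new(struct model_dev *vf)
{
	model_destroy_new(vf);
}

/* Hot-unplug path: pci_stop_and_remove_bus_device(VF) reaches the same code. */
static void model_unplug_vf_new(struct model_dev *vf)
{
	model_destroy_new(vf);
}
```

The two wrappers are deliberately identical: the point of the patch is that the PF put no longer depends on which caller destroys the VF.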

What about the other things done in virtfn_remove(), e.g., the sysfs link
removal?  Your patch fixes a reference count leak, but don't we still have
a sysfs link leak?

It would be useful to mention a way to cause the leak.  I suspect writing
to a VF's sysfs "remove" file is the easiest.

> This patch moves the release of the PF's reference into pci_destroy_dev()
> to make sure the PF's pci_dev is released in all cases.
> 
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> ---
>  drivers/pci/iov.c    |    1 -
>  drivers/pci/remove.c |    5 +++++
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 4b3a4ea..9b04bde 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -167,7 +167,6 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
>  
>  	/* balance pci_get_domain_bus_and_slot() */
>  	pci_dev_put(virtfn);
> -	pci_dev_put(dev);
>  }
>  
>  static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index 8bd76c9..836ddf6 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -41,6 +41,11 @@ static void pci_destroy_dev(struct pci_dev *dev)
>  	list_del(&dev->bus_list);
>  	up_write(&pci_bus_sem);
>  
> +#ifdef CONFIG_PCI_IOV
> +	if (dev->is_virtfn)
> +		pci_dev_put(dev->physfn);
> +#endif
> +
>  	pci_free_resources(dev);
>  	put_device(&dev->dev);
>  }
> -- 
> 1.7.9.5
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wei Yang May 6, 2015, 6:09 a.m. UTC | #2
On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote:
>On Fri, Apr 10, 2015 at 04:53:04PM +0800, Wei Yang wrote:
>> Each VF takes a reference to its PF, but that reference is not returned in
>> all cases, which leaves a removed PF's pci_dev unreleased.
>> 
>> As commit ac205b7b ("PCI: make sriov work with hotplug remove") indicates,
>> when removing devices on a bus, we do it in the reverse order. This means
>> we would remove VFs first, then PFs. After doing so, VF's removal is done
>> with pci_stop_and_remove_bus_device() instead of virtfn_remove().
>> virtfn_remove() returns the reference of its PF, while
>> pci_stop_and_remove_bus_device() doesn't.

Hi, Bjorn

Nice to see you again :-)

>
>Please use conventional citation style (12-char SHA1).

Sure, I will change it.

>
>ac205b7bb72f appeared in v3.4.  Did that commit cause a regression?
>Should this patch be marked for stable?
>

Hmm... regression.

I think commit ac205b7bb72f is not a complete fix for the problem. Before
commit ac205b7bb72f, the system would crash; after it, at least the machine
is usable.

But yes, I would prefer this to go into the stable tree.

>"After doing so, VF removal is done with pci_stop_and_remove_device() ..."
>After doing what?  After removing the VFs and PFs?  After commit
>ac205b7bb72f?

Sorry for the unclear wording.

I mean after commit ac205b7bb72f.
Before commit ac205b7bb72f, VFs were destroyed in sriov_disable(), called
by the PF's driver. After commit ac205b7bb72f, since the removal order is
reversed, VFs are destroyed by pci_stop_and_remove_bus_device() in the loop.
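A tiny hypothetical model of that ordering, assuming the PF and its VFs sit on one bus list in the order they were added and that removal walks the list from the tail, as a reverse list iteration would. The enum and helpers are illustrative names only.

```c
#include <assert.h>

/*
 * Hypothetical model of the removal order: devices are appended to a bus
 * list as they are added (PF first, then its VFs) and removal walks the
 * list from the tail, so every VF is removed before its PF.
 */
enum model_id { MODEL_PF, MODEL_VF0, MODEL_VF1 };

#define MODEL_MAX_DEVS 8

struct model_bus {
	enum model_id devs[MODEL_MAX_DEVS];
	int count;
};

static void model_bus_add(struct model_bus *bus, enum model_id id)
{
	assert(bus->count < MODEL_MAX_DEVS);
	bus->devs[bus->count++] = id;
}

/* Fill 'order' with the removal sequence (tail first) and empty the bus. */
static int model_remove_all_reverse(struct model_bus *bus,
				    enum model_id order[MODEL_MAX_DEVS])
{
	int n = 0;

	for (int i = bus->count - 1; i >= 0; i--)
		order[n++] = bus->devs[i];
	bus->count = 0;
	return n;
}
```

Adding PF, VF0 and VF1 in that order yields the removal sequence VF1, VF0, PF, so by the time the PF's driver runs, its VFs are already gone.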

>
>Prior to your patch, the VF reference was released in virtfn_remove(),
>which is only called via pci_disable_sriov().  This typically happens in
>a driver .remove() method.  The reference is *not* released if we call
>pci_stop_and_remove_bus_device(VF) directly, as we would via the
>remove_store() (sysfs "remove" file) or hot unplug paths, e.g.,
>pciehp_unconfigure_device().

Do you mean the VF's reference or the PF's?

The VF's reference is still released in virtfn_remove().
pci_stop_and_remove_bus_device() calls pci_destroy_dev(), which puts the
device's own reference.

Maybe I don't get your point.

>
>After your patch, the VF reference is released in pci_destroy_dev().  This
>is called from pci_disable_sriov(), so it happens in that path as before.
>But pci_destroy_dev() is called from pci_stop_and_remove_bus_device(), so
>the reference is now released for all the paths that use
>pci_stop_and_remove_bus_device().
>

What this patch changes is where the PF's reference is released.

Before my patch, the PF's reference is released in virtfn_remove(). After my
patch, it is released in pci_destroy_dev() for the VF.

>What about the other things done in virtfn_remove(), e.g., the sysfs link
>removal?  Your patch fixes a reference count leak, but don't we still have
>a sysfs link leak?
>

Agreed, I am afraid the sysfs link would leak too.

Since I am not that familiar with the sysfs part, I don't dare to move that to
pci_destroy_dev() yet. It needs more investigation.

>It would be useful to mention a way to cause the leak.  I suspect writing
>to a VF's sysfs "remove" file is the easiest.
>

I used the EEH hotplug case to observe the leak; your approach sounds more
general. I will do some tests, and if it works, I will put it in the changelog.

Wei Yang May 6, 2015, 7:25 a.m. UTC | #3
On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote:
>It would be useful to mention a way to cause the leak.  I suspect writing
>to a VF's sysfs "remove" file is the easiest.
>

It looks like a VF doesn't support "remove" now:

static umode_t pci_dev_hp_attrs_are_visible(struct kobject *kobj,
					    struct attribute *a, int n)
{
	struct device *dev = container_of(kobj, struct device, kobj);
	struct pci_dev *pdev = to_pci_dev(dev);

	if (pdev->is_virtfn)
		return 0;

	return a->mode;
}

static struct attribute_group pci_dev_hp_attr_group = {
	.attrs = pci_dev_hp_attrs,
	.is_visible = pci_dev_hp_attrs_are_visible,
};
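For readers unfamiliar with .is_visible, here is a hypothetical userspace sketch of the filtering it performs: a return of 0 hides the attribute, and any other value is used as the file's mode. model_hp_is_visible() makes the same decision as pci_dev_hp_attrs_are_visible() above; the types and the counting helper are illustrative only, since the real filtering is done by the kernel's sysfs group code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace sketch of .is_visible filtering: a return of 0
 * hides the attribute, anything else is used as the file mode.
 */
typedef unsigned short model_mode_t; /* stand-in for the kernel's umode_t */

struct model_attr {
	const char *name;
	model_mode_t mode;
};

struct model_pdev {
	bool is_virtfn;
};

/* Same decision as pci_dev_hp_attrs_are_visible(): hide for VFs. */
static model_mode_t model_hp_is_visible(const struct model_pdev *pdev,
					const struct model_attr *a)
{
	if (pdev->is_virtfn)
		return 0;
	return a->mode;
}

/* Count the attributes that would actually show up as sysfs files. */
static size_t model_count_visible(const struct model_pdev *pdev,
				  const struct model_attr *attrs, size_t n)
{
	size_t visible = 0;

	assert(attrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (model_hp_is_visible(pdev, &attrs[i]) != 0)
			visible++;
	return visible;
}
```

For example, with two write-only attributes named "remove" and "rescan" (mode 0200, an assumption for illustration), a PF would see both files and a VF would see none.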


Bjorn Helgaas May 6, 2015, 3:23 p.m. UTC | #4
On Wed, May 6, 2015 at 2:25 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
> On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote:

>>It would be useful to mention a way to cause the leak.  I suspect writing
>>to a VF's sysfs "remove" file is the easiest.
>>
>
> It looks like a VF doesn't support "remove" now:
>
> static umode_t pci_dev_hp_attrs_are_visible(struct kobject *kobj,
>                                             struct attribute *a, int n)
> {
>         struct device *dev = container_of(kobj, struct device, kobj);
>         struct pci_dev *pdev = to_pci_dev(dev);
>
>         if (pdev->is_virtfn)
>                 return 0;
>
>         return a->mode;
> }
>
> static struct attribute_group pci_dev_hp_attr_group = {
>         .attrs = pci_dev_hp_attrs,
>         .is_visible = pci_dev_hp_attrs_are_visible,
> };

Right, I forgot about this.  A VF has no "remove" file, so it can't be
used to cause the leak.
Bjorn Helgaas May 6, 2015, 3:30 p.m. UTC | #5
On Wed, May 6, 2015 at 1:09 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
> On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote:

>>Prior to your patch, the VF reference was released in virtfn_remove(),
>>which is only called via pci_disable_sriov().  This typically happens in
>>a driver .remove() method.  The reference is *not* released if we call
>>pci_stop_and_remove_bus_device(VF) directly, as we would via the
>>remove_store() (sysfs "remove" file) or hot unplug paths, e.g.,
>>pciehp_unconfigure_device().
>
> Do you mean the VF's reference or the PF's?

Yes, I meant the PF reference.

The hot unplug paths call pci_stop_and_remove_bus_device() for the VFs
first, then the PF.  Calling it for the VF releases the VF reference
but not the PF one.  Calling it for the PF will call virtfn_remove()
via the driver's .remove() method, but it probably does nothing
because when it calls pci_get_domain_bus_and_slot() to get the virtfn,
it gets a NULL because the VF has already been removed.
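The sequence above can be modeled in userspace as a sketch. All names are hypothetical: model_lookup_vf() stands in for pci_get_domain_bus_and_slot(), and the two flag arrays stand in for the VF's presence on the bus and the PF's "virtfnN" sysfs link. It illustrates that once the VF is gone, the early return skips both the PF put and the link removal.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the hot-unplug sequence: VFs are removed first,
 * then the PF driver's cleanup tries to find each VF, fails, and returns
 * early. Not kernel code; the fields stand in for real state.
 */
#define MODEL_NR_VFS 2

struct model_pf {
	int refcount;
	bool vf_present[MODEL_NR_VFS]; /* VF still on the bus? */
	bool vf_link[MODEL_NR_VFS];    /* PF's "virtfnN" link still exists? */
};

/* Enabling SR-IOV: each VF appears, gets a link, and pins the PF. */
static void model_enable_vfs(struct model_pf *pf)
{
	for (int i = 0; i < MODEL_NR_VFS; i++) {
		pf->vf_present[i] = true;
		pf->vf_link[i] = true;
		pf->refcount++;
	}
}

/* Stand-in for pci_get_domain_bus_and_slot(): fails once the VF is gone. */
static bool model_lookup_vf(const struct model_pf *pf, int id)
{
	assert(id >= 0 && id < MODEL_NR_VFS);
	return pf->vf_present[id];
}

/* Pre-patch hot unplug of a VF: removes it, but no PF put, no link removal. */
static void model_unplug_vf(struct model_pf *pf, int id)
{
	pf->vf_present[id] = false;
}

/* virtfn_remove()-like cleanup, called later from the PF driver. */
static void model_virtfn_remove(struct model_pf *pf, int id)
{
	if (!model_lookup_vf(pf, id))
		return;                /* VF already gone: everything is skipped */
	pf->vf_link[id] = false;   /* remove the "virtfnN" link */
	pf->vf_present[id] = false;
	pf->refcount--;            /* drop the PF reference the VF held */
}
```

Running the unplug order (both VFs removed, then model_virtfn_remove() for each) leaves the PF count and both links untouched, whereas calling model_virtfn_remove() directly on live VFs cleans up both.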

>>What about the other things done in virtfn_remove(), e.g., the sysfs link
>>removal?  Your patch fixes a reference count leak, but don't we still have
>>a sysfs link leak?
>
> Agreed, I am afraid the sysfs link would leak too.
>
> Since I am not that familiar with the sysfs part, I don't dare to move that to
> pci_destroy_dev() yet. It needs more investigation.

OK.  I don't want to fix half of the leak problem.  I want to fix the
whole thing at once.

Bjorn
Wei Yang May 7, 2015, 2:35 a.m. UTC | #6
On Wed, May 06, 2015 at 10:30:39AM -0500, Bjorn Helgaas wrote:
>On Wed, May 6, 2015 at 1:09 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>> On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote:
>
>>>Prior to your patch, the VF reference was released in virtfn_remove(),
>>>which is only called via pci_disable_sriov().  This typically happens in
>>>a driver .remove() method.  The reference is *not* released if we call
>>>pci_stop_and_remove_bus_device(VF) directly, as we would via the
>>>remove_store() (sysfs "remove" file) or hot unplug paths, e.g.,
>>>pciehp_unconfigure_device().
>>
>> Do you mean the VF's reference or the PF's?
>
>Yes, I meant the PF reference.
>
>The hot unplug paths call pci_stop_and_remove_bus_device() for the VFs
>first, then the PF.  Calling it for the VF releases the VF reference
>but not the PF one.  Calling it for the PF will call virtfn_remove()
>via the driver's .remove() method, but it probably does nothing
>because when it calls pci_get_domain_bus_and_slot() to get the virtfn,
>it gets a NULL because the VF has already been removed.
>
>>>What about the other things done in virtfn_remove(), e.g., the sysfs link
>>>removal?  Your patch fixes a reference count leak, but don't we still have
>>>a sysfs link leak?
>>
>> Agreed, I am afraid the sysfs link would leak too.
>>
>> Since I am not that familiar with the sysfs part, I don't dare to move that to
>> pci_destroy_dev() yet. It needs more investigation.
>
>OK.  I don't want to fix half of the leak problem.  I want to fix the
>whole thing at once.
>

Sure, I will do some investigation and repost it.

BTW, it seems we also leak the virtual bus. I will fix that in the next
version too.

>Bjorn
Wei Yang May 7, 2015, 2:40 a.m. UTC | #7
On Wed, May 06, 2015 at 10:23:26AM -0500, Bjorn Helgaas wrote:
>On Wed, May 6, 2015 at 2:25 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>> On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote:
>
>>>It would be useful to mention a way to cause the leak.  I suspect writing
>>>to a VF's sysfs "remove" file is the easiest.
>>>
>>
>> It looks like a VF doesn't support "remove" now:
>>
>> static umode_t pci_dev_hp_attrs_are_visible(struct kobject *kobj,
>>                                             struct attribute *a, int n)
>> {
>>         struct device *dev = container_of(kobj, struct device, kobj);
>>         struct pci_dev *pdev = to_pci_dev(dev);
>>
>>         if (pdev->is_virtfn)
>>                 return 0;
>>
>>         return a->mode;
>> }
>>
>> static struct attribute_group pci_dev_hp_attr_group = {
>>         .attrs = pci_dev_hp_attrs,
>>         .is_visible = pci_dev_hp_attrs_are_visible,
>> };
>
>Right, I forgot about this.  A VF has no "remove" file, so it can't be
>used to cause the leak.

Hmm, I see this was introduced in commit dfab88beda88 to prevent a memory
leak.

Could we say that if we do the release properly, as in this patch, we could
re-enable this "remove" file?

Patch

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 4b3a4ea..9b04bde 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -167,7 +167,6 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
 
 	/* balance pci_get_domain_bus_and_slot() */
 	pci_dev_put(virtfn);
-	pci_dev_put(dev);
 }
 
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 8bd76c9..836ddf6 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -41,6 +41,11 @@ static void pci_destroy_dev(struct pci_dev *dev)
 	list_del(&dev->bus_list);
 	up_write(&pci_bus_sem);
 
+#ifdef CONFIG_PCI_IOV
+	if (dev->is_virtfn)
+		pci_dev_put(dev->physfn);
+#endif
+
 	pci_free_resources(dev);
 	put_device(&dev->dev);
 }