Message ID | 1428655984-26903-1-git-send-email-weiyang@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Fri, Apr 10, 2015 at 04:53:04PM +0800, Wei Yang wrote: > Each VF will get a reference to its PF, while it is not returned back in > all cases and leave a removed PF's pci_dev un-released. > > As commit ac205b7b ("PCI: make sriov work with hotplug remove") indicates, > when removing devices on a bus, we do it in the reverse order. This means > we would remove VFs first, then PFs. After doing so, VF's removal is done > with pci_stop_and_remove_bus_device() instead of virtfn_remove(). > virtfn_remove() returns the reference of its PF, while > pci_stop_and_remove_bus_device() doesn't. Please use conventional citation style (12-char SHA1). ac205b7bb72f appeared in v3.4. Did that commit cause a regression? Should this patch be marked for stable? "After doing so, VF removal is done with pci_stop_and_remove_device() ..." After doing what? After removing the VFs and PFs? After commit ac205b7bb72f? Prior to your patch, the VF reference was released in virtfn_remove(), which is only called via pci_disable_sriov(). This typically happens in a driver .remove() method. The reference is *not* released if we call pci_stop_and_remove_bus_device(VF) directly, as we would via the remove_store() (sysfs "remove" file) or hot unplug paths, e.g., pciehp_unconfigure_device(). After your patch, the VF reference is released in pci_destroy_dev(). This is called from pci_disable_sriov(), so it happens in that path as before. But pci_destroy_dev() is called from pci_stop_and_remove_bus_device(), so the reference is now released for all the paths that use pci_stop_and_remove_bus_device(). What about the other things done in virtfn_remove(), e.g., the sysfs link removal? Your patch fixes a reference count leak, but don't we still have a sysfs link leak? It would be useful to mention a way to cause the leak. I suspect writing to a VF's sysfs "remove" file is the easiest. > This patches moves the return of PF's reference to pci_destroy_dev() to > make sure the PF's pci_dev is released in any case. > > Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> > --- > drivers/pci/iov.c | 1 - > drivers/pci/remove.c | 5 +++++ > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > index 4b3a4ea..9b04bde 100644 > --- a/drivers/pci/iov.c > +++ b/drivers/pci/iov.c > @@ -167,7 +167,6 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset) > > /* balance pci_get_domain_bus_and_slot() */ > pci_dev_put(virtfn); > - pci_dev_put(dev); > } > > static int sriov_enable(struct pci_dev *dev, int nr_virtfn) > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > index 8bd76c9..836ddf6 100644 > --- a/drivers/pci/remove.c > +++ b/drivers/pci/remove.c > @@ -41,6 +41,11 @@ static void pci_destroy_dev(struct pci_dev *dev) > list_del(&dev->bus_list); > up_write(&pci_bus_sem); > > +#ifdef CONFIG_PCI_IOV > + if (dev->is_virtfn) > + pci_dev_put(dev->physfn); > +#endif > + > pci_free_resources(dev); > put_device(&dev->dev); > } > -- > 1.7.9.5 > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote: >On Fri, Apr 10, 2015 at 04:53:04PM +0800, Wei Yang wrote: >> Each VF will get a reference to its PF, while it is not returned back in >> all cases and leave a removed PF's pci_dev un-released. >> >> As commit ac205b7b ("PCI: make sriov work with hotplug remove") indicates, >> when removing devices on a bus, we do it in the reverse order. This means >> we would remove VFs first, then PFs. After doing so, VF's removal is done >> with pci_stop_and_remove_bus_device() instead of virtfn_remove(). >> virtfn_remove() returns the reference of its PF, while >> pci_stop_and_remove_bus_device() doesn't. Hi, Bjorn Nice to see you again :-) > >Please use conventional citation style (12-char SHA1). sure, I'd like to change it. > >ac205b7bb72f appeared in v3.4. Did that commit cause a regression? >Should this patch be marked for stable? > Hmm... regression. I think commit ac205b7bb72f is not a complete fix for the problem. Before commit ac205b7bb72f, system would crash, and after that, at least we can use the machine. While yes, I prefer this could be in stable tree. >"After doing so, VF removal is done with pci_stop_and_remove_device() ..." >After doing what? After removing the VFs and PFs? After commit >ac205b7bb72f? Sorry for my poor expression. Here I mean, after commit ac205b7bb72f. Before this commit ac205b7bb72f, VFs are destroyed in sriov_disable() called by the PF's driver. After this commit ac205b7bb72f, since we reverse the order, VFs are destroyed by the pci_stop_and_remove_bus_device() in the loop. > >Prior to your patch, the VF reference was released in virtfn_remove(), >which is only called via pci_disable_sriov(). This typically happens in >a driver .remove() method. The reference is *not* released if we call >pci_stop_and_remove_bus_device(VF) directly, as we would via the >remove_store() (sysfs "remove" file) or hot unplug paths, e.g., >pciehp_unconfigure_device(). You want to say the reference for VF or PF? VF's reference is still released in virtfn_remove. pci_stop_and_remove_bus_device() will call pci_destroy_dev() which will put the dev's reference. Maybe I don't get your point. > >After your patch, the VF reference is released in pci_destroy_dev(). This >is called from pci_disable_sriov(), so it happens in that path as before. >But pci_destroy_dev() is called from pci_stop_and_remove_bus_device(), so >the reference is now released for all the paths that use >pci_stop_and_remove_bus_device(). > The change in this patch is the reference release of the PF. Before my patch, PF's reference is released in virtfn_remove(). After my patch, PF's reference is released in the pci_destroy_dev() of the VF. >What about the other things done in virtfn_remove(), e.g., the sysfs link >removal? Your patch fixes a reference count leak, but don't we still have >a sysfs link leak? > Agree, I am afraid the sysfs would have a leak too. While I am not that familiar with the sysfs part, I don't dare to move that to pci_destroy_dev(). Need more investigation. >It would be useful to mention a way to cause the leak. I suspect writing >to a VF's sysfs "remove" file is the easiest. > I use the EEH hotplug case to see the leak, sounds your way is more general. I will do some tests, and if it is true, I will put it in the change log. >> This patches moves the return of PF's reference to pci_destroy_dev() to >> make sure the PF's pci_dev is released in any case. >> >> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> >> --- >> drivers/pci/iov.c | 1 - >> drivers/pci/remove.c | 5 +++++ >> 2 files changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c >> index 4b3a4ea..9b04bde 100644 >> --- a/drivers/pci/iov.c >> +++ b/drivers/pci/iov.c >> @@ -167,7 +167,6 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset) >> >> /* balance pci_get_domain_bus_and_slot() */ >> pci_dev_put(virtfn); >> - pci_dev_put(dev); >> } >> >> static int sriov_enable(struct pci_dev *dev, int nr_virtfn) >> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c >> index 8bd76c9..836ddf6 100644 >> --- a/drivers/pci/remove.c >> +++ b/drivers/pci/remove.c >> @@ -41,6 +41,11 @@ static void pci_destroy_dev(struct pci_dev *dev) >> list_del(&dev->bus_list); >> up_write(&pci_bus_sem); >> >> +#ifdef CONFIG_PCI_IOV >> + if (dev->is_virtfn) >> + pci_dev_put(dev->physfn); >> +#endif >> + >> pci_free_resources(dev); >> put_device(&dev->dev); >> } >> -- >> 1.7.9.5 >>
On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote: >On Fri, Apr 10, 2015 at 04:53:04PM +0800, Wei Yang wrote: >> Each VF will get a reference to its PF, while it is not returned back in >> all cases and leave a removed PF's pci_dev un-released. >> >> As commit ac205b7b ("PCI: make sriov work with hotplug remove") indicates, >> when removing devices on a bus, we do it in the reverse order. This means >> we would remove VFs first, then PFs. After doing so, VF's removal is done >> with pci_stop_and_remove_bus_device() instead of virtfn_remove(). >> virtfn_remove() returns the reference of its PF, while >> pci_stop_and_remove_bus_device() doesn't. > >Please use conventional citation style (12-char SHA1). > >ac205b7bb72f appeared in v3.4. Did that commit cause a regression? >Should this patch be marked for stable? > >"After doing so, VF removal is done with pci_stop_and_remove_device() ..." >After doing what? After removing the VFs and PFs? After commit >ac205b7bb72f? > >Prior to your patch, the VF reference was released in virtfn_remove(), >which is only called via pci_disable_sriov(). This typically happens in >a driver .remove() method. The reference is *not* released if we call >pci_stop_and_remove_bus_device(VF) directly, as we would via the >remove_store() (sysfs "remove" file) or hot unplug paths, e.g., >pciehp_unconfigure_device(). > >After your patch, the VF reference is released in pci_destroy_dev(). This >is called from pci_disable_sriov(), so it happens in that path as before. >But pci_destroy_dev() is called from pci_stop_and_remove_bus_device(), so >the reference is now released for all the paths that use >pci_stop_and_remove_bus_device(). > >What about the other things done in virtfn_remove(), e.g., the sysfs link >removal? Your patch fixes a reference count leak, but don't we still have >a sysfs link leak? > >It would be useful to mention a way to cause the leak. I suspect writing >to a VF's sysfs "remove" file is the easiest. > Looks a VF don't support the remove now. static umode_t pci_dev_hp_attrs_are_visible(struct kobject *kobj, struct attribute *a, int n) { struct device *dev = container_of(kobj, struct device, kobj); struct pci_dev *pdev = to_pci_dev(dev); if (pdev->is_virtfn) return 0; return a->mode; } static struct attribute_group pci_dev_hp_attr_group = { .attrs = pci_dev_hp_attrs, .is_visible = pci_dev_hp_attrs_are_visible, }; >> This patches moves the return of PF's reference to pci_destroy_dev() to >> make sure the PF's pci_dev is released in any case. >> >> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> >> --- >> drivers/pci/iov.c | 1 - >> drivers/pci/remove.c | 5 +++++ >> 2 files changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c >> index 4b3a4ea..9b04bde 100644 >> --- a/drivers/pci/iov.c >> +++ b/drivers/pci/iov.c >> @@ -167,7 +167,6 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset) >> >> /* balance pci_get_domain_bus_and_slot() */ >> pci_dev_put(virtfn); >> - pci_dev_put(dev); >> } >> >> static int sriov_enable(struct pci_dev *dev, int nr_virtfn) >> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c >> index 8bd76c9..836ddf6 100644 >> --- a/drivers/pci/remove.c >> +++ b/drivers/pci/remove.c >> @@ -41,6 +41,11 @@ static void pci_destroy_dev(struct pci_dev *dev) >> list_del(&dev->bus_list); >> up_write(&pci_bus_sem); >> >> +#ifdef CONFIG_PCI_IOV >> + if (dev->is_virtfn) >> + pci_dev_put(dev->physfn); >> +#endif >> + >> pci_free_resources(dev); >> put_device(&dev->dev); >> } >> -- >> 1.7.9.5 >>
On Wed, May 6, 2015 at 2:25 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote: > On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote: >>It would be useful to mention a way to cause the leak. I suspect writing >>to a VF's sysfs "remove" file is the easiest. >> > > Looks a VF don't support the remove now. > > static umode_t pci_dev_hp_attrs_are_visible(struct kobject *kobj, > struct attribute *a, int n) > { > struct device *dev = container_of(kobj, struct device, kobj); > struct pci_dev *pdev = to_pci_dev(dev); > > if (pdev->is_virtfn) > return 0; > > return a->mode; > } > > static struct attribute_group pci_dev_hp_attr_group = { > .attrs = pci_dev_hp_attrs, > .is_visible = pci_dev_hp_attrs_are_visible, > }; Right, I forgot about this. A VF has no "remove" file, so it can't be used to cause the leak. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 6, 2015 at 1:09 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote: > On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote: >>Prior to your patch, the VF reference was released in virtfn_remove(), >>which is only called via pci_disable_sriov(). This typically happens in >>a driver .remove() method. The reference is *not* released if we call >>pci_stop_and_remove_bus_device(VF) directly, as we would via the >>remove_store() (sysfs "remove" file) or hot unplug paths, e.g., >>pciehp_unconfigure_device(). > > You want to say the reference for VF or PF? Yes, I meant the PF reference. The hot unplug paths call pci_stop_and_remove_bus_device() for the VFs first, then the PF. Calling it for the VF releases the VF reference but not the PF one. Calling it for the PF will call virtfn_remove() via the driver's .remove() method, but it probably does nothing because when it calls pci_get_domain_bus_and_slot() to get the virtfn, it gets a NULL because the VF has already been removed. >>What about the other things done in virtfn_remove(), e.g., the sysfs link >>removal? Your patch fixes a reference count leak, but don't we still have >>a sysfs link leak? > > Agree, I am afraid the sysfs would have a leak too. > > While I am not that familiar with the sysfs part, I don't dare to move that to > pci_destroy_dev(). Need more investigation. OK. I don't want to fix half of the leak problem. I want to fix the whole thing at once. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 06, 2015 at 10:30:39AM -0500, Bjorn Helgaas wrote: >On Wed, May 6, 2015 at 1:09 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote: >> On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote: > >>>Prior to your patch, the VF reference was released in virtfn_remove(), >>>which is only called via pci_disable_sriov(). This typically happens in >>>a driver .remove() method. The reference is *not* released if we call >>>pci_stop_and_remove_bus_device(VF) directly, as we would via the >>>remove_store() (sysfs "remove" file) or hot unplug paths, e.g., >>>pciehp_unconfigure_device(). >> >> You want to say the reference for VF or PF? > >Yes, I meant the PF reference. > >The hot unplug paths call pci_stop_and_remove_bus_device() for the VFs >first, then the PF. Calling it for the VF releases the VF reference >but not the PF one. Calling it for the PF will call virtfn_remove() >via the driver's .remove() method, but it probably does nothing >because when it calls pci_get_domain_bus_and_slot() to get the virtfn, >it gets a NULL because the VF has already been removed. > >>>What about the other things done in virtfn_remove(), e.g., the sysfs link >>>removal? Your patch fixes a reference count leak, but don't we still have >>>a sysfs link leak? >> >> Agree, I am afraid the sysfs would have a leak too. >> >> While I am not that familiar with the sysfs part, I don't dare to move that to >> pci_destroy_dev(). Need more investigation. > >OK. I don't want to fix half of the leak problem. I want to fix the >whole thing at once. > Sure, I will do some investigation and repost it. BTW, seems we also face the leak of the virtual bus. Will fix it in next version too. >Bjorn
On Wed, May 06, 2015 at 10:23:26AM -0500, Bjorn Helgaas wrote: >On Wed, May 6, 2015 at 2:25 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote: >> On Tue, May 05, 2015 at 04:29:05PM -0500, Bjorn Helgaas wrote: > >>>It would be useful to mention a way to cause the leak. I suspect writing >>>to a VF's sysfs "remove" file is the easiest. >>> >> >> Looks a VF don't support the remove now. >> >> static umode_t pci_dev_hp_attrs_are_visible(struct kobject *kobj, >> struct attribute *a, int n) >> { >> struct device *dev = container_of(kobj, struct device, kobj); >> struct pci_dev *pdev = to_pci_dev(dev); >> >> if (pdev->is_virtfn) >> return 0; >> >> return a->mode; >> } >> >> static struct attribute_group pci_dev_hp_attr_group = { >> .attrs = pci_dev_hp_attrs, >> .is_visible = pci_dev_hp_attrs_are_visible, >> }; > >Right, I forgot about this. A VF has no "remove" file, so it can't be >used to cause the leak. Hmm, I see this is introduced in commit dfab88beda88, to prevent some memory leak. Could we say, if we do the release properly like in this patch, we could re-enable this "remove"?
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 4b3a4ea..9b04bde 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -167,7 +167,6 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset) /* balance pci_get_domain_bus_and_slot() */ pci_dev_put(virtfn); - pci_dev_put(dev); } static int sriov_enable(struct pci_dev *dev, int nr_virtfn) diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c index 8bd76c9..836ddf6 100644 --- a/drivers/pci/remove.c +++ b/drivers/pci/remove.c @@ -41,6 +41,11 @@ static void pci_destroy_dev(struct pci_dev *dev) list_del(&dev->bus_list); up_write(&pci_bus_sem); +#ifdef CONFIG_PCI_IOV + if (dev->is_virtfn) + pci_dev_put(dev->physfn); +#endif + pci_free_resources(dev); put_device(&dev->dev); }
Each VF will get a reference to its PF, while it is not returned back in all cases and leave a removed PF's pci_dev un-released. As commit ac205b7b ("PCI: make sriov work with hotplug remove") indicates, when removing devices on a bus, we do it in the reverse order. This means we would remove VFs first, then PFs. After doing so, VF's removal is done with pci_stop_and_remove_bus_device() instead of virtfn_remove(). virtfn_remove() returns the reference of its PF, while pci_stop_and_remove_bus_device() doesn't. This patches moves the return of PF's reference to pci_destroy_dev() to make sure the PF's pci_dev is released in any case. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> --- drivers/pci/iov.c | 1 - drivers/pci/remove.c | 5 +++++ 2 files changed, 5 insertions(+), 1 deletion(-)