Message ID | 20230803171233.3810944-2-alex.williamson@redhat.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 5cd903bce9ddd234d76e67d0dfaf0aab0f11a2e0 |
Delegated to: | Bjorn Helgaas |
Series | PCI: Protect VPD and PME accesses from power management |
On Thu, Aug 03, 2023 at 11:12:32AM -0600, Alex Williamson wrote:
> Unlike default access to config space through sysfs, the vpd read and
> write function don't actively manage the runtime power management state
> of the device during access. Since commit 7ab5e10eda02 ("vfio/pci: Move
> the unused device into low power state with runtime PM"), the vfio-pci
> driver will use runtime power management and release unused devices to
> make use of low power states. Attempting to access VPD information in
> this low power state can result in incorrect information or kernel
> crashes depending on the system behavior.
>
> Wrap the vpd read/write bin attribute handlers in runtime PM and take
> into account the potential quirk to select the correct device to wake.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/pci/vpd.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
> index a4fc4d0690fe..81217dd4789f 100644
> --- a/drivers/pci/vpd.c
> +++ b/drivers/pci/vpd.c
> @@ -275,8 +275,23 @@ static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
>  			size_t count)
>  {
>  	struct pci_dev *dev = to_pci_dev(kobj_to_dev(kobj));
> +	struct pci_dev *vpd_dev = dev;
> +	ssize_t ret;
> +
> +	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
> +		vpd_dev = pci_get_func0_dev(dev);
> +		if (!vpd_dev)
> +			return -ENODEV;
> +	}
> +
> +	pci_config_pm_runtime_get(vpd_dev);
> +	ret = pci_read_vpd(vpd_dev, off, count, buf);
> +	pci_config_pm_runtime_put(vpd_dev);
> +
> +	if (dev != vpd_dev)
> +		pci_dev_put(vpd_dev);

I first thought this would leak a reference if dev was func0 and had
PCI_DEV_FLAGS_VPD_REF_F0 set, because in that case vpd_dev would be
the same as dev.

But I think that case can't happen because quirk_f0_vpd_link() does
nothing for func0 devices, so PCI_DEV_FLAGS_VPD_REF_F0 should never be
set for func0. But it seems like this might be easier to analyze as:

  if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0)
    pci_dev_put(vpd_dev);

Or am I missing something?

>
> -	return pci_read_vpd(dev, off, count, buf);
> +	return ret;
>  }
>
>  static ssize_t vpd_write(struct file *filp, struct kobject *kobj,
> @@ -284,8 +299,23 @@ static ssize_t vpd_write(struct file *filp, struct kobject *kobj,
>  			 size_t count)
>  {
>  	struct pci_dev *dev = to_pci_dev(kobj_to_dev(kobj));
> +	struct pci_dev *vpd_dev = dev;
> +	ssize_t ret;
> +
> +	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
> +		vpd_dev = pci_get_func0_dev(dev);
> +		if (!vpd_dev)
> +			return -ENODEV;
> +	}
> +
> +	pci_config_pm_runtime_get(vpd_dev);
> +	ret = pci_write_vpd(vpd_dev, off, count, buf);
> +	pci_config_pm_runtime_put(vpd_dev);
> +
> +	if (dev != vpd_dev)
> +		pci_dev_put(vpd_dev);
>
> -	return pci_write_vpd(dev, off, count, buf);
> +	return ret;
>  }
>  static BIN_ATTR(vpd, 0600, vpd_read, vpd_write, 0);
>
> --
> 2.40.1
>
On Thu, 10 Aug 2023 10:59:26 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Thu, Aug 03, 2023 at 11:12:32AM -0600, Alex Williamson wrote:
> > Unlike default access to config space through sysfs, the vpd read and
> > write function don't actively manage the runtime power management state
> > of the device during access. Since commit 7ab5e10eda02 ("vfio/pci: Move
> > the unused device into low power state with runtime PM"), the vfio-pci
> > driver will use runtime power management and release unused devices to
> > make use of low power states. Attempting to access VPD information in
> > this low power state can result in incorrect information or kernel
> > crashes depending on the system behavior.
> >
> > Wrap the vpd read/write bin attribute handlers in runtime PM and take
> > into account the potential quirk to select the correct device to wake.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  drivers/pci/vpd.c | 34 ++++++++++++++++++++++++++++++++--
> >  1 file changed, 32 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
> > index a4fc4d0690fe..81217dd4789f 100644
> > --- a/drivers/pci/vpd.c
> > +++ b/drivers/pci/vpd.c
> > @@ -275,8 +275,23 @@ static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
> >  			size_t count)
> >  {
> >  	struct pci_dev *dev = to_pci_dev(kobj_to_dev(kobj));
> > +	struct pci_dev *vpd_dev = dev;
> > +	ssize_t ret;
> > +
> > +	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
> > +		vpd_dev = pci_get_func0_dev(dev);
> > +		if (!vpd_dev)
> > +			return -ENODEV;
> > +	}
> > +
> > +	pci_config_pm_runtime_get(vpd_dev);
> > +	ret = pci_read_vpd(vpd_dev, off, count, buf);
> > +	pci_config_pm_runtime_put(vpd_dev);
> > +
> > +	if (dev != vpd_dev)
> > +		pci_dev_put(vpd_dev);
>
> I first thought this would leak a reference if dev was func0 and had
> PCI_DEV_FLAGS_VPD_REF_F0 set, because in that case vpd_dev would be
> the same as dev.
>
> But I think that case can't happen because quirk_f0_vpd_link() does
> nothing for func0 devices, so PCI_DEV_FLAGS_VPD_REF_F0 should never be
> set for func0. But it seems like this might be easier to analyze as:
>
>   if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0)
>     pci_dev_put(vpd_dev);
>
> Or am I missing something?

Nope, your analysis is correct, it doesn't make any sense to have a
flag on func0 redirecting VPD access to func0 so vpd_dev can only be
different on non-zero functions. The alternative test is equally valid
so if you think it's more intuitive, let's use it. Thanks,

Alex
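The reference-counting argument in this exchange can be checked with a small userspace model. The sketch below is illustrative only: `struct mock_dev`, `MOCK_FLAG_VPD_REF_F0`, and the `mock_*` helpers are hypothetical stand-ins, not kernel APIs, and they assume that pci_get_func0_dev() returns function 0 with a reference held and that pci_dev_put() drops one. It compares the pointer test from the posted patch with the flag test suggested in review.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for illustration only; not the kernel's types. */
#define MOCK_FLAG_VPD_REF_F0 (1u << 0)

struct mock_dev {
	unsigned int dev_flags;		/* plays the role of pci_dev->dev_flags */
	int refcount;			/* plays the role of the device refcount */
	struct mock_dev *func0;		/* function 0 of the same slot, or NULL */
};

/* Assumed model of pci_get_func0_dev(): function 0 with a reference held. */
static struct mock_dev *mock_get_func0(struct mock_dev *dev)
{
	if (dev->func0)
		dev->func0->refcount++;
	return dev->func0;
}

/* Assumed model of pci_dev_put(): drop one reference. */
static void mock_dev_put(struct mock_dev *dev)
{
	assert(dev->refcount > 0);
	dev->refcount--;
}

/* Release keyed on pointer inequality, as in the posted patch. */
static int mock_access_ptr_test(struct mock_dev *dev)
{
	struct mock_dev *vpd_dev = dev;

	if (dev->dev_flags & MOCK_FLAG_VPD_REF_F0) {
		vpd_dev = mock_get_func0(dev);
		if (!vpd_dev)
			return -1;	/* the patch returns -ENODEV here */
	}
	/* runtime PM get, VPD access, runtime PM put would happen here */
	if (dev != vpd_dev)
		mock_dev_put(vpd_dev);
	return 0;
}

/* Release keyed on the same flag that guarded the get, as suggested. */
static int mock_access_flag_test(struct mock_dev *dev)
{
	struct mock_dev *vpd_dev = dev;

	if (dev->dev_flags & MOCK_FLAG_VPD_REF_F0) {
		vpd_dev = mock_get_func0(dev);
		if (!vpd_dev)
			return -1;
	}
	/* runtime PM get, VPD access, runtime PM put would happen here */
	if (dev->dev_flags & MOCK_FLAG_VPD_REF_F0)
		mock_dev_put(vpd_dev);
	return 0;
}
```

For a non-zero function whose flag points at function 0, both variants leave the reference count balanced. They differ only in the case the thread rules out, a function 0 that carries the flag itself: there the pointer test would skip the put, while the flag test stays balanced without relying on the quirk's behavior.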
On Thu, Aug 03, 2023 at 11:12:32AM -0600, Alex Williamson wrote:
> Unlike default access to config space through sysfs, the vpd read and
> write function don't actively manage the runtime power management state
> of the device during access. Since commit 7ab5e10eda02 ("vfio/pci: Move
> the unused device into low power state with runtime PM"), the vfio-pci
> driver will use runtime power management and release unused devices to
> make use of low power states. Attempting to access VPD information in
> this low power state can result in incorrect information or kernel
> crashes depending on the system behavior.

I guess this specifically refers to D3cold, right? VPD is accessed
via config space, which should work in all D0-D3hot states, but not in
D3cold. I don't see anything in the spec about needing to be in D0 to
access VPD.

I assume there's no public problem report we could cite here? I
suppose the behavior in D3cold is however the system handles a UR
error.

> Wrap the vpd read/write bin attribute handlers in runtime PM and take
> into account the potential quirk to select the correct device to wake.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/pci/vpd.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
> index a4fc4d0690fe..81217dd4789f 100644
> --- a/drivers/pci/vpd.c
> +++ b/drivers/pci/vpd.c
> @@ -275,8 +275,23 @@ static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
>  			size_t count)
>  {
>  	struct pci_dev *dev = to_pci_dev(kobj_to_dev(kobj));
> +	struct pci_dev *vpd_dev = dev;
> +	ssize_t ret;
> +
> +	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
> +		vpd_dev = pci_get_func0_dev(dev);
> +		if (!vpd_dev)
> +			return -ENODEV;
> +	}
> +
> +	pci_config_pm_runtime_get(vpd_dev);
> +	ret = pci_read_vpd(vpd_dev, off, count, buf);
> +	pci_config_pm_runtime_put(vpd_dev);
> +
> +	if (dev != vpd_dev)
> +		pci_dev_put(vpd_dev);
>
> -	return pci_read_vpd(dev, off, count, buf);
> +	return ret;
>  }
>
>  static ssize_t vpd_write(struct file *filp, struct kobject *kobj,
> @@ -284,8 +299,23 @@ static ssize_t vpd_write(struct file *filp, struct kobject *kobj,
>  			 size_t count)
>  {
>  	struct pci_dev *dev = to_pci_dev(kobj_to_dev(kobj));
> +	struct pci_dev *vpd_dev = dev;
> +	ssize_t ret;
> +
> +	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
> +		vpd_dev = pci_get_func0_dev(dev);
> +		if (!vpd_dev)
> +			return -ENODEV;
> +	}
> +
> +	pci_config_pm_runtime_get(vpd_dev);
> +	ret = pci_write_vpd(vpd_dev, off, count, buf);
> +	pci_config_pm_runtime_put(vpd_dev);
> +
> +	if (dev != vpd_dev)
> +		pci_dev_put(vpd_dev);
>
> -	return pci_write_vpd(dev, off, count, buf);
> +	return ret;
>  }
>  static BIN_ATTR(vpd, 0600, vpd_read, vpd_write, 0);
>
> --
> 2.40.1
>
On Fri, 11 Aug 2023 14:25:43 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Thu, Aug 03, 2023 at 11:12:32AM -0600, Alex Williamson wrote:
> > Unlike default access to config space through sysfs, the vpd read and
> > write function don't actively manage the runtime power management state
> > of the device during access. Since commit 7ab5e10eda02 ("vfio/pci: Move
> > the unused device into low power state with runtime PM"), the vfio-pci
> > driver will use runtime power management and release unused devices to
> > make use of low power states. Attempting to access VPD information in
> > this low power state can result in incorrect information or kernel
> > crashes depending on the system behavior.
>
> I guess this specifically refers to D3cold, right? VPD is accessed
> via config space, which should work in all D0-D3hot states, but not in
> D3cold. I don't see anything in the spec about needing to be in D0 to
> access VPD.
>
> I assume there's no public problem report we could cite here? I
> suppose the behavior in D3cold is however the system handles a UR
> error.

Yes, Eric tested that pcie_port_pm=off is a viable workaround resolving
both the VPD and PME accesses, so I think the issue is actually that
the root port is in D3cold. This aligns with commit 7ab5e10eda02
above, since prior to that we were only manipulating the endpoint power
state. The oops indicates an "Internal error: synchronous external
abort", with a stack trace ending in pci_generic_config_read, so I
suspect this is a UR. Unfortunately the bz is not currently public :-\
Thanks,

Alex

> > Wrap the vpd read/write bin attribute handlers in runtime PM and take
> > into account the potential quirk to select the correct device to wake.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  drivers/pci/vpd.c | 34 ++++++++++++++++++++++++++++++++--
> >  1 file changed, 32 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
> > index a4fc4d0690fe..81217dd4789f 100644
> > --- a/drivers/pci/vpd.c
> > +++ b/drivers/pci/vpd.c
> > @@ -275,8 +275,23 @@ static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
> >  			size_t count)
> >  {
> >  	struct pci_dev *dev = to_pci_dev(kobj_to_dev(kobj));
> > +	struct pci_dev *vpd_dev = dev;
> > +	ssize_t ret;
> > +
> > +	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
> > +		vpd_dev = pci_get_func0_dev(dev);
> > +		if (!vpd_dev)
> > +			return -ENODEV;
> > +	}
> > +
> > +	pci_config_pm_runtime_get(vpd_dev);
> > +	ret = pci_read_vpd(vpd_dev, off, count, buf);
> > +	pci_config_pm_runtime_put(vpd_dev);
> > +
> > +	if (dev != vpd_dev)
> > +		pci_dev_put(vpd_dev);
> >
> > -	return pci_read_vpd(dev, off, count, buf);
> > +	return ret;
> >  }
> >
> >  static ssize_t vpd_write(struct file *filp, struct kobject *kobj,
> > @@ -284,8 +299,23 @@ static ssize_t vpd_write(struct file *filp, struct kobject *kobj,
> >  			 size_t count)
> >  {
> >  	struct pci_dev *dev = to_pci_dev(kobj_to_dev(kobj));
> > +	struct pci_dev *vpd_dev = dev;
> > +	ssize_t ret;
> > +
> > +	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
> > +		vpd_dev = pci_get_func0_dev(dev);
> > +		if (!vpd_dev)
> > +			return -ENODEV;
> > +	}
> > +
> > +	pci_config_pm_runtime_get(vpd_dev);
> > +	ret = pci_write_vpd(vpd_dev, off, count, buf);
> > +	pci_config_pm_runtime_put(vpd_dev);
> > +
> > +	if (dev != vpd_dev)
> > +		pci_dev_put(vpd_dev);
> >
> > -	return pci_write_vpd(dev, off, count, buf);
> > +	return ret;
> >  }
> >  static BIN_ATTR(vpd, 0600, vpd_read, vpd_write, 0);
> >
> > --
> > 2.40.1
> >
>
diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index a4fc4d0690fe..81217dd4789f 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -275,8 +275,23 @@ static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
 			size_t count)
 {
 	struct pci_dev *dev = to_pci_dev(kobj_to_dev(kobj));
+	struct pci_dev *vpd_dev = dev;
+	ssize_t ret;
+
+	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
+		vpd_dev = pci_get_func0_dev(dev);
+		if (!vpd_dev)
+			return -ENODEV;
+	}
+
+	pci_config_pm_runtime_get(vpd_dev);
+	ret = pci_read_vpd(vpd_dev, off, count, buf);
+	pci_config_pm_runtime_put(vpd_dev);
+
+	if (dev != vpd_dev)
+		pci_dev_put(vpd_dev);
 
-	return pci_read_vpd(dev, off, count, buf);
+	return ret;
 }
 
 static ssize_t vpd_write(struct file *filp, struct kobject *kobj,
@@ -284,8 +299,23 @@ static ssize_t vpd_write(struct file *filp, struct kobject *kobj,
 			 size_t count)
 {
 	struct pci_dev *dev = to_pci_dev(kobj_to_dev(kobj));
+	struct pci_dev *vpd_dev = dev;
+	ssize_t ret;
+
+	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
+		vpd_dev = pci_get_func0_dev(dev);
+		if (!vpd_dev)
+			return -ENODEV;
+	}
+
+	pci_config_pm_runtime_get(vpd_dev);
+	ret = pci_write_vpd(vpd_dev, off, count, buf);
+	pci_config_pm_runtime_put(vpd_dev);
+
+	if (dev != vpd_dev)
+		pci_dev_put(vpd_dev);
 
-	return pci_write_vpd(dev, off, count, buf);
+	return ret;
 }
 static BIN_ATTR(vpd, 0600, vpd_read, vpd_write, 0);
Unlike default access to config space through sysfs, the vpd read and
write function don't actively manage the runtime power management state
of the device during access. Since commit 7ab5e10eda02 ("vfio/pci: Move
the unused device into low power state with runtime PM"), the vfio-pci
driver will use runtime power management and release unused devices to
make use of low power states. Attempting to access VPD information in
this low power state can result in incorrect information or kernel
crashes depending on the system behavior.

Wrap the vpd read/write bin attribute handlers in runtime PM and take
into account the potential quirk to select the correct device to wake.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/pci/vpd.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)
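
The get/access/put bracket described in the commit message can be sketched as a userspace model. Everything below is hypothetical (`struct mock_pm_dev`, `mock_pm_get()`, `mock_pm_put()`, `mock_raw_read()`, `mock_vpd_read()`), and it simplifies on purpose: it assumes a "put" suspends the device as soon as the last reference is dropped, which the kernel's runtime PM core may defer, and it models an access to a powered-down device as a plain error, where the thread reports incorrect data or an external abort on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical power states for the model; not the kernel's pci_power_t. */
enum mock_power { MOCK_D0, MOCK_D3COLD };

struct mock_pm_dev {
	enum mock_power state;
	int usage;			/* outstanding runtime PM references */
	unsigned char vpd[8];		/* pretend VPD contents */
};

/* Model of a runtime PM "get": take a usage reference and resume. */
static void mock_pm_get(struct mock_pm_dev *d)
{
	d->usage++;
	d->state = MOCK_D0;
}

/* Model of a runtime PM "put": drop the reference, suspend when idle. */
static void mock_pm_put(struct mock_pm_dev *d)
{
	assert(d->usage > 0);
	if (--d->usage == 0)
		d->state = MOCK_D3COLD;
}

/* Unprotected access: fails in this model if the device is not in D0. */
static long mock_raw_read(struct mock_pm_dev *d, size_t off, size_t count,
			  unsigned char *buf)
{
	if (d->state != MOCK_D0)
		return -1;
	if (off >= sizeof(d->vpd))
		return 0;
	if (count > sizeof(d->vpd) - off)
		count = sizeof(d->vpd) - off;
	memcpy(buf, d->vpd + off, count);
	return (long)count;
}

/* Protected access: hold a PM reference for the duration of the read. */
static long mock_vpd_read(struct mock_pm_dev *d, size_t off, size_t count,
			  unsigned char *buf)
{
	long ret;

	mock_pm_get(d);
	ret = mock_raw_read(d, off, count, buf);
	mock_pm_put(d);
	return ret;
}
```

In this model a raw read of a suspended device fails, while the wrapped read succeeds and leaves the device back in its low power state with no outstanding references. That mirrors the intent of the patch; the real pci_config_pm_runtime_get() and pci_config_pm_runtime_put() helpers are more involved than this sketch.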