Message ID | 20231004144731.158342-1-mario.limonciello@amd.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | [v2] PCI: Make d3cold_allowed sysfs attribute read only |
On Wed, Oct 04, 2023 at 09:47:31AM -0500, Mario Limonciello wrote:
> Before d3cold was stable userspace was allowed to influence the kernel's
> decision of whether to enable d3cold for a device by a sysfs file
> `d3cold_allowed`. This potentially allows userspace to break the suspend
> for the system.

Is "Before d3cold was stable" referring to a "d3cold" read-only
variable, or to Linux functionality of using D3cold, or ...?

In what sense does the `d3cold_allowed` sysfs file break suspend?

> For debugging purposes `pci_port_pm=` can be used to control whether
> a PCI port will go into D3cold and runtime PM can be turned off by
> sysfs on PCI end points.

I guess this should be "pcie_port_pm=", which affects *all* PCIe
ports?

Which sysfs file turns off runtime PM for endpoints?

> Change the sysfs attribute to a noop that ignores the input when written
> and shows a warning. Simplify the internal kernel logic to drop
> `d3cold_allowed`.
>
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> v1->v2:
> * Leave R/W and show a warning instead
> * Mark deprecated in sysfs file
> ---
>  Documentation/ABI/testing/sysfs-bus-pci |  4 ++--
>  drivers/pci/pci-acpi.c                  |  2 +-
>  drivers/pci/pci-sysfs.c                 | 14 ++------------
>  drivers/pci/pci.c                       |  3 +--
>  include/linux/pci.h                     |  1 -
>  5 files changed, 6 insertions(+), 18 deletions(-)
On 10/5/2023 13:53, Bjorn Helgaas wrote:
> On Wed, Oct 04, 2023 at 09:47:31AM -0500, Mario Limonciello wrote:
>> Before d3cold was stable userspace was allowed to influence the kernel's
>> decision of whether to enable d3cold for a device by a sysfs file
>> `d3cold_allowed`. This potentially allows userspace to break the suspend
>> for the system.
>
> Is "Before d3cold was stable" referring to a "d3cold" read-only
> variable, or to Linux functionality of using D3cold, or ...?

I was referring to the previous thread's comments when I asked about the
history on it.

>
> In what sense does the `d3cold_allowed` sysfs file break suspend?

SoCs might not be able to get into their deepest sleep state if userspace
messes with it.

>
>> For debugging purposes `pci_port_pm=` can be used to control whether
>> a PCI port will go into D3cold and runtime PM can be turned off by
>> sysfs on PCI end points.
>
> I guess this should be "pcie_port_pm=", which affects *all* PCIe
> ports?

Yes.

>
> Which sysfs file turns off runtime PM for endpoints?

/sys/bus/pci/devices/*/power/control

>
>> Change the sysfs attribute to a noop that ignores the input when written
>> and shows a warning. Simplify the internal kernel logic to drop
>> `d3cold_allowed`.
>>
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>> v1->v2:
>> * Leave R/W and show a warning instead
>> * Mark deprecated in sysfs file
>> ---
>>  Documentation/ABI/testing/sysfs-bus-pci |  4 ++--
>>  drivers/pci/pci-acpi.c                  |  2 +-
>>  drivers/pci/pci-sysfs.c                 | 14 ++------------
>>  drivers/pci/pci.c                       |  3 +--
>>  include/linux/pci.h                     |  1 -
>>  5 files changed, 6 insertions(+), 18 deletions(-)
On Thu, Oct 05, 2023 at 01:56:27PM -0500, Mario Limonciello wrote:
> On 10/5/2023 13:53, Bjorn Helgaas wrote:
> > On Wed, Oct 04, 2023 at 09:47:31AM -0500, Mario Limonciello wrote:
> > > Before d3cold was stable userspace was allowed to influence the kernel's
> > > decision of whether to enable d3cold for a device by a sysfs file
> > > `d3cold_allowed`. This potentially allows userspace to break the suspend
> > > for the system.
> >
> > Is "Before d3cold was stable" referring to a "d3cold" read-only
> > variable, or to Linux functionality of using D3cold, or ...?
>
> I was referring to the previous thread's comments when I asked about the
> history on it.
>
> > In what sense does the `d3cold_allowed` sysfs file break suspend?
>
> SoCs might not be able to get into their deepest sleep state if userspace
> messes with it.
>
> > > For debugging purposes `pci_port_pm=` can be used to control whether
> > > a PCI port will go into D3cold and runtime PM can be turned off by
> > > sysfs on PCI end points.
> >
> > I guess this should be "pcie_port_pm=", which affects *all* PCIe
> > ports?
>
> Yes.
>
> > Which sysfs file turns off runtime PM for endpoints?
>
> /sys/bus/pci/devices/*/power/control

To close the loop on this, I think these are questions that should be
answered in the commit log (actually, that's usually the case when I
have questions, because future readers of the git history may have the
same questions, and it's not practical to dig the answers out of the
lore archive).

Bjorn
On 10/10/2023 11:33, Bjorn Helgaas wrote:
> On Thu, Oct 05, 2023 at 01:56:27PM -0500, Mario Limonciello wrote:
>> On 10/5/2023 13:53, Bjorn Helgaas wrote:
>>> On Wed, Oct 04, 2023 at 09:47:31AM -0500, Mario Limonciello wrote:
>>>> Before d3cold was stable userspace was allowed to influence the kernel's
>>>> decision of whether to enable d3cold for a device by a sysfs file
>>>> `d3cold_allowed`. This potentially allows userspace to break the suspend
>>>> for the system.
>>>
>>> Is "Before d3cold was stable" referring to a "d3cold" read-only
>>> variable, or to Linux functionality of using D3cold, or ...?
>>
>> I was referring to the previous thread's comments when I asked about the
>> history on it.
>>
>>> In what sense does the `d3cold_allowed` sysfs file break suspend?
>>
>> SoCs might not be able to get into their deepest sleep state if userspace
>> messes with it.
>>
>>>> For debugging purposes `pci_port_pm=` can be used to control whether
>>>> a PCI port will go into D3cold and runtime PM can be turned off by
>>>> sysfs on PCI end points.
>>>
>>> I guess this should be "pcie_port_pm=", which affects *all* PCIe
>>> ports?
>>
>> Yes.
>>
>>> Which sysfs file turns off runtime PM for endpoints?
>>
>> /sys/bus/pci/devices/*/power/control
>
> To close the loop on this, I think these are questions that should be
> answered in the commit log (actually, that's usually the case when I
> have questions, because future readers of the git history may have the
> same questions, and it's not practical to dig the answers out of the
> lore archive).
>
> Bjorn

OK thanks, sometimes it's unclear if you just want to know more or want
it in the commit message.

I'll respin a v2 with the commit message adjusted.
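The `power/control` attribute named in the thread takes the strings `on` (keep the device out of runtime suspend) and `auto` (let the kernel manage it). The following is a minimal userspace sketch of flipping it from C, not something taken from the patch. The helper names `write_attr()` and `pci_runtime_pm_set()` are hypothetical, and the device directory is whatever the caller passes in.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a short string to a sysfs-style attribute file.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_attr(const char *path, const char *value)
{
	FILE *f;
	int err = 0;

	assert(path && value);

	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fputs(value, f) == EOF)
		err = -EIO;
	if (fclose(f) == EOF && !err)
		err = -errno;
	return err;
}

/*
 * Hypothetical helper: write "on" to <dev dir>/power/control to keep one
 * device out of runtime suspend, or "auto" to hand control back to the
 * kernel. sysfs_dev_dir is a directory under /sys/bus/pci/devices/.
 */
int pci_runtime_pm_set(const char *sysfs_dev_dir, int allow_runtime_pm)
{
	char path[512];
	int n;

	assert(sysfs_dev_dir);

	n = snprintf(path, sizeof(path), "%s/power/control", sysfs_dev_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;
	return write_attr(path, allow_runtime_pm ? "auto" : "on");
}
```

Calling `pci_runtime_pm_set(dir, 0)` on a real device directory needs root. Unlike the old `d3cold_allowed` knob, this is per device and only affects runtime PM, which is why the thread treats it as a debugging aid rather than a replacement policy.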
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index ecf47559f495..b5db141dfee6 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -283,8 +283,8 @@ Description:
 		device will never be put into D3Cold state.  If it is set, the
 		device may be put into D3Cold state if other requirements are
 		satisfied too.  Reading this attribute will show the current
-		value of d3cold_allowed bit.  Writing this attribute will set
-		the value of d3cold_allowed bit.
+		value of no_d3cold bit.
+		Writing to this attribute is deprecated and will do nothing.
 
 What:		/sys/bus/pci/devices/.../sriov_totalvfs
 Date:		November 2012
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 05b7357bd258..a05350a4e49c 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -911,7 +911,7 @@ pci_power_t acpi_pci_choose_state(struct pci_dev *pdev)
 {
 	int acpi_state, d_max;
 
-	if (pdev->no_d3cold || !pdev->d3cold_allowed)
+	if (pdev->no_d3cold)
 		d_max = ACPI_STATE_D3_HOT;
 	else
 		d_max = ACPI_STATE_D3_COLD;
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 5e741a05cf2c..52ed5a55a371 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -523,17 +523,7 @@ static ssize_t d3cold_allowed_store(struct device *dev,
 				    struct device_attribute *attr,
 				    const char *buf, size_t count)
 {
-	struct pci_dev *pdev = to_pci_dev(dev);
-	unsigned long val;
-
-	if (kstrtoul(buf, 0, &val) < 0)
-		return -EINVAL;
-
-	pdev->d3cold_allowed = !!val;
-	pci_bridge_d3_update(pdev);
-
-	pm_runtime_resume(dev);
-
+	dev_warn_once(dev, "pci: writing to d3cold_allowed is deprecated\n");
 	return count;
 }
 
@@ -541,7 +531,7 @@ static ssize_t d3cold_allowed_show(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
-	return sysfs_emit(buf, "%u\n", pdev->d3cold_allowed);
+	return sysfs_emit(buf, "%u\n", !pdev->no_d3cold);
 }
 static DEVICE_ATTR_RW(d3cold_allowed);
 #endif
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 59c01d68c6d5..8c5a6f68f63d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3067,7 +3067,7 @@ static int pci_dev_check_d3cold(struct pci_dev *dev, void *data)
 	bool *d3cold_ok = data;
 
 	if (/* The device needs to be allowed to go D3cold ... */
-	    dev->no_d3cold || !dev->d3cold_allowed ||
+	    dev->no_d3cold ||
 
 	    /* ... and if it is wakeup capable to do so from D3cold. */
 	    (device_may_wakeup(&dev->dev) &&
@@ -3204,7 +3204,6 @@ void pci_pm_init(struct pci_dev *dev)
 	dev->d3hot_delay = PCI_PM_D3HOT_WAIT;
 	dev->d3cold_delay = PCI_PM_D3COLD_WAIT;
 	dev->bridge_d3 = pci_bridge_d3_possible(dev);
-	dev->d3cold_allowed = true;
 
 	dev->d1_support = false;
 	dev->d2_support = false;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8c7c2c3c6c65..5f4ed71d31f5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -376,7 +376,6 @@ struct pci_dev {
 	unsigned int	no_d1d2:1;	/* D1 and D2 are forbidden */
 	unsigned int	no_d3cold:1;	/* D3cold is forbidden */
 	unsigned int	bridge_d3:1;	/* Allow D3 for bridge */
-	unsigned int	d3cold_allowed:1;	/* D3cold is allowed by user */
 	unsigned int	mmio_always_on:1;	/* Disallow turning off io/mem
 						   decoding during BAR sizing */
 	unsigned int	wakeup_prepared:1;
Before d3cold was stable userspace was allowed to influence the kernel's
decision of whether to enable d3cold for a device by a sysfs file
`d3cold_allowed`. This potentially allows userspace to break the suspend
for the system.

For debugging purposes `pci_port_pm=` can be used to control whether
a PCI port will go into D3cold and runtime PM can be turned off by
sysfs on PCI end points.

Change the sysfs attribute to a noop that ignores the input when written
and shows a warning. Simplify the internal kernel logic to drop
`d3cold_allowed`.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v1->v2:
 * Leave R/W and show a warning instead
 * Mark deprecated in sysfs file
---
 Documentation/ABI/testing/sysfs-bus-pci |  4 ++--
 drivers/pci/pci-acpi.c                  |  2 +-
 drivers/pci/pci-sysfs.c                 | 14 ++------------
 drivers/pci/pci.c                       |  3 +--
 include/linux/pci.h                     |  1 -
 5 files changed, 6 insertions(+), 18 deletions(-)
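To make the behaviour change easier to reason about, here is a small userspace model of what the patched attribute does. It is a sketch, not kernel code: `struct model_pci_dev` and every `model_*` name are stand-ins invented for illustration, and `fprintf()` replaces the `dev_warn_once()` call from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the one struct pci_dev bit that still matters. */
struct model_pci_dev {
	bool no_d3cold;		/* D3cold is forbidden */
};

enum model_d_state { MODEL_D3_HOT, MODEL_D3_COLD };

/* Mirrors the patched check in acpi_pci_choose_state(): only no_d3cold counts. */
enum model_d_state model_deepest_state(const struct model_pci_dev *pdev)
{
	assert(pdev);
	return pdev->no_d3cold ? MODEL_D3_HOT : MODEL_D3_COLD;
}

/* Mirrors the patched d3cold_allowed_show(): report the inverse of no_d3cold. */
int model_d3cold_allowed_show(const struct model_pci_dev *pdev,
			      char *buf, size_t len)
{
	assert(pdev && buf);
	return snprintf(buf, len, "%u\n", (unsigned int)!pdev->no_d3cold);
}

/* Mirrors the patched d3cold_allowed_store(): accept the write, change nothing. */
long model_d3cold_allowed_store(struct model_pci_dev *pdev,
				const char *buf, size_t count)
{
	assert(pdev && buf);
	/* The patch uses dev_warn_once(); this model warns on every write. */
	fprintf(stderr, "writing to d3cold_allowed is deprecated\n");
	return (long)count;
}
```

In this model a write of "0" still returns the full byte count, so existing userspace scripts would not see an error, but neither the reported value nor the deepest allowed state changes afterwards.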