| Message ID | 20240207084452.9597-1-drake@endlessos.org (mailing list archive) |
|---|---|
| State | Handled Elsewhere, archived |
| Series | [v2,1/2] PCI: Disable D3cold on Asus B1400 PCI-NVMe bridge |
On Wed, Feb 7, 2024 at 4:44 PM Daniel Drake <drake@endlessos.org> wrote:
>
> The Asus B1400 with original shipped firmware versions and VMD disabled
> cannot resume from suspend: the NVMe device becomes unresponsive and
> inaccessible.
>
> This is because the NVMe device and parent PCI bridge get put into D3cold
> during suspend, and this PCI bridge cannot be recovered from D3cold mode:
>
>   echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/unbind
>   echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/unbind
>   setpci -s 00:06.0 CAP_PM+4.b=03 # D3hot
>   acpidbg -b "execute \_SB.PC00.PEG0.PXP._OFF"
>   acpidbg -b "execute \_SB.PC00.PEG0.PXP._ON"
>   setpci -s 00:06.0 CAP_PM+4.b=0 # D0
>   echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/bind
>   echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/bind
>   # NVMe probe fails here with -ENODEV
>
> This appears to be an untested D3cold transition by the vendor; Intel
> socwatch shows that Windows leaves the NVMe device and parent bridge in D0
> during suspend, even though these firmware versions have StorageD3Enable=1.
>
> Experimenting with the DSDT, the _OFF method calls DL23() which sets a L23E
> bit at offset 0xe2 into the PCI configuration space for this root port.
> This is the specific write that the _ON routine is unable to recover from.
> This register is not documented in the public chipset datasheet.
>
> Disallow D3cold on the PCI bridge to enable successful suspend/resume.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742
> Signed-off-by: Daniel Drake <drake@endlessos.org>

Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>

> ---
>  arch/x86/pci/fixup.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> v2:
> Match only specific BIOS versions where this quirk is required.
> Add subsequent patch to this series to revert the original S3 workaround
> now that s2idle is usable again.
>
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index f347c20247d30..6b0b341178e4f 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -907,6 +907,51 @@ static void chromeos_fixup_apl_pci_l1ss_capability(struct pci_dev *dev)
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_save_apl_pci_l1ss_capability);
>  DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_fixup_apl_pci_l1ss_capability);
>
> +/*
> + * Disable D3cold on Asus B1400 PCIe bridge at 00:06.0.
> + *
> + * On this platform with VMD off, the NVMe's parent PCI bridge cannot
> + * successfully power back on from D3cold, resulting in unresponsive NVMe on
> + * resume. This appears to be an untested transition by the vendor: Windows
> + * leaves the NVMe and parent bridge in D0 during suspend.
> + * This is only needed on BIOS versions before 308; the newer versions flip
> + * StorageD3Enable from 1 to 0.
> + */
> +static const struct dmi_system_id asus_nvme_broken_d3cold_table[] = {
> +	{
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
> +			DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.304"),
> +		},
> +	},
> +	{
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
> +			DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.305"),
> +		},
> +	},
> +	{
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
> +			DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.306"),
> +		},
> +	},
> +	{
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
> +			DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.307"),
> +		},
> +	},
> +	{}
> +};
> +
> +static void asus_disable_nvme_d3cold(struct pci_dev *pdev)
> +{
> +	if (dmi_check_system(asus_nvme_broken_d3cold_table) > 0)
> +		pci_d3cold_disable(pdev);
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9a09, asus_disable_nvme_d3cold);
> +
>  #ifdef CONFIG_SUSPEND
>  /*
>   * Root Ports on some AMD SoCs advertise PME_Support for D3hot and D3cold, but
> --
> 2.43.0
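For readers following along without the kernel sources: dmi_check_system() walks the table and returns the number of entries whose DMI_MATCH conditions all hold, and DMI_MATCH performs a substring match against the machine's DMI field. A minimal Python model of the version gate in the quirk above (the vendor and BIOS version strings come from the patch; the function name is ours):

```python
# Simplified model of the quirk's DMI gate. DMI_MATCH is a substring
# match, so it is modeled here with "in". Illustrative only.
AFFECTED_BIOS_VERSIONS = [f"B1400CEAE.{v}" for v in (304, 305, 306, 307)]

def quirk_applies(sys_vendor: str, bios_version: str) -> bool:
    """Return True if the D3cold quirk would match this machine."""
    if "ASUSTeK COMPUTER INC." not in sys_vendor:
        return False
    return any(v in bios_version for v in AFFECTED_BIOS_VERSIONS)

print(quirk_applies("ASUSTeK COMPUTER INC.", "B1400CEAE.305"))  # True
print(quirk_applies("ASUSTeK COMPUTER INC.", "B1400CEAE.308"))  # False: fixed BIOS
print(quirk_applies("Dell Inc.", "B1400CEAE.305"))              # False: wrong vendor
```

On a real system the two inputs correspond to /sys/class/dmi/id/sys_vendor and /sys/class/dmi/id/bios_version.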
On Wed, Feb 7, 2024 at 5:06 PM Jian-Hong Pan <jhp@endlessos.org> wrote:
>
> On Wed, Feb 7, 2024 at 4:44 PM Daniel Drake <drake@endlessos.org> wrote:
> >
> > The Asus B1400 with original shipped firmware versions and VMD disabled
> > cannot resume from suspend: the NVMe device becomes unresponsive and
> > inaccessible.
> >
> > [...]
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742
> > Signed-off-by: Daniel Drake <drake@endlessos.org>
> > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>

Change to Acked-by: Jian-Hong Pan <jhp@endlessos.org>

> > ---
> >  arch/x86/pci/fixup.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 45 insertions(+)
> >
> > [...]
On Wed, Feb 07, 2024 at 09:44:51AM +0100, Daniel Drake wrote:
> The Asus B1400 with original shipped firmware versions and VMD disabled
> cannot resume from suspend: the NVMe device becomes unresponsive and
> inaccessible.
>
> This is because the NVMe device and parent PCI bridge get put into D3cold
> during suspend, and this PCI bridge cannot be recovered from D3cold mode:
>
>   echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/unbind
>   echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/unbind
>   setpci -s 00:06.0 CAP_PM+4.b=03 # D3hot
>   acpidbg -b "execute \_SB.PC00.PEG0.PXP._OFF"
>   acpidbg -b "execute \_SB.PC00.PEG0.PXP._ON"
>   setpci -s 00:06.0 CAP_PM+4.b=0 # D0
>   echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/bind
>   echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/bind
>   # NVMe probe fails here with -ENODEV

Can you run "sudo lspci -vvxxxx -s00:06.0" before putting the Root Port
in D3hot, and then again after putting it back in D0 (when NVMe is
inaccessible), and attach both outputs to the bugzilla?

> This appears to be an untested D3cold transition by the vendor; Intel
> socwatch shows that Windows leaves the NVMe device and parent bridge in D0
> during suspend, even though these firmware versions have StorageD3Enable=1.
>
> Experimenting with the DSDT, the _OFF method calls DL23() which sets a L23E
> bit at offset 0xe2 into the PCI configuration space for this root port.
> This is the specific write that the _ON routine is unable to recover from.
> This register is not documented in the public chipset datasheet.
>
> Disallow D3cold on the PCI bridge to enable successful suspend/resume.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742
> Signed-off-by: Daniel Drake <drake@endlessos.org>
> ---
>  arch/x86/pci/fixup.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> v2:
> Match only specific BIOS versions where this quirk is required.
> Add subsequent patch to this series to revert the original S3 workaround
> now that s2idle is usable again.
>
> [...]
>
> +/*
> + * Disable D3cold on Asus B1400 PCIe bridge at 00:06.0.

I doubt the 00:06.0 PCI address is relevant here.

> + * On this platform with VMD off, the NVMe's parent PCI bridge cannot
> + * successfully power back on from D3cold, resulting in unresponsive NVMe on
> + * resume. This appears to be an untested transition by the vendor: Windows
> + * leaves the NVMe and parent bridge in D0 during suspend.
> + * This is only needed on BIOS versions before 308; the newer versions flip
> + * StorageD3Enable from 1 to 0.

Given that D3cold is just "main power off," and obviously the Root Port
*can* transition from D3cold to D0 (at initial platform power-up if
nothing else), this seems kind of strange and makes me think we may not
completely understand the root cause, e.g., maybe some config didn't get
restored.

But the fact that Windows doesn't use D3cold in this case suggests that
either (1) Windows has a similar quirk to work around this, or (2)
Windows decides whether to use D3cold differently than Linux does.

I have no data, but (1) seems sort of unlikely. In case it turns out to
be (2) and we figure out how to fix it that way someday, can you add the
output of "sudo lspci -vvxxxx" of the system to the bugzilla?

What would be the downside of skipping the DMI table and calling
pci_d3cold_disable() always? If this truly is a Root Port defect, it
should affect all platforms with this device, and what's the benefit of
relying on BIOS to use StorageD3Enable to avoid the defect?

Rewrap into a single paragraph or add a blank line between paragraphs.

> + */
> +static const struct dmi_system_id asus_nvme_broken_d3cold_table[] = {
> + [...]
> +static void asus_disable_nvme_d3cold(struct pci_dev *pdev)
> +{
> +	if (dmi_check_system(asus_nvme_broken_d3cold_table) > 0)
> +		pci_d3cold_disable(pdev);
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9a09, asus_disable_nvme_d3cold);
> +
>  #ifdef CONFIG_SUSPEND
>  /*
>   * Root Ports on some AMD SoCs advertise PME_Support for D3hot and D3cold, but
> --
> 2.43.0
On Wed, Feb 7, 2024 at 9:05 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> Can you run "sudo lspci -vvxxxx -s00:06.0" before putting the Root
> Port in D3hot, and then again after putting it back in D0 (when NVMe
> is inaccessible), and attach both outputs to the bugzilla?

Done: https://bugzilla.kernel.org/show_bug.cgi?id=215742#c21

> Given that D3cold is just "main power off," and obviously the Root
> Port *can* transition from D3cold to D0 (at initial platform power-up
> if nothing else), this seems kind of strange and makes me think we may
> not completely understand the root cause, e.g., maybe some config
> didn't get restored.
>
> But the fact that Windows doesn't use D3cold in this case suggests
> that either (1) Windows has a similar quirk to work around this, or
> (2) Windows decides whether to use D3cold differently than Linux does.
>
> I have no data, but (1) seems sort of unlikely. In case it turns out
> to be (2) and we figure out how to fix it that way someday, can you
> add the output of "sudo lspci -vvxxxx" of the system to the bugzilla?

https://bugzilla.kernel.org/show_bug.cgi?id=215742#c27

Some other interesting observations from Windows, observed via socwatch
& VTune.

On affected BIOS versions:
- CPU does not go into the lowest power state PC10 during suspend - it
  only reaches PC8.
- SLP_S0# signal is not asserted (which follows from it not reaching PC10).
- NVMe device in D0 and the HDD LED briefly blinks every 1-2 seconds
  (can't recall if it was a regular or irregular blink).

On latest BIOS version:
- PC10 reached and SLP_S0# asserted during suspend, but only for about
  25% of the suspend time.
- NVMe device in D0 and the HDD LED briefly blinks every 1-2 seconds
  (can't recall if it was a regular or irregular blink).

The LED blinking leaves me wondering if there is something "using" the
disk during suspend in Windows, so that's why it doesn't try to power it
down even on the original version with StorageD3Enable=1. This HDD LED
blinking during suspend does not happen on Linux, not even when the NVMe
device is left in D0 over suspend with the regular nvme_suspend() path
putting the NVMe device into lower power mode at the NVMe protocol level.

> What would be the downside of skipping the DMI table and calling
> pci_d3cold_disable() always? If this truly is a Root Port defect, it
> should affect all platforms with this device, and what's the benefit
> of relying on BIOS to use StorageD3Enable to avoid the defect?

I had more assumed that it was a platform-specific DSDT bug, in that
PEG0.PXP._OFF is doing something that PEG0.PXP._ON is unable to recover
from, and that other platforms might handle the suspend/resume of this
root port more correctly. Not sure if it is reasonable to assume that all
other platforms on the same chipset have the same bug (if that's what
this is).

Daniel
On Thu, Feb 8, 2024 at 9:37 AM Daniel Drake <drake@endlessos.org> wrote:
> > What would be the downside of skipping the DMI table and calling
> > pci_d3cold_disable() always? If this truly is a Root Port defect, it
> > should affect all platforms with this device, and what's the benefit
> > of relying on BIOS to use StorageD3Enable to avoid the defect?
>
> I had more assumed that it was a platform-specific DSDT bug, in that
> PEG0.PXP._OFF is doing something that PEG0.PXP._ON is unable to
> recover from, and that other platforms might handle the suspend/resume
> of this root port more correctly. Not sure if it is reasonable to
> assume that all other platforms on the same chipset have the same bug
> (if that's what this is).

Just realised my main workstation (Dell XPS) has the same chipset.

The Dell ACPI table has the exact same suspect-buggy function, which the
affected Asus system calls from PEG0.PXP._OFF:

    Method (DL23, 0, Serialized)
    {
        L23E = One
        Sleep (0x10)
        Local0 = Zero
        While (L23E)
        {
            If ((Local0 > 0x04))
            {
                Break
            }

            Sleep (0x10)
            Local0++
        }

        SCB0 = One
    }

(The "L23E = One" line is the one that writes a value to config offset
0xe2; if you comment out this line then everything works.)

However, on the Dell XPS system, nothing calls DL23(), i.e. it is dead code.

Comparing side by side:
- Asus root port (PC00.PEG0) has the PXP power resource which gets powered
  down during the D3cold transition as it becomes unused. Dell root port
  has no power resources (no _PR0).
- Asus NVMe device sitting under that root port (PC00.PEG0.PEGP) has a
  no-op _PS3 method, but Dell does not have _PS3. This means that Dell
  doesn't attempt D3cold on NVMe nor the parent root port during suspend
  (both go to D3hot only).

Let me know if you have any ideas for other useful comparative experiments.

Daniel
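The polling logic in DL23 is easier to follow outside ASL. A rough Python transliteration under stated assumptions (the `RootPort` class is a stand-in for the root port's hidden registers at config offset 0xe2; real hardware is presumably expected to clear L23E once the L2/L3 Ready transition completes):

```python
import time

class RootPort:
    """Mock of the root port's hidden config registers at offset 0xe2.
    Clears L23E after a configurable number of reads, mimicking hardware
    acknowledging the L2/L3 Ready entry request."""
    def __init__(self, clears_after_polls):
        self.l23e = 0
        self.scb0 = 0
        self._polls_left = clears_after_polls

    def read_l23e(self):
        if self._polls_left > 0:
            self._polls_left -= 1
        else:
            self.l23e = 0          # hardware acknowledged the request
        return self.l23e

def dl23(port, delay=0.016):
    """Transliteration of the DSDT's DL23 method."""
    port.l23e = 1                  # L23E = One: the write _ON cannot undo
    time.sleep(delay)              # Sleep (0x10) = 16 ms
    local0 = 0
    while port.read_l23e():        # While (L23E)
        if local0 > 4:             # give up after ~5 polls
            break
        time.sleep(delay)
        local0 += 1
    port.scb0 = 1                  # SCB0 = One

port = RootPort(clears_after_polls=2)
dl23(port, delay=0)
print(port.l23e, port.scb0)        # 0 1
```

Note that SCB0 is set unconditionally, even when the poll loop times out, so the method never reports failure upward.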
On Thu, 2024-02-08 at 10:52 +0100, Daniel Drake wrote:
> On Thu, Feb 8, 2024 at 9:37 AM Daniel Drake <drake@endlessos.org> wrote:
> > > What would be the downside of skipping the DMI table and calling
> > > pci_d3cold_disable() always? If this truly is a Root Port defect, it
> > > should affect all platforms with this device, and what's the benefit
> > > of relying on BIOS to use StorageD3Enable to avoid the defect?
> >
> > I had more assumed that it was a platform-specific DSDT bug, in that
> > PEG0.PXP._OFF is doing something that PEG0.PXP._ON is unable to
> > recover from, and that other platforms might handle the suspend/resume
> > of this root port more correctly. Not sure if it is reasonable to
> > assume that all other platforms on the same chipset have the same bug
> > (if that's what this is).

This does look like a firmware bug. We've had reports of D3cold support
missing when running in non-VMD mode on systems that were designed with
VMD for Windows. These issues have been caught and addressed by OEMs
during enabling of Linux systems. Does D3cold work in VMD mode?

David

> Just realised my main workstation (Dell XPS) has the same chipset.
>
> The Dell ACPI table has the exact same suspect-buggy function, which
> the affected Asus system calls from PEG0.PXP._OFF:
>
> [...]
>
> Comparing side by side:
> Asus root port (PC00.PEG0) has the PXP power resource which gets
> powered down during D3cold transition as it becomes unused. Dell root
> port has no power resources (no _PR0).
> Asus NVMe device sitting under that root port (PC00.PEG0.PEGP) has a
> no-op _PS3 method, but Dell does not have _PS3. This means that Dell
> doesn't attempt D3cold on NVMe nor the parent root port during suspend
> (both go to D3hot only).
>
> Let me know if you have any ideas for other useful comparative experiments.
> Daniel
On Thu, Feb 8, 2024 at 5:57 PM David E. Box <david.e.box@linux.intel.com> wrote:
> This does look like a firmware bug. We've had reports of D3cold support
> missing when running in non-VMD mode on systems that were designed with
> VMD for Windows. These issues have been caught and addressed by OEMs
> during enabling of Linux systems. Does D3cold work in VMD mode?

On Windows for the VMD=on case, we only tested this on a BIOS with
StorageD3Enable=0. The NVMe device and parent bridge stayed in D0 over
suspend, but that's exactly what the BIOS asked for, so it doesn't really
answer your question.

On Linux with VMD=on and StorageD3Enable=1, the NVMe storage device and
the VMD parent bridge are staying in D0 over suspend. I don't know why
this is; I would have expected at least D3hot. However, given that the
NVMe device has no firmware_node under the VMD=on setup, I believe there
is no way it would enter D3cold, because there's no linkage to an ACPI
device, so no available _PS3 or _PR0 or whatever is the precise definition
of D3cold.

I also realise I may have made a bad assumption in my previous mail when
looking at the Dell device: I was assuming that a parent PCI bridge cannot
go into D3cold if its child devices only got as far as D3hot, but I now
realise I'm not sure if that constraint actually exists.

Not sure if these questions are relevant for the consideration of this
patch, but I'll try to find some time to answer them next week.

Daniel
On Fri, 2024-02-09 at 09:36 +0100, Daniel Drake wrote:
> On Thu, Feb 8, 2024 at 5:57 PM David E. Box <david.e.box@linux.intel.com> wrote:
> > This does look like a firmware bug. We've had reports of D3cold support
> > missing when running in non-VMD mode on systems that were designed with
> > VMD for Windows. These issues have been caught and addressed by OEMs
> > during enabling of Linux systems. Does D3cold work in VMD mode?
>
> On Windows for the VMD=on case, we only tested this on a BIOS with
> StorageD3Enable=0. The NVMe device and parent bridge stayed in D0 over
> suspend, but that's exactly what the BIOS asked for, so it doesn't
> really answer your question.
>
> On Linux with VMD=on and StorageD3Enable=1, the NVMe storage device
> and the VMD parent bridge are staying in D0 over suspend. I don't know
> why this is, I would have expected at least D3hot.

Yeah, something is missing here. When StorageD3Enable is set, the nvme
driver prints the following message during boot:

    "platform quirk: setting simple suspend"

If you don't see this, then the driver never saw StorageD3Enable=1.
Possible reasons are:

- The property doesn't exist
- The property isn't set under the ACPI companion device
- There is no associated ACPI companion device
- The "nvme=noacpi" parameter was passed on the kernel cmdline
- The nvme driver was quirked to not use D3 with
  NVME_QUIRK_FORCE_NO_SIMPLE_SUSPEND

How was the D-state status confirmed? You can use the following to see the
D state of PCI devices during suspend in the kernel log:

    echo -n "file pci-driver.c +p" > /sys/kernel/debug/dynamic_debug/control

David

> However, given that the NVMe device has no firmware_node under the
> VMD=on setup, I believe there is no way it would enter D3cold because
> there's no linkage to an ACPI device, so no available _PS3 or _PR0 or
> whatever is the precise definition of D3cold.
>
> I also realise I may have made a bad assumption in my previous mail
> when looking at the Dell device: I was assuming that a parent PCI
> bridge cannot go into D3cold if its child devices only got as far as
> D3hot, but I now realise I'm not sure if that constraint actually
> exists.
>
> Not sure if these questions are relevant for the consideration of this
> patch, but I'll try to find some time to answer them next week.
>
> Daniel
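Once those debug messages are enabled, the per-device D-states can be pulled out of the kernel log after a suspend/resume cycle. A small sketch of doing that mechanically; the message shape is an assumption modeled on the debug output David refers to (something like "pci 0000:00:06.0: power state changed by ACPI to D3cold"), so the regex may need adjusting for a given kernel:

```python
import re

# Assumed dmesg line shape (adjust to what your kernel actually prints):
#   "pci 0000:00:06.0: power state changed by ACPI to D3cold"
DSTATE_RE = re.compile(
    r"pci(?:eport)?\s+(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]).*"
    r"\bto (?P<state>D[0-3](?:hot|cold)?)\b")

def dstates(dmesg_lines):
    """Map each PCI device (BDF) to the last D-state a message reported."""
    states = {}
    for line in dmesg_lines:
        m = DSTATE_RE.search(line)
        if m:
            states[m.group("bdf")] = m.group("state")
    return states

# Hypothetical log excerpt from a suspend cycle:
log = [
    "pci 0000:00:06.0: power state changed by ACPI to D3cold",
    "pci 0000:01:00.0: power state changed by ACPI to D3hot",
    "pci 0000:01:00.0: power state changed by ACPI to D3cold",
]
print(dstates(log))
```

Feeding it `dmesg` output captured right after resume gives the final state each device reached on the way down.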
On Thu, Feb 8, 2024 at 10:52 AM Daniel Drake <drake@endlessos.org> wrote:
> Just realised my main workstation (Dell XPS) has the same chipset.
>
> The Dell ACPI table has the exact same suspect-buggy function, which
> the affected Asus system calls from PEG0.PXP._OFF:
>
> [...]
>
> Comparing side by side:
> Asus root port (PC00.PEG0) has the PXP power resource which gets
> powered down during D3cold transition as it becomes unused. Dell root
> port has no power resources (no _PR0).
> Asus NVMe device sitting under that root port (PC00.PEG0.PEGP) has a
> no-op _PS3 method, but Dell does not have _PS3. This means that Dell
> doesn't attempt D3cold on NVMe nor the parent root port during suspend
> (both go to D3hot only).

Recap: comparing the Asus device (NVMe + parent bridge go into D3cold in
suspend, and cannot wake up) vs the Dell device with the same chipset
(NVMe device + parent bridge go into D3hot). These suspend power states
were confirmed by:

    echo -n "file pci-driver.c +p" > /sys/kernel/debug/dynamic_debug/control

In asking "why does the Dell device not go into D3cold" I got some details
mixed up above. I have now clarified:

- The NVMe device does not have any _PSx/_PRx methods, so
  acpi_bus_get_power_flags() does not set the power_manageable flag. This
  limits the pci layer to D3hot at best.
- The parent bridge has _PS0 and _PS3 methods, so it is power_manageable.
  However, it does not have any power resources (_PR0/_PR3) and hence
  ACPI_STATE_D3_COLD is not marked as valid.

Checking the ACPI spec, this is indeed the definition of D3cold support
(_PR3 gives the required power resources for running the device in the
D3hot state, so if you turn those ones off you get D3cold).

This does not conclusively answer the question of "is D3cold broken on
this PCI bridge for all devices built on this chipset?". But at a stretch
you could regard it as another data point agreeing with that theory: the
Dell product does not attempt D3cold support at the ACPI level, and there
may be a good reason for that.

Daniel
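Daniel's Dell/Asus comparison follows directly from how the kernel derives power capabilities from the ACPI namespace. A deliberately simplified model of that derivation (loosely after acpi_bus_get_power_flags(); the real logic has more cases, so treat this as a sketch of the rule stated above, not the kernel's algorithm):

```python
def max_d_state(has_ps0, has_ps3, has_pr0, has_pr3):
    """Deepest suspend D-state the ACPI description permits (simplified).

    - No _PSx/_PRx at all: not ACPI power-manageable (PCI-native PM may
      still reach D3hot on its own).
    - _PS3 present: D3hot is reachable via ACPI.
    - _PR3 present: the D3hot power resources can also be switched off,
      which is the definition of D3cold.
    """
    if not (has_ps0 or has_ps3 or has_pr0 or has_pr3):
        return "none (not ACPI power-manageable)"
    if has_pr3:
        return "D3cold"
    if has_ps3:
        return "D3hot"
    return "D0"

# Dell NVMe: no _PSx/_PRx methods at all
print(max_d_state(False, False, False, False))
# Dell root port: _PS0/_PS3 but no power resources -> D3hot at best
print(max_d_state(True, True, False, False))
# Asus root port: power resources present -> D3cold considered valid
print(max_d_state(True, True, True, True))
```

The three calls reproduce the three cases in the thread: not power-manageable, D3hot-capped, and D3cold-capable.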
On Fri, Feb 9, 2024 at 9:36 AM Daniel Drake <drake@endlessos.org> wrote:
> On Thu, Feb 8, 2024 at 5:57 PM David E. Box <david.e.box@linux.intel.com> wrote:
> > This does look like a firmware bug. We've had reports of D3cold support
> > missing when running in non-VMD mode on systems that were designed with
> > VMD for Windows. These issues have been caught and addressed by OEMs
> > during enabling of Linux systems. Does D3cold work in VMD mode?
>
> On Windows for the VMD=on case, we only tested this on a BIOS with
> StorageD3Enable=0. The NVMe device and parent bridge stayed in D0 over
> suspend, but that's exactly what the BIOS asked for, so it doesn't
> really answer your question.

Tested on the original BIOS version with VMD=on: Windows leaves the NVMe
device (and parent bridge) in D0 during suspend (i.e. same result as
VMD=off).

On this setup, there are 2 devices with StorageD3Enable flags:

1. \_SB.PC00.PEG0.PEGP._DSD has StorageD3Enable=1. This is set regardless
   of the VMD setting at the BIOS level. This is the flag that is causing
   us the headache in non-VMD mode, where Linux then proceeds to put
   devices into D3cold.
   This PEGP device in the non-VMD configuration corresponds to the NVMe
   storage device. PEG0 is the PCI root port at 00:06.0 (the one in
   question in this thread), and PEGP is the child with address 0. However
   in VMD mode, 00:06.0 is a dummy device (not a bridge), so this PEGP
   device isn't going to be used by anything.

2. \_SB.PC00.VMD0._DSD has StorageD3Enable=0. This VMD0 device is only
   present when VMD is enabled in the BIOS. It is the companion for
   00:0e.0, which is the device that the vmd driver binds against. This
   could be influencing Windows to leave the NVMe device in D0, but I
   doubt it, because it can't explain why Windows would have the D0
   behaviour when VMD=off; also, this is a really strange place to put the
   StorageD3Enable setting because it is not a storage device.

> On Linux with VMD=on and StorageD3Enable=1, the NVMe storage device
> and the VMD parent bridge are staying in D0 over suspend. I don't know
> why this is, I would have expected at least D3hot. However, given
> that the NVMe device has no firmware_node under the VMD=on setup, I
> believe there is no way it would enter D3cold because there's no
> linkage to an ACPI device, so no available _PS3 or _PR0 or whatever is
> the precise definition of D3cold.

Checked in more detail. In Linux, the NVMe device will only go into
D3hot/D3cold if the ACPI companion device has an explicit
StorageD3Enable=1. However, in VMD mode the NVMe storage device has no
ACPI companion. Code flow is nvme_pci_alloc_dev() -> acpi_storage_d3() ->
return false because there is no companion.

The VMD PCI bridge at 10000:e0:06.0 that is the parent of the SATA & NVMe
devices does have a companion: \_SB.PC00.VMD0.PEG0. However, the SATA and
NVMe child devices do not have any ACPI companion. I examined the logic of
vmd_acpi_find_companion() and determined that it is looking for devices
with _ADR 80b8ffff (SATA) and 8100ffff (NVMe), and such devices do not
exist in the ACPI tables.

Speculating a little, I guess this is also why Windows leaves the device
in D0 in VMD=on mode: it would only put the NVMe device in D3hot/D3cold if
it had a corresponding companion with StorageD3Enable=1, and there isn't
one of those. What's still unknown is why it doesn't put the device in D3
in VMD=off mode, because there is a correctly placed StorageD3Enable=1 in
that case.

Daniel
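The two _ADR values quoted above decode cleanly if the VMD companion address is assumed to follow the layout (1 << 31) | (bus << 24) | (devfn << 16) | 0xffff, which is consistent with both values: 0x80b8ffff gives bus 0, device 0x17, function 0 (the usual SATA slot) and 0x8100ffff gives bus 1, device 0. A small decoder under that assumption (the layout is inferred from these two data points, not taken from the vmd driver source):

```python
def decode_vmd_adr(adr):
    """Decode an assumed VMD-style _ADR into (bus, device, function).

    Assumed layout: (1 << 31) | (bus << 24) | (devfn << 16) | 0xffff,
    where devfn packs device/function PCI-style: (device << 3) | function.
    """
    assert adr & (1 << 31), "high bit assumed to mark a VMD-rooted address"
    bus = (adr >> 24) & 0x7f
    devfn = (adr >> 16) & 0xff
    return bus, devfn >> 3, devfn & 0x7

print(decode_vmd_adr(0x80B8FFFF))  # (0, 23, 0) -> SATA at 00:17.0
print(decode_vmd_adr(0x8100FFFF))  # (1, 0, 0)  -> NVMe at 01:00.0
```

If the firmware had supplied companion devices, their _ADR values would have to encode the bus/devfn the devices get behind the VMD domain, which is exactly what vmd_acpi_find_companion() goes looking for.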
On 2/19/2024 05:35, Daniel Drake wrote:
> On Fri, Feb 9, 2024 at 9:36 AM Daniel Drake <drake@endlessos.org> wrote:
> [...]
>
> Tested on the original BIOS version with VMD=on: Windows leaves the
> NVMe device (and parent bridge) in D0 during suspend (i.e. same result
> as VMD=off).
>
> On this setup, there are 2 devices with StorageD3Enable flags:
>
> [...]
>
> Speculating a little, I guess this is also why Windows leaves the
> device in D0 in VMD=on mode: it would only put the NVMe device in
> D3hot/D3cold if it had a corresponding companion with
> StorageD3Enable=1 and there isn't one of those. What's still unknown
> is why it doesn't put the device in D3 in VMD=off mode because there
> is a correctly placed StorageD3Enable=1 in that case.
>
> Daniel

Tangentially related: I've observed from multiple bug reports that Windows
will apply StorageD3Enable behavior to all storage devices in the system
even if "only one" has the property. I would speculate that the logic for
Windows exists specifically in the Microsoft storage drivers but is a
"global boolean" for those drivers. So perhaps when VMD is enabled the
Intel VMD driver is used instead of the Microsoft storage driver, so it
doesn't have that logic.
On Mon, 2024-02-19 at 12:35 +0100, Daniel Drake wrote:
> On Fri, Feb 9, 2024 at 9:36 AM Daniel Drake <drake@endlessos.org> wrote:
> > On Thu, Feb 8, 2024 at 5:57 PM David E. Box
> > <david.e.box@linux.intel.com> wrote:
> > > This does look like a firmware bug. We've had reports of D3cold
> > > support missing when running in non-VMD mode on systems that were
> > > designed with VMD for Windows. These issues have been caught and
> > > addressed by OEMs during enabling of Linux systems. Does D3cold
> > > work in VMD mode?
> >
> > On Windows for the VMD=on case, we only tested this on a BIOS with
> > StorageD3Enable=0. The NVMe device and parent bridge stayed in D0 over
> > suspend, but that's exactly what the BIOS asked for, so it doesn't
> > really answer your question.
>
> Tested on the original BIOS version with VMD=on: Windows leaves the
> NVMe device (and parent bridge) in D0 during suspend (i.e. the same
> result as VMD=off).
>
> On this setup, there are 2 devices with StorageD3Enable flags:
>
> 1. \_SB.PC00.PEG0.PEGP._DSD has StorageD3Enable=1. This is set
> regardless of the VMD setting at the BIOS level. This is the flag that
> is causing us the headache in non-VMD mode, where Linux then proceeds
> to put devices into D3cold.

Likely never tested.

> This PEGP device in the non-VMD configuration corresponds to the NVMe
> storage device. PEG0 is the PCI root port at 00:06.0 (the one in
> question in this thread), and PEGP is the child with address 0.
> However, in VMD mode, 00:06.0 is a dummy device (not a bridge), so this
> PEGP device isn't going to be used by anything.
>
> 2. \_SB.PC00.VMD0._DSD has StorageD3Enable=0. This VMD0 device is only
> present when VMD is enabled in the BIOS. It is the companion for
> 00:0e.0, which is the device that the vmd driver binds against. This
> could be influencing Windows to leave the NVMe device in D0, but I
> doubt it: it can't explain why Windows would have the D0 behaviour
> when VMD=off, and this is a really strange place to put the
> StorageD3Enable setting because it is not a storage device.

Yes, we don't look for the property here, only under the child device
of the Root Port.

> > On Linux with VMD=on and StorageD3Enable=1, the NVMe storage device
> > and the VMD parent bridge are staying in D0 over suspend. I don't know
> > why this is; I would have expected at least D3hot. However, given
> > that the NVMe device has no firmware_node under the VMD=on setup, I
> > believe there is no way it would enter D3cold, because there's no
> > linkage to an ACPI device, so no available _PS3 or _PR0 or whatever is
> > the precise definition of D3cold.
>
> Checked in more detail. In Linux, the NVMe device will only go into
> D3hot/D3cold if the ACPI companion device has an explicit
> StorageD3Enable=1. However, in VMD mode the NVMe storage device has no
> ACPI companion. The code flow is nvme_pci_alloc_dev() ->
> acpi_storage_d3() -> return false because there is no companion.

That explains it.

> The VMD PCI bridge at 10000:e0:06.0 that is the parent of the SATA and
> NVMe devices does have a companion, \_SB.PC00.VMD0.PEG0.
> However, the SATA and NVMe child devices do not have any ACPI
> companion. I examined the logic of vmd_acpi_find_companion() and
> determined that it is looking for devices with _ADR 80b8ffff (SATA)
> and 8100ffff (NVMe), and such devices do not exist in the ACPI tables.

Based on its counterpart, it should be at \_SB.PC00.VMD0.PEG0.PEGP in
VMD mode; this is where _ADR = 8100FFFF should be. This looks like
broken ACPI code, since the property does exist in the expected
location when VMD is disabled.

> Speculating a little, I guess this is also why Windows leaves the
> device in D0 in VMD=on mode: it would only put the NVMe device in
> D3hot/D3cold if it had a corresponding companion with
> StorageD3Enable=1, and there isn't one of those. What's still unknown
> is why it doesn't put the device in D3 in VMD=off mode, because there
> is a correctly placed StorageD3Enable=1 in that case.

Given the observations (inconsistent firmware, and testing showing
Windows using D0), I'm good with the approach.

David
diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index f347c20247d30..6b0b341178e4f 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -907,6 +907,51 @@ static void chromeos_fixup_apl_pci_l1ss_capability(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_save_apl_pci_l1ss_capability);
 DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_fixup_apl_pci_l1ss_capability);
 
+/*
+ * Disable D3cold on Asus B1400 PCIe bridge at 00:06.0.
+ *
+ * On this platform with VMD off, the NVMe's parent PCI bridge cannot
+ * successfully power back on from D3cold, resulting in unresponsive NVMe on
+ * resume. This appears to be an untested transition by the vendor: Windows
+ * leaves the NVMe and parent bridge in D0 during suspend.
+ * This is only needed on BIOS versions before 308; the newer versions flip
+ * StorageD3Enable from 1 to 0.
+ */
+static const struct dmi_system_id asus_nvme_broken_d3cold_table[] = {
+        {
+                .matches = {
+                        DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
+                        DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.304"),
+                },
+        },
+        {
+                .matches = {
+                        DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
+                        DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.305"),
+                },
+        },
+        {
+                .matches = {
+                        DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
+                        DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.306"),
+                },
+        },
+        {
+                .matches = {
+                        DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
+                        DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.307"),
+                },
+        },
+        {}
+};
+
+static void asus_disable_nvme_d3cold(struct pci_dev *pdev)
+{
+        if (dmi_check_system(asus_nvme_broken_d3cold_table) > 0)
+                pci_d3cold_disable(pdev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9a09, asus_disable_nvme_d3cold);
+
 #ifdef CONFIG_SUSPEND
 /*
  * Root Ports on some AMD SoCs advertise PME_Support for D3hot and D3cold, but
The Asus B1400 with original shipped firmware versions and VMD disabled
cannot resume from suspend: the NVMe device becomes unresponsive and
inaccessible.

This is because the NVMe device and parent PCI bridge get put into D3cold
during suspend, and this PCI bridge cannot be recovered from D3cold mode:

  echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/unbind
  echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/unbind
  setpci -s 00:06.0 CAP_PM+4.b=03   # D3hot
  acpidbg -b "execute \_SB.PC00.PEG0.PXP._OFF"
  acpidbg -b "execute \_SB.PC00.PEG0.PXP._ON"
  setpci -s 00:06.0 CAP_PM+4.b=0    # D0
  echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/bind
  echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/bind
  # NVMe probe fails here with -ENODEV

This appears to be an untested D3cold transition by the vendor; Intel
socwatch shows that Windows leaves the NVMe device and parent bridge in D0
during suspend, even though these firmware versions have StorageD3Enable=1.

Experimenting with the DSDT, the _OFF method calls DL23(), which sets an
L23E bit at offset 0xe2 into the PCI configuration space for this root
port. This is the specific write that the _ON routine is unable to recover
from. This register is not documented in the public chipset datasheet.

Disallow D3cold on the PCI bridge to enable successful suspend/resume.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742
Signed-off-by: Daniel Drake <drake@endlessos.org>
---
 arch/x86/pci/fixup.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

v2:
Match only specific BIOS versions where this quirk is required.
Add subsequent patch to this series to revert the original S3 workaround
now that s2idle is usable again.