diff mbox series

[1/1] PCI/ASPM: Fix L1SS saving

Message ID 20250131152913.2507-1-ilpo.jarvinen@linux.intel.com (mailing list archive)
State Accepted
Commit be6358acefc1f4261c6fa43721e0d2d94ca4266c
Delegated to: Bjorn Helgaas
Headers show
Series [1/1] PCI/ASPM: Fix L1SS saving | expand

Commit Message

Ilpo Järvinen Jan. 31, 2025, 3:29 p.m. UTC
The commit 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in
pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both
the Upstream Port and its upstream bridge when handling an Upstream
Port, which matches what the L1SS restore side does. However,
parent->state_saved can be set true at an earlier time when the
upstream bridge saved other parts of its state. Then later when
attempting to save the L1SS config while handling the Upstream Port,
parent->state_saved is true in pci_save_aspm_l1ss_state() resulting in
early return and skipping saving bridge's L1SS config because it is
assumed to be already saved. Later on restore, junk is written into
L1SS config which causes issues with some devices.

Remove parent->state_saved check and unconditionally save L1SS config
also for the upstream bridge from an Upstream Port which ought to be
harmless from correctness point of view. With the Upstream Port check
now present, saving the L1SS config more than once for the bridge is no
longer a problem (unlike when the parent->state_saved check got
introduced into the fix during its development).

Fixes: 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731
Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/pcie/aspm.c | 3 ---
 1 file changed, 3 deletions(-)


base-commit: 72deda0abee6e705ae71a93f69f55e33be5bca5c

Comments

Bjorn Helgaas Feb. 4, 2025, 8:39 p.m. UTC | #1
[+cc Rafael]

On Fri, Jan 31, 2025 at 05:29:13PM +0200, Ilpo Järvinen wrote:
> The commit 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in
> pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both
> the Upstream Port and its upstream bridge when handling an Upstream
> Port, which matches what the L1SS restore side does. However,
> parent->state_saved can be set true at an earlier time when the
> upstream bridge saved other parts of its state. 

So I guess the scenario is that we got here because some driver called
pci_save_state(pdev):

  pci_save_state
    dev->state_saved = true                <--
    pci_save_pcie_state
      pci_save_aspm_l1ss_state
        if (pcie_downstream_port(pdev))
          return
        # save pdev L1SS state here
        if (parent->state_saved)           <--
          return
        # save parent L1SS state here

and the problem is that we previously called pci_save_state(parent),
which set "parent->state_saved = true" but did not save its L1SS state
because pci_save_aspm_l1ss_state() is a no-op for Downstream Ports,
right?

But I would think this would be a very common situation because
pcie_portdrv_probe() calls pci_save_state() for Downstream Ports when
pciehp, AER, PME, etc, are enabled, and this would happen before the
pci_save_state() calls from Endpoint drivers.

So I'm a little surprised that this didn't blow up for everybody
immediately.  Is there something that makes this unusual?

> Then later when
> attempting to save the L1SS config while handling the Upstream Port,
> parent->state_saved is true in pci_save_aspm_l1ss_state() resulting in
> early return and skipping saving bridge's L1SS config because it is
> assumed to be already saved. Later on restore, junk is written into
> L1SS config which causes issues with some devices.
> 
> Remove parent->state_saved check and unconditionally save L1SS config
> also for the upstream bridge from an Upstream Port which ought to be
> harmless from correctness point of view. With the Upstream Port check
> now present, saving the L1SS config more than once for the bridge is no
> longer a problem (unlike when the parent->state_saved check got
> introduced into the fix during its development).
> 
> Fixes: 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()")
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731
> Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
> Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>  drivers/pci/pcie/aspm.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index e0bc90597dca..da3e7edcf49d 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -108,9 +108,6 @@ void pci_save_aspm_l1ss_state(struct pci_dev *pdev)
>  	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL2, cap++);
>  	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL1, cap++);
>  
> -	if (parent->state_saved)
> -		return;
> -
>  	/*
>  	 * Save parent's L1 substate configuration so we have it for
>  	 * pci_restore_aspm_l1ss_state(pdev) to restore.
> 
> base-commit: 72deda0abee6e705ae71a93f69f55e33be5bca5c
> -- 
> 2.39.5
>
Niklāvs Koļesņikovs Feb. 4, 2025, 8:55 p.m. UTC | #2
Hi,

I'm the person who first reported this issue. The most obvious reason
to a layman such as myself would be the simple fact that desktop
platforms by default do not enable L1SS in their EFI/BIOS
configurations.
Meaning only weirdoes such as myself who tinker with EFI and kernel
settings [while looking at a power meter readings] would ever
encounter this on a desktop system.

I don't know about portable devices but perhaps they rely on their
embedded controllers instead of exposing L1SS to the OS, too? Just
guessing.

That being said, it's possible my layman intuition is very wrong. If
so, sorry about the noise.

Cheers,
Niklāvs Koļesņikovs

P.S. Sorry, if reply to all was the wrong GMail button to press.

otrd., 2025. g. 4. febr., plkst. 22:39 — lietotājs Bjorn Helgaas
(<helgaas@kernel.org>) rakstīja:
>
> [+cc Rafael]
>
> On Fri, Jan 31, 2025 at 05:29:13PM +0200, Ilpo Järvinen wrote:
> > The commit 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in
> > pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both
> > the Upstream Port and its upstream bridge when handling an Upstream
> > Port, which matches what the L1SS restore side does. However,
> > parent->state_saved can be set true at an earlier time when the
> > upstream bridge saved other parts of its state.
>
> So I guess the scenario is that we got here because some driver called
> pci_save_state(pdev):
>
>   pci_save_state
>     dev->state_saved = true                <--
>     pci_save_pcie_state
>       pci_save_aspm_l1ss_state
>         if (pcie_downstream_port(pdev))
>           return
>         # save pdev L1SS state here
>         if (parent->state_saved)           <--
>           return
>         # save parent L1SS state here
>
> and the problem is that we previously called pci_save_state(parent),
> which set "parent->state_saved = true" but did not save its L1SS state
> because pci_save_aspm_l1ss_state() is a no-op for Downstream Ports,
> right?
>
> But I would think this would be a very common situation because
> pcie_portdrv_probe() calls pci_save_state() for Downstream Ports when
> pciehp, AER, PME, etc, are enabled, and this would happen before the
> pci_save_state() calls from Endpoint drivers.
>
> So I'm a little surprised that this didn't blow up for everybody
> immediately.  Is there something that makes this unusual?
>
> > Then later when
> > attempting to save the L1SS config while handling the Upstream Port,
> > parent->state_saved is true in pci_save_aspm_l1ss_state() resulting in
> > early return and skipping saving bridge's L1SS config because it is
> > assumed to be already saved. Later on restore, junk is written into
> > L1SS config which causes issues with some devices.
> >
> > Remove parent->state_saved check and unconditionally save L1SS config
> > also for the upstream bridge from an Upstream Port which ought to be
> > harmless from correctness point of view. With the Upstream Port check
> > now present, saving the L1SS config more than once for the bridge is no
> > longer a problem (unlike when the parent->state_saved check got
> > introduced into the fix during its development).
> >
> > Fixes: 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()")
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731
> > Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
> > Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 3 ---
> >  1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index e0bc90597dca..da3e7edcf49d 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -108,9 +108,6 @@ void pci_save_aspm_l1ss_state(struct pci_dev *pdev)
> >       pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL2, cap++);
> >       pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL1, cap++);
> >
> > -     if (parent->state_saved)
> > -             return;
> > -
> >       /*
> >        * Save parent's L1 substate configuration so we have it for
> >        * pci_restore_aspm_l1ss_state(pdev) to restore.
> >
> > base-commit: 72deda0abee6e705ae71a93f69f55e33be5bca5c
> > --
> > 2.39.5
> >
Ilpo Järvinen Feb. 5, 2025, 8:38 a.m. UTC | #3
On Tue, 4 Feb 2025, Bjorn Helgaas wrote:

> [+cc Rafael]
> 
> On Fri, Jan 31, 2025 at 05:29:13PM +0200, Ilpo Järvinen wrote:
> > The commit 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in
> > pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both
> > the Upstream Port and its upstream bridge when handling an Upstream
> > Port, which matches what the L1SS restore side does. However,
> > parent->state_saved can be set true at an earlier time when the
> > upstream bridge saved other parts of its state. 
> 
> So I guess the scenario is that we got here because some driver called
> pci_save_state(pdev):
> 
>   pci_save_state
>     dev->state_saved = true                <--
>     pci_save_pcie_state
>       pci_save_aspm_l1ss_state
>         if (pcie_downstream_port(pdev))
>           return
>         # save pdev L1SS state here
>         if (parent->state_saved)           <--
>           return
>         # save parent L1SS state here
> 
> and the problem is that we previously called pci_save_state(parent),
> which set "parent->state_saved = true" but did not save its L1SS state
> because pci_save_aspm_l1ss_state() is a no-op for Downstream Ports,
> right?

Yes! An unfortunate interaction between those two checks.

> But I would think this would be a very common situation because
> pcie_portdrv_probe() calls pci_save_state() for Downstream Ports when
> pciehp, AER, PME, etc, are enabled, and this would happen before the
> pci_save_state() calls from Endpoint drivers.
> 
> So I'm a little surprised that this didn't blow up for everybody
> immediately.  Is there something that makes this unusual?

I agree it should be very common and was quite surprised that -next
did not catch it. What I recall though is you modified the patch while 
applying it by adding those Downstream Port checks so the fix 
patch's Tested-by was for different code from what got applied (and it 
would have been caught would the original author have tested also the 
modified commit).

Unfortunately, I cannot think of anything that would be so unusual to 
warrant not detecting it earlier. Maybe it was just the holiday period 
causing less testing and lower level of awareness in general? The machine 
doesn't always hang because of the problem as was the case with Niklāvs,
so it might have occurred but went unnoticed if it occurred for a device 
that is not critical.

> > Then later when
> > attempting to save the L1SS config while handling the Upstream Port,
> > parent->state_saved is true in pci_save_aspm_l1ss_state() resulting in
> > early return and skipping saving bridge's L1SS config because it is
> > assumed to be already saved. Later on restore, junk is written into
> > L1SS config which causes issues with some devices.
> > 
> > Remove parent->state_saved check and unconditionally save L1SS config
> > also for the upstream bridge from an Upstream Port which ought to be
> > harmless from correctness point of view. With the Upstream Port check
> > now present, saving the L1SS config more than once for the bridge is no
> > longer a problem (unlike when the parent->state_saved check got
> > introduced into the fix during its development).
> > 
> > Fixes: 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()")
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731
> > Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
> > Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index e0bc90597dca..da3e7edcf49d 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -108,9 +108,6 @@ void pci_save_aspm_l1ss_state(struct pci_dev *pdev)
> >  	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL2, cap++);
> >  	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL1, cap++);
> >  
> > -	if (parent->state_saved)
> > -		return;
> > -
> >  	/*
> >  	 * Save parent's L1 substate configuration so we have it for
> >  	 * pci_restore_aspm_l1ss_state(pdev) to restore.
> > 
> > base-commit: 72deda0abee6e705ae71a93f69f55e33be5bca5c
> > -- 
> > 2.39.5
> > 
>
diff mbox series

Patch

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index e0bc90597dca..da3e7edcf49d 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -108,9 +108,6 @@  void pci_save_aspm_l1ss_state(struct pci_dev *pdev)
 	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL2, cap++);
 	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL1, cap++);
 
-	if (parent->state_saved)
-		return;
-
 	/*
 	 * Save parent's L1 substate configuration so we have it for
 	 * pci_restore_aspm_l1ss_state(pdev) to restore.