diff mbox series

Use maximum latency when determining L1 ASPM

Message ID 20200727213045.2117855-1-ian.kumlien@gmail.com (mailing list archive)
State Superseded, archived
Delegated to: Bjorn Helgaas
Headers show
Series Use maximum latency when determining L1 ASPM | expand

Commit Message

Ian Kumlien July 27, 2020, 9:30 p.m. UTC
Currently we check the maximum latency of upstream and downstream
per link, not the maximum for the path

This would work if all links have the same latency, but:
endpoint -> c -> b -> a -> root  (in the order we walk the path)

If c or b has the higest latency, it will not register

Fix this by maintaining the maximum latency value for the path

This change fixes a regression introduced by:
66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)

Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
---
 drivers/pci/pcie/aspm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Bjorn Helgaas July 29, 2020, 10:27 p.m. UTC | #1
On Mon, Jul 27, 2020 at 11:30:45PM +0200, Ian Kumlien wrote:
> Currently we check the maximum latency of upstream and downstream
> per link, not the maximum for the path
> 
> This would work if all links have the same latency, but:
> endpoint -> c -> b -> a -> root  (in the order we walk the path)
> 
> If c or b has the higest latency, it will not register
> 
> Fix this by maintaining the maximum latency value for the path
> 
> This change fixes a regression introduced by:
> 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)

Hi Ian,

Sorry about the regression, and thank you very much for doing the
hard work of debugging and fixing it!

My guess is that 66ff14e59e8a isn't itself buggy, but it allowed ASPM
to be enabled on a longer path, and we weren't computing the maximum
latency correctly, so ASPM on that longer path exceeded the amount we
could tolerate.  If that's the case, 66ff14e59e8a probably just
exposed an existing problem that could occur in other topologies even
without 66ff14e59e8a.

I'd like to work through this latency code with concrete examples.
Can you collect the "sudo lspci -vv" output and attach it to an entry
at https://bugzilla.kernel.org?  If it's convenient, it would be
really nice to compare it with similar output from before this patch.

Bjorn

> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> ---
>  drivers/pci/pcie/aspm.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index b17e5ffd31b1..bd53fba7f382 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
>  
>  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  {
> -	u32 latency, l1_switch_latency = 0;
> +	u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
>  	struct aspm_latency *acceptable;
>  	struct pcie_link_state *link;
>  
> @@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  		 * substate latencies (and hence do not do any check).
>  		 */
>  		latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> +		l1_max_latency = max_t(u32, latency, l1_max_latency);
>  		if ((link->aspm_capable & ASPM_STATE_L1) &&
> -		    (latency + l1_switch_latency > acceptable->l1))
> +		    (l1_max_latency + l1_switch_latency > acceptable->l1))
>  			link->aspm_capable &= ~ASPM_STATE_L1;
>  		l1_switch_latency += 1000;
>  
> -- 
> 2.27.0
>
Ian Kumlien July 29, 2020, 10:43 p.m. UTC | #2
On Thu, Jul 30, 2020 at 12:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Mon, Jul 27, 2020 at 11:30:45PM +0200, Ian Kumlien wrote:
> > Currently we check the maximum latency of upstream and downstream
> > per link, not the maximum for the path
> >
> > This would work if all links have the same latency, but:
> > endpoint -> c -> b -> a -> root  (in the order we walk the path)
> >
> > If c or b has the higest latency, it will not register
> >
> > Fix this by maintaining the maximum latency value for the path
> >
> > This change fixes a regression introduced by:
> > 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
>
> Hi Ian,
>
> Sorry about the regression, and thank you very much for doing the
> hard work of debugging and fixing it!
>
> My guess is that 66ff14e59e8a isn't itself buggy, but it allowed ASPM
> to be enabled on a longer path, and we weren't computing the maximum
> latency correctly, so ASPM on that longer path exceeded the amount we
> could tolerate.  If that's the case, 66ff14e59e8a probably just
> exposed an existing problem that could occur in other topologies even
> without 66ff14e59e8a.

I agree, this is why I didn't do fixes:. but it does fix a regression
- and it's hard to say
exactly what - I'd like it to go in to stable when accepted...

But we can rewrite it anyway you see fit :)

> I'd like to work through this latency code with concrete examples.
> Can you collect the "sudo lspci -vv" output and attach it to an entry
> at https://bugzilla.kernel.org?  If it's convenient, it would be
> really nice to compare it with similar output from before this patch.

I can cut from the mails i had in the conversation with Alexander Duyck

I submitted it against PCI, you can find it here:
https://bugzilla.kernel.org/show_bug.cgi?id=208741

Still filling in the data

> Bjorn
>
> > Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index b17e5ffd31b1..bd53fba7f382 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> >
> >  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >  {
> > -     u32 latency, l1_switch_latency = 0;
> > +     u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
> >       struct aspm_latency *acceptable;
> >       struct pcie_link_state *link;
> >
> > @@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >                * substate latencies (and hence do not do any check).
> >                */
> >               latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > +             l1_max_latency = max_t(u32, latency, l1_max_latency);
> >               if ((link->aspm_capable & ASPM_STATE_L1) &&
> > -                 (latency + l1_switch_latency > acceptable->l1))
> > +                 (l1_max_latency + l1_switch_latency > acceptable->l1))
> >                       link->aspm_capable &= ~ASPM_STATE_L1;
> >               l1_switch_latency += 1000;
> >
> > --
> > 2.27.0
> >
diff mbox series

Patch

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index b17e5ffd31b1..bd53fba7f382 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,7 @@  static void pcie_get_aspm_reg(struct pci_dev *pdev,
 
 static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 {
-	u32 latency, l1_switch_latency = 0;
+	u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
 	struct aspm_latency *acceptable;
 	struct pcie_link_state *link;
 
@@ -470,8 +470,9 @@  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 		 * substate latencies (and hence do not do any check).
 		 */
 		latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
+		l1_max_latency = max_t(u32, latency, l1_max_latency);
 		if ((link->aspm_capable & ASPM_STATE_L1) &&
-		    (latency + l1_switch_latency > acceptable->l1))
+		    (l1_max_latency + l1_switch_latency > acceptable->l1))
 			link->aspm_capable &= ~ASPM_STATE_L1;
 		l1_switch_latency += 1000;