PCI/ASPM: Call pcie_aspm_sanity_check() as late as possible

Message ID 20221006115950.821736-1-steve@sk2.org (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Series PCI/ASPM: Call pcie_aspm_sanity_check() as late as possible

Commit Message

Stephen Kitt Oct. 6, 2022, 11:59 a.m. UTC
In pcie_aspm_init_link_state(), a number of checks are made to
determine whether the function should proceed, before the result of
the call to pcie_aspm_sanity_check() is actually used. The latter
function doesn't change any state, it only reports a result, so
calling it later doesn't make any difference to the state of the
devices or the information we have about them. But having the call
early reportedly can cause null-pointer dereferences; see
https://unix.stackexchange.com/q/322337 for one example with
pcie_aspm=off (this was reported in 2016, but the relevant code hasn't
changed since then).

This moves the call to pcie_aspm_sanity_check() just before the result
is actually used, giving all the other checks a chance to run first.

Signed-off-by: Stephen Kitt <steve@sk2.org>
---
 drivers/pci/pcie/aspm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


base-commit: 833477fce7a14d43ae4c07f8ddc32fa5119471a2

Comments

Bjorn Helgaas Dec. 7, 2022, 9:56 p.m. UTC | #1
[+cc Jan]

On Thu, Oct 06, 2022 at 01:59:50PM +0200, Stephen Kitt wrote:
> In pcie_aspm_init_link_state(), a number of checks are made to
> determine whether the function should proceed, before the result of
> the call to pcie_aspm_sanity_check() is actually used. The latter
> function doesn't change any state, it only reports a result, so
> calling it later doesn't make any difference to the state of the
> devices or the information we have about them. But having the call
> early reportedly can cause null-pointer dereferences; see
> https://unix.stackexchange.com/q/322337 for one example with
> pcie_aspm=off (this was reported in 2016, but the relevant code hasn't
> changed since then).

Thanks, Stephen!

That stackexchange report doesn't have much information, but it looks
similar to this old report from Jan Rueth, which I'm sorry to say I
never got resolved:

  https://bugzilla.kernel.org/show_bug.cgi?id=187731
  https://lore.kernel.org/all/4cec62c2-218a-672b-8c12-d44e8df56aae@comsys.rwth-aachen.de/#t

And Jan's patch is almost identical to yours :)

I hope to get this resolved, but I don't have time to work on it
before the upcoming merge window, which will probably open Sunday.
And then it's holiday time, so it may be January before I get back to
it.  I'm just dropping the links here as breadcrumbs for picking this
back up.

Bjorn

> This moves the call to pcie_aspm_sanity_check() just before the result
> is actually used, giving all the other checks a chance to run first.
> 
> Signed-off-by: Stephen Kitt <steve@sk2.org>
> ---
>  drivers/pci/pcie/aspm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index a8aec190986c..38df439568b7 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -889,7 +889,7 @@ static void pcie_aspm_update_sysfs_visibility(struct pci_dev *pdev)
>  void pcie_aspm_init_link_state(struct pci_dev *pdev)
>  {
>  	struct pcie_link_state *link;
> -	int blacklist = !!pcie_aspm_sanity_check(pdev);
> +	int blacklist;
>  
>  	if (!aspm_support_enabled)
>  		return;
> @@ -923,6 +923,7 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev)
>  	 * upstream links also because capable state of them can be
>  	 * update through pcie_aspm_cap_init().
>  	 */
> +	blacklist = !!pcie_aspm_sanity_check(pdev);
>  	pcie_aspm_cap_init(link, blacklist);
>  
>  	/* Setup initial Clock PM state */
> 
> base-commit: 833477fce7a14d43ae4c07f8ddc32fa5119471a2
> -- 
> 2.30.2
>

Stephen Kitt Dec. 8, 2022, 8 a.m. UTC | #2
Hi Bjorn,

On Wed, 7 Dec 2022 15:56:08 -0600, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Thu, Oct 06, 2022 at 01:59:50PM +0200, Stephen Kitt wrote:
> > In pcie_aspm_init_link_state(), a number of checks are made to
> > determine whether the function should proceed, before the result of
> > the call to pcie_aspm_sanity_check() is actually used. The latter
> > function doesn't change any state, it only reports a result, so
> > calling it later doesn't make any difference to the state of the
> > devices or the information we have about them. But having the call
> > early reportedly can cause null-pointer dereferences; see
> > https://unix.stackexchange.com/q/322337 for one example with
> > pcie_aspm=off (this was reported in 2016, but the relevant code hasn't
> > changed since then).  
> 
> Thanks, Stephen!
> 
> That stackexchange report doesn't have much information, but it looks
> similar to this old report from Jan Rueth, which I'm sorry to say I
> never got resolved:
> 
>   https://bugzilla.kernel.org/show_bug.cgi?id=187731
>   https://lore.kernel.org/all/4cec62c2-218a-672b-8c12-d44e8df56aae@comsys.rwth-aachen.de/#t
> 
> And Jan's patch is almost identical to yours :)
> 
> I hope to get this resolved, but I don't have time to work on it
> before the upcoming merge window, which will probably open Sunday.
> And then it's holiday time, so it may be January before I get back to
> it.  I'm just dropping the links here as breadcrumbs for picking this
> back up.

Thanks for the update! I was somewhat bemused by the dereference here, I’m
reassured to see I’m not the only one. Unfortunately I don’t have hardware
which exhibits this problem, I submitted the patch because it seemed
reasonably sensible even though as you say there is probably something else
going on here. Of course if this approach is useful, Jan’s patch should go in
rather than mine.

Anyway, it’s been six years, so a few more weeks won’t make any difference
;-).

Enjoy the holiday season!

Regards,

Stephen

Bjorn Helgaas Dec. 8, 2022, 4:55 p.m. UTC | #3
On Thu, Dec 08, 2022 at 09:00:17AM +0100, Stephen Kitt wrote:
> Hi Bjorn,
> 
> On Wed, 7 Dec 2022 15:56:08 -0600, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, Oct 06, 2022 at 01:59:50PM +0200, Stephen Kitt wrote:
> > > In pcie_aspm_init_link_state(), a number of checks are made to
> > > determine whether the function should proceed, before the result of
> > > the call to pcie_aspm_sanity_check() is actually used. The latter
> > > function doesn't change any state, it only reports a result, so
> > > calling it later doesn't make any difference to the state of the
> > > devices or the information we have about them. But having the call
> > > early reportedly can cause null-pointer dereferences; see
> > > https://unix.stackexchange.com/q/322337 for one example with
> > > pcie_aspm=off (this was reported in 2016, but the relevant code hasn't
> > > changed since then).  
> > 
> > Thanks, Stephen!
> > 
> > That stackexchange report doesn't have much information, but it looks
> > similar to this old report from Jan Rueth, which I'm sorry to say I
> > never got resolved:
> > 
> >   https://bugzilla.kernel.org/show_bug.cgi?id=187731
> >   https://lore.kernel.org/all/4cec62c2-218a-672b-8c12-d44e8df56aae@comsys.rwth-aachen.de/#t
> > 
> > And Jan's patch is almost identical to yours :)
> > 
> > I hope to get this resolved, but I don't have time to work on it
> > before the upcoming merge window, which will probably open Sunday.
> > And then it's holiday time, so it may be January before I get back to
> > it.  I'm just dropping the links here as breadcrumbs for picking this
> > back up.
> 
> Thanks for the update! I was somewhat bemused by the dereference here, I’m
> reassured to see I’m not the only one. Unfortunately I don’t have hardware
> which exhibits this problem,

Yeah, that's a weird thing about this.  This shouldn't be a
platform-specific thing, but both stackexchange and Jan's patch
mention IBM x3850.  

Maybe both came from a single source, or maybe there's something
deeper going on.

> I submitted the patch because it seemed
> reasonably sensible even though as you say there is probably something else
> going on here. Of course if this approach is useful, Jan’s patch should go in
> rather than mine.
> 
> Anyway, it’s been six years, so a few more weeks won’t make any difference
> ;-).
> 
> Enjoy the holiday season!
> 
> Regards,
> 
> Stephen
Patch

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index a8aec190986c..38df439568b7 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -889,7 +889,7 @@ static void pcie_aspm_update_sysfs_visibility(struct pci_dev *pdev)
 void pcie_aspm_init_link_state(struct pci_dev *pdev)
 {
 	struct pcie_link_state *link;
-	int blacklist = !!pcie_aspm_sanity_check(pdev);
+	int blacklist;
 
 	if (!aspm_support_enabled)
 		return;
@@ -923,6 +923,7 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev)
 	 * upstream links also because capable state of them can be
 	 * update through pcie_aspm_cap_init().
 	 */
+	blacklist = !!pcie_aspm_sanity_check(pdev);
 	pcie_aspm_cap_init(link, blacklist);
 
 	/* Setup initial Clock PM state */
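
The reordering in this patch can be modeled in user space. The sketch below is not kernel code: struct fake_dev, sanity_check(), init_early(), init_late(), SKIPPED and the sanity_calls counter are all hypothetical stand-ins, and the NULL pointer is only an assumed example of state that is unsafe to inspect. The thread does not establish the actual cause of the reported dereference. The sketch shows only that a read-only check placed in an initializer runs even when a guard would have returned early, while the deferred form never reaches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel type. */
struct fake_dev {
	const int *caps;	/* assumed: may be NULL on a half-built device */
	bool has_children;	/* models "there is something on the link" */
};

#define SKIPPED (-1)		/* models the early "return;" paths */

static bool support_enabled = true;	/* models aspm_support_enabled */
static int sanity_calls;		/* instrumentation for the demo only */

/* Read-only check: dereferences dev->caps, returns nonzero on failure. */
static int sanity_check(const struct fake_dev *dev)
{
	sanity_calls++;
	return (*dev->caps < 0) ? -22 : 0;	/* errno-style negative value */
}

/* Shape before the patch: the check runs before any guard. */
static int init_early(const struct fake_dev *dev)
{
	int blacklist = !!sanity_check(dev);

	if (!support_enabled)
		return SKIPPED;
	if (!dev->has_children)
		return SKIPPED;
	return blacklist;
}

/* Shape after the patch: the check runs only once every guard has passed. */
static int init_late(const struct fake_dev *dev)
{
	int blacklist;

	if (!support_enabled)
		return SKIPPED;
	if (!dev->has_children)
		return SKIPPED;
	blacklist = !!sanity_check(dev);	/* !! folds any nonzero to 1 */
	return blacklist;
}

static void run_demo(void)
{
	const int bad_caps = -1, good_caps = 5;
	struct fake_dev half_built = { .caps = NULL, .has_children = true };
	struct fake_dev rejected = { .caps = &bad_caps, .has_children = true };
	struct fake_dev accepted = { .caps = &good_caps, .has_children = true };

	/* With support disabled, the late form never touches dev->caps. */
	support_enabled = false;
	sanity_calls = 0;
	assert(init_late(&half_built) == SKIPPED);
	assert(sanity_calls == 0);

	/* The early form still runs the check although its result is unused. */
	assert(init_early(&accepted) == SKIPPED);
	assert(sanity_calls == 1);

	/* When the guards pass, the late form yields the normalized result. */
	support_enabled = true;
	assert(init_late(&rejected) == 1);
	assert(init_late(&accepted) == 0);
}
```

Calling run_demo() from a main() exercises both shapes. Passing half_built to init_early() would dereference NULL in this model, so the demo deliberately avoids that call.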