[v2,0/4] Potential fix for runpm issues on various laptops

Message ID 20190507201245.9295-1-kherbst@redhat.com (mailing list archive)

Karol Herbst May 7, 2019, 8:12 p.m. UTC
CCing linux-pci and Bjorn Helgaas. Maybe we could get better insights on
what a reasonable fix would look like.

Anyway, to me this entire issue looks like something which has to be fixed
at the PCI level instead of inside a driver, so it makes sense to ask the
PCI folks if they have any better suggestions.

Original cover letter:
While investigating the runpm issues on my GP107 I noticed that something
inside devinit makes runpm break. If Nouveau loads up to the point right
before doing devinit, runpm works without any issues; once devinit has run,
it no longer does.

Out of curiosity I even tried to "bisect" devinit by running it not from the
signed PMU image provided by the vbios, but through the devinit parser we
have inside Nouveau.
Although this one isn't as feature complete as the vbios one, I was able
to reproduce the runpm issues as well. From that point I was able to run
only a certain number of commands until I got to some PCIe initialization
code inside devinit which triggers those runpm issues.
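
As a rough illustration of the "bisect" idea above (not code from this
series): if runpm works with zero devinit commands executed and breaks once
the offending command has run, a binary search over "run only the first n
commands" narrows down that command. Everything here is hypothetical,
including the runpm_ok_fn predicate, which stands in for a manual
load/suspend/resume test on real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if runpm still works after executing
 * only the first n devinit commands. ctx is opaque test state. */
typedef int (*runpm_ok_fn)(size_t n, void *ctx);

/*
 * Returns the 1-based index of the first command whose execution breaks
 * runpm, or 0 if runpm still works after all `total` commands.
 * Assumes ok(0) holds and that the result is monotonic: once the
 * offending command has run, runpm stays broken.
 */
static size_t first_breaking_command(size_t total, runpm_ok_fn ok, void *ctx)
{
    size_t lo = 0, hi = total;

    assert(ok != NULL);
    if (ok(total, ctx))
        return 0;

    /* Invariant: ok(lo) is true, ok(hi) is false. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;

        if (ok(mid, ctx))
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}

/* Toy predicate for demonstration: runpm breaks once command *ctx has run. */
static int toy_ok(size_t n, void *ctx)
{
    size_t bad = *(size_t *)ctx;

    return n < bad;
}
```

With the toy predicate and the offending command set to 37 out of 100, the
search returns 37 after about seven probes instead of up to 100.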

Devinit on my GPU was changing the PCIe link speed from 8.0 GT/s to 2.5 GT/s;
reversing that on the fini path makes runpm work again.
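
A minimal sketch of that save-and-restore pattern, under stated
assumptions: the struct, enum and function names below are hypothetical
stand-ins and are not the Nouveau code from the patches. Real hardware
would need register accesses and a link retrain where this sketch only
assigns a field.

```c
#include <assert.h>

/* Hypothetical link speeds in units of 0.1 GT/s; not Nouveau's enum. */
enum link_speed { SPEED_2_5 = 25, SPEED_5_0 = 50, SPEED_8_0 = 80 };

/* Hypothetical stand-in for the device's PCIe state. */
struct fake_pcie {
    enum link_speed cur;   /* speed currently set on the link */
    enum link_speed boot;  /* speed observed before devinit ran */
    int boot_saved;        /* nonzero once `boot` holds a valid value */
};

/* Record the speed the firmware left us with, before anything changes it. */
static void fake_pcie_preinit(struct fake_pcie *p)
{
    p->boot = p->cur;
    p->boot_saved = 1;
}

/* Stand-in for devinit dropping the link to 2.5 GT/s. */
static void fake_devinit(struct fake_pcie *p)
{
    p->cur = SPEED_2_5;
}

/* On fini, put the boot speed back before the device is suspended. */
static void fake_pcie_fini(struct fake_pcie *p)
{
    assert(p->boot_saved);
    if (p->cur != p->boot)
        p->cur = p->boot;
}

/* Run the whole sequence and return the speed the link ends up at. */
static enum link_speed fake_cycle(enum link_speed at_boot)
{
    struct fake_pcie p = { .cur = at_boot };

    fake_pcie_preinit(&p);
    fake_devinit(&p);
    fake_pcie_fini(&p);
    return p.cur;
}
```

The point of saving the value rather than hard-coding 8.0 GT/s is that the
boot speed may differ between machines; the sketch simply returns the link
to whatever was observed first.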

There are a few other things going on, but with my limited knowledge about
PCIe in general, the change in the link speed sounded like it could cause
issues on resume if the controller and the device disagree on the actual
link speed.

Maybe this is just a bug within the PCI subsystem in Linux instead, and
the controller has to be forced to do _something_?

Anyway, with this runpm seems to work nicely on my machine. Secure booting
the gr (even with the workaround applied that I need anyway) might still
fail after the GPU has been runtime resumed, though...

Karol Herbst (4):
  drm: don't set the pci power state if the pci subsystem handles the
    ACPI bits
  pci: enable pcie link changes for pascal
  pci: add nvkm_pcie_get_speed
  pci: save the boot pcie link speed and restore it on fini

 drm/nouveau/include/nvkm/subdev/pci.h |  6 +++--
 drm/nouveau/nouveau_acpi.c            |  7 +++++-
 drm/nouveau/nouveau_acpi.h            |  2 ++
 drm/nouveau/nouveau_drm.c             | 14 +++++++++---
 drm/nouveau/nouveau_drv.h             |  2 ++
 drm/nouveau/nvkm/subdev/pci/base.c    |  9 ++++++--
 drm/nouveau/nvkm/subdev/pci/gk104.c   |  8 +++----
 drm/nouveau/nvkm/subdev/pci/gp100.c   | 10 +++++++++
 drm/nouveau/nvkm/subdev/pci/pcie.c    | 32 +++++++++++++++++++++++----
 drm/nouveau/nvkm/subdev/pci/priv.h    |  7 ++++++
 10 files changed, 81 insertions(+), 16 deletions(-)

Karol Herbst May 20, 2019, 1:23 p.m. UTC | #1
Ping to the PCI folks? I would really like to know what you make of it.

In fact, this kind of looks like a PCIe issue, but I just don't know
enough to really be able to tell. I am mainly wondering why putting
the device into D3cold with a 2.5 GT/s link rather than an 8.0 GT/s link
makes the resume path break. Any ideas? A broken PCIe controller? A broken
implementation on the GPU?
