diff mbox

[1/9] drm/nouveau: Don't leak runtime pm ref on driver unload

Message ID dd120a30cb769c93af8973cae41f61831d17e04b.1464103767.git.lukas@wunner.de (mailing list archive)
State New, archived
Headers show

Commit Message

Lukas Wunner May 24, 2016, 4:03 p.m. UTC
nouveau_drm_load() calls pm_runtime_put() if nouveau_runtime_pm != 0,
but nouveau_drm_unload() calls pm_runtime_get_sync() unconditionally.
We therefore leak a runtime pm ref whenever nouveau is loaded with
runpm=0 and then unloaded. The GPU will subsequently never runtime
suspend even if nouveau is loaded again with runpm=1.

Fix by taking the runtime pm ref under the same condition that it was
released on driver load.

Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
Cc: Dave Airlie <airlied@redhat.com>
Reported-by: Karol Herbst <nouveau@karolherbst.de>
Tested-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Peter Wu May 27, 2016, 1:07 a.m. UTC | #1
On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> nouveau_drm_load() calls pm_runtime_put() if nouveau_runtime_pm != 0,
> but nouveau_drm_unload() calls pm_runtime_get_sync() unconditionally.
> We therefore leak a runtime pm ref whenever nouveau is loaded with
> runpm=0 and then unloaded. The GPU will subsequently never runtime
> suspend even if nouveau is loaded again with runpm=1.
> 
> Fix by taking the runtime pm ref under the same condition that it was
> released on driver load.
> 
> Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> Cc: Dave Airlie <airlied@redhat.com>
> Reported-by: Karol Herbst <nouveau@karolherbst.de>
> Tested-by: Karol Herbst <nouveau@karolherbst.de>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Looks good, I tested this scenario:

    ru(){ cat /sys/bus/pci/devices/0000\:01:00.0/power/runtime_usage;}
    ru # reports 1
    modprobe nouveau runpm=0
    ru # reports 2
    rmmod nouveau
    ru # reports 1

Without runpm=0 the count drops to 0 in the second step and stays 0 in
the third step. After applying patch 2/9, this correctly reports 1 as
expected (this is the same as manually setting power/control to on).

Peter

> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 11f8dd9..faf7438 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -498,7 +498,10 @@ nouveau_drm_unload(struct drm_device *dev)
>  {
>  	struct nouveau_drm *drm = nouveau_drm(dev);
>  
> -	pm_runtime_get_sync(dev->dev);
> +	if (nouveau_runtime_pm != 0) {
> +		pm_runtime_get_sync(dev->dev);
> +	}
> +
>  	nouveau_fbcon_fini(dev);
>  	nouveau_accel_fini(drm);
>  	nouveau_hwmon_fini(dev);
> -- 
> 2.8.1
> 
> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
Lukas Wunner May 29, 2016, 3:50 p.m. UTC | #2
Hi Peter,

On Fri, May 27, 2016 at 03:07:33AM +0200, Peter Wu wrote:
> On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> > nouveau_drm_load() calls pm_runtime_put() if nouveau_runtime_pm != 0,
> > but nouveau_drm_unload() calls pm_runtime_get_sync() unconditionally.
> > We therefore leak a runtime pm ref whenever nouveau is loaded with
> > runpm=0 and then unloaded. The GPU will subsequently never runtime
> > suspend even if nouveau is loaded again with runpm=1.
> > 
> > Fix by taking the runtime pm ref under the same condition that it was
> > released on driver load.
> > 
> > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> > Cc: Dave Airlie <airlied@redhat.com>
> > Reported-by: Karol Herbst <nouveau@karolherbst.de>
> > Tested-by: Karol Herbst <nouveau@karolherbst.de>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> 
> Looks good, I tested this scenario:
> 
>     ru(){ cat /sys/bus/pci/devices/0000\:01:00.0/power/runtime_usage;}
>     ru # reports 1
>     modprobe nouveau runpm=0
>     ru # reports 2
>     rmmod nouveau
>     ru # reports 1
> 
> Without runpm=0 the count drops to 0 in the second step and stays 0 in
> the third step. After applying patch 2/9, this correctly reports 1 as
> expected (this is the same as manually setting power/control to on).

How exactly did you reach the situation where the root port didn't wake
up when you tried to load nouveau again? (IRC conversation this week.)

What's happening is, the PCI core will keep unbound devices (i.e.,
without driver) in D0 but the runtime status is allowed to change
to "suspended". So it'll appear to the kernel as if it was suspended
but in reality it stays in D0.

Once runtime pm for PCIe ports gets merged, the root port above the
GPU will indeed go to D3 in such a situation because the check
pm_children_suspended() (called from rpm_check_suspend_allowed())
returns true.

I'm not sure if this is desirable or not. If we keep unbound devices
in D0, should we allow ports above them to go to D3?

In any case, when nouveau is loaded again, local_pci_probe() will
call pm_runtime_get_sync(), which will implicitly set the runtime
status to "active" and which should also wake parents. So how did
you ever reach a point where you loaded nouveau and the root port
stayed asleep? Clearly we have a bug there, question is where.
This shouldn't work only if pm_runtime_forbid() was called on
driver unload.

Thanks for the extensive testing!

Lukas

> 
> Peter
> 
> > ---
> >  drivers/gpu/drm/nouveau/nouveau_drm.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > index 11f8dd9..faf7438 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > @@ -498,7 +498,10 @@ nouveau_drm_unload(struct drm_device *dev)
> >  {
> >  	struct nouveau_drm *drm = nouveau_drm(dev);
> >  
> > -	pm_runtime_get_sync(dev->dev);
> > +	if (nouveau_runtime_pm != 0) {
> > +		pm_runtime_get_sync(dev->dev);
> > +	}
> > +
> >  	nouveau_fbcon_fini(dev);
> >  	nouveau_accel_fini(drm);
> >  	nouveau_hwmon_fini(dev);
> > -- 
> > 2.8.1
> > 
> > _______________________________________________
> > Nouveau mailing list
> > Nouveau@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/nouveau
> 
> -- 
> Kind regards,
> Peter Wu
> https://lekensteyn.nl
Peter Wu May 30, 2016, 5:03 p.m. UTC | #3
On Sun, May 29, 2016 at 05:50:06PM +0200, Lukas Wunner wrote:
> Hi Peter,
> 
> On Fri, May 27, 2016 at 03:07:33AM +0200, Peter Wu wrote:
> > On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> > > nouveau_drm_load() calls pm_runtime_put() if nouveau_runtime_pm != 0,
> > > but nouveau_drm_unload() calls pm_runtime_get_sync() unconditionally.
> > > We therefore leak a runtime pm ref whenever nouveau is loaded with
> > > runpm=0 and then unloaded. The GPU will subsequently never runtime
> > > suspend even if nouveau is loaded again with runpm=1.
> > > 
> > > Fix by taking the runtime pm ref under the same condition that it was
> > > released on driver load.
> > > 
> > > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> > > Cc: Dave Airlie <airlied@redhat.com>
> > > Reported-by: Karol Herbst <nouveau@karolherbst.de>
> > > Tested-by: Karol Herbst <nouveau@karolherbst.de>
> > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > 
> > Looks good, I tested this scenario:
> > 
> >     ru(){ cat /sys/bus/pci/devices/0000\:01:00.0/power/runtime_usage;}
> >     ru # reports 1
> >     modprobe nouveau runpm=0
> >     ru # reports 2
> >     rmmod nouveau
> >     ru # reports 1
> > 
> > Without runpm=0 the count drops to 0 in the second step and stays 0 in
> > the third step. After applying patch 2/9, this correctly reports 1 as
> > expected (this is the same as manually setting power/control to on).
> 
> How exactly did you reach the situation where the root port didn't wake
> up when you tried to load nouveau again? (IRC conversation this week.)

Ensure that the pci/pm patches are applied, then:

 0. Unload nouveau (I have blacklisted it for testing).
 1. Enable rpm for the root port and children (control = auto).
 2. Verify in the kernel logs that the devices are sleeping:
        pcieport 0000:00:01.0: power state changed by ACPI to D3cold
 3. (Optional, to rule out issues with delays:) Disable rpm for the
    Nvidia device (control = on).
 4. modprobe nouveau.

The above test with v4.6 + 4 pci/pm patches (8b71f565) gives:

    50.245795 MXM: GUID detected in BIOS
    50.245948    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
    50.246016 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246044    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
    50.246110    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246256 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246289    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246443 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246457    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246932 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
    50.247005 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
    50.247084    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PG00._ON] at AML address ffffc9000001086e length 11D
    50.390140 pcieport 0000:00:01.0: power state changed by ACPI to D0
    50.491893    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D
    50.492285 pcieport 0000:00:01.0: PME# disabled
    50.492583 nouveau 0000:01:00.0: unknown chipset (ffffffff)
    50.492687 nouveau: probe of 0000:01:00.0 failed with error -12
    50.501990    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0._S0W] at AML address ffffc90000010a8e length 2
    50.502403 pcieport 0000:00:01.0: PME# enabled
    50.502601    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D
    50.513005    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PG00._OFF] at AML address ffffc90000010994 length 6D
    50.533258 pcieport 0000:00:01.0: power state changed by ACPI to D3cold

(Note that this patch is not included.) When nouveau is operating
normally, I see that _PS0 is also called (which does not happen above).

If you think that mixing power resources with DSM causes this issue, I
also tried to apply my power resources work for nouveau but it gives the
same problem:

    20.183306 MXM: GUID detected in BIOS
    20.183606 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    20.184158 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    20.184547 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    20.185152 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
    20.185351 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
    20.185384 nouveau: detected PR support, will not use DSM
    20.185552 nouveau 0000:01:00.0: enabling device (0000 -> 0003)
    20.185873 nouveau 0000:01:00.0: unknown chipset (ffffffff)
    20.185946 nouveau: probe of 0000:01:00.0 failed with error -12

> What's happening is, the PCI core will keep unbound devices (i.e.,
> without driver) in D0 but the runtime status is allowed to change
> to "suspended". So it'll appear to the kernel as if it was suspended
> but in reality it stays in D0.
> 
> Once runtime pm for PCIe ports gets merged, the root port above the
> GPU will indeed go to D3 in such a situation because the check
> pm_children_suspended() (called from rpm_check_suspend_allowed())
> returns true.
> 
> I'm not sure if this is desirable or not. If we keep unbound devices
> in D0, should we allow ports above them to go to D3?

Maybe Rafael (linux-pm / linux-pci) can answer this question better?
The comments in local_pci_probe, pci_pm_runtime_suspend and
pci_pm_runtime_resume suggest that unbound devices are assumed in D0
which is apparently not the case when runtime PM is enabled.

> In any case, when nouveau is loaded again, local_pci_probe() will
> call pm_runtime_get_sync(), which will implicitly set the runtime
> status to "active" and which should also wake parents. So how did
> you ever reach a point where you loaded nouveau and the root port
> stayed asleep? Clearly we have a bug there, question is where.
> This shouldn't work only if pm_runtime_forbid() was called on
> driver unload.
> 
> Thanks for the extensive testing!
> Lukas

Both devices (root port and Nvidia) were resumed, but somehow the Nvidia
card was not fully initialized/ready (as you can see in the above logs).

Peter

> > 
> > Peter
> > 
> > > ---
> > >  drivers/gpu/drm/nouveau/nouveau_drm.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > index 11f8dd9..faf7438 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > @@ -498,7 +498,10 @@ nouveau_drm_unload(struct drm_device *dev)
> > >  {
> > >  	struct nouveau_drm *drm = nouveau_drm(dev);
> > >  
> > > -	pm_runtime_get_sync(dev->dev);
> > > +	if (nouveau_runtime_pm != 0) {
> > > +		pm_runtime_get_sync(dev->dev);
> > > +	}
> > > +
> > >  	nouveau_fbcon_fini(dev);
> > >  	nouveau_accel_fini(drm);
> > >  	nouveau_hwmon_fini(dev);
> > > -- 
> > > 2.8.1
> > > 
> > > _______________________________________________
> > > Nouveau mailing list
> > > Nouveau@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/nouveau
Lukas Wunner May 31, 2016, 11:34 a.m. UTC | #4
On Mon, May 30, 2016 at 07:03:46PM +0200, Peter Wu wrote:
> On Sun, May 29, 2016 at 05:50:06PM +0200, Lukas Wunner wrote:
> > How exactly did you reach the situation where the root port didn't wake
> > up when you tried to load nouveau again? (IRC conversation this week.)
> 
> Ensure that the pci/pm patches are applied, then:
> 
>  0. Unload nouveau (I have blacklisted it for testing).
>  1. Enable rpm for the root port and children (control = auto).
>  2. Verify in the kernel logs that the devices are sleeping:
>         pcieport 0000:00:01.0: power state changed by ACPI to D3cold
>  3. (Optional, to rule out issues with delays:) Disable rpm for the
>     Nvidia device (control = on).
>  4. modprobe nouveau.
> 
> The above test with v4.6 + 4 pci/pm patches (8b71f565) gives:
> 
>     50.245795 MXM: GUID detected in BIOS
>     50.245948    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
>     50.246016 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
>     50.246044    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
>     50.246110    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
>     50.246256 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
>     50.246289    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
>     50.246443 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
>     50.246457    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
>     50.246932 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
>     50.247005 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
>     50.247084    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PG00._ON] at AML address ffffc9000001086e length 11D
>     50.390140 pcieport 0000:00:01.0: power state changed by ACPI to D0
>     50.491893    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D
>     50.492285 pcieport 0000:00:01.0: PME# disabled
>     50.492583 nouveau 0000:01:00.0: unknown chipset (ffffffff)
>     50.492687 nouveau: probe of 0000:01:00.0 failed with error -12

I've tested this on a MacBook Pro, which does not have ACPI _PR3
methods for the root port to which the discrete GPU is attached.
The port can thus only suspend to D3hot, not D3cold.

Even without patch [2/9], when unloading nouveau and letting the
root port go to D3hot, the port is subsequently correctly resumed
to D0 when reloading nouveau.

So the issue that you're seeing without patch [2/9] seems to be
specific to Optimus/_PR3 machines. If possible you should try to
get it working without patch [2/9] because that patch is really
optional (as I've written in the commit message). I'm totally
unfamiliar with Optimus but maybe lspci could help to debug this?

Lukas
Peter Wu May 31, 2016, 11:41 a.m. UTC | #5
On Tue, May 31, 2016 at 01:34:43PM +0200, Lukas Wunner wrote:
> On Mon, May 30, 2016 at 07:03:46PM +0200, Peter Wu wrote:
> > On Sun, May 29, 2016 at 05:50:06PM +0200, Lukas Wunner wrote:
> > > How exactly did you reach the situation where the root port didn't wake
> > > up when you tried to load nouveau again? (IRC conversation this week.)
> > 
> > Ensure that the pci/pm patches are applied, then:
> > 
> >  0. Unload nouveau (I have blacklisted it for testing).
> >  1. Enable rpm for the root port and children (control = auto).
> >  2. Verify in the kernel logs that the devices are sleeping:
> >         pcieport 0000:00:01.0: power state changed by ACPI to D3cold
> >  3. (Optional, to rule out issues with delays:) Disable rpm for the
> >     Nvidia device (control = on).
> >  4. modprobe nouveau.
> > 
> > The above test with v4.6 + 4 pci/pm patches (8b71f565) gives:
> > 
> >     50.245795 MXM: GUID detected in BIOS
> >     50.245948    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
> >     50.246016 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
> >     50.246044    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
> >     50.246110    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
> >     50.246256 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
> >     50.246289    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
> >     50.246443 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
> >     50.246457    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
> >     50.246932 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
> >     50.247005 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
> >     50.247084    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0.PG00._ON] at AML address ffffc9000001086e length 11D
> >     50.390140 pcieport 0000:00:01.0: power state changed by ACPI to D0
> >     50.491893    nseval-0227 ns_evaluate           : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D
> >     50.492285 pcieport 0000:00:01.0: PME# disabled
> >     50.492583 nouveau 0000:01:00.0: unknown chipset (ffffffff)
> >     50.492687 nouveau: probe of 0000:01:00.0 failed with error -12
> 
> I've tested this on a MacBook Pro, which does not have ACPI _PR3
> methods for the root port to which the discrete GPU is attached.
> The port can thus only suspend to D3hot, not D3cold.
> 
> Even without patch [2/9], when unloading nouveau and letting the
> root port go to D3hot, the port is subsequently correctly resumed
> to D0 when reloading nouveau.
> 
> So the issue that you're seeing without patch [2/9] seems to be
> specific to Optimus/_PR3 machines. If possible you should try to
> get it working without patch [2/9] because that patch is really
> optional (as I've written in the commit message). I'm totally
> unfamiliar with Optimus but maybe lspci could help to debug this?

Without 2/9 I can prevent the issue by writing "on" to
/sys/bus/pci/devices/0000:00:01.0/power/control (the PCIe port), but
that effectively gives the same result as applying 2/9.

The problem occurs when the power is lost (by putting the PCIe port in
D3cold). Maybe it is a bug in the PCI core that does not re-initialize
devices under the port, but since a workaround is available (2/9), I
will focus on other issues first. Maybe it is worth to mention this
issue in the commit message for 2/9 though.
diff mbox

Patch

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 11f8dd9..faf7438 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -498,7 +498,10 @@  nouveau_drm_unload(struct drm_device *dev)
 {
 	struct nouveau_drm *drm = nouveau_drm(dev);
 
-	pm_runtime_get_sync(dev->dev);
+	if (nouveau_runtime_pm != 0) {
+		pm_runtime_get_sync(dev->dev);
+	}
+
 	nouveau_fbcon_fini(dev);
 	nouveau_accel_fini(drm);
 	nouveau_hwmon_fini(dev);