[v2,0/6] drm/i915/pm: Clean up the hibernate vs. PCI D3 quirk

Message ID: 20250311195624.22420-1-ville.syrjala@linux.intel.com

Message

Ville Syrjälä March 11, 2025, 7:56 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

Attempt to make i915 rely more on the standard pci pm
code instead of hand rolling a bunch of
pci_save_state()+pci_set_power_state() stuff in the
driver.

v2: Drop the core pci changes for now since I couldn't
    get any real answers to them
    Drop some redundant pci_*() calls from the pm paths

Ville Syrjälä (6):
  drm/i915/pm: Simplify pm hook documentation
  drm/i915/pm: Hoist pci_save_state()+pci_set_power_state() to the end
    of pm _late() hook
  drm/i915/pm: Move the hibernate+D3 quirk stuff into noirq() pm hooks
  drm/i915/pm: Do pci_restore_state() in switcheroo resume hook
  drm/i915/pm: Allow drivers/pci to manage our pci state normally
  drm/i915/pm: Drop redundant pci stuff from suspend/resume paths

 drivers/gpu/drm/i915/i915_driver.c | 133 +++++++++++++++--------------
 1 file changed, 69 insertions(+), 64 deletions(-)

Comments

Ville Syrjälä March 12, 2025, 9:52 a.m. UTC | #1
On Tue, Mar 11, 2025 at 11:15:53PM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915/pm: Clean up the hibernate vs. PCI D3 quirk (rev2)
> URL   : https://patchwork.freedesktop.org/series/139097/
> State : failure
> 
> == Summary ==
> 
> #### Possible regressions ####
>   * igt@kms_addfb_basic@too-high:
>     - fi-kbl-8809g:       NOTRUN -> [FAIL][6] +3 other tests fail
>    [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_139097v2/fi-kbl-8809g/igt@kms_addfb_basic@too-high.html

A bunch of stuff seems to have broken in CI:
- something is now loading amdgpu when we didn't want it loaded
- the full dmesg has been lost so I can't even find out when amdgpu
  got loaded
Saarinen, Jani March 12, 2025, 10:05 a.m. UTC | #2
Hi, 
> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Ville
> Syrjälä
> Sent: Wednesday, 12 March 2025 11.53
> To: intel-gfx@lists.freedesktop.org
> Cc: I915-ci-infra@lists.freedesktop.org
> Subject: Re: ✗ i915.CI.BAT: failure for drm/i915/pm: Clean up the hibernate
> vs. PCI D3 quirk (rev2)
> 
> On Tue, Mar 11, 2025 at 11:15:53PM -0000, Patchwork wrote:
> > == Series Details ==
> >
> > Series: drm/i915/pm: Clean up the hibernate vs. PCI D3 quirk (rev2)
> > URL   : https://patchwork.freedesktop.org/series/139097/
> > State : failure
> >
> > == Summary ==
> >
> > #### Possible regressions ####
> >   * igt@kms_addfb_basic@too-high:
> >     - fi-kbl-8809g:       NOTRUN -> [FAIL][6] +3 other tests fail
> >    [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_139097v2/fi-kbl-8809g/igt@kms_addfb_basic@too-high.html
> 
> A bunch of stuff seems to have broken in CI:
> - something is now loading amdgpu when we didn't want it loaded
On boot I see 
<6>[    0.000000] Command line: BOOT_IMAGE=/boot/drm_intel root=/dev/nvme0n1p2 rootwait fsck.repair=yes nmi_watchdog=panic,auto panic=5 softdog.soft_panic=5 log_buf_len=1M trace_clock=global xe.force_probe=* i915.force_probe=* drm.debug=0xe modprobe.blacklist=xe,i915,ast modprobe.blacklist=amdgpu ro

Is that not enough? 
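
As a side check on that command line, here is a minimal sketch that collects the module names requested via `modprobe.blacklist=`. It assumes that repeated `modprobe.blacklist=` parameters are cumulative, which this thread does not confirm, and the helper name `blacklisted_modules` is hypothetical. It only reports what was requested at boot, not which modules actually ended up loaded.

```python
def blacklisted_modules(cmdline: str) -> set[str]:
    """Collect module names from every modprobe.blacklist= parameter.

    Assumption: repeated parameters add to the list rather than
    replacing earlier ones.
    """
    modules: set[str] = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "modprobe.blacklist":
            # The value is a comma-separated list of module names.
            modules.update(name for name in value.split(",") if name)
    return modules


# Command line copied from the boot log quoted above.
CMDLINE = (
    "BOOT_IMAGE=/boot/drm_intel root=/dev/nvme0n1p2 rootwait "
    "fsck.repair=yes nmi_watchdog=panic,auto panic=5 "
    "softdog.soft_panic=5 log_buf_len=1M trace_clock=global "
    "xe.force_probe=* i915.force_probe=* drm.debug=0xe "
    "modprobe.blacklist=xe,i915,ast modprobe.blacklist=amdgpu ro"
)

if __name__ == "__main__":
    print(sorted(blacklisted_modules(CMDLINE)))
    # prints ['amdgpu', 'ast', 'i915', 'xe']
```

Under that assumption amdgpu is on the list and snd_hda_intel is not.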

> - the full dmesg has been lost so I can't even find out when amdgpu  got loaded
CI team, can you get all the logs transferred?
Digging internally, I see the following in dmesg (start from that file):

<7>[   39.365629] [IGT] i915_module_load: executing
<7>[   39.373992] [IGT] i915_module_load: starting subtest load
<7>[   39.376091] [IGT] i915_module_load: finished subtest load, SKIP
<7>[   39.376197] [IGT] i915_module_load: exiting, ret=77
<7>[   39.551743] [IGT] core_auth: executing
<6>[   42.196892] [drm] amdgpu kernel modesetting enabled.
<7>[   42.197065] [drm:amdgpu_acpi_detect [amdgpu]] No matching acpi device found for AMD3000
<6>[   42.198069] amdgpu: Virtual CRAT table created for CPU
<6>[   42.198933] amdgpu: Topology: Add CPU node
<6>[   42.200595] amdgpu 0000:01:00.0: enabling device (0006 -> 0007)
<6>[   42.201352] [drm] initializing kernel modesetting (VEGAM 0x1002:0x694C 0x8086:0x2073 0xC0).
<6>[   42.201418] [drm] register mmio base: 0xDB500000
<6>[   42.201420] [drm] register mmio size: 262144
<6>[   42.202307] amdgpu 0000:01:00.0: amdgpu: detected ip block number 0 <vi_common>
<6>[   42.202311] amdgpu 0000:01:00.0: amdgpu: detected ip block number 1 <gmc_v8_0>
<6>[   42.202314] amdgpu 0000:01:00.0: amdgpu: detected ip block number 2 <tonga_ih>
<6>[   42.202316] amdgpu 0000:01:00.0: amdgpu: detected ip block number 3 <gfx_v8_0>
<6>[   42.202318] amdgpu 0000:01:00.0: amdgpu: detected ip block number 4 <sdma_v3_0>
<6>[   42.202321] amdgpu 0000:01:00.0: amdgpu: detected ip block number 5 <powerplay>
<6>[   42.202323] amdgpu 0000:01:00.0: amdgpu: detected ip block number 6 <dm>
<6>[   42.202325] amdgpu 0000:01:00.0: amdgpu: detected ip block number 7 <uvd_v6_0>
<6>[   42.202327] amdgpu 0000:01:00.0: amdgpu: detected ip block number 8 <vce_v3_0>
<6>[   42.202427] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from VFCT
<6>[   42.202449] amdgpu: ATOM BIOS: 408435.180301.04s
<6>[   42.228348] [drm] UVD is enabled in VM mode
<6>[   42.228353] [drm] UVD ENC is enabled in VM mode
<6>[   42.228356] [drm] VCE enabled in VM mode
<6>[   42.228734] amdgpu 0000:01:00.0: vgaarb: deactivate vga console

> 
> --
> Ville Syrjälä
> Intel
Saarinen, Jani March 12, 2025, 10:07 a.m. UTC | #3
Hi, 


> -----Original Message-----
> From: Saarinen, Jani
> Sent: Wednesday, 12 March 2025 12.06
> To: Ville Syrjälä <ville.syrjala@linux.intel.com>; intel-gfx@lists.freedesktop.org;
> I915-ci-infra@lists.freedesktop.org
> Subject: RE: ✗ i915.CI.BAT: failure for drm/i915/pm: Clean up the hibernate
> vs. PCI D3 quirk (rev2)
> 
> Hi,
> > -----Original Message-----
> > From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of
> > Ville Syrjälä
> > Sent: Wednesday, 12 March 2025 11.53
> > To: intel-gfx@lists.freedesktop.org
> > Cc: I915-ci-infra@lists.freedesktop.org
> > Subject: Re: ✗ i915.CI.BAT: failure for drm/i915/pm: Clean up the
> > hibernate vs. PCI D3 quirk (rev2)
> >
> > On Tue, Mar 11, 2025 at 11:15:53PM -0000, Patchwork wrote:
> > > == Series Details ==
> > >
> > > Series: drm/i915/pm: Clean up the hibernate vs. PCI D3 quirk (rev2)
> > > URL   : https://patchwork.freedesktop.org/series/139097/
> > > State : failure
> > >
> > > == Summary ==
> > >
> > > #### Possible regressions ####
> > >   * igt@kms_addfb_basic@too-high:
> > >     - fi-kbl-8809g:       NOTRUN -> [FAIL][6] +3 other tests fail
> > >    [6]:
> > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_139097v2/fi-kbl-
> > 8809g/igt@kms_addfb_basic@too-high.html
> >
> > A bunch of stuff seems to have broken in CI:
> > - something is now loading amdgpu when we didn't want it loaded
> On boot I see
> <6>[    0.000000] Command line: BOOT_IMAGE=/boot/drm_intel
> root=/dev/nvme0n1p2 rootwait fsck.repair=yes nmi_watchdog=panic,auto
> panic=5 softdog.soft_panic=5 log_buf_len=1M trace_clock=global
> xe.force_probe=* i915.force_probe=* drm.debug=0xe
> modprobe.blacklist=xe,i915,ast modprobe.blacklist=amdgpu ro
> 
> Is that not enough?
> 
> > - the full dmesg has been lost so I can't even find out when amdgpu
> > got loaded
> CI team, can you get all logs transferred ?
The runner log also has some data: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_139097v2/fi-kbl-8809g/igt_runner0.txt


Saarinen, Jani March 12, 2025, 11:53 a.m. UTC | #4
Hi, and one more

> -----Original Message-----
> From: Saarinen, Jani
> Sent: Wednesday, 12 March 2025 12.08
> To: Ville Syrjälä <ville.syrjala@linux.intel.com>; intel-gfx@lists.freedesktop.org;
> I915-ci-infra@lists.freedesktop.org
> Subject: RE: ✗ i915.CI.BAT: failure for drm/i915/pm: Clean up the hibernate
> vs. PCI D3 quirk (rev2)
> 
> Hi,
> 
> 
> > -----Original Message-----
> > From: Saarinen, Jani
> > Sent: Wednesday, 12 March 2025 12.06
> > To: Ville Syrjälä <ville.syrjala@linux.intel.com>;
> > intel-gfx@lists.freedesktop.org; I915-ci-infra@lists.freedesktop.org
> > Subject: RE: ✗ i915.CI.BAT: failure for drm/i915/pm: Clean up the
> > hibernate vs. PCI D3 quirk (rev2)
> >
> > Hi,
> > > -----Original Message-----
> > > From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf
> > > Of Ville Syrjälä
> > > Sent: Wednesday, 12 March 2025 11.53
> > > To: intel-gfx@lists.freedesktop.org
> > > Cc: I915-ci-infra@lists.freedesktop.org
> > > Subject: Re: ✗ i915.CI.BAT: failure for drm/i915/pm: Clean up the
> > > hibernate vs. PCI D3 quirk (rev2)
> > >
> > > On Tue, Mar 11, 2025 at 11:15:53PM -0000, Patchwork wrote:
> > > > == Series Details ==
> > > >
> > > > Series: drm/i915/pm: Clean up the hibernate vs. PCI D3 quirk (rev2)
> > > > URL   : https://patchwork.freedesktop.org/series/139097/
> > > > State : failure
> > > >
> > > > == Summary ==
> > > >
> > > > #### Possible regressions ####
> > > >   * igt@kms_addfb_basic@too-high:
> > > >     - fi-kbl-8809g:       NOTRUN -> [FAIL][6] +3 other tests fail
> > > >    [6]:
> > > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_139097v2/fi-kbl
> > > > -
> > > 8809g/igt@kms_addfb_basic@too-high.html
> > >
> > > A bunch of stuff seems to have broken in CI:
> > > - something is now loading amdgpu when we didn't want it loaded
> > On boot I see
> > <6>[    0.000000] Command line: BOOT_IMAGE=/boot/drm_intel
> > root=/dev/nvme0n1p2 rootwait fsck.repair=yes nmi_watchdog=panic,auto
> > panic=5 softdog.soft_panic=5 log_buf_len=1M trace_clock=global
> > xe.force_probe=* i915.force_probe=* drm.debug=0xe
> > modprobe.blacklist=xe,i915,ast modprobe.blacklist=amdgpu ro
> >
> > Is that not enough?
> >
> > > - the full dmesg has been lost so I can't even find out when amdgpu
> > > got loaded
> > CI team, can you get all logs transferred ?
> From runner log also some data : https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_139097v2/fi-kbl-8809g/igt_runner0.txt
> 
Should https://patchwork.freedesktop.org/series/146170/ fix this behavior? We stopped blacklisting snd_hda_intel at CI_DRM_16263 (deploy script change).

Br,
Jani
Knop, Ryszard March 12, 2025, 12:06 p.m. UTC | #5
On Wed, 2025-03-12 at 10:05 +0000, Saarinen, Jani wrote:
> Hi, 
> > -----Original Message-----
> > From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Ville
> > Syrjälä
> > Sent: Wednesday, 12 March 2025 11.53
> > To: intel-gfx@lists.freedesktop.org
> > Cc: I915-ci-infra@lists.freedesktop.org
> > Subject: Re: ✗ i915.CI.BAT: failure for drm/i915/pm: Clean up the hibernate
> > vs. PCI D3 quirk (rev2)
> > 
> > On Tue, Mar 11, 2025 at 11:15:53PM -0000, Patchwork wrote:
> > > == Series Details ==
> > > 
> > > Series: drm/i915/pm: Clean up the hibernate vs. PCI D3 quirk (rev2)
> > > URL   : https://patchwork.freedesktop.org/series/139097/
> > > State : failure
> > > 
> > > == Summary ==
> > > 
> > > #### Possible regressions ####
> > >   * igt@kms_addfb_basic@too-high:
> > >     - fi-kbl-8809g:       NOTRUN -> [FAIL][6] +3 other tests fail
> > > >    [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_139097v2/fi-kbl-8809g/igt@kms_addfb_basic@too-high.html
> > 
> > A bunch of stuff seems to have broken in CI:
> > - something is now loading amdgpu when we didn't want it loaded
> On boot I see 
> <6>[    0.000000] Command line: BOOT_IMAGE=/boot/drm_intel root=/dev/nvme0n1p2 rootwait fsck.repair=yes nmi_watchdog=panic,auto panic=5 softdog.soft_panic=5 log_buf_len=1M trace_clock=global xe.force_probe=* i915.force_probe=* drm.debug=0xe modprobe.blacklist=xe,i915,ast modprobe.blacklist=amdgpu ro
> 
> Is that not enough? 

It looks like removing the snd_hda_intel blacklist causes this, see:

testrunner@fi-kbl-8809g:~$ lspci -v -s "01:00.1"
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 HDMI Audio
        Subsystem: Intel Corporation Polaris 22 HDMI Audio
        Flags: bus master, fast devsel, latency 0, IRQ 163, IOMMU group 1
        Memory at db560000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel

+Lucas, should we revert that?
