diff mbox series

drm/i915/display: Fix warning callstack for imbalance wakeref

Message ID 20220812044724.12131-1-mitulkumar.ajitkumar.golani@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/display: Fix warning callstack for imbalance wakeref | expand

Commit Message

Golani, Mitulkumar Ajitkumar Aug. 12, 2022, 4:47 a.m. UTC
While executing i915_selftest, wakeref imbalance warning is seen
with i915_selftest failure.

When device is already suspended, wakeref is acquired by
disable_rpm_wakeref_asserts and rpm ownership is transferred back
to core. During this case wakeref_count will not be zero.
Once driver is unregistered, this wakeref is released with
enable_rpm_wakeref_asserts and balancing wakeref_count acquired
by driver.

This patch will fix the warning callstack by adding check if device
is already suspended and rpm ownership transfer is going on.

Signed-off-by: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com>
---
 drivers/gpu/drm/i915/i915_driver.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Imre Deak Aug. 17, 2022, 12:49 p.m. UTC | #1
On Fri, Aug 12, 2022 at 10:17:24AM +0530, Mitul Golani wrote:
> While executing i915_selftest, wakeref imbalance warning is seen
> with i915_selftest failure.
> 
> When device is already suspended, wakeref is acquired by
> disable_rpm_wakeref_asserts and rpm ownership is transferred back
> to core. During this case wakeref_count will not be zero.
> Once driver is unregistered, this wakeref is released with
> enable_rpm_wakeref_asserts and balancing wakeref_count acquired
> by driver.
> 
> This patch will fix the warning callstack by adding check if device
> is already suspended and rpm ownership transfer is going on.
> 
> Signed-off-by: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_driver.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index deb8a8b76965..6530a8680cfd 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -1670,7 +1670,13 @@ static int intel_runtime_resume(struct device *kdev)
>  
>  	drm_dbg(&dev_priv->drm, "Resuming device\n");
>  
> -	drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm->wakeref_count));
> +	/*
> +	 * When device is already suspended, Wakeref is acquired by disable_rpm_wakeref_asserts
> +	 * and rpm ownership is transferred back to core. During this case wakeref_count will
> +	 * not be zero. Once driver is unregistered, this wakeref is released with
> +	 * enable_rpm_wakeref_asserts and balancing wakeref_count acquired by driver.
> +	 */
> +	drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm->wakeref_count) && !rpm->suspended);

I can't see how disable/enable_rpm_wakeref_asserts() can lead to this
WARN. They are always called in pairs both in intel_runtime_suspend()
and intel_runtime_resume(), leaving rpm->wakeref_count unchanged.

The root cause is probably somewhere else, incrementing
rpm->wakeref_count without runtime resuming the device.

The WARN() condition is corret, we shouldn't get here with a non-zero
wakeref_count. rpm->suspended - set in intel_runtime_suspend() and
cleared in intel_runtime_resume() - should be always false here, so the
above change would just disable the WARN in all cases.

>  	disable_rpm_wakeref_asserts(rpm);
>  
>  	intel_opregion_notify_adapter(dev_priv, PCI_D0);
> -- 
> 2.25.1
>
Golani, Mitulkumar Ajitkumar Aug. 23, 2022, 12:39 p.m. UTC | #2
Hi Imre,
 
> On Fri, Aug 12, 2022 at 10:17:24AM +0530, Mitul Golani wrote:
> > While executing i915_selftest, wakeref imbalance warning is seen with
> > i915_selftest failure.
> >
> > When device is already suspended, wakeref is acquired by
> > disable_rpm_wakeref_asserts and rpm ownership is transferred back to
> > core. During this case wakeref_count will not be zero.
> > Once driver is unregistered, this wakeref is released with
> > enable_rpm_wakeref_asserts and balancing wakeref_count acquired by
> > driver.
> >
> > This patch will fix the warning callstack by adding check if device is
> > already suspended and rpm ownership transfer is going on.
> >
> > Signed-off-by: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_driver.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_driver.c
> > b/drivers/gpu/drm/i915/i915_driver.c
> > index deb8a8b76965..6530a8680cfd 100644
> > --- a/drivers/gpu/drm/i915/i915_driver.c
> > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > @@ -1670,7 +1670,13 @@ static int intel_runtime_resume(struct device
> > *kdev)
> >
> >  	drm_dbg(&dev_priv->drm, "Resuming device\n");
> >
> > -	drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm-
> >wakeref_count));
> > +	/*
> > +	 * When device is already suspended, Wakeref is acquired by
> disable_rpm_wakeref_asserts
> > +	 * and rpm ownership is transferred back to core. During this case
> wakeref_count will
> > +	 * not be zero. Once driver is unregistered, this wakeref is released
> with
> > +	 * enable_rpm_wakeref_asserts and balancing wakeref_count
> acquired by driver.
> > +	 */
> > +	drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm-
> >wakeref_count) &&
> > +!rpm->suspended);
> 
> I can't see how disable/enable_rpm_wakeref_asserts() can lead to this
> WARN. They are always called in pairs both in intel_runtime_suspend() and
> intel_runtime_resume(), leaving rpm->wakeref_count unchanged.
> 
> The root cause is probably somewhere else, incrementing
> rpm->wakeref_count without runtime resuming the device.
> 
> The WARN() condition is corret, we shouldn't get here with a non-zero
> wakeref_count. rpm->suspended - set in intel_runtime_suspend() and
> cleared in intel_runtime_resume() - should be always false here, so the
> above change would just disable the WARN in all cases.
> 
Yes, in case of DG2, after device is suspended, i915_driver_remove is being called.
Here driver is taking wakeref with disable_rpm_wakeref_asserts when device was not resumed.

As per logs,
[  395.872971] i915 0000:03:00.0: [drm:intel_runtime_suspend [i915]] Suspending device
...
[  403.553235] i915_driver_remove: START wakeref=0
[  403.553288] i915_driver_remove: before unregister i915 wakeref=65537 (Wakeref Taken)
[  403.566086] i915 0000:03:00.0: [drm:intel_runtime_resume [i915]] Resuming device (Later Resuming Device)

Pushed new change with : https://patchwork.freedesktop.org/series/107211/#rev5 

> >  	disable_rpm_wakeref_asserts(rpm);
> >
> >  	intel_opregion_notify_adapter(dev_priv, PCI_D0);
> > --
> > 2.25.1
> >
Golani, Mitulkumar Ajitkumar Aug. 23, 2022, 12:56 p.m. UTC | #3
> Hi Imre,
> 
> > On Fri, Aug 12, 2022 at 10:17:24AM +0530, Mitul Golani wrote:
> > > While executing i915_selftest, wakeref imbalance warning is seen
> > > with i915_selftest failure.
> > >
> > > When device is already suspended, wakeref is acquired by
> > > disable_rpm_wakeref_asserts and rpm ownership is transferred back to
> > > core. During this case wakeref_count will not be zero.
> > > Once driver is unregistered, this wakeref is released with
> > > enable_rpm_wakeref_asserts and balancing wakeref_count acquired by
> > > driver.
> > >
> > > This patch will fix the warning callstack by adding check if device
> > > is already suspended and rpm ownership transfer is going on.
> > >
> > > Signed-off-by: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_driver.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_driver.c
> > > b/drivers/gpu/drm/i915/i915_driver.c
> > > index deb8a8b76965..6530a8680cfd 100644
> > > --- a/drivers/gpu/drm/i915/i915_driver.c
> > > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > > @@ -1670,7 +1670,13 @@ static int intel_runtime_resume(struct device
> > > *kdev)
> > >
> > >  	drm_dbg(&dev_priv->drm, "Resuming device\n");
> > >
> > > -	drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm-
> > >wakeref_count));
> > > +	/*
> > > +	 * When device is already suspended, Wakeref is acquired by
> > disable_rpm_wakeref_asserts
> > > +	 * and rpm ownership is transferred back to core. During this case
> > wakeref_count will
> > > +	 * not be zero. Once driver is unregistered, this wakeref is
> > > +released
> > with
> > > +	 * enable_rpm_wakeref_asserts and balancing wakeref_count
> > acquired by driver.
> > > +	 */
> > > +	drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm-
> > >wakeref_count) &&
> > > +!rpm->suspended);
> >
> > I can't see how disable/enable_rpm_wakeref_asserts() can lead to this
> > WARN. They are always called in pairs both in intel_runtime_suspend()
> > and intel_runtime_resume(), leaving rpm->wakeref_count unchanged.
> >
> > The root cause is probably somewhere else, incrementing
> > rpm->wakeref_count without runtime resuming the device.
> >
> > The WARN() condition is corret, we shouldn't get here with a non-zero
> > wakeref_count. rpm->suspended - set in intel_runtime_suspend() and
> > cleared in intel_runtime_resume() - should be always false here, so
> > the above change would just disable the WARN in all cases.
> >
> Yes, in case of DG2, after device is suspended, i915_driver_remove is being
> called.
> Here driver is taking wakeref with disable_rpm_wakeref_asserts when device
> was not resumed.
> 
> As per logs,
> [  395.872971] i915 0000:03:00.0: [drm:intel_runtime_suspend [i915]]
> Suspending device ...
> [  403.553235] i915_driver_remove: START wakeref=0 [  403.553288]
> i915_driver_remove: before unregister i915 wakeref=65537 (Wakeref Taken)
> [  403.566086] i915 0000:03:00.0: [drm:intel_runtime_resume [i915]]
> Resuming device (Later Resuming Device)
> 
> Pushed new change with :
> https://patchwork.freedesktop.org/series/107211/#rev5
> 
Also when compared DG2 logs with ADLP working logs,
Already 1 wakeref was taken by DMC firmware(i915/adlp_dmc_ver2_16.bin (v2.16)), in-case of DG2 looks to be missing.
To support other targets and to prevent consecutive resuming device added following check,
if (i915->runtime_pm.suspended && !atomic_read(&i915->runtime_pm.wakeref_count))

ADLP Logs:
---------------
[   99.502434] i915_driver_probe: START wakeref=0
[  713.979074] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adlp_dmc_ver2_16.bin (v2.16)
[  102.455766] i915_driver_probe: END wakeref=65538
...
[  103.448570] i915_driver_remove: START wakeref=65537
[  103.448587] i915_driver_remove: before unregister i915 wakeref=131074 -> (disable_rpm_wakeref_assert)
[  103.585886] i915_driver_remove: END wakeref=0

DG2 Logs:
-------------
[ 1271.704314] i915_driver_probe: START wakeref=0
[  383.050984] i915 0000:03:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_07.bin (v2.7)
[ 1272.021133] i915_driver_probe: END wakeref=1
...
[  395.883531] i915 0000:03:00.0: [drm:intel_runtime_suspend [i915]] Device suspended
...
[ 1291.450841] i915_driver_remove: START wakeref=0
[ 1291.450877] i915_driver_remove: before unregister i915 wakeref=65537 -> (disable_rpm_wakeref_assert)
[ 1291.603281] i915_driver_remove: END wakeref=0

> > >  	disable_rpm_wakeref_asserts(rpm);
> > >
> > >  	intel_opregion_notify_adapter(dev_priv, PCI_D0);
> > > --
> > > 2.25.1
> > >
Imre Deak Aug. 25, 2022, 10:51 a.m. UTC | #4
On Tue, Aug 23, 2022 at 03:56:56PM +0300, Golani, Mitulkumar Ajitkumar wrote:
> > Hi Imre,
> >
> > > On Fri, Aug 12, 2022 at 10:17:24AM +0530, Mitul Golani wrote:
> > > > While executing i915_selftest, wakeref imbalance warning is seen
> > > > with i915_selftest failure.
> > > >
> > > > When device is already suspended, wakeref is acquired by
> > > > disable_rpm_wakeref_asserts and rpm ownership is transferred back to
> > > > core. During this case wakeref_count will not be zero.
> > > > Once driver is unregistered, this wakeref is released with
> > > > enable_rpm_wakeref_asserts and balancing wakeref_count acquired by
> > > > driver.
> > > >
> > > > This patch will fix the warning callstack by adding check if device
> > > > is already suspended and rpm ownership transfer is going on.
> > > >
> > > > Signed-off-by: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_driver.c | 8 +++++++-
> > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/i915_driver.c
> > > > b/drivers/gpu/drm/i915/i915_driver.c
> > > > index deb8a8b76965..6530a8680cfd 100644
> > > > --- a/drivers/gpu/drm/i915/i915_driver.c
> > > > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > > > @@ -1670,7 +1670,13 @@ static int intel_runtime_resume(struct device
> > > > *kdev)
> > > >
> > > >   drm_dbg(&dev_priv->drm, "Resuming device\n");
> > > >
> > > > - drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm-
> > > >wakeref_count));
> > > > + /*
> > > > +  * When device is already suspended, Wakeref is acquired by
> > > disable_rpm_wakeref_asserts
> > > > +  * and rpm ownership is transferred back to core. During this case
> > > wakeref_count will
> > > > +  * not be zero. Once driver is unregistered, this wakeref is
> > > > +released
> > > with
> > > > +  * enable_rpm_wakeref_asserts and balancing wakeref_count
> > > acquired by driver.
> > > > +  */
> > > > + drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm-
> > > >wakeref_count) &&
> > > > +!rpm->suspended);
> > >
> > > I can't see how disable/enable_rpm_wakeref_asserts() can lead to this
> > > WARN. They are always called in pairs both in intel_runtime_suspend()
> > > and intel_runtime_resume(), leaving rpm->wakeref_count unchanged.
> > >
> > > The root cause is probably somewhere else, incrementing
> > > rpm->wakeref_count without runtime resuming the device.
> > >
> > > The WARN() condition is corret, we shouldn't get here with a non-zero
> > > wakeref_count. rpm->suspended - set in intel_runtime_suspend() and
> > > cleared in intel_runtime_resume() - should be always false here, so
> > > the above change would just disable the WARN in all cases.
> > >
> > Yes, in case of DG2, after device is suspended, i915_driver_remove
> > is being called.  Here driver is taking wakeref with
> > disable_rpm_wakeref_asserts when device was not resumed.

> >
> > As per logs,
> > [  395.872971] i915 0000:03:00.0: [drm:intel_runtime_suspend [i915]]
> > Suspending device ...
> > [  403.553235] i915_driver_remove: START wakeref=0 [  403.553288]
> > i915_driver_remove: before unregister i915 wakeref=65537 (Wakeref Taken)
> > [  403.566086] i915 0000:03:00.0: [drm:intel_runtime_resume [i915]]
> > Resuming device (Later Resuming Device)
> >
> > Pushed new change with :
> > https://patchwork.freedesktop.org/series/107211/#rev5
> >
> Also when compared DG2 logs with ADLP working logs,
> Already 1 wakeref was taken by DMC firmware(i915/adlp_dmc_ver2_16.bin (v2.16)), in-case of DG2 looks to be missing.
> To support other targets and to prevent consecutive resuming device added following check,
> if (i915->runtime_pm.suspended && !atomic_read(&i915->runtime_pm.wakeref_count))
> 
> ADLP Logs:
> ---------------
> [   99.502434] i915_driver_probe: START wakeref=0
> [  713.979074] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adlp_dmc_ver2_16.bin (v2.16)
> [  102.455766] i915_driver_probe: END wakeref=65538
> ...
> [  103.448570] i915_driver_remove: START wakeref=65537
> [  103.448587] i915_driver_remove: before unregister i915 wakeref=131074 -> (disable_rpm_wakeref_assert)
> [  103.585886] i915_driver_remove: END wakeref=0
> 
> DG2 Logs:
> -------------
> [ 1271.704314] i915_driver_probe: START wakeref=0
> [  383.050984] i915 0000:03:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_07.bin (v2.7)
> [ 1272.021133] i915_driver_probe: END wakeref=1
> ...
> [  395.883531] i915 0000:03:00.0: [drm:intel_runtime_suspend [i915]] Device suspended
> ...
> [ 1291.450841] i915_driver_remove: START wakeref=0
> [ 1291.450877] i915_driver_remove: before unregister i915 wakeref=65537 -> (disable_rpm_wakeref_assert)
> [ 1291.603281] i915_driver_remove: END wakeref=0

Still not sure what's going. Both i915_pci_probe() and
i915_pci_remove()->i915_driver_remove() is called with a runtime PM
reference - taken at local_pci_probe() and pci_device_remove() - and so
the device should be runtime resumed at those points.

> > > >   disable_rpm_wakeref_asserts(rpm);
> > > >
> > > >   intel_opregion_notify_adapter(dev_priv, PCI_D0);
> > > > --
> > > > 2.25.1
> > > >
Golani, Mitulkumar Ajitkumar Aug. 29, 2022, 6:45 a.m. UTC | #5
Hi Imre,

> -----Original Message-----
> From: Deak, Imre <imre.deak@intel.com>
> Sent: 25 August 2022 16:22
> To: Golani, Mitulkumar Ajitkumar <mitulkumar.ajitkumar.golani@intel.com>
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Fix warning callstack for
> imbalance wakeref
> 
> On Tue, Aug 23, 2022 at 03:56:56PM +0300, Golani, Mitulkumar Ajitkumar
> wrote:
> > > Hi Imre,
> > >
> > > > On Fri, Aug 12, 2022 at 10:17:24AM +0530, Mitul Golani wrote:
> > > > > While executing i915_selftest, wakeref imbalance warning is seen
> > > > > with i915_selftest failure.
> > > > >
> > > > > When device is already suspended, wakeref is acquired by
> > > > > disable_rpm_wakeref_asserts and rpm ownership is transferred
> > > > > back to core. During this case wakeref_count will not be zero.
> > > > > Once driver is unregistered, this wakeref is released with
> > > > > enable_rpm_wakeref_asserts and balancing wakeref_count acquired
> > > > > by driver.
> > > > >
> > > > > This patch will fix the warning callstack by adding check if
> > > > > device is already suspended and rpm ownership transfer is going on.
> > > > >
> > > > > Signed-off-by: Mitul Golani
> > > > > <mitulkumar.ajitkumar.golani@intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/i915_driver.c | 8 +++++++-
> > > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/i915_driver.c
> > > > > b/drivers/gpu/drm/i915/i915_driver.c
> > > > > index deb8a8b76965..6530a8680cfd 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_driver.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > > > > @@ -1670,7 +1670,13 @@ static int intel_runtime_resume(struct
> > > > > device
> > > > > *kdev)
> > > > >
> > > > >   drm_dbg(&dev_priv->drm, "Resuming device\n");
> > > > >
> > > > > - drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm-
> > > > >wakeref_count));
> > > > > + /*
> > > > > +  * When device is already suspended, Wakeref is acquired by
> > > > disable_rpm_wakeref_asserts
> > > > > +  * and rpm ownership is transferred back to core. During this
> > > > > + case
> > > > wakeref_count will
> > > > > +  * not be zero. Once driver is unregistered, this wakeref is
> > > > > +released
> > > > with
> > > > > +  * enable_rpm_wakeref_asserts and balancing wakeref_count
> > > > acquired by driver.
> > > > > +  */
> > > > > + drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm-
> > > > >wakeref_count) &&
> > > > > +!rpm->suspended);
> > > >
> > > > I can't see how disable/enable_rpm_wakeref_asserts() can lead to
> > > > this WARN. They are always called in pairs both in
> > > > intel_runtime_suspend() and intel_runtime_resume(), leaving rpm-
> >wakeref_count unchanged.
> > > >
> > > > The root cause is probably somewhere else, incrementing
> > > > rpm->wakeref_count without runtime resuming the device.
> > > >
> > > > The WARN() condition is corret, we shouldn't get here with a
> > > > non-zero wakeref_count. rpm->suspended - set in
> > > > intel_runtime_suspend() and cleared in intel_runtime_resume() -
> > > > should be always false here, so the above change would just disable the
> WARN in all cases.
> > > >
> > > Yes, in case of DG2, after device is suspended, i915_driver_remove
> > > is being called.  Here driver is taking wakeref with
> > > disable_rpm_wakeref_asserts when device was not resumed.
> 
> > >
> > > As per logs,
> > > [  395.872971] i915 0000:03:00.0: [drm:intel_runtime_suspend [i915]]
> > > Suspending device ...
> > > [  403.553235] i915_driver_remove: START wakeref=0 [  403.553288]
> > > i915_driver_remove: before unregister i915 wakeref=65537 (Wakeref
> > > Taken) [  403.566086] i915 0000:03:00.0: [drm:intel_runtime_resume
> > > [i915]] Resuming device (Later Resuming Device)
> > >
> > > Pushed new change with :
> > > https://patchwork.freedesktop.org/series/107211/#rev5
> > >
> > Also when compared DG2 logs with ADLP working logs, Already 1 wakeref
> > was taken by DMC firmware(i915/adlp_dmc_ver2_16.bin (v2.16)), in-case
> of DG2 looks to be missing.
> > To support other targets and to prevent consecutive resuming device
> > added following check, if (i915->runtime_pm.suspended &&
> > !atomic_read(&i915->runtime_pm.wakeref_count))
> >
> > ADLP Logs:
> > ---------------
> > [   99.502434] i915_driver_probe: START wakeref=0
> > [  713.979074] i915 0000:00:02.0: [drm] Finished loading DMC firmware
> > i915/adlp_dmc_ver2_16.bin (v2.16) [  102.455766] i915_driver_probe:
> > END wakeref=65538 ...
> > [  103.448570] i915_driver_remove: START wakeref=65537 [  103.448587]
> > i915_driver_remove: before unregister i915 wakeref=131074 ->
> > (disable_rpm_wakeref_assert) [  103.585886] i915_driver_remove: END
> > wakeref=0
> >
> > DG2 Logs:
> > -------------
> > [ 1271.704314] i915_driver_probe: START wakeref=0 [  383.050984] i915
> > 0000:03:00.0: [drm] Finished loading DMC firmware
> > i915/dg2_dmc_ver2_07.bin (v2.7) [ 1272.021133] i915_driver_probe: END
> > wakeref=1 ...
> > [  395.883531] i915 0000:03:00.0: [drm:intel_runtime_suspend [i915]]
> > Device suspended ...
> > [ 1291.450841] i915_driver_remove: START wakeref=0 [ 1291.450877]
> > i915_driver_remove: before unregister i915 wakeref=65537 ->
> > (disable_rpm_wakeref_assert) [ 1291.603281] i915_driver_remove: END
> > wakeref=0
> 
> Still not sure what's going. Both i915_pci_probe() and
> i915_pci_remove()->i915_driver_remove() is called with a runtime PM
> reference - taken at local_pci_probe() and pci_device_remove() - and so the
> device should be runtime resumed at those points.
> 

Yes reference is being taken at local_pci_probe() and pci_device_remove() but 
During i915_selftest@perf, it is loading and unloading i915_pci_probe() and
i915_pci_remove(), here pci_device_remove() is not being called, that's why
runtime PM reference is not present during i915_driver_remove().

> > > > >   disable_rpm_wakeref_asserts(rpm);
> > > > >
> > > > >   intel_opregion_notify_adapter(dev_priv, PCI_D0);
> > > > > --
> > > > > 2.25.1
> > > > >
Imre Deak Aug. 29, 2022, 2:46 p.m. UTC | #6
On Mon, Aug 29, 2022 at 09:45:53AM +0300, Golani, Mitulkumar Ajitkumar wrote:
> Hi Imre,
> 
> > [...]
> > Still not sure what's going. Both i915_pci_probe() and
> > i915_pci_remove()->i915_driver_remove() is called with a runtime PM
> > reference - taken at local_pci_probe() and pci_device_remove() - and so the
> > device should be runtime resumed at those points.
> >
> 
> Yes reference is being taken at local_pci_probe() and pci_device_remove() but
> During i915_selftest@perf, it is loading and unloading i915_pci_probe() and
> i915_pci_remove(), here pci_device_remove() is not being called, that's why
> runtime PM reference is not present during i915_driver_remove().

Ok, that explains it. Taking an actual RPM reference unconditionally in
i915_driver_remove() should fix this (instead of the
disable/enable_rpm_wakeref_asserts() calls there):

wakeref = intel_runtime_pm_get();
...
intel_runtime_pm_put(wakeref);

While at it the same change should be applied in i915_driver_release()
as well for consistency.

> > > > > >   disable_rpm_wakeref_asserts(rpm);
> > > > > >
> > > > > >   intel_opregion_notify_adapter(dev_priv, PCI_D0);
> > > > > > --
> > > > > > 2.25.1
> > > > > >
Golani, Mitulkumar Ajitkumar Aug. 29, 2022, 4:59 p.m. UTC | #7
Hi Imre,

> -----Original Message-----
> From: Deak, Imre <imre.deak@intel.com>
> Sent: 29 August 2022 20:16
> To: Golani, Mitulkumar Ajitkumar <mitulkumar.ajitkumar.golani@intel.com>
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Fix warning callstack for
> imbalance wakeref
> 
> On Mon, Aug 29, 2022 at 09:45:53AM +0300, Golani, Mitulkumar Ajitkumar
> wrote:
> > Hi Imre,
> >
> > > [...]
> > > Still not sure what's going. Both i915_pci_probe() and
> > > i915_pci_remove()->i915_driver_remove() is called with a runtime PM
> > > reference - taken at local_pci_probe() and pci_device_remove() - and
> > > so the device should be runtime resumed at those points.
> > >
> >
> > Yes reference is being taken at local_pci_probe() and
> > pci_device_remove() but During i915_selftest@perf, it is loading and
> > unloading i915_pci_probe() and i915_pci_remove(), here
> > pci_device_remove() is not being called, that's why runtime PM reference is
> not present during i915_driver_remove().
> 
> Ok, that explains it. Taking an actual RPM reference unconditionally in
> i915_driver_remove() should fix this (instead of the
> disable/enable_rpm_wakeref_asserts() calls there):
> 
> wakeref = intel_runtime_pm_get();
> ...
> intel_runtime_pm_put(wakeref);
> 
> While at it the same change should be applied in i915_driver_release() as
> well for consistency.
> 

Thanks Imre. Verified on Target. This works.

Pushed changes : https://patchwork.freedesktop.org/patch/500180/?series=107211&rev=6
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index deb8a8b76965..6530a8680cfd 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1670,7 +1670,13 @@  static int intel_runtime_resume(struct device *kdev)
 
 	drm_dbg(&dev_priv->drm, "Resuming device\n");
 
-	drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm->wakeref_count));
+	/*
+	 * When device is already suspended, Wakeref is acquired by disable_rpm_wakeref_asserts
+	 * and rpm ownership is transferred back to core. During this case wakeref_count will
+	 * not be zero. Once driver is unregistered, this wakeref is released with
+	 * enable_rpm_wakeref_asserts and balancing wakeref_count acquired by driver.
+	 */
+	drm_WARN_ON_ONCE(&dev_priv->drm, atomic_read(&rpm->wakeref_count) && !rpm->suspended);
 	disable_rpm_wakeref_asserts(rpm);
 
 	intel_opregion_notify_adapter(dev_priv, PCI_D0);