
[v2] drm/i915: Don't warn if we restore pm interrupts during reset

Message ID 1408020403-21956-1-git-send-email-mika.kuoppala@intel.com (mailing list archive)
State New, archived

Commit Message

Mika Kuoppala Aug. 14, 2014, 12:46 p.m. UTC
We lost the software state tracking due to reset, so don't
complain if it doesn't match.

v2: fix build error

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Daniel Vetter Aug. 14, 2014, 2:23 p.m. UTC | #1
On Thu, Aug 14, 2014 at 03:46:43PM +0300, Mika Kuoppala wrote:
> We lost the software state tracking due to reset, so don't
> complain if it doesn't match.

This sounds more like gpu reset should be a bit more careful (even more
careful than we already are compared to earlier kernels) with making sure
the irq state is still sane after a reset?

Or what exactly is the failure mode here? The commit message lacks a bit of
detail, in the form of a nice text or, even better, a testcase ;-)

Thanks, Daniel

> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Ville Syrjälä Aug. 14, 2014, 2:45 p.m. UTC | #2
On Thu, Aug 14, 2014 at 04:23:16PM +0200, Daniel Vetter wrote:
> On Thu, Aug 14, 2014 at 03:46:43PM +0300, Mika Kuoppala wrote:
> > We lost the software state tracking due to reset, so don't
> > complain if it doesn't match.
> 
> This sounds more like gpu reset should be a bit more careful (even more
> careful than we already are compared to earlier kernels) with making sure
> the irq state is still sane after a reset?
> 
> Or what exactly is the failure mode here? The commit message lacks a bit of
> detail, in the form of a nice text or, even better, a testcase ;-)

Killing the hpd irq and gt_powersave junk from i915_reset() would be my
suggestion here. I don't even know why the hpd stuff is still there; we
removed all the other irq frobbery from there a while back. And last I
looked, gpu reset didn't affect the rc6/rps stuff either, though more
testing should be done to make sure I didn't just imagine it.

> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Aug. 14, 2014, 3:03 p.m. UTC | #3
On Thu, Aug 14, 2014 at 05:45:59PM +0300, Ville Syrjälä wrote:
> On Thu, Aug 14, 2014 at 04:23:16PM +0200, Daniel Vetter wrote:
> > On Thu, Aug 14, 2014 at 03:46:43PM +0300, Mika Kuoppala wrote:
> > > We lost the software state tracking due to reset, so don't
> > > complain if it doesn't match.
> > 
> > This sounds more like gpu reset should be a bit more careful (even more
> > careful than we already are compared to earlier kernels) with making sure
> > the irq state is still sane after a reset?
> > 
> > Or what exactly is the failure mode here? The commit message lacks a bit of
> > detail, in the form of a nice text or, even better, a testcase ;-)
> 
> Killing the hpd irq and gt_powersave junk from i915_reset() would be my
> suggestion here. I don't even know why the hpd stuff is still there; we
> removed all the other irq frobbery from there a while back. And last I
> looked, gpu reset didn't affect the rc6/rps stuff either, though more
> testing should be done to make sure I didn't just imagine it.

No idea why the hpd_init is still in there. It could be a merge artifact from
removing the irq handling in one patch and adding hpd_init in another. I
guess we could ditch it. The PM irq stuff is a bit more tricky, since the
ring init resets them ...
-Daniel
> 
> -- 
> Ville Syrjälä
> Intel OTC
Mika Kuoppala Aug. 14, 2014, 3:14 p.m. UTC | #4
Daniel Vetter <daniel@ffwll.ch> writes:

> On Thu, Aug 14, 2014 at 03:46:43PM +0300, Mika Kuoppala wrote:
>> We lost the software state tracking due to reset, so don't
>> complain if it doesn't match.
>
> This sounds more like gpu reset should be a bit more careful (even more
> careful than we already are compared to earlier kernels) with making sure
> the irq state is still sane after a reset?
>
> Or what exactly is the failure mode here? The commit message lacks a bit of
> detail, in the form of a nice text or, even better, a testcase ;-)

We hold a pm ref during reset. Then, after the reset, we kick
intel_gt_reset_powersave to re-enable rps. Contrary to
suspend/thaw, we never disabled the interrupts, so the warning
triggers.

I tried to disable the interrupts during reset handling, but the
nonblocking __wait_seqno() triggered another state warning, as
it was taking a pm ref during or right after reset recovery for hw
access.

-Mika
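A minimal userspace model of the failure mode described above, for illustration only. It assumes the relevant state reduces to a software copy of the pending PM interrupt bits (standing in for dev_priv->rps.pm_iir) plus an enabled/disabled flag; struct rps_model and the model_* and run_* functions are hypothetical and are not i915 code. The suspend/thaw-like sequence disables (and, by assumption, clears the pending bits) before re-enabling, while the reset-like sequence re-enables directly, which is where the unpatched WARN_ON(dev_priv->rps.pm_iir) would fire.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model; not driver code. */
struct rps_model {
	uint32_t pm_iir;   /* software copy of pending PM interrupt bits */
	bool irqs_enabled; /* whether RPS interrupts are unmasked */
	int warnings;      /* how many times the WARN_ON would have fired */
};

/* Disable path, as suspend is assumed to do: mask and drop pending bits. */
static void model_disable_rps_interrupts(struct rps_model *m)
{
	m->irqs_enabled = false;
	m->pm_iir = 0;
}

/* An interrupt arriving while enabled latches bits into pm_iir.
 * The work that would normally consume them is not modeled. */
static void model_pm_irq(struct rps_model *m, uint32_t bits)
{
	if (m->irqs_enabled)
		m->pm_iir |= bits;
}

/* Enable path, mirroring the unconditional WARN_ON(pm_iir). */
static void model_enable_rps_interrupts(struct rps_model *m)
{
	if (m->pm_iir)
		m->warnings++;
	m->irqs_enabled = true;
}

/* Suspend/thaw-like sequence: disable before re-enabling. */
int run_suspend_resume(uint32_t pending)
{
	struct rps_model m = { 0 };

	model_enable_rps_interrupts(&m);
	model_pm_irq(&m, pending);
	model_disable_rps_interrupts(&m);
	model_enable_rps_interrupts(&m);
	return m.warnings;
}

/* Reset-like sequence as described: re-enable without a prior disable. */
int run_reset(uint32_t pending)
{
	struct rps_model m = { 0 };

	model_enable_rps_interrupts(&m);
	model_pm_irq(&m, pending);
	model_enable_rps_interrupts(&m);
	return m.warnings;
}
```

Under these assumptions, run_reset(0x20) reports one warning and run_suspend_resume(0x20) reports none; with no pending bits, neither sequence warns.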
Mika Kuoppala Aug. 14, 2014, 3:43 p.m. UTC | #5
Mika Kuoppala <mika.kuoppala@linux.intel.com> writes:

> Daniel Vetter <daniel@ffwll.ch> writes:
>
>> On Thu, Aug 14, 2014 at 03:46:43PM +0300, Mika Kuoppala wrote:
>>> We lost the software state tracking due to reset, so don't
>>> complain if it doesn't match.
>>
>> This sounds more like gpu reset should be a bit more careful (even more
>> careful than we already are compared to earlier kernels) with making sure
>> the irq state is still sane after a reset?
>>
>> Or what exactly is the failure mode here? The commit message lacks a bit of
>> detail, in the form of a nice text or, even better, a testcase ;-)
>
> We hold a pm ref during reset. Then, after the reset, we kick
> intel_gt_reset_powersave to re-enable rps. Contrary to
> suspend/thaw, we never disabled the interrupts, so the warning
> triggers.
>
> I tried to disable the interrupts during reset handling, but the
> nonblocking __wait_seqno() triggered another state warning, as
> it was taking a pm ref during or right after reset recovery for hw
> access.
> -Mika
>

It is also pretty difficult to hit. It took multiple tries of
ctrl-c'ing the process that submitted the hang, with another
client running in the background doing gpu access.

Could it be a timing issue related to enabling rps through a delayed workqueue?

Here is the trace:
[  635.478701] [drm] Simulated gpu hang, resetting stop_rings
[  637.457126] ------------[ cut here ]------------
[  637.458711] WARNING: CPU: 5 PID: 3595 at drivers/gpu/drm/i915/intel_pm.c:3607 gen6_enable_rps_interrupts+0x72/0x80 [i915]()
[  637.460361] Modules linked in: i915 drm_kms_helper drm kvm_intel kvm
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm
snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq mxm_wmi snd_timer
snd_seq_device psmouse snd serio_raw ehci_pci bnep ehci_hcd rfcomm
soundcore bluetooth wmi mac_hid parport_pc ppdev lp parport dm_crypt
usbhid firewire_ohci firewire_core crc_itu_t e1000e ptp pps_core
xhci_hcd usbcore i2c_algo_bit video usb_common [last unloaded: drm]
[  637.468170] CPU: 5 PID: 3595 Comm: kworker/5:0 Tainted: G        W 3.16.0+ #240
[  637.469545] Workqueue: events intel_gen6_powersave_work [i915]
[  637.471042]  00000000 00000000 ca0d3e54 c15adcca f8898260 ca0d3e84 c1047224 c17536b0
[  637.472616]  00000005 00000e0b f8898260 00000e17 f87ff852 f87ff852 f6ec8000 f6ecbe68
[  637.474301]  ee851c00 ca0d3e94 c1047262 00000009 00000000 ca0d3ea8 f87ff852 f6ec8000
[  637.475920] Call Trace:
[  637.477504]  [<c15adcca>] dump_stack+0x48/0x60
[  637.479060]  [<c1047224>] warn_slowpath_common+0x84/0xa0
[  637.480708]  [<f87ff852>] ? gen6_enable_rps_interrupts+0x72/0x80 [i915]
[  637.481880]  [<f87ff852>] ? gen6_enable_rps_interrupts+0x72/0x80 [i915]
[  637.483220]  [<c1047262>] warn_slowpath_null+0x22/0x30
[  637.484258]  [<f87ff852>] gen6_enable_rps_interrupts+0x72/0x80 [i915]
[  637.485503]  [<f8808ecd>] intel_gen6_powersave_work+0x57d/0x1020 [i915]
[  637.486516]  [<c105e8bc>] process_one_work+0x10c/0x3c0
[  637.487630]  [<c105f523>] worker_thread+0xf3/0x470
[  637.488618]  [<c105f430>] ? create_and_start_worker+0x50/0x50
[  637.489802]  [<c1064cdb>] kthread+0x9b/0xb0
[  637.490804]  [<c15b4e01>] ret_from_kernel_thread+0x21/0x30
[  637.491872]  [<c1064c40>] ? flush_kthread_worker+0xb0/0xb0
[  637.492862] ---[ end trace b31c16cec8a7abaa ]---

-Mika
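The trace above shows the warning firing from intel_gen6_powersave_work on a kworker, that is, from deferred work rather than directly in the reset path. Below is a hypothetical single-threaded model of why that ordering might matter for the v2 check: if the deferred enable only runs after the reset-in-progress flag has been cleared, gating the WARN_ON on that flag no longer suppresses it. Whether the real driver orders things this way is exactly the open timing question raised above; struct reset_model and all helpers here are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-threaded model; not driver code. */
struct reset_model {
	bool reset_in_progress; /* stands in for i915_reset_in_progress() */
	uint32_t pm_iir;        /* stale software PM IIR bits */
	bool work_pending;      /* a deferred "powersave work" is queued */
	int warnings;           /* how many times the gated WARN_ON fired */
};

/* Enable path with the v2 gating: only warn outside a reset. */
static void gated_enable(struct reset_model *m)
{
	if (!m->reset_in_progress && m->pm_iir)
		m->warnings++;
}

/* Queue the enable for later instead of running it now. */
static void queue_enable_work(struct reset_model *m)
{
	m->work_pending = true;
}

/* Run any queued work, as a kworker eventually would. */
static void run_pending_work(struct reset_model *m)
{
	if (m->work_pending) {
		m->work_pending = false;
		gated_enable(m);
	}
}

/* The enable runs synchronously while the reset flag is still set. */
int warnings_sync_enable(void)
{
	struct reset_model m = { .pm_iir = 0x20 };

	m.reset_in_progress = true;
	gated_enable(&m);
	m.reset_in_progress = false;
	return m.warnings;
}

/* The enable is deferred and only runs after the reset flag is cleared. */
int warnings_deferred_enable(void)
{
	struct reset_model m = { .pm_iir = 0x20 };

	m.reset_in_progress = true;
	queue_enable_work(&m);
	m.reset_in_progress = false;
	run_pending_work(&m);
	return m.warnings;
}
```

In this model warnings_sync_enable() returns 0 and warnings_deferred_enable() returns 1, so the gate only helps when the enable runs inside the reset window.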
Chris Wilson Aug. 15, 2014, 6:55 a.m. UTC | #6
On Thu, Aug 14, 2014 at 05:45:59PM +0300, Ville Syrjälä wrote:
> On Thu, Aug 14, 2014 at 04:23:16PM +0200, Daniel Vetter wrote:
> > On Thu, Aug 14, 2014 at 03:46:43PM +0300, Mika Kuoppala wrote:
> > > We lost the software state tracking due to reset, so don't
> > > complain if it doesn't match.
> > 
> > This sounds more like gpu reset should be a bit more careful (even more
> > careful than we already are compared to earlier kernels) with making sure
> > the irq state is still sane after a reset?
> > 
> > Or what exactly is the failure mode here? The commit message lacks a bit of
> > detail, in the form of a nice text or, even better, a testcase ;-)
> 
> Killing the hpd irq and gt_powersave junk from i915_reset() would be my
> suggestion here. I don't even know why the hpd stuff is still there; we
> removed all the other irq frobbery from there a while back. And last I
> looked, gpu reset didn't affect the rc6/rps stuff either, though more
> testing should be done to make sure I didn't just imagine it.

I'd like a call to intel_mark_idle() since we are resetting the GPU
activity tracker. That is meant to do the right thing (dtrt) with all the
powersave tracking as well. (Well, if someone would just review some
patches, it might dtrt!)
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 12f4e14..7a1309c 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3593,7 +3593,8 @@  static void gen8_enable_rps_interrupts(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	spin_lock_irq(&dev_priv->irq_lock);
-	WARN_ON(dev_priv->rps.pm_iir);
+	if (!i915_reset_in_progress(&dev_priv->gpu_error))
+		WARN_ON(dev_priv->rps.pm_iir);
 	gen8_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
 	I915_WRITE(GEN8_GT_IIR(2), dev_priv->pm_rps_events);
 	spin_unlock_irq(&dev_priv->irq_lock);
@@ -3604,7 +3605,8 @@  static void gen6_enable_rps_interrupts(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	spin_lock_irq(&dev_priv->irq_lock);
-	WARN_ON(dev_priv->rps.pm_iir);
+	if (!i915_reset_in_progress(&dev_priv->gpu_error))
+		WARN_ON(dev_priv->rps.pm_iir);
 	gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
 	I915_WRITE(GEN6_PMIIR, dev_priv->pm_rps_events);
 	spin_unlock_irq(&dev_priv->irq_lock);
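The patch gates the existing WARN_ON on i915_reset_in_progress(). Below is a minimal userspace sketch of that condition. It assumes, for illustration only, that the reset state is an atomic counter whose low bit marks a reset in progress and whose top bit marks a wedged GPU. The model_* names and this bit layout are assumptions, not the driver's definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define MODEL_RESET_IN_PROGRESS_FLAG 1u
#define MODEL_WEDGED (1u << 31)

struct model_gpu_error {
	atomic_uint reset_counter;
};

void model_gpu_error_init(struct model_gpu_error *error)
{
	atomic_init(&error->reset_counter, 0u);
}

/* Mark a reset as in progress by setting the low bit. */
void model_begin_reset(struct model_gpu_error *error)
{
	atomic_fetch_or(&error->reset_counter, MODEL_RESET_IN_PROGRESS_FLAG);
}

/* Finish the reset: with the low bit set, adding 1 clears it and leaves
 * the counter even again. Only valid after model_begin_reset(). */
void model_end_reset(struct model_gpu_error *error)
{
	atomic_fetch_add(&error->reset_counter, 1u);
}

bool model_reset_in_progress(struct model_gpu_error *error)
{
	return (atomic_load(&error->reset_counter) &
		(MODEL_RESET_IN_PROGRESS_FLAG | MODEL_WEDGED)) != 0;
}

/* The patched condition: warn about stale pm_iir bits only outside a reset. */
bool model_should_warn(struct model_gpu_error *error, uint32_t pm_iir)
{
	return !model_reset_in_progress(error) && pm_iir != 0;
}
```

With this model, model_begin_reset() makes model_should_warn() return false even when bits are pending, and model_end_reset() restores the warning. The check therefore only covers enables that run between those two calls.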