Message ID | 1408020403-21956-1-git-send-email-mika.kuoppala@intel.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Aug 14, 2014 at 03:46:43PM +0300, Mika Kuoppala wrote:
> We lost the software state tracking due to reset, so don't
> complain if it doesn't match.

This sounds more like gpu reset should be a bit more careful (even more
careful than we already are compared to earlier kernels) with making sure
the irq state is still sane after a reset?

Or what exactly is the failure mode here? The commit message lacks a bit
details in form of a nice text or even better: A testcase ;-)

Thanks, Daniel

>
> v2: fix build error
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 12f4e14..7a1309c 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3593,7 +3593,8 @@ static void gen8_enable_rps_interrupts(struct drm_device *dev)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>
>  	spin_lock_irq(&dev_priv->irq_lock);
> -	WARN_ON(dev_priv->rps.pm_iir);
> +	if (!i915_reset_in_progress(&dev_priv->gpu_error))
> +		WARN_ON(dev_priv->rps.pm_iir);
>  	gen8_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
>  	I915_WRITE(GEN8_GT_IIR(2), dev_priv->pm_rps_events);
>  	spin_unlock_irq(&dev_priv->irq_lock);
> @@ -3604,7 +3605,8 @@ static void gen6_enable_rps_interrupts(struct drm_device *dev)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>
>  	spin_lock_irq(&dev_priv->irq_lock);
> -	WARN_ON(dev_priv->rps.pm_iir);
> +	if (!i915_reset_in_progress(&dev_priv->gpu_error))
> +		WARN_ON(dev_priv->rps.pm_iir);
>  	gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
>  	I915_WRITE(GEN6_PMIIR, dev_priv->pm_rps_events);
>  	spin_unlock_irq(&dev_priv->irq_lock);
> --
> 1.7.9.5
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Thu, Aug 14, 2014 at 04:23:16PM +0200, Daniel Vetter wrote:
> On Thu, Aug 14, 2014 at 03:46:43PM +0300, Mika Kuoppala wrote:
> > We lost the software state tracking due to reset, so don't
> > complain if it doesn't match.
>
> This sounds more like gpu reset should be a bit more careful (even more
> careful than we already are compared to earlier kernels) with making sure
> the irq state is still sane after a reset?
>
> Or what exactly is the failure mode here? The commit message lacks a bit
> details in form of a nice text or even better: A testcase ;-)

Killing the hpd irq and gt_powersave junk from i915_reset() would be my
suggestion here. I don't even know why the hpd stuff is still there, we
removed all the other irq frobbery from there a while back. And last I
looked gpu reset didn't affect the rc6/rps stuff either, though more
testing should be done to make sure I didn't just imagine it.

> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Thu, Aug 14, 2014 at 05:45:59PM +0300, Ville Syrjälä wrote:
> Killing the hpd irq and gt_powersave junk from i915_reset() would be my
> suggestion here. I don't even know why the hpd stuff is still there, we
> removed all the other irq frobbery from there a while back. And last I
> looked gpu reset didn't affect the rc6/rps stuff either, though more
> testing should be done to make sure I didn't just imagine it.

No idea why the hpd_init is still in there. Could be a merge artifact
with removing the irq handling in one patch and adding hpd_init in
another. I guess we could ditch it. The PM irq stuff is a bit more tricky
since the ring init resets them ...
-Daniel

> --
> Ville Syrjälä
> Intel OTC
Daniel Vetter <daniel@ffwll.ch> writes:

> On Thu, Aug 14, 2014 at 03:46:43PM +0300, Mika Kuoppala wrote:
>> We lost the software state tracking due to reset, so don't
>> complain if it doesn't match.
>
> This sounds more like gpu reset should be a bit more careful (even more
> careful than we already are compared to earlier kernels) with making sure
> the irq state is still sane after a reset?
>
> Or what exactly is the failure mode here? The commit message lacks a bit
> details in form of a nice text or even better: A testcase ;-)

We have a pm ref during reset. And then after reset, we kick
intel_gt_reset_powersave to re-enable the rps. Contrary to
suspend/thaw, we never disabled the interrupts. And the warn
triggers.

I tried to disable the interrupts during reset handling but the
nonblocking __wait_seqno() triggered another state warning, as
it was taking a pm ref during or right after reset recovery for hw
access.

-Mika
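The failure mode described above (interrupts are re-enabled after a reset without ever having been disabled, so the enable path finds stale software state) can be sketched as a toy user-space model. This is only an illustration under stated assumptions: `struct toy_rps` and every `toy_*` function are hypothetical names invented here, not i915 code, and the real driver does this under `irq_lock` against hardware registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the RPS fields of drm_i915_private. */
struct toy_rps {
	uint32_t pm_iir;       /* PM interrupt bits latched by the handler */
	bool irq_enabled;      /* are RPS interrupts currently unmasked? */
	unsigned int warnings; /* how often the WARN_ON-like check fired */
};

/* Interrupt handler: latch bits only while interrupts are enabled. */
void toy_rps_irq(struct toy_rps *rps, uint32_t bits)
{
	assert(rps != NULL);
	if (rps->irq_enabled)
		rps->pm_iir |= bits;
}

/* Suspend-style teardown: mask interrupts and forget latched bits. */
void toy_disable_rps_interrupts(struct toy_rps *rps)
{
	assert(rps != NULL);
	rps->irq_enabled = false;
	rps->pm_iir = 0;
}

/* Enable path: complains if bits are still latched, then unmasks. */
void toy_enable_rps_interrupts(struct toy_rps *rps)
{
	assert(rps != NULL);
	if (rps->pm_iir) {
		rps->warnings++;
		fprintf(stderr, "WARN: stale pm_iir 0x%x\n",
			(unsigned int)rps->pm_iir);
	}
	rps->irq_enabled = true;
}

/* Suspend/thaw: interrupts are disabled before being re-enabled. */
void toy_suspend_thaw(struct toy_rps *rps)
{
	toy_disable_rps_interrupts(rps);
	toy_enable_rps_interrupts(rps);
}

/* Reset as described in the mail: re-enable without a prior disable. */
void toy_reset(struct toy_rps *rps)
{
	toy_enable_rps_interrupts(rps);
}
```

In this model, an interrupt latched before `toy_suspend_thaw()` is dropped silently, while the same interrupt latched before `toy_reset()` makes the enable path warn, which mirrors the asymmetry between suspend/thaw and reset described in the mail.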
Mika Kuoppala <mika.kuoppala@linux.intel.com> writes:

> We have a pm ref during reset. And then after reset, we kick
> intel_gt_reset_powersave to re-enable the rps. Contrary to
> suspend/thaw, we never disabled the interrupts. And the warn
> triggers.
>
> I tried to disable the interrupts during reset handling but the
> nonblocking __wait_seqno() triggered another state warning, as
> it was taking a pm ref during or right after reset recovery for hw
> access.
> -Mika

Pretty difficult to hit also. I needed multiple tries of ctrl-c'ing the
process that submitted the hang while having another client running in
the background doing gpu access. Maybe a timing issue related to us
enabling the rps through a delayed workqueue?

Here is the trace:

[ 635.478701] [drm] Simulated gpu hang, resetting stop_rings
[ 637.457126] ------------[ cut here ]------------
[ 637.458711] WARNING: CPU: 5 PID: 3595 at drivers/gpu/drm/i915/intel_pm.c:3607 gen6_enable_rps_interrupts+0x72/0x80 [i915]()
[ 637.460361] Modules linked in: i915 drm_kms_helper drm kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq mxm_wmi snd_timer snd_seq_device psmouse snd serio_raw ehci_pci bnep ehci_hcd rfcomm soundcore bluetooth wmi mac_hid parport_pc ppdev lp parport dm_crypt usbhid firewire_ohci firewire_core crc_itu_t e1000e ptp pps_core xhci_hcd usbcore i2c_algo_bit video usb_common [last unloaded: drm]
[ 637.468170] CPU: 5 PID: 3595 Comm: kworker/5:0 Tainted: G W 3.16.0+ #240
[ 637.469545] Workqueue: events intel_gen6_powersave_work [i915]
[ 637.471042] 00000000 00000000 ca0d3e54 c15adcca f8898260 ca0d3e84 c1047224 c17536b0
[ 637.472616] 00000005 00000e0b f8898260 00000e17 f87ff852 f87ff852 f6ec8000 f6ecbe68
[ 637.474301] ee851c00 ca0d3e94 c1047262 00000009 00000000 ca0d3ea8 f87ff852 f6ec8000
[ 637.475920] Call Trace:
[ 637.477504] [<c15adcca>] dump_stack+0x48/0x60
[ 637.479060] [<c1047224>] warn_slowpath_common+0x84/0xa0
[ 637.480708] [<f87ff852>] ? gen6_enable_rps_interrupts+0x72/0x80 [i915]
[ 637.481880] [<f87ff852>] ? gen6_enable_rps_interrupts+0x72/0x80 [i915]
[ 637.483220] [<c1047262>] warn_slowpath_null+0x22/0x30
[ 637.484258] [<f87ff852>] gen6_enable_rps_interrupts+0x72/0x80 [i915]
[ 637.485503] [<f8808ecd>] intel_gen6_powersave_work+0x57d/0x1020 [i915]
[ 637.486516] [<c105e8bc>] process_one_work+0x10c/0x3c0
[ 637.487630] [<c105f523>] worker_thread+0xf3/0x470
[ 637.488618] [<c105f430>] ? create_and_start_worker+0x50/0x50
[ 637.489802] [<c1064cdb>] kthread+0x9b/0xb0
[ 637.490804] [<c15b4e01>] ret_from_kernel_thread+0x21/0x30
[ 637.491872] [<c1064c40>] ? flush_kthread_worker+0xb0/0xb0
[ 637.492862] ---[ end trace b31c16cec8a7abaa ]---

-Mika
On Thu, Aug 14, 2014 at 05:45:59PM +0300, Ville Syrjälä wrote:
> Killing the hpd irq and gt_powersave junk from i915_reset() would be my
> suggestion here. I don't even know why the hpd stuff is still there, we
> removed all the other irq frobbery from there a while back. And last I
> looked gpu reset didn't affect the rc6/rps stuff either, though more
> testing should be done to make sure I didn't just imagine it.

I'd like a call to intel_mark_idle() since we are resetting the GPU
activity tracker. That is meant to dtrt with all the powersave tracking
as well. (Well if someone would just review some patches it might dtrt!)
-Chris
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 12f4e14..7a1309c 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3593,7 +3593,8 @@ static void gen8_enable_rps_interrupts(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	spin_lock_irq(&dev_priv->irq_lock);
-	WARN_ON(dev_priv->rps.pm_iir);
+	if (!i915_reset_in_progress(&dev_priv->gpu_error))
+		WARN_ON(dev_priv->rps.pm_iir);
 	gen8_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
 	I915_WRITE(GEN8_GT_IIR(2), dev_priv->pm_rps_events);
 	spin_unlock_irq(&dev_priv->irq_lock);
@@ -3604,7 +3605,8 @@ static void gen6_enable_rps_interrupts(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	spin_lock_irq(&dev_priv->irq_lock);
-	WARN_ON(dev_priv->rps.pm_iir);
+	if (!i915_reset_in_progress(&dev_priv->gpu_error))
+		WARN_ON(dev_priv->rps.pm_iir);
 	gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
 	I915_WRITE(GEN6_PMIIR, dev_priv->pm_rps_events);
 	spin_unlock_irq(&dev_priv->irq_lock);
We lost the software state tracking due to reset, so don't
complain if it doesn't match.

v2: fix build error

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
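The effect of the change can be illustrated with a small user-space sketch that contrasts the unconditional check with the one gated on a reset being in progress. Everything here is an assumption made for illustration: `struct toy_dev`, its fields, and the `toy_*` functions are hypothetical, the `reset_in_progress` flag merely stands in for `i915_reset_in_progress()`, and a counter replaces `WARN_ON()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device state; not the real drm_i915_private. */
struct toy_dev {
	uint32_t pm_iir;        /* software copy of pending PM interrupt bits */
	bool reset_in_progress; /* stand-in for i915_reset_in_progress() */
	unsigned int warnings;  /* counts fired warnings instead of printing */
};

/* Pre-patch behaviour: always complain about stale software state. */
void toy_enable_rps_unguarded(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->pm_iir)
		dev->warnings++;
}

/* Patched behaviour: skip the check while a reset is being handled. */
void toy_enable_rps_guarded(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (!dev->reset_in_progress && dev->pm_iir)
		dev->warnings++;
}
```

Note that in this sketch the guarded variant only suppresses the complaint; the stale `pm_iir` value is left untouched, which is the concern raised in the review replies about making the irq state sane after a reset rather than hiding the mismatch.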