drm/i915/guc: Flush the CT worker after updating log controls
diff mbox series

Message ID 20190718080203.24704-1-chris@chris-wilson.co.uk
State New
Headers show
Series
  • drm/i915/guc: Flush the CT worker after updating log controls
Related show

Commit Message

Chris Wilson July 18, 2019, 8:02 a.m. UTC
The log is not flushed from inside the irq, but from a worker spawned by
the interrupt. Flushing an on-going interrupt (intel_synchronize_irq)
does not flush a concurrent intel_guc_log_handle_flush_event() and
instead we must flush the guc_ct.worker.

<7> [159.391930] [IGT] Reading file "i915_guc_log_relay"
<5> [159.392247] i915 0000:00:02.0: GuC log buffer overflow
<5> [159.392259] i915 0000:00:02.0: GuC log buffer overflow
<4> [159.393301] ------------[ cut here ]------------
<4> [159.393303] WARN_ON(!intel_guc_log_relay_enabled(log))
<4> [159.393353] WARNING: CPU: 4 PID: 290 at drivers/gpu/drm/i915/gt/uc/intel_guc_log.c:247 guc_log_capture_logs+0x39d/0x470 [i915]
<4> [159.393355] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp snd_hda_intel snd_hda_codec mei_hdcp crct10dif_pclmul e1000e cdc_ether crc32_pclmul snd_hwdep usbnet mii snd_hda_core ghash_clmulni_intel snd_pcm ptp
<7> [159.393363] [IGT] Reading file "i915_guc_log_level"
<4> [159.393364]  pps_core mei_me mei prime_numbers
<4> [159.393368] CPU: 4 PID: 290 Comm: kworker/4:1H Tainted: G     U            5.2.0-CI-CI_DRM_6502+ #1
<4> [159.393369] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
<4> [159.393402] Workqueue: events_highpri capture_logs_work [i915]
<4> [159.393433] RIP: 0010:guc_log_capture_logs+0x39d/0x470 [i915]
<4> [159.393436] Code: c9 3a a0 e8 95 8e 3d e1 4d 89 f3 4c 8b 14 24 48 89 da e9 3f fe ff ff 48 c7 c6 c8 55 38 a0 48 c7 c7 cc c9 3a a0 e8 93 c1 e2 e0 <0f> 0b e9 c5 fe ff ff 48 c7 c6 00 30 35 a0 48 c7 c7 c0 ee 42 a0 e8
<4> [159.393439] RSP: 0018:ffffc900003b3dc0 EFLAGS: 00010282
<7> [159.393440] [IGT] Reading file "i915_dp_test_active"
<4> [159.393442] RAX: 0000000000000000 RBX: ffff88848ac0b8e0 RCX: 0000000000000004
<4> [159.393446] RDX: 000000000000202b RSI: ffff888497b64908 RDI: ffffffff8212d681
<4> [159.393448] RBP: ffffc900003b3ec8 R08: 00000000ca1d5cb3 R09: 0000000000000000
<4> [159.393451] R10: ffffc900003b3dc0 R11: 0000000000000000 R12: ffff88849fe39b40
<4> [159.393453] R13: ffff88848ac0b8c8 R14: 0000000000000000 R15: 0000000000000000
<4> [159.393455] FS:  0000000000000000(0000) GS:ffff88849fe00000(0000) knlGS:0000000000000000
<4> [159.393457] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [159.393460] CR2: 00007f42b9b35778 CR3: 0000000005210006 CR4: 0000000000760ee0
<4> [159.393462] PKRU: 55555554
<4> [159.393463] Call Trace:
<4> [159.393469]  ? process_one_work+0x1bf/0x610
<4> [159.393474]  process_one_work+0x245/0x610
<4> [159.393479]  worker_thread+0x37/0x380
<4> [159.393482]  ? process_one_work+0x610/0x610
<4> [159.393484]  kthread+0x119/0x130
<4> [159.393487]  ? kthread_park+0x80/0x80
<4> [159.393491]  ret_from_fork+0x24/0x50

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Michal Wajdeczko July 18, 2019, 12:53 p.m. UTC | #1
On Thu, 18 Jul 2019 10:02:03 +0200, Chris Wilson  
<chris@chris-wilson.co.uk> wrote:

...

> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c  
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> index 77fda1e85d3b..e80c1eb309ee 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> @@ -613,12 +613,11 @@ void intel_guc_log_relay_flush(struct  
> intel_guc_log *log)
>  void intel_guc_log_relay_close(struct intel_guc_log *log)
>  {
>  	struct intel_guc *guc = log_to_guc(log);
> -	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> 	guc_log_disable_flush_events(log);
> -	intel_synchronize_irq(i915);
> -	flush_work(&log->relay.flush_work);
> +	flush_work(&guc->ct.worker);

We don't need to touch ct.worker here as in above call to
guc_log_disable_flush_events() we will turn off handling of
all log events (they will be dropped if processed later by
the ct.worker in intel_guc_to_host_process_recv_msg)

Additionally touching ct.worker here will break our nice
separation between guc components

Michal

> +	cancel_work_sync(&log->relay.flush_work);
> 	mutex_lock(&log->relay.lock);
>  	GEM_BUG_ON(!intel_guc_log_relay_enabled(log));
Chris Wilson July 18, 2019, 12:59 p.m. UTC | #2
Quoting Michal Wajdeczko (2019-07-18 13:53:47)
> On Thu, 18 Jul 2019 10:02:03 +0200, Chris Wilson  
> <chris@chris-wilson.co.uk> wrote:
> 
> ...
> 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c  
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> > index 77fda1e85d3b..e80c1eb309ee 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> > @@ -613,12 +613,11 @@ void intel_guc_log_relay_flush(struct  
> > intel_guc_log *log)
> >  void intel_guc_log_relay_close(struct intel_guc_log *log)
> >  {
> >       struct intel_guc *guc = log_to_guc(log);
> > -     struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> >       guc_log_disable_flush_events(log);
> > -     intel_synchronize_irq(i915);
> > -     flush_work(&log->relay.flush_work);
> > +     flush_work(&guc->ct.worker);
> 
> We don't need to touch ct.worker here as in above call to
> guc_log_disable_flush_events() we will turn off handling of
> all log events (they will be dropped if processed later by
> the ct.worker in intel_guc_to_host_process_recv_msg)

But... The check on guc->msg_enabled_mask is unserialised and you must
flush the worker to flush a concurrent enqueue as we cancel the
flush_work. Previously that was given by the sync_irq, but the
queue_work is no longer inside the irq, so it does nothing.
-Chris
Chris Wilson Sept. 19, 2019, 8:19 p.m. UTC | #3
Quoting Chris Wilson (2019-07-18 09:02:03)
> The log is not flushed from inside the irq, but from a worker spawned by
> the interrupt. Flushing an on-going interrupt (intel_synchronize_irq)
> does not flush a concurrent intel_guc_log_handle_flush_event() and
> instead we must flush the guc_ct.worker.
> 
> <7> [159.391930] [IGT] Reading file "i915_guc_log_relay"
> <5> [159.392247] i915 0000:00:02.0: GuC log buffer overflow
> <5> [159.392259] i915 0000:00:02.0: GuC log buffer overflow
> <4> [159.393301] ------------[ cut here ]------------
> <4> [159.393303] WARN_ON(!intel_guc_log_relay_enabled(log))
> <4> [159.393353] WARNING: CPU: 4 PID: 290 at drivers/gpu/drm/i915/gt/uc/intel_guc_log.c:247 guc_log_capture_logs+0x39d/0x470 [i915]
> <4> [159.393355] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp snd_hda_intel snd_hda_codec mei_hdcp crct10dif_pclmul e1000e cdc_ether crc32_pclmul snd_hwdep usbnet mii snd_hda_core ghash_clmulni_intel snd_pcm ptp
> <7> [159.393363] [IGT] Reading file "i915_guc_log_level"
> <4> [159.393364]  pps_core mei_me mei prime_numbers
> <4> [159.393368] CPU: 4 PID: 290 Comm: kworker/4:1H Tainted: G     U            5.2.0-CI-CI_DRM_6502+ #1
> <4> [159.393369] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
> <4> [159.393402] Workqueue: events_highpri capture_logs_work [i915]
> <4> [159.393433] RIP: 0010:guc_log_capture_logs+0x39d/0x470 [i915]
> <4> [159.393436] Code: c9 3a a0 e8 95 8e 3d e1 4d 89 f3 4c 8b 14 24 48 89 da e9 3f fe ff ff 48 c7 c6 c8 55 38 a0 48 c7 c7 cc c9 3a a0 e8 93 c1 e2 e0 <0f> 0b e9 c5 fe ff ff 48 c7 c6 00 30 35 a0 48 c7 c7 c0 ee 42 a0 e8
> <4> [159.393439] RSP: 0018:ffffc900003b3dc0 EFLAGS: 00010282
> <7> [159.393440] [IGT] Reading file "i915_dp_test_active"
> <4> [159.393442] RAX: 0000000000000000 RBX: ffff88848ac0b8e0 RCX: 0000000000000004
> <4> [159.393446] RDX: 000000000000202b RSI: ffff888497b64908 RDI: ffffffff8212d681
> <4> [159.393448] RBP: ffffc900003b3ec8 R08: 00000000ca1d5cb3 R09: 0000000000000000
> <4> [159.393451] R10: ffffc900003b3dc0 R11: 0000000000000000 R12: ffff88849fe39b40
> <4> [159.393453] R13: ffff88848ac0b8c8 R14: 0000000000000000 R15: 0000000000000000
> <4> [159.393455] FS:  0000000000000000(0000) GS:ffff88849fe00000(0000) knlGS:0000000000000000
> <4> [159.393457] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [159.393460] CR2: 00007f42b9b35778 CR3: 0000000005210006 CR4: 0000000000760ee0
> <4> [159.393462] PKRU: 55555554
> <4> [159.393463] Call Trace:
> <4> [159.393469]  ? process_one_work+0x1bf/0x610
> <4> [159.393474]  process_one_work+0x245/0x610
> <4> [159.393479]  worker_thread+0x37/0x380
> <4> [159.393482]  ? process_one_work+0x610/0x610
> <4> [159.393484]  kthread+0x119/0x130
> <4> [159.393487]  ? kthread_park+0x80/0x80
> <4> [159.393491]  ret_from_fork+0x24/0x50
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> index 77fda1e85d3b..e80c1eb309ee 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> @@ -613,12 +613,11 @@ void intel_guc_log_relay_flush(struct intel_guc_log *log)
>  void intel_guc_log_relay_close(struct intel_guc_log *log)
>  {
>         struct intel_guc *guc = log_to_guc(log);
> -       struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
>  
>         guc_log_disable_flush_events(log);
> -       intel_synchronize_irq(i915);

Ping.

intel_synchronize_irq() is ineffective here.

> -       flush_work(&log->relay.flush_work);
> +       flush_work(&guc->ct.worker);
> +       cancel_work_sync(&log->relay.flush_work);

Your objection was that you haven't provided a guc_ct_sync()...
-Chris

Patch
diff mbox series

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index 77fda1e85d3b..e80c1eb309ee 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -613,12 +613,11 @@  void intel_guc_log_relay_flush(struct intel_guc_log *log)
 void intel_guc_log_relay_close(struct intel_guc_log *log)
 {
 	struct intel_guc *guc = log_to_guc(log);
-	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
 
 	guc_log_disable_flush_events(log);
-	intel_synchronize_irq(i915);
 
-	flush_work(&log->relay.flush_work);
+	flush_work(&guc->ct.worker);
+	cancel_work_sync(&log->relay.flush_work);
 
 	mutex_lock(&log->relay.lock);
 	GEM_BUG_ON(!intel_guc_log_relay_enabled(log));