drm/i915: Watchdog timeout: IRQ handler for gen8+

Message ID 20190105024001.37629-3-carlos.santa@intel.com (mailing list archive)
State New, archived
Series drm/i915: Watchdog timeout: IRQ handler for gen8+

Commit Message

Santa, Carlos Jan. 5, 2019, 2:39 a.m. UTC
From: Michel Thierry <michel.thierry@intel.com>

*** General ***

Watchdog timeout (or "media engine reset") is a feature that allows
userland applications to enable hang detection on individual batch buffers.
The detection mechanism itself is mostly bound to the hardware and the only
thing that the driver needs to do to support this form of hang detection
is to implement the interrupt handling support as well as watchdog command
emission before and after the emitted batch buffer start instruction in the
ring buffer.

The principle of the hang detection mechanism is as follows:

1. Once the decision has been made to enable watchdog timeout for a
particular batch buffer, and the driver is in the process of emitting the
batch buffer start instruction into the ring buffer, it also emits a
watchdog timer start instruction before and a watchdog timer cancellation
instruction after the batch buffer start instruction in the ring buffer.

2. Once GPU execution reaches the watchdog timer start instruction, the
hardware starts the watchdog counter. The counter keeps counting until it
either reaches a previously configured threshold value or the timer
cancellation instruction is executed.

2a. If the counter reaches the threshold value, the hardware fires a
watchdog interrupt that is picked up by the watchdog interrupt handler.
This means that a hang has been detected and the driver needs to deal with
it the same way it would deal with an engine hang detected by the periodic
hang checker. The only difference between the two is that we have already
blamed the active request (to ensure an engine reset).

2b. If the batch buffer completes and the execution reaches the watchdog
cancellation instruction before the watchdog counter reaches its
threshold value, the watchdog is cancelled and nothing more comes of it.
No hang is detected.

Note about future interaction with preemption: preemption could happen
in a command sequence prior to the watchdog counter getting disabled,
resulting in the watchdog being triggered following preemption (e.g. when
the watchdog had been enabled in the low-priority batch). The driver will
need to explicitly disable the watchdog counter as part of the
preemption sequence.
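
For illustration, the emission described in step 1 could look roughly like
the sketch below. This is not part of this patch (the emission side lands
separately in the series); gen8_emit_start_watchdog() and
gen8_emit_stop_watchdog() are illustrative names only, reusing the
RING_CNTR/RING_THRESH/get_watchdog_disable() definitions added here.

static int gen8_emit_start_watchdog(struct i915_request *rq, u32 threshold)
{
        struct intel_engine_cs *engine = rq->engine;
        u32 *cs;

        cs = intel_ring_begin(rq, 6);
        if (IS_ERR(cs))
                return PTR_ERR(cs);

        /* Program the threshold, then (re)start the counter */
        *cs++ = MI_LOAD_REGISTER_IMM(2);
        *cs++ = i915_mmio_reg_offset(RING_THRESH(engine->mmio_base));
        *cs++ = threshold;
        *cs++ = i915_mmio_reg_offset(RING_CNTR(engine->mmio_base));
        *cs++ = GEN8_WATCHDOG_ENABLE;
        *cs++ = MI_NOOP;
        intel_ring_advance(rq, cs);

        return 0;
}

static int gen8_emit_stop_watchdog(struct i915_request *rq)
{
        struct intel_engine_cs *engine = rq->engine;
        u32 *cs;

        cs = intel_ring_begin(rq, 4);
        if (IS_ERR(cs))
                return PTR_ERR(cs);

        /* Cancel the counter; the disable value differs for gen8 xCS */
        *cs++ = MI_LOAD_REGISTER_IMM(1);
        *cs++ = i915_mmio_reg_offset(RING_CNTR(engine->mmio_base));
        *cs++ = get_watchdog_disable(engine);
        *cs++ = MI_NOOP;
        intel_ring_advance(rq, cs);

        return 0;
}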

*** This patch introduces: ***

1. IRQ handler code for the watchdog timeout, allowing hang recovery based
on hardware-driven hang detection by feeding directly into the existing
hang recovery path. This is independent of whether per-engine reset or
only full GPU reset is available.

2. Watchdog specific register information.

Currently the render engine and all available media engines support
watchdog timeout (VECS is only supported from GEN9 onwards). The
specifications allude to the BCS engine also being supported, but that is
not enabled by this commit.

Note that the value used to stop the counter differs between the render
and non-render engines on GEN8; from GEN9 onwards it is the same.

v2: Move irq handler to tasklet, arm the watchdog a second time to guard
against false positives.

v3: Don't use high priority tasklet, use engine_last_submit while
checking for false-positives. From GEN9 onwards, the stop counter bit is
the same for all engines.

v4: Remove unnecessary brackets, use current_seqno to mark the request
as guilty in the hangcheck/capture code.

v5: Rebased after RESET_ENGINEs flag.

v6: Don't capture error state in case of watchdog timeout. The capture
process is time consuming and this aligns with what happens when we
use GuC to handle the watchdog timeout. (Chris)

v7: Rebase.

v8: Rebase, use HZ to reschedule.

v9: Rebase, get forcewake domains in function (no longer in execlists
struct).

v10: Rebase.

Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.h   |  4 ++
 drivers/gpu/drm/i915/i915_irq.c         | 14 +++-
 drivers/gpu/drm/i915/i915_reg.h         |  6 ++
 drivers/gpu/drm/i915/intel_hangcheck.c  | 17 +++--
 drivers/gpu/drm/i915/intel_lrc.c        | 86 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  6 ++
 6 files changed, 126 insertions(+), 7 deletions(-)

Comments

Tvrtko Ursulin Jan. 7, 2019, 11:58 a.m. UTC | #1
Hi,

This series has not been recognized by Patchwork as such, nor are the 
patches numbered. Have you used git format-patch -<N> --cover-letter and 
git send-email to send it out?

Rest inline.

On 05/01/2019 02:39, Carlos Santa wrote:
> From: Michel Thierry <michel.thierry@intel.com>
> 
> *** General ***
> 
> Watchdog timeout (or "media engine reset") is a feature that allows
> userland applications to enable hang detection on individual batch buffers.
> The detection mechanism itself is mostly bound to the hardware and the only
> thing that the driver needs to do to support this form of hang detection
> is to implement the interrupt handling support as well as watchdog command
> emission before and after the emitted batch buffer start instruction in the
> ring buffer.
> 
> The principle of the hang detection mechanism is as follows:
> 
> 1. Once the decision has been made to enable watchdog timeout for a
> particular batch buffer and the driver is in the process of emitting the
> batch buffer start instruction into the ring buffer it also emits a
> watchdog timer start instruction before and a watchdog timer cancellation
> instruction after the batch buffer start instruction in the ring buffer.
> 
> 2. Once the GPU execution reaches the watchdog timer start instruction
> the hardware watchdog counter is started by the hardware. The counter
> keeps counting until either reaching a previously configured threshold
> value or the timer cancellation instruction is executed.
> 
> 2a. If the counter reaches the threshold value the hardware fires a
> watchdog interrupt that is picked up by the watchdog interrupt handler.
> This means that a hang has been detected and the driver needs to deal with
> it the same way it would deal with an engine hang detected by the periodic
> hang checker. The only difference between the two is that we already blamed
> the active request (to ensure an engine reset).

What happens if the watchdog fires but the "guilty" request completes 
before the interrupt has been delivered, or acted upon? Would that mean 
an innocent request could be blamed for the timeout? Maybe the answer 
comes later in the patch/series.

> 
> 2b. If the batch buffer completes and the execution reaches the watchdog
> cancellation instruction before the watchdog counter reaches its
> threshold value the watchdog is cancelled and nothing more comes of it.
> No hang is detected.
> 
> Note about future interaction with preemption: Preemption could happen
> in a command sequence prior to watchdog counter getting disabled,
> resulting in watchdog being triggered following preemption (e.g. when
> watchdog had been enabled in the low priority batch). The driver will
> need to explicitly disable the watchdog counter as part of the
> preemption sequence.

Does the series take care of preemption?

> 
> *** This patch introduces: ***
> 
> 1. IRQ handler code for watchdog timeout allowing direct hang recovery
> based on hardware-driven hang detection, which then integrates directly
> with the hang recovery path. This is independent of having per-engine reset
> or just full gpu reset.
> 
> 2. Watchdog specific register information.
> 
> Currently the render engine and all available media engines support
> watchdog timeout (VECS is only supported in GEN9). The specifications allude
> to the BCS engine being supported but that is currently not supported by
> this commit.
> 
> Note that the value to stop the counter is different between render and
> non-render engines in GEN8; GEN9 onwards it's the same.
> 
> v2: Move irq handler to tasklet, arm watchdog for a 2nd time to check
> against false-positives.
> 
> v3: Don't use high priority tasklet, use engine_last_submit while
> checking for false-positives. From GEN9 onwards, the stop counter bit is
> the same for all engines.
> 
> v4: Remove unnecessary brackets, use current_seqno to mark the request
> as guilty in the hangcheck/capture code.
> 
> v5: Rebased after RESET_ENGINEs flag.
> 
> v6: Don't capture error state in case of watchdog timeout. The capture
> process is time consuming and this will align to what happens when we
> use GuC to handle the watchdog timeout. (Chris)
> 
> v7: Rebase.
> 
> v8: Rebase, use HZ to reschedule.
> 
> v9: Rebase, get forcewake domains in function (no longer in execlists
> struct).
> 
> v10: Rebase.
> 
> Cc: Antonio Argenziano <antonio.argenziano@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> Signed-off-by: Carlos Santa <carlos.santa@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gpu_error.h   |  4 ++
>   drivers/gpu/drm/i915/i915_irq.c         | 14 +++-
>   drivers/gpu/drm/i915/i915_reg.h         |  6 ++
>   drivers/gpu/drm/i915/intel_hangcheck.c  | 17 +++--
>   drivers/gpu/drm/i915/intel_lrc.c        | 86 +++++++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  6 ++
>   6 files changed, 126 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index 6d9f45468ac1..7130786aa5b4 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -256,6 +256,9 @@ struct i915_gpu_error {
>   	 * inspect the bit and do the reset directly, otherwise the worker
>   	 * waits for the struct_mutex.
>   	 *
> +	 * #I915_RESET_WATCHDOG - When hw detects a hang before us, we can use
> +	 * I915_RESET_WATCHDOG to report the hang detection cause accurately.
> +	 *
>   	 * #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
>   	 * acquire the struct_mutex to reset an engine, we need an explicit
>   	 * flag to prevent two concurrent reset attempts in the same engine.
> @@ -271,6 +274,7 @@ struct i915_gpu_error {
>   #define I915_RESET_BACKOFF	0
>   #define I915_RESET_HANDOFF	1
>   #define I915_RESET_MODESET	2
> +#define I915_RESET_WATCHDOG	3
>   #define I915_WEDGED		(BITS_PER_LONG - 1)
>   #define I915_RESET_ENGINE	(I915_WEDGED - I915_NUM_ENGINES)
>   
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index fbb094ecf6c9..859bbadb752f 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1498,6 +1498,9 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
>   
>   	if (tasklet)
>   		tasklet_hi_schedule(&engine->execlists.tasklet);
> +
> +	if (iir & (GT_GEN8_WATCHDOG_INTERRUPT))

Braces are not needed.

> +		tasklet_schedule(&engine->execlists.watchdog_tasklet);
>   }
>   
>   static void gen8_gt_irq_ack(struct drm_i915_private *i915,
> @@ -3329,7 +3332,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
>   	if (intel_has_reset_engine(dev_priv) &&
>   	    !i915_terminally_wedged(&dev_priv->gpu_error)) {
>   		for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
> -			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
> +			BUILD_BUG_ON(I915_RESET_WATCHDOG >= I915_RESET_ENGINE);
>   			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
>   					     &dev_priv->gpu_error.flags))
>   				continue;
> @@ -4162,12 +4165,15 @@ static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv)
>   	uint32_t gt_interrupts[] = {
>   		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
>   			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
> +			GT_GEN8_WATCHDOG_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
>   			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT |
>   			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
>   		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
>   			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
> +			GT_GEN8_WATCHDOG_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
>   			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT |
> -			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
> +			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT |
> +			GT_GEN8_WATCHDOG_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
>   		0,
>   		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT |
>   			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT
> @@ -4176,6 +4182,10 @@ static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv)
>   	if (HAS_L3_DPF(dev_priv))
>   		gt_interrupts[0] |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
>   
> +	/* VECS watchdog is only available in skl+ */
> +	if (INTEL_GEN(dev_priv) >= 9)
> +		gt_interrupts[3] |= GT_GEN8_WATCHDOG_INTERRUPT;

Is the shift missing here?
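
Presumably (just a guess at the intent) it was meant to be:

        gt_interrupts[3] |= GT_GEN8_WATCHDOG_INTERRUPT << GEN8_VECS_IRQ_SHIFT;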

> +
>   	dev_priv->pm_ier = 0x0;
>   	dev_priv->pm_imr = ~dev_priv->pm_ier;
>   	GEN8_IRQ_INIT_NDX(GT, 0, ~gt_interrupts[0], gt_interrupts[0]);
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 44958d994bfa..fff330643090 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -2335,6 +2335,11 @@ enum i915_power_well_id {
>   #define RING_START(base)	_MMIO((base) + 0x38)
>   #define RING_CTL(base)		_MMIO((base) + 0x3c)
>   #define   RING_CTL_SIZE(size)	((size) - PAGE_SIZE) /* in bytes -> pages */
> +#define RING_CNTR(base)		_MMIO((base) + 0x178)
> +#define   GEN8_WATCHDOG_ENABLE		0
> +#define   GEN8_WATCHDOG_DISABLE		1
> +#define   GEN8_XCS_WATCHDOG_DISABLE	0xFFFFFFFF /* GEN8 & non-render only */
> +#define RING_THRESH(base)	_MMIO((base) + 0x17C)
>   #define RING_SYNC_0(base)	_MMIO((base) + 0x40)
>   #define RING_SYNC_1(base)	_MMIO((base) + 0x44)
>   #define RING_SYNC_2(base)	_MMIO((base) + 0x48)
> @@ -2894,6 +2899,7 @@ enum i915_power_well_id {
>   #define GT_BSD_USER_INTERRUPT			(1 << 12)
>   #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1	(1 << 11) /* hsw+; rsvd on snb, ivb, vlv */
>   #define GT_CONTEXT_SWITCH_INTERRUPT		(1 <<  8)
> +#define GT_GEN8_WATCHDOG_INTERRUPT		(1 <<  6) /* gen8+ */
>   #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT	(1 <<  5) /* !snb */
>   #define GT_RENDER_PIPECTL_NOTIFY_INTERRUPT	(1 <<  4)
>   #define GT_RENDER_CS_MASTER_ERROR_INTERRUPT	(1 <<  3)
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index 51e9efec5116..2906f0ef3d77 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -213,7 +213,8 @@ static void hangcheck_accumulate_sample(struct intel_engine_cs *engine,
>   
>   static void hangcheck_declare_hang(struct drm_i915_private *i915,
>   				   unsigned int hung,
> -				   unsigned int stuck)
> +				   unsigned int stuck,
> +				   unsigned int watchdog)
>   {
>   	struct intel_engine_cs *engine;
>   	char msg[80];
> @@ -226,13 +227,16 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
>   	if (stuck != hung)
>   		hung &= ~stuck;
>   	len = scnprintf(msg, sizeof(msg),
> -			"%s on ", stuck == hung ? "no progress" : "hang");
> +			"%s on ", watchdog ? "watchdog timeout" :
> +				  stuck == hung ? "no progress" : "hang");
>   	for_each_engine_masked(engine, i915, hung, tmp)
>   		len += scnprintf(msg + len, sizeof(msg) - len,
>   				 "%s, ", engine->name);
>   	msg[len-2] = '\0';
>   
> -	return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
> +	return i915_handle_error(i915, hung,
> +				 watchdog ? 0 : I915_ERROR_CAPTURE,
> +				 "%s", msg);
>   }
>   
>   /*
> @@ -250,7 +254,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   			     gpu_error.hangcheck_work.work);
>   	struct intel_engine_cs *engine;
>   	enum intel_engine_id id;
> -	unsigned int hung = 0, stuck = 0, wedged = 0;
> +	unsigned int hung = 0, stuck = 0, wedged = 0, watchdog = 0;
>   
>   	if (!i915_modparams.enable_hangcheck)
>   		return;
> @@ -261,6 +265,9 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   	if (i915_terminally_wedged(&dev_priv->gpu_error))
>   		return;
>   
> +	if (test_and_clear_bit(I915_RESET_WATCHDOG, &dev_priv->gpu_error.flags))
> +		watchdog = 1;
> +
>   	/* As enabling the GPU requires fairly extensive mmio access,
>   	 * periodically arm the mmio checker to see if we are triggering
>   	 * any invalid access.
> @@ -293,7 +300,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   	}
>   
>   	if (hung)
> -		hangcheck_declare_hang(dev_priv, hung, stuck);
> +		hangcheck_declare_hang(dev_priv, hung, stuck, watchdog);
>   
>   	/* Reset timer in case GPU hangs without another request being added */
>   	i915_queue_hangcheck(dev_priv);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 6c98fb7cebf2..e1dcdf545bee 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2027,6 +2027,70 @@ static int gen8_emit_flush_render(struct i915_request *request,
>   	return 0;
>   }
>   
> +/* From GEN9 onwards, all engines use the same RING_CNTR format */
> +static inline u32 get_watchdog_disable(struct intel_engine_cs *engine)
> +{
> +	if (engine->id == RCS || INTEL_GEN(engine->i915) >= 9)
> +		return GEN8_WATCHDOG_DISABLE;
> +	else
> +		return GEN8_XCS_WATCHDOG_DISABLE;
> +}
> +
> +#define GEN8_WATCHDOG_1000US 0x2ee0 //XXX: Temp, replace with helper function

Please do then. :)
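
Something like the sketch below perhaps; this assumes the watchdog counter
ticks at the CS timestamp frequency (12 MHz on gen9, which is consistent
with 0x2ee0 counts for 1000us), and the exact device info accessor depends
on the baseline:

static u32 watchdog_us_to_counts(struct drm_i915_private *dev_priv, u32 us)
{
        /*
         * Assumption: counter ticks at the CS timestamp frequency (kHz);
         * INTEL_INFO() or RUNTIME_INFO() depending on the baseline.
         */
        u64 freq_khz = INTEL_INFO(dev_priv)->cs_timestamp_frequency_khz;

        return DIV_ROUND_UP_ULL(freq_khz * us, USEC_PER_MSEC);
}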

> +static void gen8_watchdog_irq_handler(unsigned long data)
> +{
> +	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	enum forcewake_domains fw_domains;
> +	u32 current_seqno;
> +
> +	switch (engine->id) {
> +	default:
> +		MISSING_CASE(engine->id);
> +		/* fall through */
> +	case RCS:
> +		fw_domains = FORCEWAKE_RENDER;
> +		break;
> +	case VCS:
> +	case VCS2:
> +	case VECS:
> +		fw_domains = FORCEWAKE_MEDIA;
> +		break;
> +	}
> +
> +	intel_uncore_forcewake_get(dev_priv, fw_domains);

I'd be tempted to drop this and just use I915_WRITE. It doesn't feel 
like there is any performance to be gained with it and it embeds too 
much knowledge here.

Alternatively, if you want to keep it, consider using 
intel_uncore_forcewake_for_reg to leave the fw domain knowledge out of 
here. See for instance how it is used in intel_engine_cs.c.
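
Untested sketch of what I mean, deriving the domains from the register
itself instead of from engine->id:

        fw_domains = intel_uncore_forcewake_for_reg(dev_priv,
                                                    RING_CNTR(engine->mmio_base),
                                                    FW_REG_WRITE);
        intel_uncore_forcewake_get(dev_priv, fw_domains);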

> +
> +	/* Stop the counter to prevent further timeout interrupts */
> +	I915_WRITE_FW(RING_CNTR(engine->mmio_base), get_watchdog_disable(engine));

What if we disable the watchdog for a batch following the falsely 
accused one? I mean this:

1. Batch 1 runs
2. IRQ fires -> tasklet_schedule
3. Batch 2 runs (can be different context)
4. Tasklet runs
5. Watchdog gets disabled
6. Batch 2 hangs - but watchdog has been disabled

?

> +
> +	current_seqno = intel_engine_get_seqno(engine);
> +
> +	/* did the request complete after the timer expired? */
> +	if (intel_engine_last_submit(engine) == current_seqno)
> +		goto fw_put;
> +
> +	if (engine->hangcheck.watchdog == current_seqno) {
> +		/* Make sure the active request will be marked as guilty */
> +		engine->hangcheck.stalled = true;
> +		engine->hangcheck.acthd = intel_engine_get_active_head(engine);
> +		engine->hangcheck.seqno = current_seqno;
> +
> +		/* And try to run the hangcheck_work as soon as possible */
> +		set_bit(I915_RESET_WATCHDOG, &dev_priv->gpu_error.flags);
> +		queue_delayed_work(system_long_wq,
> +				   &dev_priv->gpu_error.hangcheck_work,
> +				   round_jiffies_up_relative(HZ));
> +	} else {
> +		engine->hangcheck.watchdog = current_seqno;

The logic above potentially handles my previous question? Could be if 
batch 2 hangs. But..

> +		/* Re-start the counter, if really hung, it will expire again */
> +		I915_WRITE_FW(RING_THRESH(engine->mmio_base), GEN8_WATCHDOG_1000US);
> +		I915_WRITE_FW(RING_CNTR(engine->mmio_base), GEN8_WATCHDOG_ENABLE);

.. the timeout will be wrong, i.e. not the value userspace set. So I don't
think it will work. This code either needs to handle running with the
watchdog enabled, or here it somehow needs to fish out the correct timeout
to set.
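
For example (untested sketch, assuming RING_THRESH still holds the
context-programmed value at this point), the handler could snapshot the
threshold before stopping the counter and re-arm with that instead of the
hard-coded 1000us:

        u32 threshold = I915_READ_FW(RING_THRESH(engine->mmio_base));

        /* Stop the counter to prevent further timeout interrupts */
        I915_WRITE_FW(RING_CNTR(engine->mmio_base), get_watchdog_disable(engine));

        /* ... seqno checks as before ... */

        /* Re-start with the value userspace programmed, not a fixed 1000us */
        I915_WRITE_FW(RING_THRESH(engine->mmio_base), threshold);
        I915_WRITE_FW(RING_CNTR(engine->mmio_base), GEN8_WATCHDOG_ENABLE);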

> +	}
> +
> +fw_put:
> +	intel_uncore_forcewake_put(dev_priv, fw_domains);
> +}
> +
>   /*
>    * Reserve space for 2 NOOPs at the end of each request to be
>    * used as a workaround for not being allowed to do lite
> @@ -2115,6 +2179,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
>   			     &engine->execlists.tasklet.state)))
>   		tasklet_kill(&engine->execlists.tasklet);
>   
> +	if (WARN_ON(test_bit(TASKLET_STATE_SCHED, &engine->execlists.watchdog_tasklet.state)))
> +		tasklet_kill(&engine->execlists.watchdog_tasklet);
> +

I don't see any code ensuring this WARN can't fire if the tasklet gets 
delayed? A tasklet_kill in intel_engines_park might be enough.
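
Something along these lines (untested sketch; the exact parking hook
depends on the baseline, gen8_watchdog_park() is just an illustrative
name called from the engine parking path):

static void gen8_watchdog_park(struct intel_engine_cs *engine)
{
        /* No watchdog tasklet may run once the engine is parked */
        tasklet_kill(&engine->execlists.watchdog_tasklet);
}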

>   	dev_priv = engine->i915;
>   
>   	if (engine->buffer) {
> @@ -2208,6 +2275,22 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
>   
>   	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << shift;
>   	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
> +
> +	switch (engine->id) {
> +	default:
> +		/* BCS engine does not support hw watchdog */
> +		break;
> +	case RCS:
> +	case VCS:
> +	case VCS2:

Change all of these to class based checks please, or maintenance gets hard.
Even more so: like this, ICL is broken.
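
I.e. something like (sketch only, assuming the usual engine class defines):

        switch (engine->class) {
        default:
                /* BCS (copy class) has no hw watchdog */
                break;
        case RENDER_CLASS:
        case VIDEO_DECODE_CLASS:
                engine->irq_keep_mask |= GT_GEN8_WATCHDOG_INTERRUPT << shift;
                break;
        case VIDEO_ENHANCEMENT_CLASS:
                if (INTEL_GEN(engine->i915) >= 9)
                        engine->irq_keep_mask |=
                                GT_GEN8_WATCHDOG_INTERRUPT << shift;
                break;
        }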

> +		engine->irq_keep_mask |= (GT_GEN8_WATCHDOG_INTERRUPT << shift);

Braces not needed here and below.

> +		break;
> +	case VECS:
> +		if (INTEL_GEN(engine->i915) >= 9)
> +			engine->irq_keep_mask |=
> +				(GT_GEN8_WATCHDOG_INTERRUPT << shift);
> +		break;
> +	}
>   }
>   
>   static void
> @@ -2221,6 +2304,9 @@ logical_ring_setup(struct intel_engine_cs *engine)
>   	tasklet_init(&engine->execlists.tasklet,
>   		     execlists_submission_tasklet, (unsigned long)engine);
>   
> +	tasklet_init(&engine->execlists.watchdog_tasklet,
> +		     gen8_watchdog_irq_handler, (unsigned long)engine);
> +
>   	logical_ring_default_vfuncs(engine);
>   	logical_ring_default_irqs(engine);
>   }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 3c1366c58cf3..6cb8b4280035 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -120,6 +120,7 @@ struct intel_instdone {
>   struct intel_engine_hangcheck {
>   	u64 acthd;
>   	u32 seqno;
> +	u32 watchdog;
>   	enum intel_engine_hangcheck_action action;
>   	unsigned long action_timestamp;
>   	int deadlock;
> @@ -224,6 +225,11 @@ struct intel_engine_execlists {
>   	 */
>   	struct tasklet_struct tasklet;
>   
> +	/**
> +	 * @watchdog_tasklet: stop counter and re-schedule hangcheck_work asap
> +	 */
> +	struct tasklet_struct watchdog_tasklet;
> +
>   	/**
>   	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
>   	 */
> 

Regards,

Tvrtko
Chris Wilson Jan. 7, 2019, 12:16 p.m. UTC | #2
Quoting Tvrtko Ursulin (2019-01-07 11:58:13)
> 
> Hi,
> 
> This series has not been recognized by Patchwork as such, nor are the 
> patches numbered. Have you used git format-patch -<N> --cover-letter and 
> git send-email to send it out?
> 
> Rest inline.
> 
> On 05/01/2019 02:39, Carlos Santa wrote:
> > +static void gen8_watchdog_irq_handler(unsigned long data)
> > +{
> > +     struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> > +     struct drm_i915_private *dev_priv = engine->i915;
> > +     enum forcewake_domains fw_domains;
> > +     u32 current_seqno;
> > +
> > +     switch (engine->id) {
> > +     default:
> > +             MISSING_CASE(engine->id);
> > +             /* fall through */
> > +     case RCS:
> > +             fw_domains = FORCEWAKE_RENDER;
> > +             break;
> > +     case VCS:
> > +     case VCS2:
> > +     case VECS:
> > +             fw_domains = FORCEWAKE_MEDIA;
> > +             break;
> > +     }
> > +
> > +     intel_uncore_forcewake_get(dev_priv, fw_domains);
> 
> I'd be tempted to drop this and just use I915_WRITE. It doesn't feel 
> like there is any performance to be gained with it and it embeds too 
> much knowledge here.

No, no, no. Let's not reintroduce a fw inside irq context on a frequent
timer again.

Rule of thumb for fw_get:
gen6+: 10us to 50ms.
gen8+: 10us to 500us.

And then we don't release fw for 1ms after the fw_put. So we basically
prevent GT powersaving while the watchdog is active. That strikes me as
hopefully an unintended consequence.

The fw_get will be required if we actually hang, but for the timer
check, we should be able to do without.

And while on the topic of the timer irq, it should be forcibly cleared
along intel_engine_park, so that we ensure it is not raised while the
device/driver is supposed to be asleep. Or something to that effect.

> > +     current_seqno = intel_engine_get_seqno(engine);
> > +
> > +     /* did the request complete after the timer expired? */
> > +     if (intel_engine_last_submit(engine) == current_seqno)
> > +             goto fw_put;
> > +
> > +     if (engine->hangcheck.watchdog == current_seqno) {
> > +             /* Make sure the active request will be marked as guilty */
> > +             engine->hangcheck.stalled = true;
> > +             engine->hangcheck.acthd = intel_engine_get_active_head(engine);
> > +             engine->hangcheck.seqno = current_seqno;
> > +
> > +             /* And try to run the hangcheck_work as soon as possible */
> > +             set_bit(I915_RESET_WATCHDOG, &dev_priv->gpu_error.flags);
> > +             queue_delayed_work(system_long_wq,
> > +                                &dev_priv->gpu_error.hangcheck_work,
> > +                                round_jiffies_up_relative(HZ));
> > +     } else {
> > +             engine->hangcheck.watchdog = current_seqno;
> 
> The logic above potentially handles my previous question? Could be if 
> batch 2 hangs. But..

Also, DO NOT USE HANGCHECK for this. The whole design was to be able to
do the engine reset right away. (Now guc can't but that's known broken.)

Aside, we have to rewrite this entire logic anyway as the engine seqno
and global_seqno are obsolete.
-Chris
Tvrtko Ursulin Jan. 7, 2019, 12:58 p.m. UTC | #3
On 07/01/2019 12:16, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-07 11:58:13)
>>
>> Hi,
>>
>> This series has not been recognized by Patchwork as such, nor are the
>> patches numbered. Have you used git format-patch -<N> --cover-letter and
>> git send-email to send it out?
>>
>> Rest inline.
>>
>> On 05/01/2019 02:39, Carlos Santa wrote:
>>> +static void gen8_watchdog_irq_handler(unsigned long data)
>>> +{
>>> +     struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
>>> +     struct drm_i915_private *dev_priv = engine->i915;
>>> +     enum forcewake_domains fw_domains;
>>> +     u32 current_seqno;
>>> +
>>> +     switch (engine->id) {
>>> +     default:
>>> +             MISSING_CASE(engine->id);
>>> +             /* fall through */
>>> +     case RCS:
>>> +             fw_domains = FORCEWAKE_RENDER;
>>> +             break;
>>> +     case VCS:
>>> +     case VCS2:
>>> +     case VECS:
>>> +             fw_domains = FORCEWAKE_MEDIA;
>>> +             break;
>>> +     }
>>> +
>>> +     intel_uncore_forcewake_get(dev_priv, fw_domains);
>>
>> I'd be tempted to drop this and just use I915_WRITE. It doesn't feel
>> like there is any performance to be gained with it and it embeds too
>> much knowledge here.
> 
> No, no, no. Let's not reintroduce a fw inside irq context on a frequent
> timer again.

Tasklet and hopefully watchdog timeouts are not frequent. :)

> Rule of thumb for fw_get:
> gen6+: 10us to 50ms.
> gen8+: 10us to 500us.
> 
> And then we don't release fw for 1ms after the fw_put. So we basically
> prevent GT powersaving while the watchdog is active. That strikes me as
> hopefully an unintended consequence.
> 
> The fw_get will be required if we actually hang, but for the timer
> check, we should be able to do without.

That would be nice, but it is needed by the watchdog disable/re-enable 
logic. Which I commented looks suspect to me so maybe something can be 
done about that.

But in general, I didn't quite get if you are opposed to my suggestion 
not to open code knowledge of fw domains here in favour of simple 
I915_WRITE, or just the whole concept of taking a fw by any means here.

> And while on the topic of the timer irq, it should be forcibly cleared
> along intel_engine_park, so that we ensure it is not raised while the
> device/driver is supposed to be asleep. Or something to that effect.

I have raised the issue of syncing the potentially delayed tasklet, but 
yeah, could be that more is needed.

>>> +     current_seqno = intel_engine_get_seqno(engine);
>>> +
>>> +     /* did the request complete after the timer expired? */
>>> +     if (intel_engine_last_submit(engine) == current_seqno)
>>> +             goto fw_put;
>>> +
>>> +     if (engine->hangcheck.watchdog == current_seqno) {
>>> +             /* Make sure the active request will be marked as guilty */
>>> +             engine->hangcheck.stalled = true;
>>> +             engine->hangcheck.acthd = intel_engine_get_active_head(engine);
>>> +             engine->hangcheck.seqno = current_seqno;
>>> +
>>> +             /* And try to run the hangcheck_work as soon as possible */
>>> +             set_bit(I915_RESET_WATCHDOG, &dev_priv->gpu_error.flags);
>>> +             queue_delayed_work(system_long_wq,
>>> +                                &dev_priv->gpu_error.hangcheck_work,
>>> +                                round_jiffies_up_relative(HZ));
>>> +     } else {
>>> +             engine->hangcheck.watchdog = current_seqno;
>>
>> The logic above potentially handles my previous question? Could be if
>> batch 2 hangs. But..
> 
> Also, DO NOT USE HANGCHECK for this. The whole design was to be able to
> do the engine reset right away. (Now guc can't but that's known broken.)
> 
> Aside, we have to rewrite this entire logic anyway as the engine seqno
> and global_seqno are obsolete.

Btw one thing I forgot to say - I did not focus on the hangcheck 
interactions - I'll leave that for people more in the know of that.

Regards,

Tvrtko
Chris Wilson Jan. 7, 2019, 1:02 p.m. UTC | #4
Quoting Tvrtko Ursulin (2019-01-07 12:58:39)
> 
> On 07/01/2019 12:16, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-01-07 11:58:13)
> >> On 05/01/2019 02:39, Carlos Santa wrote:
> >>> +static void gen8_watchdog_irq_handler(unsigned long data)
> >>> +{
> >>> +     struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> >>> +     struct drm_i915_private *dev_priv = engine->i915;
> >>> +     enum forcewake_domains fw_domains;
> >>> +     u32 current_seqno;
> >>> +
> >>> +     switch (engine->id) {
> >>> +     default:
> >>> +             MISSING_CASE(engine->id);
> >>> +             /* fall through */
> >>> +     case RCS:
> >>> +             fw_domains = FORCEWAKE_RENDER;
> >>> +             break;
> >>> +     case VCS:
> >>> +     case VCS2:
> >>> +     case VECS:
> >>> +             fw_domains = FORCEWAKE_MEDIA;
> >>> +             break;
> >>> +     }
> >>> +
> >>> +     intel_uncore_forcewake_get(dev_priv, fw_domains);
> >>
> >> I'd be tempted to drop this and just use I915_WRITE. It doesn't feel
> >> like there is any performance to be gained with it and it embeds too
> >> much knowledge here.
> > 
> > No, no, no. Let's not reintroduce a fw inside irq context on a frequent
> > timer again.
> 
> Tasklet and hopefully watchdog timeouts are not frequent. :)

I thought the typical value mentioned elsewhere was a 1ms watchdog. Some
might say why even use a watchdog for longer than that as a hrtimer will
be more efficient (coupling in with other timer activity) ;)
-Chris
Tvrtko Ursulin Jan. 7, 2019, 1:12 p.m. UTC | #5
On 07/01/2019 13:02, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-07 12:58:39)
>>
>> On 07/01/2019 12:16, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2019-01-07 11:58:13)
>>>> On 05/01/2019 02:39, Carlos Santa wrote:
>>>>> +static void gen8_watchdog_irq_handler(unsigned long data)
>>>>> +{
>>>>> +     struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
>>>>> +     struct drm_i915_private *dev_priv = engine->i915;
>>>>> +     enum forcewake_domains fw_domains;
>>>>> +     u32 current_seqno;
>>>>> +
>>>>> +     switch (engine->id) {
>>>>> +     default:
>>>>> +             MISSING_CASE(engine->id);
>>>>> +             /* fall through */
>>>>> +     case RCS:
>>>>> +             fw_domains = FORCEWAKE_RENDER;
>>>>> +             break;
>>>>> +     case VCS:
>>>>> +     case VCS2:
>>>>> +     case VECS:
>>>>> +             fw_domains = FORCEWAKE_MEDIA;
>>>>> +             break;
>>>>> +     }
>>>>> +
>>>>> +     intel_uncore_forcewake_get(dev_priv, fw_domains);
>>>>
>>>> I'd be tempted to drop this and just use I915_WRITE. It doesn't feel
>>>> like there is any performance to be gained with it and it embeds too
>>>> much knowledge here.
>>>
>>> No, no, no. Let's not reintroduce a fw inside irq context on a frequent
>>> timer again.
>>
>> Tasklet and hopefully watchdog timeouts are not frequent. :)
> 
> I thought the typical value mentioned elsewhere was a 1ms watchdog. Some

Commit message to a patch from this series says 60ms is recommended.

> might say why even use a watchdog for longer than that as a hrtimer will
> be more efficient (coupling in with other timer activity) ;)

For the normal case (longish batches, very few timeouts) it feels more 
efficient to have ten or so extra dwords with each request than to 
fiddle with hrtimers, no?

But interactions with preemption and future time-slicing yeah don't 
know. Maybe a single solution will be simpler at that point.

Regards,

Tvrtko
Tvrtko Ursulin Jan. 7, 2019, 1:43 p.m. UTC | #6
On 07/01/2019 11:58, Tvrtko Ursulin wrote:

[snip]

>> Note about future interaction with preemption: Preemption could happen
>> in a command sequence prior to watchdog counter getting disabled,
>> resulting in watchdog being triggered following preemption (e.g. when
>> watchdog had been enabled in the low priority batch). The driver will
>> need to explicitly disable the watchdog counter as part of the
>> preemption sequence.
> 
> Does the series take care of preemption?

I did not find that it does.

So this is something which definitely needs to be handled.

And it is not only disabling the watchdog on preemption, but there is 
also a question of restoring it when/before a preempted context 
continues execution.

(Can we ignore the thresholds changing in between? Probably yes. 
Otherwise we'll have to record it in the request structure.)

Regards,

Tvrtko
Chris Wilson Jan. 7, 2019, 1:57 p.m. UTC | #7
Quoting Tvrtko Ursulin (2019-01-07 13:43:29)
> 
> On 07/01/2019 11:58, Tvrtko Ursulin wrote:
> 
> [snip]
> 
> >> Note about future interaction with preemption: Preemption could happen
> >> in a command sequence prior to watchdog counter getting disabled,
> >> resulting in watchdog being triggered following preemption (e.g. when
> >> watchdog had been enabled in the low priority batch). The driver will
> >> need to explicitly disable the watchdog counter as part of the
> >> preemption sequence.
> > 
> > Does the series take care of preemption?
> 
> I did not find that it does.

Oh. I hoped that the watchdog was saved as part of the context... Then
despite preemption, the timeout would resume from where we left off as
soon as it was back on the gpu.

If the timeout remaining was context saved it would be much simpler (at
least on first glance), please say it is.
-Chris
Tvrtko Ursulin Jan. 7, 2019, 4:58 p.m. UTC | #8
On 07/01/2019 13:57, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-07 13:43:29)
>>
>> On 07/01/2019 11:58, Tvrtko Ursulin wrote:
>>
>> [snip]
>>
>>>> Note about future interaction with preemption: Preemption could happen
>>>> in a command sequence prior to watchdog counter getting disabled,
>>>> resulting in watchdog being triggered following preemption (e.g. when
>>>> watchdog had been enabled in the low priority batch). The driver will
>>>> need to explicitly disable the watchdog counter as part of the
>>>> preemption sequence.
>>>
>>> Does the series take care of preemption?
>>
>> I did not find that it does.
> 
> Oh. I hoped that the watchdog was saved as part of the context... Then
> despite preemption, the timeout would resume from where we left off as
> soon as it was back on the gpu.
> 
> If the timeout remaining was context saved it would be much simpler (at
> least on first glance), please say it is.

I made my comments going only by the text from the commit message and 
the absence of any preemption special handling.

Having read the spec, the situation seems like this:

  * Watchdog control and threshold register are context saved and restored.

  * On a context switch watchdog counter is reset to zero and 
automatically disabled until enabled by a context restore or explicitly.

So it sounds the commit message could be wrong that special handling is 
needed from this direction. But read till the end on the restriction listed.

  * Watchdog counter is reset to zero and is not accumulated across 
multiple submissions of the same context (due to preemption).

I read this as: after preemption the context gets a new full timeout 
allocation. Or in other words, if a context is preempted N times, its 
cumulative watchdog timeout will be N * the set value.

This could theoretically be exploitable to bypass the timeout. If a 
client sets up two contexts with prio -1 and -2, and keeps submitting 
periodic no-op batches against the prio -1 context, while the prio -2 one 
is its own hog, then the prio -2 context defeats the watchdog timer. I 
think.. I would appreciate it if someone challenged this conclusion.

And finally there is one programming restriction which says:

  * SW must not preempt the workload which has watchdog enabled. Either 
it must:

a) disable preemption for that workload completely, or
b) disable the watchdog via mmio write before any write to ELSP

This seems in contradiction with the statement that the counter gets 
disabled on context switch and stays disabled.

I did not spot anything like this in the series. So it would seem the 
commit message is correct after all.

It would be good if someone could re-read the bspec text on register 
0x2178 to double check what I wrote.

Regards,

Tvrtko
Chris Wilson Jan. 7, 2019, 6:31 p.m. UTC | #9
Quoting Tvrtko Ursulin (2019-01-07 16:58:24)
> And finally there is one programming restriction which says:
> 
>   * SW must not preempt the workload which has watchdog enabled. Either 
> it must:
> 
> a) disable preemption for that workload completely, or
> b) disable the watchdog via mmio write before any write to ELSP

Oh dear. I'm at a loss for words. Any ELSP write, *shudders*.
-Chris
Antonio Argenziano Jan. 11, 2019, 12:47 a.m. UTC | #10
On 07/01/19 08:58, Tvrtko Ursulin wrote:
> 
> On 07/01/2019 13:57, Chris Wilson wrote:
>> Quoting Tvrtko Ursulin (2019-01-07 13:43:29)
>>>
>>> On 07/01/2019 11:58, Tvrtko Ursulin wrote:
>>>
>>> [snip]
>>>
>>>>> Note about future interaction with preemption: Preemption could happen
>>>>> in a command sequence prior to watchdog counter getting disabled,
>>>>> resulting in watchdog being triggered following preemption (e.g. when
>>>>> watchdog had been enabled in the low priority batch). The driver will
>>>>> need to explicitly disable the watchdog counter as part of the
>>>>> preemption sequence.
>>>>
>>>> Does the series take care of preemption?
>>>
>>> I did not find that it does.
>>
>> Oh. I hoped that the watchdog was saved as part of the context... Then
>> despite preemption, the timeout would resume from where we left off as
>> soon as it was back on the gpu.
>>
>> If the timeout remaining was context saved it would be much simpler (at
>> least on first glance), please say it is.
> 
> I made my comments going only by the text from the commit message and 
> the absence of any preemption special handling.
> 
> Having read the spec, the situation seems like this:
> 
>   * Watchdog control and threshold register are context saved and restored.
> 
>   * On a context switch watchdog counter is reset to zero and 
> automatically disabled until enabled by a context restore or explicitly.
> 
> So it sounds the commit message could be wrong that special handling is 
> needed from this direction. But read till the end on the restriction 
> listed.
> 
>   * Watchdog counter is reset to zero and is not accumulated across 
> multiple submission of the same context (due preemption).
> 
> I read this as - after preemption contexts gets a new full timeout 
> allocation. Or in other words, if a context is preempted N times, it's 
> cumulative watchdog timeout will be N * set value.
> 
> This could be theoretically exploitable to bypass the timeout. If a 
> client sets up two contexts with prio -1 and -2, and keeps submitting 
> periodical no-op batches against prio -1 context, while prio -2 is it's 
> own hog, then prio -2 context defeats the watchdog timer. I think.. 
> would appreciate is someone challenged this conclusion.

I think you are right that it is a possibility, but is that a problem? The 
client can just not set the threshold to bypass the timeout. Also, because 
you need the hanging batch to be preemptible, you cannot disrupt any work 
from another client that is higher priority. This is pretty much the same 
behavior as hangcheck IIRC, so something we already accept.

> 
> And finally there is one programming restriction which says:
> 
>   * SW must not preempt the workload which has watchdog enabled. Either 
> it must:
> 
> a) disable preemption for that workload completely, or
> b) disable the watchdog via mmio write before any write to ELSP
> 
> This seems it contradiction with the statement that the counter gets 
> disabled on context switch and stays disabled.
> 
> I did not spot anything like this in the series. So it would seem the 
> commit message is correct after all.
> 
> It would be good if someone could re-read the bspec text on register 
> 0x2178 to double check what I wrote.

The way I read it is that the restriction applies only to some platforms 
where the 'normal' description doesn't apply.

Antonio

> 
> Regards,
> 
> Tvrtko
Santa, Carlos Jan. 11, 2019, 2:58 a.m. UTC | #11
On Mon, 2019-01-07 at 16:58 +0000, Tvrtko Ursulin wrote:
> On 07/01/2019 13:57, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-01-07 13:43:29)
> > > 
> > > On 07/01/2019 11:58, Tvrtko Ursulin wrote:
> > > 
> > > [snip]
> > > 
> > > > > Note about future interaction with preemption: Preemption
> > > > > could happen
> > > > > in a command sequence prior to watchdog counter getting
> > > > > disabled,
> > > > > resulting in watchdog being triggered following preemption
> > > > > (e.g. when
> > > > > watchdog had been enabled in the low priority batch). The
> > > > > driver will
> > > > > need to explicitly disable the watchdog counter as part of
> > > > > the
> > > > > preemption sequence.
> > > > 
> > > > Does the series take care of preemption?
> > > 
> > > I did not find that it does.
> > 
> > Oh. I hoped that the watchdog was saved as part of the context...
> > Then
> > despite preemption, the timeout would resume from where we left off
> > as
> > soon as it was back on the gpu.
> > 
> > If the timeout remaining was context saved it would be much simpler
> > (at
> > least on first glance), please say it is.

The watchdog timeout gets saved as part of the register state context
so it will still be enabled after coming back from preemption but the
timeout value will be reset back to the original MAX value that it was
programmed. At least that's what I remember from a discussion with
Michel but I can check again...

Regards,
Carlos

> 
> I made my comments going only by the text from the commit message
> and 
> the absence of any preemption special handling.
> 
> Having read the spec, the situation seems like this:
> 
>   * Watchdog control and threshold register are context saved and
> restored.
> 
>   * On a context switch watchdog counter is reset to zero and 
> automatically disabled until enabled by a context restore or
> explicitly.
> 
> So it sounds the commit message could be wrong that special handling
> is 
> needed from this direction. But read till the end on the restriction
> listed.
> 
>   * Watchdog counter is reset to zero and is not accumulated across 
> multiple submission of the same context (due preemption).
> 
> I read this as - after preemption contexts gets a new full timeout 
> allocation. Or in other words, if a context is preempted N times,
> it's 
> cumulative watchdog timeout will be N * set value.
> 
> This could be theoretically exploitable to bypass the timeout. If a 
> client sets up two contexts with prio -1 and -2, and keeps
> submitting 
> periodical no-op batches against prio -1 context, while prio -2 is
> it's 
> own hog, then prio -2 context defeats the watchdog timer. I think.. 
> would appreciate is someone challenged this conclusion.
> 
> And finally there is one programming restriction which says:
> 
>   * SW must not preempt the workload which has watchdog enabled.
> Either 
> it must:
> 
> a) disable preemption for that workload completely, or
> b) disable the watchdog via mmio write before any write to ELSP
> 
> This seems it contradiction with the statement that the counter gets 
> disabled on context switch and stays disabled.
> 
> I did not spot anything like this in the series. So it would seem
> the 
> commit message is correct after all.
> 
> It would be good if someone could re-read the bspec text on register 
> 0x2178 to double check what I wrote.
> 
> Regards,
> 
> Tvrtko
Tvrtko Ursulin Jan. 11, 2019, 8:22 a.m. UTC | #12
On 11/01/2019 00:47, Antonio Argenziano wrote:
> On 07/01/19 08:58, Tvrtko Ursulin wrote:
>> On 07/01/2019 13:57, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2019-01-07 13:43:29)
>>>>
>>>> On 07/01/2019 11:58, Tvrtko Ursulin wrote:
>>>>
>>>> [snip]
>>>>
>>>>>> Note about future interaction with preemption: Preemption could 
>>>>>> happen
>>>>>> in a command sequence prior to watchdog counter getting disabled,
>>>>>> resulting in watchdog being triggered following preemption (e.g. when
>>>>>> watchdog had been enabled in the low priority batch). The driver will
>>>>>> need to explicitly disable the watchdog counter as part of the
>>>>>> preemption sequence.
>>>>>
>>>>> Does the series take care of preemption?
>>>>
>>>> I did not find that it does.
>>>
>>> Oh. I hoped that the watchdog was saved as part of the context... Then
>>> despite preemption, the timeout would resume from where we left off as
>>> soon as it was back on the gpu.
>>>
>>> If the timeout remaining was context saved it would be much simpler (at
>>> least on first glance), please say it is.
>>
>> I made my comments going only by the text from the commit message and 
>> the absence of any preemption special handling.
>>
>> Having read the spec, the situation seems like this:
>>
>>   * Watchdog control and threshold register are context saved and 
>> restored.
>>
>>   * On a context switch watchdog counter is reset to zero and 
>> automatically disabled until enabled by a context restore or explicitly.
>>
>> So it sounds the commit message could be wrong that special handling 
>> is needed from this direction. But read till the end on the 
>> restriction listed.
>>
>>   * Watchdog counter is reset to zero and is not accumulated across 
>> multiple submission of the same context (due preemption).
>>
>> I read this as - after preemption contexts gets a new full timeout 
>> allocation. Or in other words, if a context is preempted N times, it's 
>> cumulative watchdog timeout will be N * set value.
>>
>> This could be theoretically exploitable to bypass the timeout. If a 
>> client sets up two contexts with prio -1 and -2, and keeps submitting 
>> periodical no-op batches against prio -1 context, while prio -2 is 
>> it's own hog, then prio -2 context defeats the watchdog timer. I 
>> think.. would appreciate is someone challenged this conclusion.
> 
> I think you are right that is a possibility but, is that a problem? The 
> client can just not set the threshold to bypass the timeout. Also 
> because you need the hanging batch to be simply preemptible, you cannot 
> disrupt any work from another client that is higher priority. This is 

But I think a higher priority client can have the same effect on the lower 
priority one purely by accident, no?

As a real world example, user kicks off an background transcoding job, 
which happens to use prio -2, and uses the watchdog timer.

At the same time user watches a video from a player of normal priority. 
This causes periodic, say 24Hz, preemption events, due frame decoding 
activity on the same engine as the transcoding client.

The question is whether this defeats the watchdog timer for the former. And 
then, can we do something about it, and is it really not a problem?

Maybe it is not disrupting higher priority clients, but it is causing a 
time-unbounded power drain.

> pretty much the same behavior of hangcheck IIRC so something we already 
> accept.

You mean today hangcheck wouldn't notice a hanging batch in the same 
scenario as above? If so it sounds like a huge gap we need to try and fix.

>>
>> And finally there is one programming restriction which says:
>>
>>   * SW must not preempt the workload which has watchdog enabled. 
>> Either it must:
>>
>> a) disable preemption for that workload completely, or
>> b) disable the watchdog via mmio write before any write to ELSP
>>
>> This seems it contradiction with the statement that the counter gets 
>> disabled on context switch and stays disabled.
>>
>> I did not spot anything like this in the series. So it would seem the 
>> commit message is correct after all.
>>
>> It would be good if someone could re-read the bspec text on register 
>> 0x2178 to double check what I wrote.
> 
> The way I read it is that the restriction applies only to some platforms 
> where the 'normal' description doesn't apply.

You are right. Are the listed parts in the field, so that the series would 
have to handle this, or can we ignore it?

Regards,

Tvrtko
Antonio Argenziano Jan. 11, 2019, 5:31 p.m. UTC | #13
On 11/01/19 00:22, Tvrtko Ursulin wrote:
> 
> On 11/01/2019 00:47, Antonio Argenziano wrote:
>> On 07/01/19 08:58, Tvrtko Ursulin wrote:
>>> On 07/01/2019 13:57, Chris Wilson wrote:
>>>> Quoting Tvrtko Ursulin (2019-01-07 13:43:29)
>>>>>
>>>>> On 07/01/2019 11:58, Tvrtko Ursulin wrote:
>>>>>
>>>>> [snip]
>>>>>
>>>>>>> Note about future interaction with preemption: Preemption could 
>>>>>>> happen
>>>>>>> in a command sequence prior to watchdog counter getting disabled,
>>>>>>> resulting in watchdog being triggered following preemption (e.g. 
>>>>>>> when
>>>>>>> watchdog had been enabled in the low priority batch). The driver 
>>>>>>> will
>>>>>>> need to explicitly disable the watchdog counter as part of the
>>>>>>> preemption sequence.
>>>>>>
>>>>>> Does the series take care of preemption?
>>>>>
>>>>> I did not find that it does.
>>>>
>>>> Oh. I hoped that the watchdog was saved as part of the context... Then
>>>> despite preemption, the timeout would resume from where we left off as
>>>> soon as it was back on the gpu.
>>>>
>>>> If the timeout remaining was context saved it would be much simpler (at
>>>> least on first glance), please say it is.
>>>
>>> I made my comments going only by the text from the commit message and 
>>> the absence of any preemption special handling.
>>>
>>> Having read the spec, the situation seems like this:
>>>
>>>   * Watchdog control and threshold register are context saved and 
>>> restored.
>>>
>>>   * On a context switch watchdog counter is reset to zero and 
>>> automatically disabled until enabled by a context restore or explicitly.
>>>
>>> So it sounds the commit message could be wrong that special handling 
>>> is needed from this direction. But read till the end on the 
>>> restriction listed.
>>>
>>>   * Watchdog counter is reset to zero and is not accumulated across 
>>> multiple submission of the same context (due preemption).
>>>
>>> I read this as - after preemption contexts gets a new full timeout 
>>> allocation. Or in other words, if a context is preempted N times, 
>>> it's cumulative watchdog timeout will be N * set value.
>>>
>>> This could be theoretically exploitable to bypass the timeout. If a 
>>> client sets up two contexts with prio -1 and -2, and keeps submitting 
>>> periodical no-op batches against prio -1 context, while prio -2 is 
>>> it's own hog, then prio -2 context defeats the watchdog timer. I 
>>> think.. would appreciate is someone challenged this conclusion.
>>
>> I think you are right that is a possibility but, is that a problem? 
>> The client can just not set the threshold to bypass the timeout. Also 
>> because you need the hanging batch to be simply preemptible, you 
>> cannot disrupt any work from another client that is higher priority. 
>> This is 
> 
> But I think higher priority client can have the same effect on the lower 
> priority purely by accident, no?
> 
> As a real world example, user kicks off an background transcoding job, 
> which happens to use prio -2, and uses the watchdog timer.
> 
> At the same time user watches a video from a player of normal priority. 
> This causes periodic, say 24Hz, preemption events, due frame decoding 
> activity on the same engine as the transcoding client.
> 
> Does this defeat the watchdog timer for the former is the question? Then 
> the questions of can we do something about it and whether it really 
> isn't a problem?

I guess it depends on whether you consider that timeout to be the maximum 
lifespan a workload can have, or the maximum contiguous active time.

> 
> Maybe it is not disrupting higher priority clients but it is causing an 
> time unbound power drain.
> 
>> pretty much the same behavior of hangcheck IIRC so something we 
>> already accept.
> 
> You mean today hangcheck wouldn't notice a hanging batch in the same 
> scenario as above? If so it sounds like a huge gap we need to try and fix.

My understanding of it is that we only keep a record of what was running 
the last time hangcheck was run so it is possible to trick it into 
resetting when a preemption occurs but I could be missing something.

> 
>>>
>>> And finally there is one programming restriction which says:
>>>
>>>   * SW must not preempt the workload which has watchdog enabled. 
>>> Either it must:
>>>
>>> a) disable preemption for that workload completely, or
>>> b) disable the watchdog via mmio write before any write to ELSP
>>>
>>> This seems it contradiction with the statement that the counter gets 
>>> disabled on context switch and stays disabled.
>>>
>>> I did not spot anything like this in the series. So it would seem the 
>>> commit message is correct after all.
>>>
>>> It would be good if someone could re-read the bspec text on register 
>>> 0x2178 to double check what I wrote.
>>
>> The way I read it is that the restriction applies only to some 
>> platforms where the 'normal' description doesn't apply.
> 
> You are right. Are the listed parts in the field so the series would 
> have to handle this or we can ignore it?

I think there is something we need to handle e.g. BXT.

Antonio

> 
> Regards,
> 
> Tvrtko
John Harrison Jan. 11, 2019, 9:28 p.m. UTC | #14
On 1/11/2019 09:31, Antonio Argenziano wrote:
>
> On 11/01/19 00:22, Tvrtko Ursulin wrote:
>>
>> On 11/01/2019 00:47, Antonio Argenziano wrote:
>>> On 07/01/19 08:58, Tvrtko Ursulin wrote:
>>>> On 07/01/2019 13:57, Chris Wilson wrote:
>>>>> Quoting Tvrtko Ursulin (2019-01-07 13:43:29)
>>>>>>
>>>>>> On 07/01/2019 11:58, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>>> Note about future interaction with preemption: Preemption could 
>>>>>>>> happen
>>>>>>>> in a command sequence prior to watchdog counter getting disabled,
>>>>>>>> resulting in watchdog being triggered following preemption 
>>>>>>>> (e.g. when
>>>>>>>> watchdog had been enabled in the low priority batch). The 
>>>>>>>> driver will
>>>>>>>> need to explicitly disable the watchdog counter as part of the
>>>>>>>> preemption sequence.
>>>>>>>
>>>>>>> Does the series take care of preemption?
>>>>>>
>>>>>> I did not find that it does.
>>>>>
>>>>> Oh. I hoped that the watchdog was saved as part of the context... 
>>>>> Then
>>>>> despite preemption, the timeout would resume from where we left 
>>>>> off as
>>>>> soon as it was back on the gpu.
>>>>>
>>>>> If the timeout remaining was context saved it would be much 
>>>>> simpler (at
>>>>> least on first glance), please say it is.
>>>>
>>>> I made my comments going only by the text from the commit message 
>>>> and the absence of any preemption special handling.
>>>>
>>>> Having read the spec, the situation seems like this:
>>>>
>>>>   * Watchdog control and threshold register are context saved and 
>>>> restored.
>>>>
>>>>   * On a context switch watchdog counter is reset to zero and 
>>>> automatically disabled until enabled by a context restore or 
>>>> explicitly.
>>>>
>>>> So it sounds the commit message could be wrong that special 
>>>> handling is needed from this direction. But read till the end on 
>>>> the restriction listed.
>>>>
>>>>   * Watchdog counter is reset to zero and is not accumulated across 
>>>> multiple submission of the same context (due preemption).
>>>>
>>>> I read this as - after preemption contexts gets a new full timeout 
>>>> allocation. Or in other words, if a context is preempted N times, 
>>>> it's cumulative watchdog timeout will be N * set value.
>>>>
>>>> This could be theoretically exploitable to bypass the timeout. If a 
>>>> client sets up two contexts with prio -1 and -2, and keeps 
>>>> submitting periodical no-op batches against prio -1 context, while 
>>>> prio -2 is it's own hog, then prio -2 context defeats the watchdog 
>>>> timer. I think.. would appreciate is someone challenged this 
>>>> conclusion.
>>>
>>> I think you are right that is a possibility but, is that a problem? 
>>> The client can just not set the threshold to bypass the timeout. 
>>> Also because you need the hanging batch to be simply preemptible, 
>>> you cannot disrupt any work from another client that is higher 
>>> priority. This is 
>>
>> But I think higher priority client can have the same effect on the 
>> lower priority purely by accident, no?
>>
>> As a real world example, user kicks off an background transcoding 
>> job, which happens to use prio -2, and uses the watchdog timer.
>>
>> At the same time user watches a video from a player of normal 
>> priority. This causes periodic, say 24Hz, preemption events, due 
>> frame decoding activity on the same engine as the transcoding client.
>>
>> Does this defeat the watchdog timer for the former is the question? 
>> Then the questions of can we do something about it and whether it 
>> really isn't a problem?
>
> I guess it depends if you consider that timeout as the maximum 
> lifespan a workload can have or max contiguous active time.

I believe the intended purpose of the watchdog is to prevent broken 
bitstreams hanging the transcoder/player. That is, it is a form of error 
detection used by the media driver to handle bad user input. So if there 
is a way for the watchdog to be extended indefinitely under normal 
situations, that would be a problem. It means the transcoder will not 
detect the broken input data in a timely manner and effectively hang 
rather than skip over to the next packet. And note that broken input 
data can be caused by something as innocent as a dropped packet due to 
high network contention. No need for any malicious activity at all.

John.
Tvrtko Ursulin Jan. 16, 2019, 4:15 p.m. UTC | #15
On 11/01/2019 21:28, John Harrison wrote:
> 
> On 1/11/2019 09:31, Antonio Argenziano wrote:
>>
>> On 11/01/19 00:22, Tvrtko Ursulin wrote:
>>>
>>> On 11/01/2019 00:47, Antonio Argenziano wrote:
>>>> On 07/01/19 08:58, Tvrtko Ursulin wrote:
>>>>> On 07/01/2019 13:57, Chris Wilson wrote:
>>>>>> Quoting Tvrtko Ursulin (2019-01-07 13:43:29)
>>>>>>>
>>>>>>> On 07/01/2019 11:58, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> [snip]
>>>>>>>
>>>>>>>>> Note about future interaction with preemption: Preemption could 
>>>>>>>>> happen
>>>>>>>>> in a command sequence prior to watchdog counter getting disabled,
>>>>>>>>> resulting in watchdog being triggered following preemption 
>>>>>>>>> (e.g. when
>>>>>>>>> watchdog had been enabled in the low priority batch). The 
>>>>>>>>> driver will
>>>>>>>>> need to explicitly disable the watchdog counter as part of the
>>>>>>>>> preemption sequence.
>>>>>>>>
>>>>>>>> Does the series take care of preemption?
>>>>>>>
>>>>>>> I did not find that it does.
>>>>>>
>>>>>> Oh. I hoped that the watchdog was saved as part of the context... 
>>>>>> Then
>>>>>> despite preemption, the timeout would resume from where we left 
>>>>>> off as
>>>>>> soon as it was back on the gpu.
>>>>>>
>>>>>> If the timeout remaining was context saved it would be much 
>>>>>> simpler (at
>>>>>> least on first glance), please say it is.
>>>>>
>>>>> I made my comments going only by the text from the commit message 
>>>>> and the absence of any preemption special handling.
>>>>>
>>>>> Having read the spec, the situation seems like this:
>>>>>
>>>>>   * Watchdog control and threshold register are context saved and 
>>>>> restored.
>>>>>
>>>>>   * On a context switch watchdog counter is reset to zero and 
>>>>> automatically disabled until enabled by a context restore or 
>>>>> explicitly.
>>>>>
>>>>> So it sounds the commit message could be wrong that special 
>>>>> handling is needed from this direction. But read till the end on 
>>>>> the restriction listed.
>>>>>
>>>>>   * Watchdog counter is reset to zero and is not accumulated across 
>>>>> multiple submission of the same context (due preemption).
>>>>>
>>>>> I read this as - after preemption contexts gets a new full timeout 
>>>>> allocation. Or in other words, if a context is preempted N times, 
>>>>> it's cumulative watchdog timeout will be N * set value.
>>>>>
>>>>> This could be theoretically exploitable to bypass the timeout. If a 
>>>>> client sets up two contexts with prio -1 and -2, and keeps 
>>>>> submitting periodical no-op batches against prio -1 context, while 
>>>>> prio -2 is it's own hog, then prio -2 context defeats the watchdog 
>>>>> timer. I think.. would appreciate is someone challenged this 
>>>>> conclusion.
>>>>
>>>> I think you are right that is a possibility but, is that a problem? 
>>>> The client can just not set the threshold to bypass the timeout. 
>>>> Also because you need the hanging batch to be simply preemptible, 
>>>> you cannot disrupt any work from another client that is higher 
>>>> priority. This is 
>>>
>>> But I think higher priority client can have the same effect on the 
>>> lower priority purely by accident, no?
>>>
>>> As a real world example, user kicks off an background transcoding 
>>> job, which happens to use prio -2, and uses the watchdog timer.
>>>
>>> At the same time user watches a video from a player of normal 
>>> priority. This causes periodic, say 24Hz, preemption events, due 
>>> frame decoding activity on the same engine as the transcoding client.
>>>
>>> Does this defeat the watchdog timer for the former is the question? 
>>> Then the questions of can we do something about it and whether it 
>>> really isn't a problem?
>>
>> I guess it depends if you consider that timeout as the maximum 
>> lifespan a workload can have or max contiguous active time.
> 
> I believe the intended purpose of the watchdog is to prevent broken 
> bitstreams hanging the transcoder/player. That is, it is a form of error 
> detection used by the media driver to handle bad user input. So if there 
> is a way for the watchdog to be extended indefinitely under normal 
> situations, that would be a problem. It means the transcoder will not 
> detect the broken input data in a timely manner and effectively hang 
> rather than skip over to the next packet. And note that broken input 
> data can be caused by something as innocent as a dropped packet due to 
> high network contention. No need for any malicious activity at all.

My understanding of the intended purpose is the same. And it would be a 
very useful feature.

Chris mentioned the other day that until hardware is fixed to context 
save/restore the watchdog counter this could simply be implemented using 
timers. And I have to say I agree. Shouldn't be too hard to prototype it 
using  hrtimers - start on context in, stop on context out and kick 
forward on user interrupts. More or less.
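
A minimal sketch of what such an hrtimer-backed watchdog could look like, 
just to make the idea concrete. Everything below is illustrative: the 
i915_sw_watchdog type and the sw_watchdog_* helpers are made-up names and 
the expiry handler is only stubbed, so treat it as a sketch of the 
suggestion rather than a proposed implementation.

#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/types.h>

struct i915_sw_watchdog {
	struct hrtimer timer;
	ktime_t threshold;	/* per-context budget, e.g. from the ioctl */
	bool armed;
};

/* Fires only if the context stayed on the engine for longer than the
 * threshold without signalling progress via a user interrupt. */
static enum hrtimer_restart sw_watchdog_expired(struct hrtimer *t)
{
	/* blame the active request and kick the per-engine reset path */
	return HRTIMER_NORESTART;
}

static void sw_watchdog_init(struct i915_sw_watchdog *wd, u64 timeout_ms)
{
	hrtimer_init(&wd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	wd->timer.function = sw_watchdog_expired;
	wd->threshold = ms_to_ktime(timeout_ms);
	wd->armed = false;
}

/* context switched in on the engine: start counting active time */
static void sw_watchdog_context_in(struct i915_sw_watchdog *wd)
{
	hrtimer_start(&wd->timer, wd->threshold, HRTIMER_MODE_REL);
	wd->armed = true;
}

/* context switched out (completed or preempted): stop counting */
static void sw_watchdog_context_out(struct i915_sw_watchdog *wd)
{
	if (wd->armed)
		hrtimer_cancel(&wd->timer);
	wd->armed = false;
}

/* user interrupt signals forward progress: push the deadline out again */
static void sw_watchdog_user_irq(struct i915_sw_watchdog *wd)
{
	if (wd->armed)
		hrtimer_start(&wd->timer, wd->threshold, HRTIMER_MODE_REL);
}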

Then if the cost of these hrtimer manipulations wouldn't show in 
profiles significantly we would have a solution. At least in execlists 
mode. :) But in parallel we could file a feature request to fix the 
hardware implementation and then could just switch the timer "backend" 
from hrtimers to GPU.

Regards,

Tvrtko
Antonio Argenziano Jan. 16, 2019, 5:42 p.m. UTC | #16
On 16/01/19 08:15, Tvrtko Ursulin wrote:
> 
> On 11/01/2019 21:28, John Harrison wrote:
>>
>> On 1/11/2019 09:31, Antonio Argenziano wrote:
>>>
>>> On 11/01/19 00:22, Tvrtko Ursulin wrote:
>>>>
>>>> On 11/01/2019 00:47, Antonio Argenziano wrote:
>>>>> On 07/01/19 08:58, Tvrtko Ursulin wrote:
>>>>>> On 07/01/2019 13:57, Chris Wilson wrote:
>>>>>>> Quoting Tvrtko Ursulin (2019-01-07 13:43:29)
>>>>>>>>
>>>>>>>> On 07/01/2019 11:58, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> [snip]
>>>>>>>>
>>>>>>>>>> Note about future interaction with preemption: Preemption 
>>>>>>>>>> could happen
>>>>>>>>>> in a command sequence prior to watchdog counter getting disabled,
>>>>>>>>>> resulting in watchdog being triggered following preemption 
>>>>>>>>>> (e.g. when
>>>>>>>>>> watchdog had been enabled in the low priority batch). The 
>>>>>>>>>> driver will
>>>>>>>>>> need to explicitly disable the watchdog counter as part of the
>>>>>>>>>> preemption sequence.
>>>>>>>>>
>>>>>>>>> Does the series take care of preemption?
>>>>>>>>
>>>>>>>> I did not find that it does.
>>>>>>>
>>>>>>> Oh. I hoped that the watchdog was saved as part of the context... 
>>>>>>> Then
>>>>>>> despite preemption, the timeout would resume from where we left 
>>>>>>> off as
>>>>>>> soon as it was back on the gpu.
>>>>>>>
>>>>>>> If the timeout remaining was context saved it would be much 
>>>>>>> simpler (at
>>>>>>> least on first glance), please say it is.
>>>>>>
>>>>>> I made my comments going only by the text from the commit message 
>>>>>> and the absence of any preemption special handling.
>>>>>>
>>>>>> Having read the spec, the situation seems like this:
>>>>>>
>>>>>>   * Watchdog control and threshold register are context saved and 
>>>>>> restored.
>>>>>>
>>>>>>   * On a context switch watchdog counter is reset to zero and 
>>>>>> automatically disabled until enabled by a context restore or 
>>>>>> explicitly.
>>>>>>
>>>>>> So it sounds the commit message could be wrong that special 
>>>>>> handling is needed from this direction. But read till the end on 
>>>>>> the restriction listed.
>>>>>>
>>>>>>   * Watchdog counter is reset to zero and is not accumulated 
>>>>>> across multiple submission of the same context (due preemption).
>>>>>>
>>>>>> I read this as - after preemption contexts gets a new full timeout 
>>>>>> allocation. Or in other words, if a context is preempted N times, 
>>>>>> it's cumulative watchdog timeout will be N * set value.
>>>>>>
>>>>>> This could be theoretically exploitable to bypass the timeout. If 
>>>>>> a client sets up two contexts with prio -1 and -2, and keeps 
>>>>>> submitting periodical no-op batches against prio -1 context, while 
>>>>>> prio -2 is it's own hog, then prio -2 context defeats the watchdog 
>>>>>> timer. I think.. would appreciate is someone challenged this 
>>>>>> conclusion.
>>>>>
>>>>> I think you are right that is a possibility but, is that a problem? 
>>>>> The client can just not set the threshold to bypass the timeout. 
>>>>> Also because you need the hanging batch to be simply preemptible, 
>>>>> you cannot disrupt any work from another client that is higher 
>>>>> priority. This is 
>>>>
>>>> But I think higher priority client can have the same effect on the 
>>>> lower priority purely by accident, no?
>>>>
>>>> As a real world example, user kicks off an background transcoding 
>>>> job, which happens to use prio -2, and uses the watchdog timer.
>>>>
>>>> At the same time user watches a video from a player of normal 
>>>> priority. This causes periodic, say 24Hz, preemption events, due 
>>>> frame decoding activity on the same engine as the transcoding client.
>>>>
>>>> Does this defeat the watchdog timer for the former is the question? 
>>>> Then the questions of can we do something about it and whether it 
>>>> really isn't a problem?
>>>
>>> I guess it depends if you consider that timeout as the maximum 
>>> lifespan a workload can have or max contiguous active time.
>>
>> I believe the intended purpose of the watchdog is to prevent broken 
>> bitstreams hanging the transcoder/player. That is, it is a form of 
>> error detection used by the media driver to handle bad user input. So 
>> if there is a way for the watchdog to be extended indefinitely under 
>> normal situations, that would be a problem. It means the transcoder 
>> will not detect the broken input data in a timely manner and 
>> effectively hang rather than skip over to the next packet. And note 
>> that broken input data can be caused by something as innocent as a 
>> dropped packet due to high network contention. No need for any 
>> malicious activity at all.
> 
> My understanding of the intended purpose is the same. And it would be a 
> very useful feature.

I'm not familiar enough with the application but, in the scenario above, 
what if the batch being preempted is not stuck at all, just nice enough 
to be preempted so many times that it cannot complete within the given 
wall clock time, even though it would be fast enough on its own?

> 
> Chris mentioned the other day that until hardware is fixed to context 
> save/restore the watchdog counter this could simply be implemented using 
> timers. And I have to say I agree. Shouldn't be too hard to prototype it 
> using  hrtimers - start on context in, stop on context out and kick 
> forward on user interrupts. More or less.

Would this implement the feature on the driver side just like it would 
for the HW? I mean, have the same IOCTL and silently discard workloads 
that hit the timeout. Also, would it discard batches while they are in 
the queue (not active)?

Antonio

> 
> Then if the cost of these hrtimer manipulations wouldn't show in 
> profiles significantly we would have a solution. At least in execlists 
> mode. :) But in parallel we could file a feature request to fix the 
> hardware implementation and then could just switch the timer "backend" 
> from hrtimers to GPU.
> 
> Regards,
> 
> Tvrtko
Antonio Argenziano Jan. 16, 2019, 5:59 p.m. UTC | #17
On 16/01/19 09:42, Antonio Argenziano wrote:
> 
> 
> On 16/01/19 08:15, Tvrtko Ursulin wrote:
>>
>> On 11/01/2019 21:28, John Harrison wrote:
>>>
>>> On 1/11/2019 09:31, Antonio Argenziano wrote:
>>>>
>>>> On 11/01/19 00:22, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 11/01/2019 00:47, Antonio Argenziano wrote:
>>>>>> On 07/01/19 08:58, Tvrtko Ursulin wrote:
>>>>>>> On 07/01/2019 13:57, Chris Wilson wrote:
>>>>>>>> Quoting Tvrtko Ursulin (2019-01-07 13:43:29)
>>>>>>>>>
>>>>>>>>> On 07/01/2019 11:58, Tvrtko Ursulin wrote:
>>>>>>>>>
>>>>>>>>> [snip]
>>>>>>>>>
>>>>>>>>>>> Note about future interaction with preemption: Preemption 
>>>>>>>>>>> could happen
>>>>>>>>>>> in a command sequence prior to watchdog counter getting 
>>>>>>>>>>> disabled,
>>>>>>>>>>> resulting in watchdog being triggered following preemption 
>>>>>>>>>>> (e.g. when
>>>>>>>>>>> watchdog had been enabled in the low priority batch). The 
>>>>>>>>>>> driver will
>>>>>>>>>>> need to explicitly disable the watchdog counter as part of the
>>>>>>>>>>> preemption sequence.
>>>>>>>>>>
>>>>>>>>>> Does the series take care of preemption?
>>>>>>>>>
>>>>>>>>> I did not find that it does.
>>>>>>>>
>>>>>>>> Oh. I hoped that the watchdog was saved as part of the 
>>>>>>>> context... Then
>>>>>>>> despite preemption, the timeout would resume from where we left 
>>>>>>>> off as
>>>>>>>> soon as it was back on the gpu.
>>>>>>>>
>>>>>>>> If the timeout remaining was context saved it would be much 
>>>>>>>> simpler (at
>>>>>>>> least on first glance), please say it is.
>>>>>>>
>>>>>>> I made my comments going only by the text from the commit message 
>>>>>>> and the absence of any preemption special handling.
>>>>>>>
>>>>>>> Having read the spec, the situation seems like this:
>>>>>>>
>>>>>>>   * Watchdog control and threshold register are context saved and 
>>>>>>> restored.
>>>>>>>
>>>>>>>   * On a context switch watchdog counter is reset to zero and 
>>>>>>> automatically disabled until enabled by a context restore or 
>>>>>>> explicitly.
>>>>>>>
>>>>>>> So it sounds the commit message could be wrong that special 
>>>>>>> handling is needed from this direction. But read till the end on 
>>>>>>> the restriction listed.
>>>>>>>
>>>>>>>   * Watchdog counter is reset to zero and is not accumulated 
>>>>>>> across multiple submission of the same context (due preemption).
>>>>>>>
>>>>>>> I read this as - after preemption contexts gets a new full 
>>>>>>> timeout allocation. Or in other words, if a context is preempted 
>>>>>>> N times, it's cumulative watchdog timeout will be N * set value.
>>>>>>>
>>>>>>> This could be theoretically exploitable to bypass the timeout. If 
>>>>>>> a client sets up two contexts with prio -1 and -2, and keeps 
>>>>>>> submitting periodical no-op batches against prio -1 context, 
>>>>>>> while prio -2 is it's own hog, then prio -2 context defeats the 
>>>>>>> watchdog timer. I think.. would appreciate is someone challenged 
>>>>>>> this conclusion.
>>>>>>
>>>>>> I think you are right that is a possibility but, is that a 
>>>>>> problem? The client can just not set the threshold to bypass the 
>>>>>> timeout. Also because you need the hanging batch to be simply 
>>>>>> preemptible, you cannot disrupt any work from another client that 
>>>>>> is higher priority. This is 
>>>>>
>>>>> But I think higher priority client can have the same effect on the 
>>>>> lower priority purely by accident, no?
>>>>>
>>>>> As a real world example, user kicks off an background transcoding 
>>>>> job, which happens to use prio -2, and uses the watchdog timer.
>>>>>
>>>>> At the same time user watches a video from a player of normal 
>>>>> priority. This causes periodic, say 24Hz, preemption events, due 
>>>>> frame decoding activity on the same engine as the transcoding client.
>>>>>
>>>>> Does this defeat the watchdog timer for the former is the question? 
>>>>> Then the questions of can we do something about it and whether it 
>>>>> really isn't a problem?
>>>>
>>>> I guess it depends if you consider that timeout as the maximum 
>>>> lifespan a workload can have or max contiguous active time.
>>>
>>> I believe the intended purpose of the watchdog is to prevent broken 
>>> bitstreams hanging the transcoder/player. That is, it is a form of 
>>> error detection used by the media driver to handle bad user input. So 
>>> if there is a way for the watchdog to be extended indefinitely under 
>>> normal situations, that would be a problem. It means the transcoder 
>>> will not detect the broken input data in a timely manner and 
>>> effectively hang rather than skip over to the next packet. And note 
>>> that broken input data can be caused by something as innocent as a 
>>> dropped packet due to high network contention. No need for any 
>>> malicious activity at all.
>>
>> My understanding of the intended purpose is the same. And it would be 
>> a very useful feature.
> 
> I'm not familiar enough with the application but, in the scenario above, 
> what if the batch being preempted is not stuck at all, just nice enough 
> to be preempted so many times that it cannot complete within the given 
> wall clock time, even though it would be fast enough on its own?

Ignore me; re-reading this, I now get that you are advocating for an 
active-time timeout rather than pure wall clock time.
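
To make the distinction concrete, a rough sketch of the two accountings 
(illustrative helpers only, not driver code): a wall clock deadline keeps 
ticking while the context sits preempted, whereas an active-time budget 
is only consumed while the context is actually executing on the engine.

#include <linux/ktime.h>
#include <linux/types.h>

struct watchdog_budget {
	ktime_t deadline;	/* wall clock: absolute expiry time */
	ktime_t remaining;	/* active time: budget left */
	ktime_t last_in;	/* when the context last went on hw */
};

static void budget_context_in(struct watchdog_budget *b)
{
	b->last_in = ktime_get();
}

/* true if the cumulative active-time budget is exhausted */
static bool budget_context_out(struct watchdog_budget *b)
{
	ktime_t slice = ktime_sub(ktime_get(), b->last_in);

	b->remaining = ktime_sub(b->remaining, slice);
	return ktime_to_ns(b->remaining) <= 0;
}

/* true once the wall clock deadline has passed, even if most of that
 * time was spent preempted off the engine */
static bool budget_wall_clock_expired(const struct watchdog_budget *b)
{
	return ktime_after(ktime_get(), b->deadline);
}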

> 
>>
>> Chris mentioned the other day that until hardware is fixed to context 
>> save/restore the watchdog counter this could simply be implemented 
>> using timers. And I have to say I agree. Shouldn't be too hard to 
>> prototype it using  hrtimers - start on context in, stop on context 
>> out and kick forward on user interrupts. More or less.
> 
> Would this implement the feature on the driver side just like it would 
> for the HW? I mean, have the same IOCTL and silently discard workloads 
> that hit the timeout. Also, would it discard batches while they are in 
> the queue (not active)?
> 
> Antonio
> 
>>
>> Then if the cost of these hrtimer manipulations wouldn't show in 
>> profiles significantly we would have a solution. At least in execlists 
>> mode. :) But in parallel we could file a feature request to fix the 
>> hardware implementation and then could just switch the timer "backend" 
>> from hrtimers to GPU.
>>
>> Regards,
>>
>> Tvrtko
Santa, Carlos Jan. 24, 2019, 12:13 a.m. UTC | #18
On Mon, 2019-01-07 at 11:58 +0000, Tvrtko Ursulin wrote:

[snip]

> > 
> >   
> >   static void gen8_gt_irq_ack(struct drm_i915_private *i915,
> > @@ -3329,7 +3332,7 @@ void i915_handle_error(struct
> > drm_i915_private *dev_priv,
> >   	if (intel_has_reset_engine(dev_priv) &&
> >   	    !i915_terminally_wedged(&dev_priv->gpu_error)) {
> >   		for_each_engine_masked(engine, dev_priv, engine_mask,
> > tmp) {
> > -			BUILD_BUG_ON(I915_RESET_MODESET >=
> > I915_RESET_ENGINE);
> > +			BUILD_BUG_ON(I915_RESET_WATCHDOG >=
> > I915_RESET_ENGINE);
> >   			if (test_and_set_bit(I915_RESET_ENGINE +
> > engine->id,
> >   					     &dev_priv-
> > >gpu_error.flags))
> >   				continue;
> > @@ -4162,12 +4165,15 @@ static void gen8_gt_irq_postinstall(struct
> > drm_i915_private *dev_priv)
> >   	uint32_t gt_interrupts[] = {
> >   		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
> >   			GT_CONTEXT_SWITCH_INTERRUPT <<
> > GEN8_RCS_IRQ_SHIFT |
> > +			GT_GEN8_WATCHDOG_INTERRUPT <<
> > GEN8_RCS_IRQ_SHIFT |
> >   			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT
> > |
> >   			GT_CONTEXT_SWITCH_INTERRUPT <<
> > GEN8_BCS_IRQ_SHIFT,
> >   		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
> >   			GT_CONTEXT_SWITCH_INTERRUPT <<
> > GEN8_VCS1_IRQ_SHIFT |
> > +			GT_GEN8_WATCHDOG_INTERRUPT <<
> > GEN8_VCS1_IRQ_SHIFT |
> >   			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT
> > |
> > -			GT_CONTEXT_SWITCH_INTERRUPT <<
> > GEN8_VCS2_IRQ_SHIFT,
> > +			GT_CONTEXT_SWITCH_INTERRUPT <<
> > GEN8_VCS2_IRQ_SHIFT |
> > +			GT_GEN8_WATCHDOG_INTERRUPT <<
> > GEN8_VCS2_IRQ_SHIFT,
> >   		0,
> >   		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT |
> >   			GT_CONTEXT_SWITCH_INTERRUPT <<
> > GEN8_VECS_IRQ_SHIFT
> > @@ -4176,6 +4182,10 @@ static void gen8_gt_irq_postinstall(struct
> > drm_i915_private *dev_priv)
> >   	if (HAS_L3_DPF(dev_priv))
> >   		gt_interrupts[0] |=
> > GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
> >   
> > +	/* VECS watchdog is only available in skl+ */
> > +	if (INTEL_GEN(dev_priv) >= 9)
> > +		gt_interrupts[3] |= GT_GEN8_WATCHDOG_INTERRUPT;
> 
> Is the shift missing here?
> 

No, the above addresses the interrupts for the VECS watchdog only, and 
the correct shift is applied in element 3 of the array.
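
For anyone following along, a hedged illustration of that point (this 
assumes GEN8_VECS_IRQ_SHIFT evaluates to 0 in i915_reg.h, which is worth 
double-checking against the tree): gt_interrupts[3] feeds the GT IIR bank 
carrying the VECS bits, so with a zero shift the unshifted OR is 
equivalent to the explicitly shifted form.

	/* assumption: GEN8_VECS_IRQ_SHIFT == 0, hence these are the same */
	gt_interrupts[3] |= GT_GEN8_WATCHDOG_INTERRUPT;
	gt_interrupts[3] |= GT_GEN8_WATCHDOG_INTERRUPT << GEN8_VECS_IRQ_SHIFT;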

Regards,
Carlos
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 6d9f45468ac1..7130786aa5b4 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -256,6 +256,9 @@  struct i915_gpu_error {
 	 * inspect the bit and do the reset directly, otherwise the worker
 	 * waits for the struct_mutex.
 	 *
+	 * #I915_RESET_WATCHDOG - When hw detects a hang before us, we can use
+	 * I915_RESET_WATCHDOG to report the hang detection cause accurately.
+	 *
 	 * #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
 	 * acquire the struct_mutex to reset an engine, we need an explicit
 	 * flag to prevent two concurrent reset attempts in the same engine.
@@ -271,6 +274,7 @@  struct i915_gpu_error {
 #define I915_RESET_BACKOFF	0
 #define I915_RESET_HANDOFF	1
 #define I915_RESET_MODESET	2
+#define I915_RESET_WATCHDOG	3
 #define I915_WEDGED		(BITS_PER_LONG - 1)
 #define I915_RESET_ENGINE	(I915_WEDGED - I915_NUM_ENGINES)
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index fbb094ecf6c9..859bbadb752f 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1498,6 +1498,9 @@  gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 
 	if (tasklet)
 		tasklet_hi_schedule(&engine->execlists.tasklet);
+
+	if (iir & (GT_GEN8_WATCHDOG_INTERRUPT))
+		tasklet_schedule(&engine->execlists.watchdog_tasklet);
 }
 
 static void gen8_gt_irq_ack(struct drm_i915_private *i915,
@@ -3329,7 +3332,7 @@  void i915_handle_error(struct drm_i915_private *dev_priv,
 	if (intel_has_reset_engine(dev_priv) &&
 	    !i915_terminally_wedged(&dev_priv->gpu_error)) {
 		for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
-			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
+			BUILD_BUG_ON(I915_RESET_WATCHDOG >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
 					     &dev_priv->gpu_error.flags))
 				continue;
@@ -4162,12 +4165,15 @@  static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv)
 	uint32_t gt_interrupts[] = {
 		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
 			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
+			GT_GEN8_WATCHDOG_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
 			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT |
 			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
 		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
 			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
+			GT_GEN8_WATCHDOG_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
 			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT |
-			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
+			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT |
+			GT_GEN8_WATCHDOG_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
 		0,
 		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT |
 			GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT
@@ -4176,6 +4182,10 @@  static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv)
 	if (HAS_L3_DPF(dev_priv))
 		gt_interrupts[0] |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
 
+	/* VECS watchdog is only available in skl+ */
+	if (INTEL_GEN(dev_priv) >= 9)
+		gt_interrupts[3] |= GT_GEN8_WATCHDOG_INTERRUPT;
+
 	dev_priv->pm_ier = 0x0;
 	dev_priv->pm_imr = ~dev_priv->pm_ier;
 	GEN8_IRQ_INIT_NDX(GT, 0, ~gt_interrupts[0], gt_interrupts[0]);
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 44958d994bfa..fff330643090 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -2335,6 +2335,11 @@  enum i915_power_well_id {
 #define RING_START(base)	_MMIO((base) + 0x38)
 #define RING_CTL(base)		_MMIO((base) + 0x3c)
 #define   RING_CTL_SIZE(size)	((size) - PAGE_SIZE) /* in bytes -> pages */
+#define RING_CNTR(base)		_MMIO((base) + 0x178)
+#define   GEN8_WATCHDOG_ENABLE		0
+#define   GEN8_WATCHDOG_DISABLE		1
+#define   GEN8_XCS_WATCHDOG_DISABLE	0xFFFFFFFF /* GEN8 & non-render only */
+#define RING_THRESH(base)	_MMIO((base) + 0x17C)
 #define RING_SYNC_0(base)	_MMIO((base) + 0x40)
 #define RING_SYNC_1(base)	_MMIO((base) + 0x44)
 #define RING_SYNC_2(base)	_MMIO((base) + 0x48)
@@ -2894,6 +2899,7 @@  enum i915_power_well_id {
 #define GT_BSD_USER_INTERRUPT			(1 << 12)
 #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1	(1 << 11) /* hsw+; rsvd on snb, ivb, vlv */
 #define GT_CONTEXT_SWITCH_INTERRUPT		(1 <<  8)
+#define GT_GEN8_WATCHDOG_INTERRUPT		(1 <<  6) /* gen8+ */
 #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT	(1 <<  5) /* !snb */
 #define GT_RENDER_PIPECTL_NOTIFY_INTERRUPT	(1 <<  4)
 #define GT_RENDER_CS_MASTER_ERROR_INTERRUPT	(1 <<  3)
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 51e9efec5116..2906f0ef3d77 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -213,7 +213,8 @@  static void hangcheck_accumulate_sample(struct intel_engine_cs *engine,
 
 static void hangcheck_declare_hang(struct drm_i915_private *i915,
 				   unsigned int hung,
-				   unsigned int stuck)
+				   unsigned int stuck,
+				   unsigned int watchdog)
 {
 	struct intel_engine_cs *engine;
 	char msg[80];
@@ -226,13 +227,16 @@  static void hangcheck_declare_hang(struct drm_i915_private *i915,
 	if (stuck != hung)
 		hung &= ~stuck;
 	len = scnprintf(msg, sizeof(msg),
-			"%s on ", stuck == hung ? "no progress" : "hang");
+			"%s on ", watchdog ? "watchdog timeout" :
+				  stuck == hung ? "no progress" : "hang");
 	for_each_engine_masked(engine, i915, hung, tmp)
 		len += scnprintf(msg + len, sizeof(msg) - len,
 				 "%s, ", engine->name);
 	msg[len-2] = '\0';
 
-	return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
+	return i915_handle_error(i915, hung,
+				 watchdog ? 0 : I915_ERROR_CAPTURE,
+				 "%s", msg);
 }
 
 /*
@@ -250,7 +254,7 @@  static void i915_hangcheck_elapsed(struct work_struct *work)
 			     gpu_error.hangcheck_work.work);
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
-	unsigned int hung = 0, stuck = 0, wedged = 0;
+	unsigned int hung = 0, stuck = 0, wedged = 0, watchdog = 0;
 
 	if (!i915_modparams.enable_hangcheck)
 		return;
@@ -261,6 +265,9 @@  static void i915_hangcheck_elapsed(struct work_struct *work)
 	if (i915_terminally_wedged(&dev_priv->gpu_error))
 		return;
 
+	if (test_and_clear_bit(I915_RESET_WATCHDOG, &dev_priv->gpu_error.flags))
+		watchdog = 1;
+
 	/* As enabling the GPU requires fairly extensive mmio access,
 	 * periodically arm the mmio checker to see if we are triggering
 	 * any invalid access.
@@ -293,7 +300,7 @@  static void i915_hangcheck_elapsed(struct work_struct *work)
 	}
 
 	if (hung)
-		hangcheck_declare_hang(dev_priv, hung, stuck);
+		hangcheck_declare_hang(dev_priv, hung, stuck, watchdog);
 
 	/* Reset timer in case GPU hangs without another request being added */
 	i915_queue_hangcheck(dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6c98fb7cebf2..e1dcdf545bee 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2027,6 +2027,70 @@  static int gen8_emit_flush_render(struct i915_request *request,
 	return 0;
 }
 
+/* From GEN9 onwards, all engines use the same RING_CNTR format */
+static inline u32 get_watchdog_disable(struct intel_engine_cs *engine)
+{
+	if (engine->id == RCS || INTEL_GEN(engine->i915) >= 9)
+		return GEN8_WATCHDOG_DISABLE;
+	else
+		return GEN8_XCS_WATCHDOG_DISABLE;
+}
+
+#define GEN8_WATCHDOG_1000US 0x2ee0 //XXX: Temp, replace with helper function
+static void gen8_watchdog_irq_handler(unsigned long data)
+{
+	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
+	struct drm_i915_private *dev_priv = engine->i915;
+	enum forcewake_domains fw_domains;
+	u32 current_seqno;
+
+	switch (engine->id) {
+	default:
+		MISSING_CASE(engine->id);
+		/* fall through */
+	case RCS:
+		fw_domains = FORCEWAKE_RENDER;
+		break;
+	case VCS:
+	case VCS2:
+	case VECS:
+		fw_domains = FORCEWAKE_MEDIA;
+		break;
+	}
+
+	intel_uncore_forcewake_get(dev_priv, fw_domains);
+
+	/* Stop the counter to prevent further timeout interrupts */
+	I915_WRITE_FW(RING_CNTR(engine->mmio_base), get_watchdog_disable(engine));
+
+	current_seqno = intel_engine_get_seqno(engine);
+
+	/* did the request complete after the timer expired? */
+	if (intel_engine_last_submit(engine) == current_seqno)
+		goto fw_put;
+
+	if (engine->hangcheck.watchdog == current_seqno) {
+		/* Make sure the active request will be marked as guilty */
+		engine->hangcheck.stalled = true;
+		engine->hangcheck.acthd = intel_engine_get_active_head(engine);
+		engine->hangcheck.seqno = current_seqno;
+
+		/* And try to run the hangcheck_work as soon as possible */
+		set_bit(I915_RESET_WATCHDOG, &dev_priv->gpu_error.flags);
+		queue_delayed_work(system_long_wq,
+				   &dev_priv->gpu_error.hangcheck_work,
+				   round_jiffies_up_relative(HZ));
+	} else {
+		engine->hangcheck.watchdog = current_seqno;
+		/* Re-start the counter, if really hung, it will expire again */
+		I915_WRITE_FW(RING_THRESH(engine->mmio_base), GEN8_WATCHDOG_1000US);
+		I915_WRITE_FW(RING_CNTR(engine->mmio_base), GEN8_WATCHDOG_ENABLE);
+	}
+
+fw_put:
+	intel_uncore_forcewake_put(dev_priv, fw_domains);
+}
+
 /*
  * Reserve space for 2 NOOPs at the end of each request to be
  * used as a workaround for not being allowed to do lite
@@ -2115,6 +2179,9 @@  void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
 			     &engine->execlists.tasklet.state)))
 		tasklet_kill(&engine->execlists.tasklet);
 
+	if (WARN_ON(test_bit(TASKLET_STATE_SCHED, &engine->execlists.watchdog_tasklet.state)))
+		tasklet_kill(&engine->execlists.watchdog_tasklet);
+
 	dev_priv = engine->i915;
 
 	if (engine->buffer) {
@@ -2208,6 +2275,22 @@  logical_ring_default_irqs(struct intel_engine_cs *engine)
 
 	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << shift;
 	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
+
+	switch (engine->id) {
+	default:
+		/* BCS engine does not support hw watchdog */
+		break;
+	case RCS:
+	case VCS:
+	case VCS2:
+		engine->irq_keep_mask |= (GT_GEN8_WATCHDOG_INTERRUPT << shift);
+		break;
+	case VECS:
+		if (INTEL_GEN(engine->i915) >= 9)
+			engine->irq_keep_mask |=
+				(GT_GEN8_WATCHDOG_INTERRUPT << shift);
+		break;
+	}
 }
 
 static void
@@ -2221,6 +2304,9 @@  logical_ring_setup(struct intel_engine_cs *engine)
 	tasklet_init(&engine->execlists.tasklet,
 		     execlists_submission_tasklet, (unsigned long)engine);
 
+	tasklet_init(&engine->execlists.watchdog_tasklet,
+		     gen8_watchdog_irq_handler, (unsigned long)engine);
+
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 3c1366c58cf3..6cb8b4280035 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -120,6 +120,7 @@  struct intel_instdone {
 struct intel_engine_hangcheck {
 	u64 acthd;
 	u32 seqno;
+	u32 watchdog;
 	enum intel_engine_hangcheck_action action;
 	unsigned long action_timestamp;
 	int deadlock;
@@ -224,6 +225,11 @@  struct intel_engine_execlists {
 	 */
 	struct tasklet_struct tasklet;
 
+	/**
+	 * @watchdog_tasklet: stop counter and re-schedule hangcheck_work asap
+	 */
+	struct tasklet_struct watchdog_tasklet;
+
 	/**
 	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
 	 */