diff mbox series

[3/4] drm/i915/gt: Always send a pulse down the engine after disabling heartbeat

Message ID 20200916094219.3878-3-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [1/4] drm/i915/gem: Hold request reference for canceling an active context | expand

Commit Message

Chris Wilson Sept. 16, 2020, 9:42 a.m. UTC
Currently, we check we can send a pulse prior to disabling the
heartbeat to verify that we can change the heartbeat, but since we may
re-evaluate execution upon changing the heartbeat interval we need another
pulse afterwards to refresh execution.

Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
---
 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Tvrtko Ursulin Sept. 24, 2020, 1:43 p.m. UTC | #1
On 16/09/2020 10:42, Chris Wilson wrote:
> Currently, we check we can send a pulse prior to disabling the
> heartbeat to verify that we can change the heartbeat, but since we may
> re-evaluate execution upon changing the heartbeat interval we need another
> pulse afterwards to refresh execution.
> 
> Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v5.7+
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index 8ffdf676c0a0..d09df370f7cd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -192,10 +192,12 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
>   	WRITE_ONCE(engine->props.heartbeat_interval_ms, delay);
>   
>   	if (intel_engine_pm_get_if_awake(engine)) {
> -		if (delay)
> +		if (delay) {
>   			intel_engine_unpark_heartbeat(engine);
> -		else
> +		} else {
>   			intel_engine_park_heartbeat(engine);
> +			intel_engine_pulse(engine); /* recheck execution */
> +		}
>   		intel_engine_pm_put(engine);
>   	}
>   
> 

I did not immediately get this one. Do we really need two pulses or 
maybe we could re-order the code a bit and just undo the heartbeat park 
if pulse after parking did not work?

Regards,

Tvrtko
Chris Wilson Sept. 25, 2020, 10:01 a.m. UTC | #2
Quoting Tvrtko Ursulin (2020-09-24 14:43:08)
> 
> On 16/09/2020 10:42, Chris Wilson wrote:
> > Currently, we check we can send a pulse prior to disabling the
> > heartbeat to verify that we can change the heartbeat, but since we may
> > re-evaluate execution upon changing the heartbeat interval we need another
> > pulse afterwards to refresh execution.
> > 
> > Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: <stable@vger.kernel.org> # v5.7+
> > ---
> >   drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > index 8ffdf676c0a0..d09df370f7cd 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > @@ -192,10 +192,12 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
> >       WRITE_ONCE(engine->props.heartbeat_interval_ms, delay);
> >   
> >       if (intel_engine_pm_get_if_awake(engine)) {
> > -             if (delay)
> > +             if (delay) {
> >                       intel_engine_unpark_heartbeat(engine);
> > -             else
> > +             } else {
> >                       intel_engine_park_heartbeat(engine);
> > +                     intel_engine_pulse(engine); /* recheck execution */
> > +             }
> >               intel_engine_pm_put(engine);
> >       }
> >   
> > 
> 
> I did not immediately get this one. Do we really need two pulses or 
> maybe we could re-order the code a bit and just undo the heartbeat park 
> if pulse after parking did not work?

We use the first pulse to determine if it's legal for the parameter to
be changed (checking we support the preemptive pulse to remove
non-persistent contexts). Then the second pulse after changing the
parameter to flush the changes through.

I like checking for support before making the change, although we could
try and fixup after failure, there would still be a window where the
change would be visible to the system. We don't need to use the pulse per
se for that check, that's pure convenience as it performs the checking
already.
-Chris
Tvrtko Ursulin Sept. 25, 2020, 1:19 p.m. UTC | #3
On 25/09/2020 11:01, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-09-24 14:43:08)
>>
>> On 16/09/2020 10:42, Chris Wilson wrote:
>>> Currently, we check we can send a pulse prior to disabling the
>>> heartbeat to verify that we can change the heartbeat, but since we may
>>> re-evaluate execution upon changing the heartbeat interval we need another
>>> pulse afterwards to refresh execution.
>>>
>>> Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Cc: <stable@vger.kernel.org> # v5.7+
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 6 ++++--
>>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
>>> index 8ffdf676c0a0..d09df370f7cd 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
>>> @@ -192,10 +192,12 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
>>>        WRITE_ONCE(engine->props.heartbeat_interval_ms, delay);
>>>    
>>>        if (intel_engine_pm_get_if_awake(engine)) {
>>> -             if (delay)
>>> +             if (delay) {
>>>                        intel_engine_unpark_heartbeat(engine);
>>> -             else
>>> +             } else {
>>>                        intel_engine_park_heartbeat(engine);
>>> +                     intel_engine_pulse(engine); /* recheck execution */
>>> +             }
>>>                intel_engine_pm_put(engine);
>>>        }
>>>    
>>>
>>
>> I did not immediately get this one. Do we really need two pulses or
>> maybe we could re-order the code a bit and just undo the heartbeat park
>> if pulse after parking did not work?
> 
> We use the first pulse to determine if it's legal for the parameter to
> be changed (checking we support the preemptive pulse to remove
> non-persistent contexts). Then the second pulse after changing the
> parameter to flush the changes through.
> 
> I like checking for support before making the change, although we could
> try and fixup after failure, there would still be a window where the
> change would be visible to the system. We don't need to use the pulse per
> se for that check, that's pure convenience as it performs the checking
> already.

Hm second pulse also has a problem that sneaky user can nerf it with a 
precisely timed SIGINT on itself. It's a bit ridiculous isn't it? :)

Have engine preemption check open coded first and uninterruptible 
flavour of pulse sending? It's also not good since we do want it to be 
interruptible.. Unwind the change and report error back to write(2) if 
intel_engine_pulse failed for any reason?

Regards,

Tvrtko
Chris Wilson Sept. 25, 2020, 2:13 p.m. UTC | #4
Quoting Tvrtko Ursulin (2020-09-25 14:19:22)
> 
> On 25/09/2020 11:01, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-09-24 14:43:08)
> >>
> >> On 16/09/2020 10:42, Chris Wilson wrote:
> >>> Currently, we check we can send a pulse prior to disabling the
> >>> heartbeat to verify that we can change the heartbeat, but since we may
> >>> re-evaluate execution upon changing the heartbeat interval we need another
> >>> pulse afterwards to refresh execution.
> >>>
> >>> Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >>> Cc: <stable@vger.kernel.org> # v5.7+
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 6 ++++--
> >>>    1 file changed, 4 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> >>> index 8ffdf676c0a0..d09df370f7cd 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> >>> @@ -192,10 +192,12 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
> >>>        WRITE_ONCE(engine->props.heartbeat_interval_ms, delay);
> >>>    
> >>>        if (intel_engine_pm_get_if_awake(engine)) {
> >>> -             if (delay)
> >>> +             if (delay) {
> >>>                        intel_engine_unpark_heartbeat(engine);
> >>> -             else
> >>> +             } else {
> >>>                        intel_engine_park_heartbeat(engine);
> >>> +                     intel_engine_pulse(engine); /* recheck execution */
> >>> +             }
> >>>                intel_engine_pm_put(engine);
> >>>        }
> >>>    
> >>>
> >>
> >> I did not immediately get this one. Do we really need two pulses or
> >> maybe we could re-order the code a bit and just undo the heartbeat park
> >> if pulse after parking did not work?
> > 
> > We use the first pulse to determine if it's legal for the parameter to
> > be changed (checking we support the preemptive pulse to remove
> > non-persistent contexts). Then the second pulse after changing the
> > parameter to flush the changes through.
> > 
> > I like checking for support before making the change, although we could
> > try and fixup after failure, there would still be a window where the
> > change would be visible to the system. We don't need to use the pulse per
> > se for that check, that's pure convenience as it performs the checking
> > already.
> 
> Hm second pulse also has a problem that sneaky user can nerf it with a 
> precisely timed SIGINT on itself. It's a bit ridiculous isn't it? :)

Sufficient pause for concern. In both this and the kill_context path, we
are using the pulse to evict hostile contexts, that could be reasonable
grounds to have the pulse not be interrupted.
 
> Have engine preemption check open coded first and uninterruptible 
> flavour of pulse sending? It's also not good since we do want it to be 
> interruptible..

Right, allowing it to be interruptible does ensure we have an escape
plan. That escape plan is often invaluable. :|

> Unwind the change and report error back to write(2) if 
> intel_engine_pulse failed for any reason?

The concern then is how atomic do we want the change to appear to be.
Would it be sufficient to only rollback after error if the value still
matches (multiple users), cmpxchg for paranoia. Then anything that
observes the intermediate value will just have to obey the new rules
temporarily, as they would have to in future if the change succeeded.

	new = strtoull_from_user();
	/* validate new */

	old = xchg(&heartbeat, new);
	/* apply changes */
	if (err)
		cmpxchg(&heartbeat, new, old);

If heartbeat is changed in the interval, the pulse either uses the new
or the new-new value, fail/pass regardless. That could mean we report an
error for another heartbeat value from a second user. That's arguing for
a mutex guard for the sysfs writes.
-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 8ffdf676c0a0..d09df370f7cd 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -192,10 +192,12 @@  int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
 	WRITE_ONCE(engine->props.heartbeat_interval_ms, delay);
 
 	if (intel_engine_pm_get_if_awake(engine)) {
-		if (delay)
+		if (delay) {
 			intel_engine_unpark_heartbeat(engine);
-		else
+		} else {
 			intel_engine_park_heartbeat(engine);
+			intel_engine_pulse(engine); /* recheck execution */
+		}
 		intel_engine_pm_put(engine);
 	}