diff mbox

[i-g-t,2/3] lib: Align ring measurement to timer

Message ID 20180530103314.30565-2-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson May 30, 2018, 10:33 a.m. UTC
After hitting the SIGINT from execbuf, wait until the next timer signal
before trying again. This aligns the start of the ioctl to the timer,
hopefully maximising the amount of time we have for processing before
the next signal -- trying to prevent the case where we are scheduled out
in the middle of processing and so hit the timer signal too early.

References: https://bugs.freedesktop.org/show_bug.cgi?id=106695
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 lib/i915/gem_ring.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Antonio Argenziano May 30, 2018, 5:30 p.m. UTC | #1
On 30/05/18 03:33, Chris Wilson wrote:
> After hitting the SIGINT from execbuf, wait until the next timer signal
> before trying again. This aligns the start of the ioctl to the timer,
> hopefully maximising the amount of time we have for processing before
> the next signal -- trying to prevent the case where we are scheduled out
> in the middle of processing and so hit the timer signal too early.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=106695
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Not sure I understand what is the sequence of events, is the problem we 
get a signal in the middle of a 'good' execbuf and exit the while loop 
prematurely? If so maybe we can also think of making the timer 'VIRTUAL' 
so that it would decrement only when the process is executing.

Thanks,
Antonio

> ---
>   lib/i915/gem_ring.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/lib/i915/gem_ring.c b/lib/i915/gem_ring.c
> index 7d64165eb..0c061000c 100644
> --- a/lib/i915/gem_ring.c
> +++ b/lib/i915/gem_ring.c
> @@ -96,6 +96,8 @@ __gem_measure_ring_inflight(int fd, unsigned int engine, enum measure_ring_flags
>   		if (last == count)
>   			break;
>   
> +		/* sleep until the next timer interrupt (woken on signal) */
> +		pause();

Does it cause any (sensible) slowdown?

Thanks,
Antonio

>   		last = count;
>   	} while (1);
>   
>
Chris Wilson May 30, 2018, 7:52 p.m. UTC | #2
Quoting Antonio Argenziano (2018-05-30 18:30:36)
> 
> 
> On 30/05/18 03:33, Chris Wilson wrote:
> > After hitting the SIGINT from execbuf, wait until the next timer signal
> > before trying again. This aligns the start of the ioctl to the timer,
> > hopefully maximising the amount of time we have for processing before
> > the next signal -- trying to prevent the case where we are scheduled out
> > in the middle of processing and so hit the timer signal too early.
> > 
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=106695
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Not sure I understand what is the sequence of events, is the problem we 
> get a signal in the middle of a 'good' execbuf and exit the while loop 
> prematurely? If so maybe we can also think of making the timer 'VIRTUAL' 
> so that it would decrement only when the process is executing.

If it's VIRTUAL it'll never fire when we wait for space (as being asleep
no user/sys time is consumed).

The only way I can explain 106695 would be with some very strange
scheduler behaviour, but even then it requires us to hit a path where we
actually check for a pending signal -- which should only happen when we
run out of ring space for this setup. Not even the device being wedged
(which it wasn't) would cause the ring to drain. Possibly going over 10s
and the cork being unplugged? Very stange.

> > ---
> >   lib/i915/gem_ring.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/lib/i915/gem_ring.c b/lib/i915/gem_ring.c
> > index 7d64165eb..0c061000c 100644
> > --- a/lib/i915/gem_ring.c
> > +++ b/lib/i915/gem_ring.c
> > @@ -96,6 +96,8 @@ __gem_measure_ring_inflight(int fd, unsigned int engine, enum measure_ring_flags
> >               if (last == count)
> >                       break;
> >   
> > +             /* sleep until the next timer interrupt (woken on signal) */
> > +             pause();
> 
> Does it cause any (sensible) slowdown?

Adds at most one timer interval, 10us. Ok, at a push 2 timer intervals
if it takes longer than first to setup the sleep.
-Chris
Antonio Argenziano May 31, 2018, 2:42 p.m. UTC | #3
On 30/05/18 12:52, Chris Wilson wrote:
> Quoting Antonio Argenziano (2018-05-30 18:30:36)
>>
>>
>> On 30/05/18 03:33, Chris Wilson wrote:
>>> After hitting the SIGINT from execbuf, wait until the next timer signal
>>> before trying again. This aligns the start of the ioctl to the timer,
>>> hopefully maximising the amount of time we have for processing before
>>> the next signal -- trying to prevent the case where we are scheduled out
>>> in the middle of processing and so hit the timer signal too early.
>>>
>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=106695
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> Not sure I understand what is the sequence of events, is the problem we
>> get a signal in the middle of a 'good' execbuf and exit the while loop
>> prematurely? If so maybe we can also think of making the timer 'VIRTUAL'
>> so that it would decrement only when the process is executing.
> 
> If it's VIRTUAL it'll never fire when we wait for space (as being asleep
> no user/sys time is consumed).
> 
> The only way I can explain 106695 would be with some very strange
> scheduler behaviour, but even then it requires us to hit a path where we
> actually check for a pending signal -- which should only happen when we
> run out of ring space for this setup. Not even the device being wedged
> (which it wasn't) would cause the ring to drain. Possibly going over 10s
> and the cork being unplugged? Very stange.

Just a bit concerned that we might be covering up some weird corner case 
where we are sleeping where we shouldn't.

But the patch does what advertised and seems sensible so:

Acked-by: Antonio Argenziano <antonio.argenziano@intel.com>

> 
>>> ---
>>>    lib/i915/gem_ring.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/lib/i915/gem_ring.c b/lib/i915/gem_ring.c
>>> index 7d64165eb..0c061000c 100644
>>> --- a/lib/i915/gem_ring.c
>>> +++ b/lib/i915/gem_ring.c
>>> @@ -96,6 +96,8 @@ __gem_measure_ring_inflight(int fd, unsigned int engine, enum measure_ring_flags
>>>                if (last == count)
>>>                        break;
>>>    
>>> +             /* sleep until the next timer interrupt (woken on signal) */
>>> +             pause();
>>
>> Does it cause any (sensible) slowdown?
> 
> Adds at most one timer interval, 10us. Ok, at a push 2 timer intervals
> if it takes longer than first to setup the sleep.
> -Chris
>
Chris Wilson May 31, 2018, 2:56 p.m. UTC | #4
Quoting Antonio Argenziano (2018-05-31 15:42:03)
> 
> 
> On 30/05/18 12:52, Chris Wilson wrote:
> > Quoting Antonio Argenziano (2018-05-30 18:30:36)
> >>
> >>
> >> On 30/05/18 03:33, Chris Wilson wrote:
> >>> After hitting the SIGINT from execbuf, wait until the next timer signal
> >>> before trying again. This aligns the start of the ioctl to the timer,
> >>> hopefully maximising the amount of time we have for processing before
> >>> the next signal -- trying to prevent the case where we are scheduled out
> >>> in the middle of processing and so hit the timer signal too early.
> >>>
> >>> References: https://bugs.freedesktop.org/show_bug.cgi?id=106695
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>
> >> Not sure I understand what is the sequence of events, is the problem we
> >> get a signal in the middle of a 'good' execbuf and exit the while loop
> >> prematurely? If so maybe we can also think of making the timer 'VIRTUAL'
> >> so that it would decrement only when the process is executing.
> > 
> > If it's VIRTUAL it'll never fire when we wait for space (as being asleep
> > no user/sys time is consumed).
> > 
> > The only way I can explain 106695 would be with some very strange
> > scheduler behaviour, but even then it requires us to hit a path where we
> > actually check for a pending signal -- which should only happen when we
> > run out of ring space for this setup. Not even the device being wedged
> > (which it wasn't) would cause the ring to drain. Possibly going over 10s
> > and the cork being unplugged? Very stange.
> 
> Just a bit concerned that we might be covering up some weird corner case 
> where we are sleeping where we shouldn't.

The bugs are exactly the opposite, where there's a signal pending and we
ignore it ;)

And ignore them we do. If one day someone cares signal latency, we have
a lot of explaining to do...

It's just worrying because the only signal_pending() check we expect to
hit is in wait-for-space (i915_request_wait to be precise); and that
should be consistent between calls to execbuf. However, it's not meant
to be a defining test of user behaviour, just exploiting the limitation
of the implementation to report said limitation. All that we must do
is to be sure that we don't over-report or else the callers will hang
during their setup and fail. Under reporting is a nuisance, but not a
huge issue.
-Chris
diff mbox

Patch

diff --git a/lib/i915/gem_ring.c b/lib/i915/gem_ring.c
index 7d64165eb..0c061000c 100644
--- a/lib/i915/gem_ring.c
+++ b/lib/i915/gem_ring.c
@@ -96,6 +96,8 @@  __gem_measure_ring_inflight(int fd, unsigned int engine, enum measure_ring_flags
 		if (last == count)
 			break;
 
+		/* sleep until the next timer interrupt (woken on signal) */
+		pause();
 		last = count;
 	} while (1);