
[v3] drm/i915/pmu: Reconstruct active state on starting busy-stats

Message ID 151576218683.24367.426104957116548694@mail.alporthouse.com (mailing list archive)
State New, archived

Commit Message

Chris Wilson Jan. 12, 2018, 1:03 p.m. UTC
Quoting Tvrtko Ursulin (2018-01-12 11:43:11)
> 
> On 12/01/2018 10:35, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-01-12 10:30:26)
> >>
> >>
> >> On 12/01/2018 09:51, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-01-12 09:40:40)
> >>>> So submit side doesn't work in either case, unless I am missing
> >>>> something. Would need the pair of port manipulation and context_in to be
> >>>> atomic.
> >>>
> >>> Sure, there is a small window with the result that we either never turn
> >>> off the stats, or turn them off one too early (and then recover if my
> >>> understanding of the underflow protection holds). The same problem as
> >>> existed before the reconstruction, except now the window is much
> >>> smaller. I'm not that scared by this (it should not explode, nor result
> >>> in hopelessly wrong stats) so it can wait until stats enabling doesn't
> >>> need to peek at execlists. I think you will need to postpone enabling
> >>> until the next context-switch if we wanted to avoid the atomics; except
> >>> that poses a problem with the igt expecting busy-start to be accurate. A
> >>> dilemma for later.
> >>
> >> My analysis was partially incorrect, yes, there is underflow protection
> >> already.
> >>
> >> But I don't see that there is a race window where we end up with
> >> permanent 100% busyness before the reconstruction patch. Where do you
> >> see that?
> >>
> >> The worst I see without the reconstruction is to miss the accounting of
> >> the batch currently in progress when stats get enabled. Which is a much
> >> less serious, totally ignorable event.
> > 
> > One is observable via pmu, the other not. If we fail to turn off
> > busy-stats accounting, nothing is lost except for a few wasted cycles.
> > Except on the next pmu, it starts from the previous cs instead of the
> > enabling -- but that is a problem that also exists with the second user.
> 
> I don't follow, I am talking about permanent 100%. Let me copy what I 
> wrote earlier:
> 
> port0 context complete
> context_out - not enabled, no-op
> stats enable - port0 busy, active = 1
> port0 clear
> submit
> context_in - active = 2 !!! BAD
> port0 set
> 
> That leads to permanent 100% until busy stats are disabled. I think that 
> is hugely less desirable than just failing to account for the currently 
> running batch, which was the case before reconstruction on enable, and 
> is 99.9% only a problem for IGTs.
> 
> Do you think there is a flaw in this analysis or something else?

No, I don't think it's a huge problem, an improbable race for which the
quick-and-dirty cure may be worse than the disease:

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 25360ce0353f..62d9ee9d45a6 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1980,16 +1980,22 @@ intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
  */
 int intel_enable_engine_stats(struct intel_engine_cs *engine)
 {
+       struct intel_engine_execlists *execlists = &engine->execlists;
        unsigned long flags;
+       int err = 0;
 
        if (!intel_engine_supports_stats(engine))
                return -ENODEV;
 
+       tasklet_disable(&execlists->tasklet);
        spin_lock_irqsave(&engine->stats.lock, flags);
-       if (engine->stats.enabled == ~0)
-               goto busy;
+
+       if (unlikely(engine->stats.enabled == ~0)) {
+               err = -EBUSY;
+               goto unlock;
+       }
+
        if (engine->stats.enabled++ == 0) {
-               struct intel_engine_execlists *execlists = &engine->execlists;
                const struct execlist_port *port = execlists->port;
                unsigned int num_ports = execlists_num_ports(execlists);
 
@@ -2004,14 +2010,12 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
                if (engine->stats.active)
                        engine->stats.start = engine->stats.enabled_at;
        }
-       spin_unlock_irqrestore(&engine->stats.lock, flags);
 
-       return 0;
-
-busy:
+unlock:
        spin_unlock_irqrestore(&engine->stats.lock, flags);
+       tasklet_enable(&execlists->tasklet);
 
-       return -EBUSY;
+       return err;
 }
 
 static ktime_t __intel_engine_get_busy_time(struct intel_engine_cs *engine)

using the tasklet control as a spinlock. Whereas the previous race did
not take much effort to observe.
-Chris

Comments

Tvrtko Ursulin Jan. 12, 2018, 1:19 p.m. UTC | #1
On 12/01/2018 13:03, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-12 11:43:11)
>>
>> On 12/01/2018 10:35, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-01-12 10:30:26)
>>>>
>>>>
>>>> On 12/01/2018 09:51, Chris Wilson wrote:
>>>>> Quoting Tvrtko Ursulin (2018-01-12 09:40:40)
>>>>>> So submit side doesn't work in either case, unless I am missing
>>>>>> something. Would need the pair of port manipulation and context_in to be
>>>>>> atomic.
>>>>>
>>>>> Sure, there is a small window with the result that we either never turn
>>>>> off the stats, or turn them off one too early (and then recover if my
>>>>> understanding of the underflow protection holds). The same problem as
>>>>> existed before the reconstruction, except now the window is much
>>>>> smaller. I'm not that scared by this (it should not explode, nor result
>>>>> in hopelessly wrong stats) so it can wait until stats enabling doesn't
>>>>> need to peek at execlists. I think you will need to postpone enabling
>>>>> until the next context-switch if we wanted to avoid the atomics; except
>>>>> that poses a problem with the igt expecting busy-start to be accurate. A
>>>>> dilemma for later.
>>>>
>>>> My analysis was partially incorrect, yes, there is underflow protection
>>>> already.
>>>>
>>>> But I don't see that there is a race window where we end up with
>>>> permanent 100% busyness before the reconstruction patch. Where do you
>>>> see that?
>>>>
>>>> The worst I see without the reconstruction is to miss the accounting of
>>>> the batch currently in progress when stats get enabled. Which is a much
>>>> less serious, totally ignorable event.
>>>
>>> One is observable via pmu, the other not. If we fail to turn off
>>> busy-stats accounting, nothing is lost except for a few wasted cycles.
>>> Except on the next pmu, it starts from the previous cs instead of the
>>> enabling -- but that is a problem that also exists with the second user.
>>
>> I don't follow, I am talking about permanent 100%. Let me copy what I
>> wrote earlier:
>>
>> port0 context complete
>> context_out - not enabled, no-op
>> stats enable - port0 busy, active = 1
>> port0 clear
>> submit
>> context_in - active = 2 !!! BAD
>> port0 set
>>
>> That leads to permanent 100% until busy stats are disabled. I think that
>> is hugely less desirable than just failing to account for the currently
>> running batch, which was the case before reconstruction on enable, and
>> is 99.9% only a problem for IGTs.
>>
>> Do you think there is a flaw in this analysis or something else?
> 
> No, I don't think it's a huge problem, an improbable race for which the
> quick-and-dirty cure may be worse than the disease:
> 
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 25360ce0353f..62d9ee9d45a6 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1980,16 +1980,22 @@ intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
>    */
>   int intel_enable_engine_stats(struct intel_engine_cs *engine)
>   {
> +       struct intel_engine_execlists *execlists = &engine->execlists;
>          unsigned long flags;
> +       int err = 0;
>   
>          if (!intel_engine_supports_stats(engine))
>                  return -ENODEV;
>   
> +       tasklet_disable(&execlists->tasklet);
>          spin_lock_irqsave(&engine->stats.lock, flags);
> -       if (engine->stats.enabled == ~0)
> -               goto busy;
> +
> +       if (unlikely(engine->stats.enabled == ~0)) {
> +               err = -EBUSY;
> +               goto unlock;
> +       }
> +
>          if (engine->stats.enabled++ == 0) {
> -               struct intel_engine_execlists *execlists = &engine->execlists;
>                  const struct execlist_port *port = execlists->port;
>                  unsigned int num_ports = execlists_num_ports(execlists);
>   
> @@ -2004,14 +2010,12 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
>                  if (engine->stats.active)
>                          engine->stats.start = engine->stats.enabled_at;
>          }
> -       spin_unlock_irqrestore(&engine->stats.lock, flags);
>   
> -       return 0;
> -
> -busy:
> +unlock:
>          spin_unlock_irqrestore(&engine->stats.lock, flags);
> +       tasklet_enable(&execlists->tasklet);
>   
> -       return -EBUSY;
> +       return err;
>   }
>   
>   static ktime_t __intel_engine_get_busy_time(struct intel_engine_cs *engine)
> 
> using the tasklet control as a spinlock. Whereas the previous race did
> not take much effort to observe.

By the previous race you are referring to the missed currently running 
batch, or the permanent 100% with active state reconstruction?

You are against reverting the reconstruction? Even if it is unlikely, I 
see it as very severe, while the missed current batch is more likely but 
not at all severe. So I would really be in favour of the revert. Unfortunately I 
did not spot this issue during review. :(

Regards,

Tvrtko
Chris Wilson Jan. 12, 2018, 1:24 p.m. UTC | #2
Quoting Tvrtko Ursulin (2018-01-12 13:19:30)
> 
> On 12/01/2018 13:03, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-01-12 11:43:11)
> >>
> >> On 12/01/2018 10:35, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-01-12 10:30:26)
> >>>>
> >>>>
> >>>> On 12/01/2018 09:51, Chris Wilson wrote:
> >>>>> Quoting Tvrtko Ursulin (2018-01-12 09:40:40)
> >>>>>> So submit side doesn't work in either case, unless I am missing
> >>>>>> something. Would need the pair of port manipulation and context_in to be
> >>>>>> atomic.
> >>>>>
> >>>>> Sure, there is a small window with the result that we either never turn
> >>>>> off the stats, or turn them off one too early (and then recover if my
> >>>>> understanding of the underflow protection holds). The same problem as
> >>>>> existed before the reconstruction, except now the window is much
> >>>>> smaller. I'm not that scared by this (it should not explode, nor result
> >>>>> in hopelessly wrong stats) so it can wait until stats enabling doesn't
> >>>>> need to peek at execlists. I think you will need to postpone enabling
> >>>>> until the next context-switch if we wanted to avoid the atomics; except
> >>>>> that poses a problem with the igt expecting busy-start to be accurate. A
> >>>>> dilemma for later.
> >>>>
> >>>> My analysis was partially incorrect, yes, there is underflow protection
> >>>> already.
> >>>>
> >>>> But I don't see that there is a race window where we end up with
> >>>> permanent 100% busyness before the reconstruction patch. Where do you
> >>>> see that?
> >>>>
> >>>> The worst I see without the reconstruction is to miss the accounting of
> >>>> the batch currently in progress when stats get enabled. Which is a much
> >>>> less serious, totally ignorable event.
> >>>
> >>> One is observable via pmu, the other not. If we fail to turn off
> >>> busy-stats accounting, nothing is lost except for a few wasted cycles.
> >>> Except on the next pmu, it starts from the previous cs instead of the
> >>> enabling -- but that is a problem that also exists with the second user.
> >>
> >> I don't follow, I am talking about permanent 100%. Let me copy what I
> >> wrote earlier:
> >>
> >> port0 context complete
> >> context_out - not enabled, no-op
> >> stats enable - port0 busy, active = 1
> >> port0 clear
> >> submit
> >> context_in - active = 2 !!! BAD
> >> port0 set
> >>
> >> That leads to permanent 100% until busy stats are disabled. I think that
> >> is hugely less desirable than just failing to account for the currently
> >> running batch, which was the case before reconstruction on enable, and
> >> is 99.9% only a problem for IGTs.
> >>
> >> Do you think there is a flaw in this analysis or something else?
> > 
> > No, I don't think it's a huge problem, an improbable race for which the
> > quick-and-dirty cure may be worse than the disease:
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 25360ce0353f..62d9ee9d45a6 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -1980,16 +1980,22 @@ intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
> >    */
> >   int intel_enable_engine_stats(struct intel_engine_cs *engine)
> >   {
> > +       struct intel_engine_execlists *execlists = &engine->execlists;
> >          unsigned long flags;
> > +       int err = 0;
> >   
> >          if (!intel_engine_supports_stats(engine))
> >                  return -ENODEV;
> >   
> > +       tasklet_disable(&execlists->tasklet);
> >          spin_lock_irqsave(&engine->stats.lock, flags);
> > -       if (engine->stats.enabled == ~0)
> > -               goto busy;
> > +
> > +       if (unlikely(engine->stats.enabled == ~0)) {
> > +               err = -EBUSY;
> > +               goto unlock;
> > +       }
> > +
> >          if (engine->stats.enabled++ == 0) {
> > -               struct intel_engine_execlists *execlists = &engine->execlists;
> >                  const struct execlist_port *port = execlists->port;
> >                  unsigned int num_ports = execlists_num_ports(execlists);
> >   
> > @@ -2004,14 +2010,12 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
> >                  if (engine->stats.active)
> >                          engine->stats.start = engine->stats.enabled_at;
> >          }
> > -       spin_unlock_irqrestore(&engine->stats.lock, flags);
> >   
> > -       return 0;
> > -
> > -busy:
> > +unlock:
> >          spin_unlock_irqrestore(&engine->stats.lock, flags);
> > +       tasklet_enable(&execlists->tasklet);
> >   
> > -       return -EBUSY;
> > +       return err;
> >   }
> >   
> >   static ktime_t __intel_engine_get_busy_time(struct intel_engine_cs *engine)
> > 
> > using the tasklet control as a spinlock. Whereas the previous race did
> > not take much effort to observe.
> 
> By the previous race you are referring to the missed currently running 
> batch, or the permanent 100% with active state reconstruction?
> 
> You are against reverting the reconstruction? Even if it is unlikely, I 
> see it as very severe, while the missed current batch is more likely but 
> not at all severe. So I would be really for the revert. Unfortunately I 
> did not spot this issue during review. :(

I am against it since we can demonstrate one race very easily in igt,
while the other requires a stress test to be written, by which time I
expect it to be fixed. We both saw it as a temporary measure as it is
poking in places it shouldn't, and the above patch would provide the
protection it needs.
-Chris
Tvrtko Ursulin Jan. 12, 2018, 2:12 p.m. UTC | #3
On 12/01/2018 13:24, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-12 13:19:30)
>>
>> On 12/01/2018 13:03, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-01-12 11:43:11)
>>>>
>>>> On 12/01/2018 10:35, Chris Wilson wrote:
>>>>> Quoting Tvrtko Ursulin (2018-01-12 10:30:26)
>>>>>>
>>>>>>
>>>>>> On 12/01/2018 09:51, Chris Wilson wrote:
>>>>>>> Quoting Tvrtko Ursulin (2018-01-12 09:40:40)
>>>>>>>> So submit side doesn't work in either case, unless I am missing
>>>>>>>> something. Would need the pair of port manipulation and context_in to be
>>>>>>>> atomic.
>>>>>>>
>>>>>>> Sure, there is a small window with the result that we either never turn
>>>>>>> off the stats, or turn them off one too early (and then recover if my
>>>>>>> understanding of the underflow protection holds). The same problem as
>>>>>>> existed before the reconstruction, except now the window is much
>>>>>>> smaller. I'm not that scared by this (it should not explode, nor result
>>>>>>> in hopelessly wrong stats) so it can wait until stats enabling doesn't
>>>>>>> need to peek at execlists. I think you will need to postpone enabling
>>>>>>> until the next context-switch if we wanted to avoid the atomics; except
>>>>>>> that poses a problem with the igt expecting busy-start to be accurate. A
>>>>>>> dilemma for later.
>>>>>>
>>>>>> My analysis was partially incorrect, yes, there is underflow protection
>>>>>> already.
>>>>>>
>>>>>> But I don't see that there is a race window where we end up with
>>>>>> permanent 100% busyness before the reconstruction patch. Where do you
>>>>>> see that?
>>>>>>
>>>>>> The worst I see without the reconstruction is to miss the accounting of
>>>>>> the batch currently in progress when stats get enabled. Which is a much
>>>>>> less serious, totally ignorable event.
>>>>>
>>>>> One is observable via pmu, the other not. If we fail to turn off
>>>>> busy-stats accounting, nothing is lost except for a few wasted cycles.
>>>>> Except on the next pmu, it starts from the previous cs instead of the
>>>>> enabling -- but that is a problem that also exists with the second user.
>>>>
>>>> I don't follow, I am talking about permanent 100%. Let me copy what I
>>>> wrote earlier:
>>>>
>>>> port0 context complete
>>>> context_out - not enabled, no-op
>>>> stats enable - port0 busy, active = 1
>>>> port0 clear
>>>> submit
>>>> context_in - active = 2 !!! BAD
>>>> port0 set
>>>>
>>>> That leads to permanent 100% until busy stats are disabled. I think that
>>>> is hugely less desirable than just failing to account for the currently
>>>> running batch, which was the case before reconstruction on enable, and
>>>> is 99.9% only a problem for IGTs.
>>>>
>>>> Do you think there is a flaw in this analysis or something else?
>>>
>>> No, I don't think it's a huge problem, an improbable race for which the
>>> quick-and-dirty cure may be worse than the disease:
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
>>> index 25360ce0353f..62d9ee9d45a6 100644
>>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>>> @@ -1980,16 +1980,22 @@ intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
>>>     */
>>>    int intel_enable_engine_stats(struct intel_engine_cs *engine)
>>>    {
>>> +       struct intel_engine_execlists *execlists = &engine->execlists;
>>>           unsigned long flags;
>>> +       int err = 0;
>>>    
>>>           if (!intel_engine_supports_stats(engine))
>>>                   return -ENODEV;
>>>    
>>> +       tasklet_disable(&execlists->tasklet);
>>>           spin_lock_irqsave(&engine->stats.lock, flags);
>>> -       if (engine->stats.enabled == ~0)
>>> -               goto busy;
>>> +
>>> +       if (unlikely(engine->stats.enabled == ~0)) {
>>> +               err = -EBUSY;
>>> +               goto unlock;
>>> +       }
>>> +
>>>           if (engine->stats.enabled++ == 0) {
>>> -               struct intel_engine_execlists *execlists = &engine->execlists;
>>>                   const struct execlist_port *port = execlists->port;
>>>                   unsigned int num_ports = execlists_num_ports(execlists);
>>>    
>>> @@ -2004,14 +2010,12 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
>>>                   if (engine->stats.active)
>>>                           engine->stats.start = engine->stats.enabled_at;
>>>           }
>>> -       spin_unlock_irqrestore(&engine->stats.lock, flags);
>>>    
>>> -       return 0;
>>> -
>>> -busy:
>>> +unlock:
>>>           spin_unlock_irqrestore(&engine->stats.lock, flags);
>>> +       tasklet_enable(&execlists->tasklet);
>>>    
>>> -       return -EBUSY;
>>> +       return err;
>>>    }
>>>    
>>>    static ktime_t __intel_engine_get_busy_time(struct intel_engine_cs *engine)
>>>
>>> using the tasklet control as a spinlock. Whereas the previous race did
>>> not take much effort to observe.
>>
>> By the previous race you are referring to the missed currently running
>> batch, or the permanent 100% with active state reconstruction?
>>
>> You are against reverting the reconstruction? Even if it is unlikely, I
>> see it as very severe, while the missed current batch is more likely but
> not at all severe. So I would really be in favour of the revert. Unfortunately I
>> did not spot this issue during review. :(
> 
> I am against it since we can demonstrate one race very easily in igt,
> while the other requires a stress test to be written, by which time I
> expect it to be fixed. We both saw it as a temporary measure as it is
> poking in places it shouldn't, and the above patch would provide the
> protection it needs.

The solution with tasklet_disable looks fine to me; I just thought you were
against that as well, since you said it may be worse than the disease.

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 25360ce0353f..62d9ee9d45a6 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1980,16 +1980,22 @@ intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
  */
 int intel_enable_engine_stats(struct intel_engine_cs *engine)
 {
+       struct intel_engine_execlists *execlists = &engine->execlists;
        unsigned long flags;
+       int err = 0;
 
        if (!intel_engine_supports_stats(engine))
                return -ENODEV;
 
+       tasklet_disable(&execlists->tasklet);
        spin_lock_irqsave(&engine->stats.lock, flags);
-       if (engine->stats.enabled == ~0)
-               goto busy;
+
+       if (unlikely(engine->stats.enabled == ~0)) {
+               err = -EBUSY;
+               goto unlock;
+       }
+
        if (engine->stats.enabled++ == 0) {
-               struct intel_engine_execlists *execlists = &engine->execlists;
                const struct execlist_port *port = execlists->port;
                unsigned int num_ports = execlists_num_ports(execlists);
 
@@ -2004,14 +2010,12 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
                if (engine->stats.active)
                        engine->stats.start = engine->stats.enabled_at;
        }
-       spin_unlock_irqrestore(&engine->stats.lock, flags);
 
-       return 0;
-
-busy:
+unlock:
        spin_unlock_irqrestore(&engine->stats.lock, flags);
+       tasklet_enable(&execlists->tasklet);
 
-       return -EBUSY;
+       return err;
 }
 
 static ktime_t __intel_engine_get_busy_time(struct intel_engine_cs *engine)