Message ID | 151576218683.24367.426104957116548694@mail.alporthouse.com (mailing list archive)
---|---
State | New, archived
On 12/01/2018 13:03, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-12 11:43:11)
>>
>> On 12/01/2018 10:35, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-01-12 10:30:26)
>>>>
>>>> On 12/01/2018 09:51, Chris Wilson wrote:
>>>>> Quoting Tvrtko Ursulin (2018-01-12 09:40:40)
>>>>>> So submit side doesn't work in either case, unless I am missing
>>>>>> something. Would need the pair of port manipulation and context_in to be
>>>>>> atomic.
>>>>>
>>>>> Sure, there is a small window with the result that we either never turn
>>>>> off the stats, or turn them off one too early (and then recover, if my
>>>>> understanding of the underflow protection holds). The same problem as
>>>>> existed before the reconstruction, except now the window is much
>>>>> smaller. I'm not that scared by this (it should not explode, nor result
>>>>> in hopelessly wrong stats) so it can wait until stats enabling doesn't
>>>>> need to peek at execlists. I think you will need to postpone enabling
>>>>> until the next context-switch if we wanted to avoid the atomics; except
>>>>> that poses a problem with the igt expecting busy-start to be accurate. A
>>>>> dilemma for later.
>>>>
>>>> My analysis was partially incorrect, yes, there is underflow protection
>>>> already.
>>>>
>>>> But I don't see that there is a race window where we end up with
>>>> permanent 100% busyness before the reconstruction patch. Where do you
>>>> see that?
>>>>
>>>> The worst I see without the reconstruction is to miss the accounting of
>>>> the batch currently in progress when stats get enabled. Which is a much
>>>> less serious, totally ignorable event.
>>>
>>> One is observable via pmu, the other not. If we fail to turn off
>>> busy-stats accounting, nothing is lost except for a few wasted cycles.
>>> Except on the next pmu, it starts from the previous cs instead of the
>>> enabling -- but that is a problem that also exists with the second user.
>>
>> I don't follow, I am talking about permanent 100%. Let me copy what I
>> wrote earlier:
>>
>> port0 context complete
>> context_out - not enabled, no-op
>> stats enable - port0 busy, active = 1
>> port0 clear
>> submit
>> context_in - active = 2 !!! BAD
>> port0 set
>>
>> That leads to permanent 100% until busy stats are disabled. I think that
>> is hugely less desirable than just failing to account for the currently
>> running batch, which was the case before reconstruction on enable, and
>> is 99.9% only a problem for IGTs.
>>
>> Do you think there is a flaw in this analysis or something else?
>
> No, I don't think it's a huge problem, an improbable race for which the
> quick-and-dirty cure may be worse than the disease:
>
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 25360ce0353f..62d9ee9d45a6 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1980,16 +1980,22 @@ intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
>   */
>  int intel_enable_engine_stats(struct intel_engine_cs *engine)
>  {
> +	struct intel_engine_execlists *execlists = &engine->execlists;
>  	unsigned long flags;
> +	int err = 0;
>
>  	if (!intel_engine_supports_stats(engine))
>  		return -ENODEV;
>
> +	tasklet_disable(&execlists->tasklet);
>  	spin_lock_irqsave(&engine->stats.lock, flags);
> -	if (engine->stats.enabled == ~0)
> -		goto busy;
> +
> +	if (unlikely(engine->stats.enabled == ~0)) {
> +		err = -EBUSY;
> +		goto unlock;
> +	}
> +
>  	if (engine->stats.enabled++ == 0) {
> -		struct intel_engine_execlists *execlists = &engine->execlists;
>  		const struct execlist_port *port = execlists->port;
>  		unsigned int num_ports = execlists_num_ports(execlists);
>
> @@ -2004,14 +2010,12 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
>  		if (engine->stats.active)
>  			engine->stats.start = engine->stats.enabled_at;
>  	}
> -	spin_unlock_irqrestore(&engine->stats.lock, flags);
>
> -	return 0;
> -
> -busy:
> +unlock:
>  	spin_unlock_irqrestore(&engine->stats.lock, flags);
> +	tasklet_enable(&execlists->tasklet);
>
> -	return -EBUSY;
> +	return err;
>  }
>
>  static ktime_t __intel_engine_get_busy_time(struct intel_engine_cs *engine)
>
> using the tasklet control as a spinlock. Whereas the previous race did
> not take much effort to observe.

By the previous race you are referring to the missed currently running
batch, or the permanent 100% with active state reconstruction?

You are against reverting the reconstruction? Even if it is unlikely, I
see it as very severe, while the missed current batch is more likely but
not at all severe. So I would be really for the revert. Unfortunately I
did not spot this issue during review. :(

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2018-01-12 13:19:30)
> On 12/01/2018 13:03, Chris Wilson wrote:
> > No, I don't think it's a huge problem, an improbable race for which the
> > quick-and-dirty cure may be worse than the disease:
[snip]
> > using the tasklet control as a spinlock. Whereas the previous race did
> > not take much effort to observe.
>
> By the previous race you are referring to the missed currently running
> batch, or the permanent 100% with active state reconstruction?
>
> You are against reverting the reconstruction? Even if it is unlikely, I
> see it as very severe, while the missed current batch is more likely but
> not at all severe. So I would be really for the revert. Unfortunately I
> did not spot this issue during review. :(

I am against it since we can demonstrate one race very easily in igt,
and the other requires a stress test to be written, when I expect it to
be fixed. We both saw it as a temporary measure as it is poking in places
it shouldn't, and the above patch would provide the protection it needs.
-Chris
On 12/01/2018 13:24, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-12 13:19:30)
[snip]
>> By the previous race you are referring to the missed currently running
>> batch, or the permanent 100% with active state reconstruction?
>>
>> You are against reverting the reconstruction? Even if it is unlikely, I
>> see it as very severe, while the missed current batch is more likely but
>> not at all severe. So I would be really for the revert. Unfortunately I
>> did not spot this issue during review. :(
>
> I am against it since we can demonstrate one race very easily in igt,
> and the other requires a stress test to be written, when I expect it to
> be fixed. We both saw it as a temporary measure as it is poking in places
> it shouldn't, and the above patch would provide the protection it needs.

Solution with tasklet_disable looks fine to me, I just thought you were
against that as well - since you said "worse than the disease".

Regards,

Tvrtko
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 25360ce0353f..62d9ee9d45a6 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1980,16 +1980,22 @@ intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
  */
 int intel_enable_engine_stats(struct intel_engine_cs *engine)
 {
+	struct intel_engine_execlists *execlists = &engine->execlists;
 	unsigned long flags;
+	int err = 0;
 
 	if (!intel_engine_supports_stats(engine))
 		return -ENODEV;
 
+	tasklet_disable(&execlists->tasklet);
 	spin_lock_irqsave(&engine->stats.lock, flags);
-	if (engine->stats.enabled == ~0)
-		goto busy;
+
+	if (unlikely(engine->stats.enabled == ~0)) {
+		err = -EBUSY;
+		goto unlock;
+	}
+
 	if (engine->stats.enabled++ == 0) {
-		struct intel_engine_execlists *execlists = &engine->execlists;
 		const struct execlist_port *port = execlists->port;
 		unsigned int num_ports = execlists_num_ports(execlists);
 
@@ -2004,14 +2010,12 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
 		if (engine->stats.active)
 			engine->stats.start = engine->stats.enabled_at;
 	}
-	spin_unlock_irqrestore(&engine->stats.lock, flags);
 
-	return 0;
-
-busy:
+unlock:
 	spin_unlock_irqrestore(&engine->stats.lock, flags);
+	tasklet_enable(&execlists->tasklet);
 
-	return -EBUSY;
+	return err;
 }
 
 static ktime_t __intel_engine_get_busy_time(struct intel_engine_cs *engine)