Message ID | 20220623023157.211650-2-alan.previn.teres.alexis@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/guc: Don't update engine busyness stats too frequently | expand |
On 23/06/2022 03:31, Alan Previn wrote: > Using two different types of workoads, it was observed that > guc_update_engine_gt_clks was being called too frequently and/or > causing a CPU-to-lmem bandwidth hit over PCIE. Details on > the workloads and numbers are in the notes below. > > Background: At the moment, guc_update_engine_gt_clks can be invoked > via one of 3 ways. #1 and #2 are infrequent under normal operating > conditions: > 1.When a predefined "ping_delay" timer expires so that GuC- > busyness can sample the GTPM clock counter to ensure it > doesn't miss a wrap-around of the 32-bits of the HW counter. > (The ping_delay is calculated based on 1/8th the time taken > for the counter go from 0x0 to 0xffffffff based on the > GT frequency. This comes to about once every 28 seconds at a > GT frequency of 19.2Mhz). > 2.In preparation for a gt reset. > 3.In response to __gt_park events (as the gt power management > puts the gt into a lower power state when there is no work > being done). > > Root-cause: For both the workloads described farther below, it was > observed that when user space calls IOCTLs that unparks the > gt momentarily and repeats such calls many times in quick succession, > it triggers calling guc_update_engine_gt_clks as many times. However, > the primary purpose of guc_update_engine_gt_clks is to ensure we don't > miss the wraparound while the counter is ticking. Thus, the solution > is to ensure we skip that check if gt_park is calling this function > earlier than necessary. > > Solution: Snapshot jiffies when we do actually update the busyness > stats. Then get the new jiffies every time intel_guc_busyness_park > is called and bail if we are being called too soon. Use half of the > ping_delay as a safe threshold. > > NOTE1: Workload1: IGTs' gem_create was modified to create a file handle, > allocate memory with sizes that range from a min of 4K to the max supported > (in power of two step-sizes). Its maps, modifies and reads back the > memory. Allocations and modification is repeated until total memory > allocation reaches the max. Then the file handle is closed. With this > workload, guc_update_engine_gt_clks was called over 188 thousand times > in the span of 15 seconds while this test ran three times. With this patch, > the number of calls reduced to 14. > > NOTE2: Workload2: 30 transcode sessions are created in quick succession. > While these sessions are created, pcm-iio tool was used to measure I/O > read operation bandwidth consumption sampled at 100 milisecond intervals > over the course of 20 seconds. The total bandwidth consumed over 20 seconds > without this patch was measured at average at 311KBps per sample. With this > patch, the number went down to about 175Kbps which is about a 43% savings. > > Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> > Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Super detailed and clear - thank you! Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> One question below: > --- > drivers/gpu/drm/i915/gt/uc/intel_guc.h | 8 ++++++++ > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 +++++++++++++ > 2 files changed, 21 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > index 966e69a8b1c1..d0d99f178f2d 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > @@ -230,6 +230,14 @@ struct intel_guc { > * @shift: Right shift value for the gpm timestamp > */ > u32 shift; > + > + /** > + * @last_stat_jiffies: jiffies at last actual stats collection time > + * We use this timestamp to ensure we don't oversample the > + * stats because runtime power management events can trigger > + * stats collection at much higher rates than required. > + */ > + unsigned long last_stat_jiffies; > } timestamp; > > #ifdef CONFIG_DRM_I915_SELFTEST > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > index e62ea35513ea..c9f167b80910 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > @@ -1314,6 +1314,8 @@ static void __update_guc_busyness_stats(struct intel_guc *guc) > unsigned long flags; > ktime_t unused; > > + guc->timestamp.last_stat_jiffies = jiffies; > + > spin_lock_irqsave(&guc->timestamp.lock, flags); > > guc_update_pm_timestamp(guc, &unused); > @@ -1386,6 +1388,17 @@ void intel_guc_busyness_park(struct intel_gt *gt) > return; > > cancel_delayed_work(&guc->timestamp.work); > + > + /* > + * Before parking, we should sample engine busyness stats if we need to. > + * We can skip it if we are less than half a ping from the last time we > + * sampled the busyness stats. > + */ > + if (guc->timestamp.last_stat_jiffies && Is there a case where we enter park with last_stat_jiffies being unset and so skip updating, and if there is, is that a concern? If it is then see if it is safe just to not have the zero check. Worst that could happen is one extra sampling if either unset or overflows, right? Regards, Tvrtko > + !time_after(jiffies, guc->timestamp.last_stat_jiffies + > + (guc->timestamp.ping_delay / 2))) > + return; > + > __update_guc_busyness_stats(guc); > } >
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 966e69a8b1c1..d0d99f178f2d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -230,6 +230,14 @@ struct intel_guc { * @shift: Right shift value for the gpm timestamp */ u32 shift; + + /** + * @last_stat_jiffies: jiffies at last actual stats collection time + * We use this timestamp to ensure we don't oversample the + * stats because runtime power management events can trigger + * stats collection at much higher rates than required. + */ + unsigned long last_stat_jiffies; } timestamp; #ifdef CONFIG_DRM_I915_SELFTEST diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index e62ea35513ea..c9f167b80910 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1314,6 +1314,8 @@ static void __update_guc_busyness_stats(struct intel_guc *guc) unsigned long flags; ktime_t unused; + guc->timestamp.last_stat_jiffies = jiffies; + spin_lock_irqsave(&guc->timestamp.lock, flags); guc_update_pm_timestamp(guc, &unused); @@ -1386,6 +1388,17 @@ void intel_guc_busyness_park(struct intel_gt *gt) return; cancel_delayed_work(&guc->timestamp.work); + + /* + * Before parking, we should sample engine busyness stats if we need to. + * We can skip it if we are less than half a ping from the last time we + * sampled the busyness stats. + */ + if (guc->timestamp.last_stat_jiffies && + !time_after(jiffies, guc->timestamp.last_stat_jiffies + + (guc->timestamp.ping_delay / 2))) + return; + __update_guc_busyness_stats(guc); }