| Message ID | 20180808145945.26159-1-chris@chris-wilson.co.uk (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [i-g-t,1/2] igt/perf_pmu: Aim for a fixed number of iterations for calibrating accuracy |
On 08/08/2018 15:59, Chris Wilson wrote:
> Our observation is that the systematic error is proportional to the
> number of iterations we perform; the suspicion is that it directly
> correlates with the number of sleeps. Reduce the number of iterations,
> to try and keep the error in check.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  tests/perf_pmu.c | 34 +++++++++++++++++++++-------------
>  1 file changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index 9a20abb6b..5a26d5272 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -1521,14 +1521,13 @@ static void __rearm_spin_batch(igt_spin_t *spin)
>
>  static void
>  accuracy(int gem_fd, const struct intel_execution_engine2 *e,
> -         unsigned long target_busy_pct)
> +         unsigned long target_busy_pct,
> +         unsigned long target_iters)
>  {
> -        unsigned long busy_us = 10000 - 100 * (1 + abs(50 - target_busy_pct));
> -        unsigned long idle_us = 100 * (busy_us - target_busy_pct *
> -                                       busy_us / 100) / target_busy_pct;
>          const unsigned long min_test_us = 1e6;
> -        const unsigned long pwm_calibration_us = min_test_us;
> -        const unsigned long test_us = min_test_us;
> +        unsigned long pwm_calibration_us;
> +        unsigned long test_us;
> +        unsigned long cycle_us, busy_us, idle_us;
>          double busy_r, expected;
>          uint64_t val[2];
>          uint64_t ts[2];
> @@ -1538,18 +1537,27 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
>          /* Sampling platforms cannot reach the high accuracy criteria. */
>          igt_require(gem_has_execlists(gem_fd));
>
> -        while (idle_us < 2500) {
> +        /* Aim for approximately 100 iterations for calibration */
> +        cycle_us = min_test_us / target_iters;
> +        busy_us = cycle_us * target_busy_pct / 100;
> +        idle_us = cycle_us - busy_us;

2% load, 1s / 10 iters
cycles_us = 100ms
busy_us = 2ms
idle_us = 98ms
...

> +
> +        while (idle_us < 2500 || busy_us < 2500) {
>                  busy_us *= 2;
>                  idle_us *= 2;

...
busy_us = 4ms
idle_us = 196ms

I fear here that even sampling timers will get it right with such a long
PWM cycle, so we would fail to notice that GuC mode is inaccurate for
real world workloads.

Okay, the question is what real world workloads look like.. are they
really typically shorter than 4ms batches? And what PWM cycle do we need
here to notice this? I had this empirically worked out to the values
that were previously used AFAIR, or perhaps there was some leeway.

Hmm.. I think we should finish the series with a patch to remove the
skip on !has_execlists so CI tells us?

Regards,

Tvrtko

>          }
> +        cycle_us = busy_us + idle_us;
> +        pwm_calibration_us = target_iters * cycle_us / 2;
> +        test_us = target_iters * cycle_us;
>
> -        igt_info("calibration=%lums, test=%lums; ratio=%.2f%% (%luus/%luus)\n",
> -                 pwm_calibration_us / 1000, test_us / 1000,
> -                 (double)busy_us / (busy_us + idle_us) * 100.0,
> +        igt_info("calibration=%lums, test=%lums, cycle=%lums; ratio=%.2f%% (%luus/%luus)\n",
> +                 pwm_calibration_us / 1000, test_us / 1000, cycle_us / 1000,
> +                 (double)busy_us / cycle_us * 100.0,
>                   busy_us, idle_us);
>
> -        assert_within_epsilon((double)busy_us / (busy_us + idle_us),
> -                              (double)target_busy_pct / 100.0, tolerance);
> +        assert_within_epsilon((double)busy_us / cycle_us,
> +                              (double)target_busy_pct / 100.0,
> +                              tolerance);
>
>          igt_assert(pipe(link) == 0);
>
> @@ -1796,7 +1804,7 @@ igt_main
>                  for (i = 0; i < ARRAY_SIZE(pct); i++) {
>                          igt_subtest_f("busy-accuracy-%u-%s",
>                                        pct[i], e->name)
> -                                accuracy(fd, e, pct[i]);
> +                                accuracy(fd, e, pct[i], 10);
>                  }
>
>                  igt_subtest_f("busy-hang-%s", e->name)
>
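[Editorial note: for readers following the arithmetic above, a minimal standalone C sketch of the new calibration computation, reproducing Tvrtko's 2% / 10-iteration example. It is not part of the patch or the thread; the variable names follow the patch, but main() and the printout are purely illustrative.]

/*
 * Illustrative sketch (not from the patch) of the new calibration
 * arithmetic, reproducing the 2% / 10-iteration walkthrough above.
 */
#include <stdio.h>

int main(void)
{
        const unsigned long min_test_us = 1e6;
        unsigned long target_busy_pct = 2, target_iters = 10;
        unsigned long cycle_us, busy_us, idle_us;

        cycle_us = min_test_us / target_iters;          /* 100000us */
        busy_us = cycle_us * target_busy_pct / 100;     /* 2000us   */
        idle_us = cycle_us - busy_us;                   /* 98000us  */

        /* Both phases must be long enough for a reliable sleep. */
        while (idle_us < 2500 || busy_us < 2500) {
                busy_us *= 2;
                idle_us *= 2;
        }
        cycle_us = busy_us + idle_us;

        /* 2%: busy=4000us, idle=196000us, cycle=200000us -> 2s of PWM */
        printf("busy=%luus idle=%luus cycle=%luus test=%lums\n",
               busy_us, idle_us, cycle_us,
               target_iters * cycle_us / 1000);
        return 0;
}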
Quoting Tvrtko Ursulin (2018-08-09 12:54:41)
>
> On 08/08/2018 15:59, Chris Wilson wrote:
> > Our observation is that the systematic error is proportional to the
> > number of iterations we perform; the suspicion is that it directly
> > correlates with the number of sleeps. Reduce the number of iterations,
> > to try and keep the error in check.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >  tests/perf_pmu.c | 34 +++++++++++++++++++++-------------
> >  1 file changed, 21 insertions(+), 13 deletions(-)
> >
> > diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> > index 9a20abb6b..5a26d5272 100644
> > --- a/tests/perf_pmu.c
> > +++ b/tests/perf_pmu.c
> > @@ -1521,14 +1521,13 @@ static void __rearm_spin_batch(igt_spin_t *spin)
> >
> >  static void
> >  accuracy(int gem_fd, const struct intel_execution_engine2 *e,
> > -         unsigned long target_busy_pct)
> > +         unsigned long target_busy_pct,
> > +         unsigned long target_iters)
> >  {
> > -        unsigned long busy_us = 10000 - 100 * (1 + abs(50 - target_busy_pct));
> > -        unsigned long idle_us = 100 * (busy_us - target_busy_pct *
> > -                                       busy_us / 100) / target_busy_pct;
> >          const unsigned long min_test_us = 1e6;
> > -        const unsigned long pwm_calibration_us = min_test_us;
> > -        const unsigned long test_us = min_test_us;
> > +        unsigned long pwm_calibration_us;
> > +        unsigned long test_us;
> > +        unsigned long cycle_us, busy_us, idle_us;
> >          double busy_r, expected;
> >          uint64_t val[2];
> >          uint64_t ts[2];
> > @@ -1538,18 +1537,27 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
> >          /* Sampling platforms cannot reach the high accuracy criteria. */
> >          igt_require(gem_has_execlists(gem_fd));
> >
> > -        while (idle_us < 2500) {
> > +        /* Aim for approximately 100 iterations for calibration */
> > +        cycle_us = min_test_us / target_iters;
> > +        busy_us = cycle_us * target_busy_pct / 100;
> > +        idle_us = cycle_us - busy_us;
>
> 2% load, 1s / 10 iters
> cycles_us = 100ms
> busy_us = 2ms
> idle_us = 98ms
> ...
>
> > +
> > +        while (idle_us < 2500 || busy_us < 2500) {
> >                  busy_us *= 2;
> >                  idle_us *= 2;
>
> ...
>
> busy_us = 4ms
> idle_us = 196ms

Currently it is 250ms per 98:2 cycle and about 20ms per 50:50 cycle. So
we are only doing 4 and 50 iterations respectively.

10 cycles is strictly an improvement :-p
-Chris
On 10/08/2018 14:25, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-09 12:54:41)
>>
>> On 08/08/2018 15:59, Chris Wilson wrote:
>>> Our observation is that the systematic error is proportional to the
>>> number of iterations we perform; the suspicion is that it directly
>>> correlates with the number of sleeps. Reduce the number of iterations,
>>> to try and keep the error in check.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>  tests/perf_pmu.c | 34 +++++++++++++++++++++-------------
>>>  1 file changed, 21 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
>>> index 9a20abb6b..5a26d5272 100644
>>> --- a/tests/perf_pmu.c
>>> +++ b/tests/perf_pmu.c
>>> @@ -1521,14 +1521,13 @@ static void __rearm_spin_batch(igt_spin_t *spin)
>>>
>>>  static void
>>>  accuracy(int gem_fd, const struct intel_execution_engine2 *e,
>>> -         unsigned long target_busy_pct)
>>> +         unsigned long target_busy_pct,
>>> +         unsigned long target_iters)
>>>  {
>>> -        unsigned long busy_us = 10000 - 100 * (1 + abs(50 - target_busy_pct));
>>> -        unsigned long idle_us = 100 * (busy_us - target_busy_pct *
>>> -                                       busy_us / 100) / target_busy_pct;
>>>          const unsigned long min_test_us = 1e6;
>>> -        const unsigned long pwm_calibration_us = min_test_us;
>>> -        const unsigned long test_us = min_test_us;
>>> +        unsigned long pwm_calibration_us;
>>> +        unsigned long test_us;
>>> +        unsigned long cycle_us, busy_us, idle_us;
>>>          double busy_r, expected;
>>>          uint64_t val[2];
>>>          uint64_t ts[2];
>>> @@ -1538,18 +1537,27 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
>>>          /* Sampling platforms cannot reach the high accuracy criteria. */
>>>          igt_require(gem_has_execlists(gem_fd));
>>>
>>> -        while (idle_us < 2500) {
>>> +        /* Aim for approximately 100 iterations for calibration */
>>> +        cycle_us = min_test_us / target_iters;
>>> +        busy_us = cycle_us * target_busy_pct / 100;
>>> +        idle_us = cycle_us - busy_us;
>>
>> 2% load, 1s / 10 iters
>> cycles_us = 100ms
>> busy_us = 2ms
>> idle_us = 98ms
>> ...
>>
>>> +
>>> +        while (idle_us < 2500 || busy_us < 2500) {
>>>                  busy_us *= 2;
>>>                  idle_us *= 2;
>>
>> ...
>>
>> busy_us = 4ms
>> idle_us = 196ms
>
> Currently it is 250ms per 98:2 cycle and about 20ms per 50:50 cycle. So
> we are only doing 4 and 50 iterations respectively.
>
> 10 cycles is strictly an improvement :-p

Hmm indeed. It seems I misremembered how it works. I'll re-read your
patches.

Regards,

Tvrtko
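[Editorial note: as a cross-check of the numbers Chris quotes, a similar illustrative sketch of the pre-patch calibration formula. It is adapted rather than copied from the old code (a signed percentage and labs() are used here for simplicity), and prints roughly a 255ms cycle at 2% load and a ~20ms cycle at 50%, i.e. about 4 and 50 iterations within the 1s window.]

/*
 * Sketch of the pre-patch calibration arithmetic, reproducing the cycle
 * lengths quoted in the thread. Not the exact original code.
 */
#include <stdio.h>
#include <stdlib.h>

static void old_cycle(long target_busy_pct)
{
        const unsigned long min_test_us = 1e6;
        unsigned long busy_us = 10000 - 100 * (1 + labs(50 - target_busy_pct));
        unsigned long idle_us = 100 * (busy_us - target_busy_pct *
                                       busy_us / 100) / target_busy_pct;

        /* Only the idle phase length was checked before the patch. */
        while (idle_us < 2500) {
                busy_us *= 2;
                idle_us *= 2;
        }

        /* 2%: ~255ms cycle -> ~4 iterations; 50%: ~20ms -> ~50 iterations */
        printf("%2ld%%: cycle=%lums, ~%.1f iterations in %lums\n",
               target_busy_pct, (busy_us + idle_us) / 1000,
               (double)min_test_us / (busy_us + idle_us),
               min_test_us / 1000);
}

int main(void)
{
        old_cycle(2);   /* ~255ms cycle, ~3.9 iterations */
        old_cycle(50);  /* ~19ms cycle, ~50.5 iterations */
        return 0;
}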
On 08/08/2018 15:59, Chris Wilson wrote:
> Our observation is that the systematic error is proportional to the
> number of iterations we perform; the suspicion is that it directly
> correlates with the number of sleeps. Reduce the number of iterations,
> to try and keep the error in check.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  tests/perf_pmu.c | 34 +++++++++++++++++++++-------------
>  1 file changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index 9a20abb6b..5a26d5272 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -1521,14 +1521,13 @@ static void __rearm_spin_batch(igt_spin_t *spin)
>
>  static void
>  accuracy(int gem_fd, const struct intel_execution_engine2 *e,
> -         unsigned long target_busy_pct)
> +         unsigned long target_busy_pct,
> +         unsigned long target_iters)
>  {
> -        unsigned long busy_us = 10000 - 100 * (1 + abs(50 - target_busy_pct));
> -        unsigned long idle_us = 100 * (busy_us - target_busy_pct *
> -                                       busy_us / 100) / target_busy_pct;
>          const unsigned long min_test_us = 1e6;
> -        const unsigned long pwm_calibration_us = min_test_us;
> -        const unsigned long test_us = min_test_us;
> +        unsigned long pwm_calibration_us;
> +        unsigned long test_us;
> +        unsigned long cycle_us, busy_us, idle_us;
>          double busy_r, expected;
>          uint64_t val[2];
>          uint64_t ts[2];
> @@ -1538,18 +1537,27 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
>          /* Sampling platforms cannot reach the high accuracy criteria. */
>          igt_require(gem_has_execlists(gem_fd));
>
> -        while (idle_us < 2500) {
> +        /* Aim for approximately 100 iterations for calibration */
> +        cycle_us = min_test_us / target_iters;
> +        busy_us = cycle_us * target_busy_pct / 100;
> +        idle_us = cycle_us - busy_us;
> +
> +        while (idle_us < 2500 || busy_us < 2500) {
>                  busy_us *= 2;
>                  idle_us *= 2;
>          }
> +        cycle_us = busy_us + idle_us;
> +        pwm_calibration_us = target_iters * cycle_us / 2;

I'd be tempted not to halve the calibration phase, just to minimize the
number of changes.

> +        test_us = target_iters * cycle_us;
>
> -        igt_info("calibration=%lums, test=%lums; ratio=%.2f%% (%luus/%luus)\n",
> -                 pwm_calibration_us / 1000, test_us / 1000,
> -                 (double)busy_us / (busy_us + idle_us) * 100.0,
> +        igt_info("calibration=%lums, test=%lums, cycle=%lums; ratio=%.2f%% (%luus/%luus)\n",
> +                 pwm_calibration_us / 1000, test_us / 1000, cycle_us / 1000,
> +                 (double)busy_us / cycle_us * 100.0,
>                   busy_us, idle_us);
>
> -        assert_within_epsilon((double)busy_us / (busy_us + idle_us),
> -                              (double)target_busy_pct / 100.0, tolerance);
> +        assert_within_epsilon((double)busy_us / cycle_us,
> +                              (double)target_busy_pct / 100.0,
> +                              tolerance);
>
>          igt_assert(pipe(link) == 0);
>
> @@ -1796,7 +1804,7 @@ igt_main
>                  for (i = 0; i < ARRAY_SIZE(pct); i++) {
>                          igt_subtest_f("busy-accuracy-%u-%s",
>                                        pct[i], e->name)
> -                                accuracy(fd, e, pct[i]);
> +                                accuracy(fd, e, pct[i], 10);
>                  }
>
>                  igt_subtest_f("busy-hang-%s", e->name)
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 9a20abb6b..5a26d5272 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -1521,14 +1521,13 @@ static void __rearm_spin_batch(igt_spin_t *spin)
 
 static void
 accuracy(int gem_fd, const struct intel_execution_engine2 *e,
-         unsigned long target_busy_pct)
+         unsigned long target_busy_pct,
+         unsigned long target_iters)
 {
-        unsigned long busy_us = 10000 - 100 * (1 + abs(50 - target_busy_pct));
-        unsigned long idle_us = 100 * (busy_us - target_busy_pct *
-                                       busy_us / 100) / target_busy_pct;
         const unsigned long min_test_us = 1e6;
-        const unsigned long pwm_calibration_us = min_test_us;
-        const unsigned long test_us = min_test_us;
+        unsigned long pwm_calibration_us;
+        unsigned long test_us;
+        unsigned long cycle_us, busy_us, idle_us;
         double busy_r, expected;
         uint64_t val[2];
         uint64_t ts[2];
@@ -1538,18 +1537,27 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
         /* Sampling platforms cannot reach the high accuracy criteria. */
         igt_require(gem_has_execlists(gem_fd));
 
-        while (idle_us < 2500) {
+        /* Aim for approximately 100 iterations for calibration */
+        cycle_us = min_test_us / target_iters;
+        busy_us = cycle_us * target_busy_pct / 100;
+        idle_us = cycle_us - busy_us;
+
+        while (idle_us < 2500 || busy_us < 2500) {
                 busy_us *= 2;
                 idle_us *= 2;
         }
+        cycle_us = busy_us + idle_us;
+        pwm_calibration_us = target_iters * cycle_us / 2;
+        test_us = target_iters * cycle_us;
 
-        igt_info("calibration=%lums, test=%lums; ratio=%.2f%% (%luus/%luus)\n",
-                 pwm_calibration_us / 1000, test_us / 1000,
-                 (double)busy_us / (busy_us + idle_us) * 100.0,
+        igt_info("calibration=%lums, test=%lums, cycle=%lums; ratio=%.2f%% (%luus/%luus)\n",
+                 pwm_calibration_us / 1000, test_us / 1000, cycle_us / 1000,
+                 (double)busy_us / cycle_us * 100.0,
                  busy_us, idle_us);
 
-        assert_within_epsilon((double)busy_us / (busy_us + idle_us),
-                              (double)target_busy_pct / 100.0, tolerance);
+        assert_within_epsilon((double)busy_us / cycle_us,
+                              (double)target_busy_pct / 100.0,
+                              tolerance);
 
         igt_assert(pipe(link) == 0);
 
@@ -1796,7 +1804,7 @@ igt_main
                 for (i = 0; i < ARRAY_SIZE(pct); i++) {
                         igt_subtest_f("busy-accuracy-%u-%s",
                                       pct[i], e->name)
-                                accuracy(fd, e, pct[i]);
+                                accuracy(fd, e, pct[i], 10);
                 }
 
                 igt_subtest_f("busy-hang-%s", e->name)
Our observation is that the systematic error is proportional to the
number of iterations we perform; the suspicion is that it directly
correlates with the number of sleeps. Reduce the number of iterations,
to try and keep the error in check.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)