Message ID | 1369935788-19069-1-git-send-email-pbonzini@redhat.com (mailing list archive)
---|---
State | New, archived |
On Thu, May 30, 2013 at 07:43:07PM +0200, Paolo Bonzini wrote:
> This patch includes two fixes for SB:
>
> * the 3rd fixed counter ("ref cpu cycles") can sometimes report
>   less than the number of iterations
>
Is it documented? It is strange for an "architectural" counter to behave
differently on different architectures.

> * there is an 8th counter which causes out of bounds accesses
>   to gp_event or check_counters_many's cnt array
>
> There is still a bug in KVM, because the "pmu all counters-0"
> test fails. (It passes if you use any 6 of the 8 gp counters,
> fails if you use 7 or 8.)
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  x86/pmu.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/x86/pmu.c b/x86/pmu.c
> index 2c46f31..dca753a 100644
> --- a/x86/pmu.c
> +++ b/x86/pmu.c
> @@ -88,9 +88,10 @@ struct pmu_event {
>  }, fixed_events[] = {
>  	{"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
>  	{"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
> -	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 1*N, 30*N}
> +	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
>  };
>
> +static int num_counters;
>  static int tests, failures;
>
>  char *buf;
> @@ -237,7 +238,7 @@ static void check_gp_counter(struct pmu_event *evt)
>  	};
>  	int i;
>
> -	for (i = 0; i < eax.split.num_counters; i++, cnt.ctr++) {
> +	for (i = 0; i < num_counters; i++, cnt.ctr++) {
>  		cnt.count = 0;
>  		measure(&cnt, 1);
>  		report(evt->name, i, verify_event(cnt.count, evt));
> @@ -276,7 +277,7 @@ static void check_counters_many(void)
>  	pmu_counter_t cnt[10];
>  	int i, n;
>
> -	for (i = 0, n = 0; n < eax.split.num_counters; i++) {
> +	for (i = 0, n = 0; n < num_counters; i++) {
>  		if (ebx.full & (1 << i))
>  			continue;
>
> @@ -316,10 +317,10 @@ static void check_counter_overflow(void)
>  	/* clear status before test */
>  	wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL, rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
>
> -	for (i = 0; i < eax.split.num_counters + 1; i++, cnt.ctr++) {
> +	for (i = 0; i < num_counters + 1; i++, cnt.ctr++) {
>  		uint64_t status;
>  		int idx;
> -		if (i == eax.split.num_counters)
> +		if (i == num_counters)
>  			cnt.ctr = fixed_events[0].unit_sel;
>  		if (i % 2)
>  			cnt.config |= EVNTSEL_INT;
> @@ -355,7 +356,7 @@ static void check_rdpmc(void)
>  	uint64_t val = 0x1f3456789ull;
>  	int i;
>
> -	for (i = 0; i < eax.split.num_counters; i++) {
> +	for (i = 0; i < num_counters; i++) {
>  		uint64_t x = (val & 0xffffffff) |
>  			((1ull << (eax.split.bit_width - 32)) - 1) << 32;
>  		wrmsr(MSR_IA32_PERFCTR0 + i, val);
> @@ -395,6 +396,14 @@ int main(int ac, char **av)
>  	printf("Fixed counters: %d\n", edx.split.num_counters_fixed);
>  	printf("Fixed counter width: %d\n", edx.split.bit_width_fixed);
>
> +	num_counters = eax.split.num_counters;
> +	if (num_counters > ARRAY_SIZE(gp_events))
> +		num_counters = ARRAY_SIZE(gp_events);
> +
>  	apic_write(APIC_LVTPC, PC_VECTOR);
>
>  	check_gp_counters();
> --
> 1.8.2.1

--
	Gleb.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 02/06/2013 17:32, Gleb Natapov wrote:
> On Thu, May 30, 2013 at 07:43:07PM +0200, Paolo Bonzini wrote:
>> This patch includes two fixes for SB:
>>
>> * the 3rd fixed counter ("ref cpu cycles") can sometimes report
>>   less than the number of iterations
>>
> Is it documented? It is strange for an "architectural" counter to behave
> differently on different architectures.

It just counts the CPU cycles. If the CPU can optimize the loop better,
it will take fewer CPU cycles to execute it.

Paolo
On Mon, Jun 03, 2013 at 08:33:13AM +0200, Paolo Bonzini wrote:
> On 02/06/2013 17:32, Gleb Natapov wrote:
>> On Thu, May 30, 2013 at 07:43:07PM +0200, Paolo Bonzini wrote:
>>> This patch includes two fixes for SB:
>>>
>>> * the 3rd fixed counter ("ref cpu cycles") can sometimes report
>>>   less than the number of iterations
>>>
>> Is it documented? It is strange for an "architectural" counter to behave
>> differently on different architectures.
>
> It just counts the CPU cycles. If the CPU can optimize the loop better,
> it will take fewer CPU cycles to execute it.
>
We should try to change the loop so that it will not be so easily
optimized. Making the test succeed if only 10% of the cycles were spent
in the loop may result in the test missing the case where the counter
counts something different.

--
	Gleb.
On 03/06/2013 08:38, Gleb Natapov wrote:
> On Mon, Jun 03, 2013 at 08:33:13AM +0200, Paolo Bonzini wrote:
>> On 02/06/2013 17:32, Gleb Natapov wrote:
>>> On Thu, May 30, 2013 at 07:43:07PM +0200, Paolo Bonzini wrote:
>>>> This patch includes two fixes for SB:
>>>>
>>>> * the 3rd fixed counter ("ref cpu cycles") can sometimes report
>>>>   less than the number of iterations
>>>>
>>> Is it documented? It is strange for an "architectural" counter to behave
>>> differently on different architectures.
>>
>> It just counts the CPU cycles. If the CPU can optimize the loop better,
>> it will take fewer CPU cycles to execute it.
>>
> We should try to change the loop so that it will not be so easily
> optimized. Making the test succeed if only 10% of the cycles were spent
> in the loop may result in the test missing the case where the counter
> counts something different.

Any hard-to-optimize loop risks becoming wrong on the other side (e.g.
if something stalls the pipeline, a newer chip with a longer pipeline
will use more CPU cycles).

Turbo boost could also contribute to lowering the number of cycles; a
boosted processor has ref cpu cycles that are _longer_ than the regular
cycles (thus they count in smaller numbers). Maybe that's why "core
cycles" didn't go below N.

The real result was something like 0.8*N (780000-830000). I used 0.1*N
because it is used for the "ref cpu cycles" gp counter, which is not the
same but similar. Should I change it to 0.5*N or so?

Paolo
On Mon, Jun 03, 2013 at 09:08:46AM +0200, Paolo Bonzini wrote:
> On 03/06/2013 08:38, Gleb Natapov wrote:
>> On Mon, Jun 03, 2013 at 08:33:13AM +0200, Paolo Bonzini wrote:
>>> On 02/06/2013 17:32, Gleb Natapov wrote:
>>>> On Thu, May 30, 2013 at 07:43:07PM +0200, Paolo Bonzini wrote:
>>>>> This patch includes two fixes for SB:
>>>>>
>>>>> * the 3rd fixed counter ("ref cpu cycles") can sometimes report
>>>>>   less than the number of iterations
>>>>>
>>>> Is it documented? It is strange for an "architectural" counter to behave
>>>> differently on different architectures.
>>>
>>> It just counts the CPU cycles. If the CPU can optimize the loop better,
>>> it will take fewer CPU cycles to execute it.
>>>
>> We should try to change the loop so that it will not be so easily
>> optimized. Making the test succeed if only 10% of the cycles were spent
>> in the loop may result in the test missing the case where the counter
>> counts something different.
>
> Any hard-to-optimize loop risks becoming wrong on the other side (e.g.
> if something stalls the pipeline, a newer chip with a longer pipeline
> will use more CPU cycles).
>
> Turbo boost could also contribute to lowering the number of cycles; a
> boosted processor has ref cpu cycles that are _longer_ than the regular
> cycles (thus they count in smaller numbers). Maybe that's why "core
> cycles" didn't go below N.
>
"core cycles" are subject to Turbo boost changes, not ref cycles. Since
instructions are executed at the core frequency, the ref cpu cycles
count may indeed be smaller.

> The real result was something like 0.8*N (780000-830000). I used 0.1*N
> because it is used for the "ref cpu cycles" gp counter, which is not
> the same but similar. Should I change it to 0.5*N or so?
>
For CPUs with constant_tsc they should be the same. OK, let's make gp
and fixed use the same boundaries.

--
	Gleb.
On 03/06/2013 09:38, Gleb Natapov wrote:
>> Turbo boost could also contribute to lowering the number of cycles; a
>> boosted processor has ref cpu cycles that are _longer_ than the regular
>> cycles (thus they count in smaller numbers). Maybe that's why "core
>> cycles" didn't go below N.
>>
> "core cycles" are subject to Turbo boost changes, not ref cycles. Since
> instructions are executed at the core frequency, the ref cpu cycles
> count may indeed be smaller.

Yes, that's what I was trying to say. :)

Paolo
diff --git a/x86/pmu.c b/x86/pmu.c
index 2c46f31..dca753a 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -88,9 +88,10 @@ struct pmu_event {
 }, fixed_events[] = {
 	{"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
 	{"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
-	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 1*N, 30*N}
+	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
 };
 
+static int num_counters;
 static int tests, failures;
 
 char *buf;
@@ -237,7 +238,7 @@ static void check_gp_counter(struct pmu_event *evt)
 	};
 	int i;
 
-	for (i = 0; i < eax.split.num_counters; i++, cnt.ctr++) {
+	for (i = 0; i < num_counters; i++, cnt.ctr++) {
 		cnt.count = 0;
 		measure(&cnt, 1);
 		report(evt->name, i, verify_event(cnt.count, evt));
@@ -276,7 +277,7 @@ static void check_counters_many(void)
 	pmu_counter_t cnt[10];
 	int i, n;
 
-	for (i = 0, n = 0; n < eax.split.num_counters; i++) {
+	for (i = 0, n = 0; n < num_counters; i++) {
 		if (ebx.full & (1 << i))
 			continue;
 
@@ -316,10 +317,10 @@ static void check_counter_overflow(void)
 	/* clear status before test */
 	wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL, rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
 
-	for (i = 0; i < eax.split.num_counters + 1; i++, cnt.ctr++) {
+	for (i = 0; i < num_counters + 1; i++, cnt.ctr++) {
 		uint64_t status;
 		int idx;
-		if (i == eax.split.num_counters)
+		if (i == num_counters)
 			cnt.ctr = fixed_events[0].unit_sel;
 		if (i % 2)
 			cnt.config |= EVNTSEL_INT;
@@ -355,7 +356,7 @@ static void check_rdpmc(void)
 	uint64_t val = 0x1f3456789ull;
 	int i;
 
-	for (i = 0; i < eax.split.num_counters; i++) {
+	for (i = 0; i < num_counters; i++) {
 		uint64_t x = (val & 0xffffffff) |
 			((1ull << (eax.split.bit_width - 32)) - 1) << 32;
 		wrmsr(MSR_IA32_PERFCTR0 + i, val);
@@ -395,6 +396,14 @@ int main(int ac, char **av)
 	printf("Fixed counters: %d\n", edx.split.num_counters_fixed);
 	printf("Fixed counter width: %d\n", edx.split.bit_width_fixed);
 
+	num_counters = eax.split.num_counters;
+	if (num_counters > ARRAY_SIZE(gp_events))
+		num_counters = ARRAY_SIZE(gp_events);
+
 	apic_write(APIC_LVTPC, PC_VECTOR);
 
 	check_gp_counters();
-- 
1.8.2.1
This patch includes two fixes for SB:

* the 3rd fixed counter ("ref cpu cycles") can sometimes report
  less than the number of iterations

* there is an 8th counter which causes out of bounds accesses
  to gp_event or check_counters_many's cnt array

There is still a bug in KVM, because the "pmu all counters-0"
test fails. (It passes if you use any 6 of the 8 gp counters,
fails if you use 7 or 8.)

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 x86/pmu.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)