diff mbox series

[04/19] target/ppc: PMU Book3s basic insns count for pseries TCG

Message ID 20210809131057.1694145-5-danielhb413@gmail.com (mailing list archive)
State New, archived
Headers show
Series PMU-EBB support for PPC64 TCG | expand

Commit Message

Daniel Henrique Barboza Aug. 9, 2021, 1:10 p.m. UTC
The PMCC (PMC Control) bit in the MMCR0 register controls whether the
counters PMC5 and PMC6 are being part of the performance monitor
facility in a specific time. If PMCC allows it, PMC5 and PMC6 will
always be used to measure instructions completed and cycles,
respectively.

This patch adds the barebones of the Book3s PMU logic by enabling
instruction counting, using the icount framework, using the performance
monitor counters 5 and 6. The overall logic goes as follows:

- a helper is added to control the PMU state on each MMCR0 write. This
allows for the PMU to start/stop as quick as possible;

- only PMC5 and PMC6 are being set. PMC6 (cycles) is default to 4*insns
(for cycles per instruction) for now;

- the intended usage is to freeze the counters by setting MMCR0_FC, do
any additional setting via MMCR1 (not implemented yet) and setting
initial counter values,  and enable the PMU by zeroing MMCR0_FC. Software
must freeze counters to read the results - on the fly reading of the PMCs
will return the starting value of each one.

Since there will be more PMU exclusive code to be added next, let's also
put the PMU logic in its own helper to keep all in the same place. The
code is also repetitive and not really extensible to add more PMCs, but
we'll handle this in the next patches.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 target/ppc/cpu.h               |  4 ++
 target/ppc/cpu_init.c          |  4 +-
 target/ppc/helper.h            |  1 +
 target/ppc/meson.build         |  1 +
 target/ppc/pmu_book3s_helper.c | 78 ++++++++++++++++++++++++++++++++++
 target/ppc/translate.c         | 14 ++++--
 6 files changed, 97 insertions(+), 5 deletions(-)
 create mode 100644 target/ppc/pmu_book3s_helper.c

Comments

David Gibson Aug. 10, 2021, 3:39 a.m. UTC | #1
On Mon, Aug 09, 2021 at 10:10:42AM -0300, Daniel Henrique Barboza wrote:
> The PMCC (PMC Control) bit in the MMCR0 register controls whether the
> counters PMC5 and PMC6 are being part of the performance monitor
> facility in a specific time. If PMCC allows it, PMC5 and PMC6 will
> always be used to measure instructions completed and cycles,
> respectively.
> 
> This patch adds the barebones of the Book3s PMU logic by enabling
> instruction counting, using the icount framework, using the performance
> monitor counters 5 and 6. The overall logic goes as follows:
> 
> - a helper is added to control the PMU state on each MMCR0 write. This
> allows for the PMU to start/stop as quick as possible;

Um.. why does a helper accomplish that goal?

> 
> - only PMC5 and PMC6 are being set. PMC6 (cycles) is default to 4*insns
> (for cycles per instruction) for now;

What's the justification for that value?  With a superscalar core, I'd
expect average (median) cycles per instruction to be < 1 a lot of the
time.  Mean cycles per instruction could be higher since certain
instructions could take a lot.

> - the intended usage is to freeze the counters by setting MMCR0_FC, do
> any additional setting via MMCR1 (not implemented yet) and setting
> initial counter values,  and enable the PMU by zeroing MMCR0_FC. Software
> must freeze counters to read the results - on the fly reading of the PMCs
> will return the starting value of each one.

Is that the way hardware behaves?  Or is it just a limitation of this
software implementation?  Either is fine, we should just be clear on
what it is.

> 
> Since there will be more PMU exclusive code to be added next, let's also
> put the PMU logic in its own helper to keep all in the same place. The
> code is also repetitive and not really extensible to add more PMCs, but
> we'll handle this in the next patches.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  target/ppc/cpu.h               |  4 ++
>  target/ppc/cpu_init.c          |  4 +-
>  target/ppc/helper.h            |  1 +
>  target/ppc/meson.build         |  1 +
>  target/ppc/pmu_book3s_helper.c | 78 ++++++++++++++++++++++++++++++++++
>  target/ppc/translate.c         | 14 ++++--
>  6 files changed, 97 insertions(+), 5 deletions(-)
>  create mode 100644 target/ppc/pmu_book3s_helper.c
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 4d96015f81..229abfe7ee 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1175,6 +1175,10 @@ struct CPUPPCState {
>      uint32_t tm_vscr;
>      uint64_t tm_dscr;
>      uint64_t tm_tar;
> +
> +    /* PMU registers icount state */
> +    uint64_t pmc5_base_icount;
> +    uint64_t pmc6_base_icount;
>  };
>  
>  #define SET_FIT_PERIOD(a_, b_, c_, d_)          \
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 71062809c8..fce89ee994 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -6822,7 +6822,7 @@ static void register_book3s_pmu_sup_sprs(CPUPPCState *env)
>      spr_register_kvm(env, SPR_POWER_MMCR0, "MMCR0",
>                       SPR_NOACCESS, SPR_NOACCESS,
>                       &spr_read_pmu_generic, &spr_write_pmu_generic,
> -                     KVM_REG_PPC_MMCR0, 0x00000000);
> +                     KVM_REG_PPC_MMCR0, 0x80000000);
>      spr_register_kvm(env, SPR_POWER_MMCR1, "MMCR1",
>                       SPR_NOACCESS, SPR_NOACCESS,
>                       &spr_read_pmu_generic, &spr_write_pmu_generic,
> @@ -6870,7 +6870,7 @@ static void register_book3s_pmu_user_sprs(CPUPPCState *env)
>      spr_register(env, SPR_POWER_UMMCR0, "UMMCR0",
>                   &spr_read_pmu_ureg, &spr_write_pmu_ureg,
>                   &spr_read_ureg, &spr_write_ureg,
> -                 0x00000000);
> +                 0x80000000);
>      spr_register(env, SPR_POWER_UMMCR1, "UMMCR1",
>                   &spr_read_pmu_ureg, &spr_write_pmu_ureg,
>                   &spr_read_ureg, &spr_write_ureg,
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 4076aa281e..5122632784 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -20,6 +20,7 @@ DEF_HELPER_1(rfscv, void, env)
>  DEF_HELPER_1(hrfid, void, env)
>  DEF_HELPER_2(store_lpcr, void, env, tl)
>  DEF_HELPER_2(store_pcr, void, env, tl)
> +DEF_HELPER_2(store_mmcr0, void, env, tl)
>  #endif
>  DEF_HELPER_1(check_tlb_flush_local, void, env)
>  DEF_HELPER_1(check_tlb_flush_global, void, env)
> diff --git a/target/ppc/meson.build b/target/ppc/meson.build
> index b85f295703..bf252ca3ac 100644
> --- a/target/ppc/meson.build
> +++ b/target/ppc/meson.build
> @@ -14,6 +14,7 @@ ppc_ss.add(when: 'CONFIG_TCG', if_true: files(
>    'int_helper.c',
>    'mem_helper.c',
>    'misc_helper.c',
> +  'pmu_book3s_helper.c',
>    'timebase_helper.c',
>    'translate.c',
>  ))
> diff --git a/target/ppc/pmu_book3s_helper.c b/target/ppc/pmu_book3s_helper.c
> new file mode 100644
> index 0000000000..fe16fcfce0
> --- /dev/null
> +++ b/target/ppc/pmu_book3s_helper.c

I'd prefer to call this just book3s_pmu.c.  Or better yet
"powerX_pmu.c", where X is the specific PMU model you're implementing
since IIRC the particulars of the PMU vary quite a lot from POWER7
through to POWER10.

> @@ -0,0 +1,78 @@
> +/*
> + * PowerPC Book3s PMU emulation helpers for QEMU TCG
> + *
> + *  Copyright IBM Corp. 2021
> + *
> + * Authors:
> + *  Daniel Henrique Barboza      <danielhb413@gmail.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "cpu.h"
> +#include "exec/exec-all.h"
> +#include "exec/helper-proto.h"
> +#include "qemu/error-report.h"
> +#include "qemu/main-loop.h"
> +
> +static uint64_t get_insns(void)
> +{
> +    return (uint64_t)icount_get_raw();
> +}
> +
> +static uint64_t get_cycles(uint64_t insns)
> +{
> +    /* Placeholder value */
> +    return insns * 4;
> +}
> +
> +/* PMC5 always count instructions */
> +static void freeze_PMC5_value(CPUPPCState *env)
> +{
> +    uint64_t insns = get_insns() - env->pmc5_base_icount;
> +
> +    env->spr[SPR_POWER_PMC5] += insns;
> +    env->pmc5_base_icount += insns;
> +}
> +
> +/* PMC6 always count cycles */
> +static void freeze_PMC6_value(CPUPPCState *env)
> +{
> +    uint64_t insns = get_insns() - env->pmc6_base_icount;
> +
> +    env->spr[SPR_POWER_PMC6] += get_cycles(insns);
> +    env->pmc6_base_icount += insns;
> +}
> +
> +void helper_store_mmcr0(CPUPPCState *env, target_ulong value)
> +{
> +    bool curr_FC = env->spr[SPR_POWER_MMCR0] & MMCR0_FC;
> +    bool new_FC = value & MMCR0_FC;
> +
> +    /*
> +     * In an frozen count (FC) bit change:
> +     *
> +     * - if PMCs were running (curr_FC = false) and we're freezing
> +     * them (new_FC = true), save the PMCs values in the registers.
> +     *
> +     * - if PMCs were frozen (curr_FC = true) and we're activating
> +     * them (new_FC = false), calculate the current icount for each
> +     * register to allow for subsequent reads to calculate the insns
> +     * passed.
> +     */
> +    if (curr_FC != new_FC) {
> +        if (!curr_FC) {
> +            freeze_PMC5_value(env);
> +            freeze_PMC6_value(env);
> +        } else {
> +            uint64_t curr_icount = get_insns();
> +
> +            env->pmc5_base_icount = curr_icount;
> +            env->pmc6_base_icount = curr_icount;
> +        }
> +    }
> +
> +    env->spr[SPR_POWER_MMCR0] = value;
> +}
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 29b0a340a9..62356cfadf 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -409,8 +409,14 @@ void spr_write_generic(DisasContext *ctx, int sprn, int gprn)
>  
>  void spr_write_pmu_generic(DisasContext *ctx, int sprn, int gprn)
>  {
> -    /* For now it's just a call to spr_write_generic() */
> -    spr_write_generic(ctx, sprn, gprn);
> +    switch (sprn) {
> +    case SPR_POWER_MMCR0:
> +        gen_icount_io_start(ctx);
> +        gen_helper_store_mmcr0(cpu_env, cpu_gpr[gprn]);
> +        break;
> +    default:
> +        spr_write_generic(ctx, sprn, gprn);
> +    }
>  }
>  
>  #if !defined(CONFIG_USER_ONLY)
> @@ -592,6 +598,8 @@ void spr_write_pmu_ureg(DisasContext *ctx, int sprn, int gprn)
>          t0 = tcg_temp_new();
>          t1 = tcg_temp_new();
>  
> +        gen_icount_io_start(ctx);
> +
>          /*
>           * Filter out all bits but FC, PMAO, and PMAE, according
>           * to ISA v3.1, in 10.4.4 Monitor Mode Control Register 0,
> @@ -603,7 +611,7 @@ void spr_write_pmu_ureg(DisasContext *ctx, int sprn, int gprn)
>          tcg_gen_andi_tl(t1, t1, ~(MMCR0_FC | MMCR0_PMAO | MMCR0_PMAE));
>          /* Keep all other bits intact */
>          tcg_gen_or_tl(t1, t1, t0);
> -        gen_store_spr(effective_sprn, t1);
> +        gen_helper_store_mmcr0(cpu_env, t1);
>  
>          tcg_temp_free(t0);
>          tcg_temp_free(t1);
Daniel Henrique Barboza Aug. 10, 2021, 1:24 p.m. UTC | #2
On 8/10/21 12:39 AM, David Gibson wrote:
> On Mon, Aug 09, 2021 at 10:10:42AM -0300, Daniel Henrique Barboza wrote:
>> The PMCC (PMC Control) bit in the MMCR0 register controls whether the
>> counters PMC5 and PMC6 are being part of the performance monitor
>> facility in a specific time. If PMCC allows it, PMC5 and PMC6 will
>> always be used to measure instructions completed and cycles,
>> respectively.
>>
>> This patch adds the barebones of the Book3s PMU logic by enabling
>> instruction counting, using the icount framework, using the performance
>> monitor counters 5 and 6. The overall logic goes as follows:
>>
>> - a helper is added to control the PMU state on each MMCR0 write. This
>> allows for the PMU to start/stop as quick as possible;
> 
> Um.. why does a helper accomplish that goal?

Every change in the frozen counter bit (MMCR0_FC) will trigger a specific
behavior in the PMU (starting/stopping counters, timers and so on). Doing
all this work in the body of translation.c looked like a tall order. Adding
a helper to handle this logic somewhere else seemed appropriate.

I got this idea reading how gen_mtmsrd() uses gen_helper_store_msr() to both
update MSR and take other actions depending on the bits being set/cleared.

> 
>>
>> - only PMC5 and PMC6 are being set. PMC6 (cycles) is default to 4*insns
>> (for cycles per instruction) for now;
> 
> What's the justification for that value?  With a superscalar core, I'd
> expect average (median) cycles per instruction to be < 1 a lot of the
> time.  Mean cycles per instruction could be higher since certain
> instructions could take a lot.

Just a placeholder. I can fold the cycles calculation from patch 08 into
this patch to avoid it.

> 
>> - the intended usage is to freeze the counters by setting MMCR0_FC, do
>> any additional setting via MMCR1 (not implemented yet) and setting
>> initial counter values,  and enable the PMU by zeroing MMCR0_FC. Software
>> must freeze counters to read the results - on the fly reading of the PMCs
>> will return the starting value of each one.
> 
> Is that the way hardware behaves?  Or is it just a limitation of this
> software implementation?  Either is fine, we should just be clear on
> what it is.

I probably should've asked the kernel perf folks about it, but I didn't.

The hardware doesn't seem to have these restrictions. I'll reword this a bit
to make it clearer that this is an implementation restriction.

> 
>>
>> Since there will be more PMU exclusive code to be added next, let's also
>> put the PMU logic in its own helper to keep all in the same place. The
>> code is also repetitive and not really extensible to add more PMCs, but
>> we'll handle this in the next patches.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   target/ppc/cpu.h               |  4 ++
>>   target/ppc/cpu_init.c          |  4 +-
>>   target/ppc/helper.h            |  1 +
>>   target/ppc/meson.build         |  1 +
>>   target/ppc/pmu_book3s_helper.c | 78 ++++++++++++++++++++++++++++++++++
>>   target/ppc/translate.c         | 14 ++++--
>>   6 files changed, 97 insertions(+), 5 deletions(-)
>>   create mode 100644 target/ppc/pmu_book3s_helper.c
>>
>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>> index 4d96015f81..229abfe7ee 100644
>> --- a/target/ppc/cpu.h
>> +++ b/target/ppc/cpu.h
>> @@ -1175,6 +1175,10 @@ struct CPUPPCState {
>>       uint32_t tm_vscr;
>>       uint64_t tm_dscr;
>>       uint64_t tm_tar;
>> +
>> +    /* PMU registers icount state */
>> +    uint64_t pmc5_base_icount;
>> +    uint64_t pmc6_base_icount;
>>   };
>>   
>>   #define SET_FIT_PERIOD(a_, b_, c_, d_)          \
>> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
>> index 71062809c8..fce89ee994 100644
>> --- a/target/ppc/cpu_init.c
>> +++ b/target/ppc/cpu_init.c
>> @@ -6822,7 +6822,7 @@ static void register_book3s_pmu_sup_sprs(CPUPPCState *env)
>>       spr_register_kvm(env, SPR_POWER_MMCR0, "MMCR0",
>>                        SPR_NOACCESS, SPR_NOACCESS,
>>                        &spr_read_pmu_generic, &spr_write_pmu_generic,
>> -                     KVM_REG_PPC_MMCR0, 0x00000000);
>> +                     KVM_REG_PPC_MMCR0, 0x80000000);
>>       spr_register_kvm(env, SPR_POWER_MMCR1, "MMCR1",
>>                        SPR_NOACCESS, SPR_NOACCESS,
>>                        &spr_read_pmu_generic, &spr_write_pmu_generic,
>> @@ -6870,7 +6870,7 @@ static void register_book3s_pmu_user_sprs(CPUPPCState *env)
>>       spr_register(env, SPR_POWER_UMMCR0, "UMMCR0",
>>                    &spr_read_pmu_ureg, &spr_write_pmu_ureg,
>>                    &spr_read_ureg, &spr_write_ureg,
>> -                 0x00000000);
>> +                 0x80000000);
>>       spr_register(env, SPR_POWER_UMMCR1, "UMMCR1",
>>                    &spr_read_pmu_ureg, &spr_write_pmu_ureg,
>>                    &spr_read_ureg, &spr_write_ureg,
>> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
>> index 4076aa281e..5122632784 100644
>> --- a/target/ppc/helper.h
>> +++ b/target/ppc/helper.h
>> @@ -20,6 +20,7 @@ DEF_HELPER_1(rfscv, void, env)
>>   DEF_HELPER_1(hrfid, void, env)
>>   DEF_HELPER_2(store_lpcr, void, env, tl)
>>   DEF_HELPER_2(store_pcr, void, env, tl)
>> +DEF_HELPER_2(store_mmcr0, void, env, tl)
>>   #endif
>>   DEF_HELPER_1(check_tlb_flush_local, void, env)
>>   DEF_HELPER_1(check_tlb_flush_global, void, env)
>> diff --git a/target/ppc/meson.build b/target/ppc/meson.build
>> index b85f295703..bf252ca3ac 100644
>> --- a/target/ppc/meson.build
>> +++ b/target/ppc/meson.build
>> @@ -14,6 +14,7 @@ ppc_ss.add(when: 'CONFIG_TCG', if_true: files(
>>     'int_helper.c',
>>     'mem_helper.c',
>>     'misc_helper.c',
>> +  'pmu_book3s_helper.c',
>>     'timebase_helper.c',
>>     'translate.c',
>>   ))
>> diff --git a/target/ppc/pmu_book3s_helper.c b/target/ppc/pmu_book3s_helper.c
>> new file mode 100644
>> index 0000000000..fe16fcfce0
>> --- /dev/null
>> +++ b/target/ppc/pmu_book3s_helper.c
> 
> I'd prefer to call this just book3s_pmu.c.  Or better yet
> "powerX_pmu.c", where X is the specific PMU model you're implementing
> since IIRC the particulars of the PMU vary quite a lot from POWER7
> through to POWER10.
> 
>> @@ -0,0 +1,78 @@
>> +/*
>> + * PowerPC Book3s PMU emulation helpers for QEMU TCG
>> + *
>> + *  Copyright IBM Corp. 2021
>> + *
>> + * Authors:
>> + *  Daniel Henrique Barboza      <danielhb413@gmail.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "cpu.h"
>> +#include "exec/exec-all.h"
>> +#include "exec/helper-proto.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/main-loop.h"
>> +
>> +static uint64_t get_insns(void)
>> +{
>> +    return (uint64_t)icount_get_raw();
>> +}
>> +
>> +static uint64_t get_cycles(uint64_t insns)
>> +{
>> +    /* Placeholder value */
>> +    return insns * 4;
>> +}
>> +
>> +/* PMC5 always count instructions */
>> +static void freeze_PMC5_value(CPUPPCState *env)
>> +{
>> +    uint64_t insns = get_insns() - env->pmc5_base_icount;
>> +
>> +    env->spr[SPR_POWER_PMC5] += insns;
>> +    env->pmc5_base_icount += insns;
>> +}
>> +
>> +/* PMC6 always count cycles */
>> +static void freeze_PMC6_value(CPUPPCState *env)
>> +{
>> +    uint64_t insns = get_insns() - env->pmc6_base_icount;
>> +
>> +    env->spr[SPR_POWER_PMC6] += get_cycles(insns);
>> +    env->pmc6_base_icount += insns;
>> +}
>> +
>> +void helper_store_mmcr0(CPUPPCState *env, target_ulong value)
>> +{
>> +    bool curr_FC = env->spr[SPR_POWER_MMCR0] & MMCR0_FC;
>> +    bool new_FC = value & MMCR0_FC;
>> +
>> +    /*
>> +     * In an frozen count (FC) bit change:
>> +     *
>> +     * - if PMCs were running (curr_FC = false) and we're freezing
>> +     * them (new_FC = true), save the PMCs values in the registers.
>> +     *
>> +     * - if PMCs were frozen (curr_FC = true) and we're activating
>> +     * them (new_FC = false), calculate the current icount for each
>> +     * register to allow for subsequent reads to calculate the insns
>> +     * passed.
>> +     */
>> +    if (curr_FC != new_FC) {
>> +        if (!curr_FC) {
>> +            freeze_PMC5_value(env);
>> +            freeze_PMC6_value(env);
>> +        } else {
>> +            uint64_t curr_icount = get_insns();
>> +
>> +            env->pmc5_base_icount = curr_icount;
>> +            env->pmc6_base_icount = curr_icount;
>> +        }
>> +    }
>> +
>> +    env->spr[SPR_POWER_MMCR0] = value;
>> +}
>> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
>> index 29b0a340a9..62356cfadf 100644
>> --- a/target/ppc/translate.c
>> +++ b/target/ppc/translate.c
>> @@ -409,8 +409,14 @@ void spr_write_generic(DisasContext *ctx, int sprn, int gprn)
>>   
>>   void spr_write_pmu_generic(DisasContext *ctx, int sprn, int gprn)
>>   {
>> -    /* For now it's just a call to spr_write_generic() */
>> -    spr_write_generic(ctx, sprn, gprn);
>> +    switch (sprn) {
>> +    case SPR_POWER_MMCR0:
>> +        gen_icount_io_start(ctx);
>> +        gen_helper_store_mmcr0(cpu_env, cpu_gpr[gprn]);
>> +        break;
>> +    default:
>> +        spr_write_generic(ctx, sprn, gprn);
>> +    }
>>   }
>>   
>>   #if !defined(CONFIG_USER_ONLY)
>> @@ -592,6 +598,8 @@ void spr_write_pmu_ureg(DisasContext *ctx, int sprn, int gprn)
>>           t0 = tcg_temp_new();
>>           t1 = tcg_temp_new();
>>   
>> +        gen_icount_io_start(ctx);
>> +
>>           /*
>>            * Filter out all bits but FC, PMAO, and PMAE, according
>>            * to ISA v3.1, in 10.4.4 Monitor Mode Control Register 0,
>> @@ -603,7 +611,7 @@ void spr_write_pmu_ureg(DisasContext *ctx, int sprn, int gprn)
>>           tcg_gen_andi_tl(t1, t1, ~(MMCR0_FC | MMCR0_PMAO | MMCR0_PMAE));
>>           /* Keep all other bits intact */
>>           tcg_gen_or_tl(t1, t1, t0);
>> -        gen_store_spr(effective_sprn, t1);
>> +        gen_helper_store_mmcr0(cpu_env, t1);
>>   
>>           tcg_temp_free(t0);
>>           tcg_temp_free(t1);
>
Richard Henderson Aug. 11, 2021, 12:26 a.m. UTC | #3
On 8/9/21 3:10 AM, Daniel Henrique Barboza wrote:
> The PMCC (PMC Control) bit in the MMCR0 register controls whether the
> counters PMC5 and PMC6 are being part of the performance monitor
> facility in a specific time. If PMCC allows it, PMC5 and PMC6 will
> always be used to measure instructions completed and cycles,
> respectively.
> 
> This patch adds the barebones of the Book3s PMU logic by enabling
> instruction counting, using the icount framework, using the performance
> monitor counters 5 and 6. The overall logic goes as follows:
> 
> - a helper is added to control the PMU state on each MMCR0 write. This
> allows for the PMU to start/stop as quick as possible;
> 
> - only PMC5 and PMC6 are being set. PMC6 (cycles) is default to 4*insns
> (for cycles per instruction) for now;
> 
> - the intended usage is to freeze the counters by setting MMCR0_FC, do
> any additional setting via MMCR1 (not implemented yet) and setting
> initial counter values,  and enable the PMU by zeroing MMCR0_FC. Software
> must freeze counters to read the results - on the fly reading of the PMCs
> will return the starting value of each one.
> 
> Since there will be more PMU exclusive code to be added next, let's also
> put the PMU logic in its own helper to keep all in the same place. The
> code is also repetitive and not really extensible to add more PMCs, but
> we'll handle this in the next patches.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>   target/ppc/cpu.h               |  4 ++
>   target/ppc/cpu_init.c          |  4 +-
>   target/ppc/helper.h            |  1 +
>   target/ppc/meson.build         |  1 +
>   target/ppc/pmu_book3s_helper.c | 78 ++++++++++++++++++++++++++++++++++
>   target/ppc/translate.c         | 14 ++++--
>   6 files changed, 97 insertions(+), 5 deletions(-)
>   create mode 100644 target/ppc/pmu_book3s_helper.c
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 4d96015f81..229abfe7ee 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1175,6 +1175,10 @@ struct CPUPPCState {
>       uint32_t tm_vscr;
>       uint64_t tm_dscr;
>       uint64_t tm_tar;
> +
> +    /* PMU registers icount state */
> +    uint64_t pmc5_base_icount;
> +    uint64_t pmc6_base_icount;
>   };
>   
>   #define SET_FIT_PERIOD(a_, b_, c_, d_)          \
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 71062809c8..fce89ee994 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -6822,7 +6822,7 @@ static void register_book3s_pmu_sup_sprs(CPUPPCState *env)
>       spr_register_kvm(env, SPR_POWER_MMCR0, "MMCR0",
>                        SPR_NOACCESS, SPR_NOACCESS,
>                        &spr_read_pmu_generic, &spr_write_pmu_generic,
> -                     KVM_REG_PPC_MMCR0, 0x00000000);
> +                     KVM_REG_PPC_MMCR0, 0x80000000);
>       spr_register_kvm(env, SPR_POWER_MMCR1, "MMCR1",
>                        SPR_NOACCESS, SPR_NOACCESS,
>                        &spr_read_pmu_generic, &spr_write_pmu_generic,
> @@ -6870,7 +6870,7 @@ static void register_book3s_pmu_user_sprs(CPUPPCState *env)
>       spr_register(env, SPR_POWER_UMMCR0, "UMMCR0",
>                    &spr_read_pmu_ureg, &spr_write_pmu_ureg,
>                    &spr_read_ureg, &spr_write_ureg,
> -                 0x00000000);
> +                 0x80000000);
>       spr_register(env, SPR_POWER_UMMCR1, "UMMCR1",
>                    &spr_read_pmu_ureg, &spr_write_pmu_ureg,
>                    &spr_read_ureg, &spr_write_ureg,
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 4076aa281e..5122632784 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -20,6 +20,7 @@ DEF_HELPER_1(rfscv, void, env)
>   DEF_HELPER_1(hrfid, void, env)
>   DEF_HELPER_2(store_lpcr, void, env, tl)
>   DEF_HELPER_2(store_pcr, void, env, tl)
> +DEF_HELPER_2(store_mmcr0, void, env, tl)
>   #endif
>   DEF_HELPER_1(check_tlb_flush_local, void, env)
>   DEF_HELPER_1(check_tlb_flush_global, void, env)
> diff --git a/target/ppc/meson.build b/target/ppc/meson.build
> index b85f295703..bf252ca3ac 100644
> --- a/target/ppc/meson.build
> +++ b/target/ppc/meson.build
> @@ -14,6 +14,7 @@ ppc_ss.add(when: 'CONFIG_TCG', if_true: files(
>     'int_helper.c',
>     'mem_helper.c',
>     'misc_helper.c',
> +  'pmu_book3s_helper.c',
>     'timebase_helper.c',
>     'translate.c',
>   ))
> diff --git a/target/ppc/pmu_book3s_helper.c b/target/ppc/pmu_book3s_helper.c
> new file mode 100644
> index 0000000000..fe16fcfce0
> --- /dev/null
> +++ b/target/ppc/pmu_book3s_helper.c
> @@ -0,0 +1,78 @@
> +/*
> + * PowerPC Book3s PMU emulation helpers for QEMU TCG
> + *
> + *  Copyright IBM Corp. 2021
> + *
> + * Authors:
> + *  Daniel Henrique Barboza      <danielhb413@gmail.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "cpu.h"
> +#include "exec/exec-all.h"
> +#include "exec/helper-proto.h"
> +#include "qemu/error-report.h"
> +#include "qemu/main-loop.h"
> +
> +static uint64_t get_insns(void)
> +{
> +    return (uint64_t)icount_get_raw();
> +}

So... this will just abort for user-only, which I suppose is fair since you won't be 
setting the other control bits there.

But I don't know what it will do if !icount_enabled().  Certainly not provide you with any 
meaningful value.  Given that icount affects all TCG translation, and is only enabled from 
the qemu command-line, do you really want to rely on that for a niche feature?

> +/* PMC5 always count instructions */
> +static void freeze_PMC5_value(CPUPPCState *env)
> +{
> +    uint64_t insns = get_insns() - env->pmc5_base_icount;
> +
> +    env->spr[SPR_POWER_PMC5] += insns;
> +    env->pmc5_base_icount += insns;
> +}
> +
> +/* PMC6 always count cycles */
> +static void freeze_PMC6_value(CPUPPCState *env)
> +{
> +    uint64_t insns = get_insns() - env->pmc6_base_icount;
> +
> +    env->spr[SPR_POWER_PMC6] += get_cycles(insns);
> +    env->pmc6_base_icount += insns;
> +}
...
> +            freeze_PMC5_value(env);
> +            freeze_PMC6_value(env);

Why would you read get_insns() for PMC5 and 6 separately?


r~
Daniel Henrique Barboza Aug. 16, 2021, 5:53 p.m. UTC | #4
On 8/10/21 12:39 AM, David Gibson wrote:
> On Mon, Aug 09, 2021 at 10:10:42AM -0300, Daniel Henrique Barboza wrote:
>> The PMCC (PMC Control) bit in the MMCR0 register controls whether the
>> counters PMC5 and PMC6 are being part of the performance monitor
>> facility in a specific time. If PMCC allows it, PMC5 and PMC6 will
>> always be used to measure instructions completed and cycles,
>> respectively.
>>
>> This patch adds the barebones of the Book3s PMU logic by enabling
>> instruction counting, using the icount framework, using the performance
>> monitor counters 5 and 6. The overall logic goes as follows:
>>
>> - a helper is added to control the PMU state on each MMCR0 write. This
>> allows for the PMU to start/stop as quick as possible;
> 
> Um.. why does a helper accomplish that goal?
> 
>>
>> - only PMC5 and PMC6 are being set. PMC6 (cycles) is default to 4*insns
>> (for cycles per instruction) for now;
> 
> What's the justification for that value?  With a superscalar core, I'd
> expect average (median) cycles per instruction to be < 1 a lot of the
> time.  Mean cycles per instruction could be higher since certain
> instructions could take a lot.
> 
>> - the intended usage is to freeze the counters by setting MMCR0_FC, do
>> any additional setting via MMCR1 (not implemented yet) and setting
>> initial counter values,  and enable the PMU by zeroing MMCR0_FC. Software
>> must freeze counters to read the results - on the fly reading of the PMCs
>> will return the starting value of each one.
> 
> Is that the way hardware behaves?  Or is it just a limitation of this
> software implementation?  Either is fine, we should just be clear on
> what it is.
> 
>>
>> Since there will be more PMU exclusive code to be added next, let's also
>> put the PMU logic in its own helper to keep all in the same place. The
>> code is also repetitive and not really extensible to add more PMCs, but
>> we'll handle this in the next patches.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   target/ppc/cpu.h               |  4 ++
>>   target/ppc/cpu_init.c          |  4 +-
>>   target/ppc/helper.h            |  1 +
>>   target/ppc/meson.build         |  1 +
>>   target/ppc/pmu_book3s_helper.c | 78 ++++++++++++++++++++++++++++++++++
>>   target/ppc/translate.c         | 14 ++++--
>>   6 files changed, 97 insertions(+), 5 deletions(-)
>>   create mode 100644 target/ppc/pmu_book3s_helper.c
>>
>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>> index 4d96015f81..229abfe7ee 100644
>> --- a/target/ppc/cpu.h
>> +++ b/target/ppc/cpu.h
>> @@ -1175,6 +1175,10 @@ struct CPUPPCState {
>>       uint32_t tm_vscr;
>>       uint64_t tm_dscr;
>>       uint64_t tm_tar;
>> +
>> +    /* PMU registers icount state */
>> +    uint64_t pmc5_base_icount;
>> +    uint64_t pmc6_base_icount;
>>   };
>>   
>>   #define SET_FIT_PERIOD(a_, b_, c_, d_)          \
>> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
>> index 71062809c8..fce89ee994 100644
>> --- a/target/ppc/cpu_init.c
>> +++ b/target/ppc/cpu_init.c
>> @@ -6822,7 +6822,7 @@ static void register_book3s_pmu_sup_sprs(CPUPPCState *env)
>>       spr_register_kvm(env, SPR_POWER_MMCR0, "MMCR0",
>>                        SPR_NOACCESS, SPR_NOACCESS,
>>                        &spr_read_pmu_generic, &spr_write_pmu_generic,
>> -                     KVM_REG_PPC_MMCR0, 0x00000000);
>> +                     KVM_REG_PPC_MMCR0, 0x80000000);
>>       spr_register_kvm(env, SPR_POWER_MMCR1, "MMCR1",
>>                        SPR_NOACCESS, SPR_NOACCESS,
>>                        &spr_read_pmu_generic, &spr_write_pmu_generic,
>> @@ -6870,7 +6870,7 @@ static void register_book3s_pmu_user_sprs(CPUPPCState *env)
>>       spr_register(env, SPR_POWER_UMMCR0, "UMMCR0",
>>                    &spr_read_pmu_ureg, &spr_write_pmu_ureg,
>>                    &spr_read_ureg, &spr_write_ureg,
>> -                 0x00000000);
>> +                 0x80000000);
>>       spr_register(env, SPR_POWER_UMMCR1, "UMMCR1",
>>                    &spr_read_pmu_ureg, &spr_write_pmu_ureg,
>>                    &spr_read_ureg, &spr_write_ureg,
>> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
>> index 4076aa281e..5122632784 100644
>> --- a/target/ppc/helper.h
>> +++ b/target/ppc/helper.h
>> @@ -20,6 +20,7 @@ DEF_HELPER_1(rfscv, void, env)
>>   DEF_HELPER_1(hrfid, void, env)
>>   DEF_HELPER_2(store_lpcr, void, env, tl)
>>   DEF_HELPER_2(store_pcr, void, env, tl)
>> +DEF_HELPER_2(store_mmcr0, void, env, tl)
>>   #endif
>>   DEF_HELPER_1(check_tlb_flush_local, void, env)
>>   DEF_HELPER_1(check_tlb_flush_global, void, env)
>> diff --git a/target/ppc/meson.build b/target/ppc/meson.build
>> index b85f295703..bf252ca3ac 100644
>> --- a/target/ppc/meson.build
>> +++ b/target/ppc/meson.build
>> @@ -14,6 +14,7 @@ ppc_ss.add(when: 'CONFIG_TCG', if_true: files(
>>     'int_helper.c',
>>     'mem_helper.c',
>>     'misc_helper.c',
>> +  'pmu_book3s_helper.c',
>>     'timebase_helper.c',
>>     'translate.c',
>>   ))
>> diff --git a/target/ppc/pmu_book3s_helper.c b/target/ppc/pmu_book3s_helper.c
>> new file mode 100644
>> index 0000000000..fe16fcfce0
>> --- /dev/null
>> +++ b/target/ppc/pmu_book3s_helper.c
> 
> I'd prefer to call this just book3s_pmu.c.  Or better yet
> "powerX_pmu.c", where X is the specific PMU model you're implementing
> since IIRC the particulars of the PMU vary quite a lot from POWER7
> through to POWER10.

I'll go with book3s_pmu.c because this PMU implementation will not go
deep enough to touch the differences between POWER processors.

The only aspects that will be implementation specific will be 2 perf
events (0x2 and 0x1E) that the kernel has been using for a long time
and only recently migrated to architected events. We'll support all
architected events that are related to those events so that newer
kernels - and other non-IBM processors - will use the PMU without
the need of having to know about 0x2 and 0x1E.


Thanks,


Daniel

> 
>> @@ -0,0 +1,78 @@
>> +/*
>> + * PowerPC Book3s PMU emulation helpers for QEMU TCG
>> + *
>> + *  Copyright IBM Corp. 2021
>> + *
>> + * Authors:
>> + *  Daniel Henrique Barboza      <danielhb413@gmail.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "cpu.h"
>> +#include "exec/exec-all.h"
>> +#include "exec/helper-proto.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/main-loop.h"
>> +
>> +static uint64_t get_insns(void)
>> +{
>> +    return (uint64_t)icount_get_raw();
>> +}
>> +
>> +static uint64_t get_cycles(uint64_t insns)
>> +{
>> +    /* Placeholder value */
>> +    return insns * 4;
>> +}
>> +
>> +/* PMC5 always count instructions */
>> +static void freeze_PMC5_value(CPUPPCState *env)
>> +{
>> +    uint64_t insns = get_insns() - env->pmc5_base_icount;
>> +
>> +    env->spr[SPR_POWER_PMC5] += insns;
>> +    env->pmc5_base_icount += insns;
>> +}
>> +
>> +/* PMC6 always count cycles */
>> +static void freeze_PMC6_value(CPUPPCState *env)
>> +{
>> +    uint64_t insns = get_insns() - env->pmc6_base_icount;
>> +
>> +    env->spr[SPR_POWER_PMC6] += get_cycles(insns);
>> +    env->pmc6_base_icount += insns;
>> +}
>> +
>> +void helper_store_mmcr0(CPUPPCState *env, target_ulong value)
>> +{
>> +    bool curr_FC = env->spr[SPR_POWER_MMCR0] & MMCR0_FC;
>> +    bool new_FC = value & MMCR0_FC;
>> +
>> +    /*
>> +     * In an frozen count (FC) bit change:
>> +     *
>> +     * - if PMCs were running (curr_FC = false) and we're freezing
>> +     * them (new_FC = true), save the PMCs values in the registers.
>> +     *
>> +     * - if PMCs were frozen (curr_FC = true) and we're activating
>> +     * them (new_FC = false), calculate the current icount for each
>> +     * register to allow for subsequent reads to calculate the insns
>> +     * passed.
>> +     */
>> +    if (curr_FC != new_FC) {
>> +        if (!curr_FC) {
>> +            freeze_PMC5_value(env);
>> +            freeze_PMC6_value(env);
>> +        } else {
>> +            uint64_t curr_icount = get_insns();
>> +
>> +            env->pmc5_base_icount = curr_icount;
>> +            env->pmc6_base_icount = curr_icount;
>> +        }
>> +    }
>> +
>> +    env->spr[SPR_POWER_MMCR0] = value;
>> +}
>> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
>> index 29b0a340a9..62356cfadf 100644
>> --- a/target/ppc/translate.c
>> +++ b/target/ppc/translate.c
>> @@ -409,8 +409,14 @@ void spr_write_generic(DisasContext *ctx, int sprn, int gprn)
>>   
>>   void spr_write_pmu_generic(DisasContext *ctx, int sprn, int gprn)
>>   {
>> -    /* For now it's just a call to spr_write_generic() */
>> -    spr_write_generic(ctx, sprn, gprn);
>> +    switch (sprn) {
>> +    case SPR_POWER_MMCR0:
>> +        gen_icount_io_start(ctx);
>> +        gen_helper_store_mmcr0(cpu_env, cpu_gpr[gprn]);
>> +        break;
>> +    default:
>> +        spr_write_generic(ctx, sprn, gprn);
>> +    }
>>   }
>>   
>>   #if !defined(CONFIG_USER_ONLY)
>> @@ -592,6 +598,8 @@ void spr_write_pmu_ureg(DisasContext *ctx, int sprn, int gprn)
>>           t0 = tcg_temp_new();
>>           t1 = tcg_temp_new();
>>   
>> +        gen_icount_io_start(ctx);
>> +
>>           /*
>>            * Filter out all bits but FC, PMAO, and PMAE, according
>>            * to ISA v3.1, in 10.4.4 Monitor Mode Control Register 0,
>> @@ -603,7 +611,7 @@ void spr_write_pmu_ureg(DisasContext *ctx, int sprn, int gprn)
>>           tcg_gen_andi_tl(t1, t1, ~(MMCR0_FC | MMCR0_PMAO | MMCR0_PMAE));
>>           /* Keep all other bits intact */
>>           tcg_gen_or_tl(t1, t1, t0);
>> -        gen_store_spr(effective_sprn, t1);
>> +        gen_helper_store_mmcr0(cpu_env, t1);
>>   
>>           tcg_temp_free(t0);
>>           tcg_temp_free(t1);
>
David Gibson Aug. 17, 2021, 2:59 a.m. UTC | #5
On Mon, Aug 16, 2021 at 02:53:13PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 8/10/21 12:39 AM, David Gibson wrote:
> > On Mon, Aug 09, 2021 at 10:10:42AM -0300, Daniel Henrique Barboza wrote:
> > > The PMCC (PMC Control) bit in the MMCR0 register controls whether the
> > > counters PMC5 and PMC6 are being part of the performance monitor
> > > facility in a specific time. If PMCC allows it, PMC5 and PMC6 will
> > > always be used to measure instructions completed and cycles,
> > > respectively.
> > > 
> > > This patch adds the barebones of the Book3s PMU logic by enabling
> > > instruction counting, using the icount framework, using the performance
> > > monitor counters 5 and 6. The overall logic goes as follows:
> > > 
> > > - a helper is added to control the PMU state on each MMCR0 write. This
> > > allows for the PMU to start/stop as quick as possible;
> > 
> > Um.. why does a helper accomplish that goal?
> > 
> > > 
> > > - only PMC5 and PMC6 are being set. PMC6 (cycles) is default to 4*insns
> > > (for cycles per instruction) for now;
> > 
> > What's the justification for that value?  With a superscalar core, I'd
> > expect average (median) cycles per instruction to be < 1 a lot of the
> > time.  Mean cycles per instruction could be higher since certain
> > instructions could take a lot.
> > 
> > > - the intended usage is to freeze the counters by setting MMCR0_FC, do
> > > any additional setting via MMCR1 (not implemented yet) and setting
> > > initial counter values,  and enable the PMU by zeroing MMCR0_FC. Software
> > > must freeze counters to read the results - on the fly reading of the PMCs
> > > will return the starting value of each one.
> > 
> > Is that the way hardware behaves?  Or is it just a limitation of this
> > software implementation?  Either is fine, we should just be clear on
> > what it is.
> > 
> > > 
> > > Since there will be more PMU exclusive code to be added next, let's also
> > > put the PMU logic in its own helper to keep all in the same place. The
> > > code is also repetitive and not really extensible to add more PMCs, but
> > > we'll handle this in the next patches.
> > > 
> > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> > > ---
> > >   target/ppc/cpu.h               |  4 ++
> > >   target/ppc/cpu_init.c          |  4 +-
> > >   target/ppc/helper.h            |  1 +
> > >   target/ppc/meson.build         |  1 +
> > >   target/ppc/pmu_book3s_helper.c | 78 ++++++++++++++++++++++++++++++++++
> > >   target/ppc/translate.c         | 14 ++++--
> > >   6 files changed, 97 insertions(+), 5 deletions(-)
> > >   create mode 100644 target/ppc/pmu_book3s_helper.c
> > > 
> > > diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> > > index 4d96015f81..229abfe7ee 100644
> > > --- a/target/ppc/cpu.h
> > > +++ b/target/ppc/cpu.h
> > > @@ -1175,6 +1175,10 @@ struct CPUPPCState {
> > >       uint32_t tm_vscr;
> > >       uint64_t tm_dscr;
> > >       uint64_t tm_tar;
> > > +
> > > +    /* PMU registers icount state */
> > > +    uint64_t pmc5_base_icount;
> > > +    uint64_t pmc6_base_icount;
> > >   };
> > >   #define SET_FIT_PERIOD(a_, b_, c_, d_)          \
> > > diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> > > index 71062809c8..fce89ee994 100644
> > > --- a/target/ppc/cpu_init.c
> > > +++ b/target/ppc/cpu_init.c
> > > @@ -6822,7 +6822,7 @@ static void register_book3s_pmu_sup_sprs(CPUPPCState *env)
> > >       spr_register_kvm(env, SPR_POWER_MMCR0, "MMCR0",
> > >                        SPR_NOACCESS, SPR_NOACCESS,
> > >                        &spr_read_pmu_generic, &spr_write_pmu_generic,
> > > -                     KVM_REG_PPC_MMCR0, 0x00000000);
> > > +                     KVM_REG_PPC_MMCR0, 0x80000000);
> > >       spr_register_kvm(env, SPR_POWER_MMCR1, "MMCR1",
> > >                        SPR_NOACCESS, SPR_NOACCESS,
> > >                        &spr_read_pmu_generic, &spr_write_pmu_generic,
> > > @@ -6870,7 +6870,7 @@ static void register_book3s_pmu_user_sprs(CPUPPCState *env)
> > >       spr_register(env, SPR_POWER_UMMCR0, "UMMCR0",
> > >                    &spr_read_pmu_ureg, &spr_write_pmu_ureg,
> > >                    &spr_read_ureg, &spr_write_ureg,
> > > -                 0x00000000);
> > > +                 0x80000000);
> > >       spr_register(env, SPR_POWER_UMMCR1, "UMMCR1",
> > >                    &spr_read_pmu_ureg, &spr_write_pmu_ureg,
> > >                    &spr_read_ureg, &spr_write_ureg,
> > > diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> > > index 4076aa281e..5122632784 100644
> > > --- a/target/ppc/helper.h
> > > +++ b/target/ppc/helper.h
> > > @@ -20,6 +20,7 @@ DEF_HELPER_1(rfscv, void, env)
> > >   DEF_HELPER_1(hrfid, void, env)
> > >   DEF_HELPER_2(store_lpcr, void, env, tl)
> > >   DEF_HELPER_2(store_pcr, void, env, tl)
> > > +DEF_HELPER_2(store_mmcr0, void, env, tl)
> > >   #endif
> > >   DEF_HELPER_1(check_tlb_flush_local, void, env)
> > >   DEF_HELPER_1(check_tlb_flush_global, void, env)
> > > diff --git a/target/ppc/meson.build b/target/ppc/meson.build
> > > index b85f295703..bf252ca3ac 100644
> > > --- a/target/ppc/meson.build
> > > +++ b/target/ppc/meson.build
> > > @@ -14,6 +14,7 @@ ppc_ss.add(when: 'CONFIG_TCG', if_true: files(
> > >     'int_helper.c',
> > >     'mem_helper.c',
> > >     'misc_helper.c',
> > > +  'pmu_book3s_helper.c',
> > >     'timebase_helper.c',
> > >     'translate.c',
> > >   ))
> > > diff --git a/target/ppc/pmu_book3s_helper.c b/target/ppc/pmu_book3s_helper.c
> > > new file mode 100644
> > > index 0000000000..fe16fcfce0
> > > --- /dev/null
> > > +++ b/target/ppc/pmu_book3s_helper.c
> > 
> > I'd prefer to call this just book3s_pmu.c.  Or better yet
> > "powerX_pmu.c", where X is the specific PMU model you're implementing
> > since IIRC the particulars of the PMU vary quite a lot from POWER7
> > through to POWER10.
> 
> I'll go with book3s_pmu.c because this PMU implementation will not go
> deep enough to touch the differences between POWER processors.

I think you are mistaken.

> The only aspects that will be implementation specific will be 2 perf
> events (0x2 and 0x1E) that the kernel has been using for a long time
> and only recently migrated to architected events. We'll support all
> architected events that are related to those events so that newer
> kernels - and other non-IBM processors - will use the PMU without
> the need of having to know about 0x2 and 0x1E.

So, possibly in the last few POWER chips, the only differences are the
table of event keys.  That is definitely not the case for the whole
POWER line though.  I remember from my time at IBM, the PMU underwent
huge changes much deeper than the event table.  Some had different
numbers of PMCs, different bit fields in the MMCRs, different methods
of event selection (in some cases through insanely compplex chains of
multiplexers).  And everything from POWER4 onwards could reasonably be
described as "book3s".  So we definitely need a different
name... working out what it should be is harder though.

If the modern core structure of the PMU got codified in a particular
BookS architecture version we could name it after that version, maybe?
Daniel Henrique Barboza Aug. 17, 2021, 9:30 a.m. UTC | #6
On 8/16/21 11:59 PM, David Gibson wrote:
> On Mon, Aug 16, 2021 at 02:53:13PM -0300, Daniel Henrique Barboza wrote:
>>
>>
>> On 8/10/21 12:39 AM, David Gibson wrote:
>>> On Mon, Aug 09, 2021 at 10:10:42AM -0300, Daniel Henrique Barboza wrote:
>>>> The PMCC (PMC Control) bit in the MMCR0 register controls whether the
>>>> counters PMC5 and PMC6 are being part of the performance monitor
>>>> facility in a specific time. If PMCC allows it, PMC5 and PMC6 will
>>>> always be used to measure instructions completed and cycles,
>>>> respectively.
>>>>
>>>> This patch adds the barebones of the Book3s PMU logic by enabling
>>>> instruction counting, using the icount framework, using the performance
>>>> monitor counters 5 and 6. The overall logic goes as follows:
>>>>
>>>> - a helper is added to control the PMU state on each MMCR0 write. This
>>>> allows for the PMU to start/stop as quick as possible;
>>>
>>> Um.. why does a helper accomplish that goal?
>>>
>>>>
>>>> - only PMC5 and PMC6 are being set. PMC6 (cycles) is default to 4*insns
>>>> (for cycles per instruction) for now;
>>>
>>> What's the justification for that value?  With a superscalar core, I'd
>>> expect average (median) cycles per instruction to be < 1 a lot of the
>>> time.  Mean cycles per instruction could be higher since certain
>>> instructions could take a lot.
>>>
>>>> - the intended usage is to freeze the counters by setting MMCR0_FC, do
>>>> any additional setting via MMCR1 (not implemented yet) and setting
>>>> initial counter values,  and enable the PMU by zeroing MMCR0_FC. Software
>>>> must freeze counters to read the results - on the fly reading of the PMCs
>>>> will return the starting value of each one.
>>>
>>> Is that the way hardware behaves?  Or is it just a limitation of this
>>> software implementation?  Either is fine, we should just be clear on
>>> what it is.
>>>
>>>>
>>>> Since there will be more PMU exclusive code to be added next, let's also
>>>> put the PMU logic in its own helper to keep all in the same place. The
>>>> code is also repetitive and not really extensible to add more PMCs, but
>>>> we'll handle this in the next patches.
>>>>
>>>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>>>> ---
>>>>    target/ppc/cpu.h               |  4 ++
>>>>    target/ppc/cpu_init.c          |  4 +-
>>>>    target/ppc/helper.h            |  1 +
>>>>    target/ppc/meson.build         |  1 +
>>>>    target/ppc/pmu_book3s_helper.c | 78 ++++++++++++++++++++++++++++++++++
>>>>    target/ppc/translate.c         | 14 ++++--
>>>>    6 files changed, 97 insertions(+), 5 deletions(-)
>>>>    create mode 100644 target/ppc/pmu_book3s_helper.c
>>>>
>>>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>>>> index 4d96015f81..229abfe7ee 100644
>>>> --- a/target/ppc/cpu.h
>>>> +++ b/target/ppc/cpu.h
>>>> @@ -1175,6 +1175,10 @@ struct CPUPPCState {
>>>>        uint32_t tm_vscr;
>>>>        uint64_t tm_dscr;
>>>>        uint64_t tm_tar;
>>>> +
>>>> +    /* PMU registers icount state */
>>>> +    uint64_t pmc5_base_icount;
>>>> +    uint64_t pmc6_base_icount;
>>>>    };
>>>>    #define SET_FIT_PERIOD(a_, b_, c_, d_)          \
>>>> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
>>>> index 71062809c8..fce89ee994 100644
>>>> --- a/target/ppc/cpu_init.c
>>>> +++ b/target/ppc/cpu_init.c
>>>> @@ -6822,7 +6822,7 @@ static void register_book3s_pmu_sup_sprs(CPUPPCState *env)
>>>>        spr_register_kvm(env, SPR_POWER_MMCR0, "MMCR0",
>>>>                         SPR_NOACCESS, SPR_NOACCESS,
>>>>                         &spr_read_pmu_generic, &spr_write_pmu_generic,
>>>> -                     KVM_REG_PPC_MMCR0, 0x00000000);
>>>> +                     KVM_REG_PPC_MMCR0, 0x80000000);
>>>>        spr_register_kvm(env, SPR_POWER_MMCR1, "MMCR1",
>>>>                         SPR_NOACCESS, SPR_NOACCESS,
>>>>                         &spr_read_pmu_generic, &spr_write_pmu_generic,
>>>> @@ -6870,7 +6870,7 @@ static void register_book3s_pmu_user_sprs(CPUPPCState *env)
>>>>        spr_register(env, SPR_POWER_UMMCR0, "UMMCR0",
>>>>                     &spr_read_pmu_ureg, &spr_write_pmu_ureg,
>>>>                     &spr_read_ureg, &spr_write_ureg,
>>>> -                 0x00000000);
>>>> +                 0x80000000);
>>>>        spr_register(env, SPR_POWER_UMMCR1, "UMMCR1",
>>>>                     &spr_read_pmu_ureg, &spr_write_pmu_ureg,
>>>>                     &spr_read_ureg, &spr_write_ureg,
>>>> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
>>>> index 4076aa281e..5122632784 100644
>>>> --- a/target/ppc/helper.h
>>>> +++ b/target/ppc/helper.h
>>>> @@ -20,6 +20,7 @@ DEF_HELPER_1(rfscv, void, env)
>>>>    DEF_HELPER_1(hrfid, void, env)
>>>>    DEF_HELPER_2(store_lpcr, void, env, tl)
>>>>    DEF_HELPER_2(store_pcr, void, env, tl)
>>>> +DEF_HELPER_2(store_mmcr0, void, env, tl)
>>>>    #endif
>>>>    DEF_HELPER_1(check_tlb_flush_local, void, env)
>>>>    DEF_HELPER_1(check_tlb_flush_global, void, env)
>>>> diff --git a/target/ppc/meson.build b/target/ppc/meson.build
>>>> index b85f295703..bf252ca3ac 100644
>>>> --- a/target/ppc/meson.build
>>>> +++ b/target/ppc/meson.build
>>>> @@ -14,6 +14,7 @@ ppc_ss.add(when: 'CONFIG_TCG', if_true: files(
>>>>      'int_helper.c',
>>>>      'mem_helper.c',
>>>>      'misc_helper.c',
>>>> +  'pmu_book3s_helper.c',
>>>>      'timebase_helper.c',
>>>>      'translate.c',
>>>>    ))
>>>> diff --git a/target/ppc/pmu_book3s_helper.c b/target/ppc/pmu_book3s_helper.c
>>>> new file mode 100644
>>>> index 0000000000..fe16fcfce0
>>>> --- /dev/null
>>>> +++ b/target/ppc/pmu_book3s_helper.c
>>>
>>> I'd prefer to call this just book3s_pmu.c.  Or better yet
>>> "powerX_pmu.c", where X is the specific PMU model you're implementing
>>> since IIRC the particulars of the PMU vary quite a lot from POWER7
>>> through to POWER10.
>>
>> I'll go with book3s_pmu.c because this PMU implementation will not go
>> deep enough to touch the differences between POWER processors.
> 
> I think you are mistaken.
> 
>> The only aspects that will be implementation specific will be 2 perf
>> events (0x2 and 0x1E) that the kernel has been using for a long time
>> and only recently migrated to architected events. We'll support all
>> architected events that are related to those events so that newer
>> kernels - and other non-IBM processors - will use the PMU without
>> the need of having to know about 0x2 and 0x1E.
> 
> So, possibly in the last few POWER chips, the only differences are the
> table of event keys.  That is definitely not the case for the whole
> POWER line though.  I remember from my time at IBM, the PMU underwent
> huge changes much deeper than the event table.  Some had different
> numbers of PMCs, different bit fields in the MMCRs, different methods
> of event selection (in some cases through insanely compplex chains of
> multiplexers).  And everything from POWER4 onwards could reasonably be
> described as "book3s".  So we definitely need a different
> name... working out what it should be is harder though.
> 
> If the modern core structure of the PMU got codified in a particular
> BookS architecture version we could name it after that version, maybe?


What about just 'pmu_helper.c'? That way we can have a file with most of
the PMU logic contained in it, without implying nothing about the support
with the filename alone. If time goes by and more specialized coded starts
to being added in there (code like "this helper implements something that
is only valid in PPC chip X") then we can split into more specific files.

Yet another alternative in line with what you suggested could be
"power9_pmu.c", perhaps even "power8_pmu.c". The chip version would mean
"the oldest IBM TCG PPC64 chip that we tested with this PMU code". I'm
testing with POWER9 most of the time, POWER10 works fine, so 'power9_pmu.c'
is viable. I can do some tests with POWER8 to see how it goes. I'm not
sure if it's worth the trouble testing with anything older than P8 though.


Thanks,


Daniel

>
David Gibson Aug. 18, 2021, 5:48 a.m. UTC | #7
On Tue, Aug 17, 2021 at 06:30:37AM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 8/16/21 11:59 PM, David Gibson wrote:
> > On Mon, Aug 16, 2021 at 02:53:13PM -0300, Daniel Henrique Barboza wrote:
> > > 
> > > 
> > > On 8/10/21 12:39 AM, David Gibson wrote:
> > > > On Mon, Aug 09, 2021 at 10:10:42AM -0300, Daniel Henrique Barboza wrote:
> > > > > The PMCC (PMC Control) bit in the MMCR0 register controls whether the
> > > > > counters PMC5 and PMC6 are being part of the performance monitor
> > > > > facility in a specific time. If PMCC allows it, PMC5 and PMC6 will
> > > > > always be used to measure instructions completed and cycles,
> > > > > respectively.
> > > > > 
> > > > > This patch adds the barebones of the Book3s PMU logic by enabling
> > > > > instruction counting, using the icount framework, using the performance
> > > > > monitor counters 5 and 6. The overall logic goes as follows:
> > > > > 
> > > > > - a helper is added to control the PMU state on each MMCR0 write. This
> > > > > allows for the PMU to start/stop as quick as possible;
> > > > 
> > > > Um.. why does a helper accomplish that goal?
> > > > 
> > > > > 
> > > > > - only PMC5 and PMC6 are being set. PMC6 (cycles) is default to 4*insns
> > > > > (for cycles per instruction) for now;
> > > > 
> > > > What's the justification for that value?  With a superscalar core, I'd
> > > > expect average (median) cycles per instruction to be < 1 a lot of the
> > > > time.  Mean cycles per instruction could be higher since certain
> > > > instructions could take a lot.
> > > > 
> > > > > - the intended usage is to freeze the counters by setting MMCR0_FC, do
> > > > > any additional setting via MMCR1 (not implemented yet) and setting
> > > > > initial counter values,  and enable the PMU by zeroing MMCR0_FC. Software
> > > > > must freeze counters to read the results - on the fly reading of the PMCs
> > > > > will return the starting value of each one.
> > > > 
> > > > Is that the way hardware behaves?  Or is it just a limitation of this
> > > > software implementation?  Either is fine, we should just be clear on
> > > > what it is.
> > > > 
> > > > > 
> > > > > Since there will be more PMU exclusive code to be added next, let's also
> > > > > put the PMU logic in its own helper to keep all in the same place. The
> > > > > code is also repetitive and not really extensible to add more PMCs, but
> > > > > we'll handle this in the next patches.
> > > > > 
> > > > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> > > > > ---
> > > > >    target/ppc/cpu.h               |  4 ++
> > > > >    target/ppc/cpu_init.c          |  4 +-
> > > > >    target/ppc/helper.h            |  1 +
> > > > >    target/ppc/meson.build         |  1 +
> > > > >    target/ppc/pmu_book3s_helper.c | 78 ++++++++++++++++++++++++++++++++++
> > > > >    target/ppc/translate.c         | 14 ++++--
> > > > >    6 files changed, 97 insertions(+), 5 deletions(-)
> > > > >    create mode 100644 target/ppc/pmu_book3s_helper.c
> > > > > 
> > > > > diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> > > > > index 4d96015f81..229abfe7ee 100644
> > > > > --- a/target/ppc/cpu.h
> > > > > +++ b/target/ppc/cpu.h
> > > > > @@ -1175,6 +1175,10 @@ struct CPUPPCState {
> > > > >        uint32_t tm_vscr;
> > > > >        uint64_t tm_dscr;
> > > > >        uint64_t tm_tar;
> > > > > +
> > > > > +    /* PMU registers icount state */
> > > > > +    uint64_t pmc5_base_icount;
> > > > > +    uint64_t pmc6_base_icount;
> > > > >    };
> > > > >    #define SET_FIT_PERIOD(a_, b_, c_, d_)          \
> > > > > diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> > > > > index 71062809c8..fce89ee994 100644
> > > > > --- a/target/ppc/cpu_init.c
> > > > > +++ b/target/ppc/cpu_init.c
> > > > > @@ -6822,7 +6822,7 @@ static void register_book3s_pmu_sup_sprs(CPUPPCState *env)
> > > > >        spr_register_kvm(env, SPR_POWER_MMCR0, "MMCR0",
> > > > >                         SPR_NOACCESS, SPR_NOACCESS,
> > > > >                         &spr_read_pmu_generic, &spr_write_pmu_generic,
> > > > > -                     KVM_REG_PPC_MMCR0, 0x00000000);
> > > > > +                     KVM_REG_PPC_MMCR0, 0x80000000);
> > > > >        spr_register_kvm(env, SPR_POWER_MMCR1, "MMCR1",
> > > > >                         SPR_NOACCESS, SPR_NOACCESS,
> > > > >                         &spr_read_pmu_generic, &spr_write_pmu_generic,
> > > > > @@ -6870,7 +6870,7 @@ static void register_book3s_pmu_user_sprs(CPUPPCState *env)
> > > > >        spr_register(env, SPR_POWER_UMMCR0, "UMMCR0",
> > > > >                     &spr_read_pmu_ureg, &spr_write_pmu_ureg,
> > > > >                     &spr_read_ureg, &spr_write_ureg,
> > > > > -                 0x00000000);
> > > > > +                 0x80000000);
> > > > >        spr_register(env, SPR_POWER_UMMCR1, "UMMCR1",
> > > > >                     &spr_read_pmu_ureg, &spr_write_pmu_ureg,
> > > > >                     &spr_read_ureg, &spr_write_ureg,
> > > > > diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> > > > > index 4076aa281e..5122632784 100644
> > > > > --- a/target/ppc/helper.h
> > > > > +++ b/target/ppc/helper.h
> > > > > @@ -20,6 +20,7 @@ DEF_HELPER_1(rfscv, void, env)
> > > > >    DEF_HELPER_1(hrfid, void, env)
> > > > >    DEF_HELPER_2(store_lpcr, void, env, tl)
> > > > >    DEF_HELPER_2(store_pcr, void, env, tl)
> > > > > +DEF_HELPER_2(store_mmcr0, void, env, tl)
> > > > >    #endif
> > > > >    DEF_HELPER_1(check_tlb_flush_local, void, env)
> > > > >    DEF_HELPER_1(check_tlb_flush_global, void, env)
> > > > > diff --git a/target/ppc/meson.build b/target/ppc/meson.build
> > > > > index b85f295703..bf252ca3ac 100644
> > > > > --- a/target/ppc/meson.build
> > > > > +++ b/target/ppc/meson.build
> > > > > @@ -14,6 +14,7 @@ ppc_ss.add(when: 'CONFIG_TCG', if_true: files(
> > > > >      'int_helper.c',
> > > > >      'mem_helper.c',
> > > > >      'misc_helper.c',
> > > > > +  'pmu_book3s_helper.c',
> > > > >      'timebase_helper.c',
> > > > >      'translate.c',
> > > > >    ))
> > > > > diff --git a/target/ppc/pmu_book3s_helper.c b/target/ppc/pmu_book3s_helper.c
> > > > > new file mode 100644
> > > > > index 0000000000..fe16fcfce0
> > > > > --- /dev/null
> > > > > +++ b/target/ppc/pmu_book3s_helper.c
> > > > 
> > > > I'd prefer to call this just book3s_pmu.c.  Or better yet
> > > > "powerX_pmu.c", where X is the specific PMU model you're implementing
> > > > since IIRC the particulars of the PMU vary quite a lot from POWER7
> > > > through to POWER10.
> > > 
> > > I'll go with book3s_pmu.c because this PMU implementation will not go
> > > deep enough to touch the differences between POWER processors.
> > 
> > I think you are mistaken.
> > 
> > > The only aspects that will be implementation specific will be 2 perf
> > > events (0x2 and 0x1E) that the kernel has been using for a long time
> > > and only recently migrated to architected events. We'll support all
> > > architected events that are related to those events so that newer
> > > kernels - and other non-IBM processors - will use the PMU without
> > > the need of having to know about 0x2 and 0x1E.
> > 
> > So, possibly in the last few POWER chips, the only differences are the
> > table of event keys.  That is definitely not the case for the whole
> > POWER line though.  I remember from my time at IBM, the PMU underwent
> > huge changes much deeper than the event table.  Some had different
> > numbers of PMCs, different bit fields in the MMCRs, different methods
> > of event selection (in some cases through insanely compplex chains of
> > multiplexers).  And everything from POWER4 onwards could reasonably be
> > described as "book3s".  So we definitely need a different
> > name... working out what it should be is harder though.
> > 
> > If the modern core structure of the PMU got codified in a particular
> > BookS architecture version we could name it after that version, maybe?
> 
> 
> What about just 'pmu_helper.c'? That way we can have a file with most of
> the PMU logic contained in it, without implying nothing about the support
> with the filename alone. If time goes by and more specialized coded starts
> to being added in there (code like "this helper implements something that
> is only valid in PPC chip X") then we can split into more specific files.
> 
> Yet another alternative in line with what you suggested could be
> "power9_pmu.c", perhaps even "power8_pmu.c". The chip version would mean
> "the oldest IBM TCG PPC64 chip that we tested with this PMU code". I'm
> testing with POWER9 most of the time, POWER10 works fine, so 'power9_pmu.c'
> is viable. I can do some tests with POWER8 to see how it goes. I'm not
> sure if it's worth the trouble testing with anything older than P8
> though.

Right.  I much prefer either of these to pmu_helper.c
diff mbox series

Patch

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 4d96015f81..229abfe7ee 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1175,6 +1175,10 @@  struct CPUPPCState {
     uint32_t tm_vscr;
     uint64_t tm_dscr;
     uint64_t tm_tar;
+
+    /* PMU registers icount state */
+    uint64_t pmc5_base_icount;
+    uint64_t pmc6_base_icount;
 };
 
 #define SET_FIT_PERIOD(a_, b_, c_, d_)          \
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 71062809c8..fce89ee994 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6822,7 +6822,7 @@  static void register_book3s_pmu_sup_sprs(CPUPPCState *env)
     spr_register_kvm(env, SPR_POWER_MMCR0, "MMCR0",
                      SPR_NOACCESS, SPR_NOACCESS,
                      &spr_read_pmu_generic, &spr_write_pmu_generic,
-                     KVM_REG_PPC_MMCR0, 0x00000000);
+                     KVM_REG_PPC_MMCR0, 0x80000000);
     spr_register_kvm(env, SPR_POWER_MMCR1, "MMCR1",
                      SPR_NOACCESS, SPR_NOACCESS,
                      &spr_read_pmu_generic, &spr_write_pmu_generic,
@@ -6870,7 +6870,7 @@  static void register_book3s_pmu_user_sprs(CPUPPCState *env)
     spr_register(env, SPR_POWER_UMMCR0, "UMMCR0",
                  &spr_read_pmu_ureg, &spr_write_pmu_ureg,
                  &spr_read_ureg, &spr_write_ureg,
-                 0x00000000);
+                 0x80000000);
     spr_register(env, SPR_POWER_UMMCR1, "UMMCR1",
                  &spr_read_pmu_ureg, &spr_write_pmu_ureg,
                  &spr_read_ureg, &spr_write_ureg,
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 4076aa281e..5122632784 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -20,6 +20,7 @@  DEF_HELPER_1(rfscv, void, env)
 DEF_HELPER_1(hrfid, void, env)
 DEF_HELPER_2(store_lpcr, void, env, tl)
 DEF_HELPER_2(store_pcr, void, env, tl)
+DEF_HELPER_2(store_mmcr0, void, env, tl)
 #endif
 DEF_HELPER_1(check_tlb_flush_local, void, env)
 DEF_HELPER_1(check_tlb_flush_global, void, env)
diff --git a/target/ppc/meson.build b/target/ppc/meson.build
index b85f295703..bf252ca3ac 100644
--- a/target/ppc/meson.build
+++ b/target/ppc/meson.build
@@ -14,6 +14,7 @@  ppc_ss.add(when: 'CONFIG_TCG', if_true: files(
   'int_helper.c',
   'mem_helper.c',
   'misc_helper.c',
+  'pmu_book3s_helper.c',
   'timebase_helper.c',
   'translate.c',
 ))
diff --git a/target/ppc/pmu_book3s_helper.c b/target/ppc/pmu_book3s_helper.c
new file mode 100644
index 0000000000..fe16fcfce0
--- /dev/null
+++ b/target/ppc/pmu_book3s_helper.c
@@ -0,0 +1,78 @@ 
+/*
+ * PowerPC Book3s PMU emulation helpers for QEMU TCG
+ *
+ *  Copyright IBM Corp. 2021
+ *
+ * Authors:
+ *  Daniel Henrique Barboza      <danielhb413@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+
+static uint64_t get_insns(void)
+{
+    return (uint64_t)icount_get_raw();
+}
+
+static uint64_t get_cycles(uint64_t insns)
+{
+    /* Placeholder value */
+    return insns * 4;
+}
+
+/* PMC5 always count instructions */
+static void freeze_PMC5_value(CPUPPCState *env)
+{
+    uint64_t insns = get_insns() - env->pmc5_base_icount;
+
+    env->spr[SPR_POWER_PMC5] += insns;
+    env->pmc5_base_icount += insns;
+}
+
+/* PMC6 always count cycles */
+static void freeze_PMC6_value(CPUPPCState *env)
+{
+    uint64_t insns = get_insns() - env->pmc6_base_icount;
+
+    env->spr[SPR_POWER_PMC6] += get_cycles(insns);
+    env->pmc6_base_icount += insns;
+}
+
+void helper_store_mmcr0(CPUPPCState *env, target_ulong value)
+{
+    bool curr_FC = env->spr[SPR_POWER_MMCR0] & MMCR0_FC;
+    bool new_FC = value & MMCR0_FC;
+
+    /*
+     * In an frozen count (FC) bit change:
+     *
+     * - if PMCs were running (curr_FC = false) and we're freezing
+     * them (new_FC = true), save the PMCs values in the registers.
+     *
+     * - if PMCs were frozen (curr_FC = true) and we're activating
+     * them (new_FC = false), calculate the current icount for each
+     * register to allow for subsequent reads to calculate the insns
+     * passed.
+     */
+    if (curr_FC != new_FC) {
+        if (!curr_FC) {
+            freeze_PMC5_value(env);
+            freeze_PMC6_value(env);
+        } else {
+            uint64_t curr_icount = get_insns();
+
+            env->pmc5_base_icount = curr_icount;
+            env->pmc6_base_icount = curr_icount;
+        }
+    }
+
+    env->spr[SPR_POWER_MMCR0] = value;
+}
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 29b0a340a9..62356cfadf 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -409,8 +409,14 @@  void spr_write_generic(DisasContext *ctx, int sprn, int gprn)
 
 void spr_write_pmu_generic(DisasContext *ctx, int sprn, int gprn)
 {
-    /* For now it's just a call to spr_write_generic() */
-    spr_write_generic(ctx, sprn, gprn);
+    switch (sprn) {
+    case SPR_POWER_MMCR0:
+        gen_icount_io_start(ctx);
+        gen_helper_store_mmcr0(cpu_env, cpu_gpr[gprn]);
+        break;
+    default:
+        spr_write_generic(ctx, sprn, gprn);
+    }
 }
 
 #if !defined(CONFIG_USER_ONLY)
@@ -592,6 +598,8 @@  void spr_write_pmu_ureg(DisasContext *ctx, int sprn, int gprn)
         t0 = tcg_temp_new();
         t1 = tcg_temp_new();
 
+        gen_icount_io_start(ctx);
+
         /*
          * Filter out all bits but FC, PMAO, and PMAE, according
          * to ISA v3.1, in 10.4.4 Monitor Mode Control Register 0,
@@ -603,7 +611,7 @@  void spr_write_pmu_ureg(DisasContext *ctx, int sprn, int gprn)
         tcg_gen_andi_tl(t1, t1, ~(MMCR0_FC | MMCR0_PMAO | MMCR0_PMAE));
         /* Keep all other bits intact */
         tcg_gen_or_tl(t1, t1, t0);
-        gen_store_spr(effective_sprn, t1);
+        gen_helper_store_mmcr0(cpu_env, t1);
 
         tcg_temp_free(t0);
         tcg_temp_free(t1);