diff mbox series

[v3,03/15] target/ppc: PMU basic cycle count for pseries TCG

Message ID 20210903203116.80628-4-danielhb413@gmail.com (mailing list archive)
State New, archived
Headers show
Series PMU-EBB support for PPC64 TCG | expand

Commit Message

Daniel Henrique Barboza Sept. 3, 2021, 8:31 p.m. UTC
This patch adds the barebones of the PMU logic by enabling cycle
counting, done via the performance monitor counter 6. The overall logic
goes as follows:

- a helper is added to control the PMU state on each MMCR0 write. This
allows for the PMU to start/stop as the frozen counter bit (MMCR0_FC)
is cleared or set;

- MMCR0 reg initial value is set to 0x80000000 (MMCR0_FC set) to avoid
having to spin the PMU right at system init;

- the intended usage is to freeze the counters by setting MMCR0_FC, do
any additional setting of events to be counted via MMCR1 (not
implemented yet) and enable the PMU by zeroing MMCR0_FC. Software must
freeze counters to read the results - on the fly reading of the PMCs
will return the starting value of each one.

Since there will be more PMU exclusive code to be added next, put the
PMU logic in its own helper to keep all in the same place. The name of
the new helper file, power8_pmu.c, is an indicative that the PMU logic
has been tested with the IBM POWER chip family, POWER8 being the oldest
version tested. This doesn't mean that this PMU logic will break with
any other PPC64 chip that implements Book3s, but since we can't assert
that this PMU will work with all available Book3s emulated processors
we're choosing to be explicit.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 target/ppc/cpu.h        |  6 ++++
 target/ppc/cpu_init.c   |  6 ++--
 target/ppc/helper.h     |  1 +
 target/ppc/meson.build  |  1 +
 target/ppc/power8_pmu.c | 74 +++++++++++++++++++++++++++++++++++++++++
 target/ppc/spr_tcg.h    |  1 +
 target/ppc/translate.c  | 23 ++++++++++++-
 7 files changed, 108 insertions(+), 4 deletions(-)
 create mode 100644 target/ppc/power8_pmu.c

Comments

David Gibson Sept. 7, 2021, 1:48 a.m. UTC | #1
On Fri, Sep 03, 2021 at 05:31:04PM -0300, Daniel Henrique Barboza wrote:
> This patch adds the barebones of the PMU logic by enabling cycle
> counting, done via the performance monitor counter 6. The overall logic
> goes as follows:
> 
> - a helper is added to control the PMU state on each MMCR0 write. This
> allows for the PMU to start/stop as the frozen counter bit (MMCR0_FC)
> is cleared or set;
> 
> - MMCR0 reg initial value is set to 0x80000000 (MMCR0_FC set) to avoid
> having to spin the PMU right at system init;
> 
> - the intended usage is to freeze the counters by setting MMCR0_FC, do
> any additional setting of events to be counted via MMCR1 (not
> implemented yet) and enable the PMU by zeroing MMCR0_FC. Software must
> freeze counters to read the results - on the fly reading of the PMCs
> will return the starting value of each one.
> 
> Since there will be more PMU exclusive code to be added next, put the
> PMU logic in its own helper to keep all in the same place. The name of
> the new helper file, power8_pmu.c, is an indicative that the PMU logic
> has been tested with the IBM POWER chip family, POWER8 being the oldest
> version tested. This doesn't mean that this PMU logic will break with
> any other PPC64 chip that implements Book3s, but since we can't assert
> that this PMU will work with all available Book3s emulated processors
> we're choosing to be explicit.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

LGTM, except for one nit:
> +#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
> +
> +static void update_PMC_PM_CYC(CPUPPCState *env, int sprn,
> +                              uint64_t time_delta)
> +{
> +    /*
> +     * The pseries and pvn clock runs at 1Ghz, meaning that

s/pvn/pnv/ ?
Matheus K. Ferst Sept. 22, 2021, 11:24 a.m. UTC | #2
On 03/09/2021 17:31, Daniel Henrique Barboza wrote:
> [E-MAIL EXTERNO] Não clique em links ou abra anexos, a menos que você possa confirmar o remetente e saber que o conteúdo é seguro. Em caso de e-mail suspeito entre imediatamente em contato com o DTI.
> 
> This patch adds the barebones of the PMU logic by enabling cycle
> counting, done via the performance monitor counter 6. The overall logic
> goes as follows:
> 
> - a helper is added to control the PMU state on each MMCR0 write. This
> allows for the PMU to start/stop as the frozen counter bit (MMCR0_FC)
> is cleared or set;
> 
> - MMCR0 reg initial value is set to 0x80000000 (MMCR0_FC set) to avoid
> having to spin the PMU right at system init;
> 
> - the intended usage is to freeze the counters by setting MMCR0_FC, do
> any additional setting of events to be counted via MMCR1 (not
> implemented yet) and enable the PMU by zeroing MMCR0_FC. Software must
> freeze counters to read the results - on the fly reading of the PMCs
> will return the starting value of each one.
> 
> Since there will be more PMU exclusive code to be added next, put the
> PMU logic in its own helper to keep all in the same place. The name of
> the new helper file, power8_pmu.c, is an indicative that the PMU logic
> has been tested with the IBM POWER chip family, POWER8 being the oldest
> version tested. This doesn't mean that this PMU logic will break with
> any other PPC64 chip that implements Book3s, but since we can't assert
> that this PMU will work with all available Book3s emulated processors
> we're choosing to be explicit.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---

<snip>

> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 0babde3131..c3e2e3d329 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -401,6 +401,24 @@ void spr_write_generic(DisasContext *ctx, int sprn, int gprn)
>       spr_store_dump_spr(sprn);
>   }
> 
> +#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
> +void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
> +{
> +    /*
> +     * helper_store_mmcr0 will make clock based operations that
> +     * will cause 'bad icount read' errors if we do not execute
> +     * gen_icount_io_start() beforehand.
> +     */
> +    gen_icount_io_start(ctx);
> +    gen_helper_store_mmcr0(cpu_env, cpu_gpr[gprn]);
> +}
> +#else
> +void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
> +{
> +    spr_write_generic(ctx, sprn, gprn);
> +}
> +#endif
> +
>   #if !defined(CONFIG_USER_ONLY)
>   void spr_write_generic32(DisasContext *ctx, int sprn, int gprn)
>   {
> @@ -596,7 +614,10 @@ void spr_write_MMCR0_ureg(DisasContext *ctx, int sprn, int gprn)
>       tcg_gen_andi_tl(t1, t1, ~(MMCR0_UREG_MASK));
>       /* Keep all other bits intact */
>       tcg_gen_or_tl(t1, t1, t0);
> -    gen_store_spr(SPR_POWER_MMCR0, t1);
> +
> +    /* Overwrite cpu_gpr[gprn] and use spr_write_MMCR0() */
> +    tcg_gen_mov_tl(cpu_gpr[gprn], t1);
> +    spr_write_MMCR0(ctx, sprn + 0x10, gprn);

IIUC, this makes writing to MMCR0 change the GPR value and expose the 
unfiltered content of the SPR to problem state. It might be better to 
call the helper directly or create another method that takes a TCGv as 
an argument and call it from spr_write_MMCR0_ureg and spr_write_MMCR0.

> 
>       tcg_temp_free(t0);
>       tcg_temp_free(t1);
> --
> 2.31.1
>
Daniel Henrique Barboza Sept. 24, 2021, 2:41 p.m. UTC | #3
On 9/22/21 08:24, Matheus K. Ferst wrote:
> On 03/09/2021 17:31, Daniel Henrique Barboza wrote:
>> [E-MAIL EXTERNO] Não clique em links ou abra anexos, a menos que você possa confirmar o remetente e saber que o conteúdo é seguro. Em caso de e-mail suspeito entre imediatamente em contato com o DTI.
>>
>> This patch adds the barebones of the PMU logic by enabling cycle
>> counting, done via the performance monitor counter 6. The overall logic
>> goes as follows:
>>
>> - a helper is added to control the PMU state on each MMCR0 write. This
>> allows for the PMU to start/stop as the frozen counter bit (MMCR0_FC)
>> is cleared or set;
>>
>> - MMCR0 reg initial value is set to 0x80000000 (MMCR0_FC set) to avoid
>> having to spin the PMU right at system init;
>>
>> - the intended usage is to freeze the counters by setting MMCR0_FC, do
>> any additional setting of events to be counted via MMCR1 (not
>> implemented yet) and enable the PMU by zeroing MMCR0_FC. Software must
>> freeze counters to read the results - on the fly reading of the PMCs
>> will return the starting value of each one.
>>
>> Since there will be more PMU exclusive code to be added next, put the
>> PMU logic in its own helper to keep all in the same place. The name of
>> the new helper file, power8_pmu.c, is an indicative that the PMU logic
>> has been tested with the IBM POWER chip family, POWER8 being the oldest
>> version tested. This doesn't mean that this PMU logic will break with
>> any other PPC64 chip that implements Book3s, but since we can't assert
>> that this PMU will work with all available Book3s emulated processors
>> we're choosing to be explicit.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
> 
> <snip>
> 
>> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
>> index 0babde3131..c3e2e3d329 100644
>> --- a/target/ppc/translate.c
>> +++ b/target/ppc/translate.c
>> @@ -401,6 +401,24 @@ void spr_write_generic(DisasContext *ctx, int sprn, int gprn)
>>       spr_store_dump_spr(sprn);
>>   }
>>
>> +#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
>> +void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
>> +{
>> +    /*
>> +     * helper_store_mmcr0 will make clock based operations that
>> +     * will cause 'bad icount read' errors if we do not execute
>> +     * gen_icount_io_start() beforehand.
>> +     */
>> +    gen_icount_io_start(ctx);
>> +    gen_helper_store_mmcr0(cpu_env, cpu_gpr[gprn]);
>> +}
>> +#else
>> +void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
>> +{
>> +    spr_write_generic(ctx, sprn, gprn);
>> +}
>> +#endif
>> +
>>   #if !defined(CONFIG_USER_ONLY)
>>   void spr_write_generic32(DisasContext *ctx, int sprn, int gprn)
>>   {
>> @@ -596,7 +614,10 @@ void spr_write_MMCR0_ureg(DisasContext *ctx, int sprn, int gprn)
>>       tcg_gen_andi_tl(t1, t1, ~(MMCR0_UREG_MASK));
>>       /* Keep all other bits intact */
>>       tcg_gen_or_tl(t1, t1, t0);
>> -    gen_store_spr(SPR_POWER_MMCR0, t1);
>> +
>> +    /* Overwrite cpu_gpr[gprn] and use spr_write_MMCR0() */
>> +    tcg_gen_mov_tl(cpu_gpr[gprn], t1);
>> +    spr_write_MMCR0(ctx, sprn + 0x10, gprn);
> 
> IIUC, this makes writing to MMCR0 change the GPR value and expose the unfiltered content of the SPR to problem state. It might be better to call the helper directly or create another method that takes a TCGv as an argument and call it from spr_write_MMCR0_ureg and spr_write_MMCR0.

I'm overwriting cpu_gpr[gprn] with t1, which is filtered by MMCR0_REG_MASK
right before, to re-use spr_write_MMCR0() since its API requires a gprn
index. The reason I'm re-using spr_write_MMCR0() here is to avoid code repetition
in spr_write_MMCR0_ureg(), which would need to repeat the same steps as
spr_write_MMCR0 (calling icount_io_start(), calling the helper, and then setting
DISAS_EXIT_UPDATE in a later patch).

The idea behind is that all PMU user_write() functions works the same as its
privileged counterparts but with some form of filtering done beforehand. Note
that this is kind of true in the previous patch as well - gen_store_spr() is
similar to the privileged function MMCR0 was using (spr_write_generic()) with
the exception of an optional qemu_log().

Maybe I should've made this clear in the previous patch, using spr_write_generic()
and overwriting cpu_gpr[gprn] with the filtered t1 content back there.

Speaking of which, since t1 is being filtered by MMCR0_REG_MASK before being used to
overwrite cpu_gpr[gprn], I'm not sure how this is exposing unfiltered content to
problem state. Can you elaborate?



Thanks,


Daniel

> 
>>
>>       tcg_temp_free(t0);
>>       tcg_temp_free(t1);
>> -- 
>> 2.31.1
>>
> 
>
Matheus K. Ferst Sept. 24, 2021, 6:34 p.m. UTC | #4
On 24/09/2021 11:41, Daniel Henrique Barboza wrote:
> On 9/22/21 08:24, Matheus K. Ferst wrote:
>> On 03/09/2021 17:31, Daniel Henrique Barboza wrote:
>>> [E-MAIL EXTERNO] Não clique em links ou abra anexos, a menos que você 
>>> possa confirmar o remetente e saber que o conteúdo é seguro. Em caso 
>>> de e-mail suspeito entre imediatamente em contato com o DTI.
>>>
>>> This patch adds the barebones of the PMU logic by enabling cycle
>>> counting, done via the performance monitor counter 6. The overall logic
>>> goes as follows:
>>>
>>> - a helper is added to control the PMU state on each MMCR0 write. This
>>> allows for the PMU to start/stop as the frozen counter bit (MMCR0_FC)
>>> is cleared or set;
>>>
>>> - MMCR0 reg initial value is set to 0x80000000 (MMCR0_FC set) to avoid
>>> having to spin the PMU right at system init;
>>>
>>> - the intended usage is to freeze the counters by setting MMCR0_FC, do
>>> any additional setting of events to be counted via MMCR1 (not
>>> implemented yet) and enable the PMU by zeroing MMCR0_FC. Software must
>>> freeze counters to read the results - on the fly reading of the PMCs
>>> will return the starting value of each one.
>>>
>>> Since there will be more PMU exclusive code to be added next, put the
>>> PMU logic in its own helper to keep all in the same place. The name of
>>> the new helper file, power8_pmu.c, is an indicative that the PMU logic
>>> has been tested with the IBM POWER chip family, POWER8 being the oldest
>>> version tested. This doesn't mean that this PMU logic will break with
>>> any other PPC64 chip that implements Book3s, but since we can't assert
>>> that this PMU will work with all available Book3s emulated processors
>>> we're choosing to be explicit.
>>>
>>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>>> ---
>>
>> <snip>
>>
>>> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
>>> index 0babde3131..c3e2e3d329 100644
>>> --- a/target/ppc/translate.c
>>> +++ b/target/ppc/translate.c
>>> @@ -401,6 +401,24 @@ void spr_write_generic(DisasContext *ctx, int 
>>> sprn, int gprn)
>>>       spr_store_dump_spr(sprn);
>>>   }
>>>
>>> +#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
>>> +void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
>>> +{
>>> +    /*
>>> +     * helper_store_mmcr0 will make clock based operations that
>>> +     * will cause 'bad icount read' errors if we do not execute
>>> +     * gen_icount_io_start() beforehand.
>>> +     */
>>> +    gen_icount_io_start(ctx);
>>> +    gen_helper_store_mmcr0(cpu_env, cpu_gpr[gprn]);
>>> +}
>>> +#else
>>> +void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
>>> +{
>>> +    spr_write_generic(ctx, sprn, gprn);
>>> +}
>>> +#endif
>>> +
>>>   #if !defined(CONFIG_USER_ONLY)
>>>   void spr_write_generic32(DisasContext *ctx, int sprn, int gprn)
>>>   {
>>> @@ -596,7 +614,10 @@ void spr_write_MMCR0_ureg(DisasContext *ctx, int 
>>> sprn, int gprn)
>>>       tcg_gen_andi_tl(t1, t1, ~(MMCR0_UREG_MASK));
>>>       /* Keep all other bits intact */
>>>       tcg_gen_or_tl(t1, t1, t0);
>>> -    gen_store_spr(SPR_POWER_MMCR0, t1);
>>> +
>>> +    /* Overwrite cpu_gpr[gprn] and use spr_write_MMCR0() */
>>> +    tcg_gen_mov_tl(cpu_gpr[gprn], t1);
>>> +    spr_write_MMCR0(ctx, sprn + 0x10, gprn);
>>
>> IIUC, this makes writing to MMCR0 change the GPR value and expose the 
>> unfiltered content of the SPR to problem state. It might be better to 
>> call the helper directly or create another method that takes a TCGv as 
>> an argument and call it from spr_write_MMCR0_ureg and spr_write_MMCR0.
> 
> I'm overwriting cpu_gpr[gprn] with t1, which is filtered by MMCR0_REG_MASK
> right before, to re-use spr_write_MMCR0() since its API requires a gprn
> index. The reason I'm re-using spr_write_MMCR0() here is to avoid code 
> repetition
> in spr_write_MMCR0_ureg(), which would need to repeat the same steps as
> spr_write_MMCR0 (calling icount_io_start(), calling the helper, and then 
> setting
> DISAS_EXIT_UPDATE in a later patch).
> 
> The idea behind is that all PMU user_write() functions works the same as 
> its
> privileged counterparts but with some form of filtering done beforehand. 
> Note
> that this is kind of true in the previous patch as well - 
> gen_store_spr() is
> similar to the privileged function MMCR0 was using (spr_write_generic()) 
> with
> the exception of an optional qemu_log().
> 
> Maybe I should've made this clear in the previous patch, using 
> spr_write_generic()
> and overwriting cpu_gpr[gprn] with the filtered t1 content back there.
> 
> Speaking of which, since t1 is being filtered by MMCR0_REG_MASK before 
> being used to
> overwrite cpu_gpr[gprn], I'm not sure how this is exposing unfiltered 
> content to
> problem state. Can you elaborate?

Suppose MMCR0 has the value 0x80000001 (FC and FCH) and problem state 
executes an mtspr with the value 0x4000000 (unset FC and set PMAE) in 
the GPR. The proposed code will do the following:

 > tcg_gen_andi_tl(t0, cpu_gpr[gprn], MMCR0_UREG_MASK);

t0 = GPR & MMCR0_UREG_MASK = 0x4000000 & 0x84000080 = 0x4000000

 > gen_load_spr(t1, SPR_POWER_MMCR0);

t1 = MMCR0 = 0x80000001

 > tcg_gen_andi_tl(t1, t1, ~(MMCR0_UREG_MASK));

t1 = t1 & ~MMCR0_UREG_MASK = 0x80000001 & ~0x84000080 = 0x1

 > tcg_gen_or_tl(t1, t1, t0);

t1 = t1 | t0 = 0x4000000 | 0x1 = 0x4000001

 > tcg_gen_mov_tl(cpu_gpr[gprn], t1);

GPR = 0x4000001

Now problem state knows that FCH is set.
Daniel Henrique Barboza Sept. 24, 2021, 7:05 p.m. UTC | #5
On 9/24/21 15:34, Matheus K. Ferst wrote:
> On 24/09/2021 11:41, Daniel Henrique Barboza wrote:
>> On 9/22/21 08:24, Matheus K. Ferst wrote:
>>> On 03/09/2021 17:31, Daniel Henrique Barboza wrote:
>>>> [E-MAIL EXTERNO] Não clique em links ou abra anexos, a menos que você possa confirmar o remetente e saber que o conteúdo é seguro. Em caso de e-mail suspeito entre imediatamente em contato com o DTI.
>>>>
>>>> This patch adds the barebones of the PMU logic by enabling cycle
>>>> counting, done via the performance monitor counter 6. The overall logic
>>>> goes as follows:
>>>>
>>>> - a helper is added to control the PMU state on each MMCR0 write. This
>>>> allows for the PMU to start/stop as the frozen counter bit (MMCR0_FC)
>>>> is cleared or set;
>>>>
>>>> - MMCR0 reg initial value is set to 0x80000000 (MMCR0_FC set) to avoid
>>>> having to spin the PMU right at system init;
>>>>
>>>> - the intended usage is to freeze the counters by setting MMCR0_FC, do
>>>> any additional setting of events to be counted via MMCR1 (not
>>>> implemented yet) and enable the PMU by zeroing MMCR0_FC. Software must
>>>> freeze counters to read the results - on the fly reading of the PMCs
>>>> will return the starting value of each one.
>>>>
>>>> Since there will be more PMU exclusive code to be added next, put the
>>>> PMU logic in its own helper to keep all in the same place. The name of
>>>> the new helper file, power8_pmu.c, is an indicative that the PMU logic
>>>> has been tested with the IBM POWER chip family, POWER8 being the oldest
>>>> version tested. This doesn't mean that this PMU logic will break with
>>>> any other PPC64 chip that implements Book3s, but since we can't assert
>>>> that this PMU will work with all available Book3s emulated processors
>>>> we're choosing to be explicit.
>>>>
>>>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>>>> ---
>>>
>>> <snip>
>>>
>>>> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
>>>> index 0babde3131..c3e2e3d329 100644
>>>> --- a/target/ppc/translate.c
>>>> +++ b/target/ppc/translate.c
>>>> @@ -401,6 +401,24 @@ void spr_write_generic(DisasContext *ctx, int sprn, int gprn)
>>>>       spr_store_dump_spr(sprn);
>>>>   }
>>>>
>>>> +#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
>>>> +void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
>>>> +{
>>>> +    /*
>>>> +     * helper_store_mmcr0 will make clock based operations that
>>>> +     * will cause 'bad icount read' errors if we do not execute
>>>> +     * gen_icount_io_start() beforehand.
>>>> +     */
>>>> +    gen_icount_io_start(ctx);
>>>> +    gen_helper_store_mmcr0(cpu_env, cpu_gpr[gprn]);
>>>> +}
>>>> +#else
>>>> +void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
>>>> +{
>>>> +    spr_write_generic(ctx, sprn, gprn);
>>>> +}
>>>> +#endif
>>>> +
>>>>   #if !defined(CONFIG_USER_ONLY)
>>>>   void spr_write_generic32(DisasContext *ctx, int sprn, int gprn)
>>>>   {
>>>> @@ -596,7 +614,10 @@ void spr_write_MMCR0_ureg(DisasContext *ctx, int sprn, int gprn)
>>>>       tcg_gen_andi_tl(t1, t1, ~(MMCR0_UREG_MASK));
>>>>       /* Keep all other bits intact */
>>>>       tcg_gen_or_tl(t1, t1, t0);
>>>> -    gen_store_spr(SPR_POWER_MMCR0, t1);
>>>> +
>>>> +    /* Overwrite cpu_gpr[gprn] and use spr_write_MMCR0() */
>>>> +    tcg_gen_mov_tl(cpu_gpr[gprn], t1);
>>>> +    spr_write_MMCR0(ctx, sprn + 0x10, gprn);
>>>
>>> IIUC, this makes writing to MMCR0 change the GPR value and expose the unfiltered content of the SPR to problem state. It might be better to call the helper directly or create another method that takes a TCGv as an argument and call it from spr_write_MMCR0_ureg and spr_write_MMCR0.
>>
>> I'm overwriting cpu_gpr[gprn] with t1, which is filtered by MMCR0_REG_MASK
>> right before, to re-use spr_write_MMCR0() since its API requires a gprn
>> index. The reason I'm re-using spr_write_MMCR0() here is to avoid code repetition
>> in spr_write_MMCR0_ureg(), which would need to repeat the same steps as
>> spr_write_MMCR0 (calling icount_io_start(), calling the helper, and then setting
>> DISAS_EXIT_UPDATE in a later patch).
>>
>> The idea behind is that all PMU user_write() functions works the same as its
>> privileged counterparts but with some form of filtering done beforehand. Note
>> that this is kind of true in the previous patch as well - gen_store_spr() is
>> similar to the privileged function MMCR0 was using (spr_write_generic()) with
>> the exception of an optional qemu_log().
>>
>> Maybe I should've made this clear in the previous patch, using spr_write_generic()
>> and overwriting cpu_gpr[gprn] with the filtered t1 content back there.
>>
>> Speaking of which, since t1 is being filtered by MMCR0_REG_MASK before being used to
>> overwrite cpu_gpr[gprn], I'm not sure how this is exposing unfiltered content to
>> problem state. Can you elaborate?
> 
> Suppose MMCR0 has the value 0x80000001 (FC and FCH) and problem state executes an mtspr with the value 0x4000000 (unset FC and set PMAE) in the GPR. The proposed code will do the following:
> 
>  > tcg_gen_andi_tl(t0, cpu_gpr[gprn], MMCR0_UREG_MASK);
> 
> t0 = GPR & MMCR0_UREG_MASK = 0x4000000 & 0x84000080 = 0x4000000
> 
>  > gen_load_spr(t1, SPR_POWER_MMCR0);
> 
> t1 = MMCR0 = 0x80000001
> 
>  > tcg_gen_andi_tl(t1, t1, ~(MMCR0_UREG_MASK));
> 
> t1 = t1 & ~MMCR0_UREG_MASK = 0x80000001 & ~0x84000080 = 0x1
> 
>  > tcg_gen_or_tl(t1, t1, t0);
> 
> t1 = t1 | t0 = 0x4000000 | 0x1 = 0x4000001
> 
>  > tcg_gen_mov_tl(cpu_gpr[gprn], t1);
> 
> GPR = 0x4000001
> 
> Now problem state knows that FCH is set.

I see. The problem is that overwriting the GPR is exposing bits outside
of the MMCR0_UREG_MASK via GPR itself, something that wasn't happening
in the previous patch because the filtering logic wasn't visible via
userspace.

Thanks for clarifying. I'll fix it in the next version, probably by adding a
common 'write_MMCR0' method that receives a TCGv and that can be shared
for both privileged and user write() callbacks, like you suggested in your
previous reply.



Thanks,


Daniel


>
David Gibson Sept. 27, 2021, 5:04 a.m. UTC | #6
On Fri, Sep 24, 2021 at 04:05:37PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 9/24/21 15:34, Matheus K. Ferst wrote:
> > On 24/09/2021 11:41, Daniel Henrique Barboza wrote:
> > > On 9/22/21 08:24, Matheus K. Ferst wrote:
> > > > On 03/09/2021 17:31, Daniel Henrique Barboza wrote:
> > > > > [E-MAIL EXTERNO] Não clique em links ou abra anexos, a menos que você possa confirmar o remetente e saber que o conteúdo é seguro. Em caso de e-mail suspeito entre imediatamente em contato com o DTI.
> > > > > 
> > > > > This patch adds the barebones of the PMU logic by enabling cycle
> > > > > counting, done via the performance monitor counter 6. The overall logic
> > > > > goes as follows:
> > > > > 
> > > > > - a helper is added to control the PMU state on each MMCR0 write. This
> > > > > allows for the PMU to start/stop as the frozen counter bit (MMCR0_FC)
> > > > > is cleared or set;
> > > > > 
> > > > > - MMCR0 reg initial value is set to 0x80000000 (MMCR0_FC set) to avoid
> > > > > having to spin the PMU right at system init;
> > > > > 
> > > > > - the intended usage is to freeze the counters by setting MMCR0_FC, do
> > > > > any additional setting of events to be counted via MMCR1 (not
> > > > > implemented yet) and enable the PMU by zeroing MMCR0_FC. Software must
> > > > > freeze counters to read the results - on the fly reading of the PMCs
> > > > > will return the starting value of each one.
> > > > > 
> > > > > Since there will be more PMU exclusive code to be added next, put the
> > > > > PMU logic in its own helper to keep all in the same place. The name of
> > > > > the new helper file, power8_pmu.c, is an indicative that the PMU logic
> > > > > has been tested with the IBM POWER chip family, POWER8 being the oldest
> > > > > version tested. This doesn't mean that this PMU logic will break with
> > > > > any other PPC64 chip that implements Book3s, but since we can't assert
> > > > > that this PMU will work with all available Book3s emulated processors
> > > > > we're choosing to be explicit.
> > > > > 
> > > > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> > > > > ---
> > > > 
> > > > <snip>
> > > > 
> > > > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > > > > index 0babde3131..c3e2e3d329 100644
> > > > > --- a/target/ppc/translate.c
> > > > > +++ b/target/ppc/translate.c
> > > > > @@ -401,6 +401,24 @@ void spr_write_generic(DisasContext *ctx, int sprn, int gprn)
> > > > >       spr_store_dump_spr(sprn);
> > > > >   }
> > > > > 
> > > > > +#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
> > > > > +void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
> > > > > +{
> > > > > +    /*
> > > > > +     * helper_store_mmcr0 will make clock based operations that
> > > > > +     * will cause 'bad icount read' errors if we do not execute
> > > > > +     * gen_icount_io_start() beforehand.
> > > > > +     */
> > > > > +    gen_icount_io_start(ctx);
> > > > > +    gen_helper_store_mmcr0(cpu_env, cpu_gpr[gprn]);
> > > > > +}
> > > > > +#else
> > > > > +void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
> > > > > +{
> > > > > +    spr_write_generic(ctx, sprn, gprn);
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > >   #if !defined(CONFIG_USER_ONLY)
> > > > >   void spr_write_generic32(DisasContext *ctx, int sprn, int gprn)
> > > > >   {
> > > > > @@ -596,7 +614,10 @@ void spr_write_MMCR0_ureg(DisasContext *ctx, int sprn, int gprn)
> > > > >       tcg_gen_andi_tl(t1, t1, ~(MMCR0_UREG_MASK));
> > > > >       /* Keep all other bits intact */
> > > > >       tcg_gen_or_tl(t1, t1, t0);
> > > > > -    gen_store_spr(SPR_POWER_MMCR0, t1);
> > > > > +
> > > > > +    /* Overwrite cpu_gpr[gprn] and use spr_write_MMCR0() */
> > > > > +    tcg_gen_mov_tl(cpu_gpr[gprn], t1);
> > > > > +    spr_write_MMCR0(ctx, sprn + 0x10, gprn);
> > > > 
> > > > IIUC, this makes writing to MMCR0 change the GPR value and expose the unfiltered content of the SPR to problem state. It might be better to call the helper directly or create another method that takes a TCGv as an argument and call it from spr_write_MMCR0_ureg and spr_write_MMCR0.
> > > 
> > > I'm overwriting cpu_gpr[gprn] with t1, which is filtered by MMCR0_REG_MASK
> > > right before, to re-use spr_write_MMCR0() since its API requires a gprn
> > > index. The reason I'm re-using spr_write_MMCR0() here is to avoid code repetition
> > > in spr_write_MMCR0_ureg(), which would need to repeat the same steps as
> > > spr_write_MMCR0 (calling icount_io_start(), calling the helper, and then setting
> > > DISAS_EXIT_UPDATE in a later patch).
> > > 
> > > The idea behind is that all PMU user_write() functions works the same as its
> > > privileged counterparts but with some form of filtering done beforehand. Note
> > > that this is kind of true in the previous patch as well - gen_store_spr() is
> > > similar to the privileged function MMCR0 was using (spr_write_generic()) with
> > > the exception of an optional qemu_log().
> > > 
> > > Maybe I should've made this clear in the previous patch, using spr_write_generic()
> > > and overwriting cpu_gpr[gprn] with the filtered t1 content back there.
> > > 
> > > Speaking of which, since t1 is being filtered by MMCR0_REG_MASK before being used to
> > > overwrite cpu_gpr[gprn], I'm not sure how this is exposing unfiltered content to
> > > problem state. Can you elaborate?
> > 
> > Suppose MMCR0 has the value 0x80000001 (FC and FCH) and problem state executes an mtspr with the value 0x4000000 (unset FC and set PMAE) in the GPR. The proposed code will do the following:
> > 
> >  > tcg_gen_andi_tl(t0, cpu_gpr[gprn], MMCR0_UREG_MASK);
> > 
> > t0 = GPR & MMCR0_UREG_MASK = 0x4000000 & 0x84000080 = 0x4000000
> > 
> >  > gen_load_spr(t1, SPR_POWER_MMCR0);
> > 
> > t1 = MMCR0 = 0x80000001
> > 
> >  > tcg_gen_andi_tl(t1, t1, ~(MMCR0_UREG_MASK));
> > 
> > t1 = t1 & ~MMCR0_UREG_MASK = 0x80000001 & ~0x84000080 = 0x1
> > 
> >  > tcg_gen_or_tl(t1, t1, t0);
> > 
> > t1 = t1 | t0 = 0x4000000 | 0x1 = 0x4000001
> > 
> >  > tcg_gen_mov_tl(cpu_gpr[gprn], t1);
> > 
> > GPR = 0x4000001
> > 
> > Now problem state knows that FCH is set.

Nice catch Matheus.

> I see. The problem is that overwriting the GPR is exposing bits outside
> of the MMCR0_UREG_MASK via GPR itself, something that wasn't happening
> in the previous patch because the filtering logic wasn't visible via
> userspace.

Right.  Note that even if it wasn't exposing privileged bits, I don't
think changing the GPR value would be correct behaviour, although I
suspect it would be unlikely to cause problems in practice.

> Thanks for clarifying. I'll fix it in the next version, probably by adding a
> common 'write_MMCR0' method that receives a TCGv and that can be shared
> for both privileged and user write() callbacks, like you suggested in your
> previous reply.
diff mbox series

Patch

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 8dfbb62022..a9b31736af 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1178,6 +1178,12 @@  struct CPUPPCState {
     uint32_t tm_vscr;
     uint64_t tm_dscr;
     uint64_t tm_tar;
+
+    /*
+     * PMU base time value used by the PMU to calculate
+     * running cycles.
+     */
+    uint64_t pmu_base_time;
 };
 
 #define SET_FIT_PERIOD(a_, b_, c_, d_)          \
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index bb5ea04c61..07c79745ba 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6820,8 +6820,8 @@  static void register_book3s_pmu_sup_sprs(CPUPPCState *env)
 {
     spr_register_kvm(env, SPR_POWER_MMCR0, "MMCR0",
                      SPR_NOACCESS, SPR_NOACCESS,
-                     &spr_read_generic, &spr_write_generic,
-                     KVM_REG_PPC_MMCR0, 0x00000000);
+                     &spr_read_generic, &spr_write_MMCR0,
+                     KVM_REG_PPC_MMCR0, 0x80000000);
     spr_register_kvm(env, SPR_POWER_MMCR1, "MMCR1",
                      SPR_NOACCESS, SPR_NOACCESS,
                      &spr_read_generic, &spr_write_generic,
@@ -6869,7 +6869,7 @@  static void register_book3s_pmu_user_sprs(CPUPPCState *env)
     spr_register(env, SPR_POWER_UMMCR0, "UMMCR0",
                  &spr_read_MMCR0_ureg, &spr_write_MMCR0_ureg,
                  &spr_read_ureg, &spr_write_ureg,
-                 0x00000000);
+                 0x80000000);
     spr_register(env, SPR_POWER_UMMCR1, "UMMCR1",
                  &spr_read_ureg, SPR_NOACCESS,
                  &spr_read_ureg, &spr_write_ureg,
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 4076aa281e..5122632784 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -20,6 +20,7 @@  DEF_HELPER_1(rfscv, void, env)
 DEF_HELPER_1(hrfid, void, env)
 DEF_HELPER_2(store_lpcr, void, env, tl)
 DEF_HELPER_2(store_pcr, void, env, tl)
+DEF_HELPER_2(store_mmcr0, void, env, tl)
 #endif
 DEF_HELPER_1(check_tlb_flush_local, void, env)
 DEF_HELPER_1(check_tlb_flush_global, void, env)
diff --git a/target/ppc/meson.build b/target/ppc/meson.build
index b85f295703..278ce07da9 100644
--- a/target/ppc/meson.build
+++ b/target/ppc/meson.build
@@ -14,6 +14,7 @@  ppc_ss.add(when: 'CONFIG_TCG', if_true: files(
   'int_helper.c',
   'mem_helper.c',
   'misc_helper.c',
+  'power8_pmu.c',
   'timebase_helper.c',
   'translate.c',
 ))
diff --git a/target/ppc/power8_pmu.c b/target/ppc/power8_pmu.c
new file mode 100644
index 0000000000..47de38a99e
--- /dev/null
+++ b/target/ppc/power8_pmu.c
@@ -0,0 +1,74 @@ 
+/*
+ * PMU emulation helpers for TCG IBM POWER chips
+ *
+ *  Copyright IBM Corp. 2021
+ *
+ * Authors:
+ *  Daniel Henrique Barboza      <danielhb413@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "cpu.h"
+#include "helper_regs.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+
+#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
+
+static void update_PMC_PM_CYC(CPUPPCState *env, int sprn,
+                              uint64_t time_delta)
+{
+    /*
+     * The pseries and pvn clock runs at 1Ghz, meaning that
+     * 1 nanosec equals 1 cycle.
+     */
+    env->spr[sprn] += time_delta;
+}
+
+static void update_cycles_PMCs(CPUPPCState *env)
+{
+    uint64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+    uint64_t time_delta = now - env->pmu_base_time;
+
+    update_PMC_PM_CYC(env, SPR_POWER_PMC6, time_delta);
+}
+
+void helper_store_mmcr0(CPUPPCState *env, target_ulong value)
+{
+    target_ulong curr_value = env->spr[SPR_POWER_MMCR0];
+    bool curr_FC = curr_value & MMCR0_FC;
+    bool new_FC = value & MMCR0_FC;
+
+    env->spr[SPR_POWER_MMCR0] = value;
+
+    /* MMCR0 writes can change HFLAGS_PMCCCLEAR */
+    if ((curr_value & MMCR0_PMCC) != (value & MMCR0_PMCC)) {
+        hreg_compute_hflags(env);
+    }
+
+    /*
+     * In an frozen count (FC) bit change:
+     *
+     * - if PMCs were running (curr_FC = false) and we're freezing
+     * them (new_FC = true), save the PMCs values in the registers.
+     *
+     * - if PMCs were frozen (curr_FC = true) and we're activating
+     * them (new_FC = false), set the new base_time for future cycle
+     * calculations.
+     */
+    if (curr_FC != new_FC) {
+        if (!curr_FC) {
+            update_cycles_PMCs(env);
+        } else {
+            env->pmu_base_time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+        }
+    }
+}
+
+#endif /* defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY) */
diff --git a/target/ppc/spr_tcg.h b/target/ppc/spr_tcg.h
index 094466a2b2..51fbc081de 100644
--- a/target/ppc/spr_tcg.h
+++ b/target/ppc/spr_tcg.h
@@ -25,6 +25,7 @@ 
 void spr_noaccess(DisasContext *ctx, int gprn, int sprn);
 void spr_read_generic(DisasContext *ctx, int gprn, int sprn);
 void spr_write_generic(DisasContext *ctx, int sprn, int gprn);
+void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn);
 void spr_read_xer(DisasContext *ctx, int gprn, int sprn);
 void spr_write_xer(DisasContext *ctx, int sprn, int gprn);
 void spr_read_lr(DisasContext *ctx, int gprn, int sprn);
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 0babde3131..c3e2e3d329 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -401,6 +401,24 @@  void spr_write_generic(DisasContext *ctx, int sprn, int gprn)
     spr_store_dump_spr(sprn);
 }
 
+#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
+void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
+{
+    /*
+     * helper_store_mmcr0 will make clock based operations that
+     * will cause 'bad icount read' errors if we do not execute
+     * gen_icount_io_start() beforehand.
+     */
+    gen_icount_io_start(ctx);
+    gen_helper_store_mmcr0(cpu_env, cpu_gpr[gprn]);
+}
+#else
+void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn)
+{
+    spr_write_generic(ctx, sprn, gprn);
+}
+#endif
+
 #if !defined(CONFIG_USER_ONLY)
 void spr_write_generic32(DisasContext *ctx, int sprn, int gprn)
 {
@@ -596,7 +614,10 @@  void spr_write_MMCR0_ureg(DisasContext *ctx, int sprn, int gprn)
     tcg_gen_andi_tl(t1, t1, ~(MMCR0_UREG_MASK));
     /* Keep all other bits intact */
     tcg_gen_or_tl(t1, t1, t0);
-    gen_store_spr(SPR_POWER_MMCR0, t1);
+
+    /* Overwrite cpu_gpr[gprn] and use spr_write_MMCR0() */
+    tcg_gen_mov_tl(cpu_gpr[gprn], t1);
+    spr_write_MMCR0(ctx, sprn + 0x10, gprn);
 
     tcg_temp_free(t0);
     tcg_temp_free(t1);