diff mbox series

[v9,11/12] target/arm: add an experimental mpidr arm cpu property object

Message ID b88fe895e6f71711387ca153f4f1b3fbb0aa2176.1724556967.git.mchehab+huawei@kernel.org (mailing list archive)
State New
Headers show
Series Add ACPI CPER firmware first error injection on ARM emulation | expand

Commit Message

Mauro Carvalho Chehab Aug. 25, 2024, 3:46 a.m. UTC
Accurately injecting an ARM Processor error ACPI/APEI GHES
error record requires the value of the ARM Multiprocessor
Affinity Register (mpidr).

While ARM implements it, this is currently not visible.

Add a field at CPU storing it, and place it at arm_cpu_properties
as experimental, thus allowing it to be queried via QMP using
qom-get function.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 target/arm/cpu.c    |  1 +
 target/arm/cpu.h    |  1 +
 target/arm/helper.c | 10 ++++++++--
 3 files changed, 10 insertions(+), 2 deletions(-)

Comments

Peter Maydell Aug. 25, 2024, 11:34 a.m. UTC | #1
On Sun, 25 Aug 2024 at 04:46, Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Accurately injecting an ARM Processor error ACPI/APEI GHES
> error record requires the value of the ARM Multiprocessor
> Affinity Register (mpidr).
>
> While ARM implements it, this is currently not visible.
>
> Add a field at CPU storing it, and place it at arm_cpu_properties
> as experimental, thus allowing it to be queried via QMP using
> qom-get function.

>  static Property arm_cpu_properties[] = {
>      DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
> +    DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
>      DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
>                          mp_affinity, ARM64_AFFINITY_INVALID),
>      DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),

Why do we need this? Why is it experimental? The later patch
seems to use it via QMP, which I'm not super enthusiastic
about -- the preexisting mpidr and mp-affinity properties are
there for code that is creating CPU objects to configure
the CPU object, not as a query interface for QOM.

thanks
-- PMM
Mauro Carvalho Chehab Aug. 26, 2024, 1:53 a.m. UTC | #2
Em Sun, 25 Aug 2024 12:34:14 +0100
Peter Maydell <peter.maydell@linaro.org> escreveu:

> On Sun, 25 Aug 2024 at 04:46, Mauro Carvalho Chehab
> <mchehab+huawei@kernel.org> wrote:
> >
> > Accurately injecting an ARM Processor error ACPI/APEI GHES
> > error record requires the value of the ARM Multiprocessor
> > Affinity Register (mpidr).
> >
> > While ARM implements it, this is currently not visible.
> >
> > Add a field at CPU storing it, and place it at arm_cpu_properties
> > as experimental, thus allowing it to be queried via QMP using
> > qom-get function.  
> 
> >  static Property arm_cpu_properties[] = {
> >      DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
> > +    DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
> >      DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
> >                          mp_affinity, ARM64_AFFINITY_INVALID),
> >      DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),  
> 
> Why do we need this? 

The ACPI HEST tables, in particular when using GHESv2 provide
several kinds of errors. Among them, we have ARM Processor Error,
as defined at UEFI 2.10 spec (and earlier versions), the Common
Platform Error Record (CPER) is defined as:

   https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html?highlight=ghes#arm-processor-error-section

There are two fields that are part of the CPER record. One of them is
mandatory (MIDR); the other one is optional, but needed to decode another
field.

So, basically those errors need them.

> Why is it experimental? 

This was a suggestion from Igor. As for now the QAPI for external
error injection is experimental, It makes sense to me to keep it
experimental as well.

> The later patch
> seems to use it via QMP, which I'm not super enthusiastic
> about -- the preexisting mpidr and mp-affinity properties are
> there for code that is creating CPU objects to configure
> the CPU object, not as a query interface for QOM.

I saw that. Basically the decoding by OS guest depends on MPIDR,
as explained at the description of Error affinity level field:

	"For errors that can be attributed to a specific affinity level, 
	this field defines the affinity level at which the error was 
	produced, detected, and/or consumed. This is a value between 0
	and 3. All other values (4-255) are reserved

	For example, a vendor may choose to define affinity levels as
	follows:
	Level 0: errors that can be precisely attributed to a specific CPU
	(e.g. due to a synchronous external abort)
	Level 1: Cache parity and/or ECC errors detected at cache of affinity
	level 1 (e.g. only attributed to higher level cache due to 
	prefetching and/or error propagation)

	NOTE: Detailed meanings and groupings of affinity level are chip 
	and/or platform specific. The affinity level described here must 
	be consistent with the platform definitions used MPIDR. For
	cache/TLB errors, the cache/TLB level is provided by the cache/TLB
	error structure, which may differ from affinity level."

Regards,
Mauro
Peter Maydell Aug. 30, 2024, 4:27 p.m. UTC | #3
On Mon, 26 Aug 2024 at 04:12, Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Sun, 25 Aug 2024 12:34:14 +0100
> Peter Maydell <peter.maydell@linaro.org> escreveu:
>
> > On Sun, 25 Aug 2024 at 04:46, Mauro Carvalho Chehab
> > <mchehab+huawei@kernel.org> wrote:
> > >
> > > Accurately injecting an ARM Processor error ACPI/APEI GHES
> > > error record requires the value of the ARM Multiprocessor
> > > Affinity Register (mpidr).
> > >
> > > While ARM implements it, this is currently not visible.
> > >
> > > Add a field at CPU storing it, and place it at arm_cpu_properties
> > > as experimental, thus allowing it to be queried via QMP using
> > > qom-get function.
> >
> > >  static Property arm_cpu_properties[] = {
> > >      DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
> > > +    DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
> > >      DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
> > >                          mp_affinity, ARM64_AFFINITY_INVALID),
> > >      DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
> >
> > Why do we need this?
>
> The ACPI HEST tables, in particular when using GHESv2 provide
> several kinds of errors. Among them, we have ARM Processor Error,
> as defined at UEFI 2.10 spec (and earlier versions), the Common
> Platform Error Record (CPER) is defined as:
>
>    https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html?highlight=ghes#arm-processor-error-section
>
> There are two fields that are part of the CPER record. One of them is
> mandatory (MIDR); the other one is optional, but needed to decode another
> field.
>
> So, basically those errors need them.

OK, but why do scripts outside of QEMU need the information,
as opposed to telling QEMU "hey, generate an error" and
QEMU knowing the format to use? Do we have any other
QMP APIs where something external provides raw ACPI
data like this?

-- PMM
Mauro Carvalho Chehab Sept. 1, 2024, 6:40 a.m. UTC | #4
Em Fri, 30 Aug 2024 17:27:27 +0100
Peter Maydell <peter.maydell@linaro.org> escreveu:

> On Mon, 26 Aug 2024 at 04:12, Mauro Carvalho Chehab
> <mchehab+huawei@kernel.org> wrote:
> >
> > Em Sun, 25 Aug 2024 12:34:14 +0100
> > Peter Maydell <peter.maydell@linaro.org> escreveu:
> >  
> > > On Sun, 25 Aug 2024 at 04:46, Mauro Carvalho Chehab
> > > <mchehab+huawei@kernel.org> wrote:  
> > > >
> > > > Accurately injecting an ARM Processor error ACPI/APEI GHES
> > > > error record requires the value of the ARM Multiprocessor
> > > > Affinity Register (mpidr).
> > > >
> > > > While ARM implements it, this is currently not visible.
> > > >
> > > > Add a field at CPU storing it, and place it at arm_cpu_properties
> > > > as experimental, thus allowing it to be queried via QMP using
> > > > qom-get function.  
> > >  
> > > >  static Property arm_cpu_properties[] = {
> > > >      DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
> > > > +    DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
> > > >      DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
> > > >                          mp_affinity, ARM64_AFFINITY_INVALID),
> > > >      DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),  
> > >
> > > Why do we need this?  
> >
> > The ACPI HEST tables, in particular when using GHESv2 provide
> > several kinds of errors. Among them, we have ARM Processor Error,
> > as defined at UEFI 2.10 spec (and earlier versions), the Common
> > Platform Error Record (CPER) is defined as:
> >
> >    https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html?highlight=ghes#arm-processor-error-section
> >
> > There are two fields that are part of the CPER record. One of them is
> > mandatory (MIDR); the other one is optional, but needed to decode another
> > field.
> >
> > So, basically those errors need them.  
> 
> OK, but why do scripts outside of QEMU need the information,
> as opposed to telling QEMU "hey, generate an error" and
> QEMU knowing the format to use? Do we have any other
> QMP APIs where something external provides raw ACPI
> data like this?

This was discussed during the review of this patch series. 

See, the ACPI Platform Error Interfaces (APEI) code currently in QEMU
implements limited support for ACPI HEST - Hardware Error Source Table [1].

[1] https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#acpi-error-source

HEST consists of, currently, 9 error types (plus 3 obsoleted ones). Among 
them, there is support for generic errors via GHES and GHESv2 types. 
While not officially obsoleted, GHES is superseded by GHESv2.

GHESv2 (and GHES) has a section type field to identify which error type it
is [2]. Currently, there are +10 defined UUIDs for the section type. 

[2] https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#section-descriptor

The current code on ghes.c implements GHESv2 support for a single
type (memory error), received from the host OS via SIGBUS.

Testing such code and injecting such error is not easy, as the host OS needs
to send a SIGBUS to the guest, this reflecting an error at the main OS.
Such code also has several limitations.

-

At the first three versions of this patch set, the code was just doing
like what you said: it was adding an error injection for a HEST GHESv2 
ARM Processor Error. So the error record (CPER) were produced in QEMU using 
some optional parameters passed via QMP to change fields when needed. 
With such approach, QEMU could use directly the value from MIDR and MPIDR.

The main disadvantage is that, to make full support of HEST, a lot
of code will be needed to add support for every GHESv2 type and for
every GHESv2 section type. So, the feedback we had were to re-implement
it into a generic way.

The generic CPER error inject approach (since v4 of this series), has
soma advantages:

- it is easy to do fuzz testing, as the entire CPER is built via a python
  script;
- no need to modify QEMU to support other GHESv2 types of record and
  to support other types of processors;
- GHESv2 fields can also be dynamically generated;
- It shouldn't be hard to change the code to support other types of
  HEST table (currently, only GHESv2 is supported).

The disadvantage is that queries are needed to pick configuration and
register values from the current emulation to do error injection. For
ARM Processor Error, it means that MPIDR and MIDR, are needed. Other 
processors and other error types will also require to query other data
from QEMU, either using already-existing QMP code or by adding new ones.

Yet, the amount of code for such queries seem to be smaller than the
amount of code to be added for every single GHESv2/HEST type.

-

Worth saying that QEMU may still require internal HEST/GHES errors to be 
able to reflect at the guests hardware problems detected at the host OS. 

So, for instance, if a host OS memory is poisoned due to hardware errors,
QEMU and guests need to know, in order to kill processes affected
by a bad memory. 

Regards,
Mauro
diff mbox series

Patch

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 19191c239181..30fcf0a10f46 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2619,6 +2619,7 @@  static ObjectClass *arm_cpu_class_by_name(const char *cpu_model)
 
 static Property arm_cpu_properties[] = {
     DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
+    DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
     DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
                         mp_affinity, ARM64_AFFINITY_INVALID),
     DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9a3fd595621f..3ad4de793409 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1033,6 +1033,7 @@  struct ArchCPU {
         uint64_t reset_pmcr_el0;
     } isar;
     uint64_t midr;
+    uint64_t mpidr;
     uint32_t revidr;
     uint32_t reset_fpsid;
     uint64_t ctr;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0a582c1cd3b3..d6e7aa069489 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4690,7 +4690,7 @@  static uint64_t mpidr_read_val(CPUARMState *env)
     return mpidr;
 }
 
-static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
+static uint64_t mpidr_read(CPUARMState *env)
 {
     unsigned int cur_el = arm_current_el(env);
 
@@ -4700,6 +4700,11 @@  static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
     return mpidr_read_val(env);
 }
 
+static uint64_t mpidr_read_ri(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    return mpidr_read(env);
+}
+
 static const ARMCPRegInfo lpae_cp_reginfo[] = {
     /* NOP AMAIR0/1 */
     { .name = "AMAIR0", .state = ARM_CP_STATE_BOTH,
@@ -9721,7 +9726,7 @@  void register_cp_regs_for_features(ARMCPU *cpu)
             { .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5,
               .fgt = FGT_MPIDR_EL1,
-              .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW },
+              .access = PL1_R, .readfn = mpidr_read_ri, .type = ARM_CP_NO_RAW },
         };
 #ifdef CONFIG_USER_ONLY
         static const ARMCPRegUserSpaceInfo mpidr_user_cp_reginfo[] = {
@@ -9731,6 +9736,7 @@  void register_cp_regs_for_features(ARMCPU *cpu)
         modify_arm_cp_regs(mpidr_cp_reginfo, mpidr_user_cp_reginfo);
 #endif
         define_arm_cp_regs(cpu, mpidr_cp_reginfo);
+        cpu->mpidr = mpidr_read(env);
     }
 
     if (arm_feature(env, ARM_FEATURE_AUXCR)) {