diff mbox series

[v2,3/6] spapr/xive: Add a warning when StoreEOI is activated on POWER9 CPUs

Message ID 20201005165147.526426-4-clg@kaod.org (mailing list archive)
State New, archived
Headers show
Series spapr/xive: Activate StoreEOI in P10 compat guests | expand

Commit Message

Cédric Le Goater Oct. 5, 2020, 4:51 p.m. UTC
StoreEOI on POWER9 CPUs is racy because load-after-store ordering is
not enforced.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr_caps.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Greg Kurz Oct. 6, 2020, 4:58 p.m. UTC | #1
On Mon, 5 Oct 2020 18:51:44 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> StoreEOI on POWER9 CPUs is racy because load-after-store ordering is
> not enforced.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/ppc/spapr_caps.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index b0a9d0227db2..9251badbdc27 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -549,6 +549,15 @@ static void cap_storeeoi_apply(SpaprMachineState *spapr, uint8_t val,
>              error_setg(errp, "StoreEOI not supported by KVM");
>              return;
>          }
> +
> +        /*
> +         * load-after-store ordering is not enforced on POWER9 CPUs
> +         * and StoreEOI can be racy.
> +         */
> +        if (!ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_10,
> +                                  0, spapr->max_compat_pvr)) {
> +            warn_report("StoreEOI on a POWER9 CPU is unsafe on KVM.");

It all boils down to what "unsafe" really means here... if the outcome is
"very likely hang the guest" as soon as it starts doing I/O, shouldn't
we error out instead ? What is the motivation to use StoreEOI if the
processor doesn't really support it ?

> +        }
>      }
>  }
>
Cédric Le Goater Oct. 6, 2020, 5:03 p.m. UTC | #2
On 10/6/20 6:58 PM, Greg Kurz wrote:
> On Mon, 5 Oct 2020 18:51:44 +0200
> Cédric Le Goater <clg@kaod.org> wrote:
> 
>> StoreEOI on POWER9 CPUs is racy because load-after-store ordering is
>> not enforced.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/ppc/spapr_caps.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index b0a9d0227db2..9251badbdc27 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -549,6 +549,15 @@ static void cap_storeeoi_apply(SpaprMachineState *spapr, uint8_t val,
>>              error_setg(errp, "StoreEOI not supported by KVM");
>>              return;
>>          }
>> +
>> +        /*
>> +         * load-after-store ordering is not enforced on POWER9 CPUs
>> +         * and StoreEOI can be racy.
>> +         */
>> +        if (!ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_10,
>> +                                  0, spapr->max_compat_pvr)) {
>> +            warn_report("StoreEOI on a POWER9 CPU is unsafe on KVM.");
> 
> It all boils down to what "unsafe" really means here... if the outcome is
> "very likely hang the guest" as soon as it starts doing I/O, shouldn't
> we error out instead ? What is the motivation to use StoreEOI if the
> processor doesn't really support it ?

We use it in the lab on P9. We have never seen it failed even under stress. 
But there is a possible race in the logic. 

C.
Greg Kurz Oct. 7, 2020, 8:56 a.m. UTC | #3
On Tue, 6 Oct 2020 19:03:28 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> On 10/6/20 6:58 PM, Greg Kurz wrote:
> > On Mon, 5 Oct 2020 18:51:44 +0200
> > Cédric Le Goater <clg@kaod.org> wrote:
> > 
> >> StoreEOI on POWER9 CPUs is racy because load-after-store ordering is
> >> not enforced.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/ppc/spapr_caps.c | 9 +++++++++
> >>  1 file changed, 9 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> >> index b0a9d0227db2..9251badbdc27 100644
> >> --- a/hw/ppc/spapr_caps.c
> >> +++ b/hw/ppc/spapr_caps.c
> >> @@ -549,6 +549,15 @@ static void cap_storeeoi_apply(SpaprMachineState *spapr, uint8_t val,
> >>              error_setg(errp, "StoreEOI not supported by KVM");
> >>              return;
> >>          }
> >> +
> >> +        /*
> >> +         * load-after-store ordering is not enforced on POWER9 CPUs
> >> +         * and StoreEOI can be racy.
> >> +         */
> >> +        if (!ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_10,
> >> +                                  0, spapr->max_compat_pvr)) {
> >> +            warn_report("StoreEOI on a POWER9 CPU is unsafe on KVM.");
> > 

The error message should mention XIVE KVM device actually since this only
depends on kernel-irqchip.

> > It all boils down to what "unsafe" really means here... if the outcome is
> > "very likely hang the guest" as soon as it starts doing I/O, shouldn't
> > we error out instead ? What is the motivation to use StoreEOI if the
> > processor doesn't really support it ?
> 
> We use it in the lab on P9. We have never seen it failed even under stress. 
> But there is a possible race in the logic. 
> 
> C.

Thinking again. P9 boston systems on the field only use emulated XIVE and
we certainly want to be able to migrate to P9 systems that use in-kernel
XIVE and vice-versa. So, even if StoreEOI is technically supported by
the emulated XIVE, maybe we should disallow it anyway ?
Cédric Le Goater Oct. 7, 2020, 9:21 a.m. UTC | #4
On 10/7/20 10:56 AM, Greg Kurz wrote:
> On Tue, 6 Oct 2020 19:03:28 +0200
> Cédric Le Goater <clg@kaod.org> wrote:
> 
>> On 10/6/20 6:58 PM, Greg Kurz wrote:
>>> On Mon, 5 Oct 2020 18:51:44 +0200
>>> Cédric Le Goater <clg@kaod.org> wrote:
>>>
>>>> StoreEOI on POWER9 CPUs is racy because load-after-store ordering is
>>>> not enforced.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  hw/ppc/spapr_caps.c | 9 +++++++++
>>>>  1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>>> index b0a9d0227db2..9251badbdc27 100644
>>>> --- a/hw/ppc/spapr_caps.c
>>>> +++ b/hw/ppc/spapr_caps.c
>>>> @@ -549,6 +549,15 @@ static void cap_storeeoi_apply(SpaprMachineState *spapr, uint8_t val,
>>>>              error_setg(errp, "StoreEOI not supported by KVM");
>>>>              return;
>>>>          }
>>>> +
>>>> +        /*
>>>> +         * load-after-store ordering is not enforced on POWER9 CPUs
>>>> +         * and StoreEOI can be racy.
>>>> +         */
>>>> +        if (!ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_10,
>>>> +                                  0, spapr->max_compat_pvr)) {
>>>> +            warn_report("StoreEOI on a POWER9 CPU is unsafe on KVM.");
>>>
> 
> The error message should mention XIVE KVM device actually since this only
> depends on kernel-irqchip.
> 
>>> It all boils down to what "unsafe" really means here... if the outcome is
>>> "very likely hang the guest" as soon as it starts doing I/O, shouldn't
>>> we error out instead ? What is the motivation to use StoreEOI if the
>>> processor doesn't really support it ?
>>
>> We use it in the lab on P9. We have never seen it failed even under stress. 
>> But there is a possible race in the logic. 
>>
>> C.
> 
> Thinking again. P9 boston systems on the field only use emulated XIVE and
> we certainly want to be able to migrate to P9 systems that use in-kernel
> XIVE and vice-versa. So, even if StoreEOI is technically supported by
> the emulated XIVE, maybe we should disallow it anyway ?

yes. But to activate it, one would need to set the capability to "on"
because the default is "off". 

Even with the "cas" capability value introduced at the end of the 
patchset, StoreEOI would not be advertised to guests running under 
P9 compat. 

C.
diff mbox series

Patch

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index b0a9d0227db2..9251badbdc27 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -549,6 +549,15 @@  static void cap_storeeoi_apply(SpaprMachineState *spapr, uint8_t val,
             error_setg(errp, "StoreEOI not supported by KVM");
             return;
         }
+
+        /*
+         * load-after-store ordering is not enforced on POWER9 CPUs
+         * and StoreEOI can be racy.
+         */
+        if (!ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_10,
+                                  0, spapr->max_compat_pvr)) {
+            warn_report("StoreEOI on a POWER9 CPU is unsafe on KVM.");
+        }
     }
 }