diff mbox series

target/riscv/kvm: tolerate KVM disable ext errors

Message ID 20240422131253.313869-1-dbarboza@ventanamicro.com (mailing list archive)
State New, archived
Headers show
Series target/riscv/kvm: tolerate KVM disable ext errors | expand

Commit Message

Daniel Henrique Barboza April 22, 2024, 1:12 p.m. UTC
Running a KVM guest using a 6.9-rc3 kernel, in a 6.8 host that has zkr
enabled, will fail with a kernel oops SIGILL right at the start. The
reason is that we can't expose zkr without implementing the SEED CSR.
Disabling zkr in the guest would be a workaround, but if the KVM doesn't
allow it we'll error out and never boot.

In hindsight this is too strict. If we keep proceeding, despite not
disabling the extension in the KVM vcpu, we'll not add extension in
riscv,isa. The guest kernel will be unaware of the extension, i.e. it
doesn't matter if the KVM vcpu has it enabled underneath or not. So it's
ok to keep booting in this case.

Change our current logic to not error out if we fail to disable an
extension in kvm_set_one_reg(): throw an warning instead and keep
booting.  We'll keep the existing behavior when we fail to enable an
extension in KVM, since adding the extension in riscv,isa at this point
will cause a guest malfunction because the extension isn't enabled in
the vcpu.

Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 target/riscv/kvm/kvm-cpu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Andrew Jones April 22, 2024, 1:43 p.m. UTC | #1
On Mon, Apr 22, 2024 at 10:12:53AM -0300, Daniel Henrique Barboza wrote:
> Running a KVM guest using a 6.9-rc3 kernel, in a 6.8 host that has zkr
> enabled, will fail with a kernel oops SIGILL right at the start. The
> reason is that we can't expose zkr without implementing the SEED CSR.
> Disabling zkr in the guest would be a workaround, but if the KVM doesn't
> allow it we'll error out and never boot.
> 
> In hindsight this is too strict. If we keep proceeding, despite not
> disabling the extension in the KVM vcpu, we'll not add extension in
                                                        ^ the
> riscv,isa. The guest kernel will be unaware of the extension, i.e. it
 ^ the

> doesn't matter if the KVM vcpu has it enabled underneath or not. So it's
> ok to keep booting in this case.
> 
> Change our current logic to not error out if we fail to disable an
> extension in kvm_set_one_reg(): throw an warning instead and keep
> booting.  We'll keep the existing behavior when we fail to enable an
> extension in KVM, since adding the extension in riscv,isa at this point
> will cause a guest malfunction because the extension isn't enabled in
> the vcpu.

I'd probably add a sentence or two explaining why we still want to warn,
which is because KVM is stating that while you may think you're disabling
an extension, that extension's behavior (instructions, etc.) will still
be exposed to the guest since there's no way not to expose it.  The user
should be aware that a guest could use it anyway, since that means the
attempt to disable it can't be relied upon for migration compatibility of
arbitrary guests. The guest needs to be well behaved, i.e. one that won't
try to use any extensions which aren't in the ISA string. So, disabling
these types of extensions either shouldn't generally be done (so a noisy
warning helps prohibit that) or done for debug purposes (where a noisy
warning is fine).

> 
> Suggested-by: Andrew Jones <ajones@ventanamicro.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  target/riscv/kvm/kvm-cpu.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
> index 6a6c6cae80..261ca24504 100644
> --- a/target/riscv/kvm/kvm-cpu.c
> +++ b/target/riscv/kvm/kvm-cpu.c
> @@ -427,10 +427,14 @@ static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU *cpu, CPUState *cs)
>          reg = kvm_cpu_cfg_get(cpu, multi_ext_cfg);
>          ret = kvm_set_one_reg(cs, id, &reg);
>          if (ret != 0) {
> -            error_report("Unable to %s extension %s in KVM, error %d",
> -                         reg ? "enable" : "disable",
> -                         multi_ext_cfg->name, ret);
> -            exit(EXIT_FAILURE);
> +            if (reg) {
> +                error_report("Unable to enable extension %s in KVM, error %d",
> +                             multi_ext_cfg->name, ret);
> +                exit(EXIT_FAILURE);
> +            } else {

Maybe we should check for a specific error. Is it EINVAL? If we do, then
the message below would drop the (error %d) part and instead state
explicitly that it cannot disable this extension since it's not possible.

> +                warn_report("KVM did not disable extension %s (error %d)",

s/did not/cannot/

> +                            multi_ext_cfg->name, ret);
> +            }
>          }
>      }
>  }
> -- 
> 2.44.0
>

Thanks,
drew
Daniel Henrique Barboza April 22, 2024, 2:08 p.m. UTC | #2
On 4/22/24 10:43, Andrew Jones wrote:
> On Mon, Apr 22, 2024 at 10:12:53AM -0300, Daniel Henrique Barboza wrote:
>> Running a KVM guest using a 6.9-rc3 kernel, in a 6.8 host that has zkr
>> enabled, will fail with a kernel oops SIGILL right at the start. The
>> reason is that we can't expose zkr without implementing the SEED CSR.
>> Disabling zkr in the guest would be a workaround, but if the KVM doesn't
>> allow it we'll error out and never boot.
>>
>> In hindsight this is too strict. If we keep proceeding, despite not
>> disabling the extension in the KVM vcpu, we'll not add extension in
>                                                          ^ the
>> riscv,isa. The guest kernel will be unaware of the extension, i.e. it
>   ^ the
> 
>> doesn't matter if the KVM vcpu has it enabled underneath or not. So it's
>> ok to keep booting in this case.
>>
>> Change our current logic to not error out if we fail to disable an
>> extension in kvm_set_one_reg(): throw an warning instead and keep
>> booting.  We'll keep the existing behavior when we fail to enable an
>> extension in KVM, since adding the extension in riscv,isa at this point
>> will cause a guest malfunction because the extension isn't enabled in
>> the vcpu.
> 
> I'd probably add a sentence or two explaining why we still want to warn,
> which is because KVM is stating that while you may think you're disabling
> an extension, that extension's behavior (instructions, etc.) will still
> be exposed to the guest since there's no way not to expose it.  The user
> should be aware that a guest could use it anyway, since that means the
> attempt to disable it can't be relied upon for migration compatibility of
> arbitrary guests. The guest needs to be well behaved, i.e. one that won't
> try to use any extensions which aren't in the ISA string. So, disabling
> these types of extensions either shouldn't generally be done (so a noisy
> warning helps prohibit that) or done for debug purposes (where a noisy
> warning is fine).

I'll add this in the next version.

> 
>>
>> Suggested-by: Andrew Jones <ajones@ventanamicro.com>
>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>> ---
>>   target/riscv/kvm/kvm-cpu.c | 12 ++++++++----
>>   1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
>> index 6a6c6cae80..261ca24504 100644
>> --- a/target/riscv/kvm/kvm-cpu.c
>> +++ b/target/riscv/kvm/kvm-cpu.c
>> @@ -427,10 +427,14 @@ static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU *cpu, CPUState *cs)
>>           reg = kvm_cpu_cfg_get(cpu, multi_ext_cfg);
>>           ret = kvm_set_one_reg(cs, id, &reg);
>>           if (ret != 0) {
>> -            error_report("Unable to %s extension %s in KVM, error %d",
>> -                         reg ? "enable" : "disable",
>> -                         multi_ext_cfg->name, ret);
>> -            exit(EXIT_FAILURE);
>> +            if (reg) {
>> +                error_report("Unable to enable extension %s in KVM, error %d",
>> +                             multi_ext_cfg->name, ret);
>> +                exit(EXIT_FAILURE);
>> +            } else {
> 
> Maybe we should check for a specific error. Is it EINVAL? If we do, then
> the message below would drop the (error %d) part and instead state
> explicitly that it cannot disable this extension since it's not possible.

It's throwing a 22 (EINVAL).

> 
>> +                warn_report("KVM did not disable extension %s (error %d)",
> 
> s/did not/cannot/
> 
>> +                            multi_ext_cfg->name, ret);
>> +            }


I don't mind rephrasing it to "The KVM module is not able to disable extension %s"
But I'm unsure what we gain from not showing the error code.

Aside EINVAL the other (very unlikely) error code being thrown would be EBUSY. Should
we handle EBUSY differently in this case? I don't think so. If we're already throwing a
warning, with error code, the user is well informed about the guest having a flaky
start and can proceed with discretion. Regardless of the extension disablement failing
due to EINVAL or EBUSY.


Thanks,

Daniel

>>           }
>>       }
>>   }
>> -- 
>> 2.44.0
>>
> 
> Thanks,
> drew
Andrew Jones April 22, 2024, 2:28 p.m. UTC | #3
On Mon, Apr 22, 2024 at 11:08:31AM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 4/22/24 10:43, Andrew Jones wrote:
> > On Mon, Apr 22, 2024 at 10:12:53AM -0300, Daniel Henrique Barboza wrote:
> > > Running a KVM guest using a 6.9-rc3 kernel, in a 6.8 host that has zkr
> > > enabled, will fail with a kernel oops SIGILL right at the start. The
> > > reason is that we can't expose zkr without implementing the SEED CSR.
> > > Disabling zkr in the guest would be a workaround, but if the KVM doesn't
> > > allow it we'll error out and never boot.
> > > 
> > > In hindsight this is too strict. If we keep proceeding, despite not
> > > disabling the extension in the KVM vcpu, we'll not add extension in
> >                                                          ^ the
> > > riscv,isa. The guest kernel will be unaware of the extension, i.e. it
> >   ^ the
> > 
> > > doesn't matter if the KVM vcpu has it enabled underneath or not. So it's
> > > ok to keep booting in this case.
> > > 
> > > Change our current logic to not error out if we fail to disable an
> > > extension in kvm_set_one_reg(): throw an warning instead and keep
> > > booting.  We'll keep the existing behavior when we fail to enable an
> > > extension in KVM, since adding the extension in riscv,isa at this point
> > > will cause a guest malfunction because the extension isn't enabled in
> > > the vcpu.
> > 
> > I'd probably add a sentence or two explaining why we still want to warn,
> > which is because KVM is stating that while you may think you're disabling
> > an extension, that extension's behavior (instructions, etc.) will still
> > be exposed to the guest since there's no way not to expose it.  The user
> > should be aware that a guest could use it anyway, since that means the
> > attempt to disable it can't be relied upon for migration compatibility of
> > arbitrary guests. The guest needs to be well behaved, i.e. one that won't
> > try to use any extensions which aren't in the ISA string. So, disabling
> > these types of extensions either shouldn't generally be done (so a noisy
> > warning helps prohibit that) or done for debug purposes (where a noisy
> > warning is fine).
> 
> I'll add this in the next version.
> 
> > 
> > > 
> > > Suggested-by: Andrew Jones <ajones@ventanamicro.com>
> > > Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> > > ---
> > >   target/riscv/kvm/kvm-cpu.c | 12 ++++++++----
> > >   1 file changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
> > > index 6a6c6cae80..261ca24504 100644
> > > --- a/target/riscv/kvm/kvm-cpu.c
> > > +++ b/target/riscv/kvm/kvm-cpu.c
> > > @@ -427,10 +427,14 @@ static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU *cpu, CPUState *cs)
> > >           reg = kvm_cpu_cfg_get(cpu, multi_ext_cfg);
> > >           ret = kvm_set_one_reg(cs, id, &reg);
> > >           if (ret != 0) {
> > > -            error_report("Unable to %s extension %s in KVM, error %d",
> > > -                         reg ? "enable" : "disable",
> > > -                         multi_ext_cfg->name, ret);
> > > -            exit(EXIT_FAILURE);
> > > +            if (reg) {
> > > +                error_report("Unable to enable extension %s in KVM, error %d",
> > > +                             multi_ext_cfg->name, ret);
> > > +                exit(EXIT_FAILURE);
> > > +            } else {
> > 
> > Maybe we should check for a specific error. Is it EINVAL? If we do, then
> > the message below would drop the (error %d) part and instead state
> > explicitly that it cannot disable this extension since it's not possible.
> 
> It's throwing a 22 (EINVAL).
> 
> > 
> > > +                warn_report("KVM did not disable extension %s (error %d)",
> > 
> > s/did not/cannot/
> > 
> > > +                            multi_ext_cfg->name, ret);
> > > +            }
> 
> 
> I don't mind rephrasing it to "The KVM module is not able to disable extension %s"
> But I'm unsure what we gain from not showing the error code.
> 
> Aside EINVAL the other (very unlikely) error code being thrown would be EBUSY. Should
> we handle EBUSY differently in this case? I don't think so. If we're already throwing a
> warning, with error code, the user is well informed about the guest having a flaky
> start and can proceed with discretion. Regardless of the extension disablement failing
> due to EINVAL or EBUSY.

But EBUSY means the VMM messed up and tried to do something it shouldn't,
which is to change the configuration of the VCPU after it already ran.
EINVAL isn't a VMM mess up. It wanted to disable the extension and will
remove it from the ISA string, KVM just can't guarantee that the guest
won't be able to use it anyway (making that a warn case). So, EBUSY isn't
a warn case (but it's also not something we'd expect to see in a function
called from kvm_arch_init_vcpu()). However, KVM is free to add new errors,
so I'd only warn for the ones which translate to warn cases (EINVAL).

Thanks,
drew
diff mbox series

Patch

diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index 6a6c6cae80..261ca24504 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -427,10 +427,14 @@  static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU *cpu, CPUState *cs)
         reg = kvm_cpu_cfg_get(cpu, multi_ext_cfg);
         ret = kvm_set_one_reg(cs, id, &reg);
         if (ret != 0) {
-            error_report("Unable to %s extension %s in KVM, error %d",
-                         reg ? "enable" : "disable",
-                         multi_ext_cfg->name, ret);
-            exit(EXIT_FAILURE);
+            if (reg) {
+                error_report("Unable to enable extension %s in KVM, error %d",
+                             multi_ext_cfg->name, ret);
+                exit(EXIT_FAILURE);
+            } else {
+                warn_report("KVM did not disable extension %s (error %d)",
+                            multi_ext_cfg->name, ret);
+            }
         }
     }
 }