Message ID | 20240422131253.313869-1-dbarboza@ventanamicro.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | target/riscv/kvm: tolerate KVM disable ext errors | expand |
On Mon, Apr 22, 2024 at 10:12:53AM -0300, Daniel Henrique Barboza wrote: > Running a KVM guest using a 6.9-rc3 kernel, in a 6.8 host that has zkr > enabled, will fail with a kernel oops SIGILL right at the start. The > reason is that we can't expose zkr without implementing the SEED CSR. > Disabling zkr in the guest would be a workaround, but if the KVM doesn't > allow it we'll error out and never boot. > > In hindsight this is too strict. If we keep proceeding, despite not > disabling the extension in the KVM vcpu, we'll not add extension in ^ the > riscv,isa. The guest kernel will be unaware of the extension, i.e. it ^ the > doesn't matter if the KVM vcpu has it enabled underneath or not. So it's > ok to keep booting in this case. > > Change our current logic to not error out if we fail to disable an > extension in kvm_set_one_reg(): throw an warning instead and keep > booting. We'll keep the existing behavior when we fail to enable an > extension in KVM, since adding the extension in riscv,isa at this point > will cause a guest malfunction because the extension isn't enabled in > the vcpu. I'd probably add a sentence or two explaining why we still want to warn, which is because KVM is stating that while you may think you're disabling an extension, that extension's behavior (instructions, etc.) will still be exposed to the guest since there's no way not to expose it. The user should be aware that a guest could use it anyway, since that means the attempt to disable it can't be relied upon for migration compatibility of arbitrary guests. The guest needs to be well behaved, i.e. one that won't try to use any extensions which aren't in the ISA string. So, disabling these types of extensions either shouldn't generally be done (so a noisy warning helps prohibit that) or done for debug purposes (where a noisy warning is fine). > > Suggested-by: Andrew Jones <ajones@ventanamicro.com> > Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> > --- > target/riscv/kvm/kvm-cpu.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c > index 6a6c6cae80..261ca24504 100644 > --- a/target/riscv/kvm/kvm-cpu.c > +++ b/target/riscv/kvm/kvm-cpu.c > @@ -427,10 +427,14 @@ static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU *cpu, CPUState *cs) > reg = kvm_cpu_cfg_get(cpu, multi_ext_cfg); > ret = kvm_set_one_reg(cs, id, ®); > if (ret != 0) { > - error_report("Unable to %s extension %s in KVM, error %d", > - reg ? "enable" : "disable", > - multi_ext_cfg->name, ret); > - exit(EXIT_FAILURE); > + if (reg) { > + error_report("Unable to enable extension %s in KVM, error %d", > + multi_ext_cfg->name, ret); > + exit(EXIT_FAILURE); > + } else { Maybe we should check for a specific error. Is it EINVAL? If we do, then the message below would drop the (error %d) part and instead state explicitly that it cannot disable this extension since it's not possible. > + warn_report("KVM did not disable extension %s (error %d)", s/did not/cannot/ > + multi_ext_cfg->name, ret); > + } > } > } > } > -- > 2.44.0 > Thanks, drew
On 4/22/24 10:43, Andrew Jones wrote: > On Mon, Apr 22, 2024 at 10:12:53AM -0300, Daniel Henrique Barboza wrote: >> Running a KVM guest using a 6.9-rc3 kernel, in a 6.8 host that has zkr >> enabled, will fail with a kernel oops SIGILL right at the start. The >> reason is that we can't expose zkr without implementing the SEED CSR. >> Disabling zkr in the guest would be a workaround, but if the KVM doesn't >> allow it we'll error out and never boot. >> >> In hindsight this is too strict. If we keep proceeding, despite not >> disabling the extension in the KVM vcpu, we'll not add extension in > ^ the >> riscv,isa. The guest kernel will be unaware of the extension, i.e. it > ^ the > >> doesn't matter if the KVM vcpu has it enabled underneath or not. So it's >> ok to keep booting in this case. >> >> Change our current logic to not error out if we fail to disable an >> extension in kvm_set_one_reg(): throw an warning instead and keep >> booting. We'll keep the existing behavior when we fail to enable an >> extension in KVM, since adding the extension in riscv,isa at this point >> will cause a guest malfunction because the extension isn't enabled in >> the vcpu. > > I'd probably add a sentence or two explaining why we still want to warn, > which is because KVM is stating that while you may think you're disabling > an extension, that extension's behavior (instructions, etc.) will still > be exposed to the guest since there's no way not to expose it. The user > should be aware that a guest could use it anyway, since that means the > attempt to disable it can't be relied upon for migration compatibility of > arbitrary guests. The guest needs to be well behaved, i.e. one that won't > try to use any extensions which aren't in the ISA string. So, disabling > these types of extensions either shouldn't generally be done (so a noisy > warning helps prohibit that) or done for debug purposes (where a noisy > warning is fine). I'll add this in the next version. > >> >> Suggested-by: Andrew Jones <ajones@ventanamicro.com> >> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> >> --- >> target/riscv/kvm/kvm-cpu.c | 12 ++++++++---- >> 1 file changed, 8 insertions(+), 4 deletions(-) >> >> diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c >> index 6a6c6cae80..261ca24504 100644 >> --- a/target/riscv/kvm/kvm-cpu.c >> +++ b/target/riscv/kvm/kvm-cpu.c >> @@ -427,10 +427,14 @@ static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU *cpu, CPUState *cs) >> reg = kvm_cpu_cfg_get(cpu, multi_ext_cfg); >> ret = kvm_set_one_reg(cs, id, ®); >> if (ret != 0) { >> - error_report("Unable to %s extension %s in KVM, error %d", >> - reg ? "enable" : "disable", >> - multi_ext_cfg->name, ret); >> - exit(EXIT_FAILURE); >> + if (reg) { >> + error_report("Unable to enable extension %s in KVM, error %d", >> + multi_ext_cfg->name, ret); >> + exit(EXIT_FAILURE); >> + } else { > > Maybe we should check for a specific error. Is it EINVAL? If we do, then > the message below would drop the (error %d) part and instead state > explicitly that it cannot disable this extension since it's not possible. It's throwing a 22 (EINVAL). > >> + warn_report("KVM did not disable extension %s (error %d)", > > s/did not/cannot/ > >> + multi_ext_cfg->name, ret); >> + } I don't mind rephrasing it to "The KVM module is not able to disable extension %s" But I'm unsure what we gain from not showing the error code. Aside EINVAL the other (very unlikely) error code being thrown would be EBUSY. Should we handle EBUSY differently in this case? I don't think so. If we're already throwing a warning, with error code, the user is well informed about the guest having a flaky start and can proceed with discretion. Regardless of the extension disablement failing due to EINVAL or EBUSY. Thanks, Daniel >> } >> } >> } >> -- >> 2.44.0 >> > > Thanks, > drew
On Mon, Apr 22, 2024 at 11:08:31AM -0300, Daniel Henrique Barboza wrote: > > > On 4/22/24 10:43, Andrew Jones wrote: > > On Mon, Apr 22, 2024 at 10:12:53AM -0300, Daniel Henrique Barboza wrote: > > > Running a KVM guest using a 6.9-rc3 kernel, in a 6.8 host that has zkr > > > enabled, will fail with a kernel oops SIGILL right at the start. The > > > reason is that we can't expose zkr without implementing the SEED CSR. > > > Disabling zkr in the guest would be a workaround, but if the KVM doesn't > > > allow it we'll error out and never boot. > > > > > > In hindsight this is too strict. If we keep proceeding, despite not > > > disabling the extension in the KVM vcpu, we'll not add extension in > > ^ the > > > riscv,isa. The guest kernel will be unaware of the extension, i.e. it > > ^ the > > > > > doesn't matter if the KVM vcpu has it enabled underneath or not. So it's > > > ok to keep booting in this case. > > > > > > Change our current logic to not error out if we fail to disable an > > > extension in kvm_set_one_reg(): throw an warning instead and keep > > > booting. We'll keep the existing behavior when we fail to enable an > > > extension in KVM, since adding the extension in riscv,isa at this point > > > will cause a guest malfunction because the extension isn't enabled in > > > the vcpu. > > > > I'd probably add a sentence or two explaining why we still want to warn, > > which is because KVM is stating that while you may think you're disabling > > an extension, that extension's behavior (instructions, etc.) will still > > be exposed to the guest since there's no way not to expose it. The user > > should be aware that a guest could use it anyway, since that means the > > attempt to disable it can't be relied upon for migration compatibility of > > arbitrary guests. The guest needs to be well behaved, i.e. one that won't > > try to use any extensions which aren't in the ISA string. So, disabling > > these types of extensions either shouldn't generally be done (so a noisy > > warning helps prohibit that) or done for debug purposes (where a noisy > > warning is fine). > > I'll add this in the next version. > > > > > > > > > Suggested-by: Andrew Jones <ajones@ventanamicro.com> > > > Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> > > > --- > > > target/riscv/kvm/kvm-cpu.c | 12 ++++++++---- > > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > > > diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c > > > index 6a6c6cae80..261ca24504 100644 > > > --- a/target/riscv/kvm/kvm-cpu.c > > > +++ b/target/riscv/kvm/kvm-cpu.c > > > @@ -427,10 +427,14 @@ static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU *cpu, CPUState *cs) > > > reg = kvm_cpu_cfg_get(cpu, multi_ext_cfg); > > > ret = kvm_set_one_reg(cs, id, ®); > > > if (ret != 0) { > > > - error_report("Unable to %s extension %s in KVM, error %d", > > > - reg ? "enable" : "disable", > > > - multi_ext_cfg->name, ret); > > > - exit(EXIT_FAILURE); > > > + if (reg) { > > > + error_report("Unable to enable extension %s in KVM, error %d", > > > + multi_ext_cfg->name, ret); > > > + exit(EXIT_FAILURE); > > > + } else { > > > > Maybe we should check for a specific error. Is it EINVAL? If we do, then > > the message below would drop the (error %d) part and instead state > > explicitly that it cannot disable this extension since it's not possible. > > It's throwing a 22 (EINVAL). > > > > > > + warn_report("KVM did not disable extension %s (error %d)", > > > > s/did not/cannot/ > > > > > + multi_ext_cfg->name, ret); > > > + } > > > I don't mind rephrasing it to "The KVM module is not able to disable extension %s" > But I'm unsure what we gain from not showing the error code. > > Aside EINVAL the other (very unlikely) error code being thrown would be EBUSY. Should > we handle EBUSY differently in this case? I don't think so. If we're already throwing a > warning, with error code, the user is well informed about the guest having a flaky > start and can proceed with discretion. Regardless of the extension disablement failing > due to EINVAL or EBUSY. But EBUSY means the VMM messed up and tried to do something it shouldn't, which is to change the configuration of the VCPU after it already ran. EINVAL isn't a VMM mess up. It wanted to disable the extension and will remove it from the ISA string, KVM just can't guarantee that the guest won't be able to use it anyway (making that a warn case). So, EBUSY isn't a warn case (but it's also not something we'd expect to see in a function called from kvm_arch_init_vcpu()). However, KVM is free to add new errors, so I'd only warn for the ones which translate to warn cases (EINVAL). Thanks, drew
diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c index 6a6c6cae80..261ca24504 100644 --- a/target/riscv/kvm/kvm-cpu.c +++ b/target/riscv/kvm/kvm-cpu.c @@ -427,10 +427,14 @@ static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU *cpu, CPUState *cs) reg = kvm_cpu_cfg_get(cpu, multi_ext_cfg); ret = kvm_set_one_reg(cs, id, ®); if (ret != 0) { - error_report("Unable to %s extension %s in KVM, error %d", - reg ? "enable" : "disable", - multi_ext_cfg->name, ret); - exit(EXIT_FAILURE); + if (reg) { + error_report("Unable to enable extension %s in KVM, error %d", + multi_ext_cfg->name, ret); + exit(EXIT_FAILURE); + } else { + warn_report("KVM did not disable extension %s (error %d)", + multi_ext_cfg->name, ret); + } } } }
Running a KVM guest using a 6.9-rc3 kernel, in a 6.8 host that has zkr enabled, will fail with a kernel oops SIGILL right at the start. The reason is that we can't expose zkr without implementing the SEED CSR. Disabling zkr in the guest would be a workaround, but if the KVM doesn't allow it we'll error out and never boot. In hindsight this is too strict. If we keep proceeding, despite not disabling the extension in the KVM vcpu, we'll not add extension in riscv,isa. The guest kernel will be unaware of the extension, i.e. it doesn't matter if the KVM vcpu has it enabled underneath or not. So it's ok to keep booting in this case. Change our current logic to not error out if we fail to disable an extension in kvm_set_one_reg(): throw an warning instead and keep booting. We'll keep the existing behavior when we fail to enable an extension in KVM, since adding the extension in riscv,isa at this point will cause a guest malfunction because the extension isn't enabled in the vcpu. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- target/riscv/kvm/kvm-cpu.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)