[0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

Message ID 20201007014417.29276-1-sean.j.christopherson@intel.com (mailing list archive)

Message

Sean Christopherson Oct. 7, 2020, 1:44 a.m. UTC
Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.

The overarching issue is that kvm_x86_ops.set_cr4() can fail, but its
invocation from __set_sregs(), a.k.a. KVM_SET_SREGS, ignores the result.
Fix the issue by moving all validity checks out of .set_cr4() in one way
or another.

I intentionally omitted a Cc to stable.  The first bug fix in particular
may break stable trees as it simply removes a check, and I don't know that
stable trees have the generic CR4 reserved bit check that is needed to
prevent the guest from setting VMXE when nVMX is not allowed.

Sean Christopherson (6):
  KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()
  KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4()
  KVM: SVM: Drop VMXE check from svm_set_cr4()
  KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook
  KVM: x86: Return bool instead of int for CR4 and SREGS validity checks
  KVM: selftests: Verify supported CR4 bits can be set before
    KVM_SET_CPUID2

 arch/x86/include/asm/kvm_host.h               |  3 +-
 arch/x86/kvm/svm/nested.c                     |  2 +-
 arch/x86/kvm/svm/svm.c                        | 12 ++-
 arch/x86/kvm/svm/svm.h                        |  2 +-
 arch/x86/kvm/vmx/nested.c                     |  2 +-
 arch/x86/kvm/vmx/vmx.c                        | 35 +++----
 arch/x86/kvm/vmx/vmx.h                        |  2 +-
 arch/x86/kvm/x86.c                            | 28 +++---
 arch/x86/kvm/x86.h                            |  2 +-
 .../selftests/kvm/include/x86_64/processor.h  | 17 ++++
 .../selftests/kvm/include/x86_64/vmx.h        |  4 -
 .../selftests/kvm/x86_64/set_sregs_test.c     | 92 ++++++++++++++++++-
 12 files changed, 153 insertions(+), 48 deletions(-)
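
As a rough illustration of the pattern the cover letter describes, here is a toy C model, not kernel code. The names `toy_vcpu`, `toy_ops`, `toy_set_sregs_cr4` and the two validity hooks are hypothetical; only the position of CR4.VMXE (bit 13) is architectural. The sketch assumes the fix has the shape the patch titles suggest: a bool validity hook that is consulted first, and a setter that can no longer fail.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_VMXE (1ULL << 13)	/* architectural: CR4.VMXE is bit 13 */

struct toy_vcpu {
	uint64_t cr4;	/* the tracked guest CR4 value */
	bool nested;	/* stand-in for the 'nested' module param */
};

/* Hypothetical vendor hooks: validity is checked separately from the setter. */
struct toy_ops {
	bool (*is_valid_cr4)(const struct toy_vcpu *vcpu, uint64_t cr4);
	void (*set_cr4)(struct toy_vcpu *vcpu, uint64_t cr4);	/* cannot fail */
};

static void toy_set_cr4(struct toy_vcpu *vcpu, uint64_t cr4)
{
	vcpu->cr4 = cr4;
}

/* VMX-like: VMXE is acceptable only when nested virtualization is allowed. */
static bool toy_vmx_is_valid_cr4(const struct toy_vcpu *vcpu, uint64_t cr4)
{
	return !(cr4 & X86_CR4_VMXE) || vcpu->nested;
}

/* SVM-like: VMXE is never acceptable. */
static bool toy_svm_is_valid_cr4(const struct toy_vcpu *vcpu, uint64_t cr4)
{
	(void)vcpu;
	return !(cr4 & X86_CR4_VMXE);
}

static const struct toy_ops toy_vmx_ops = { toy_vmx_is_valid_cr4, toy_set_cr4 };
static const struct toy_ops toy_svm_ops = { toy_svm_is_valid_cr4, toy_set_cr4 };

/* Reject bad values up front so there is no setter result left to ignore. */
static int toy_set_sregs_cr4(const struct toy_ops *ops, struct toy_vcpu *vcpu,
			     uint64_t cr4)
{
	if (!ops->is_valid_cr4(vcpu, cr4))
		return -EINVAL;
	ops->set_cr4(vcpu, cr4);
	return 0;
}
```

In this model a rejected value leaves `vcpu->cr4` untouched, so the tracked value can never diverge from what was actually accepted.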

Comments

stsp Oct. 8, 2020, 4 p.m. UTC | #1
07.10.2020 04:44, Sean Christopherson wrote:
> Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.
Hi Sean & KVM devs.

I tested the patches, and wherever I
set VMXE in CR4, I now get
KVM: KVM_SET_SREGS: Invalid argument
Before the patch I was able (with many
problems, but still) to set VMXE sometimes.

So it's a NAK so far, waiting for an update. :)
Sean Christopherson Oct. 8, 2020, 5:59 p.m. UTC | #2
On Thu, Oct 08, 2020 at 07:00:13PM +0300, stsp wrote:
> 07.10.2020 04:44, Sean Christopherson wrote:
> >Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.
> Hi Sean & KVM devs.
> 
> I tested the patches, and wherever I
> set VMXE in CR4, I now get
> KVM: KVM_SET_SREGS: Invalid argument
> Before the patch I was able (with many
> problems, but still) to set VMXE sometimes.
> 
> So it's a NAK so far, waiting for an update. :)

IIRC, you said you were going to test on AMD?  Assuming that's correct, -EINVAL
is the expected behavior.  KVM was essentially lying before; it never actually
set CR4.VMXE in hardware, it just didn't properly detect the error and so VMXE
was set in KVM's shadow of the guest's CR4.
stsp Oct. 8, 2020, 6:18 p.m. UTC | #3
08.10.2020 20:59, Sean Christopherson wrote:
> On Thu, Oct 08, 2020 at 07:00:13PM +0300, stsp wrote:
>> 07.10.2020 04:44, Sean Christopherson wrote:
>>> Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.
>> Hi Sean & KVM devs.
>>
>> I tested the patches, and wherever I
>> set VMXE in CR4, I now get
>> KVM: KVM_SET_SREGS: Invalid argument
>> Before the patch I was able (with many
>> problems, but still) to set VMXE sometimes.
>>
>> So it's a NAK so far, waiting for an update. :)
> IIRC, you said you were going to test on AMD?  Assuming that's correct,

Yes, that is true.


>   -EINVAL
> is the expected behavior.  KVM was essentially lying before; it never actually
> set CR4.VMXE in hardware, it just didn't properly detect the error and so VMXE
> was set in KVM's shadow of the guest's CR4.

Hmm. But at least it was lying
similarly on AMD and Intel CPUs. :)
So I was able to reproduce the problems
myself.
Do you mean, any AMD tests are now
useless, and we need to proceed with
Intel tests only?

Then an additional question.
On old Intel CPUs we needed to set
VMXE in the guest to make it work in
nested-guest mode.
Is it still needed even with your patches?
Or will the nested-guest mode now work
even on older Intel CPUs, with KVM
setting VMXE for us itself when needed?
Sean Christopherson Oct. 9, 2020, 4:04 a.m. UTC | #4
On Thu, Oct 08, 2020 at 09:18:18PM +0300, stsp wrote:
> 08.10.2020 20:59, Sean Christopherson wrote:
> >On Thu, Oct 08, 2020 at 07:00:13PM +0300, stsp wrote:
> >>07.10.2020 04:44, Sean Christopherson wrote:
> >>>Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.
> >>Hi Sean & KVM devs.
> >>
> >>I tested the patches, and wherever I
> >>set VMXE in CR4, I now get
> >>KVM: KVM_SET_SREGS: Invalid argument
> >>Before the patch I was able (with many
> >>problems, but still) to set VMXE sometimes.
> >>
> >>So it's a NAK so far, waiting for an update. :)
> >IIRC, you said you were going to test on AMD?  Assuming that's correct,
> 
> Yes, that is true.
> 
> 
> >  -EINVAL
> >is the expected behavior.  KVM was essentially lying before; it never actually
> >set CR4.VMXE in hardware, it just didn't properly detect the error and so VMXE
> >was set in KVM's shadow of the guest's CR4.
> 
> Hmm. But at least it was lying
> similarly on AMD and Intel CPUs. :)
> So I was able to reproduce the problems
> myself.
> Do you mean, any AMD tests are now useless, and we need to proceed with Intel
> tests only?

For anything VMXE related, yes.

> Then additional question.
> On old Intel CPUs we needed to set VMXE in guest to make it to work in
> nested-guest mode.
> Is it still needed even with your patches?
> Or the nested-guest mode will work now even on older Intel CPUs and KVM will
> set VMXE for us itself, when needed?

I'm struggling to even come up with a theory as to how setting VMXE from
userspace would have impacted KVM with unrestricted_guest=n, let alone fixed
anything.

CR4.VMXE must always be 1 in _hardware_ when VMX is on, including when running
the guest.  But KVM forces vmcs.GUEST_CR4.VMXE=1 at all times, regardless of
the guest's actual value (the guest sees a shadow value when it reads CR4).
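
The split between the hardware value and the guest-visible value can be pictured with a small C model. This is a deliberately simplified sketch: `toy_vmcs` and both helpers are hypothetical names, and real VMX applies the read shadow only to bits selected by the CR4 guest/host mask, which this model ignores.

```c
#include <stdint.h>

#define X86_CR4_VMXE (1ULL << 13)	/* architectural: CR4.VMXE is bit 13 */

struct toy_vmcs {
	uint64_t guest_cr4;		/* value the CPU actually runs the guest with */
	uint64_t cr4_read_shadow;	/* value the guest observes when it reads CR4 */
};

/* Record what the guest asked for, but always force VMXE on in hardware. */
static void toy_write_guest_cr4(struct toy_vmcs *vmcs, uint64_t cr4)
{
	vmcs->cr4_read_shadow = cr4;
	vmcs->guest_cr4 = cr4 | X86_CR4_VMXE;
}

/* A guest read of CR4 returns the shadow, not the hardware value. */
static uint64_t toy_guest_reads_cr4(const struct toy_vmcs *vmcs)
{
	return vmcs->cr4_read_shadow;
}
```

In this model, whether the guest's own value has VMXE set or clear makes no difference to `guest_cr4`, which is the point being made above.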

And unless I grossly misunderstand dosemu2, it's not doing anything related to
nested virtualization, i.e. stuffing VMXE=1 into the guest's shadow value
should have absolutely zero impact.

More than likely, VMXE was a red herring.  Given that the reporter is also
seeing the same bug on bare metal after moving to kernel 5.4, odds are good
the issue is related to unrestricted_guest=n and has nothing to do with nVMX.
stsp Oct. 9, 2020, 2:11 p.m. UTC | #5
09.10.2020 07:04, Sean Christopherson wrote:
>> Hmm. But at least it was lying
>> similarly on AMD and Intel CPUs. :)
>> So I was able to reproduce the problems
>> myself.
>> Do you mean, any AMD tests are now useless, and we need to proceed with Intel
>> tests only?
> For anything VMXE related, yes.

What would be the expected behaviour
on Intel, if it is set? Any difference with AMD?


>> Then additional question.
>> On old Intel CPUs we needed to set VMXE in guest to make it to work in
>> nested-guest mode.
>> Is it still needed even with your patches?
>> Or the nested-guest mode will work now even on older Intel CPUs and KVM will
>> set VMXE for us itself, when needed?
> I'm struggling to even come up with a theory as to how setting VMXE from
> userspace would have impacted KVM with unrestricted_guest=n, let alone fixed
> anything.
>
> CR4.VMXE must always be 1 in _hardware_ when VMX is on, including when running
> the guest.  But KVM forces vmcs.GUEST_CR4.VMXE=1 at all times, regardless of
> the guest's actual value (the guest sees a shadow value when it reads CR4).
>
> And unless I grossly misunderstand dosemu2, it's not doing anything related to
> nested virtualization, i.e. the stuffing VMXE=1 for the guest's shadow value
> should have absolutely zero impact.
>
> More than likely, VMXE was a red herring.

Yes, it was. :(
(as you can see from the end of the
github thread)


>    Given that the reporter is also
> seeing the same bug on bare metal after moving to kernel 5.4, odds are good
> the issue is related to unrestricted_guest=n and has nothing to do with nVMX.

But we do not use unrestricted guest.
We use v86 under KVM.
The only other effect of setting VMXE
was clearing VME. Which shouldn't affect
anything either, right?
Sean Christopherson Oct. 9, 2020, 3:30 p.m. UTC | #6
On Fri, Oct 09, 2020 at 05:11:51PM +0300, stsp wrote:
> 09.10.2020 07:04, Sean Christopherson wrote:
> >>Hmm. But at least it was lying
> >>similarly on AMD and Intel CPUs. :)
> >>So I was able to reproduce the problems
> >>myself.
> >>Do you mean, any AMD tests are now useless, and we need to proceed with Intel
> >>tests only?
> >For anything VMXE related, yes.
> 
> What would be the expected behaviour on Intel, if it is set? Any difference
> with AMD?

On Intel, userspace should be able to stuff CR4.VMXE=1 via KVM_SET_SREGS if
the 'nested' module param is 1, e.g. if 'modprobe kvm_intel nested=1'.  Note,
'nested' is enabled by default on kernel 5.0 and later.

With AMD, setting CR4.VMXE=1 is never allowed as AMD doesn't support VMX;
AMD's virtualization solution is called SVM (Secure Virtual Machine).  KVM
doesn't support nesting VMX within SVM and vice versa.
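
A test program could check the 'nested' setting before deciding which result to expect from KVM_SET_SREGS. The helper below is a hypothetical sketch (`read_bool_module_param` is not an existing API); it assumes the parameter is exposed through sysfs, typically as /sys/module/kvm_intel/parameters/nested, and that it reads back as 'Y'/'N' or '1'/'0'.

```c
#include <stdio.h>

/*
 * Returns 1 if the parameter file reads as enabled ('Y' or '1'),
 * 0 if it reads as disabled ('N' or '0'), and -1 if it cannot be
 * read or holds something unexpected.
 */
static int read_bool_module_param(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);

	if (c == 'Y' || c == 'y' || c == '1')
		return 1;
	if (c == 'N' || c == 'n' || c == '0')
		return 0;
	return -1;
}
```

A caller would pass the sysfs path and treat -1 as "kvm_intel not loaded or parameter not readable", e.g. on an AMD host.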

> >>Then additional question.
> >>On old Intel CPUs we needed to set VMXE in guest to make it to work in
> >>nested-guest mode.
> >>Is it still needed even with your patches?
> >>Or the nested-guest mode will work now even on older Intel CPUs and KVM will
> >>set VMXE for us itself, when needed?
> >I'm struggling to even come up with a theory as to how setting VMXE from
> >userspace would have impacted KVM with unrestricted_guest=n, let alone fixed
> >anything.
> >
> >CR4.VMXE must always be 1 in _hardware_ when VMX is on, including when running
> >the guest.  But KVM forces vmcs.GUEST_CR4.VMXE=1 at all times, regardless of
> >the guest's actual value (the guest sees a shadow value when it reads CR4).
> >
> >And unless I grossly misunderstand dosemu2, it's not doing anything related to
> >nested virtualization, i.e. the stuffing VMXE=1 for the guest's shadow value
> >should have absolutely zero impact.
> >
> >More than likely, VMXE was a red herring.
> 
> Yes, it was. :( (as you can see from the end of the github thread)
> 
> 
> >   Given that the reporter is also
> >seeing the same bug on bare metal after moving to kernel 5.4, odds are good
> >the issue is related to unrestricted_guest=n and has nothing to do with nVMX.
> 
> But we do not use unrestricted guest.
> We use v86 under KVM.

Unrestricted guest can kick in even if CR0.PG=1 && CR0.PE=1, e.g. there are
segmentation checks that apply if and only if unrestricted_guest=0.  Long story
short, without a deep audit, it's basically impossible to rule out a dependency
on unrestricted guest since you're playing around with v86.
 
> The only other effect of setting VMXE was clearing VME. Which shouldn't
> affect anything either, right?

Hmm, clearing VME would mean that exceptions/interrupts within the guest would
trigger a switch out of v86 and into vanilla protected mode.  v86 and PM have
different consistency checks, particularly for segmentation, so it's plausible
that clearing CR4.VME inadvertently worked around the bug by avoiding invalid
guest state for v86.
stsp Oct. 9, 2020, 3:48 p.m. UTC | #7
09.10.2020 18:30, Sean Christopherson wrote:
> On Fri, Oct 09, 2020 at 05:11:51PM +0300, stsp wrote:
>> 09.10.2020 07:04, Sean Christopherson wrote:
>>>> Hmm. But at least it was lying
>>>> similarly on AMD and Intel CPUs. :)
>>>> So I was able to reproduce the problems
>>>> myself.
>>>> Do you mean, any AMD tests are now useless, and we need to proceed with Intel
>>>> tests only?
>>> For anything VMXE related, yes.
>> What would be the expected behaviour on Intel, if it is set? Any difference
>> with AMD?
> On Intel, userspace should be able to stuff CR4.VMXE=1 via KVM_SET_SREGS if
> the 'nested' module param is 1, e.g. if 'modprobe kvm_intel nested=1'.  Note,
> 'nested' is enabled by default on kernel 5.0 and later.

So if I understand you correctly, we
need to test that:
- with nested=0 VMXE gives EINVAL
- with nested=1 VMXE changes nothing
visible, except probably to allow guest
to read that value (we won't test guest
reading though).

Is this correct?


> With AMD, setting CR4.VMXE=1 is never allowed as AMD doesn't support VMX,

OK, for that I can give you a
Tested-by: Stas Sergeev <stsp@users.sourceforge.net>

because I confirm that on AMD it now
consistently returns EINVAL, whereas
without your patches it did random crap,
depending on whether it is a first call to
KVM_SET_SREGS, or not first.


>> But we do not use unrestricted guest.
>> We use v86 under KVM.
> Unrestricted guest can kick in even if CR0.PG=1 && CR0.PE=1, e.g. there are
> segmentation checks that apply if and only if unrestricted_guest=0.  Long story
> short, without a deep audit, it's basically impossible to rule out a dependency
> on unrestricted guest since you're playing around with v86.

You mean "unrestricted_guest" as a module
parameter, rather than the similar named CPU
feature, right? So we may depend on
unrestricted_guest parameter, but not on a
hardware feature, correct?


>> The only other effect of setting VMXE was clearing VME. Which shouldn't
>> affect anything either, right?
> Hmm, clearing VME would mean that exceptions/interrupts within the guest would
> trigger a switch out of v86 and into vanilla protected mode.  v86 and PM have
> different consistency checks, particularly for segmentation, so it's plausible
> that clearing CR4.VME inadvertently worked around the bug by avoiding invalid
> guest state for v86.

Let's assume that was the case.
With those github guys it's not possible
to do any consistent checks. :(
Sean Christopherson Oct. 9, 2020, 4:11 p.m. UTC | #8
On Fri, Oct 09, 2020 at 06:48:21PM +0300, stsp wrote:
> 09.10.2020 18:30, Sean Christopherson wrote:
> >On Fri, Oct 09, 2020 at 05:11:51PM +0300, stsp wrote:
> >>09.10.2020 07:04, Sean Christopherson wrote:
> >>>>Hmm. But at least it was lying
> >>>>similarly on AMD and Intel CPUs. :)
> >>>>So I was able to reproduce the problems
> >>>>myself.
> >>>>Do you mean, any AMD tests are now useless, and we need to proceed with Intel
> >>>>tests only?
> >>>For anything VMXE related, yes.
> >>What would be the expected behaviour on Intel, if it is set? Any difference
> >>with AMD?
> >On Intel, userspace should be able to stuff CR4.VMXE=1 via KVM_SET_SREGS if
> >the 'nested' module param is 1, e.g. if 'modprobe kvm_intel nested=1'.  Note,
> >'nested' is enabled by default on kernel 5.0 and later.
> 
> So if I understand you correctly, we
> need to test that:
> - with nested=0 VMXE gives EINVAL
> - with nested=1 VMXE changes nothing
> visible, except probably to allow guest
> to read that value (we won't test guest
> reading though).
> 
> Is this correct?

Yep, exactly!
 
> >With AMD, setting CR4.VMXE=1 is never allowed as AMD doesn't support VMX,
> 
> OK, for that I can give you a
> Tested-by: Stas Sergeev <stsp@users.sourceforge.net>
> 
> because I confirm that on AMD it now consistently returns EINVAL, whereas
> without your patches it did random crap, depending on whether it is a first
> call to KVM_SET_SREGS, or not first.
> 
> 
> >>But we do not use unrestricted guest.
> >>We use v86 under KVM.
> >Unrestricted guest can kick in even if CR0.PG=1 && CR0.PE=1, e.g. there are
> >segmentation checks that apply if and only if unrestricted_guest=0.  Long story
> >short, without a deep audit, it's basically impossible to rule out a dependency
> >on unrestricted guest since you're playing around with v86.
> 
> You mean "unrestricted_guest" as a module parameter, rather than the similar
> named CPU feature, right? So we may depend on unrestricted_guest parameter,
> but not on a hardware feature, correct?

The unrestricted_guest module param is tied directly to the hardware feature,
i.e. if kvm_intel.unrestricted_guest=0 then KVM will run guests with
unrestricted guest disabled.  That doesn't necessarily mean any of the
behavior that is allowed by unrestricted guest will be encountered, but if
it is encountered, then it will be handled by the CPU instead of causing a
VM-Exit and requiring KVM emulation.

The reporter is using an old CPU that doesn't support unrestricted guest,
so both the hardware feature and the module param will be off/0.

> >>The only other effect of setting VMXE was clearing VME. Which shouldn't
> >>affect anything either, right?
> >Hmm, clearing VME would mean that exceptions/interrupts within the guest would
> >trigger a switch out of v86 and into vanilla protected mode.  v86 and PM have
> >different consistency checks, particularly for segmentation, so it's plausible
> >that clearing CR4.VME inadvertently worked around the bug by avoiding invalid
> >guest state for v86.
> 
> Let's assume that was the case.  With those github guys it's not possible to do
> any consistent checks. :(

K.  If this is ever a problem in the future, having a relatively simple
reproducer, e.g. something we can run without having to build/install a
variety of tools, would make it easier to debug.  In theory, the bug should be
reproducible even on modern hardware by loading KVM with unrestricted_guest=0.
Paolo Bonzini Nov. 13, 2020, 11:36 a.m. UTC | #9
On 07/10/20 03:44, Sean Christopherson wrote:
> Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.
> 
> The overarching issue is that kvm_x86_ops.set_cr4() can fail, but its
> invocation from __set_sregs(), a.k.a. KVM_SET_SREGS, ignores the result.
> Fix the issue by moving all validity checks out of .set_cr4() in one way
> or another.
> 
> I intentionally omitted a Cc to stable.  The first bug fix in particular
> may break stable trees as it simply removes a check, and I don't know that
> stable trees have the generic CR4 reserved bit check that is needed to
> prevent the guest from setting VMXE when nVMX is not allowed.
> 
> Sean Christopherson (6):
>    KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()
>    KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4()
>    KVM: SVM: Drop VMXE check from svm_set_cr4()
>    KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook
>    KVM: x86: Return bool instead of int for CR4 and SREGS validity checks
>    KVM: selftests: Verify supported CR4 bits can be set before
>      KVM_SET_CPUID2
> 
>   arch/x86/include/asm/kvm_host.h               |  3 +-
>   arch/x86/kvm/svm/nested.c                     |  2 +-
>   arch/x86/kvm/svm/svm.c                        | 12 ++-
>   arch/x86/kvm/svm/svm.h                        |  2 +-
>   arch/x86/kvm/vmx/nested.c                     |  2 +-
>   arch/x86/kvm/vmx/vmx.c                        | 35 +++----
>   arch/x86/kvm/vmx/vmx.h                        |  2 +-
>   arch/x86/kvm/x86.c                            | 28 +++---
>   arch/x86/kvm/x86.h                            |  2 +-
>   .../selftests/kvm/include/x86_64/processor.h  | 17 ++++
>   .../selftests/kvm/include/x86_64/vmx.h        |  4 -
>   .../selftests/kvm/x86_64/set_sregs_test.c     | 92 ++++++++++++++++++-
>   12 files changed, 153 insertions(+), 48 deletions(-)
> 

Queued, thanks.

Paolo
stsp Dec. 7, 2020, 11:19 a.m. UTC | #10
09.10.2020 18:30, Sean Christopherson wrote:
>> The only other effect of setting VMXE was clearing VME. Which shouldn't
>> affect anything either, right?
> Hmm, clearing VME would mean that exceptions/interrupts within the guest would
> trigger a switch out of v86 and into vanilla protected mode.  v86 and PM have
> different consistency checks, particularly for segmentation, so it's plausible
> that clearing CR4.VME inadvertently worked around the bug by avoiding invalid
> guest state for v86.

Almost.

So with your patch set (thanks!) and a
bit of further investigations, it now became
clear where the problem is.
We have this code:
---

cpuid->nent = 2;
// Use the same values as in emu-i386/simx86/interp.c
// (Pentium 133-200MHz, "GenuineIntel")
cpuid->entries[0] = (struct kvm_cpuid_entry) {
    .function = 0, .eax = 1, .ebx = 0x756e6547,
    .ecx = 0x6c65746e, .edx = 0x49656e69 };
// family 5, model 2, stepping 12, fpu vme de pse tsc msr mce cx8
cpuid->entries[1] = (struct kvm_cpuid_entry) {
    .function = 1, .eax = 0x052c, .ebx = 0, .ecx = 0, .edx = 0x1bf };
ret = ioctl(vcpufd, KVM_SET_CPUID, cpuid);
free(cpuid);
if (ret == -1) {
    perror("KVM: KVM_SET_CPUID");
    return 0;
}

---

It tries to enable VME among other things.
qemu appears to disable VME by default,
unless you do "-cpu host". So we have a situation where
the host (which is qemu) doesn't have VME,
and guest (dosemu) is trying to enable it.
Now obviously KVM_SET_CPUID doesn't check anything
at all and returns success. That later turns
into an invalid guest state.

Question: should KVM_SET_CPUID check for
supported bits, and return error if not everything
is supported?
stsp Dec. 7, 2020, 11:24 a.m. UTC | #11
[re-send because of bad formatting]

09.10.2020 18:30, Sean Christopherson wrote:
>> The only other effect of setting VMXE was clearing VME. Which shouldn't
>> affect anything either, right?
> Hmm, clearing VME would mean that exceptions/interrupts within the guest would
> trigger a switch out of v86 and into vanilla protected mode.  v86 and PM have
> different consistency checks, particularly for segmentation, so it's plausible
> that clearing CR4.VME inadvertently worked around the bug by avoiding invalid
> guest state for v86.

Almost.

So with your patch set (thanks!) and a
bit of further investigations, it now became
clear where the problem is.
We have this code:
---

cpuid->nent = 2;
// Use the same values as in emu-i386/simx86/interp.c
// (Pentium 133-200MHz, "GenuineIntel")
cpuid->entries[0] = (struct kvm_cpuid_entry) {
    .function = 0, .eax = 1, .ebx = 0x756e6547,
    .ecx = 0x6c65746e, .edx = 0x49656e69 };
// family 5, model 2, stepping 12, fpu vme de pse tsc msr mce cx8
cpuid->entries[1] = (struct kvm_cpuid_entry) {
    .function = 1, .eax = 0x052c, .ebx = 0, .ecx = 0, .edx = 0x1bf };
ret = ioctl(vcpufd, KVM_SET_CPUID, cpuid);
free(cpuid);
if (ret == -1) {
    perror("KVM: KVM_SET_CPUID");
    return 0;
}

---


It tries to enable VME among other things.
qemu appears to disable VME by default,
unless you do "-cpu host". So we have a situation where
the host (which is qemu) doesn't have VME,
and guest (dosemu) is trying to enable it.
Now obviously KVM_SET_CPUID doesn't check anything
at all and returns success. That later turns
into an invalid guest state.


Question: should KVM_SET_CPUID check for
supported bits, and return error if not everything
is supported?
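
For reference, the leaf 1 values in the snippet can be decoded with a few shifts and masks. The sketch below is illustrative only: `cpu_signature`, `decode_signature` and `missing_features` are hypothetical helper names, and the decoder looks only at the base family/model/stepping fields of CPUID.01H:EAX, ignoring the extended ones.

```c
#include <stdint.h>

#define CPUID_1_EDX_VME (1u << 1)	/* CPUID.01H:EDX bit 1 = VME */

struct cpu_signature {
	unsigned family, model, stepping;
};

/* Decode the base family/model/stepping fields of CPUID.01H:EAX. */
static struct cpu_signature decode_signature(uint32_t eax)
{
	struct cpu_signature sig = {
		.family   = (eax >> 8) & 0xf,
		.model    = (eax >> 4) & 0xf,
		.stepping = eax & 0xf,
	};
	return sig;
}

/* Feature bits requested for the guest that the supported mask lacks. */
static uint32_t missing_features(uint32_t wanted_edx, uint32_t supported_edx)
{
	return wanted_edx & ~supported_edx;
}
```

With the values above, `decode_signature(0x052c)` gives family 5, model 2, stepping 12, and `0x1bf` has the VME bit set; if the supported mask has VME clear, `missing_features()` reports exactly that bit.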
Paolo Bonzini Dec. 7, 2020, 11:29 a.m. UTC | #12
On 07/12/20 12:24, stsp wrote:
> It tries to enable VME among other things.
> qemu appears to disable VME by default,
> unless you do "-cpu host". So we have a situation where
> the host (which is qemu) doesn't have VME,
> and guest (dosemu) is trying to enable it.
> Now obviously KVM_SET_CPUID doesn't check anything
> at all and returns success. That later turns
> into an invalid guest state.
> 
> 
> Question: should KVM_SET_CPUID check for
> supported bits, and return error if not everything
> is supported?

No, it is intentional.  Most bits of CPUID are not ever checked by KVM, 
so userspace is supposed to set values that make sense or just copy the 
value of KVM_GET_SUPPORTED_CPUID more or less blindly.

Paolo
stsp Dec. 7, 2020, 11:47 a.m. UTC | #13
07.12.2020 14:29, Paolo Bonzini wrote:
> On 07/12/20 12:24, stsp wrote:
>> It tries to enable VME among other things.
>> qemu appears to disable VME by default,
>> unless you do "-cpu host". So we have a situation where
>> the host (which is qemu) doesn't have VME,
>> and guest (dosemu) is trying to enable it.
>> Now obviously KVM_SET_CPUID doesn't check anything
>> at all and returns success. That later turns
>> into an invalid guest state.
>>
>>
>> Question: should KVM_SET_CPUID check for
>> supported bits, and return error if not everything
>> is supported?
>
> No, it is intentional.  Most bits of CPUID are not ever checked by 
> KVM, so userspace is supposed to set values that make sense
By "that makes sense" you probably
meant to say "bits_that_makes_sense masked
with the ones returned by KVM_GET_SUPPORTED_CPUID"?

So am I right that KVM_SET_CPUID only "lowers"
the supported bits? In which case I don't need to
call it at all, but instead just call KVM_GET_SUPPORTED_CPUID
and see if the needed bits are supported, and
exit otherwise, right?
stsp Dec. 7, 2020, 2:03 p.m. UTC | #14
07.12.2020 16:35, Paolo Bonzini wrote:
>
>
> On Mon, Dec 7, 2020, 12:47 stsp <stsp2@yandex.ru> wrote:
>
>     So am I right that KVM_SET_CPUID only "lowers"
>     the supported bits? In which case I don't need to
>     call it at all, but instead just call KVM_GET_SUPPORTED_CPUID
>     and see if the needed bits are supported, and
>     exit otherwise, right?
>
>
> You always have to call KVM_SET_CPUID2, but you can just pass in 
> whatever you got from KVM_GET_SUPPORTED_CPUID.
OK, done that, thanks.
(after checking that KVM_GET_SUPPORTED_CPUID
actually has the needed features itself, otherwise exit).
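
The flow described here (query the supported set, check the needed bits, pass the result back) might look roughly like the sketch below. KVM_GET_SUPPORTED_CPUID (issued on the /dev/kvm fd) and KVM_SET_CPUID2 (issued on the vCPU fd) are the real ioctls; the function name `copy_supported_cpuid`, the `required_edx` parameter and the `MAX_CPUID_ENTRIES` bound are assumptions made for illustration, and only leaf 1 EDX is checked.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define MAX_CPUID_ENTRIES 256	/* assumption: a generous upper bound */

/*
 * Returns 0 on success, -1 if an ioctl fails or a required
 * CPUID.01H:EDX feature bit is not reported as supported.
 */
static int copy_supported_cpuid(int kvm_fd, int vcpu_fd, uint32_t required_edx)
{
	struct kvm_cpuid2 *cpuid;
	uint32_t leaf1_edx = 0;
	int ret = -1;

	cpuid = calloc(1, sizeof(*cpuid) +
			  MAX_CPUID_ENTRIES * sizeof(cpuid->entries[0]));
	if (!cpuid)
		return -1;
	cpuid->nent = MAX_CPUID_ENTRIES;

	/* System ioctl on the /dev/kvm fd; nent is updated to the real count. */
	if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid) < 0)
		goto out;

	for (uint32_t i = 0; i < cpuid->nent; i++) {
		if (cpuid->entries[i].function == 1)
			leaf1_edx = cpuid->entries[i].edx;
	}
	if ((leaf1_edx & required_edx) != required_edx)
		goto out;

	/* vCPU ioctl: hand the supported set straight back to KVM. */
	if (ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid) < 0)
		goto out;
	ret = 0;
out:
	free(cpuid);
	return ret;
}
```

A caller that needs VME would pass `1u << 1` as `required_edx` and exit if the function returns -1.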

Perhaps it would be good for the guest cpuid to
default to the values of KVM_GET_SUPPORTED_CPUID,
so that the user doesn't have to make needless
calls just to copy host features to guest cpuid.
stsp Dec. 7, 2020, 2:29 p.m. UTC | #15
07.12.2020 17:09, Paolo Bonzini wrote:
>
>
> On Mon, Dec 7, 2020, 15:04 stsp <stsp2@yandex.ru> wrote:
>
>     Perhaps it would be good if guest cpuid to
>     have a default values of KVM_GET_SUPPORTED_CPUID,
>     so that the user doesn't have to do the needless
>     calls to just copy host features to guest cpuid.
>
>
> It is too late to change that aspect of the API, unfortunately. We 
> don't know how various userspaces would behave.
Which means some sensible behaviour
already exists if I don't call KVM_SET_CPUID2.
So what is it, #UD on CPUID?
Would be good to have that documented.
stsp Dec. 7, 2020, 2:41 p.m. UTC | #16
07.12.2020 17:34, Paolo Bonzini wrote:
>
>     > It is too late to change that aspect of the API, unfortunately. We
>     > don't know how various userspaces would behave.
>     Which means some sensible behaviour
>     already exists if I don't call KVM_SET_CPUID2.
>     So what is it, #UD on CPUID?
>
>
> I would have to check but I think you always get zeroes; not entirely 
> sensible.
In that case I would argue that you can't
break anything by changing that to something
sensible. :)
But anyway, since my problem is solved,
this is just a potential improvement for the
future, or the case for documenting.
Jim Mattson Dec. 7, 2020, 11:59 p.m. UTC | #17
On Mon, Dec 7, 2020 at 3:47 AM stsp <stsp2@yandex.ru> wrote:
>
> 07.12.2020 14:29, Paolo Bonzini wrote:
> > On 07/12/20 12:24, stsp wrote:
> >> It tries to enable VME among other things.
> >> qemu appears to disable VME by default,
> >> unless you do "-cpu host". So we have a situation where
> >> the host (which is qemu) doesn't have VME,
> >> and guest (dosemu) is trying to enable it.
> >> Now obviously KVM_SET_CPUID doesn't check anything
> >> at all and returns success. That later turns
> >> into an invalid guest state.
> >>
> >>
> >> Question: should KVM_SET_CPUID check for
> >> supported bits, and return error if not everything
> >> is supported?
> >
> > No, it is intentional.  Most bits of CPUID are not ever checked by
> > KVM, so userspace is supposed to set values that make sense
> By "that makes sense" you probably
> meant to say "bits_that_makes_sense masked
> with the ones returned by KVM_GET_SUPPORTED_CPUID"?
>
> So am I right that KVM_SET_CPUID only "lowers"
> the supported bits? In which case I don't need to
> call it at all, but instead just call KVM_GET_SUPPORTED_CPUID
> and see if the needed bits are supported, and
> exit otherwise, right?

"Lowers" is a tricky concept for CPUID information. Some feature bits
report 0 for "present" and 1 for "not-present." Some multi-bit fields
are interpreted as numbers, which may be signed or unsigned. Some
multi-bit fields are strings. Some fields have dependencies on other
fields. Etc.
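
The vendor string in CPUID leaf 0 is one concrete case of a field that is a string rather than a set of feature bits, so a bitwise "lower" has no meaning for it. The helper below is a hypothetical sketch (`cpuid_vendor_string` is not an existing API); it relies only on the architectural byte order EBX, EDX, ECX.

```c
#include <stdint.h>

/*
 * Assemble the 12-character vendor string from CPUID leaf 0.
 * The registers are concatenated in the order EBX, EDX, ECX,
 * least significant byte first.
 */
static void cpuid_vendor_string(uint32_t ebx, uint32_t ecx, uint32_t edx,
				char out[13])
{
	const uint32_t regs[3] = { ebx, edx, ecx };

	for (int r = 0; r < 3; r++)
		for (int b = 0; b < 4; b++)
			out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
	out[12] = '\0';
}
```

Feeding it the leaf 0 values from the dosemu snippet earlier in the thread (EBX 0x756e6547, ECX 0x6c65746e, EDX 0x49656e69) yields "GenuineIntel"; masking any of those registers against another vendor's values would produce a meaningless string rather than a reduced feature set.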