diff mbox series

[RFC] target/i386: filter out VMX_PIN_BASED_POSTED_INTR when enabling SynIC

Message ID 20200218144415.94722-1-vkuznets@redhat.com (mailing list archive)
State New, archived
Headers show
Series [RFC] target/i386: filter out VMX_PIN_BASED_POSTED_INTR when enabling SynIC | expand

Commit Message

Vitaly Kuznetsov Feb. 18, 2020, 2:44 p.m. UTC
When a multi-vCPU guest is created with hv_synic, secondary vCPUs fail
to initialize with

qemu-system-x86_64: error: failed to set MSR 0x48d to 0xff00000016

This is caused by SynIC enablement on the boot CPU: when we do this
KVM disables apicv for the whole guest so we can't set
VMX_PIN_BASED_POSTED_INTR bit in MSR_IA32_VMX_TRUE_PINBASED_CTLS anymore.
(see nested_vmx_setup_ctls_msrs() in KVM).

This used to work before fine-grained VMX feature enablement because
we were not setting VMX MSRs.

Fix the issue by filtering out VMX_PIN_BASED_POSTED_INTR when enabling
SynIC. We also need to re-order kvm_init_msrs() with hyperv_init_vcpu()
so filtering on secondary CPUs happens before.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
RFC: This is somewhat similar to eVMCS breakage and it is likely possible
to fix this in KVM. I decided to try QEMU first as this is a single
control and unlike eVMCS we don't need to keep a list of things to disable.
---
 target/i386/kvm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Paolo Bonzini Feb. 18, 2020, 4:13 p.m. UTC | #1
On 18/02/20 15:44, Vitaly Kuznetsov wrote:
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> RFC: This is somewhat similar to eVMCS breakage and it is likely possible
> to fix this in KVM. I decided to try QEMU first as this is a single
> control and unlike eVMCS we don't need to keep a list of things to disable.

I think you should disable "virtual-interrupt delivery" instead (which
in turn requires "process posted interrupts" to be zero).  That is the
one that is incompatible with AutoEOI interrupts.

The ugly part about fixing this in QEMU is that in theory it would be
still possible to emulate virtual interrupt delivery and posted
interrupts, because they operate on a completely disjoint APIC
configuration than the host's.  I'm not sure we want to go there though,
so I'm thinking that again a KVM implementation is better.  It
acknowledges that this is just a limitation (workaround for a bug) in KVM.

Paolo
no-reply@patchew.org Feb. 18, 2020, 4:56 p.m. UTC | #2
Patchew URL: https://patchew.org/QEMU/20200218144415.94722-1-vkuznets@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

-...........................................................................................
+..................................E........................................................
+======================================================================
+ERROR: test_pause (__main__.TestSingleBlockdev)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "041", line 106, in test_pause
---
 Ran 91 tests
 
-OK
+FAILED (errors=1)
  TEST    iotest-qcow2: 042
  TEST    iotest-qcow2: 043
  TEST    iotest-qcow2: 046
---
  TEST    iotest-qcow2: 283
Failures: 041
Failed 1 of 116 iotests
make: *** [check-tests/check-block.sh] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=d102c1c213f9486daa4ddb9ff686404b', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-wjtutoo7/src/docker-src.2020-02-18-11.43.41.11223:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=d102c1c213f9486daa4ddb9ff686404b
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-wjtutoo7/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    12m38.662s
user    0m8.825s


The full log is available at
http://patchew.org/logs/20200218144415.94722-1-vkuznets@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Vitaly Kuznetsov Feb. 18, 2020, 5:08 p.m. UTC | #3
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 18/02/20 15:44, Vitaly Kuznetsov wrote:
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> RFC: This is somewhat similar to eVMCS breakage and it is likely possible
>> to fix this in KVM. I decided to try QEMU first as this is a single
>> control and unlike eVMCS we don't need to keep a list of things to disable.
>
> I think you should disable "virtual-interrupt delivery" instead (which
> in turn requires "process posted interrupts" to be zero).  That is the
> one that is incompatible with AutoEOI interrupts.

I'm fighting the symptoms, not the cause :-) My understanding is that
when SynIC is enabled for CPU0 KVM does

kvm_vcpu_update_apicv()
	vmx_refresh_apicv_exec_ctrl()
		pin_controls_set()

for *all* vCPUs (KVM_REQ_APICV_UPDATE). I'm not sure why
SECONDARY_EXEC_APIC_REGISTER_VIRT/SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY
are not causing problems and only PIN_BASED_POSTED_INTR does as we clear
them all (not very important atm).

>
> The ugly part about fixing this in QEMU is that in theory it would be
> still possible to emulate virtual interrupt delivery and posted
> interrupts, because they operate on a completely disjoint APIC
> configuration than the host's.  I'm not sure we want to go there though,
> so I'm thinking that again a KVM implementation is better.  It
> acknowledges that this is just a limitation (workaround for a bug) in KVM.

The KVM implementation will differ from what we've done to fix eVMCS. We
will either need to keep the controls on (and additionally check
kvm_vcpu_apicv_active() if guest tries to enable them) and again filter
VMX MSR reads from the guest or do the filtering on MSR write from
userspace (filter out the unsupported controls and not fail).

Actually, I'm starting to think it would've been easier to just filter
all VMX MSRs on KVM_SET_MSRS leaving only the supported controls and not
fail the operation. That way we would've fixed both eVMCS and SynIC
issues in a consistent way shifting the responsibility towards
userspace (document that VMX MSRs are 'special' and enabling certain
features may result in changes; if userspace wants to see the actual
state it may issue KVM_GET_MSRS any time) May not be the worst solution
after all...
Vitaly Kuznetsov Feb. 18, 2020, 5:14 p.m. UTC | #4
no-reply@patchew.org writes:

> Patchew URL: https://patchew.org/QEMU/20200218144415.94722-1-vkuznets@redhat.com/
>
>
>
> Hi,
>
> This series failed the docker-quick@centos7 build test. Please find the testing commands and
> their output below. If you have Docker installed, you can probably reproduce it
> locally.

Hm, honestly I don't see how this can be related to my patch:

--- /tmp/qemu-test/src/tests/qemu-iotests/041.out	2020-02-18 14:42:30.000000000 +0000
+++ /tmp/qemu-test/build/tests/qemu-iotests/041.out.bad	2020-02-18 16:50:07.383069241 +0000
@@ -1,5 +1,29 @@
-...........................................................................................
+..................................E........................................................
+======================================================================
+ERROR: test_pause (__main__.TestSingleBlockdev)
+----------------------------------------------------------------------
...
+Exception: Timeout waiting for job to pause
+

something else is broken?
Paolo Bonzini Feb. 18, 2020, 5:47 p.m. UTC | #5
On 18/02/20 18:08, Vitaly Kuznetsov wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 18/02/20 15:44, Vitaly Kuznetsov wrote:
>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> ---
>>> RFC: This is somewhat similar to eVMCS breakage and it is likely possible
>>> to fix this in KVM. I decided to try QEMU first as this is a single
>>> control and unlike eVMCS we don't need to keep a list of things to disable.
>>
>> I think you should disable "virtual-interrupt delivery" instead (which
>> in turn requires "process posted interrupts" to be zero).  That is the
>> one that is incompatible with AutoEOI interrupts.
> 
> I'm fighting the symptoms, not the cause :-) My understanding is that
> when SynIC is enabled for CPU0 KVM does
> 
> kvm_vcpu_update_apicv()
> 	vmx_refresh_apicv_exec_ctrl()
> 		pin_controls_set()
> 
> for *all* vCPUs (KVM_REQ_APICV_UPDATE). I'm not sure why
> SECONDARY_EXEC_APIC_REGISTER_VIRT/SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY
> are not causing problems and only PIN_BASED_POSTED_INTR does as we clear
> them all (not very important atm).

Let's take a step back, what is the symptom, i.e. how does it fail?
Because thinking more about it, since we have separate VMCS we can set
PIN_BASED_POSTED_INTR and SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY just fine
in the vmcs02.  The important part is to unconditionally call
vmx_deliver_nested_posted_interrupt.

Something like

	if (kvm_x86_ops->deliver_posted_interrupt(vcpu, vector)) {
                kvm_lapic_set_irr(vector, apic);
                kvm_make_request(KVM_REQ_EVENT, vcpu);
                kvm_vcpu_kick(vcpu);
        }

and in vmx_deliver_posted_interrupt

        r = vmx_deliver_nested_posted_interrupt(vcpu, vector);
        if (!r)
                return 0;

	if (!vcpu->arch.apicv_active)
                return -1;
        ...
        return 0;

Paolo
Vitaly Kuznetsov Feb. 19, 2020, 9:54 a.m. UTC | #6
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 18/02/20 18:08, Vitaly Kuznetsov wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> 
>>> On 18/02/20 15:44, Vitaly Kuznetsov wrote:
>>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>> ---
>>>> RFC: This is somewhat similar to eVMCS breakage and it is likely possible
>>>> to fix this in KVM. I decided to try QEMU first as this is a single
>>>> control and unlike eVMCS we don't need to keep a list of things to disable.
>>>
>>> I think you should disable "virtual-interrupt delivery" instead (which
>>> in turn requires "process posted interrupts" to be zero).  That is the
>>> one that is incompatible with AutoEOI interrupts.
>> 
>> I'm fighting the symptoms, not the cause :-) My understanding is that
>> when SynIC is enabled for CPU0 KVM does
>> 
>> kvm_vcpu_update_apicv()
>> 	vmx_refresh_apicv_exec_ctrl()
>> 		pin_controls_set()
>> 
>> for *all* vCPUs (KVM_REQ_APICV_UPDATE). I'm not sure why
>> SECONDARY_EXEC_APIC_REGISTER_VIRT/SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY
>> are not causing problems and only PIN_BASED_POSTED_INTR does as we clear
>> them all (not very important atm).
>
> Let's take a step back, what is the symptom, i.e. how does it fail?

I just do

~/qemu/x86_64-softmmu/qemu-system-x86_64 -machine q35,accel=kvm -cpu host,hv_vpindex,hv_synic -smp 2 -m 16384 -vnc :0
and get
qemu-system-x86_64: error: failed to set MSR 0x48d to 0xff00000016
qemu-system-x86_64: /root/qemu/target/i386/kvm.c:2684: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Aborted

(it works with '-smp 1' or without 'hv_synic')

> Because thinking more about it, since we have separate VMCS we can set
> PIN_BASED_POSTED_INTR and SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY just fine
> in the vmcs02.
> The important part is to unconditionally call
> vmx_deliver_nested_posted_interrupt.
>
> Something like
>
> 	if (kvm_x86_ops->deliver_posted_interrupt(vcpu, vector)) {
>                 kvm_lapic_set_irr(vector, apic);
>                 kvm_make_request(KVM_REQ_EVENT, vcpu);
>                 kvm_vcpu_kick(vcpu);
>         }
>
> and in vmx_deliver_posted_interrupt
>
>         r = vmx_deliver_nested_posted_interrupt(vcpu, vector);
>         if (!r)
>                 return 0;
>
> 	if (!vcpu->arch.apicv_active)
>                 return -1;
>         ...
>         return 0;

Sound like a plan, let me try playing with it.
diff mbox series

Patch

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 69eb43d796e6..6829b597fdbf 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1366,6 +1366,7 @@  static Error *hv_no_nonarch_cs_mig_blocker;
 static int hyperv_init_vcpu(X86CPU *cpu)
 {
     CPUState *cs = CPU(cpu);
+    CPUX86State *env = &cpu->env;
     Error *local_err = NULL;
     int ret;
 
@@ -1431,6 +1432,9 @@  static int hyperv_init_vcpu(X86CPU *cpu)
             return ret;
         }
 
+        /* When SynIC is enabled, APICv controls become unavailable */
+        env->features[FEAT_VMX_PINBASED_CTLS] &= ~VMX_PIN_BASED_POSTED_INTR;
+
         if (!cpu->hyperv_synic_kvm_only) {
             ret = hyperv_x86_synic_add(cpu);
             if (ret < 0) {
@@ -1845,13 +1849,13 @@  int kvm_arch_init_vcpu(CPUState *cs)
         has_msr_tsc_aux = false;
     }
 
-    kvm_init_msrs(cpu);
-
     r = hyperv_init_vcpu(cpu);
     if (r) {
         goto fail;
     }
 
+    kvm_init_msrs(cpu);
+
     return 0;
 
  fail: