
KVM: x86: Rework INIT and SIPI handling

Message ID 51403DEF.3090403@siemens.com (mailing list archive)
State New, archived

Commit Message

Jan Kiszka March 13, 2013, 8:50 a.m. UTC
A VCPU sending INIT or SIPI to some other VCPU races to set the
remote VCPU's mp_state. If we were unlucky, KVM_MP_STATE_INIT_RECEIVED
was overwritten by kvm_emulate_halt and, thus, got lost.
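
To make the race concrete, a hypothetical interleaving (a sketch; the
exact call sites can vary):

    /*
     * sender vcpu (lapic emulation)      target vcpu (about to halt)
     *
     * vcpu->arch.mp_state =
     *     KVM_MP_STATE_INIT_RECEIVED;
     *                                    kvm_emulate_halt():
     *                                        vcpu->arch.mp_state =
     *                                            KVM_MP_STATE_HALTED;
     *
     * INIT_RECEIVED is overwritten, so the INIT signal is lost.
     */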

This introduces APIC events for those two signals, keeping them in
kvm_apic until kvm_apic_accept_events is run in the context of the
target vcpu. kvm_apic_has_events reports to kvm_arch_vcpu_runnable
whether there are pending events and, thus, whether vcpu blocking
should end.

The patch comes with the side effect of effectively obsoleting
KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
state. That also means we no longer exit to user space after receiving a
SIPI event.

Furthermore, we already reset the VCPU on INIT, only fixing up the code
segment later on when SIPI arrives. Moreover, we fix INIT handling for
the BSP: it never enters wait-for-SIPI but directly starts over on INIT.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/include/asm/kvm_host.h |    3 +-
 arch/x86/kvm/lapic.c            |   47 ++++++++++++++++++++++++++-----
 arch/x86/kvm/lapic.h            |   11 +++++++
 arch/x86/kvm/svm.c              |    5 +---
 arch/x86/kvm/vmx.c              |    4 ---
 arch/x86/kvm/x86.c              |   57 +++++++++++++++++++++++++-------------
 6 files changed, 90 insertions(+), 37 deletions(-)

Comments

Gleb Natapov March 13, 2013, 10:31 a.m. UTC | #1
On Wed, Mar 13, 2013 at 09:50:55AM +0100, Jan Kiszka wrote:
> A VCPU sending INIT or SIPI to some other VCPU races to set the
> remote VCPU's mp_state. If we were unlucky, KVM_MP_STATE_INIT_RECEIVED
> was overwritten by kvm_emulate_halt and, thus, got lost.
> 
> This introduces APIC events for those two signals, keeping them in
> kvm_apic until kvm_apic_accept_events is run in the context of the
> target vcpu. kvm_apic_has_events reports to kvm_arch_vcpu_runnable
> whether there are pending events and, thus, whether vcpu blocking
> should end.
> 
> The patch comes with the side effect of effectively obsoleting
> KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
> immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
> The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
> state. That also means we no longer exit to user space after receiving a
> SIPI event.
> 
> Furthermore, we already reset the VCPU on INIT, only fixing up the code
> segment later on when SIPI arrives. Moreover, we fix INIT handling for
> the BSP: it never enters wait-for-SIPI but directly starts over on INIT.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    3 +-
>  arch/x86/kvm/lapic.c            |   47 ++++++++++++++++++++++++++-----
>  arch/x86/kvm/lapic.h            |   11 +++++++
>  arch/x86/kvm/svm.c              |    5 +---
>  arch/x86/kvm/vmx.c              |    4 ---
>  arch/x86/kvm/x86.c              |   57 +++++++++++++++++++++++++-------------
>  6 files changed, 90 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 348d859..ef7f4a5 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -345,7 +345,6 @@ struct kvm_vcpu_arch {
>  	unsigned long apic_attention;
>  	int32_t apic_arb_prio;
>  	int mp_state;
> -	int sipi_vector;
>  	u64 ia32_misc_enable_msr;
>  	bool tpr_access_reporting;
>  
> @@ -819,6 +818,7 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
>  
>  void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
>  int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
> +void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector);
>  
>  int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
>  		    int reason, bool has_error_code, u32 error_code);
> @@ -1002,6 +1002,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
>  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
>  int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
> +void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>  
>  void kvm_define_shared_msr(unsigned index, u32 msr);
>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 02b51dd..6c16230 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -685,6 +685,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  {
>  	int result = 0;
>  	struct kvm_vcpu *vcpu = apic->vcpu;
> +	unsigned long e;
>  
>  	switch (delivery_mode) {
>  	case APIC_DM_LOWEST:
> @@ -731,7 +732,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  	case APIC_DM_INIT:
>  		if (!trig_mode || level) {
>  			result = 1;
> -			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> +			e = apic->pending_events;
> +			while (!test_bit(KVM_APIC_INIT, &e))
> +				e = cmpxchg(&apic->pending_events, e,
> +					    (e | (1UL << KVM_APIC_INIT)) &
> +					    ~(1UL << KVM_APIC_SIPI));
Can you please add a comment explaining why this is needed?
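
One plausible rationale, inferred from the code rather than stated
here: an INIT must also atomically cancel a SIPI that is already
pending. Annotated, the loop reads:

    unsigned long e = apic->pending_events;

    /* Retry until KVM_APIC_INIT is observed set: each cmpxchg sets the
     * INIT bit and clears a possibly pending SIPI bit in one atomic
     * step, so a stale SIPI cannot be delivered after this INIT. */
    while (!test_bit(KVM_APIC_INIT, &e))
            e = cmpxchg(&apic->pending_events, e,
                        (e | (1UL << KVM_APIC_INIT)) &
                        ~(1UL << KVM_APIC_SIPI));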

>  			kvm_make_request(KVM_REQ_EVENT, vcpu);
>  			kvm_vcpu_kick(vcpu);
>  		} else {
> @@ -743,13 +748,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  	case APIC_DM_STARTUP:
>  		apic_debug("SIPI to vcpu %d vector 0x%02x\n",
>  			   vcpu->vcpu_id, vector);
> -		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> -			result = 1;
> -			vcpu->arch.sipi_vector = vector;
> -			vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED;
> -			kvm_make_request(KVM_REQ_EVENT, vcpu);
> -			kvm_vcpu_kick(vcpu);
> -		}
> +		result = 1;
> +		apic->sipi_vector = vector;
> +		/* make sure sipi_vector is visible for the receiver */
> +		smp_wmb();
> +		set_bit(KVM_APIC_SIPI, &apic->pending_events);
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +		kvm_vcpu_kick(vcpu);
>  		break;
>  
>  	case APIC_DM_EXTINT:
> @@ -1860,6 +1865,32 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
>  					 addr);
>  }
>  
> +void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +	unsigned int sipi_vector;
> +
> +	if (!kvm_vcpu_has_lapic(vcpu))
> +		return;
> +
> +	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
> +		kvm_lapic_reset(vcpu);
> +		kvm_vcpu_reset(vcpu);
> +		if (kvm_vcpu_is_bsp(apic->vcpu))
> +			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> +		else
> +			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> +	}
> +	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> +	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> +		sipi_vector = apic->sipi_vector;
> +		pr_debug("vcpu %d received sipi with vector # %x\n",
> +			 vcpu->vcpu_id, sipi_vector);
> +		kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
> +		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> +	}
> +}
> +
>  void kvm_lapic_init(void)
>  {
>  	/* do not patch jump label more than once per second */
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 1676d34..2c721b9 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -5,6 +5,9 @@
>  
>  #include <linux/kvm_host.h>
>  
> +#define KVM_APIC_INIT		0
> +#define KVM_APIC_SIPI		1
> +
>  struct kvm_timer {
>  	struct hrtimer timer;
>  	s64 period; 				/* unit: ns */
> @@ -32,6 +35,8 @@ struct kvm_lapic {
>  	void *regs;
>  	gpa_t vapic_addr;
>  	struct page *vapic_page;
> +	unsigned long pending_events;
> +	unsigned int sipi_vector;
>  };
>  int kvm_create_lapic(struct kvm_vcpu *vcpu);
>  void kvm_free_lapic(struct kvm_vcpu *vcpu);
> @@ -39,6 +44,7 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
>  int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
>  int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
>  int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
> +void kvm_apic_accept_events(struct kvm_vcpu *vcpu);
>  void kvm_lapic_reset(struct kvm_vcpu *vcpu);
>  u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
>  void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
> @@ -158,4 +164,9 @@ void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
>  				struct kvm_lapic_irq *irq,
>  				u64 *eoi_bitmap);
>  
> +static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.apic->pending_events;
> +}
> +
>  #endif
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 907e428..796601a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1199,11 +1199,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu)
>  
>  	init_vmcb(svm);
>  
> -	if (!kvm_vcpu_is_bsp(vcpu)) {
> +	if (!kvm_vcpu_is_bsp(vcpu))
>  		kvm_rip_write(vcpu, 0);
Table 9-1 in the SDM says that after an INIT reset RIP is 0xfff0,
without distinguishing AP from BSP. We should drop any mention of
kvm_vcpu_is_bsp() in the vmx and svm reset code, and things should just
work.

> -		svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
> -		svm->vmcb->save.cs.selector = svm->vcpu.arch.sipi_vector << 8;
> -	}
>  
>  	kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
>  	kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f17cd2a..5b862ed 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4121,10 +4121,6 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>  	seg_setup(VCPU_SREG_CS);
>  	if (kvm_vcpu_is_bsp(&vmx->vcpu))
>  		vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
> -	else {
> -		vmcs_write16(GUEST_CS_SELECTOR, vmx->vcpu.arch.sipi_vector << 8);
> -		vmcs_writel(GUEST_CS_BASE, vmx->vcpu.arch.sipi_vector << 12);
> -	}
>  
>  	seg_setup(VCPU_SREG_DS);
>  	seg_setup(VCPU_SREG_ES);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b891ac3..37c0807 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -162,8 +162,6 @@ u64 __read_mostly host_xcr0;
>  
>  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
>  
> -static void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
> -
>  static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
>  {
>  	int i;
> @@ -2823,10 +2821,9 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>  	events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
>  	events->nmi.pad = 0;
>  
> -	events->sipi_vector = vcpu->arch.sipi_vector;
> +	events->sipi_vector = 0; /* never valid when reporting to user space */
>  
>  	events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
> -			 | KVM_VCPUEVENT_VALID_SIPI_VECTOR
>  			 | KVM_VCPUEVENT_VALID_SHADOW);
>  	memset(&events->reserved, 0, sizeof(events->reserved));
>  }
> @@ -2857,8 +2854,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>  		vcpu->arch.nmi_pending = events->nmi.pending;
>  	kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
>  
> -	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
> -		vcpu->arch.sipi_vector = events->sipi_vector;
> +	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
> +	    kvm_vcpu_has_lapic(vcpu))
> +		vcpu->arch.apic->sipi_vector = events->sipi_vector;
This looks out of place in this patch. Why is it needed?

>  
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  
> @@ -5713,6 +5711,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	}
>  
>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> +		kvm_apic_accept_events(vcpu);
> +		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> +			r = 1;
> +			goto out;
> +		}
> +
>  		inject_pending_event(vcpu);
>  
>  		/* enable NMI/IRQ window open exits if needed */
> @@ -5847,14 +5851,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>  	int r;
>  	struct kvm *kvm = vcpu->kvm;
>  
> -	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) {
> -		pr_debug("vcpu %d received sipi with vector # %x\n",
> -			 vcpu->vcpu_id, vcpu->arch.sipi_vector);
> -		kvm_lapic_reset(vcpu);
> -		kvm_vcpu_reset(vcpu);
> -		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> -	}
> -
>  	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>  	r = vapic_enter(vcpu);
>  	if (r) {
> @@ -5871,8 +5867,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>  			srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>  			kvm_vcpu_block(vcpu);
>  			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> -			if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
> -			{
> +			if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
> +				kvm_apic_accept_events(vcpu);
I think we can drop this. If INIT happens while vcpu is halted it will
become runnable here and kvm_apic_accept_events() will be called in
vcpu_enter_guest().

>  				switch(vcpu->arch.mp_state) {
>  				case KVM_MP_STATE_HALTED:
>  					vcpu->arch.mp_state =
> @@ -5880,7 +5876,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>  				case KVM_MP_STATE_RUNNABLE:
>  					vcpu->arch.apf.halted = false;
>  					break;
> -				case KVM_MP_STATE_SIPI_RECEIVED:
> +				case KVM_MP_STATE_INIT_RECEIVED:
> +					break;
>  				default:
>  					r = -EINTR;
>  					break;
> @@ -6015,6 +6012,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>  
>  	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
>  		kvm_vcpu_block(vcpu);
> +		kvm_apic_accept_events(vcpu);
>  		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
>  		r = -EAGAIN;
>  		goto out;
> @@ -6171,6 +6169,7 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
>  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>  				    struct kvm_mp_state *mp_state)
>  {
> +	kvm_apic_accept_events(vcpu);
>  	mp_state->mp_state = vcpu->arch.mp_state;
>  	return 0;
>  }
> @@ -6178,7 +6177,15 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>  int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  				    struct kvm_mp_state *mp_state)
>  {
> -	vcpu->arch.mp_state = mp_state->mp_state;
> +	if (!kvm_vcpu_has_lapic(vcpu) &&
> +	    mp_state->mp_state != KVM_MP_STATE_RUNNABLE)
> +		return -EINVAL;
> +
> +	if (mp_state->mp_state == KVM_MP_STATE_SIPI_RECEIVED) {
> +		vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> +		set_bit(KVM_APIC_SIPI, &vcpu->arch.apic->pending_events);
> +	} else
> +		vcpu->arch.mp_state = mp_state->mp_state;
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  	return 0;
>  }
> @@ -6515,7 +6522,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->vcpu_free(vcpu);
>  }
>  
> -static void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
> +void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>  {
>  	atomic_set(&vcpu->arch.nmi_queued, 0);
>  	vcpu->arch.nmi_pending = 0;
> @@ -6545,6 +6552,16 @@ static void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->vcpu_reset(vcpu);
>  }
>  
> +void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector)
> +{
> +	struct kvm_segment cs;
> +
> +	kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
> +	cs.selector = vector << 8;
> +	cs.base = vector << 12;
> +	kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
> +}
> +
>  int kvm_arch_hardware_enable(void *garbage)
>  {
>  	struct kvm *kvm;
> @@ -6988,7 +7005,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>  		!vcpu->arch.apf.halted)
>  		|| !list_empty_careful(&vcpu->async_pf.done)
> -		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
> +		|| kvm_apic_has_events(vcpu)
>  		|| atomic_read(&vcpu->arch.nmi_queued) ||
>  		(kvm_arch_interrupt_allowed(vcpu) &&
>  		 kvm_cpu_has_interrupt(vcpu));
> -- 
> 1.7.3.4

--
			Gleb.
Paolo Bonzini March 13, 2013, 11:10 a.m. UTC | #2
Il 13/03/2013 09:50, Jan Kiszka ha scritto:
> A VCPU sending INIT or SIPI to some other VCPU races to set the
> remote VCPU's mp_state. If we were unlucky, KVM_MP_STATE_INIT_RECEIVED
> was overwritten by kvm_emulate_halt and, thus, got lost.
> 
> This introduces APIC events for those two signals, keeping them in
> kvm_apic until kvm_apic_accept_events is run in the context of the
> target vcpu. kvm_apic_has_events reports to kvm_arch_vcpu_runnable
> whether there are pending events and, thus, whether vcpu blocking
> should end.
> 
> The patch comes with the side effect of effectively obsoleting
> KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
> immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
> The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
> state. That also means we no longer exit to user space after receiving a
> SIPI event.
> 
> Furthermore, we already reset the VCPU on INIT, only fixing up the code
> segment later on when SIPI arrives. Moreover, we fix INIT handling for
> the BSP: it never enters wait-for-SIPI but directly starts over on INIT.

Some comments inline; I may add more once I put it to use...

Paolo

> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    3 +-
>  arch/x86/kvm/lapic.c            |   47 ++++++++++++++++++++++++++-----
>  arch/x86/kvm/lapic.h            |   11 +++++++
>  arch/x86/kvm/svm.c              |    5 +---
>  arch/x86/kvm/vmx.c              |    4 ---
>  arch/x86/kvm/x86.c              |   57 +++++++++++++++++++++++++-------------
>  6 files changed, 90 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 348d859..ef7f4a5 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -345,7 +345,6 @@ struct kvm_vcpu_arch {
>  	unsigned long apic_attention;
>  	int32_t apic_arb_prio;
>  	int mp_state;
> -	int sipi_vector;
>  	u64 ia32_misc_enable_msr;
>  	bool tpr_access_reporting;
>  
> @@ -819,6 +818,7 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
>  
>  void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
>  int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
> +void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector);
>  
>  int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
>  		    int reason, bool has_error_code, u32 error_code);
> @@ -1002,6 +1002,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
>  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
>  int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
> +void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>  
>  void kvm_define_shared_msr(unsigned index, u32 msr);
>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 02b51dd..6c16230 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -685,6 +685,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  {
>  	int result = 0;
>  	struct kvm_vcpu *vcpu = apic->vcpu;
> +	unsigned long e;
>  
>  	switch (delivery_mode) {
>  	case APIC_DM_LOWEST:
> @@ -731,7 +732,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  	case APIC_DM_INIT:
>  		if (!trig_mode || level) {
>  			result = 1;
> -			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> +			e = apic->pending_events;
> +			while (!test_bit(KVM_APIC_INIT, &e))
> +				e = cmpxchg(&apic->pending_events, e,
> +					    (e | (1UL << KVM_APIC_INIT)) &
> +					    ~(1UL << KVM_APIC_SIPI));
>  			kvm_make_request(KVM_REQ_EVENT, vcpu);
>  			kvm_vcpu_kick(vcpu);
>  		} else {
> @@ -743,13 +748,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  	case APIC_DM_STARTUP:
>  		apic_debug("SIPI to vcpu %d vector 0x%02x\n",
>  			   vcpu->vcpu_id, vector);
> -		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> -			result = 1;
> -			vcpu->arch.sipi_vector = vector;
> -			vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED;
> -			kvm_make_request(KVM_REQ_EVENT, vcpu);
> -			kvm_vcpu_kick(vcpu);
> -		}
> +		result = 1;
> +		apic->sipi_vector = vector;
> +		/* make sure sipi_vector is visible for the receiver */
> +		smp_wmb();
> +		set_bit(KVM_APIC_SIPI, &apic->pending_events);
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +		kvm_vcpu_kick(vcpu);
>  		break;
>  
>  	case APIC_DM_EXTINT:
> @@ -1860,6 +1865,32 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
>  					 addr);
>  }
>  
> +void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +	unsigned int sipi_vector;
> +
> +	if (!kvm_vcpu_has_lapic(vcpu))
> +		return;
> +
> +	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
> +		kvm_lapic_reset(vcpu);
> +		kvm_vcpu_reset(vcpu);
> +		if (kvm_vcpu_is_bsp(apic->vcpu))
> +			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> +		else
> +			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> +	}
> +	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> +	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {

smp_rmb() missing here.
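
For reference, a sketch of the reader side with the pairing barrier
added (placement assumed right before the sipi_vector load):

    if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
        vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
            /* pairs with the smp_wmb() in __apic_accept_irq() */
            smp_rmb();
            sipi_vector = apic->sipi_vector;
            kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
            vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
    }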

> +		sipi_vector = apic->sipi_vector;
> +		pr_debug("vcpu %d received sipi with vector # %x\n",
> +			 vcpu->vcpu_id, sipi_vector);
> +		kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
> +		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> +	}
> +}
> +
>  void kvm_lapic_init(void)
>  {
>  	/* do not patch jump label more than once per second */
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 1676d34..2c721b9 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -5,6 +5,9 @@
>  
>  #include <linux/kvm_host.h>
>  
> +#define KVM_APIC_INIT		0
> +#define KVM_APIC_SIPI		1
> +
>  struct kvm_timer {
>  	struct hrtimer timer;
>  	s64 period; 				/* unit: ns */
> @@ -32,6 +35,8 @@ struct kvm_lapic {
>  	void *regs;
>  	gpa_t vapic_addr;
>  	struct page *vapic_page;
> +	unsigned long pending_events;
> +	unsigned int sipi_vector;
>  };
>  int kvm_create_lapic(struct kvm_vcpu *vcpu);
>  void kvm_free_lapic(struct kvm_vcpu *vcpu);
> @@ -39,6 +44,7 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
>  int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
>  int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
>  int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
> +void kvm_apic_accept_events(struct kvm_vcpu *vcpu);
>  void kvm_lapic_reset(struct kvm_vcpu *vcpu);
>  u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
>  void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
> @@ -158,4 +164,9 @@ void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
>  				struct kvm_lapic_irq *irq,
>  				u64 *eoi_bitmap);
>  
> +static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.apic->pending_events;
> +}
> +
>  #endif
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 907e428..796601a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1199,11 +1199,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu)
>  
>  	init_vmcb(svm);
>  
> -	if (!kvm_vcpu_is_bsp(vcpu)) {
> +	if (!kvm_vcpu_is_bsp(vcpu))
>  		kvm_rip_write(vcpu, 0);

Like Gleb said, this perhaps can be moved to
kvm_vcpu_deliver_sipi_vector.  There is a similar "if" in vmx.c.

> -		svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
> -		svm->vmcb->save.cs.selector = svm->vcpu.arch.sipi_vector << 8;
> -	}
>  
>  	kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
>  	kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f17cd2a..5b862ed 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4121,10 +4121,6 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>  	seg_setup(VCPU_SREG_CS);
>  	if (kvm_vcpu_is_bsp(&vmx->vcpu))
>  		vmcs_write16(GUEST_CS_SELECTOR, 0xf000);

I think there's no place where we set the base to 0xffff0000, but I'll
fix that in a follow-up.

> -	else {
> -		vmcs_write16(GUEST_CS_SELECTOR, vmx->vcpu.arch.sipi_vector << 8);
> -		vmcs_writel(GUEST_CS_BASE, vmx->vcpu.arch.sipi_vector << 12);
> -	}
>  
>  	seg_setup(VCPU_SREG_DS);
>  	seg_setup(VCPU_SREG_ES);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b891ac3..37c0807 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -162,8 +162,6 @@ u64 __read_mostly host_xcr0;
>  
>  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
>  
> -static void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
> -
>  static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
>  {
>  	int i;
> @@ -2823,10 +2821,9 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>  	events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
>  	events->nmi.pad = 0;
>  
> -	events->sipi_vector = vcpu->arch.sipi_vector;
> +	events->sipi_vector = 0; /* never valid when reporting to user space */

In fact arch.sipi_vector is now unused, isn't it?

>  
>  	events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
> -			 | KVM_VCPUEVENT_VALID_SIPI_VECTOR
>  			 | KVM_VCPUEVENT_VALID_SHADOW);
>  	memset(&events->reserved, 0, sizeof(events->reserved));
>  }
> @@ -2857,8 +2854,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>  		vcpu->arch.nmi_pending = events->nmi.pending;
>  	kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
>  
> -	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
> -		vcpu->arch.sipi_vector = events->sipi_vector;
> +	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
> +	    kvm_vcpu_has_lapic(vcpu))
> +		vcpu->arch.apic->sipi_vector = events->sipi_vector;
>  
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  
> @@ -5713,6 +5711,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	}
>  
>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> +		kvm_apic_accept_events(vcpu);
> +		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> +			r = 1;
> +			goto out;
> +		}
> +
>  		inject_pending_event(vcpu);
>  
>  		/* enable NMI/IRQ window open exits if needed */
> @@ -5847,14 +5851,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>  	int r;
>  	struct kvm *kvm = vcpu->kvm;
>  
> -	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) {
> -		pr_debug("vcpu %d received sipi with vector # %x\n",
> -			 vcpu->vcpu_id, vcpu->arch.sipi_vector);
> -		kvm_lapic_reset(vcpu);
> -		kvm_vcpu_reset(vcpu);
> -		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> -	}
> -
>  	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>  	r = vapic_enter(vcpu);
>  	if (r) {
> @@ -5871,8 +5867,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>  			srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>  			kvm_vcpu_block(vcpu);
>  			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> -			if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
> -			{
> +			if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
> +				kvm_apic_accept_events(vcpu);
>  				switch(vcpu->arch.mp_state) {
>  				case KVM_MP_STATE_HALTED:
>  					vcpu->arch.mp_state =
> @@ -5880,7 +5876,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>  				case KVM_MP_STATE_RUNNABLE:
>  					vcpu->arch.apf.halted = false;
>  					break;
> -				case KVM_MP_STATE_SIPI_RECEIVED:
> +				case KVM_MP_STATE_INIT_RECEIVED:
> +					break;
>  				default:
>  					r = -EINTR;
>  					break;
> @@ -6015,6 +6012,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)

So the async page faults turned out not to be a problem?  Should we
still return false from interrupt_allowed when in INIT_RECEIVED state?
(A separate patch, perhaps).

>  
>  	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
>  		kvm_vcpu_block(vcpu);
> +		kvm_apic_accept_events(vcpu);
>  		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
>  		r = -EAGAIN;
>  		goto out;
> @@ -6171,6 +6169,7 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
>  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>  				    struct kvm_mp_state *mp_state)
>  {
> +	kvm_apic_accept_events(vcpu);
>  	mp_state->mp_state = vcpu->arch.mp_state;
>  	return 0;
>  }
> @@ -6178,7 +6177,15 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>  int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  				    struct kvm_mp_state *mp_state)
>  {
> -	vcpu->arch.mp_state = mp_state->mp_state;
> +	if (!kvm_vcpu_has_lapic(vcpu) &&
> +	    mp_state->mp_state != KVM_MP_STATE_RUNNABLE)
> +		return -EINVAL;
> +
> +	if (mp_state->mp_state == KVM_MP_STATE_SIPI_RECEIVED) {
> +		vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> +		set_bit(KVM_APIC_SIPI, &vcpu->arch.apic->pending_events);
> +	} else
> +		vcpu->arch.mp_state = mp_state->mp_state;
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  	return 0;
>  }
> @@ -6515,7 +6522,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->vcpu_free(vcpu);
>  }
>  
> -static void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
> +void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>  {
>  	atomic_set(&vcpu->arch.nmi_queued, 0);
>  	vcpu->arch.nmi_pending = 0;
> @@ -6545,6 +6552,16 @@ static void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->vcpu_reset(vcpu);
>  }
>  
> +void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector)
> +{
> +	struct kvm_segment cs;
> +
> +	kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
> +	cs.selector = vector << 8;
> +	cs.base = vector << 12;
> +	kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);

As mentioned above, I think you can set rip to 0 here?
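
A sketch of that suggestion (assuming the RIP handling otherwise moves
out of the vendor reset paths):

    void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector)
    {
            struct kvm_segment cs;

            kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
            cs.selector = vector << 8;
            cs.base = vector << 12;
            kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
            kvm_rip_write(vcpu, 0);
    }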

> +}
> +
>  int kvm_arch_hardware_enable(void *garbage)
>  {
>  	struct kvm *kvm;
> @@ -6988,7 +7005,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>  		!vcpu->arch.apf.halted)
>  		|| !list_empty_careful(&vcpu->async_pf.done)
> -		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
> +		|| kvm_apic_has_events(vcpu)
>  		|| atomic_read(&vcpu->arch.nmi_queued) ||
>  		(kvm_arch_interrupt_allowed(vcpu) &&
>  		 kvm_cpu_has_interrupt(vcpu));
> 

Jan Kiszka March 13, 2013, 11:16 a.m. UTC | #3
On 2013-03-13 11:31, Gleb Natapov wrote:
> On Wed, Mar 13, 2013 at 09:50:55AM +0100, Jan Kiszka wrote:
>> A VCPU sending INIT or SIPI to some other VCPU races to set the
>> remote VCPU's mp_state. If we were unlucky, KVM_MP_STATE_INIT_RECEIVED
>> was overwritten by kvm_emulate_halt and, thus, got lost.
>>
>> This introduces APIC events for those two signals, keeping them in
>> kvm_apic until kvm_apic_accept_events is run in the context of the
>> target vcpu. kvm_apic_has_events reports to kvm_arch_vcpu_runnable
>> whether there are pending events and, thus, whether vcpu blocking
>> should end.
>>
>> The patch comes with the side effect of effectively obsoleting
>> KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
>> immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
>> The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
>> state. That also means we no longer exit to user space after receiving a
>> SIPI event.
>>
>> Furthermore, we already reset the VCPU on INIT, only fixing up the code
>> segment later on when SIPI arrives. Moreover, we fix INIT handling for
>> the BSP: it never enters wait-for-SIPI but directly starts over on INIT.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |    3 +-
>>  arch/x86/kvm/lapic.c            |   47 ++++++++++++++++++++++++++-----
>>  arch/x86/kvm/lapic.h            |   11 +++++++
>>  arch/x86/kvm/svm.c              |    5 +---
>>  arch/x86/kvm/vmx.c              |    4 ---
>>  arch/x86/kvm/x86.c              |   57 +++++++++++++++++++++++++-------------
>>  6 files changed, 90 insertions(+), 37 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 348d859..ef7f4a5 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -345,7 +345,6 @@ struct kvm_vcpu_arch {
>>       unsigned long apic_attention;
>>       int32_t apic_arb_prio;
>>       int mp_state;
>> -     int sipi_vector;
>>       u64 ia32_misc_enable_msr;
>>       bool tpr_access_reporting;
>>
>> @@ -819,6 +818,7 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
>>
>>  void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
>>  int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
>> +void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector);
>>
>>  int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
>>                   int reason, bool has_error_code, u32 error_code);
>> @@ -1002,6 +1002,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
>>  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
>>  int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
>> +void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>>
>>  void kvm_define_shared_msr(unsigned index, u32 msr);
>>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 02b51dd..6c16230 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -685,6 +685,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>>  {
>>       int result = 0;
>>       struct kvm_vcpu *vcpu = apic->vcpu;
>> +     unsigned long e;
>>
>>       switch (delivery_mode) {
>>       case APIC_DM_LOWEST:
>> @@ -731,7 +732,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>>       case APIC_DM_INIT:
>>               if (!trig_mode || level) {
>>                       result = 1;
>> -                     vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>> +                     e = apic->pending_events;
>> +                     while (!test_bit(KVM_APIC_INIT, &e))
>> +                             e = cmpxchg(&apic->pending_events, e,
>> +                                         (e | (1UL << KVM_APIC_INIT)) &
>> +                                         ~(1UL << KVM_APIC_SIPI));
> Can you please add a comment explaining why this is needed?

Even better, I'll remove it (see the other thread).

> 
>>                       kvm_make_request(KVM_REQ_EVENT, vcpu);
>>                       kvm_vcpu_kick(vcpu);
>>               } else {
>> @@ -743,13 +748,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>>       case APIC_DM_STARTUP:
>>               apic_debug("SIPI to vcpu %d vector 0x%02x\n",
>>                          vcpu->vcpu_id, vector);
>> -             if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>> -                     result = 1;
>> -                     vcpu->arch.sipi_vector = vector;
>> -                     vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED;
>> -                     kvm_make_request(KVM_REQ_EVENT, vcpu);
>> -                     kvm_vcpu_kick(vcpu);
>> -             }
>> +             result = 1;
>> +             apic->sipi_vector = vector;
>> +             /* make sure sipi_vector is visible for the receiver */
>> +             smp_wmb();
>> +             set_bit(KVM_APIC_SIPI, &apic->pending_events);
>> +             kvm_make_request(KVM_REQ_EVENT, vcpu);
>> +             kvm_vcpu_kick(vcpu);
>>               break;
>>
>>       case APIC_DM_EXTINT:
>> @@ -1860,6 +1865,32 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
>>                                        addr);
>>  }
>>
>> +void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>> +{
>> +     struct kvm_lapic *apic = vcpu->arch.apic;
>> +     unsigned int sipi_vector;
>> +
>> +     if (!kvm_vcpu_has_lapic(vcpu))
>> +             return;
>> +
>> +     if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
>> +             kvm_lapic_reset(vcpu);
>> +             kvm_vcpu_reset(vcpu);
>> +             if (kvm_vcpu_is_bsp(apic->vcpu))
>> +                     vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>> +             else
>> +                     vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>> +     }
>> +     if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>> +         vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>> +             sipi_vector = apic->sipi_vector;
>> +             pr_debug("vcpu %d received sipi with vector # %x\n",
>> +                      vcpu->vcpu_id, sipi_vector);
>> +             kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
>> +             vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>> +     }
>> +}
>> +
>>  void kvm_lapic_init(void)
>>  {
>>       /* do not patch jump label more than once per second */
>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>> index 1676d34..2c721b9 100644
>> --- a/arch/x86/kvm/lapic.h
>> +++ b/arch/x86/kvm/lapic.h
>> @@ -5,6 +5,9 @@
>>
>>  #include <linux/kvm_host.h>
>>
>> +#define KVM_APIC_INIT                0
>> +#define KVM_APIC_SIPI                1
>> +
>>  struct kvm_timer {
>>       struct hrtimer timer;
>>       s64 period;                             /* unit: ns */
>> @@ -32,6 +35,8 @@ struct kvm_lapic {
>>       void *regs;
>>       gpa_t vapic_addr;
>>       struct page *vapic_page;
>> +     unsigned long pending_events;
>> +     unsigned int sipi_vector;
>>  };
>>  int kvm_create_lapic(struct kvm_vcpu *vcpu);
>>  void kvm_free_lapic(struct kvm_vcpu *vcpu);
>> @@ -39,6 +44,7 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
>>  int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
>>  int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
>>  int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
>> +void kvm_apic_accept_events(struct kvm_vcpu *vcpu);
>>  void kvm_lapic_reset(struct kvm_vcpu *vcpu);
>>  u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
>>  void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
>> @@ -158,4 +164,9 @@ void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
>>                               struct kvm_lapic_irq *irq,
>>                               u64 *eoi_bitmap);
>>
>> +static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
>> +{
>> +     return vcpu->arch.apic->pending_events;
>> +}
>> +
>>  #endif
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 907e428..796601a 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -1199,11 +1199,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu)
>>
>>       init_vmcb(svm);
>>
>> -     if (!kvm_vcpu_is_bsp(vcpu)) {
>> +     if (!kvm_vcpu_is_bsp(vcpu))
>>               kvm_rip_write(vcpu, 0);
> Table 9-1 in the SDM says that after an INIT reset RIP is 0xfff0,
> without distinguishing AP from BSP. We should drop any mention of
> kvm_vcpu_is_bsp() in the vmx and svm reset code, and things should
> just work.

The SDM says that APs start up at 0x000VV000 (with VV == the SIPI
vector) - this implies RIP is 0. I suppose no SMP guest would boot if
we changed this.
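
Worked through for a hypothetical SIPI vector of 0x9a:

    /* cs.selector = 0x9a << 8  = 0x9a00
     * cs.base     = 0x9a << 12 = 0x9a000
     * rip         = 0
     * AP entry    = cs.base + rip = 0x0009a000  (= 0x000VV000)
     */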

> 
>> -             svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
>> -             svm->vmcb->save.cs.selector = svm->vcpu.arch.sipi_vector << 8;
>> -     }
>>
>>       kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
>>       kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index f17cd2a..5b862ed 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -4121,10 +4121,6 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>>       seg_setup(VCPU_SREG_CS);
>>       if (kvm_vcpu_is_bsp(&vmx->vcpu))
>>               vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
>> -     else {
>> -             vmcs_write16(GUEST_CS_SELECTOR, vmx->vcpu.arch.sipi_vector << 8);
>> -             vmcs_writel(GUEST_CS_BASE, vmx->vcpu.arch.sipi_vector << 12);
>> -     }
>>
>>       seg_setup(VCPU_SREG_DS);
>>       seg_setup(VCPU_SREG_ES);
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b891ac3..37c0807 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -162,8 +162,6 @@ u64 __read_mostly host_xcr0;
>>
>>  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
>>
>> -static void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>> -
>>  static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
>>  {
>>       int i;
>> @@ -2823,10 +2821,9 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>>       events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
>>       events->nmi.pad = 0;
>>
>> -     events->sipi_vector = vcpu->arch.sipi_vector;
>> +     events->sipi_vector = 0; /* never valid when reporting to user space */
>>
>>       events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
>> -                      | KVM_VCPUEVENT_VALID_SIPI_VECTOR
>>                        | KVM_VCPUEVENT_VALID_SHADOW);
>>       memset(&events->reserved, 0, sizeof(events->reserved));
>>  }
>> @@ -2857,8 +2854,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>>               vcpu->arch.nmi_pending = events->nmi.pending;
>>       kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
>>
>> -     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
>> -             vcpu->arch.sipi_vector = events->sipi_vector;
>> +     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
>> +         kvm_vcpu_has_lapic(vcpu))
>> +             vcpu->arch.apic->sipi_vector = events->sipi_vector;
> This looks out of place in this patch. Why is it needed?

It is required as long as we support MP_STATE_SIPI_RECEIVED as input
from user space.
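
As a sketch, the legacy restore path then looks like this (the
userspace ordering is a hypothetical example):

    /* old-style migration restore:
     *   KVM_SET_VCPU_EVENTS (KVM_VCPUEVENT_VALID_SIPI_VECTOR set)
     *       -> apic->sipi_vector = events->sipi_vector
     *   KVM_SET_MP_STATE with KVM_MP_STATE_SIPI_RECEIVED
     *       -> mp_state = KVM_MP_STATE_INIT_RECEIVED
     *       -> set_bit(KVM_APIC_SIPI, &apic->pending_events)
     *   next KVM_RUN
     *       -> kvm_apic_accept_events() consumes the stored vector
     */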

> 
>>
>>       kvm_make_request(KVM_REQ_EVENT, vcpu);
>>
>> @@ -5713,6 +5711,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>       }
>>
>>       if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>> +             kvm_apic_accept_events(vcpu);
>> +             if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>> +                     r = 1;
>> +                     goto out;
>> +             }
>> +
>>               inject_pending_event(vcpu);
>>
>>               /* enable NMI/IRQ window open exits if needed */
>> @@ -5847,14 +5851,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>       int r;
>>       struct kvm *kvm = vcpu->kvm;
>>
>> -     if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) {
>> -             pr_debug("vcpu %d received sipi with vector # %x\n",
>> -                      vcpu->vcpu_id, vcpu->arch.sipi_vector);
>> -             kvm_lapic_reset(vcpu);
>> -             kvm_vcpu_reset(vcpu);
>> -             vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>> -     }
>> -
>>       vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>>       r = vapic_enter(vcpu);
>>       if (r) {
>> @@ -5871,8 +5867,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>                       srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>>                       kvm_vcpu_block(vcpu);
>>                       vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>> -                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
>> -                     {
>> +                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
>> +                             kvm_apic_accept_events(vcpu);
> I think we can drop this. If INIT happens while vcpu is halted it will
> become runnable here and kvm_apic_accept_events() will be called in
> vcpu_enter_guest().

I'm not that sure, but I will recheck carefully.

Jan
Jan Kiszka March 13, 2013, 11:20 a.m. UTC | #4
On 2013-03-13 12:10, Paolo Bonzini wrote:
> Il 13/03/2013 09:50, Jan Kiszka ha scritto:
>> A VCPU sending INIT or SIPI to some other VCPU races to set the
>> remote VCPU's mp_state. If we were unlucky, KVM_MP_STATE_INIT_RECEIVED
>> was overwritten by kvm_emulate_halt and, thus, got lost.
>>
>> This introduces APIC events for those two signals, keeping them in
>> kvm_apic until kvm_apic_accept_events is run in the context of the
>> target vcpu. kvm_apic_has_events reports to kvm_arch_vcpu_runnable
>> whether there are pending events and, thus, whether vcpu blocking
>> should end.
>>
>> The patch comes with the side effect of effectively obsoleting
>> KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
>> immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
>> The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
>> state. That also means we no longer exit to user space after receiving a
>> SIPI event.
>>
>> Furthermore, we already reset the VCPU on INIT, only fixing up the code
>> segment later on when SIPI arrives. Moreover, we fix INIT handling for
>> the BSP: it never enters wait-for-SIPI but directly starts over on INIT.
> 
> Some comments inline; may add more once I put it to use...
> 
> Paolo
> 
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |    3 +-
>>  arch/x86/kvm/lapic.c            |   47 ++++++++++++++++++++++++++-----
>>  arch/x86/kvm/lapic.h            |   11 +++++++
>>  arch/x86/kvm/svm.c              |    5 +---
>>  arch/x86/kvm/vmx.c              |    4 ---
>>  arch/x86/kvm/x86.c              |   57 +++++++++++++++++++++++++-------------
>>  6 files changed, 90 insertions(+), 37 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 348d859..ef7f4a5 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -345,7 +345,6 @@ struct kvm_vcpu_arch {
>>       unsigned long apic_attention;
>>       int32_t apic_arb_prio;
>>       int mp_state;
>> -     int sipi_vector;
>>       u64 ia32_misc_enable_msr;
>>       bool tpr_access_reporting;
>>
>> @@ -819,6 +818,7 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
>>
>>  void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
>>  int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
>> +void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector);
>>
>>  int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
>>                   int reason, bool has_error_code, u32 error_code);
>> @@ -1002,6 +1002,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
>>  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
>>  int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
>> +void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>>
>>  void kvm_define_shared_msr(unsigned index, u32 msr);
>>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 02b51dd..6c16230 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -685,6 +685,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>>  {
>>       int result = 0;
>>       struct kvm_vcpu *vcpu = apic->vcpu;
>> +     unsigned long e;
>>
>>       switch (delivery_mode) {
>>       case APIC_DM_LOWEST:
>> @@ -731,7 +732,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>>       case APIC_DM_INIT:
>>               if (!trig_mode || level) {
>>                       result = 1;
>> -                     vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>> +                     e = apic->pending_events;
>> +                     while (!test_bit(KVM_APIC_INIT, &e))
>> +                             e = cmpxchg(&apic->pending_events, e,
>> +                                         (e | (1UL << KVM_APIC_INIT)) &
>> +                                         ~(1UL << KVM_APIC_SIPI));
>>                       kvm_make_request(KVM_REQ_EVENT, vcpu);
>>                       kvm_vcpu_kick(vcpu);
>>               } else {
>> @@ -743,13 +748,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>>       case APIC_DM_STARTUP:
>>               apic_debug("SIPI to vcpu %d vector 0x%02x\n",
>>                          vcpu->vcpu_id, vector);
>> -             if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>> -                     result = 1;
>> -                     vcpu->arch.sipi_vector = vector;
>> -                     vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED;
>> -                     kvm_make_request(KVM_REQ_EVENT, vcpu);
>> -                     kvm_vcpu_kick(vcpu);
>> -             }
>> +             result = 1;
>> +             apic->sipi_vector = vector;
>> +             /* make sure sipi_vector is visible for the receiver */
>> +             smp_wmb();
>> +             set_bit(KVM_APIC_SIPI, &apic->pending_events);
>> +             kvm_make_request(KVM_REQ_EVENT, vcpu);
>> +             kvm_vcpu_kick(vcpu);
>>               break;
>>
>>       case APIC_DM_EXTINT:
>> @@ -1860,6 +1865,32 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
>>                                        addr);
>>  }
>>
>> +void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>> +{
>> +     struct kvm_lapic *apic = vcpu->arch.apic;
>> +     unsigned int sipi_vector;
>> +
>> +     if (!kvm_vcpu_has_lapic(vcpu))
>> +             return;
>> +
>> +     if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
>> +             kvm_lapic_reset(vcpu);
>> +             kvm_vcpu_reset(vcpu);
>> +             if (kvm_vcpu_is_bsp(apic->vcpu))
>> +                     vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>> +             else
>> +                     vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>> +     }
>> +     if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>> +         vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> 
> smp_rmb() missing here.

OK.

> 
>> +             sipi_vector = apic->sipi_vector;
>> +             pr_debug("vcpu %d received sipi with vector # %x\n",
>> +                      vcpu->vcpu_id, sipi_vector);
>> +             kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
>> +             vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>> +     }
>> +}
>> +
>>  void kvm_lapic_init(void)
>>  {
>>       /* do not patch jump label more than once per second */
>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>> index 1676d34..2c721b9 100644
>> --- a/arch/x86/kvm/lapic.h
>> +++ b/arch/x86/kvm/lapic.h
>> @@ -5,6 +5,9 @@
>>
>>  #include <linux/kvm_host.h>
>>
>> +#define KVM_APIC_INIT                0
>> +#define KVM_APIC_SIPI                1
>> +
>>  struct kvm_timer {
>>       struct hrtimer timer;
>>       s64 period;                             /* unit: ns */
>> @@ -32,6 +35,8 @@ struct kvm_lapic {
>>       void *regs;
>>       gpa_t vapic_addr;
>>       struct page *vapic_page;
>> +     unsigned long pending_events;
>> +     unsigned int sipi_vector;
>>  };
>>  int kvm_create_lapic(struct kvm_vcpu *vcpu);
>>  void kvm_free_lapic(struct kvm_vcpu *vcpu);
>> @@ -39,6 +44,7 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
>>  int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
>>  int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
>>  int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
>> +void kvm_apic_accept_events(struct kvm_vcpu *vcpu);
>>  void kvm_lapic_reset(struct kvm_vcpu *vcpu);
>>  u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
>>  void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
>> @@ -158,4 +164,9 @@ void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
>>                               struct kvm_lapic_irq *irq,
>>                               u64 *eoi_bitmap);
>>
>> +static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
>> +{
>> +     return vcpu->arch.apic->pending_events;
>> +}
>> +
>>  #endif
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 907e428..796601a 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -1199,11 +1199,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu)
>>
>>       init_vmcb(svm);
>>
>> -     if (!kvm_vcpu_is_bsp(vcpu)) {
>> +     if (!kvm_vcpu_is_bsp(vcpu))
>>               kvm_rip_write(vcpu, 0);
> 
> Like Gleb said, this perhaps can be moved to
> kvm_vcpu_deliver_sipi_vector.  There is a similar "if" in vmx.c.

Well, moving it to deliver_sipi is not the same as dropping it.

> 
>> -             svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
>> -             svm->vmcb->save.cs.selector = svm->vcpu.arch.sipi_vector << 8;
>> -     }
>>
>>       kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
>>       kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index f17cd2a..5b862ed 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -4121,10 +4121,6 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>>       seg_setup(VCPU_SREG_CS);
>>       if (kvm_vcpu_is_bsp(&vmx->vcpu))
>>               vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
> 
> I think there's no place where we set the base to 0xffff0000, but I'll
> fix that in a follow-up.

The base is set to 0xf0000 (a VMX specialty) somewhere when dealing
with real mode - IIRC. Check that first.
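
The real-mode arithmetic in question, worked out as a sketch:

    /* real-address mode: cs.base = cs.selector << 4
     *   selector = 0xf000 -> base = 0xf0000
     *   entry    = base + rip = 0xf0000 + 0xfff0 = 0xffff0
     * the architectural power-on base of 0xffff0000 (entry
     * 0xfffffff0) cannot be reproduced by a real-mode CS reload. */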

> 
>> -     else {
>> -             vmcs_write16(GUEST_CS_SELECTOR, vmx->vcpu.arch.sipi_vector << 8);
>> -             vmcs_writel(GUEST_CS_BASE, vmx->vcpu.arch.sipi_vector << 12);
>> -     }
>>
>>       seg_setup(VCPU_SREG_DS);
>>       seg_setup(VCPU_SREG_ES);
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b891ac3..37c0807 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -162,8 +162,6 @@ u64 __read_mostly host_xcr0;
>>
>>  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
>>
>> -static void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>> -
>>  static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
>>  {
>>       int i;
>> @@ -2823,10 +2821,9 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>>       events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
>>       events->nmi.pad = 0;
>>
>> -     events->sipi_vector = vcpu->arch.sipi_vector;
>> +     events->sipi_vector = 0; /* never valid when reporting to user space */
> 
> In fact arch.sipi_vector is now unused, isn't it?

That's why I removed it, yes.

> 
>>
>>       events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
>> -                      | KVM_VCPUEVENT_VALID_SIPI_VECTOR
>>                        | KVM_VCPUEVENT_VALID_SHADOW);
>>       memset(&events->reserved, 0, sizeof(events->reserved));
>>  }
>> @@ -2857,8 +2854,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>>               vcpu->arch.nmi_pending = events->nmi.pending;
>>       kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
>>
>> -     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
>> -             vcpu->arch.sipi_vector = events->sipi_vector;
>> +     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
>> +         kvm_vcpu_has_lapic(vcpu))
>> +             vcpu->arch.apic->sipi_vector = events->sipi_vector;
>>
>>       kvm_make_request(KVM_REQ_EVENT, vcpu);
>>
>> @@ -5713,6 +5711,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>       }
>>
>>       if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>> +             kvm_apic_accept_events(vcpu);
>> +             if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>> +                     r = 1;
>> +                     goto out;
>> +             }
>> +
>>               inject_pending_event(vcpu);
>>
>>               /* enable NMI/IRQ window open exits if needed */
>> @@ -5847,14 +5851,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>       int r;
>>       struct kvm *kvm = vcpu->kvm;
>>
>> -     if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) {
>> -             pr_debug("vcpu %d received sipi with vector # %x\n",
>> -                      vcpu->vcpu_id, vcpu->arch.sipi_vector);
>> -             kvm_lapic_reset(vcpu);
>> -             kvm_vcpu_reset(vcpu);
>> -             vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>> -     }
>> -
>>       vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>>       r = vapic_enter(vcpu);
>>       if (r) {
>> @@ -5871,8 +5867,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>                       srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>>                       kvm_vcpu_block(vcpu);
>>                       vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>> -                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
>> -                     {
>> +                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
>> +                             kvm_apic_accept_events(vcpu);
>>                               switch(vcpu->arch.mp_state) {
>>                               case KVM_MP_STATE_HALTED:
>>                                       vcpu->arch.mp_state =
>> @@ -5880,7 +5876,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>                               case KVM_MP_STATE_RUNNABLE:
>>                                       vcpu->arch.apf.halted = false;
>>                                       break;
>> -                             case KVM_MP_STATE_SIPI_RECEIVED:
>> +                             case KVM_MP_STATE_INIT_RECEIVED:
>> +                                     break;
>>                               default:
>>                                       r = -EINTR;
>>                                       break;
>> @@ -6015,6 +6012,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> 
> So the async page faults turned out not to be a problem?  Should we
> still return false from interrupt_allowed when in INIT_RECEIVED state?
> (A separate patch, perhaps).

IF is cleared after INIT, so kvm_arch_interrupt_allowed() already returns
false in that state; no need to.

> 
>>
>>       if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
>>               kvm_vcpu_block(vcpu);
>> +             kvm_apic_accept_events(vcpu);
>>               clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
>>               r = -EAGAIN;
>>               goto out;
>> @@ -6171,6 +6169,7 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
>>  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>>                                   struct kvm_mp_state *mp_state)
>>  {
>> +     kvm_apic_accept_events(vcpu);
>>       mp_state->mp_state = vcpu->arch.mp_state;
>>       return 0;
>>  }
>> @@ -6178,7 +6177,15 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>>  int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>>                                   struct kvm_mp_state *mp_state)
>>  {
>> -     vcpu->arch.mp_state = mp_state->mp_state;
>> +     if (!kvm_vcpu_has_lapic(vcpu) &&
>> +         mp_state->mp_state != KVM_MP_STATE_RUNNABLE)
>> +             return -EINVAL;
>> +
>> +     if (mp_state->mp_state == KVM_MP_STATE_SIPI_RECEIVED) {
>> +             vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>> +             set_bit(KVM_APIC_SIPI, &vcpu->arch.apic->pending_events);
>> +     } else
>> +             vcpu->arch.mp_state = mp_state->mp_state;
>>       kvm_make_request(KVM_REQ_EVENT, vcpu);
>>       return 0;
>>  }
>> @@ -6515,7 +6522,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>       kvm_x86_ops->vcpu_free(vcpu);
>>  }
>>
>> -static void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>> +void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>>  {
>>       atomic_set(&vcpu->arch.nmi_queued, 0);
>>       vcpu->arch.nmi_pending = 0;
>> @@ -6545,6 +6552,16 @@ static void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>>       kvm_x86_ops->vcpu_reset(vcpu);
>>  }
>>
>> +void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector)
>> +{
>> +     struct kvm_segment cs;
>> +
>> +     kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
>> +     cs.selector = vector << 8;
>> +     cs.base = vector << 12;
>> +     kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
> 
> As mentioned above, I think you can set rip to 0 here?

Yes, can do.
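
For illustration, the helper with that change folded in might look as
follows; the kvm_rip_write() call is the assumed addition (mirroring what
the svm reset path did so far), the rest is taken from the patch below:

void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector)
{
	struct kvm_segment cs;

	kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
	cs.selector = vector << 8;	/* real-mode CS selector 0xVV00 */
	cs.base = vector << 12;		/* execution starts at 0x000VV000 */
	kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
	kvm_rip_write(vcpu, 0);		/* assumed addition: CS:IP = VV00:0000 */
}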

Thanks,
Jan
Jan Kiszka March 13, 2013, 11:36 a.m. UTC | #5
On 2013-03-13 12:16, Jan Kiszka wrote:
>>> @@ -5871,8 +5867,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>                       srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>>>                       kvm_vcpu_block(vcpu);
>>>                       vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>>> -                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
>>> -                     {
>>> +                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
>>> +                             kvm_apic_accept_events(vcpu);
>> I think we can drop this. If INIT happens while vcpu is halted it will
>> become runnable here and kvm_apic_accept_events() will be called in
>> vcpu_enter_guest().
> 
> I'm not that sure, but I will recheck carefully.

Doesn't work: If the state was INIT_RECEIVED, we will not process the
SIPI but reenter kvm_vcpu_block. And it's more consistent to process the
events here IMHO.
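
To make that concrete, here is the loop from the patch again, with
explanatory comments added (annotation only, no new code):

	if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
		/* fold pending INIT/SIPI events into mp_state first */
		kvm_apic_accept_events(vcpu);
		switch (vcpu->arch.mp_state) {
		case KVM_MP_STATE_HALTED:
			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
			/* fall through */
		case KVM_MP_STATE_RUNNABLE:
			vcpu->arch.apf.halted = false;
			break;
		case KVM_MP_STATE_INIT_RECEIVED:
			break;	/* INIT latched, keep blocking until SIPI */
		default:
			r = -EINTR;
			break;
		}
	}

Without the kvm_apic_accept_events() call at this point, a latched SIPI
would never be folded into mp_state and the vcpu would block forever.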

Jan
Gleb Natapov March 13, 2013, 12:16 p.m. UTC | #6
On Wed, Mar 13, 2013 at 12:16:13PM +0100, Jan Kiszka wrote:
> >> @@ -1199,11 +1199,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu)
> >>
> >>       init_vmcb(svm);
> >>
> >> -     if (!kvm_vcpu_is_bsp(vcpu)) {
> >> +     if (!kvm_vcpu_is_bsp(vcpu))
> >>               kvm_rip_write(vcpu, 0);
> > Table 9-1 in the SDM says that after INIT reset RIP is 0xfff0, without
> > mentioning AP or BSP. We should drop any mention of kvm_vcpu_is_bsp()
> > in the vmx and svm reset code and things should just work.
> 
> SDM says that APs start up at 0x000VV000 (with VV == SIPI vector) - this
> implies RIP is 0. I suppose no SMP guest would boot if we change this.
> 
Yes, correct. Setting RIP to 0 should be moved to SIPI handling.

> > 
> >> -             svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
> >> -             svm->vmcb->save.cs.selector = svm->vcpu.arch.sipi_vector << 8;
> >> -     }
> >>
> >>       kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
> >>       kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index f17cd2a..5b862ed 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -4121,10 +4121,6 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
> >>       seg_setup(VCPU_SREG_CS);
> >>       if (kvm_vcpu_is_bsp(&vmx->vcpu))
> >>               vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
> >> -     else {
> >> -             vmcs_write16(GUEST_CS_SELECTOR, vmx->vcpu.arch.sipi_vector << 8);
> >> -             vmcs_writel(GUEST_CS_BASE, vmx->vcpu.arch.sipi_vector << 12);
> >> -     }
> >>
> >>       seg_setup(VCPU_SREG_DS);
> >>       seg_setup(VCPU_SREG_ES);
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index b891ac3..37c0807 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -162,8 +162,6 @@ u64 __read_mostly host_xcr0;
> >>
> >>  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
> >>
> >> -static void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
> >> -
> >>  static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
> >>  {
> >>       int i;
> >> @@ -2823,10 +2821,9 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
> >>       events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
> >>       events->nmi.pad = 0;
> >>
> >> -     events->sipi_vector = vcpu->arch.sipi_vector;
> >> +     events->sipi_vector = 0; /* never valid when reporting to user space */
> >>
> >>       events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
> >> -                      | KVM_VCPUEVENT_VALID_SIPI_VECTOR
> >>                        | KVM_VCPUEVENT_VALID_SHADOW);
> >>       memset(&events->reserved, 0, sizeof(events->reserved));
> >>  }
> >> @@ -2857,8 +2854,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> >>               vcpu->arch.nmi_pending = events->nmi.pending;
> >>       kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
> >>
> >> -     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
> >> -             vcpu->arch.sipi_vector = events->sipi_vector;
> >> +     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
> >> +         kvm_vcpu_has_lapic(vcpu))
> >> +             vcpu->arch.apic->sipi_vector = events->sipi_vector;
> > This looks out of place in this patch. Why is it needed?
> 
> It is required as long as we support MP_STATE_SIPI_RECEIVED as input
> from user space.
What problem are you fixing by adding kvm_vcpu_has_lapic() here?

--
			Gleb.
Jan Kiszka March 13, 2013, 12:17 p.m. UTC | #7
On 2013-03-13 13:16, Gleb Natapov wrote:
> On Wed, Mar 13, 2013 at 12:16:13PM +0100, Jan Kiszka wrote:
>>>> @@ -1199,11 +1199,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu)
>>>>
>>>>       init_vmcb(svm);
>>>>
>>>> -     if (!kvm_vcpu_is_bsp(vcpu)) {
>>>> +     if (!kvm_vcpu_is_bsp(vcpu))
>>>>               kvm_rip_write(vcpu, 0);
>>> Table 9-1 in the SDM says that after INIT reset RIP is 0xfff0, without
>>> mentioning AP or BSP. We should drop any mention of kvm_vcpu_is_bsp()
>>> in the vmx and svm reset code and things should just work.
>>
>> SDM says that APs start up at 0x000VV000 (with VV == SIPI vector) - this
>> implies RIP is 0. I suppose no SMP guest would boot if we change this.
>>
> Yes, correct. Setting RIP to 0 should be moved to SIPI handling.

Done.

> 
>>>
>>>> -             svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
>>>> -             svm->vmcb->save.cs.selector = svm->vcpu.arch.sipi_vector << 8;
>>>> -     }
>>>>
>>>>       kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
>>>>       kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index f17cd2a..5b862ed 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -4121,10 +4121,6 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>>>>       seg_setup(VCPU_SREG_CS);
>>>>       if (kvm_vcpu_is_bsp(&vmx->vcpu))
>>>>               vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
>>>> -     else {
>>>> -             vmcs_write16(GUEST_CS_SELECTOR, vmx->vcpu.arch.sipi_vector << 8);
>>>> -             vmcs_writel(GUEST_CS_BASE, vmx->vcpu.arch.sipi_vector << 12);
>>>> -     }
>>>>
>>>>       seg_setup(VCPU_SREG_DS);
>>>>       seg_setup(VCPU_SREG_ES);
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index b891ac3..37c0807 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -162,8 +162,6 @@ u64 __read_mostly host_xcr0;
>>>>
>>>>  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
>>>>
>>>> -static void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>>>> -
>>>>  static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
>>>>  {
>>>>       int i;
>>>> @@ -2823,10 +2821,9 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>>>>       events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
>>>>       events->nmi.pad = 0;
>>>>
>>>> -     events->sipi_vector = vcpu->arch.sipi_vector;
>>>> +     events->sipi_vector = 0; /* never valid when reporting to user space */
>>>>
>>>>       events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
>>>> -                      | KVM_VCPUEVENT_VALID_SIPI_VECTOR
>>>>                        | KVM_VCPUEVENT_VALID_SHADOW);
>>>>       memset(&events->reserved, 0, sizeof(events->reserved));
>>>>  }
>>>> @@ -2857,8 +2854,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>>>>               vcpu->arch.nmi_pending = events->nmi.pending;
>>>>       kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
>>>>
>>>> -     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
>>>> -             vcpu->arch.sipi_vector = events->sipi_vector;
>>>> +     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
>>>> +         kvm_vcpu_has_lapic(vcpu))
>>>> +             vcpu->arch.apic->sipi_vector = events->sipi_vector;
>>> This looks out of place in this patch. Why is it needed?
>>
>> It is required as long as we support MP_STATE_SIPI_RECEIVED as input
>> from user space.
> What problem are you fixing by adding kvm_vcpu_has_lapic() here?

A NULL pointer dereference when setting the mp_state of a VCPU without an in-kernel APIC.
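
For context, a sketch of the failure mode from user space (hypothetical
reproducer; assumes a VM created without an in-kernel irqchip, so
vcpu->arch.apic is NULL):

	struct kvm_mp_state mp = {
		.mp_state = KVM_MP_STATE_SIPI_RECEIVED,
	};
	/*
	 * Without the kvm_vcpu_has_lapic() guard this would oops in
	 * set_bit(KVM_APIC_SIPI, &vcpu->arch.apic->pending_events).
	 */
	ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);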

Jan
Gleb Natapov March 13, 2013, 12:22 p.m. UTC | #8
On Wed, Mar 13, 2013 at 01:17:50PM +0100, Jan Kiszka wrote:
> >>>> -             svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
> >>>> -             svm->vmcb->save.cs.selector = svm->vcpu.arch.sipi_vector << 8;
> >>>> -     }
> >>>>
> >>>>       kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
> >>>>       kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
> >>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>> index f17cd2a..5b862ed 100644
> >>>> --- a/arch/x86/kvm/vmx.c
> >>>> +++ b/arch/x86/kvm/vmx.c
> >>>> @@ -4121,10 +4121,6 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
> >>>>       seg_setup(VCPU_SREG_CS);
> >>>>       if (kvm_vcpu_is_bsp(&vmx->vcpu))
> >>>>               vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
> >>>> -     else {
> >>>> -             vmcs_write16(GUEST_CS_SELECTOR, vmx->vcpu.arch.sipi_vector << 8);
> >>>> -             vmcs_writel(GUEST_CS_BASE, vmx->vcpu.arch.sipi_vector << 12);
> >>>> -     }
> >>>>
> >>>>       seg_setup(VCPU_SREG_DS);
> >>>>       seg_setup(VCPU_SREG_ES);
> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>> index b891ac3..37c0807 100644
> >>>> --- a/arch/x86/kvm/x86.c
> >>>> +++ b/arch/x86/kvm/x86.c
> >>>> @@ -162,8 +162,6 @@ u64 __read_mostly host_xcr0;
> >>>>
> >>>>  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
> >>>>
> >>>> -static void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
> >>>> -
> >>>>  static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
> >>>>  {
> >>>>       int i;
> >>>> @@ -2823,10 +2821,9 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
> >>>>       events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
> >>>>       events->nmi.pad = 0;
> >>>>
> >>>> -     events->sipi_vector = vcpu->arch.sipi_vector;
> >>>> +     events->sipi_vector = 0; /* never valid when reporting to user space */
> >>>>
> >>>>       events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
> >>>> -                      | KVM_VCPUEVENT_VALID_SIPI_VECTOR
> >>>>                        | KVM_VCPUEVENT_VALID_SHADOW);
> >>>>       memset(&events->reserved, 0, sizeof(events->reserved));
> >>>>  }
> >>>> @@ -2857,8 +2854,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> >>>>               vcpu->arch.nmi_pending = events->nmi.pending;
> >>>>       kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
> >>>>
> >>>> -     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
> >>>> -             vcpu->arch.sipi_vector = events->sipi_vector;
> >>>> +     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
> >>>> +         kvm_vcpu_has_lapic(vcpu))
> >>>> +             vcpu->arch.apic->sipi_vector = events->sipi_vector;
> >>> This looks out of place in this patch. Why is it needed?
> >>
> >> It is required as long as we support MP_STATE_SIPI_RECEIVED as input
> >> from user space.
> > What problem are you fixing by adding kvm_vcpu_has_lapic() here?
> 
> A NULL pointer dereference when setting the mp_state of a VCPU without an in-kernel APIC.
> 
Only now I noticed that you changed vcpu->arch.sipi_vector to
vcpu->arch.apic->sipi_vector :)

--
			Gleb.
Gleb Natapov March 13, 2013, 12:29 p.m. UTC | #9
On Wed, Mar 13, 2013 at 12:36:58PM +0100, Jan Kiszka wrote:
> On 2013-03-13 12:16, Jan Kiszka wrote:
> >>> @@ -5871,8 +5867,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
> >>>                       srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> >>>                       kvm_vcpu_block(vcpu);
> >>>                       vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> >>> -                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
> >>> -                     {
> >>> +                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
> >>> +                             kvm_apic_accept_events(vcpu);
> >> I think we can drop this. If INIT happens while vcpu is halted it will
> >> become runnable here and kvm_apic_accept_events() will be called in
> >> vcpu_enter_guest().
> > 
> > I'm not that sure, but I will recheck carefully.
> 
> Doesn't work: If the state was INIT_RECEIVED, we will not process the
> SIPI but reenter kvm_vcpu_block.
Which raises a question: what if the vcpu is in INIT_RECEIVED and it
receives an NMI? That will make kvm_arch_vcpu_runnable() return true, but
the code will just get back to kvm_vcpu_block() again.

>                                 And it's more consistent to process the
> events here IMHO.
> 
I would like to minimize the number of places kvm_apic_accept_events()
is called, but it looks like we cannot remove it from here indeed. What
about calling it in "case KVM_MP_STATE_INIT_RECEIVED:"?
 
--
			Gleb.
Jan Kiszka March 13, 2013, 12:40 p.m. UTC | #10
On 2013-03-13 13:29, Gleb Natapov wrote:
> On Wed, Mar 13, 2013 at 12:36:58PM +0100, Jan Kiszka wrote:
>> On 2013-03-13 12:16, Jan Kiszka wrote:
>>>>> @@ -5871,8 +5867,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>>                       srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>>>>>                       kvm_vcpu_block(vcpu);
>>>>>                       vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>>>>> -                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
>>>>> -                     {
>>>>> +                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
>>>>> +                             kvm_apic_accept_events(vcpu);
>>>> I think we can drop this. If INIT happens while vcpu is halted it will
>>>> become runnable here and kvm_apic_accept_events() will be called in
>>>> vcpu_enter_guest().
>>>
>>> I'm not that sure, but I will recheck carefully.
>>
>> Doesn't work: If the state was INIT_RECEIVED, we will not process the
>> SIPI but reenter kvm_vcpu_block.
> Which raises a question: what if the vcpu is in INIT_RECEIVED and it
> receives an NMI? That will make kvm_arch_vcpu_runnable() return true, but
> the code will just get back to kvm_vcpu_block() again.

Sounds like we had a bug in this area before. This patch won't improve
it yet. We need to "block NMIs" while in wait-for-sipi state.

BTW, I've just stumbled over more suspicious code:

static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
...
	switch (delivery_mode) {
	case APIC_DM_LOWEST:
		vcpu->arch.apic_arb_prio++;

What makes this (remote) increment safe enough that we can avoid an atomic inc?

> 
>>                                 And it's more consistent to process the
>> events here IMHO.
>>
> I would like to minimize the number of places kvm_apic_accept_events()
> is called, but it looks like we cannot remove it from here indeed. What
> about calling it in "case KVM_MP_STATE_INIT_RECEIVED:"?

Should work, but what is the benefit? I'd prefer to avoid temporarily
switching to RUNNABLE and entering vcpu_enter_guest just to find out it
was an INIT_RECEIVED transition. Checking unconditionally makes the
control flow simpler.

Jan
Gleb Natapov March 13, 2013, 12:48 p.m. UTC | #11
On Wed, Mar 13, 2013 at 01:40:33PM +0100, Jan Kiszka wrote:
> On 2013-03-13 13:29, Gleb Natapov wrote:
> > On Wed, Mar 13, 2013 at 12:36:58PM +0100, Jan Kiszka wrote:
> >> On 2013-03-13 12:16, Jan Kiszka wrote:
> >>>>> @@ -5871,8 +5867,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
> >>>>>                       srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> >>>>>                       kvm_vcpu_block(vcpu);
> >>>>>                       vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> >>>>> -                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
> >>>>> -                     {
> >>>>> +                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
> >>>>> +                             kvm_apic_accept_events(vcpu);
> >>>> I think we can drop this. If INIT happens while vcpu is halted it will
> >>>> become runnable here and kvm_apic_accept_events() will be called in
> >>>> vcpu_enter_guest().
> >>>
> >>> I'm not that sure, but I will recheck carefully.
> >>
> >> Doesn't work: If the state was INIT_RECEIVED, we will not process the
> >> SIPI but reenter kvm_vcpu_block.
> > Which raises a question: what if the vcpu is in INIT_RECEIVED and it
> > receives an NMI? That will make kvm_arch_vcpu_runnable() return true, but
> > the code will just get back to kvm_vcpu_block() again.
> 
> Sounds like we had a bug in this area before. This patch won't improve
> it yet. We need to "block NMIs" while in wait-for-sipi state.
> 
The problem we have now is in fact much more serious. Since INIT does
not reset the vcpu, even a regular interrupt can cause this. I wonder what
should happen to NMIs that were received while the CPU is in INIT state?

> BTW, I've just stumbled over more suspicious code:
> 
> static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
> ...
> 	switch (delivery_mode) {
> 	case APIC_DM_LOWEST:
> 		vcpu->arch.apic_arb_prio++;
> 
> What makes this (remote) increment safe enough that we can avoid an atomic inc?
> 
Because we do not really care about the result.

> > 
> >>                                 And it's more consistent to process the
> >> events here IMHO.
> >>
> > I would like to minimize the number of places kvm_apic_accept_events()
> > is called, but it looks like we cannot remove it from here indeed. What
> > about calling it in "case KVM_MP_STATE_INIT_RECEIVED:"?
> 
> Should work, but what is the benefit? I'd prefer to avoid temporarily
> switching to RUNNABLE and entering vcpu_enter_guest just to find out it
> was an INIT_RECEIVED transition. Checking unconditionally makes the
> control flow simpler.
> 
OK.

--
			Gleb.
Jan Kiszka March 13, 2013, 12:58 p.m. UTC | #12
On 2013-03-13 13:48, Gleb Natapov wrote:
> On Wed, Mar 13, 2013 at 01:40:33PM +0100, Jan Kiszka wrote:
>> On 2013-03-13 13:29, Gleb Natapov wrote:
>>> On Wed, Mar 13, 2013 at 12:36:58PM +0100, Jan Kiszka wrote:
>>>> On 2013-03-13 12:16, Jan Kiszka wrote:
>>>>>>> @@ -5871,8 +5867,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>>>>                       srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>>>>>>>                       kvm_vcpu_block(vcpu);
>>>>>>>                       vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>>>>>>> -                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
>>>>>>> -                     {
>>>>>>> +                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
>>>>>>> +                             kvm_apic_accept_events(vcpu);
>>>>>> I think we can drop this. If INIT happens while vcpu is halted it will
>>>>>> become runnable here and kvm_apic_accept_events() will be called in
>>>>>> vcpu_enter_guest().
>>>>>
>>>>> I'm not that sure, but I will recheck carefully.
>>>>
>>>> Doesn't work: If the state was INIT_RECEIVED, we will not process the
>>>> SIPI but reenter kvm_vcpu_block.
>>> Which raises a question: what if the vcpu is in INIT_RECEIVED and it
>>> receives an NMI? That will make kvm_arch_vcpu_runnable() return true, but
>>> the code will just get back to kvm_vcpu_block() again.
>>
>> Sounds like we had a bug in this area before. This patch won't improve
>> it yet. We need to "block NMIs" while in wait-for-sipi state.
>>
> The problem we have now is in fact much more serious. Since INIT does
> not reset the vcpu, even a regular interrupt can cause this. I wonder what
> should happen to NMIs that were received while the CPU is in INIT state?

I do not find direct references to the "native" wait-for-sipi state, but
while in guest mode, external interrupts and NMIs are "blocked", which
likely means held pending.

We are probably fine by adding a condition to kvm_arch_vcpu_runnable
that pending NMIs are no reason to wake up while in INIT_RECEIVED.
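
A minimal sketch of what that condition could look like (my assumption,
not part of this patch):

int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
		!vcpu->arch.apf.halted)
		|| !list_empty_careful(&vcpu->async_pf.done)
		|| kvm_apic_has_events(vcpu)
		/* queued NMIs stay pending but must not end wait-for-SIPI */
		|| (atomic_read(&vcpu->arch.nmi_queued) &&
		    vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED)
		|| (kvm_arch_interrupt_allowed(vcpu) &&
		    kvm_cpu_has_interrupt(vcpu));
}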

> 
>> BTW, I've just stumbled over more suspicious code:
>>
>> static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>> ...
>> 	switch (delivery_mode) {
>> 	case APIC_DM_LOWEST:
>> 		vcpu->arch.apic_arb_prio++;
>>
>> What makes this (remote) increment safe enough that we can avoid an atomic inc?
>>
> Because we do not really care about the result.

You mean we do not care if two events increment it by one or two? I mean
the result is actually evaluated, isn't it?

Jan
Gleb Natapov March 13, 2013, 1:18 p.m. UTC | #13
On Wed, Mar 13, 2013 at 01:58:07PM +0100, Jan Kiszka wrote:
> On 2013-03-13 13:48, Gleb Natapov wrote:
> > On Wed, Mar 13, 2013 at 01:40:33PM +0100, Jan Kiszka wrote:
> >> On 2013-03-13 13:29, Gleb Natapov wrote:
> >>> On Wed, Mar 13, 2013 at 12:36:58PM +0100, Jan Kiszka wrote:
> >>>> On 2013-03-13 12:16, Jan Kiszka wrote:
> >>>>>>> @@ -5871,8 +5867,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
> >>>>>>>                       srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> >>>>>>>                       kvm_vcpu_block(vcpu);
> >>>>>>>                       vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> >>>>>>> -                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
> >>>>>>> -                     {
> >>>>>>> +                     if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
> >>>>>>> +                             kvm_apic_accept_events(vcpu);
> >>>>>> I think we can drop this. If INIT happens while vcpu is halted it will
> >>>>>> become runnable here and kvm_apic_accept_events() will be called in
> >>>>>> vcpu_enter_guest().
> >>>>>
> >>>>> I'm not that sure, but I will recheck carefully.
> >>>>
> >>>> Doesn't work: If the state was INIT_RECEIVED, we will not process the
> >>>> SIPI but reenter kvm_vcpu_block.
> >>> Which raises the question. What if vcpu is in INIT_RECEIVED and it
> >>> receives NMI. It will make kvm_arch_vcpu_runnable() return true but code
> >>> will get back to kvm_vcpu_block() again.
> >>
> >> Sounds like we had a bug in this area before. This patch won't improve
> >> it yet. We need to "block NMIs" while in wait-for-sipi state.
> >>
> > The problem we have now is in fact much more serious. Since INIT does
> > not reset the vcpu, even a regular interrupt can cause this. I wonder what
> > should happen to NMIs that were received while the CPU is in INIT state?
> 
> I do not find direct references to the "native" wait-for-sipi state, but
> while in guest mode, external interrupts and NMIs are "blocked", which
> likely means held pending.
Strange that after SIPI the AP can start from the int 2 (NMI) vector
instead of the SIPI-supplied vector.

> 
> We are probably fine by adding a condition to kvm_arch_vcpu_runnable
> that pending NMIs are no reason to wake up while in INIT_RECEIVED.
> 
I think so too.

> > 
> >> BTW, I've just stumbled over more suspicious code:
> >>
> >> static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
> >> ...
> >> 	switch (delivery_mode) {
> >> 	case APIC_DM_LOWEST:
> >> 		vcpu->arch.apic_arb_prio++;
> >>
> >> What makes this (remote) increment safe enough that we can avoid an atomic inc?
> >>
> > Because we do not really care about the result.
> 
> You mean we do not care if two events increment it by one or two? I mean
> the result is actually evaluated, isn't it?
> 
It is. We do not care if it is incremented by one or by two, or even
occasionally stays unchanged. This is our arbitration logic for
redistributing lowest-priority interrupts. As long as interrupts are
redistributed, everything is fine.
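
As a stand-alone illustration of the lost-update race that is tolerated
here (plain user-space C, nothing KVM-specific, all names made up):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static long plain_ctr;			/* plain ++, like apic_arb_prio */
static atomic_long atomic_ctr;		/* what an atomic inc would give */

static void *bump(void *arg)
{
	(void)arg;
	for (int i = 0; i < 1000000; i++) {
		plain_ctr++;				/* racy RMW, may lose updates */
		atomic_fetch_add(&atomic_ctr, 1);	/* never loses updates */
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, bump, NULL);
	pthread_create(&t2, NULL, bump, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	/* plain_ctr typically ends up below 2000000, atomic_ctr at exactly 2000000 */
	printf("plain=%ld atomic=%ld\n", plain_ctr, atomic_load(&atomic_ctr));
	return 0;
}

A few lost increments only skew the round-robin slightly; the counter
still advances, so lowest-priority interrupts keep being redistributed.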

--
			Gleb.
diff mbox

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 348d859..ef7f4a5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -345,7 +345,6 @@  struct kvm_vcpu_arch {
 	unsigned long apic_attention;
 	int32_t apic_arb_prio;
 	int mp_state;
-	int sipi_vector;
 	u64 ia32_misc_enable_msr;
 	bool tpr_access_reporting;
 
@@ -819,6 +818,7 @@  int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
+void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector);
 
 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
 		    int reason, bool has_error_code, u32 error_code);
@@ -1002,6 +1002,7 @@  int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
+void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 02b51dd..6c16230 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -685,6 +685,7 @@  static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 {
 	int result = 0;
 	struct kvm_vcpu *vcpu = apic->vcpu;
+	unsigned long e;
 
 	switch (delivery_mode) {
 	case APIC_DM_LOWEST:
@@ -731,7 +732,11 @@  static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 	case APIC_DM_INIT:
 		if (!trig_mode || level) {
 			result = 1;
-			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
+			e = apic->pending_events;
+			while (!test_bit(KVM_APIC_INIT, &e))
+				e = cmpxchg(&apic->pending_events, e,
+					    (e | (1UL << KVM_APIC_INIT)) &
+					    ~(1UL << KVM_APIC_SIPI));
 			kvm_make_request(KVM_REQ_EVENT, vcpu);
 			kvm_vcpu_kick(vcpu);
 		} else {
@@ -743,13 +748,13 @@  static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 	case APIC_DM_STARTUP:
 		apic_debug("SIPI to vcpu %d vector 0x%02x\n",
 			   vcpu->vcpu_id, vector);
-		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
-			result = 1;
-			vcpu->arch.sipi_vector = vector;
-			vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED;
-			kvm_make_request(KVM_REQ_EVENT, vcpu);
-			kvm_vcpu_kick(vcpu);
-		}
+		result = 1;
+		apic->sipi_vector = vector;
+		/* make sure sipi_vector is visible for the receiver */
+		smp_wmb();
+		set_bit(KVM_APIC_SIPI, &apic->pending_events);
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_vcpu_kick(vcpu);
 		break;
 
 	case APIC_DM_EXTINT:
@@ -1860,6 +1865,32 @@  int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
 					 addr);
 }
 
+void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	unsigned int sipi_vector;
+
+	if (!kvm_vcpu_has_lapic(vcpu))
+		return;
+
+	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
+		kvm_lapic_reset(vcpu);
+		kvm_vcpu_reset(vcpu);
+		if (kvm_vcpu_is_bsp(apic->vcpu))
+			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+		else
+			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
+	}
+	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
+	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
+		sipi_vector = apic->sipi_vector;
+		pr_debug("vcpu %d received sipi with vector # %x\n",
+			 vcpu->vcpu_id, sipi_vector);
+		kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
+		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+	}
+}
+
 void kvm_lapic_init(void)
 {
 	/* do not patch jump label more than once per second */
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 1676d34..2c721b9 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -5,6 +5,9 @@ 
 
 #include <linux/kvm_host.h>
 
+#define KVM_APIC_INIT		0
+#define KVM_APIC_SIPI		1
+
 struct kvm_timer {
 	struct hrtimer timer;
 	s64 period; 				/* unit: ns */
@@ -32,6 +35,8 @@  struct kvm_lapic {
 	void *regs;
 	gpa_t vapic_addr;
 	struct page *vapic_page;
+	unsigned long pending_events;
+	unsigned int sipi_vector;
 };
 int kvm_create_lapic(struct kvm_vcpu *vcpu);
 void kvm_free_lapic(struct kvm_vcpu *vcpu);
@@ -39,6 +44,7 @@  void kvm_free_lapic(struct kvm_vcpu *vcpu);
 int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
+void kvm_apic_accept_events(struct kvm_vcpu *vcpu);
 void kvm_lapic_reset(struct kvm_vcpu *vcpu);
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
@@ -158,4 +164,9 @@  void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu,
 				struct kvm_lapic_irq *irq,
 				u64 *eoi_bitmap);
 
+static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.apic->pending_events;
+}
+
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 907e428..796601a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1199,11 +1199,8 @@  static void svm_vcpu_reset(struct kvm_vcpu *vcpu)
 
 	init_vmcb(svm);
 
-	if (!kvm_vcpu_is_bsp(vcpu)) {
+	if (!kvm_vcpu_is_bsp(vcpu))
 		kvm_rip_write(vcpu, 0);
-		svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
-		svm->vmcb->save.cs.selector = svm->vcpu.arch.sipi_vector << 8;
-	}
 
 	kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
 	kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f17cd2a..5b862ed 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4121,10 +4121,6 @@  static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 	seg_setup(VCPU_SREG_CS);
 	if (kvm_vcpu_is_bsp(&vmx->vcpu))
 		vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
-	else {
-		vmcs_write16(GUEST_CS_SELECTOR, vmx->vcpu.arch.sipi_vector << 8);
-		vmcs_writel(GUEST_CS_BASE, vmx->vcpu.arch.sipi_vector << 12);
-	}
 
 	seg_setup(VCPU_SREG_DS);
 	seg_setup(VCPU_SREG_ES);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b891ac3..37c0807 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -162,8 +162,6 @@  u64 __read_mostly host_xcr0;
 
 static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
 
-static void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
-
 static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
 {
 	int i;
@@ -2823,10 +2821,9 @@  static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 	events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
 	events->nmi.pad = 0;
 
-	events->sipi_vector = vcpu->arch.sipi_vector;
+	events->sipi_vector = 0; /* never valid when reporting to user space */
 
 	events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
-			 | KVM_VCPUEVENT_VALID_SIPI_VECTOR
 			 | KVM_VCPUEVENT_VALID_SHADOW);
 	memset(&events->reserved, 0, sizeof(events->reserved));
 }
@@ -2857,8 +2854,9 @@  static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 		vcpu->arch.nmi_pending = events->nmi.pending;
 	kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
 
-	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
-		vcpu->arch.sipi_vector = events->sipi_vector;
+	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
+	    kvm_vcpu_has_lapic(vcpu))
+		vcpu->arch.apic->sipi_vector = events->sipi_vector;
 
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 
@@ -5713,6 +5711,12 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
+		kvm_apic_accept_events(vcpu);
+		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
+			r = 1;
+			goto out;
+		}
+
 		inject_pending_event(vcpu);
 
 		/* enable NMI/IRQ window open exits if needed */
@@ -5847,14 +5851,6 @@  static int __vcpu_run(struct kvm_vcpu *vcpu)
 	int r;
 	struct kvm *kvm = vcpu->kvm;
 
-	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) {
-		pr_debug("vcpu %d received sipi with vector # %x\n",
-			 vcpu->vcpu_id, vcpu->arch.sipi_vector);
-		kvm_lapic_reset(vcpu);
-		kvm_vcpu_reset(vcpu);
-		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
-	}
-
 	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
 	r = vapic_enter(vcpu);
 	if (r) {
@@ -5871,8 +5867,8 @@  static int __vcpu_run(struct kvm_vcpu *vcpu)
 			srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
 			kvm_vcpu_block(vcpu);
 			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
-			if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
-			{
+			if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
+				kvm_apic_accept_events(vcpu);
 				switch(vcpu->arch.mp_state) {
 				case KVM_MP_STATE_HALTED:
 					vcpu->arch.mp_state =
@@ -5880,7 +5876,8 @@  static int __vcpu_run(struct kvm_vcpu *vcpu)
 				case KVM_MP_STATE_RUNNABLE:
 					vcpu->arch.apf.halted = false;
 					break;
-				case KVM_MP_STATE_SIPI_RECEIVED:
+				case KVM_MP_STATE_INIT_RECEIVED:
+					break;
 				default:
 					r = -EINTR;
 					break;
@@ -6015,6 +6012,7 @@  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
 		kvm_vcpu_block(vcpu);
+		kvm_apic_accept_events(vcpu);
 		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
 		r = -EAGAIN;
 		goto out;
@@ -6171,6 +6169,7 @@  int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
+	kvm_apic_accept_events(vcpu);
 	mp_state->mp_state = vcpu->arch.mp_state;
 	return 0;
 }
@@ -6178,7 +6177,15 @@  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	vcpu->arch.mp_state = mp_state->mp_state;
+	if (!kvm_vcpu_has_lapic(vcpu) &&
+	    mp_state->mp_state != KVM_MP_STATE_RUNNABLE)
+		return -EINVAL;
+
+	if (mp_state->mp_state == KVM_MP_STATE_SIPI_RECEIVED) {
+		vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
+		set_bit(KVM_APIC_SIPI, &vcpu->arch.apic->pending_events);
+	} else
+		vcpu->arch.mp_state = mp_state->mp_state;
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 	return 0;
 }
@@ -6515,7 +6522,7 @@  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvm_x86_ops->vcpu_free(vcpu);
 }
 
-static void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
 {
 	atomic_set(&vcpu->arch.nmi_queued, 0);
 	vcpu->arch.nmi_pending = 0;
@@ -6545,6 +6552,16 @@  static void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
 	kvm_x86_ops->vcpu_reset(vcpu);
 }
 
+void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector)
+{
+	struct kvm_segment cs;
+
+	kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
+	cs.selector = vector << 8;
+	cs.base = vector << 12;
+	kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
+}
+
 int kvm_arch_hardware_enable(void *garbage)
 {
 	struct kvm *kvm;
@@ -6988,7 +7005,7 @@  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
 		!vcpu->arch.apf.halted)
 		|| !list_empty_careful(&vcpu->async_pf.done)
-		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
+		|| kvm_apic_has_events(vcpu)
 		|| atomic_read(&vcpu->arch.nmi_queued) ||
 		(kvm_arch_interrupt_allowed(vcpu) &&
 		 kvm_cpu_has_interrupt(vcpu));