diff mbox

[RFC] KVM: Fix race in apic->pending_events processing

Message ID 51A4B5CA.9070109@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paolo Bonzini May 28, 2013, 1:48 p.m. UTC
Il 28/05/2013 14:56, Gleb Natapov ha scritto:
>> >  		else
>> >  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>> >  	}
>> > -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>> > +	/*
>> > +	 * Note that we may get another INIT+SIPI sequence right here; process
>> > +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
>> > +	 */
>> > +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
>> >  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> Because pending_events can be INIT/SIPI at this point and it should be
> interpreted as: do SIPI and ignore INIT (atomically).

My patch does "do another INIT (which will have no effect) and do SIPI 
after that INIT", which is different but has almost the same effect.  
If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
the next iteration of kvm_apic_accept_events do both.  The difference 
would be that in a carefully-timed sequence of interrupts

    INIT-INIT-SIPI-INIT-SIPI

your version would do many SIPIs, while mine would do just one.

Hmm... there is a reference to this in 25.2 "Other causes of VM exits": 
"If a logical processor is in the wait-for-SIPI state, INIT signals are 
blocked.  They do not cause VM exits in this case."  It is not for the 
physical processor, but it makes sense to have the same thing.  Is this 
the reason why you did the cmpxchg at the end?

But then, there's another way to mask INITs in the wait-for-SIPI 
state.  Considering that KVM_MP_STATE_INIT_RECEIVED is really a 
wait-for-SIPI, you can do:


I don't have a particular preference.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Gleb Natapov May 28, 2013, 3 p.m. UTC | #1
On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
> >> >  		else
> >> >  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >> >  	}
> >> > -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> >> > +	/*
> >> > +	 * Note that we may get another INIT+SIPI sequence right here; process
> >> > +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
> >> > +	 */
> >> > +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
> >> >  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> > Because pending_events can be INIT/SIPI at this point and it should be
> > interpreted as: do SIPI and ignore INIT (atomically).
> 
> My patch does "do another INIT (which will have no effect) and do SIPI 
> after that INIT", which is different but has almost the same effect.  
> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
> the next iteration of kvm_apic_accept_events do both.  The difference 
> would be that in a carefully-timed sequence of interrupts
> 
You assume that the next processing will actually happen, but this is
not necessary the case.

>     INIT-INIT-SIPI-INIT-SIPI
> 
> your version would do many SIPIs, while mine would do just one.
> 
> Hmm... there is a reference to this in 25.2 "Other causes of VM exits": 
> "If a logical processor is in the wait-for-SIPI state, INIT signals are 
> blocked.  They do not cause VM exits in this case."  It is not for the 
> physical processor, but it makes sense to have the same thing.  Is this 
> the reason why you did the cmpxchg at the end?
> 
No, but if helps us model proper behaviour for nested guest +1 to it :)
But until we handle INIT in nested virt it is not something that
dictates the solution.

> But then, there's another way to mask INITs in the wait-for-SIPI 
> state.  Considering that KVM_MP_STATE_INIT_RECEIVED is really a 
> wait-for-SIPI, you can do:
> 
Haven't checked it for races (especially races between multiple CPUS
sending INIT), but looks more complicated to me.

> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index e1adbb4..36bc308 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -720,8 +720,12 @@ out:
>  		break;
>  
>  	case APIC_DM_INIT:
> -		if (!trig_mode || level) {
> +		if ((!trig_mode || level) &&
> +		    vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED) {
>  			result = 1;
> +
> +			/* check mp_state before writing apic->pending_events */
> +			smp_mb();
>  			/* assumes that there are only KVM_APIC_INIT/SIPI */
>  			apic->pending_events = (1UL << KVM_APIC_INIT);
>  			/* make sure pending_events is visible before sending
> @@ -1865,13 +1869,17 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>  	if (!kvm_vcpu_has_lapic(vcpu))
>  		return;
>  
> -	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
> +	if (test_bit(KVM_APIC_INIT, &apic->pending_events)) {
>  		kvm_lapic_reset(vcpu);
>  		kvm_vcpu_reset(vcpu);
>  		if (kvm_vcpu_is_bsp(apic->vcpu))
>  			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>  		else
>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> +
> +		/* write mp_state before toggling KVM_APIC_INIT */
> +		smp_mb__before_clear_bit();
> +		clear_bit(KVM_APIC_INIT, &apic->pending_events);
>  	}
>  	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> 
> I don't have a particular preference.
> 
> Paolo

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 28, 2013, 4:33 p.m. UTC | #2
Il 28/05/2013 17:00, Gleb Natapov ha scritto:
> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
>>>>>  		else
>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>>>>>  	}
>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>>>>> +	/*
>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
>>>>> +	 */
>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>>> Because pending_events can be INIT/SIPI at this point and it should be
>>> interpreted as: do SIPI and ignore INIT (atomically).
>>
>> My patch does "do another INIT (which will have no effect) and do SIPI 
>> after that INIT", which is different but has almost the same effect.  
>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
>> the next iteration of kvm_apic_accept_events do both.  The difference 
>> would be that in a carefully-timed sequence of interrupts
>>
> You assume that the next processing will actually happen, but this is
> not necessary the case.

Why not?  The INIT and SIPI that have just been sent have kicked the
VCPU again.

>> But then, there's another way to mask INITs in the wait-for-SIPI 
>> state.  Considering that KVM_MP_STATE_INIT_RECEIVED is really a 
>> wait-for-SIPI, you can do:
>>
> Haven't checked it for races (especially races between multiple CPUS
> sending INIT), but looks more complicated to me.

Ok, let's go with yours.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov May 30, 2013, 1:20 a.m. UTC | #3
On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
> > On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
> >> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
> >>>>>  		else
> >>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >>>>>  	}
> >>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> >>>>> +	/*
> >>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
> >>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
> >>>>> +	 */
> >>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
> >>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >>> Because pending_events can be INIT/SIPI at this point and it should be
> >>> interpreted as: do SIPI and ignore INIT (atomically).
> >>
> >> My patch does "do another INIT (which will have no effect) and do SIPI 
> >> after that INIT", which is different but has almost the same effect.  
> >> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
> >> the next iteration of kvm_apic_accept_events do both.  The difference 
> >> would be that in a carefully-timed sequence of interrupts
> >>
> > You assume that the next processing will actually happen, but this is
> > not necessary the case.
> 
> Why not?  The INIT and SIPI that have just been sent have kicked the
> VCPU again.
> 
kick is a nop if vcpu thread is not in a halt or in a guest.

> >> But then, there's another way to mask INITs in the wait-for-SIPI 
> >> state.  Considering that KVM_MP_STATE_INIT_RECEIVED is really a 
> >> wait-for-SIPI, you can do:
> >>
> > Haven't checked it for races (especially races between multiple CPUS
> > sending INIT), but looks more complicated to me.
> 
> Ok, let's go with yours.
> 
> Paolo

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 30, 2013, 5:41 a.m. UTC | #4
Il 30/05/2013 03:20, Gleb Natapov ha scritto:
> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
>> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
>>>>>>>  		else
>>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>>>>>>>  	}
>>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>>>>>>> +	/*
>>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
>>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
>>>>>>> +	 */
>>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
>>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>>>>> Because pending_events can be INIT/SIPI at this point and it should be
>>>>> interpreted as: do SIPI and ignore INIT (atomically).
>>>>
>>>> My patch does "do another INIT (which will have no effect) and do SIPI 
>>>> after that INIT", which is different but has almost the same effect.  
>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
>>>> the next iteration of kvm_apic_accept_events do both.  The difference 
>>>> would be that in a carefully-timed sequence of interrupts
>>>>
>>> You assume that the next processing will actually happen, but this is
>>> not necessary the case.
>>
>> Why not?  The INIT and SIPI that have just been sent have kicked the
>> VCPU again.
>
> kick is a nop if vcpu thread is not in a halt or in a guest.

But the KVM_REQ_EVENT request will be caught at:

        if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
            || need_resched() || signal_pending(current)) {
                vcpu->mode = OUTSIDE_GUEST_MODE;
                smp_wmb();
                local_irq_enable();
                preempt_enable();
                r = 1;
                goto cancel_injection;
        }

and the entry will be canceled.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov May 30, 2013, 6:01 a.m. UTC | #5
On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 03:20, Gleb Natapov ha scritto:
> > On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
> >> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
> >>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
> >>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
> >>>>>>>  		else
> >>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >>>>>>>  	}
> >>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> >>>>>>> +	/*
> >>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
> >>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
> >>>>>>> +	 */
> >>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
> >>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >>>>> Because pending_events can be INIT/SIPI at this point and it should be
> >>>>> interpreted as: do SIPI and ignore INIT (atomically).
> >>>>
> >>>> My patch does "do another INIT (which will have no effect) and do SIPI 
> >>>> after that INIT", which is different but has almost the same effect.  
> >>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
> >>>> the next iteration of kvm_apic_accept_events do both.  The difference 
> >>>> would be that in a carefully-timed sequence of interrupts
> >>>>
> >>> You assume that the next processing will actually happen, but this is
> >>> not necessary the case.
> >>
> >> Why not?  The INIT and SIPI that have just been sent have kicked the
> >> VCPU again.
> >
> > kick is a nop if vcpu thread is not in a halt or in a guest.
> 
> But the KVM_REQ_EVENT request will be caught at:
> 
>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>             || need_resched() || signal_pending(current)) {
>                 vcpu->mode = OUTSIDE_GUEST_MODE;
>                 smp_wmb();
>                 local_irq_enable();
>                 preempt_enable();
>                 r = 1;
>                 goto cancel_injection;
>         }
> 
> and the entry will be canceled.
> 
But vcpu may be in non running state so we will not get here.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 30, 2013, 6:31 a.m. UTC | #6
Il 30/05/2013 08:01, Gleb Natapov ha scritto:
> On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote:
>> Il 30/05/2013 03:20, Gleb Natapov ha scritto:
>>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
>>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
>>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
>>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
>>>>>>>>>  		else
>>>>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>>>>>>>>>  	}
>>>>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>>>>>>>>> +	/*
>>>>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
>>>>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
>>>>>>>>> +	 */
>>>>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
>>>>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>>>>>>> Because pending_events can be INIT/SIPI at this point and it should be
>>>>>>> interpreted as: do SIPI and ignore INIT (atomically).
>>>>>>
>>>>>> My patch does "do another INIT (which will have no effect) and do SIPI 
>>>>>> after that INIT", which is different but has almost the same effect.  
>>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
>>>>>> the next iteration of kvm_apic_accept_events do both.  The difference 
>>>>>> would be that in a carefully-timed sequence of interrupts
>>>>>>
>>>>> You assume that the next processing will actually happen, but this is
>>>>> not necessary the case.
>>>>
>>>> Why not?  The INIT and SIPI that have just been sent have kicked the
>>>> VCPU again.
>>>
>>> kick is a nop if vcpu thread is not in a halt or in a guest.
>>
>> But the KVM_REQ_EVENT request will be caught at:
>>
>>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>>             || need_resched() || signal_pending(current)) {
>>                 vcpu->mode = OUTSIDE_GUEST_MODE;
>>                 smp_wmb();
>>                 local_irq_enable();
>>                 preempt_enable();
>>                 r = 1;
>>                 goto cancel_injection;
>>         }
>>
>> and the entry will be canceled.

I was wrong: we exit immediately because state is
KVM_MP_STATE_INIT_RECEIVED.  But then...

> But vcpu may be in non running state so we will not get here.

... vcpu_enter_guest will return 1 and __vcpu_run goes around the while
loop once more (modulo pending signals of course).

On the next iteration __vcpu_run will call kvm_vcpu_block, which calls
kvm_arch_vcpu_runnable.  kvm_arch_vcpu_runnable returns true because
kvm_apic_has_events(vcpu) is also true.  This will set KVM_REQ_UNHALT,
call kvm_apic_accept_events again and do the INIT+SIPI.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov May 30, 2013, 7:09 a.m. UTC | #7
On Thu, May 30, 2013 at 08:31:11AM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 08:01, Gleb Natapov ha scritto:
> > On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote:
> >> Il 30/05/2013 03:20, Gleb Natapov ha scritto:
> >>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
> >>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
> >>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
> >>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
> >>>>>>>>>  		else
> >>>>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >>>>>>>>>  	}
> >>>>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> >>>>>>>>> +	/*
> >>>>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
> >>>>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
> >>>>>>>>> +	 */
> >>>>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
> >>>>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >>>>>>> Because pending_events can be INIT/SIPI at this point and it should be
> >>>>>>> interpreted as: do SIPI and ignore INIT (atomically).
> >>>>>>
> >>>>>> My patch does "do another INIT (which will have no effect) and do SIPI 
> >>>>>> after that INIT", which is different but has almost the same effect.  
> >>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
> >>>>>> the next iteration of kvm_apic_accept_events do both.  The difference 
> >>>>>> would be that in a carefully-timed sequence of interrupts
> >>>>>>
> >>>>> You assume that the next processing will actually happen, but this is
> >>>>> not necessary the case.
> >>>>
> >>>> Why not?  The INIT and SIPI that have just been sent have kicked the
> >>>> VCPU again.
> >>>
> >>> kick is a nop if vcpu thread is not in a halt or in a guest.
> >>
> >> But the KVM_REQ_EVENT request will be caught at:
> >>
> >>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> >>             || need_resched() || signal_pending(current)) {
> >>                 vcpu->mode = OUTSIDE_GUEST_MODE;
> >>                 smp_wmb();
> >>                 local_irq_enable();
> >>                 preempt_enable();
> >>                 r = 1;
> >>                 goto cancel_injection;
> >>         }
> >>
> >> and the entry will be canceled.
> 
> I was wrong: we exit immediately because state is
> KVM_MP_STATE_INIT_RECEIVED.  But then...
> 
> > But vcpu may be in non running state so we will not get here.
> 
> ... vcpu_enter_guest will return 1 and __vcpu_run goes around the while
> loop once more (modulo pending signals of course).
> 
> On the next iteration __vcpu_run will call kvm_vcpu_block, which calls
> kvm_arch_vcpu_runnable.  kvm_arch_vcpu_runnable returns true because
> kvm_apic_has_events(vcpu) is also true.  This will set KVM_REQ_UNHALT,
> call kvm_apic_accept_events again and do the INIT+SIPI.
> 
Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
lose the event.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 30, 2013, 7:30 a.m. UTC | #8
Il 30/05/2013 09:09, Gleb Natapov ha scritto:
> On Thu, May 30, 2013 at 08:31:11AM +0200, Paolo Bonzini wrote:
>> Il 30/05/2013 08:01, Gleb Natapov ha scritto:
>>> On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote:
>>>> Il 30/05/2013 03:20, Gleb Natapov ha scritto:
>>>>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
>>>>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
>>>>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
>>>>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
>>>>>>>>>>>  		else
>>>>>>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>>>>>>>>>>>  	}
>>>>>>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>>>>>>>>>>> +	/*
>>>>>>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
>>>>>>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
>>>>>>>>>>> +	 */
>>>>>>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
>>>>>>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>>>>>>>>> Because pending_events can be INIT/SIPI at this point and it should be
>>>>>>>>> interpreted as: do SIPI and ignore INIT (atomically).
>>>>>>>>
>>>>>>>> My patch does "do another INIT (which will have no effect) and do SIPI 
>>>>>>>> after that INIT", which is different but has almost the same effect.  
>>>>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
>>>>>>>> the next iteration of kvm_apic_accept_events do both.  The difference 
>>>>>>>> would be that in a carefully-timed sequence of interrupts
>>>>>>>>
>>>>>>> You assume that the next processing will actually happen, but this is
>>>>>>> not necessary the case.
>>>>>>
>>>>>> Why not?  The INIT and SIPI that have just been sent have kicked the
>>>>>> VCPU again.
>>>>>
>>>>> kick is a nop if vcpu thread is not in a halt or in a guest.
>>>>
>>>> But the KVM_REQ_EVENT request will be caught at:
>>>>
>>>>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>>>>             || need_resched() || signal_pending(current)) {
>>>>                 vcpu->mode = OUTSIDE_GUEST_MODE;
>>>>                 smp_wmb();
>>>>                 local_irq_enable();
>>>>                 preempt_enable();
>>>>                 r = 1;
>>>>                 goto cancel_injection;
>>>>         }
>>>>
>>>> and the entry will be canceled.
>>
>> I was wrong: we exit immediately because state is
>> KVM_MP_STATE_INIT_RECEIVED.  But then...
>>
>>> But vcpu may be in non running state so we will not get here.
>>
>> ... vcpu_enter_guest will return 1 and __vcpu_run goes around the while
>> loop once more (modulo pending signals of course).
>>
>> On the next iteration __vcpu_run will call kvm_vcpu_block, which calls
>> kvm_arch_vcpu_runnable.  kvm_arch_vcpu_runnable returns true because
>> kvm_apic_has_events(vcpu) is also true.  This will set KVM_REQ_UNHALT,
>> call kvm_apic_accept_events again and do the INIT+SIPI.
>
> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
> lose the event.

Ok, then I'd prefer to have the cmpxchg directly in the if, as in
http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505

Thanks for the discussion!

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov May 30, 2013, 12:34 p.m. UTC | #9
On Thu, May 30, 2013 at 09:30:41AM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 09:09, Gleb Natapov ha scritto:
> > On Thu, May 30, 2013 at 08:31:11AM +0200, Paolo Bonzini wrote:
> >> Il 30/05/2013 08:01, Gleb Natapov ha scritto:
> >>> On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote:
> >>>> Il 30/05/2013 03:20, Gleb Natapov ha scritto:
> >>>>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
> >>>>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
> >>>>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
> >>>>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
> >>>>>>>>>>>  		else
> >>>>>>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >>>>>>>>>>>  	}
> >>>>>>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> >>>>>>>>>>> +	/*
> >>>>>>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
> >>>>>>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
> >>>>>>>>>>> +	 */
> >>>>>>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
> >>>>>>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >>>>>>>>> Because pending_events can be INIT/SIPI at this point and it should be
> >>>>>>>>> interpreted as: do SIPI and ignore INIT (atomically).
> >>>>>>>>
> >>>>>>>> My patch does "do another INIT (which will have no effect) and do SIPI 
> >>>>>>>> after that INIT", which is different but has almost the same effect.  
> >>>>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
> >>>>>>>> the next iteration of kvm_apic_accept_events do both.  The difference 
> >>>>>>>> would be that in a carefully-timed sequence of interrupts
> >>>>>>>>
> >>>>>>> You assume that the next processing will actually happen, but this is
> >>>>>>> not necessary the case.
> >>>>>>
> >>>>>> Why not?  The INIT and SIPI that have just been sent have kicked the
> >>>>>> VCPU again.
> >>>>>
> >>>>> kick is a nop if vcpu thread is not in a halt or in a guest.
> >>>>
> >>>> But the KVM_REQ_EVENT request will be caught at:
> >>>>
> >>>>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> >>>>             || need_resched() || signal_pending(current)) {
> >>>>                 vcpu->mode = OUTSIDE_GUEST_MODE;
> >>>>                 smp_wmb();
> >>>>                 local_irq_enable();
> >>>>                 preempt_enable();
> >>>>                 r = 1;
> >>>>                 goto cancel_injection;
> >>>>         }
> >>>>
> >>>> and the entry will be canceled.
> >>
> >> I was wrong: we exit immediately because state is
> >> KVM_MP_STATE_INIT_RECEIVED.  But then...
> >>
> >>> But vcpu may be in non running state so we will not get here.
> >>
> >> ... vcpu_enter_guest will return 1 and __vcpu_run goes around the while
> >> loop once more (modulo pending signals of course).
> >>
> >> On the next iteration __vcpu_run will call kvm_vcpu_block, which calls
> >> kvm_arch_vcpu_runnable.  kvm_arch_vcpu_runnable returns true because
> >> kvm_apic_has_events(vcpu) is also true.  This will set KVM_REQ_UNHALT,
> >> call kvm_apic_accept_events again and do the INIT+SIPI.
> >
> > Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
> > lose the event.
> 
> Ok, then I'd prefer to have the cmpxchg directly in the if, as in
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
> 
I still do not. Both of them are tricky, mine does not coalesce events
needlessly.

> Thanks for the discussion!
> 
> Paolo

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 30, 2013, 12:58 p.m. UTC | #10
Il 30/05/2013 14:34, Gleb Natapov ha scritto:
>>> > >
>>> > > Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
>>> > > lose the event.
>> > 
>> > Ok, then I'd prefer to have the cmpxchg directly in the if, as in
>> > http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
>> > 
> I still do not. Both of them are tricky, mine does not coalesce events
> needlessly.

Agreed that both are tricky, but I don't think my patch is coalescing
events.  If you have

    INIT    SIPI     INIT     SIPI
                  ^                           ^
                  INIT bit cleared here       SIPI bit checked here

my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
bit was set by the second SIPI, not by the first.  In fact the first
SIPI was cancelled by the second INIT, and thus should not be processed
at all.

Instead, with your patch KVM will service all four events; strictly
speaking it is wrong to service the first SIPI, which is why I prefer
having the cmpxchg in the beginning.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov May 30, 2013, 1:10 p.m. UTC | #11
On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 14:34, Gleb Natapov ha scritto:
> >>> > >
> >>> > > Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
> >>> > > lose the event.
> >> > 
> >> > Ok, then I'd prefer to have the cmpxchg directly in the if, as in
> >> > http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
> >> > 
> > I still do not. Both of them are tricky, mine does not coalesce events
> > needlessly.
> 
> Agreed that both are tricky, but I don't think my patch is coalescing
> events.  If you have
> 
>     INIT    SIPI     INIT     SIPI
>                   ^                           ^
>                   INIT bit cleared here       SIPI bit checked here
> 
Not sure I understand what you are trying to say here.

> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
> bit was set by the second SIPI, not by the first.  In fact the first
> SIPI was cancelled by the second INIT, and thus should not be processed
> at all.
That is called coalesced.

> 
> Instead, with your patch KVM will service all four events; strictly
> speaking it is wrong to service the first SIPI, which is why I prefer
> having the cmpxchg in the beginning.
> y
Why is it wrong?

I do not see what are you arguing about.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 30, 2013, 1:23 p.m. UTC | #12
Il 30/05/2013 15:10, Gleb Natapov ha scritto:
> On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote:
>> Il 30/05/2013 14:34, Gleb Natapov ha scritto:
>>>>>>>
>>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
>>>>>>> lose the event.
>>>>>
>>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in
>>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
>>>>>
>>> I still do not. Both of them are tricky, mine does not coalesce events
>>> needlessly.
>>
>> Agreed that both are tricky, but I don't think my patch is coalescing
>> events.  If you have
>>
>>     INIT    SIPI     INIT     SIPI
>>                   ^                           ^
>>                   INIT bit cleared here       SIPI bit checked here
>>
> Not sure I understand what you are trying to say here.

I'll redo the picture below.

>> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
>> bit was set by the second SIPI, not by the first.  In fact the first
>> SIPI was cancelled by the second INIT, and thus should not be processed
>> at all.
> That is called coalesced.

Coalescing would be something like INIT SIPI SIPI -> INIT SIPI.  This is
not coalescing, it is proper detection of a cancelled SIPI.  We have:

   event sent           event processed            pending_events
    INIT                                                 INIT
    SIPI                                               INIT|SIPI
                             INIT                        SIPI
XX  INIT                                                 INIT
    SIPI                                               INIT|SIPI
YY                           SIPI                      INIT|SIPI
                        failed cmpxchg                 INIT|SIPI
                             INIT                        SIPI
                             SIPI                          0

At the line I marked with YY, you're processing an interrupt that is not
anymore in the pending_events.  It was dropped at point XX.

With my patch it is:


   event sent           event processed            pending_events
    INIT                                                 INIT
    SIPI                                               INIT|SIPI
                             INIT                        SIPI
XX  INIT                                                 INIT
    SIPI                                               INIT|SIPI
                       failed cmpxchg                 INIT|SIPI
                             INIT                        SIPI
                             SIPI                         0

>> Instead, with your patch KVM will service all four events; strictly
>> speaking it is wrong to service the first SIPI, which is why I prefer
>> having the cmpxchg in the beginning.
>
> Why is it wrong?

Because the first SIPI was dropped atomically with the triggering of the
second INIT, it's as if you were handling it twice.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov May 30, 2013, 1:35 p.m. UTC | #13
On Thu, May 30, 2013 at 03:23:35PM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 15:10, Gleb Natapov ha scritto:
> > On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote:
> >> Il 30/05/2013 14:34, Gleb Natapov ha scritto:
> >>>>>>>
> >>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
> >>>>>>> lose the event.
> >>>>>
> >>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in
> >>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
> >>>>>
> >>> I still do not. Both of them are tricky, mine does not coalesce events
> >>> needlessly.
> >>
> >> Agreed that both are tricky, but I don't think my patch is coalescing
> >> events.  If you have
> >>
> >>     INIT    SIPI     INIT     SIPI
> >>                   ^                           ^
> >>                   INIT bit cleared here       SIPI bit checked here
> >>
> > Not sure I understand what you are trying to say here.
> 
> I'll redo the picture below.
> 
> >> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
> >> bit was set by the second SIPI, not by the first.  In fact the first
> >> SIPI was cancelled by the second INIT, and thus should not be processed
> >> at all.
> > That is called coalesced.
> 
> Coalescing would be something like INIT SIPI SIPI -> INIT SIPI.  This is
> not coalescing, it is proper detection of a cancelled SIPI.  We have:
> 
>    event sent           event processed            pending_events
>     INIT                                                 INIT
>     SIPI                                               INIT|SIPI
>                              INIT                        SIPI
> XX  INIT                                                 INIT
>     SIPI                                               INIT|SIPI
> YY                           SIPI                      INIT|SIPI
>                         failed cmpxchg                 INIT|SIPI
>                              INIT                        SIPI
>                              SIPI                          0
> 
At this point I am not even sure that you understand what problem the patch
is fixing, because the bug is not event triggered by above sequence.

> At the line I marked with YY, you're processing an interrupt that is not
> anymore in the pending_events.  It was dropped at point XX.
> 
> With my patch it is:
> 
> 
>    event sent           event processed            pending_events
>     INIT                                                 INIT
>     SIPI                                               INIT|SIPI
>                              INIT                        SIPI
> XX  INIT                                                 INIT
>     SIPI                                               INIT|SIPI
>                        failed cmpxchg                 INIT|SIPI
>                              INIT                        SIPI
>                              SIPI                         0
> 
> >> Instead, with your patch KVM will service all four events; strictly
> >> speaking it is wrong to service the first SIPI, which is why I prefer
> >> having the cmpxchg in the beginning.
> >
> > Why is it wrong?
> 
> Because the first SIPI was dropped atomically with the triggering of the
> second INIT, it's as if you were handling it twice.
> 
No, you were slow to process first SIPI, so second INIT was sent because
vcpu appears to be dead, so instead of processing both you process last.
This is exactly what coalescing means, you are just trying to present is
as something desirable.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 30, 2013, 2:15 p.m. UTC | #14
Il 30/05/2013 15:35, Gleb Natapov ha scritto:
> On Thu, May 30, 2013 at 03:23:35PM +0200, Paolo Bonzini wrote:
>> Il 30/05/2013 15:10, Gleb Natapov ha scritto:
>>> On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote:
>>>> Il 30/05/2013 14:34, Gleb Natapov ha scritto:
>>>>>>>>>
>>>>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
>>>>>>>>> lose the event.
>>>>>>>
>>>>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in
>>>>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
>>>>>>>
>>>>> I still do not. Both of them are tricky, mine does not coalesce events
>>>>> needlessly.
>>>>
>>>> Agreed that both are tricky, but I don't think my patch is coalescing
>>>> events.  If you have
>>>>
>>>>     INIT    SIPI     INIT     SIPI
>>>>                   ^                           ^
>>>>                   INIT bit cleared here       SIPI bit checked here
>>>>
>>> Not sure I understand what you are trying to say here.
>>
>> I'll redo the picture below.
>>
>>>> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
>>>> bit was set by the second SIPI, not by the first.  In fact the first
>>>> SIPI was cancelled by the second INIT, and thus should not be processed
>>>> at all.
>>> That is called coalesced.
>>
>> Coalescing would be something like INIT SIPI SIPI -> INIT SIPI.  This is
>> not coalescing, it is proper detection of a cancelled SIPI.  We have:
>>
>>    event sent           event processed            pending_events
>>     INIT                                                 INIT
>>     SIPI                                               INIT|SIPI
>>                              INIT                        SIPI
>> XX  INIT                                                 INIT
>>     SIPI                                               INIT|SIPI
>> YY                           SIPI                      INIT|SIPI
>>                         failed cmpxchg                 INIT|SIPI
>>                              INIT                        SIPI
>>                              SIPI                          0
>>
> At this point I am not even sure that you understand what problem the patch
> is fixing, because the bug is not event triggered by above sequence.

Maybe.

>> Because the first SIPI was dropped atomically with the triggering of the
>> second INIT, it's as if you were handling it twice.
>>
> No, you were slow to process first SIPI, so second INIT was sent because
> vcpu appears to be dead, so instead of processing both you process last.

Can you draw the events that happen?

What I drew above is based on the commit message.  Instead what I
understand from this explanation is:

   event sent           event processed            pending_events
     INIT                                                 INIT
     SIPI                                               INIT|SIPI
                              INIT                        SIPI
                              SIPI                         0
     INIT                                                 INIT
     SIPI                                               INIT|SIPI
                              INIT                        SIPI
                              SIPI                         0

Then my patch has absolutely no effect, the cmpxchg succeeds.  With your
patch instead I get:


   event sent           event processed              pending_events
     INIT                                                 INIT
     SIPI                                               INIT|SIPI
                              INIT                        SIPI
                              SIPI                        ...
     INIT                                                 INIT
     SIPI                                               INIT|SIPI
                         failed cmpxchg                 INIT|SIPI
                              INIT                        SIPI
                              SIPI                        ...
                        successful cmpxchg                  0

But there is no difference in the actual set of events that was processed.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov May 31, 2013, 4:36 a.m. UTC | #15
On Thu, May 30, 2013 at 04:15:35PM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 15:35, Gleb Natapov ha scritto:
> > On Thu, May 30, 2013 at 03:23:35PM +0200, Paolo Bonzini wrote:
> >> Il 30/05/2013 15:10, Gleb Natapov ha scritto:
> >>> On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote:
> >>>> Il 30/05/2013 14:34, Gleb Natapov ha scritto:
> >>>>>>>>>
> >>>>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
> >>>>>>>>> lose the event.
> >>>>>>>
> >>>>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in
> >>>>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
> >>>>>>>
> >>>>> I still do not. Both of them are tricky, mine does not coalesce events
> >>>>> needlessly.
> >>>>
> >>>> Agreed that both are tricky, but I don't think my patch is coalescing
> >>>> events.  If you have
> >>>>
> >>>>     INIT    SIPI     INIT     SIPI
> >>>>                   ^                           ^
> >>>>                   INIT bit cleared here       SIPI bit checked here
> >>>>
> >>> Not sure I understand what you are trying to say here.
> >>
> >> I'll redo the picture below.
> >>
> >>>> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
> >>>> bit was set by the second SIPI, not by the first.  In fact the first
> >>>> SIPI was cancelled by the second INIT, and thus should not be processed
> >>>> at all.
> >>> That is called coalesced.
> >>
> >> Coalescing would be something like INIT SIPI SIPI -> INIT SIPI.  This is
> >> not coalescing, it is proper detection of a cancelled SIPI.  We have:
> >>
> >>    event sent           event processed            pending_events
> >>     INIT                                                 INIT
> >>     SIPI                                               INIT|SIPI
> >>                              INIT                        SIPI
> >> XX  INIT                                                 INIT
> >>     SIPI                                               INIT|SIPI
> >> YY                           SIPI                      INIT|SIPI
> >>                         failed cmpxchg                 INIT|SIPI
> >>                              INIT                        SIPI
> >>                              SIPI                          0
> >>
> > At this point I am not even sure that you understand what problem the patch
> > is fixing, because the bug is not event triggered by above sequence.
> 
> Maybe.
> 
> >> Because the first SIPI was dropped atomically with the triggering of the
> >> second INIT, it's as if you were handling it twice.
> >>
> > No, you were slow to process first SIPI, so second INIT was sent because
> > vcpu appears to be dead, so instead of processing both you process last.
> 
> Can you draw the events that happen?
> 
I did, in commit message.

> What I drew above is based on the commit message.  Instead what I
> understand from this explanation is:
> 
It is definitely not based on my commit message :)

In my commit message there is two INITs in a row:
 vpu0:                            vcpu1:
 set INIT
                                test_and_clear_bit(KVM_APIC_INIT)
                                   process INIT
 set INIT
 set SIPI
                                test_and_clear_bit(KVM_APIC_SIPI)
                                   process SIPI

Two INITs before SIPI are essential to trigger the bug and
coincidentally this is what spec advices to do.

>    event sent           event processed            pending_events
>      INIT                                                 INIT
>      SIPI                                               INIT|SIPI
>                               INIT                        SIPI
>                               SIPI                         0
>      INIT                                                 INIT
>      SIPI                                               INIT|SIPI
>                               INIT                        SIPI
>                               SIPI                         0
> 
> Then my patch has absolutely no effect, the cmpxchg succeeds.  With your
> patch instead I get:
> 
> 
>    event sent           event processed              pending_events
>      INIT                                                 INIT
>      SIPI                                               INIT|SIPI
>                               INIT                        SIPI
>                               SIPI                        ...
>      INIT                                                 INIT
>      SIPI                                               INIT|SIPI
>                          failed cmpxchg                 INIT|SIPI
>                               INIT                        SIPI
>                               SIPI                        ...
>                         successful cmpxchg                  0
> 
> But there is no difference in the actual set of events that was processed.
> 
I do not get what you are trying to tell with this. The scenario you are
repeatedly describing works with your path, with my patch and without any
patch at all.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 31, 2013, 8:48 a.m. UTC | #16
Il 31/05/2013 06:36, Gleb Natapov ha scritto:
> In my commit message there is two INITs in a row:
>  vpu0:                            vcpu1:
>  set INIT
>                                 test_and_clear_bit(KVM_APIC_INIT)
>                                    process INIT
>  set INIT
>  set SIPI
>                                 test_and_clear_bit(KVM_APIC_SIPI)
>                                    process SIPI
> 
> Two INITs before SIPI are essential to trigger the bug

I see now.  Let's draw pending_events as well:

    event sent           event processed            pending_events
      INIT                                                INIT
                               INIT                        0
      INIT                                                INIT
      SIPI                                              INIT|SIPI
                               SIPI                       INIT
                               INIT                         0

Events are reordered, there is indeed a bug if the second INIT comes at
just the right time.  With your patch:

    event sent           event processed            pending_events
      INIT                                                INIT
                               INIT                        0
      INIT                                                INIT
      SIPI                                              INIT|SIPI
                          SIPI, failed cmpxchg          INIT|SIPI
                               INIT                       SIPI
                               SIPI                       SIPI

The patch introduces a spurious SIPI, that's worse than coalescing.
With my patch:

    event sent           event processed            pending_events
      INIT                                                INIT
                               INIT                        0
      INIT                                                INIT
      SIPI                                              INIT|SIPI
                          (failed cmpxchg)              INIT|SIPI
                               INIT                       SIPI
                               SIPI                        0

My patch looks better to me for this scenario.

> and coincidentally this is what spec advices to do.

The spec advises INIT-SIPI-SIPI, not INIT-INIT-SIPI.  This is because
the first INIT may have been processed late, and SIPIs are masked if not
in wait-for-SIPI state.  You need to send the second just in case.  It
is not needed in KVM because INITs effectively unmask the SIPI
immediately, even though the INIT may take place a bit later.

The INIT-SIPI-SIPI sequence is handled correctly by KVM thanks to the
check on the mp-state.  But your patch breaks another corner case:

    event sent           event processed            pending_events
      INIT                                                INIT
                               INIT                        0
      SIPI                                                SIPI
                               SIPI                        0
      SIPI                                                SIPI
                           ignored SIPI                   SIPI

  set_mp_state(INIT_RECEIVED)                             SIPI
                               SIPI                        0

With my patch, or no patch at all:

    event sent           event processed            pending_events
      INIT                                                INIT
                               INIT                        0
      SIPI                                                SIPI
                               SIPI                        0
      SIPI                                                SIPI
                           ignored SIPI                    0
  set_mp_state(INIT_RECEIVED)                              0

Though perhaps the real bug here is in kvm_arch_vcpu_ioctl_setmp_state.
Setting the mp_state to anything bug SIPI_RECEIVED should clear the SIPI
event.

Paolo

>>    event sent           event processed            pending_events
>>      INIT                                                 INIT
>>      SIPI                                               INIT|SIPI
>>                               INIT                        SIPI
>>                               SIPI                         0
>>      INIT                                                 INIT
>>      SIPI                                               INIT|SIPI
>>                               INIT                        SIPI
>>                               SIPI                         0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov May 31, 2013, 9:18 a.m. UTC | #17
On Fri, May 31, 2013 at 10:48:32AM +0200, Paolo Bonzini wrote:
> Il 31/05/2013 06:36, Gleb Natapov ha scritto:
> > In my commit message there is two INITs in a row:
> >  vpu0:                            vcpu1:
> >  set INIT
> >                                 test_and_clear_bit(KVM_APIC_INIT)
> >                                    process INIT
> >  set INIT
> >  set SIPI
> >                                 test_and_clear_bit(KVM_APIC_SIPI)
> >                                    process SIPI
> > 
> > Two INITs before SIPI are essential to trigger the bug
> 
> I see now.  Let's draw pending_events as well:
> 
>     event sent           event processed            pending_events
>       INIT                                                INIT
>                                INIT                        0
>       INIT                                                INIT
>       SIPI                                              INIT|SIPI
>                                SIPI                       INIT
>                                INIT                         0
> 
> Events are reordered, there is indeed a bug if the second INIT comes at
> just the right time.  With your patch:
> 
>     event sent           event processed            pending_events
>       INIT                                                INIT
>                                INIT                        0
>       INIT                                                INIT
>       SIPI                                              INIT|SIPI
>                           SIPI, failed cmpxchg          INIT|SIPI
>                                INIT                       SIPI
>                                SIPI                       SIPI
> 
This is incorrect. cmpxchg will fail only if another INIT cames after SIPI.
Why  would it fail?

> The patch introduces a spurious SIPI, that's worse than coalescing.
> With my patch:
> 
>     event sent           event processed            pending_events
>       INIT                                                INIT
>                                INIT                        0
>       INIT                                                INIT
>       SIPI                                              INIT|SIPI
>                           (failed cmpxchg)              INIT|SIPI
>                                INIT                       SIPI
>                                SIPI                        0
> 
> My patch looks better to me for this scenario.
> 
> > and coincidentally this is what spec advices to do.
> 
> The spec advises INIT-SIPI-SIPI, not INIT-INIT-SIPI.  This is because
> the first INIT may have been processed late, and SIPIs are masked if not
> in wait-for-SIPI state.  You need to send the second just in case.  It
> is not needed in KVM because INITs effectively unmask the SIPI
> immediately, even though the INIT may take place a bit later.
> 
OK, I said this from memory since I cannot check the spec now. In this
case we need to fix unit test too.

> The INIT-SIPI-SIPI sequence is handled correctly by KVM thanks to the
> check on the mp-state.  But your patch breaks another corner case:
> 
>     event sent           event processed            pending_events
>       INIT                                                INIT
>                                INIT                        0
>       SIPI                                                SIPI
>                                SIPI                        0
>       SIPI                                                SIPI
>                            ignored SIPI                   SIPI
> 
>   set_mp_state(INIT_RECEIVED)                             SIPI
>                                SIPI                        0
> 
> With my patch, or no patch at all:
> 
>     event sent           event processed            pending_events
>       INIT                                                INIT
>                                INIT                        0
>       SIPI                                                SIPI
>                                SIPI                        0
>       SIPI                                                SIPI
>                            ignored SIPI                    0
>   set_mp_state(INIT_RECEIVED)                              0
> 
> Though perhaps the real bug here is in kvm_arch_vcpu_ioctl_setmp_state.

Looks like it, also in my patch we can always call cmpxchg to clear
SIPI.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 31, 2013, 9:48 a.m. UTC | #18
Il 31/05/2013 11:18, Gleb Natapov ha scritto:
> On Fri, May 31, 2013 at 10:48:32AM +0200, Paolo Bonzini wrote:
>> Il 31/05/2013 06:36, Gleb Natapov ha scritto:
>>> In my commit message there is two INITs in a row:
>>>  vpu0:                            vcpu1:
>>>  set INIT
>>>                                 test_and_clear_bit(KVM_APIC_INIT)
>>>                                    process INIT
>>>  set INIT
>>>  set SIPI
>>>                                 test_and_clear_bit(KVM_APIC_SIPI)
>>>                                    process SIPI
>>>
>>> Two INITs before SIPI are essential to trigger the bug
>>
>> I see now.  Let's draw pending_events as well:
>>
>>     event sent           event processed            pending_events
>>       INIT                                                INIT
>>                                INIT                        0
>>       INIT                                                INIT
>>       SIPI                                              INIT|SIPI
>>                                SIPI                       INIT
>>                                INIT                         0
>>
>> Events are reordered, there is indeed a bug if the second INIT comes at
>> just the right time.  With your patch:
>>
>>     event sent           event processed            pending_events
>>       INIT                                                INIT
>>                                INIT                        0
>>       INIT                                                INIT
>>       SIPI                                              INIT|SIPI
>>                           SIPI, failed cmpxchg          INIT|SIPI
>>                                INIT                       SIPI
>>                                SIPI                       SIPI
>
> This is incorrect. cmpxchg will fail only if another INIT cames after SIPI.
> Why  would it fail?

You're right.

Can you show what is the case in my patch where you have coalescing?  I
still prefer it because it is a smaller change, it keeps the "clear a
bit before processing" idea that you find almost everywhere.  Changing
it to "clear a bit after processing" is a bigger and more surprising
change, though both are indeed tricky.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e1adbb4..36bc308 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -720,8 +720,12 @@  out:
 		break;
 
 	case APIC_DM_INIT:
-		if (!trig_mode || level) {
+		if ((!trig_mode || level) &&
+		    vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED) {
 			result = 1;
+
+			/* check mp_state before writing apic->pending_events */
+			smp_mb();
 			/* assumes that there are only KVM_APIC_INIT/SIPI */
 			apic->pending_events = (1UL << KVM_APIC_INIT);
 			/* make sure pending_events is visible before sending
@@ -1865,13 +1869,17 @@  void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 	if (!kvm_vcpu_has_lapic(vcpu))
 		return;
 
-	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
+	if (test_bit(KVM_APIC_INIT, &apic->pending_events)) {
 		kvm_lapic_reset(vcpu);
 		kvm_vcpu_reset(vcpu);
 		if (kvm_vcpu_is_bsp(apic->vcpu))
 			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 		else
 			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
+
+		/* write mp_state before toggling KVM_APIC_INIT */
+		smp_mb__before_clear_bit();
+		clear_bit(KVM_APIC_INIT, &apic->pending_events);
 	}
 	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
 	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {