Message ID | 51A4B5CA.9070109@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote: > Il 28/05/2013 14:56, Gleb Natapov ha scritto: > >> > else > >> > vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > >> > } > >> > - if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && > >> > + /* > >> > + * Note that we may get another INIT+SIPI sequence right here; process > >> > + * the INIT first. Assumes that there are only KVM_APIC_INIT/SIPI. > >> > + */ > >> > + if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI && > >> > vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { > > Because pending_events can be INIT/SIPI at this point and it should be > > interpreted as: do SIPI and ignore INIT (atomically). > > My patch does "do another INIT (which will have no effect) and do SIPI > after that INIT", which is different but has almost the same effect. > If pending_events is INIT/SIPI, it ignores the SIPI for now and lets > the next iteration of kvm_apic_accept_events do both. The difference > would be that in a carefully-timed sequence of interrupts > You assume that the next processing will actually happen, but this is not necessary the case. > INIT-INIT-SIPI-INIT-SIPI > > your version would do many SIPIs, while mine would do just one. > > Hmm... there is a reference to this in 25.2 "Other causes of VM exits": > "If a logical processor is in the wait-for-SIPI state, INIT signals are > blocked. They do not cause VM exits in this case." It is not for the > physical processor, but it makes sense to have the same thing. Is this > the reason why you did the cmpxchg at the end? > No, but if helps us model proper behaviour for nested guest +1 to it :) But until we handle INIT in nested virt it is not something that dictates the solution. > But then, there's another way to mask INITs in the wait-for-SIPI > state. Considering that KVM_MP_STATE_INIT_RECEIVED is really a > wait-for-SIPI, you can do: > Haven't checked it for races (especially races between multiple CPUS sending INIT), but looks more complicated to me. > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index e1adbb4..36bc308 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -720,8 +720,12 @@ out: > break; > > case APIC_DM_INIT: > - if (!trig_mode || level) { > + if ((!trig_mode || level) && > + vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED) { > result = 1; > + > + /* check mp_state before writing apic->pending_events */ > + smp_mb(); > /* assumes that there are only KVM_APIC_INIT/SIPI */ > apic->pending_events = (1UL << KVM_APIC_INIT); > /* make sure pending_events is visible before sending > @@ -1865,13 +1869,17 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu) > if (!kvm_vcpu_has_lapic(vcpu)) > return; > > - if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) { > + if (test_bit(KVM_APIC_INIT, &apic->pending_events)) { > kvm_lapic_reset(vcpu); > kvm_vcpu_reset(vcpu); > if (kvm_vcpu_is_bsp(apic->vcpu)) > vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; > else > vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > + > + /* write mp_state before toggling KVM_APIC_INIT */ > + smp_mb__before_clear_bit(); > + clear_bit(KVM_APIC_INIT, &apic->pending_events); > } > if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && > vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { > > I don't have a particular preference. > > Paolo -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 28/05/2013 17:00, Gleb Natapov ha scritto: > On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote: >> Il 28/05/2013 14:56, Gleb Natapov ha scritto: >>>>> else >>>>> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; >>>>> } >>>>> - if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && >>>>> + /* >>>>> + * Note that we may get another INIT+SIPI sequence right here; process >>>>> + * the INIT first. Assumes that there are only KVM_APIC_INIT/SIPI. >>>>> + */ >>>>> + if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI && >>>>> vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { >>> Because pending_events can be INIT/SIPI at this point and it should be >>> interpreted as: do SIPI and ignore INIT (atomically). >> >> My patch does "do another INIT (which will have no effect) and do SIPI >> after that INIT", which is different but has almost the same effect. >> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets >> the next iteration of kvm_apic_accept_events do both. The difference >> would be that in a carefully-timed sequence of interrupts >> > You assume that the next processing will actually happen, but this is > not necessary the case. Why not? The INIT and SIPI that have just been sent have kicked the VCPU again. >> But then, there's another way to mask INITs in the wait-for-SIPI >> state. Considering that KVM_MP_STATE_INIT_RECEIVED is really a >> wait-for-SIPI, you can do: >> > Haven't checked it for races (especially races between multiple CPUS > sending INIT), but looks more complicated to me. Ok, let's go with yours. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote: > Il 28/05/2013 17:00, Gleb Natapov ha scritto: > > On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote: > >> Il 28/05/2013 14:56, Gleb Natapov ha scritto: > >>>>> else > >>>>> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > >>>>> } > >>>>> - if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && > >>>>> + /* > >>>>> + * Note that we may get another INIT+SIPI sequence right here; process > >>>>> + * the INIT first. Assumes that there are only KVM_APIC_INIT/SIPI. > >>>>> + */ > >>>>> + if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI && > >>>>> vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { > >>> Because pending_events can be INIT/SIPI at this point and it should be > >>> interpreted as: do SIPI and ignore INIT (atomically). > >> > >> My patch does "do another INIT (which will have no effect) and do SIPI > >> after that INIT", which is different but has almost the same effect. > >> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets > >> the next iteration of kvm_apic_accept_events do both. The difference > >> would be that in a carefully-timed sequence of interrupts > >> > > You assume that the next processing will actually happen, but this is > > not necessary the case. > > Why not? The INIT and SIPI that have just been sent have kicked the > VCPU again. > kick is a nop if vcpu thread is not in a halt or in a guest. > >> But then, there's another way to mask INITs in the wait-for-SIPI > >> state. Considering that KVM_MP_STATE_INIT_RECEIVED is really a > >> wait-for-SIPI, you can do: > >> > > Haven't checked it for races (especially races between multiple CPUS > > sending INIT), but looks more complicated to me. > > Ok, let's go with yours. > > Paolo -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 30/05/2013 03:20, Gleb Natapov ha scritto: > On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote: >> Il 28/05/2013 17:00, Gleb Natapov ha scritto: >>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote: >>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto: >>>>>>> else >>>>>>> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; >>>>>>> } >>>>>>> - if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && >>>>>>> + /* >>>>>>> + * Note that we may get another INIT+SIPI sequence right here; process >>>>>>> + * the INIT first. Assumes that there are only KVM_APIC_INIT/SIPI. >>>>>>> + */ >>>>>>> + if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI && >>>>>>> vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { >>>>> Because pending_events can be INIT/SIPI at this point and it should be >>>>> interpreted as: do SIPI and ignore INIT (atomically). >>>> >>>> My patch does "do another INIT (which will have no effect) and do SIPI >>>> after that INIT", which is different but has almost the same effect. >>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets >>>> the next iteration of kvm_apic_accept_events do both. The difference >>>> would be that in a carefully-timed sequence of interrupts >>>> >>> You assume that the next processing will actually happen, but this is >>> not necessary the case. >> >> Why not? The INIT and SIPI that have just been sent have kicked the >> VCPU again. > > kick is a nop if vcpu thread is not in a halt or in a guest. But the KVM_REQ_EVENT request will be caught at: if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests || need_resched() || signal_pending(current)) { vcpu->mode = OUTSIDE_GUEST_MODE; smp_wmb(); local_irq_enable(); preempt_enable(); r = 1; goto cancel_injection; } and the entry will be canceled. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote: > Il 30/05/2013 03:20, Gleb Natapov ha scritto: > > On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote: > >> Il 28/05/2013 17:00, Gleb Natapov ha scritto: > >>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote: > >>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto: > >>>>>>> else > >>>>>>> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > >>>>>>> } > >>>>>>> - if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && > >>>>>>> + /* > >>>>>>> + * Note that we may get another INIT+SIPI sequence right here; process > >>>>>>> + * the INIT first. Assumes that there are only KVM_APIC_INIT/SIPI. > >>>>>>> + */ > >>>>>>> + if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI && > >>>>>>> vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { > >>>>> Because pending_events can be INIT/SIPI at this point and it should be > >>>>> interpreted as: do SIPI and ignore INIT (atomically). > >>>> > >>>> My patch does "do another INIT (which will have no effect) and do SIPI > >>>> after that INIT", which is different but has almost the same effect. > >>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets > >>>> the next iteration of kvm_apic_accept_events do both. The difference > >>>> would be that in a carefully-timed sequence of interrupts > >>>> > >>> You assume that the next processing will actually happen, but this is > >>> not necessary the case. > >> > >> Why not? The INIT and SIPI that have just been sent have kicked the > >> VCPU again. > > > > kick is a nop if vcpu thread is not in a halt or in a guest. > > But the KVM_REQ_EVENT request will be caught at: > > if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests > || need_resched() || signal_pending(current)) { > vcpu->mode = OUTSIDE_GUEST_MODE; > smp_wmb(); > local_irq_enable(); > preempt_enable(); > r = 1; > goto cancel_injection; > } > > and the entry will be canceled. > But vcpu may be in non running state so we will not get here. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 30/05/2013 08:01, Gleb Natapov ha scritto: > On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote: >> Il 30/05/2013 03:20, Gleb Natapov ha scritto: >>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote: >>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto: >>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote: >>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto: >>>>>>>>> else >>>>>>>>> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; >>>>>>>>> } >>>>>>>>> - if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && >>>>>>>>> + /* >>>>>>>>> + * Note that we may get another INIT+SIPI sequence right here; process >>>>>>>>> + * the INIT first. Assumes that there are only KVM_APIC_INIT/SIPI. >>>>>>>>> + */ >>>>>>>>> + if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI && >>>>>>>>> vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { >>>>>>> Because pending_events can be INIT/SIPI at this point and it should be >>>>>>> interpreted as: do SIPI and ignore INIT (atomically). >>>>>> >>>>>> My patch does "do another INIT (which will have no effect) and do SIPI >>>>>> after that INIT", which is different but has almost the same effect. >>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets >>>>>> the next iteration of kvm_apic_accept_events do both. The difference >>>>>> would be that in a carefully-timed sequence of interrupts >>>>>> >>>>> You assume that the next processing will actually happen, but this is >>>>> not necessary the case. >>>> >>>> Why not? The INIT and SIPI that have just been sent have kicked the >>>> VCPU again. >>> >>> kick is a nop if vcpu thread is not in a halt or in a guest. >> >> But the KVM_REQ_EVENT request will be caught at: >> >> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests >> || need_resched() || signal_pending(current)) { >> vcpu->mode = OUTSIDE_GUEST_MODE; >> smp_wmb(); >> local_irq_enable(); >> preempt_enable(); >> r = 1; >> goto cancel_injection; >> } >> >> and the entry will be canceled. I was wrong: we exit immediately because state is KVM_MP_STATE_INIT_RECEIVED. But then... > But vcpu may be in non running state so we will not get here. ... vcpu_enter_guest will return 1 and __vcpu_run goes around the while loop once more (modulo pending signals of course). On the next iteration __vcpu_run will call kvm_vcpu_block, which calls kvm_arch_vcpu_runnable. kvm_arch_vcpu_runnable returns true because kvm_apic_has_events(vcpu) is also true. This will set KVM_REQ_UNHALT, call kvm_apic_accept_events again and do the INIT+SIPI. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 30, 2013 at 08:31:11AM +0200, Paolo Bonzini wrote: > Il 30/05/2013 08:01, Gleb Natapov ha scritto: > > On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote: > >> Il 30/05/2013 03:20, Gleb Natapov ha scritto: > >>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote: > >>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto: > >>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote: > >>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto: > >>>>>>>>> else > >>>>>>>>> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > >>>>>>>>> } > >>>>>>>>> - if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && > >>>>>>>>> + /* > >>>>>>>>> + * Note that we may get another INIT+SIPI sequence right here; process > >>>>>>>>> + * the INIT first. Assumes that there are only KVM_APIC_INIT/SIPI. > >>>>>>>>> + */ > >>>>>>>>> + if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI && > >>>>>>>>> vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { > >>>>>>> Because pending_events can be INIT/SIPI at this point and it should be > >>>>>>> interpreted as: do SIPI and ignore INIT (atomically). > >>>>>> > >>>>>> My patch does "do another INIT (which will have no effect) and do SIPI > >>>>>> after that INIT", which is different but has almost the same effect. > >>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets > >>>>>> the next iteration of kvm_apic_accept_events do both. The difference > >>>>>> would be that in a carefully-timed sequence of interrupts > >>>>>> > >>>>> You assume that the next processing will actually happen, but this is > >>>>> not necessary the case. > >>>> > >>>> Why not? The INIT and SIPI that have just been sent have kicked the > >>>> VCPU again. > >>> > >>> kick is a nop if vcpu thread is not in a halt or in a guest. > >> > >> But the KVM_REQ_EVENT request will be caught at: > >> > >> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests > >> || need_resched() || signal_pending(current)) { > >> vcpu->mode = OUTSIDE_GUEST_MODE; > >> smp_wmb(); > >> local_irq_enable(); > >> preempt_enable(); > >> r = 1; > >> goto cancel_injection; > >> } > >> > >> and the entry will be canceled. > > I was wrong: we exit immediately because state is > KVM_MP_STATE_INIT_RECEIVED. But then... > > > But vcpu may be in non running state so we will not get here. > > ... vcpu_enter_guest will return 1 and __vcpu_run goes around the while > loop once more (modulo pending signals of course). > > On the next iteration __vcpu_run will call kvm_vcpu_block, which calls > kvm_arch_vcpu_runnable. kvm_arch_vcpu_runnable returns true because > kvm_apic_has_events(vcpu) is also true. This will set KVM_REQ_UNHALT, > call kvm_apic_accept_events again and do the INIT+SIPI. > Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not lose the event. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 30/05/2013 09:09, Gleb Natapov ha scritto: > On Thu, May 30, 2013 at 08:31:11AM +0200, Paolo Bonzini wrote: >> Il 30/05/2013 08:01, Gleb Natapov ha scritto: >>> On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote: >>>> Il 30/05/2013 03:20, Gleb Natapov ha scritto: >>>>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote: >>>>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto: >>>>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote: >>>>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto: >>>>>>>>>>> else >>>>>>>>>>> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; >>>>>>>>>>> } >>>>>>>>>>> - if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && >>>>>>>>>>> + /* >>>>>>>>>>> + * Note that we may get another INIT+SIPI sequence right here; process >>>>>>>>>>> + * the INIT first. Assumes that there are only KVM_APIC_INIT/SIPI. >>>>>>>>>>> + */ >>>>>>>>>>> + if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI && >>>>>>>>>>> vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { >>>>>>>>> Because pending_events can be INIT/SIPI at this point and it should be >>>>>>>>> interpreted as: do SIPI and ignore INIT (atomically). >>>>>>>> >>>>>>>> My patch does "do another INIT (which will have no effect) and do SIPI >>>>>>>> after that INIT", which is different but has almost the same effect. >>>>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets >>>>>>>> the next iteration of kvm_apic_accept_events do both. The difference >>>>>>>> would be that in a carefully-timed sequence of interrupts >>>>>>>> >>>>>>> You assume that the next processing will actually happen, but this is >>>>>>> not necessary the case. >>>>>> >>>>>> Why not? The INIT and SIPI that have just been sent have kicked the >>>>>> VCPU again. >>>>> >>>>> kick is a nop if vcpu thread is not in a halt or in a guest. >>>> >>>> But the KVM_REQ_EVENT request will be caught at: >>>> >>>> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests >>>> || need_resched() || signal_pending(current)) { >>>> vcpu->mode = OUTSIDE_GUEST_MODE; >>>> smp_wmb(); >>>> local_irq_enable(); >>>> preempt_enable(); >>>> r = 1; >>>> goto cancel_injection; >>>> } >>>> >>>> and the entry will be canceled. >> >> I was wrong: we exit immediately because state is >> KVM_MP_STATE_INIT_RECEIVED. But then... >> >>> But vcpu may be in non running state so we will not get here. >> >> ... vcpu_enter_guest will return 1 and __vcpu_run goes around the while >> loop once more (modulo pending signals of course). >> >> On the next iteration __vcpu_run will call kvm_vcpu_block, which calls >> kvm_arch_vcpu_runnable. kvm_arch_vcpu_runnable returns true because >> kvm_apic_has_events(vcpu) is also true. This will set KVM_REQ_UNHALT, >> call kvm_apic_accept_events again and do the INIT+SIPI. > > Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not > lose the event. Ok, then I'd prefer to have the cmpxchg directly in the if, as in http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505 Thanks for the discussion! Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 30, 2013 at 09:30:41AM +0200, Paolo Bonzini wrote: > Il 30/05/2013 09:09, Gleb Natapov ha scritto: > > On Thu, May 30, 2013 at 08:31:11AM +0200, Paolo Bonzini wrote: > >> Il 30/05/2013 08:01, Gleb Natapov ha scritto: > >>> On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote: > >>>> Il 30/05/2013 03:20, Gleb Natapov ha scritto: > >>>>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote: > >>>>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto: > >>>>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote: > >>>>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto: > >>>>>>>>>>> else > >>>>>>>>>>> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > >>>>>>>>>>> } > >>>>>>>>>>> - if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && > >>>>>>>>>>> + /* > >>>>>>>>>>> + * Note that we may get another INIT+SIPI sequence right here; process > >>>>>>>>>>> + * the INIT first. Assumes that there are only KVM_APIC_INIT/SIPI. > >>>>>>>>>>> + */ > >>>>>>>>>>> + if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI && > >>>>>>>>>>> vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { > >>>>>>>>> Because pending_events can be INIT/SIPI at this point and it should be > >>>>>>>>> interpreted as: do SIPI and ignore INIT (atomically). > >>>>>>>> > >>>>>>>> My patch does "do another INIT (which will have no effect) and do SIPI > >>>>>>>> after that INIT", which is different but has almost the same effect. > >>>>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets > >>>>>>>> the next iteration of kvm_apic_accept_events do both. The difference > >>>>>>>> would be that in a carefully-timed sequence of interrupts > >>>>>>>> > >>>>>>> You assume that the next processing will actually happen, but this is > >>>>>>> not necessary the case. > >>>>>> > >>>>>> Why not? The INIT and SIPI that have just been sent have kicked the > >>>>>> VCPU again. > >>>>> > >>>>> kick is a nop if vcpu thread is not in a halt or in a guest. > >>>> > >>>> But the KVM_REQ_EVENT request will be caught at: > >>>> > >>>> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests > >>>> || need_resched() || signal_pending(current)) { > >>>> vcpu->mode = OUTSIDE_GUEST_MODE; > >>>> smp_wmb(); > >>>> local_irq_enable(); > >>>> preempt_enable(); > >>>> r = 1; > >>>> goto cancel_injection; > >>>> } > >>>> > >>>> and the entry will be canceled. > >> > >> I was wrong: we exit immediately because state is > >> KVM_MP_STATE_INIT_RECEIVED. But then... > >> > >>> But vcpu may be in non running state so we will not get here. > >> > >> ... vcpu_enter_guest will return 1 and __vcpu_run goes around the while > >> loop once more (modulo pending signals of course). > >> > >> On the next iteration __vcpu_run will call kvm_vcpu_block, which calls > >> kvm_arch_vcpu_runnable. kvm_arch_vcpu_runnable returns true because > >> kvm_apic_has_events(vcpu) is also true. This will set KVM_REQ_UNHALT, > >> call kvm_apic_accept_events again and do the INIT+SIPI. > > > > Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not > > lose the event. > > Ok, then I'd prefer to have the cmpxchg directly in the if, as in > http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505 > I still do not. Both of them are tricky, mine does not coalesce events needlessly. > Thanks for the discussion! > > Paolo -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 30/05/2013 14:34, Gleb Natapov ha scritto: >>> > > >>> > > Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not >>> > > lose the event. >> > >> > Ok, then I'd prefer to have the cmpxchg directly in the if, as in >> > http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505 >> > > I still do not. Both of them are tricky, mine does not coalesce events > needlessly. Agreed that both are tricky, but I don't think my patch is coalescing events. If you have INIT SIPI INIT SIPI ^ ^ INIT bit cleared here SIPI bit checked here my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI bit was set by the second SIPI, not by the first. In fact the first SIPI was cancelled by the second INIT, and thus should not be processed at all. Instead, with your patch KVM will service all four events; strictly speaking it is wrong to service the first SIPI, which is why I prefer having the cmpxchg in the beginning. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote: > Il 30/05/2013 14:34, Gleb Natapov ha scritto: > >>> > > > >>> > > Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not > >>> > > lose the event. > >> > > >> > Ok, then I'd prefer to have the cmpxchg directly in the if, as in > >> > http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505 > >> > > > I still do not. Both of them are tricky, mine does not coalesce events > > needlessly. > > Agreed that both are tricky, but I don't think my patch is coalescing > events. If you have > > INIT SIPI INIT SIPI > ^ ^ > INIT bit cleared here SIPI bit checked here > Not sure I understand what you are trying to say here. > my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI > bit was set by the second SIPI, not by the first. In fact the first > SIPI was cancelled by the second INIT, and thus should not be processed > at all. That is called coalesced. > > Instead, with your patch KVM will service all four events; strictly > speaking it is wrong to service the first SIPI, which is why I prefer > having the cmpxchg in the beginning. > y Why is it wrong? I do not see what are you arguing about. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 30/05/2013 15:10, Gleb Natapov ha scritto: > On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote: >> Il 30/05/2013 14:34, Gleb Natapov ha scritto: >>>>>>> >>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not >>>>>>> lose the event. >>>>> >>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in >>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505 >>>>> >>> I still do not. Both of them are tricky, mine does not coalesce events >>> needlessly. >> >> Agreed that both are tricky, but I don't think my patch is coalescing >> events. If you have >> >> INIT SIPI INIT SIPI >> ^ ^ >> INIT bit cleared here SIPI bit checked here >> > Not sure I understand what you are trying to say here. I'll redo the picture below. >> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI >> bit was set by the second SIPI, not by the first. In fact the first >> SIPI was cancelled by the second INIT, and thus should not be processed >> at all. > That is called coalesced. Coalescing would be something like INIT SIPI SIPI -> INIT SIPI. This is not coalescing, it is proper detection of a cancelled SIPI. We have: event sent event processed pending_events INIT INIT SIPI INIT|SIPI INIT SIPI XX INIT INIT SIPI INIT|SIPI YY SIPI INIT|SIPI failed cmpxchg INIT|SIPI INIT SIPI SIPI 0 At the line I marked with YY, you're processing an interrupt that is not anymore in the pending_events. It was dropped at point XX. With my patch it is: event sent event processed pending_events INIT INIT SIPI INIT|SIPI INIT SIPI XX INIT INIT SIPI INIT|SIPI failed cmpxchg INIT|SIPI INIT SIPI SIPI 0 >> Instead, with your patch KVM will service all four events; strictly >> speaking it is wrong to service the first SIPI, which is why I prefer >> having the cmpxchg in the beginning. > > Why is it wrong? Because the first SIPI was dropped atomically with the triggering of the second INIT, it's as if you were handling it twice. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 30, 2013 at 03:23:35PM +0200, Paolo Bonzini wrote: > Il 30/05/2013 15:10, Gleb Natapov ha scritto: > > On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote: > >> Il 30/05/2013 14:34, Gleb Natapov ha scritto: > >>>>>>> > >>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not > >>>>>>> lose the event. > >>>>> > >>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in > >>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505 > >>>>> > >>> I still do not. Both of them are tricky, mine does not coalesce events > >>> needlessly. > >> > >> Agreed that both are tricky, but I don't think my patch is coalescing > >> events. If you have > >> > >> INIT SIPI INIT SIPI > >> ^ ^ > >> INIT bit cleared here SIPI bit checked here > >> > > Not sure I understand what you are trying to say here. > > I'll redo the picture below. > > >> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI > >> bit was set by the second SIPI, not by the first. In fact the first > >> SIPI was cancelled by the second INIT, and thus should not be processed > >> at all. > > That is called coalesced. > > Coalescing would be something like INIT SIPI SIPI -> INIT SIPI. This is > not coalescing, it is proper detection of a cancelled SIPI. We have: > > event sent event processed pending_events > INIT INIT > SIPI INIT|SIPI > INIT SIPI > XX INIT INIT > SIPI INIT|SIPI > YY SIPI INIT|SIPI > failed cmpxchg INIT|SIPI > INIT SIPI > SIPI 0 > At this point I am not even sure that you understand what problem the patch is fixing, because the bug is not event triggered by above sequence. > At the line I marked with YY, you're processing an interrupt that is not > anymore in the pending_events. It was dropped at point XX. > > With my patch it is: > > > event sent event processed pending_events > INIT INIT > SIPI INIT|SIPI > INIT SIPI > XX INIT INIT > SIPI INIT|SIPI > failed cmpxchg INIT|SIPI > INIT SIPI > SIPI 0 > > >> Instead, with your patch KVM will service all four events; strictly > >> speaking it is wrong to service the first SIPI, which is why I prefer > >> having the cmpxchg in the beginning. > > > > Why is it wrong? > > Because the first SIPI was dropped atomically with the triggering of the > second INIT, it's as if you were handling it twice. > No, you were slow to process first SIPI, so second INIT was sent because vcpu appears to be dead, so instead of processing both you process last. This is exactly what coalescing means, you are just trying to present is as something desirable. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 30/05/2013 15:35, Gleb Natapov ha scritto: > On Thu, May 30, 2013 at 03:23:35PM +0200, Paolo Bonzini wrote: >> Il 30/05/2013 15:10, Gleb Natapov ha scritto: >>> On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote: >>>> Il 30/05/2013 14:34, Gleb Natapov ha scritto: >>>>>>>>> >>>>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not >>>>>>>>> lose the event. >>>>>>> >>>>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in >>>>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505 >>>>>>> >>>>> I still do not. Both of them are tricky, mine does not coalesce events >>>>> needlessly. >>>> >>>> Agreed that both are tricky, but I don't think my patch is coalescing >>>> events. If you have >>>> >>>> INIT SIPI INIT SIPI >>>> ^ ^ >>>> INIT bit cleared here SIPI bit checked here >>>> >>> Not sure I understand what you are trying to say here. >> >> I'll redo the picture below. >> >>>> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI >>>> bit was set by the second SIPI, not by the first. In fact the first >>>> SIPI was cancelled by the second INIT, and thus should not be processed >>>> at all. >>> That is called coalesced. >> >> Coalescing would be something like INIT SIPI SIPI -> INIT SIPI. This is >> not coalescing, it is proper detection of a cancelled SIPI. We have: >> >> event sent event processed pending_events >> INIT INIT >> SIPI INIT|SIPI >> INIT SIPI >> XX INIT INIT >> SIPI INIT|SIPI >> YY SIPI INIT|SIPI >> failed cmpxchg INIT|SIPI >> INIT SIPI >> SIPI 0 >> > At this point I am not even sure that you understand what problem the patch > is fixing, because the bug is not event triggered by above sequence. Maybe. >> Because the first SIPI was dropped atomically with the triggering of the >> second INIT, it's as if you were handling it twice. >> > No, you were slow to process first SIPI, so second INIT was sent because > vcpu appears to be dead, so instead of processing both you process last. Can you draw the events that happen? What I drew above is based on the commit message. Instead what I understand from this explanation is: event sent event processed pending_events INIT INIT SIPI INIT|SIPI INIT SIPI SIPI 0 INIT INIT SIPI INIT|SIPI INIT SIPI SIPI 0 Then my patch has absolutely no effect, the cmpxchg succeeds. With your patch instead I get: event sent event processed pending_events INIT INIT SIPI INIT|SIPI INIT SIPI SIPI ... INIT INIT SIPI INIT|SIPI failed cmpxchg INIT|SIPI INIT SIPI SIPI ... successful cmpxchg 0 But there is no difference in the actual set of events that was processed. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 30, 2013 at 04:15:35PM +0200, Paolo Bonzini wrote: > Il 30/05/2013 15:35, Gleb Natapov ha scritto: > > On Thu, May 30, 2013 at 03:23:35PM +0200, Paolo Bonzini wrote: > >> Il 30/05/2013 15:10, Gleb Natapov ha scritto: > >>> On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote: > >>>> Il 30/05/2013 14:34, Gleb Natapov ha scritto: > >>>>>>>>> > >>>>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not > >>>>>>>>> lose the event. > >>>>>>> > >>>>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in > >>>>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505 > >>>>>>> > >>>>> I still do not. Both of them are tricky, mine does not coalesce events > >>>>> needlessly. > >>>> > >>>> Agreed that both are tricky, but I don't think my patch is coalescing > >>>> events. If you have > >>>> > >>>> INIT SIPI INIT SIPI > >>>> ^ ^ > >>>> INIT bit cleared here SIPI bit checked here > >>>> > >>> Not sure I understand what you are trying to say here. > >> > >> I'll redo the picture below. > >> > >>>> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI > >>>> bit was set by the second SIPI, not by the first. In fact the first > >>>> SIPI was cancelled by the second INIT, and thus should not be processed > >>>> at all. > >>> That is called coalesced. > >> > >> Coalescing would be something like INIT SIPI SIPI -> INIT SIPI. This is > >> not coalescing, it is proper detection of a cancelled SIPI. We have: > >> > >> event sent event processed pending_events > >> INIT INIT > >> SIPI INIT|SIPI > >> INIT SIPI > >> XX INIT INIT > >> SIPI INIT|SIPI > >> YY SIPI INIT|SIPI > >> failed cmpxchg INIT|SIPI > >> INIT SIPI > >> SIPI 0 > >> > > At this point I am not even sure that you understand what problem the patch > > is fixing, because the bug is not event triggered by above sequence. > > Maybe. > > >> Because the first SIPI was dropped atomically with the triggering of the > >> second INIT, it's as if you were handling it twice. > >> > > No, you were slow to process first SIPI, so second INIT was sent because > > vcpu appears to be dead, so instead of processing both you process last. > > Can you draw the events that happen? > I did, in commit message. > What I drew above is based on the commit message. Instead what I > understand from this explanation is: > It is definitely not based on my commit message :) In my commit message there is two INITs in a row: vpu0: vcpu1: set INIT test_and_clear_bit(KVM_APIC_INIT) process INIT set INIT set SIPI test_and_clear_bit(KVM_APIC_SIPI) process SIPI Two INITs before SIPI are essential to trigger the bug and coincidentally this is what spec advices to do. > event sent event processed pending_events > INIT INIT > SIPI INIT|SIPI > INIT SIPI > SIPI 0 > INIT INIT > SIPI INIT|SIPI > INIT SIPI > SIPI 0 > > Then my patch has absolutely no effect, the cmpxchg succeeds. With your > patch instead I get: > > > event sent event processed pending_events > INIT INIT > SIPI INIT|SIPI > INIT SIPI > SIPI ... > INIT INIT > SIPI INIT|SIPI > failed cmpxchg INIT|SIPI > INIT SIPI > SIPI ... > successful cmpxchg 0 > > But there is no difference in the actual set of events that was processed. > I do not get what you are trying to tell with this. The scenario you are repeatedly describing works with your path, with my patch and without any patch at all. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 31/05/2013 06:36, Gleb Natapov ha scritto: > In my commit message there is two INITs in a row: > vpu0: vcpu1: > set INIT > test_and_clear_bit(KVM_APIC_INIT) > process INIT > set INIT > set SIPI > test_and_clear_bit(KVM_APIC_SIPI) > process SIPI > > Two INITs before SIPI are essential to trigger the bug I see now. Let's draw pending_events as well: event sent event processed pending_events INIT INIT INIT 0 INIT INIT SIPI INIT|SIPI SIPI INIT INIT 0 Events are reordered, there is indeed a bug if the second INIT comes at just the right time. With your patch: event sent event processed pending_events INIT INIT INIT 0 INIT INIT SIPI INIT|SIPI SIPI, failed cmpxchg INIT|SIPI INIT SIPI SIPI SIPI The patch introduces a spurious SIPI, that's worse than coalescing. With my patch: event sent event processed pending_events INIT INIT INIT 0 INIT INIT SIPI INIT|SIPI (failed cmpxchg) INIT|SIPI INIT SIPI SIPI 0 My patch looks better to me for this scenario. > and coincidentally this is what spec advices to do. The spec advises INIT-SIPI-SIPI, not INIT-INIT-SIPI. This is because the first INIT may have been processed late, and SIPIs are masked if not in wait-for-SIPI state. You need to send the second just in case. It is not needed in KVM because INITs effectively unmask the SIPI immediately, even though the INIT may take place a bit later. The INIT-SIPI-SIPI sequence is handled correctly by KVM thanks to the check on the mp-state. But your patch breaks another corner case: event sent event processed pending_events INIT INIT INIT 0 SIPI SIPI SIPI 0 SIPI SIPI ignored SIPI SIPI set_mp_state(INIT_RECEIVED) SIPI SIPI 0 With my patch, or no patch at all: event sent event processed pending_events INIT INIT INIT 0 SIPI SIPI SIPI 0 SIPI SIPI ignored SIPI 0 set_mp_state(INIT_RECEIVED) 0 Though perhaps the real bug here is in kvm_arch_vcpu_ioctl_setmp_state. Setting the mp_state to anything bug SIPI_RECEIVED should clear the SIPI event. Paolo >> event sent event processed pending_events >> INIT INIT >> SIPI INIT|SIPI >> INIT SIPI >> SIPI 0 >> INIT INIT >> SIPI INIT|SIPI >> INIT SIPI >> SIPI 0 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 31, 2013 at 10:48:32AM +0200, Paolo Bonzini wrote: > Il 31/05/2013 06:36, Gleb Natapov ha scritto: > > In my commit message there is two INITs in a row: > > vpu0: vcpu1: > > set INIT > > test_and_clear_bit(KVM_APIC_INIT) > > process INIT > > set INIT > > set SIPI > > test_and_clear_bit(KVM_APIC_SIPI) > > process SIPI > > > > Two INITs before SIPI are essential to trigger the bug > > I see now. Let's draw pending_events as well: > > event sent event processed pending_events > INIT INIT > INIT 0 > INIT INIT > SIPI INIT|SIPI > SIPI INIT > INIT 0 > > Events are reordered, there is indeed a bug if the second INIT comes at > just the right time. With your patch: > > event sent event processed pending_events > INIT INIT > INIT 0 > INIT INIT > SIPI INIT|SIPI > SIPI, failed cmpxchg INIT|SIPI > INIT SIPI > SIPI SIPI > This is incorrect. cmpxchg will fail only if another INIT cames after SIPI. Why would it fail? > The patch introduces a spurious SIPI, that's worse than coalescing. > With my patch: > > event sent event processed pending_events > INIT INIT > INIT 0 > INIT INIT > SIPI INIT|SIPI > (failed cmpxchg) INIT|SIPI > INIT SIPI > SIPI 0 > > My patch looks better to me for this scenario. > > > and coincidentally this is what spec advices to do. > > The spec advises INIT-SIPI-SIPI, not INIT-INIT-SIPI. This is because > the first INIT may have been processed late, and SIPIs are masked if not > in wait-for-SIPI state. You need to send the second just in case. It > is not needed in KVM because INITs effectively unmask the SIPI > immediately, even though the INIT may take place a bit later. > OK, I said this from memory since I cannot check the spec now. In this case we need to fix unit test too. > The INIT-SIPI-SIPI sequence is handled correctly by KVM thanks to the > check on the mp-state. But your patch breaks another corner case: > > event sent event processed pending_events > INIT INIT > INIT 0 > SIPI SIPI > SIPI 0 > SIPI SIPI > ignored SIPI SIPI > > set_mp_state(INIT_RECEIVED) SIPI > SIPI 0 > > With my patch, or no patch at all: > > event sent event processed pending_events > INIT INIT > INIT 0 > SIPI SIPI > SIPI 0 > SIPI SIPI > ignored SIPI 0 > set_mp_state(INIT_RECEIVED) 0 > > Though perhaps the real bug here is in kvm_arch_vcpu_ioctl_setmp_state. Looks like it, also in my patch we can always call cmpxchg to clear SIPI. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 31/05/2013 11:18, Gleb Natapov ha scritto: > On Fri, May 31, 2013 at 10:48:32AM +0200, Paolo Bonzini wrote: >> Il 31/05/2013 06:36, Gleb Natapov ha scritto: >>> In my commit message there is two INITs in a row: >>> vpu0: vcpu1: >>> set INIT >>> test_and_clear_bit(KVM_APIC_INIT) >>> process INIT >>> set INIT >>> set SIPI >>> test_and_clear_bit(KVM_APIC_SIPI) >>> process SIPI >>> >>> Two INITs before SIPI are essential to trigger the bug >> >> I see now. Let's draw pending_events as well: >> >> event sent event processed pending_events >> INIT INIT >> INIT 0 >> INIT INIT >> SIPI INIT|SIPI >> SIPI INIT >> INIT 0 >> >> Events are reordered, there is indeed a bug if the second INIT comes at >> just the right time. With your patch: >> >> event sent event processed pending_events >> INIT INIT >> INIT 0 >> INIT INIT >> SIPI INIT|SIPI >> SIPI, failed cmpxchg INIT|SIPI >> INIT SIPI >> SIPI SIPI > > This is incorrect. cmpxchg will fail only if another INIT cames after SIPI. > Why would it fail? You're right. Can you show what is the case in my patch where you have coalescing? I still prefer it because it is a smaller change, it keeps the "clear a bit before processing" idea that you find almost everywhere. Changing it to "clear a bit after processing" is a bigger and more surprising change, though both are indeed tricky. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e1adbb4..36bc308 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -720,8 +720,12 @@ out: break; case APIC_DM_INIT: - if (!trig_mode || level) { + if ((!trig_mode || level) && + vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED) { result = 1; + + /* check mp_state before writing apic->pending_events */ + smp_mb(); /* assumes that there are only KVM_APIC_INIT/SIPI */ apic->pending_events = (1UL << KVM_APIC_INIT); /* make sure pending_events is visible before sending @@ -1865,13 +1869,17 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu) if (!kvm_vcpu_has_lapic(vcpu)) return; - if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) { + if (test_bit(KVM_APIC_INIT, &apic->pending_events)) { kvm_lapic_reset(vcpu); kvm_vcpu_reset(vcpu); if (kvm_vcpu_is_bsp(apic->vcpu)) vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; else vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; + + /* write mp_state before toggling KVM_APIC_INIT */ + smp_mb__before_clear_bit(); + clear_bit(KVM_APIC_INIT, &apic->pending_events); } if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {