Message ID | 1514131983-24305-4-git-send-email-liran.alon@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 2017/12/25 00:12, Liran Alon wrote: > In case posted-interrupt was delivered to CPU while it is in host > (outside guest), then posted-interrupt delivery will be done by > calling sync_pir_to_irr() at vmentry after interrupts are disabled. > > sync_pir_to_irr() will check vmx->pi_desc.control ON bit and if > set, it will sync vmx->pi_desc.pir to IRR and afterwards update RVI to > ensure virtual-interrupt-delivery will dispatch interrupt to guest. > > However, it is possible that L1 will receive a posted-interrupt while > CPU runs at host and is about to enter L2. In this case, the call to > sync_pir_to_irr() will indeed update the L1's APIC IRR but > vcpu_enter_guest() will then just resume into L2 guest without > re-evaluating if it should exit from L2 to L1 as a result of this > new pending L1 event. > > To address this case, if sync_pir_to_irr() has a new L1 injectable > interrupt and CPU is running L2, we force exit GUEST_MODE which will > result in another iteration of vcpu_run() run loop which will call > kvm_vcpu_running() which will call check_nested_events() which will > handle the pending L1 event properly. I agree with this solution.. However ... > Signed-off-by: Liran Alon <liran.alon@oracle.com> > Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> > Reviewed-by: Liam Merwick <liam.merwick@oracle.com> > Signed-off-by: Liam Merwick <liam.merwick@oracle.com> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > --- > arch/x86/kvm/vmx.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 325608a1ed65..d307bf26462a 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -9032,6 +9032,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu) > { > struct vcpu_vmx *vmx = to_vmx(vcpu); > int max_irr; > + bool max_irr_updated; > > WARN_ON(!vcpu->arch.apicv_active); > if (pi_test_on(&vmx->pi_desc)) { > @@ -9041,7 +9042,16 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu) > * But on x86 this is just a compiler barrier anyway. > */ > smp_mb__after_atomic(); > - kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr); > + max_irr_updated = > + kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr); > + > + /* > + * If we are running L2 and L1 has a new pending interrupt > + * which can be injected, we should re-evaluate > + * what should be done with this new L1 interrupt. > + */ > + if (is_guest_mode(vcpu) && max_irr_updated) > + kvm_vcpu_exiting_guest_mode(vcpu); ... refer to "Virtual-Interrupt Delivery": """ Vector ← RVI; VISR[Vector] ← 1; SVI ← Vector; VPPR ← Vector & F0H; VIRR[Vector] ← 0; IF any bits set in VIRR THEN RVI ← highest index of bit set in VIRR ELSE RVI ← 0; FI; deliver interrupt with Vector through IDT; cease recognition of any pending virtual interrupt; """ as we synced PIR to L1's APIC IRR, not only the max_irr is injectable but also the other vectors in L1's APIC IRR. So what we need to check is the new L1 injectable interrupt from PIR, even if it is not the max_irr.. Quan Alibaba Cloud
On 02/01/18 04:45, Quan Xu wrote: > On 2017/12/25 00:12, Liran Alon wrote: > >> In case posted-interrupt was delivered to CPU while it is in host >> (outside guest), then posted-interrupt delivery will be done by >> calling sync_pir_to_irr() at vmentry after interrupts are disabled. >> >> sync_pir_to_irr() will check vmx->pi_desc.control ON bit and if >> set, it will sync vmx->pi_desc.pir to IRR and afterwards update RVI to >> ensure virtual-interrupt-delivery will dispatch interrupt to guest. >> >> However, it is possible that L1 will receive a posted-interrupt while >> CPU runs at host and is about to enter L2. In this case, the call to >> sync_pir_to_irr() will indeed update the L1's APIC IRR but >> vcpu_enter_guest() will then just resume into L2 guest without >> re-evaluating if it should exit from L2 to L1 as a result of this >> new pending L1 event. >> >> To address this case, if sync_pir_to_irr() has a new L1 injectable >> interrupt and CPU is running L2, we force exit GUEST_MODE which will >> result in another iteration of vcpu_run() run loop which will call >> kvm_vcpu_running() which will call check_nested_events() which will >> handle the pending L1 event properly. > > > I agree with this solution.. However ... > > >> Signed-off-by: Liran Alon <liran.alon@oracle.com> >> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> >> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> >> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> >> Signed-off-by: Liam Merwick <liam.merwick@oracle.com> >> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >> --- >> arch/x86/kvm/vmx.c | 12 +++++++++++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 325608a1ed65..d307bf26462a 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -9032,6 +9032,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu >> *vcpu) >> { >> struct vcpu_vmx *vmx = to_vmx(vcpu); >> int max_irr; >> + bool max_irr_updated; >> WARN_ON(!vcpu->arch.apicv_active); >> if (pi_test_on(&vmx->pi_desc)) { >> @@ -9041,7 +9042,16 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu >> *vcpu) >> * But on x86 this is just a compiler barrier anyway. >> */ >> smp_mb__after_atomic(); >> - kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr); >> + max_irr_updated = >> + kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr); >> + >> + /* >> + * If we are running L2 and L1 has a new pending interrupt >> + * which can be injected, we should re-evaluate >> + * what should be done with this new L1 interrupt. >> + */ >> + if (is_guest_mode(vcpu) && max_irr_updated) >> + kvm_vcpu_exiting_guest_mode(vcpu); > > > ... > > refer to "Virtual-Interrupt Delivery": > > """ > Vector ← RVI; > VISR[Vector] ← 1; > SVI ← Vector; > VPPR ← Vector & F0H; > VIRR[Vector] ← 0; > IF any bits set in VIRR > THEN RVI ← highest index of bit set in VIRR > ELSE RVI ← 0; > FI; > deliver interrupt with Vector through IDT; > cease recognition of any pending virtual interrupt; > """ > > > > as we synced PIR to L1's APIC IRR, not only the max_irr is injectable > but also the other vectors in L1's APIC IRR. > > So what we need to check is the new L1 injectable interrupt from PIR, > even if it is not the max_irr.. The vectors which were set in the LAPIC IRR before the sync with PIR were already evaluated to check if they should cause #VMExit from L2 to L1 (in check_nested_events()). Therefore, in vmx_sync_pir_to_irr() we only need to check if PIR have a new injectable interrupt set. I failed to understand how this relates to the VID SDM algorithm you provided here. Can you clarify? Thanks, -Liran > > Quan > Alibaba Cloud
On 2018/01/02 17:57, Liran Alon wrote: > > > On 02/01/18 04:45, Quan Xu wrote: >> On 2017/12/25 00:12, Liran Alon wrote: >> >>> In case posted-interrupt was delivered to CPU while it is in host >>> (outside guest), then posted-interrupt delivery will be done by >>> calling sync_pir_to_irr() at vmentry after interrupts are disabled. >>> >>> sync_pir_to_irr() will check vmx->pi_desc.control ON bit and if >>> set, it will sync vmx->pi_desc.pir to IRR and afterwards update RVI to >>> ensure virtual-interrupt-delivery will dispatch interrupt to guest. >>> >>> However, it is possible that L1 will receive a posted-interrupt while >>> CPU runs at host and is about to enter L2. In this case, the call to >>> sync_pir_to_irr() will indeed update the L1's APIC IRR but >>> vcpu_enter_guest() will then just resume into L2 guest without >>> re-evaluating if it should exit from L2 to L1 as a result of this >>> new pending L1 event. >>> >>> To address this case, if sync_pir_to_irr() has a new L1 injectable >>> interrupt and CPU is running L2, we force exit GUEST_MODE which will >>> result in another iteration of vcpu_run() run loop which will call >>> kvm_vcpu_running() which will call check_nested_events() which will >>> handle the pending L1 event properly. >> >> >> I agree with this solution.. However ... >> >> >>> Signed-off-by: Liran Alon <liran.alon@oracle.com> >>> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> >>> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> >>> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> >>> Signed-off-by: Liam Merwick <liam.merwick@oracle.com> >>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >>> --- >>> arch/x86/kvm/vmx.c | 12 +++++++++++- >>> 1 file changed, 11 insertions(+), 1 deletion(-) >>> >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>> index 325608a1ed65..d307bf26462a 100644 >>> --- a/arch/x86/kvm/vmx.c >>> +++ b/arch/x86/kvm/vmx.c >>> @@ -9032,6 +9032,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu >>> *vcpu) >>> { >>> struct vcpu_vmx *vmx = to_vmx(vcpu); >>> int max_irr; >>> + bool max_irr_updated; >>> WARN_ON(!vcpu->arch.apicv_active); >>> if (pi_test_on(&vmx->pi_desc)) { >>> @@ -9041,7 +9042,16 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu >>> *vcpu) >>> * But on x86 this is just a compiler barrier anyway. >>> */ >>> smp_mb__after_atomic(); >>> - kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr); >>> + max_irr_updated = >>> + kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr); >>> + >>> + /* >>> + * If we are running L2 and L1 has a new pending interrupt >>> + * which can be injected, we should re-evaluate >>> + * what should be done with this new L1 interrupt. >>> + */ >>> + if (is_guest_mode(vcpu) && max_irr_updated) >>> + kvm_vcpu_exiting_guest_mode(vcpu); >> >> >> ... >> >> refer to "Virtual-Interrupt Delivery": >> >> """ >> Vector ← RVI; >> VISR[Vector] ← 1; >> SVI ← Vector; >> VPPR ← Vector & F0H; >> VIRR[Vector] ← 0; >> IF any bits set in VIRR >> THEN RVI ← highest index of bit set in VIRR >> ELSE RVI ← 0; >> FI; >> deliver interrupt with Vector through IDT; >> cease recognition of any pending virtual interrupt; >> """ >> >> >> >> as we synced PIR to L1's APIC IRR, not only the max_irr is injectable >> but also the other vectors in L1's APIC IRR. >> >> So what we need to check is the new L1 injectable interrupt from PIR, >> even if it is not the max_irr.. > > The vectors which were set in the LAPIC IRR before the sync with PIR > were already evaluated to check if they should cause #VMExit from L2 > to L1 (in check_nested_events()). > > Therefore, in vmx_sync_pir_to_irr() we only need to check if PIR have > a new injectable interrupt set. > yes, the key point is what does 'a new L1 injectable interrupt' mean.. in your patch, __IIUC__ one of condition to exit guest mode: 1) max_irr_updated - 'vector of PIR' > 'vectors of previous IRR' as the SDM algorithm said, even if: 'vector of PIR' < 'vectors of previos IRR' If we has synced PIR to L1's APIC IRR, the 'vector of PIR' _may_ be delivered to VM.. it is a new injectable interrupt as well.. Quan Alibaba Cloud > I failed to understand how this relates to the VID SDM algorithm you > provided here. Can you clarify? >
On 2018年01月02日 19:21, Quan Xu wrote: > > yes, > > the key point is what does 'a new L1 injectable interrupt' mean.. > > in your patch, __IIUC__ one of condition to exit guest mode: > 1) max_irr_updated - 'vector of PIR' > 'vectors of previous IRR' > > > as the SDM algorithm said, even if: > 'vector of PIR' < 'vectors of previos IRR' > > If we has synced PIR to L1's APIC IRR, the 'vector of PIR' _may_ be > delivered to VM.. If we has synced PIR to L1's APIC IRR, the 'vector of PIR' _may_ be delivered to VM before next VMExit.. > it is a new injectable interrupt as well.. > > > > Quan > Alibaba Cloud
On 02/01/18 13:52, Quan Xu wrote: > > > On 2018年01月02日 19:21, Quan Xu wrote: >> >> yes, >> >> the key point is what does 'a new L1 injectable interrupt' mean.. >> >> in your patch, __IIUC__ one of condition to exit guest mode: >> 1) max_irr_updated - 'vector of PIR' > 'vectors of previous IRR' >> >> >> as the SDM algorithm said, even if: >> 'vector of PIR' < 'vectors of previos IRR' >> >> If we has synced PIR to L1's APIC IRR, the 'vector of PIR' _may_ be >> delivered to VM.. > If we has synced PIR to L1's APIC IRR, the 'vector of PIR' _may_ be > delivered to VM before next VMExit.. > >> it is a new injectable interrupt as well.. >> >> >> >> Quan >> Alibaba Cloud > In case APICv is active (relevant to our case), __apic_accept_irq() handles APIC_DM_FIXED by calling deliver_posted_interrupt() which will set a bit in PIR instead of setting a bit in LAPIC IRR. Therefore, in case APICv is active, the bits that are set in LAPIC IRR are bits which were synced from PIR to IRR by some previous call of vmx_sync_pir_to_irr(). (Not considering the case of some user-mode API setting them manually). Therefore, the only new injectable interrupts we can encounter in vmx_sync_pir_to_irr() are vectors coming from the PIR. It is also not possible for the guest to block interrupt vector X while still being able to accept vector Y if Y<X. Therefore, it is enough to just check if max vector from PIR > max vector from IRR before the sync. It is true that VID will inject the vector in RVI to the guest if possible on VMEntry. RVI is set by vmx_sync_pir_to_irr() to either the max value from PIR or the max value from IRR. If the max value from IRR > max value from PIR, then the vector to be injected to guest was already evaluated if it should exit to L1 by check_nested_events() and therefore we don't need to call kvm_vcpu_exiting_guest_mode() again. Regards, -Liran
On 2018/01/02 20:20, Liran Alon wrote: > > In case APICv is active (relevant to our case), __apic_accept_irq() > handles APIC_DM_FIXED by calling deliver_posted_interrupt() which will > set a bit in PIR instead of setting a bit in LAPIC IRR. > > Therefore, in case APICv is active, the bits that are set in LAPIC IRR > are bits which were synced from PIR to IRR by some previous call of > vmx_sync_pir_to_irr(). (Not considering the case of some user-mode API > setting them manually). > > Therefore, the only new injectable interrupts we can encounter in > vmx_sync_pir_to_irr() are vectors coming from the PIR. > It is also not possible for the guest to block interrupt vector X > while still being able to accept vector Y if Y<X. Therefore, it is > enough to just check if max vector from PIR > max vector from IRR > before the sync. > > It is true that VID will inject the vector in RVI to the guest if > possible on VMEntry. RVI is set by vmx_sync_pir_to_irr() to either the > max value from PIR or the max value from IRR. If the max value from > IRR > max value from PIR, I agree with you on all of above points, ... > then the vector to be injected to guest was already evaluated if it > should exit to L1 by check_nested_events() and therefore we don't need > to call kvm_vcpu_exiting_guest_mode() again. ... :( my bad.. I missed the VPPR.. .e.g. in this case: 'vector of PIR' < 'vector of previos IRR' then the 'vector of PIR' also have a chance to be synced to RVI, but hardware does not recognize 'vector of PIR' until next VMEntry as RVI[7:4] <= VPPR[7:4] (VPPR is from the last EOI virtualization of 'vector of previos IRR') even L2 -> L0 -> L2, L0 code helps to check it as your description.. so you are right. Quan Alibaba Cloud
On 2017/12/25 00:12, Liran Alon wrote: > In case posted-interrupt was delivered to CPU while it is in host > (outside guest), then posted-interrupt delivery will be done by > calling sync_pir_to_irr() at vmentry after interrupts are disabled. > > sync_pir_to_irr() will check vmx->pi_desc.control ON bit and if > set, it will sync vmx->pi_desc.pir to IRR and afterwards update RVI to > ensure virtual-interrupt-delivery will dispatch interrupt to guest. > > However, it is possible that L1 will receive a posted-interrupt while > CPU runs at host and is about to enter L2. In this case, the call to > sync_pir_to_irr() will indeed update the L1's APIC IRR but > vcpu_enter_guest() will then just resume into L2 guest without > re-evaluating if it should exit from L2 to L1 as a result of this > new pending L1 event. > > To address this case, if sync_pir_to_irr() has a new L1 injectable > interrupt and CPU is running L2, we force exit GUEST_MODE which will > result in another iteration of vcpu_run() run loop which will call > kvm_vcpu_running() which will call check_nested_events() which will > handle the pending L1 event properly. > > Signed-off-by: Liran Alon <liran.alon@oracle.com> > Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> > Reviewed-by: Liam Merwick <liam.merwick@oracle.com> > Signed-off-by: Liam Merwick <liam.merwick@oracle.com> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Quan Xu <quan.xu0@gmail.com> Quan Alibaba Cloud > --- > arch/x86/kvm/vmx.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 325608a1ed65..d307bf26462a 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -9032,6 +9032,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu) > { > struct vcpu_vmx *vmx = to_vmx(vcpu); > int max_irr; > + bool max_irr_updated; > > WARN_ON(!vcpu->arch.apicv_active); > if (pi_test_on(&vmx->pi_desc)) { > @@ -9041,7 +9042,16 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu) > * But on x86 this is just a compiler barrier anyway. > */ > smp_mb__after_atomic(); > - kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr); > + max_irr_updated = > + kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr); > + > + /* > + * If we are running L2 and L1 has a new pending interrupt > + * which can be injected, we should re-evaluate > + * what should be done with this new L1 interrupt. > + */ > + if (is_guest_mode(vcpu) && max_irr_updated) > + kvm_vcpu_exiting_guest_mode(vcpu); > } else { > max_irr = kvm_lapic_find_highest_irr(vcpu); > }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 325608a1ed65..d307bf26462a 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -9032,6 +9032,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); int max_irr; + bool max_irr_updated; WARN_ON(!vcpu->arch.apicv_active); if (pi_test_on(&vmx->pi_desc)) { @@ -9041,7 +9042,16 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu) * But on x86 this is just a compiler barrier anyway. */ smp_mb__after_atomic(); - kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr); + max_irr_updated = + kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr); + + /* + * If we are running L2 and L1 has a new pending interrupt + * which can be injected, we should re-evaluate + * what should be done with this new L1 interrupt. + */ + if (is_guest_mode(vcpu) && max_irr_updated) + kvm_vcpu_exiting_guest_mode(vcpu); } else { max_irr = kvm_lapic_find_highest_irr(vcpu); }