Message ID | 20170206233245.GA8766@skl-2s3.sh.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote: >>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote: >> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c, >> >> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) >> { >> /* This may race with setting of irr in __apic_accept_irq() and >> * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq >> * will cause vmexit immediately and the value will be recalculated >> * on the next vmentry. >> */ >> ... >> } >> >> I am afraid, there may be a similar race with setting of vIRR.. > >I think this is unrelated: If another interrupt came in, the highest >set bit in vIRR can only increase. Plus pt_update_irq(), before >returning, calls vlapic_set_irq(), which ought to result in pt_vector's >vIRR bit to be set (either directly or via setting its PIR bit). I.e. the >highest priority interrupt should still have a vector >= pt_vector. I have noticed that pt_vector was 0x38 and the hightest vector was 0x30 when the assertion failed. In the process of debugging pt_update_irq() when I was working on another feature, I noticed that 0x30 is always the vector of IRQ2. I suspect that the source of the periodic timer interrupt is not lapic. If that, pt_update_irq() reads the vioapic entry twice, one in hvm_isa_irq_assert() and the other in pt_irq_vector(). If guest changed the content in viopaic entry between the two read (ie. change the vector from 0x30 to 0x38), the assertion would fail. Do you think it is a reasonable explanation? Thanks, chao > >Jan >
On February 07, 2017 7:33 AM, Chao Gao wrote: >Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") >has added a assertion that intack.vector is the highest priority vector. But >according to the osstest, the assertion failed sometimes. More discussion >can be found in the thread >(https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.ht >ml). > >The assertion failure is hard to reproduce. In order to root cause issue, this >patch is to add logs to dump PIR and vIRR when failure takes place. It >should be reverted once the root cause is found. > >Signed-off-by: Chao Gao <chao.gao@intel.com> >--- > xen/arch/x86/hvm/vmx/intr.c | 43 >+++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 43 insertions(+) > >diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c >index 24e4505..5e5b37a 100644 >--- a/xen/arch/x86/hvm/vmx/intr.c >+++ b/xen/arch/x86/hvm/vmx/intr.c >@@ -23,6 +23,7 @@ > #include <xen/errno.h> > #include <xen/trace.h> > #include <xen/event.h> >+#include <asm/apicdef.h> > #include <asm/current.h> > #include <asm/cpufeature.h> > #include <asm/processor.h> >@@ -318,6 +319,48 @@ void vmx_intr_assist(void) > */ > if ( pt_vector != -1 ) > { >+ /* >+ * We assert that intack.vector is the highest priority vector >for >+ * only an interrupt from vlapic can reach this point and >the >+ * highest vector is chosen in hvm_vcpu_has_pending_irq(). >+ * But, in fact, the assertion failed sometimes. It is >suspected >+ * that PIR is not synced to vIRR which makes pt_vector is >left in >+ * PIR. In order to verify this suspicion, dump some >information >+ * when the assertion fails. >+ */ Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c, int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) { /* This may race with setting of irr in __apic_accept_irq() and * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq * will cause vmexit immediately and the value will be recalculated * on the next vmentry. */ ... } I am afraid, there may be a similar race with setting of vIRR.. Quan >+ if ( unlikely(intack.vector < pt_vector) ) >+ { >+ struct vlapic *vlapic; >+ struct pi_desc *pi_desc; >+ uint32_t *word; >+ int i; >+ >+ printk("Assertion failed on %pv. Some info are >below.\n", >+ current); >+ printk("intack info: vector 0x%2x, source %x\n", >intack.vector, >+ intack.source); >+ printk("pt_vector info: vector 0x%2x\n", pt_vector); >+ >+ vlapic = vcpu_vlapic(v); >+ if ( vlapic && vlapic->regs->data ) >+ { >+ word = (void *)&vlapic->regs->data[APIC_IRR]; >+ printk("vIRR:"); >+ for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- ) >+ printk(" %08x", word[i*4]); >+ printk("\n"); >+ } >+ >+ pi_desc = &v->arch.hvm_vmx.pi_desc; >+ if ( pi_desc && pi_desc->pir ) >+ { >+ word = (void *)&pi_desc->pir; >+ printk(" PIR:"); >+ for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- ) >+ printk(" %08x", word[i]); >+ printk("\n"); >+ } >+ } > ASSERT(intack.vector >= pt_vector); > vmx_set_eoi_exit_bitmap(v, intack.vector); > } >-- >1.8.3.1
On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote: >>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote: >> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote: >>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote: >>>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c, >>>> >>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) >>>> { >>>> /* This may race with setting of irr in __apic_accept_irq() and >>>> * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq >>>> * will cause vmexit immediately and the value will be recalculated >>>> * on the next vmentry. >>>> */ >>>> ... >>>> } >>>> >>>> I am afraid, there may be a similar race with setting of vIRR.. >>> >>>I think this is unrelated: If another interrupt came in, the highest >>>set bit in vIRR can only increase. Plus pt_update_irq(), before >>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's >>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the >>>highest priority interrupt should still have a vector >= pt_vector. >> >> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30 >> when the assertion failed. In the process of debugging pt_update_irq() >> when I was working on another feature, I noticed that 0x30 is always >> the vector of IRQ2. I suspect that the source of the periodic timer interrupt >> is not lapic. If that, pt_update_irq() reads the vioapic entry twice, >> one in hvm_isa_irq_assert() and the other in pt_irq_vector(). >> If guest changed the content in viopaic entry between the two read >> (ie. change the vector from 0x30 to 0x38), the assertion would fail. >> Do you think it is a reasonable explanation? > >IRQ2? If this was going via the PIC, that would be impossible, as >that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the >IO-APIC (and we follow this model - see tools/libacpi/build.c). How Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC. >did you determine that 0x30 is the vector of IRQ2? By monitoring the write operation to IO-APIC redirection entry. From experimental data, linux kernel always writes vector 0x30 to redirection entry of pin2 of IO-APIC. Thank, chao > >Jan >
>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote: > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has > added a assertion that intack.vector is the highest priority vector. But > according to the osstest, the assertion failed sometimes. More discussion can > be found in the thread > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html). > > The assertion failure is hard to reproduce. In order to root cause issue, this > patch is to add logs to dump PIR and vIRR when failure takes place. It should > be reverted once the root cause is found. > > Signed-off-by: Chao Gao <chao.gao@intel.com> If this wasn't just a temporary thing, I'd have some comments, but it's good enough (perhaps with a little massaging when committing) for this purpose. Reviewed-by: Jan Beulich <jbeulich@suse.com> Ian, Wei, is there a way to automatically scan osstest logs for occurrences of the extra output, or does someone need to look through the logs of every spurious failure? Jan
>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote: > Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c, > > int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) > { > /* This may race with setting of irr in __apic_accept_irq() and > * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq > * will cause vmexit immediately and the value will be recalculated > * on the next vmentry. > */ > ... > } > > I am afraid, there may be a similar race with setting of vIRR.. I think this is unrelated: If another interrupt came in, the highest set bit in vIRR can only increase. Plus pt_update_irq(), before returning, calls vlapic_set_irq(), which ought to result in pt_vector's vIRR bit to be set (either directly or via setting its PIR bit). I.e. the highest priority interrupt should still have a vector >= pt_vector. Jan
On Tue, Feb 07, 2017 at 02:51:54AM -0700, Jan Beulich wrote: > >>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote: > > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has > > added a assertion that intack.vector is the highest priority vector. But > > according to the osstest, the assertion failed sometimes. More discussion can > > be found in the thread > > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html). > > > > The assertion failure is hard to reproduce. In order to root cause issue, this > > patch is to add logs to dump PIR and vIRR when failure takes place. It should > > be reverted once the root cause is found. > > > > Signed-off-by: Chao Gao <chao.gao@intel.com> > > If this wasn't just a temporary thing, I'd have some comments, but > it's good enough (perhaps with a little massaging when committing) > for this purpose. > > Reviewed-by: Jan Beulich <jbeulich@suse.com> > > Ian, Wei, is there a way to automatically scan osstest logs for > occurrences of the extra output, or does someone need to look > through the logs of every spurious failure? > I don't think there is automatically scanning in place. > Jan >
>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote: > On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote: >>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote: >>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c, >>> >>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) >>> { >>> /* This may race with setting of irr in __apic_accept_irq() and >>> * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq >>> * will cause vmexit immediately and the value will be recalculated >>> * on the next vmentry. >>> */ >>> ... >>> } >>> >>> I am afraid, there may be a similar race with setting of vIRR.. >> >>I think this is unrelated: If another interrupt came in, the highest >>set bit in vIRR can only increase. Plus pt_update_irq(), before >>returning, calls vlapic_set_irq(), which ought to result in pt_vector's >>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the >>highest priority interrupt should still have a vector >= pt_vector. > > I have noticed that pt_vector was 0x38 and the hightest vector was 0x30 > when the assertion failed. In the process of debugging pt_update_irq() > when I was working on another feature, I noticed that 0x30 is always > the vector of IRQ2. I suspect that the source of the periodic timer interrupt > is not lapic. If that, pt_update_irq() reads the vioapic entry twice, > one in hvm_isa_irq_assert() and the other in pt_irq_vector(). > If guest changed the content in viopaic entry between the two read > (ie. change the vector from 0x30 to 0x38), the assertion would fail. > Do you think it is a reasonable explanation? IRQ2? If this was going via the PIC, that would be impossible, as that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the IO-APIC (and we follow this model - see tools/libacpi/build.c). How did you determine that 0x30 is the vector of IRQ2? Jan
>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote: > On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote: >>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote: >>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote: >>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote: >>>>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c, >>>>> >>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) >>>>> { >>>>> /* This may race with setting of irr in __apic_accept_irq() and >>>>> * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq >>>>> * will cause vmexit immediately and the value will be recalculated >>>>> * on the next vmentry. >>>>> */ >>>>> ... >>>>> } >>>>> >>>>> I am afraid, there may be a similar race with setting of vIRR.. >>>> >>>>I think this is unrelated: If another interrupt came in, the highest >>>>set bit in vIRR can only increase. Plus pt_update_irq(), before >>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's >>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the >>>>highest priority interrupt should still have a vector >= pt_vector. >>> >>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30 >>> when the assertion failed. In the process of debugging pt_update_irq() >>> when I was working on another feature, I noticed that 0x30 is always >>> the vector of IRQ2. I suspect that the source of the periodic timer > interrupt >>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice, >>> one in hvm_isa_irq_assert() and the other in pt_irq_vector(). >>> If guest changed the content in viopaic entry between the two read >>> (ie. change the vector from 0x30 to 0x38), the assertion would fail. >>> Do you think it is a reasonable explanation? >> >>IRQ2? If this was going via the PIC, that would be impossible, as >>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the >>IO-APIC (and we follow this model - see tools/libacpi/build.c). How > > Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC. > >>did you determine that 0x30 is the vector of IRQ2? > > By monitoring the write operation to IO-APIC redirection entry. > From experimental data, linux kernel always writes vector 0x30 to > redirection entry of pin2 of IO-APIC. But if it goes through the IO-APIC, it'll also go through the LAPIC (and in the ExtInt case you wouldn't be able to read a valid vector from the RTE). Jan
On Tue, Feb 07, 2017 at 09:18:56AM -0700, Jan Beulich wrote: >>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote: >> On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote: >>>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote: >>>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote: >>>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote: >>>>>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c, >>>>>> >>>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) >>>>>> { >>>>>> /* This may race with setting of irr in __apic_accept_irq() and >>>>>> * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq >>>>>> * will cause vmexit immediately and the value will be recalculated >>>>>> * on the next vmentry. >>>>>> */ >>>>>> ... >>>>>> } >>>>>> >>>>>> I am afraid, there may be a similar race with setting of vIRR.. >>>>> >>>>>I think this is unrelated: If another interrupt came in, the highest >>>>>set bit in vIRR can only increase. Plus pt_update_irq(), before >>>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's >>>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the >>>>>highest priority interrupt should still have a vector >= pt_vector. >>>> >>>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30 >>>> when the assertion failed. In the process of debugging pt_update_irq() >>>> when I was working on another feature, I noticed that 0x30 is always >>>> the vector of IRQ2. I suspect that the source of the periodic timer >> interrupt >>>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice, >>>> one in hvm_isa_irq_assert() and the other in pt_irq_vector(). >>>> If guest changed the content in viopaic entry between the two read >>>> (ie. change the vector from 0x30 to 0x38), the assertion would fail. >>>> Do you think it is a reasonable explanation? >>> >>>IRQ2? If this was going via the PIC, that would be impossible, as >>>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the >>>IO-APIC (and we follow this model - see tools/libacpi/build.c). How >> >> Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC. >> >>>did you determine that 0x30 is the vector of IRQ2? >> >> By monitoring the write operation to IO-APIC redirection entry. >> From experimental data, linux kernel always writes vector 0x30 to >> redirection entry of pin2 of IO-APIC. > >But if it goes through the IO-APIC, it'll also go through the LAPIC >(and in the ExtInt case you wouldn't be able to read a valid vector >from the RTE). Yes. If a pt interrupt goes through IO-APIC, pt_update_irq() will read the RTE twice. One is in hvm_isa_irq_assert() to determine which bit in vIRR should be set. The other is in pt_irq_vector() to get the bit we have set and return it. Is it possible that the return vector is different from the bit set in vIRR of LAPIC, since guest has changed RTE of IO-APIC pin2. thanks, chao > >Jan >
On Wed, Feb 08, 2017 at 04:06:53PM +0800, Tian, Kevin wrote: >> From: Gao, Chao >> Sent: Wednesday, February 08, 2017 8:12 AM >> >> On Tue, Feb 07, 2017 at 09:18:56AM -0700, Jan Beulich wrote: >> >>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote: >> >> On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote: >> >>>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote: >> >>>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote: >> >>>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote: >> >>>>>> Some comment from QEMU/KVM code, in >> <linux-kernel>/arch/x86/kvm/lapic.c, >> >>>>>> >> >>>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) >> >>>>>> { >> >>>>>> /* This may race with setting of irr in __apic_accept_irq() and >> >>>>>> * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq >> >>>>>> * will cause vmexit immediately and the value will be recalculated >> >>>>>> * on the next vmentry. >> >>>>>> */ >> >>>>>> ... >> >>>>>> } >> >>>>>> >> >>>>>> I am afraid, there may be a similar race with setting of vIRR.. >> >>>>> >> >>>>>I think this is unrelated: If another interrupt came in, the highest >> >>>>>set bit in vIRR can only increase. Plus pt_update_irq(), before >> >>>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's >> >>>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the >> >>>>>highest priority interrupt should still have a vector >= pt_vector. >> >>>> >> >>>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30 >> >>>> when the assertion failed. In the process of debugging pt_update_irq() >> >>>> when I was working on another feature, I noticed that 0x30 is always >> >>>> the vector of IRQ2. I suspect that the source of the periodic timer >> >> interrupt >> >>>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice, >> >>>> one in hvm_isa_irq_assert() and the other in pt_irq_vector(). >> >>>> If guest changed the content in viopaic entry between the two read >> >>>> (ie. change the vector from 0x30 to 0x38), the assertion would fail. >> >>>> Do you think it is a reasonable explanation? >> >>> >> >>>IRQ2? If this was going via the PIC, that would be impossible, as >> >>>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the >> >>>IO-APIC (and we follow this model - see tools/libacpi/build.c). How >> >> >> >> Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC. >> >> >> >>>did you determine that 0x30 is the vector of IRQ2? >> >> >> >> By monitoring the write operation to IO-APIC redirection entry. >> >> From experimental data, linux kernel always writes vector 0x30 to >> >> redirection entry of pin2 of IO-APIC. >> > >> >But if it goes through the IO-APIC, it'll also go through the LAPIC >> >(and in the ExtInt case you wouldn't be able to read a valid vector >> >from the RTE). >> >> Yes. If a pt interrupt goes through IO-APIC, pt_update_irq() will read >> the RTE twice. One is in hvm_isa_irq_assert() to determine which bit >> in vIRR should be set. The other is in pt_irq_vector() to get the bit >> we have set and return it. Is it possible that the return vector is >> different from the bit set in vIRR of LAPIC, since guest has changed >> RTE of IO-APIC pin2. >> > >In theory it could happen. There is no lock to prevent vIOAPIC RTE >being modified between above two operations. But I'm not sure >whether it's the reason for this bug. My feeling is that vector for IRQ0 >should be fixed one. Did you capture such vector change in your side? I agree. I don't see guest changes the vector for IRQ0 in hundreds of tests (simply create and shutdown guest). Thanks chao > >Thanks >Kevin
> From: Wei Liu [mailto:wei.liu2@citrix.com] > Sent: Tuesday, February 07, 2017 6:05 PM > > On Tue, Feb 07, 2017 at 02:51:54AM -0700, Jan Beulich wrote: > > >>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote: > > > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has > > > added a assertion that intack.vector is the highest priority vector. But > > > according to the osstest, the assertion failed sometimes. More discussion can > > > be found in the thread > > > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html). > > > > > > The assertion failure is hard to reproduce. In order to root cause issue, this > > > patch is to add logs to dump PIR and vIRR when failure takes place. It should > > > be reverted once the root cause is found. > > > > > > Signed-off-by: Chao Gao <chao.gao@intel.com> > > > > If this wasn't just a temporary thing, I'd have some comments, but > > it's good enough (perhaps with a little massaging when committing) > > for this purpose. > > > > Reviewed-by: Jan Beulich <jbeulich@suse.com> > > > > Ian, Wei, is there a way to automatically scan osstest logs for > > occurrences of the extra output, or does someone need to look > > through the logs of every spurious failure? > > > > I don't think there is automatically scanning in place. Curious how this issue was initially caught? Would same practice make sure next fail catching our eye? Thanks Kevin
> From: Gao, Chao > Sent: Wednesday, February 08, 2017 8:12 AM > > On Tue, Feb 07, 2017 at 09:18:56AM -0700, Jan Beulich wrote: > >>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote: > >> On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote: > >>>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote: > >>>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote: > >>>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote: > >>>>>> Some comment from QEMU/KVM code, in > <linux-kernel>/arch/x86/kvm/lapic.c, > >>>>>> > >>>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) > >>>>>> { > >>>>>> /* This may race with setting of irr in __apic_accept_irq() and > >>>>>> * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq > >>>>>> * will cause vmexit immediately and the value will be recalculated > >>>>>> * on the next vmentry. > >>>>>> */ > >>>>>> ... > >>>>>> } > >>>>>> > >>>>>> I am afraid, there may be a similar race with setting of vIRR.. > >>>>> > >>>>>I think this is unrelated: If another interrupt came in, the highest > >>>>>set bit in vIRR can only increase. Plus pt_update_irq(), before > >>>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's > >>>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the > >>>>>highest priority interrupt should still have a vector >= pt_vector. > >>>> > >>>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30 > >>>> when the assertion failed. In the process of debugging pt_update_irq() > >>>> when I was working on another feature, I noticed that 0x30 is always > >>>> the vector of IRQ2. I suspect that the source of the periodic timer > >> interrupt > >>>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice, > >>>> one in hvm_isa_irq_assert() and the other in pt_irq_vector(). > >>>> If guest changed the content in viopaic entry between the two read > >>>> (ie. change the vector from 0x30 to 0x38), the assertion would fail. > >>>> Do you think it is a reasonable explanation? > >>> > >>>IRQ2? If this was going via the PIC, that would be impossible, as > >>>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the > >>>IO-APIC (and we follow this model - see tools/libacpi/build.c). How > >> > >> Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC. > >> > >>>did you determine that 0x30 is the vector of IRQ2? > >> > >> By monitoring the write operation to IO-APIC redirection entry. > >> From experimental data, linux kernel always writes vector 0x30 to > >> redirection entry of pin2 of IO-APIC. > > > >But if it goes through the IO-APIC, it'll also go through the LAPIC > >(and in the ExtInt case you wouldn't be able to read a valid vector > >from the RTE). > > Yes. If a pt interrupt goes through IO-APIC, pt_update_irq() will read > the RTE twice. One is in hvm_isa_irq_assert() to determine which bit > in vIRR should be set. The other is in pt_irq_vector() to get the bit > we have set and return it. Is it possible that the return vector is > different from the bit set in vIRR of LAPIC, since guest has changed > RTE of IO-APIC pin2. > In theory it could happen. There is no lock to prevent vIOAPIC RTE being modified between above two operations. But I'm not sure whether it's the reason for this bug. My feeling is that vector for IRQ0 should be fixed one. Did you capture such vector change in your side? Thanks Kevin
>>> On 08.02.17 at 08:49, <kevin.tian@intel.com> wrote: > Curious how this issue was initially caught? Would same practice make > sure next fail catching our eye? I guess Andrew just happened to look at the logs of a spurious failure of some test. Jan
>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote: > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has > added a assertion that intack.vector is the highest priority vector. But > according to the osstest, the assertion failed sometimes. More discussion can > be found in the thread > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html). > > The assertion failure is hard to reproduce. In order to root cause issue, this > patch is to add logs to dump PIR and vIRR when failure takes place. It should > be reverted once the root cause is found. > > Signed-off-by: Chao Gao <chao.gao@intel.com> Jun, Kevin - can you ack this? Or was Chao expected to make any changes? I'd like to see this go in rather sooner than later, so we can get it back out well before 4.9 is going to settle. Jan
> From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Friday, February 10, 2017 5:36 PM > > >>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote: > > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has > > added a assertion that intack.vector is the highest priority vector. But > > according to the osstest, the assertion failed sometimes. More discussion can > > be found in the thread > > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html). > > > > The assertion failure is hard to reproduce. In order to root cause issue, this > > patch is to add logs to dump PIR and vIRR when failure takes place. It should > > be reverted once the root cause is found. > > > > Signed-off-by: Chao Gao <chao.gao@intel.com> > > Jun, Kevin - can you ack this? Or was Chao expected to make any > changes? I'd like to see this go in rather sooner than later, so we > can get it back out well before 4.9 is going to settle. > My bad... thought I've acked but seems distracted by later discussions. Acked-by: Kevin Tian <kevin.tian@intel.com>
>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote: > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has > added a assertion that intack.vector is the highest priority vector. But > according to the osstest, the assertion failed sometimes. More discussion can > be found in the thread > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html). > > The assertion failure is hard to reproduce. In order to root cause issue, this > patch is to add logs to dump PIR and vIRR when failure takes place. It should > be reverted once the root cause is found. Julien, could you add a note on the 4.9 tracking list for us to not forget to revert this (or at least add NDEBUG conditionals) the latest in the RC phase? Thanks, Jan
Hi Jan, On 02/14/2017 10:51 AM, Jan Beulich wrote: >>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote: >> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has >> added a assertion that intack.vector is the highest priority vector. But >> according to the osstest, the assertion failed sometimes. More discussion can >> be found in the thread >> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html). >> >> The assertion failure is hard to reproduce. In order to root cause issue, this >> patch is to add logs to dump PIR and vIRR when failure takes place. It should >> be reverted once the root cause is found. > > Julien, could you add a note on the 4.9 tracking list for us to not > forget to revert this (or at least add NDEBUG conditionals) the > latest in the RC phase? I have added a note in the 4.9 tracking list. Cheers,
(Resending the e-mail due to SMTP issue) On 02/14/2017 12:12 PM, Julien Grall wrote: > Hi Jan, > > On 02/14/2017 10:51 AM, Jan Beulich wrote: >>>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote: >>> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv >>> issue") has >>> added a assertion that intack.vector is the highest priority vector. But >>> according to the osstest, the assertion failed sometimes. More >>> discussion can >>> be found in the thread >>> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html). >>> >>> >>> The assertion failure is hard to reproduce. In order to root cause >>> issue, this >>> patch is to add logs to dump PIR and vIRR when failure takes place. >>> It should >>> be reverted once the root cause is found. >> >> Julien, could you add a note on the 4.9 tracking list for us to not >> forget to revert this (or at least add NDEBUG conditionals) the >> latest in the RC phase? > > I have added a note in the 4.9 tracking list. > > Cheers, >
Hi, On 14/02/17 10:51, Jan Beulich wrote: >>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote: >> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has >> added a assertion that intack.vector is the highest priority vector. But >> according to the osstest, the assertion failed sometimes. More discussion can >> be found in the thread >> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html). >> >> The assertion failure is hard to reproduce. In order to root cause issue, this >> patch is to add logs to dump PIR and vIRR when failure takes place. It should >> be reverted once the root cause is found. > > Julien, could you add a note on the 4.9 tracking list for us to not > forget to revert this (or at least add NDEBUG conditionals) the > latest in the RC phase? Bumping this discussion as we are in the RC phase. Was the root cause found and we can revert the patch? Or shall we surround the code with NDEBUG? Cheers,
>>> On 04.05.17 at 11:47, <julien.grall@arm.com> wrote: > On 14/02/17 10:51, Jan Beulich wrote: >>>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote: >>> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has >>> added a assertion that intack.vector is the highest priority vector. But >>> according to the osstest, the assertion failed sometimes. More discussion can >>> be found in the thread >>> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html). >>> >>> The assertion failure is hard to reproduce. In order to root cause issue, this >>> patch is to add logs to dump PIR and vIRR when failure takes place. It should >>> be reverted once the root cause is found. >> >> Julien, could you add a note on the 4.9 tracking list for us to not >> forget to revert this (or at least add NDEBUG conditionals) the >> latest in the RC phase? > > Bumping this discussion as we are in the RC phase. Was the root cause > found and we can revert the patch? Sadly not, so far. > Or shall we surround the code with NDEBUG? Yes, I'll submit a patch. Jan
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c index 24e4505..5e5b37a 100644 --- a/xen/arch/x86/hvm/vmx/intr.c +++ b/xen/arch/x86/hvm/vmx/intr.c @@ -23,6 +23,7 @@ #include <xen/errno.h> #include <xen/trace.h> #include <xen/event.h> +#include <asm/apicdef.h> #include <asm/current.h> #include <asm/cpufeature.h> #include <asm/processor.h> @@ -318,6 +319,48 @@ void vmx_intr_assist(void) */ if ( pt_vector != -1 ) { + /* + * We assert that intack.vector is the highest priority vector for + * only an interrupt from vlapic can reach this point and the + * highest vector is chosen in hvm_vcpu_has_pending_irq(). + * But, in fact, the assertion failed sometimes. It is suspected + * that PIR is not synced to vIRR which makes pt_vector is left in + * PIR. In order to verify this suspicion, dump some information + * when the assertion fails. + */ + if ( unlikely(intack.vector < pt_vector) ) + { + struct vlapic *vlapic; + struct pi_desc *pi_desc; + uint32_t *word; + int i; + + printk("Assertion failed on %pv. Some info are below.\n", + current); + printk("intack info: vector 0x%2x, source %x\n", intack.vector, + intack.source); + printk("pt_vector info: vector 0x%2x\n", pt_vector); + + vlapic = vcpu_vlapic(v); + if ( vlapic && vlapic->regs->data ) + { + word = (void *)&vlapic->regs->data[APIC_IRR]; + printk("vIRR:"); + for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- ) + printk(" %08x", word[i*4]); + printk("\n"); + } + + pi_desc = &v->arch.hvm_vmx.pi_desc; + if ( pi_desc && pi_desc->pir ) + { + word = (void *)&pi_desc->pir; + printk(" PIR:"); + for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- ) + printk(" %08x", word[i]); + printk("\n"); + } + } ASSERT(intack.vector >= pt_vector); vmx_set_eoi_exit_bitmap(v, intack.vector); }
Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has added a assertion that intack.vector is the highest priority vector. But according to the osstest, the assertion failed sometimes. More discussion can be found in the thread (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html). The assertion failure is hard to reproduce. In order to root cause issue, this patch is to add logs to dump PIR and vIRR when failure takes place. It should be reverted once the root cause is found. Signed-off-by: Chao Gao <chao.gao@intel.com> --- xen/arch/x86/hvm/vmx/intr.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+)