diff mbox

X86/vmx: Dump PIR and vIRR before ASSERT()

Message ID 20170206233245.GA8766@skl-2s3.sh.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chao Gao Feb. 6, 2017, 11:32 p.m. UTC
Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
added a assertion that intack.vector is the highest priority vector. But
according to the osstest, the assertion failed sometimes. More discussion can
be found in the thread
(https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).

The assertion failure is hard to reproduce. In order to root cause issue, this
patch is to add logs to dump PIR and vIRR when failure takes place. It should
be reverted once the root cause is found.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 xen/arch/x86/hvm/vmx/intr.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

Comments

Chao Gao Feb. 7, 2017, 6:32 a.m. UTC | #1
On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
>> 
>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>> {
>> 	/* This may race with setting of irr in __apic_accept_irq() and
>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>> 	 * will cause vmexit immediately and the value will be recalculated
>> 	 * on the next vmentry.
>> 	 */
>> ...
>> }
>> 
>> I am afraid, there may be a similar race with setting of vIRR..
>
>I think this is unrelated: If another interrupt came in, the highest
>set bit in vIRR can only increase. Plus pt_update_irq(), before
>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>highest priority interrupt should still have a vector >= pt_vector.

I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
when the assertion failed. In the process of debugging pt_update_irq() 
when I was working on another feature, I noticed that 0x30 is always 
the vector of IRQ2. I suspect that the source of the periodic timer interrupt
is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
one in hvm_isa_irq_assert() and the other in pt_irq_vector().
If guest changed the content in viopaic entry between the two read
(ie. change the vector from 0x30 to 0x38), the assertion would fail.
Do you think it is a reasonable explanation?

Thanks,
chao
>
>Jan
>
Xuquan (Euler) Feb. 7, 2017, 6:48 a.m. UTC | #2
On February 07, 2017 7:33 AM, Chao Gao wrote:
>Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue")
>has added a assertion that intack.vector is the highest priority vector. But
>according to the osstest, the assertion failed sometimes. More discussion
>can be found in the thread
>(https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.ht
>ml).
>
>The assertion failure is hard to reproduce. In order to root cause issue, this
>patch is to add logs to dump PIR and vIRR when failure takes place. It
>should be reverted once the root cause is found.
>
>Signed-off-by: Chao Gao <chao.gao@intel.com>
>---
> xen/arch/x86/hvm/vmx/intr.c | 43
>+++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
>diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>index 24e4505..5e5b37a 100644
>--- a/xen/arch/x86/hvm/vmx/intr.c
>+++ b/xen/arch/x86/hvm/vmx/intr.c
>@@ -23,6 +23,7 @@
> #include <xen/errno.h>
> #include <xen/trace.h>
> #include <xen/event.h>
>+#include <asm/apicdef.h>
> #include <asm/current.h>
> #include <asm/cpufeature.h>
> #include <asm/processor.h>
>@@ -318,6 +319,48 @@ void vmx_intr_assist(void)
>         */
>         if ( pt_vector != -1 )
>         {
>+            /*
>+             * We assert that intack.vector is the highest priority vector
>for
>+             * only an interrupt from vlapic can reach this point and
>the
>+             * highest vector is chosen in hvm_vcpu_has_pending_irq().
>+             * But, in fact, the assertion failed sometimes. It is
>suspected
>+             * that PIR is not synced to vIRR which makes pt_vector is
>left in
>+             * PIR. In order to verify this suspicion, dump some
>information
>+             * when the assertion fails.
>+             */

Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,

int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
{
	/* This may race with setting of irr in __apic_accept_irq() and
	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
	 * will cause vmexit immediately and the value will be recalculated
	 * on the next vmentry.
	 */
...
}

I am afraid, there may be a similar race with setting of vIRR..

Quan




>+            if ( unlikely(intack.vector < pt_vector) )
>+            {
>+                struct vlapic *vlapic;
>+                struct pi_desc *pi_desc;
>+                uint32_t *word;
>+                int i;
>+
>+                printk("Assertion failed on %pv. Some info are
>below.\n",
>+                       current);
>+                printk("intack info: vector 0x%2x, source %x\n",
>intack.vector,
>+                       intack.source);
>+                printk("pt_vector info: vector 0x%2x\n", pt_vector);
>+
>+                vlapic = vcpu_vlapic(v);
>+                if ( vlapic && vlapic->regs->data )
>+                {
>+                    word = (void *)&vlapic->regs->data[APIC_IRR];
>+                    printk("vIRR:");
>+                    for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- )
>+                        printk(" %08x", word[i*4]);
>+                    printk("\n");
>+                }
>+
>+                pi_desc = &v->arch.hvm_vmx.pi_desc;
>+                if ( pi_desc && pi_desc->pir )
>+                {
>+                    word = (void *)&pi_desc->pir;
>+                    printk(" PIR:");
>+                    for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- )
>+                        printk(" %08x", word[i]);
>+                    printk("\n");
>+                }
>+            }
>             ASSERT(intack.vector >= pt_vector);
>             vmx_set_eoi_exit_bitmap(v, intack.vector);
>         }
>--
>1.8.3.1
Chao Gao Feb. 7, 2017, 7:28 a.m. UTC | #3
On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote:
>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>>>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
>>>> 
>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>>>> {
>>>> 	/* This may race with setting of irr in __apic_accept_irq() and
>>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>>>> 	 * will cause vmexit immediately and the value will be recalculated
>>>> 	 * on the next vmentry.
>>>> 	 */
>>>> ...
>>>> }
>>>> 
>>>> I am afraid, there may be a similar race with setting of vIRR..
>>>
>>>I think this is unrelated: If another interrupt came in, the highest
>>>set bit in vIRR can only increase. Plus pt_update_irq(), before
>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>>>highest priority interrupt should still have a vector >= pt_vector.
>> 
>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
>> when the assertion failed. In the process of debugging pt_update_irq() 
>> when I was working on another feature, I noticed that 0x30 is always 
>> the vector of IRQ2. I suspect that the source of the periodic timer interrupt
>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
>> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
>> If guest changed the content in viopaic entry between the two read
>> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
>> Do you think it is a reasonable explanation?
>
>IRQ2? If this was going via the PIC, that would be impossible, as
>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
>IO-APIC (and we follow this model - see tools/libacpi/build.c). How

Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC.

>did you determine that 0x30 is the vector of IRQ2?

By monitoring the write operation to IO-APIC redirection entry.
From experimental data, linux kernel always writes vector 0x30 to
redirection entry of pin2 of IO-APIC.

Thank,
chao
>
>Jan
>
Jan Beulich Feb. 7, 2017, 9:51 a.m. UTC | #4
>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> added a assertion that intack.vector is the highest priority vector. But
> according to the osstest, the assertion failed sometimes. More discussion can
> be found in the thread
> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> 
> The assertion failure is hard to reproduce. In order to root cause issue, this
> patch is to add logs to dump PIR and vIRR when failure takes place. It should
> be reverted once the root cause is found.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>

If this wasn't just a temporary thing, I'd have some comments, but
it's good enough (perhaps with a little massaging when committing)
for this purpose.

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Ian, Wei, is there a way to automatically scan osstest logs for
occurrences of the extra output, or does someone need to look
through the logs of every spurious failure?

Jan
Jan Beulich Feb. 7, 2017, 10:04 a.m. UTC | #5
>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
> 
> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
> {
> 	/* This may race with setting of irr in __apic_accept_irq() and
> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
> 	 * will cause vmexit immediately and the value will be recalculated
> 	 * on the next vmentry.
> 	 */
> ...
> }
> 
> I am afraid, there may be a similar race with setting of vIRR..

I think this is unrelated: If another interrupt came in, the highest
set bit in vIRR can only increase. Plus pt_update_irq(), before
returning, calls vlapic_set_irq(), which ought to result in pt_vector's
vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
highest priority interrupt should still have a vector >= pt_vector.

Jan
Wei Liu Feb. 7, 2017, 10:04 a.m. UTC | #6
On Tue, Feb 07, 2017 at 02:51:54AM -0700, Jan Beulich wrote:
> >>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> > added a assertion that intack.vector is the highest priority vector. But
> > according to the osstest, the assertion failed sometimes. More discussion can
> > be found in the thread
> > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> > 
> > The assertion failure is hard to reproduce. In order to root cause issue, this
> > patch is to add logs to dump PIR and vIRR when failure takes place. It should
> > be reverted once the root cause is found.
> > 
> > Signed-off-by: Chao Gao <chao.gao@intel.com>
> 
> If this wasn't just a temporary thing, I'd have some comments, but
> it's good enough (perhaps with a little massaging when committing)
> for this purpose.
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Ian, Wei, is there a way to automatically scan osstest logs for
> occurrences of the extra output, or does someone need to look
> through the logs of every spurious failure?
> 

I don't think there is automatically scanning in place.

> Jan
>
Jan Beulich Feb. 7, 2017, 1:46 p.m. UTC | #7
>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
>>> 
>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>>> {
>>> 	/* This may race with setting of irr in __apic_accept_irq() and
>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>>> 	 * will cause vmexit immediately and the value will be recalculated
>>> 	 * on the next vmentry.
>>> 	 */
>>> ...
>>> }
>>> 
>>> I am afraid, there may be a similar race with setting of vIRR..
>>
>>I think this is unrelated: If another interrupt came in, the highest
>>set bit in vIRR can only increase. Plus pt_update_irq(), before
>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>>highest priority interrupt should still have a vector >= pt_vector.
> 
> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
> when the assertion failed. In the process of debugging pt_update_irq() 
> when I was working on another feature, I noticed that 0x30 is always 
> the vector of IRQ2. I suspect that the source of the periodic timer interrupt
> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
> If guest changed the content in viopaic entry between the two read
> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
> Do you think it is a reasonable explanation?

IRQ2? If this was going via the PIC, that would be impossible, as
that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
IO-APIC (and we follow this model - see tools/libacpi/build.c). How
did you determine that 0x30 is the vector of IRQ2?

Jan
Jan Beulich Feb. 7, 2017, 4:18 p.m. UTC | #8
>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote:
> On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote:
>>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
>>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>>>>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
>>>>> 
>>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>>>>> {
>>>>> 	/* This may race with setting of irr in __apic_accept_irq() and
>>>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>>>>> 	 * will cause vmexit immediately and the value will be recalculated
>>>>> 	 * on the next vmentry.
>>>>> 	 */
>>>>> ...
>>>>> }
>>>>> 
>>>>> I am afraid, there may be a similar race with setting of vIRR..
>>>>
>>>>I think this is unrelated: If another interrupt came in, the highest
>>>>set bit in vIRR can only increase. Plus pt_update_irq(), before
>>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>>>>highest priority interrupt should still have a vector >= pt_vector.
>>> 
>>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
>>> when the assertion failed. In the process of debugging pt_update_irq() 
>>> when I was working on another feature, I noticed that 0x30 is always 
>>> the vector of IRQ2. I suspect that the source of the periodic timer 
> interrupt
>>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
>>> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
>>> If guest changed the content in viopaic entry between the two read
>>> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
>>> Do you think it is a reasonable explanation?
>>
>>IRQ2? If this was going via the PIC, that would be impossible, as
>>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
>>IO-APIC (and we follow this model - see tools/libacpi/build.c). How
> 
> Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC.
> 
>>did you determine that 0x30 is the vector of IRQ2?
> 
> By monitoring the write operation to IO-APIC redirection entry.
> From experimental data, linux kernel always writes vector 0x30 to
> redirection entry of pin2 of IO-APIC.

But if it goes through the IO-APIC, it'll also go through the LAPIC
(and in the ExtInt case you wouldn't be able to read a valid vector
from the RTE).

Jan
Chao Gao Feb. 8, 2017, 12:12 a.m. UTC | #9
On Tue, Feb 07, 2017 at 09:18:56AM -0700, Jan Beulich wrote:
>>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote:
>> On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote:
>>>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
>>>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>>>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>>>>>> Some comment from QEMU/KVM code, in <linux-kernel>/arch/x86/kvm/lapic.c,
>>>>>> 
>>>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>>>>>> {
>>>>>> 	/* This may race with setting of irr in __apic_accept_irq() and
>>>>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>>>>>> 	 * will cause vmexit immediately and the value will be recalculated
>>>>>> 	 * on the next vmentry.
>>>>>> 	 */
>>>>>> ...
>>>>>> }
>>>>>> 
>>>>>> I am afraid, there may be a similar race with setting of vIRR..
>>>>>
>>>>>I think this is unrelated: If another interrupt came in, the highest
>>>>>set bit in vIRR can only increase. Plus pt_update_irq(), before
>>>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>>>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>>>>>highest priority interrupt should still have a vector >= pt_vector.
>>>> 
>>>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
>>>> when the assertion failed. In the process of debugging pt_update_irq() 
>>>> when I was working on another feature, I noticed that 0x30 is always 
>>>> the vector of IRQ2. I suspect that the source of the periodic timer 
>> interrupt
>>>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
>>>> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
>>>> If guest changed the content in viopaic entry between the two read
>>>> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
>>>> Do you think it is a reasonable explanation?
>>>
>>>IRQ2? If this was going via the PIC, that would be impossible, as
>>>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
>>>IO-APIC (and we follow this model - see tools/libacpi/build.c). How
>> 
>> Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC.
>> 
>>>did you determine that 0x30 is the vector of IRQ2?
>> 
>> By monitoring the write operation to IO-APIC redirection entry.
>> From experimental data, linux kernel always writes vector 0x30 to
>> redirection entry of pin2 of IO-APIC.
>
>But if it goes through the IO-APIC, it'll also go through the LAPIC
>(and in the ExtInt case you wouldn't be able to read a valid vector
>from the RTE).

Yes. If a pt interrupt goes through IO-APIC, pt_update_irq() will read
the RTE twice. One is in hvm_isa_irq_assert() to determine which bit
in vIRR should be set. The other is in pt_irq_vector() to get the bit
we have set and return it. Is it possible that the return vector is
different from the bit set in vIRR of LAPIC, since guest has changed
RTE of IO-APIC pin2.

thanks,
chao
>
>Jan
>
Chao Gao Feb. 8, 2017, 1:30 a.m. UTC | #10
On Wed, Feb 08, 2017 at 04:06:53PM +0800, Tian, Kevin wrote:
>> From: Gao, Chao
>> Sent: Wednesday, February 08, 2017 8:12 AM
>> 
>> On Tue, Feb 07, 2017 at 09:18:56AM -0700, Jan Beulich wrote:
>> >>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote:
>> >> On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote:
>> >>>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
>> >>>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
>> >>>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
>> >>>>>> Some comment from QEMU/KVM code, in
>> <linux-kernel>/arch/x86/kvm/lapic.c,
>> >>>>>>
>> >>>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
>> >>>>>> {
>> >>>>>> 	/* This may race with setting of irr in __apic_accept_irq() and
>> >>>>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
>> >>>>>> 	 * will cause vmexit immediately and the value will be recalculated
>> >>>>>> 	 * on the next vmentry.
>> >>>>>> 	 */
>> >>>>>> ...
>> >>>>>> }
>> >>>>>>
>> >>>>>> I am afraid, there may be a similar race with setting of vIRR..
>> >>>>>
>> >>>>>I think this is unrelated: If another interrupt came in, the highest
>> >>>>>set bit in vIRR can only increase. Plus pt_update_irq(), before
>> >>>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
>> >>>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
>> >>>>>highest priority interrupt should still have a vector >= pt_vector.
>> >>>>
>> >>>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
>> >>>> when the assertion failed. In the process of debugging pt_update_irq()
>> >>>> when I was working on another feature, I noticed that 0x30 is always
>> >>>> the vector of IRQ2. I suspect that the source of the periodic timer
>> >> interrupt
>> >>>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
>> >>>> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
>> >>>> If guest changed the content in viopaic entry between the two read
>> >>>> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
>> >>>> Do you think it is a reasonable explanation?
>> >>>
>> >>>IRQ2? If this was going via the PIC, that would be impossible, as
>> >>>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
>> >>>IO-APIC (and we follow this model - see tools/libacpi/build.c). How
>> >>
>> >> Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC.
>> >>
>> >>>did you determine that 0x30 is the vector of IRQ2?
>> >>
>> >> By monitoring the write operation to IO-APIC redirection entry.
>> >> From experimental data, linux kernel always writes vector 0x30 to
>> >> redirection entry of pin2 of IO-APIC.
>> >
>> >But if it goes through the IO-APIC, it'll also go through the LAPIC
>> >(and in the ExtInt case you wouldn't be able to read a valid vector
>> >from the RTE).
>> 
>> Yes. If a pt interrupt goes through IO-APIC, pt_update_irq() will read
>> the RTE twice. One is in hvm_isa_irq_assert() to determine which bit
>> in vIRR should be set. The other is in pt_irq_vector() to get the bit
>> we have set and return it. Is it possible that the return vector is
>> different from the bit set in vIRR of LAPIC, since guest has changed
>> RTE of IO-APIC pin2.
>> 
>
>In theory it could happen. There is no lock to prevent vIOAPIC RTE
>being modified between above two operations. But I'm not sure
>whether it's the reason for this bug. My feeling is that vector for IRQ0
>should be fixed one. Did you capture such vector change in your side?

I agree. I don't see guest changes the vector for IRQ0 in hundreds of tests
(simply create and shutdown guest).

Thanks
chao
>
>Thanks
>Kevin
Tian, Kevin Feb. 8, 2017, 7:49 a.m. UTC | #11
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: Tuesday, February 07, 2017 6:05 PM
> 
> On Tue, Feb 07, 2017 at 02:51:54AM -0700, Jan Beulich wrote:
> > >>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> > > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> > > added a assertion that intack.vector is the highest priority vector. But
> > > according to the osstest, the assertion failed sometimes. More discussion can
> > > be found in the thread
> > > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> > >
> > > The assertion failure is hard to reproduce. In order to root cause issue, this
> > > patch is to add logs to dump PIR and vIRR when failure takes place. It should
> > > be reverted once the root cause is found.
> > >
> > > Signed-off-by: Chao Gao <chao.gao@intel.com>
> >
> > If this wasn't just a temporary thing, I'd have some comments, but
> > it's good enough (perhaps with a little massaging when committing)
> > for this purpose.
> >
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >
> > Ian, Wei, is there a way to automatically scan osstest logs for
> > occurrences of the extra output, or does someone need to look
> > through the logs of every spurious failure?
> >
> 
> I don't think there is automatically scanning in place.

Curious how this issue was initially caught? Would same practice make
sure next fail catching our eye?

Thanks
Kevin
Tian, Kevin Feb. 8, 2017, 8:06 a.m. UTC | #12
> From: Gao, Chao
> Sent: Wednesday, February 08, 2017 8:12 AM
> 
> On Tue, Feb 07, 2017 at 09:18:56AM -0700, Jan Beulich wrote:
> >>>> On 07.02.17 at 08:28, <chao.gao@intel.com> wrote:
> >> On Tue, Feb 07, 2017 at 06:46:16AM -0700, Jan Beulich wrote:
> >>>>>> On 07.02.17 at 07:32, <chao.gao@intel.com> wrote:
> >>>> On Tue, Feb 07, 2017 at 03:04:32AM -0700, Jan Beulich wrote:
> >>>>>>>> On 07.02.17 at 07:48, <xuquan8@huawei.com> wrote:
> >>>>>> Some comment from QEMU/KVM code, in
> <linux-kernel>/arch/x86/kvm/lapic.c,
> >>>>>>
> >>>>>> int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
> >>>>>> {
> >>>>>> 	/* This may race with setting of irr in __apic_accept_irq() and
> >>>>>> 	 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
> >>>>>> 	 * will cause vmexit immediately and the value will be recalculated
> >>>>>> 	 * on the next vmentry.
> >>>>>> 	 */
> >>>>>> ...
> >>>>>> }
> >>>>>>
> >>>>>> I am afraid, there may be a similar race with setting of vIRR..
> >>>>>
> >>>>>I think this is unrelated: If another interrupt came in, the highest
> >>>>>set bit in vIRR can only increase. Plus pt_update_irq(), before
> >>>>>returning, calls vlapic_set_irq(), which ought to result in pt_vector's
> >>>>>vIRR bit to be set (either directly or via setting its PIR bit). I.e. the
> >>>>>highest priority interrupt should still have a vector >= pt_vector.
> >>>>
> >>>> I have noticed that pt_vector was 0x38 and the hightest vector was 0x30
> >>>> when the assertion failed. In the process of debugging pt_update_irq()
> >>>> when I was working on another feature, I noticed that 0x30 is always
> >>>> the vector of IRQ2. I suspect that the source of the periodic timer
> >> interrupt
> >>>> is not lapic. If that, pt_update_irq() reads the vioapic entry twice,
> >>>> one in hvm_isa_irq_assert() and the other in pt_irq_vector().
> >>>> If guest changed the content in viopaic entry between the two read
> >>>> (ie. change the vector from 0x30 to 0x38), the assertion would fail.
> >>>> Do you think it is a reasonable explanation?
> >>>
> >>>IRQ2? If this was going via the PIC, that would be impossible, as
> >>>that's the cascade pin. IRQ0, otoh, normally arrives at pin2 of the
> >>>IO-APIC (and we follow this model - see tools/libacpi/build.c). How
> >>
> >> Sorry, I meaned IRQ0. I was confused with IRQ0 and pin2 of the IO-APIC.
> >>
> >>>did you determine that 0x30 is the vector of IRQ2?
> >>
> >> By monitoring the write operation to IO-APIC redirection entry.
> >> From experimental data, linux kernel always writes vector 0x30 to
> >> redirection entry of pin2 of IO-APIC.
> >
> >But if it goes through the IO-APIC, it'll also go through the LAPIC
> >(and in the ExtInt case you wouldn't be able to read a valid vector
> >from the RTE).
> 
> Yes. If a pt interrupt goes through IO-APIC, pt_update_irq() will read
> the RTE twice. One is in hvm_isa_irq_assert() to determine which bit
> in vIRR should be set. The other is in pt_irq_vector() to get the bit
> we have set and return it. Is it possible that the return vector is
> different from the bit set in vIRR of LAPIC, since guest has changed
> RTE of IO-APIC pin2.
> 

In theory it could happen. There is no lock to prevent vIOAPIC RTE
being modified between above two operations. But I'm not sure
whether it's the reason for this bug. My feeling is that vector for IRQ0
should be fixed one. Did you capture such vector change in your side?

Thanks
Kevin
Jan Beulich Feb. 10, 2017, 9:34 a.m. UTC | #13
>>> On 08.02.17 at 08:49, <kevin.tian@intel.com> wrote:
> Curious how this issue was initially caught? Would same practice make
> sure next fail catching our eye?

I guess Andrew just happened to look at the logs of a spurious
failure of some test.

Jan
Jan Beulich Feb. 10, 2017, 9:36 a.m. UTC | #14
>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> added a assertion that intack.vector is the highest priority vector. But
> according to the osstest, the assertion failed sometimes. More discussion can
> be found in the thread
> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> 
> The assertion failure is hard to reproduce. In order to root cause issue, this
> patch is to add logs to dump PIR and vIRR when failure takes place. It should
> be reverted once the root cause is found.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>

Jun, Kevin - can you ack this? Or was Chao expected to make any
changes? I'd like to see this go in rather sooner than later, so we
can get it back out well before 4.9 is going to settle.

Jan
Tian, Kevin Feb. 13, 2017, 2:48 a.m. UTC | #15
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, February 10, 2017 5:36 PM
> 
> >>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> > Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> > added a assertion that intack.vector is the highest priority vector. But
> > according to the osstest, the assertion failed sometimes. More discussion can
> > be found in the thread
> > (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> >
> > The assertion failure is hard to reproduce. In order to root cause issue, this
> > patch is to add logs to dump PIR and vIRR when failure takes place. It should
> > be reverted once the root cause is found.
> >
> > Signed-off-by: Chao Gao <chao.gao@intel.com>
> 
> Jun, Kevin - can you ack this? Or was Chao expected to make any
> changes? I'd like to see this go in rather sooner than later, so we
> can get it back out well before 4.9 is going to settle.
> 

My bad... thought I've acked but seems distracted by later discussions.

Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich Feb. 14, 2017, 10:51 a.m. UTC | #16
>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
> added a assertion that intack.vector is the highest priority vector. But
> according to the osstest, the assertion failed sometimes. More discussion can
> be found in the thread
> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
> 
> The assertion failure is hard to reproduce. In order to root cause issue, this
> patch is to add logs to dump PIR and vIRR when failure takes place. It should
> be reverted once the root cause is found.

Julien, could you add a note on the 4.9 tracking list for us to not
forget to revert this (or at least add NDEBUG conditionals) the
latest in the RC phase?

Thanks, Jan
Julien Grall Feb. 14, 2017, 12:12 p.m. UTC | #17
Hi Jan,

On 02/14/2017 10:51 AM, Jan Beulich wrote:
>>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
>> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
>> added a assertion that intack.vector is the highest priority vector. But
>> according to the osstest, the assertion failed sometimes. More discussion can
>> be found in the thread
>> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
>>
>> The assertion failure is hard to reproduce. In order to root cause issue, this
>> patch is to add logs to dump PIR and vIRR when failure takes place. It should
>> be reverted once the root cause is found.
>
> Julien, could you add a note on the 4.9 tracking list for us to not
> forget to revert this (or at least add NDEBUG conditionals) the
> latest in the RC phase?

I have added a note in the 4.9 tracking list.

Cheers,
Julien Grall Feb. 14, 2017, 2:37 p.m. UTC | #18
(Resending the e-mail due to SMTP issue)

On 02/14/2017 12:12 PM, Julien Grall wrote:
> Hi Jan,
>
> On 02/14/2017 10:51 AM, Jan Beulich wrote:
>>>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
>>> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv
>>> issue") has
>>> added a assertion that intack.vector is the highest priority vector. But
>>> according to the osstest, the assertion failed sometimes. More
>>> discussion can
>>> be found in the thread
>>> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
>>>
>>>
>>> The assertion failure is hard to reproduce. In order to root cause
>>> issue, this
>>> patch is to add logs to dump PIR and vIRR when failure takes place.
>>> It should
>>> be reverted once the root cause is found.
>>
>> Julien, could you add a note on the 4.9 tracking list for us to not
>> forget to revert this (or at least add NDEBUG conditionals) the
>> latest in the RC phase?
>
> I have added a note in the 4.9 tracking list.
>
> Cheers,
>
Julien Grall May 4, 2017, 9:47 a.m. UTC | #19
Hi,

On 14/02/17 10:51, Jan Beulich wrote:
>>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
>> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
>> added a assertion that intack.vector is the highest priority vector. But
>> according to the osstest, the assertion failed sometimes. More discussion can
>> be found in the thread
>> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
>>
>> The assertion failure is hard to reproduce. In order to root cause issue, this
>> patch is to add logs to dump PIR and vIRR when failure takes place. It should
>> be reverted once the root cause is found.
>
> Julien, could you add a note on the 4.9 tracking list for us to not
> forget to revert this (or at least add NDEBUG conditionals) the
> latest in the RC phase?

Bumping this discussion as we are in the RC phase. Was the root cause 
found and we can revert the patch? Or shall we surround the code with 
NDEBUG?

Cheers,
Jan Beulich May 4, 2017, 10:20 a.m. UTC | #20
>>> On 04.05.17 at 11:47, <julien.grall@arm.com> wrote:
> On 14/02/17 10:51, Jan Beulich wrote:
>>>>> On 07.02.17 at 00:32, <chao.gao@intel.com> wrote:
>>> Commit c7bdecae42 ("x86/apicv: fix RTC periodic timer and apicv issue") has
>>> added a assertion that intack.vector is the highest priority vector. But
>>> according to the osstest, the assertion failed sometimes. More discussion can
>>> be found in the thread
>>> (https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg01019.html).
>>>
>>> The assertion failure is hard to reproduce. In order to root cause issue, this
>>> patch is to add logs to dump PIR and vIRR when failure takes place. It should
>>> be reverted once the root cause is found.
>>
>> Julien, could you add a note on the 4.9 tracking list for us to not
>> forget to revert this (or at least add NDEBUG conditionals) the
>> latest in the RC phase?
> 
> Bumping this discussion as we are in the RC phase. Was the root cause 
> found and we can revert the patch?

Sadly not, so far.

> Or shall we surround the code with NDEBUG?

Yes, I'll submit a patch.

Jan
diff mbox

Patch

diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 24e4505..5e5b37a 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -23,6 +23,7 @@ 
 #include <xen/errno.h>
 #include <xen/trace.h>
 #include <xen/event.h>
+#include <asm/apicdef.h>
 #include <asm/current.h>
 #include <asm/cpufeature.h>
 #include <asm/processor.h>
@@ -318,6 +319,48 @@  void vmx_intr_assist(void)
         */
         if ( pt_vector != -1 )
         {
+            /*
+             * We assert that intack.vector is the highest priority vector for
+             * only an interrupt from vlapic can reach this point and the
+             * highest vector is chosen in hvm_vcpu_has_pending_irq().
+             * But, in fact, the assertion failed sometimes. It is suspected
+             * that PIR is not synced to vIRR which makes pt_vector is left in
+             * PIR. In order to verify this suspicion, dump some information
+             * when the assertion fails.
+             */
+            if ( unlikely(intack.vector < pt_vector) )
+            {
+                struct vlapic *vlapic;
+                struct pi_desc *pi_desc;
+                uint32_t *word;
+                int i;
+
+                printk("Assertion failed on %pv. Some info are below.\n",
+                       current);
+                printk("intack info: vector 0x%2x, source %x\n", intack.vector,
+                       intack.source);
+                printk("pt_vector info: vector 0x%2x\n", pt_vector);
+
+                vlapic = vcpu_vlapic(v);
+                if ( vlapic && vlapic->regs->data )
+                {
+                    word = (void *)&vlapic->regs->data[APIC_IRR];
+                    printk("vIRR:");
+                    for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- )
+                        printk(" %08x", word[i*4]);
+                    printk("\n");
+                }
+
+                pi_desc = &v->arch.hvm_vmx.pi_desc;
+                if ( pi_desc && pi_desc->pir )
+                {
+                    word = (void *)&pi_desc->pir;
+                    printk(" PIR:");
+                    for ( i = NR_VECTORS / 32 - 1; i >= 0 ; i-- )
+                        printk(" %08x", word[i]);
+                    printk("\n");
+                }
+            }
             ASSERT(intack.vector >= pt_vector);
             vmx_set_eoi_exit_bitmap(v, intack.vector);
         }