[BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

Message ID 569BB278.8080603@citrix.com (mailing list archive)
State New, archived

Commit Message

Andrew Cooper Jan. 17, 2016, 3:25 p.m. UTC
On 17/01/16 15:16, Andrew Cooper wrote:
>
>>> This isn't the first time we have seen this on Haswell processors. Do
>>> you have microcode loading set up?
>>>
>>> ~Andrew
>>>
>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>> cpu microcode, using microcode from 20151106.
> Ok - I previously investigated this issue, but my repro evaporated from
> under my feet with a firmware update, and I never got to the bottom of it.
>
> Please can you start with the following patch which will dump some more
> information on crash.
>
> ---8<---
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index 1228568..588b562 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1165,6 +1165,13 @@ static void __do_IRQ_guest(int irq)
>      if ( action->ack_type == ACKTYPE_EOI )
>      {
>          sp = pending_eoi_sp(peoi);
> +        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
> +        {
> +            int p;
> +            for ( p = sp; p > 0; --p )
> +                printk("**peoi[%d] = {%d, 0x%u, %d}\n",
> +                       p-1, peoi[p-1].irq, peoi[p-1].vector, peoi[p-1].ready);
> +        }
>          ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>          ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>          peoi[sp].irq = irq;

Actually, this will be more useful:

         peoi[sp].irq = irq;

Comments

Andrew Cooper Jan. 22, 2016, 9:20 a.m. UTC | #1
On 22/01/2016 08:57, Håkon Alstadheim wrote:
> On 17 Jan 2016 16:25, Andrew Cooper wrote:
>> On 17/01/16 15:16, Andrew Cooper wrote:
>>>>> This isn't the first time we have seen this on Haswell processors. Do
>>>>> you have microcode loading set up?
>>>>>
>>>>> ~Andrew
>>>>>
>>>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>>>> cpu microcode, using microcode from 20151106.
> ...
>>>> Actually, this will be more useful:
>>>>
>>>> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>>>> index 1228568..4e75b03 100644
>>>> --- a/xen/arch/x86/irq.c
>>>> +++ b/xen/arch/x86/irq.c
>>>> @@ -1165,6 +1165,15 @@ static void __do_IRQ_guest(int irq)
>>>>      if ( action->ack_type == ACKTYPE_EOI )
>>>>      {
>>>>          sp = pending_eoi_sp(peoi);
>>>> +        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
>>>> +        {
>>>> +            int p;
>>>> +
>>>> +            printk("** sp %d, irq %d, vec %#x\n", sp, irq, vector);
>>>> +            for ( p = sp; p > 0; --p )
>>>> +                printk("**peoi[%d] = {%d, %#x, %d}\n",
>>>> +                       p-1, peoi[p-1].irq, peoi[p-1].vector, peoi[p-1].ready);
>>>> +        }
>>>>          ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>>>>          ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>>>>          peoi[sp].irq = irq;
>>>>
>>>>
>>>>
> Got one again. dom5 is my desktop, dom1 is my
> mail-server/router/firewall. (planning to split that up ... ) . Is there
> any additional info that would be useful?
>
> Running now with gentoo xen 4.6.0-r8 and xen-tools 4.6.0-r7. dom0 kernel
> is gentoo-sources-4.1.15-r1 , and the above patch.
>
> I tried running with maxcpus=6 for a while, but I had to disable some
> services to get that running. So, when nothing happened for a while I
> re-enabled all my cores (two cpus, 12 cores, 24 threads). I was running
> with two cpu-pools, one for each cpu. I have not re-enabled that.

grant_table.c:1491:d1v3 Expanding dom (1) grant table from (12) to (13) frames.
** sp 1, irq 107, vec 0x3b
**peoi[0] = {107, 0x3b, 0}
Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1172
----[ Xen-4.6.0  x86_64  debug=y  Tainted:    C ]----
<snip>
Xen call trace:
   [<ffff82d080170205>] do_IRQ+0x451/0x6ea
   [<ffff82d08023b132>] common_interrupt+0x62/0x70
   [<ffff82d0801af1ea>] mwait_idle+0x2cb/0x315
   [<ffff82d0801607bc>] idle_loop+0x51/0x6b

So we have been interrupted with an interrupt we already believe to be
pending.  I wonder if there is an erratum to do with going to sleep with
a pending interrupt.
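
To make the failing check concrete, here is a minimal standalone sketch of
the invariant the assertion enforces. It is not Xen's code: struct
peoi_stack, peoi_can_push(), peoi_push() and PEOI_DEPTH are hypothetical
simplifications (the real stack is per-CPU and bounded by
NR_DYNAMIC_VECTORS). The point is only that a new entry is accepted when the
stack is empty or its vector is strictly greater than the one on top, which
the logged state (top entry vector 0x3b, incoming vector 0x3b) does not
satisfy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for Xen's pending-EOI stack. */
#define PEOI_DEPTH 16   /* illustrative size only */

struct pending_eoi {
    int     irq;
    uint8_t vector;
    bool    ready;
};

struct peoi_stack {
    struct pending_eoi e[PEOI_DEPTH];
    int sp;             /* number of entries currently in use */
};

/* The condition from the failing ASSERT: the stack is empty, or the
 * incoming vector is strictly higher than the vector on top. */
static bool peoi_can_push(const struct peoi_stack *s, uint8_t vector)
{
    return s->sp == 0 || s->e[s->sp - 1].vector < vector;
}

/* Push a new entry; returns false where the real code would assert. */
static bool peoi_push(struct peoi_stack *s, int irq, uint8_t vector)
{
    assert(s->sp >= 0 && s->sp < PEOI_DEPTH);   /* sanity of the model itself */

    if ( !peoi_can_push(s, vector) || s->sp >= PEOI_DEPTH - 1 )
        return false;

    s->e[s->sp].irq    = irq;
    s->e[s->sp].vector = vector;
    s->e[s->sp].ready  = false;
    s->sp++;
    return true;
}
```

With the values from the log, pushing irq 107 / vector 0x3b onto a stack
whose top entry is already {107, 0x3b, 0} is rejected, which corresponds to
the assertion firing.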

I will see about extending the debugging patch to stash the IRR/ISR
before going to sleep.
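
As a rough illustration of what such a snapshot could be checked against: in
xAPIC mode the local APIC exposes the in-service (ISR) and interrupt request
(IRR) state as eight 32-bit registers each, starting at offsets 0x100 and
0x200 and spaced 0x10 apart, with vector N held in bit N % 32 of register
N / 32. The sketch below only does that index arithmetic on a saved copy;
struct apic_snapshot and the helper names are hypothetical, and how the
registers would actually be read inside Xen is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* xAPIC register bank offsets (eight 32-bit registers each, 0x10 apart). */
#define APIC_ISR_BASE 0x100u
#define APIC_IRR_BASE 0x200u

/* Offset of the 32-bit register in a bank that holds the bit for `vector`. */
static uint32_t apic_vec_reg(uint32_t base, uint8_t vector)
{
    return base + 0x10u * (vector / 32u);
}

/* Bit mask for `vector` within that register. */
static uint32_t apic_vec_mask(uint8_t vector)
{
    return 1u << (vector % 32u);
}

/* Hypothetical snapshot of both banks, e.g. saved just before idling. */
struct apic_snapshot {
    uint32_t irr[8];
    uint32_t isr[8];
};

/* Was `vector` requested (pending) when the snapshot was taken? */
static bool snapshot_irr_set(const struct apic_snapshot *s, uint8_t vector)
{
    return (s->irr[vector / 32u] & apic_vec_mask(vector)) != 0;
}

/* Was `vector` in service when the snapshot was taken? */
static bool snapshot_isr_set(const struct apic_snapshot *s, uint8_t vector)
{
    return (s->isr[vector / 32u] & apic_vec_mask(vector)) != 0;
}
```

For vector 0x3b this selects the second register of each bank (offsets 0x210
for IRR and 0x110 for ISR) and bit 27.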

~Andrew
Jan Beulich Jan. 22, 2016, 10:06 a.m. UTC | #2
>>> On 22.01.16 at 10:20, <andrew.cooper3@citrix.com> wrote:
> ** sp 1, irq 107, vec 0x3b
> **peoi[0] = {107, 0x3b, 0}
> Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1172
> ----[ Xen-4.6.0  x86_64  debug=y  Tainted:    C ]----
> <snip>
> Xen call trace:
>    [<ffff82d080170205>] do_IRQ+0x451/0x6ea
>    [<ffff82d08023b132>] common_interrupt+0x62/0x70
>    [<ffff82d0801af1ea>] mwait_idle+0x2cb/0x315
>    [<ffff82d0801607bc>] idle_loop+0x51/0x6b
> 
> So we have been interrupted with an interrupt we already believe to be
> pending.  I wonder if there is an erratum to do with going to sleep with
> a pending interrupt.

An immediate way to check whether that's (part of) the problem
would be to run with "cpuidle=0" for a while.

Jan

Patch

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 1228568..4e75b03 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1165,6 +1165,15 @@  static void __do_IRQ_guest(int irq)
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
+        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
+        {
+            int p;
+
+            printk("** sp %d, irq %d, vec %#x\n", sp, irq, vector);
+            for ( p = sp; p > 0; --p )
+                printk("**peoi[%d] = {%d, %#x, %d}\n",
+                       p-1, peoi[p-1].irq, peoi[p-1].vector, peoi[p-1].ready);
+        }
         ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
         ASSERT(sp < (NR_DYNAMIC_VECTORS-1));