diff mbox

[Qemu-devel] E5-2620v2 - emulation stop error

Message ID 20150326155807.GA13271@potion.brq.redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Radim Krčmář March 26, 2015, 3:58 p.m. UTC
2015-03-25 20:05-0400, Kevin O'Connor:
> On Thu, Mar 26, 2015 at 02:35:58AM +0300, Andrey Korolyov wrote:
> > Thanks, strangely the reboot is always failing now and always reaching
> > seabios greeting. May be prints straightened up a race (e.g. it is not
> > int19 problem really).
> > 
> > object file part:
> > 
> > 0000d331 <irq_trampoline_0x19>:
> > irq_trampoline_0x19():
> > /root/seabios-1.8.1/src/romlayout.S:195
> >     d331:       cd 19                   int    $0x19
> >     d333:       cb                      lretw
> 
> [...]
> > Jump to int19 (vector=f000e6f2)
> 
> Thanks.  So, it dies on the "int $0x19" instruction itself.  The
> vector looks correct and I don't see anything in the cpu register
> state that looks wrong.  Maybe one of the kvm developers will have an
> idea what could cause a fault there.

The place agrees with the "<cd> 19 cb" part of KVM error output.
Suberror 2 means that we were interrupted while delivering a vector,
here it is disected: (delivering 'vect_info')

  vect_info (extra data[0]: 800000ef)
  - vector 0xef
  - INTR_TYPE_EXT_INTR (0x000)
  - no error code (0x000)
  - valid (0x80000000)

  intr_info (extra data[1]: 80000b0d)
  - #GP (0x0d)
  - INTR_TYPE_HARD_EXCEPTION (0x300)
  - error code on stack (0x800)  [Hunk at the bottom exposes it.]
  - valid (0x80000000)

Notice the 0xef.  My best hypothesis so far is that we fail at resetting
devices, and 0xef is LOCAL_TIMER_VECTOR from Linux before we rebooted.
(The bug happens at the first place that enables interrupts.)

This should be handled without an internal error, though.
(It's possible we are hitting two bugs here ...)

Can you provide KVM event trace as well?  (`trace-cmd record -e 'kvm*'`)

Thanks.


---
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Kevin O'Connor March 26, 2015, 4:36 p.m. UTC | #1
On Thu, Mar 26, 2015 at 04:58:07PM +0100, Radim Kr?má? wrote:
> 2015-03-25 20:05-0400, Kevin O'Connor:
> > On Thu, Mar 26, 2015 at 02:35:58AM +0300, Andrey Korolyov wrote:
> > > Thanks, strangely the reboot is always failing now and always reaching
> > > seabios greeting. May be prints straightened up a race (e.g. it is not
> > > int19 problem really).
> > > 
> > > object file part:
> > > 
> > > 0000d331 <irq_trampoline_0x19>:
> > > irq_trampoline_0x19():
> > > /root/seabios-1.8.1/src/romlayout.S:195
> > >     d331:       cd 19                   int    $0x19
> > >     d333:       cb                      lretw
> > 
> > [...]
> > > Jump to int19 (vector=f000e6f2)
> > 
> > Thanks.  So, it dies on the "int $0x19" instruction itself.  The
> > vector looks correct and I don't see anything in the cpu register
> > state that looks wrong.  Maybe one of the kvm developers will have an
> > idea what could cause a fault there.
> 
> The place agrees with the "<cd> 19 cb" part of KVM error output.
> Suberror 2 means that we were interrupted while delivering a vector,
> here it is disected: (delivering 'vect_info')
> 
>   vect_info (extra data[0]: 800000ef)
>   - vector 0xef
>   - INTR_TYPE_EXT_INTR (0x000)
>   - no error code (0x000)
>   - valid (0x80000000)
> 
>   intr_info (extra data[1]: 80000b0d)
>   - #GP (0x0d)
>   - INTR_TYPE_HARD_EXCEPTION (0x300)
>   - error code on stack (0x800)  [Hunk at the bottom exposes it.]
>   - valid (0x80000000)

Thanks for the background info.

> Notice the 0xef.  My best hypothesis so far is that we fail at resetting
> devices, and 0xef is LOCAL_TIMER_VECTOR from Linux before we rebooted.
> (The bug happens at the first place that enables interrupts.)

FYI, the "int $0x19" isn't the first place SeaBIOS will enable
interrupts.  Each screen print (every character in the seabios banner
and uuid string) will call the vga bios (int $0x10) with irqs enabled
(see output.c:screenc).

Also, SeaBIOS loads a default vector (f000:ff53) at 0xef which does a
simple "iretw".

Things that are unusual about the "int $0x19" call:
  - it is likely the first place that the cpu is transitioned into
    16bit real mode as opposed to "big real" mode.  (That is, the
    first place interrupts are enabled with the segment limits set to
    0xffff.)
  - it's right after the fw/shadow.c:make_bios_readonly() call, which
    attempts to configures the memory at 0xf0000-0x100000 as
    read-only.  That code also issues a wbinvd() call.

I'm not sure if the crash always happens at the "int $0x19" location
though.  Andrey, does the crash always happen with "EIP=d331" and/or
with "Code=... <cd> 19"?

-Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrey Korolyov March 26, 2015, 4:48 p.m. UTC | #2
On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor <kevin@koconnor.net> wrote:
> On Thu, Mar 26, 2015 at 04:58:07PM +0100, Radim Kr?má? wrote:
>> 2015-03-25 20:05-0400, Kevin O'Connor:
>> > On Thu, Mar 26, 2015 at 02:35:58AM +0300, Andrey Korolyov wrote:
>> > > Thanks, strangely the reboot is always failing now and always reaching
>> > > seabios greeting. May be prints straightened up a race (e.g. it is not
>> > > int19 problem really).
>> > >
>> > > object file part:
>> > >
>> > > 0000d331 <irq_trampoline_0x19>:
>> > > irq_trampoline_0x19():
>> > > /root/seabios-1.8.1/src/romlayout.S:195
>> > >     d331:       cd 19                   int    $0x19
>> > >     d333:       cb                      lretw
>> >
>> > [...]
>> > > Jump to int19 (vector=f000e6f2)
>> >
>> > Thanks.  So, it dies on the "int $0x19" instruction itself.  The
>> > vector looks correct and I don't see anything in the cpu register
>> > state that looks wrong.  Maybe one of the kvm developers will have an
>> > idea what could cause a fault there.
>>
>> The place agrees with the "<cd> 19 cb" part of KVM error output.
>> Suberror 2 means that we were interrupted while delivering a vector,
>> here it is disected: (delivering 'vect_info')
>>
>>   vect_info (extra data[0]: 800000ef)
>>   - vector 0xef
>>   - INTR_TYPE_EXT_INTR (0x000)
>>   - no error code (0x000)
>>   - valid (0x80000000)
>>
>>   intr_info (extra data[1]: 80000b0d)
>>   - #GP (0x0d)
>>   - INTR_TYPE_HARD_EXCEPTION (0x300)
>>   - error code on stack (0x800)  [Hunk at the bottom exposes it.]
>>   - valid (0x80000000)
>
> Thanks for the background info.
>
>> Notice the 0xef.  My best hypothesis so far is that we fail at resetting
>> devices, and 0xef is LOCAL_TIMER_VECTOR from Linux before we rebooted.
>> (The bug happens at the first place that enables interrupts.)
>
> FYI, the "int $0x19" isn't the first place SeaBIOS will enable
> interrupts.  Each screen print (every character in the seabios banner
> and uuid string) will call the vga bios (int $0x10) with irqs enabled
> (see output.c:screenc).
>
> Also, SeaBIOS loads a default vector (f000:ff53) at 0xef which does a
> simple "iretw".
>
> Things that are unusual about the "int $0x19" call:
>   - it is likely the first place that the cpu is transitioned into
>     16bit real mode as opposed to "big real" mode.  (That is, the
>     first place interrupts are enabled with the segment limits set to
>     0xffff.)
>   - it's right after the fw/shadow.c:make_bios_readonly() call, which
>     attempts to configures the memory at 0xf0000-0x100000 as
>     read-only.  That code also issues a wbinvd() call.
>
> I'm not sure if the crash always happens at the "int $0x19" location
> though.  Andrey, does the crash always happen with "EIP=d331" and/or
> with "Code=... <cd> 19"?
>
> -Kevin

There are also rare occurences for d3f9 (in the middle of ep) and d334
ep (less than one tenth of events for both). I`ll post a sample event
capture with and without Radim`s proposed patch maybe today or
tomorrow.

/root/seabios-1.8.1/src/romlayout.S:289
    d3eb:       66 50                   pushl  %eax
    d3ed:       66 51                   pushl  %ecx
    d3ef:       66 52                   pushl  %edx
    d3f1:       66 53                   pushl  %ebx
    d3f3:       66 55                   pushl  %ebp
    d3f5:       66 56                   pushl  %esi
    d3f7:       66 57                   pushl  %edi
    d3f9:       06                      pushw  %es
    d3fa:       1e                      pushw  %ds

0000d334 <irq_trampoline_0x1c>:
irq_trampoline_0x1c():
/root/seabios-1.8.1/src/romlayout.S:196
    d334:       cd 1c                   int    $0x1c
    d336:       cb                      lretw
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kevin O'Connor March 26, 2015, 5:06 p.m. UTC | #3
On Thu, Mar 26, 2015 at 07:48:09PM +0300, Andrey Korolyov wrote:
> On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor <kevin@koconnor.net> wrote:
> > I'm not sure if the crash always happens at the "int $0x19" location
> > though.  Andrey, does the crash always happen with "EIP=d331" and/or
> > with "Code=... <cd> 19"?
> 
> There are also rare occurences for d3f9 (in the middle of ep) and d334
> ep (less than one tenth of events for both). I`ll post a sample event
> capture with and without Radim`s proposed patch maybe today or
> tomorrow.
> 
> /root/seabios-1.8.1/src/romlayout.S:289
>     d3eb:       66 50                   pushl  %eax
>     d3ed:       66 51                   pushl  %ecx
>     d3ef:       66 52                   pushl  %edx
>     d3f1:       66 53                   pushl  %ebx
>     d3f3:       66 55                   pushl  %ebp
>     d3f5:       66 56                   pushl  %esi
>     d3f7:       66 57                   pushl  %edi
>     d3f9:       06                      pushw  %es
>     d3fa:       1e                      pushw  %ds
> 
> 0000d334 <irq_trampoline_0x1c>:
> irq_trampoline_0x1c():
> /root/seabios-1.8.1/src/romlayout.S:196
>     d334:       cd 1c                   int    $0x1c
>     d336:       cb                      lretw

Thanks.  The d334 looks very similar to the d331 report (code=<cd>
1c).  That path could happen during post (big real mode) or
immiediately after post (real mode).

The d3f9 report does not look like the others - interrupts are
disabled there.  If you still have the error logs, can you post the
full kvm crash report for d3f9?

-Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrey Korolyov March 26, 2015, 5:08 p.m. UTC | #4
On Thu, Mar 26, 2015 at 8:06 PM, Kevin O'Connor <kevin@koconnor.net> wrote:
> On Thu, Mar 26, 2015 at 07:48:09PM +0300, Andrey Korolyov wrote:
>> On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor <kevin@koconnor.net> wrote:
>> > I'm not sure if the crash always happens at the "int $0x19" location
>> > though.  Andrey, does the crash always happen with "EIP=d331" and/or
>> > with "Code=... <cd> 19"?
>>
>> There are also rare occurences for d3f9 (in the middle of ep) and d334
>> ep (less than one tenth of events for both). I`ll post a sample event
>> capture with and without Radim`s proposed patch maybe today or
>> tomorrow.
>>
>> /root/seabios-1.8.1/src/romlayout.S:289
>>     d3eb:       66 50                   pushl  %eax
>>     d3ed:       66 51                   pushl  %ecx
>>     d3ef:       66 52                   pushl  %edx
>>     d3f1:       66 53                   pushl  %ebx
>>     d3f3:       66 55                   pushl  %ebp
>>     d3f5:       66 56                   pushl  %esi
>>     d3f7:       66 57                   pushl  %edi
>>     d3f9:       06                      pushw  %es
>>     d3fa:       1e                      pushw  %ds
>>
>> 0000d334 <irq_trampoline_0x1c>:
>> irq_trampoline_0x1c():
>> /root/seabios-1.8.1/src/romlayout.S:196
>>     d334:       cd 1c                   int    $0x1c
>>     d336:       cb                      lretw
>
> Thanks.  The d334 looks very similar to the d331 report (code=<cd>
> 1c).  That path could happen during post (big real mode) or
> immiediately after post (real mode).
>
> The d3f9 report does not look like the others - interrupts are
> disabled there.  If you still have the error logs, can you post the
> full kvm crash report for d3f9?
>

Here you go:

KVM internal error. Suberror: 2
extra data[0]: 800000ef
extra data[1]: 80000b0d
EAX=00000003 EBX=00000000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006cd4
EIP=0000d3f9 EFL=00010202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     000f6e98 00000037
IDT=     00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb <cd>
10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66
b8 00 e0 00 00 8e
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kevin O'Connor March 26, 2015, 5:18 p.m. UTC | #5
On Thu, Mar 26, 2015 at 08:08:52PM +0300, Andrey Korolyov wrote:
> On Thu, Mar 26, 2015 at 8:06 PM, Kevin O'Connor <kevin@koconnor.net> wrote:
> > On Thu, Mar 26, 2015 at 07:48:09PM +0300, Andrey Korolyov wrote:
> >> On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor <kevin@koconnor.net> wrote:
> >> > I'm not sure if the crash always happens at the "int $0x19" location
> >> > though.  Andrey, does the crash always happen with "EIP=d331" and/or
> >> > with "Code=... <cd> 19"?
> >>
> >> There are also rare occurences for d3f9 (in the middle of ep) and d334
> >> ep (less than one tenth of events for both). I`ll post a sample event
> >> capture with and without Radim`s proposed patch maybe today or
> >> tomorrow.
> >>
> >> /root/seabios-1.8.1/src/romlayout.S:289
> >>     d3eb:       66 50                   pushl  %eax
> >>     d3ed:       66 51                   pushl  %ecx
> >>     d3ef:       66 52                   pushl  %edx
> >>     d3f1:       66 53                   pushl  %ebx
> >>     d3f3:       66 55                   pushl  %ebp
> >>     d3f5:       66 56                   pushl  %esi
> >>     d3f7:       66 57                   pushl  %edi
> >>     d3f9:       06                      pushw  %es
> >>     d3fa:       1e                      pushw  %ds
> >>
> >> 0000d334 <irq_trampoline_0x1c>:
> >> irq_trampoline_0x1c():
> >> /root/seabios-1.8.1/src/romlayout.S:196
> >>     d334:       cd 1c                   int    $0x1c
> >>     d336:       cb                      lretw
> >
> > Thanks.  The d334 looks very similar to the d331 report (code=<cd>
> > 1c).  That path could happen during post (big real mode) or
> > immiediately after post (real mode).
> >
> > The d3f9 report does not look like the others - interrupts are
> > disabled there.  If you still have the error logs, can you post the
> > full kvm crash report for d3f9?
> >
> 
> Here you go:

Thanks.  While we're at, can you verify if all your reports are
showing the cpu in "real mode".  That is, do they all have "0000ffff"
in the third column of the segment registers - as in:

> ES =0000 00000000 0000ffff 00009300

[...]
> Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb <cd>
> 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66
> b8 00 e0 00 00 8e

KVM reports the code as "int $0x10" here.  Was it possible this report
was from a different build of seabios (that had a different code
layout)?

Interestingly, this "int $0x10" is also in real-mode and not "big real
mode", so I think it would have occurred after post completed.

-Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrey Korolyov March 26, 2015, 5:33 p.m. UTC | #6
On Thu, Mar 26, 2015 at 8:18 PM, Kevin O'Connor <kevin@koconnor.net> wrote:
> On Thu, Mar 26, 2015 at 08:08:52PM +0300, Andrey Korolyov wrote:
>> On Thu, Mar 26, 2015 at 8:06 PM, Kevin O'Connor <kevin@koconnor.net> wrote:
>> > On Thu, Mar 26, 2015 at 07:48:09PM +0300, Andrey Korolyov wrote:
>> >> On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor <kevin@koconnor.net> wrote:
>> >> > I'm not sure if the crash always happens at the "int $0x19" location
>> >> > though.  Andrey, does the crash always happen with "EIP=d331" and/or
>> >> > with "Code=... <cd> 19"?
>> >>
>> >> There are also rare occurences for d3f9 (in the middle of ep) and d334
>> >> ep (less than one tenth of events for both). I`ll post a sample event
>> >> capture with and without Radim`s proposed patch maybe today or
>> >> tomorrow.
>> >>
>> >> /root/seabios-1.8.1/src/romlayout.S:289
>> >>     d3eb:       66 50                   pushl  %eax
>> >>     d3ed:       66 51                   pushl  %ecx
>> >>     d3ef:       66 52                   pushl  %edx
>> >>     d3f1:       66 53                   pushl  %ebx
>> >>     d3f3:       66 55                   pushl  %ebp
>> >>     d3f5:       66 56                   pushl  %esi
>> >>     d3f7:       66 57                   pushl  %edi
>> >>     d3f9:       06                      pushw  %es
>> >>     d3fa:       1e                      pushw  %ds
>> >>
>> >> 0000d334 <irq_trampoline_0x1c>:
>> >> irq_trampoline_0x1c():
>> >> /root/seabios-1.8.1/src/romlayout.S:196
>> >>     d334:       cd 1c                   int    $0x1c
>> >>     d336:       cb                      lretw
>> >
>> > Thanks.  The d334 looks very similar to the d331 report (code=<cd>
>> > 1c).  That path could happen during post (big real mode) or
>> > immiediately after post (real mode).
>> >
>> > The d3f9 report does not look like the others - interrupts are
>> > disabled there.  If you still have the error logs, can you post the
>> > full kvm crash report for d3f9?
>> >
>>
>> Here you go:
>
> Thanks.  While we're at, can you verify if all your reports are
> showing the cpu in "real mode".  That is, do they all have "0000ffff"
> in the third column of the segment registers - as in:
>
>> ES =0000 00000000 0000ffff 00009300
>

That`s positive.

> [...]
>> Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb <cd>
>> 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66
>> b8 00 e0 00 00 8e
>
> KVM reports the code as "int $0x10" here.  Was it possible this report
> was from a different build of seabios (that had a different code
> layout)?
>

Yep, sorry, I`ve mixed in logs just from before transition out of 1.7.5.

> Interestingly, this "int $0x10" is also in real-mode and not "big real
> mode", so I think it would have occurred after post completed.
>
> -Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář March 26, 2015, 5:34 p.m. UTC | #7
2015-03-26 12:36-0400, Kevin O'Connor:
> On Thu, Mar 26, 2015 at 04:58:07PM +0100, Radim Kr?má? wrote:
> > Notice the 0xef.  My best hypothesis so far is that we fail at resetting
> > devices, and 0xef is LOCAL_TIMER_VECTOR from Linux before we rebooted.
> > (The bug happens at the first place that enables interrupts.)
> 
> FYI, the "int $0x19" isn't the first place SeaBIOS will enable
> interrupts.  Each screen print (every character in the seabios banner
> and uuid string) will call the vga bios (int $0x10) with irqs enabled
> (see output.c:screenc).

Most useful, thank you.
So interrupt can't be "forgotten" there on reboot ... it's possible that
a pending timer injects it later.
(I'd like to grasp the reason behind 0xef first.)

> Also, SeaBIOS loads a default vector (f000:ff53) at 0xef which does a
> simple "iretw".

The #GP error code could help a bit here.

> Things that are unusual about the "int $0x19" call:
>   - it is likely the first place that the cpu is transitioned into
>     16bit real mode as opposed to "big real" mode.  (That is, the
>     first place interrupts are enabled with the segment limits set to
>     0xffff.)
>   - it's right after the fw/shadow.c:make_bios_readonly() call, which
>     attempts to configures the memory at 0xf0000-0x100000 as
>     read-only.  That code also issues a wbinvd() call.

(I'll wait for the trace before doing more wild guesses ...)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář March 26, 2015, 5:35 p.m. UTC | #8
2015-03-26 19:48+0300, Andrey Korolyov:
>                                              I`ll post a sample event
> capture with and without Radim`s proposed patch maybe today or
> tomorrow.

Thanks.

The patch doesn't change runtime behavior, it just adds another data
field to the error report, so there is no need to test both cases.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář March 26, 2015, 5:40 p.m. UTC | #9
2015-03-26 20:08+0300, Andrey Korolyov:
> KVM internal error. Suberror: 2
> extra data[0]: 800000ef
> extra data[1]: 80000b0d

Btw. does this part ever change?

I see that first report had:

  KVM internal error. Suberror: 2
  extra data[0]: 800000d1
  extra data[1]: 80000b0d

Was that a Windows guest by any chance?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrey Korolyov March 26, 2015, 6:24 p.m. UTC | #10
On Thu, Mar 26, 2015 at 8:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
> 2015-03-26 20:08+0300, Andrey Korolyov:
>> KVM internal error. Suberror: 2
>> extra data[0]: 800000ef
>> extra data[1]: 80000b0d
>
> Btw. does this part ever change?
>
> I see that first report had:
>
>   KVM internal error. Suberror: 2
>   extra data[0]: 800000d1
>   extra data[1]: 80000b0d
>
> Was that a Windows guest by any chance?

Yes, exactly, different extra data output was from a Windows VMs.
Thanks for clarifying things for your patch, I hadn`t looked at the
vmx code yet and thought that it changing things.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář March 26, 2015, 8:40 p.m. UTC | #11
2015-03-26 21:24+0300, Andrey Korolyov:
> On Thu, Mar 26, 2015 at 8:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
> > 2015-03-26 20:08+0300, Andrey Korolyov:
> >> KVM internal error. Suberror: 2
> >> extra data[0]: 800000ef
> >> extra data[1]: 80000b0d
> >
> > Btw. does this part ever change?
> >
> > I see that first report had:
> >
> >   KVM internal error. Suberror: 2
> >   extra data[0]: 800000d1
> >   extra data[1]: 80000b0d
> >
> > Was that a Windows guest by any chance?
> 
> Yes, exactly, different extra data output was from a Windows VMs.

Windows uses vector 0xd1 for timer interrupts.

I second Bandan -- checking that it reproduces on other machine would be
great for sanity :)  (Although a bug in our APICv is far more likely.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bandan Das March 26, 2015, 9:03 p.m. UTC | #12
Radim Kr?má? <rkrcmar@redhat.com> writes:

> 2015-03-26 21:24+0300, Andrey Korolyov:
>> On Thu, Mar 26, 2015 at 8:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
>> > 2015-03-26 20:08+0300, Andrey Korolyov:
>> >> KVM internal error. Suberror: 2
>> >> extra data[0]: 800000ef
>> >> extra data[1]: 80000b0d
>> >
>> > Btw. does this part ever change?
>> >
>> > I see that first report had:
>> >
>> >   KVM internal error. Suberror: 2
>> >   extra data[0]: 800000d1
>> >   extra data[1]: 80000b0d
>> >
>> > Was that a Windows guest by any chance?
>> 
>> Yes, exactly, different extra data output was from a Windows VMs.
>
> Windows uses vector 0xd1 for timer interrupts.

> I second Bandan -- checking that it reproduces on other machine would be
> great for sanity :)  (Although a bug in our APICv is far more likely.)

If it's APICv related, a run without apicv enabled could give more hints.

Your "devices not getting reset" hypothesis makes the most sense to me,
maybe the timer vector in the error message is just one part of
the whole story. Another misbehaving interrupt from the dark comes in at the
same time and leads to a double fault.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrey Korolyov March 27, 2015, 10:16 a.m. UTC | #13
On Fri, Mar 27, 2015 at 12:03 AM, Bandan Das <bsd@redhat.com> wrote:
> Radim Kr?má? <rkrcmar@redhat.com> writes:
>
>> 2015-03-26 21:24+0300, Andrey Korolyov:
>>> On Thu, Mar 26, 2015 at 8:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
>>> > 2015-03-26 20:08+0300, Andrey Korolyov:
>>> >> KVM internal error. Suberror: 2
>>> >> extra data[0]: 800000ef
>>> >> extra data[1]: 80000b0d
>>> >
>>> > Btw. does this part ever change?
>>> >
>>> > I see that first report had:
>>> >
>>> >   KVM internal error. Suberror: 2
>>> >   extra data[0]: 800000d1
>>> >   extra data[1]: 80000b0d
>>> >
>>> > Was that a Windows guest by any chance?
>>>
>>> Yes, exactly, different extra data output was from a Windows VMs.
>>
>> Windows uses vector 0xd1 for timer interrupts.
>
>> I second Bandan -- checking that it reproduces on other machine would be
>> great for sanity :)  (Although a bug in our APICv is far more likely.)
>
> If it's APICv related, a run without apicv enabled could give more hints.
>
> Your "devices not getting reset" hypothesis makes the most sense to me,
> maybe the timer vector in the error message is just one part of
> the whole story. Another misbehaving interrupt from the dark comes in at the
> same time and leads to a double fault.

Default trace (APICv enabled, first reboot introduced the issue):
http://xdel.ru/downloads/kvm-e5v2-issue/hanged-reboot-apic-on.dat.gz

Trace without APICv (three reboots, just to make sure to hit the
problematic condition of supposed DF, as it still have not one hundred
percent reproducibility):
http://xdel.ru/downloads/kvm-e5v2-issue/apic-off.dat.gz

It would be great of course to reproduce this somewhere else,
otherwise all this thread may end in fixing a bug which exists only at
my particular platform. Right now I have no hardware except a lot of
well-known (in terms of existing issues) Supermicro boards of one
model.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrey Korolyov March 27, 2015, 11:54 a.m. UTC | #14
On Thu, Mar 26, 2015 at 11:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
> 2015-03-26 21:24+0300, Andrey Korolyov:
>> On Thu, Mar 26, 2015 at 8:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
>> > 2015-03-26 20:08+0300, Andrey Korolyov:
>> >> KVM internal error. Suberror: 2
>> >> extra data[0]: 800000ef
>> >> extra data[1]: 80000b0d
>> >
>> > Btw. does this part ever change?
>> >
>> > I see that first report had:
>> >
>> >   KVM internal error. Suberror: 2
>> >   extra data[0]: 800000d1
>> >   extra data[1]: 80000b0d
>> >
>> > Was that a Windows guest by any chance?
>>
>> Yes, exactly, different extra data output was from a Windows VMs.
>
> Windows uses vector 0xd1 for timer interrupts.
>
> I second Bandan -- checking that it reproduces on other machine would be
> great for sanity :)  (Although a bug in our APICv is far more likely.)

Trace with new bits:

KVM internal error. Suberror: 2
extra data[0]: 800000ef
extra data[1]: 80000b0d
extra data[2]: 77b
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006d24
EIP=0000d331 EFL=00010202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     000f6cb0 00000037
IDT=     00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb <cd>
19 cb cd 1c cb cd 4a cb fa fc 66 ba 47 d3 0f 00 e9 ad fe f3 90 f0 0f
ba 2d d4 fe fb 3f
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář March 30, 2015, 6:56 p.m. UTC | #15
2015-03-27 13:16+0300, Andrey Korolyov:
> On Fri, Mar 27, 2015 at 12:03 AM, Bandan Das <bsd@redhat.com> wrote:
> > Radim Kr?má? <rkrcmar@redhat.com> writes:
> >> I second Bandan -- checking that it reproduces on other machine would be
> >> great for sanity :)  (Although a bug in our APICv is far more likely.)
> >
> > If it's APICv related, a run without apicv enabled could give more hints.
> >
> > Your "devices not getting reset" hypothesis makes the most sense to me,
> > maybe the timer vector in the error message is just one part of
> > the whole story. Another misbehaving interrupt from the dark comes in at the
> > same time and leads to a double fault.
> 
> Default trace (APICv enabled, first reboot introduced the issue):
> http://xdel.ru/downloads/kvm-e5v2-issue/hanged-reboot-apic-on.dat.gz

The relevant part is here,
prefixed with "qemu-system-x86-4180  [002]   697.111550:"

  kvm_exit:             reason CR_ACCESS rip 0xd272 info 0 0
  kvm_cr:               cr_write 0 = 0x10
  kvm_mmu_get_page:     existing sp gfn 0 0/4 q0 direct --- !pge !nxe root 0 sync
  kvm_entry:            vcpu 0
  kvm_emulate_insn:     f0000:d275: ea 7a d2 00 f0
  kvm_emulate_insn:     f0000:d27a: 2e 0f 01 1e f0 6c
  kvm_emulate_insn:     f0000:d280: 31 c0
  kvm_emulate_insn:     f0000:d282: 8e e0
  kvm_emulate_insn:     f0000:d284: 8e e8
  kvm_emulate_insn:     f0000:d286: 8e c0
  kvm_emulate_insn:     f0000:d288: 8e d8
  kvm_emulate_insn:     f0000:d28a: 8e d0
  kvm_entry:            vcpu 0
  kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xd28f info 0 800000f6
  kvm_entry:            vcpu 0
  kvm_exit:             reason EPT_VIOLATION rip 0x8dd0 info 184 0
  kvm_page_fault:       address f8dd0 error_code 184
  kvm_entry:            vcpu 0
  kvm_exit:             reason EXTERNAL_INTERRUPT rip 0x8dd0 info 0 800000f6
  kvm_entry:            vcpu 0
  kvm_exit:             reason EPT_VIOLATION rip 0x76d6 info 184 0
  kvm_page_fault:       address f76d6 error_code 184
  kvm_entry:            vcpu 0
  kvm_exit:             reason EXTERNAL_INTERRUPT rip 0x76d6 info 0 800000f6
  kvm_entry:            vcpu 0
  kvm_exit:             reason PENDING_INTERRUPT rip 0xd331 info 0 0
  kvm_inj_virq:         irq 8
  kvm_entry:            vcpu 0
  kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 800000f6
  kvm_entry:            vcpu 0
  kvm_exit:             reason EPT_VIOLATION rip 0xfea5 info 184 0
  kvm_page_fault:       address ffea5 error_code 184
  kvm_entry:            vcpu 0
  kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 800000f6
  kvm_entry:            vcpu 0
  kvm_exit:             reason EPT_VIOLATION rip 0xe990 info 184 0
  kvm_page_fault:       address fe990 error_code 184
  kvm_entry:            vcpu 0
  kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xe990 info 0 800000f6
  kvm_entry:            vcpu 0
  kvm_exit:             reason EXCEPTION_NMI rip 0xd334 info 0 80000b0d
  kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)

> Trace without APICv (three reboots, just to make sure to hit the
> problematic condition of supposed DF, as it still have not one hundred
> percent reproducibility):
> http://xdel.ru/downloads/kvm-e5v2-issue/apic-off.dat.gz

The trace here contains a well matching excerpt, just instead of the
EXCEPTION_NMI, it does

 169.905098: kvm_exit:             reason EPT_VIOLATION rip 0xd334 info 181 0
 169.905102: kvm_page_fault:       address feffd066 error_code 181

and works.  Page fault says we tried to read 0xfeffd066 -- probably IOPB
of TSS.  (I guess it is pre-fetch for following IO instruction.)

Nothing strikes me when looking at it, but some APICv boots don't fail,
so it would be interesting to compare them ... hosts's 0xf6 interrupt
(IRQ_WORK_VECTOR) is a possible source of races.  (We could look more
closely.  It is fired too often for my liking as well.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář March 30, 2015, 7:28 p.m. UTC | #16
2015-03-27 14:54+0300, Andrey Korolyov:
> Trace with new bits:

Thanks.

> KVM internal error. Suberror: 2
> extra data[0]: 800000ef
> extra data[1]: 80000b0d
> extra data[2]: 77b

The #GP code looks formatted as documented under INT in SDM,
  (vector << 3) | 2 | ext
where 'ext' stands for 'external' (as opposed to software).

  0x77b == (0xef << 3) | 2 | 1

It was 0xef and wasn't triggered by an INT instruction.
The weird part is that it looks like a protected mode error, but CR0
says we are in real mode.

(If CPU interpreted the vector in protected mode, then it would violate
 the IDT limit and throw a #GP ...
 It's too late for coffee today, so I'll try to lure some ideas later.)

> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006d24
> EIP=0000d331 EFL=00010202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 000f0000 0000ffff 00009b00
> SS =0000 00000000 0000ffff 00009300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT=     000f6cb0 00000037
> IDT=     00000000 000003ff
> CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb <cd>
> 19 cb cd 1c cb cd 4a cb fa fc 66 ba 47 d3 0f 00 e9 ad fe f3 90 f0 0f
> ba 2d d4 fe fb 3f
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrey Korolyov March 30, 2015, 7:32 p.m. UTC | #17
On Mon, Mar 30, 2015 at 9:56 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
> 2015-03-27 13:16+0300, Andrey Korolyov:
>> On Fri, Mar 27, 2015 at 12:03 AM, Bandan Das <bsd@redhat.com> wrote:
>> > Radim Kr?má? <rkrcmar@redhat.com> writes:
>> >> I second Bandan -- checking that it reproduces on other machine would be
>> >> great for sanity :)  (Although a bug in our APICv is far more likely.)
>> >
>> > If it's APICv related, a run without apicv enabled could give more hints.
>> >
>> > Your "devices not getting reset" hypothesis makes the most sense to me,
>> > maybe the timer vector in the error message is just one part of
>> > the whole story. Another misbehaving interrupt from the dark comes in at the
>> > same time and leads to a double fault.
>>
>> Default trace (APICv enabled, first reboot introduced the issue):
>> http://xdel.ru/downloads/kvm-e5v2-issue/hanged-reboot-apic-on.dat.gz
>
> The relevant part is here,
> prefixed with "qemu-system-x86-4180  [002]   697.111550:"
>
>   kvm_exit:             reason CR_ACCESS rip 0xd272 info 0 0
>   kvm_cr:               cr_write 0 = 0x10
>   kvm_mmu_get_page:     existing sp gfn 0 0/4 q0 direct --- !pge !nxe root 0 sync
>   kvm_entry:            vcpu 0
>   kvm_emulate_insn:     f0000:d275: ea 7a d2 00 f0
>   kvm_emulate_insn:     f0000:d27a: 2e 0f 01 1e f0 6c
>   kvm_emulate_insn:     f0000:d280: 31 c0
>   kvm_emulate_insn:     f0000:d282: 8e e0
>   kvm_emulate_insn:     f0000:d284: 8e e8
>   kvm_emulate_insn:     f0000:d286: 8e c0
>   kvm_emulate_insn:     f0000:d288: 8e d8
>   kvm_emulate_insn:     f0000:d28a: 8e d0
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xd28f info 0 800000f6
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason EPT_VIOLATION rip 0x8dd0 info 184 0
>   kvm_page_fault:       address f8dd0 error_code 184
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason EXTERNAL_INTERRUPT rip 0x8dd0 info 0 800000f6
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason EPT_VIOLATION rip 0x76d6 info 184 0
>   kvm_page_fault:       address f76d6 error_code 184
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason EXTERNAL_INTERRUPT rip 0x76d6 info 0 800000f6
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason PENDING_INTERRUPT rip 0xd331 info 0 0
>   kvm_inj_virq:         irq 8
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 800000f6
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason EPT_VIOLATION rip 0xfea5 info 184 0
>   kvm_page_fault:       address ffea5 error_code 184
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 800000f6
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason EPT_VIOLATION rip 0xe990 info 184 0
>   kvm_page_fault:       address fe990 error_code 184
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xe990 info 0 800000f6
>   kvm_entry:            vcpu 0
>   kvm_exit:             reason EXCEPTION_NMI rip 0xd334 info 0 80000b0d
>   kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)
>
>> Trace without APICv (three reboots, just to make sure to hit the
>> problematic condition of supposed DF, as it still have not one hundred
>> percent reproducibility):
>> http://xdel.ru/downloads/kvm-e5v2-issue/apic-off.dat.gz
>
> The trace here contains a well matching excerpt, just instead of the
> EXCEPTION_NMI, it does
>
>  169.905098: kvm_exit:             reason EPT_VIOLATION rip 0xd334 info 181 0
>  169.905102: kvm_page_fault:       address feffd066 error_code 181
>
> and works.  Page fault says we tried to read 0xfeffd066 -- probably IOPB
> of TSS.  (I guess it is pre-fetch for following IO instruction.)
>
> Nothing strikes me when looking at it, but some APICv boots don't fail,
> so it would be interesting to compare them ... hosts's 0xf6 interrupt
> (IRQ_WORK_VECTOR) is a possible source of races.  (We could look more
> closely.  It is fired too often for my liking as well.)


Thanks Radim, http://xdel.ru/downloads/kvm-e5v2-issue/no-fail-with-apicv.dat.gz

(missed right button in mailer previously)

The related bits looks the same as with enable_apicv=0 for me.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 50c675b46901..541a29476a56 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5088,9 +5088,10 @@  static int handle_exception(struct kvm_vcpu *vcpu)
 	    !(is_page_fault(intr_info) && !(error_code & PFERR_RSVD_MASK))) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_SIMUL_EX;
-		vcpu->run->internal.ndata = 2;
+		vcpu->run->internal.ndata = 3;
 		vcpu->run->internal.data[0] = vect_info;
 		vcpu->run->internal.data[1] = intr_info;
+		vcpu->run->internal.data[2] = error_code;
 		return 0;
 	}