Message ID | 20150326155807.GA13271@potion.brq.redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Mar 26, 2015 at 04:58:07PM +0100, Radim Kr?má? wrote: > 2015-03-25 20:05-0400, Kevin O'Connor: > > On Thu, Mar 26, 2015 at 02:35:58AM +0300, Andrey Korolyov wrote: > > > Thanks, strangely the reboot is always failing now and always reaching > > > seabios greeting. May be prints straightened up a race (e.g. it is not > > > int19 problem really). > > > > > > object file part: > > > > > > 0000d331 <irq_trampoline_0x19>: > > > irq_trampoline_0x19(): > > > /root/seabios-1.8.1/src/romlayout.S:195 > > > d331: cd 19 int $0x19 > > > d333: cb lretw > > > > [...] > > > Jump to int19 (vector=f000e6f2) > > > > Thanks. So, it dies on the "int $0x19" instruction itself. The > > vector looks correct and I don't see anything in the cpu register > > state that looks wrong. Maybe one of the kvm developers will have an > > idea what could cause a fault there. > > The place agrees with the "<cd> 19 cb" part of KVM error output. > Suberror 2 means that we were interrupted while delivering a vector, > here it is disected: (delivering 'vect_info') > > vect_info (extra data[0]: 800000ef) > - vector 0xef > - INTR_TYPE_EXT_INTR (0x000) > - no error code (0x000) > - valid (0x80000000) > > intr_info (extra data[1]: 80000b0d) > - #GP (0x0d) > - INTR_TYPE_HARD_EXCEPTION (0x300) > - error code on stack (0x800) [Hunk at the bottom exposes it.] > - valid (0x80000000) Thanks for the background info. > Notice the 0xef. My best hypothesis so far is that we fail at resetting > devices, and 0xef is LOCAL_TIMER_VECTOR from Linux before we rebooted. > (The bug happens at the first place that enables interrupts.) FYI, the "int $0x19" isn't the first place SeaBIOS will enable interrupts. Each screen print (every character in the seabios banner and uuid string) will call the vga bios (int $0x10) with irqs enabled (see output.c:screenc). Also, SeaBIOS loads a default vector (f000:ff53) at 0xef which does a simple "iretw". Things that are unusual about the "int $0x19" call: - it is likely the first place that the cpu is transitioned into 16bit real mode as opposed to "big real" mode. (That is, the first place interrupts are enabled with the segment limits set to 0xffff.) - it's right after the fw/shadow.c:make_bios_readonly() call, which attempts to configures the memory at 0xf0000-0x100000 as read-only. That code also issues a wbinvd() call. I'm not sure if the crash always happens at the "int $0x19" location though. Andrey, does the crash always happen with "EIP=d331" and/or with "Code=... <cd> 19"? -Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor <kevin@koconnor.net> wrote: > On Thu, Mar 26, 2015 at 04:58:07PM +0100, Radim Kr?má? wrote: >> 2015-03-25 20:05-0400, Kevin O'Connor: >> > On Thu, Mar 26, 2015 at 02:35:58AM +0300, Andrey Korolyov wrote: >> > > Thanks, strangely the reboot is always failing now and always reaching >> > > seabios greeting. May be prints straightened up a race (e.g. it is not >> > > int19 problem really). >> > > >> > > object file part: >> > > >> > > 0000d331 <irq_trampoline_0x19>: >> > > irq_trampoline_0x19(): >> > > /root/seabios-1.8.1/src/romlayout.S:195 >> > > d331: cd 19 int $0x19 >> > > d333: cb lretw >> > >> > [...] >> > > Jump to int19 (vector=f000e6f2) >> > >> > Thanks. So, it dies on the "int $0x19" instruction itself. The >> > vector looks correct and I don't see anything in the cpu register >> > state that looks wrong. Maybe one of the kvm developers will have an >> > idea what could cause a fault there. >> >> The place agrees with the "<cd> 19 cb" part of KVM error output. >> Suberror 2 means that we were interrupted while delivering a vector, >> here it is disected: (delivering 'vect_info') >> >> vect_info (extra data[0]: 800000ef) >> - vector 0xef >> - INTR_TYPE_EXT_INTR (0x000) >> - no error code (0x000) >> - valid (0x80000000) >> >> intr_info (extra data[1]: 80000b0d) >> - #GP (0x0d) >> - INTR_TYPE_HARD_EXCEPTION (0x300) >> - error code on stack (0x800) [Hunk at the bottom exposes it.] >> - valid (0x80000000) > > Thanks for the background info. > >> Notice the 0xef. My best hypothesis so far is that we fail at resetting >> devices, and 0xef is LOCAL_TIMER_VECTOR from Linux before we rebooted. >> (The bug happens at the first place that enables interrupts.) > > FYI, the "int $0x19" isn't the first place SeaBIOS will enable > interrupts. Each screen print (every character in the seabios banner > and uuid string) will call the vga bios (int $0x10) with irqs enabled > (see output.c:screenc). > > Also, SeaBIOS loads a default vector (f000:ff53) at 0xef which does a > simple "iretw". > > Things that are unusual about the "int $0x19" call: > - it is likely the first place that the cpu is transitioned into > 16bit real mode as opposed to "big real" mode. (That is, the > first place interrupts are enabled with the segment limits set to > 0xffff.) > - it's right after the fw/shadow.c:make_bios_readonly() call, which > attempts to configures the memory at 0xf0000-0x100000 as > read-only. That code also issues a wbinvd() call. > > I'm not sure if the crash always happens at the "int $0x19" location > though. Andrey, does the crash always happen with "EIP=d331" and/or > with "Code=... <cd> 19"? > > -Kevin There are also rare occurences for d3f9 (in the middle of ep) and d334 ep (less than one tenth of events for both). I`ll post a sample event capture with and without Radim`s proposed patch maybe today or tomorrow. /root/seabios-1.8.1/src/romlayout.S:289 d3eb: 66 50 pushl %eax d3ed: 66 51 pushl %ecx d3ef: 66 52 pushl %edx d3f1: 66 53 pushl %ebx d3f3: 66 55 pushl %ebp d3f5: 66 56 pushl %esi d3f7: 66 57 pushl %edi d3f9: 06 pushw %es d3fa: 1e pushw %ds 0000d334 <irq_trampoline_0x1c>: irq_trampoline_0x1c(): /root/seabios-1.8.1/src/romlayout.S:196 d334: cd 1c int $0x1c d336: cb lretw -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 26, 2015 at 07:48:09PM +0300, Andrey Korolyov wrote: > On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor <kevin@koconnor.net> wrote: > > I'm not sure if the crash always happens at the "int $0x19" location > > though. Andrey, does the crash always happen with "EIP=d331" and/or > > with "Code=... <cd> 19"? > > There are also rare occurences for d3f9 (in the middle of ep) and d334 > ep (less than one tenth of events for both). I`ll post a sample event > capture with and without Radim`s proposed patch maybe today or > tomorrow. > > /root/seabios-1.8.1/src/romlayout.S:289 > d3eb: 66 50 pushl %eax > d3ed: 66 51 pushl %ecx > d3ef: 66 52 pushl %edx > d3f1: 66 53 pushl %ebx > d3f3: 66 55 pushl %ebp > d3f5: 66 56 pushl %esi > d3f7: 66 57 pushl %edi > d3f9: 06 pushw %es > d3fa: 1e pushw %ds > > 0000d334 <irq_trampoline_0x1c>: > irq_trampoline_0x1c(): > /root/seabios-1.8.1/src/romlayout.S:196 > d334: cd 1c int $0x1c > d336: cb lretw Thanks. The d334 looks very similar to the d331 report (code=<cd> 1c). That path could happen during post (big real mode) or immiediately after post (real mode). The d3f9 report does not look like the others - interrupts are disabled there. If you still have the error logs, can you post the full kvm crash report for d3f9? -Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 26, 2015 at 8:06 PM, Kevin O'Connor <kevin@koconnor.net> wrote: > On Thu, Mar 26, 2015 at 07:48:09PM +0300, Andrey Korolyov wrote: >> On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor <kevin@koconnor.net> wrote: >> > I'm not sure if the crash always happens at the "int $0x19" location >> > though. Andrey, does the crash always happen with "EIP=d331" and/or >> > with "Code=... <cd> 19"? >> >> There are also rare occurences for d3f9 (in the middle of ep) and d334 >> ep (less than one tenth of events for both). I`ll post a sample event >> capture with and without Radim`s proposed patch maybe today or >> tomorrow. >> >> /root/seabios-1.8.1/src/romlayout.S:289 >> d3eb: 66 50 pushl %eax >> d3ed: 66 51 pushl %ecx >> d3ef: 66 52 pushl %edx >> d3f1: 66 53 pushl %ebx >> d3f3: 66 55 pushl %ebp >> d3f5: 66 56 pushl %esi >> d3f7: 66 57 pushl %edi >> d3f9: 06 pushw %es >> d3fa: 1e pushw %ds >> >> 0000d334 <irq_trampoline_0x1c>: >> irq_trampoline_0x1c(): >> /root/seabios-1.8.1/src/romlayout.S:196 >> d334: cd 1c int $0x1c >> d336: cb lretw > > Thanks. The d334 looks very similar to the d331 report (code=<cd> > 1c). That path could happen during post (big real mode) or > immiediately after post (real mode). > > The d3f9 report does not look like the others - interrupts are > disabled there. If you still have the error logs, can you post the > full kvm crash report for d3f9? > Here you go: KVM internal error. Suberror: 2 extra data[0]: 800000ef extra data[1]: 80000b0d EAX=00000003 EBX=00000000 ECX=00000000 EDX=00000000 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006cd4 EIP=0000d3f9 EFL=00010202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =f000 000f0000 0000ffff 00009b00 SS =0000 00000000 0000ffff 00009300 DS =0000 00000000 0000ffff 00009300 FS =0000 00000000 0000ffff 00009300 GS =0000 00000000 0000ffff 00009300 LDT=0000 00000000 0000ffff 00008200 TR =0000 00000000 0000ffff 00008b00 GDT= 000f6e98 00000037 IDT= 00000000 000003ff CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb <cd> 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66 b8 00 e0 00 00 8e -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 26, 2015 at 08:08:52PM +0300, Andrey Korolyov wrote: > On Thu, Mar 26, 2015 at 8:06 PM, Kevin O'Connor <kevin@koconnor.net> wrote: > > On Thu, Mar 26, 2015 at 07:48:09PM +0300, Andrey Korolyov wrote: > >> On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor <kevin@koconnor.net> wrote: > >> > I'm not sure if the crash always happens at the "int $0x19" location > >> > though. Andrey, does the crash always happen with "EIP=d331" and/or > >> > with "Code=... <cd> 19"? > >> > >> There are also rare occurences for d3f9 (in the middle of ep) and d334 > >> ep (less than one tenth of events for both). I`ll post a sample event > >> capture with and without Radim`s proposed patch maybe today or > >> tomorrow. > >> > >> /root/seabios-1.8.1/src/romlayout.S:289 > >> d3eb: 66 50 pushl %eax > >> d3ed: 66 51 pushl %ecx > >> d3ef: 66 52 pushl %edx > >> d3f1: 66 53 pushl %ebx > >> d3f3: 66 55 pushl %ebp > >> d3f5: 66 56 pushl %esi > >> d3f7: 66 57 pushl %edi > >> d3f9: 06 pushw %es > >> d3fa: 1e pushw %ds > >> > >> 0000d334 <irq_trampoline_0x1c>: > >> irq_trampoline_0x1c(): > >> /root/seabios-1.8.1/src/romlayout.S:196 > >> d334: cd 1c int $0x1c > >> d336: cb lretw > > > > Thanks. The d334 looks very similar to the d331 report (code=<cd> > > 1c). That path could happen during post (big real mode) or > > immiediately after post (real mode). > > > > The d3f9 report does not look like the others - interrupts are > > disabled there. If you still have the error logs, can you post the > > full kvm crash report for d3f9? > > > > Here you go: Thanks. While we're at, can you verify if all your reports are showing the cpu in "real mode". That is, do they all have "0000ffff" in the third column of the segment registers - as in: > ES =0000 00000000 0000ffff 00009300 [...] > Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb <cd> > 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66 > b8 00 e0 00 00 8e KVM reports the code as "int $0x10" here. Was it possible this report was from a different build of seabios (that had a different code layout)? Interestingly, this "int $0x10" is also in real-mode and not "big real mode", so I think it would have occurred after post completed. -Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 26, 2015 at 8:18 PM, Kevin O'Connor <kevin@koconnor.net> wrote: > On Thu, Mar 26, 2015 at 08:08:52PM +0300, Andrey Korolyov wrote: >> On Thu, Mar 26, 2015 at 8:06 PM, Kevin O'Connor <kevin@koconnor.net> wrote: >> > On Thu, Mar 26, 2015 at 07:48:09PM +0300, Andrey Korolyov wrote: >> >> On Thu, Mar 26, 2015 at 7:36 PM, Kevin O'Connor <kevin@koconnor.net> wrote: >> >> > I'm not sure if the crash always happens at the "int $0x19" location >> >> > though. Andrey, does the crash always happen with "EIP=d331" and/or >> >> > with "Code=... <cd> 19"? >> >> >> >> There are also rare occurences for d3f9 (in the middle of ep) and d334 >> >> ep (less than one tenth of events for both). I`ll post a sample event >> >> capture with and without Radim`s proposed patch maybe today or >> >> tomorrow. >> >> >> >> /root/seabios-1.8.1/src/romlayout.S:289 >> >> d3eb: 66 50 pushl %eax >> >> d3ed: 66 51 pushl %ecx >> >> d3ef: 66 52 pushl %edx >> >> d3f1: 66 53 pushl %ebx >> >> d3f3: 66 55 pushl %ebp >> >> d3f5: 66 56 pushl %esi >> >> d3f7: 66 57 pushl %edi >> >> d3f9: 06 pushw %es >> >> d3fa: 1e pushw %ds >> >> >> >> 0000d334 <irq_trampoline_0x1c>: >> >> irq_trampoline_0x1c(): >> >> /root/seabios-1.8.1/src/romlayout.S:196 >> >> d334: cd 1c int $0x1c >> >> d336: cb lretw >> > >> > Thanks. The d334 looks very similar to the d331 report (code=<cd> >> > 1c). That path could happen during post (big real mode) or >> > immiediately after post (real mode). >> > >> > The d3f9 report does not look like the others - interrupts are >> > disabled there. If you still have the error logs, can you post the >> > full kvm crash report for d3f9? >> > >> >> Here you go: > > Thanks. While we're at, can you verify if all your reports are > showing the cpu in "real mode". That is, do they all have "0000ffff" > in the third column of the segment registers - as in: > >> ES =0000 00000000 0000ffff 00009300 > That`s positive. > [...] >> Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb <cd> >> 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66 >> b8 00 e0 00 00 8e > > KVM reports the code as "int $0x10" here. Was it possible this report > was from a different build of seabios (that had a different code > layout)? > Yep, sorry, I`ve mixed in logs just from before transition out of 1.7.5. > Interestingly, this "int $0x10" is also in real-mode and not "big real > mode", so I think it would have occurred after post completed. > > -Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2015-03-26 12:36-0400, Kevin O'Connor: > On Thu, Mar 26, 2015 at 04:58:07PM +0100, Radim Kr?má? wrote: > > Notice the 0xef. My best hypothesis so far is that we fail at resetting > > devices, and 0xef is LOCAL_TIMER_VECTOR from Linux before we rebooted. > > (The bug happens at the first place that enables interrupts.) > > FYI, the "int $0x19" isn't the first place SeaBIOS will enable > interrupts. Each screen print (every character in the seabios banner > and uuid string) will call the vga bios (int $0x10) with irqs enabled > (see output.c:screenc). Most useful, thank you. So interrupt can't be "forgotten" there on reboot ... it's possible that a pending timer injects it later. (I'd like to grasp the reason behind 0xef first.) > Also, SeaBIOS loads a default vector (f000:ff53) at 0xef which does a > simple "iretw". The #GP error code could help a bit here. > Things that are unusual about the "int $0x19" call: > - it is likely the first place that the cpu is transitioned into > 16bit real mode as opposed to "big real" mode. (That is, the > first place interrupts are enabled with the segment limits set to > 0xffff.) > - it's right after the fw/shadow.c:make_bios_readonly() call, which > attempts to configures the memory at 0xf0000-0x100000 as > read-only. That code also issues a wbinvd() call. (I'll wait for the trace before doing more wild guesses ...) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2015-03-26 19:48+0300, Andrey Korolyov: > I`ll post a sample event > capture with and without Radim`s proposed patch maybe today or > tomorrow. Thanks. The patch doesn't change runtime behavior, it just adds another data field to the error report, so there is no need to test both cases. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2015-03-26 20:08+0300, Andrey Korolyov: > KVM internal error. Suberror: 2 > extra data[0]: 800000ef > extra data[1]: 80000b0d Btw. does this part ever change? I see that first report had: KVM internal error. Suberror: 2 extra data[0]: 800000d1 extra data[1]: 80000b0d Was that a Windows guest by any chance? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 26, 2015 at 8:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote: > 2015-03-26 20:08+0300, Andrey Korolyov: >> KVM internal error. Suberror: 2 >> extra data[0]: 800000ef >> extra data[1]: 80000b0d > > Btw. does this part ever change? > > I see that first report had: > > KVM internal error. Suberror: 2 > extra data[0]: 800000d1 > extra data[1]: 80000b0d > > Was that a Windows guest by any chance? Yes, exactly, different extra data output was from a Windows VMs. Thanks for clarifying things for your patch, I hadn`t looked at the vmx code yet and thought that it changing things. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2015-03-26 21:24+0300, Andrey Korolyov: > On Thu, Mar 26, 2015 at 8:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote: > > 2015-03-26 20:08+0300, Andrey Korolyov: > >> KVM internal error. Suberror: 2 > >> extra data[0]: 800000ef > >> extra data[1]: 80000b0d > > > > Btw. does this part ever change? > > > > I see that first report had: > > > > KVM internal error. Suberror: 2 > > extra data[0]: 800000d1 > > extra data[1]: 80000b0d > > > > Was that a Windows guest by any chance? > > Yes, exactly, different extra data output was from a Windows VMs. Windows uses vector 0xd1 for timer interrupts. I second Bandan -- checking that it reproduces on other machine would be great for sanity :) (Although a bug in our APICv is far more likely.) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Radim Kr?má? <rkrcmar@redhat.com> writes: > 2015-03-26 21:24+0300, Andrey Korolyov: >> On Thu, Mar 26, 2015 at 8:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote: >> > 2015-03-26 20:08+0300, Andrey Korolyov: >> >> KVM internal error. Suberror: 2 >> >> extra data[0]: 800000ef >> >> extra data[1]: 80000b0d >> > >> > Btw. does this part ever change? >> > >> > I see that first report had: >> > >> > KVM internal error. Suberror: 2 >> > extra data[0]: 800000d1 >> > extra data[1]: 80000b0d >> > >> > Was that a Windows guest by any chance? >> >> Yes, exactly, different extra data output was from a Windows VMs. > > Windows uses vector 0xd1 for timer interrupts. > I second Bandan -- checking that it reproduces on other machine would be > great for sanity :) (Although a bug in our APICv is far more likely.) If it's APICv related, a run without apicv enabled could give more hints. Your "devices not getting reset" hypothesis makes the most sense to me, maybe the timer vector in the error message is just one part of the whole story. Another misbehaving interrupt from the dark comes in at the same time and leads to a double fault. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 27, 2015 at 12:03 AM, Bandan Das <bsd@redhat.com> wrote: > Radim Kr?má? <rkrcmar@redhat.com> writes: > >> 2015-03-26 21:24+0300, Andrey Korolyov: >>> On Thu, Mar 26, 2015 at 8:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote: >>> > 2015-03-26 20:08+0300, Andrey Korolyov: >>> >> KVM internal error. Suberror: 2 >>> >> extra data[0]: 800000ef >>> >> extra data[1]: 80000b0d >>> > >>> > Btw. does this part ever change? >>> > >>> > I see that first report had: >>> > >>> > KVM internal error. Suberror: 2 >>> > extra data[0]: 800000d1 >>> > extra data[1]: 80000b0d >>> > >>> > Was that a Windows guest by any chance? >>> >>> Yes, exactly, different extra data output was from a Windows VMs. >> >> Windows uses vector 0xd1 for timer interrupts. > >> I second Bandan -- checking that it reproduces on other machine would be >> great for sanity :) (Although a bug in our APICv is far more likely.) > > If it's APICv related, a run without apicv enabled could give more hints. > > Your "devices not getting reset" hypothesis makes the most sense to me, > maybe the timer vector in the error message is just one part of > the whole story. Another misbehaving interrupt from the dark comes in at the > same time and leads to a double fault. Default trace (APICv enabled, first reboot introduced the issue): http://xdel.ru/downloads/kvm-e5v2-issue/hanged-reboot-apic-on.dat.gz Trace without APICv (three reboots, just to make sure to hit the problematic condition of supposed DF, as it still have not one hundred percent reproducibility): http://xdel.ru/downloads/kvm-e5v2-issue/apic-off.dat.gz It would be great of course to reproduce this somewhere else, otherwise all this thread may end in fixing a bug which exists only at my particular platform. Right now I have no hardware except a lot of well-known (in terms of existing issues) Supermicro boards of one model. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 26, 2015 at 11:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote: > 2015-03-26 21:24+0300, Andrey Korolyov: >> On Thu, Mar 26, 2015 at 8:40 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote: >> > 2015-03-26 20:08+0300, Andrey Korolyov: >> >> KVM internal error. Suberror: 2 >> >> extra data[0]: 800000ef >> >> extra data[1]: 80000b0d >> > >> > Btw. does this part ever change? >> > >> > I see that first report had: >> > >> > KVM internal error. Suberror: 2 >> > extra data[0]: 800000d1 >> > extra data[1]: 80000b0d >> > >> > Was that a Windows guest by any chance? >> >> Yes, exactly, different extra data output was from a Windows VMs. > > Windows uses vector 0xd1 for timer interrupts. > > I second Bandan -- checking that it reproduces on other machine would be > great for sanity :) (Although a bug in our APICv is far more likely.) Trace with new bits: KVM internal error. Suberror: 2 extra data[0]: 800000ef extra data[1]: 80000b0d extra data[2]: 77b EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006d24 EIP=0000d331 EFL=00010202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =f000 000f0000 0000ffff 00009b00 SS =0000 00000000 0000ffff 00009300 DS =0000 00000000 0000ffff 00009300 FS =0000 00000000 0000ffff 00009300 GS =0000 00000000 0000ffff 00009300 LDT=0000 00000000 0000ffff 00008200 TR =0000 00000000 0000ffff 00008b00 GDT= 000f6cb0 00000037 IDT= 00000000 000003ff CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 Code=66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb <cd> 19 cb cd 1c cb cd 4a cb fa fc 66 ba 47 d3 0f 00 e9 ad fe f3 90 f0 0f ba 2d d4 fe fb 3f -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2015-03-27 13:16+0300, Andrey Korolyov: > On Fri, Mar 27, 2015 at 12:03 AM, Bandan Das <bsd@redhat.com> wrote: > > Radim Kr?má? <rkrcmar@redhat.com> writes: > >> I second Bandan -- checking that it reproduces on other machine would be > >> great for sanity :) (Although a bug in our APICv is far more likely.) > > > > If it's APICv related, a run without apicv enabled could give more hints. > > > > Your "devices not getting reset" hypothesis makes the most sense to me, > > maybe the timer vector in the error message is just one part of > > the whole story. Another misbehaving interrupt from the dark comes in at the > > same time and leads to a double fault. > > Default trace (APICv enabled, first reboot introduced the issue): > http://xdel.ru/downloads/kvm-e5v2-issue/hanged-reboot-apic-on.dat.gz The relevant part is here, prefixed with "qemu-system-x86-4180 [002] 697.111550:" kvm_exit: reason CR_ACCESS rip 0xd272 info 0 0 kvm_cr: cr_write 0 = 0x10 kvm_mmu_get_page: existing sp gfn 0 0/4 q0 direct --- !pge !nxe root 0 sync kvm_entry: vcpu 0 kvm_emulate_insn: f0000:d275: ea 7a d2 00 f0 kvm_emulate_insn: f0000:d27a: 2e 0f 01 1e f0 6c kvm_emulate_insn: f0000:d280: 31 c0 kvm_emulate_insn: f0000:d282: 8e e0 kvm_emulate_insn: f0000:d284: 8e e8 kvm_emulate_insn: f0000:d286: 8e c0 kvm_emulate_insn: f0000:d288: 8e d8 kvm_emulate_insn: f0000:d28a: 8e d0 kvm_entry: vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xd28f info 0 800000f6 kvm_entry: vcpu 0 kvm_exit: reason EPT_VIOLATION rip 0x8dd0 info 184 0 kvm_page_fault: address f8dd0 error_code 184 kvm_entry: vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0x8dd0 info 0 800000f6 kvm_entry: vcpu 0 kvm_exit: reason EPT_VIOLATION rip 0x76d6 info 184 0 kvm_page_fault: address f76d6 error_code 184 kvm_entry: vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0x76d6 info 0 800000f6 kvm_entry: vcpu 0 kvm_exit: reason PENDING_INTERRUPT rip 0xd331 info 0 0 kvm_inj_virq: irq 8 kvm_entry: vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 800000f6 kvm_entry: vcpu 0 kvm_exit: reason EPT_VIOLATION rip 0xfea5 info 184 0 kvm_page_fault: address ffea5 error_code 184 kvm_entry: vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 800000f6 kvm_entry: vcpu 0 kvm_exit: reason EPT_VIOLATION rip 0xe990 info 184 0 kvm_page_fault: address fe990 error_code 184 kvm_entry: vcpu 0 kvm_exit: reason EXTERNAL_INTERRUPT rip 0xe990 info 0 800000f6 kvm_entry: vcpu 0 kvm_exit: reason EXCEPTION_NMI rip 0xd334 info 0 80000b0d kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17) > Trace without APICv (three reboots, just to make sure to hit the > problematic condition of supposed DF, as it still have not one hundred > percent reproducibility): > http://xdel.ru/downloads/kvm-e5v2-issue/apic-off.dat.gz The trace here contains a well matching excerpt, just instead of the EXCEPTION_NMI, it does 169.905098: kvm_exit: reason EPT_VIOLATION rip 0xd334 info 181 0 169.905102: kvm_page_fault: address feffd066 error_code 181 and works. Page fault says we tried to read 0xfeffd066 -- probably IOPB of TSS. (I guess it is pre-fetch for following IO instruction.) Nothing strikes me when looking at it, but some APICv boots don't fail, so it would be interesting to compare them ... hosts's 0xf6 interrupt (IRQ_WORK_VECTOR) is a possible source of races. (We could look more closely. It is fired too often for my liking as well.) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2015-03-27 14:54+0300, Andrey Korolyov: > Trace with new bits: Thanks. > KVM internal error. Suberror: 2 > extra data[0]: 800000ef > extra data[1]: 80000b0d > extra data[2]: 77b The #GP code looks formatted as documented under INT in SDM, (vector << 3) | 2 | ext where 'ext' stands for 'external' (as opposed to software). 0x77b == (0xef << 3) | 2 | 1 It was 0xef and wasn't triggered by an INT instruction. The weird part is that it looks like a protected mode error, but CR0 says we are in real mode. (If CPU interpreted the vector in protected mode, then it would violate the IDT limit and throw a #GP ... It's too late for coffee today, so I'll try to lure some ideas later.) > EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000 > ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006d24 > EIP=0000d331 EFL=00010202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 > ES =0000 00000000 0000ffff 00009300 > CS =f000 000f0000 0000ffff 00009b00 > SS =0000 00000000 0000ffff 00009300 > DS =0000 00000000 0000ffff 00009300 > FS =0000 00000000 0000ffff 00009300 > GS =0000 00000000 0000ffff 00009300 > LDT=0000 00000000 0000ffff 00008200 > TR =0000 00000000 0000ffff 00008b00 > GDT= 000f6cb0 00000037 > IDT= 00000000 000003ff > CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000 > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 > DR3=0000000000000000 > DR6=00000000ffff0ff0 DR7=0000000000000400 > EFER=0000000000000000 > Code=66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb <cd> > 19 cb cd 1c cb cd 4a cb fa fc 66 ba 47 d3 0f 00 e9 ad fe f3 90 f0 0f > ba 2d d4 fe fb 3f -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 30, 2015 at 9:56 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote: > 2015-03-27 13:16+0300, Andrey Korolyov: >> On Fri, Mar 27, 2015 at 12:03 AM, Bandan Das <bsd@redhat.com> wrote: >> > Radim Kr?má? <rkrcmar@redhat.com> writes: >> >> I second Bandan -- checking that it reproduces on other machine would be >> >> great for sanity :) (Although a bug in our APICv is far more likely.) >> > >> > If it's APICv related, a run without apicv enabled could give more hints. >> > >> > Your "devices not getting reset" hypothesis makes the most sense to me, >> > maybe the timer vector in the error message is just one part of >> > the whole story. Another misbehaving interrupt from the dark comes in at the >> > same time and leads to a double fault. >> >> Default trace (APICv enabled, first reboot introduced the issue): >> http://xdel.ru/downloads/kvm-e5v2-issue/hanged-reboot-apic-on.dat.gz > > The relevant part is here, > prefixed with "qemu-system-x86-4180 [002] 697.111550:" > > kvm_exit: reason CR_ACCESS rip 0xd272 info 0 0 > kvm_cr: cr_write 0 = 0x10 > kvm_mmu_get_page: existing sp gfn 0 0/4 q0 direct --- !pge !nxe root 0 sync > kvm_entry: vcpu 0 > kvm_emulate_insn: f0000:d275: ea 7a d2 00 f0 > kvm_emulate_insn: f0000:d27a: 2e 0f 01 1e f0 6c > kvm_emulate_insn: f0000:d280: 31 c0 > kvm_emulate_insn: f0000:d282: 8e e0 > kvm_emulate_insn: f0000:d284: 8e e8 > kvm_emulate_insn: f0000:d286: 8e c0 > kvm_emulate_insn: f0000:d288: 8e d8 > kvm_emulate_insn: f0000:d28a: 8e d0 > kvm_entry: vcpu 0 > kvm_exit: reason EXTERNAL_INTERRUPT rip 0xd28f info 0 800000f6 > kvm_entry: vcpu 0 > kvm_exit: reason EPT_VIOLATION rip 0x8dd0 info 184 0 > kvm_page_fault: address f8dd0 error_code 184 > kvm_entry: vcpu 0 > kvm_exit: reason EXTERNAL_INTERRUPT rip 0x8dd0 info 0 800000f6 > kvm_entry: vcpu 0 > kvm_exit: reason EPT_VIOLATION rip 0x76d6 info 184 0 > kvm_page_fault: address f76d6 error_code 184 > kvm_entry: vcpu 0 > kvm_exit: reason EXTERNAL_INTERRUPT rip 0x76d6 info 0 800000f6 > kvm_entry: vcpu 0 > kvm_exit: reason PENDING_INTERRUPT rip 0xd331 info 0 0 > kvm_inj_virq: irq 8 > kvm_entry: vcpu 0 > kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 800000f6 > kvm_entry: vcpu 0 > kvm_exit: reason EPT_VIOLATION rip 0xfea5 info 184 0 > kvm_page_fault: address ffea5 error_code 184 > kvm_entry: vcpu 0 > kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 800000f6 > kvm_entry: vcpu 0 > kvm_exit: reason EPT_VIOLATION rip 0xe990 info 184 0 > kvm_page_fault: address fe990 error_code 184 > kvm_entry: vcpu 0 > kvm_exit: reason EXTERNAL_INTERRUPT rip 0xe990 info 0 800000f6 > kvm_entry: vcpu 0 > kvm_exit: reason EXCEPTION_NMI rip 0xd334 info 0 80000b0d > kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17) > >> Trace without APICv (three reboots, just to make sure to hit the >> problematic condition of supposed DF, as it still have not one hundred >> percent reproducibility): >> http://xdel.ru/downloads/kvm-e5v2-issue/apic-off.dat.gz > > The trace here contains a well matching excerpt, just instead of the > EXCEPTION_NMI, it does > > 169.905098: kvm_exit: reason EPT_VIOLATION rip 0xd334 info 181 0 > 169.905102: kvm_page_fault: address feffd066 error_code 181 > > and works. Page fault says we tried to read 0xfeffd066 -- probably IOPB > of TSS. (I guess it is pre-fetch for following IO instruction.) > > Nothing strikes me when looking at it, but some APICv boots don't fail, > so it would be interesting to compare them ... hosts's 0xf6 interrupt > (IRQ_WORK_VECTOR) is a possible source of races. (We could look more > closely. It is fired too often for my liking as well.) Thanks Radim, http://xdel.ru/downloads/kvm-e5v2-issue/no-fail-with-apicv.dat.gz (missed right button in mailer previously) The related bits looks the same as with enable_apicv=0 for me. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 50c675b46901..541a29476a56 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5088,9 +5088,10 @@ static int handle_exception(struct kvm_vcpu *vcpu) !(is_page_fault(intr_info) && !(error_code & PFERR_RSVD_MASK))) { vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_SIMUL_EX; - vcpu->run->internal.ndata = 2; + vcpu->run->internal.ndata = 3; vcpu->run->internal.data[0] = vect_info; vcpu->run->internal.data[1] = intr_info; + vcpu->run->internal.data[2] = error_code; return 0; }