Message ID: 20140628114431.GB4373@pd.tnic (mailing list archive)
State: New, archived
On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
> qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
> qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)a
>
> kvm injects the #PF into the guest.
>
> qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
> qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
> qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
> qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)
>
> Second #PF at the same address, and kvm injects the #DF.
>
> BUT(!), why?
>
> I probably am missing something, but WTH are we pagefaulting at a
> user address in context_switch() while doing a lockdep call, i.e.
> spin_release? We're not touching any userspace gunk there AFAICT.
>
> Is this an async pagefault or so which kvm is doing, so that the guest
> rip is actually pointing at the wrong place?
>
There is nothing in the trace that points to an async pagefault as far as I can see.

> Or something else I'm missing, most probably...
>
Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
kvm_multiple_exception() to see which two exceptions are combined into #DF.

--
			Gleb.
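A minimal sketch of the kind of instrumentation meant here, against a 3.15-era
arch/x86/kvm/x86.c (the trace_printk() call and the 255 "nothing pending" marker
are illustrative assumptions, not code from this thread):

/*
 * Sketch only: print which exception is already pending ("prev") when a new
 * one arrives, so the pair that gets merged into a #DF shows up in the trace
 * buffer. Assumes the 3.15-era signature of kvm_multiple_exception(); 255 is
 * an arbitrary "nothing pending" marker.
 */
static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
				   unsigned nr, bool has_error, u32 error_code,
				   bool reinject)
{
	u32 prev_nr = vcpu->arch.exception.pending ? vcpu->arch.exception.nr : 255;

	trace_printk("nr: %u, prev: %u, has_error: %d, error_code: 0x%x, reinj: %d\n",
		     nr, prev_nr, has_error, error_code, reinject);

	/* ... existing body unchanged ... */
}

The output format roughly matches the kvm_multiple_exception: lines that show up
in the traces further down the thread.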
On 2014-06-29 08:46, Gleb Natapov wrote:
> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
[...]
> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
> kvm_multiple_exception() to see which two exceptions are combined into #DF.
>

FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
when patch-disabling the vmport in QEMU.

Let me know if I can help with the analysis.

Jan
On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
> On 2014-06-29 08:46, Gleb Natapov wrote:
[...]
> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
> when patch-disabling the vmport in QEMU.
>
> Let me know if I can help with the analysis.
>
Bisection would be great of course. One thing that is special about
vmport that comes to mind is that it reads vcpu registers to userspace and
writes them back. IIRC "info registers" does the same. Can you see if the
problem is reproducible with vmport disabled, but doing "info registers"
in the qemu console? Although the trace does not show any exits to userspace
near the failure...

--
			Gleb.
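For reference, the register round trip described here goes through the
KVM_GET_SREGS/KVM_SET_SREGS ioctls; a minimal userspace sketch (the helper name
and vcpu_fd are made up for illustration, this is not QEMU code):

/*
 * Sketch of the kind of segment-register round trip that vmport handling and
 * "info registers" trigger via the KVM API. vcpu_fd is assumed to be an open
 * KVM vCPU file descriptor. If the kernel reports segment state here from
 * which it later re-derives CPL on KVM_SET_SREGS, any asymmetry between the
 * two directions can silently change the guest's CPL.
 */
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int sregs_roundtrip(int vcpu_fd)
{
	struct kvm_sregs sregs;

	if (ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) < 0)
		return -1;

	/* userspace inspects or prints the registers here ... */

	return ioctl(vcpu_fd, KVM_SET_SREGS, &sregs);
}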
On 2014-06-29 12:24, Gleb Natapov wrote:
> On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
[...]
> Bisection would be great of course. One thing that is special about
> vmport that comes to mind is that it reads vcpu registers to userspace and
> writes them back. IIRC "info registers" does the same. Can you see if the
> problem is reproducible with vmport disabled, but doing "info registers"
> in the qemu console? Although the trace does not show any exits to userspace
> near the failure...

Yes, info registers crashes the guest after a while as well (with
different backtrace due to different context).

Jan
On Sun, Jun 29, 2014 at 12:31:50PM +0200, Jan Kiszka wrote:
> On 2014-06-29 12:24, Gleb Natapov wrote:
[...]
> Yes, info registers crashes the guest after a while as well (with
> different backtrace due to different context).
>
Oh crap. Bisection would be most helpful. Just to be absolutely sure
that this is not a QEMU problem: does the exact same QEMU version work
with older kernels?

--
			Gleb.
On 2014-06-29 12:53, Gleb Natapov wrote:
> On Sun, Jun 29, 2014 at 12:31:50PM +0200, Jan Kiszka wrote:
[...]
>> Yes, info registers crashes the guest after a while as well (with
>> different backtrace due to different context).
>>
> Oh crap. Bisection would be most helpful. Just to be absolutely sure
> that this is not a QEMU problem: does the exact same QEMU version work
> with older kernels?

Yes, that was the case last time I tried (I'm on today's git head with
QEMU right now).

Will see what I can do regarding bisecting. That host is a bit slow
(netbook), so it may take a while. Boris will probably beat me in this.

Jan
On Sun, Jun 29, 2014 at 12:59:30PM +0200, Jan Kiszka wrote:
> Will see what I can do regarding bisecting. That host is a bit slow
> (netbook), so it may take a while. Boris will probably beat me in
> this.

Nah, I was about to instrument kvm_multiple_exception() first and am
slow anyway so... :-)

Thanks.
On 2014-06-29 13:51, Borislav Petkov wrote:
> On Sun, Jun 29, 2014 at 12:59:30PM +0200, Jan Kiszka wrote:
>> Will see what I can do regarding bisecting. That host is a bit slow
>> (netbook), so it may take a while. Boris will probably beat me in
>> this.
>
> Nah, I was about to instrument kvm_multiple_exception() first and am
> slow anyway so... :-)

OK, looks like I won ;): The issue was apparently introduced with
"KVM: x86: get CPL from SS.DPL" (ae9fedc793). Maybe we are not properly
saving or restoring this state on SVM since then.

Need a break, will look into details later.

Jan
On Sun, Jun 29, 2014 at 02:22:35PM +0200, Jan Kiszka wrote:
> OK, looks like I won ;):

I gladly let you win. :-P

> The issue was apparently introduced with "KVM: x86: get CPL from
> SS.DPL" (ae9fedc793). Maybe we are not properly saving or restoring
> this state on SVM since then.

I wonder if this change in the CPL saving would have anything to do with
the fact that we're doing a CR3 write right before we fail the pagetable
walk and end up walking a user page table. It could be unrelated though,
as in the previous dump I had a get_user right before the #DF. Hmmm.

I better go and revert that one and check whether it fixes things.

> Need a break, will look into details later.

Ok, some more info from my side, see the relevant snippet below. We're
basically not finding the pte at level 3 during the page walk for
7fff0b0f8908.

However, why we're even page walking this userspace address at that
point I have no idea.

And the CR3 write right before this happens is there, so I'm pretty much
sure by now that this is related...

qemu-system-x86-5007 [007] ...1 346.126204: vcpu_match_mmio: gva 0xffffffffff5fd0b0 gpa 0xfee000b0 Write GVA
qemu-system-x86-5007 [007] ...1 346.126204: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
qemu-system-x86-5007 [007] ...1 346.126205: kvm_apic: apic_write APIC_EOI = 0x0
qemu-system-x86-5007 [007] ...1 346.126205: kvm_eoi: apicid 0 vector 253
qemu-system-x86-5007 [007] d..2 346.126206: kvm_entry: vcpu 0
qemu-system-x86-5007 [007] d..2 346.126211: kvm_exit: reason write_cr3 rip 0xffffffff816113a0 info 8000000000000000 0
qemu-system-x86-5007 [007] ...2 346.126214: kvm_mmu_get_page: sp gen 25 gfn 7b2b1 4 pae q0 wux !nxe root 0 sync existing
qemu-system-x86-5007 [007] d..2 346.126215: kvm_entry: vcpu 0
qemu-system-x86-5007 [007] d..2 346.126216: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
qemu-system-x86-5007 [007] ...1 346.126217: kvm_page_fault: address 7fff0b0f8908 error_code 2
qemu-system-x86-5007 [007] ...1 346.126218: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
qemu-system-x86-5007 [007] ...1 346.126219: kvm_mmu_paging_element: pte 7b2b6067 level 4
qemu-system-x86-5007 [007] ...1 346.126220: kvm_mmu_paging_element: pte 0 level 3
qemu-system-x86-5007 [007] ...1 346.126220: kvm_mmu_walker_error: pferr 2 W
qemu-system-x86-5007 [007] ...1 346.126221: kvm_multiple_exception: nr: 14, prev: 255, has_error: 1, error_code: 0x2, reinj: 0
qemu-system-x86-5007 [007] ...1 346.126221: kvm_inj_exception: #PF (0x2)
qemu-system-x86-5007 [007] d..2 346.126222: kvm_entry: vcpu 0
qemu-system-x86-5007 [007] d..2 346.126223: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
qemu-system-x86-5007 [007] ...1 346.126224: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 1
qemu-system-x86-5007 [007] ...1 346.126225: kvm_page_fault: address 7fff0b0f8908 error_code 2
qemu-system-x86-5007 [007] ...1 346.126225: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 0
qemu-system-x86-5007 [007] ...1 346.126226: kvm_mmu_paging_element: pte 7b2b6067 level 4
qemu-system-x86-5007 [007] ...1 346.126227: kvm_mmu_paging_element: pte 0 level 3
qemu-system-x86-5007 [007] ...1 346.126227: kvm_mmu_walker_error: pferr 0
qemu-system-x86-5007 [007] ...1 346.126228: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
qemu-system-x86-5007 [007] ...1 346.126229: kvm_mmu_paging_element: pte 7b2b6067 level 4
qemu-system-x86-5007 [007] ...1 346.126230: kvm_mmu_paging_element: pte 0 level 3
qemu-system-x86-5007 [007] ...1 346.126230: kvm_mmu_walker_error: pferr 2 W
qemu-system-x86-5007 [007] ...1 346.126231: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 0
qemu-system-x86-5007 [007] ...1 346.126231: kvm_inj_exception: #DF (0x0)
qemu-system-x86-5007 [007] d..2 346.126232: kvm_entry: vcpu 0
qemu-system-x86-5007 [007] d..2 346.126371: kvm_exit: reason io rip 0xffffffff8131e623 info 3d40220 ffffffff8131e625
qemu-system-x86-5007 [007] ...1 346.126372: kvm_pio: pio_write at 0x3d4 size 2 count 1 val 0x130e
qemu-system-x86-5007 [007] ...1 346.126374: kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-system-x86-5007 [007] d..2 346.126383: kvm_entry: vcpu 0
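For reference, the flag bits of the two entries in that walk decode as follows
(a standalone sketch, not from the thread):

/*
 * Decode the low architectural flag bits of the PTEs seen in the walk above.
 * The level-4 entry (0x7b2b6067) is present, writable, user-accessible,
 * accessed and dirty; the level-3 entry is 0, i.e. not present, which is why
 * the walker bails out.
 */
#include <stdio.h>

static void decode_pte(unsigned long long pte, int level)
{
	printf("level %d: pte %#llx ->%s%s%s%s%s\n", level, pte,
	       (pte & 0x01) ? " present" : " not-present",
	       (pte & 0x02) ? " writable" : "",
	       (pte & 0x04) ? " user" : "",
	       (pte & 0x20) ? " accessed" : "",
	       (pte & 0x40) ? " dirty" : "");
}

int main(void)
{
	decode_pte(0x7b2b6067ULL, 4);
	decode_pte(0x0ULL, 3);
	return 0;
}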
On Sun, Jun 29, 2014 at 03:14:43PM +0200, Borislav Petkov wrote:
[...]
> I better go and revert that one and check whether it fixes things.

Please do so and let us know.

> Ok, some more info from my side, see the relevant snippet below. We're
> basically not finding the pte at level 3 during the page walk for
> 7fff0b0f8908.
>
> However, why we're even page walking this userspace address at that
> point I have no idea.
>
> And the CR3 write right before this happens is there, so I'm pretty much
> sure by now that this is related...
>
[...]
> qemu-system-x86-5007 [007] d..2 346.126216: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
> qemu-system-x86-5007 [007] ...1 346.126217: kvm_page_fault: address 7fff0b0f8908 error_code 2

VCPU faults on 7fff0b0f8908.

> qemu-system-x86-5007 [007] ...1 346.126218: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
> qemu-system-x86-5007 [007] ...1 346.126219: kvm_mmu_paging_element: pte 7b2b6067 level 4
> qemu-system-x86-5007 [007] ...1 346.126220: kvm_mmu_paging_element: pte 0 level 3
> qemu-system-x86-5007 [007] ...1 346.126220: kvm_mmu_walker_error: pferr 2 W

The address is not mapped by the page tables.

> qemu-system-x86-5007 [007] ...1 346.126221: kvm_multiple_exception: nr: 14, prev: 255, has_error: 1, error_code: 0x2, reinj: 0
> qemu-system-x86-5007 [007] ...1 346.126221: kvm_inj_exception: #PF (0x2)

KVM injects the #PF.

> qemu-system-x86-5007 [007] d..2 346.126222: kvm_entry: vcpu 0
> qemu-system-x86-5007 [007] d..2 346.126223: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
> qemu-system-x86-5007 [007] ...1 346.126224: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 1

reinj:1 means that the previous injection failed due to another #PF that
happened during the event injection itself. This may happen if the GDT or
the first instruction of the fault handler is not mapped by shadow pages,
but here it says that the new page fault is at the same address as the
previous one, as if the GDT or the #PF handler were mapped there. Strange.
Especially since the #DF is injected successfully, so the GDT should be
fine. Maybe a wrong CPL makes SVM go crazy?

> qemu-system-x86-5007 [007] ...1 346.126225: kvm_page_fault: address 7fff0b0f8908 error_code 2
> qemu-system-x86-5007 [007] ...1 346.126225: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 0
> qemu-system-x86-5007 [007] ...1 346.126226: kvm_mmu_paging_element: pte 7b2b6067 level 4
> qemu-system-x86-5007 [007] ...1 346.126227: kvm_mmu_paging_element: pte 0 level 3
> qemu-system-x86-5007 [007] ...1 346.126227: kvm_mmu_walker_error: pferr 0
> qemu-system-x86-5007 [007] ...1 346.126228: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
> qemu-system-x86-5007 [007] ...1 346.126229: kvm_mmu_paging_element: pte 7b2b6067 level 4
> qemu-system-x86-5007 [007] ...1 346.126230: kvm_mmu_paging_element: pte 0 level 3
> qemu-system-x86-5007 [007] ...1 346.126230: kvm_mmu_walker_error: pferr 2 W
> qemu-system-x86-5007 [007] ...1 346.126231: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 0

Here we are getting a #PF while delivering another #PF, which is,
rightfully, transformed into a #DF.

> qemu-system-x86-5007 [007] ...1 346.126231: kvm_inj_exception: #DF (0x0)
> qemu-system-x86-5007 [007] d..2 346.126232: kvm_entry: vcpu 0
[...]

--
			Gleb.
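The #PF-during-#PF promotion described above is the contributory/page-fault
merging rule; a standalone model of it (simplified, with the vector numbers
written out; kvm_multiple_exception() implements roughly this classification):

/*
 * Standalone model, not kernel code: a second contributory exception, or any
 * non-benign exception raised while a #PF is already pending, becomes a #DF.
 */
#include <stdio.h>

enum exc_class { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };

static enum exc_class exception_class(int vector)
{
	switch (vector) {
	case 14:			/* #PF */
		return EXCPT_PF;
	case 0: case 10: case 11:	/* #DE, #TS, #NP */
	case 12: case 13:		/* #SS, #GP */
		return EXCPT_CONTRIBUTORY;
	default:
		return EXCPT_BENIGN;
	}
}

static int merge(int prev_nr, int nr)
{
	enum exc_class class1 = exception_class(prev_nr);
	enum exc_class class2 = exception_class(nr);

	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) ||
	    (class1 == EXCPT_PF && class2 != EXCPT_BENIGN))
		return 8;		/* #DF */
	return nr;
}

int main(void)
{
	/* a #PF while a #PF is pending, as in the trace above -> vector 8 */
	printf("injected vector: %d\n", merge(14, 14));
	return 0;
}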
On Sun, Jun 29, 2014 at 03:14:43PM +0200, Borislav Petkov wrote:
> I better go and revert that one and check whether it fixes things.

Yahaaa, that was some good bisection work Jan! :-)

> 20 guest restart cycles and all is fine - it used to trigger after 5 max.

Phew, we have it right in time before the football game in 2 hrs. :-)
On Sun, Jun 29, 2014 at 04:42:47PM +0300, Gleb Natapov wrote:
> Please do so and let us know.

Yep, just did. Reverting ae9fedc793 fixes the issue.

> reinj:1 means that the previous injection failed due to another #PF that
> happened during the event injection itself. This may happen if the GDT or
> the first instruction of the fault handler is not mapped by shadow pages,
> but here it says that the new page fault is at the same address as the
> previous one, as if the GDT or the #PF handler were mapped there. Strange.
> Especially since the #DF is injected successfully, so the GDT should be
> fine. Maybe a wrong CPL makes SVM go crazy?

Well, I'm not going to even pretend to know kvm well enough to know *when*
we're saving VMCB state, but if we're saving the wrong CPL and then doing
the pagetable walk, I can very well imagine the walker getting confused.
One possible issue could be the U/S bit (bit 2) in the PTEs: a page with
U/S clear is a supervisor page and is accessible only when CPL < 3. I.e.,
CPL has an effect on the pagetable walk, and a wrong CPL level could break
it.

All a conjecture though...
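To illustrate that conjecture (a hedged sketch, not the actual KVM walker code):
the U/S check of a page-table walk flips on the CPL, so the same PTE can pass
when walked as a supervisor access and fault when walked as a user access:

/*
 * Sketch only: how the CPL feeds into the U/S permission check of a
 * page-table walk. With a wrong CPL the walk takes the wrong branch.
 */
#include <stdbool.h>
#include <stdio.h>

#define PTE_PRESENT	0x01ULL
#define PTE_WRITABLE	0x02ULL
#define PTE_USER	0x04ULL	/* U/S bit: set = user-accessible */

static bool walk_allows(unsigned long long pte, int cpl, bool write)
{
	if (!(pte & PTE_PRESENT))
		return false;
	if (cpl == 3 && !(pte & PTE_USER))
		return false;		/* user access to a supervisor page */
	if (write && !(pte & PTE_WRITABLE) && cpl == 3)
		return false;		/* ignoring CR0.WP for supervisor accesses */
	return true;
}

int main(void)
{
	unsigned long long sup_pte = PTE_PRESENT | PTE_WRITABLE;	/* U/S clear */

	/* the same entry, different outcome depending on the CPL used for the walk */
	printf("walk as CPL 3: %s\n", walk_allows(sup_pte, 3, true) ? "ok" : "fault");
	printf("walk as CPL 0: %s\n", walk_allows(sup_pte, 0, true) ? "ok" : "fault");
	return 0;
}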
On Sun, Jun 29, 2014 at 04:01:04PM +0200, Borislav Petkov wrote:
> On Sun, Jun 29, 2014 at 04:42:47PM +0300, Gleb Natapov wrote:
> > Please do so and let us know.
>
> Yep, just did. Reverting ae9fedc793 fixes the issue.
>
[...]
> Well, I'm not going to even pretend to know kvm well enough to know *when*
> we're saving VMCB state, but if we're saving the wrong CPL and then doing
> the pagetable walk, I can very well imagine the walker getting confused.
> One possible issue could be the U/S bit (bit 2) in the PTEs: a page with
> U/S clear is a supervisor page and is accessible only when CPL < 3. I.e.,
> CPL has an effect on the pagetable walk, and a wrong CPL level could break
> it.
>
> All a conjecture though...
>
Looks plausible, although it is still strange that the second #PF is at the
same address as the first one. Anyway, now we have the commit to blame.

--
			Gleb.
On 2014-06-29 16:27, Gleb Natapov wrote:
> On Sun, Jun 29, 2014 at 04:01:04PM +0200, Borislav Petkov wrote:
[...]
>> All a conjecture though...
>>
> Looks plausible, although it is still strange that the second #PF is at the
> same address as the first one. Anyway, now we have the commit to blame.

I suspect there is a gap between cause and effect. I'm tracing CPL changes
currently, and my first impression is that QEMU triggers an unwanted switch
from CPL 3 to 0 on vmport access:

qemu-system-x86-11883 [001]  7493.378630: kvm_entry: vcpu 0
qemu-system-x86-11883 [001]  7493.378631: bprint: svm_vcpu_run: entry cpl 0
qemu-system-x86-11883 [001]  7493.378636: bprint: svm_vcpu_run: exit cpl 3
qemu-system-x86-11883 [001]  7493.378637: kvm_exit: reason io rip 0x400854 info 56580241 400855
qemu-system-x86-11883 [001]  7493.378640: kvm_emulate_insn: 0:400854:ed (prot64)
qemu-system-x86-11883 [001]  7493.378642: kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-system-x86-11883 [001]  7493.378655: bprint: kvm_arch_vcpu_ioctl_get_sregs: ss.dpl 0
qemu-system-x86-11883 [001]  7493.378684: bprint: kvm_arch_vcpu_ioctl_set_sregs: ss.dpl 0
qemu-system-x86-11883 [001]  7493.378685: bprint: svm_set_segment: cpl = 0
qemu-system-x86-11883 [001]  7493.378711: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x3442554a

Yeah... do we have to manually sync save.cpl into ss.dpl on get_sregs on
AMD?

Jan
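One possible shape of such a sync (an untested sketch against a 3.15-era
arch/x86/kvm/svm.c, only a guess at this point in the thread): report save.cpl
as SS.DPL when userspace reads the segment registers, so the value survives a
GET_SREGS/SET_SREGS round trip:

/*
 * Untested sketch, not a patch from this thread: inside the segment switch in
 * svm_get_segment(). On SVM the VMCB keeps the current privilege level in
 * save.cpl, so export it as SS.DPL; otherwise the round trip through userspace
 * can feed a stale DPL back into the SS.DPL-based CPL derivation introduced by
 * ae9fedc793.
 */
case VCPU_SREG_SS:
	var->dpl = to_svm(vcpu)->vmcb->save.cpl;
	break;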
From b2e6dd5168373feb7548da5521efc40c58409567 Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp@suse.de>
Date: Fri, 27 Jun 2014 20:22:05 +0200
Subject: [PATCH] kvm, svm: Intercept #DF on AMD

Thanks Joro for the initial patch.

Originally-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kvm/svm.c   |  9 +++++++++
 arch/x86/kvm/trace.h | 15 +++++++++++++++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 25 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c5cfea..30a83f219aa5 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1101,6 +1101,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_exception_intercept(svm, PF_VECTOR);
 	set_exception_intercept(svm, UD_VECTOR);
 	set_exception_intercept(svm, MC_VECTOR);
+	set_exception_intercept(svm, DF_VECTOR);
 
 	set_intercept(svm, INTERCEPT_INTR);
 	set_intercept(svm, INTERCEPT_NMI);
@@ -1784,6 +1785,13 @@ static int ud_interception(struct vcpu_svm *svm)
 	return 1;
 }
 
+static int df_interception(struct vcpu_svm *svm)
+{
+	trace_kvm_df(kvm_rip_read(&svm->vcpu));
+
+	return 1;
+}
+
 static void svm_fpu_activate(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3324,6 +3332,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_EXCP_BASE + PF_VECTOR]	= pf_interception,
 	[SVM_EXIT_EXCP_BASE + NM_VECTOR]	= nm_interception,
 	[SVM_EXIT_EXCP_BASE + MC_VECTOR]	= mc_interception,
+	[SVM_EXIT_EXCP_BASE + DF_VECTOR]	= df_interception,
 	[SVM_EXIT_INTR]				= intr_interception,
 	[SVM_EXIT_NMI]				= nmi_interception,
 	[SVM_EXIT_SMI]				= nop_on_interception,
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 33574c95220d..8ac01d218443 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -88,6 +88,21 @@ TRACE_EVENT(kvm_hv_hypercall,
 		  __entry->outgpa)
 );
 
+TRACE_EVENT(kvm_df,
+	TP_PROTO(unsigned long rip),
+	TP_ARGS(rip),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	rip	)
+	),
+
+	TP_fast_assign(
+		__entry->rip = rip;
+	),
+
+	TP_printk("rip: 0x%lx", __entry->rip)
+);
+
 /*
  * Tracepoint for PIO.
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a02578c0d..9e6056dcdaea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7576,3 +7576,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_df);
-- 
2.0.0