
BUG at mmu.c:615 from localhost migration using ept+hugetlbfs

Message ID 20090609164036.GA10828@amt.cnet (mailing list archive)
State New, archived

Commit Message

Marcelo Tosatti June 9, 2009, 4:40 p.m. UTC
Ryan,

On Fri, May 29, 2009 at 11:43:26AM -0500, Ryan Harper wrote:
> Testing latest qemu-kvm.git and kvm-kmod.git, ept enabled and backing
> guests with large pages trips a BUG in the mmu code.  If I disable ept,
> but still use large pages, migration succeeds.  Reproduce with:
> 
> hugetlbfs setup:
> % mkdir -p /hugetlbfs && mount -t hugetlbfs hugetlbfs /hugetlbfs
> % echo 10000 > /proc/sys/vm/nr_hugepages
> 
> qemu commands:
> 
> guest a:
> sudo x86_64-softmmu/qemu-system-x86_64 -L pc-bios -m 2048 -mempath /hugetlbfs -net nic -net tap -vnc :12 -monitor stdio -hda /scratch/images/rharper/rhel4u8-32-ide.raw
> 
> guest b:
> sudo x86_64-softmmu/qemu-system-x86_64 -L pc-bios -m 2048 -mempath /hugetlbfs -net nic -net tap -vnc :13 -monitor stdio -hda /scratch/images/rharper/rhel4u8-32-ide.raw -incoming tcp:0:4444
> 
> Once guest a was up, I issued the migrate command:
> (qemu) migrate -d tcp:localhost:4444
> 
> rmap_remove: ffff880a08e00098 c0336e65c0336e5b 0->BUG
				^^^^^^^^^^^^^^^^

This value looks very strange (for one, bits 5:3 contain an invalid value).
I don't have access to the hardware at the moment, so it would be great if you
had time to make a change equivalent to this and reproduce:


Avi, any hints?
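The remark about bits 5:3 can be checked by hand. Assuming the value is an EPT leaf entry (rmap_remove() only handles leaf sptes), the Intel SDM defines bits 5:3 as the memory type, and only 0 (UC), 1 (WC), 4 (WT), 5 (WP) and 6 (WB) are valid. The low byte 0x5b is 0101 1011 in binary, so bits 5:3 are 011, i.e. type 3, which is reserved. The helpers below are hypothetical names for illustration, not KVM functions; this is a sketch of the decoding only:

```c
#include <assert.h>
#include <stdint.h>

/* Bits 5:3 of an EPT leaf entry hold the memory type (assumed EPT layout). */
#define EPT_MT_SHIFT 3
#define EPT_MT_MASK  (7ULL << EPT_MT_SHIFT)

/* Hypothetical helper: extract the 3-bit memory-type field. */
static unsigned ept_memtype(uint64_t spte)
{
	return (unsigned)((spte & EPT_MT_MASK) >> EPT_MT_SHIFT);
}

/* Hypothetical helper: UC(0), WC(1), WT(4), WP(5), WB(6) are defined; 2, 3, 7 are reserved. */
static int ept_memtype_valid(unsigned mt)
{
	assert(mt < 8);
	switch (mt) {
	case 0: case 1: case 4: case 5: case 6:
		return 1;
	default:
		return 0;
	}
}
```

For example, ept_memtype(0xc0336e65c0336e5bULL) returns 3, and ept_memtype_valid(3) returns 0.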

> ------------[ cut here ]------------
> kernel BUG at /home/rharper/work/git/kvm-kmod/x86/mmu.c:615!
> invalid opcode: 0000 [1] SMP 
> last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
> CPU 6 
> Modules linked in: kvm_intel(N) kvm(N) tun nfs lockd nfs_acl sunrpc ipv6 bridge stp cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop sr_mod cdrom dm_mod sg rtc_cmos thermal cdc_ether i2c_i801 rtc_core usbnet usb_storage shpchp i2c_core rtc_lib processor bnx2 pcspkr button pci_hotplug mii mptctl joydev usbhid hid ff_memless uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd fan thermal_sys hwmon ext3 mbcache jbd mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: kvm]
> Supported: No
> Pid: 17635, comm: qemu-system-x86 Tainted: G          2.6.27.19-5-default #1
> RIP: 0010:[<ffffffffa012d8dc>]  [<ffffffffa012d8dc>] rmap_remove+0xc9/0x19e [kvm]
> RSP: 0018:ffff880c7a1cbba8  EFLAGS: 00010296
> RAX: 0000000000000039 RBX: 00000036e65c0336 RCX: ffff880c7b405e60
> RDX: ffffffff806e0d08 RSI: 0000000000000092 RDI: ffffffff806e0d00
> RBP: ffff880a08e00098 R08: ffffffff806e0cf0 R09: 0000000100000000
> R10: 0000000000000046 R11: 000000000000000a R12: ffff880c7b066a20
> R13: ffff8806778e0000 R14: 0000000000000000 R15: 0000000000000007
> FS:  00007f298b4ad950(0000) GS:ffff880c7cd83f40(0000) knlGS:0000000000000000
> CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: 0000000000879ba0 CR3: 0000000679da8000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process qemu-system-x86 (pid: 17635, threadinfo ffff880c7a1ca000, task ffff8809ebce4880)
> Stack:  ffff88069822b888 0000000000000000 ffff8803f1000040 ffff880a08e00098
>  0000000000000001 ffffffffa012f5d3 ffff880c7a1cbc58 ffffffff8023661a
>  0000000000000000 8000000a08e000e7 ffffffff80228db7 00007f298e413fff
> Call Trace:
>  [<ffffffffa012f5d3>] mmu_set_spte+0x98/0x302 [kvm]
>  [<ffffffffa012ffa3>] __direct_map+0xee/0x1b8 [kvm]
>  [<ffffffffa013014b>] tdp_page_fault+0xde/0x114 [kvm]
>  [<ffffffffa0130f16>] kvm_mmu_page_fault+0x19/0x81 [kvm]
>  [<ffffffffa012a64b>] kvm_arch_vcpu_ioctl_run+0x89b/0xaf2 [kvm]
>  [<ffffffffa0123540>] kvm_vcpu_ioctl+0xf1/0x46b [kvm]
>  [<ffffffff802bd249>] vfs_ioctl+0x21/0x6c
>  [<ffffffff802bd4b6>] do_vfs_ioctl+0x222/0x231
>  [<ffffffff802bd516>] sys_ioctl+0x51/0x73
>  [<ffffffff8020bfbb>] system_call_fastpath+0x16/0x1b
>  [<00007f298c3c3b77>] 0x7f298c3c3b77
> 
> 
> Code: 80 00 00 00 48 8b 34 c1 e8 0c ff ff ff 49 89 c1 48 8b 00 48 85 c0 75 17 48 8b 55 00 48 89 ee 48 c7 c7 2f db 13 a0 e8 6d cc 36 e0 <0f> 0b eb fe a8 01 75 2a 48 39 c5 74 19 48 8b 55 00 48 89 ee 48 
> RIP  [<ffffffffa012d8dc>] rmap_remove+0xc9/0x19e [kvm]
>  RSP <ffff880c7a1cbba8>
> ---[ end trace 91e1d7963caa34a7 ]---
> 
> hugepage info:
> HugePages_Total: 10000
> HugePages_Free:   7944
> HugePages_Rsvd:      0
> HugePages_Surp:      0
> Hugepagesize:     2048 kB
> 
> module info:
> filename:       /lib/modules/2.6.27.19-5-default/extra/kvm-intel.ko
> license:        GPL
> author:         Qumranet
> version:        kvm-devel
> srcversion:     9F14ECEFD8109654DFA20D2
> depends:        kvm
> vermagic:       2.6.27.19-5-default SMP mod_unload modversions 
> parm:           bypass_guest_pf:bool
> parm:           vpid:bool
> parm:           flexpriority:bool
> parm:           ept:bool
> parm:           emulate_invalid_guest_state:bool
> 
> filename:       /lib/modules/2.6.27.19-5-default/extra/kvm.ko
> license:        GPL
> author:         Qumranet
> version:        kvm-devel
> srcversion:     157F8CB48FC31BC2F44847B
> depends:        
> vermagic:       2.6.27.19-5-default SMP mod_unload modversions 
> parm:           oos_shadow:bool
> 
> 
> 
> -- 
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ryanh@us.ibm.com
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Avi Kivity June 9, 2009, 4:47 p.m. UTC | #1
Marcelo Tosatti wrote:
> Ryan,
>
> On Fri, May 29, 2009 at 11:43:26AM -0500, Ryan Harper wrote:
>   
>> Testing latest qemu-kvm.git and kvm-kmod.git, ept enabled and backing
>> guests with large pages trips a BUG in the mmu code.  If I disable ept,
>> but still use large pages, migration succeeds.  Reproduce with:
>>
>> hugetlbfs setup:
>> % mkdir -p /hugetlbfs && mount -t hugetlbfs hugetlbfs /hugetlbfs
>> % echo 10000 > /proc/sys/vm/nr_hugepages
>>
>> qemu commands:
>>
>> guest a:
>> sudo x86_64-softmmu/qemu-system-x86_64 -L pc-bios -m 2048 -mempath /hugetlbfs -net nic -net tap -vnc :12 -monitor stdio -hda /scratch/images/rharper/rhel4u8-32-ide.raw
>>
>> guest b:
>> sudo x86_64-softmmu/qemu-system-x86_64 -L pc-bios -m 2048 -mempath /hugetlbfs -net nic -net tap -vnc :13 -monitor stdio -hda /scratch/images/rharper/rhel4u8-32-ide.raw -incoming tcp:0:4444
>>
>> Once guest a was up, I issued the migrate command:
>> (qemu) migrate -d tcp:localhost:4444
>>
>> rmap_remove: ffff880a08e00098 c0336e65c0336e5b 0->BUG
>>     
> 				^^^^^^^^^^^^^^^^
>
> This value looks very strange (bits 5:3 contain invalid value, for one).
> Don't have access to HW at the very moment, so it would be great if you
> had time to do a change equivalent to this and reproduce:
>   

That spte is totally bogus.

> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 809cce0..ceb70b0 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1759,7 +1764,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>  			child = page_header(pte & PT64_BASE_ADDR_MASK);
>  			mmu_page_remove_parent_pte(child, shadow_pte);
>  		} else if (pfn != spte_to_pfn(*shadow_pte)) {
> -			pgprintk("hfn old %lx new %lx\n",
> +			printk(KERN_ERR "hfn old %lx new %lx\n",
>  				 spte_to_pfn(*shadow_pte), pfn);
>  			rmap_remove(vcpu->kvm, shadow_pte);
>  		} else
>
> Avi, any hints?
>   

Not really. One thing: migration should transition the shadow page tables
from large pages to small ones; maybe that part is broken.

Maybe we're looking at a large-page spte and interpreting it as a normal
L2 spte, and interpreting a guest page as the L1 spt.
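The hypothesis hinges on the page-size bit (bit 7). In both regular x86 paging and EPT, a level-2 entry with bit 7 set maps a 2M page directly; with it clear, bits 51:12 point to a 4k page table. A walker that skipped the bit-7 check would treat the first 4k of the guest's 2M data frame as a page table. A minimal sketch of that check follows; PT_PAGE_SIZE_MASK, PT64_BASE_ADDR_MASK and is_large_pte() mirror names used in arch/x86/kvm, while next_table_addr() is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT          12
#define PT_PAGE_SIZE_MASK   (1ULL << 7)   /* page-size bit: entry maps a large page */
#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~((1ULL << PAGE_SHIFT) - 1))

static int is_large_pte(uint64_t pte)
{
	return (pte & PT_PAGE_SIZE_MASK) != 0;
}

/*
 * Hypothetical walker step: return the physical address of the next-level
 * table referenced by a level-2 entry, or 0 if the entry is a 2M leaf.
 * Without the is_large_pte() check, the 2M guest frame itself would be
 * returned and then parsed as if it held sptes.
 */
static uint64_t next_table_addr(uint64_t l2_pte)
{
	assert(l2_pte != 0);           /* caller passes a present entry */
	if (is_large_pte(l2_pte))
		return 0;              /* leaf: maps 2M of guest memory directly */
	return l2_pte & PT64_BASE_ADDR_MASK;
}
```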

Ryan Harper June 9, 2009, 6:31 p.m. UTC | #2
* Marcelo Tosatti <mtosatti@redhat.com> [2009-06-09 11:45]:
> Ryan,

Marcelo, thanks for taking a look. I applied the patch, reproduced the bug,
and included the new debug output below.

> 
> On Fri, May 29, 2009 at 11:43:26AM -0500, Ryan Harper wrote:
> > Testing latest qemu-kvm.git and kvm-kmod.git, ept enabled and backing
> > guests with large pages trips a BUG in the mmu code.  If I disable ept,
> > but still use large pages, migration succeeds.  Reproduce with:
> > 
> > hugetlbfs setup:
> > % mkdir -p /hugetlbfs && mount -t hugetlbfs hugetlbfs /hugetlbfs
> > % echo 10000 > /proc/sys/vm/nr_hugepages
> > 
> > qemu commands:
> > 
> > guest a:
> > sudo x86_64-softmmu/qemu-system-x86_64 -L pc-bios -m 2048 -mempath /hugetlbfs -net nic -net tap -vnc :12 -monitor stdio -hda /scratch/images/rharper/rhel4u8-32-ide.raw
> > 
> > guest b:
> > sudo x86_64-softmmu/qemu-system-x86_64 -L pc-bios -m 2048 -mempath /hugetlbfs -net nic -net tap -vnc :13 -monitor stdio -hda /scratch/images/rharper/rhel4u8-32-ide.raw -incoming tcp:0:4444
> > 
> > Once guest a was up, I issued the migrate command:
> > (qemu) migrate -d tcp:localhost:4444
> > 
> > rmap_remove: ffff880a08e00098 c0336e65c0336e5b 0->BUG
> 				^^^^^^^^^^^^^^^^
> 
> This value looks very strange (bits 5:3 contain invalid value, for one).
> Don't have access to HW at the very moment, so it would be great if you
> had time to do a change equivalent to this and reproduce:
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 809cce0..ceb70b0 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1759,7 +1764,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>  			child = page_header(pte & PT64_BASE_ADDR_MASK);
>  			mmu_page_remove_parent_pte(child, shadow_pte);
>  		} else if (pfn != spte_to_pfn(*shadow_pte)) {
> -			pgprintk("hfn old %lx new %lx\n",
> +			printk(KERN_ERR "hfn old %lx new %lx\n",
>  				 spte_to_pfn(*shadow_pte), pfn);
>  			rmap_remove(vcpu->kvm, shadow_pte);
>  		} else

hfn old 36e65c0336 new 472213
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<ffffffffa012ca5e>] gfn_to_rmap+0x17/0x49 [kvm]
PGD 676517067 PUD 2de5cd067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 5
Modules linked in: kvm_intel(N) kvm(N) nls_iso8859_1 nls_cp437 vfat fat crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi iscsi_ibft tun nfs lockd nfs_acl sunrpc ipv6 bridge stp cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod sr_mod cdrom cdc_ether thermal usb_storage usbnet processor sg rtc_cmos i2c_i801 shpchp rtc_core rtc_lib button mii i2c_core pcspkr pci_hotplug joydev bnx2 mptctl usbhid hid ff_memless uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd fan thermal_sys hwmon ext3 mbcache jbd mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: kvm]
Supported: No
Pid: 31785, comm: qemu-system-x86 Tainted: G          2.6.27.19-5-default #1
RIP: 0010:[<ffffffffa012ca5e>]  [<ffffffffa012ca5e>] gfn_to_rmap+0x17/0x49 [kvm]
RSP: 0018:ffff880677499ba8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000000000fffbd RSI: ffff8803808e08c0 RDI: 730000434950415f
RBP: 730000434950415f R08: 0000000000000023 R09: 0000000000000000
R10: 0000000000000046 R11: 0000000000000006 R12: ffff88067a971c60
R13: ffff8803808e0000 R14: 00000000c0336e5b R15: 0000000000000000
FS:  00007fee7dfbf950(0000) GS:ffff880c7cd938c0(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000674d09000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 31785, threadinfo ffff880677498000, task ffff8803425186c0)
Stack:  ffff880677499bb8 00000036e65c0336 ffff880472200098 ffffffffa012cb3b
 0000000000472213 0000000000000000 ffff880c7a148080 ffff880472200098
 ffff880c7a148080 ffffffffa012e5b2 ffff880677499ce8 00000005425186c0
Call Trace:
 [<ffffffffa012cb3b>] rmap_remove+0xab/0x19e [kvm]
 [<ffffffffa012e5b2>] mmu_set_spte+0xb0/0x316 [kvm]
 [<ffffffffa012ef2b>] direct_map_entry+0x7b/0x104 [kvm]
 [<ffffffffa012c1b1>] walk_shadow+0x8d/0xb7 [kvm]
 [<ffffffffa012deb1>] tdp_page_fault+0xf8/0x137 [kvm]
 [<ffffffffa012f802>] kvm_mmu_page_fault+0x19/0x80 [kvm]
 [<ffffffffa01be9b5>] handle_ept_violation+0xe0/0x17a [kvm_intel]
 [<ffffffffa012a012>] kvm_arch_vcpu_ioctl_run+0x4dd/0x6e5 [kvm]
 [<ffffffffa0123457>] kvm_vcpu_ioctl+0xf1/0x46b [kvm]
 [<ffffffff802bd249>] vfs_ioctl+0x21/0x6c
 [<ffffffff802bd4b6>] do_vfs_ioctl+0x222/0x231
 [<ffffffff802bd516>] sys_ioctl+0x51/0x73
 [<ffffffff8020bfbb>] system_call_fastpath+0x16/0x1b
 [<00007fee7eed5b77>] 0x7fee7eed5b77


Code: 31 ed 0f 18 08 eb a1 41 58 5b 5d 41 5c 41 5d 41 5e 41 5f c3 55 48 89 f5 53 89 d3 48 83 ec 08 e8 85 7a ff ff 85 db 48 89 c1 75 11 <48> 2b 28 48 8d 14 ed 00 00 00 00 48 03 50 18 eb 19 48 8b 00 48
RIP  [<ffffffffa012ca5e>] gfn_to_rmap+0x17/0x49 [kvm]
 RSP <ffff880677499ba8>
CR2: 0000000000000000
---[ end trace 6127eb9ebc2e7fb6 ]---
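The "hfn old" value printed by the debug patch follows directly from the bogus spte. Assuming spte_to_pfn() keeps bits 51:12 of the entry and shifts them down by PAGE_SHIFT (as mmu.c of this period does), the arithmetic can be reproduced in user space; spte_to_pfn_sketch() is a hypothetical stand-in, not the kernel function:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT          12
#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~((1ULL << PAGE_SHIFT) - 1))

/* The mask keeps physical-address bits 51:12 only. */
static_assert(PT64_BASE_ADDR_MASK == 0x000FFFFFFFFFF000ULL, "mask keeps bits 51:12");

/* Same arithmetic as spte_to_pfn(): mask off flag bits, then drop the page offset. */
static uint64_t spte_to_pfn_sketch(uint64_t spte)
{
	return (spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
}
```

spte_to_pfn_sketch(0xc0336e65c0336e5bULL) evaluates to 0x36e65c0336, matching the "hfn old 36e65c0336" line above and the RBX value in the first oops; a frame number that large lies far outside the host's physical memory.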

Avi Kivity June 10, 2009, 8:08 a.m. UTC | #3
Avi Kivity wrote:
>
> Not really.  One thing, migration should transition the shadow 
> pagetables from large pages to small ones, maybe that bit is broken.
>
> Maybe we're looking at a largepage spte and interpreting it as a 
> normal L2 spte, and interpreting a guest page as the L1 spt.

I tried to find where we drop the mmu (or at least large sptes for the 
slot) when we enable dirty logging, and failed.  Maybe 
remove_write_access() is sufficient.
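The open question is whether write-protecting the slot is enough. One possible approach, sketched here as a toy model rather than KVM's actual code, treats the two kinds of spte differently when dirty logging is enabled: 4k sptes only lose their writable bit, while large sptes are dropped so that the next fault maps the memory with 4k pages and dirty state can be tracked per 4k page. The bit layout, the "zero means not present" convention and the helper name slot_prepare_dirty_log() are assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPTE_WRITABLE (1ULL << 1)   /* write-access bit (assumed layout) */
#define SPTE_LARGE    (1ULL << 7)   /* page-size bit (assumed layout) */

/*
 * Hypothetical helper: prepare one slot's leaf sptes for dirty logging.
 * Returns 1 if any entry changed, i.e. the caller must flush the TLB.
 */
static int slot_prepare_dirty_log(uint64_t *sptes, size_t n)
{
	int flush = 0;

	assert(sptes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!sptes[i])
			continue;                 /* not present: nothing to do */
		if (sptes[i] & SPTE_LARGE) {
			sptes[i] = 0;             /* zap: the refault installs 4k sptes */
			flush = 1;
		} else if (sptes[i] & SPTE_WRITABLE) {
			sptes[i] &= ~SPTE_WRITABLE;   /* next write faults and is logged */
			flush = 1;
		}
	}
	return flush;
}
```

The return value tells the caller whether a TLB flush is needed; without that flush the vCPU could keep using stale writable or 2M translations.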

Marcelo Tosatti June 10, 2009, 12:10 p.m. UTC | #4
On Wed, Jun 10, 2009 at 11:08:14AM +0300, Avi Kivity wrote:
> Avi Kivity wrote:
>>
>> Not really.  One thing, migration should transition the shadow  
>> pagetables from large pages to small ones, maybe that bit is broken.
>>
>> Maybe we're looking at a largepage spte and interpreting it as a  
>> normal L2 spte, and interpreting a guest page as the L1 spt.
>
> I tried to find where we drop the mmu (or at least large sptes for the  
> slot) when we enable dirty logging, and failed.  Maybe  
> remove_write_access() is sufficient.

I believe you have to break down large pages into 4k pages for migration
to work reliably. I was tempted to copy and paste the hugetlbfs-backed RAM
allocation code into user/main.c for use with user/vm.c (which could then
also be used to test the TLB flushes on the 2M->4k transition, which are
currently lacking).

Regarding the bogus spte: I could not reproduce it yesterday with kvm.git,
but in the worst case the audit code will catch it.
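Breaking a 2M mapping down into 4k pages amounts to filling a new page table with 512 entries that map consecutive 4k frames of the same 2M region. The sketch below assumes bit 7 is the page-size bit and bits 51:12 hold the frame address, ignores attribute differences between 2M and 4k entries (such as the PAT bit position), and uses the hypothetical helper name split_large_pte():

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT          12
#define PTES_PER_TABLE      512                 /* 2M / 4k */
#define PT_PAGE_SIZE_MASK   (1ULL << 7)
#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~((1ULL << PAGE_SHIFT) - 1))

/*
 * Hypothetical helper: expand one 2M leaf entry into the 512 entries of a
 * new 4k page table. Each child keeps the parent's flag bits, drops the
 * page-size bit, and maps the next 4k frame of the 2M region.
 */
static void split_large_pte(uint64_t large, uint64_t child[PTES_PER_TABLE])
{
	uint64_t base = large & PT64_BASE_ADDR_MASK;
	uint64_t flags = (large & ~PT64_BASE_ADDR_MASK) & ~PT_PAGE_SIZE_MASK;

	assert(large & PT_PAGE_SIZE_MASK);           /* must be a 2M leaf */
	assert((base & ((1ULL << 21) - 1)) == 0);    /* 2M-aligned frame */

	for (int i = 0; i < PTES_PER_TABLE; i++)
		child[i] = (base + ((uint64_t)i << PAGE_SHIFT)) | flags;
}
```

After the new table replaces the 2M entry, the old 2M translation must still be flushed from the TLB, which is the step described above as lacking.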


Patch

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 809cce0..ceb70b0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1759,7 +1764,7 @@  static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 			child = page_header(pte & PT64_BASE_ADDR_MASK);
 			mmu_page_remove_parent_pte(child, shadow_pte);
 		} else if (pfn != spte_to_pfn(*shadow_pte)) {
-			pgprintk("hfn old %lx new %lx\n",
+			printk(KERN_ERR "hfn old %lx new %lx\n",
 				 spte_to_pfn(*shadow_pte), pfn);
 			rmap_remove(vcpu->kvm, shadow_pte);
 		} else