2.6.38.1 general protection fault

Message ID 20110328200401.GC12265@random.random (mailing list archive)
State New, archived

Commit Message

Andrea Arcangeli March 28, 2011, 8:04 p.m. UTC

Patch

different than the one shown so far, I may find more info in the oops
if I get the assembly of the caller too and of the iteration of the
loop that runs in that function before the GFP)

khugepaged is present in your second trace (and khugepaged is mangling
over some memslot range with guest gfns mapped, or kvm_unmap_rmapp
wouldn't be called in the first place; I hope the memslots are all ok),
but you probably didn't get the right alignment, so the THP are likely
mapped as 4k pages in the guest, which must work fine too. I wonder if
it might be related to that (I keep my qemu-kvm patched with the patch
below, which isn't yet polished enough to be digestible for qemu:
wrong alignments and the x86 4M alignment aren't handled yet, and I'm
not sure the DONTFORK fix to prevent OOM with hotplug/migrate is
acceptable in that position).
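
To spell out what "right alignment" means here: KVM can only use a
large spte for a guest 2M region when the guest frame number and the
host frame number land at the same offset inside the 2M region, which
is the "(gfn ^ pfn) & (HPAGE_SIZE-1) == 0" condition quoted in the
patch comment below. A minimal standalone sketch of that check (the
helper name and constants are made up for illustration; this is not
code from kvm or from the patch):

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT     12
#define HPAGE_SHIFT    21  /* 2M transparent hugepages on x86_64 */
#define HPAGE_PFN_MASK ((1UL << (HPAGE_SHIFT - PAGE_SHIFT)) - 1)

/*
 * Hypothetical helper: a guest page can be backed by a large spte
 * only when gfn and pfn agree modulo the number of 4k frames per
 * hugepage, i.e. the offsets inside the 2M region match.
 */
static bool hugepage_aligned(uint64_t gfn, uint64_t pfn)
{
    return ((gfn ^ pfn) & HPAGE_PFN_MASK) == 0;
}

If qemu hands KVM guest RAM at an unaligned host virtual address, the
check fails for every guest hugepage and everything degrades to 4k
sptes, which is what the PREFERRED_RAM_ALIGN hunk below tries to
avoid.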

Can you try to "echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs"
and then run "cat /proc/`pgrep qemu`/smaps >/dev/null" once per minute (or find
the right pid by hand if you have more than one qemu process running)?
This debug trick will only work on 2.6.38.1, as 2.6.39 has native
THP handling in the smaps file, but on 2.6.38.1 it should flush all
sptes mapped on THP just like fork does (this might help to reproduce).

I'm also surprised this happened during the fork that initializes the
tap interface; shouldn't that fork run before any sptes are established?
(we run the spte invalidate through the mmu notifier in the parent
before wrprotecting the ptes during fork)

I also wonder if it's a memslot race of some kind; I don't see
anything wrong in the rmapp handling at the moment.

This isn't a patch to try; I'm only showing it here for reference, as
I suspect it might hide the bug. I'm now going to reverse it and see
if I can reproduce, in case having large sptes (instead of 4k sptes)
always mapped on host THP changes something.

Thanks!

diff --git a/exec.c b/exec.c
index bb0c1be..f60e5fe 100644
--- a/exec.c
+++ b/exec.c
@@ -2856,6 +2856,18 @@  static ram_addr_t last_ram_offset(void)
     return last;
 }
 
+#if defined(__linux__) && defined(__x86_64__)
+/*
+ * Align on the max transparent hugepage size so that
+ * "(gfn ^ pfn) & (HPAGE_SIZE-1) == 0" to allow KVM to
+ * take advantage of hugepages with NPT/EPT or to
+ * ensure the first 2M of the guest physical ram will
+ * be mapped by the same hugetlb for QEMU (it is worth
+ * it even without NPT/EPT).
+ */
+#define PREFERRED_RAM_ALIGN (2*1024*1024)
+#endif
+
 ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
                                    ram_addr_t size, void *host)
 {
@@ -2902,9 +2914,15 @@  ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
                                    PROT_EXEC|PROT_READ|PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
 #else
-            new_block->host = qemu_vmalloc(size);
+#ifdef PREFERRED_RAM_ALIGN
+	    if (size >= PREFERRED_RAM_ALIGN)
+		    new_block->host = qemu_memalign(PREFERRED_RAM_ALIGN, size);
+	    else
+#endif
+		    new_block->host = qemu_vmalloc(size);
 #endif
             qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
+            qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
         }
     }
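
For what it's worth, on a Linux host the allocation path added above
boils down to roughly the following. This is only a sketch of my
reading of the qemu wrappers (it assumes qemu_memalign maps to
posix_memalign and QEMU_MADV_MERGEABLE / QEMU_MADV_DONTFORK map to
the corresponding madvise flags, skips error handling, and
alloc_guest_ram is a made-up name, not a qemu function):

#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>

#define PREFERRED_RAM_ALIGN (2*1024*1024)

/*
 * Rough userspace equivalent of the patched path: give guest RAM a
 * 2M-aligned host virtual address so THP can back it with matching
 * gfn/pfn offsets, and mark it DONTFORK so a short-lived fork (the
 * tap helper, for instance) doesn't inherit the whole guest memory.
 */
static void *alloc_guest_ram(size_t size)
{
    void *host = NULL;

    if (posix_memalign(&host, PREFERRED_RAM_ALIGN, size))
        return NULL;
    madvise(host, size, MADV_MERGEABLE);  /* KSM, as in the existing code */
    madvise(host, size, MADV_DONTFORK);   /* the new madvise in the hunk */
    return host;
}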