diff mbox

[2/9] memory: add support getting and using a dirty bitmap copy.

Message ID 20170421091632.30900-3-kraxel@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Gerd Hoffmann April 21, 2017, 9:16 a.m. UTC
This patch adds support for getting and using a local copy of the dirty
bitmap.

memory_region_snapshot_and_clear_dirty() will create a snapshot of the
dirty bitmap for the specified range, clear the dirty bitmap and return
the copy.  The returned bitmap can be a bit larger than requested, the
range is expanded so the code can copy unsigned longs from the bitmap
and avoid atomic bit update operations.

memory_region_snapshot_get_dirty() will return the dirty status of
pages, pretty much like memory_region_get_dirty(), but using the copy
returned by memory_region_copy_and_clear_dirty().

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 include/exec/memory.h   | 47 +++++++++++++++++++++++++++++++
 include/exec/ram_addr.h |  7 +++++
 include/qemu/typedefs.h |  1 +
 exec.c                  | 75 +++++++++++++++++++++++++++++++++++++++++++++++++
 memory.c                | 17 +++++++++++
 5 files changed, 147 insertions(+)

Comments

Kevin Wolf April 27, 2017, 2 p.m. UTC | #1
Am 21.04.2017 um 11:16 hat Gerd Hoffmann geschrieben:
> +bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
> +                                            ram_addr_t start,
> +                                            ram_addr_t length)
> +{
> +    unsigned long page, end;
> +
> +    assert(start >= snap->start);
> +    assert(start + length <= snap->end);

Not sure if this has been reported somewhere else, but I got an
assertion failure here while booting a guest:

$ ~/source/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 2G -drive file=Windows-10-20170427.0-x86_64.qcow2,snapshot=on -usbdevice tablet -vga qxl
qemu-system-x86_64: /home/kwolf/source/qemu/exec.c:1125: cpu_physical_memory_snapshot_get_dirty: Zusicherung >>start + length <= snap->end<< nicht erf?llt.
Abgebrochen (Speicherabzug geschrieben)

Unfortunately, I didn't have gdb attached or core dumps enabled, and it
doesn't seem to reproduce easily, so I don't have anything that could
help debugging it, but I thought I'd just let you know anyway.

Kevin
Gerd Hoffmann April 27, 2017, 3:01 p.m. UTC | #2
On Do, 2017-04-27 at 16:00 +0200, Kevin Wolf wrote:
> Am 21.04.2017 um 11:16 hat Gerd Hoffmann geschrieben:
> > +bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
> > +                                            ram_addr_t start,
> > +                                            ram_addr_t length)
> > +{
> > +    unsigned long page, end;
> > +
> > +    assert(start >= snap->start);
> > +    assert(start + length <= snap->end);
> 
> Not sure if this has been reported somewhere else, but I got an
> assertion failure here while booting a guest:
> 
> $ ~/source/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 2G -drive file=Windows-10-20170427.0-x86_64.qcow2,snapshot=on -usbdevice tablet -vga qxl
> qemu-system-x86_64: /home/kwolf/source/qemu/exec.c:1125: cpu_physical_memory_snapshot_get_dirty: Zusicherung >>start + length <= snap->end<< nicht erf?llt.
> Abgebrochen (Speicherabzug geschrieben)
> 
> Unfortunately, I didn't have gdb attached or core dumps enabled, and it
> doesn't seem to reproduce easily, so I don't have anything that could
> help debugging it, but I thought I'd just let you know anyway.

Saw this a few times too, didn't have the time yet to dig deeper,
appears to happen due to a display update when the guest is half-way
through a mode switch and the vga registers are in inconsistent state.

Reproducer: boot fedora live iso, when isolinux switches back to text
mode it can trigger (one out of ten boots, loaded host seems to make it
more likely)

The easy way out is to just return false instead of asserting.

I want check how exactly this happens though, to make sure this isn't a
exploitable race (unlikely IMHO, but still worth checking ...).  Also
I'd prefer to fix vga and keep the assert()s, they are a good sanity
check.

cheers,
  Gerd
diff mbox

Patch

diff --git a/include/exec/memory.h b/include/exec/memory.h
index f20b191793..1e15e79d00 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -871,6 +871,53 @@  void memory_region_set_dirty(MemoryRegion *mr, hwaddr addr,
  */
 bool memory_region_test_and_clear_dirty(MemoryRegion *mr, hwaddr addr,
                                         hwaddr size, unsigned client);
+
+/**
+ * memory_region_snapshot_and_clear_dirty: Get a snapshot of the dirty
+ *                                         bitmap and clear it.
+ *
+ * Creates a snapshot of the dirty bitmap, clears the dirty bitmap and
+ * returns the snapshot.  The snapshot can then be used to query dirty
+ * status, using memory_region_snapshot_get_dirty.  Unlike
+ * memory_region_test_and_clear_dirty this allows to query the same
+ * page multiple times, which is especially useful for display updates
+ * where the scanlines often are not page aligned.
+ *
+ * The dirty bitmap region which gets copyed into the snapshot (and
+ * cleared afterwards) can be larger than requested.  The boundaries
+ * are rounded up/down so complete bitmap longs (covering 64 pages on
+ * 64bit hosts) can be copied over into the bitmap snapshot.  Which
+ * isn't a problem for display updates as the extra pages are outside
+ * the visible area, and in case the visible area changes a full
+ * display redraw is due anyway.  Should other use cases for this
+ * function emerge we might have to revisit this implementation
+ * detail.
+ *
+ * Use g_free to release DirtyBitmapSnapshot.
+ *
+ * @mr: the memory region being queried.
+ * @addr: the address (relative to the start of the region) being queried.
+ * @size: the size of the range being queried.
+ * @client: the user of the logging information; typically %DIRTY_MEMORY_VGA.
+ */
+DirtyBitmapSnapshot *memory_region_snapshot_and_clear_dirty(MemoryRegion *mr,
+                                                            hwaddr addr,
+                                                            hwaddr size,
+                                                            unsigned client);
+
+/**
+ * memory_region_snapshot_get_dirty: Check whether a range of bytes is dirty
+ *                                   in the specified dirty bitmap snapshot.
+ *
+ * @mr: the memory region being queried.
+ * @snap: the dirty bitmap snapshot
+ * @addr: the address (relative to the start of the region) being queried.
+ * @size: the size of the range being queried.
+ */
+bool memory_region_snapshot_get_dirty(MemoryRegion *mr,
+                                      DirtyBitmapSnapshot *snap,
+                                      hwaddr addr, hwaddr size);
+
 /**
  * memory_region_sync_dirty_bitmap: Synchronize a region's dirty bitmap with
  *                                  any external TLBs (e.g. kvm)
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index b05dc84ab9..2b63d7f59e 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -343,6 +343,13 @@  bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
                                               ram_addr_t length,
                                               unsigned client);
 
+DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
+    (ram_addr_t start, ram_addr_t length, unsigned client);
+
+bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
+                                            ram_addr_t start,
+                                            ram_addr_t length);
+
 static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start,
                                                          ram_addr_t length)
 {
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index e95f28cfec..f08d327aec 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -23,6 +23,7 @@  typedef struct CPUAddressSpace CPUAddressSpace;
 typedef struct CPUState CPUState;
 typedef struct DeviceListener DeviceListener;
 typedef struct DeviceState DeviceState;
+typedef struct DirtyBitmapSnapshot DirtyBitmapSnapshot;
 typedef struct DisplayChangeListener DisplayChangeListener;
 typedef struct DisplayState DisplayState;
 typedef struct DisplaySurface DisplaySurface;
diff --git a/exec.c b/exec.c
index c97ef4a8da..a8894d5ba2 100644
--- a/exec.c
+++ b/exec.c
@@ -223,6 +223,12 @@  struct CPUAddressSpace {
     MemoryListener tcg_as_listener;
 };
 
+struct DirtyBitmapSnapshot {
+    ram_addr_t start;
+    ram_addr_t end;
+    unsigned long dirty[];
+};
+
 #endif
 
 #if !defined(CONFIG_USER_ONLY)
@@ -1061,6 +1067,75 @@  bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
     return dirty;
 }
 
+DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
+     (ram_addr_t start, ram_addr_t length, unsigned client)
+{
+    DirtyMemoryBlocks *blocks;
+    unsigned long align = 1UL << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
+    ram_addr_t first = QEMU_ALIGN_DOWN(start, align);
+    ram_addr_t last  = QEMU_ALIGN_UP(start + length, align);
+    DirtyBitmapSnapshot *snap;
+    unsigned long page, end, dest;
+
+    snap = g_malloc0(sizeof(*snap) +
+                     ((last - first) >> (TARGET_PAGE_BITS + 3)));
+    snap->start = first;
+    snap->end   = last;
+
+    page = first >> TARGET_PAGE_BITS;
+    end  = last  >> TARGET_PAGE_BITS;
+    dest = 0;
+
+    rcu_read_lock();
+
+    blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
+
+    while (page < end) {
+        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
+
+        assert(QEMU_IS_ALIGNED(offset, (1 << BITS_PER_LEVEL)));
+        assert(QEMU_IS_ALIGNED(num,    (1 << BITS_PER_LEVEL)));
+        offset >>= BITS_PER_LEVEL;
+
+        bitmap_copy_and_clear_atomic(snap->dirty + dest,
+                                     blocks->blocks[idx] + offset,
+                                     num);
+        page += num;
+        dest += num >> BITS_PER_LEVEL;
+    }
+
+    rcu_read_unlock();
+
+    if (tcg_enabled()) {
+        tlb_reset_dirty_range_all(start, length);
+    }
+
+    return snap;
+}
+
+bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
+                                            ram_addr_t start,
+                                            ram_addr_t length)
+{
+    unsigned long page, end;
+
+    assert(start >= snap->start);
+    assert(start + length <= snap->end);
+
+    end = TARGET_PAGE_ALIGN(start + length - snap->start) >> TARGET_PAGE_BITS;
+    page = (start - snap->start) >> TARGET_PAGE_BITS;
+
+    while (page < end) {
+        if (test_bit(page, snap->dirty)) {
+            return true;
+        }
+        page++;
+    }
+    return false;
+}
+
 /* Called from RCU critical section */
 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
                                        MemoryRegionSection *section,
diff --git a/memory.c b/memory.c
index 4c95aaf39c..8a0648551f 100644
--- a/memory.c
+++ b/memory.c
@@ -1716,6 +1716,23 @@  bool memory_region_test_and_clear_dirty(MemoryRegion *mr, hwaddr addr,
                 memory_region_get_ram_addr(mr) + addr, size, client);
 }
 
+DirtyBitmapSnapshot *memory_region_snapshot_and_clear_dirty(MemoryRegion *mr,
+                                                            hwaddr addr,
+                                                            hwaddr size,
+                                                            unsigned client)
+{
+    assert(mr->ram_block);
+    return cpu_physical_memory_snapshot_and_clear_dirty(
+                memory_region_get_ram_addr(mr) + addr, size, client);
+}
+
+bool memory_region_snapshot_get_dirty(MemoryRegion *mr, DirtyBitmapSnapshot *snap,
+                                      hwaddr addr, hwaddr size)
+{
+    assert(mr->ram_block);
+    return cpu_physical_memory_snapshot_get_dirty(snap,
+                memory_region_get_ram_addr(mr) + addr, size);
+}
 
 void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
 {