diff mbox series

migration: ram block cpr blockers

Message ID 1737135971-132839-1-git-send-email-steven.sistare@oracle.com (mailing list archive)
State New
Headers show
Series migration: ram block cpr blockers | expand

Commit Message

Steven Sistare Jan. 17, 2025, 5:46 p.m. UTC
Unlike cpr-reboot mode, cpr-transfer mode cannot save volatile ram blocks
in the migration stream file and recreate them later, because the physical
memory for the blocks is pinned and registered for vfio.  Add a blocker
for volatile ram blocks.

Also add a blocker for RAM_GUEST_MEMFD.  Preserving guest_memfd may be
sufficient for CPR, but it has not been tested yet.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
 include/exec/memory.h   |  3 +++
 include/exec/ramblock.h |  1 +
 migration/savevm.c      |  2 ++
 system/physmem.c        | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+)

Comments

Peter Xu Jan. 17, 2025, 6:16 p.m. UTC | #1
On Fri, Jan 17, 2025 at 09:46:11AM -0800, Steve Sistare wrote:
> +/*
> + * Return true if ram contents would be lost during CPR.
> + * Return false for ram_device because it is remapped in new QEMU.  Do not
> + * exclude rom, even though it is readonly, because the rom file could change
> + * in new QEMU.  Return false for non-migratable blocks.  They are either
> + * re-created in new QEMU, or are handled specially, or are covered by a
> + * device-level CPR blocker.  Return false for an fd, because it is visible and
> + * can be remapped in new QEMU.
> + */
> +static bool ram_is_volatile(RAMBlock *rb)
> +{
> +    MemoryRegion *mr = rb->mr;
> +
> +    return mr &&
> +        memory_region_is_ram(mr) &&
> +        !memory_region_is_ram_device(mr) &&
> +        (!qemu_ram_is_shared(rb) || !qemu_ram_is_named_file(rb)) &&
> +        qemu_ram_is_migratable(rb) &&
> +        rb->fd < 0;
> +}

Blocking guest_memfd looks ok, but comparing to add one more block
notifier, can we check all ramblocks once in migrate_prepare(), and fail
that command directly if it fails the check?

OTOH, is there any simpler way to simplify the check conditions?  It'll be
at least nice to break these checks into smaller if conditions for
readability..

I wonder if we could stick with looping over all ramblocks, then make sure
each of them is on the cpr saved fd list.  It may need to make
cpr_save_fd() always register with the name of ramblock to do such lookup,
or maybe we could also cache the ramblock pointer in CprFd, then the lookup
will be a pointer match check.
Steven Sistare Jan. 17, 2025, 7:10 p.m. UTC | #2
On 1/17/2025 1:16 PM, Peter Xu wrote:
> On Fri, Jan 17, 2025 at 09:46:11AM -0800, Steve Sistare wrote:
>> +/*
>> + * Return true if ram contents would be lost during CPR.
>> + * Return false for ram_device because it is remapped in new QEMU.  Do not
>> + * exclude rom, even though it is readonly, because the rom file could change
>> + * in new QEMU.  Return false for non-migratable blocks.  They are either
>> + * re-created in new QEMU, or are handled specially, or are covered by a
>> + * device-level CPR blocker.  Return false for an fd, because it is visible and
>> + * can be remapped in new QEMU.
>> + */
>> +static bool ram_is_volatile(RAMBlock *rb)
>> +{
>> +    MemoryRegion *mr = rb->mr;
>> +
>> +    return mr &&
>> +        memory_region_is_ram(mr) &&
>> +        !memory_region_is_ram_device(mr) &&
>> +        (!qemu_ram_is_shared(rb) || !qemu_ram_is_named_file(rb)) &&
>> +        qemu_ram_is_migratable(rb) &&
>> +        rb->fd < 0;
>> +}
> 
> Blocking guest_memfd looks ok, but comparing to add one more block
> notifier, can we check all ramblocks once in migrate_prepare(), and fail
> that command directly if it fails the check?

In an upcoming patch, I will be adding an option analogous to only-migratable which
prevents QEMU from starting if anything would block cpr-transfer.  That option
will be checked when blockers are added, like for only-migratable. migrate_prepare
is too late.

> OTOH, is there any simpler way to simplify the check conditions?  It'll be
> at least nice to break these checks into smaller if conditions for
> readability..

I thought the function header comments made it clear, but I could move each
comment next to each condition:

     ...
     /*
      * Return false for an fd, because it is visible and can be remapped in
      * new QEMU.
      */
     if (rb->fd >= 0) {
         return false;
     }
     ...

> I wonder if we could stick with looping over all ramblocks, then make sure
> each of them is on the cpr saved fd list.  It may need to make
> cpr_save_fd() always register with the name of ramblock to do such lookup,
> or maybe we could also cache the ramblock pointer in CprFd, then the lookup
> will be a pointer match check.

Some ramblocks are not on the list, such as named files.  Plus looping in
migrate_prepare is too late as noted above.

IMO what I have already implemented using blockers is clean and elegant.

- Steve
Peter Xu Jan. 17, 2025, 11:57 p.m. UTC | #3
On Fri, Jan 17, 2025 at 02:10:14PM -0500, Steven Sistare wrote:
> On 1/17/2025 1:16 PM, Peter Xu wrote:
> > On Fri, Jan 17, 2025 at 09:46:11AM -0800, Steve Sistare wrote:
> > > +/*
> > > + * Return true if ram contents would be lost during CPR.
> > > + * Return false for ram_device because it is remapped in new QEMU.  Do not
> > > + * exclude rom, even though it is readonly, because the rom file could change
> > > + * in new QEMU.  Return false for non-migratable blocks.  They are either
> > > + * re-created in new QEMU, or are handled specially, or are covered by a
> > > + * device-level CPR blocker.  Return false for an fd, because it is visible and
> > > + * can be remapped in new QEMU.
> > > + */
> > > +static bool ram_is_volatile(RAMBlock *rb)
> > > +{
> > > +    MemoryRegion *mr = rb->mr;
> > > +
> > > +    return mr &&
> > > +        memory_region_is_ram(mr) &&
> > > +        !memory_region_is_ram_device(mr) &&
> > > +        (!qemu_ram_is_shared(rb) || !qemu_ram_is_named_file(rb)) &&
> > > +        qemu_ram_is_migratable(rb) &&
> > > +        rb->fd < 0;
> > > +}
> > 
> > Blocking guest_memfd looks ok, but comparing to add one more block
> > notifier, can we check all ramblocks once in migrate_prepare(), and fail
> > that command directly if it fails the check?
> 
> In an upcoming patch, I will be adding an option analogous to only-migratable which
> prevents QEMU from starting if anything would block cpr-transfer.  That option
> will be checked when blockers are added, like for only-migratable. migrate_prepare
> is too late.
> 
> > OTOH, is there any simpler way to simplify the check conditions?  It'll be
> > at least nice to break these checks into smaller if conditions for
> > readability..
> 
> I thought the function header comments made it clear, but I could move each
> comment next to each condition:
> 
>     ...
>     /*
>      * Return false for an fd, because it is visible and can be remapped in
>      * new QEMU.
>      */
>     if (rb->fd >= 0) {
>         return false;
>     }
>     ...
> 
> > I wonder if we could stick with looping over all ramblocks, then make sure
> > each of them is on the cpr saved fd list.  It may need to make
> > cpr_save_fd() always register with the name of ramblock to do such lookup,
> > or maybe we could also cache the ramblock pointer in CprFd, then the lookup
> > will be a pointer match check.
> 
> Some ramblocks are not on the list, such as named files.  Plus looping in
> migrate_prepare is too late as noted above.
> 
> IMO what I have already implemented using blockers is clean and elegant.

OK if we need to fail it early at boot, then yes blockers are probably
better.

We'll need one more cmdline parameter. I've no objection, but I don't know
how to judge when it's ok to add, when it's better not.. I'll leave others
to comment on this.

But still, could we check it when ramblocks are created?  So in that way
whatever is forbidden is clear in its own path, I feel like that could be
clearer (like what you did with gmemfd).

For example, if I start to convert some of your requirements above, then
memory_region_is_ram_device() implies RAM_PREALLOC.  Actually, ram_device
is not the only RAM_PREALLOC user..  Say, would it also not work with all
memory_region_init_ram_ptr() users (even if they're not ram_device)?  An
example is, looks like virtio-gpu can create random ramblocks on the fly
with prealloced buffers.  I am not sure whether they can be pinned by VFIO
too.  You may know better.

So, to me ram_is_volatile() is harder to follow, meanwhile it may miss
something to me?  IMO it's still better to explicitly add cpr blockers in
the ram block add() path if possible, but maybe you still have good reasons
to do it only until vmstate_register_ram() which I overlooked..
diff mbox series

Patch

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 0ac21cc..088330a 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -3185,6 +3185,9 @@  bool ram_block_discard_is_disabled(void);
  */
 bool ram_block_discard_is_required(void);
 
+void ram_block_add_cpr_blocker(RAMBlock *rb, Error **errp);
+void ram_block_del_cpr_blocker(RAMBlock *rb);
+
 #endif
 
 #endif
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 0babd10..64484cd 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -39,6 +39,7 @@  struct RAMBlock {
     /* RCU-enabled, writes protected by the ramlist lock */
     QLIST_ENTRY(RAMBlock) next;
     QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
+    Error *cpr_blocker;
     int fd;
     uint64_t fd_offset;
     int guest_memfd;
diff --git a/migration/savevm.c b/migration/savevm.c
index c929da1..264bc06 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3341,12 +3341,14 @@  void vmstate_register_ram(MemoryRegion *mr, DeviceState *dev)
     qemu_ram_set_idstr(mr->ram_block,
                        memory_region_name(mr), dev);
     qemu_ram_set_migratable(mr->ram_block);
+    ram_block_add_cpr_blocker(mr->ram_block, &error_fatal);
 }
 
 void vmstate_unregister_ram(MemoryRegion *mr, DeviceState *dev)
 {
     qemu_ram_unset_idstr(mr->ram_block);
     qemu_ram_unset_migratable(mr->ram_block);
+    ram_block_del_cpr_blocker(mr->ram_block);
 }
 
 void vmstate_register_ram_global(MemoryRegion *mr)
diff --git a/system/physmem.c b/system/physmem.c
index 67c9db9..0bcfc6c 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -70,7 +70,10 @@ 
 
 #include "qemu/pmem.h"
 
+#include "qapi/qapi-types-migration.h"
+#include "migration/blocker.h"
 #include "migration/cpr.h"
+#include "migration/options.h"
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -1899,6 +1902,14 @@  static void ram_block_add(RAMBlock *new_block, Error **errp)
             qemu_mutex_unlock_ramlist();
             goto out_free;
         }
+
+        error_setg(&new_block->cpr_blocker,
+                   "Memory region %s uses guest_memfd, "
+                   "which is not supported with CPR.",
+                   memory_region_name(new_block->mr));
+        migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
+                                  MIG_MODE_CPR_TRANSFER,
+                                  -1);
     }
 
     ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
@@ -4059,3 +4070,46 @@  bool ram_block_discard_is_required(void)
     return qatomic_read(&ram_block_discard_required_cnt) ||
            qatomic_read(&ram_block_coordinated_discard_required_cnt);
 }
+
+/*
+ * Return true if ram contents would be lost during CPR.
+ * Return false for ram_device because it is remapped in new QEMU.  Do not
+ * exclude rom, even though it is readonly, because the rom file could change
+ * in new QEMU.  Return false for non-migratable blocks.  They are either
+ * re-created in new QEMU, or are handled specially, or are covered by a
+ * device-level CPR blocker.  Return false for an fd, because it is visible and
+ * can be remapped in new QEMU.
+ */
+static bool ram_is_volatile(RAMBlock *rb)
+{
+    MemoryRegion *mr = rb->mr;
+
+    return mr &&
+        memory_region_is_ram(mr) &&
+        !memory_region_is_ram_device(mr) &&
+        (!qemu_ram_is_shared(rb) || !qemu_ram_is_named_file(rb)) &&
+        qemu_ram_is_migratable(rb) &&
+        rb->fd < 0;
+}
+
+/*
+ * Add a blocker for each volatile ram block.
+ */
+void ram_block_add_cpr_blocker(RAMBlock *rb, Error **errp)
+{
+    if (!ram_is_volatile(rb)) {
+        return;
+    }
+
+    error_setg(&rb->cpr_blocker,
+               "Memory region %s is volatile. share=on is required for "
+               "memory-backend objects, and aux-ram-share=on is required.",
+               memory_region_name(rb->mr));
+    migrate_add_blocker_modes(&rb->cpr_blocker, errp, MIG_MODE_CPR_TRANSFER,
+                              -1);
+}
+
+void ram_block_del_cpr_blocker(RAMBlock *rb)
+{
+    migrate_del_blocker(&rb->cpr_blocker);
+}