diff mbox series

[for-9.1,v5,11/14] memory: Add Error** argument to the global_dirty_log routines

Message ID 20240320064911.545001-12-clg@redhat.com (mailing list archive)
State New
Headers show
Series None | expand

Commit Message

Cédric Le Goater March 20, 2024, 6:49 a.m. UTC
Now that the log_global*() handlers take an Error** parameter and
return a bool, do the same for memory_global_dirty_log_start() and
memory_global_dirty_log_stop(). The error is reported in the callers
for now and it will be propagated in the call stack in the next
changes.

To be noted a functional change in ram_init_bitmaps(), if the dirty
pages logger fails to start, there is no need to synchronize the dirty
pages bitmaps. colo_incoming_start_dirty_log() could be modified in a
similar way.

Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paul Durrant <paul@xen.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hyman Huang <yong.huang@smartx.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v5:

 - Removed Yong Huang's R-b 
 - Made use of ram_bitmaps_destroy() in ram_init_bitmaps() to cleanup
   allocated bitmaps
 
 include/exec/memory.h |  5 ++++-
 hw/i386/xen/xen-hvm.c |  2 +-
 migration/dirtyrate.c | 13 +++++++++++--
 migration/ram.c       | 23 +++++++++++++++++++++--
 system/memory.c       | 11 +++++------
 5 files changed, 42 insertions(+), 12 deletions(-)

Comments

Fabiano Rosas March 20, 2024, 2:53 p.m. UTC | #1
Cédric Le Goater <clg@redhat.com> writes:

> Now that the log_global*() handlers take an Error** parameter and
> return a bool, do the same for memory_global_dirty_log_start() and
> memory_global_dirty_log_stop(). The error is reported in the callers
> for now and it will be propagated in the call stack in the next
> changes.
>
> To be noted a functional change in ram_init_bitmaps(), if the dirty
> pages logger fails to start, there is no need to synchronize the dirty
> pages bitmaps. colo_incoming_start_dirty_log() could be modified in a
> similar way.
>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Paul Durrant <paul@xen.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Hyman Huang <yong.huang@smartx.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Peter Xu March 20, 2024, 3:18 p.m. UTC | #2
On Wed, Mar 20, 2024 at 07:49:07AM +0100, Cédric Le Goater wrote:
> Now that the log_global*() handlers take an Error** parameter and
> return a bool, do the same for memory_global_dirty_log_start() and
> memory_global_dirty_log_stop(). The error is reported in the callers
> for now and it will be propagated in the call stack in the next
> changes.
> 
> To be noted a functional change in ram_init_bitmaps(), if the dirty
> pages logger fails to start, there is no need to synchronize the dirty
> pages bitmaps. colo_incoming_start_dirty_log() could be modified in a
> similar way.
> 
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Paul Durrant <paul@xen.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Hyman Huang <yong.huang@smartx.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>

Just to mention that for the rest users (dirtylimit/dirtyrate/colo) the
code still just keeps going even if start logging failed even after this
series applied as a whole.  Migration framework handles the failure
gracefully, not yet the rest.

That might be an issue for some, e.g., ideally we should be able to fail a
calc-dirty-rate request, but it's not supported so far.  Adding that could
add quite some burden to this series, so maybe that's fine to be done
later.  After all, having a VFIO device (that can fail a start_log()), plus
any of those features should be even rarer, I think?

It seems at least memory_global_dirty_log_sync() can be called even without
start logging, so I expect nothing should crash immediately. I spot one in
colo_incoming_start_dirty_log() already of such use.  My wild guess is it
relies on all log_sync*() hooks to cope with it, e.g. KVM ioctl() should
fail with -ENENT on most archs I think when it sees dirty log not ever
started.

For those bits, I'll wait and see whether Yong or Hailiang (cced) has any
comments. From generic migration/memory side, nothing I see wrong:

Acked-by: Peter Xu <peterx@redhat.com>

Thanks,
Yong Huang March 22, 2024, 1:55 a.m. UTC | #3
On Wed, Mar 20, 2024 at 11:19 PM Peter Xu <peterx@redhat.com> wrote:

> On Wed, Mar 20, 2024 at 07:49:07AM +0100, Cédric Le Goater wrote:
> > Now that the log_global*() handlers take an Error** parameter and
> > return a bool, do the same for memory_global_dirty_log_start() and
> > memory_global_dirty_log_stop(). The error is reported in the callers
> > for now and it will be propagated in the call stack in the next
> > changes.
> >
> > To be noted a functional change in ram_init_bitmaps(), if the dirty
> > pages logger fails to start, there is no need to synchronize the dirty
> > pages bitmaps. colo_incoming_start_dirty_log() could be modified in a
> > similar way.
> >
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Anthony Perard <anthony.perard@citrix.com>
> > Cc: Paul Durrant <paul@xen.org>
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Hyman Huang <yong.huang@smartx.com>
> > Signed-off-by: Cédric Le Goater <clg@redhat.com>
>
> Just to mention that for the rest users (dirtylimit/dirtyrate/colo) the
> code still just keeps going even if start logging failed even after this
> series applied as a whole.  Migration framework handles the failure
> gracefully, not yet the rest.
>
> That might be an issue for some, e.g., ideally we should be able to fail a
> calc-dirty-rate request, but it's not supported so far.  Adding that could
> add quite some burden to this series, so maybe that's fine to be done
>
Indeed, for the GLOBAL_DIRTY_DIRTY_RATE and GLOBAL_DIRTY_LIMIT
dirty tracking, they should handle the failure path of logging start.
The work may be done once the current patchset is merged.


> later.  After all, having a VFIO device (that can fail a start_log()), plus
> any of those features should be even rarer, I think?
>
> It seems at least memory_global_dirty_log_sync() can be called even without
> start logging, so I expect nothing should crash immediately. I spot one in
> colo_incoming_start_dirty_log() already of such use.  My wild guess is it
> relies on all log_sync*() hooks to cope with it, e.g. KVM ioctl() should
> fail with -ENENT on most archs I think when it sees dirty log not ever
> started.
>
> For those bits, I'll wait and see whether Yong or Hailiang (cced) has any
> comments. From generic migration/memory side, nothing I see wrong:
>
> Acked-by: Peter Xu <peterx@redhat.com>
>
> Thanks,
>
> --
> Peter Xu
>
>
Peter Xu March 22, 2024, 1:41 p.m. UTC | #4
On Fri, Mar 22, 2024 at 09:55:18AM +0800, Yong Huang wrote:
> Indeed, for the GLOBAL_DIRTY_DIRTY_RATE and GLOBAL_DIRTY_LIMIT
> dirty tracking, they should handle the failure path of logging start.
> The work may be done once the current patchset is merged.

Thanks for confirming this, Yong.  I think I'll go ahead and queue it then.
It should be in the 1st migration pull for 9.1.  Thanks,
diff mbox series

Patch

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5555567bc4c9fdb53e8f63487f1400980275687d..c129ee6db7162504bd72d4cfc69b5affb2cd87e8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2570,8 +2570,11 @@  void memory_listener_unregister(MemoryListener *listener);
  * memory_global_dirty_log_start: begin dirty logging for all regions
  *
  * @flags: purpose of starting dirty log, migration or dirty rate
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Return: true on success, else false setting @errp with error.
  */
-void memory_global_dirty_log_start(unsigned int flags);
+bool memory_global_dirty_log_start(unsigned int flags, Error **errp);
 
 /**
  * memory_global_dirty_log_stop: end dirty logging for all regions
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index f6e9a1bc86491783077b5cb5aafdb19ab294e392..006d219ad52d739cc406ad5f8082ca82c16c61cc 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -669,7 +669,7 @@  void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length)
 void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
 {
     if (enable) {
-        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
+        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp);
     } else {
         memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
     }
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 1d2e85746fb7b10eb7f149976970f9a92125af8a..d02d70b7b4b86a29d4d5540ded416543536d8f98 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -90,9 +90,15 @@  static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
 
 void global_dirty_log_change(unsigned int flag, bool start)
 {
+    Error *local_err = NULL;
+    bool ret;
+
     bql_lock();
     if (start) {
-        memory_global_dirty_log_start(flag);
+        ret = memory_global_dirty_log_start(flag, &local_err);
+        if (!ret) {
+            error_report_err(local_err);
+        }
     } else {
         memory_global_dirty_log_stop(flag);
     }
@@ -608,9 +614,12 @@  static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
 {
     int64_t start_time;
     DirtyPageRecord dirty_pages;
+    Error *local_err = NULL;
 
     bql_lock();
-    memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE);
+    if (!memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE, &local_err)) {
+        error_report_err(local_err);
+    }
 
     /*
      * 1'round of log sync may return all 1 bits with
diff --git a/migration/ram.c b/migration/ram.c
index f0bd71438a4f7212118593b51648b645737933d4..bade3e9281ae839578033524b800dcf3c6f486dc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2862,18 +2862,32 @@  static void migration_bitmap_clear_discarded_pages(RAMState *rs)
 
 static void ram_init_bitmaps(RAMState *rs)
 {
+    Error *local_err = NULL;
+    bool ret = true;
+
     qemu_mutex_lock_ramlist();
 
     WITH_RCU_READ_LOCK_GUARD() {
         ram_list_init_bitmaps();
         /* We don't use dirty log with background snapshots */
         if (!migrate_background_snapshot()) {
-            memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
+            ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
+                                                &local_err);
+            if (!ret) {
+                error_report_err(local_err);
+                goto out_unlock;
+            }
             migration_bitmap_sync_precopy(rs, false);
         }
     }
+out_unlock:
     qemu_mutex_unlock_ramlist();
 
+    if (!ret) {
+        ram_bitmaps_destroy();
+        return;
+    }
+
     /*
      * After an eventual first bitmap sync, fixup the initial bitmap
      * containing all 1s to exclude any discarded pages from migration.
@@ -3665,6 +3679,8 @@  int colo_init_ram_cache(void)
 void colo_incoming_start_dirty_log(void)
 {
     RAMBlock *block = NULL;
+    Error *local_err = NULL;
+
     /* For memory_global_dirty_log_start below. */
     bql_lock();
     qemu_mutex_lock_ramlist();
@@ -3676,7 +3692,10 @@  void colo_incoming_start_dirty_log(void)
             /* Discard this dirty bitmap record */
             bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
         }
-        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
+        if (!memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
+                                           &local_err)) {
+            error_report_err(local_err);
+        }
     }
     ram_state->migration_dirty_pages = 0;
     qemu_mutex_unlock_ramlist();
diff --git a/system/memory.c b/system/memory.c
index ca4d91484fb3d06f4b902486fea49dba86dc141b..d7ca1de994f6c3d08820eae4f0b1922db0030831 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2937,10 +2937,9 @@  err:
     return false;
 }
 
-void memory_global_dirty_log_start(unsigned int flags)
+bool memory_global_dirty_log_start(unsigned int flags, Error **errp)
 {
     unsigned int old_flags;
-    Error *local_err = NULL;
 
     assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
 
@@ -2952,7 +2951,7 @@  void memory_global_dirty_log_start(unsigned int flags)
 
     flags &= ~global_dirty_tracking;
     if (!flags) {
-        return;
+        return true;
     }
 
     old_flags = global_dirty_tracking;
@@ -2960,17 +2959,17 @@  void memory_global_dirty_log_start(unsigned int flags)
     trace_global_dirty_changed(global_dirty_tracking);
 
     if (!old_flags) {
-        if (!memory_global_dirty_log_do_start(&local_err)) {
+        if (!memory_global_dirty_log_do_start(errp)) {
             global_dirty_tracking &= ~flags;
             trace_global_dirty_changed(global_dirty_tracking);
-            error_report_err(local_err);
-            return;
+            return false;
         }
 
         memory_region_transaction_begin();
         memory_region_update_pending = true;
         memory_region_transaction_commit();
     }
+    return true;
 }
 
 static void memory_global_dirty_log_do_stop(unsigned int flags)