diff mbox series

[02/12] libxenguest: deal with log-dirty op stats overflow

Message ID 46831816-a5f2-2eb4-bb91-aba345448feb@suse.com (mailing list archive)
State Superseded
Headers show
Series x86: more or less log-dirty related improvements | expand

Commit Message

Jan Beulich June 25, 2021, 1:18 p.m. UTC
In send_memory_live() the precise value the dirty_count struct field
gets initialized to doesn't matter much (apart from the triggering of
the log message in send_dirty_pages(), see below), but it is important
that it not be zero on the first iteration (or else send_dirty_pages()
won't get called at all). Saturate the initializer value at the maximum
value the field can hold.

While there also initialize struct precopy_stats' respective field to a
more sane value: We don't really know how many dirty pages there are at
that point.

In suspend_and_send_dirty() and verify_frames() the local variables
don't need initializing at all, as they're only an output from the
hypercall which gets invoked first thing.

In send_checkpoint_dirty_pfn_list() the local variable can be dropped
altogether: It's optional to xc_logdirty_control() and not used anywhere
else.

Note that in case the clipping actually takes effect, the "Bitmap
contained more entries than expected..." log message will trigger. This
being just an informational message, I don't think this is overly
concerning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

Comments

Andrew Cooper June 25, 2021, 4:36 p.m. UTC | #1
On 25/06/2021 14:18, Jan Beulich wrote:
> In send_memory_live() the precise value the dirty_count struct field
> gets initialized to doesn't matter much (apart from the triggering of
> the log message in send_dirty_pages(), see below), but it is important
> that it not be zero on the first iteration (or else send_dirty_pages()
> won't get called at all). Saturate the initializer value at the maximum
> value the field can hold.

I don't follow.  Migration would be extremely broken if the first
iteration didn't work correctly, so something else is going on here.

>
> While there also initialize struct precopy_stats' respective field to a
> more sane value: We don't really know how many dirty pages there are at
> that point.
>
> In suspend_and_send_dirty() and verify_frames() the local variables
> don't need initializing at all, as they're only an output from the
> hypercall which gets invoked first thing.
>
> In send_checkpoint_dirty_pfn_list() the local variable can be dropped
> altogether: It's optional to xc_logdirty_control() and not used anywhere
> else.
>
> Note that in case the clipping actually takes effect, the "Bitmap
> contained more entries than expected..." log message will trigger. This
> being just an informational message, I don't think this is overly
> concerning.

That message is currently a error, confirming that the VM will crash on
the resuming side.

This is a consequence of it attempting to balloon during the live phase
of migration, and discussed in docs/features/migration.pandoc (well - at
least mentioned on the "noone has fixed this yet" list).

> --- a/tools/libs/guest/xg_sr_save.c
> +++ b/tools/libs/guest/xg_sr_save.c
> @@ -500,7 +500,9 @@ static int simple_precopy_policy(struct
>  static int send_memory_live(struct xc_sr_context *ctx)
>  {
>      xc_interface *xch = ctx->xch;
> -    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
> +    xc_shadow_op_stats_t stats = {
> +        .dirty_count = MIN(ctx->save.p2m_size, (typeof(stats.dirty_count))~0)
> +    };
>      char *progress_str = NULL;
>      unsigned int x = 0;
>      int rc;
> @@ -519,7 +521,7 @@ static int send_memory_live(struct xc_sr
>          goto out;
>  
>      ctx->save.stats = (struct precopy_stats){
> -        .dirty_count = ctx->save.p2m_size,
> +        .dirty_count = -1,

This is an external interface, and I'm not sure it will tolerate finding
more than p2m_size allegedly dirty.

~Andrew
Jan Beulich June 28, 2021, 7:48 a.m. UTC | #2
On 25.06.2021 18:36, Andrew Cooper wrote:
> On 25/06/2021 14:18, Jan Beulich wrote:
>> In send_memory_live() the precise value the dirty_count struct field
>> gets initialized to doesn't matter much (apart from the triggering of
>> the log message in send_dirty_pages(), see below), but it is important
>> that it not be zero on the first iteration (or else send_dirty_pages()
>> won't get called at all). Saturate the initializer value at the maximum
>> value the field can hold.
> 
> I don't follow.  Migration would be extremely broken if the first
> iteration didn't work correctly, so something else is going on here.

As per the title we're talking about overflow situation here. In particular
the field did end up zero when ctx->save.p2m_size was 0x100000000. I'm not
claiming in any way that the first iteration would generally not work.

>> While there also initialize struct precopy_stats' respective field to a
>> more sane value: We don't really know how many dirty pages there are at
>> that point.
>>
>> In suspend_and_send_dirty() and verify_frames() the local variables
>> don't need initializing at all, as they're only an output from the
>> hypercall which gets invoked first thing.
>>
>> In send_checkpoint_dirty_pfn_list() the local variable can be dropped
>> altogether: It's optional to xc_logdirty_control() and not used anywhere
>> else.
>>
>> Note that in case the clipping actually takes effect, the "Bitmap
>> contained more entries than expected..." log message will trigger. This
>> being just an informational message, I don't think this is overly
>> concerning.
> 
> That message is currently a error, confirming that the VM will crash on
> the resuming side.

An error? All I see is

    if ( written > entries )
        DPRINTF("Bitmap contained more entries than expected...");

> This is a consequence of it attempting to balloon during the live phase
> of migration, and discussed in docs/features/migration.pandoc (well - at
> least mentioned on the "noone has fixed this yet" list).
> 
>> --- a/tools/libs/guest/xg_sr_save.c
>> +++ b/tools/libs/guest/xg_sr_save.c
>> @@ -500,7 +500,9 @@ static int simple_precopy_policy(struct
>>  static int send_memory_live(struct xc_sr_context *ctx)
>>  {
>>      xc_interface *xch = ctx->xch;
>> -    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
>> +    xc_shadow_op_stats_t stats = {
>> +        .dirty_count = MIN(ctx->save.p2m_size, (typeof(stats.dirty_count))~0)
>> +    };
>>      char *progress_str = NULL;
>>      unsigned int x = 0;
>>      int rc;
>> @@ -519,7 +521,7 @@ static int send_memory_live(struct xc_sr
>>          goto out;
>>  
>>      ctx->save.stats = (struct precopy_stats){
>> -        .dirty_count = ctx->save.p2m_size,
>> +        .dirty_count = -1,
> 
> This is an external interface, and I'm not sure it will tolerate finding
> more than p2m_size allegedly dirty.

But you do realize that a few lines down from here there already was

        policy_stats->dirty_count   = -1;

? Or are you trying to tell me that -1 (documented as indicating
"unknown") is okay on subsequent iterations, but not on the first one?
That's not being said anywhere ...

Jan
Olaf Hering June 28, 2021, 11:10 a.m. UTC | #3
Am Mon, 28 Jun 2021 09:48:26 +0200
schrieb Jan Beulich <jbeulich@suse.com>:

> On 25.06.2021 18:36, Andrew Cooper wrote:
> > This is an external interface, and I'm not sure it will tolerate finding
> > more than p2m_size allegedly dirty.  
> But you do realize that a few lines down from here there already was
>         policy_stats->dirty_count   = -1;
> ? Or are you trying to tell me that -1 (documented as indicating
> "unknown") is okay on subsequent iterations, but not on the first one?

precopy_policy() gets called twice during each iteration.
Last time I tried to use this API it was difficult to work with.
It is required to look at dirty_count and iteration to see the actual state.
Maybe it was just me who initially failed to fully understand the intent.

I think as it is right now, the first run with iteration being zero is the only way to know the actual p2m_size, in case the consumer really wants to know this detail.

Olaf
Jan Beulich June 28, 2021, 11:20 a.m. UTC | #4
On 28.06.2021 13:10, Olaf Hering wrote:
> Am Mon, 28 Jun 2021 09:48:26 +0200
> schrieb Jan Beulich <jbeulich@suse.com>:
> 
>> On 25.06.2021 18:36, Andrew Cooper wrote:
>>> This is an external interface, and I'm not sure it will tolerate finding
>>> more than p2m_size allegedly dirty.  
>> But you do realize that a few lines down from here there already was
>>         policy_stats->dirty_count   = -1;
>> ? Or are you trying to tell me that -1 (documented as indicating
>> "unknown") is okay on subsequent iterations, but not on the first one?
> 
> precopy_policy() gets called twice during each iteration.
> Last time I tried to use this API it was difficult to work with.
> It is required to look at dirty_count and iteration to see the actual state.
> Maybe it was just me who initially failed to fully understand the intent.
> 
> I think as it is right now, the first run with iteration being zero is
> the only way to know the actual p2m_size, in case the consumer really
> wants to know this detail.

But if a field named dirty_count was intended to convey the P2M size
(and then only on the first iteration), this very certainly would
have needed writing down somewhere.

Jan
Olaf Hering June 28, 2021, 11:30 a.m. UTC | #5
Am Mon, 28 Jun 2021 13:20:06 +0200
schrieb Jan Beulich <jbeulich@suse.com>:

> But if a field named dirty_count was intended to convey the P2M size
> (and then only on the first iteration), this very certainly would
> have needed writing down somewhere.

Sure. Right now the only documentation for the precopy_policy callback is send_memory_live itself.


Olaf
diff mbox series

Patch

--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -452,7 +452,6 @@  static int send_checkpoint_dirty_pfn_lis
     unsigned int count, written;
     uint64_t i, *pfns = NULL;
     struct iovec *iov = NULL;
-    xc_shadow_op_stats_t stats = { 0, ctx->restore.p2m_size };
     struct xc_sr_record rec = {
         .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST,
     };
@@ -462,7 +461,7 @@  static int send_checkpoint_dirty_pfn_lis
     if ( xc_logdirty_control(
              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
              HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
-             0, &stats) != ctx->restore.p2m_size )
+             0, NULL) != ctx->restore.p2m_size )
     {
         PERROR("Failed to retrieve logdirty bitmap");
         goto err;
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -500,7 +500,9 @@  static int simple_precopy_policy(struct
 static int send_memory_live(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
+    xc_shadow_op_stats_t stats = {
+        .dirty_count = MIN(ctx->save.p2m_size, (typeof(stats.dirty_count))~0)
+    };
     char *progress_str = NULL;
     unsigned int x = 0;
     int rc;
@@ -519,7 +521,7 @@  static int send_memory_live(struct xc_sr
         goto out;
 
     ctx->save.stats = (struct precopy_stats){
-        .dirty_count = ctx->save.p2m_size,
+        .dirty_count = -1,
     };
     policy_stats = &ctx->save.stats;
 
@@ -643,7 +645,7 @@  static int colo_merge_secondary_dirty_bi
 static int suspend_and_send_dirty(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
+    xc_shadow_op_stats_t stats;
     char *progress_str = NULL;
     int rc;
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
@@ -701,7 +703,7 @@  static int suspend_and_send_dirty(struct
 static int verify_frames(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
+    xc_shadow_op_stats_t stats;
     int rc;
     struct xc_sr_record rec = { .type = REC_TYPE_VERIFY };