diff mbox series

[2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list()

Message ID 20210706112332.31753-3-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Series tools/migration: Fixes in send_checkpoint_dirty_pfn_list()

Commit Message

Andrew Cooper July 6, 2021, 11:23 a.m. UTC
'count * sizeof(*pfns)' can in principle overflow, but is implausible in
practice as the time between checkpoints is typically sub-second.
Nevertheless, simplify the code and remove the risk.

There is no need to loop over the bitmap to calculate count.  The number of
set bits is returned in xc_shadow_op_stats_t which is already collected (and
ignored).

Bounds check the count against what will fit in REC_LENGTH_MAX.  At the time
of writing, this allows up to 0xffffff pfns.  Rearrange the pfns loop to check
for errors both ways, not simply that there were more pfns than expected.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Jackson <iwj@xenproject.org>
CC: Wei Liu <wl@xen.org>
CC: Juergen Gross <jgross@suse.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Olaf Hering <olaf@aepfle.de>
---
 tools/libs/guest/xg_sr_restore.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

Comments

Jan Beulich July 6, 2021, 12:03 p.m. UTC | #1
On 06.07.2021 13:23, Andrew Cooper wrote:
> 'count * sizeof(*pfns)' can in principle overflow, but is implausible in
> practice as the time between checkpoints is typically sub-second.
> Nevertheless, simplify the code and remove the risk.
> 
> There is no need to loop over the bitmap to calculate count.  The number of
> set bits is returned in xc_shadow_op_stats_t which is already collected (and
> ignored).
> 
> Bounds check the count against what will fit in REC_LENGTH_MAX.  At the time
> of writing, this allows up to 0xffffff pfns.

Well, okay, this then means that an overflow in the reporting of
dirty_count doesn't matter for now, because the limit is lower
anyway.

>  Rearrange the pfns loop to check
> for errors both ways, not simply that there were more pfns than expected.

Hmm, "both ways" to me would mean ...

> @@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
>          goto err;
>      }
>  
> -    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
> +    for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count )
>      {
>          if ( !test_bit(i, dirty_bitmap) )
>              continue;
>  
> -        if ( written > count )
> -        {
> -            ERROR("Dirty pfn list exceed");
> -            goto err;
> -        }
> -
>          pfns[written++] = i;
>      }
>  
> -    rec.length = count * sizeof(*pfns);
> -
> -    iov[1].iov_base = pfns;
> -    iov[1].iov_len = rec.length;
> +    if ( written != stats.dirty_count )
> +    {
> +        ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)",
> +              written, stats.dirty_count);
> +        goto err;
> +    }

... you then also check that there are no further bits set in the
bitmap. As said elsewhere, I'm not convinced using statistics as
a basis for actual operation (rather than just reporting) is
appropriate. I'm unaware of there being any spelled out guarantee
that the numbers reported back from the hypercall are accurate.

Jan
Olaf Hering July 6, 2021, 12:58 p.m. UTC | #2
Am Tue, 6 Jul 2021 12:23:32 +0100
schrieb Andrew Cooper <andrew.cooper3@citrix.com>:

> +    count = stats.dirty_count;

Is this accurate?

I remember the reporting has been broken for a while, and testing a busy domU indicates it is still the case.

# xen-logdirty `xl domid domU`
0: faults= 0 dirty= 258050
1: faults= 0 dirty= 257817
2: faults= 0 dirty= 253713
3: faults= 0 dirty= 253197
4: faults= 0 dirty= 255154
5: faults= 0 dirty= 260876
6: faults= 0 dirty= 259083
7: faults= 0 dirty= 253163
8: faults= 0 dirty= 258349
9: faults= 0 dirty= 260330
10: faults= 0 dirty= 259754
11: faults= 0 dirty= 257273
12: faults= 0 dirty= 255756
13: faults= 0 dirty= 258209
14: faults= 0 dirty= 257847
15: faults= 0 dirty= 59639
16: faults= 0 dirty= 81
17: faults= 0 dirty= 86
18: faults= 0 dirty= 111
19: faults= 0 dirty= 81
20: faults= 0 dirty= 80
....


Olaf
Andrew Cooper July 6, 2021, 1:19 p.m. UTC | #3
On 06/07/2021 13:58, Olaf Hering wrote:
> Am Tue, 6 Jul 2021 12:23:32 +0100
> schrieb Andrew Cooper <andrew.cooper3@citrix.com>:
>
>> +    count = stats.dirty_count;
> Is this accurate?

The live loop relies on it, and it worked correctly the last time I
tested it.

> I remember the reporting is broken since a while, and testing a busy domU indicates it is still the case.

If it's broken, it needs fixing.

> # xen-logdirty `xl domid domU`
> 0: faults= 0 dirty= 258050
> 1: faults= 0 dirty= 257817
> 2: faults= 0 dirty= 253713
> 3: faults= 0 dirty= 253197
> 4: faults= 0 dirty= 255154
> 5: faults= 0 dirty= 260876
> 6: faults= 0 dirty= 259083
> 7: faults= 0 dirty= 253163
> 8: faults= 0 dirty= 258349
> 9: faults= 0 dirty= 260330
> 10: faults= 0 dirty= 259754
> 11: faults= 0 dirty= 257273
> 12: faults= 0 dirty= 255756
> 13: faults= 0 dirty= 258209
> 14: faults= 0 dirty= 257847
> 15: faults= 0 dirty= 59639
> 16: faults= 0 dirty= 81
> 17: faults= 0 dirty= 86
> 18: faults= 0 dirty= 111
> 19: faults= 0 dirty= 81
> 20: faults= 0 dirty= 80

What is this showing, other than (unsurprisingly) faults doesn't work
for an HVM guest?

~Andrew
Andrew Cooper July 6, 2021, 1:22 p.m. UTC | #4
On 06/07/2021 14:19, Andrew Cooper wrote:
> On 06/07/2021 13:58, Olaf Hering wrote:
>> Am Tue, 6 Jul 2021 12:23:32 +0100
>> schrieb Andrew Cooper <andrew.cooper3@citrix.com>:
>>
>>> +    count = stats.dirty_count;
>> Is this accurate?
> The live loop relies on it, and it worked correctly the last time I
> tested it.
>
>> I remember the reporting is broken since a while, and testing a busy domU indicates it is still the case.
> If its broken, it needs fixing.
>
>> # xen-logdirty `xl domid domU`
>> 0: faults= 0 dirty= 258050
>> 1: faults= 0 dirty= 257817
>> 2: faults= 0 dirty= 253713
>> 3: faults= 0 dirty= 253197
>> 4: faults= 0 dirty= 255154
>> 5: faults= 0 dirty= 260876
>> 6: faults= 0 dirty= 259083
>> 7: faults= 0 dirty= 253163
>> 8: faults= 0 dirty= 258349
>> 9: faults= 0 dirty= 260330
>> 10: faults= 0 dirty= 259754
>> 11: faults= 0 dirty= 257273
>> 12: faults= 0 dirty= 255756
>> 13: faults= 0 dirty= 258209
>> 14: faults= 0 dirty= 257847
>> 15: faults= 0 dirty= 59639
>> 16: faults= 0 dirty= 81
>> 17: faults= 0 dirty= 86
>> 18: faults= 0 dirty= 111
>> 19: faults= 0 dirty= 81
>> 20: faults= 0 dirty= 80
> What is this showing, other than (unsurprisingly) faults doesn't work
> for an HVM guest?

Sorry - I meant HAP guest.

What hardware is this on?  i.e. is the Page Modification Logging feature
in use?

~Andrew
Olaf Hering July 6, 2021, 1:28 p.m. UTC | #5
Am Tue, 6 Jul 2021 14:19:21 +0100
schrieb Andrew Cooper <andrew.cooper3@citrix.com>:

> > 20: faults= 0 dirty= 80  
> 
> What is this showing, other than (unsurprisingly) faults doesn't work
> for an HVM guest?

The dirty count goes down after a while for a domU that constantly touches as many pages as it can.
But yes, the current code uses stats.dirty_count in a number of places.
It seems all of the usage is just for reporting, so nothing bad happens if these code paths get wrong input.
precopy_policy may indicate more or fewer iterations; this is not critical either.

Olaf
Andrew Cooper July 6, 2021, 1:34 p.m. UTC | #6
On 06/07/2021 13:03, Jan Beulich wrote:
> On 06.07.2021 13:23, Andrew Cooper wrote:
>> 'count * sizeof(*pfns)' can in principle overflow, but is implausible in
>> practice as the time between checkpoints is typically sub-second.
>> Nevertheless, simplify the code and remove the risk.
>>
>> There is no need to loop over the bitmap to calculate count.  The number of
>> set bits is returned in xc_shadow_op_stats_t which is already collected (and
>> ignored).
>>
>> Bounds check the count against what will fit in REC_LENGTH_MAX.  At the time
>> of writing, this allows up to 0xffffff pfns.
> Well, okay, this then means that an overflow in the reporting of
> dirty_count doesn't matter for now, because the limit is lower
> anyway.
>
>>  Rearrange the pfns loop to check
>> for errors both ways, not simply that there were more pfns than expected.
> Hmm, "both ways" to me would mean ...
>
>> @@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
>>          goto err;
>>      }
>>  
>> -    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
>> +    for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count )
>>      {
>>          if ( !test_bit(i, dirty_bitmap) )
>>              continue;
>>  
>> -        if ( written > count )
>> -        {
>> -            ERROR("Dirty pfn list exceed");
>> -            goto err;
>> -        }
>> -
>>          pfns[written++] = i;
>>      }
>>  
>> -    rec.length = count * sizeof(*pfns);
>> -
>> -    iov[1].iov_base = pfns;
>> -    iov[1].iov_len = rec.length;
>> +    if ( written != stats.dirty_count )
>> +    {
>> +        ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)",
>> +              written, stats.dirty_count);
>> +        goto err;
>> +    }
> ... you then also check that there are no further bit set in the
> bitmap. As said elsewhere, I'm not convinced using statistics as
> a basis for actual operation (rather than just reporting) is
> appropriate.

I'm not interested in inference based on the name of the structure.

>  I'm unaware of there being any spelled out guarantee
> that the numbers reported back from the hypercall are accurate.

The live loop uses this information already for this purpose.  If it is
wrong, we've got bigger problems than this.

~Andrew
Olaf Hering July 6, 2021, 1:39 p.m. UTC | #7
Am Tue, 6 Jul 2021 14:22:58 +0100
schrieb Andrew Cooper <andrew.cooper3@citrix.com>:

> What hardware is this on?  i.e. is the Page Modification Logging feature
> in use?

At least it gets reported as a VMX feature during boot; this is a CoyotePass system.

Olaf
Andrew Cooper July 6, 2021, 1:43 p.m. UTC | #8
On 06/07/2021 14:39, Olaf Hering wrote:
> Am Tue, 6 Jul 2021 14:22:58 +0100
> schrieb Andrew Cooper <andrew.cooper3@citrix.com>:
>
>> What hardware is this on?  i.e. is the Page Modification Logging feature
>> in use?
> At least it gets reported as VMX feature during boot, this is a CoyotePass system.

That log message is misleading (it only means "this hardware has the
feature"), but yes - PML will be used by default when available.

~Andrew
Jan Beulich July 6, 2021, 1:56 p.m. UTC | #9
On 06.07.2021 15:19, Andrew Cooper wrote:
> On 06/07/2021 13:58, Olaf Hering wrote:
>> Am Tue, 6 Jul 2021 12:23:32 +0100
>> schrieb Andrew Cooper <andrew.cooper3@citrix.com>:
>>
>>> +    count = stats.dirty_count;
>> Is this accurate?
> 
> The live loop relies on it, and it worked correctly the last time I
> tested it.

When still merely investigating in preparation for my recent series,
i.e. without having made changes yet except to add some logging, I
did observe "Bitmap contained more entries than expected..." a
couple of times, with "written" and "entries" typically apart by
just 1 (as determined by extra logging; to be honest I don't recall
if they were ever farther apart). So the number is _not_ accurate
in any event, and cannot be used for anything other than reporting
purposes (as also expressed elsewhere on this thread).

This also underlines that, contrary to what you said in a reply to one of
my patches, this is only a "detail" message, not an error, because
migration happily went on and succeeded.

Jan
Jan Beulich July 6, 2021, 2 p.m. UTC | #10
On 06.07.2021 15:34, Andrew Cooper wrote:
> On 06/07/2021 13:03, Jan Beulich wrote:
>> On 06.07.2021 13:23, Andrew Cooper wrote:
>>> 'count * sizeof(*pfns)' can in principle overflow, but is implausible in
>>> practice as the time between checkpoints is typically sub-second.
>>> Nevertheless, simplify the code and remove the risk.
>>>
>>> There is no need to loop over the bitmap to calculate count.  The number of
>>> set bits is returned in xc_shadow_op_stats_t which is already collected (and
>>> ignored).
>>>
>>> Bounds check the count against what will fit in REC_LENGTH_MAX.  At the time
>>> of writing, this allows up to 0xffffff pfns.
>> Well, okay, this then means that an overflow in the reporting of
>> dirty_count doesn't matter for now, because the limit is lower
>> anyway.
>>
>>>  Rearrange the pfns loop to check
>>> for errors both ways, not simply that there were more pfns than expected.
>> Hmm, "both ways" to me would mean ...
>>
>>> @@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
>>>          goto err;
>>>      }
>>>  
>>> -    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
>>> +    for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count )
>>>      {
>>>          if ( !test_bit(i, dirty_bitmap) )
>>>              continue;
>>>  
>>> -        if ( written > count )
>>> -        {
>>> -            ERROR("Dirty pfn list exceed");
>>> -            goto err;
>>> -        }
>>> -
>>>          pfns[written++] = i;
>>>      }
>>>  
>>> -    rec.length = count * sizeof(*pfns);
>>> -
>>> -    iov[1].iov_base = pfns;
>>> -    iov[1].iov_len = rec.length;
>>> +    if ( written != stats.dirty_count )
>>> +    {
>>> +        ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)",
>>> +              written, stats.dirty_count);
>>> +        goto err;
>>> +    }
>> ... you then also check that there are no further bit set in the
>> bitmap. As said elsewhere, I'm not convinced using statistics as
>> a basis for actual operation (rather than just reporting) is
>> appropriate.
> 
> I'm not interested in inference based on the name of the structure.

I'm afraid that's the problem: Because you started using it for
something it wasn't meant to be used for, you now think it's the
name that's misleading, and the use is okay. I remain of the
opinion that it's the other way around (but see below for there not
being a real dependency, at least in this particular case).

>>  I'm unaware of there being any spelled out guarantee
>> that the numbers reported back from the hypercall are accurate.
> 
> The live loop uses this information already for this purpose.  If it is
> wrong, we've got bigger problems that this.

send_memory_live() passes the value to send_dirty_pages(), which
in turn passes it only to xc_report_progress_step() and uses it
in

    if ( written > entries )
        DPRINTF("Bitmap contained more entries than expected...");

There's no relying on this number at all afaics.

Jan
Olaf Hering July 6, 2021, 2:11 p.m. UTC | #11
Am Tue, 6 Jul 2021 14:58:04 +0200
schrieb Olaf Hering <olaf@aepfle.de>:

> the reporting is broken since a while

A quick check on a Dell T320 with E5-2430L, which does not have "Page Modification Logging", indicates that staging-4.5 appears to work, but reporting in staging-4.6 is broken.


Olaf
Jan Beulich July 6, 2021, 3:13 p.m. UTC | #12
On 06.07.2021 16:11, Olaf Hering wrote:
> Am Tue, 6 Jul 2021 14:58:04 +0200
> schrieb Olaf Hering <olaf@aepfle.de>:
> 
>> the reporting is broken since a while
> 
> A quick check on a Dell T320 with E5-2430L, which does not have "Page Modification Logging", indicates that staging-4.5 appears to work, but reporting in staging-4.6 is broken.

Not surprising at all, considering PML support was added in 4.6.

Jan
Jan Beulich July 6, 2021, 3:22 p.m. UTC | #13
On 06.07.2021 17:13, Jan Beulich wrote:
> On 06.07.2021 16:11, Olaf Hering wrote:
>> Am Tue, 6 Jul 2021 14:58:04 +0200
>> schrieb Olaf Hering <olaf@aepfle.de>:
>>
>>> the reporting is broken since a while
>>
>> A quick check on a Dell T320 with E5-2430L, which does not have "Page Modification Logging", indicates that staging-4.5 appears to work, but reporting in staging-4.6 is broken.
> 
> Not surprising at all, considering PML support was added in 4.6.

Or perhaps still surprising, as the functions involved there don't
appear to be bypassing any core logic, so after some looking around
I can't say I see anything that's obviously wrong.

Oh, and I notice only now the "not" in your original reply, which
renders my earlier statement completely pointless anyway. I'm sorry.

Jan
Andrew Cooper July 6, 2021, 4:08 p.m. UTC | #14
On 06/07/2021 16:22, Jan Beulich wrote:
> On 06.07.2021 17:13, Jan Beulich wrote:
>> On 06.07.2021 16:11, Olaf Hering wrote:
>>> Am Tue, 6 Jul 2021 14:58:04 +0200
>>> schrieb Olaf Hering <olaf@aepfle.de>:
>>>
>>>> the reporting is broken since a while
>>> A quick check on a Dell T320 with E5-2430L, which does not have "Page Modification Logging", indicates that staging-4.5 appears to work, but reporting in staging-4.6 is broken.
>> Not surprising at all, considering PML support was added in 4.6.
> Or perhaps still surprising, as the functions involved there don't
> appear to be bypassing any core logic, so after some looking around
> I can't say I see anything that's obviously wrong.
>
> Oh, and I notice only now the "not" in your original reply, which
> renders my earlier statement completely pointless anyway. I'm sorry.

Ok, so my observations of dirty_count being reliable during development
of migration v2 appear to be correct, and something regressed in the
dev window where it got committed upstream...

That is suspicious.

~Andrew

Patch

diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 07c9e291610b..bda04ee42e3f 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -425,7 +425,7 @@  static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
     int rc = -1;
     unsigned int count, written;
     uint64_t i, *pfns = NULL;
-    xc_shadow_op_stats_t stats = { 0, ctx->restore.p2m_size };
+    xc_shadow_op_stats_t stats;
     struct xc_sr_record rec = {
         .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST,
     };
@@ -444,14 +444,17 @@  static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
         goto err;
     }
 
-    for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
+    count = stats.dirty_count;
+
+    if ( ((REC_LENGTH_MAX - sizeof(rec)) / sizeof(*pfns)) < count )
     {
-        if ( test_bit(i, dirty_bitmap) )
-            count++;
+        ERROR("Too many PFNs (%u) to fit in record (limit %zu)", count,
+              ((REC_LENGTH_MAX - sizeof(rec)) / sizeof(*pfns)));
+        goto err;
     }
 
-
-    pfns = malloc(count * sizeof(*pfns));
+    iov[1].iov_len  = rec.length = count * sizeof(*pfns);
+    iov[1].iov_base = pfns       = malloc(rec.length);
     if ( !pfns )
     {
         ERROR("Unable to allocate %zu bytes of memory for dirty pfn list",
@@ -459,24 +462,20 @@  static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
         goto err;
     }
 
-    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
+    for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count )
     {
         if ( !test_bit(i, dirty_bitmap) )
             continue;
 
-        if ( written > count )
-        {
-            ERROR("Dirty pfn list exceed");
-            goto err;
-        }
-
         pfns[written++] = i;
     }
 
-    rec.length = count * sizeof(*pfns);
-
-    iov[1].iov_base = pfns;
-    iov[1].iov_len = rec.length;
+    if ( written != stats.dirty_count )
+    {
+        ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)",
+              written, stats.dirty_count);
+        goto err;
+    }
 
     if ( writev_exact(ctx->restore.send_back_fd, iov, ARRAY_SIZE(iov)) )
     {