
[06/12] libxenguest: guard against overflow from too large p2m when checkpointing

Message ID 09e81b91-84de-6e49-9a62-eb3a6f392954@suse.com (mailing list archive)
State Superseded
Series x86: more or less log-dirty related improvements

Commit Message

Jan Beulich June 25, 2021, 1:20 p.m. UTC
struct xc_sr_record's length field has just 32 bits. Fill it early and
check that the calculated value hasn't overflowed. Additionally check
for counter overflow early - there's no point even trying to allocate
any memory in such an event.

While there also limit an induction variable's type to unsigned long:
There's no gain from it being uint64_t.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Of course looping over test_bit() is pretty inefficient, but given that
I have no idea how to test this code I wanted to restrict changes to
what can sensibly be seen as no worse than before from just looking at
the changes.

Comments

Andrew Cooper June 25, 2021, 7 p.m. UTC | #1
On 25/06/2021 14:20, Jan Beulich wrote:
> struct xc_sr_record's length field has just 32 bits.

The stream max record length is

/* Somewhat arbitrary - 128MB */
#define REC_LENGTH_MAX                (128U << 20)

and is checked in the low level helpers, making the upper bound on the
number of PFNs 0xFFFFFF once the record header is taken into account.

There doesn't appear to have been any consideration made to what happens
if this number gets too large.  That said, the replication will totally
fall apart if it ever gets to a fraction of this, because this is the
list of pages the source side needs to send again in addition to
whatever *it* dirtied, as it is the state we've lost on the destination
side by permitting the VM to run live.

The common case is that, when execution diverges, the dirtied pages on
source and destination will be almost the same, so merging this on the
source side shouldn't lead to many superfluous pages needing to be sent.

>  Fill it early and
> check that the calculated value hasn't overflowed. Additionally check
> for counter overflow early - there's no point even trying to allocate
> any memory in such an event.
>
> While there also limit an induction variable's type to unsigned long:
> There's no gain from it being uint64_t.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Of course looping over test_bit() is pretty inefficient, but given that
> I have no idea how to test this code I wanted to restrict changes to
> what can sensibly be seen as no worse than before from just looking at
> the changes.

At this point, I'm not sure it can be tested.  IIRC, COLO depends on
some functionality which didn't make its way upstream into Qemu.

> --- a/tools/libs/guest/xg_sr_restore.c
> +++ b/tools/libs/guest/xg_sr_restore.c
> @@ -450,7 +450,8 @@ static int send_checkpoint_dirty_pfn_lis
>      xc_interface *xch = ctx->xch;
>      int rc = -1;
>      unsigned int count, written;
> -    uint64_t i, *pfns = NULL;
> +    unsigned long i;
> +    uint64_t *pfns = NULL;
>      struct iovec *iov = NULL;
>      struct xc_sr_record rec = {
>          .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST,
> @@ -469,16 +470,28 @@ static int send_checkpoint_dirty_pfn_lis
>  
>      for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
>      {
> -        if ( test_bit(i, dirty_bitmap) )
> -            count++;
> +        if ( test_bit(i, dirty_bitmap) && !++count )

This is far too opaque logic.

It's also entirely unnecessary...  All this loop is doing is calculating
the size for the memory allocation below, and that can be done by using
the stats output from xc_logdirty_control(), which means it doesn't want
deleting in the earlier patch.

~Andrew
Jan Beulich June 28, 2021, 9:05 a.m. UTC | #2
On 25.06.2021 21:00, Andrew Cooper wrote:
> On 25/06/2021 14:20, Jan Beulich wrote:
>> struct xc_sr_record's length field has just 32 bits.
> 
> The stream max record length is
> 
> /* Somewhat arbitrary - 128MB */
> #define REC_LENGTH_MAX                (128U << 20)
> 
> and is checked in the low level helpers, making the upper bound on the
> number of PFNs 0xFFFFFF once the record header is taken into account.
> 
> There doesn't appear to have been any consideration made to what happens
> if this number gets too large.  That said, the replication will totally
> fall apart if it ever gets to a fraction of this, because this is the
> list of pages the source side needs to send again in addition to
> whatever *it* dirtied, as it is the state we've lost on the destination
> side by permitting the VM to run live.
> 
> The common case is that, when execution diverges, the dirtied pages on
> source and destination will be almost the same, so merging this on the
> source side shouldn't lead to many superfluous pages needing to be sent.

While I follow what you say, I can't conclude whether you mean for me
to change this first sentence of the description (or even the patch
itself).

>> --- a/tools/libs/guest/xg_sr_restore.c
>> +++ b/tools/libs/guest/xg_sr_restore.c
>> @@ -450,7 +450,8 @@ static int send_checkpoint_dirty_pfn_lis
>>      xc_interface *xch = ctx->xch;
>>      int rc = -1;
>>      unsigned int count, written;
>> -    uint64_t i, *pfns = NULL;
>> +    unsigned long i;
>> +    uint64_t *pfns = NULL;
>>      struct iovec *iov = NULL;
>>      struct xc_sr_record rec = {
>>          .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST,
>> @@ -469,16 +470,28 @@ static int send_checkpoint_dirty_pfn_lis
>>  
>>      for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
>>      {
>> -        if ( test_bit(i, dirty_bitmap) )
>> -            count++;
>> +        if ( test_bit(i, dirty_bitmap) && !++count )
> 
> This is far too opaque logic.

This is an observation without a hint towards a possible improvement.
I can attach a comment, but it's far from clear whether that's all
you're after.

> It's also entirely unnecessary...  All this loop is doing is calculating
> the size for the memory allocation below, and that can be done by using
> the stats output from xc_logdirty_control(), which means it doesn't want
> deleting in the earlier patch.

Only if there is a guarantee that the stats.dirty_count value hasn't
itself been truncated (or, as per the later patch, saturated) in the
hypervisor. Otherwise there may be more set bits in the bitmap, and
counting locally is still necessary. I generally think that anything
called "stats" may be used as guiding information, but not as precise
data. Yet the latter would be needed if you want to make memory
allocation depend on it.

Jan

Patch

--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -450,7 +450,8 @@  static int send_checkpoint_dirty_pfn_lis
     xc_interface *xch = ctx->xch;
     int rc = -1;
     unsigned int count, written;
-    uint64_t i, *pfns = NULL;
+    unsigned long i;
+    uint64_t *pfns = NULL;
     struct iovec *iov = NULL;
     struct xc_sr_record rec = {
         .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST,
@@ -469,16 +470,28 @@  static int send_checkpoint_dirty_pfn_lis
 
     for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
     {
-        if ( test_bit(i, dirty_bitmap) )
-            count++;
+        if ( test_bit(i, dirty_bitmap) && !++count )
+            break;
     }
 
+    if ( i < ctx->restore.p2m_size )
+    {
+        ERROR("Too many dirty pfns");
+        goto err;
+    }
+
+    rec.length = count * sizeof(*pfns);
+    if ( rec.length / sizeof(*pfns) != count )
+    {
+        ERROR("Too many (%u) dirty pfns", count);
+        goto err;
+    }
 
-    pfns = malloc(count * sizeof(*pfns));
+    pfns = malloc(rec.length);
     if ( !pfns )
     {
-        ERROR("Unable to allocate %zu bytes of memory for dirty pfn list",
-              count * sizeof(*pfns));
+        ERROR("Unable to allocate %u bytes of memory for dirty pfn list",
+              rec.length);
         goto err;
     }
 
@@ -504,8 +517,6 @@  static int send_checkpoint_dirty_pfn_lis
         goto err;
     }
 
-    rec.length = count * sizeof(*pfns);
-
     iov[0].iov_base = &rec.type;
     iov[0].iov_len = sizeof(rec.type);
 
@@ -513,7 +524,7 @@  static int send_checkpoint_dirty_pfn_lis
     iov[1].iov_len = sizeof(rec.length);
 
     iov[2].iov_base = pfns;
-    iov[2].iov_len = count * sizeof(*pfns);
+    iov[2].iov_len = rec.length;
 
     if ( writev_exact(ctx->restore.send_back_fd, iov, 3) )
     {