Message ID: 20210706112332.31753-3-andrew.cooper3@citrix.com (mailing list archive)
State:      New, archived
Series:     tools/migration: Fixes in send_checkpoint_dirty_pfn_list()
On 06.07.2021 13:23, Andrew Cooper wrote:
> 'count * sizeof(*pfns)' can in principle overflow, but is implausible in
> practice as the time between checkpoints is typically sub-second.
> Nevertheless, simplify the code and remove the risk.
>
> There is no need to loop over the bitmap to calculate count.  The number of
> set bits is returned in xc_shadow_op_stats_t which is already collected (and
> ignored).
>
> Bounds check the count against what will fit in REC_LENGTH_MAX.  At the time
> of writing, this allows up to 0xffffff pfns.

Well, okay, this then means that an overflow in the reporting of
dirty_count doesn't matter for now, because the limit is lower anyway.

> Rearrange the pfns loop to check
> for errors both ways, not simply that there were more pfns than expected.

Hmm, "both ways" to me would mean ...

> @@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
>          goto err;
>      }
>
> -    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
> +    for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count )
>      {
>          if ( !test_bit(i, dirty_bitmap) )
>              continue;
>
> -        if ( written > count )
> -        {
> -            ERROR("Dirty pfn list exceed");
> -            goto err;
> -        }
> -
>          pfns[written++] = i;
>      }
>
> -    rec.length = count * sizeof(*pfns);
> -
> -    iov[1].iov_base = pfns;
> -    iov[1].iov_len = rec.length;
> +    if ( written != stats.dirty_count )
> +    {
> +        ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)",
> +              written, stats.dirty_count);
> +        goto err;
> +    }

... you then also check that there are no further bits set in the
bitmap.  As said elsewhere, I'm not convinced using statistics as a
basis for actual operation (rather than just reporting) is appropriate.
I'm unaware of there being any spelled out guarantee that the numbers
reported back from the hypercall are accurate.

Jan
On Tue, 6 Jul 2021 12:23:32 +0100, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> + count = stats.dirty_count;
Is this accurate?
I remember the reporting has been broken for a while, and testing a busy domU indicates it is still the case.
# xen-logdirty `xl domid domU`
0: faults= 0 dirty= 258050
1: faults= 0 dirty= 257817
2: faults= 0 dirty= 253713
3: faults= 0 dirty= 253197
4: faults= 0 dirty= 255154
5: faults= 0 dirty= 260876
6: faults= 0 dirty= 259083
7: faults= 0 dirty= 253163
8: faults= 0 dirty= 258349
9: faults= 0 dirty= 260330
10: faults= 0 dirty= 259754
11: faults= 0 dirty= 257273
12: faults= 0 dirty= 255756
13: faults= 0 dirty= 258209
14: faults= 0 dirty= 257847
15: faults= 0 dirty= 59639
16: faults= 0 dirty= 81
17: faults= 0 dirty= 86
18: faults= 0 dirty= 111
19: faults= 0 dirty= 81
20: faults= 0 dirty= 80
....
Olaf
On 06/07/2021 13:58, Olaf Hering wrote:
> On Tue, 6 Jul 2021 12:23:32 +0100, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>
>> +    count = stats.dirty_count;
> Is this accurate?

The live loop relies on it, and it worked correctly the last time I
tested it.

> I remember the reporting has been broken for a while, and testing a
> busy domU indicates it is still the case.

If it's broken, it needs fixing.

> # xen-logdirty `xl domid domU`
> 0: faults= 0 dirty= 258050
> 1: faults= 0 dirty= 257817
> 2: faults= 0 dirty= 253713
> 3: faults= 0 dirty= 253197
> 4: faults= 0 dirty= 255154
> 5: faults= 0 dirty= 260876
> 6: faults= 0 dirty= 259083
> 7: faults= 0 dirty= 253163
> 8: faults= 0 dirty= 258349
> 9: faults= 0 dirty= 260330
> 10: faults= 0 dirty= 259754
> 11: faults= 0 dirty= 257273
> 12: faults= 0 dirty= 255756
> 13: faults= 0 dirty= 258209
> 14: faults= 0 dirty= 257847
> 15: faults= 0 dirty= 59639
> 16: faults= 0 dirty= 81
> 17: faults= 0 dirty= 86
> 18: faults= 0 dirty= 111
> 19: faults= 0 dirty= 81
> 20: faults= 0 dirty= 80

What is this showing, other than that (unsurprisingly) faults doesn't
work for an HVM guest?

~Andrew
On 06/07/2021 14:19, Andrew Cooper wrote:
> On 06/07/2021 13:58, Olaf Hering wrote:
>> On Tue, 6 Jul 2021 12:23:32 +0100, Andrew Cooper
>> <andrew.cooper3@citrix.com> wrote:
>>
>>> +    count = stats.dirty_count;
>> Is this accurate?
> The live loop relies on it, and it worked correctly the last time I
> tested it.
>
>> I remember the reporting has been broken for a while, and testing a
>> busy domU indicates it is still the case.
> If it's broken, it needs fixing.
>
>> # xen-logdirty `xl domid domU`
>> 0: faults= 0 dirty= 258050
>> 1: faults= 0 dirty= 257817
>> 2: faults= 0 dirty= 253713
>> 3: faults= 0 dirty= 253197
>> 4: faults= 0 dirty= 255154
>> 5: faults= 0 dirty= 260876
>> 6: faults= 0 dirty= 259083
>> 7: faults= 0 dirty= 253163
>> 8: faults= 0 dirty= 258349
>> 9: faults= 0 dirty= 260330
>> 10: faults= 0 dirty= 259754
>> 11: faults= 0 dirty= 257273
>> 12: faults= 0 dirty= 255756
>> 13: faults= 0 dirty= 258209
>> 14: faults= 0 dirty= 257847
>> 15: faults= 0 dirty= 59639
>> 16: faults= 0 dirty= 81
>> 17: faults= 0 dirty= 86
>> 18: faults= 0 dirty= 111
>> 19: faults= 0 dirty= 81
>> 20: faults= 0 dirty= 80
> What is this showing, other than that (unsurprisingly) faults doesn't
> work for an HVM guest?

Sorry - I meant HAP guest.

What hardware is this on? i.e. is the Page Modification Logging feature
in use?

~Andrew
On Tue, 6 Jul 2021 14:19:21 +0100, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:

>> 20: faults= 0 dirty= 80
> What is this showing, other than that (unsurprisingly) faults doesn't
> work for an HVM guest?

The dirty count goes down after a while for a domU that constantly
touches as many pages as it can.

But yes, the current code uses stats.dirty_count in a number of places.
It seems all of the usage is just for reporting, so if these code paths
get wrong input, nothing bad happens.  precopy_policy may indicate more
or fewer iterations; this is not critical either.

Olaf
On 06/07/2021 13:03, Jan Beulich wrote:
> On 06.07.2021 13:23, Andrew Cooper wrote:
>> 'count * sizeof(*pfns)' can in principle overflow, but is implausible in
>> practice as the time between checkpoints is typically sub-second.
>> Nevertheless, simplify the code and remove the risk.
>>
>> There is no need to loop over the bitmap to calculate count.  The number of
>> set bits is returned in xc_shadow_op_stats_t which is already collected (and
>> ignored).
>>
>> Bounds check the count against what will fit in REC_LENGTH_MAX.  At the time
>> of writing, this allows up to 0xffffff pfns.
> Well, okay, this then means that an overflow in the reporting of
> dirty_count doesn't matter for now, because the limit is lower
> anyway.
>
>> Rearrange the pfns loop to check
>> for errors both ways, not simply that there were more pfns than expected.
> Hmm, "both ways" to me would mean ...
>
>> @@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
>>          goto err;
>>      }
>>
>> -    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
>> +    for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count )
>>      {
>>          if ( !test_bit(i, dirty_bitmap) )
>>              continue;
>>
>> -        if ( written > count )
>> -        {
>> -            ERROR("Dirty pfn list exceed");
>> -            goto err;
>> -        }
>> -
>>          pfns[written++] = i;
>>      }
>>
>> -    rec.length = count * sizeof(*pfns);
>> -
>> -    iov[1].iov_base = pfns;
>> -    iov[1].iov_len = rec.length;
>> +    if ( written != stats.dirty_count )
>> +    {
>> +        ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)",
>> +              written, stats.dirty_count);
>> +        goto err;
>> +    }
> ... you then also check that there are no further bits set in the
> bitmap.  As said elsewhere, I'm not convinced using statistics as
> a basis for actual operation (rather than just reporting) is
> appropriate.

I'm not interested in inference based on the name of the structure.

> I'm unaware of there being any spelled out guarantee
> that the numbers reported back from the hypercall are accurate.

The live loop uses this information already for this purpose.  If it is
wrong, we've got bigger problems than this.

~Andrew
On Tue, 6 Jul 2021 14:22:58 +0100, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:

> What hardware is this on? i.e. is the Page Modification Logging feature
> in use?

At least it gets reported as a VMX feature during boot; this is a
CoyotePass system.

Olaf
On 06/07/2021 14:39, Olaf Hering wrote:
> On Tue, 6 Jul 2021 14:22:58 +0100, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>
>> What hardware is this on? i.e. is the Page Modification Logging feature
>> in use?
> At least it gets reported as a VMX feature during boot; this is a
> CoyotePass system.

That logging is problematic (it only means "this hardware has the
feature"), but yes - PML will be used by default when available.

~Andrew
On 06.07.2021 15:19, Andrew Cooper wrote:
> On 06/07/2021 13:58, Olaf Hering wrote:
>> On Tue, 6 Jul 2021 12:23:32 +0100, Andrew Cooper
>> <andrew.cooper3@citrix.com> wrote:
>>
>>> +    count = stats.dirty_count;
>> Is this accurate?
>
> The live loop relies on it, and it worked correctly the last time I
> tested it.

When still merely investigating in preparation of my recent series,
i.e. without having made changes yet except to add some logging, I did
observe "Bitmap contained more entries than expected..." a couple of
times, with "written" and "entries" typically apart by just 1 (as
determined by extra logging; to be honest I don't recall if they were
farther apart at any point).  So the number is _not_ accurate in any
event, and cannot be used for anything other than reporting purposes
(as also expressed elsewhere on this thread).

This also underlines that, unlike you said in a reply to one of my
patches, this is only a "detail" message, not an error, because
migration happily went on and succeeded.

Jan
On 06.07.2021 15:34, Andrew Cooper wrote:
> On 06/07/2021 13:03, Jan Beulich wrote:
>> On 06.07.2021 13:23, Andrew Cooper wrote:
>>> 'count * sizeof(*pfns)' can in principle overflow, but is implausible in
>>> practice as the time between checkpoints is typically sub-second.
>>> Nevertheless, simplify the code and remove the risk.
>>>
>>> There is no need to loop over the bitmap to calculate count.  The number of
>>> set bits is returned in xc_shadow_op_stats_t which is already collected (and
>>> ignored).
>>>
>>> Bounds check the count against what will fit in REC_LENGTH_MAX.  At the time
>>> of writing, this allows up to 0xffffff pfns.
>> Well, okay, this then means that an overflow in the reporting of
>> dirty_count doesn't matter for now, because the limit is lower
>> anyway.
>>
>>> Rearrange the pfns loop to check
>>> for errors both ways, not simply that there were more pfns than expected.
>> Hmm, "both ways" to me would mean ...
>>
>>> @@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
>>>          goto err;
>>>      }
>>>
>>> -    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
>>> +    for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count )
>>>      {
>>>          if ( !test_bit(i, dirty_bitmap) )
>>>              continue;
>>>
>>> -        if ( written > count )
>>> -        {
>>> -            ERROR("Dirty pfn list exceed");
>>> -            goto err;
>>> -        }
>>> -
>>>          pfns[written++] = i;
>>>      }
>>>
>>> -    rec.length = count * sizeof(*pfns);
>>> -
>>> -    iov[1].iov_base = pfns;
>>> -    iov[1].iov_len = rec.length;
>>> +    if ( written != stats.dirty_count )
>>> +    {
>>> +        ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)",
>>> +              written, stats.dirty_count);
>>> +        goto err;
>>> +    }
>> ... you then also check that there are no further bits set in the
>> bitmap.  As said elsewhere, I'm not convinced using statistics as
>> a basis for actual operation (rather than just reporting) is
>> appropriate.
>
> I'm not interested in inference based on the name of the structure.

I'm afraid that's the problem: Because you started using it for
something it wasn't meant to be used for, you now think it's the name
that's misleading, and the use is okay.  I remain of the opinion that
it's the other way around (but see below for there not being a real
dependency at least in this particular case).

>> I'm unaware of there being any spelled out guarantee
>> that the numbers reported back from the hypercall are accurate.
>
> The live loop uses this information already for this purpose.  If it is
> wrong, we've got bigger problems than this.

send_memory_live() passes the value to send_dirty_pages(), which in
turn passes it only to xc_report_progress_step() and uses it in

    if ( written > entries )
        DPRINTF("Bitmap contained more entries than expected...");

There's no relying on this number at all afaics.

Jan
On Tue, 6 Jul 2021 14:58:04 +0200, Olaf Hering <olaf@aepfle.de> wrote:
> the reporting is broken since a while
A quick check on a Dell T320 with E5-2430L, which does not have "Page Modification Logging", indicates that staging-4.5 appears to work, but reporting in staging-4.6 is broken.
Olaf
On 06.07.2021 16:11, Olaf Hering wrote:
> On Tue, 6 Jul 2021 14:58:04 +0200, Olaf Hering <olaf@aepfle.de> wrote:
>
>> the reporting is broken since a while
>
> A quick check on a Dell T320 with E5-2430L, which does not have "Page
> Modification Logging", indicates that staging-4.5 appears to work, but
> reporting in staging-4.6 is broken.

Not surprising at all, considering PML support was added in 4.6.

Jan
On 06.07.2021 17:13, Jan Beulich wrote:
> On 06.07.2021 16:11, Olaf Hering wrote:
>> On Tue, 6 Jul 2021 14:58:04 +0200, Olaf Hering <olaf@aepfle.de> wrote:
>>
>>> the reporting is broken since a while
>>
>> A quick check on a Dell T320 with E5-2430L, which does not have "Page
>> Modification Logging", indicates that staging-4.5 appears to work, but
>> reporting in staging-4.6 is broken.
>
> Not surprising at all, considering PML support was added in 4.6.

Or perhaps still surprising, as the functions involved there don't
appear to be bypassing any core logic, so after some looking around I
can't say I see anything that's obviously wrong.

Oh, and I notice only now the "not" in your original reply, which
renders my earlier statement completely pointless anyway.  I'm sorry.

Jan
On 06/07/2021 16:22, Jan Beulich wrote:
> On 06.07.2021 17:13, Jan Beulich wrote:
>> On 06.07.2021 16:11, Olaf Hering wrote:
>>> On Tue, 6 Jul 2021 14:58:04 +0200, Olaf Hering <olaf@aepfle.de> wrote:
>>>
>>>> the reporting is broken since a while
>>> A quick check on a Dell T320 with E5-2430L, which does not have "Page
>>> Modification Logging", indicates that staging-4.5 appears to work, but
>>> reporting in staging-4.6 is broken.
>> Not surprising at all, considering PML support was added in 4.6.
> Or perhaps still surprising, as the functions involved there don't
> appear to be bypassing any core logic, so after some looking around
> I can't say I see anything that's obviously wrong.
>
> Oh, and I notice only now the "not" in your original reply, which
> renders my earlier statement completely pointless anyway.  I'm sorry.

Ok, so my observation of dirty_count being reliable during development
of migration v2 appears to be correct, and something regressed in the
dev window where it got committed upstream...

That is suspicious.

~Andrew
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 07c9e291610b..bda04ee42e3f 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -425,7 +425,7 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
     int rc = -1;
     unsigned int count, written;
     uint64_t i, *pfns = NULL;
-    xc_shadow_op_stats_t stats = { 0, ctx->restore.p2m_size };
+    xc_shadow_op_stats_t stats;
     struct xc_sr_record rec = {
         .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST,
     };
@@ -444,14 +444,17 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
         goto err;
     }
 
-    for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
+    count = stats.dirty_count;
+
+    if ( ((REC_LENGTH_MAX - sizeof(rec)) / sizeof(*pfns)) < count )
     {
-        if ( test_bit(i, dirty_bitmap) )
-            count++;
+        ERROR("Too many PFNs (%u) to fit in record (limit %zu)", count,
+              ((REC_LENGTH_MAX - sizeof(rec)) / sizeof(*pfns)));
+        goto err;
     }
-
-    pfns = malloc(count * sizeof(*pfns));
+    iov[1].iov_len = rec.length = count * sizeof(*pfns);
+    iov[1].iov_base = pfns = malloc(rec.length);
     if ( !pfns )
     {
         ERROR("Unable to allocate %zu bytes of memory for dirty pfn list",
@@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
         goto err;
     }
 
-    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
+    for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count )
     {
         if ( !test_bit(i, dirty_bitmap) )
             continue;
 
-        if ( written > count )
-        {
-            ERROR("Dirty pfn list exceed");
-            goto err;
-        }
-
         pfns[written++] = i;
     }
 
-    rec.length = count * sizeof(*pfns);
-
-    iov[1].iov_base = pfns;
-    iov[1].iov_len = rec.length;
+    if ( written != stats.dirty_count )
+    {
+        ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)",
+              written, stats.dirty_count);
+        goto err;
+    }
 
     if ( writev_exact(ctx->restore.send_back_fd, iov, ARRAY_SIZE(iov)) )
     {
'count * sizeof(*pfns)' can in principle overflow, but is implausible in
practice as the time between checkpoints is typically sub-second.
Nevertheless, simplify the code and remove the risk.

There is no need to loop over the bitmap to calculate count.  The number of
set bits is returned in xc_shadow_op_stats_t which is already collected (and
ignored).

Bounds check the count against what will fit in REC_LENGTH_MAX.  At the time
of writing, this allows up to 0xffffff pfns.  Rearrange the pfns loop to
check for errors both ways, not simply that there were more pfns than
expected.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Jackson <iwj@xenproject.org>
CC: Wei Liu <wl@xen.org>
CC: Juergen Gross <jgross@suse.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Olaf Hering <olaf@aepfle.de>
---
 tools/libs/guest/xg_sr_restore.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)