Message ID | 20210730085249.8246-7-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | migration/ram: Optimize for virtio-mem via RamDiscardManager | expand |
On Fri, Jul 30, 2021 at 10:52:48AM +0200, David Hildenbrand wrote: > Currently, when someone (i.e., the VM) accesses discarded parts inside a > RAMBlock with a RamDiscardManager managing the corresponding mapped memory > region, postcopy will request migration of the corresponding page from the > source. The source, however, will never answer, because it refuses to > migrate such pages with undefined content ("logically unplugged"): the > pages are never dirty, and get_queued_page() will consequently skip > processing these postcopy requests. > > Especially reading discarded ("logically unplugged") ranges is supposed to > work in some setups (for example with current virtio-mem), although it > barely ever happens: still, not placing a page would currently stall the > VM, as it cannot make forward progress. > > Let's check the state via the RamDiscardManager (the state e.g., > of virtio-mem is migrated during precopy) and avoid sending a request > that will never get answered. Place a fresh zero page instead to keep > the VM working. This is the same behavior that would happen > automatically without userfaultfd being active, when accessing virtual > memory regions without populated pages -- "populate on demand". > > For now, there are valid cases (as documented in the virtio-mem spec) where > a VM might read discarded memory; in the future, we will disallow that. > Then, we might want to handle that case differently, e.g., warning the > user that the VM seems to be mis-behaving. > > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > migration/postcopy-ram.c | 31 +++++++++++++++++++++++++++---- > migration/ram.c | 21 +++++++++++++++++++++ > migration/ram.h | 1 + > 3 files changed, 49 insertions(+), 4 deletions(-) > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c > index 2e9697bdd2..38cdfc09c3 100644 > --- a/migration/postcopy-ram.c > +++ b/migration/postcopy-ram.c > @@ -671,6 +671,29 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd, > return ret; > } > > +static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb, > + ram_addr_t start, uint64_t haddr) > +{ > + void *aligned = (void *)(uintptr_t)(haddr & -qemu_ram_pagesize(rb)); > + > + /* > + * Discarded pages (via RamDiscardManager) are never migrated. On unlikely > + * access, place a zeropage, which will also set the relevant bits in the > + * recv_bitmap accordingly, so we won't try placing a zeropage twice. > + * > + * Checking a single bit is sufficient to handle pagesize > TPS as either > + * all relevant bits are set or not. > + */ > + assert(QEMU_IS_ALIGNED(start, qemu_ram_pagesize(rb))); Is this check for ramblock_page_is_discarded()? If so, shall we move this into it, e.g. after memory_region_has_ram_discard_manager() returned true? Other than that it looks good to me, thanks. > + if (ramblock_page_is_discarded(rb, start)) { > + bool received = ramblock_recv_bitmap_test_byte_offset(rb, start); > + > + return received ? 0 : postcopy_place_page_zero(mis, aligned, rb); > + } > + > + return migrate_send_rp_req_pages(mis, rb, start, haddr); > +} > + > /* > * Callback from shared fault handlers to ask for a page, > * the page must be specified by a RAMBlock and an offset in that rb > @@ -690,7 +713,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb, > qemu_ram_get_idstr(rb), rb_offset); > return postcopy_wake_shared(pcfd, client_addr, rb); > } > - migrate_send_rp_req_pages(mis, rb, aligned_rbo, client_addr); > + postcopy_request_page(mis, rb, aligned_rbo, client_addr); > return 0; > } > > @@ -984,8 +1007,8 @@ retry: > * Send the request to the source - we want to request one > * of our host page sizes (which is >= TPS) > */ > - ret = migrate_send_rp_req_pages(mis, rb, rb_offset, > - msg.arg.pagefault.address); > + ret = postcopy_request_page(mis, rb, rb_offset, > + msg.arg.pagefault.address); > if (ret) { > /* May be network failure, try to wait for recovery */ > if (ret == -EIO && postcopy_pause_fault_thread(mis)) { > @@ -993,7 +1016,7 @@ retry: > goto retry; > } else { > /* This is a unavoidable fault */ > - error_report("%s: migrate_send_rp_req_pages() get %d", > + error_report("%s: postcopy_request_page() get %d", > __func__, ret); > break; > } > diff --git a/migration/ram.c b/migration/ram.c > index 9776919faa..01cea01774 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -912,6 +912,27 @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb) > return cleared_bits; > } > > +/* > + * Check if a host-page aligned page falls into a discarded range as managed by > + * a RamDiscardManager responsible for the mapped memory region of the RAMBlock. > + * > + * Note: The result is only stable while migration (precopy/postcopy). > + */ > +bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start) > +{ > + if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) { > + RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr); > + MemoryRegionSection section = { > + .mr = rb->mr, > + .offset_within_region = start, > + .size = int128_get64(qemu_ram_pagesize(rb)), > + }; > + > + return !ram_discard_manager_is_populated(rdm, §ion); > + } > + return false; > +} > + > /* Called with RCU critical section */ > static void ramblock_sync_dirty_bitmap(RAMState *rs, RAMBlock *rb) > { > diff --git a/migration/ram.h b/migration/ram.h > index 4833e9fd5b..dda1988f3d 100644 > --- a/migration/ram.h > +++ b/migration/ram.h > @@ -72,6 +72,7 @@ void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr); > int64_t ramblock_recv_bitmap_send(QEMUFile *file, > const char *block_name); > int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb); > +bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start); > > /* ram cache */ > int colo_init_ram_cache(void); > -- > 2.31.1 >
On 7/30/21 10:52 AM, David Hildenbrand wrote: > Currently, when someone (i.e., the VM) accesses discarded parts inside a > RAMBlock with a RamDiscardManager managing the corresponding mapped memory > region, postcopy will request migration of the corresponding page from the > source. The source, however, will never answer, because it refuses to > migrate such pages with undefined content ("logically unplugged"): the > pages are never dirty, and get_queued_page() will consequently skip > processing these postcopy requests. > > Especially reading discarded ("logically unplugged") ranges is supposed to > work in some setups (for example with current virtio-mem), although it > barely ever happens: still, not placing a page would currently stall the > VM, as it cannot make forward progress. > > Let's check the state via the RamDiscardManager (the state e.g., > of virtio-mem is migrated during precopy) and avoid sending a request > that will never get answered. Place a fresh zero page instead to keep > the VM working. This is the same behavior that would happen > automatically without userfaultfd being active, when accessing virtual > memory regions without populated pages -- "populate on demand". > > For now, there are valid cases (as documented in the virtio-mem spec) where > a VM might read discarded memory; in the future, we will disallow that. > Then, we might want to handle that case differently, e.g., warning the > user that the VM seems to be mis-behaving. > > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > migration/postcopy-ram.c | 31 +++++++++++++++++++++++++++---- > migration/ram.c | 21 +++++++++++++++++++++ > migration/ram.h | 1 + > 3 files changed, 49 insertions(+), 4 deletions(-) > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c > index 2e9697bdd2..38cdfc09c3 100644 > --- a/migration/postcopy-ram.c > +++ b/migration/postcopy-ram.c > @@ -671,6 +671,29 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd, > return ret; > } > > +static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb, > + ram_addr_t start, uint64_t haddr) > +{ > + void *aligned = (void *)(uintptr_t)(haddr & -qemu_ram_pagesize(rb)); void *aligned = QEMU_ALIGN_PTR_DOWN(haddr, qemu_ram_pagesize(rb)));
On 05.08.21 09:48, Philippe Mathieu-Daudé wrote: > On 7/30/21 10:52 AM, David Hildenbrand wrote: >> Currently, when someone (i.e., the VM) accesses discarded parts inside a >> RAMBlock with a RamDiscardManager managing the corresponding mapped memory >> region, postcopy will request migration of the corresponding page from the >> source. The source, however, will never answer, because it refuses to >> migrate such pages with undefined content ("logically unplugged"): the >> pages are never dirty, and get_queued_page() will consequently skip >> processing these postcopy requests. >> >> Especially reading discarded ("logically unplugged") ranges is supposed to >> work in some setups (for example with current virtio-mem), although it >> barely ever happens: still, not placing a page would currently stall the >> VM, as it cannot make forward progress. >> >> Let's check the state via the RamDiscardManager (the state e.g., >> of virtio-mem is migrated during precopy) and avoid sending a request >> that will never get answered. Place a fresh zero page instead to keep >> the VM working. This is the same behavior that would happen >> automatically without userfaultfd being active, when accessing virtual >> memory regions without populated pages -- "populate on demand". >> >> For now, there are valid cases (as documented in the virtio-mem spec) where >> a VM might read discarded memory; in the future, we will disallow that. >> Then, we might want to handle that case differently, e.g., warning the >> user that the VM seems to be mis-behaving. >> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> migration/postcopy-ram.c | 31 +++++++++++++++++++++++++++---- >> migration/ram.c | 21 +++++++++++++++++++++ >> migration/ram.h | 1 + >> 3 files changed, 49 insertions(+), 4 deletions(-) >> >> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c >> index 2e9697bdd2..38cdfc09c3 100644 >> --- a/migration/postcopy-ram.c >> +++ b/migration/postcopy-ram.c >> @@ -671,6 +671,29 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd, >> return ret; >> } >> >> +static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb, >> + ram_addr_t start, uint64_t haddr) >> +{ >> + void *aligned = (void *)(uintptr_t)(haddr & -qemu_ram_pagesize(rb)); > > void *aligned = QEMU_ALIGN_PTR_DOWN(haddr, qemu_ram_pagesize(rb))); > Does not compile as haddr is not a pointer. void *aligned = QEMU_ALIGN_PTR_DOWN((void *)haddr, qemu_ram_pagesize(rb))); should work. I can also add a patch to adjust a similar call in migrate_send_rp_req_pages()! Thanks!
On 05.08.21 02:04, Peter Xu wrote: > On Fri, Jul 30, 2021 at 10:52:48AM +0200, David Hildenbrand wrote: >> Currently, when someone (i.e., the VM) accesses discarded parts inside a >> RAMBlock with a RamDiscardManager managing the corresponding mapped memory >> region, postcopy will request migration of the corresponding page from the >> source. The source, however, will never answer, because it refuses to >> migrate such pages with undefined content ("logically unplugged"): the >> pages are never dirty, and get_queued_page() will consequently skip >> processing these postcopy requests. >> >> Especially reading discarded ("logically unplugged") ranges is supposed to >> work in some setups (for example with current virtio-mem), although it >> barely ever happens: still, not placing a page would currently stall the >> VM, as it cannot make forward progress. >> >> Let's check the state via the RamDiscardManager (the state e.g., >> of virtio-mem is migrated during precopy) and avoid sending a request >> that will never get answered. Place a fresh zero page instead to keep >> the VM working. This is the same behavior that would happen >> automatically without userfaultfd being active, when accessing virtual >> memory regions without populated pages -- "populate on demand". >> >> For now, there are valid cases (as documented in the virtio-mem spec) where >> a VM might read discarded memory; in the future, we will disallow that. >> Then, we might want to handle that case differently, e.g., warning the >> user that the VM seems to be mis-behaving. >> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> migration/postcopy-ram.c | 31 +++++++++++++++++++++++++++---- >> migration/ram.c | 21 +++++++++++++++++++++ >> migration/ram.h | 1 + >> 3 files changed, 49 insertions(+), 4 deletions(-) >> >> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c >> index 2e9697bdd2..38cdfc09c3 100644 >> --- a/migration/postcopy-ram.c >> +++ b/migration/postcopy-ram.c >> @@ -671,6 +671,29 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd, >> return ret; >> } >> >> +static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb, >> + ram_addr_t start, uint64_t haddr) >> +{ >> + void *aligned = (void *)(uintptr_t)(haddr & -qemu_ram_pagesize(rb)); >> + >> + /* >> + * Discarded pages (via RamDiscardManager) are never migrated. On unlikely >> + * access, place a zeropage, which will also set the relevant bits in the >> + * recv_bitmap accordingly, so we won't try placing a zeropage twice. >> + * >> + * Checking a single bit is sufficient to handle pagesize > TPS as either >> + * all relevant bits are set or not. >> + */ >> + assert(QEMU_IS_ALIGNED(start, qemu_ram_pagesize(rb))); > > Is this check for ramblock_page_is_discarded()? If so, shall we move this into > it, e.g. after memory_region_has_ram_discard_manager() returned true? > It has to hold true also when calling migrate_send_rp_req_pages(). Both callers -- postcopy_request_shared_page() and postcopy_ram_fault_thread() properly align the offset down (but not the host address). This check is just to make sure we don't mess up in the future.
On 8/5/21 10:07 AM, David Hildenbrand wrote: > On 05.08.21 09:48, Philippe Mathieu-Daudé wrote: >> On 7/30/21 10:52 AM, David Hildenbrand wrote: >>> Currently, when someone (i.e., the VM) accesses discarded parts inside a >>> RAMBlock with a RamDiscardManager managing the corresponding mapped >>> memory >>> region, postcopy will request migration of the corresponding page >>> from the >>> source. The source, however, will never answer, because it refuses to >>> migrate such pages with undefined content ("logically unplugged"): the >>> pages are never dirty, and get_queued_page() will consequently skip >>> processing these postcopy requests. >>> >>> Especially reading discarded ("logically unplugged") ranges is >>> supposed to >>> work in some setups (for example with current virtio-mem), although it >>> barely ever happens: still, not placing a page would currently stall the >>> VM, as it cannot make forward progress. >>> >>> Let's check the state via the RamDiscardManager (the state e.g., >>> of virtio-mem is migrated during precopy) and avoid sending a request >>> that will never get answered. Place a fresh zero page instead to keep >>> the VM working. This is the same behavior that would happen >>> automatically without userfaultfd being active, when accessing virtual >>> memory regions without populated pages -- "populate on demand". >>> >>> For now, there are valid cases (as documented in the virtio-mem spec) >>> where >>> a VM might read discarded memory; in the future, we will disallow that. >>> Then, we might want to handle that case differently, e.g., warning the >>> user that the VM seems to be mis-behaving. >>> >>> Signed-off-by: David Hildenbrand <david@redhat.com> >>> --- >>> migration/postcopy-ram.c | 31 +++++++++++++++++++++++++++---- >>> migration/ram.c | 21 +++++++++++++++++++++ >>> migration/ram.h | 1 + >>> 3 files changed, 49 insertions(+), 4 deletions(-) >>> >>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c >>> index 2e9697bdd2..38cdfc09c3 100644 >>> --- a/migration/postcopy-ram.c >>> +++ b/migration/postcopy-ram.c >>> @@ -671,6 +671,29 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd, >>> return ret; >>> } >>> +static int postcopy_request_page(MigrationIncomingState *mis, >>> RAMBlock *rb, >>> + ram_addr_t start, uint64_t haddr) >>> +{ >>> + void *aligned = (void *)(uintptr_t)(haddr & >>> -qemu_ram_pagesize(rb)); >> >> void *aligned = QEMU_ALIGN_PTR_DOWN(haddr, qemu_ram_pagesize(rb))); >> > > Does not compile as haddr is not a pointer. I suppose the typeof() fails? /* n-byte align pointer down */ #define QEMU_ALIGN_PTR_DOWN(p, n) \ ((typeof(p))QEMU_ALIGN_DOWN((uintptr_t)(p), (n))) > void *aligned = QEMU_ALIGN_PTR_DOWN((void *)haddr, qemu_ram_pagesize(rb))); > > should work. What about void *aligned = QEMU_ALIGN_DOWN(haddr, qemu_ram_pagesize(rb))); else void *aligned = (void *)QEMU_ALIGN_DOWN(haddr, qemu_ram_pagesize(rb))); I don't mind much the style you prefer, as long as it compiles :p But these QEMU_ALIGN_DOWN() macros make the review easier than (a & -b). > I can also add a patch to adjust a similar call in > migrate_send_rp_req_pages()! > > Thanks! >
On 05.08.21 10:17, Philippe Mathieu-Daudé wrote: > On 8/5/21 10:07 AM, David Hildenbrand wrote: >> On 05.08.21 09:48, Philippe Mathieu-Daudé wrote: >>> On 7/30/21 10:52 AM, David Hildenbrand wrote: >>>> Currently, when someone (i.e., the VM) accesses discarded parts inside a >>>> RAMBlock with a RamDiscardManager managing the corresponding mapped >>>> memory >>>> region, postcopy will request migration of the corresponding page >>>> from the >>>> source. The source, however, will never answer, because it refuses to >>>> migrate such pages with undefined content ("logically unplugged"): the >>>> pages are never dirty, and get_queued_page() will consequently skip >>>> processing these postcopy requests. >>>> >>>> Especially reading discarded ("logically unplugged") ranges is >>>> supposed to >>>> work in some setups (for example with current virtio-mem), although it >>>> barely ever happens: still, not placing a page would currently stall the >>>> VM, as it cannot make forward progress. >>>> >>>> Let's check the state via the RamDiscardManager (the state e.g., >>>> of virtio-mem is migrated during precopy) and avoid sending a request >>>> that will never get answered. Place a fresh zero page instead to keep >>>> the VM working. This is the same behavior that would happen >>>> automatically without userfaultfd being active, when accessing virtual >>>> memory regions without populated pages -- "populate on demand". >>>> >>>> For now, there are valid cases (as documented in the virtio-mem spec) >>>> where >>>> a VM might read discarded memory; in the future, we will disallow that. >>>> Then, we might want to handle that case differently, e.g., warning the >>>> user that the VM seems to be mis-behaving. >>>> >>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>>> --- >>>> migration/postcopy-ram.c | 31 +++++++++++++++++++++++++++---- >>>> migration/ram.c | 21 +++++++++++++++++++++ >>>> migration/ram.h | 1 + >>>> 3 files changed, 49 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c >>>> index 2e9697bdd2..38cdfc09c3 100644 >>>> --- a/migration/postcopy-ram.c >>>> +++ b/migration/postcopy-ram.c >>>> @@ -671,6 +671,29 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd, >>>> return ret; >>>> } >>>> +static int postcopy_request_page(MigrationIncomingState *mis, >>>> RAMBlock *rb, >>>> + ram_addr_t start, uint64_t haddr) >>>> +{ >>>> + void *aligned = (void *)(uintptr_t)(haddr & >>>> -qemu_ram_pagesize(rb)); >>> >>> void *aligned = QEMU_ALIGN_PTR_DOWN(haddr, qemu_ram_pagesize(rb))); >>> >> >> Does not compile as haddr is not a pointer. > > I suppose the typeof() fails? > > /* n-byte align pointer down */ > #define QEMU_ALIGN_PTR_DOWN(p, n) \ > ((typeof(p))QEMU_ALIGN_DOWN((uintptr_t)(p), (n))) > > >> void *aligned = QEMU_ALIGN_PTR_DOWN((void *)haddr, qemu_ram_pagesize(rb))); >> >> should work. > > What about > > void *aligned = QEMU_ALIGN_DOWN(haddr, qemu_ram_pagesize(rb))); > > else > > void *aligned = (void *)QEMU_ALIGN_DOWN(haddr, qemu_ram_pagesize(rb))); That works as well and will use that for now. At one point we should just pass a pointer instead of uint64_t for the host address. > > I don't mind much the style you prefer, as long as it compiles :p > But these QEMU_ALIGN_DOWN() macros make the review easier than (a & -b). > Yes, absolutely. I'll add a patch converting a bunch of such alignments in migration code.
On Thu, Aug 05, 2021 at 10:10:38AM +0200, David Hildenbrand wrote: > On 05.08.21 02:04, Peter Xu wrote: > > On Fri, Jul 30, 2021 at 10:52:48AM +0200, David Hildenbrand wrote: > > > Currently, when someone (i.e., the VM) accesses discarded parts inside a > > > RAMBlock with a RamDiscardManager managing the corresponding mapped memory > > > region, postcopy will request migration of the corresponding page from the > > > source. The source, however, will never answer, because it refuses to > > > migrate such pages with undefined content ("logically unplugged"): the > > > pages are never dirty, and get_queued_page() will consequently skip > > > processing these postcopy requests. > > > > > > Especially reading discarded ("logically unplugged") ranges is supposed to > > > work in some setups (for example with current virtio-mem), although it > > > barely ever happens: still, not placing a page would currently stall the > > > VM, as it cannot make forward progress. > > > > > > Let's check the state via the RamDiscardManager (the state e.g., > > > of virtio-mem is migrated during precopy) and avoid sending a request > > > that will never get answered. Place a fresh zero page instead to keep > > > the VM working. This is the same behavior that would happen > > > automatically without userfaultfd being active, when accessing virtual > > > memory regions without populated pages -- "populate on demand". > > > > > > For now, there are valid cases (as documented in the virtio-mem spec) where > > > a VM might read discarded memory; in the future, we will disallow that. > > > Then, we might want to handle that case differently, e.g., warning the > > > user that the VM seems to be mis-behaving. > > > > > > Signed-off-by: David Hildenbrand <david@redhat.com> > > > --- > > > migration/postcopy-ram.c | 31 +++++++++++++++++++++++++++---- > > > migration/ram.c | 21 +++++++++++++++++++++ > > > migration/ram.h | 1 + > > > 3 files changed, 49 insertions(+), 4 deletions(-) > > > > > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c > > > index 2e9697bdd2..38cdfc09c3 100644 > > > --- a/migration/postcopy-ram.c > > > +++ b/migration/postcopy-ram.c > > > @@ -671,6 +671,29 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd, > > > return ret; > > > } > > > +static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb, > > > + ram_addr_t start, uint64_t haddr) > > > +{ > > > + void *aligned = (void *)(uintptr_t)(haddr & -qemu_ram_pagesize(rb)); > > > + > > > + /* > > > + * Discarded pages (via RamDiscardManager) are never migrated. On unlikely > > > + * access, place a zeropage, which will also set the relevant bits in the > > > + * recv_bitmap accordingly, so we won't try placing a zeropage twice. > > > + * > > > + * Checking a single bit is sufficient to handle pagesize > TPS as either > > > + * all relevant bits are set or not. > > > + */ > > > + assert(QEMU_IS_ALIGNED(start, qemu_ram_pagesize(rb))); > > > > Is this check for ramblock_page_is_discarded()? If so, shall we move this into > > it, e.g. after memory_region_has_ram_discard_manager() returned true? > > > > It has to hold true also when calling migrate_send_rp_req_pages(). > > Both callers -- postcopy_request_shared_page() and > postcopy_ram_fault_thread() properly align the offset down (but not the host > address). This check is just to make sure we don't mess up in the future. OK. Reviewed-by: Peter Xu <peterx@redhat.com>
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index 2e9697bdd2..38cdfc09c3 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -671,6 +671,29 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd, return ret; } +static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb, + ram_addr_t start, uint64_t haddr) +{ + void *aligned = (void *)(uintptr_t)(haddr & -qemu_ram_pagesize(rb)); + + /* + * Discarded pages (via RamDiscardManager) are never migrated. On unlikely + * access, place a zeropage, which will also set the relevant bits in the + * recv_bitmap accordingly, so we won't try placing a zeropage twice. + * + * Checking a single bit is sufficient to handle pagesize > TPS as either + * all relevant bits are set or not. + */ + assert(QEMU_IS_ALIGNED(start, qemu_ram_pagesize(rb))); + if (ramblock_page_is_discarded(rb, start)) { + bool received = ramblock_recv_bitmap_test_byte_offset(rb, start); + + return received ? 0 : postcopy_place_page_zero(mis, aligned, rb); + } + + return migrate_send_rp_req_pages(mis, rb, start, haddr); +} + /* * Callback from shared fault handlers to ask for a page, * the page must be specified by a RAMBlock and an offset in that rb @@ -690,7 +713,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb, qemu_ram_get_idstr(rb), rb_offset); return postcopy_wake_shared(pcfd, client_addr, rb); } - migrate_send_rp_req_pages(mis, rb, aligned_rbo, client_addr); + postcopy_request_page(mis, rb, aligned_rbo, client_addr); return 0; } @@ -984,8 +1007,8 @@ retry: * Send the request to the source - we want to request one * of our host page sizes (which is >= TPS) */ - ret = migrate_send_rp_req_pages(mis, rb, rb_offset, - msg.arg.pagefault.address); + ret = postcopy_request_page(mis, rb, rb_offset, + msg.arg.pagefault.address); if (ret) { /* May be network failure, try to wait for recovery */ if (ret == -EIO && postcopy_pause_fault_thread(mis)) { @@ -993,7 +1016,7 @@ retry: goto retry; } else { /* This is a unavoidable fault */ - error_report("%s: migrate_send_rp_req_pages() get %d", + error_report("%s: postcopy_request_page() get %d", __func__, ret); break; } diff --git a/migration/ram.c b/migration/ram.c index 9776919faa..01cea01774 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -912,6 +912,27 @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb) return cleared_bits; } +/* + * Check if a host-page aligned page falls into a discarded range as managed by + * a RamDiscardManager responsible for the mapped memory region of the RAMBlock. + * + * Note: The result is only stable while migration (precopy/postcopy). + */ +bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start) +{ + if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) { + RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr); + MemoryRegionSection section = { + .mr = rb->mr, + .offset_within_region = start, + .size = int128_get64(qemu_ram_pagesize(rb)), + }; + + return !ram_discard_manager_is_populated(rdm, §ion); + } + return false; +} + /* Called with RCU critical section */ static void ramblock_sync_dirty_bitmap(RAMState *rs, RAMBlock *rb) { diff --git a/migration/ram.h b/migration/ram.h index 4833e9fd5b..dda1988f3d 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -72,6 +72,7 @@ void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr); int64_t ramblock_recv_bitmap_send(QEMUFile *file, const char *block_name); int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb); +bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start); /* ram cache */ int colo_init_ram_cache(void);
Currently, when someone (i.e., the VM) accesses discarded parts inside a RAMBlock with a RamDiscardManager managing the corresponding mapped memory region, postcopy will request migration of the corresponding page from the source. The source, however, will never answer, because it refuses to migrate such pages with undefined content ("logically unplugged"): the pages are never dirty, and get_queued_page() will consequently skip processing these postcopy requests. Especially reading discarded ("logically unplugged") ranges is supposed to work in some setups (for example with current virtio-mem), although it barely ever happens: still, not placing a page would currently stall the VM, as it cannot make forward progress. Let's check the state via the RamDiscardManager (the state e.g., of virtio-mem is migrated during precopy) and avoid sending a request that will never get answered. Place a fresh zero page instead to keep the VM working. This is the same behavior that would happen automatically without userfaultfd being active, when accessing virtual memory regions without populated pages -- "populate on demand". For now, there are valid cases (as documented in the virtio-mem spec) where a VM might read discarded memory; in the future, we will disallow that. Then, we might want to handle that case differently, e.g., warning the user that the VM seems to be mis-behaving. Signed-off-by: David Hildenbrand <david@redhat.com> --- migration/postcopy-ram.c | 31 +++++++++++++++++++++++++++---- migration/ram.c | 21 +++++++++++++++++++++ migration/ram.h | 1 + 3 files changed, 49 insertions(+), 4 deletions(-)