Message ID | 20210730085249.8246-8-david@redhat.com (mailing list archive)
---|---
State | New, archived
Series | migration/ram: Optimize for virtio-mem via RamDiscardManager
On 7/30/21 10:52 AM, David Hildenbrand wrote:
> We already don't ever migrate memory that corresponds to discarded ranges
> as managed by a RamDiscardManager responsible for the mapped memory region
> of the RAMBlock.
>
> virtio-mem uses this mechanism to logically unplug parts of a RAMBlock.
> Right now, we still populate zeropages for the whole usable part of the
> RAMBlock, which is undesired because:
>
> 1. Even populating the shared zeropage will result in memory getting
>    consumed for page tables.
> 2. Memory backends without a shared zeropage (like hugetlbfs and shmem)
>    will populate an actual, fresh page, resulting in an unintended
>    memory consumption.
>
> Discarded ("logically unplugged") parts have to remain discarded. As
> these pages are never part of the migration stream, there is no need to
> track modifications via userfaultfd WP reliably for these parts.
>
> Further, any writes to these ranges by the VM are invalid and the
> behavior is undefined.
>
> Note that Linux only supports userfaultfd WP on private anonymous memory
> for now, which usually results in the shared zeropage getting populated.
> The issue will become more relevant once userfaultfd WP supports shmem
> and hugetlb.
>
> Acked-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  migration/ram.c | 53 +++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 45 insertions(+), 8 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 01cea01774..fd5949734e 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1639,6 +1639,28 @@ out:
>      return ret;
>  }
>
> +static inline void populate_range(RAMBlock *block, hwaddr offset, hwaddr size)
> +{
> +    char *ptr = (char *) block->host;
> +
> +    for (; offset < size; offset += qemu_real_host_page_size) {
> +        char tmp = *(ptr + offset);
> +
> +        /* Don't optimize the read out */
> +        asm volatile("" : "+r" (tmp));
> +    }

This template is now used 3 times, a good opportunity to extract it as
an (inline?) helper.
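The page-touching trick quoted above is self-contained enough to demonstrate outside QEMU. A minimal standalone sketch of the same idea in plain C; `touch_pages` and the use of `sysconf(_SC_PAGESIZE)` in place of `qemu_real_host_page_size` are illustrative, not from the patch:

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Read one byte per host page so the kernel populates the backing pages.
 * The empty asm statement lists tmp as an input/output operand ("+r"),
 * so the compiler must keep the otherwise-dead load that feeds it.
 */
static void touch_pages(char *host, size_t size)
{
    const size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t offset;

    for (offset = 0; offset < size; offset += page_size) {
        char tmp = host[offset];

        /* Don't optimize the read out */
        asm volatile("" : "+r" (tmp));
    }
}

int main(void)
{
    const size_t size = 1024 * 4096; /* large enough to be lazily populated */
    char *buf = malloc(size);

    if (buf) {
        touch_pages(buf, size);
        free(buf);
    }
    return 0;
}
```

Without the empty `asm volatile` barrier, an optimizing compiler is free to delete the loop entirely, since `tmp` is never otherwise used, and no page would actually be touched.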
On 05.08.21 10:04, Philippe Mathieu-Daudé wrote:
> On 7/30/21 10:52 AM, David Hildenbrand wrote:
>> We already don't ever migrate memory that corresponds to discarded ranges
>> as managed by a RamDiscardManager responsible for the mapped memory region
>> of the RAMBlock.
>>
>> virtio-mem uses this mechanism to logically unplug parts of a RAMBlock.
>> Right now, we still populate zeropages for the whole usable part of the
>> RAMBlock, which is undesired because:
>>
>> 1. Even populating the shared zeropage will result in memory getting
>>    consumed for page tables.
>> 2. Memory backends without a shared zeropage (like hugetlbfs and shmem)
>>    will populate an actual, fresh page, resulting in an unintended
>>    memory consumption.
>>
>> Discarded ("logically unplugged") parts have to remain discarded. As
>> these pages are never part of the migration stream, there is no need to
>> track modifications via userfaultfd WP reliably for these parts.
>>
>> Further, any writes to these ranges by the VM are invalid and the
>> behavior is undefined.
>>
>> Note that Linux only supports userfaultfd WP on private anonymous memory
>> for now, which usually results in the shared zeropage getting populated.
>> The issue will become more relevant once userfaultfd WP supports shmem
>> and hugetlb.
>>
>> Acked-by: Peter Xu <peterx@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  migration/ram.c | 53 +++++++++++++++++++++++++++++++++++++++++--------
>>  1 file changed, 45 insertions(+), 8 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 01cea01774..fd5949734e 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1639,6 +1639,28 @@ out:
>>      return ret;
>>  }
>>
>> +static inline void populate_range(RAMBlock *block, hwaddr offset, hwaddr size)
>> +{
>> +    char *ptr = (char *) block->host;
>> +
>> +    for (; offset < size; offset += qemu_real_host_page_size) {
>> +        char tmp = *(ptr + offset);
>> +
>> +        /* Don't optimize the read out */
>> +        asm volatile("" : "+r" (tmp));
>> +    }
>
> This template is now used 3 times, a good opportunity to extract it as
> an (inline?) helper.
>

Can you point me at the other users?

Isn't populate_range() the inline helper you are looking for? :)
On 8/5/21 10:11 AM, David Hildenbrand wrote:
> On 05.08.21 10:04, Philippe Mathieu-Daudé wrote:
>> On 7/30/21 10:52 AM, David Hildenbrand wrote:
>>> We already don't ever migrate memory that corresponds to discarded
>>> ranges
>>> as managed by a RamDiscardManager responsible for the mapped memory
>>> region
>>> of the RAMBlock.
>>>
>>> virtio-mem uses this mechanism to logically unplug parts of a RAMBlock.
>>> Right now, we still populate zeropages for the whole usable part of the
>>> RAMBlock, which is undesired because:
>>>
>>> 1. Even populating the shared zeropage will result in memory getting
>>>    consumed for page tables.
>>> 2. Memory backends without a shared zeropage (like hugetlbfs and shmem)
>>>    will populate an actual, fresh page, resulting in an unintended
>>>    memory consumption.
>>>
>>> Discarded ("logically unplugged") parts have to remain discarded. As
>>> these pages are never part of the migration stream, there is no need to
>>> track modifications via userfaultfd WP reliably for these parts.
>>>
>>> Further, any writes to these ranges by the VM are invalid and the
>>> behavior is undefined.
>>>
>>> Note that Linux only supports userfaultfd WP on private anonymous memory
>>> for now, which usually results in the shared zeropage getting populated.
>>> The issue will become more relevant once userfaultfd WP supports shmem
>>> and hugetlb.
>>>
>>> Acked-by: Peter Xu <peterx@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  migration/ram.c | 53 +++++++++++++++++++++++++++++++++++++++++--------
>>>  1 file changed, 45 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index 01cea01774..fd5949734e 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -1639,6 +1639,28 @@ out:
>>>      return ret;
>>>  }
>>> +static inline void populate_range(RAMBlock *block, hwaddr offset,
>>> hwaddr size)
>>> +{
>>> +    char *ptr = (char *) block->host;
>>> +
>>> +    for (; offset < size; offset += qemu_real_host_page_size) {
>>> +        char tmp = *(ptr + offset);
>>> +
>>> +        /* Don't optimize the read out */
>>> +        asm volatile("" : "+r" (tmp));
>>> +    }
>>
>> This template is now used 3 times, a good opportunity to extract it as
>> an (inline?) helper.
>>
>
> Can you point me at the other users?

Oops I got lost reviewing the series.

> Isn't populate_range() the inline helper you are looking for? :)

Indeed :)

Being a bit picky, I'd appreciate if you split this patch in 2,
extracting populate_range() as a first trivial step.
On 05.08.21 10:21, Philippe Mathieu-Daudé wrote:
> On 8/5/21 10:11 AM, David Hildenbrand wrote:
>> On 05.08.21 10:04, Philippe Mathieu-Daudé wrote:
>>> On 7/30/21 10:52 AM, David Hildenbrand wrote:
>>>> We already don't ever migrate memory that corresponds to discarded
>>>> ranges
>>>> as managed by a RamDiscardManager responsible for the mapped memory
>>>> region
>>>> of the RAMBlock.
>>>>
>>>> virtio-mem uses this mechanism to logically unplug parts of a RAMBlock.
>>>> Right now, we still populate zeropages for the whole usable part of the
>>>> RAMBlock, which is undesired because:
>>>>
>>>> 1. Even populating the shared zeropage will result in memory getting
>>>>    consumed for page tables.
>>>> 2. Memory backends without a shared zeropage (like hugetlbfs and shmem)
>>>>    will populate an actual, fresh page, resulting in an unintended
>>>>    memory consumption.
>>>>
>>>> Discarded ("logically unplugged") parts have to remain discarded. As
>>>> these pages are never part of the migration stream, there is no need to
>>>> track modifications via userfaultfd WP reliably for these parts.
>>>>
>>>> Further, any writes to these ranges by the VM are invalid and the
>>>> behavior is undefined.
>>>>
>>>> Note that Linux only supports userfaultfd WP on private anonymous memory
>>>> for now, which usually results in the shared zeropage getting populated.
>>>> The issue will become more relevant once userfaultfd WP supports shmem
>>>> and hugetlb.
>>>>
>>>> Acked-by: Peter Xu <peterx@redhat.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  migration/ram.c | 53 +++++++++++++++++++++++++++++++++++++++++--------
>>>>  1 file changed, 45 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index 01cea01774..fd5949734e 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -1639,6 +1639,28 @@ out:
>>>>      return ret;
>>>>  }
>>>> +static inline void populate_range(RAMBlock *block, hwaddr offset,
>>>> hwaddr size)
>>>> +{
>>>> +    char *ptr = (char *) block->host;
>>>> +
>>>> +    for (; offset < size; offset += qemu_real_host_page_size) {
>>>> +        char tmp = *(ptr + offset);
>>>> +
>>>> +        /* Don't optimize the read out */
>>>> +        asm volatile("" : "+r" (tmp));
>>>> +    }
>>>
>>> This template is now used 3 times, a good opportunity to extract it as
>>> an (inline?) helper.
>>>
>>
>> Can you point me at the other users?
>
> Oops I got lost reviewing the series.
>
>> Isn't populate_range() the inline helper you are looking for? :)
>
> Indeed :)
>
> Being a bit picky, I'd appreciate if you split this patch in 2,
> extracting populate_range() as a first trivial step.
>

Will do, and while at it again, will use ram_addr_t for the parameters
and block->page_size instead of qemu_real_host_page_size.
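The follow-up version is not part of this thread, but the split-out helper David describes might look roughly like the sketch below. The `ram_addr_t` parameters and `block->page_size` follow his stated plan; the explicit `end` bound is an editorial assumption, so that the non-zero `section->offset_within_region` passed by `populate_section()` is treated as a start offset rather than a limit:

```c
/*
 * Hypothetical shape of the revised helper; the actual follow-up patch
 * may differ. Touches one byte per page in [offset, offset + size).
 */
static void populate_range(RAMBlock *block, ram_addr_t offset,
                           ram_addr_t size)
{
    const ram_addr_t end = offset + size;

    for (; offset < end; offset += block->page_size) {
        char tmp = *((char *)block->host + offset);

        /* Don't optimize the read out */
        asm volatile("" : "+r" (tmp));
    }
}
```

Using `block->page_size` also does the right thing for hugetlbfs-backed blocks, where stepping by `qemu_real_host_page_size` would read the same huge page many times over.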
```diff
diff --git a/migration/ram.c b/migration/ram.c
index 01cea01774..fd5949734e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1639,6 +1639,28 @@ out:
     return ret;
 }
 
+static inline void populate_range(RAMBlock *block, hwaddr offset, hwaddr size)
+{
+    char *ptr = (char *) block->host;
+
+    for (; offset < size; offset += qemu_real_host_page_size) {
+        char tmp = *(ptr + offset);
+
+        /* Don't optimize the read out */
+        asm volatile("" : "+r" (tmp));
+    }
+}
+
+static inline int populate_section(MemoryRegionSection *section, void *opaque)
+{
+    const hwaddr size = int128_get64(section->size);
+    hwaddr offset = section->offset_within_region;
+    RAMBlock *block = section->mr->ram_block;
+
+    populate_range(block, offset, size);
+    return 0;
+}
+
 /*
  * ram_block_populate_pages: populate memory in the RAM block by reading
  * an integer from the beginning of each page.
@@ -1648,16 +1670,31 @@ out:
  *
  * @block: RAM block to populate
  */
-static void ram_block_populate_pages(RAMBlock *block)
+static void ram_block_populate_pages(RAMBlock *rb)
 {
-    char *ptr = (char *) block->host;
-
-    for (ram_addr_t offset = 0; offset < block->used_length;
-         offset += qemu_real_host_page_size) {
-        char tmp = *(ptr + offset);
+    /*
+     * Skip populating all pages that fall into a discarded range as managed
+     * by a RamDiscardManager responsible for the mapped memory region of the
+     * RAMBlock. Such discarded ("logically unplugged") parts of a RAMBlock
+     * must not get populated automatically. We don't have to track
+     * modifications via userfaultfd WP reliably, because these pages will
+     * not be part of the migration stream either way -- see
+     * ramblock_dirty_bitmap_exclude_discarded_pages().
+     *
+     * Note: The result is only stable while migration (precopy/postcopy).
+     */
+    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
+        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        MemoryRegionSection section = {
+            .mr = rb->mr,
+            .offset_within_region = 0,
+            .size = rb->mr->size,
+        };
 
-        /* Don't optimize the read out */
-        asm volatile("" : "+r" (tmp));
+        ram_discard_manager_replay_populated(rdm, &section,
+                                             populate_section, NULL);
+    } else {
+        populate_range(rb, 0, qemu_ram_get_used_length(rb));
     }
 }
```
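As to why the replay path above never touches unplugged memory: `ram_discard_manager_replay_populated()` visits only the populated (plugged) parts of the given section, invoking the callback once per populated sub-section with that sub-section's own `offset_within_region` and `size`; discarded ranges are skipped entirely. A sketch of the contract `populate_section()` implements, with declarations as this editor reads them in QEMU's `include/exec/memory.h` around this series (double-check against the tree you build):

```c
/* Invoked once per populated sub-section of the replayed section. */
typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);

/*
 * Walk the populated parts of @section and call @replay_fn for each one.
 * A non-zero return from @replay_fn aborts the walk and is propagated;
 * populate_section() above always returns 0, so the whole block is walked.
 */
int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                         MemoryRegionSection *section,
                                         ReplayRamPopulate replay_fn,
                                         void *opaque);
```

One consequence worth flagging: since replayed sub-sections can start at a non-zero `offset_within_region`, `populate_range()` as posted stops at `size` rather than `offset + size`; treating `size` as a length with an explicit end bound, as in the sketch after the previous mail, avoids walking such sub-sections short.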