[v3,7/7] migration/ram: Handle RAMBlocks with a RamDiscardManager on background snapshots

Message ID 20210730085249.8246-8-david@redhat.com (mailing list archive)
State New, archived
Series migration/ram: Optimize for virtio-mem via RamDiscardManager

Commit Message

David Hildenbrand July 30, 2021, 8:52 a.m. UTC
We already don't ever migrate memory that corresponds to discarded ranges
as managed by a RamDiscardManager responsible for the mapped memory region
of the RAMBlock.

virtio-mem uses this mechanism to logically unplug parts of a RAMBlock.
Right now, we still populate zeropages for the whole usable part of the
RAMBlock, which is undesired because:

1. Even populating the shared zeropage will result in memory getting
   consumed for page tables.
2. Memory backends without a shared zeropage (like hugetlbfs and shmem)
   will populate an actual, fresh page, resulting in unintended
   memory consumption.

Discarded ("logically unplugged") parts have to remain discarded. As
these pages are never part of the migration stream, there is no need to
track modifications via userfaultfd WP reliably for these parts.

Further, any writes to these ranges by the VM are invalid and the
behavior is undefined.

Note that Linux only supports userfaultfd WP on private anonymous memory
for now, which usually results in the shared zeropage getting populated.
The issue will become more relevant once userfaultfd WP supports shmem
and hugetlb.

Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/ram.c | 53 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 45 insertions(+), 8 deletions(-)
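
For context on what "discarded" means at the host level: a logically unplugged range is handed back to the kernel, so any later access repopulates it and consumes memory again. The sketch below is illustrative only; discard_range() is a made-up helper, loosely modeled on how QEMU discards RAM block ranges, and is not actual QEMU code.

#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <fcntl.h>

/*
 * Hypothetical helper: discarding a range returns the backing memory to
 * the host. Touching it afterwards faults in the shared zeropage
 * (private anonymous memory) or a fresh page (shmem/hugetlbfs) -- the
 * very consumption the patch avoids for ranges a RamDiscardManager
 * reports as discarded.
 */
static int discard_range(void *host, int fd, off_t offset, size_t size)
{
    if (fd < 0) {
        /* Private anonymous memory: drop the pages. */
        return madvise((char *)host + offset, size, MADV_DONTNEED);
    }
    /* File-backed memory (shmem/hugetlbfs): punch a hole in the file. */
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, size);
}

Point 2 of the list above is the fallocate() case: with no shared zeropage, every page touched during snapshotting would cost real memory.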

Comments

Philippe Mathieu-Daudé Aug. 5, 2021, 8:04 a.m. UTC | #1
On 7/30/21 10:52 AM, David Hildenbrand wrote:
> diff --git a/migration/ram.c b/migration/ram.c
> index 01cea01774..fd5949734e 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1639,6 +1639,28 @@ out:
>      return ret;
>  }
>  
> +static inline void populate_range(RAMBlock *block, hwaddr offset, hwaddr size)
> +{
> +    char *ptr = (char *) block->host;
> +
> +    for (; offset < size; offset += qemu_real_host_page_size) {
> +        char tmp = *(ptr + offset);
> +
> +        /* Don't optimize the read out */
> +        asm volatile("" : "+r" (tmp));
> +    }

This template is now used 3 times, a good opportunity to extract it as
an (inline?) helper.
David Hildenbrand Aug. 5, 2021, 8:11 a.m. UTC | #2
On 05.08.21 10:04, Philippe Mathieu-Daudé wrote:
> On 7/30/21 10:52 AM, David Hildenbrand wrote:
>> +static inline void populate_range(RAMBlock *block, hwaddr offset, hwaddr size)
>> +{
>> +    char *ptr = (char *) block->host;
>> +
>> +    for (; offset < size; offset += qemu_real_host_page_size) {
>> +        char tmp = *(ptr + offset);
>> +
>> +        /* Don't optimize the read out */
>> +        asm volatile("" : "+r" (tmp));
>> +    }
> 
> This template is now used 3 times, a good opportunity to extract it as
> an (inline?) helper.
> 

Can you point me at the other users?

Isn't populate_range() the inline helper you are looking for? :)
Philippe Mathieu-Daudé Aug. 5, 2021, 8:21 a.m. UTC | #3
On 8/5/21 10:11 AM, David Hildenbrand wrote:
> On 05.08.21 10:04, Philippe Mathieu-Daudé wrote:
>> This template is now used 3 times, a good opportunity to extract it as
>> an (inline?) helper.
>>
> 
> Can you point me at the other users?

Oops I got lost reviewing the series.

> Isn't populate_range() the inline helper you are looking for? :)

Indeed :)

Being a bit picky, I'd appreciate if you split this patch in 2,
extracting populate_range() as a first trivial step.
David Hildenbrand Aug. 5, 2021, 8:27 a.m. UTC | #4
On 05.08.21 10:21, Philippe Mathieu-Daudé wrote:
> On 8/5/21 10:11 AM, David Hildenbrand wrote:
>> Can you point me at the other users?
> 
> Oops I got lost reviewing the series.
> 
>> Isn't populate_range() the inline helper you are looking for? :)
> 
> Indeed :)
> 
> Being a bit picky, I'd appreciate if you split this patch in 2,
> extracting populate_range() as a first trivial step.
> 

Will do, and while at it again, will use ram_addr_t for the parameters 
and block->page_size instead of qemu_real_host_page_size.
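
For reference, here is a sketch of what the split-out helper could look like after that rework (ram_addr_t parameters, stepping by block->page_size, and an explicit end bound so a section with a non-zero offset_within_region is walked completely). The name populate_read_range and the exact shape are assumptions, not the final patch:

static void populate_read_range(RAMBlock *block, ram_addr_t offset,
                                ram_addr_t size)
{
    const ram_addr_t end = offset + size;

    /* Read a byte of each page; this populates it if still unpopulated. */
    for (; offset < end; offset += block->page_size) {
        char tmp = *((char *)block->host + offset);

        /* Don't optimize the read out */
        asm volatile("" : "+r" (tmp));
    }
}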

Patch

diff --git a/migration/ram.c b/migration/ram.c
index 01cea01774..fd5949734e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1639,6 +1639,28 @@  out:
     return ret;
 }
 
+static inline void populate_range(RAMBlock *block, hwaddr offset, hwaddr size)
+{
+    char *ptr = (char *) block->host;
+
+    for (; offset < size; offset += qemu_real_host_page_size) {
+        char tmp = *(ptr + offset);
+
+        /* Don't optimize the read out */
+        asm volatile("" : "+r" (tmp));
+    }
+}
+
+static inline int populate_section(MemoryRegionSection *section, void *opaque)
+{
+    const hwaddr size = int128_get64(section->size);
+    hwaddr offset = section->offset_within_region;
+    RAMBlock *block = section->mr->ram_block;
+
+    populate_range(block, offset, size);
+    return 0;
+}
+
 /*
  * ram_block_populate_pages: populate memory in the RAM block by reading
  *   an integer from the beginning of each page.
@@ -1648,16 +1670,31 @@  out:
  *
  * @block: RAM block to populate
  */
-static void ram_block_populate_pages(RAMBlock *block)
+static void ram_block_populate_pages(RAMBlock *rb)
 {
-    char *ptr = (char *) block->host;
-
-    for (ram_addr_t offset = 0; offset < block->used_length;
-            offset += qemu_real_host_page_size) {
-        char tmp = *(ptr + offset);
+    /*
+     * Skip populating all pages that fall into a discarded range as managed by
+     * a RamDiscardManager responsible for the mapped memory region of the
+     * RAMBlock. Such discarded ("logically unplugged") parts of a RAMBlock
+     * must not get populated automatically. We don't have to track
+     * modifications via userfaultfd WP reliably, because these pages will
+     * not be part of the migration stream either way -- see
+     * ramblock_dirty_bitmap_exclude_discarded_pages().
+     *
+     * Note: The result is only stable while migration (precopy/postcopy).
+     */
+    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
+        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        MemoryRegionSection section = {
+            .mr = rb->mr,
+            .offset_within_region = 0,
+            .size = rb->mr->size,
+        };
 
-        /* Don't optimize the read out */
-        asm volatile("" : "+r" (tmp));
+        ram_discard_manager_replay_populated(rdm, &section,
+                                             populate_section, NULL);
+    } else {
+        populate_range(rb, 0, qemu_ram_get_used_length(rb));
     }
 }
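
To spell out the contract the patch relies on: ram_discard_manager_replay_populated() is expected to invoke the ReplayRamPopulate callback -- populate_section() above -- once for every populated (plugged) part of the section, so discarded ranges are never touched. Below is a toy, bitmap-based sketch of such a replay; it is not the virtio-mem implementation, and the granularity/bitmap state tracking is invented for illustration.

#include "qemu/osdep.h"
#include "qemu/bitops.h"
#include "exec/memory.h"

/*
 * Toy replay: walk a plugged/discarded bitmap at a fixed granularity and
 * call the callback for each plugged chunk, narrowed to a sub-section.
 * A real RamDiscardManager (e.g. virtio-mem) tracks this state itself.
 */
static int toy_replay_populated(const unsigned long *plugged_bitmap,
                                uint64_t granularity,
                                MemoryRegionSection *section,
                                ReplayRamPopulate replay_fn, void *opaque)
{
    const hwaddr start = section->offset_within_region;
    const hwaddr end = start + int128_get64(section->size);
    MemoryRegionSection tmp;
    hwaddr offset;
    int ret;

    for (offset = start; offset < end; offset += granularity) {
        if (!test_bit(offset / granularity, plugged_bitmap)) {
            continue; /* discarded ("unplugged"): leave it alone */
        }
        tmp = *section;
        tmp.offset_within_region = offset;
        tmp.offset_within_address_space += offset - start;
        tmp.size = int128_make64(MIN(granularity, end - offset));

        ret = replay_fn(&tmp, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

With such a replay in place, ram_block_populate_pages() above only reads pages inside plugged chunks, matching the dirty-bitmap exclusion done by ramblock_dirty_bitmap_exclude_discarded_pages() elsewhere in the series.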