
[mm,v2,4/6] mm: Do not set reserved flag for hotplug memory

Message ID 20181011221351.1925.67694.stgit@localhost.localdomain (mailing list archive)
State New, archived
Series Deferred page init improvements

Commit Message

Alexander Duyck Oct. 11, 2018, 10:13 p.m. UTC
The general suspicion at this point is that setting the reserved bit is
not really needed for hotplug memory. In addition, setting this bit
causes problems for DAX: a region cannot be assigned to KVM if the
reserved bit is set on each of its pages.

For now we can try simply not setting the bit, since we suspect it adds
no value. If we later find that it is needed, we can come back and
re-add it for the hotplug paths.

Suggested-by: Michal Hocko <mhocko@suse.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 mm/page_alloc.c |   11 -----------
 1 file changed, 11 deletions(-)
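
For context, the KVM limitation mentioned above stems from
kvm_is_reserved_pfn() in virt/kvm/kvm_main.c, which refuses any pfn
whose struct page carries the reserved bit. A rough sketch of that
check, as of this era of the kernel:

	bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
	{
		/*
		 * Any valid pfn whose struct page is marked reserved
		 * is rejected, which is what keeps reserved DAX pages
		 * from being assigned to a guest.
		 */
		if (pfn_valid(pfn))
			return PageReserved(pfn_to_page(pfn));

		return true;
	}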

Comments

Dan Williams Oct. 11, 2018, 10:58 p.m. UTC | #1
On Thu, Oct 11, 2018 at 3:18 PM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
> The general suspicion at this point is that setting the reserved bit is
> not really needed for hotplug memory. In addition, setting this bit
> causes problems for DAX: a region cannot be assigned to KVM if the
> reserved bit is set on each of its pages.
>
> For now we can try simply not setting the bit, since we suspect it adds
> no value. If we later find that it is needed, we can come back and
> re-add it for the hotplug paths.
>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Reported-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  mm/page_alloc.c |   11 -----------
>  1 file changed, 11 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3603d5444865..e435223e2ddb 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5571,8 +5571,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>
>                 page = pfn_to_page(pfn);
>                 __init_single_page(page, pfn, zone, nid);
> -               if (context == MEMMAP_HOTPLUG)
> -                       __SetPageReserved(page);

At a minimum I think we need to do this before removing PageReserved,
to make sure zone_device pages are not tracked in the hibernation
image.

diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 3d37c279c090..c0613137d726 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1285,6 +1285,9 @@ static struct page *saveable_page(struct zone *zone, unsigned long pfn)
        if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page))
                return NULL;

+       if (is_zone_device_page(page))
+               return NULL;
+
        if (PageReserved(page)
            && (!kernel_page_present(page) || pfn_is_nosave(pfn)))
                return NULL;
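
For reference, is_zone_device_page() reduces to a simple zone check,
roughly the following when CONFIG_ZONE_DEVICE is enabled:

	static inline bool is_zone_device_page(const struct page *page)
	{
		/*
		 * ZONE_DEVICE pages (e.g. pmem) are not regular RAM
		 * and must not be saved into the hibernation image.
		 */
		return page_zonenum(page) == ZONE_DEVICE;
	}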
Alexander Duyck Oct. 11, 2018, 11:22 p.m. UTC | #2
On 10/11/2018 3:58 PM, Dan Williams wrote:
> On Thu, Oct 11, 2018 at 3:18 PM Alexander Duyck
> <alexander.h.duyck@linux.intel.com> wrote:
>>
>> The general suspicion at this point is that setting the reserved bit is
>> not really needed for hotplug memory. In addition, setting this bit
>> causes problems for DAX: a region cannot be assigned to KVM if the
>> reserved bit is set on each of its pages.
>>
>> For now we can try simply not setting the bit, since we suspect it adds
>> no value. If we later find that it is needed, we can come back and
>> re-add it for the hotplug paths.
>>
>> Suggested-by: Michal Hocko <mhocko@suse.com>
>> Reported-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>> ---
>>   mm/page_alloc.c |   11 -----------
>>   1 file changed, 11 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3603d5444865..e435223e2ddb 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5571,8 +5571,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>
>>                  page = pfn_to_page(pfn);
>>                  __init_single_page(page, pfn, zone, nid);
>> -               if (context == MEMMAP_HOTPLUG)
>> -                       __SetPageReserved(page);
> 
> At a minimum I think we need to do this before removing PageReserved,
> to make sure zone_device pages are not tracked in the hibernation
> image.
> 
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index 3d37c279c090..c0613137d726 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -1285,6 +1285,9 @@ static struct page *saveable_page(struct zone *zone, unsigned long pfn)
>          if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page))
>                  return NULL;
> 
> +       if (is_zone_device_page(page))
> +               return NULL;
> +
>          if (PageReserved(page)
>              && (!kernel_page_present(page) || pfn_is_nosave(pfn)))
>                  return NULL;
> 

Yeah, I am pretty sure I broke hotplug in general with this too, since 
online_pages_range() checks for the reserved flag before bringing a 
range online.
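
For reference, that check looks roughly like this in
mm/memory_hotplug.c of this era (simplified):

	static int online_pages_range(unsigned long start_pfn,
			unsigned long nr_pages, void *arg)
	{
		unsigned long i;
		unsigned long onlined_pages = *(unsigned long *)arg;
		struct page *page;

		/*
		 * The onlining loop is skipped entirely when the first
		 * page of the range is not marked reserved, which is
		 * why clearing the bit at init time breaks onlining.
		 */
		if (PageReserved(pfn_to_page(start_pfn)))
			for (i = 0; i < nr_pages; i++) {
				page = pfn_to_page(start_pfn + i);
				(*online_page_callback)(page);
				onlined_pages++;
			}

		online_mem_sections(start_pfn, start_pfn + nr_pages);

		*(unsigned long *)arg = onlined_pages;
		return 0;
	}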

I think I will drop this patch and go back to what I had before. There 
was a slight performance gain from not setting the bit at all, but 
dropping the need for the reserved bit entirely is probably a matter 
for yet another patch set.

- Alex

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3603d5444865..e435223e2ddb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5571,8 +5571,6 @@  void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 
 		page = pfn_to_page(pfn);
 		__init_single_page(page, pfn, zone, nid);
-		if (context == MEMMAP_HOTPLUG)
-			__SetPageReserved(page);
 
 		/*
 		 * Mark the block movable so that blocks are reserved for
@@ -5626,15 +5624,6 @@  void __ref memmap_init_zone_device(struct zone *zone,
 		__init_single_page(page, pfn, zone_idx, nid);
 
 		/*
-		 * Mark page reserved as it will need to wait for onlining
-		 * phase for it to be fully associated with a zone.
-		 *
-		 * We can use the non-atomic __set_bit operation for setting
-		 * the flag as we are still initializing the pages.
-		 */
-		__SetPageReserved(page);
-
-		/*
 		 * ZONE_DEVICE pages union ->lru with a ->pgmap back
 		 * pointer and hmm_data.  It is a bug if a ZONE_DEVICE
 		 * page is ever freed or placed on a driver-private list.