diff mbox series

[1/1] mm: Always release pages to the buddy allocator in memblock_free_late().

Message ID 010101857bbc4d26-d9683bb4-c4f0-465b-aea6-5314dbf0aa01-000000@us-west-2.amazonses.com (mailing list archive)
State Deferred, archived
Headers show
Series [1/1] mm: Always release pages to the buddy allocator in memblock_free_late(). | expand

Commit Message

Aaron Thompson Jan. 4, 2023, 7:43 a.m. UTC
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. memblock_free_pages() is called by memblock_free_late(),
which is used to free reserved ranges after memblock_free_all() has
run. memblock_free_all() initializes all pages in reserved ranges, and
accordingly, those pages are not touched by the deferred init
process. This means that currently, if the pages that
memblock_free_late() intends to release are in the deferred range, they
will never be released to the buddy allocator. They will forever be
reserved.

In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().

For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.

Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Aaron Thompson <dev@aaront.org>
---
 mm/memblock.c                     | 2 +-
 tools/testing/memblock/internal.h | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

Comments

Mike Rapoport Jan. 4, 2023, 7:34 p.m. UTC | #1
Hi,

On Wed, Jan 04, 2023 at 07:43:36AM +0000, Aaron Thompson wrote:
> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
> only releases pages to the buddy allocator if they are not in the
> deferred range. This is correct for free pages (as defined by
> for_each_free_mem_pfn_range_in_zone()) because free pages in the
> deferred range will be initialized and released as part of the deferred
> init process. memblock_free_pages() is called by memblock_free_late(),
> which is used to free reserved ranges after memblock_free_all() has
> run. memblock_free_all() initializes all pages in reserved ranges, and

To be precise, memblock_free_all() frees pages, or releases them to the
pages allocator, rather than initializes.

> accordingly, those pages are not touched by the deferred init
> process. This means that currently, if the pages that
> memblock_free_late() intends to release are in the deferred range, they
> will never be released to the buddy allocator. They will forever be
> reserved.
> 
> In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
> which is also correct for free pages but is not correct for reserved
> pages. KMSAN metadata for reserved pages is initialized by
> kmsan_init_shadow(), which runs shortly before memblock_free_all().
> 
> For both of these reasons, memblock_free_pages() should only be called
> for free pages, and memblock_free_late() should call __free_pages_core()
> directly instead.

Overall looks fine to me and I couldn't spot potential issues.

I'd appreciate if you add a paragraph about the actual issue with EFI boot
you described in the cover letter to the commit message.

> Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
> Signed-off-by: Aaron Thompson <dev@aaront.org>
> ---
>  mm/memblock.c                     | 2 +-
>  tools/testing/memblock/internal.h | 4 ++++
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 511d4783dcf1..56a5b6086c50 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1640,7 +1640,7 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
>  	end = PFN_DOWN(base + size);
>  
>  	for (; cursor < end; cursor++) {
> -		memblock_free_pages(pfn_to_page(cursor), cursor, 0);
> +		__free_pages_core(pfn_to_page(cursor), 0);

Please add a comment that explains why it is safe to call __free_pages_core() here.
Something like

	/*
	 * Reserved pages are always initialized by the end of
	 * memblock_free_all() either during memmap_init() or, with deferred
	 * initialization if struct page in reserve_bootmem_region()
	 */

>  		totalram_pages_inc();
>  	}
>  }
> diff --git a/tools/testing/memblock/internal.h b/tools/testing/memblock/internal.h
> index fdb7f5db7308..85973e55489e 100644
> --- a/tools/testing/memblock/internal.h
> +++ b/tools/testing/memblock/internal.h
> @@ -15,6 +15,10 @@ bool mirrored_kernelcore = false;
>  
>  struct page {};
>  
> +void __free_pages_core(struct page *page, unsigned int order)
> +{
> +}
> +
>  void memblock_free_pages(struct page *page, unsigned long pfn,
>  			 unsigned int order)
>  {
> -- 
> 2.30.2
>
Aaron Thompson Jan. 5, 2023, 4:02 a.m. UTC | #2
Hi Mike,

On 2023-01-04 11:34, Mike Rapoport wrote:
> Hi,
> 
> On Wed, Jan 04, 2023 at 07:43:36AM +0000, Aaron Thompson wrote:
>> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
>> only releases pages to the buddy allocator if they are not in the
>> deferred range. This is correct for free pages (as defined by
>> for_each_free_mem_pfn_range_in_zone()) because free pages in the
>> deferred range will be initialized and released as part of the 
>> deferred
>> init process. memblock_free_pages() is called by memblock_free_late(),
>> which is used to free reserved ranges after memblock_free_all() has
>> run. memblock_free_all() initializes all pages in reserved ranges, and
> 
> To be precise, memblock_free_all() frees pages, or releases them to the
> pages allocator, rather than initializes.

As you mentioned in the comment below, whether memblock_free_all() does 
any
initializing depends on the particular deferred init situation.
memblock_free_all() does ultimately call init_reserved_page() for every 
reserved
page (via reserve_bootmem_region()), but that only actually initializes 
the page
if it's in the deferred range. In either case, all I was trying to say 
here is
that we can be certain that all reserved pages have been initialized 
after
memblock_free_all() has run, so I'll rephrase that.

>> accordingly, those pages are not touched by the deferred init
>> process. This means that currently, if the pages that
>> memblock_free_late() intends to release are in the deferred range, 
>> they
>> will never be released to the buddy allocator. They will forever be
>> reserved.
>> 
>> In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
>> which is also correct for free pages but is not correct for reserved
>> pages. KMSAN metadata for reserved pages is initialized by
>> kmsan_init_shadow(), which runs shortly before memblock_free_all().
>> 
>> For both of these reasons, memblock_free_pages() should only be called
>> for free pages, and memblock_free_late() should call 
>> __free_pages_core()
>> directly instead.
> 
> Overall looks fine to me and I couldn't spot potential issues.
> 
> I'd appreciate if you add a paragraph about the actual issue with EFI 
> boot
> you described in the cover letter to the commit message.

Sure, will do.

>> Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages 
>> if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
>> Signed-off-by: Aaron Thompson <dev@aaront.org>
>> ---
>>  mm/memblock.c                     | 2 +-
>>  tools/testing/memblock/internal.h | 4 ++++
>>  2 files changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 511d4783dcf1..56a5b6086c50 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1640,7 +1640,7 @@ void __init memblock_free_late(phys_addr_t base, 
>> phys_addr_t size)
>>  	end = PFN_DOWN(base + size);
>> 
>>  	for (; cursor < end; cursor++) {
>> -		memblock_free_pages(pfn_to_page(cursor), cursor, 0);
>> +		__free_pages_core(pfn_to_page(cursor), 0);
> 
> Please add a comment that explains why it is safe to call
> __free_pages_core() here.
> Something like
> 
> 	/*
> 	 * Reserved pages are always initialized by the end of
> 	 * memblock_free_all() either during memmap_init() or, with deferred
> 	 * initialization if struct page in reserve_bootmem_region()
> 	 */

Will do. Thanks for the review.

>>  		totalram_pages_inc();
>>  	}
>>  }
>> diff --git a/tools/testing/memblock/internal.h 
>> b/tools/testing/memblock/internal.h
>> index fdb7f5db7308..85973e55489e 100644
>> --- a/tools/testing/memblock/internal.h
>> +++ b/tools/testing/memblock/internal.h
>> @@ -15,6 +15,10 @@ bool mirrored_kernelcore = false;
>> 
>>  struct page {};
>> 
>> +void __free_pages_core(struct page *page, unsigned int order)
>> +{
>> +}
>> +
>>  void memblock_free_pages(struct page *page, unsigned long pfn,
>>  			 unsigned int order)
>>  {
>> --
>> 2.30.2
>> 

Thanks,
-- Aaron
diff mbox series

Patch

diff --git a/mm/memblock.c b/mm/memblock.c
index 511d4783dcf1..56a5b6086c50 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1640,7 +1640,7 @@  void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
 	end = PFN_DOWN(base + size);
 
 	for (; cursor < end; cursor++) {
-		memblock_free_pages(pfn_to_page(cursor), cursor, 0);
+		__free_pages_core(pfn_to_page(cursor), 0);
 		totalram_pages_inc();
 	}
 }
diff --git a/tools/testing/memblock/internal.h b/tools/testing/memblock/internal.h
index fdb7f5db7308..85973e55489e 100644
--- a/tools/testing/memblock/internal.h
+++ b/tools/testing/memblock/internal.h
@@ -15,6 +15,10 @@  bool mirrored_kernelcore = false;
 
 struct page {};
 
+void __free_pages_core(struct page *page, unsigned int order)
+{
+}
+
 void memblock_free_pages(struct page *page, unsigned long pfn,
 			 unsigned int order)
 {