
[v6,02/15] x86/mm: setting fields in deferred pages

Message ID 1502138329-123460-3-git-send-email-pasha.tatashin@oracle.com (mailing list archive)
State New, archived

Commit Message

Pavel Tatashin Aug. 7, 2017, 8:38 p.m. UTC
Without the deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
flags and other fields in struct pages are never changed before the pages
are first initialized by __init_single_page().

With the deferred struct page feature enabled, there is a case where we set
some fields prior to initializing:

        mem_init() {
                register_page_bootmem_info();
                free_all_bootmem();
                ...
        }

When register_page_bootmem_info() is called, only non-deferred struct pages
are initialized. But this function goes through some reserved pages which
might be part of the deferred region, and thus are not yet initialized.

  mem_init
   register_page_bootmem_info
    register_page_bootmem_info_node
     get_page_bootmem
      .. setting fields here ..
      such as: page->freelist = (void *)type;

We end up with an issue similar to the one in the previous patch: currently
we do not observe a problem because the memory is zeroed. But if the flag
asserts are changed, we can start hitting issues.

Also, because in this patch series we will stop zeroing struct page memory
during allocation, we must make sure that struct pages are properly
initialized prior to using them.

The deferred-reserved pages are initialized in free_all_bootmem().
Therefore, the fix is to swap the order of the two calls above.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 arch/x86/mm/init_64.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Michal Hocko Aug. 11, 2017, 9:02 a.m. UTC | #1
[CC Mel - the full series is here
http://lkml.kernel.org/r/1502138329-123460-1-git-send-email-pasha.tatashin@oracle.com]

On Mon 07-08-17 16:38:36, Pavel Tatashin wrote:
> Without the deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
> flags and other fields in struct pages are never changed before the pages
> are first initialized by __init_single_page().
> 
> With the deferred struct page feature enabled, there is a case where we set
> some fields prior to initializing:
> 
>         mem_init() {
>                 register_page_bootmem_info();
>                 free_all_bootmem();
>                 ...
>         }
> 
> When register_page_bootmem_info() is called, only non-deferred struct pages
> are initialized. But this function goes through some reserved pages which
> might be part of the deferred region, and thus are not yet initialized.
> 
>   mem_init
>    register_page_bootmem_info
>     register_page_bootmem_info_node
>      get_page_bootmem
>       .. setting fields here ..
>       such as: page->freelist = (void *)type;
> 
> We end up with an issue similar to the one in the previous patch: currently
> we do not observe a problem because the memory is zeroed. But if the flag
> asserts are changed, we can start hitting issues.
> 
> Also, because in this patch series we will stop zeroing struct page memory
> during allocation, we must make sure that struct pages are properly
> initialized prior to using them.
> 
> The deferred-reserved pages are initialized in free_all_bootmem().
> Therefore, the fix is to switch the above calls.

I have to confess that this part of the early struct page initialization
is not my strongest point and I always have to re-read the code from
scratch, but I really do not understand what you are trying to achieve
here.

AFAIU register_page_bootmem_info_node is only about struct pages backing
pgdat, usemap and memmap. Those should be in reserved memblocks and we
do not initialize those at later times, they are not relevant to the
deferred initialization as your changelog suggests so the ordering with
get_page_bootmem shouldn't matter. Or am I missing something here?
 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Reviewed-by: Bob Picco <bob.picco@oracle.com>
> ---
>  arch/x86/mm/init_64.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 136422d7d539..1e863baec847 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1165,12 +1165,17 @@ void __init mem_init(void)
>  
>  	/* clear_bss() already clear the empty_zero_page */
>  
> -	register_page_bootmem_info();
> -
>  	/* this will put all memory onto the freelists */
>  	free_all_bootmem();
>  	after_bootmem = 1;
>  
> +	/* Must be done after boot memory is put on freelist, because here we
> +	 * might set fields in deferred struct pages that have not yet been
> +	 * initialized, and free_all_bootmem() initializes all the reserved
> +	 * deferred pages for us.
> +	 */
> +	register_page_bootmem_info();
> +
>  	/* Register memory areas for /proc/kcore */
>  	kclist_add(&kcore_vsyscall, (void *)VSYSCALL_ADDR,
>  			 PAGE_SIZE, KCORE_OTHER);
> -- 
> 2.14.0
Pavel Tatashin Aug. 11, 2017, 3:39 p.m. UTC | #2
> AFAIU register_page_bootmem_info_node is only about struct pages backing
> pgdat, usemap and memmap. Those should be in reserved memblocks and we
> do not initialize those at later times, they are not relevant to the
> deferred initialization as your changelog suggests so the ordering with
> get_page_bootmem shouldn't matter. Or am I missing something here?

The pages for pgdat, usemap, and memmap are part of the reserved memory,
and thus get initialized when free_all_bootmem() is called.

So, we have something like this in mem_init():

register_page_bootmem_info
  register_page_bootmem_info_node
   get_page_bootmem
    .. setting fields here ..
    such as: page->freelist = (void *)type;

free_all_bootmem()
  free_low_memory_core_early()
   for_each_reserved_mem_region()
    reserve_bootmem_region()
     init_reserved_page() <- Only if this is deferred reserved page
      __init_single_pfn()
       __init_single_page()
           memset(0) <-- Lose the set fields here!

memblock does not know about deferred pages, and can be requested to
allocate physical pages anywhere. So, the reserved pages in memblock can
be in both the non-deferred and the deferred parts of memory.

Without deferred pages enabled, by the time register_page_bootmem_info()
is called, every page has gone through __init_single_page(). With
deferred pages enabled, there is a scenario where fields can be set before
the pages go through __init_single_page(). This patch fixes it.
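
The ordering problem described above can be sketched as a small userspace
model. This is only an illustration, not kernel code: every toy_* name, the
eight-page memmap, the split at page 4, and the type value passed in are
assumptions made up for the example. The only things taken from the thread
are the shape of the two call chains, the store to page->freelist, and the
memset(0) done when a deferred reserved page is initialized.

```c
/* Toy userspace model of the mem_init() ordering issue. NOT kernel code:
 * all toy_* names and sizes are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_page {
	unsigned long flags;
	void *freelist;		/* get_page_bootmem() stores the type here */
};

#define TOY_NR_PAGES		8
#define TOY_FIRST_DEFERRED	4	/* pages [4, 8) model the deferred part */

static struct toy_page toy_memmap[TOY_NR_PAGES];

/* Models __init_single_page(): the struct is zeroed as a first step. */
static void toy_init_single_page(struct toy_page *page)
{
	memset(page, 0, sizeof(*page));
}

/* Models early boot: only the non-deferred struct pages get initialized. */
static void toy_early_init(void)
{
	size_t pfn;

	memset(toy_memmap, 0, sizeof(toy_memmap));
	for (pfn = 0; pfn < TOY_FIRST_DEFERRED; pfn++)
		toy_init_single_page(&toy_memmap[pfn]);
}

/* Models get_page_bootmem(): page->freelist = (void *)type; */
static void toy_get_page_bootmem(struct toy_page *page, unsigned long type)
{
	page->freelist = (void *)(uintptr_t)type;
}

/* Models the reserved-region walk in free_all_bootmem(): a reserved page
 * is (re)initialized only if it lies in the deferred part. */
static void toy_free_all_bootmem(size_t reserved_pfn)
{
	if (reserved_pfn >= TOY_FIRST_DEFERRED)
		toy_init_single_page(&toy_memmap[reserved_pfn]);
}

/* Runs one modeled boot and returns the type that survives in freelist.
 * register_first == true is the old order, false is the patched order. */
static unsigned long toy_boot(size_t reserved_pfn, unsigned long type,
			      bool register_first)
{
	assert(reserved_pfn < TOY_NR_PAGES);

	toy_early_init();
	if (register_first) {
		toy_get_page_bootmem(&toy_memmap[reserved_pfn], type);
		toy_free_all_bootmem(reserved_pfn);	/* wipes deferred page */
	} else {
		toy_free_all_bootmem(reserved_pfn);
		toy_get_page_bootmem(&toy_memmap[reserved_pfn], type);
	}
	return (unsigned long)(uintptr_t)toy_memmap[reserved_pfn].freelist;
}
```

In this model, toy_boot(6, 5, true) returns 0 because the reserved page sits
in the deferred part and is zeroed after the field was set, while
toy_boot(6, 5, false) returns 5. For a reserved page in the non-deferred
part, such as page 1, both orders return 5, which matches the observation
that the problem only appears with deferred struct pages enabled.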
Michal Hocko Aug. 14, 2017, 11:43 a.m. UTC | #3
On Fri 11-08-17 11:39:41, Pasha Tatashin wrote:
> >AFAIU register_page_bootmem_info_node is only about struct pages backing
> >pgdat, usemap and memmap. Those should be in reserved memblocks and we
> >do not initialize those at later times, they are not relevant to the
> >deferred initialization as your changelog suggests so the ordering with
> >get_page_bootmem shouldn't matter. Or am I missing something here?
> 
> The pages for pgdat, usemap, and memmap are part of the reserved memory,
> and thus get initialized when free_all_bootmem() is called.
> 
> So, we have something like this in mem_init()
> 
> register_page_bootmem_info
>  register_page_bootmem_info_node
>   get_page_bootmem
>    .. setting fields here ..
>    such as: page->freelist = (void *)type;
> 
> free_all_bootmem()
>  free_low_memory_core_early()
>   for_each_reserved_mem_region()
>    reserve_bootmem_region()
>     init_reserved_page() <- Only if this is deferred reserved page
>      __init_single_pfn()
>       __init_single_page()
>           memset(0) <-- Lose the set fields here!

OK, I have missed that part. Please make it explicit in the changelog.
It is quite easy to get lost in the deep call chains.
Pavel Tatashin Aug. 14, 2017, 1:32 p.m. UTC | #4
On 08/14/2017 07:43 AM, Michal Hocko wrote:
>> register_page_bootmem_info
>>   register_page_bootmem_info_node
>>    get_page_bootmem
>>     .. setting fields here ..
>>     such as: page->freelist = (void *)type;
>>
>> free_all_bootmem()
>>   free_low_memory_core_early()
>>    for_each_reserved_mem_region()
>>     reserve_bootmem_region()
>>      init_reserved_page() <- Only if this is deferred reserved page
>>       __init_single_pfn()
>>        __init_single_page()
>>            memset(0) <-- Lose the set fields here!
> OK, I have missed that part. Please make it explicit in the changelog.
> It is quite easy to get lost in the deep call chains.

Ok, will update comment.

Patch

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 136422d7d539..1e863baec847 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1165,12 +1165,17 @@  void __init mem_init(void)
 
 	/* clear_bss() already clear the empty_zero_page */
 
-	register_page_bootmem_info();
-
 	/* this will put all memory onto the freelists */
 	free_all_bootmem();
 	after_bootmem = 1;
 
+	/* Must be done after boot memory is put on freelist, because here we
+	 * might set fields in deferred struct pages that have not yet been
+	 * initialized, and free_all_bootmem() initializes all the reserved
+	 * deferred pages for us.
+	 */
+	register_page_bootmem_info();
+
 	/* Register memory areas for /proc/kcore */
 	kclist_add(&kcore_vsyscall, (void *)VSYSCALL_ADDR,
 			 PAGE_SIZE, KCORE_OTHER);