[v7,05/15] mm/bootmem_info: Introduce {free,prepare}_vmemmap_page()

Message ID 20201130151838.11208-6-songmuchun@bytedance.com (mailing list archive)
State New, archived
Series Free some vmemmap pages of hugetlb page

Commit Message

Muchun Song Nov. 30, 2020, 3:18 p.m. UTC
In a later patch, we can use free_vmemmap_page() to free the unused
vmemmap pages and prepare_vmemmap_page() to initialize a page that
will be used as a vmemmap page.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/bootmem_info.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Comments

David Hildenbrand Dec. 7, 2020, 12:39 p.m. UTC | #1
On 30.11.20 16:18, Muchun Song wrote:
> In a later patch, we can use free_vmemmap_page() to free the unused
> vmemmap pages and prepare_vmemmap_page() to initialize a page that
> will be used as a vmemmap page.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/bootmem_info.h | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
> index 4ed6dee1adc9..239e3cc8f86c 100644
> --- a/include/linux/bootmem_info.h
> +++ b/include/linux/bootmem_info.h
> @@ -3,6 +3,7 @@
>  #define __LINUX_BOOTMEM_INFO_H
>  
>  #include <linux/mmzone.h>
> +#include <linux/mm.h>
>  
>  /*
>   * Types for free bootmem stored in page->lru.next. These have to be in
> @@ -22,6 +23,29 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat);
>  void get_page_bootmem(unsigned long info, struct page *page,
>  		      unsigned long type);
>  void put_page_bootmem(struct page *page);
> +
> +static inline void free_vmemmap_page(struct page *page)
> +{
> +	VM_WARN_ON(!PageReserved(page) || page_ref_count(page) != 2);
> +
> +	/* bootmem page has reserved flag in the reserve_bootmem_region */
> +	if (PageReserved(page)) {
> +		unsigned long magic = (unsigned long)page->freelist;
> +
> +		if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
> +			put_page_bootmem(page);
> +		else
> +			WARN_ON(1);
> +	}
> +}
> +
> +static inline void prepare_vmemmap_page(struct page *page)
> +{
> +	unsigned long section_nr = pfn_to_section_nr(page_to_pfn(page));
> +
> +	get_page_bootmem(section_nr, page, SECTION_INFO);
> +	mark_page_reserved(page);
> +}

Can you clarify in the description when exactly these functions are
called and on which type of pages?

Would indicating "bootmem" in the function names make it clearer what we
are dealing with?

E.g., any memory allocated via the memblock allocator and not via the
buddy will be marked reserved already in the memmap. It's unclear to me
why we need the mark_page_reserved() here - can you enlighten me? :)

Muchun Song Dec. 7, 2020, 1:23 p.m. UTC | #2
On Mon, Dec 7, 2020 at 8:39 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 30.11.20 16:18, Muchun Song wrote:
> > In a later patch, we can use free_vmemmap_page() to free the unused
> > vmemmap pages and prepare_vmemmap_page() to initialize a page that
> > will be used as a vmemmap page.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/bootmem_info.h | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
> > index 4ed6dee1adc9..239e3cc8f86c 100644
> > --- a/include/linux/bootmem_info.h
> > +++ b/include/linux/bootmem_info.h
> > @@ -3,6 +3,7 @@
> >  #define __LINUX_BOOTMEM_INFO_H
> >
> >  #include <linux/mmzone.h>
> > +#include <linux/mm.h>
> >
> >  /*
> >   * Types for free bootmem stored in page->lru.next. These have to be in
> > @@ -22,6 +23,29 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat);
> >  void get_page_bootmem(unsigned long info, struct page *page,
> >                     unsigned long type);
> >  void put_page_bootmem(struct page *page);
> > +
> > +static inline void free_vmemmap_page(struct page *page)
> > +{
> > +     VM_WARN_ON(!PageReserved(page) || page_ref_count(page) != 2);
> > +
> > +     /* bootmem page has reserved flag in the reserve_bootmem_region */
> > +     if (PageReserved(page)) {
> > +             unsigned long magic = (unsigned long)page->freelist;
> > +
> > +             if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
> > +                     put_page_bootmem(page);
> > +             else
> > +                     WARN_ON(1);
> > +     }
> > +}
> > +
> > +static inline void prepare_vmemmap_page(struct page *page)
> > +{
> > +     unsigned long section_nr = pfn_to_section_nr(page_to_pfn(page));
> > +
> > +     get_page_bootmem(section_nr, page, SECTION_INFO);
> > +     mark_page_reserved(page);
> > +}
>
> Can you clarify in the description when exactly these functions are
> called and on which type of pages?

Will do.

>
> Would indicating "bootmem" in the function names make it clearer what we
> are dealing with?
>
> E.g., any memory allocated via the memblock allocator and not via the
> buddy will be marked reserved already in the memmap. It's unclear to me
> why we need the mark_page_reserved() here - can you enlighten me? :)

Thanks a lot for your suggestions.

>
> --
> Thanks,
>
> David / dhildenb
>

Muchun Song Dec. 9, 2020, 7:36 a.m. UTC | #3
On Mon, Dec 7, 2020 at 8:39 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 30.11.20 16:18, Muchun Song wrote:
> > In a later patch, we can use free_vmemmap_page() to free the unused
> > vmemmap pages and prepare_vmemmap_page() to initialize a page that
> > will be used as a vmemmap page.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/bootmem_info.h | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
> > index 4ed6dee1adc9..239e3cc8f86c 100644
> > --- a/include/linux/bootmem_info.h
> > +++ b/include/linux/bootmem_info.h
> > @@ -3,6 +3,7 @@
> >  #define __LINUX_BOOTMEM_INFO_H
> >
> >  #include <linux/mmzone.h>
> > +#include <linux/mm.h>
> >
> >  /*
> >   * Types for free bootmem stored in page->lru.next. These have to be in
> > @@ -22,6 +23,29 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat);
> >  void get_page_bootmem(unsigned long info, struct page *page,
> >                     unsigned long type);
> >  void put_page_bootmem(struct page *page);
> > +
> > +static inline void free_vmemmap_page(struct page *page)
> > +{
> > +     VM_WARN_ON(!PageReserved(page) || page_ref_count(page) != 2);
> > +
> > +     /* bootmem page has reserved flag in the reserve_bootmem_region */
> > +     if (PageReserved(page)) {
> > +             unsigned long magic = (unsigned long)page->freelist;
> > +
> > +             if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
> > +                     put_page_bootmem(page);
> > +             else
> > +                     WARN_ON(1);
> > +     }
> > +}
> > +
> > +static inline void prepare_vmemmap_page(struct page *page)
> > +{
> > +     unsigned long section_nr = pfn_to_section_nr(page_to_pfn(page));
> > +
> > +     get_page_bootmem(section_nr, page, SECTION_INFO);
> > +     mark_page_reserved(page);
> > +}
>
> Can you clarify in the description when exactly these functions are
> called and on which type of pages?
>
> Would indicating "bootmem" in the function names make it clearer what we
> are dealing with?
>
> E.g., any memory allocated via the memblock allocator and not via the
> buddy will be marked reserved already in the memmap. It's unclear to me
> why we need the mark_page_reserved() here - can you enlighten me? :)

Sorry for missing this question. The vmemmap pages are allocated from the
bootmem allocator, so they are marked PG_reserved. To free those bootmem
pages, we should call put_page_bootmem(), which clears PG_reserved. To be
consistent, prepare_vmemmap_page() also marks the page PG_reserved.

Thanks.

>
> --
> Thanks,
>
> David / dhildenb
>

David Hildenbrand Dec. 9, 2020, 8:49 a.m. UTC | #4
On 09.12.20 08:36, Muchun Song wrote:
> On Mon, Dec 7, 2020 at 8:39 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 30.11.20 16:18, Muchun Song wrote:
>>> In a later patch, we can use free_vmemmap_page() to free the unused
>>> vmemmap pages and prepare_vmemmap_page() to initialize a page that
>>> will be used as a vmemmap page.
>>>
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> ---
>>>  include/linux/bootmem_info.h | 24 ++++++++++++++++++++++++
>>>  1 file changed, 24 insertions(+)
>>>
>>> diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
>>> index 4ed6dee1adc9..239e3cc8f86c 100644
>>> --- a/include/linux/bootmem_info.h
>>> +++ b/include/linux/bootmem_info.h
>>> @@ -3,6 +3,7 @@
>>>  #define __LINUX_BOOTMEM_INFO_H
>>>
>>>  #include <linux/mmzone.h>
>>> +#include <linux/mm.h>
>>>
>>>  /*
>>>   * Types for free bootmem stored in page->lru.next. These have to be in
>>> @@ -22,6 +23,29 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat);
>>>  void get_page_bootmem(unsigned long info, struct page *page,
>>>                     unsigned long type);
>>>  void put_page_bootmem(struct page *page);
>>> +
>>> +static inline void free_vmemmap_page(struct page *page)
>>> +{
>>> +     VM_WARN_ON(!PageReserved(page) || page_ref_count(page) != 2);
>>> +
>>> +     /* bootmem page has reserved flag in the reserve_bootmem_region */
>>> +     if (PageReserved(page)) {
>>> +             unsigned long magic = (unsigned long)page->freelist;
>>> +
>>> +             if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
>>> +                     put_page_bootmem(page);
>>> +             else
>>> +                     WARN_ON(1);
>>> +     }
>>> +}
>>> +
>>> +static inline void prepare_vmemmap_page(struct page *page)
>>> +{
>>> +     unsigned long section_nr = pfn_to_section_nr(page_to_pfn(page));
>>> +
>>> +     get_page_bootmem(section_nr, page, SECTION_INFO);
>>> +     mark_page_reserved(page);
>>> +}
>>
>> Can you clarify in the description when exactly these functions are
>> called and on which type of pages?
>>
>> Would indicating "bootmem" in the function names make it clearer what we
>> are dealing with?
>>
>> E.g., any memory allocated via the memblock allocator and not via the
>> buddy will be marked reserved already in the memmap. It's unclear to me
>> why we need the mark_page_reserved() here - can you enlighten me? :)
> 
> Sorry for missing this question. The vmemmap pages are allocated from the
> bootmem allocator, so they are marked PG_reserved. To free those bootmem
> pages, we should call put_page_bootmem(), which clears PG_reserved. To be
> consistent, prepare_vmemmap_page() also marks the page PG_reserved.

I don't think that really makes sense.

After put_page_bootmem() puts the last reference, it clears PG_reserved
and hands the page over to the buddy via free_reserved_page(). From that
point on, any further get_page_bootmem() would be completely wrong and
dangerous.

Both put_page_bootmem() and get_page_bootmem() rely on the fact that
they are dealing with memblock allocations - marked via PG_reserved. If
prepare_vmemmap_page() were called on something that's *not* coming
from the memblock allocator, it would be completely broken - or am I
missing something?

AFAICT, there should rather be a BUG_ON(!PageReserved(page)) in
prepare_vmemmap_page() - or proper handling to deal with !memblock
allocations.


And as I said, indicating "bootmem" as part of the function names might
make it clearer that this is not for getting any vmemmap pages (esp.
allocated when hotplugging memory).
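
The lifecycle described in this reply can be modeled in userspace. The
sketch below is a toy model only, not kernel code: struct toy_page, the
toy_* functions and the TOY_SECTION_INFO value are all made up for
illustration. It assumes that put_page_bootmem() releases the page once
the count drops back to 1, which is what the page_ref_count(page) != 2
check in the patch suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; only the state discussed here is modeled. */
struct toy_page {
	bool reserved;       /* models PG_reserved */
	int refcount;        /* models page_ref_count() */
	unsigned long magic; /* models the type stored in page->freelist */
	bool in_buddy;       /* true once the page was handed to the buddy */
};

/* Placeholder value; the real SECTION_INFO is defined in bootmem_info.h. */
#define TOY_SECTION_INFO 1UL

/* A page as memblock leaves it after boot: reserved, one reference. */
struct toy_page toy_memblock_page(void)
{
	struct toy_page p = { .reserved = true, .refcount = 1 };
	return p;
}

/* Models get_page_bootmem(): record the type and take a reference. */
void toy_get_page_bootmem(struct toy_page *p, unsigned long magic)
{
	p->magic = magic;
	p->refcount++;
}

/*
 * Models put_page_bootmem(): once only the initial reference remains,
 * the page is released the way free_reserved_page() would release it,
 * so PG_reserved is cleared and the buddy owns the page.
 */
void toy_put_page_bootmem(struct toy_page *p)
{
	if (--p->refcount == 1) {
		p->magic = 0;
		p->reserved = false;
		p->refcount = 0;
		p->in_buddy = true;
	}
}

/* The precondition asked for above: only reserved (memblock) pages qualify. */
bool toy_may_get_page_bootmem(const struct toy_page *p)
{
	return p->reserved && !p->in_buddy;
}
```

In this model a memblock page passes the check before the get/put pair
and fails it afterwards, which is the situation a
BUG_ON(!PageReserved(page)) in prepare_vmemmap_page() would catch.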

Muchun Song Dec. 9, 2020, 9:25 a.m. UTC | #5
On Wed, Dec 9, 2020 at 4:50 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 09.12.20 08:36, Muchun Song wrote:
> > On Mon, Dec 7, 2020 at 8:39 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 30.11.20 16:18, Muchun Song wrote:
> >>> In a later patch, we can use free_vmemmap_page() to free the unused
> >>> vmemmap pages and prepare_vmemmap_page() to initialize a page that
> >>> will be used as a vmemmap page.
> >>>
> >>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >>> ---
> >>>  include/linux/bootmem_info.h | 24 ++++++++++++++++++++++++
> >>>  1 file changed, 24 insertions(+)
> >>>
> >>> diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
> >>> index 4ed6dee1adc9..239e3cc8f86c 100644
> >>> --- a/include/linux/bootmem_info.h
> >>> +++ b/include/linux/bootmem_info.h
> >>> @@ -3,6 +3,7 @@
> >>>  #define __LINUX_BOOTMEM_INFO_H
> >>>
> >>>  #include <linux/mmzone.h>
> >>> +#include <linux/mm.h>
> >>>
> >>>  /*
> >>>   * Types for free bootmem stored in page->lru.next. These have to be in
> >>> @@ -22,6 +23,29 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat);
> >>>  void get_page_bootmem(unsigned long info, struct page *page,
> >>>                     unsigned long type);
> >>>  void put_page_bootmem(struct page *page);
> >>> +
> >>> +static inline void free_vmemmap_page(struct page *page)
> >>> +{
> >>> +     VM_WARN_ON(!PageReserved(page) || page_ref_count(page) != 2);
> >>> +
> >>> +     /* bootmem page has reserved flag in the reserve_bootmem_region */
> >>> +     if (PageReserved(page)) {
> >>> +             unsigned long magic = (unsigned long)page->freelist;
> >>> +
> >>> +             if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
> >>> +                     put_page_bootmem(page);
> >>> +             else
> >>> +                     WARN_ON(1);
> >>> +     }
> >>> +}
> >>> +
> >>> +static inline void prepare_vmemmap_page(struct page *page)
> >>> +{
> >>> +     unsigned long section_nr = pfn_to_section_nr(page_to_pfn(page));
> >>> +
> >>> +     get_page_bootmem(section_nr, page, SECTION_INFO);
> >>> +     mark_page_reserved(page);
> >>> +}
> >>
> >> Can you clarify in the description when exactly these functions are
> >> called and on which type of pages?
> >>
> >> Would indicating "bootmem" in the function names make it clearer what we
> >> are dealing with?
> >>
> >> E.g., any memory allocated via the memblock allocator and not via the
> >> buddy will be marked reserved already in the memmap. It's unclear to me
> >> why we need the mark_page_reserved() here - can you enlighten me? :)
> >
> > Sorry for missing this question. The vmemmap pages are allocated from the
> > bootmem allocator, so they are marked PG_reserved. To free those bootmem
> > pages, we should call put_page_bootmem(), which clears PG_reserved. To be
> > consistent, prepare_vmemmap_page() also marks the page PG_reserved.
>
> I don't think that really makes sense.
>
> After put_page_bootmem() puts the last reference, it clears PG_reserved
> and hands the page over to the buddy via free_reserved_page(). From that
> point on, any further get_page_bootmem() would be completely wrong and
> dangerous.
>
> Both put_page_bootmem() and get_page_bootmem() rely on the fact that
> they are dealing with memblock allocations - marked via PG_reserved. If
> prepare_vmemmap_page() were called on something that's *not* coming
> from the memblock allocator, it would be completely broken - or am I
> missing something?
>
> AFAICT, there should rather be a BUG_ON(!PageReserved(page)) in
> prepare_vmemmap_page() - or proper handling to deal with !memblock
> allocations.
>

I want to allocate some pages to serve as the vmemmap when
we free a HugeTLB page to the buddy allocator. So I use
prepare_vmemmap_page() to initialize the page (which is
allocated from the buddy allocator) and make it the vmemmap
of the freed HugeTLB page.

Any suggestions to deal with this case?

I have a solution to address this. When pages are allocated
from the buddy as vmemmap pages, we do not call
prepare_vmemmap_page().

When we free some vmemmap pages of a HugeTLB
page, if PG_reserved is set on the vmemmap page,
we call free_vmemmap_page() to free it to the buddy;
otherwise we call free_page(). What is your opinion?

>
> And as I said, indicating "bootmem" as part of the function names might
> make it clearer that this is not for getting any vmemmap pages (esp.
> allocated when hotplugging memory).

Agreed. I will do that in the next version.

>
> --
> Thanks,
>
> David / dhildenb
>

David Hildenbrand Dec. 9, 2020, 9:32 a.m. UTC | #6
On 09.12.20 10:25, Muchun Song wrote:
> On Wed, Dec 9, 2020 at 4:50 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 09.12.20 08:36, Muchun Song wrote:
>>> On Mon, Dec 7, 2020 at 8:39 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 30.11.20 16:18, Muchun Song wrote:
>>>>> In a later patch, we can use free_vmemmap_page() to free the unused
>>>>> vmemmap pages and prepare_vmemmap_page() to initialize a page that
>>>>> will be used as a vmemmap page.
>>>>>
>>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>>> ---
>>>>>  include/linux/bootmem_info.h | 24 ++++++++++++++++++++++++
>>>>>  1 file changed, 24 insertions(+)
>>>>>
>>>>> diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
>>>>> index 4ed6dee1adc9..239e3cc8f86c 100644
>>>>> --- a/include/linux/bootmem_info.h
>>>>> +++ b/include/linux/bootmem_info.h
>>>>> @@ -3,6 +3,7 @@
>>>>>  #define __LINUX_BOOTMEM_INFO_H
>>>>>
>>>>>  #include <linux/mmzone.h>
>>>>> +#include <linux/mm.h>
>>>>>
>>>>>  /*
>>>>>   * Types for free bootmem stored in page->lru.next. These have to be in
>>>>> @@ -22,6 +23,29 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat);
>>>>>  void get_page_bootmem(unsigned long info, struct page *page,
>>>>>                     unsigned long type);
>>>>>  void put_page_bootmem(struct page *page);
>>>>> +
>>>>> +static inline void free_vmemmap_page(struct page *page)
>>>>> +{
>>>>> +     VM_WARN_ON(!PageReserved(page) || page_ref_count(page) != 2);
>>>>> +
>>>>> +     /* bootmem page has reserved flag in the reserve_bootmem_region */
>>>>> +     if (PageReserved(page)) {
>>>>> +             unsigned long magic = (unsigned long)page->freelist;
>>>>> +
>>>>> +             if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
>>>>> +                     put_page_bootmem(page);
>>>>> +             else
>>>>> +                     WARN_ON(1);
>>>>> +     }
>>>>> +}
>>>>> +
>>>>> +static inline void prepare_vmemmap_page(struct page *page)
>>>>> +{
>>>>> +     unsigned long section_nr = pfn_to_section_nr(page_to_pfn(page));
>>>>> +
>>>>> +     get_page_bootmem(section_nr, page, SECTION_INFO);
>>>>> +     mark_page_reserved(page);
>>>>> +}
>>>>
>>>> Can you clarify in the description when exactly these functions are
>>>> called and on which type of pages?
>>>>
>>>> Would indicating "bootmem" in the function names make it clearer what we
>>>> are dealing with?
>>>>
>>>> E.g., any memory allocated via the memblock allocator and not via the
>>>> buddy will be marked reserved already in the memmap. It's unclear to me
>>>> why we need the mark_page_reserved() here - can you enlighten me? :)
>>>
>>> Sorry for missing this question. The vmemmap pages are allocated from the
>>> bootmem allocator, so they are marked PG_reserved. To free those bootmem
>>> pages, we should call put_page_bootmem(), which clears PG_reserved. To be
>>> consistent, prepare_vmemmap_page() also marks the page PG_reserved.
>>
>> I don't think that really makes sense.
>>
>> After put_page_bootmem() puts the last reference, it clears PG_reserved
>> and hands the page over to the buddy via free_reserved_page(). From that
>> point on, any further get_page_bootmem() would be completely wrong and
>> dangerous.
>>
>> Both put_page_bootmem() and get_page_bootmem() rely on the fact that
>> they are dealing with memblock allocations - marked via PG_reserved. If
>> prepare_vmemmap_page() were called on something that's *not* coming
>> from the memblock allocator, it would be completely broken - or am I
>> missing something?
>>
>> AFAICT, there should rather be a BUG_ON(!PageReserved(page)) in
>> prepare_vmemmap_page() - or proper handling to deal with !memblock
>> allocations.
>>
> 
> I want to allocate some pages to serve as the vmemmap when
> we free a HugeTLB page to the buddy allocator. So I use
> prepare_vmemmap_page() to initialize the page (which is
> allocated from the buddy allocator) and make it the vmemmap
> of the freed HugeTLB page.
> 
> Any suggestions to deal with this case?

If you obtained pages via the buddy, there shouldn't be anything special
to handle, no? What speaks against


prepare_vmemmap_page():
if (!PageReserved(page))
	return;


put_page_bootmem():
if (!PageReserved(page))
	__free_page();


Or if we care about multiple references, get_page() and put_page().

> 
> I have a solution to address this. When pages are allocated
> from the buddy as vmemmap pages, we do not call
> prepare_vmemmap_page().
>
> When we free some vmemmap pages of a HugeTLB
> page, if PG_reserved is set on the vmemmap page,
> we call free_vmemmap_page() to free it to the buddy;
> otherwise we call free_page(). What is your opinion?

That would also work. Then, please include "bootmem" as part of the
function name. If you plan on using my suggestion, you can drop
"bootmem" from the name as it works for both types of pages.
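
The approach of dispatching on PG_reserved can be modeled in userspace
as well. This is a toy sketch under stated assumptions, not kernel code:
struct toy_page, enum toy_free_path and the toy_* functions are
hypothetical names, and the bootmem branch only mimics the reference
counting that put_page_bootmem() is described as doing in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page with just the two fields that matter here. */
struct toy_page {
	bool reserved; /* models PG_reserved */
	int refcount;  /* models page_ref_count() */
};

/* Which release path the toy free function took. */
enum toy_free_path {
	TOY_VIA_PUT_PAGE_BOOTMEM,
	TOY_VIA_FREE_PAGE,
};

/*
 * Prepare step with the early return: buddy pages need no bootmem
 * accounting, so only reserved pages take the extra reference.
 * Returns true if the bootmem reference was taken.
 */
bool toy_prepare_vmemmap_page(struct toy_page *p)
{
	if (!p->reserved)
		return false;
	p->refcount++;
	return true;
}

/*
 * Free step that works for both kinds of pages: reserved pages go
 * through the bootmem-style put, everything else is freed directly.
 */
enum toy_free_path toy_free_vmemmap_page(struct toy_page *p)
{
	if (p->reserved) {
		if (--p->refcount == 1) {
			/* last bootmem reference: hand over to the buddy */
			p->reserved = false;
			p->refcount = 0;
		}
		return TOY_VIA_PUT_PAGE_BOOTMEM;
	}
	p->refcount = 0; /* plain buddy page, freed directly */
	return TOY_VIA_FREE_PAGE;
}
```

With both checks in place the caller does not need to know where a
vmemmap page came from, which is why the "bootmem" prefix could be
dropped from the function names in that variant.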

Muchun Song Dec. 9, 2020, 9:43 a.m. UTC | #7
On Wed, Dec 9, 2020 at 5:33 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 09.12.20 10:25, Muchun Song wrote:
> > On Wed, Dec 9, 2020 at 4:50 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 09.12.20 08:36, Muchun Song wrote:
> >>> On Mon, Dec 7, 2020 at 8:39 PM David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 30.11.20 16:18, Muchun Song wrote:
> >>>>> In a later patch, we can use free_vmemmap_page() to free the unused
> >>>>> vmemmap pages and prepare_vmemmap_page() to initialize a page that
> >>>>> will be used as a vmemmap page.
> >>>>>
> >>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >>>>> ---
> >>>>>  include/linux/bootmem_info.h | 24 ++++++++++++++++++++++++
> >>>>>  1 file changed, 24 insertions(+)
> >>>>>
> >>>>> diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
> >>>>> index 4ed6dee1adc9..239e3cc8f86c 100644
> >>>>> --- a/include/linux/bootmem_info.h
> >>>>> +++ b/include/linux/bootmem_info.h
> >>>>> @@ -3,6 +3,7 @@
> >>>>>  #define __LINUX_BOOTMEM_INFO_H
> >>>>>
> >>>>>  #include <linux/mmzone.h>
> >>>>> +#include <linux/mm.h>
> >>>>>
> >>>>>  /*
> >>>>>   * Types for free bootmem stored in page->lru.next. These have to be in
> >>>>> @@ -22,6 +23,29 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat);
> >>>>>  void get_page_bootmem(unsigned long info, struct page *page,
> >>>>>                     unsigned long type);
> >>>>>  void put_page_bootmem(struct page *page);
> >>>>> +
> >>>>> +static inline void free_vmemmap_page(struct page *page)
> >>>>> +{
> >>>>> +     VM_WARN_ON(!PageReserved(page) || page_ref_count(page) != 2);
> >>>>> +
> >>>>> +     /* bootmem page has reserved flag in the reserve_bootmem_region */
> >>>>> +     if (PageReserved(page)) {
> >>>>> +             unsigned long magic = (unsigned long)page->freelist;
> >>>>> +
> >>>>> +             if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
> >>>>> +                     put_page_bootmem(page);
> >>>>> +             else
> >>>>> +                     WARN_ON(1);
> >>>>> +     }
> >>>>> +}
> >>>>> +
> >>>>> +static inline void prepare_vmemmap_page(struct page *page)
> >>>>> +{
> >>>>> +     unsigned long section_nr = pfn_to_section_nr(page_to_pfn(page));
> >>>>> +
> >>>>> +     get_page_bootmem(section_nr, page, SECTION_INFO);
> >>>>> +     mark_page_reserved(page);
> >>>>> +}
> >>>>
> >>>> Can you clarify in the description when exactly these functions are
> >>>> called and on which type of pages?
> >>>>
> >>>> Would indicating "bootmem" in the function names make it clearer what we
> >>>> are dealing with?
> >>>>
> >>>> E.g., any memory allocated via the memblock allocator and not via the
> >>>> buddy will be marked reserved already in the memmap. It's unclear to me
> >>>> why we need the mark_page_reserved() here - can you enlighten me? :)
> >>>
> >>> Sorry for missing this question. The vmemmap pages are allocated from the
> >>> bootmem allocator, so they are marked PG_reserved. To free those bootmem
> >>> pages, we should call put_page_bootmem(), which clears PG_reserved. To be
> >>> consistent, prepare_vmemmap_page() also marks the page PG_reserved.
> >>
> >> I don't think that really makes sense.
> >>
> >> After put_page_bootmem() puts the last reference, it clears PG_reserved
> >> and hands the page over to the buddy via free_reserved_page(). From that
> >> point on, any further get_page_bootmem() would be completely wrong and
> >> dangerous.
> >>
> >> Both put_page_bootmem() and get_page_bootmem() rely on the fact that
> >> they are dealing with memblock allocations - marked via PG_reserved. If
> >> prepare_vmemmap_page() were called on something that's *not* coming
> >> from the memblock allocator, it would be completely broken - or am I
> >> missing something?
> >>
> >> AFAICT, there should rather be a BUG_ON(!PageReserved(page)) in
> >> prepare_vmemmap_page() - or proper handling to deal with !memblock
> >> allocations.
> >>
> >
> > I want to allocate some pages to serve as the vmemmap when
> > we free a HugeTLB page to the buddy allocator. So I use
> > prepare_vmemmap_page() to initialize the page (which is
> > allocated from the buddy allocator) and make it the vmemmap
> > of the freed HugeTLB page.
> >
> > Any suggestions to deal with this case?
>
> If you obtained pages via the buddy, there shouldn't be anything special
> to handle, no? What speaks against
>
>
> prepare_vmemmap_page():
> if (!PageReserved(page))
>         return;
>
>
> put_page_bootmem():
> if (!PageReserved(page))
>         __free_page();
>

Thanks.

>
> Or if we care about multiple references, get_page() and put_page().
>
> >
> > I have a solution to address this. When pages are allocated
> > from the buddy as vmemmap pages, we do not call
> > prepare_vmemmap_page().
> >
> > When we free some vmemmap pages of a HugeTLB
> > page, if PG_reserved is set on the vmemmap page,
> > we call free_vmemmap_page() to free it to the buddy;
> > otherwise we call free_page(). What is your opinion?
>
> That would also work. Then, please include "bootmem" as part of the
> function name. If you plan on using my suggestion, you can drop
> "bootmem" from the name as it works for both types of pages.
>

Agreed. Thanks.

>
> --
> Thanks,
>
> David / dhildenb
>

Patch

diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
index 4ed6dee1adc9..239e3cc8f86c 100644
--- a/include/linux/bootmem_info.h
+++ b/include/linux/bootmem_info.h
@@ -3,6 +3,7 @@ 
 #define __LINUX_BOOTMEM_INFO_H
 
 #include <linux/mmzone.h>
+#include <linux/mm.h>
 
 /*
  * Types for free bootmem stored in page->lru.next. These have to be in
@@ -22,6 +23,29 @@  void __init register_page_bootmem_info_node(struct pglist_data *pgdat);
 void get_page_bootmem(unsigned long info, struct page *page,
 		      unsigned long type);
 void put_page_bootmem(struct page *page);
+
+static inline void free_vmemmap_page(struct page *page)
+{
+	VM_WARN_ON(!PageReserved(page) || page_ref_count(page) != 2);
+
+	/* bootmem page has reserved flag in the reserve_bootmem_region */
+	if (PageReserved(page)) {
+		unsigned long magic = (unsigned long)page->freelist;
+
+		if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
+			put_page_bootmem(page);
+		else
+			WARN_ON(1);
+	}
+}
+
+static inline void prepare_vmemmap_page(struct page *page)
+{
+	unsigned long section_nr = pfn_to_section_nr(page_to_pfn(page));
+
+	get_page_bootmem(section_nr, page, SECTION_INFO);
+	mark_page_reserved(page);
+}
 #else
 static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {