
[v3,1/5] mm: memory_hotplug: remove head variable in do_migrate_range()

Message ID 20240827114728.3212578-2-wangkefeng.wang@huawei.com (mailing list archive)
State New
Series mm: memory_hotplug: improve do_migrate_range()

Commit Message

Kefeng Wang Aug. 27, 2024, 11:47 a.m. UTC
Directly use a folio for HugeTLB and THP when calculating the next pfn, then
remove the unused head variable.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/memory_hotplug.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

Comments

Matthew Wilcox Sept. 28, 2024, 4:55 a.m. UTC | #1
On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
> remove unused head variable.

I just noticed this got merged.  You're going to hit BUG_ON with it.

> -		if (PageHuge(page)) {
> -			pfn = page_to_pfn(head) + compound_nr(head) - 1;
> -			isolate_hugetlb(folio, &source);
> -			continue;
> -		} else if (PageTransHuge(page))
> -			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
> +		/*
> +		 * No reference or lock is held on the folio, so it might
> +		 * be modified concurrently (e.g. split).  As such,
> +		 * folio_nr_pages() may read garbage.  This is fine as the outer
> +		 * loop will revisit the split folio later.
> +		 */
> +		if (folio_test_large(folio)) {

But it's not fine.  Look at the implementation of folio_test_large():

static inline bool folio_test_large(const struct folio *folio)
{
        return folio_test_head(folio);
}

That's going to be provided by:

#define FOLIO_TEST_FLAG(name, page)                                     \
static __always_inline bool folio_test_##name(const struct folio *folio) \
{ return test_bit(PG_##name, const_folio_flags(folio, page)); }

and here's the BUG:

static const unsigned long *const_folio_flags(const struct folio *folio,
                unsigned n)
{
        const struct page *page = &folio->page;

        VM_BUG_ON_PGFLAGS(PageTail(page), page);
        VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
        return &page[n].flags;
}

(this page can be transformed from a head page to a tail page because,
as the comment notes, we don't hold a reference.)

Please back this out.
David Hildenbrand Sept. 28, 2024, 8:34 a.m. UTC | #2
On 28.09.24 06:55, Matthew Wilcox wrote:
> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>> remove unused head variable.
> 
> I just noticed this got merged.  You're going to hit BUG_ON with it.
> 
>> -		if (PageHuge(page)) {
>> -			pfn = page_to_pfn(head) + compound_nr(head) - 1;
>> -			isolate_hugetlb(folio, &source);
>> -			continue;
>> -		} else if (PageTransHuge(page))
>> -			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>> +		/*
>> +		 * No reference or lock is held on the folio, so it might
>> +		 * be modified concurrently (e.g. split).  As such,
>> +		 * folio_nr_pages() may read garbage.  This is fine as the outer
>> +		 * loop will revisit the split folio later.
>> +		 */
>> +		if (folio_test_large(folio)) {
> 
> But it's not fine.  Look at the implementation of folio_test_large():
> 
> static inline bool folio_test_large(const struct folio *folio)
> {
>          return folio_test_head(folio);
> }
> 
> That's going to be provided by:
> 
> #define FOLIO_TEST_FLAG(name, page)                                     \
> static __always_inline bool folio_test_##name(const struct folio *folio) \
> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
> 
> and here's the BUG:
> 
> static const unsigned long *const_folio_flags(const struct folio *folio,
>                  unsigned n)
> {
>          const struct page *page = &folio->page;
> 
>          VM_BUG_ON_PGFLAGS(PageTail(page), page);
>          VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>          return &page[n].flags;
> }
> 
> (this page can be transformed from a head page to a tail page because,
> as the comment notes, we don't hold a reference.
> 
> Please back this out.

Should we generalize the approach in dump_folio() to locally copy a 
folio, so we can safely perform checks before deciding whether we want 
to try grabbing a reference on the real folio (if it's still a folio :) )?
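
For illustration only, a minimal sketch of that "local copy" idea
(snapshot_is_large_folio() is a made-up name, not an existing kernel helper;
the real dump_page() snapshotting code is more involved):

static bool snapshot_is_large_folio(const struct page *page)
{
        struct page snap;

        /*
         * Work on a stack copy: flag tests on the copy cannot race with a
         * concurrent head->tail transition of the real page, so the
         * VM_BUG_ON_PGFLAGS() checks in const_folio_flags() never see a
         * tail page.
         */
        memcpy(&snap, page, sizeof(snap));

        return !PageTail(&snap) && test_bit(PG_head, &snap.flags);
}

If the copy looks large, we could then folio_try_get() the real folio and
re-check before trusting folio_nr_pages().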
David Hildenbrand Sept. 28, 2024, 8:39 a.m. UTC | #3
On 28.09.24 10:34, David Hildenbrand wrote:
> On 28.09.24 06:55, Matthew Wilcox wrote:
>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>> remove unused head variable.
>>
>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>
>>> -		if (PageHuge(page)) {
>>> -			pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>> -			isolate_hugetlb(folio, &source);
>>> -			continue;
>>> -		} else if (PageTransHuge(page))
>>> -			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>> +		/*
>>> +		 * No reference or lock is held on the folio, so it might
>>> +		 * be modified concurrently (e.g. split).  As such,
>>> +		 * folio_nr_pages() may read garbage.  This is fine as the outer
>>> +		 * loop will revisit the split folio later.
>>> +		 */
>>> +		if (folio_test_large(folio)) {
>>
>> But it's not fine.  Look at the implementation of folio_test_large():
>>
>> static inline bool folio_test_large(const struct folio *folio)
>> {
>>           return folio_test_head(folio);
>> }
>>
>> That's going to be provided by:
>>
>> #define FOLIO_TEST_FLAG(name, page)                                     \
>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>
>> and here's the BUG:
>>
>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>                   unsigned n)
>> {
>>           const struct page *page = &folio->page;
>>
>>           VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>           VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>           return &page[n].flags;
>> }
>>
>> (this page can be transformed from a head page to a tail page because,
>> as the comment notes, we don't hold a reference.
>>
>> Please back this out.
> 
> Should we generalize the approach in dump_folio() to locally copy a
> folio, so we can safely perform checks before deciding whether we want
> to try grabbing a reference on the real folio (if it's still a folio :) )?
> 

Oh, and I forgot: isn't the existing code already racy?

PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);
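
For reference, PageTransHuge() is roughly (exact form may differ between
kernel versions):

static inline int PageTransHuge(struct page *page)
{
        VM_BUG_ON_PAGE(PageTail(page), page);
        return PageHead(page);
}

so the old code already trips the same class of VM_BUG_ON if the pfn has
concurrently become a tail page.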
Miaohe Lin Sept. 29, 2024, 1:16 a.m. UTC | #4
On 2024/9/28 16:39, David Hildenbrand wrote:
> On 28.09.24 10:34, David Hildenbrand wrote:
>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>>> remove unused head variable.
>>>
>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>
>>>> -        if (PageHuge(page)) {
>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>> -            isolate_hugetlb(folio, &source);
>>>> -            continue;
>>>> -        } else if (PageTransHuge(page))
>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>> +        /*
>>>> +         * No reference or lock is held on the folio, so it might
>>>> +         * be modified concurrently (e.g. split).  As such,
>>>> +         * folio_nr_pages() may read garbage.  This is fine as the outer
>>>> +         * loop will revisit the split folio later.
>>>> +         */
>>>> +        if (folio_test_large(folio)) {
>>>
>>> But it's not fine.  Look at the implementation of folio_test_large():
>>>
>>> static inline bool folio_test_large(const struct folio *folio)
>>> {
>>>           return folio_test_head(folio);
>>> }
>>>
>>> That's going to be provided by:
>>>
>>> #define FOLIO_TEST_FLAG(name, page)                                     \
>>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>
>>> and here's the BUG:
>>>
>>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>>                   unsigned n)
>>> {
>>>           const struct page *page = &folio->page;
>>>
>>>           VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>           VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>>           return &page[n].flags;
>>> }
>>>
>>> (this page can be transformed from a head page to a tail page because,
>>> as the comment notes, we don't hold a reference.
>>>
>>> Please back this out.
>>
>> Should we generalize the approach in dump_folio() to locally copy a
>> folio, so we can safely perform checks before deciding whether we want
>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>>
> 
> Oh, and I forgot: isn't the existing code already racy?
> 
> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);

do_migrate_range is called after start_isolate_page_range(). So a page might not be able to
transform from a head page to a tail page as it's isolated?

Thanks.
Kefeng Wang Sept. 29, 2024, 2:04 a.m. UTC | #5
On 2024/9/29 9:16, Miaohe Lin wrote:
> On 2024/9/28 16:39, David Hildenbrand wrote:
>> On 28.09.24 10:34, David Hildenbrand wrote:
>>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>>>> remove unused head variable.
>>>>
>>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>>
>>>>> -        if (PageHuge(page)) {
>>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>> -            isolate_hugetlb(folio, &source);
>>>>> -            continue;
>>>>> -        } else if (PageTransHuge(page))
>>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>> +        /*
>>>>> +         * No reference or lock is held on the folio, so it might
>>>>> +         * be modified concurrently (e.g. split).  As such,
>>>>> +         * folio_nr_pages() may read garbage.  This is fine as the outer
>>>>> +         * loop will revisit the split folio later.
>>>>> +         */
>>>>> +        if (folio_test_large(folio)) {
>>>>
>>>> But it's not fine.  Look at the implementation of folio_test_large():
>>>>
>>>> static inline bool folio_test_large(const struct folio *folio)
>>>> {
>>>>            return folio_test_head(folio);
>>>> }
>>>>
>>>> That's going to be provided by:
>>>>
>>>> #define FOLIO_TEST_FLAG(name, page)                                     \
>>>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>>
>>>> and here's the BUG:
>>>>
>>>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>>>                    unsigned n)
>>>> {
>>>>            const struct page *page = &folio->page;
>>>>
>>>>            VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>>            VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>>>            return &page[n].flags;
>>>> }
>>>>
>>>> (this page can be transformed from a head page to a tail page because,
>>>> as the comment notes, we don't hold a reference.
>>>>
>>>> Please back this out.
>>>
>>> Should we generalize the approach in dump_folio() to locally copy a
>>> folio, so we can safely perform checks before deciding whether we want
>>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>>>
>>
>> Oh, and I forgot: isn't the existing code already racy?
>>
>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);

Yes, in v1[1] I asked the same question about the existing
PageTransHuge(page) code:

   "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here,
    but it seems that we don't guarantee the page won't be a tail page."


We could delay the calculation until after we have taken a reference, but
the pfn traversal may slow down a little if we hit a tail pfn; is that
acceptable?

--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1786,15 +1786,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
                page = pfn_to_page(pfn);
                folio = page_folio(page);
 
-               /*
-                * No reference or lock is held on the folio, so it might
-                * be modified concurrently (e.g. split).  As such,
-                * folio_nr_pages() may read garbage.  This is fine as the outer
-                * loop will revisit the split folio later.
-                */
-               if (folio_test_large(folio))
-                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
-
                /*
                 * HWPoison pages have elevated reference counts so the migration would
                 * fail on them. It also doesn't make any sense to migrate them in the
@@ -1807,6 +1798,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
                                folio_isolate_lru(folio);
                        if (folio_mapped(folio))
                                unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
+                       if (folio_test_large(folio))
+                               pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
                        continue;
                }
 
@@ -1823,6 +1816,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
                                dump_page(page, "isolation failed");
                        }
                }
+
+               if (folio_test_large(folio))
+                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
 put_folio:
                folio_put(folio);
        }


> 
> do_migrate_range is called after start_isolate_page_range(). So a page might not be able to
> transform from a head page to a tail page as it's isolated?
start_isolate_page_range() only isolates free pages, so that may be irrelevant here.

> 
> Thanks.
> 

[1] 
https://lore.kernel.org/linux-mm/4e693aa6-d742-4fe7-bd97-3d375f96fcfa@huawei.com/
Miaohe Lin Sept. 29, 2024, 2:19 a.m. UTC | #6
On 2024/9/29 10:04, Kefeng Wang wrote:
> 
> 
> On 2024/9/29 9:16, Miaohe Lin wrote:
>> On 2024/9/28 16:39, David Hildenbrand wrote:
>>> On 28.09.24 10:34, David Hildenbrand wrote:
>>>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>>>>> remove unused head variable.
>>>>>
>>>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>>>
>>>>>> -        if (PageHuge(page)) {
>>>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>>> -            isolate_hugetlb(folio, &source);
>>>>>> -            continue;
>>>>>> -        } else if (PageTransHuge(page))
>>>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>>> +        /*
>>>>>> +         * No reference or lock is held on the folio, so it might
>>>>>> +         * be modified concurrently (e.g. split).  As such,
>>>>>> +         * folio_nr_pages() may read garbage.  This is fine as the outer
>>>>>> +         * loop will revisit the split folio later.
>>>>>> +         */
>>>>>> +        if (folio_test_large(folio)) {
>>>>>
>>>>> But it's not fine.  Look at the implementation of folio_test_large():
>>>>>
>>>>> static inline bool folio_test_large(const struct folio *folio)
>>>>> {
>>>>>            return folio_test_head(folio);
>>>>> }
>>>>>
>>>>> That's going to be provided by:
>>>>>
>>>>> #define FOLIO_TEST_FLAG(name, page)                                     \
>>>>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>>>
>>>>> and here's the BUG:
>>>>>
>>>>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>>>>                    unsigned n)
>>>>> {
>>>>>            const struct page *page = &folio->page;
>>>>>
>>>>>            VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>>>            VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>>>>            return &page[n].flags;
>>>>> }
>>>>>
>>>>> (this page can be transformed from a head page to a tail page because,
>>>>> as the comment notes, we don't hold a reference.
>>>>>
>>>>> Please back this out.
>>>>
>>>> Should we generalize the approach in dump_folio() to locally copy a
>>>> folio, so we can safely perform checks before deciding whether we want
>>>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>>>>
>>>
>>> Oh, and I forgot: isn't the existing code already racy?
>>>
>>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);
> 
> Yes, in v1[1], I asked same question for existing code for PageTransHuge(page),
> 
>   "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here,
>    but it seems that we don't guarantee the page won't be a tail page."
> 
> 
> we could delay the calculation after we got a ref, but the traversal of pfn may slow down a little if hint a tail pfn, is it acceptable?
> 
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1786,15 +1786,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>                 page = pfn_to_page(pfn);
>                 folio = page_folio(page);
> 
> -               /*
> -                * No reference or lock is held on the folio, so it might
> -                * be modified concurrently (e.g. split).  As such,
> -                * folio_nr_pages() may read garbage.  This is fine as the outer
> -                * loop will revisit the split folio later.
> -                */
> -               if (folio_test_large(folio))
> -                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
> -
>                 /*
>                  * HWPoison pages have elevated reference counts so the migration would
>                  * fail on them. It also doesn't make any sense to migrate them in the
> @@ -1807,6 +1798,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>                                 folio_isolate_lru(folio);
>                         if (folio_mapped(folio))
>                                 unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
> +                       if (folio_test_large(folio))
> +                               pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>                         continue;
>                 }
> 
> @@ -1823,6 +1816,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>                                 dump_page(page, "isolation failed");
>                         }
>                 }
> +
> +               if (folio_test_large(folio))
> +                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>  put_folio:
>                 folio_put(folio);
>         }
> 
> 
>>
>> do_migrate_range is called after start_isolate_page_range(). So a page might not be able to
>> transform from a head page to a tail page as it's isolated?
> start_isolate_page_range() is only isolate free pages, so maybe irrelevant.

A page would transform from a head page to a tail page through the steps below:
1. The compound page is freed into buddy.
2. It's merged into larger order in buddy.
3. It's allocated as a larger order compound page.

Since it is isolated, I think step 2 or 3 cannot happen. Or am I missing something?

Thanks.
David Hildenbrand Sept. 30, 2024, 9:25 a.m. UTC | #7
On 29.09.24 04:19, Miaohe Lin wrote:
> On 2024/9/29 10:04, Kefeng Wang wrote:
>>
>>
>> On 2024/9/29 9:16, Miaohe Lin wrote:
>>> On 2024/9/28 16:39, David Hildenbrand wrote:
>>>> On 28.09.24 10:34, David Hildenbrand wrote:
>>>>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>>>>>> remove unused head variable.
>>>>>>
>>>>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>>>>
>>>>>>> -        if (PageHuge(page)) {
>>>>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>>>> -            isolate_hugetlb(folio, &source);
>>>>>>> -            continue;
>>>>>>> -        } else if (PageTransHuge(page))
>>>>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>>>> +        /*
>>>>>>> +         * No reference or lock is held on the folio, so it might
>>>>>>> +         * be modified concurrently (e.g. split).  As such,
>>>>>>> +         * folio_nr_pages() may read garbage.  This is fine as the outer
>>>>>>> +         * loop will revisit the split folio later.
>>>>>>> +         */
>>>>>>> +        if (folio_test_large(folio)) {
>>>>>>
>>>>>> But it's not fine.  Look at the implementation of folio_test_large():
>>>>>>
>>>>>> static inline bool folio_test_large(const struct folio *folio)
>>>>>> {
>>>>>>             return folio_test_head(folio);
>>>>>> }
>>>>>>
>>>>>> That's going to be provided by:
>>>>>>
>>>>>> #define FOLIO_TEST_FLAG(name, page)                                     \
>>>>>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>>>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>>>>
>>>>>> and here's the BUG:
>>>>>>
>>>>>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>>>>>                     unsigned n)
>>>>>> {
>>>>>>             const struct page *page = &folio->page;
>>>>>>
>>>>>>             VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>>>>             VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>>>>>             return &page[n].flags;
>>>>>> }
>>>>>>
>>>>>> (this page can be transformed from a head page to a tail page because,
>>>>>> as the comment notes, we don't hold a reference.
>>>>>>
>>>>>> Please back this out.
>>>>>
>>>>> Should we generalize the approach in dump_folio() to locally copy a
>>>>> folio, so we can safely perform checks before deciding whether we want
>>>>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>>>>>
>>>>
>>>> Oh, and I forgot: isn't the existing code already racy?
>>>>
>>>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);
>>
>> Yes, in v1[1], I asked same question for existing code for PageTransHuge(page),
>>
>>    "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here,
>>     but it seems that we don't guarantee the page won't be a tail page."
>>
>>
>> we could delay the calculation after we got a ref, but the traversal of pfn may slow down a little if hint a tail pfn, is it acceptable?
>>
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1786,15 +1786,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>                  page = pfn_to_page(pfn);
>>                  folio = page_folio(page);
>>
>> -               /*
>> -                * No reference or lock is held on the folio, so it might
>> -                * be modified concurrently (e.g. split).  As such,
>> -                * folio_nr_pages() may read garbage.  This is fine as the outer
>> -                * loop will revisit the split folio later.
>> -                */
>> -               if (folio_test_large(folio))
>> -                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>> -
>>                  /*
>>                   * HWPoison pages have elevated reference counts so the migration would
>>                   * fail on them. It also doesn't make any sense to migrate them in the
>> @@ -1807,6 +1798,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>                                  folio_isolate_lru(folio);
>>                          if (folio_mapped(folio))
>>                                  unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
>> +                       if (folio_test_large(folio))
>> +                               pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>                          continue;
>>                  }
>>
>> @@ -1823,6 +1816,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>                                  dump_page(page, "isolation failed");
>>                          }
>>                  }
>> +
>> +               if (folio_test_large(folio))
>> +                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>   put_folio:
>>                  folio_put(folio);
>>          }
>>
>>
>>>
>>> do_migrate_range is called after start_isolate_page_range(). So a page might not be able to
>>> transform from a head page to a tail page as it's isolated?
>> start_isolate_page_range() is only isolate free pages, so maybe irrelevant.
> 
> A page transform from a head page to a tail page should through the below steps:
> 1. The compound page is freed into buddy.
> 2. It's merged into larger order in buddy.
> 3. It's allocated as a larger order compound page.
> 
> Since it is isolated, I think step 2 or 3 cannot happen. Or am I miss something?

By isolated, you mean that the pageblock is isolated, and all free pages 
are in the MIGRATE_ISOLATE buddy list. Nice observation.

Indeed, a tail page could become a head page (concurrent split is 
possible), but a head page should not become a tail for the reason you 
mention.

Even mm/page_reporting.c will skip isolated pageblocks.

I wonder if there are some corner cases, but nothing comes to mind that 
would perform compound allocations from the MIGRATE_ISOLATE list.
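
For reference, the skip mentioned above is roughly this check in
page_reporting_process_zone() in mm/page_reporting.c (paraphrased; loop
bounds and surrounding details differ between versions):

        for (mt = 0; mt < MIGRATE_TYPES; mt++) {
                /* We do not pull pages from the isolate free list */
                if (is_migrate_isolate(mt))
                        continue;

                err = page_reporting_cycle(prdev, zone, order, mt,
                                           sgl, &offset);
                ...
        }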
Kefeng Wang Oct. 9, 2024, 7:27 a.m. UTC | #8
On 2024/9/30 17:25, David Hildenbrand wrote:
> On 29.09.24 04:19, Miaohe Lin wrote:
>> On 2024/9/29 10:04, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/9/29 9:16, Miaohe Lin wrote:
>>>> On 2024/9/28 16:39, David Hildenbrand wrote:
>>>>> On 28.09.24 10:34, David Hildenbrand wrote:
>>>>>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>>>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>>>>>> Directly use a folio for HugeTLB and THP when calculate the next 
>>>>>>>> pfn, then
>>>>>>>> remove unused head variable.
>>>>>>>
>>>>>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>>>>>
>>>>>>>> -        if (PageHuge(page)) {
>>>>>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>>>>> -            isolate_hugetlb(folio, &source);
>>>>>>>> -            continue;
>>>>>>>> -        } else if (PageTransHuge(page))
>>>>>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>>>>> +        /*
>>>>>>>> +         * No reference or lock is held on the folio, so it might
>>>>>>>> +         * be modified concurrently (e.g. split).  As such,
>>>>>>>> +         * folio_nr_pages() may read garbage.  This is fine as 
>>>>>>>> the outer
>>>>>>>> +         * loop will revisit the split folio later.
>>>>>>>> +         */
>>>>>>>> +        if (folio_test_large(folio)) {
>>>>>>>
>>>>>>> But it's not fine.  Look at the implementation of 
>>>>>>> folio_test_large():
>>>>>>>
>>>>>>> static inline bool folio_test_large(const struct folio *folio)
>>>>>>> {
>>>>>>>             return folio_test_head(folio);
>>>>>>> }
>>>>>>>
>>>>>>> That's going to be provided by:
>>>>>>>
>>>>>>> #define FOLIO_TEST_FLAG(name, 
>>>>>>> page)                                     \
>>>>>>> static __always_inline bool folio_test_##name(const struct folio 
>>>>>>> *folio) \
>>>>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>>>>>
>>>>>>> and here's the BUG:
>>>>>>>
>>>>>>> static const unsigned long *const_folio_flags(const struct folio 
>>>>>>> *folio,
>>>>>>>                     unsigned n)
>>>>>>> {
>>>>>>>             const struct page *page = &folio->page;
>>>>>>>
>>>>>>>             VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>>>>>             VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page- 
>>>>>>> >flags), page);
>>>>>>>             return &page[n].flags;
>>>>>>> }
>>>>>>>
>>>>>>> (this page can be transformed from a head page to a tail page 
>>>>>>> because,
>>>>>>> as the comment notes, we don't hold a reference.
>>>>>>>
>>>>>>> Please back this out.
>>>>>>
>>>>>> Should we generalize the approach in dump_folio() to locally copy a
>>>>>> folio, so we can safely perform checks before deciding whether we 
>>>>>> want
>>>>>> to try grabbing a reference on the real folio (if it's still a 
>>>>>> folio :) )?
>>>>>>
>>>>>
>>>>> Oh, and I forgot: isn't the existing code already racy?
>>>>>
>>>>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);
>>>
>>> Yes, in v1[1], I asked same question for existing code for 
>>> PageTransHuge(page),
>>>
>>>    "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here,
>>>     but it seems that we don't guarantee the page won't be a tail page."
>>>
>>>
...
>>>>
>>>> do_migrate_range is called after start_isolate_page_range(). So a 
>>>> page might not be able to
>>>> transform from a head page to a tail page as it's isolated?
>>> start_isolate_page_range() is only isolate free pages, so maybe 
>>> irrelevant.
>>
>> A page transform from a head page to a tail page should through the 
>> below steps:
>> 1. The compound page is freed into buddy.
>> 2. It's merged into larger order in buddy.
>> 3. It's allocated as a larger order compound page.
>>
>> Since it is isolated, I think step 2 or 3 cannot happen. Or am I miss 
>> something?
> 
> By isolated, you mean that the pageblock is isolated, and all free pages 
> are in the MIGRATE_ISOLATE buddy list. Nice observation.
> 
> Indeed, a tail page could become a head page (concurrent split is 
> possible), but a head page should not become a tail for the reason you 
> mention.
> 
> Even mm/page_reporting.c will skip isolated pageblocks.
> 
> I wonder if there are some corner cases, but nothing comes to mind that 
> would perform compound allocations from the MIGRATE_ISOLATE list.
> 

As David and Miaohe explained, it seems that we can't hit the VM_BUG_ON,
so no further changes are needed; thanks for all the review and comments.

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 38ecd45c3f94..9ef776a25b9d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1773,7 +1773,7 @@  static int scan_movable_pages(unsigned long start, unsigned long end,
 static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
-	struct page *page, *head;
+	struct page *page;
 	LIST_HEAD(source);
 	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
@@ -1786,14 +1786,20 @@  static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		page = pfn_to_page(pfn);
 		folio = page_folio(page);
-		head = &folio->page;
 
-		if (PageHuge(page)) {
-			pfn = page_to_pfn(head) + compound_nr(head) - 1;
-			isolate_hugetlb(folio, &source);
-			continue;
-		} else if (PageTransHuge(page))
-			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
+		/*
+		 * No reference or lock is held on the folio, so it might
+		 * be modified concurrently (e.g. split).  As such,
+		 * folio_nr_pages() may read garbage.  This is fine as the outer
+		 * loop will revisit the split folio later.
+		 */
+		if (folio_test_large(folio)) {
+			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
+			if (folio_test_hugetlb(folio)) {
+				isolate_hugetlb(folio, &source);
+				continue;
+			}
+		}
 
 		/*
 		 * HWPoison pages have elevated reference counts so the migration would