Message ID: 20240827114728.3212578-2-wangkefeng.wang@huawei.com
State: New
Series: mm: memory_hotplug: improve do_migrate_range()
On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
> remove unused head variable.

I just noticed this got merged. You're going to hit BUG_ON with it.

> -		if (PageHuge(page)) {
> -			pfn = page_to_pfn(head) + compound_nr(head) - 1;
> -			isolate_hugetlb(folio, &source);
> -			continue;
> -		} else if (PageTransHuge(page))
> -			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
> +		/*
> +		 * No reference or lock is held on the folio, so it might
> +		 * be modified concurrently (e.g. split). As such,
> +		 * folio_nr_pages() may read garbage. This is fine as the outer
> +		 * loop will revisit the split folio later.
> +		 */
> +		if (folio_test_large(folio)) {

But it's not fine. Look at the implementation of folio_test_large():

static inline bool folio_test_large(const struct folio *folio)
{
	return folio_test_head(folio);
}

That's going to be provided by:

#define FOLIO_TEST_FLAG(name, page)					\
static __always_inline bool folio_test_##name(const struct folio *folio) \
{ return test_bit(PG_##name, const_folio_flags(folio, page)); }

and here's the BUG:

static const unsigned long *const_folio_flags(const struct folio *folio,
		unsigned n)
{
	const struct page *page = &folio->page;

	VM_BUG_ON_PGFLAGS(PageTail(page), page);
	VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
	return &page[n].flags;
}

(This page can be transformed from a head page to a tail page because,
as the comment notes, we don't hold a reference.)

Please back this out.
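For reference, the skip that both the old and the new code perform is plain pfn arithmetic: the last pfn of a folio of nr_pages pages starting at head_pfn is head_pfn + nr_pages - 1, so the loop's pfn++ then lands just past the folio. A userspace sketch (skip_large_folio() is an invented name for illustration, not a kernel helper):

```c
#include <assert.h>

/*
 * Toy model of the pfn advance in do_migrate_range(): when the scan
 * hits a large folio starting at head_pfn with nr_pages pages, it sets
 * pfn to the folio's last pfn so that the loop's pfn++ lands on the
 * first pfn past the folio.
 */
static unsigned long skip_large_folio(unsigned long head_pfn,
				      unsigned long nr_pages)
{
	return head_pfn + nr_pages - 1;
}
```

The whole dispute above is about whether nr_pages (and the "is it large?" test itself) can be read safely without holding a reference, not about this arithmetic.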
On 28.09.24 06:55, Matthew Wilcox wrote:
> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>> remove unused head variable.
>
> I just noticed this got merged. You're going to hit BUG_ON with it.
>
> [...]
>
> (this page can be transformed from a head page to a tail page because,
> as the comment notes, we don't hold a reference.
>
> Please back this out.

Should we generalize the approach in dump_folio() to locally copy a folio, so we can safely perform checks before deciding whether we want to try grabbing a reference on the real folio (if it's still a folio :) )?
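The dump_folio() idea amounts to "copy first, test the copy": snapshot the possibly-changing memory into a local struct and do all flag tests on the snapshot, which cannot change underneath us. A rough userspace sketch of that pattern, with invented toy types (toy_folio, TOY_PG_HEAD) standing in for the real structures:

```c
#include <assert.h>
#include <string.h>

/* userspace stand-in for struct folio: only flags matter here */
struct toy_folio {
	unsigned long flags;
};

#define TOY_PG_HEAD (1UL << 0)

/*
 * "Copy first, check the copy": memcpy the (possibly concurrently
 * modified) folio into a local snapshot, then test flags on the
 * snapshot. The snapshot may be stale, but it is internally stable,
 * so no assertion on it can trip mid-check.
 */
static int snapshot_is_large(const struct toy_folio *unstable)
{
	struct toy_folio snap;

	memcpy(&snap, unstable, sizeof(snap));
	return (snap.flags & TOY_PG_HEAD) != 0;
}
```

A stale answer is then handled the same way the loop already handles races: by re-checking after grabbing a reference, or by revisiting the pfn later.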
On 28.09.24 10:34, David Hildenbrand wrote:
> On 28.09.24 06:55, Matthew Wilcox wrote:
> [...]
>> Please back this out.
>
> Should we generalize the approach in dump_folio() to locally copy a
> folio, so we can safely perform checks before deciding whether we want
> to try grabbing a reference on the real folio (if it's still a folio :) )?

Oh, and I forgot: isn't the existing code already racy?

PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);
On 2024/9/28 16:39, David Hildenbrand wrote:
> On 28.09.24 10:34, David Hildenbrand wrote:
>> [...]
>>
>> Should we generalize the approach in dump_folio() to locally copy a
>> folio, so we can safely perform checks before deciding whether we want
>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>
> Oh, and I forgot: isn't the existing code already racy?
>
> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);

do_migrate_range() is called after start_isolate_page_range(), so a page might not be able to transform from a head page into a tail page, as it's isolated?

Thanks.
On 2024/9/29 9:16, Miaohe Lin wrote:
> On 2024/9/28 16:39, David Hildenbrand wrote:
>> [...]
>>
>> Oh, and I forgot: isn't the existing code already racy?
>>
>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);

Yes, in v1[1] I asked the same question about the existing PageTransHuge(page) check:

  "If the page is a tail page, we will BUG_ON (DEBUG_VM enabled) here,
   but it seems that we don't guarantee the page won't be a tail page."

We could delay the calculation until after we've taken a ref, but the traversal of pfns may slow down a little if we hit a tail pfn. Is that acceptable?

--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1786,15 +1786,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = pfn_to_page(pfn);
 		folio = page_folio(page);
 
-		/*
-		 * No reference or lock is held on the folio, so it might
-		 * be modified concurrently (e.g. split). As such,
-		 * folio_nr_pages() may read garbage. This is fine as the outer
-		 * loop will revisit the split folio later.
-		 */
-		if (folio_test_large(folio))
-			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
-
 		/*
 		 * HWPoison pages have elevated reference counts so the migration would
 		 * fail on them. It also doesn't make any sense to migrate them in the
@@ -1807,6 +1798,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			folio_isolate_lru(folio);
 			if (folio_mapped(folio))
 				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
+			if (folio_test_large(folio))
+				pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
 			continue;
 		}
 
@@ -1823,6 +1816,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 				dump_page(page, "isolation failed");
 			}
 		}
+
+		if (folio_test_large(folio))
+			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
 put_folio:
 		folio_put(folio);
 	}

> do_migrate_range is called after start_isolate_page_range(). So a page
> might not be able to transform from a head page to a tail page as it's
> isolated?

start_isolate_page_range() only isolates free pages, so maybe irrelevant.

> Thanks.

[1] https://lore.kernel.org/linux-mm/4e693aa6-d742-4fe7-bd97-3d375f96fcfa@huawei.com/
On 2024/9/29 10:04, Kefeng Wang wrote:
> On 2024/9/29 9:16, Miaohe Lin wrote:
>> [...]
>
> Yes, in v1[1] I asked the same question about the existing
> PageTransHuge(page) check:
>
>   "If the page is a tail page, we will BUG_ON (DEBUG_VM enabled) here,
>    but it seems that we don't guarantee the page won't be a tail page."
>
> We could delay the calculation until after we've taken a ref, but the
> traversal of pfns may slow down a little if we hit a tail pfn. Is that
> acceptable?
>
> [...]
>
>> do_migrate_range is called after start_isolate_page_range(). So a page
>> might not be able to transform from a head page to a tail page as it's
>> isolated?
> start_isolate_page_range() only isolates free pages, so maybe irrelevant.

A page transforms from a head page into a tail page through the steps below:
1. The compound page is freed into the buddy allocator.
2. It is merged into a larger order in buddy.
3. It is allocated as a larger-order compound page.

Since it is isolated, I think step 2 or 3 cannot happen. Or am I missing something?

Thanks.
On 29.09.24 04:19, Miaohe Lin wrote:
> On 2024/9/29 10:04, Kefeng Wang wrote:
>> [...]
>
> A page transforms from a head page into a tail page through the steps below:
> 1. The compound page is freed into the buddy allocator.
> 2. It is merged into a larger order in buddy.
> 3. It is allocated as a larger-order compound page.
>
> Since it is isolated, I think step 2 or 3 cannot happen. Or am I missing
> something?
By isolated, you mean that the pageblock is isolated, and all free pages are in the MIGRATE_ISOLATE buddy list. Nice observation. Indeed, a tail page could become a head page (concurrent split is possible), but a head page should not become a tail for the reason you mention. Even mm/page_reporting.c will skip isolated pageblocks. I wonder if there are some corner cases, but nothing comes to mind that would perform compound allocations from the MIGRATE_ISOLATE list.
On 2024/9/30 17:25, David Hildenbrand wrote:
> On 29.09.24 04:19, Miaohe Lin wrote:
>> [...]
>>
>> Since it is isolated, I think step 2 or 3 cannot happen. Or am I missing
>> something?
>
> By isolated, you mean that the pageblock is isolated, and all free pages
> are in the MIGRATE_ISOLATE buddy list. Nice observation.
>
> Indeed, a tail page could become a head page (concurrent split is
> possible), but a head page should not become a tail for the reason you
> mention.
>
> Even mm/page_reporting.c will skip isolated pageblocks.
>
> I wonder if there are some corner cases, but nothing comes to mind that
> would perform compound allocations from the MIGRATE_ISOLATE list.

Per David's and Miaohe's explanations, it seems we can't hit the VM_BUG_ON, so no more changes. Thanks for all the review and comments.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 38ecd45c3f94..9ef776a25b9d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1773,7 +1773,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
-	struct page *page, *head;
+	struct page *page;
 	LIST_HEAD(source);
 	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
@@ -1786,14 +1786,20 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		page = pfn_to_page(pfn);
 		folio = page_folio(page);
-		head = &folio->page;
 
-		if (PageHuge(page)) {
-			pfn = page_to_pfn(head) + compound_nr(head) - 1;
-			isolate_hugetlb(folio, &source);
-			continue;
-		} else if (PageTransHuge(page))
-			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
+		/*
+		 * No reference or lock is held on the folio, so it might
+		 * be modified concurrently (e.g. split). As such,
+		 * folio_nr_pages() may read garbage. This is fine as the outer
+		 * loop will revisit the split folio later.
+		 */
+		if (folio_test_large(folio)) {
+			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
+			if (folio_test_hugetlb(folio)) {
+				isolate_hugetlb(folio, &source);
+				continue;
+			}
+		}
 
 		/*
 		 * HWPoison pages have elevated reference counts so the migration would
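A toy userspace model of the merged loop's traversal, with invented helpers (mock_head(), mock_nr_pages(), scan_range()) standing in for the real pfn walk; it only checks that skipping to the folio's last pfn visits each folio once, even when the scan starts in the middle of a large folio:

```c
#include <assert.h>

/*
 * Mock layout: one 8-page compound folio covering pfns [4, 11],
 * single pages everywhere else in [0, 16).
 */
static unsigned long mock_head(unsigned long pfn)
{
	return (pfn >= 4 && pfn < 12) ? 4 : pfn;
}

static unsigned long mock_nr_pages(unsigned long pfn)
{
	return (pfn >= 4 && pfn < 12) ? 8 : 1;
}

/*
 * Count loop iterations for a scan of [start, end) that, like
 * do_migrate_range(), jumps to the folio's last pfn so the loop's
 * pfn++ lands just past the folio.
 */
static int scan_range(unsigned long start, unsigned long end)
{
	unsigned long pfn;
	int visits = 0;

	for (pfn = start; pfn < end; pfn++) {
		unsigned long head = mock_head(pfn);

		visits++;
		if (mock_nr_pages(head) > 1)
			pfn = head + mock_nr_pages(head) - 1;
	}
	return visits;
}
```

Scanning [0, 16) visits pfns 0-3, the large folio once (entered at pfn 4, skipped to 11), then 12-15: nine iterations, not sixteen. If a stale nr_pages were read here, the skip would merely be wrong, and the outer loop would revisit the remaining pfns; the thread's concern is the VM_BUG_ON inside the flag test itself, which this model cannot capture.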