diff mbox series

[7/9] mm: Add deferred_list page flag

Message ID 20230815032645.1393700-8-willy@infradead.org (mailing list archive)
State New
Headers show
Series Remove _folio_dtor and _folio_order | expand

Commit Message

Matthew Wilcox Aug. 15, 2023, 3:26 a.m. UTC
Stored in the first tail page's flags, this flag replaces the destructor.
That removes the last of the destructors, so remove all references to
folio_dtor and compound_dtor.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/admin-guide/kdump/vmcoreinfo.rst |  4 ++--
 include/linux/mm.h                             | 14 --------------
 include/linux/mm_types.h                       |  2 --
 include/linux/page-flags.h                     |  6 ++++++
 kernel/crash_core.c                            |  1 -
 mm/huge_memory.c                               |  4 ++--
 mm/internal.h                                  |  1 -
 mm/page_alloc.c                                |  7 +------
 8 files changed, 11 insertions(+), 28 deletions(-)

Comments

David Hildenbrand Aug. 15, 2023, 7:54 a.m. UTC | #1
On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> Stored in the first tail page's flags, this flag replaces the destructor.
> That removes the last of the destructors, so remove all references to
> folio_dtor and compound_dtor.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---

[...]

>   
> +	/* Has a deferred list (may be empty).  First tail page. */
> +	PG_deferred_list = PG_reclaim,
> +

If PG_deferred_list implies thp (and replaces the thp dtor), should we 
rather name this PG_thp or something along those lines?

[...]

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 21af71aea6eb..9fe9209605a5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -582,9 +582,6 @@ static inline void free_the_page(struct page *page, unsigned int order)
>    * The remaining PAGE_SIZE pages are called "tail pages". PageTail() is encoded
>    * in bit 0 of page->compound_head. The rest of bits is pointer to head page.
>    *
> - * The first tail page's ->compound_dtor describes how to destroy the
> - * compound page.
> - *
>    * The first tail page's ->compound_order holds the order of allocation.
>    * This usage means that zero-order pages may not be compound.
>    */
> @@ -603,14 +600,12 @@ void prep_compound_page(struct page *page, unsigned int order)
>   
>   void destroy_large_folio(struct folio *folio)
>   {
> -	enum compound_dtor_id dtor = folio->_folio_dtor;
> -
>   	if (folio_test_hugetlb(folio)) {
>   		free_huge_page(folio);
>   		return;
>   	}
>   
> -	if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR)
> +	if (folio_test_deferred_list(folio))

Similar question as before: why not have folio_test_transhuge() perform 
this check internally?

The sequence of

	if (folio_test_deferred_list(folio))
		free_transhuge_folio(folio);

Looks less clear to what we had before.
Matthew Wilcox Aug. 15, 2023, 3:32 p.m. UTC | #2
On Tue, Aug 15, 2023 at 09:54:36AM +0200, David Hildenbrand wrote:
> On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> > Stored in the first tail page's flags, this flag replaces the destructor.
> > That removes the last of the destructors, so remove all references to
> > folio_dtor and compound_dtor.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> 
> [...]
> 
> > +	/* Has a deferred list (may be empty).  First tail page. */
> > +	PG_deferred_list = PG_reclaim,
> > +
> 
> If PG_deferred_list implies thp (and replaces the thp dtor), should we
> rather name this PG_thp or something along those lines?

We're trying to use 'thp' to mean 'a folio which is pmd mappable',
so I'd rather not call it that.

> The sequence of
> 
> 	if (folio_test_deferred_list(folio))
> 		free_transhuge_folio(folio);
> 
> Looks less clear to what we had before.

I can rename that.  How about

	if (folio_test_deferred_list(folio))
		folio_remove_deferred(folio);
David Hildenbrand Aug. 15, 2023, 4:40 p.m. UTC | #3
On 15.08.23 17:32, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 09:54:36AM +0200, David Hildenbrand wrote:
>> On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
>>> Stored in the first tail page's flags, this flag replaces the destructor.
>>> That removes the last of the destructors, so remove all references to
>>> folio_dtor and compound_dtor.
>>>
>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>> ---
>>
>> [...]
>>
>>> +	/* Has a deferred list (may be empty).  First tail page. */
>>> +	PG_deferred_list = PG_reclaim,
>>> +
>>
>> If PG_deferred_list implies thp (and replaces the thp dtor), should we
>> rather name this PG_thp or something along those lines?
> 
> We're trying to use 'thp' to mean 'a folio which is pmd mappable',
> so I'd rather not call it that.

There is no conclusion on that.

And I am not sure if inventing new terminology will help anybody (both, 
users and developers). Just call the old thing "PMD-sized THP".

After all, the deferred split queue is just an implementation detail, 
and it happens to live in tailpage 2, no?

Once we would end up initializing something else in 
prep_transhuge_page(), it would turn out pretty confusing if that is 
called folio_remove_deferred() ...

In the end, I don't care as long as it doesn't add confusion; this did. 
We most probably won't reach a conclusion here and that shouldn't block 
this patch set.

So at least prep_transhuge_page() should not be renamed to 
folio_remove_deferred() imho ...
Matthew Wilcox Aug. 15, 2023, 5:06 p.m. UTC | #4
On Tue, Aug 15, 2023 at 06:40:55PM +0200, David Hildenbrand wrote:
> On 15.08.23 17:32, Matthew Wilcox wrote:
> > On Tue, Aug 15, 2023 at 09:54:36AM +0200, David Hildenbrand wrote:
> > > On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> > > > Stored in the first tail page's flags, this flag replaces the destructor.
> > > > That removes the last of the destructors, so remove all references to
> > > > folio_dtor and compound_dtor.
> > > > 
> > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > ---
> > > 
> > > [...]
> > > 
> > > > +	/* Has a deferred list (may be empty).  First tail page. */
> > > > +	PG_deferred_list = PG_reclaim,
> > > > +
> > > 
> > > If PG_deferred_list implies thp (and replaces the thp dtor), should we
> > > rather name this PG_thp or something along those lines?
> > 
> > We're trying to use 'thp' to mean 'a folio which is pmd mappable',
> > so I'd rather not call it that.
> 
> There is no conclusion on that.

Theree are a lot of counters called THP and TransHuge and other variants
which are exposed to userspace, and the (user) assumption is that this counts
PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
you'll find them.  If we have folio_test_thp(), people will write:

	if (folio_test_thp(folio))
		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);

instead of using folio_test_pmd_mappable().

> After all, the deferred split queue is just an implementation detail, and it
> happens to live in tailpage 2, no?
> 
> Once we would end up initializing something else in prep_transhuge_page(),
> it would turn out pretty confusing if that is called folio_remove_deferred()
> ...

Perhaps the key difference between normal compound pages and file/anon
compound pages is that the latter are splittable?  So we can name all
of this:

	folio_init_splittable()
	folio_test_splittable()
	folio_fini_splittable()

Maybe that's still too close to an implementation detail, but it's at
least talking about _a_ characteristic of the folio, even if it's not
the _only_ characteristic of the folio.
David Hildenbrand Aug. 15, 2023, 5:27 p.m. UTC | #5
On 15.08.23 19:06, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 06:40:55PM +0200, David Hildenbrand wrote:
>> On 15.08.23 17:32, Matthew Wilcox wrote:
>>> On Tue, Aug 15, 2023 at 09:54:36AM +0200, David Hildenbrand wrote:
>>>> On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
>>>>> Stored in the first tail page's flags, this flag replaces the destructor.
>>>>> That removes the last of the destructors, so remove all references to
>>>>> folio_dtor and compound_dtor.
>>>>>
>>>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>> +	/* Has a deferred list (may be empty).  First tail page. */
>>>>> +	PG_deferred_list = PG_reclaim,
>>>>> +
>>>>
>>>> If PG_deferred_list implies thp (and replaces the thp dtor), should we
>>>> rather name this PG_thp or something along those lines?
>>>
>>> We're trying to use 'thp' to mean 'a folio which is pmd mappable',
>>> so I'd rather not call it that.
>>
>> There is no conclusion on that.
> 
> Theree are a lot of counters called THP and TransHuge and other variants
> which are exposed to userspace, and the (user) assumption is that this counts
> PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
> you'll find them.  If we have folio_test_thp(), people will write:
> 
> 	if (folio_test_thp(folio))
> 		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);
> 
> instead of using folio_test_pmd_mappable().
> 


So if we *really* don't want to use THP to express that we have a page, 
then let's see what these pages are:
* can be mapped to user space
* are transparent to most MM-related systemcalls by (un) mapping
   them in system page size (PTEs)

That we can split these pages (not PTE-map, but convert from large folio 
to small folios) is one characteristic, but IMHO not the main one (and 
maybe not even required at all!).

Maybe we can come up with a better term for "THP, but not necessarily 
PMD-sized".

"Large folio" is IMHO bad. A hugetlb page is a large folio and not all 
large folios can be mapped to user space.

"Transparent large folios" ? Better IMHO.


>> After all, the deferred split queue is just an implementation detail, and it
>> happens to live in tailpage 2, no?
>>
>> Once we would end up initializing something else in prep_transhuge_page(),
>> it would turn out pretty confusing if that is called folio_remove_deferred()
>> ...
> 
> Perhaps the key difference between normal compound pages and file/anon
> compound pages is that the latter are splittable?  So we can name all
> of this:
> 
> 	folio_init_splittable()
> 	folio_test_splittable()
> 	folio_fini_splittable()
> 
> Maybe that's still too close to an implementation detail, but it's at
> least talking about _a_ characteristic of the folio, even if it's not
> the _only_ characteristic of the folio.

Maybe folio_init_transparent() ... avoiding the "huge" part of it.

Very open for alternatives. As expressed in other context, we really 
should figure this out soon.
Matthew Wilcox Aug. 15, 2023, 7:58 p.m. UTC | #6
On Tue, Aug 15, 2023 at 07:27:26PM +0200, David Hildenbrand wrote:
> On 15.08.23 19:06, Matthew Wilcox wrote:
> > Theree are a lot of counters called THP and TransHuge and other variants
> > which are exposed to userspace, and the (user) assumption is that this counts
> > PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
> > you'll find them.  If we have folio_test_thp(), people will write:
> > 
> > 	if (folio_test_thp(folio))
> > 		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);
> > 
> > instead of using folio_test_pmd_mappable().
> 
> So if we *really* don't want to use THP to express that we have a page, then
> let's see what these pages are:
> * can be mapped to user space
> * are transparent to most MM-related systemcalls by (un) mapping
>   them in system page size (PTEs)

 * Are managed on the LRU
 * Can be dirtied, written back

> That we can split these pages (not PTE-map, but convert from large folio to
> small folios) is one characteristic, but IMHO not the main one (and maybe
> not even required at all!).

It's the one which distinguishes them from, say, compound pages used for
slab.  Or used by device drivers.  Or net pagepool, or vmalloc.  There's
a lot of compound allocations out there, and the only ones which need
special treatment here are the ones which are splittable.

> Maybe we can come up with a better term for "THP, but not necessarily
> PMD-sized".
> 
> "Large folio" is IMHO bad. A hugetlb page is a large folio and not all large
> folios can be mapped to user space.
> 
> "Transparent large folios" ? Better IMHO.

I think this goes back to Johannes' point many months ago that we need
separate names for some things.  He wants to split anon & file memory
apart (who gets to keep the name "folio" in the divorce?  what do we
name the type that encompasses both folios and the other one?  or do
they both get different names?)

> > Perhaps the key difference between normal compound pages and file/anon
> > compound pages is that the latter are splittable?  So we can name all
> > of this:
> > 
> > 	folio_init_splittable()
> > 	folio_test_splittable()
> > 	folio_fini_splittable()
> > 
> > Maybe that's still too close to an implementation detail, but it's at
> > least talking about _a_ characteristic of the folio, even if it's not
> > the _only_ characteristic of the folio.
> 
> Maybe folio_init_transparent() ... avoiding the "huge" part of it.
> 
> Very open for alternatives. As expressed in other context, we really should
> figure this out soon.

Yeah, I'm open to better naming too.  At this point in the flow we're
trying to distinguish between compound pages used for slab and compound
pages used for anon/file, but that's not always going to be the case
elsewhere.
Matthew Wilcox Aug. 16, 2023, 3:14 a.m. UTC | #7
On Tue, Aug 15, 2023 at 08:58:24PM +0100, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 07:27:26PM +0200, David Hildenbrand wrote:
> > On 15.08.23 19:06, Matthew Wilcox wrote:
> > > Theree are a lot of counters called THP and TransHuge and other variants
> > > which are exposed to userspace, and the (user) assumption is that this counts
> > > PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
> > > you'll find them.  If we have folio_test_thp(), people will write:
> > > 
> > > 	if (folio_test_thp(folio))
> > > 		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);
> > > 
> > > instead of using folio_test_pmd_mappable().
> > 
> > So if we *really* don't want to use THP to express that we have a page, then
> > let's see what these pages are:
> > * can be mapped to user space
> > * are transparent to most MM-related systemcalls by (un) mapping
> >   them in system page size (PTEs)
> 
>  * Are managed on the LRU

I think this is the best one to go with.  Either that or "managed by
rmap".  That excludes compoud pages which are allocated from vmalloc()
(which can be mmaped), page tables, slab, etc.  It includes both file
and anon folios.

I have a handy taxonomy here: https://kernelnewbies.org/MemoryTypes

Unfortunately, folio_test_lru() already exists and means something
different ("Is this folio on an LRU list").  I fear folio_test_rmap()
would have a similar confusion -- "Is this folio currently findable by
rmap", or some such. folio_test_rmappable()?
David Hildenbrand Aug. 16, 2023, 9:55 a.m. UTC | #8
On 15.08.23 21:58, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 07:27:26PM +0200, David Hildenbrand wrote:
>> On 15.08.23 19:06, Matthew Wilcox wrote:
>>> Theree are a lot of counters called THP and TransHuge and other variants
>>> which are exposed to userspace, and the (user) assumption is that this counts
>>> PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
>>> you'll find them.  If we have folio_test_thp(), people will write:
>>>
>>> 	if (folio_test_thp(folio))
>>> 		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);
>>>
>>> instead of using folio_test_pmd_mappable().
>>
>> So if we *really* don't want to use THP to express that we have a page, then
>> let's see what these pages are:
>> * can be mapped to user space
>> * are transparent to most MM-related systemcalls by (un) mapping
>>    them in system page size (PTEs)
> 
>   * Are managed on the LRU
>   * Can be dirtied, written back

Right, but at least hugetlb *could* be extended to do that as well (and 
even implement swapping). I think the biggest difference is the 
transparency/PTE-mapping/unmapping/ ....

> 
>> That we can split these pages (not PTE-map, but convert from large folio to
>> small folios) is one characteristic, but IMHO not the main one (and maybe
>> not even required at all!).
> 
> It's the one which distinguishes them from, say, compound pages used for
> slab.  Or used by device drivers.  Or net pagepool, or vmalloc.  There's
> a lot of compound allocations out there, and the only ones which need
> special treatment here are the ones which are splittable.

And my point is that that is an implementation detail I'm afraid. 
Instead of splitting the folio into order-0 folios, you could also 
migrate off all data to order-0 folios and just free the large folio.

Because splitting only succeeds if there are no other references on the 
folio, just like migration.

But let's not get distracted :)

> 
>> Maybe we can come up with a better term for "THP, but not necessarily
>> PMD-sized".
>>
>> "Large folio" is IMHO bad. A hugetlb page is a large folio and not all large
>> folios can be mapped to user space.
>>
>> "Transparent large folios" ? Better IMHO.
> 
> I think this goes back to Johannes' point many months ago that we need
> separate names for some things.  He wants to split anon & file memory
> apart (who gets to keep the name "folio" in the divorce?  what do we
> name the type that encompasses both folios and the other one?  or do
> they both get different names?)

Good question. I remember discussing a type hierarchy back when you 
upstreamed folios.

Maybe we would have "file folios" and "anon folios.

> 
>>> Perhaps the key difference between normal compound pages and file/anon
>>> compound pages is that the latter are splittable?  So we can name all
>>> of this:
>>>
>>> 	folio_init_splittable()
>>> 	folio_test_splittable()
>>> 	folio_fini_splittable()
>>>
>>> Maybe that's still too close to an implementation detail, but it's at
>>> least talking about _a_ characteristic of the folio, even if it's not
>>> the _only_ characteristic of the folio.
>>
>> Maybe folio_init_transparent() ... avoiding the "huge" part of it.
>>
>> Very open for alternatives. As expressed in other context, we really should
>> figure this out soon.
> 
> Yeah, I'm open to better naming too.  At this point in the flow we're
> trying to distinguish between compound pages used for slab and compound
> pages used for anon/file, but that's not always going to be the case
> elsewhere.


Yes. Let me reply to your other mail.
David Hildenbrand Aug. 16, 2023, 10:12 a.m. UTC | #9
On 16.08.23 05:14, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 08:58:24PM +0100, Matthew Wilcox wrote:
>> On Tue, Aug 15, 2023 at 07:27:26PM +0200, David Hildenbrand wrote:
>>> On 15.08.23 19:06, Matthew Wilcox wrote:
>>>> Theree are a lot of counters called THP and TransHuge and other variants
>>>> which are exposed to userspace, and the (user) assumption is that this counts
>>>> PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
>>>> you'll find them.  If we have folio_test_thp(), people will write:
>>>>
>>>> 	if (folio_test_thp(folio))
>>>> 		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);
>>>>
>>>> instead of using folio_test_pmd_mappable().
>>>
>>> So if we *really* don't want to use THP to express that we have a page, then
>>> let's see what these pages are:
>>> * can be mapped to user space
>>> * are transparent to most MM-related systemcalls by (un) mapping
>>>    them in system page size (PTEs)
>>
>>   * Are managed on the LRU
> 
> I think this is the best one to go with.  Either that or "managed by
> rmap".  That excludes compoud pages which are allocated from vmalloc()
> (which can be mmaped), page tables, slab, etc.  It includes both file
> and anon folios.
> 
> I have a handy taxonomy here: https://kernelnewbies.org/MemoryTypes
> 
> Unfortunately, folio_test_lru() already exists and means something
> different ("Is this folio on an LRU list").  I fear folio_test_rmap()
> would have a similar confusion -- "Is this folio currently findable by
> rmap", or some such. folio_test_rmappable()?
But what about hugetlb, they are also remappable? We could have 
folio_test_rmappable(), but that would then also better include hugetlb ...

(in theory, one could envision hugetlb also using an lru mechanism, 
although I doubt/hope it will ever happen)

Starting at the link you provided, I guess "vmalloc" and "net pool" 
would not fall under that category, or would they? (I'm assuming they 
don't get mapped using the rmap, so they are "different", and they are 
not managed by lru).

So I assume we only care about anon+file (lru-managed). Only these are 
rmappable (besides hugetlb), correct?

folio_test_lru_managed()

Might be cleanest to describe anon+file that are managed by the lru, 
just might not be on a lru list right now (difference to folio_test_lru()).


I've been also thinking about

"folio_test_normal"

But it only makes sense when "all others (including hugetlb) are the odd 
one".
Matthew Wilcox Aug. 16, 2023, 12:05 p.m. UTC | #10
On Wed, Aug 16, 2023 at 12:12:44PM +0200, David Hildenbrand wrote:
> On 16.08.23 05:14, Matthew Wilcox wrote:
> > >   * Are managed on the LRU
> > 
> > I think this is the best one to go with.  Either that or "managed by
> > rmap".  That excludes compoud pages which are allocated from vmalloc()
> > (which can be mmaped), page tables, slab, etc.  It includes both file
> > and anon folios.
> > 
> > I have a handy taxonomy here: https://kernelnewbies.org/MemoryTypes
> > 
> > Unfortunately, folio_test_lru() already exists and means something
> > different ("Is this folio on an LRU list").  I fear folio_test_rmap()
> > would have a similar confusion -- "Is this folio currently findable by
> > rmap", or some such. folio_test_rmappable()?
> But what about hugetlb, they are also remappable? We could have
> folio_test_rmappable(), but that would then also better include hugetlb ...

We could do that!  Have both hugetlb & huge_memory.c set the rmappable
flag.  We'd still know which destructor to call because hugetlb also sets
the hugetlb flag.

> Starting at the link you provided, I guess "vmalloc" and "net pool" would
> not fall under that category, or would they? (I'm assuming they don't get
> mapped using the rmap, so they are "different", and they are not managed by
> lru).

Right, neither type of page ends up on the LRU, and neither is added to
rmap.

> So I assume we only care about anon+file (lru-managed). Only these are
> rmappable (besides hugetlb), correct?
> 
> folio_test_lru_managed()
> 
> Might be cleanest to describe anon+file that are managed by the lru, just
> might not be on a lru list right now (difference to folio_test_lru()).

Something I didn't think about last night is that this flag only
_exists_ for large folios.  folio_test_lru_managed() (and
folio_test_rmappable()) both sound like they might work if you call them
on single-page folios, but we BUG if you do (see folio_flags()) 

> I've been also thinking about
> 
> "folio_test_normal"
> 
> But it only makes sense when "all others (including hugetlb) are the odd
> one".

Who's to say slab is abnormal?  ;-)  But this one also fails to
communicate "only call this on large folios".  folio_test_splittable()
does at least communicate that this is related to large folios, although
one might simply expect it to return false for single-page folios rather
than BUG.

folio_test_large_rmappable()?
David Hildenbrand Aug. 16, 2023, 12:34 p.m. UTC | #11
On 16.08.23 14:05, Matthew Wilcox wrote:
> On Wed, Aug 16, 2023 at 12:12:44PM +0200, David Hildenbrand wrote:
>> On 16.08.23 05:14, Matthew Wilcox wrote:
>>>>    * Are managed on the LRU
>>>
>>> I think this is the best one to go with.  Either that or "managed by
>>> rmap".  That excludes compoud pages which are allocated from vmalloc()
>>> (which can be mmaped), page tables, slab, etc.  It includes both file
>>> and anon folios.
>>>
>>> I have a handy taxonomy here: https://kernelnewbies.org/MemoryTypes
>>>
>>> Unfortunately, folio_test_lru() already exists and means something
>>> different ("Is this folio on an LRU list").  I fear folio_test_rmap()
>>> would have a similar confusion -- "Is this folio currently findable by
>>> rmap", or some such. folio_test_rmappable()?
>> But what about hugetlb, they are also remappable? We could have
>> folio_test_rmappable(), but that would then also better include hugetlb ...
> 
> We could do that!  Have both hugetlb & huge_memory.c set the rmappable
> flag.  We'd still know which destructor to call because hugetlb also sets
> the hugetlb flag.
> 
>> Starting at the link you provided, I guess "vmalloc" and "net pool" would
>> not fall under that category, or would they? (I'm assuming they don't get
>> mapped using the rmap, so they are "different", and they are not managed by
>> lru).
> 
> Right, neither type of page ends up on the LRU, and neither is added to
> rmap.
> 
>> So I assume we only care about anon+file (lru-managed). Only these are
>> rmappable (besides hugetlb), correct?
>>
>> folio_test_lru_managed()
>>
>> Might be cleanest to describe anon+file that are managed by the lru, just
>> might not be on a lru list right now (difference to folio_test_lru()).
> 
> Something I didn't think about last night is that this flag only
> _exists_ for large folios.  folio_test_lru_managed() (and
> folio_test_rmappable()) both sound like they might work if you call them
> on single-page folios, but we BUG if you do (see folio_flags())
> 
>> I've been also thinking about
>>
>> "folio_test_normal"
>>
>> But it only makes sense when "all others (including hugetlb) are the odd
>> one".
> 
> Who's to say slab is abnormal?  ;-)  But this one also fails to
> communicate "only call this on large folios".  folio_test_splittable()
> does at least communicate that this is related to large folios, although
> one might simply expect it to return false for single-page folios rather
> than BUG.
> 
> folio_test_large_rmappable()?

Sounds good to me. We can then further test if it's hugetlb to rule that 
one out.
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index baa1c355741d..3bd38ac0e7de 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -141,8 +141,8 @@  nodemask_t
 The size of a nodemask_t type. Used to compute the number of online
 nodes.
 
-(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|compound_order|compound_head)
--------------------------------------------------------------------------------------------------
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_order|compound_head)
+----------------------------------------------------------------------------------
 
 User-space tools compute their values based on the offset of these
 variables. The variables are used when excluding unnecessary pages.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c8c8b1fd64d3..cf0ae8c51d7f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1267,20 +1267,6 @@  void folio_copy(struct folio *dst, struct folio *src);
 
 unsigned long nr_free_buffer_pages(void);
 
-/* Compound pages may have a special destructor */
-enum compound_dtor_id {
-	COMPOUND_PAGE_DTOR,
-	TRANSHUGE_PAGE_DTOR,
-	NR_COMPOUND_DTORS
-};
-
-static inline void folio_set_compound_dtor(struct folio *folio,
-		enum compound_dtor_id compound_dtor)
-{
-	VM_BUG_ON_FOLIO(compound_dtor >= NR_COMPOUND_DTORS, folio);
-	folio->_folio_dtor = compound_dtor;
-}
-
 void destroy_large_folio(struct folio *folio);
 
 /* Returns the number of bytes in this potentially compound page. */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index da538ff68953..d45a2b8041e0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -282,7 +282,6 @@  static inline struct page *encoded_page_ptr(struct encoded_page *page)
  * @_refcount: Do not access this member directly.  Use folio_ref_count()
  *    to find how many references there are to this folio.
  * @memcg_data: Memory Control Group data.
- * @_folio_dtor: Which destructor to use for this folio.
  * @_folio_order: Do not use directly, call folio_order().
  * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
  * @_nr_pages_mapped: Do not use directly, call folio_mapcount().
@@ -336,7 +335,6 @@  struct folio {
 			unsigned long _flags_1;
 			unsigned long _head_1;
 	/* public: */
-			unsigned char _folio_dtor;
 			unsigned char _folio_order;
 			atomic_t _entire_mapcount;
 			atomic_t _nr_pages_mapped;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 01d98695e79f..aabf50dc71a3 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -183,6 +183,9 @@  enum pageflags {
 	/* Is a hugetlb page.  Stored in first tail page. */
 	PG_hugetlb = PG_writeback,
 
+	/* Has a deferred list (may be empty).  First tail page. */
+	PG_deferred_list = PG_reclaim,
+
 	/* non-lru isolated movable page */
 	PG_isolated = PG_reclaim,
 
@@ -809,6 +812,9 @@  static inline void ClearPageCompound(struct page *page)
 	BUG_ON(!PageHead(page));
 	ClearPageHead(page);
 }
+PAGEFLAG(DeferredList, deferred_list, PF_SECOND)
+#else
+TESTPAGEFLAG_FALSE(DeferredList, deferred_list)
 #endif
 
 #define PG_head_mask ((1UL << PG_head))
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index dd5f87047d06..934dd86e19f5 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -455,7 +455,6 @@  static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_OFFSET(page, lru);
 	VMCOREINFO_OFFSET(page, _mapcount);
 	VMCOREINFO_OFFSET(page, private);
-	VMCOREINFO_OFFSET(folio, _folio_dtor);
 	VMCOREINFO_OFFSET(folio, _folio_order);
 	VMCOREINFO_OFFSET(page, compound_head);
 	VMCOREINFO_OFFSET(pglist_data, node_zones);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 99e36ad540c4..3b5db99eb7ac 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -583,7 +583,7 @@  void prep_transhuge_page(struct page *page)
 
 	VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
 	INIT_LIST_HEAD(&folio->_deferred_list);
-	folio_set_compound_dtor(folio, TRANSHUGE_PAGE_DTOR);
+	folio_set_deferred_list(folio);
 }
 
 static inline bool is_transparent_hugepage(struct page *page)
@@ -595,7 +595,7 @@  static inline bool is_transparent_hugepage(struct page *page)
 
 	folio = page_folio(page);
 	return is_huge_zero_page(&folio->page) ||
-	       folio->_folio_dtor == TRANSHUGE_PAGE_DTOR;
+		folio_test_deferred_list(folio);
 }
 
 static unsigned long __thp_get_unmapped_area(struct file *filp,
diff --git a/mm/internal.h b/mm/internal.h
index 5a03bc4782a2..e3d11119b04e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -417,7 +417,6 @@  static inline void prep_compound_head(struct page *page, unsigned int order)
 {
 	struct folio *folio = (struct folio *)page;
 
-	folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
 	folio_set_order(folio, order);
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 21af71aea6eb..9fe9209605a5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -582,9 +582,6 @@  static inline void free_the_page(struct page *page, unsigned int order)
  * The remaining PAGE_SIZE pages are called "tail pages". PageTail() is encoded
  * in bit 0 of page->compound_head. The rest of bits is pointer to head page.
  *
- * The first tail page's ->compound_dtor describes how to destroy the
- * compound page.
- *
  * The first tail page's ->compound_order holds the order of allocation.
  * This usage means that zero-order pages may not be compound.
  */
@@ -603,14 +600,12 @@  void prep_compound_page(struct page *page, unsigned int order)
 
 void destroy_large_folio(struct folio *folio)
 {
-	enum compound_dtor_id dtor = folio->_folio_dtor;
-
 	if (folio_test_hugetlb(folio)) {
 		free_huge_page(folio);
 		return;
 	}
 
-	if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR)
+	if (folio_test_deferred_list(folio))
 		free_transhuge_folio(folio);
 	mem_cgroup_uncharge(folio);
 	free_the_page(&folio->page, folio_order(folio));