
[RFC,v3] mm: Proper document tail pages fields for folio

Message ID 20230815212547.431693-1-peterx@redhat.com (mailing list archive)
State New
Series [RFC,v3] mm: Proper document tail pages fields for folio

Commit Message

Peter Xu Aug. 15, 2023, 9:25 p.m. UTC
Tail page struct reuse is over-complicated.  Not only because we have
implicit uses of tail page fields (mapcounts, or private for thp swap
support, etc., that we may still use in the page structs, where the
relationship to the folio definitions is not obvious), but also because
we have 32/64 bit layouts for struct page, so it's unclear what we can
use and what we cannot when trying to find a new spot in the folio struct.

It's also unclear how many fields we can reuse for a tail page.  The real
answer is (after help from Matthew): we have 7 WORDs guaranteed on 64
bits and 8 WORDs on 32 bits.  Nothing more than that is guaranteed to even
exist.  That means nothing beyond the page->_refcount field can be reused.
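
For illustration (not part of this patch; it assumes the current struct
page field order, plus offsetofend() from <linux/stddef.h> and
static_assert() from <linux/build_bug.h>), the guarantee boils down to
something like:

	/* Everything up to and including _refcount always exists */
	#ifdef CONFIG_64BIT
	static_assert(offsetofend(struct page, _refcount) == 7 * sizeof(void *));
	#else
	static_assert(offsetofend(struct page, _refcount) == 8 * sizeof(void *));
	#endif
	/* Anything after _refcount (e.g. memcg_data) is config dependent */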

Let's document clearly what we can and cannot use when extending the folio
by reusing tail page fields, with an explanation for each of them.
Hopefully after the doc update it will be easier for:

  (1) Any reader to know exactly which folio field lives where and what it
  is for, and the relationship between folio tail pages and the struct page
  definitions,

  (2) Anyone adding a new field to a large folio, so it's clear which
  fields can still be reused.

This assumes WORD is defined as sizeof(void *) on all archs, just like
the other comment we already have in struct page.

The _mapcount/_refcount fields are also added for each tail page to clamp
the fields tight, with FOLIO_MATCH() making sure nothing messes up the
ordering.
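
For reference, FOLIO_MATCH() (already in mm_types.h) is just a
compile-time offsetof() check; the first-tail-page variant used in the
patch below expands to roughly:

	#define FOLIO_MATCH(pg, fl)						\
		static_assert(offsetof(struct folio, fl) ==		\
				offsetof(struct page, pg) + sizeof(struct page))
	FOLIO_MATCH(_refcount, _refcount_1);	/* added by this patch */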

Signed-off-by: Peter Xu <peterx@redhat.com>
---

rfcv1: https://lore.kernel.org/all/20230810204944.53471-1-peterx@redhat.com
rfcv2: https://lore.kernel.org/r/20230814184411.330496-1-peterx@redhat.com

No change log since it changed quite a bit; I sent patch 1 separately as
non-rfc, and merged the remaining two patches because I just noticed I can
avoid reordering the fields, so no functional change is intended, hence no
reason to keep them split either.

Matthew, I wanted to remove the whole chunk of comments above the tail
pages from the last version (which might fall into the "over-documented"
category), but in the end I still kept it; not only because I found it
helpful to give me the whole picture (maybe only me?), but also because
it's a good place to document a few important things (e.g., the fact that
refcount==0 is a must for all tails).  I'm open to removing the chunk or
part of it, if you think the rest is still ok.

This of course also conflicts with the other series to drop
folio_order/... but I can always rebase if this is not NACKed.

Comments welcomed, thanks.
---
 include/linux/mm_types.h | 69 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 66 insertions(+), 3 deletions(-)

Comments

David Hildenbrand Aug. 16, 2023, 1:33 p.m. UTC | #1
On 15.08.23 23:25, Peter Xu wrote:
> Tail page struct reuse is over-complicated.  Not only because we have

It is complicated, agreed.

With the ->private for THP_SWAP gone, we would have to document less.
Stating that 4*4byte / 4*8 byte are available after flags+head would
be sufficient and I'd even drop the table.


> implicit uses of tail page fields (mapcounts, or private for thp swap
> support, etc., that we may still use in the page structs, 

Instead of documenting that thp swap should no longer touch the private
field of tail pages, maybe we can indeed fix that quite easily.

My simple tests passed so far. If there isn't something obvious missing,
I can do more testing and send this as an official patch.


 From ec0f8b0dd8fb81c316b6a4c5fc9ae7563e625404 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Wed, 16 Aug 2023 13:14:45 +0200
Subject: [PATCH] mm/swap: stop using page->private on tail pages for THP_SWAP

Let's stop using page->private on tail pages, making it possible to
just unconditionally reuse that field in the tail pages of large folios.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
  arch/arm64/mm/mteswap.c |  5 +++--
  include/linux/swap.h    |  9 +++++++++
  mm/huge_memory.c        | 15 ++++++---------
  mm/memory.c             |  2 +-
  mm/rmap.c               |  2 +-
  mm/swap_state.c         |  4 ++--
  mm/swapfile.c           |  4 ++--
  7 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
index cd508ba80ab1..a31833e3ddc5 100644
--- a/arch/arm64/mm/mteswap.c
+++ b/arch/arm64/mm/mteswap.c
@@ -33,8 +33,9 @@ int mte_save_tags(struct page *page)
  
  	mte_save_page_tags(page_address(page), tag_storage);
  
-	/* page_private contains the swap entry.val set in do_swap_page */
-	ret = xa_store(&mte_pages, page_private(page), tag_storage, GFP_KERNEL);
+	/* lookup the swap entry.val from the page */
+	ret = xa_store(&mte_pages, page_swap_entry(page).val, tag_storage,
+		       GFP_KERNEL);
  	if (WARN(xa_is_err(ret), "Failed to store MTE tags")) {
  		mte_free_tag_storage(tag_storage);
  		return xa_err(ret);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index bb5adc604144..84fe0e94f5cd 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -339,6 +339,15 @@ static inline swp_entry_t folio_swap_entry(struct folio *folio)
  	return entry;
  }
  
+static inline swp_entry_t page_swap_entry(struct page *page)
+{
+	struct folio *folio = page_folio(page);
+	swp_entry_t entry = folio_swap_entry(folio);
+
+	entry.val += page - &folio->page;
+	return entry;
+}
+
  static inline void folio_set_swap_entry(struct folio *folio, swp_entry_t entry)
  {
  	folio->private = (void *)entry.val;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0b709d2c46c6..f7e04cbcb063 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2451,18 +2451,15 @@ static void __split_huge_page_tail(struct page *head, int tail,
  	page_tail->index = head->index + tail;
  
  	/*
-	 * page->private should not be set in tail pages with the exception
-	 * of swap cache pages that store the swp_entry_t in tail pages.
-	 * Fix up and warn once if private is unexpectedly set.
-	 *
-	 * What of 32-bit systems, on which folio->_pincount overlays
-	 * head[1].private?  No problem: THP_SWAP is not enabled on 32-bit, and
-	 * pincount must be 0 for folio_ref_freeze() to have succeeded.
+	 * page->private should not be set in tail pages. Fix up and warn once
+	 * if private is unexpectedly set.
  	 */
-	if (!folio_test_swapcache(page_folio(head))) {
-		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
+	if (unlikely(page_tail->private)) {
+		VM_WARN_ON_ONCE_PAGE(true, page_tail);
  		page_tail->private = 0;
  	}
+	if (PageSwapCache(head))
+		set_page_private(page_tail, (unsigned long)head->private + tail);
  
  	/* Page flags must be visible before we make the page non-compound. */
  	smp_wmb();
diff --git a/mm/memory.c b/mm/memory.c
index d003076b218d..ff13242c1589 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3882,7 +3882,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
  		 * changed.
  		 */
  		if (unlikely(!folio_test_swapcache(folio) ||
-			     page_private(page) != entry.val))
+			     page_swap_entry(page).val != entry.val))
  			goto out_page;
  
  		/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 1f04debdc87a..ec7f8e6c9e48 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1647,7 +1647,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
  			 */
  			dec_mm_counter(mm, mm_counter(&folio->page));
  		} else if (folio_test_anon(folio)) {
-			swp_entry_t entry = { .val = page_private(subpage) };
+			swp_entry_t entry = page_swap_entry(subpage);
  			pte_t swp_pte;
  			/*
  			 * Store the swap location in the pte.
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 01f15139b7d9..450819934e34 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -100,6 +100,7 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
  
  	folio_ref_add(folio, nr);
  	folio_set_swapcache(folio);
+	folio_set_swap_entry(folio, entry);
  
  	do {
  		xas_lock_irq(&xas);
@@ -113,7 +114,6 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
  				if (shadowp)
  					*shadowp = old;
  			}
-			set_page_private(folio_page(folio, i), entry.val + i);
  			xas_store(&xas, folio);
  			xas_next(&xas);
  		}
@@ -154,9 +154,9 @@ void __delete_from_swap_cache(struct folio *folio,
  	for (i = 0; i < nr; i++) {
  		void *entry = xas_store(&xas, shadow);
  		VM_BUG_ON_PAGE(entry != folio, entry);
-		set_page_private(folio_page(folio, i), 0);
  		xas_next(&xas);
  	}
+	folio->private = 0;
  	folio_clear_swapcache(folio);
  	address_space->nrpages -= nr;
  	__node_stat_mod_folio(folio, NR_FILE_PAGES, -nr);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d46933adf789..bd9d904671b9 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3369,7 +3369,7 @@ struct swap_info_struct *swp_swap_info(swp_entry_t entry)
  
  struct swap_info_struct *page_swap_info(struct page *page)
  {
-	swp_entry_t entry = { .val = page_private(page) };
+	swp_entry_t entry = page_swap_entry(page);
  	return swp_swap_info(entry);
  }
  
@@ -3384,7 +3384,7 @@ EXPORT_SYMBOL_GPL(swapcache_mapping);
  
  pgoff_t __page_file_index(struct page *page)
  {
-	swp_entry_t swap = { .val = page_private(page) };
+	swp_entry_t swap = page_swap_entry(page);
  	return swp_offset(swap);
  }
  EXPORT_SYMBOL_GPL(__page_file_index);
Peter Xu Aug. 16, 2023, 4:53 p.m. UTC | #2
On Wed, Aug 16, 2023 at 03:33:30PM +0200, David Hildenbrand wrote:
> On 15.08.23 23:25, Peter Xu wrote:
> > Tail page struct reuse is over-complicated.  Not only because we have
> 
> It is complicated, agreed.
> 
> With the ->private for THP_SWAP gone, we would have to document less.
> Stating that 4*4byte / 4*8 byte are available after flags+head would
> be sufficient and I'd even drop the table.
> 
> 
> > implicit uses of tail page fields (mapcounts, or private for thp swap
> > support, etc., that we may still use in the page structs,
> 
> Instead of documenting that thp swap should no longer touch the private
> field of tail pages, maybe we can indeed fix that quite easily.
> 
> My simple tests passed so far. If there isn't something obvious missing,
> I can do more testing and send this as an official patch.

It'll definitely be good to fix it rather than document it, if possible.

Nothing wrong that I spot quickly; you may just need a more complete cc
list for swap.  One trivial comment below.

> 
> 
> From ec0f8b0dd8fb81c316b6a4c5fc9ae7563e625404 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Wed, 16 Aug 2023 13:14:45 +0200
> Subject: [PATCH] mm/swap: stop using page->private on tail pages for THP_SWAP
> 
> Let's stop using page->private on tail pages, making it possible to
> just unconditionally reuse that field in the tail pages of large folios.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/arm64/mm/mteswap.c |  5 +++--
>  include/linux/swap.h    |  9 +++++++++
>  mm/huge_memory.c        | 15 ++++++---------
>  mm/memory.c             |  2 +-
>  mm/rmap.c               |  2 +-
>  mm/swap_state.c         |  4 ++--
>  mm/swapfile.c           |  4 ++--
>  7 files changed, 24 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
> index cd508ba80ab1..a31833e3ddc5 100644
> --- a/arch/arm64/mm/mteswap.c
> +++ b/arch/arm64/mm/mteswap.c
> @@ -33,8 +33,9 @@ int mte_save_tags(struct page *page)
>  	mte_save_page_tags(page_address(page), tag_storage);
> -	/* page_private contains the swap entry.val set in do_swap_page */
> -	ret = xa_store(&mte_pages, page_private(page), tag_storage, GFP_KERNEL);
> +	/* lookup the swap entry.val from the page */
> +	ret = xa_store(&mte_pages, page_swap_entry(page).val, tag_storage,
> +		       GFP_KERNEL);
>  	if (WARN(xa_is_err(ret), "Failed to store MTE tags")) {
>  		mte_free_tag_storage(tag_storage);
>  		return xa_err(ret);
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index bb5adc604144..84fe0e94f5cd 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -339,6 +339,15 @@ static inline swp_entry_t folio_swap_entry(struct folio *folio)
>  	return entry;
>  }
> +static inline swp_entry_t page_swap_entry(struct page *page)
> +{
> +	struct folio *folio = page_folio(page);
> +	swp_entry_t entry = folio_swap_entry(folio);
> +
> +	entry.val += page - &folio->page;
> +	return entry;
> +}
> +
>  static inline void folio_set_swap_entry(struct folio *folio, swp_entry_t entry)
>  {
>  	folio->private = (void *)entry.val;
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 0b709d2c46c6..f7e04cbcb063 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2451,18 +2451,15 @@ static void __split_huge_page_tail(struct page *head, int tail,
>  	page_tail->index = head->index + tail;
>  	/*
> -	 * page->private should not be set in tail pages with the exception
> -	 * of swap cache pages that store the swp_entry_t in tail pages.
> -	 * Fix up and warn once if private is unexpectedly set.
> -	 *
> -	 * What of 32-bit systems, on which folio->_pincount overlays
> -	 * head[1].private?  No problem: THP_SWAP is not enabled on 32-bit, and
> -	 * pincount must be 0 for folio_ref_freeze() to have succeeded.
> +	 * page->private should not be set in tail pages. Fix up and warn once
> +	 * if private is unexpectedly set.
>  	 */
> -	if (!folio_test_swapcache(page_folio(head))) {
> -		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
> +	if (unlikely(page_tail->private)) {
> +		VM_WARN_ON_ONCE_PAGE(true, page_tail);
>  		page_tail->private = 0;
>  	}
> +	if (PageSwapCache(head))
> +		set_page_private(page_tail, (unsigned long)head->private + tail);
>  	/* Page flags must be visible before we make the page non-compound. */
>  	smp_wmb();
> diff --git a/mm/memory.c b/mm/memory.c
> index d003076b218d..ff13242c1589 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3882,7 +3882,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  		 * changed.
>  		 */
>  		if (unlikely(!folio_test_swapcache(folio) ||
> -			     page_private(page) != entry.val))
> +			     page_swap_entry(page).val != entry.val))
>  			goto out_page;
>  		/*
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 1f04debdc87a..ec7f8e6c9e48 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1647,7 +1647,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			 */
>  			dec_mm_counter(mm, mm_counter(&folio->page));
>  		} else if (folio_test_anon(folio)) {
> -			swp_entry_t entry = { .val = page_private(subpage) };
> +			swp_entry_t entry = page_swap_entry(subpage);
>  			pte_t swp_pte;
>  			/*
>  			 * Store the swap location in the pte.
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 01f15139b7d9..450819934e34 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -100,6 +100,7 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
>  	folio_ref_add(folio, nr);
>  	folio_set_swapcache(folio);
> +	folio_set_swap_entry(folio, entry);
>  	do {
>  		xas_lock_irq(&xas);
> @@ -113,7 +114,6 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
>  				if (shadowp)
>  					*shadowp = old;
>  			}
> -			set_page_private(folio_page(folio, i), entry.val + i);
>  			xas_store(&xas, folio);
>  			xas_next(&xas);
>  		}
> @@ -154,9 +154,9 @@ void __delete_from_swap_cache(struct folio *folio,
>  	for (i = 0; i < nr; i++) {
>  		void *entry = xas_store(&xas, shadow);
>  		VM_BUG_ON_PAGE(entry != folio, entry);
> -		set_page_private(folio_page(folio, i), 0);
>  		xas_next(&xas);
>  	}
> +	folio->private = 0;

I'd rather remove all direct references to "private" for swap alongside, if
this would be the last spot (perhaps via folio_set_swap_entry()).

>  	folio_clear_swapcache(folio);
>  	address_space->nrpages -= nr;
>  	__node_stat_mod_folio(folio, NR_FILE_PAGES, -nr);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d46933adf789..bd9d904671b9 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3369,7 +3369,7 @@ struct swap_info_struct *swp_swap_info(swp_entry_t entry)
>  struct swap_info_struct *page_swap_info(struct page *page)
>  {
> -	swp_entry_t entry = { .val = page_private(page) };
> +	swp_entry_t entry = page_swap_entry(page);
>  	return swp_swap_info(entry);
>  }
> @@ -3384,7 +3384,7 @@ EXPORT_SYMBOL_GPL(swapcache_mapping);
>  pgoff_t __page_file_index(struct page *page)
>  {
> -	swp_entry_t swap = { .val = page_private(page) };
> +	swp_entry_t swap = page_swap_entry(page);
>  	return swp_offset(swap);
>  }
>  EXPORT_SYMBOL_GPL(__page_file_index);
> -- 
> 2.41.0
> 
> 
> -- 
> Cheers,
> 
> David / dhildenb
>
Matthew Wilcox Aug. 16, 2023, 6:41 p.m. UTC | #3
On Wed, Aug 16, 2023 at 03:33:30PM +0200, David Hildenbrand wrote:
> My simple tests passed so far. If there isn't something obvious missing,
> I can do more testing and send this as an official patch.

I think you missed one:

+++ b/mm/swapfile.c
@@ -1490,7 +1490,7 @@ int swp_swapcount(swp_entry_t entry)

        page = vmalloc_to_page(p->swap_map + offset);
        offset &= ~PAGE_MASK;
-       VM_BUG_ON(page_private(page) != SWP_CONTINUED);
+       VM_BUG_ON(page_swap_entry(page).val != SWP_CONTINUED);

        do {
                page = list_next_entry(page, lru);

I'm not smart enough to understand the use of the one in
add_swap_count_continuation().  Maybe that also needs to be fixed?
Maybe it should be fixed for consistency anyway.
David Hildenbrand Aug. 16, 2023, 6:51 p.m. UTC | #4
On 16.08.23 20:41, Matthew Wilcox wrote:
> On Wed, Aug 16, 2023 at 03:33:30PM +0200, David Hildenbrand wrote:
>> My simple tests passed so far. If there isn't something obvious missing,
>> I can do more testing and send this as an official patch.
> 
> I think you missed one:
> 
> +++ b/mm/swapfile.c
> @@ -1490,7 +1490,7 @@ int swp_swapcount(swp_entry_t entry)
> 
>          page = vmalloc_to_page(p->swap_map + offset);
>          offset &= ~PAGE_MASK;
> -       VM_BUG_ON(page_private(page) != SWP_CONTINUED);
> +       VM_BUG_ON(page_swap_entry(page).val != SWP_CONTINUED);

That falls under the "weird handling of SWP_CONTINUED using vmalloced
pages", so it's a different user of page_private().

Note that we don't even store swap entries in there, but extended swap
counts.

> 
>          do {
>                  page = list_next_entry(page, lru);
> 
> I'm not smart enough to understand the use of the one in
> add_swap_count_continuation().  Maybe that also needs to be fixed?

No, that's independent of THP_SWAP; we're not working on the memmap of
the THP but on some weird extension of swap counts.

Thanks for having a look!
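
(For context, a rough sketch from memory rather than code from this
thread: ->private on those vmalloc()ed swap_map pages only carries a
marker, roughly

	head = vmalloc_to_page(si->swap_map + offset);
	...
	set_page_private(head, SWP_CONTINUED);	/* a flag, not a swp_entry_t */

so page_swap_entry() has no meaning there.)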
David Hildenbrand Aug. 16, 2023, 6:59 p.m. UTC | #5
>>   	do {
>>   		xas_lock_irq(&xas);
>> @@ -113,7 +114,6 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
>>   				if (shadowp)
>>   					*shadowp = old;
>>   			}
>> -			set_page_private(folio_page(folio, i), entry.val + i);
>>   			xas_store(&xas, folio);
>>   			xas_next(&xas);
>>   		}
>> @@ -154,9 +154,9 @@ void __delete_from_swap_cache(struct folio *folio,
>>   	for (i = 0; i < nr; i++) {
>>   		void *entry = xas_store(&xas, shadow);
>>   		VM_BUG_ON_PAGE(entry != folio, entry);
>> -		set_page_private(folio_page(folio, i), 0);
>>   		xas_next(&xas);
>>   	}
>> +	folio->private = 0;
> 
> I'd rather remove all direct references to "private" for swap alongside, if
> this would be the last spot (perhaps via folio_set_swap_entry()).

Good idea, thanks!
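
For example, a rough sketch of that last spot, assuming
folio_set_swap_entry() keeps its current signature (not the final patch):

	/* __delete_from_swap_cache(), instead of "folio->private = 0;" */
	folio_set_swap_entry(folio, (swp_entry_t){ .val = 0 });
	folio_clear_swapcache(folio);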
Matthew Wilcox Aug. 16, 2023, 10:05 p.m. UTC | #6
On Wed, Aug 16, 2023 at 08:51:49PM +0200, David Hildenbrand wrote:
> On 16.08.23 20:41, Matthew Wilcox wrote:
> > On Wed, Aug 16, 2023 at 03:33:30PM +0200, David Hildenbrand wrote:
> > > My simple tests passed so far. If there isn't something obvious missing,
> > > I can do more testing and send this as an official patch.
> > 
> > I think you missed one:
> > 
> > +++ b/mm/swapfile.c
> > @@ -1490,7 +1490,7 @@ int swp_swapcount(swp_entry_t entry)
> > 
> >          page = vmalloc_to_page(p->swap_map + offset);
> >          offset &= ~PAGE_MASK;
> > -       VM_BUG_ON(page_private(page) != SWP_CONTINUED);
> > +       VM_BUG_ON(page_swap_entry(page).val != SWP_CONTINUED);
> 
> That falls under the "weird handling of SWP_CONTINUED using vmalloced
> pages". So different user of page_private().
> 
> Note that we don't even store swap entries in there but extended swap
> counts.

Ah, right.  I see now.


Not necessarily as part of this patch, but it got me wondering ...
should we do this?  And then maybe we could remove folio_swap_entry()
and folio_set_swap_entry() and just use folio->swap directly.


diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3880b3f2e321..e23d1356e504 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -266,6 +266,14 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page)
 	return (struct page *)(~ENCODE_PAGE_BITS & (unsigned long)page);
 }
 
+/*
+ * A swap entry has to fit into a "unsigned long", as the entry is hidden
+ * in the "index" field of the swapper address space.
+ */
+typedef struct {
+	unsigned long val;
+} swp_entry_t;
+
 /**
  * struct folio - Represents a contiguous set of bytes.
  * @flags: Identical to the page flags.
@@ -276,7 +284,7 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page)
  * @index: Offset within the file, in units of pages.  For anonymous memory,
  *    this is the index from the beginning of the mmap.
  * @private: Filesystem per-folio data (see folio_attach_private()).
- *    Used for swp_entry_t if folio_test_swapcache().
+ * @swap: Used for swp_entry_t if folio_test_swapcache().
  * @_mapcount: Do not access this member directly.  Use folio_mapcount() to
  *    find out how many times this folio is mapped by userspace.
  * @_refcount: Do not access this member directly.  Use folio_ref_count()
@@ -319,7 +327,10 @@ struct folio {
 			};
 			struct address_space *mapping;
 			pgoff_t index;
-			void *private;
+			union {
+				void *private;
+				swp_entry_t swap;
+			};
 			atomic_t _mapcount;
 			atomic_t _refcount;
 #ifdef CONFIG_MEMCG
@@ -1158,14 +1169,6 @@ enum tlb_flush_reason {
 	NR_TLB_FLUSH_REASONS,
 };
 
- /*
-  * A swap entry has to fit into a "unsigned long", as the entry is hidden
-  * in the "index" field of the swapper address space.
-  */
-typedef struct {
-	unsigned long val;
-} swp_entry_t;
-
 /**
  * enum fault_flag - Fault flag definitions.
  * @FAULT_FLAG_WRITE: Fault was a write fault.
diff --git a/include/linux/swap.h b/include/linux/swap.h
index bb5adc604144..59b0f37eae5b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -335,13 +335,12 @@ struct swap_info_struct {
 
 static inline swp_entry_t folio_swap_entry(struct folio *folio)
 {
-	swp_entry_t entry = { .val = page_private(&folio->page) };
-	return entry;
+	return folio->swap;
 }
 
 static inline void folio_set_swap_entry(struct folio *folio, swp_entry_t entry)
 {
-	folio->private = (void *)entry.val;
+	folio->swap = entry;
 }
 
 /* linux/mm/workingset.c */
David Hildenbrand Aug. 18, 2023, 2:38 p.m. UTC | #7
On 17.08.23 00:05, Matthew Wilcox wrote:
> On Wed, Aug 16, 2023 at 08:51:49PM +0200, David Hildenbrand wrote:
>> On 16.08.23 20:41, Matthew Wilcox wrote:
>>> On Wed, Aug 16, 2023 at 03:33:30PM +0200, David Hildenbrand wrote:
>>>> My simple tests passed so far. If there isn't something obvious missing,
>>>> I can do more testing and send this as an official patch.
>>>
>>> I think you missed one:
>>>
>>> +++ b/mm/swapfile.c
>>> @@ -1490,7 +1490,7 @@ int swp_swapcount(swp_entry_t entry)
>>>
>>>           page = vmalloc_to_page(p->swap_map + offset);
>>>           offset &= ~PAGE_MASK;
>>> -       VM_BUG_ON(page_private(page) != SWP_CONTINUED);
>>> +       VM_BUG_ON(page_swap_entry(page).val != SWP_CONTINUED);
>>
>> That falls under the "weird handling of SWP_CONTINUED using vmalloced
>> pages". So different user of page_private().
>>
>> Note that we don't even store swap entries in there but extended swap
>> counts.
> 
> Ah, right.  I see now.
> 
> 
> Not necessarily as part of this patch, but it got me wondering ...
> should we do this?  And then maybe we could remove folio_swap_entry()
> and folio_set_swap_entry() and just use folio->swap directly.
> 
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3880b3f2e321..e23d1356e504 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -266,6 +266,14 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page)
>   	return (struct page *)(~ENCODE_PAGE_BITS & (unsigned long)page);
>   }
>   
> +/*
> + * A swap entry has to fit into a "unsigned long", as the entry is hidden
> + * in the "index" field of the swapper address space.
> + */
> +typedef struct {
> +	unsigned long val;
> +} swp_entry_t;
> +
>   /**
>    * struct folio - Represents a contiguous set of bytes.
>    * @flags: Identical to the page flags.
> @@ -276,7 +284,7 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page)
>    * @index: Offset within the file, in units of pages.  For anonymous memory,
>    *    this is the index from the beginning of the mmap.
>    * @private: Filesystem per-folio data (see folio_attach_private()).
> - *    Used for swp_entry_t if folio_test_swapcache().
> + * @swap: Used for swp_entry_t if folio_test_swapcache().
>    * @_mapcount: Do not access this member directly.  Use folio_mapcount() to
>    *    find out how many times this folio is mapped by userspace.
>    * @_refcount: Do not access this member directly.  Use folio_ref_count()
> @@ -319,7 +327,10 @@ struct folio {
>   			};
>   			struct address_space *mapping;
>   			pgoff_t index;
> -			void *private;
> +			union {
> +				void *private;
> +				swp_entry_t swap;

Probably with "/* for anon and shm pages only */"

> +			};
>   			atomic_t _mapcount;
>   			atomic_t _refcount;
>   #ifdef CONFIG_MEMCG
> @@ -1158,14 +1169,6 @@ enum tlb_flush_reason {
>   	NR_TLB_FLUSH_REASONS,
>   };
>   
> - /*
> -  * A swap entry has to fit into a "unsigned long", as the entry is hidden
> -  * in the "index" field of the swapper address space.
> -  */
> -typedef struct {
> -	unsigned long val;
> -} swp_entry_t;
> -
>   /**
>    * enum fault_flag - Fault flag definitions.
>    * @FAULT_FLAG_WRITE: Fault was a write fault.
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index bb5adc604144..59b0f37eae5b 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -335,13 +335,12 @@ struct swap_info_struct {
>   
>   static inline swp_entry_t folio_swap_entry(struct folio *folio)
>   {
> -	swp_entry_t entry = { .val = page_private(&folio->page) };
> -	return entry;
> +	return folio->swap;
>   }
>   
>   static inline void folio_set_swap_entry(struct folio *folio, swp_entry_t entry)
>   {
> -	folio->private = (void *)entry.val;
> +	folio->swap = entry;
>   }
>   
>   /* linux/mm/workingset.c */
> 

Sounds reasonable, but maybe we should even get rid of the getter/setter
completely then?
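
For example (a rough sketch building on the folio->swap union member from
the diff above, not an actual patch), the call sites could then look like:

	static inline swp_entry_t page_swap_entry(struct page *page)
	{
		struct folio *folio = page_folio(page);
		swp_entry_t entry = folio->swap;	/* was folio_swap_entry(folio) */

		entry.val += page - &folio->page;
		return entry;
	}

	/* and e.g. in add_to_swap_cache(): */
	folio->swap = entry;		/* was folio_set_swap_entry(folio, entry) */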

Patch

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 81456fa5fda5..66f1b0814334 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -324,6 +324,35 @@  struct folio {
 		};
 		struct page page;
 	};
+	/*
+	 * Some of the tail page fields may not be reused by the folio
+	 * object because they have already been used by the page struct.
+	 * On 32bits there are at least 8 WORDs while on 64 bits there're
+	 * at least 7 WORDs, all ending at _refcount field.
+	 *
+	 * |--------+-------------+-------------------|
+	 * |  index | 32 bits     | 64 bits           |
+	 * |--------+-------------+-------------------|
+	 * |      0 | flags       | flags             |
+	 * |      1 | head        | head              |
+	 * |      2 | FREE        | FREE              |
+	 * |      3 | FREE [1]    | FREE [1]          |
+	 * |      4 | FREE        | FREE              |
+	 * |      5 | FREE        | private [2]       |
+	 * |      6 | mapcnt      | mapcnt+refcnt [3] |
+	 * |      7 | refcnt [3]  |                   |
+	 * |--------+-------------+-------------------|
+	 *
+	 * [1] "mapping" field.  It is free to use but needs to be with
+	 *     some caution due to poisoning, see TAIL_MAPPING_REUSED_MAX.
+	 *
+	 * [2] "private" field, used when THP_SWAP is on (but disabled on
+	 *     32 bits, so this index is FREE on 32bit or hugetlb folios).
+	 *     May need to be fixed finally.
+	 *
+	 * [3] "refcount" field must be zero for all tail pages.  See e.g.
+	 *     has_unmovable_pages() on page_ref_count() check and comment.
+	 */
 	union {
 		struct {
 			unsigned long _flags_1;
@@ -331,18 +360,29 @@  struct folio {
 	/* public: */
 			unsigned char _folio_dtor;
 			unsigned char _folio_order;
+	/* private: 2 bytes can be reused later */
+			unsigned char _free_1_0[2];
+	/* public: */
 			atomic_t _entire_mapcount;
 			atomic_t _nr_pages_mapped;
 			atomic_t _pincount;
 #ifdef CONFIG_64BIT
 			unsigned int _folio_nr_pages;
+	/* private: 4 bytes can be reused later (64 bits only) */
+			unsigned char _free_1_1[4];
+	/* Currently used by THP_SWAP, to be fixed */
+			void *_private_1;
+	/* public: */
 #endif
+	/* private: */
+			atomic_t _mapcount_1;
+			atomic_t _refcount_1;
 	/* private: the union with struct page is transitional */
 		};
 		struct page __page_1;
 	};
 	union {
-		struct {
+		struct {	/* hugetlb folios */
 			unsigned long _flags_2;
 			unsigned long _head_2;
 	/* public: */
@@ -351,13 +391,22 @@  struct folio {
 			void *_hugetlb_cgroup_rsvd;
 			void *_hugetlb_hwpoison;
 	/* private: the union with struct page is transitional */
+			atomic_t _mapcount_2;
+			atomic_t _refcount_2;
 		};
-		struct {
+		struct {	/* non-hugetlb folios */
 			unsigned long _flags_2a;
 			unsigned long _head_2a;
 	/* public: */
 			struct list_head _deferred_list;
-	/* private: the union with struct page is transitional */
+	/* private: 8 more free bytes for either 32/64 bits */
+			unsigned char _free_2_2[8];
+#ifdef CONFIG_64BIT
+	/* currently used by THP_SWAP, to be fixed */
+			void *_private_2a;
+#endif
+			atomic_t _mapcount_2a;
+			atomic_t _refcount_2a;
 		};
 		struct page __page_2;
 	};
@@ -382,12 +431,26 @@  FOLIO_MATCH(memcg_data, memcg_data);
 			offsetof(struct page, pg) + sizeof(struct page))
 FOLIO_MATCH(flags, _flags_1);
 FOLIO_MATCH(compound_head, _head_1);
+#ifdef CONFIG_64BIT
+FOLIO_MATCH(private, _private_1);
+#endif
+FOLIO_MATCH(_mapcount, _mapcount_1);
+FOLIO_MATCH(_refcount, _refcount_1);
 #undef FOLIO_MATCH
 #define FOLIO_MATCH(pg, fl)						\
 	static_assert(offsetof(struct folio, fl) ==			\
 			offsetof(struct page, pg) + 2 * sizeof(struct page))
 FOLIO_MATCH(flags, _flags_2);
 FOLIO_MATCH(compound_head, _head_2);
+FOLIO_MATCH(_mapcount, _mapcount_2);
+FOLIO_MATCH(_refcount, _refcount_2);
+FOLIO_MATCH(flags, _flags_2a);
+FOLIO_MATCH(compound_head, _head_2a);
+FOLIO_MATCH(_mapcount, _mapcount_2a);
+FOLIO_MATCH(_refcount, _refcount_2a);
+#ifdef CONFIG_64BIT
+FOLIO_MATCH(private, _private_2a);
+#endif
 #undef FOLIO_MATCH
 
 /*