diff mbox series

mm: Wire up tail page poisoning over ->mappings

Message ID 20230815210659.430010-1-peterx@redhat.com (mailing list archive)
State New
Headers show
Series mm: Wire up tail page poisoning over ->mappings | expand

Commit Message

Peter Xu Aug. 15, 2023, 9:06 p.m. UTC
Tail pages have a sanity check on ->mapping fields, not all of them but
only upon index>2, for now.  It's because we reused ->mapping fields of the
tail pages index=1,2 for other things.

Define a macro for "max index of tail pages that got ->mapping field
reused" on top of folio definition, because when we grow folio tail pages
we'd want to boost this too together.

Then wire everything up using that macro.

Don't try to poison the ->mapping field in prep_compound_tail() for tail
pages <=TAIL_MAPPING_REUSED_MAX because it's wrong.  For example, the 1st
tail page already reused ->mapping field as _nr_pages_mapped.  It didn't
already blow up only because we luckily always prepare tail pages before
preparing the head, then prep_compound_head() will update
folio->_nr_pages_mapped so as to void the poisoning.  This should make it
always safe again, even e.g. if we prep the head first.

Clean up free_tail_page_prepare() along the way on checking ->mapping
poisoning to also leverage the new macro.

Signed-off-by: Peter Xu <peterx@redhat.com>
---

I split this out from another rfc series.  Removed RFC tag because it
wasn't for this patch but for the documentation updates.  I'll post the rfc
part alone.  Comments welcomed, thanks.
---
 include/linux/mm_types.h | 11 +++++++++++
 mm/huge_memory.c         |  6 +++---
 mm/internal.h            |  3 ++-
 mm/page_alloc.c          | 28 +++++++++++-----------------
 4 files changed, 27 insertions(+), 21 deletions(-)

Comments

Matthew Wilcox Aug. 18, 2023, 10:14 p.m. UTC | #1
On Tue, Aug 15, 2023 at 05:06:59PM -0400, Peter Xu wrote:
> I split this out from another rfc series.  Removed RFC tag because it
> wasn't for this patch but for the documentation updates.  I'll post the rfc
> part alone.  Comments welcomed, thanks.

I still hate it, as I explained here:

https://lore.kernel.org/linux-mm/ZNp7yUgUrIpILnXu@casper.infradead.org/

> > + * When the tail page's mapping field reused, it'll be exempted from
> > + * ->mapping poisoning and checks.  Also see the macro TAIL_MAPPING.
> > + */
> > +#define  TAIL_MAPPING_REUSED_MAX  (2)
> 
> More importantly, I think this is over-parametrisation.  If you start to
> use extra fields in struct folio, just change the code in page_alloc.c
> directly.
Peter Xu Aug. 21, 2023, 1:13 a.m. UTC | #2
On Fri, Aug 18, 2023 at 11:14:28PM +0100, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 05:06:59PM -0400, Peter Xu wrote:
> > I split this out from another rfc series.  Removed RFC tag because it
> > wasn't for this patch but for the documentation updates.  I'll post the rfc
> > part alone.  Comments welcomed, thanks.
> 
> I still hate it, as I explained here:

I still prefer it be merged.

> 
> https://lore.kernel.org/linux-mm/ZNp7yUgUrIpILnXu@casper.infradead.org/

https://lore.kernel.org/linux-mm/ZNqFv0AwkfDKExiw@x1n/#t

Firstly, I've answered and you didn't follow that up.

This is not identical to that version, it added one more change to remove
the other hard coded "2" 

> 
> > > + * When the tail page's mapping field reused, it'll be exempted from
> > > + * ->mapping poisoning and checks.  Also see the macro TAIL_MAPPING.
> > > + */
> > > +#define  TAIL_MAPPING_REUSED_MAX  (2)
> > 
> > More importantly, I think this is over-parametrisation.  If you start to
> > use extra fields in struct folio, just change the code in page_alloc.c
> > directly.

Change the hard-coded "2"s in different functions?  Can you kindly explain
why can't we just have a macro to help?

Setting tail mapping for tail 1/2 is even wrong, which part of this patch
fixes:

@@ -428,7 +428,8 @@ static inline void prep_compound_tail(struct page *head, int tail_idx)
 {
        struct page *p = head + tail_idx;

-       p->mapping = TAIL_MAPPING;
+       if (tail_idx > TAIL_MAPPING_REUSED_MAX)
+               p->mapping = TAIL_MAPPING;
        set_compound_head(p, head);
        set_page_private(p, 0);
 }

Do you mean you prefer me to add one more hard-coded "2" and "just boost it
when it's needed"?

Thanks,
Matthew Wilcox Aug. 21, 2023, 2:29 a.m. UTC | #3
On Sun, Aug 20, 2023 at 09:13:55PM -0400, Peter Xu wrote:
> > https://lore.kernel.org/linux-mm/ZNp7yUgUrIpILnXu@casper.infradead.org/
> 
> https://lore.kernel.org/linux-mm/ZNqFv0AwkfDKExiw@x1n/#t
> 
> Firstly, I've answered and you didn't follow that up.

I didn't see it.  I get a lot of email ...

> > > More importantly, I think this is over-parametrisation.  If you start to
> > > use extra fields in struct folio, just change the code in page_alloc.c
> > > directly.
> 
> Change the hard-coded "2"s in different functions?  Can you kindly explain
> why can't we just have a macro to help?

Because it's unnecessary.  You're putting in way too much effort here
for something that might happen once.

> Setting tail mapping for tail 1/2 is even wrong, which part of this patch
> fixes:
> 
> @@ -428,7 +428,8 @@ static inline void prep_compound_tail(struct page *head, int tail_idx)
>  {
>         struct page *p = head + tail_idx;
> 
> -       p->mapping = TAIL_MAPPING;
> +       if (tail_idx > TAIL_MAPPING_REUSED_MAX)
> +               p->mapping = TAIL_MAPPING;
>         set_compound_head(p, head);
>         set_page_private(p, 0);
>  }

I didn't see this.  This is wrong.  tail->mapping is only reused for
large rmappable pages.  It's not reused for other compound pages.

If you really insist on cleaning this up, the special casing of tail pages
should move out of page_alloc entirely.  folio_undo_large_rmappable()
should restore TAIL_MAPPING for all tail pages that were modified by
folio_prep_large_rmappable().

The other thing we should do is verify that the additional large-rmap
fields have the correct values in folio_undo_large_rmappable().

But let's look back to why TAIL_MAPPING was introduced.  Commit
1c290f642101e poisoned tail->mapping to catch users which forgot to
call compound_head().  So we can actually remove TAIL_MAPPING entirely
if we get rid of page->mapping.

You probably think that's an unattainable goal; there are something like
340 occurrences of the string 'page->mapping' in the kernel right now
(and there are probably instances of struct page named something other
than 'page'), but a lot of those are actually in comments, which would
be my fault for not fixing them during folio conversions.

However, I have a small patch series which enables 'allnoconfig' to
build after renaming page->mapping to page->_mapping.  Aside from fs/
there are *very* few places which look at page->mapping [1].  I'll post
that patch series tomorrow.

I think with some serious work, we can land "remove page->mapping"
(which would include removing TAIL_MAPPING) by the end of the year.
And that work gets us closer to the goal of shrinking struct page.

[1] device-dax, intel_th, mthca, cortina, fb_defio
Peter Xu Aug. 21, 2023, 4:48 p.m. UTC | #4
On Mon, Aug 21, 2023 at 03:29:01AM +0100, Matthew Wilcox wrote:
> On Sun, Aug 20, 2023 at 09:13:55PM -0400, Peter Xu wrote:
> > > https://lore.kernel.org/linux-mm/ZNp7yUgUrIpILnXu@casper.infradead.org/
> > 
> > https://lore.kernel.org/linux-mm/ZNqFv0AwkfDKExiw@x1n/#t
> > 
> > Firstly, I've answered and you didn't follow that up.
> 
> I didn't see it.  I get a lot of email ...
> 
> > > > More importantly, I think this is over-parametrisation.  If you start to
> > > > use extra fields in struct folio, just change the code in page_alloc.c
> > > > directly.
> > 
> > Change the hard-coded "2"s in different functions?  Can you kindly explain
> > why can't we just have a macro to help?
> 
> Because it's unnecessary.  You're putting in way too much effort here
> for something that might happen once.
> 
> > Setting tail mapping for tail 1/2 is even wrong, which part of this patch
> > fixes:
> > 
> > @@ -428,7 +428,8 @@ static inline void prep_compound_tail(struct page *head, int tail_idx)
> >  {
> >         struct page *p = head + tail_idx;
> > 
> > -       p->mapping = TAIL_MAPPING;
> > +       if (tail_idx > TAIL_MAPPING_REUSED_MAX)
> > +               p->mapping = TAIL_MAPPING;
> >         set_compound_head(p, head);
> >         set_page_private(p, 0);
> >  }
> 
> I didn't see this.  This is wrong.  tail->mapping is only reused for
> large rmappable pages.  It's not reused for other compound pages.

Just to make sure we're on the same page: I think it's not only
_deferred_list (of tail page 2) that reused the mapping field (word offset
3), but also _nr_pages_mapped (of tail page 1)?

> 
> If you really insist on cleaning this up, the special casing of tail pages
> should move out of page_alloc entirely.  folio_undo_large_rmappable()
> should restore TAIL_MAPPING for all tail pages that were modified by
> folio_prep_large_rmappable().
> 
> The other thing we should do is verify that the additional large-rmap
> fields have the correct values in folio_undo_large_rmappable().
> 
> But let's look back to why TAIL_MAPPING was introduced.  Commit
> 1c290f642101e poisoned tail->mapping to catch users which forgot to
> call compound_head().  So we can actually remove TAIL_MAPPING entirely
> if we get rid of page->mapping.
> 
> You probably think that's an unattainable goal; there are something like
> 340 occurrences of the string 'page->mapping' in the kernel right now
> (and there are probably instances of struct page named something other
> than 'page'), but a lot of those are actually in comments, which would
> be my fault for not fixing them during folio conversions.
> 
> However, I have a small patch series which enables 'allnoconfig' to
> build after renaming page->mapping to page->_mapping.  Aside from fs/
> there are *very* few places which look at page->mapping [1].  I'll post
> that patch series tomorrow.

Assuming it's still not yet posted; I can wait and read it.

If you plan to remove the whole TAIL_MAPPING in a few days then I agree
this patch is not needed, but so far I don't know when it'll land and also
why, before that it does sound like we can still keep this patch.

Regarding the question on "why removing TAIL_MAPPING": poisoning an unused
field is always helpful to me even if not referenced with "page->mapping".
So I don't see an immediate benefit from removing the poisoning if it's
already there; OTOH not sure whether poison more unused fields will be more
helpful in general?

> 
> I think with some serious work, we can land "remove page->mapping"
> (which would include removing TAIL_MAPPING) by the end of the year.
> And that work gets us closer to the goal of shrinking struct page.
> 
> [1] device-dax, intel_th, mthca, cortina, fb_defio

Thanks,
Matthew Wilcox Aug. 22, 2023, 3:18 p.m. UTC | #5
On Mon, Aug 21, 2023 at 12:48:18PM -0400, Peter Xu wrote:
> On Mon, Aug 21, 2023 at 03:29:01AM +0100, Matthew Wilcox wrote:
> > On Sun, Aug 20, 2023 at 09:13:55PM -0400, Peter Xu wrote:
> > > Setting tail mapping for tail 1/2 is even wrong, which part of this patch
> > > fixes:
> > > 
> > > @@ -428,7 +428,8 @@ static inline void prep_compound_tail(struct page *head, int tail_idx)
> > >  {
> > >         struct page *p = head + tail_idx;
> > > 
> > > -       p->mapping = TAIL_MAPPING;
> > > +       if (tail_idx > TAIL_MAPPING_REUSED_MAX)
> > > +               p->mapping = TAIL_MAPPING;
> > >         set_compound_head(p, head);
> > >         set_page_private(p, 0);
> > >  }
> > 
> > I didn't see this.  This is wrong.  tail->mapping is only reused for
> > large rmappable pages.  It's not reused for other compound pages.
> 
> Just to make sure we're on the same page: I think it's not only
> _deferred_list (of tail page 2) that reused the mapping field (word offset
> 3), but also _nr_pages_mapped (of tail page 1)?

I don't see how this comment is related to the part of the email you're
replying to.  But yes, prep_large_rmappable overwrites ->mapping in
two tail pages.

> > However, I have a small patch series which enables 'allnoconfig' to
> > build after renaming page->mapping to page->_mapping.  Aside from fs/
> > there are *very* few places which look at page->mapping [1].  I'll post
> > that patch series tomorrow.
> 
> Assuming it's still not yet posted; I can wait and read it.

I sent out a few patches.  Some have made it to -next already because
they're almost trivial.  Nobody's commented on the difficult one.

https://lore.kernel.org/linux-mm/20230821202016.2910321-1-willy@infradead.org/

> If you plan to remove the whole TAIL_MAPPING in a few days then I agree
> this patch is not needed, but so far I don't know when it'll land and also
> why, before that it does sound like we can still keep this patch.

This patch is putting fresh paint on a condemned building.  Just stop
it.

> Regarding the question on "why removing TAIL_MAPPING": poisoning an unused
> field is always helpful to me even if not referenced with "page->mapping".
> So I don't see an immediate benefit from removing the poisoning if it's
> already there; OTOH not sure whether poison more unused fields will be more
> helpful in general?

You are wrong.
diff mbox series

Patch

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 291c05cacd48..81456fa5fda5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -248,6 +248,17 @@  static inline struct page *encoded_page_ptr(struct encoded_page *page)
 	return (struct page *)(~ENCODE_PAGE_BITS & (unsigned long)page);
 }
 
+/*
+ * This macro defines the maximum tail pages (of a folio) that can have the
+ * page->mapping field reused.
+ *
+ * When the tail page's mapping field reused, it'll be exempted from
+ * ->mapping poisoning and checks.  Also see the macro TAIL_MAPPING.
+ *
+ * When grow the folio struct, please consider growing this too.
+ */
+#define  TAIL_MAPPING_REUSED_MAX  (2)
+
 /**
  * struct folio - Represents a contiguous set of bytes.
  * @flags: Identical to the page flags.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0b709d2c46c6..72f244e16dcb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2444,9 +2444,9 @@  static void __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_dirty) |
 			 LRU_GEN_MASK | LRU_REFS_MASK));
 
-	/* ->mapping in first and second tail page is replaced by other uses */
-	VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING,
-			page_tail);
+	/* ->mapping in <=TAIL_MAPPING_REUSED_MAX tail pages are reused */
+	VM_BUG_ON_PAGE(tail > TAIL_MAPPING_REUSED_MAX &&
+		       page_tail->mapping != TAIL_MAPPING, page_tail);
 	page_tail->mapping = head->mapping;
 	page_tail->index = head->index + tail;
 
diff --git a/mm/internal.h b/mm/internal.h
index 02a6fd36f46e..a75df0bd1f89 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -428,7 +428,8 @@  static inline void prep_compound_tail(struct page *head, int tail_idx)
 {
 	struct page *p = head + tail_idx;
 
-	p->mapping = TAIL_MAPPING;
+	if (tail_idx > TAIL_MAPPING_REUSED_MAX)
+		p->mapping = TAIL_MAPPING;
 	set_compound_head(p, head);
 	set_page_private(p, 0);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 96b7c1a7d1f2..7ab7869f3c7f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -990,7 +990,7 @@  static inline bool is_check_pages_enabled(void)
 static int free_tail_page_prepare(struct page *head_page, struct page *page)
 {
 	struct folio *folio = (struct folio *)head_page;
-	int ret = 1;
+	int ret = 1, index = page - head_page;
 
 	/*
 	 * We rely page->lru.next never has bit 0 set, unless the page
@@ -1002,9 +1002,9 @@  static int free_tail_page_prepare(struct page *head_page, struct page *page)
 		ret = 0;
 		goto out;
 	}
-	switch (page - head_page) {
-	case 1:
-		/* the first tail page: these may be in place of ->mapping */
+
+	/* Sanity check the first tail page */
+	if (index == 1) {
 		if (unlikely(folio_entire_mapcount(folio))) {
 			bad_page(page, "nonzero entire_mapcount");
 			goto out;
@@ -1017,20 +1017,14 @@  static int free_tail_page_prepare(struct page *head_page, struct page *page)
 			bad_page(page, "nonzero pincount");
 			goto out;
 		}
-		break;
-	case 2:
-		/*
-		 * the second tail page: ->mapping is
-		 * deferred_list.next -- ignore value.
-		 */
-		break;
-	default:
-		if (page->mapping != TAIL_MAPPING) {
-			bad_page(page, "corrupted mapping in tail page");
-			goto out;
-		}
-		break;
 	}
+
+	/* Sanity check the rest tail pages over ->mapping */
+	if (index > TAIL_MAPPING_REUSED_MAX && page->mapping != TAIL_MAPPING) {
+		bad_page(page, "corrupted mapping in tail page");
+		goto out;
+	}
+
 	if (unlikely(!PageTail(page))) {
 		bad_page(page, "PageTail not set");
 		goto out;