
[RFC,v2,05/27] mm: page_alloc: Add an arch hook to allow prep_new_page() to fail

Message ID 20231119165721.9849-6-alexandru.elisei@arm.com (mailing list archive)
State New
Series [RFC,v2,01/27] arm64: mte: Rework naming for tag manipulation functions

Commit Message

Alexandru Elisei Nov. 19, 2023, 4:56 p.m. UTC
Introduce arch_prep_new_page(), which will be used by arm64 to reserve tag
storage for an allocated page. Reserving tag storage can fail, for example,
if the tag storage page has a short pin on it, so allow prep_new_page() ->
arch_prep_new_page() to similarly fail.

arch_alloc_page(), called from post_alloc_hook(), has been considered as an
alternative to adding yet another arch hook, but post_alloc_hook() cannot
fail, as it's also called when free pages are isolated.
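
For illustration, the arm64 side could override the hook roughly as sketched
below. The __GFP_TAGGED check and reserve_tag_storage() are placeholders for
the arm64-specific logic added later in the series, not something defined by
this patch:

/* Hypothetical arm64 override of the new hook (sketch only). */
#define __HAVE_ARCH_PREP_NEW_PAGE
static inline int arch_prep_new_page(struct page *page, int order, gfp_t gfp)
{
	/* Only allocations that ask for tagged memory need tag storage. */
	if (!system_supports_mte() || !(gfp & __GFP_TAGGED))
		return 0;

	/* Placeholder for the arm64 tag storage reservation logic. */
	return reserve_tag_storage(page, order, gfp);
}

A non-zero return propagates out of prep_new_page(), and the caller puts the
page back on the buddy freelist and retries the allocation elsewhere.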

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/linux/pgtable.h |  7 ++++
 mm/page_alloc.c         | 75 ++++++++++++++++++++++++++++++++---------
 2 files changed, 66 insertions(+), 16 deletions(-)

Comments

David Hildenbrand Nov. 24, 2023, 7:35 p.m. UTC | #1
On 19.11.23 17:56, Alexandru Elisei wrote:
> Introduce arch_prep_new_page(), which will be used by arm64 to reserve tag
> storage for an allocated page. Reserving tag storage can fail, for example,
> if the tag storage page has a short pin on it, so allow prep_new_page() ->
> arch_prep_new_page() to similarly fail.

But what are the side-effects of this? How does the calling code recover?

E.g., what if we need to populate a page into user space, but that 
particular page we allocated fails to be prepared? So we inject a signal 
into that poor process?
Alexandru Elisei Nov. 27, 2023, 12:09 p.m. UTC | #2
Hi,

Thank you so much for your comments, they are genuinely useful.

On Fri, Nov 24, 2023 at 08:35:47PM +0100, David Hildenbrand wrote:
> On 19.11.23 17:56, Alexandru Elisei wrote:
> > Introduce arch_prep_new_page(), which will be used by arm64 to reserve tag
> > storage for an allocated page. Reserving tag storage can fail, for example,
> > if the tag storage page has a short pin on it, so allow prep_new_page() ->
> > arch_prep_new_page() to similarly fail.
> 
> But what are the side-effects of this? How does the calling code recover?
> 
> E.g., what if we need to populate a page into user space, but that
> particular page we allocated fails to be prepared? So we inject a signal
> into that poor process?

When the page fails to be prepared, it is put back to the tail of the
freelist with __free_one_page(.., FPI_TO_TAIL). If all the allocation paths
are exhausted and no page has been found for which tag storage has been
reserved, then that's treated like an OOM situation.

I have been thinking about this, and I think I can simplify the code by
making tag reservation a best effort approach. The page can be allocated
even if reserving tag storage fails, but the page is marked as invalid in
set_pte_at() (PAGE_NONE + an extra bit to tell arm64 that it needs tag
storage) and next time it is accessed, arm64 will reserve tag storage in
the fault handling code (the mechanism for that is implemented in patch #19
of the series, "mm: mprotect: Introduce PAGE_FAULT_ON_ACCESS for
mprotect(PROT_MTE)").

With this new approach, prep_new_page() stays the way it is, and no further
changes are required for the page allocator, as there are already arch
callbacks that can be used for that, for example tag_clear_highpage() and
arch_alloc_page(). The downside is extra page faults, which might impact
performance.
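
To make that concrete, the set_pte_at() side could look roughly like the
sketch below; page_tag_storage_reserved() and pte_mkmetadata_none() are
placeholder names for helpers the fault-on-access patch would provide, not
existing API:

/* Sketch of the best-effort marking in arm64 set_pte_at() (placeholder names). */
static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
			      pte_t *ptep, pte_t pte)
{
	/*
	 * The page needs tags but tag storage could not be reserved at
	 * allocation time: install a PAGE_NONE-style entry with a software
	 * bit set, so the first access faults and the fault handler can
	 * reserve tag storage at that point.
	 */
	if (pte_present(pte) && pte_tagged(pte) &&
	    !page_tag_storage_reserved(pte_page(pte)))
		pte = pte_mkmetadata_none(pte);

	__set_pte_at(mm, addr, ptep, pte);
}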

What do you think?

Thanks,
Alex

> 
> -- 
> Cheers,
> 
> David / dhildenb
>
David Hildenbrand Nov. 28, 2023, 4:57 p.m. UTC | #3
On 27.11.23 13:09, Alexandru Elisei wrote:
> Hi,
> 
> Thank you so much for your comments, they are genuinely useful.
> 
> On Fri, Nov 24, 2023 at 08:35:47PM +0100, David Hildenbrand wrote:
>> On 19.11.23 17:56, Alexandru Elisei wrote:
>>> Introduce arch_prep_new_page(), which will be used by arm64 to reserve tag
>>> storage for an allocated page. Reserving tag storage can fail, for example,
>>> if the tag storage page has a short pin on it, so allow prep_new_page() ->
>>> arch_prep_new_page() to similarly fail.
>>
>> But what are the side-effects of this? How does the calling code recover?
>>
>> E.g., what if we need to populate a page into user space, but that
>> particular page we allocated fails to be prepared? So we inject a signal
>> into that poor process?
> 
> When the page fails to be prepared, it is put back to the tail of the
> freelist with __free_one_page(.., FPI_TO_TAIL). If all the allocation paths
> are exhausted and no page has been found for which tag storage has been
> reserved, then that's treated like an OOM situation.
> 
> I have been thinking about this, and I think I can simplify the code by
> making tag reservation a best effort approach. The page can be allocated
> even if reserving tag storage fails, but the page is marked as invalid in
> set_pte_at() (PAGE_NONE + an extra bit to tell arm64 that it needs tag
> storage) and next time it is accessed, arm64 will reserve tag storage in
> the fault handling code (the mechanism for that is implemented in patch #19
> of the series, "mm: mprotect: Introduce PAGE_FAULT_ON_ACCESS for
> mprotect(PROT_MTE)").
> 
> With this new approach, prep_new_page() stays the way it is, and no further
> changes are required for the page allocator, as there are already arch
> callbacks that can be used for that, for example tag_clear_highpage() and
> arch_alloc_page(). The downside is extra page faults, which might impact
> performance.
> 
> What do you think?

That sounds a lot more robust, compared to intermittent failures to 
allocate pages.
Alexandru Elisei Nov. 28, 2023, 5:17 p.m. UTC | #4
Hi,

On Tue, Nov 28, 2023 at 05:57:31PM +0100, David Hildenbrand wrote:
> On 27.11.23 13:09, Alexandru Elisei wrote:
> > Hi,
> > 
> > Thank you so much for your comments, they are genuinely useful.
> > 
> > On Fri, Nov 24, 2023 at 08:35:47PM +0100, David Hildenbrand wrote:
> > > On 19.11.23 17:56, Alexandru Elisei wrote:
> > > > Introduce arch_prep_new_page(), which will be used by arm64 to reserve tag
> > > > storage for an allocated page. Reserving tag storage can fail, for example,
> > > > if the tag storage page has a short pin on it, so allow prep_new_page() ->
> > > > arch_prep_new_page() to similarly fail.
> > > 
> > > But what are the side-effects of this? How does the calling code recover?
> > > 
> > > E.g., what if we need to populate a page into user space, but that
> > > particular page we allocated fails to be prepared? So we inject a signal
> > > into that poor process?
> > 
> > When the page fails to be prepared, it is put back to the tail of the
> > freelist with __free_one_page(.., FPI_TO_TAIL). If all the allocation paths
> > are exhausted and no page has been found for which tag storage has been
> > reserved, then that's treated like an OOM situation.
> > 
> > I have been thinking about this, and I think I can simplify the code by
> > making tag reservation a best effort approach. The page can be allocated
> > even if reserving tag storage fails, but the page is marked as invalid in
> > set_pte_at() (PAGE_NONE + an extra bit to tell arm64 that it needs tag
> > storage) and next time it is accessed, arm64 will reserve tag storage in
> > the fault handling code (the mechanism for that is implemented in patch #19
> > of the series, "mm: mprotect: Introduce PAGE_FAULT_ON_ACCESS for
> > mprotect(PROT_MTE)").
> > 
> > With this new approach, prep_new_page() stays the way it is, and no further
> > changes are required for the page allocator, as there are already arch
> > callbacks that can be used for that, for example tag_clear_highpage() and
> > arch_alloc_page(). The downside is extra page faults, which might impact
> > performance.
> > 
> > What do you think?
> 
> That sounds a lot more robust, compared to intermittent failures to allocate
> pages.

Great, thank you for the feedback, I will use this approach for the next
iteration of the series.

Thanks,
Alex

> 
> -- 
> Cheers,
> 
> David / dhildenb
>

Patch

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index af7639c3b0a3..b31f53e9ab1d 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -873,6 +873,13 @@  static inline void arch_do_swap_page(struct mm_struct *mm,
 }
 #endif
 
+#ifndef __HAVE_ARCH_PREP_NEW_PAGE
+static inline int arch_prep_new_page(struct page *page, int order, gfp_t gfp)
+{
+	return 0;
+}
+#endif
+
 #ifndef __HAVE_ARCH_UNMAP_ONE
 /*
  * Some architectures support metadata associated with a page. When a
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 770e585b77c8..b2782b778e78 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1538,9 +1538,15 @@  inline void post_alloc_hook(struct page *page, unsigned int order,
 	page_table_check_alloc(page, order);
 }
 
-static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
+static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
 							unsigned int alloc_flags)
 {
+	int ret;
+
+	ret = arch_prep_new_page(page, order, gfp_flags);
+	if (unlikely(ret))
+		return ret;
+
 	post_alloc_hook(page, order, gfp_flags);
 
 	if (order && (gfp_flags & __GFP_COMP))
@@ -1556,6 +1562,8 @@  static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
 		set_page_pfmemalloc(page);
 	else
 		clear_page_pfmemalloc(page);
+
+	return 0;
 }
 
 /*
@@ -3163,6 +3171,24 @@  static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
 	return alloc_flags;
 }
 
+#ifdef __HAVE_ARCH_PREP_NEW_PAGE
+static void return_page_to_buddy(struct page *page, int order)
+{
+	unsigned long pfn = page_to_pfn(page);
+	int migratetype = get_pfnblock_migratetype(page, pfn);
+	struct zone *zone = page_zone(page);
+	unsigned long flags;
+
+	spin_lock_irqsave(&zone->lock, flags);
+	__free_one_page(page, pfn, zone, order, migratetype, FPI_TO_TAIL);
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+#else
+static void return_page_to_buddy(struct page *page, int order)
+{
+}
+#endif
+
 /*
  * get_page_from_freelist goes through the zonelist trying to allocate
  * a page.
@@ -3309,7 +3335,10 @@  get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 		page = rmqueue(ac->preferred_zoneref->zone, zone, order,
 				gfp_mask, alloc_flags, ac->migratetype);
 		if (page) {
-			prep_new_page(page, order, gfp_mask, alloc_flags);
+			if (prep_new_page(page, order, gfp_mask, alloc_flags)) {
+				return_page_to_buddy(page, order);
+				goto no_page;
+			}
 
 			/*
 			 * If this is a high-order atomic allocation then check
@@ -3319,20 +3348,20 @@  get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 				reserve_highatomic_pageblock(page, zone);
 
 			return page;
-		} else {
-			if (has_unaccepted_memory()) {
-				if (try_to_accept_memory(zone, order))
-					goto try_this_zone;
-			}
+		}
+no_page:
+		if (has_unaccepted_memory()) {
+			if (try_to_accept_memory(zone, order))
+				goto try_this_zone;
+		}
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-			/* Try again if zone has deferred pages */
-			if (deferred_pages_enabled()) {
-				if (_deferred_grow_zone(zone, order))
-					goto try_this_zone;
-			}
-#endif
+		/* Try again if zone has deferred pages */
+		if (deferred_pages_enabled()) {
+			if (_deferred_grow_zone(zone, order))
+				goto try_this_zone;
 		}
+#endif
 	}
 
 	/*
@@ -3538,8 +3567,12 @@  __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	count_vm_event(COMPACTSTALL);
 
 	/* Prep a captured page if available */
-	if (page)
-		prep_new_page(page, order, gfp_mask, alloc_flags);
+	if (page) {
+		if (prep_new_page(page, order, gfp_mask, alloc_flags)) {
+			return_page_to_buddy(page, order);
+			page = NULL;
+		}
+	}
 
 	/* Try get a page from the freelist if available */
 	if (!page)
@@ -4490,9 +4523,18 @@  unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 			}
 			break;
 		}
+
+		if (prep_new_page(page, 0, gfp, 0)) {
+			pcp_spin_unlock(pcp);
+			pcp_trylock_finish(UP_flags);
+			return_page_to_buddy(page, 0);
+			if (!nr_account)
+				goto failed;
+			else
+				goto out_statistics;
+		}
 		nr_account++;
 
-		prep_new_page(page, 0, gfp, 0);
 		if (page_list)
 			list_add(&page->lru, page_list);
 		else
@@ -4503,6 +4545,7 @@  unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	pcp_spin_unlock(pcp);
 	pcp_trylock_finish(UP_flags);
 
+out_statistics:
 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);