[RFC,10/47] mm: asi: Support for global non-sensitive direct map allocations

Message ID 20220223052223.1202152-11-junaids@google.com (mailing list archive)
State New
Series Address Space Isolation for KVM

Commit Message

Junaid Shahid Feb. 23, 2022, 5:21 a.m. UTC
A new GFP flag is added to specify that an allocation should be
considered globally non-sensitive. The pages will be mapped into the
ASI global non-sensitive pseudo-PGD, which is shared between all
standard ASI instances. A new page flag is also added so that when
these pages are freed, they can also be unmapped from the ASI page
tables.
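
As an illustration (this is not part of the patch itself, which only adds the
flag plumbing), a caller that wants its allocation mapped into the ASI global
non-sensitive address space would do something like:

unsigned int order = 0;
struct page *page;

/* Request a globally non-sensitive allocation; __GFP_ZERO is implied. */
page = alloc_pages(GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, order);
if (page) {
        /*
         * The page is now mapped into the ASI global non-sensitive
         * pseudo-PGD and tagged PG_global_nonsensitive, so freeing it
         * also unmaps it from the ASI page tables.
         */
        __free_pages(page, order);
}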

Signed-off-by: Junaid Shahid <junaids@google.com>


---
 include/linux/gfp.h            |  10 ++-
 include/linux/mm_types.h       |   5 ++
 include/linux/page-flags.h     |   9 ++
 include/trace/events/mmflags.h |  12 ++-
 mm/page_alloc.c                | 145 ++++++++++++++++++++++++++++++++-
 tools/perf/builtin-kmem.c      |   1 +
 6 files changed, 178 insertions(+), 4 deletions(-)

Comments

Matthew Wilcox March 23, 2022, 9:06 p.m. UTC | #1
On Tue, Feb 22, 2022 at 09:21:46PM -0800, Junaid Shahid wrote:
> standard ASI instances. A new page flag is also added so that when
> these pages are freed, they can also be unmapped from the ASI page
> tables.

It's cute how you just throw this in as an aside.  Page flags are
in high demand and just adding them is not to be done lightly.  Is
there any other way of accomplishing what you want?

> @@ -542,6 +545,12 @@ TESTCLEARFLAG(Young, young, PF_ANY)
>  PAGEFLAG(Idle, idle, PF_ANY)
>  #endif
>  
> +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
> +__PAGEFLAG(GlobalNonSensitive, global_nonsensitive, PF_ANY);

Why is PF_ANY appropriate?
Junaid Shahid March 23, 2022, 11:48 p.m. UTC | #2
Hi Matthew,

On 3/23/22 14:06, Matthew Wilcox wrote:
> On Tue, Feb 22, 2022 at 09:21:46PM -0800, Junaid Shahid wrote:
>> standard ASI instances. A new page flag is also added so that when
>> these pages are freed, they can also be unmapped from the ASI page
>> tables.
> 
> It's cute how you just throw this in as an aside.  Page flags are
> in high demand and just adding them is not to be done lightly.  Is
> there any other way of accomplishing what you want?
> 

I suppose we may be able to use page_ext instead. That certainly should be 
feasible for the PG_local_nonsensitive flag introduced in a later patch, 
although I am not completely sure about the PG_global_nonsensitive flag. That 
could get slightly tricky (though likely still possible to do) in case we need 
to allocate any non-sensitive memory before page_ext is initialized. One concern 
with using page_ext could be the extra memory usage on large machines.
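
For what it's worth, here is a rough sketch of what the page_ext-based variant
might look like (the bit name and helpers below are made up for illustration;
it also shows where the boot-ordering issue would come in, since
lookup_page_ext() has nothing to return before page_ext is initialized):

#include <linux/page_ext.h>

/* Hypothetical new entry in enum page_ext_flags: PAGE_EXT_GLOBAL_NONSENSITIVE */

static inline void set_page_global_nonsensitive(struct page *page)
{
        struct page_ext *page_ext = lookup_page_ext(page);

        /* NULL (or not yet populated) before page_ext init during boot */
        if (page_ext)
                __set_bit(PAGE_EXT_GLOBAL_NONSENSITIVE, &page_ext->flags);
}

static inline bool page_global_nonsensitive(struct page *page)
{
        struct page_ext *page_ext = lookup_page_ext(page);

        return page_ext &&
               test_bit(PAGE_EXT_GLOBAL_NONSENSITIVE, &page_ext->flags);
}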

BTW is page flag scarcity an issue on 64-bit systems as well, or only 32-bit 
systems? ASI is only supported on 64-bit systems (at least currently).

>> @@ -542,6 +545,12 @@ TESTCLEARFLAG(Young, young, PF_ANY)
>>   PAGEFLAG(Idle, idle, PF_ANY)
>>   #endif
>>   
>> +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
>> +__PAGEFLAG(GlobalNonSensitive, global_nonsensitive, PF_ANY);
> 
> Why is PF_ANY appropriate?
> 

I think we actually can use PF_HEAD here. That would make the alloc_pages path 
faster too. I'll change to that. (Initially I had gone with PF_ANY because in 
theory, you could allocate a compound page and then break it later and free the 
individual pages, but I don't know whether that actually happens apart from THP, 
and it certainly doesn't look like the case for any of the allocations that we 
have marked as non-sensitive.)
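
i.e., in the next version the declaration in page-flags.h would presumably just
become:

#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
/* PF_HEAD: accesses through tail pages get redirected to the head page */
__PAGEFLAG(GlobalNonSensitive, global_nonsensitive, PF_HEAD);
#else
__PAGEFLAG_FALSE(GlobalNonSensitive, global_nonsensitive);
#endif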

Thanks,
Junaid
Junaid Shahid March 24, 2022, 1:54 a.m. UTC | #3
On 3/23/22 16:48, Junaid Shahid wrote:
> Hi Matthew,
> 
> On 3/23/22 14:06, Matthew Wilcox wrote:
>> On Tue, Feb 22, 2022 at 09:21:46PM -0800, Junaid Shahid wrote:
>>> standard ASI instances. A new page flag is also added so that when
>>> these pages are freed, they can also be unmapped from the ASI page
>>> tables.
>>
>> It's cute how you just throw this in as an aside.  Page flags are
>> in high demand and just adding them is not to be done lightly.  Is
>> there any other way of accomplishing what you want?
>>
> 
> I suppose we may be able to use page_ext instead. That certainly should be 
> feasible for the PG_local_nonsensitive flag introduced in a later patch, 
> although I am not completely sure about the PG_global_nonsensitive flag. That 
> could get slightly tricky (though likely still possible to do) in case we need 
> to allocate any non-sensitive memory before page_ext is initialized. One concern 
> with using page_ext could be the extra memory usage on large machines.
> 
> BTW is page flag scarcity an issue on 64-bit systems as well, or only 32-bit 
> systems? ASI is only supported on 64-bit systems (at least currently).
> 

One other thing that we could do to remove the need for the 
PG_global_nonsensitive flag altogether (though not the PG_local_nonsensitive 
flag) would be to always try to unmap pages from the asi_global_nonsensitive_pgd 
in free_pages(). Basically, that would mean adding a page table walk to every 
free_pages() call, rather than only to the non-sensitive ones. Do you think that may 
be a better trade-off in order to avoid the flag?
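
Roughly something like the below (asi_unmap_if_mapped() doesn't exist today; the
idea is that it would walk asi_global_nonsensitive_pgd for the range, bail out
quickly if nothing is mapped there, and return false when the actual free has to
be deferred for a TLB flush, the way asi_unmap_freed_pages() does now):

static inline void free_the_page(struct page *page, unsigned int order)
{
        /* Unconditional walk of the ASI global non-sensitive page tables */
        if (asi_unmap_if_mapped(ASI_GLOBAL_NONSENSITIVE, page_to_virt(page),
                                PAGE_SIZE << order))
                __free_the_page(page, order);
}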

Thanks,
Junaid

Patch

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 8fcc38467af6..07a99a463a34 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -60,6 +60,11 @@  struct vm_area_struct;
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+#define ___GFP_GLOBAL_NONSENSITIVE 0x4000000u
+#else
+#define ___GFP_GLOBAL_NONSENSITIVE 0
+#endif
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -248,8 +253,11 @@  struct vm_area_struct;
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
+/* Allocate non-sensitive memory */
+#define __GFP_GLOBAL_NONSENSITIVE ((__force gfp_t)___GFP_GLOBAL_NONSENSITIVE)
+
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (25 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT 27
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /**
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3de1afa57289..5b8028fcfe67 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -191,6 +191,11 @@  struct page {
 
 		/** @rcu_head: You can use this to free a page by RCU. */
 		struct rcu_head rcu_head;
+
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+		/* Links the pages_to_free_async list */
+		struct llist_node async_free_node;
+#endif
 	};
 
 	union {		/* This union is 4 bytes in size. */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index b90a17e9796d..a07434cc679c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -140,6 +140,9 @@  enum pageflags {
 #endif
 #ifdef CONFIG_KASAN_HW_TAGS
 	PG_skip_kasan_poison,
+#endif
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+	PG_global_nonsensitive,
 #endif
 	__NR_PAGEFLAGS,
 
@@ -542,6 +545,12 @@  TESTCLEARFLAG(Young, young, PF_ANY)
 PAGEFLAG(Idle, idle, PF_ANY)
 #endif
 
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+__PAGEFLAG(GlobalNonSensitive, global_nonsensitive, PF_ANY);
+#else
+__PAGEFLAG_FALSE(GlobalNonSensitive, global_nonsensitive);
+#endif
+
 #ifdef CONFIG_KASAN_HW_TAGS
 PAGEFLAG(SkipKASanPoison, skip_kasan_poison, PF_HEAD)
 #else
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 116ed4d5d0f8..73a49197ef54 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -50,7 +50,8 @@ 
 	{(unsigned long)__GFP_DIRECT_RECLAIM,	"__GFP_DIRECT_RECLAIM"},\
 	{(unsigned long)__GFP_KSWAPD_RECLAIM,	"__GFP_KSWAPD_RECLAIM"},\
 	{(unsigned long)__GFP_ZEROTAGS,		"__GFP_ZEROTAGS"},	\
-	{(unsigned long)__GFP_SKIP_KASAN_POISON,"__GFP_SKIP_KASAN_POISON"}\
+	{(unsigned long)__GFP_SKIP_KASAN_POISON,"__GFP_SKIP_KASAN_POISON"},\
+	{(unsigned long)__GFP_GLOBAL_NONSENSITIVE, "__GFP_GLOBAL_NONSENSITIVE"}\
 
 #define show_gfp_flags(flags)						\
 	(flags) ? __print_flags(flags, "|",				\
@@ -93,6 +94,12 @@ 
 #define IF_HAVE_PG_SKIP_KASAN_POISON(flag,string)
 #endif
 
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+#define IF_HAVE_ASI(flag, string) ,{1UL << flag, string}
+#else
+#define IF_HAVE_ASI(flag, string)
+#endif
+
 #define __def_pageflag_names						\
 	{1UL << PG_locked,		"locked"	},		\
 	{1UL << PG_waiters,		"waiters"	},		\
@@ -121,7 +128,8 @@  IF_HAVE_PG_HWPOISON(PG_hwpoison,	"hwpoison"	)		\
 IF_HAVE_PG_IDLE(PG_young,		"young"		)		\
 IF_HAVE_PG_IDLE(PG_idle,		"idle"		)		\
 IF_HAVE_PG_ARCH_2(PG_arch_2,		"arch_2"	)		\
-IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison")
+IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison")	\
+IF_HAVE_ASI(PG_global_nonsensitive,	"global_nonsensitive")
 
 #define show_page_flags(flags)						\
 	(flags) ? __print_flags(flags, "|",				\
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..a4048fa1868a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -697,7 +697,7 @@  static inline bool pcp_allowed_order(unsigned int order)
 	return false;
 }
 
-static inline void free_the_page(struct page *page, unsigned int order)
+static inline void __free_the_page(struct page *page, unsigned int order)
 {
 	if (pcp_allowed_order(order))		/* Via pcp? */
 		free_unref_page(page, order);
@@ -705,6 +705,14 @@  static inline void free_the_page(struct page *page, unsigned int order)
 		__free_pages_ok(page, order, FPI_NONE);
 }
 
+static bool asi_unmap_freed_pages(struct page *page, unsigned int order);
+
+static inline void free_the_page(struct page *page, unsigned int order)
+{
+	if (asi_unmap_freed_pages(page, order))
+		__free_the_page(page, order);
+}
+
 /*
  * Higher-order pages are called "compound pages".  They are structured thusly:
  *
@@ -5162,6 +5170,129 @@  static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 	return true;
 }
 
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+
+static DEFINE_PER_CPU(struct work_struct, async_free_work);
+static DEFINE_PER_CPU(struct llist_head, pages_to_free_async);
+static bool async_free_work_initialized;
+
+static void __free_the_page(struct page *page, unsigned int order);
+
+static void async_free_work_fn(struct work_struct *work)
+{
+	struct page *page, *tmp;
+	struct llist_node *pages_to_free;
+	void *va;
+	size_t len;
+	uint order;
+
+	pages_to_free = llist_del_all(this_cpu_ptr(&pages_to_free_async));
+
+	/* A later patch will do a more optimized TLB flush. */
+
+	llist_for_each_entry_safe(page, tmp, pages_to_free, async_free_node) {
+		va = page_to_virt(page);
+		order = page->private;
+		len = PAGE_SIZE * (1 << order);
+
+		asi_flush_tlb_range(ASI_GLOBAL_NONSENSITIVE, va, len);
+		__free_the_page(page, order);
+	}
+}
+
+static int __init asi_page_alloc_init(void)
+{
+	int cpu;
+
+	if (!static_asi_enabled())
+		return 0;
+
+	for_each_possible_cpu(cpu)
+		INIT_WORK(per_cpu_ptr(&async_free_work, cpu),
+			  async_free_work_fn);
+
+	/*
+	 * This function is called before SMP is initialized, so we can assume
+	 * that this is the only running CPU at this point.
+	 */
+
+	barrier();
+	async_free_work_initialized = true;
+	barrier();
+
+	if (!llist_empty(this_cpu_ptr(&pages_to_free_async)))
+		queue_work_on(smp_processor_id(), mm_percpu_wq,
+			      this_cpu_ptr(&async_free_work));
+
+	return 0;
+}
+early_initcall(asi_page_alloc_init);
+
+static int asi_map_alloced_pages(struct page *page, uint order, gfp_t gfp_mask)
+{
+	uint i;
+
+	if (!static_asi_enabled())
+		return 0;
+
+	if (gfp_mask & __GFP_GLOBAL_NONSENSITIVE) {
+		for (i = 0; i < (1 << order); i++)
+			__SetPageGlobalNonSensitive(page + i);
+
+		return asi_map_gfp(ASI_GLOBAL_NONSENSITIVE, page_to_virt(page),
+				   PAGE_SIZE * (1 << order), gfp_mask);
+	}
+
+	return 0;
+}
+
+static bool asi_unmap_freed_pages(struct page *page, unsigned int order)
+{
+	void *va;
+	size_t len;
+	bool async_flush_needed;
+
+	if (!static_asi_enabled())
+		return true;
+
+	if (!PageGlobalNonSensitive(page))
+		return true;
+
+	va = page_to_virt(page);
+	len = PAGE_SIZE * (1 << order);
+	async_flush_needed = irqs_disabled() || in_interrupt();
+
+	asi_unmap(ASI_GLOBAL_NONSENSITIVE, va, len, !async_flush_needed);
+
+	if (!async_flush_needed)
+		return true;
+
+	page->private = order;
+	llist_add(&page->async_free_node, this_cpu_ptr(&pages_to_free_async));
+
+	if (async_free_work_initialized)
+		queue_work_on(smp_processor_id(), mm_percpu_wq,
+			      this_cpu_ptr(&async_free_work));
+
+	return false;
+}
+
+#else /* CONFIG_ADDRESS_SPACE_ISOLATION */
+
+static inline
+int asi_map_alloced_pages(struct page *pages, uint order, gfp_t gfp_mask)
+{
+	return 0;
+}
+
+static inline
+bool asi_unmap_freed_pages(struct page *page, unsigned int order)
+{
+	return true;
+}
+
+#endif
+
 /*
  * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or array
  * @gfp: GFP flags for the allocation
@@ -5345,6 +5476,9 @@  struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 		return NULL;
 	}
 
+	if (static_asi_enabled() && (gfp & __GFP_GLOBAL_NONSENSITIVE))
+		gfp |= __GFP_ZERO;
+
 	gfp &= gfp_allowed_mask;
 	/*
 	 * Apply scoped allocation constraints. This is mainly about GFP_NOFS
@@ -5388,6 +5522,15 @@  struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 		page = NULL;
 	}
 
+	if (page) {
+		int err = asi_map_alloced_pages(page, order, gfp);
+
+		if (unlikely(err)) {
+			__free_pages(page, order);
+			page = NULL;
+		}
+	}
+
 	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
 
 	return page;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index da03a341c63c..5857953cd5c1 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -660,6 +660,7 @@  static const struct {
 	{ "__GFP_RECLAIM",		"R" },
 	{ "__GFP_DIRECT_RECLAIM",	"DR" },
 	{ "__GFP_KSWAPD_RECLAIM",	"KR" },
+	{ "__GFP_GLOBAL_NONSENSITIVE",	"GNS" },
 };
 
 static size_t max_gfp_len;