
kasan: allow sampling page_alloc allocations for HW_TAGS

Message ID c124467c401e9d44dd35a36fdae1c48e4e505e9e.1666901317.git.andreyknvl@google.com (mailing list archive)
State New
Series kasan: allow sampling page_alloc allocations for HW_TAGS

Commit Message

andrey.konovalov@linux.dev Oct. 27, 2022, 8:10 p.m. UTC
From: Andrey Konovalov <andreyknvl@google.com>

Add a new boot parameter called kasan.page_alloc.sample, which makes
Hardware Tag-Based KASAN tag only every Nth page_alloc allocation.

As Hardware Tag-Based KASAN is intended to be used in production, keeping
its performance impact low is crucial. Because page_alloc allocations tend
to be big, tagging and checking all such allocations introduces a
significant slowdown in some testing scenarios. The new parameter makes it
possible to alleviate that slowdown.

Enabling page_alloc sampling has a downside: KASAN will miss bad accesses
to a page_alloc allocation that has not been tagged.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 Documentation/dev-tools/kasan.rst |  4 +++
 include/linux/kasan.h             |  7 ++---
 mm/kasan/common.c                 |  9 +++++--
 mm/kasan/hw_tags.c                | 26 +++++++++++++++++++
 mm/kasan/kasan.h                  | 15 +++++++++++
 mm/page_alloc.c                   | 43 +++++++++++++++++++++----------
 6 files changed, 85 insertions(+), 19 deletions(-)
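
As an illustration (not part of the patch): with this parameter, a device
could boot with a hypothetical command line such as

	kasan=on kasan.mode=sync kasan.page_alloc.sample=100

so that only 1 in every 100 page_alloc allocations is tagged and checked,
trading detection coverage on the untagged pages for lower overhead. The
value 100 is made up for the example; kasan=on and kasan.mode are the
existing HW_TAGS boot parameters.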

Comments

Andrew Morton Oct. 27, 2022, 8:44 p.m. UTC | #1
On Thu, 27 Oct 2022 22:10:09 +0200 andrey.konovalov@linux.dev wrote:

> From: Andrey Konovalov <andreyknvl@google.com>
> 
> Add a new boot parameter called kasan.page_alloc.sample, which makes
> Hardware Tag-Based KASAN tag only every Nth page_alloc allocation.
> 
> As Hardware Tag-Based KASAN is intended to be used in production, keeping
> its performance impact low is crucial. Because page_alloc allocations tend
> to be big, tagging and checking all such allocations introduces a
> significant slowdown in some testing scenarios. The new parameter makes it
> possible to alleviate that slowdown.
> 
> Enabling page_alloc sampling has a downside: KASAN will miss bad accesses
> to a page_alloc allocation that has not been tagged.
> 

The Documentation:

> --- a/Documentation/dev-tools/kasan.rst
> +++ b/Documentation/dev-tools/kasan.rst
> @@ -140,6 +140,10 @@ disabling KASAN altogether or controlling its features:
>  - ``kasan.vmalloc=off`` or ``=on`` disables or enables tagging of vmalloc
>    allocations (default: ``on``).
>  
> +- ``kasan.page_alloc.sample=<sampling frequency>`` makes KASAN tag only
> +  every Nth page_alloc allocation, where N is the value of the parameter
> +  (default: ``1``).
> +

explains what this does but not why it does it.

Let's tell people that this is here to mitigate the performance overhead.

And how is this performance impact observed?  The kernel just gets
overall slower?

If someone gets a KASAN report using this mitigation, should their next
step be to set kasan.page_alloc.sample back to 1 and rerun, in order to
get a more accurate report before reporting it upstream?  I'm thinking
"no"?

Finally, it would be helpful if the changelog were to give us some
sense of the magnitude of the impact with kasan.page_alloc.sample=1. 
Does the kernel get 3x slower?  50x?
Andrey Konovalov Oct. 27, 2022, 8:49 p.m. UTC | #2
On Thu, Oct 27, 2022 at 10:44 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Thu, 27 Oct 2022 22:10:09 +0200 andrey.konovalov@linux.dev wrote:
>
> > From: Andrey Konovalov <andreyknvl@google.com>
> >
> > Add a new boot parameter called kasan.page_alloc.sample, which makes
> > Hardware Tag-Based KASAN tag only every Nth page_alloc allocation.
> >
> > As Hardware Tag-Based KASAN is intended to be used in production, keeping
> > its performance impact low is crucial. Because page_alloc allocations tend
> > to be big, tagging and checking all such allocations introduces a
> > significant slowdown in some testing scenarios. The new parameter makes it
> > possible to alleviate that slowdown.
> >
> > Enabling page_alloc sampling has a downside: KASAN will miss bad accesses
> > to a page_alloc allocation that has not been tagged.
> >
>
> The Documentation:
>
> > --- a/Documentation/dev-tools/kasan.rst
> > +++ b/Documentation/dev-tools/kasan.rst
> > @@ -140,6 +140,10 @@ disabling KASAN altogether or controlling its features:
> >  - ``kasan.vmalloc=off`` or ``=on`` disables or enables tagging of vmalloc
> >    allocations (default: ``on``).
> >
> > +- ``kasan.page_alloc.sample=<sampling frequency>`` makes KASAN tag only
> > +  every Nth page_alloc allocation, where N is the value of the parameter
> > +  (default: ``1``).
> > +
>
> explains what this does but not why it does it.
>
> Let's tell people that this is here to mitigate the performance overhead.
>
> And how is this performance impact observed?  The kernel just gets
> overall slower?
>
> If someone gets a KASAN report using this mitigation, should their next
> step be to set kasan.page_alloc.sample back to 1 and rerun, in order to
> get a more accurate report before reporting it upstream?  I'm thinking
> "no"?
>
> Finally, it would be helpful if the changelog were to give us some
> sense of the magnitude of the impact with kasan.page_alloc.sample=1.
> Does the kernel get 3x slower?  50x?

Hi Andrew,

I will add explanations for all these points in v2.

Thank you!
Marco Elver Oct. 30, 2022, 2:59 a.m. UTC | #3
On Thu, Oct 27, 2022 at 10:10PM +0200, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> 
> Add a new boot parameter called kasan.page_alloc.sample, which makes
> Hardware Tag-Based KASAN tag only every Nth page_alloc allocation.
> 
> As Hardware Tag-Based KASAN is intended to be used in production, keeping
> its performance impact low is crucial. Because page_alloc allocations tend
> to be big, tagging and checking all such allocations introduces a
> significant slowdown in some testing scenarios. The new parameter makes it
> possible to alleviate that slowdown.
> 
> Enabling page_alloc sampling has a downside: KASAN will miss bad accesses
> to a page_alloc allocation that has not been tagged.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> ---
>  Documentation/dev-tools/kasan.rst |  4 +++
>  include/linux/kasan.h             |  7 ++---
>  mm/kasan/common.c                 |  9 +++++--
>  mm/kasan/hw_tags.c                | 26 +++++++++++++++++++
>  mm/kasan/kasan.h                  | 15 +++++++++++
>  mm/page_alloc.c                   | 43 +++++++++++++++++++++----------
>  6 files changed, 85 insertions(+), 19 deletions(-)
> 
> diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
> index 5c93ab915049..bd97301845ef 100644
> --- a/Documentation/dev-tools/kasan.rst
> +++ b/Documentation/dev-tools/kasan.rst
> @@ -140,6 +140,10 @@ disabling KASAN altogether or controlling its features:
>  - ``kasan.vmalloc=off`` or ``=on`` disables or enables tagging of vmalloc
>    allocations (default: ``on``).
>  
> +- ``kasan.page_alloc.sample=<sampling frequency>`` makes KASAN tag only

Frequency is the number of samples per frame (unit time, or, when used
non-temporally as here, population size).

[1] https://en.wikipedia.org/wiki/Systematic_sampling

You're using it as an interval, so I'd just replace uses of frequency
with "interval" appropriately here and elsewhere.

> +  every Nth page_alloc allocation, where N is the value of the parameter
> +  (default: ``1``).
> +
>  Error reports
>  ~~~~~~~~~~~~~
>  
> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> index d811b3d7d2a1..d45d45dfd007 100644
> --- a/include/linux/kasan.h
> +++ b/include/linux/kasan.h
> @@ -120,12 +120,13 @@ static __always_inline void kasan_poison_pages(struct page *page,
>  		__kasan_poison_pages(page, order, init);
>  }
>  
> -void __kasan_unpoison_pages(struct page *page, unsigned int order, bool init);
> -static __always_inline void kasan_unpoison_pages(struct page *page,
> +bool __kasan_unpoison_pages(struct page *page, unsigned int order, bool init);
> +static __always_inline bool kasan_unpoison_pages(struct page *page,
>  						 unsigned int order, bool init)
>  {
>  	if (kasan_enabled())
> -		__kasan_unpoison_pages(page, order, init);
> +		return __kasan_unpoison_pages(page, order, init);
> +	return false;
>  }
>  
>  void __kasan_cache_create_kmalloc(struct kmem_cache *cache);
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index 833bf2cfd2a3..1f30080a7a4c 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -95,19 +95,24 @@ asmlinkage void kasan_unpoison_task_stack_below(const void *watermark)
>  }
>  #endif /* CONFIG_KASAN_STACK */
>  
> -void __kasan_unpoison_pages(struct page *page, unsigned int order, bool init)
> +bool __kasan_unpoison_pages(struct page *page, unsigned int order, bool init)
>  {
>  	u8 tag;
>  	unsigned long i;
>  
>  	if (unlikely(PageHighMem(page)))
> -		return;
> +		return false;
> +
> +	if (!kasan_sample_page_alloc())
> +		return false;
>  
>  	tag = kasan_random_tag();
>  	kasan_unpoison(set_tag(page_address(page), tag),
>  		       PAGE_SIZE << order, init);
>  	for (i = 0; i < (1 << order); i++)
>  		page_kasan_tag_set(page + i, tag);
> +
> +	return true;
>  }
>  
>  void __kasan_poison_pages(struct page *page, unsigned int order, bool init)
> diff --git a/mm/kasan/hw_tags.c b/mm/kasan/hw_tags.c
> index b22c4f461cb0..aa3b5a080297 100644
> --- a/mm/kasan/hw_tags.c
> +++ b/mm/kasan/hw_tags.c
> @@ -59,6 +59,11 @@ EXPORT_SYMBOL_GPL(kasan_mode);
>  /* Whether to enable vmalloc tagging. */
>  DEFINE_STATIC_KEY_TRUE(kasan_flag_vmalloc);
>  
> +/* Frequency of page_alloc allocation poisoning. */
> +unsigned long kasan_page_alloc_sample = 1;
> +
> +DEFINE_PER_CPU(unsigned long, kasan_page_alloc_count);
> +
>  /* kasan=off/on */
>  static int __init early_kasan_flag(char *arg)
>  {
> @@ -122,6 +127,27 @@ static inline const char *kasan_mode_info(void)
>  		return "sync";
>  }
>  
> +/* kasan.page_alloc.sample=<sampling frequency> */
> +static int __init early_kasan_flag_page_alloc_sample(char *arg)
> +{
> +	int rv;
> +
> +	if (!arg)
> +		return -EINVAL;
> +
> +	rv = kstrtoul(arg, 0, &kasan_page_alloc_sample);
> +	if (rv)
> +		return rv;
> +
> +	if (!kasan_page_alloc_sample) {
> +		kasan_page_alloc_sample = 1;
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +early_param("kasan.page_alloc.sample", early_kasan_flag_page_alloc_sample);
> +
>  /*
>   * kasan_init_hw_tags_cpu() is called for each CPU.
>   * Not marked as __init as a CPU can be hot-plugged after boot.
> diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
> index abbcc1b0eec5..ee67eb35f4a7 100644
> --- a/mm/kasan/kasan.h
> +++ b/mm/kasan/kasan.h
> @@ -42,6 +42,9 @@ enum kasan_mode {
>  
>  extern enum kasan_mode kasan_mode __ro_after_init;
>  
> +extern unsigned long kasan_page_alloc_sample;
> +DECLARE_PER_CPU(unsigned long, kasan_page_alloc_count);
> +
>  static inline bool kasan_vmalloc_enabled(void)
>  {
>  	return static_branch_likely(&kasan_flag_vmalloc);
> @@ -57,6 +60,13 @@ static inline bool kasan_sync_fault_possible(void)
>  	return kasan_mode == KASAN_MODE_SYNC || kasan_mode == KASAN_MODE_ASYMM;
>  }
>  
> +static inline bool kasan_sample_page_alloc(void)
> +{
> +	unsigned long *count = this_cpu_ptr(&kasan_page_alloc_count);

this_cpu_inc_return()

without it, you need to ensure preemption is disabled around here.

> +
> +	return (*count)++ % kasan_page_alloc_sample == 0;

Doing '%' is a potentially costly operation if called in a fast-path.

We can generate better code with (rename 'count' -> 'skip'):

	long skip_next = this_cpu_dec_return(kasan_page_alloc_skip);

	if (skip_next < 0) {
		this_cpu_write(kasan_page_alloc_skip, kasan_page_alloc_sample - 1);
		return true;
	}

	return false;

It's also important to switch the counter to a 'long'; otherwise you'd
have to pre-initialize all of the per-CPU counters to something non-zero
to avoid wrap.
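
To make the suggested scheme concrete, a minimal stand-alone C sketch of
the decrement-based interval sampling (a single-threaded stand-in for
illustration; the kernel version would use this_cpu_dec_return() and
this_cpu_write() on the per-CPU counter, and the names below are made up
for the sketch):

	#include <stdbool.h>
	#include <stdio.h>

	static long page_alloc_sample = 4;	/* interval: tag every 4th allocation */
	static long page_alloc_skip;		/* 0 initially, so the first call samples */

	static bool sample_page_alloc(void)
	{
		/*
		 * One decrement and compare per call; no '%' in the fast path.
		 * The counter must be a signed 'long': an unsigned counter would
		 * wrap instead of going negative.
		 */
		if (--page_alloc_skip < 0) {
			page_alloc_skip = page_alloc_sample - 1;
			return true;
		}
		return false;
	}

	int main(void)
	{
		for (int i = 1; i <= 8; i++)
			printf("allocation %d: %s\n", i,
			       sample_page_alloc() ? "tagged" : "skipped");
		return 0;	/* allocations 1 and 5 print "tagged" */
	}

Here 'page_alloc_sample' is an interval in the sense discussed above: a
value of 4 means a sampling frequency of 1/4, i.e. 1 in every 4
allocations gets tagged.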
Andrey Konovalov Nov. 26, 2022, 7:13 p.m. UTC | #4
On Sun, Oct 30, 2022 at 3:59 AM Marco Elver <elver@google.com> wrote:
>
> > +- ``kasan.page_alloc.sample=<sampling frequency>`` makes KASAN tag only
>
> Frequency is the number of samples per frame (unit time, or, when used
> non-temporally as here, population size).
>
> [1] https://en.wikipedia.org/wiki/Systematic_sampling
>
> You're using it as an interval, so I'd just replace uses of frequency
> with "interval" appropriately here and elsewhere.

Done in v2.

> > +static inline bool kasan_sample_page_alloc(void)
> > +{
> > +     unsigned long *count = this_cpu_ptr(&kasan_page_alloc_count);
>
> this_cpu_inc_return()
>
> without it, you need to ensure preemption is disabled around here.
>
> > +
> > +     return (*count)++ % kasan_page_alloc_sample == 0;
>
> Doing '%' is a potentially costly operation if called in a fast-path.
>
> We can generate better code with (rename 'count' -> 'skip'):
>
>         long skip_next = this_cpu_dec_return(kasan_page_alloc_skip);
>
>         if (skip_next < 0) {
>                 this_cpu_write(kasan_page_alloc_skip, kasan_page_alloc_sample - 1);
>                 return true;
>         }
>
>         return false;

Done in v2.

Thank you, Marco!

Patch

diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index 5c93ab915049..bd97301845ef 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -140,6 +140,10 @@ disabling KASAN altogether or controlling its features:
 - ``kasan.vmalloc=off`` or ``=on`` disables or enables tagging of vmalloc
   allocations (default: ``on``).
 
+- ``kasan.page_alloc.sample=<sampling frequency>`` makes KASAN tag only
+  every Nth page_alloc allocation, where N is the value of the parameter
+  (default: ``1``).
+
 Error reports
 ~~~~~~~~~~~~~
 
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index d811b3d7d2a1..d45d45dfd007 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -120,12 +120,13 @@ static __always_inline void kasan_poison_pages(struct page *page,
 		__kasan_poison_pages(page, order, init);
 }
 
-void __kasan_unpoison_pages(struct page *page, unsigned int order, bool init);
-static __always_inline void kasan_unpoison_pages(struct page *page,
+bool __kasan_unpoison_pages(struct page *page, unsigned int order, bool init);
+static __always_inline bool kasan_unpoison_pages(struct page *page,
 						 unsigned int order, bool init)
 {
 	if (kasan_enabled())
-		__kasan_unpoison_pages(page, order, init);
+		return __kasan_unpoison_pages(page, order, init);
+	return false;
 }
 
 void __kasan_cache_create_kmalloc(struct kmem_cache *cache);
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 833bf2cfd2a3..1f30080a7a4c 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -95,19 +95,24 @@ asmlinkage void kasan_unpoison_task_stack_below(const void *watermark)
 }
 #endif /* CONFIG_KASAN_STACK */
 
-void __kasan_unpoison_pages(struct page *page, unsigned int order, bool init)
+bool __kasan_unpoison_pages(struct page *page, unsigned int order, bool init)
 {
 	u8 tag;
 	unsigned long i;
 
 	if (unlikely(PageHighMem(page)))
-		return;
+		return false;
+
+	if (!kasan_sample_page_alloc())
+		return false;
 
 	tag = kasan_random_tag();
 	kasan_unpoison(set_tag(page_address(page), tag),
 		       PAGE_SIZE << order, init);
 	for (i = 0; i < (1 << order); i++)
 		page_kasan_tag_set(page + i, tag);
+
+	return true;
 }
 
 void __kasan_poison_pages(struct page *page, unsigned int order, bool init)
diff --git a/mm/kasan/hw_tags.c b/mm/kasan/hw_tags.c
index b22c4f461cb0..aa3b5a080297 100644
--- a/mm/kasan/hw_tags.c
+++ b/mm/kasan/hw_tags.c
@@ -59,6 +59,11 @@ EXPORT_SYMBOL_GPL(kasan_mode);
 /* Whether to enable vmalloc tagging. */
 DEFINE_STATIC_KEY_TRUE(kasan_flag_vmalloc);
 
+/* Frequency of page_alloc allocation poisoning. */
+unsigned long kasan_page_alloc_sample = 1;
+
+DEFINE_PER_CPU(unsigned long, kasan_page_alloc_count);
+
 /* kasan=off/on */
 static int __init early_kasan_flag(char *arg)
 {
@@ -122,6 +127,27 @@ static inline const char *kasan_mode_info(void)
 		return "sync";
 }
 
+/* kasan.page_alloc.sample=<sampling frequency> */
+static int __init early_kasan_flag_page_alloc_sample(char *arg)
+{
+	int rv;
+
+	if (!arg)
+		return -EINVAL;
+
+	rv = kstrtoul(arg, 0, &kasan_page_alloc_sample);
+	if (rv)
+		return rv;
+
+	if (!kasan_page_alloc_sample) {
+		kasan_page_alloc_sample = 1;
+		return -EINVAL;
+	}
+
+	return 0;
+}
+early_param("kasan.page_alloc.sample", early_kasan_flag_page_alloc_sample);
+
 /*
  * kasan_init_hw_tags_cpu() is called for each CPU.
  * Not marked as __init as a CPU can be hot-plugged after boot.
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index abbcc1b0eec5..ee67eb35f4a7 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -42,6 +42,9 @@ enum kasan_mode {
 
 extern enum kasan_mode kasan_mode __ro_after_init;
 
+extern unsigned long kasan_page_alloc_sample;
+DECLARE_PER_CPU(unsigned long, kasan_page_alloc_count);
+
 static inline bool kasan_vmalloc_enabled(void)
 {
 	return static_branch_likely(&kasan_flag_vmalloc);
@@ -57,6 +60,13 @@ static inline bool kasan_sync_fault_possible(void)
 	return kasan_mode == KASAN_MODE_SYNC || kasan_mode == KASAN_MODE_ASYMM;
 }
 
+static inline bool kasan_sample_page_alloc(void)
+{
+	unsigned long *count = this_cpu_ptr(&kasan_page_alloc_count);
+
+	return (*count)++ % kasan_page_alloc_sample == 0;
+}
+
 #else /* CONFIG_KASAN_HW_TAGS */
 
 static inline bool kasan_async_fault_possible(void)
@@ -69,6 +79,11 @@ static inline bool kasan_sync_fault_possible(void)
 	return true;
 }
 
+static inline bool kasan_sample_page_alloc(void)
+{
+	return true;
+}
+
 #endif /* CONFIG_KASAN_HW_TAGS */
 
 #ifdef CONFIG_KASAN_GENERIC
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b5a6c815ae28..0b36456aedfb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1366,6 +1366,8 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
  *    see the comment next to it.
  * 3. Skipping poisoning is requested via __GFP_SKIP_KASAN_POISON,
  *    see the comment next to it.
+ * 4. The allocation is excluded from being checked due to sampling,
+ *    see the call to kasan_unpoison_pages.
  *
  * Poisoning pages during deferred memory init will greatly lengthen the
  * process and cause problem in large memory systems as the deferred pages
@@ -2475,7 +2477,8 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 {
 	bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
 			!should_skip_init(gfp_flags);
-	bool init_tags = init && (gfp_flags & __GFP_ZEROTAGS);
+	bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS);
+	bool reset_tags = !zero_tags;
 	int i;
 
 	set_page_private(page, 0);
@@ -2498,30 +2501,42 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 	 */
 
 	/*
-	 * If memory tags should be zeroed (which happens only when memory
-	 * should be initialized as well).
+	 * If memory tags should be zeroed
+	 * (which happens only when memory should be initialized as well).
 	 */
-	if (init_tags) {
+	if (zero_tags) {
 		/* Initialize both memory and tags. */
 		for (i = 0; i != 1 << order; ++i)
 			tag_clear_highpage(page + i);
 
-		/* Note that memory is already initialized by the loop above. */
+		/* Take note that memory was initialized by the loop above. */
 		init = false;
 	}
 	if (!should_skip_kasan_unpoison(gfp_flags)) {
-		/* Unpoison shadow memory or set memory tags. */
-		kasan_unpoison_pages(page, order, init);
-
-		/* Note that memory is already initialized by KASAN. */
-		if (kasan_has_integrated_init())
-			init = false;
-	} else {
-		/* Ensure page_address() dereferencing does not fault. */
+		/* Try unpoisoning (or setting tags) and initializing memory. */
+		if (kasan_unpoison_pages(page, order, init)) {
+			/* Take note that memory was initialized by KASAN. */
+			if (kasan_has_integrated_init())
+				init = false;
+			/* Take note that memory tags were set by KASAN. */
+			reset_tags = false;
+		} else {
+			/*
+			 * KASAN decided to exclude this allocation from being
+			 * poisoned due to sampling. Skip poisoning as well.
+			 */
+			SetPageSkipKASanPoison(page);
+		}
+	}
+	/*
+	 * If memory tags have not been set, reset the page tags to ensure
+	 * page_address() dereferencing does not fault.
+	 */
+	if (reset_tags) {
 		for (i = 0; i != 1 << order; ++i)
 			page_kasan_tag_reset(page + i);
 	}
-	/* If memory is still not initialized, do it now. */
+	/* If memory is still not initialized, initialize it now. */
 	if (init)
 		kernel_init_pages(page, 1 << order);
 	/* Propagate __GFP_SKIP_KASAN_POISON to page flags. */
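
Taken together, the post_alloc_hook() changes keep one invariant worth
spelling out: a page that sampling excluded from tagging at allocation
time must also skip KASAN poisoning on free (hence SetPageSkipKASanPoison)
and must have its page tags reset so that page_address() dereferencing
cannot fault. A condensed sketch of that control flow, simplified from the
hunk above rather than literal kernel code:

	bool reset_tags = !zero_tags;

	if (!should_skip_kasan_unpoison(gfp_flags)) {
		if (kasan_unpoison_pages(page, order, init)) {
			/* KASAN set memory tags for this allocation. */
			reset_tags = false;
		} else {
			/* Sampled out: tell the free path to skip poisoning too. */
			SetPageSkipKASanPoison(page);
		}
	}
	if (reset_tags) {
		/* No tags were set: reset them so page_address() is safe. */
		for (i = 0; i != 1 << order; ++i)
			page_kasan_tag_reset(page + i);
	}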