diff mbox series

fork, vmalloc: KASAN-poison backing pages of vmapped stacks

Message ID 20230117163543.1049025-1-jannh@google.com (mailing list archive)
State New
Headers show
Series fork, vmalloc: KASAN-poison backing pages of vmapped stacks | expand

Commit Message

Jann Horn Jan. 17, 2023, 4:35 p.m. UTC
KASAN (except in HW_TAGS mode) tracks memory state based on virtual
addresses. The mappings of kernel stack pages in the linear mapping are
currently marked as fully accessible.
Since stack corruption issues can cause some very gnarly errors, let's be
extra careful and tell KASAN to forbid accesses to stack memory through the
linear mapping.

Signed-off-by: Jann Horn <jannh@google.com>
---
I wrote this after seeing
https://lore.kernel.org/all/Y8W5rjKdZ9erIF14@casper.infradead.org/
and wondering about possible ways that this kind of stack corruption
could be sneaking past KASAN.
That's proooobably not the explanation, but still...

 include/linux/vmalloc.h |  6 ++++++
 kernel/fork.c           | 10 ++++++++++
 mm/vmalloc.c            | 24 ++++++++++++++++++++++++
 3 files changed, 40 insertions(+)


base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4

Comments

Dmitry Vyukov Jan. 18, 2023, 7:36 a.m. UTC | #1
On Tue, 17 Jan 2023 at 17:35, Jann Horn <jannh@google.com> wrote:
>
> KASAN (except in HW_TAGS mode) tracks memory state based on virtual
> addresses. The mappings of kernel stack pages in the linear mapping are
> currently marked as fully accessible.

Hi Jann,

To confirm my understanding, this is not just KASAN (except in HW_TAGS
mode), but also CONFIG_VMAP_STACK is required, right?

> Since stack corruption issues can cause some very gnarly errors, let's be
> extra careful and tell KASAN to forbid accesses to stack memory through the
> linear mapping.
>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> I wrote this after seeing
> https://lore.kernel.org/all/Y8W5rjKdZ9erIF14@casper.infradead.org/
> and wondering about possible ways that this kind of stack corruption
> could be sneaking past KASAN.
> That's proooobably not the explanation, but still...

I think catching any silent corruptions is still very useful. Besides
confusing reports, sometimes they lead to an explosion of random
reports all over the kernel.

>  include/linux/vmalloc.h |  6 ++++++
>  kernel/fork.c           | 10 ++++++++++
>  mm/vmalloc.c            | 24 ++++++++++++++++++++++++
>  3 files changed, 40 insertions(+)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 096d48aa3437..bfb50178e5e3 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -297,4 +297,10 @@ bool vmalloc_dump_obj(void *object);
>  static inline bool vmalloc_dump_obj(void *object) { return false; }
>  #endif
>
> +#if defined(CONFIG_MMU) && (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS))
> +void vmalloc_poison_backing_pages(const void *addr);
> +#else
> +static inline void vmalloc_poison_backing_pages(const void *addr) {}
> +#endif

I think this should be in kasan headers and prefixed with kasan_.
There are also kmsan/kcsan that may poison memory and hw poisoning
(MADV_HWPOISON), so it's a somewhat overloaded term on its own.

Can/should this be extended to all vmalloc-ed memory? Or some of it
can be accessed via both addresses?

Also, should we mprotect it instead while it's allocated as the stack?
If it works, it looks like a reasonable improvement for
CONFIG_VMAP_STACK in general. Would also catch non-instrumented
accesses.

>  #endif /* _LINUX_VMALLOC_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 9f7fe3541897..5c8c103a3597 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -321,6 +321,16 @@ static int alloc_thread_stack_node(struct task_struct *tsk, int node)
>                 vfree(stack);
>                 return -ENOMEM;
>         }
> +
> +       /*
> +        * A virtually-allocated stack's memory should only be accessed through
> +        * the vmalloc area, not through the linear mapping.
> +        * Inform KASAN that all accesses through the linear mapping should be
> +        * reported (instead of permitting all accesses through the linear
> +        * mapping).
> +        */
> +       vmalloc_poison_backing_pages(stack);
> +
>         /*
>          * We can't call find_vm_area() in interrupt context, and
>          * free_thread_stack() can be called in interrupt context,
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ca71de7c9d77..10c79c53cf5c 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4042,6 +4042,30 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  }
>  #endif /* CONFIG_SMP */
>
> +#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
> +/*
> + * Poison the KASAN shadow for the linear mapping of the pages used as stack
> + * memory.
> + * NOTE: This makes no sense in HW_TAGS mode because HW_TAGS marks physical
> + * memory, not virtual memory.
> + */
> +void vmalloc_poison_backing_pages(const void *addr)
> +{
> +       struct vm_struct *area;
> +       int i;
> +
> +       if (WARN(!PAGE_ALIGNED(addr), "bad address (%p)\n", addr))
> +               return;
> +
> +       area = find_vm_area(addr);
> +       if (WARN(!area, "nonexistent vm area (%p)\n", addr))
> +               return;
> +
> +       for (i = 0; i < area->nr_pages; i++)
> +               kasan_poison_pages(area->pages[i], 0, false);
> +}
> +#endif
> +
>  #ifdef CONFIG_PRINTK
>  bool vmalloc_dump_obj(void *object)
>  {
>
> base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4
> --
> 2.39.0.314.g84b9a713c41-goog
>
Andrey Konovalov Jan. 23, 2023, 4:45 p.m. UTC | #2
On Tue, Jan 17, 2023 at 5:35 PM Jann Horn <jannh@google.com> wrote:
>
> KASAN (except in HW_TAGS mode) tracks memory state based on virtual
> addresses. The mappings of kernel stack pages in the linear mapping are
> currently marked as fully accessible.
> Since stack corruption issues can cause some very gnarly errors, let's be
> extra careful and tell KASAN to forbid accesses to stack memory through the
> linear mapping.
>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> I wrote this after seeing
> https://lore.kernel.org/all/Y8W5rjKdZ9erIF14@casper.infradead.org/
> and wondering about possible ways that this kind of stack corruption
> could be sneaking past KASAN.
> That's proooobably not the explanation, but still...

Hi Jann,

if you decide to keep KASAN poisoning after addressing Dmitry's
comments, please add a KASAN KUnit test for this.

Thank you!
Jann Horn Jan. 25, 2023, 9:27 a.m. UTC | #3
On Wed, Jan 18, 2023 at 8:36 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, 17 Jan 2023 at 17:35, Jann Horn <jannh@google.com> wrote:
> >
> > KASAN (except in HW_TAGS mode) tracks memory state based on virtual
> > addresses. The mappings of kernel stack pages in the linear mapping are
> > currently marked as fully accessible.
>
> Hi Jann,
>
> To confirm my understanding, this is not just KASAN (except in HW_TAGS
> mode), but also CONFIG_VMAP_STACK is required, right?

Yes.

> > Since stack corruption issues can cause some very gnarly errors, let's be
> > extra careful and tell KASAN to forbid accesses to stack memory through the
> > linear mapping.
> >
> > Signed-off-by: Jann Horn <jannh@google.com>
> > ---
> > I wrote this after seeing
> > https://lore.kernel.org/all/Y8W5rjKdZ9erIF14@casper.infradead.org/
> > and wondering about possible ways that this kind of stack corruption
> > could be sneaking past KASAN.
> > That's proooobably not the explanation, but still...
>
> I think catching any silent corruptions is still very useful. Besides
> confusing reports, sometimes they lead to an explosion of random
> reports all over the kernel.
>
> >  include/linux/vmalloc.h |  6 ++++++
> >  kernel/fork.c           | 10 ++++++++++
> >  mm/vmalloc.c            | 24 ++++++++++++++++++++++++
> >  3 files changed, 40 insertions(+)
> >
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index 096d48aa3437..bfb50178e5e3 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -297,4 +297,10 @@ bool vmalloc_dump_obj(void *object);
> >  static inline bool vmalloc_dump_obj(void *object) { return false; }
> >  #endif
> >
> > +#if defined(CONFIG_MMU) && (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS))
> > +void vmalloc_poison_backing_pages(const void *addr);
> > +#else
> > +static inline void vmalloc_poison_backing_pages(const void *addr) {}
> > +#endif
>
> I think this should be in kasan headers and prefixed with kasan_.
> There are also kmsan/kcsan that may poison memory and hw poisoning
> (MADV_HWPOISON), so it's a somewhat overloaded term on its own.
>
> Can/should this be extended to all vmalloc-ed memory? Or some of it
> can be accessed via both addresses?

I think anything that does vmalloc_to_page() has a high chance of
doing accesses via both addresses, in particular anything involving
DMA.

Oooh, actually, there is some CIFS code that does vmalloc_to_page()
and talks about stack memory... I'll report that over on the other
thread re CIFS weirdness.

> Also, should we mprotect it instead while it's allocated as the stack?
> If it works, it looks like a reasonable improvement for
> CONFIG_VMAP_STACK in general. Would also catch non-instrumented
> accesses.

Well, we could also put it under CONFIG_DEBUG_PAGEALLOC and then use
the debug_pagealloc_map_pages() / debug_pagealloc_unmap_pages()
facilities to remove the page table entries. But I don't know if
anyone actually runs fuzzing with CONFIG_DEBUG_PAGEALLOC.
Jann Horn Jan. 25, 2023, 9:30 a.m. UTC | #4
On Wed, Jan 25, 2023 at 10:27 AM Jann Horn <jannh@google.com> wrote:
> Oooh, actually, there is some CIFS code that does vmalloc_to_page()
> and talks about stack memory... I'll report that over on the other
> thread re CIFS weirdness.

Ah, no, nevermind. The corruptions were in ntfs3, not cifs...
Jann Horn Jan. 25, 2023, 9:49 a.m. UTC | #5
On Tue, Jan 17, 2023 at 5:35 PM Jann Horn <jannh@google.com> wrote:
> KASAN (except in HW_TAGS mode) tracks memory state based on virtual
> addresses. The mappings of kernel stack pages in the linear mapping are
> currently marked as fully accessible.
> Since stack corruption issues can cause some very gnarly errors, let's be
> extra careful and tell KASAN to forbid accesses to stack memory through the
> linear mapping.
>
> Signed-off-by: Jann Horn <jannh@google.com>

@akpm please remove this one from your tree for now, it's unlikely to
work at least for now because there's code like cifs_sg_set_buf()
diff mbox series

Patch

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 096d48aa3437..bfb50178e5e3 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -297,4 +297,10 @@  bool vmalloc_dump_obj(void *object);
 static inline bool vmalloc_dump_obj(void *object) { return false; }
 #endif
 
+#if defined(CONFIG_MMU) && (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS))
+void vmalloc_poison_backing_pages(const void *addr);
+#else
+static inline void vmalloc_poison_backing_pages(const void *addr) {}
+#endif
+
 #endif /* _LINUX_VMALLOC_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 9f7fe3541897..5c8c103a3597 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -321,6 +321,16 @@  static int alloc_thread_stack_node(struct task_struct *tsk, int node)
 		vfree(stack);
 		return -ENOMEM;
 	}
+
+	/*
+	 * A virtually-allocated stack's memory should only be accessed through
+	 * the vmalloc area, not through the linear mapping.
+	 * Inform KASAN that all accesses through the linear mapping should be
+	 * reported (instead of permitting all accesses through the linear
+	 * mapping).
+	 */
+	vmalloc_poison_backing_pages(stack);
+
 	/*
 	 * We can't call find_vm_area() in interrupt context, and
 	 * free_thread_stack() can be called in interrupt context,
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ca71de7c9d77..10c79c53cf5c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4042,6 +4042,30 @@  void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 }
 #endif	/* CONFIG_SMP */
 
+#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
+/*
+ * Poison the KASAN shadow for the linear mapping of the pages used as stack
+ * memory.
+ * NOTE: This makes no sense in HW_TAGS mode because HW_TAGS marks physical
+ * memory, not virtual memory.
+ */
+void vmalloc_poison_backing_pages(const void *addr)
+{
+	struct vm_struct *area;
+	int i;
+
+	if (WARN(!PAGE_ALIGNED(addr), "bad address (%p)\n", addr))
+		return;
+
+	area = find_vm_area(addr);
+	if (WARN(!area, "nonexistent vm area (%p)\n", addr))
+		return;
+
+	for (i = 0; i < area->nr_pages; i++)
+		kasan_poison_pages(area->pages[i], 0, false);
+}
+#endif
+
 #ifdef CONFIG_PRINTK
 bool vmalloc_dump_obj(void *object)
 {