[v6,2/9] x86, kfence: enable KFENCE for x86

Message ID 20201029131649.182037-3-elver@google.com (mailing list archive)
State New, archived
Series KFENCE: A low-overhead sampling-based memory safety error detector

Commit Message

Marco Elver Oct. 29, 2020, 1:16 p.m. UTC
From: Alexander Potapenko <glider@google.com>

Add architecture specific implementation details for KFENCE and enable
KFENCE for the x86 architecture. In particular, this implements the
required interface in <asm/kfence.h> for setting up the pool and
providing helper functions for protecting and unprotecting pages.

For x86, we need to ensure that the pool uses 4K pages, which is done
using the set_memory_4k() helper function.

Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Marco Elver <elver@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
---
v5:
* MAJOR CHANGE: Switch to the memblock_alloc'd pool. In benchmarks
  with the newly optimized is_kfence_address(), no difference between
  baseline and KFENCE is observed.
* Suggested by Jann Horn:
  * Move x86 kfence_handle_page_fault before oops handling.
  * WARN_ON in kfence_protect_page if non-4K pages.
  * Better comments for x86 kfence_protect_page.

v4:
* Define __kfence_pool_attrs.
---
 arch/x86/Kconfig              |  1 +
 arch/x86/include/asm/kfence.h | 65 +++++++++++++++++++++++++++++++++++
 arch/x86/mm/fault.c           |  4 +++
 3 files changed, 70 insertions(+)
 create mode 100644 arch/x86/include/asm/kfence.h
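
As a rough illustration of why the pool has to be mapped with 4K pages: the
patch protects a page by clearing the Present bit in the single page-table
entry that maps it, so the entry must cover exactly one 4K page. The sketch
below is a userspace model only. All sketch_* names are hypothetical
stand-ins, not kernel APIs, and the 2M figure is the standard x86-64 large
page size, used here as an assumption about how the pool might otherwise be
mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of an x86 page-table entry is the Present bit. */
#define SKETCH_PAGE_PRESENT ((uint64_t)1 << 0)
#define SKETCH_PAGE_SIZE    4096UL

/* Hypothetical stand-in for a page-table entry; not the kernel's pte_t. */
typedef struct {
	uint64_t val;
} sketch_pte_t;

/* Models the set_pte() logic in the patch: only the Present bit changes. */
static void sketch_protect(sketch_pte_t *pte, bool protect)
{
	if (protect)
		pte->val &= ~SKETCH_PAGE_PRESENT;
	else
		pte->val |= SKETCH_PAGE_PRESENT;
}

/* True if an access through this entry would not fault on Present. */
static bool sketch_present(const sketch_pte_t *pte)
{
	return (pte->val & SKETCH_PAGE_PRESENT) != 0;
}

/*
 * Number of 4K pages that become inaccessible when one entry mapping
 * `mapping_size` bytes loses its Present bit.
 */
static unsigned long sketch_pages_affected(unsigned long mapping_size)
{
	return mapping_size / SKETCH_PAGE_SIZE;
}
```

With a 4K mapping, sketch_pages_affected(4096) is 1, so a single object page
or guard page can be toggled on its own. With a 2M mapping the same
operation would cover 512 pages at once, which is why the patch splits the
pool with set_memory_4k() first.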

Comments

Jann Horn Oct. 30, 2020, 2:49 a.m. UTC | #1
On Thu, Oct 29, 2020 at 2:17 PM Marco Elver <elver@google.com> wrote:
> Add architecture specific implementation details for KFENCE and enable
> KFENCE for the x86 architecture. In particular, this implements the
> required interface in <asm/kfence.h> for setting up the pool and
> providing helper functions for protecting and unprotecting pages.
>
> For x86, we need to ensure that the pool uses 4K pages, which is done
> using the set_memory_4k() helper function.
>
> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
> Co-developed-by: Marco Elver <elver@google.com>
> Signed-off-by: Marco Elver <elver@google.com>
> Signed-off-by: Alexander Potapenko <glider@google.com>
[...]
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
[...]
> @@ -725,6 +726,9 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>         if (IS_ENABLED(CONFIG_EFI))
>                 efi_recover_from_page_fault(address);
>
> +       if (kfence_handle_page_fault(address))
> +               return;

We can also get to this point due to an attempt to execute a data
page. That's very unlikely (given that the same thing would also crash
if you tried to do it with normal heap memory, and KFENCE allocations
are extremely rare); but we might want to try to avoid handling such
faults as KFENCE faults, since KFENCE will assume that it has resolved
the fault and retry execution of the faulting instruction. Once kernel
protection keys are introduced, those might cause the same kind of
trouble.

So we might want to gate this on a check like "if ((error_code &
X86_PF_PROT) == 0)" (meaning "only handle the fault if the fault was
caused by no page being present", see enum x86_pf_error_code).
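
A minimal userspace sketch of the gating suggested above, for illustration
only. The X86_PF_* values are redefined locally to match enum
x86_pf_error_code; sketch_should_try_kfence() is a hypothetical helper, not
a function from the patch or the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the bits in enum x86_pf_error_code, for this sketch. */
enum {
	X86_PF_PROT  = 1 << 0,	/* 0: no page found, 1: protection fault */
	X86_PF_WRITE = 1 << 1,	/* 0: read access, 1: write access */
	X86_PF_USER  = 1 << 2,	/* 0: kernel-mode access, 1: user-mode */
	X86_PF_RSVD  = 1 << 3,	/* reserved bit set in a paging entry */
	X86_PF_INSTR = 1 << 4,	/* fault was an instruction fetch */
	X86_PF_PK    = 1 << 5,	/* protection-key violation */
};

/*
 * Hypothetical helper: only faults on a not-present page are candidates
 * for KFENCE, since KFENCE protects pages by clearing the Present bit.
 */
static bool sketch_should_try_kfence(unsigned long error_code)
{
	return (error_code & X86_PF_PROT) == 0;
}
```

Under this check, a read or write to a not-present page (error codes 0 and
X86_PF_WRITE) is still offered to KFENCE, while an attempt to execute a
present data page (X86_PF_PROT | X86_PF_INSTR) or a protection-key
violation (X86_PF_PROT | X86_PF_PK) falls through to the normal oops path.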


Unrelated sidenote: Since we're hooking after exception fixup
handling, the debug-only KFENCE_STRESS_TEST_FAULTS can probably still
cause some behavioral differences through spurious faults in places
like copy_user_enhanced_fast_string (where the exception table entries
are used even if the *kernel* pointer, not the user pointer, causes a
fault). But since KFENCE_STRESS_TEST_FAULTS is exclusively for KFENCE
development, the difference might not matter. And ordering them the
other way around definitely isn't possible, because the kernel relies
on being able to fixup OOB reads. So there probably isn't really
anything we can do better here; it's just something to keep in mind.
Maybe you can add a little warning to the help text for that Kconfig
entry that warns people about this?



> +
>  oops:
>         /*
>          * Oops. The kernel tried to access some bad page. We'll have to
> --
> 2.29.1.341.ge80a0c044ae-goog
>

Marco Elver Oct. 30, 2020, 1 p.m. UTC | #2
On Fri, 30 Oct 2020 at 03:49, Jann Horn <jannh@google.com> wrote:
> On Thu, Oct 29, 2020 at 2:17 PM Marco Elver <elver@google.com> wrote:
> > Add architecture specific implementation details for KFENCE and enable
> > KFENCE for the x86 architecture. In particular, this implements the
> > required interface in <asm/kfence.h> for setting up the pool and
> > providing helper functions for protecting and unprotecting pages.
> >
> > For x86, we need to ensure that the pool uses 4K pages, which is done
> > using the set_memory_4k() helper function.
> >
> > Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
> > Co-developed-by: Marco Elver <elver@google.com>
> > Signed-off-by: Marco Elver <elver@google.com>
> > Signed-off-by: Alexander Potapenko <glider@google.com>
> [...]
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> [...]
> > @@ -725,6 +726,9 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> >         if (IS_ENABLED(CONFIG_EFI))
> >                 efi_recover_from_page_fault(address);
> >
> > +       if (kfence_handle_page_fault(address))
> > +               return;
>
> We can also get to this point due to an attempt to execute a data
> page. That's very unlikely (given that the same thing would also crash
> if you tried to do it with normal heap memory, and KFENCE allocations
> are extremely rare); but we might want to try to avoid handling such
> faults as KFENCE faults, since KFENCE will assume that it has resolved
> the fault and retry execution of the faulting instruction. Once kernel
> protection keys are introduced, those might cause the same kind of
> trouble.
>
> So we might want to gate this on a check like "if ((error_code &
> X86_PF_PROT) == 0)" (meaning "only handle the fault if the fault was
> caused by no page being present", see enum x86_pf_error_code).

Good point. Will fix in v7.

> Unrelated sidenote: Since we're hooking after exception fixup
> handling, the debug-only KFENCE_STRESS_TEST_FAULTS can probably still
> cause some behavioral differences through spurious faults in places
> like copy_user_enhanced_fast_string (where the exception table entries
> are used even if the *kernel* pointer, not the user pointer, causes a
> fault). But since KFENCE_STRESS_TEST_FAULTS is exclusively for KFENCE
> development, the difference might not matter. And ordering them the
> other way around definitely isn't possible, because the kernel relies
> on being able to fixup OOB reads. So there probably isn't really
> anything we can do better here; it's just something to keep in mind.
> Maybe you can add a little warning to the help text for that Kconfig
> entry that warns people about this?

Thanks for pointing it out, but that option really is *only* to stress
kfence with concurrent allocations/frees/page faults. If anybody
enables this option for anything other than testing kfence, it's their
own fault. ;-)
I'll try to add a generic note to the Kconfig entry, but what you
mention here seems quite x86-specific.

Thanks,
-- Marco

Jann Horn Oct. 30, 2020, 3:22 p.m. UTC | #3
On Fri, Oct 30, 2020 at 2:00 PM Marco Elver <elver@google.com> wrote:
> On Fri, 30 Oct 2020 at 03:49, Jann Horn <jannh@google.com> wrote:
> > On Thu, Oct 29, 2020 at 2:17 PM Marco Elver <elver@google.com> wrote:
> > > Add architecture specific implementation details for KFENCE and enable
> > > KFENCE for the x86 architecture. In particular, this implements the
> > > required interface in <asm/kfence.h> for setting up the pool and
> > > providing helper functions for protecting and unprotecting pages.
> > >
> > > For x86, we need to ensure that the pool uses 4K pages, which is done
> > > using the set_memory_4k() helper function.
> > >
> > > Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
> > > Co-developed-by: Marco Elver <elver@google.com>
> > > Signed-off-by: Marco Elver <elver@google.com>
> > > Signed-off-by: Alexander Potapenko <glider@google.com>
> > [...]
> > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > [...]
> > > @@ -725,6 +726,9 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> > >         if (IS_ENABLED(CONFIG_EFI))
> > >                 efi_recover_from_page_fault(address);
> > >
> > > +       if (kfence_handle_page_fault(address))
> > > +               return;
[...]
> > Unrelated sidenote: Since we're hooking after exception fixup
> > handling, the debug-only KFENCE_STRESS_TEST_FAULTS can probably still
> > cause some behavioral differences through spurious faults in places
> > like copy_user_enhanced_fast_string (where the exception table entries
> > are used even if the *kernel* pointer, not the user pointer, causes a
> > fault). But since KFENCE_STRESS_TEST_FAULTS is exclusively for KFENCE
> > development, the difference might not matter. And ordering them the
> > other way around definitely isn't possible, because the kernel relies
> > on being able to fixup OOB reads. So there probably isn't really
> > anything we can do better here; it's just something to keep in mind.
> > Maybe you can add a little warning to the help text for that Kconfig
> > entry that warns people about this?
>
> Thanks for pointing it out, but that option really is *only* to stress
> kfence with concurrent allocations/frees/page faults. If anybody
> enables this option for anything other than testing kfence, it's their
> own fault. ;-)

Sounds fair. :P

> I'll try to add a generic note to the Kconfig entry, but what you
> mention here seems quite x86-specific.

(FWIW, I think it could currently also happen on arm64 in the rare
cases where KERNEL_DS is used. But luckily Christoph Hellwig has
already gotten rid of most places that did that.)

Patch

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f6946b81f74a..c9ec6b5ba358 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -144,6 +144,7 @@  config X86
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE
 	select HAVE_ARCH_KASAN			if X86_64
 	select HAVE_ARCH_KASAN_VMALLOC		if X86_64
+	select HAVE_ARCH_KFENCE
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_MMAP_RND_BITS		if MMU
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT
diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
new file mode 100644
index 000000000000..beeac105dae7
--- /dev/null
+++ b/arch/x86/include/asm/kfence.h
@@ -0,0 +1,65 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_X86_KFENCE_H
+#define _ASM_X86_KFENCE_H
+
+#include <linux/bug.h>
+#include <linux/kfence.h>
+
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <asm/set_memory.h>
+#include <asm/tlbflush.h>
+
+/*
+ * The page fault handler entry function, up to which the stack trace is
+ * truncated in reports.
+ */
+#define KFENCE_SKIP_ARCH_FAULT_HANDLER "asm_exc_page_fault"
+
+/* Force 4K pages for __kfence_pool. */
+static inline bool arch_kfence_init_pool(void)
+{
+	unsigned long addr;
+
+	for (addr = (unsigned long)__kfence_pool; is_kfence_address((void *)addr);
+	     addr += PAGE_SIZE) {
+		unsigned int level;
+
+		if (!lookup_address(addr, &level))
+			return false;
+
+		if (level != PG_LEVEL_4K)
+			set_memory_4k(addr, 1);
+	}
+
+	return true;
+}
+
+/* Protect the given page and flush TLB. */
+static inline bool kfence_protect_page(unsigned long addr, bool protect)
+{
+	unsigned int level;
+	pte_t *pte = lookup_address(addr, &level);
+
+	if (WARN_ON(!pte || level != PG_LEVEL_4K))
+		return false;
+
+	/*
+	 * We need to avoid IPIs, as we may get KFENCE allocations or faults
+	 * with interrupts disabled. Therefore, the below is best-effort, and
+	 * does not flush TLBs on all CPUs. We can tolerate some inaccuracy;
+	 * lazy fault handling takes care of faults after the page is PRESENT.
+	 */
+
+	if (protect)
+		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
+	else
+		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
+
+	/* Flush this CPU's TLB. */
+	flush_tlb_one_kernel(addr);
+	return true;
+}
+
+#endif /* _ASM_X86_KFENCE_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 82bf37a5c9ec..380638745f42 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -9,6 +9,7 @@ 
 #include <linux/kdebug.h>		/* oops_begin/end, ...		*/
 #include <linux/extable.h>		/* search_exception_tables	*/
 #include <linux/memblock.h>		/* max_low_pfn			*/
+#include <linux/kfence.h>		/* kfence_handle_page_fault	*/
 #include <linux/kprobes.h>		/* NOKPROBE_SYMBOL, ...		*/
 #include <linux/mmiotrace.h>		/* kmmio_handler, ...		*/
 #include <linux/perf_event.h>		/* perf_sw_event		*/
@@ -725,6 +726,9 @@  no_context(struct pt_regs *regs, unsigned long error_code,
 	if (IS_ENABLED(CONFIG_EFI))
 		efi_recover_from_page_fault(address);
 
+	if (kfence_handle_page_fault(address))
+		return;
+
 oops:
 	/*
 	 * Oops. The kernel tried to access some bad page. We'll have to