Message ID | 20200213004752.11019-5-dja@axtens.net (mailing list archive)
---|---
State | New, archived
Series | KASAN for powerpc64 radix
On 13/02/2020 at 01:47, Daniel Axtens wrote:
> KASAN support on Book3S is a bit tricky to get right:
>
> [snip - full commit message quoted; it appears with the patch below]

Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> # focussed mainly on Documentation and things impacting PPC32
On 13/02/2020 at 01:47, Daniel Axtens wrote:
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 497b7d0b2d7e..f1c54c08a88e 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -169,7 +169,9 @@ config PPC
>  	select HAVE_ARCH_HUGE_VMAP		if PPC_BOOK3S_64 && PPC_RADIX_MMU
>  	select HAVE_ARCH_JUMP_LABEL
>  	select HAVE_ARCH_KASAN			if PPC32
> +	select HAVE_ARCH_KASAN			if PPC_BOOK3S_64 && PPC_RADIX_MMU

That's probably a detail, but as it is necessary to describe the hardware
precisely when selecting this (I mean giving the exact amount of memory,
with restrictions such as the memory being one whole block), should it
also depend on EXPERT?

>  	select HAVE_ARCH_KASAN_VMALLOC		if PPC32
> +	select HAVE_ARCH_KASAN_VMALLOC		if PPC_BOOK3S_64 && PPC_RADIX_MMU

Maybe we could have:

-	select HAVE_ARCH_KASAN_VMALLOC		if PPC32
+	select HAVE_ARCH_KASAN_VMALLOC		if HAVE_ARCH_KASAN

>  	select HAVE_ARCH_KGDB
>  	select HAVE_ARCH_MMAP_RND_BITS
>  	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if COMPAT

Christophe
Daniel,

Can you start this commit message with a simple description of what you are
actually doing? This reads like you've been on a long journey to Mordor and
back, which, as a reader of this patch in the distant future, I don't care
about. I just want to know what you're implementing.

Also, I'm struggling to review this as I don't know what software or
hardware mechanisms you are using to perform the sanitisation.

Mikey

On Thu, 2020-02-13 at 11:47 +1100, Daniel Axtens wrote:
> KASAN support on Book3S is a bit tricky to get right:
>
> [snip - full commit message and patch quoted; see below]
On 17/02/2020 at 00:08, Michael Neuling wrote:
> Daniel,
>
> Can you start this commit message with a simple description of what you
> are actually doing? [snip]
>
> Also, I'm struggling to review this as I don't know what software or
> hardware mechanisms you are using to perform the sanitisation.

KASAN is standard: it is simply using GCC's ASAN in kernel mode, i.e. the
kernel is built with -fsanitize=kernel-address
(https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html).

You can find more details here:
https://www.kernel.org/doc/html/latest/dev-tools/kasan.html

Christophe
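For readers unfamiliar with the mechanism Christophe describes, below is a
rough sketch of the check that inline instrumentation compiles into each
one-byte memory access. It is illustrative only, not GCC's literal output:
the helper name check_byte_access is invented, and the example
KASAN_SHADOW_OFFSET value assumes the default 1024 MB configuration from
this patch; the shadow encoding and the __asan_* report entry point follow
generic KASAN.

/*
 * Illustrative sketch (not GCC's literal output) of the inline check
 * for a 1-byte access at 'addr'. Every 8 bytes of memory map to one
 * shadow byte at a fixed displacement, so the lookup is pure arithmetic
 * and needs no function call - which is why the offset must be a
 * compile-time constant.
 */
#define KASAN_SHADOW_SCALE_SHIFT 3
/* Example value, as computed by the Makefile below for a 1024 MB config. */
#define KASAN_SHADOW_OFFSET 0xa800000038000000UL

/* The kernel's generic KASAN reporting entry point. */
void __asan_report_load1_noabort(unsigned long addr);

static inline void check_byte_access(unsigned long addr) /* hypothetical name */
{
	signed char s = *(signed char *)((addr >> KASAN_SHADOW_SCALE_SHIFT)
					 + KASAN_SHADOW_OFFSET);

	/*
	 * s == 0: all 8 bytes backing this shadow byte are valid.
	 * 1 <= s <= 7: only the first s bytes are valid.
	 * s < 0: redzone, freed or otherwise poisoned memory.
	 */
	if (s != 0 && (signed char)(addr & 7) >= s)
		__asan_report_load1_noabort(addr);
}

Outline mode instead emits a call to __asan_load1(addr) for each access,
which is why it does not strictly need the offset baked in at compile time -
the trade-off discussed in the commit message.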
Christophe Leroy <christophe.leroy@c-s.fr> writes:

> On 13/02/2020 at 01:47, Daniel Axtens wrote:
>> @@ -169,7 +169,9 @@ config PPC
>>  	select HAVE_ARCH_KASAN			if PPC32
>> +	select HAVE_ARCH_KASAN			if PPC_BOOK3S_64 && PPC_RADIX_MMU
>
> That's probably a detail, but as it is necessary to describe the hardware
> precisely when selecting this (I mean giving the exact amount of memory,
> with restrictions such as the memory being one whole block), should it
> also depend on EXPERT?

If it weren't a debug feature I would definitely agree with you, but being
a debug feature I think it's not so necessary. Also, anything with more
memory than the config option specifies will still boot - it's just
machines with less memory that won't.

I've set the default to 1024 MB: I know that's a lot of memory for an
embedded system, but I think it's an OK default for anything with the Radix
MMU. I'm sure if mpe disagrees he can add EXPERT when he's merging :)

> Maybe we could have
>
> -	select HAVE_ARCH_KASAN_VMALLOC		if PPC32
> +	select HAVE_ARCH_KASAN_VMALLOC		if HAVE_ARCH_KASAN

That's a good idea. Done.

Thanks for the review!

Regards,
Daniel
diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index 012ef3d91d1f..5722de91ccce 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -22,8 +22,9 @@ global variables yet.
 Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later.
 
 Currently generic KASAN is supported for the x86_64, arm64, xtensa, s390 and
-riscv architectures. It is also supported on 32-bit powerpc kernels. Tag-based
-KASAN is supported only on arm64.
+riscv architectures. It is also supported on powerpc, for 32-bit kernels, and
+for 64-bit kernels running under the Radix MMU. Tag-based KASAN is supported
+only on arm64.
 
 Usage
 -----
@@ -257,8 +258,8 @@ CONFIG_KASAN_VMALLOC
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
 cost of greater memory usage. Currently this supported on x86, s390
-and 32-bit powerpc. It is optional, except on 32-bit powerpc kernels
-with module support, where it is required.
+and powerpc. It is optional, except on 64-bit powerpc kernels, and on
+32-bit powerpc kernels with module support, where it is required.
 
 This works by hooking into vmalloc and vmap, and dynamically
 allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
index 26bb0e8bb18c..bf645a5cd486 100644
--- a/Documentation/powerpc/kasan.txt
+++ b/Documentation/powerpc/kasan.txt
@@ -1,4 +1,4 @@
-KASAN is supported on powerpc on 32-bit only.
+KASAN is supported on powerpc on 32-bit and Radix 64-bit only.
 
 32 bit support
 ==============
@@ -10,3 +10,113 @@ fixmap area and occupies one eighth of the total kernel virtual memory space.
 
 Instrumentation of the vmalloc area is optional, unless built with modules,
 in which case it is required.
+
+64 bit support
+==============
+
+Currently, only the radix MMU is supported. There have been versions for Book3E
+processors floating around on the mailing list, but nothing has been merged.
+
+KASAN support on Book3S is a bit tricky to get right:
+
+ - It would be good to support inline instrumentation so as to be able to catch
+   stack issues that cannot be caught with outline mode.
+
+ - Inline instrumentation requires a fixed offset.
+
+ - Book3S runs code in real mode after booting. Most notably a lot of KVM runs
+   in real mode, and it would be good to be able to instrument it.
+
+ - Because code runs in real mode after boot, the offset has to point to
+   valid memory both in and out of real mode.
+
+One approach is just to give up on inline instrumentation. This way all checks
+can be delayed until after everything set is up correctly, and the
+address-to-shadow calculations can be overridden. However, the features and
+speed boost provided by inline instrumentation are worth trying to do better.
+
+If _at compile time_ it is known how much contiguous physical memory a system
+has, the top 1/8th of the first block of physical memory can be set aside for
+the shadow. This is a big hammer and comes with 3 big consequences:
+
+ - there's no nice way to handle physically discontiguous memory, so only the
+   first physical memory block can be used.
+
+ - kernels will simply fail to boot on machines with less memory than specified
+   when compiling.
+
+ - kernels running on machines with more memory than specified when compiling
+   will simply ignore the extra memory.
+
+At the moment, this physical memory limit must be set _even for outline mode_.
+This may be changed in a future version - a different implementation could be
+added for outline mode that dynamically allocates shadow at a fixed offset.
+For example, see https://patchwork.ozlabs.org/patch/795211/
+
+This value is configured in CONFIG_PHYS_MEM_SIZE_FOR_KASAN.
+
+Tips
+----
+
+ - Compile with CONFIG_RELOCATABLE.
+
+   In development, boot hangs were observed when building with ftrace and KUAP
+   on. These ended up being due to kernel bloat pushing prom_init calls to be
+   done via the PLT. Because the kernel was not relocatable, and the calls are
+   done very early, this caused execution to jump off into somewhere
+   invalid. Enabling relocation fixes this.
+
+NUMA/discontiguous physical memory
+----------------------------------
+
+Currently the code cannot really deal with discontiguous physical memory. Only
+physical memory that is contiguous from physical address zero can be used. The
+size of that memory, not total memory, must be specified when configuring the
+kernel.
+
+Discontiguous memory can occur on machines with memory spread across multiple
+nodes. For example, on a Talos II with 64GB of RAM:
+
+ - 32GB runs from 0x0 to 0x0000_0008_0000_0000,
+ - then there's a gap,
+ - then the final 32GB runs from 0x0000_2000_0000_0000 to 0x0000_2008_0000_0000
+
+This can create _significant_ issues:
+
+ - If the machine is treated as having 64GB of _contiguous_ RAM, the
+   instrumentation would assume that it ran from 0x0 to
+   0x0000_0010_0000_0000. The last 1/8th - 0x0000_000e_0000_0000 to
+   0x0000_0010_0000_0000 would be reserved as the shadow region. But when the
+   kernel tried to access any of that, it would be trying to access pages that
+   are not physically present.
+
+ - If the shadow region size is based on the top address, then the shadow
+   region would be 0x2008_0000_0000 / 8 = 0x0401_0000_0000 bytes = 4100 GB of
+   memory, clearly more than the 64GB of RAM physically present.
+
+Therefore, the code currently is restricted to dealing with memory in the node
+starting at 0x0. For this system, that's 32GB. If a contiguous physical memory
+size greater than the size of the first contiguous region of memory is
+specified, the system will be unable to boot or even print an error message.
+
+The layout of a system's memory can be observed in the messages that the Radix
+MMU prints on boot. The Talos II discussed earlier has:
+
+radix-mmu: Mapped 0x0000000000000000-0x0000000040000000 with 1.00 GiB pages (exec)
+radix-mmu: Mapped 0x0000000040000000-0x0000000800000000 with 1.00 GiB pages
+radix-mmu: Mapped 0x0000200000000000-0x0000200800000000 with 1.00 GiB pages
+
+As discussed, this system would be configured for 32768 MB.
+
+Another system prints:
+
+radix-mmu: Mapped 0x0000000000000000-0x0000000040000000 with 1.00 GiB pages (exec)
+radix-mmu: Mapped 0x0000000040000000-0x0000002000000000 with 1.00 GiB pages
+radix-mmu: Mapped 0x0000200000000000-0x0000202000000000 with 1.00 GiB pages
+
+This machine has more memory: 0x0000_0040_0000_0000 total, but only
+0x0000_0020_0000_0000 is physically contiguous from zero, so it would be
+configured for 131072 MB of physically contiguous memory.
+
+This restriction currently also affects outline mode, but this could be
+changed in future if an alternative outline implementation is added.
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 497b7d0b2d7e..f1c54c08a88e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -169,7 +169,9 @@ config PPC
 	select HAVE_ARCH_HUGE_VMAP		if PPC_BOOK3S_64 && PPC_RADIX_MMU
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_KASAN			if PPC32
+	select HAVE_ARCH_KASAN			if PPC_BOOK3S_64 && PPC_RADIX_MMU
 	select HAVE_ARCH_KASAN_VMALLOC		if PPC32
+	select HAVE_ARCH_KASAN_VMALLOC		if PPC_BOOK3S_64 && PPC_RADIX_MMU
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_MMAP_RND_BITS
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if COMPAT
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 0b063830eea8..faed301a3b10 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -394,7 +394,28 @@ config PPC_FAST_ENDIAN_SWITCH
 	help
 	  If you're unsure what this is, say N.
 
+config PHYS_MEM_SIZE_FOR_KASAN
+	int "Contiguous physical memory size for KASAN (MB)" if KASAN && PPC_BOOK3S_64
+	default 1024
+	help
+
+	  To get inline instrumentation support for KASAN on 64-bit Book3S
+	  machines, you need to know how much contiguous physical memory your
+	  system has. A shadow offset will be calculated based on this figure,
+	  which will be compiled in to the kernel. KASAN will use this offset
+	  to access its shadow region, which is used to verify memory accesses.
+
+	  If you attempt to boot on a system with less memory than you specify
+	  here, your system will fail to boot very early in the process. If you
+	  boot on a system with more memory than you specify, the extra memory
+	  will wasted - it will be reserved and not used.
+
+	  For systems with discontiguous blocks of physical memory, specify the
+	  size of the block starting at 0x0. You can determine this by looking
+	  at the memory layout info printed to dmesg by the radix MMU code
+	  early in boot. See Documentation/powerpc/kasan.txt.
+
 config KASAN_SHADOW_OFFSET
 	hex
-	depends on KASAN
+	depends on KASAN && PPC32
 	default 0xe0000000
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f35730548e42..eb47dc768c0a 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -230,6 +230,17 @@ ifdef CONFIG_476FPE_ERR46
 		-T $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
 endif
 
+ifdef CONFIG_PPC_BOOK3S_64
+# The KASAN shadow offset is such that linear map (0xc000...) is shadowed by
+# the last 8th of linearly mapped physical memory. This way, if the code uses
+# 0xc addresses throughout, accesses work both in in real mode (where the top
+# bits are ignored) and outside of real mode.
+#
+# 0xc000000000000000 >> 3 = 0xa800000000000000 = 12105675798371893248
+KASAN_SHADOW_OFFSET = $(shell echo 7 \* 1024 \* 1024 \* $(CONFIG_PHYS_MEM_SIZE_FOR_KASAN) / 8 + 12105675798371893248 | bc)
+KBUILD_CFLAGS += -DKASAN_SHADOW_OFFSET=$(KASAN_SHADOW_OFFSET)UL
+endif
+
 # No AltiVec or VSX instructions when building kernel
 KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
 KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index 2781ebf6add4..fce329b8452e 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -18,6 +18,10 @@
 #include <asm/book3s/64/hash-4k.h>
 #endif
 
+#define H_PTRS_PER_PTE		(1 << H_PTE_INDEX_SIZE)
+#define H_PTRS_PER_PMD		(1 << H_PMD_INDEX_SIZE)
+#define H_PTRS_PER_PUD		(1 << H_PUD_INDEX_SIZE)
+
 /* Bits to set in a PMD/PUD/PGD entry valid bit*/
 #define HASH_PMD_VAL_BITS		(0x8000000000000000UL)
 #define HASH_PUD_VAL_BITS		(0x8000000000000000UL)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 201a69e6a355..309fb925a96e 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -231,6 +231,13 @@ extern unsigned long __pmd_frag_size_shift;
 #define PTRS_PER_PUD	(1 << PUD_INDEX_SIZE)
 #define PTRS_PER_PGD	(1 << PGD_INDEX_SIZE)
 
+#define MAX_PTRS_PER_PTE	((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? \
+				  H_PTRS_PER_PTE : R_PTRS_PER_PTE)
+#define MAX_PTRS_PER_PMD	((H_PTRS_PER_PMD > R_PTRS_PER_PMD) ? \
+				  H_PTRS_PER_PMD : R_PTRS_PER_PMD)
+#define MAX_PTRS_PER_PUD	((H_PTRS_PER_PUD > R_PTRS_PER_PUD) ? \
+				  H_PTRS_PER_PUD : R_PTRS_PER_PUD)
+
 /* PMD_SHIFT determines what a second-level page table entry can map */
 #define PMD_SHIFT	(PAGE_SHIFT + PTE_INDEX_SIZE)
 #define PMD_SIZE	(1UL << PMD_SHIFT)
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index d97db3ad9aae..4f826259de71 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -35,6 +35,11 @@
 #define RADIX_PMD_SHIFT		(PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
 #define RADIX_PUD_SHIFT		(RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
 #define RADIX_PGD_SHIFT		(RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
+
+#define R_PTRS_PER_PTE		(1 << RADIX_PTE_INDEX_SIZE)
+#define R_PTRS_PER_PMD		(1 << RADIX_PMD_INDEX_SIZE)
+#define R_PTRS_PER_PUD		(1 << RADIX_PUD_INDEX_SIZE)
+
 /*
  * Size of EA range mapped by our pagetables.
  */
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index fbff9ff9032e..b21d3ef88214 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -21,11 +21,18 @@
 #define KASAN_SHADOW_START	(KASAN_SHADOW_OFFSET + \
 				 (PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))
 
+#ifdef CONFIG_KASAN_SHADOW_OFFSET
 #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
+#endif
 
+#ifdef CONFIG_PPC32
 #define KASAN_SHADOW_END	0UL
+#endif
 
-#define KASAN_SHADOW_SIZE	(KASAN_SHADOW_END - KASAN_SHADOW_START)
+#ifdef CONFIG_PPC_BOOK3S_64
+#define KASAN_SHADOW_END	(KASAN_SHADOW_OFFSET + \
+				 (RADIX_VMEMMAP_END >> KASAN_SHADOW_SCALE_SHIFT))
+#endif
 
 #ifdef CONFIG_KASAN
 void kasan_early_init(void);
@@ -38,5 +45,5 @@ static inline void kasan_mmu_init(void) { }
 static inline void kasan_late_init(void) { }
 #endif
 
-#endif /* __ASSEMBLY */
+#endif /* !__ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 6620f37abe73..2857c3d44e9c 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -72,6 +72,7 @@ unsigned long tce_alloc_start, tce_alloc_end;
 u64 ppc64_rma_size;
 #endif
 static phys_addr_t first_memblock_size;
+static phys_addr_t top_phys_addr;
 static int __initdata boot_cpu_count;
 
 static int __init early_parse_mem(char *p)
@@ -449,6 +450,26 @@ static bool validate_mem_limit(u64 base, u64 *size)
 {
 	u64 max_mem = 1UL << (MAX_PHYSMEM_BITS);
 
+	/*
+	 * To handle the NUMA/discontiguous memory case, don't allow a block
+	 * to be added if it falls completely beyond the configured physical
+	 * memory. Print an informational message.
+	 *
+	 * Frustratingly we also see this with qemu - it seems to split the
+	 * specified memory into a number of smaller blocks. If this happens
+	 * under qemu, it probably represents misconfiguration. So we want
+	 * the message to be noticeable, but not shouty.
+	 *
+	 * See Documentation/powerpc/kasan.txt
+	 */
+	if (IS_ENABLED(CONFIG_KASAN) &&
+	    (base >= ((u64)CONFIG_PHYS_MEM_SIZE_FOR_KASAN * SZ_1M))) {
+		pr_warn("KASAN: not adding memory block at %llx (size %llx)\n"
+			"This could be due to discontiguous memory or kernel misconfiguration.",
+			base, *size);
+		return false;
+	}
+
 	if (base >= max_mem)
 		return false;
 	if ((base + *size) > max_mem)
@@ -572,8 +593,10 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 
 	/* Add the chunk to the MEMBLOCK list */
 	if (add_mem_to_memblock) {
-		if (validate_mem_limit(base, &size))
+		if (validate_mem_limit(base, &size)) {
 			memblock_add(base, size);
+			top_phys_addr = max(top_phys_addr, (phys_addr_t)(base + size));
+		}
 	}
 }
 
@@ -613,6 +636,8 @@ static void __init early_reserve_mem_dt(void)
 static void __init early_reserve_mem(void)
 {
 	__be64 *reserve_map;
+	phys_addr_t kasan_shadow_start;
+	phys_addr_t kasan_memory_size;
 
 	reserve_map = (__be64 *)(((unsigned long)initial_boot_params) +
 			fdt_off_mem_rsvmap(initial_boot_params));
@@ -651,6 +676,40 @@ static void __init early_reserve_mem(void)
 		return;
 	}
 #endif
+
+	if (IS_ENABLED(CONFIG_KASAN) && IS_ENABLED(CONFIG_PPC_BOOK3S_64)) {
+		kasan_memory_size =
+			((phys_addr_t)CONFIG_PHYS_MEM_SIZE_FOR_KASAN * SZ_1M);
+
+		if (top_phys_addr < kasan_memory_size) {
+			/*
+			 * We are doomed. We shouldn't even be able to get this
+			 * far, but we do in qemu. If we continue and turn
+			 * relocations on, we'll take fatal page faults for
+			 * memory that's not physically present. Instead,
+			 * panic() here: it will be saved to __log_buf even if
+			 * it doesn't get printed to the console.
+			 */
+			panic("Tried to boot a KASAN kernel configured for %u MB with only %llu MB! Aborting.",
+			      CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
+			      (u64)(top_phys_addr * SZ_1M));
+		} else if (top_phys_addr > kasan_memory_size) {
+			/* print a biiiig warning in hopes people notice */
+			pr_err("===========================================\n"
+			       "Physical memory exceeds compiled-in maximum!\n"
+			       "This kernel was compiled for KASAN with %u MB physical memory.\n"
+			       "The physical memory detected is at least %llu MB.\n"
+			       "Memory above the compiled limit will not be used!\n"
+			       "===========================================\n",
+			       CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
+			       (u64)(top_phys_addr * SZ_1M));
+		}
+
+		kasan_shadow_start = _ALIGN_DOWN(kasan_memory_size * 7 / 8, PAGE_SIZE);
+		DBG("reserving %llx -> %llx for KASAN",
+		    kasan_shadow_start, top_phys_addr);
+		memblock_reserve(kasan_shadow_start, top_phys_addr - kasan_shadow_start);
+	}
 }
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index 36a4e1b10b2d..f02b15c78e4d 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -3,3 +3,4 @@
 KASAN_SANITIZE := n
 
 obj-$(CONFIG_PPC32)		+= init_32.o
+obj-$(CONFIG_PPC_BOOK3S_64)	+= init_book3s_64.o
diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
new file mode 100644
index 000000000000..1c95fe6495c7
--- /dev/null
+++ b/arch/powerpc/mm/kasan/init_book3s_64.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KASAN for 64-bit Book3S powerpc
+ *
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Daniel Axtens <dja@axtens.net>
+ */
+
+#define DISABLE_BRANCH_PROFILING
+
+#include <linux/kasan.h>
+#include <linux/printk.h>
+#include <linux/sched/task.h>
+#include <asm/pgalloc.h>
+
+void __init kasan_init(void)
+{
+	int i;
+	void *k_start = kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START);
+	void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
+
+	pte_t pte = pte_mkpte(pfn_pte(virt_to_pfn(kasan_early_shadow_page),
+				      PAGE_KERNEL));
+
+	if (!early_radix_enabled())
+		panic("KASAN requires radix!");
+
+	for (i = 0; i < PTRS_PER_PTE; i++)
+		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
+			     &kasan_early_shadow_pte[i], pte, 0);
+
+	for (i = 0; i < PTRS_PER_PMD; i++)
+		pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
+				    kasan_early_shadow_pte);
+
+	for (i = 0; i < PTRS_PER_PUD; i++)
+		pud_populate(&init_mm, &kasan_early_shadow_pud[i],
+			     kasan_early_shadow_pmd);
+
+	memset((void *)KASAN_SHADOW_START, KASAN_SHADOW_INIT,
+	       ((u64)CONFIG_PHYS_MEM_SIZE_FOR_KASAN *
+		SZ_1M >> KASAN_SHADOW_SCALE_SHIFT));
+
+	kasan_populate_early_shadow(kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START),
+				    kasan_mem_to_shadow((void *)RADIX_VMALLOC_START));
+
+	/* leave a hole here for vmalloc */
+
+	kasan_populate_early_shadow(
+		kasan_mem_to_shadow((void *)RADIX_VMALLOC_END),
+		kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END));
+
+	flush_tlb_kernel_range((unsigned long)k_start, (unsigned long)k_end);
+
+	/* mark early shadow region as RO and wipe */
+	pte = pte_mkpte(pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL_RO));
+	for (i = 0; i < PTRS_PER_PTE; i++)
+		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
+			     &kasan_early_shadow_pte[i], pte, 0);
+
+	/*
+	 * clear_page relies on some cache info that hasn't been set up yet.
+	 * It ends up looping ~forever and blows up other data.
+	 * Use memset instead.
+	 */
+	memset(kasan_early_shadow_page, 0, PAGE_SIZE);
+
+	/* Enable error messages */
+	init_task.kasan_depth = 0;
+	pr_info("KASAN init done (64-bit Book3S heavyweight mode)\n");
+}
+
+void __init kasan_late_init(void) { }
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 206156255247..b982dc5441c0 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -73,6 +73,10 @@ struct addr_marker {
 
 static struct addr_marker address_markers[] = {
 	{ 0, "Start of kernel VM" },
+#if defined(CONFIG_PPC64) && defined(CONFIG_KASAN)
+	{ 0, "kasan shadow mem start" },
+	{ 0, "kasan shadow mem end" },
+#endif
 	{ 0, "vmalloc() Area" },
 	{ 0, "vmalloc() End" },
 #ifdef CONFIG_PPC64
@@ -92,10 +96,10 @@ static struct addr_marker address_markers[] = {
 #endif
 	{ 0, "Fixmap start" },
 	{ 0, "Fixmap end" },
-#endif
 #ifdef CONFIG_KASAN
 	{ 0, "kasan shadow mem start" },
 	{ 0, "kasan shadow mem end" },
+#endif
 #endif
 	{ -1, NULL },
 };
@@ -317,6 +321,10 @@ static void populate_markers(void)
 	int i = 0;
 
 	address_markers[i++].start_address = PAGE_OFFSET;
+#if defined(CONFIG_PPC64) && defined(CONFIG_KASAN)
+	address_markers[i++].start_address = KASAN_SHADOW_START;
+	address_markers[i++].start_address = KASAN_SHADOW_END;
+#endif
 	address_markers[i++].start_address = VMALLOC_START;
 	address_markers[i++].start_address = VMALLOC_END;
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 6caedc88474f..cedc86686e65 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -99,6 +99,7 @@ config PPC_BOOK3S_64
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select IRQ_WORK
 	select PPC_MM_SLICES
+	select KASAN_VMALLOC if KASAN
 
 config PPC_BOOK3E_64
 	bool "Embedded processors"
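A worked example of the Makefile's shadow-offset arithmetic above, for the
default CONFIG_PHYS_MEM_SIZE_FOR_KASAN of 1024 MB. This is a standalone C
sketch for illustration only; the kernel build does the same sum with make
and bc, and the constants come straight from the Makefile hunk.

#include <stdio.h>

int main(void)
{
	unsigned long long phys_mb = 1024;               /* CONFIG_PHYS_MEM_SIZE_FOR_KASAN */
	unsigned long long base = 0xa800000000000000ULL; /* 0xc000000000000000 >> 3 */

	/* 7/8 of physical memory, in bytes: where the shadow lives physically */
	unsigned long long shadow_phys = 7ULL * 1024 * 1024 * phys_mb / 8;
	unsigned long long offset = shadow_phys + base;

	/* shadow(addr) = (addr >> 3) + offset; check the linear map start */
	unsigned long long page_offset = 0xc000000000000000ULL;
	unsigned long long shadow_start = (page_offset >> 3) + offset;

	printf("KASAN_SHADOW_OFFSET = 0x%llx\n", offset);       /* 0xa800000038000000 */
	printf("shadow of 0xc000... = 0x%llx\n", shadow_start); /* 0xc000000038000000 */
	/*
	 * 0xc000000038000000 is the linear-map (0xc000...) alias of physical
	 * 0x38000000 (896 MB), i.e. the start of the top 1/8th of 1024 MB -
	 * exactly the region that early_reserve_mem() reserves for the shadow,
	 * and an address that works with translations on or off.
	 */
	return 0;
}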
KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset.

 - Book3S runs code in real mode after booting. Most notably a lot of KVM
   runs in real mode, and it would be good to be able to instrument it.

 - Because code runs in real mode after boot, the offset has to point to
   valid memory both in and out of real mode.

[ppc64 mm note: The kernel installs a linear mapping at effective
address c000... onward. This is a one-to-one mapping with physical
memory from 0000... onward. Because of how memory accesses work on
powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
same memory both with translations on (accessing as an 'effective
address'), and with translations off (accessing as a 'real
address'). This works in both guests and the hypervisor. For more
details, see s5.7 of Book III of version 3 of the ISA, in particular
the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
KASAN implementation currently only supports Radix.]

One approach is just to give up on inline instrumentation. This way all
checks can be delayed until after everything is set up correctly, and the
address-to-shadow calculations can be overridden. However, the features and
speed boost provided by inline instrumentation are worth trying to do
better.

If _at compile time_ it is known how much contiguous physical memory a
system has, the top 1/8th of the first block of physical memory can be set
aside for the shadow. This is a big hammer and comes with 3 big
consequences:

 - there's no nice way to handle physically discontiguous memory, so only
   the first physical memory block can be used.

 - kernels will simply fail to boot on machines with less memory than
   specified when compiling.

 - kernels running on machines with more memory than specified when
   compiling will simply ignore the extra memory.

Implement and document KASAN this way. The current implementation is Radix
only.

Despite the limitations, it can still find bugs,
e.g. http://patchwork.ozlabs.org/patch/1103775/

At the moment, this physical memory limit must be set _even for outline
mode_. This may be changed in a later series - a different implementation
could be added for outline mode that dynamically allocates shadow at a
fixed offset. For example, see https://patchwork.ozlabs.org/patch/795211/

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version
Cc: Christophe Leroy <christophe.leroy@c-s.fr> # ppc32 version
Signed-off-by: Daniel Axtens <dja@axtens.net>

---

Changes since v6:
 - rework kasan_late_init support, which also fixes the book3e problem that
   snowpatch picked up (I think)
 - fix a checkpatch error that snowpatch picked up
 - don't needlessly move the include in kasan.h

Changes since v5:
 - rebase on powerpc/merge, with Christophe's latest changes integrating
   kasan-vmalloc
 - documentation tweaks based on latest 32-bit changes

Changes since v4:
 - fix some ppc32 build issues
 - support ptdump
 - clean up the header file. It turns out we don't need or use
   KASAN_SHADOW_SIZE, so just dump it, and make KASAN_SHADOW_END the thing
   that varies between 32 and 64 bit. As part of this, make sure
   KASAN_SHADOW_OFFSET is only configured for 32 bit - it is calculated in
   the Makefile for ppc64.
 - various cleanups

Changes since v3:
 - Address further feedback from Christophe.
 - Drop changes to stack walking; it looks like the issue I observed is
   related to that particular stack, not stack-walking generally.

Changes since v2:
 - Address feedback from Christophe around cleanups and docs.
 - Address feedback from Balbir: at this point I don't have a good solution
   for the issues you identify around the limitations of the inline
   implementation, but I think that it's worth trying to get the stack
   instrumentation support. I'm happy to have an alternative and more
   flexible outline mode - I had envisioned this would be called
   'lightweight' mode as it imposes fewer restrictions. I've linked to your
   implementation. I think it's best to add it in a follow-up series.
 - Made the default PHYS_MEM_SIZE_FOR_KASAN value 1024 MB. I think most
   people have guests with at least that much memory in the Radix 64s case,
   so it's a much saner default - it means that if you just turn on KASAN
   without reading the docs you're much more likely to have a bootable
   kernel, which you will never have if the value is set to zero! I'm happy
   to bikeshed the value if we want.

Changes since v1:
 - Landed kasan vmalloc support upstream
 - Lots of feedback from Christophe.

Changes since the rfc:
 - Boots real and virtual hardware, kvm works.
 - disabled reporting when we're checking the stack for exception
   frames. The behaviour isn't wrong, just incompatible with KASAN.
 - Documentation!
 - Dropped old module stuff in favour of KASAN_VMALLOC.

The bugs with ftrace and kuap were due to kernel bloat pushing
prom_init calls to be done via the plt. Because we did not have
a relocatable kernel, and they are done very early, this caused
everything to explode. Compile with CONFIG_RELOCATABLE!

---
 Documentation/dev-tools/kasan.rst            |   9 +-
 Documentation/powerpc/kasan.txt              | 112 ++++++++++++++++++-
 arch/powerpc/Kconfig                         |   2 +
 arch/powerpc/Kconfig.debug                   |  23 +++-
 arch/powerpc/Makefile                        |  11 ++
 arch/powerpc/include/asm/book3s/64/hash.h    |   4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |   7 ++
 arch/powerpc/include/asm/book3s/64/radix.h   |   5 +
 arch/powerpc/include/asm/kasan.h             |  11 +-
 arch/powerpc/kernel/prom.c                   |  61 +++++++++-
 arch/powerpc/mm/kasan/Makefile               |   1 +
 arch/powerpc/mm/kasan/init_book3s_64.c       |  73 ++++++++++++
 arch/powerpc/mm/ptdump/ptdump.c              |  10 +-
 arch/powerpc/platforms/Kconfig.cputype       |   1 +
 14 files changed, 320 insertions(+), 10 deletions(-)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c
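The commit message notes that, despite the limitations, this setup still
finds real bugs. For concreteness, here is a hypothetical example of the
class of bug generic KASAN reports as a slab-out-of-bounds. This module is
not part of the patch; it assumes any KASAN-enabled kernel, and the names
(oob_demo_init and friends) are made up.

// Hypothetical demo module: KASAN flags the out-of-bounds write below.
#include <linux/module.h>
#include <linux/slab.h>

static int __init oob_demo_init(void)
{
	char *p = kmalloc(8, GFP_KERNEL);

	if (!p)
		return -ENOMEM;
	p[8] = 'x';	/* one byte past the object: KASAN reports this write */
	kfree(p);
	return 0;
}
module_init(oob_demo_init);
MODULE_LICENSE("GPL");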