Message ID | 1431698344-28054-6-git-send-email-a.ryabinin@samsung.com (mailing list archive)
State      | New, archived
On Fri, May 15, 2015 at 3:59 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote:
> This patch adds arch specific code for kernel address sanitizer
> (see Documentation/kasan.txt).

OK, fixed a newer GCC (4.9.3, so still just KASAN_OUTLINE), compiled and
booted on the ARM Juno Development System:

kasan test: kmalloc_large_oob_rigth kmalloc large allocation: out-of-bounds to right
==================================================================
BUG: KASan: out of bounds access in kmalloc_large_oob_rigth+0x60/0x78 at addr ffffffc06516a00a
Write of size 1 by task swapper/0/1
page:ffffffbdc3945a00 count:1 mapcount:0 mapping: (null) index:0x0
flags: 0x4000(head)
page dumped because: kasan: bad access detected
CPU: 2 PID: 1 Comm: swapper/0 Tainted: G B 4.1.0-rc4+ #9
Hardware name: ARM Juno development board (r0) (DT)
Call trace:
[<ffffffc00008aea8>] dump_backtrace+0x0/0x15c
[<ffffffc00008b014>] show_stack+0x10/0x1c
[<ffffffc00080997c>] dump_stack+0xac/0x104
[<ffffffc0001ea4d8>] kasan_report_error+0x3e4/0x400
[<ffffffc0001ea5dc>] kasan_report+0x40/0x4c
[<ffffffc0001e9a8c>] __asan_store1+0x70/0x78
[<ffffffc000a5ae78>] kmalloc_large_oob_rigth+0x5c/0x78
[<ffffffc000a5b6c0>] kmalloc_tests_init+0x14/0x4c
[<ffffffc000082940>] do_one_initcall+0xa0/0x1f4
[<ffffffc000a3bdbc>] kernel_init_freeable+0x1ec/0x294
[<ffffffc000804c5c>] kernel_init+0xc/0xec
Memory state around the buggy address:
 ffffffc065169f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc065169f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffc06516a000: 00 02 fe fe fe fe fe fe fe fe fe fe fe fe fe fe
                      ^
 ffffffc06516a080: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
 ffffffc06516a100: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
==================================================================
kasan test: kmalloc_oob_krealloc_more out-of-bounds after krealloc more
==================================================================
BUG: KASan: out of bounds access in kmalloc_oob_krealloc_more+0xa0/0xc0 at addr ffffffc06501cd93
Write of size 1 by task swapper/0/1
=============================================================================
BUG kmalloc-64 (Tainted: G B ): kasan: bad access detected
-----------------------------------------------------------------------------
INFO: Allocated in kmalloc_oob_krealloc_more+0x48/0xc0 age=4 cpu=2 pid=1
 alloc_debug_processing+0x170/0x17c
 __slab_alloc.isra.59.constprop.61+0x34c/0x36c
 kmem_cache_alloc+0x1a4/0x1e0
 kmalloc_oob_krealloc_more+0x44/0xc0
 kmalloc_tests_init+0x18/0x4c
 do_one_initcall+0xa0/0x1f4
 kernel_init_freeable+0x1ec/0x294
 kernel_init+0xc/0xec
 ret_from_fork+0xc/0x50
INFO: Slab 0xffffffbdc3940700 objects=21 used=19 fp=0xffffffc06501d080 flags=0x4080
INFO: Object 0xffffffc06501cd80 @offset=3456 fp=0xffffffc06501cf00
Bytes b4 ffffffc06501cd70: 00 08 00 00 08 08 01 01 00 00 00 00 02 10 00 00 ................
Object ffffffc06501cd80: 00 cf 01 65 c0 ff ff ff 01 04 0c 00 01 04 10 c1 ...e............
Object ffffffc06501cd90: 00 82 60 28 58 01 04 43 98 48 48 24 01 81 b4 40 ..`(X..C.HH$...@
Object ffffffc06501cda0: 00 80 09 0a 69 a1 3d 82 08 01 34 65 21 31 b0 00 ....i.=...4e!1..
Object ffffffc06501cdb0: 04 42 4d a7 10 26 18 52 27 23 c2 1e 08 01 40 81 .BM..&.R'#....@.
Padding ffffffc06501cef0: 81 20 00 50 00 08 00 0b 00 0c 50 40 01 48 40 42 . .P......P@.H@B
CPU: 2 PID: 1 Comm: swapper/0 Tainted: G B 4.1.0-rc4+ #9
Hardware name: ARM Juno development board (r0) (DT)
Call trace:
[<ffffffc00008aea8>] dump_backtrace+0x0/0x15c
[<ffffffc00008b014>] show_stack+0x10/0x1c
[<ffffffc00080997c>] dump_stack+0xac/0x104
[<ffffffc0001e3940>] print_trailer+0xdc/0x140
[<ffffffc0001e8384>] object_err+0x38/0x4c
[<ffffffc0001ea2a4>] kasan_report_error+0x1b0/0x400
[<ffffffc0001ea5dc>] kasan_report+0x40/0x4c
[<ffffffc0001e9a8c>] __asan_store1+0x70/0x78
[<ffffffc000a5b3a4>] kmalloc_oob_krealloc_more+0x9c/0xc0
[<ffffffc000a5b6c4>] kmalloc_tests_init+0x18/0x4c
[<ffffffc000082940>] do_one_initcall+0xa0/0x1f4
[<ffffffc000a3bdbc>] kernel_init_freeable+0x1ec/0x294
[<ffffffc000804c5c>] kernel_init+0xc/0xec
Memory state around the buggy address:
 ffffffc06501cc80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc06501cd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc06501cd80: 00 00 03 fc fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
 ffffffc06501ce00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc06501ce80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
(etc)

This is how it should look I guess, so:
Tested-by: Linus Walleij <linus.walleij@linaro.org>

Now I have to fix all the naturally occurring KASan OOB bugs that started
to appear in my boot crawl :O

Yours,
Linus Walleij
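(For anyone wanting to reproduce this: the output above comes from the KASan
self-test module. Roughly the kind of config involved -- a sketch, not the
exact .config used for the Juno boot above:)

CONFIG_KASAN=y
CONFIG_KASAN_OUTLINE=y
CONFIG_TEST_KASAN=y
# SLUB debugging provides the "INFO: Allocated in ..." part of the report
CONFIG_SLUB_DEBUG=y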
On Fri, May 15, 2015 at 3:59 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote: > This patch adds arch specific code for kernel address sanitizer > (see Documentation/kasan.txt). I looked closer at this again ... I am trying to get KASan up for ARM(32) with some tricks and hacks. > +config KASAN_SHADOW_OFFSET > + hex > + default 0xdfff200000000000 if ARM64_VA_BITS_48 > + default 0xdffffc8000000000 if ARM64_VA_BITS_42 > + default 0xdfffff9000000000 if ARM64_VA_BITS_39 So IIUC these offsets are simply chosen to satisfy the equation SHADOW = (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET; For all memory that needs to be covered, i.e. kernel text+data, modules text+data, any other kernel-mode running code+data. And it needs to be statically assigned like this because the offset is used at compile time. Atleast that is how I understood it... correct me if wrong. (Dunno if all is completely obvious to everyone else in the world...) > +/* > + * KASAN_SHADOW_START: beginning of the kernel virtual addresses. > + * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses. > + */ > +#define KASAN_SHADOW_START (UL(0xffffffffffffffff) << (VA_BITS)) > +#define KASAN_SHADOW_END (KASAN_SHADOW_START + (1UL << (VA_BITS - 3))) Will this not mean that shadow start to end actually covers *all* virtual addresses including userspace and what not? However a large portion of this shadow memory will be unused because the SHADOW_OFFSET only works for code compiled for the kernel anyway. When working on ARM32 I certainly could not map (1UL << (VA_BITS -3)) i.e. for 32 bit (1UL << 29) bytes (512 MB) of virtual memory for shadow, instead I had to restrict it to the size that actually maps the memory used by the kernel. I tried shrinking it down but it crashed on me so tell me how wrong I am ... 
:) But here comes the real tricks, where I need some help to understand the patch set, maybe some comments should be inserted here and there to ease understanding: > +++ b/arch/arm64/mm/kasan_init.c > @@ -0,0 +1,143 @@ > +#include <linux/kasan.h> > +#include <linux/kernel.h> > +#include <linux/memblock.h> > +#include <linux/start_kernel.h> > + > +#include <asm/page.h> > +#include <asm/pgalloc.h> > +#include <asm/pgtable.h> > +#include <asm/tlbflush.h> > + > +unsigned char kasan_zero_page[PAGE_SIZE] __page_aligned_bss; > +static pgd_t tmp_page_table[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE); > + > +#if CONFIG_PGTABLE_LEVELS > 3 > +pud_t kasan_zero_pud[PTRS_PER_PUD] __page_aligned_bss; > +#endif > +#if CONFIG_PGTABLE_LEVELS > 2 > +pmd_t kasan_zero_pmd[PTRS_PER_PMD] __page_aligned_bss; > +#endif > +pte_t kasan_zero_pte[PTRS_PER_PTE] __page_aligned_bss; > + > +static void __init kasan_early_pmd_populate(unsigned long start, > + unsigned long end, pud_t *pud) > +{ > + unsigned long addr; > + unsigned long next; > + pmd_t *pmd; > + > + pmd = pmd_offset(pud, start); > + for (addr = start; addr < end; addr = next, pmd++) { > + pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte); > + next = pmd_addr_end(addr, end); > + } > +} > + > +static void __init kasan_early_pud_populate(unsigned long start, > + unsigned long end, pgd_t *pgd) > +{ > + unsigned long addr; > + unsigned long next; > + pud_t *pud; > + > + pud = pud_offset(pgd, start); > + for (addr = start; addr < end; addr = next, pud++) { > + pud_populate(&init_mm, pud, kasan_zero_pmd); > + next = pud_addr_end(addr, end); > + kasan_early_pmd_populate(addr, next, pud); > + } > +} > + > +static void __init kasan_map_early_shadow(pgd_t *pgdp) > +{ > + int i; > + unsigned long start = KASAN_SHADOW_START; > + unsigned long end = KASAN_SHADOW_END; > + unsigned long addr; > + unsigned long next; > + pgd_t *pgd; > + > + for (i = 0; i < PTRS_PER_PTE; i++) > + set_pte(&kasan_zero_pte[i], pfn_pte( > + virt_to_pfn(kasan_zero_page), PAGE_KERNEL)); > + > + pgd = pgd_offset_k(start); > + for (addr = start; addr < end; addr = next, pgd++) { > + pgd_populate(&init_mm, pgd, kasan_zero_pud); > + next = pgd_addr_end(addr, end); > + kasan_early_pud_populate(addr, next, pgd); > + } > +} > + > +void __init kasan_early_init(void) > +{ > + kasan_map_early_shadow(swapper_pg_dir); > + start_kernel(); > +} So as far as I can see, kasan_early_init() needs to be called before we even run start_kernel() because every memory access would crash unless the MMU is set up for the shadow memory. Is it correct that when the pte's, pgd's and pud's are populated KASan really doesn't kick in, it's just done to have some scratch memory with whatever contents so as to do dummy updates for the __asan_loadN() and __asan_storeN() calls, and no checks start until the shadow memory is populated in kasan_init() i.e. there are no KASan checks for any code executing up to that point, just random writes and reads? Also, this code under kasan_early_init(), must that not be written extremely carefully to avoid any loads and stores? I.e. even if this file is obviously compiled with KASAN_SANITIZE_kasan_init.o := n so that it is not instrumented, I'm thinking about the functions it is calling, like set_pte(), pgd_populate(), pmd_offset() etc etc. Are we just lucky that these functions never do any loads and stores? Yours, Linus Walleij
2015-06-11 16:39 GMT+03:00 Linus Walleij <linus.walleij@linaro.org>: > On Fri, May 15, 2015 at 3:59 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote: > >> This patch adds arch specific code for kernel address sanitizer >> (see Documentation/kasan.txt). > > I looked closer at this again ... I am trying to get KASan up for > ARM(32) with some tricks and hacks. > I have some patches for that. They still need some polishing, but works for me. I could share after I get back to office on Tuesday. >> +config KASAN_SHADOW_OFFSET >> + hex >> + default 0xdfff200000000000 if ARM64_VA_BITS_48 >> + default 0xdffffc8000000000 if ARM64_VA_BITS_42 >> + default 0xdfffff9000000000 if ARM64_VA_BITS_39 > > So IIUC these offsets are simply chosen to satisfy the equation > > SHADOW = (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT) > + KASAN_SHADOW_OFFSET; > > For all memory that needs to be covered, i.e. kernel text+data, > modules text+data, any other kernel-mode running code+data. > > And it needs to be statically assigned like this because the offset > is used at compile time. > > Atleast that is how I understood it... correct me if wrong. > (Dunno if all is completely obvious to everyone else in the world...) > I think you understood this right. >> +/* >> + * KASAN_SHADOW_START: beginning of the kernel virtual addresses. >> + * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses. >> + */ >> +#define KASAN_SHADOW_START (UL(0xffffffffffffffff) << (VA_BITS)) >> +#define KASAN_SHADOW_END (KASAN_SHADOW_START + (1UL << (VA_BITS - 3))) > > Will this not mean that shadow start to end actually covers *all* > virtual addresses including userspace and what not? However a > large portion of this shadow memory will be unused because the > SHADOW_OFFSET only works for code compiled for the kernel > anyway. > SHADOW_OFFSET:SHADOW_END - covers *all* 64bits of virtual addresses. SHADOW_OFFSET:SHADOW_START - unused shadow. SHADOW_START:SHADOW_END - covers only kernel virtual addresses (used shadow). > When working on ARM32 I certainly could not map > (1UL << (VA_BITS -3)) i.e. for 32 bit (1UL << 29) bytes (512 MB) of Why not? We can just take it from TASK_SIZE. > virtual memory for shadow, instead I had to restrict it to the size that > actually maps the memory used by the kernel. > That should work too. > I tried shrinking it down but it crashed on me so tell me how > wrong I am ... :) > > But here comes the real tricks, where I need some help to > understand the patch set, maybe some comments should be > inserted here and there to ease understanding: > ... >> + >> +void __init kasan_early_init(void) >> +{ >> + kasan_map_early_shadow(swapper_pg_dir); >> + start_kernel(); >> +} > > So as far as I can see, kasan_early_init() needs to be called before > we even run start_kernel() because every memory access would > crash unless the MMU is set up for the shadow memory. > kasan_early_init() should be called before *any* instrumented function. > Is it correct that when the pte's, pgd's and pud's are populated > KASan really doesn't kick in, it's just done to have some scratch > memory with whatever contents so as to do dummy updates > for the __asan_loadN() and __asan_storeN() calls, and no checks > start until the shadow memory is populated in kasan_init() > i.e. there are no KASan checks for any code executing up > to that point, just random writes and reads? > Yes, kasan_early_init() setups scratch memory with whatever contents. 
But KASan checks shadow before kasan_init(), and that's the reason why we need scratch shadow.

So checks are performed, but KASan doesn't print any reports, because init_task has a non-zero kasan_depth flag (see include/linux/init_task.h). We check that flag in kasan_report() and print a report only if it has a zero value.

In kasan_init(), after the shadow is populated, we enable reports by setting kasan_depth to zero.

> Also, this code under kasan_early_init(), must that not be
> written extremely carefully to avoid any loads and stores?
> I.e. even if this file is obviously compiled with
> KASAN_SANITIZE_kasan_init.o := n so that it is not
> instrumented, I'm thinking about the functions it is calling,
> like set_pte(), pgd_populate(), pmd_offset() etc etc.
>

These functions are not instrumented either, because they are static and located in headers. So these functions will be in the kasan_init.o translation unit and will be compiled without -fsanitize=kernel-address.

> Are we just lucky that these functions never do any loads
> and stores?
>

We rely on the fact that these functions are static inline and do not call other functions from other (instrumented) files.
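(To put numbers on the shadow layout Andrey describes above, here is a small
standalone sketch of the arithmetic -- plain userspace C, not kernel code,
assuming ARM64_VA_BITS_48, the 0xdfff200000000000 offset from the Kconfig
hunk, and the generic KASAN_SHADOW_SCALE_SHIFT of 3:)

#include <stdio.h>

#define VA_BITS                  48
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET      0xdfff200000000000UL

/* The mapping the compiler bakes into every instrumented access. */
static unsigned long mem_to_shadow(unsigned long addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

int main(void)
{
	unsigned long kern_start = ~0UL << VA_BITS;	/* 0xffff000000000000 */
	unsigned long kern_end   = ~0UL;

	/* Prints 0xffff000000000000 and 0xffff1fffffffffff, i.e. exactly the
	 * KASAN_SHADOW_START..KASAN_SHADOW_END range ("used shadow"), while
	 * shadow for userspace addresses lands below KASAN_SHADOW_START
	 * ("unused shadow"). */
	printf("shadow(kernel start) = 0x%lx\n", mem_to_shadow(kern_start));
	printf("shadow(kernel end)   = 0x%lx\n", mem_to_shadow(kern_end));
	return 0;
}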
On Fri, Jun 12, 2015 at 8:14 PM, Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote:
> 2015-06-11 16:39 GMT+03:00 Linus Walleij <linus.walleij@linaro.org>:
>> On Fri, May 15, 2015 at 3:59 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote:
>>
>>> This patch adds arch specific code for kernel address sanitizer
>>> (see Documentation/kasan.txt).
>>
>> I looked closer at this again ... I am trying to get KASan up for
>> ARM(32) with some tricks and hacks.
>>
>
> I have some patches for that. They still need some polishing, but works for me.
> I could share after I get back to office on Tuesday.

OK! I'd be happy to test!

I have a WIP patch too, but it was trying to do a physical carveout
at early boot because of misunderstandings as to how KASan works...
Yeah, I'm a slow learner.

>>> +/*
>>> + * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
>>> + * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
>>> + */
>>> +#define KASAN_SHADOW_START (UL(0xffffffffffffffff) << (VA_BITS))
>>> +#define KASAN_SHADOW_END (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
>>
>> Will this not mean that shadow start to end actually covers *all*
>> virtual addresses including userspace and what not? However a
>> large portion of this shadow memory will be unused because the
>> SHADOW_OFFSET only works for code compiled for the kernel
>> anyway.
>>
>
> SHADOW_OFFSET:SHADOW_END - covers *all* 64bits of virtual addresses.
> SHADOW_OFFSET:SHADOW_START - unused shadow.
> SHADOW_START:SHADOW_END - covers only kernel virtual addresses (used shadow).

Aha. I see now...

>> When working on ARM32 I certainly could not map
>> (1UL << (VA_BITS -3)) i.e. for 32 bit (1UL << 29) bytes (512 MB) of
>
> Why not? We can just take it from TASK_SIZE.

Yeah, the idea to steal it from userspace occurred to me too... with
ARM32 having a highmem split in the middle of vmalloc I was quite
stressed when trying to chisel it out from the vmalloc area. I actually
managed to remove all static iomaps from my platform so I could allocate
the KASan memory from high to low addresses at 0xf7000000-0xff000000,
but it puts requirements on all the ARM32 platforms to get rid of their
static maps :P

>> Is it correct that when the pte's, pgd's and pud's are populated
>> KASan really doesn't kick in, it's just done to have some scratch
>> memory with whatever contents so as to do dummy updates
>> for the __asan_loadN() and __asan_storeN() calls, and no checks
>> start until the shadow memory is populated in kasan_init()
>> i.e. there are no KASan checks for any code executing up
>> to that point, just random writes and reads?
>
> Yes, kasan_early_init() setups scratch memory with whatever contents.
> But KASan checks shadow before kasan_init(), and that's the reason why we
> need scratch shadow.
>
> So checks are performed, but KASan doesn't print any reports, because
> init_task has a non-zero kasan_depth flag (see include/linux/init_task.h).
> We check that flag in kasan_report() and print a report only if it has a zero value.
>
> In kasan_init(), after the shadow is populated, we enable reports by setting
> kasan_depth to zero.

Aha, now I understand how this works! Now I understand this
init_task.kasan_depth = 0 too.

>> Are we just lucky that these functions never do any loads
>> and stores?
>>
>
> We rely on the fact that these functions are static inline and do not call other
> functions from other (instrumented) files.

Aha, makes perfect sense.

I think I understand the code a bit now ...
it maps all the KASan shadow memory to the physical memory of the zero
page and lets all updates hit that memory until the memory manager is up
and running; then you allocate physical memory backing the shadow in
kasan_populate_zero_shadow(). I misunderstood it such that the backing
physical shadow memory had to be available when we do the early call...
no wonder I got lost.

Yours,
Linus Walleij
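(As a quick back-of-the-envelope check of why a single zero page is enough
for that early mapping -- an editorial sketch assuming 4KB pages, i.e. 512
entries per table level, written as plain userspace C:)

#include <stdio.h>

int main(void)
{
	/* Every entry of each early table points at the same lower-level
	 * table/page, so one page of page-table memory per level covers: */
	unsigned long pte_covers = 512UL * 4096;	/* kasan_zero_pte -> kasan_zero_page */
	unsigned long pmd_covers = 512UL * pte_covers;	/* kasan_zero_pmd -> kasan_zero_pte */
	unsigned long pud_covers = 512UL * pmd_covers;	/* kasan_zero_pud -> kasan_zero_pmd */

	printf("one pte page covers %lu MB of shadow\n", pte_covers >> 20);	/* 2 MB   */
	printf("one pmd page covers %lu GB of shadow\n", pmd_covers >> 30);	/* 1 GB   */
	printf("one pud page covers %lu GB of shadow\n", pud_covers >> 30);	/* 512 GB */
	return 0;
}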
2015-06-13 18:25 GMT+03:00 Linus Walleij <linus.walleij@linaro.org>:
>
> On Fri, Jun 12, 2015 at 8:14 PM, Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote:
> > 2015-06-11 16:39 GMT+03:00 Linus Walleij <linus.walleij@linaro.org>:
> >> On Fri, May 15, 2015 at 3:59 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote:
> >>
> >>> This patch adds arch specific code for kernel address sanitizer
> >>> (see Documentation/kasan.txt).
> >>
> >> I looked closer at this again ... I am trying to get KASan up for
> >> ARM(32) with some tricks and hacks.
> >>
> >
> > I have some patches for that. They still need some polishing, but works for me.
> > I could share after I get back to office on Tuesday.
>
> OK! I'd be happy to test!
>

I've pushed it here: git://github.com/aryabinin/linux.git kasan/arm_v0

It's far from ready. So far I've tried it only in qemu, and it works.
Today I tried to run it on bare metal (exynos5420), but it hangs somewhere
after early_irq_init(). So it's probably not worth trying/testing yet.
On Fri, May 15, 2015 at 04:59:04PM +0300, Andrey Ryabinin wrote: > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 7796af4..4cc73cc 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -44,6 +44,7 @@ config ARM64 > select HAVE_ARCH_AUDITSYSCALL > select HAVE_ARCH_BITREVERSE > select HAVE_ARCH_JUMP_LABEL > + select HAVE_ARCH_KASAN if SPARSEMEM_VMEMMAP Just curious, why the dependency? > select HAVE_ARCH_KGDB > select HAVE_ARCH_SECCOMP_FILTER > select HAVE_ARCH_TRACEHOOK > @@ -119,6 +120,12 @@ config GENERIC_CSUM > config GENERIC_CALIBRATE_DELAY > def_bool y > > +config KASAN_SHADOW_OFFSET > + hex > + default 0xdfff200000000000 if ARM64_VA_BITS_48 > + default 0xdffffc8000000000 if ARM64_VA_BITS_42 > + default 0xdfffff9000000000 if ARM64_VA_BITS_39 > + How were these numbers generated? I can probably guess but we need a comment in this file and a BUILD_BUG elsewhere (kasan_init.c) if we change the memory map and they no longer match. > diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h > new file mode 100644 > index 0000000..65ac50d > --- /dev/null > +++ b/arch/arm64/include/asm/kasan.h > @@ -0,0 +1,24 @@ > +#ifndef __ASM_KASAN_H > +#define __ASM_KASAN_H > + > +#ifndef __ASSEMBLY__ > + > +#ifdef CONFIG_KASAN > + > +#include <asm/memory.h> > + > +/* > + * KASAN_SHADOW_START: beginning of the kernel virtual addresses. > + * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses. > + */ > +#define KASAN_SHADOW_START (UL(0xffffffffffffffff) << (VA_BITS)) > +#define KASAN_SHADOW_END (KASAN_SHADOW_START + (1UL << (VA_BITS - 3))) Can you define a VA_START in asm/memory.h so that we avoid this long list of f's here and in pgtable.h? Another BUILD_BUG we need is to ensure that KASAN_SHADOW_START/END covers an exact number of pgd entries, otherwise the logic in kasan_init.c can go wrong (it seems to be the case in all VA_BITS configurations but just in case we forget about this requirement in the future). > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h > index bd5db28..8700f66 100644 > --- a/arch/arm64/include/asm/pgtable.h > +++ b/arch/arm64/include/asm/pgtable.h > @@ -40,7 +40,14 @@ > * fixed mappings and modules > */ > #define VMEMMAP_SIZE ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE) > + > +#ifndef CONFIG_KASAN > #define VMALLOC_START (UL(0xffffffffffffffff) << VA_BITS) And here we could just use VA_START. > +#else > +#include <asm/kasan.h> > +#define VMALLOC_START KASAN_SHADOW_END > +#endif We could add a SZ_64K guard page here (just in case, the KASan shadow probably never reaches KASAN_SHADOW_END). 
> diff --git a/arch/arm64/include/asm/string.h b/arch/arm64/include/asm/string.h > index 64d2d48..bff522c 100644 > --- a/arch/arm64/include/asm/string.h > +++ b/arch/arm64/include/asm/string.h > @@ -36,17 +36,33 @@ extern __kernel_size_t strnlen(const char *, __kernel_size_t); > > #define __HAVE_ARCH_MEMCPY > extern void *memcpy(void *, const void *, __kernel_size_t); > +extern void *__memcpy(void *, const void *, __kernel_size_t); > > #define __HAVE_ARCH_MEMMOVE > extern void *memmove(void *, const void *, __kernel_size_t); > +extern void *__memmove(void *, const void *, __kernel_size_t); > > #define __HAVE_ARCH_MEMCHR > extern void *memchr(const void *, int, __kernel_size_t); > > #define __HAVE_ARCH_MEMSET > extern void *memset(void *, int, __kernel_size_t); > +extern void *__memset(void *, int, __kernel_size_t); > > #define __HAVE_ARCH_MEMCMP > extern int memcmp(const void *, const void *, size_t); > > + > +#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__) > + > +/* > + * For files that not instrumented (e.g. mm/slub.c) we Missing an "are". > diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h > index dcd06d1..cfe5ea5 100644 > --- a/arch/arm64/include/asm/thread_info.h > +++ b/arch/arm64/include/asm/thread_info.h > @@ -24,10 +24,18 @@ > #include <linux/compiler.h> > > #ifndef CONFIG_ARM64_64K_PAGES > +#ifndef CONFIG_KASAN > #define THREAD_SIZE_ORDER 2 > +#else > +#define THREAD_SIZE_ORDER 3 > +#endif > #endif > > +#ifndef CONFIG_KASAN > #define THREAD_SIZE 16384 > +#else > +#define THREAD_SIZE 32768 > +#endif > #define THREAD_START_SP (THREAD_SIZE - 16) Have you actually seen it failing with the 16KB THREAD_SIZE? You may run into other problems with 8 4KB pages per stack. > #ifndef __ASSEMBLY__ > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S > index 19f915e..650b1e8 100644 > --- a/arch/arm64/kernel/head.S > +++ b/arch/arm64/kernel/head.S > @@ -486,6 +486,9 @@ __mmap_switched: > str_l x21, __fdt_pointer, x5 // Save FDT pointer > str_l x24, memstart_addr, x6 // Save PHYS_OFFSET > mov x29, #0 > +#ifdef CONFIG_KASAN > + b kasan_early_init > +#endif > b start_kernel > ENDPROC(__mmap_switched) I think we still have swapper_pg_dir in x26 at this point, could you instead do: mov x0, x26 bl kasan_map_early_shadow Actually, I don't think kasan_map_early_shadow() even needs this argument, it uses pgd_offset_k() anyway. 
> diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S > index 8a9a96d..845e40a 100644 > --- a/arch/arm64/lib/memcpy.S > +++ b/arch/arm64/lib/memcpy.S > @@ -56,6 +56,8 @@ C_h .req x12 > D_l .req x13 > D_h .req x14 > > +.weak memcpy Nitpick: as with other such asm declarations, use some indentation: .weak memcpy (similarly for the others) > diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile > index 773d37a..e17703c 100644 > --- a/arch/arm64/mm/Makefile > +++ b/arch/arm64/mm/Makefile > @@ -4,3 +4,6 @@ obj-y := dma-mapping.o extable.o fault.o init.o \ > context.o proc.o pageattr.o > obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o > obj-$(CONFIG_ARM64_PTDUMP) += dump.o > + > +KASAN_SANITIZE_kasan_init.o := n > +obj-$(CONFIG_KASAN) += kasan_init.o > diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c > new file mode 100644 > index 0000000..35dbd84 > --- /dev/null > +++ b/arch/arm64/mm/kasan_init.c > @@ -0,0 +1,143 @@ > +#include <linux/kasan.h> > +#include <linux/kernel.h> > +#include <linux/memblock.h> > +#include <linux/start_kernel.h> > + > +#include <asm/page.h> > +#include <asm/pgalloc.h> > +#include <asm/pgtable.h> > +#include <asm/tlbflush.h> > + > +unsigned char kasan_zero_page[PAGE_SIZE] __page_aligned_bss; So that's needed because the shadow memory is mapped before paging_init is called and we don't have the zero page set up yet. Please add a comment. > +static pgd_t tmp_page_table[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE); This doesn't need a full PAGE_SIZE alignment, just PGD_SIZE. You could also rename to tmp_pg_dir for consistency with swapper and idmap. > + > +#if CONFIG_PGTABLE_LEVELS > 3 > +pud_t kasan_zero_pud[PTRS_PER_PUD] __page_aligned_bss; > +#endif > +#if CONFIG_PGTABLE_LEVELS > 2 > +pmd_t kasan_zero_pmd[PTRS_PER_PMD] __page_aligned_bss; > +#endif > +pte_t kasan_zero_pte[PTRS_PER_PTE] __page_aligned_bss; > + > +static void __init kasan_early_pmd_populate(unsigned long start, > + unsigned long end, pud_t *pud) > +{ > + unsigned long addr; > + unsigned long next; > + pmd_t *pmd; > + > + pmd = pmd_offset(pud, start); > + for (addr = start; addr < end; addr = next, pmd++) { > + pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte); > + next = pmd_addr_end(addr, end); > + } > +} > + > +static void __init kasan_early_pud_populate(unsigned long start, > + unsigned long end, pgd_t *pgd) > +{ > + unsigned long addr; > + unsigned long next; > + pud_t *pud; > + > + pud = pud_offset(pgd, start); > + for (addr = start; addr < end; addr = next, pud++) { > + pud_populate(&init_mm, pud, kasan_zero_pmd); > + next = pud_addr_end(addr, end); > + kasan_early_pmd_populate(addr, next, pud); > + } > +} > + > +static void __init kasan_map_early_shadow(pgd_t *pgdp) > +{ > + int i; > + unsigned long start = KASAN_SHADOW_START; > + unsigned long end = KASAN_SHADOW_END; > + unsigned long addr; > + unsigned long next; > + pgd_t *pgd; > + > + for (i = 0; i < PTRS_PER_PTE; i++) > + set_pte(&kasan_zero_pte[i], pfn_pte( > + virt_to_pfn(kasan_zero_page), PAGE_KERNEL)); Does this need to be writable? If yes, is there anything writing non-zero values to it? > + > + pgd = pgd_offset_k(start); > + for (addr = start; addr < end; addr = next, pgd++) { > + pgd_populate(&init_mm, pgd, kasan_zero_pud); > + next = pgd_addr_end(addr, end); > + kasan_early_pud_populate(addr, next, pgd); > + } I prefer to use "do ... while" constructs similar to __create_mapping() (or zero_{pgd,pud,pmd}_populate as you are more familiar with them). 
But what I don't get here is that you repopulate the pud page for every pgd (and so on for pmd). You don't need this recursive call all the way to kasan_early_pmd_populate() but just sequential: kasan_early_pte_populate(); kasan_early_pmd_populate(..., pte); kasan_early_pud_populate(..., pmd); kasan_early_pgd_populate(..., pud); (or in reverse order) That's because you don't have enough pte/pmd/pud pages to cover the range (i.e. you need 512 pte pages for a pmd page) but you just reuse the same table page to make all of them pointing to kasan_zero_page. > +void __init kasan_early_init(void) > +{ > + kasan_map_early_shadow(swapper_pg_dir); > + start_kernel(); > +} > + > +static void __init clear_pgds(unsigned long start, > + unsigned long end) > +{ > + /* > + * Remove references to kasan page tables from > + * swapper_pg_dir. pgd_clear() can't be used > + * here because it's nop on 2,3-level pagetable setups > + */ > + for (; start && start < end; start += PGDIR_SIZE) > + set_pgd(pgd_offset_k(start), __pgd(0)); > +} > + > +static void __init cpu_set_ttbr1(unsigned long ttbr1) > +{ > + asm( > + " msr ttbr1_el1, %0\n" > + " isb" > + : > + : "r" (ttbr1)); > +} > + > +void __init kasan_init(void) > +{ > + struct memblock_region *reg; > + > + /* > + * We are going to perform proper setup of shadow memory. > + * At first we should unmap early shadow (clear_pgds() call bellow). > + * However, instrumented code couldn't execute without shadow memory. > + * tmp_page_table used to keep early shadow mapped until full shadow > + * setup will be finished. > + */ > + memcpy(tmp_page_table, swapper_pg_dir, sizeof(tmp_page_table)); > + cpu_set_ttbr1(__pa(tmp_page_table)); > + flush_tlb_all(); > + > + clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END); > + > + kasan_populate_zero_shadow((void *)KASAN_SHADOW_START, > + kasan_mem_to_shadow((void *)MODULES_VADDR)); > + > + for_each_memblock(memory, reg) { > + void *start = (void *)__phys_to_virt(reg->base); > + void *end = (void *)__phys_to_virt(reg->base + reg->size); > + > + if (start >= end) > + break; > + > + /* > + * end + 1 here is intentional. We check several shadow bytes in > + * advance to slightly speed up fastpath. In some rare cases > + * we could cross boundary of mapped shadow, so we just map > + * some more here. > + */ > + vmemmap_populate((unsigned long)kasan_mem_to_shadow(start), > + (unsigned long)kasan_mem_to_shadow(end) + 1, > + pfn_to_nid(virt_to_pfn(start))); Is the only reason for sparsemem vmemmap dependency to reuse this function? Maybe at some point you could factor this out and not require SPARSEMEM_VMEMMAP to be enabled. About the "end + 1", what you actually get is an additional full section (PMD_SIZE) with the 4KB page configuration. Since the shadow is 1/8 of the VA space, do we need a check for memblocks within 8 * 2MB of each other? > + } > + > + memset(kasan_zero_page, 0, PAGE_SIZE); Has anyone written to this page? Actually, what's its use after we enabled the proper KASan shadow mappings? > + cpu_set_ttbr1(__pa(swapper_pg_dir)); > + flush_tlb_all(); > + > + /* At this point kasan is fully initialized. Enable error messages */ > + init_task.kasan_depth = 0; > +}
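(A minimal sketch of the kind of compile-time checks being asked for above --
hypothetical, not part of the posted patch. It assumes the generic
KASAN_SHADOW_SCALE_SHIFT of 3 and the KASAN_SHADOW_OFFSET macro from
include/linux/kasan.h:)

/* Called once from kasan_init(); all checks are compile-time only. */
static void __init kasan_check_layout(void)
{
	/* The Kconfig offset must match the memory map: shadow for the
	 * highest address (~0UL) must land exactly at KASAN_SHADOW_END
	 * (3 == KASAN_SHADOW_SCALE_SHIFT). */
	BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
		     KASAN_SHADOW_END - (1UL << (64 - 3)));

	/* clear_pgds()/kasan_map_early_shadow() assume the shadow region
	 * covers a whole number of pgd entries. */
	BUILD_BUG_ON(KASAN_SHADOW_START & (PGDIR_SIZE - 1));
	BUILD_BUG_ON(KASAN_SHADOW_END & (PGDIR_SIZE - 1));
}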
>> select HAVE_ARCH_KGDB >> select HAVE_ARCH_SECCOMP_FILTER >> select HAVE_ARCH_TRACEHOOK >> @@ -119,6 +120,12 @@ config GENERIC_CSUM >> config GENERIC_CALIBRATE_DELAY >> def_bool y >> >> +config KASAN_SHADOW_OFFSET >> + hex >> + default 0xdfff200000000000 if ARM64_VA_BITS_48 >> + default 0xdffffc8000000000 if ARM64_VA_BITS_42 >> + default 0xdfffff9000000000 if ARM64_VA_BITS_39 >> + > > How were these numbers generated? I can probably guess but we need a > comment in this file and a BUILD_BUG elsewhere (kasan_init.c) if we > change the memory map and they no longer match. > Ok, will do. Probably the simplest way to get this number is: KASAN_SHADOW_END - (1ULL << (64 - 3)) 64 is number of bits in pointer, 3 is KASAN_SHADOW_SCALE_SHIFT, so [KASAN_SHADOW_OFFSET, KASAN_SHADOW_END] covers [0, -1ULL] addresses. >> diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h >> new file mode 100644 >> index 0000000..65ac50d >> --- /dev/null >> +++ b/arch/arm64/include/asm/kasan.h >> @@ -0,0 +1,24 @@ >> +#ifndef __ASM_KASAN_H >> +#define __ASM_KASAN_H >> + >> +#ifndef __ASSEMBLY__ >> + >> +#ifdef CONFIG_KASAN >> + >> +#include <asm/memory.h> >> + >> +/* >> + * KASAN_SHADOW_START: beginning of the kernel virtual addresses. >> + * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses. >> + */ >> +#define KASAN_SHADOW_START (UL(0xffffffffffffffff) << (VA_BITS)) >> +#define KASAN_SHADOW_END (KASAN_SHADOW_START + (1UL << (VA_BITS - 3))) > > Can you define a VA_START in asm/memory.h so that we avoid this long > list of f's here and in pgtable.h? > Sure, will do. > Another BUILD_BUG we need is to ensure that KASAN_SHADOW_START/END > covers an exact number of pgd entries, otherwise the logic in > kasan_init.c can go wrong (it seems to be the case in all VA_BITS > configurations but just in case we forget about this requirement in the > future). > >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h >> index bd5db28..8700f66 100644 >> --- a/arch/arm64/include/asm/pgtable.h >> +++ b/arch/arm64/include/asm/pgtable.h >> @@ -40,7 +40,14 @@ >> * fixed mappings and modules >> */ >> #define VMEMMAP_SIZE ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE) >> + >> +#ifndef CONFIG_KASAN >> #define VMALLOC_START (UL(0xffffffffffffffff) << VA_BITS) > > And here we could just use VA_START. > >> +#else >> +#include <asm/kasan.h> >> +#define VMALLOC_START KASAN_SHADOW_END >> +#endif > > We could add a SZ_64K guard page here (just in case, the KASan shadow > probably never reaches KASAN_SHADOW_END). > Ok. 
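(Putting those two points together, a rough sketch of what the headers could
look like -- hypothetical layout and naming, just following the suggestions
above, not posted code:)

/* asm/memory.h */
#define VA_START		(UL(0xffffffffffffffff) << VA_BITS)

/* asm/kasan.h -- unchanged except for using VA_START */
#define KASAN_SHADOW_START	VA_START
#define KASAN_SHADOW_END	(KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))

/* asm/pgtable.h */
#ifdef CONFIG_KASAN
#include <asm/kasan.h>
/* 64K guard between the shadow and the vmalloc area, as suggested above. */
#define VMALLOC_START		(KASAN_SHADOW_END + SZ_64K)
#else
#define VMALLOC_START		VA_START
#endif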
>> diff --git a/arch/arm64/include/asm/string.h b/arch/arm64/include/asm/string.h >> index 64d2d48..bff522c 100644 >> --- a/arch/arm64/include/asm/string.h >> +++ b/arch/arm64/include/asm/string.h >> @@ -36,17 +36,33 @@ extern __kernel_size_t strnlen(const char *, __kernel_size_t); >> >> #define __HAVE_ARCH_MEMCPY >> extern void *memcpy(void *, const void *, __kernel_size_t); >> +extern void *__memcpy(void *, const void *, __kernel_size_t); >> >> #define __HAVE_ARCH_MEMMOVE >> extern void *memmove(void *, const void *, __kernel_size_t); >> +extern void *__memmove(void *, const void *, __kernel_size_t); >> >> #define __HAVE_ARCH_MEMCHR >> extern void *memchr(const void *, int, __kernel_size_t); >> >> #define __HAVE_ARCH_MEMSET >> extern void *memset(void *, int, __kernel_size_t); >> +extern void *__memset(void *, int, __kernel_size_t); >> >> #define __HAVE_ARCH_MEMCMP >> extern int memcmp(const void *, const void *, size_t); >> >> + >> +#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__) >> + >> +/* >> + * For files that not instrumented (e.g. mm/slub.c) we > > Missing an "are". > >> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h >> index dcd06d1..cfe5ea5 100644 >> --- a/arch/arm64/include/asm/thread_info.h >> +++ b/arch/arm64/include/asm/thread_info.h >> @@ -24,10 +24,18 @@ >> #include <linux/compiler.h> >> >> #ifndef CONFIG_ARM64_64K_PAGES >> +#ifndef CONFIG_KASAN >> #define THREAD_SIZE_ORDER 2 >> +#else >> +#define THREAD_SIZE_ORDER 3 >> +#endif >> #endif >> >> +#ifndef CONFIG_KASAN >> #define THREAD_SIZE 16384 >> +#else >> +#define THREAD_SIZE 32768 >> +#endif >> #define THREAD_START_SP (THREAD_SIZE - 16) > > Have you actually seen it failing with the 16KB THREAD_SIZE? You may run > into other problems with 8 4KB pages per stack. > Actually no, so I guess that we could try with 16K. I've seen it failing on ARM32 with 8K stack (we use some old version of kasan for our ARM kernels), but that's a different story >> #ifndef __ASSEMBLY__ >> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S >> index 19f915e..650b1e8 100644 >> --- a/arch/arm64/kernel/head.S >> +++ b/arch/arm64/kernel/head.S >> @@ -486,6 +486,9 @@ __mmap_switched: >> str_l x21, __fdt_pointer, x5 // Save FDT pointer >> str_l x24, memstart_addr, x6 // Save PHYS_OFFSET >> mov x29, #0 >> +#ifdef CONFIG_KASAN >> + b kasan_early_init >> +#endif >> b start_kernel >> ENDPROC(__mmap_switched) > > I think we still have swapper_pg_dir in x26 at this point, could you > instead do: > > mov x0, x26 > bl kasan_map_early_shadow > > Actually, I don't think kasan_map_early_shadow() even needs this > argument, it uses pgd_offset_k() anyway. > Indeed, just "bl kasan_map_early_shadow" would be enough. >> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c >> new file mode 100644 >> index 0000000..35dbd84 >> --- /dev/null >> +++ b/arch/arm64/mm/kasan_init.c >> @@ -0,0 +1,143 @@ >> +#include <linux/kasan.h> >> +#include <linux/kernel.h> >> +#include <linux/memblock.h> >> +#include <linux/start_kernel.h> >> + >> +#include <asm/page.h> >> +#include <asm/pgalloc.h> >> +#include <asm/pgtable.h> >> +#include <asm/tlbflush.h> >> + >> +unsigned char kasan_zero_page[PAGE_SIZE] __page_aligned_bss; > > So that's needed because the shadow memory is mapped before paging_init > is called and we don't have the zero page set up yet. Please add a > comment. > Actually this page has two purposes, so naming is bad here. 
There was a debate (in kasan for x86_64 thread) about its name, but nobody come up wit a good name. So I'll add following comment: /* * This page serves two purposes: * - It used as early shadow memory. The entire shadow region populated with this * page, before we will be able to setup normal shadow memory. * - Latter it reused it as zero shadow to cover large ranges of memory * that allowed to access, but not handled by kasan (vmalloc/vmemmap ...). */ >> +static pgd_t tmp_page_table[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE); > > This doesn't need a full PAGE_SIZE alignment, just PGD_SIZE. You could > also rename to tmp_pg_dir for consistency with swapper and idmap. > Ok. >> + >> +#if CONFIG_PGTABLE_LEVELS > 3 >> +pud_t kasan_zero_pud[PTRS_PER_PUD] __page_aligned_bss; >> +#endif >> +#if CONFIG_PGTABLE_LEVELS > 2 >> +pmd_t kasan_zero_pmd[PTRS_PER_PMD] __page_aligned_bss; >> +#endif >> +pte_t kasan_zero_pte[PTRS_PER_PTE] __page_aligned_bss; >> + >> +static void __init kasan_early_pmd_populate(unsigned long start, >> + unsigned long end, pud_t *pud) >> +{ >> + unsigned long addr; >> + unsigned long next; >> + pmd_t *pmd; >> + >> + pmd = pmd_offset(pud, start); >> + for (addr = start; addr < end; addr = next, pmd++) { >> + pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte); >> + next = pmd_addr_end(addr, end); >> + } >> +} >> + >> +static void __init kasan_early_pud_populate(unsigned long start, >> + unsigned long end, pgd_t *pgd) >> +{ >> + unsigned long addr; >> + unsigned long next; >> + pud_t *pud; >> + >> + pud = pud_offset(pgd, start); >> + for (addr = start; addr < end; addr = next, pud++) { >> + pud_populate(&init_mm, pud, kasan_zero_pmd); >> + next = pud_addr_end(addr, end); >> + kasan_early_pmd_populate(addr, next, pud); >> + } >> +} >> + >> +static void __init kasan_map_early_shadow(pgd_t *pgdp) >> +{ >> + int i; >> + unsigned long start = KASAN_SHADOW_START; >> + unsigned long end = KASAN_SHADOW_END; >> + unsigned long addr; >> + unsigned long next; >> + pgd_t *pgd; >> + >> + for (i = 0; i < PTRS_PER_PTE; i++) >> + set_pte(&kasan_zero_pte[i], pfn_pte( >> + virt_to_pfn(kasan_zero_page), PAGE_KERNEL)); > > Does this need to be writable? If yes, is there anything writing > non-zero values to it? > Yes. Before kasan_init() this needs to be writable for stack instrumentation. In function's prologue GCC generates some code that writes to shadow marking redzones around stack variables. So the page will contain some garbage, however it doesn't matter for early stage of boot. Kasan will ignore any bad shadow value before kasan_init(). >> + >> + pgd = pgd_offset_k(start); >> + for (addr = start; addr < end; addr = next, pgd++) { >> + pgd_populate(&init_mm, pgd, kasan_zero_pud); >> + next = pgd_addr_end(addr, end); >> + kasan_early_pud_populate(addr, next, pgd); >> + } > > I prefer to use "do ... while" constructs similar to __create_mapping() > (or zero_{pgd,pud,pmd}_populate as you are more familiar with them). > > But what I don't get here is that you repopulate the pud page for every > pgd (and so on for pmd). You don't need this recursive call all the way > to kasan_early_pmd_populate() but just sequential: > This repopulation needed for 3,2 level page tables configurations. E.g. for 3-level page tables we need to call pud_populate(&init_mm, pud, kasan_zero_pmd) for each pud in [KASAN_SHADOW_START, KASAN_SHADOW_END] range, this causes repopopulation for 4-level page tables, since we need to pud_populate() only [KASAN_SHADOW_START, KASAN_SHADOW_START + PGDIR_SIZE] range. 
> kasan_early_pte_populate(); > kasan_early_pmd_populate(..., pte); > kasan_early_pud_populate(..., pmd); > kasan_early_pgd_populate(..., pud); > > (or in reverse order) > Unless, I'm missing something, this will either work only with 4-level page tables. We could do this without repopulation by using CONFIG_PGTABLE_LEVELS ifdefs. > That's because you don't have enough pte/pmd/pud pages to cover the > range (i.e. you need 512 pte pages for a pmd page) but you just reuse > the same table page to make all of them pointing to kasan_zero_page. > >> +void __init kasan_early_init(void) >> +{ >> + kasan_map_early_shadow(swapper_pg_dir); >> + start_kernel(); >> +} >> + >> +static void __init clear_pgds(unsigned long start, >> + unsigned long end) >> +{ >> + /* >> + * Remove references to kasan page tables from >> + * swapper_pg_dir. pgd_clear() can't be used >> + * here because it's nop on 2,3-level pagetable setups >> + */ >> + for (; start && start < end; start += PGDIR_SIZE) >> + set_pgd(pgd_offset_k(start), __pgd(0)); >> +} >> + >> +static void __init cpu_set_ttbr1(unsigned long ttbr1) >> +{ >> + asm( >> + " msr ttbr1_el1, %0\n" >> + " isb" >> + : >> + : "r" (ttbr1)); >> +} >> + >> +void __init kasan_init(void) >> +{ >> + struct memblock_region *reg; >> + >> + /* >> + * We are going to perform proper setup of shadow memory. >> + * At first we should unmap early shadow (clear_pgds() call bellow). >> + * However, instrumented code couldn't execute without shadow memory. >> + * tmp_page_table used to keep early shadow mapped until full shadow >> + * setup will be finished. >> + */ >> + memcpy(tmp_page_table, swapper_pg_dir, sizeof(tmp_page_table)); >> + cpu_set_ttbr1(__pa(tmp_page_table)); >> + flush_tlb_all(); >> + >> + clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END); >> + >> + kasan_populate_zero_shadow((void *)KASAN_SHADOW_START, >> + kasan_mem_to_shadow((void *)MODULES_VADDR)); >> + >> + for_each_memblock(memory, reg) { >> + void *start = (void *)__phys_to_virt(reg->base); >> + void *end = (void *)__phys_to_virt(reg->base + reg->size); >> + >> + if (start >= end) >> + break; >> + >> + /* >> + * end + 1 here is intentional. We check several shadow bytes in >> + * advance to slightly speed up fastpath. In some rare cases >> + * we could cross boundary of mapped shadow, so we just map >> + * some more here. >> + */ >> + vmemmap_populate((unsigned long)kasan_mem_to_shadow(start), >> + (unsigned long)kasan_mem_to_shadow(end) + 1, >> + pfn_to_nid(virt_to_pfn(start))); > > Is the only reason for sparsemem vmemmap dependency to reuse this > function? Maybe at some point you could factor this out and not require > SPARSEMEM_VMEMMAP to be enabled. > Yes, this is the only reason and I'll get rid of this dependency some day. > About the "end + 1", what you actually get is an additional full section > (PMD_SIZE) with the 4KB page configuration. Since the shadow is 1/8 of > the VA space, do we need a check for memblocks within 8 * 2MB of each > other? > Overlaps should be ok. vmemmap_populate will handle it. >> + } >> + >> + memset(kasan_zero_page, 0, PAGE_SIZE); > > Has anyone written to this page? Actually, what's its use after we > enabled the proper KASan shadow mappings? > As said before, in function's prologue GCC generates code that writes to shadow so it writes to this page. After kasan_init() this page used as shadow and covers large portions of memory which are not handled by kasan (vmalloc/vmemmap). We just assume that any access to this memory region is good. 
>> + cpu_set_ttbr1(__pa(swapper_pg_dir)); >> + flush_tlb_all(); >> + >> + /* At this point kasan is fully initialized. Enable error messages */ >> + init_task.kasan_depth = 0; >> +} >
On Fri, Jul 10, 2015 at 08:11:03PM +0300, Andrey Ryabinin wrote: > >> +#if CONFIG_PGTABLE_LEVELS > 3 > >> +pud_t kasan_zero_pud[PTRS_PER_PUD] __page_aligned_bss; > >> +#endif > >> +#if CONFIG_PGTABLE_LEVELS > 2 > >> +pmd_t kasan_zero_pmd[PTRS_PER_PMD] __page_aligned_bss; > >> +#endif > >> +pte_t kasan_zero_pte[PTRS_PER_PTE] __page_aligned_bss; > >> + > >> +static void __init kasan_early_pmd_populate(unsigned long start, > >> + unsigned long end, pud_t *pud) > >> +{ > >> + unsigned long addr; > >> + unsigned long next; > >> + pmd_t *pmd; > >> + > >> + pmd = pmd_offset(pud, start); > >> + for (addr = start; addr < end; addr = next, pmd++) { > >> + pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte); > >> + next = pmd_addr_end(addr, end); > >> + } > >> +} > >> + > >> +static void __init kasan_early_pud_populate(unsigned long start, > >> + unsigned long end, pgd_t *pgd) > >> +{ > >> + unsigned long addr; > >> + unsigned long next; > >> + pud_t *pud; > >> + > >> + pud = pud_offset(pgd, start); > >> + for (addr = start; addr < end; addr = next, pud++) { > >> + pud_populate(&init_mm, pud, kasan_zero_pmd); > >> + next = pud_addr_end(addr, end); > >> + kasan_early_pmd_populate(addr, next, pud); > >> + } > >> +} > >> + > >> +static void __init kasan_map_early_shadow(pgd_t *pgdp) > >> +{ > >> + int i; > >> + unsigned long start = KASAN_SHADOW_START; > >> + unsigned long end = KASAN_SHADOW_END; > >> + unsigned long addr; > >> + unsigned long next; > >> + pgd_t *pgd; > >> + > >> + for (i = 0; i < PTRS_PER_PTE; i++) > >> + set_pte(&kasan_zero_pte[i], pfn_pte( > >> + virt_to_pfn(kasan_zero_page), PAGE_KERNEL)); > >> + > >> + pgd = pgd_offset_k(start); > >> + for (addr = start; addr < end; addr = next, pgd++) { > >> + pgd_populate(&init_mm, pgd, kasan_zero_pud); > >> + next = pgd_addr_end(addr, end); > >> + kasan_early_pud_populate(addr, next, pgd); > >> + } > > > > I prefer to use "do ... while" constructs similar to __create_mapping() > > (or zero_{pgd,pud,pmd}_populate as you are more familiar with them). > > > > But what I don't get here is that you repopulate the pud page for every > > pgd (and so on for pmd). You don't need this recursive call all the way > > to kasan_early_pmd_populate() but just sequential: > > This repopulation needed for 3,2 level page tables configurations. > > E.g. for 3-level page tables we need to call pud_populate(&init_mm, > pud, kasan_zero_pmd) for each pud in [KASAN_SHADOW_START, > KASAN_SHADOW_END] range, this causes repopopulation for 4-level page > tables, since we need to pud_populate() only [KASAN_SHADOW_START, > KASAN_SHADOW_START + PGDIR_SIZE] range. I'm referring to writing the same information multiple times over the same entry. kasan_map_early_shadow() goes over each pgd entry and writes the address of kasan_zero_pud. That's fine so far. However, in the same loop you call kasan_early_pud_populate(). The latter retrieves the pud page via pud_offset(pgd, start) which would always be kasan_zero_pud because that's what you wrote via pgd_populate() in each pgd entry. So for each pgd entry, you keep populating the same kasan_zero_pud page with pointers to kasan_zero_pmd. And so on for the pmd. > > kasan_early_pte_populate(); > > kasan_early_pmd_populate(..., pte); > > kasan_early_pud_populate(..., pmd); > > kasan_early_pgd_populate(..., pud); > > > > (or in reverse order) > > Unless, I'm missing something, this will either work only with 4-level > page tables. We could do this without repopulation by using > CONFIG_PGTABLE_LEVELS ifdefs. 
Or you could move kasan_early_*_populate outside the loop. You already do this for the pte at the beginning of the kasan_map_early_shadow() function (and it probably makes more sense to create a separate kasan_early_pte_populate).
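(A sketch of that separate helper -- hypothetical, it simply factors out the
loop already open-coded at the top of kasan_map_early_shadow() in the patch:)

static void __init kasan_early_pte_populate(void)
{
	int i;

	/* Point every pte at the single zero page; the same pte table is
	 * then shared by every pmd entry of the early shadow mapping. */
	for (i = 0; i < PTRS_PER_PTE; i++)
		set_pte(&kasan_zero_pte[i],
			pfn_pte(virt_to_pfn(kasan_zero_page), PAGE_KERNEL));
}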
On 07/14/2015 06:04 PM, Catalin Marinas wrote: > On Fri, Jul 10, 2015 at 08:11:03PM +0300, Andrey Ryabinin wrote: >>>> +#if CONFIG_PGTABLE_LEVELS > 3 >>>> +pud_t kasan_zero_pud[PTRS_PER_PUD] __page_aligned_bss; >>>> +#endif >>>> +#if CONFIG_PGTABLE_LEVELS > 2 >>>> +pmd_t kasan_zero_pmd[PTRS_PER_PMD] __page_aligned_bss; >>>> +#endif >>>> +pte_t kasan_zero_pte[PTRS_PER_PTE] __page_aligned_bss; >>>> + >>>> +static void __init kasan_early_pmd_populate(unsigned long start, >>>> + unsigned long end, pud_t *pud) >>>> +{ >>>> + unsigned long addr; >>>> + unsigned long next; >>>> + pmd_t *pmd; >>>> + >>>> + pmd = pmd_offset(pud, start); >>>> + for (addr = start; addr < end; addr = next, pmd++) { >>>> + pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte); >>>> + next = pmd_addr_end(addr, end); >>>> + } >>>> +} >>>> + >>>> +static void __init kasan_early_pud_populate(unsigned long start, >>>> + unsigned long end, pgd_t *pgd) >>>> +{ >>>> + unsigned long addr; >>>> + unsigned long next; >>>> + pud_t *pud; >>>> + >>>> + pud = pud_offset(pgd, start); >>>> + for (addr = start; addr < end; addr = next, pud++) { >>>> + pud_populate(&init_mm, pud, kasan_zero_pmd); >>>> + next = pud_addr_end(addr, end); >>>> + kasan_early_pmd_populate(addr, next, pud); >>>> + } >>>> +} >>>> + >>>> +static void __init kasan_map_early_shadow(pgd_t *pgdp) >>>> +{ >>>> + int i; >>>> + unsigned long start = KASAN_SHADOW_START; >>>> + unsigned long end = KASAN_SHADOW_END; >>>> + unsigned long addr; >>>> + unsigned long next; >>>> + pgd_t *pgd; >>>> + >>>> + for (i = 0; i < PTRS_PER_PTE; i++) >>>> + set_pte(&kasan_zero_pte[i], pfn_pte( >>>> + virt_to_pfn(kasan_zero_page), PAGE_KERNEL)); >>>> + >>>> + pgd = pgd_offset_k(start); >>>> + for (addr = start; addr < end; addr = next, pgd++) { >>>> + pgd_populate(&init_mm, pgd, kasan_zero_pud); >>>> + next = pgd_addr_end(addr, end); >>>> + kasan_early_pud_populate(addr, next, pgd); >>>> + } >>> >>> I prefer to use "do ... while" constructs similar to __create_mapping() >>> (or zero_{pgd,pud,pmd}_populate as you are more familiar with them). >>> >>> But what I don't get here is that you repopulate the pud page for every >>> pgd (and so on for pmd). You don't need this recursive call all the way >>> to kasan_early_pmd_populate() but just sequential: >> >> This repopulation needed for 3,2 level page tables configurations. >> >> E.g. for 3-level page tables we need to call pud_populate(&init_mm, >> pud, kasan_zero_pmd) for each pud in [KASAN_SHADOW_START, >> KASAN_SHADOW_END] range, this causes repopopulation for 4-level page >> tables, since we need to pud_populate() only [KASAN_SHADOW_START, >> KASAN_SHADOW_START + PGDIR_SIZE] range. > > I'm referring to writing the same information multiple times over the > same entry. kasan_map_early_shadow() goes over each pgd entry and writes > the address of kasan_zero_pud. That's fine so far. However, in the same > loop you call kasan_early_pud_populate(). The latter retrieves the pud > page via pud_offset(pgd, start) which would always be kasan_zero_pud Not always. E.g. if we have 3-level page tables pud = pgd, pgd_populate() is nop, and pud_populate in fact populates pgd. pud_offset(pgd, start) will return (swapper_pg_dir + pgd_index(start)) and pud_populate() will fill that entry with the address of kasan_zero_pmd. So we need to pud_populate() for each pgd. > because that's what you wrote via pgd_populate() in each pgd entry. So > for each pgd entry, you keep populating the same kasan_zero_pud page > with pointers to kasan_zero_pmd. 
And so on for the pmd. > Yes, I'm perfectly understand that. And this was done intentionally since I don't see the way to make this work for all possible CONFIG_PGTABLE_LEVELS without rewrites or without #ifdefs (and you didn't like them in v1). >>> kasan_early_pte_populate(); >>> kasan_early_pmd_populate(..., pte); >>> kasan_early_pud_populate(..., pmd); >>> kasan_early_pgd_populate(..., pud); >>> >>> (or in reverse order) >> >> Unless, I'm missing something, this will either work only with 4-level >> page tables. We could do this without repopulation by using >> CONFIG_PGTABLE_LEVELS ifdefs. > > Or you could move kasan_early_*_populate outside the loop. You already > do this for the pte at the beginning of the kasan_map_early_shadow() > function (and it probably makes more sense to create a separate > kasan_early_pte_populate). > Ok, let's try to implement that. And for example, let's consider CONFIG_PGTABLE_LEVELS=3 case: * pgd_populate() is nop, so kasan_early_pgd_populate() won't do anything. * pud_populate() in kasan_early_pud_populate() actually will setup pgd entries in swapper_pg_dir, so pud_populate() should be called for the whole shadow range: [KASAN_SHADOW_START, KASAN_SHADOW_END] IOW: kasan_early_pud_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, kasan_zero_pmd); We will need to slightly change kasan_early_pud_populate() implementation for that (Current implementation implies that [start, end) addresses belong to one pgd) void kasan_early_pud_populate(unsigned long start, unsigned long end, pmd_t *pmd) { unsigned long addr; long next; for (addr = start; addr < end; addr = next) { pud_t *pud = pud_offset(pgd_offset_k(addr), addr); pud_populate(&init_mm, pud, pmd); next = pud_addr_end(addr, pgd_addr_end(addr, end)); } } But, wait! In 4-level page tables case this will be the same repopulation as we had before! See? The problem here is that pud_populate() but not pgd_populate() populates pgds (3-level page tables case). So I still don't see the way to avoid repopulation without ifdefs. Did I miss anything?
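(The crux of the disagreement is how level folding behaves. With
CONFIG_PGTABLE_LEVELS=3 the pud level is folded into the pgd by
include/asm-generic/pgtable-nopud.h; a simplified excerpt, roughly:)

/* The "pud" is just the pgd entry itself, and pgd_populate() is a no-op,
 * so on 3-level configurations pud_populate() is what actually writes
 * swapper_pg_dir -- hence the need to visit every pgd entry. */
static inline pud_t *pud_offset(pgd_t *pgd, unsigned long address)
{
	return (pud_t *)pgd;
}
#define pgd_populate(mm, pgd, pud)	do { } while (0)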
On Wed, Jul 15, 2015 at 11:55:20AM +0300, Andrey Ryabinin wrote: > On 07/14/2015 06:04 PM, Catalin Marinas wrote: > > On Fri, Jul 10, 2015 at 08:11:03PM +0300, Andrey Ryabinin wrote: > >>> kasan_early_pte_populate(); > >>> kasan_early_pmd_populate(..., pte); > >>> kasan_early_pud_populate(..., pmd); > >>> kasan_early_pgd_populate(..., pud); > >>> > >>> (or in reverse order) > >> > >> Unless, I'm missing something, this will either work only with 4-level > >> page tables. We could do this without repopulation by using > >> CONFIG_PGTABLE_LEVELS ifdefs. > > > > Or you could move kasan_early_*_populate outside the loop. You already > > do this for the pte at the beginning of the kasan_map_early_shadow() > > function (and it probably makes more sense to create a separate > > kasan_early_pte_populate). > > Ok, let's try to implement that. > And for example, let's consider CONFIG_PGTABLE_LEVELS=3 case: > > * pgd_populate() is nop, so kasan_early_pgd_populate() won't do anything. > > * pud_populate() in kasan_early_pud_populate() actually will setup pgd entries in swapper_pg_dir, > so pud_populate() should be called for the whole shadow range: [KASAN_SHADOW_START, KASAN_SHADOW_END] > IOW: kasan_early_pud_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, kasan_zero_pmd); > > We will need to slightly change kasan_early_pud_populate() implementation for that > (Current implementation implies that [start, end) addresses belong to one pgd) > > void kasan_early_pud_populate(unsigned long start, unsigned long end, pmd_t *pmd) > { > unsigned long addr; > long next; > > for (addr = start; addr < end; addr = next) { > pud_t *pud = pud_offset(pgd_offset_k(addr), addr); > pud_populate(&init_mm, pud, pmd); > next = pud_addr_end(addr, pgd_addr_end(addr, end)); > } > } > > But, wait! In 4-level page tables case this will be the same repopulation as we had before! Ok, so simply taking the call out of the loop won't work unless we conditionally define these functions (wouldn't be too bad since we have some #if CONFIG_PGTABLE_LEVELS already introduced by this patch but it would be nicer without). Anyway, I think we can keep the current iterations but exit early if !pud_none() because it means we already populated it (reworked to match other such patterns throughout the kernel with pgd_populate called from the pud function; and untested): void kasan_early_pmd_populate(pud_t *pud, unsigned long addr, unsigned long end) { pmd_t *pmd; unsigned long next; if (pud_none(*pud)) pud_populate(&init_mm, pud, kasan_zero_pmd); pmd = pmd_offset(pud, addr); do { next = pmd_addr_end(addr, end); kasan_early_pte_populate(pmd, addr, next); } while (pmd++, addr = next, addr != end && pmd_none(*pmd)); } void kasan_early_pud_populate(pgd_t *pgd, unsigned long addr, unsigned long end) { pud_t *pud; unsigned long next; if (pgd_none(*pgd)) pgd_populate(&init_mm, pgd, kasan_zero_pud); pud = pud_offset(pgd, addr); do { next = pud_addr_end(addr, end); kasan_early_pmd_populate(pud, addr, next); } while (pud++, addr = next, addr != end && pud_none(*pud)); } Given that we check pud_none() after the first iterations, it covers the lower levels if needed.
On 07/15/2015 07:37 PM, Catalin Marinas wrote: > Ok, so simply taking the call out of the loop won't work unless we > conditionally define these functions (wouldn't be too bad since we have > some #if CONFIG_PGTABLE_LEVELS already introduced by this patch but it > would be nicer without). > > Anyway, I think we can keep the current iterations but exit early if > !pud_none() because it means we already populated it (reworked to match > other such patterns throughout the kernel with pgd_populate called from > the pud function; and untested): > > void kasan_early_pmd_populate(pud_t *pud, unsigned long addr, unsigned long end) > { > pmd_t *pmd; > unsigned long next; > > if (pud_none(*pud)) > pud_populate(&init_mm, pud, kasan_zero_pmd); > > pmd = pmd_offset(pud, addr); > do { > next = pmd_addr_end(addr, end); > kasan_early_pte_populate(pmd, addr, next); > } while (pmd++, addr = next, addr != end && pmd_none(*pmd)); > } > > void kasan_early_pud_populate(pgd_t *pgd, unsigned long addr, unsigned long end) > { > pud_t *pud; > unsigned long next; > > if (pgd_none(*pgd)) > pgd_populate(&init_mm, pgd, kasan_zero_pud); > > pud = pud_offset(pgd, addr); > do { > next = pud_addr_end(addr, end); > kasan_early_pmd_populate(pud, addr, next); > } while (pud++, addr = next, addr != end && pud_none(*pud)); > } > > Given that we check pud_none() after the first iterations, it covers the > lower levels if needed. > I think this may work, if pud_none(*pud) will be replaced with !pud_val(*pud). We can't use pud_none() because with 2-level page tables it's always false, so we will never go down to pmd level where swapper_pg_dir populated. But you gave me another idea how we could use p?d_none() and avoid rewriting table entries: void kasan_early_pmd_populate(unsigned long start, unsigned long end, pte_t *pte) { unsigned long addr = start; long next; do { pgd_t *pgd = pgd_offset_k(addr); pud_t *pud = pud_offset(pgd, addr); pmd_t *pmd = pmd_offset(pud, addr); if (!pmd_none(*pmd)) break; pmd_populate_kernel(&init_mm, pmd, pte); next = pgd_addr_end(addr, end); next = pud_addr_end(addr, next) next = pmd_addr_end(addr, next); } while(addr = next, addr != end); } void kasan_early_pud_populate(unsigned long start, unsigned long end, pmd_t *pmd) { unsigned long addr = start; long next; do { pgd_t *pgd = pgd_offset_k(addr); pud_t *pud = pud_offset(pgd, addr); if (!pud_none(*pud)) break; pud_populate(&init_mm, pud, pmd); next = pud_addr_end(addr, pgd_addr_end(addr, end)); } while(addr = next, addr != end); } void kasan_early_pgd_populate(...) { //something similar to above .... } static void __init kasan_map_early_shadow(void) { kasan_early_pgd_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, kasan_zero_pud); kasan_early_pud_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, kasan_zero_pmd); kasan_early_pmd_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, kasan_zero_pte); kasan_early_pte_populate(); }
On Thu, Jul 16, 2015 at 06:30:11PM +0300, Andrey Ryabinin wrote: > On 07/15/2015 07:37 PM, Catalin Marinas wrote: > > Ok, so simply taking the call out of the loop won't work unless we > > conditionally define these functions (wouldn't be too bad since we have > > some #if CONFIG_PGTABLE_LEVELS already introduced by this patch but it > > would be nicer without). > > > > Anyway, I think we can keep the current iterations but exit early if > > !pud_none() because it means we already populated it (reworked to match > > other such patterns throughout the kernel with pgd_populate called from > > the pud function; and untested): > > > > void kasan_early_pmd_populate(pud_t *pud, unsigned long addr, unsigned long end) > > { > > pmd_t *pmd; > > unsigned long next; > > > > if (pud_none(*pud)) > > pud_populate(&init_mm, pud, kasan_zero_pmd); > > > > pmd = pmd_offset(pud, addr); > > do { > > next = pmd_addr_end(addr, end); > > kasan_early_pte_populate(pmd, addr, next); > > } while (pmd++, addr = next, addr != end && pmd_none(*pmd)); > > } > > > > void kasan_early_pud_populate(pgd_t *pgd, unsigned long addr, unsigned long end) > > { > > pud_t *pud; > > unsigned long next; > > > > if (pgd_none(*pgd)) > > pgd_populate(&init_mm, pgd, kasan_zero_pud); > > > > pud = pud_offset(pgd, addr); > > do { > > next = pud_addr_end(addr, end); > > kasan_early_pmd_populate(pud, addr, next); > > } while (pud++, addr = next, addr != end && pud_none(*pud)); > > } > > > > Given that we check pud_none() after the first iterations, it covers the > > lower levels if needed. > > I think this may work, if pud_none(*pud) will be replaced with !pud_val(*pud). > We can't use pud_none() because with 2-level page tables it's always false, so > we will never go down to pmd level where swapper_pg_dir populated. The reason I used "do ... while" vs "while" or "for" is so that it gets down to the pmd level. The iteration over pgd is always done in the top loop via pgd_addr_end while the loops for missing levels (nopud, nopmd) are always a single iteration whether we check for pud_none or not. But when the level is present, we avoid looping when !pud_none(). > But you gave me another idea how we could use p?d_none() and avoid rewriting table entries: > > > void kasan_early_pmd_populate(unsigned long start, unsigned long end, pte_t *pte) > { > unsigned long addr = start; > long next; > > do { > pgd_t *pgd = pgd_offset_k(addr); > pud_t *pud = pud_offset(pgd, addr); > pmd_t *pmd = pmd_offset(pud, addr); > > if (!pmd_none(*pmd)) > break; > > pmd_populate_kernel(&init_mm, pmd, pte); > next = pgd_addr_end(addr, end); > next = pud_addr_end(addr, next) > next = pmd_addr_end(addr, next); > } while(addr = next, addr != end); > } > > void kasan_early_pud_populate(unsigned long start, unsigned long end, pmd_t *pmd) > { > unsigned long addr = start; > long next; > > do { > pgd_t *pgd = pgd_offset_k(addr); > pud_t *pud = pud_offset(pgd, addr); > > if (!pud_none(*pud)) > break; > > pud_populate(&init_mm, pud, pmd); > next = pud_addr_end(addr, pgd_addr_end(addr, end)); > } while(addr = next, addr != end); > } > > > void kasan_early_pgd_populate(...) > { > //something similar to above > .... 
> } > > static void __init kasan_map_early_shadow(void) > { > kasan_early_pgd_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, kasan_zero_pud); > kasan_early_pud_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, kasan_zero_pmd); > kasan_early_pmd_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, kasan_zero_pte); > kasan_early_pte_populate(); > } While this would probably work, you still need #ifdef's since kasan_zero_pud is not defined with 2 and 3 levels. That's what I initially thought we should do but since you didn't like the #ifdef's, I came up with another proposal. So, I still prefer my suggestion above unless you find a problem with it.
On 07/16/2015 07:03 PM, Catalin Marinas wrote: > On Thu, Jul 16, 2015 at 06:30:11PM +0300, Andrey Ryabinin wrote: >> >> I think this may work, if pud_none(*pud) will be replaced with !pud_val(*pud). >> We can't use pud_none() because with 2-level page tables it's always false, so >> we will never go down to pmd level where swapper_pg_dir populated. > > The reason I used "do ... while" vs "while" or "for" is so that it gets > down to the pmd level. The iteration over pgd is always done in the top > loop via pgd_addr_end while the loops for missing levels (nopud, nopmd) > are always a single iteration whether we check for pud_none or not. But > when the level is present, we avoid looping when !pud_none(). > Right, dunno what I was thinking. It seems to work. Lightly tested with every possible CONFIG_PGTABLE_LEVELS.
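For completeness, the piece the sketches above leave implicit is the top-level walk that drives these helpers. A minimal, untested sketch in the same style, assuming the helper signatures from Catalin's version (the function name kasan_map_early_shadow comes from the thread; the body here does not):

static void __init kasan_map_early_shadow(void)
{
	unsigned long addr = KASAN_SHADOW_START;
	unsigned long end = KASAN_SHADOW_END;
	unsigned long next;
	pgd_t *pgd = pgd_offset_k(addr);

	do {
		next = pgd_addr_end(addr, end);
		/* pgd_populate()/pud_populate() happen inside the helpers
		 * when the corresponding entry is still empty. */
		kasan_early_pud_populate(pgd, addr, next);
	} while (pgd++, addr = next, addr != end);
}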
On Wed, Jun 17, 2015 at 11:32 PM, Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote: > 2015-06-13 18:25 GMT+03:00 Linus Walleij <linus.walleij@linaro.org>: >> >> On Fri, Jun 12, 2015 at 8:14 PM, Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote: >> > 2015-06-11 16:39 GMT+03:00 Linus Walleij <linus.walleij@linaro.org>: >> >> On Fri, May 15, 2015 at 3:59 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote: >> >> >> >>> This patch adds arch specific code for kernel address sanitizer >> >>> (see Documentation/kasan.txt). >> >> >> >> I looked closer at this again ... I am trying to get KASan up for >> >> ARM(32) with some tricks and hacks. >> >> >> > >> > I have some patches for that. They still need some polishing, but works for me. >> > I could share after I get back to office on Tuesday. >> >> OK! I'd be happy to test! >> > > I've pushed it here : git://github.com/aryabinin/linux.git kasan/arm_v0 > > It far from ready. Firstly I've tried it only in qemu and it works. Hm what QEMU model are you using? I tried to test it with Versatile (the most common) and it kinda boots and hangs: Memory: 106116K/131072K available (3067K kernel code, 166K rwdata, 864K rodata, 3072K init, 130K bss, 24956K reserved, 0K cma-reserved) Virtual kernel memory layout: vector : 0xffff0000 - 0xffff1000 ( 4 kB) fixmap : 0xffc00000 - 0xfff00000 (3072 kB) kasan : 0x9f000000 - 0xbf000000 ( 512 MB) vmalloc : 0xc8800000 - 0xff000000 ( 872 MB) (...) Looks correct, no highmem on this beast. Then I get this. Unable to handle kernel NULL pointer dereference at virtual address 00000130 pgd = c5ea8000 [00000130] *pgd=00000000 Internal error: Oops: 5 [#1] ARM Modules linked in: CPU: 0 PID: 19 Comm: modprobe Not tainted 4.1.0-rc8+ #7 Hardware name: ARM-Versatile (Device Tree Support) task: c5e0b5a0 ti: c5ea0000 task.ti: c5ea0000 PC is at v4wbi_flush_user_tlb_range+0x10/0x4c LR is at move_page_tables+0x218/0x308 pc : [<c001e870>] lr : [<c008f230>] psr: 20000153 sp : c5ea7df0 ip : c5e8c000 fp : ff8ec000 r10: 00bf334f r9 : c5ead3b0 r8 : 9e8ec000 r7 : 00000001 r6 : 00002000 r5 : 9f000000 r4 : 9effe000 r3 : 00000000 r2 : c5e8a000 r1 : 9f000000 r0 : 9effe000 Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment user Control: 00093177 Table: 05ea8000 DAC: 00000015 Process modprobe (pid: 19, stack limit = 0xc5ea0190) Stack: (0xc5ea7df0 to 0xc5ea8000) 7de0: 00000000 9f000000 c5e8a000 00000000 7e00: 00000000 00000000 c5e8a000 9effe000 000000c0 c5e8a000 c68c5700 9e8ec000 7e20: c5e8c034 9effe000 9e8ea000 9f000000 00002000 c00a9384 00002000 00000000 7e40: c5e8c000 00000000 00000000 c5e8a000 00000000 00000000 00000000 00000000 7e60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 7e80: c5e870e0 5e5e8830 c0701808 00000013 c5ea7ec0 c5eac000 c5e870e0 c5e26140 7ea0: c68c5700 00000000 c5ea7ec0 c5eac000 c5e870e0 c6a2d8c0 00000001 c00e4530 7ec0: c68c5700 00000080 c5df2480 9f000000 00000000 c5ea0000 0000000a c00a99ec 7ee0: 00000017 c5ea7ef4 00000000 00000000 00000000 c7f10e60 c0377bc0 c0bf3f99 7f00: 0000000f 00000000 00000000 c00a9c28 9efff000 c06fa2c8 c68c5700 c06f9eb8 7f20: 00000001 fffffff8 c68c5700 00000000 00000001 c00a9770 c5e0b5a0 00000013 7f40: c5e87ee0 c06ef0c8 c6a60000 c00aabd0 c5e8c034 00000000 00000000 c5e0b790 7f60: c06e8190 c5ddcc60 c5d4c300 0000003f ffffffff 00000000 00000000 00000000 7f80: 00000000 c00aad00 00000000 00000000 00000000 c0030f48 c5d4c300 c0030e54 7fa0: 00000000 00000000 00000000 c0014960 00000000 00000000 00000000 00000000 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [<c001e870>] (v4wbi_flush_user_tlb_range) from [<00000000>] ( (null)) Code: e592c020 e3cd3d7f e3c3303f e593300c (e5933130) ---[ end trace b3c4eba35670ba77 ]--- Yours, Linus Walleij
On 07/21/2015 01:36 PM, Linus Walleij wrote: > On Wed, Jun 17, 2015 at 11:32 PM, Andrey Ryabinin > <ryabinin.a.a@gmail.com> wrote: >> 2015-06-13 18:25 GMT+03:00 Linus Walleij <linus.walleij@linaro.org>: >>> >>> On Fri, Jun 12, 2015 at 8:14 PM, Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote: >>>> 2015-06-11 16:39 GMT+03:00 Linus Walleij <linus.walleij@linaro.org>: >>>>> On Fri, May 15, 2015 at 3:59 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote: >>>>> >>>>>> This patch adds arch specific code for kernel address sanitizer >>>>>> (see Documentation/kasan.txt). >>>>> >>>>> I looked closer at this again ... I am trying to get KASan up for >>>>> ARM(32) with some tricks and hacks. >>>>> >>>> >>>> I have some patches for that. They still need some polishing, but works for me. >>>> I could share after I get back to office on Tuesday. >>> >>> OK! I'd be happy to test! >>> >> >> I've pushed it here : git://github.com/aryabinin/linux.git kasan/arm_v0 >> >> It far from ready. Firstly I've tried it only in qemu and it works. > > Hm what QEMU model are you using? I tried to test it with Versatile > (the most common) and it kinda boots and hangs: > I used vexpress. Anyway, it doesn't matter now, since I have an update with a lot of stuff fixed, and it works on hardware. I still need to do some work on it and tomorrow, probably, I will share.
On Tue, Jul 21, 2015 at 4:27 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote: > I used vexpress. Anyway, it doesn't matter now, since I have an update > with a lot of stuff fixed, and it works on hardware. > I still need to do some work on it and tomorrow, probably, I will share. Ah awesome. I have a stash of ARM boards so I can test it on a range of hardware once you feel it's ready. Sorry for pulling stuff out of your hands, people are excited about KASan ARM32 as it turns out. Yours, Linus Walleij
On 07/22/2015 12:27 AM, Linus Walleij wrote: > On Tue, Jul 21, 2015 at 4:27 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote: > >> I used vexpress. Anyway, it doesn't matter now, since I have an update >> with a lot of stuff fixed, and it works on hardware. >> I still need to do some work on it and tomorrow, probably, I will share. > > Ah awesome. I have a stash of ARM boards so I can test it on a > range of hardware once you feel it's ready. > So here is the updated version: git://github.com/aryabinin/linux.git kasan/arm_v0_1 The code is still ugly in some places and it probably has some bugs. Lightly tested on exynos 5410/5420. > Sorry for pulling stuff out of your hands, people are excited about > KASan ARM32 as it turns out. > > Yours, > Linus Walleij >
On Wed, Jul 22, 2015 at 7:54 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote: > So here is updated version: > git://github.com/aryabinin/linux.git kasan/arm_v0_1 > > The code is still ugly in some places and it probably have some bugs. > Lightly tested on exynos 5410/5420. I compiled this for various ARM platforms and test-booted them. I used GCC version 4.9.3 20150113 (prerelease) (Linaro). I get these compilation warnings no matter what I compile; I chose to ignore them: WARNING: vmlinux.o(.meminit.text+0x2c): Section mismatch in reference from the function kasan_pte_populate() to the function .init.text:kasan_alloc_block.constprop.7() The function __meminit kasan_pte_populate() references a function __init kasan_alloc_block.constprop.7(). If kasan_alloc_block.constprop.7 is only used by kasan_pte_populate then annotate kasan_alloc_block.constprop.7 with a matching annotation. WARNING: vmlinux.o(.meminit.text+0x98): Section mismatch in reference from the function kasan_pmd_populate() to the function .init.text:kasan_alloc_block.constprop.7() The function __meminit kasan_pmd_populate() references a function __init kasan_alloc_block.constprop.7(). If kasan_alloc_block.constprop.7 is only used by kasan_pmd_populate then annotate kasan_alloc_block.constprop.7 with a matching annotation. These KASan outline tests run fine: kasan test: kmalloc_oob_right out-of-bounds to right kasan test: kmalloc_oob_left out-of-bounds to left kasan test: kmalloc_node_oob_right kmalloc_node(): out-of-bounds to right kasan test: kmalloc_large_oob_rigth kmalloc large allocation: out-of-bounds to right kasan test: kmalloc_oob_krealloc_more out-of-bounds after krealloc more kasan test: kmalloc_oob_krealloc_less out-of-bounds after krealloc less kasan test: kmalloc_oob_16 kmalloc out-of-bounds for 16-bytes access kasan test: kmalloc_oob_in_memset out-of-bounds in memset kasan test: kmalloc_uaf use-after-free kasan test: kmalloc_uaf_memset use-after-free in memset kasan test: kmalloc_uaf2 use-after-free after another kmalloc kasan test: kmem_cache_oob out-of-bounds in kmem_cache_alloc These two tests seem to not trigger KASan BUG()s, and it seems to be that way on all hardware, so I guess these are the kind of tests that require GCC 5.0: kasan test: kasan_stack_oob out-of-bounds on stack kasan test: kasan_global_oob out-of-bounds global variable Hardware test targets: Ux500 (ARMv7): On Ux500 I get a real slow boot (as expected) and after enabling the tests they produce KASan warnings as expected. MSM APQ8060 (ARMv7): Also a real slow boot and the expected KASan warnings when running the tests. Integrator/AP (ARMv5): This one is mounted with an ARMv5 ARM926 tile. It boots nicely (but takes forever) with KASan and runs all test cases (!)
just like for the other platforms but before reaching userspace this happens: Unable to handle kernel paging request at virtual address 00021144 pgd = c5a74000 [00021144] *pgd=00000000 Internal error: Oops: 5 [#1] PREEMPT ARM Modules linked in: CPU: 0 PID: 24 Comm: modprobe Tainted: G B 4.2.0-rc2-77613-g11c2df68e4a8 #1 Hardware name: ARM Integrator/AP (Device Tree) task: c69f8cc0 ti: c5a68000 task.ti: c5a68000 PC is at v4wbi_flush_user_tlb_range+0x10/0x4c LR is at move_page_tables+0x320/0x46c pc : [<c00182d0>] lr : [<c00ce9d0>] psr: 60000013 sp : c5a6bd78 ip : c5a70000 fp : 9f000000 r10: 9eaab000 r9 : ffaab000 r8 : 0093d34f r7 : c5a782ac r6 : c5a68000 r5 : 9effe000 r4 : c0900044 r3 : 00021000 r2 : c5a4c000 r1 : 9f000000 r0 : 9effe000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 0005317f Table: 05a74000 DAC: 00000015 Process modprobe (pid: 24, stack limit = 0xc5a68190) Stack: (0xc5a6bd78 to 0xc5a6c000) bd60: 9f000000 00002000 bd80: 00000000 9f000000 c5a4c000 00000000 00000000 c5a4c020 c5a68000 c5a68004 bda0: c5a4c000 9effe000 00000000 c5a4c000 c59a6540 c5a4c004 c5a70034 c59a65cc bdc0: 9effe000 00000000 00002000 c00f4558 00002000 00000000 00000000 9f000000 bde0: 9eaa9000 c5a70000 9eaab000 c5a4c04c 00000000 c5a4c000 00000000 00000000 be00: 00000000 00000000 00000000 0f4a4438 b0a5485b 8881364f c0910aec 00000000 be20: 00000000 00000000 c69f8cc0 0f4a4438 c0910aec 00000018 9f000000 c59a68c0 be40: c69f8cc0 c592db40 c59a6540 c592db50 9f000000 c59a68c0 c69f8cc0 00c00000 be60: c5a6be68 c015694c c59a6540 00000080 c5a4aa80 c59a65d8 c5e9bc00 c592db6c be80: c6a0da00 c592db40 00000001 00000000 c59a6540 c00f4f10 00000000 00002000 bea0: 9f000000 c5a4c000 c69f8cc0 c00f5008 00000017 c5a6bec4 00000000 00000008 bec0: 9efff000 c00b5160 c7e937a0 c00a4684 00000000 c7e937a0 9efff000 c09089a8 bee0: c09089b0 c59a6540 c0908590 fffffff8 c59a65d4 c5a68000 c5a68004 c00f4b28 bf00: c59a65c8 00000001 00000018 c69f8cc0 c5a4ab60 00000018 c6a38000 c59a6540 bf20: 00000001 c59a65e8 c59a65cc c00f6fec c090476c 00000000 c59a65c8 0000038f bf40: c5a4c028 c59a65c0 c59a65f0 c5a4c004 00000000 c69f8ec0 00000000 c6a38000 bf60: c5a11a00 c5a4ab60 00000000 00000000 ffffffff 00000000 00000000 c00f71a4 bf80: 00000000 ffffffff 00000000 c00361fc c5a11a00 c0036044 00000000 00000000 bfa0: 00000000 00000000 00000000 c000aa60 00000000 00000000 00000000 00000000 bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00002000 41000001 [<c00182d0>] (v4wbi_flush_user_tlb_range) from [<9f000000>] (0x9f000000) Code: e592c020 e3cd3d7f e3c3303f e593300c (e5933144) ---[ end trace 7fa2f634c630ab6d ]--- Compaq iPAQ H3600 (ARMv4): This is an ARMv4 machine, SA1100-based. It boots nicely with KASan and run all test cases (!) 
just like for the other platforms but before reaching userspace this happens: Unable to handle kernel paging request at virtual address e3e57b43 pgd = c08e8000 [e3e57b43] *pgd=00000000 Internal error: Oops: 408eb003 [#1] PREEMPT ARM Modules linked in: CPU: 0 PID: 944 Comm: modprobe Tainted: G B 4.2.0-rc2-77612-g66bb8b6c242c #1 Hardware name: Compaq iPAQ H3600 task: c14d4880 ti: c08dc000 task.ti: c08dc000 PC is at v4wb_flush_user_tlb_range+0x10/0x48 LR is at change_protection_range+0x3c8/0x464 pc : [<c001cb10>] lr : [<c00c9d58>] psr: 20000013 sp : c08dfd20 ip : c08e6000 fp : 00000000 r10: c08dc000 r9 : c08ea7c8 r8 : 00000001 r7 : 9f000000 r6 : 9f000000 r5 : c08ed800 r4 : c1ff834f r3 : e3e579ff r2 : c08ec000 r1 : 9f000000 r0 : 9effe000 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: c08eb17f Table: c08eb17f DAC: 00000015 Process modprobe (pid: 944, stack limit = 0xc08dc190) Stack: (0xc08dfd20 to 0xc08e0000) fd20: 00000001 9f000000 9effffff c08ec000 00000181 c08dc004 c14bc050 c08ea7c0 fd40: c08dc000 c08e6000 9effe000 c08e6170 c0800d90 00000000 c08ec000 00118177 fd60: 9effe000 00000000 00118173 9f000000 c08e6000 c00c9f50 00000000 00000000 fd80: 00000000 0009effe 00000000 c00e9868 00000000 00000002 c08ef000 00000000 fda0: 00000000 c08ec000 c12d4000 c08ec004 c08e6034 c12d408c 00118177 9effe000 fdc0: 001b8000 c00f0a24 00118177 ffffa1d9 00000000 00100177 00000000 00000000 fde0: 00000000 00000000 00000000 c08ec000 00000000 00000000 00000000 00000000 fe00: 00000000 673bd3e6 80a09842 cd271319 c08087f8 00000000 00000000 00000000 fe20: c14d4880 673bd3e6 c08087f8 000003b0 9f000000 c08f0000 c14d4880 c0ace8c0 fe40: c12d4000 c0ace8d0 9f000000 c08f0000 c14d4880 00c00000 c08dfe60 c015474c fe60: c12d4000 00000080 c08e5d20 c12d4098 c08f0d80 c0ace8ec c14bc000 c0ace8c0 fe80: 00000002 00000000 9f000000 00000000 c12d4000 c00f14dc 00000000 00002000 fea0: 9f000000 c08ec000 c14d4880 c00f15d4 00000017 c08dfec4 00000000 9efff000 fec0: 00000000 c00b0eb4 c1f38f00 c00a3590 00000000 c0801f28 c12d4000 c08dc000 fee0: c08dc004 fffffff8 c0801b10 c12d4094 00000000 c00f118c c12d4098 c12d4088 ff00: 00000001 c12d4001 000003b0 c14d4880 c08e5e00 000003b0 c1494300 c12d4000 ff20: 00000001 c12d40a8 c12d408c c00f35e8 c07fdd8c 00000000 c12d4088 0000038f ff40: c08ec028 c12d4080 c12d40b0 c08ec004 00000000 c14d4a80 00000000 c1494300 ff60: c08db300 c08e5e00 00000000 00000000 ffffffff 00000000 00000000 c00f3778 ff80: 00000000 ffffffff 00000000 c0038f3c c08db300 c0038d40 00000000 00000000 ffa0: 00000000 00000000 00000000 c00108a0 00000000 00000000 00000000 00000000 ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 e85b38dd 6e46c80d [<c001cb10>] (v4wb_flush_user_tlb_range) from [<c08ec000>] (0xc08ec000) Code: e592c020 e3cd3d7f e3c3303f e593300c (e5933144) ---[ end trace 66be5d28dde42c9f ]--- random: nonblocking pool is initialized So it seems v4wb_flush_user_tlb_range+0x10/0x48 is the culprit on these two systems. It is in arch/arm/mm/tlb-v4wb.S. Uninstrumenting these files with ASAN_SANITIZE_tlb-v4wb.o := n does not help. I then tested on the Footbridge, another ARMv4 system, the oldest I have SA110-based. This passes decompression and then you may *think* it hangs. But it doesn't. It just takes a few minutes to boot with KASan instrumentation, then all tests run fine also on this hardware. The crash logs scroll by on the physical console. They keep scrolling forever however, and are still scrolling as I write this. 
I suspect some real memory-usage bugs are causing it, as this exercises some ages-old code that hasn't seen much scrutiny in recent years. Yours, Linus Walleij
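On the two section mismatch warnings quoted at the top of this report: modpost complains because __meminit code references an __init allocator, and the usual fix is to give the callee a compatible annotation, or __ref when the cross-section reference is known to be safe. A purely illustrative sketch; kasan_alloc_block() lives in the arm_v0_1 branch, not mainline, so its real signature and body may differ:

#include <linux/init.h>
#include <linux/bootmem.h>

/* Hypothetical: mark the early allocator __ref (or __meminit) so that
 * references from the __meminit kasan_pte/pmd_populate() helpers are
 * accepted.  The memblock call is a stand-in for whatever the branch does. */
static void * __ref kasan_alloc_block(size_t size)
{
	return memblock_virt_alloc(size, size);	/* page-aligned early allocation */
}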
On Tue, Jul 21, 2015 at 11:27:56PM +0200, Linus Walleij wrote: > On Tue, Jul 21, 2015 at 4:27 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote: > > > I used vexpress. Anyway, it doesn't matter now, since I have an update > > with a lot of stuff fixed, and it works on hardware. > > I still need to do some work on it and tomorrow, probably, I will share. > > Ah awesome. I have a stash of ARM boards so I can test it on a > range of hardware once you feel it's ready. > > Sorry for pulling stuff out of your hands, people are excited about > KASan ARM32 as it turns out. People may be excited about it because it's a new feature, but we really need to consider whether gobbling up 512MB of userspace for it is a good idea or not. There are programs around which like to map large amounts of memory into their process space, and the more we steal from them, the more likely these programs are to fail. The other thing which I'm not happy about is having a 16K allocation per thread - the 16K allocation for the PGD is already prone to invoking the OOM killer after memory fragmentation has set in, we don't need another 16K allocation. We're going from one 16K allocation per process to that _and_ one 16K allocation per thread.
On Mon, Aug 24, 2015 at 3:15 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Tue, Jul 21, 2015 at 11:27:56PM +0200, Linus Walleij wrote: >> On Tue, Jul 21, 2015 at 4:27 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote: >> >> > I used vexpress. Anyway, it doesn't matter now, since I have an update >> > with a lot of stuff fixed, and it works on hardware. >> > I still need to do some work on it and tomorrow, probably, I will share. >> >> Ah awesome. I have a stash of ARM boards so I can test it on a >> range of hardware once you feel it's ready. >> >> Sorry for pulling stuff out of your hands, people are excited about >> KASan ARM32 as it turns out. > > People may be excited about it because it's a new feature, but we really > need to consider whether gobbling up 512MB of userspace for it is a good > idea or not. There are programs around which like to map large amounts > of memory into their process space, and the more we steal from them, the > more likely these programs are to fail. Over the last weeks I looked at some different approaches to this while playing around with KASan. It seems that since KASan was developed on 64-bit systems, this was not much of an issue there, as they could take their shadow memory from the vmalloc space. I think it is possible to actually just steal as much memory as is needed to cover the kernel, and not 1/8 of the entire addressable 32-bit space. So instead of covering everything from 0x0-0xffffffff, just MODULES_VADDR through 0xffffffff should be enough. So if that is 0xbf000000-0xffffffff in most cases, i.e. 0x41000000 bytes, then 1/8 of that, 0x08200000 bytes or about 130MB, should be enough. (Andrey needs to say if this is possible.) That will probably miss some use cases I'm not familiar with, where the kernel is actually executing something below 0xbf000000... I looked at taking memory from vmalloc instead, but ran into problems since this is subject to the highmem split and KASan needs to have its address offset at compile time. On Ux500 I managed to remove all the static maps and steal memory from the top of the vmalloc area instead of the beginning, but that is probably not generally feasible. I suspect you have better ideas than what I can come up with though. Yours, Linus Walleij
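To make the arithmetic concrete: KASan maps every 8 bytes of address space onto 1 shadow byte, so covering only the kernel part of the address space shrinks the reservation proportionally. A small self-check, assuming a 3G/1G split with shadow needed only from MODULES_VADDR (0xbf000000) upwards; the macro names are illustrative, not from the patches:

#define KASAN_SHADOW_SCALE_SHIFT	3		/* 1 shadow byte per 8 bytes */
#define SHADOWED_START			0xbf000000UL	/* MODULES_VADDR on a 3G/1G split */
#define SHADOWED_END			0xffffffffUL

/* (0xffffffff - 0xbf000000 + 1) >> 3 == 0x41000000 >> 3 == 0x08200000,
 * i.e. 130MB of virtual space for the shadow instead of 512MB. */
#define SHADOW_SIZE	((SHADOWED_END - SHADOWED_START + 1) >> KASAN_SHADOW_SCALE_SHIFT)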
2015-08-24 16:45 GMT+03:00 Linus Walleij <linus.walleij@linaro.org>: > On Mon, Aug 24, 2015 at 3:15 PM, Russell King - ARM Linux > <linux@arm.linux.org.uk> wrote: >> On Tue, Jul 21, 2015 at 11:27:56PM +0200, Linus Walleij wrote: >>> On Tue, Jul 21, 2015 at 4:27 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote: >>> >>> > I used vexpress. Anyway, it doesn't matter now, since I have an update >>> > with a lot of stuff fixed, and it works on hardware. >>> > I still need to do some work on it and tomorrow, probably, I will share. >>> >>> Ah awesome. I have a stash of ARM boards so I can test it on a >>> range of hardware once you feel it's ready. >>> >>> Sorry for pulling stuff out of your hands, people are excited about >>> KASan ARM32 as it turns out. >> >> People may be excited about it because it's a new feature, but we really >> need to consider whether gobbling up 512MB of userspace for it is a good >> idea or not. There are programs around which like to map large amounts >> of memory into their process space, and the more we steal from them, the >> more likely these programs are to fail. > > I looked at some different approaches over the last weeks for this > when playing around with KASan. > > It seems since KASan was developed on 64bit systems, this was > not much of an issue for them as they could take their shadow > memory from the vmalloc space. > > I think it is possible to actually just steal as much memory as is > needed to cover the kernel, and not 1/8 of the entire addressable > 32bit space. So instead of covering all from 0x0-0xffffffff > at least just MODULES_VADDR thru 0xffffffff should be enough. > So if that is 0xbf000000-0xffffffff in most cases, 0x41000000 > bytes, then 1/8 of that, 0x8200000, 130MB should be enough. > (Andrey need to say if this is possible.) > Yes, ~130Mb (3G/1G split) should work. 512Mb shadow is optional. The only advantage of 512Mb shadow is better handling of user memory access bugs (access to user memory without the copy_from_user/copy_to_user/strlen_user etc. API). With a 512Mb shadow we could simply not map anything in the shadow for user addresses, so such a bug would be guaranteed to crash the kernel. With 130Mb, the behavior will depend on the memory layout of the current process. So, I think it's fine to keep shadow only for kernel addresses. > That will probably miss some usecases I'm not familiar with, where > the kernel is actually executing something below 0xbf000000... > > I looked at taking memory from vmalloc instead, but ran into > problems since this is subject to the highmem split and KASan > need to have it's address offset at compile time. On > Ux500 I managed to remove all the static maps and steal memory > from the top of the vmalloc area instead of the beginning, but > that is probably not generally feasible. > > I suspect you have better ideas than what I can come up > with though. > > Yours, > Linus Walleij
On 24/08/15 15:15, Andrey Ryabinin wrote: > 2015-08-24 16:45 GMT+03:00 Linus Walleij <linus.walleij@linaro.org>: >> On Mon, Aug 24, 2015 at 3:15 PM, Russell King - ARM Linux >> <linux@arm.linux.org.uk> wrote: >>> On Tue, Jul 21, 2015 at 11:27:56PM +0200, Linus Walleij wrote: >>>> On Tue, Jul 21, 2015 at 4:27 PM, Andrey Ryabinin <a.ryabinin@samsung.com> wrote: >>>> >>>>> I used vexpress. Anyway, it doesn't matter now, since I have an update >>>>> with a lot of stuff fixed, and it works on hardware. >>>>> I still need to do some work on it and tomorrow, probably, I will share. >>>> >>>> Ah awesome. I have a stash of ARM boards so I can test it on a >>>> range of hardware once you feel it's ready. >>>> >>>> Sorry for pulling stuff out of your hands, people are excited about >>>> KASan ARM32 as it turns out. >>> >>> People may be excited about it because it's a new feature, but we really >>> need to consider whether gobbling up 512MB of userspace for it is a good >>> idea or not. There are programs around which like to map large amounts >>> of memory into their process space, and the more we steal from them, the >>> more likely these programs are to fail. >> >> I looked at some different approaches over the last weeks for this >> when playing around with KASan. >> >> It seems since KASan was developed on 64bit systems, this was >> not much of an issue for them as they could take their shadow >> memory from the vmalloc space. >> >> I think it is possible to actually just steal as much memory as is >> needed to cover the kernel, and not 1/8 of the entire addressable >> 32bit space. So instead of covering all from 0x0-0xffffffff >> at least just MODULES_VADDR thru 0xffffffff should be enough. >> So if that is 0xbf000000-0xffffffff in most cases, 0x41000000 >> bytes, then 1/8 of that, 0x8200000, 130MB should be enough. >> (Andrey need to say if this is possible.) >> > > Yes, ~130Mb (3G/1G split) should work. 512Mb shadow is optional. > The only advantage of 512Mb shadow is better handling of user memory > accesses bugs > (access to user memory without copy_from_user/copy_to_user/strlen_user etc API). > In case of 512Mb shadow we could to not map anything in shadow for > user addresses, so such bug will > guarantee to crash the kernel. > In case of 130Mb, the behavior will depend on memory layout of the > current process. > So, I think it's fine to keep shadow only for kernel addresses. Another option would be having "sparse" shadow memory based on page extension. I did play with that some time ago based on ideas from original v1 KASan support for x86/arm - it is how 614be38 "irqchip: gic-v3: Fix out of bounds access to cpu_logical_map" was caught. It doesn't require any VA reservations, only some contiguous memory for the page_ext itself, which serves as indirection level for the 0-order shadow pages. In theory such design can be reused by others 32-bit arches and, I think, nommu too. Additionally, the shadow pages might be movable with help of driver-page migration patch series [1]. The cost is obvious - performance drop, although I didn't bother measuring it. [1] https://lwn.net/Articles/650917/ Cheers Vladimir > >> That will probably miss some usecases I'm not familiar with, where >> the kernel is actually executing something below 0xbf000000... >> >> I looked at taking memory from vmalloc instead, but ran into >> problems since this is subject to the highmem split and KASan >> need to have it's address offset at compile time. 
On >> Ux500 I managed to remove all the static maps and steal memory >> from the top of the vmalloc area instead of the beginning, but >> that is probably not generally feasible. >> >> I suspect you have better ideas than what I can come up >> with though. >> >> Yours, >> Linus Walleij >
2015-08-24 18:44 GMT+03:00 Vladimir Murzin <vladimir.murzin@arm.com>: > > Another option would be having "sparse" shadow memory based on page > extension. I did play with that some time ago based on ideas from > original v1 KASan support for x86/arm - it is how 614be38 "irqchip: > gic-v3: Fix out of bounds access to cpu_logical_map" was caught. > It doesn't require any VA reservations, only some contiguous memory for > the page_ext itself, which serves as indirection level for the 0-order > shadow pages. We won't be able to use inline instrumentation (I could live with that), and most importantly, we won't be able to use stack instrumentation. GCC needs to know shadow address for inline and/or stack instrumentation to generate correct code. > In theory such design can be reused by others 32-bit arches and, I > think, nommu too. Additionally, the shadow pages might be movable with > help of driver-page migration patch series [1]. > The cost is obvious - performance drop, although I didn't bother > measuring it. > > [1] https://lwn.net/Articles/650917/ > > Cheers > Vladimir >
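For background on why the compiler needs the shadow base as a constant: with inline instrumentation GCC emits the shadow check directly at every access, folding the offset into the generated code, and for stack instrumentation it also lays out redzones around locals and poisons them by writing to the shadow itself. For a 1-byte store the emitted check is roughly equivalent to the following hand-written illustration (not actual compiler output; the report call is a stand-in for the out-of-line reporting path):

static __always_inline void check_store_1(unsigned long addr)
{
	s8 shadow = *(s8 *)((addr >> 3) + KASAN_SHADOW_OFFSET);

	/* Shadow byte 0: whole 8-byte granule accessible; 1..7: only the
	 * first N bytes accessible; negative values: poisoned (redzone/freed). */
	if (unlikely(shadow != 0 && (s8)(addr & 7) >= shadow))
		kasan_report(addr, 1, true, _RET_IP_);	/* stand-in for the report helper */
}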
On 24/08/15 17:00, Andrey Ryabinin wrote: > 2015-08-24 18:44 GMT+03:00 Vladimir Murzin <vladimir.murzin@arm.com>: >> >> Another option would be having "sparse" shadow memory based on page >> extension. I did play with that some time ago based on ideas from >> original v1 KASan support for x86/arm - it is how 614be38 "irqchip: >> gic-v3: Fix out of bounds access to cpu_logical_map" was caught. >> It doesn't require any VA reservations, only some contiguous memory for >> the page_ext itself, which serves as indirection level for the 0-order >> shadow pages. > > We won't be able to use inline instrumentation (I could live with that), > and most importantly, we won't be able to use stack instrumentation. > GCC needs to know shadow address for inline and/or stack instrumentation > to generate correct code. It's definitely a trade-off ;) Just for my understanding, is that stack instrumentation controlled via -asan-stack? Thanks Vladimir > >> In theory such design can be reused by others 32-bit arches and, I >> think, nommu too. Additionally, the shadow pages might be movable with >> help of driver-page migration patch series [1]. >> The cost is obvious - performance drop, although I didn't bother >> measuring it. >> >> [1] https://lwn.net/Articles/650917/ >> >> Cheers >> Vladimir >> >
2015-08-24 19:16 GMT+03:00 Vladimir Murzin <vladimir.murzin@arm.com>: > On 24/08/15 17:00, Andrey Ryabinin wrote: >> 2015-08-24 18:44 GMT+03:00 Vladimir Murzin <vladimir.murzin@arm.com>: >>> >>> Another option would be having "sparse" shadow memory based on page >>> extension. I did play with that some time ago based on ideas from >>> original v1 KASan support for x86/arm - it is how 614be38 "irqchip: >>> gic-v3: Fix out of bounds access to cpu_logical_map" was caught. >>> It doesn't require any VA reservations, only some contiguous memory for >>> the page_ext itself, which serves as indirection level for the 0-order >>> shadow pages. >> >> We won't be able to use inline instrumentation (I could live with that), >> and most importantly, we won't be able to use stack instrumentation. >> GCC needs to know shadow address for inline and/or stack instrumentation >> to generate correct code. > > It's definitely a trade-off ;) > > Just for my understanding does that stack instrumentation is controlled > via -asan-stack? > Yup.
On Mon, Aug 24, 2015 at 05:15:22PM +0300, Andrey Ryabinin wrote: > Yes, ~130Mb (3G/1G split) should work. 512Mb shadow is optional. > The only advantage of 512Mb shadow is better handling of user memory > accesses bugs > (access to user memory without copy_from_user/copy_to_user/strlen_user etc API). No need for that to be handled by KASan. I have patches in linux-next, now acked by Will, which prevent the kernel accessing userspace with zero memory footprint. No need for remapping: we have a way to quickly turn off access to userspace mapped pages on non-LPAE 32-bit CPUs. (LPAE is not supported yet - Catalin will be working on that using the hooks I'm providing once he returns.) This isn't a debugging thing, it's a security hardening thing. Some use-after-free bugs are potentially exploitable from userspace. See the recent Black Hat conference paper.
On Mon, Aug 24, 2015 at 06:47:36PM +0100, Russell King - ARM Linux wrote: > On Mon, Aug 24, 2015 at 05:15:22PM +0300, Andrey Ryabinin wrote: > > Yes, ~130Mb (3G/1G split) should work. 512Mb shadow is optional. > > The only advantage of 512Mb shadow is better handling of user memory > > accesses bugs > > (access to user memory without copy_from_user/copy_to_user/strlen_user etc API). > > No need for that to be handled by KASan. I have patches in linux-next, > now acked by Will, which prevent the kernel accessing userspace with > zero memory footprint. No need for remapping: we have a way to quickly > turn off access to userspace mapped pages on non-LPAE 32-bit CPUs. > (LPAE is not supported yet - Catalin will be working on that using the > hooks I'm providing once he returns.) Hey, I only acked the "Efficiency cleanups" series so far! The PAN emulation is still on my list. Will
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7796af4..4cc73cc 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -44,6 +44,7 @@ config ARM64 select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_BITREVERSE select HAVE_ARCH_JUMP_LABEL + select HAVE_ARCH_KASAN if SPARSEMEM_VMEMMAP select HAVE_ARCH_KGDB select HAVE_ARCH_SECCOMP_FILTER select HAVE_ARCH_TRACEHOOK @@ -119,6 +120,12 @@ config GENERIC_CSUM config GENERIC_CALIBRATE_DELAY def_bool y +config KASAN_SHADOW_OFFSET + hex + default 0xdfff200000000000 if ARM64_VA_BITS_48 + default 0xdffffc8000000000 if ARM64_VA_BITS_42 + default 0xdfffff9000000000 if ARM64_VA_BITS_39 + config ZONE_DMA def_bool y diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h new file mode 100644 index 0000000..65ac50d --- /dev/null +++ b/arch/arm64/include/asm/kasan.h @@ -0,0 +1,24 @@ +#ifndef __ASM_KASAN_H +#define __ASM_KASAN_H + +#ifndef __ASSEMBLY__ + +#ifdef CONFIG_KASAN + +#include <asm/memory.h> + +/* + * KASAN_SHADOW_START: beginning of the kernel virtual addresses. + * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses. + */ +#define KASAN_SHADOW_START (UL(0xffffffffffffffff) << (VA_BITS)) +#define KASAN_SHADOW_END (KASAN_SHADOW_START + (1UL << (VA_BITS - 3))) + +void kasan_init(void); + +#else +static inline void kasan_init(void) { } +#endif + +#endif +#endif diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index bd5db28..8700f66 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -40,7 +40,14 @@ * fixed mappings and modules */ #define VMEMMAP_SIZE ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE) + +#ifndef CONFIG_KASAN #define VMALLOC_START (UL(0xffffffffffffffff) << VA_BITS) +#else +#include <asm/kasan.h> +#define VMALLOC_START KASAN_SHADOW_END +#endif + #define VMALLOC_END (PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K) #define vmemmap ((struct page *)(VMALLOC_END + SZ_64K)) diff --git a/arch/arm64/include/asm/string.h b/arch/arm64/include/asm/string.h index 64d2d48..bff522c 100644 --- a/arch/arm64/include/asm/string.h +++ b/arch/arm64/include/asm/string.h @@ -36,17 +36,33 @@ extern __kernel_size_t strnlen(const char *, __kernel_size_t); #define __HAVE_ARCH_MEMCPY extern void *memcpy(void *, const void *, __kernel_size_t); +extern void *__memcpy(void *, const void *, __kernel_size_t); #define __HAVE_ARCH_MEMMOVE extern void *memmove(void *, const void *, __kernel_size_t); +extern void *__memmove(void *, const void *, __kernel_size_t); #define __HAVE_ARCH_MEMCHR extern void *memchr(const void *, int, __kernel_size_t); #define __HAVE_ARCH_MEMSET extern void *memset(void *, int, __kernel_size_t); +extern void *__memset(void *, int, __kernel_size_t); #define __HAVE_ARCH_MEMCMP extern int memcmp(const void *, const void *, size_t); + +#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__) + +/* + * For files that not instrumented (e.g. mm/slub.c) we + * should use not instrumented version of mem* functions. 
+ */ + +#define memcpy(dst, src, len) __memcpy(dst, src, len) +#define memmove(dst, src, len) __memmove(dst, src, len) +#define memset(s, c, n) __memset(s, c, n) +#endif + #endif diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index dcd06d1..cfe5ea5 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -24,10 +24,18 @@ #include <linux/compiler.h> #ifndef CONFIG_ARM64_64K_PAGES +#ifndef CONFIG_KASAN #define THREAD_SIZE_ORDER 2 +#else +#define THREAD_SIZE_ORDER 3 +#endif #endif +#ifndef CONFIG_KASAN #define THREAD_SIZE 16384 +#else +#define THREAD_SIZE 32768 +#endif #define THREAD_START_SP (THREAD_SIZE - 16) #ifndef __ASSEMBLY__ diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 19f915e..650b1e8 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -486,6 +486,9 @@ __mmap_switched: str_l x21, __fdt_pointer, x5 // Save FDT pointer str_l x24, memstart_addr, x6 // Save PHYS_OFFSET mov x29, #0 +#ifdef CONFIG_KASAN + b kasan_early_init +#endif b start_kernel ENDPROC(__mmap_switched) diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c index 67bf410..7d90c0f 100644 --- a/arch/arm64/kernel/module.c +++ b/arch/arm64/kernel/module.c @@ -21,6 +21,7 @@ #include <linux/bitops.h> #include <linux/elf.h> #include <linux/gfp.h> +#include <linux/kasan.h> #include <linux/kernel.h> #include <linux/mm.h> #include <linux/moduleloader.h> @@ -34,9 +35,18 @@ void *module_alloc(unsigned long size) { - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, - GFP_KERNEL, PAGE_KERNEL_EXEC, 0, - NUMA_NO_NODE, __builtin_return_address(0)); + void *p; + + p = __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR, MODULES_END, + GFP_KERNEL, PAGE_KERNEL_EXEC, 0, + NUMA_NO_NODE, __builtin_return_address(0)); + + if (p && (kasan_module_alloc(p, size) < 0)) { + vfree(p); + return NULL; + } + + return p; } enum aarch64_reloc_op { diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 7475313..963b53a 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -54,6 +54,7 @@ #include <asm/elf.h> #include <asm/cpufeature.h> #include <asm/cpu_ops.h> +#include <asm/kasan.h> #include <asm/sections.h> #include <asm/setup.h> #include <asm/smp_plat.h> @@ -401,6 +402,7 @@ void __init setup_arch(char **cmdline_p) acpi_boot_table_init(); paging_init(); + kasan_init(); request_standard_resources(); early_ioremap_reset(); diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S index 8a9a96d..845e40a 100644 --- a/arch/arm64/lib/memcpy.S +++ b/arch/arm64/lib/memcpy.S @@ -56,6 +56,8 @@ C_h .req x12 D_l .req x13 D_h .req x14 +.weak memcpy +ENTRY(__memcpy) ENTRY(memcpy) mov dst, dstin cmp count, #16 @@ -199,3 +201,4 @@ ENTRY(memcpy) b.ne .Ltail63 ret ENDPROC(memcpy) +ENDPROC(__memcpy) diff --git a/arch/arm64/lib/memmove.S b/arch/arm64/lib/memmove.S index 57b19ea..48074d2 100644 --- a/arch/arm64/lib/memmove.S +++ b/arch/arm64/lib/memmove.S @@ -57,12 +57,14 @@ C_h .req x12 D_l .req x13 D_h .req x14 +.weak memmove +ENTRY(__memmove) ENTRY(memmove) cmp dstin, src - b.lo memcpy + b.lo __memcpy add tmp1, src, count cmp dstin, tmp1 - b.hs memcpy /* No overlap. */ + b.hs __memcpy /* No overlap. 
*/ add dst, dstin, count add src, src, count @@ -195,3 +197,4 @@ ENTRY(memmove) b.ne .Ltail63 ret ENDPROC(memmove) +ENDPROC(__memmove) diff --git a/arch/arm64/lib/memset.S b/arch/arm64/lib/memset.S index 7c72dfd..4ab2594 100644 --- a/arch/arm64/lib/memset.S +++ b/arch/arm64/lib/memset.S @@ -54,6 +54,8 @@ dst .req x8 tmp3w .req w9 tmp3 .req x9 +.weak memset +ENTRY(__memset) ENTRY(memset) mov dst, dstin /* Preserve return value. */ and A_lw, val, #255 @@ -214,3 +216,4 @@ ENTRY(memset) b.ne .Ltail_maybe_long ret ENDPROC(memset) +ENDPROC(__memset) diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile index 773d37a..e17703c 100644 --- a/arch/arm64/mm/Makefile +++ b/arch/arm64/mm/Makefile @@ -4,3 +4,6 @@ obj-y := dma-mapping.o extable.o fault.o init.o \ context.o proc.o pageattr.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o obj-$(CONFIG_ARM64_PTDUMP) += dump.o + +KASAN_SANITIZE_kasan_init.o := n +obj-$(CONFIG_KASAN) += kasan_init.o diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c new file mode 100644 index 0000000..35dbd84 --- /dev/null +++ b/arch/arm64/mm/kasan_init.c @@ -0,0 +1,143 @@ +#include <linux/kasan.h> +#include <linux/kernel.h> +#include <linux/memblock.h> +#include <linux/start_kernel.h> + +#include <asm/page.h> +#include <asm/pgalloc.h> +#include <asm/pgtable.h> +#include <asm/tlbflush.h> + +unsigned char kasan_zero_page[PAGE_SIZE] __page_aligned_bss; +static pgd_t tmp_page_table[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE); + +#if CONFIG_PGTABLE_LEVELS > 3 +pud_t kasan_zero_pud[PTRS_PER_PUD] __page_aligned_bss; +#endif +#if CONFIG_PGTABLE_LEVELS > 2 +pmd_t kasan_zero_pmd[PTRS_PER_PMD] __page_aligned_bss; +#endif +pte_t kasan_zero_pte[PTRS_PER_PTE] __page_aligned_bss; + +static void __init kasan_early_pmd_populate(unsigned long start, + unsigned long end, pud_t *pud) +{ + unsigned long addr; + unsigned long next; + pmd_t *pmd; + + pmd = pmd_offset(pud, start); + for (addr = start; addr < end; addr = next, pmd++) { + pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte); + next = pmd_addr_end(addr, end); + } +} + +static void __init kasan_early_pud_populate(unsigned long start, + unsigned long end, pgd_t *pgd) +{ + unsigned long addr; + unsigned long next; + pud_t *pud; + + pud = pud_offset(pgd, start); + for (addr = start; addr < end; addr = next, pud++) { + pud_populate(&init_mm, pud, kasan_zero_pmd); + next = pud_addr_end(addr, end); + kasan_early_pmd_populate(addr, next, pud); + } +} + +static void __init kasan_map_early_shadow(pgd_t *pgdp) +{ + int i; + unsigned long start = KASAN_SHADOW_START; + unsigned long end = KASAN_SHADOW_END; + unsigned long addr; + unsigned long next; + pgd_t *pgd; + + for (i = 0; i < PTRS_PER_PTE; i++) + set_pte(&kasan_zero_pte[i], pfn_pte( + virt_to_pfn(kasan_zero_page), PAGE_KERNEL)); + + pgd = pgd_offset_k(start); + for (addr = start; addr < end; addr = next, pgd++) { + pgd_populate(&init_mm, pgd, kasan_zero_pud); + next = pgd_addr_end(addr, end); + kasan_early_pud_populate(addr, next, pgd); + } +} + +void __init kasan_early_init(void) +{ + kasan_map_early_shadow(swapper_pg_dir); + start_kernel(); +} + +static void __init clear_pgds(unsigned long start, + unsigned long end) +{ + /* + * Remove references to kasan page tables from + * swapper_pg_dir. 
pgd_clear() can't be used + * here because it's nop on 2,3-level pagetable setups + */ + for (; start && start < end; start += PGDIR_SIZE) + set_pgd(pgd_offset_k(start), __pgd(0)); +} + +static void __init cpu_set_ttbr1(unsigned long ttbr1) +{ + asm( + " msr ttbr1_el1, %0\n" + " isb" + : + : "r" (ttbr1)); +} + +void __init kasan_init(void) +{ + struct memblock_region *reg; + + /* + * We are going to perform proper setup of shadow memory. + * At first we should unmap early shadow (clear_pgds() call bellow). + * However, instrumented code couldn't execute without shadow memory. + * tmp_page_table used to keep early shadow mapped until full shadow + * setup will be finished. + */ + memcpy(tmp_page_table, swapper_pg_dir, sizeof(tmp_page_table)); + cpu_set_ttbr1(__pa(tmp_page_table)); + flush_tlb_all(); + + clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END); + + kasan_populate_zero_shadow((void *)KASAN_SHADOW_START, + kasan_mem_to_shadow((void *)MODULES_VADDR)); + + for_each_memblock(memory, reg) { + void *start = (void *)__phys_to_virt(reg->base); + void *end = (void *)__phys_to_virt(reg->base + reg->size); + + if (start >= end) + break; + + /* + * end + 1 here is intentional. We check several shadow bytes in + * advance to slightly speed up fastpath. In some rare cases + * we could cross boundary of mapped shadow, so we just map + * some more here. + */ + vmemmap_populate((unsigned long)kasan_mem_to_shadow(start), + (unsigned long)kasan_mem_to_shadow(end) + 1, + pfn_to_nid(virt_to_pfn(start))); + } + + memset(kasan_zero_page, 0, PAGE_SIZE); + cpu_set_ttbr1(__pa(swapper_pg_dir)); + flush_tlb_all(); + + /* At this point kasan is fully initialized. Enable error messages */ + init_task.kasan_depth = 0; +}
This patch adds arch specific code for kernel address sanitizer (see Documentation/kasan.txt). 1/8 of the kernel addresses is reserved for shadow memory. There was no hole big enough for this, so the virtual addresses for the shadow were stolen from the vmalloc area. At early boot stage the whole shadow region is populated with just one physical page (kasan_zero_page). Later, this page is reused as a readonly zero shadow for some memory that KASan currently doesn't track (vmalloc). After mapping the physical memory, pages for shadow memory are allocated and mapped. KASan's stack instrumentation significantly increases stack consumption, so CONFIG_KASAN doubles THREAD_SIZE. Functions like memset/memmove/memcpy do a lot of memory accesses. If a bad pointer is passed to one of these functions it is important to catch this. Compiler instrumentation cannot do this since these functions are written in assembly. KASan replaces memory functions with manually instrumented variants. The original functions are declared as weak symbols so that the strong definitions in mm/kasan/kasan.c can replace them. The original functions have aliases with a '__' prefix in the name, so we can call the non-instrumented variant if needed. Some files are built without KASan instrumentation (e.g. mm/slub.c). For such files the original mem* functions are replaced (via #define) with the prefixed variants to disable memory access checks. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> --- arch/arm64/Kconfig | 7 ++ arch/arm64/include/asm/kasan.h | 24 ++++++ arch/arm64/include/asm/pgtable.h | 7 ++ arch/arm64/include/asm/string.h | 16 ++++ arch/arm64/include/asm/thread_info.h | 8 ++ arch/arm64/kernel/head.S | 3 + arch/arm64/kernel/module.c | 16 +++- arch/arm64/kernel/setup.c | 2 + arch/arm64/lib/memcpy.S | 3 + arch/arm64/lib/memmove.S | 7 +- arch/arm64/lib/memset.S | 3 + arch/arm64/mm/Makefile | 3 + arch/arm64/mm/kasan_init.c | 143 +++++++++++++++++++++++++++++++++++ 13 files changed, 237 insertions(+), 5 deletions(-) create mode 100644 arch/arm64/include/asm/kasan.h create mode 100644 arch/arm64/mm/kasan_init.c
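A closing note on the three KASAN_SHADOW_OFFSET defaults in the Kconfig hunk: they follow from the generic mapping shadow(addr) = (addr >> 3) + KASAN_SHADOW_OFFSET together with the KASAN_SHADOW_START definition in asm/kasan.h, i.e. offset = start - (start >> 3). A quick hosted self-check of the constants (ordinary userspace C, not kernel code):

#include <stdio.h>

int main(void)
{
	int va_bits[] = { 39, 42, 48 };

	for (int i = 0; i < 3; i++) {
		unsigned long long start = ~0ULL << va_bits[i];	/* KASAN_SHADOW_START */
		unsigned long long offset = start - (start >> 3);

		/* Prints 0xdfffff9000000000, 0xdffffc8000000000 and
		 * 0xdfff200000000000, matching the defaults above. */
		printf("VA_BITS=%d: KASAN_SHADOW_OFFSET=0x%llx\n", va_bits[i], offset);
	}
	return 0;
}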