mbox series

[-fixes,v3,0/6] Fixes KASAN and other along the way

Message ID 20220225123953.3251327-1-alexandre.ghiti@canonical.com (mailing list archive)
Headers show
Series Fixes KASAN and other along the way | expand

Message

Alexandre Ghiti Feb. 25, 2022, 12:39 p.m. UTC
As reported by Aleksandr, syzbot riscv is broken since commit
54c5639d8f50 ("riscv: Fix asan-stack clang build"). This commit actually
breaks KASAN_INLINE which is not fixed in this series, that will come later
when found.

Nevertheless, this series fixes small things that made the syzbot
configuration + KASAN_OUTLINE fail to boot.

Note that even though the config at [1] boots fine with this series, I
was not able to boot the small config at [2] which fails because
kasan_poison receives a really weird address 0x4075706301000000 (maybe a
kasan person could provide some hint about what happens below in
do_ctors -> __asan_register_globals):

Thread 2 hit Breakpoint 1, kasan_poison (addr=<optimized out>, size=<optimized out>, value=<optimized out>, init=<optimized out>) at /home/alex/work/linux/mm/kasan/shadow.c:90
90		if (WARN_ON((unsigned long)addr & KASAN_GRANULE_MASK))
1: x/i $pc
=> 0xffffffff80261712 <kasan_poison>:	andi	a4,a0,7
5: /x $a0 = 0x4075706301000000

Thread 2 hit Breakpoint 2, handle_exception () at /home/alex/work/linux/arch/riscv/kernel/entry.S:27
27		csrrw tp, CSR_SCRATCH, tp
1: x/i $pc
=> 0xffffffff80004098 <handle_exception>:	csrrw	tp,sscratch,tp
5: /x $a0 = 0xe80eae0b60200000
(gdb) bt
#0  handle_exception () at /home/alex/work/linux/arch/riscv/kernel/entry.S:27
#1  0xffffffff80261746 in kasan_poison (addr=<optimized out>, size=<optimized out>, value=<optimized out>, init=<optimized out>)
    at /home/alex/work/linux/mm/kasan/shadow.c:98
#2  0xffffffff802618b4 in kasan_unpoison (addr=<optimized out>, size=<optimized out>, init=<optimized out>)
    at /home/alex/work/linux/mm/kasan/shadow.c:138
#3  0xffffffff80260876 in register_global (global=<optimized out>) at /home/alex/work/linux/mm/kasan/generic.c:214
#4  __asan_register_globals (globals=<optimized out>, size=<optimized out>) at /home/alex/work/linux/mm/kasan/generic.c:226
#5  0xffffffff8125efac in _sub_I_65535_1 ()
#6  0xffffffff81201b32 in do_ctors () at /home/alex/work/linux/init/main.c:1156
#7  do_basic_setup () at /home/alex/work/linux/init/main.c:1407
#8  kernel_init_freeable () at /home/alex/work/linux/init/main.c:1613
#9  0xffffffff81153ddc in kernel_init (unused=<optimized out>) at /home/alex/work/linux/init/main.c:1502
#10 0xffffffff800041c0 in handle_exception () at /home/alex/work/linux/arch/riscv/kernel/entry.S:231


Thanks again to Aleksandr for narrowing down the issues fixed here.


[1] https://gist.github.com/a-nogikh/279c85c2d24f47efcc3e865c08844138
[2] https://gist.github.com/AlexGhiti/a5a0cab0227e2bf38f9d12232591c0e4

Changes in v3:
- Add PATCH 5/6 and PATCH 6/6

Changes in v2:
- Fix kernel test robot failure regarding KERN_VIRT_SIZE that is
  undefined for nommu config

Alexandre Ghiti (6):
  riscv: Fix is_linear_mapping with recent move of KASAN region
  riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
  riscv: Fix DEBUG_VIRTUAL false warnings
  riscv: Fix config KASAN && DEBUG_VIRTUAL
  riscv: Move high_memory initialization to setup_bootmem
  riscv: Fix kasan pud population

 arch/riscv/include/asm/page.h    | 2 +-
 arch/riscv/include/asm/pgtable.h | 1 +
 arch/riscv/mm/Makefile           | 3 +++
 arch/riscv/mm/init.c             | 2 +-
 arch/riscv/mm/kasan_init.c       | 8 +++++---
 arch/riscv/mm/physaddr.c         | 4 +---
 6 files changed, 12 insertions(+), 8 deletions(-)

Comments

Marco Elver Feb. 25, 2022, 1:05 p.m. UTC | #1
On Fri, 25 Feb 2022 at 13:40, Alexandre Ghiti
<alexandre.ghiti@canonical.com> wrote:
>
> As reported by Aleksandr, syzbot riscv is broken since commit
> 54c5639d8f50 ("riscv: Fix asan-stack clang build"). This commit actually
> breaks KASAN_INLINE which is not fixed in this series, that will come later
> when found.
>
> Nevertheless, this series fixes small things that made the syzbot
> configuration + KASAN_OUTLINE fail to boot.
>
> Note that even though the config at [1] boots fine with this series, I
> was not able to boot the small config at [2] which fails because
> kasan_poison receives a really weird address 0x4075706301000000 (maybe a
> kasan person could provide some hint about what happens below in
> do_ctors -> __asan_register_globals):

asan_register_globals is responsible for poisoning redzones around
globals. As hinted by 'do_ctors', it calls constructors, and in this
case a compiler-generated constructor that calls
__asan_register_globals with metadata generated by the compiler. That
metadata contains information about global variables. Note, these
constructors are called on initial boot, but also every time a kernel
module (that has globals) is loaded.

It may also be a toolchain issue, but it's hard to say. If you're
using GCC to test, try Clang (11 or later), and vice-versa.
Alexandre Ghiti Feb. 25, 2022, 2:04 p.m. UTC | #2
On Fri, Feb 25, 2022 at 2:06 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, 25 Feb 2022 at 13:40, Alexandre Ghiti
> <alexandre.ghiti@canonical.com> wrote:
> >
> > As reported by Aleksandr, syzbot riscv is broken since commit
> > 54c5639d8f50 ("riscv: Fix asan-stack clang build"). This commit actually
> > breaks KASAN_INLINE which is not fixed in this series, that will come later
> > when found.
> >
> > Nevertheless, this series fixes small things that made the syzbot
> > configuration + KASAN_OUTLINE fail to boot.
> >
> > Note that even though the config at [1] boots fine with this series, I
> > was not able to boot the small config at [2] which fails because
> > kasan_poison receives a really weird address 0x4075706301000000 (maybe a
> > kasan person could provide some hint about what happens below in
> > do_ctors -> __asan_register_globals):
>
> asan_register_globals is responsible for poisoning redzones around
> globals. As hinted by 'do_ctors', it calls constructors, and in this
> case a compiler-generated constructor that calls
> __asan_register_globals with metadata generated by the compiler. That
> metadata contains information about global variables. Note, these
> constructors are called on initial boot, but also every time a kernel
> module (that has globals) is loaded.
>
> It may also be a toolchain issue, but it's hard to say. If you're
> using GCC to test, try Clang (11 or later), and vice-versa.

I tried 3 different gcc toolchains already, but that did not fix the
issue. The only thing that worked was setting asan-globals=0 in
scripts/Makefile.kasan, but ok, that's not a fix.
I tried to bisect this issue but our kasan implementation has been
broken quite a few times, so it failed.

I keep digging!

Thanks for the tips,

Alex
Alexandre Ghiti Feb. 25, 2022, 2:15 p.m. UTC | #3
On Fri, Feb 25, 2022 at 3:10 PM Alexander Potapenko <glider@google.com> wrote:
>
>
>
> On Fri, Feb 25, 2022 at 3:04 PM Alexandre Ghiti <alexandre.ghiti@canonical.com> wrote:
>>
>> On Fri, Feb 25, 2022 at 2:06 PM Marco Elver <elver@google.com> wrote:
>> >
>> > On Fri, 25 Feb 2022 at 13:40, Alexandre Ghiti
>> > <alexandre.ghiti@canonical.com> wrote:
>> > >
>> > > As reported by Aleksandr, syzbot riscv is broken since commit
>> > > 54c5639d8f50 ("riscv: Fix asan-stack clang build"). This commit actually
>> > > breaks KASAN_INLINE which is not fixed in this series, that will come later
>> > > when found.
>> > >
>> > > Nevertheless, this series fixes small things that made the syzbot
>> > > configuration + KASAN_OUTLINE fail to boot.
>> > >
>> > > Note that even though the config at [1] boots fine with this series, I
>> > > was not able to boot the small config at [2] which fails because
>> > > kasan_poison receives a really weird address 0x4075706301000000 (maybe a
>> > > kasan person could provide some hint about what happens below in
>> > > do_ctors -> __asan_register_globals):
>> >
>> > asan_register_globals is responsible for poisoning redzones around
>> > globals. As hinted by 'do_ctors', it calls constructors, and in this
>> > case a compiler-generated constructor that calls
>> > __asan_register_globals with metadata generated by the compiler. That
>> > metadata contains information about global variables. Note, these
>> > constructors are called on initial boot, but also every time a kernel
>> > module (that has globals) is loaded.
>> >
>> > It may also be a toolchain issue, but it's hard to say. If you're
>> > using GCC to test, try Clang (11 or later), and vice-versa.
>>
>> I tried 3 different gcc toolchains already, but that did not fix the
>> issue. The only thing that worked was setting asan-globals=0 in
>> scripts/Makefile.kasan, but ok, that's not a fix.
>> I tried to bisect this issue but our kasan implementation has been
>> broken quite a few times, so it failed.
>>
>> I keep digging!
>>
>
> The problem does not reproduce for me with GCC 11.2.0: kernels built with both [1] and [2] are bootable.

Do you mean you reach userspace? Because my image boots too, and fails
at some point:

[    0.000150] sched_clock: 64 bits at 10MHz, resolution 100ns, wraps
every 4398046511100ns
[    0.015847] Console: colour dummy device 80x25
[    0.016899] printk: console [tty0] enabled
[    0.020326] printk: bootconsole [ns16550a0] disabled

It traps here.

> FWIW here is how I run them:
>
> qemu-system-riscv64 -m 2048 -smp 1 -nographic -no-reboot \
>   -device virtio-rng-pci -machine virt -device \
>   virtio-net-pci,netdev=net0 -netdev \
>   user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:12529-:22 -device \
>   virtio-blk-device,drive=hd0 -drive \
>   file=${IMAGE},if=none,format=raw,id=hd0 -snapshot \
>   -kernel ${KERNEL_SRC_DIR}/arch/riscv/boot/Image -append "root=/dev/vda
>   console=ttyS0 earlyprintk=serial"
>
>
>>
>> Thanks for the tips,
>>
>> Alex
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Liana Sebastian
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
>
> Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.
>
>
>
> This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.
Alexandre Ghiti Feb. 25, 2022, 2:46 p.m. UTC | #4
On Fri, Feb 25, 2022 at 3:31 PM Alexander Potapenko <glider@google.com> wrote:
>
>
>
> On Fri, Feb 25, 2022 at 3:15 PM Alexandre Ghiti <alexandre.ghiti@canonical.com> wrote:
>>
>> On Fri, Feb 25, 2022 at 3:10 PM Alexander Potapenko <glider@google.com> wrote:
>> >
>> >
>> >
>> > On Fri, Feb 25, 2022 at 3:04 PM Alexandre Ghiti <alexandre.ghiti@canonical.com> wrote:
>> >>
>> >> On Fri, Feb 25, 2022 at 2:06 PM Marco Elver <elver@google.com> wrote:
>> >> >
>> >> > On Fri, 25 Feb 2022 at 13:40, Alexandre Ghiti
>> >> > <alexandre.ghiti@canonical.com> wrote:
>> >> > >
>> >> > > As reported by Aleksandr, syzbot riscv is broken since commit
>> >> > > 54c5639d8f50 ("riscv: Fix asan-stack clang build"). This commit actually
>> >> > > breaks KASAN_INLINE which is not fixed in this series, that will come later
>> >> > > when found.
>> >> > >
>> >> > > Nevertheless, this series fixes small things that made the syzbot
>> >> > > configuration + KASAN_OUTLINE fail to boot.
>> >> > >
>> >> > > Note that even though the config at [1] boots fine with this series, I
>> >> > > was not able to boot the small config at [2] which fails because
>> >> > > kasan_poison receives a really weird address 0x4075706301000000 (maybe a
>> >> > > kasan person could provide some hint about what happens below in
>> >> > > do_ctors -> __asan_register_globals):
>> >> >
>> >> > asan_register_globals is responsible for poisoning redzones around
>> >> > globals. As hinted by 'do_ctors', it calls constructors, and in this
>> >> > case a compiler-generated constructor that calls
>> >> > __asan_register_globals with metadata generated by the compiler. That
>> >> > metadata contains information about global variables. Note, these
>> >> > constructors are called on initial boot, but also every time a kernel
>> >> > module (that has globals) is loaded.
>> >> >
>> >> > It may also be a toolchain issue, but it's hard to say. If you're
>> >> > using GCC to test, try Clang (11 or later), and vice-versa.
>> >>
>> >> I tried 3 different gcc toolchains already, but that did not fix the
>> >> issue. The only thing that worked was setting asan-globals=0 in
>> >> scripts/Makefile.kasan, but ok, that's not a fix.
>> >> I tried to bisect this issue but our kasan implementation has been
>> >> broken quite a few times, so it failed.
>> >>
>> >> I keep digging!
>> >>
>> >
>> > The problem does not reproduce for me with GCC 11.2.0: kernels built with both [1] and [2] are bootable.
>>
>> Do you mean you reach userspace? Because my image boots too, and fails
>> at some point:
>>
>> [    0.000150] sched_clock: 64 bits at 10MHz, resolution 100ns, wraps
>> every 4398046511100ns
>> [    0.015847] Console: colour dummy device 80x25
>> [    0.016899] printk: console [tty0] enabled
>> [    0.020326] printk: bootconsole [ns16550a0] disabled
>>
>
> In my case, QEMU successfully boots to the login prompt.
> I am running QEMU 6.2.0 (Debian 1:6.2+dfsg-2) and an image Aleksandr shared with me (guess it was built according to this instruction: https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md)
>

Nice thanks guys! I always use the latest opensbi and not the one that
is embedded in qemu, which is the only difference between your command
line (which works) and mine (which does not work). So the issue is
probably there, I really need to investigate that now.

That means I only need to fix KASAN_INLINE and we're good.

I imagine Palmer can add your Tested-by on the series then?

Thanks again!

Alex

>>
>> It traps here.
>>
>> > FWIW here is how I run them:
>> >
>> > qemu-system-riscv64 -m 2048 -smp 1 -nographic -no-reboot \
>> >   -device virtio-rng-pci -machine virt -device \
>> >   virtio-net-pci,netdev=net0 -netdev \
>> >   user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:12529-:22 -device \
>> >   virtio-blk-device,drive=hd0 -drive \
>> >   file=${IMAGE},if=none,format=raw,id=hd0 -snapshot \
>> >   -kernel ${KERNEL_SRC_DIR}/arch/riscv/boot/Image -append "root=/dev/vda
>> >   console=ttyS0 earlyprintk=serial"
>> >
>> >
>> >>
>> >> Thanks for the tips,
>> >>
>> >> Alex
>> >
>> >
>> >
>> > --
>> > Alexander Potapenko
>> > Software Engineer
>> >
>> > Google Germany GmbH
>> > Erika-Mann-Straße, 33
>> > 80636 München
>> >
>> > Geschäftsführer: Paul Manicle, Liana Sebastian
>> > Registergericht und -nummer: Hamburg, HRB 86891
>> > Sitz der Gesellschaft: Hamburg
>> >
>> > Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.
>> >
>> >
>> >
>> > This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.
>>
>> --
>> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/CA%2BzEjCsQPVYSV7CdhKnvjujXkMXuRQd%3DVPok1awb20xifYmidw%40mail.gmail.com.
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Liana Sebastian
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
>
> Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.
>
>
>
> This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.