Message ID | 20220221161232.2168364-5-alexandre.ghiti@canonical.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fixes KASAN and other along the way | expand |
Hi Alexandre, Thanks for the series! However, I still haven't managed to boot the kernel. What I did: 1) Checked out the riscv/fixes branch (this is the one we're using on syzbot). The latest commit was 6df2a016c0c8a3d0933ef33dd192ea6606b115e3. 2) Applied all 4 patches. 3) Used the config from the cover letter: https://gist.github.com/a-nogikh/279c85c2d24f47efcc3e865c08844138 4) Built with `make -j32 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-` 5) Ran with `qemu-system-riscv64 -m 2048 -smp 1 -nographic -no-reboot -device virtio-rng-pci -machine virt -device virtio-net-pci,netdev=net0 -netdev user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:12529-:22 -device virtio-blk-device,drive=hd0 -drive file=~/kernel-image/riscv64,if=none,format=raw,id=hd0 -snapshot -kernel ~/linux-riscv/arch/riscv/boot/Image -append "root=/dev/vda console=ttyS0 earlyprintk=serial"` (this is similar to how syzkaller runs qemu). Can you please hint at what I'm doing differently? A simple config with KASAN, KASAN_OUTLINE and DEBUG_VIRTUAL now indeed leads to a booting kernel, which was not the case before. make defconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- ./scripts/config -e KASAN -e KASAN_OUTLINE -e DEBUG_VIRTUAL make olddefconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- -- Best Regards, Aleksandr On Mon, Feb 21, 2022 at 5:17 PM Alexandre Ghiti <alexandre.ghiti@canonical.com> wrote: > > __virt_to_phys function is called very early in the boot process (ie > kasan_early_init) so it should not be instrumented by KASAN otherwise it > bugs. > > Fix this by declaring phys_addr.c as non-kasan instrumentable. > > Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> > --- > arch/riscv/mm/Makefile | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile > index 7ebaef10ea1b..ac7a25298a04 100644 > --- a/arch/riscv/mm/Makefile > +++ b/arch/riscv/mm/Makefile > @@ -24,6 +24,9 @@ obj-$(CONFIG_KASAN) += kasan_init.o > ifdef CONFIG_KASAN > KASAN_SANITIZE_kasan_init.o := n > KASAN_SANITIZE_init.o := n > +ifdef CONFIG_DEBUG_VIRTUAL > +KASAN_SANITIZE_physaddr.o := n > +endif > endif > > obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o > -- > 2.32.0 >
Hi Aleksandr, On Tue, Feb 22, 2022 at 11:28 AM Aleksandr Nogikh <nogikh@google.com> wrote: > > Hi Alexandre, > > Thanks for the series! > > However, I still haven't managed to boot the kernel. What I did: > 1) Checked out the riscv/fixes branch (this is the one we're using on > syzbot). The latest commit was > 6df2a016c0c8a3d0933ef33dd192ea6606b115e3. > 2) Applied all 4 patches. > 3) Used the config from the cover letter: > https://gist.github.com/a-nogikh/279c85c2d24f47efcc3e865c08844138 > 4) Built with `make -j32 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-` > 5) Ran with `qemu-system-riscv64 -m 2048 -smp 1 -nographic -no-reboot > -device virtio-rng-pci -machine virt -device > virtio-net-pci,netdev=net0 -netdev > user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:12529-:22 -device > virtio-blk-device,drive=hd0 -drive > file=~/kernel-image/riscv64,if=none,format=raw,id=hd0 -snapshot > -kernel ~/linux-riscv/arch/riscv/boot/Image -append "root=/dev/vda > console=ttyS0 earlyprintk=serial"` (this is similar to how syzkaller > runs qemu). > > Can you please hint at what I'm doing differently? A short summary of what I found to keep you updated: I compared your command line and mine, the differences are that I use "smp=4" and I add "earlycon" to the kernel command line. When added to your command line, that allows it to boot. I understand why it helps but I can't explain what's wrong...Anyway, I fixed a warning that I had missed and that allows me to remove the "smp=4" and "earlycon". But this is not over yet...Your command line still does not allow to reach userspace, it fails with the following stacktrace: [ 11.537817][ T1] Unable to handle kernel paging request at virtual address fffff5eeffffc800 [ 11.539450][ T1] Oops [#1] [ 11.539909][ T1] Modules linked in: [ 11.540451][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc1-00007-ga68b89289e26-dirty #28 [ 11.541364][ T1] Hardware name: riscv-virtio,qemu (DT) [ 11.542032][ T1] epc : kasan_check_range+0x96/0x13e [ 11.542654][ T1] ra : memset+0x1e/0x4c [ 11.543388][ T1] epc : ffffffff8046c312 ra : ffffffff8046ca16 sp : ffffaf8007337b70 [ 11.544037][ T1] gp : ffffffff85866c80 tp : ffffaf80073d8000 t0 : 0000000000046000 [ 11.544637][ T1] t1 : fffff5eeffffc9ff t2 : 0000000000000000 s0 : ffffaf8007337ba0 [ 11.545409][ T1] s1 : 0000000000001000 a0 : fffff5eeffffca00 a1 : 0000000000001000 [ 11.546072][ T1] a2 : 0000000000000001 a3 : ffffffff8039ef24 a4 : ffffaf7ffffe4000 [ 11.546707][ T1] a5 : fffff5eeffffc800 a6 : 0000004000000000 a7 : ffffaf7ffffe4fff [ 11.547541][ T1] s2 : ffffaf7ffffe4000 s3 : 0000000000000000 s4 : ffffffff8467faa8 [ 11.548277][ T1] s5 : 0000000000000000 s6 : ffffffff85869840 s7 : 0000000000000000 [ 11.548950][ T1] s8 : 0000000000001000 s9 : ffffaf805a54a048 s10: ffffffff8588d420 [ 11.549705][ T1] s11: ffffaf7ffffe4000 t3 : 0000000000000000 t4 : 0000000000000040 [ 11.550465][ T1] t5 : fffff5eeffffca00 t6 : 0000000000000002 [ 11.551131][ T1] status: 0000000000000120 badaddr: fffff5eeffffc800 cause: 000000000000000d [ 11.551961][ T1] [<ffffffff8039ef24>] pcpu_alloc+0x84a/0x125c [ 11.552928][ T1] [<ffffffff8039f994>] __alloc_percpu+0x28/0x34 [ 11.553555][ T1] [<ffffffff83286954>] ip_rt_init+0x15a/0x35c [ 11.554128][ T1] [<ffffffff83286d24>] ip_init+0x18/0x30 [ 11.554642][ T1] [<ffffffff8328844a>] inet_init+0x2a6/0x550 [ 11.555428][ T1] [<ffffffff80003220>] do_one_initcall+0x132/0x7e4 [ 11.556049][ T1] [<ffffffff83201f7a>] kernel_init_freeable+0x510/0x5b4 [ 11.556771][ T1] [<ffffffff831424e4>] kernel_init+0x28/0x21c [ 11.557344][ T1] [<ffffffff800056a0>] ret_from_exception+0x0/0x14 [ 11.585469][ T1] ---[ end trace 0000000000000000 ]--- 0xfffff5eeffffc800 is a KASAN address that points to the very end of vmalloc address range, which is weird since KASAN_VMALLOC is not enabled. Moreover my command line does not trigger the above bug, and I'm trying to understand why: /home/alex/work/qemu/build/riscv64-softmmu/qemu-system-riscv64 -M virt -bios /home/alex/work/opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel /home/alex/work/kernel-build/riscv_rv64_kernel/arch/riscv/boot/Image -netdev user,id=net0 -device virtio-net-device,netdev=net0 -drive file=/home/alex/work/kernel-build/rootfs.ext2,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -nographic -smp 4 -m 16G -s -append "rootwait earlycon root=/dev/vda ro earlyprintk=serial" I'm looking into all of this and will get back with a v3 soon :) Thanks, Alex > > A simple config with KASAN, KASAN_OUTLINE and DEBUG_VIRTUAL now indeed > leads to a booting kernel, which was not the case before. > make defconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- > ./scripts/config -e KASAN -e KASAN_OUTLINE -e DEBUG_VIRTUAL > make olddefconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- > > -- > Best Regards, > Aleksandr > > On Mon, Feb 21, 2022 at 5:17 PM Alexandre Ghiti > <alexandre.ghiti@canonical.com> wrote: > > > > __virt_to_phys function is called very early in the boot process (ie > > kasan_early_init) so it should not be instrumented by KASAN otherwise it > > bugs. > > > > Fix this by declaring phys_addr.c as non-kasan instrumentable. > > > > Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> > > --- > > arch/riscv/mm/Makefile | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile > > index 7ebaef10ea1b..ac7a25298a04 100644 > > --- a/arch/riscv/mm/Makefile > > +++ b/arch/riscv/mm/Makefile > > @@ -24,6 +24,9 @@ obj-$(CONFIG_KASAN) += kasan_init.o > > ifdef CONFIG_KASAN > > KASAN_SANITIZE_kasan_init.o := n > > KASAN_SANITIZE_init.o := n > > +ifdef CONFIG_DEBUG_VIRTUAL > > +KASAN_SANITIZE_physaddr.o := n > > +endif > > endif > > > > obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o > > -- > > 2.32.0 > >
On Wed, Feb 23, 2022 at 2:10 PM Alexandre Ghiti <alexandre.ghiti@canonical.com> wrote: > > Hi Aleksandr, > > On Tue, Feb 22, 2022 at 11:28 AM Aleksandr Nogikh <nogikh@google.com> wrote: > > > > Hi Alexandre, > > > > Thanks for the series! > > > > However, I still haven't managed to boot the kernel. What I did: > > 1) Checked out the riscv/fixes branch (this is the one we're using on > > syzbot). The latest commit was > > 6df2a016c0c8a3d0933ef33dd192ea6606b115e3. > > 2) Applied all 4 patches. > > 3) Used the config from the cover letter: > > https://gist.github.com/a-nogikh/279c85c2d24f47efcc3e865c08844138 > > 4) Built with `make -j32 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-` > > 5) Ran with `qemu-system-riscv64 -m 2048 -smp 1 -nographic -no-reboot > > -device virtio-rng-pci -machine virt -device > > virtio-net-pci,netdev=net0 -netdev > > user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:12529-:22 -device > > virtio-blk-device,drive=hd0 -drive > > file=~/kernel-image/riscv64,if=none,format=raw,id=hd0 -snapshot > > -kernel ~/linux-riscv/arch/riscv/boot/Image -append "root=/dev/vda > > console=ttyS0 earlyprintk=serial"` (this is similar to how syzkaller > > runs qemu). > > > > Can you please hint at what I'm doing differently? > > A short summary of what I found to keep you updated: > > I compared your command line and mine, the differences are that I use > "smp=4" and I add "earlycon" to the kernel command line. When added to > your command line, that allows it to boot. I understand why it helps > but I can't explain what's wrong...Anyway, I fixed a warning that I > had missed and that allows me to remove the "smp=4" and "earlycon". > > But this is not over yet...Your command line still does not allow to > reach userspace, it fails with the following stacktrace: > > [ 11.537817][ T1] Unable to handle kernel paging request at > virtual address fffff5eeffffc800 > [ 11.539450][ T1] Oops [#1] > [ 11.539909][ T1] Modules linked in: > [ 11.540451][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted > 5.17.0-rc1-00007-ga68b89289e26-dirty #28 > [ 11.541364][ T1] Hardware name: riscv-virtio,qemu (DT) > [ 11.542032][ T1] epc : kasan_check_range+0x96/0x13e > [ 11.542654][ T1] ra : memset+0x1e/0x4c > [ 11.543388][ T1] epc : ffffffff8046c312 ra : ffffffff8046ca16 sp > : ffffaf8007337b70 > [ 11.544037][ T1] gp : ffffffff85866c80 tp : ffffaf80073d8000 t0 > : 0000000000046000 > [ 11.544637][ T1] t1 : fffff5eeffffc9ff t2 : 0000000000000000 s0 > : ffffaf8007337ba0 > [ 11.545409][ T1] s1 : 0000000000001000 a0 : fffff5eeffffca00 a1 > : 0000000000001000 > [ 11.546072][ T1] a2 : 0000000000000001 a3 : ffffffff8039ef24 a4 > : ffffaf7ffffe4000 > [ 11.546707][ T1] a5 : fffff5eeffffc800 a6 : 0000004000000000 a7 > : ffffaf7ffffe4fff > [ 11.547541][ T1] s2 : ffffaf7ffffe4000 s3 : 0000000000000000 s4 > : ffffffff8467faa8 > [ 11.548277][ T1] s5 : 0000000000000000 s6 : ffffffff85869840 s7 > : 0000000000000000 > [ 11.548950][ T1] s8 : 0000000000001000 s9 : ffffaf805a54a048 > s10: ffffffff8588d420 > [ 11.549705][ T1] s11: ffffaf7ffffe4000 t3 : 0000000000000000 t4 > : 0000000000000040 > [ 11.550465][ T1] t5 : fffff5eeffffca00 t6 : 0000000000000002 > [ 11.551131][ T1] status: 0000000000000120 badaddr: > fffff5eeffffc800 cause: 000000000000000d > [ 11.551961][ T1] [<ffffffff8039ef24>] pcpu_alloc+0x84a/0x125c > [ 11.552928][ T1] [<ffffffff8039f994>] __alloc_percpu+0x28/0x34 > [ 11.553555][ T1] [<ffffffff83286954>] ip_rt_init+0x15a/0x35c > [ 11.554128][ T1] [<ffffffff83286d24>] ip_init+0x18/0x30 > [ 11.554642][ T1] [<ffffffff8328844a>] inet_init+0x2a6/0x550 > [ 11.555428][ T1] [<ffffffff80003220>] do_one_initcall+0x132/0x7e4 > [ 11.556049][ T1] [<ffffffff83201f7a>] kernel_init_freeable+0x510/0x5b4 > [ 11.556771][ T1] [<ffffffff831424e4>] kernel_init+0x28/0x21c > [ 11.557344][ T1] [<ffffffff800056a0>] ret_from_exception+0x0/0x14 > [ 11.585469][ T1] ---[ end trace 0000000000000000 ]--- > > 0xfffff5eeffffc800 is a KASAN address that points to the very end of > vmalloc address range, which is weird since KASAN_VMALLOC is not > enabled. > Moreover my command line does not trigger the above bug, and I'm > trying to understand why: When I read this email I saw that I did not use the same qemu version: I have a locally built version that disables sv48, which is the one that works so the problem came from the sv48 support. In a nutshell, the issue comes from the fact that kasan inner regions are not aligned on PGDIR_SIZE when sv48 (which is 4-level page table) is on, and then when populating the kasan linear mapping region, that clears the kasan vmalloc region which is in the same PGD: the fix is to copy its content before initializing the linear mapping entries. This issue only happens when KASAN_VMALLOC is disabled. I had fixed this already for kasan_shallow_populate_pud, but missed kasan_populate_pud. Tomorrow I'll push the v3. It still does not fix the issue I describe in the cover letter though, so still more work to do. At least, I was able to reach userspace with your *exact* qemu command :) Alex > > /home/alex/work/qemu/build/riscv64-softmmu/qemu-system-riscv64 -M virt > -bios /home/alex/work/opensbi/build/platform/generic/firmware/fw_dynamic.bin > -kernel /home/alex/work/kernel-build/riscv_rv64_kernel/arch/riscv/boot/Image > -netdev user,id=net0 -device virtio-net-device,netdev=net0 -drive > file=/home/alex/work/kernel-build/rootfs.ext2,format=raw,id=hd0 > -device virtio-blk-device,drive=hd0 -nographic -smp 4 -m 16G -s > -append "rootwait earlycon root=/dev/vda ro earlyprintk=serial" > > I'm looking into all of this and will get back with a v3 soon :) > > Thanks, > > Alex > > > > > > > > > > A simple config with KASAN, KASAN_OUTLINE and DEBUG_VIRTUAL now indeed > > leads to a booting kernel, which was not the case before. > > make defconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- > > ./scripts/config -e KASAN -e KASAN_OUTLINE -e DEBUG_VIRTUAL > > make olddefconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- > > > > -- > > Best Regards, > > Aleksandr > > > > On Mon, Feb 21, 2022 at 5:17 PM Alexandre Ghiti > > <alexandre.ghiti@canonical.com> wrote: > > > > > > __virt_to_phys function is called very early in the boot process (ie > > > kasan_early_init) so it should not be instrumented by KASAN otherwise it > > > bugs. > > > > > > Fix this by declaring phys_addr.c as non-kasan instrumentable. > > > > > > Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> > > > --- > > > arch/riscv/mm/Makefile | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile > > > index 7ebaef10ea1b..ac7a25298a04 100644 > > > --- a/arch/riscv/mm/Makefile > > > +++ b/arch/riscv/mm/Makefile > > > @@ -24,6 +24,9 @@ obj-$(CONFIG_KASAN) += kasan_init.o > > > ifdef CONFIG_KASAN > > > KASAN_SANITIZE_kasan_init.o := n > > > KASAN_SANITIZE_init.o := n > > > +ifdef CONFIG_DEBUG_VIRTUAL > > > +KASAN_SANITIZE_physaddr.o := n > > > +endif > > > endif > > > > > > obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o > > > -- > > > 2.32.0 > > >
On Wed, 23 Feb 2022 09:17:16 PST (-0800), alexandre.ghiti@canonical.com wrote: > On Wed, Feb 23, 2022 at 2:10 PM Alexandre Ghiti > <alexandre.ghiti@canonical.com> wrote: >> >> Hi Aleksandr, >> >> On Tue, Feb 22, 2022 at 11:28 AM Aleksandr Nogikh <nogikh@google.com> wrote: >> > >> > Hi Alexandre, >> > >> > Thanks for the series! >> > >> > However, I still haven't managed to boot the kernel. What I did: >> > 1) Checked out the riscv/fixes branch (this is the one we're using on >> > syzbot). The latest commit was >> > 6df2a016c0c8a3d0933ef33dd192ea6606b115e3. >> > 2) Applied all 4 patches. >> > 3) Used the config from the cover letter: >> > https://gist.github.com/a-nogikh/279c85c2d24f47efcc3e865c08844138 >> > 4) Built with `make -j32 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-` >> > 5) Ran with `qemu-system-riscv64 -m 2048 -smp 1 -nographic -no-reboot >> > -device virtio-rng-pci -machine virt -device >> > virtio-net-pci,netdev=net0 -netdev >> > user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:12529-:22 -device >> > virtio-blk-device,drive=hd0 -drive >> > file=~/kernel-image/riscv64,if=none,format=raw,id=hd0 -snapshot >> > -kernel ~/linux-riscv/arch/riscv/boot/Image -append "root=/dev/vda >> > console=ttyS0 earlyprintk=serial"` (this is similar to how syzkaller >> > runs qemu). >> > >> > Can you please hint at what I'm doing differently? >> >> A short summary of what I found to keep you updated: >> >> I compared your command line and mine, the differences are that I use >> "smp=4" and I add "earlycon" to the kernel command line. When added to >> your command line, that allows it to boot. I understand why it helps >> but I can't explain what's wrong...Anyway, I fixed a warning that I >> had missed and that allows me to remove the "smp=4" and "earlycon". >> >> But this is not over yet...Your command line still does not allow to >> reach userspace, it fails with the following stacktrace: >> >> [ 11.537817][ T1] Unable to handle kernel paging request at >> virtual address fffff5eeffffc800 >> [ 11.539450][ T1] Oops [#1] >> [ 11.539909][ T1] Modules linked in: >> [ 11.540451][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted >> 5.17.0-rc1-00007-ga68b89289e26-dirty #28 >> [ 11.541364][ T1] Hardware name: riscv-virtio,qemu (DT) >> [ 11.542032][ T1] epc : kasan_check_range+0x96/0x13e >> [ 11.542654][ T1] ra : memset+0x1e/0x4c >> [ 11.543388][ T1] epc : ffffffff8046c312 ra : ffffffff8046ca16 sp >> : ffffaf8007337b70 >> [ 11.544037][ T1] gp : ffffffff85866c80 tp : ffffaf80073d8000 t0 >> : 0000000000046000 >> [ 11.544637][ T1] t1 : fffff5eeffffc9ff t2 : 0000000000000000 s0 >> : ffffaf8007337ba0 >> [ 11.545409][ T1] s1 : 0000000000001000 a0 : fffff5eeffffca00 a1 >> : 0000000000001000 >> [ 11.546072][ T1] a2 : 0000000000000001 a3 : ffffffff8039ef24 a4 >> : ffffaf7ffffe4000 >> [ 11.546707][ T1] a5 : fffff5eeffffc800 a6 : 0000004000000000 a7 >> : ffffaf7ffffe4fff >> [ 11.547541][ T1] s2 : ffffaf7ffffe4000 s3 : 0000000000000000 s4 >> : ffffffff8467faa8 >> [ 11.548277][ T1] s5 : 0000000000000000 s6 : ffffffff85869840 s7 >> : 0000000000000000 >> [ 11.548950][ T1] s8 : 0000000000001000 s9 : ffffaf805a54a048 >> s10: ffffffff8588d420 >> [ 11.549705][ T1] s11: ffffaf7ffffe4000 t3 : 0000000000000000 t4 >> : 0000000000000040 >> [ 11.550465][ T1] t5 : fffff5eeffffca00 t6 : 0000000000000002 >> [ 11.551131][ T1] status: 0000000000000120 badaddr: >> fffff5eeffffc800 cause: 000000000000000d >> [ 11.551961][ T1] [<ffffffff8039ef24>] pcpu_alloc+0x84a/0x125c >> [ 11.552928][ T1] [<ffffffff8039f994>] __alloc_percpu+0x28/0x34 >> [ 11.553555][ T1] [<ffffffff83286954>] ip_rt_init+0x15a/0x35c >> [ 11.554128][ T1] [<ffffffff83286d24>] ip_init+0x18/0x30 >> [ 11.554642][ T1] [<ffffffff8328844a>] inet_init+0x2a6/0x550 >> [ 11.555428][ T1] [<ffffffff80003220>] do_one_initcall+0x132/0x7e4 >> [ 11.556049][ T1] [<ffffffff83201f7a>] kernel_init_freeable+0x510/0x5b4 >> [ 11.556771][ T1] [<ffffffff831424e4>] kernel_init+0x28/0x21c >> [ 11.557344][ T1] [<ffffffff800056a0>] ret_from_exception+0x0/0x14 >> [ 11.585469][ T1] ---[ end trace 0000000000000000 ]--- >> >> 0xfffff5eeffffc800 is a KASAN address that points to the very end of >> vmalloc address range, which is weird since KASAN_VMALLOC is not >> enabled. >> Moreover my command line does not trigger the above bug, and I'm >> trying to understand why: > > When I read this email I saw that I did not use the same qemu version: > I have a locally built version that disables sv48, which is the one > that works so the problem came from the sv48 support. > > In a nutshell, the issue comes from the fact that kasan inner regions > are not aligned on PGDIR_SIZE when sv48 (which is 4-level page table) > is on, and then when populating the kasan linear mapping region, that > clears the kasan vmalloc region which is in the same PGD: the fix is > to copy its content before initializing the linear mapping entries. > This issue only happens when KASAN_VMALLOC is disabled. I had fixed > this already for kasan_shallow_populate_pud, but missed > kasan_populate_pud. > > Tomorrow I'll push the v3. It still does not fix the issue I describe > in the cover letter though, so still more work to do. At least, I was > able to reach userspace with your *exact* qemu command :) I can't find a v3. > > Alex > > >> >> /home/alex/work/qemu/build/riscv64-softmmu/qemu-system-riscv64 -M virt >> -bios /home/alex/work/opensbi/build/platform/generic/firmware/fw_dynamic.bin >> -kernel /home/alex/work/kernel-build/riscv_rv64_kernel/arch/riscv/boot/Image >> -netdev user,id=net0 -device virtio-net-device,netdev=net0 -drive >> file=/home/alex/work/kernel-build/rootfs.ext2,format=raw,id=hd0 >> -device virtio-blk-device,drive=hd0 -nographic -smp 4 -m 16G -s >> -append "rootwait earlycon root=/dev/vda ro earlyprintk=serial" >> >> I'm looking into all of this and will get back with a v3 soon :) >> >> Thanks, >> >> Alex >> >> >> >> >> >> >> > >> > A simple config with KASAN, KASAN_OUTLINE and DEBUG_VIRTUAL now indeed >> > leads to a booting kernel, which was not the case before. >> > make defconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- >> > ./scripts/config -e KASAN -e KASAN_OUTLINE -e DEBUG_VIRTUAL >> > make olddefconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- >> > >> > -- >> > Best Regards, >> > Aleksandr >> > >> > On Mon, Feb 21, 2022 at 5:17 PM Alexandre Ghiti >> > <alexandre.ghiti@canonical.com> wrote: >> > > >> > > __virt_to_phys function is called very early in the boot process (ie >> > > kasan_early_init) so it should not be instrumented by KASAN otherwise it >> > > bugs. >> > > >> > > Fix this by declaring phys_addr.c as non-kasan instrumentable. >> > > >> > > Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> >> > > --- >> > > arch/riscv/mm/Makefile | 3 +++ >> > > 1 file changed, 3 insertions(+) >> > > >> > > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile >> > > index 7ebaef10ea1b..ac7a25298a04 100644 >> > > --- a/arch/riscv/mm/Makefile >> > > +++ b/arch/riscv/mm/Makefile >> > > @@ -24,6 +24,9 @@ obj-$(CONFIG_KASAN) += kasan_init.o >> > > ifdef CONFIG_KASAN >> > > KASAN_SANITIZE_kasan_init.o := n >> > > KASAN_SANITIZE_init.o := n >> > > +ifdef CONFIG_DEBUG_VIRTUAL >> > > +KASAN_SANITIZE_physaddr.o := n >> > > +endif >> > > endif >> > > >> > > obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o >> > > -- >> > > 2.32.0 >> > >
diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile index 7ebaef10ea1b..ac7a25298a04 100644 --- a/arch/riscv/mm/Makefile +++ b/arch/riscv/mm/Makefile @@ -24,6 +24,9 @@ obj-$(CONFIG_KASAN) += kasan_init.o ifdef CONFIG_KASAN KASAN_SANITIZE_kasan_init.o := n KASAN_SANITIZE_init.o := n +ifdef CONFIG_DEBUG_VIRTUAL +KASAN_SANITIZE_physaddr.o := n +endif endif obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
__virt_to_phys function is called very early in the boot process (ie kasan_early_init) so it should not be instrumented by KASAN otherwise it bugs. Fix this by declaring phys_addr.c as non-kasan instrumentable. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> --- arch/riscv/mm/Makefile | 3 +++ 1 file changed, 3 insertions(+)