Message ID | Y9l0aBPUEpf1bci9@ZenIV (mailing list archive) |
---|---|
State | Awaiting Upstream |
Headers | show |
Series | [01/10] alpha: fix livelock in uaccess | expand |
Hello Al, On Tue, 31 Jan 2023, Al Viro wrote: > m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY > handling" If e.g. get_user() triggers a page fault and a fatal signal is > caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY > and not doing anything to page tables. In such case we must *not* > return to the faulting insn - that would repeat the entire thing without > making any progress; what we need instead is to treat that as failed > (user) memory access. > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> That could be a bug I was chasing back in 2021 but never found. The mmap stressors in stress-ng were triggering a crash on a Mac Quadras, though only rarely. Sometimes it would run all day without a failure. Last year when I started using GCC 12 to build the kernel, I saw the same workload fail again but the failure mode had become a silent hang/livelock instead of the oopses I got with GCC 6. When I press the NMI button after the livelock I always see do_page_fault() in the backtrace. So I've been testing your patch. I've been running the same stress-ng reproducer for about 12 hours now with no failures which looks promising. In case that stress-ng testing is of use: Tested-by: Finn Thain <fthain@linux-m68k.org> BTW, how did you identify that bug in do_page_fault()? If its the same bug I was chasing, it could be an old one. The stress-ng logs I collected last year include a crash from a v4.14 build. > --- > arch/m68k/mm/fault.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c > index 4d2837eb3e2a..228128e45c67 100644 > --- a/arch/m68k/mm/fault.c > +++ b/arch/m68k/mm/fault.c > @@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned long address, > fault = handle_mm_fault(vma, address, flags, regs); > pr_debug("handle_mm_fault returns %x\n", fault); > > - if (fault_signal_pending(fault, regs)) > + if (fault_signal_pending(fault, regs)) { > + if (!user_mode(regs)) > + goto no_context; > return 0; > + } > > /* The fault is fully completed (including releasing mmap lock) */ > if (fault & VM_FAULT_COMPLETED) >
On Sat, Feb 4, 2023 at 10:16 PM Finn Thain <fthain@linux-m68k.org> wrote: > > That could be a bug I was chasing back in 2021 but never found. The mmap > stressors in stress-ng were triggering a crash on a Mac Quadras, though > only rarely. Sometimes it would run all day without a failure. > > Last year when I started using GCC 12 to build the kernel, I saw the same > workload fail again but the failure mode had become a silent hang/livelock > instead of the oopses I got with GCC 6. > > When I press the NMI button after the livelock I always see > do_page_fault() in the backtrace. So I've been testing your patch. I've > been running the same stress-ng reproducer for about 12 hours now with no > failures which looks promising. > > In case that stress-ng testing is of use: > Tested-by: Finn Thain <fthain@linux-m68k.org> Could you test the thing that Mark Rutland pointed to? He had an actual test-case for this for the arm64 fixes some years ago. See https://lore.kernel.org/all/Y9pD+TMP+%2FSyfeJm@FVFF77S0Q05N/ for his email with links to his old test-case? Linus
On Sun, Feb 05, 2023 at 05:18:08PM +1100, Finn Thain wrote: > That could be a bug I was chasing back in 2021 but never found. The mmap > stressors in stress-ng were triggering a crash on a Mac Quadras, though > only rarely. Sometimes it would run all day without a failure. > > Last year when I started using GCC 12 to build the kernel, I saw the same > workload fail again but the failure mode had become a silent hang/livelock > instead of the oopses I got with GCC 6. > > When I press the NMI button after the livelock I always see > do_page_fault() in the backtrace. So I've been testing your patch. I've > been running the same stress-ng reproducer for about 12 hours now with no > failures which looks promising. > > In case that stress-ng testing is of use: > Tested-by: Finn Thain <fthain@linux-m68k.org> > > BTW, how did you identify that bug in do_page_fault()? If its the same bug > I was chasing, it could be an old one. The stress-ng logs I collected last > year include a crash from a v4.14 build. Went to reread the current state of mm/gup.c, decided to reread handle_mm_fault() and its callers, noticed fault_signal_pending() which hadn't been there back when I last crawled through that area, realized what it had replaced, went to check if everything had been converted (arch/um got missed, BTW). Noticed the difference between the architectures (the first hit was on alpha, without the "sod off to no_context if it's a user fault" logics, the last - xtensa, with it). Checked the log for xtensa, found the commit from 2021 adding that part; looked on arm and arm64, found commits from 2017 doing the same thing, then, on x86, Linus' commit from 2014 adding the x86 counterpart... Figuring out what all of those had been for wasn't particularly hard, and it was easy to check which architectures still needed the same thing... BTW, since these patches would be much easier to backport than any unification work, I think the right thing to do would be to have further unification done on top of them.
On Sun, Feb 5, 2023 at 12:39 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > BTW, since these patches would be much easier to backport than any unification > work, I think the right thing to do would be to have further unification done on > top of them. Ack. I'm not NAKing the patches, I was just hoping that we also have some way forward. So "fix the issues, then unify" sounds like the right thing to do to me. Linus
On Tue, Jan 31, 2023 at 9:06 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" > If e.g. get_user() triggers a page fault and a fatal signal is caught, we might > end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything > to page tables. In such case we must *not* return to the faulting insn - > that would repeat the entire thing without making any progress; what we need > instead is to treat that as failed (user) memory access. > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Sun, 5 Feb 2023, Linus Torvalds wrote: > On Sat, Feb 4, 2023 at 10:16 PM Finn Thain <fthain@linux-m68k.org> wrote: > > > > That could be a bug I was chasing back in 2021 but never found. The mmap > > stressors in stress-ng were triggering a crash on a Mac Quadras, though > > only rarely. Sometimes it would run all day without a failure. > > > > Last year when I started using GCC 12 to build the kernel, I saw the same > > workload fail again but the failure mode had become a silent hang/livelock > > instead of the oopses I got with GCC 6. > > > > When I press the NMI button after the livelock I always see > > do_page_fault() in the backtrace. So I've been testing your patch. I've > > been running the same stress-ng reproducer for about 12 hours now with no > > failures which looks promising. > > > > In case that stress-ng testing is of use: > > Tested-by: Finn Thain <fthain@linux-m68k.org> > > Could you test the thing that Mark Rutland pointed to? He had an > actual test-case for this for the arm64 fixes some years ago. > > See > > https://lore.kernel.org/all/Y9pD+TMP+%2FSyfeJm@FVFF77S0Q05N/ > > for his email with links to his old test-case? > With qemu-system-m68k v7.2 runing Linux v6.1, killing Mark's program did not trigger any failure. With my Quadra 650 running the same binaries, killing Mark's program produces the BUG splat below. I can confirm that Al's patch fixes it. [ 199.060000] BUG: scheduling while atomic: test/48/0x0003e2e6 [ 199.060000] Modules linked in: [ 199.060000] CPU: 0 PID: 48 Comm: test Tainted: G W 6.1.0-mac #3 [ 199.060000] Stack from 00a6204c: [ 199.060000] 00a6204c 004c1979 004c1979 00000000 00a6215e 00a6206c 00408d62 004c1979 [ 199.060000] 00a62074 0003ad4a 00a620a8 004099e2 00983480 00000000 00a6215e 00000000 [ 199.060000] 00000002 00983480 00a62000 0040970a 00a6209c 00a6209c 00a620c0 00a620c0 [ 199.060000] 00409b94 00000000 00bcf0c0 00a62198 00a357fc 00a6216c 00120e5c 00bcf0d0 [ 199.060000] 00000003 00000001 00000001 0123f039 00000000 00a357fc 00aead38 005870dc [ 199.060000] 005870dc 00a31200 00000000 00000000 00000000 00010000 0000c000 00000000 [ 199.060000] Call Trace: [<00408d62>] dump_stack+0x10/0x16 [ 199.060000] [<0003ad4a>] __schedule_bug+0x5e/0x70 [ 199.060000] [<004099e2>] __schedule+0x2d8/0x446 [ 199.060000] [<0040970a>] __schedule+0x0/0x446 [ 199.060000] [<00409b94>] schedule+0x44/0x8e [ 199.060000] [<00120e5c>] handle_userfault+0x298/0x3de [ 199.060000] [<00010000>] zer_rp2+0x14/0x18 [ 199.060000] [<0000c000>] cu_dmrs+0x0/0x16 [ 199.060000] [<00001200>] kernel_pg_dir+0x200/0x1000 [ 199.060000] [<00010000>] zer_rp2+0x14/0x18 [ 199.060000] [<0000c000>] cu_dmrs+0x0/0x16 [ 199.060000] [<00001200>] kernel_pg_dir+0x200/0x1000 [ 199.060000] [<00010000>] zer_rp2+0x14/0x18 [ 199.060000] [<0000c000>] cu_dmrs+0x0/0x16 [ 199.060000] [<000ab0a0>] handle_mm_fault+0xa34/0xa56 [ 199.060000] [<000c0000>] __alloc_pages_bulk+0x26/0x3f8 [ 199.060000] [<00007056>] do_page_fault+0xd8/0x28a [ 199.060000] [<00006098>] buserr_c+0x1a6/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<000056fc>] do_040writeback1+0x0/0x1d8 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<000062cc>] buserr_c+0x3da/0x6b0 [ 199.060000] [<00001000>] kernel_pg_dir+0x0/0x1000 [ 199.060000] [<00002ac0>] buserr+0x20/0x28 [ 199.060000] [<00001000>] kernel_pg_dir+0x0/0x1000 [ 199.060000] [<00008001>] mac_irq_disable+0x3b/0x98 [ 199.060000] [<00001000>] kernel_pg_dir+0x0/0x1000 [ 199.060000] [<00008001>] mac_irq_disable+0x3b/0x98 [ 199.060000]
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c index 4d2837eb3e2a..228128e45c67 100644 --- a/arch/m68k/mm/fault.c +++ b/arch/m68k/mm/fault.c @@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned long address, fault = handle_mm_fault(vma, address, flags, regs); pr_debug("handle_mm_fault returns %x\n", fault); - if (fault_signal_pending(fault, regs)) + if (fault_signal_pending(fault, regs)) { + if (!user_mode(regs)) + goto no_context; return 0; + } /* The fault is fully completed (including releasing mmap lock) */ if (fault & VM_FAULT_COMPLETED)
m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- arch/m68k/mm/fault.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)