Message ID | 20210318151010.100966-1-ben.dooks@codethink.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | RFC: riscv: save the SR_SUM status over switches | expand |
Hi Ben, Le 3/18/21 à 11:10 AM, Ben Dooks a écrit : > When threads/tasks are switched we need to ensure the old execution's > SR_SUM state is saved and the new thread has the old SR_SUM state > restored. > > The issue is seen under heavy load especially with the syz-stress tool > running, with crashes as follows in schedule_tail: > > Unable to handle kernel access to user memory without uaccess routines at virtual address 000000002749f0d0 > Oops [#1] > Modules linked in: > CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0 > Hardware name: riscv-virtio,qemu (DT) > epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 > ra : task_pid_vnr include/linux/sched.h:1421 [inline] > ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264 > epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0 > gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000 > t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0 > s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003 > a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00 > a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba > s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0 > s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850 > s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8 > s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2 > t5 : ffffffc4043cafba t6 : 0000000000040000 > status: 0000000000000120 badaddr: 000000002749f0d0 cause: 000000000000000f > Call Trace: > [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 > [<ffffffe000005570>] ret_from_exception+0x0/0x14 > Dumping ftrace buffer: > (ftrace buffer empty) > ---[ end trace b5f8f9231dc87dda ]--- > > The issue comes from the put_user() in schedule_tail (kernel/sched/core.c) > doing the following: > > asmlinkage __visible void schedule_tail(struct task_struct *prev) > { > ... > if (current->set_child_tid) > put_user(task_pid_vnr(current), current->set_child_tid); > ... > } > > the put_user() macro causes the code sequence to come out as follows: > > 1: __enable_user_access() > 2: reg = task_pid_vnr(current); > 3: *current->set_child_tid = reg; > 4: __disable_user_access() > > This means the task_pid_vnr() is being called with user-access enabled > which itself is not a good idea, but that is a seperate issue. Here we > have a function that /might/ sleep being called with the SR_SUM and if > it does, then it returns with the SR_SUM flag possibly cleared thus > causing the above abort. > > To try and deal with this, and stop the SR_SUM leaking out into other > threads (this has also been tested and see under stress. It can rarely > happen but it /does/ under load) make sure the __switch_to() will save > and restore the SR_SUM flag, and clear it possibly for the next thread > if it does not need it. > > Note, test code to be supplied once other checks have been finished. Very nice, good catch ! > > There may be further issues with the mstatus flags with this, this > can be discussed further once some initial testing has been done. Yes, that must be discussed, I don't have opinion right now but if this is the only bit to save/restore, we may consider other options: disable preemption, using __typeof(x) like suggested by Dmitry to avoid being scheduled and maybe other options. > > Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com > Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> > Cc: Dmitry Vyukov <dvyukov@google.com> > Cc: Javier Jardon <javier.jardon@codethink.co.uk> > --- > arch/riscv/include/asm/processor.h | 1 + > arch/riscv/kernel/asm-offsets.c | 5 +++++ > arch/riscv/kernel/entry.S | 7 +++++++ > 3 files changed, 13 insertions(+) > > diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h > index 021ed64ee608..e2398c6e7af9 100644 > --- a/arch/riscv/include/asm/processor.h > +++ b/arch/riscv/include/asm/processor.h > @@ -35,6 +35,7 @@ struct thread_struct { > unsigned long s[12]; /* s[0]: frame pointer */ > struct __riscv_d_ext_state fstate; > unsigned long bad_cause; > + unsigned long flags; I would not call it "flags", I'd rather use "status", named after CSR_STATUS. > }; > > #define INIT_THREAD { \ > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c > index 9ef33346853c..0036570752d9 100644 > --- a/arch/riscv/kernel/asm-offsets.c > +++ b/arch/riscv/kernel/asm-offsets.c > @@ -29,6 +29,7 @@ void asm_offsets(void) > OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]); > OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]); > OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]); > + OFFSET(TASK_THREAD_FLAGS, task_struct, thread.flags); > OFFSET(TASK_TI_FLAGS, task_struct, thread_info.flags); > OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count); > OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp); > @@ -172,6 +173,10 @@ void asm_offsets(void) > offsetof(struct task_struct, thread.s[11]) > - offsetof(struct task_struct, thread.ra) > ); > + DEFINE(TASK_THREAD_FLAGS_RA, > + offsetof(struct task_struct, thread.flags) > + - offsetof(struct task_struct, thread.ra) > + ); > > DEFINE(TASK_THREAD_F0_F0, > offsetof(struct task_struct, thread.fstate.f[0]) > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S > index 744f3209c48d..6137f6c2a2ad 100644 > --- a/arch/riscv/kernel/entry.S > +++ b/arch/riscv/kernel/entry.S > @@ -417,7 +417,14 @@ ENTRY(__switch_to) > REG_S s9, TASK_THREAD_S9_RA(a3) > REG_S s10, TASK_THREAD_S10_RA(a3) > REG_S s11, TASK_THREAD_S11_RA(a3) > + /* save (and disable the user space access flag) */ > + li s0, SR_SUM > + csrrc s1, CSR_STATUS, s0 I don't think that clearing SR_SUM is needed here since you restore its value later on. > + REG_S s1, TASK_THREAD_FLAGS_RA(a3) > + > /* Restore context from next->thread */ > + REG_L s0, TASK_THREAD_FLAGS_RA(a4) > + csrs CSR_STATUS, s0 > REG_L ra, TASK_THREAD_RA_RA(a4) > REG_L sp, TASK_THREAD_SP_RA(a4) > REG_L s0, TASK_THREAD_S0_RA(a4) > Again, very nice finding, well done :) Alex
On 18/03/2021 16:57, Alex Ghiti wrote: > Hi Ben, > > Le 3/18/21 à 11:10 AM, Ben Dooks a écrit : >> When threads/tasks are switched we need to ensure the old execution's >> SR_SUM state is saved and the new thread has the old SR_SUM state >> restored. >> >> The issue is seen under heavy load especially with the syz-stress tool >> running, with crashes as follows in schedule_tail: >> >> Unable to handle kernel access to user memory without uaccess routines >> at virtual address 000000002749f0d0 >> Oops [#1] >> Modules linked in: >> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted >> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0 >> Hardware name: riscv-virtio,qemu (DT) >> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 >> ra : task_pid_vnr include/linux/sched.h:1421 [inline] >> ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264 >> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0 >> gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000 >> t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0 >> s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003 >> a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00 >> a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba >> s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0 >> s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850 >> s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8 >> s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2 >> t5 : ffffffc4043cafba t6 : 0000000000040000 >> status: 0000000000000120 badaddr: 000000002749f0d0 cause: >> 000000000000000f >> Call Trace: >> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 >> [<ffffffe000005570>] ret_from_exception+0x0/0x14 >> Dumping ftrace buffer: >> (ftrace buffer empty) >> ---[ end trace b5f8f9231dc87dda ]--- >> >> The issue comes from the put_user() in schedule_tail >> (kernel/sched/core.c) >> doing the following: >> >> asmlinkage __visible void schedule_tail(struct task_struct *prev) >> { >> ... >> if (current->set_child_tid) >> put_user(task_pid_vnr(current), current->set_child_tid); >> ... >> } >> >> the put_user() macro causes the code sequence to come out as follows: >> >> 1: __enable_user_access() >> 2: reg = task_pid_vnr(current); >> 3: *current->set_child_tid = reg; >> 4: __disable_user_access() >> >> This means the task_pid_vnr() is being called with user-access enabled >> which itself is not a good idea, but that is a seperate issue. Here we >> have a function that /might/ sleep being called with the SR_SUM and if >> it does, then it returns with the SR_SUM flag possibly cleared thus >> causing the above abort. >> >> To try and deal with this, and stop the SR_SUM leaking out into other >> threads (this has also been tested and see under stress. It can rarely >> happen but it /does/ under load) make sure the __switch_to() will save >> and restore the SR_SUM flag, and clear it possibly for the next thread >> if it does not need it. >> >> Note, test code to be supplied once other checks have been finished. > > Very nice, good catch ! Yes, this has taken about 3 days of work to find (thank you to Codethink for not making me do more boring work). I'll try and make a blog post, a talk and some of the tools/patches used available for people to look at. So far this patch has been running syz-stress under qemu-riscv64 for about 2 hours without an issue. My heating bill might be a bit reduced for today. >> >> There may be further issues with the mstatus flags with this, this >> can be discussed further once some initial testing has been done. > > Yes, that must be discussed, I don't have opinion right now but if this > is the only bit to save/restore, we may consider other options: disable > preemption, using __typeof(x) like suggested by Dmitry to avoid being > scheduled and maybe other options. I had a go at the typeof, it made a lot of messy kernel warnings and I am not sure if it will actually fully fix things as gcc may well move things around anyway. There really does need to be a fix for the amount of code that might be executed under put_user(), but I think the correct thing to do is to ensure that the flags are saved over the switch_to(). I'm not sure if disabling pre-emption is great either as there are now kernel sections where the access is disabled for using the "unsafe" versions of the user space access. >> >> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com >> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> >> Cc: Dmitry Vyukov <dvyukov@google.com> >> Cc: Javier Jardon <javier.jardon@codethink.co.uk> >> --- >> arch/riscv/include/asm/processor.h | 1 + >> arch/riscv/kernel/asm-offsets.c | 5 +++++ >> arch/riscv/kernel/entry.S | 7 +++++++ >> 3 files changed, 13 insertions(+) >> >> diff --git a/arch/riscv/include/asm/processor.h >> b/arch/riscv/include/asm/processor.h >> index 021ed64ee608..e2398c6e7af9 100644 >> --- a/arch/riscv/include/asm/processor.h >> +++ b/arch/riscv/include/asm/processor.h >> @@ -35,6 +35,7 @@ struct thread_struct { >> unsigned long s[12]; /* s[0]: frame pointer */ >> struct __riscv_d_ext_state fstate; >> unsigned long bad_cause; >> + unsigned long flags; > > I would not call it "flags", I'd rather use "status", named after > CSR_STATUS. Ok, good idea. Not sure if it should be status or csr_status. > >> }; >> #define INIT_THREAD { \ >> diff --git a/arch/riscv/kernel/asm-offsets.c >> b/arch/riscv/kernel/asm-offsets.c >> index 9ef33346853c..0036570752d9 100644 >> --- a/arch/riscv/kernel/asm-offsets.c >> +++ b/arch/riscv/kernel/asm-offsets.c >> @@ -29,6 +29,7 @@ void asm_offsets(void) >> OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]); >> OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]); >> OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]); >> + OFFSET(TASK_THREAD_FLAGS, task_struct, thread.flags); >> OFFSET(TASK_TI_FLAGS, task_struct, thread_info.flags); >> OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, >> thread_info.preempt_count); >> OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp); >> @@ -172,6 +173,10 @@ void asm_offsets(void) >> offsetof(struct task_struct, thread.s[11]) >> - offsetof(struct task_struct, thread.ra) >> ); >> + DEFINE(TASK_THREAD_FLAGS_RA, >> + offsetof(struct task_struct, thread.flags) >> + - offsetof(struct task_struct, thread.ra) >> + ); >> DEFINE(TASK_THREAD_F0_F0, >> offsetof(struct task_struct, thread.fstate.f[0]) >> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S >> index 744f3209c48d..6137f6c2a2ad 100644 >> --- a/arch/riscv/kernel/entry.S >> +++ b/arch/riscv/kernel/entry.S >> @@ -417,7 +417,14 @@ ENTRY(__switch_to) >> REG_S s9, TASK_THREAD_S9_RA(a3) >> REG_S s10, TASK_THREAD_S10_RA(a3) >> REG_S s11, TASK_THREAD_S11_RA(a3) >> + /* save (and disable the user space access flag) */ >> + li s0, SR_SUM >> + csrrc s1, CSR_STATUS, s0 > > I don't think that clearing SR_SUM is needed here since you restore its > value later on. The restore I think requires the flag to have been cleared. >> + REG_S s1, TASK_THREAD_FLAGS_RA(a3) >> + >> /* Restore context from next->thread */ >> + REG_L s0, TASK_THREAD_FLAGS_RA(a4) >> + csrs CSR_STATUS, s0 >> REG_L ra, TASK_THREAD_RA_RA(a4) >> REG_L sp, TASK_THREAD_SP_RA(a4) >> REG_L s0, TASK_THREAD_S0_RA(a4) >> > > Again, very nice finding, well done :) > > Alex > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv >
On 18/03/2021 17:42, Ben Dooks wrote: > On 18/03/2021 16:57, Alex Ghiti wrote: >> Hi Ben, >> >> Le 3/18/21 à 11:10 AM, Ben Dooks a écrit : >>> When threads/tasks are switched we need to ensure the old execution's >>> SR_SUM state is saved and the new thread has the old SR_SUM state >>> restored. >>> >>> The issue is seen under heavy load especially with the syz-stress tool >>> running, with crashes as follows in schedule_tail: >>> >>> Unable to handle kernel access to user memory without uaccess >>> routines at virtual address 000000002749f0d0 >>> Oops [#1] >>> Modules linked in: [snip] I've had syz-stress running over the weekend now with no further occurrence of the fault. I'll put the put_user() patch out now and will further investigate flags over __switch_to()
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h index 021ed64ee608..e2398c6e7af9 100644 --- a/arch/riscv/include/asm/processor.h +++ b/arch/riscv/include/asm/processor.h @@ -35,6 +35,7 @@ struct thread_struct { unsigned long s[12]; /* s[0]: frame pointer */ struct __riscv_d_ext_state fstate; unsigned long bad_cause; + unsigned long flags; }; #define INIT_THREAD { \ diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c index 9ef33346853c..0036570752d9 100644 --- a/arch/riscv/kernel/asm-offsets.c +++ b/arch/riscv/kernel/asm-offsets.c @@ -29,6 +29,7 @@ void asm_offsets(void) OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]); OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]); OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]); + OFFSET(TASK_THREAD_FLAGS, task_struct, thread.flags); OFFSET(TASK_TI_FLAGS, task_struct, thread_info.flags); OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count); OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp); @@ -172,6 +173,10 @@ void asm_offsets(void) offsetof(struct task_struct, thread.s[11]) - offsetof(struct task_struct, thread.ra) ); + DEFINE(TASK_THREAD_FLAGS_RA, + offsetof(struct task_struct, thread.flags) + - offsetof(struct task_struct, thread.ra) + ); DEFINE(TASK_THREAD_F0_F0, offsetof(struct task_struct, thread.fstate.f[0]) diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S index 744f3209c48d..6137f6c2a2ad 100644 --- a/arch/riscv/kernel/entry.S +++ b/arch/riscv/kernel/entry.S @@ -417,7 +417,14 @@ ENTRY(__switch_to) REG_S s9, TASK_THREAD_S9_RA(a3) REG_S s10, TASK_THREAD_S10_RA(a3) REG_S s11, TASK_THREAD_S11_RA(a3) + /* save (and disable the user space access flag) */ + li s0, SR_SUM + csrrc s1, CSR_STATUS, s0 + REG_S s1, TASK_THREAD_FLAGS_RA(a3) + /* Restore context from next->thread */ + REG_L s0, TASK_THREAD_FLAGS_RA(a4) + csrs CSR_STATUS, s0 REG_L ra, TASK_THREAD_RA_RA(a4) REG_L sp, TASK_THREAD_SP_RA(a4) REG_L s0, TASK_THREAD_S0_RA(a4)
When threads/tasks are switched we need to ensure the old execution's SR_SUM state is saved and the new thread has the old SR_SUM state restored. The issue is seen under heavy load especially with the syz-stress tool running, with crashes as follows in schedule_tail: Unable to handle kernel access to user memory without uaccess routines at virtual address 000000002749f0d0 Oops [#1] Modules linked in: CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0 Hardware name: riscv-virtio,qemu (DT) epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 ra : task_pid_vnr include/linux/sched.h:1421 [inline] ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264 epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0 gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000 t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0 s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003 a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00 a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0 s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850 s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8 s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2 t5 : ffffffc4043cafba t6 : 0000000000040000 status: 0000000000000120 badaddr: 000000002749f0d0 cause: 000000000000000f Call Trace: [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 [<ffffffe000005570>] ret_from_exception+0x0/0x14 Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace b5f8f9231dc87dda ]--- The issue comes from the put_user() in schedule_tail (kernel/sched/core.c) doing the following: asmlinkage __visible void schedule_tail(struct task_struct *prev) { ... if (current->set_child_tid) put_user(task_pid_vnr(current), current->set_child_tid); ... } the put_user() macro causes the code sequence to come out as follows: 1: __enable_user_access() 2: reg = task_pid_vnr(current); 3: *current->set_child_tid = reg; 4: __disable_user_access() This means the task_pid_vnr() is being called with user-access enabled which itself is not a good idea, but that is a seperate issue. Here we have a function that /might/ sleep being called with the SR_SUM and if it does, then it returns with the SR_SUM flag possibly cleared thus causing the above abort. To try and deal with this, and stop the SR_SUM leaking out into other threads (this has also been tested and see under stress. It can rarely happen but it /does/ under load) make sure the __switch_to() will save and restore the SR_SUM flag, and clear it possibly for the next thread if it does not need it. Note, test code to be supplied once other checks have been finished. There may be further issues with the mstatus flags with this, this can be discussed further once some initial testing has been done. Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Javier Jardon <javier.jardon@codethink.co.uk> --- arch/riscv/include/asm/processor.h | 1 + arch/riscv/kernel/asm-offsets.c | 5 +++++ arch/riscv/kernel/entry.S | 7 +++++++ 3 files changed, 13 insertions(+)