Message ID | 20220426164315.625149-29-glider@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Add KernelMemorySanitizer infrastructure | expand |
On Tue, Apr 26 2022 at 18:42, Alexander Potapenko wrote: Can you please use 'entry:' as prefix. Slapping kmsan in front of everything does not really make sense. > Replace instrumentation_begin() with instrumentation_begin_with_regs() > to let KMSAN handle the non-instrumented code and unpoison pt_regs > passed from the instrumented part. That should be: from the non-instrumented part or passed to the instrumented part right? > --- a/kernel/entry/common.c > +++ b/kernel/entry/common.c > @@ -23,7 +23,7 @@ static __always_inline void __enter_from_user_mode(struct pt_regs *regs) > CT_WARN_ON(ct_state() != CONTEXT_USER); > user_exit_irqoff(); > > - instrumentation_begin(); > + instrumentation_begin_with_regs(regs); I can see what you are trying to do, but this will end up doing the same thing over and over. Let's just look at a syscall. __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr) { ... nr = syscall_enter_from_user_mode(regs, nr) __enter_from_user_mode(regs) ..... instrumentation_begin_with_regs(regs); .... instrumentation_begin_with_regs(regs); .... instrumentation_begin_with_regs(regs); if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) { /* Invalid system call, but still a system call. */ regs->ax = __x64_sys_ni_syscall(regs); } instrumentation_end(); syscall_exit_to_user_mode(regs); instrumentation_begin_with_regs(regs); __syscall_exit_to_user_mode_work(regs); instrumentation_end(); __exit_to_user_mode(); That means you memset state four times and unpoison regs four times. I'm not sure whether that's desired. instrumentation_begin()/end() are not really suitable IMO. They were added to allow objtool to validate that nothing escapes into instrumentable code unless annotated accordingly. Thanks, tglx
On Wed, Apr 27, 2022 at 3:32 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Tue, Apr 26 2022 at 18:42, Alexander Potapenko wrote: > > Can you please use 'entry:' as prefix. Slapping kmsan in front of > everything does not really make sense. Sure, will do. > > Replace instrumentation_begin() with instrumentation_begin_with_regs() > > to let KMSAN handle the non-instrumented code and unpoison pt_regs > > passed from the instrumented part. > > That should be: > > from the non-instrumented part > or > passed to the instrumented part > > right? That should be "from the non-instrumented part", you are right. > > --- a/kernel/entry/common.c > > +++ b/kernel/entry/common.c > > @@ -23,7 +23,7 @@ static __always_inline void __enter_from_user_mode(struct pt_regs *regs) > > CT_WARN_ON(ct_state() != CONTEXT_USER); > > user_exit_irqoff(); > > > > - instrumentation_begin(); > > + instrumentation_begin_with_regs(regs); > > I can see what you are trying to do, but this will end up doing the same > thing over and over. Let's just look at a syscall. > > __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr) > { > ... > nr = syscall_enter_from_user_mode(regs, nr) > > __enter_from_user_mode(regs) > ..... > instrumentation_begin_with_regs(regs); > .... > > instrumentation_begin_with_regs(regs); > .... > > instrumentation_begin_with_regs(regs); > > if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) { > /* Invalid system call, but still a system call. */ > regs->ax = __x64_sys_ni_syscall(regs); > } > > instrumentation_end(); > > syscall_exit_to_user_mode(regs); > instrumentation_begin_with_regs(regs); > __syscall_exit_to_user_mode_work(regs); > instrumentation_end(); > __exit_to_user_mode(); > > That means you memset state four times and unpoison regs four times. I'm > not sure whether that's desired. Regarding the regs, you are right. It should be enough to unpoison the regs at idtentry prologue instead. I tried that initially, but IIRC it required patching each of the DEFINE_IDTENTRY_XXX macros, which already use instrumentation_begin(). This decision can probably be revisited. As for the state, what we are doing here is still not enough, although it appears to work. Every time an instrumented function calls another function, it sets up the metadata for the function arguments in the per-task struct kmsan_context_state. Similarly, every instrumented function expects its caller to put the metadata into that structure. Now, if a non-instrumented function (e.g. every `noinstr` function) calls an instrumented one (which happens inside the instrumentation_begin()/instrumentation_end() region), nobody sets up the state for that instrumented function, so it may report false positives when accessing its arguments, if there are leftover poisoned values in the state. To overcome this problem, ideally we need to wipe kmsan_context_state every time a call from the non-instrumented function occurs. But this cannot be done automatically exactly because we cannot instrument the named function :) We therefore apply an approximation, wiping the state at the point of the first transition between instrumented and non-instrumented code. Because poison values are generally rare, and instrumented regions tend to be short, it is unlikely that further calls from the same non-instrumented function will result in false positives. Yet it is not completely impossible, so wiping the state for the second/third etc. time won't hurt. > > instrumentation_begin()/end() are not really suitable IMO. They were > added to allow objtool to validate that nothing escapes into > instrumentable code unless annotated accordingly. An alternative to this would be adding some extra code unpoisoning the state to every non-instrumented function that contains an instrumented region. That code would have to precede the first instrumentation_begin() anyway, so I thought it would be reasonable to piggyback on the existing annotation. > > Thanks, > > tglx
Alexander, On Mon, May 02 2022 at 19:00, Alexander Potapenko wrote: > On Wed, Apr 27, 2022 at 3:32 PM Thomas Gleixner <tglx@linutronix.de> wrote: >> > --- a/kernel/entry/common.c >> > +++ b/kernel/entry/common.c >> > @@ -23,7 +23,7 @@ static __always_inline void __enter_from_user_mode(struct pt_regs *regs) >> > CT_WARN_ON(ct_state() != CONTEXT_USER); >> > user_exit_irqoff(); >> > >> > - instrumentation_begin(); >> > + instrumentation_begin_with_regs(regs); >> >> I can see what you are trying to do, but this will end up doing the same >> thing over and over. Let's just look at a syscall. >> >> __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr) >> { >> ... >> nr = syscall_enter_from_user_mode(regs, nr) >> >> __enter_from_user_mode(regs) >> ..... >> instrumentation_begin_with_regs(regs); >> .... >> >> instrumentation_begin_with_regs(regs); >> .... >> >> instrumentation_begin_with_regs(regs); >> >> if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) { >> /* Invalid system call, but still a system call. */ >> regs->ax = __x64_sys_ni_syscall(regs); >> } >> >> instrumentation_end(); >> >> syscall_exit_to_user_mode(regs); >> instrumentation_begin_with_regs(regs); >> __syscall_exit_to_user_mode_work(regs); >> instrumentation_end(); >> __exit_to_user_mode(); >> >> That means you memset state four times and unpoison regs four times. I'm >> not sure whether that's desired. > > Regarding the regs, you are right. It should be enough to unpoison the > regs at idtentry prologue instead. > I tried that initially, but IIRC it required patching each of the > DEFINE_IDTENTRY_XXX macros, which already use instrumentation_begin(). Exactly 4 instances :) > This decision can probably be revisited. It has to be revisited because the whole thing is incomplete if this is not addressed. > As for the state, what we are doing here is still not enough, although > it appears to work. > > Every time an instrumented function calls another function, it sets up > the metadata for the function arguments in the per-task struct > kmsan_context_state. > Similarly, every instrumented function expects its caller to put the > metadata into that structure. > Now, if a non-instrumented function (e.g. every `noinstr` function) > calls an instrumented one (which happens inside the > instrumentation_begin()/instrumentation_end() region), nobody sets up > the state for that instrumented function, so it may report false > positives when accessing its arguments, if there are leftover poisoned > values in the state. > > To overcome this problem, ideally we need to wipe kmsan_context_state > every time a call from the non-instrumented function occurs. > But this cannot be done automatically exactly because we cannot > instrument the named function :) > > We therefore apply an approximation, wiping the state at the point of > the first transition between instrumented and non-instrumented code. > Because poison values are generally rare, and instrumented regions > tend to be short, it is unlikely that further calls from the same > non-instrumented function will result in false positives. > Yet it is not completely impossible, so wiping the state for the > second/third etc. time won't hurt. Understood. But if I understand you correctly: > Similarly, every instrumented function expects its caller to put the > metadata into that structure. then instrumentation_begin(); foo(fargs...); bar(bargs...); instrumentation_end(); is a source of potential false positives because the state is not guaranteed to be correct, neither for foo() nor for bar(), even if you wipe the state in instrumentation_begin(), right? This approximation approach smells fishy and it's inevitably going to be a constant source of 'add yet another kmsan annotation/fixup' patches, which I'm not interested in at all. As this needs compiler support anyway, then why not doing the obvious: #define noinstr \ .... __kmsan_conditional #define instrumentation_begin() \ ..... __kmsan_cond_begin #define instrumentation_end() \ __kmsan_cond_end ....... and let the compiler stick whatever is required into that code section between instrumentation_begin() and instrumentation_end()? That's not violating any of the noinstr constraints at all. In fact we allow _any_ instrumentation to be placed between this two points. We have tracepoints there today. We could also allow breakpoints, kprobes or whatever, but handling this at that granularity level for a production kernel is just overkill and the code in those instrumentable sections is usually not that interesting as it's mostly function calls. But if the compiler converts instrumentation_begin(); foo(fargs...); bar(bargs...); instrumentation_end(); to instrumentation_begin(); kmsan_instr_begin_magic(); kmsan_magic(fargs...); foo(fargs...); kmsan_magic(bargs...); bar(bargs...); kmsan_instr_end_magic(); instrumentation_end(); for the kmsan case and leaves anything outside of these sections alone, then you have: - a minimal code change - the best possible coverage - the least false positive crap to chase and annotate IOW, a solution which is solid and future proof. I'm all for making use of advanced instrumentation, validation and debugging features, but this mindset of 'make the code comply to what the tool of today provides' is fundamentally wrong. Tools have to provide value to the programmer and not the other way round. Yes, it's more work on the tooling side, but the tooling side is mostly a one time effort while chasing the false positives is a long term nightmare. Thanks, tglx
On Tue, May 3, 2022 at 12:00 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > Alexander, First of all, thanks a lot for the comments, those are greatly appreciated! I tried to revert this patch and the previous one ("kmsan: instrumentation.h: add instrumentation_begin_with_regs()") and reimplement unpoisoning pt_regs without breaking into instrumentation_begin(), see below. > > > > Regarding the regs, you are right. It should be enough to unpoison the > > regs at idtentry prologue instead. > > I tried that initially, but IIRC it required patching each of the > > DEFINE_IDTENTRY_XXX macros, which already use instrumentation_begin(). > > Exactly 4 instances :) > Not really, I had to add a call to `kmsan_unpoison_memory(regs, sizeof(*regs));` to the following places in arch/x86/include/asm/idtentry.h: - DEFINE_IDTENTRY() - DEFINE_IDTENTRY_ERRORCODE() - DEFINE_IDTENTRY_RAW() - DEFINE_IDTENTRY_RAW_ERRORCODE() - DEFINE_IDTENTRY_IRQ() - DEFINE_IDTENTRY_SYSVEC() - DEFINE_IDTENTRY_SYSVEC_SIMPLE() - DEFINE_IDTENTRY_DF() , but even that wasn't enough. For some reason I also had to unpoison pt_regs directly in DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt) and DEFINE_IDTENTRY_IRQ(common_interrupt). In the latter case, this could have been caused by asm_common_interrupt being entered from irq_entries_start(), but I am not sure what is so special about sysvec_apic_timer_interrupt(). Ideally, it would be great to find that single point where pt_regs are set up before being passed to all IDT entries. I used to do that by inserting calls to kmsan_unpoison_memory right into arch/x86/entry/entry_64.S (https://github.com/google/kmsan/commit/3b0583f45f74f3a09f4c7e0e0588169cef918026), but that required storing/restoring all GP registers. Maybe there's a better way? > > then > > instrumentation_begin(); > foo(fargs...); > bar(bargs...); > instrumentation_end(); > > is a source of potential false positives because the state is not > guaranteed to be correct, neither for foo() nor for bar(), even if you > wipe the state in instrumentation_begin(), right? Yes, this is right. > This approximation approach smells fishy and it's inevitably going to be > a constant source of 'add yet another kmsan annotation/fixup' patches, > which I'm not interested in at all. > > As this needs compiler support anyway, then why not doing the obvious: > > #define noinstr \ > .... __kmsan_conditional > > #define instrumentation_begin() \ > ..... __kmsan_cond_begin > > #define instrumentation_end() \ > __kmsan_cond_end ....... > > and let the compiler stick whatever is required into that code section > between instrumentation_begin() and instrumentation_end()? We define noinstr as __attribute__((disable_sanitizer_instrumentation)) (https://llvm.org/docs/LangRef.html#:~:text=disable_sanitizer_instrumentation), which means no instrumentation will be applied to the annotated function. Changing that behavior by adding subregions that can be instrumented sounds questionable. C also doesn't have good syntactic means to define these subregions - perhaps some __xxx_begin()/__xxx_end() intrinsics would work, but they would require both compile-time and run-time validation. Fortunately, I don't think we need to insert extra instrumentation into instrumentation_begin()/instrumentation_end() regions. What I have in mind is adding a bool flag to kmsan_context_state, that the instrumentation sets to true before the function call. When entering an instrumented function, KMSAN would check that flag and set it to false, so that the context state can be only used once. If a function is called from another instrumented function, the context state is properly set up, and there is nothing to worry about. If it is called from non-instrumented code (either noinstr or the skipped files that have KMSAN_SANITIZE:=n), KMSAN would detect that and wipe the context state before use. By the way, I've noticed that at least for now (with pt_regs unpoisoning performed in IDT entries) the problem with false positives in noinstr code is entirely gone, so maybe we don't even have to bother. > Yes, it's more work on the tooling side, but the tooling side is mostly > a one time effort while chasing the false positives is a long term > nightmare. Well said. > Thanks, > > tglx
Alexander, On Thu, May 05 2022 at 20:04, Alexander Potapenko wrote: > On Tue, May 3, 2022 at 12:00 AM Thomas Gleixner <tglx@linutronix.de> wrote: >> > Regarding the regs, you are right. It should be enough to unpoison the >> > regs at idtentry prologue instead. >> > I tried that initially, but IIRC it required patching each of the >> > DEFINE_IDTENTRY_XXX macros, which already use instrumentation_begin(). >> >> Exactly 4 instances :) >> > > Not really, I had to add a call to `kmsan_unpoison_memory(regs, > sizeof(*regs));` to the following places in > arch/x86/include/asm/idtentry.h: > - DEFINE_IDTENTRY() > - DEFINE_IDTENTRY_ERRORCODE() > - DEFINE_IDTENTRY_RAW() > - DEFINE_IDTENTRY_RAW_ERRORCODE() > - DEFINE_IDTENTRY_IRQ() > - DEFINE_IDTENTRY_SYSVEC() > - DEFINE_IDTENTRY_SYSVEC_SIMPLE() > - DEFINE_IDTENTRY_DF() > > , but even that wasn't enough. For some reason I also had to unpoison > pt_regs directly in > DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt) and > DEFINE_IDTENTRY_IRQ(common_interrupt). > In the latter case, this could have been caused by > asm_common_interrupt being entered from irq_entries_start(), but I am > not sure what is so special about sysvec_apic_timer_interrupt(). > > Ideally, it would be great to find that single point where pt_regs are > set up before being passed to all IDT entries. > I used to do that by inserting calls to kmsan_unpoison_memory right > into arch/x86/entry/entry_64.S > (https://github.com/google/kmsan/commit/3b0583f45f74f3a09f4c7e0e0588169cef918026), > but that required storing/restoring all GP registers. Maybe there's a > better way? Yes. Something like this should cover all exceptions and syscalls before anything instrumentable can touch @regs. Anything up to those points is either off-limit for instrumentation or does not deal with @regs. --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -24,6 +24,7 @@ static __always_inline void __enter_from user_exit_irqoff(); instrumentation_begin(); + unpoison(regs); trace_hardirqs_off_finish(); instrumentation_end(); } @@ -352,6 +353,7 @@ noinstr irqentry_state_t irqentry_enter( lockdep_hardirqs_off(CALLER_ADDR0); rcu_irq_enter(); instrumentation_begin(); + unpoison(regs); trace_hardirqs_off_finish(); instrumentation_end(); @@ -367,6 +369,7 @@ noinstr irqentry_state_t irqentry_enter( */ lockdep_hardirqs_off(CALLER_ADDR0); instrumentation_begin(); + unpoison(regs); rcu_irq_enter_check_tick(); trace_hardirqs_off_finish(); instrumentation_end(); @@ -452,6 +455,7 @@ irqentry_state_t noinstr irqentry_nmi_en rcu_nmi_enter(); instrumentation_begin(); + unpoison(regs); trace_hardirqs_off_finish(); ftrace_nmi_enter(); instrumentation_end(); As I said: 4 places :) > Fortunately, I don't think we need to insert extra instrumentation > into instrumentation_begin()/instrumentation_end() regions. > > What I have in mind is adding a bool flag to kmsan_context_state, that > the instrumentation sets to true before the function call. > When entering an instrumented function, KMSAN would check that flag > and set it to false, so that the context state can be only used once. > If a function is called from another instrumented function, the > context state is properly set up, and there is nothing to worry about. > If it is called from non-instrumented code (either noinstr or the > skipped files that have KMSAN_SANITIZE:=n), KMSAN would detect that > and wipe the context state before use. Sounds good. Thanks, tglx
On Thu, May 5, 2022 at 11:56 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > Alexander, > > On Thu, May 05 2022 at 20:04, Alexander Potapenko wrote: > > On Tue, May 3, 2022 at 12:00 AM Thomas Gleixner <tglx@linutronix.de> wrote: > >> > Regarding the regs, you are right. It should be enough to unpoison the > >> > regs at idtentry prologue instead. > >> > I tried that initially, but IIRC it required patching each of the > >> > DEFINE_IDTENTRY_XXX macros, which already use instrumentation_begin(). > >> > >> Exactly 4 instances :) > >> > > > > Not really, I had to add a call to `kmsan_unpoison_memory(regs, > > sizeof(*regs));` to the following places in > > arch/x86/include/asm/idtentry.h: > > - DEFINE_IDTENTRY() > > - DEFINE_IDTENTRY_ERRORCODE() > > - DEFINE_IDTENTRY_RAW() > > - DEFINE_IDTENTRY_RAW_ERRORCODE() > > - DEFINE_IDTENTRY_IRQ() > > - DEFINE_IDTENTRY_SYSVEC() > > - DEFINE_IDTENTRY_SYSVEC_SIMPLE() > > - DEFINE_IDTENTRY_DF() > > > > , but even that wasn't enough. For some reason I also had to unpoison > > pt_regs directly in > > DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt) and > > DEFINE_IDTENTRY_IRQ(common_interrupt). > > In the latter case, this could have been caused by > > asm_common_interrupt being entered from irq_entries_start(), but I am > > not sure what is so special about sysvec_apic_timer_interrupt(). > > > > Ideally, it would be great to find that single point where pt_regs are > > set up before being passed to all IDT entries. > > I used to do that by inserting calls to kmsan_unpoison_memory right > > into arch/x86/entry/entry_64.S > > (https://github.com/google/kmsan/commit/3b0583f45f74f3a09f4c7e0e0588169cef918026), > > but that required storing/restoring all GP registers. Maybe there's a > > better way? > > Yes. Something like this should cover all exceptions and syscalls before > anything instrumentable can touch @regs. Anything up to those points is > either off-limit for instrumentation or does not deal with @regs. > > --- a/kernel/entry/common.c > +++ b/kernel/entry/common.c > @@ -24,6 +24,7 @@ static __always_inline void __enter_from > user_exit_irqoff(); > > instrumentation_begin(); > + unpoison(regs); > trace_hardirqs_off_finish(); > instrumentation_end(); > } > @@ -352,6 +353,7 @@ noinstr irqentry_state_t irqentry_enter( > lockdep_hardirqs_off(CALLER_ADDR0); > rcu_irq_enter(); > instrumentation_begin(); > + unpoison(regs); > trace_hardirqs_off_finish(); > instrumentation_end(); > > @@ -367,6 +369,7 @@ noinstr irqentry_state_t irqentry_enter( > */ > lockdep_hardirqs_off(CALLER_ADDR0); > instrumentation_begin(); > + unpoison(regs); > rcu_irq_enter_check_tick(); > trace_hardirqs_off_finish(); > instrumentation_end(); > @@ -452,6 +455,7 @@ irqentry_state_t noinstr irqentry_nmi_en > rcu_nmi_enter(); > > instrumentation_begin(); > + unpoison(regs); > trace_hardirqs_off_finish(); > ftrace_nmi_enter(); > instrumentation_end(); > > As I said: 4 places :) These four instances still do not look sufficient. Right now I am seeing e.g. reports with the following stack trace: BUG: KMSAN: uninit-value in irqtime_account_process_tick+0x255/0x580 kernel/sched/cputime.c:382 irqtime_account_process_tick+0x255/0x580 kernel/sched/cputime.c:382 account_process_tick+0x98/0x450 kernel/sched/cputime.c:476 update_process_times+0xe4/0x3e0 kernel/time/timer.c:1786 tick_sched_handle kernel/time/tick-sched.c:243 tick_sched_timer+0x83e/0x9e0 kernel/time/tick-sched.c:1473 __run_hrtimer+0x518/0xe50 kernel/time/hrtimer.c:1685 __hrtimer_run_queues kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x838/0x15a0 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 __sysvec_apic_timer_interrupt+0x1ae/0x680 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x95/0xc0 arch/x86/kernel/apic/apic.c:1097 ... (uninit creation stack trace is irrelevant here, because it is some random value from the stack) sysvec_apic_timer_interrupt() receives struct pt_regs from uninstrumented code, so regs can be partially uninitialized. They are not passed down the call stack directly, but are instead saved by set_irq_regs() in sysvec_apic_timer_interrupt() and loaded by get_irq_regs() in tick_sched_timer(). The remaining false positives can be fixed by unpoisoning the registers in set_irq_regs(): static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs) { struct pt_regs *old_regs; + kmsan_unpoison_memory(new_regs, sizeof(*new_regs)); old_regs = __this_cpu_read(__irq_regs); __this_cpu_write(__irq_regs, new_regs); Does that sound viable? Is it correct to assume that set_irq_regs() is always called for registers received from non-instrumented code? (It seems that just unpoisoning registers in set_irq_regs() is not enough, i.e. we still need to do what you suggest in kernel/entry/common.c) -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde. This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.
On Fri, May 06 2022 at 16:52, Alexander Potapenko wrote: > On Thu, May 5, 2022 at 11:56 PM Thomas Gleixner <tglx@linutronix.de> wrote: >> @@ -452,6 +455,7 @@ irqentry_state_t noinstr irqentry_nmi_en >> rcu_nmi_enter(); >> >> instrumentation_begin(); >> + unpoison(regs); >> trace_hardirqs_off_finish(); >> ftrace_nmi_enter(); >> instrumentation_end(); >> >> As I said: 4 places :) > > These four instances still do not look sufficient. > Right now I am seeing e.g. reports with the following stack trace: > > BUG: KMSAN: uninit-value in irqtime_account_process_tick+0x255/0x580 > kernel/sched/cputime.c:382 > irqtime_account_process_tick+0x255/0x580 kernel/sched/cputime.c:382 > account_process_tick+0x98/0x450 kernel/sched/cputime.c:476 > update_process_times+0xe4/0x3e0 kernel/time/timer.c:1786 > tick_sched_handle kernel/time/tick-sched.c:243 > tick_sched_timer+0x83e/0x9e0 kernel/time/tick-sched.c:1473 > __run_hrtimer+0x518/0xe50 kernel/time/hrtimer.c:1685 > __hrtimer_run_queues kernel/time/hrtimer.c:1749 > hrtimer_interrupt+0x838/0x15a0 kernel/time/hrtimer.c:1811 > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 > __sysvec_apic_timer_interrupt+0x1ae/0x680 arch/x86/kernel/apic/apic.c:1103 > sysvec_apic_timer_interrupt+0x95/0xc0 arch/x86/kernel/apic/apic.c:1097 > ... > (uninit creation stack trace is irrelevant here, because it is some > random value from the stack) > > sysvec_apic_timer_interrupt() receives struct pt_regs from > uninstrumented code, so regs can be partially uninitialized. > They are not passed down the call stack directly, but are instead > saved by set_irq_regs() in sysvec_apic_timer_interrupt() and loaded by > get_irq_regs() in tick_sched_timer(). sysvec_apic_timer_interrupt() invokes irqentry_enter() _before_ set_irq_regs() and irqentry_enter() unpoisons @reg. Confused...
On Fri, May 6, 2022 at 6:14 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Fri, May 06 2022 at 16:52, Alexander Potapenko wrote: > > On Thu, May 5, 2022 at 11:56 PM Thomas Gleixner <tglx@linutronix.de> wrote: > >> @@ -452,6 +455,7 @@ irqentry_state_t noinstr irqentry_nmi_en > >> rcu_nmi_enter(); > >> > >> instrumentation_begin(); > >> + unpoison(regs); > >> trace_hardirqs_off_finish(); > >> ftrace_nmi_enter(); > >> instrumentation_end(); > >> > >> As I said: 4 places :) > > > > These four instances still do not look sufficient. > > Right now I am seeing e.g. reports with the following stack trace: > > > > BUG: KMSAN: uninit-value in irqtime_account_process_tick+0x255/0x580 > > kernel/sched/cputime.c:382 > > irqtime_account_process_tick+0x255/0x580 kernel/sched/cputime.c:382 > > account_process_tick+0x98/0x450 kernel/sched/cputime.c:476 > > update_process_times+0xe4/0x3e0 kernel/time/timer.c:1786 > > tick_sched_handle kernel/time/tick-sched.c:243 > > tick_sched_timer+0x83e/0x9e0 kernel/time/tick-sched.c:1473 > > __run_hrtimer+0x518/0xe50 kernel/time/hrtimer.c:1685 > > __hrtimer_run_queues kernel/time/hrtimer.c:1749 > > hrtimer_interrupt+0x838/0x15a0 kernel/time/hrtimer.c:1811 > > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 > > __sysvec_apic_timer_interrupt+0x1ae/0x680 arch/x86/kernel/apic/apic.c:1103 > > sysvec_apic_timer_interrupt+0x95/0xc0 arch/x86/kernel/apic/apic.c:1097 > > ... > > (uninit creation stack trace is irrelevant here, because it is some > > random value from the stack) > > > > sysvec_apic_timer_interrupt() receives struct pt_regs from > > uninstrumented code, so regs can be partially uninitialized. > > They are not passed down the call stack directly, but are instead > > saved by set_irq_regs() in sysvec_apic_timer_interrupt() and loaded by > > get_irq_regs() in tick_sched_timer(). > > sysvec_apic_timer_interrupt() invokes irqentry_enter() _before_ > set_irq_regs() and irqentry_enter() unpoisons @reg. > > Confused... As far as I can tell in this case sysvect_apic_timer_interrupt() is called by the following code in arch/x86/kernel/idt.c: INTG(LOCAL_TIMER_VECTOR, asm_sysvec_apic_timer_interrupt), , which does not use IDTENTRY_SYSVEC framework and thus does not call irqentry_enter(). I guess handling those will require wrapping every interrupt gate into a function that performs register unpoisoning? By the way, if it helps, I think we don't necessarily have to call kmsan_unpoison_memory() from within the instrumentation_begin()/instrumentation_end() region? We could move the call to the beginning of irqentry_enter(), removing unnecessary duplication.
On Fri, May 06 2022 at 19:41, Alexander Potapenko wrote: > On Fri, May 6, 2022 at 6:14 PM Thomas Gleixner <tglx@linutronix.de> wrote: >> sysvec_apic_timer_interrupt() invokes irqentry_enter() _before_ >> set_irq_regs() and irqentry_enter() unpoisons @reg. >> >> Confused... > > As far as I can tell in this case sysvect_apic_timer_interrupt() is > called by the following code in arch/x86/kernel/idt.c: > > INTG(LOCAL_TIMER_VECTOR, asm_sysvec_apic_timer_interrupt), > > , which does not use IDTENTRY_SYSVEC framework and thus does not call > irqentry_enter(). asm_sysvec_apic_timer_interrupt != sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c: DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt) { .... #define DEFINE_IDTENTRY_SYSVEC(func) \ static void __##func(struct pt_regs *regs); \ \ __visible noinstr void func(struct pt_regs *regs) \ { \ irqentry_state_t state = irqentry_enter(regs); \ .... __##func (regs); \ .... } \ \ static noinline void __##func(struct pt_regs *regs) So it goes through that code path _before_ the actual implementation which does set_irq_regs() is reached. The callchain is: asm_sysvec_apic_timer_interrupt <- ASM entry in gate sysvec_apic_timer_interrupt(regs) <- noinstr C entry point irqentry_enter(regs) <- unpoisons @reg __sysvec_apic_timer_interrupt(regs) <- the actual handler set_irq_regs(regs) <- stores regs local_apic_timer_interrupt() ... tick_handler() <- One of the 4 variants regs = get_irq_regs(); <- retrieves regs update_process_times(user_tick = user_mode(regs)) account_process_tick(user_tick) irqtime_account_process_tick(user_tick) line 382: } else if { user_tick } <- KMSAN complains I'm even more confused now. > I guess handling those will require wrapping every interrupt gate into > a function that performs register unpoisoning? No, guessing does not help here. The gates point to the ASM entry point, which then invokes the C entry point. All C entry points use a DEFINE_IDTENTRY variant. Some of the DEFINE_IDTENTRY_* C entry points are not doing anything in the macro, but the C function either invokes irqentry_enter() or irqentry_nmi_enter() open coded _before_ invoking any instrumentable function. So the unpoisoning of @regs in these functions should tell KMSAN that @regs or something derived from @regs are not some random uninitialized values. There should be no difference between unpoisoning @regs in irqentry_enter() or in set_irq_regs(), right? If so, then the problem is definitely _not_ the idt entry code. > By the way, if it helps, I think we don't necessarily have to call > kmsan_unpoison_memory() from within the > instrumentation_begin()/instrumentation_end() region? > We could move the call to the beginning of irqentry_enter(), removing > unnecessary duplication. We could, but then you need to mark unpoison_memory() noinstr too and you have to add the unpoison into the syscall code. No win and irrelevant to the problem at hand. Thanks, tglx
> The callchain is: > > asm_sysvec_apic_timer_interrupt <- ASM entry in gate > sysvec_apic_timer_interrupt(regs) <- noinstr C entry point > irqentry_enter(regs) <- unpoisons @reg > __sysvec_apic_timer_interrupt(regs) <- the actual handler > set_irq_regs(regs) <- stores regs > local_apic_timer_interrupt() > ... > tick_handler() <- One of the 4 variants > regs = get_irq_regs(); <- retrieves regs > update_process_times(user_tick = user_mode(regs)) > account_process_tick(user_tick) > irqtime_account_process_tick(user_tick) > line 382: } else if { user_tick } <- KMSAN complains > > I'm even more confused now. Ok, I think I know what's going on. Indeed, calling kmsan_unpoison_memory() in irqentry_enter() was supposed to be enough, but we have code in kmsan_unpoison_memory() (as well as other runtime functions) that checks for kmsan_in_runtime() and bails out to prevent potential recursion if KMSAN code starts calling itself. kmsan_in_runtime() is implemented as follows: ============================================== static __always_inline bool kmsan_in_runtime(void) { if ((hardirq_count() >> HARDIRQ_SHIFT) > 1) return true; return kmsan_get_context()->kmsan_in_runtime; } ============================================== (see the code here: https://lore.kernel.org/lkml/20220426164315.625149-13-glider@google.com/#Z31mm:kmsan:kmsan.h) If we are running in the task context (in_task()==true), kmsan_get_context() returns a per-task `struct *kmsan_ctx`. If `in_task()==false` and `hardirq_count()>>HARDIRQ_SHIFT==1`, it returns a per-CPU one. Otherwise kmsan_in_runtime() is considered true to avoid dealing with nested interrupts. So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than 1, kmsan_in_runtime() becomes a no-op, which leads to false positives. The solution I currently have in mind is to provide a specialized version of kmsan_unpoison_memory() for entry.c, which will not perform the reentrancy checks. > > I guess handling those will require wrapping every interrupt gate into > > a function that performs register unpoisoning? > > No, guessing does not help here. > > The gates point to the ASM entry point, which then invokes the C entry > point. All C entry points use a DEFINE_IDTENTRY variant. Thanks for the explanation. I previously thought there were two different entry points, one with asm_ and one without, that ended up calling the same code. > Some of the DEFINE_IDTENTRY_* C entry points are not doing anything in > the macro, but the C function either invokes irqentry_enter() or > irqentry_nmi_enter() open coded _before_ invoking any instrumentable > function. So the unpoisoning of @regs in these functions should tell > KMSAN that @regs or something derived from @regs are not some random > uninitialized values. > > There should be no difference between unpoisoning @regs in > irqentry_enter() or in set_irq_regs(), right? > > If so, then the problem is definitely _not_ the idt entry code. > > > By the way, if it helps, I think we don't necessarily have to call > > kmsan_unpoison_memory() from within the > > instrumentation_begin()/instrumentation_end() region? > > We could move the call to the beginning of irqentry_enter(), removing > > unnecessary duplication. > > We could, but then you need to mark unpoison_memory() noinstr too and you > have to add the unpoison into the syscall code. No win and irrelevant to > the problem at hand. Makes sense, thank you! > Thanks, > > tglx > >
On Mon, May 9, 2022 at 6:50 PM Alexander Potapenko <glider@google.com> wrote: > > > The callchain is: > > > > asm_sysvec_apic_timer_interrupt <- ASM entry in gate > > sysvec_apic_timer_interrupt(regs) <- noinstr C entry point > > irqentry_enter(regs) <- unpoisons @reg > > __sysvec_apic_timer_interrupt(regs) <- the actual handler > > set_irq_regs(regs) <- stores regs > > local_apic_timer_interrupt() > > ... > > tick_handler() <- One of the 4 variants > > regs = get_irq_regs(); <- retrieves regs > > update_process_times(user_tick = user_mode(regs)) > > account_process_tick(user_tick) > > irqtime_account_process_tick(user_tick) > > line 382: } else if { user_tick } <- KMSAN complains > > > > I'm even more confused now. > > Ok, I think I know what's going on. > > Indeed, calling kmsan_unpoison_memory() in irqentry_enter() was > supposed to be enough, but we have code in kmsan_unpoison_memory() (as > well as other runtime functions) that checks for kmsan_in_runtime() > and bails out to prevent potential recursion if KMSAN code starts > calling itself. > > kmsan_in_runtime() is implemented as follows: > > ============================================== > static __always_inline bool kmsan_in_runtime(void) > { > if ((hardirq_count() >> HARDIRQ_SHIFT) > 1) > return true; > return kmsan_get_context()->kmsan_in_runtime; > } > ============================================== > (see the code here: > https://lore.kernel.org/lkml/20220426164315.625149-13-glider@google.com/#Z31mm:kmsan:kmsan.h) > > If we are running in the task context (in_task()==true), > kmsan_get_context() returns a per-task `struct *kmsan_ctx`. > If `in_task()==false` and `hardirq_count()>>HARDIRQ_SHIFT==1`, it > returns a per-CPU one. > Otherwise kmsan_in_runtime() is considered true to avoid dealing with > nested interrupts. > > So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than > 1, kmsan_in_runtime() becomes a no-op, which leads to false positives. Should be "kmsan_unpoison_memory() becomes a no-op..."
On Mon, May 09 2022 at 18:50, Alexander Potapenko wrote: > Indeed, calling kmsan_unpoison_memory() in irqentry_enter() was > supposed to be enough, but we have code in kmsan_unpoison_memory() (as > well as other runtime functions) that checks for kmsan_in_runtime() > and bails out to prevent potential recursion if KMSAN code starts > calling itself. > > kmsan_in_runtime() is implemented as follows: > > ============================================== > static __always_inline bool kmsan_in_runtime(void) > { > if ((hardirq_count() >> HARDIRQ_SHIFT) > 1) > return true; > return kmsan_get_context()->kmsan_in_runtime; > } > ============================================== > (see the code here: > https://lore.kernel.org/lkml/20220426164315.625149-13-glider@google.com/#Z31mm:kmsan:kmsan.h) > > If we are running in the task context (in_task()==true), > kmsan_get_context() returns a per-task `struct *kmsan_ctx`. > If `in_task()==false` and `hardirq_count()>>HARDIRQ_SHIFT==1`, it > returns a per-CPU one. > Otherwise kmsan_in_runtime() is considered true to avoid dealing with > nested interrupts. > > So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than > 1, kmsan_in_runtime() becomes a no-op, which leads to false positives. But, that'd only > 1 when there is a nested interrupt, which is not the case. Interrupt handlers keep interrupts disabled. The last exception from that rule was some legacy IDE driver which is gone by now. So no, not a good explanation either. Thanks, tglx
On Mon, May 9, 2022 at 9:09 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Mon, May 09 2022 at 18:50, Alexander Potapenko wrote: > > Indeed, calling kmsan_unpoison_memory() in irqentry_enter() was > > supposed to be enough, but we have code in kmsan_unpoison_memory() (as > > well as other runtime functions) that checks for kmsan_in_runtime() > > and bails out to prevent potential recursion if KMSAN code starts > > calling itself. > > > > kmsan_in_runtime() is implemented as follows: > > > > ============================================== > > static __always_inline bool kmsan_in_runtime(void) > > { > > if ((hardirq_count() >> HARDIRQ_SHIFT) > 1) > > return true; > > return kmsan_get_context()->kmsan_in_runtime; > > } > > ============================================== > > (see the code here: > > https://lore.kernel.org/lkml/20220426164315.625149-13-glider@google.com/#Z31mm:kmsan:kmsan.h) > > > > If we are running in the task context (in_task()==true), > > kmsan_get_context() returns a per-task `struct *kmsan_ctx`. > > If `in_task()==false` and `hardirq_count()>>HARDIRQ_SHIFT==1`, it > > returns a per-CPU one. > > Otherwise kmsan_in_runtime() is considered true to avoid dealing with > > nested interrupts. > > > > So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than > > 1, kmsan_in_runtime() becomes a no-op, which leads to false positives. > > But, that'd only > 1 when there is a nested interrupt, which is not the > case. Interrupt handlers keep interrupts disabled. The last exception from > that rule was some legacy IDE driver which is gone by now. That's good to know, then we probably don't need this hardirq_count() check anymore. > So no, not a good explanation either. After looking deeper I see that unpoisoning was indeed skipped because kmsan_in_runtime() returned true, but I was wrong about the root cause. The problem was not caused by a nested hardirq, but rather by the fact that the KMSAN hook in irqentry_enter() was called with in_task()==1. Roughly said, T0 was running some code in the task context, then it started executing KMSAN instrumentation and entered the runtime by setting current->kmsan_ctx.kmsan_in_runtime. Then an IRQ kicked in and started calling asm_sysvec_apic_timer_interrupt() => sysvec_apic_timer_interrupt(regs) => irqentry_enter(regs) - but at that point in_task() was still true, therefore kmsan_unpoison_memory() became a no-op. As far as I can see, it is irq_enter_rcu() that makes in_task() return 0 by incrementing the preempt count in __irq_enter_raw(), so our unpoisoning can only work if we perform it after we enter the IRQ context. I think the best that can be done here is (as suggested above) to provide some kmsan_unpoison_pt_regs() function that will only be called from the entry points and won't be doing reentrancy checks. It should be safe, because unpoisoning boils down to calculating shadow/origin addresses and calling memset() on them, no instrumented code will be involved. We could try to figure out the places in idtentry code where normal kmsan_unpoison_memory() can be called in IRQ context, but as far as I can see it will depend on the type of the entry point. Another way to deal with the problem is to not rely on in_task(), but rather use some per-cpu counter in irqentry_enter()/irqentry_exit() to figure out whether we are in IRQ code already. However this is only possible irqentry_enter() itself guarantees that the execution cannot be rescheduled to another CPU - is that the case? > Thanks, > > tglx > > -- > You received this message because you are subscribed to the Google Groups "kasan-dev" group. > To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/871qx2r09k.ffs%40tglx.
On Thu, May 12 2022 at 14:24, Alexander Potapenko wrote: > On Mon, May 9, 2022 at 9:09 PM Thomas Gleixner <tglx@linutronix.de> wrote: >> > So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than >> > 1, kmsan_in_runtime() becomes a no-op, which leads to false positives. >> >> But, that'd only > 1 when there is a nested interrupt, which is not the >> case. Interrupt handlers keep interrupts disabled. The last exception from >> that rule was some legacy IDE driver which is gone by now. > > That's good to know, then we probably don't need this hardirq_count() > check anymore. > >> So no, not a good explanation either. > > After looking deeper I see that unpoisoning was indeed skipped because > kmsan_in_runtime() returned true, but I was wrong about the root > cause. > The problem was not caused by a nested hardirq, but rather by the fact > that the KMSAN hook in irqentry_enter() was called with in_task()==1. Argh, the preempt counter increment happens _after_ irqentry_enter(). > I think the best that can be done here is (as suggested above) to > provide some kmsan_unpoison_pt_regs() function that will only be > called from the entry points and won't be doing reentrancy checks. > It should be safe, because unpoisoning boils down to calculating > shadow/origin addresses and calling memset() on them, no instrumented > code will be involved. If you keep them where I placed them, then there is no need for a noinstr function. It's already instrumentable. > We could try to figure out the places in idtentry code where normal > kmsan_unpoison_memory() can be called in IRQ context, but as far as I > can see it will depend on the type of the entry point. NMI is covered as it increments before it invokes the unpoison(). Let me figure out why we increment the preempt count late for interrupts. IIRC it's for symmetry reasons related to softirq processing on return, but let me double check. > Another way to deal with the problem is to not rely on in_task(), but > rather use some per-cpu counter in irqentry_enter()/irqentry_exit() to > figure out whether we are in IRQ code already. Well, if you have a irqentry() specific unpoison, then you know the context, right? > However this is only possible irqentry_enter() itself guarantees that > the execution cannot be rescheduled to another CPU - is that the case? Obviously. It runs with interrupts disabled and eventually on a separate interrupt stack. Thanks, tglx
On Thu, May 12 2022 at 18:17, Thomas Gleixner wrote: > On Thu, May 12 2022 at 14:24, Alexander Potapenko wrote: >> We could try to figure out the places in idtentry code where normal >> kmsan_unpoison_memory() can be called in IRQ context, but as far as I >> can see it will depend on the type of the entry point. > > NMI is covered as it increments before it invokes the unpoison(). > > Let me figure out why we increment the preempt count late for > interrupts. IIRC it's for symmetry reasons related to softirq processing > on return, but let me double check. It's even documented: https://www.kernel.org/doc/html/latest/core-api/entry.html#interrupts-and-regular-exceptions But who reads documentation? :) So, I think the simplest and least intrusive solution is to have special purpose unpoison functions. See the patch below for illustration. The reasons why I used specific ones: 1) User entry Whether that's a syscall or interrupt/exception does not matter. It's always on the task stack and your machinery cannot be running at that point because it came from user space. 2) Interrupt/exception/NMI entry kernel Those can nest into an already active context, so you really want to unpoison @regs. Also while regular interrupts cannot nest because of interrupts staying disabled, exceptions triggered in the interrupt handler and NMIs can nest. -> device interrupt() irqentry_enter(regs) -> NMI() irqentry_nmi_enter(regs) -> fault() irqentry_enter(regs) --> debug_exception() irqentry_nmi_enter(regs) Soft interrupt processing on return from interrupt makes it more interesting: interrupt() handler() do_softirq() local_irq_enable() interrupt() NMI .... And everytime you get a new @regs pointer to deal with. Wonderful, isn't it? Thanks, tglx --- --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -24,6 +24,7 @@ static __always_inline void __enter_from user_exit_irqoff(); instrumentation_begin(); + unpoison_user(regs); trace_hardirqs_off_finish(); instrumentation_end(); } @@ -352,6 +353,7 @@ noinstr irqentry_state_t irqentry_enter( lockdep_hardirqs_off(CALLER_ADDR0); rcu_irq_enter(); instrumentation_begin(); + unpoison_irq(regs); trace_hardirqs_off_finish(); instrumentation_end(); @@ -367,6 +369,7 @@ noinstr irqentry_state_t irqentry_enter( */ lockdep_hardirqs_off(CALLER_ADDR0); instrumentation_begin(); + unpoison_irq(regs); rcu_irq_enter_check_tick(); trace_hardirqs_off_finish(); instrumentation_end(); @@ -452,6 +455,7 @@ irqentry_state_t noinstr irqentry_nmi_en rcu_nmi_enter(); instrumentation_begin(); + unpoison_irq(regs); trace_hardirqs_off_finish(); ftrace_nmi_enter(); instrumentation_end();
On Thu, May 12, 2022 at 6:48 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Thu, May 12 2022 at 18:17, Thomas Gleixner wrote: > > On Thu, May 12 2022 at 14:24, Alexander Potapenko wrote: > >> We could try to figure out the places in idtentry code where normal > >> kmsan_unpoison_memory() can be called in IRQ context, but as far as I > >> can see it will depend on the type of the entry point. > > > > NMI is covered as it increments before it invokes the unpoison(). > > > > Let me figure out why we increment the preempt count late for > > interrupts. IIRC it's for symmetry reasons related to softirq processing > > on return, but let me double check. > > It's even documented: > > https://www.kernel.org/doc/html/latest/core-api/entry.html#interrupts-and-regular-exceptions > > But who reads documentation? :) > > So, I think the simplest and least intrusive solution is to have special > purpose unpoison functions. See the patch below for illustration. This patch works well and I am going to adopt it for my series. But the problem with occasional calls of instrumented functions from noinstr still persists: if there is a noinstr function foo() and an instrumented function bar() called from foo() with one or more arguments, bar() must wipe its kmsan_context_state before using the arguments. I have a solution for this problem described in https://reviews.llvm.org/D126385 The plan is to pass __builtin_return_address(0) to __msan_get_context_state_caller() at the beginning of each instrumented function. Then KMSAN runtime can check the passed return address and wipe the context if it belongs to the .noinstr code section. Alternatively, we could employ MSan's -fsanitize-memory-param-retval flag, that will report supplying uninitialized parameters when calling functions. Doing so is currently allowed in the kernel, but Clang aggressively applies the noundef attribute (see https://llvm.org/docs/LangRef.html) to function arguments, which effectively makes passing uninit values as function parameters an UB. So if we make KMSAN detect such cases as well, we can ultimately get rid of all cases when uninits are passed to functions. As a result, kmsan_context_state will become unnecessary, because it will never contain nonzero values. > The reasons why I used specific ones: > > 1) User entry > > Whether that's a syscall or interrupt/exception does not > matter. It's always on the task stack and your machinery cannot be > running at that point because it came from user space. > > 2) Interrupt/exception/NMI entry kernel > > Those can nest into an already active context, so you really want > to unpoison @regs. > > Also while regular interrupts cannot nest because of interrupts > staying disabled, exceptions triggered in the interrupt handler and > NMIs can nest. > > -> device interrupt() > irqentry_enter(regs) > > -> NMI() > irqentry_nmi_enter(regs) > > -> fault() > irqentry_enter(regs) > > --> debug_exception() > irqentry_nmi_enter(regs) > > Soft interrupt processing on return from interrupt makes it more > interesting: > > interrupt() > handler() > do_softirq() > local_irq_enable() > interrupt() > NMI > .... > > And everytime you get a new @regs pointer to deal with. > > Wonderful, isn't it? > > Thanks, > > tglx >
diff --git a/kernel/entry/common.c b/kernel/entry/common.c index 93c3b86e781c1..ce2324374882c 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -23,7 +23,7 @@ static __always_inline void __enter_from_user_mode(struct pt_regs *regs) CT_WARN_ON(ct_state() != CONTEXT_USER); user_exit_irqoff(); - instrumentation_begin(); + instrumentation_begin_with_regs(regs); trace_hardirqs_off_finish(); instrumentation_end(); } @@ -105,7 +105,7 @@ noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall) __enter_from_user_mode(regs); - instrumentation_begin(); + instrumentation_begin_with_regs(regs); local_irq_enable(); ret = __syscall_enter_from_user_work(regs, syscall); instrumentation_end(); @@ -116,7 +116,7 @@ noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall) noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs) { __enter_from_user_mode(regs); - instrumentation_begin(); + instrumentation_begin_with_regs(regs); local_irq_enable(); instrumentation_end(); } @@ -290,7 +290,7 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs) __visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs) { - instrumentation_begin(); + instrumentation_begin_with_regs(regs); __syscall_exit_to_user_mode_work(regs); instrumentation_end(); __exit_to_user_mode(); @@ -303,7 +303,7 @@ noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs) noinstr void irqentry_exit_to_user_mode(struct pt_regs *regs) { - instrumentation_begin(); + instrumentation_begin_with_regs(regs); exit_to_user_mode_prepare(regs); instrumentation_end(); __exit_to_user_mode(); @@ -351,7 +351,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs) */ lockdep_hardirqs_off(CALLER_ADDR0); rcu_irq_enter(); - instrumentation_begin(); + instrumentation_begin_with_regs(regs); trace_hardirqs_off_finish(); instrumentation_end(); @@ -366,7 +366,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs) * in having another one here. */ lockdep_hardirqs_off(CALLER_ADDR0); - instrumentation_begin(); + instrumentation_begin_with_regs(regs); rcu_irq_enter_check_tick(); trace_hardirqs_off_finish(); instrumentation_end(); @@ -413,7 +413,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) * and RCU as the return to user mode path. */ if (state.exit_rcu) { - instrumentation_begin(); + instrumentation_begin_with_regs(regs); /* Tell the tracer that IRET will enable interrupts */ trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(CALLER_ADDR0); @@ -423,7 +423,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) return; } - instrumentation_begin(); + instrumentation_begin_with_regs(regs); if (IS_ENABLED(CONFIG_PREEMPTION)) irqentry_exit_cond_resched(); @@ -451,7 +451,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs) lockdep_hardirq_enter(); rcu_nmi_enter(); - instrumentation_begin(); + instrumentation_begin_with_regs(regs); trace_hardirqs_off_finish(); ftrace_nmi_enter(); instrumentation_end(); @@ -461,7 +461,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs) void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state) { - instrumentation_begin(); + instrumentation_begin_with_regs(regs); ftrace_nmi_exit(); if (irq_state.lockdep) { trace_hardirqs_on_prepare();
Replace instrumentation_begin() with instrumentation_begin_with_regs() to let KMSAN handle the non-instrumented code and unpoison pt_regs passed from the instrumented part. Signed-off-by: Alexander Potapenko <glider@google.com> --- Link: https://linux-review.googlesource.com/id/I7f0a9809b66bd85faae43142971d0095771b7a42 --- kernel/entry/common.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)