Message ID | 20191114180303.66955-1-elver@google.com (mailing list archive) |
---|---|
Series | Add Kernel Concurrency Sanitizer (KCSAN) |
On Thu, Nov 14, 2019 at 07:02:53PM +0100, Marco Elver wrote:
> This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN).
> KCSAN is a sampling watchpoint-based *data race detector*. More details
> are included in **Documentation/dev-tools/kcsan.rst**. This patch-series
> only enables KCSAN for x86, but we expect adding support for other
> architectures is relatively straightforward (we are aware of
> experimental ARM64 and POWER support).
>
> To gather early feedback, we announced KCSAN back in September, and have
> integrated the feedback where possible:
> http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com
>
> The current list of known upstream fixes for data races found by KCSAN
> can be found here:
> https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan
>
> We want to point out and acknowledge the work surrounding the LKMM,
> including several articles that motivate why data races are dangerous
> [1, 2], justifying a data race detector such as KCSAN.
>
> [1] https://lwn.net/Articles/793253/
> [2] https://lwn.net/Articles/799218/

I queued this and ran a quick rcutorture on it, which completed
successfully with quite a few reports.

Thanx, Paul

> Race conditions vs. data races
> ------------------------------
>
> Race conditions are logic bugs, where unexpected interleaving of racing
> concurrent operations result in an erroneous state.
>
> Data races on the other hand are defined at the *memory model/language
> level*. Many data races are also harmful race conditions, which a tool
> like KCSAN reports! However, not all data races are race conditions and
> vice-versa. KCSAN's intent is to report data races according to the
> LKMM. A data race detector can only work at the memory model/language
> level.
>
> Deeper analysis, to find high-level race conditions only, requires
> conveying the intended kernel logic to a tool. This requires (1) the
> developer writing a specification or model of their code, and then (2)
> the tool verifying that the implementation matches. This has been done
> for small bits of code using model checkers and other formal methods,
> but does not scale to the level of what can be covered with a dynamic
> analysis based data race detector such as KCSAN.
>
> For reasons outlined in [1, 2], data races can be much more subtle, but
> can cause no less harm than high-level race conditions.
>
> Changelog
> ---------
> v4:
> * Major changes:
>  - Optimizations resulting in performance improvement of 33% (on
>    microbenchmark).
>  - Deal with nested interrupts for atomic_next.
>  - Simplify report.c (removing double-locking as well), in preparation
>    for KCSAN_REPORT_VALUE_CHANGE_ONLY.
>  - Add patch to introduce "data_race(expr)" macro.
>  - Introduce KCSAN_REPORT_VALUE_CHANGE_ONLY option for further filtering of data
>    races: if a conflicting write was observed via a watchpoint, only report the
>    data race if a value change was observed as well. The option will be enabled
>    by default on syzbot. (rcu-functions will be excluded from this filter at
>    request of Paul McKenney.) Context:
>    http://lkml.kernel.org/r/CANpmjNOepvb6+zJmDePxj21n2rctM4Sp4rJ66x_J-L1UmNK54A@mail.gmail.com
>
> v3: http://lkml.kernel.org/r/20191104142745.14722-1-elver@google.com
> * Major changes:
>  - Add microbenchmark.
>  - Add instruction watchpoint skip randomization.
>  - Refactor API and core runtime fast-path and slow-path. Compared to
>    the previous version, with a default config and benchmarked using the
>    added microbenchmark, this version is 3.8x faster.
>  - Make __tsan_unaligned __alias of generic accesses.
>  - Rename kcsan_{begin,end}_atomic ->
>    kcsan_{nestable,flat}_atomic_{begin,end}
>  - For filter list in debugfs.c use kmalloc+krealloc instead of
>    kvmalloc.
>  - Split Documentation into separate patch.
>
> v2: http://lkml.kernel.org/r/20191017141305.146193-1-elver@google.com
> * Major changes:
>  - Replace kcsan_check_access(.., {true, false}) with
>    kcsan_check_{read,write}.
>  - Change atomic-instrumented.h to use __atomic_check_{read,write}.
>  - Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>    contexts.
>
> v1: http://lkml.kernel.org/r/20191016083959.186860-1-elver@google.com
>
> Marco Elver (10):
>   kcsan: Add Kernel Concurrency Sanitizer infrastructure
>   include/linux/compiler.h: Introduce data_race(expr) macro
>   kcsan: Add Documentation entry in dev-tools
>   objtool, kcsan: Add KCSAN runtime functions to whitelist
>   build, kcsan: Add KCSAN build exceptions
>   seqlock, kcsan: Add annotations for KCSAN
>   seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier
>   asm-generic, kcsan: Add KCSAN instrumentation for bitops
>   locking/atomics, kcsan: Add KCSAN instrumentation
>   x86, kcsan: Enable KCSAN for x86
>
>  Documentation/dev-tools/index.rst         |   1 +
>  Documentation/dev-tools/kcsan.rst         | 256 +++++++++
>  MAINTAINERS                               |  11 +
>  Makefile                                  |   3 +-
>  arch/x86/Kconfig                          |   1 +
>  arch/x86/boot/Makefile                    |   2 +
>  arch/x86/boot/compressed/Makefile         |   2 +
>  arch/x86/entry/vdso/Makefile              |   3 +
>  arch/x86/include/asm/bitops.h             |   6 +-
>  arch/x86/kernel/Makefile                  |   4 +
>  arch/x86/kernel/cpu/Makefile              |   3 +
>  arch/x86/lib/Makefile                     |   4 +
>  arch/x86/mm/Makefile                      |   4 +
>  arch/x86/purgatory/Makefile               |   2 +
>  arch/x86/realmode/Makefile                |   3 +
>  arch/x86/realmode/rm/Makefile             |   3 +
>  drivers/firmware/efi/libstub/Makefile     |   2 +
>  include/asm-generic/atomic-instrumented.h | 393 +++++++-------
>  include/asm-generic/bitops-instrumented.h |  18 +
>  include/linux/compiler-clang.h            |   9 +
>  include/linux/compiler-gcc.h              |   7 +
>  include/linux/compiler.h                  |  57 +-
>  include/linux/kcsan-checks.h              |  97 ++++
>  include/linux/kcsan.h                     | 115 ++++
>  include/linux/sched.h                     |   4 +
>  include/linux/seqlock.h                   |  51 +-
>  init/init_task.c                          |   8 +
>  init/main.c                               |   2 +
>  kernel/Makefile                           |   6 +
>  kernel/kcsan/Makefile                     |  11 +
>  kernel/kcsan/atomic.h                     |  27 +
>  kernel/kcsan/core.c                       | 626 ++++++++++++++++++++++
>  kernel/kcsan/debugfs.c                    | 275 ++++++++++
>  kernel/kcsan/encoding.h                   |  94 ++++
>  kernel/kcsan/kcsan.h                      | 108 ++++
>  kernel/kcsan/report.c                     | 320 +++++++++++
>  kernel/kcsan/test.c                       | 121 +++++
>  kernel/sched/Makefile                     |   6 +
>  lib/Kconfig.debug                         |   2 +
>  lib/Kconfig.kcsan                         | 118 ++++
>  lib/Makefile                              |   3 +
>  mm/Makefile                               |   8 +
>  scripts/Makefile.kcsan                    |   6 +
>  scripts/Makefile.lib                      |  10 +
>  scripts/atomic/gen-atomic-instrumented.sh |  17 +-
>  tools/objtool/check.c                     |  18 +
>  46 files changed, 2641 insertions(+), 206 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.h
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> --
> 2.24.0.rc1.363.gb1bccd3e3d-goog
>
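The KCSAN_REPORT_VALUE_CHANGE_ONLY rule from the v4 changelog in the cover
letter can be illustrated with a small user-space C model. This is only a
sketch under stated assumptions, not the code in kernel/kcsan/core.c: the
names struct watchpoint, check_access(), should_report() and simulate() are
hypothetical, the "racing" write is replayed inline so the outcome is
deterministic, and the delay during which a real watchpoint stays armed is
left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a single watchpoint (not kernel code). */
struct watchpoint {
	const long *addr;	/* watched address, NULL when unarmed */
	bool is_write;		/* type of the access that armed the watchpoint */
	bool hit;		/* set once a conflicting access was observed */
};

/* Model of an instrumented access checking itself against the watchpoint. */
static void check_access(struct watchpoint *wp, const long *addr, bool is_write)
{
	/* Two accesses conflict if they hit the same address and one writes. */
	if (wp->addr == addr && (wp->is_write || is_write))
		wp->hit = true;
}

/*
 * Reporting rule as described in the changelog: with the filter enabled,
 * an observed conflict is reported only if the watched value also changed.
 */
static bool should_report(const struct watchpoint *wp, long before, long after,
			  bool value_change_only)
{
	if (!wp->hit)
		return false;
	return !value_change_only || before != after;
}

/* Arm a watchpoint for a read of var, then replay one racing write inline. */
static bool simulate(long new_value, bool value_change_only)
{
	long var = 1;
	struct watchpoint wp = { .addr = &var, .is_write = false, .hit = false };
	long before = var;		/* snapshot taken when arming */

	check_access(&wp, &var, true);	/* the "other CPU" announces its write */
	var = new_value;

	long after = var;		/* snapshot taken when disarming */
	return should_report(&wp, before, after, value_change_only);
}
```

In this model, simulate(2, true) and simulate(1, false) return true, while
simulate(1, true) returns false: a conflicting write that stores the value
already present is filtered out only when the option is enabled.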
On Thu, 14 Nov 2019, Paul E. McKenney wrote:

> On Thu, Nov 14, 2019 at 07:02:53PM +0100, Marco Elver wrote:
> > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN).
> > KCSAN is a sampling watchpoint-based *data race detector*. More details
> > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series
> > only enables KCSAN for x86, but we expect adding support for other
> > architectures is relatively straightforward (we are aware of
> > experimental ARM64 and POWER support).
> >
> > To gather early feedback, we announced KCSAN back in September, and have
> > integrated the feedback where possible:
> > http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com
> >
> > The current list of known upstream fixes for data races found by KCSAN
> > can be found here:
> > https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan
> >
> > We want to point out and acknowledge the work surrounding the LKMM,
> > including several articles that motivate why data races are dangerous
> > [1, 2], justifying a data race detector such as KCSAN.
> >
> > [1] https://lwn.net/Articles/793253/
> > [2] https://lwn.net/Articles/799218/
>
> I queued this and ran a quick rcutorture on it, which completed
> successfully with quite a few reports.

Great. Many thanks for queuing this in -rcu. And regarding merge window
you mentioned, we're fine with your assumption to targeting the next
(v5.6) merge window.

I've just had a look at linux-next to check what a future rebase
requires:

- There is a change in lib/Kconfig.debug and moving KCSAN to the
  "Generic Kernel Debugging Instruments" section seems appropriate.
- bitops-instrumented.h was removed and split into 3 files, and needs
  re-inserting the instrumentation into the right places.

Otherwise there are no issues. Let me know what you recommend.

Thanks,
-- Marco
On Thu, Nov 14, 2019 at 10:33:03PM +0100, Marco Elver wrote: > On Thu, 14 Nov 2019, Paul E. McKenney wrote: > > > On Thu, Nov 14, 2019 at 07:02:53PM +0100, Marco Elver wrote: > > > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN). > > > KCSAN is a sampling watchpoint-based *data race detector*. More details > > > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series > > > only enables KCSAN for x86, but we expect adding support for other > > > architectures is relatively straightforward (we are aware of > > > experimental ARM64 and POWER support). > > > > > > To gather early feedback, we announced KCSAN back in September, and have > > > integrated the feedback where possible: > > > http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com > > > > > > The current list of known upstream fixes for data races found by KCSAN > > > can be found here: > > > https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan > > > > > > We want to point out and acknowledge the work surrounding the LKMM, > > > including several articles that motivate why data races are dangerous > > > [1, 2], justifying a data race detector such as KCSAN. > > > > > > [1] https://lwn.net/Articles/793253/ > > > [2] https://lwn.net/Articles/799218/ > > > > I queued this and ran a quick rcutorture on it, which completed > > successfully with quite a few reports. > > Great. Many thanks for queuing this in -rcu. And regarding merge window > you mentioned, we're fine with your assumption to targeting the next > (v5.6) merge window. > > I've just had a look at linux-next to check what a future rebase > requires: > > - There is a change in lib/Kconfig.debug and moving KCSAN to the > "Generic Kernel Debugging Instruments" section seems appropriate. > - bitops-instrumented.h was removed and split into 3 files, and needs > re-inserting the instrumentation into the right places. 
>
> Otherwise there are no issues. Let me know what you recommend.

Sounds good!

I will be rebasing onto v5.5-rc1 shortly after it comes out. My usual
approach is to fix any conflicts during that rebasing operation.
Does that make sense, or would you prefer to send me a rebased stack at
that point? Either way is fine for me.

Thanx, Paul
On Thu, 14 Nov 2019 at 23:16, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Thu, Nov 14, 2019 at 10:33:03PM +0100, Marco Elver wrote: > > On Thu, 14 Nov 2019, Paul E. McKenney wrote: > > > > > On Thu, Nov 14, 2019 at 07:02:53PM +0100, Marco Elver wrote: > > > > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN). > > > > KCSAN is a sampling watchpoint-based *data race detector*. More details > > > > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series > > > > only enables KCSAN for x86, but we expect adding support for other > > > > architectures is relatively straightforward (we are aware of > > > > experimental ARM64 and POWER support). > > > > > > > > To gather early feedback, we announced KCSAN back in September, and have > > > > integrated the feedback where possible: > > > > http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com > > > > > > > > The current list of known upstream fixes for data races found by KCSAN > > > > can be found here: > > > > https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan > > > > > > > > We want to point out and acknowledge the work surrounding the LKMM, > > > > including several articles that motivate why data races are dangerous > > > > [1, 2], justifying a data race detector such as KCSAN. > > > > > > > > [1] https://lwn.net/Articles/793253/ > > > > [2] https://lwn.net/Articles/799218/ > > > > > > I queued this and ran a quick rcutorture on it, which completed > > > successfully with quite a few reports. > > > > Great. Many thanks for queuing this in -rcu. And regarding merge window > > you mentioned, we're fine with your assumption to targeting the next > > (v5.6) merge window. > > > > I've just had a look at linux-next to check what a future rebase > > requires: > > > > - There is a change in lib/Kconfig.debug and moving KCSAN to the > > "Generic Kernel Debugging Instruments" section seems appropriate. 
> > - bitops-instrumented.h was removed and split into 3 files, and needs
> >   re-inserting the instrumentation into the right places.
> >
> > Otherwise there are no issues. Let me know what you recommend.
>
> Sounds good!
>
> I will be rebasing onto v5.5-rc1 shortly after it comes out. My usual
> approach is to fix any conflicts during that rebasing operation.
> Does that make sense, or would you prefer to send me a rebased stack at
> that point? Either way is fine for me.

That's fine with me, thanks! To avoid too much additional churn on
your end, I just replied to the bitops patch with a version that will
apply with the change to bitops-instrumented infrastructure.

Also considering the merge window, we had a discussion and there are
some arguments for targeting the v5.5 merge window:
- we'd unblock ARM and POWER ports;
- we'd unblock people wanting to use the data_race macro;
- we'd unblock syzbot just tracking upstream;
Unless there are strong reasons to not target v5.5, I leave it to you
if you think it's appropriate.

Thanks,
-- Marco
On Fri, Nov 15, 2019 at 01:02:08PM +0100, Marco Elver wrote: > On Thu, 14 Nov 2019 at 23:16, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Thu, Nov 14, 2019 at 10:33:03PM +0100, Marco Elver wrote: > > > On Thu, 14 Nov 2019, Paul E. McKenney wrote: > > > > > > > On Thu, Nov 14, 2019 at 07:02:53PM +0100, Marco Elver wrote: > > > > > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN). > > > > > KCSAN is a sampling watchpoint-based *data race detector*. More details > > > > > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series > > > > > only enables KCSAN for x86, but we expect adding support for other > > > > > architectures is relatively straightforward (we are aware of > > > > > experimental ARM64 and POWER support). > > > > > > > > > > To gather early feedback, we announced KCSAN back in September, and have > > > > > integrated the feedback where possible: > > > > > http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com > > > > > > > > > > The current list of known upstream fixes for data races found by KCSAN > > > > > can be found here: > > > > > https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan > > > > > > > > > > We want to point out and acknowledge the work surrounding the LKMM, > > > > > including several articles that motivate why data races are dangerous > > > > > [1, 2], justifying a data race detector such as KCSAN. > > > > > > > > > > [1] https://lwn.net/Articles/793253/ > > > > > [2] https://lwn.net/Articles/799218/ > > > > > > > > I queued this and ran a quick rcutorture on it, which completed > > > > successfully with quite a few reports. > > > > > > Great. Many thanks for queuing this in -rcu. And regarding merge window > > > you mentioned, we're fine with your assumption to targeting the next > > > (v5.6) merge window. 
> > >
> > > I've just had a look at linux-next to check what a future rebase
> > > requires:
> > >
> > > - There is a change in lib/Kconfig.debug and moving KCSAN to the
> > >   "Generic Kernel Debugging Instruments" section seems appropriate.
> > > - bitops-instrumented.h was removed and split into 3 files, and needs
> > >   re-inserting the instrumentation into the right places.
> > >
> > > Otherwise there are no issues. Let me know what you recommend.
> >
> > Sounds good!
> >
> > I will be rebasing onto v5.5-rc1 shortly after it comes out. My usual
> > approach is to fix any conflicts during that rebasing operation.
> > Does that make sense, or would you prefer to send me a rebased stack at
> > that point? Either way is fine for me.
>
> That's fine with me, thanks! To avoid too much additional churn on
> your end, I just replied to the bitops patch with a version that will
> apply with the change to bitops-instrumented infrastructure.

My first thought was to replace 8/10 of the previous version of your
patch in -rcu (047ca266cfab "asm-generic, kcsan: Add KCSAN instrumentation
for bitops"), but this does not apply. So I am guessing that I instead
do this substitution when a rebase onto -rc1..

Except...

> Also considering the merge window, we had a discussion and there are
> some arguments for targeting the v5.5 merge window:
> - we'd unblock ARM and POWER ports;
> - we'd unblock people wanting to use the data_race macro;
> - we'd unblock syzbot just tracking upstream;
> Unless there are strong reasons to not target v5.5, I leave it to you
> if you think it's appropriate.

My normal process is to send the pull request shortly after -rc5 comes
out, but you do call out some benefits of getting it in sooner, so...

What I will do is to rebase your series onto (say) -rc7, test it, and
see about an RFC pull request.

One possible complication is the new 8/10 patch. But maybe it will
apply against -rc7?

Another possible complication is this:

scripts/kconfig/conf --syncconfig Kconfig
*
* Restart config...
*
*
* KCSAN: watchpoint-based dynamic data race detector
*
KCSAN: watchpoint-based dynamic data race detector (KCSAN) [N/y/?] (NEW)

Might be OK in this case because it is quite obvious what it is doing.
(Avoiding pain from this is the reason that CONFIG_RCU_EXPERT exists.)

But I will just mention this in the pull request.

If there is a -rc8, there is of course a higher probability of making it
into the next merge window.

Fair enough?

Thanx, Paul
On Fri, 15 Nov 2019 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Fri, Nov 15, 2019 at 01:02:08PM +0100, Marco Elver wrote: > > On Thu, 14 Nov 2019 at 23:16, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Thu, Nov 14, 2019 at 10:33:03PM +0100, Marco Elver wrote: > > > > On Thu, 14 Nov 2019, Paul E. McKenney wrote: > > > > > > > > > On Thu, Nov 14, 2019 at 07:02:53PM +0100, Marco Elver wrote: > > > > > > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN). > > > > > > KCSAN is a sampling watchpoint-based *data race detector*. More details > > > > > > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series > > > > > > only enables KCSAN for x86, but we expect adding support for other > > > > > > architectures is relatively straightforward (we are aware of > > > > > > experimental ARM64 and POWER support). > > > > > > > > > > > > To gather early feedback, we announced KCSAN back in September, and have > > > > > > integrated the feedback where possible: > > > > > > http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com > > > > > > > > > > > > The current list of known upstream fixes for data races found by KCSAN > > > > > > can be found here: > > > > > > https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan > > > > > > > > > > > > We want to point out and acknowledge the work surrounding the LKMM, > > > > > > including several articles that motivate why data races are dangerous > > > > > > [1, 2], justifying a data race detector such as KCSAN. > > > > > > > > > > > > [1] https://lwn.net/Articles/793253/ > > > > > > [2] https://lwn.net/Articles/799218/ > > > > > > > > > > I queued this and ran a quick rcutorture on it, which completed > > > > > successfully with quite a few reports. > > > > > > > > Great. Many thanks for queuing this in -rcu. 
And regarding merge window > > > > you mentioned, we're fine with your assumption to targeting the next > > > > (v5.6) merge window. > > > > > > > > I've just had a look at linux-next to check what a future rebase > > > > requires: > > > > > > > > - There is a change in lib/Kconfig.debug and moving KCSAN to the > > > > "Generic Kernel Debugging Instruments" section seems appropriate. > > > > - bitops-instrumented.h was removed and split into 3 files, and needs > > > > re-inserting the instrumentation into the right places. > > > > > > > > Otherwise there are no issues. Let me know what you recommend. > > > > > > Sounds good! > > > > > > I will be rebasing onto v5.5-rc1 shortly after it comes out. My usual > > > approach is to fix any conflicts during that rebasing operation. > > > Does that make sense, or would you prefer to send me a rebased stack at > > > that point? Either way is fine for me. > > > > That's fine with me, thanks! To avoid too much additional churn on > > your end, I just replied to the bitops patch with a version that will > > apply with the change to bitops-instrumented infrastructure. > > My first thought was to replace 8/10 of the previous version of your > patch in -rcu (047ca266cfab "asm-generic, kcsan: Add KCSAN instrumentation > for bitops"), but this does not apply. So I am guessing that I instead > do this substitution when a rebase onto -rc1.. > > Except... > > > Also considering the merge window, we had a discussion and there are > > some arguments for targeting the v5.5 merge window: > > - we'd unblock ARM and POWER ports; > > - we'd unblock people wanting to use the data_race macro; > > - we'd unblock syzbot just tracking upstream; > > Unless there are strong reasons to not target v5.5, I leave it to you > > if you think it's appropriate. > > My normal process is to send the pull request shortly after -rc5 comes > out, but you do call out some benefits of getting it in sooner, so... 
>
> What I will do is to rebase your series onto (say) -rc7, test it, and
> see about an RFC pull request.
>
> One possible complication is the new 8/10 patch. But maybe it will
> apply against -rc7?
>
> Another possible complication is this:
>
> scripts/kconfig/conf --syncconfig Kconfig
> *
> * Restart config...
> *
> *
> * KCSAN: watchpoint-based dynamic data race detector
> *
> KCSAN: watchpoint-based dynamic data race detector (KCSAN) [N/y/?] (NEW)
>
> Might be OK in this case because it is quite obvious what it is doing.
> (Avoiding pain from this is the reason that CONFIG_RCU_EXPERT exists.)
>
> But I will just mention this in the pull request.
>
> If there is a -rc8, there is of course a higher probability of making it
> into the next merge window.
>
> Fair enough?

Totally fine with that, sounds like a good plan, thanks!

If it helps, in theory we can also drop and delay the bitops
instrumentation patch until the new bitops instrumentation
infrastructure is in 5.5-rc1. There won't be any false positives if
this is missing, we might just miss a few data races until we have it.

Thanks,
-- Marco
On Fri, Nov 15, 2019 at 06:14:46PM +0100, Marco Elver wrote: > On Fri, 15 Nov 2019 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Fri, Nov 15, 2019 at 01:02:08PM +0100, Marco Elver wrote: > > > On Thu, 14 Nov 2019 at 23:16, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > On Thu, Nov 14, 2019 at 10:33:03PM +0100, Marco Elver wrote: > > > > > On Thu, 14 Nov 2019, Paul E. McKenney wrote: > > > > > > > > > > > On Thu, Nov 14, 2019 at 07:02:53PM +0100, Marco Elver wrote: > > > > > > > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN). > > > > > > > KCSAN is a sampling watchpoint-based *data race detector*. More details > > > > > > > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series > > > > > > > only enables KCSAN for x86, but we expect adding support for other > > > > > > > architectures is relatively straightforward (we are aware of > > > > > > > experimental ARM64 and POWER support). > > > > > > > > > > > > > > To gather early feedback, we announced KCSAN back in September, and have > > > > > > > integrated the feedback where possible: > > > > > > > http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com > > > > > > > > > > > > > > The current list of known upstream fixes for data races found by KCSAN > > > > > > > can be found here: > > > > > > > https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan > > > > > > > > > > > > > > We want to point out and acknowledge the work surrounding the LKMM, > > > > > > > including several articles that motivate why data races are dangerous > > > > > > > [1, 2], justifying a data race detector such as KCSAN. > > > > > > > > > > > > > > [1] https://lwn.net/Articles/793253/ > > > > > > > [2] https://lwn.net/Articles/799218/ > > > > > > > > > > > > I queued this and ran a quick rcutorture on it, which completed > > > > > > successfully with quite a few reports. > > > > > > > > > > Great. 
Many thanks for queuing this in -rcu. And regarding merge window > > > > > you mentioned, we're fine with your assumption to targeting the next > > > > > (v5.6) merge window. > > > > > > > > > > I've just had a look at linux-next to check what a future rebase > > > > > requires: > > > > > > > > > > - There is a change in lib/Kconfig.debug and moving KCSAN to the > > > > > "Generic Kernel Debugging Instruments" section seems appropriate. > > > > > - bitops-instrumented.h was removed and split into 3 files, and needs > > > > > re-inserting the instrumentation into the right places. > > > > > > > > > > Otherwise there are no issues. Let me know what you recommend. > > > > > > > > Sounds good! > > > > > > > > I will be rebasing onto v5.5-rc1 shortly after it comes out. My usual > > > > approach is to fix any conflicts during that rebasing operation. > > > > Does that make sense, or would you prefer to send me a rebased stack at > > > > that point? Either way is fine for me. > > > > > > That's fine with me, thanks! To avoid too much additional churn on > > > your end, I just replied to the bitops patch with a version that will > > > apply with the change to bitops-instrumented infrastructure. > > > > My first thought was to replace 8/10 of the previous version of your > > patch in -rcu (047ca266cfab "asm-generic, kcsan: Add KCSAN instrumentation > > for bitops"), but this does not apply. So I am guessing that I instead > > do this substitution when a rebase onto -rc1.. > > > > Except... > > > > > Also considering the merge window, we had a discussion and there are > > > some arguments for targeting the v5.5 merge window: > > > - we'd unblock ARM and POWER ports; > > > - we'd unblock people wanting to use the data_race macro; > > > - we'd unblock syzbot just tracking upstream; > > > Unless there are strong reasons to not target v5.5, I leave it to you > > > if you think it's appropriate. 
> >
> > My normal process is to send the pull request shortly after -rc5 comes
> > out, but you do call out some benefits of getting it in sooner, so...
> >
> > What I will do is to rebase your series onto (say) -rc7, test it, and
> > see about an RFC pull request.
> >
> > One possible complication is the new 8/10 patch. But maybe it will
> > apply against -rc7?
> >
> > Another possible complication is this:
> >
> > scripts/kconfig/conf --syncconfig Kconfig
> > *
> > * Restart config...
> > *
> > *
> > * KCSAN: watchpoint-based dynamic data race detector
> > *
> > KCSAN: watchpoint-based dynamic data race detector (KCSAN) [N/y/?] (NEW)
> >
> > Might be OK in this case because it is quite obvious what it is doing.
> > (Avoiding pain from this is the reason that CONFIG_RCU_EXPERT exists.)
> >
> > But I will just mention this in the pull request.
> >
> > If there is a -rc8, there is of course a higher probability of making it
> > into the next merge window.
> >
> > Fair enough?
>
> Totally fine with that, sounds like a good plan, thanks!
>
> If it helps, in theory we can also drop and delay the bitops
> instrumentation patch until the new bitops instrumentation
> infrastructure is in 5.5-rc1. There won't be any false positives if
> this is missing, we might just miss a few data races until we have it.

That sounds advisable for an attempt to hit this coming merge window.

So just to make sure I understand, I drop 8/10 and keep the rest during
a rebase to 5.4-rc7, correct?

Thanx, Paul
On Fri, 15 Nov 2019 at 21:43, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Nov 15, 2019 at 06:14:46PM +0100, Marco Elver wrote:
> > On Fri, 15 Nov 2019 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Fri, Nov 15, 2019 at 01:02:08PM +0100, Marco Elver wrote:
> > > > On Thu, 14 Nov 2019 at 23:16, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > On Thu, Nov 14, 2019 at 10:33:03PM +0100, Marco Elver wrote:
> > > > > > On Thu, 14 Nov 2019, Paul E. McKenney wrote:
> > > > > >
> > > > > > > On Thu, Nov 14, 2019 at 07:02:53PM +0100, Marco Elver wrote:
> > > > > > > > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN).
> > > > > > > > KCSAN is a sampling watchpoint-based *data race detector*. More details
> > > > > > > > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series
> > > > > > > > only enables KCSAN for x86, but we expect adding support for other
> > > > > > > > architectures is relatively straightforward (we are aware of
> > > > > > > > experimental ARM64 and POWER support).
> > > > > > > >
> > > > > > > > To gather early feedback, we announced KCSAN back in September, and have
> > > > > > > > integrated the feedback where possible:
> > > > > > > > http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com
> > > > > > > >
> > > > > > > > The current list of known upstream fixes for data races found by KCSAN
> > > > > > > > can be found here:
> > > > > > > > https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan
> > > > > > > >
> > > > > > > > We want to point out and acknowledge the work surrounding the LKMM,
> > > > > > > > including several articles that motivate why data races are dangerous
> > > > > > > > [1, 2], justifying a data race detector such as KCSAN.
> > > > > > > >
> > > > > > > > [1] https://lwn.net/Articles/793253/
> > > > > > > > [2] https://lwn.net/Articles/799218/
> > > > > > >
> > > > > > > I queued this and ran a quick rcutorture on it, which completed
> > > > > > > successfully with quite a few reports.
> > > > > >
> > > > > > Great. Many thanks for queuing this in -rcu. And regarding merge window
> > > > > > you mentioned, we're fine with your assumption to targeting the next
> > > > > > (v5.6) merge window.
> > > > > >
> > > > > > I've just had a look at linux-next to check what a future rebase
> > > > > > requires:
> > > > > >
> > > > > > - There is a change in lib/Kconfig.debug and moving KCSAN to the
> > > > > >   "Generic Kernel Debugging Instruments" section seems appropriate.
> > > > > > - bitops-instrumented.h was removed and split into 3 files, and needs
> > > > > >   re-inserting the instrumentation into the right places.
> > > > > >
> > > > > > Otherwise there are no issues. Let me know what you recommend.
> > > > >
> > > > > Sounds good!
> > > > >
> > > > > I will be rebasing onto v5.5-rc1 shortly after it comes out. My usual
> > > > > approach is to fix any conflicts during that rebasing operation.
> > > > > Does that make sense, or would you prefer to send me a rebased stack at
> > > > > that point? Either way is fine for me.
> > > >
> > > > That's fine with me, thanks! To avoid too much additional churn on
> > > > your end, I just replied to the bitops patch with a version that will
> > > > apply with the change to bitops-instrumented infrastructure.
> > >
> > > My first thought was to replace 8/10 of the previous version of your
> > > patch in -rcu (047ca266cfab "asm-generic, kcsan: Add KCSAN instrumentation
> > > for bitops"), but this does not apply. So I am guessing that I instead
> > > do this substitution when a rebase onto -rc1..
> > >
> > > Except...
> > >
> > > > Also considering the merge window, we had a discussion and there are
> > > > some arguments for targeting the v5.5 merge window:
> > > > - we'd unblock ARM and POWER ports;
> > > > - we'd unblock people wanting to use the data_race macro;
> > > > - we'd unblock syzbot just tracking upstream;
> > > > Unless there are strong reasons to not target v5.5, I leave it to you
> > > > if you think it's appropriate.
> > >
> > > My normal process is to send the pull request shortly after -rc5 comes
> > > out, but you do call out some benefits of getting it in sooner, so...
> > >
> > > What I will do is to rebase your series onto (say) -rc7, test it, and
> > > see about an RFC pull request.
> > >
> > > One possible complication is the new 8/10 patch. But maybe it will
> > > apply against -rc7?
> > >
> > > Another possible complication is this:
> > >
> > > scripts/kconfig/conf --syncconfig Kconfig
> > > *
> > > * Restart config...
> > > *
> > > *
> > > * KCSAN: watchpoint-based dynamic data race detector
> > > *
> > > KCSAN: watchpoint-based dynamic data race detector (KCSAN) [N/y/?] (NEW)
> > >
> > > Might be OK in this case because it is quite obvious what it is doing.
> > > (Avoiding pain from this is the reason that CONFIG_RCU_EXPERT exists.)
> > >
> > > But I will just mention this in the pull request.
> > >
> > > If there is a -rc8, there is of course a higher probability of making it
> > > into the next merge window.
> > >
> > > Fair enough?
> >
> > Totally fine with that, sounds like a good plan, thanks!
> >
> > If it helps, in theory we can also drop and delay the bitops
> > instrumentation patch until the new bitops instrumentation
> > infrastructure is in 5.5-rc1. There won't be any false positives if
> > this is missing, we might just miss a few data races until we have it.
>
> That sounds advisable for an attempt to hit this coming merge window.
>
> So just to make sure I understand, I drop 8/10 and keep the rest during
> a rebase to 5.4-rc7, correct?

Yes, that's right.

Many thanks,
-- Marco
On Sat, Nov 16, 2019 at 09:20:54AM +0100, Marco Elver wrote:
> On Fri, 15 Nov 2019 at 21:43, Paul E. McKenney <paulmck@kernel.org> wrote:
> > So just to make sure I understand, I drop 8/10 and keep the rest during
> > a rebase to 5.4-rc7, correct?
>
> Yes, that's right.

Very good, I just now pushed a "kcsan" branch on -rcu, and am running
rcutorture, first without KCSAN enabled and then with it turned on.
If all that works out, I set my -next branch to that point and see what
-next testing and kbuild test robot think about it. If all goes well,
an RFC pull request.

Look OK?

Thanx, Paul
On Sat, 16 Nov 2019 at 16:34, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Very good, I just now pushed a "kcsan" branch on -rcu, and am running
> rcutorture, first without KCSAN enabled and then with it turned on.
> If all that works out, I set my -next branch to that point and see what
> -next testing and kbuild test robot think about it. If all goes well,
> an RFC pull request.
>
> Look OK?

Looks good to me, many thanks!

-- Marco
On Sat, Nov 16, 2019 at 07:09:21PM +0100, Marco Elver wrote:
> On Sat, 16 Nov 2019 at 16:34, Paul E. McKenney <paulmck@kernel.org> wrote:
> > Look OK?
>
> Looks good to me, many thanks!

And I did get one failure on the KCSAN=n run for the TREE03 scenario, but it does not appear to be your fault.
Looks like a race between a swait_queue_head swake_up_one() invocation and
one of the CPU hotplug operations done late in system shutdown. I have
included the splat below for your amusement.

Starting the KCSAN=y runs now.

Thanx, Paul

------------------------------------------------------------------------

[ 601.009355] reboot: Power down
[ 601.010447] ------------[ cut here ]------------
[ 601.011020] sched: Unexpected reschedule of offline CPU#1!
[ 601.011639] WARNING: CPU: 7 PID: 0 at arch/x86/kernel/apic/ipi.c:67 native_smp_send_reschedule+0x2f/0x40
[ 601.012692] Modules linked in:
[ 601.013037] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.4.0-rc7+ #1497
[ 601.013755] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 601.014708] RIP: 0010:native_smp_send_reschedule+0x2f/0x40
[ 601.015312] Code: 05 f6 62 5d 01 73 15 48 8b 05 cd ba 28 01 be fd 00 00 00 48 8b 40 30 e9 bf b0 db 00 89 fe 48 c7 c7 d0 df 5e a7 e8 01 20 02 00 <0f> 0b c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 48 8b 05 99 ba
[ 601.017357] RSP: 0018:ffffa1afc020cee0 EFLAGS: 00010086
[ 601.017942] RAX: 0000000000000000 RBX: ffff8d5c1ed24a54 RCX: 0000000000000005
[ 601.018716] RDX: 0000000080000005 RSI: 0000000000000082 RDI: 00000000ffffffff
[ 601.019500] RBP: 0000000000000000 R08: 0000000000000cd5 R09: 000000000000003d
[ 601.020283] R10: ffff8d5c1f067f80 R11: 20666f20656c7564 R12: 0000000000000001
[ 601.021074] R13: 0000000000027f40 R14: 0000000000000087 R15: ffff8d5c1ed23f00
[ 601.021851] FS: 0000000000000000(0000) GS:ffff8d5c1f1c0000(0000) knlGS:0000000000000000
[ 601.022731] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 601.023360] CR2: 00000000ffffffff CR3: 000000001120a000 CR4: 00000000000006e0
[ 601.024140] Call Trace:
[ 601.024437] <IRQ>
[ 601.024671] try_to_wake_up+0x2b3/0x650
[ 601.025103] swake_up_locked.part.6+0xe/0x30
[ 601.025579] swake_up_one+0x22/0x30
[ 601.025968] rcu_try_advance_all_cbs+0x71/0x80
[ 601.026459] rcu_cleanup_after_idle+0x28/0x40
[ 601.026941] rcu_irq_enter+0xfb/0x130
[ 601.027347] irq_enter+0x5/0x50
[ 601.027704] smp_reboot_interrupt+0x1a/0xb0
[ 601.028175] ? smp_apic_timer_interrupt+0xa1/0x180
[ 601.028711] reboot_interrupt+0xf/0x20
[ 601.029134] </IRQ>
[ 601.029374] RIP: 0010:default_idle+0x1e/0x170
[ 601.029853] Code: 90 90 90 90 90 90 90 90 90 90 90 90 41 55 41 54 55 53 e8 c5 c5 91 ff 0f 1f 44 00 00 e9 07 00 00 00 0f 00 2d e6 b4 54 00 fb f4 <e8> ad c5 91 ff 89 c5 0f 1f 44 00 00 5b 5d 41 5c 41 5d c3 65 8b 05
[ 601.031884] RSP: 0018:ffffa1afc00afec0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff07
[ 601.032716] RAX: 0000000000000007 RBX: 0000000000000007 RCX: 0000000000000007
[ 601.033498] RDX: 0000000000000001 RSI: 0000000000000087 RDI: ffffffffa769a760
[ 601.034280] RBP: ffffffffa7a1c160 R08: 0000009a8e03a656 R09: 0000000000000001
[ 601.035062] R10: 0000000000000400 R11: 00000000000001d8 R12: 0000000000000000
[ 601.035842] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 601.036632] do_idle+0x1a6/0x240
[ 601.036996] cpu_startup_entry+0x14/0x20
[ 601.037433] start_secondary+0x150/0x180
[ 601.037869] secondary_startup_64+0xa4/0xb0
[ 601.038339] ---[ end trace e4c21199f3882c03 ]---
[ 601.038991] acpi_power_off called
On Thu, 2019-11-14 at 19:02 +0100, 'Marco Elver' via kasan-dev wrote: > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN). > KCSAN is a sampling watchpoint-based *data race detector*. More details > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series > only enables KCSAN for x86, but we expect adding support for other > architectures is relatively straightforward (we are aware of > experimental ARM64 and POWER support). Just booting x86 systems because kcsan_setup_watchpoint() disabled hard irqs? [ 8.926145][ T0] ------------[ cut here ]------------ [ 8.927850][ T0] DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled) [ 80] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4406 check_flags.part.26+0x102/0x240 [ 8.933072][ T0] Modules linked in: [ 8.933072][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc8-next- 20191119+ #2 [ 8.933072][ T0] Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015 [ 8.933072][ T0] RIP: 0010:check_flags.part.26+0x102/0x240 [ 8.933072][ T0] Code: 7b a2 e8 51 6d 15 00 44 8b 05 fa df 45 01 45 85 c0 0f 85 27 76 00 00 48 c7 c6 02 d6 3b a2 48 c7 c7 79 36 3b a2 e8 2f 9f f5 ff <0f> e9 0d 76 00 00 65 48 8b 3c 25 40 3f 01 00 e8 89 f0 ff ff e8 [ 8.933072][ T0] RSP: 0000:ffffffffa2603860 EFLAGS: 00010086 [ 8.933072][ T0] RAX: 0000000000000000 RBX: ffffffffa2617b40 RCX: 0000000000000000 [ 8.933072][ T0] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 [ 8.933072][ T0] RBP: ffffffffa2603868 R08: 0000000000000000 R09: 0000ffffa27bcad4 [ 8.933072][ T0] R10: 0000ffffffffffff R11: 0000ffffa27bcad7 R12: 0000000000000168 [ 8.933072][ T0] R13: 0000000000092cc0 R14: 0000000000000246 R15: ffffffffa1664c89 [ 8.933072][ T0] FS: 0000000000000000(0000) GS:ffff8987f3000000(0000) knlGS:0000000000000000 [ 8.933072][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.933072][ T0] CR2: ffff898bfc9ff000 CR3: 000000033dc0e001 CR4: 00000000001606f0 [ 8.933072][ T0] Call Trace: [ 
8.933072][ T0] lock_is_held_type+0x66/0x13072][ T0] ? rcu_is_watching+0x79/0xa0 [ 8.933072][ T0] ? create_object+0x69/0x690 [ 8.933072][ T0] rcu_read_lock_sched_held+0x7f/0xa0 [ 8.933072][ T0] kmem_cache_alloc+0x3b2/0x420 [ 8.933072][ T0] ? create_object+0x69/0x690 [ 8.933072][ T0] create_object+0x69/0x690 [ 8.933072][ T0] ? find_next_bit+0x7b/0xa0 [ 8.933072][ T0] kmemleak_alloc_percpu+0xde/0x170 [ 8.933072][ T0] pcpu_alloc+0x683/0xc90 [ 8.933072][ T0] __alloc_percpu+0x2d/0x40 [ 8.933072][ T0] alloc_vfsmnt+0xd1/0x380 [ 8.933072][ T0] vfs_create_mount+0x7f/0x2e0 [ 8.933072][ T0] ? proc_get_tree+0x4d/0x60 [ 8.933072][ T0] fc_mount+0x6d/0x80 [ 8.933072][ T0] pid_ns_prepare_proc+0x133/0x190 [ 8.933072][ T0] alloc_pid+0x5c3/0x600 [ 8.933072][ T0] copy_process+0x1ca3/0x3480 [ 8.933072][ T0] ? __lock_acquire+0x739/0x25d0 [ 8.933072][ T0] _do_fork+0xaa/0x9c0 [ 8.933072][ T0] ? rcu_blocking_is_gp+0x83/0xb0 [ 8.933072][ T0] ? synchronize_rcu_expedited+0x80/0x6c0 [ 8.933072][ T0] ? rcu_blocking_is_gp+0x83/0xb0 [ 8.933072][ T0] ? rest_init+0x381/0x381 [ 8.933072][ T0] kernel_thread+0xb0/0xe0 [ 8.933072][ T0] ? rest_init+0x381/0x381 [ 8.933072][ T0] rest_init+0x31/0x381 [ 8.933072][ st_init+0x17/0x29 [ 8.933072][ T0] start_kernel+0x6ac/0x6d0 [ 8.933072][ T0] x86_64_start_reservations+0x24/0x26 [ 8.933072][ T0] x86_64_start_kernel+0xef/0xf6 [ 8.933072][ T0] secondary_startup_64+0xb6/0xc0 [ 8.933072][ T0] irq event stamp: 75594 [ 8.933072][ T0] hardirqs last enabled at (75593): [<ffffffffa1203d52>] trace_hardirqs_on_thunk+0x1a/0x1c [ 8.933072][ T0] hardirqs last disabled at (75594): [<ffffffffa14b4f96>] kcsan_setup_watchpoint+0x96/0x200 [ 8.933072][ T0] softirqs last enabled at (75592): [<ffffffffa200034c>] __do_softirq+0x34c/0x57c [ 8.933072][ T0] softirqs last disabled at (75585): [<ffffffffa12c6fb2>] irq_exit+0xa2/0xc0 [ 8.933072][ T0] ---[ end trace f4a667495da45c20 ]--- [ 8.933072][ T0] possible reason: unannotated irqs-on. 
[ 8.933072][ T0] irq event stamp: 75594
[ 8.933072][ T0] hardirqs last enabled at (75593): [<ffffffffa1203d52>] trace_hardirqs_on_thunk+0x1a/0x1c
[ 8.933072][ T0] hardirqs last disabled at (75594): [<ffffffffa14b4f96>] kcsan_setup_watchpoint+0x96/0x200
[ 8.933072][ T0] softirqs last enabled at (75592): [<ffffffffa200034c>] __do_softirq+0x34c/0x57c
[ 8.933072][ T0] softirqs last disabled at (75585): [<ffffffffa12c6fb2>] irq_exit+0xa2/0xc0

>
> To gather early feedback, we announced KCSAN back in September, and have
> integrated the feedback where possible:
> http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com
>
> The current list of known upstream fixes for data races found by KCSAN
> can be found here:
> https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan
>
> We want to point out and acknowledge the work surrounding the LKMM,
> including several articles that motivate why data races are dangerous
> [1, 2], justifying a data race detector such as KCSAN.
>
> [1] https://lwn.net/Articles/793253/
> [2] https://lwn.net/Articles/799218/
>
> Race conditions vs. data races
> ------------------------------
>
> Race conditions are logic bugs, where unexpected interleaving of racing
> concurrent operations result in an erroneous state.
>
> Data races on the other hand are defined at the *memory model/language
> level*. Many data races are also harmful race conditions, which a tool
> like KCSAN reports! However, not all data races are race conditions and
> vice-versa. KCSAN's intent is to report data races according to the
> LKMM. A data race detector can only work at the memory model/language
> level.
>
> Deeper analysis, to find high-level race conditions only, requires
> conveying the intended kernel logic to a tool. This requires (1) the
> developer writing a specification or model of their code, and then (2)
> the tool verifying that the implementation matches. This has been done
> for small bits of code using model checkers and other formal methods,
> but does not scale to the level of what can be covered with a dynamic
> analysis based data race detector such as KCSAN.
>
> For reasons outlined in [1, 2], data races can be much more subtle, but
> can cause no less harm than high-level race conditions.
>
> Changelog
> ---------
> v4:
> * Major changes:
>   - Optimizations resulting in performance improvement of 33% (on
>     microbenchmark).
>   - Deal with nested interrupts for atomic_next.
>   - Simplify report.c (removing double-locking as well), in preparation
>     for KCSAN_REPORT_VALUE_CHANGE_ONLY.
>   - Add patch to introduce "data_race(expr)" macro.
>   - Introduce KCSAN_REPORT_VALUE_CHANGE_ONLY option for further filtering of data
>     races: if a conflicting write was observed via a watchpoint, only report the
>     data race if a value change was observed as well. The option will be enabled
>     by default on syzbot. (rcu-functions will be excluded from this filter at
>     request of Paul McKenney.) Context:
>     http://lkml.kernel.org/r/CANpmjNOepvb6+zJmDePxj21n2rctM4Sp4rJ66x_J-L1UmNK54A@mail.gmail.com
>
> v3: http://lkml.kernel.org/r/20191104142745.14722-1-elver@google.com
> * Major changes:
>   - Add microbenchmark.
>   - Add instruction watchpoint skip randomization.
>   - Refactor API and core runtime fast-path and slow-path. Compared to
>     the previous version, with a default config and benchmarked using the
>     added microbenchmark, this version is 3.8x faster.
>   - Make __tsan_unaligned __alias of generic accesses.
>   - Rename kcsan_{begin,end}_atomic ->
>     kcsan_{nestable,flat}_atomic_{begin,end}
>   - For filter list in debugfs.c use kmalloc+krealloc instead of
>     kvmalloc.
>   - Split Documentation into separate patch.
>
> v2: http://lkml.kernel.org/r/20191017141305.146193-1-elver@google.com
> * Major changes:
>   - Replace kcsan_check_access(.., {true, false}) with
>     kcsan_check_{read,write}.
>   - Change atomic-instrumented.h to use __atomic_check_{read,write}.
>   - Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>     contexts.
>
> v1: http://lkml.kernel.org/r/20191016083959.186860-1-elver@google.com
>
> Marco Elver (10):
>   kcsan: Add Kernel Concurrency Sanitizer infrastructure
>   include/linux/compiler.h: Introduce data_race(expr) macro
>   kcsan: Add Documentation entry in dev-tools
>   objtool, kcsan: Add KCSAN runtime functions to whitelist
>   build, kcsan: Add KCSAN build exceptions
>   seqlock, kcsan: Add annotations for KCSAN
>   seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier
>   asm-generic, kcsan: Add KCSAN instrumentation for bitops
>   locking/atomics, kcsan: Add KCSAN instrumentation
>   x86, kcsan: Enable KCSAN for x86
>
>  Documentation/dev-tools/index.rst         |   1 +
>  Documentation/dev-tools/kcsan.rst         | 256 +++++++++
>  MAINTAINERS                               |  11 +
>  Makefile                                  |   3 +-
>  arch/x86/Kconfig                          |   1 +
>  arch/x86/boot/Makefile                    |   2 +
>  arch/x86/boot/compressed/Makefile         |   2 +
>  arch/x86/entry/vdso/Makefile              |   3 +
>  arch/x86/include/asm/bitops.h             |   6 +-
>  arch/x86/kernel/Makefile                  |   4 +
>  arch/x86/kernel/cpu/Makefile              |   3 +
>  arch/x86/lib/Makefile                     |   4 +
>  arch/x86/mm/Makefile                      |   4 +
>  arch/x86/purgatory/Makefile               |   2 +
>  arch/x86/realmode/Makefile                |   3 +
>  arch/x86/realmode/rm/Makefile             |   3 +
>  drivers/firmware/efi/libstub/Makefile     |   2 +
>  include/asm-generic/atomic-instrumented.h | 393 +++++++-------
>  include/asm-generic/bitops-instrumented.h |  18 +
>  include/linux/compiler-clang.h            |   9 +
>  include/linux/compiler-gcc.h              |   7 +
>  include/linux/compiler.h                  |  57 +-
>  include/linux/kcsan-checks.h              |  97 ++++
>  include/linux/kcsan.h                     | 115 ++++
>  include/linux/sched.h                     |   4 +
>  include/linux/seqlock.h                   |  51 +-
>  init/init_task.c                          |   8 +
>  init/main.c                               |   2 +
>  kernel/Makefile                           |   6 +
>  kernel/kcsan/Makefile                     |  11 +
>  kernel/kcsan/atomic.h                     |  27 +
>  kernel/kcsan/core.c                       | 626 ++++++++++++++++++++++
>  kernel/kcsan/debugfs.c                    | 275 ++++++++++
>  kernel/kcsan/encoding.h                   |  94 ++++
>  kernel/kcsan/kcsan.h                      | 108 ++++
>  kernel/kcsan/report.c                     | 320 +++++++++++
>  kernel/kcsan/test.c                       | 121 +++++
>  kernel/sched/Makefile                     |   6 +
>  lib/Kconfig.debug                         |   2 +
>  lib/Kconfig.kcsan                         | 118 ++++
>  lib/Makefile                              |   3 +
>  mm/Makefile                               |   8 +
>  scripts/Makefile.kcsan                    |   6 +
>  scripts/Makefile.lib                      |  10 +
>  scripts/atomic/gen-atomic-instrumented.sh |  17 +-
>  tools/objtool/check.c                     |  18 +
>  46 files changed, 2641 insertions(+), 206 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.h
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> --
> 2.24.0.rc1.363.gb1bccd3e3d-goog
>
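The KCSAN_REPORT_VALUE_CHANGE_ONLY filter described in the v4 changelog can be pictured with a small user-space sketch. This is only a model under stated assumptions, not the code in kernel/kcsan/: the struct and function names are hypothetical, it covers only the conflicting-write case the changelog mentions, and it does not model the exclusion of rcu-functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of one watchpoint observation.
 * These names are illustrative only; they are not the kernel's types.
 */
struct wp_observation {
	bool conflicting_write_seen;	/* a concurrent write hit the watchpoint */
	uint64_t value_before;		/* value read when the watchpoint was set up */
	uint64_t value_after;		/* value re-read after the delay */
};

/*
 * Decide whether to report, modelling only the conflicting-write case
 * from the changelog. With value_change_only set, a conflicting write
 * that left the value unchanged is filtered out.
 */
static bool model_should_report(const struct wp_observation *obs,
				bool value_change_only)
{
	assert(obs != NULL);

	if (!obs->conflicting_write_seen)
		return false;
	if (!value_change_only)
		return true;
	return obs->value_before != obs->value_after;
}
```

In this model, a conflicting write that stores the same value again is reported only when the filter is off, which is the kind of report the option is meant to suppress.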
On Thu, 2019-11-14 at 19:02 +0100, 'Marco Elver' via kasan-dev wrote:
> This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN).
> KCSAN is a sampling watchpoint-based *data race detector*. More details
> are included in **Documentation/dev-tools/kcsan.rst**. This patch-series
> only enables KCSAN for x86, but we expect adding support for other
> architectures is relatively straightforward (we are aware of
> experimental ARM64 and POWER support).

This does not allow the system to boot. Just hang forever at the end.

https://cailca.github.io/files/dmesg.txt

the config (deselect KASAN and select KCSAN with default options):

https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

> [...]
On Tue, 19 Nov 2019 at 21:13, Qian Cai <cai@lca.pw> wrote:
>
> On Thu, 2019-11-14 at 19:02 +0100, 'Marco Elver' via kasan-dev wrote:
> > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN).
> > KCSAN is a sampling watchpoint-based *data race detector*. More details
> > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series
> > only enables KCSAN for x86, but we expect adding support for other
> > architectures is relatively straightforward (we are aware of
> > experimental ARM64 and POWER support).
>
> This does not allow the system to boot. Just hang forever at the end.
>
> https://cailca.github.io/files/dmesg.txt
>
> the config (deselect KASAN and select KCSAN with default options):
>
> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

Thanks! That config enables lots of other debug code. I could
reproduce the hang. It's related to CONFIG_PROVE_LOCKING etc.

The problem is definitely not the fact that kcsan_setup_watchpoint
disables interrupts (tested by removing that code). Although lockdep
still complains here, looking at the code in kcsan/core.c I can't see
how local_irq_restore could fail to be called before returning (in
the stacktrace you provided, there is no kcsan function), so
interrupts should always be re-enabled. (Interrupts are only disabled
during the delay in kcsan_setup_watchpoint.)

What I also notice is that this happens when the console starts
getting spammed with data-race reports (presumably because some extra
debug code has lots of data races according to KCSAN).

My guess is that some of the extra debug logic enabled in that config
is incompatible with KCSAN. However, so far I cannot tell where
exactly the problem is. For now the work-around would be not using
KCSAN with these extra debug options. I will investigate more, but
nothing obviously wrong stands out.

Many thanks,
-- Marco
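The argument that interrupts are always re-enabled rests on the save and restore being paired on every path out of the function. A minimal user-space sketch of that shape follows. It is not the code in kernel/kcsan/core.c: all model_* names are hypothetical, a plain bool stands in for the CPU interrupt flag, and re-reading the value after the delay is an assumption taken from the changelog's description of value-change detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the CPU interrupt-enable flag; purely a user-space model. */
static bool model_irqs_enabled = true;

/* Model of an irq-save: remember the old state, then disable. */
static bool model_irq_save(void)
{
	bool flags = model_irqs_enabled;

	model_irqs_enabled = false;
	return flags;
}

/* Model of an irq-restore: put back whatever state was saved. */
static void model_irq_restore(bool flags)
{
	model_irqs_enabled = flags;
}

/* Hypothetical placeholder for the delay while the watchpoint is armed. */
static void model_delay(void)
{
	assert(!model_irqs_enabled);	/* the delay runs with interrupts off */
}

/*
 * Sketch of a watchpoint setup with a single exit path: the value is
 * sampled, the delay runs with interrupts disabled, and the saved state
 * is restored before returning. Returns true if the value changed.
 */
static bool model_setup_watchpoint(const volatile long *addr)
{
	bool flags;
	long before;
	bool changed;

	assert(addr != NULL);

	flags = model_irq_save();
	before = *addr;
	model_delay();
	changed = (*addr != before);
	model_irq_restore(flags);

	return changed;
}
```

Because the restore puts back the saved state rather than unconditionally enabling, a caller that entered with interrupts already off also leaves with them off.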
On Tue, 19 Nov 2019, Marco Elver wrote:
> [...]
> My guess is that some of the extra debug logic enabled in that config
> is incompatible with KCSAN. However, so far I cannot tell where
> exactly the problem is. For now the work-around would be not using
> KCSAN with these extra debug options. I will investigate more, but
> nothing obviously wrong stands out.

It seems that due to spinlock_debug.c containing data races, the console
gets spammed with reports. However, it's also possible to encounter a
deadlock, e.g. printk lock -> spinlock_debug -> KCSAN detects data race
-> kcsan_print_report() -> printk lock -> deadlock. So the best thing is
to fix the data races in spinlock_debug. I will send a patch separately
for you to test.

As for lockdep still reporting an inconsistency in IRQ flags tracing, I
cannot yet say what the problem is. It seems that lockdep IRQ flags
tracing may have an issue with KCSAN for numerous reasons: let's say the
lockdep and IRQ flags tracing code is instrumented and therefore calls
into KCSAN, which disables/enables interrupts and, due to tracing, calls
back into lockdep code. In other words, there may be some recursion which
corrupts hardirqs_enabled.

Thanks,
-- Marco
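The deadlock chain above (printk lock -> spinlock_debug -> KCSAN report -> printk lock) can be traced in a small user-space model. This is a sketch under stated assumptions, not kernel code: every model_* name is hypothetical, a plain flag stands in for the non-recursive printk lock, and a failed try-lock is counted where the real lock would spin forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the non-recursive printk lock. */
static bool model_printk_lock_held;
/* Counts re-entrant acquisition attempts, i.e. would-be deadlocks. */
static int model_deadlocks;

static bool model_printk_trylock(void)
{
	if (model_printk_lock_held)
		return false;
	model_printk_lock_held = true;
	return true;
}

static void model_printk_unlock(void)
{
	assert(model_printk_lock_held);
	model_printk_lock_held = false;
}

static void model_printk(bool debug_code_is_racy);

/* Stand-in for kcsan_print_report(): reporting needs to print. */
static void model_print_report(void)
{
	model_printk(false);
}

/* Stand-in for spinlock debug checks that run while the lock is held. */
static void model_spinlock_debug(bool racy)
{
	if (racy)
		model_print_report();	/* the detector found a data race */
}

static void model_printk(bool debug_code_is_racy)
{
	if (!model_printk_trylock()) {
		/* A real non-recursive lock would spin here forever. */
		model_deadlocks++;
		return;
	}
	model_spinlock_debug(debug_code_is_racy);
	model_printk_unlock();
}
```

Calling model_printk(true) walks the whole chain and increments model_deadlocks once, while model_printk(false) never does. In this model the cycle only exists while the debug code under the lock is racy, which is why fixing the data races in spinlock_debug removes it.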