Message ID | 0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop (mailing list archive) |
---|---|
State | Accepted |
Commit | 2f987ea06c312e6dc36321c663287c76999c9c38 |
Headers | show |
Series | [RFC] Inform KCSAN of one-byte cmpxchg() in rcu_trc_cmpxchg_need_qs() | expand |
On Fri, 8 Mar 2024 at 22:41, Paul E. McKenney <paulmck@kernel.org> wrote: > > Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists. Because not all architectures support 1-byte cmpxchg? What prevents us from implementing it? > Therefore, rcu_trc_cmpxchg_need_qs() emulates one using field substitution > and a four-byte cmpxchg(), such that the other three bytes are always > atomically updated to their old values. This works, but results in > false-positive KCSAN failures because as far as KCSAN knows, this > cmpxchg() operation is updating all four bytes. > > This commit therefore encloses the cmpxchg() in a data_race() and adds > a single-byte instrument_atomic_read_write(), thus telling KCSAN exactly > what is going on so as to avoid the false positives. > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > Cc: Marco Elver <elver@google.com> > > --- > > Is this really the right way to do this? This code has a real data race per definition of data race, right? KCSAN instruments the primitive precisely per its real semantics, but the desired semantics does not match the real semantics. As such, to me the right way would be implementing cmpxchgb(). Otherwise, the workaround below is perfectly adequate. > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > index d5319bbe8c982..e83adcdb49b5f 100644 > --- a/kernel/rcu/tasks.h > +++ b/kernel/rcu/tasks.h > @@ -1460,6 +1460,7 @@ static void rcu_st_need_qs(struct task_struct *t, u8 v) > /* > * Do a cmpxchg() on ->trc_reader_special.b.need_qs, allowing for > * the four-byte operand-size restriction of some platforms. > + * > * Returns the old value, which is often ignored. > */ > u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new) > @@ -1471,7 +1472,13 @@ u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new) > if (trs_old.b.need_qs != old) > return trs_old.b.need_qs; > trs_new.b.need_qs = new; > - ret.s = cmpxchg(&t->trc_reader_special.s, trs_old.s, trs_new.s); > + > + // Although cmpxchg() appears to KCSAN to update all four bytes, > + // only the .b.need_qs byte actually changes. > + instrument_atomic_read_write(&t->trc_reader_special.b.need_qs, > + sizeof(t->trc_reader_special.b.need_qs)); > + ret.s = data_race(cmpxchg(&t->trc_reader_special.s, trs_old.s, trs_new.s)); > + > return ret.b.need_qs; > } > EXPORT_SYMBOL_GPL(rcu_trc_cmpxchg_need_qs); >
On Fri, Mar 08, 2024 at 11:02:28PM +0100, Marco Elver wrote: > On Fri, 8 Mar 2024 at 22:41, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists. > > Because not all architectures support 1-byte cmpxchg? > What prevents us from implementing it? Nothing that I know of, but I didn't want to put up with the KCSAN report in the interim. > > Therefore, rcu_trc_cmpxchg_need_qs() emulates one using field substitution > > and a four-byte cmpxchg(), such that the other three bytes are always > > atomically updated to their old values. This works, but results in > > false-positive KCSAN failures because as far as KCSAN knows, this > > cmpxchg() operation is updating all four bytes. > > > > This commit therefore encloses the cmpxchg() in a data_race() and adds > > a single-byte instrument_atomic_read_write(), thus telling KCSAN exactly > > what is going on so as to avoid the false positives. > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > Cc: Marco Elver <elver@google.com> > > > > --- > > > > Is this really the right way to do this? > > This code has a real data race per definition of data race, right? > KCSAN instruments the primitive precisely per its real semantics, but > the desired semantics does not match the real semantics. As such, to > me the right way would be implementing cmpxchgb(). No argument other than timeframe. ;-) Plus I suspect that a straightforward emulation of cmpxchgb() by cmpxchg() would need to do something similar. > Otherwise, the workaround below is perfectly adequate. Thank you very much for checking! Thanx, Paul > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > > index d5319bbe8c982..e83adcdb49b5f 100644 > > --- a/kernel/rcu/tasks.h > > +++ b/kernel/rcu/tasks.h > > @@ -1460,6 +1460,7 @@ static void rcu_st_need_qs(struct task_struct *t, u8 v) > > /* > > * Do a cmpxchg() on ->trc_reader_special.b.need_qs, allowing for > > * the four-byte operand-size restriction of some platforms. > > + * > > * Returns the old value, which is often ignored. > > */ > > u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new) > > @@ -1471,7 +1472,13 @@ u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new) > > if (trs_old.b.need_qs != old) > > return trs_old.b.need_qs; > > trs_new.b.need_qs = new; > > - ret.s = cmpxchg(&t->trc_reader_special.s, trs_old.s, trs_new.s); > > + > > + // Although cmpxchg() appears to KCSAN to update all four bytes, > > + // only the .b.need_qs byte actually changes. > > + instrument_atomic_read_write(&t->trc_reader_special.b.need_qs, > > + sizeof(t->trc_reader_special.b.need_qs)); > > + ret.s = data_race(cmpxchg(&t->trc_reader_special.s, trs_old.s, trs_new.s)); > > + > > return ret.b.need_qs; > > } > > EXPORT_SYMBOL_GPL(rcu_trc_cmpxchg_need_qs); > >
On Fri, Mar 08, 2024 at 02:31:53PM -0800, Paul E. McKenney wrote: > On Fri, Mar 08, 2024 at 11:02:28PM +0100, Marco Elver wrote: > > On Fri, 8 Mar 2024 at 22:41, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists. > > > > Because not all architectures support 1-byte cmpxchg? > > What prevents us from implementing it? > > Nothing that I know of, but I didn't want to put up with the KCSAN report > in the interim. And here is a lightly tested patch to emulate one-byte and two-byte cmpxchg() for architectures that do not support it. This is just the emulation, and would be followed up with patches to make the relevant architectures make use of it. The one-byte emulation has been lightly tested on x86. Thoughts? Thanx, Paul ------------------------------------------------------------------------ commit d72e54166b56d8b373676e1e92a426a07d53899a Author: Paul E. McKenney <paulmck@kernel.org> Date: Sun Mar 17 14:44:38 2024 -0700 lib: Add one-byte and two-byte cmpxchg() emulation functions Architectures are required to provide four-byte cmpxchg() and 64-bit architectures are additionally required to provide eight-byte cmpxchg(). However, there are cases where one-byte and two-byte cmpxchg() would be extremely useful. Therefore, provide cmpxchg_emu_u8() and cmpxchg_emu_u16() that emulated one-byte and two-byte cmpxchg() in terms of four-byte cmpxchg(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Marco Elver <elver@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Douglas Anderson <dianders@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: <linux-arch@vger.kernel.org> diff --git a/arch/Kconfig b/arch/Kconfig index 154f994547632..eef11e9918ec7 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1506,4 +1506,7 @@ config FUNCTION_ALIGNMENT default 4 if FUNCTION_ALIGNMENT_4B default 0 +config ARCH_NEED_CMPXCHG_1_2_EMU + bool + endmenu diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h new file mode 100644 index 0000000000000..fee8171fa05eb --- /dev/null +++ b/include/linux/cmpxchg-emu.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * Emulated 1-byte and 2-byte cmpxchg operations for architectures + * lacking direct support for these sizes. These are implemented in terms + * of 4-byte cmpxchg operations. + * + * Copyright (C) 2024 Paul E. McKenney. + */ + +#ifndef __LINUX_CMPXCHG_EMU_H +#define __LINUX_CMPXCHG_EMU_H + +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new); +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new); + +#endif /* __LINUX_CMPXCHG_EMU_H */ diff --git a/lib/Makefile b/lib/Makefile index 6b09731d8e619..fecd7b8c09cbd 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -238,6 +238,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o lib-$(CONFIG_GENERIC_BUG) += bug.o obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o #ensure exported functions have prototypes diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c new file mode 100644 index 0000000000000..508b55484c2b6 --- /dev/null +++ b/lib/cmpxchg-emu.c @@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * Emulated 1-byte and 2-byte cmpxchg operations for architectures + * lacking direct support for these sizes. These are implemented in terms + * of 4-byte cmpxchg operations. + * + * Copyright (C) 2024 Paul E. McKenney. + */ + +#include <linux/types.h> +#include <linux/export.h> +#include <linux/instrumented.h> +#include <linux/atomic.h> +#include <asm-generic/rwonce.h> + +union u8_32 { + u8 b[4]; + u32 w; +}; + +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */ +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new) +{ + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3); + int i = ((uintptr_t)p) & 0x3; + union u8_32 old32; + union u8_32 new32; + u32 ret; + + old32.w = READ_ONCE(*p32); + do { + if (old32.b[i] != old) + return old32.b[i]; + new32.w = old32.w; + new32.b[i] = new; + instrument_atomic_read_write(p, 1); + ret = cmpxchg(p32, old32.w, new32.w); + } while (ret != old32.w); + return old; +} +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8); + +union u16_32 { + u16 h[2]; + u32 w; +}; + +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */ +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new) +{ + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1); + int i = ((uintptr_t)p) & 0x1; + union u16_32 old32; + union u16_32 new32; + u32 ret; + + old32.w = READ_ONCE(*p32); + do { + if (old32.h[i] != old) + return old32.h[i]; + new32.w = old32.w; + new32.h[i] = new; + instrument_atomic_read_write(p, 2); + ret = cmpxchg(p32, old32.w, new32.w); + } while (ret != old32.w); + return old; +} +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);
On Sun, 17 Mar 2024 at 22:55, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Fri, Mar 08, 2024 at 02:31:53PM -0800, Paul E. McKenney wrote: > > On Fri, Mar 08, 2024 at 11:02:28PM +0100, Marco Elver wrote: > > > On Fri, 8 Mar 2024 at 22:41, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists. > > > > > > Because not all architectures support 1-byte cmpxchg? > > > What prevents us from implementing it? > > > > Nothing that I know of, but I didn't want to put up with the KCSAN report > > in the interim. > > And here is a lightly tested patch to emulate one-byte and two-byte > cmpxchg() for architectures that do not support it. This is just the > emulation, and would be followed up with patches to make the relevant > architectures make use of it. > > The one-byte emulation has been lightly tested on x86. > > Thoughts? > > Thanx, Paul > > ------------------------------------------------------------------------ > > commit d72e54166b56d8b373676e1e92a426a07d53899a > Author: Paul E. McKenney <paulmck@kernel.org> > Date: Sun Mar 17 14:44:38 2024 -0700 > > lib: Add one-byte and two-byte cmpxchg() emulation functions > > Architectures are required to provide four-byte cmpxchg() and 64-bit > architectures are additionally required to provide eight-byte cmpxchg(). > However, there are cases where one-byte and two-byte cmpxchg() > would be extremely useful. Therefore, provide cmpxchg_emu_u8() and > cmpxchg_emu_u16() that emulated one-byte and two-byte cmpxchg() in terms > of four-byte cmpxchg(). > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > Cc: Marco Elver <elver@google.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> > Cc: Douglas Anderson <dianders@chromium.org> > Cc: Petr Mladek <pmladek@suse.com> > Cc: <linux-arch@vger.kernel.org> > > diff --git a/arch/Kconfig b/arch/Kconfig > index 154f994547632..eef11e9918ec7 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -1506,4 +1506,7 @@ config FUNCTION_ALIGNMENT > default 4 if FUNCTION_ALIGNMENT_4B > default 0 > > +config ARCH_NEED_CMPXCHG_1_2_EMU > + bool > + > endmenu > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h > new file mode 100644 > index 0000000000000..fee8171fa05eb > --- /dev/null > +++ b/include/linux/cmpxchg-emu.h > @@ -0,0 +1,16 @@ > +/* SPDX-License-Identifier: GPL-2.0+ */ > +/* > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures > + * lacking direct support for these sizes. These are implemented in terms > + * of 4-byte cmpxchg operations. > + * > + * Copyright (C) 2024 Paul E. McKenney. > + */ > + > +#ifndef __LINUX_CMPXCHG_EMU_H > +#define __LINUX_CMPXCHG_EMU_H > + > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new); > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new); > + > +#endif /* __LINUX_CMPXCHG_EMU_H */ > diff --git a/lib/Makefile b/lib/Makefile > index 6b09731d8e619..fecd7b8c09cbd 100644 > --- a/lib/Makefile > +++ b/lib/Makefile > @@ -238,6 +238,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o > lib-$(CONFIG_GENERIC_BUG) += bug.o > > obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o > +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o Since you add instrumentation explicitly, we need to suppress instrumentation somehow. For the whole file this can be done with: KCSAN_SANITIZE_cmpxchg-emu.o := n Note, since you use cmpxchg, which pulls in its own instrument_read_write(), we can't use a function attribute (like __no_kcsan) if the whole-file no-instrumentation seems like overkill. Alternatively the cmpxchg could be wrapped into a data_race() (like your original RCU use case was doing). But I think "KCSAN_SANITIZE_cmpxchg-emu.o := n" would be my preferred way. With the explicit "instrument_read_write()" also note that this would do double-instrumentation with other sanitizers (KASAN, KMSAN). But I think we actually want to instrument the whole real access with those tools - would it be bad if we accessed some memory out-of-bounds, but that memory isn't actually used? I don't have a clear answer to that. Also, it might be useful to have an alignment check somewhere, because otherwise we end up with split atomic accesses (or whatever other bad thing the given arch does if that happens). Thanks, -- Marco > obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o > #ensure exported functions have prototypes > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c > new file mode 100644 > index 0000000000000..508b55484c2b6 > --- /dev/null > +++ b/lib/cmpxchg-emu.c > @@ -0,0 +1,68 @@ > +/* SPDX-License-Identifier: GPL-2.0+ */ > +/* > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures > + * lacking direct support for these sizes. These are implemented in terms > + * of 4-byte cmpxchg operations. > + * > + * Copyright (C) 2024 Paul E. McKenney. > + */ > + > +#include <linux/types.h> > +#include <linux/export.h> > +#include <linux/instrumented.h> > +#include <linux/atomic.h> > +#include <asm-generic/rwonce.h> > + > +union u8_32 { > + u8 b[4]; > + u32 w; > +}; > + > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */ > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new) > +{ > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3); > + int i = ((uintptr_t)p) & 0x3; > + union u8_32 old32; > + union u8_32 new32; > + u32 ret; > + > + old32.w = READ_ONCE(*p32); > + do { > + if (old32.b[i] != old) > + return old32.b[i]; > + new32.w = old32.w; > + new32.b[i] = new; > + instrument_atomic_read_write(p, 1); > + ret = cmpxchg(p32, old32.w, new32.w); > + } while (ret != old32.w); > + return old; > +} > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8); > + > +union u16_32 { > + u16 h[2]; > + u32 w; > +}; > + > +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */ > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new) > +{ > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1); > + int i = ((uintptr_t)p) & 0x1; > + union u16_32 old32; > + union u16_32 new32; > + u32 ret; > + > + old32.w = READ_ONCE(*p32); > + do { > + if (old32.h[i] != old) > + return old32.h[i]; > + new32.w = old32.w; > + new32.h[i] = new; > + instrument_atomic_read_write(p, 2); > + ret = cmpxchg(p32, old32.w, new32.w); > + } while (ret != old32.w); > + return old; > +} > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);
On Mon, 18 Mar 2024 at 11:01, Marco Elver <elver@google.com> wrote: > > On Sun, 17 Mar 2024 at 22:55, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Fri, Mar 08, 2024 at 02:31:53PM -0800, Paul E. McKenney wrote: > > > On Fri, Mar 08, 2024 at 11:02:28PM +0100, Marco Elver wrote: > > > > On Fri, 8 Mar 2024 at 22:41, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > > > Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists. > > > > > > > > Because not all architectures support 1-byte cmpxchg? > > > > What prevents us from implementing it? > > > > > > Nothing that I know of, but I didn't want to put up with the KCSAN report > > > in the interim. > > > > And here is a lightly tested patch to emulate one-byte and two-byte > > cmpxchg() for architectures that do not support it. This is just the > > emulation, and would be followed up with patches to make the relevant > > architectures make use of it. > > > > The one-byte emulation has been lightly tested on x86. > > > > Thoughts? > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > commit d72e54166b56d8b373676e1e92a426a07d53899a > > Author: Paul E. McKenney <paulmck@kernel.org> > > Date: Sun Mar 17 14:44:38 2024 -0700 > > > > lib: Add one-byte and two-byte cmpxchg() emulation functions > > > > Architectures are required to provide four-byte cmpxchg() and 64-bit > > architectures are additionally required to provide eight-byte cmpxchg(). > > However, there are cases where one-byte and two-byte cmpxchg() > > would be extremely useful. Therefore, provide cmpxchg_emu_u8() and > > cmpxchg_emu_u16() that emulated one-byte and two-byte cmpxchg() in terms > > of four-byte cmpxchg(). > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > Cc: Marco Elver <elver@google.com> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> > > Cc: Douglas Anderson <dianders@chromium.org> > > Cc: Petr Mladek <pmladek@suse.com> > > Cc: <linux-arch@vger.kernel.org> > > > > diff --git a/arch/Kconfig b/arch/Kconfig > > index 154f994547632..eef11e9918ec7 100644 > > --- a/arch/Kconfig > > +++ b/arch/Kconfig > > @@ -1506,4 +1506,7 @@ config FUNCTION_ALIGNMENT > > default 4 if FUNCTION_ALIGNMENT_4B > > default 0 > > > > +config ARCH_NEED_CMPXCHG_1_2_EMU > > + bool > > + > > endmenu > > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h > > new file mode 100644 > > index 0000000000000..fee8171fa05eb > > --- /dev/null > > +++ b/include/linux/cmpxchg-emu.h > > @@ -0,0 +1,16 @@ > > +/* SPDX-License-Identifier: GPL-2.0+ */ > > +/* > > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures > > + * lacking direct support for these sizes. These are implemented in terms > > + * of 4-byte cmpxchg operations. > > + * > > + * Copyright (C) 2024 Paul E. McKenney. > > + */ > > + > > +#ifndef __LINUX_CMPXCHG_EMU_H > > +#define __LINUX_CMPXCHG_EMU_H > > + > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new); > > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new); > > + > > +#endif /* __LINUX_CMPXCHG_EMU_H */ > > diff --git a/lib/Makefile b/lib/Makefile > > index 6b09731d8e619..fecd7b8c09cbd 100644 > > --- a/lib/Makefile > > +++ b/lib/Makefile > > @@ -238,6 +238,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o > > lib-$(CONFIG_GENERIC_BUG) += bug.o > > > > obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o > > +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o > > Since you add instrumentation explicitly, we need to suppress > instrumentation somehow. For the whole file this can be done with: > > KCSAN_SANITIZE_cmpxchg-emu.o := n Hrm, I recall this doesn't actually work as-is because it also disables instrument_read_write() instrumentation. So I think the most reliable would be to use data_race() after all. It'll be a bit slower because of double-instrumenting, but I think that's not a major concern with an instrumented build anyway. > Note, since you use cmpxchg, which pulls in its own > instrument_read_write(), we can't use a function attribute (like > __no_kcsan) if the whole-file no-instrumentation seems like overkill. > Alternatively the cmpxchg could be wrapped into a data_race() (like > your original RCU use case was doing). > > But I think "KCSAN_SANITIZE_cmpxchg-emu.o := n" would be my preferred way. > > With the explicit "instrument_read_write()" also note that this would > do double-instrumentation with other sanitizers (KASAN, KMSAN). But I > think we actually want to instrument the whole real access with those > tools - would it be bad if we accessed some memory out-of-bounds, but > that memory isn't actually used? I don't have a clear answer to that. > > Also, it might be useful to have an alignment check somewhere, because > otherwise we end up with split atomic accesses (or whatever other bad > thing the given arch does if that happens). > > Thanks, > -- Marco > > > obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o > > #ensure exported functions have prototypes > > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c > > new file mode 100644 > > index 0000000000000..508b55484c2b6 > > --- /dev/null > > +++ b/lib/cmpxchg-emu.c > > @@ -0,0 +1,68 @@ > > +/* SPDX-License-Identifier: GPL-2.0+ */ > > +/* > > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures > > + * lacking direct support for these sizes. These are implemented in terms > > + * of 4-byte cmpxchg operations. > > + * > > + * Copyright (C) 2024 Paul E. McKenney. > > + */ > > + > > +#include <linux/types.h> > > +#include <linux/export.h> > > +#include <linux/instrumented.h> > > +#include <linux/atomic.h> > > +#include <asm-generic/rwonce.h> > > + > > +union u8_32 { > > + u8 b[4]; > > + u32 w; > > +}; > > + > > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */ > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new) > > +{ > > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3); > > + int i = ((uintptr_t)p) & 0x3; > > + union u8_32 old32; > > + union u8_32 new32; > > + u32 ret; > > + > > + old32.w = READ_ONCE(*p32); > > + do { > > + if (old32.b[i] != old) > > + return old32.b[i]; > > + new32.w = old32.w; > > + new32.b[i] = new; > > + instrument_atomic_read_write(p, 1); > > + ret = cmpxchg(p32, old32.w, new32.w); > > + } while (ret != old32.w); > > + return old; > > +} > > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8); > > + > > +union u16_32 { > > + u16 h[2]; > > + u32 w; > > +}; > > + > > +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */ > > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new) > > +{ > > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1); > > + int i = ((uintptr_t)p) & 0x1; > > + union u16_32 old32; > > + union u16_32 new32; > > + u32 ret; > > + > > + old32.w = READ_ONCE(*p32); > > + do { > > + if (old32.h[i] != old) > > + return old32.h[i]; > > + new32.w = old32.w; > > + new32.h[i] = new; > > + instrument_atomic_read_write(p, 2); > > + ret = cmpxchg(p32, old32.w, new32.w); > > + } while (ret != old32.w); > > + return old; > > +} > > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);
On Mon, Mar 18, 2024 at 04:43:38PM +0100, Marco Elver wrote: > On Mon, 18 Mar 2024 at 11:01, Marco Elver <elver@google.com> wrote: > > > > On Sun, 17 Mar 2024 at 22:55, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Fri, Mar 08, 2024 at 02:31:53PM -0800, Paul E. McKenney wrote: > > > > On Fri, Mar 08, 2024 at 11:02:28PM +0100, Marco Elver wrote: > > > > > On Fri, 8 Mar 2024 at 22:41, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > > > > > Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists. > > > > > > > > > > Because not all architectures support 1-byte cmpxchg? > > > > > What prevents us from implementing it? > > > > > > > > Nothing that I know of, but I didn't want to put up with the KCSAN report > > > > in the interim. > > > > > > And here is a lightly tested patch to emulate one-byte and two-byte > > > cmpxchg() for architectures that do not support it. This is just the > > > emulation, and would be followed up with patches to make the relevant > > > architectures make use of it. > > > > > > The one-byte emulation has been lightly tested on x86. > > > > > > Thoughts? > > > > > > Thanx, Paul > > > > > > ------------------------------------------------------------------------ > > > > > > commit d72e54166b56d8b373676e1e92a426a07d53899a > > > Author: Paul E. McKenney <paulmck@kernel.org> > > > Date: Sun Mar 17 14:44:38 2024 -0700 > > > > > > lib: Add one-byte and two-byte cmpxchg() emulation functions > > > > > > Architectures are required to provide four-byte cmpxchg() and 64-bit > > > architectures are additionally required to provide eight-byte cmpxchg(). > > > However, there are cases where one-byte and two-byte cmpxchg() > > > would be extremely useful. Therefore, provide cmpxchg_emu_u8() and > > > cmpxchg_emu_u16() that emulated one-byte and two-byte cmpxchg() in terms > > > of four-byte cmpxchg(). > > > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > > Cc: Marco Elver <elver@google.com> > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > Cc: Thomas Gleixner <tglx@linutronix.de> > > > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> > > > Cc: Douglas Anderson <dianders@chromium.org> > > > Cc: Petr Mladek <pmladek@suse.com> > > > Cc: <linux-arch@vger.kernel.org> > > > > > > diff --git a/arch/Kconfig b/arch/Kconfig > > > index 154f994547632..eef11e9918ec7 100644 > > > --- a/arch/Kconfig > > > +++ b/arch/Kconfig > > > @@ -1506,4 +1506,7 @@ config FUNCTION_ALIGNMENT > > > default 4 if FUNCTION_ALIGNMENT_4B > > > default 0 > > > > > > +config ARCH_NEED_CMPXCHG_1_2_EMU > > > + bool > > > + > > > endmenu > > > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h > > > new file mode 100644 > > > index 0000000000000..fee8171fa05eb > > > --- /dev/null > > > +++ b/include/linux/cmpxchg-emu.h > > > @@ -0,0 +1,16 @@ > > > +/* SPDX-License-Identifier: GPL-2.0+ */ > > > +/* > > > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures > > > + * lacking direct support for these sizes. These are implemented in terms > > > + * of 4-byte cmpxchg operations. > > > + * > > > + * Copyright (C) 2024 Paul E. McKenney. > > > + */ > > > + > > > +#ifndef __LINUX_CMPXCHG_EMU_H > > > +#define __LINUX_CMPXCHG_EMU_H > > > + > > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new); > > > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new); > > > + > > > +#endif /* __LINUX_CMPXCHG_EMU_H */ > > > diff --git a/lib/Makefile b/lib/Makefile > > > index 6b09731d8e619..fecd7b8c09cbd 100644 > > > --- a/lib/Makefile > > > +++ b/lib/Makefile > > > @@ -238,6 +238,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o > > > lib-$(CONFIG_GENERIC_BUG) += bug.o > > > > > > obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o > > > +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o > > > > Since you add instrumentation explicitly, we need to suppress > > instrumentation somehow. For the whole file this can be done with: > > > > KCSAN_SANITIZE_cmpxchg-emu.o := n > > Hrm, I recall this doesn't actually work as-is because it also > disables instrument_read_write() instrumentation. > > So I think the most reliable would be to use data_race() after all. > It'll be a bit slower because of double-instrumenting, but I think > that's not a major concern with an instrumented build anyway. And I have added data_race(), thank you! > > Note, since you use cmpxchg, which pulls in its own > > instrument_read_write(), we can't use a function attribute (like > > __no_kcsan) if the whole-file no-instrumentation seems like overkill. > > Alternatively the cmpxchg could be wrapped into a data_race() (like > > your original RCU use case was doing). > > > > But I think "KCSAN_SANITIZE_cmpxchg-emu.o := n" would be my preferred way. > > > > With the explicit "instrument_read_write()" also note that this would > > do double-instrumentation with other sanitizers (KASAN, KMSAN). But I > > think we actually want to instrument the whole real access with those > > tools - would it be bad if we accessed some memory out-of-bounds, but > > that memory isn't actually used? I don't have a clear answer to that. > > > > Also, it might be useful to have an alignment check somewhere, because > > otherwise we end up with split atomic accesses (or whatever other bad > > thing the given arch does if that happens). Excellent point, added. I also fixed an embarrassing pointer-arithmetic bug which the act of coding the alignment check uncovered, so two for one! ;-) Please see below for a patch to the patched code. Thanx, Paul > > Thanks, > > -- Marco > > > > > obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o > > > #ensure exported functions have prototypes > > > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c > > > new file mode 100644 > > > index 0000000000000..508b55484c2b6 > > > --- /dev/null > > > +++ b/lib/cmpxchg-emu.c > > > @@ -0,0 +1,68 @@ > > > +/* SPDX-License-Identifier: GPL-2.0+ */ > > > +/* > > > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures > > > + * lacking direct support for these sizes. These are implemented in terms > > > + * of 4-byte cmpxchg operations. > > > + * > > > + * Copyright (C) 2024 Paul E. McKenney. > > > + */ > > > + > > > +#include <linux/types.h> > > > +#include <linux/export.h> > > > +#include <linux/instrumented.h> > > > +#include <linux/atomic.h> > > > +#include <asm-generic/rwonce.h> > > > + > > > +union u8_32 { > > > + u8 b[4]; > > > + u32 w; > > > +}; > > > + > > > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */ > > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new) > > > +{ > > > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3); > > > + int i = ((uintptr_t)p) & 0x3; > > > + union u8_32 old32; > > > + union u8_32 new32; > > > + u32 ret; > > > + > > > + old32.w = READ_ONCE(*p32); > > > + do { > > > + if (old32.b[i] != old) > > > + return old32.b[i]; > > > + new32.w = old32.w; > > > + new32.b[i] = new; > > > + instrument_atomic_read_write(p, 1); > > > + ret = cmpxchg(p32, old32.w, new32.w); > > > + } while (ret != old32.w); > > > + return old; > > > +} > > > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8); > > > + > > > +union u16_32 { > > > + u16 h[2]; > > > + u32 w; > > > +}; > > > + > > > +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */ > > > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new) > > > +{ > > > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1); > > > + int i = ((uintptr_t)p) & 0x1; > > > + union u16_32 old32; > > > + union u16_32 new32; > > > + u32 ret; > > > + > > > + old32.w = READ_ONCE(*p32); > > > + do { > > > + if (old32.h[i] != old) > > > + return old32.h[i]; > > > + new32.w = old32.w; > > > + new32.h[i] = new; > > > + instrument_atomic_read_write(p, 2); > > > + ret = cmpxchg(p32, old32.w, new32.w); > > > + } while (ret != old32.w); > > > + return old; > > > +} > > > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16); diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c index 508b55484c2b6..b904f954dd4fc 100644 --- a/lib/cmpxchg-emu.c +++ b/lib/cmpxchg-emu.c @@ -11,6 +11,8 @@ #include <linux/export.h> #include <linux/instrumented.h> #include <linux/atomic.h> +#include <linux/panic.h> +#include <linux/bug.h> #include <asm-generic/rwonce.h> union u8_32 { @@ -34,7 +36,7 @@ uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new) new32.w = old32.w; new32.b[i] = new; instrument_atomic_read_write(p, 1); - ret = cmpxchg(p32, old32.w, new32.w); + ret = data_race(cmpxchg(p32, old32.w, new32.w)); } while (ret != old32.w); return old; } @@ -48,12 +50,13 @@ union u16_32 { /* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */ uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new) { - u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1); - int i = ((uintptr_t)p) & 0x1; + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3); + int i = (((uintptr_t)p) & 0x2) / 2; union u16_32 old32; union u16_32 new32; u32 ret; + WARN_ON_ONCE(((uintptr_t)p) & 0x1); old32.w = READ_ONCE(*p32); do { if (old32.h[i] != old) @@ -61,7 +64,7 @@ uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new) new32.w = old32.w; new32.h[i] = new; instrument_atomic_read_write(p, 2); - ret = cmpxchg(p32, old32.w, new32.w); + ret = data_race(cmpxchg(p32, old32.w, new32.w)); } while (ret != old32.w); return old; }
On Tue, 19 Mar 2024 at 02:59, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Mon, Mar 18, 2024 at 04:43:38PM +0100, Marco Elver wrote: > > On Mon, 18 Mar 2024 at 11:01, Marco Elver <elver@google.com> wrote: > > > > > > On Sun, 17 Mar 2024 at 22:55, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > On Fri, Mar 08, 2024 at 02:31:53PM -0800, Paul E. McKenney wrote: > > > > > On Fri, Mar 08, 2024 at 11:02:28PM +0100, Marco Elver wrote: > > > > > > On Fri, 8 Mar 2024 at 22:41, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > > > > > > > Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists. > > > > > > > > > > > > Because not all architectures support 1-byte cmpxchg? > > > > > > What prevents us from implementing it? > > > > > > > > > > Nothing that I know of, but I didn't want to put up with the KCSAN report > > > > > in the interim. > > > > > > > > And here is a lightly tested patch to emulate one-byte and two-byte > > > > cmpxchg() for architectures that do not support it. This is just the > > > > emulation, and would be followed up with patches to make the relevant > > > > architectures make use of it. > > > > > > > > The one-byte emulation has been lightly tested on x86. > > > > > > > > Thoughts? > > > > > > > > Thanx, Paul > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > commit d72e54166b56d8b373676e1e92a426a07d53899a > > > > Author: Paul E. McKenney <paulmck@kernel.org> > > > > Date: Sun Mar 17 14:44:38 2024 -0700 > > > > > > > > lib: Add one-byte and two-byte cmpxchg() emulation functions > > > > > > > > Architectures are required to provide four-byte cmpxchg() and 64-bit > > > > architectures are additionally required to provide eight-byte cmpxchg(). > > > > However, there are cases where one-byte and two-byte cmpxchg() > > > > would be extremely useful. Therefore, provide cmpxchg_emu_u8() and > > > > cmpxchg_emu_u16() that emulated one-byte and two-byte cmpxchg() in terms > > > > of four-byte cmpxchg(). > > > > > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > > > Cc: Marco Elver <elver@google.com> > > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > > Cc: Thomas Gleixner <tglx@linutronix.de> > > > > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> > > > > Cc: Douglas Anderson <dianders@chromium.org> > > > > Cc: Petr Mladek <pmladek@suse.com> > > > > Cc: <linux-arch@vger.kernel.org> > > > > > > > > diff --git a/arch/Kconfig b/arch/Kconfig > > > > index 154f994547632..eef11e9918ec7 100644 > > > > --- a/arch/Kconfig > > > > +++ b/arch/Kconfig > > > > @@ -1506,4 +1506,7 @@ config FUNCTION_ALIGNMENT > > > > default 4 if FUNCTION_ALIGNMENT_4B > > > > default 0 > > > > > > > > +config ARCH_NEED_CMPXCHG_1_2_EMU > > > > + bool > > > > + > > > > endmenu > > > > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h > > > > new file mode 100644 > > > > index 0000000000000..fee8171fa05eb > > > > --- /dev/null > > > > +++ b/include/linux/cmpxchg-emu.h > > > > @@ -0,0 +1,16 @@ > > > > +/* SPDX-License-Identifier: GPL-2.0+ */ > > > > +/* > > > > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures > > > > + * lacking direct support for these sizes. These are implemented in terms > > > > + * of 4-byte cmpxchg operations. > > > > + * > > > > + * Copyright (C) 2024 Paul E. McKenney. > > > > + */ > > > > + > > > > +#ifndef __LINUX_CMPXCHG_EMU_H > > > > +#define __LINUX_CMPXCHG_EMU_H > > > > + > > > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new); > > > > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new); > > > > + > > > > +#endif /* __LINUX_CMPXCHG_EMU_H */ > > > > diff --git a/lib/Makefile b/lib/Makefile > > > > index 6b09731d8e619..fecd7b8c09cbd 100644 > > > > --- a/lib/Makefile > > > > +++ b/lib/Makefile > > > > @@ -238,6 +238,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o > > > > lib-$(CONFIG_GENERIC_BUG) += bug.o > > > > > > > > obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o > > > > +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o > > > > > > Since you add instrumentation explicitly, we need to suppress > > > instrumentation somehow. For the whole file this can be done with: > > > > > > KCSAN_SANITIZE_cmpxchg-emu.o := n > > > > Hrm, I recall this doesn't actually work as-is because it also > > disables instrument_read_write() instrumentation. > > > > So I think the most reliable would be to use data_race() after all. > > It'll be a bit slower because of double-instrumenting, but I think > > that's not a major concern with an instrumented build anyway. > > And I have added data_race(), thank you! > > > > Note, since you use cmpxchg, which pulls in its own > > > instrument_read_write(), we can't use a function attribute (like > > > __no_kcsan) if the whole-file no-instrumentation seems like overkill. > > > Alternatively the cmpxchg could be wrapped into a data_race() (like > > > your original RCU use case was doing). > > > > > > But I think "KCSAN_SANITIZE_cmpxchg-emu.o := n" would be my preferred way. > > > > > > With the explicit "instrument_read_write()" also note that this would > > > do double-instrumentation with other sanitizers (KASAN, KMSAN). But I > > > think we actually want to instrument the whole real access with those > > > tools - would it be bad if we accessed some memory out-of-bounds, but > > > that memory isn't actually used? I don't have a clear answer to that. > > > > > > Also, it might be useful to have an alignment check somewhere, because > > > otherwise we end up with split atomic accesses (or whatever other bad > > > thing the given arch does if that happens). > > Excellent point, added. > > I also fixed an embarrassing pointer-arithmetic bug which the act of > coding the alignment check uncovered, so two for one! ;-) > > Please see below for a patch to the patched code. This looks very reasonable to me. Thanks, -- Marco > Thanx, Paul > > > > Thanks, > > > -- Marco > > > > > > > obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o > > > > #ensure exported functions have prototypes > > > > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c > > > > new file mode 100644 > > > > index 0000000000000..508b55484c2b6 > > > > --- /dev/null > > > > +++ b/lib/cmpxchg-emu.c > > > > @@ -0,0 +1,68 @@ > > > > +/* SPDX-License-Identifier: GPL-2.0+ */ > > > > +/* > > > > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures > > > > + * lacking direct support for these sizes. These are implemented in terms > > > > + * of 4-byte cmpxchg operations. > > > > + * > > > > + * Copyright (C) 2024 Paul E. McKenney. > > > > + */ > > > > + > > > > +#include <linux/types.h> > > > > +#include <linux/export.h> > > > > +#include <linux/instrumented.h> > > > > +#include <linux/atomic.h> > > > > +#include <asm-generic/rwonce.h> > > > > + > > > > +union u8_32 { > > > > + u8 b[4]; > > > > + u32 w; > > > > +}; > > > > + > > > > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */ > > > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new) > > > > +{ > > > > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3); > > > > + int i = ((uintptr_t)p) & 0x3; > > > > + union u8_32 old32; > > > > + union u8_32 new32; > > > > + u32 ret; > > > > + > > > > + old32.w = READ_ONCE(*p32); > > > > + do { > > > > + if (old32.b[i] != old) > > > > + return old32.b[i]; > > > > + new32.w = old32.w; > > > > + new32.b[i] = new; > > > > + instrument_atomic_read_write(p, 1); > > > > + ret = cmpxchg(p32, old32.w, new32.w); > > > > + } while (ret != old32.w); > > > > + return old; > > > > +} > > > > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8); > > > > + > > > > +union u16_32 { > > > > + u16 h[2]; > > > > + u32 w; > > > > +}; > > > > + > > > > +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */ > > > > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new) > > > > +{ > > > > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1); > > > > + int i = ((uintptr_t)p) & 0x1; > > > > + union u16_32 old32; > > > > + union u16_32 new32; > > > > + u32 ret; > > > > + > > > > + old32.w = READ_ONCE(*p32); > > > > + do { > > > > + if (old32.h[i] != old) > > > > + return old32.h[i]; > > > > + new32.w = old32.w; > > > > + new32.h[i] = new; > > > > + instrument_atomic_read_write(p, 2); > > > > + ret = cmpxchg(p32, old32.w, new32.w); > > > > + } while (ret != old32.w); > > > > + return old; > > > > +} > > > > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16); > > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c > index 508b55484c2b6..b904f954dd4fc 100644 > --- a/lib/cmpxchg-emu.c > +++ b/lib/cmpxchg-emu.c > @@ -11,6 +11,8 @@ > #include <linux/export.h> > #include <linux/instrumented.h> > #include <linux/atomic.h> > +#include <linux/panic.h> > +#include <linux/bug.h> > #include <asm-generic/rwonce.h> > > union u8_32 { > @@ -34,7 +36,7 @@ uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new) > new32.w = old32.w; > new32.b[i] = new; > instrument_atomic_read_write(p, 1); > - ret = cmpxchg(p32, old32.w, new32.w); > + ret = data_race(cmpxchg(p32, old32.w, new32.w)); > } while (ret != old32.w); > return old; > } > @@ -48,12 +50,13 @@ union u16_32 { > /* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */ > uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new) > { > - u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1); > - int i = ((uintptr_t)p) & 0x1; > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3); > + int i = (((uintptr_t)p) & 0x2) / 2; > union u16_32 old32; > union u16_32 new32; > u32 ret; > > + WARN_ON_ONCE(((uintptr_t)p) & 0x1); > old32.w = READ_ONCE(*p32); > do { > if (old32.h[i] != old) > @@ -61,7 +64,7 @@ uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new) > new32.w = old32.w; > new32.h[i] = new; > instrument_atomic_read_write(p, 2); > - ret = cmpxchg(p32, old32.w, new32.w); > + ret = data_race(cmpxchg(p32, old32.w, new32.w)); > } while (ret != old32.w); > return old; > } > > -- > You received this message because you are subscribed to the Google Groups "kasan-dev" group. > To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/5e6fdf1d-e84c-463c-b47b-f42500930b28%40paulmck-laptop.
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index d5319bbe8c982..e83adcdb49b5f 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -1460,6 +1460,7 @@ static void rcu_st_need_qs(struct task_struct *t, u8 v) /* * Do a cmpxchg() on ->trc_reader_special.b.need_qs, allowing for * the four-byte operand-size restriction of some platforms. + * * Returns the old value, which is often ignored. */ u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new) @@ -1471,7 +1472,13 @@ u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new) if (trs_old.b.need_qs != old) return trs_old.b.need_qs; trs_new.b.need_qs = new; - ret.s = cmpxchg(&t->trc_reader_special.s, trs_old.s, trs_new.s); + + // Although cmpxchg() appears to KCSAN to update all four bytes, + // only the .b.need_qs byte actually changes. + instrument_atomic_read_write(&t->trc_reader_special.b.need_qs, + sizeof(t->trc_reader_special.b.need_qs)); + ret.s = data_race(cmpxchg(&t->trc_reader_special.s, trs_old.s, trs_new.s)); + return ret.b.need_qs; } EXPORT_SYMBOL_GPL(rcu_trc_cmpxchg_need_qs);
Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists. Therefore, rcu_trc_cmpxchg_need_qs() emulates one using field substitution and a four-byte cmpxchg(), such that the other three bytes are always atomically updated to their old values. This works, but results in false-positive KCSAN failures because as far as KCSAN knows, this cmpxchg() operation is updating all four bytes. This commit therefore encloses the cmpxchg() in a data_race() and adds a single-byte instrument_atomic_read_write(), thus telling KCSAN exactly what is going on so as to avoid the false positives. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Marco Elver <elver@google.com> --- Is this really the right way to do this?