Message ID | 20240928135128.991110-2-mathieu.desnoyers@efficios.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Introduce ptr_eq() to preserve address dependency | expand |
On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote: > Compiler CSE and SSA GVN optimizations can cause the address dependency > of addresses returned by rcu_dereference to be lost when comparing those > pointers with either constants or previously loaded pointers. > > Introduce ptr_eq() to compare two addresses while preserving the address > dependencies for later use of the address. It should be used when > comparing an address returned by rcu_dereference(). > > This is needed to prevent the compiler CSE and SSA GVN optimizations > from replacing the registers holding @a or @b based on their "Replacing" isn't the right word. What the compiler does is use one rather than the other. Furthermore, the compiler can play these games even with values that aren't in registers. You should just say: "... from using @a (or @b) in places where the source refers to @b (or @a) (based on the fact that after the comparison, the two are known to be equal), which does not ..." > equality, which does not preserve address dependencies and allows the > following misordering speculations: > > - If @b is a constant, the compiler can issue the loads which depend > on @a before loading @a. > - If @b is a register populated by a prior load, weakly-ordered > CPUs can speculate loads which depend on @a before loading @a. It shouldn't matter whether @a and @b are constants, registers, or anything else. All that matters is that the compiler uses the wrong one, which allows weakly ordered CPUs to speculate loads you wouldn't expect it to, based on the source code alone. > The same logic applies with @a and @b swapped. > > The compiler barrier() is ineffective at fixing this issue. > It does not prevent the compiler CSE from losing the address dependency: > > int fct_2_volatile_barriers(void) > { > int *a, *b; > > do { > a = READ_ONCE(p); > asm volatile ("" : : : "memory"); > b = READ_ONCE(p); > } while (a != b); > asm volatile ("" : : : "memory"); <----- barrier() > return *b; > } > > With gcc 14.2 (arm64): > > fct_2_volatile_barriers: > adrp x0, .LANCHOR0 > add x0, x0, :lo12:.LANCHOR0 > .L2: > ldr x1, [x0] <------ x1 populated by first load. > ldr x2, [x0] > cmp x1, x2 > bne .L2 > ldr w0, [x1] <------ x1 is used for access which should depend on b. > ret > > On weakly-ordered architectures, this lets CPU speculation use the > result from the first load to speculate "ldr w0, [x1]" before > "ldr x2, [x0]". > Based on the RCU documentation, the control dependency does not prevent > the CPU from speculating loads. > > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> > Suggested-by: Boqun Feng <boqun.feng@gmail.com> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > Reviewed-by: Boqun Feng <boqun.feng@gmail.com> > Acked-by: "Paul E. McKenney" <paulmck@kernel.org> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Cc: "Paul E. McKenney" <paulmck@kernel.org> > Cc: Will Deacon <will@kernel.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Boqun Feng <boqun.feng@gmail.com> > Cc: Alan Stern <stern@rowland.harvard.edu> > Cc: John Stultz <jstultz@google.com> > Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Boqun Feng <boqun.feng@gmail.com> > Cc: Frederic Weisbecker <frederic@kernel.org> > Cc: Joel Fernandes <joel@joelfernandes.org> > Cc: Josh Triplett <josh@joshtriplett.org> > Cc: Uladzislau Rezki <urezki@gmail.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Lai Jiangshan <jiangshanlai@gmail.com> > Cc: Zqiang <qiang.zhang1211@gmail.com> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Waiman Long <longman@redhat.com> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: maged.michael@gmail.com > Cc: Mateusz Guzik <mjguzik@gmail.com> > Cc: Gary Guo <gary@garyguo.net> > Cc: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> > Cc: rcu@vger.kernel.org > Cc: linux-mm@kvack.org > Cc: lkmm@lists.linux.dev > --- > include/linux/compiler.h | 62 ++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 62 insertions(+) > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h > index 2df665fa2964..f26705c267e8 100644 > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -186,6 +186,68 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, > __asm__ ("" : "=r" (var) : "0" (var)) > #endif > > +/* > + * Compare two addresses while preserving the address dependencies for > + * later use of the address. It should be used when comparing an address > + * returned by rcu_dereference(). > + * > + * This is needed to prevent the compiler CSE and SSA GVN optimizations > + * from replacing the registers holding @a or @b based on their > + * equality, which does not preserve address dependencies and allows the > + * following misordering speculations: > + * > + * - If @b is a constant, the compiler can issue the loads which depend > + * on @a before loading @a. > + * - If @b is a register populated by a prior load, weakly-ordered > + * CPUs can speculate loads which depend on @a before loading @a. > + * > + * The same logic applies with @a and @b swapped. This could be more concise, and it should be more general (along the same lines as the description above). > + * > + * Return value: true if pointers are equal, false otherwise. > + * > + * The compiler barrier() is ineffective at fixing this issue. It does > + * not prevent the compiler CSE from losing the address dependency: > + * > + * int fct_2_volatile_barriers(void) > + * { > + * int *a, *b; > + * > + * do { > + * a = READ_ONCE(p); > + * asm volatile ("" : : : "memory"); > + * b = READ_ONCE(p); > + * } while (a != b); > + * asm volatile ("" : : : "memory"); <-- barrier() > + * return *b; > + * } > + * > + * With gcc 14.2 (arm64): > + * > + * fct_2_volatile_barriers: > + * adrp x0, .LANCHOR0 > + * add x0, x0, :lo12:.LANCHOR0 > + * .L2: > + * ldr x1, [x0] <-- x1 populated by first load. > + * ldr x2, [x0] > + * cmp x1, x2 > + * bne .L2 > + * ldr w0, [x1] <-- x1 is used for access which should depend on b. > + * ret > + * > + * On weakly-ordered architectures, this lets CPU speculation use the > + * result from the first load to speculate "ldr w0, [x1]" before > + * "ldr x2, [x0]". > + * Based on the RCU documentation, the control dependency does not > + * prevent the CPU from speculating loads. IMO, this lengthy explanation is not needed in the source code. Just refer interested readers to the commit description. You're repeating the same text verbatim, after all. (Or if you firmly believe that this explanation _does_ belong in the code, then omit it from the commit description. There's no need to say everything twice.) Alan Stern > + */ > +static __always_inline > +int ptr_eq(const volatile void *a, const volatile void *b) > +{ > + OPTIMIZER_HIDE_VAR(a); > + OPTIMIZER_HIDE_VAR(b); > + return a == b; > +} > + > #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__) > > /** > -- > 2.39.2 >
On 2024-09-28 16:49, Alan Stern wrote: > On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote: [...] >> +/* >> + * Compare two addresses while preserving the address dependencies for >> + * later use of the address. It should be used when comparing an address >> + * returned by rcu_dereference(). >> + * >> + * This is needed to prevent the compiler CSE and SSA GVN optimizations >> + * from replacing the registers holding @a or @b based on their >> + * equality, which does not preserve address dependencies and allows the >> + * following misordering speculations: >> + * >> + * - If @b is a constant, the compiler can issue the loads which depend >> + * on @a before loading @a. >> + * - If @b is a register populated by a prior load, weakly-ordered >> + * CPUs can speculate loads which depend on @a before loading @a. >> + * >> + * The same logic applies with @a and @b swapped. [...] >> + * >> + * Return value: true if pointers are equal, false otherwise. >> + * >> + * The compiler barrier() is ineffective at fixing this issue. It does >> + * not prevent the compiler CSE from losing the address dependency: >> + * >> + * int fct_2_volatile_barriers(void) >> + * { >> + * int *a, *b; >> + * >> + * do { >> + * a = READ_ONCE(p); >> + * asm volatile ("" : : : "memory"); >> + * b = READ_ONCE(p); >> + * } while (a != b); >> + * asm volatile ("" : : : "memory"); <-- barrier() >> + * return *b; >> + * } >> + * >> + * With gcc 14.2 (arm64): >> + * >> + * fct_2_volatile_barriers: >> + * adrp x0, .LANCHOR0 >> + * add x0, x0, :lo12:.LANCHOR0 >> + * .L2: >> + * ldr x1, [x0] <-- x1 populated by first load. >> + * ldr x2, [x0] >> + * cmp x1, x2 >> + * bne .L2 >> + * ldr w0, [x1] <-- x1 is used for access which should depend on b. >> + * ret >> + * >> + * On weakly-ordered architectures, this lets CPU speculation use the >> + * result from the first load to speculate "ldr w0, [x1]" before >> + * "ldr x2, [x0]". >> + * Based on the RCU documentation, the control dependency does not >> + * prevent the CPU from speculating loads. > > IMO, this lengthy explanation is not needed in the source code. Just > refer interested readers to the commit description. You're repeating > the same text verbatim, after all. > > (Or if you firmly believe that this explanation _does_ belong in the > code, then omit it from the commit description. There's no need to say > everything twice.) > Linus asked for this code/asm example to be in the comment: https://lore.kernel.org/lkml/CAHk-=wgBgh5U+dyNaN=+XCdcm2OmgSRbcH4Vbtk8i5ZDGwStSA@mail.gmail.com/ I agree that it may be a bit verbose for the comment. Linus, do you want the explanation wrt compiler barriers not being enough (with C/asm example) in the comment above ptr_eq() or in the commit message ? Thanks, Mathieu
On 2024-09-28 16:49, Alan Stern wrote: > On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote: >> Compiler CSE and SSA GVN optimizations can cause the address dependency >> of addresses returned by rcu_dereference to be lost when comparing those >> pointers with either constants or previously loaded pointers. >> >> Introduce ptr_eq() to compare two addresses while preserving the address >> dependencies for later use of the address. It should be used when >> comparing an address returned by rcu_dereference(). >> >> This is needed to prevent the compiler CSE and SSA GVN optimizations >> from replacing the registers holding @a or @b based on their > > "Replacing" isn't the right word. What the compiler does is use one > rather than the other. Furthermore, the compiler can play these games > even with values that aren't in registers. > > You should just say: "... from using @a (or @b) in places where the > source refers to @b (or @a) (based on the fact that after the > comparison, the two are known to be equal), which does not ..." OK. > >> equality, which does not preserve address dependencies and allows the >> following misordering speculations: >> >> - If @b is a constant, the compiler can issue the loads which depend >> on @a before loading @a. >> - If @b is a register populated by a prior load, weakly-ordered >> CPUs can speculate loads which depend on @a before loading @a. > > It shouldn't matter whether @a and @b are constants, registers, or > anything else. All that matters is that the compiler uses the wrong > one, which allows weakly ordered CPUs to speculate loads you wouldn't > expect it to, based on the source code alone. I only partially agree here. On weakly-ordered architectures, indeed we don't care whether the issue is caused by the compiler reordering the code (constant) or the CPU speculating the load (registers). However, on strongly-ordered architectures, AFAIU, only the constant case is problematic (compiler reordering the dependent load), because CPU speculating the loads across the control dependency is not an issue. So am I tempted to keep examples that clearly state whether the issue is caused by compiler reordering instructions, or by CPU speculation. > >> The same logic applies with @a and @b swapped. >> >> The compiler barrier() is ineffective at fixing this issue. >> It does not prevent the compiler CSE from losing the address dependency: >> >> int fct_2_volatile_barriers(void) >> { >> int *a, *b; >> >> do { >> a = READ_ONCE(p); >> asm volatile ("" : : : "memory"); >> b = READ_ONCE(p); >> } while (a != b); >> asm volatile ("" : : : "memory"); <----- barrier() >> return *b; >> } >> >> With gcc 14.2 (arm64): >> >> fct_2_volatile_barriers: >> adrp x0, .LANCHOR0 >> add x0, x0, :lo12:.LANCHOR0 >> .L2: >> ldr x1, [x0] <------ x1 populated by first load. >> ldr x2, [x0] >> cmp x1, x2 >> bne .L2 >> ldr w0, [x1] <------ x1 is used for access which should depend on b. >> ret >> >> On weakly-ordered architectures, this lets CPU speculation use the >> result from the first load to speculate "ldr w0, [x1]" before >> "ldr x2, [x0]". >> Based on the RCU documentation, the control dependency does not prevent >> the CPU from speculating loads. >> [...] >> diff --git a/include/linux/compiler.h b/include/linux/compiler.h >> index 2df665fa2964..f26705c267e8 100644 >> --- a/include/linux/compiler.h >> +++ b/include/linux/compiler.h >> @@ -186,6 +186,68 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, >> __asm__ ("" : "=r" (var) : "0" (var)) >> #endif >> >> +/* >> + * Compare two addresses while preserving the address dependencies for >> + * later use of the address. It should be used when comparing an address >> + * returned by rcu_dereference(). >> + * >> + * This is needed to prevent the compiler CSE and SSA GVN optimizations >> + * from replacing the registers holding @a or @b based on their >> + * equality, which does not preserve address dependencies and allows the Replacing with: * This is needed to prevent the compiler CSE and SSA GVN optimizations * from using @a (or @b) in places where the source refers to @b (or @a) * based on the fact that after the comparison, the two are known to be * equal, which does not preserve address dependencies and allows the * following misordering speculations: >> + * following misordering speculations: >> + * >> + * - If @b is a constant, the compiler can issue the loads which depend >> + * on @a before loading @a. >> + * - If @b is a register populated by a prior load, weakly-ordered >> + * CPUs can speculate loads which depend on @a before loading @a. >> + * >> + * The same logic applies with @a and @b swapped. > > This could be more concise, and it should be more general (along the > same lines as the description above). As per my earlier comment, I would prefer to keep the examples specific rather than general so it is clear which scenarios are problematic on weakly vs strongly ordered architectures. [...] Thanks, Mathieu
On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote: > On 2024-09-28 16:49, Alan Stern wrote: > > On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote: > > > equality, which does not preserve address dependencies and allows the > > > following misordering speculations: > > > > > > - If @b is a constant, the compiler can issue the loads which depend > > > on @a before loading @a. > > > - If @b is a register populated by a prior load, weakly-ordered > > > CPUs can speculate loads which depend on @a before loading @a. > > > > It shouldn't matter whether @a and @b are constants, registers, or > > anything else. All that matters is that the compiler uses the wrong > > one, which allows weakly ordered CPUs to speculate loads you wouldn't > > expect it to, based on the source code alone. > > I only partially agree here. > > On weakly-ordered architectures, indeed we don't care whether the > issue is caused by the compiler reordering the code (constant) > or the CPU speculating the load (registers). > > However, on strongly-ordered architectures, AFAIU, only the constant > case is problematic (compiler reordering the dependent load), because I thought you were trying to prevent the compiler from using one pointer instead of the other, not trying to prevent it from reordering anything. Isn't this the point the documentation wants to get across when it says that comparing pointers can be dangerous? > CPU speculating the loads across the control dependency is not an > issue. > > So am I tempted to keep examples that clearly state whether > the issue is caused by compiler reordering instructions, or by > CPU speculation. Isn't it true that on strongly ordered CPUs, a compiler barrier is sufficient to prevent the rcu_dereference() problem? So the whole idea behind ptr_eq() is that it prevents the problem on all CPUs. You can make your examples as specific as you like, but the fact remains that ptr_eq() is meant to prevent situations where both: The compiler uses the wrong pointer for a load, and The CPU performs the load earlier than you want. If either one of those doesn't hold then the problem won't arise. Alan Stern
On 2024-09-28 17:49, Alan Stern wrote: > On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote: >> On 2024-09-28 16:49, Alan Stern wrote: >>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote: >>>> equality, which does not preserve address dependencies and allows the >>>> following misordering speculations: >>>> >>>> - If @b is a constant, the compiler can issue the loads which depend >>>> on @a before loading @a. >>>> - If @b is a register populated by a prior load, weakly-ordered >>>> CPUs can speculate loads which depend on @a before loading @a. >>> >>> It shouldn't matter whether @a and @b are constants, registers, or >>> anything else. All that matters is that the compiler uses the wrong >>> one, which allows weakly ordered CPUs to speculate loads you wouldn't >>> expect it to, based on the source code alone. >> >> I only partially agree here. >> >> On weakly-ordered architectures, indeed we don't care whether the >> issue is caused by the compiler reordering the code (constant) >> or the CPU speculating the load (registers). >> >> However, on strongly-ordered architectures, AFAIU, only the constant >> case is problematic (compiler reordering the dependent load), because > > I thought you were trying to prevent the compiler from using one pointer > instead of the other, not trying to prevent it from reordering anything. > Isn't this the point the documentation wants to get across when it says > that comparing pointers can be dangerous? The motivation for introducing ptr_eq() is indeed because the compiler barrier is not sufficient to prevent the compiler from using one pointer instead of the other. But it turns out that ptr_eq() is also a good tool to prevent the compiler from reordering loads in case where the comparison is done against a constant. > >> CPU speculating the loads across the control dependency is not an >> issue. >> >> So am I tempted to keep examples that clearly state whether >> the issue is caused by compiler reordering instructions, or by >> CPU speculation. > > Isn't it true that on strongly ordered CPUs, a compiler barrier is > sufficient to prevent the rcu_dereference() problem? So the whole idea > behind ptr_eq() is that it prevents the problem on all CPUs. Correct. But given that we have ptr_eq(), it's good to show how it equally prevents the compiler from reordering address-dependent loads (comparison with constant) *and* prevents the compiler from using one pointer rather than the other (comparison between two non-constant pointers) which affects speculation on weakly-ordered CPUs. > You can make your examples as specific as you like, but the fact remains > that ptr_eq() is meant to prevent situations where both: > > The compiler uses the wrong pointer for a load, and > > The CPU performs the load earlier than you want. > > If either one of those doesn't hold then the problem won't arise. Correct. Thanks, Mathieu
On Sat, Sep 28, 2024 at 11:55:22AM -0400, Mathieu Desnoyers wrote: > On 2024-09-28 17:49, Alan Stern wrote: > > On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote: > > > On 2024-09-28 16:49, Alan Stern wrote: > > > > On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote: > > > > > equality, which does not preserve address dependencies and allows the > > > > > following misordering speculations: > > > > > > > > > > - If @b is a constant, the compiler can issue the loads which depend > > > > > on @a before loading @a. > > > > > - If @b is a register populated by a prior load, weakly-ordered > > > > > CPUs can speculate loads which depend on @a before loading @a. > > > > > > > > It shouldn't matter whether @a and @b are constants, registers, or > > > > anything else. All that matters is that the compiler uses the wrong > > > > one, which allows weakly ordered CPUs to speculate loads you wouldn't > > > > expect it to, based on the source code alone. > > > > > > I only partially agree here. > > > > > > On weakly-ordered architectures, indeed we don't care whether the > > > issue is caused by the compiler reordering the code (constant) > > > or the CPU speculating the load (registers). > > > > > > However, on strongly-ordered architectures, AFAIU, only the constant > > > case is problematic (compiler reordering the dependent load), because > > > > I thought you were trying to prevent the compiler from using one pointer > > instead of the other, not trying to prevent it from reordering anything. > > Isn't this the point the documentation wants to get across when it says > > that comparing pointers can be dangerous? > > The motivation for introducing ptr_eq() is indeed because the > compiler barrier is not sufficient to prevent the compiler from > using one pointer instead of the other. > > But it turns out that ptr_eq() is also a good tool to prevent the > compiler from reordering loads in case where the comparison is > done against a constant. Isn't that the same thing? A constant pointer like &x is still a pointer. What you want to do is compare p with &x without allowing the compiler to then replace *p with *&x (or just x). > > Isn't it true that on strongly ordered CPUs, a compiler barrier is > > sufficient to prevent the rcu_dereference() problem? So the whole idea > > behind ptr_eq() is that it prevents the problem on all CPUs. > > Correct. But given that we have ptr_eq(), it's good to show how it > equally prevents the compiler from reordering address-dependent loads > (comparison with constant) *and* prevents the compiler from using > one pointer rather than the other (comparison between two non-constant > pointers) which affects speculation on weakly-ordered CPUs. I don't see how these two things differ from each other. In the comparison-with-a-constant case, how is the compiler reordering anything? Isn't it just using the constant address rather than the loaded pointer and thereby breaking the address dependency? Alan stern
2024年9月28日 23:55,Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > On 2024-09-28 17:49, Alan Stern wrote: >> On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote: >>> On 2024-09-28 16:49, Alan Stern wrote: >>>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote: >>>>> equality, which does not preserve address dependencies and allows the >>>>> following misordering speculations: >>>>> >>>>> - If @b is a constant, the compiler can issue the loads which depend >>>>> on @a before loading @a. >>>>> - If @b is a register populated by a prior load, weakly-ordered >>>>> CPUs can speculate loads which depend on @a before loading @a. >>>> >>>> It shouldn't matter whether @a and @b are constants, registers, or >>>> anything else. All that matters is that the compiler uses the wrong >>>> one, which allows weakly ordered CPUs to speculate loads you wouldn't >>>> expect it to, based on the source code alone. >>> >>> I only partially agree here. >>> >>> On weakly-ordered architectures, indeed we don't care whether the >>> issue is caused by the compiler reordering the code (constant) >>> or the CPU speculating the load (registers). >>> >>> However, on strongly-ordered architectures, AFAIU, only the constant >>> case is problematic (compiler reordering the dependent load), because >> I thought you were trying to prevent the compiler from using one pointer >> instead of the other, not trying to prevent it from reordering anything. >> Isn't this the point the documentation wants to get across when it says >> that comparing pointers can be dangerous? > > The motivation for introducing ptr_eq() is indeed because the > compiler barrier is not sufficient to prevent the compiler from > using one pointer instead of the other. barrier_data(&b) prevents that. > > But it turns out that ptr_eq() is also a good tool to prevent the > compiler from reordering loads in case where the comparison is > done against a constant. > >>> CPU speculating the loads across the control dependency is not an >>> issue. >>> >>> So am I tempted to keep examples that clearly state whether >>> the issue is caused by compiler reordering instructions, or by >>> CPU speculation. >> Isn't it true that on strongly ordered CPUs, a compiler barrier is >> sufficient to prevent the rcu_dereference() problem? So the whole idea >> behind ptr_eq() is that it prevents the problem on all CPUs. > > Correct. But given that we have ptr_eq(), it's good to show how it > equally prevents the compiler from reordering address-dependent loads > (comparison with constant) *and* prevents the compiler from using > one pointer rather than the other (comparison between two non-constant > pointers) which affects speculation on weakly-ordered CPUs. > >> You can make your examples as specific as you like, but the fact remains >> that ptr_eq() is meant to prevent situations where both: >> The compiler uses the wrong pointer for a load, and >> The CPU performs the load earlier than you want. >> If either one of those doesn't hold then the problem won't arise. > > Correct. > > Thanks, > > Mathieu > > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com > >
Cc: Nikita Popov <github@npopov.com> Cc: llvm@lists.linux.dev On Sat, 28 Sep 2024 09:51:27 -0400 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > Compiler CSE and SSA GVN optimizations can cause the address dependency > of addresses returned by rcu_dereference to be lost when comparing those > pointers with either constants or previously loaded pointers. > > Introduce ptr_eq() to compare two addresses while preserving the address > dependencies for later use of the address. It should be used when > comparing an address returned by rcu_dereference(). > > This is needed to prevent the compiler CSE and SSA GVN optimizations > from replacing the registers holding @a or @b based on their > equality, which does not preserve address dependencies and allows the > following misordering speculations: > > - If @b is a constant, the compiler can issue the loads which depend > on @a before loading @a. > - If @b is a register populated by a prior load, weakly-ordered > CPUs can speculate loads which depend on @a before loading @a. > > The same logic applies with @a and @b swapped. > > The compiler barrier() is ineffective at fixing this issue. > It does not prevent the compiler CSE from losing the address dependency: > > int fct_2_volatile_barriers(void) > { > int *a, *b; > > do { > a = READ_ONCE(p); > asm volatile ("" : : : "memory"); > b = READ_ONCE(p); > } while (a != b); > asm volatile ("" : : : "memory"); <----- barrier() > return *b; > } > > With gcc 14.2 (arm64): > > fct_2_volatile_barriers: > adrp x0, .LANCHOR0 > add x0, x0, :lo12:.LANCHOR0 > .L2: > ldr x1, [x0] <------ x1 populated by first load. > ldr x2, [x0] > cmp x1, x2 > bne .L2 > ldr w0, [x1] <------ x1 is used for access which should depend on b. > ret > > On weakly-ordered architectures, this lets CPU speculation use the > result from the first load to speculate "ldr w0, [x1]" before > "ldr x2, [x0]". > Based on the RCU documentation, the control dependency does not prevent > the CPU from speculating loads. I recall seeing Nikita Popov (nikic) doing work related to this to LLVM so it respects pointer provenances much better and doesn't randomly replace pointers with others if they compare equal. So I tried to reproduce this example on clang, which seems to generate the correct code, loading from *b instead of *a. The generated code with "ptr_eq" however produces one extra move instruction which is not necessary. I digged into the LLVM source code to see if this behaviour is that we can rely on, and found that the GVN in use is very careful with replacing pointers [1]. Essentially: * null can be replaced * constant addresses can be replaced <-- bad for this use case * pointers originate from the same value (getUnderlyingObject). So it appears to me that if we can ensure that neither a or b come from a constant address then the OPTIMIZER_HIDE_VAR might be unnecessary for clang? This should be testable with __builtin_constant_p. Not necessary worth additional complexity handling clang specially, but I think this is GCC/clang difference is worth pointing out. I cc'ed nikic and clang-built-linux mailing list, please correct me if I'm wrong. [1]: https://github.com/llvm/llvm-project/blob/6558e5615ae9e6af6168b0a363808854fd66663f/llvm/lib/Analysis/Loads.cpp#L777-L788 Best, Gary > > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> > Suggested-by: Boqun Feng <boqun.feng@gmail.com> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > Reviewed-by: Boqun Feng <boqun.feng@gmail.com> > Acked-by: "Paul E. McKenney" <paulmck@kernel.org> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Cc: "Paul E. McKenney" <paulmck@kernel.org> > Cc: Will Deacon <will@kernel.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Boqun Feng <boqun.feng@gmail.com> > Cc: Alan Stern <stern@rowland.harvard.edu> > Cc: John Stultz <jstultz@google.com> > Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Boqun Feng <boqun.feng@gmail.com> > Cc: Frederic Weisbecker <frederic@kernel.org> > Cc: Joel Fernandes <joel@joelfernandes.org> > Cc: Josh Triplett <josh@joshtriplett.org> > Cc: Uladzislau Rezki <urezki@gmail.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Lai Jiangshan <jiangshanlai@gmail.com> > Cc: Zqiang <qiang.zhang1211@gmail.com> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Waiman Long <longman@redhat.com> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: maged.michael@gmail.com > Cc: Mateusz Guzik <mjguzik@gmail.com> > Cc: Gary Guo <gary@garyguo.net> > Cc: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> > Cc: rcu@vger.kernel.org > Cc: linux-mm@kvack.org > Cc: lkmm@lists.linux.dev > --- > include/linux/compiler.h | 62 ++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 62 insertions(+) > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h > index 2df665fa2964..f26705c267e8 100644 > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -186,6 +186,68 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, > __asm__ ("" : "=r" (var) : "0" (var)) > #endif > > +/* > + * Compare two addresses while preserving the address dependencies for > + * later use of the address. It should be used when comparing an address > + * returned by rcu_dereference(). > + * > + * This is needed to prevent the compiler CSE and SSA GVN optimizations > + * from replacing the registers holding @a or @b based on their > + * equality, which does not preserve address dependencies and allows the > + * following misordering speculations: > + * > + * - If @b is a constant, the compiler can issue the loads which depend > + * on @a before loading @a. > + * - If @b is a register populated by a prior load, weakly-ordered > + * CPUs can speculate loads which depend on @a before loading @a. > + * > + * The same logic applies with @a and @b swapped. > + * > + * Return value: true if pointers are equal, false otherwise. > + * > + * The compiler barrier() is ineffective at fixing this issue. It does > + * not prevent the compiler CSE from losing the address dependency: > + * > + * int fct_2_volatile_barriers(void) > + * { > + * int *a, *b; > + * > + * do { > + * a = READ_ONCE(p); > + * asm volatile ("" : : : "memory"); > + * b = READ_ONCE(p); > + * } while (a != b); > + * asm volatile ("" : : : "memory"); <-- barrier() > + * return *b; > + * } > + * > + * With gcc 14.2 (arm64): > + * > + * fct_2_volatile_barriers: > + * adrp x0, .LANCHOR0 > + * add x0, x0, :lo12:.LANCHOR0 > + * .L2: > + * ldr x1, [x0] <-- x1 populated by first load. > + * ldr x2, [x0] > + * cmp x1, x2 > + * bne .L2 > + * ldr w0, [x1] <-- x1 is used for access which should depend on b. > + * ret > + * > + * On weakly-ordered architectures, this lets CPU speculation use the > + * result from the first load to speculate "ldr w0, [x1]" before > + * "ldr x2, [x0]". > + * Based on the RCU documentation, the control dependency does not > + * prevent the CPU from speculating loads. > + */ > +static __always_inline > +int ptr_eq(const volatile void *a, const volatile void *b) > +{ > + OPTIMIZER_HIDE_VAR(a); > + OPTIMIZER_HIDE_VAR(b); > + return a == b; > +} > + > #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__) > > /**
On Sun, Sep 29, 2024, at 6:26 AM, Alan Huang wrote: > 2024年9月28日 23:55,Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: >> >> On 2024-09-28 17:49, Alan Stern wrote: >>> On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote: >>>> On 2024-09-28 16:49, Alan Stern wrote: >>>>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote: >>>>>> equality, which does not preserve address dependencies and allows the >>>>>> following misordering speculations: >>>>>> >>>>>> - If @b is a constant, the compiler can issue the loads which depend >>>>>> on @a before loading @a. >>>>>> - If @b is a register populated by a prior load, weakly-ordered >>>>>> CPUs can speculate loads which depend on @a before loading @a. >>>>> >>>>> It shouldn't matter whether @a and @b are constants, registers, or >>>>> anything else. All that matters is that the compiler uses the wrong >>>>> one, which allows weakly ordered CPUs to speculate loads you wouldn't >>>>> expect it to, based on the source code alone. >>>> >>>> I only partially agree here. >>>> >>>> On weakly-ordered architectures, indeed we don't care whether the >>>> issue is caused by the compiler reordering the code (constant) >>>> or the CPU speculating the load (registers). >>>> >>>> However, on strongly-ordered architectures, AFAIU, only the constant >>>> case is problematic (compiler reordering the dependent load), because >>> I thought you were trying to prevent the compiler from using one pointer >>> instead of the other, not trying to prevent it from reordering anything. >>> Isn't this the point the documentation wants to get across when it says >>> that comparing pointers can be dangerous? >> >> The motivation for introducing ptr_eq() is indeed because the >> compiler barrier is not sufficient to prevent the compiler from >> using one pointer instead of the other. > > barrier_data(&b) prevents that. > It prevents that because it acts as barrier() + OPTIMIZER_HIDE_VAR(b). I don’t see much value of using that since we can resolve the problem with OPTIMIZER_HIDE_VAR() alone. Regards, Boqun >> >> But it turns out that ptr_eq() is also a good tool to prevent the >> compiler from reordering loads in case where the comparison is >> done against a constant. >> >>>> CPU speculating the loads across the control dependency is not an >>>> issue. >>>> >>>> So am I tempted to keep examples that clearly state whether >>>> the issue is caused by compiler reordering instructions, or by >>>> CPU speculation. >>> Isn't it true that on strongly ordered CPUs, a compiler barrier is >>> sufficient to prevent the rcu_dereference() problem? So the whole idea >>> behind ptr_eq() is that it prevents the problem on all CPUs. >> >> Correct. But given that we have ptr_eq(), it's good to show how it >> equally prevents the compiler from reordering address-dependent loads >> (comparison with constant) *and* prevents the compiler from using >> one pointer rather than the other (comparison between two non-constant >> pointers) which affects speculation on weakly-ordered CPUs. >> >>> You can make your examples as specific as you like, but the fact remains >>> that ptr_eq() is meant to prevent situations where both: >>> The compiler uses the wrong pointer for a load, and >>> The CPU performs the load earlier than you want. >>> If either one of those doesn't hold then the problem won't arise. >> >> Correct. >> >> Thanks, >> >> Mathieu >> >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> https://www.efficios.com >> >>
2024年9月29日 07:55,Boqun Feng <boqun.feng@gmail.com> wrote: > > > > On Sun, Sep 29, 2024, at 6:26 AM, Alan Huang wrote: >> 2024年9月28日 23:55,Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: >>> >>> On 2024-09-28 17:49, Alan Stern wrote: >>>> On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote: >>>>> On 2024-09-28 16:49, Alan Stern wrote: >>>>>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote: >>>>>>> equality, which does not preserve address dependencies and allows the >>>>>>> following misordering speculations: >>>>>>> >>>>>>> - If @b is a constant, the compiler can issue the loads which depend >>>>>>> on @a before loading @a. >>>>>>> - If @b is a register populated by a prior load, weakly-ordered >>>>>>> CPUs can speculate loads which depend on @a before loading @a. >>>>>> >>>>>> It shouldn't matter whether @a and @b are constants, registers, or >>>>>> anything else. All that matters is that the compiler uses the wrong >>>>>> one, which allows weakly ordered CPUs to speculate loads you wouldn't >>>>>> expect it to, based on the source code alone. >>>>> >>>>> I only partially agree here. >>>>> >>>>> On weakly-ordered architectures, indeed we don't care whether the >>>>> issue is caused by the compiler reordering the code (constant) >>>>> or the CPU speculating the load (registers). >>>>> >>>>> However, on strongly-ordered architectures, AFAIU, only the constant >>>>> case is problematic (compiler reordering the dependent load), because >>>> I thought you were trying to prevent the compiler from using one pointer >>>> instead of the other, not trying to prevent it from reordering anything. >>>> Isn't this the point the documentation wants to get across when it says >>>> that comparing pointers can be dangerous? >>> >>> The motivation for introducing ptr_eq() is indeed because the >>> compiler barrier is not sufficient to prevent the compiler from >>> using one pointer instead of the other. >> >> barrier_data(&b) prevents that. >> > > It prevents that because it acts as barrier() + OPTIMIZER_HIDE_VAR(b). > I don’t see much value of using that since we can resolve the problem > with OPTIMIZER_HIDE_VAR() alone. Yeah, barrier_data generates more instructions. > > Regards, > Boqun > >>> >>> But it turns out that ptr_eq() is also a good tool to prevent the >>> compiler from reordering loads in case where the comparison is >>> done against a constant. >>> >>>>> CPU speculating the loads across the control dependency is not an >>>>> issue. >>>>> >>>>> So am I tempted to keep examples that clearly state whether >>>>> the issue is caused by compiler reordering instructions, or by >>>>> CPU speculation. >>>> Isn't it true that on strongly ordered CPUs, a compiler barrier is >>>> sufficient to prevent the rcu_dereference() problem? So the whole idea >>>> behind ptr_eq() is that it prevents the problem on all CPUs. >>> >>> Correct. But given that we have ptr_eq(), it's good to show how it >>> equally prevents the compiler from reordering address-dependent loads >>> (comparison with constant) *and* prevents the compiler from using >>> one pointer rather than the other (comparison between two non-constant >>> pointers) which affects speculation on weakly-ordered CPUs. >>> >>>> You can make your examples as specific as you like, but the fact remains >>>> that ptr_eq() is meant to prevent situations where both: >>>> The compiler uses the wrong pointer for a load, and >>>> The CPU performs the load earlier than you want. >>>> If either one of those doesn't hold then the problem won't arise. >>> >>> Correct. >>> >>> Thanks, >>> >>> Mathieu >>> >>> >>> -- >>> Mathieu Desnoyers >>> EfficiOS Inc. >>> https://www.efficios.com
diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 2df665fa2964..f26705c267e8 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -186,6 +186,68 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, __asm__ ("" : "=r" (var) : "0" (var)) #endif +/* + * Compare two addresses while preserving the address dependencies for + * later use of the address. It should be used when comparing an address + * returned by rcu_dereference(). + * + * This is needed to prevent the compiler CSE and SSA GVN optimizations + * from replacing the registers holding @a or @b based on their + * equality, which does not preserve address dependencies and allows the + * following misordering speculations: + * + * - If @b is a constant, the compiler can issue the loads which depend + * on @a before loading @a. + * - If @b is a register populated by a prior load, weakly-ordered + * CPUs can speculate loads which depend on @a before loading @a. + * + * The same logic applies with @a and @b swapped. + * + * Return value: true if pointers are equal, false otherwise. + * + * The compiler barrier() is ineffective at fixing this issue. It does + * not prevent the compiler CSE from losing the address dependency: + * + * int fct_2_volatile_barriers(void) + * { + * int *a, *b; + * + * do { + * a = READ_ONCE(p); + * asm volatile ("" : : : "memory"); + * b = READ_ONCE(p); + * } while (a != b); + * asm volatile ("" : : : "memory"); <-- barrier() + * return *b; + * } + * + * With gcc 14.2 (arm64): + * + * fct_2_volatile_barriers: + * adrp x0, .LANCHOR0 + * add x0, x0, :lo12:.LANCHOR0 + * .L2: + * ldr x1, [x0] <-- x1 populated by first load. + * ldr x2, [x0] + * cmp x1, x2 + * bne .L2 + * ldr w0, [x1] <-- x1 is used for access which should depend on b. + * ret + * + * On weakly-ordered architectures, this lets CPU speculation use the + * result from the first load to speculate "ldr w0, [x1]" before + * "ldr x2, [x0]". + * Based on the RCU documentation, the control dependency does not + * prevent the CPU from speculating loads. + */ +static __always_inline +int ptr_eq(const volatile void *a, const volatile void *b) +{ + OPTIMIZER_HIDE_VAR(a); + OPTIMIZER_HIDE_VAR(b); + return a == b; +} + #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__) /**