From patchwork Sat Sep 28 13:51:27 2024
X-Patchwork-Submitter: Mathieu Desnoyers
X-Patchwork-Id: 13814691
From: Mathieu Desnoyers
To: Linus Torvalds
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Greg Kroah-Hartman,
 Sebastian Andrzej Siewior, "Paul E. McKenney", Will Deacon, Peter Zijlstra,
 Boqun Feng, Alan Stern, John Stultz, Neeraj Upadhyay, Frederic Weisbecker,
 Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
 Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long, Mark Rutland,
 Thomas Gleixner, Vlastimil Babka, maged.michael@gmail.com, Mateusz Guzik,
 Gary Guo, Jonas Oberhauser, rcu@vger.kernel.org, linux-mm@kvack.org,
 lkmm@lists.linux.dev
Subject: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
Date: Sat, 28 Sep 2024 09:51:27 -0400
Message-Id: <20240928135128.991110-2-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240928135128.991110-1-mathieu.desnoyers@efficios.com>
References: <20240928135128.991110-1-mathieu.desnoyers@efficios.com>
X-Mailing-List: rcu@vger.kernel.org

Compiler CSE and SSA GVN optimizations can cause the address dependency
of addresses returned by rcu_dereference to be lost when comparing those
pointers with either constants or previously loaded pointers.

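For readers who want to try the comparison pattern outside the kernel tree,
the following is a minimal user-space sketch and is not part of the patch. It
assumes GCC or Clang. HIDE_VAR(), READ_ONCE_SKETCH(), ptr_eq_sketch(),
struct conf, default_conf, gp and the two reader functions are hypothetical
stand-ins for OPTIMIZER_HIDE_VAR(), READ_ONCE(), the ptr_eq() helper named in
the subject and its callers. A single-threaded program cannot exhibit the
misordering, so this only shows the shape of the two comparisons (against a
constant address, and against a previously loaded pointer).

```c
/* Assumption: stand-in for the kernel's OPTIMIZER_HIDE_VAR() (GCC/Clang). */
#define HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/* Assumption: simplified stand-in for READ_ONCE(), a volatile load. */
#define READ_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical data used only for this illustration. */
struct conf { int a; };
static struct conf default_conf = { .a = 42 };
static struct conf *gp = &default_conf;

/*
 * Compare two addresses after hiding both values from the optimizer,
 * so the compiler cannot substitute one operand for the other in
 * later dereferences based on the outcome of the comparison.
 */
static inline __attribute__((always_inline))
int ptr_eq_sketch(const volatile void *a, const volatile void *b)
{
	HIDE_VAR(a);
	HIDE_VAR(b);
	return a == b;
}

/* Case 1: compare a loaded pointer against a constant address. */
static int read_default_a(void)
{
	struct conf *p = READ_ONCE_SKETCH(gp);

	if (ptr_eq_sketch(p, &default_conf))
		return p->a;	/* dereference goes through p itself */
	return -1;
}

/* Case 2: compare a loaded pointer against a previously loaded pointer. */
static int read_stable_a(void)
{
	struct conf *p1, *p2;

	do {
		p1 = READ_ONCE_SKETCH(gp);
		p2 = READ_ONCE_SKETCH(gp);
	} while (!ptr_eq_sketch(p1, p2));
	return p2->a;	/* dereference goes through p2, the later load */
}
```

Calling read_default_a() or read_stable_a() from a test program returns the
value of default_conf.a as long as gp still points at default_conf. Comparing
the generated assembly with and without HIDE_VAR() is one way to see which
register each dereference uses on a given compiler.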
Introduce ptr_eq() to compare two addresses while preserving the address
dependencies for later use of the address. It should be used when
comparing an address returned by rcu_dereference().

This is needed to prevent the compiler CSE and SSA GVN optimizations
from replacing the registers holding @a or @b based on their
equality, which does not preserve address dependencies and allows the
following misordering speculations:

- If @b is a constant, the compiler can issue the loads which depend
  on @a before loading @a.
- If @b is a register populated by a prior load, weakly-ordered
  CPUs can speculate loads which depend on @a before loading @a.

The same logic applies with @a and @b swapped.

The compiler barrier() is ineffective at fixing this issue. It does not
prevent the compiler CSE from losing the address dependency:

int fct_2_volatile_barriers(void)
{
    int *a, *b;

    do {
        a = READ_ONCE(p);
        asm volatile ("" : : : "memory");
        b = READ_ONCE(p);
    } while (a != b);
    asm volatile ("" : : : "memory");  <----- barrier()
    return *b;
}

With gcc 14.2 (arm64):

fct_2_volatile_barriers:
        adrp    x0, .LANCHOR0
        add     x0, x0, :lo12:.LANCHOR0
.L2:
        ldr     x1, [x0]  <------ x1 populated by first load.
        ldr     x2, [x0]
        cmp     x1, x2
        bne     .L2
        ldr     w0, [x1]  <------ x1 is used for access which should depend on b.
        ret

On weakly-ordered architectures, this lets CPU speculation use the
result from the first load to speculate "ldr w0, [x1]" before
"ldr x2, [x0]".
Based on the RCU documentation, the control dependency does not prevent
the CPU from speculating loads.

Suggested-by: Linus Torvalds
Suggested-by: Boqun Feng
Signed-off-by: Mathieu Desnoyers
Reviewed-by: Boqun Feng
Acked-by: "Paul E. McKenney"
Cc: Greg Kroah-Hartman
Cc: Sebastian Andrzej Siewior
Cc: "Paul E. McKenney"
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Boqun Feng
Cc: Alan Stern
Cc: John Stultz
Cc: Neeraj Upadhyay
Cc: Linus Torvalds
Cc: Boqun Feng
Cc: Frederic Weisbecker
Cc: Joel Fernandes
Cc: Josh Triplett
Cc: Uladzislau Rezki
Cc: Steven Rostedt
Cc: Lai Jiangshan
Cc: Zqiang
Cc: Ingo Molnar
Cc: Waiman Long
Cc: Mark Rutland
Cc: Thomas Gleixner
Cc: Vlastimil Babka
Cc: maged.michael@gmail.com
Cc: Mateusz Guzik
Cc: Gary Guo
Cc: Jonas Oberhauser
Cc: rcu@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: lkmm@lists.linux.dev
---
 include/linux/compiler.h | 62 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 2df665fa2964..f26705c267e8 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -186,6 +186,68 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 	__asm__ ("" : "=r" (var) : "0" (var))
 #endif
 
+/*
+ * Compare two addresses while preserving the address dependencies for
+ * later use of the address. It should be used when comparing an address
+ * returned by rcu_dereference().
+ *
+ * This is needed to prevent the compiler CSE and SSA GVN optimizations
+ * from replacing the registers holding @a or @b based on their
+ * equality, which does not preserve address dependencies and allows the
+ * following misordering speculations:
+ *
+ * - If @b is a constant, the compiler can issue the loads which depend
+ *   on @a before loading @a.
+ * - If @b is a register populated by a prior load, weakly-ordered
+ *   CPUs can speculate loads which depend on @a before loading @a.
+ *
+ * The same logic applies with @a and @b swapped.
+ *
+ * Return value: true if pointers are equal, false otherwise.
+ *
+ * The compiler barrier() is ineffective at fixing this issue. It does
+ * not prevent the compiler CSE from losing the address dependency:
+ *
+ * int fct_2_volatile_barriers(void)
+ * {
+ *	int *a, *b;
+ *
+ *	do {
+ *		a = READ_ONCE(p);
+ *		asm volatile ("" : : : "memory");
+ *		b = READ_ONCE(p);
+ *	} while (a != b);
+ *	asm volatile ("" : : : "memory");  <-- barrier()
+ *	return *b;
+ * }
+ *
+ * With gcc 14.2 (arm64):
+ *
+ * fct_2_volatile_barriers:
+ *	adrp	x0, .LANCHOR0
+ *	add	x0, x0, :lo12:.LANCHOR0
+ * .L2:
+ *	ldr	x1, [x0]  <-- x1 populated by first load.
+ *	ldr	x2, [x0]
+ *	cmp	x1, x2
+ *	bne	.L2
+ *	ldr	w0, [x1]  <-- x1 is used for access which should depend on b.
+ *	ret
+ *
+ * On weakly-ordered architectures, this lets CPU speculation use the
+ * result from the first load to speculate "ldr w0, [x1]" before
+ * "ldr x2, [x0]".
+ * Based on the RCU documentation, the control dependency does not
+ * prevent the CPU from speculating loads.
+ */
+static __always_inline
+int ptr_eq(const volatile void *a, const volatile void *b)
+{
+	OPTIMIZER_HIDE_VAR(a);
+	OPTIMIZER_HIDE_VAR(b);
+	return a == b;
+}
+
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
 
 /**

From patchwork Sat Sep 28 13:51:28 2024
X-Patchwork-Submitter: Mathieu Desnoyers
X-Patchwork-Id: 13814689
From: Mathieu Desnoyers
To: Linus Torvalds
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Greg Kroah-Hartman,
 Sebastian Andrzej Siewior, "Paul E. McKenney", Will Deacon, Peter Zijlstra,
 Boqun Feng, Alan Stern, John Stultz, Neeraj Upadhyay, Frederic Weisbecker,
 Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
 Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long, Mark Rutland,
 Thomas Gleixner, Vlastimil Babka, maged.michael@gmail.com, Mateusz Guzik,
 Gary Guo, Jonas Oberhauser, rcu@vger.kernel.org, linux-mm@kvack.org,
 lkmm@lists.linux.dev
Subject: [PATCH 2/2] Documentation: RCU: Refer to ptr_eq()
Date: Sat, 28 Sep 2024 09:51:28 -0400
Message-Id: <20240928135128.991110-3-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240928135128.991110-1-mathieu.desnoyers@efficios.com>
References: <20240928135128.991110-1-mathieu.desnoyers@efficios.com>
X-Mailing-List: rcu@vger.kernel.org

Refer to ptr_eq() in the rcu_dereference() documentation.

ptr_eq() is a mechanism that preserves address dependencies when
comparing pointers, and should be favored when comparing a pointer
obtained from rcu_dereference() against another pointer.

Signed-off-by: Mathieu Desnoyers
Cc: Greg Kroah-Hartman
Cc: Sebastian Andrzej Siewior
Cc: "Paul E. McKenney"
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Boqun Feng
Cc: Alan Stern
Cc: John Stultz
Cc: Neeraj Upadhyay
Cc: Linus Torvalds
Cc: Boqun Feng
Cc: Frederic Weisbecker
Cc: Joel Fernandes
Cc: Josh Triplett
Cc: Uladzislau Rezki
Cc: Steven Rostedt
Cc: Lai Jiangshan
Cc: Zqiang
Cc: Ingo Molnar
Cc: Waiman Long
Cc: Mark Rutland
Cc: Thomas Gleixner
Cc: Vlastimil Babka
Cc: maged.michael@gmail.com
Cc: Mateusz Guzik
Cc: Gary Guo
Cc: Jonas Oberhauser
Cc: rcu@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: lkmm@lists.linux.dev
---
 Documentation/RCU/rcu_dereference.rst | 34 +++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/Documentation/RCU/rcu_dereference.rst b/Documentation/RCU/rcu_dereference.rst
index 2524dcdadde2..c36b8d1721f6 100644
--- a/Documentation/RCU/rcu_dereference.rst
+++ b/Documentation/RCU/rcu_dereference.rst
@@ -104,11 +104,13 @@ readers working properly:
 	after such branches, but can speculate loads, which can again
 	result in misordering bugs.
 
--	Be very careful about comparing pointers obtained from
-	rcu_dereference() against non-NULL values. As Linus Torvalds
-	explained, if the two pointers are equal, the compiler could
-	substitute the pointer you are comparing against for the pointer
-	obtained from rcu_dereference(). For example::
+-	Use relational operators which preserve address dependencies
+	(such as "ptr_eq()") to compare pointers obtained from
+	rcu_dereference() against non-NULL values or against pointers
+	obtained from prior loads. As Linus Torvalds explained, if the
+	two pointers are equal, the compiler could substitute the
+	pointer you are comparing against for the pointer obtained from
+	rcu_dereference(). For example::
 
 		p = rcu_dereference(gp);
 		if (p == &default_struct)
@@ -125,6 +127,23 @@ readers working properly:
 	On ARM and Power hardware, the load from "default_struct.a"
 	can now be speculated, such that it might happen before the
 	rcu_dereference(). This could result in bugs due to misordering.
+	Performing the comparison with "ptr_eq()" ensures the compiler
+	does not perform such transformation.
+
+	If the comparison is against a pointer obtained from prior
+	loads, the compiler is allowed to use either register for the
+	following accesses, which loses the address dependency and
+	allows weakly-ordered architectures such as ARM and PowerPC
+	to speculate the address-dependent load before rcu_dereference().
+	For example::
+
+		p1 = READ_ONCE(gp);
+		p2 = rcu_dereference(gp);
+		if (p1 == p2)
+			do_default(p2->a);
+
+	Performing the comparison with "ptr_eq()" ensures the compiler
+	preserves the address dependencies.
 
 	However, comparisons are OK in the following cases:
 
@@ -204,6 +223,11 @@ readers working properly:
 		comparison will provide exactly the information that the
 		compiler needs to deduce the value of the pointer.
 
+	When in doubt, use relational operators that preserve address
+	dependencies (such as "ptr_eq()") to compare pointers obtained
+	from rcu_dereference() against non-NULL values or against
+	pointers obtained from prior loads.
+
 -	Disable any value-speculation optimizations that your compiler
 	might provide, especially if you are making use of feedback-based
 	optimizations that take data collected from prior runs. Such