[RFC,3/4] hp: Implement Hazard Pointers

Message ID	20241002010205.1341915-4-mathieu.desnoyers@efficios.com (mailing list archive)
State	New
Headers	show Return-Path: <owner-linux-mm@kvack.org> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> To: Linus Torvalds <torvalds@linux-foundation.org>, Andrew Morton <akpm@linux-foundation.org>, Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Nicholas Piggin <npiggin@gmail.com>, Michael Ellerman <mpe@ellerman.id.au>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Sebastian Andrzej Siewior <bigeasy@linutronix.de>, "Paul E. McKenney" <paulmck@kernel.org>, Will Deacon <will@kernel.org>, Boqun Feng <boqun.feng@gmail.com>, Alan Stern <stern@rowland.harvard.edu>, John Stultz <jstultz@google.com>, Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>, Frederic Weisbecker <frederic@kernel.org>, Joel Fernandes <joel@joelfernandes.org>, Josh Triplett <josh@joshtriplett.org>, Uladzislau Rezki <urezki@gmail.com>, Steven Rostedt <rostedt@goodmis.org>, Lai Jiangshan <jiangshanlai@gmail.com>, Zqiang <qiang.zhang1211@gmail.com>, Ingo Molnar <mingo@redhat.com>, Waiman Long <longman@redhat.com>, Mark Rutland <mark.rutland@arm.com>, Thomas Gleixner <tglx@linutronix.de>, Vlastimil Babka <vbabka@suse.cz>, maged.michael@gmail.com, Mateusz Guzik <mjguzik@gmail.com>, Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>, rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev Subject: [RFC PATCH 3/4] hp: Implement Hazard Pointers Date: Tue, 1 Oct 2024 21:02:04 -0400 Message-Id: <20241002010205.1341915-4-mathieu.desnoyers@efficios.com> In-Reply-To: <20241002010205.1341915-1-mathieu.desnoyers@efficios.com> References: <20241002010205.1341915-1-mathieu.desnoyers@efficios.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	sched+mm: Track lazy active mm existence with hazard pointers \| expand [RFC,0/4] sched+mm: Track lazy active mm existence with hazard pointers [RFC,1/4] compiler.h: Introduce ptr_eq() to preserve address dependency [RFC,2/4] Documentation: RCU: Refer to ptr_eq() [RFC,3/4] hp: Implement Hazard Pointers [RFC,4/4] sched+mm: Use hazard pointers to track lazy active mm existence

Message ID

20241002010205.1341915-4-mathieu.desnoyers@efficios.com (mailing list archive)

State

New

Headers

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Will Deacon <will@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Alan Stern <stern@rowland.harvard.edu>,
	John Stultz <jstultz@google.com>,
	Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>,
	Ingo Molnar <mingo@redhat.com>,
	Waiman Long <longman@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	maged.michael@gmail.com,
	Mateusz Guzik <mjguzik@gmail.com>,
	Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>,
	rcu@vger.kernel.org,
	linux-mm@kvack.org,
	lkmm@lists.linux.dev
Subject: [RFC PATCH 3/4] hp: Implement Hazard Pointers
Date: Tue,  1 Oct 2024 21:02:04 -0400
Message-Id: <20241002010205.1341915-4-mathieu.desnoyers@efficios.com>
In-Reply-To: <20241002010205.1341915-1-mathieu.desnoyers@efficios.com>
References: <20241002010205.1341915-1-mathieu.desnoyers@efficios.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

sched+mm: Track lazy active mm existence with hazard pointers | expand

Commit Message

Mathieu Desnoyers Oct. 2, 2024, 1:02 a.m. UTC

This API provides existence guarantees of objects through Hazard
Pointers (HP).

Each HP domain defines a fixed number of hazard pointer slots (nr_cpus)
across the entire system.

Its main benefit over RCU is that it allows fast reclaim of
HP-protected pointers without needing to wait for a grace period.

It also allows the hazard pointer scan to call a user-defined callback
to retire a hazard pointer slot immediately if needed. This callback
may, for instance, issue an IPI to the relevant CPU.

There are a few possible use-cases for this in the Linux kernel:

  - Improve performance of mm_count by replacing lazy active mm by HP.
  - Guarantee object existence on pointer dereference to use refcount:
    - replace locking used for that purpose in some drivers,
    - replace RCU + inc_not_zero pattern,
  - rtmutex: Improve situations where locks need to be taken in
    reverse dependency chain order by guaranteeing existence of
    first and second locks in traversal order, allowing them to be
    locked in the correct order (which is reverse from traversal
    order) rather than try-lock+retry on nested lock.

References:

[1]: M. M. Michael, "Hazard pointers: safe memory reclamation for
     lock-free objects," in IEEE Transactions on Parallel and
     Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004

Link: https://lore.kernel.org/lkml/j3scdl5iymjlxavomgc6u5ndg3svhab6ga23dr36o4f5mt333w@7xslvq6b6hmv/
Link: https://lpc.events/event/18/contributions/1731/
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: John Stultz <jstultz@google.com>
Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: maged.michael@gmail.com
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
Cc: rcu@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: lkmm@lists.linux.dev
---
 include/linux/hp.h | 154 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/Makefile    |   2 +-
 kernel/hp.c        |  46 ++++++++++++++
 3 files changed, 201 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/hp.h
 create mode 100644 kernel/hp.c

Comments

Boqun Feng Oct. 3, 2024, 12:24 a.m. UTC | #1

On Tue, Oct 01, 2024 at 09:02:04PM -0400, Mathieu Desnoyers wrote:
> This API provides existence guarantees of objects through Hazard
> Pointers (HP).
> 
> Each HP domain defines a fixed number of hazard pointer slots (nr_cpus)
> across the entire system.
> 
> Its main benefit over RCU is that it allows fast reclaim of
> HP-protected pointers without needing to wait for a grace period.
> 
> It also allows the hazard pointer scan to call a user-defined callback
> to retire a hazard pointer slot immediately if needed. This callback
> may, for instance, issue an IPI to the relevant CPU.
> 
> There are a few possible use-cases for this in the Linux kernel:
> 
>   - Improve performance of mm_count by replacing lazy active mm by HP.
>   - Guarantee object existence on pointer dereference to use refcount:
>     - replace locking used for that purpose in some drivers,
>     - replace RCU + inc_not_zero pattern,
>   - rtmutex: Improve situations where locks need to be taken in
>     reverse dependency chain order by guaranteeing existence of
>     first and second locks in traversal order, allowing them to be
>     locked in the correct order (which is reverse from traversal
>     order) rather than try-lock+retry on nested lock.
> 
> References:
> 
> [1]: M. M. Michael, "Hazard pointers: safe memory reclamation for
>      lock-free objects," in IEEE Transactions on Parallel and
>      Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004
> 
> Link: https://lore.kernel.org/lkml/j3scdl5iymjlxavomgc6u5ndg3svhab6ga23dr36o4f5mt333w@7xslvq6b6hmv/
> Link: https://lpc.events/event/18/contributions/1731/
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: John Stultz <jstultz@google.com>
> Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Zqiang <qiang.zhang1211@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: maged.michael@gmail.com
> Cc: Mateusz Guzik <mjguzik@gmail.com>
> Cc: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
> Cc: rcu@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: lkmm@lists.linux.dev
> ---
>  include/linux/hp.h | 154 +++++++++++++++++++++++++++++++++++++++++++++
>  kernel/Makefile    |   2 +-
>  kernel/hp.c        |  46 ++++++++++++++
>  3 files changed, 201 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/hp.h
>  create mode 100644 kernel/hp.c
> 
> diff --git a/include/linux/hp.h b/include/linux/hp.h
> new file mode 100644
> index 000000000000..929e8685a0fd
> --- /dev/null
> +++ b/include/linux/hp.h
> @@ -0,0 +1,154 @@
> +// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> +//
> +// SPDX-License-Identifier: LGPL-2.1-or-later
> +
> +#ifndef _LINUX_HP_H
> +#define _LINUX_HP_H
> +
> +/*
> + * HP: Hazard Pointers
> + *
> + * This API provides existence guarantees of objects through hazard
> + * pointers.
> + *
> + * It uses a fixed number of hazard pointer slots (nr_cpus) across the
> + * entire system for each HP domain.
> + *
> + * Its main benefit over RCU is that it allows fast reclaim of
> + * HP-protected pointers without needing to wait for a grace period.
> + *
> + * It also allows the hazard pointer scan to call a user-defined callback
> + * to retire a hazard pointer slot immediately if needed. This callback
> + * may, for instance, issue an IPI to the relevant CPU.
> + *
> + * References:
> + *
> + * [1]: M. M. Michael, "Hazard pointers: safe memory reclamation for
> + *      lock-free objects," in IEEE Transactions on Parallel and
> + *      Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004
> + */
> +
> +#include <linux/rcupdate.h>
> +
> +/*
> + * Hazard pointer slot.
> + */
> +struct hp_slot {
> +	void *addr;
> +};
> +
> +/*
> + * Hazard pointer context, returned by hp_use().
> + */
> +struct hp_ctx {
> +	struct hp_slot *slot;
> +	void *addr;
> +};
> +
> +/*
> + * hp_scan: Scan hazard pointer domain for @addr.
> + *
> + * Scan hazard pointer domain for @addr.
> + * If @retire_cb is NULL, wait to observe that each slot contains a value
> + * that differs from @addr.
> + * If @retire_cb is non-NULL, invoke @callback for each slot containing
> + * @addr.
> + */
> +void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr,
> +	     void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr));
> +
> +/* Get the hazard pointer context address (may be NULL). */
> +static inline
> +void *hp_ctx_addr(struct hp_ctx ctx)
> +{
> +	return ctx.addr;
> +}
> +
> +/*
> + * hp_allocate: Allocate a hazard pointer.
> + *
> + * Allocate a hazard pointer slot for @addr. The object existence should
> + * be guaranteed by the caller.
> + *
> + * Returns a hazard pointer context.
> + */
> +static inline
> +struct hp_ctx hp_allocate(struct hp_slot __percpu *percpu_slots, void *addr)
> +{
> +	struct hp_slot *slot;
> +	struct hp_ctx ctx;
> +
> +	if (!addr)
> +		goto fail;
> +	slot = this_cpu_ptr(percpu_slots);

Are you assuming this is called with preemption disabled? Otherwise,
there could two threads picking up the same hazard pointer slot on one
CPU,

> +	/*
> +	 * A single hazard pointer slot per CPU is available currently.
> +	 * Other hazard pointer domains can eventually have a different
> +	 * configuration.
> +	 */
> +	if (READ_ONCE(slot->addr))
> +		goto fail;

.. and they could both read an empty slot, and both think they
successfully protect the objects, which could be different objects.

Or am I missing something subtle here?

> +	WRITE_ONCE(slot->addr, addr);	/* Store B */
> +	ctx.slot = slot;
> +	ctx.addr = addr;
> +	return ctx;
> +
> +fail:
> +	ctx.slot = NULL;
> +	ctx.addr = NULL;
> +	return ctx;
> +}
> +
> +/*
> + * hp_dereference_allocate: Dereference and allocate a hazard pointer.
> + *
> + * Returns a hazard pointer context.
> + */
> +static inline
> +struct hp_ctx hp_dereference_allocate(struct hp_slot __percpu *percpu_slots, void * const * addr_p)
> +{
> +	struct hp_slot *slot;
> +	void *addr, *addr2;
> +	struct hp_ctx ctx;
> +
> +	addr = READ_ONCE(*addr_p);
> +retry:
> +	ctx = hp_allocate(percpu_slots, addr);
> +	if (!hp_ctx_addr(ctx))
> +		goto fail;
> +	/* Memory ordering: Store B before Load A. */
> +	smp_mb();
> +	/*
> +	 * Use RCU dereference without lockdep checks, because
> +	 * lockdep is not aware of HP guarantees.
> +	 */
> +	addr2 = rcu_access_pointer(*addr_p);	/* Load A */

Why rcu_access_pointer() instead of READ_ONCE()? Because you want to
mark the head of address dependency?

Regards,
Boqun

> +	/*
> +	 * If @addr_p content has changed since the first load,
> +	 * clear the hazard pointer and try again.
> +	 */
> +	if (!ptr_eq(addr2, addr)) {
> +		WRITE_ONCE(slot->addr, NULL);
> +		if (!addr2)
> +			goto fail;
> +		addr = addr2;
> +		goto retry;
> +	}
> +	ctx.slot = slot;
> +	ctx.addr = addr2;
> +	return ctx;
> +
> +fail:
> +	ctx.slot = NULL;
> +	ctx.addr = NULL;
> +	return ctx;
> +}
> +
[...]

Mathieu Desnoyers Oct. 3, 2024, 1:30 p.m. UTC | #2

On 2024-10-03 02:24, Boqun Feng wrote:
> On Tue, Oct 01, 2024 at 09:02:04PM -0400, Mathieu Desnoyers wrote:
[...]
>> +/*
>> + * hp_allocate: Allocate a hazard pointer.
>> + *
>> + * Allocate a hazard pointer slot for @addr. The object existence should
>> + * be guaranteed by the caller.
>> + *
>> + * Returns a hazard pointer context.
>> + */
>> +static inline
>> +struct hp_ctx hp_allocate(struct hp_slot __percpu *percpu_slots, void *addr)
>> +{
>> +	struct hp_slot *slot;
>> +	struct hp_ctx ctx;
>> +
>> +	if (!addr)
>> +		goto fail;
>> +	slot = this_cpu_ptr(percpu_slots);
> 
> Are you assuming this is called with preemption disabled? Otherwise,
> there could two threads picking up the same hazard pointer slot on one
> CPU,

Indeed, this minimalist implementation only covers the preempt-off
use-case, where there is a single use of HP per CPU at any given
time (e.g. for the lazy mm use-case). It expects to be called
from preempt-off context. I will update the comment accordingly.

> 
>> +	/*
>> +	 * A single hazard pointer slot per CPU is available currently.
>> +	 * Other hazard pointer domains can eventually have a different
>> +	 * configuration.
>> +	 */
>> +	if (READ_ONCE(slot->addr))
>> +		goto fail;
> 
> .. and they could both read an empty slot, and both think they
> successfully protect the objects, which could be different objects.
> 
> Or am I missing something subtle here?

You are correct, I should document this.

> 
>> +	WRITE_ONCE(slot->addr, addr);	/* Store B */
>> +	ctx.slot = slot;
>> +	ctx.addr = addr;
>> +	return ctx;
>> +
>> +fail:
>> +	ctx.slot = NULL;
>> +	ctx.addr = NULL;
>> +	return ctx;
>> +}
>> +
>> +/*
>> + * hp_dereference_allocate: Dereference and allocate a hazard pointer.
>> + *
>> + * Returns a hazard pointer context.
>> + */
>> +static inline
>> +struct hp_ctx hp_dereference_allocate(struct hp_slot __percpu *percpu_slots, void * const * addr_p)
>> +{
>> +	struct hp_slot *slot;
>> +	void *addr, *addr2;
>> +	struct hp_ctx ctx;
>> +
>> +	addr = READ_ONCE(*addr_p);
>> +retry:
>> +	ctx = hp_allocate(percpu_slots, addr);
>> +	if (!hp_ctx_addr(ctx))
>> +		goto fail;
>> +	/* Memory ordering: Store B before Load A. */
>> +	smp_mb();
>> +	/*
>> +	 * Use RCU dereference without lockdep checks, because
>> +	 * lockdep is not aware of HP guarantees.
>> +	 */
>> +	addr2 = rcu_access_pointer(*addr_p);	/* Load A */
> 
> Why rcu_access_pointer() instead of READ_ONCE()? Because you want to
> mark the head of address dependency?

Yes, the intent here is to mark the address dependency and provide
a publication guarantee similar to RCU pairing rcu_assign_pointer
and rcu_dereference. Do you see any reason why READ_ONCE() would
suffice here ?

Thanks,

Mathieu


> 
> Regards,
> Boqun
> 
>> +	/*
>> +	 * If @addr_p content has changed since the first load,
>> +	 * clear the hazard pointer and try again.
>> +	 */
>> +	if (!ptr_eq(addr2, addr)) {
>> +		WRITE_ONCE(slot->addr, NULL);
>> +		if (!addr2)
>> +			goto fail;
>> +		addr = addr2;
>> +		goto retry;
>> +	}
>> +	ctx.slot = slot;
>> +	ctx.addr = addr2;
>> +	return ctx;
>> +
>> +fail:
>> +	ctx.slot = NULL;
>> +	ctx.addr = NULL;
>> +	return ctx;
>> +}
>> +
> [...]

Boqun Feng Oct. 7, 2024, 1:47 p.m. UTC | #3

On Thu, Oct 03, 2024 at 09:30:53AM -0400, Mathieu Desnoyers wrote:
[...]
> > > +	/*
> > > +	 * Use RCU dereference without lockdep checks, because
> > > +	 * lockdep is not aware of HP guarantees.
> > > +	 */
> > > +	addr2 = rcu_access_pointer(*addr_p);	/* Load A */
> > 
> > Why rcu_access_pointer() instead of READ_ONCE()? Because you want to
> > mark the head of address dependency?
> 
> Yes, the intent here is to mark the address dependency and provide
> a publication guarantee similar to RCU pairing rcu_assign_pointer
> and rcu_dereference. Do you see any reason why READ_ONCE() would
> suffice here ?

READ_ONCE() also provides address dependencies. See the "DEPENDENCY
RELATIONS: data, addr, and ctrl" section in
tools/memory-model/Documentation/explanantion.txt.

Regards,
Boqun

> 
> Thanks,
> 
> Mathieu
>

Mathieu Desnoyers Oct. 7, 2024, 2:52 p.m. UTC | #4

On 2024-10-07 15:47, Boqun Feng wrote:
> On Thu, Oct 03, 2024 at 09:30:53AM -0400, Mathieu Desnoyers wrote:
> [...]
>>>> +	/*
>>>> +	 * Use RCU dereference without lockdep checks, because
>>>> +	 * lockdep is not aware of HP guarantees.
>>>> +	 */
>>>> +	addr2 = rcu_access_pointer(*addr_p);	/* Load A */
>>>
>>> Why rcu_access_pointer() instead of READ_ONCE()? Because you want to
>>> mark the head of address dependency?
>>
>> Yes, the intent here is to mark the address dependency and provide
>> a publication guarantee similar to RCU pairing rcu_assign_pointer
>> and rcu_dereference. Do you see any reason why READ_ONCE() would
>> suffice here ?
> 
> READ_ONCE() also provides address dependencies. See the "DEPENDENCY
> RELATIONS: data, addr, and ctrl" section in
> tools/memory-model/Documentation/explanantion.txt.

Fair point, so let's use READ_ONCE() then.

Thanks,

Mathieu

> 
> Regards,
> Boqun
> 
>>
>> Thanks,
>>
>> Mathieu
>>

diff --git a/include/linux/hp.h b/include/linux/hp.h
new file mode 100644
index 000000000000..929e8685a0fd
--- /dev/null
+++ b/include/linux/hp.h
@@ -0,0 +1,154 @@ 
+// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+//
+// SPDX-License-Identifier: LGPL-2.1-or-later
+
+#ifndef _LINUX_HP_H
+#define _LINUX_HP_H
+
+/*
+ * HP: Hazard Pointers
+ *
+ * This API provides existence guarantees of objects through hazard
+ * pointers.
+ *
+ * It uses a fixed number of hazard pointer slots (nr_cpus) across the
+ * entire system for each HP domain.
+ *
+ * Its main benefit over RCU is that it allows fast reclaim of
+ * HP-protected pointers without needing to wait for a grace period.
+ *
+ * It also allows the hazard pointer scan to call a user-defined callback
+ * to retire a hazard pointer slot immediately if needed. This callback
+ * may, for instance, issue an IPI to the relevant CPU.
+ *
+ * References:
+ *
+ * [1]: M. M. Michael, "Hazard pointers: safe memory reclamation for
+ *      lock-free objects," in IEEE Transactions on Parallel and
+ *      Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004
+ */
+
+#include <linux/rcupdate.h>
+
+/*
+ * Hazard pointer slot.
+ */
+struct hp_slot {
+	void *addr;
+};
+
+/*
+ * Hazard pointer context, returned by hp_use().
+ */
+struct hp_ctx {
+	struct hp_slot *slot;
+	void *addr;
+};
+
+/*
+ * hp_scan: Scan hazard pointer domain for @addr.
+ *
+ * Scan hazard pointer domain for @addr.
+ * If @retire_cb is NULL, wait to observe that each slot contains a value
+ * that differs from @addr.
+ * If @retire_cb is non-NULL, invoke @callback for each slot containing
+ * @addr.
+ */
+void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr,
+	     void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr));
+
+/* Get the hazard pointer context address (may be NULL). */
+static inline
+void *hp_ctx_addr(struct hp_ctx ctx)
+{
+	return ctx.addr;
+}
+
+/*
+ * hp_allocate: Allocate a hazard pointer.
+ *
+ * Allocate a hazard pointer slot for @addr. The object existence should
+ * be guaranteed by the caller.
+ *
+ * Returns a hazard pointer context.
+ */
+static inline
+struct hp_ctx hp_allocate(struct hp_slot __percpu *percpu_slots, void *addr)
+{
+	struct hp_slot *slot;
+	struct hp_ctx ctx;
+
+	if (!addr)
+		goto fail;
+	slot = this_cpu_ptr(percpu_slots);
+	/*
+	 * A single hazard pointer slot per CPU is available currently.
+	 * Other hazard pointer domains can eventually have a different
+	 * configuration.
+	 */
+	if (READ_ONCE(slot->addr))
+		goto fail;
+	WRITE_ONCE(slot->addr, addr);	/* Store B */
+	ctx.slot = slot;
+	ctx.addr = addr;
+	return ctx;
+
+fail:
+	ctx.slot = NULL;
+	ctx.addr = NULL;
+	return ctx;
+}
+
+/*
+ * hp_dereference_allocate: Dereference and allocate a hazard pointer.
+ *
+ * Returns a hazard pointer context.
+ */
+static inline
+struct hp_ctx hp_dereference_allocate(struct hp_slot __percpu *percpu_slots, void * const * addr_p)
+{
+	struct hp_slot *slot;
+	void *addr, *addr2;
+	struct hp_ctx ctx;
+
+	addr = READ_ONCE(*addr_p);
+retry:
+	ctx = hp_allocate(percpu_slots, addr);
+	if (!hp_ctx_addr(ctx))
+		goto fail;
+	/* Memory ordering: Store B before Load A. */
+	smp_mb();
+	/*
+	 * Use RCU dereference without lockdep checks, because
+	 * lockdep is not aware of HP guarantees.
+	 */
+	addr2 = rcu_access_pointer(*addr_p);	/* Load A */
+	/*
+	 * If @addr_p content has changed since the first load,
+	 * clear the hazard pointer and try again.
+	 */
+	if (!ptr_eq(addr2, addr)) {
+		WRITE_ONCE(slot->addr, NULL);
+		if (!addr2)
+			goto fail;
+		addr = addr2;
+		goto retry;
+	}
+	ctx.slot = slot;
+	ctx.addr = addr2;
+	return ctx;
+
+fail:
+	ctx.slot = NULL;
+	ctx.addr = NULL;
+	return ctx;
+}
+
+/* Retire the hazard pointer in @ctx. */
+static inline
+void hp_retire(const struct hp_ctx ctx)
+{
+	smp_store_release(&ctx.slot->addr, NULL);
+}
+
+#endif /* _LINUX_HP_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 3c13240dfc9f..ec16de96fa80 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -7,7 +7,7 @@  obj-y     = fork.o exec_domain.o panic.o \
 	    cpu.o exit.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o user.o \
 	    signal.o sys.o umh.o workqueue.o pid.o task_work.o \
-	    extable.o params.o \
+	    extable.o params.o hp.o \
 	    kthread.o sys_ni.o nsproxy.o \
 	    notifier.o ksysfs.o cred.o reboot.o \
 	    async.o range.o smpboot.o ucount.o regset.o ksyms_common.o
diff --git a/kernel/hp.c b/kernel/hp.c
new file mode 100644
index 000000000000..b2447bf15300
--- /dev/null
+++ b/kernel/hp.c
@@ -0,0 +1,46 @@ 
+// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+//
+// SPDX-License-Identifier: LGPL-2.1-or-later
+
+/*
+ * HP: Hazard Pointers
+ */
+
+#include <linux/hp.h>
+#include <linux/percpu.h>
+
+/*
+ * hp_scan: Scan hazard pointer domain for @addr.
+ *
+ * Scan hazard pointer domain for @addr.
+ * If @retire_cb is non-NULL, invoke @callback for each slot containing
+ * @addr.
+ * Wait to observe that each slot contains a value that differs from
+ * @addr before returning.
+ */
+void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr,
+	     void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr))
+{
+	int cpu;
+
+	/*
+	 * Store A precedes hp_scan(): it unpublishes addr (sets it to
+	 * NULL or to a different value), and thus hides it from hazard
+	 * pointer readers.
+	 */
+
+	if (!addr)
+		return;
+	/* Memory ordering: Store A before Load B. */
+	smp_mb();
+	/* Scan all CPUs slots. */
+	for_each_possible_cpu(cpu) {
+		struct hp_slot *slot = per_cpu_ptr(percpu_slots, cpu);
+
+		if (retire_cb && smp_load_acquire(&slot->addr) == addr)	/* Load B */
+			retire_cb(cpu, slot, addr);
+		/* Busy-wait if node is found. */
+		while ((smp_load_acquire(&slot->addr)) == addr)	/* Load B */
+			cpu_relax();
+	}
+}

[RFC,3/4] hp: Implement Hazard Pointers

Commit Message

Comments

Patch