From patchwork Tue Sep 17 14:33:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boqun Feng X-Patchwork-Id: 13806221 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 126D9C3601A for ; Tue, 17 Sep 2024 14:35:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6DD176B0088; Tue, 17 Sep 2024 10:35:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6CCD76B008A; Tue, 17 Sep 2024 10:35:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 46AF76B0089; Tue, 17 Sep 2024 10:35:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 1FBD86B0085 for ; Tue, 17 Sep 2024 10:35:01 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id C1356160506 for ; Tue, 17 Sep 2024 14:35:00 +0000 (UTC) X-FDA: 82574477160.07.DD31BB4 Received: from mail-qk1-f181.google.com (mail-qk1-f181.google.com [209.85.222.181]) by imf23.hostedemail.com (Postfix) with ESMTP id 8CF39140022 for ; Tue, 17 Sep 2024 14:34:57 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=K7HlJhHc; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf23.hostedemail.com: domain of boqun.feng@gmail.com designates 209.85.222.181 as permitted sender) smtp.mailfrom=boqun.feng@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1726583666; a=rsa-sha256; cv=none; b=z5zEK9yZf5/ln5gxr4Fe9ltCRTgIh1hLn6BZfFw8rhgGHL998tG0IBEUFlNNyOXLCYLV/Q ghN70HieaNEuT2wuzmRMbMculziIBhp7FT1V+Ic/BCEnbGpcVzVc5WKFDQ+4NoJKP135WT 52O2/QlFWEBv/GQZgAYlI++RTHa69mc= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=K7HlJhHc; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf23.hostedemail.com: domain of boqun.feng@gmail.com designates 209.85.222.181 as permitted sender) smtp.mailfrom=boqun.feng@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1726583666; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=zZq7RPD+uH42F46mDIKK9Ztk0MZzott+JAYc6fmxJrg=; b=rnuNzKnZKTImddUVGQsCNw2ijydoRzuoyZaJ+8kOd7Qg/8JWdidI7xxcHIKXRflwOMMQPR WMLBsXi1S6h9apQxc1U4HAS5krUb5ZMXlH14aRJarkV+XA6KFHvAwMzMceM07pEyQ/O3PV 3os3P9RHmFU8om9iG9Pu/hjcvARAUgU= Received: by mail-qk1-f181.google.com with SMTP id af79cd13be357-7a9a7bea3cfso367420585a.1 for ; Tue, 17 Sep 2024 07:34:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1726583696; x=1727188496; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:feedback-id:from:to:cc:subject :date:message-id:reply-to; bh=zZq7RPD+uH42F46mDIKK9Ztk0MZzott+JAYc6fmxJrg=; b=K7HlJhHcFEww5UVJ2bXR9dldQYYD9kJZLvMKdmzvF7Fkbxzr7hh1AbGjOg38u/qisO yyOdjORBhA5FOXqLS+ldfb2Bf3I2Tw+mBbD2SJXyzXfErhb3mleHAEKPN8kN56Gt43Va Zl8qd6xTxdLjL+YlnkagUBIBz1KM/kgZ07drNNYcb1bYFmTgefQdBmmh6RNHePrU43kF 0wCIOl0ppTkU0++aE/im+JSrrqtCTN8nSwTP8aNwp7xA6JgwLr5pgRAl4cSjy6R0GRTg 3S1i6t1AvkgTD3dl7BmQUdZn1cebw6pv6OekMnoANDAHRDZPb1VIF44TLi/OpkJeG2Vq ZR5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1726583696; x=1727188496; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:feedback-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=zZq7RPD+uH42F46mDIKK9Ztk0MZzott+JAYc6fmxJrg=; b=SHTCO8pnjeI99gtbfRfMDPhbE/1IrXELO9keRWgebGpt2UFuXJtibI/KxoHLqfpP4B 4INKdx9IFsF/SDmz61/jlgq+2yaG6axyr98EfXYqUYrfoXgwEe3FdMKQ7PBJi+jkgROG 6/r1H3/NGMXMWIApTIAdj5J+nVZ9KMIAOSmIQ/1MSMOjZjCj89XkjR6Jb0i5P8hu4Toa E/nuY3a6abv6r/cQUn8LxJkxKs4yj3x65YQL0RJaEfF6FwHAipv1cEZBQrB6RHnmKvxD 8Q6oC8V5xQMIohBV97/2dE0isjaq6NT/U7IFKYZOefTFQ9GSKADIHIyky7q1XDzPo7am HSgA== X-Forwarded-Encrypted: i=1; AJvYcCU/1oHEmR5NYOe4UzLgob6Af+d/o3PV7wqzOqbZqFo7CFQLDmjbjUmocos93bk4qtJ6ETMPTqfxEQ==@kvack.org X-Gm-Message-State: AOJu0Yw69YeZTPWxg+t2I9AR/d3rpKgAPtnVqzsp5NFwCf4K8fNAnhMv 9jrwVxvXHUVigNbZ56pzuLp1qCvcDBAEmVXlZPyQG47/Ux1hxvfofV6wpbaE X-Google-Smtp-Source: AGHT+IF2pmEI6yhOYlc9yJ0coEnr4ACImy1rxYaWM0jfd3MyvAsabH8ssbs86MX7XRNikWoU3iGneA== X-Received: by 2002:a05:6214:498b:b0:6c5:72c0:728b with SMTP id 6a1803df08f44-6c573572609mr285955006d6.24.1726583696247; Tue, 17 Sep 2024 07:34:56 -0700 (PDT) Received: from fauth1-smtp.messagingengine.com (fauth1-smtp.messagingengine.com. [103.168.172.200]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6c58c7aefb7sm35174946d6.124.2024.09.17.07.34.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Sep 2024 07:34:55 -0700 (PDT) Received: from phl-compute-12.internal (phl-compute-12.phl.internal [10.202.2.52]) by mailfauth.phl.internal (Postfix) with ESMTP id 34B341200070; Tue, 17 Sep 2024 10:34:55 -0400 (EDT) Received: from phl-mailfrontend-01 ([10.202.2.162]) by phl-compute-12.internal (MEProxy); Tue, 17 Sep 2024 10:34:55 -0400 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeftddrudekjedgjeelucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggvpdfu rfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnh htshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredttden ucfhrhhomhepuehoqhhunhcuhfgvnhhguceosghoqhhunhdrfhgvnhhgsehgmhgrihhlrd gtohhmqeenucggtffrrghtthgvrhhnpeegleejiedthedvheeggfejveefjeejkefgveff ieeujefhueeigfegueehgeeggfenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmh epmhgrihhlfhhrohhmpegsohhquhhnodhmvghsmhhtphgruhhthhhpvghrshhonhgrlhhi thihqdeiledvgeehtdeigedqudejjeekheehhedvqdgsohhquhhnrdhfvghngheppehgmh grihhlrdgtohhmsehfihigmhgvrdhnrghmvgdpnhgspghrtghpthhtohepvdeipdhmohgu vgepshhmthhpohhuthdprhgtphhtthhopehlihhnuhigqdhkvghrnhgvlhesvhhgvghrrd hkvghrnhgvlhdrohhrghdprhgtphhtthhopehrtghusehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqmhhmsehkvhgrtghkrdhorhhgpdhrtghpthhtoh eplhhkmhhmsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtohepphgruhhlmhgt kheskhgvrhhnvghlrdhorhhgpdhrtghpthhtohepfhhrvgguvghrihgtsehkvghrnhgvlh drohhrghdprhgtphhtthhopehnvggvrhgrjhdruhhprgguhhihrgihsehkvghrnhgvlhdr ohhrghdprhgtphhtthhopehjohgvlhesjhhovghlfhgvrhhnrghnuggvshdrohhrghdprh gtphhtthhopehjohhshhesjhhoshhhthhrihhplhgvthhtrdhorhhg X-ME-Proxy: Feedback-ID: iad51458e:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 17 Sep 2024 10:34:54 -0400 (EDT) From: Boqun Feng To: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@vger.kernel.org Cc: "Paul E. McKenney" , Frederic Weisbecker , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , Mark Rutland , Thomas Gleixner , Kent Overstreet , Linus Torvalds , Vlastimil Babka , maged.michael@gmail.com, Neeraj Upadhyay Subject: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers Date: Tue, 17 Sep 2024 07:33:59 -0700 Message-ID: <20240917143402.930114-2-boqun.feng@gmail.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240917143402.930114-1-boqun.feng@gmail.com> References: <20240917143402.930114-1-boqun.feng@gmail.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Queue-Id: 8CF39140022 X-Rspamd-Server: rspam01 X-Stat-Signature: ydfftcfzfsahk3uyyd4mbyqf6qnqjumt X-HE-Tag: 1726583697-638258 X-HE-Meta: U2FsdGVkX1+vZoG/OwJs9w6jpy0Qj7YzQqCELMvkc0Dib4gqDuLhMm4lh694k6Z6B2jhVsKkugKP7tiTxijmZ+h0BqETozPR1sZxvENg3U5WptgNzNv5jFm2f4tj04gj3yUnbDU/OIv7ngO8d3bd0b4LsJNjqdQUprv9r0GlwAIir5DqvIxdT7qZMg9qdOeuCTToid2ws83uOumSvmhDZcySzilWCoZQkUdVqpu+ExZUNyFmAGVlVqvzN/kgc1aJzFTW6k5xNtOq0UuKWbC8kSCw5UVhBEDDqWKNrDe+WSBC5MtV6tU3DSKSEMbyB8y+nnL+5aKf1EwEnawYkpEIhx1Y6513EVKzt9rSFCp7UblHMEVO5I4eQjyAe1eI+3wb8Ul3zSBIGsC5Osv7k/U93oYfuH1B4jy3qmzPaSc6weoDvvTh38nDMegIkJvEnE9em2naO17zX4Y9cc0swkHpacnLyAhSw6idLWFTqFSMW8YxmRNkEBYqEiTI8fX25f0Z71pmxJGNNFi3bSyz9oXiwE8XG3bkYZgDDyu1mwAM72I927H1kdieooHGFUXjTytd0P6gKTsHQN0xPuW96lj2foBqLdFtxIh1l/pms9KS5bKsoSZa7HeLWUUQ2fwX5QD2HkYra2CrmzG25SlAyeDu3oex90P4rLaksGrzNJEBymYdzhihyWjk+YjPibfki+JH/l3kJ+twDph8iXSaAGbtOSLeAhRn9D+/sfEEY3I4TNgnPG7FTSD4WN9lUmr9pp19spUkcC723RrVrU9ViShkUR2lQPeQmk3aaGsVcUwh/d/6IwMBxvAGvntUQhLrnlg4tPdcR0RZe/i/p3ADyhpljsN9vAMToahWFicPpuyQ99zprr3XPiTzcxcUPhAhmpknZvMxvorD+WzNy1tqvnSEVvIyWtNNCfjtb+InkQG/lFLP4UvG3jRY/At7hs80rhR43FwqfAUd+k3XV7fqkbM 1pMhsa6K QId3xhSADaXPyTfp+sSuUv946FbLFYCiY4jQUEC5ZNyk3rMYm+JYhOVPQGEedP9NeS9P7F+BNGEhhH7oidpe5Su6JrWchqZFFERk2QoFLBuDPOJb2YIH2Tz2QARPHKw/VMXfC7GQAVr5TY9Gu63juivQv9X2L8JeEzMv8hsKYZfBw1hKOqnp3vtBkUL62n2iyDHuybklvkG3fkZljJI+cWKWtgXxqDxP77Mdn4MuL7X0olq5wA6t4S+0xEjG2MY1zStSiULjpBievL8AjiwLEIsB3IpzwbHWEnZuULJBD5vcxA2dHSeN/1qul2qR6tg/7vG8sSrjYvOgHMp7gNMLH7eVLvAIBn6Mm6Z99kSUrrDuWgen+/iV4HCYoU7dhXrjz2vizQwFyZ3rOOkr0J+I31WcRG00lapUE+4ggOILq/WhmYnz9X7hdAOTnl9Y3lw4vocnKZWwSqn3XajASyx/0mbNzL+r097oGUvLu/rQsD6YWz+/pM8iGPdagBKDvhQv7iZHXpZNdSfSIDhg= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hazard pointers [1] provide a way to dynamically distribute refcounting and can be used to improve the scalability of refcounting without significant space cost. Hazard pointers are similar to RCU: they build the synchronization between two part, readers and updaters. Readers are refcount users, they acquire and release refcount. Updaters cleanup objects when there are no reader referencing them (via call_hazptr()). The difference is instead of waiting for a grace period, hazard pointers can free an object as long as there is no one referencing the object. This means for a particular workload, hazard pointers may have smaller memory footprint due to fewer pending callbacks. The synchronization between readers and updaters is built around "hazard pointer slots": a slot readers can use to store a pointer value. Reader side protection: 1. Read the value of a pointer to the target data element. 2. Store it to a hazard pointer slot. 3. Enforce full ordering (e.g. smp_mb()). 4. Re-read the original pointer, reset the slot and retry if the value changed. 5. Otherwise, the continued existence of the target data element is guaranteed. Updater side check: 1. Unpublish the target data element (e.g. setting the pointer value to NULL). 2. Enforce full ordering. 3. Read the value from a hazard pointer slot. 4. If the value doesn't match the target data element, then this slot doesn't represent a reference to it. 5. Otherwise, updater needs to re-check (step 3). To distribute the accesses of hazptr slots from different contexts, hazptr_context is introduced. Users need to define/allocate their own hazptr_context to allocate hazard pointer slots. For the updater side to confirm no existing reference, it needs to scan all the possible slots, and to speed up this process, hazptr_context also contains an rbtree node for each slot so that updater can cache the reader-scan result in an rbtree. The rbtree nodes are pre-allocated in this way to prevent "allocate memory to free memory" in extreme cases. [1]: M. M. Michael, "Hazard pointers: safe memory reclamation for lock-free objects," in IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004 Co-developed-by: Paul E. McKenney Signed-off-by: Paul E. McKenney Co-developed-by: Neeraj Upadhyay Signed-off-by: Neeraj Upadhyay Signed-off-by: Boqun Feng --- include/linux/hazptr.h | 83 ++++++++ kernel/Makefile | 1 + kernel/hazptr.c | 463 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 547 insertions(+) create mode 100644 include/linux/hazptr.h create mode 100644 kernel/hazptr.c diff --git a/include/linux/hazptr.h b/include/linux/hazptr.h new file mode 100644 index 000000000000..4548ca8c75eb --- /dev/null +++ b/include/linux/hazptr.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_HAZPTR_H +#define _LINUX_HAZPTR_H + +#include +#include +#include + +/* Hazard pointer slot. */ +typedef void* hazptr_t; + +#define HAZPTR_SLOT_PER_CTX 8 + +struct hazptr_slot_snap { + struct rb_node node; + hazptr_t slot; +}; + +/* + * A set of hazard pointer slots for a context. + * + * The context can be a task, CPU or reader (depends on the use case). It may + * be allocated statically or dynamically. It can only be used after calling + * init_hazptr_context(), and users are also responsible to call + * cleaup_hazptr_context() when it's not used any more. + */ +struct hazptr_context { + // The lock of the percpu context lists. + spinlock_t *lock; + + struct list_head list; + struct hazptr_slot_snap snaps[HAZPTR_SLOT_PER_CTX]; + ____cacheline_aligned hazptr_t slots[HAZPTR_SLOT_PER_CTX]; +}; + +void init_hazptr_context(struct hazptr_context *hzcp); +void cleanup_hazptr_context(struct hazptr_context *hzcp); +hazptr_t *hazptr_alloc(struct hazptr_context *hzcp); +void hazptr_free(struct hazptr_context *hzcp, hazptr_t *hzp); + +#define hazptr_tryprotect(hzp, gp, field) (typeof(gp))__hazptr_tryprotect(hzp, (void **)&(gp), offsetof(typeof(*gp), field)) +#define hazptr_protect(hzp, gp, field) ({ \ + typeof(gp) ___p; \ + \ + ___p = hazptr_tryprotect(hzp, gp, field); \ + BUG_ON(!___p); \ + ___p; \ +}) + +static inline void *__hazptr_tryprotect(hazptr_t *hzp, + void *const *p, + unsigned long head_offset) +{ + void *ptr; + struct callback_head *head; + + ptr = READ_ONCE(*p); + + if (ptr == NULL) + return NULL; + + head = (struct callback_head *)(ptr + head_offset); + + WRITE_ONCE(*hzp, head); + smp_mb(); + + ptr = READ_ONCE(*p); // read again + + if (ptr + head_offset != head) { // pointer changed + WRITE_ONCE(*hzp, NULL); // reset hazard pointer + return NULL; + } else + return ptr; +} + +static inline void hazptr_clear(hazptr_t *hzp) +{ + /* Pairs with smp_load_acquire() in reader scan. */ + smp_store_release(hzp, NULL); +} + +void call_hazptr(struct callback_head *head, rcu_callback_t func); +#endif diff --git a/kernel/Makefile b/kernel/Makefile index 3c13240dfc9f..7927264b9870 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -50,6 +50,7 @@ obj-y += rcu/ obj-y += livepatch/ obj-y += dma/ obj-y += entry/ +obj-y += hazptr.o obj-$(CONFIG_MODULES) += module/ obj-$(CONFIG_KCMP) += kcmp.o diff --git a/kernel/hazptr.c b/kernel/hazptr.c new file mode 100644 index 000000000000..f22ccc2a4a62 --- /dev/null +++ b/kernel/hazptr.c @@ -0,0 +1,463 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include +#include +#include +#include +#include + +#define HAZPTR_UNUSED (1ul) + +/* Per-CPU data for hazard pointer management. */ +struct hazptr_percpu { + spinlock_t ctx_lock; /* hazptr context list lock */ + struct list_head ctx_list; /* hazptr context list */ + spinlock_t callback_lock; /* Per-CPU callback queue lock */ + struct callback_head *queued; /* Per-CPU callback queue */ + struct callback_head **tail; +}; + +/* + * Per-CPU data contains context lists and callbacks, which are maintained in a + * per-CPU maner to reduce potential contention. This means a global scan (for + * readers or callbacks) has to take each CPU's lock, but it's less problematic + * because that's a slowpath. + */ +DEFINE_PER_CPU(struct hazptr_percpu, hzpcpu); + +/* A RBTree that stores the reader scan result of all hazptr slot */ +struct hazptr_reader_tree { + spinlock_t lock; + struct rb_root root; +}; + +static void init_hazptr_reader_tree(struct hazptr_reader_tree *tree) +{ + spin_lock_init(&tree->lock); + tree->root = RB_ROOT; +} + +static bool is_null_or_unused(hazptr_t slot) +{ + return slot == NULL || ((unsigned long)slot) == HAZPTR_UNUSED; +} + +static int hazptr_node_cmp(const void *key, const struct rb_node *curr) +{ + unsigned long slot = (unsigned long)key; + struct hazptr_slot_snap *curr_snap = container_of(curr, struct hazptr_slot_snap, node); + unsigned long curr_slot = (unsigned long)(curr_snap->slot); + + if (slot < curr_slot) + return -1; + else if (slot > curr_slot) + return 1; + else + return 0; +} + +static bool hazptr_node_less(struct rb_node *new, const struct rb_node *curr) +{ + struct hazptr_slot_snap *new_snap = container_of(new, struct hazptr_slot_snap, node); + + return hazptr_node_cmp((void *)new_snap->slot, curr) < 0; +} + +/* Add the snapshot into a search tree. tree->lock must be held. */ +static inline void reader_add_locked(struct hazptr_reader_tree *tree, + struct hazptr_slot_snap *snap) +{ + lockdep_assert_held(&tree->lock); + BUG_ON(is_null_or_unused(snap->slot)); + + rb_add(&snap->node, &tree->root, hazptr_node_less); +} + +static inline void reader_add(struct hazptr_reader_tree *tree, + struct hazptr_slot_snap *snap) +{ + guard(spinlock_irqsave)(&tree->lock); + + reader_add_locked(tree, snap); +} + +/* Delete the snapshot from a search tree. tree->lock must be held. */ +static inline void reader_del_locked(struct hazptr_reader_tree *tree, + struct hazptr_slot_snap *snap) +{ + lockdep_assert_held(&tree->lock); + + rb_erase(&snap->node, &tree->root); +} + +static inline void reader_del(struct hazptr_reader_tree *tree, + struct hazptr_slot_snap *snap) +{ + guard(spinlock_irqsave)(&tree->lock); + + reader_del_locked(tree, snap); +} + +/* Find whether a read exists. tree->lock must be held. */ +static inline bool reader_exist_locked(struct hazptr_reader_tree *tree, + unsigned long slot) +{ + lockdep_assert_held(&tree->lock); + + return !!rb_find((void *)slot, &tree->root, hazptr_node_cmp); +} + +static inline bool reader_exist(struct hazptr_reader_tree *tree, + unsigned long slot) +{ + guard(spinlock_irqsave)(&tree->lock); + + return reader_exist_locked(tree, slot); +} + +/* + * Scan the readers from one hazptr context and update the global readers tree. + * + * Must be called with hzcp->lock held. + */ +static void hazptr_context_snap_readers_locked(struct hazptr_reader_tree *tree, + struct hazptr_context *hzcp) +{ + lockdep_assert_held(hzcp->lock); + + for (int i = 0; i < HAZPTR_SLOT_PER_CTX; i++) { + /* + * Pairs with smp_store_release() in hazptr_{clear,free}(). + * + * Ensure + * + * + * + * [access protected pointers] + * hazptr_clear(); + * smp_store_release() + * // in reader scan. + * smp_load_acquire(); // is null or unused. + * [run callbacks] // all accesses from + * // reader must be + * // observed. + */ + hazptr_t val = smp_load_acquire(&hzcp->slots[i]); + + if (!is_null_or_unused(val)) { + struct hazptr_slot_snap *snap = &hzcp->snaps[i]; + + // Already in the tree, need to remove first. + if (!is_null_or_unused(snap->slot)) { + reader_del(tree, snap); + } + snap->slot = val; + reader_add(tree, snap); + } + } +} + +/* + * Initialize a hazptr context. + * + * Must be called before using the context for hazptr allocation. + */ +void init_hazptr_context(struct hazptr_context *hzcp) +{ + struct hazptr_percpu *pcpu = this_cpu_ptr(&hzpcpu); + + for (int i = 0; i < HAZPTR_SLOT_PER_CTX; i++) { + hzcp->slots[i] = (hazptr_t)HAZPTR_UNUSED; + hzcp->snaps[i].slot = (hazptr_t)HAZPTR_UNUSED; + } + + guard(spinlock_irqsave)(&pcpu->ctx_lock); + list_add(&hzcp->list, &pcpu->ctx_list); + hzcp->lock = &pcpu->ctx_lock; +} + +struct hazptr_struct { + struct work_struct work; + bool scheduled; + + // list of callbacks, we move percpu queued callbacks into the global + // queued list in workqueue function. + spinlock_t callback_lock; + struct callback_head *queued; + struct callback_head **tail; + + struct hazptr_reader_tree readers; +}; + +static struct hazptr_struct hazptr_struct; + +/* + * Clean up hazptr context. + * + * Must call before freeing the context. This function also removes any + * reference held by the hazard pointer slots in the context, even + * hazptr_clear() or hazptr_free() is not called previously. + */ +void cleanup_hazptr_context(struct hazptr_context *hzcp) +{ + if (hzcp->lock) { + scoped_guard(spinlock_irqsave, hzcp->lock) { + list_del(&hzcp->list); + hzcp->lock = NULL; + } + + for (int i = 0; i < HAZPTR_SLOT_PER_CTX; i++) { + struct hazptr_slot_snap *snap = &hzcp->snaps[i]; + + if (!is_null_or_unused(snap->slot)) + reader_del(&hazptr_struct.readers, snap); + } + } +} + +/* + * Allocate a hazptr slot from a hazptr_context. + * + * Return: NULL means fail to allocate, otherwise the address of the allocated + * slot. + */ +hazptr_t *hazptr_alloc(struct hazptr_context *hzcp) +{ + unsigned long unused; + + for (int i = 0; i < HAZPTR_SLOT_PER_CTX; i++) { + if (((unsigned long)READ_ONCE(hzcp->slots[i])) == HAZPTR_UNUSED) { + unused = HAZPTR_UNUSED; + + /* + * rwm-sequence is relied on here. + * + * This is needed since in case of a previous reader: + * + * + * [access protected pointers] + * hazptr_free(): + * smp_store_release(); // hzptr == UNUSED + * hazptr_alloc(): + * try_cmpxchg_relaxed(); // hzptr == NULL + * + * // in reader scan + * smp_load_acquire(); // is null + * [run callbacks] + * + * Because of the rwm-sequence of release operations, + * when callbacks are run, accesses from reader 1 must + * be already observed by the updater. + */ + if (try_cmpxchg_relaxed(&hzcp->slots[i], &unused, NULL)) { + return (hazptr_t *)&hzcp->slots[i]; + } + } + } + + return NULL; +} + +/* Free a hazptr slot. */ +void hazptr_free(struct hazptr_context *hzcp, hazptr_t *hzptr) +{ + WARN_ON(((unsigned long)*hzptr) == HAZPTR_UNUSED); + + /* Pairs with smp_load_acquire() in reader scan */ + smp_store_release(hzptr, (void *)HAZPTR_UNUSED); +} + +/* Scan all possible readers, and update the global reader tree. */ +static void check_readers(struct hazptr_struct *hzst) +{ + int cpu; + + for_each_possible_cpu(cpu) { + struct hazptr_percpu *pcpu = per_cpu_ptr(&hzpcpu, cpu); + struct hazptr_context *ctx; + + guard(spinlock_irqsave)(&pcpu->ctx_lock); + list_for_each_entry(ctx, &pcpu->ctx_list, list) + hazptr_context_snap_readers_locked(&hzst->readers, ctx); + } +} + +/* + * Start the background work for hazptr callback handling if not started. + * + * Must be called with hazptr_struct lock held. + */ +static void kick_hazptr_work(void) +{ + if (hazptr_struct.scheduled) + return; + + queue_work(system_wq, &hazptr_struct.work); + hazptr_struct.scheduled = true; +} + +/* + * Check which callbacks are ready to be called. + * + * Return: a callback list that no reader is referencing the corresponding + * objects. + */ +static struct callback_head *do_hazptr(struct hazptr_struct *hzst) +{ + struct callback_head *tmp, **curr; + struct callback_head *todo = NULL, **todo_tail = &todo; + + // Currently, the lock is unnecessary, but maybe we will have multiple + // work_structs sharing the same callback list in the future; + guard(spinlock_irqsave)(&hzst->callback_lock); + + curr = &hzst->queued; + + while ((tmp = *curr)) { + // No reader, we can free the object. + if (!reader_exist(&hzst->readers, (unsigned long)tmp)) { + // Add tmp into todo. + *todo_tail = tmp; + todo_tail = &tmp->next; + + // Delete tmp from ->queued and move to the next entry. + *curr = tmp->next; + } else + curr = &tmp->next; + } + + hzst->tail = curr; + + // Keep checking if callback list is not empty. + if (hzst->queued) + kick_hazptr_work(); + + *todo_tail = NULL; + + return todo; +} + +static void hazptr_work_func(struct work_struct *work) +{ + struct hazptr_struct *hzst = container_of(work, struct hazptr_struct, work); + struct callback_head *todo; + + // Advance callbacks from hzpcpu to hzst + scoped_guard(spinlock_irqsave, &hzst->callback_lock) { + int cpu; + + hzst->scheduled = false; + for_each_possible_cpu(cpu) { + struct hazptr_percpu *pcpu = per_cpu_ptr(&hzpcpu, cpu); + + guard(spinlock)(&pcpu->callback_lock); + + if (pcpu->queued) { + *(hzst->tail) = pcpu->queued; + hzst->tail = pcpu->tail; + pcpu->queued = NULL; + pcpu->tail = &pcpu->queued; + } + } + } + + // Pairs with the smp_mb() on the reader side. This guarantees that if + // the hazptr work picked up the callback from an updater and the + // updater set the global pointer to NULL before enqueue the callback, + // the hazptr work must observe a reader that protects the hazptr before + // the updater. + // + // + // ptr = READ_ONCE(gp); + // WRITE_ONCE(*hazptr, ptr); + // smp_mb(); // => ->strong-fence + // tofree = gp; + // ptr = READ_ONCE(gp); // re-read, gp is not NULL + // // => ->fre + // WRITE_ONCE(gp, NULL); + // call_hazptr(gp, ...): + // lock(->callback_lock); + // [queued the callback] + // unlock(->callback_lock);// => ->po-unlock-lock-po + // lock(->callback_lock); + // [move from hzpcpu to hzst] + // + // smp_mb(); => ->strong-fence + // ptr = READ_ONCE(*hazptr); + // // ptr == gp, otherwise => ->fre + // + // If ptr != gp, it means there exists a circle with the following + // memory ordering relationships: + // ->strong-fence ->fre ->po-unlock-lock-po ->strong-fence ->fre + // => (due to the definition of prop) + // ->strong-fence ->prop ->strong-fence ->hb ->prop + // => (rotate the circle) + // ->prop ->strong-fence ->prop ->strong-fence ->hb + // => (due to the definition of pb) + // ->pb ->pb + // but pb is acyclic according to LKMM, so this cannot happen. + smp_mb(); + check_readers(hzst); + + todo = do_hazptr(hzst); + + while (todo) { + struct callback_head *next = todo->next; + void (*func)(struct callback_head *) = todo->func; + + func(todo); + + todo = next; + } +} + +/* + * Put @head into the cleanup callback queue. + * + * @func(@head) will be called after no one is referencing the object. Caller + * must ensure the object has been unpublished before calling this. + */ +void call_hazptr(struct callback_head *head, rcu_callback_t func) +{ + struct hazptr_percpu *pcpu = this_cpu_ptr(&hzpcpu); + + head->func = func; + head->next = NULL; + + scoped_guard(spinlock_irqsave, &pcpu->callback_lock) { + *(pcpu->tail) = head; + pcpu->tail = &head->next; + } + + guard(spinlock_irqsave)(&hazptr_struct.callback_lock); + kick_hazptr_work(); +} + +static int init_hazptr_struct(void) +{ + int cpu; + + INIT_WORK(&hazptr_struct.work, hazptr_work_func); + + spin_lock_init(&hazptr_struct.callback_lock); + hazptr_struct.queued = NULL; + hazptr_struct.tail = &hazptr_struct.queued; + + for_each_possible_cpu(cpu) { + struct hazptr_percpu *pcpu = per_cpu_ptr(&hzpcpu, cpu); + + spin_lock_init(&pcpu->ctx_lock); + INIT_LIST_HEAD(&pcpu->ctx_list); + + spin_lock_init(&pcpu->callback_lock); + pcpu->queued = NULL; + pcpu->tail = &pcpu->queued; + + } + + init_hazptr_reader_tree(&hazptr_struct.readers); + + return 0; +} + +early_initcall(init_hazptr_struct);