From patchwork Mon Sep 16 05:08:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Neeraj Upadhyay X-Patchwork-Id: 13805010 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF2C3C3ABCB for ; Mon, 16 Sep 2024 05:09:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 483916B0096; Mon, 16 Sep 2024 01:09:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 432C36B0098; Mon, 16 Sep 2024 01:09:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2AC656B0099; Mon, 16 Sep 2024 01:09:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 077A66B0096 for ; Mon, 16 Sep 2024 01:09:46 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 712A5160109 for ; Mon, 16 Sep 2024 05:09:45 +0000 (UTC) X-FDA: 82569423930.07.4056639 Received: from NAM02-BN1-obe.outbound.protection.outlook.com (mail-bn1nam02on2050.outbound.protection.outlook.com [40.107.212.50]) by imf08.hostedemail.com (Postfix) with ESMTP id 7D393160018 for ; Mon, 16 Sep 2024 05:09:42 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=amd.com header.s=selector1 header.b=p0BqNEe5; arc=pass ("microsoft.com:s=arcselector10001:i=1"); spf=pass (imf08.hostedemail.com: domain of Neeraj.Upadhyay@amd.com designates 40.107.212.50 as permitted sender) smtp.mailfrom=Neeraj.Upadhyay@amd.com; dmarc=pass (policy=quarantine) header.from=amd.com ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1726463274; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=r604xJvf0kpMD9SUu6dvCf+ZojbB7Oe7MWbhODd9r8k=; b=68SFiFpnAy6FxDGuJe5Hk493a+iT7HETkGZjJnp/JT1NpwRGNAaXN6MW9zqmuwEj4GXqLu HB8P5SMe+l+5hDhNt5ZhIYH6WnRmTSYXvk3yNEy8bpeRnamCWD532FILuYHY5NJcVbNRu2 e1dVBlrIJ7CWWhVABOO4jHyvomo0oJ8= ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1726463274; a=rsa-sha256; cv=pass; b=GmwJl0OupGL1NrSSils1sxOymYcWD0oOEfpHA6reTEQg8pgcM4X+9M9bvV6v6VhFUwIICW qIUeuco8gWZLWONt+3h/HudptcavNiR206OHKNlOZTC82zQMDNE89+vrb/bMxUdM9gZeMM 8kYtUNmMptdWr5zdW2PD8g4AJVGkXiI= ARC-Authentication-Results: i=2; imf08.hostedemail.com; dkim=pass header.d=amd.com header.s=selector1 header.b=p0BqNEe5; arc=pass ("microsoft.com:s=arcselector10001:i=1"); spf=pass (imf08.hostedemail.com: domain of Neeraj.Upadhyay@amd.com designates 40.107.212.50 as permitted sender) smtp.mailfrom=Neeraj.Upadhyay@amd.com; dmarc=pass (policy=quarantine) header.from=amd.com ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=wqUtnxMrjx6LqyfywxM5se9oeFvddKQ0FLQ9kHtdW+lwIJ7I0XEl/FlKxmOwyi+gg3YhjWivBBnF34frJPez9qbnCx/tBLuTn8hybsz6LV9Y2RC7uFUu8+dxjcKxflvsIL92ZoJulLL2Qq/hb57TGM5WU4K3cZ1PvklasJEnh7bkVrBcHIM/pDKMd0ElC8YD9xxS76ES2wQ4la5/x7EZHZzGxPJaqBoc/+a3kDyRyQ4w5g2Y01Og+OA4ZfiNB+G19dAwtUSqnTPn+pvSheizmxQFBj3lBwYGB156TqmDAPyS1wTboEp1NS47XhImgL12qrSj1VOs720u/yOcBsRPXA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=r604xJvf0kpMD9SUu6dvCf+ZojbB7Oe7MWbhODd9r8k=; b=oBEuog5TwBBe6BHCL/G2+YlSmsR40Os+MQLHvKmCDfgQ8Th7DRyYhuCkbvvfUy0vLI79pz+gpg2gdtGzZytewdKnMYMU9+zhSE/B60Jo2vcYgU6rdC7CY0IcWaRXKovVFpIFNjul1ACkeJxAWpzc9Rd+SwfPSI1y2C8Pyo7OIMyZxb0Y1QTNVKurRqZXgbO0oZb3hRA2e6WqBFXYvWuXm6MXcwl7CQnnnwSLP+XWyu8Gi8Jerl0BKPsdaRbsaEwMO86yi7uqwNjTWhwtgyMjARJ2IDZj82CteiKEYANf2F2/q6Tx5v3VSqdWb0mubN2I9ZscGE8TwjQBhzT4+ZPegg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=r604xJvf0kpMD9SUu6dvCf+ZojbB7Oe7MWbhODd9r8k=; b=p0BqNEe57QVw/NguqJ5X6HfoDI1KREe/QUuLbKSsO5dKhYe2GtC6yKzO5BsIIh22xyY0qShz0e9M1a2TILWJXLlykA0wSQjtUucRy2h5jFmBp2JnTkpallwrAJxR0R+zddwib36pd8b2MT6sBKm1A0eyA1QZ44QNLoHIaM7MXqc= Received: from SJ0PR03CA0168.namprd03.prod.outlook.com (2603:10b6:a03:338::23) by SA1PR12MB8743.namprd12.prod.outlook.com (2603:10b6:806:37c::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7962.23; Mon, 16 Sep 2024 05:09:37 +0000 Received: from SJ5PEPF000001CF.namprd05.prod.outlook.com (2603:10b6:a03:338:cafe::55) by SJ0PR03CA0168.outlook.office365.com (2603:10b6:a03:338::23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7939.30 via Frontend Transport; Mon, 16 Sep 2024 05:09:36 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by SJ5PEPF000001CF.mail.protection.outlook.com (10.167.242.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7918.13 via Frontend Transport; Mon, 16 Sep 2024 05:09:36 +0000 Received: from BLR-L-NUPADHYA.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Mon, 16 Sep 2024 00:09:30 -0500 From: Neeraj Upadhyay To: CC: , , , , , , , , , , , , , , , , , Subject: [RFC 3/6] percpu-refcount: Extend managed mode to allow runtime switching Date: Mon, 16 Sep 2024 10:38:08 +0530 Message-ID: <20240916050811.473556-4-Neeraj.Upadhyay@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com> References: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: SJ5PEPF000001CF:EE_|SA1PR12MB8743:EE_ X-MS-Office365-Filtering-Correlation-Id: e9d30e1f-8e1a-4901-7e08-08dcd60dc294 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|82310400026|1800799024|36860700013|7416014|376014; X-Microsoft-Antispam-Message-Info: 2/eNvxGBUcgfLL6PgOg/waOf6yMmzplo3OmXZg6n5Gf3h/GeAwXbCtnE9c1tuinzVRcytnuYbUNRwjgd6BZQxsUhIdK00v8ONVTC3C8VvYWgadiZ7tep3MBPTrXxjkmcbi5xZVMunSygJQjEsZqS16ui4MBqfnmZdMcs38rdYrZIQe81agxqAm613fIfJHyBYMb++jETLwwD2wVadL/902iv1mS5ubpFMgz9y3R7M0nBYlP1an+M3Huk34JMVVF1iAFXevpQjvvBZd3RRRxr1FrC1193F1YGLKsmS7H+kc7Qyl2nC04GdHj3O8u58o2uoaKEseLuf51OCnQFTKIzt4IjMYHUp8IBDtURWNXZwF25OovX+/GhsBP3rPZtehNbTgJxXgJl0zquc3X/rqfjofNqRjcxDsupOvca8BWZbWaTayCU0DuyHzWx50/0xf+EfPEBfug5kwd0lTEw45zM5d+A5VRU3P+1xDQxpvsjwAcmbm0BlHmzc3ZnKAQ8AlO6vsJuPFQEhLZzk+XWL1pjkVzhuzuOh+HBuVF8f4jIAQkpdwzksY2uWEUr7nh/d41u2yrNEk2ArbqG8rPaPWpTHqwODe99GWWuLdspZQSJLHLv96kKnf5GOi+k3ftkS0RG7NMHLExH4TzIDKABHoOeNo8MQMwPrmn8Cg+UgjP+T6K7WGcKo0EkMfD55TXvKEhOOSnkOce9U+Cn6twCUpJWFgPjC1F7mhUs/ciZ+Vt/1Zk3SOmXD1gmAyqM4Pp2ce+K+fdoohjxPbEaUBYBpvc/A8ufVHLvhYNCcXexAtbx27auq1qfMWaqmCbkOhlyxpqJmROJnSM04oPwbjxqufC/BoSp4tpBYBZD9h+55OI5Dya7kVcBPUiuCnrXzfcqilzCGUNDytajrJySrDyWDf3A6TftirsXx0ps+ZFeKwZHQ7yFD14jJkQOCCIbmtH0VsaCfdanTYa0xiO/CAGYlpl0Cr9iHC7xLULCUjeBVki/4zTxxbjQKzglUjPyZGpd5BrEAf+BNZrx5fZKQkmSbM9Flaefs7N2oEtGTY38zjgsmiZclYYcEx2vI8DW6m8iIKLg5yvXvz7cAhdSIvZEzspmaM6ExguszBhHkBO/wQBGvOMHMh6fcY+MxG1gQdtirn12ebrt+YspG3FQpoht6EgURutpximmFSEdRjOYx//hbdFhuaMaxY5+72xmgrU5pmjt2q2joactJsuEO6dL10okdX4uzoFu85yVBVWT8aEzAB4uaYH7CzpHie93YjJMEziQojvE26l6PhzaDBVKJ4ekeSDJqFIM7UWOwEq8Rm9hhdo55TkLIxAkaEg2Ds+jzzmY+aDJpO3yNfvG+x3JV+HS7bC4eBInZg2YAkmsmT+joegLmnjbfcOxpEpni9GwlgraDw8OA1sTjHmR0zmkGUAdJZG3PpZ2q/VyoyBS2g4fz7nBp0ncLAyn0WzC+M8NzFM3 X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(82310400026)(1800799024)(36860700013)(7416014)(376014);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 16 Sep 2024 05:09:36.5791 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: e9d30e1f-8e1a-4901-7e08-08dcd60dc294 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: SJ5PEPF000001CF.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA1PR12MB8743 X-Rspamd-Queue-Id: 7D393160018 X-Stat-Signature: j6kwjsnfx3shwakg6awfw8sf8eiy7rdr X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1726463382-465086 X-HE-Meta: U2FsdGVkX18/vUSCJQpHp+D7mJb0dZYpgN2jnILEKajI1/FoFAyZAmAyVqsDLjgZMM0mC1P2MlXm3CEC9jFpw4HgOeimgPzDO1nTTHnu2Rdoo+pJGKObe9k0KzgHgmHuNVv/6na0GZOpCf/N9WWvuEjfn4pIhPSQn8wlcsANizuH2wpo+5CLSHnbwOli2APjDZCYzWK3uSbsSncBfN4W0aVIVjDEA2i3zhbK4KwKxW0XFFcXyoqixETAjMktBTOtXrPqbIOcxv0XA9raMUuo3fvo7oUHAUfrRTfCxy4A+XVA4nIPhd2yF+KQaOY1z2lRY5aCfnJgM4uKEqQa1PIWTC1qX8ehULLLxXUtdOmkog05TfON9JwLTKip7boq5eFh2PJQd+8fYgQDK18mIcifGO/wX8u+/9eKpu4OA+NF9GZxEnoRoB/BqAV7jXpSZazrH8QkjXnDuk7RklKjInlAukL0zj/3wHcyg98tdawRDX87hnPhBGRvNk/tsPvme87hzYbzphs7RwgOKqFbWVo/KTEL/yf2JVnJnDA0gkf/bnqWq8Rq5+tH2LyjHAveQBwV8ClE98apLjwNZppPY18Ws1/6RBJEf5qkPPzDedcXeG8ZjeylHZQTwctFHMzYbHq56jZkSgr7aCYOd5DRTn/ydzQZCraTw90MZj3MUFdxf6hVPhmPWN6QoW1qOMTU6oCCzv72SJ7rr4EhuMLavc/OWnBJGx9IlcYAXEZSxDMjVXf8ZJY9E46yyyqYcI0nVPdkVdU7yrALoRrmKLMOBiljRAmkD7ZfyACL2JMm9gMx+cLTzcicHvTo1pwOD0dfR6VBSUF0vI8T8wzcIa1G2UWFEGa2wSdYhYBhV8SGEAdiXj+ZwOyH8aSvTufWh/ls80S2MKEdpyYG7JgCsnc8/Kug1qVIK6MnuZoGaxVRONDIpOLwT1AgRPtxZmM3g+oFUR178KH1fVR585U86s9m48x LPSml727 JLKO04bysxStsMO5jQTddjb4h1VBAGm+aPkH2dZmEqiqdrBATosRps5vZKrL6LpPdBdlkAvhpJnsG8arDkSOU90eC5bsTPeAoFlEIWYpcc7+Vjkqn4yYAWPyq0RErhmV+z46CJ1udhBPgEbNhbAQ62K7bQle1Ca2fDc00HdU3y261jbJ0kGtYWsyK6LIb+FPCFxKntqaqzKO9Y4O3I+b+KrFdWwGadtCcTZiZjuJBXJFTkyu80idTgcmQOPIC/cjdrssf2jeF54ej6eJB1Zh4yT0zGmhR+Nwd5c7hXI3eQeO70gct4+boPIIXCuxhb6lnhrZ+RG+io0UxOGehZGvpAT3pZaZVF8kBXzLpwG0VsSr0XkPDy/RMiwa5ehBt1RnDiV7D11SdysbvVBIiztRK7n25Yg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Provide more flexibility in terms of runtime mode switch for a managed percpu ref. This can be useful for cases when in some scenarios, a managed ref's object enters shutdown phase. Instead of waiting for manager thread to process the ref, user can dirctly invoke percpu_ref_kill() for the ref. The init modes are same as in existing code. Runtime mode switching allows switching back a managed ref to unmanaged mode, which allows transitions to all reinit modes from managed mode. To --> A P P(RI) M D D(RI) D(RI/M) EX REI RES A y n y y n y y y y y P n n n n y n n y n n M y* n y* y n y* y y* y y P(RI) y n y y n y y y y y D(RI) y n y y n y y - y y D(RI/M) y* n y* y n y* y - y y Modes: A - Atomic P - PerCPU M - Managed P(RI) - PerCPU with ReInit D(RI) - Dead with ReInit D(RI/M) - Dead with ReInit and Managed PerCPU Ref Ops: KLL - Kill REI - Reinit RES - Resurrect (RI) is for modes which are initialized with PERCPU_REF_ALLOW_REINIT. The transitions shown above are the allowed transitions and can be indirect transitions. For example, managed ref switches to P(RI) mode when percpu_ref_switch_to_unmanaged() is called for it. P(RI) mode can be directly switched to A mode using percpu_ref_switch_to_atomic(). Signed-off-by: Neeraj Upadhyay --- include/linux/percpu-refcount.h | 3 +- lib/percpu-refcount.c | 248 +++++++++++--------------------- 2 files changed, 88 insertions(+), 163 deletions(-) diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h index e6aea81b3d01..fe967db431a6 100644 --- a/include/linux/percpu-refcount.h +++ b/include/linux/percpu-refcount.h @@ -110,7 +110,7 @@ struct percpu_ref_data { struct rcu_head rcu; struct percpu_ref *ref; unsigned int aux_flags; - struct llist_node node; + struct list_head node; }; @@ -139,6 +139,7 @@ void percpu_ref_switch_to_atomic(struct percpu_ref *ref, void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref); void percpu_ref_switch_to_percpu(struct percpu_ref *ref); int percpu_ref_switch_to_managed(struct percpu_ref *ref); +void percpu_ref_switch_to_unmanaged(struct percpu_ref *ref); void percpu_ref_kill_and_confirm(struct percpu_ref *ref, percpu_ref_func_t *confirm_kill); void percpu_ref_resurrect(struct percpu_ref *ref); diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index 7d0c85c7ce57..b79e36905aa4 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -5,7 +5,7 @@ #include #include #include -#include +#include #include #include #include @@ -43,7 +43,12 @@ static DEFINE_SPINLOCK(percpu_ref_switch_lock); static DECLARE_WAIT_QUEUE_HEAD(percpu_ref_switch_waitq); -static LLIST_HEAD(percpu_ref_manage_head); +static struct list_head percpu_ref_manage_head = LIST_HEAD_INIT(percpu_ref_manage_head); +/* Spinlock protects node additions/deletions */ +static DEFINE_SPINLOCK(percpu_ref_manage_lock); +/* Mutex synchronizes node deletions with the node being scanned */ +static DEFINE_MUTEX(percpu_ref_active_switch_mutex); +static struct list_head *next_percpu_ref_node = &percpu_ref_manage_head; static unsigned long __percpu *percpu_count_ptr(struct percpu_ref *ref) { @@ -112,7 +117,7 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release, data->confirm_switch = NULL; data->ref = ref; ref->data = data; - init_llist_node(&data->node); + INIT_LIST_HEAD(&data->node); if (flags & PERCPU_REF_REL_MANAGED) percpu_ref_switch_to_managed(ref); @@ -150,9 +155,9 @@ static int __percpu_ref_switch_to_managed(struct percpu_ref *ref) data->force_atomic = false; if (!__ref_is_percpu(ref, &percpu_count)) __percpu_ref_switch_mode(ref, NULL); - /* Ensure ordering of percpu mode switch and node scan */ - smp_mb(); - llist_add(&data->node, &percpu_ref_manage_head); + spin_lock(&percpu_ref_manage_lock); + list_add(&data->node, &percpu_ref_manage_head); + spin_unlock(&percpu_ref_manage_lock); return 0; @@ -162,7 +167,7 @@ static int __percpu_ref_switch_to_managed(struct percpu_ref *ref) } /** - * percpu_ref_switch_to_managed - Switch an unmanaged ref to percpu mode. + * percpu_ref_switch_to_managed - Switch an unmanaged ref to percpu managed mode. * * @ref: percpu_ref to switch to managed mode * @@ -179,6 +184,47 @@ int percpu_ref_switch_to_managed(struct percpu_ref *ref) } EXPORT_SYMBOL_GPL(percpu_ref_switch_to_managed); +/** + * percpu_ref_switch_to_unmanaged - Switch a managed ref to percpu mode. + * + * @ref: percpu_ref to switch back to unmanaged percpu mode + * + * Must only be called with elevated refcount. + */ +void percpu_ref_switch_to_unmanaged(struct percpu_ref *ref) +{ + bool mutex_taken = false; + struct list_head *node; + unsigned long flags; + + might_sleep(); + + WARN_ONCE(!percpu_ref_is_managed(ref), "Percpu ref is not managed"); + + node = &ref->data->node; + spin_lock(&percpu_ref_manage_lock); + if (list_empty(node)) { + spin_unlock(&percpu_ref_manage_lock); + mutex_taken = true; + mutex_lock(&percpu_ref_active_switch_mutex); + spin_lock(&percpu_ref_manage_lock); + } + + if (next_percpu_ref_node == node) + next_percpu_ref_node = next_percpu_ref_node->next; + list_del_init(node); + spin_unlock(&percpu_ref_manage_lock); + if (mutex_taken) + mutex_unlock(&percpu_ref_active_switch_mutex); + + /* Drop the pseudo-init reference */ + percpu_ref_put(ref); + spin_lock_irqsave(&percpu_ref_switch_lock, flags); + ref->data->aux_flags &= ~__PERCPU_REL_MANAGED; + spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); +} +EXPORT_SYMBOL_GPL(percpu_ref_switch_to_unmanaged); + static void __percpu_ref_exit(struct percpu_ref *ref) { unsigned long __percpu *percpu_count = percpu_count_ptr(ref); @@ -599,164 +645,35 @@ module_param(max_scan_count, int, 0444); static void percpu_ref_release_work_fn(struct work_struct *work); -/* - * Sentinel llist nodes for lockless list traveral and deletions by - * the pcpu ref release worker, while nodes are added from - * percpu_ref_init() and percpu_ref_switch_to_managed(). - * - * Sentinel node marks the head of list traversal for the current - * iteration of kworker execution. - */ -struct percpu_ref_sen_node { - bool inuse; - struct llist_node node; -}; - -/* - * We need two sentinel nodes for lockless list manipulations from release - * worker - first node will be used in current reclaim iteration. The second - * node will be used in next iteration. Next iteration marks the first node - * as free, for use in subsequent iteration. - */ -#define PERCPU_REF_SEN_NODES_COUNT 2 - -/* Track last processed percpu ref node */ -static struct llist_node *last_percpu_ref_node; - -static struct percpu_ref_sen_node - percpu_ref_sen_nodes[PERCPU_REF_SEN_NODES_COUNT]; - static DECLARE_DELAYED_WORK(percpu_ref_release_work, percpu_ref_release_work_fn); -static bool percpu_ref_is_sen_node(struct llist_node *node) -{ - return &percpu_ref_sen_nodes[0].node <= node && - node <= &percpu_ref_sen_nodes[PERCPU_REF_SEN_NODES_COUNT - 1].node; -} - -static struct llist_node *percpu_ref_get_sen_node(void) -{ - int i; - struct percpu_ref_sen_node *sn; - - for (i = 0; i < PERCPU_REF_SEN_NODES_COUNT; i++) { - sn = &percpu_ref_sen_nodes[i]; - if (!sn->inuse) { - sn->inuse = true; - return &sn->node; - } - } - - return NULL; -} - -static void percpu_ref_put_sen_node(struct llist_node *node) -{ - struct percpu_ref_sen_node *sn = container_of(node, struct percpu_ref_sen_node, node); - - sn->inuse = false; - init_llist_node(node); -} - -static void percpu_ref_put_all_sen_nodes_except(struct llist_node *node) -{ - int i; - - for (i = 0; i < PERCPU_REF_SEN_NODES_COUNT; i++) { - if (&percpu_ref_sen_nodes[i].node == node) - continue; - percpu_ref_sen_nodes[i].inuse = false; - init_llist_node(&percpu_ref_sen_nodes[i].node); - } -} - static struct workqueue_struct *percpu_ref_release_wq; static void percpu_ref_release_work_fn(struct work_struct *work) { - struct llist_node *pos, *first, *head, *prev, *next; - struct llist_node *sen_node; + struct list_head *node; struct percpu_ref *ref; int count = 0; bool held; - struct llist_node *last_node = READ_ONCE(last_percpu_ref_node); - first = READ_ONCE(percpu_ref_manage_head.first); - if (!first) + mutex_lock(&percpu_ref_active_switch_mutex); + spin_lock(&percpu_ref_manage_lock); + if (list_empty(&percpu_ref_manage_head)) { + next_percpu_ref_node = &percpu_ref_manage_head; + spin_unlock(&percpu_ref_manage_lock); + mutex_unlock(&percpu_ref_active_switch_mutex); goto queue_release_work; - - /* - * Enqueue a dummy node to mark the start of scan. This dummy - * node is used as start point of scan and ensures that - * there is no additional synchronization required with new - * label node additions to the llist. Any new labels will - * be processed in next run of the kworker. - * - * SCAN START PTR - * | - * v - * +----------+ +------+ +------+ +------+ - * | | | | | | | | - * | head ------> dummy|--->|label |--->| label|--->NULL - * | | | node | | | | | - * +----------+ +------+ +------+ +------+ - * - * - * New label addition: - * - * SCAN START PTR - * | - * v - * +----------+ +------+ +------+ +------+ +------+ - * | | | | | | | | | | - * | head |--> label|--> dummy|--->|label |--->| label|--->NULL - * | | | | | node | | | | | - * +----------+ +------+ +------+ +------+ +------+ - * - */ - if (last_node == NULL || last_node->next == NULL) { -retry_sentinel_get: - sen_node = percpu_ref_get_sen_node(); - /* - * All sentinel nodes are in use? This should not happen, as we - * require only one sentinel for the start of list traversal and - * other sentinel node is freed during the traversal. - */ - if (WARN_ONCE(!sen_node, "All sentinel nodes are in use")) { - /* Use first node as the sentinel node */ - head = first->next; - if (!head) { - struct llist_node *ign_node = NULL; - /* - * We exhausted sentinel nodes. However, there aren't - * enough nodes in the llist. So, we have leaked - * sentinel nodes. Reclaim sentinels and retry. - */ - if (percpu_ref_is_sen_node(first)) - ign_node = first; - percpu_ref_put_all_sen_nodes_except(ign_node); - goto retry_sentinel_get; - } - prev = first; - } else { - llist_add(sen_node, &percpu_ref_manage_head); - prev = sen_node; - head = prev->next; - } - } else { - prev = last_node; - head = prev->next; } + if (next_percpu_ref_node == &percpu_ref_manage_head) + node = percpu_ref_manage_head.next; + else + node = next_percpu_ref_node; + next_percpu_ref_node = node->next; + list_del_init(node); + spin_unlock(&percpu_ref_manage_lock); - llist_for_each_safe(pos, next, head) { - /* Free sentinel node which is present in the list */ - if (percpu_ref_is_sen_node(pos)) { - prev->next = pos->next; - percpu_ref_put_sen_node(pos); - continue; - } - - ref = container_of(pos, struct percpu_ref_data, node)->ref; + while (!list_is_head(node, &percpu_ref_manage_head)) { + ref = container_of(node, struct percpu_ref_data, node)->ref; __percpu_ref_switch_to_atomic_sync_checked(ref, false); /* * Drop the ref while in RCU read critical section to @@ -765,24 +682,31 @@ static void percpu_ref_release_work_fn(struct work_struct *work) rcu_read_lock(); percpu_ref_put(ref); held = percpu_ref_tryget(ref); - if (!held) { - prev->next = pos->next; - init_llist_node(pos); + if (held) { + spin_lock(&percpu_ref_manage_lock); + list_add(node, &percpu_ref_manage_head); + spin_unlock(&percpu_ref_manage_lock); + __percpu_ref_switch_to_percpu_checked(ref, false); + } else { ref->percpu_count_ptr |= __PERCPU_REF_DEAD; } rcu_read_unlock(); - if (!held) - continue; - __percpu_ref_switch_to_percpu_checked(ref, false); + mutex_unlock(&percpu_ref_active_switch_mutex); count++; - if (count == READ_ONCE(max_scan_count)) { - WRITE_ONCE(last_percpu_ref_node, pos); + if (count == READ_ONCE(max_scan_count)) goto queue_release_work; + mutex_lock(&percpu_ref_active_switch_mutex); + spin_lock(&percpu_ref_manage_lock); + node = next_percpu_ref_node; + if (!list_is_head(next_percpu_ref_node, &percpu_ref_manage_head)) { + next_percpu_ref_node = next_percpu_ref_node->next; + list_del_init(node); } - prev = pos; + spin_unlock(&percpu_ref_manage_lock); } - WRITE_ONCE(last_percpu_ref_node, NULL); + mutex_unlock(&percpu_ref_active_switch_mutex); + queue_release_work: queue_delayed_work(percpu_ref_release_wq, &percpu_ref_release_work, scan_interval);