From patchwork Thu May 12 03:04:29 2022
X-Patchwork-Submitter: Joel Fernandes
X-Patchwork-Id: 12846882
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
To: rcu@vger.kernel.org
Cc: rushikesh.s.kadam@intel.com, urezki@gmail.com, neeraj.iitr10@gmail.com,
    frederic@kernel.org, paulmck@kernel.org, rostedt@goodmis.org,
    "Joel Fernandes (Google)" <joel@joelfernandes.org>
Subject: [RFC v1 01/14] rcu: Add a lock-less lazy RCU implementation
Date: Thu, 12 May 2022 03:04:29 +0000
Message-Id: <20220512030442.2530552-2-joel@joelfernandes.org>
X-Mailer: git-send-email 2.36.0.550.gb090851708-goog
In-Reply-To: <20220512030442.2530552-1-joel@joelfernandes.org>
References: <20220512030442.2530552-1-joel@joelfernandes.org>
X-Mailing-List: rcu@vger.kernel.org

Implement timer-based RCU callback batching.
The batch is flushed whenever a certain amount of time has passed, or
the batch on a particular CPU grows too big. Memory pressure can also
flush it. Locking is avoided to reduce contention when queuing and
dequeuing happen on different CPUs of a per-CPU list, such as when the
shrinker context runs on a different CPU. Not having to use locks also
keeps the per-CPU structure size small.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
(A hypothetical caller-side usage sketch of call_rcu_lazy() follows the
patch.)

 include/linux/rcupdate.h |   6 ++
 kernel/rcu/Kconfig       |   8 +++
 kernel/rcu/Makefile      |   1 +
 kernel/rcu/lazy.c        | 145 +++++++++++++++++++++++++++++++++++++++
 kernel/rcu/rcu.h         |   5 ++
 kernel/rcu/tree.c        |   2 +
 6 files changed, 167 insertions(+)
 create mode 100644 kernel/rcu/lazy.c

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 88b42eb46406..d0a6c4f5172c 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -82,6 +82,12 @@ static inline int rcu_preempt_depth(void)
 
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
+#ifdef CONFIG_RCU_LAZY
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
+#else
+#define call_rcu_lazy(head, func) call_rcu(head, func)
+#endif
+
 /* Internal to kernel */
 void rcu_init(void);
 extern int rcu_scheduler_active __read_mostly;
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index bf8e341e75b4..c09715079829 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -241,4 +241,12 @@ config TASKS_TRACE_RCU_READ_MB
 	  Say N here if you hate read-side memory barriers.
 	  Take the default if you are unsure.
 
+config RCU_LAZY
+	bool "RCU callback lazy invocation functionality"
+	depends on RCU_NOCB_CPU
+	default y
+	help
+	  To save power, batch RCU callbacks and flush after delay, memory
+	  pressure or callback list growing too big.
+
 endmenu # "RCU Subsystem"
diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
index 0cfb009a99b9..8968b330d6e0 100644
--- a/kernel/rcu/Makefile
+++ b/kernel/rcu/Makefile
@@ -16,3 +16,4 @@ obj-$(CONFIG_RCU_REF_SCALE_TEST) += refscale.o
 obj-$(CONFIG_TREE_RCU) += tree.o
 obj-$(CONFIG_TINY_RCU) += tiny.o
 obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
+obj-$(CONFIG_RCU_LAZY) += lazy.o
diff --git a/kernel/rcu/lazy.c b/kernel/rcu/lazy.c
new file mode 100644
index 000000000000..55e406cfc528
--- /dev/null
+++ b/kernel/rcu/lazy.c
@@ -0,0 +1,145 @@
+/*
+ * Lockless lazy-RCU implementation.
+ */
+#include <linux/rcupdate.h>
+#include <linux/shrinker.h>
+#include <linux/workqueue.h>
+#include "rcu.h"
+
+// How much to batch before flushing?
+#define MAX_LAZY_BATCH		2048
+
+// How much to wait before flushing?
+#define MAX_LAZY_JIFFIES	10000
+
+// We cast lazy_rcu_head to rcu_head and back. This keeps the API simple while
+// allowing us to use lockless list node in the head. Also, we use BUILD_BUG_ON
+// later to ensure that rcu_head and lazy_rcu_head are of the same size.
+struct lazy_rcu_head {
+	struct llist_node llist_node;
+	void (*func)(struct callback_head *head);
+} __attribute__((aligned(sizeof(void *))));
+
+struct rcu_lazy_pcp {
+	struct llist_head head;
+	struct delayed_work work;
+	atomic_t count;
+};
+DEFINE_PER_CPU(struct rcu_lazy_pcp, rcu_lazy_pcp_ins);
+
+// Lockless flush of CPU, can be called concurrently.
+static void lazy_rcu_flush_cpu(struct rcu_lazy_pcp *rlp)
+{
+	struct llist_node *node = llist_del_all(&rlp->head);
+	struct lazy_rcu_head *cursor, *temp;
+
+	if (!node)
+		return;
+
+	llist_for_each_entry_safe(cursor, temp, node, llist_node) {
+		struct rcu_head *rh = (struct rcu_head *)cursor;
+		debug_rcu_head_unqueue(rh);
+		call_rcu(rh, rh->func);
+		atomic_dec(&rlp->count);
+	}
+}
+
+void call_rcu_lazy(struct rcu_head *head_rcu, rcu_callback_t func)
+{
+	struct lazy_rcu_head *head = (struct lazy_rcu_head *)head_rcu;
+	struct rcu_lazy_pcp *rlp;
+
+	preempt_disable();
+	rlp = this_cpu_ptr(&rcu_lazy_pcp_ins);
+	preempt_enable();
+
+	if (debug_rcu_head_queue((void *)head)) {
+		// Probable double call_rcu(), just leak.
+		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
+				__func__, head);
+
+		// Mark as success and leave.
+		return;
+	}
+
+	// Queue to per-cpu llist
+	head->func = func;
+	llist_add(&head->llist_node, &rlp->head);
+
+	// Flush queue if too big
+	if (atomic_inc_return(&rlp->count) >= MAX_LAZY_BATCH) {
+		lazy_rcu_flush_cpu(rlp);
+	} else {
+		if (!delayed_work_pending(&rlp->work)) {
+			schedule_delayed_work(&rlp->work, MAX_LAZY_JIFFIES);
+		}
+	}
+}
+
+static unsigned long
+lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+	unsigned long count = 0;
+	int cpu;
+
+	/* Snapshot count of all CPUs */
+	for_each_possible_cpu(cpu) {
+		struct rcu_lazy_pcp *rlp = per_cpu_ptr(&rcu_lazy_pcp_ins, cpu);
+
+		count += atomic_read(&rlp->count);
+	}
+
+	return count;
+}
+
+static unsigned long
+lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+	int cpu, freed = 0;
+
+	for_each_possible_cpu(cpu) {
+		struct rcu_lazy_pcp *rlp = per_cpu_ptr(&rcu_lazy_pcp_ins, cpu);
+		unsigned long count;
+
+		count = atomic_read(&rlp->count);
+		lazy_rcu_flush_cpu(rlp);
+		sc->nr_to_scan -= count;
+		freed += count;
+		if (sc->nr_to_scan <= 0)
+			break;
+	}
+
+	return freed == 0 ? SHRINK_STOP : freed;
+}
+
+/*
+ * This function is invoked after MAX_LAZY_JIFFIES timeout.
+ */
+static void lazy_work(struct work_struct *work)
+{
+	struct rcu_lazy_pcp *rlp = container_of(work, struct rcu_lazy_pcp, work.work);
+
+	lazy_rcu_flush_cpu(rlp);
+}
+
+static struct shrinker lazy_rcu_shrinker = {
+	.count_objects = lazy_rcu_shrink_count,
+	.scan_objects = lazy_rcu_shrink_scan,
+	.batch = 0,
+	.seeks = DEFAULT_SEEKS,
+};
+
+void __init rcu_lazy_init(void)
+{
+	int cpu;
+
+	BUILD_BUG_ON(sizeof(struct lazy_rcu_head) != sizeof(struct rcu_head));
+
+	for_each_possible_cpu(cpu) {
+		struct rcu_lazy_pcp *rlp = per_cpu_ptr(&rcu_lazy_pcp_ins, cpu);
+		INIT_DELAYED_WORK(&rlp->work, lazy_work);
+	}
+
+	if (register_shrinker(&lazy_rcu_shrinker))
+		pr_err("Failed to register lazy_rcu shrinker!\n");
+}
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 24b5f2c2de87..a5f4b44f395f 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -561,4 +561,9 @@ void show_rcu_tasks_trace_gp_kthread(void);
 static inline void show_rcu_tasks_trace_gp_kthread(void) {}
 #endif
 
+#ifdef CONFIG_RCU_LAZY
+void rcu_lazy_init(void);
+#else
+static inline void rcu_lazy_init(void) {}
+#endif
 #endif /* __LINUX_RCU_H */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4c25a6283b0..ebdf6f7c9023 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4775,6 +4775,8 @@ void __init rcu_init(void)
 		qovld_calc = DEFAULT_RCU_QOVLD_MULT * qhimark;
 	else
 		qovld_calc = qovld;
+
+	rcu_lazy_init();
 }
 
 #include "tree_stall.h"
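
For reference, a minimal, hypothetical caller-side sketch of moving an
existing call_rcu() user over to the lazy API added by this patch. The
names struct foo, free_foo_cb() and release_foo() are illustrative
only and are not part of the patch:

#include <linux/rcupdate.h>	/* struct rcu_head; call_rcu_lazy() with this patch */
#include <linux/slab.h>		/* kfree() */

struct foo {
	int data;
	struct rcu_head rh;
};

/* Runs after a grace period, once the batched callback is finally invoked. */
static void free_foo_cb(struct rcu_head *rh)
{
	struct foo *fp = container_of(rh, struct foo, rh);

	kfree(fp);
}

static void release_foo(struct foo *fp)
{
	/*
	 * Was: call_rcu(&fp->rh, free_foo_cb);
	 * The lazy variant may defer invocation for up to MAX_LAZY_JIFFIES,
	 * until the per-CPU batch reaches MAX_LAZY_BATCH entries, or until
	 * the shrinker flushes the batch under memory pressure.
	 */
	call_rcu_lazy(&fp->rh, free_foo_cb);
}

Callers that need their callback to run promptly would presumably keep
using plain call_rcu(); the lazy variant trades invocation latency for
the power savings of batching.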