From patchwork Thu Jan 5 00:24:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089248 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B448C53210 for ; Thu, 5 Jan 2023 00:27:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235521AbjAEA0f (ORCPT ); Wed, 4 Jan 2023 19:26:35 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33366 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235470AbjAEA0F (ORCPT ); Wed, 4 Jan 2023 19:26:05 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A816950F50; Wed, 4 Jan 2023 16:24:55 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id CAF906185C; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2FC00C433EF; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672878290; bh=VS7CzfrucTZq5lKAwKfv+JBikA9PozXDvQDOAqrBc+k=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=K5v1H8hxvIHVoAXudNisEdfAmk7CCcswplwyigcFl5Iyls+TsNFn+cRC0ivT/6gca xO6GmJB+eroJwEBbCMVO5LlqSN4W88CpNEi5vmgFC3EayDQG3sagne/KWAF8i9YymN A6VVOPyFbnkjnQjnuT8aVQ8VrJ22KdakoVC0Q1cBpxqDikMenmJ+1dqLr3jPz37+VV GUvGdj0rLTNoXcPgWZrm9ERL1hcxTBptGzvMxAFPwSogs03XfKsks8Z5cUHVSust4j +NbUTgS2iAsG3lH9oTEfzzMVN0YxCn27z1pH+yDrAhLrkOzjHjx55yXwqsyUiOVqao /awWtb/YBvDtQ== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id DEEDD5C05CA; Wed, 4 Jan 2023 16:24:49 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, "Uladzislau Rezki (Sony)" , "Paul E . McKenney" Subject: [PATCH rcu 1/8] rcu: Refactor kvfree_call_rcu() and high-level helpers Date: Wed, 4 Jan 2023 16:24:41 -0800 Message-Id: <20230105002448.1768892-1-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> References: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Uladzislau Rezki (Sony)" Currently a kvfree_call_rcu() takes an offset within a structure as a second parameter, so a helper such as a kvfree_rcu_arg_2() has to convert rcu_head and a freed ptr to an offset in order to pass it. That leads to an extra conversion on macro entry. Instead of converting, refactor the code in way that a pointer that has to be freed is passed directly to the kvfree_call_rcu(). This patch does not make any functional change and is transparent to all kvfree_rcu() users. Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Paul E. McKenney --- include/linux/rcupdate.h | 5 ++--- include/linux/rcutiny.h | 12 ++++++------ include/linux/rcutree.h | 2 +- kernel/rcu/tiny.c | 9 +++------ kernel/rcu/tree.c | 29 ++++++++++++----------------- 5 files changed, 24 insertions(+), 33 deletions(-) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 03abf883a281b..f38d4469d7f30 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -1011,8 +1011,7 @@ do { \ \ if (___p) { \ BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \ - kvfree_call_rcu(&((___p)->rhf), (rcu_callback_t)(unsigned long) \ - (offsetof(typeof(*(ptr)), rhf))); \ + kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \ } \ } while (0) @@ -1021,7 +1020,7 @@ do { \ typeof(ptr) ___p = (ptr); \ \ if (___p) \ - kvfree_call_rcu(NULL, (rcu_callback_t) (___p)); \ + kvfree_call_rcu(NULL, (void *) (___p)); \ } while (0) /* diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 68f9070aa1110..7f17acf29dda7 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -98,25 +98,25 @@ static inline void synchronize_rcu_expedited(void) */ extern void kvfree(const void *addr); -static inline void __kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) +static inline void __kvfree_call_rcu(struct rcu_head *head, void *ptr) { if (head) { - call_rcu(head, func); + call_rcu(head, (rcu_callback_t) ((void *) head - ptr)); return; } // kvfree_rcu(one_arg) call. might_sleep(); synchronize_rcu(); - kvfree((void *) func); + kvfree(ptr); } #ifdef CONFIG_KASAN_GENERIC -void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func); +void kvfree_call_rcu(struct rcu_head *head, void *ptr); #else -static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) +static inline void kvfree_call_rcu(struct rcu_head *head, void *ptr) { - __kvfree_call_rcu(head, func); + __kvfree_call_rcu(head, ptr); } #endif diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 4003bf6cfa1c2..56bccb5a8fdea 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -33,7 +33,7 @@ static inline void rcu_virt_note_context_switch(void) } void synchronize_rcu_expedited(void); -void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func); +void kvfree_call_rcu(struct rcu_head *head, void *ptr); void rcu_barrier(void); bool rcu_eqs_special_set(int cpu); diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c index 72913ce21258b..42f7589e51e09 100644 --- a/kernel/rcu/tiny.c +++ b/kernel/rcu/tiny.c @@ -246,15 +246,12 @@ bool poll_state_synchronize_rcu(unsigned long oldstate) EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu); #ifdef CONFIG_KASAN_GENERIC -void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) +void kvfree_call_rcu(struct rcu_head *head, void *ptr) { - if (head) { - void *ptr = (void *) head - (unsigned long) func; - + if (head) kasan_record_aux_stack_noalloc(ptr); - } - __kvfree_call_rcu(head, func); + __kvfree_call_rcu(head, ptr); } EXPORT_SYMBOL_GPL(kvfree_call_rcu); #endif diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index cf34a961821ad..7d222acd85bfd 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3103,8 +3103,8 @@ static void kfree_rcu_work(struct work_struct *work) * This list is named "Channel 3". */ for (; head; head = next) { - unsigned long offset = (unsigned long)head->func; - void *ptr = (void *)head - offset; + void *ptr = (void *) head->func; + unsigned long offset = (void *) head - ptr; next = head->next; debug_rcu_head_unqueue((struct rcu_head *)ptr); @@ -3342,26 +3342,21 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, * be free'd in workqueue context. This allows us to: batch requests together to * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load. */ -void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) +void kvfree_call_rcu(struct rcu_head *head, void *ptr) { unsigned long flags; struct kfree_rcu_cpu *krcp; bool success; - void *ptr; - if (head) { - ptr = (void *) head - (unsigned long) func; - } else { - /* - * Please note there is a limitation for the head-less - * variant, that is why there is a clear rule for such - * objects: it can be used from might_sleep() context - * only. For other places please embed an rcu_head to - * your data. - */ + /* + * Please note there is a limitation for the head-less + * variant, that is why there is a clear rule for such + * objects: it can be used from might_sleep() context + * only. For other places please embed an rcu_head to + * your data. + */ + if (!head) might_sleep(); - ptr = (unsigned long *) func; - } // Queue the object but don't yet schedule the batch. if (debug_rcu_head_queue(ptr)) { @@ -3382,7 +3377,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) // Inline if kvfree_rcu(one_arg) call. goto unlock_return; - head->func = func; + head->func = ptr; head->next = krcp->head; krcp->head = head; success = true; From patchwork Thu Jan 5 00:24:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089253 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6BEBC54EBE for ; Thu, 5 Jan 2023 00:27:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235724AbjAEA0g (ORCPT ); Wed, 4 Jan 2023 19:26:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33316 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235494AbjAEA0G (ORCPT ); Wed, 4 Jan 2023 19:26:06 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F3654FD78; Wed, 4 Jan 2023 16:24:52 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id DC16761880; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3330EC433D2; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672878290; bh=9q3RXJaEMI0bXFO/vylOhYFyrOVZOPbjbOVjXgIBpGc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=OFXK80sEV5h+vkOB8PnwWECzrbUYp/b297NgAZahEHq44//vzGeqCFZtKKqfathpL qmXHwFA/fOFCI2EzaUykt6N6qrfDAEDh7ntVvMBEW3laHX2zk+cSpEzkrffK+yPywC gnGwKMAmwG5HeM5QA4YeK/DuCLWOMvgg5eZR2esF0Jmj8IJrAnxBVB8ys+mm7HCfzL 3wm0NOhK68AWOJncmc/oEYVMykLfcH1G2oQhufRoGw42KFvRQI4J9+1OvaVYOk6oYb ZmJkeg6GA54gLhya4oyXN7EEDpp2+MBdfMocrTFTTxhC+UoU8PV3fQn1gMJ01BEZH0 TD+2y0KCVSTTA== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id E27975C086D; Wed, 4 Jan 2023 16:24:49 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, "Uladzislau Rezki (Sony)" , "Paul E . McKenney" Subject: [PATCH rcu 2/8] rcu/kvfree: Switch to a generic linked list API Date: Wed, 4 Jan 2023 16:24:42 -0800 Message-Id: <20230105002448.1768892-2-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> References: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Uladzislau Rezki (Sony)" This commit improves the readability and maintainability of the kvfree_rcu() code by switching from an open-coded linked list to the standard Linux-kernel circular doubly linked list. This patch does not introduce any functional change. Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Paul E. McKenney --- kernel/rcu/tree.c | 89 +++++++++++++++++++++++------------------------ 1 file changed, 43 insertions(+), 46 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 7d222acd85bfd..4088b34ce9610 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2876,13 +2876,13 @@ EXPORT_SYMBOL_GPL(call_rcu); /** * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers + * @list: List node. All blocks are linked between each other * @nr_records: Number of active pointers in the array - * @next: Next bulk object in the block chain * @records: Array of the kvfree_rcu() pointers */ struct kvfree_rcu_bulk_data { + struct list_head list; unsigned long nr_records; - struct kvfree_rcu_bulk_data *next; void *records[]; }; @@ -2898,21 +2898,21 @@ struct kvfree_rcu_bulk_data { * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period * @head_free: List of kfree_rcu() objects waiting for a grace period - * @bkvhead_free: Bulk-List of kvfree_rcu() objects waiting for a grace period + * @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period * @krcp: Pointer to @kfree_rcu_cpu structure */ struct kfree_rcu_cpu_work { struct rcu_work rcu_work; struct rcu_head *head_free; - struct kvfree_rcu_bulk_data *bkvhead_free[FREE_N_CHANNELS]; + struct list_head bulk_head_free[FREE_N_CHANNELS]; struct kfree_rcu_cpu *krcp; }; /** * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period * @head: List of kfree_rcu() objects not yet waiting for a grace period - * @bkvhead: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period + * @bulk_head: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period * @lock: Synchronize access to this structure * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES @@ -2936,7 +2936,7 @@ struct kfree_rcu_cpu_work { */ struct kfree_rcu_cpu { struct rcu_head *head; - struct kvfree_rcu_bulk_data *bkvhead[FREE_N_CHANNELS]; + struct list_head bulk_head[FREE_N_CHANNELS]; struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES]; raw_spinlock_t lock; struct delayed_work monitor_work; @@ -3031,12 +3031,13 @@ drain_page_cache(struct kfree_rcu_cpu *krcp) /* * This function is invoked in workqueue context after a grace period. - * It frees all the objects queued on ->bkvhead_free or ->head_free. + * It frees all the objects queued on ->bulk_head_free or ->head_free. */ static void kfree_rcu_work(struct work_struct *work) { unsigned long flags; - struct kvfree_rcu_bulk_data *bkvhead[FREE_N_CHANNELS], *bnext; + struct kvfree_rcu_bulk_data *bnode, *n; + struct list_head bulk_head[FREE_N_CHANNELS]; struct rcu_head *head, *next; struct kfree_rcu_cpu *krcp; struct kfree_rcu_cpu_work *krwp; @@ -3048,10 +3049,8 @@ static void kfree_rcu_work(struct work_struct *work) raw_spin_lock_irqsave(&krcp->lock, flags); // Channels 1 and 2. - for (i = 0; i < FREE_N_CHANNELS; i++) { - bkvhead[i] = krwp->bkvhead_free[i]; - krwp->bkvhead_free[i] = NULL; - } + for (i = 0; i < FREE_N_CHANNELS; i++) + list_replace_init(&krwp->bulk_head_free[i], &bulk_head[i]); // Channel 3. head = krwp->head_free; @@ -3060,36 +3059,33 @@ static void kfree_rcu_work(struct work_struct *work) // Handle the first two channels. for (i = 0; i < FREE_N_CHANNELS; i++) { - for (; bkvhead[i]; bkvhead[i] = bnext) { - bnext = bkvhead[i]->next; - debug_rcu_bhead_unqueue(bkvhead[i]); + list_for_each_entry_safe(bnode, n, &bulk_head[i], list) { + debug_rcu_bhead_unqueue(bnode); rcu_lock_acquire(&rcu_callback_map); if (i == 0) { // kmalloc() / kfree(). trace_rcu_invoke_kfree_bulk_callback( - rcu_state.name, bkvhead[i]->nr_records, - bkvhead[i]->records); + rcu_state.name, bnode->nr_records, + bnode->records); - kfree_bulk(bkvhead[i]->nr_records, - bkvhead[i]->records); + kfree_bulk(bnode->nr_records, bnode->records); } else { // vmalloc() / vfree(). - for (j = 0; j < bkvhead[i]->nr_records; j++) { + for (j = 0; j < bnode->nr_records; j++) { trace_rcu_invoke_kvfree_callback( - rcu_state.name, - bkvhead[i]->records[j], 0); + rcu_state.name, bnode->records[j], 0); - vfree(bkvhead[i]->records[j]); + vfree(bnode->records[j]); } } rcu_lock_release(&rcu_callback_map); raw_spin_lock_irqsave(&krcp->lock, flags); - if (put_cached_bnode(krcp, bkvhead[i])) - bkvhead[i] = NULL; + if (put_cached_bnode(krcp, bnode)) + bnode = NULL; raw_spin_unlock_irqrestore(&krcp->lock, flags); - if (bkvhead[i]) - free_page((unsigned long) bkvhead[i]); + if (bnode) + free_page((unsigned long) bnode); cond_resched_tasks_rcu_qs(); } @@ -3125,7 +3121,7 @@ need_offload_krc(struct kfree_rcu_cpu *krcp) int i; for (i = 0; i < FREE_N_CHANNELS; i++) - if (krcp->bkvhead[i]) + if (!list_empty(&krcp->bulk_head[i])) return true; return !!krcp->head; @@ -3162,21 +3158,20 @@ static void kfree_rcu_monitor(struct work_struct *work) for (i = 0; i < KFREE_N_BATCHES; i++) { struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]); - // Try to detach bkvhead or head and attach it over any + // Try to detach bulk_head or head and attach it over any // available corresponding free channel. It can be that // a previous RCU batch is in progress, it means that // immediately to queue another one is not possible so // in that case the monitor work is rearmed. - if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) || - (krcp->bkvhead[1] && !krwp->bkvhead_free[1]) || + if ((!list_empty(&krcp->bulk_head[0]) && list_empty(&krwp->bulk_head_free[0])) || + (!list_empty(&krcp->bulk_head[1]) && list_empty(&krwp->bulk_head_free[1])) || (krcp->head && !krwp->head_free)) { + // Channel 1 corresponds to the SLAB-pointer bulk path. // Channel 2 corresponds to vmalloc-pointer bulk path. for (j = 0; j < FREE_N_CHANNELS; j++) { - if (!krwp->bkvhead_free[j]) { - krwp->bkvhead_free[j] = krcp->bkvhead[j]; - krcp->bkvhead[j] = NULL; - } + if (list_empty(&krwp->bulk_head_free[j])) + list_replace_init(&krcp->bulk_head[j], &krwp->bulk_head_free[j]); } // Channel 3 corresponds to both SLAB and vmalloc @@ -3288,10 +3283,11 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, return false; idx = !!is_vmalloc_addr(ptr); + bnode = list_first_entry_or_null(&(*krcp)->bulk_head[idx], + struct kvfree_rcu_bulk_data, list); /* Check if a new block is required. */ - if (!(*krcp)->bkvhead[idx] || - (*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) { + if (!bnode || bnode->nr_records == KVFREE_BULK_MAX_ENTR) { bnode = get_cached_bnode(*krcp); if (!bnode && can_alloc) { krc_this_cpu_unlock(*krcp, *flags); @@ -3315,18 +3311,13 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, if (!bnode) return false; - /* Initialize the new block. */ + // Initialize the new block and attach it. bnode->nr_records = 0; - bnode->next = (*krcp)->bkvhead[idx]; - - /* Attach it to the head. */ - (*krcp)->bkvhead[idx] = bnode; + list_add(&bnode->list, &(*krcp)->bulk_head[idx]); } /* Finally insert. */ - (*krcp)->bkvhead[idx]->records - [(*krcp)->bkvhead[idx]->nr_records++] = ptr; - + bnode->records[bnode->nr_records++] = ptr; return true; } @@ -4761,7 +4752,7 @@ struct workqueue_struct *rcu_gp_wq; static void __init kfree_rcu_batch_init(void) { int cpu; - int i; + int i, j; /* Clamp it to [0:100] seconds interval. */ if (rcu_delay_page_cache_fill_msec < 0 || @@ -4781,8 +4772,14 @@ static void __init kfree_rcu_batch_init(void) for (i = 0; i < KFREE_N_BATCHES; i++) { INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work); krcp->krw_arr[i].krcp = krcp; + + for (j = 0; j < FREE_N_CHANNELS; j++) + INIT_LIST_HEAD(&krcp->krw_arr[i].bulk_head_free[j]); } + for (i = 0; i < FREE_N_CHANNELS; i++) + INIT_LIST_HEAD(&krcp->bulk_head[i]); + INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor); INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func); krcp->initialized = true; From patchwork Thu Jan 5 00:24:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089249 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 896D7C54E76 for ; Thu, 5 Jan 2023 00:27:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235393AbjAEA0f (ORCPT ); Wed, 4 Jan 2023 19:26:35 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240715AbjAEA0A (ORCPT ); Wed, 4 Jan 2023 19:26:00 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5EE124FD70; Wed, 4 Jan 2023 16:24:51 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id E74D261890; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 386EBC433F1; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672878290; bh=eyuMolz7jRjywZspmtRUtHr0uH/ZCSEztevsvhEK960=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YUYIXIZmmQrjC53VvOnPo8Dx0Onugn4spzyTZp9jWjZ4jlWVjX2fkf1SyTzyQwTiN igY6a0u3zn4v6V6l3dJEMtbCYk8zyRnBpDiXZ5AXWYFmpBGoZoAcO4Fnz2lEexjjSb mwv3y7UxZ6AewUICz2SD5mlyZM89fWkb2l6I+oE+YTx9Ttv55wK50ZmjqVfgrHIbAV HMmvTtw4pwCeP5b08DEc08BUFb5dLseMq4LAHrPtqLzfGLwdV4Mnfi+bZqYfToNXkf GDfe0VW4KlKcl0Yz/xFws6KKHfhf8S+xPNwmMCxjj8ymC/eduZSFlQ9h+dESn863wj 1UqwotAmTspWA== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id E536D5C08E5; Wed, 4 Jan 2023 16:24:49 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, "Uladzislau Rezki (Sony)" , "Paul E . McKenney" Subject: [PATCH rcu 3/8] rcu/kvfree: Move bulk/list reclaim to separate functions Date: Wed, 4 Jan 2023 16:24:43 -0800 Message-Id: <20230105002448.1768892-3-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> References: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Uladzislau Rezki (Sony)" The kvfree_rcu() code maintains lists of pages of pointers, but also a singly linked list, with the latter being used when memory allocation fails. Traversal of these two types of lists is currently open coded. This commit simplifies the code by providing kvfree_rcu_bulk() and kvfree_rcu_list() functions, respectively, to traverse these two types of lists. This patch does not introduce any functional change. Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Paul E. McKenney --- kernel/rcu/tree.c | 114 ++++++++++++++++++++++++++-------------------- 1 file changed, 65 insertions(+), 49 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 4088b34ce9610..839e617f6c370 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3029,6 +3029,65 @@ drain_page_cache(struct kfree_rcu_cpu *krcp) return freed; } +static void +kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp, + struct kvfree_rcu_bulk_data *bnode, int idx) +{ + unsigned long flags; + int i; + + debug_rcu_bhead_unqueue(bnode); + + rcu_lock_acquire(&rcu_callback_map); + if (idx == 0) { // kmalloc() / kfree(). + trace_rcu_invoke_kfree_bulk_callback( + rcu_state.name, bnode->nr_records, + bnode->records); + + kfree_bulk(bnode->nr_records, bnode->records); + } else { // vmalloc() / vfree(). + for (i = 0; i < bnode->nr_records; i++) { + trace_rcu_invoke_kvfree_callback( + rcu_state.name, bnode->records[i], 0); + + vfree(bnode->records[i]); + } + } + rcu_lock_release(&rcu_callback_map); + + raw_spin_lock_irqsave(&krcp->lock, flags); + if (put_cached_bnode(krcp, bnode)) + bnode = NULL; + raw_spin_unlock_irqrestore(&krcp->lock, flags); + + if (bnode) + free_page((unsigned long) bnode); + + cond_resched_tasks_rcu_qs(); +} + +static void +kvfree_rcu_list(struct rcu_head *head) +{ + struct rcu_head *next; + + for (; head; head = next) { + void *ptr = (void *) head->func; + unsigned long offset = (void *) head - ptr; + + next = head->next; + debug_rcu_head_unqueue((struct rcu_head *)ptr); + rcu_lock_acquire(&rcu_callback_map); + trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset); + + if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) + kvfree(ptr); + + rcu_lock_release(&rcu_callback_map); + cond_resched_tasks_rcu_qs(); + } +} + /* * This function is invoked in workqueue context after a grace period. * It frees all the objects queued on ->bulk_head_free or ->head_free. @@ -3038,10 +3097,10 @@ static void kfree_rcu_work(struct work_struct *work) unsigned long flags; struct kvfree_rcu_bulk_data *bnode, *n; struct list_head bulk_head[FREE_N_CHANNELS]; - struct rcu_head *head, *next; + struct rcu_head *head; struct kfree_rcu_cpu *krcp; struct kfree_rcu_cpu_work *krwp; - int i, j; + int i; krwp = container_of(to_rcu_work(work), struct kfree_rcu_cpu_work, rcu_work); @@ -3058,38 +3117,9 @@ static void kfree_rcu_work(struct work_struct *work) raw_spin_unlock_irqrestore(&krcp->lock, flags); // Handle the first two channels. - for (i = 0; i < FREE_N_CHANNELS; i++) { - list_for_each_entry_safe(bnode, n, &bulk_head[i], list) { - debug_rcu_bhead_unqueue(bnode); - - rcu_lock_acquire(&rcu_callback_map); - if (i == 0) { // kmalloc() / kfree(). - trace_rcu_invoke_kfree_bulk_callback( - rcu_state.name, bnode->nr_records, - bnode->records); - - kfree_bulk(bnode->nr_records, bnode->records); - } else { // vmalloc() / vfree(). - for (j = 0; j < bnode->nr_records; j++) { - trace_rcu_invoke_kvfree_callback( - rcu_state.name, bnode->records[j], 0); - - vfree(bnode->records[j]); - } - } - rcu_lock_release(&rcu_callback_map); - - raw_spin_lock_irqsave(&krcp->lock, flags); - if (put_cached_bnode(krcp, bnode)) - bnode = NULL; - raw_spin_unlock_irqrestore(&krcp->lock, flags); - - if (bnode) - free_page((unsigned long) bnode); - - cond_resched_tasks_rcu_qs(); - } - } + for (i = 0; i < FREE_N_CHANNELS; i++) + list_for_each_entry_safe(bnode, n, &bulk_head[i], list) + kvfree_rcu_bulk(krcp, bnode, i); /* * This is used when the "bulk" path can not be used for the @@ -3098,21 +3128,7 @@ static void kfree_rcu_work(struct work_struct *work) * queued on a linked list through their rcu_head structures. * This list is named "Channel 3". */ - for (; head; head = next) { - void *ptr = (void *) head->func; - unsigned long offset = (void *) head - ptr; - - next = head->next; - debug_rcu_head_unqueue((struct rcu_head *)ptr); - rcu_lock_acquire(&rcu_callback_map); - trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset); - - if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) - kvfree(ptr); - - rcu_lock_release(&rcu_callback_map); - cond_resched_tasks_rcu_qs(); - } + kvfree_rcu_list(head); } static bool From patchwork Thu Jan 5 00:24:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089250 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C02AFC54EBD for ; Thu, 5 Jan 2023 00:27:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235632AbjAEA0g (ORCPT ); Wed, 4 Jan 2023 19:26:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235458AbjAEA0F (ORCPT ); Wed, 4 Jan 2023 19:26:05 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D48934FD61; Wed, 4 Jan 2023 16:24:53 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id B4786B8167B; Thu, 5 Jan 2023 00:24:51 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 40566C43392; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672878290; bh=f7VPeHyZLzEgBCJrtOOLYruhtWg4ipMr6+XcpqWU8pk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=S0UK3HPKJw+YCNaT91b6RahkwPZdBiq81FkIwsG9KBREFG09g2/nmVPLGuRh9Slsy P+4TZMVDBBg/vGftfxnLkORAChuSVkc5ICBoEKxZrGgAQVIyHYP5LvO8brggTJuZCJ ktTBwbLdLudB0OkMS8YjPWg/R1L39dxxRVPC9Aa9GRSr6i2iOqnHDu/hhRrrY00qCP F9P96p8iaAWPrftcUINLWz1GevjGXlNDd7pRqQuB2mOtbmxH7YE37QtMsKkarxNaxL LeEi54QJk+Yd+5pStjMBYv7ODBiqtkfZuUYssslpRcmsuf/FJ2a3Rw1CUpSqoL42Nk eI+xcDj0wTyrg== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id E79AC5C1456; Wed, 4 Jan 2023 16:24:49 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, "Uladzislau Rezki (Sony)" , "Paul E . McKenney" Subject: [PATCH rcu 4/8] rcu/kvfree: Move need_offload_krc() out of krcp->lock Date: Wed, 4 Jan 2023 16:24:44 -0800 Message-Id: <20230105002448.1768892-4-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> References: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Uladzislau Rezki (Sony)" The need_offload_krc() function currently holds the krcp->lock in order to safely check krcp->head. This commit removes the need for this lock in that function by updating the krcp->head pointer using WRITE_ONCE() macro so that readers can carry out lockless loads of that pointer. Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Paul E. McKenney --- kernel/rcu/tree.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 839e617f6c370..0c42fce4efe32 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3194,7 +3194,7 @@ static void kfree_rcu_monitor(struct work_struct *work) // objects queued on the linked list. if (!krwp->head_free) { krwp->head_free = krcp->head; - krcp->head = NULL; + WRITE_ONCE(krcp->head, NULL); } WRITE_ONCE(krcp->count, 0); @@ -3208,6 +3208,8 @@ static void kfree_rcu_monitor(struct work_struct *work) } } + raw_spin_unlock_irqrestore(&krcp->lock, flags); + // If there is nothing to detach, it means that our job is // successfully done here. In case of having at least one // of the channels that is still busy we should rearm the @@ -3215,8 +3217,6 @@ static void kfree_rcu_monitor(struct work_struct *work) // still in progress. if (need_offload_krc(krcp)) schedule_delayed_monitor_work(krcp); - - raw_spin_unlock_irqrestore(&krcp->lock, flags); } static enum hrtimer_restart @@ -3386,7 +3386,7 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr) head->func = ptr; head->next = krcp->head; - krcp->head = head; + WRITE_ONCE(krcp->head, head); success = true; } @@ -3463,15 +3463,12 @@ static struct shrinker kfree_rcu_shrinker = { void __init kfree_rcu_scheduler_running(void) { int cpu; - unsigned long flags; for_each_possible_cpu(cpu) { struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); - raw_spin_lock_irqsave(&krcp->lock, flags); if (need_offload_krc(krcp)) schedule_delayed_monitor_work(krcp); - raw_spin_unlock_irqrestore(&krcp->lock, flags); } } From patchwork Thu Jan 5 00:24:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089251 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CFAB1C54EBC for ; Thu, 5 Jan 2023 00:27:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235733AbjAEA0h (ORCPT ); Wed, 4 Jan 2023 19:26:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33380 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235517AbjAEA0I (ORCPT ); Wed, 4 Jan 2023 19:26:08 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A820850F51; Wed, 4 Jan 2023 16:24:55 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id E2DF26188D; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3D264C433F0; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672878290; bh=LCE+SHgExBVrEz2EAshUB2ptHQamWCte/NKiDUBtId4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=QCQBIAXbEoDtbjF0OqnJDoNR0jiYtqCZCEyLTiGY1uD2OGAmEYKRbyPsUJybLSfC8 rOlvkjRFFZKp/rW7wvfFTBKCdAnXyktk3VJXmMmTyHUpywkPevc0SRdPTLuOnWTj1J 6I0rv/ORP1485Nvwutltd6V6SPsrSEyToCDE9RgEP6CCPSiWavLI3zl4UshUUecxoY oUD98NwCIYScSC0/faMcOy65cLDgUkdxKByqyvXl5SDV9cOYzFjyHPDZNjMq//6ntP hrRU/C2S8NCE52NOOWDNAvaMAqqiv3egZF/LSnZCG+vIdr9KUnUfS1BOOZbg9rQXT0 Egen0kTu+23sw== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id EA3D15C149B; Wed, 4 Jan 2023 16:24:49 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, "Uladzislau Rezki (Sony)" , "Paul E . McKenney" Subject: [PATCH rcu 5/8] rcu/kvfree: Use a polled API to speedup a reclaim process Date: Wed, 4 Jan 2023 16:24:45 -0800 Message-Id: <20230105002448.1768892-5-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> References: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Uladzislau Rezki (Sony)" Currently all objects placed into a batch wait for a full grace period to elapse after that batch is ready to send to RCU. However, this can unnecessarily delay freeing of the first objects that were added to the batch. After all, several RCU grace periods might have elapsed since those objects were added, and if so, there is no point in further deferring their freeing. This commit therefore adds per-page grace-period snapshots which are obtained from get_state_synchronize_rcu(). When the batch is ready to be passed to call_rcu(), each page's snapshot is checked by passing it to poll_state_synchronize_rcu(). If a given page's RCU grace period has already elapsed, its objects are freed immediately by kvfree_rcu_bulk(). Otherwise, these objects are freed after a call to synchronize_rcu(). This approach requires that the pages be traversed in reverse order, that is, the oldest ones first. Test example: kvm.sh --memory 10G --torture rcuscale --allcpus --duration 1 \ --kconfig CONFIG_NR_CPUS=64 \ --kconfig CONFIG_RCU_NOCB_CPU=y \ --kconfig CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y \ --kconfig CONFIG_RCU_LAZY=n \ --bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 \ rcuscale.holdoff=20 rcuscale.kfree_loops=10000 \ torture.disable_onoff_at_boot" --trust-make Before this commit: Total time taken by all kfree'ers: 8535693700 ns, loops: 10000, batches: 1188, memory footprint: 2248MB Total time taken by all kfree'ers: 8466933582 ns, loops: 10000, batches: 1157, memory footprint: 2820MB Total time taken by all kfree'ers: 5375602446 ns, loops: 10000, batches: 1130, memory footprint: 6502MB Total time taken by all kfree'ers: 7523283832 ns, loops: 10000, batches: 1006, memory footprint: 3343MB Total time taken by all kfree'ers: 6459171956 ns, loops: 10000, batches: 1150, memory footprint: 6549MB After this commit: Total time taken by all kfree'ers: 8560060176 ns, loops: 10000, batches: 1787, memory footprint: 61MB Total time taken by all kfree'ers: 8573885501 ns, loops: 10000, batches: 1777, memory footprint: 93MB Total time taken by all kfree'ers: 8320000202 ns, loops: 10000, batches: 1727, memory footprint: 66MB Total time taken by all kfree'ers: 8552718794 ns, loops: 10000, batches: 1790, memory footprint: 75MB Total time taken by all kfree'ers: 8601368792 ns, loops: 10000, batches: 1724, memory footprint: 62MB The reduction in memory footprint is well in excess of an order of magnitude. Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Paul E. McKenney --- kernel/rcu/tree.c | 47 +++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 39 insertions(+), 8 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 0c42fce4efe32..735312f78e980 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2877,11 +2877,13 @@ EXPORT_SYMBOL_GPL(call_rcu); /** * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers * @list: List node. All blocks are linked between each other + * @gp_snap: Snapshot of RCU state for objects placed to this bulk * @nr_records: Number of active pointers in the array * @records: Array of the kvfree_rcu() pointers */ struct kvfree_rcu_bulk_data { struct list_head list; + unsigned long gp_snap; unsigned long nr_records; void *records[]; }; @@ -2898,13 +2900,15 @@ struct kvfree_rcu_bulk_data { * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period * @head_free: List of kfree_rcu() objects waiting for a grace period + * @head_free_gp_snap: Snapshot of RCU state for objects placed to "@head_free" * @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period * @krcp: Pointer to @kfree_rcu_cpu structure */ struct kfree_rcu_cpu_work { - struct rcu_work rcu_work; + struct work_struct rcu_work; struct rcu_head *head_free; + unsigned long head_free_gp_snap; struct list_head bulk_head_free[FREE_N_CHANNELS]; struct kfree_rcu_cpu *krcp; }; @@ -3100,10 +3104,11 @@ static void kfree_rcu_work(struct work_struct *work) struct rcu_head *head; struct kfree_rcu_cpu *krcp; struct kfree_rcu_cpu_work *krwp; + unsigned long head_free_gp_snap; int i; - krwp = container_of(to_rcu_work(work), - struct kfree_rcu_cpu_work, rcu_work); + krwp = container_of(work, + struct kfree_rcu_cpu_work, rcu_work); krcp = krwp->krcp; raw_spin_lock_irqsave(&krcp->lock, flags); @@ -3114,12 +3119,29 @@ static void kfree_rcu_work(struct work_struct *work) // Channel 3. head = krwp->head_free; krwp->head_free = NULL; + head_free_gp_snap = krwp->head_free_gp_snap; raw_spin_unlock_irqrestore(&krcp->lock, flags); // Handle the first two channels. - for (i = 0; i < FREE_N_CHANNELS; i++) + for (i = 0; i < FREE_N_CHANNELS; i++) { + // Start from the tail page, so a GP is likely passed for it. + list_for_each_entry_safe_reverse(bnode, n, &bulk_head[i], list) { + // Not yet ready? Bail out since we need one more GP. + if (!poll_state_synchronize_rcu(bnode->gp_snap)) + break; + + list_del_init(&bnode->list); + kvfree_rcu_bulk(krcp, bnode, i); + } + + // Please note a request for one more extra GP can + // occur only once for all objects in this batch. + if (!list_empty(&bulk_head[i])) + synchronize_rcu(); + list_for_each_entry_safe(bnode, n, &bulk_head[i], list) kvfree_rcu_bulk(krcp, bnode, i); + } /* * This is used when the "bulk" path can not be used for the @@ -3128,7 +3150,10 @@ static void kfree_rcu_work(struct work_struct *work) * queued on a linked list through their rcu_head structures. * This list is named "Channel 3". */ - kvfree_rcu_list(head); + if (head) { + cond_synchronize_rcu(head_free_gp_snap); + kvfree_rcu_list(head); + } } static bool @@ -3195,6 +3220,11 @@ static void kfree_rcu_monitor(struct work_struct *work) if (!krwp->head_free) { krwp->head_free = krcp->head; WRITE_ONCE(krcp->head, NULL); + + // Take a snapshot for this krwp. Please note no more + // any objects can be added to attached head_free channel + // therefore fixate a GP for it here. + krwp->head_free_gp_snap = get_state_synchronize_rcu(); } WRITE_ONCE(krcp->count, 0); @@ -3204,7 +3234,7 @@ static void kfree_rcu_monitor(struct work_struct *work) // be that the work is in the pending state when // channels have been detached following by each // other. - queue_rcu_work(system_wq, &krwp->rcu_work); + queue_work(system_wq, &krwp->rcu_work); } } @@ -3332,8 +3362,9 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, list_add(&bnode->list, &(*krcp)->bulk_head[idx]); } - /* Finally insert. */ + // Finally insert and update the GP for this page. bnode->records[bnode->nr_records++] = ptr; + bnode->gp_snap = get_state_synchronize_rcu(); return true; } @@ -4783,7 +4814,7 @@ static void __init kfree_rcu_batch_init(void) struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); for (i = 0; i < KFREE_N_BATCHES; i++) { - INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work); + INIT_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work); krcp->krw_arr[i].krcp = krcp; for (j = 0; j < FREE_N_CHANNELS; j++) From patchwork Thu Jan 5 00:24:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089252 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 044BEC54EF0 for ; Thu, 5 Jan 2023 00:27:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235751AbjAEA0i (ORCPT ); Wed, 4 Jan 2023 19:26:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33866 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240385AbjAEA0S (ORCPT ); Wed, 4 Jan 2023 19:26:18 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ECBA84A976; Wed, 4 Jan 2023 16:25:13 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6BAF961892; Thu, 5 Jan 2023 00:24:51 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9609EC433A0; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672878290; bh=yDaJ2kiBgiBxalJElJRwgXxKwM8lnJrHfQVXCUrdp/o=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=POh+aTc0KBd14nDPvnvVh+qNPHOgWKCdws+JM7+YA9e5ldb2YMcBdkaJeuHrcaOu+ b02yUkMtL/6uKXhJeKJ+6ZTiMtptY6LdXJqwRtAC4q9kFBsX/AylOdXq4kBiw+At9W DdjIf19BQcep8goojS2CIhVzRCMpl4TxPJDOcxm5SkqkHyTAn4BpmQWUrwKnRJDii1 qiNaOwdB7LpnXAWmLEL/HfTxNAC8y2cyDdf5DWlckjcQFpVgLYn8QqpGr9PjYCcxVX JCetQLtt1i2Aqm5whmgV/dA/Ico7yfOv8D+zJGQTQGWg0M+GrrixlUTbIaaHeTW9rA hVW4WBSiot9Qw== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id ECD5E5C1ADF; Wed, 4 Jan 2023 16:24:49 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, "Uladzislau Rezki (Sony)" , "Paul E . McKenney" Subject: [PATCH rcu 6/8] rcu/kvfree: Use READ_ONCE() when access to krcp->head Date: Wed, 4 Jan 2023 16:24:46 -0800 Message-Id: <20230105002448.1768892-6-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> References: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Uladzislau Rezki (Sony)" The need_offload_krc() function is now lock-free, which gives the compiler freedom to load old values from plain C-language loads from the kfree_rcu_cpu struture's ->head pointer. This commit therefore applied READ_ONCE() to these loads. Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Paul E. McKenney --- kernel/rcu/tree.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 735312f78e980..02551e0e11328 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3165,7 +3165,7 @@ need_offload_krc(struct kfree_rcu_cpu *krcp) if (!list_empty(&krcp->bulk_head[i])) return true; - return !!krcp->head; + return !!READ_ONCE(krcp->head); } static void @@ -3206,7 +3206,7 @@ static void kfree_rcu_monitor(struct work_struct *work) // in that case the monitor work is rearmed. if ((!list_empty(&krcp->bulk_head[0]) && list_empty(&krwp->bulk_head_free[0])) || (!list_empty(&krcp->bulk_head[1]) && list_empty(&krwp->bulk_head_free[1])) || - (krcp->head && !krwp->head_free)) { + (READ_ONCE(krcp->head) && !krwp->head_free)) { // Channel 1 corresponds to the SLAB-pointer bulk path. // Channel 2 corresponds to vmalloc-pointer bulk path. From patchwork Thu Jan 5 00:24:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089255 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6991BC54EBC for ; Thu, 5 Jan 2023 00:28:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240359AbjAEA1G (ORCPT ); Wed, 4 Jan 2023 19:27:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33136 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240494AbjAEA0U (ORCPT ); Wed, 4 Jan 2023 19:26:20 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 600E84A941; Wed, 4 Jan 2023 16:25:14 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 7A1766189B; Thu, 5 Jan 2023 00:24:51 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 961D7C433A1; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672878290; bh=d7YTGhFTM4KzsgpxZTGGeZQR3ZpgDoXNIpuouzZiz84=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=vL1tIQBRmDEAMoVcwK2B1dk52NSs4R9n3UydE1soj3iDSaSQ2DgdBE/PWt5IOQN80 ZbyurBktwUn/Zh6WnKQKv9a6frTwR18i4tbZ3OmPHG/yxfdGmY1foCRz3hZe3tlUY5 rztozHaUPWvILIwcm9lanovhKr8JUsBOIAE5bm2TrYo3Esz+sePKgV3Ft9AQKVqyMl 6FooEJFGp1UQe+3XPPCZe0dnOKWHEhAEzZzcsY7HOrhkLITx7/J0RXRdMBeXLz5s0U M1kEGIIj33p9uRKpibrds/LHmGTskCJE3ftNyir5I1BVZuIdUOe3FjjNu9tEaEbOlr glMnhsHN5HcsA== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id EF0FF5C1AE0; Wed, 4 Jan 2023 16:24:49 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, "Uladzislau Rezki (Sony)" , "Paul E . McKenney" Subject: [PATCH rcu 7/8] rcu/kvfree: Carefully reset number of objects in krcp Date: Wed, 4 Jan 2023 16:24:47 -0800 Message-Id: <20230105002448.1768892-7-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> References: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Uladzislau Rezki (Sony)" The schedule_delayed_monitor_work() function relies on the count of objects queued into any given kfree_rcu_cpu structure. This count is used to determine how quickly to schedule passing these objects to RCU. There are three pipes where pointers can be placed. When any pipe is offloaded, the kfree_rcu_cpu structure's ->count counter is set to zero, which is wrong because the other pipes might still be non-empty. This commit therefore maintains per-pipe counters, and introduces a krc_count() helper to access the aggregate value of those counters. Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Paul E. McKenney --- kernel/rcu/tree.c | 40 ++++++++++++++++++++++++++++++---------- 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 02551e0e11328..52f4c7e87f88e 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2921,7 +2921,8 @@ struct kfree_rcu_cpu_work { * @lock: Synchronize access to this structure * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES * @initialized: The @rcu_work fields have been initialized - * @count: Number of objects for which GP not started + * @head_count: Number of objects in rcu_head singular list + * @bulk_count: Number of objects in bulk-list * @bkvcache: * A simple cache list that contains objects for reuse purpose. * In order to save some per-cpu space the list is singular. @@ -2939,13 +2940,19 @@ struct kfree_rcu_cpu_work { * the interactions with the slab allocators. */ struct kfree_rcu_cpu { + // Objects queued on a linked list + // through their rcu_head structures. struct rcu_head *head; + atomic_t head_count; + + // Objects queued on a bulk-list. struct list_head bulk_head[FREE_N_CHANNELS]; + atomic_t bulk_count[FREE_N_CHANNELS]; + struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES]; raw_spinlock_t lock; struct delayed_work monitor_work; bool initialized; - int count; struct delayed_work page_cache_work; atomic_t backoff_page_cache_fill; @@ -3168,12 +3175,23 @@ need_offload_krc(struct kfree_rcu_cpu *krcp) return !!READ_ONCE(krcp->head); } +static int krc_count(struct kfree_rcu_cpu *krcp) +{ + int sum = atomic_read(&krcp->head_count); + int i; + + for (i = 0; i < FREE_N_CHANNELS; i++) + sum += atomic_read(&krcp->bulk_count[i]); + + return sum; +} + static void schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp) { long delay, delay_left; - delay = READ_ONCE(krcp->count) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES; + delay = krc_count(krcp) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES; if (delayed_work_pending(&krcp->monitor_work)) { delay_left = krcp->monitor_work.timer.expires - jiffies; if (delay < delay_left) @@ -3211,8 +3229,10 @@ static void kfree_rcu_monitor(struct work_struct *work) // Channel 1 corresponds to the SLAB-pointer bulk path. // Channel 2 corresponds to vmalloc-pointer bulk path. for (j = 0; j < FREE_N_CHANNELS; j++) { - if (list_empty(&krwp->bulk_head_free[j])) + if (list_empty(&krwp->bulk_head_free[j])) { list_replace_init(&krcp->bulk_head[j], &krwp->bulk_head_free[j]); + atomic_set(&krcp->bulk_count[j], 0); + } } // Channel 3 corresponds to both SLAB and vmalloc @@ -3220,6 +3240,7 @@ static void kfree_rcu_monitor(struct work_struct *work) if (!krwp->head_free) { krwp->head_free = krcp->head; WRITE_ONCE(krcp->head, NULL); + atomic_set(&krcp->head_count, 0); // Take a snapshot for this krwp. Please note no more // any objects can be added to attached head_free channel @@ -3227,8 +3248,6 @@ static void kfree_rcu_monitor(struct work_struct *work) krwp->head_free_gp_snap = get_state_synchronize_rcu(); } - WRITE_ONCE(krcp->count, 0); - // One work is per one batch, so there are three // "free channels", the batch can handle. It can // be that the work is in the pending state when @@ -3365,6 +3384,8 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, // Finally insert and update the GP for this page. bnode->records[bnode->nr_records++] = ptr; bnode->gp_snap = get_state_synchronize_rcu(); + atomic_inc(&(*krcp)->bulk_count[idx]); + return true; } @@ -3418,11 +3439,10 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr) head->func = ptr; head->next = krcp->head; WRITE_ONCE(krcp->head, head); + atomic_inc(&krcp->head_count); success = true; } - WRITE_ONCE(krcp->count, krcp->count + 1); - // Set timer to drain after KFREE_DRAIN_JIFFIES. if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING) schedule_delayed_monitor_work(krcp); @@ -3453,7 +3473,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) for_each_possible_cpu(cpu) { struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); - count += READ_ONCE(krcp->count); + count += krc_count(krcp); count += READ_ONCE(krcp->nr_bkv_objs); atomic_set(&krcp->backoff_page_cache_fill, 1); } @@ -3470,7 +3490,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) int count; struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); - count = krcp->count; + count = krc_count(krcp); count += drain_page_cache(krcp); kfree_rcu_monitor(&krcp->monitor_work.work); From patchwork Thu Jan 5 00:24:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089254 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53EAEC46467 for ; Thu, 5 Jan 2023 00:28:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231258AbjAEA1H (ORCPT ); Wed, 4 Jan 2023 19:27:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33868 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240434AbjAEA0T (ORCPT ); Wed, 4 Jan 2023 19:26:19 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ECB254A975; Wed, 4 Jan 2023 16:25:13 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6335A61888; Thu, 5 Jan 2023 00:24:51 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 98CFAC4339E; Thu, 5 Jan 2023 00:24:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672878290; bh=BPcSGuUJdzaOvgoBas81TEqlhrtXgfDZlFCPUYBEIeM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=khtpiFUCsXev16OPCqlsUBNEnj6Lbk/2xj3B7g6QBfsy+k/hcGlh3/aq+peF2VFI4 x1BLTaPBRRiUbFUO/gH0CrUH2jEXpJZubnQ872PN4Zzoz+arUrHDTyg/1Zc82bpDe5 4M7eY8ciQJZcWnCEc1pFQOfARFzK3uyne+4wOt5l4syPSNb6/awLd73hxFpACj6vjN QG7hmD6vewbXZN5jBgbo+O2ZMy8sv6q5yAdoIafoVypqLS2T9WxeWwu+T69mhsnUye W35SZGFYBr3BNDjG5bSJSs2qEJsrIzd8ZWjJxFVj7FweXxoAa8mLcXGUcZE3cB0huQ m3HcVX4K8XRWw== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id F1C5C5C1C5B; Wed, 4 Jan 2023 16:24:49 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, "Uladzislau Rezki (Sony)" , "Paul E . McKenney" Subject: [PATCH rcu 8/8] rcu/kvfree: Split ready for reclaim objects from a batch Date: Wed, 4 Jan 2023 16:24:48 -0800 Message-Id: <20230105002448.1768892-8-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> References: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Uladzislau Rezki (Sony)" This patch splits the lists of objects so as to avoid sending any through RCU that have already been queued for more than one grace period. These long-term-resident objects are immediately freed. The remaining short-term-resident objects are queued for later freeing using queue_rcu_work(). This change avoids delaying workqueue handlers with synchronize_rcu() invocations. Yes, workqueue handlers are designed to handle blocking, but avoiding blocking when unnecessary improves performance during low-memory situations. Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Paul E. McKenney --- kernel/rcu/tree.c | 87 +++++++++++++++++++++++++++++------------------ 1 file changed, 54 insertions(+), 33 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 52f4c7e87f88e..0b4f7dd551572 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2900,15 +2900,13 @@ struct kvfree_rcu_bulk_data { * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period * @head_free: List of kfree_rcu() objects waiting for a grace period - * @head_free_gp_snap: Snapshot of RCU state for objects placed to "@head_free" * @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period * @krcp: Pointer to @kfree_rcu_cpu structure */ struct kfree_rcu_cpu_work { - struct work_struct rcu_work; + struct rcu_work rcu_work; struct rcu_head *head_free; - unsigned long head_free_gp_snap; struct list_head bulk_head_free[FREE_N_CHANNELS]; struct kfree_rcu_cpu *krcp; }; @@ -2916,6 +2914,7 @@ struct kfree_rcu_cpu_work { /** * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period * @head: List of kfree_rcu() objects not yet waiting for a grace period + * @head_gp_snap: Snapshot of RCU state for objects placed to "@head" * @bulk_head: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period * @lock: Synchronize access to this structure @@ -2943,6 +2942,7 @@ struct kfree_rcu_cpu { // Objects queued on a linked list // through their rcu_head structures. struct rcu_head *head; + unsigned long head_gp_snap; atomic_t head_count; // Objects queued on a bulk-list. @@ -3111,10 +3111,9 @@ static void kfree_rcu_work(struct work_struct *work) struct rcu_head *head; struct kfree_rcu_cpu *krcp; struct kfree_rcu_cpu_work *krwp; - unsigned long head_free_gp_snap; int i; - krwp = container_of(work, + krwp = container_of(to_rcu_work(work), struct kfree_rcu_cpu_work, rcu_work); krcp = krwp->krcp; @@ -3126,26 +3125,11 @@ static void kfree_rcu_work(struct work_struct *work) // Channel 3. head = krwp->head_free; krwp->head_free = NULL; - head_free_gp_snap = krwp->head_free_gp_snap; raw_spin_unlock_irqrestore(&krcp->lock, flags); // Handle the first two channels. for (i = 0; i < FREE_N_CHANNELS; i++) { // Start from the tail page, so a GP is likely passed for it. - list_for_each_entry_safe_reverse(bnode, n, &bulk_head[i], list) { - // Not yet ready? Bail out since we need one more GP. - if (!poll_state_synchronize_rcu(bnode->gp_snap)) - break; - - list_del_init(&bnode->list); - kvfree_rcu_bulk(krcp, bnode, i); - } - - // Please note a request for one more extra GP can - // occur only once for all objects in this batch. - if (!list_empty(&bulk_head[i])) - synchronize_rcu(); - list_for_each_entry_safe(bnode, n, &bulk_head[i], list) kvfree_rcu_bulk(krcp, bnode, i); } @@ -3157,10 +3141,7 @@ static void kfree_rcu_work(struct work_struct *work) * queued on a linked list through their rcu_head structures. * This list is named "Channel 3". */ - if (head) { - cond_synchronize_rcu(head_free_gp_snap); - kvfree_rcu_list(head); - } + kvfree_rcu_list(head); } static bool @@ -3201,6 +3182,44 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp) queue_delayed_work(system_wq, &krcp->monitor_work, delay); } +static void +kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp) +{ + struct list_head bulk_ready[FREE_N_CHANNELS]; + struct kvfree_rcu_bulk_data *bnode, *n; + struct rcu_head *head_ready = NULL; + unsigned long flags; + int i; + + raw_spin_lock_irqsave(&krcp->lock, flags); + for (i = 0; i < FREE_N_CHANNELS; i++) { + INIT_LIST_HEAD(&bulk_ready[i]); + + list_for_each_entry_safe_reverse(bnode, n, &krcp->bulk_head[i], list) { + if (!poll_state_synchronize_rcu(bnode->gp_snap)) + break; + + atomic_sub(bnode->nr_records, &krcp->bulk_count[i]); + list_move(&bnode->list, &bulk_ready[i]); + } + } + + if (krcp->head && poll_state_synchronize_rcu(krcp->head_gp_snap)) { + head_ready = krcp->head; + atomic_set(&krcp->head_count, 0); + WRITE_ONCE(krcp->head, NULL); + } + raw_spin_unlock_irqrestore(&krcp->lock, flags); + + for (i = 0; i < FREE_N_CHANNELS; i++) { + list_for_each_entry_safe(bnode, n, &bulk_ready[i], list) + kvfree_rcu_bulk(krcp, bnode, i); + } + + if (head_ready) + kvfree_rcu_list(head_ready); +} + /* * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. */ @@ -3211,6 +3230,9 @@ static void kfree_rcu_monitor(struct work_struct *work) unsigned long flags; int i, j; + // Drain ready for reclaim. + kvfree_rcu_drain_ready(krcp); + raw_spin_lock_irqsave(&krcp->lock, flags); // Attempt to start a new batch. @@ -3230,8 +3252,9 @@ static void kfree_rcu_monitor(struct work_struct *work) // Channel 2 corresponds to vmalloc-pointer bulk path. for (j = 0; j < FREE_N_CHANNELS; j++) { if (list_empty(&krwp->bulk_head_free[j])) { - list_replace_init(&krcp->bulk_head[j], &krwp->bulk_head_free[j]); atomic_set(&krcp->bulk_count[j], 0); + list_replace_init(&krcp->bulk_head[j], + &krwp->bulk_head_free[j]); } } @@ -3239,13 +3262,8 @@ static void kfree_rcu_monitor(struct work_struct *work) // objects queued on the linked list. if (!krwp->head_free) { krwp->head_free = krcp->head; - WRITE_ONCE(krcp->head, NULL); atomic_set(&krcp->head_count, 0); - - // Take a snapshot for this krwp. Please note no more - // any objects can be added to attached head_free channel - // therefore fixate a GP for it here. - krwp->head_free_gp_snap = get_state_synchronize_rcu(); + WRITE_ONCE(krcp->head, NULL); } // One work is per one batch, so there are three @@ -3253,7 +3271,7 @@ static void kfree_rcu_monitor(struct work_struct *work) // be that the work is in the pending state when // channels have been detached following by each // other. - queue_work(system_wq, &krwp->rcu_work); + queue_rcu_work(system_wq, &krwp->rcu_work); } } @@ -3440,6 +3458,9 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr) head->next = krcp->head; WRITE_ONCE(krcp->head, head); atomic_inc(&krcp->head_count); + + // Take a snapshot for this krcp. + krcp->head_gp_snap = get_state_synchronize_rcu(); success = true; } @@ -4834,7 +4855,7 @@ static void __init kfree_rcu_batch_init(void) struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); for (i = 0; i < KFREE_N_BATCHES; i++) { - INIT_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work); + INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work); krcp->krw_arr[i].krcp = krcp; for (j = 0; j < FREE_N_CHANNELS; j++) From patchwork Fri Feb 3 01:43:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13126912 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6EE58C636D3 for ; Fri, 3 Feb 2023 01:43:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230169AbjBCBnM (ORCPT ); Thu, 2 Feb 2023 20:43:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34016 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232050AbjBCBnK (ORCPT ); Thu, 2 Feb 2023 20:43:10 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 76D0922A0E; Thu, 2 Feb 2023 17:43:07 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id DFCE761D4B; Fri, 3 Feb 2023 01:43:06 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3FA1BC433EF; Fri, 3 Feb 2023 01:43:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1675388586; bh=8zWI8bpAWD3oVlVykWtql5OSUtDYRERkD7toqAUb7R8=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=sSzjqSKG0c2v4pEaoyOyx0z3psnCdMrbABrkOsIQ7QR3fJYvsCR9bTr4H4GLesSt2 iFE4qI/QNPP83WUAsy6rsNZwOjk3r7pmuVDXSZ32g3gQGxHw/3bE5O0RJaLjpm17a4 UapoOYxj85CAZGh3VE5OuQOPT9oA9XvEUxW/AjzRNhRpLQq80yCu7yu4g+qgHI2dUo 5Mv7cjr1cMqFvM+/kton+7O9uFxzionWH7MdobojY0Gx97PiWCfwfqrmtwwJHfzHFM Ws2y3TFkDkt9vjP/M9/Du0L4a1lskYOIP/6w9kdurWRuldvah7clBRFP0HASL94ALN 15C9AAS1x6uzw== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id D608B5C0DE7; Thu, 2 Feb 2023 17:43:05 -0800 (PST) Date: Thu, 2 Feb 2023 17:43:05 -0800 From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org Subject: [PATCH rcu 9/8] Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() Message-ID: <20230203014305.GA1075064@paulmck-ThinkPad-P17-Gen-1> Reply-To: paulmck@kernel.org References: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230105002441.GA1768817@paulmck-ThinkPad-P17-Gen-1> Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org The kvfree_rcu() and kfree_rcu() APIs are hazardous in that if you forget the second argument, it works, but might sleep. This sleeping can be a correctness bug from atomic contexts, and even in non-atomic contexts it might introduce unacceptable latencies. This commit therefore adds kvfree_rcu_mightsleep() and kfree_rcu_mightsleep(), which will replace the single-argument kvfree_rcu() and kfree_rcu(), respectively. This commit enables a series of commits that switch from single-argument kvfree_rcu() and kfree_rcu() to their _mightsleep() counterparts. Once all of these commits land, the single-argument versions will be removed. Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Paul E. McKenney diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index f38d4469d7f30..84433600885a6 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -1004,6 +1004,9 @@ static inline notrace void rcu_read_unlock_sched_notrace(void) #define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__, \ kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__) +#define kvfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr) +#define kfree_rcu_mightsleep(ptr) kvfree_rcu_mightsleep(ptr) + #define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME #define kvfree_rcu_arg_2(ptr, rhf) \ do { \