From patchwork Tue Sep 5 01:38:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 13374717 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6759CA0FF8 for ; Tue, 5 Sep 2023 16:05:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238290AbjIEQFr (ORCPT ); Tue, 5 Sep 2023 12:05:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244974AbjIEBkr (ORCPT ); Mon, 4 Sep 2023 21:40:47 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E3237CC5 for ; Mon, 4 Sep 2023 18:40:43 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 8C13B21850; Tue, 5 Sep 2023 01:40:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1693878042; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ppxSi48FD0R0OjKV6ME1dGirA2s4Zo94mbpK835jSXY=; b=t3Havtc3mVCwqMfl/j7tjaOOiMAx7/M+IG600yLtP65kEURIxljNWksCJPsDapX8UEeE49 l2dDpJc1+74mV+CjxIqyHL3ysVzmei5Nb1EI+jUAZ5am2OV8QHE+13X6PdYMzj9Ktzmti6 HtPt26iA6R5XaB7hOOmlHSyHX/BYgYE= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1693878042; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ppxSi48FD0R0OjKV6ME1dGirA2s4Zo94mbpK835jSXY=; b=6O9cIPcHcMJFHMi/F211xLLqatOKt5DzSewY2gkADAHCobPg+YJPnp0BahA/Ype21hqoSQ atgb2R05IYQiYaDw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 4D8A113499; Tue, 5 Sep 2023 01:40:41 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id e/PDABmH9mSeUwAAMHmgww (envelope-from ); Tue, 05 Sep 2023 01:40:41 +0000 From: NeilBrown To: Chuck Lever , Jeff Layton Cc: linux-nfs@vger.kernel.org Subject: [PATCH 1/3] lib: add light-weight queuing mechanism. Date: Tue, 5 Sep 2023 11:38:11 +1000 Message-ID: <20230905014011.25472-2-neilb@suse.de> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230905014011.25472-1-neilb@suse.de> References: <20230905014011.25472-1-neilb@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org lwq is a FIFO single-linked queue that only requires a spinlock for dequeueing, which happens in process context. Enqueueing is atomic with no spinlock and can happen in any context. Include a unit test for basic functionality - runs at boot time. Does not use kunit framework. Signed-off-by: NeilBrown --- include/linux/lwq.h | 120 +++++++++++++++++++++++++++++++++++ lib/Kconfig | 5 ++ lib/Makefile | 2 +- lib/lwq.c | 151 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 277 insertions(+), 1 deletion(-) create mode 100644 include/linux/lwq.h create mode 100644 lib/lwq.c diff --git a/include/linux/lwq.h b/include/linux/lwq.h new file mode 100644 index 000000000000..52b9c81b493a --- /dev/null +++ b/include/linux/lwq.h @@ -0,0 +1,120 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef LWQ_H +#define LWQ_H +/* + * light-weight single-linked queue built from llist + * + * Entries can be enqueued from any context with no locking. + * Entries can be dequeued from process context with integrated locking. + */ +#include +#include +#include + +struct lwq_node { + struct llist_node node; +}; + +struct lwq { + spinlock_t lock; + struct llist_node *ready; /* entries to be dequeued */ + struct llist_head new; /* entries being enqueued */ +}; + +/** + * lwq_init - initialise a lwq + * @q: the lwq object + */ +static inline void lwq_init(struct lwq *q) +{ + spin_lock_init(&q->lock); + q->ready = NULL; + init_llist_head(&q->new); +} + +/** + * lwq_empty - test if lwq contains any entry + * @q: the lwq object + * + * This empty test contains an acquire barrier so that if a wakeup + * is sent when lwq_dequeue returns true, it is safe to go to sleep after + * a test on lwq_empty(). + */ +static inline bool lwq_empty(struct lwq *q) +{ + /* acquire ensures ordering wrt lwq_enqueue() */ + return smp_load_acquire(&q->ready) == NULL && llist_empty(&q->new); +} + +struct llist_node *__lwq_dequeue(struct lwq *q); +/** + * lwq_dequeue - dequeue first (oldest) entry from lwq + * @q: the queue to dequeue from + * @type: the type of object to return + * @member: them member in returned object which is an lwq_node. + * + * Remove a single object from the lwq and return it. This will take + * a spinlock and so must always be called in the same context, typcially + * process contet. + */ +#define lwq_dequeue(q, type, member) \ + ({ struct llist_node *_n = __lwq_dequeue(q); \ + _n ? container_of(_n, type, member.node) : NULL; }) + +struct llist_node *lwq_dequeue_all(struct lwq *q); + +/** + * lwq_for_each_safe - iterate over detached queue allowing deletion + * @_n: iterator variable + * @_t1: temporary struct llist_node ** + * @_t2: temporary struct llist_node * + * @_l: address of llist_node pointer from lwq_dequeue_all() + * @_member: member in _n where lwq_node is found. + * + * Iterate over members in a dequeued list. If the iterator variable + * is set to NULL, the iterator removes that entry from the queue. + */ +#define lwq_for_each_safe(_n, _t1, _t2, _l, _member) \ + for (_t1 = (_l); \ + *(_t1) ? (_n = container_of(*(_t1), typeof(*(_n)), _member.node),\ + _t2 = ((*_t1)->next), \ + true) \ + : false; \ + (_n) ? (_t1 = &(_n)->_member.node.next, 0) \ + : ((*(_t1) = (_t2)), 0)) + +/** + * lwq_enqueue - add a new item to the end of the queue + * @n - the lwq_node embedded in the item to be added + * @q - the lwq to append to. + * + * No locking is needed to append to the queue so this can + * be called from any context. + * Return %true is the list may have previously been empty. + */ +static inline bool lwq_enqueue(struct lwq_node *n, struct lwq *q) +{ + /* acquire enqures ordering wrt lwq_dequeue */ + return llist_add(&n->node, &q->new) && + smp_load_acquire(&q->ready) == NULL; +} + +/** + * lwq_enqueue_batch - add a list of new items to the end of the queue + * @n - the lwq_node embedded in the first item to be added + * @q - the lwq to append to. + * + * No locking is needed to append to the queue so this can + * be called from any context. + * Return %true is the list may have previously been empty. + */ +static inline bool lwq_enqueue_batch(struct llist_node *n, struct lwq *q) +{ + struct llist_node *e = n; + + /* acquire enqures ordering wrt lwq_dequeue */ + return llist_add_batch(llist_reverse_order(n), e, &q->new) && + smp_load_acquire(&q->ready) == NULL; +} +#endif /* LWQ_H */ diff --git a/lib/Kconfig b/lib/Kconfig index 5c2da561c516..d9a12a81fd24 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -763,3 +763,8 @@ config ASN1_ENCODER config POLYNOMIAL tristate + +config LWQ_TEST + bool "lib: enable boot-time test for lwq queuing" + help + Enable boot-time test of lwq functionality. diff --git a/lib/Makefile b/lib/Makefile index 1ffae65bb7ee..4b67c2d6af62 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -45,7 +45,7 @@ obj-y += lockref.o obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \ list_sort.o uuid.o iov_iter.o clz_ctz.o \ - bsearch.o find_bit.o llist.o memweight.o kfifo.o \ + bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \ percpu-refcount.o rhashtable.o base64.o \ once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \ generic-radix-tree.o diff --git a/lib/lwq.c b/lib/lwq.c new file mode 100644 index 000000000000..7fe6c7125357 --- /dev/null +++ b/lib/lwq.c @@ -0,0 +1,151 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Light weight single-linked queue. + * + * Entries are enqueued to the head of an llist, with no blocking. + * This can happen in any context. + * + * Entries are dequeued using a spinlock to protect against + * multiple access. The llist is staged in reverse order, and refreshed + * from the llist when it exhausts. + */ +#include +#include + +struct llist_node *__lwq_dequeue(struct lwq *q) +{ + struct llist_node *this; + + if (lwq_empty(q)) + return NULL; + spin_lock(&q->lock); + this = q->ready; + if (!this && !llist_empty(&q->new)) { + /* ensure queue doesn't appear transiently lwq_empty */ + smp_store_release(&q->ready, (void *)1); + this = llist_reverse_order(llist_del_all(&q->new)); + if (!this) + q->ready = NULL; + } + if (this) + q->ready = llist_next(this); + spin_unlock(&q->lock); + return this; +} +EXPORT_SYMBOL_GPL(__lwq_dequeue); + +/** + * lwq_dequeue_all - dequeue all currently enqueued objects + * @q: the queue to dequeue from + * + * Remove and return a linked list of llist_nodes of all the objects that were + * in the queue. The first on the list will be the object that was least + * recently enqueued. + */ +struct llist_node *lwq_dequeue_all(struct lwq *q) +{ + struct llist_node *r, *t, **ep; + + if (lwq_empty(q)) + return NULL; + + spin_lock(&q->lock); + r = q->ready; + q->ready = NULL; + t = llist_del_all(&q->new); + spin_unlock(&q->lock); + ep = &r; + while (*ep) + ep = &(*ep)->next; + *ep = llist_reverse_order(t); + return r; +} +EXPORT_SYMBOL_GPL(lwq_dequeue_all); + +#if IS_ENABLED(CONFIG_LWQ_TEST) + +#include +#include +#include +#include +#include +struct tnode { + struct lwq_node n; + int i; + int c; +}; + +static int lwq_exercise(void *qv) +{ + struct lwq *q = qv; + int cnt; + struct tnode *t; + + for (cnt = 0; cnt < 10000; cnt++) { + wait_var_event(q, (t = lwq_dequeue(q, struct tnode, n)) != NULL); + t->c++; + if (lwq_enqueue(&t->n, q)) + wake_up_var(q); + } + while (!kthread_should_stop()) + schedule_timeout_idle(1); + return 0; +} + +static int lwq_test(void) +{ + int i; + struct lwq q; + struct llist_node *l, **t1, *t2; + struct tnode *t; + struct task_struct *threads[8]; + + printk(KERN_INFO "testing lwq....\n"); + lwq_init(&q); + printk(KERN_INFO " lwq: run some threads\n"); + for (i = 0; i < ARRAY_SIZE(threads); i++) + threads[i] = kthread_run(lwq_exercise, &q, "lwq-test-%d", i); + for (i = 0; i < 100; i++) { + t = kmalloc(sizeof(*t), GFP_KERNEL); + t->i = i; + t->c = 0; + if (lwq_enqueue(&t->n, &q)) + wake_up_var(&q); + }; + /* wait for threads to exit */ + for (i = 0; i < ARRAY_SIZE(threads); i++) + if (!IS_ERR_OR_NULL(threads[i])) + kthread_stop(threads[i]); + printk(KERN_INFO " lwq: dequeue first 50:"); + for (i = 0; i < 50 ; i++) { + if (i && (i % 10) == 0) { + printk(KERN_CONT "\n"); + printk(KERN_INFO " lwq: ... "); + } + t = lwq_dequeue(&q, struct tnode, n); + printk(KERN_CONT " %d(%d)", t->i, t->c); + kfree(t); + } + printk(KERN_CONT "\n"); + l = lwq_dequeue_all(&q); + printk(KERN_INFO " lwq: delete the multiples of 3 (test lwq_for_each_safe())\n"); + lwq_for_each_safe(t, t1, t2, &l, n) { + if ((t->i % 3) == 0) { + t->i = -1; + kfree(t); + t = NULL; + } + } + if (l) + lwq_enqueue_batch(l, &q); + printk(KERN_INFO " lwq: dequeue remaining:"); + while ((t = lwq_dequeue(&q, struct tnode, n)) != NULL) { + printk(KERN_CONT " %d", t->i); + kfree(t); + } + printk(KERN_CONT "\n"); + return 0; +} + +module_init(lwq_test); +#endif /* CONFIG_LWQ_TEST*/ From patchwork Tue Sep 5 01:38:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 13374715 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C77DCA0FE2 for ; Tue, 5 Sep 2023 16:05:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231438AbjIEQF2 (ORCPT ); Tue, 5 Sep 2023 12:05:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244977AbjIEBkw (ORCPT ); Mon, 4 Sep 2023 21:40:52 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D7137CC5 for ; Mon, 4 Sep 2023 18:40:48 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 82D7C21850; Tue, 5 Sep 2023 01:40:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1693878047; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VjccyfIbUBpHg120mqmn5EIK1r/ysxaJ5q/b2jkUkkQ=; b=P6dVW/2dmSMNm3QXHK/wnEPpz2AbqT+F08pEsd12DGNfx22ZEPRsXqAG/8Xt0zXBQGlxDq 04k6Fw6wu3FXOX7vfU9Vf0B5Z8WNOAZjVGJBTsKXfKCEPLWRg3BOZyDSYkwrkT0I3Byse/ lHOAOcGyS4LNn4R9YXK3BReiXUU8Fsc= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1693878047; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VjccyfIbUBpHg120mqmn5EIK1r/ysxaJ5q/b2jkUkkQ=; b=2JyT9v1Y+if9sgNFDHuS1lAAvnZn50XoxMZWXfOoGVGJoKi245Au9ifVBzh+8vdBF6shqu u/w2QfeptURgLgDA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 47A9213499; Tue, 5 Sep 2023 01:40:45 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id q9b0Oh2H9mSuUwAAMHmgww (envelope-from ); Tue, 05 Sep 2023 01:40:45 +0000 From: NeilBrown To: Chuck Lever , Jeff Layton Cc: linux-nfs@vger.kernel.org Subject: [PATCH 2/3] SUNRPC: rename some functions from rqst_ to svc_thread_ Date: Tue, 5 Sep 2023 11:38:12 +1000 Message-ID: <20230905014011.25472-3-neilb@suse.de> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230905014011.25472-1-neilb@suse.de> References: <20230905014011.25472-1-neilb@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org Functions which directly manipulate a 'struct rqst', such as svc_rqst_alloc() or svc_rqst_release_pages(), can reasonably have "rqst" in there name. However functions that act on the running thread, such as XX_should_sleep() or XX_wait_for_work() should seem more natural with a "svc_thread_" prefix. So make those changes. Signed-off-by: NeilBrown --- net/sunrpc/svc_xprt.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 17c43bde35c9..1b300a7889eb 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -699,7 +699,7 @@ static bool svc_alloc_arg(struct svc_rqst *rqstp) } static bool -rqst_should_sleep(struct svc_rqst *rqstp) +svc_thread_should_sleep(struct svc_rqst *rqstp) { struct svc_pool *pool = rqstp->rq_pool; @@ -725,15 +725,15 @@ rqst_should_sleep(struct svc_rqst *rqstp) return true; } -static void svc_rqst_wait_for_work(struct svc_rqst *rqstp) +static void svc_thread_wait_for_work(struct svc_rqst *rqstp) { struct svc_pool *pool = rqstp->rq_pool; - if (rqst_should_sleep(rqstp)) { + if (svc_thread_should_sleep(rqstp)) { set_current_state(TASK_IDLE | TASK_FREEZABLE); llist_add(&rqstp->rq_idle, &pool->sp_idle_threads); - if (unlikely(!rqst_should_sleep(rqstp))) + if (unlikely(!svc_thread_should_sleep(rqstp))) /* Work just became available. This thread cannot simply * choose not to sleep as it *must* wait until removed. * So wake the first waiter - whether it is this @@ -850,7 +850,7 @@ void svc_recv(struct svc_rqst *rqstp) if (!svc_alloc_arg(rqstp)) return; - svc_rqst_wait_for_work(rqstp); + svc_thread_wait_for_work(rqstp); clear_bit(SP_TASK_PENDING, &pool->sp_flags); From patchwork Tue Sep 5 01:38:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 13374718 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 572E0CA0FFA for ; Tue, 5 Sep 2023 16:05:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238323AbjIEQFu (ORCPT ); Tue, 5 Sep 2023 12:05:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56122 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244980AbjIEBk5 (ORCPT ); Mon, 4 Sep 2023 21:40:57 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F3A8CC6 for ; Mon, 4 Sep 2023 18:40:54 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 8344921850; Tue, 5 Sep 2023 01:40:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1693878052; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lV5LP+XztS+xUnHadOFZmS0i3CHGnCGHFhK1BP4IOa8=; b=H7+j3MIrc25pz7OpCF+UvUncuh6MMW54ooiGUeOaymPEi0rAl6vZ8E+WFMLcOI4aMQj9B6 SCGDuPH6X/38LwGz5puS97YLrNJawpMqYFvztZXtLI0Lv0VS3jWohXaCVpVKTg3/11eIqJ HGRn0LGENHjlKpfSLODTMtmQn1+son4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1693878052; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lV5LP+XztS+xUnHadOFZmS0i3CHGnCGHFhK1BP4IOa8=; b=xfW+yQMRcscEus7nUdBZdh34YXhHSUYnmHUUptBl1SeB0LrZB33NyPCZPd1AWJX//S+fpi R8bwd3QUjywfNuBw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 47FF613499; Tue, 5 Sep 2023 01:40:50 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id TIoIOyKH9mS0UwAAMHmgww (envelope-from ); Tue, 05 Sep 2023 01:40:50 +0000 From: NeilBrown To: Chuck Lever , Jeff Layton Cc: linux-nfs@vger.kernel.org Subject: [PATCH 3/3] SUNRPC: only have one thread waking up at a time Date: Tue, 5 Sep 2023 11:38:13 +1000 Message-ID: <20230905014011.25472-4-neilb@suse.de> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230905014011.25472-1-neilb@suse.de> References: <20230905014011.25472-1-neilb@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org Currently if several items of work become available in quick succession, that number of threads (if available) will be woken. By the time some of them wake up another thread that was already cache-warm might have come along and completed the work. Anecdotal evidence suggests as many as 15% of wakes find nothing to do once they get to the point of looking. This patch changes svc_pool_wake_idle_thread() to wake the first thread on the queue but NOT remove it. Subsequent calls will wake the same thread. Once that thread starts it will dequeue itself and after dequeueing some work to do, it will wake the next thread if there is more work ready. This results in a more orderly increase in the number of busy threads. As a bonus, this allows us to reduce locking around the idle queue. svc_pool_wake_idle_thread() no longer needs to take a lock (beyond rcu_read_lock()) as it doesn't manipulate the queue, it just looks at the first item. The thread itself can avoid locking by using the new llist_del_first_this() interface. This will safely remove the thread itself if it is the head. If it isn't the head, it will do nothing. If multiple threads call this concurrently only one will succeed. The others will do nothing, so no corruption can result. If a thread wakes up and finds that it cannot dequeue itself that means either - that it wasn't woken because it was the head of the queue. Maybe the freezer woke it. In that case it can go back to sleep (after trying to freeze of course). - some other thread found there was nothing to do very recently, and placed itself on the head of the queue in front of this thread. It must check again after placing itself there, so it can be deemed to be responsible for any pending work, and this thread can go back to sleep until woken. No code ever tests for busy threads any more. Only each thread itself cares if it is busy. So svc_thread_busy() is no longer needed. Signed-off-by: NeilBrown --- include/linux/sunrpc/svc.h | 11 ----------- net/sunrpc/svc.c | 14 ++++++-------- net/sunrpc/svc_xprt.c | 38 +++++++++++++++++++++++++------------- 3 files changed, 31 insertions(+), 32 deletions(-) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index ad4572630335..dafa362b4fdd 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -266,17 +266,6 @@ enum { RQ_DATA, /* request has data */ }; -/** - * svc_thread_busy - check if a thread as busy - * @rqstp: the thread which might be busy - * - * A thread is only busy when it is not an the idle list. - */ -static inline bool svc_thread_busy(const struct svc_rqst *rqstp) -{ - return !llist_on_list(&rqstp->rq_idle); -} - #define SVC_NET(rqst) (rqst->rq_xprt ? rqst->rq_xprt->xpt_net : rqst->rq_bc_net) /* diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 5673f30db295..3267d740235e 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -642,7 +642,6 @@ svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node) folio_batch_init(&rqstp->rq_fbatch); - init_llist_node(&rqstp->rq_idle); rqstp->rq_server = serv; rqstp->rq_pool = pool; @@ -704,17 +703,16 @@ void svc_pool_wake_idle_thread(struct svc_pool *pool) struct llist_node *ln; rcu_read_lock(); - spin_lock_bh(&pool->sp_lock); - ln = llist_del_first_init(&pool->sp_idle_threads); - spin_unlock_bh(&pool->sp_lock); + ln = READ_ONCE(pool->sp_idle_threads.first); if (ln) { rqstp = llist_entry(ln, struct svc_rqst, rq_idle); - WRITE_ONCE(rqstp->rq_qtime, ktime_get()); - wake_up_process(rqstp->rq_task); + if (!task_is_running(rqstp->rq_task)) { + wake_up_process(rqstp->rq_task); + trace_svc_wake_up(rqstp->rq_task->pid); + percpu_counter_inc(&pool->sp_threads_woken); + } rcu_read_unlock(); - percpu_counter_inc(&pool->sp_threads_woken); - trace_svc_wake_up(rqstp->rq_task->pid); return; } rcu_read_unlock(); diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 1b300a7889eb..75f66714e3a7 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -732,20 +732,19 @@ static void svc_thread_wait_for_work(struct svc_rqst *rqstp) if (svc_thread_should_sleep(rqstp)) { set_current_state(TASK_IDLE | TASK_FREEZABLE); llist_add(&rqstp->rq_idle, &pool->sp_idle_threads); + if (likely(svc_thread_should_sleep(rqstp))) + schedule(); - if (unlikely(!svc_thread_should_sleep(rqstp))) - /* Work just became available. This thread cannot simply - * choose not to sleep as it *must* wait until removed. - * So wake the first waiter - whether it is this - * thread or some other, it will get the work done. + while (!llist_del_first_this(&pool->sp_idle_threads, + &rqstp->rq_idle)) { + /* Work just became available. This thread can only + * handle it after removing rqstp from the idle + * list. If that attempt failed, some other thread + * must have queued itself after finding no + * work to do, so that thread has taken responsibly + * for this new work. This thread can safely sleep + * until woken again. */ - svc_pool_wake_idle_thread(pool); - - /* Since a thread cannot remove itself from an llist, - * schedule until someone else removes @rqstp from - * the idle list. - */ - while (!svc_thread_busy(rqstp)) { schedule(); set_current_state(TASK_IDLE | TASK_FREEZABLE); } @@ -835,6 +834,15 @@ static void svc_handle_xprt(struct svc_rqst *rqstp, struct svc_xprt *xprt) svc_xprt_release(rqstp); } +static void svc_thread_wake_next(struct svc_rqst *rqstp) +{ + if (!svc_thread_should_sleep(rqstp)) + /* More work pending after I dequeued some, + * wake another worker + */ + svc_pool_wake_idle_thread(rqstp->rq_pool); +} + /** * svc_recv - Receive and process the next request on any transport * @rqstp: an idle RPC service thread @@ -854,13 +862,16 @@ void svc_recv(struct svc_rqst *rqstp) clear_bit(SP_TASK_PENDING, &pool->sp_flags); - if (svc_thread_should_stop(rqstp)) + if (svc_thread_should_stop(rqstp)) { + svc_thread_wake_next(rqstp); return; + } rqstp->rq_xprt = svc_xprt_dequeue(pool); if (rqstp->rq_xprt) { struct svc_xprt *xprt = rqstp->rq_xprt; + svc_thread_wake_next(rqstp); /* Normally we will wait up to 5 seconds for any required * cache information to be provided. When there are no * idle threads, we reduce the wait time. @@ -885,6 +896,7 @@ void svc_recv(struct svc_rqst *rqstp) if (req) { list_del(&req->rq_bc_list); spin_unlock_bh(&serv->sv_cb_lock); + svc_thread_wake_next(rqstp); svc_process_bc(req, rqstp); return;