From patchwork Tue Mar 26 18:42:45 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13604864
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 1/4] io_uring: use the right type for work_llist empty check
Date: Tue, 26 Mar 2024 12:42:45 -0600
Message-ID: <20240326184615.458820-2-axboe@kernel.dk>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240326184615.458820-1-axboe@kernel.dk>
References: <20240326184615.458820-1-axboe@kernel.dk>

io_task_work_pending() uses wq_list_empty() on ctx->work_llist, but it's not an io_wq_work_list, it's a struct llist_head. They both have ->first as head-of-list, and it turns out the checks are identical. But be proper and use the right helper.

Fixes: dac6a0eae793 ("io_uring: ensure iopoll runs local task work as well")
Signed-off-by: Jens Axboe
---
 io_uring/io_uring.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index caf1f573bb87..27d039ddb05e 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -343,7 +343,7 @@ static inline int io_run_task_work(void) static inline bool io_task_work_pending(struct io_ring_ctx *ctx) { - return task_work_pending(current) || !wq_list_empty(&ctx->work_llist); + return task_work_pending(current) || !llist_empty(&ctx->work_llist); } static inline void io_tw_lock(struct io_ring_ctx *ctx, struct io_tw_state *ts)
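For context on why the mismatched helper happened to work at all: both list heads start with a ->first pointer, so an emptiness test written against either type compiles down to the same NULL check. A simplified sketch of the two shapes (illustration only, with made-up _sketch names; not the exact kernel declarations):

#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types, illustration only. */
struct llist_node { struct llist_node *next; };
struct llist_head { struct llist_node *first; };

struct io_wq_work_node { struct io_wq_work_node *next; };
struct io_wq_work_list {
	struct io_wq_work_node *first;
	struct io_wq_work_node *last;
};

/*
 * Both "empty" checks reduce to "is ->first NULL?", and ->first is the
 * first member of both head types, which is why calling the io_wq helper
 * on a struct llist_head happened to give the right answer.
 */
static inline bool llist_empty_sketch(const struct llist_head *head)
{
	return head->first == NULL;
}

static inline bool wq_list_empty_sketch(const struct io_wq_work_list *list)
{
	return list->first == NULL;
}

The fix is about type-correctness and readability rather than behaviour; the check itself is the same either way.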
From patchwork Tue Mar 26 18:42:46 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13604865
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 2/4] io_uring: switch deferred task_work to an io_wq_work_list
Date: Tue, 26 Mar 2024 12:42:46 -0600
Message-ID: <20240326184615.458820-3-axboe@kernel.dk>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240326184615.458820-1-axboe@kernel.dk>
References: <20240326184615.458820-1-axboe@kernel.dk>

Lockless lists may be handy for some things, but they leave items in reverse order, since new entries can only be added at the head of the list. That in turn means that anything iterating the list has to reverse it first, if it is sensitive to the ordering between items.

Switch the DEFER_TASKRUN work list from an llist to a normal io_wq_work_list, and protect it with a lock.
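To make the ordering point concrete, a rough standalone sketch of the two approaches (simplified userspace types and made-up _sketch names, not the kernel implementation): a lock-free list pushes new entries at the head and has to be reversed before running, while a lock-protected list with a tail pointer hands work back in the order it was added.

#include <pthread.h>
#include <stddef.h>

struct node { struct node *next; };

/*
 * Lock-free style: producers push to the head, so the consumer sees the
 * newest item first and has to reverse the chain to get FIFO order
 * (this is what llist_reverse_order() does for a real llist).
 */
static struct node *reverse_sketch(struct node *n)
{
	struct node *prev = NULL;

	while (n) {
		struct node *next = n->next;

		n->next = prev;
		prev = n;
		n = next;
	}
	return prev;
}

/*
 * Lock-protected list with a tail pointer: items are appended in arrival
 * order, so the consumer can splice the whole list and run it as-is.
 */
struct work_list_sketch {
	struct node *first, *last;
	pthread_mutex_t lock;
};

static void add_tail_sketch(struct work_list_sketch *wl, struct node *n)
{
	n->next = NULL;
	pthread_mutex_lock(&wl->lock);
	if (wl->last)
		wl->last->next = n;
	else
		wl->first = n;
	wl->last = n;
	pthread_mutex_unlock(&wl->lock);
}

static struct node *splice_all_sketch(struct work_list_sketch *wl)
{
	struct node *n;

	pthread_mutex_lock(&wl->lock);
	n = wl->first;
	wl->first = wl->last = NULL;
	pthread_mutex_unlock(&wl->lock);
	return n;	/* already oldest-first: no reverse_sketch() needed */
}

That is the trade being made here: a short spinlock hold on the add side in exchange for never reversing the list on the run side.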
Then we can get rid of the manual reversing of the list when running it, which takes considerable cycles particularly for bursty task_work additions. Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 11 ++-- io_uring/io_uring.c | 117 ++++++++++++--------------------- io_uring/io_uring.h | 4 +- 3 files changed, 51 insertions(+), 81 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index aeb4639785b5..e51bf15196e4 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -329,7 +329,9 @@ struct io_ring_ctx { * regularly bounce b/w CPUs. */ struct { - struct llist_head work_llist; + struct io_wq_work_list work_list; + spinlock_t work_lock; + int work_items; unsigned long check_cq; atomic_t cq_wait_nr; atomic_t cq_timeouts; @@ -559,7 +561,10 @@ enum { typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); struct io_task_work { - struct llist_node node; + union { + struct io_wq_work_node node; + struct llist_node llist_node; + }; io_req_tw_func_t func; }; @@ -615,8 +620,6 @@ struct io_kiocb { */ u16 buf_index; - unsigned nr_tw; - /* REQ_F_* flags */ io_req_flags_t flags; diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 87d7d8bbf814..9c06911077db 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -249,7 +249,7 @@ static __cold void io_fallback_req_func(struct work_struct *work) percpu_ref_get(&ctx->refs); mutex_lock(&ctx->uring_lock); - llist_for_each_entry_safe(req, tmp, node, io_task_work.node) + llist_for_each_entry_safe(req, tmp, node, io_task_work.llist_node) req->io_task_work.func(req, &ts); io_submit_flush_completions(ctx); mutex_unlock(&ctx->uring_lock); @@ -330,7 +330,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) INIT_LIST_HEAD(&ctx->timeout_list); INIT_LIST_HEAD(&ctx->ltimeout_list); INIT_LIST_HEAD(&ctx->rsrc_ref_list); - init_llist_head(&ctx->work_llist); + INIT_WQ_LIST(&ctx->work_list); + spin_lock_init(&ctx->work_lock); INIT_LIST_HEAD(&ctx->tctx_list); ctx->submit_state.free_list.next = NULL; INIT_WQ_LIST(&ctx->locked_free_list); @@ -1135,7 +1136,7 @@ struct llist_node *io_handle_tw_list(struct llist_node *node, do { struct llist_node *next = node->next; struct io_kiocb *req = container_of(node, struct io_kiocb, - io_task_work.node); + io_task_work.llist_node); if (req->ctx != ctx) { ctx_flush_and_put(ctx, &ts); @@ -1159,20 +1160,6 @@ struct llist_node *io_handle_tw_list(struct llist_node *node, return node; } -/** - * io_llist_xchg - swap all entries in a lock-less list - * @head: the head of lock-less list to delete all entries - * @new: new entry as the head of the list - * - * If list is empty, return NULL, otherwise, return the pointer to the first entry. - * The order of entries returned is from the newest to the oldest added one. 
- */ -static inline struct llist_node *io_llist_xchg(struct llist_head *head, - struct llist_node *new) -{ - return xchg(&head->first, new); -} - static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) { struct llist_node *node = llist_del_all(&tctx->task_list); @@ -1180,7 +1167,7 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) struct io_kiocb *req; while (node) { - req = container_of(node, struct io_kiocb, io_task_work.node); + req = container_of(node, struct io_kiocb, io_task_work.llist_node); node = node->next; if (sync && last_ctx != req->ctx) { if (last_ctx) { @@ -1190,7 +1177,7 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) last_ctx = req->ctx; percpu_ref_get(&last_ctx->refs); } - if (llist_add(&req->io_task_work.node, + if (llist_add(&req->io_task_work.llist_node, &req->ctx->fallback_llist)) schedule_delayed_work(&req->ctx->fallback_work, 1); } @@ -1238,48 +1225,26 @@ void tctx_task_work(struct callback_head *cb) WARN_ON_ONCE(ret); } -static inline void io_req_local_work_add(struct io_kiocb *req, unsigned flags) +static inline void io_req_local_work_add(struct io_kiocb *req, unsigned tw_flags) { struct io_ring_ctx *ctx = req->ctx; - unsigned nr_wait, nr_tw, nr_tw_prev; - struct llist_node *head; + unsigned nr_wait, nr_tw; + unsigned long flags; /* See comment above IO_CQ_WAKE_INIT */ BUILD_BUG_ON(IO_CQ_WAKE_FORCE <= IORING_MAX_CQ_ENTRIES); /* - * We don't know how many reuqests is there in the link and whether + * We don't know how many requests is there in the link and whether * they can even be queued lazily, fall back to non-lazy. */ if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) - flags &= ~IOU_F_TWQ_LAZY_WAKE; - - head = READ_ONCE(ctx->work_llist.first); - do { - nr_tw_prev = 0; - if (head) { - struct io_kiocb *first_req = container_of(head, - struct io_kiocb, - io_task_work.node); - /* - * Might be executed at any moment, rely on - * SLAB_TYPESAFE_BY_RCU to keep it alive. - */ - nr_tw_prev = READ_ONCE(first_req->nr_tw); - } - - /* - * Theoretically, it can overflow, but that's fine as one of - * previous adds should've tried to wake the task. - */ - nr_tw = nr_tw_prev + 1; - if (!(flags & IOU_F_TWQ_LAZY_WAKE)) - nr_tw = IO_CQ_WAKE_FORCE; + tw_flags &= ~IOU_F_TWQ_LAZY_WAKE; - req->nr_tw = nr_tw; - req->io_task_work.node.next = head; - } while (!try_cmpxchg(&ctx->work_llist.first, &head, - &req->io_task_work.node)); + spin_lock_irqsave(&ctx->work_lock, flags); + wq_list_add_tail(&req->io_task_work.node, &ctx->work_list); + nr_tw = ++ctx->work_items; + spin_unlock_irqrestore(&ctx->work_lock, flags); /* * cmpxchg implies a full barrier, which pairs with the barrier @@ -1289,7 +1254,7 @@ static inline void io_req_local_work_add(struct io_kiocb *req, unsigned flags) * is similar to the wait/wawke task state sync. 
*/ - if (!head) { + if (nr_tw == 1) { if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags); if (ctx->has_evfd) @@ -1297,13 +1262,8 @@ static inline void io_req_local_work_add(struct io_kiocb *req, unsigned flags) } nr_wait = atomic_read(&ctx->cq_wait_nr); - /* not enough or no one is waiting */ - if (nr_tw < nr_wait) - return; - /* the previous add has already woken it up */ - if (nr_tw_prev >= nr_wait) - return; - wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE); + if (nr_tw >= nr_wait) + wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE); } static void io_req_normal_work_add(struct io_kiocb *req) @@ -1312,7 +1272,7 @@ static void io_req_normal_work_add(struct io_kiocb *req) struct io_ring_ctx *ctx = req->ctx; /* task_work already pending, we're done */ - if (!llist_add(&req->io_task_work.node, &tctx->task_list)) + if (!llist_add(&req->io_task_work.llist_node, &tctx->task_list)) return; if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) @@ -1346,9 +1306,15 @@ void __io_req_task_work_add(struct io_kiocb *req, unsigned flags) static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx) { - struct llist_node *node; + struct io_wq_work_node *node; + unsigned long flags; + + spin_lock_irqsave(&ctx->work_lock, flags); + node = ctx->work_list.first; + INIT_WQ_LIST(&ctx->work_list); + ctx->work_items = 0; + spin_unlock_irqrestore(&ctx->work_lock, flags); - node = llist_del_all(&ctx->work_llist); while (node) { struct io_kiocb *req = container_of(node, struct io_kiocb, io_task_work.node); @@ -1361,7 +1327,7 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx) static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events, int min_events) { - if (llist_empty(&ctx->work_llist)) + if (wq_list_empty(&ctx->work_list)) return false; if (events < min_events) return true; @@ -1373,7 +1339,7 @@ static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events, static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts, int min_events) { - struct llist_node *node; + struct io_wq_work_node *node; unsigned int loops = 0; int ret = 0; @@ -1382,13 +1348,14 @@ static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts, if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags); again: - /* - * llists are in reverse order, flip it back the right way before - * running the pending items. 
- */ - node = llist_reverse_order(io_llist_xchg(&ctx->work_llist, NULL)); + spin_lock_irq(&ctx->work_lock); + node = ctx->work_list.first; + INIT_WQ_LIST(&ctx->work_list); + ctx->work_items = 0; + spin_unlock_irq(&ctx->work_lock); + while (node) { - struct llist_node *next = node->next; + struct io_wq_work_node *next = node->next; struct io_kiocb *req = container_of(node, struct io_kiocb, io_task_work.node); INDIRECT_CALL_2(req->io_task_work.func,
@@ -1414,7 +1381,7 @@ static inline int io_run_local_work_locked(struct io_ring_ctx *ctx, { struct io_tw_state ts = {}; - if (llist_empty(&ctx->work_llist)) + if (wq_list_empty(&ctx->work_list)) return 0; return __io_run_local_work(ctx, &ts, min_events); }
@@ -2426,7 +2393,7 @@ static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode, int io_run_task_work_sig(struct io_ring_ctx *ctx) { - if (!llist_empty(&ctx->work_llist)) { + if (!wq_list_empty(&ctx->work_list)) { __set_current_state(TASK_RUNNING); if (io_run_local_work(ctx, INT_MAX) > 0) return 0;
@@ -2455,7 +2422,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx, if (unlikely(READ_ONCE(ctx->check_cq))) return 1; - if (unlikely(!llist_empty(&ctx->work_llist))) + if (unlikely(!wq_list_empty(&ctx->work_list))) return 1; if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL))) return 1;
@@ -2494,7 +2461,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, if (!io_allowed_run_tw(ctx)) return -EEXIST; - if (!llist_empty(&ctx->work_llist)) + if (!wq_list_empty(&ctx->work_list)) io_run_local_work(ctx, min_events); io_run_task_work(); io_cqring_overflow_flush(ctx);
@@ -2558,7 +2525,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, * now rather than let the caller do another wait loop. */ io_run_task_work(); - if (!llist_empty(&ctx->work_llist)) + if (!wq_list_empty(&ctx->work_list)) io_run_local_work(ctx, nr_wait); /*
@@ -3331,7 +3298,7 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd) io_run_task_work(); io_uring_drop_tctx_refs(current); xa_for_each(&tctx->xa, index, node) { - if (!llist_empty(&node->ctx->work_llist)) { + if (!wq_list_empty(&node->ctx->work_list)) { WARN_ON_ONCE(node->ctx->submitter_task && node->ctx->submitter_task != current); goto end_wait;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 27d039ddb05e..bb30a29d0e27 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -343,7 +343,7 @@ static inline int io_run_task_work(void) static inline bool io_task_work_pending(struct io_ring_ctx *ctx) { - return task_work_pending(current) || !llist_empty(&ctx->work_llist); + return task_work_pending(current) || !wq_list_empty(&ctx->work_list); } static inline void io_tw_lock(struct io_ring_ctx *ctx, struct io_tw_state *ts)
@@ -457,6 +457,6 @@ enum { static inline bool io_has_work(struct io_ring_ctx *ctx) { return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) || - !llist_empty(&ctx->work_llist); + !wq_list_empty(&ctx->work_list); } #endif
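A second effect of the locked list, visible earlier in this patch: the number of pending entries can be counted directly while the lock is held (ctx->work_items), so the lazy-wake decision becomes a plain comparison against the waiter's threshold instead of reading nr_tw back from the previous head request. A rough sketch of that shape (illustrative _sketch names only, not the kernel code):

#include <pthread.h>
#include <stdbool.h>

struct local_work_sketch {
	pthread_mutex_t lock;
	int items;		/* entries currently queued */
	int waiters;		/* how many completions the waiter asked for;
				 * in the kernel this is an atomic read of
				 * ctx->cq_wait_nr, not a plain field */
};

/*
 * Add one entry and decide whether to wake the submitter: with the count
 * maintained under the lock, "enough work queued for the waiter" is just
 * an integer comparison.
 */
static bool queue_and_should_wake_sketch(struct local_work_sketch *lw)
{
	int nr;

	pthread_mutex_lock(&lw->lock);
	nr = ++lw->items;	/* the entry itself would be list-appended here */
	pthread_mutex_unlock(&lw->lock);

	return nr >= lw->waiters;
}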
From patchwork Tue Mar 26 18:42:47 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13604866
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 3/4] io_uring: switch fallback work to io_wq_work_list
Date: Tue, 26 Mar 2024 12:42:47 -0600
Message-ID: <20240326184615.458820-4-axboe@kernel.dk>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240326184615.458820-1-axboe@kernel.dk>
References: <20240326184615.458820-1-axboe@kernel.dk>

Just like what was done for deferred task_work, convert the fallback task_work to a normal io_wq_work_list.

Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h |  2 +-
 io_uring/io_uring.c            | 24 +++++++++++++++++++-----
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index e51bf15196e4..2bc253f8147d 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -400,7 +400,7 @@ struct io_ring_ctx { struct mm_struct *mm_account; /* ctx exit and cancelation */ - struct llist_head fallback_llist; + struct io_wq_work_list fallback_list; struct delayed_work fallback_work; struct work_struct exit_work; struct list_head tctx_list;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9c06911077db..8d7138eaa921 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -243,14 +243,22 @@ static __cold void io_fallback_req_func(struct work_struct *work) { struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, fallback_work.work); - struct llist_node *node = llist_del_all(&ctx->fallback_llist); - struct io_kiocb *req, *tmp; struct io_tw_state ts = {}; + struct io_wq_work_node *node; + struct io_kiocb *req; + + spin_lock_irq(&ctx->work_lock); + node = ctx->fallback_list.first; + INIT_WQ_LIST(&ctx->fallback_list); + spin_unlock_irq(&ctx->work_lock); percpu_ref_get(&ctx->refs); mutex_lock(&ctx->uring_lock); - llist_for_each_entry_safe(req, tmp, node, io_task_work.llist_node) + while (node) { + req = container_of(node, struct io_kiocb, io_task_work.node); + node = node->next; req->io_task_work.func(req, &ts); + } io_submit_flush_completions(ctx); mutex_unlock(&ctx->uring_lock); percpu_ref_put(&ctx->refs);
@@ -1167,6 +1175,9 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) struct io_kiocb *req; while (node) { + unsigned long flags; + bool do_wake; + req = container_of(node, struct io_kiocb, io_task_work.llist_node); node = node->next; if (sync && last_ctx != req->ctx) {
@@ -1177,8 +1188,11 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) last_ctx = req->ctx; percpu_ref_get(&last_ctx->refs); } - if (llist_add(&req->io_task_work.llist_node, - &req->ctx->fallback_llist) + spin_lock_irqsave(&req->ctx->work_lock, flags); + do_wake = wq_list_empty(&req->ctx->fallback_list); + wq_list_add_tail(&req->io_task_work.node, &req->ctx->fallback_list); + spin_unlock_irqrestore(&req->ctx->work_lock, flags); + if (do_wake) schedule_delayed_work(&req->ctx->fallback_work, 1); }
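One detail worth noting in the last hunk: llist_add() used to return whether it installed the first entry, and that return value gated schedule_delayed_work(). With a plain locked list, the empty-to-non-empty transition has to be sampled explicitly, which is what the new do_wake does. A standalone sketch of the idiom (illustrative _sketch names, not the kernel code):

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct node { struct node *next; };

struct fallback_list_sketch {
	struct node *first, *last;
	pthread_mutex_t lock;
};

/*
 * Append one entry and report whether the list was empty beforehand;
 * only that first add needs to kick the worker, later adds know it is
 * already scheduled.
 */
static bool fallback_add_sketch(struct fallback_list_sketch *fl, struct node *n)
{
	bool was_empty;

	n->next = NULL;
	pthread_mutex_lock(&fl->lock);
	was_empty = (fl->first == NULL);
	if (fl->last)
		fl->last->next = n;
	else
		fl->first = n;
	fl->last = n;
	pthread_mutex_unlock(&fl->lock);
	return was_empty;	/* true -> schedule the fallback work */
}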
From patchwork Tue Mar 26 18:42:48 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13604867
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 4/4] io_uring: switch normal task_work to io_wq_work_list
Date: Tue, 26 Mar 2024 12:42:48 -0600
Message-ID: <20240326184615.458820-5-axboe@kernel.dk>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240326184615.458820-1-axboe@kernel.dk>
References: <20240326184615.458820-1-axboe@kernel.dk>

This concludes the transition: now all types of task_work use the same mechanism.

Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h |  8 ++----
 io_uring/io_uring.c            | 50 ++++++++++++++++++++++------------
 io_uring/io_uring.h            |  4 +--
 io_uring/sqpoll.c              |  8 +++---
 io_uring/tctx.c                |  3 +-
 5 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 2bc253f8147d..f46f871c09fe 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -95,7 +95,8 @@ struct io_uring_task { struct percpu_counter inflight; struct { /* task_work */ - struct llist_head task_list; + struct io_wq_work_list task_list; + spinlock_t task_lock; struct callback_head task_work; } ____cacheline_aligned_in_smp; };
@@ -561,10 +562,7 @@ enum { typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); struct io_task_work { - union { - struct io_wq_work_node node; - struct llist_node llist_node; - }; + struct io_wq_work_node node; io_req_tw_func_t func; };
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8d7138eaa921..e12b518e0b84 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1134,17 +1134,17 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts) * If more entries than max_entries are available, stop processing once this * is reached and return the rest of the list.
*/ -struct llist_node *io_handle_tw_list(struct llist_node *node, - unsigned int *count, - unsigned int max_entries) +struct io_wq_work_node *io_handle_tw_list(struct io_wq_work_node *node, + unsigned int *count, + unsigned int max_entries) { struct io_ring_ctx *ctx = NULL; struct io_tw_state ts = { }; do { - struct llist_node *next = node->next; + struct io_wq_work_node *next = node->next; struct io_kiocb *req = container_of(node, struct io_kiocb, - io_task_work.llist_node); + io_task_work.node); if (req->ctx != ctx) { ctx_flush_and_put(ctx, &ts); @@ -1170,15 +1170,20 @@ struct llist_node *io_handle_tw_list(struct llist_node *node, static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) { - struct llist_node *node = llist_del_all(&tctx->task_list); struct io_ring_ctx *last_ctx = NULL; + struct io_wq_work_node *node; struct io_kiocb *req; + unsigned long flags; + + spin_lock_irqsave(&tctx->task_lock, flags); + node = tctx->task_list.first; + INIT_WQ_LIST(&tctx->task_list); + spin_unlock_irqrestore(&tctx->task_lock, flags); while (node) { - unsigned long flags; bool do_wake; - req = container_of(node, struct io_kiocb, io_task_work.llist_node); + req = container_of(node, struct io_kiocb, io_task_work.node); node = node->next; if (sync && last_ctx != req->ctx) { if (last_ctx) { @@ -1202,22 +1207,24 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) } } -struct llist_node *tctx_task_work_run(struct io_uring_task *tctx, - unsigned int max_entries, - unsigned int *count) +struct io_wq_work_node *tctx_task_work_run(struct io_uring_task *tctx, + unsigned int max_entries, + unsigned int *count) { - struct llist_node *node; + struct io_wq_work_node *node; if (unlikely(current->flags & PF_EXITING)) { io_fallback_tw(tctx, true); return NULL; } - node = llist_del_all(&tctx->task_list); - if (node) { - node = llist_reverse_order(node); + spin_lock_irq(&tctx->task_lock); + node = tctx->task_list.first; + INIT_WQ_LIST(&tctx->task_list); + spin_unlock_irq(&tctx->task_lock); + + if (node) node = io_handle_tw_list(node, count, max_entries); - } /* relaxed read is enough as only the task itself sets ->in_cancel */ if (unlikely(atomic_read(&tctx->in_cancel))) @@ -1229,8 +1236,8 @@ struct llist_node *tctx_task_work_run(struct io_uring_task *tctx, void tctx_task_work(struct callback_head *cb) { + struct io_wq_work_node *ret; struct io_uring_task *tctx; - struct llist_node *ret; unsigned int count = 0; tctx = container_of(cb, struct io_uring_task, task_work); @@ -1284,9 +1291,16 @@ static void io_req_normal_work_add(struct io_kiocb *req) { struct io_uring_task *tctx = req->task->io_uring; struct io_ring_ctx *ctx = req->ctx; + unsigned long flags; + bool was_empty; + + spin_lock_irqsave(&tctx->task_lock, flags); + was_empty = wq_list_empty(&tctx->task_list); + wq_list_add_tail(&req->io_task_work.node, &tctx->task_list); + spin_unlock_irqrestore(&tctx->task_lock, flags); /* task_work already pending, we're done */ - if (!llist_add(&req->io_task_work.llist_node, &tctx->task_list)) + if (!was_empty) return; if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index bb30a29d0e27..e1582529bc58 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -87,8 +87,8 @@ void io_req_task_queue(struct io_kiocb *req); void io_req_task_complete(struct io_kiocb *req, struct io_tw_state *ts); void io_req_task_queue_fail(struct io_kiocb *req, int ret); void io_req_task_submit(struct io_kiocb *req, struct io_tw_state *ts); -struct llist_node 
*io_handle_tw_list(struct llist_node *node, unsigned int *count, unsigned int max_entries); -struct llist_node *tctx_task_work_run(struct io_uring_task *tctx, unsigned int max_entries, unsigned int *count); +struct io_wq_work_node *io_handle_tw_list(struct io_wq_work_node *node, unsigned int *count, unsigned int max_entries); +struct io_wq_work_node *tctx_task_work_run(struct io_uring_task *tctx, unsigned int max_entries, unsigned int *count); void tctx_task_work(struct callback_head *cb); __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd); int io_uring_alloc_task_context(struct task_struct *task, diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c index 3983708cef5b..3a34b867d5c0 100644 --- a/io_uring/sqpoll.c +++ b/io_uring/sqpoll.c @@ -230,7 +230,7 @@ static bool io_sqd_handle_event(struct io_sq_data *sqd) * than we were asked to process. Newly queued task_work isn't run until the * retry list has been fully processed. */ -static unsigned int io_sq_tw(struct llist_node **retry_list, int max_entries) +static unsigned int io_sq_tw(struct io_wq_work_node **retry_list, int max_entries) { struct io_uring_task *tctx = current->io_uring; unsigned int count = 0; @@ -246,11 +246,11 @@ static unsigned int io_sq_tw(struct llist_node **retry_list, int max_entries) return count; } -static bool io_sq_tw_pending(struct llist_node *retry_list) +static bool io_sq_tw_pending(struct io_wq_work_node *retry_list) { struct io_uring_task *tctx = current->io_uring; - return retry_list || !llist_empty(&tctx->task_list); + return retry_list || !wq_list_empty(&tctx->task_list); } static void io_sq_update_worktime(struct io_sq_data *sqd, struct rusage *start) @@ -266,7 +266,7 @@ static void io_sq_update_worktime(struct io_sq_data *sqd, struct rusage *start) static int io_sq_thread(void *data) { - struct llist_node *retry_list = NULL; + struct io_wq_work_node *retry_list = NULL; struct io_sq_data *sqd = data; struct io_ring_ctx *ctx; struct rusage start; diff --git a/io_uring/tctx.c b/io_uring/tctx.c index c043fe93a3f2..9bc0e203b780 100644 --- a/io_uring/tctx.c +++ b/io_uring/tctx.c @@ -86,7 +86,8 @@ __cold int io_uring_alloc_task_context(struct task_struct *task, atomic_set(&tctx->in_cancel, 0); atomic_set(&tctx->inflight_tracked, 0); task->io_uring = tctx; - init_llist_head(&tctx->task_list); + INIT_WQ_LIST(&tctx->task_list); + spin_lock_init(&tctx->task_lock); init_task_work(&tctx->task_work, tctx_task_work); return 0; }
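With all four patches applied, every task_work path shares the same consumer shape: splice the whole list while holding the lock, drop the lock, then walk the nodes in arrival order and resolve each one back to its request. A standalone approximation of that loop (simplified types, a spelled-out container_of(), made-up _sketch names; not the kernel code):

#include <pthread.h>
#include <stddef.h>

#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_node { struct work_node *next; };

struct request_sketch {
	void (*func)(struct request_sketch *req);
	struct work_node node;
};

struct work_queue_sketch {
	struct work_node *first, *last;
	pthread_mutex_t lock;
};

/*
 * The shape shared by tctx_task_work_run(), __io_run_local_work() and
 * io_fallback_req_func() after this series: splice the whole list under
 * the lock, release it, then run each entry in the order it was queued.
 */
static void run_all_sketch(struct work_queue_sketch *wq)
{
	struct work_node *node;

	pthread_mutex_lock(&wq->lock);
	node = wq->first;
	wq->first = wq->last = NULL;
	pthread_mutex_unlock(&wq->lock);

	while (node) {
		struct work_node *next = node->next;
		struct request_sketch *req =
			container_of_sketch(node, struct request_sketch, node);

		req->func(req);		/* runs in the order work was added */
		node = next;
	}
}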