From patchwork Fri Mar 21 19:24:55 2025
X-Patchwork-Submitter: Jens Axboe <axboe@kernel.dk>
X-Patchwork-Id: 14025956
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 1/5] fs: gate final fput task_work on PF_NO_TASKWORK
Date: Fri, 21 Mar 2025 13:24:55 -0600
Message-ID: <20250321193134.738973-2-axboe@kernel.dk>
In-Reply-To: <20250321193134.738973-1-axboe@kernel.dk>
References: <20250321193134.738973-1-axboe@kernel.dk>

Rather than hardwire this to kernel threads, add a task flag that tells
us whether the task in question runs task_work or not. At fork time,
this flag is set for kernel threads. This is in preparation for
allowing kernel threads to signal that they will run deferred
task_work.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/file_table.c       | 2 +-
 include/linux/sched.h | 2 +-
 kernel/fork.c         | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 5c00dc38558d..d824f1330d6e 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -520,7 +520,7 @@ void fput(struct file *file)
 		file_free(file);
 		return;
 	}
-	if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
+	if (likely(!in_interrupt() && !(task->flags & PF_NO_TASKWORK))) {
 		init_task_work(&file->f_task_work, ____fput);
 		if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
 			return;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9c15365a30c0..301f5dda6a06 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1709,7 +1709,7 @@ extern struct pid *cad_pid;
 						 * I am cleaning dirty pages from some other bdi. */
 #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
-#define PF__HOLE__00800000	0x00800000
+#define PF_NO_TASKWORK		0x00800000	/* task doesn't run task_work */
 #define PF__HOLE__01000000	0x01000000
 #define PF__HOLE__02000000	0x02000000
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
diff --git a/kernel/fork.c b/kernel/fork.c
index 735405a9c5f3..3745407624c7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2235,7 +2235,7 @@ __latent_entropy struct task_struct *copy_process(
 		goto fork_out;
 	p->flags &= ~PF_KTHREAD;
 	if (args->kthread)
-		p->flags |= PF_KTHREAD;
+		p->flags |= PF_KTHREAD | PF_NO_TASKWORK;
 	if (args->user_worker) {
 		/*
 		 * Mark us a user worker, and block any signal that isn't
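For illustration, the shape of the fput() path this patch gates, as a
minimal sketch (simplified from fs/file_table.c above; the delayed-work
fallback helper named here is a hypothetical stand-in for the existing
out-of-line fput path):

/*
 * Sketch: the final fput() prefers task_work, and takes the delayed-work
 * fallback only for tasks flagged PF_NO_TASKWORK or in interrupt context.
 */
void fput_sketch(struct file *file)
{
	if (!atomic_long_dec_and_test(&file->f_count))
		return;
	if (likely(!in_interrupt() && !(current->flags & PF_NO_TASKWORK))) {
		/* ____fput() runs before 'current' returns to userspace */
		init_task_work(&file->f_task_work, ____fput);
		if (!task_work_add(current, &file->f_task_work, TWA_RESUME))
			return;
	}
	/* kernel threads land here; cleanup happens "sometime later" */
	queue_delayed_fput(file);	/* hypothetical stand-in */
}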
From patchwork Fri Mar 21 19:24:56 2025
X-Patchwork-Submitter: Jens Axboe <axboe@kernel.dk>
X-Patchwork-Id: 14025958
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 2/5] io_uring: mark exit side kworkers as task_work capable
Date: Fri, 21 Mar 2025 13:24:56 -0600
Message-ID: <20250321193134.738973-3-axboe@kernel.dk>
In-Reply-To: <20250321193134.738973-1-axboe@kernel.dk>
References: <20250321193134.738973-1-axboe@kernel.dk>

There are two types of work here:

1) Fallback work, if the task is exiting
2) The exit side cancelations

and both of them may do the final fput() of a file. When this happens,
fput() will schedule delayed work. This slows down exits when io_uring
needs to wait for that work to finish. It is possible to flush this via
flush_delayed_fput(), but that's a big hammer, as other unrelated files
could be involved, and from other tasks as well.

Add two io_uring helpers to temporarily clear PF_NO_TASKWORK for the
worker threads, and run any queued task_work before setting the flag
again. Then we can ensure we only flush related items that received
their final fput as part of work cancelation and flushing.

For now these are io_uring private, but they could obviously be made
generically available, should there be a need to do so.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5f625be52e52..2b9dae588f04 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -238,6 +238,20 @@ static inline void io_req_add_to_cache(struct io_kiocb *req, struct io_ring_ctx
 	wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
 }
 
+static __cold void io_kworker_tw_start(void)
+{
+	if (WARN_ON_ONCE(!(current->flags & PF_NO_TASKWORK)))
+		return;
+	current->flags &= ~PF_NO_TASKWORK;
+}
+
+static __cold void io_kworker_tw_end(void)
+{
+	while (task_work_pending(current))
+		task_work_run();
+	current->flags |= PF_NO_TASKWORK;
+}
+
 static __cold void io_ring_ctx_ref_free(struct percpu_ref *ref)
 {
 	struct io_ring_ctx *ctx = container_of(ref, struct io_ring_ctx, refs);
@@ -253,6 +267,8 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 	struct io_kiocb *req, *tmp;
 	struct io_tw_state ts = {};
 
+	io_kworker_tw_start();
+
 	percpu_ref_get(&ctx->refs);
 	mutex_lock(&ctx->uring_lock);
 	llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
@@ -260,6 +276,7 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 	io_submit_flush_completions(ctx);
 	mutex_unlock(&ctx->uring_lock);
 	percpu_ref_put(&ctx->refs);
+	io_kworker_tw_end();
 }
 
 static int io_alloc_hash_table(struct io_hash_table *table, unsigned bits)
@@ -2879,6 +2896,8 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 	struct io_tctx_node *node;
 	int ret;
 
+	io_kworker_tw_start();
+
 	/*
 	 * If we're doing polled IO and end up having requests being
 	 * submitted async (out-of-line), then completions can come in while
@@ -2935,6 +2954,8 @@
 	 */
 	} while (!wait_for_completion_interruptible_timeout(&ctx->ref_comp, interval));
 
+	io_kworker_tw_end();
+
 	init_completion(&exit.completion);
 	init_task_work(&exit.task_work, io_tctx_exit_cb);
 	exit.ctx = ctx;
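The bracketing pattern the two helpers establish, sketched for a generic
exit-side work item (io_fallback_req_func() and io_ring_exit_work() above
are the real users; this is just the shape):

static void exit_side_work(struct work_struct *work)
{
	/* kworkers are created with PF_NO_TASKWORK set; drop it so any
	 * final fput() during cancelation can use task_work */
	io_kworker_tw_start();

	/*
	 * Cancelation/cleanup that may drop the last reference to a file:
	 * fput() can now queue ____fput() as task_work on this kworker
	 * instead of bouncing it to the global delayed-fput list.
	 */

	/* run everything that got queued, then restore PF_NO_TASKWORK */
	io_kworker_tw_end();
}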
From patchwork Fri Mar 21 19:24:57 2025
X-Patchwork-Submitter: Jens Axboe <axboe@kernel.dk>
X-Patchwork-Id: 14025959
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 3/5] io_uring: consider ring dead once the ref is marked dying
Date: Fri, 21 Mar 2025 13:24:57 -0600
Message-ID: <20250321193134.738973-4-axboe@kernel.dk>
In-Reply-To: <20250321193134.738973-1-axboe@kernel.dk>
References: <20250321193134.738973-1-axboe@kernel.dk>

Don't gate the fallback/cancel path on whether the task is exiting.
It's generally not a good idea to key this off the task's PF_EXITING
flag anyway. Once the ring starts going through teardown, its ref is
marked as dying. Use that as our fallback/cancel mechanism instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 2b9dae588f04..984db01f5184 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -555,7 +555,8 @@ static void io_queue_iowq(struct io_kiocb *req)
 	 * procedure rather than attempt to run this request (or create a new
 	 * worker for it).
 	 */
-	if (WARN_ON_ONCE(!same_thread_group(tctx->task, current)))
+	if (WARN_ON_ONCE(!same_thread_group(tctx->task, current) &&
+			 !percpu_ref_is_dying(&req->ctx->refs)))
 		atomic_or(IO_WQ_WORK_CANCEL, &req->work.flags);
 
 	trace_io_uring_queue_async_work(req, io_wq_is_hashed(&req->work));
@@ -1254,7 +1255,8 @@ static void io_req_normal_work_add(struct io_kiocb *req)
 		return;
 	}
 
-	if (likely(!task_work_add(tctx->task, &tctx->task_work, ctx->notify_method)))
+	if (!percpu_ref_is_dying(&ctx->refs) &&
+	    !task_work_add(tctx->task, &tctx->task_work, ctx->notify_method))
 		return;
 
 	io_fallback_tw(tctx, false);
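A note on why the dying ref works as the trigger here: percpu_ref_kill()
marks the ref dying immediately, while the release callback only runs
once all references are gone, so the check covers the whole teardown
window. Illustrative sketch:

/* teardown side (io_ring_ctx_wait_and_kill) */
percpu_ref_kill(&ctx->refs);		/* is_dying == true from here on */

/* anywhere else, for the rest of teardown */
if (percpu_ref_is_dying(&ctx->refs)) {
	/* ring is going away: route task_work to the fallback kworker */
}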
From patchwork Fri Mar 21 19:24:58 2025
X-Patchwork-Submitter: Jens Axboe <axboe@kernel.dk>
X-Patchwork-Id: 14025960
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 4/5] io_uring: wait for cancelations on final ring put
Date: Fri, 21 Mar 2025 13:24:58 -0600
Message-ID: <20250321193134.738973-5-axboe@kernel.dk>
In-Reply-To: <20250321193134.738973-1-axboe@kernel.dk>
References: <20250321193134.738973-1-axboe@kernel.dk>

We still offload the cancelation to a workqueue, so as not to introduce
dependencies between the exiting task waiting on cleanup, and that task
needing to run task_work to complete the process. This means that once
the final ring put is done, any request that was inflight and needed
cancelation will be done as well. Notably, for requests that hold
references to files: once the ring fd close is done, those file
references will have been dropped as well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/io_uring_types.h |  2 ++
 io_uring/io_uring.c            | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index c17d2eedf478..79e223fd4733 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -450,6 +450,8 @@ struct io_ring_ctx {
 	struct io_mapped_region param_region;
 	/* just one zcrx per ring for now, will move to io_zcrx_ifq eventually */
 	struct io_mapped_region zcrx_region;
+
+	struct completion *exit_comp;
 };
 
 /*
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 984db01f5184..d9b65a322ae1 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2894,6 +2894,7 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
 	unsigned long timeout = jiffies + HZ * 60 * 5;
 	unsigned long interval = HZ / 20;
+	struct completion *exit_comp;
 	struct io_tctx_exit exit;
 	struct io_tctx_node *node;
 	int ret;
@@ -2958,6 +2959,10 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 
 	io_kworker_tw_end();
 
+	exit_comp = READ_ONCE(ctx->exit_comp);
+	if (exit_comp)
+		complete(exit_comp);
+
 	init_completion(&exit.completion);
 	init_task_work(&exit.task_work, io_tctx_exit_cb);
 	exit.ctx = ctx;
@@ -3020,9 +3025,21 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 static int io_uring_release(struct inode *inode, struct file *file)
 {
 	struct io_ring_ctx *ctx = file->private_data;
+	DECLARE_COMPLETION_ONSTACK(exit_comp);
 
 	file->private_data = NULL;
+	WRITE_ONCE(ctx->exit_comp, &exit_comp);
 	io_ring_ctx_wait_and_kill(ctx);
+
+	/*
+	 * Wait for cancel to run before exiting task
+	 */
+	do {
+		if (current->io_uring)
+			io_fallback_tw(current->io_uring, false);
+		cond_resched();
+	} while (wait_for_completion_interruptible(&exit_comp));
+
 	return 0;
 }
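A condensed view of the handshake the two hunks above add, to make the
ordering explicit (sketch only, not additional code):

/*
 *   closing task (io_uring_release)      exit kworker (io_ring_exit_work)
 *   -------------------------------      --------------------------------
 *   WRITE_ONCE(ctx->exit_comp, &c);
 *   io_ring_ctx_wait_and_kill(ctx);  --> cancels inflight requests,
 *                                        running task_work between
 *                                        io_kworker_tw_start()/_end()
 *   loop: flush own fallback tw,
 *         wait_for_completion(&c);   <-- complete(READ_ONCE(ctx->exit_comp));
 *
 * The wait loop re-runs io_fallback_tw() and cond_resched()s, so the
 * closing task never sits blocked while work it must process is queued,
 * which is the dependency the commit message calls out.
 */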
From patchwork Fri Mar 21 19:24:59 2025
X-Patchwork-Submitter: Jens Axboe <axboe@kernel.dk>
X-Patchwork-Id: 14025961
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 5/5] io_uring: switch away from percpu refcounts
Date: Fri, 21 Mar 2025 13:24:59 -0600
Message-ID: <20250321193134.738973-6-axboe@kernel.dk>
In-Reply-To: <20250321193134.738973-1-axboe@kernel.dk>
References: <20250321193134.738973-1-axboe@kernel.dk>

For the common cases, the io_uring ref counts are all batched and hence
need not be a percpu reference.
This saves some memory but, more importantly, gets rid of the full RCU
grace period needed to tear down a percpu reference. With io_uring now
waiting on cancelations and IO during exit, that grace period slows
teardown down considerably, by as much as 100x.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/io_uring_types.h |  2 +-
 io_uring/io_uring.c            | 47 ++++++++++++----------------------
 io_uring/io_uring.h            |  3 ++-
 io_uring/msg_ring.c            |  4 +--
 io_uring/refs.h                | 43 +++++++++++++++++++++++++++++++
 io_uring/register.c            |  2 +-
 io_uring/rw.c                  |  2 +-
 io_uring/sqpoll.c              |  2 +-
 io_uring/zcrx.c                |  4 +--
 9 files changed, 70 insertions(+), 39 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 79e223fd4733..8894b0639a3a 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -256,7 +256,7 @@ struct io_ring_ctx {
 	struct task_struct	*submitter_task;
 	struct io_rings		*rings;
-	struct percpu_ref	refs;
+	atomic_long_t		refs;
 
 	clockid_t		clockid;
 	enum tk_offsets		clock_offset;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index d9b65a322ae1..69b8f3237b1a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -252,13 +252,6 @@ static __cold void io_kworker_tw_end(void)
 	current->flags |= PF_NO_TASKWORK;
 }
 
-static __cold void io_ring_ctx_ref_free(struct percpu_ref *ref)
-{
-	struct io_ring_ctx *ctx = container_of(ref, struct io_ring_ctx, refs);
-
-	complete(&ctx->ref_comp);
-}
-
 static __cold void io_fallback_req_func(struct work_struct *work)
 {
 	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
@@ -269,13 +262,13 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 
 	io_kworker_tw_start();
 
-	percpu_ref_get(&ctx->refs);
+	io_ring_ref_get(ctx);
 	mutex_lock(&ctx->uring_lock);
 	llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
 		req->io_task_work.func(req, ts);
 	io_submit_flush_completions(ctx);
 	mutex_unlock(&ctx->uring_lock);
-	percpu_ref_put(&ctx->refs);
+	io_ring_ref_put(ctx);
 	io_kworker_tw_end();
 }
 
@@ -333,10 +326,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	hash_bits = clamp(hash_bits, 1, 8);
 	if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
 		goto err;
-	if (percpu_ref_init(&ctx->refs, io_ring_ctx_ref_free,
-			    0, GFP_KERNEL))
-		goto err;
 
+	io_ring_ref_init(ctx);
 	ctx->flags = p->flags;
 	ctx->hybrid_poll_time = LLONG_MAX;
 	atomic_set(&ctx->cq_wait_nr, IO_CQ_WAKE_INIT);
@@ -360,7 +351,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	ret |= io_futex_cache_init(ctx);
 	ret |= io_rsrc_cache_init(ctx);
 	if (ret)
-		goto free_ref;
+		goto err;
 	init_completion(&ctx->ref_comp);
 	xa_init_flags(&ctx->personalities, XA_FLAGS_ALLOC1);
 	mutex_init(&ctx->uring_lock);
@@ -386,9 +377,6 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	mutex_init(&ctx->mmap_lock);
 
 	return ctx;
-
-free_ref:
-	percpu_ref_exit(&ctx->refs);
 err:
 	io_free_alloc_caches(ctx);
 	kvfree(ctx->cancel_table.hbs);
@@ -556,7 +544,7 @@ static void io_queue_iowq(struct io_kiocb *req)
 	 * worker for it).
 	 */
 	if (WARN_ON_ONCE(!same_thread_group(tctx->task, current) &&
-			 !percpu_ref_is_dying(&req->ctx->refs)))
+			 !io_ring_ref_is_dying(req->ctx)))
 		atomic_or(IO_WQ_WORK_CANCEL, &req->work.flags);
 
 	trace_io_uring_queue_async_work(req, io_wq_is_hashed(&req->work));
@@ -998,7 +986,7 @@ __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
 		ret = 1;
 	}
 
-	percpu_ref_get_many(&ctx->refs, ret);
+	io_ring_ref_get_many(ctx, ret);
 	while (ret--) {
 		struct io_kiocb *req = reqs[ret];
 
@@ -1053,7 +1041,7 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, io_tw_token_t tw)
 	io_submit_flush_completions(ctx);
 	mutex_unlock(&ctx->uring_lock);
-	percpu_ref_put(&ctx->refs);
+	io_ring_ref_put(ctx);
 }
 
 /*
@@ -1077,7 +1065,7 @@ struct llist_node *io_handle_tw_list(struct llist_node *node,
 			ctx_flush_and_put(ctx, ts);
 			ctx = req->ctx;
 			mutex_lock(&ctx->uring_lock);
-			percpu_ref_get(&ctx->refs);
+			io_ring_ref_get(ctx);
 		}
 		INDIRECT_CALL_2(req->io_task_work.func,
 				io_poll_task_func, io_req_rw_complete,
@@ -1106,10 +1094,10 @@ static __cold void __io_fallback_tw(struct llist_node *node, bool sync)
 		if (sync && last_ctx != req->ctx) {
 			if (last_ctx) {
 				flush_delayed_work(&last_ctx->fallback_work);
-				percpu_ref_put(&last_ctx->refs);
+				io_ring_ref_put(last_ctx);
 			}
 			last_ctx = req->ctx;
-			percpu_ref_get(&last_ctx->refs);
+			io_ring_ref_get(last_ctx);
 		}
 		if (llist_add(&req->io_task_work.node,
 			      &req->ctx->fallback_llist))
@@ -1118,7 +1106,7 @@ static __cold void __io_fallback_tw(struct llist_node *node, bool sync)
 
 	if (last_ctx) {
 		flush_delayed_work(&last_ctx->fallback_work);
-		percpu_ref_put(&last_ctx->refs);
+		io_ring_ref_put(last_ctx);
 	}
 }
 
@@ -1255,7 +1243,7 @@ static void io_req_normal_work_add(struct io_kiocb *req)
 		return;
 	}
 
-	if (!percpu_ref_is_dying(&ctx->refs) &&
+	if (!io_ring_ref_is_dying(ctx) &&
 	    !task_work_add(tctx->task, &tctx->task_work, ctx->notify_method))
 		return;
 
@@ -2739,7 +2727,7 @@ static void io_req_caches_free(struct io_ring_ctx *ctx)
 		nr++;
 	}
 	if (nr)
-		percpu_ref_put_many(&ctx->refs, nr);
+		io_ring_ref_put_many(ctx, nr);
 	mutex_unlock(&ctx->uring_lock);
 }
 
@@ -2773,7 +2761,6 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
 		static_branch_dec(&io_key_has_sqarray);
 
-	percpu_ref_exit(&ctx->refs);
 	free_uid(ctx->user);
 	io_req_caches_free(ctx);
 	if (ctx->hash_map)
@@ -2798,7 +2785,7 @@ static __cold void io_activate_pollwq_cb(struct callback_head *cb)
 	 * might've been lost due to loose synchronisation.
 	 */
 	wake_up_all(&ctx->poll_wq);
-	percpu_ref_put(&ctx->refs);
+	io_ring_ref_put(ctx);
 }
 
 __cold void io_activate_pollwq(struct io_ring_ctx *ctx)
@@ -2816,9 +2803,9 @@ __cold void io_activate_pollwq(struct io_ring_ctx *ctx)
 	 * only need to sync with it, which is done by injecting a tw
 	 */
 	init_task_work(&ctx->poll_wq_task_work, io_activate_pollwq_cb);
-	percpu_ref_get(&ctx->refs);
+	io_ring_ref_get(ctx);
 	if (task_work_add(ctx->submitter_task, &ctx->poll_wq_task_work, TWA_SIGNAL))
-		percpu_ref_put(&ctx->refs);
+		io_ring_ref_put(ctx);
 out:
 	spin_unlock(&ctx->completion_lock);
 }
@@ -3005,7 +2992,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 	struct creds *creds;
 
 	mutex_lock(&ctx->uring_lock);
-	percpu_ref_kill(&ctx->refs);
+	io_ring_ref_kill(ctx);
 	xa_for_each(&ctx->personalities, index, creds)
 		io_unregister_personality(ctx, index);
 	mutex_unlock(&ctx->uring_lock);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 87f883130286..67e5921771be 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -13,6 +13,7 @@
 #include "slist.h"
 #include "filetable.h"
 #include "opdef.h"
+#include "refs.h"
 
 #ifndef CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -143,7 +144,7 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
 		 * Not from an SQE, as those cannot be submitted, but via
 		 * updating tagged resources.
 		 */
-		if (!percpu_ref_is_dying(&ctx->refs))
+		if (!io_ring_ref_is_dying(ctx))
 			lockdep_assert(current == ctx->submitter_task);
 	}
 #endif
diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index 0bbcbbcdebfd..30d4cabb66d6 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -83,7 +83,7 @@ static void io_msg_tw_complete(struct io_kiocb *req, io_tw_token_t tw)
 	}
 	if (req)
 		kmem_cache_free(req_cachep, req);
-	percpu_ref_put(&ctx->refs);
+	io_ring_ref_put(ctx);
 }
 
 static int io_msg_remote_post(struct io_ring_ctx *ctx, struct io_kiocb *req,
@@ -95,7 +95,7 @@ static int io_msg_remote_post(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	}
 	req->cqe.user_data = user_data;
 	io_req_set_res(req, res, cflags);
-	percpu_ref_get(&ctx->refs);
+	io_ring_ref_get(ctx);
 	req->ctx = ctx;
 	req->tctx = NULL;
 	req->io_task_work.func = io_msg_tw_complete;
diff --git a/io_uring/refs.h b/io_uring/refs.h
index 63982ead9f7d..a794e6980cb8 100644
--- a/io_uring/refs.h
+++ b/io_uring/refs.h
@@ -52,4 +52,47 @@ static inline void io_req_set_refcount(struct io_kiocb *req)
 {
 	__io_req_set_refcount(req, 1);
 }
+
+#define IO_RING_REF_DEAD	(1ULL << 63)
+#define IO_RING_REF_MASK	(~IO_RING_REF_DEAD)
+
+static inline bool io_ring_ref_is_dying(struct io_ring_ctx *ctx)
+{
+	return atomic_long_read(&ctx->refs) & IO_RING_REF_DEAD;
+}
+
+static inline void io_ring_ref_put_many(struct io_ring_ctx *ctx, int nr_refs)
+{
+	unsigned long refs;
+
+	refs = atomic_long_sub_return(nr_refs, &ctx->refs);
+	if (!(refs & IO_RING_REF_MASK))
+		complete(&ctx->ref_comp);
+}
+
+static inline void io_ring_ref_put(struct io_ring_ctx *ctx)
+{
+	io_ring_ref_put_many(ctx, 1);
+}
+
+static inline void io_ring_ref_kill(struct io_ring_ctx *ctx)
+{
+	atomic_long_xor(IO_RING_REF_DEAD, &ctx->refs);
+	io_ring_ref_put(ctx);
+}
+
+static inline void io_ring_ref_init(struct io_ring_ctx *ctx)
+{
+	atomic_long_set(&ctx->refs, 1);
+}
+
+static inline void io_ring_ref_get_many(struct io_ring_ctx *ctx, int nr_refs)
+{
+	atomic_long_add(nr_refs, &ctx->refs);
+}
+
+static inline void io_ring_ref_get(struct io_ring_ctx *ctx)
+{
+	atomic_long_inc(&ctx->refs);
+}
 #endif
diff --git a/io_uring/register.c b/io_uring/register.c
index cc23a4c205cd..54fe94a0101b 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -637,7 +637,7 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 	 * We don't quiesce the refs for register anymore and so it can't be
 	 * dying as we're holding a file ref here.
 	 */
-	if (WARN_ON_ONCE(percpu_ref_is_dying(&ctx->refs)))
+	if (WARN_ON_ONCE(io_ring_ref_is_dying(ctx)))
 		return -ENXIO;
 
 	if (ctx->submitter_task && ctx->submitter_task != current)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 039e063f7091..e010d548edea 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -496,7 +496,7 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
 	 * Don't attempt to reissue from that path, just let it fail with
 	 * -EAGAIN.
 	 */
-	if (percpu_ref_is_dying(&ctx->refs))
+	if (io_ring_ref_is_dying(ctx))
 		return false;
 
 	io_meta_restore(io, &rw->kiocb);
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index d037cc68e9d3..b71f8d52386e 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -184,7 +184,7 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 	 * Don't submit if refs are dying, good for io_uring_register(),
 	 * but also it is relied upon by io_ring_exit_work()
 	 */
-	if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
+	if (to_submit && likely(!io_ring_ref_is_dying(ctx)) &&
 	    !(ctx->flags & IORING_SETUP_R_DISABLED))
 		ret = io_submit_sqes(ctx, to_submit);
 	mutex_unlock(&ctx->uring_lock);
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 9c95b5b6ec4e..07719e3bf1b3 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -629,7 +629,7 @@ static int io_pp_zc_init(struct page_pool *pp)
 	if (pp->p.dma_dir != DMA_FROM_DEVICE)
 		return -EOPNOTSUPP;
 
-	percpu_ref_get(&ifq->ctx->refs);
+	io_ring_ref_get(ifq->ctx);
 	return 0;
 }
 
@@ -640,7 +640,7 @@ static void io_pp_zc_destroy(struct page_pool *pp)
 
 	if (WARN_ON_ONCE(area->free_count != area->nia.num_niovs))
 		return;
-	percpu_ref_put(&ifq->ctx->refs);
+	io_ring_ref_put(ifq->ctx);
 }
 
 static int io_pp_nl_fill(void *mp_priv, struct sk_buff *rsp,
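To close out the series, a worked example of the bit encoding that the
io_ring_ref_*() helpers in refs.h above rely on (values assume a 64-bit
atomic_long_t; illustrative only):

/*
 *   io_ring_ref_init():       refs = 0x0000000000000001  (alive, 1 ref)
 *   io_ring_ref_get():        refs = 0x0000000000000002
 *   io_ring_ref_kill():       XORs in IO_RING_REF_DEAD, then puts one:
 *                             refs = 0x8000000000000002
 *                             refs = 0x8000000000000001  (dying, 1 ref)
 *   final io_ring_ref_put():  refs = 0x8000000000000000
 *                             (refs & IO_RING_REF_MASK) == 0
 *                             -> complete(&ctx->ref_comp)
 *
 * io_ring_ref_is_dying() is a single atomic_long_read() testing bit 63,
 * and teardown avoids the RCU grace period the percpu ref required.
 */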