From patchwork Tue Jun 4 19:01:28 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13685802
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 1/5] fs: gate final fput task_work on PF_NO_TASKWORK
Date: Tue, 4 Jun 2024 13:01:28 -0600
Message-ID: <20240604191314.454554-2-axboe@kernel.dk>
In-Reply-To: <20240604191314.454554-1-axboe@kernel.dk>
References: <20240604191314.454554-1-axboe@kernel.dk>

Rather than hardwire this to kernel threads, add a task flag that tells
us whether the task in question runs task_work or not. At fork time,
this flag is set for kernel threads.

This is in preparation for allowing kernel threads to signal that they
will run deferred task_work.

No functional changes in this patch.

Signed-off-by: Jens Axboe
---
 fs/file_table.c       | 2 +-
 include/linux/sched.h | 2 +-
 kernel/fork.c         | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 4f03beed4737..d7c6685afbcb 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -477,7 +477,7 @@ void fput(struct file *file)
 		file_free(file);
 		return;
 	}
-	if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
+	if (likely(!in_interrupt() && !(task->flags & PF_NO_TASKWORK))) {
 		init_task_work(&file->f_task_work, ____fput);
 		if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
 			return;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6..1393d557f05e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1635,7 +1635,7 @@ extern struct pid *cad_pid;
 #define PF_USED_MATH		0x00002000	/* If unset the fpu must be initialized before use */
 #define PF_USER_WORKER		0x00004000	/* Kernel thread cloned from userspace thread */
 #define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
-#define PF__HOLE__00010000	0x00010000
+#define PF_NO_TASKWORK		0x00010000	/* task doesn't run task_work */
 #define PF_KSWAPD		0x00020000	/* I am kswapd */
 #define PF_MEMALLOC_NOFS	0x00040000	/* All allocations inherit GFP_NOFS. See memalloc_nfs_save() */
 #define PF_MEMALLOC_NOIO	0x00080000	/* All allocations inherit GFP_NOIO. See memalloc_noio_save() */
diff --git a/kernel/fork.c b/kernel/fork.c
index 99076dbe27d8..156bf8778d18 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2222,7 +2222,7 @@ __latent_entropy struct task_struct *copy_process(
 		goto fork_out;
 	p->flags &= ~PF_KTHREAD;
 	if (args->kthread)
-		p->flags |= PF_KTHREAD;
+		p->flags |= PF_KTHREAD | PF_NO_TASKWORK;
 	if (args->user_worker) {
 		/*
 		 * Mark us a user worker, and block any signal that isn't
From patchwork Tue Jun 4 19:01:29 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13685803
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 2/5] io_uring: mark exit side kworkers as task_work capable
Date: Tue, 4 Jun 2024 13:01:29 -0600
Message-ID: <20240604191314.454554-3-axboe@kernel.dk>
In-Reply-To: <20240604191314.454554-1-axboe@kernel.dk>
References: <20240604191314.454554-1-axboe@kernel.dk>

There are two types of work here:

1) Fallback work, if the task is exiting
2) The exit side cancelations

and both of them may do the final fput() of a file. When this happens,
fput() will schedule delayed work. This slows down exits when io_uring
needs to wait for that work to finish. It is possible to flush this via
flush_delayed_fput(), but that's a big hammer as other unrelated files
could be involved, and from other tasks as well.

Add two io_uring helpers to temporarily clear PF_NO_TASKWORK for the
worker threads, and run any queued task_work before setting the flag
again. Then we can ensure we only flush related items that received
their final fput as part of work cancelation and flushing.

For now these are io_uring private, but could obviously be made
generically available, should there be a need to do so.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 96f6da0bf5cd..3ad915262a45 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -234,6 +234,20 @@ static inline void io_req_add_to_cache(struct io_kiocb *req, struct io_ring_ctx
 	wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
 }
 
+static __cold void io_kworker_tw_start(void)
+{
+	if (WARN_ON_ONCE(!(current->flags & PF_NO_TASKWORK)))
+		return;
+	current->flags &= ~PF_NO_TASKWORK;
+}
+
+static __cold void io_kworker_tw_end(void)
+{
+	while (task_work_pending(current))
+		task_work_run();
+	current->flags |= PF_NO_TASKWORK;
+}
+
 static __cold void io_ring_ctx_ref_free(struct percpu_ref *ref)
 {
 	struct io_ring_ctx *ctx = container_of(ref, struct io_ring_ctx, refs);
@@ -249,6 +263,8 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 	struct io_kiocb *req, *tmp;
 	struct io_tw_state ts = {};
 
+	io_kworker_tw_start();
+
 	percpu_ref_get(&ctx->refs);
 	mutex_lock(&ctx->uring_lock);
 	llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
@@ -256,6 +272,7 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 	io_submit_flush_completions(ctx);
 	mutex_unlock(&ctx->uring_lock);
 	percpu_ref_put(&ctx->refs);
+	io_kworker_tw_end();
 }
 
 static int io_alloc_hash_table(struct io_hash_table *table, unsigned bits)
@@ -2720,6 +2737,8 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 	struct io_tctx_node *node;
 	int ret;
 
+	io_kworker_tw_start();
+
 	/*
	 * If we're doing polled IO and end up having requests being
	 * submitted async (out-of-line), then completions can come in while
@@ -2770,6 +2789,8 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 	 */
 	} while (!wait_for_completion_interruptible_timeout(&ctx->ref_comp, interval));
 
+	io_kworker_tw_end();
+
 	init_completion(&exit.completion);
 	init_task_work(&exit.task_work, io_tctx_exit_cb);
 	exit.ctx = ctx;
From patchwork Tue Jun 4 19:01:30 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13685804
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 3/5] io_uring: move to using private ring references
Date: Tue, 4 Jun 2024 13:01:30 -0600
Message-ID: <20240604191314.454554-4-axboe@kernel.dk>
In-Reply-To: <20240604191314.454554-1-axboe@kernel.dk>
References: <20240604191314.454554-1-axboe@kernel.dk>

io_uring currently uses percpu refcounts for the ring reference. This
works fine, but exiting a ring requires an RCU grace period to lapse
and this slows down ring exit quite a lot.

Add a basic per-cpu counter for our references instead, and use that.
This is in preparation for doing a sync wait on any request (notably
file) references on ring exit. As we're going to be waiting on ctx refs
going away as well with that, the RCU grace period wait becomes a
noticeable slowdown.
Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h |  2 +-
 io_uring/Makefile              |  2 +-
 io_uring/io_uring.c            | 39 ++++++++++-------------
 io_uring/refs.c                | 58 ++++++++++++++++++++++++++++++++++
 io_uring/refs.h                | 53 +++++++++++++++++++++++++++++++
 io_uring/register.c            |  3 +-
 io_uring/rw.c                  |  3 +-
 io_uring/sqpoll.c              |  3 +-
 8 files changed, 135 insertions(+), 28 deletions(-)
 create mode 100644 io_uring/refs.c

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index a2227ab7fd16..fc1e0e65d474 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -238,7 +238,7 @@ struct io_ring_ctx {
 		struct task_struct	*submitter_task;
 		struct io_rings		*rings;
-		struct percpu_ref	refs;
+		unsigned long		ref_ptr;
 		enum task_work_notify_mode	notify_method;
 		unsigned		sq_thread_idle;
diff --git a/io_uring/Makefile b/io_uring/Makefile
index 61923e11c767..b167ab8930a9 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -4,7 +4,7 @@
 obj-$(CONFIG_IO_URING)	+= io_uring.o opdef.o kbuf.o rsrc.o notif.o \
 				tctx.o filetable.o rw.o net.o poll.o \
-				eventfd.o uring_cmd.o openclose.o \
+				eventfd.o refs.o uring_cmd.o openclose.o \
 				sqpoll.o xattr.o nop.o fs.o splice.o \
 				sync.o msg_ring.o advise.o openclose.o \
 				epoll.o statx.o timeout.o fdinfo.o \
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3ad915262a45..841a5dd6ba89 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -248,13 +248,6 @@ static __cold void io_kworker_tw_end(void)
 	current->flags |= PF_NO_TASKWORK;
 }
 
-static __cold void io_ring_ctx_ref_free(struct percpu_ref *ref)
-{
-	struct io_ring_ctx *ctx = container_of(ref, struct io_ring_ctx, refs);
-
-	complete(&ctx->ref_comp);
-}
-
 static __cold void io_fallback_req_func(struct work_struct *work)
 {
 	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
@@ -265,13 +258,13 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 	io_kworker_tw_start();
 
-	percpu_ref_get(&ctx->refs);
+	io_ring_ref_get(ctx);
 	mutex_lock(&ctx->uring_lock);
 	llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
@@ -256,6 +272,7 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 	io_submit_flush_completions(ctx);
 	mutex_unlock(&ctx->uring_lock);
-	percpu_ref_put(&ctx->refs);
+	io_ring_ref_put(ctx);
 	io_kworker_tw_end();
 }
 
@@ -312,8 +305,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 		goto err;
 	if (io_alloc_hash_table(&ctx->cancel_table_locked, hash_bits))
 		goto err;
-	if (percpu_ref_init(&ctx->refs, io_ring_ctx_ref_free,
-			    0, GFP_KERNEL))
+	if (io_ring_ref_init(ctx))
 		goto err;
 	ctx->flags = p->flags;
@@ -939,7 +931,7 @@ __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
 		ret = 1;
 	}
 
-	percpu_ref_get_many(&ctx->refs, ret);
+	io_ring_ref_get_many(ctx, ret);
 	while (ret--) {
 		struct io_kiocb *req = reqs[ret];
@@ -994,7 +986,7 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
 	io_submit_flush_completions(ctx);
 	mutex_unlock(&ctx->uring_lock);
-	percpu_ref_put(&ctx->refs);
+	io_ring_ref_put(ctx);
 }
 
 /*
@@ -1018,7 +1010,7 @@ struct llist_node *io_handle_tw_list(struct llist_node *node,
 			ctx_flush_and_put(ctx, &ts);
 			ctx = req->ctx;
 			mutex_lock(&ctx->uring_lock);
-			percpu_ref_get(&ctx->refs);
+			io_ring_ref_get(ctx);
 		}
 		INDIRECT_CALL_2(req->io_task_work.func,
 				io_poll_task_func, io_req_rw_complete,
@@ -1062,10 +1054,10 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
 		if (sync && last_ctx != req->ctx) {
 			if (last_ctx) {
 				flush_delayed_work(&last_ctx->fallback_work);
-				percpu_ref_put(&last_ctx->refs);
+				io_ring_ref_put(last_ctx);
 			}
 			last_ctx = req->ctx;
-			percpu_ref_get(&last_ctx->refs);
+			io_ring_ref_get(last_ctx);
 		}
 		if (llist_add(&req->io_task_work.node, &req->ctx->fallback_llist))
@@ -1074,7 +1066,7 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
 
 	if (last_ctx) {
 		flush_delayed_work(&last_ctx->fallback_work);
-		percpu_ref_put(&last_ctx->refs);
+		io_ring_ref_put(last_ctx);
 	}
 }
 
@@ -2566,7 +2558,7 @@ static void io_req_caches_free(struct io_ring_ctx *ctx)
 		nr++;
 	}
 	if (nr)
-		percpu_ref_put_many(&ctx->refs, nr);
+		io_ring_ref_put_many(ctx, nr);
 	mutex_unlock(&ctx->uring_lock);
 }
 
@@ -2610,7 +2602,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	}
 	io_rings_free(ctx);
 
-	percpu_ref_exit(&ctx->refs);
+	io_ring_ref_free(ctx);
 	free_uid(ctx->user);
 	io_req_caches_free(ctx);
 	if (ctx->hash_map)
@@ -2636,7 +2628,7 @@ static __cold void io_activate_pollwq_cb(struct callback_head *cb)
 	 * might've been lost due to loose synchronisation.
 	 */
 	wake_up_all(&ctx->poll_wq);
-	percpu_ref_put(&ctx->refs);
+	io_ring_ref_put(ctx);
 }
 
 __cold void io_activate_pollwq(struct io_ring_ctx *ctx)
@@ -2654,9 +2646,9 @@ __cold void io_activate_pollwq(struct io_ring_ctx *ctx)
 	 * only need to sync with it, which is done by injecting a tw
 	 */
 	init_task_work(&ctx->poll_wq_task_work, io_activate_pollwq_cb);
-	percpu_ref_get(&ctx->refs);
+	io_ring_ref_get(ctx);
 	if (task_work_add(ctx->submitter_task, &ctx->poll_wq_task_work, TWA_SIGNAL))
-		percpu_ref_put(&ctx->refs);
+		io_ring_ref_put(ctx);
 out:
 	spin_unlock(&ctx->completion_lock);
 }
@@ -2833,7 +2825,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 	struct creds *creds;
 
 	mutex_lock(&ctx->uring_lock);
-	percpu_ref_kill(&ctx->refs);
+	io_ring_ref_kill(ctx);
 	xa_for_each(&ctx->personalities, index, creds)
 		io_unregister_personality(ctx, index);
 	mutex_unlock(&ctx->uring_lock);
@@ -2848,6 +2840,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 	 * over using system_wq.
 	 */
 	queue_work(iou_wq, &ctx->exit_work);
+	io_ring_ref_put(ctx);
 }
 
 static int io_uring_release(struct inode *inode, struct file *file)
diff --git a/io_uring/refs.c b/io_uring/refs.c
new file mode 100644
index 000000000000..af21f3937f09
--- /dev/null
+++ b/io_uring/refs.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "refs.h"
+
+int io_ring_ref_init(struct io_ring_ctx *ctx)
+{
+	size_t align = max_t(size_t, 1 << __PERCPU_REF_FLAG_BITS,
+				__alignof__(local_t));
+
+	ctx->ref_ptr = (unsigned long) __alloc_percpu(sizeof(local_t), align);
+	if (ctx->ref_ptr)
+		return 0;
+
+	return -ENOMEM;
+}
+
+void io_ring_ref_free(struct io_ring_ctx *ctx)
+{
+	local_t __percpu *refs = io_ring_ref(ctx);
+
+	free_percpu(refs);
+	ctx->ref_ptr = 0;
+}
+
+/*
+ * Checks if all references are gone, completes if so.
+ */
+void __cold io_ring_ref_maybe_done(struct io_ring_ctx *ctx)
+{
+	local_t __percpu *refs = io_ring_ref(ctx);
+	long sum = 0;
+	int cpu;
+
+	preempt_disable();
+	for_each_possible_cpu(cpu)
+		sum += local_read(per_cpu_ptr(refs, cpu));
+	preempt_enable();
+
+	if (!sum)
+		complete(&ctx->ref_comp);
+}
+
+/*
+ * Mark the reference killed. This grabs a reference which the caller must
+ * drop.
+ */
+void io_ring_ref_kill(struct io_ring_ctx *ctx)
+{
+	io_ring_ref_get(ctx);
+	set_bit(CTX_REF_DEAD_BIT, &ctx->ref_ptr);
+	io_ring_ref_maybe_done(ctx);
+}
diff --git a/io_uring/refs.h b/io_uring/refs.h
index 63982ead9f7d..a4d4d46d6290 100644
--- a/io_uring/refs.h
+++ b/io_uring/refs.h
@@ -2,6 +2,7 @@
 #define IOU_REQ_REF_H
 
 #include
+#include
 #include
 
 /*
@@ -52,4 +53,56 @@ static inline void io_req_set_refcount(struct io_kiocb *req)
 {
 	__io_req_set_refcount(req, 1);
 }
+
+int io_ring_ref_init(struct io_ring_ctx *ctx);
+void io_ring_ref_free(struct io_ring_ctx *ctx);
+void __cold io_ring_ref_maybe_done(struct io_ring_ctx *ctx);
+void io_ring_ref_kill(struct io_ring_ctx *ctx);
+
+enum {
+	CTX_REF_DEAD_BIT	= 0UL,
+	CTX_REF_DEAD_MASK	= 1UL,
+};
+
+static inline local_t __percpu *io_ring_ref(struct io_ring_ctx *ctx)
+{
+	return (local_t __percpu *) (ctx->ref_ptr & ~CTX_REF_DEAD_MASK);
+}
+
+static inline bool io_ring_ref_is_dying(struct io_ring_ctx *ctx)
+{
+	return test_bit(CTX_REF_DEAD_BIT, &ctx->ref_ptr);
+}
+
+static inline void io_ring_ref_get_many(struct io_ring_ctx *ctx, int nr)
+{
+	local_t __percpu *refs = io_ring_ref(ctx);
+
+	preempt_disable();
+	local_add(nr, this_cpu_ptr(refs));
+	preempt_enable();
+}
+
+static inline void io_ring_ref_get(struct io_ring_ctx *ctx)
+{
+	io_ring_ref_get_many(ctx, 1);
+}
+
+static inline void io_ring_ref_put_many(struct io_ring_ctx *ctx, int nr)
+{
+	local_t __percpu *refs = io_ring_ref(ctx);
+
+	preempt_disable();
+	local_sub(nr, this_cpu_ptr(refs));
+	preempt_enable();
+
+	if (unlikely(io_ring_ref_is_dying(ctx)))
+		io_ring_ref_maybe_done(ctx);
+}
+
+static inline void io_ring_ref_put(struct io_ring_ctx *ctx)
+{
+	io_ring_ref_put_many(ctx, 1);
+}
+
 #endif
diff --git a/io_uring/register.c b/io_uring/register.c
index f121e02f5e10..9c1984e5c2f2 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -28,6 +28,7 @@
 #include "kbuf.h"
 #include "napi.h"
 #include "eventfd.h"
+#include "refs.h"
 
 #define IORING_MAX_RESTRICTIONS	(IORING_RESTRICTION_LAST + \
 				 IORING_REGISTER_LAST + IORING_OP_LAST)
@@ -347,7 +348,7 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 	 * We don't quiesce the refs for register anymore and so it can't be
 	 * dying as we're holding a file ref here.
 	 */
-	if (WARN_ON_ONCE(percpu_ref_is_dying(&ctx->refs)))
+	if (WARN_ON_ONCE(io_ring_ref_is_dying(ctx)))
 		return -ENXIO;
 
 	if (ctx->submitter_task && ctx->submitter_task != current)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1a2128459cb4..1092a6d5cefc 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -21,6 +21,7 @@
 #include "alloc_cache.h"
 #include "rsrc.h"
 #include "poll.h"
+#include "refs.h"
 #include "rw.h"
 
 struct io_rw {
@@ -419,7 +420,7 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
 	 * Don't attempt to reissue from that path, just let it fail with
 	 * -EAGAIN.
 	 */
-	if (percpu_ref_is_dying(&ctx->refs))
+	if (io_ring_ref_is_dying(ctx))
 		return false;
 	/*
 	 * Play it safe and assume not safe to re-import and reissue if we're
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index b3722e5275e7..de003b6b06ce 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -16,6 +16,7 @@
 
 #include "io_uring.h"
 #include "napi.h"
+#include "refs.h"
 #include "sqpoll.h"
 
 #define IORING_SQPOLL_CAP_ENTRIES_VALUE 8
@@ -190,7 +191,7 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 	 * Don't submit if refs are dying, good for io_uring_register(),
 	 * but also it is relied upon by io_ring_exit_work()
 	 */
-	if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
+	if (to_submit && likely(!io_ring_ref_is_dying(ctx)) &&
 	    !(ctx->flags & IORING_SETUP_R_DISABLED))
 		ret = io_submit_sqes(ctx, to_submit);
 	mutex_unlock(&ctx->uring_lock);
From patchwork Tue Jun 4 19:01:31 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13685805
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 4/5] io_uring: consider ring dead once the ref is marked dying
Date: Tue, 4 Jun 2024 13:01:31 -0600
Message-ID: <20240604191314.454554-5-axboe@kernel.dk>
In-Reply-To: <20240604191314.454554-1-axboe@kernel.dk>
References: <20240604191314.454554-1-axboe@kernel.dk>

Don't gate this on the task exiting flag. It's generally not a good idea
to gate it on the task PF_EXITING flag anyway. Once the ring is starting
to go through ring teardown, the ref is marked as dying. Use that as our
fallback/cancel mechanism.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 9 +++++++--
 io_uring/io_uring.h | 3 ++-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 841a5dd6ba89..5a4699170136 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -528,7 +528,11 @@ static void io_queue_iowq(struct io_kiocb *req)
 	 * procedure rather than attempt to run this request (or create a new
 	 * worker for it).
 	 */
-	if (WARN_ON_ONCE(!same_thread_group(req->task, current)))
+	WARN_ON_ONCE(!io_ring_ref_is_dying(req->ctx) &&
+		     !same_thread_group(req->task, current));
+
+	if (!same_thread_group(req->task, current) ||
+	    io_ring_ref_is_dying(req->ctx))
 		req->work.flags |= IO_WQ_WORK_CANCEL;
 
 	trace_io_uring_queue_async_work(req, io_wq_is_hashed(&req->work));
@@ -1196,7 +1200,8 @@ static void io_req_normal_work_add(struct io_kiocb *req)
 		return;
 	}
 
-	if (likely(!task_work_add(req->task, &tctx->task_work, ctx->notify_method)))
+	if (!io_ring_ref_is_dying(ctx) &&
+	    !task_work_add(req->task, &tctx->task_work, ctx->notify_method))
 		return;
 
 	io_fallback_tw(tctx, false);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index cd43924eed04..55eac07d5fe0 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -11,6 +11,7 @@
 #include "io-wq.h"
 #include "slist.h"
 #include "filetable.h"
+#include "refs.h"
 
 #ifndef CREATE_TRACE_POINTS
 #include
@@ -122,7 +123,7 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
 	 * Not from an SQE, as those cannot be submitted, but via
 	 * updating tagged resources.
*/ - if (ctx->submitter_task->flags & PF_EXITING) + if (io_ring_ref_is_dying(ctx)) lockdep_assert(current_work()); else lockdep_assert(current == ctx->submitter_task); From patchwork Tue Jun 4 19:01:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13685806 Received: from mail-pj1-f47.google.com (mail-pj1-f47.google.com [209.85.216.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0EA4014AD32 for ; Tue, 4 Jun 2024 19:13:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.47 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717528411; cv=none; b=AqKx7kBxX8TSUMDACO56xujDTYs+FSdnDUugb+ZcS0FS6MhHzfE6dJ+k9ncrEDWsLsWk7lWI2mM7pG9bq8vVotqWNqHNESjDT+o/VJ5VMepsYtXNpuEdLWhJQIWkPOOWVGEw93VwwxbOhhwb3uN6mMLGUNZqIHZfhch1XaasgyQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717528411; c=relaxed/simple; bh=elsAwSpveYyAAJD935fO32pChhAOuw2D+06WGsnF6u4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rKxQrKd6EMeFc6rg6cA9eoFRihQdSNLhCXKe/X+h2g5/SdPWotuZ6Iu1Xf1TsMTQA974sX7rW2hXm/PhuT1zFmNHyp+fah2yWXA9U3KBjqfhTlc+A7R5ryoYwa6I0ZMfWHlG7BfqCoYnpMqj45p5InwR2n350q+4z4i7Xpx/qD0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=LVgP9Y6o; arc=none smtp.client-ip=209.85.216.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
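The routing decision that [PATCH 4/5] adds to io_queue_iowq() (cancel the request when the submitting task changed, or when the ring ref is already marked dying) can be modeled in a few lines of plain userspace C. Every name below (ring_ctx, req, queue_iowq, the tgid fields) is an illustrative stand-in, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the patched io_queue_iowq() routing decision.
 * All names here are hypothetical stand-ins, not kernel APIs. */

#define WORK_CANCEL 0x1u

struct ring_ctx {
	bool refs_dying;	/* set once ring teardown has begun */
};

struct req {
	struct ring_ctx *ctx;
	int task_tgid;		/* thread group of the task owning the request */
	unsigned int work_flags;
};

/* Route the work to cancelation rather than execution when the request's
 * task is not in the current thread group, or when the ring is already
 * being torn down (its ref marked dying). */
static void queue_iowq(struct req *r, int current_tgid)
{
	if (r->task_tgid != current_tgid || r->ctx->refs_dying)
		r->work_flags |= WORK_CANCEL;
}
```

The second condition is the point of the patch: once the ref is dying, the dying state itself, rather than PF_EXITING on the submitter task, is what forces inflight work down the cancel path.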
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 5/5] io_uring: wait for cancelations on final ring put
Date: Tue, 4 Jun 2024 13:01:32 -0600
Message-ID: <20240604191314.454554-6-axboe@kernel.dk>
In-Reply-To: <20240604191314.454554-1-axboe@kernel.dk>
References: <20240604191314.454554-1-axboe@kernel.dk>
MIME-Version: 1.0

We still offload the cancelation to a workqueue, so as not to introduce
dependencies between the exiting task waiting on cleanup and that same
task needing to run task_work to complete the process. This means that
once the final ring put is done, any request that was inflight and
needed cancelation will be done as well. Notably, for requests that hold
references to files: once the ring fd close is done, those file
references will have been dropped too.
Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h |  2 ++
 io_uring/io_uring.c            | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index fc1e0e65d474..a6b5f041423f 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -420,6 +420,8 @@ struct io_ring_ctx {
	unsigned short			n_sqe_pages;
	struct page			**ring_pages;
	struct page			**sqe_pages;
+
+	struct completion		*exit_comp;
 };

 struct io_tw_state {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5a4699170136..3000a865baec 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2730,6 +2730,7 @@ static __cold void io_ring_exit_work(struct work_struct *work)
	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
	unsigned long timeout = jiffies + HZ * 60 * 5;
	unsigned long interval = HZ / 20;
+	struct completion *exit_comp;
	struct io_tctx_exit exit;
	struct io_tctx_node *node;
	int ret;
@@ -2788,6 +2789,10 @@ static __cold void io_ring_exit_work(struct work_struct *work)

	io_kworker_tw_end();

+	exit_comp = READ_ONCE(ctx->exit_comp);
+	if (exit_comp)
+		complete(exit_comp);
+
	init_completion(&exit.completion);
	init_task_work(&exit.task_work, io_tctx_exit_cb);
	exit.ctx = ctx;
@@ -2851,9 +2856,20 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)

 static int io_uring_release(struct inode *inode, struct file *file)
 {
	struct io_ring_ctx *ctx = file->private_data;
+	DECLARE_COMPLETION_ONSTACK(exit_comp);

	file->private_data = NULL;
+	WRITE_ONCE(ctx->exit_comp, &exit_comp);
	io_ring_ctx_wait_and_kill(ctx);
+
+	/*
+	 * Wait for cancel to run before exiting task
+	 */
+	do {
+		if (current->io_uring)
+			io_fallback_tw(current->io_uring, false);
+	} while (wait_for_completion_interruptible(&exit_comp));
+
	return 0;
 }
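The shutdown handshake that [PATCH 5/5] sets up between io_uring_release() and io_ring_exit_work() (release publishes an on-stack completion, the exit work completes it once cancelations have run, and release loops running fallback task_work while the interruptible wait keeps returning nonzero) can be sketched as a single-threaded userspace C model. completion_model, exit_state, fallback_tw and friends are invented names, and a real kernel completion involves sleeping and scheduling rather than a flag:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the patch 5 release path: the
 * exit work's progress is driven step-by-step by fallback_tw() instead
 * of a real workqueue, and the completion is a plain flag. */

struct completion_model {
	bool done;
};

struct exit_state {
	struct completion_model *exit_comp;	/* published by release() */
	int cancel_steps_left;			/* pending cancelation work */
};

/* Stand-in for io_fallback_tw(): makes one step of progress on pending
 * cancelations; the last step signals the completion, as the exit work
 * does with complete(exit_comp). */
static void fallback_tw(struct exit_state *st)
{
	if (st->cancel_steps_left > 0 && --st->cancel_steps_left == 0)
		st->exit_comp->done = true;
}

/* Stand-in for wait_for_completion_interruptible(): 0 once complete,
 * nonzero ("interrupted") while work is still outstanding. */
static int wait_interruptible(struct completion_model *c)
{
	return c->done ? 0 : -4;	/* -EINTR-like */
}

/* Shape of the new io_uring_release() loop: keep running fallback
 * task_work until the exit work signals the on-stack completion. */
static int release(struct exit_state *st)
{
	do {
		fallback_tw(st);
	} while (wait_interruptible(st->exit_comp));
	return 0;
}
```

The loop ordering mirrors the patch: fallback task_work must keep running between waits, otherwise the exit work could be stuck behind task_work that only the exiting task can execute, which is the dependency the commit message warns about.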