From patchwork Mon Dec 5 02:44:25 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 13064088
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe, asml.silence@gmail.com
Subject: [PATCH for-next 1/7] io_uring: skip overflow CQE posting for dying ring
Date: Mon, 5 Dec 2022 02:44:25 +0000
Message-Id: <6b6397cffe0446834741647c7cc0b624b38abbb5.1670207706.git.asml.silence@gmail.com>

After io_ring_ctx_wait_and_kill() is called there should be no users
poking into the rings, so there is no need to post CQEs. Instead of
trying to post overflowed CQEs into the CQ, drop them. Also, do it in a
loop in io_ring_exit_work() to reduce the number of contexts it can be
executed from, and so that even when it struggles to quiesce the ring we
won't be leaving memory allocated for longer than needed.

Signed-off-by: Pavel Begunkov
---
 io_uring/io_uring.c | 45 +++++++++++++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 436b1ac8f6d0..4721ff6cafaa 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -611,12 +611,30 @@ void io_cq_unlock_post(struct io_ring_ctx *ctx)
 }
 
 /* Returns true if there are no backlogged entries after the flush */
-static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
+static void io_cqring_overflow_kill(struct io_ring_ctx *ctx)
+{
+	struct io_overflow_cqe *ocqe;
+	LIST_HEAD(list);
+
+	io_cq_lock(ctx);
+	list_splice_init(&ctx->cq_overflow_list, &list);
+	clear_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
+	io_cq_unlock(ctx);
+
+	while (!list_empty(&list)) {
+		ocqe = list_first_entry(&list, struct io_overflow_cqe, list);
+		list_del(&ocqe->list);
+		kfree(ocqe);
+	}
+}
+
+/* Returns true if there are no backlogged entries after the flush */
+static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 {
 	bool all_flushed;
 	size_t cqe_size = sizeof(struct io_uring_cqe);
 
-	if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
+	if (__io_cqring_events(ctx) == ctx->cq_entries)
 		return false;
 
 	if (ctx->flags & IORING_SETUP_CQE32)
@@ -627,15 +645,11 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 		struct io_uring_cqe *cqe = io_get_cqe_overflow(ctx, true);
 		struct io_overflow_cqe *ocqe;
 
-		if (!cqe && !force)
+		if (!cqe)
 			break;
 		ocqe = list_first_entry(&ctx->cq_overflow_list,
					struct io_overflow_cqe, list);
-		if (cqe)
-			memcpy(cqe, &ocqe->cqe, cqe_size);
-		else
-			io_account_cq_overflow(ctx);
-
+		memcpy(cqe, &ocqe->cqe, cqe_size);
 		list_del(&ocqe->list);
 		kfree(ocqe);
 	}
@@ -658,7 +672,7 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 		/* iopoll syncs against uring_lock, not completion_lock */
 		if (ctx->flags & IORING_SETUP_IOPOLL)
 			mutex_lock(&ctx->uring_lock);
-		ret = __io_cqring_overflow_flush(ctx, false);
+		ret = __io_cqring_overflow_flush(ctx);
 		if (ctx->flags & IORING_SETUP_IOPOLL)
 			mutex_unlock(&ctx->uring_lock);
 	}
@@ -1478,7 +1492,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
 		check_cq = READ_ONCE(ctx->check_cq);
 		if (unlikely(check_cq)) {
 			if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
-				__io_cqring_overflow_flush(ctx, false);
+				__io_cqring_overflow_flush(ctx);
 			/*
 			 * Similarly do not spin if we have not informed the user of any
 			 * dropped CQE.
@@ -2646,8 +2660,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 		__io_sqe_buffers_unregister(ctx);
 	if (ctx->file_data)
 		__io_sqe_files_unregister(ctx);
-	if (ctx->rings)
-		__io_cqring_overflow_flush(ctx, true);
+	io_cqring_overflow_kill(ctx);
 	io_eventfd_unregister(ctx);
 	io_alloc_cache_free(&ctx->apoll_cache, io_apoll_cache_free);
 	io_alloc_cache_free(&ctx->netmsg_cache, io_netmsg_cache_free);
@@ -2788,6 +2801,12 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 	 * as nobody else will be looking for them.
 	 */
 	do {
+		if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) {
+			mutex_lock(&ctx->uring_lock);
+			io_cqring_overflow_kill(ctx);
+			mutex_unlock(&ctx->uring_lock);
+		}
+
 		if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
 			io_move_task_work_from_local(ctx);
 
@@ -2853,8 +2872,6 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 
 	mutex_lock(&ctx->uring_lock);
 	percpu_ref_kill(&ctx->refs);
-	if (ctx->rings)
-		__io_cqring_overflow_flush(ctx, true);
 	xa_for_each(&ctx->personalities, index, creds)
 		io_unregister_personality(ctx, index);
 	if (ctx->rings)
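The kill helper above uses a common kernel pattern: detach the whole
overflow backlog while holding the lock, then free the entries with the
lock dropped. A minimal standalone userspace sketch of that pattern
follows, with hypothetical names and a pthread mutex standing in for the
completion lock; it only models the locking shape, not the kernel code.

#include <pthread.h>
#include <stdlib.h>

/* hypothetical stand-in for struct io_overflow_cqe */
struct entry {
	struct entry *next;
};

static pthread_mutex_t backlog_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *backlog;

/* Drop every backlogged entry: unlink under the lock, free outside it. */
static void backlog_kill(void)
{
	struct entry *list;

	pthread_mutex_lock(&backlog_lock);
	list = backlog;		/* like list_splice_init() of cq_overflow_list */
	backlog = NULL;
	pthread_mutex_unlock(&backlog_lock);

	while (list) {
		struct entry *e = list;

		list = e->next;
		free(e);	/* nothing is posted, the entry is simply dropped */
	}
}

int main(void)
{
	for (int i = 0; i < 3; i++) {
		struct entry *e = calloc(1, sizeof(*e));

		e->next = backlog;
		backlog = e;
	}
	backlog_kill();
	return 0;
}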
From patchwork Mon Dec 5 02:44:26 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 13064090
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe, asml.silence@gmail.com
Subject: [PATCH for-next 2/7] io_uring: don't check overflow flush failures
Date: Mon, 5 Dec 2022 02:44:26 +0000

The only way the flush of overflowed CQEs can fail is if the CQ is
completely packed. There is one place checking for flush failures,
io_cqring_wait(), but we limit the number of events to wait for by the
CQ size, so getting a failure automatically means that we're done with
waiting. Don't check for failures; rare as they would be, they could
only spuriously fail CQ waiting with -EBUSY.

Signed-off-by: Pavel Begunkov
---
 io_uring/io_uring.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4721ff6cafaa..7239776a9d4b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -629,13 +629,12 @@ static void io_cqring_overflow_kill(struct io_ring_ctx *ctx)
 }
 
 /* Returns true if there are no backlogged entries after the flush */
-static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx)
+static void __io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 {
-	bool all_flushed;
 	size_t cqe_size = sizeof(struct io_uring_cqe);
 
 	if (__io_cqring_events(ctx) == ctx->cq_entries)
-		return false;
+		return;
 
 	if (ctx->flags & IORING_SETUP_CQE32)
 		cqe_size <<= 1;
@@ -654,30 +653,23 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 		kfree(ocqe);
 	}
 
-	all_flushed = list_empty(&ctx->cq_overflow_list);
-	if (all_flushed) {
+	if (list_empty(&ctx->cq_overflow_list)) {
 		clear_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
 		atomic_andnot(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
 	}
-
 	io_cq_unlock_post(ctx);
-	return all_flushed;
 }
 
-static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx)
+static void io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 {
-	bool ret = true;
-
 	if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) {
 		/* iopoll syncs against uring_lock, not completion_lock */
 		if (ctx->flags & IORING_SETUP_IOPOLL)
 			mutex_lock(&ctx->uring_lock);
-		ret = __io_cqring_overflow_flush(ctx);
+		__io_cqring_overflow_flush(ctx);
 		if (ctx->flags & IORING_SETUP_IOPOLL)
 			mutex_unlock(&ctx->uring_lock);
 	}
-
-	return ret;
 }
 
 void __io_put_task(struct task_struct *task, int nr)
@@ -2505,11 +2497,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	trace_io_uring_cqring_wait(ctx, min_events);
 
 	do {
-		/* if we can't even flush overflow, don't wait for more */
-		if (!io_cqring_overflow_flush(ctx)) {
-			ret = -EBUSY;
-			break;
-		}
+		io_cqring_overflow_flush(ctx);
 		prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
						TASK_INTERRUPTIBLE);
 		ret = io_cqring_wait_schedule(ctx, &iowq, timeout);
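To make the reasoning above concrete: the number of completions to wait
for is limited by the CQ size on entry, so a flush that fails because
the CQ is completely full already implies that enough completions are
ready. A tiny userspace model of that invariant, with made-up names used
purely for illustration:

#include <assert.h>

/*
 * Model of the invariant the patch relies on: min_complete is clamped
 * to cq_entries, so a packed CQ (ready == cq_entries) always satisfies
 * the wait condition and -EBUSY is never needed.
 */
static int wait_satisfied(unsigned int ready, unsigned int cq_entries,
			  unsigned int min_complete)
{
	unsigned int want = min_complete < cq_entries ? min_complete : cq_entries;

	return ready >= want;
}

int main(void)
{
	/* a CQ of 8 entries is packed and the flush fails: waiting is done */
	assert(wait_satisfied(8, 8, 8));
	/* even if the caller asked for more events than the CQ can hold */
	assert(wait_satisfied(8, 8, 64));
	return 0;
}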
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 13064089 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2797C3A5A7 for ; Mon, 5 Dec 2022 02:45:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230488AbiLECpn (ORCPT ); Sun, 4 Dec 2022 21:45:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45604 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230402AbiLECpm (ORCPT ); Sun, 4 Dec 2022 21:45:42 -0500 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 44DF1FACA for ; Sun, 4 Dec 2022 18:45:41 -0800 (PST) Received: by mail-wr1-x432.google.com with SMTP id h10so7094113wrx.3 for ; Sun, 04 Dec 2022 18:45:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ajY7oGhe0hmOGNUv5BKDKhjewuXgAiFYQv5RXF5CL34=; b=ToBIr6PVGPshoyADvy5PYrdDjRH+YZ+vrcG0LmtA1UW7N/kafzyk5n5gl8J36smCk2 xYNJDM63TgvfjewObMDupCEGiQNoB3/hvJu62l9qk1Nb7RjRmiBrFdySengxOQhaAU5f 7pYlN5LwbusqNCHre2z4BKHL5VIVWwOcSc0QGoJ0ow1TatZNfhDY5Yvfnac4ixqYUh+n MjNuC4hib84jU30D5hglT7PN2RVWq+t6QtweMVwhY8+EfkwDTkVuMqLo9Yk/rMRKKbXA R+fIekK+6RmeLgemfPDKjqjAqBFpkbUW6d+enQHZBJ3e507uNNnwAGf8t2kZiM+hPu8K jhCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ajY7oGhe0hmOGNUv5BKDKhjewuXgAiFYQv5RXF5CL34=; b=kphNU4j/SaoMJqsYjzwwRiUXihxv68KDrvHgWS2foxWyVNxc8wbY2bIzV40Jp5XYqJ 7ugmsKCBPOyRbXjKSOYqNQq0l1BvaaJOvNIlhsDV8kM0H/LJOUSOqjvX3ji613TLEVpt thEMVc+9IXFu8rNne3w4tIEhmK2V3juzAlme+i/zAQA+bY0oJ8DixfRtEgKi9LI0cK7F nQaugmy4e35DkCB+K/1D+JwK1Jj36WAy3gp3SNlXnIIxUkmhdCnHozT7DL8YensumQsp Xv7He0/6tRY41i9qycGA1YDGXK5JU41GNUztJvfdQ0ZrOoFDoPy4oiK5iBySZrLC+9ND UFXA== X-Gm-Message-State: ANoB5plcx3wUUH8+nT7BYoTFfcx0Ld/BqaAnVrc+FBUTAfWgLuvXFFJs LwQwNwBHs2Xweid6pshTWHtzf1nM0ac= X-Google-Smtp-Source: AA0mqf51LGxdRxx2l4Gjq+YOFdrXlljXmHDczQBz2IRTMN7ZyqZ9ov9r3nMbEe3x1RjKLDtC2/Ahpg== X-Received: by 2002:a5d:67d0:0:b0:241:781e:606 with SMTP id n16-20020a5d67d0000000b00241781e0606mr49435377wrw.216.1670208339584; Sun, 04 Dec 2022 18:45:39 -0800 (PST) Received: from 127.0.0.1localhost (94.196.241.58.threembb.co.uk. [94.196.241.58]) by smtp.gmail.com with ESMTPSA id t17-20020a05600c41d100b003cf71b1f66csm15281532wmh.0.2022.12.04.18.45.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 04 Dec 2022 18:45:39 -0800 (PST) From: Pavel Begunkov To: io-uring@vger.kernel.org Cc: Jens Axboe , asml.silence@gmail.com Subject: [PATCH for-next 3/7] io_uring: complete all requests in task context Date: Mon, 5 Dec 2022 02:44:27 +0000 Message-Id: <24ed012156ad8c9f3b13dd7fe83925443cbdd627.1670207706.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org This patch adds ctx->task_complete flag. 
If set, we'll complete all requests in the context of the original task. Note, this extends to completion CQE posting only but not io_kiocb cleanup / free, e.g. io-wq may free the requests in the free calllback. This flag will be used later for optimisations purposes. Signed-off-by: Pavel Begunkov --- include/linux/io_uring.h | 2 ++ include/linux/io_uring_types.h | 2 ++ io_uring/io_uring.c | 14 +++++++++++--- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 29e519752da4..934e5dd4ccc0 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -11,6 +11,8 @@ enum io_uring_cmd_flags { IO_URING_F_UNLOCKED = 2, /* the request is executed from poll, it should not be freed */ IO_URING_F_MULTISHOT = 4, + /* executed by io-wq */ + IO_URING_F_IOWQ = 8, /* int's last bit, sign checks are usually faster than a bit test */ IO_URING_F_NONBLOCK = INT_MIN, diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index accdfecee953..6be1e1359c89 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -208,6 +208,8 @@ struct io_ring_ctx { unsigned int drain_disabled: 1; unsigned int has_evfd: 1; unsigned int syscall_iopoll: 1; + /* all CQEs should be posted only by the submitter task */ + unsigned int task_complete: 1; } ____cacheline_aligned_in_smp; /* submission data */ diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 7239776a9d4b..0c86df7112fb 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -932,8 +932,11 @@ static void __io_req_complete_post(struct io_kiocb *req) void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags) { - if (!(issue_flags & IO_URING_F_UNLOCKED) || - !(req->ctx->flags & IORING_SETUP_IOPOLL)) { + if (req->ctx->task_complete && (issue_flags & IO_URING_F_IOWQ)) { + req->io_task_work.func = io_req_task_complete; + io_req_task_work_add(req); + } else if (!(issue_flags & IO_URING_F_UNLOCKED) || + !(req->ctx->flags & IORING_SETUP_IOPOLL)) { __io_req_complete_post(req); } else { struct io_ring_ctx *ctx = req->ctx; @@ -1841,7 +1844,7 @@ void io_wq_submit_work(struct io_wq_work *work) { struct io_kiocb *req = container_of(work, struct io_kiocb, work); const struct io_op_def *def = &io_op_defs[req->opcode]; - unsigned int issue_flags = IO_URING_F_UNLOCKED; + unsigned int issue_flags = IO_URING_F_UNLOCKED | IO_URING_F_IOWQ; bool needs_poll = false; int ret = 0, err = -ECANCELED; @@ -3501,6 +3504,11 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p, if (!ctx) return -ENOMEM; + if ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) && + !(ctx->flags & IORING_SETUP_IOPOLL) && + !(ctx->flags & IORING_SETUP_SQPOLL)) + ctx->task_complete = true; + /* * When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, user * space applications don't need to do io completion events From patchwork Mon Dec 5 02:44:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 13064091 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6239AC4708E for ; Mon, 5 Dec 2022 02:45:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230402AbiLECpo (ORCPT ); Sun, 4 Dec 2022 21:45:44 -0500 Received: from lindbergh.monkeyblade.net 
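From userspace, the configuration that ends up with ctx->task_complete
set per the io_uring_create() hunk above is a ring created with
IORING_SETUP_DEFER_TASKRUN (which the kernel only accepts together with
IORING_SETUP_SINGLE_ISSUER) and without IOPOLL or SQPOLL. A rough
liburing sketch of such a setup, assuming liburing 2.3+ and a kernel new
enough for DEFER_TASKRUN; error handling is kept minimal:

#include <stdio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int ret;

	/* single issuer + deferred task work: completions run in this task */
	ret = io_uring_queue_init(8, &ring,
				  IORING_SETUP_SINGLE_ISSUER |
				  IORING_SETUP_DEFER_TASKRUN);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_nop(sqe);
	io_uring_sqe_set_data64(sqe, 0xcafe);

	/* submitting and waiting must happen from the ring's owner task */
	ret = io_uring_submit_and_wait(&ring, 1);
	if (ret < 0)
		return 1;

	if (!io_uring_peek_cqe(&ring, &cqe)) {
		printf("nop completed, res=%d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}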
([23.128.96.19]:45614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231139AbiLECpn (ORCPT ); Sun, 4 Dec 2022 21:45:43 -0500 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1E8D8FAD0 for ; Sun, 4 Dec 2022 18:45:42 -0800 (PST) Received: by mail-wr1-x42e.google.com with SMTP id w15so16617945wrl.9 for ; Sun, 04 Dec 2022 18:45:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=BzK/zrdj76nrH7tgY872zfma02vZfGuAP6Fn/6zdEXI=; b=oOmj4JTziR/PMj48t6VbE0a0DdU09MevOPpA/h+1YcXecYmBtvaJClw9ZcawEsQ31P NgsnSPBA8eLVf8rNEYXHF0yHkAg0mENKk6R9CscvNtFuGmJipMlgET+awwCbcGli6y3V AX2j0e/n0SlA88QN3/v1ZucQOmRbvXAVtT5pjNp8vwelHcxRLKHHL43D8rPkfsB1ma5O Fc2ucV9qeIBpalUQEUHSfHxSi+9a4zls/01GO2rFTZ2ovxwrbISfgzRaLJn1nCsZMw3O SdYXaJkikwD8/eRKTLLbs3B0PkBB8H9+0y0J+7mMZqVsGAltuxETghSN8WGsdo8rRrem OebA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BzK/zrdj76nrH7tgY872zfma02vZfGuAP6Fn/6zdEXI=; b=kPVDwG323ha6n4Bga5m9TyYbmH/wHabu8Qmp3SCzVxB78AZmPE1pTFtDUkHnKfPWji 4i+XG5HlTOHZ3ULNN3Lctuya5hKP3CFABh7Lr6DVt6wh46wqxche5WOhys33BBKr/0BN vmjcRLOYP082zVzmPYbopzV+pjC918i5vJRR8lHNVnWfaBJJvTI67BM0xy71E2wpjtGK x2phE1Wukm2nUhaNfCW8R4P1ffo/GIpxhxPOsQ+jAGM7VHRLR+LAEpAbmFqFyYv4Uph8 stThv9b3cRVH9PLUAlhNpsF3Iijx08frE5TAJsp6wo/oTZHvdHsSqDyWSt/hBKUuNCKw fqRg== X-Gm-Message-State: ANoB5pmbtFFR997cEDXtZO0eYjuRqufl8B2cJdl7swkDVnI3fHfAI1cJ p8txCGMG1Hww7JNFdu8wh3yh6jIh5zA= X-Google-Smtp-Source: AA0mqf4J9yL3VZ6+7xTsrrZSA1cYiV1LeVnp1hK7FAYm7eztuIW7J+PplUEI7Mcq0yhoBJhSwSwEtA== X-Received: by 2002:a05:6000:1702:b0:241:ffc4:dd1c with SMTP id n2-20020a056000170200b00241ffc4dd1cmr31285372wrc.538.1670208340464; Sun, 04 Dec 2022 18:45:40 -0800 (PST) Received: from 127.0.0.1localhost (94.196.241.58.threembb.co.uk. [94.196.241.58]) by smtp.gmail.com with ESMTPSA id t17-20020a05600c41d100b003cf71b1f66csm15281532wmh.0.2022.12.04.18.45.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 04 Dec 2022 18:45:40 -0800 (PST) From: Pavel Begunkov To: io-uring@vger.kernel.org Cc: Jens Axboe , asml.silence@gmail.com Subject: [PATCH for-next 4/7] io_uring: force multishot CQEs into task context Date: Mon, 5 Dec 2022 02:44:28 +0000 Message-Id: X-Mailer: git-send-email 2.38.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Multishot are posting CQEs outside of the normal request completion path, which is usually done from within a task work handler. However, it might be not the case when it's yet to be polled but has been punted to io-wq. Make it abide ->task_complete and push it to the polling path when executed by io-wq. 
Signed-off-by: Pavel Begunkov
---
 io_uring/net.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/io_uring/net.c b/io_uring/net.c
index 90342dcb6b1d..f276f6dd5b09 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -67,6 +67,19 @@ struct io_sr_msg {
 	struct io_kiocb 		*notif;
 };
 
+static inline bool io_check_multishot(struct io_kiocb *req,
+				      unsigned int issue_flags)
+{
+	/*
+	 * When ->locked_cq is set we only allow to post CQEs from the original
+	 * task context. Usual request completions will be handled in other
+	 * generic paths but multipoll may decide to post extra cqes.
+	 */
+	return !(issue_flags & IO_URING_F_IOWQ) ||
+		!(issue_flags & IO_URING_F_MULTISHOT) ||
+		!req->ctx->task_complete;
+}
+
 int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_shutdown *shutdown = io_kiocb_to_cmd(req, struct io_shutdown);
@@ -730,6 +743,9 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return io_setup_async_msg(req, kmsg, issue_flags);
 
+	if (!io_check_multishot(req, issue_flags))
+		return io_setup_async_msg(req, kmsg, issue_flags);
+
 retry_multishot:
 	if (io_do_buffer_select(req)) {
 		void __user *buf;
@@ -829,6 +845,9 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return -EAGAIN;
 
+	if (!io_check_multishot(req, issue_flags))
+		return -EAGAIN;
+
 	sock = sock_from_file(req->file);
 	if (unlikely(!sock))
 		return -ENOTSOCK;
@@ -1280,6 +1299,8 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 	struct file *file;
 	int ret, fd;
 
+	if (!io_check_multishot(req, issue_flags))
+		return -EAGAIN;
 retry:
 	if (!fixed) {
 		fd = __get_unused_fd_flags(accept->flags, accept->nofile);
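The multishot requests affected here are operations like multishot
accept and multishot recv, where a single SQE keeps producing CQEs until
it terminates. A rough liburing sketch of a multishot accept loop, only
to show where those extra CQEs come from; it assumes liburing 2.2+ and
skips most error handling:

#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <liburing.h>

int main(void)
{
	struct sockaddr_in addr = { .sin_family = AF_INET,
				    .sin_port = htons(8080),
				    .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int lfd;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
	listen(lfd, 16);

	io_uring_queue_init(8, &ring, 0);

	/* one SQE, many CQEs: each accepted connection posts its own CQE */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_multishot_accept(sqe, lfd, NULL, NULL, 0);
	io_uring_submit(&ring);

	for (;;) {
		if (io_uring_wait_cqe(&ring, &cqe))
			break;
		if (cqe->res >= 0)
			printf("new connection fd=%d\n", cqe->res);
		/* IORING_CQE_F_MORE cleared means the multishot request ended */
		if (!(cqe->flags & IORING_CQE_F_MORE)) {
			io_uring_cqe_seen(&ring, cqe);
			break;
		}
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	close(lfd);
	return 0;
}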
From patchwork Mon Dec 5 02:44:29 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 13064092
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe, asml.silence@gmail.com
Subject: [PATCH for-next 5/7] io_uring: post msg_ring CQE in task context
Date: Mon, 5 Dec 2022 02:44:29 +0000

We want to limit post_aux_cqe() to the task context when ->task_complete
is set, and so we can't just deliver an IORING_OP_MSG_RING CQE to another
thread. Instead of trying to invent a new delayed CQE posting mechanism,
push them into the overflow list.
Signed-off-by: Pavel Begunkov
---
 io_uring/io_uring.c | 12 ++++++++++++
 io_uring/io_uring.h |  2 ++
 io_uring/msg_ring.c | 14 ++++++++++++--
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0c86df7112fb..7fda57dc0e8c 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -860,6 +860,18 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags
 	return __io_post_aux_cqe(ctx, user_data, res, cflags, true);
 }
 
+bool io_post_aux_cqe_overflow(struct io_ring_ctx *ctx,
+			      u64 user_data, s32 res, u32 cflags)
+{
+	bool filled;
+
+	io_cq_lock(ctx);
+	ctx->cq_extra++;
+	filled = io_cqring_event_overflow(ctx, user_data, res, cflags, 0, 0);
+	io_cq_unlock_post(ctx);
+	return filled;
+}
+
 bool io_aux_cqe(struct io_ring_ctx *ctx, bool defer, u64 user_data, s32 res, u32 cflags,
 		bool allow_overflow)
 {
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 62227ec3260c..a0b11a631e29 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -36,6 +36,8 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags
 bool io_aux_cqe(struct io_ring_ctx *ctx, bool defer, u64 user_data, s32 res, u32 cflags,
 		bool allow_overflow);
 void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
+bool io_post_aux_cqe_overflow(struct io_ring_ctx *ctx,
+			      u64 user_data, s32 res, u32 cflags);
 
 struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
 
diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index afb543aab9f6..7717fe519b07 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -23,6 +23,16 @@ struct io_msg {
 	u32 flags;
 };
 
+/* post cqes to another ring */
+static int io_msg_post_cqe(struct io_ring_ctx *ctx,
+			   u64 user_data, s32 res, u32 cflags)
+{
+	if (!ctx->task_complete || current == ctx->submitter_task)
+		return io_post_aux_cqe(ctx, user_data, res, cflags);
+	else
+		return io_post_aux_cqe_overflow(ctx, user_data, res, cflags);
+}
+
 static int io_msg_ring_data(struct io_kiocb *req)
 {
 	struct io_ring_ctx *target_ctx = req->file->private_data;
@@ -31,7 +41,7 @@ static int io_msg_ring_data(struct io_kiocb *req)
 	if (msg->src_fd || msg->dst_fd || msg->flags)
 		return -EINVAL;
 
-	if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, 0))
+	if (io_msg_post_cqe(target_ctx, msg->user_data, msg->len, 0))
 		return 0;
 
 	return -EOVERFLOW;
@@ -116,7 +126,7 @@ static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)
 	 * completes with -EOVERFLOW, then the sender must ensure that a
 	 * later IORING_OP_MSG_RING delivers the message.
 	 */
-	if (!io_post_aux_cqe(target_ctx, msg->user_data, msg->len, 0))
+	if (!io_msg_post_cqe(target_ctx, msg->user_data, msg->len, 0))
 		ret = -EOVERFLOW;
 out_unlock:
 	io_double_unlock_ctx(ctx, target_ctx, issue_flags);
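For context, IORING_OP_MSG_RING posts a CQE into another ring, typically
one owned by a different thread, which is why the target's
->task_complete guarantee matters here. A rough userspace sketch of the
operation using liburing (2.2+ assumed; both rings live in one thread
purely for brevity):

#include <stdio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring src, dst;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;

	io_uring_queue_init(8, &src, 0);
	io_uring_queue_init(8, &dst, 0);

	/* post a CQE with user_data 0x42 and res 100 into the dst ring */
	sqe = io_uring_get_sqe(&src);
	io_uring_prep_msg_ring(sqe, dst.ring_fd, 100, 0x42, 0);
	io_uring_submit_and_wait(&src, 1);	/* the sender gets its own CQE too */

	if (!io_uring_wait_cqe(&dst, &cqe)) {
		printf("dst ring: user_data=0x%llx res=%d\n",
		       (unsigned long long)cqe->user_data, cqe->res);
		io_uring_cqe_seen(&dst, cqe);
	}

	io_uring_queue_exit(&src);
	io_uring_queue_exit(&dst);
	return 0;
}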
From patchwork Mon Dec 5 02:44:30 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 13064093
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe, asml.silence@gmail.com
Subject: [PATCH for-next 6/7] io_uring: use tw for putting rsrc
Date: Mon, 5 Dec 2022 02:44:30 +0000
Message-Id: <9b35443a6f758f76ea75bb7438e6ff5a7b4f40e3.1670207706.git.asml.silence@gmail.com>

Use task_work for completing rsrc removals; it'll be needed later for
spinlock optimisations.

Signed-off-by: Pavel Begunkov
---
 include/linux/io_uring_types.h |  1 +
 io_uring/io_uring.c            |  1 +
 io_uring/rsrc.c                | 19 +++++++++++++++++--
 io_uring/rsrc.h                |  1 +
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 6be1e1359c89..dcd8a563ab52 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -328,6 +328,7 @@ struct io_ring_ctx {
 	struct io_rsrc_data		*buf_data;
 
 	struct delayed_work		rsrc_put_work;
+	struct callback_head		rsrc_put_tw;
 	struct llist_head		rsrc_put_llist;
 	struct list_head		rsrc_ref_list;
 	spinlock_t			rsrc_ref_lock;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 7fda57dc0e8c..9eb771a4c912 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -326,6 +326,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	spin_lock_init(&ctx->rsrc_ref_lock);
 	INIT_LIST_HEAD(&ctx->rsrc_ref_list);
 	INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
+	init_task_work(&ctx->rsrc_put_tw, io_rsrc_put_tw);
 	init_llist_head(&ctx->rsrc_put_llist);
 	init_llist_head(&ctx->work_llist);
 	INIT_LIST_HEAD(&ctx->tctx_list);
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index d25309400a45..18de10c68a15 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -204,6 +204,14 @@ void io_rsrc_put_work(struct work_struct *work)
 	}
 }
 
+void io_rsrc_put_tw(struct callback_head *cb)
+{
+	struct io_ring_ctx *ctx = container_of(cb, struct io_ring_ctx,
+					       rsrc_put_tw);
+
+	io_rsrc_put_work(&ctx->rsrc_put_work.work);
+}
+
 void io_wait_rsrc_data(struct io_rsrc_data *data)
 {
 	if (data && !atomic_dec_and_test(&data->refs))
@@ -242,8 +250,15 @@ static __cold void io_rsrc_node_ref_zero(struct percpu_ref *ref)
 	}
 	spin_unlock_irqrestore(&ctx->rsrc_ref_lock, flags);
 
-	if (first_add)
-		mod_delayed_work(system_wq, &ctx->rsrc_put_work, delay);
+	if (!first_add)
+		return;
+
+	if (ctx->submitter_task) {
+		if (!task_work_add(ctx->submitter_task, &ctx->rsrc_put_tw,
+				   ctx->notify_method))
+			return;
+	}
+	mod_delayed_work(system_wq, &ctx->rsrc_put_work, delay);
 }
 
 static struct io_rsrc_node *io_rsrc_node_alloc(void)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 81445a477622..2b8743645efc 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -53,6 +53,7 @@ struct io_mapped_ubuf {
 	struct bio_vec	bvec[];
 };
 
+void io_rsrc_put_tw(struct callback_head *cb);
 void io_rsrc_put_work(struct work_struct *work);
 void io_rsrc_refs_refill(struct io_ring_ctx *ctx);
 void io_wait_rsrc_data(struct io_rsrc_data *data);
From patchwork Mon Dec 5 02:44:31 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 13064094
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe, asml.silence@gmail.com
Subject: [PATCH for-next 7/7] io_uring: skip spinlocking for ->task_complete
Date: Mon, 5 Dec 2022 02:44:31 +0000
Message-Id: <76ed0107ff51cec483eda84056363cc0d0775c7e.1670207706.git.asml.silence@gmail.com>

->task_complete was added to serialise CQE posting by doing it from the
task context only (or the fallback wq when the task is dead), and now we
can use that to avoid taking ->completion_lock while filling CQ entries.
The patch skips spinlocking only in two spots,
__io_submit_flush_completions() and flushing in io_aux_cqe(); it's safer
and covers all the cases we care about. Extra care is taken to force
taking the lock while queueing overflow entries.

It fundamentally relies on SINGLE_ISSUER so that only one task posts
events. It also needs to take into account overflowed CQEs, the flushing
of which happens in the CQ wait path, and so this implementation also
needs DEFER_TASKRUN to limit waiters. For the same reason we disable it
for SQPOLL, and for IOPOLL as well since it won't benefit from it in any
case. The DEFER_TASKRUN, SQPOLL and IOPOLL requirements may be relaxed
in the future.

Signed-off-by: Pavel Begunkov
---
 io_uring/io_uring.c | 71 +++++++++++++++++++++++++++++++++------------
 io_uring/io_uring.h | 12 ++++++--
 2 files changed, 62 insertions(+), 21 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9eb771a4c912..36cb63e4174f 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -594,13 +594,25 @@ static inline void io_cq_unlock(struct io_ring_ctx *ctx)
 	spin_unlock(&ctx->completion_lock);
 }
 
+static inline void __io_cq_lock(struct io_ring_ctx *ctx)
+	__acquires(ctx->completion_lock)
+{
+	if (!ctx->task_complete)
+		spin_lock(&ctx->completion_lock);
+}
+
+static inline void __io_cq_unlock(struct io_ring_ctx *ctx)
+{
+	if (!ctx->task_complete)
+		spin_unlock(&ctx->completion_lock);
+}
+
 /* keep it inlined for io_submit_flush_completions() */
-static inline void io_cq_unlock_post_inline(struct io_ring_ctx *ctx)
+static inline void __io_cq_unlock_post(struct io_ring_ctx *ctx)
 	__releases(ctx->completion_lock)
 {
 	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-
+	__io_cq_unlock(ctx);
 	io_commit_cqring_flush(ctx);
 	io_cqring_wake(ctx);
 }
@@ -608,7 +620,10 @@ static inline void io_cq_unlock_post_inline(struct io_ring_ctx *ctx)
 void io_cq_unlock_post(struct io_ring_ctx *ctx)
 	__releases(ctx->completion_lock)
 {
-	io_cq_unlock_post_inline(ctx);
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_commit_cqring_flush(ctx);
+	io_cqring_wake(ctx);
 }
 
 /* Returns true if there are no backlogged entries after the flush */
@@ -795,12 +810,13 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow)
 	return &rings->cqes[off];
 }
 
-static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags,
-			    bool allow_overflow)
+static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res,
+			    u32 cflags)
 {
 	struct io_uring_cqe *cqe;
 
-	lockdep_assert_held(&ctx->completion_lock);
+	if (!ctx->task_complete)
+		lockdep_assert_held(&ctx->completion_lock);
 
 	ctx->cq_extra++;
 
@@ -823,10 +839,6 @@ static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32
 		}
 		return true;
 	}
-
-	if (allow_overflow)
-		return io_cqring_event_overflow(ctx, user_data, res, cflags, 0, 0);
-
 	return false;
 }
 
@@ -840,7 +852,17 @@ static void __io_flush_post_cqes(struct io_ring_ctx *ctx)
 	for (i = 0; i < state->cqes_count; i++) {
 		struct io_uring_cqe *cqe = &state->cqes[i];
 
-		io_fill_cqe_aux(ctx, cqe->user_data, cqe->res, cqe->flags, true);
+		if (!io_fill_cqe_aux(ctx, cqe->user_data, cqe->res, cqe->flags)) {
+			if (ctx->task_complete) {
+				spin_lock(&ctx->completion_lock);
+				io_cqring_event_overflow(ctx, cqe->user_data,
+							cqe->res, cqe->flags, 0, 0);
+				spin_unlock(&ctx->completion_lock);
+			} else {
+				io_cqring_event_overflow(ctx, cqe->user_data,
+							cqe->res, cqe->flags, 0, 0);
+			}
+		}
 	}
 	state->cqes_count = 0;
 }
@@ -851,7 +873,10 @@ static bool __io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u
 	bool filled;
 
 	io_cq_lock(ctx);
-	filled = io_fill_cqe_aux(ctx, user_data, res, cflags, allow_overflow);
+	filled = io_fill_cqe_aux(ctx, user_data, res, cflags);
+	if (!filled && allow_overflow)
+		filled = io_cqring_event_overflow(ctx, user_data, res, cflags, 0, 0);
+
 	io_cq_unlock_post(ctx);
 	return filled;
 }
@@ -887,10 +912,10 @@ bool io_aux_cqe(struct io_ring_ctx *ctx, bool defer, u64 user_data, s32 res, u32
 	lockdep_assert_held(&ctx->uring_lock);
 
 	if (ctx->submit_state.cqes_count == length) {
-		io_cq_lock(ctx);
+		__io_cq_lock(ctx);
 		__io_flush_post_cqes(ctx);
 		/* no need to flush - flush is deferred */
-		io_cq_unlock(ctx);
+		__io_cq_unlock_post(ctx);
 	}
 
 	/* For defered completions this is not as strict as it is otherwise,
@@ -1418,7 +1443,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 	struct io_wq_work_node *node, *prev;
 	struct io_submit_state *state = &ctx->submit_state;
 
-	io_cq_lock(ctx);
+	__io_cq_lock(ctx);
 	/* must come first to preserve CQE ordering in failure cases */
 	if (state->cqes_count)
 		__io_flush_post_cqes(ctx);
@@ -1426,10 +1451,18 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 					    comp_list);
 
-		if (!(req->flags & REQ_F_CQE_SKIP))
-			io_fill_cqe_req(ctx, req);
+		if (!(req->flags & REQ_F_CQE_SKIP) &&
+		    unlikely(!__io_fill_cqe_req(ctx, req))) {
+			if (ctx->task_complete) {
+				spin_lock(&ctx->completion_lock);
+				io_req_cqe_overflow(req);
+				spin_unlock(&ctx->completion_lock);
+			} else {
+				io_req_cqe_overflow(req);
+			}
+		}
 	}
-	io_cq_unlock_post_inline(ctx);
+	__io_cq_unlock_post(ctx);
 
 	if (!wq_list_empty(&ctx->submit_state.compl_reqs)) {
 		io_free_batch_list(ctx, state->compl_reqs.first);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index a0b11a631e29..c20f15f5024d 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -112,7 +112,7 @@ static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
 	return io_get_cqe_overflow(ctx, false);
 }
 
-static inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
+static inline bool __io_fill_cqe_req(struct io_ring_ctx *ctx,
 				   struct io_kiocb *req)
 {
 	struct io_uring_cqe *cqe;
@@ -124,7 +124,7 @@ static inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
 	 */
 	cqe = io_get_cqe(ctx);
 	if (unlikely(!cqe))
-		return io_req_cqe_overflow(req);
+		return false;
 
 	trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
 				req->cqe.res, req->cqe.flags,
@@ -147,6 +147,14 @@ static inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
 	return true;
 }
 
+static inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
+				   struct io_kiocb *req)
+{
+	if (likely(__io_fill_cqe_req(ctx, req)))
+		return true;
+	return io_req_cqe_overflow(req);
+}
+
 static inline void req_set_fail(struct io_kiocb *req)
 {
 	req->flags |= REQ_F_FAIL;
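The core idea behind __io_cq_lock()/__io_cq_unlock() above is
conditional locking: when the ring guarantees a single completing task,
the spinlock devolves into a no-op. A minimal standalone analogy in
userspace C, with hypothetical names and a pthread mutex instead of a
kernel spinlock, shown only to illustrate the shape of the optimisation
rather than the kernel implementation:

#include <pthread.h>
#include <stdio.h>

struct cq {
	pthread_mutex_t	lock;
	int		single_producer;	/* analogue of ctx->task_complete */
	unsigned int	tail;
};

/* Only take the lock when completions may arrive from several contexts. */
static void cq_lock(struct cq *cq)
{
	if (!cq->single_producer)
		pthread_mutex_lock(&cq->lock);
}

static void cq_unlock(struct cq *cq)
{
	if (!cq->single_producer)
		pthread_mutex_unlock(&cq->lock);
}

static void cq_post(struct cq *cq)
{
	cq_lock(cq);
	cq->tail++;		/* stand-in for filling a CQE */
	cq_unlock(cq);
}

int main(void)
{
	struct cq cq = { .lock = PTHREAD_MUTEX_INITIALIZER, .single_producer = 1 };

	cq_post(&cq);
	printf("tail=%u\n", cq.tail);
	return 0;
}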