From patchwork Mon Oct 9 09:33:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13413231 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA7E6E95A97 for ; Mon, 9 Oct 2023 09:35:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345801AbjJIJfY (ORCPT ); Mon, 9 Oct 2023 05:35:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54438 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345695AbjJIJfY (ORCPT ); Mon, 9 Oct 2023 05:35:24 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5A6F08E for ; Mon, 9 Oct 2023 02:34:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696844075; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TX7Wzkd4ZYKzaRjV/dKAq+t7hN8oAyiuhJUvpnY/OP8=; b=PI+yF3gWJGpIAzOxl22ImeelyT4YnqUYvEuETK88rS7veFaFLQ5Xs0NWwjkBwjgeLiz+FY 2B568y0rZCbtUBTZlg13kmkAVBah+65vxyBKi4dXIA1n7o1kQy6B3m6YE8DyKRFzqebzZv 3kf4A7mIGT8POwV7FKULEMqcbgQSwv8= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-691-uELNN7glPqKhLNU7_H5v9Q-1; Mon, 09 Oct 2023 05:34:29 -0400 X-MC-Unique: uELNN7glPqKhLNU7_H5v9Q-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 946C6101A597; Mon, 9 Oct 2023 09:34:29 +0000 (UTC) Received: from localhost (unknown [10.72.120.4]) by smtp.corp.redhat.com (Postfix) with ESMTP id C0D604026FB; Mon, 9 Oct 2023 09:34:28 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: Ming Lei Subject: [PATCH for-6.7/io_uring 1/7] ublk: don't get ublk device reference in ublk_abort_queue() Date: Mon, 9 Oct 2023 17:33:16 +0800 Message-ID: <20231009093324.957829-2-ming.lei@redhat.com> In-Reply-To: <20231009093324.957829-1-ming.lei@redhat.com> References: <20231009093324.957829-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org ublk_abort_queue() is called in ublk_daemon_monitor_work(), in which it is guaranteed that the device is live because monitor work is canceled when removing device, so no need to get the device reference. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 630ddfe6657b..9b3c0b3dd36e 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1419,9 +1419,6 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq) { int i; - if (!ublk_get_device(ub)) - return; - for (i = 0; i < ubq->q_depth; i++) { struct ublk_io *io = &ubq->ios[i]; @@ -1437,7 +1434,6 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq) __ublk_fail_req(ubq, io, rq); } } - ublk_put_device(ub); } static void ublk_daemon_monitor_work(struct work_struct *work) From patchwork Mon Oct 9 09:33:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13413229 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 929CDE95A8E for ; Mon, 9 Oct 2023 09:35:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345728AbjJIJfU (ORCPT ); Mon, 9 Oct 2023 05:35:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54432 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345692AbjJIJfU (ORCPT ); Mon, 9 Oct 2023 05:35:20 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBA3D94 for ; Mon, 9 Oct 2023 02:34:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696844075; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DfRxw27qVBIdPCd6o1lBvFpTVHLecT139zgNuCHJ+mo=; b=Oysz7EnBb2bf6QQkJHq3IIZ+fp339pAlNAjQ0p34lSY7k7fjwoSl/ujHErC6V1A6K42Voi KOXKkl488yzwEpBKcF+0bWb6S01RWyp0SPQn7DxTss+kOw7eSeMyB37SHSrmPRKSftmeZw hm0dMwOxLr2NILqGdCc9NMpN7H3SRjU= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-486-5U-cS7sBNKi9Zeal4lr_CQ-1; Mon, 09 Oct 2023 05:34:33 -0400 X-MC-Unique: 5U-cS7sBNKi9Zeal4lr_CQ-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 319C1101A598; Mon, 9 Oct 2023 09:34:33 +0000 (UTC) Received: from localhost (unknown [10.72.120.4]) by smtp.corp.redhat.com (Postfix) with ESMTP id 643DA140E962; Mon, 9 Oct 2023 09:34:32 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: Ming Lei Subject: [PATCH for-6.7/io_uring 2/7] ublk: make sure io cmd handled in submitter task context Date: Mon, 9 Oct 2023 17:33:17 +0800 Message-ID: <20231009093324.957829-3-ming.lei@redhat.com> In-Reply-To: <20231009093324.957829-1-ming.lei@redhat.com> References: <20231009093324.957829-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org In well-done ublk server implementation, ublk io command won't be linked into any link chain. Meantime they are always handled in no-wait style, so basically io cmd is always handled in submitter task context. However, the server may set IOSQE_ASYNC, or io command is linked to one chain mistakenly, then we may still run into io-wq context and ctx->uring_lock isn't held. So in case of IO_URING_F_UNLOCKED, schedule this command by io_uring_cmd_complete_in_task to force running it in submitter task. Then ublk_ch_uring_cmd_local() is guaranteed to run with context uring_lock held, and we needn't to worry about sync among submission code path any more. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 9b3c0b3dd36e..46d499d96ca3 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1810,7 +1810,8 @@ static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, return NULL; } -static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) +static inline int ublk_ch_uring_cmd_local(struct io_uring_cmd *cmd, + unsigned int issue_flags) { /* * Not necessary for async retry, but let's keep it simple and always @@ -1824,9 +1825,28 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) .addr = READ_ONCE(ub_src->addr) }; + WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED); + return __ublk_ch_uring_cmd(cmd, issue_flags, &ub_cmd); } +static void ublk_ch_uring_cmd_cb(struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + ublk_ch_uring_cmd_local(cmd, issue_flags); +} + +static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) +{ + /* well-implemented server won't run into unlocked */ + if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) { + io_uring_cmd_complete_in_task(cmd, ublk_ch_uring_cmd_cb); + return -EIOCBQUEUED; + } + + return ublk_ch_uring_cmd_local(cmd, issue_flags); +} + static inline bool ublk_check_ubuf_dir(const struct request *req, int ubuf_dir) { From patchwork Mon Oct 9 09:33:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13413234 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 102AFE95A9B for ; Mon, 9 Oct 2023 09:36:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345818AbjJIJgI (ORCPT ); Mon, 9 Oct 2023 05:36:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36872 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345804AbjJIJgH (ORCPT ); Mon, 9 Oct 2023 05:36:07 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EB3D2BA for ; Mon, 9 Oct 2023 02:34:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696844080; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Hy3J/yvb0KfYJNvwsfix6gMHGkLxfT8yOe7cLSEhwSo=; b=AzUEXSARo0mOYKR19FVCVQCoKNWJYnn4w3Dhlx5WSuSuRAgsn6nuGXUnWrM1uImHybSPpK y0gAIozrlWnis0+B2AUY5JymKKLL3W2OY6cKgIPGwPra8Tr/53oZNk4/Vyv/23qPAOrnfL vN37xS5ehIYkJ98JZ3LBFhNWzmMoes8= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-482-hpELf4TGM7uOPn06y1prwg-1; Mon, 09 Oct 2023 05:34:37 -0400 X-MC-Unique: hpELf4TGM7uOPn06y1prwg-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7CD263C025A2; Mon, 9 Oct 2023 09:34:36 +0000 (UTC) Received: from localhost (unknown [10.72.120.4]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9F1AE2027045; Mon, 9 Oct 2023 09:34:35 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: Ming Lei Subject: [PATCH for-6.7/io_uring 3/7] ublk: move ublk_cancel_dev() out of ub->mutex Date: Mon, 9 Oct 2023 17:33:18 +0800 Message-ID: <20231009093324.957829-4-ming.lei@redhat.com> In-Reply-To: <20231009093324.957829-1-ming.lei@redhat.com> References: <20231009093324.957829-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org ublk_cancel_dev() just calls ublk_cancel_queue() to cancel all pending io commands after ublk request queue is idle. The only protection is just the read & write of ubq->nr_io_ready and avoid duplicated command cancel, so add one per-queue lock with cancel flag for providing this protection, meantime move ublk_cancel_dev() out of ub->mutex. Then we needn't to call io_uring_cmd_complete_in_task() to cancel pending command. And the same cancel logic will be re-used for cancelable uring command. This patch basically reverts commit ac5902f84bb5 ("ublk: fix AB-BA lockdep warning"). Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 40 +++++++++++++++++++++++----------------- 1 file changed, 23 insertions(+), 17 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 46d499d96ca3..15b52210488a 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -115,6 +115,9 @@ struct ublk_uring_cmd_pdu { */ #define UBLK_IO_FLAG_NEED_GET_DATA 0x08 +/* atomic RW with ubq->cancel_lock */ +#define UBLK_IO_FLAG_CANCELED 0x80000000 + struct ublk_io { /* userspace buffer address from io cmd */ __u64 addr; @@ -139,6 +142,7 @@ struct ublk_queue { bool force_abort; bool timeout; unsigned short nr_io_ready; /* how many ios setup */ + spinlock_t cancel_lock; struct ublk_device *dev; struct ublk_io ios[]; }; @@ -1473,28 +1477,28 @@ static inline bool ublk_queue_ready(struct ublk_queue *ubq) return ubq->nr_io_ready == ubq->q_depth; } -static void ublk_cmd_cancel_cb(struct io_uring_cmd *cmd, unsigned issue_flags) -{ - io_uring_cmd_done(cmd, UBLK_IO_RES_ABORT, 0, issue_flags); -} - static void ublk_cancel_queue(struct ublk_queue *ubq) { int i; - if (!ublk_queue_ready(ubq)) - return; - for (i = 0; i < ubq->q_depth; i++) { struct ublk_io *io = &ubq->ios[i]; - if (io->flags & UBLK_IO_FLAG_ACTIVE) - io_uring_cmd_complete_in_task(io->cmd, - ublk_cmd_cancel_cb); - } + if (io->flags & UBLK_IO_FLAG_ACTIVE) { + bool done; - /* all io commands are canceled */ - ubq->nr_io_ready = 0; + spin_lock(&ubq->cancel_lock); + done = !!(io->flags & UBLK_IO_FLAG_CANCELED); + if (!done) + io->flags |= UBLK_IO_FLAG_CANCELED; + spin_unlock(&ubq->cancel_lock); + + if (!done) + io_uring_cmd_done(io->cmd, + UBLK_IO_RES_ABORT, 0, + IO_URING_F_UNLOCKED); + } + } } /* Cancel all pending commands, must be called after del_gendisk() returns */ @@ -1541,7 +1545,6 @@ static void __ublk_quiesce_dev(struct ublk_device *ub) blk_mq_quiesce_queue(ub->ub_disk->queue); ublk_wait_tagset_rqs_idle(ub); ub->dev_info.state = UBLK_S_DEV_QUIESCED; - ublk_cancel_dev(ub); /* we are going to release task_struct of ubq_daemon and resets * ->ubq_daemon to NULL. So in monitor_work, check on ubq_daemon causes UAF. * Besides, monitor_work is not necessary in QUIESCED state since we have @@ -1564,6 +1567,7 @@ static void ublk_quiesce_work_fn(struct work_struct *work) __ublk_quiesce_dev(ub); unlock: mutex_unlock(&ub->mutex); + ublk_cancel_dev(ub); } static void ublk_unquiesce_dev(struct ublk_device *ub) @@ -1603,8 +1607,8 @@ static void ublk_stop_dev(struct ublk_device *ub) put_disk(ub->ub_disk); ub->ub_disk = NULL; unlock: - ublk_cancel_dev(ub); mutex_unlock(&ub->mutex); + ublk_cancel_dev(ub); cancel_delayed_work_sync(&ub->monitor_work); } @@ -1978,6 +1982,7 @@ static int ublk_init_queue(struct ublk_device *ub, int q_id) void *ptr; int size; + spin_lock_init(&ubq->cancel_lock); ubq->flags = ub->dev_info.flags; ubq->q_id = q_id; ubq->q_depth = ub->dev_info.queue_depth; @@ -2585,8 +2590,9 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq) int i; WARN_ON_ONCE(!(ubq->ubq_daemon && ubq_daemon_is_dying(ubq))); + /* All old ioucmds have to be completed */ - WARN_ON_ONCE(ubq->nr_io_ready); + ubq->nr_io_ready = 0; /* old daemon is PF_EXITING, put it now */ put_task_struct(ubq->ubq_daemon); /* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */ From patchwork Mon Oct 9 09:33:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13413233 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50740E95A97 for ; Mon, 9 Oct 2023 09:36:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345834AbjJIJgG (ORCPT ); Mon, 9 Oct 2023 05:36:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36880 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345804AbjJIJgF (ORCPT ); Mon, 9 Oct 2023 05:36:05 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1E777C5 for ; Mon, 9 Oct 2023 02:34:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696844084; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NKHZfvG6Z8aGlAnyqm9NbYAuSR+v/Mw4bxijcdGmwVY=; b=XlPZ/JQhlfdJoOIfJyEXdeDEuNqDdYeyBVoyQc22cddAB2IlI34YcODTrrDdIpjStr2vN5 iZX119jXBdl8aQ3QtmXb7pKrLX4XHX3sD7qF3/qsYjMWYXrbegGsPjeUMBC0dRv4VVcTux 5Qyphd11wW5bRHcpvqfBanSb2BCxE3k= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-653-RnjU7tFYOK-OI3_NUiaRSw-1; Mon, 09 Oct 2023 05:34:40 -0400 X-MC-Unique: RnjU7tFYOK-OI3_NUiaRSw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2EA0685A5BE; Mon, 9 Oct 2023 09:34:40 +0000 (UTC) Received: from localhost (unknown [10.72.120.4]) by smtp.corp.redhat.com (Postfix) with ESMTP id 627EBC0FE2D; Mon, 9 Oct 2023 09:34:39 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: Ming Lei Subject: [PATCH for-6.7/io_uring 4/7] ublk: rename mm_lock as lock Date: Mon, 9 Oct 2023 17:33:19 +0800 Message-ID: <20231009093324.957829-5-ming.lei@redhat.com> In-Reply-To: <20231009093324.957829-1-ming.lei@redhat.com> References: <20231009093324.957829-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Rename mm_lock field of ublk_device as lock, so that this lock can be reused for protecting access of ub->ub_disk, which will be used for simplifying ublk_abort_queue() by quiesce queue in next patch. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 15b52210488a..ab8828ab3a0c 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -170,7 +170,7 @@ struct ublk_device { struct mutex mutex; - spinlock_t mm_lock; + spinlock_t lock; struct mm_struct *mm; struct ublk_params params; @@ -1361,12 +1361,12 @@ static int ublk_ch_mmap(struct file *filp, struct vm_area_struct *vma) unsigned long pfn, end, phys_off = vma->vm_pgoff << PAGE_SHIFT; int q_id, ret = 0; - spin_lock(&ub->mm_lock); + spin_lock(&ub->lock); if (!ub->mm) ub->mm = current->mm; if (current->mm != ub->mm) ret = -EINVAL; - spin_unlock(&ub->mm_lock); + spin_unlock(&ub->lock); if (ret) return ret; @@ -2341,7 +2341,7 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd) if (!ub) goto out_unlock; mutex_init(&ub->mutex); - spin_lock_init(&ub->mm_lock); + spin_lock_init(&ub->lock); INIT_WORK(&ub->quiesce_work, ublk_quiesce_work_fn); INIT_WORK(&ub->stop_work, ublk_stop_work_fn); INIT_DELAYED_WORK(&ub->monitor_work, ublk_daemon_monitor_work); From patchwork Mon Oct 9 09:33:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13413232 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93D5AE95A99 for ; Mon, 9 Oct 2023 09:36:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345846AbjJIJgH (ORCPT ); Mon, 9 Oct 2023 05:36:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345833AbjJIJgF (ORCPT ); Mon, 9 Oct 2023 05:36:05 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 292A2CA for ; Mon, 9 Oct 2023 02:34:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696844086; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=iORPmUdveCXK71IAfleFJmUzvMnWnU8t382yX/ZIa+8=; b=i+Ne23KhBO+Osy9mXi6njrvVlslvZTFNaKSIw0tzHd1uaEX5euYs/0Moa1LcRlif2gnVEs HepZ2RoXxpMoRK8+L5pUUsPeEEg4K8dkJYPb3ruh5U+8RySCqOs66MTBOuNLHmIS/cf2jH C7aXON0x8of1KmIRlVOLUMDbmpIr5fM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-627-GN1A-iNFOLKEc4Xb3fWL8g-1; Mon, 09 Oct 2023 05:34:43 -0400 X-MC-Unique: GN1A-iNFOLKEc4Xb3fWL8g-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5F887805B32; Mon, 9 Oct 2023 09:34:43 +0000 (UTC) Received: from localhost (unknown [10.72.120.4]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9086010EE859; Mon, 9 Oct 2023 09:34:42 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: Ming Lei Subject: [PATCH for-6.7/io_uring 5/7] ublk: quiesce request queue when aborting queue Date: Mon, 9 Oct 2023 17:33:20 +0800 Message-ID: <20231009093324.957829-6-ming.lei@redhat.com> In-Reply-To: <20231009093324.957829-1-ming.lei@redhat.com> References: <20231009093324.957829-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org So far aborting queue ends request when the ubq daemon is exiting, and it can be run concurrently with ublk_queue_rq(), this way is fragile and we depend on the tricky usage of UBLK_IO_FLAG_ABORTED for avoiding such race. Quiesce queue when aborting queue, and the two code paths can be run completely exclusively, then it becomes easier to add new ublk feature, such as relaxing single same task limit for each queue. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 59 ++++++++++++++++++++++++++++++++++------ 1 file changed, 50 insertions(+), 9 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index ab8828ab3a0c..e8d52cd7b226 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1440,25 +1440,59 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq) } } -static void ublk_daemon_monitor_work(struct work_struct *work) +static bool ublk_abort_requests(struct ublk_device *ub) { - struct ublk_device *ub = - container_of(work, struct ublk_device, monitor_work.work); + struct gendisk *disk; int i; + spin_lock(&ub->lock); + disk = ub->ub_disk; + if (disk) + get_device(disk_to_dev(disk)); + spin_unlock(&ub->lock); + + /* Our disk has been dead */ + if (!disk) + return false; + + /* Now we are serialized with ublk_queue_rq() */ + blk_mq_quiesce_queue(disk->queue); for (i = 0; i < ub->dev_info.nr_hw_queues; i++) { struct ublk_queue *ubq = ublk_get_queue(ub, i); if (ubq_daemon_is_dying(ubq)) { - if (ublk_queue_can_use_recovery(ubq)) - schedule_work(&ub->quiesce_work); - else - schedule_work(&ub->stop_work); - /* abort queue is for making forward progress */ ublk_abort_queue(ub, ubq); } } + blk_mq_unquiesce_queue(disk->queue); + put_device(disk_to_dev(disk)); + + return true; +} + +static void ublk_daemon_monitor_work(struct work_struct *work) +{ + struct ublk_device *ub = + container_of(work, struct ublk_device, monitor_work.work); + int i; + + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) { + struct ublk_queue *ubq = ublk_get_queue(ub, i); + + if (ubq_daemon_is_dying(ubq)) + goto found; + } + return; + +found: + if (!ublk_abort_requests(ub)) + return; + + if (ublk_can_use_recovery(ub)) + schedule_work(&ub->quiesce_work); + else + schedule_work(&ub->stop_work); /* * We can't schedule monitor work after ub's state is not UBLK_S_DEV_LIVE. @@ -1593,6 +1627,8 @@ static void ublk_unquiesce_dev(struct ublk_device *ub) static void ublk_stop_dev(struct ublk_device *ub) { + struct gendisk *disk; + mutex_lock(&ub->mutex); if (ub->dev_info.state == UBLK_S_DEV_DEAD) goto unlock; @@ -1602,10 +1638,15 @@ static void ublk_stop_dev(struct ublk_device *ub) ublk_unquiesce_dev(ub); } del_gendisk(ub->ub_disk); + + /* Sync with ublk_abort_queue() by holding the lock */ + spin_lock(&ub->lock); + disk = ub->ub_disk; ub->dev_info.state = UBLK_S_DEV_DEAD; ub->dev_info.ublksrv_pid = -1; - put_disk(ub->ub_disk); ub->ub_disk = NULL; + spin_unlock(&ub->lock); + put_disk(disk); unlock: mutex_unlock(&ub->mutex); ublk_cancel_dev(ub); From patchwork Mon Oct 9 09:33:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13413236 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AED60E95A91 for ; Mon, 9 Oct 2023 09:36:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345893AbjJIJgQ (ORCPT ); Mon, 9 Oct 2023 05:36:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36958 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345883AbjJIJgO (ORCPT ); Mon, 9 Oct 2023 05:36:14 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CB752EA for ; Mon, 9 Oct 2023 02:35:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696844107; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JkG471HQUGQOgXNjt7eXYoAUHBPwXZNOYbJ0zPaw2ok=; b=XhHDV+NJ8Uuky8HRvVEZtc0pjc7MLX7g+AhGwtVQjzB+tbyCMNpk2FSkuqoBmAQUOAGddO CeSNM5NWDcowQxTdwxJoxDUS1Yb56xM/3+SnKVl8jqHDO1TYtuFXBKHTl/2GalgICmZWPB gkpRh4z75jLuJ9cawiHjHntx0u3HGI4= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-568-ecvjAjjhPJabhCiZqIT8gA-1; Mon, 09 Oct 2023 05:34:48 -0400 X-MC-Unique: ecvjAjjhPJabhCiZqIT8gA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6D8A4381CC10; Mon, 9 Oct 2023 09:34:47 +0000 (UTC) Received: from localhost (unknown [10.72.120.4]) by smtp.corp.redhat.com (Postfix) with ESMTP id 32EE0167F8; Mon, 9 Oct 2023 09:34:45 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: Ming Lei Subject: [PATCH for-6.7/io_uring 6/7] ublk: replace monitor with cancelable uring_cmd Date: Mon, 9 Oct 2023 17:33:21 +0800 Message-ID: <20231009093324.957829-7-ming.lei@redhat.com> In-Reply-To: <20231009093324.957829-1-ming.lei@redhat.com> References: <20231009093324.957829-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Monitor work actually introduces one extra context for handling abort, this way is easy to cause race, and also introduce extra delay when handling aborting. Now we start to support cancelable uring_cmd, so use it instead: 1) this cancel callback is either run from the uring cmd submission task context or called after the io_uring context is exit, so the callback is run exclusively with ublk_ch_uring_cmd() and __ublk_rq_task_work(). 2) the previous patch freezes request queue when calling ublk_abort_queue(), which is now completely exclusive with ublk_queue_rq() and ublk_ch_uring_cmd()/__ublk_rq_task_work(). 3) in timeout handler, if all IOs are in-flight, then all uring commands are completed, uring command canceling can't help us to provide forward progress any more, so call ublk_abort_requests() in timeout handler. This way simplifies aborting queue, and is helpful for adding new feature, such as, relax the limit of using single task for handling one queue. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 208 ++++++++++++++++++++++----------------- 1 file changed, 119 insertions(+), 89 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index e8d52cd7b226..75ed7b87a844 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -75,6 +75,7 @@ struct ublk_rq_data { struct ublk_uring_cmd_pdu { struct ublk_queue *ubq; + u16 tag; }; /* @@ -141,14 +142,13 @@ struct ublk_queue { unsigned int max_io_sz; bool force_abort; bool timeout; + bool canceling; unsigned short nr_io_ready; /* how many ios setup */ spinlock_t cancel_lock; struct ublk_device *dev; struct ublk_io ios[]; }; -#define UBLK_DAEMON_MONITOR_PERIOD (5 * HZ) - struct ublk_device { struct gendisk *ub_disk; @@ -179,11 +179,6 @@ struct ublk_device { unsigned int nr_queues_ready; unsigned int nr_privileged_daemon; - /* - * Our ubq->daemon may be killed without any notification, so - * monitor each queue's daemon periodically - */ - struct delayed_work monitor_work; struct work_struct quiesce_work; struct work_struct stop_work; }; @@ -194,10 +189,11 @@ struct ublk_params_header { __u32 types; }; +static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq); + static inline unsigned int ublk_req_build_flags(struct request *req); static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq, int tag); - static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub) { return ub->dev_info.flags & UBLK_F_USER_COPY; @@ -1122,8 +1118,6 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq, blk_mq_requeue_request(rq, false); else blk_mq_end_request(rq, BLK_STS_IOERR); - - mod_delayed_work(system_wq, &ubq->dev->monitor_work, 0); } static inline void __ublk_rq_task_work(struct request *req, @@ -1244,12 +1238,12 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq) io = &ubq->ios[rq->tag]; /* * If the check pass, we know that this is a re-issued request aborted - * previously in monitor_work because the ubq_daemon(cmd's task) is + * previously in cancel fn because the ubq_daemon(cmd's task) is * PF_EXITING. We cannot call io_uring_cmd_complete_in_task() anymore * because this ioucmd's io_uring context may be freed now if no inflight * ioucmd exists. Otherwise we may cause null-deref in ctx->fallback_work. * - * Note: monitor_work sets UBLK_IO_FLAG_ABORTED and ends this request(releasing + * Note: cancel fn sets UBLK_IO_FLAG_ABORTED and ends this request(releasing * the tag). Then the request is re-started(allocating the tag) and we are here. * Since releasing/allocating a tag implies smp_mb(), finding UBLK_IO_FLAG_ABORTED * guarantees that here is a re-issued request aborted previously. @@ -1257,17 +1251,15 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq) if (unlikely(io->flags & UBLK_IO_FLAG_ABORTED)) { ublk_abort_io_cmds(ubq); } else { - struct io_uring_cmd *cmd = io->cmd; - struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd); - - pdu->ubq = ubq; - io_uring_cmd_complete_in_task(cmd, ublk_rq_task_work_cb); + io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb); } } static enum blk_eh_timer_return ublk_timeout(struct request *rq) { struct ublk_queue *ubq = rq->mq_hctx->driver_data; + unsigned int nr_inflight = 0; + int i; if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) { if (!ubq->timeout) { @@ -1278,6 +1270,29 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq) return BLK_EH_DONE; } + if (!ubq_daemon_is_dying(ubq)) + return BLK_EH_RESET_TIMER; + + for (i = 0; i < ubq->q_depth; i++) { + struct ublk_io *io = &ubq->ios[i]; + + if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) + nr_inflight++; + } + + /* cancelable uring_cmd can't help us if all commands are in-flight */ + if (nr_inflight == ubq->q_depth) { + struct ublk_device *ub = ubq->dev; + + if (ublk_abort_requests(ub, ubq)) { + if (ublk_can_use_recovery(ub)) + schedule_work(&ub->quiesce_work); + else + schedule_work(&ub->stop_work); + } + return BLK_EH_DONE; + } + return BLK_EH_RESET_TIMER; } @@ -1307,7 +1322,7 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx, blk_mq_start_request(bd->rq); - if (unlikely(ubq_daemon_is_dying(ubq))) { + if (unlikely(ubq->canceling)) { __ublk_abort_rq(ubq, rq); return BLK_STS_OK; } @@ -1415,9 +1430,9 @@ static void ublk_commit_completion(struct ublk_device *ub, } /* - * When ->ubq_daemon is exiting, either new request is ended immediately, - * or any queued io command is drained, so it is safe to abort queue - * lockless + * Called from ubq_daemon context via cancel fn, meantime quiesce ublk + * blk-mq queue, so we are called exclusively with blk-mq and ubq_daemon + * context, so everything is serialized. */ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq) { @@ -1440,10 +1455,17 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq) } } -static bool ublk_abort_requests(struct ublk_device *ub) +static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq) { struct gendisk *disk; - int i; + + spin_lock(&ubq->cancel_lock); + if (ubq->canceling) { + spin_unlock(&ubq->cancel_lock); + return false; + } + ubq->canceling = true; + spin_unlock(&ubq->cancel_lock); spin_lock(&ub->lock); disk = ub->ub_disk; @@ -1457,53 +1479,69 @@ static bool ublk_abort_requests(struct ublk_device *ub) /* Now we are serialized with ublk_queue_rq() */ blk_mq_quiesce_queue(disk->queue); - for (i = 0; i < ub->dev_info.nr_hw_queues; i++) { - struct ublk_queue *ubq = ublk_get_queue(ub, i); - - if (ubq_daemon_is_dying(ubq)) { - /* abort queue is for making forward progress */ - ublk_abort_queue(ub, ubq); - } - } + /* abort queue is for making forward progress */ + ublk_abort_queue(ub, ubq); blk_mq_unquiesce_queue(disk->queue); put_device(disk_to_dev(disk)); return true; } -static void ublk_daemon_monitor_work(struct work_struct *work) +static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io, + unsigned int issue_flags) { - struct ublk_device *ub = - container_of(work, struct ublk_device, monitor_work.work); - int i; + bool done; - for (i = 0; i < ub->dev_info.nr_hw_queues; i++) { - struct ublk_queue *ubq = ublk_get_queue(ub, i); + if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) + return; - if (ubq_daemon_is_dying(ubq)) - goto found; - } - return; + spin_lock(&ubq->cancel_lock); + done = !!(io->flags & UBLK_IO_FLAG_CANCELED); + if (!done) + io->flags |= UBLK_IO_FLAG_CANCELED; + spin_unlock(&ubq->cancel_lock); + + if (!done) + io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags); +} + +/* + * The ublk char device won't be closed when calling cancel fn, so both + * ublk device and queue are guaranteed to be live + */ +static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd); + struct ublk_queue *ubq = pdu->ubq; + struct task_struct *task; + struct ublk_device *ub; + bool need_schedule; + struct ublk_io *io; -found: - if (!ublk_abort_requests(ub)) + if (WARN_ON_ONCE(!ubq)) return; - if (ublk_can_use_recovery(ub)) - schedule_work(&ub->quiesce_work); - else - schedule_work(&ub->stop_work); + if (WARN_ON_ONCE(pdu->tag >= ubq->q_depth)) + return; - /* - * We can't schedule monitor work after ub's state is not UBLK_S_DEV_LIVE. - * after ublk_remove() or __ublk_quiesce_dev() is started. - * - * No need ub->mutex, monitor work are canceled after state is marked - * as not LIVE, so new state is observed reliably. - */ - if (ub->dev_info.state == UBLK_S_DEV_LIVE) - schedule_delayed_work(&ub->monitor_work, - UBLK_DAEMON_MONITOR_PERIOD); + task = io_uring_cmd_get_task(cmd); + if (WARN_ON_ONCE(task && task != ubq->ubq_daemon)) + return; + + ub = ubq->dev; + need_schedule = ublk_abort_requests(ub, ubq); + + io = &ubq->ios[pdu->tag]; + WARN_ON_ONCE(io->cmd != cmd); + ublk_cancel_cmd(ubq, &ubq->ios[pdu->tag], issue_flags); + + if (need_schedule) { + if (ublk_can_use_recovery(ub)) + schedule_work(&ub->quiesce_work); + else + schedule_work(&ub->stop_work); + } } static inline bool ublk_queue_ready(struct ublk_queue *ubq) @@ -1515,24 +1553,8 @@ static void ublk_cancel_queue(struct ublk_queue *ubq) { int i; - for (i = 0; i < ubq->q_depth; i++) { - struct ublk_io *io = &ubq->ios[i]; - - if (io->flags & UBLK_IO_FLAG_ACTIVE) { - bool done; - - spin_lock(&ubq->cancel_lock); - done = !!(io->flags & UBLK_IO_FLAG_CANCELED); - if (!done) - io->flags |= UBLK_IO_FLAG_CANCELED; - spin_unlock(&ubq->cancel_lock); - - if (!done) - io_uring_cmd_done(io->cmd, - UBLK_IO_RES_ABORT, 0, - IO_URING_F_UNLOCKED); - } - } + for (i = 0; i < ubq->q_depth; i++) + ublk_cancel_cmd(ubq, &ubq->ios[i], IO_URING_F_UNLOCKED); } /* Cancel all pending commands, must be called after del_gendisk() returns */ @@ -1579,15 +1601,6 @@ static void __ublk_quiesce_dev(struct ublk_device *ub) blk_mq_quiesce_queue(ub->ub_disk->queue); ublk_wait_tagset_rqs_idle(ub); ub->dev_info.state = UBLK_S_DEV_QUIESCED; - /* we are going to release task_struct of ubq_daemon and resets - * ->ubq_daemon to NULL. So in monitor_work, check on ubq_daemon causes UAF. - * Besides, monitor_work is not necessary in QUIESCED state since we have - * already scheduled quiesce_work and quiesced all ubqs. - * - * Do not let monitor_work schedule itself if state it QUIESCED. And we cancel - * it here and re-schedule it in END_USER_RECOVERY to avoid UAF. - */ - cancel_delayed_work_sync(&ub->monitor_work); } static void ublk_quiesce_work_fn(struct work_struct *work) @@ -1650,7 +1663,6 @@ static void ublk_stop_dev(struct ublk_device *ub) unlock: mutex_unlock(&ub->mutex); ublk_cancel_dev(ub); - cancel_delayed_work_sync(&ub->monitor_work); } /* device can only be started after all IOs are ready */ @@ -1701,6 +1713,21 @@ static inline void ublk_fill_io_cmd(struct ublk_io *io, io->addr = buf_addr; } +static inline void ublk_prep_cancel(struct io_uring_cmd *cmd, + unsigned int issue_flags, + struct ublk_queue *ubq, unsigned int tag) +{ + struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd); + + /* + * Safe to refer to @ubq since ublk_queue won't be died until its + * commands are completed + */ + pdu->ubq = ubq; + pdu->tag = tag; + io_uring_cmd_mark_cancelable(cmd, issue_flags); +} + static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags, const struct ublksrv_io_cmd *ub_cmd) @@ -1816,6 +1843,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, default: goto out; } + ublk_prep_cancel(cmd, issue_flags, ubq, tag); return -EIOCBQUEUED; out: @@ -1883,6 +1911,11 @@ static void ublk_ch_uring_cmd_cb(struct io_uring_cmd *cmd, static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) { + if (unlikely(issue_flags & IO_URING_F_CANCEL)) { + ublk_uring_cmd_cancel_fn(cmd, issue_flags); + return 0; + } + /* well-implemented server won't run into unlocked */ if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) { io_uring_cmd_complete_in_task(cmd, ublk_ch_uring_cmd_cb); @@ -2213,8 +2246,6 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd) if (wait_for_completion_interruptible(&ub->completion) != 0) return -EINTR; - schedule_delayed_work(&ub->monitor_work, UBLK_DAEMON_MONITOR_PERIOD); - mutex_lock(&ub->mutex); if (ub->dev_info.state == UBLK_S_DEV_LIVE || test_bit(UB_STATE_USED, &ub->state)) { @@ -2385,7 +2416,6 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd) spin_lock_init(&ub->lock); INIT_WORK(&ub->quiesce_work, ublk_quiesce_work_fn); INIT_WORK(&ub->stop_work, ublk_stop_work_fn); - INIT_DELAYED_WORK(&ub->monitor_work, ublk_daemon_monitor_work); ret = ublk_alloc_dev_number(ub, header->dev_id); if (ret < 0) @@ -2639,6 +2669,7 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq) /* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */ ubq->ubq_daemon = NULL; ubq->timeout = false; + ubq->canceling = false; for (i = 0; i < ubq->q_depth; i++) { struct ublk_io *io = &ubq->ios[i]; @@ -2724,7 +2755,6 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub, __func__, header->dev_id); blk_mq_kick_requeue_list(ub->ub_disk->queue); ub->dev_info.state = UBLK_S_DEV_LIVE; - schedule_delayed_work(&ub->monitor_work, UBLK_DAEMON_MONITOR_PERIOD); ret = 0; out_unlock: mutex_unlock(&ub->mutex); From patchwork Mon Oct 9 09:33:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13413235 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7905E95A96 for ; Mon, 9 Oct 2023 09:36:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345804AbjJIJgM (ORCPT ); Mon, 9 Oct 2023 05:36:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36916 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345875AbjJIJgL (ORCPT ); Mon, 9 Oct 2023 05:36:11 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2105BD6 for ; Mon, 9 Oct 2023 02:34:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696844094; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3kULn7XJW6r9/K5SsnJw6mRPX4XQ+SkNMJL3kPzdeUY=; b=FeJwaEGZe6v0dbdqE+Z1GPgK+w+NBssafeD0ADWraoKWNWGjNG8cZ3vTgV5oES+AsskyLl 1HYaM4xTcWGxtk2FPPltbHZ7kbMmpeNUCxq3l3r9CKr9I2hOvsNNM+Amr0FHHyNqU69iDL k70TzkLWiyA5n9dHusfYFpNIux+lHDY= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-665-xRrnsAbvMmatImXt8H0UXg-1; Mon, 09 Oct 2023 05:34:51 -0400 X-MC-Unique: xRrnsAbvMmatImXt8H0UXg-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id A802A80CDA1; Mon, 9 Oct 2023 09:34:50 +0000 (UTC) Received: from localhost (unknown [10.72.120.4]) by smtp.corp.redhat.com (Postfix) with ESMTP id C484E63F50; Mon, 9 Oct 2023 09:34:49 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: Ming Lei Subject: [PATCH for-6.7/io_uring 7/7] ublk: simplify aborting request Date: Mon, 9 Oct 2023 17:33:22 +0800 Message-ID: <20231009093324.957829-8-ming.lei@redhat.com> In-Reply-To: <20231009093324.957829-1-ming.lei@redhat.com> References: <20231009093324.957829-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Now ublk_abort_queue() is run exclusively with ublk_queue_rq() and the ubq_daemon task, so simplify aborting request: - set UBLK_IO_FLAG_ABORTED in ublk_abort_queue() just for aborting this request - abort request in ublk_queue_rq() if ubq->canceling is set Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 41 +++++++++++----------------------------- 1 file changed, 11 insertions(+), 30 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 75ed7b87a844..20237b8f318a 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1083,13 +1083,10 @@ static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io, { WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE); - if (!(io->flags & UBLK_IO_FLAG_ABORTED)) { - io->flags |= UBLK_IO_FLAG_ABORTED; - if (ublk_queue_can_use_recovery_reissue(ubq)) - blk_mq_requeue_request(req, false); - else - ublk_put_req_ref(ubq, req); - } + if (ublk_queue_can_use_recovery_reissue(ubq)) + blk_mq_requeue_request(req, false); + else + ublk_put_req_ref(ubq, req); } static void ubq_complete_io_cmd(struct ublk_io *io, int res, @@ -1230,27 +1227,10 @@ static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd, unsigned issue_flags) static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq) { struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq); - struct ublk_io *io; - if (!llist_add(&data->node, &ubq->io_cmds)) - return; + if (llist_add(&data->node, &ubq->io_cmds)) { + struct ublk_io *io = &ubq->ios[rq->tag]; - io = &ubq->ios[rq->tag]; - /* - * If the check pass, we know that this is a re-issued request aborted - * previously in cancel fn because the ubq_daemon(cmd's task) is - * PF_EXITING. We cannot call io_uring_cmd_complete_in_task() anymore - * because this ioucmd's io_uring context may be freed now if no inflight - * ioucmd exists. Otherwise we may cause null-deref in ctx->fallback_work. - * - * Note: cancel fn sets UBLK_IO_FLAG_ABORTED and ends this request(releasing - * the tag). Then the request is re-started(allocating the tag) and we are here. - * Since releasing/allocating a tag implies smp_mb(), finding UBLK_IO_FLAG_ABORTED - * guarantees that here is a re-issued request aborted previously. - */ - if (unlikely(io->flags & UBLK_IO_FLAG_ABORTED)) { - ublk_abort_io_cmds(ubq); - } else { io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb); } } @@ -1320,13 +1300,12 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx, if (ublk_queue_can_use_recovery(ubq) && unlikely(ubq->force_abort)) return BLK_STS_IOERR; - blk_mq_start_request(bd->rq); - if (unlikely(ubq->canceling)) { __ublk_abort_rq(ubq, rq); return BLK_STS_OK; } + blk_mq_start_request(bd->rq); ublk_queue_cmd(ubq, rq); return BLK_STS_OK; @@ -1449,8 +1428,10 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq) * will do it */ rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i); - if (rq) + if (rq && blk_mq_request_started(rq)) { + io->flags |= UBLK_IO_FLAG_ABORTED; __ublk_fail_req(ubq, io, rq); + } } } } @@ -1534,7 +1515,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd, io = &ubq->ios[pdu->tag]; WARN_ON_ONCE(io->cmd != cmd); - ublk_cancel_cmd(ubq, &ubq->ios[pdu->tag], issue_flags); + ublk_cancel_cmd(ubq, io, issue_flags); if (need_schedule) { if (ublk_can_use_recovery(ub))