From patchwork Tue Sep 3 08:16:51 2024
From: Muchun Song
To: axboe@kernel.dk, ming.lei@redhat.com, yukuai1@huaweicloud.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, Muchun Song, stable@vger.kernel.org
Subject: [PATCH v2 1/3] block: fix missing dispatching request when queue is started or unquiesced
Date: Tue, 3 Sep 2024 16:16:51 +0800
Message-Id: <20240903081653.65613-2-songmuchun@bytedance.com>
In-Reply-To: <20240903081653.65613-1-songmuchun@bytedance.com>

Suppose the following scenario with the virtio_blk driver:

CPU0                                CPU1                            CPU2

blk_mq_try_issue_directly()
  __blk_mq_issue_directly()
    q->mq_ops->queue_rq()
      virtio_queue_rq()
        blk_mq_stop_hw_queue()
                                    blk_mq_try_issue_directly()     virtblk_done()
                                      if (blk_mq_hctx_stopped())
  blk_mq_request_bypass_insert()                                      blk_mq_start_stopped_hw_queue()
  blk_mq_run_hw_queue()                                                 blk_mq_run_hw_queue()
                                      blk_mq_insert_request()
                                      return // Who is responsible for dispatching this IO request?

After CPU0 has marked the queue as stopped, CPU1 sees that the queue is
stopped. But before CPU1 puts the request on the dispatch list, CPU2
receives the completion interrupt for a request, so it runs the hardware
queue and marks the queue as non-stopped. Meanwhile, CPU0 also runs the
same hardware queue. After both CPU0 and CPU2 have completed
blk_mq_run_hw_queue(), CPU1 puts its request on the same hardware queue
and simply returns, so nobody dispatches that request.

Fix it by explicitly running the hardware queue after inserting the
request. blk_mq_request_issue_directly() has the same problem, so fix it
as well.
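To make the interleaving above concrete, here is a minimal sketch that replays it step by step on a toy model of a single hardware queue. The struct `hctx_model` and the functions `run_queue()` and `replay()` are hypothetical illustrations, not kernel APIs; the real blk_mq_run_hw_queue() does much more (scheduler, budget, async dispatch). The sketch only shows why inserting a request without running the queue afterwards can strand it, and why the added blk_mq_run_hw_queue() call prevents that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one hardware queue; not kernel code. */
struct hctx_model {
	bool stopped;   /* stands in for BLK_MQ_S_STOPPED */
	int pending;    /* requests sitting on the dispatch list */
	int dispatched; /* requests handed to the driver */
};

/* Stands in for blk_mq_run_hw_queue(): dispatch everything unless stopped. */
static void run_queue(struct hctx_model *h)
{
	if (h->stopped)
		return;
	h->dispatched += h->pending;
	h->pending = 0;
}

/*
 * Replays the interleaving from the diagram one step at a time and
 * returns how many requests are left stranded on the dispatch list.
 */
static int replay(bool run_after_insert)
{
	struct hctx_model h = { .stopped = false, .pending = 0, .dispatched = 0 };
	bool cpu1_saw_stopped;

	h.stopped = true;             /* CPU0: virtio_queue_rq() stops the queue */
	cpu1_saw_stopped = h.stopped; /* CPU1: blk_mq_hctx_stopped() returns true */
	h.pending++;                  /* CPU0: blk_mq_request_bypass_insert() */
	h.stopped = false;            /* CPU2: blk_mq_start_stopped_hw_queue() */
	run_queue(&h);                /* CPU2: blk_mq_run_hw_queue() */
	run_queue(&h);                /* CPU0: blk_mq_run_hw_queue() */

	if (cpu1_saw_stopped) {
		h.pending++;          /* CPU1: blk_mq_insert_request() */
		if (run_after_insert)
			run_queue(&h); /* CPU1: the blk_mq_run_hw_queue() added by this patch */
	}

	/* Sanity check: both requests are either dispatched or still pending. */
	assert(h.dispatched + h.pending == 2);
	return h.pending;
}
```

In this model, replay(false) leaves one request stranded on the dispatch list, while replay(true) leaves none, because the extra run happens after CPU1's insert and the queue is no longer stopped by then.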
Fixes: d964f04a8fde8 ("blk-mq: fix direct issue")
Cc: stable@vger.kernel.org
Cc: Muchun Song
Signed-off-by: Muchun Song
Reviewed-by: Ming Lei
---
 block/blk-mq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e3c3c0c21b553..b2d0f22de0c7f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2619,6 +2619,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 
 	if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
 		blk_mq_insert_request(rq, 0);
+		blk_mq_run_hw_queue(hctx, false);
 		return;
 	}
 
@@ -2649,6 +2650,7 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
 
 	if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
 		blk_mq_insert_request(rq, 0);
+		blk_mq_run_hw_queue(hctx, false);
 		return BLK_STS_OK;
 	}
 

From patchwork Tue Sep 3 08:16:52 2024
From: Muchun Song
To: axboe@kernel.dk, ming.lei@redhat.com, yukuai1@huaweicloud.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, Muchun Song, stable@vger.kernel.org
Subject: [PATCH v2 2/3] block: fix ordering between checking QUEUE_FLAG_QUIESCED and adding requests
Date: Tue, 3 Sep 2024 16:16:52 +0800
Message-Id: <20240903081653.65613-3-songmuchun@bytedance.com>
In-Reply-To: <20240903081653.65613-1-songmuchun@bytedance.com>

Suppose the following scenario:

CPU0                                           CPU1

blk_mq_insert_request()        1) store        blk_mq_unquiesce_queue()
blk_mq_run_hw_queue()                            blk_queue_flag_clear(QUEUE_FLAG_QUIESCED)  3) store
  if (blk_queue_quiesced())    2) load           blk_mq_run_hw_queues()
    return                                         blk_mq_run_hw_queue()
  blk_mq_sched_dispatch_requests()                   if (!blk_mq_hctx_has_pending())        4) load
                                                       return

A full memory barrier is needed between 1) and 2), as well as between 3)
and 4), to make sure that either CPU0 sees that QUEUE_FLAG_QUIESCED has
been cleared, or CPU1 sees the request on the dispatch list or the bit
set in the software queue bitmap. Otherwise, neither CPU reruns the
hardware queue, and the request starves.

There are two possible fixes: 1) add a pair of memory barriers, or
2) use hctx->queue->queue_lock to synchronize the check of
QUEUE_FLAG_QUIESCED. We choose 2) because memory barriers are harder to
maintain.

Fixes: f4560ffe8cec1 ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue")
Cc: stable@vger.kernel.org
Cc: Muchun Song
Signed-off-by: Muchun Song
Reviewed-by: Ming Lei
---
 block/blk-mq.c | 47 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b2d0f22de0c7f..ac39f2a346a52 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2202,6 +2202,24 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 }
 EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
 
+static inline bool blk_mq_hw_queue_need_run(struct blk_mq_hw_ctx *hctx)
+{
+	bool need_run;
+
+	/*
+	 * When queue is quiesced, we may be switching io scheduler, or
+	 * updating nr_hw_queues, or other things, and we can't run queue
+	 * any more, even blk_mq_hctx_has_pending() can't be called safely.
+	 *
+	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
+	 * quiesced.
+	 */
+	__blk_mq_run_dispatch_ops(hctx->queue, false,
+		need_run = !blk_queue_quiesced(hctx->queue) &&
+		blk_mq_hctx_has_pending(hctx));
+	return need_run;
+}
+
 /**
  * blk_mq_run_hw_queue - Start to run a hardware queue.
  * @hctx: Pointer to the hardware queue to run.
@@ -2222,20 +2240,23 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 
 	might_sleep_if(!async && hctx->flags & BLK_MQ_F_BLOCKING);
 
-	/*
-	 * When queue is quiesced, we may be switching io scheduler, or
-	 * updating nr_hw_queues, or other things, and we can't run queue
-	 * any more, even __blk_mq_hctx_has_pending() can't be called safely.
-	 *
-	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
-	 * quiesced.
-	 */
-	__blk_mq_run_dispatch_ops(hctx->queue, false,
-		need_run = !blk_queue_quiesced(hctx->queue) &&
-		blk_mq_hctx_has_pending(hctx));
+	need_run = blk_mq_hw_queue_need_run(hctx);
+	if (!need_run) {
+		unsigned long flags;
 
-	if (!need_run)
-		return;
+		/*
+		 * Synchronize with blk_mq_unquiesce_queue(), because we check
+		 * if hw queue is quiesced locklessly above, we need to use
+		 * ->queue_lock to make sure we see the up-to-date status to
+		 * not miss rerunning the hw queue.
+		 */
+		spin_lock_irqsave(&hctx->queue->queue_lock, flags);
+		need_run = blk_mq_hw_queue_need_run(hctx);
+		spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
+
+		if (!need_run)
+			return;
+	}
 
 	if (async || !cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask)) {
 		blk_mq_delay_run_hw_queue(hctx, 0);

From patchwork Tue Sep 3 08:16:53 2024
From: Muchun Song
To: axboe@kernel.dk, ming.lei@redhat.com, yukuai1@huaweicloud.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, Muchun Song, stable@vger.kernel.org
Subject: [PATCH v2 3/3] block: fix ordering between checking BLK_MQ_S_STOPPED and adding requests
Date: Tue, 3 Sep 2024 16:16:53 +0800
Message-Id: <20240903081653.65613-4-songmuchun@bytedance.com>
In-Reply-To: <20240903081653.65613-1-songmuchun@bytedance.com>

Suppose the following scenario with the virtio_blk driver:

CPU0                                                  CPU1

blk_mq_try_issue_directly()
  __blk_mq_issue_directly()
    q->mq_ops->queue_rq()
      virtio_queue_rq()
        blk_mq_stop_hw_queue()
                                                      virtblk_done()
  blk_mq_request_bypass_insert()                        blk_mq_start_stopped_hw_queues()
    /* Add IO request to dispatch list */  1) store       blk_mq_start_stopped_hw_queue()
                                                            clear_bit(BLK_MQ_S_STOPPED)         3) store
  blk_mq_run_hw_queue()                                     blk_mq_run_hw_queue()
    if (!blk_mq_hctx_has_pending())                           if (!blk_mq_hctx_has_pending())   4) load
      return                                                    return
    blk_mq_sched_dispatch_requests()                          blk_mq_sched_dispatch_requests()
      if (blk_mq_hctx_stopped())           2) load              if (blk_mq_hctx_stopped())
        return                                                    return
      __blk_mq_sched_dispatch_requests()                        __blk_mq_sched_dispatch_requests()

Now suppose another scenario:

CPU0                                                           CPU1

blk_mq_requeue_work()
  /* Add IO request to dispatch list */             1) store   virtblk_done()
  blk_mq_run_hw_queues()/blk_mq_delay_run_hw_queues()            blk_mq_start_stopped_hw_queues()
    if (blk_mq_hctx_stopped())                      2) load        blk_mq_start_stopped_hw_queue()
      continue                                                       clear_bit(BLK_MQ_S_STOPPED)          3) store
    blk_mq_run_hw_queue()/blk_mq_delay_run_hw_queue()                blk_mq_run_hw_queue()
                                                                       if (!blk_mq_hctx_has_pending())    4) load
                                                                         return
                                                                       blk_mq_sched_dispatch_requests()

Both scenarios are similar: a full memory barrier is needed between 1)
and 2), as well as between 3) and 4), to make sure that either CPU0 sees
that BLK_MQ_S_STOPPED has been cleared or CPU1 sees the request on the
dispatch list. Otherwise, neither CPU reruns the hardware queue, and the
request starves.

The simple fix is to add the required full memory barrier to the
blk_mq_hctx_stopped() helper, paired with a barrier in
blk_mq_start_stopped_hw_queue(). To avoid affecting the fast path (the
hardware queue is not stopped most of the time), the barrier is inserted
only in the slow path. Only the slow path needs to care about missing the
dispatch of a request to the low-level device driver: in the fast path
the caller sees the queue as not stopped and goes on to dispatch.

Fixes: 320ae51feed5c ("blk-mq: new multi-queue block IO queueing mechanism")
Cc: stable@vger.kernel.org
Cc: Muchun Song
Signed-off-by: Muchun Song
Reviewed-by: Ming Lei
---
 block/blk-mq.c |  6 ++++++
 block/blk-mq.h | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ac39f2a346a52..48a6a437fba5e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2413,6 +2413,12 @@ void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 		return;
 
 	clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
+	/*
+	 * Pairs with the smp_mb() in blk_mq_hctx_stopped() to order the
+	 * clearing of BLK_MQ_S_STOPPED above and the checking of dispatch
+	 * list in the subsequent routine.
+	 */
+	smp_mb__after_atomic();
 	blk_mq_run_hw_queue(hctx, async);
 }
 EXPORT_SYMBOL_GPL(blk_mq_start_stopped_hw_queue);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 260beea8e332c..f36f3bff70d86 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -228,6 +228,19 @@ static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data
 
 static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
 {
+	/* Fast path: hardware queue is not stopped most of the time. */
+	if (likely(!test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+		return false;
+
+	/*
+	 * This barrier is used to order adding of dispatch list before and
+	 * the test of BLK_MQ_S_STOPPED below. Pairs with the memory barrier
+	 * in blk_mq_start_stopped_hw_queue() so that dispatch code could
+	 * either see BLK_MQ_S_STOPPED is cleared or dispatch list is not
+	 * empty to avoid missing dispatching requests.
+	 */
+	smp_mb();
+
 	return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
 }
 
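The barrier pairing in this patch is an instance of the classic store-buffering pattern: each side stores, issues a full barrier, then loads what the other side stored. Below is a minimal userspace sketch of that pattern with C11 atomics and POSIX threads. It assumes that `atomic_thread_fence(memory_order_seq_cst)` is an adequate stand-in for both smp_mb() and smp_mb__after_atomic(); the kernel primitives are defined by the Linux-kernel memory model, not by C11. The variables `stopped` and `pending` and the functions `hctx_stopped_model()`, `cpu0()`, `cpu1()` and `trial()` are hypothetical. With both fences present, C11 forbids the outcome in which CPU0's recheck still sees the queue stopped while CPU1 sees an empty dispatch list, so at least one side dispatches.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: BLK_MQ_S_STOPPED and "dispatch list not empty". */
static atomic_int stopped;
static atomic_int pending;
static bool cpu0_runs, cpu1_runs; /* whether each side would dispatch */

/* Models the patched blk_mq_hctx_stopped(): cheap check, then barrier + recheck. */
static bool hctx_stopped_model(void)
{
	/* Fast path: seeing the bit clear means CPU0 goes on to dispatch itself. */
	if (!atomic_load_explicit(&stopped, memory_order_relaxed))
		return false;
	atomic_thread_fence(memory_order_seq_cst); /* stands in for smp_mb() */
	return atomic_load_explicit(&stopped, memory_order_relaxed) != 0;
}

/* CPU0: add the request (1: store), then test the stopped bit (2: load). */
static void *cpu0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&pending, 1, memory_order_relaxed);
	cpu0_runs = !hctx_stopped_model();
	return NULL;
}

/* CPU1: clear the stopped bit (3: store), then look for work (4: load). */
static void *cpu1(void *arg)
{
	(void)arg;
	atomic_store_explicit(&stopped, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* stands in for smp_mb__after_atomic() */
	cpu1_runs = atomic_load_explicit(&pending, memory_order_relaxed) != 0;
	return NULL;
}

/* One race trial; returns true if at least one side dispatches the request. */
static bool trial(void)
{
	pthread_t t0, t1;
	int rc;

	atomic_store(&stopped, 1);
	atomic_store(&pending, 0);
	cpu0_runs = cpu1_runs = false;

	rc = pthread_create(&t0, NULL, cpu0, NULL);
	assert(rc == 0);
	rc = pthread_create(&t1, NULL, cpu1, NULL);
	assert(rc == 0);
	(void)rc;
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);

	return cpu0_runs || cpu1_runs;
}
```

Calling trial() in a loop should never return false. Removing either fence makes the lost-dispatch outcome permitted under C11, although a test like this may only rarely observe it. The unordered first load in hctx_stopped_model() mirrors the fast path in the patch: if CPU0 already sees the bit cleared, it dispatches on its own, so no barrier is needed there.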