From patchwork Wed Jan 15 22:46:35 2025
X-Patchwork-Id: 13940987
From: Bart Van Assche <bvanassche@acm.org>
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Damien Le Moal, Bart Van Assche, Hannes Reinecke, Nitesh Shetty, Ming Lei
Subject: [PATCH v17 01/14] block: Support block drivers that preserve the order of write requests
Date: Wed, 15 Jan 2025 14:46:35 -0800
Message-ID: <20250115224649.3973718-2-bvanassche@acm.org>
In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org>
References: <20250115224649.3973718-1-bvanassche@acm.org>

Some storage controllers preserve the request order per hardware queue.
Introduce the request queue limit member variable
'driver_preserves_write_order' to allow block drivers to indicate that
the order of write commands is preserved per hardware queue and hence
that serialization of writes per zone is not required if all pending
writes are submitted to the same hardware queue.

Cc: Damien Le Moal
Cc: Hannes Reinecke
Cc: Nitesh Shetty
Cc: Christoph Hellwig
Cc: Ming Lei
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-settings.c   | 2 ++
 include/linux/blkdev.h | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index c8368ee8de2e..18bcf6e6dc60 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -796,6 +796,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	}
 	t->max_secure_erase_sectors = min_not_zero(t->max_secure_erase_sectors,
 						   b->max_secure_erase_sectors);
+	t->driver_preserves_write_order = t->driver_preserves_write_order &&
+		b->driver_preserves_write_order;
 	t->zone_write_granularity = max(t->zone_write_granularity,
 					b->zone_write_granularity);
 	if (!(t->features & BLK_FEAT_ZONED)) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7ac153e4423a..df9887412a9e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -399,6 +399,11 @@ struct queue_limits {
 	unsigned int max_open_zones;
 	unsigned int max_active_zones;
 
+	/*
+	 * Whether or not the block driver preserves the order of write
+	 * requests. Set by the block driver.
+	 */
+	bool driver_preserves_write_order;
+
 	/*
 	 * Drivers that set dma_alignment to less than 511 must be prepared to
From patchwork Wed Jan 15 22:46:36 2025
X-Patchwork-Id: 13940988
From: Bart Van Assche <bvanassche@acm.org>
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Damien Le Moal, Bart Van Assche, Alasdair Kergon, Mike Snitzer, Mikulas Patocka
Subject: [PATCH v17 02/14] dm-linear: Report to the block layer that the write order is preserved
Date: Wed, 15 Jan 2025 14:46:36 -0800
Message-ID: <20250115224649.3973718-3-bvanassche@acm.org>
In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org>
References: <20250115224649.3973718-1-bvanassche@acm.org>
Enable write pipelining if dm-linear is stacked on top of a driver that
supports write pipelining.

Cc: Alasdair Kergon
Cc: Mike Snitzer
Cc: Mikulas Patocka
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/md/dm-linear.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 49fb0f684193..967fbf856abc 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -148,6 +148,11 @@ static int linear_report_zones(struct dm_target *ti,
 #define linear_report_zones NULL
 #endif
 
+static void linear_io_hints(struct dm_target *ti, struct queue_limits *limits)
+{
+	limits->driver_preserves_write_order = true;
+}
+
 static int linear_iterate_devices(struct dm_target *ti,
 				  iterate_devices_callout_fn fn, void *data)
 {
@@ -209,6 +214,7 @@ static struct target_type linear_target = {
 	.map = linear_map,
 	.status = linear_status,
 	.prepare_ioctl = linear_prepare_ioctl,
+	.io_hints = linear_io_hints,
 	.iterate_devices = linear_iterate_devices,
 	.direct_access = linear_dax_direct_access,
 	.dax_zero_page_range = linear_dax_zero_page_range,
From patchwork Wed Jan 15 22:46:37 2025
X-Patchwork-Id: 13940989
From: Bart Van Assche <bvanassche@acm.org>
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Damien Le Moal, Bart Van Assche
Subject: [PATCH v17 03/14] block: Rework request allocation in blk_mq_submit_bio()
Date: Wed, 15 Jan 2025 14:46:37 -0800
Message-ID: <20250115224649.3973718-4-bvanassche@acm.org>
In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org>
References: <20250115224649.3973718-1-bvanassche@acm.org>

Prepare for allocating a request from a specific hctx by making
blk_mq_submit_bio() allocate a request later. The performance impact of
this patch on the hot path is small: if a request is cached, one
percpu_ref_get(&q->q_usage_counter) call and one
percpu_ref_put(&q->q_usage_counter) call are added to the hot path.

Cc: Christoph Hellwig
Cc: Damien Le Moal
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 31 ++++++++++---------------------
 1 file changed, 10 insertions(+), 21 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index da39a1cac702..666e6e6ba143 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3063,11 +3063,6 @@ void blk_mq_submit_bio(struct bio *bio)
 	struct request *rq;
 	blk_status_t ret;
 
-	/*
-	 * If the plug has a cached request for this queue, try to use it.
-	 */
-	rq = blk_mq_peek_cached_request(plug, q, bio->bi_opf);
-
 	/*
 	 * A BIO that was released from a zone write plug has already been
 	 * through the preparation in this function, already holds a reference
@@ -3076,21 +3071,13 @@ void blk_mq_submit_bio(struct bio *bio)
 	 */
 	if (bio_zone_write_plugging(bio)) {
 		nr_segs = bio->__bi_nr_segments;
-		if (rq)
-			blk_queue_exit(q);
 		goto new_request;
 	}
 
 	bio = blk_queue_bounce(bio, q);
 
-	/*
-	 * The cached request already holds a q_usage_counter reference and we
-	 * don't have to acquire a new one if we use it.
-	 */
-	if (!rq) {
-		if (unlikely(bio_queue_enter(bio)))
-			return;
-	}
+	if (unlikely(bio_queue_enter(bio)))
+		return;
 
 	/*
 	 * Device reconfiguration may change logical block size or reduce the
@@ -3122,8 +3109,15 @@ void blk_mq_submit_bio(struct bio *bio)
 		goto queue_exit;
 
 new_request:
+	rq = blk_mq_peek_cached_request(plug, q, bio->bi_opf);
 	if (rq) {
 		blk_mq_use_cached_rq(rq, plug, bio);
+		/*
+		 * Here we hold two references: one because of the
+		 * bio_queue_enter() call and a second one as the result of
+		 * request allocation. Drop one.
+		 */
+		blk_queue_exit(q);
 	} else {
 		rq = blk_mq_get_new_requests(q, plug, bio, nr_segs);
 		if (unlikely(!rq)) {
@@ -3169,12 +3163,7 @@ void blk_mq_submit_bio(struct bio *bio)
 	return;
 
 queue_exit:
-	/*
-	 * Don't drop the queue reference if we were trying to use a cached
-	 * request and thus didn't acquire one.
-	 */
-	if (!rq)
-		blk_queue_exit(q);
+	blk_queue_exit(q);
 }
 
 #ifdef CONFIG_BLK_MQ_STACKING
From patchwork Wed Jan 15 22:46:38 2025
X-Patchwork-Id: 13940990
From: Bart Van Assche <bvanassche@acm.org>
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Damien Le Moal, Bart Van Assche
Subject: [PATCH v17 04/14] block: Support allocating from a specific software queue
Date: Wed, 15 Jan 2025 14:46:38 -0800
Message-ID: <20250115224649.3973718-5-bvanassche@acm.org>
In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org>
References: <20250115224649.3973718-1-bvanassche@acm.org>

A later patch will preserve the order of pipelined zoned writes by
submitting all zoned writes per zone to the same software queue as
previously submitted zoned writes. Hence support allocating a request
from a specific software queue.

Cc: Christoph Hellwig
Cc: Damien Le Moal
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 18 ++++++++++++++----
 block/blk-mq.h |  1 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 666e6e6ba143..4262c85be206 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -495,6 +495,7 @@ __blk_mq_alloc_requests_batch(struct blk_mq_alloc_data *data)
 static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 {
 	struct request_queue *q = data->q;
+	int swq_cpu = data->swq_cpu;
 	u64 alloc_time_ns = 0;
 	struct request *rq;
 	unsigned int tag;
@@ -507,7 +508,8 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 		data->flags |= BLK_MQ_REQ_NOWAIT;
 
 retry:
-	data->ctx = blk_mq_get_ctx(q);
+	data->swq_cpu = swq_cpu >= 0 ? swq_cpu : raw_smp_processor_id();
+	data->ctx = __blk_mq_get_ctx(q, data->swq_cpu);
 	data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
 
 	if (q->elevator) {
@@ -587,6 +589,7 @@ static struct request *blk_mq_rq_cache_fill(struct request_queue *q,
 		.cmd_flags = opf,
 		.nr_tags = plug->nr_ios,
 		.cached_rqs = &plug->cached_rqs,
+		.swq_cpu = -1,
 	};
 	struct request *rq;
@@ -648,6 +651,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, blk_opf_t opf,
 		.flags = flags,
 		.cmd_flags = opf,
 		.nr_tags = 1,
+		.swq_cpu = -1,
 	};
 	int ret;
@@ -2964,12 +2968,14 @@ static bool blk_mq_attempt_bio_merge(struct request_queue *q,
 }
 
 static struct request *blk_mq_get_new_requests(struct request_queue *q,
+					       int swq_cpu,
 					       struct blk_plug *plug,
 					       struct bio *bio,
 					       unsigned int nsegs)
 {
 	struct blk_mq_alloc_data data = {
 		.q = q,
+		.swq_cpu = swq_cpu,
 		.nr_tags = 1,
 		.cmd_flags = bio->bi_opf,
 	};
@@ -2993,7 +2999,8 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
  * Check if there is a suitable cached request and return it.
  */
 static struct request *blk_mq_peek_cached_request(struct blk_plug *plug,
-		struct request_queue *q, blk_opf_t opf)
+		struct request_queue *q,
+		int swq_cpu, blk_opf_t opf)
 {
 	enum hctx_type type = blk_mq_get_hctx_type(opf);
 	struct request *rq;
@@ -3003,6 +3010,8 @@ static struct request *blk_mq_peek_cached_request(struct blk_plug *plug,
 	rq = rq_list_peek(&plug->cached_rqs);
 	if (!rq || rq->q != q)
 		return NULL;
+	if (swq_cpu >= 0 && rq->mq_ctx->cpu != swq_cpu)
+		return NULL;
 	if (type != rq->mq_hctx->type &&
 	    (type != HCTX_TYPE_READ || rq->mq_hctx->type != HCTX_TYPE_DEFAULT))
 		return NULL;
@@ -3061,6 +3070,7 @@ void blk_mq_submit_bio(struct bio *bio)
 	struct blk_mq_hw_ctx *hctx;
 	unsigned int nr_segs;
 	struct request *rq;
+	int swq_cpu = -1;
 	blk_status_t ret;
 
 	/*
@@ -3109,7 +3119,7 @@ void blk_mq_submit_bio(struct bio *bio)
 		goto queue_exit;
 
 new_request:
-	rq = blk_mq_peek_cached_request(plug, q, bio->bi_opf);
+	rq = blk_mq_peek_cached_request(plug, q, swq_cpu, bio->bi_opf);
 	if (rq) {
 		blk_mq_use_cached_rq(rq, plug, bio);
 		/*
@@ -3119,7 +3129,7 @@ void blk_mq_submit_bio(struct bio *bio)
 		 */
 		blk_queue_exit(q);
 	} else {
-		rq = blk_mq_get_new_requests(q, plug, bio, nr_segs);
+		rq = blk_mq_get_new_requests(q, swq_cpu, plug, bio, nr_segs);
 		if (unlikely(!rq)) {
 			if (bio->bi_opf & REQ_NOWAIT)
 				bio_wouldblock_error(bio);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 44979e92b79f..d5536dcf2182 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -158,6 +158,7 @@ struct blk_mq_alloc_data {
 	struct rq_list *cached_rqs;
 
 	/* input & output parameter */
+	int swq_cpu;
 	struct blk_mq_ctx *ctx;
 	struct blk_mq_hw_ctx *hctx;
 };
7fa/7z9T9VAaxic4rQoumUN293/235uqAg/auQsGyj8jONXY8rjQcidKLEWfHgpc g5Gd/tDHzAZ5eTyExq/GHKA0Zm6wgNfE0tHAAT0o2QYzzOG3fvoMDvUQ5z4Ts6/B rLJkUA7G04tPvegsPOfs6rlv5KuI8ezXVHeyG/KxHzUdEMye2AGwlsUgLLvUag3C bYIvtvlQacf9LKRFAFgXjPFOlJs9CJgNfIhNrEpX+DoWMyL1aOhCSqfK51HTrtKM laLtzUIX9dgjzIzhlUa6IPV5NGo6SuiYw41+6CejHSoJP8FFlneuU1FpQGBOW9u5 w== X-Virus-Scanned: by MailRoute Received: from 008.lax.mailroute.net ([127.0.0.1]) by localhost (008.lax [127.0.0.1]) (mroute_mailscanner, port 10029) with LMTP id Lw3up8S2K4mx; Wed, 15 Jan 2025 22:47:12 +0000 (UTC) Received: from bvanassche.mtv.corp.google.com (unknown [104.135.204.82]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bvanassche@acm.org) by 008.lax.mailroute.net (Postfix) with ESMTPSA id 4YYLjB29bMz6CmQyl; Wed, 15 Jan 2025 22:47:09 +0000 (UTC) From: Bart Van Assche To: Jens Axboe Cc: linux-block@vger.kernel.org, Christoph Hellwig , Damien Le Moal , Bart Van Assche , Yu Kuai Subject: [PATCH v17 05/14] blk-mq: Restore the zoned write order when requeuing Date: Wed, 15 Jan 2025 14:46:39 -0800 Message-ID: <20250115224649.3973718-6-bvanassche@acm.org> X-Mailer: git-send-email 2.48.0.rc2.279.g1de40edade-goog In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org> References: <20250115224649.3973718-1-bvanassche@acm.org> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Zoned writes may be requeued. This happens if a block driver returns BLK_STS_RESOURCE, to handle SCSI unit attentions or by the SCSI error handler after error handling has finished. Requests may be requeued in another order than submitted. Restore the request order if requests are requeued. 
Add RQF_DONTPREP to RQF_NOMERGE_FLAGS because this patch may cause RQF_DONTPREP requests to be sent to the code that checks whether a request can be merged and RQF_DONTPREP requests must not be merged. Cc: Christoph Hellwig Cc: Damien Le Moal Cc: Yu Kuai Signed-off-by: Bart Van Assche --- block/bfq-iosched.c | 2 ++ block/blk-mq.c | 20 +++++++++++++++++++- block/blk-mq.h | 2 ++ block/kyber-iosched.c | 2 ++ block/mq-deadline.c | 7 ++++++- include/linux/blk-mq.h | 13 ++++++++++++- 6 files changed, 43 insertions(+), 3 deletions(-) diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 167542201603..ffa4ca3aad62 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -6276,6 +6276,8 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, if (flags & BLK_MQ_INSERT_AT_HEAD) { list_add(&rq->queuelist, &bfqd->dispatch); + } else if (flags & BLK_MQ_INSERT_ORDERED) { + blk_mq_insert_ordered(rq, &bfqd->dispatch); } else if (!bfqq) { list_add_tail(&rq->queuelist, &bfqd->dispatch); } else { diff --git a/block/blk-mq.c b/block/blk-mq.c index 4262c85be206..01478777ae5f 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1557,7 +1557,9 @@ static void blk_mq_requeue_work(struct work_struct *work) * already. Insert it into the hctx dispatch list to avoid * block layer merges for the request. */ - if (rq->rq_flags & RQF_DONTPREP) + if (blk_rq_is_seq_zoned_write(rq)) + blk_mq_insert_request(rq, BLK_MQ_INSERT_ORDERED); + else if (rq->rq_flags & RQF_DONTPREP) blk_mq_request_bypass_insert(rq, 0); else blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD); @@ -2592,6 +2594,20 @@ static void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, blk_mq_run_hw_queue(hctx, run_queue_async); } +void blk_mq_insert_ordered(struct request *rq, struct list_head *list) +{ + struct request_queue *q = rq->q; + struct request *rq2; + + list_for_each_entry(rq2, list, queuelist) + if (rq2->q == q && blk_rq_pos(rq2) > blk_rq_pos(rq)) + break; + + /* Insert rq before rq2. 
If rq2 is the list head, append at the end. */ + list_add_tail(&rq->queuelist, &rq2->queuelist); +} +EXPORT_SYMBOL_GPL(blk_mq_insert_ordered); + static void blk_mq_insert_request(struct request *rq, blk_insert_t flags) { struct request_queue *q = rq->q; @@ -2646,6 +2662,8 @@ static void blk_mq_insert_request(struct request *rq, blk_insert_t flags) spin_lock(&ctx->lock); if (flags & BLK_MQ_INSERT_AT_HEAD) list_add(&rq->queuelist, &ctx->rq_lists[hctx->type]); + else if (flags & BLK_MQ_INSERT_ORDERED) + blk_mq_insert_ordered(rq, &ctx->rq_lists[hctx->type]); else list_add_tail(&rq->queuelist, &ctx->rq_lists[hctx->type]); diff --git a/block/blk-mq.h b/block/blk-mq.h index d5536dcf2182..4035643c51a7 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -40,8 +40,10 @@ enum { typedef unsigned int __bitwise blk_insert_t; #define BLK_MQ_INSERT_AT_HEAD ((__force blk_insert_t)0x01) +#define BLK_MQ_INSERT_ORDERED ((__force blk_insert_t)0x02) void blk_mq_submit_bio(struct bio *bio); +void blk_mq_insert_ordered(struct request *rq, struct list_head *list); int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, struct io_comp_batch *iob, unsigned int flags); void blk_mq_exit_queue(struct request_queue *q); diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c index dc31f2dfa414..2877cce690f3 100644 --- a/block/kyber-iosched.c +++ b/block/kyber-iosched.c @@ -603,6 +603,8 @@ static void kyber_insert_requests(struct blk_mq_hw_ctx *hctx, trace_block_rq_insert(rq); if (flags & BLK_MQ_INSERT_AT_HEAD) list_move(&rq->queuelist, head); + else if (flags & BLK_MQ_INSERT_ORDERED) + blk_mq_insert_ordered(rq, head); else list_move_tail(&rq->queuelist, head); sbitmap_set_bit(&khd->kcq_map[sched_domain], diff --git a/block/mq-deadline.c b/block/mq-deadline.c index 754f6b7415cd..78534279adab 100644 --- a/block/mq-deadline.c +++ b/block/mq-deadline.c @@ -710,7 +710,12 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, * set expire time and add to fifo list */ 
rq->fifo_time = jiffies + dd->fifo_expire[data_dir]; - list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]); + if (flags & BLK_MQ_INSERT_ORDERED) + blk_mq_insert_ordered(rq, + &per_prio->fifo_list[data_dir]); + else + list_add_tail(&rq->queuelist, + &per_prio->fifo_list[data_dir]); } } diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index a0a9007cc1e3..482d5432817c 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -85,7 +85,7 @@ enum { /* flags that prevent us from merging requests: */ #define RQF_NOMERGE_FLAGS \ - (RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD) + (RQF_STARTED | RQF_FLUSH_SEQ | RQF_DONTPREP | RQF_SPECIAL_PAYLOAD) enum mq_rq_state { MQ_RQ_IDLE = 0, @@ -1152,4 +1152,15 @@ static inline int blk_rq_map_sg(struct request_queue *q, struct request *rq, } void blk_dump_rq_flags(struct request *, char *); +static inline bool blk_rq_is_seq_zoned_write(struct request *rq) +{ + switch (req_op(rq)) { + case REQ_OP_WRITE: + case REQ_OP_WRITE_ZEROES: + return bdev_zone_is_seq(rq->q->disk->part0, blk_rq_pos(rq)); + default: + return false; + } +} + #endif /* BLK_MQ_H */
From patchwork Wed Jan 15 22:46:40 2025 X-Patchwork-Submitter: Bart Van Assche X-Patchwork-Id: 13940992 From: Bart Van Assche To: Jens Axboe Cc: linux-block@vger.kernel.org, Christoph Hellwig , Damien Le Moal , Bart Van Assche Subject: [PATCH v17 06/14] blk-zoned: Track the write pointer per zone Date: Wed, 15 Jan 2025 14:46:40 -0800 Message-ID: <20250115224649.3973718-7-bvanassche@acm.org> In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org> References: <20250115224649.3973718-1-bvanassche@acm.org>
Derive the write pointer from successfully completed zoned writes. This patch prepares for restoring the write pointer after a write has failed either by the device (e.g. a unit attention or an unaligned write) or by the driver (e.g. BLK_STS_RESOURCE). Cc: Christoph Hellwig Cc: Damien Le Moal Signed-off-by: Bart Van Assche --- block/blk-zoned.c | 37 +++++++++++++++++++++++++++++-------- block/blk.h | 4 +++- 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/block/blk-zoned.c b/block/blk-zoned.c index 9d08a54c201e..089c6740df4a 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -51,6 +51,8 @@ static const char *const zone_cond_name[] = { * @zone_no: The number of the zone the plug is managing. * @wp_offset: The zone write pointer location relative to the start of the zone * as a number of 512B sectors. + * @wp_offset_compl: End offset for completed zoned writes as a number of 512 + * byte sectors. * @bio_list: The list of BIOs that are currently plugged.
* @bio_work: Work struct to handle issuing of plugged BIOs * @rcu_head: RCU head to free zone write plugs with an RCU grace period. @@ -63,6 +65,7 @@ struct blk_zone_wplug { unsigned int flags; unsigned int zone_no; unsigned int wp_offset; + unsigned int wp_offset_compl; struct bio_list bio_list; struct work_struct bio_work; struct rcu_head rcu_head; @@ -554,6 +557,7 @@ static struct blk_zone_wplug *disk_get_and_lock_zone_wplug(struct gendisk *disk, zwplug->flags = 0; zwplug->zone_no = zno; zwplug->wp_offset = bdev_offset_from_zone_start(disk->part0, sector); + zwplug->wp_offset_compl = zwplug->wp_offset; bio_list_init(&zwplug->bio_list); INIT_WORK(&zwplug->bio_work, blk_zone_wplug_bio_work); zwplug->disk = disk; @@ -612,6 +616,7 @@ static void disk_zone_wplug_set_wp_offset(struct gendisk *disk, /* Update the zone write pointer and abort all plugged BIOs. */ zwplug->flags &= ~BLK_ZONE_WPLUG_NEED_WP_UPDATE; zwplug->wp_offset = wp_offset; + zwplug->wp_offset_compl = zwplug->wp_offset; disk_zone_wplug_abort(zwplug); /* @@ -1148,6 +1153,7 @@ void blk_zone_write_plug_bio_endio(struct bio *bio) struct gendisk *disk = bio->bi_bdev->bd_disk; struct blk_zone_wplug *zwplug = disk_get_zone_wplug(disk, bio->bi_iter.bi_sector); + unsigned int end_sector; unsigned long flags; if (WARN_ON_ONCE(!zwplug)) @@ -1165,11 +1171,24 @@ void blk_zone_write_plug_bio_endio(struct bio *bio) bio->bi_opf |= REQ_OP_ZONE_APPEND; } - /* - * If the BIO failed, abort all plugged BIOs and mark the plug as - * needing a write pointer update. 
- */ - if (bio->bi_status != BLK_STS_OK) { + if (bio->bi_status == BLK_STS_OK) { + switch (bio_op(bio)) { + case REQ_OP_WRITE: + case REQ_OP_ZONE_APPEND: + case REQ_OP_WRITE_ZEROES: + end_sector = bdev_offset_from_zone_start(disk->part0, + bio->bi_iter.bi_sector + bio_sectors(bio)); + if (end_sector > zwplug->wp_offset_compl) + zwplug->wp_offset_compl = end_sector; + break; + default: + break; + } + } else { + /* + * If the BIO failed, mark the plug as having an error to + * trigger recovery. + */ spin_lock_irqsave(&zwplug->lock, flags); disk_zone_wplug_abort(zwplug); zwplug->flags |= BLK_ZONE_WPLUG_NEED_WP_UPDATE; @@ -1772,7 +1791,7 @@ EXPORT_SYMBOL_GPL(blk_zone_issue_zeroout); static void queue_zone_wplug_show(struct blk_zone_wplug *zwplug, struct seq_file *m) { - unsigned int zwp_wp_offset, zwp_flags; + unsigned int zwp_wp_offset, zwp_wp_offset_compl, zwp_flags; unsigned int zwp_zone_no, zwp_ref; unsigned int zwp_bio_list_size; unsigned long flags; @@ -1782,11 +1801,13 @@ static void queue_zone_wplug_show(struct blk_zone_wplug *zwplug, zwp_flags = zwplug->flags; zwp_ref = refcount_read(&zwplug->ref); zwp_wp_offset = zwplug->wp_offset; + zwp_wp_offset_compl = zwplug->wp_offset_compl; zwp_bio_list_size = bio_list_size(&zwplug->bio_list); spin_unlock_irqrestore(&zwplug->lock, flags); - seq_printf(m, "%u 0x%x %u %u %u\n", zwp_zone_no, zwp_flags, zwp_ref, - zwp_wp_offset, zwp_bio_list_size); + seq_printf(m, "zone_no %u flags 0x%x ref %u wp_offset %u wp_offset_compl %u bio_list_size %u\n", + zwp_zone_no, zwp_flags, zwp_ref, zwp_wp_offset, + zwp_wp_offset_compl, zwp_bio_list_size); } int queue_zone_wplugs_show(void *data, struct seq_file *m) diff --git a/block/blk.h b/block/blk.h index 4904b86d5fec..2274253cfa58 100644 --- a/block/blk.h +++ b/block/blk.h @@ -470,8 +470,10 @@ static inline void blk_zone_update_request_bio(struct request *rq, * the original BIO sector so that blk_zone_write_plug_bio_endio() can * lookup the zone write plug. 
*/ - if (req_op(rq) == REQ_OP_ZONE_APPEND || bio_zone_write_plugging(bio)) + if (req_op(rq) == REQ_OP_ZONE_APPEND || bio_zone_write_plugging(bio)) { bio->bi_iter.bi_sector = rq->__sector; + bio->bi_iter.bi_size = rq->__data_len; + } } void blk_zone_write_plug_bio_endio(struct bio *bio); static inline void blk_zone_bio_endio(struct bio *bio)
From patchwork Wed Jan 15 22:46:41 2025 X-Patchwork-Submitter: Bart Van Assche X-Patchwork-Id: 13940996 From: Bart Van Assche To: Jens Axboe Cc: linux-block@vger.kernel.org, Christoph Hellwig , Damien Le Moal , Bart Van Assche Subject: [PATCH v17 07/14] blk-zoned: Defer error handling Date: Wed, 15 Jan 2025 14:46:41 -0800 Message-ID: <20250115224649.3973718-8-bvanassche@acm.org> In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org>
References: <20250115224649.3973718-1-bvanassche@acm.org> Only handle errors after pending zoned writes have completed or have been requeued instead of handling write errors immediately. This patch prepares for implementing write pipelining support. If, e.g., the SCSI error handler is activated while multiple requests are queued, all requests must have completed or failed before any requests are resubmitted. Cc: Christoph Hellwig Cc: Damien Le Moal Signed-off-by: Bart Van Assche --- block/blk-mq.c | 9 ++ block/blk-zoned.c | 279 ++++++++++++++++++++++++++++++++++++++--- block/blk.h | 27 ++++ include/linux/blkdev.h | 4 +- 4 files changed, 302 insertions(+), 17 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 01478777ae5f..ca34ec34d595 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -799,6 +799,9 @@ void blk_mq_free_request(struct request *rq) rq_qos_done(q, rq); WRITE_ONCE(rq->state, MQ_RQ_IDLE); + + blk_zone_free_request(rq); + if (req_ref_put_and_test(rq)) __blk_mq_free_request(rq); } @@ -1195,6 +1198,9 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob) continue; WRITE_ONCE(rq->state, MQ_RQ_IDLE); + + blk_zone_free_request(rq); + if (!req_ref_put_and_test(rq)) continue; @@ -1513,6 +1519,7 @@ static void __blk_mq_requeue_request(struct request *rq) if (blk_mq_request_started(rq)) { WRITE_ONCE(rq->state, MQ_RQ_IDLE); rq->rq_flags &= ~RQF_TIMED_OUT; + blk_zone_requeue_work(q); } } @@ -1548,6 +1555,8 @@ static void blk_mq_requeue_work(struct work_struct *work) list_splice_init(&q->flush_list, &flush_list); spin_unlock_irq(&q->requeue_lock); + blk_zone_requeue_work(q); + while (!list_empty(&rq_list)) { rq = list_entry(rq_list.next, struct request, queuelist); list_del_init(&rq->queuelist); diff --git a/block/blk-zoned.c b/block/blk-zoned.c index 089c6740df4a..cc09ae84acc8 100644 --- a/block/blk-zoned.c +++
b/block/blk-zoned.c @@ -8,6 +8,7 @@ * Copyright (c) 2016, Damien Le Moal * Copyright (c) 2016, Western Digital * Copyright (c) 2024, Western Digital Corporation or its affiliates. + * Copyright 2024 Google LLC */ #include @@ -37,6 +38,7 @@ static const char *const zone_cond_name[] = { /* * Per-zone write plug. * @node: hlist_node structure for managing the plug using a hash table. + * @link: To list the plug in the zone write plug error list of the disk. * @ref: Zone write plug reference counter. A zone write plug reference is * always at least 1 when the plug is hashed in the disk plug hash table. * The reference is incremented whenever a new BIO needing plugging is @@ -60,6 +62,7 @@ static const char *const zone_cond_name[] = { */ struct blk_zone_wplug { struct hlist_node node; + struct list_head link; refcount_t ref; spinlock_t lock; unsigned int flags; @@ -86,10 +89,16 @@ struct blk_zone_wplug { * to prevent new references to the zone write plug to be taken for * newly incoming BIOs. A zone write plug flagged with this flag will be * freed once all remaining references from BIOs or functions are dropped. + * - BLK_ZONE_WPLUG_ERROR: Indicates that a write error happened. Recovery + * from the write error will happen after all pending zoned write requests + * either have been requeued or have been completed. */ #define BLK_ZONE_WPLUG_PLUGGED (1U << 0) #define BLK_ZONE_WPLUG_NEED_WP_UPDATE (1U << 1) #define BLK_ZONE_WPLUG_UNHASHED (1U << 2) +#define BLK_ZONE_WPLUG_ERROR (1U << 3) + +#define BLK_ZONE_WPLUG_BUSY (BLK_ZONE_WPLUG_PLUGGED | BLK_ZONE_WPLUG_ERROR) /** * blk_zone_cond_str - Return string XXX in BLK_ZONE_COND_XXX. @@ -468,8 +477,8 @@ static inline bool disk_should_remove_zone_wplug(struct gendisk *disk, if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) return false; - /* If the zone write plug is still plugged, it cannot be removed. */ - if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) + /* If the zone write plug is still busy, it cannot be removed. 
*/ + if (zwplug->flags & BLK_ZONE_WPLUG_BUSY) return false; /* @@ -552,6 +561,7 @@ static struct blk_zone_wplug *disk_get_and_lock_zone_wplug(struct gendisk *disk, return NULL; INIT_HLIST_NODE(&zwplug->node); + INIT_LIST_HEAD(&zwplug->link); refcount_set(&zwplug->ref, 2); spin_lock_init(&zwplug->lock); zwplug->flags = 0; @@ -601,6 +611,49 @@ static void disk_zone_wplug_abort(struct blk_zone_wplug *zwplug) blk_zone_wplug_bio_io_error(zwplug, bio); } +static void disk_zone_wplug_set_error(struct gendisk *disk, + struct blk_zone_wplug *zwplug) +{ + lockdep_assert_held(&zwplug->lock); + + if (zwplug->flags & BLK_ZONE_WPLUG_ERROR) + return; + + zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED; + zwplug->flags |= BLK_ZONE_WPLUG_ERROR; + /* + * Increase the zwplug reference count because BLK_ZONE_WPLUG_ERROR has + * been set. This reference will be dropped when BLK_ZONE_WPLUG_ERROR is + * cleared. + */ + refcount_inc(&zwplug->ref); + + scoped_guard(spinlock_irqsave, &disk->zone_wplugs_lock) + list_add_tail(&zwplug->link, &disk->zone_wplugs_err_list); +} + +static void disk_zone_wplug_clear_error(struct gendisk *disk, + struct blk_zone_wplug *zwplug) +{ + if (!(READ_ONCE(zwplug->flags) & BLK_ZONE_WPLUG_ERROR)) + return; + + /* + * We are racing with the error handling work which drops the reference + * on the zone write plug after handling the error state. So remove the + * plug from the error list and drop its reference count only if the + * error handling has not yet started, that is, if the zone write plug + * is still listed. + */ + scoped_guard(spinlock_irqsave, &disk->zone_wplugs_lock) { + if (list_empty(&zwplug->link)) + return; + list_del_init(&zwplug->link); + zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR; + } + disk_put_zone_wplug(zwplug); +} + /* * Set a zone write plug write pointer offset to the specified value. 
* This aborts all plugged BIOs, which is fine as this function is called for @@ -619,6 +672,13 @@ static void disk_zone_wplug_set_wp_offset(struct gendisk *disk, zwplug->wp_offset_compl = zwplug->wp_offset; disk_zone_wplug_abort(zwplug); + /* + * Updating the write pointer offset puts back the zone + * in a good state. So clear the error flag and decrement the + * error count if we were in error state. + */ + disk_zone_wplug_clear_error(disk, zwplug); + /* * The zone write plug now has no BIO plugged: remove it from the * hash table so that it cannot be seen. The plug will be freed @@ -747,6 +807,70 @@ static bool blk_zone_wplug_handle_reset_all(struct bio *bio) return false; } +struct all_zwr_inserted_data { + struct blk_zone_wplug *zwplug; + bool res; +}; + +/* + * Changes @data->res to %false if and only if @rq is a zoned write for the + * given zone and if it is owned by the block driver. + * + * @rq members may change while this function is in progress. Hence, use + * READ_ONCE() to read @rq members. + */ +static bool blk_zwr_inserted(struct request *rq, void *data) +{ + struct all_zwr_inserted_data *d = data; + struct blk_zone_wplug *zwplug = d->zwplug; + struct request_queue *q = zwplug->disk->queue; + struct bio *bio = READ_ONCE(rq->bio); + + if (rq->q == q && READ_ONCE(rq->state) != MQ_RQ_IDLE && + blk_rq_is_seq_zoned_write(rq) && bio && + bio_zone_no(bio) == zwplug->zone_no) { + d->res = false; + return false; + } + + return true; +} + +/* + * Report whether or not all zoned writes for a zone have been inserted into a + * software queue, elevator queue or hardware queue. 
+ */ +static bool blk_zone_all_zwr_inserted(struct blk_zone_wplug *zwplug) +{ + struct gendisk *disk = zwplug->disk; + struct request_queue *q = disk->queue; + struct all_zwr_inserted_data d = { .zwplug = zwplug, .res = true }; + struct blk_mq_hw_ctx *hctx; + unsigned long i; + struct request *rq; + + scoped_guard(spinlock_irqsave, &q->requeue_lock) { + list_for_each_entry(rq, &q->requeue_list, queuelist) + if (blk_rq_is_seq_zoned_write(rq) && + bio_zone_no(rq->bio) == zwplug->zone_no) + return false; + list_for_each_entry(rq, &q->flush_list, queuelist) + if (blk_rq_is_seq_zoned_write(rq) && + bio_zone_no(rq->bio) == zwplug->zone_no) + return false; + } + + queue_for_each_hw_ctx(q, hctx, i) { + struct blk_mq_tags *tags = hctx->sched_tags ?: hctx->tags; + + blk_mq_all_tag_iter(tags, blk_zwr_inserted, &d); + if (!d.res || blk_mq_is_shared_tags(q->tag_set->flags)) + break; + } + + return d.res; +} + static void disk_zone_wplug_schedule_bio_work(struct gendisk *disk, struct blk_zone_wplug *zwplug) { @@ -953,14 +1077,6 @@ static bool blk_zone_wplug_prepare_bio(struct blk_zone_wplug *zwplug, * so that we can restore its operation code on completion. */ bio_set_flag(bio, BIO_EMULATES_ZONE_APPEND); - } else { - /* - * Check for non-sequential writes early as we know that BIOs - * with a start sector not unaligned to the zone write pointer - * will fail. - */ - if (bio_offset_from_zone_start(bio) != zwplug->wp_offset) - return false; } /* Advance the zone write pointer offset. */ @@ -1021,7 +1137,7 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) * BLK_STS_AGAIN failure if we let the BIO execute. * Otherwise, plug and let the BIO execute. 
*/ - if ((zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) || + if ((zwplug->flags & BLK_ZONE_WPLUG_BUSY) || (bio->bi_opf & REQ_NOWAIT)) goto plug; @@ -1122,6 +1238,29 @@ bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs) } EXPORT_SYMBOL_GPL(blk_zone_plug_bio); +/* + * Change the zone state to "error" if a zoned write request is requeued to + * postpone processing of requeued requests until all pending requests have + * either completed or have been requeued. + */ +void blk_zone_write_plug_requeue_request(struct request *rq) +{ + struct gendisk *disk = rq->q->disk; + struct blk_zone_wplug *zwplug; + + if (!blk_rq_is_seq_zoned_write(rq)) + return; + + zwplug = disk_get_zone_wplug(disk, blk_rq_pos(rq)); + if (WARN_ON_ONCE(!zwplug)) + return; + + scoped_guard(spinlock_irqsave, &zwplug->lock) + disk_zone_wplug_set_error(disk, zwplug); + + disk_put_zone_wplug(zwplug); +} + static void disk_zone_wplug_unplug_bio(struct gendisk *disk, struct blk_zone_wplug *zwplug) { @@ -1187,11 +1326,14 @@ void blk_zone_write_plug_bio_endio(struct bio *bio) } else { /* * If the BIO failed, mark the plug as having an error to - * trigger recovery. + * trigger recovery. Since we cannot rely on the completion + * information for torn SAS SMR writes, set + * BLK_ZONE_WPLUG_NEED_WP_UPDATE for these devices. */ spin_lock_irqsave(&zwplug->lock, flags); - disk_zone_wplug_abort(zwplug); - zwplug->flags |= BLK_ZONE_WPLUG_NEED_WP_UPDATE; + if (!disk->queue->limits.driver_preserves_write_order) + zwplug->flags |= BLK_ZONE_WPLUG_NEED_WP_UPDATE; + zwplug->flags |= BLK_ZONE_WPLUG_ERROR; spin_unlock_irqrestore(&zwplug->lock, flags); } @@ -1233,6 +1375,25 @@ void blk_zone_write_plug_finish_request(struct request *req) disk_put_zone_wplug(zwplug); } +/* + * Schedule zone_wplugs_work if a zone is in the error state and if no requests + * are in flight. Called from blk_mq_free_request().
+ */ +void blk_zone_write_plug_free_request(struct request *rq) +{ + struct gendisk *disk = rq->q->disk; + struct blk_zone_wplug *zwplug; + + zwplug = disk_get_zone_wplug(disk, blk_rq_pos(rq)); + if (!zwplug) + return; + + if (READ_ONCE(zwplug->flags) & BLK_ZONE_WPLUG_ERROR) + kblockd_schedule_work(&disk->zone_wplugs_work); + + disk_put_zone_wplug(zwplug); +} + static void blk_zone_wplug_bio_work(struct work_struct *work) { struct blk_zone_wplug *zwplug = @@ -1279,6 +1440,88 @@ static void blk_zone_wplug_bio_work(struct work_struct *work) disk_put_zone_wplug(zwplug); } +static void disk_zone_wplug_handle_error(struct gendisk *disk, + struct blk_zone_wplug *zwplug) +{ + scoped_guard(spinlock_irqsave, &zwplug->lock) { + /* + * A zone reset or finish may have cleared the error + * already. In such case, do nothing as the report zones may + * have seen the "old" write pointer value before the + * reset/finish operation completed. + */ + if (!(zwplug->flags & BLK_ZONE_WPLUG_ERROR)) + return; + + zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR; + + /* Update the zone write pointer offset. */ + zwplug->wp_offset = zwplug->wp_offset_compl; + + /* Restart BIO submission if we still have any BIO left. 
*/ + if (!bio_list_empty(&zwplug->bio_list)) { + disk_zone_wplug_schedule_bio_work(disk, zwplug); + } else { + zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; + if (disk_should_remove_zone_wplug(disk, zwplug)) + disk_remove_zone_wplug(disk, zwplug); + } + } + + disk_put_zone_wplug(zwplug); +} + +static void disk_zone_process_err_list(struct gendisk *disk) +{ + struct blk_zone_wplug *zwplug, *next; + unsigned long flags; + + spin_lock_irqsave(&disk->zone_wplugs_lock, flags); + + list_for_each_entry_safe(zwplug, next, &disk->zone_wplugs_err_list, + link) { + if (!blk_zone_all_zwr_inserted(zwplug)) + continue; + list_del_init(&zwplug->link); + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); + + disk_zone_wplug_handle_error(disk, zwplug); + disk_put_zone_wplug(zwplug); + + spin_lock_irqsave(&disk->zone_wplugs_lock, flags); + } + + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); + + /* + * If one or more zones have been skipped, this work will be requeued + * when a request is requeued (blk_zone_requeue_work()) or freed + * (blk_zone_write_plug_free_request()). + */ +} + +static void disk_zone_wplugs_work(struct work_struct *work) +{ + struct gendisk *disk = + container_of(work, struct gendisk, zone_wplugs_work); + + disk_zone_process_err_list(disk); +} + +/* May be called from interrupt context. 
*/ +void blk_zone_requeue_work(struct request_queue *q) +{ + struct gendisk *disk = q->disk; + + if (!disk) + return; + + if (in_interrupt()) + kblockd_schedule_work(&disk->zone_wplugs_work); + else + disk_zone_process_err_list(disk); +} + static inline unsigned int disk_zone_wplugs_hash_size(struct gendisk *disk) { return 1U << disk->zone_wplugs_hash_bits; @@ -1287,6 +1530,8 @@ static inline unsigned int disk_zone_wplugs_hash_size(struct gendisk *disk) void disk_init_zone_resources(struct gendisk *disk) { spin_lock_init(&disk->zone_wplugs_lock); + INIT_LIST_HEAD(&disk->zone_wplugs_err_list); + INIT_WORK(&disk->zone_wplugs_work, disk_zone_wplugs_work); } /* @@ -1805,9 +2050,11 @@ static void queue_zone_wplug_show(struct blk_zone_wplug *zwplug, zwp_bio_list_size = bio_list_size(&zwplug->bio_list); spin_unlock_irqrestore(&zwplug->lock, flags); - seq_printf(m, "zone_no %u flags 0x%x ref %u wp_offset %u wp_offset_compl %u bio_list_size %u\n", + bool all_zwr_inserted = blk_zone_all_zwr_inserted(zwplug); + + seq_printf(m, "zone_no %u flags 0x%x ref %u wp_offset %u bio_list_size %u all_zwr_inserted %d\n", zwp_zone_no, zwp_flags, zwp_ref, zwp_wp_offset, - zwp_wp_offset_compl, zwp_bio_list_size); + zwp_bio_list_size, all_zwr_inserted); } int queue_zone_wplugs_show(void *data, struct seq_file *m) diff --git a/block/blk.h b/block/blk.h index 2274253cfa58..98954fb0069f 100644 --- a/block/blk.h +++ b/block/blk.h @@ -475,6 +475,16 @@ static inline void blk_zone_update_request_bio(struct request *rq, bio->bi_iter.bi_size = rq->__data_len; } } + +void blk_zone_write_plug_requeue_request(struct request *rq); +static inline void blk_zone_requeue_request(struct request *rq) +{ + if (blk_rq_is_seq_zoned_write(rq)) + blk_zone_write_plug_requeue_request(rq); +} + +void blk_zone_requeue_work(struct request_queue *q); + void blk_zone_write_plug_bio_endio(struct bio *bio); static inline void blk_zone_bio_endio(struct bio *bio) { @@ -492,6 +502,14 @@ static inline void 
blk_zone_finish_request(struct request *rq) if (rq->rq_flags & RQF_ZONE_WRITE_PLUGGING) blk_zone_write_plug_finish_request(rq); } + +void blk_zone_write_plug_free_request(struct request *rq); +static inline void blk_zone_free_request(struct request *rq) +{ + if (blk_rq_is_seq_zoned_write(rq)) + blk_zone_write_plug_free_request(rq); +} + int blkdev_report_zones_ioctl(struct block_device *bdev, unsigned int cmd, unsigned long arg); int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode, @@ -517,12 +535,21 @@ static inline void blk_zone_update_request_bio(struct request *rq, struct bio *bio) { } +static inline void blk_zone_requeue_request(struct request *rq) +{ +} +static inline void blk_zone_requeue_work(struct request_queue *q) +{ +} static inline void blk_zone_bio_endio(struct bio *bio) { } static inline void blk_zone_finish_request(struct request *rq) { } +static inline void blk_zone_free_request(struct request *rq) +{ +} static inline int blkdev_report_zones_ioctl(struct block_device *bdev, unsigned int cmd, unsigned long arg) { diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index df9887412a9e..fcea07b4062e 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -199,7 +199,9 @@ struct gendisk { unsigned int zone_wplugs_hash_bits; spinlock_t zone_wplugs_lock; struct mempool_s *zone_wplugs_pool; - struct hlist_head *zone_wplugs_hash; + struct hlist_head *zone_wplugs_hash; + struct list_head zone_wplugs_err_list; + struct work_struct zone_wplugs_work; struct workqueue_struct *zone_wplugs_wq; #endif /* CONFIG_BLK_DEV_ZONED */
From patchwork Wed Jan 15 22:46:42 2025 X-Patchwork-Submitter: Bart Van Assche X-Patchwork-Id: 13940993 From: Bart Van Assche To: Jens Axboe Cc: linux-block@vger.kernel.org, Christoph Hellwig , Damien Le Moal , Bart Van Assche Subject: [PATCH v17 08/14] blk-zoned: Add an argument to blk_zone_plug_bio() Date: Wed, 15 Jan 2025 14:46:42 -0800 Message-ID: <20250115224649.3973718-9-bvanassche@acm.org> In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org> References: <20250115224649.3973718-1-bvanassche@acm.org>
Prepare for preserving the order of pipelined zoned writes per zone.
Cc: Christoph Hellwig Cc: Damien Le Moal Signed-off-by: Bart Van Assche --- block/blk-mq.c | 2 +- block/blk-zoned.c | 3 ++- drivers/md/dm.c | 2 +- include/linux/blkdev.h | 5 +++-- 4 files changed, 7 insertions(+), 5 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index ca34ec34d595..01cfcc6f7b02 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -3142,7 +3142,7 @@ void blk_mq_submit_bio(struct bio *bio) if (blk_mq_attempt_bio_merge(q, bio, nr_segs)) goto queue_exit; - if (blk_queue_is_zoned(q) && blk_zone_plug_bio(bio, nr_segs)) + if (blk_queue_is_zoned(q) && blk_zone_plug_bio(bio, nr_segs, &swq_cpu)) goto queue_exit; new_request: diff --git a/block/blk-zoned.c b/block/blk-zoned.c index cc09ae84acc8..e2929d00dafd 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -1165,6 +1165,7 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) * blk_zone_plug_bio - Handle a zone write BIO with zone write plugging * @bio: The BIO being submitted * @nr_segs: The number of physical segments of @bio + * @swq_cpu: [out] CPU of the software queue to which the bio should be queued * * Handle write, write zeroes and zone append operations requiring emulation * using zone write plugging. @@ -1173,7 +1174,7 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) * write plug. Otherwise, return false to let the submission path process * @bio normally. 
 */
-bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs)
+bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs, int *swq_cpu)
 {
 	struct block_device *bdev = bio->bi_bdev;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 12ecf07a3841..c3f851fe26f6 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1796,7 +1796,7 @@ static inline bool dm_zone_bio_needs_split(struct mapped_device *md,
 }
 static inline bool dm_zone_plug_bio(struct mapped_device *md, struct bio *bio)
 {
-	return dm_emulate_zone_append(md) && blk_zone_plug_bio(bio, 0);
+	return dm_emulate_zone_append(md) && blk_zone_plug_bio(bio, 0, NULL);
 }
 static blk_status_t __send_zone_reset_all_emulated(struct clone_info *ci,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fcea07b4062e..0ae106944ab3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -704,13 +704,14 @@ static inline unsigned int disk_nr_zones(struct gendisk *disk)
 {
 	return disk->nr_zones;
 }
-bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs);
+bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs, int *swq_cpu);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline unsigned int disk_nr_zones(struct gendisk *disk)
 {
 	return 0;
 }
-static inline bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs)
+static inline bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs,
+				     int *swq_cpu)
 {
 	return false;
 }

From patchwork Wed Jan 15 22:46:43 2025
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Damien Le Moal, Bart Van Assche
Subject: [PATCH v17 09/14] blk-zoned: Support pipelining of zoned writes
Date: Wed, 15 Jan 2025 14:46:43 -0800
Message-ID: <20250115224649.3973718-10-bvanassche@acm.org>
In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org>
References: <20250115224649.3973718-1-bvanassche@acm.org>

Support pipelining of zoned writes if the block driver preserves the write order per hardware queue. Track, for each zone, the software queue to which writes have been queued. If zoned writes are pipelined, submit new writes to the same software queue as the writes that are already in progress. This prevents the reordering that would result from submitting requests for the same zone to different software or hardware queues.
Cc: Christoph Hellwig Cc: Damien Le Moal Signed-off-by: Bart Van Assche --- block/blk-mq.c | 4 ++-- block/blk-zoned.c | 33 ++++++++++++++++++++++++--------- 2 files changed, 26 insertions(+), 11 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 01cfcc6f7b02..5ac9ff1ab380 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -3103,8 +3103,8 @@ void blk_mq_submit_bio(struct bio *bio) /* * A BIO that was released from a zone write plug has already been * through the preparation in this function, already holds a reference - * on the queue usage counter, and is the only write BIO in-flight for - * the target zone. Go straight to preparing a request for it. + * on the queue usage counter. Go straight to preparing a request for + * it. */ if (bio_zone_write_plugging(bio)) { nr_segs = bio->__bi_nr_segments; diff --git a/block/blk-zoned.c b/block/blk-zoned.c index e2929d00dafd..6eec11b04501 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -55,6 +55,8 @@ static const char *const zone_cond_name[] = { * as a number of 512B sectors. * @wp_offset_compl: End offset for completed zoned writes as a number of 512 * byte sectors. + * @swq_cpu: Software queue to submit writes to for drivers that preserve the + * write order. * @bio_list: The list of BIOs that are currently plugged. * @bio_work: Work struct to handle issuing of plugged BIOs * @rcu_head: RCU head to free zone write plugs with an RCU grace period. @@ -69,6 +71,7 @@ struct blk_zone_wplug { unsigned int zone_no; unsigned int wp_offset; unsigned int wp_offset_compl; + int swq_cpu; struct bio_list bio_list; struct work_struct bio_work; struct rcu_head rcu_head; @@ -78,8 +81,7 @@ struct blk_zone_wplug { /* * Zone write plug flags bits: * - BLK_ZONE_WPLUG_PLUGGED: Indicates that the zone write plug is plugged, - * that is, that write BIOs are being throttled due to a write BIO already - * being executed or the zone write plug bio list is not empty. + * that is, that write BIOs are being throttled. 
* - BLK_ZONE_WPLUG_NEED_WP_UPDATE: Indicates that we lost track of a zone * write pointer offset and need to update it. * - BLK_ZONE_WPLUG_UNHASHED: Indicates that the zone write plug was removed @@ -568,6 +570,7 @@ static struct blk_zone_wplug *disk_get_and_lock_zone_wplug(struct gendisk *disk, zwplug->zone_no = zno; zwplug->wp_offset = bdev_offset_from_zone_start(disk->part0, sector); zwplug->wp_offset_compl = zwplug->wp_offset; + zwplug->swq_cpu = -1; bio_list_init(&zwplug->bio_list); INIT_WORK(&zwplug->bio_work, blk_zone_wplug_bio_work); zwplug->disk = disk; @@ -1085,7 +1088,8 @@ static bool blk_zone_wplug_prepare_bio(struct blk_zone_wplug *zwplug, return true; } -static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) +static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs, + int *swq_cpu) { struct gendisk *disk = bio->bi_bdev->bd_disk; sector_t sector = bio->bi_iter.bi_sector; @@ -1138,8 +1142,15 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) * Otherwise, plug and let the BIO execute. 
*/ if ((zwplug->flags & BLK_ZONE_WPLUG_BUSY) || - (bio->bi_opf & REQ_NOWAIT)) + (bio->bi_opf & REQ_NOWAIT)) { goto plug; + } else if (disk->queue->limits.driver_preserves_write_order) { + if (zwplug->swq_cpu < 0) + zwplug->swq_cpu = raw_smp_processor_id(); + *swq_cpu = zwplug->swq_cpu; + } else { + zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED; + } if (!blk_zone_wplug_prepare_bio(zwplug, bio)) { spin_unlock_irqrestore(&zwplug->lock, flags); @@ -1147,8 +1158,6 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) return true; } - zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED; - spin_unlock_irqrestore(&zwplug->lock, flags); return false; @@ -1223,7 +1232,7 @@ bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs, int *swq_cpu) fallthrough; case REQ_OP_WRITE: case REQ_OP_WRITE_ZEROES: - return blk_zone_wplug_handle_write(bio, nr_segs); + return blk_zone_wplug_handle_write(bio, nr_segs, swq_cpu); case REQ_OP_ZONE_RESET: return blk_zone_wplug_handle_reset_or_finish(bio, 0); case REQ_OP_ZONE_FINISH: @@ -1278,6 +1287,9 @@ static void disk_zone_wplug_unplug_bio(struct gendisk *disk, zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; + if (refcount_read(&zwplug->ref) == 2) + zwplug->swq_cpu = -1; + /* * If the zone is full (it was fully written or finished, or empty * (it was reset), remove its zone write plug from the hash table. 
@@ -2041,6 +2053,7 @@ static void queue_zone_wplug_show(struct blk_zone_wplug *zwplug,
 	unsigned int zwp_zone_no, zwp_ref;
 	unsigned int zwp_bio_list_size;
 	unsigned long flags;
+	int swq_cpu;
 	spin_lock_irqsave(&zwplug->lock, flags);
 	zwp_zone_no = zwplug->zone_no;
@@ -2049,13 +2062,15 @@
 	zwp_wp_offset = zwplug->wp_offset;
 	zwp_wp_offset_compl = zwplug->wp_offset_compl;
 	zwp_bio_list_size = bio_list_size(&zwplug->bio_list);
+	swq_cpu = zwplug->swq_cpu;
 	spin_unlock_irqrestore(&zwplug->lock, flags);
 	bool all_zwr_inserted = blk_zone_all_zwr_inserted(zwplug);
-	seq_printf(m, "zone_no %u flags 0x%x ref %u wp_offset %u bio_list_size %u all_zwr_inserted %d\n",
+	seq_printf(m, "zone_no %u flags 0x%x ref %u wp_offset %u wp_offset_compl %u bio_list_size %u all_zwr_inserted %d swq_cpu %d\n",
 		   zwp_zone_no, zwp_flags, zwp_ref, zwp_wp_offset,
-		   zwp_bio_list_size, all_zwr_inserted);
+		   zwp_wp_offset_compl, zwp_bio_list_size, all_zwr_inserted,
+		   swq_cpu);
 }
 int queue_zone_wplugs_show(void *data, struct seq_file *m)

From patchwork Wed Jan 15 22:46:44 2025
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Damien Le Moal, Bart Van Assche, "Martin K. Petersen", Ming Lei
Subject: [PATCH v17 10/14] scsi: core: Retry unaligned zoned writes
Date: Wed, 15 Jan 2025 14:46:44 -0800
Message-ID: <20250115224649.3973718-11-bvanassche@acm.org>
In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org>
References: <20250115224649.3973718-1-bvanassche@acm.org>

If zoned writes (REQ_OP_WRITE) for a sequential write required zone have a starting LBA that differs from the write pointer, e.g. because a prior write triggered a unit attention condition, then the storage device will respond with an UNALIGNED WRITE COMMAND error. Retry commands that failed with an unaligned write error. The block layer core will sort the SCSI commands per LBA before these are resubmitted.

Reviewed-by: Damien Le Moal
Cc: Martin K. Petersen
Cc: Christoph Hellwig
Cc: Ming Lei
Signed-off-by: Bart Van Assche
---
 drivers/scsi/scsi_error.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 10154d78e336..24cd8e8538e5 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -700,6 +700,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		fallthrough;
 	case ILLEGAL_REQUEST:
+		/*
+		 * Unaligned write command. This may indicate that zoned writes
+		 * have been received by the device in the wrong order.
+		 * If write pipelining is enabled, retry.
+		 */
+		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
+		    req->q->limits.driver_preserves_write_order &&
+		    blk_rq_is_seq_zoned_write(req) &&
+		    scsi_cmd_retry_allowed(scmd)) {
+			SCSI_LOG_ERROR_RECOVERY(1,
+				sdev_printk(KERN_WARNING, scmd->device,
+					"Retrying unaligned write at LBA %#llx.\n",
+					scsi_get_lba(scmd)));
+			return NEEDS_RETRY;
+		}
+
 		if (sshdr.asc == 0x20 || /* Invalid command operation code */
 		    sshdr.asc == 0x21 || /* Logical block address out of range */
 		    sshdr.asc == 0x22 || /* Invalid function */

From patchwork Wed Jan 15 22:46:45 2025
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Damien Le Moal, Bart Van Assche, "Martin K. Petersen", Ming Lei
Subject: [PATCH v17 11/14] scsi: sd: Increase retry count for zoned writes
Date: Wed, 15 Jan 2025 14:46:45 -0800
Message-ID: <20250115224649.3973718-12-bvanassche@acm.org>
In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org>
References: <20250115224649.3973718-1-bvanassche@acm.org>

If the write order is preserved, increase the number of retries for write commands sent to a sequential zone to the maximum number of outstanding commands, because in the worst case the number of times reordered zoned writes have to be retried is (number of outstanding writes per sequential zone) - 1.

Cc: Damien Le Moal
Cc: Martin K. Petersen
Cc: Christoph Hellwig
Cc: Ming Lei
Signed-off-by: Bart Van Assche
---
 drivers/scsi/sd.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d9e3235d7fd0..2594debb756c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1403,6 +1403,13 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 	cmd->transfersize = sdp->sector_size;
 	cmd->underflow = nr_blocks << 9;
 	cmd->allowed = sdkp->max_retries;
+	/*
+	 * Increase the number of allowed retries for zoned writes if the driver
+	 * preserves the command order.
+	 */
+	if (rq->q->limits.driver_preserves_write_order &&
+	    blk_rq_is_seq_zoned_write(rq))
+		cmd->allowed += rq->q->nr_requests;
 	cmd->sdb.length = nr_blocks * sdp->sector_size;
 	SCSI_LOG_HLQUEUE(1,

From patchwork Wed Jan 15 22:46:46 2025
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Damien Le Moal, Bart Van Assche, Douglas Gilbert, "Martin K. Petersen", Ming Lei
Subject: [PATCH v17 12/14] scsi: scsi_debug: Add the preserves_write_order module parameter
Date: Wed, 15 Jan 2025 14:46:46 -0800
Message-ID: <20250115224649.3973718-13-bvanassche@acm.org>
In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org>
References: <20250115224649.3973718-1-bvanassche@acm.org>

Zone write locking is not used for zoned devices if the block driver reports that it preserves the order of write commands. Make it easier to test the code paths that do not use zone write locking by adding support for setting the driver_preserves_write_order flag.

Acked-by: Douglas Gilbert
Reviewed-by: Damien Le Moal
Cc: Martin K. Petersen
Cc: Christoph Hellwig
Cc: Ming Lei
Signed-off-by: Bart Van Assche
---
 drivers/scsi/scsi_debug.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 680ba180a672..11df07b25c26 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -927,6 +927,7 @@ static int dix_reads;
 static int dif_errors;
 /* ZBC global data */
+static bool sdeb_preserves_write_order;
 static bool sdeb_zbc_in_use;	/* true for host-aware and host-managed disks */
 static int sdeb_zbc_zone_cap_mb;
 static int sdeb_zbc_zone_size_mb;
@@ -5881,10 +5882,14 @@ static struct sdebug_dev_info *find_build_dev_info(struct scsi_device *sdev)
 static int scsi_debug_slave_alloc(struct scsi_device *sdp)
 {
+	struct request_queue *q = sdp->request_queue;
+
 	if (sdebug_verbose)
 		pr_info("slave_alloc <%u %u %u %llu>\n",
 		       sdp->host->host_no, sdp->channel, sdp->id, sdp->lun);
+
+	q->limits.driver_preserves_write_order = sdeb_preserves_write_order;
+
 	return 0;
 }
@@ -6620,6 +6625,8 @@ module_param_named(statistics, sdebug_statistics, bool, S_IRUGO | S_IWUSR);
 module_param_named(strict, sdebug_strict, bool, S_IRUGO | S_IWUSR);
module_param_named(submit_queues, submit_queues, int, S_IRUGO); module_param_named(poll_queues, poll_queues, int, S_IRUGO); +module_param_named(preserves_write_order, sdeb_preserves_write_order, bool, + S_IRUGO); module_param_named(tur_ms_to_ready, sdeb_tur_ms_to_ready, int, S_IRUGO); module_param_named(unmap_alignment, sdebug_unmap_alignment, int, S_IRUGO); module_param_named(unmap_granularity, sdebug_unmap_granularity, int, S_IRUGO); @@ -6692,6 +6699,8 @@ MODULE_PARM_DESC(opts, "1->noise, 2->medium_err, 4->timeout, 8->recovered_err... MODULE_PARM_DESC(per_host_store, "If set, next positive add_host will get new store (def=0)"); MODULE_PARM_DESC(physblk_exp, "physical block exponent (def=0)"); MODULE_PARM_DESC(poll_queues, "support for iouring iopoll queues (1 to max(submit_queues - 1))"); +MODULE_PARM_DESC(preserves_write_order, + "Whether or not to inform the block layer that this driver preserves the order of WRITE commands (def=0)"); MODULE_PARM_DESC(ptype, "SCSI peripheral type(def=0[disk])"); MODULE_PARM_DESC(random, "If set, uniformly randomize command duration between 0 and delay_in_ns"); MODULE_PARM_DESC(removable, "claim to have removable media (def=0)"); From patchwork Wed Jan 15 22:46:47 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bart Van Assche X-Patchwork-Id: 13940999 Received: from 008.lax.mailroute.net (008.lax.mailroute.net [199.89.1.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9E6F81DCB21 for ; Wed, 15 Jan 2025 22:47:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=199.89.1.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736981261; cv=none; 
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Damien Le Moal,
 Bart Van Assche, Douglas Gilbert, "Martin K. Petersen", Ming Lei
Subject: [PATCH v17 13/14] scsi: scsi_debug: Support injecting unaligned write errors
Date: Wed, 15 Jan 2025 14:46:47 -0800
Message-ID: <20250115224649.3973718-14-bvanassche@acm.org>
In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org>
References: <20250115224649.3973718-1-bvanassche@acm.org>

Allow user space software, e.g. a blktests test, to inject unaligned
write errors.

Acked-by: Douglas Gilbert
Reviewed-by: Damien Le Moal
Cc: Martin K. Petersen
Cc: Christoph Hellwig
Cc: Ming Lei
Signed-off-by: Bart Van Assche
---
 drivers/scsi/scsi_debug.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 11df07b25c26..af6a128be9b6 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -193,6 +193,7 @@ static const char *sdebug_version_date = "20210520";
 #define SDEBUG_OPT_NO_CDB_NOISE		0x4000
 #define SDEBUG_OPT_HOST_BUSY		0x8000
 #define SDEBUG_OPT_CMD_ABORT		0x10000
+#define SDEBUG_OPT_UNALIGNED_WRITE	0x20000
 #define SDEBUG_OPT_ALL_NOISE (SDEBUG_OPT_NOISE | SDEBUG_OPT_Q_NOISE | \
 			      SDEBUG_OPT_RESET_NOISE)
 #define SDEBUG_OPT_ALL_INJECTING (SDEBUG_OPT_RECOVERED_ERR | \
@@ -200,7 +201,8 @@ static const char *sdebug_version_date = "20210520";
 			      SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR | \
 			      SDEBUG_OPT_SHORT_TRANSFER | \
 			      SDEBUG_OPT_HOST_BUSY | \
-			      SDEBUG_OPT_CMD_ABORT)
+			      SDEBUG_OPT_CMD_ABORT | \
+			      SDEBUG_OPT_UNALIGNED_WRITE)
 #define SDEBUG_OPT_RECOV_DIF_DIX (SDEBUG_OPT_RECOVERED_ERR | \
 				  SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR)
@@ -4191,6 +4193,14 @@ static int resp_write_dt0(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 	u8 *cmd = scp->cmnd;
 	bool meta_data_locked = false;
 
+	if (unlikely(sdebug_opts & SDEBUG_OPT_UNALIGNED_WRITE &&
+		     atomic_read(&sdeb_inject_pending))) {
+		atomic_set(&sdeb_inject_pending, 0);
+		mk_sense_buffer(scp, ILLEGAL_REQUEST, LBA_OUT_OF_RANGE,
+				UNALIGNED_WRITE_ASCQ);
+		return check_condition_result;
+	}
+
 	switch (cmd[0]) {
 	case WRITE_16:
 		ei_lba = 0;
From patchwork Wed Jan 15 22:46:48 2025
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 13941000

From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Damien Le Moal,
 Bart Van Assche, "Bao D. Nguyen", Can Guo, "Martin K. Petersen",
 Avri Altman
Subject: [PATCH v17 14/14] scsi: ufs: Inform the block layer about write ordering
Date: Wed, 15 Jan 2025 14:46:48 -0800
Message-ID: <20250115224649.3973718-15-bvanassche@acm.org>
In-Reply-To: <20250115224649.3973718-1-bvanassche@acm.org>
References: <20250115224649.3973718-1-bvanassche@acm.org>

From the UFSHCI 4.0 specification, about the legacy (single queue) mode:

"The host controller always process transfer requests in-order according
to the order submitted to the list. In case of multiple commands with
single doorbell register ringing (batch mode), The dispatch order for
these transfer requests by host controller will base on their index in
the List. A transfer request with lower index value will be executed
before a transfer request with higher index value."
From the UFSHCI 4.0 specification, about the MCQ mode:

"Command Submission
1. Host SW writes an Entry to SQ
2. Host SW updates SQ doorbell tail pointer

Command Processing
3. After fetching the Entry, Host Controller updates SQ doorbell head pointer
4. Host controller sends COMMAND UPIU to UFS device"

In other words, for both legacy and MCQ mode, UFS controllers are
required to forward commands to the UFS device in the order in which
these commands have been received from the host.

Notes:
- For legacy mode this is only correct if the host submits one command
  at a time. The UFS driver does this.
- Also in legacy mode, the command order is not preserved if
  auto-hibernation is enabled in the UFS controller.

This patch improves performance as follows on a test setup with a
UFSHCI 3.0 controller:
- With the mq-deadline scheduler: 2.5x more IOPS for small writes.
- When not using an I/O scheduler, compared to using mq-deadline with
  zone locking: 4x more IOPS for small writes.

Cc: Bao D. Nguyen
Cc: Can Guo
Cc: Martin K. Petersen
Cc: Avri Altman
Signed-off-by: Bart Van Assche
---
 drivers/ufs/core/ufshcd.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 3094f3c89e82..08803ba21668 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -5255,6 +5255,13 @@ static int ufshcd_device_configure(struct scsi_device *sdev,
 	struct ufs_hba *hba = shost_priv(sdev->host);
 	struct request_queue *q = sdev->request_queue;
 
+	/*
+	 * With auto-hibernation disabled, the write order is preserved per
+	 * MCQ. Auto-hibernation may cause write reordering that results in
+	 * unaligned write errors. The SCSI core will retry the failed writes.
+	 */
+	lim->driver_preserves_write_order = true;
+
 	lim->dma_pad_mask = PRDT_DATA_BYTE_COUNT_PAD - 1;
 
 	/*