From patchwork Mon Jul 10 18:01:38 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 13307468
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7DE94EB64D9
	for ; Mon, 10 Jul 2023 18:02:28 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S233064AbjGJSC1 (ORCPT ); Mon, 10 Jul 2023 14:02:27 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34556
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S230449AbjGJSCY (ORCPT );
	Mon, 10 Jul 2023 14:02:24 -0400
Received: from mail-oa1-f52.google.com (mail-oa1-f52.google.com [209.85.160.52])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3ECCC11B
	for ; Mon, 10 Jul 2023 11:02:23 -0700 (PDT)
Received: by mail-oa1-f52.google.com with SMTP id 586e51a60fabf-1b060bce5b0so4124480fac.3
	for ; Mon, 10 Jul 2023 11:02:23 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20221208; t=1689012142; x=1691604142;
	h=content-transfer-encoding:mime-version:references:in-reply-to
	 :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
	 :subject:date:message-id:reply-to;
	bh=MriTkQt0C3boYQDOV/LvSCg2biMhVn5vndXXr9b78HU=;
	b=ZxBRWGoP6/MBwZUMMiApX7SefuFXfuSETyCRWgxNgfKJe/ObcvOpFuOgVB2Hu1g0p2
	 hCB1Lovg9Z8haeKfBzUNsBnzqL9Z5EKtda1gSpIo/vWglrRkU+PIs73hg1xUksOP8g6p
	 2YLpUdCwdFhExoMbKzmKewSfe1WaPS+Q/17eU9Ic9xtXURZCXBWXocpcV/xGhBg/vCtU
	 VJYAzM8G/X42DLduwclE9hT9pyqo+61j97EvYzUFzgtLAj2cP3tUQnT/bm7iTjzZZ5yg
	 4c/V3f31Yg4gBhyQxOK8nx9YfCisL9FT3EMtP4aPYdpCnMJyXIjz5XfWcr9luVmkcra1
	 HsHg==
X-Gm-Message-State: ABy/qLZeZ2iKJi8YKh06nT3/5pYtzfwIvmeazzYuaxCEegDwSt+S3G+p
	iqjg2Ntol8gM2Xt0GDxtNDY=
X-Google-Smtp-Source:
	APBJJlELzuC7updyF+1xMdghxfYzZJWWNfqiIGRGXvOjIVrrA+4VV1/JBHTwvCD4M91G76u0Dgzpaw==
X-Received: by 2002:a05:6870:5607:b0:1b3:df19:b1b9 with SMTP id
	m7-20020a056870560700b001b3df19b1b9mr13771350oao.15.1689012142431;
	Mon, 10 Jul 2023 11:02:22 -0700 (PDT)
Received: from bvanassche-linux.mtv.corp.google.com
	([2620:15c:211:201:e582:53b1:a691:ab70])
	by smtp.gmail.com with ESMTPSA id
	gt4-20020a17090af2c400b00263f446d432sm6531846pjb.43.2023.07.10.11.02.21
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Mon, 10 Jul 2023 11:02:22 -0700 (PDT)
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal
Subject: [PATCH v2 1/5] block: Introduce a request queue flag for pipelining
	zoned writes
Date: Mon, 10 Jul 2023 11:01:38 -0700
Message-ID: <20230710180210.1582299-2-bvanassche@acm.org>
X-Mailer: git-send-email 2.41.0.255.g8b1d071c50-goog
In-Reply-To: <20230710180210.1582299-1-bvanassche@acm.org>
References: <20230710180210.1582299-1-bvanassche@acm.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

Writes in sequential write required zones must happen at the write
pointer. Even if the submitter of the write commands (e.g. a filesystem)
submits writes for sequential write required zones in order, the block
layer or the storage controller may reorder these write commands.

The zone locking mechanism in the mq-deadline I/O scheduler serializes
write commands for sequential zones. Some but not all storage controllers
require this serialization. Introduce a new flag such that block drivers
can request pipelining of writes for sequential write required zones.

An example of a storage controller standard that requires write
serialization is AHCI (Advanced Host Controller Interface). Submitting
commands to AHCI controllers happens by writing a bit pattern into a
register. Each set bit corresponds to an active command.
This mechanism does not preserve command ordering information.

Cc: Damien Le Moal
Signed-off-by: Bart Van Assche
---
 include/linux/blkdev.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ed44a997f629..805012c5a6ab 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -534,6 +534,8 @@ struct request_queue {
 #define QUEUE_FLAG_NONROT	6	/* non-rotational device (SSD) */
 #define QUEUE_FLAG_VIRT		QUEUE_FLAG_NONROT /* paravirt device */
 #define QUEUE_FLAG_IO_STAT	7	/* do disk/partitions IO accounting */
+/* Writes for sequential write required zones may be pipelined. */
+#define QUEUE_FLAG_PIPELINE_ZONED_WRITES 8
 #define QUEUE_FLAG_NOXMERGES	9	/* No extended merges */
 #define QUEUE_FLAG_ADD_RANDOM	10	/* Contributes to random pool */
 #define QUEUE_FLAG_SYNCHRONOUS	11	/* always completes in submit context */
@@ -596,6 +598,11 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_skip_tagset_quiesce(q) \
 	test_bit(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, &(q)->queue_flags)

+static inline bool blk_queue_pipeline_zoned_writes(struct request_queue *q)
+{
+	return test_bit(QUEUE_FLAG_PIPELINE_ZONED_WRITES, &q->queue_flags);
+}
+
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);

From patchwork Mon Jul 10 18:01:39 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 13307469
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id C3598EB64DC
	for ; Mon, 10 Jul 2023 18:02:29 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232704AbjGJSC2 (ORCPT ); Mon, 10 Jul 2023 14:02:28 -0400
Received: from
	lindbergh.monkeyblade.net ([23.128.96.19]:34590
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S230449AbjGJSC2 (ORCPT );
	Mon, 10 Jul 2023 14:02:28 -0400
Received: from mail-pg1-f182.google.com (mail-pg1-f182.google.com [209.85.215.182])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58637AB
	for ; Mon, 10 Jul 2023 11:02:24 -0700 (PDT)
Received: by mail-pg1-f182.google.com with SMTP id 41be03b00d2f7-5577900c06bso3684959a12.2
	for ; Mon, 10 Jul 2023 11:02:24 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20221208; t=1689012144; x=1691604144;
	h=content-transfer-encoding:mime-version:references:in-reply-to
	 :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
	 :subject:date:message-id:reply-to;
	bh=Z0A9NyIpfQV6lRpjnJcy3o0fOAxrWavnDAITb4WfrYk=;
	b=PETGwB2xEZRLgsmZMvP2ZbSw/6ikRFsoaB1/c4TZXFe0hRk0gd2F6S6qX1HnEpvSjL
	 u7yPGrVolkVs3tQuJfGPHyxugOuxBdaMsx1MFcraYsXW7272zEZcuBjrjRvrJ04JUdLY
	 oomyEgRg5eFPG9MlrgsIjNuqkVAseoFVtFkqe/pUKW3R1XGjNScGQG9nBYz0qMPEeIXa
	 VP6pwbtxsrHi4UMbBulUkUBkp0Efo/TgOcbvdRaNoCk2QKUB+X5ejoC2RU99mqUwe3Fe
	 wdHW08asMHoH8b0IAhTl09/SUqKqx76wDiHYZKqoSz4P88VoToQWujhsw5zTgrizWU3I
	 /LUQ==
X-Gm-Message-State: ABy/qLZFzI83Jp/hWn3j2rUDO9s06QckB64TUa+DwvaExDL13AU0Be6X
	yhwHqTLgC/yWDwDG/9xdTMlXbOpkd38=
X-Google-Smtp-Source:
	APBJJlHdK0leGcGQ5+ND1cD3ytZMl7kVM+ryaIJ2o39cKdQeBseR+BmlWpX1K3LQKA6IhsrjgXz+lQ==
X-Received: by 2002:a17:90a:2e82:b0:263:e814:5d0f with SMTP id
	r2-20020a17090a2e8200b00263e8145d0fmr13611149pjd.41.1689012143648;
	Mon, 10 Jul 2023 11:02:23 -0700 (PDT)
Received: from bvanassche-linux.mtv.corp.google.com
	([2620:15c:211:201:e582:53b1:a691:ab70])
	by smtp.gmail.com with ESMTPSA id
	gt4-20020a17090af2c400b00263f446d432sm6531846pjb.43.2023.07.10.11.02.22
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Mon, 10 Jul 2023 11:02:23 -0700 (PDT)
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Bart Van Assche, Damien
	Le Moal
Subject: [PATCH v2 2/5] block/mq-deadline: Only use zone locking if necessary
Date: Mon, 10 Jul 2023 11:01:39 -0700
Message-ID: <20230710180210.1582299-3-bvanassche@acm.org>
X-Mailer: git-send-email 2.41.0.255.g8b1d071c50-goog
In-Reply-To: <20230710180210.1582299-1-bvanassche@acm.org>
References: <20230710180210.1582299-1-bvanassche@acm.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

Measurements have shown that limiting the queue depth to one for zoned
writes has a significant negative performance impact on zoned UFS devices.
Hence this patch disables zone locking in the mq-deadline scheduler for
storage controllers that support pipelining zoned writes. This patch is
based on the following assumptions:
- Applications submit write requests to sequential write required zones
  in order.
- It happens infrequently that zoned write requests are reordered by the
  block layer.
- The storage controller does not reorder write requests that have been
  submitted to the same hardware queue. This is the case for UFS: the
  UFSHCI specification requires that UFS controllers process requests in
  order per hardware queue.
- The I/O priority of all pipelined write requests is the same per zone.
- Either no I/O scheduler is used or an I/O scheduler is used that submits
  write requests per zone in LBA order.
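
Why these assumptions matter can be sketched with a small userspace model.
This is only an illustration and not kernel code: struct zone_model,
struct write_cmd, zone_write() and submit_all() are hypothetical names
made up for this sketch. It assumes a zone accepts a write only if the
write starts exactly at the write pointer, and it counts every other
write as rejected (the analogue of an unaligned write error).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one sequential write required zone. */
struct zone_model {
	uint64_t start;	/* first LBA of the zone */
	uint64_t len;	/* zone size in logical blocks */
	uint64_t wp;	/* write pointer: the only LBA a write may start at */
};

/* Hypothetical write command: start LBA and length in logical blocks. */
struct write_cmd {
	uint64_t lba;
	uint64_t nr_blocks;
};

/*
 * Apply one write. Returns true if the device model accepts it and false
 * if the write does not start at the write pointer or crosses the end of
 * the zone.
 */
static bool zone_write(struct zone_model *z, const struct write_cmd *w)
{
	assert(z && w);
	if (w->lba != z->wp || w->lba + w->nr_blocks > z->start + z->len)
		return false;
	z->wp += w->nr_blocks;
	return true;
}

/* Submit @n commands in array order and return how many were rejected. */
static size_t submit_all(struct zone_model *z, const struct write_cmd *cmds,
			 size_t n)
{
	size_t rejected = 0;

	for (size_t i = 0; i < n; i++)
		if (!zone_write(z, &cmds[i]))
			rejected++;
	return rejected;
}
```

With three 8-block writes at LBA 0, 8 and 16, submitting them in that
order leaves no rejected write in this model, while swapping the first
two rejects two of the three. That is the failure mode the assumptions
above are meant to keep infrequent.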

Cc: Damien Le Moal
Signed-off-by: Bart Van Assche
---
 block/blk-zoned.c   |  3 ++-
 block/mq-deadline.c | 14 +++++++++-----
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 0f9f97cdddd9..59560d1657e3 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -504,7 +504,8 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx,
 		break;
 	case BLK_ZONE_TYPE_SEQWRITE_REQ:
 	case BLK_ZONE_TYPE_SEQWRITE_PREF:
-		if (!args->seq_zones_wlock) {
+		if (!blk_queue_pipeline_zoned_writes(q) &&
+		    !args->seq_zones_wlock) {
 			args->seq_zones_wlock =
 				blk_alloc_zone_bitmap(q->node, args->nr_zones);
 			if (!args->seq_zones_wlock)
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 6aa5daf7ae32..0bed2bdeed89 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -353,7 +353,8 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 		return NULL;

 	rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q) ||
+	    blk_queue_pipeline_zoned_writes(rq->q))
 		return rq;

 	/*
@@ -398,7 +399,8 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	if (!rq)
 		return NULL;

-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q) ||
+	    blk_queue_pipeline_zoned_writes(rq->q))
 		return rq;

 	/*
@@ -526,8 +528,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	}

 	/*
-	 * For a zoned block device, if we only have writes queued and none of
-	 * them can be dispatched, rq will be NULL.
+	 * For a zoned block device that requires write serialization, if we
+	 * only have writes queued and none of them can be dispatched, rq will
+	 * be NULL.
 	 */
 	if (!rq)
 		return NULL;
@@ -933,7 +936,8 @@ static void dd_finish_request(struct request *rq)

 	atomic_inc(&per_prio->stats.completed);

-	if (blk_queue_is_zoned(q)) {
+	if (blk_queue_is_zoned(rq->q) &&
+	    !blk_queue_pipeline_zoned_writes(q)) {
 		unsigned long flags;

 		spin_lock_irqsave(&dd->zone_lock, flags);

From patchwork Mon Jul 10 18:01:40 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 13307470
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 4CA61EB64D9
	for ; Mon, 10 Jul 2023 18:02:51 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S231585AbjGJSCu (ORCPT ); Mon, 10 Jul 2023 14:02:50 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34968
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S232397AbjGJSCt (ORCPT );
	Mon, 10 Jul 2023 14:02:49 -0400
Received: from mail-pg1-f177.google.com (mail-pg1-f177.google.com [209.85.215.177])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 23A70AB
	for ; Mon, 10 Jul 2023 11:02:42 -0700 (PDT)
Received: by mail-pg1-f177.google.com with SMTP id 41be03b00d2f7-51b4ef5378bso3541364a12.1
	for ; Mon, 10 Jul 2023 11:02:42 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20221208; t=1689012161; x=1691604161;
	h=content-transfer-encoding:mime-version:references:in-reply-to
	 :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
	 :subject:date:message-id:reply-to;
	bh=w2J3/6TQmJEZYTGRB9SMk/+oE4pHEwlR8798avx4KTw=;
	b=MngRVTOEL6LphsH+sBRrEbBQtQXsPgGxUMJ1nDCLOTMHh9bapjri7kAres+B+3ESUx
	 47z0vBdsFaf//FJsjjYCbMcgDqQCSmDS/QmVsS/Ds7hbM9w6GdCNtDuCom4hkEAQR5Vx
	 yVyPWDmSw+FrkFdU+pX7EHvp0kcjzqKLj5qQcIgxLlWyKWYCDIfa/4dGx/qnE60nl18K
	 xUkIrHInpltY6bLCxq8JJ37DVdIEm9cWipc1Sw9ZqYRblfEapGYgOHr7lbbgSggfcntY
	 nPyze6KBYO9262OZuN9d1oVAP/YiJATw0ZMVwyhnETjL14Whwe4xRkGYvQu2VKf4BeES
	 Zyuw==
X-Gm-Message-State: ABy/qLY85LlwvphCu7X8BHdTGjNpTvzhjaPML8QGhPvECJupLxAqW1/q
	u4wuqv2jQ0cS2m0zWXJwwcM=
X-Google-Smtp-Source:
	APBJJlFs8Niu66WJ5ndYLBlGYA5i6v1Fpb6xkfE4vpqxbqKGC3aTN/nG2rlRdavoxPVHkg4256Hr2g==
X-Received: by 2002:a17:90a:5218:b0:262:f872:fa77 with SMTP id
	v24-20020a17090a521800b00262f872fa77mr10829024pjh.31.1689012161266;
	Mon, 10 Jul 2023 11:02:41 -0700 (PDT)
Received: from bvanassche-linux.mtv.corp.google.com
	([2620:15c:211:201:e582:53b1:a691:ab70])
	by smtp.gmail.com with ESMTPSA id
	gt4-20020a17090af2c400b00263f446d432sm6531846pjb.43.2023.07.10.11.02.40
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Mon, 10 Jul 2023 11:02:40 -0700 (PDT)
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Chaitanya Kulkarni, Damien Le Moal,
	Johannes Thumshirn, Vincent Fu, "Shin'ichiro Kawasaki",
	Akinobu Mita
Subject: [PATCH v2 3/5] block/null_blk: Add support for pipelining zoned
	writes
Date: Mon, 10 Jul 2023 11:01:40 -0700
Message-ID: <20230710180210.1582299-4-bvanassche@acm.org>
X-Mailer: git-send-email 2.41.0.255.g8b1d071c50-goog
In-Reply-To: <20230710180210.1582299-1-bvanassche@acm.org>
References: <20230710180210.1582299-1-bvanassche@acm.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

Add a new configfs attribute for enabling pipelining of zoned writes. The
test script below reports 250 K IOPS with no I/O scheduler, 6 K IOPS with
mq-deadline and pipelining disabled and 123 K IOPS with mq-deadline and
pipelining enabled. This shows that pipelining results in about 20 times
more IOPS for this particular test case.

    #!/bin/bash

    for mode in "none 0" "mq-deadline 0" "mq-deadline 1"; do
        set +e
        for d in /sys/kernel/config/nullb/*; do
            [ -d "$d" ] && rmdir "$d"
        done
        modprobe -r null_blk
        set -e
        read -r iosched pipelining <<<"$mode"
        modprobe null_blk nr_devices=0
        (
            cd /sys/kernel/config/nullb
            mkdir nullb0
            cd nullb0
            params=(
                completion_nsec=100000
                hw_queue_depth=64
                irqmode=2                # NULL_IRQ_TIMER
                max_sectors=$((4096/512))
                memory_backed=1
                pipeline_zoned_writes="${pipelining}"
                size=1
                submit_queues=1
                zone_size=1
                zoned=1
                power=1
            )
            for p in "${params[@]}"; do
                echo "${p//*=}" > "${p//=*}"
            done
        )
        udevadm settle
        dev=/dev/nullb0
        [ -b "${dev}" ]
        params=(
            --direct=1
            --filename="${dev}"
            --iodepth=64
            --iodepth_batch=16
            --ioengine=io_uring
            --ioscheduler="${iosched}"
            --gtod_reduce=1
            --hipri=0
            --name=nullb0
            --runtime=30
            --rw=write
            --time_based=1
            --zonemode=zbd
        )
        fio "${params[@]}"
    done

Cc: Damien Le Moal
Signed-off-by: Bart Van Assche
---
 drivers/block/null_blk/main.c     | 2 ++
 drivers/block/null_blk/null_blk.h | 1 +
 drivers/block/null_blk/zoned.c    | 3 +++
 3 files changed, 6 insertions(+)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 864013019d6b..1cd6eb4e2c16 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -424,6 +424,7 @@ NULLB_DEVICE_ATTR(zone_capacity, ulong, NULL);
 NULLB_DEVICE_ATTR(zone_nr_conv, uint, NULL);
 NULLB_DEVICE_ATTR(zone_max_open, uint, NULL);
 NULLB_DEVICE_ATTR(zone_max_active, uint, NULL);
+NULLB_DEVICE_ATTR(pipeline_zoned_writes, bool, NULL);
 NULLB_DEVICE_ATTR(virt_boundary, bool, NULL);
 NULLB_DEVICE_ATTR(no_sched, bool, NULL);
 NULLB_DEVICE_ATTR(shared_tag_bitmap, bool, NULL);
@@ -569,6 +570,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
 	&nullb_device_attr_zone_max_active,
 	&nullb_device_attr_zone_readonly,
 	&nullb_device_attr_zone_offline,
+	&nullb_device_attr_pipeline_zoned_writes,
 	&nullb_device_attr_virt_boundary,
 	&nullb_device_attr_no_sched,
 	&nullb_device_attr_shared_tag_bitmap,
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index 929f659dd255..248acf288a8e 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -117,6 +117,7 @@ struct nullb_device {
 	bool memory_backed; /* if data is stored in memory */
 	bool discard; /* if support discard */
 	bool zoned; /* if device is zoned */
+	bool pipeline_zoned_writes;
 	bool virt_boundary; /* virtual boundary on/off for the device */
 	bool no_sched; /* no IO scheduler for the device */
 	bool shared_tag_bitmap; /* use hostwide shared tags */
diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
index 635ce0648133..2bfd5f7cee67 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -96,6 +96,9 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)

 	spin_lock_init(&dev->zone_res_lock);

+	if (dev->pipeline_zoned_writes)
+		blk_queue_flag_set(QUEUE_FLAG_PIPELINE_ZONED_WRITES, q);
+
 	if (dev->zone_nr_conv >= dev->nr_zones) {
 		dev->zone_nr_conv = dev->nr_zones - 1;
 		pr_info("changed the number of conventional zones to %u",

From patchwork Mon Jul 10 18:01:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 13307471
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 11544EB64DA
	for ; Mon, 10 Jul 2023 18:02:52 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S229451AbjGJSCv (ORCPT ); Mon, 10 Jul 2023 14:02:51 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34980
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S230449AbjGJSCu (ORCPT );
	Mon, 10 Jul 2023 14:02:50 -0400
Received: from mail-pg1-f174.google.com
	(mail-pg1-f174.google.com [209.85.215.174])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4D079187
	for ; Mon, 10 Jul 2023 11:02:43 -0700 (PDT)
Received: by mail-pg1-f174.google.com with SMTP id 41be03b00d2f7-5577905ef38so1649612a12.0
	for ; Mon, 10 Jul 2023 11:02:43 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20221208; t=1689012163; x=1691604163;
	h=content-transfer-encoding:mime-version:references:in-reply-to
	 :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
	 :subject:date:message-id:reply-to;
	bh=30RAgPQwNyyrt3ts6ShvyJIOlHzYMZrPsYCI8GiZL+o=;
	b=FAsHz9IU/l5phT6p41s6PJO4pNX9iwiDqjZatCXFIqo4xlKNlVJIvkFOI33EAB7p3M
	 zbVW3EjJKWYl1FNEhn1pZYNyZgFT45XXdTSqblC16uswLlPvd09gNE0fjTDqXEpP00Fi
	 yaGkjuGaOKmxqv+n3LEtgZJ/+IgNAxOfWPeuSPVZ/0pjWSKVoND27WwwmIU6pKLbLx7/
	 fBJOVF1OSLOmvEgGQXXR0/2y6CGMYJgS0YMZSvhHauvJ+1ThYwqrWbLkfvxDRkdaQ6x7
	 n5/ldFFqyt2GRZFBoirYnkJd13pQH4Gl3iChFweT/xsxgvW1IXLQRuls1OxxEmzrXvSM
	 KCmg==
X-Gm-Message-State: ABy/qLZPacMWEnARMwQobM4W/1HYT6L3pXH7MMSfXo9xAfWqtxN9mvnn
	23CbbM88WlFOrPvhb7zMc60=
X-Google-Smtp-Source:
	APBJJlE0rKWBZClYfL0ZVb3jHnm1knsh50ptu+Lv/MY3RPgFzdQPZ8kHMIuPGEI/jR4EPZarlca1Xw==
X-Received: by 2002:a17:90b:1998:b0:263:a37:fcc3 with SMTP id
	mv24-20020a17090b199800b002630a37fcc3mr10295283pjb.5.1689012162632;
	Mon, 10 Jul 2023 11:02:42 -0700 (PDT)
Received: from bvanassche-linux.mtv.corp.google.com
	([2620:15c:211:201:e582:53b1:a691:ab70])
	by smtp.gmail.com with ESMTPSA id
	gt4-20020a17090af2c400b00263f446d432sm6531846pjb.43.2023.07.10.11.02.41
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Mon, 10 Jul 2023 11:02:42 -0700 (PDT)
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Bart Van Assche,
	"Martin K . Petersen", Damien Le Moal, "James E.J.
	Bottomley"
Subject: [PATCH v2 4/5] scsi: Retry unaligned zoned writes
Date: Mon, 10 Jul 2023 11:01:41 -0700
Message-ID: <20230710180210.1582299-5-bvanassche@acm.org>
X-Mailer: git-send-email 2.41.0.255.g8b1d071c50-goog
In-Reply-To: <20230710180210.1582299-1-bvanassche@acm.org>
References: <20230710180210.1582299-1-bvanassche@acm.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

From ZBC-2: "The device server terminates with CHECK CONDITION status,
with the sense key set to ILLEGAL REQUEST, and the additional sense code
set to UNALIGNED WRITE COMMAND a write command, other than an entire
medium write same command, that specifies:
a) the starting LBA in a sequential write required zone set to a value
   that is not equal to the write pointer for that sequential write
   required zone; or
b) an ending LBA that is not equal to the last logical block within a
   physical block (see SBC-5)."

I am not aware of any other conditions that may trigger the UNALIGNED
WRITE COMMAND response.

Send commands that failed with an unaligned write error to the SCSI error
handler. Let the SCSI error handler sort SCSI commands per LBA before
resubmitting these.

Increase the number of retries for write commands sent to a sequential
zone to the maximum number of outstanding commands.

Cc: Martin K. Petersen
Cc: Damien Le Moal
Signed-off-by: Bart Van Assche
---
 drivers/scsi/scsi_error.c | 37 +++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c   |  1 +
 drivers/scsi/sd.c         |  3 +++
 include/scsi/scsi.h       |  1 +
 4 files changed, 42 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 3ec8bfd4090f..a6d562f3a085 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -681,6 +682,17 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		fallthrough;

 	case ILLEGAL_REQUEST:
+		/*
+		 * Unaligned write command. This indicates that zoned writes
+		 * have been received by the device in the wrong order. If zoned
+		 * write pipelining is enabled, retry after all pending commands
+		 * have completed.
+		 */
+		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
+		    blk_queue_pipeline_zoned_writes(sdev->request_queue) &&
+		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd))
+			return NEEDS_DELAYED_RETRY;
+
 		if (sshdr.asc == 0x20 || /* Invalid command operation code */
 		    sshdr.asc == 0x21 || /* Logical block address out of range */
 		    sshdr.asc == 0x22 || /* Invalid function */
@@ -2177,6 +2189,25 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
 }
 EXPORT_SYMBOL(scsi_eh_flush_done_q);

+/*
+ * Returns a negative value if @_a has a lower LBA than @_b, zero if
+ * both have the same LBA and a positive value otherwise.
+ */
+static int scsi_cmp_lba(void *priv, const struct list_head *_a,
+			const struct list_head *_b)
+{
+	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
+	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
+	const sector_t pos_a = blk_rq_pos(scsi_cmd_to_rq(a));
+	const sector_t pos_b = blk_rq_pos(scsi_cmd_to_rq(b));
+
+	if (pos_a < pos_b)
+		return -1;
+	if (pos_a > pos_b)
+		return 1;
+	return 0;
+}
+
 /**
  * scsi_unjam_host - Attempt to fix a host which has a cmd that failed.
  * @shost: Host to unjam.
@@ -2212,6 +2243,12 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
 	SCSI_LOG_ERROR_RECOVERY(1,
 		scsi_eh_prt_fail_stats(shost, &eh_work_q));

+	/*
+	 * Sort pending SCSI commands in LBA order. This is important if write
+	 * pipelining is enabled for a zoned SCSI device.
+	 */
+	list_sort(NULL, &eh_work_q, scsi_cmp_lba);
+
 	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
 		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0226c9279cef..a6cfdc4bfbf1 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1445,6 +1445,7 @@ static void scsi_complete(struct request *rq)
 	case ADD_TO_MLQUEUE:
 		scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 		break;
+	case NEEDS_DELAYED_RETRY:
 	default:
 		scsi_eh_scmd_add(cmd);
 		break;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ab216976dbdc..6cf7495c6b56 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1206,6 +1206,9 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 	cmd->transfersize = sdp->sector_size;
 	cmd->underflow = nr_blocks << 9;
 	cmd->allowed = sdkp->max_retries;
+	if (blk_queue_pipeline_zoned_writes(rq->q) &&
+	    blk_rq_is_seq_zoned_write(rq))
+		cmd->allowed += rq->q->nr_requests;
 	cmd->sdb.length = nr_blocks * sdp->sector_size;

 	SCSI_LOG_HLQUEUE(1,
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index ec093594ba53..6600db046227 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -93,6 +93,7 @@ static inline int scsi_status_is_check_condition(int status)
  * Internal return values.
  */
 enum scsi_disposition {
+	NEEDS_DELAYED_RETRY	= 0x2000,
 	NEEDS_RETRY		= 0x2001,
 	SUCCESS			= 0x2002,
 	FAILED			= 0x2003,

From patchwork Mon Jul 10 18:01:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 13307472
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id EB1F1EB64DC
	for ; Mon, 10 Jul 2023 18:02:53 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S230227AbjGJSCw (ORCPT ); Mon, 10 Jul 2023 14:02:52 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35010
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S230449AbjGJSCw (ORCPT );
	Mon, 10 Jul 2023 14:02:52 -0400
Received: from mail-pj1-f53.google.com (mail-pj1-f53.google.com [209.85.216.53])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D9557128
	for ; Mon, 10 Jul 2023 11:02:50 -0700 (PDT)
Received: by mail-pj1-f53.google.com with SMTP id 98e67ed59e1d1-263036d4bc3so3495438a91.2
	for ; Mon, 10 Jul 2023 11:02:50 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20221208; t=1689012170; x=1691604170;
	h=content-transfer-encoding:mime-version:references:in-reply-to
	 :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
	 :subject:date:message-id:reply-to;
	bh=2EY7/MdXMbHZJmt1qUeo9JN1ygy4O7c3IfbaOSwHr8k=;
	b=XnWcS9KUONIGx9K8wxPvn11tsnyRYrW6u7kQ61AWlLKEcq03sU3WvpzIVrAFE8jP16
	 tfE+37w6LVqOLNiG6Per9CdaxDKlO2O1XZlRdUsorkLh95bIBh4aZDt1wt6nYAvDdPxS
	 BIqV344uRV1WqHae+batLDq/9mD1D+IcYV/GDnlHlGTpMjEFIudL5yBEpgl9hiv/si4H
	 weh0SZ475egXeQDDWrvUJFgfbr6T69GOsTXEzACGbfsLojW7opq1MdLPtN/N+YGKh58+
	 4GNCGWqwX/zLRgL+ql8SqHdCo+QHQ9toS9y7FiJsW1H8tv3yytZ2rEMWc6F2VzUFn5KQ
	 Bxug==
X-Gm-Message-State:
	ABy/qLZ2vUhq0/83hSGP/Rgd+BuBusm3DBasfr1OP9uqPVg7qNn2cI4S
	ePplr29l3zEG7Zbkk9T3++s=
X-Google-Smtp-Source:
	APBJJlGrQQgCP319ix8sUw1znhWbJPNJRoutDjk3xbjqoKuGIdOQy/DSzvAH0IP6G71y/xwEDj4rRQ==
X-Received: by 2002:a17:90b:283:b0:263:45c3:b17c with SMTP id
	az3-20020a17090b028300b0026345c3b17cmr14045721pjb.14.1689012170178;
	Mon, 10 Jul 2023 11:02:50 -0700 (PDT)
Received: from bvanassche-linux.mtv.corp.google.com
	([2620:15c:211:201:e582:53b1:a691:ab70])
	by smtp.gmail.com with ESMTPSA id
	gt4-20020a17090af2c400b00263f446d432sm6531846pjb.43.2023.07.10.11.02.49
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Mon, 10 Jul 2023 11:02:49 -0700 (PDT)
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Bart Van Assche,
	"Martin K . Petersen", Damien Le Moal, Avri Altman,
	"James E.J. Bottomley", Stanley Chu, Manivannan Sadhasivam,
	Asutosh Das, Bean Huo, Jinyoung Choi, Ziqi Chen, Arthur Simchaev,
	Adrien Thierry
Subject: [PATCH v2 5/5] scsi: ufs: Enable zoned write pipelining
Date: Mon, 10 Jul 2023 11:01:42 -0700
Message-ID: <20230710180210.1582299-6-bvanassche@acm.org>
X-Mailer: git-send-email 2.41.0.255.g8b1d071c50-goog
In-Reply-To: <20230710180210.1582299-1-bvanassche@acm.org>
References: <20230710180210.1582299-1-bvanassche@acm.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
"The host controller always process transfer requests in-order according
to the order submitted to the list. In case of multiple commands with
single doorbell register ringing (batch mode), The dispatch order for
these transfer requests by host controller will base on their index in
the List. A transfer request with lower index value will be executed
before a transfer request with higher index value."

From the UFSHCI 4.0 specification, about the MCQ mode:
"Command Submission
1. Host SW writes an Entry to SQ
2. Host SW updates SQ doorbell tail pointer

Command Processing
3. After fetching the Entry, Host Controller updates SQ doorbell head
   pointer
4. Host controller sends COMMAND UPIU to UFS device"

In other words, for both legacy and MCQ mode, UFS controllers are required
to forward commands to the UFS device in the order these commands have
been received from the host.

Note: for legacy mode this is only correct if the host submits one
command at a time. The UFS driver does this.

This patch improves small write IOPS by about 150% on my test setup.

Cc: Martin K. Petersen
Cc: Damien Le Moal
Cc: Avri Altman
Signed-off-by: Bart Van Assche
---
 drivers/ufs/core/ufshcd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index e7e79f515e14..8d0e495ae6fa 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -5146,6 +5146,7 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)

 	ufshcd_hpb_configure(hba, sdev);

+	blk_queue_flag_set(QUEUE_FLAG_PIPELINE_ZONED_WRITES, q);
 	blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
 	if (hba->quirks & UFSHCD_QUIRK_4KB_DMA_ALIGNMENT)
 		blk_queue_update_dma_alignment(q, 4096 - 1);
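
The four MCQ steps quoted in the commit message above can be pictured as
a ring buffer with a host-owned tail and a controller-owned head. The
following is a rough userspace sketch under that reading, not UFS driver
code: struct sq_model, SQ_DEPTH, sq_submit() and sq_fetch() are
hypothetical names, and each queue entry is reduced to the LBA of a
write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SQ_DEPTH 8u	/* hypothetical queue depth for this sketch */

/* Hypothetical model of one submission queue (SQ). */
struct sq_model {
	uint64_t entries[SQ_DEPTH];	/* each entry: LBA of a write */
	uint32_t head;	/* advanced by the controller after a fetch */
	uint32_t tail;	/* advanced by host software after a submit */
};

/* Steps 1 and 2: write an entry, then advance the tail pointer. */
static bool sq_submit(struct sq_model *sq, uint64_t lba)
{
	assert(sq);
	if (sq->tail - sq->head == SQ_DEPTH)
		return false;	/* queue full */
	sq->entries[sq->tail % SQ_DEPTH] = lba;
	sq->tail++;
	return true;
}

/*
 * Steps 3 and 4: fetch the oldest entry, advance the head pointer and
 * hand the entry to the caller, which stands in for the UFS device.
 */
static bool sq_fetch(struct sq_model *sq, uint64_t *lba)
{
	assert(sq && lba);
	if (sq->head == sq->tail)
		return false;	/* queue empty */
	*lba = sq->entries[sq->head % SQ_DEPTH];
	sq->head++;
	return true;
}
```

Because entries can only leave at the head, writes submitted for LBA 0,
8 and 16 are fetched in exactly that order in this model, which is the
per-queue ordering property the commit message relies on.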