From patchwork Tue Mar 10 09:46:43 2020
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen", Johannes Thumshirn
Petersen" , Johannes Thumshirn Subject: [PATCH 01/11] block: provide fallbacks for blk_queue_zone_is_seq and blk_queue_zone_no Date: Tue, 10 Mar 2020 18:46:43 +0900 Message-Id: <20200310094653.33257-2-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200310094653.33257-1-johannes.thumshirn@wdc.com> References: <20200310094653.33257-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org blk_queue_zone_is_seq() and blk_queue_zone_no() have not been called with CONFIG_BLK_DEV_ZONED disabled until now. The introduction of REQ_OP_ZONE_APPEND will change this, so we need to provide noop fallbacks for the !CONFIG_BLK_DEV_ZONED case. Signed-off-by: Johannes Thumshirn --- include/linux/blkdev.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f629d40c645c..25b63f714619 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -729,6 +729,16 @@ static inline unsigned int blk_queue_nr_zones(struct request_queue *q) { return 0; } +static inline bool blk_queue_zone_is_seq(struct request_queue *q, + sector_t sector) +{ + return false; +} +static inline unsigned int blk_queue_zone_no(struct request_queue *q, + sector_t sector) +{ + return 0; +} #endif /* CONFIG_BLK_DEV_ZONED */ static inline bool rq_is_sync(struct request *rq) From patchwork Tue Mar 10 09:46:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11428957 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 392EA924 for ; Tue, 10 Mar 2020 09:47:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 105A624682 for ; Tue, 10 Mar 2020 09:47:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="JdPtfQR9" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726273AbgCJJrP (ORCPT ); Tue, 10 Mar 2020 05:47:15 -0400 Received: from esa2.hgst.iphmx.com ([68.232.143.124]:26501 "EHLO esa2.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726202AbgCJJrP (ORCPT ); Tue, 10 Mar 2020 05:47:15 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1583833637; x=1615369637; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yfpOi5iezkjuZIH6uqNUd0CVBezoyulmwxer0C/fB/Q=; b=JdPtfQR9Y7cN6FhdsJ7hiXi6tMh5lEb4v7ZGGCuBu+ybOkoA0Y5/0sLJ vlytwLgxOISRGHz+0Lu5RinDxrYkDAUml5a9qq6u9yuu0SYjJY5TzmDhk gB9Rusk/IRYP4/7ZwM4mYvkKAwgk8ZZAc3NjmAP+wu3l0JnTc25gGmACu K9DqkSDHv/He015uVMyJ0Xw1AQZ6EujKlhKKZctjxsso8AdJRHt4GSXx4 zE8+o1tiPyplRxu78hY+Kf1eIlCpgyOL4ZoQIq5dEhvZaj5ixR6lw9dKJ ovuQfaibc30uUwJO2ekn0K3+ePLXy5cLT1TYHA+6JSFYr4qysHVnTsRTE A==; IronPort-SDR: OWdGXEeCKjnGpv3BU1om/QQN4EF7GBn8FaADFOXtXB7MNd14rUqsA5foP/aQ5gZnDAkn/pvItQ yJhgMghyvv+FfaU0FEZCAok9Q0vI+s2TKsv+zfIDK/jAsTWq0Taix0LVeAEOQy7qRBe5LGmTWd 0H7hmka2+NHYf0FlwfOizSBCj4S4kjXfeK1o22T5oU4YDc0/Taq3XJMUHe+Z7d2E5BmrxYem8S W1RuQEWGhCLFOmHor5GkVKRjbhTEMzG+k24Itf2cfEu2TJO9pSA8N3qVg8fZ3mHN/tWR2TyfUr wdA= X-IronPort-AV: E=Sophos;i="5.70,536,1574092800"; d="scan'208";a="234082777" 
From patchwork Tue Mar 10 09:46:44 2020
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen"
Subject: [PATCH 02/11] block: Introduce REQ_OP_ZONE_APPEND
Date: Tue, 10 Mar 2020 18:46:44 +0900
Message-Id: <20200310094653.33257-3-johannes.thumshirn@wdc.com>
In-Reply-To: <20200310094653.33257-1-johannes.thumshirn@wdc.com>
References: <20200310094653.33257-1-johannes.thumshirn@wdc.com>

From: Keith Busch

Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned
block device. This is a no-merge write operation.

A zone append write BIO must:
* Target a zoned block device
* Have a sector position indicating the start sector of the target zone
* Target a sequential write zone
* Not cross a zone boundary
* Not be split, to ensure that a single range of LBAs is written with a
  single command

Implement these checks in generic_make_request_checks() using the helper
function blk_check_zone_append(). To avoid zone append BIO splitting,
introduce the new max_zone_append_sectors queue limit attribute and
ensure that a BIO size never exceeds this limit. Export this new limit
through sysfs.

Finally, make sure that the BIO sector position indicates the actual
write position as indicated by the device on completion.
Signed-off-by: Keith Busch
---
 block/bio.c               |  4 ++++
 block/blk-core.c          | 49 +++++++++++++++++++++++++++++++++++++++
 block/blk-settings.c      | 16 +++++++++++++
 block/blk-sysfs.c         | 13 +++++++++++
 include/linux/bio.h       |  1 +
 include/linux/blk_types.h |  2 ++
 include/linux/blkdev.h    | 11 +++++++++
 7 files changed, 96 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 94d697217887..5bff80fc2ad9 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1895,6 +1895,10 @@ struct bio *bio_split(struct bio *bio, int sectors,
         BUG_ON(sectors <= 0);
         BUG_ON(sectors >= bio_sectors(bio));
 
+        /* Zone append commands cannot be split */
+        if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
+                return NULL;
+
         split = bio_clone_fast(bio, gfp, bs);
         if (!split)
                 return NULL;
diff --git a/block/blk-core.c b/block/blk-core.c
index 60dc9552ef8d..544c5a130ac5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -135,6 +135,7 @@ static const char *const blk_op_name[] = {
         REQ_OP_NAME(ZONE_OPEN),
         REQ_OP_NAME(ZONE_CLOSE),
         REQ_OP_NAME(ZONE_FINISH),
+        REQ_OP_NAME(ZONE_APPEND),
         REQ_OP_NAME(WRITE_SAME),
         REQ_OP_NAME(WRITE_ZEROES),
         REQ_OP_NAME(SCSI_IN),
@@ -239,6 +240,16 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
                 bio_set_flag(bio, BIO_QUIET);
 
         bio_advance(bio, nbytes);
+        if (req_op(rq) == REQ_OP_ZONE_APPEND && error == BLK_STS_OK) {
+                /*
+                 * Partial completions cannot be supported as the BIO
+                 * fragments may end up not being written sequentially.
+                 */
+                if (bio->bi_iter.bi_size)
+                        bio->bi_status = BLK_STS_IOERR;
+                else
+                        bio->bi_iter.bi_sector = rq->__sector;
+        }
 
         /* don't actually finish bio if it's part of flush sequence */
         if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
@@ -865,6 +876,39 @@ static inline int blk_partition_remap(struct bio *bio)
         return ret;
 }
 
+/*
+ * Check write append to a zoned block device.
+ */
+static inline blk_status_t blk_check_zone_append(struct request_queue *q,
+                                                 struct bio *bio)
+{
+        sector_t pos = bio->bi_iter.bi_sector;
+        int nr_sectors = bio_sectors(bio);
+
+        /* Only applicable to zoned block devices */
+        if (!blk_queue_is_zoned(q))
+                return BLK_STS_NOTSUPP;
+
+        /* The bio sector must point to the start of a sequential zone */
+        if (pos & (blk_queue_zone_sectors(q) - 1) ||
+            !blk_queue_zone_is_seq(q, pos))
+                return BLK_STS_IOERR;
+
+        /*
+         * Not allowed to cross zone boundaries. Otherwise, the BIO will be
+         * split and could result in non-contiguous sectors being written in
+         * different zones.
+         */
+        if (blk_queue_zone_no(q, pos) != blk_queue_zone_no(q, pos + nr_sectors))
+                return BLK_STS_IOERR;
+
+        /* Make sure the BIO is small enough and will not get split */
+        if (nr_sectors > q->limits.max_zone_append_sectors)
+                return BLK_STS_IOERR;
+
+        return BLK_STS_OK;
+}
+
 static noinline_for_stack bool
 generic_make_request_checks(struct bio *bio)
 {
@@ -937,6 +981,11 @@ generic_make_request_checks(struct bio *bio)
                 if (!q->limits.max_write_same_sectors)
                         goto not_supported;
                 break;
+        case REQ_OP_ZONE_APPEND:
+                status = blk_check_zone_append(q, bio);
+                if (status != BLK_STS_OK)
+                        goto end_io;
+                break;
         case REQ_OP_ZONE_RESET:
         case REQ_OP_ZONE_OPEN:
         case REQ_OP_ZONE_CLOSE:
diff --git a/block/blk-settings.c b/block/blk-settings.c
index c8eda2e7b91e..1345dbe5a245 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
         lim->chunk_sectors = 0;
         lim->max_write_same_sectors = 0;
         lim->max_write_zeroes_sectors = 0;
+        lim->max_zone_append_sectors = 0;
         lim->max_discard_sectors = 0;
         lim->max_hw_discard_sectors = 0;
         lim->discard_granularity = 0;
@@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
         lim->max_dev_sectors = UINT_MAX;
         lim->max_write_same_sectors = UINT_MAX;
         lim->max_write_zeroes_sectors = UINT_MAX;
+        lim->max_zone_append_sectors = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -257,6 +259,18 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 
+/**
+ * blk_queue_max_zone_append_sectors - set max sectors for a single zone append
+ * @q:  the request queue for the device
+ * @max_zone_append_sectors: maximum number of sectors to write per command
+ **/
+void blk_queue_max_zone_append_sectors(struct request_queue *q,
+                unsigned int max_zone_append_sectors)
+{
+        q->limits.max_zone_append_sectors = max_zone_append_sectors;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_zone_append_sectors);
+
 /**
  * blk_queue_max_segments - set max hw segments for a request for this queue
  * @q:  the request queue for the device
@@ -506,6 +520,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
                                         b->max_write_same_sectors);
         t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
                                         b->max_write_zeroes_sectors);
+        t->max_zone_append_sectors = min(t->max_zone_append_sectors,
+                                        b->max_zone_append_sectors);
         t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
 
         t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index fca9b158f4a0..02643e149d5e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -218,6 +218,13 @@ static ssize_t queue_write_zeroes_max_show(struct request_queue *q, char *page)
                 (unsigned long long)q->limits.max_write_zeroes_sectors << 9);
 }
 
+static ssize_t queue_zone_append_max_show(struct request_queue *q, char *page)
+{
+        unsigned long long max_sectors = q->limits.max_zone_append_sectors;
+
+        return sprintf(page, "%llu\n", max_sectors << SECTOR_SHIFT);
+}
+
 static ssize_t
 queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 {
@@ -639,6 +646,11 @@ static struct queue_sysfs_entry queue_write_zeroes_max_entry = {
         .show = queue_write_zeroes_max_show,
 };
 
+static struct queue_sysfs_entry queue_zone_append_max_entry = {
+        .attr = {.name = "zone_append_max_bytes", .mode = 0444 },
+        .show = queue_zone_append_max_show,
+};
+
 static struct queue_sysfs_entry queue_nonrot_entry = {
         .attr = {.name = "rotational", .mode = 0644 },
         .show = queue_show_nonrot,
@@ -749,6 +761,7 @@ static struct attribute *queue_attrs[] = {
         &queue_discard_zeroes_data_entry.attr,
         &queue_write_same_max_entry.attr,
         &queue_write_zeroes_max_entry.attr,
+        &queue_zone_append_max_entry.attr,
         &queue_nonrot_entry.attr,
         &queue_zoned_entry.attr,
         &queue_nr_zones_entry.attr,
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 853d92ceee64..ef640fd76c23 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -74,6 +74,7 @@ static inline bool bio_no_advance_iter(struct bio *bio)
 {
         return bio_op(bio) == REQ_OP_DISCARD ||
                bio_op(bio) == REQ_OP_SECURE_ERASE ||
+               bio_op(bio) == REQ_OP_ZONE_APPEND ||
                bio_op(bio) == REQ_OP_WRITE_SAME ||
                bio_op(bio) == REQ_OP_WRITE_ZEROES;
 }
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 70254ae11769..ae809f07aa27 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -296,6 +296,8 @@ enum req_opf {
         REQ_OP_ZONE_CLOSE       = 11,
         /* Transition a zone to full */
         REQ_OP_ZONE_FINISH      = 12,
+        /* write data at the current zone write pointer */
+        REQ_OP_ZONE_APPEND      = 13,
 
         /* SCSI passthrough using struct scsi_request */
         REQ_OP_SCSI_IN          = 32,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 25b63f714619..36111b10d514 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -336,6 +336,7 @@ struct queue_limits {
         unsigned int            max_hw_discard_sectors;
         unsigned int            max_write_same_sectors;
         unsigned int            max_write_zeroes_sectors;
+        unsigned int            max_zone_append_sectors;
         unsigned int            discard_granularity;
         unsigned int            discard_alignment;
 
@@ -757,6 +758,9 @@ static inline bool rq_mergeable(struct request *rq)
         if (req_op(rq) == REQ_OP_WRITE_ZEROES)
                 return false;
 
+        if (req_op(rq) == REQ_OP_ZONE_APPEND)
+                return false;
+
         if (rq->cmd_flags & REQ_NOMERGE_FLAGS)
                 return false;
         if (rq->rq_flags & RQF_NOMERGE_FLAGS)
@@ -1088,6 +1092,8 @@ extern void blk_queue_max_write_same_sectors(struct request_queue *q,
 extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
                 unsigned int max_write_same_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
+extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
+                unsigned int max_zone_append_sectors);
 extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_alignment_offset(struct request_queue *q,
                                        unsigned int alignment);
@@ -1301,6 +1307,11 @@ static inline unsigned int queue_max_segment_size(const struct request_queue *q)
         return q->limits.max_segment_size;
 }
 
+static inline unsigned int queue_max_zone_append_sectors(const struct request_queue *q)
+{
+        return q->limits.max_zone_append_sectors;
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
         int retval = 512;
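
The checks above imply a calling convention for zone append submitters:
set the BIO sector to the start of the target zone, keep the size within
max_zone_append_sectors, and read the actual write position back from
bi_sector on completion. The following sketch is hypothetical and not
part of the series; the example_* names are placeholders.

/* Completion handler: bi_sector now holds the actual write position. */
static void example_append_end_io(struct bio *bio)
{
        sector_t written = bio->bi_iter.bi_sector;

        pr_debug("zone append wrote at sector %llu\n",
                 (unsigned long long)written);
        bio_put(bio);
}

static void example_submit_zone_append(struct block_device *bdev,
                                       struct page *page, unsigned int len,
                                       sector_t zone_start_sector)
{
        struct bio *bio = bio_alloc(GFP_KERNEL, 1);

        bio_set_dev(bio, bdev);
        /* The sector indicates the target zone, not the write position. */
        bio->bi_iter.bi_sector = zone_start_sector;
        bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC;
        bio->bi_end_io = example_append_end_io;
        __bio_add_page(bio, page, len, 0);
        submit_bio(bio);
}
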
From patchwork Tue Mar 10 09:46:45 2020
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen", Johannes Thumshirn
Subject: [PATCH 03/11] block: introduce bio_add_append_page
Date: Tue, 10 Mar 2020 18:46:45 +0900
Message-Id: <20200310094653.33257-4-johannes.thumshirn@wdc.com>
In-Reply-To: <20200310094653.33257-1-johannes.thumshirn@wdc.com>
References: <20200310094653.33257-1-johannes.thumshirn@wdc.com>

For REQ_OP_ZONE_APPEND we cannot add an unlimited number of pages to a
bio, as the bio cannot be split later on. This is similar to what we
have to do for passthrough pages, just with a different limit.

Introduce bio_add_append_page(), which can be used by file systems to
add pages to a REQ_OP_ZONE_APPEND bio.
Signed-off-by: Johannes Thumshirn
---
 block/bio.c         | 37 ++++++++++++++++++++++++++++++-------
 block/blk-map.c     |  2 +-
 include/linux/bio.h |  2 +-
 3 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 5bff80fc2ad9..3bd648671a28 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -732,7 +732,7 @@ static bool bio_try_merge_pc_page(struct request_queue *q, struct bio *bio,
  */
 static int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
                 struct page *page, unsigned int len, unsigned int offset,
-                bool *same_page)
+                bool *same_page, unsigned int max_sectors)
 {
         struct bio_vec *bvec;
 
@@ -742,7 +742,7 @@ static int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
         if (unlikely(bio_flagged(bio, BIO_CLONED)))
                 return 0;
 
-        if (((bio->bi_iter.bi_size + len) >> 9) > queue_max_hw_sectors(q))
+        if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors)
                 return 0;
 
         if (bio->bi_vcnt > 0) {
@@ -777,10 +777,20 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio,
                 struct page *page, unsigned int len, unsigned int offset)
 {
         bool same_page = false;
-        return __bio_add_pc_page(q, bio, page, len, offset, &same_page);
+        return __bio_add_pc_page(q, bio, page, len, offset, &same_page,
+                                 queue_max_hw_sectors(q));
 }
 EXPORT_SYMBOL(bio_add_pc_page);
 
+int bio_add_append_page(struct request_queue *q, struct bio *bio,
+                struct page *page, unsigned int len, unsigned int offset)
+{
+        bool same_page = false;
+        return __bio_add_pc_page(q, bio, page, len, offset, &same_page,
+                                 queue_max_zone_append_sectors(q));
+}
+EXPORT_SYMBOL(bio_add_append_page);
+
 /**
  * __bio_try_merge_page - try appending data to an existing bvec.
  * @bio: destination bio
@@ -945,8 +955,15 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 
                 len = min_t(size_t, PAGE_SIZE - offset, left);
 
-                if (__bio_try_merge_page(bio, page, len, offset, &same_page)) {
-                        if (same_page)
+                if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+                        size = bio_add_append_page(bio->bi_disk->queue, bio,
+                                                   page, len, offset);
+
+                        if (size != len)
+                                return -E2BIG;
+                } else if (__bio_try_merge_page(bio, page, len, offset,
+                                                &same_page)) {
+                        if (same_page)
                                 put_page(page);
                 } else {
                         if (WARN_ON_ONCE(bio_full(bio, len)))
@@ -1389,11 +1406,12 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
  */
 struct bio *bio_map_user_iov(struct request_queue *q,
                              struct iov_iter *iter,
-                             gfp_t gfp_mask)
+                             gfp_t gfp_mask, unsigned int op)
 {
         int j;
         struct bio *bio;
         int ret;
+        unsigned int max_sectors;
 
         if (!iov_iter_count(iter))
                 return ERR_PTR(-EINVAL);
@@ -1402,6 +1420,11 @@ struct bio *bio_map_user_iov(struct request_queue *q,
         if (!bio)
                 return ERR_PTR(-ENOMEM);
 
+        if (op == REQ_OP_ZONE_APPEND)
+                max_sectors = queue_max_zone_append_sectors(q);
+        else
+                max_sectors = queue_max_hw_sectors(q);
+
         while (iov_iter_count(iter)) {
                 struct page **pages;
                 ssize_t bytes;
@@ -1429,7 +1452,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 
                                 n = bytes;
                                 if (!__bio_add_pc_page(q, bio, page, n, offs,
-                                                &same_page)) {
+                                                &same_page, max_sectors)) {
                                         if (same_page)
                                                 put_page(page);
                                         break;
diff --git a/block/blk-map.c b/block/blk-map.c
index b0790268ed9d..a83ba39251a9 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -72,7 +72,7 @@ static int __blk_rq_map_user_iov(struct request *rq,
         if (copy)
                 bio = bio_copy_user_iov(q, map_data, iter, gfp_mask);
         else
-                bio = bio_map_user_iov(q, iter, gfp_mask);
+                bio = bio_map_user_iov(q, iter, gfp_mask, req_op(rq));
 
         if (IS_ERR(bio))
                 return PTR_ERR(bio);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ef640fd76c23..ef69e52cc8d9 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -444,7 +444,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
 void bio_release_pages(struct bio *bio, bool mark_dirty);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
-                                    struct iov_iter *, gfp_t);
+                                    struct iov_iter *, gfp_t, unsigned int);
 extern void bio_unmap_user(struct bio *);
 extern struct bio *bio_map_kern(struct request_queue *, void *, unsigned int,
                                 gfp_t);
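
Since bio_add_append_page() returns the number of bytes actually added
(zero once the zone append limit would be exceeded), a file system can
fill a bio until the helper refuses and then submit. A hypothetical
sketch, not part of the patch; example_* names are placeholders:

static unsigned int example_fill_append_bio(struct request_queue *q,
                                            struct bio *bio,
                                            struct page **pages,
                                            unsigned int nr_pages)
{
        unsigned int i;

        for (i = 0; i < nr_pages; i++) {
                /* Stops when adding would exceed max_zone_append_sectors. */
                if (bio_add_append_page(q, bio, pages[i], PAGE_SIZE, 0) !=
                    PAGE_SIZE)
                        break;
        }

        /* Caller submits this bio and retries remaining pages in a new one. */
        return i;
}
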
From patchwork Tue Mar 10 09:46:46 2020
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen", Damien Le Moal
Subject: [PATCH 04/11] null_blk: Support REQ_OP_ZONE_APPEND
Date: Tue, 10 Mar 2020 18:46:46 +0900
Message-Id: <20200310094653.33257-5-johannes.thumshirn@wdc.com>
In-Reply-To: <20200310094653.33257-1-johannes.thumshirn@wdc.com>
References: <20200310094653.33257-1-johannes.thumshirn@wdc.com>

From: Damien Le Moal

Support REQ_OP_ZONE_APPEND requests for zoned mode null_blk devices.
Use the internally tracked zone write pointer position as the write
position.

Signed-off-by: Damien Le Moal
---
 drivers/block/null_blk_main.c  |  9 ++++++---
 drivers/block/null_blk_zoned.c | 21 ++++++++++++++++++---
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 133060431dbd..62869431f2cf 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1575,15 +1575,18 @@ static int null_gendisk_register(struct nullb *nullb)
 
 #ifdef CONFIG_BLK_DEV_ZONED
         if (nullb->dev->zoned) {
-                if (queue_is_mq(nullb->q)) {
+                struct request_queue *q = nullb->q;
+
+                if (queue_is_mq(q)) {
                         int ret = blk_revalidate_disk_zones(disk);
 
                         if (ret)
                                 return ret;
                 } else {
-                        blk_queue_chunk_sectors(nullb->q,
+                        blk_queue_chunk_sectors(q,
                                         nullb->dev->zone_size_sects);
-                        nullb->q->nr_zones = blkdev_nr_zones(disk);
+                        q->nr_zones = blkdev_nr_zones(disk);
                 }
+                blk_queue_max_zone_append_sectors(q, q->limits.max_hw_sectors);
         }
 #endif
 
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index ed34785dd64b..ed9c4cde68f3 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -116,7 +116,7 @@ size_t null_zone_valid_read_len(struct nullb *nullb,
 }
 
 static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
-                     unsigned int nr_sectors)
+                     unsigned int nr_sectors, bool append)
 {
         struct nullb_device *dev = cmd->nq->dev;
         unsigned int zno = null_zone_no(dev, sector);
@@ -131,7 +131,20 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
         case BLK_ZONE_COND_IMP_OPEN:
         case BLK_ZONE_COND_EXP_OPEN:
         case BLK_ZONE_COND_CLOSED:
-                /* Writes must be at the write pointer position */
+                /*
+                 * Regular writes must be at the write pointer position.
+                 * Zone append writes are automatically issued at the write
+                 * pointer and the position returned using the request or BIO
+                 * sector.
+                 */
+                if (append) {
+                        sector = zone->wp;
+                        if (cmd->bio)
+                                cmd->bio->bi_iter.bi_sector = sector;
+                        else
+                                cmd->rq->__sector = sector;
+                }
+
                 if (sector != zone->wp)
                         return BLK_STS_IOERR;
 
@@ -211,7 +224,9 @@ blk_status_t null_handle_zoned(struct nullb_cmd *cmd, enum req_opf op,
 {
         switch (op) {
         case REQ_OP_WRITE:
-                return null_zone_write(cmd, sector, nr_sectors);
+                return null_zone_write(cmd, sector, nr_sectors, false);
+        case REQ_OP_ZONE_APPEND:
+                return null_zone_write(cmd, sector, nr_sectors, true);
         case REQ_OP_ZONE_RESET:
         case REQ_OP_ZONE_RESET_ALL:
         case REQ_OP_ZONE_OPEN:
From patchwork Tue Mar 10 09:46:47 2020
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen", Johannes Thumshirn
Subject: [PATCH 05/11] block: introduce BLK_STS_ZONE_RESOURCE
Date: Tue, 10 Mar 2020 18:46:47 +0900
Message-Id: <20200310094653.33257-6-johannes.thumshirn@wdc.com>
In-Reply-To: <20200310094653.33257-1-johannes.thumshirn@wdc.com>
References: <20200310094653.33257-1-johannes.thumshirn@wdc.com>

BLK_STS_ZONE_RESOURCE is returned from the driver to the block layer if
zone related resources are unavailable, but the driver can guarantee
the queue will be rerun in the future once the resources become
available again.

Signed-off-by: Johannes Thumshirn
---
 include/linux/blk_types.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index ae809f07aa27..824ec2d89954 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -63,6 +63,18 @@ typedef u8 __bitwise blk_status_t;
  */
 #define BLK_STS_DEV_RESOURCE        ((__force blk_status_t)13)
 
+/*
+ * BLK_STS_ZONE_RESOURCE is returned from the driver to the block layer if zone
+ * related resources are unavailable, but the driver can guarantee the queue
+ * will be rerun in the future once the resources become available again.
+ *
+ * This is different from BLK_STS_DEV_RESOURCE in that it explicitly references
+ * a zone specific resource and IO to a different zone on the same device could
+ * still be served. An example is a zone that is write-locked: a read to the
+ * same zone could still be served.
+ */
+#define BLK_STS_ZONE_RESOURCE        ((__force blk_status_t)14)
+
 /**
  * blk_path_error - returns true if error may be path related
  * @error: status the request was completed with
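
A driver is expected to use the new status from its ->queue_rq() handler
when a zone-scoped resource is busy. A hedged sketch, not part of the
patch; the example_* names stand in for driver-private code:

static bool example_zone_is_busy(struct request *rq)
{
        /* Stand-in for driver-private zone state tracking. */
        return false;
}

static blk_status_t example_queue_rq(struct blk_mq_hw_ctx *hctx,
                                     const struct blk_mq_queue_data *bd)
{
        struct request *rq = bd->rq;

        /*
         * Other zones of the same device can still be served, so report
         * a zone-specific shortage rather than BLK_STS_DEV_RESOURCE.
         */
        if (example_zone_is_busy(rq))
                return BLK_STS_ZONE_RESOURCE;

        /* ... normal dispatch path continues here ... */
        return BLK_STS_OK;
}
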
kernel . org" , "Martin K . Petersen" , Johannes Thumshirn Subject: [PATCH 06/11] block: introduce blk_req_zone_write_trylock Date: Tue, 10 Mar 2020 18:46:48 +0900 Message-Id: <20200310094653.33257-7-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200310094653.33257-1-johannes.thumshirn@wdc.com> References: <20200310094653.33257-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Signed-off-by: Johannes Thumshirn --- block/blk-zoned.c | 14 ++++++++++++++ include/linux/blkdev.h | 1 + 2 files changed, 15 insertions(+) diff --git a/block/blk-zoned.c b/block/blk-zoned.c index 05741c6f618b..00b025b8b7c0 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -50,6 +50,20 @@ bool blk_req_needs_zone_write_lock(struct request *rq) } EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock); +bool blk_req_zone_write_trylock(struct request *rq) +{ + unsigned int zno = blk_rq_zone_no(rq); + + if (test_and_set_bit(zno, rq->q->seq_zones_wlock)) + return false; + + WARN_ON_ONCE(rq->rq_flags & RQF_ZONE_WRITE_LOCKED); + rq->rq_flags |= RQF_ZONE_WRITE_LOCKED; + + return true; +} +EXPORT_SYMBOL_GPL(blk_req_zone_write_trylock); + void __blk_req_zone_write_lock(struct request *rq) { if (WARN_ON_ONCE(test_and_set_bit(blk_rq_zone_no(rq), diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 36111b10d514..e591b22ace03 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1746,6 +1746,7 @@ extern int bdev_write_page(struct block_device *, sector_t, struct page *, #ifdef CONFIG_BLK_DEV_ZONED bool blk_req_needs_zone_write_lock(struct request *rq); +bool blk_req_zone_write_trylock(struct request *rq); void __blk_req_zone_write_lock(struct request *rq); void __blk_req_zone_write_unlock(struct request *rq); From patchwork Tue Mar 10 09:46:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11428977 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 15C0F924 for ; Tue, 10 Mar 2020 09:47:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id EA97624681 for ; Tue, 10 Mar 2020 09:47:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="iHm3JEd/" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726390AbgCJJrY (ORCPT ); Tue, 10 Mar 2020 05:47:24 -0400 Received: from esa2.hgst.iphmx.com ([68.232.143.124]:26501 "EHLO esa2.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726202AbgCJJrW (ORCPT ); Tue, 10 Mar 2020 05:47:22 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1583833649; x=1615369649; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hKeEpOS2tHfv2S2nqYpCIKfQHW+iPi0t+T2+Lvls+fI=; b=iHm3JEd/xOSWEO+pDtI8EROhij/jl286NcIahjiHNkCDQWrvIBTanZea CwIkhBvHE7663A7UF/C0hl6ZJhfDxYUv0L9CMzdksIkPJ55dnSYrz9USR VVLAjHCaYUMCKsNILYLIFef69RHgSDhv0NOIWxsVMY8GiGKZGWuhkHnG0 7rMdgG9G8DdtqkaGQH6VtGLDAhL6wmgzJmDIV3t4nn1Efki0n7U6MlE33 KCIT8Mr2F75Em+YEagrVpGxfokxU11NwaNfpRoagjeaXkqlbIyHducuvf 
From patchwork Tue Mar 10 09:46:49 2020
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen", Johannes Thumshirn
Subject: [PATCH 07/11] block: factor out requeue handling from dispatch code
Date: Tue, 10 Mar 2020 18:46:49 +0900
Message-Id: <20200310094653.33257-8-johannes.thumshirn@wdc.com>
In-Reply-To: <20200310094653.33257-1-johannes.thumshirn@wdc.com>
References: <20200310094653.33257-1-johannes.thumshirn@wdc.com>

Factor out the requeue handling from the dispatch code. This will make
the subsequent addition of different requeueing schemes easier.

Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
---
 block/blk-mq.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d92088dec6c3..f7ab75ef4d0e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1178,6 +1178,23 @@ static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
 
 #define BLK_MQ_RESOURCE_DELAY        3                /* ms units */
 
+static void blk_mq_handle_dev_resource(struct request *rq,
+                                       struct list_head *list)
+{
+        struct request *next =
+                list_first_entry_or_null(list, struct request, queuelist);
+
+        /*
+         * If an I/O scheduler has been configured and we got a driver tag for
+         * the next request already, free it.
+         */
+        if (next)
+                blk_mq_put_driver_tag(next);
+
+        list_add(&rq->queuelist, list);
+        __blk_mq_requeue_request(rq);
+}
+
 /*
  * Returns true if we did some work AND can potentially do more.
  */
@@ -1245,17 +1262,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 
                 ret = q->mq_ops->queue_rq(hctx, &bd);
                 if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
-                        /*
-                         * If an I/O scheduler has been configured and we got a
-                         * driver tag for the next request already, free it
-                         * again.
-                         */
-                        if (!list_empty(list)) {
-                                nxt = list_first_entry(list, struct request, queuelist);
-                                blk_mq_put_driver_tag(nxt);
-                        }
-                        list_add(&rq->queuelist, list);
-                        __blk_mq_requeue_request(rq);
+                        blk_mq_handle_dev_resource(rq, list);
                         break;
                 }
From patchwork Tue Mar 10 09:46:50 2020
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen", Johannes Thumshirn
Subject: [PATCH 08/11] block: delay un-dispatchable request
Date: Tue, 10 Mar 2020 18:46:50 +0900
Message-Id: <20200310094653.33257-9-johannes.thumshirn@wdc.com>
In-Reply-To: <20200310094653.33257-1-johannes.thumshirn@wdc.com>
References: <20200310094653.33257-1-johannes.thumshirn@wdc.com>

When an LLDD can't dispatch a request to a specific zone, it will
return BLK_STS_ZONE_RESOURCE indicating this request needs to be
delayed, e.g. because the zone it will be dispatched to is still
write-locked. If this happens, set the request aside in a local list
and continue trying to dispatch requests such as READ requests or
WRITE/ZONE_APPEND requests targeting other zones. This way we can still
keep a high queue depth without starving other requests, even if one
request can't be served due to zone write-locking.

All requests put aside in the local list due to BLK_STS_ZONE_RESOURCE
are placed back at the head of the dispatch list for retrying the next
time the device queues are run again.

Signed-off-by: Johannes Thumshirn
---
 block/blk-mq.c          | 27 +++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c |  1 +
 2 files changed, 28 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f7ab75ef4d0e..89eb062825a7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1195,6 +1195,19 @@ static void blk_mq_handle_dev_resource(struct request *rq,
         __blk_mq_requeue_request(rq);
 }
 
+static void blk_mq_handle_zone_resource(struct request *rq,
+                                        struct list_head *zone_list)
+{
+        /*
+         * If we end up here it is because we cannot dispatch a request to a
+         * specific zone due to LLD level zone-write locking or other zone
+         * related resource not being available. In this case, set the request
+         * aside in zone_list for retrying it later.
+         */
+        list_add(&rq->queuelist, zone_list);
+        __blk_mq_requeue_request(rq);
+}
+
 /*
  * Returns true if we did some work AND can potentially do more.
  */
@@ -1206,6 +1219,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
         bool no_tag = false;
         int errors, queued;
         blk_status_t ret = BLK_STS_OK;
+        LIST_HEAD(zone_list);
 
         if (list_empty(list))
                 return false;
@@ -1264,6 +1278,16 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
                 if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
                         blk_mq_handle_dev_resource(rq, list);
                         break;
+                } else if (ret == BLK_STS_ZONE_RESOURCE) {
+                        /*
+                         * Move the request to zone_list and keep going through
+                         * the dispatch list to find more requests the drive
+                         * accepts.
+                         */
+                        blk_mq_handle_zone_resource(rq, &zone_list);
+                        if (list_empty(list))
+                                break;
+                        continue;
                 }
 
                 if (unlikely(ret != BLK_STS_OK)) {
@@ -1275,6 +1299,9 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
                 queued++;
         } while (!list_empty(list));
 
+        if (!list_empty(&zone_list))
+                list_splice_tail_init(&zone_list, list);
+
         hctx->dispatched[queued_to_index(queued)]++;
 
         /*
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 610ee41fa54c..ea327f320b7f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1706,6 +1706,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
         case BLK_STS_OK:
                 break;
         case BLK_STS_RESOURCE:
+        case BLK_STS_ZONE_RESOURCE:
                 if (atomic_read(&sdev->device_busy) ||
                     scsi_device_blocked(sdev))
                         ret = BLK_STS_DEV_RESOURCE;
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch, linux-scsi@vger.kernel.org, "Martin K. Petersen", Damien Le Moal
Subject: [PATCH 09/11] block: Introduce zone write pointer offset caching
Date: Tue, 10 Mar 2020 18:46:51 +0900
Message-Id: <20200310094653.33257-10-johannes.thumshirn@wdc.com>
In-Reply-To: <20200310094653.33257-1-johannes.thumshirn@wdc.com>
References: <20200310094653.33257-1-johannes.thumshirn@wdc.com>

From: Damien Le Moal

Not all zoned block devices natively support the zone append command; SCSI and ATA disks, for example, do not define it. However, it is fairly straightforward to emulate this command at the LLD level using regular write commands if the zone write pointer position is known. Introducing such emulation enables the use of zone append writes for all device types, simplifying, for instance, the implementation of zoned block device support in file systems by avoiding the need for different write paths depending on the device capabilities.

To allow devices without zone append command support to emulate its behavior, introduce a zone write pointer cache attached to the device request_queue, similarly to the zone bitmaps. To save memory, this cache stores write pointer offsets relative to each zone start sector as a 32-bit value rather than the 64-bit absolute sector position of each zone write pointer. The allocation and initialization of this cache can be requested by a device driver using the QUEUE_FLAG_ZONE_WP_OFST queue flag. The cache is allocated and initialized in the same manner as the zone bitmaps, within the report zones callback function used by blk_revalidate_disk_zones(). In case of changes to the device zone configuration, the cache is updated under a queue freeze to avoid any race between the device driver's use of the cache and the request queue update. Freeing of this new cache is done together with the zone bitmaps from the function blk_queue_free_zone_bitmaps(), renamed here to blk_queue_free_zone_resources().

Maintaining the write pointer offset values is the responsibility of the device LLD. The helper function blk_get_zone_wp_offset() is provided to simplify this task.
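For illustration only (this sketch is not part of the patch), a low-level driver opting in to the cache might look roughly as follows. my_driver_init_zones() and my_emulated_append_sector() are made-up names; the queue flag and the seq_zones_wp_ofst array are the ones introduced below:

#include <linux/blkdev.h>

/* Hypothetical driver sketch, not part of this patch. */
static int my_driver_init_zones(struct gendisk *disk)
{
	/* Ask the block layer to allocate and fill q->seq_zones_wp_ofst */
	blk_queue_flag_set(QUEUE_FLAG_ZONE_WP_OFST, disk->queue);

	/* The report zones callback then allocates and fills the cache */
	return blk_revalidate_disk_zones(disk);
}

static sector_t my_emulated_append_sector(struct request *rq)
{
	unsigned int zno = blk_rq_zone_no(rq);

	/* Zone start sector plus the cached offset (both in 512B sectors) */
	return blk_rq_pos(rq) + rq->q->seq_zones_wp_ofst[zno];
}

The 32-bit offsets keep the cache at 4 bytes per zone: a 20 TB disk with 256 MB zones has roughly 75,000 zones, so the cache costs about 300 KB instead of 600 KB for absolute 64-bit write pointer positions.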
Signed-off-by: Damien Le Moal

---
 block/blk-sysfs.c      |  2 +-
 block/blk-zoned.c      | 69 ++++++++++++++++++++++++++++++++++++++++--
 block/blk.h            |  4 +--
 include/linux/blkdev.h | 20 ++++++++----
 4 files changed, 84 insertions(+), 11 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 02643e149d5e..bd0c9b4c1c5b 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -901,7 +901,7 @@ static void __blk_release_queue(struct work_struct *work) blk_exit_queue(q); - blk_queue_free_zone_bitmaps(q); + blk_queue_free_zone_resources(q); if (queue_is_mq(q)) blk_mq_release(q); diff --git a/block/blk-zoned.c b/block/blk-zoned.c index 00b025b8b7c0..0f83b8baf607 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -344,18 +344,65 @@ static inline unsigned long *blk_alloc_zone_bitmap(int node, GFP_NOIO, node); } -void blk_queue_free_zone_bitmaps(struct request_queue *q) +static inline unsigned int *blk_alloc_zone_wp_ofst(unsigned int nr_zones) +{ + return kvcalloc(nr_zones, sizeof(unsigned int), GFP_NOIO); +} + +void blk_queue_free_zone_resources(struct request_queue *q) { kfree(q->conv_zones_bitmap); q->conv_zones_bitmap = NULL; kfree(q->seq_zones_wlock); q->seq_zones_wlock = NULL; + kvfree(q->seq_zones_wp_ofst); + q->seq_zones_wp_ofst = NULL; } +/** + * blk_get_zone_wp_offset - Calculate a zone write pointer offset position + * @zone: Target zone + * @wp_ofst: Calculated write pointer offset + * + * Helper function for low-level device drivers to obtain a zone write pointer + * position relative to the zone start sector (write pointer offset). The write + * pointer offset depends on the zone condition. If the zone has an invalid + * condition, -ENODEV is returned. + */ +int blk_get_zone_wp_offset(struct blk_zone *zone, unsigned int *wp_ofst) +{ + switch (zone->cond) { + case BLK_ZONE_COND_EMPTY: + *wp_ofst = 0; + return 0; + case BLK_ZONE_COND_IMP_OPEN: + case BLK_ZONE_COND_EXP_OPEN: + case BLK_ZONE_COND_CLOSED: + *wp_ofst = zone->wp - zone->start; + return 0; + case BLK_ZONE_COND_FULL: + *wp_ofst = zone->len; + return 0; + case BLK_ZONE_COND_NOT_WP: + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + /* + * Conventional, offline and read-only zones do not have a valid + * write pointer. Use 0 as a dummy value.
+ */ + *wp_ofst = 0; + return 0; + default: + return -ENODEV; + } +} +EXPORT_SYMBOL_GPL(blk_get_zone_wp_offset); + struct blk_revalidate_zone_args { struct gendisk *disk; unsigned long *conv_zones_bitmap; unsigned long *seq_zones_wlock; + unsigned int *seq_zones_wp_ofst; unsigned int nr_zones; sector_t zone_sectors; sector_t sector; @@ -371,6 +418,7 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx, struct gendisk *disk = args->disk; struct request_queue *q = disk->queue; sector_t capacity = get_capacity(disk); + int ret; /* * All zones must have the same size, with the exception on an eventual @@ -406,6 +454,13 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx, return -ENODEV; } + if (blk_queue_zone_wp_ofst(q) && !args->seq_zones_wp_ofst) { + args->seq_zones_wp_ofst = + blk_alloc_zone_wp_ofst(args->nr_zones); + if (!args->seq_zones_wp_ofst) + return -ENOMEM; + } + /* Check zone type */ switch (zone->type) { case BLK_ZONE_TYPE_CONVENTIONAL: @@ -432,6 +487,14 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx, return -ENODEV; } + if (args->seq_zones_wp_ofst) { + /* Initialize the zone write pointer offset */ + ret = blk_get_zone_wp_offset(zone, + &args->seq_zones_wp_ofst[idx]); + if (ret) + return ret; + } + args->sector += zone->len; return 0; } @@ -480,15 +543,17 @@ int blk_revalidate_disk_zones(struct gendisk *disk) q->nr_zones = args.nr_zones; swap(q->seq_zones_wlock, args.seq_zones_wlock); swap(q->conv_zones_bitmap, args.conv_zones_bitmap); + swap(q->seq_zones_wp_ofst, args.seq_zones_wp_ofst); ret = 0; } else { pr_warn("%s: failed to revalidate zones\n", disk->disk_name); - blk_queue_free_zone_bitmaps(q); + blk_queue_free_zone_resources(q); } blk_mq_unfreeze_queue(q); kfree(args.seq_zones_wlock); kfree(args.conv_zones_bitmap); + kvfree(args.seq_zones_wp_ofst); return ret; } EXPORT_SYMBOL_GPL(blk_revalidate_disk_zones); diff --git a/block/blk.h b/block/blk.h index 0b8884353f6b..cc28391bb0b3 100644 --- a/block/blk.h +++ b/block/blk.h @@ -349,9 +349,9 @@ static inline int blk_iolatency_init(struct request_queue *q) { return 0; } struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp); #ifdef CONFIG_BLK_DEV_ZONED -void blk_queue_free_zone_bitmaps(struct request_queue *q); +void blk_queue_free_zone_resources(struct request_queue *q); #else -static inline void blk_queue_free_zone_bitmaps(struct request_queue *q) {} +static inline void blk_queue_free_zone_resources(struct request_queue *q) {} #endif #endif /* BLK_INTERNAL_H */ diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index e591b22ace03..950d3476918c 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -363,6 +363,7 @@ extern int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op, sector_t sectors, sector_t nr_sectors, gfp_t gfp_mask); extern int blk_revalidate_disk_zones(struct gendisk *disk); +int blk_get_zone_wp_offset(struct blk_zone *zone, unsigned int *wp_ofst); extern int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg); @@ -499,14 +500,17 @@ struct request_queue { /* * Zoned block device information for request dispatch control. * nr_zones is the total number of zones of the device. This is always - * 0 for regular block devices. conv_zones_bitmap is a bitmap of nr_zones - * bits which indicates if a zone is conventional (bit set) or + * 0 for regular block devices. 
conv_zones_bitmap is a bitmap of + nr_zones bits which indicates if a zone is conventional (bit set) or * sequential (bit clear). seq_zones_wlock is a bitmap of nr_zones * bits which indicates if a zone is write locked, that is, if a write - request targeting the zone was dispatched. All three fields are - initialized by the low level device driver (e.g. scsi/sd.c). - Stacking drivers (device mappers) may or may not initialize - these fields. + request targeting the zone was dispatched. seq_zones_wp_ofst is an + array of nr_zones write pointer values relative to the zone start + sector. This is only initialized for LLDs needing zone append write + command emulation with regular write. All fields are initialized by + the blk_revalidate_disk_zones() function when called by the low + level device driver (e.g. scsi/sd.c). Stacking drivers (device + mappers) may or may not initialize these fields. * * Reads of this information must be protected with blk_queue_enter() / * blk_queue_exit(). Modifying this information is only allowed while @@ -516,6 +520,7 @@ struct request_queue { unsigned int nr_zones; unsigned long *conv_zones_bitmap; unsigned long *seq_zones_wlock; + unsigned int *seq_zones_wp_ofst; #endif /* CONFIG_BLK_DEV_ZONED */ /* @@ -613,6 +618,7 @@ struct request_queue { #define QUEUE_FLAG_PCI_P2PDMA 25 /* device supports PCI p2p requests */ #define QUEUE_FLAG_ZONE_RESETALL 26 /* supports Zone Reset All */ #define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */ +#define QUEUE_FLAG_ZONE_WP_OFST 28 /* queue needs zone wp offsets */ #define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \ (1 << QUEUE_FLAG_SAME_COMP)) @@ -647,6 +653,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); #else #define blk_queue_rq_alloc_time(q) false #endif +#define blk_queue_zone_wp_ofst(q) \ + test_bit(QUEUE_FLAG_ZONE_WP_OFST, &(q)->queue_flags) #define blk_noretry_request(rq) \ ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \

From patchwork Tue Mar 10 09:46:52 2020
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch, linux-scsi@vger.kernel.org, "Martin K. Petersen", Johannes Thumshirn
Subject: [PATCH 10/11] scsi: sd_zbc: factor out sanity checks for zoned commands
Date: Tue, 10 Mar 2020 18:46:52 +0900
Message-Id: <20200310094653.33257-11-johannes.thumshirn@wdc.com>
In-Reply-To: <20200310094653.33257-1-johannes.thumshirn@wdc.com>
References: <20200310094653.33257-1-johannes.thumshirn@wdc.com>

Factor out the sanity checks for zoned commands from sd_zbc_setup_zone_mgmt_cmnd(). This will help with the introduction of an emulated ZONE_APPEND command.

Signed-off-by: Johannes Thumshirn

---
 drivers/scsi/sd_zbc.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c index e4282bce5834..5b925f52d1ce 100644 --- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -204,6 +204,26 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector, return ret; } +static blk_status_t sd_zbc_cmnd_checks(struct scsi_cmnd *cmd) +{ + struct request *rq = cmd->request; + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); + sector_t sector = blk_rq_pos(rq); + + if (!sd_is_zoned(sdkp)) + /* Not a zoned device */ + return BLK_STS_IOERR; + + if (sdkp->device->changed) + return BLK_STS_IOERR; + + if (sector & (sd_zbc_zone_sectors(sdkp) - 1)) + /* Unaligned request */ + return BLK_STS_IOERR; + + return BLK_STS_OK; +} + /** * sd_zbc_setup_zone_mgmt_cmnd - Prepare a zone ZBC_OUT command.
The operations * can be RESET WRITE POINTER, OPEN, CLOSE or FINISH. @@ -218,20 +238,14 @@ blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd, unsigned char op, bool all) { struct request *rq = cmd->request; - struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); sector_t sector = blk_rq_pos(rq); + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); sector_t block = sectors_to_logical(sdkp->device, sector); + blk_status_t ret; - if (!sd_is_zoned(sdkp)) - /* Not a zoned device */ - return BLK_STS_IOERR; - - if (sdkp->device->changed) - return BLK_STS_IOERR; - - if (sector & (sd_zbc_zone_sectors(sdkp) - 1)) - /* Unaligned request */ - return BLK_STS_IOERR; + ret = sd_zbc_cmnd_checks(cmd); + if (ret != BLK_STS_OK) + return ret; cmd->cmd_len = 16; memset(cmd->cmnd, 0, cmd->cmd_len);

From patchwork Tue Mar 10 09:46:53 2020
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch, linux-scsi@vger.kernel.org, "Martin K. Petersen", Johannes Thumshirn
Subject: [PATCH 11/11] scsi: sd_zbc: emulate ZONE_APPEND commands
Date: Tue, 10 Mar 2020 18:46:53 +0900
Message-Id: <20200310094653.33257-12-johannes.thumshirn@wdc.com>
In-Reply-To: <20200310094653.33257-1-johannes.thumshirn@wdc.com>
References: <20200310094653.33257-1-johannes.thumshirn@wdc.com>

Emulate ZONE_APPEND for SCSI disks using a regular WRITE(16) with a start LBA set to the target zone write pointer position.

In order to always know the write pointer position of a sequential write zone, the queue flag QUEUE_FLAG_ZONE_WP_OFST is set to get an initialized write pointer offset array attached to the device request queue. The values of the cache are maintained in sync with the device as follows:

1) the write pointer offset of a zone is reset to 0 when a REQ_OP_ZONE_RESET command completes.
2) the write pointer offset of a zone is set to the zone size when a REQ_OP_ZONE_FINISH command completes.
3) the write pointer offset of a zone is incremented by the number of 512B sectors written when a write or a zone append command completes.
4) the write pointer offset of all zones is reset to 0 when a REQ_OP_ZONE_RESET_ALL command completes.

Since the block layer does not write lock zones for zone append commands, to ensure a sequential ordering of the write commands used for the emulation, the target zone of a zone append command is locked when the function sd_zbc_prepare_zone_append() is called from sd_setup_read_write_cmnd(). If the zone write lock cannot be obtained (e.g. a zone append is in flight or a regular write has already locked the zone), the zone append command dispatching is delayed by returning BLK_STS_ZONE_RESOURCE.

Since zone reset and finish operations can be issued concurrently with writes and zone append requests, ensure a coherent update of the zone write pointer offsets by also write locking the target zones for these zone management requests.

Finally, to avoid the need for write locking all zones for REQ_OP_ZONE_RESET_ALL requests, use a spinlock to protect accesses and modifications of the zone write pointer offsets. This spinlock is initialized from sd_probe() using the new function sd_zbc_init_disk().
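As an aside for reviewers (not part of the patch), rules 1) to 4) above condense to a small accounting helper. The names here are illustrative only, with wp_ofst[] standing in for the cached per-zone offsets in 512B sectors, updated with the spinlock described above held:

#include <linux/blk_types.h>
#include <linux/string.h>

/* Illustrative sketch only: completion-time write pointer accounting. */
static void wp_ofst_account(unsigned int *wp_ofst, unsigned int nr_zones,
			    unsigned int zno, unsigned int zone_sectors,
			    enum req_opf op, unsigned int done_sectors)
{
	switch (op) {
	case REQ_OP_WRITE:
	case REQ_OP_ZONE_APPEND:
		/* rule 3: advance by the number of 512B sectors written */
		if (wp_ofst[zno] < zone_sectors)
			wp_ofst[zno] += done_sectors;
		break;
	case REQ_OP_ZONE_RESET:
		/* rule 1: an emptied zone starts again at offset 0 */
		wp_ofst[zno] = 0;
		break;
	case REQ_OP_ZONE_FINISH:
		/* rule 2: a finished zone has its write pointer at the end */
		wp_ofst[zno] = zone_sectors;
		break;
	case REQ_OP_ZONE_RESET_ALL:
		/* rule 4: every zone starts again at offset 0 */
		memset(wp_ofst, 0, nr_zones * sizeof(*wp_ofst));
		break;
	default:
		break;
	}
}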
Signed-off-by: Johannes Thumshirn --- drivers/scsi/sd.c | 28 +++- drivers/scsi/sd.h | 35 ++++- drivers/scsi/sd_zbc.c | 308 +++++++++++++++++++++++++++++++++++++++++- 3 files changed, 358 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 8ca9299ffd36..02b8f2c6bc52 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1215,6 +1215,12 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd) else protect = 0; + if (req_op(rq) == REQ_OP_ZONE_APPEND) { + ret = sd_zbc_prepare_zone_append(cmd, &lba, nr_blocks); + if (ret) + return ret; + } + if (protect && sdkp->protection_type == T10_PI_TYPE2_PROTECTION) { ret = sd_setup_rw32_cmnd(cmd, write, lba, nr_blocks, protect | fua); @@ -1287,6 +1293,7 @@ static blk_status_t sd_init_command(struct scsi_cmnd *cmd) return sd_setup_flush_cmnd(cmd); case REQ_OP_READ: case REQ_OP_WRITE: + case REQ_OP_ZONE_APPEND: return sd_setup_read_write_cmnd(cmd); case REQ_OP_ZONE_RESET: return sd_zbc_setup_zone_mgmt_cmnd(cmd, ZO_RESET_WRITE_POINTER, @@ -2055,7 +2062,7 @@ static int sd_done(struct scsi_cmnd *SCpnt) out: if (sd_is_zoned(sdkp)) - sd_zbc_complete(SCpnt, good_bytes, &sshdr); + good_bytes = sd_zbc_complete(SCpnt, good_bytes, &sshdr); SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, SCpnt, "sd_done: completed %d of %d bytes\n", @@ -3369,6 +3376,8 @@ static int sd_probe(struct device *dev) sdkp->first_scan = 1; sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS; + sd_zbc_init_disk(sdkp); + sd_revalidate_disk(gd); gd->flags = GENHD_FL_EXT_DEVT; @@ -3662,19 +3671,26 @@ static int __init init_sd(void) if (!sd_page_pool) { printk(KERN_ERR "sd: can't init discard page pool\n"); err = -ENOMEM; - goto err_out_ppool; + goto err_out_cdb_pool; } + err = sd_zbc_init(); + if (err) + goto err_out_ppool; + err = scsi_register_driver(&sd_template.gendrv); if (err) - goto err_out_driver; + goto err_out_zbc; return 0; -err_out_driver: - mempool_destroy(sd_page_pool); +err_out_zbc: + sd_zbc_exit(); err_out_ppool: + mempool_destroy(sd_page_pool); + +err_out_cdb_pool: mempool_destroy(sd_cdb_pool); err_out_cache: @@ -3704,6 +3720,8 @@ static void __exit exit_sd(void) mempool_destroy(sd_page_pool); kmem_cache_destroy(sd_cdb_cache); + sd_zbc_exit(); + class_unregister(&sd_disk_class); for (i = 0; i < SD_MAJORS; i++) { diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h index 50fff0bf8c8e..19961cdc5a53 100644 --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -79,6 +79,7 @@ struct scsi_disk { u32 zones_optimal_open; u32 zones_optimal_nonseq; u32 zones_max_open; + spinlock_t zone_wp_ofst_lock; #endif atomic_t openers; sector_t capacity; /* size in logical blocks */ @@ -207,17 +208,33 @@ static inline int sd_is_zoned(struct scsi_disk *sdkp) #ifdef CONFIG_BLK_DEV_ZONED +int __init sd_zbc_init(void); +void sd_zbc_exit(void); + +void sd_zbc_init_disk(struct scsi_disk *sdkp); extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer); extern void sd_zbc_print_zones(struct scsi_disk *sdkp); blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd, unsigned char op, bool all); -extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, - struct scsi_sense_hdr *sshdr); +unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, + struct scsi_sense_hdr *sshdr); int sd_zbc_report_zones(struct gendisk *disk, sector_t sector, unsigned int nr_zones, report_zones_cb cb, void *data); +blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, sector_t *lba, + unsigned int nr_blocks); 
+ #else /* CONFIG_BLK_DEV_ZONED */ +static inline int sd_zbc_init(void) +{ + return 0; +} + +static inline void sd_zbc_exit(void) {} + +static inline void sd_zbc_init_disk(struct scsi_disk *sdkp) {} + static inline int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf) { @@ -233,9 +250,17 @@ static inline blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd, return BLK_STS_TARGET; } -static inline void sd_zbc_complete(struct scsi_cmnd *cmd, - unsigned int good_bytes, - struct scsi_sense_hdr *sshdr) {} +static inline unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, + unsigned int good_bytes, struct scsi_sense_hdr *sshdr) +{ + return 0; +} + +static inline blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, + sector_t *lba, unsigned int nr_blocks) +{ + return BLK_STS_TARGET; +} #define sd_zbc_report_zones NULL diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c index 5b925f52d1ce..7ee5b0259b40 100644 --- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -19,6 +19,11 @@ #include "sd.h" +static struct kmem_cache *sd_zbc_zone_work_cache; +static mempool_t *sd_zbc_zone_work_pool; + +#define SD_ZBC_ZONE_WORK_MEMPOOL_SIZE 8 + static int sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf, unsigned int idx, report_zones_cb cb, void *data) { @@ -224,6 +229,152 @@ static blk_status_t sd_zbc_cmnd_checks(struct scsi_cmnd *cmd) return BLK_STS_OK; } +#define SD_ZBC_INVALID_WP_OFST ~(0u) +#define SD_ZBC_UPDATING_WP_OFST (SD_ZBC_INVALID_WP_OFST - 1) + +struct sd_zbc_zone_work { + struct work_struct work; + struct scsi_disk *sdkp; + unsigned int zno; + char buf[SD_BUF_SIZE]; +}; + +static int sd_zbc_update_wp_ofst_cb(struct blk_zone *zone, unsigned int idx, + void *data) +{ + struct sd_zbc_zone_work *zwork = data; + struct scsi_disk *sdkp = zwork->sdkp; + struct request_queue *q = sdkp->disk->queue; + int ret; + + spin_lock_bh(&sdkp->zone_wp_ofst_lock); + ret = blk_get_zone_wp_offset(zone, &q->seq_zones_wp_ofst[zwork->zno]); + if (ret) + q->seq_zones_wp_ofst[zwork->zno] = SD_ZBC_INVALID_WP_OFST; + spin_unlock_bh(&sdkp->zone_wp_ofst_lock); + + return ret; +} + +static void sd_zbc_update_wp_ofst_workfn(struct work_struct *work) +{ + struct sd_zbc_zone_work *zwork; + struct scsi_disk *sdkp; + int ret; + + zwork = container_of(work, struct sd_zbc_zone_work, work); + sdkp = zwork->sdkp; + + ret = sd_zbc_do_report_zones(sdkp, zwork->buf, SD_BUF_SIZE, + zwork->zno * sdkp->zone_blocks, true); + if (!ret) + sd_zbc_parse_report(sdkp, zwork->buf + 64, 0, + sd_zbc_update_wp_ofst_cb, zwork); + + mempool_free(zwork, sd_zbc_zone_work_pool); + scsi_device_put(sdkp->device); +} + +static blk_status_t sd_zbc_update_wp_ofst(struct scsi_disk *sdkp, + unsigned int zno) +{ + struct sd_zbc_zone_work *zwork; + + /* + * We are about to schedule work to update a zone write pointer offset, + * which will cause the zone append command to be requeued. So make + * sure that the scsi device does not go away while the work is + * being processed.
+ */ + if (scsi_device_get(sdkp->device)) + return BLK_STS_IOERR; + + zwork = mempool_alloc(sd_zbc_zone_work_pool, GFP_ATOMIC); + if (!zwork) { + /* Retry later */ + scsi_device_put(sdkp->device); + return BLK_STS_RESOURCE; + } + + memset(zwork, 0, sizeof(struct sd_zbc_zone_work)); + INIT_WORK(&zwork->work, sd_zbc_update_wp_ofst_workfn); + zwork->sdkp = sdkp; + zwork->zno = zno; + + sdkp->disk->queue->seq_zones_wp_ofst[zno] = SD_ZBC_UPDATING_WP_OFST; + + schedule_work(&zwork->work); + + return BLK_STS_RESOURCE; +} + +/** + * sd_zbc_prepare_zone_append() - Prepare an emulated ZONE_APPEND command. + * @cmd: the command to setup + * @lba: the LBA to patch + * @nr_blocks: the number of LBAs to be written + * + * Called from sd_setup_read_write_cmnd() for REQ_OP_ZONE_APPEND. + * sd_zbc_prepare_zone_append() handles the necessary zone write locking and + * patching of the LBA for an emulated ZONE_APPEND command. + * + * In case the cached write pointer offset is %SD_ZBC_INVALID_WP_OFST it will + * schedule a REPORT ZONES command to update the offset and return + * BLK_STS_RESOURCE so that the command is requeued. + */ +blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, sector_t *lba, + unsigned int nr_blocks) +{ + struct request *rq = cmd->request; + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); + unsigned int wp_ofst, zno = blk_rq_zone_no(rq); + blk_status_t ret; + + ret = sd_zbc_cmnd_checks(cmd); + if (ret != BLK_STS_OK) + return ret; + + if (!blk_rq_zone_is_seq(rq)) + return BLK_STS_IOERR; + + /* Unlock of the write lock will happen in sd_zbc_complete() */ + if (!blk_req_zone_write_trylock(rq)) + return BLK_STS_ZONE_RESOURCE; + + spin_lock_bh(&sdkp->zone_wp_ofst_lock); + + wp_ofst = rq->q->seq_zones_wp_ofst[zno]; + + if (wp_ofst == SD_ZBC_UPDATING_WP_OFST) { + /* Write pointer offset update in progress: ask for a requeue */ + ret = BLK_STS_RESOURCE; + goto err; + } + + if (wp_ofst == SD_ZBC_INVALID_WP_OFST) { + /* Invalid write pointer offset: trigger an update from disk */ + ret = sd_zbc_update_wp_ofst(sdkp, zno); + goto err; + } + + wp_ofst = sectors_to_logical(sdkp->device, wp_ofst); + if (wp_ofst + nr_blocks > sdkp->zone_blocks) { + ret = BLK_STS_IOERR; + goto err; + } + + /* Set the LBA for the write command used to emulate zone append */ + *lba += wp_ofst; + + spin_unlock_bh(&sdkp->zone_wp_ofst_lock); + + return BLK_STS_OK; + +err: + spin_unlock_bh(&sdkp->zone_wp_ofst_lock); + blk_req_zone_write_unlock(rq); + return ret; +} + /** * sd_zbc_setup_zone_mgmt_cmnd - Prepare a zone ZBC_OUT command. The operations * can be RESET WRITE POINTER, OPEN, CLOSE or FINISH. @@ -261,23 +412,75 @@ blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd, cmd->transfersize = 0; cmd->allowed = 0; + /* Only zone reset and zone finish need zone write locking */ + if (op != ZO_RESET_WRITE_POINTER && op != ZO_FINISH_ZONE) + return BLK_STS_OK; + + if (all) { + /* We do not write lock all zones for an all zone reset */ + if (op == ZO_RESET_WRITE_POINTER) + return BLK_STS_OK; + + /* Finishing all zones is not supported */ + return BLK_STS_IOERR; + } + + if (!blk_rq_zone_is_seq(rq)) + return BLK_STS_IOERR; + + if (!blk_req_zone_write_trylock(rq)) + return BLK_STS_ZONE_RESOURCE; + return BLK_STS_OK; } +static inline bool sd_zbc_zone_needs_write_unlock(struct request *rq) +{ + /* + * For zone append, the zone was locked in sd_zbc_prepare_zone_append(). + * For zone reset and zone finish, the zone was locked in + * sd_zbc_setup_zone_mgmt_cmnd(). + * For regular writes, the zone is unlocked by the block layer elevator.
+ */ + return req_op(rq) == REQ_OP_ZONE_APPEND || + req_op(rq) == REQ_OP_ZONE_RESET || + req_op(rq) == REQ_OP_ZONE_FINISH; +} + +static bool sd_zbc_need_zone_wp_update(struct request *rq) +{ + if (req_op(rq) == REQ_OP_ZONE_RESET_ALL) + return true; + + if (!blk_rq_zone_is_seq(rq)) + return false; + + if (req_op(rq) == REQ_OP_WRITE || + req_op(rq) == REQ_OP_WRITE_ZEROES || + req_op(rq) == REQ_OP_WRITE_SAME) + return true; + + return sd_zbc_zone_needs_write_unlock(rq); +} + /** * sd_zbc_complete - ZBC command post processing. * @cmd: Completed command * @good_bytes: Command reply bytes * @sshdr: command sense header * - * Called from sd_done(). Process report zones reply and handle reset zone - * and write commands errors. + * Called from sd_done() to handle zone command errors and updates to the + * device queue zone write pointer offset cache. */ -void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, +unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, struct scsi_sense_hdr *sshdr) { int result = cmd->result; struct request *rq = cmd->request; + struct request_queue *q = rq->q; + struct gendisk *disk = rq->rq_disk; + struct scsi_disk *sdkp = scsi_disk(disk); + unsigned int zno; if (op_is_zone_mgmt(req_op(rq)) && result && @@ -289,7 +492,67 @@ void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, * so be quiet about the error. */ rq->rq_flags |= RQF_QUIET; + goto unlock_zone; } + + if (!sd_zbc_need_zone_wp_update(rq)) + goto unlock_zone; + + /* + * If we got an error for a command that needs updating the write + * pointer offset cache, we must mark the zone wp offset entry as + * invalid to force an update from disk the next time a zone append + * command is issued. + */ + zno = blk_rq_zone_no(rq); + spin_lock_bh(&sdkp->zone_wp_ofst_lock); + + if (result && req_op(rq) != REQ_OP_ZONE_RESET_ALL) { + if (req_op(rq) == REQ_OP_ZONE_APPEND) { + /* Force complete completion (no retry) */ + good_bytes = 0; + scsi_set_resid(cmd, blk_rq_bytes(rq)); + } + + /* + * Force an update of the zone write pointer offset on + * the next zone append access.
+ */ + if (q->seq_zones_wp_ofst[zno] != SD_ZBC_UPDATING_WP_OFST) + q->seq_zones_wp_ofst[zno] = SD_ZBC_INVALID_WP_OFST; + goto unlock_wp_ofst; + } + + switch (req_op(rq)) { + case REQ_OP_ZONE_APPEND: + rq->__sector += q->seq_zones_wp_ofst[zno]; + /* fallthrough */ + case REQ_OP_WRITE_ZEROES: + case REQ_OP_WRITE_SAME: + case REQ_OP_WRITE: + if (q->seq_zones_wp_ofst[zno] < sd_zbc_zone_sectors(sdkp)) + q->seq_zones_wp_ofst[zno] += good_bytes >> SECTOR_SHIFT; + break; + case REQ_OP_ZONE_RESET: + q->seq_zones_wp_ofst[zno] = 0; + break; + case REQ_OP_ZONE_FINISH: + q->seq_zones_wp_ofst[zno] = sd_zbc_zone_sectors(sdkp); + break; + case REQ_OP_ZONE_RESET_ALL: + memset(q->seq_zones_wp_ofst, 0, + sdkp->nr_zones * sizeof(unsigned int)); + break; + } + +unlock_wp_ofst: + spin_unlock_bh(&sdkp->zone_wp_ofst_lock); + +unlock_zone: + if (sd_zbc_zone_needs_write_unlock(rq)) + blk_req_zone_write_unlock(rq); + + return good_bytes; } /** @@ -417,8 +680,11 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf) /* The drive satisfies the kernel restrictions: set it up */ blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, sdkp->disk->queue); + blk_queue_flag_set(QUEUE_FLAG_ZONE_WP_OFST, sdkp->disk->queue); blk_queue_required_elevator_features(sdkp->disk->queue, ELEVATOR_F_ZBD_SEQ_WRITE); + blk_queue_max_zone_append_sectors(sdkp->disk->queue, + sdkp->disk->queue->limits.max_hw_sectors); nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks); /* READ16/WRITE16 is mandatory for ZBC disks */ @@ -470,3 +736,39 @@ void sd_zbc_print_zones(struct scsi_disk *sdkp) sdkp->nr_zones, sdkp->zone_blocks); } + +void sd_zbc_init_disk(struct scsi_disk *sdkp) +{ + if (!sd_is_zoned(sdkp)) + return; + + spin_lock_init(&sdkp->zone_wp_ofst_lock); +} + +int __init sd_zbc_init(void) +{ + sd_zbc_zone_work_cache = + kmem_cache_create("sd_zbc_zone_work", + sizeof(struct sd_zbc_zone_work), + 0, 0, NULL); + if (!sd_zbc_zone_work_cache) + return -ENOMEM; + + sd_zbc_zone_work_pool = + mempool_create_slab_pool(SD_ZBC_ZONE_WORK_MEMPOOL_SIZE, + sd_zbc_zone_work_cache); + if (!sd_zbc_zone_work_pool) { + kmem_cache_destroy(sd_zbc_zone_work_cache); + printk(KERN_ERR "sd_zbc: create zone work pool failed\n"); + return -ENOMEM; + } + + return 0; +} + +void sd_zbc_exit(void) +{ + mempool_destroy(sd_zbc_zone_work_pool); + kmem_cache_destroy(sd_zbc_zone_work_cache); +} +
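To close the series from the caller's perspective (illustration only, not part of the patch): with this emulation in place, a submitter issues REQ_OP_ZONE_APPEND against the zone start sector and, following the zone append semantics introduced earlier in the series, reads the actual write position back from the bio on completion. The function names below are made up and error handling is omitted:

#include <linux/bio.h>
#include <linux/blkdev.h>

/* Hypothetical completion handler: learn where the data actually landed. */
static void my_append_done(struct bio *bio)
{
	/* For zone append, bi_sector is updated to the written sector */
	sector_t written = bio->bi_iter.bi_sector;

	pr_info("appended at sector %llu\n", (unsigned long long)written);
	bio_put(bio);
}

/* Hypothetical submitter: append one page to the zone at zone_start. */
static void my_submit_append(struct block_device *bdev, struct page *page,
			     unsigned int len, sector_t zone_start)
{
	struct bio *bio = bio_alloc(GFP_KERNEL, 1);

	bio_set_dev(bio, bdev);
	bio->bi_opf = REQ_OP_ZONE_APPEND;
	bio->bi_iter.bi_sector = zone_start;	/* target zone, not the WP */
	bio_add_page(bio, page, len, 0);
	bio->bi_end_io = my_append_done;
	submit_bio(bio);
}

The point of the emulation is that this caller code stays the same whether the device supports zone append natively or sd_zbc emulates it with a WRITE(16) patched to the cached write pointer position.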