From patchwork Fri Mar 27 16:50:03 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11462639
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen",
org" , Johannes Thumshirn Subject: [PATCH v3 01/10] block: provide fallbacks for blk_queue_zone_is_seq and blk_queue_zone_no Date: Sat, 28 Mar 2020 01:50:03 +0900 Message-Id: <20200327165012.34443-2-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200327165012.34443-1-johannes.thumshirn@wdc.com> References: <20200327165012.34443-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org blk_queue_zone_is_seq() and blk_queue_zone_no() have not been called with CONFIG_BLK_DEV_ZONED disabled until now. The introduction of REQ_OP_ZONE_APPEND will change this, so we need to provide noop fallbacks for the !CONFIG_BLK_DEV_ZONED case. Signed-off-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig --- include/linux/blkdev.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 53a1325efbc3..cda34e0f94d3 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -729,6 +729,16 @@ static inline unsigned int blk_queue_nr_zones(struct request_queue *q) { return 0; } +static inline bool blk_queue_zone_is_seq(struct request_queue *q, + sector_t sector) +{ + return false; +} +static inline unsigned int blk_queue_zone_no(struct request_queue *q, + sector_t sector) +{ + return 0; +} #endif /* CONFIG_BLK_DEV_ZONED */ static inline bool rq_is_sync(struct request *rq) From patchwork Fri Mar 27 16:50:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11462645 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D3DBD17EA for ; Fri, 27 Mar 2020 16:50:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B11DD20857 for ; Fri, 27 Mar 2020 16:50:23 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="Oot+0ucv" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727846AbgC0QuW (ORCPT ); Fri, 27 Mar 2020 12:50:22 -0400 Received: from esa1.hgst.iphmx.com ([68.232.141.245]:2564 "EHLO esa1.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727833AbgC0QuV (ORCPT ); Fri, 27 Mar 2020 12:50:21 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1585327821; x=1616863821; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hUdfEc7Ft7iTxSMiRM9lyScyXTJz4Yw7cmn3+XUcwNs=; b=Oot+0ucvDofoyGEXsWwWSwAYEdMhmvXhTZWV4JetjNg55wzu/IM93ngh lTPtz4PfnuvR4J2zZRM528ErFVYH1icRl991+IT/PkMho2ApD3BIRTeOE CUlzvYF8TkaVoKVM2c8FN68zj780C86fHX8gBqREZNE4nUSUz/uDRGJnj WBgFERsqWFjxAifo4nYgkTM3SpV1XSuCevBNB2uQpdwVN6dX5fn++ZFTQ 0quoeoeSHk1QvvV7ltyJn3/RCN4E1Ztx+i6cqa1XS2siOr54r33Vf0+Ja VYcrj1m0kFqVQC+WvDs1KBEhNMK9HQNevDG13IdEp6qdcD0yUY2bjHXRg g==; IronPort-SDR: 7ZzwvHTstwGatMcK5p+1+P6T37bjUZCKFFDWUMA6ie2tElXh4JQ4U2yBFBh1wD2/zJaGFzjPny xXVhDoMyhtjcQubfyUS5s37EYU66SnidPvIYAeOq78JByjK6VAIbKmx/44smYu9hjoPwa0A1NY v8+KCYGN5tJsfJIKjw1dV7k8d+sOpcQgu2jjSsfpLJH2IrMrsw/QTUmEojXKS8WgenMsnkpOW+ IeS4dCa93Jr+AugqwnXXxs7LqHMaSRV4Zb1ZBjj7DL4e4mJ318mXn+tQpJSaotn4yfzS8sDinv 4/o= X-IronPort-AV: E=Sophos;i="5.72,313,1580745600"; 
d="scan'208";a="242210440" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 28 Mar 2020 00:50:20 +0800 IronPort-SDR: jEO7n0268IG0toWsD+hiSeBcCfZG9YVff76F7GAMPucUPHSZZL7vKBQyaIjdWTrc3otnp4l8Am TL7xqnPv/D4soZcMCKpENLwy13TwO2VQbLxAEXlC2Eu7c3dmlaQ5/cfx3sQ//UGeL9Q0wsB/yS aFhF1Dt2FDPQAsDPtEtDJs4f3PIB8RDDmRV2z5foSE+R9J1tkHnU0A9CRIt3+a7mHFgrVyr1tG 7ohmtF9gA2BOXTSKdOQp4BiekPdg6lsAP4bsY8nVuZ09acQyxv0CDTRW/3JuAfoeE5jDK67K8g 6sKagaQwHZw+u/lP54Ablaa5 Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Mar 2020 09:41:55 -0700 IronPort-SDR: QH+6TBv3kfVFLi0tUM3GsmUYuMPMZeJIk3yS8xIqDMoioZjkgDeCfovAqwF/N2raeiHRv1Q25O U6GGDf7LkmUcN9LJ+aRwKZiUbtmWiM24FvKS60HEcVYUAGEE4nkdgL0bka+CUxjzGY5kJIi7sP INW3RuoI+KmaoiAMbvZIt3uFb9Nv8xQGWL2+Spvmtp18h4kxuCRVzWt6rQ/D75ltBltIwFnwHC yEnyp4QI+BUtsuhsS2HeA6Zkqztd8oIh2yBXR2Dkd5xBnPmUxlPzxq6zE+gpMwn57rq4Hgedl6 BCI= WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip02.wdc.com with ESMTP; 27 Mar 2020 09:50:19 -0700 From: Johannes Thumshirn To: Jens Axboe Cc: Christoph Hellwig , linux-block , Damien Le Moal , Keith Busch , "linux-scsi @ vger . kernel . org" , "Martin K . Petersen" , "linux-fsdevel @ vger . kernel . org" , Johannes Thumshirn Subject: [PATCH v3 02/10] block: Introduce REQ_OP_ZONE_APPEND Date: Sat, 28 Mar 2020 01:50:04 +0900 Message-Id: <20200327165012.34443-3-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200327165012.34443-1-johannes.thumshirn@wdc.com> References: <20200327165012.34443-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Keith Busch Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned block device. This is a no-merge write operation. A zone append write BIO must: * Target a zoned block device * Have a sector position indicating the start sector of the target zone * The target zone must be a sequential write zone * The BIO must not cross a zone boundary * The BIO size must not be split to ensure that a single range of LBAs is written with a single command. Implement these checks in generic_make_request_checks() using the helper function blk_check_zone_append(). To avoid write append BIO splitting, introduce the new max_zone_append_sectors queue limit attribute and ensure that a BIO size is always lower than this limit. Export this new limit through sysfs and check these limits in bio_full(). Also when a LLDD can't dispatch a request to a specific zone, it will return BLK_STS_ZONE_RESOURCE indicating this request needs to be delayed, e.g. because the zone it will be dispatched to is still write-locked. If this happens set the request aside in a local list to continue trying dispatching requests such as READ requests or a WRITE/ZONE_APPEND requests targetting other zones. This way we can still keep a high queue depth without starving other requests even if one request can't be served due to zone write-locking. Finally, make sure that the bio sector position indicates the actual write position as indicated by the device on completion. 
Signed-off-by: Keith Busch
Signed-off-by: Johannes Thumshirn
---
Changes since v2:
- Fixed commit message regarding bio_full()
- Fixed return values of bio_can_zone_append()
---
 block/bio.c               | 72 +++++++++++++++++++++++++++++++++++++--
 block/blk-core.c          | 52 ++++++++++++++++++++++++++++
 block/blk-map.c           |  2 +-
 block/blk-mq.c            | 27 +++++++++++++++
 block/blk-settings.c      | 19 +++++++++++
 block/blk-sysfs.c         | 13 +++++++
 block/blk-zoned.c         | 10 ++++++
 drivers/scsi/scsi_lib.c   |  1 +
 include/linux/bio.h       | 22 ++----------
 include/linux/blk_types.h | 14 ++++++++
 include/linux/blkdev.h    | 11 ++++++
 11 files changed, 220 insertions(+), 23 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 11e6aac35092..aee214db92d3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -729,6 +729,45 @@ const char *bio_devname(struct bio *bio, char *buf)
 }
 EXPORT_SYMBOL(bio_devname);
 
+static inline bool bio_can_zone_append(struct bio *bio, unsigned len)
+{
+	struct request_queue *q = bio->bi_disk->queue;
+	unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
+
+	if (WARN_ON_ONCE(!max_append_sectors))
+		return true;
+
+	if (((bio->bi_iter.bi_size + len) >> 9) > max_append_sectors)
+		return true;
+
+	if (bio->bi_vcnt >= queue_max_segments(q))
+		return true;
+
+	return false;
+}
+
+/**
+ * bio_full - check if the bio is full
+ * @bio:	bio to check
+ * @len:	length of one segment to be added
+ *
+ * Return true if @bio is full and one segment with @len bytes can't be
+ * added to the bio, otherwise return false
+ */
+bool bio_full(struct bio *bio, unsigned len)
+{
+	if (bio->bi_vcnt >= bio->bi_max_vecs)
+		return true;
+
+	if (bio->bi_iter.bi_size > UINT_MAX - len)
+		return true;
+
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+		return bio_can_zone_append(bio, len);
+
+	return false;
+}
+
 static inline bool page_is_mergeable(const struct bio_vec *bv,
 		struct page *page, unsigned int len, unsigned int off,
 		bool *same_page)
@@ -831,6 +870,22 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio,
 }
 EXPORT_SYMBOL(bio_add_pc_page);
 
+static bool bio_try_merge_zone_append_page(struct bio *bio, struct page *page,
+					   unsigned int len, unsigned int off)
+{
+	struct request_queue *q = bio->bi_disk->queue;
+	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
+	unsigned long mask = queue_segment_boundary(q);
+	phys_addr_t addr1 = page_to_phys(bv->bv_page) + bv->bv_offset;
+	phys_addr_t addr2 = page_to_phys(page) + off + len - 1;
+
+	if ((addr1 | mask) != (addr2 | mask))
+		return false;
+	if (bv->bv_len + len > queue_max_segment_size(q))
+		return false;
+	return true;
+}
+
 /**
  * __bio_try_merge_page - try appending data to an existing bvec.
  * @bio: destination bio
@@ -856,6 +911,12 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 	if (bio->bi_vcnt > 0) {
 		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
+		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+			if (!bio_try_merge_zone_append_page(bio, page, len,
+							    off))
+				return false;
+		}
+
 		if (page_is_mergeable(bv, page, len, off, same_page)) {
 			if (bio->bi_iter.bi_size > UINT_MAX - len)
 				return false;
@@ -916,6 +977,7 @@ int bio_add_page(struct bio *bio, struct page *page,
 	if (!__bio_try_merge_page(bio, page, len, offset, &same_page)) {
 		if (bio_full(bio, len))
 			return 0;
+
 		__bio_add_page(bio, page, len, offset);
 	}
 	return len;
@@ -948,7 +1010,7 @@ static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
 
 	len = min_t(size_t, bv->bv_len - iter->iov_offset, iter->count);
 	size = bio_add_page(bio, bv->bv_page, len,
-			bv->bv_offset + iter->iov_offset);
+			    bv->bv_offset + iter->iov_offset);
 	if (unlikely(size != len))
 		return -EINVAL;
 	iov_iter_advance(iter, size);
@@ -1448,7 +1510,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
  */
 struct bio *bio_map_user_iov(struct request_queue *q,
 			     struct iov_iter *iter,
-			     gfp_t gfp_mask)
+			     gfp_t gfp_mask, unsigned int op)
 {
 	int j;
 	struct bio *bio;
@@ -1488,7 +1550,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 				n = bytes;
 
 			if (!__bio_add_pc_page(q, bio, page, n, offs,
-						&same_page)) {
+					       &same_page)) {
 				if (same_page)
 					put_page(page);
 				break;
@@ -1953,6 +2015,10 @@ struct bio *bio_split(struct bio *bio, int sectors,
 	BUG_ON(sectors <= 0);
 	BUG_ON(sectors >= bio_sectors(bio));
 
+	/* Zone append commands cannot be split */
+	if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
+		return NULL;
+
 	split = bio_clone_fast(bio, gfp, bs);
 	if (!split)
 		return NULL;
diff --git a/block/blk-core.c b/block/blk-core.c
index eaf6cb3887e6..b602daa79a6d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -135,6 +135,7 @@ static const char *const blk_op_name[] = {
 	REQ_OP_NAME(ZONE_OPEN),
 	REQ_OP_NAME(ZONE_CLOSE),
 	REQ_OP_NAME(ZONE_FINISH),
+	REQ_OP_NAME(ZONE_APPEND),
 	REQ_OP_NAME(WRITE_SAME),
 	REQ_OP_NAME(WRITE_ZEROES),
 	REQ_OP_NAME(SCSI_IN),
@@ -240,6 +241,17 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
 
 	bio_advance(bio, nbytes);
 
+	if (req_op(rq) == REQ_OP_ZONE_APPEND && error == BLK_STS_OK) {
+		/*
+		 * Partial zone append completions cannot be supported as the
+		 * BIO fragments may end up not being written sequentially.
+		 */
+		if (bio->bi_iter.bi_size)
+			bio->bi_status = BLK_STS_IOERR;
+		else
+			bio->bi_iter.bi_sector = rq->__sector;
+	}
+
 	/* don't actually finish bio if it's part of flush sequence */
 	if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
 		bio_endio(bio);
@@ -864,6 +876,41 @@ static inline int blk_partition_remap(struct bio *bio)
 	return ret;
 }
 
+/*
+ * Check write append to a zoned block device.
+ */
+static inline blk_status_t blk_check_zone_append(struct request_queue *q,
+						 struct bio *bio)
+{
+	sector_t pos = bio->bi_iter.bi_sector;
+	int nr_sectors = bio_sectors(bio);
+
+	/* Only applicable to zoned block devices */
+	if (!blk_queue_is_zoned(q))
+		return BLK_STS_NOTSUPP;
+
+	/* The bio sector must point to the start of a sequential zone */
+	if (pos & (blk_queue_zone_sectors(q) - 1) ||
+	    !blk_queue_zone_is_seq(q, pos))
+		return BLK_STS_IOERR;
+
+	/*
+	 * Not allowed to cross zone boundaries. Otherwise, the BIO will be
+	 * split and could result in non-contiguous sectors being written in
+	 * different zones.
+	 */
+	if (blk_queue_zone_no(q, pos) != blk_queue_zone_no(q, pos + nr_sectors))
+		return BLK_STS_IOERR;
+
+	/* Make sure the BIO is small enough and will not get split */
+	if (nr_sectors > q->limits.max_zone_append_sectors)
+		return BLK_STS_IOERR;
+
+	bio->bi_opf |= REQ_NOMERGE;
+
+	return BLK_STS_OK;
+}
+
 static noinline_for_stack bool
 generic_make_request_checks(struct bio *bio)
 {
@@ -936,6 +983,11 @@ generic_make_request_checks(struct bio *bio)
 		if (!q->limits.max_write_same_sectors)
 			goto not_supported;
 		break;
+	case REQ_OP_ZONE_APPEND:
+		status = blk_check_zone_append(q, bio);
+		if (status != BLK_STS_OK)
+			goto end_io;
+		break;
 	case REQ_OP_ZONE_RESET:
 	case REQ_OP_ZONE_OPEN:
 	case REQ_OP_ZONE_CLOSE:
diff --git a/block/blk-map.c b/block/blk-map.c
index b0790268ed9d..a83ba39251a9 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -72,7 +72,7 @@ static int __blk_rq_map_user_iov(struct request *rq,
 	if (copy)
 		bio = bio_copy_user_iov(q, map_data, iter, gfp_mask);
 	else
-		bio = bio_map_user_iov(q, iter, gfp_mask);
+		bio = bio_map_user_iov(q, iter, gfp_mask, req_op(rq));
 
 	if (IS_ERR(bio))
 		return PTR_ERR(bio);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 745ec592a513..c06c796742ec 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1195,6 +1195,19 @@ static void blk_mq_handle_dev_resource(struct request *rq,
 	__blk_mq_requeue_request(rq);
 }
 
+static void blk_mq_handle_zone_resource(struct request *rq,
+					struct list_head *zone_list)
+{
+	/*
+	 * If we end up here it is because we cannot dispatch a request to a
+	 * specific zone due to LLD level zone-write locking or other zone
+	 * related resource not being available. In this case, set the request
+	 * aside in zone_list for retrying it later.
+	 */
+	list_add(&rq->queuelist, zone_list);
+	__blk_mq_requeue_request(rq);
+}
+
 /*
  * Returns true if we did some work AND can potentially do more.
  */
@@ -1206,6 +1219,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 	bool no_tag = false;
 	int errors, queued;
 	blk_status_t ret = BLK_STS_OK;
+	LIST_HEAD(zone_list);
 
 	if (list_empty(list))
 		return false;
@@ -1264,6 +1278,16 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
 			blk_mq_handle_dev_resource(rq, list);
 			break;
+		} else if (ret == BLK_STS_ZONE_RESOURCE) {
+			/*
+			 * Move the request to zone_list and keep going through
+			 * the dispatch list to find more requests the drive
+			 * accepts.
+			 */
+			blk_mq_handle_zone_resource(rq, &zone_list);
+			if (list_empty(list))
+				break;
+			continue;
 		}
 
 		if (unlikely(ret != BLK_STS_OK)) {
@@ -1275,6 +1299,9 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		queued++;
 	} while (!list_empty(list));
 
+	if (!list_empty(&zone_list))
+		list_splice_tail_init(&zone_list, list);
+
 	hctx->dispatched[queued_to_index(queued)]++;
 
 	/*
diff --git a/block/blk-settings.c b/block/blk-settings.c
index be1dca0103a4..ac0711803ee7 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->chunk_sectors = 0;
 	lim->max_write_same_sectors = 0;
 	lim->max_write_zeroes_sectors = 0;
+	lim->max_zone_append_sectors = 0;
 	lim->max_discard_sectors = 0;
 	lim->max_hw_discard_sectors = 0;
 	lim->discard_granularity = 0;
@@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_same_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
+	lim->max_zone_append_sectors = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -257,6 +259,21 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 
+/**
+ * blk_queue_max_zone_append_sectors - set max sectors for a single zone append
+ * @q:  the request queue for the device
+ * @max_zone_append_sectors: maximum number of sectors to write per command
+ **/
+void blk_queue_max_zone_append_sectors(struct request_queue *q,
+				       unsigned int max_zone_append_sectors)
+{
+	unsigned int max_sectors;
+
+	max_sectors = min(q->limits.max_hw_sectors, max_zone_append_sectors);
+	q->limits.max_zone_append_sectors = max_sectors;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_zone_append_sectors);
+
 /**
  * blk_queue_max_segments - set max hw segments for a request for this queue
  * @q:  the request queue for the device
@@ -506,6 +523,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 					b->max_write_same_sectors);
 	t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
 					b->max_write_zeroes_sectors);
+	t->max_zone_append_sectors = min(t->max_zone_append_sectors,
+					b->max_zone_append_sectors);
 	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
 
 	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index fca9b158f4a0..02643e149d5e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -218,6 +218,13 @@ static ssize_t queue_write_zeroes_max_show(struct request_queue *q, char *page)
 		(unsigned long long)q->limits.max_write_zeroes_sectors << 9);
 }
 
+static ssize_t queue_zone_append_max_show(struct request_queue *q, char *page)
+{
+	unsigned long long max_sectors = q->limits.max_zone_append_sectors;
+
+	return sprintf(page, "%llu\n", max_sectors << SECTOR_SHIFT);
+}
+
 static ssize_t
 queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 {
@@ -639,6 +646,11 @@ static struct queue_sysfs_entry queue_write_zeroes_max_entry = {
 	.show = queue_write_zeroes_max_show,
 };
 
+static struct queue_sysfs_entry queue_zone_append_max_entry = {
+	.attr = {.name = "zone_append_max_bytes", .mode = 0444 },
+	.show = queue_zone_append_max_show,
+};
+
 static struct queue_sysfs_entry queue_nonrot_entry = {
 	.attr = {.name = "rotational", .mode = 0644 },
 	.show = queue_show_nonrot,
@@ -749,6 +761,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_zeroes_data_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
+	&queue_zone_append_max_entry.attr,
 	&queue_nonrot_entry.attr,
 	&queue_zoned_entry.attr,
 	&queue_nr_zones_entry.attr,
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 6b442ae96499..9d30a4115dbc 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -455,6 +455,15 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 				       blk_revalidate_zone_cb, &args);
 	memalloc_noio_restore(noio_flag);
 
+	if (ret == 0 &&
+	    (queue_max_zone_append_sectors(q) > queue_max_hw_sectors(q) ||
+	     queue_max_zone_append_sectors(q) > q->limits.chunk_sectors)) {
+		pr_warn("%s: invalid max_zone_append_bytes value: %u\n",
+			disk->disk_name, queue_max_zone_append_sectors(q) << 9);
+		ret = -EINVAL;
+		goto out;
+	}
+
 	/*
 	 * Install the new bitmaps and update nr_zones only once the queue is
 	 * stopped and all I/Os are completed (i.e. a scheduler is not
@@ -473,6 +482,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 	}
 	blk_mq_unfreeze_queue(q);
 
+out:
 	kfree(args.seq_zones_wlock);
 	kfree(args.conv_zones_bitmap);
 	return ret;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 610ee41fa54c..ea327f320b7f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1706,6 +1706,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case BLK_STS_OK:
 		break;
 	case BLK_STS_RESOURCE:
+	case BLK_STS_ZONE_RESOURCE:
 		if (atomic_read(&sdev->device_busy) ||
 		    scsi_device_blocked(sdev))
 			ret = BLK_STS_DEV_RESOURCE;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a430e9c1c2d2..59d840706027 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -102,24 +102,7 @@ static inline void *bio_data(struct bio *bio)
 	return NULL;
 }
 
-/**
- * bio_full - check if the bio is full
- * @bio:	bio to check
- * @len:	length of one segment to be added
- *
- * Return true if @bio is full and one segment with @len bytes can't be
- * added to the bio, otherwise return false
- */
-static inline bool bio_full(struct bio *bio, unsigned len)
-{
-	if (bio->bi_vcnt >= bio->bi_max_vecs)
-		return true;
-
-	if (bio->bi_iter.bi_size > UINT_MAX - len)
-		return true;
-
-	return false;
-}
+bool bio_full(struct bio *bio, unsigned len);
 
 static inline bool bio_next_segment(const struct bio *bio,
 				    struct bvec_iter_all *iter)
@@ -435,6 +418,7 @@ void bio_chain(struct bio *, struct bio *);
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 			   unsigned int, unsigned int);
+
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off, bool *same_page);
 void __bio_add_page(struct bio *bio, struct page *page,
@@ -443,7 +427,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
 void bio_release_pages(struct bio *bio, bool mark_dirty);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
-				    struct iov_iter *, gfp_t);
+				    struct iov_iter *, gfp_t, unsigned int);
 extern void bio_unmap_user(struct bio *);
 extern struct bio *bio_map_kern(struct request_queue *, void *, unsigned int,
 				gfp_t);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 70254ae11769..824ec2d89954 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -63,6 +63,18 @@ typedef u8 __bitwise blk_status_t;
  */
 #define BLK_STS_DEV_RESOURCE	((__force blk_status_t)13)
 
+/*
+ * BLK_STS_ZONE_RESOURCE is returned from the driver to the block layer if zone
+ * related resources are unavailable, but the driver can guarantee the queue
+ * will be rerun in the future once the resources become available again.
+ *
+ * This is different from BLK_STS_DEV_RESOURCE in that it explicitly references
+ * a zone specific resource and IO to a different zone on the same device could
+ * still be served. An example is a write-locked zone: reads to the same zone
+ * can still be served.
+ */
+#define BLK_STS_ZONE_RESOURCE	((__force blk_status_t)14)
+
 /**
  * blk_path_error - returns true if error may be path related
  * @error: status the request was completed with
@@ -296,6 +308,8 @@ enum req_opf {
 	REQ_OP_ZONE_CLOSE	= 11,
 	/* Transition a zone to full */
 	REQ_OP_ZONE_FINISH	= 12,
+	/* write data at the current zone write pointer */
+	REQ_OP_ZONE_APPEND	= 13,
 
 	/* SCSI passthrough using struct scsi_request */
 	REQ_OP_SCSI_IN		= 32,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index cda34e0f94d3..50e9b140cad7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -336,6 +336,7 @@ struct queue_limits {
 	unsigned int		max_hw_discard_sectors;
 	unsigned int		max_write_same_sectors;
 	unsigned int		max_write_zeroes_sectors;
+	unsigned int		max_zone_append_sectors;
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 
@@ -757,6 +758,9 @@ static inline bool rq_mergeable(struct request *rq)
 	if (req_op(rq) == REQ_OP_WRITE_ZEROES)
 		return false;
 
+	if (req_op(rq) == REQ_OP_ZONE_APPEND)
+		return false;
+
 	if (rq->cmd_flags & REQ_NOMERGE_FLAGS)
 		return false;
 	if (rq->rq_flags & RQF_NOMERGE_FLAGS)
@@ -1088,6 +1092,8 @@ extern void blk_queue_max_write_same_sectors(struct request_queue *q,
 extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
+extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
+		unsigned int max_zone_append_sectors);
 extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_alignment_offset(struct request_queue *q,
 				       unsigned int alignment);
@@ -1301,6 +1307,11 @@ static inline unsigned int queue_max_segment_size(const struct request_queue *q)
 	return q->limits.max_segment_size;
 }
 
+static inline unsigned int queue_max_zone_append_sectors(const struct request_queue *q)
+{
+	return q->limits.max_zone_append_sectors;
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
 	int retval = 512;

From patchwork Fri Mar 27 16:50:05 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11462675
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen",
org" , Johannes Thumshirn Subject: [PATCH v3 03/10] block: introduce blk_req_zone_write_trylock Date: Sat, 28 Mar 2020 01:50:05 +0900 Message-Id: <20200327165012.34443-4-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200327165012.34443-1-johannes.thumshirn@wdc.com> References: <20200327165012.34443-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Signed-off-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig --- block/blk-zoned.c | 14 ++++++++++++++ include/linux/blkdev.h | 1 + 2 files changed, 15 insertions(+) diff --git a/block/blk-zoned.c b/block/blk-zoned.c index 9d30a4115dbc..3de463a15901 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -50,6 +50,20 @@ bool blk_req_needs_zone_write_lock(struct request *rq) } EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock); +bool blk_req_zone_write_trylock(struct request *rq) +{ + unsigned int zno = blk_rq_zone_no(rq); + + if (test_and_set_bit(zno, rq->q->seq_zones_wlock)) + return false; + + WARN_ON_ONCE(rq->rq_flags & RQF_ZONE_WRITE_LOCKED); + rq->rq_flags |= RQF_ZONE_WRITE_LOCKED; + + return true; +} +EXPORT_SYMBOL_GPL(blk_req_zone_write_trylock); + void __blk_req_zone_write_lock(struct request *rq) { if (WARN_ON_ONCE(test_and_set_bit(blk_rq_zone_no(rq), diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 50e9b140cad7..2187d3778eba 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1737,6 +1737,7 @@ extern int bdev_write_page(struct block_device *, sector_t, struct page *, #ifdef CONFIG_BLK_DEV_ZONED bool blk_req_needs_zone_write_lock(struct request *rq); +bool blk_req_zone_write_trylock(struct request *rq); void __blk_req_zone_write_lock(struct request *rq); void __blk_req_zone_write_unlock(struct request *rq); From patchwork Fri Mar 27 16:50:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11462651 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B055C14B4 for ; Fri, 27 Mar 2020 16:50:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7B69721473 for ; Fri, 27 Mar 2020 16:50:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="NSc5AfUy" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727766AbgC0Qu1 (ORCPT ); Fri, 27 Mar 2020 12:50:27 -0400 Received: from esa1.hgst.iphmx.com ([68.232.141.245]:2579 "EHLO esa1.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727242AbgC0Qu0 (ORCPT ); Fri, 27 Mar 2020 12:50:26 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1585327824; x=1616863824; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RemkMp72YWDBDZq0LO+BRIk0z88x7L+xAj2e+yGhSJo=; b=NSc5AfUyLUC0f6W+57MEbCU/c0wCJ5M1v2X7ujj4KeoziVFv5HEece4z BkBBECNZqXErHsYQ+IyMJsd4aI8XCkSlHpUHUwE49EKPdpiIaBj1fuUcu WiACQTLi0pZkGIsLf1HY2wBgokFwT4EFKt7ZSNiGijH9U/3voxl3YluGt c+NeKyDL8SHSuPCPoSCixRSkiw3zUBQWQgXm5NwlGe3sp3hDWgJdFn/F2 kplfUvIIcKV9TTY8ZVgPyu5H/xq3qGGDVoxp1v95g+wXKvKtyt3dZQhcL 
From patchwork Fri Mar 27 16:50:06 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11462651
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen",
    linux-fsdevel@vger.kernel.org, Damien Le Moal
Subject: [PATCH v3 04/10] block: Introduce zone write pointer offset caching
Date: Sat, 28 Mar 2020 01:50:06 +0900
Message-Id: <20200327165012.34443-5-johannes.thumshirn@wdc.com>
In-Reply-To: <20200327165012.34443-1-johannes.thumshirn@wdc.com>
References: <20200327165012.34443-1-johannes.thumshirn@wdc.com>

From: Damien Le Moal

Not all zoned block devices natively support the zone append command.
E.g. SCSI and ATA disks do not define this command. However, it is
possible to emulate this command at the LLD level using regular write
commands, if the write pointer position of zones is known. Introducing
such emulation enables the use of zone append writes for all zoned block
device types, simplifying, for instance, the implementation of native
zoned block device support in file systems by avoiding the need for
different write paths depending on the device capabilities.

To allow devices without zone append command support to emulate its
behavior, introduce a zone write pointer cache attached to the device
request_queue, similarly to the zone bitmaps. To save memory, this cache
stores write pointer offsets relative to each zone start sector as 32-bit
values rather than the 64-bit absolute sector position of each zone write
pointer.
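As a sketch of the arithmetic (assuming equally sized zones, as the block
layer requires; the example_ helper is hypothetical, using the
seq_zones_wp_ofst array introduced below):

static sector_t example_zone_wp_sector(struct request_queue *q,
				       unsigned int zno)
{
	/* Absolute write pointer = zone start + cached 32-bit offset. */
	return (sector_t)zno * blk_queue_zone_sectors(q) +
	       q->seq_zones_wp_ofst[zno];
}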
While it would be natural to have each LLD implement zone write pointer
caching as needed, integrating the cache allocation and initialization
within blk_revalidate_disk_zones() greatly simplifies the code and avoids
potential races between the zone wp array size and the number of zones
known to the device in the case of changes detected during device
revalidation. Furthermore, initializing the zone wp array together with
the device queue zone bitmaps when blk_revalidate_disk_zones() executes a
full device zone report avoids the need for an additional full device
zone report execution in the LLD revalidate method. This can
significantly reduce the overhead of device revalidation, as
larger-capacity SMR drives make full drive report zones processing very
costly. E.g., with a 20 TB SMR disk and 256 MB zones, more than 75000
zones need to be reported using multiple report zones commands. The added
delay of an additional full zone report is significant and can be avoided
with an initialization within blk_revalidate_disk_zones().

By default, blk_revalidate_disk_zones() will not allocate and initialize
a drive zone wp array. The allocation and initialization of this cache is
done only if a device driver requests it with the QUEUE_FLAG_ZONE_WP_OFST
queue flag. The allocation and initialization of the cache is done in the
same manner as for the zone bitmaps, within the report zones callback
function used by blk_revalidate_disk_zones(). In case of changes to the
device zone configuration, the cache is updated under a queue freeze to
avoid any race between the device driver's use of the cache and the
request queue update. Freeing of this new cache is done together with the
zone bitmaps from the function blk_queue_free_zone_bitmaps(), renamed
here to blk_queue_free_zone_resources().

Maintaining the write pointer offset values is the responsibility of the
device LLD. The helper function blk_get_zone_wp_offset() is provided to
simplify this task.

Signed-off-by: Damien Le Moal
---
 block/blk-sysfs.c      |  2 +-
 block/blk-zoned.c      | 69 ++++++++++++++++++++++++++++++++++++++++--
 block/blk.h            |  4 +--
 include/linux/blkdev.h | 20 ++++++++----
 4 files changed, 84 insertions(+), 11 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 02643e149d5e..bd0c9b4c1c5b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -901,7 +901,7 @@ static void __blk_release_queue(struct work_struct *work)
 
 	blk_exit_queue(q);
 
-	blk_queue_free_zone_bitmaps(q);
+	blk_queue_free_zone_resources(q);
 
 	if (queue_is_mq(q))
 		blk_mq_release(q);
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 3de463a15901..665edf8a6d8d 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -344,18 +344,65 @@ static inline unsigned long *blk_alloc_zone_bitmap(int node,
 			    GFP_NOIO, node);
 }
 
-void blk_queue_free_zone_bitmaps(struct request_queue *q)
+static inline unsigned int *blk_alloc_zone_wp_ofst(unsigned int nr_zones)
+{
+	return kvcalloc(nr_zones, sizeof(unsigned int), GFP_NOIO);
+}
+
+void blk_queue_free_zone_resources(struct request_queue *q)
 {
 	kfree(q->conv_zones_bitmap);
 	q->conv_zones_bitmap = NULL;
 	kfree(q->seq_zones_wlock);
 	q->seq_zones_wlock = NULL;
+	kvfree(q->seq_zones_wp_ofst);
+	q->seq_zones_wp_ofst = NULL;
 }
 
+/**
+ * blk_get_zone_wp_offset - Calculate a zone write pointer offset position
+ * @zone:	Target zone
+ * @wp_ofst:	Calculated write pointer offset
+ *
+ * Helper function for low-level device drivers to obtain a zone write pointer
+ * position relative to the zone start sector (write pointer offset). The write
+ * pointer offset depends on the zone condition. If the zone has an invalid
+ * condition, -ENODEV is returned.
+ */
+int blk_get_zone_wp_offset(struct blk_zone *zone, unsigned int *wp_ofst)
+{
+	switch (zone->cond) {
+	case BLK_ZONE_COND_EMPTY:
+		*wp_ofst = 0;
+		return 0;
+	case BLK_ZONE_COND_IMP_OPEN:
+	case BLK_ZONE_COND_EXP_OPEN:
+	case BLK_ZONE_COND_CLOSED:
+		*wp_ofst = zone->wp - zone->start;
+		return 0;
+	case BLK_ZONE_COND_FULL:
+		*wp_ofst = zone->len;
+		return 0;
+	case BLK_ZONE_COND_NOT_WP:
+	case BLK_ZONE_COND_OFFLINE:
+	case BLK_ZONE_COND_READONLY:
+		/*
+		 * Conventional, offline and read-only zones do not have a
+		 * valid write pointer. Use 0 as a dummy value.
+		 */
+		*wp_ofst = 0;
+		return 0;
+	default:
+		return -ENODEV;
+	}
+}
+EXPORT_SYMBOL_GPL(blk_get_zone_wp_offset);
+
 struct blk_revalidate_zone_args {
 	struct gendisk	*disk;
 	unsigned long	*conv_zones_bitmap;
 	unsigned long	*seq_zones_wlock;
+	unsigned int	*seq_zones_wp_ofst;
 	unsigned int	nr_zones;
 	sector_t	zone_sectors;
 	sector_t	sector;
@@ -371,6 +418,7 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx,
 	struct gendisk *disk = args->disk;
 	struct request_queue *q = disk->queue;
 	sector_t capacity = get_capacity(disk);
+	int ret;
 
 	/*
 	 * All zones must have the same size, with the exception on an eventual
@@ -406,6 +454,13 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx,
 		return -ENODEV;
 	}
 
+	if (blk_queue_zone_wp_ofst(q) && !args->seq_zones_wp_ofst) {
+		args->seq_zones_wp_ofst =
+			blk_alloc_zone_wp_ofst(args->nr_zones);
+		if (!args->seq_zones_wp_ofst)
+			return -ENOMEM;
+	}
+
 	/* Check zone type */
 	switch (zone->type) {
 	case BLK_ZONE_TYPE_CONVENTIONAL:
@@ -432,6 +487,14 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx,
 		return -ENODEV;
 	}
 
+	if (args->seq_zones_wp_ofst) {
+		/* Initialize the zone write pointer offset */
+		ret = blk_get_zone_wp_offset(zone,
+					     &args->seq_zones_wp_ofst[idx]);
+		if (ret)
+			return ret;
+	}
+
 	args->sector += zone->len;
 	return 0;
 }
@@ -489,16 +552,18 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 		q->nr_zones = args.nr_zones;
 		swap(q->seq_zones_wlock, args.seq_zones_wlock);
 		swap(q->conv_zones_bitmap, args.conv_zones_bitmap);
+		swap(q->seq_zones_wp_ofst, args.seq_zones_wp_ofst);
 		ret = 0;
 	} else {
 		pr_warn("%s: failed to revalidate zones\n", disk->disk_name);
-		blk_queue_free_zone_bitmaps(q);
+		blk_queue_free_zone_resources(q);
 	}
 	blk_mq_unfreeze_queue(q);
 
 out:
 	kfree(args.seq_zones_wlock);
 	kfree(args.conv_zones_bitmap);
+	kvfree(args.seq_zones_wp_ofst);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(blk_revalidate_disk_zones);
diff --git a/block/blk.h b/block/blk.h
index d9673164a145..77936611413c 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -370,9 +370,9 @@ static inline int blk_iolatency_init(struct request_queue *q) { return 0; }
 struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp);
 
 #ifdef CONFIG_BLK_DEV_ZONED
-void blk_queue_free_zone_bitmaps(struct request_queue *q);
+void blk_queue_free_zone_resources(struct request_queue *q);
 #else
-static inline void blk_queue_free_zone_bitmaps(struct request_queue *q) {}
+static inline void blk_queue_free_zone_resources(struct request_queue *q) {}
 #endif
 
 void part_dec_in_flight(struct request_queue *q, struct hd_struct *part,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2187d3778eba..a1e2336da5b0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -363,6 +363,7 @@ extern int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
 			     sector_t sectors, sector_t nr_sectors,
 			     gfp_t gfp_mask);
 extern int blk_revalidate_disk_zones(struct gendisk *disk);
+int blk_get_zone_wp_offset(struct blk_zone *zone, unsigned int *wp_ofst);
 
 extern int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
 				     unsigned int cmd, unsigned long arg);
@@ -499,14 +500,17 @@ struct request_queue {
 	/*
 	 * Zoned block device information for request dispatch control.
 	 * nr_zones is the total number of zones of the device. This is always
-	 * 0 for regular block devices. conv_zones_bitmap is a bitmap of nr_zones
-	 * bits which indicates if a zone is conventional (bit set) or
+	 * 0 for regular block devices. conv_zones_bitmap is a bitmap of
+	 * nr_zones bits which indicates if a zone is conventional (bit set) or
 	 * sequential (bit clear). seq_zones_wlock is a bitmap of nr_zones
 	 * bits which indicates if a zone is write locked, that is, if a write
-	 * request targeting the zone was dispatched. All three fields are
-	 * initialized by the low level device driver (e.g. scsi/sd.c).
-	 * Stacking drivers (device mappers) may or may not initialize
-	 * these fields.
+	 * request targeting the zone was dispatched. seq_zones_wp_ofst is an
+	 * array of nr_zones write pointer values relative to the zone start
+	 * sector. This is only initialized for LLDs needing zone append write
+	 * command emulation with regular write. All fields are initialized by
+	 * the blk_revalidate_disk_zones() function when called by the low
+	 * level device driver (e.g. scsi/sd.c). Stacking drivers (device
+	 * mappers) may or may not initialize these fields.
 	 *
 	 * Reads of this information must be protected with blk_queue_enter() /
 	 * blk_queue_exit(). Modifying this information is only allowed while
@@ -516,6 +520,7 @@ struct request_queue {
 	unsigned int		nr_zones;
 	unsigned long		*conv_zones_bitmap;
 	unsigned long		*seq_zones_wlock;
+	unsigned int		*seq_zones_wp_ofst;
 #endif /* CONFIG_BLK_DEV_ZONED */
 
 	/*
@@ -613,6 +618,7 @@ struct request_queue {
 #define QUEUE_FLAG_PCI_P2PDMA	25	/* device supports PCI p2p requests */
 #define QUEUE_FLAG_ZONE_RESETALL 26	/* supports Zone Reset All */
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
+#define QUEUE_FLAG_ZONE_WP_OFST	28	/* queue needs zone wp offsets */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP))
@@ -647,6 +653,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #else
 #define blk_queue_rq_alloc_time(q)	false
 #endif
+#define blk_queue_zone_wp_ofst(q) \
+	test_bit(QUEUE_FLAG_ZONE_WP_OFST, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
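A driver opting in to the cache would set the new flag before
(re)validating zones, roughly as in this sketch (example_ name is
hypothetical; error handling elided):

static int example_revalidate_zoned(struct gendisk *disk)
{
	struct request_queue *q = disk->queue;

	/* Ask the block layer to allocate and fill q->seq_zones_wp_ofst. */
	blk_queue_flag_set(QUEUE_FLAG_ZONE_WP_OFST, q);

	return blk_revalidate_disk_zones(disk);
}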
From patchwork Fri Mar 27 16:50:07 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11462657
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen",
    linux-fsdevel@vger.kernel.org, Johannes Thumshirn
Subject: [PATCH v3 05/10] scsi: sd_zbc: factor out sanity checks for zoned
 commands
Date: Sat, 28 Mar 2020 01:50:07 +0900
Message-Id: <20200327165012.34443-6-johannes.thumshirn@wdc.com>
In-Reply-To: <20200327165012.34443-1-johannes.thumshirn@wdc.com>
References: <20200327165012.34443-1-johannes.thumshirn@wdc.com>

Factor out the sanity checks for zoned commands from
sd_zbc_setup_zone_mgmt_cmnd(). This will help with the introduction of an
emulated ZONE_APPEND command.
Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
---
 drivers/scsi/sd_zbc.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index f45c22b09726..ee156fbf3780 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -209,6 +209,26 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
 	return ret;
 }
 
+static blk_status_t sd_zbc_cmnd_checks(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	sector_t sector = blk_rq_pos(rq);
+
+	if (!sd_is_zoned(sdkp))
+		/* Not a zoned device */
+		return BLK_STS_IOERR;
+
+	if (sdkp->device->changed)
+		return BLK_STS_IOERR;
+
+	if (sector & (sd_zbc_zone_sectors(sdkp) - 1))
+		/* Unaligned request */
+		return BLK_STS_IOERR;
+
+	return BLK_STS_OK;
+}
+
 /**
  * sd_zbc_setup_zone_mgmt_cmnd - Prepare a zone ZBC_OUT command. The operations
  *			can be RESET WRITE POINTER, OPEN, CLOSE or FINISH.
@@ -223,20 +243,14 @@ blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd,
 					 unsigned char op, bool all)
 {
 	struct request *rq = cmd->request;
-	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 	sector_t sector = blk_rq_pos(rq);
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 	sector_t block = sectors_to_logical(sdkp->device, sector);
+	blk_status_t ret;
 
-	if (!sd_is_zoned(sdkp))
-		/* Not a zoned device */
-		return BLK_STS_IOERR;
-
-	if (sdkp->device->changed)
-		return BLK_STS_IOERR;
-
-	if (sector & (sd_zbc_zone_sectors(sdkp) - 1))
-		/* Unaligned request */
-		return BLK_STS_IOERR;
+	ret = sd_zbc_cmnd_checks(cmd);
+	if (ret != BLK_STS_OK)
+		return ret;
 
 	cmd->cmd_len = 16;
 	memset(cmd->cmnd, 0, cmd->cmd_len);
From patchwork Fri Mar 27 16:50:08 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11462663
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
    linux-scsi@vger.kernel.org, "Martin K. Petersen",
    linux-fsdevel@vger.kernel.org, Johannes Thumshirn
Subject: [PATCH v3 06/10] scsi: sd_zbc: emulate ZONE_APPEND commands
Date: Sat, 28 Mar 2020 01:50:08 +0900
Message-Id: <20200327165012.34443-7-johannes.thumshirn@wdc.com>
In-Reply-To: <20200327165012.34443-1-johannes.thumshirn@wdc.com>
References: <20200327165012.34443-1-johannes.thumshirn@wdc.com>

Emulate ZONE_APPEND for SCSI disks using a regular WRITE(16) with a start
LBA set to the target zone write pointer position.

In order to always know the write pointer position of a sequential write
zone, the queue flag QUEUE_FLAG_ZONE_WP_OFST is set to get an initialized
write pointer offset array attached to the device request queue. The
values of the cache are maintained in sync with the device as follows
(condensed in the sketch further below):

1) The write pointer offset of a zone is reset to 0 when a
   REQ_OP_ZONE_RESET command completes.
2) The write pointer offset of a zone is set to the zone size when a
   REQ_OP_ZONE_FINISH command completes.
3) The write pointer offset of a zone is incremented by the number of
   512B sectors written when a write or a zone append command completes.
4) The write pointer offset of all zones is reset to 0 when a
   REQ_OP_ZONE_RESET_ALL command completes.

Since the block layer does not write lock zones for zone append commands,
to ensure a sequential ordering of the write commands used for the
emulation, the target zone of a zone append command is locked when the
function sd_zbc_prepare_zone_append() is called from
sd_setup_read_write_cmnd(). If the zone write lock cannot be obtained
(e.g. a zone append is in-flight or a regular write has already locked
the zone), the zone append command dispatching is delayed by returning
BLK_STS_ZONE_RESOURCE.
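A condensed sketch of update rules 1)-4) above, as applied on command
completion (hypothetical example_ helper; the patch implements this logic
in sd_zbc_complete()):

static unsigned int example_update_wp_ofst(unsigned int wp_ofst,
					   unsigned int op,
					   unsigned int zone_sectors,
					   unsigned int written_sectors)
{
	switch (op) {
	case REQ_OP_ZONE_RESET:		/* rule 1: back to zone start */
		return 0;
	case REQ_OP_ZONE_FINISH:	/* rule 2: zone is now full */
		return zone_sectors;
	case REQ_OP_WRITE:		/* rule 3: advance by 512B sectors */
	case REQ_OP_ZONE_APPEND:
		return wp_ofst + written_sectors;
	default:			/* rule 4 (RESET_ALL): applied to
					 * every zone by the caller */
		return wp_ofst;
	}
}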
Since zone reset and finish operations can be issued concurrently with writes and zone append requests, ensure a coherent update of the zone write pointer offsets by also write locking the target zones for these zone management requests.

Finally, to avoid the need for write locking all zones for REQ_OP_ZONE_RESET_ALL requests, use a spinlock to protect accesses and modifications of the zone write pointer offsets. This spinlock is initialized from sd_probe() using the new function sd_zbc_init_disk().

Signed-off-by: Johannes Thumshirn --- drivers/scsi/sd.c | 28 +++- drivers/scsi/sd.h | 36 ++++- drivers/scsi/sd_zbc.c | 316 +++++++++++++++++++++++++++++++++++++++++- 3 files changed, 363 insertions(+), 17 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 707f47c0ec98..18584bf01e11 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1215,6 +1215,12 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd) else protect = 0; + if (req_op(rq) == REQ_OP_ZONE_APPEND) { + ret = sd_zbc_prepare_zone_append(cmd, &lba, nr_blocks); + if (ret) + return ret; + } + if (protect && sdkp->protection_type == T10_PI_TYPE2_PROTECTION) { ret = sd_setup_rw32_cmnd(cmd, write, lba, nr_blocks, protect | fua); @@ -1287,6 +1293,7 @@ static blk_status_t sd_init_command(struct scsi_cmnd *cmd) return sd_setup_flush_cmnd(cmd); case REQ_OP_READ: case REQ_OP_WRITE: + case REQ_OP_ZONE_APPEND: return sd_setup_read_write_cmnd(cmd); case REQ_OP_ZONE_RESET: return sd_zbc_setup_zone_mgmt_cmnd(cmd, ZO_RESET_WRITE_POINTER, @@ -2055,7 +2062,7 @@ static int sd_done(struct scsi_cmnd *SCpnt) out: if (sd_is_zoned(sdkp)) - sd_zbc_complete(SCpnt, good_bytes, &sshdr); + good_bytes = sd_zbc_complete(SCpnt, good_bytes, &sshdr); SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, SCpnt, "sd_done: completed %d of %d bytes\n", @@ -3370,6 +3377,8 @@ static int sd_probe(struct device *dev) sdkp->first_scan = 1; sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS; + sd_zbc_init_disk(sdkp); + sd_revalidate_disk(gd); gd->flags = GENHD_FL_EXT_DEVT; @@ -3663,19 +3672,26 @@ static int __init init_sd(void) if (!sd_page_pool) { printk(KERN_ERR "sd: can't init discard page pool\n"); err = -ENOMEM; - goto err_out_ppool; + goto err_out_cdb_pool; } + err = sd_zbc_init(); + if (err) + goto err_out_ppool; + err = scsi_register_driver(&sd_template.gendrv); if (err) - goto err_out_driver; + goto err_out_zbc; return 0; -err_out_driver: - mempool_destroy(sd_page_pool); +err_out_zbc: + sd_zbc_exit(); err_out_ppool: + mempool_destroy(sd_page_pool); + +err_out_cdb_pool: mempool_destroy(sd_cdb_pool); err_out_cache: @@ -3705,6 +3721,8 @@ static void __exit exit_sd(void) mempool_destroy(sd_page_pool); kmem_cache_destroy(sd_cdb_cache); + sd_zbc_exit(); + class_unregister(&sd_disk_class); for (i = 0; i < SD_MAJORS; i++) { diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h index 50fff0bf8c8e..34641be1d434 100644 --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -79,6 +79,7 @@ struct scsi_disk { u32 zones_optimal_open; u32 zones_optimal_nonseq; u32 zones_max_open; + spinlock_t zone_wp_ofst_lock; #endif atomic_t openers; sector_t capacity; /* size in logical blocks */ @@ -207,17 +208,33 @@ static inline int sd_is_zoned(struct scsi_disk *sdkp) #ifdef CONFIG_BLK_DEV_ZONED +int __init sd_zbc_init(void); +void sd_zbc_exit(void); + +void sd_zbc_init_disk(struct scsi_disk *sdkp); extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer); extern void sd_zbc_print_zones(struct scsi_disk *sdkp); blk_status_t
sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd, unsigned char op, bool all); -extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, - struct scsi_sense_hdr *sshdr); +unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, + struct scsi_sense_hdr *sshdr); int sd_zbc_report_zones(struct gendisk *disk, sector_t sector, unsigned int nr_zones, report_zones_cb cb, void *data); +blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, sector_t *lba, + unsigned int nr_blocks); + #else /* CONFIG_BLK_DEV_ZONED */ +static inline int sd_zbc_init(void) +{ + return 0; +} + +static inline void sd_zbc_exit(void) {} + +static inline void sd_zbc_init_disk(struct scsi_disk *sdkp) {} + static inline int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf) { @@ -233,9 +250,18 @@ static inline blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd, return BLK_STS_TARGET; } -static inline void sd_zbc_complete(struct scsi_cmnd *cmd, - unsigned int good_bytes, - struct scsi_sense_hdr *sshdr) {} +static inline unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, + unsigned int good_bytes, struct scsi_sense_hdr *sshdr) +{ + return 0; +} + +static inline blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, + sector_t *lba, + unsigned int nr_blocks) +{ + return BLK_STS_TARGET; +} #define sd_zbc_report_zones NULL diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c index ee156fbf3780..17bdc50d29f3 100644 --- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -19,6 +19,11 @@ #include "sd.h" +static struct kmem_cache *sd_zbc_zone_work_cache; +static mempool_t *sd_zbc_zone_work_pool; + +#define SD_ZBC_ZONE_WORK_MEMPOOL_SIZE 8 + static int sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf, unsigned int idx, report_zones_cb cb, void *data) { @@ -229,6 +234,152 @@ static blk_status_t sd_zbc_cmnd_checks(struct scsi_cmnd *cmd) return BLK_STS_OK; } +#define SD_ZBC_INVALID_WP_OFST ~(0u) +#define SD_ZBC_UPDATING_WP_OFST (SD_ZBC_INVALID_WP_OFST - 1) + +struct sd_zbc_zone_work { + struct work_struct work; + struct scsi_disk *sdkp; + unsigned int zno; + char buf[SD_BUF_SIZE]; +}; + +static int sd_zbc_update_wp_ofst_cb(struct blk_zone *zone, unsigned int idx, + void *data) +{ + struct sd_zbc_zone_work *zwork = data; + struct scsi_disk *sdkp = zwork->sdkp; + struct request_queue *q = sdkp->disk->queue; + int ret; + + spin_lock_bh(&sdkp->zone_wp_ofst_lock); + ret = blk_get_zone_wp_offset(zone, &q->seq_zones_wp_ofst[zwork->zno]); + if (ret) + q->seq_zones_wp_ofst[zwork->zno] = SD_ZBC_INVALID_WP_OFST; + spin_unlock_bh(&sdkp->zone_wp_ofst_lock); + + return ret; +} + +static void sd_zbc_update_wp_ofst_workfn(struct work_struct *work) +{ + struct sd_zbc_zone_work *zwork; + struct scsi_disk *sdkp; + int ret; + + zwork = container_of(work, struct sd_zbc_zone_work, work); + sdkp = zwork->sdkp; + + ret = sd_zbc_do_report_zones(sdkp, zwork->buf, SD_BUF_SIZE, + zwork->zno * sdkp->zone_blocks, true); + if (!ret) + sd_zbc_parse_report(sdkp, zwork->buf + 64, 0, + sd_zbc_update_wp_ofst_cb, zwork); + + mempool_free(zwork, sd_zbc_zone_work_pool); + scsi_device_put(sdkp->device); +} + +static blk_status_t sd_zbc_update_wp_ofst(struct scsi_disk *sdkp, + unsigned int zno) +{ + struct sd_zbc_zone_work *zwork; + + /* + * We are about to schedule work to update a zone write pointer offset, + * which will cause the zone append command to be requeued. So make + * sure that the scsi device does not go away while the work is + * being processed. 
+ */ + if (scsi_device_get(sdkp->device)) + return BLK_STS_IOERR; + + zwork = mempool_alloc(sd_zbc_zone_work_pool, GFP_ATOMIC); + if (!zwork) { + /* Retry later */ + scsi_device_put(sdkp->device); + return BLK_STS_RESOURCE; + } + + memset(zwork, 0, sizeof(struct sd_zbc_zone_work)); + INIT_WORK(&zwork->work, sd_zbc_update_wp_ofst_workfn); + zwork->sdkp = sdkp; + zwork->zno = zno; + + sdkp->disk->queue->seq_zones_wp_ofst[zno] = SD_ZBC_UPDATING_WP_OFST; + + schedule_work(&zwork->work); + + return BLK_STS_RESOURCE; +} + +/** + * sd_zbc_prepare_zone_append() - Prepare an emulated ZONE_APPEND command. + * @cmd: the command to setup + * @lba: the LBA to patch + * @nr_blocks: the number of LBAs to be written + * + * Called from sd_setup_read_write_cmnd() for REQ_OP_ZONE_APPEND. + * sd_zbc_prepare_zone_append() handles the necessary zone write locking and + * patching of the LBA for an emulated ZONE_APPEND command. + * + * In case the cached write pointer offset is %SD_ZBC_INVALID_WP_OFST it will + * schedule a REPORT ZONES command and return BLK_STS_RESOURCE so that the + * command is requeued. + */ +blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, sector_t *lba, + unsigned int nr_blocks) +{ + struct request *rq = cmd->request; + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); + unsigned int wp_ofst, zno = blk_rq_zone_no(rq); + blk_status_t ret; + + ret = sd_zbc_cmnd_checks(cmd); + if (ret != BLK_STS_OK) + return ret; + + if (!blk_rq_zone_is_seq(rq)) + return BLK_STS_IOERR; + + /* Unlock of the write lock will happen in sd_zbc_complete() */ + if (!blk_req_zone_write_trylock(rq)) + return BLK_STS_ZONE_RESOURCE; + + spin_lock_bh(&sdkp->zone_wp_ofst_lock); + + wp_ofst = rq->q->seq_zones_wp_ofst[zno]; + + if (wp_ofst == SD_ZBC_UPDATING_WP_OFST) { + /* Write pointer offset update in progress: ask for a requeue */ + ret = BLK_STS_RESOURCE; + goto err; + } + + if (wp_ofst == SD_ZBC_INVALID_WP_OFST) { + /* Invalid write pointer offset: trigger an update from disk */ + ret = sd_zbc_update_wp_ofst(sdkp, zno); + goto err; + } + + wp_ofst = sectors_to_logical(sdkp->device, wp_ofst); + if (wp_ofst + nr_blocks > sdkp->zone_blocks) { + ret = BLK_STS_IOERR; + goto err; + } + + /* Set the LBA for the write command used to emulate zone append */ + *lba += wp_ofst; + + spin_unlock_bh(&sdkp->zone_wp_ofst_lock); + + return BLK_STS_OK; + +err: + spin_unlock_bh(&sdkp->zone_wp_ofst_lock); + blk_req_zone_write_unlock(rq); + return ret; +} + /** * sd_zbc_setup_zone_mgmt_cmnd - Prepare a zone ZBC_OUT command. The operations * can be RESET WRITE POINTER, OPEN, CLOSE or FINISH. @@ -266,25 +417,75 @@ blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd, cmd->transfersize = 0; cmd->allowed = 0; + /* Only zone reset and zone finish need zone write locking */ + if (op != ZO_RESET_WRITE_POINTER && op != ZO_FINISH_ZONE) + return BLK_STS_OK; + + if (all) { + /* We do not write lock all zones for an all zone reset */ + if (op == ZO_RESET_WRITE_POINTER) + return BLK_STS_OK; + + /* Finishing all zones is not supported */ + return BLK_STS_IOERR; + } + + if (!blk_rq_zone_is_seq(rq)) + return BLK_STS_IOERR; + + if (!blk_req_zone_write_trylock(rq)) + return BLK_STS_ZONE_RESOURCE; + return BLK_STS_OK; } +static inline bool sd_zbc_zone_needs_write_unlock(struct request *rq) +{ + /* + * For zone append, the zone was locked in sd_zbc_prepare_zone_append(). + * For zone reset and zone finish, the zone was locked in + * sd_zbc_setup_zone_mgmt_cmnd(). + * For regular writes, the zone is unlocked by the block layer elevator.
+ */ + return req_op(rq) == REQ_OP_ZONE_APPEND || + req_op(rq) == REQ_OP_ZONE_RESET || + req_op(rq) == REQ_OP_ZONE_FINISH; +} + +static bool sd_zbc_need_zone_wp_update(struct request *rq) +{ + if (req_op(rq) == REQ_OP_WRITE || + req_op(rq) == REQ_OP_WRITE_ZEROES || + req_op(rq) == REQ_OP_WRITE_SAME) + return blk_rq_zone_is_seq(rq); + + if (req_op(rq) == REQ_OP_ZONE_RESET_ALL) + return true; + + return sd_zbc_zone_needs_write_unlock(rq); +} + /** * sd_zbc_complete - ZBC command post processing. * @cmd: Completed command * @good_bytes: Command reply bytes * @sshdr: command sense header * - * Called from sd_done(). Process report zones reply and handle reset zone - * and write commands errors. + * Called from sd_done() to handle zone command errors and to update the + * device queue zone write pointer offset cache. */ -void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, +unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, struct scsi_sense_hdr *sshdr) { int result = cmd->result; struct request *rq = cmd->request; + struct request_queue *q = rq->q; + struct gendisk *disk = rq->rq_disk; + struct scsi_disk *sdkp = scsi_disk(disk); + enum req_opf op = req_op(rq); + unsigned int zno; - if (op_is_zone_mgmt(req_op(rq)) && + if (op_is_zone_mgmt(op) && result && sshdr->sense_key == ILLEGAL_REQUEST && sshdr->asc == 0x24) { @@ -294,7 +495,69 @@ void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, * so be quiet about the error. */ rq->rq_flags |= RQF_QUIET; + goto unlock_zone; + } + + if (!sd_zbc_need_zone_wp_update(rq)) + goto unlock_zone; + + /* + * If we got an error for a command that needs updating the write + * pointer offset cache, we must mark the zone wp offset entry as + * invalid to force an update from disk the next time a zone append + * command is issued. + */ + zno = blk_rq_zone_no(rq); + spin_lock_bh(&sdkp->zone_wp_ofst_lock); + + if (result && op != REQ_OP_ZONE_RESET_ALL) { + if (op == REQ_OP_ZONE_APPEND) { + /* Force complete completion (no retry) */ + good_bytes = 0; + scsi_set_resid(cmd, blk_rq_bytes(rq)); + } + + /* + * Force an update of the zone write pointer offset on + * the next zone append access.
+ */ + if (q->seq_zones_wp_ofst[zno] != SD_ZBC_UPDATING_WP_OFST) + q->seq_zones_wp_ofst[zno] = SD_ZBC_INVALID_WP_OFST; + goto unlock_wp_ofst; } + + switch (op) { + case REQ_OP_ZONE_APPEND: + rq->__sector += q->seq_zones_wp_ofst[zno]; + /* fallthrough */ + case REQ_OP_WRITE_ZEROES: + case REQ_OP_WRITE_SAME: + case REQ_OP_WRITE: + if (q->seq_zones_wp_ofst[zno] < sd_zbc_zone_sectors(sdkp)) + q->seq_zones_wp_ofst[zno] += good_bytes >> SECTOR_SHIFT; + break; + case REQ_OP_ZONE_RESET: + q->seq_zones_wp_ofst[zno] = 0; + break; + case REQ_OP_ZONE_FINISH: + q->seq_zones_wp_ofst[zno] = sd_zbc_zone_sectors(sdkp); + break; + case REQ_OP_ZONE_RESET_ALL: + memset(q->seq_zones_wp_ofst, 0, + sdkp->nr_zones * sizeof(unsigned int)); + break; + default: + break; + } + +unlock_wp_ofst: + spin_unlock_bh(&sdkp->zone_wp_ofst_lock); + +unlock_zone: + if (sd_zbc_zone_needs_write_unlock(rq)) + blk_req_zone_write_unlock(rq); + + return good_bytes; } /** @@ -399,6 +662,7 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf, int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf) { struct gendisk *disk = sdkp->disk; + struct request_queue *q = disk->queue; unsigned int nr_zones; u32 zone_blocks = 0; int ret; @@ -421,9 +685,12 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf) goto err; /* The drive satisfies the kernel restrictions: set it up */ - blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, sdkp->disk->queue); - blk_queue_required_elevator_features(sdkp->disk->queue, - ELEVATOR_F_ZBD_SEQ_WRITE); + blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q); + blk_queue_flag_set(QUEUE_FLAG_ZONE_WP_OFST, q); + blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE); + blk_queue_max_zone_append_sectors(q, + min_t(u32, logical_to_sectors(sdkp->device, zone_blocks), + q->limits.max_segments << (PAGE_SHIFT - SECTOR_SHIFT))); nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks); /* READ16/WRITE16 is mandatory for ZBC disks */ @@ -475,3 +742,38 @@ void sd_zbc_print_zones(struct scsi_disk *sdkp) sdkp->nr_zones, sdkp->zone_blocks); } + +void sd_zbc_init_disk(struct scsi_disk *sdkp) +{ + if (!sd_is_zoned(sdkp)) + return; + + spin_lock_init(&sdkp->zone_wp_ofst_lock); +} + +int __init sd_zbc_init(void) +{ + sd_zbc_zone_work_cache = + kmem_cache_create("sd_zbc_zone_work", + sizeof(struct sd_zbc_zone_work), + 0, 0, NULL); + if (!sd_zbc_zone_work_cache) + return -ENOMEM; + + sd_zbc_zone_work_pool = + mempool_create_slab_pool(SD_ZBC_ZONE_WORK_MEMPOOL_SIZE, + sd_zbc_zone_work_cache); + if (!sd_zbc_zone_work_pool) { + kmem_cache_destroy(sd_zbc_zone_work_cache); + printk(KERN_ERR "sd_zbc: create zone work pool failed\n"); + return -ENOMEM; + } + + return 0; +} + +void sd_zbc_exit(void) +{ + mempool_destroy(sd_zbc_zone_work_pool); + kmem_cache_destroy(sd_zbc_zone_work_cache); +}

From patchwork Fri Mar 27 16:50:09 2020
X-Patchwork-Id: 11462667
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch, linux-scsi@vger.kernel.org, "Martin K. Petersen", linux-fsdevel@vger.kernel.org, Damien Le Moal, Johannes Thumshirn
Subject: [PATCH v3 07/10] null_blk: Cleanup zoned device initialization
Date: Sat, 28 Mar 2020 01:50:09 +0900
Message-Id: <20200327165012.34443-8-johannes.thumshirn@wdc.com>
In-Reply-To: <20200327165012.34443-1-johannes.thumshirn@wdc.com>
References: <20200327165012.34443-1-johannes.thumshirn@wdc.com>

From: Damien Le Moal

Move all zoned mode related code from null_blk_main.c to null_blk_zoned.c, avoiding an ugly #ifdef in the process.
Rename null_zone_init() into null_init_zoned_dev(), null_zone_exit() into null_free_zoned_dev() and add the new function null_register_zoned_dev() to finalize the zoned dev setup before add_disk(). Signed-off-by: Damien Le Moal Signed-off-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig --- drivers/block/null_blk.h | 14 ++++++++++---- drivers/block/null_blk_main.c | 26 ++++++-------------------- drivers/block/null_blk_zoned.c | 21 +++++++++++++++++++-- 3 files changed, 35 insertions(+), 26 deletions(-) diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h index 62b660821dbc..2874463f1d42 100644 --- a/drivers/block/null_blk.h +++ b/drivers/block/null_blk.h @@ -86,8 +86,9 @@ struct nullb { }; #ifdef CONFIG_BLK_DEV_ZONED -int null_zone_init(struct nullb_device *dev); -void null_zone_exit(struct nullb_device *dev); +int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q); +int null_register_zoned_dev(struct nullb *nullb); +void null_free_zoned_dev(struct nullb_device *dev); int null_report_zones(struct gendisk *disk, sector_t sector, unsigned int nr_zones, report_zones_cb cb, void *data); blk_status_t null_handle_zoned(struct nullb_cmd *cmd, @@ -96,12 +97,17 @@ blk_status_t null_handle_zoned(struct nullb_cmd *cmd, size_t null_zone_valid_read_len(struct nullb *nullb, sector_t sector, unsigned int len); #else -static inline int null_zone_init(struct nullb_device *dev) +static inline int null_init_zoned_dev(struct nullb_device *dev, + struct request_queue *q) { pr_err("CONFIG_BLK_DEV_ZONED not enabled\n"); return -EINVAL; } -static inline void null_zone_exit(struct nullb_device *dev) {} +static inline int null_register_zoned_dev(struct nullb *nullb) +{ + return -ENODEV; +} +static inline void null_free_zoned_dev(struct nullb_device *dev) {} static inline blk_status_t null_handle_zoned(struct nullb_cmd *cmd, enum req_opf op, sector_t sector, sector_t nr_sectors) diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c index e9d66cc0d6b9..3e45e3640c12 100644 --- a/drivers/block/null_blk_main.c +++ b/drivers/block/null_blk_main.c @@ -580,7 +580,7 @@ static void null_free_dev(struct nullb_device *dev) if (!dev) return; - null_zone_exit(dev); + null_free_zoned_dev(dev); badblocks_exit(&dev->badblocks); kfree(dev); } @@ -1605,19 +1605,11 @@ static int null_gendisk_register(struct nullb *nullb) disk->queue = nullb->q; strncpy(disk->disk_name, nullb->disk_name, DISK_NAME_LEN); -#ifdef CONFIG_BLK_DEV_ZONED if (nullb->dev->zoned) { - if (queue_is_mq(nullb->q)) { - int ret = blk_revalidate_disk_zones(disk); - if (ret) - return ret; - } else { - blk_queue_chunk_sectors(nullb->q, - nullb->dev->zone_size_sects); - nullb->q->nr_zones = blkdev_nr_zones(disk); - } + int ret = null_register_zoned_dev(nullb); + if (ret) + return ret; } -#endif add_disk(disk); return 0; @@ -1795,14 +1787,9 @@ static int null_add_dev(struct nullb_device *dev) } if (dev->zoned) { - rv = null_zone_init(dev); + rv = null_init_zoned_dev(dev, nullb->q); if (rv) goto out_cleanup_blk_queue; - - nullb->q->limits.zoned = BLK_ZONED_HM; - blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, nullb->q); - blk_queue_required_elevator_features(nullb->q, - ELEVATOR_F_ZBD_SEQ_WRITE); } nullb->q->queuedata = nullb; @@ -1831,8 +1818,7 @@ static int null_add_dev(struct nullb_device *dev) return 0; out_cleanup_zone: - if (dev->zoned) - null_zone_exit(dev); + null_free_zoned_dev(dev); out_cleanup_blk_queue: blk_cleanup_queue(nullb->q); out_cleanup_tags: diff --git a/drivers/block/null_blk_zoned.c 
b/drivers/block/null_blk_zoned.c index ed34785dd64b..8259f3212a28 100644 --- a/drivers/block/null_blk_zoned.c +++ b/drivers/block/null_blk_zoned.c @@ -10,7 +10,7 @@ static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect) return sect >> ilog2(dev->zone_size_sects); } -int null_zone_init(struct nullb_device *dev) +int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q) { sector_t dev_size = (sector_t)dev->size * 1024 * 1024; sector_t sector = 0; @@ -58,10 +58,27 @@ int null_zone_init(struct nullb_device *dev) sector += dev->zone_size_sects; } + q->limits.zoned = BLK_ZONED_HM; + blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q); + blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE); + + return 0; +} + +int null_register_zoned_dev(struct nullb *nullb) +{ + struct request_queue *q = nullb->q; + + if (queue_is_mq(q)) + return blk_revalidate_disk_zones(nullb->disk); + + blk_queue_chunk_sectors(q, nullb->dev->zone_size_sects); + q->nr_zones = blkdev_nr_zones(nullb->disk); + return 0; } -void null_zone_exit(struct nullb_device *dev) +void null_free_zoned_dev(struct nullb_device *dev) { kvfree(dev->zones); }

From patchwork Fri Mar 27 16:50:10 2020
X-Patchwork-Id: 11462683
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch, linux-scsi@vger.kernel.org, "Martin K. Petersen", linux-fsdevel@vger.kernel.org, Damien Le Moal, Johannes Thumshirn
Subject: [PATCH v3 08/10] null_blk: Support REQ_OP_ZONE_APPEND
Date: Sat, 28 Mar 2020 01:50:10 +0900
Message-Id: <20200327165012.34443-9-johannes.thumshirn@wdc.com>
In-Reply-To: <20200327165012.34443-1-johannes.thumshirn@wdc.com>
References: <20200327165012.34443-1-johannes.thumshirn@wdc.com>

From: Damien Le Moal

Support REQ_OP_ZONE_APPEND requests for zoned mode null_blk devices. Use the internally tracked zone write pointer position as the actual write position, which is returned using the command request __sector field in the case of an mq device and using the command BIO sector in the case of a BIO device.

Since the write position is used for the data copy in the case of a memory backed device, reverse the order in which null_handle_zoned() and null_handle_memory_backed() are called, so that null_handle_memory_backed() sees the correct write position for REQ_OP_ZONE_APPEND operations.
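From a submitter's point of view, the contract implemented here is that a zone append BIO is issued against the start sector of a zone and the device (or its emulation) reports back where the data actually landed. A minimal sketch of that round trip, assuming the caller has already allocated the bio and added its pages (names are illustrative, not from the patch):

/* Sketch: issue a zone append and return the actual write position.
 * The block layer convention is that on completion the driver has
 * replaced bi_sector with the sector the data was written at. */
static int zone_append_sync(struct bio *bio, sector_t zone_start,
			    sector_t *written)
{
	int ret;

	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC;
	bio->bi_iter.bi_sector = zone_start;	/* target zone, not a position */
	ret = submit_bio_wait(bio);
	if (!ret)
		*written = bio->bi_iter.bi_sector;	/* where the data landed */
	return ret;
}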
Signed-off-by: Damien Le Moal Signed-off-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig --- drivers/block/null_blk_main.c | 9 +++++--- drivers/block/null_blk_zoned.c | 38 +++++++++++++++++++++++++++------- 2 files changed, 37 insertions(+), 10 deletions(-) diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c index 3e45e3640c12..5492f1e49eee 100644 --- a/drivers/block/null_blk_main.c +++ b/drivers/block/null_blk_main.c @@ -1300,12 +1300,15 @@ static blk_status_t null_handle_cmd(struct nullb_cmd *cmd, sector_t sector, goto out; } + if (dev->zoned) { + cmd->error = null_handle_zoned(cmd, op, sector, nr_sectors); + if (cmd->error != BLK_STS_OK) + goto out; + } + if (dev->memory_backed) cmd->error = null_handle_memory_backed(cmd, op); - if (!cmd->error && dev->zoned) - cmd->error = null_handle_zoned(cmd, op, sector, nr_sectors); - out: nullb_complete_cmd(cmd); return BLK_STS_OK; diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c index 8259f3212a28..f20be7b91b9f 100644 --- a/drivers/block/null_blk_zoned.c +++ b/drivers/block/null_blk_zoned.c @@ -67,13 +67,22 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q) int null_register_zoned_dev(struct nullb *nullb) { + struct nullb_device *dev = nullb->dev; struct request_queue *q = nullb->q; - if (queue_is_mq(q)) - return blk_revalidate_disk_zones(nullb->disk); + if (queue_is_mq(q)) { + int ret = blk_revalidate_disk_zones(nullb->disk); + + if (ret) + return ret; + } else { + blk_queue_chunk_sectors(q, dev->zone_size_sects); + q->nr_zones = blkdev_nr_zones(nullb->disk); + } - blk_queue_chunk_sectors(q, nullb->dev->zone_size_sects); - q->nr_zones = blkdev_nr_zones(nullb->disk); + blk_queue_max_zone_append_sectors(q, + min_t(sector_t, q->limits.max_hw_sectors, + dev->zone_size_sects)); return 0; } @@ -133,7 +142,7 @@ size_t null_zone_valid_read_len(struct nullb *nullb, } static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, - unsigned int nr_sectors) + unsigned int nr_sectors, bool append) { struct nullb_device *dev = cmd->nq->dev; unsigned int zno = null_zone_no(dev, sector); @@ -148,7 +157,20 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, case BLK_ZONE_COND_IMP_OPEN: case BLK_ZONE_COND_EXP_OPEN: case BLK_ZONE_COND_CLOSED: - /* Writes must be at the write pointer position */ + /* + * Regular writes must be at the write pointer position. + * Zone append writes are automatically issued at the write + * pointer and the position returned using the request or BIO + * sector. 
+ */ + if (append) { + sector = zone->wp; + if (cmd->bio) + cmd->bio->bi_iter.bi_sector = sector; + else + cmd->rq->__sector = sector; + } + if (sector != zone->wp) return BLK_STS_IOERR; @@ -228,7 +250,9 @@ blk_status_t null_handle_zoned(struct nullb_cmd *cmd, enum req_opf op, { switch (op) { case REQ_OP_WRITE: - return null_zone_write(cmd, sector, nr_sectors); + return null_zone_write(cmd, sector, nr_sectors, false); + case REQ_OP_ZONE_APPEND: + return null_zone_write(cmd, sector, nr_sectors, true); case REQ_OP_ZONE_RESET: case REQ_OP_ZONE_RESET_ALL: case REQ_OP_ZONE_OPEN:

From patchwork Fri Mar 27 16:50:11 2020
X-Patchwork-Id: 11462681
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch, linux-scsi@vger.kernel.org, "Martin K. Petersen", linux-fsdevel@vger.kernel.org, Johannes Thumshirn
Subject: [PATCH v3 09/10] block: export bio_release_pages and bio_iov_iter_get_pages
Date: Sat, 28 Mar 2020 01:50:11 +0900
Message-Id: <20200327165012.34443-10-johannes.thumshirn@wdc.com>
In-Reply-To: <20200327165012.34443-1-johannes.thumshirn@wdc.com>
References: <20200327165012.34443-1-johannes.thumshirn@wdc.com>

Export bio_release_pages and bio_iov_iter_get_pages so they can be used from modular code.

Signed-off-by: Johannes Thumshirn --- block/bio.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block/bio.c b/block/bio.c index aee214db92d3..023ad8bd26c7 100644 --- a/block/bio.c +++ b/block/bio.c @@ -998,6 +998,7 @@ void bio_release_pages(struct bio *bio, bool mark_dirty) put_page(bvec->bv_page); } } +EXPORT_SYMBOL(bio_release_pages); static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter) { @@ -1111,6 +1112,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) bio_set_flag(bio, BIO_NO_PAGE_REF); return bio->bi_vcnt ? 0 : ret; } +EXPORT_SYMBOL(bio_iov_iter_get_pages); static void submit_bio_wait_endio(struct bio *bio) {
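With these exports in place, modular code such as a file system can build BIOs directly from an iov_iter. A minimal sketch of the intended usage pattern (the function name is illustrative and error handling is trimmed to the essentials):

/* Sketch: map user pages from an iov_iter into a bio, submit it, and
 * release the pages again, using the two newly exported helpers. */
static int map_iter_and_submit(struct bio *bio, struct iov_iter *iter)
{
	int ret;

	ret = bio_iov_iter_get_pages(bio, iter);	/* pins the pages */
	if (ret)
		return ret;
	ret = submit_bio_wait(bio);
	bio_release_pages(bio, false);			/* drops the page refs */
	return ret;
}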
From patchwork Fri Mar 27 16:50:12 2020
X-Patchwork-Id: 11462691
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch, linux-scsi@vger.kernel.org, "Martin K. Petersen", linux-fsdevel@vger.kernel.org, Johannes Thumshirn
Subject: [PATCH v3 10/10] zonefs: use REQ_OP_ZONE_APPEND for sync DIO
Date: Sat, 28 Mar 2020 01:50:12 +0900
Message-Id: <20200327165012.34443-11-johannes.thumshirn@wdc.com>
In-Reply-To: <20200327165012.34443-1-johannes.thumshirn@wdc.com>
References: <20200327165012.34443-1-johannes.thumshirn@wdc.com>

Synchronous direct I/O to a sequential write only zone can be issued using the new REQ_OP_ZONE_APPEND request operation. As dispatching multiple BIOs can potentially result in reordering, we cannot support asynchronous IO via this interface. We also can only dispatch up to queue_max_zone_append_sectors() per BIO, so an IO larger than this limit is truncated and reported back to user-space as a short write.
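A worked example with hypothetical numbers: if queue_max_zone_append_sectors() is 1024 (512 B sectors, i.e. 512 KiB) and the file system block size is 4 KiB, a 1 MiB synchronous write is truncated to ALIGN_DOWN(1024 << 9, 4096) = 524288 bytes. The write then returns 524288, and user space is expected to resubmit the remaining 524288 bytes as a follow-up write.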
Signed-off-by: Johannes Thumshirn --- fs/zonefs/super.c | 92 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 90 insertions(+), 2 deletions(-) diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 69aee3dfb660..b5432861d62e 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -20,6 +20,7 @@ #include #include #include +#include <linux/task_io_accounting_ops.h> #include "zonefs.h" @@ -582,6 +583,89 @@ static const struct iomap_dio_ops zonefs_write_dio_ops = { .end_io = zonefs_file_write_dio_end_io, }; +static void zonefs_zone_append_bio_endio(struct bio *bio) +{ + struct task_struct *waiter = bio->bi_private; + + WRITE_ONCE(bio->bi_private, NULL); + blk_wake_io_task(waiter); + + bio_release_pages(bio, false); + bio_put(bio); +} + +static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from) +{ + struct inode *inode = file_inode(iocb->ki_filp); + struct zonefs_inode_info *zi = ZONEFS_I(inode); + struct block_device *bdev = inode->i_sb->s_bdev; + ssize_t ret = 0; + ssize_t size; + struct bio *bio; + unsigned max; + int nr_pages; + blk_qc_t qc; + + nr_pages = iov_iter_npages(from, BIO_MAX_PAGES); + if (!nr_pages) + return 0; + + max = queue_max_zone_append_sectors(bdev_get_queue(bdev)) << 9; + max = ALIGN_DOWN(max, inode->i_sb->s_blocksize); + iov_iter_truncate(from, max); + + bio = bio_alloc_bioset(GFP_NOFS, nr_pages, &fs_bio_set); + if (!bio) + return -ENOMEM; + + bio_set_dev(bio, bdev); + bio->bi_iter.bi_sector = zi->i_zsector; + bio->bi_write_hint = iocb->ki_hint; + bio->bi_private = current; + bio->bi_end_io = zonefs_zone_append_bio_endio; + bio->bi_ioprio = iocb->ki_ioprio; + bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE; + if (iocb->ki_flags & IOCB_DSYNC) + bio->bi_opf |= REQ_FUA; + + ret = bio_iov_iter_get_pages(bio, from); + if (unlikely(ret)) { + bio->bi_status = BLK_STS_IOERR; + bio_endio(bio); + return ret; + } + size = bio->bi_iter.bi_size; + task_io_account_write(size); + + if (iocb->ki_flags & IOCB_HIPRI) + bio_set_polled(bio, iocb); + + bio_get(bio); + qc = submit_bio(bio); + for (;;) { + set_current_state(TASK_UNINTERRUPTIBLE); + if (!READ_ONCE(bio->bi_private)) + break; + if (!(iocb->ki_flags & IOCB_HIPRI) || + !blk_poll(bdev_get_queue(bdev), qc, true)) + io_schedule(); + } + __set_current_state(TASK_RUNNING); + + if (unlikely(bio->bi_status)) + ret = blk_status_to_errno(bio->bi_status); + + bio_put(bio); + + zonefs_file_write_dio_end_io(iocb, size, ret, 0); + if (ret >= 0) { + iocb->ki_pos += size; + return size; + } + + return ret; +} + /* * Handle direct writes. For sequential zone files, this is the only possible * write path. For these files, check that the user is issuing writes @@ -599,6 +683,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) struct super_block *sb = inode->i_sb; size_t count; ssize_t ret; + bool sync = is_sync_kiocb(iocb); /* * For async direct IOs to sequential zone files, refuse IOCB_NOWAIT @@ -637,8 +722,11 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) } mutex_unlock(&zi->i_truncate_mutex); - ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops, - &zonefs_write_dio_ops, is_sync_kiocb(iocb)); + if (sync && zi->i_ztype == ZONEFS_ZTYPE_SEQ) + ret = zonefs_file_dio_append(iocb, from); + else + ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops, + &zonefs_write_dio_ops, sync); if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && (ret > 0 || ret == -EIOCBQUEUED)) { if (ret > 0)