From patchwork Tue Dec 22 03:48:54 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985639
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn
Subject: [PATCH v11 01/40] block: add bio_add_zone_append_page
Date: Tue, 22 Dec 2020 12:48:54 +0900
Message-Id: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
is intended to be used by file systems that directly add pages to a bio
instead of using bio_iov_iter_get_pages().

Cc: Jens Axboe
Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
---
 block/bio.c         | 35 +++++++++++++++++++++++++++++++++++
 include/linux/bio.h |  2 ++
 2 files changed, 37 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index fa01bef35bb1..a6e482c8f43f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -851,6 +851,41 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio,
 }
 EXPORT_SYMBOL(bio_add_pc_page);
 
+/**
+ * bio_add_zone_append_page - attempt to add page to zone-append bio
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist of a bio that will be submitted
+ * for a zone-append request. This can fail for a number of reasons, such as the
+ * bio being full, the target block device not being a zoned block device, or
+ * other limitations of the target block device. The target block device must
+ * allow bios up to PAGE_SIZE, so it is always possible to add a single page
+ * to an empty bio.
+ *
+ * Returns: number of bytes added to the bio, or 0 in case of a failure.
+ */
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset)
+{
+	struct request_queue *q;
+	bool same_page = false;
+
+	if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND))
+		return 0;
+
+	q = bio->bi_disk->queue;
+
+	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
+		return 0;
+
+	return bio_add_hw_page(q, bio, page, len, offset,
+			       queue_max_zone_append_sectors(q), &same_page);
+}
+EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
+
 /**
  * __bio_try_merge_page - try appending data to an existing bvec.
  * @bio: destination bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c6d765382926..7ef300cb4e9a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -442,6 +442,8 @@ void bio_chain(struct bio *, struct bio *);
 extern int bio_add_page(struct bio *, struct page *, unsigned int, unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 			   unsigned int, unsigned int);
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset);
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off, bool *same_page);
 void __bio_add_page(struct bio *bio, struct page *page,

From patchwork Tue Dec 22 03:48:55 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985641
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 02/40] iomap: support REQ_OP_ZONE_APPEND
Date: Tue, 22 Dec 2020 12:48:55 +0900
Message-Id: <33bbb544385b7710f29c03b06699755def39319a.1608608848.git.naohiro.aota@wdc.com>
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

A ZONE_APPEND bio must follow hardware restrictions (e.g. not exceeding
max_zone_append_sectors) so that it is not split. bio_iov_iter_get_pages()
builds such a restricted bio using __bio_iov_append_get_pages() if
bio_op(bio) == REQ_OP_ZONE_APPEND. To utilize it, we need to set the
bio_op before calling bio_iov_iter_get_pages().

This commit introduces IOMAP_F_ZONE_APPEND, so that iomap users can set
the flag to indicate that they want REQ_OP_ZONE_APPEND and a restricted
bio.

Reviewed-by: Christoph Hellwig
Signed-off-by: Naohiro Aota
Reviewed-by: Darrick J. Wong
---
 fs/iomap/direct-io.c  | 43 +++++++++++++++++++++++++++++++++++++------
 include/linux/iomap.h |  1 +
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 933f234d5bec..2273120d8ed7 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -201,6 +201,34 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	iomap_dio_submit_bio(dio, iomap, bio, pos);
 }
 
+/*
+ * Figure out the bio's operation flags from the dio request, the
+ * mapping, and whether or not we want FUA. Note that we can end up
+ * clearing the WRITE_FUA flag in the dio request.
+ */
+static inline unsigned int
+iomap_dio_bio_opflags(struct iomap_dio *dio, struct iomap *iomap, bool use_fua)
+{
+	unsigned int opflags = REQ_SYNC | REQ_IDLE;
+
+	if (!(dio->flags & IOMAP_DIO_WRITE)) {
+		WARN_ON_ONCE(iomap->flags & IOMAP_F_ZONE_APPEND);
+		return REQ_OP_READ;
+	}
+
+	if (iomap->flags & IOMAP_F_ZONE_APPEND)
+		opflags |= REQ_OP_ZONE_APPEND;
+	else
+		opflags |= REQ_OP_WRITE;
+
+	if (use_fua)
+		opflags |= REQ_FUA;
+	else
+		dio->flags &= ~IOMAP_DIO_WRITE_FUA;
+
+	return opflags;
+}
+
 static loff_t
 iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		struct iomap_dio *dio, struct iomap *iomap)
@@ -208,6 +236,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
 	unsigned int fs_block_size = i_blocksize(inode), pad;
 	unsigned int align = iov_iter_alignment(dio->submit.iter);
+	unsigned int bio_opf;
 	struct bio *bio;
 	bool need_zeroout = false;
 	bool use_fua = false;
@@ -263,6 +292,13 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		iomap_dio_zero(dio, iomap, pos - pad, pad);
 	}
 
+	/*
+	 * Set the operation flags early so that bio_iov_iter_get_pages
+	 * can set up the page vector appropriately for a ZONE_APPEND
+	 * operation.
+	 */
+	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua);
+
 	do {
 		size_t n;
 		if (dio->error) {
@@ -278,6 +314,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		bio->bi_ioprio = dio->iocb->ki_ioprio;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
+		bio->bi_opf = bio_opf;
 
 		ret = bio_iov_iter_get_pages(bio, dio->submit.iter);
 		if (unlikely(ret)) {
@@ -293,14 +330,8 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		n = bio->bi_iter.bi_size;
 		if (dio->flags & IOMAP_DIO_WRITE) {
-			bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
-			if (use_fua)
-				bio->bi_opf |= REQ_FUA;
-			else
-				dio->flags &= ~IOMAP_DIO_WRITE_FUA;
 			task_io_account_write(n);
 		} else {
-			bio->bi_opf = REQ_OP_READ;
 			if (dio->flags & IOMAP_DIO_DIRTY)
 				bio_set_pages_dirty(bio);
 		}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 5bd3cac4df9c..8ebb1fa6f3b7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -55,6 +55,7 @@ struct vm_fault;
 #define IOMAP_F_SHARED		0x04
 #define IOMAP_F_MERGED		0x08
 #define IOMAP_F_BUFFER_HEAD	0x10
+#define IOMAP_F_ZONE_APPEND	0x20
 
 /*
  * Flags set by the core iomap code during operations:

From patchwork Tue Dec 22 03:48:56 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985643
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 03/40] btrfs: defer loading zone info after opening trees
Date: Tue, 22 Dec 2020 12:48:56 +0900
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

This is a preparation patch for implementing zone emulation on a regular
device.

To emulate the zoned mode on a regular (non-zoned) device, we need to
decide an emulated zone size. Instead of making it a compile-time static
value, we'll make it configurable at mkfs time. Since we have the one
zone == one device extent restriction, we can determine the emulated zone
size from the size of a device extent.

Once the zone size is decided, we can extend btrfs_get_dev_zone_info() to
present a regular device as if it were filled with conventional zones.
However, the current call site of btrfs_get_dev_zone_info() during the
mount process is earlier than reading the trees, so at that point we
cannot slice a regular device into conventional zones. This patch defers
the loading of zone info to open_ctree(), so that the emulated zone size
can be loaded from a device extent.
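The iteration that this patch adds in btrfs_get_dev_zone_info_all_devices() can be sketched as a small userspace model. Everything below (the struct names, the fields, the stand-in for the per-device load) is invented for illustration and is not kernel API; it only mirrors the three behaviors described above: do nothing without the ZONED incompat flag, skip missing devices, and fail fast on the first per-device error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of btrfs_get_dev_zone_info_all_devices(). */
struct model_device {
	void *bdev;      /* NULL models a missing device */
	int load_result; /* result of loading this device's zone info */
};

struct model_fs {
	int zoned_incompat; /* models btrfs_fs_incompat(fs_info, ZONED) */
	struct model_device *devices;
	size_t num_devices;
};

static int load_all_zone_info(struct model_fs *fs)
{
	size_t i;

	if (!fs->zoned_incompat)
		return 0; /* nothing to do on a non-zoned filesystem */

	for (i = 0; i < fs->num_devices; i++) {
		if (!fs->devices[i].bdev)
			continue; /* skip zone info for missing devices */
		if (fs->devices[i].load_result)
			return fs->devices[i].load_result; /* fail fast */
	}
	return 0;
}
```

A missing device with a pending error is skipped, while an error on a present device stops the whole scan, matching the `continue`/`break` pattern in the real function.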
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/disk-io.c | 13 +++++++++++++
 fs/btrfs/volumes.c |  4 ----
 fs/btrfs/zoned.c   | 24 ++++++++++++++++++++++++
 fs/btrfs/zoned.h   |  7 +++++++
 4 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 948661554db4..e7b451d30ae2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3257,6 +3257,19 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (ret)
 		goto fail_tree_roots;
 
+	/*
+	 * Get zone type information of zoned block devices. This will also
+	 * handle emulation of the zoned mode for btrfs if a regular device has
+	 * the zoned incompat feature flag set.
+	 */
+	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
+	if (ret) {
+		btrfs_err(fs_info,
+			  "failed to read device zone info: %d", ret);
+		goto fail_block_groups;
+	}
+
 	/*
 	 * If we have a uuid root and we're not being told to rescan we need to
 	 * check the generation here so we can set the
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2c0aa03b6437..7d92b11ea603 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -669,10 +669,6 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	device->mode = flags;
 
-	ret = btrfs_get_dev_zone_info(device);
-	if (ret != 0)
-		goto error_free_page;
-
 	fs_devices->open_devices++;
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
 	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 155545180046..90b8d1d5369f 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -143,6 +143,30 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
 	return 0;
 }
 
+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	int ret = 0;
+
+	if (!btrfs_fs_incompat(fs_info, ZONED))
+		return 0;
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		/* We can skip reading of zone info for missing devices */
+		if (!device->bdev)
+			continue;
+
+		ret = btrfs_get_dev_zone_info(device);
+		if (ret)
+			break;
+	}
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	return ret;
+}
+
 int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
 	struct btrfs_zoned_device_info *zone_info = NULL;
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 8abe2f83272b..5e0e7de84a82 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -25,6 +25,7 @@ struct btrfs_zoned_device_info {
 #ifdef CONFIG_BLK_DEV_ZONED
 int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 		       struct blk_zone *zone);
+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info);
 int btrfs_get_dev_zone_info(struct btrfs_device *device);
 void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
 int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info);
@@ -42,6 +43,12 @@ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 	return 0;
 }
 
+static inline int btrfs_get_dev_zone_info_all_devices(
+		struct btrfs_fs_info *fs_info)
+{
+	return 0;
+}
+
 static inline int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
 	return 0;

From patchwork Tue Dec 22 03:48:57 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985645
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 04/40] btrfs: change superblock location on conventional zone
Date: Tue, 22 Dec 2020 12:48:57 +0900
Message-Id: <42c1712556e6865837151ad58252fb5f6ecff8f7.1608608848.git.naohiro.aota@wdc.com>
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

We cannot use log-structured superblock writing in conventional zones,
since there is no write pointer to determine the last written superblock
position. So, we write the superblock at a static location in a
conventional zone. That position is at the beginning of a zone, which
differs from the superblock position of regular btrfs.

This difference causes a chicken-and-egg problem when supporting zoned
emulation on a regular device: to know whether a filesystem is (emulated)
zoned btrfs, we need to load a superblock and check the feature flag, but
to load the superblock, we need to know that it is zoned btrfs so we can
load it from the different position.

This patch moves the superblock location on conventional zones so that
the first superblock location is the same as on regular btrfs.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/zoned.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 90b8d1d5369f..e5619c8bcebb 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -465,7 +465,8 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
 	int ret;
 
 	if (zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL) {
-		*bytenr_ret = zones[0].start << SECTOR_SHIFT;
+		*bytenr_ret = (zones[0].start << SECTOR_SHIFT) +
+			      btrfs_sb_offset(0);
 		return 0;
 	}

From patchwork Tue Dec 22 03:48:58 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985647
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn
Subject: [PATCH v11 05/40] btrfs: release path before calling into btrfs_load_block_group_zone_info
Date: Tue, 22 Dec 2020 12:48:58 +0900
Message-Id: <8fa3073375cf26759f9c5d3ce083c64d573ad9a6.1608608848.git.naohiro.aota@wdc.com>
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

Since we have no write pointer in conventional zones, we cannot determine
the allocation offset from it. Instead, we set the allocation offset
after the highest addressed extent. This is done by reading the extent
tree in btrfs_load_block_group_zone_info(). However, this function is
called from btrfs_read_block_groups(), so the read lock for the tree node
could be taken recursively.
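The hazard described above can be sketched as a userspace model. Everything in this sketch is invented for illustration (a plain flag stands in for the non-recursive tree-node read lock, and the two functions model the before/after call ordering); it is not kernel code, but it shows why releasing the path before the nested tree read matters.

```c
#include <assert.h>

/* A non-recursive lock, modeled as a flag: taking it twice fails. */
static int node_locked;

static int try_take(void)
{
	if (node_locked)
		return 0;
	node_locked = 1;
	return 1;
}

static void release(void)
{
	node_locked = 0;
}

/* Models btrfs_load_block_group_zone_info(): it reads the extent tree,
 * so it needs to take the tree-node lock itself. */
static int load_zone_info(void)
{
	if (!try_take())
		return -1; /* recursive take: would deadlock in the kernel */
	release();
	return 0;
}

/* Before the patch: the path (lock) is still held across the call. */
static int read_block_group_unpatched(void)
{
	int ret;

	if (!try_take()) /* read_block_group_item() holds the path */
		return -1;
	ret = load_zone_info(); /* nested take fails */
	release();
	return ret;
}

/* After the patch: release the path first, as read_one_block_group()
 * now calls btrfs_release_path() right after read_block_group_item(). */
static int read_block_group_patched(void)
{
	if (!try_take())
		return -1;
	release();              /* btrfs_release_path(path) */
	return load_zone_info(); /* safe: the lock is free again */
}
```

With the lock held across the nested read, the model fails exactly where the kernel would recurse on the tree-node lock; releasing first makes the nested read succeed.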
Signed-off-by: Johannes Thumshirn
---
 fs/btrfs/block-group.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index b8bbdd95743e..69e1b24bbbad 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1839,6 +1839,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 		return -ENOMEM;
 
 	read_block_group_item(cache, path, key);
+	btrfs_release_path(path);
 
 	set_free_space_tree_thresholds(cache);
@@ -2009,7 +2010,6 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 			goto error;
 		key.objectid += key.offset;
 		key.offset = 0;
-		btrfs_release_path(path);
 	}
 	btrfs_release_path(path);

From patchwork Tue Dec 22 03:48:59 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985649
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn
Subject: [PATCH v11 06/40] btrfs: do not load fs_info->zoned from incompat flag
Date: Tue, 22 Dec 2020 12:48:59 +0900
X-Mailer: git-send-email 2.27.0
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

Since fs_info->zoned is unioned with fs_info->zone_size, loading
fs_info->zoned from the incompat flag screws up the zone_size. So, avoid
loading it from the flag. It will eventually be set by
btrfs_get_dev_zone_info_all_devices().

Signed-off-by: Johannes Thumshirn
Reviewed-by: Josef Bacik
---
 fs/btrfs/disk-io.c | 2 --
 fs/btrfs/zoned.c   | 8 ++++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e7b451d30ae2..192e366f8afc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3136,8 +3136,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
 		btrfs_info(fs_info, "has skinny extents");
 
-	fs_info->zoned = (features & BTRFS_FEATURE_INCOMPAT_ZONED);
-
 	/*
 	 * flag our filesystem as having big metadata blocks if
 	 * they are bigger than the page size
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index e5619c8bcebb..ae566a7da088 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -431,6 +431,14 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	fs_info->zone_size = zone_size;
 	fs_info->max_zone_append_size = max_zone_append_size;
 
+	/*
+	 * Check mount options here, because we might change fs_info->zoned
+	 * from fs_info->zone_size.
+	 */
+	ret = btrfs_check_mountopts_zoned(fs_info);
+	if (ret)
+		goto out;
+
 	btrfs_info(fs_info, "zoned mode enabled with zone size %llu", zone_size);
 out:
 	return ret;

From patchwork Tue Dec 22 03:49:00 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985651
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 07/40] btrfs: disallow fitrim in ZONED mode
Date: Tue, 22 Dec 2020 12:49:00 +0900
Message-Id: <7e1a3b008e0ded5b0ea1a86ec842618c2bcac56a.1608608848.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

The implementation of fitrim depends on the space cache, which is not used
and is disabled for the zoned btrfs extent allocator, so the current code
does not work with zoned btrfs. In the future, we can implement fitrim for
zoned btrfs by enabling the space cache (only for fitrim) or by scanning
the extent tree at fitrim time. But, for now, disallow fitrim in ZONED
mode.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/ioctl.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5b9b0a390f0e..6df362081478 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -527,6 +527,14 @@ static noinline int btrfs_ioctl_fitrim(struct btrfs_fs_info *fs_info,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	/*
+	 * btrfs_trim_block_group() is depending on space cache, which is
+	 * not available in ZONED mode. So, disallow fitrim in ZONED mode
+	 * for now.
+	 */
+	if (fs_info->zoned)
+		return -EOPNOTSUPP;
+
 	/*
 	 * If the fs is mounted with nologreplay, which requires it to be
 	 * mounted in RO mode as well, we can not allow discard on free space

From patchwork Tue Dec 22 03:49:01 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985653
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn, Naohiro Aota
Subject: [PATCH v11 08/40] btrfs: emulated zoned mode on non-zoned devices
Date: Tue, 22 Dec 2020 12:49:01 +0900
X-Mailer: git-send-email 2.27.0
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

Emulate zoned btrfs mode on non-zoned devices. This is done by "slicing
up" the block device into statically sized chunks and faking a
conventional zone on each of them. The emulated zone size is determined
from the size of a device extent.

This is mainly aimed at testing parts of the zoned mode, i.e. the zoned
chunk allocator, on regular block devices.

Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/dev-replace.c |   3 +
 fs/btrfs/volumes.c     |  14 +++++
 fs/btrfs/volumes.h     |   3 +
 fs/btrfs/zoned.c       | 121 +++++++++++++++++++++++++++++++++++++----
 fs/btrfs/zoned.h       |  14 +++--
 5 files changed, 139 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 324f646d6e5e..e77cb46bf15d 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -321,6 +321,9 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
 	device->fs_devices = fs_info->fs_devices;
 
+	if (btrfs_is_zoned(fs_info) && bdev_zoned_model(bdev) == BLK_ZONED_NONE)
+		device->force_zoned = true;
+
 	ret = btrfs_get_dev_zone_info(device);
 	if (ret)
 		goto error;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7d92b11ea603..2cdb5fe3e423 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -669,6 +669,15 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	device->mode = flags;
 
+	/* Emulate zoned mode on regular device? */
+	if ((btrfs_super_incompat_flags(disk_super) &
+	     BTRFS_FEATURE_INCOMPAT_ZONED) &&
+	    bdev_zoned_model(device->bdev) == BLK_ZONED_NONE) {
+		btrfs_info(NULL,
+"zoned: incompat zoned flag detected on regular device, forcing zoned mode emulation");
+		device->force_zoned = true;
+	}
+
 	fs_devices->open_devices++;
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
 	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
@@ -2562,6 +2571,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	device->fs_info = fs_info;
 	device->bdev = bdev;
 
+	/* Zoned mode is enabled. Emulate zoned device on a regular device. */
+	if (btrfs_is_zoned(fs_info) &&
+	    bdev_zoned_model(device->bdev) == BLK_ZONED_NONE)
+		device->force_zoned = true;
+
 	ret = btrfs_get_dev_zone_info(device);
 	if (ret)
 		goto error_free_device;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1997a4649a66..59d9d47f173d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -144,6 +144,9 @@ struct btrfs_device {
 	struct completion kobj_unregister;
 	/* For sysfs/FSID/devinfo/devid/ */
 	struct kobject devid_kobj;
+
+	/* Force zoned mode */
+	bool force_zoned;
 };
 
 /*
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index ae566a7da088..fc43a650cd79 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -119,6 +119,32 @@ static inline u32 sb_zone_number(int shift, int mirror)
 	return 0;
 }
 
+static int emulate_report_zones(struct btrfs_device *device, u64 pos,
+				struct blk_zone *zones, unsigned int nr_zones)
+{
+	const sector_t zone_sectors =
+		device->fs_info->zone_size >> SECTOR_SHIFT;
+	sector_t bdev_size = device->bdev->bd_part->nr_sects;
+	unsigned int i;
+
+	pos >>= SECTOR_SHIFT;
+	for (i = 0; i < nr_zones; i++) {
+		zones[i].start = i * zone_sectors + pos;
+		zones[i].len = zone_sectors;
+		zones[i].capacity = zone_sectors;
+		zones[i].wp = zones[i].start + zone_sectors;
+		zones[i].type = BLK_ZONE_TYPE_CONVENTIONAL;
+		zones[i].cond = BLK_ZONE_COND_NOT_WP;
+
+		if (zones[i].wp >= bdev_size) {
+			i++;
+			break;
+		}
+	}
+
+	return i;
+}
+
 static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
 			       struct blk_zone *zones, unsigned int *nr_zones)
 {
@@ -127,6 +153,12 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
 	if (!*nr_zones)
 		return 0;
 
+	if (device->force_zoned) {
+		ret = emulate_report_zones(device, pos, zones, *nr_zones);
+		*nr_zones = ret;
+		return 0;
+	}
+
 	ret = blkdev_report_zones(device->bdev, pos >> SECTOR_SHIFT, *nr_zones,
 				  copy_zone_info_cb, zones);
 	if (ret < 0) {
@@ -143,6 +175,49 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
 	return 0;
 }
 
+static int calculate_emulated_zone_size(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_path *path;
+	struct btrfs_root *root = fs_info->dev_root;
+	struct btrfs_key key;
+	struct extent_buffer *leaf;
+	struct btrfs_dev_extent *dext;
+	int ret = 0;
+
+	key.objectid = 1;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+	key.offset = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+
+	if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
+		ret = btrfs_next_item(root, path);
+		if (ret < 0)
+			goto out;
+		/* No dev extents at all? Not good */
+		if (ret > 0) {
+			ret = -EUCLEAN;
+			goto out;
+		}
+	}
+
+	leaf = path->nodes[0];
+	dext = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_dev_extent);
+	fs_info->zone_size = btrfs_dev_extent_length(leaf, dext);
+	ret = 0;
+
+out:
+	btrfs_free_path(path);
+
+	return ret;
+}
+
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
@@ -158,6 +233,12 @@ int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
 		if (!device->bdev)
 			continue;
 
+		if (device->force_zoned && !fs_info->zone_size) {
+			ret = calculate_emulated_zone_size(fs_info);
+			if (ret)
+				break;
+		}
+
 		ret = btrfs_get_dev_zone_info(device);
 		if (ret)
 			break;
@@ -177,9 +258,11 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 	struct blk_zone *zones = NULL;
 	unsigned int i, nreported = 0, nr_zones;
 	unsigned int zone_sectors;
+	const bool force_zoned = device->force_zoned;
+	char *model, *emulated;
 	int ret;
 
-	if (!bdev_is_zoned(bdev))
+	if (!bdev_is_zoned(bdev) && !force_zoned)
 		return 0;
 
 	if (device->zone_info)
@@ -189,8 +272,12 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 	if (!zone_info)
 		return -ENOMEM;
 
+	if (force_zoned)
+		zone_sectors = device->fs_info->zone_size >> SECTOR_SHIFT;
+	else
+		zone_sectors = bdev_zone_sectors(bdev);
+
 	nr_sectors = bdev->bd_part->nr_sects;
-	zone_sectors = bdev_zone_sectors(bdev);
 	/* Check if it's power of 2 (see is_power_of_2) */
 	ASSERT(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);
 	zone_info->zone_size = zone_sectors << SECTOR_SHIFT;
@@ -296,12 +383,22 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 
 	device->zone_info = zone_info;
 
-	/* device->fs_info is not safe to use for printing messages */
-	btrfs_info_in_rcu(NULL,
-		"host-%s zoned block device %s, %u zones of %llu bytes",
-		bdev_zoned_model(bdev) == BLK_ZONED_HM ? "managed" : "aware",
-		rcu_str_deref(device->name), zone_info->nr_zones,
-		zone_info->zone_size);
+	if (bdev_zoned_model(bdev) == BLK_ZONED_HM) {
+		model = "host-managed zoned";
+		emulated = "";
+	} else if (bdev_zoned_model(bdev) == BLK_ZONED_HA) {
+		model = "host-aware zoned";
+		emulated = "";
+	} else if (bdev_zoned_model(bdev) == BLK_ZONED_NONE &&
+		   device->force_zoned) {
+		model = "regular";
+		emulated = "emulated ";
+	}
+
+	btrfs_info_in_rcu(device->fs_info,
+		"%s block device %s, %u %szones of %llu bytes",
+		model, rcu_str_deref(device->name), zone_info->nr_zones,
+		emulated, zone_info->zone_size);
 
 	return 0;
 
@@ -348,7 +445,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	u64 nr_devices = 0;
 	u64 zone_size = 0;
 	u64 max_zone_append_size = 0;
-	const bool incompat_zoned = btrfs_is_zoned(fs_info);
+	const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED);
 	int ret = 0;
 
 	/* Count zoned devices */
@@ -360,8 +457,10 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 		model = bdev_zoned_model(device->bdev);
 		if (model == BLK_ZONED_HM ||
-		    (model == BLK_ZONED_HA && incompat_zoned)) {
-			struct btrfs_zoned_device_info *zone_info;
+		    (model == BLK_ZONED_HA && incompat_zoned) ||
+		    device->force_zoned) {
+			struct btrfs_zoned_device_info *zone_info =
+				device->zone_info;
 
 			zone_info = device->zone_info;
 			zoned_devices++;
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 5e0e7de84a82..058a57317c05 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -143,12 +143,16 @@ static inline void btrfs_dev_clear_zone_empty(struct btrfs_device *device, u64 p
 static inline bool btrfs_check_device_zone_type(const struct btrfs_fs_info *fs_info,
 						struct block_device *bdev)
 {
-	u64 zone_size;
-
 	if (btrfs_is_zoned(fs_info)) {
-		zone_size = bdev_zone_sectors(bdev) << SECTOR_SHIFT;
-		/* Do not allow non-zoned device */
-		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
+		/*
+		 * We can allow a regular device on a zoned btrfs, because
+		 * we will emulate zoned device on the regular device.
+		 */
+		if (!bdev_is_zoned(bdev))
+			return true;
+
+		return fs_info->zone_size ==
+			(bdev_zone_sectors(bdev) << SECTOR_SHIFT);
 	}
 
 	/* Do not allow Host Manged zoned device */

From patchwork Tue Dec 22 03:49:02 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985655
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 09/40] btrfs: implement zoned chunk allocator
Date: Tue, 22 Dec 2020 12:49:02 +0900
Message-Id: <6c977b7099812637cff36d09ac1a8e6ea2e00519.1608608848.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

This commit implements a zoned chunk/dev_extent allocator. The zoned
allocator aligns the device extents to zone boundaries, so that a zone
reset affects only the device extent and does not change the state of
blocks in neighboring device extents. Also, it checks that a region
allocation does not overlap any of the super block zones, and ensures the
region is empty.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/volumes.c | 169 ++++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/volumes.h |   1 +
 fs/btrfs/zoned.c   | 144 ++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h   |  25 +++++++
 4 files changed, 323 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2cdb5fe3e423..19c76cf9d2d2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1424,11 +1424,62 @@ static u64 dev_extent_search_start(struct btrfs_device *device, u64 start)
 		 * make sure to start at an offset of at least 1MB.
 		 */
 		return max_t(u64, start, SZ_1M);
+	case BTRFS_CHUNK_ALLOC_ZONED:
+		/*
+		 * We don't care about the starting region like regular
+		 * allocator, because we anyway use/reserve the first two
+		 * zones for superblock logging.
+		 */
+		return ALIGN(start, device->zone_info->zone_size);
 	default:
 		BUG();
 	}
 }
 
+static bool dev_extent_hole_check_zoned(struct btrfs_device *device,
+					u64 *hole_start, u64 *hole_size,
+					u64 num_bytes)
+{
+	u64 zone_size = device->zone_info->zone_size;
+	u64 pos;
+	int ret;
+	int changed = 0;
+
+	ASSERT(IS_ALIGNED(*hole_start, zone_size));
+
+	while (*hole_size > 0) {
+		pos = btrfs_find_allocatable_zones(device, *hole_start,
+						   *hole_start + *hole_size,
+						   num_bytes);
+		if (pos != *hole_start) {
+			*hole_size = *hole_start + *hole_size - pos;
+			*hole_start = pos;
+			changed = 1;
+			if (*hole_size < num_bytes)
+				break;
+		}
+
+		ret = btrfs_ensure_empty_zones(device, pos, num_bytes);
+
+		/* Range is ensured to be empty */
+		if (!ret)
+			return changed;
+
+		/* Given hole range was invalid (outside of device) */
+		if (ret == -ERANGE) {
+			*hole_start += *hole_size;
+			*hole_size = 0;
+			return 1;
+		}
+
+		*hole_start += zone_size;
+		*hole_size -= zone_size;
+		changed = 1;
+	}
+
+	return changed;
+}
+
 /**
  * dev_extent_hole_check - check if specified hole is suitable for allocation
  * @device:	the device which we have the hole
@@ -1445,24 +1496,39 @@ static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
 	bool changed = false;
 	u64 hole_end = *hole_start + *hole_size;
 
-	/*
-	 * Check before we set max_hole_start, otherwise we could end up
-	 * sending back this offset anyway.
-	 */
-	if (contains_pending_extent(device, hole_start, *hole_size)) {
-		if (hole_end >= *hole_start)
-			*hole_size = hole_end - *hole_start;
-		else
-			*hole_size = 0;
-		changed = true;
-	}
+	for (;;) {
+		/*
+		 * Check before we set max_hole_start, otherwise we could end up
+		 * sending back this offset anyway.
+		 */
+		if (contains_pending_extent(device, hole_start, *hole_size)) {
+			if (hole_end >= *hole_start)
+				*hole_size = hole_end - *hole_start;
+			else
+				*hole_size = 0;
+			changed = true;
+		}
+
+		switch (device->fs_devices->chunk_alloc_policy) {
+		case BTRFS_CHUNK_ALLOC_REGULAR:
+			/* No extra check */
+			break;
+		case BTRFS_CHUNK_ALLOC_ZONED:
+			if (dev_extent_hole_check_zoned(device, hole_start,
+							hole_size, num_bytes)) {
+				changed = true;
+				/*
+				 * The changed hole can contain pending
+				 * extent. Loop again to check that.
+				 */
+				continue;
+			}
+			break;
+		default:
+			BUG();
+		}
 
-	switch (device->fs_devices->chunk_alloc_policy) {
-	case BTRFS_CHUNK_ALLOC_REGULAR:
-		/* No extra check */
 		break;
-	default:
-		BUG();
 	}
 
 	return changed;
@@ -1515,6 +1581,9 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 
 	search_start = dev_extent_search_start(device, search_start);
 
+	WARN_ON(device->zone_info &&
+		!IS_ALIGNED(num_bytes, device->zone_info->zone_size));
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -4913,6 +4982,37 @@ static void init_alloc_chunk_ctl_policy_regular(
 	ctl->dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes;
 }
 
+static void init_alloc_chunk_ctl_policy_zoned(
+				      struct btrfs_fs_devices *fs_devices,
+				      struct alloc_chunk_ctl *ctl)
+{
+	u64 zone_size = fs_devices->fs_info->zone_size;
+	u64 limit;
+	int min_num_stripes = ctl->devs_min * ctl->dev_stripes;
+	int min_data_stripes = (min_num_stripes - ctl->nparity) / ctl->ncopies;
+	u64 min_chunk_size = min_data_stripes * zone_size;
+	u64 type = ctl->type;
+
+	ctl->max_stripe_size = zone_size;
+	if (type & BTRFS_BLOCK_GROUP_DATA) {
+		ctl->max_chunk_size = round_down(BTRFS_MAX_DATA_CHUNK_SIZE,
+						 zone_size);
+	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
+		ctl->max_chunk_size = ctl->max_stripe_size;
+	} else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
+		ctl->max_chunk_size = 2 * ctl->max_stripe_size;
+		ctl->devs_max = min_t(int, ctl->devs_max,
+				      BTRFS_MAX_DEVS_SYS_CHUNK);
+	}
+
+	/* We don't want a chunk larger than 10% of writable space */
+	limit = max(round_down(div_factor(fs_devices->total_rw_bytes, 1),
+			       zone_size),
+		    min_chunk_size);
+	ctl->max_chunk_size = min(limit, ctl->max_chunk_size);
+	ctl->dev_extent_min = zone_size * ctl->dev_stripes;
+}
+
 static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices,
 				 struct alloc_chunk_ctl *ctl)
 {
@@ -4933,6 +5033,9 @@ static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices,
 	case BTRFS_CHUNK_ALLOC_REGULAR:
 		init_alloc_chunk_ctl_policy_regular(fs_devices, ctl);
 		break;
+	case BTRFS_CHUNK_ALLOC_ZONED:
+		init_alloc_chunk_ctl_policy_zoned(fs_devices, ctl);
+		break;
 	default:
 		BUG();
 	}
@@ -5059,6 +5162,38 @@ static int decide_stripe_size_regular(struct alloc_chunk_ctl *ctl,
 	return 0;
 }
 
+static int decide_stripe_size_zoned(struct alloc_chunk_ctl *ctl,
+				    struct btrfs_device_info *devices_info)
+{
+	u64 zone_size = devices_info[0].dev->zone_info->zone_size;
+	/* Number of stripes that count for block group size */
+	int data_stripes;
+
+	/*
+	 * It should hold because:
+	 * dev_extent_min == dev_extent_want == zone_size * dev_stripes
+	 */
+	ASSERT(devices_info[ctl->ndevs - 1].max_avail == ctl->dev_extent_min);
+
+	ctl->stripe_size = zone_size;
+	ctl->num_stripes = ctl->ndevs * ctl->dev_stripes;
+	data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies;
+
+	/* stripe_size is fixed in ZONED. Reduce ndevs instead. */
+	if (ctl->stripe_size * data_stripes > ctl->max_chunk_size) {
+		ctl->ndevs = div_u64(div_u64(ctl->max_chunk_size * ctl->ncopies,
+					     ctl->stripe_size) + ctl->nparity,
+				     ctl->dev_stripes);
+		ctl->num_stripes = ctl->ndevs * ctl->dev_stripes;
+		data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies;
+		ASSERT(ctl->stripe_size * data_stripes <= ctl->max_chunk_size);
+	}
+
+	ctl->chunk_size = ctl->stripe_size * data_stripes;
+
+	return 0;
+}
+
 static int decide_stripe_size(struct btrfs_fs_devices *fs_devices,
 			      struct alloc_chunk_ctl *ctl,
 			      struct btrfs_device_info *devices_info)
@@ -5086,6 +5221,8 @@ static int decide_stripe_size(struct btrfs_fs_devices *fs_devices,
 	switch (fs_devices->chunk_alloc_policy) {
 	case BTRFS_CHUNK_ALLOC_REGULAR:
 		return decide_stripe_size_regular(ctl, devices_info);
+	case BTRFS_CHUNK_ALLOC_ZONED:
+		return decide_stripe_size_zoned(ctl, devices_info);
 	default:
 		BUG();
 	}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 59d9d47f173d..c8841b714f2e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -216,6 +216,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
 enum btrfs_chunk_allocation_policy {
 	BTRFS_CHUNK_ALLOC_REGULAR,
+	BTRFS_CHUNK_ALLOC_ZONED,
 };
 
 /*
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index fc43a650cd79..b1ece6b978dd 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1,11 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include
 #include
 #include
 #include "ctree.h"
 #include "volumes.h"
 #include "zoned.h"
 #include "rcu-string.h"
+#include "disk-io.h"
 
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES 4096
@@ -529,6 +531,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	fs_info->zone_size = zone_size;
 	fs_info->max_zone_append_size = max_zone_append_size;
+	fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED;
 
 	/*
 	 * Check mount options here, because we might change fs_info->zoned
@@ -746,3 +749,144 @@ int btrfs_reset_sb_log_zones(struct
block_device *bdev, int mirror) sb_zone << zone_sectors_shift, zone_sectors * BTRFS_NR_SB_LOG_ZONES, GFP_NOFS); } + +/* + * btrfs_find_allocatable_zones - find allocatable zones within a given region + * @device: the device to allocate a region on + * @hole_start: the position of the hole to allocate in + * @hole_end: the end position of the hole + * @num_bytes: the size of the wanted region + * + * An allocatable region must not contain any superblock locations. + */ +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u8 shift = zinfo->zone_size_shift; + u64 nzones = num_bytes >> shift; + u64 pos = hole_start; + u64 begin, end; + bool have_sb; + int i; + + ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); + + while (pos < hole_end) { + begin = pos >> shift; + end = begin + nzones; + + if (end > zinfo->nr_zones) + return hole_end; + + /* Check if zones in the region are all empty */ + if (btrfs_dev_is_sequential(device, pos) && + find_next_zero_bit(zinfo->empty_zones, end, begin) != end) { + pos += zinfo->zone_size; + continue; + } + + have_sb = false; + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + u32 sb_zone; + u64 sb_pos; + + sb_zone = sb_zone_number(shift, i); + if (!(end <= sb_zone || + sb_zone + BTRFS_NR_SB_LOG_ZONES <= begin)) { + have_sb = true; + pos = ((u64)sb_zone + BTRFS_NR_SB_LOG_ZONES) << shift; + break; + } + + /* + * We also need to exclude regular superblock + * positions + */ + sb_pos = btrfs_sb_offset(i); + if (!(pos + num_bytes <= sb_pos || + sb_pos + BTRFS_SUPER_INFO_SIZE <= pos)) { + have_sb = true; + pos = ALIGN(sb_pos + BTRFS_SUPER_INFO_SIZE, + zinfo->zone_size); + break; + } + } + if (!have_sb) + break; + + } + + return pos; +} + +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes) +{ + int ret; + + *bytes = 0; + ret =
blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_RESET, + physical >> SECTOR_SHIFT, length >> SECTOR_SHIFT, + GFP_NOFS); + if (ret) + return ret; + + *bytes = length; + while (length) { + btrfs_dev_set_zone_empty(device, physical); + physical += device->zone_info->zone_size; + length -= device->zone_info->zone_size; + } + + return 0; +} + +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u8 shift = zinfo->zone_size_shift; + unsigned long begin = start >> shift; + unsigned long end = (start + size) >> shift; + u64 pos; + int ret; + + ASSERT(IS_ALIGNED(start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(size, zinfo->zone_size)); + + if (end > zinfo->nr_zones) + return -ERANGE; + + /* All the zones are conventional */ + if (find_next_bit(zinfo->seq_zones, begin, end) == end) + return 0; + + /* All the zones are sequential and empty */ + if (find_next_zero_bit(zinfo->seq_zones, begin, end) == end && + find_next_zero_bit(zinfo->empty_zones, begin, end) == end) + return 0; + + for (pos = start; pos < start + size; pos += zinfo->zone_size) { + u64 reset_bytes; + + if (!btrfs_dev_is_sequential(device, pos) || + btrfs_dev_is_empty_zone(device, pos)) + continue; + + /* Free regions should be empty */ + btrfs_warn_in_rcu( + device->fs_info, + "zoned: resetting device %s (devid %llu) zone %llu for allocation", + rcu_str_deref(device->name), device->devid, + pos >> shift); + WARN_ON_ONCE(1); + + ret = btrfs_reset_device_zone(device, pos, zinfo->zone_size, + &reset_bytes); + if (ret) + return ret; + } + + return 0; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 058a57317c05..de5901f5ae66 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -36,6 +36,11 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw, u64 *bytenr_ret); void btrfs_advance_sb_log(struct btrfs_device *device, int mirror); int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror); 
+u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes); +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes); +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -92,6 +97,26 @@ static inline int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror return 0; } +static inline u64 btrfs_find_allocatable_zones(struct btrfs_device *device, + u64 hole_start, u64 hole_end, + u64 num_bytes) +{ + return hole_start; +} + +static inline int btrfs_reset_device_zone(struct btrfs_device *device, + u64 physical, u64 length, u64 *bytes) +{ + *bytes = 0; + return 0; +} + +static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, + u64 start, u64 size) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Tue Dec 22 03:49:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11985657 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 07D05C433E0 for ; Tue, 22 Dec 2020 03:52:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D2FB722CB2 for ; Tue, 22 Dec 2020 03:52:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Anand Jain , Josef Bacik
Subject: [PATCH v11 10/40] btrfs: verify device extent is aligned to zone
Date: Tue, 22 Dec 2020 12:49:03 +0900
Message-Id: <842b11a0724845c3710943d9eb7c707eedad569a.1608608848.git.naohiro.aota@wdc.com>
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

Add a check in verify_one_dev_extent() to verify that a device extent on a zoned block device is aligned to the respective zone boundary.
Signed-off-by: Naohiro Aota Reviewed-by: Anand Jain Reviewed-by: Josef Bacik --- fs/btrfs/volumes.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 19c76cf9d2d2..e0d17e08a46c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7790,6 +7790,20 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, ret = -EUCLEAN; goto out; } + + if (dev->zone_info) { + u64 zone_size = dev->zone_info->zone_size; + + if (!IS_ALIGNED(physical_offset, zone_size) || + !IS_ALIGNED(physical_len, zone_size)) { + btrfs_err(fs_info, +"zoned: dev extent devid %llu physical offset %llu len %llu is not aligned to device zone", + devid, physical_offset, physical_len); + ret = -EUCLEAN; + goto out; + } + } + out: free_extent_map(em); return ret; From patchwork Tue Dec 22 03:49:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11985659 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C734C433E0 for ; Tue, 22 Dec 2020 03:52:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3ED5922CB1 for ; Tue, 22 Dec 2020 03:52:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726293AbgLVDwy (ORCPT ); Mon, 21 Dec 2020 22:52:54 -0500 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:46466 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik , Anand Jain
Subject: [PATCH v11 11/40] btrfs: load zone's allocation offset
Date: Tue, 22 Dec 2020 12:49:04 +0900
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

Zoned btrfs must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. This commit adds "alloc_offset" to track that logical address, which is populated in btrfs_load_block_group_zone_info() from the write pointers of the corresponding zones.

For now, zoned btrfs only supports the SINGLE profile. Supporting non-SINGLE profiles with zone append writes is not trivial. For example, in the DUP profile, we send a zone append write IO to two zones on a device. The device replies with the written LBAs for the IOs. If the offsets of the returned addresses from the beginning of each zone differ, the result is two different logical addresses. We would need a fine-grained logical-to-physical mapping to handle such diverging physical addresses. Since that would require an additional metadata type, disable non-SINGLE profiles for now.

This commit handles the case where all the zones in a block group are sequential. The next patch will handle the case of having a conventional zone.
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik Reviewed-by: Anand Jain --- fs/btrfs/block-group.c | 15 ++++ fs/btrfs/block-group.h | 6 ++ fs/btrfs/zoned.c | 154 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 7 ++ 4 files changed, 182 insertions(+) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 69e1b24bbbad..8c029e45a573 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -15,6 +15,7 @@ #include "delalloc-space.h" #include "discard.h" #include "raid56.h" +#include "zoned.h" /* * Return target flags in extended format or 0 if restripe for this chunk_type @@ -1866,6 +1867,13 @@ static int read_one_block_group(struct btrfs_fs_info *info, goto error; } + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_err(info, "zoned: failed to load zone info of bg %llu", + cache->start); + goto error; + } + /* * We need to exclude the super stripes now so that the space info has * super bytes accounted for, otherwise we'll think we have more space @@ -2141,6 +2149,13 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, cache->cached = BTRFS_CACHE_FINISHED; if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) cache->needs_free_space = 1; + + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_put_block_group(cache); + return ret; + } + ret = exclude_super_stripes(cache); if (ret) { /* We may have excluded something, so call this just in case */ diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 8f74a96074f7..9d026ab1768d 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -183,6 +183,12 @@ struct btrfs_block_group { /* Record locked full stripes for RAID5/6 block group */ struct btrfs_full_stripe_locks_tree full_stripe_locks_root; + + /* + * Allocation offset for the block group to implement sequential + * allocation. This is used only with ZONED mode enabled. 
+ */ + u64 alloc_offset; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index b1ece6b978dd..adca89a5ebc1 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -3,14 +3,20 @@ #include #include #include +#include #include "ctree.h" #include "volumes.h" #include "zoned.h" #include "rcu-string.h" #include "disk-io.h" +#include "block-group.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 +/* Invalid allocation pointer value for missing devices */ +#define WP_MISSING_DEV ((u64)-1) +/* Pseudo write pointer value for conventional zone */ +#define WP_CONVENTIONAL ((u64)-2) /* Number of superblock log zones */ #define BTRFS_NR_SB_LOG_ZONES 2 @@ -890,3 +896,151 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) return 0; } + +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct extent_map_tree *em_tree = &fs_info->mapping_tree; + struct extent_map *em; + struct map_lookup *map; + struct btrfs_device *device; + u64 logical = cache->start; + u64 length = cache->length; + u64 physical = 0; + int ret; + int i; + unsigned int nofs_flag; + u64 *alloc_offsets = NULL; + u32 num_sequential = 0, num_conventional = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + /* Sanity check */ + if (!IS_ALIGNED(length, fs_info->zone_size)) { + btrfs_err(fs_info, "zoned: block group %llu len %llu unaligned to zone size %llu", + logical, length, fs_info->zone_size); + return -EIO; + } + + /* Get the chunk mapping */ + read_lock(&em_tree->lock); + em = lookup_extent_mapping(em_tree, logical, length); + read_unlock(&em_tree->lock); + + if (!em) + return -EINVAL; + + map = em->map_lookup; + + /* + * Get the zone type: if the group is mapped to a non-sequential zone, + * there is no need for the allocation offset (fit allocation is OK). 
+ */ + alloc_offsets = kcalloc(map->num_stripes, sizeof(*alloc_offsets), + GFP_NOFS); + if (!alloc_offsets) { + free_extent_map(em); + return -ENOMEM; + } + + for (i = 0; i < map->num_stripes; i++) { + bool is_sequential; + struct blk_zone zone; + + device = map->stripes[i].dev; + physical = map->stripes[i].physical; + + if (device->bdev == NULL) { + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } + + is_sequential = btrfs_dev_is_sequential(device, physical); + if (is_sequential) + num_sequential++; + else + num_conventional++; + + if (!is_sequential) { + alloc_offsets[i] = WP_CONVENTIONAL; + continue; + } + + /* + * This zone will be used for allocation, so mark this + * zone non-empty. + */ + btrfs_dev_clear_zone_empty(device, physical); + + /* + * The group is mapped to a sequential zone. Get the zone write + * pointer to determine the allocation offset within the zone. + */ + WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size)); + nofs_flag = memalloc_nofs_save(); + ret = btrfs_get_dev_zone(device, physical, &zone); + memalloc_nofs_restore(nofs_flag); + if (ret == -EIO || ret == -EOPNOTSUPP) { + ret = 0; + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } else if (ret) { + goto out; + } + + switch (zone.cond) { + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + btrfs_err(fs_info, "zoned: offline/readonly zone %llu on device %s (devid %llu)", + physical >> device->zone_info->zone_size_shift, + rcu_str_deref(device->name), device->devid); + alloc_offsets[i] = WP_MISSING_DEV; + break; + case BLK_ZONE_COND_EMPTY: + alloc_offsets[i] = 0; + break; + case BLK_ZONE_COND_FULL: + alloc_offsets[i] = fs_info->zone_size; + break; + default: + /* Partially used zone */ + alloc_offsets[i] = + ((zone.wp - zone.start) << SECTOR_SHIFT); + break; + } + } + + if (num_conventional > 0) { + /* + * Since conventional zones do not have a write pointer, we + * cannot determine alloc_offset from the pointer + */ + ret = -EINVAL; + goto out; + } + + switch (map->type & 
BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + cache->alloc_offset = alloc_offsets[0]; + break; + case BTRFS_BLOCK_GROUP_DUP: + case BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID0: + case BTRFS_BLOCK_GROUP_RAID10: + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID6: + /* non-SINGLE profiles are not supported yet */ + default: + btrfs_err(fs_info, "zoned: profile %s not supported", + btrfs_bg_type_to_raid_name(map->type)); + ret = -EINVAL; + goto out; + } + +out: + kfree(alloc_offsets); + free_extent_map(em); + + return ret; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index de5901f5ae66..491b98c97f48 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -41,6 +41,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -117,6 +118,12 @@ static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, return 0; } +static inline int btrfs_load_block_group_zone_info( + struct btrfs_block_group *cache) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Tue Dec 22 03:49:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11985661 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota
Subject: [PATCH v11 12/40] btrfs: calculate allocation offset for conventional zones
Date: Tue, 22 Dec 2020 12:49:05 +0900
Message-Id: <5101ed472a046b3fc691aeb90f84bb55790d4fc0.1608608848.git.naohiro.aota@wdc.com>
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

Conventional zones do not have a write pointer, so we cannot use it to determine the allocation offset if a block group contains a conventional zone. Instead, we can consider the end of the last allocated extent in the block group as the allocation offset.
For a new block group, we cannot calculate the allocation offset by
consulting the extent tree, because doing so can deadlock: it takes an
extent buffer lock after the chunk mutex, which is already held in
btrfs_make_block_group(). Since it is a new block group anyway, we can
simply set the allocation offset to 0.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c |  4 +-
 fs/btrfs/zoned.c       | 93 +++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/zoned.h       |  4 +-
 3 files changed, 92 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 8c029e45a573..9eb1e3aa5e0f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1867,7 +1867,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 		goto error;
 	}
 
-	ret = btrfs_load_block_group_zone_info(cache);
+	ret = btrfs_load_block_group_zone_info(cache, false);
 	if (ret) {
 		btrfs_err(info, "zoned: failed to load zone info of bg %llu",
			  cache->start);
@@ -2150,7 +2150,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
 	if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
 		cache->needs_free_space = 1;
 
-	ret = btrfs_load_block_group_zone_info(cache);
+	ret = btrfs_load_block_group_zone_info(cache, true);
 	if (ret) {
 		btrfs_put_block_group(cache);
 		return ret;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index adca89a5ebc1..ceb6d0d7d33b 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -897,7 +897,62 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size)
 	return 0;
 }
 
-int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache)
+static int calculate_alloc_pointer(struct btrfs_block_group *cache,
+				   u64 *offset_ret)
+{
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	int ret;
+	u64 length;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = cache->start + cache->length;
+	key.type = 0;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	/* We should not find the exact match */
+	if (ret <= 0) {
+		ret = -EUCLEAN;
+		goto out;
+	}
+
+	ret = btrfs_previous_extent_item(root, path, cache->start);
+	if (ret) {
+		if (ret == 1) {
+			ret = 0;
+			*offset_ret = 0;
+		}
+		goto out;
+	}
+
+	btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]);
+
+	if (found_key.type == BTRFS_EXTENT_ITEM_KEY)
+		length = found_key.offset;
+	else
+		length = fs_info->nodesize;
+
+	if (!(found_key.objectid >= cache->start &&
+	      found_key.objectid + length <= cache->start + cache->length)) {
+		ret = -EUCLEAN;
+		goto out;
+	}
+	*offset_ret = found_key.objectid + length - cache->start;
+	ret = 0;
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 {
 	struct btrfs_fs_info *fs_info = cache->fs_info;
 	struct extent_map_tree *em_tree = &fs_info->mapping_tree;
@@ -911,6 +966,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache)
 	int i;
 	unsigned int nofs_flag;
 	u64 *alloc_offsets = NULL;
+	u64 last_alloc = 0;
 	u32 num_sequential = 0, num_conventional = 0;
 
 	if (!btrfs_is_zoned(fs_info))
@@ -1013,11 +1069,30 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache)
 
 	if (num_conventional > 0) {
 		/*
-		 * Since conventional zones do not have a write pointer, we
-		 * cannot determine alloc_offset from the pointer
+		 * Avoid calling calculate_alloc_pointer() for new BG. It
+		 * is no use for new BG. It must be always 0.
+		 *
+		 * Also, we have a lock chain of extent buffer lock ->
+		 * chunk mutex. For new BG, this function is called from
+		 * btrfs_make_block_group() which is already taking the
+		 * chunk mutex. Thus, we cannot call
+		 * calculate_alloc_pointer() which takes extent buffer
+		 * locks to avoid deadlock.
 		 */
-		ret = -EINVAL;
-		goto out;
+		if (new) {
+			cache->alloc_offset = 0;
+			goto out;
+		}
+		ret = calculate_alloc_pointer(cache, &last_alloc);
+		if (ret || map->num_stripes == num_conventional) {
+			if (!ret)
+				cache->alloc_offset = last_alloc;
+			else
+				btrfs_err(fs_info,
+			"zoned: failed to determine allocation offset of bg %llu",
+					  cache->start);
+			goto out;
+		}
 	}
 
 	switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
@@ -1039,6 +1114,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache)
 	}
 
 out:
+	/* An extent is allocated after the write pointer */
+	if (num_conventional && last_alloc > cache->alloc_offset) {
+		btrfs_err(fs_info,
+			  "zoned: got wrong write pointer in BG %llu: %llu > %llu",
+			  logical, last_alloc, cache->alloc_offset);
+		ret = -EIO;
+	}
+
 	kfree(alloc_offsets);
 	free_extent_map(em);
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 491b98c97f48..b53403ba0b10 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -41,7 +41,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start,
 int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical,
			    u64 length, u64 *bytes);
 int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size);
-int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache);
+int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
				     struct blk_zone *zone)
@@ -119,7 +119,7 @@ static inline int btrfs_ensure_empty_zones(struct btrfs_device *device,
 }
 
 static inline int btrfs_load_block_group_zone_info(
-		struct btrfs_block_group *cache)
+		struct btrfs_block_group *cache, bool new)
 {
 	return 0;
 }

From patchwork Tue Dec 22 03:49:06 2020

From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 13/40] btrfs: track unusable bytes for zones
Date: Tue, 22 Dec 2020 12:49:06 +0900
Message-Id: <43075f585c6866abcf2b4e000f4481159b39d78a.1608608848.git.naohiro.aota@wdc.com>

In zoned btrfs, a region that was written once and then freed is not
usable until we reset the underlying zone.
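The split between reusable and unusable bytes when a region is freed can be sketched standalone as follows. This is a simplified, hypothetical model with made-up names, not the kernel code; the real logic lives in `__btrfs_add_free_space_zoned()` in the patch below. Bytes at or beyond the allocation offset (the software write pointer) become free space again, while bytes behind it stay unusable until a zone reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Split a freed region [offset, offset + size) of a zoned block
 * group into reusable free space and zone-unusable space.
 *
 *   alloc_offset - current allocation offset within the block group
 *   used         - whether the region was actually written; a region
 *                  returned before use is fully reusable (the caller
 *                  also rewinds alloc_offset in that case)
 */
static void split_freed_region(uint64_t alloc_offset, uint64_t offset,
			       uint64_t size, bool used,
			       uint64_t *to_free, uint64_t *to_unusable)
{
	if (!used)
		*to_free = size;	/* never written: fully reusable */
	else if (offset >= alloc_offset)
		*to_free = size;	/* entirely beyond the write pointer */
	else if (offset + size <= alloc_offset)
		*to_free = 0;		/* entirely behind the write pointer */
	else
		*to_free = offset + size - alloc_offset; /* straddles it */
	*to_unusable = size - *to_free;
}
```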
So we need to distinguish such unusable space from usable free space.
This commit introduces the "zone_unusable" field in the block group
structure and "bytes_zone_unusable" in the space_info structure to
track it.

Pinned bytes are always reclaimed to the unusable space. But when an
allocated region is returned before being used (e.g., the block group
becomes read-only between allocation time and reservation time), we can
safely return the region to the block group. For that situation, this
commit introduces btrfs_add_free_space_unused(). It behaves the same as
btrfs_add_free_space() on regular btrfs; on zoned btrfs, it rewinds the
allocation offset.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/block-group.c      | 23 ++++++++++-----
 fs/btrfs/block-group.h      |  1 +
 fs/btrfs/extent-tree.c      | 10 ++++++-
 fs/btrfs/free-space-cache.c | 57 +++++++++++++++++++++++++++++++++++++
 fs/btrfs/free-space-cache.h |  2 ++
 fs/btrfs/space-info.c       | 13 +++++----
 fs/btrfs/space-info.h       |  4 ++-
 fs/btrfs/sysfs.c            |  2 ++
 fs/btrfs/zoned.c            | 24 ++++++++++++++++
 fs/btrfs/zoned.h            |  3 ++
 10 files changed, 125 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 9eb1e3aa5e0f..33c5c47ebbc3 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1010,12 +1010,17 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 		WARN_ON(block_group->space_info->total_bytes
			< block_group->length);
 		WARN_ON(block_group->space_info->bytes_readonly
-			< block_group->length);
+			< block_group->length - block_group->zone_unusable);
+		WARN_ON(block_group->space_info->bytes_zone_unusable
+			< block_group->zone_unusable);
 		WARN_ON(block_group->space_info->disk_total
			< block_group->length * factor);
 	}
 	block_group->space_info->total_bytes -= block_group->length;
-	block_group->space_info->bytes_readonly -= block_group->length;
+	block_group->space_info->bytes_readonly -=
+		(block_group->length - block_group->zone_unusable);
+	block_group->space_info->bytes_zone_unusable -=
+		block_group->zone_unusable;
 	block_group->space_info->disk_total -= block_group->length * factor;
 
 	spin_unlock(&block_group->space_info->lock);
@@ -1159,7 +1164,7 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force)
 	}
 
 	num_bytes = cache->length - cache->reserved - cache->pinned -
-		    cache->bytes_super - cache->used;
+		    cache->bytes_super - cache->zone_unusable - cache->used;
 
 	/*
	 * Data never overcommits, even in mixed mode, so do just the straight
@@ -1904,6 +1909,8 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 		btrfs_free_excluded_extents(cache);
 	}
 
+	btrfs_calc_zone_unusable(cache);
+
 	ret = btrfs_add_block_group_cache(info, cache);
 	if (ret) {
 		btrfs_remove_free_space_cache(cache);
@@ -1911,7 +1918,8 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 	}
 	trace_btrfs_add_block_group(info, cache, 0);
 	btrfs_update_space_info(info, cache->flags, cache->length,
-				cache->used, cache->bytes_super, &space_info);
+				cache->used, cache->bytes_super,
+				cache->zone_unusable, &space_info);
 
 	cache->space_info = space_info;
@@ -1967,7 +1975,7 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info)
 			break;
 		}
 		btrfs_update_space_info(fs_info, bg->flags, em->len, em->len,
-					0, &space_info);
+					0, 0, &space_info);
 		bg->space_info = space_info;
 		link_block_group(bg);
@@ -2197,7 +2205,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
	 */
	trace_btrfs_add_block_group(fs_info, cache, 1);
	btrfs_update_space_info(fs_info, cache->flags, size, bytes_used,
-				cache->bytes_super, &cache->space_info);
+				cache->bytes_super, 0, &cache->space_info);
	btrfs_update_global_block_rsv(fs_info);
 
	link_block_group(cache);
@@ -2305,7 +2313,8 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group *cache)
	spin_lock(&cache->lock);
	if (!--cache->ro) {
		num_bytes = cache->length - cache->reserved -
-			    cache->pinned - cache->bytes_super - cache->used;
+			    cache->pinned - cache->bytes_super -
+			    cache->zone_unusable - cache->used;
		sinfo->bytes_readonly -= num_bytes;
		list_del_init(&cache->ro_list);
	}
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 9d026ab1768d..0f3c62c561bc 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -189,6 +189,7 @@ struct btrfs_block_group {
	 * allocation. This is used only with ZONED mode enabled.
	 */
	u64 alloc_offset;
+	u64 zone_unusable;
 };
 
 static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d79b8369e6aa..043a2fe79270 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -34,6 +34,7 @@
 #include "block-group.h"
 #include "discard.h"
 #include "rcu-string.h"
+#include "zoned.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -2725,6 +2726,9 @@ fetch_cluster_info(struct btrfs_fs_info *fs_info,
 {
	struct btrfs_free_cluster *ret = NULL;
 
+	if (btrfs_is_zoned(fs_info))
+		return NULL;
+
	*empty_cluster = 0;
	if (btrfs_mixed_space_info(space_info))
		return ret;
@@ -2808,7 +2812,11 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
		space_info->max_extent_size = 0;
		percpu_counter_add_batch(&space_info->total_bytes_pinned,
					 -len, BTRFS_TOTAL_BYTES_PINNED_BATCH);
-		if (cache->ro) {
+		if (btrfs_is_zoned(fs_info)) {
+			/* Need reset before reusing in a zoned block group */
+			space_info->bytes_zone_unusable += len;
+			readonly = true;
+		} else if (cache->ro) {
			space_info->bytes_readonly += len;
			readonly = true;
		}
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index fd6ddd6b8165..5a5c2c527dd5 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2465,6 +2465,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
	int ret = 0;
	u64 filter_bytes = bytes;
 
+	ASSERT(!btrfs_is_zoned(fs_info));
+
	info = kmem_cache_zalloc(btrfs_free_space_cachep, GFP_NOFS);
	if (!info)
		return -ENOMEM;
@@ -2522,11 +2524,49 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
	return ret;
 }
 
+static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
+					u64 bytenr, u64 size, bool used)
+{
+	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+	u64 offset = bytenr - block_group->start;
+	u64 to_free, to_unusable;
+
+	spin_lock(&ctl->tree_lock);
+	if (!used)
+		to_free = size;
+	else if (offset >= block_group->alloc_offset)
+		to_free = size;
+	else if (offset + size <= block_group->alloc_offset)
+		to_free = 0;
+	else
+		to_free = offset + size - block_group->alloc_offset;
+	to_unusable = size - to_free;
+
+	ctl->free_space += to_free;
+	block_group->zone_unusable += to_unusable;
+	spin_unlock(&ctl->tree_lock);
+	if (!used) {
+		spin_lock(&block_group->lock);
+		block_group->alloc_offset -= size;
+		spin_unlock(&block_group->lock);
+	}
+
+	/* All the region is now unusable. Mark it as unused and reclaim */
+	if (block_group->zone_unusable == block_group->length)
+		btrfs_mark_bg_unused(block_group);
+
+	return 0;
+}
+
 int btrfs_add_free_space(struct btrfs_block_group *block_group,
			 u64 bytenr, u64 size)
 {
	enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
 
+	if (btrfs_is_zoned(block_group->fs_info))
+		return __btrfs_add_free_space_zoned(block_group, bytenr, size,
+						    true);
+
	if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC))
		trim_state = BTRFS_TRIM_STATE_TRIMMED;
 
@@ -2535,6 +2575,16 @@ int btrfs_add_free_space(struct btrfs_block_group *block_group,
				      bytenr, size, trim_state);
 }
 
+int btrfs_add_free_space_unused(struct btrfs_block_group *block_group,
+				u64 bytenr, u64 size)
+{
+	if (btrfs_is_zoned(block_group->fs_info))
+		return __btrfs_add_free_space_zoned(block_group, bytenr, size,
+						    false);
+
+	return btrfs_add_free_space(block_group, bytenr, size);
+}
+
 /*
  * This is a subtle distinction because when adding free space back in general,
  * we want it to be added as untrimmed for async. But in the case where we add
@@ -2545,6 +2595,10 @@ int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group,
 {
	enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
 
+	if (btrfs_is_zoned(block_group->fs_info))
+		return __btrfs_add_free_space_zoned(block_group, bytenr, size,
+						    true);
+
	if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC) ||
	    btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
		trim_state = BTRFS_TRIM_STATE_TRIMMED;
@@ -2562,6 +2616,9 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group,
	int ret;
	bool re_search = false;
 
+	if (btrfs_is_zoned(block_group->fs_info))
+		return 0;
+
	spin_lock(&ctl->tree_lock);
 
 again:
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index ecb09a02d544..1f23088d43f9 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -107,6 +107,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
			   enum btrfs_trim_state trim_state);
 int btrfs_add_free_space(struct btrfs_block_group *block_group,
			 u64 bytenr, u64 size);
+int btrfs_add_free_space_unused(struct btrfs_block_group *block_group,
+				u64 bytenr, u64 size);
 int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group,
				       u64 bytenr, u64 size);
 int btrfs_remove_free_space(struct btrfs_block_group *block_group,
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 67e55c5479b8..025349c5c439 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -163,6 +163,7 @@ u64 __pure btrfs_space_info_used(struct btrfs_space_info *s_info,
	ASSERT(s_info);
	return s_info->bytes_used + s_info->bytes_reserved +
		s_info->bytes_pinned + s_info->bytes_readonly +
+		s_info->bytes_zone_unusable +
		(may_use_included ? s_info->bytes_may_use : 0);
 }
 
@@ -257,7 +258,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
 
 void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags,
			     u64 total_bytes, u64 bytes_used,
-			     u64 bytes_readonly,
+			     u64 bytes_readonly, u64 bytes_zone_unusable,
			     struct btrfs_space_info **space_info)
 {
	struct btrfs_space_info *found;
@@ -273,6 +274,7 @@ void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags,
	found->bytes_used += bytes_used;
	found->disk_used += bytes_used * factor;
	found->bytes_readonly += bytes_readonly;
+	found->bytes_zone_unusable += bytes_zone_unusable;
	if (total_bytes > 0)
		found->full = 0;
	btrfs_try_granting_tickets(info, found);
@@ -422,10 +424,10 @@ static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
		   info->total_bytes - btrfs_space_info_used(info, true),
		   info->full ? "" : "not ");
	btrfs_info(fs_info,
-		"space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu",
+		"space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu zone_unusable=%llu",
		info->total_bytes, info->bytes_used, info->bytes_pinned,
		info->bytes_reserved, info->bytes_may_use,
-		info->bytes_readonly);
+		info->bytes_readonly, info->bytes_zone_unusable);
	DUMP_BLOCK_RSV(fs_info, global_block_rsv);
	DUMP_BLOCK_RSV(fs_info, trans_block_rsv);
@@ -454,9 +456,10 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
	list_for_each_entry(cache, &info->block_groups[index], list) {
		spin_lock(&cache->lock);
		btrfs_info(fs_info,
-			"block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %s",
+			"block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %llu zone_unusable %s",
			cache->start, cache->length, cache->used, cache->pinned,
-			cache->reserved, cache->ro ? "[readonly]" : "");
+			cache->reserved, cache->zone_unusable,
+			cache->ro ? "[readonly]" : "");
		spin_unlock(&cache->lock);
		btrfs_dump_free_space(cache, bytes);
	}
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 5646393b928c..ee003ffba956 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -17,6 +17,8 @@ struct btrfs_space_info {
	u64 bytes_may_use;	/* number of bytes that may be used for
				   delalloc/allocations */
	u64 bytes_readonly;	/* total bytes that are read only */
+	u64 bytes_zone_unusable;	/* total bytes that are unusable until
					   resetting the device zone */
 
	u64 max_extent_size;	/* This will hold the maximum extent size of
				   the space info if we had an ENOSPC in the
@@ -119,7 +121,7 @@ DECLARE_SPACE_INFO_UPDATE(bytes_pinned, "pinned");
 int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
 void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags,
			     u64 total_bytes, u64 bytes_used,
-			     u64 bytes_readonly,
+			     u64 bytes_readonly, u64 bytes_zone_unusable,
			     struct btrfs_space_info **space_info);
 struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info,
					       u64 flags);
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 4522a1c4cd08..cf7e766f7c58 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -666,6 +666,7 @@ SPACE_INFO_ATTR(bytes_pinned);
 SPACE_INFO_ATTR(bytes_reserved);
 SPACE_INFO_ATTR(bytes_may_use);
 SPACE_INFO_ATTR(bytes_readonly);
+SPACE_INFO_ATTR(bytes_zone_unusable);
 SPACE_INFO_ATTR(disk_used);
 SPACE_INFO_ATTR(disk_total);
 BTRFS_ATTR(space_info, total_bytes_pinned,
@@ -679,6 +680,7 @@ static struct attribute *space_info_attrs[] = {
	BTRFS_ATTR_PTR(space_info, bytes_reserved),
	BTRFS_ATTR_PTR(space_info, bytes_may_use),
	BTRFS_ATTR_PTR(space_info, bytes_readonly),
+	BTRFS_ATTR_PTR(space_info, bytes_zone_unusable),
	BTRFS_ATTR_PTR(space_info, disk_used),
	BTRFS_ATTR_PTR(space_info, disk_total),
	BTRFS_ATTR_PTR(space_info, total_bytes_pinned),
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index ceb6d0d7d33b..02373a7433b8 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1127,3 +1127,27 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 
	return ret;
 }
+
+void btrfs_calc_zone_unusable(struct btrfs_block_group *cache)
+{
+	u64 unusable, free;
+
+	if (!btrfs_is_zoned(cache->fs_info))
+		return;
+
+	WARN_ON(cache->bytes_super != 0);
+	unusable = cache->alloc_offset - cache->used;
+	free = cache->length - cache->alloc_offset;
+
+	/* We only need ->free_space in ALLOC_SEQ BGs */
+	cache->last_byte_to_unpin = (u64)-1;
+	cache->cached = BTRFS_CACHE_FINISHED;
+	cache->free_space_ctl->free_space = free;
+	cache->zone_unusable = unusable;
+
+	/*
+	 * Should not have any excluded extents. Just in case, though.
+	 */
+	btrfs_free_excluded_extents(cache);
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index b53403ba0b10..0cc0b27e9437 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -42,6 +42,7 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical,
			    u64 length, u64 *bytes);
 int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size);
 int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new);
+void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
				     struct blk_zone *zone)
@@ -124,6 +125,8 @@ static inline int btrfs_load_block_group_zone_info(
	return 0;
 }
 
+static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { }
+
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Tue Dec 22 03:49:07 2020
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v11 14/40] btrfs: do sequential extent allocation in ZONED mode
Date: Tue, 22 Dec 2020 12:49:07 +0900

This commit implements a sequential extent allocator for ZONED mode. The
allocator only needs to check whether there is enough space left in the
block group; therefore it never manages bitmaps or clusters. Also add
ASSERTs to the corresponding functions.
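The core of such a sequential allocator can be sketched in a few lines. This is an illustrative standalone model with invented names (the real implementation is `do_allocation_zoned()` in the patch below, which also takes the space_info and block group locks): check the room left past the allocation offset and bump it on success.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a zoned block group (illustrative only). */
struct zbg {
	uint64_t start;		/* logical start of the block group */
	uint64_t length;	/* total length of the block group */
	uint64_t alloc_offset;	/* bytes already handed out, in order */
};

/*
 * Sequential-only allocation: no free-space trees or bitmaps, just a
 * bump of the allocation offset. Returns the allocated logical
 * address, or 0 to signal an ENOSPC-like failure (good enough for a
 * sketch; the kernel distinguishes these via a return code instead).
 */
static uint64_t zoned_alloc(struct zbg *bg, uint64_t num_bytes)
{
	uint64_t avail = bg->length - bg->alloc_offset;

	if (avail < num_bytes)
		return 0;

	uint64_t found = bg->start + bg->alloc_offset;
	bg->alloc_offset += num_bytes;
	return found;
}
```

Because allocations only ever move the offset forward, two consecutive requests are laid out back to back, which is exactly what a sequential-write-required zone demands.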
Actually, with zone append writing, it is unnecessary to track the
allocation offset; we only need to check space availability. But by
tracking the offset and returning it as the allocated region, we can
skip modifying ordered extents and checksum information when there is
no IO reordering.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c      |  4 ++
 fs/btrfs/extent-tree.c      | 85 ++++++++++++++++++++++++++++++++++---
 fs/btrfs/free-space-cache.c |  6 +++
 3 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 33c5c47ebbc3..eea776180c37 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -726,6 +726,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only)
 	struct btrfs_caching_control *caching_ctl = NULL;
 	int ret = 0;
 
+	/* Allocator for ZONED btrfs does not use the cache at all */
+	if (btrfs_is_zoned(fs_info))
+		return 0;
+
 	caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS);
 	if (!caching_ctl)
 		return -ENOMEM;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 043a2fe79270..88e103451aca 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3522,6 +3522,7 @@ btrfs_release_block_group(struct btrfs_block_group *cache,
 
 enum btrfs_extent_allocation_policy {
 	BTRFS_EXTENT_ALLOC_CLUSTERED,
+	BTRFS_EXTENT_ALLOC_ZONED,
 };
 
 /*
@@ -3774,6 +3775,58 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group,
 	return find_free_extent_unclustered(block_group, ffe_ctl);
 }
 
+/*
+ * Simple allocator for sequential only block group. It only allows
+ * sequential allocation. No need to play with trees. This function
+ * also reserves the bytes as in btrfs_add_reserved_bytes.
+ */
+static int do_allocation_zoned(struct btrfs_block_group *block_group,
+			       struct find_free_extent_ctl *ffe_ctl,
+			       struct btrfs_block_group **bg_ret)
+{
+	struct btrfs_space_info *space_info = block_group->space_info;
+	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+	u64 start = block_group->start;
+	u64 num_bytes = ffe_ctl->num_bytes;
+	u64 avail;
+	int ret = 0;
+
+	ASSERT(btrfs_is_zoned(block_group->fs_info));
+
+	spin_lock(&space_info->lock);
+	spin_lock(&block_group->lock);
+
+	if (block_group->ro) {
+		ret = 1;
+		goto out;
+	}
+
+	avail = block_group->length - block_group->alloc_offset;
+	if (avail < num_bytes) {
+		ffe_ctl->max_extent_size = avail;
+		ret = 1;
+		goto out;
+	}
+
+	ffe_ctl->found_offset = start + block_group->alloc_offset;
+	block_group->alloc_offset += num_bytes;
+	spin_lock(&ctl->tree_lock);
+	ctl->free_space -= num_bytes;
+	spin_unlock(&ctl->tree_lock);
+
+	/*
+	 * We do not check if found_offset is aligned to stripesize. The
+	 * address is anyway rewritten when using zone append writing.
+	 */
+
+	ffe_ctl->search_start = ffe_ctl->found_offset;
+
+out:
+	spin_unlock(&block_group->lock);
+	spin_unlock(&space_info->lock);
+	return ret;
+}
+
 static int do_allocation(struct btrfs_block_group *block_group,
			 struct find_free_extent_ctl *ffe_ctl,
			 struct btrfs_block_group **bg_ret)
@@ -3781,6 +3834,8 @@ static int do_allocation(struct btrfs_block_group *block_group,
 	switch (ffe_ctl->policy) {
 	case BTRFS_EXTENT_ALLOC_CLUSTERED:
 		return do_allocation_clustered(block_group, ffe_ctl, bg_ret);
+	case BTRFS_EXTENT_ALLOC_ZONED:
+		return do_allocation_zoned(block_group, ffe_ctl, bg_ret);
 	default:
 		BUG();
 	}
@@ -3795,6 +3850,9 @@ static void release_block_group(struct btrfs_block_group *block_group,
 		ffe_ctl->retry_clustered = false;
 		ffe_ctl->retry_unclustered = false;
 		break;
+	case BTRFS_EXTENT_ALLOC_ZONED:
+		/* Nothing to do */
+		break;
 	default:
 		BUG();
 	}
@@ -3823,6 +3881,9 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl,
 	case BTRFS_EXTENT_ALLOC_CLUSTERED:
 		found_extent_clustered(ffe_ctl, ins);
 		break;
+	case BTRFS_EXTENT_ALLOC_ZONED:
+		/* Nothing to do */
+		break;
 	default:
 		BUG();
 	}
@@ -3838,6 +3899,9 @@ static int chunk_allocation_failed(struct find_free_extent_ctl *ffe_ctl)
		 */
		ffe_ctl->loop = LOOP_NO_EMPTY_SIZE;
		return 0;
+	case BTRFS_EXTENT_ALLOC_ZONED:
+		/* Give up here */
+		return -ENOSPC;
 	default:
		BUG();
 	}
@@ -4006,6 +4070,9 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info,
 	case BTRFS_EXTENT_ALLOC_CLUSTERED:
 		return prepare_allocation_clustered(fs_info, ffe_ctl,
						    space_info, ins);
+	case BTRFS_EXTENT_ALLOC_ZONED:
+		/* nothing to do */
+		return 0;
 	default:
 		BUG();
 	}
@@ -4069,6 +4136,9 @@ static noinline int find_free_extent(struct btrfs_root *root,
 	ffe_ctl.last_ptr = NULL;
 	ffe_ctl.use_cluster = true;
 
+	if (btrfs_is_zoned(fs_info))
+		ffe_ctl.policy = BTRFS_EXTENT_ALLOC_ZONED;
+
 	ins->type = BTRFS_EXTENT_ITEM_KEY;
 	ins->objectid = 0;
 	ins->offset = 0;
@@ -4211,20 +4281,23 @@ static noinline int find_free_extent(struct btrfs_root *root,
		/* move on to the next group */
		if (ffe_ctl.search_start + num_bytes >
		    block_group->start + block_group->length) {
-			btrfs_add_free_space(block_group, ffe_ctl.found_offset,
-					     num_bytes);
+			btrfs_add_free_space_unused(block_group,
+						    ffe_ctl.found_offset,
+						    num_bytes);
			goto loop;
		}
 
		if (ffe_ctl.found_offset < ffe_ctl.search_start)
-			btrfs_add_free_space(block_group, ffe_ctl.found_offset,
-				ffe_ctl.search_start - ffe_ctl.found_offset);
+			btrfs_add_free_space_unused(block_group,
+				ffe_ctl.found_offset,
+				ffe_ctl.search_start - ffe_ctl.found_offset);
 
		ret = btrfs_add_reserved_bytes(block_group, ram_bytes,
				num_bytes, delalloc);
		if (ret == -EAGAIN) {
-			btrfs_add_free_space(block_group, ffe_ctl.found_offset,
-					     num_bytes);
+			btrfs_add_free_space_unused(block_group,
						    ffe_ctl.found_offset,
						    num_bytes);
			goto loop;
		}
		btrfs_inc_block_group_reservations(block_group);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 5a5c2c527dd5..757c740de179 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2906,6 +2906,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group,
 	u64 align_gap_len = 0;
 	enum btrfs_trim_state align_gap_trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
 
+	ASSERT(!btrfs_is_zoned(block_group->fs_info));
+
 	spin_lock(&ctl->tree_lock);
 	entry = find_free_space(ctl, &offset, &bytes_search,
				block_group->full_stripe_len, max_extent_size);
@@ -3037,6 +3039,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group *block_group,
 	struct rb_node *node;
 	u64 ret = 0;
 
+	ASSERT(!btrfs_is_zoned(block_group->fs_info));
+
 	spin_lock(&cluster->lock);
 	if (bytes > cluster->max_size)
 		goto out;
@@ -3813,6 +3817,8 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group,
 	int ret;
 	u64 rem = 0;
 
+	ASSERT(!btrfs_is_zoned(block_group->fs_info));
+
 	*trimmed = 0;
 
 	spin_lock(&block_group->lock);

From patchwork Tue Dec 22 03:49:08 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985667
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 15/40] btrfs: redirty released extent buffers in ZONED mode
Date: Tue, 22 Dec 2020 12:49:08 +0900
Message-Id: <530bf9339d499c4f2209baeca7769a1c32a245bc.1608608848.git.naohiro.aota@wdc.com>
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

Tree-manipulating operations like merging nodes often release
once-allocated tree nodes. Btrfs cleans such nodes so that the pages in
the nodes are not uselessly written out. On ZONED volumes, however, this
optimization blocks subsequent IOs: cancelling the write-out of the freed
blocks breaks the sequential write order expected by the device.

This patch introduces a per-transaction list of clean and unwritten extent
buffers that have been released. Btrfs redirties the buffers so that
btree_write_cache_pages() can send proper bios to the devices. It also
clears the entire content of the extent buffer so as not to confuse raw
block scanners such as btrfsck. Because the content is cleared,
csum_dirty_buffer() would complain about a bytenr mismatch, so the check
and checksum are skipped using the newly introduced buffer flag
EXTENT_BUFFER_NO_CHECK.
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/disk-io.c     |  8 ++++++++
 fs/btrfs/extent-tree.c | 12 +++++++++++-
 fs/btrfs/extent_io.c   |  4 ++++
 fs/btrfs/extent_io.h   |  2 ++
 fs/btrfs/transaction.c | 10 ++++++++++
 fs/btrfs/transaction.h |  3 +++
 fs/btrfs/tree-log.c    |  6 ++++++
 fs/btrfs/zoned.c       | 37 +++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  7 +++++++
 9 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 192e366f8afc..e9b6c6a21681 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -459,6 +459,12 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
		return 0;

	found_start = btrfs_header_bytenr(eb);
+
+	if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) {
+		WARN_ON(found_start != 0);
+		return 0;
+	}
+
	/*
	 * Please do not consolidate these warnings into a single if.
	 * It is useful to know what went wrong.
@@ -4697,6 +4703,8 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
			     EXTENT_DIRTY);
	btrfs_destroy_pinned_extent(fs_info, &cur_trans->pinned_extents);

+	btrfs_free_redirty_list(cur_trans);
+
	cur_trans->state = TRANS_STATE_COMPLETED;
	wake_up(&cur_trans->commit_wait);
 }
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 88e103451aca..c3e955bbd2ab 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3374,8 +3374,10 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,

	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
		ret = check_ref_cleanup(trans, buf->start);
-		if (!ret)
+		if (!ret) {
+			btrfs_redirty_list_add(trans->transaction, buf);
			goto out;
+		}
	}

	pin = 0;
@@ -3387,6 +3389,13 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
			goto out;
		}

+		if (btrfs_is_zoned(fs_info)) {
+			btrfs_redirty_list_add(trans->transaction, buf);
+			pin_down_extent(trans, cache, buf->start, buf->len, 1);
+			btrfs_put_block_group(cache);
+			goto out;
+		}
+
		WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags));

		btrfs_add_free_space(cache, buf->start, buf->len);
@@ -4726,6 +4735,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
	__btrfs_tree_lock(buf, nest);
	btrfs_clean_tree_block(buf);
	clear_bit(EXTENT_BUFFER_STALE, &buf->bflags);
+	clear_bit(EXTENT_BUFFER_NO_CHECK, &buf->bflags);

	set_extent_buffer_uptodate(buf);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6e3b72e63e42..129d571a5c1a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -24,6 +24,7 @@
 #include "rcu-string.h"
 #include "backref.h"
 #include "disk-io.h"
+#include "zoned.h"

 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
@@ -5048,6 +5049,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
	btrfs_leak_debug_add(&fs_info->eb_leak_lock, &eb->leak_list,
			     &fs_info->allocated_ebs);
+	INIT_LIST_HEAD(&eb->release_list);

	spin_lock_init(&eb->refs_lock);
	atomic_set(&eb->refs, 1);
@@ -5825,6 +5827,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
	char *src = (char *)srcv;
	unsigned long i = get_eb_page_index(start);

+	WARN_ON(test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags));
+
	if (check_eb_range(eb, start, len))
		return;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 19221095c635..5a81268c4d8c 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -31,6 +31,7 @@ enum {
	EXTENT_BUFFER_IN_TREE,
	/* write IO error */
	EXTENT_BUFFER_WRITE_ERR,
+	EXTENT_BUFFER_NO_CHECK,
 };

 /* these are flags for __process_pages_contig */
@@ -93,6 +94,7 @@ struct extent_buffer {
	struct rw_semaphore lock;

	struct page *pages[INLINE_EXTENT_BUFFER_PAGES];
+	struct list_head release_list;
 #ifdef CONFIG_BTRFS_DEBUG
	struct list_head leak_list;
 #endif
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 4ffe66164fa3..ce480fe78531 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -21,6 +21,7 @@
 #include "qgroup.h"
 #include "block-group.h"
 #include "space-info.h"
+#include "zoned.h"

 #define BTRFS_ROOT_TRANS_TAG 0
@@ -375,6 +376,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
	spin_lock_init(&cur_trans->dirty_bgs_lock);
	INIT_LIST_HEAD(&cur_trans->deleted_bgs);
	spin_lock_init(&cur_trans->dropped_roots_lock);
+	INIT_LIST_HEAD(&cur_trans->releasing_ebs);
+	spin_lock_init(&cur_trans->releasing_ebs_lock);
	list_add_tail(&cur_trans->list, &fs_info->trans_list);
	extent_io_tree_init(fs_info, &cur_trans->dirty_pages,
			IO_TREE_TRANS_DIRTY_PAGES, fs_info->btree_inode);
@@ -2344,6 +2347,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
		goto scrub_continue;
	}

+	/*
+	 * At this point, we should have written all the tree blocks
+	 * allocated in this transaction. So it's now safe to free the
+	 * redirtied extent buffers.
+	 */
+	btrfs_free_redirty_list(cur_trans);
+
	ret = write_all_supers(fs_info, 0);
	/*
	 * the super is written, we can safely allow the tree-loggers
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 31ca81bad822..660b4e1f1181 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -92,6 +92,9 @@ struct btrfs_transaction {
	 */
	atomic_t pending_ordered;
	wait_queue_head_t pending_wait;
+
+	spinlock_t releasing_ebs_lock;
+	struct list_head releasing_ebs;
 };

 #define __TRANS_FREEZABLE	(1U << 0)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 8ee0700a980f..930e752686b4 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -19,6 +19,7 @@
 #include "qgroup.h"
 #include "block-group.h"
 #include "space-info.h"
+#include "zoned.h"

 /* magic values for the inode_only field in btrfs_log_inode:
  *
@@ -2752,6 +2753,8 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans,
					free_extent_buffer(next);
					return ret;
				}
+				btrfs_redirty_list_add(
+					trans->transaction, next);
			} else {
				if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &next->bflags))
					clear_extent_buffer_dirty(next);
@@ -3296,6 +3299,9 @@ static void free_log_tree(struct btrfs_trans_handle *trans,
	clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1,
			  EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT);
	extent_io_tree_release(&log->log_csum_range);
+
+	if (trans && log->node)
+		btrfs_redirty_list_add(trans->transaction, log->node);
	btrfs_put_root(log);
 }
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 02373a7433b8..73e083a86213 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -10,6 +10,7 @@
 #include "rcu-string.h"
 #include "disk-io.h"
 #include "block-group.h"
+#include "transaction.h"

 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -1151,3 +1152,39 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache)
	 */
	btrfs_free_excluded_extents(cache);
 }
+
+void btrfs_redirty_list_add(struct btrfs_transaction *trans,
+			    struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+
+	if (!btrfs_is_zoned(fs_info) ||
+	    btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN) ||
+	    !list_empty(&eb->release_list))
+		return;
+
+	set_extent_buffer_dirty(eb);
+	set_extent_bits_nowait(&trans->dirty_pages, eb->start,
+			       eb->start + eb->len - 1, EXTENT_DIRTY);
+	memzero_extent_buffer(eb, 0, eb->len);
+	set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags);
+
+	spin_lock(&trans->releasing_ebs_lock);
+	list_add_tail(&eb->release_list, &trans->releasing_ebs);
+	spin_unlock(&trans->releasing_ebs_lock);
+	atomic_inc(&eb->refs);
+}
+
+void btrfs_free_redirty_list(struct btrfs_transaction *trans)
+{
+	spin_lock(&trans->releasing_ebs_lock);
+	while (!list_empty(&trans->releasing_ebs)) {
+		struct extent_buffer *eb;
+
+		eb = list_first_entry(&trans->releasing_ebs,
+				      struct extent_buffer, release_list);
+		list_del_init(&eb->release_list);
+		free_extent_buffer(eb);
+	}
+	spin_unlock(&trans->releasing_ebs_lock);
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 0cc0b27e9437..b2ce16de0c22 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -43,6 +43,9 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical,
 int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size);
 int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new);
 void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
+void btrfs_redirty_list_add(struct btrfs_transaction *trans,
+			    struct extent_buffer *eb);
+void btrfs_free_redirty_list(struct btrfs_transaction *trans);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
				     struct blk_zone *zone)
@@ -127,6 +130,10 @@ static inline int btrfs_load_block_group_zone_info(
 static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { }

+static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans,
+					  struct extent_buffer *eb) { }
+static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { }
+
 #endif

 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Tue Dec 22 03:49:09 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985669
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 16/40] btrfs: advance allocation pointer after tree log node
Date: Tue, 22 Dec 2020 12:49:09 +0900
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

Since the allocation info of a tree log node is not recorded in the
extent tree, calculate_alloc_pointer() cannot detect such a node, so the
calculated pointer can be left pointing over a tree log node. Replaying
the log calls btrfs_remove_free_space() for each node in the log tree,
so advance the pointer past the node there.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/free-space-cache.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 757c740de179..ed39388209b8 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2616,8 +2616,22 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group,
	int ret;
	bool re_search = false;

-	if (btrfs_is_zoned(block_group->fs_info))
+	if (btrfs_is_zoned(block_group->fs_info)) {
+		/*
+		 * This can happen with conventional zones when replaying
+		 * the log. Since the allocation info of tree-log nodes is
+		 * not recorded in the extent tree, calculate_alloc_pointer()
+		 * failed to advance the allocation pointer past the last
+		 * allocated tree-log node blocks.
+		 *
+		 * This function is called from
+		 * btrfs_pin_extent_for_log_replay() when replaying the
+		 * log. Advance the pointer not to overwrite the tree-log
+		 * nodes.
+		 */
+		if (block_group->alloc_offset < offset + bytes)
+			block_group->alloc_offset = offset + bytes;
		return 0;
+	}

	spin_lock(&ctl->tree_lock);

From patchwork Tue Dec 22 03:49:10 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985671
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v11 17/40] btrfs: enable to mount ZONED incompat flag
Date: Tue, 22 Dec 2020 12:49:10 +0900
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

This final patch adds the ZONED incompat flag to
BTRFS_FEATURE_INCOMPAT_SUPP and enables btrfs to mount a ZONED-flagged
file system.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/ctree.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e80ce910b61d..cc8b8bab241d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -299,7 +299,8 @@ struct btrfs_super_block {
	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
	 BTRFS_FEATURE_INCOMPAT_NO_HOLES	|	\
	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID	|	\
-	 BTRFS_FEATURE_INCOMPAT_RAID1C34)
+	 BTRFS_FEATURE_INCOMPAT_RAID1C34	|	\
+	 BTRFS_FEATURE_INCOMPAT_ZONED)

 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)

From patchwork Tue Dec 22 03:49:11 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985673
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 18/40] btrfs: reset zones of unused block groups
Date: Tue, 22 Dec 2020 12:49:11 +0900
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

For a ZONED volume, a block group maps to a zone of the device. For a
deleted unused block group, the zone of the block group can be reset to
rewind the zone write pointer to the start of the zone.
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/block-group.c |  8 ++++++--
 fs/btrfs/extent-tree.c | 17 ++++++++++++-----
 fs/btrfs/zoned.h       | 16 ++++++++++++++++
 3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index eea776180c37..9bc6a05c8e38 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1400,8 +1400,12 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
		if (!async_trim_enabled && btrfs_test_opt(fs_info, DISCARD_ASYNC))
			goto flip_async;

-		/* DISCARD can flip during remount */
-		trimming = btrfs_test_opt(fs_info, DISCARD_SYNC);
+		/*
+		 * DISCARD can flip during remount. In ZONED mode, we need
+		 * to reset sequential required zones.
+		 */
+		trimming = btrfs_test_opt(fs_info, DISCARD_SYNC) ||
+				btrfs_is_zoned(fs_info);

		/* Implicit trim during transaction commit. */
		if (trimming)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c3e955bbd2ab..ac24a79ce32a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1333,6 +1333,9 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,

		stripe = bbio->stripes;
		for (i = 0; i < bbio->num_stripes; i++, stripe++) {
+			struct btrfs_device *dev = stripe->dev;
+			u64 physical = stripe->physical;
+			u64 length = stripe->length;
			u64 bytes;
			struct request_queue *req_q;

@@ -1340,14 +1343,18 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
				ASSERT(btrfs_test_opt(fs_info, DEGRADED));
				continue;
			}
+
			req_q = bdev_get_queue(stripe->dev->bdev);
-			if (!blk_queue_discard(req_q))
+			/* Zone reset in ZONED mode */
+			if (btrfs_can_zone_reset(dev, physical, length))
+				ret = btrfs_reset_device_zone(dev, physical,
+							      length, &bytes);
+			else if (blk_queue_discard(req_q))
+				ret = btrfs_issue_discard(dev->bdev, physical,
+							  length, &bytes);
+			else
				continue;

-			ret = btrfs_issue_discard(stripe->dev->bdev,
-						  stripe->physical,
-						  stripe->length,
-						  &bytes);
			if (!ret) {
				discarded_bytes += bytes;
			} else if (ret != -EOPNOTSUPP) {
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index b2ce16de0c22..331951978487 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -210,4 +210,20 @@ static inline bool btrfs_check_super_location(struct btrfs_device *device, u64 p
	return device->zone_info == NULL || !btrfs_dev_is_sequential(device, pos);
 }

+static inline bool btrfs_can_zone_reset(struct btrfs_device *device,
+					u64 physical, u64 length)
+{
+	u64 zone_size;
+
+	if (!btrfs_dev_is_sequential(device, physical))
+		return false;
+
+	zone_size = device->zone_info->zone_size;
+	if (!IS_ALIGNED(physical, zone_size) ||
+	    !IS_ALIGNED(length, zone_size))
+		return false;
+
+	return true;
+}
+
 #endif

From patchwork Tue Dec 22 03:49:12 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985675
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v11 19/40] btrfs: extract page adding function
Date: Tue, 22 Dec 2020 12:49:12 +0900
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

This commit extracts the page-adding part of submit_extent_page() into
btrfs_bio_add_page(). A page is added only when the bio_flags are the
same, the page is contiguous with the bio, and it fits in the same
stripe as the pages already in the bio. The condition checks are
reordered to allow an early return, avoiding a potentially heavy call
to btrfs_bio_fits_in_stripe().
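The rationale for the reordering can be illustrated outside the kernel. In this sketch, `crosses_stripe()` is a hypothetical stand-in for the costly `btrfs_bio_fits_in_stripe()` check, and a call counter demonstrates that short-circuit evaluation skips it whenever a cheap comparison already fails:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Counts how often the "expensive" check runs. */
static int expensive_calls;

/* Stand-in for the costly stripe-boundary lookup (always merges here). */
static bool crosses_stripe(uint64_t sector)
{
    expensive_calls++;
    (void)sector;
    return false;
}

/* Cheap checks first, expensive check last, mirroring the
 * check ordering in the extracted helper. */
static bool can_merge(unsigned long prev_flags, unsigned long flags,
                      uint64_t bio_end_sector, uint64_t sector)
{
    if (prev_flags != flags)        /* cheapest check first */
        return false;
    if (bio_end_sector != sector)   /* contiguity next */
        return false;
    return !crosses_stripe(sector); /* expensive check last */
}
```

With the old ordering, the expensive check ran even when the flags already ruled out a merge; with this ordering it only runs when everything cheaper has passed.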
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent_io.c | 56 ++++++++++++++++++++++++++++++++------------
 1 file changed, 41 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 129d571a5c1a..2f070a9e5b22 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3061,6 +3061,44 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
 	return bio;
 }
 
+/**
+ * btrfs_bio_add_page - attempt to add a page to bio
+ * @bio:	destination bio
+ * @page:	page to add to the bio
+ * @logical:	offset of the new bio or to check whether we are adding
+ *		a contiguous page to the previous one
+ * @pg_offset:	starting offset in the page
+ * @size:	portion of page that we want to write
+ * @prev_bio_flags: flags of previous bio to see if we can merge the current one
+ * @bio_flags:	flags of the current bio to see if we can merge them
+ *
+ * Attempt to add a page to bio considering stripe alignment etc. Return
+ * true if successfully page added. Otherwise, return false.
+ */
+static bool btrfs_bio_add_page(struct bio *bio, struct page *page, u64 logical,
+			       unsigned int size, unsigned int pg_offset,
+			       unsigned long prev_bio_flags,
+			       unsigned long bio_flags)
+{
+	sector_t sector = logical >> SECTOR_SHIFT;
+	bool contig;
+
+	if (prev_bio_flags != bio_flags)
+		return false;
+
+	if (prev_bio_flags & EXTENT_BIO_COMPRESSED)
+		contig = bio->bi_iter.bi_sector == sector;
+	else
+		contig = bio_end_sector(bio) == sector;
+	if (!contig)
+		return false;
+
+	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
+		return false;
+
+	return bio_add_page(bio, page, size, pg_offset) == size;
+}
+
 /*
  * @opf:	bio REQ_OP_* and REQ_* flags as one value
  * @wbc:	optional writeback control for io accounting
@@ -3089,27 +3127,15 @@ static int submit_extent_page(unsigned int opf,
 	int ret = 0;
 	struct bio *bio;
 	size_t io_size = min_t(size_t, size, PAGE_SIZE);
-	sector_t sector = offset >> 9;
 	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
 
 	ASSERT(bio_ret);
 
 	if (*bio_ret) {
-		bool contig;
-		bool can_merge = true;
-
 		bio = *bio_ret;
-		if (prev_bio_flags & EXTENT_BIO_COMPRESSED)
-			contig = bio->bi_iter.bi_sector == sector;
-		else
-			contig = bio_end_sector(bio) == sector;
-
-		if (btrfs_bio_fits_in_stripe(page, io_size, bio, bio_flags))
-			can_merge = false;
-
-		if (prev_bio_flags != bio_flags || !contig || !can_merge ||
-		    force_bio_submit ||
-		    bio_add_page(bio, page, io_size, pg_offset) < io_size) {
+		if (force_bio_submit ||
+		    !btrfs_bio_add_page(bio, page, offset, io_size, pg_offset,
+					prev_bio_flags, bio_flags)) {
 			ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
 			if (ret < 0) {
 				*bio_ret = NULL;

From patchwork Tue Dec 22 03:49:13 2020
X-Patchwork-Id: 11985677
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 20/40] btrfs: use bio_add_zone_append_page for zoned btrfs
Date: Tue, 22 Dec 2020 12:49:13 +0900
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

A zoned device has its own hardware restrictions, e.g.
max_zone_append_size when using REQ_OP_ZONE_APPEND. To follow these
restrictions, use bio_add_zone_append_page() instead of bio_add_page().
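As a rough sketch of the size restriction (the constant and helper below are illustrative, not the real block-layer API): a zone-append bio may only accept pages up to the device's maximum zone-append size, while a regular write bio is not bounded by that limit.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative op codes and limit; real values come from the block layer. */
#define OP_WRITE       1
#define OP_ZONE_APPEND 2

static const uint64_t max_zone_append_bytes = 64 * 1024;

/* How many more bytes a bio of the given op and current size may accept:
 * zone-append bios are capped by the device limit, regular writes are
 * effectively unbounded in this toy model. */
static uint64_t add_page_budget(int op, uint64_t bio_size)
{
    if (op == OP_ZONE_APPEND) {
        if (bio_size >= max_zone_append_bytes)
            return 0; /* bio is full for zone append */
        return max_zone_append_bytes - bio_size;
    }
    return UINT64_MAX - bio_size;
}
```

This is why the patch dispatches on `bio_op(bio)`: the helper that adds a page must enforce the cap only for `REQ_OP_ZONE_APPEND` bios.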
We need the target device to use bio_add_zone_append_page(), so this
commit reads the chunk information and memoizes the target device in
btrfs_io_bio(bio)->device. Currently, zoned btrfs only supports the
SINGLE profile. In the future, btrfs_io_bio can hold an extent_map and
check the restrictions for all the devices the bio will be mapped to.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/extent_io.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2f070a9e5b22..d59b13f7ddcf 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3082,6 +3082,7 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page, u64 logical,
 {
 	sector_t sector = logical >> SECTOR_SHIFT;
 	bool contig;
+	int ret;
 
 	if (prev_bio_flags != bio_flags)
 		return false;
@@ -3096,7 +3097,12 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page, u64 logical,
 	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
 		return false;
 
-	return bio_add_page(bio, page, size, pg_offset) == size;
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+		ret = bio_add_zone_append_page(bio, page, size, pg_offset);
+	else
+		ret = bio_add_page(bio, page, size, pg_offset);
+
+	return ret == size;
 }
 
 /*
@@ -3127,7 +3133,9 @@ static int submit_extent_page(unsigned int opf,
 	int ret = 0;
 	struct bio *bio;
 	size_t io_size = min_t(size_t, size, PAGE_SIZE);
-	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
+	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
+	struct extent_io_tree *tree = &inode->io_tree;
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 
 	ASSERT(bio_ret);
 
@@ -3158,11 +3166,27 @@ static int submit_extent_page(unsigned int opf,
 	if (wbc) {
 		struct block_device *bdev;
 
-		bdev = BTRFS_I(page->mapping->host)->root->fs_info->fs_devices->latest_bdev;
+		bdev = fs_info->fs_devices->latest_bdev;
 		bio_set_dev(bio, bdev);
 		wbc_init_bio(wbc, bio);
 		wbc_account_cgroup_owner(wbc, page, io_size);
 	}
+	if (btrfs_is_zoned(fs_info) &&
+	    bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		struct extent_map *em;
+		struct map_lookup *map;
+
+		em = btrfs_get_chunk_map(fs_info, offset, io_size);
+		if (IS_ERR(em))
+			return PTR_ERR(em);
+
+		map = em->map_lookup;
+		/* We only support SINGLE profile for now */
+		ASSERT(map->num_stripes == 1);
+		btrfs_io_bio(bio)->device = map->stripes[0].dev;
+
+		free_extent_map(em);
+	}
 
 	*bio_ret = bio;

From patchwork Tue Dec 22 03:49:14 2020
X-Patchwork-Id: 11985679
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v11 21/40] btrfs: handle REQ_OP_ZONE_APPEND as writing
Date: Tue, 22 Dec 2020 12:49:14 +0900
Message-Id: <4d80758a4cfa908c862a293b4abb023a3faa963b.1608608848.git.naohiro.aota@wdc.com>
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

ZONED btrfs uses REQ_OP_ZONE_APPEND bios to write to actual devices.
Make btrfs_end_bio() and btrfs_op() aware of it.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c |  4 ++--
 fs/btrfs/inode.c   | 10 +++++-----
 fs/btrfs/volumes.c |  8 ++++----
 fs/btrfs/volumes.h |  1 +
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e9b6c6a21681..1cbcf53ba756 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -652,7 +652,7 @@ static void end_workqueue_bio(struct bio *bio)
 	fs_info = end_io_wq->info;
 	end_io_wq->status = bio->bi_status;
 
-	if (bio_op(bio) == REQ_OP_WRITE) {
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
 			wq = fs_info->endio_meta_write_workers;
 		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
@@ -828,7 +828,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 	int async = check_async_write(fs_info, BTRFS_I(inode));
 	blk_status_t ret;
 
-	if (bio_op(bio) != REQ_OP_WRITE) {
+	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 		/*
 		 * called for a read, do the setup so that checksum validation
 		 * can happen in the async kernel threads
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9c2800fa80c6..37782b4cfd28 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2252,7 +2252,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
 		metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
 
-	if (bio_op(bio) != REQ_OP_WRITE) {
+	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
 		if (ret)
 			goto out;
@@ -7682,7 +7682,7 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 	if (!refcount_dec_and_test(&dip->refs))
 		return;
 
-	if (bio_op(dip->dio_bio) == REQ_OP_WRITE) {
+	if (btrfs_op(dip->dio_bio) == BTRFS_MAP_WRITE) {
 		__endio_write_update_ordered(BTRFS_I(dip->inode),
 					     dip->logical_offset,
 					     dip->bytes,
@@ -7850,7 +7850,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_dio_private *dip = bio->bi_private;
-	bool write = bio_op(bio) == REQ_OP_WRITE;
+	bool write = btrfs_op(bio) == BTRFS_MAP_WRITE;
 	blk_status_t ret;
 
 	/* Check btrfs_submit_bio_hook() for rules about async submit. */
@@ -7900,7 +7900,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 							  struct inode *inode,
 							  loff_t file_offset)
 {
-	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
+	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	size_t dip_size;
 	struct btrfs_dio_private *dip;
@@ -7930,7 +7930,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
 		struct bio *dio_bio, loff_t file_offset)
 {
-	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
+	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
 			     BTRFS_BLOCK_GROUP_RAID56_MASK);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e0d17e08a46c..322396ac3f8e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6469,7 +6469,7 @@ static void btrfs_end_bio(struct bio *bio)
 			struct btrfs_device *dev = btrfs_io_bio(bio)->device;
 
 			ASSERT(dev->bdev);
-			if (bio_op(bio) == REQ_OP_WRITE)
+			if (btrfs_op(bio) == BTRFS_MAP_WRITE)
 				btrfs_dev_stat_inc_and_print(dev,
 						BTRFS_DEV_STAT_WRITE_ERRS);
 			else if (!(bio->bi_opf & REQ_RAHEAD))
@@ -6582,10 +6582,10 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	atomic_set(&bbio->stripes_pending, bbio->num_stripes);
 
 	if ((bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
-	    ((bio_op(bio) == REQ_OP_WRITE) || (mirror_num > 1))) {
+	    ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) {
 		/* In this case, map_length has been set to the length of
 		   a single stripe; not the whole write */
-		if (bio_op(bio) == REQ_OP_WRITE) {
+		if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 			ret = raid56_parity_write(fs_info, bio, bbio,
 						  map_length);
 		} else {
@@ -6608,7 +6608,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		dev = bbio->stripes[dev_nr].dev;
 		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
 						   &dev->dev_state) ||
-		    (bio_op(first_bio) == REQ_OP_WRITE &&
+		    (btrfs_op(first_bio) == BTRFS_MAP_WRITE &&
 		     !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
 			bbio_error(bbio, first_bio, logical);
 			continue;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c8841b714f2e..4e34830f3e78 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -426,6 +426,7 @@ static inline enum btrfs_map_op btrfs_op(struct bio *bio)
 	case REQ_OP_DISCARD:
 		return BTRFS_MAP_DISCARD;
 	case REQ_OP_WRITE:
+	case REQ_OP_ZONE_APPEND:
 		return BTRFS_MAP_WRITE;
 	default:
 		WARN_ON_ONCE(1);

From patchwork Tue Dec 22 03:49:15 2020
X-Patchwork-Id: 11985681
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, kernel test robot
Subject: [PATCH v11 22/40] btrfs: split ordered extent when bio is sent
Date: Tue, 22 Dec 2020 12:49:15 +0900
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

For a zone append write, the device decides the location the data is
written to. Therefore we cannot ensure that two bios are written
consecutively on the device. In order to ensure that an ordered extent
maps to a contiguous region on disk, we need to maintain a
"one bio == one ordered extent" rule.
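The split arithmetic the patch relies on (pre = start - disk_bytenr, post = ordered_end - end) can be checked in isolation. `split_lengths()` below is an illustrative helper, not kernel code, mirroring the bounds check and the pre/post computation in the patch's extract_ordered_extent():

```c
#include <assert.h>
#include <stdint.h>

/* Given an ordered extent covering [disk_bytenr, disk_bytenr + disk_num_bytes)
 * and a bio covering [start, start + len) inside it, compute how many bytes
 * must be split off before (pre) and after (post) the bio. Returns -1 when
 * the bio does not lie entirely inside the ordered extent. */
static int split_lengths(uint64_t disk_bytenr, uint64_t disk_num_bytes,
                         uint64_t start, uint64_t len,
                         uint64_t *pre, uint64_t *post)
{
    uint64_t end = start + len;
    uint64_t ordered_end = disk_bytenr + disk_num_bytes;

    if (start < disk_bytenr || end > ordered_end)
        return -1; /* bio must be in one ordered extent */

    *pre = start - disk_bytenr; /* bytes split off before the bio */
    *post = ordered_end - end;  /* bytes split off after the bio */
    return 0;
}
```

After the split, the middle part exactly matches the bio, restoring the "one bio == one ordered extent" invariant, and the pre/post remainders become ordered extents of their own.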
This commit implements the splitting of an ordered extent and extent map on bio submission to adhere to the rule. [testbot] made extract_ordered_extent static Reported-by: kernel test robot Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 89 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ordered-data.c | 76 +++++++++++++++++++++++++++++++++++ fs/btrfs/ordered-data.h | 2 + 3 files changed, 167 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 37782b4cfd28..15e0c7714c7f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2217,6 +2217,86 @@ static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio, return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0); } +static int extract_ordered_extent(struct inode *inode, struct bio *bio, + loff_t file_offset) +{ + struct btrfs_ordered_extent *ordered; + struct extent_map *em = NULL, *em_new = NULL; + struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree; + u64 start = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT; + u64 len = bio->bi_iter.bi_size; + u64 end = start + len; + u64 ordered_end; + u64 pre, post; + int ret = 0; + + ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset); + if (WARN_ON_ONCE(!ordered)) + return -EIO; + + /* No need to split */ + if (ordered->disk_num_bytes == len) + goto out; + + /* We cannot split once end_bio'd ordered extent */ + if (WARN_ON_ONCE(ordered->bytes_left != ordered->disk_num_bytes)) { + ret = -EINVAL; + goto out; + } + + /* We cannot split a compressed ordered extent */ + if (WARN_ON_ONCE(ordered->disk_num_bytes != ordered->num_bytes)) { + ret = -EINVAL; + goto out; + } + + /* We cannot split a waited ordered extent */ + if (WARN_ON_ONCE(wq_has_sleeper(&ordered->wait))) { + ret = -EINVAL; + goto out; + } + + ordered_end = ordered->disk_bytenr + ordered->disk_num_bytes; + /* bio must be in one ordered extent */ + if (WARN_ON_ONCE(start < ordered->disk_bytenr || end > ordered_end)) { + ret = -EINVAL; + goto out; + } + + /* 
Checksum list should be empty */ + if (WARN_ON_ONCE(!list_empty(&ordered->list))) { + ret = -EINVAL; + goto out; + } + + pre = start - ordered->disk_bytenr; + post = ordered_end - end; + + btrfs_split_ordered_extent(ordered, pre, post); + + read_lock(&em_tree->lock); + em = lookup_extent_mapping(em_tree, ordered->file_offset, len); + if (!em) { + read_unlock(&em_tree->lock); + ret = -EIO; + goto out; + } + read_unlock(&em_tree->lock); + + ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)); + em_new = create_io_em(BTRFS_I(inode), em->start + pre, len, + em->start + pre, em->block_start + pre, len, + len, len, BTRFS_COMPRESS_NONE, + BTRFS_ORDERED_REGULAR); + free_extent_map(em_new); + +out: + free_extent_map(em); + btrfs_put_ordered_extent(ordered); + + return ret; +} + /* * extent_io.c submission hook. This does the right thing for csum calculation * on write, or reading the csums from the tree before a read. @@ -2252,6 +2332,15 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio, if (btrfs_is_free_space_inode(BTRFS_I(inode))) metadata = BTRFS_WQ_ENDIO_FREE_SPACE; + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + struct page *page = bio_first_bvec_all(bio)->bv_page; + loff_t file_offset = page_offset(page); + + ret = extract_ordered_extent(inode, bio, file_offset); + if (ret) + goto out; + } + if (btrfs_op(bio) != BTRFS_MAP_WRITE) { ret = btrfs_bio_wq_end_io(fs_info, bio, metadata); if (ret) diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 79d366a36223..4f8f48e7a482 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -898,6 +898,82 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start, } } +static void clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos, + u64 len) +{ + struct inode *inode = ordered->inode; + u64 file_offset = ordered->file_offset + pos; + u64 disk_bytenr = ordered->disk_bytenr + pos; + u64 num_bytes = len; + u64 disk_num_bytes = len; + int type; + 
unsigned long flags_masked = + ordered->flags & ~(1 << BTRFS_ORDERED_DIRECT); + int compress_type = ordered->compress_type; + unsigned long weight; + + weight = hweight_long(flags_masked); + WARN_ON_ONCE(weight > 1); + if (!weight) + type = 0; + else + type = __ffs(flags_masked); + + if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered->flags)) { + WARN_ON_ONCE(1); + btrfs_add_ordered_extent_compress(BTRFS_I(inode), file_offset, + disk_bytenr, num_bytes, + disk_num_bytes, type, + compress_type); + } else if (test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) { + btrfs_add_ordered_extent_dio(BTRFS_I(inode), file_offset, + disk_bytenr, num_bytes, + disk_num_bytes, type); + } else { + btrfs_add_ordered_extent(BTRFS_I(inode), file_offset, + disk_bytenr, num_bytes, disk_num_bytes, + type); + } +} + +void btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre, + u64 post) +{ + struct inode *inode = ordered->inode; + struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree; + struct rb_node *node; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + + spin_lock_irq(&tree->lock); + /* Remove from tree once */ + node = &ordered->rb_node; + rb_erase(node, &tree->tree); + RB_CLEAR_NODE(node); + if (tree->last == node) + tree->last = NULL; + + ordered->file_offset += pre; + ordered->disk_bytenr += pre; + ordered->num_bytes -= (pre + post); + ordered->disk_num_bytes -= (pre + post); + ordered->bytes_left -= (pre + post); + + /* Re-insert the node */ + node = tree_insert(&tree->tree, ordered->file_offset, + &ordered->rb_node); + if (node) + btrfs_panic(fs_info, -EEXIST, + "zoned: inconsistency in ordered tree at offset %llu", + ordered->file_offset); + + spin_unlock_irq(&tree->lock); + + if (pre) + clone_ordered_extent(ordered, 0, pre); + if (post) + clone_ordered_extent(ordered, pre + ordered->disk_num_bytes, post); +} + int __init ordered_data_init(void) { btrfs_ordered_extent_cache = kmem_cache_create("btrfs_ordered_extent", diff --git 
a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 0bfa82b58e23..f9964276f85f 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -190,6 +190,8 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
 void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 					u64 end,
 					struct extent_state **cached_state);
+void btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
+				u64 post);
 int __init ordered_data_init(void);
 void __cold ordered_data_exit(void);

From patchwork Tue Dec 22 03:49:16 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985683
From: Naohiro Aota
To:
linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 23/40] btrfs: extend btrfs_rmap_block for specifying a device
Date: Tue, 22 Dec 2020 12:49:16 +0900
Message-Id: <62d40762a5bbcc27377d15ac76e5f5f874acbc1a.1608608848.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

btrfs_rmap_block currently reverse-maps the physical addresses on all
devices to the corresponding logical addresses.

This commit extends the function to match only a specified device. The
old functionality of querying all devices is left intact by specifying
NULL as the target device.

A block_device, not a btrfs_device, is passed into __btrfs_rmap_block,
as this function is intended to reverse-map the result of a bio, which
only has a block_device.

This commit also exports the function for later use.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/block-group.c            | 20 ++++++++++++++------
 fs/btrfs/block-group.h            |  8 +++-----
 fs/btrfs/tests/extent-map-tests.c |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 9bc6a05c8e38..5b477617021f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1576,8 +1576,11 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 }
 
 /**
- * btrfs_rmap_block - Map a physical disk address to a list of logical addresses
+ * btrfs_rmap_block - Map a physical disk address to a list of logical
+ * addresses
  * @chunk_start: logical address of block group
+ * @bdev:	 physical device to resolve. Can be NULL to indicate any
+ *		 device.
 * @physical:	physical address to map to logical addresses
 * @logical:	return array of logical addresses which map to @physical
 * @naddrs:	length of @logical
@@ -1587,9 +1590,9 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 * Used primarily to exclude those portions of a block group that contain super
 * block copies.
 */
-EXPORT_FOR_TESTS
 int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
-		     u64 physical, u64 **logical, int *naddrs, int *stripe_len)
+		     struct block_device *bdev, u64 physical, u64 **logical,
+		     int *naddrs, int *stripe_len)
 {
 	struct extent_map *em;
 	struct map_lookup *map;
@@ -1607,6 +1610,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 	map = em->map_lookup;
 	data_stripe_length = em->orig_block_len;
 	io_stripe_size = map->stripe_len;
+	chunk_start = em->start;
 
 	/* For RAID5/6 adjust to a full IO stripe length */
 	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
@@ -1621,14 +1625,18 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 	for (i = 0; i < map->num_stripes; i++) {
 		bool already_inserted = false;
 		u64 stripe_nr;
+		u64 offset;
 		int j;
 
 		if (!in_range(physical, map->stripes[i].physical,
 			      data_stripe_length))
 			continue;
 
+		if (bdev && map->stripes[i].dev->bdev != bdev)
+			continue;
+
 		stripe_nr = physical - map->stripes[i].physical;
-		stripe_nr = div64_u64(stripe_nr, map->stripe_len);
+		stripe_nr = div64_u64_rem(stripe_nr, map->stripe_len, &offset);
 
 		if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
 			stripe_nr = stripe_nr * map->num_stripes + i;
@@ -1642,7 +1650,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 		 * instead of map->stripe_len
 		 */
 
-		bytenr = chunk_start + stripe_nr * io_stripe_size;
+		bytenr = chunk_start + stripe_nr * io_stripe_size + offset;
 
 		/* Ensure we don't add duplicate addresses */
 		for (j = 0; j < nr; j++) {
@@ -1684,7 +1692,7 @@ static int exclude_super_stripes(struct btrfs_block_group *cache)
 
 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		bytenr = btrfs_sb_offset(i);
-		ret = btrfs_rmap_block(fs_info, cache->start,
+		ret = btrfs_rmap_block(fs_info, cache->start, NULL,
 				       bytenr, &logical, &nr, &stripe_len);
 		if (ret)
 			return ret;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 0f3c62c561bc..9df00ada09f9 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -277,6 +277,9 @@ void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
 int btrfs_free_block_groups(struct btrfs_fs_info *info);
 void btrfs_wait_space_cache_v1_finished(struct btrfs_block_group *cache,
 				struct btrfs_caching_control *caching_ctl);
+int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
+		     struct block_device *bdev, u64 physical, u64 **logical,
+		     int *naddrs, int *stripe_len);
 
 static inline u64 btrfs_data_alloc_profile(struct btrfs_fs_info *fs_info)
 {
@@ -303,9 +306,4 @@ static inline int btrfs_block_group_done(struct btrfs_block_group *cache)
 void btrfs_freeze_block_group(struct btrfs_block_group *cache);
 void btrfs_unfreeze_block_group(struct btrfs_block_group *cache);
 
-#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
-int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
-		     u64 physical, u64 **logical, int *naddrs, int *stripe_len);
-#endif
-
 #endif /* BTRFS_BLOCK_GROUP_H */
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 57379e96ccc9..c0aefe6dee0b 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -507,7 +507,7 @@ static int test_rmap_block(struct btrfs_fs_info *fs_info,
 		goto out_free;
 	}
 
-	ret = btrfs_rmap_block(fs_info, em->start, btrfs_sb_offset(1),
+	ret = btrfs_rmap_block(fs_info, em->start, NULL, btrfs_sb_offset(1),
 			       &logical, &out_ndaddrs, &out_stripe_len);
 	if (ret || (out_ndaddrs == 0 && test->expected_mapped_addr)) {
 		test_err("didn't rmap anything but expected %d",

From patchwork Tue Dec 22 03:49:17 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding:
7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985685
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J.
Wong", Johannes Thumshirn
Subject: [PATCH v11 24/40] btrfs: cache if block-group is on a sequential zone
Date: Tue, 22 Dec 2020 12:49:17 +0900
Message-Id: <8fe22c91f24d5e79c8dbc349538908d7ba0bb2a4.1608608848.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Johannes Thumshirn

In zoned mode, cache whether a block group is on a sequential write only
zone. On sequential write only zones, we can use REQ_OP_ZONE_APPEND for
writing data, so provide btrfs_use_zone_append() to determine whether an
I/O targets a sequential write only zone, in which case
REQ_OP_ZONE_APPEND can be used for the data write.

Signed-off-by: Johannes Thumshirn
Reviewed-by: Josef Bacik
---
 fs/btrfs/block-group.h |  2 ++
 fs/btrfs/zoned.c       | 29 +++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  5 +++++
 3 files changed, 36 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 9df00ada09f9..a1d96c4cfa3b 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -184,6 +184,8 @@ struct btrfs_block_group {
 	/* Record locked full stripes for RAID5/6 block group */
 	struct btrfs_full_stripe_locks_tree full_stripe_locks_root;
 
+	/* Flag indicating this block-group is placed on a sequential zone */
+	bool seq_zone;
 	/*
 	 * Allocation offset for the block group to implement sequential
 	 * allocation. This is used only with ZONED mode enabled.
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 73e083a86213..72735e948b6e 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1068,6 +1068,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		}
 	}
 
+	if (num_sequential > 0)
+		cache->seq_zone = true;
+
 	if (num_conventional > 0) {
 		/*
 		 * Avoid calling calculate_alloc_pointer() for new BG. It
@@ -1188,3 +1191,29 @@ void btrfs_free_redirty_list(struct btrfs_transaction *trans)
 	}
 	spin_unlock(&trans->releasing_ebs_lock);
 }
+
+bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct btrfs_block_group *cache;
+	bool ret = false;
+
+	if (!btrfs_is_zoned(fs_info))
+		return false;
+
+	if (!fs_info->max_zone_append_size)
+		return false;
+
+	if (!is_data_inode(&inode->vfs_inode))
+		return false;
+
+	cache = btrfs_lookup_block_group(fs_info, em->block_start);
+	ASSERT(cache);
+	if (!cache)
+		return false;
+
+	ret = cache->seq_zone;
+	btrfs_put_block_group(cache);
+
+	return ret;
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 331951978487..92888eb86055 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -46,6 +46,7 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
 void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 			    struct extent_buffer *eb);
 void btrfs_free_redirty_list(struct btrfs_transaction *trans);
+bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -134,6 +135,10 @@ static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 					  struct extent_buffer *eb) { }
 static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { }
 
+static inline bool btrfs_use_zone_append(struct btrfs_inode *inode,
+					 struct extent_map *em)
+{
+	return false;
+}
 #endif
 
 static inline bool
btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Tue Dec 22 03:49:18 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985687
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J.
Wong", Naohiro Aota, Josef Bacik, Johannes Thumshirn
Subject: [PATCH v11 25/40] btrfs: use ZONE_APPEND write for ZONED btrfs
Date: Tue, 22 Dec 2020 12:49:18 +0900
Message-Id: <7f7891de68acb153cb8b56747e2f38362f5f4d7a.1608608848.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

This commit enables zone append writing for zoned btrfs. When using zone
append, a bio is issued to the start of a target zone and the device
decides where to place it inside the zone. Upon completion, the device
reports the actual written position back to the host.

Three parts are necessary to enable zone append in btrfs.

First, modify the bio to use REQ_OP_ZONE_APPEND in
btrfs_submit_bio_hook() and adjust bi_sector to point to the beginning
of the zone.

Second, record the returned physical address (and disk/partno) in the
ordered extent in end_bio_extent_writepage() once the bio has completed.
We cannot resolve the physical address to the logical address there,
because we can neither take locks nor allocate a buffer in that end_bio
context. So we record the physical address and resolve it later, in
btrfs_finish_ordered_io().

Finally, rewrite the logical addresses of the extent mapping and
checksum data according to the physical address (using
__btrfs_rmap_block). If the returned address matches the originally
allocated address, this rewriting step can be skipped.
Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent_io.c    | 11 ++++++-
 fs/btrfs/file.c         |  2 +-
 fs/btrfs/inode.c        |  4 +++
 fs/btrfs/ordered-data.c |  3 ++
 fs/btrfs/ordered-data.h |  8 +++++
 fs/btrfs/volumes.c      | 15 +++++++++
 fs/btrfs/zoned.c        | 68 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h        | 12 ++++++++
 8 files changed, 121 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d59b13f7ddcf..0cffb6901e58 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2735,6 +2735,7 @@ static void end_bio_extent_writepage(struct bio *bio)
 	u64 start;
 	u64 end;
 	struct bvec_iter_all iter_all;
+	bool first_bvec = true;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
@@ -2761,6 +2762,11 @@ static void end_bio_extent_writepage(struct bio *bio)
 		start = page_offset(page);
 		end = start + bvec->bv_offset + bvec->bv_len - 1;
 
+		if (first_bvec) {
+			btrfs_record_physical_zoned(inode, start, bio);
+			first_bvec = false;
+		}
+
 		end_extent_writepage(page, error, start, end);
 		end_page_writeback(page);
 	}
@@ -3579,6 +3585,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	size_t blocksize;
 	int ret = 0;
 	int nr = 0;
+	int opf = REQ_OP_WRITE;
 	const unsigned int write_flags = wbc_to_write_flags(wbc);
 	bool compressed;
 
@@ -3625,6 +3632,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		offset = em->block_start + extent_offset;
 		block_start = em->block_start;
 		compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
+		if (btrfs_use_zone_append(inode, em))
+			opf = REQ_OP_ZONE_APPEND;
 		free_extent_map(em);
 		em = NULL;
 
@@ -3651,7 +3660,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 				       page->index, cur, end);
 		}
 
-		ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc,
+		ret = submit_extent_page(opf | write_flags, wbc,
 					 page, offset, iosize, pg_offset,
 					 &epd->bio,
 					 end_bio_extent_writepage,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e65223e3510d..5c120d8d060d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2176,7 +2176,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * the current transaction commits before the ordered extents complete
 	 * and a power failure happens right after that.
 	 */
-	if (full_sync) {
+	if (full_sync || btrfs_is_zoned(fs_info)) {
 		ret = btrfs_wait_ordered_range(inode, start, len);
 	} else {
 		/*
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 15e0c7714c7f..0ca5b6c9f0ef 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -50,6 +50,7 @@
 #include "delalloc-space.h"
 #include "block-group.h"
 #include "space-info.h"
+#include "zoned.h"
 
 struct btrfs_iget_args {
 	u64 ino;
@@ -2828,6 +2829,9 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 	bool clear_reserved_extent = true;
 	unsigned int clear_bits = EXTENT_DEFRAG;
 
+	if (ordered_extent->disk)
+		btrfs_rewrite_logical_zoned(ordered_extent);
+
 	start = ordered_extent->file_offset;
 	end = start + ordered_extent->num_bytes - 1;
 
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 4f8f48e7a482..db3797bc8bb5 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -199,6 +199,9 @@ static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset
 	entry->compress_type = compress_type;
 	entry->truncated_len = (u64)-1;
 	entry->qgroup_rsv = ret;
+	entry->physical = (u64)-1;
+	entry->disk = NULL;
+	entry->partno = (u8)-1;
 
 	if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE)
 		set_bit(type, &entry->flags);
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index f9964276f85f..6fdf44a2dea9 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -127,6 +127,14 @@ struct btrfs_ordered_extent {
 	struct completion completion;
 	struct btrfs_work flush_work;
 	struct list_head work_list;
+
+	/*
+	 * Used to reverse-map the physical address returned from a
+	 * ZONE_APPEND write command in a workqueue context.
+	 */
+	u64 physical;
+	struct gendisk *disk;
+	u8 partno;
 };
 
 /*
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 322396ac3f8e..d2b86aa1fc72 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6521,6 +6521,21 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
 	btrfs_io_bio(bio)->device = dev;
 	bio->bi_end_io = btrfs_end_bio;
 	bio->bi_iter.bi_sector = physical >> 9;
+	/*
+	 * For zone append writing, bi_sector must point to the beginning of
+	 * the zone
+	 */
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		if (btrfs_dev_is_sequential(dev, physical)) {
+			u64 zone_start = round_down(physical,
+						    fs_info->zone_size);
+
+			bio->bi_iter.bi_sector = zone_start >> SECTOR_SHIFT;
+		} else {
+			bio->bi_opf &= ~REQ_OP_ZONE_APPEND;
+			bio->bi_opf |= REQ_OP_WRITE;
+		}
+	}
 	btrfs_debug_in_rcu(fs_info,
 	"btrfs_map_bio: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u",
 		bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector,
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 72735e948b6e..a4def29e7851 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1217,3 +1217,71 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em)
 
 	return ret;
 }
+
+void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset,
+				 struct bio *bio)
+{
+	struct btrfs_ordered_extent *ordered;
+	u64 physical = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT;
+
+	if (bio_op(bio) != REQ_OP_ZONE_APPEND)
+		return;
+
+	ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset);
+	if (WARN_ON(!ordered))
+		return;
+
+	ordered->physical = physical;
+	ordered->disk = bio->bi_disk;
+	ordered->partno = bio->bi_partno;
+
+	btrfs_put_ordered_extent(ordered);
+}
+
+void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered)
+{
+	struct extent_map_tree *em_tree;
+	struct extent_map *em;
+	struct inode *inode = ordered->inode;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct btrfs_ordered_sum *sum;
+	struct block_device *bdev;
+	u64 orig_logical = ordered->disk_bytenr;
+	u64 *logical = NULL;
+	int nr, stripe_len;
+
+	bdev = bdget_disk(ordered->disk, ordered->partno);
+	if (WARN_ON(!bdev))
+		return;
+
+	if (WARN_ON(btrfs_rmap_block(fs_info, orig_logical, bdev,
+				     ordered->physical, &logical, &nr,
+				     &stripe_len)))
+		goto out;
+
+	WARN_ON(nr != 1);
+
+	if (orig_logical == *logical)
+		goto out;
+
+	ordered->disk_bytenr = *logical;
+
+	em_tree = &BTRFS_I(inode)->extent_tree;
+	write_lock(&em_tree->lock);
+	em = search_extent_mapping(em_tree, ordered->file_offset,
+				   ordered->num_bytes);
+	em->block_start = *logical;
+	free_extent_map(em);
+	write_unlock(&em_tree->lock);
+
+	list_for_each_entry(sum, &ordered->list, list) {
+		if (*logical < orig_logical)
+			sum->bytenr -= orig_logical - *logical;
+		else
+			sum->bytenr += *logical - orig_logical;
+	}
+
+out:
+	kfree(logical);
+	bdput(bdev);
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 92888eb86055..cf420964305f 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -47,6 +47,9 @@ void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 			    struct extent_buffer *eb);
 void btrfs_free_redirty_list(struct btrfs_transaction *trans);
 bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em);
+void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset,
+				 struct bio *bio);
+void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -139,6 +142,15 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em)
 {
 	return false;
 }
+
+static inline void btrfs_record_physical_zoned(struct inode *inode,
+					       u64 file_offset, struct bio *bio)
+{
+}
+
+static inline void btrfs_rewrite_logical_zoned(
+				struct btrfs_ordered_extent *ordered) { }
+
 #endif
 
 static inline bool
btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Tue Dec 22 03:49:19 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985689
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J.
Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v11 26/40] btrfs: enable zone append writing for direct IO Date: Tue, 22 Dec 2020 12:49:19 +0900 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com> References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Likewise to buffered IO, enable zone append writing for direct IO when its used on a zoned block device. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0ca5b6c9f0ef..5e96d9631038 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7706,6 +7706,9 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, iomap->bdev = fs_info->fs_devices->latest_bdev; iomap->length = len; + if (write && btrfs_use_zone_append(BTRFS_I(inode), em)) + iomap->flags |= IOMAP_F_ZONE_APPEND; + free_extent_map(em); return 0; @@ -7934,6 +7937,8 @@ static void btrfs_end_dio_bio(struct bio *bio) if (err) dip->dio_bio->bi_status = err; + btrfs_record_physical_zoned(dip->inode, dip->logical_offset, bio); + bio_put(bio); btrfs_dio_private_put(dip); } @@ -8086,6 +8091,18 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, bio->bi_end_io = btrfs_end_dio_bio; btrfs_io_bio(bio)->logical = file_offset; + WARN_ON_ONCE(write && btrfs_is_zoned(fs_info) && + fs_info->max_zone_append_size && + bio_op(bio) != REQ_OP_ZONE_APPEND); + + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + ret = extract_ordered_extent(inode, bio, file_offset); + if (ret) { + bio_put(bio); + goto out_err; + } + } + ASSERT(submit_len >= clone_len); submit_len -= clone_len; From patchwork Tue Dec 22 03:49:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 27/40] btrfs: introduce dedicated data write path for ZONED mode
Date: Tue, 22 Dec 2020 12:49:20 +0900
Message-Id: <2b4271752514c9f376b1fc6a988336ed9238aa0d.1608608848.git.naohiro.aota@wdc.com>

If more than one IO is issued for one file extent, these IOs can be
written to separate regions on a device. Since we cannot map one file
extent to such separate areas, we need to follow the "one IO == one
ordered extent" rule.

The normal buffered, uncompressed, not pre-allocated write path (used by
cow_file_range()) sometimes does not follow this rule. It can write only
part of an ordered extent when a specific region is requested, e.g., when
it is called from fdatasync().

Introduce a dedicated (uncompressed buffered) data write path for ZONED
mode. This write path CoWs the region and writes it at once.
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/inode.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5e96d9631038..5f4de6ebebbd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1400,6 +1400,29 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 	return 0;
 }
 
+static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
+				       struct page *locked_page, u64 start,
+				       u64 end, int *page_started,
+				       unsigned long *nr_written)
+{
+	int ret;
+
+	ret = cow_file_range(inode, locked_page, start, end,
+			     page_started, nr_written, 0);
+	if (ret)
+		return ret;
+
+	if (*page_started)
+		return 0;
+
+	__set_page_dirty_nobuffers(locked_page);
+	account_page_redirty(locked_page);
+	extent_write_locked_range(&inode->vfs_inode, start, end, WB_SYNC_ALL);
+	*page_started = 1;
+
+	return 0;
+}
+
 static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info,
 					u64 bytenr, u64 num_bytes)
 {
@@ -1879,17 +1902,24 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 {
 	int ret;
 	int force_cow = need_force_cow(inode, start, end);
+	const bool do_compress = inode_can_compress(inode) &&
+				 inode_need_compress(inode, start, end);
+	const bool zoned = btrfs_is_zoned(inode->root->fs_info);
 
 	if (inode->flags & BTRFS_INODE_NODATACOW && !force_cow) {
+		ASSERT(!zoned);
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 1, nr_written);
 	} else if (inode->flags & BTRFS_INODE_PREALLOC && !force_cow) {
+		ASSERT(!zoned);
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
-	} else if (!inode_can_compress(inode) ||
-		   !inode_need_compress(inode, start, end)) {
+	} else if (!do_compress && !zoned) {
 		ret = cow_file_range(inode, locked_page, start, end,
 				     page_started, nr_written, 1);
+	} else if (!do_compress && zoned) {
+		ret = run_delalloc_zoned(inode, locked_page, start, end,
+					 page_started, nr_written);
 	} else {
 		set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
 		ret = cow_file_range_async(inode, wbc, locked_page, start, end,

From patchwork Tue Dec 22 03:49:21 2020
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v11 28/40] btrfs: serialize meta IOs on ZONED mode
Date: Tue, 22 Dec 2020 12:49:21 +0900
Message-Id: <660d1b81f3f865dbd728ef6cd0b7efb669f36743.1608608848.git.naohiro.aota@wdc.com>

We cannot use zone append for writing metadata, because the B-tree nodes
have references to each other using the logical address. Without knowing
the address in advance, we cannot construct the tree in the first place.
So we need to serialize write IOs for metadata.

We cannot add a mutex around allocation and submission because metadata
blocks are allocated in an earlier stage to build up B-trees.

Add a zoned_meta_io_lock and hold it during metadata IO submission in
btree_write_cache_pages() to serialize IOs.

Furthermore, this adds a per-block group metadata IO submission pointer
"meta_write_pointer" to ensure sequential writing, which can otherwise be
broken when writing back blocks in an unfinished transaction.
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.h |  1 +
 fs/btrfs/ctree.h       |  1 +
 fs/btrfs/disk-io.c     |  1 +
 fs/btrfs/extent_io.c   | 25 ++++++++++++++++++++-
 fs/btrfs/zoned.c       | 50 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       | 32 +++++++++++++++++++++++++++
 6 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index a1d96c4cfa3b..19a22bf930c6 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -192,6 +192,7 @@ struct btrfs_block_group {
 	 */
 	u64 alloc_offset;
 	u64 zone_unusable;
+	u64 meta_write_pointer;
 };
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index cc8b8bab241d..1085f8d9752b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -976,6 +976,7 @@ struct btrfs_fs_info {
 	/* Max size to emit ZONE_APPEND write command */
 	u64 max_zone_append_size;
+	struct mutex zoned_meta_io_lock;
 
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1cbcf53ba756..1f0523a796b4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2704,6 +2704,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	mutex_init(&fs_info->delete_unused_bgs_mutex);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
+	mutex_init(&fs_info->zoned_meta_io_lock);
 	seqlock_init(&fs_info->profiles_lock);
 
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0cffb6901e58..80e5352d8d2c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -25,6 +25,7 @@
 #include "backref.h"
 #include "disk-io.h"
 #include "zoned.h"
+#include "block-group.h"
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
@@ -4072,6 +4073,7 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc,
 			  struct extent_buffer **eb_context)
 {
 	struct address_space *mapping = page->mapping;
+	struct btrfs_block_group *cache = NULL;
 	struct extent_buffer *eb;
 	int ret;
 
@@ -4104,13 +4106,31 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc,
 	if (!ret)
 		return 0;
 
+	if (!btrfs_check_meta_write_pointer(eb->fs_info, eb, &cache)) {
+		/*
+		 * If for_sync, this hole will be filled with
+		 * transaction commit.
+		 */
+		if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync)
+			ret = -EAGAIN;
+		else
+			ret = 0;
+		free_extent_buffer(eb);
+		return ret;
+	}
+
 	*eb_context = eb;
 	ret = lock_extent_buffer_for_io(eb, epd);
 	if (ret <= 0) {
+		btrfs_revert_meta_write_pointer(cache, eb);
+		if (cache)
+			btrfs_put_block_group(cache);
 		free_extent_buffer(eb);
 		return ret;
 	}
+	if (cache)
+		btrfs_put_block_group(cache);
 	ret = write_one_eb(eb, wbc, epd);
 	free_extent_buffer(eb);
 	if (ret < 0)
@@ -4156,6 +4176,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 		tag = PAGECACHE_TAG_TOWRITE;
 	else
 		tag = PAGECACHE_TAG_DIRTY;
+	btrfs_zoned_meta_io_lock(fs_info);
 retry:
 	if (wbc->sync_mode == WB_SYNC_ALL)
 		tag_pages_for_writeback(mapping, index, end);
@@ -4196,7 +4217,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 	}
 	if (ret < 0) {
 		end_write_bio(&epd, ret);
-		return ret;
+		goto out;
 	}
 	/*
 	 * If something went wrong, don't allow any metadata write bio to be
@@ -4231,6 +4252,8 @@ int btree_write_cache_pages(struct address_space *mapping,
 		ret = -EROFS;
 		end_write_bio(&epd, ret);
 	}
+out:
+	btrfs_zoned_meta_io_unlock(fs_info);
 	return ret;
 }
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index a4def29e7851..01f84b4c4224 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1126,6 +1126,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		ret = -EIO;
 	}
 
+	if (!ret)
+		cache->meta_write_pointer = cache->alloc_offset + cache->start;
+
 	kfree(alloc_offsets);
 	free_extent_map(em);
 
@@ -1285,3 +1288,50 @@ void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered)
 	kfree(logical);
 	bdput(bdev);
 }
+
+bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
+				    struct extent_buffer *eb,
+				    struct btrfs_block_group **cache_ret)
+{
+	struct btrfs_block_group *cache;
+	bool ret = true;
+
+	if (!btrfs_is_zoned(fs_info))
+		return true;
+
+	cache = *cache_ret;
+
+	if (cache && (eb->start < cache->start ||
+		      cache->start + cache->length <= eb->start)) {
+		btrfs_put_block_group(cache);
+		cache = NULL;
+		*cache_ret = NULL;
+	}
+
+	if (!cache)
+		cache = btrfs_lookup_block_group(fs_info, eb->start);
+
+	if (cache) {
+		if (cache->meta_write_pointer != eb->start) {
+			btrfs_put_block_group(cache);
+			cache = NULL;
+			ret = false;
+		} else {
+			cache->meta_write_pointer = eb->start + eb->len;
+		}
+
+		*cache_ret = cache;
+	}
+
+	return ret;
+}
+
+void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache,
+				     struct extent_buffer *eb)
+{
+	if (!btrfs_is_zoned(eb->fs_info) || !cache)
+		return;
+
+	ASSERT(cache->meta_write_pointer == eb->start + eb->len);
+	cache->meta_write_pointer = eb->start;
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index cf420964305f..a42e120158ab 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -50,6 +50,11 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em);
 void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset,
 				 struct bio *bio);
 void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered);
+bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
+				    struct extent_buffer *eb,
+				    struct btrfs_block_group **cache_ret);
+void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache,
+				     struct extent_buffer *eb);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -151,6 +156,19 @@ static inline void btrfs_record_physical_zoned(struct inode *inode,
 
 static inline void btrfs_rewrite_logical_zoned(
 				struct btrfs_ordered_extent *ordered) { }
+static inline bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
+				struct extent_buffer *eb,
+				struct btrfs_block_group **cache_ret)
+{
+	return true;
+}
+
+static inline void btrfs_revert_meta_write_pointer(
+				struct btrfs_block_group *cache,
+				struct extent_buffer *eb)
+{
+}
+
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
@@ -243,4 +261,18 @@ static inline bool btrfs_can_zone_reset(struct btrfs_device *device,
 	return true;
 }
 
+static inline void btrfs_zoned_meta_io_lock(struct btrfs_fs_info *fs_info)
+{
+	if (!btrfs_is_zoned(fs_info))
+		return;
+	mutex_lock(&fs_info->zoned_meta_io_lock);
+}
+
+static inline void btrfs_zoned_meta_io_unlock(struct btrfs_fs_info *fs_info)
+{
+	if (!btrfs_is_zoned(fs_info))
+		return;
+	mutex_unlock(&fs_info->zoned_meta_io_lock);
+}
+
 #endif

From patchwork Tue Dec 22 03:49:22 2020
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v11 29/40] btrfs: wait existing extents before truncating
Date: Tue, 22 Dec 2020 12:49:22 +0900

When truncating a file, file buffers which have already been allocated
but not yet written may be truncated. Truncating these buffers could
break a sequential write pattern in a block group if the truncated
blocks are, for example, followed by blocks allocated to another file.

To avoid this problem, always wait for write-out of all unwritten
buffers before proceeding with the truncate.
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/inode.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5f4de6ebebbd..04ca49504518 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5134,6 +5134,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		btrfs_drew_write_unlock(&root->snapshot_lock);
 		btrfs_end_transaction(trans);
 	} else {
+		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+
+		if (btrfs_is_zoned(fs_info)) {
+			ret = btrfs_wait_ordered_range(
+					inode,
+					ALIGN(newsize, fs_info->sectorsize),
+					(u64)-1);
+			if (ret)
+				return ret;
+		}
 
 		/*
 		 * We're truncating a file that used to have good data down to

From patchwork Tue Dec 22 03:49:23 2020
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v11 30/40] btrfs: avoid async metadata checksum on ZONED mode
Date: Tue, 22 Dec 2020 12:49:23 +0900
Message-Id: <57bf58857026225f3e3500003b489075e9c8dda1.1608608848.git.naohiro.aota@wdc.com>

In ZONED mode, btrfs uses the per-FS zoned_meta_io_lock to serialize
metadata write IOs. Even with this serialization, write bios sent from
btree_write_cache_pages can be reordered by the async checksum workers,
as these workers are per CPU and not per zone.

To preserve write bio ordering, disable async metadata checksum on ZONED
mode. This does not result in lower performance with HDDs, as a single
CPU core is fast enough to checksum a single zone write stream at the
maximum possible bandwidth of the device. If multiple zones are written
simultaneously, HDD seek overhead lowers the achievable maximum
bandwidth, so again the per-zone checksum serialization does not affect
performance.
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1f0523a796b4..efcf1a343732 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -814,6 +814,8 @@ static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
 static int check_async_write(struct btrfs_fs_info *fs_info,
			     struct btrfs_inode *bi)
 {
+	if (btrfs_is_zoned(fs_info))
+		return 0;
 	if (atomic_read(&bi->sync_writers))
 		return 0;
 	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))

From patchwork Tue Dec 22 03:49:24 2020
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 31/40] btrfs: mark block groups to copy for device-replace
Date: Tue, 22 Dec 2020 12:49:24 +0900
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

This is the 1/4 patch to support device-replace in ZONED mode.

There are two types of I/O during the device-replace process. One "copies" (via the scrub functions) all the device extents from the source device to the destination device. The other "clones" (via handle_ops_on_dev_replace()) new incoming write I/Os from users to the source device into the target device.

Cloning incoming I/Os can break the sequential write rule on the target device: when a write is mapped into the middle of a block group, the I/O is directed at the middle of a target device zone, which violates sequential writing. However, the cloning function cannot simply be disabled, since incoming I/Os targeting already-copied device extents must be cloned so that the I/O is executed on the target device.

We cannot use dev_replace->cursor_{left,right} to determine whether a bio targets a not-yet-copied region. Because there is a time gap between finishing btrfs_scrub_dev() and rewriting the mapping tree in btrfs_dev_replace_finishing(), a newly allocated device extent may end up neither cloned nor copied. So the point is to copy only device extents that already exist.

This patch introduces mark_block_group_to_copy() to mark existing block groups as targets of copying.
Then, handle_ops_on_dev_replace() and dev-replace can check the flag to do their jobs.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/block-group.h |   1 +
 fs/btrfs/dev-replace.c | 182 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/dev-replace.h |   3 +
 fs/btrfs/scrub.c       |  17 ++++
 4 files changed, 203 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 19a22bf930c6..3dec66ed36cb 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -95,6 +95,7 @@ struct btrfs_block_group {
 	unsigned int iref:1;
 	unsigned int has_caching_ctl:1;
 	unsigned int removed:1;
+	unsigned int to_copy:1;

 	int disk_cache_state;

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index e77cb46bf15d..accbc3624b8c 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -22,6 +22,7 @@
 #include "dev-replace.h"
 #include "sysfs.h"
 #include "zoned.h"
+#include "block-group.h"

 /*
  * Device replace overview
@@ -462,6 +463,183 @@ static char* btrfs_dev_name(struct btrfs_device *device)
 	return rcu_str_deref(device->name);
 }

+static int mark_block_group_to_copy(struct btrfs_fs_info *fs_info,
+				    struct btrfs_device *src_dev)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_root *root = fs_info->dev_root;
+	struct btrfs_dev_extent *dev_extent = NULL;
+	struct btrfs_block_group *cache;
+	struct btrfs_trans_handle *trans;
+	int ret = 0;
+	u64 chunk_offset;
+
+	/* Do not use "to_copy" on non-ZONED for now */
+	if (!btrfs_is_zoned(fs_info))
+		return 0;
+
+	mutex_lock(&fs_info->chunk_mutex);
+
+	/* Ensure we don't have pending new block group */
+	spin_lock(&fs_info->trans_lock);
+	while (fs_info->running_transaction &&
+	       !list_empty(&fs_info->running_transaction->dev_update_list)) {
+		spin_unlock(&fs_info->trans_lock);
+		mutex_unlock(&fs_info->chunk_mutex);
+		trans = btrfs_attach_transaction(root);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+			mutex_lock(&fs_info->chunk_mutex);
+			if (ret == -ENOENT)
+				continue;
+			else
+				goto unlock;
+		}
+
+		ret = btrfs_commit_transaction(trans);
+		mutex_lock(&fs_info->chunk_mutex);
+		if (ret)
+			goto unlock;
+
+		spin_lock(&fs_info->trans_lock);
+	}
+	spin_unlock(&fs_info->trans_lock);
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	path->reada = READA_FORWARD;
+	path->search_commit_root = 1;
+	path->skip_locking = 1;
+
+	key.objectid = src_dev->devid;
+	key.offset = 0;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto free_path;
+	if (ret > 0) {
+		if (path->slots[0] >=
+		    btrfs_header_nritems(path->nodes[0])) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret < 0)
+				goto free_path;
+			if (ret > 0) {
+				ret = 0;
+				goto free_path;
+			}
+		} else {
+			ret = 0;
+		}
+	}
+
+	while (1) {
+		struct extent_buffer *l = path->nodes[0];
+		int slot = path->slots[0];
+
+		btrfs_item_key_to_cpu(l, &found_key, slot);
+
+		if (found_key.objectid != src_dev->devid)
+			break;
+
+		if (found_key.type != BTRFS_DEV_EXTENT_KEY)
+			break;
+
+		if (found_key.offset < key.offset)
+			break;
+
+		dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
+
+		chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);
+
+		cache = btrfs_lookup_block_group(fs_info, chunk_offset);
+		if (!cache)
+			goto skip;
+
+		spin_lock(&cache->lock);
+		cache->to_copy = 1;
+		spin_unlock(&cache->lock);
+
+		btrfs_put_block_group(cache);
+
+skip:
+		ret = btrfs_next_item(root, path);
+		if (ret != 0) {
+			if (ret > 0)
+				ret = 0;
+			break;
+		}
+	}
+
+free_path:
+	btrfs_free_path(path);
+unlock:
+	mutex_unlock(&fs_info->chunk_mutex);
+
+	return ret;
+}
+
+bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev,
+				      struct btrfs_block_group *cache,
+				      u64 physical)
+{
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	struct extent_map *em;
+	struct map_lookup *map;
+	u64 chunk_offset = cache->start;
+	int num_extents, cur_extent;
+	int i;
+
+	/* Do not use "to_copy" on non-ZONED for now */
+	if (!btrfs_is_zoned(fs_info))
+		return true;
+
+	spin_lock(&cache->lock);
+	if (cache->removed) {
+		spin_unlock(&cache->lock);
+		return true;
+	}
+	spin_unlock(&cache->lock);
+
+	em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
+	ASSERT(!IS_ERR(em));
+	map = em->map_lookup;
+
+	num_extents = cur_extent = 0;
+	for (i = 0; i < map->num_stripes; i++) {
+		/* We have more device extent to copy */
+		if (srcdev != map->stripes[i].dev)
+			continue;
+
+		num_extents++;
+		if (physical == map->stripes[i].physical)
+			cur_extent = i;
+	}
+
+	free_extent_map(em);
+
+	if (num_extents > 1 && cur_extent < num_extents - 1) {
+		/*
+		 * Has more stripes on this device. Keep this BG
+		 * readonly until we finish all the stripes.
+		 */
+		return false;
+	}
+
+	/* Last stripe on this device */
+	spin_lock(&cache->lock);
+	cache->to_copy = 0;
+	spin_unlock(&cache->lock);
+
+	return true;
+}
+
 static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 				   const char *tgtdev_name, u64 srcdevid,
 				   const char *srcdev_name, int read_src)
@@ -503,6 +681,10 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	if (ret)
 		return ret;

+	ret = mark_block_group_to_copy(fs_info, src_device);
+	if (ret)
+		return ret;
+
 	down_write(&dev_replace->rwsem);
 	switch (dev_replace->replace_state) {
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:

diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index 60b70dacc299..3911049a5f23 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -18,5 +18,8 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info);
 void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info);
 int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info);
 int __pure btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace);
+bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev,
+				      struct btrfs_block_group *cache,
+				      u64 physical);

 #endif

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3a0a6b8ed6f2..b57c1184f330 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3564,6 +3564,17 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		if (!cache)
 			goto skip;

+		if (sctx->is_dev_replace && btrfs_is_zoned(fs_info)) {
+			spin_lock(&cache->lock);
+			if (!cache->to_copy) {
+				spin_unlock(&cache->lock);
+				ro_set = 0;
+				goto done;
+			}
+			spin_unlock(&cache->lock);
+		}
+
 		/*
 		 * Make sure that while we are scrubbing the corresponding block
 		 * group doesn't get its logical address and its device extents
@@ -3695,6 +3706,12 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,

 		scrub_pause_off(fs_info);

+		if (sctx->is_dev_replace &&
+		    !btrfs_finish_block_group_to_copy(dev_replace->srcdev,
+						      cache, found_key.offset))
+			ro_set = 0;
+
+done:
 		down_write(&dev_replace->rwsem);
 		dev_replace->cursor_left = dev_replace->cursor_right;
 		dev_replace->item_needs_writeback = 1;

From patchwork Tue Dec 22 03:49:25 2020
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 32/40] btrfs: implement cloning for ZONED device-replace
Date: Tue, 22 Dec 2020 12:49:25 +0900
Message-Id: <25d52cf3e1e1fcc53625d1a6d925dfa46aedc547.1608608848.git.naohiro.aota@wdc.com>
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

This is the 2/4 patch to implement device-replace for ZONED mode.

In ZONED mode, a block group must be either copied (from the source device to the destination device) or cloned (to both devices). This commit implements the cloning part.

If a block group targeted by an I/O is marked to copy, we must not clone the I/O to the destination device, because the block group will eventually be copied by the replace process. This commit also clones device (zone) resets to the target device.
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/extent-tree.c | 57 +++++++++++++++++++++++++++++++-----------
 fs/btrfs/volumes.c     | 33 ++++++++++++++++++++++--
 fs/btrfs/zoned.c       | 11 ++++++++
 3 files changed, 84 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ac24a79ce32a..23d77e3196ca 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -35,6 +35,7 @@
 #include "discard.h"
 #include "rcu-string.h"
 #include "zoned.h"
+#include "dev-replace.h"

 #undef SCRAMBLE_DELAYED_REFS

@@ -1300,6 +1301,46 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 	return ret;
 }

+static int do_discard_extent(struct btrfs_bio_stripe *stripe, u64 *bytes)
+{
+	struct btrfs_device *dev = stripe->dev;
+	struct btrfs_fs_info *fs_info = dev->fs_info;
+	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+	u64 phys = stripe->physical;
+	u64 len = stripe->length;
+	u64 discarded = 0;
+	int ret = 0;
+
+	/* Zone reset in ZONED mode */
+	if (btrfs_can_zone_reset(dev, phys, len)) {
+		u64 src_disc;
+
+		ret = btrfs_reset_device_zone(dev, phys, len, &discarded);
+		if (ret)
+			goto out;
+
+		if (!btrfs_dev_replace_is_ongoing(dev_replace) ||
+		    dev != dev_replace->srcdev)
+			goto out;
+
+		src_disc = discarded;
+
+		/* send to replace target as well */
+		ret = btrfs_reset_device_zone(dev_replace->tgtdev, phys, len,
+					      &discarded);
+		discarded += src_disc;
+	} else if (blk_queue_discard(bdev_get_queue(stripe->dev->bdev))) {
+		ret = btrfs_issue_discard(dev->bdev, phys, len, &discarded);
+	} else {
+		ret = 0;
+		*bytes = 0;
+	}
+
+out:
+	*bytes = discarded;
+	return ret;
+}
+
 int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 			 u64 num_bytes, u64 *actual_bytes)
 {
@@ -1333,28 +1374,14 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 	stripe = bbio->stripes;
 	for (i = 0; i < bbio->num_stripes; i++, stripe++) {
-		struct btrfs_device *dev = stripe->dev;
-		u64 physical = stripe->physical;
-		u64 length = stripe->length;
 		u64 bytes;
-		struct request_queue *req_q;

 		if (!stripe->dev->bdev) {
 			ASSERT(btrfs_test_opt(fs_info, DEGRADED));
 			continue;
 		}

-		req_q = bdev_get_queue(stripe->dev->bdev);
-		/* Zone reset in ZONED mode */
-		if (btrfs_can_zone_reset(dev, physical, length))
-			ret = btrfs_reset_device_zone(dev, physical,
-						      length, &bytes);
-		else if (blk_queue_discard(req_q))
-			ret = btrfs_issue_discard(dev->bdev, physical,
-						  length, &bytes);
-		else
-			continue;
-
+		ret = do_discard_extent(stripe, &bytes);
 		if (!ret) {
 			discarded_bytes += bytes;
 		} else if (ret != -EOPNOTSUPP) {

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d2b86aa1fc72..f1cc8a421580 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5987,9 +5987,29 @@ static int get_extra_mirror_from_replace(struct btrfs_fs_info *fs_info,
 	return ret;
 }

+static bool is_block_group_to_copy(struct btrfs_fs_info *fs_info, u64 logical)
+{
+	struct btrfs_block_group *cache;
+	bool ret;
+
+	/* non-ZONED mode does not use "to_copy" flag */
+	if (!btrfs_is_zoned(fs_info))
+		return false;
+
+	cache = btrfs_lookup_block_group(fs_info, logical);
+
+	spin_lock(&cache->lock);
+	ret = cache->to_copy;
+	spin_unlock(&cache->lock);
+
+	btrfs_put_block_group(cache);
+	return ret;
+}
+
 static void handle_ops_on_dev_replace(enum btrfs_map_op op,
 				      struct btrfs_bio **bbio_ret,
 				      struct btrfs_dev_replace *dev_replace,
+				      u64 logical,
 				      int *num_stripes_ret, int *max_errors_ret)
 {
 	struct btrfs_bio *bbio = *bbio_ret;
@@ -6002,6 +6022,15 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op,
 	if (op == BTRFS_MAP_WRITE) {
 		int index_where_to_add;

+		/*
+		 * a block group which have "to_copy" set will
+		 * eventually copied by dev-replace process. We can
+		 * avoid cloning IO here.
+		 */
+		if (is_block_group_to_copy(dev_replace->srcdev->fs_info,
+					   logical))
+			return;
+
 		/*
 		 * duplicate the write operations while the dev replace
 		 * procedure is running. Since the copying of the old disk to
@@ -6397,8 +6426,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,

 	if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL &&
 	    need_full_stripe(op)) {
-		handle_ops_on_dev_replace(op, &bbio, dev_replace, &num_stripes,
-					  &max_errors);
+		handle_ops_on_dev_replace(op, &bbio, dev_replace, logical,
+					  &num_stripes, &max_errors);
 	}

 	*bbio_ret = bbio;

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 01f84b4c4224..7fc8c68f2981 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -11,6 +11,7 @@
 #include "disk-io.h"
 #include "block-group.h"
 #include "transaction.h"
+#include "dev-replace.h"

 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -1004,6 +1005,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	for (i = 0; i < map->num_stripes; i++) {
 		bool is_sequential;
 		struct blk_zone zone;
+		struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+		int dev_replace_is_ongoing = 0;

 		device = map->stripes[i].dev;
 		physical = map->stripes[i].physical;
@@ -1030,6 +1033,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		 */
 		btrfs_dev_clear_zone_empty(device, physical);

+		down_read(&dev_replace->rwsem);
+		dev_replace_is_ongoing =
+			btrfs_dev_replace_is_ongoing(dev_replace);
+		if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL)
+			btrfs_dev_clear_zone_empty(dev_replace->tgtdev,
+						   physical);
+		up_read(&dev_replace->rwsem);
+
 		/*
 		 * The group is mapped to a sequential zone. Get the zone write
 		 * pointer to determine the allocation offset within the zone.
From patchwork Tue Dec 22 03:49:26 2020

From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v11 33/40] btrfs: implement copying for ZONED device-replace
Date: Tue, 22 Dec 2020 12:49:26 +0900
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>

This is the 3/4 patch to implement device-replace on ZONED mode.

This commit implements the copying part, tracking the write pointer during the device-replace process. Since device-replace's copying is smart enough to copy only the used extents on the source device, we have to fill the gaps between them to honor the sequential write rule on the target device.

The device-replace process in ZONED mode must copy or clone all the extents in the source device exactly once. So, we need to ensure that allocations started just before the dev-replace process have their corresponding extent information in the B-trees. finish_extent_writes_for_zoned() implements that functionality, which is basically the code removed in commit 042528f8d840 ("Btrfs: fix block group remaining RO forever after error during device replace").
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/scrub.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.c | 12 +++++++
 fs/btrfs/zoned.h |  8 +++++
 3 files changed, 106 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index b57c1184f330..b03c3629fb12 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -166,6 +166,7 @@ struct scrub_ctx {
 	int			pages_per_rd_bio;

 	int			is_dev_replace;
+	u64			write_pointer;

 	struct scrub_bio	*wr_curr_bio;
 	struct mutex		wr_lock;
@@ -1619,6 +1620,25 @@ static int scrub_write_page_to_dev_replace(struct scrub_block *sblock,
 	return scrub_add_page_to_wr_bio(sblock->sctx, spage);
 }

+static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical)
+{
+	int ret = 0;
+	u64 length;
+
+	if (!btrfs_is_zoned(sctx->fs_info))
+		return 0;
+
+	if (sctx->write_pointer < physical) {
+		length = physical - sctx->write_pointer;
+
+		ret = btrfs_zoned_issue_zeroout(sctx->wr_tgtdev,
+						sctx->write_pointer, length);
+		if (!ret)
+			sctx->write_pointer = physical;
+	}
+	return ret;
+}
+
 static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
 				    struct scrub_page *spage)
 {
@@ -1641,6 +1661,13 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
 	if (sbio->page_count == 0) {
 		struct bio *bio;

+		ret = fill_writer_pointer_gap(sctx,
+					      spage->physical_for_dev_replace);
+		if (ret) {
+			mutex_unlock(&sctx->wr_lock);
+			return ret;
+		}
+
 		sbio->physical = spage->physical_for_dev_replace;
 		sbio->logical = spage->logical;
 		sbio->dev = sctx->wr_tgtdev;
@@ -1705,6 +1732,10 @@ static void scrub_wr_submit(struct scrub_ctx *sctx)
 	 * doubled the write performance on spinning disks when measured
 	 * with Linux 3.5 */
 	btrfsic_submit_bio(sbio->bio);
+
+	if (btrfs_is_zoned(sctx->fs_info))
+		sctx->write_pointer = sbio->physical +
+			sbio->page_count * PAGE_SIZE;
 }

 static void scrub_wr_bio_end_io(struct bio *bio)
@@ -3028,6 +3059,21 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 	return ret < 0 ? ret : 0;
 }

+static void sync_replace_for_zoned(struct scrub_ctx *sctx)
+{
+	if (!btrfs_is_zoned(sctx->fs_info))
+		return;
+
+	sctx->flush_all_writes = true;
+	scrub_submit(sctx);
+	mutex_lock(&sctx->wr_lock);
+	scrub_wr_submit(sctx);
+	mutex_unlock(&sctx->wr_lock);
+
+	wait_event(sctx->list_wait,
+		   atomic_read(&sctx->bios_in_flight) == 0);
+}
+
 static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 					   struct map_lookup *map,
 					   struct btrfs_device *scrub_dev,
@@ -3168,6 +3214,14 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 	 */
 	blk_start_plug(&plug);

+	if (sctx->is_dev_replace &&
+	    btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) {
+		mutex_lock(&sctx->wr_lock);
+		sctx->write_pointer = physical;
+		mutex_unlock(&sctx->wr_lock);
+		sctx->flush_all_writes = true;
+	}
+
 	/*
 	 * now find all extents for each stripe and scrub them
 	 */
@@ -3356,6 +3410,9 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 			if (ret)
 				goto out;

+			if (sctx->is_dev_replace)
+				sync_replace_for_zoned(sctx);
+
 			if (extent_logical + extent_len <
 			    key.objectid + bytes) {
 				if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
@@ -3478,6 +3535,25 @@ static noinline_for_stack int scrub_chunk(struct scrub_ctx *sctx,
 	return ret;
 }

+static int finish_extent_writes_for_zoned(struct btrfs_root *root,
+					  struct btrfs_block_group *cache)
+{
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	struct btrfs_trans_handle *trans;
+
+	if (!btrfs_is_zoned(fs_info))
+		return 0;
+
+	btrfs_wait_block_group_reservations(cache);
+	btrfs_wait_nocow_writers(cache);
+	btrfs_wait_ordered_roots(fs_info, U64_MAX, cache->start, cache->length);
+
+	trans = btrfs_join_transaction(root);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+	return btrfs_commit_transaction(trans);
+}
+
 static noinline_for_stack
 int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 			   struct btrfs_device *scrub_dev, u64 start, u64 end)
@@ -3633,6 +3709,16 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		 * group is not RO.
 		 */
 		ret = btrfs_inc_block_group_ro(cache, sctx->is_dev_replace);
+		if (!ret && sctx->is_dev_replace) {
+			ret = finish_extent_writes_for_zoned(root, cache);
+			if (ret) {
+				btrfs_dec_block_group_ro(cache);
+				scrub_pause_off(fs_info);
+				btrfs_put_block_group(cache);
+				break;
+			}
+		}
+
 		if (ret == 0) {
 			ro_set = 1;
 		} else if (ret == -ENOSPC && !sctx->is_dev_replace) {

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 7fc8c68f2981..2c7adfb43028 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1346,3 +1346,15 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache,
 	ASSERT(cache->meta_write_pointer == eb->start + eb->len);
 	cache->meta_write_pointer = eb->start;
 }
+
+int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical,
+			      u64 length)
+{
+	if (!btrfs_dev_is_sequential(device, physical))
+		return -EOPNOTSUPP;
+
+	return blkdev_issue_zeroout(device->bdev,
+				    physical >> SECTOR_SHIFT,
+				    length >> SECTOR_SHIFT,
+				    GFP_NOFS, 0);
+}

diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index a42e120158ab..a9698470c08e 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -55,6 +55,8 @@ bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
 				    struct btrfs_block_group **cache_ret);
 void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache,
 				     struct extent_buffer *eb);
+int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical,
+			      u64 length);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -169,6 +171,12 @@ static inline void btrfs_revert_meta_write_pointer(
 {
 }

+static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device,
+					    u64 physical, u64 length)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif

 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Tue Dec 22 03:49:27 2020
X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11985705 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. 
Wong" , Naohiro Aota Subject: [PATCH v11 34/40] btrfs: support dev-replace in ZONED mode Date: Tue, 22 Dec 2020 12:49:27 +0900 Message-Id: <019923ef8aca1d3d8ccddb439e397df35cfe02a7.1608608848.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com> References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This is the 4/4 patch to implement device-replace on ZONED mode. Even after the copying is done, the write pointers of the source device and the destination device may not be synchronized. For example, when the last allocated extent is freed before the device-replace process, the extent is not copied, leaving a hole there. This patch synchronizes the write pointers by writing zeros to the destination device. Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/scrub.c | 39 +++++++++++++++++++++++++++ fs/btrfs/zoned.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 9 +++++++ 3 files changed, 117 insertions(+) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index b03c3629fb12..2f577f3b1c31 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1628,6 +1628,9 @@ static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) if (!btrfs_is_zoned(sctx->fs_info)) return 0; + if (!btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) + return 0; + if (sctx->write_pointer < physical) { length = physical - sctx->write_pointer; @@ -3074,6 +3077,31 @@ static void sync_replace_for_zoned(struct scrub_ctx *sctx) atomic_read(&sctx->bios_in_flight) == 0); } +static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical, + u64 physical, u64 physical_end) +{ + struct btrfs_fs_info *fs_info = sctx->fs_info; + int ret = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + 
wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); + + mutex_lock(&sctx->wr_lock); + if (sctx->write_pointer < physical_end) { + ret = btrfs_sync_zone_write_pointer(sctx->wr_tgtdev, logical, + physical, + sctx->write_pointer); + if (ret) + btrfs_err(fs_info, "failed to recover write pointer"); + } + mutex_unlock(&sctx->wr_lock); + btrfs_dev_clear_zone_empty(sctx->wr_tgtdev, physical); + + return ret; +} + static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, struct map_lookup *map, struct btrfs_device *scrub_dev, @@ -3480,6 +3508,17 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, blk_finish_plug(&plug); btrfs_free_path(path); btrfs_free_path(ppath); + + if (sctx->is_dev_replace && ret >= 0) { + int ret2; + + ret2 = sync_write_pointer_for_zoned(sctx, base + offset, + map->stripes[num].physical, + physical_end); + if (ret2) + ret = ret2; + } + return ret < 0 ? ret : 0; } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 2c7adfb43028..9ecd636596aa 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -12,6 +12,7 @@ #include "block-group.h" #include "transaction.h" #include "dev-replace.h" +#include "space-info.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -1358,3 +1359,71 @@ int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, length >> SECTOR_SHIFT, GFP_NOFS, 0); } + +static int read_zone_info(struct btrfs_fs_info *fs_info, u64 logical, + struct blk_zone *zone) +{ + struct btrfs_bio *bbio = NULL; + u64 mapped_length = PAGE_SIZE; + unsigned int nofs_flag; + int nmirrors; + int i, ret; + + ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical, + &mapped_length, &bbio); + if (ret || !bbio || mapped_length < PAGE_SIZE) { + btrfs_put_bbio(bbio); + return -EIO; + } + + if (bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) + return -EINVAL; + + nofs_flag = memalloc_nofs_save(); + nmirrors = (int)bbio->num_stripes; + 
for (i = 0; i < nmirrors; i++) { + u64 physical = bbio->stripes[i].physical; + struct btrfs_device *dev = bbio->stripes[i].dev; + + /* Missing device */ + if (!dev->bdev) + continue; + + ret = btrfs_get_dev_zone(dev, physical, zone); + /* Failing device */ + if (ret == -EIO || ret == -EOPNOTSUPP) + continue; + break; + } + memalloc_nofs_restore(nofs_flag); + + return ret; +} + +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos) +{ + struct btrfs_fs_info *fs_info = tgt_dev->fs_info; + struct blk_zone zone; + u64 length; + u64 wp; + int ret; + + if (!btrfs_dev_is_sequential(tgt_dev, physical_pos)) + return 0; + + ret = read_zone_info(fs_info, logical, &zone); + if (ret) + return ret; + + wp = physical_start + ((zone.wp - zone.start) << SECTOR_SHIFT); + + if (physical_pos == wp) + return 0; + + if (physical_pos > wp) + return -EUCLEAN; + + length = wp - physical_pos; + return btrfs_zoned_issue_zeroout(tgt_dev, physical_pos, length); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index a9698470c08e..8c203c0425e0 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -57,6 +57,8 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, struct extent_buffer *eb); int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length); +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -177,6 +179,13 @@ static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device, return -EOPNOTSUPP; } +static inline int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, + u64 logical, u64 physical_start, + u64 physical_pos) +{ + return -EOPNOTSUPP; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Tue Dec 22 
03:49:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11985707 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. 
Wong" , Naohiro Aota Subject: [PATCH v11 35/40] btrfs: enable relocation in ZONED mode Date: Tue, 22 Dec 2020 12:49:28 +0900 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com> References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org To serialize allocation and submit_bio, we introduced a mutex around them. As a result, preallocation must be completely disabled to avoid a deadlock. Since the current relocation process relies on preallocation to move file data extents, it must be handled in another way. In ZONED mode, we just truncate the inode to the size that we wanted to pre-allocate. Then, we flush dirty pages on the file before finishing the relocation process. run_delalloc_zoned() will handle all the allocation and submit IOs to the underlying layers. Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/relocation.c | 35 +++++++++++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 30a80669647f..94c72bea6a43 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2553,6 +2553,32 @@ static noinline_for_stack int prealloc_file_extent_cluster( if (ret) return ret; + /* + * In ZONED mode, we cannot preallocate the file region. Instead, we + * dirty and fiemap_write the region. 
+ */ + + if (btrfs_is_zoned(inode->root->fs_info)) { + struct btrfs_root *root = inode->root; + struct btrfs_trans_handle *trans; + + end = cluster->end - offset + 1; + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + inode->vfs_inode.i_ctime = current_time(&inode->vfs_inode); + i_size_write(&inode->vfs_inode, end); + ret = btrfs_update_inode(trans, root, inode); + if (ret) { + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + return ret; + } + + return btrfs_end_transaction(trans); + } + inode_lock(&inode->vfs_inode); for (nr = 0; nr < cluster->nr; nr++) { start = cluster->boundary[nr] - offset; @@ -2749,6 +2775,8 @@ static int relocate_file_extent_cluster(struct inode *inode, } } WARN_ON(nr != cluster->nr); + if (btrfs_is_zoned(fs_info) && !ret) + ret = btrfs_wait_ordered_range(inode, 0, (u64)-1); out: kfree(ra); return ret; @@ -3384,8 +3412,12 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, struct btrfs_path *path; struct btrfs_inode_item *item; struct extent_buffer *leaf; + u64 flags = BTRFS_INODE_NOCOMPRESS | BTRFS_INODE_PREALLOC; int ret; + if (btrfs_is_zoned(trans->fs_info)) + flags &= ~BTRFS_INODE_PREALLOC; + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -3400,8 +3432,7 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, btrfs_set_inode_generation(leaf, item, 1); btrfs_set_inode_size(leaf, item, 0); btrfs_set_inode_mode(leaf, item, S_IFREG | 0600); - btrfs_set_inode_flags(leaf, item, BTRFS_INODE_NOCOMPRESS | - BTRFS_INODE_PREALLOC); + btrfs_set_inode_flags(leaf, item, flags); btrfs_mark_buffer_dirty(leaf); out: btrfs_free_path(path); From patchwork Tue Dec 22 03:49:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11985709 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v11 36/40] btrfs: relocate block group to repair IO failure in ZONED Date: Tue, 22 Dec 2020 12:49:29 +0900 Message-Id: <94f8a1c18dea94ae9e1231d7e0e078a0d6546a6e.1608608848.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com> References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org When btrfs finds a checksum error and the file system has a mirror of the damaged data, btrfs reads the correct data from the mirror and writes it to the damaged blocks. 
This repairing, however, violates the sequential write rule. We can consider three methods to repair an IO failure in ZONED mode:
(1) Reset and rewrite the damaged zone
(2) Allocate a new device extent and replace the damaged device extent with the new one
(3) Relocate the corresponding block group
Method (1) is most similar to the behavior on regular devices. However, it also wipes non-damaged data in the same device extent, so it unnecessarily degrades non-damaged data. Method (2) is much like device replacing but done within the same device. It is safe because it keeps the device extent until the replacing finishes. However, extending device replacing is non-trivial: it assumes "src_dev->physical == dst_dev->physical", and the extent mapping replacing function would need to be extended to support replacing a device extent position within one device. Method (3) invokes relocation of the damaged block group, so it is straightforward to implement. It relocates all the mirrored device extents, so it is potentially a more costly operation than method (1) or (2). But it relocates only the used extents, which reduces the total IO size. Let's apply method (3) for now. In the future, we can extend device-replace and apply method (2). To protect a block group from being relocated multiple times by multiple IO errors, this commit introduces a "relocating_repair" bit to indicate that the block group is currently being relocated to repair IO failures. It also uses a new kthread, "btrfs-relocating-repair", so as not to block the IO path with the relocation process. This commit also supports repairing in the scrub process. 
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/block-group.h | 1 + fs/btrfs/extent_io.c | 3 ++ fs/btrfs/scrub.c | 3 ++ fs/btrfs/volumes.c | 71 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.h | 1 + 5 files changed, 79 insertions(+) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 3dec66ed36cb..36654bcd2a83 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -96,6 +96,7 @@ struct btrfs_block_group { unsigned int has_caching_ctl:1; unsigned int removed:1; unsigned int to_copy:1; + unsigned int relocating_repair:1; int disk_cache_state; diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 80e5352d8d2c..202f4c3196ed 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2260,6 +2260,9 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start, ASSERT(!(fs_info->sb->s_flags & SB_RDONLY)); BUG_ON(!mirror_num); + if (btrfs_is_zoned(fs_info)) + return btrfs_repair_one_zone(fs_info, logical); + bio = btrfs_io_bio_alloc(1); bio->bi_iter.bi_size = 0; map_length = length; diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 2f577f3b1c31..d0c47ef72d46 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -857,6 +857,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) have_csum = sblock_to_check->pagev[0]->have_csum; dev = sblock_to_check->pagev[0]->dev; + if (btrfs_is_zoned(fs_info) && !sctx->is_dev_replace) + return btrfs_repair_one_zone(fs_info, logical); + /* * We must use GFP_NOFS because the scrub task might be waiting for a * worker task executing this function and in turn a transaction commit diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f1cc8a421580..46be2a3e616c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -8004,3 +8004,74 @@ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr) spin_unlock(&fs_info->swapfile_pins_lock); return node != NULL; } + +static int relocating_repair_kthread(void 
*data) +{ + struct btrfs_block_group *cache = (struct btrfs_block_group *) data; + struct btrfs_fs_info *fs_info = cache->fs_info; + u64 target; + int ret = 0; + + target = cache->start; + btrfs_put_block_group(cache); + + if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) { + btrfs_info(fs_info, + "zoned: skip relocating block group %llu to repair: EBUSY", + target); + return -EBUSY; + } + + mutex_lock(&fs_info->delete_unused_bgs_mutex); + + /* Ensure Block Group still exists */ + cache = btrfs_lookup_block_group(fs_info, target); + if (!cache) + goto out; + + if (!cache->relocating_repair) + goto out; + + ret = btrfs_may_alloc_data_chunk(fs_info, target); + if (ret < 0) + goto out; + + btrfs_info(fs_info, "zoned: relocating block group %llu to repair IO failure", + target); + ret = btrfs_relocate_chunk(fs_info, target); + +out: + if (cache) + btrfs_put_block_group(cache); + mutex_unlock(&fs_info->delete_unused_bgs_mutex); + btrfs_exclop_finish(fs_info); + + return ret; +} + +int btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical) +{ + struct btrfs_block_group *cache; + + /* Do not attempt to repair in degraded state */ + if (btrfs_test_opt(fs_info, DEGRADED)) + return 0; + + cache = btrfs_lookup_block_group(fs_info, logical); + if (!cache) + return 0; + + spin_lock(&cache->lock); + if (cache->relocating_repair) { + spin_unlock(&cache->lock); + btrfs_put_block_group(cache); + return 0; + } + cache->relocating_repair = 1; + spin_unlock(&cache->lock); + + kthread_run(relocating_repair_kthread, cache, + "btrfs-relocating-repair"); + + return 0; +} diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 4e34830f3e78..9a8ee1a850c8 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -600,5 +600,6 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, int btrfs_bg_type_to_factor(u64 flags); const char *btrfs_bg_type_to_raid_name(u64 flags); int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info); +int 
btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical); #endif From patchwork Tue Dec 22 03:49:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11985711 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. 
Wong" , Naohiro Aota , Johannes Thumshirn Subject: [PATCH v11 37/40] btrfs: split alloc_log_tree() Date: Tue, 22 Dec 2020 12:49:30 +0900 Message-Id: <24e3b5fbc3897a7ab6881750a8ac28d70d91595d.1608608848.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com> References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This is a preparation for the next patch. This commit splits alloc_log_tree() into the tree structure allocation part (which remains in alloc_log_tree()) and the tree node allocation part (moved into btrfs_alloc_log_tree_node()). The latter part is also exported to be used in the next patch. Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/disk-io.c | 33 +++++++++++++++++++++++++------ fs/btrfs/disk-io.h | 2 ++ 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index efcf1a343732..dc0ddd097c6e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1197,7 +1197,6 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { struct btrfs_root *root; - struct extent_buffer *leaf; root = btrfs_alloc_root(fs_info, BTRFS_TREE_LOG_OBJECTID, GFP_NOFS); if (!root) @@ -1207,6 +1206,14 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, root->root_key.type = BTRFS_ROOT_ITEM_KEY; root->root_key.offset = BTRFS_TREE_LOG_OBJECTID; + return root; +} + +int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, + struct btrfs_root *root) +{ + struct extent_buffer *leaf; + /* * DON'T set SHAREABLE bit for log trees. 
* @@ -1219,26 +1226,33 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, leaf = btrfs_alloc_tree_block(trans, root, 0, BTRFS_TREE_LOG_OBJECTID, NULL, 0, 0, 0, BTRFS_NESTING_NORMAL); - if (IS_ERR(leaf)) { - btrfs_put_root(root); - return ERR_CAST(leaf); - } + if (IS_ERR(leaf)) + return PTR_ERR(leaf); root->node = leaf; btrfs_mark_buffer_dirty(root->node); btrfs_tree_unlock(root->node); - return root; + + return 0; } int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { struct btrfs_root *log_root; + int ret; log_root = alloc_log_tree(trans, fs_info); if (IS_ERR(log_root)) return PTR_ERR(log_root); + + ret = btrfs_alloc_log_tree_node(trans, log_root); + if (ret) { + btrfs_put_root(log_root); + return ret; + } + WARN_ON(fs_info->log_root_tree); fs_info->log_root_tree = log_root; return 0; @@ -1250,11 +1264,18 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *log_root; struct btrfs_inode_item *inode_item; + int ret; log_root = alloc_log_tree(trans, fs_info); if (IS_ERR(log_root)) return PTR_ERR(log_root); + ret = btrfs_alloc_log_tree_node(trans, log_root); + if (ret) { + btrfs_put_root(log_root); + return ret; + } + log_root->last_trans = trans->transid; log_root->root_key.offset = root->root_key.objectid; diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 9f4a2a1e3d36..0e7e9526b6a8 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -120,6 +120,8 @@ blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio, extent_submit_bio_start_t *submit_bio_start); blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio, int mirror_num); +int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, + struct btrfs_root *root); int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int btrfs_add_log_tree(struct btrfs_trans_handle *trans, From patchwork 
Tue Dec 22 03:49:31 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985713
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Naohiro Aota, Johannes Thumshirn
Subject: [PATCH v11 38/40] btrfs: extend zoned allocator to use dedicated tree-log block group
Date: Tue, 22 Dec 2020 12:49:31 +0900
Message-Id: <920bc20e9b4b1bed3802c3dca6f9fa3c72850804.1608608848.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

This is the first of three patches to enable tree-log on ZONED mode.

The tree-log feature does not work on ZONED mode as is. Blocks for a
tree-log tree are allocated mixed with other metadata blocks, and
btrfs writes and syncs the tree-log blocks to devices at fsync() time,
which has different timing than a global transaction commit. As a
result, both writing tree-log blocks and writing other metadata blocks
become non-sequential writes, which ZONED mode must avoid.

We can introduce a dedicated block group for tree-log blocks, so that
tree-log blocks and other metadata blocks go into separate write
streams. Each write stream can then be written to the devices
separately. "fs_info->treelog_bg" tracks the dedicated block group,
and btrfs assigns "treelog_bg" on demand at tree-log block allocation
time.

This commit extends the zoned block allocator to use the block group.
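[Editor's note] The admission check this patch adds to do_allocation_zoned() can be modeled in plain userspace C. This is a sketch only: `must_skip` and the bare `treelog_bg` variable are illustrative stand-ins for the locked kernel state, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the skip condition: "treelog_bg" stands in for
 * fs_info->treelog_bg. Zero means no block group has been dedicated
 * to the tree-log yet; otherwise it holds the start bytenr of the
 * dedicated block group.
 */
static uint64_t treelog_bg;

/*
 * Mirrors the patch's condition:
 *   skip = log_bytenr && ((for_treelog && bytenr != log_bytenr) ||
 *                         (!for_treelog && bytenr == log_bytenr));
 */
static bool must_skip(uint64_t bytenr, bool for_treelog)
{
	uint64_t log_bytenr = treelog_bg;

	if (!log_bytenr)
		return false;		/* nothing dedicated yet: anyone may try */
	if (for_treelog)
		return bytenr != log_bytenr;	/* tree-log sticks to its group */
	return bytenr == log_bytenr;		/* others stay out of it */
}
```

Once a group is dedicated, tree-log allocations are confined to it and all other allocations are excluded from it, which is exactly how the two write streams are kept apart.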
Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/block-group.c |  7 ++++
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/disk-io.c     |  1 +
 fs/btrfs/extent-tree.c | 79 +++++++++++++++++++++++++++++++++++++++---
 4 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 5b477617021f..ffe8cf5818fd 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -902,6 +902,13 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	btrfs_return_cluster_to_free_space(block_group, cluster);
 	spin_unlock(&cluster->refill_lock);
 
+	if (btrfs_is_zoned(fs_info)) {
+		spin_lock(&fs_info->treelog_bg_lock);
+		if (fs_info->treelog_bg == block_group->start)
+			fs_info->treelog_bg = 0;
+		spin_unlock(&fs_info->treelog_bg_lock);
+	}
+
 	path = btrfs_alloc_path();
 	if (!path) {
 		ret = -ENOMEM;

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1085f8d9752b..b4485ea90805 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -977,6 +977,8 @@ struct btrfs_fs_info {
 	/* Max size to emit ZONE_APPEND write command */
 	u64 max_zone_append_size;
 	struct mutex zoned_meta_io_lock;
+	spinlock_t treelog_bg_lock;
+	u64 treelog_bg;
 
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dc0ddd097c6e..12c23cb410fd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2722,6 +2722,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	spin_lock_init(&fs_info->super_lock);
 	spin_lock_init(&fs_info->buffer_lock);
 	spin_lock_init(&fs_info->unused_bgs_lock);
+	spin_lock_init(&fs_info->treelog_bg_lock);
 	rwlock_init(&fs_info->tree_mod_log_lock);
 	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	mutex_init(&fs_info->delete_unused_bgs_mutex);

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 23d77e3196ca..e11ad53c6734 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3590,6 +3590,9 @@ struct
find_free_extent_ctl {
 	bool have_caching_bg;
 	bool orig_have_caching_bg;
 
+	/* Allocation is called for tree-log */
+	bool for_treelog;
+
 	/* RAID index, converted from flags */
 	int index;
 
@@ -3818,6 +3821,22 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group,
 	return find_free_extent_unclustered(block_group, ffe_ctl);
 }
 
+/*
+ * Tree-log Block Group Locking
+ * ============================
+ *
+ * fs_info::treelog_bg_lock protects the fs_info::treelog_bg which
+ * indicates the starting address of a block group, which is reserved only
+ * for tree-log metadata.
+ *
+ * Lock nesting
+ * ============
+ *
+ * space_info::lock
+ *   block_group::lock
+ *     fs_info::treelog_bg_lock
+ */
+
 /*
  * Simple allocator for sequential only block group. It only allows
  * sequential allocation. No need to play with trees. This function
@@ -3827,23 +3846,54 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 			       struct find_free_extent_ctl *ffe_ctl,
 			       struct btrfs_block_group **bg_ret)
 {
+	struct btrfs_fs_info *fs_info = block_group->fs_info;
 	struct btrfs_space_info *space_info = block_group->space_info;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	u64 start = block_group->start;
 	u64 num_bytes = ffe_ctl->num_bytes;
 	u64 avail;
+	u64 bytenr = block_group->start;
+	u64 log_bytenr;
 	int ret = 0;
+	bool skip;
 
 	ASSERT(btrfs_is_zoned(block_group->fs_info));
 
+	/*
+	 * Do not allow non-tree-log blocks in the dedicated tree-log block
+	 * group, and vice versa.
+	 */
+	spin_lock(&fs_info->treelog_bg_lock);
+	log_bytenr = fs_info->treelog_bg;
+	skip = log_bytenr && ((ffe_ctl->for_treelog && bytenr != log_bytenr) ||
+			      (!ffe_ctl->for_treelog && bytenr == log_bytenr));
+	spin_unlock(&fs_info->treelog_bg_lock);
+	if (skip)
+		return 1;
+
 	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
+	spin_lock(&fs_info->treelog_bg_lock);
+
+	ASSERT(!ffe_ctl->for_treelog ||
+	       block_group->start == fs_info->treelog_bg ||
+	       fs_info->treelog_bg == 0);
 
 	if (block_group->ro) {
 		ret = 1;
 		goto out;
 	}
 
+	/*
+	 * Do not allow currently using block group to be tree-log dedicated
+	 * block group.
+	 */
+	if (ffe_ctl->for_treelog && !fs_info->treelog_bg &&
+	    (block_group->used || block_group->reserved)) {
+		ret = 1;
+		goto out;
+	}
+
 	avail = block_group->length - block_group->alloc_offset;
 	if (avail < num_bytes) {
 		ffe_ctl->max_extent_size = avail;
@@ -3851,6 +3901,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 		goto out;
 	}
 
+	if (ffe_ctl->for_treelog && !fs_info->treelog_bg)
+		fs_info->treelog_bg = block_group->start;
+
 	ffe_ctl->found_offset = start + block_group->alloc_offset;
 	block_group->alloc_offset += num_bytes;
 	spin_lock(&ctl->tree_lock);
@@ -3865,6 +3918,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	ffe_ctl->search_start = ffe_ctl->found_offset;
 
 out:
+	if (ret && ffe_ctl->for_treelog)
+		fs_info->treelog_bg = 0;
+	spin_unlock(&fs_info->treelog_bg_lock);
 	spin_unlock(&block_group->lock);
 	spin_unlock(&space_info->lock);
 	return ret;
@@ -4114,7 +4170,12 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info,
 		return prepare_allocation_clustered(fs_info, ffe_ctl,
 						    space_info, ins);
 	case BTRFS_EXTENT_ALLOC_ZONED:
-		/* nothing to do */
+		if (ffe_ctl->for_treelog) {
+			spin_lock(&fs_info->treelog_bg_lock);
+			if (fs_info->treelog_bg)
+				ffe_ctl->hint_byte = fs_info->treelog_bg;
+			spin_unlock(&fs_info->treelog_bg_lock);
+		}
 		return 0;
 	default:
 		BUG();
@@ -4158,6 +4219,7 @@ static noinline int
find_free_extent(struct btrfs_root *root,
 	struct find_free_extent_ctl ffe_ctl = {0};
 	struct btrfs_space_info *space_info;
 	bool full_search = false;
+	bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID;
 
 	WARN_ON(num_bytes < fs_info->sectorsize);
@@ -4171,6 +4233,7 @@ static noinline int find_free_extent(struct btrfs_root *root,
 	ffe_ctl.orig_have_caching_bg = false;
 	ffe_ctl.found_offset = 0;
 	ffe_ctl.hint_byte = hint_byte_orig;
+	ffe_ctl.for_treelog = for_treelog;
 	ffe_ctl.policy = BTRFS_EXTENT_ALLOC_CLUSTERED;
 
 	/* For clustered allocation */
@@ -4245,8 +4308,15 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		struct btrfs_block_group *bg_ret;
 
 		/* If the block group is read-only, we can skip it entirely. */
-		if (unlikely(block_group->ro))
+		if (unlikely(block_group->ro)) {
+			if (btrfs_is_zoned(fs_info) && for_treelog) {
+				spin_lock(&fs_info->treelog_bg_lock);
+				if (block_group->start == fs_info->treelog_bg)
+					fs_info->treelog_bg = 0;
+				spin_unlock(&fs_info->treelog_bg_lock);
+			}
 			continue;
+		}
 
 		btrfs_grab_block_group(block_group, delalloc);
 		ffe_ctl.search_start = block_group->start;
@@ -4434,6 +4504,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 	bool final_tried = num_bytes == min_alloc_size;
 	u64 flags;
 	int ret;
+	bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID;
 
 	flags = get_alloc_profile_by_root(root, is_data);
 again:
@@ -4457,8 +4528,8 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 			sinfo = btrfs_find_space_info(fs_info, flags);
 			btrfs_err(fs_info,
-				  "allocation failed flags %llu, wanted %llu",
-				  flags, num_bytes);
+				  "allocation failed flags %llu, wanted %llu treelog %d",
+				  flags, num_bytes, for_treelog);
 			if (sinfo)
 				btrfs_dump_space_info(fs_info, sinfo,
 						      num_bytes, 1);

From patchwork Tue Dec 22 03:49:32 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985715
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Naohiro Aota
Subject: [PATCH v11 39/40] btrfs: serialize log transaction on ZONED mode
Date: Tue, 22 Dec 2020 12:49:32 +0900
Message-Id: <39b1c016d74422b9dcac01ba6e33d3ccd8000889.1608608848.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

This is the second of three patches to enable tree-log on ZONED mode.

Since btrfs can start more than one log transaction per subvolume
simultaneously, nodes from multiple transactions can be allocated
interleaved. Such mixed allocation results in non-sequential writes at
log transaction commit time. The nodes of the global log root tree
(fs_info->log_root_tree) have the same mixed allocation problem.

This patch serializes log transactions by waiting for a committing
transaction when someone tries to start a new transaction, to avoid
the mixed allocation problem. We must also wait for running log
transactions from other subvolumes, but there is no easy way to detect
which subvolume root is running a log transaction. So, this patch
forbids starting a new log transaction when another subvolume has
already allocated the global log root tree.
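[Editor's note] The zoned-mode gate described above can be reduced to a single-threaded userspace sketch. The struct and function names below are hypothetical stand-ins for the kernel state, and the wait/retry path is collapsed into the -EAGAIN case only:

```c
#include <stdbool.h>

#define MODEL_EAGAIN 11

/*
 * Model of the check this patch adds to start_log_trans() on zoned
 * filesystems: a subvolume may only set up a new log tree if no other
 * subvolume has already allocated the global log root tree.
 */
struct model_fs_info {
	bool log_root_tree;	/* fs_info->log_root_tree already set up? */
};

struct model_root {
	bool log_root;		/* this subvolume's log tree exists? */
};

static int start_log_trans_zoned(struct model_fs_info *fs_info,
				 struct model_root *root)
{
	if (root->log_root)
		return 0;	/* continue this subvolume's existing log */
	if (fs_info->log_root_tree)
		return -MODEL_EAGAIN;	/* another subvolume got there first */
	fs_info->log_root_tree = true;	/* btrfs_init_log_root_tree() */
	root->log_root = true;		/* btrfs_add_log_tree() */
	return 0;
}
```

In the real patch the -EAGAIN forces the caller into a full transaction commit, which is how the interleaved allocation of log nodes from different subvolumes is avoided.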
Signed-off-by: Naohiro Aota
---
 fs/btrfs/tree-log.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 930e752686b4..d269c9ea8706 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -105,6 +105,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
 				       struct btrfs_root *log,
 				       struct btrfs_path *path,
 				       u64 dirid, int del_all);
+static void wait_log_commit(struct btrfs_root *root, int transid);
 
 /*
  * tree logging is a special write ahead log used to make sure that
@@ -140,6 +141,7 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_root *tree_root = fs_info->tree_root;
+	const bool zoned = btrfs_is_zoned(fs_info);
 	int ret = 0;
 
 	/*
@@ -160,12 +162,20 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
 
 	mutex_lock(&root->log_mutex);
 
+again:
 	if (root->log_root) {
+		int index = (root->log_transid + 1) % 2;
+
 		if (btrfs_need_log_full_commit(trans)) {
 			ret = -EAGAIN;
 			goto out;
 		}
 
+		if (zoned && atomic_read(&root->log_commit[index])) {
+			wait_log_commit(root, root->log_transid - 1);
+			goto again;
+		}
+
 		if (!root->log_start_pid) {
 			clear_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state);
 			root->log_start_pid = current->pid;
@@ -173,6 +183,15 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
 			set_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state);
 		}
 	} else {
+		mutex_lock(&fs_info->tree_log_mutex);
+		if (zoned && fs_info->log_root_tree)
+			ret = -EAGAIN;
+		else if (!fs_info->log_root_tree)
+			ret = btrfs_init_log_root_tree(trans, fs_info);
+		mutex_unlock(&fs_info->tree_log_mutex);
+		if (ret)
+			goto out;
+
 		ret = btrfs_add_log_tree(trans, root);
 		if (ret)
 			goto out;
@@ -201,14 +220,22 @@
  */
 static int join_running_log_trans(struct btrfs_root *root)
 {
+	const bool zoned = btrfs_is_zoned(root->fs_info);
 	int ret = -ENOENT;
 
 	if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &root->state))
 		return ret;
 
 	mutex_lock(&root->log_mutex);
+again:
 	if (root->log_root) {
+		int index = (root->log_transid + 1) % 2;
+
 		ret = 0;
+		if (zoned && atomic_read(&root->log_commit[index])) {
+			wait_log_commit(root, root->log_transid - 1);
+			goto again;
+		}
 		atomic_inc(&root->log_writers);
 	}
 	mutex_unlock(&root->log_mutex);

From patchwork Tue Dec 22 03:49:33 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11985717
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Naohiro Aota, Josef Bacik, Johannes Thumshirn
Subject: [PATCH v11 40/40] btrfs: reorder log node allocation
Date: Tue, 22 Dec 2020 12:49:33 +0900
Message-Id:
X-Mailer: git-send-email 2.27.0
In-Reply-To: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
References: <06add214bc16ef08214de1594ecdfcc4cdcdbd78.1608608848.git.naohiro.aota@wdc.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

This is the third of three patches to enable tree-log on ZONED mode.

The allocation order of the nodes of "fs_info->log_root_tree" and of
the nodes of "root->log_root" is not the same as their writing order,
so the writes cause unaligned write errors. This patch reorders the
allocations by delaying the allocation of the root node of
"fs_info->log_root_tree", so that the node buffers can go out to the
devices sequentially.

Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c  |  7 -------
 fs/btrfs/tree-log.c | 24 ++++++++++++++++++------
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 12c23cb410fd..0b403affa59c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1241,18 +1241,11 @@ int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
 			     struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *log_root;
-	int ret;
 
 	log_root = alloc_log_tree(trans, fs_info);
 	if (IS_ERR(log_root))
 		return PTR_ERR(log_root);
 
-	ret = btrfs_alloc_log_tree_node(trans, log_root);
-	if (ret) {
-		btrfs_put_root(log_root);
-		return ret;
-	}
-
 	WARN_ON(fs_info->log_root_tree);
 	fs_info->log_root_tree = log_root;
 	return 0;

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index d269c9ea8706..8f917cb91151 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3157,6 +3157,16 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 	list_add_tail(&root_log_ctx.list, &log_root_tree->log_ctxs[index2]);
 	root_log_ctx.log_transid = log_root_tree->log_transid;
 
+	mutex_lock(&fs_info->tree_log_mutex);
+	if (!log_root_tree->node) {
+		ret = btrfs_alloc_log_tree_node(trans, log_root_tree);
+		if (ret) {
+			mutex_unlock(&fs_info->tree_log_mutex);
+			goto out;
+		}
+	}
+	mutex_unlock(&fs_info->tree_log_mutex);
+
 	/*
 	 * Now we are safe to update the log_root_tree because we're under the
 	 * log_mutex, and we're a current writer so we're holding the commit
@@ -3315,12 +3325,14 @@ static void free_log_tree(struct btrfs_trans_handle *trans,
 		.process_func = process_one_buffer
 	};
 
-	ret = walk_log_tree(trans, log, &wc);
-	if (ret) {
-		if (trans)
-			btrfs_abort_transaction(trans, ret);
-		else
-			btrfs_handle_fs_error(log->fs_info, ret, NULL);
+	if (log->node) {
+		ret = walk_log_tree(trans, log, &wc);
+		if (ret) {
+			if (trans)
+				btrfs_abort_transaction(trans, ret);
+			else
+				btrfs_handle_fs_error(log->fs_info, ret, NULL);
+		}
 	}
 
 	clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1,
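[Editor's note] The reordering in this last patch boils down to lazy node allocation: btrfs_init_log_root_tree() no longer allocates a node, and btrfs_sync_log() allocates one on first use, so node buffers go out in write order. A minimal userspace sketch of that pattern, with all names hypothetical:

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Lazily allocated root node, mirroring the hunk added to
 * btrfs_sync_log(): the global log root tree starts without a node
 * and gets one only when the first log commit needs it.
 */
struct model_log_root {
	void *node;
};

static struct model_log_root *model_init_log_root_tree(void)
{
	/* After this patch, no node allocation happens here. */
	return calloc(1, sizeof(struct model_log_root));
}

static int model_sync_log(struct model_log_root *log_root_tree)
{
	if (!log_root_tree->node) {
		/* stands in for btrfs_alloc_log_tree_node() */
		log_root_tree->node = malloc(16);
		if (!log_root_tree->node)
			return -1;
	}
	return 0;
}
```

Deferring the allocation to commit time is what lets the per-subvolume log nodes and the global log root tree node be written in the same order they were allocated.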