From patchwork Fri Jan 15 06:53:04 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021585
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn,
    Christoph Hellwig
Subject: [PATCH v12 01/41] block: add bio_add_zone_append_page
Date: Fri, 15 Jan 2021 15:53:04 +0900
Message-Id: <8d02dae71ff7ec934bc3155850e2e2b030b7dbbe.1610693037.git.naohiro.aota@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

From: Johannes Thumshirn

Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
is intended to be used by file systems that directly add pages to a bio
instead of using bio_iov_iter_get_pages().

Cc: Jens Axboe
Reviewed-by: Christoph Hellwig
Signed-off-by: Johannes Thumshirn
Reviewed-by: Josef Bacik
---
 block/bio.c         | 33 +++++++++++++++++++++++++++++++++
 include/linux/bio.h |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index fa01bef35bb1..a5c534bfe999 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -851,6 +851,39 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio,
 }
 EXPORT_SYMBOL(bio_add_pc_page);
 
+/**
+ * bio_add_zone_append_page - attempt to add page to zone-append bio
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist of a bio that will be submitted
+ * for a zone-append request. This can fail for a number of reasons, such as the
+ * bio being full or the target block device is not a zoned block device or
+ * other limitations of the target block device. The target block device must
+ * allow bio's up to PAGE_SIZE, so it is always possible to add a single page
+ * to an empty bio.
+ *
+ * Returns: number of bytes added to the bio, or 0 in case of a failure.
+ */
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset)
+{
+	struct request_queue *q = bio->bi_disk->queue;
+	bool same_page = false;
+
+	if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND))
+		return 0;
+
+	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
+		return 0;
+
+	return bio_add_hw_page(q, bio, page, len, offset,
+			       queue_max_zone_append_sectors(q), &same_page);
+}
+EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
+
 /**
  * __bio_try_merge_page - try appending data to an existing bvec.
  * @bio: destination bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c6d765382926..7ef300cb4e9a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -442,6 +442,8 @@ void bio_chain(struct bio *, struct bio *);
 extern int bio_add_page(struct bio *, struct page *, unsigned int, unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 			   unsigned int, unsigned int);
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset);
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off, bool *same_page);
 void __bio_add_page(struct bio *bio, struct page *page,

From patchwork Fri Jan 15 06:53:05 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021587
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota,
    Christoph Hellwig
Subject: [PATCH v12 02/41] iomap: support REQ_OP_ZONE_APPEND
Date: Fri, 15 Jan 2021 15:53:05 +0900
Message-Id: <2f2639925d82a137308c6566f1863cc6fb79c58d.1610693037.git.naohiro.aota@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

A ZONE_APPEND bio must follow hardware restrictions (e.g. not exceeding
max_zone_append_sectors) so that it is not split. bio_iov_iter_get_pages()
builds such a restricted bio using __bio_iov_append_get_pages() if
bio_op(bio) == REQ_OP_ZONE_APPEND. To utilize it, we need to set the
bio_op before calling bio_iov_iter_get_pages().

This commit introduces IOMAP_F_ZONE_APPEND, so that an iomap user can set
the flag to indicate it wants REQ_OP_ZONE_APPEND and a restricted bio.
Reviewed-by: Christoph Hellwig
Signed-off-by: Naohiro Aota
---
 fs/iomap/direct-io.c  | 43 +++++++++++++++++++++++++++++++++++++------
 include/linux/iomap.h |  1 +
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 933f234d5bec..2273120d8ed7 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -201,6 +201,34 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	iomap_dio_submit_bio(dio, iomap, bio, pos);
 }
 
+/*
+ * Figure out the bio's operation flags from the dio request, the
+ * mapping, and whether or not we want FUA. Note that we can end up
+ * clearing the WRITE_FUA flag in the dio request.
+ */
+static inline unsigned int
+iomap_dio_bio_opflags(struct iomap_dio *dio, struct iomap *iomap, bool use_fua)
+{
+	unsigned int opflags = REQ_SYNC | REQ_IDLE;
+
+	if (!(dio->flags & IOMAP_DIO_WRITE)) {
+		WARN_ON_ONCE(iomap->flags & IOMAP_F_ZONE_APPEND);
+		return REQ_OP_READ;
+	}
+
+	if (iomap->flags & IOMAP_F_ZONE_APPEND)
+		opflags |= REQ_OP_ZONE_APPEND;
+	else
+		opflags |= REQ_OP_WRITE;
+
+	if (use_fua)
+		opflags |= REQ_FUA;
+	else
+		dio->flags &= ~IOMAP_DIO_WRITE_FUA;
+
+	return opflags;
+}
+
 static loff_t
 iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		struct iomap_dio *dio, struct iomap *iomap)
@@ -208,6 +236,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
 	unsigned int fs_block_size = i_blocksize(inode), pad;
 	unsigned int align = iov_iter_alignment(dio->submit.iter);
+	unsigned int bio_opf;
 	struct bio *bio;
 	bool need_zeroout = false;
 	bool use_fua = false;
@@ -263,6 +292,13 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 			iomap_dio_zero(dio, iomap, pos - pad, pad);
 	}
 
+	/*
+	 * Set the operation flags early so that bio_iov_iter_get_pages
+	 * can set up the page vector appropriately for a ZONE_APPEND
+	 * operation.
+	 */
+	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua);
+
 	do {
 		size_t n;
 		if (dio->error) {
@@ -278,6 +314,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		bio->bi_ioprio = dio->iocb->ki_ioprio;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
+		bio->bi_opf = bio_opf;
 
 		ret = bio_iov_iter_get_pages(bio, dio->submit.iter);
 		if (unlikely(ret)) {
@@ -293,14 +330,8 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		n = bio->bi_iter.bi_size;
 		if (dio->flags & IOMAP_DIO_WRITE) {
-			bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
-			if (use_fua)
-				bio->bi_opf |= REQ_FUA;
-			else
-				dio->flags &= ~IOMAP_DIO_WRITE_FUA;
 			task_io_account_write(n);
 		} else {
-			bio->bi_opf = REQ_OP_READ;
 			if (dio->flags & IOMAP_DIO_DIRTY)
 				bio_set_pages_dirty(bio);
 		}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 5bd3cac4df9c..8ebb1fa6f3b7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -55,6 +55,7 @@ struct vm_fault;
 #define IOMAP_F_SHARED		0x04
 #define IOMAP_F_MERGED		0x08
 #define IOMAP_F_BUFFER_HEAD	0x10
+#define IOMAP_F_ZONE_APPEND	0x20
 
 /*
  * Flags set by the core iomap code during operations:

From patchwork Fri Jan 15 06:53:06 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021589
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 03/41] btrfs: defer loading zone info after opening trees
Date: Fri, 15 Jan 2021 15:53:06 +0900
List-ID: linux-fsdevel@vger.kernel.org

This is a preparation patch for implementing zone emulation on a regular
device.

To emulate the zoned mode on a regular (non-zoned) device, we need to
decide an emulated zone size. Instead of making it a compile-time static
value, we'll make it configurable at mkfs time. Since we have the one
zone == one device extent restriction, we can determine the emulated zone
size from the size of a device extent. We can extend
btrfs_get_dev_zone_info() to show a regular device filled with
conventional zones once the zone size is decided.

The current call site of btrfs_get_dev_zone_info() during the mount
process is earlier than reading the trees, so we can't slice a regular
device into conventional zones there. This patch defers the loading of
zone info to open_ctree(), so it can load the emulated zone size from a
device extent.
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c | 13 +++++++++++++
 fs/btrfs/volumes.c |  4 ----
 fs/btrfs/zoned.c   | 24 ++++++++++++++++++++++++
 fs/btrfs/zoned.h   |  7 +++++++
 4 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 948661554db4..e7b451d30ae2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3257,6 +3257,19 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (ret)
 		goto fail_tree_roots;
 
+	/*
+	 * Get zone type information of zoned block devices. This will also
+	 * handle emulation of the zoned mode for btrfs if a regular device has
+	 * the zoned incompat feature flag set.
+	 */
+	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
+	if (ret) {
+		btrfs_err(fs_info,
+			  "failed to read device zone info: %d", ret);
+		goto fail_block_groups;
+	}
+
 	/*
 	 * If we have a uuid root and we're not being told to rescan we need to
 	 * check the generation here so we can set the
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2c0aa03b6437..7d92b11ea603 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -669,10 +669,6 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	device->mode = flags;
 
-	ret = btrfs_get_dev_zone_info(device);
-	if (ret != 0)
-		goto error_free_page;
-
 	fs_devices->open_devices++;
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
 	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 155545180046..90b8d1d5369f 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -143,6 +143,30 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
 	return 0;
 }
 
+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	int ret = 0;
+
+	if (!btrfs_fs_incompat(fs_info, ZONED))
+		return 0;
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		/* We can skip reading of zone info for missing devices */
+		if (!device->bdev)
+			continue;
+
+		ret = btrfs_get_dev_zone_info(device);
+		if (ret)
+			break;
+	}
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	return ret;
+}
+
 int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
 	struct btrfs_zoned_device_info *zone_info = NULL;
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 8abe2f83272b..5e0e7de84a82 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -25,6 +25,7 @@ struct btrfs_zoned_device_info {
 #ifdef CONFIG_BLK_DEV_ZONED
 int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 		       struct blk_zone *zone);
+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info);
 int btrfs_get_dev_zone_info(struct btrfs_device *device);
 void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
 int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info);
@@ -42,6 +43,12 @@ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 	return 0;
 }
 
+static inline int btrfs_get_dev_zone_info_all_devices(
+		struct btrfs_fs_info *fs_info)
+{
+	return 0;
+}
+
 static inline int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
 	return 0;

From patchwork Fri Jan 15 06:53:07 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021591
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v12 04/41] btrfs: use regular SB location on emulated zoned mode
Date: Fri, 15 Jan 2021 15:53:07 +0900
Message-Id: <30ac9e674289d206ec9299228d38cd7d03cd16c4.1610693037.git.naohiro.aota@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

Zoned btrfs puts a superblock at the beginning of the SB logging zones if
the zone is conventional. This difference causes a chicken-and-egg problem
for the emulated zoned mode: since the device is a regular (non-zoned)
device, we cannot tell whether the filesystem is regular or emulated zoned
while reading the superblock, yet to load the proper superblock we need to
know whether it is emulated zoned or not.

To solve this problem, we place the superblocks at the same locations as
regular btrfs in the emulated zoned mode.
This is possible because all the SB locations are ensured to be in a
conventional zone in the emulated zoned mode.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/zoned.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 90b8d1d5369f..49148e7a44b4 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -553,7 +553,13 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
 	struct btrfs_zoned_device_info *zinfo = device->zone_info;
 	u32 zone_num;
 
-	if (!zinfo) {
+	/*
+	 * With btrfs zoned mode on a non-zoned block device, use the same
+	 * super block locations as regular btrfs. Doing so, the super
+	 * block can always be retrieved and the zoned-mode of the volume
+	 * detected from the super block information.
+	 */
+	if (!bdev_is_zoned(device->bdev)) {
 		*bytenr_ret = btrfs_sb_offset(mirror);
 		return 0;
 	}

From patchwork Fri Jan 15 06:53:08 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021593
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn
Subject: [PATCH v12 05/41] btrfs: release path before calling into btrfs_load_block_group_zone_info
Date: Fri, 15 Jan 2021 15:53:08 +0900
Message-Id: <0786a9782ec6306cddb0a2808116c3f95a88849b.1610693037.git.naohiro.aota@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

From: Johannes Thumshirn

Since we have no write pointer in conventional zones, we cannot determine
the allocation offset from it. Instead, we set the allocation offset after
the highest addressed extent. This is done by reading the extent tree in
btrfs_load_block_group_zone_info().

However, this function is called from btrfs_read_block_groups(), so the
read lock for the tree node can be taken recursively. To avoid this unsafe
locking scenario, release the path before reading the extent tree to get
the allocation offset.
Signed-off-by: Johannes Thumshirn
---
 fs/btrfs/block-group.c | 39 ++++++++++++++++++---------------------
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index b8bbdd95743e..ff13f7554ee5 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1806,24 +1806,8 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
     return ret;
 }
 
-static void read_block_group_item(struct btrfs_block_group *cache,
-                  struct btrfs_path *path,
-                  const struct btrfs_key *key)
-{
-    struct extent_buffer *leaf = path->nodes[0];
-    struct btrfs_block_group_item bgi;
-    int slot = path->slots[0];
-
-    cache->length = key->offset;
-
-    read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
-               sizeof(bgi));
-    cache->used = btrfs_stack_block_group_used(&bgi);
-    cache->flags = btrfs_stack_block_group_flags(&bgi);
-}
-
 static int read_one_block_group(struct btrfs_fs_info *info,
-                struct btrfs_path *path,
+                struct btrfs_block_group_item *bgi,
                 const struct btrfs_key *key,
                 int need_clear)
 {
@@ -1838,7 +1822,9 @@ static int read_one_block_group(struct btrfs_fs_info *info,
     if (!cache)
         return -ENOMEM;
 
-    read_block_group_item(cache, path, key);
+    cache->length = key->offset;
+    cache->used = btrfs_stack_block_group_used(bgi);
+    cache->flags = btrfs_stack_block_group_flags(bgi);
 
     set_free_space_tree_thresholds(cache);
 
@@ -1997,19 +1983,30 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
         need_clear = 1;
 
     while (1) {
+        struct btrfs_block_group_item bgi;
+        struct extent_buffer *leaf;
+        int slot;
+
         ret = find_first_block_group(info, path, &key);
         if (ret > 0)
             break;
         if (ret != 0)
             goto error;
 
-        btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-        ret = read_one_block_group(info, path, &key, need_clear);
+        leaf = path->nodes[0];
+        slot = path->slots[0];
+        btrfs_release_path(path);
+
+        read_extent_buffer(leaf, &bgi,
+                   btrfs_item_ptr_offset(leaf, slot),
+                   sizeof(bgi));
+
+        btrfs_item_key_to_cpu(leaf, &key, slot);
+        ret = read_one_block_group(info, &bgi, &key, need_clear);
         if (ret < 0)
             goto error;
         key.objectid += key.offset;
         key.offset = 0;
-        btrfs_release_path(path);
     }
     btrfs_release_path(path);

From patchwork Fri Jan 15 06:53:09 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Johannes Thumshirn
Subject: [PATCH v12 06/41] btrfs: do not load fs_info->zoned from incompat flag
Date: Fri, 15 Jan 2021 15:53:09 +0900

From: Johannes Thumshirn

Don't set the zoned flag in fs_info when encountering
BTRFS_FEATURE_INCOMPAT_ZONED on mount. The zoned flag in fs_info is in a
union together with the zone_size, so setting it too early will result in
setting an incorrect zone_size as well.

Once the correct zone_size is read from the device, we can rely on the
zoned flag in fs_info as well to determine if the filesystem is running in
zoned mode.

Signed-off-by: Johannes Thumshirn
---
 fs/btrfs/disk-io.c | 2 --
 fs/btrfs/zoned.c   | 8 ++++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e7b451d30ae2..192e366f8afc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3136,8 +3136,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
     if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
         btrfs_info(fs_info, "has skinny extents");
 
-    fs_info->zoned = (features & BTRFS_FEATURE_INCOMPAT_ZONED);
-
     /*
      * flag our filesystem as having big metadata blocks if
      * they are bigger than the page size
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 49148e7a44b4..684dad749a8c 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -431,6 +431,14 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
     fs_info->zone_size = zone_size;
     fs_info->max_zone_append_size = max_zone_append_size;
 
+    /*
+     * Check mount options here, because we might change fs_info->zoned
+     * from fs_info->zone_size.
+     */
+    ret = btrfs_check_mountopts_zoned(fs_info);
+    if (ret)
+        goto out;
+
     btrfs_info(fs_info, "zoned mode enabled with zone size %llu", zone_size);
 out:
     return ret;

From patchwork Fri Jan 15 06:53:10 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Naohiro Aota
Subject: [PATCH v12 07/41] btrfs: disallow fitrim in ZONED mode
Date: Fri, 15 Jan 2021 15:53:10 +0900
Message-Id: <4f14a1ab8ab7b9ed558404c6b3e92ed69d4054f1.1610693037.git.naohiro.aota@wdc.com>

The implementation of fitrim depends on the space cache, which is not used
and is disabled for the extent allocator of zoned btrfs. So the current
code does not work with zoned btrfs.

In the future, we can implement fitrim for zoned btrfs by enabling the
space cache (only for fitrim) or by scanning the extent tree at fitrim
time. But for now, disallow fitrim in ZONED mode.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/ioctl.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5b9b0a390f0e..a7980c20c77e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -527,6 +527,14 @@ static noinline int btrfs_ioctl_fitrim(struct btrfs_fs_info *fs_info,
     if (!capable(CAP_SYS_ADMIN))
         return -EPERM;
 
+    /*
+     * btrfs_trim_block_group() depends on the space cache, which is
+     * not available in ZONED mode. So, disallow fitrim in ZONED mode
+     * for now.
+     */
+    if (btrfs_is_zoned(fs_info))
+        return -EOPNOTSUPP;
+
     /*
      * If the fs is mounted with nologreplay, which requires it to be
      * mounted in RO mode as well, we can not allow discard on free space

From patchwork Fri Jan 15 06:53:11 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Johannes Thumshirn, Naohiro Aota
Subject: [PATCH v12 08/41] btrfs: allow zoned mode on non-zoned block devices
Date: Fri, 15 Jan 2021 15:53:11 +0900

From: Johannes Thumshirn

Run zoned btrfs mode on non-zoned devices. This is done by "slicing up"
the block device into static sized chunks and faking a conventional zone
on each of them. The emulated zone size is determined from the size of a
device extent.

This is mainly aimed at testing parts of the zoned mode, i.e. the zoned
chunk allocator, on regular block devices.

Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/zoned.c | 149 +++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/zoned.h |  14 +++--
 2 files changed, 147 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 684dad749a8c..13b240e5db4e 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -119,6 +119,37 @@ static inline u32 sb_zone_number(int shift, int mirror)
     return 0;
 }
 
+/*
+ * Emulate blkdev_report_zones() for a non-zoned device. It slices up the
+ * block device into static sized chunks and fakes a conventional zone on
+ * each of them.
+ */
+static int emulate_report_zones(struct btrfs_device *device, u64 pos,
+                struct blk_zone *zones, unsigned int nr_zones)
+{
+    const sector_t zone_sectors =
+        device->fs_info->zone_size >> SECTOR_SHIFT;
+    sector_t bdev_size = device->bdev->bd_part->nr_sects;
+    unsigned int i;
+
+    pos >>= SECTOR_SHIFT;
+    for (i = 0; i < nr_zones; i++) {
+        zones[i].start = i * zone_sectors + pos;
+        zones[i].len = zone_sectors;
+        zones[i].capacity = zone_sectors;
+        zones[i].wp = zones[i].start + zone_sectors;
+        zones[i].type = BLK_ZONE_TYPE_CONVENTIONAL;
+        zones[i].cond = BLK_ZONE_COND_NOT_WP;
+
+        if (zones[i].wp >= bdev_size) {
+            i++;
+            break;
+        }
+    }
+
+    return i;
+}
+
 static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
                struct blk_zone *zones, unsigned int *nr_zones)
 {
@@ -127,6 +158,12 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
     if (!*nr_zones)
         return 0;
 
+    if (!bdev_is_zoned(device->bdev)) {
+        ret = emulate_report_zones(device, pos, zones, *nr_zones);
+        *nr_zones = ret;
+        return 0;
+    }
+
     ret = blkdev_report_zones(device->bdev, pos >> SECTOR_SHIFT, *nr_zones,
                   copy_zone_info_cb, zones);
     if (ret < 0) {
@@ -143,6 +180,50 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
     return 0;
 }
 
+/* The emulated zone size is determined from the size of a device extent. */
+static int calculate_emulated_zone_size(struct btrfs_fs_info *fs_info)
+{
+    struct btrfs_path *path;
+    struct btrfs_root *root = fs_info->dev_root;
+    struct btrfs_key key;
+    struct extent_buffer *leaf;
+    struct btrfs_dev_extent *dext;
+    int ret = 0;
+
+    key.objectid = 1;
+    key.type = BTRFS_DEV_EXTENT_KEY;
+    key.offset = 0;
+
+    path = btrfs_alloc_path();
+    if (!path)
+        return -ENOMEM;
+
+    ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+    if (ret < 0)
+        goto out;
+
+    if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
+        ret = btrfs_next_item(root, path);
+        if (ret < 0)
+            goto out;
+        /* No dev extents at all?
           Not good */
+        if (ret > 0) {
+            ret = -EUCLEAN;
+            goto out;
+        }
+    }
+
+    leaf = path->nodes[0];
+    dext = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_dev_extent);
+    fs_info->zone_size = btrfs_dev_extent_length(leaf, dext);
+    ret = 0;
+
+out:
+    btrfs_free_path(path);
+
+    return ret;
+}
+
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
 {
     struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
@@ -169,6 +250,7 @@ int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
 
 int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
+    struct btrfs_fs_info *fs_info = device->fs_info;
     struct btrfs_zoned_device_info *zone_info = NULL;
     struct block_device *bdev = device->bdev;
     struct request_queue *queue = bdev_get_queue(bdev);
@@ -177,9 +259,14 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
     struct blk_zone *zones = NULL;
     unsigned int i, nreported = 0, nr_zones;
     unsigned int zone_sectors;
+    char *model, *emulated;
     int ret;
 
-    if (!bdev_is_zoned(bdev))
+    /*
+     * Cannot use btrfs_is_zoned here, since fs_info->zone_size might
+     * not be set yet.
+     */
+    if (!btrfs_fs_incompat(fs_info, ZONED))
         return 0;
 
     if (device->zone_info)
@@ -189,8 +276,20 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
     if (!zone_info)
         return -ENOMEM;
 
+    if (!bdev_is_zoned(bdev)) {
+        if (!fs_info->zone_size) {
+            ret = calculate_emulated_zone_size(fs_info);
+            if (ret)
+                goto out;
+        }
+
+        ASSERT(fs_info->zone_size);
+        zone_sectors = fs_info->zone_size >> SECTOR_SHIFT;
+    } else {
+        zone_sectors = bdev_zone_sectors(bdev);
+    }
+
     nr_sectors = bdev->bd_part->nr_sects;
-    zone_sectors = bdev_zone_sectors(bdev);
     /* Check if it's power of 2 (see is_power_of_2) */
     ASSERT(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);
     zone_info->zone_size = zone_sectors << SECTOR_SHIFT;
@@ -296,12 +395,32 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 
     device->zone_info = zone_info;
 
-    /* device->fs_info is not safe to use for printing messages */
-    btrfs_info_in_rcu(NULL,
-        "host-%s zoned block device %s, %u zones of %llu bytes",
-        bdev_zoned_model(bdev) == BLK_ZONED_HM ?
            "managed" : "aware",
-        rcu_str_deref(device->name), zone_info->nr_zones,
-        zone_info->zone_size);
+    switch (bdev_zoned_model(bdev)) {
+    case BLK_ZONED_HM:
+        model = "host-managed zoned";
+        emulated = "";
+        break;
+    case BLK_ZONED_HA:
+        model = "host-aware zoned";
+        emulated = "";
+        break;
+    case BLK_ZONED_NONE:
+        model = "regular";
+        emulated = "emulated ";
+        break;
+    default:
+        /* Just in case */
+        btrfs_err_in_rcu(fs_info, "Unsupported zoned model %d on %s",
+                 bdev_zoned_model(bdev),
+                 rcu_str_deref(device->name));
+        ret = -EOPNOTSUPP;
+        goto out;
+    }
+
+    btrfs_info_in_rcu(fs_info,
+        "%s block device %s, %u %szones of %llu bytes",
+        model, rcu_str_deref(device->name), zone_info->nr_zones,
+        emulated, zone_info->zone_size);
 
     return 0;
 
@@ -348,7 +467,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
     u64 nr_devices = 0;
     u64 zone_size = 0;
     u64 max_zone_append_size = 0;
-    const bool incompat_zoned = btrfs_is_zoned(fs_info);
+    const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED);
     int ret = 0;
 
     /* Count zoned devices */
@@ -359,9 +478,17 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
             continue;
 
         model = bdev_zoned_model(device->bdev);
+        /*
+         * A Host-Managed zoned device must be used as a zoned device.
+         * A Host-Aware zoned device and a non-zoned device can be
+         * treated as a zoned device, if the ZONED flag is enabled in
+         * the superblock.
+         */
         if (model == BLK_ZONED_HM ||
-            (model == BLK_ZONED_HA && incompat_zoned)) {
-            struct btrfs_zoned_device_info *zone_info;
+            (model == BLK_ZONED_HA && incompat_zoned) ||
+            (model == BLK_ZONED_NONE && incompat_zoned)) {
+            struct btrfs_zoned_device_info *zone_info =
+                device->zone_info;
 
             zone_info = device->zone_info;
             zoned_devices++;
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 5e0e7de84a82..058a57317c05 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -143,12 +143,16 @@ static inline void btrfs_dev_clear_zone_empty(struct btrfs_device *device, u64 p
 static inline bool btrfs_check_device_zone_type(const struct btrfs_fs_info *fs_info,
                         struct block_device *bdev)
 {
-    u64 zone_size;
-
     if (btrfs_is_zoned(fs_info)) {
-        zone_size = bdev_zone_sectors(bdev) << SECTOR_SHIFT;
-        /* Do not allow non-zoned device */
-        return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
+        /*
+         * We can allow a regular device on a zoned btrfs, because
+         * we will emulate a zoned device on the regular device.
+         */
+        if (!bdev_is_zoned(bdev))
+            return true;
+
+        return fs_info->zone_size ==
+            (bdev_zone_sectors(bdev) << SECTOR_SHIFT);
     }
 
     /* Do not allow Host Manged zoned device */

From patchwork Fri Jan 15 06:53:13 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 09/41] btrfs: implement zoned chunk allocator
Date: Fri, 15 Jan 2021 15:53:13 +0900

This commit implements a zoned chunk/dev_extent allocator. The zoned
allocator aligns the device extents to zone boundaries, so that a zone
reset affects only the device extent and does not change the state of
blocks in the neighboring device extents.

Also, it checks that a region allocation is not overlapping any of the
super block zones, and ensures the region is empty.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/volumes.c | 169 ++++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/volumes.h |   1 +
 fs/btrfs/zoned.c   | 144 ++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h   |  25 +++++++
 4 files changed, 323 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7d92b11ea603..4f02b570736e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1415,11 +1415,62 @@ static u64 dev_extent_search_start(struct btrfs_device *device, u64 start)
          * make sure to start at an offset of at least 1MB.
          */
         return max_t(u64, start, SZ_1M);
+    case BTRFS_CHUNK_ALLOC_ZONED:
+        /*
+         * We don't care about the starting region like the regular
+         * allocator, because we anyway use/reserve the first two
+         * zones for superblock logging.
+         */
+        return ALIGN(start, device->zone_info->zone_size);
     default:
         BUG();
     }
 }
 
+static bool dev_extent_hole_check_zoned(struct btrfs_device *device,
+                    u64 *hole_start, u64 *hole_size,
+                    u64 num_bytes)
+{
+    u64 zone_size = device->zone_info->zone_size;
+    u64 pos;
+    int ret;
+    int changed = 0;
+
+    ASSERT(IS_ALIGNED(*hole_start, zone_size));
+
+    while (*hole_size > 0) {
+        pos = btrfs_find_allocatable_zones(device, *hole_start,
+                           *hole_start + *hole_size,
+                           num_bytes);
+        if (pos != *hole_start) {
+            *hole_size = *hole_start + *hole_size - pos;
+            *hole_start = pos;
+            changed = 1;
+            if (*hole_size < num_bytes)
+                break;
+        }
+
+        ret = btrfs_ensure_empty_zones(device, pos, num_bytes);
+
+        /* Range is ensured to be empty */
+        if (!ret)
+            return changed;
+
+        /* Given hole range was invalid (outside of device) */
+        if (ret == -ERANGE) {
+            *hole_start += *hole_size;
+            *hole_size = 0;
+            return 1;
+        }
+
+        *hole_start += zone_size;
+        *hole_size -= zone_size;
+        changed = 1;
+    }
+
+    return changed;
+}
+
 /**
  * dev_extent_hole_check - check if specified hole is suitable for allocation
  * @device:    the device which we have the hole
@@ -1436,24 +1487,39 @@ static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
     bool changed = false;
     u64 hole_end = *hole_start + *hole_size;
 
-    /*
-     * Check before we set max_hole_start, otherwise we could end up
-     * sending back this offset anyway.
-     */
-    if (contains_pending_extent(device, hole_start, *hole_size)) {
-        if (hole_end >= *hole_start)
-            *hole_size = hole_end - *hole_start;
-        else
-            *hole_size = 0;
-        changed = true;
-    }
+    for (;;) {
+        /*
+         * Check before we set max_hole_start, otherwise we could end up
+         * sending back this offset anyway.
+         */
+        if (contains_pending_extent(device, hole_start, *hole_size)) {
+            if (hole_end >= *hole_start)
+                *hole_size = hole_end - *hole_start;
+            else
+                *hole_size = 0;
+            changed = true;
+        }
+
+        switch (device->fs_devices->chunk_alloc_policy) {
+        case BTRFS_CHUNK_ALLOC_REGULAR:
+            /* No extra check */
+            break;
+        case BTRFS_CHUNK_ALLOC_ZONED:
+            if (dev_extent_hole_check_zoned(device, hole_start,
+                            hole_size, num_bytes)) {
+                changed = true;
+                /*
+                 * The changed hole can contain pending
+                 * extent. Loop again to check that.
+                 */
+                continue;
+            }
+            break;
+        default:
+            BUG();
+        }
 
-    switch (device->fs_devices->chunk_alloc_policy) {
-    case BTRFS_CHUNK_ALLOC_REGULAR:
-        /* No extra check */
         break;
-    default:
-        BUG();
     }
 
     return changed;
@@ -1506,6 +1572,9 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 
     search_start = dev_extent_search_start(device, search_start);
 
+    WARN_ON(device->zone_info &&
+        !IS_ALIGNED(num_bytes, device->zone_info->zone_size));
+
     path = btrfs_alloc_path();
     if (!path)
         return -ENOMEM;
@@ -4899,6 +4968,37 @@ static void init_alloc_chunk_ctl_policy_regular(
     ctl->dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes;
 }
 
+static void init_alloc_chunk_ctl_policy_zoned(
+              struct btrfs_fs_devices *fs_devices,
+              struct alloc_chunk_ctl *ctl)
+{
+    u64 zone_size = fs_devices->fs_info->zone_size;
+    u64 limit;
+    int min_num_stripes = ctl->devs_min * ctl->dev_stripes;
+    int min_data_stripes = (min_num_stripes - ctl->nparity) / ctl->ncopies;
+    u64 min_chunk_size = min_data_stripes * zone_size;
+    u64 type = ctl->type;
+
+    ctl->max_stripe_size = zone_size;
+    if (type & BTRFS_BLOCK_GROUP_DATA) {
+        ctl->max_chunk_size = round_down(BTRFS_MAX_DATA_CHUNK_SIZE,
+                         zone_size);
+    } else if (type & BTRFS_BLOCK_GROUP_METADATA) {
+        ctl->max_chunk_size = ctl->max_stripe_size;
+    } else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
+        ctl->max_chunk_size = 2 * ctl->max_stripe_size;
+        ctl->devs_max = min_t(int, ctl->devs_max,
+                      BTRFS_MAX_DEVS_SYS_CHUNK);
+    }
+
+    /* We don't want a
chunk larger than 10% of writable space */ + limit = max(round_down(div_factor(fs_devices->total_rw_bytes, 1), + zone_size), + min_chunk_size); + ctl->max_chunk_size = min(limit, ctl->max_chunk_size); + ctl->dev_extent_min = zone_size * ctl->dev_stripes; +} + static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices, struct alloc_chunk_ctl *ctl) { @@ -4919,6 +5019,9 @@ static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices, case BTRFS_CHUNK_ALLOC_REGULAR: init_alloc_chunk_ctl_policy_regular(fs_devices, ctl); break; + case BTRFS_CHUNK_ALLOC_ZONED: + init_alloc_chunk_ctl_policy_zoned(fs_devices, ctl); + break; default: BUG(); } @@ -5045,6 +5148,38 @@ static int decide_stripe_size_regular(struct alloc_chunk_ctl *ctl, return 0; } +static int decide_stripe_size_zoned(struct alloc_chunk_ctl *ctl, + struct btrfs_device_info *devices_info) +{ + u64 zone_size = devices_info[0].dev->zone_info->zone_size; + /* Number of stripes that count for block group size */ + int data_stripes; + + /* + * It should hold because: + * dev_extent_min == dev_extent_want == zone_size * dev_stripes + */ + ASSERT(devices_info[ctl->ndevs - 1].max_avail == ctl->dev_extent_min); + + ctl->stripe_size = zone_size; + ctl->num_stripes = ctl->ndevs * ctl->dev_stripes; + data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies; + + /* stripe_size is fixed in ZONED. Reduce ndevs instead. 
*/ + if (ctl->stripe_size * data_stripes > ctl->max_chunk_size) { + ctl->ndevs = div_u64(div_u64(ctl->max_chunk_size * ctl->ncopies, + ctl->stripe_size) + ctl->nparity, + ctl->dev_stripes); + ctl->num_stripes = ctl->ndevs * ctl->dev_stripes; + data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies; + ASSERT(ctl->stripe_size * data_stripes <= ctl->max_chunk_size); + } + + ctl->chunk_size = ctl->stripe_size * data_stripes; + + return 0; +} + static int decide_stripe_size(struct btrfs_fs_devices *fs_devices, struct alloc_chunk_ctl *ctl, struct btrfs_device_info *devices_info) @@ -5072,6 +5207,8 @@ static int decide_stripe_size(struct btrfs_fs_devices *fs_devices, switch (fs_devices->chunk_alloc_policy) { case BTRFS_CHUNK_ALLOC_REGULAR: return decide_stripe_size_regular(ctl, devices_info); + case BTRFS_CHUNK_ALLOC_ZONED: + return decide_stripe_size_zoned(ctl, devices_info); default: BUG(); } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 1997a4649a66..98a447badd6a 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -213,6 +213,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used); enum btrfs_chunk_allocation_policy { BTRFS_CHUNK_ALLOC_REGULAR, + BTRFS_CHUNK_ALLOC_ZONED, }; /* diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 13b240e5db4e..ae5f49fe4fb4 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1,11 +1,13 @@ // SPDX-License-Identifier: GPL-2.0 +#include #include #include #include "ctree.h" #include "volumes.h" #include "zoned.h" #include "rcu-string.h" +#include "disk-io.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -557,6 +559,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) fs_info->zone_size = zone_size; fs_info->max_zone_append_size = max_zone_append_size; + fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED; /* * Check mount options here, because we might change fs_info->zoned @@ -779,3 +782,144 @@ int btrfs_reset_sb_log_zones(struct 
block_device *bdev, int mirror) sb_zone << zone_sectors_shift, zone_sectors * BTRFS_NR_SB_LOG_ZONES, GFP_NOFS); } + +/* + * btrfs_find_allocatable_zones - find allocatable zones within the given region + * @device: the device to allocate a region on + * @hole_start: the position of the hole to allocate the region + * @hole_end: the end of the hole + * @num_bytes: the size of the wanted region + * @return: position of allocatable zones + * + * Allocatable region should not contain any superblock locations. + */ +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u8 shift = zinfo->zone_size_shift; + u64 nzones = num_bytes >> shift; + u64 pos = hole_start; + u64 begin, end; + bool have_sb; + int i; + + ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); + + while (pos < hole_end) { + begin = pos >> shift; + end = begin + nzones; + + if (end > zinfo->nr_zones) + return hole_end; + + /* Check if zones in the region are all empty */ + if (btrfs_dev_is_sequential(device, pos) && + find_next_zero_bit(zinfo->empty_zones, end, begin) != end) { + pos += zinfo->zone_size; + continue; + } + + have_sb = false; + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + u32 sb_zone; + u64 sb_pos; + + sb_zone = sb_zone_number(shift, i); + if (!(end <= sb_zone || + sb_zone + BTRFS_NR_SB_LOG_ZONES <= begin)) { + have_sb = true; + pos = ((u64)sb_zone + BTRFS_NR_SB_LOG_ZONES) << shift; + break; + } + + /* + * We also need to exclude regular superblock + * positions + */ + sb_pos = btrfs_sb_offset(i); + if (!(pos + num_bytes <= sb_pos || + sb_pos + BTRFS_SUPER_INFO_SIZE <= pos)) { + have_sb = true; + pos = ALIGN(sb_pos + BTRFS_SUPER_INFO_SIZE, + zinfo->zone_size); + break; + } + } + if (!have_sb) + break; + } + + return pos; +} + +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes) +{ + int 
ret; + + *bytes = 0; + ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_RESET, + physical >> SECTOR_SHIFT, length >> SECTOR_SHIFT, + GFP_NOFS); + if (ret) + return ret; + + *bytes = length; + while (length) { + btrfs_dev_set_zone_empty(device, physical); + physical += device->zone_info->zone_size; + length -= device->zone_info->zone_size; + } + + return 0; +} + +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u8 shift = zinfo->zone_size_shift; + unsigned long begin = start >> shift; + unsigned long end = (start + size) >> shift; + u64 pos; + int ret; + + ASSERT(IS_ALIGNED(start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(size, zinfo->zone_size)); + + if (end > zinfo->nr_zones) + return -ERANGE; + + /* All the zones are conventional */ + if (find_next_bit(zinfo->seq_zones, begin, end) == end) + return 0; + + /* All the zones are sequential and empty */ + if (find_next_zero_bit(zinfo->seq_zones, begin, end) == end && + find_next_zero_bit(zinfo->empty_zones, begin, end) == end) + return 0; + + for (pos = start; pos < start + size; pos += zinfo->zone_size) { + u64 reset_bytes; + + if (!btrfs_dev_is_sequential(device, pos) || + btrfs_dev_is_empty_zone(device, pos)) + continue; + + /* Free regions should be empty */ + btrfs_warn_in_rcu( + device->fs_info, + "zoned: resetting device %s (devid %llu) zone %llu for allocation", + rcu_str_deref(device->name), device->devid, + pos >> shift); + WARN_ON_ONCE(1); + + ret = btrfs_reset_device_zone(device, pos, zinfo->zone_size, + &reset_bytes); + if (ret) + return ret; + } + + return 0; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 058a57317c05..de5901f5ae66 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -36,6 +36,11 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw, u64 *bytenr_ret); void btrfs_advance_sb_log(struct btrfs_device *device, int mirror); int btrfs_reset_sb_log_zones(struct 
block_device *bdev, int mirror); +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes); +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes); +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -92,6 +97,26 @@ static inline int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror { return 0; } +static inline u64 btrfs_find_allocatable_zones(struct btrfs_device *device, + u64 hole_start, u64 hole_end, + u64 num_bytes) +{ + return hole_start; +} + +static inline int btrfs_reset_device_zone(struct btrfs_device *device, + u64 physical, u64 length, u64 *bytes) +{ + *bytes = 0; + return 0; +} + +static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, + u64 start, u64 size) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Fri Jan 15 06:53:14 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021605
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Anand Jain , Josef Bacik
Subject: [PATCH v12 10/41] btrfs: verify device extent is aligned to zone
Date: Fri, 15 Jan 2021 15:53:14 +0900

Add a check in verify_one_dev_extent() that a device extent on a zoned block device is aligned to the respective zone boundary.
Signed-off-by: Naohiro Aota
Reviewed-by: Anand Jain
Reviewed-by: Josef Bacik
---
fs/btrfs/volumes.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 4f02b570736e..be26fdfefc8c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7776,6 +7776,20 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, ret = -EUCLEAN; goto out; } + + if (dev->zone_info) { + u64 zone_size = dev->zone_info->zone_size; + + if (!IS_ALIGNED(physical_offset, zone_size) || + !IS_ALIGNED(physical_len, zone_size)) { + btrfs_err(fs_info, +"zoned: dev extent devid %llu physical offset %llu len %llu is not aligned to device zone", + devid, physical_offset, physical_len); + ret = -EUCLEAN; + goto out; + } + } + out: free_extent_map(em); return ret;

From patchwork Fri Jan 15 06:53:15 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021607
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik , Anand Jain
Subject: [PATCH v12 11/41] btrfs: load zone's allocation offset
Date: Fri, 15 Jan 2021 15:53:15 +0900

Zoned btrfs must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. This commit adds "alloc_offset" to track that logical address; it is populated in btrfs_load_block_group_zone_info() from the write pointers of the corresponding zones.

For now, zoned btrfs only supports the SINGLE profile. Supporting a non-SINGLE profile with zone append writes is not trivial. For example, in the DUP profile, we send a zone append write IO to two zones on a device. The device replies with the written LBAs for the IOs. If the offsets of the returned addresses from the beginning of their zones differ, the result is two different logical addresses. Supporting such diverging physical addresses would need a fine-grained logical-to-physical mapping, and thus an additional metadata type, so disable non-SINGLE profiles for now.

This commit handles the case where all the zones in a block group are sequential. The next patch will handle the case of a block group containing a conventional zone.
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik Reviewed-by: Anand Jain --- fs/btrfs/block-group.c | 15 +++++ fs/btrfs/block-group.h | 6 ++ fs/btrfs/zoned.c | 150 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 7 ++ 4 files changed, 178 insertions(+) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index ff13f7554ee5..13edbc959bac 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -15,6 +15,7 @@ #include "delalloc-space.h" #include "discard.h" #include "raid56.h" +#include "zoned.h" /* * Return target flags in extended format or 0 if restripe for this chunk_type @@ -1851,6 +1852,13 @@ static int read_one_block_group(struct btrfs_fs_info *info, goto error; } + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_err(info, "zoned: failed to load zone info of bg %llu", + cache->start); + goto error; + } + /* * We need to exclude the super stripes now so that the space info has * super bytes accounted for, otherwise we'll think we have more space @@ -2138,6 +2146,13 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, cache->cached = BTRFS_CACHE_FINISHED; if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) cache->needs_free_space = 1; + + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_put_block_group(cache); + return ret; + } + ret = exclude_super_stripes(cache); if (ret) { /* We may have excluded something, so call this just in case */ diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 8f74a96074f7..9d026ab1768d 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -183,6 +183,12 @@ struct btrfs_block_group { /* Record locked full stripes for RAID5/6 block group */ struct btrfs_full_stripe_locks_tree full_stripe_locks_root; + + /* + * Allocation offset for the block group to implement sequential + * allocation. This is used only with ZONED mode enabled. 
+ */ + u64 alloc_offset; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index ae5f49fe4fb4..78be99b3c090 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -3,14 +3,20 @@ #include #include #include +#include #include "ctree.h" #include "volumes.h" #include "zoned.h" #include "rcu-string.h" #include "disk-io.h" +#include "block-group.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 +/* Invalid allocation pointer value for missing devices */ +#define WP_MISSING_DEV ((u64)-1) +/* Pseudo write pointer value for conventional zone */ +#define WP_CONVENTIONAL ((u64)-2) /* Number of superblock log zones */ #define BTRFS_NR_SB_LOG_ZONES 2 @@ -923,3 +929,147 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) return 0; } + +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct extent_map_tree *em_tree = &fs_info->mapping_tree; + struct extent_map *em; + struct map_lookup *map; + struct btrfs_device *device; + u64 logical = cache->start; + u64 length = cache->length; + u64 physical = 0; + int ret; + int i; + unsigned int nofs_flag; + u64 *alloc_offsets = NULL; + u32 num_sequential = 0, num_conventional = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + /* Sanity check */ + if (!IS_ALIGNED(length, fs_info->zone_size)) { + btrfs_err(fs_info, "zoned: block group %llu len %llu unaligned to zone size %llu", + logical, length, fs_info->zone_size); + return -EIO; + } + + /* Get the chunk mapping */ + read_lock(&em_tree->lock); + em = lookup_extent_mapping(em_tree, logical, length); + read_unlock(&em_tree->lock); + + if (!em) + return -EINVAL; + + map = em->map_lookup; + + alloc_offsets = kcalloc(map->num_stripes, sizeof(*alloc_offsets), + GFP_NOFS); + if (!alloc_offsets) { + free_extent_map(em); + return -ENOMEM; + } + + for 
(i = 0; i < map->num_stripes; i++) { + bool is_sequential; + struct blk_zone zone; + + device = map->stripes[i].dev; + physical = map->stripes[i].physical; + + if (device->bdev == NULL) { + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } + + is_sequential = btrfs_dev_is_sequential(device, physical); + if (is_sequential) + num_sequential++; + else + num_conventional++; + + if (!is_sequential) { + alloc_offsets[i] = WP_CONVENTIONAL; + continue; + } + + /* + * This zone will be used for allocation, so mark this + * zone non-empty. + */ + btrfs_dev_clear_zone_empty(device, physical); + + /* + * The group is mapped to a sequential zone. Get the zone write + * pointer to determine the allocation offset within the zone. + */ + WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size)); + nofs_flag = memalloc_nofs_save(); + ret = btrfs_get_dev_zone(device, physical, &zone); + memalloc_nofs_restore(nofs_flag); + if (ret == -EIO || ret == -EOPNOTSUPP) { + ret = 0; + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } else if (ret) { + goto out; + } + + switch (zone.cond) { + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + btrfs_err(fs_info, "zoned: offline/readonly zone %llu on device %s (devid %llu)", + physical >> device->zone_info->zone_size_shift, + rcu_str_deref(device->name), device->devid); + alloc_offsets[i] = WP_MISSING_DEV; + break; + case BLK_ZONE_COND_EMPTY: + alloc_offsets[i] = 0; + break; + case BLK_ZONE_COND_FULL: + alloc_offsets[i] = fs_info->zone_size; + break; + default: + /* Partially used zone */ + alloc_offsets[i] = + ((zone.wp - zone.start) << SECTOR_SHIFT); + break; + } + } + + if (num_conventional > 0) { + /* + * Since conventional zones do not have a write pointer, we + * cannot determine alloc_offset from the pointer + */ + ret = -EINVAL; + goto out; + } + + switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + cache->alloc_offset = alloc_offsets[0]; + break; + case BTRFS_BLOCK_GROUP_DUP: + case 
BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID0: + case BTRFS_BLOCK_GROUP_RAID10: + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID6: + /* non-SINGLE profiles are not supported yet */ + default: + btrfs_err(fs_info, "zoned: profile %s not supported", + btrfs_bg_type_to_raid_name(map->type)); + ret = -EINVAL; + goto out; + } + +out: + kfree(alloc_offsets); + free_extent_map(em); + + return ret; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index de5901f5ae66..491b98c97f48 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -41,6 +41,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -117,6 +118,12 @@ static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, return 0; } +static inline int btrfs_load_block_group_zone_info( + struct btrfs_block_group *cache) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Fri Jan 15 06:53:16 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021609
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota
Subject: [PATCH v12 12/41] btrfs: calculate allocation offset for conventional zones
Date: Fri, 15 Jan 2021 15:53:16 +0900

Conventional zones do not have a write pointer, so we cannot use one to determine the allocation offset when a block group contains a conventional zone. Instead, we can use the end of the last allocated extent in the block group as the allocation offset.

For a new block group, we cannot calculate the allocation offset by consulting the extent tree, because doing so would take an extent buffer lock after the chunk mutex (which is already held in btrfs_make_block_group()) and cause a deadlock. Since it is a new block group, we can simply set the allocation offset to 0 anyway.
Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 4 +- fs/btrfs/zoned.c | 99 +++++++++++++++++++++++++++++++++++++++--- fs/btrfs/zoned.h | 4 +- 3 files changed, 98 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 13edbc959bac..4607577df484 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1852,7 +1852,7 @@ static int read_one_block_group(struct btrfs_fs_info *info, goto error; } - ret = btrfs_load_block_group_zone_info(cache); + ret = btrfs_load_block_group_zone_info(cache, false); if (ret) { btrfs_err(info, "zoned: failed to load zone info of bg %llu", cache->start); @@ -2147,7 +2147,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) cache->needs_free_space = 1; - ret = btrfs_load_block_group_zone_info(cache); + ret = btrfs_load_block_group_zone_info(cache, true); if (ret) { btrfs_put_block_group(cache); return ret; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 78be99b3c090..e8e7bca81a30 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -930,7 +930,68 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) return 0; } -int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) +/* + * Calculate an allocation pointer from the extent allocation information + * for a block group consisting of conventional zones. It points to the + * end of the last allocated extent in the block group as an allocation + * offset.
+ */ +static int calculate_alloc_pointer(struct btrfs_block_group *cache, + u64 *offset_ret) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct btrfs_root *root = fs_info->extent_root; + struct btrfs_path *path; + struct btrfs_key key; + struct btrfs_key found_key; + int ret; + u64 length; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + key.objectid = cache->start + cache->length; + key.type = 0; + key.offset = 0; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + /* We should not find the exact match */ + if (!ret) + ret = -EUCLEAN; + if (ret < 0) + goto out; + + ret = btrfs_previous_extent_item(root, path, cache->start); + if (ret) { + if (ret == 1) { + ret = 0; + *offset_ret = 0; + } + goto out; + } + + btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]); + + if (found_key.type == BTRFS_EXTENT_ITEM_KEY) + length = found_key.offset; + else + length = fs_info->nodesize; + + if (!(found_key.objectid >= cache->start && + found_key.objectid + length <= cache->start + cache->length)) { + ret = -EUCLEAN; + goto out; + } + *offset_ret = found_key.objectid + length - cache->start; + ret = 0; + +out: + btrfs_free_path(path); + return ret; +} + +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) { struct btrfs_fs_info *fs_info = cache->fs_info; struct extent_map_tree *em_tree = &fs_info->mapping_tree; @@ -944,6 +1005,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) int i; unsigned int nofs_flag; u64 *alloc_offsets = NULL; + u64 last_alloc = 0; u32 num_sequential = 0, num_conventional = 0; if (!btrfs_is_zoned(fs_info)) @@ -1042,11 +1104,30 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) if (num_conventional > 0) { /* - * Since conventional zones do not have a write pointer, we - * cannot determine alloc_offset from the pointer + * Avoid calling calculate_alloc_pointer() for new BG. It + * is no use for new BG. It must be always 0. 
+ * + * Also, we have a lock chain of extent buffer lock -> + * chunk mutex. For a new BG, this function is called from + * btrfs_make_block_group(), which is already holding the + * chunk mutex. Thus, to avoid a deadlock, we cannot call + * calculate_alloc_pointer(), which takes extent buffer + * locks. */ - ret = -EINVAL; - goto out; + if (new) { + cache->alloc_offset = 0; + goto out; + } + ret = calculate_alloc_pointer(cache, &last_alloc); + if (ret || map->num_stripes == num_conventional) { + if (!ret) + cache->alloc_offset = last_alloc; + else + btrfs_err(fs_info, + "zoned: failed to determine allocation offset of bg %llu", + cache->start); + goto out; + } } switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { @@ -1068,6 +1149,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) } out: + /* An extent is allocated after the write pointer */ + if (num_conventional && last_alloc > cache->alloc_offset) { + btrfs_err(fs_info, + "zoned: got wrong write pointer in BG %llu: %llu > %llu", + logical, last_alloc, cache->alloc_offset); + ret = -EIO; + } + kfree(alloc_offsets); free_extent_map(em); diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 491b98c97f48..b53403ba0b10 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -41,7 +41,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); -int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -119,7 +119,7 @@ static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, } static inline int btrfs_load_block_group_zone_info( - struct btrfs_block_group *cache) + 
struct btrfs_block_group *cache, bool new) { return 0; } From patchwork Fri Jan 15 06:53:17 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021611 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. 
Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v12 13/41] btrfs: track unusable bytes for zones Date: Fri, 15 Jan 2021 15:53:17 +0900 In zoned btrfs, a region that was once written and then freed is not usable until we reset the underlying zone. So we need to distinguish such unusable space from usable free space. Therefore we introduce the "zone_unusable" field in the block group structure, and "bytes_zone_unusable" in the space_info structure, to track the unusable space. Pinned bytes are always reclaimed to the unusable space. But when an allocated region is returned before being used, e.g., when the block group becomes read-only between allocation time and reservation time, we can safely return the region to the block group. For that situation, this commit introduces btrfs_add_free_space_unused(). It behaves the same as btrfs_add_free_space() on regular btrfs; on zoned btrfs, it rewinds the allocation offset. 
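The accounting described above can be sketched as a small self-contained model. This is illustrative only: the `bg` struct and its fields are simplified stand-ins for the kernel's `struct btrfs_block_group`, with locking omitted. It shows how a returned region splits into free bytes (at or beyond the write pointer) and zone-unusable bytes (behind it), and how an unused reservation rewinds the allocation offset:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct btrfs_block_group (illustrative). */
struct bg {
	uint64_t start;        /* logical start of the block group */
	uint64_t length;       /* total size of the block group */
	uint64_t alloc_offset; /* write pointer, relative to start */
	uint64_t free_space;
	uint64_t zone_unusable;
};

/*
 * Space returned at or beyond the write pointer is still free; space
 * behind it was written once and is unusable until a zone reset. If the
 * region was never used (used == false), rewind the allocation offset.
 */
static void add_free_space_zoned(struct bg *bg, uint64_t bytenr,
				 uint64_t size, bool used)
{
	uint64_t offset = bytenr - bg->start;
	uint64_t to_free, to_unusable;

	if (!used)
		to_free = size;
	else if (offset >= bg->alloc_offset)
		to_free = size;
	else if (offset + size <= bg->alloc_offset)
		to_free = 0;
	else
		to_free = offset + size - bg->alloc_offset;
	to_unusable = size - to_free;

	bg->free_space += to_free;
	bg->zone_unusable += to_unusable;
	if (!used)
		bg->alloc_offset -= size; /* rewind the write pointer */
}
```

Freeing a fully written region behind the pointer contributes nothing back to free space; only a zone reset can reclaim it, which is why a fully unusable block group is handed to the unused-BG reclaim path in the patch.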
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 23 ++++++++++----- fs/btrfs/block-group.h | 1 + fs/btrfs/extent-tree.c | 10 ++++++- fs/btrfs/free-space-cache.c | 57 +++++++++++++++++++++++++++++++++++++ fs/btrfs/free-space-cache.h | 2 ++ fs/btrfs/space-info.c | 13 +++++---- fs/btrfs/space-info.h | 4 ++- fs/btrfs/sysfs.c | 2 ++ fs/btrfs/zoned.c | 24 ++++++++++++++++ fs/btrfs/zoned.h | 3 ++ 10 files changed, 125 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 4607577df484..faab7704523d 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1010,12 +1010,17 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, WARN_ON(block_group->space_info->total_bytes < block_group->length); WARN_ON(block_group->space_info->bytes_readonly - < block_group->length); + < block_group->length - block_group->zone_unusable); + WARN_ON(block_group->space_info->bytes_zone_unusable + < block_group->zone_unusable); WARN_ON(block_group->space_info->disk_total < block_group->length * factor); } block_group->space_info->total_bytes -= block_group->length; - block_group->space_info->bytes_readonly -= block_group->length; + block_group->space_info->bytes_readonly -= + (block_group->length - block_group->zone_unusable); + block_group->space_info->bytes_zone_unusable -= + block_group->zone_unusable; block_group->space_info->disk_total -= block_group->length * factor; spin_unlock(&block_group->space_info->lock); @@ -1159,7 +1164,7 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force) } num_bytes = cache->length - cache->reserved - cache->pinned - - cache->bytes_super - cache->used; + cache->bytes_super - cache->zone_unusable - cache->used; /* * Data never overcommits, even in mixed mode, so do just the straight @@ -1889,6 +1894,8 @@ static int read_one_block_group(struct btrfs_fs_info *info, btrfs_free_excluded_extents(cache); } + btrfs_calc_zone_unusable(cache); + ret 
= btrfs_add_block_group_cache(info, cache); if (ret) { btrfs_remove_free_space_cache(cache); @@ -1896,7 +1903,8 @@ static int read_one_block_group(struct btrfs_fs_info *info, } trace_btrfs_add_block_group(info, cache, 0); btrfs_update_space_info(info, cache->flags, cache->length, - cache->used, cache->bytes_super, &space_info); + cache->used, cache->bytes_super, + cache->zone_unusable, &space_info); cache->space_info = space_info; @@ -1952,7 +1960,7 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info) break; } btrfs_update_space_info(fs_info, bg->flags, em->len, em->len, - 0, &space_info); + 0, 0, &space_info); bg->space_info = space_info; link_block_group(bg); @@ -2194,7 +2202,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, */ trace_btrfs_add_block_group(fs_info, cache, 1); btrfs_update_space_info(fs_info, cache->flags, size, bytes_used, - cache->bytes_super, &cache->space_info); + cache->bytes_super, 0, &cache->space_info); btrfs_update_global_block_rsv(fs_info); link_block_group(cache); @@ -2302,7 +2310,8 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group *cache) spin_lock(&cache->lock); if (!--cache->ro) { num_bytes = cache->length - cache->reserved - - cache->pinned - cache->bytes_super - cache->used; + cache->pinned - cache->bytes_super - + cache->zone_unusable - cache->used; sinfo->bytes_readonly -= num_bytes; list_del_init(&cache->ro_list); } diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 9d026ab1768d..0f3c62c561bc 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -189,6 +189,7 @@ struct btrfs_block_group { * allocation. This is used only with ZONED mode enabled. 
*/ u64 alloc_offset; + u64 zone_unusable; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index d79b8369e6aa..043a2fe79270 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -34,6 +34,7 @@ #include "block-group.h" #include "discard.h" #include "rcu-string.h" +#include "zoned.h" #undef SCRAMBLE_DELAYED_REFS @@ -2725,6 +2726,9 @@ fetch_cluster_info(struct btrfs_fs_info *fs_info, { struct btrfs_free_cluster *ret = NULL; + if (btrfs_is_zoned(fs_info)) + return NULL; + *empty_cluster = 0; if (btrfs_mixed_space_info(space_info)) return ret; @@ -2808,7 +2812,11 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info, space_info->max_extent_size = 0; percpu_counter_add_batch(&space_info->total_bytes_pinned, -len, BTRFS_TOTAL_BYTES_PINNED_BATCH); - if (cache->ro) { + if (btrfs_is_zoned(fs_info)) { + /* Need reset before reusing in a zoned block group */ + space_info->bytes_zone_unusable += len; + readonly = true; + } else if (cache->ro) { space_info->bytes_readonly += len; readonly = true; } diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index fd6ddd6b8165..5a5c2c527dd5 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2465,6 +2465,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, int ret = 0; u64 filter_bytes = bytes; + ASSERT(!btrfs_is_zoned(fs_info)); + info = kmem_cache_zalloc(btrfs_free_space_cachep, GFP_NOFS); if (!info) return -ENOMEM; @@ -2522,11 +2524,49 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, return ret; } +static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, + u64 bytenr, u64 size, bool used) +{ + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 offset = bytenr - block_group->start; + u64 to_free, to_unusable; + + spin_lock(&ctl->tree_lock); + if (!used) + to_free = size; + else if (offset >= 
block_group->alloc_offset) + to_free = size; + else if (offset + size <= block_group->alloc_offset) + to_free = 0; + else + to_free = offset + size - block_group->alloc_offset; + to_unusable = size - to_free; + + ctl->free_space += to_free; + block_group->zone_unusable += to_unusable; + spin_unlock(&ctl->tree_lock); + if (!used) { + spin_lock(&block_group->lock); + block_group->alloc_offset -= size; + spin_unlock(&block_group->lock); + } + + /* All the region is now unusable. Mark it as unused and reclaim */ + if (block_group->zone_unusable == block_group->length) + btrfs_mark_bg_unused(block_group); + + return 0; +} + int btrfs_add_free_space(struct btrfs_block_group *block_group, u64 bytenr, u64 size) { enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + true); + if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC)) trim_state = BTRFS_TRIM_STATE_TRIMMED; @@ -2535,6 +2575,16 @@ int btrfs_add_free_space(struct btrfs_block_group *block_group, bytenr, size, trim_state); } +int btrfs_add_free_space_unused(struct btrfs_block_group *block_group, + u64 bytenr, u64 size) +{ + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + false); + + return btrfs_add_free_space(block_group, bytenr, size); +} + /* * This is a subtle distinction because when adding free space back in general, * we want it to be added as untrimmed for async. 
But in the case where we add @@ -2545,6 +2595,10 @@ int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group, { enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + true); + if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC) || btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) trim_state = BTRFS_TRIM_STATE_TRIMMED; @@ -2562,6 +2616,9 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group, int ret; bool re_search = false; + if (btrfs_is_zoned(block_group->fs_info)) + return 0; + spin_lock(&ctl->tree_lock); again: diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index ecb09a02d544..1f23088d43f9 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -107,6 +107,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, enum btrfs_trim_state trim_state); int btrfs_add_free_space(struct btrfs_block_group *block_group, u64 bytenr, u64 size); +int btrfs_add_free_space_unused(struct btrfs_block_group *block_group, + u64 bytenr, u64 size); int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group, u64 bytenr, u64 size); int btrfs_remove_free_space(struct btrfs_block_group *block_group, diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index 67e55c5479b8..025349c5c439 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -163,6 +163,7 @@ u64 __pure btrfs_space_info_used(struct btrfs_space_info *s_info, ASSERT(s_info); return s_info->bytes_used + s_info->bytes_reserved + s_info->bytes_pinned + s_info->bytes_readonly + + s_info->bytes_zone_unusable + (may_use_included ? 
s_info->bytes_may_use : 0); } @@ -257,7 +258,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info) void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info) { struct btrfs_space_info *found; @@ -273,6 +274,7 @@ void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, found->bytes_used += bytes_used; found->disk_used += bytes_used * factor; found->bytes_readonly += bytes_readonly; + found->bytes_zone_unusable += bytes_zone_unusable; if (total_bytes > 0) found->full = 0; btrfs_try_granting_tickets(info, found); @@ -422,10 +424,10 @@ static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info, info->total_bytes - btrfs_space_info_used(info, true), info->full ? "" : "not "); btrfs_info(fs_info, - "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu", + "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu zone_unusable=%llu", info->total_bytes, info->bytes_used, info->bytes_pinned, info->bytes_reserved, info->bytes_may_use, - info->bytes_readonly); + info->bytes_readonly, info->bytes_zone_unusable); DUMP_BLOCK_RSV(fs_info, global_block_rsv); DUMP_BLOCK_RSV(fs_info, trans_block_rsv); @@ -454,9 +456,10 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info, list_for_each_entry(cache, &info->block_groups[index], list) { spin_lock(&cache->lock); btrfs_info(fs_info, - "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %s", + "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %llu zone_unusable %s", cache->start, cache->length, cache->used, cache->pinned, - cache->reserved, cache->ro ? "[readonly]" : ""); + cache->reserved, cache->zone_unusable, + cache->ro ? 
"[readonly]" : ""); spin_unlock(&cache->lock); btrfs_dump_free_space(cache, bytes); } diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h index 5646393b928c..ee003ffba956 100644 --- a/fs/btrfs/space-info.h +++ b/fs/btrfs/space-info.h @@ -17,6 +17,8 @@ struct btrfs_space_info { u64 bytes_may_use; /* number of bytes that may be used for delalloc/allocations */ u64 bytes_readonly; /* total bytes that are read only */ + u64 bytes_zone_unusable; /* total bytes that are unusable until + resetting the device zone */ u64 max_extent_size; /* This will hold the maximum extent size of the space info if we had an ENOSPC in the @@ -119,7 +121,7 @@ DECLARE_SPACE_INFO_UPDATE(bytes_pinned, "pinned"); int btrfs_init_space_info(struct btrfs_fs_info *fs_info); void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info); struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info, u64 flags); diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 4522a1c4cd08..cf7e766f7c58 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -666,6 +666,7 @@ SPACE_INFO_ATTR(bytes_pinned); SPACE_INFO_ATTR(bytes_reserved); SPACE_INFO_ATTR(bytes_may_use); SPACE_INFO_ATTR(bytes_readonly); +SPACE_INFO_ATTR(bytes_zone_unusable); SPACE_INFO_ATTR(disk_used); SPACE_INFO_ATTR(disk_total); BTRFS_ATTR(space_info, total_bytes_pinned, @@ -679,6 +680,7 @@ static struct attribute *space_info_attrs[] = { BTRFS_ATTR_PTR(space_info, bytes_reserved), BTRFS_ATTR_PTR(space_info, bytes_may_use), BTRFS_ATTR_PTR(space_info, bytes_readonly), + BTRFS_ATTR_PTR(space_info, bytes_zone_unusable), BTRFS_ATTR_PTR(space_info, disk_used), BTRFS_ATTR_PTR(space_info, disk_total), BTRFS_ATTR_PTR(space_info, total_bytes_pinned), diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index e8e7bca81a30..3f873f2c28e2 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ 
-1162,3 +1162,27 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) return ret; } + +void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) +{ + u64 unusable, free; + + if (!btrfs_is_zoned(cache->fs_info)) + return; + + WARN_ON(cache->bytes_super != 0); + unusable = cache->alloc_offset - cache->used; + free = cache->length - cache->alloc_offset; + + /* We only need ->free_space in ALLOC_SEQ BGs */ + cache->last_byte_to_unpin = (u64)-1; + cache->cached = BTRFS_CACHE_FINISHED; + cache->free_space_ctl->free_space = free; + cache->zone_unusable = unusable; + + /* + * Should not have any excluded extents. Just + * in case, though. + */ + btrfs_free_excluded_extents(cache); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index b53403ba0b10..0cc0b27e9437 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -42,6 +42,7 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new); +void btrfs_calc_zone_unusable(struct btrfs_block_group *cache); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -124,6 +125,8 @@ static inline int btrfs_load_block_group_zone_info( return 0; } +static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Fri Jan 15 06:53:18 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021613 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v12 14/41] btrfs: do sequential extent allocation in ZONED mode Date: Fri, 15 Jan 2021 15:53:18 +0900 Message-Id: <3814ab2f4f5d0f975bc16f61cecc1d0e10a1206b.1610693037.git.naohiro.aota@wdc.com> This commit implements a sequential extent allocator for the ZONED mode. This allocator just needs to check if there is enough space in the block group. Therefore the allocator never manages bitmaps or clusters. Also add ASSERTs to the corresponding functions. 
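The sequential allocation just described can be sketched in a similarly simplified, self-contained form. The `zoned_bg` struct below is a hypothetical stand-in for the kernel structures (locking and the read-only check are omitted): allocation is a plain bump of the write pointer when enough space remains, with no bitmap or cluster management.

```c
#include <stdint.h>

/* Simplified stand-in for the zoned block group state (illustrative). */
struct zoned_bg {
	uint64_t start;        /* logical start of the block group */
	uint64_t length;       /* total size of the block group */
	uint64_t alloc_offset; /* write pointer, relative to start */
	uint64_t free_space;
};

/*
 * Sequential-only allocation: if enough space remains past the write
 * pointer, hand out the region at the pointer and advance it. Returns 0
 * on success with the logical address stored in *found, or 1 when the
 * block group cannot satisfy the request (caller tries the next one).
 */
static int alloc_zoned(struct zoned_bg *bg, uint64_t num_bytes,
		       uint64_t *found)
{
	uint64_t avail = bg->length - bg->alloc_offset;

	if (avail < num_bytes)
		return 1; /* not enough room behind this zone's end */

	*found = bg->start + bg->alloc_offset;
	bg->alloc_offset += num_bytes;
	bg->free_space -= num_bytes;
	return 0;
}
```

Because allocations only ever move the pointer forward, consecutive requests come back at strictly increasing addresses, which is exactly what a sequential-write-required zone demands.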
Actually, with zone append writing, it is unnecessary to track the allocation offset. It only needs to check space availability. But, by tracking the offset and returning the offset as an allocated region, we can skip modification of ordered extents and checksum information when there is no IO reordering. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 4 ++ fs/btrfs/extent-tree.c | 85 ++++++++++++++++++++++++++++++++++--- fs/btrfs/free-space-cache.c | 6 +++ 3 files changed, 89 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index faab7704523d..21ff5ff0c735 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -726,6 +726,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only struct btrfs_caching_control *caching_ctl = NULL; int ret = 0; + /* Allocator for ZONED btrfs does not use the cache at all */ + if (btrfs_is_zoned(fs_info)) + return 0; + caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS); if (!caching_ctl) return -ENOMEM; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 043a2fe79270..88e103451aca 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3522,6 +3522,7 @@ btrfs_release_block_group(struct btrfs_block_group *cache, enum btrfs_extent_allocation_policy { BTRFS_EXTENT_ALLOC_CLUSTERED, + BTRFS_EXTENT_ALLOC_ZONED, }; /* @@ -3774,6 +3775,58 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group, return find_free_extent_unclustered(block_group, ffe_ctl); } +/* + * Simple allocator for sequential only block group. It only allows + * sequential allocation. No need to play with trees. This function + * also reserves the bytes as in btrfs_add_reserved_bytes. 
+ */ +static int do_allocation_zoned(struct btrfs_block_group *block_group, + struct find_free_extent_ctl *ffe_ctl, + struct btrfs_block_group **bg_ret) +{ + struct btrfs_space_info *space_info = block_group->space_info; + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 start = block_group->start; + u64 num_bytes = ffe_ctl->num_bytes; + u64 avail; + int ret = 0; + + ASSERT(btrfs_is_zoned(block_group->fs_info)); + + spin_lock(&space_info->lock); + spin_lock(&block_group->lock); + + if (block_group->ro) { + ret = 1; + goto out; + } + + avail = block_group->length - block_group->alloc_offset; + if (avail < num_bytes) { + ffe_ctl->max_extent_size = avail; + ret = 1; + goto out; + } + + ffe_ctl->found_offset = start + block_group->alloc_offset; + block_group->alloc_offset += num_bytes; + spin_lock(&ctl->tree_lock); + ctl->free_space -= num_bytes; + spin_unlock(&ctl->tree_lock); + + /* + * We do not check if found_offset is aligned to stripesize. The + * address is anyway rewritten when using zone append writing. 
+ */ + + ffe_ctl->search_start = ffe_ctl->found_offset; + +out: + spin_unlock(&block_group->lock); + spin_unlock(&space_info->lock); + return ret; +} + static int do_allocation(struct btrfs_block_group *block_group, struct find_free_extent_ctl *ffe_ctl, struct btrfs_block_group **bg_ret) @@ -3781,6 +3834,8 @@ static int do_allocation(struct btrfs_block_group *block_group, switch (ffe_ctl->policy) { case BTRFS_EXTENT_ALLOC_CLUSTERED: return do_allocation_clustered(block_group, ffe_ctl, bg_ret); + case BTRFS_EXTENT_ALLOC_ZONED: + return do_allocation_zoned(block_group, ffe_ctl, bg_ret); default: BUG(); } @@ -3795,6 +3850,9 @@ static void release_block_group(struct btrfs_block_group *block_group, ffe_ctl->retry_clustered = false; ffe_ctl->retry_unclustered = false; break; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Nothing to do */ + break; default: BUG(); } @@ -3823,6 +3881,9 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl, case BTRFS_EXTENT_ALLOC_CLUSTERED: found_extent_clustered(ffe_ctl, ins); break; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Nothing to do */ + break; default: BUG(); } @@ -3838,6 +3899,9 @@ static int chunk_allocation_failed(struct find_free_extent_ctl *ffe_ctl) */ ffe_ctl->loop = LOOP_NO_EMPTY_SIZE; return 0; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Give up here */ + return -ENOSPC; default: BUG(); } @@ -4006,6 +4070,9 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info, case BTRFS_EXTENT_ALLOC_CLUSTERED: return prepare_allocation_clustered(fs_info, ffe_ctl, space_info, ins); + case BTRFS_EXTENT_ALLOC_ZONED: + /* nothing to do */ + return 0; default: BUG(); } @@ -4069,6 +4136,9 @@ static noinline int find_free_extent(struct btrfs_root *root, ffe_ctl.last_ptr = NULL; ffe_ctl.use_cluster = true; + if (btrfs_is_zoned(fs_info)) + ffe_ctl.policy = BTRFS_EXTENT_ALLOC_ZONED; + ins->type = BTRFS_EXTENT_ITEM_KEY; ins->objectid = 0; ins->offset = 0; @@ -4211,20 +4281,23 @@ static noinline int find_free_extent(struct btrfs_root *root, /* move 
on to the next group */ if (ffe_ctl.search_start + num_bytes > block_group->start + block_group->length) { - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - num_bytes); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + num_bytes); goto loop; } if (ffe_ctl.found_offset < ffe_ctl.search_start) - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - ffe_ctl.search_start - ffe_ctl.found_offset); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + ffe_ctl.search_start - ffe_ctl.found_offset); ret = btrfs_add_reserved_bytes(block_group, ram_bytes, num_bytes, delalloc); if (ret == -EAGAIN) { - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - num_bytes); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + num_bytes); goto loop; } btrfs_inc_block_group_reservations(block_group); diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 5a5c2c527dd5..757c740de179 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2906,6 +2906,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group, u64 align_gap_len = 0; enum btrfs_trim_state align_gap_trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + spin_lock(&ctl->tree_lock); entry = find_free_space(ctl, &offset, &bytes_search, block_group->full_stripe_len, max_extent_size); @@ -3037,6 +3039,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group *block_group, struct rb_node *node; u64 ret = 0; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + spin_lock(&cluster->lock); if (bytes > cluster->max_size) goto out; @@ -3813,6 +3817,8 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group, int ret; u64 rem = 0; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + *trimmed = 0; spin_lock(&block_group->lock); From patchwork Fri Jan 15 06:53:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021615
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 15/41] btrfs: redirty released extent buffers in ZONED mode
Date: Fri, 15 Jan 2021 15:53:19 +0900
Message-Id: <9ad42e6d464b143817a23717231035daaa48cefd.1610693037.git.naohiro.aota@wdc.com>

Tree-manipulating operations such as merging nodes often release once-allocated tree nodes. Btrfs cleans such nodes so that their pages are not uselessly written out. On ZONED volumes, however, this optimization blocks subsequent IOs: cancelling the write-out of the freed blocks breaks the sequential write order expected by the device.

This patch introduces a per-transaction list of clean, unwritten extent buffers that have been released. Btrfs redirties each such buffer so that btree_write_cache_pages() can send proper bios to the device. It also zeroes the entire content of the extent buffer so as not to confuse raw block scanners such as btrfsck. Since the cleared content would make csum_dirty_buffer() complain about a bytenr mismatch, the check and checksumming are skipped via the newly introduced buffer flag EXTENT_BUFFER_NO_CHECK.
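The redirty mechanism described above can be sketched in plain user-space C. Everything here (struct eb, EB_NO_CHECK, releasing_ebs, the function names) is a hypothetical stand-in for the actual btrfs structures, illustrating only the idea: a released-but-unwritten buffer is zeroed, flagged so the checksum path skips it, and queued until the transaction commits.

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

#define EB_NO_CHECK 0x1UL /* stand-in for EXTENT_BUFFER_NO_CHECK */

struct eb {
	unsigned long bflags;  /* stand-in for eb->bflags */
	char data[64];         /* stand-in for the buffer pages */
	struct eb *next;       /* stand-in for eb->release_list linkage */
};

static struct eb *releasing_ebs; /* stand-in for trans->releasing_ebs */

/* A released but unwritten buffer: zero its content (so raw block
 * scanners see nothing stale), mark it so checksum verification is
 * skipped, and queue it so writeback still emits it sequentially. */
static void redirty_list_add(struct eb *eb)
{
	memset(eb->data, 0, sizeof(eb->data));
	eb->bflags |= EB_NO_CHECK;
	eb->next = releasing_ebs;
	releasing_ebs = eb;
}

/* Once the transaction's tree blocks are on disk, unlink every queued
 * buffer (the kernel code additionally drops a reference here). */
static void free_redirty_list(void)
{
	while (releasing_ebs)
		releasing_ebs = releasing_ebs->next;
}
```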
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 8 ++++++++ fs/btrfs/extent-tree.c | 12 +++++++++++- fs/btrfs/extent_io.c | 4 ++++ fs/btrfs/extent_io.h | 2 ++ fs/btrfs/transaction.c | 10 ++++++++++ fs/btrfs/transaction.h | 3 +++ fs/btrfs/tree-log.c | 6 ++++++ fs/btrfs/zoned.c | 37 +++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 7 +++++++ 9 files changed, 88 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 192e366f8afc..e9b6c6a21681 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -459,6 +459,12 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec return 0; found_start = btrfs_header_bytenr(eb); + + if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) { + WARN_ON(found_start != 0); + return 0; + } + /* * Please do not consolidate these warnings into a single if. * It is useful to know what went wrong. @@ -4697,6 +4703,8 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans, EXTENT_DIRTY); btrfs_destroy_pinned_extent(fs_info, &cur_trans->pinned_extents); + btrfs_free_redirty_list(cur_trans); + cur_trans->state =TRANS_STATE_COMPLETED; wake_up(&cur_trans->commit_wait); } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 88e103451aca..c3e955bbd2ab 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3374,8 +3374,10 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) { ret = check_ref_cleanup(trans, buf->start); - if (!ret) + if (!ret) { + btrfs_redirty_list_add(trans->transaction, buf); goto out; + } } pin = 0; @@ -3387,6 +3389,13 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, goto out; } + if (btrfs_is_zoned(fs_info)) { + btrfs_redirty_list_add(trans->transaction, buf); + pin_down_extent(trans, cache, buf->start, buf->len, 1); + btrfs_put_block_group(cache); + goto out; + } + WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, 
&buf->bflags)); btrfs_add_free_space(cache, buf->start, buf->len); @@ -4726,6 +4735,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root, __btrfs_tree_lock(buf, nest); btrfs_clean_tree_block(buf); clear_bit(EXTENT_BUFFER_STALE, &buf->bflags); + clear_bit(EXTENT_BUFFER_NO_CHECK, &buf->bflags); set_extent_buffer_uptodate(buf); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 6e3b72e63e42..129d571a5c1a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -24,6 +24,7 @@ #include "rcu-string.h" #include "backref.h" #include "disk-io.h" +#include "zoned.h" static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -5048,6 +5049,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start, btrfs_leak_debug_add(&fs_info->eb_leak_lock, &eb->leak_list, &fs_info->allocated_ebs); + INIT_LIST_HEAD(&eb->release_list); spin_lock_init(&eb->refs_lock); atomic_set(&eb->refs, 1); @@ -5825,6 +5827,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv, char *src = (char *)srcv; unsigned long i = get_eb_page_index(start); + WARN_ON(test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)); + if (check_eb_range(eb, start, len)) return; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 19221095c635..5a81268c4d8c 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -31,6 +31,7 @@ enum { EXTENT_BUFFER_IN_TREE, /* write IO error */ EXTENT_BUFFER_WRITE_ERR, + EXTENT_BUFFER_NO_CHECK, }; /* these are flags for __process_pages_contig */ @@ -93,6 +94,7 @@ struct extent_buffer { struct rw_semaphore lock; struct page *pages[INLINE_EXTENT_BUFFER_PAGES]; + struct list_head release_list; #ifdef CONFIG_BTRFS_DEBUG struct list_head leak_list; #endif diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 4ffe66164fa3..ce480fe78531 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -21,6 +21,7 @@ #include "qgroup.h" #include "block-group.h" 
#include "space-info.h" +#include "zoned.h" #define BTRFS_ROOT_TRANS_TAG 0 @@ -375,6 +376,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, spin_lock_init(&cur_trans->dirty_bgs_lock); INIT_LIST_HEAD(&cur_trans->deleted_bgs); spin_lock_init(&cur_trans->dropped_roots_lock); + INIT_LIST_HEAD(&cur_trans->releasing_ebs); + spin_lock_init(&cur_trans->releasing_ebs_lock); list_add_tail(&cur_trans->list, &fs_info->trans_list); extent_io_tree_init(fs_info, &cur_trans->dirty_pages, IO_TREE_TRANS_DIRTY_PAGES, fs_info->btree_inode); @@ -2344,6 +2347,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) goto scrub_continue; } + /* + * At this point, we should have written all the tree blocks + * allocated in this transaction. So it's now safe to free the + * redirtyied extent buffers. + */ + btrfs_free_redirty_list(cur_trans); + ret = write_all_supers(fs_info, 0); /* * the super is written, we can safely allow the tree-loggers diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 31ca81bad822..660b4e1f1181 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -92,6 +92,9 @@ struct btrfs_transaction { */ atomic_t pending_ordered; wait_queue_head_t pending_wait; + + spinlock_t releasing_ebs_lock; + struct list_head releasing_ebs; }; #define __TRANS_FREEZABLE (1U << 0) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 8ee0700a980f..930e752686b4 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -19,6 +19,7 @@ #include "qgroup.h" #include "block-group.h" #include "space-info.h" +#include "zoned.h" /* magic values for the inode_only field in btrfs_log_inode: * @@ -2752,6 +2753,8 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans, free_extent_buffer(next); return ret; } + btrfs_redirty_list_add( + trans->transaction, next); } else { if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &next->bflags)) clear_extent_buffer_dirty(next); @@ -3296,6 +3299,9 @@ static void 
free_log_tree(struct btrfs_trans_handle *trans, clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1, EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT); extent_io_tree_release(&log->log_csum_range); + + if (trans && log->node) + btrfs_redirty_list_add(trans->transaction, log->node); btrfs_put_root(log); } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 3f873f2c28e2..2b02a38b11f9 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -10,6 +10,7 @@ #include "rcu-string.h" #include "disk-io.h" #include "block-group.h" +#include "transaction.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -1186,3 +1187,39 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) */ btrfs_free_excluded_extents(cache); } + +void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct extent_buffer *eb) +{ + struct btrfs_fs_info *fs_info = eb->fs_info; + + if (!btrfs_is_zoned(fs_info) || + btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN) || + !list_empty(&eb->release_list)) + return; + + set_extent_buffer_dirty(eb); + set_extent_bits_nowait(&trans->dirty_pages, eb->start, + eb->start + eb->len - 1, EXTENT_DIRTY); + memzero_extent_buffer(eb, 0, eb->len); + set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags); + + spin_lock(&trans->releasing_ebs_lock); + list_add_tail(&eb->release_list, &trans->releasing_ebs); + spin_unlock(&trans->releasing_ebs_lock); + atomic_inc(&eb->refs); +} + +void btrfs_free_redirty_list(struct btrfs_transaction *trans) +{ + spin_lock(&trans->releasing_ebs_lock); + while (!list_empty(&trans->releasing_ebs)) { + struct extent_buffer *eb; + + eb = list_first_entry(&trans->releasing_ebs, + struct extent_buffer, release_list); + list_del_init(&eb->release_list); + free_extent_buffer(eb); + } + spin_unlock(&trans->releasing_ebs_lock); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 0cc0b27e9437..b2ce16de0c22 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -43,6 +43,9 @@ int 
btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new); void btrfs_calc_zone_unusable(struct btrfs_block_group *cache); +void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct extent_buffer *eb); +void btrfs_free_redirty_list(struct btrfs_transaction *trans); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -127,6 +130,10 @@ static inline int btrfs_load_block_group_zone_info( static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { } +static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct extent_buffer *eb) { } +static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Fri Jan 15 06:53:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021617 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA7FFC433E0 for ; Fri, 15 Jan 2021 06:58:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9C95F22473 for ; Fri, 15 Jan 2021 06:58:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 16/41] btrfs: advance allocation pointer after tree log node
Date: Fri, 15 Jan 2021 15:53:20 +0900
Message-Id: <00b835bd1973fe3f3b9a35eb6fba90abab939186.1610693037.git.naohiro.aota@wdc.com>

Since the allocation information of a tree log node is not recorded in the extent tree, calculate_alloc_pointer() cannot detect the node, and the allocation pointer can end up pointing into the middle of a tree node. Replaying the log calls btrfs_remove_free_space() for each node in the log tree, so advance the pointer past the node there.
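The pointer-advance rule described above amounts to a one-line monotonic bump, which the following sketch illustrates. The struct and function names are hypothetical stand-ins, not the btrfs definitions; offsets are treated as simple byte positions within the block group.

```c
#include <assert.h>

/* Hypothetical model of a zoned block group's allocation pointer. */
struct bg {
	unsigned long long alloc_offset; /* current write-pointer position */
};

/*
 * During log replay an extent at [offset, offset + bytes) belonging to a
 * tree-log node is pinned. Such nodes are absent from the extent tree,
 * so the precomputed pointer may lag behind them; bump it forward so
 * later allocations cannot overwrite the node. Never move it backward.
 */
static void advance_alloc_pointer(struct bg *bg,
				  unsigned long long offset,
				  unsigned long long bytes)
{
	if (bg->alloc_offset < offset + bytes)
		bg->alloc_offset = offset + bytes;
}
```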
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/free-space-cache.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 757c740de179..ed39388209b8 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2616,8 +2616,22 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group, int ret; bool re_search = false; - if (btrfs_is_zoned(block_group->fs_info)) + if (btrfs_is_zoned(block_group->fs_info)) { + /* + * This can happen with conventional zones when replaying + * log. Since the allocation info of tree-log nodes are + * not recorded to the extent-tree, calculate_alloc_pointer() + * failed to advance the allocation pointer after last + * allocated tree log node blocks. + * + * This function is called from + * btrfs_pin_extent_for_log_replay() when replaying the + * log. Advance the pointer not to overwrite the tree-log nodes. + */ + if (block_group->alloc_offset < offset + bytes) + block_group->alloc_offset = offset + bytes; return 0; + } spin_lock(&ctl->tree_lock); From patchwork Fri Jan 15 06:53:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021619 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 478D3C433E6 for ; Fri, 15 Jan 2021 06:58:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 17/41] btrfs: enable to mount ZONED incompat flag
Date: Fri, 15 Jan 2021 15:53:21 +0900
Message-Id: <6a0e5aaae5714f7693fa5ff58ee4d24a84a60718.1610693037.git.naohiro.aota@wdc.com>

This final patch adds the ZONED incompat flag to BTRFS_FEATURE_INCOMPAT_SUPP and enables btrfs to mount a file system with the ZONED flag set.
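The gating logic behind an incompat support mask can be sketched as follows. The flag names and bit values here are illustrative stand-ins, not the btrfs on-disk constants: a file system advertising any incompat bit outside the supported mask must be refused at mount time, which is exactly why ZONED had to be added to the mask before such file systems could mount.

```c
#include <assert.h>

/* Hypothetical incompat feature bits (stand-in values). */
#define FEATURE_RAID1C34 (1ULL << 0)
#define FEATURE_ZONED    (1ULL << 1)

/* Mask of incompat features this build claims to understand. */
#define FEATURE_INCOMPAT_SUPP (FEATURE_RAID1C34 | FEATURE_ZONED)

/* Mount is allowed only if every incompat bit set on disk is
 * also present in the supported mask. */
static int can_mount(unsigned long long fs_incompat_flags)
{
	return (fs_incompat_flags & ~FEATURE_INCOMPAT_SUPP) == 0;
}
```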
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/ctree.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index e80ce910b61d..cc8b8bab241d 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -299,7 +299,8 @@ struct btrfs_super_block { BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \ BTRFS_FEATURE_INCOMPAT_NO_HOLES | \ BTRFS_FEATURE_INCOMPAT_METADATA_UUID | \ - BTRFS_FEATURE_INCOMPAT_RAID1C34) + BTRFS_FEATURE_INCOMPAT_RAID1C34 | \ + BTRFS_FEATURE_INCOMPAT_ZONED) #define BTRFS_FEATURE_INCOMPAT_SAFE_SET \ (BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF) From patchwork Fri Jan 15 06:53:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021621 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4BDACC433E0 for ; Fri, 15 Jan 2021 06:58:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 20D3622473 for ; Fri, 15 Jan 2021 06:58:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729938AbhAOG6m (ORCPT ); Fri, 15 Jan 2021 01:58:42 -0500 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41681 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728452AbhAOG6k (ORCPT ); Fri, 15 Jan 2021 01:58:40 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1610693920; 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 18/41] btrfs: reset zones of unused block groups
Date: Fri, 15 Jan 2021 15:53:22 +0900

For a ZONED volume, a block group maps to a zone of the device. For a deleted unused block group, the zone of the block group can be reset to rewind the zone's write pointer to the start of the zone.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
fs/btrfs/block-group.c | 8 ++++++-- fs/btrfs/extent-tree.c | 17 ++++++++++++----- fs/btrfs/zoned.h | 16 ++++++++++++++++ 3 files changed, 34 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 21ff5ff0c735..f7c85cc81d1e 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1400,8 +1400,12 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) if (!async_trim_enabled && btrfs_test_opt(fs_info, DISCARD_ASYNC)) goto flip_async; - /* DISCARD can flip during remount */ - trimming = btrfs_test_opt(fs_info, DISCARD_SYNC); + /* + * DISCARD can flip during remount. In ZONED mode, we need + * to reset sequential required zones. + */ + trimming = btrfs_test_opt(fs_info, DISCARD_SYNC) || + btrfs_is_zoned(fs_info); /* Implicit trim during transaction commit.
*/ if (trimming) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index c3e955bbd2ab..ac24a79ce32a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1333,6 +1333,9 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, stripe = bbio->stripes; for (i = 0; i < bbio->num_stripes; i++, stripe++) { + struct btrfs_device *dev = stripe->dev; + u64 physical = stripe->physical; + u64 length = stripe->length; u64 bytes; struct request_queue *req_q; @@ -1340,14 +1343,18 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, ASSERT(btrfs_test_opt(fs_info, DEGRADED)); continue; } + req_q = bdev_get_queue(stripe->dev->bdev); - if (!blk_queue_discard(req_q)) + /* Zone reset in ZONED mode */ + if (btrfs_can_zone_reset(dev, physical, length)) + ret = btrfs_reset_device_zone(dev, physical, + length, &bytes); + else if (blk_queue_discard(req_q)) + ret = btrfs_issue_discard(dev->bdev, physical, + length, &bytes); + else continue; - ret = btrfs_issue_discard(stripe->dev->bdev, - stripe->physical, - stripe->length, - &bytes); if (!ret) { discarded_bytes += bytes; } else if (ret != -EOPNOTSUPP) { diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index b2ce16de0c22..331951978487 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -210,4 +210,20 @@ static inline bool btrfs_check_super_location(struct btrfs_device *device, u64 p return device->zone_info == NULL || !btrfs_dev_is_sequential(device, pos); } +static inline bool btrfs_can_zone_reset(struct btrfs_device *device, + u64 physical, u64 length) +{ + u64 zone_size; + + if (!btrfs_dev_is_sequential(device, physical)) + return false; + + zone_size = device->zone_info->zone_size; + if (!IS_ALIGNED(physical, zone_size) || + !IS_ALIGNED(length, zone_size)) + return false; + + return true; +} + #endif From patchwork Fri Jan 15 06:53:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota 
X-Patchwork-Id: 12021623 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4F2AC433E6 for ; Fri, 15 Jan 2021 06:59:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9C7B1233EE for ; Fri, 15 Jan 2021 06:59:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730067AbhAOG6u (ORCPT ); Fri, 15 Jan 2021 01:58:50 -0500 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41752 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728452AbhAOG6t (ORCPT ); Fri, 15 Jan 2021 01:58:49 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1610693928; x=1642229928; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=XGPYvnrKmMcJTNq9V3Udz9HKkSqIOb888Y/DKSsZ2ow=; b=qyLu9Go3bGUH3Drf8jhaCzzyinaJBOpTVe7b73Yfae3tG+m7dik+JgCL ejj6wA9+o5bzkzn0GzlPqzP+Fn+N1zVbikEUESZ/ETMcaSFaUJbyliCjV 9M5lKExWEFjipudTD5w17083iBvXhK5Yzd1dA2tcE90HCXnJPg4LJYWBx IFOJN+tCVGeh2Yvz9uLjJ4zZbY8sMBzoG1gcX04OompCTypY6Ed/H5xgx QliLprMudbV6bMB9p7KGbYDdAmNoRoG8+d7JiNlhWWQB4sec/7m706ubf mAmNItJt5lmfrhbqyFE2lsq739ZDSka3bCyqhV9OmKoAe7xMqYPZFTVIU Q==; IronPort-SDR: KJLsvsYXFglnn3s5aNOk2Q+9CwLrqxZp69PkOVxLcWogBdOloYqbS9rmW9aQejn5YxFRy1cDFQ rhp+grnadkqCMCQz+1Nj1XpeTbMiLDBqQDXGG0vKi6QS6emo/QaXTGOtMlObYe7PcmS+M5IByM 6db+UZFmip2PRYzPudXRuNtHo+6yMiQMMCW1rc21nEOfKK66XzlMmOuJ80cjd7YDFlFTSeIAyq 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 19/41] btrfs: extract page adding function
Date: Fri, 15 Jan 2021 15:53:23 +0900
Message-Id: <59940825e958cf3e4cf99813febae57beb86ddaf.1610693037.git.naohiro.aota@wdc.com>

This commit extracts the code that adds a page to a bio from submit_extent_page() into a new helper.
The page is added only when the bio_flags are the same, the page is contiguous, and it fits in the same stripe as the pages already in the bio. The condition checks are reordered to allow an early return, avoiding a possibly expensive call to btrfs_bio_fits_in_stripe().

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent_io.c | 57 ++++++++++++++++++++++++++++++++------------
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 129d571a5c1a..96f43b9121d6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3061,6 +3061,45 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
 	return bio;
 }
 
+/**
+ * btrfs_bio_add_page - attempt to add a page to bio
+ * @bio: destination bio
+ * @page: page to add to the bio
+ * @logical: offset of the new bio or to check whether we are adding
+ *           a contiguous page to the previous one
+ * @pg_offset: starting offset in the page
+ * @size: portion of page that we want to write
+ * @prev_bio_flags: flags of previous bio to see if we can merge the current one
+ * @bio_flags: flags of the current bio to see if we can merge them
+ * @return: true if page was added, false otherwise
+ *
+ * Attempt to add a page to bio considering stripe alignment etc. Return
+ * true if the page was successfully added. Otherwise, return false.
+ */ +static bool btrfs_bio_add_page(struct bio *bio, struct page *page, u64 logical, + unsigned int size, unsigned int pg_offset, + unsigned long prev_bio_flags, + unsigned long bio_flags) +{ + sector_t sector = logical >> SECTOR_SHIFT; + bool contig; + + if (prev_bio_flags != bio_flags) + return false; + + if (prev_bio_flags & EXTENT_BIO_COMPRESSED) + contig = bio->bi_iter.bi_sector == sector; + else + contig = bio_end_sector(bio) == sector; + if (!contig) + return false; + + if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags)) + return false; + + return bio_add_page(bio, page, size, pg_offset) == size; +} + /* * @opf: bio REQ_OP_* and REQ_* flags as one value * @wbc: optional writeback control for io accounting @@ -3089,27 +3128,15 @@ static int submit_extent_page(unsigned int opf, int ret = 0; struct bio *bio; size_t io_size = min_t(size_t, size, PAGE_SIZE); - sector_t sector = offset >> 9; struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree; ASSERT(bio_ret); if (*bio_ret) { - bool contig; - bool can_merge = true; - bio = *bio_ret; - if (prev_bio_flags & EXTENT_BIO_COMPRESSED) - contig = bio->bi_iter.bi_sector == sector; - else - contig = bio_end_sector(bio) == sector; - - if (btrfs_bio_fits_in_stripe(page, io_size, bio, bio_flags)) - can_merge = false; - - if (prev_bio_flags != bio_flags || !contig || !can_merge || - force_bio_submit || - bio_add_page(bio, page, io_size, pg_offset) < io_size) { + if (force_bio_submit || + !btrfs_bio_add_page(bio, page, offset, io_size, pg_offset, + prev_bio_flags, bio_flags)) { ret = submit_one_bio(bio, mirror_num, prev_bio_flags); if (ret < 0) { *bio_ret = NULL; From patchwork Fri Jan 15 06:53:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021625 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 20/41] btrfs: use bio_add_zone_append_page for zoned btrfs
Date: Fri, 15 Jan 2021 15:53:24 +0900
Message-Id: <9cb19c5a674bede549a357b676627083bf71345d.1610693037.git.naohiro.aota@wdc.com>

A zoned device has its own hardware restrictions, e.g. max_zone_append_size when using REQ_OP_ZONE_APPEND. To follow these restrictions, use bio_add_zone_append_page() instead of bio_add_page().
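The size restriction described above can be sketched with a minimal user-space model (this is not the kernel code; the struct, names, and limit values below are illustrative): a zone-append bio must refuse a page that would push it past max_zone_append_size, while a regular write bio simply appends.

```c
#include <stddef.h>

/* Illustrative stand-ins for the two request types involved. */
enum bio_op { OP_WRITE, OP_ZONE_APPEND };

/* Toy bio: current payload size plus the device limit that applies
 * only to zone-append requests (modeled after max_zone_append_size). */
struct toy_bio {
	enum bio_op op;
	size_t len;                  /* bytes already in the bio */
	size_t max_zone_append_size; /* hardware limit for zone append */
};

/* Models the dispatch the patch introduces: the zone-append path must
 * honor the device limit, the regular path just appends. Returns the
 * number of bytes added, or 0 when the caller has to submit the
 * current bio and start a new one. */
static size_t toy_add_page(struct toy_bio *bio, size_t size)
{
	if (bio->op == OP_ZONE_APPEND &&
	    bio->len + size > bio->max_zone_append_size)
		return 0;
	bio->len += size;
	return size;
}
```

A failed add (return value 0) corresponds to the case where submit_extent_page() submits the current bio and opens a new one.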
We need the target device to use bio_add_zone_append_page(), so this commit reads the chunk information to memoize the target device in btrfs_io_bio(bio)->device.

Currently, zoned btrfs only supports the SINGLE profile. In the future, btrfs_io_bio can hold an extent_map and check the restrictions for all the devices the bio will be mapped to.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent_io.c | 30 +++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 96f43b9121d6..41fccfbaee15 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3083,6 +3083,7 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page, u64 logical,
 {
 	sector_t sector = logical >> SECTOR_SHIFT;
 	bool contig;
+	int ret;
 
 	if (prev_bio_flags != bio_flags)
 		return false;
@@ -3097,7 +3098,12 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page, u64 logical,
 	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
 		return false;
 
-	return bio_add_page(bio, page, size, pg_offset) == size;
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+		ret = bio_add_zone_append_page(bio, page, size, pg_offset);
+	else
+		ret = bio_add_page(bio, page, size, pg_offset);
+
+	return ret == size;
 }
 
 /*
@@ -3128,7 +3134,9 @@ static int submit_extent_page(unsigned int opf,
 	int ret = 0;
 	struct bio *bio;
 	size_t io_size = min_t(size_t, size, PAGE_SIZE);
-	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
+	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
+	struct extent_io_tree *tree = &inode->io_tree;
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 
 	ASSERT(bio_ret);
 
@@ -3159,11 +3167,27 @@ static int submit_extent_page(unsigned int opf,
 	if (wbc) {
 		struct block_device *bdev;
 
-		bdev = BTRFS_I(page->mapping->host)->root->fs_info->fs_devices->latest_bdev;
+		bdev = fs_info->fs_devices->latest_bdev;
 		bio_set_dev(bio, bdev);
 		wbc_init_bio(wbc, bio);
 		wbc_account_cgroup_owner(wbc,
page, io_size); } + if (btrfs_is_zoned(fs_info) && + bio_op(bio) == REQ_OP_ZONE_APPEND) { + struct extent_map *em; + struct map_lookup *map; + + em = btrfs_get_chunk_map(fs_info, offset, io_size); + if (IS_ERR(em)) + return PTR_ERR(em); + + map = em->map_lookup; + /* We only support SINGLE profile for now */ + ASSERT(map->num_stripes == 1); + btrfs_io_bio(bio)->device = map->stripes[0].dev; + + free_extent_map(em); + } *bio_ret = bio; From patchwork Fri Jan 15 06:53:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021627 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B794C4332E for ; Fri, 15 Jan 2021 06:59:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2EB1C23403 for ; Fri, 15 Jan 2021 06:59:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730157AbhAOG64 (ORCPT ); Fri, 15 Jan 2021 01:58:56 -0500 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41647 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730008AbhAOG6z (ORCPT ); Fri, 15 Jan 2021 01:58:55 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1610693935; x=1642229935; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JFpGSNJPIzf3yswycj4ZgTIAU3el686NtYaicK5zcfU=; 
From: Naohiro Aota To:
linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v12 21/41] btrfs: handle REQ_OP_ZONE_APPEND as writing Date: Fri, 15 Jan 2021 15:53:25 +0900 Message-Id: <60d04a3d1556033c4dba7b5e61af40c2132d2f5c.1610693037.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org ZONED btrfs uses REQ_OP_ZONE_APPEND bios for writing to actual devices. Let btrfs_end_bio() and btrfs_op be aware of it. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 4 ++-- fs/btrfs/inode.c | 10 +++++----- fs/btrfs/volumes.c | 8 ++++---- fs/btrfs/volumes.h | 1 + 4 files changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index e9b6c6a21681..1cbcf53ba756 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -652,7 +652,7 @@ static void end_workqueue_bio(struct bio *bio) fs_info = end_io_wq->info; end_io_wq->status = bio->bi_status; - if (bio_op(bio) == REQ_OP_WRITE) { + if (btrfs_op(bio) == BTRFS_MAP_WRITE) { if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA) wq = fs_info->endio_meta_write_workers; else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE) @@ -828,7 +828,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio, int async = check_async_write(fs_info, BTRFS_I(inode)); blk_status_t ret; - if (bio_op(bio) != REQ_OP_WRITE) { + if (btrfs_op(bio) != BTRFS_MAP_WRITE) { /* * called for a read, do the setup so that checksum validation * can happen in the async kernel threads diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 9c2800fa80c6..37782b4cfd28 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2252,7 +2252,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio, if 
(btrfs_is_free_space_inode(BTRFS_I(inode))) metadata = BTRFS_WQ_ENDIO_FREE_SPACE; - if (bio_op(bio) != REQ_OP_WRITE) { + if (btrfs_op(bio) != BTRFS_MAP_WRITE) { ret = btrfs_bio_wq_end_io(fs_info, bio, metadata); if (ret) goto out; @@ -7682,7 +7682,7 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip) if (!refcount_dec_and_test(&dip->refs)) return; - if (bio_op(dip->dio_bio) == REQ_OP_WRITE) { + if (btrfs_op(dip->dio_bio) == BTRFS_MAP_WRITE) { __endio_write_update_ordered(BTRFS_I(dip->inode), dip->logical_offset, dip->bytes, @@ -7850,7 +7850,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct btrfs_dio_private *dip = bio->bi_private; - bool write = bio_op(bio) == REQ_OP_WRITE; + bool write = btrfs_op(bio) == BTRFS_MAP_WRITE; blk_status_t ret; /* Check btrfs_submit_bio_hook() for rules about async submit. */ @@ -7900,7 +7900,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio, struct inode *inode, loff_t file_offset) { - const bool write = (bio_op(dio_bio) == REQ_OP_WRITE); + const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE); const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM); size_t dip_size; struct btrfs_dio_private *dip; @@ -7930,7 +7930,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio, static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, struct bio *dio_bio, loff_t file_offset) { - const bool write = (bio_op(dio_bio) == REQ_OP_WRITE); + const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); const bool raid56 = (btrfs_data_alloc_profile(fs_info) & BTRFS_BLOCK_GROUP_RAID56_MASK); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index be26fdfefc8c..5752cc470158 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6455,7 +6455,7 @@ static void btrfs_end_bio(struct bio *bio) struct 
btrfs_device *dev = btrfs_io_bio(bio)->device; ASSERT(dev->bdev); - if (bio_op(bio) == REQ_OP_WRITE) + if (btrfs_op(bio) == BTRFS_MAP_WRITE) btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS); else if (!(bio->bi_opf & REQ_RAHEAD)) @@ -6568,10 +6568,10 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, atomic_set(&bbio->stripes_pending, bbio->num_stripes); if ((bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) && - ((bio_op(bio) == REQ_OP_WRITE) || (mirror_num > 1))) { + ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) { /* In this case, map_length has been set to the length of a single stripe; not the whole write */ - if (bio_op(bio) == REQ_OP_WRITE) { + if (btrfs_op(bio) == BTRFS_MAP_WRITE) { ret = raid56_parity_write(fs_info, bio, bbio, map_length); } else { @@ -6594,7 +6594,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, dev = bbio->stripes[dev_nr].dev; if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) || - (bio_op(first_bio) == REQ_OP_WRITE && + (btrfs_op(first_bio) == BTRFS_MAP_WRITE && !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) { bbio_error(bbio, first_bio, logical); continue; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 98a447badd6a..0bcf87a9e594 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -423,6 +423,7 @@ static inline enum btrfs_map_op btrfs_op(struct bio *bio) case REQ_OP_DISCARD: return BTRFS_MAP_DISCARD; case REQ_OP_WRITE: + case REQ_OP_ZONE_APPEND: return BTRFS_MAP_WRITE; default: WARN_ON_ONCE(1); From patchwork Fri Jan 15 06:53:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021629 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, kernel test robot
Subject: [PATCH v12 22/41] btrfs: split ordered extent when bio is sent
Date: Fri, 15 Jan 2021 15:53:26 +0900

For a zone append write, the device decides the location the data is written to. Therefore we cannot ensure that two bios are written consecutively on the device. To ensure that an ordered extent maps to a contiguous region on disk, we need to maintain a "one bio == one ordered extent" rule.
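The pre/post arithmetic behind this rule can be sketched in user space (illustrative types, not the kernel's btrfs_ordered_extent; the sketch only models the byte accounting, not the tree re-insertion or the cloning of the leftovers):

```c
#include <stdint.h>

/* Toy ordered extent: where it starts in the file and on disk,
 * and how many bytes it covers. */
struct toy_ordered {
	uint64_t file_offset;
	uint64_t disk_bytenr;
	uint64_t num_bytes;
};

/* Trim the ordered extent to exactly the bio's disk range.
 * "pre" and "post" receive the leftover byte counts before and after
 * the bio; the kernel would clone them into new ordered extents.
 * Returns 0 on success, -1 if the bio is not inside the extent. */
static int toy_split(struct toy_ordered *o, uint64_t bio_start,
		     uint64_t bio_len, uint64_t *pre, uint64_t *post)
{
	uint64_t end = o->disk_bytenr + o->num_bytes;

	if (bio_start < o->disk_bytenr || bio_start + bio_len > end)
		return -1;

	*pre = bio_start - o->disk_bytenr;   /* bytes before the bio */
	*post = end - (bio_start + bio_len); /* bytes after the bio */

	/* Shrink the original so one bio covers one ordered extent. */
	o->file_offset += *pre;
	o->disk_bytenr += *pre;
	o->num_bytes -= *pre + *post;
	return 0;
}
```

After the split the remaining extent matches the bio exactly, which is what lets a single zone-append bio stand for a whole ordered extent.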
This commit implements the splitting of an ordered extent and extent map on bio submission to adhere to the rule. [testbot] made extract_ordered_extent static Reported-by: kernel test robot Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 91 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ordered-data.c | 85 ++++++++++++++++++++++++++++++++++++++ fs/btrfs/ordered-data.h | 2 + 3 files changed, 178 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 37782b4cfd28..4df5900dd197 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2217,6 +2217,88 @@ static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio, return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0); } +static int extract_ordered_extent(struct inode *inode, struct bio *bio, + loff_t file_offset) +{ + struct btrfs_ordered_extent *ordered; + struct extent_map *em = NULL, *em_new = NULL; + struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree; + u64 start = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT; + u64 len = bio->bi_iter.bi_size; + u64 end = start + len; + u64 ordered_end; + u64 pre, post; + int ret = 0; + + ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset); + if (WARN_ON_ONCE(!ordered)) + return -EIO; + + /* No need to split */ + if (ordered->disk_num_bytes == len) + goto out; + + /* We cannot split once end_bio'd ordered extent */ + if (WARN_ON_ONCE(ordered->bytes_left != ordered->disk_num_bytes)) { + ret = -EINVAL; + goto out; + } + + /* We cannot split a compressed ordered extent */ + if (WARN_ON_ONCE(ordered->disk_num_bytes != ordered->num_bytes)) { + ret = -EINVAL; + goto out; + } + + /* We cannot split a waited ordered extent */ + if (WARN_ON_ONCE(wq_has_sleeper(&ordered->wait))) { + ret = -EINVAL; + goto out; + } + + ordered_end = ordered->disk_bytenr + ordered->disk_num_bytes; + /* bio must be in one ordered extent */ + if (WARN_ON_ONCE(start < ordered->disk_bytenr || end > ordered_end)) { + ret = -EINVAL; + goto out; + } + + 
/* Checksum list should be empty */ + if (WARN_ON_ONCE(!list_empty(&ordered->list))) { + ret = -EINVAL; + goto out; + } + + pre = start - ordered->disk_bytenr; + post = ordered_end - end; + + ret = btrfs_split_ordered_extent(ordered, pre, post); + if (ret) + goto out; + + read_lock(&em_tree->lock); + em = lookup_extent_mapping(em_tree, ordered->file_offset, len); + if (!em) { + read_unlock(&em_tree->lock); + ret = -EIO; + goto out; + } + read_unlock(&em_tree->lock); + + ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)); + em_new = create_io_em(BTRFS_I(inode), em->start + pre, len, + em->start + pre, em->block_start + pre, len, + len, len, BTRFS_COMPRESS_NONE, + BTRFS_ORDERED_REGULAR); + free_extent_map(em_new); + +out: + free_extent_map(em); + btrfs_put_ordered_extent(ordered); + + return ret; +} + /* * extent_io.c submission hook. This does the right thing for csum calculation * on write, or reading the csums from the tree before a read. @@ -2252,6 +2334,15 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio, if (btrfs_is_free_space_inode(BTRFS_I(inode))) metadata = BTRFS_WQ_ENDIO_FREE_SPACE; + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + struct page *page = bio_first_bvec_all(bio)->bv_page; + loff_t file_offset = page_offset(page); + + ret = extract_ordered_extent(inode, bio, file_offset); + if (ret) + goto out; + } + if (btrfs_op(bio) != BTRFS_MAP_WRITE) { ret = btrfs_bio_wq_end_io(fs_info, bio, metadata); if (ret) diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 79d366a36223..6e4ffb3861e7 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -898,6 +898,91 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start, } } +static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos, + u64 len) +{ + struct inode *inode = ordered->inode; + u64 file_offset = ordered->file_offset + pos; + u64 disk_bytenr = ordered->disk_bytenr + pos; + u64 num_bytes = len; + u64 
disk_num_bytes = len; + int type; + unsigned long flags_masked = + ordered->flags & ~(1 << BTRFS_ORDERED_DIRECT); + int compress_type = ordered->compress_type; + unsigned long weight; + int ret; + + weight = hweight_long(flags_masked); + WARN_ON_ONCE(weight > 1); + if (!weight) + type = 0; + else + type = __ffs(flags_masked); + + if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered->flags)) { + WARN_ON_ONCE(1); + ret = btrfs_add_ordered_extent_compress(BTRFS_I(inode), + file_offset, + disk_bytenr, num_bytes, + disk_num_bytes, type, + compress_type); + } else if (test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) { + ret = btrfs_add_ordered_extent_dio(BTRFS_I(inode), file_offset, + disk_bytenr, num_bytes, + disk_num_bytes, type); + } else { + ret = btrfs_add_ordered_extent(BTRFS_I(inode), file_offset, + disk_bytenr, num_bytes, + disk_num_bytes, type); + } + + return ret; +} + +int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre, + u64 post) +{ + struct inode *inode = ordered->inode; + struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree; + struct rb_node *node; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + int ret = 0; + + spin_lock_irq(&tree->lock); + /* Remove from tree once */ + node = &ordered->rb_node; + rb_erase(node, &tree->tree); + RB_CLEAR_NODE(node); + if (tree->last == node) + tree->last = NULL; + + ordered->file_offset += pre; + ordered->disk_bytenr += pre; + ordered->num_bytes -= (pre + post); + ordered->disk_num_bytes -= (pre + post); + ordered->bytes_left -= (pre + post); + + /* Re-insert the node */ + node = tree_insert(&tree->tree, ordered->file_offset, + &ordered->rb_node); + if (node) + btrfs_panic(fs_info, -EEXIST, + "zoned: inconsistency in ordered tree at offset %llu", + ordered->file_offset); + + spin_unlock_irq(&tree->lock); + + if (pre) + ret = clone_ordered_extent(ordered, 0, pre); + if (post) + ret = clone_ordered_extent(ordered, + pre + ordered->disk_num_bytes, + post); + + return ret; +} + 
 int __init ordered_data_init(void)
 {
 	btrfs_ordered_extent_cache = kmem_cache_create("btrfs_ordered_extent",
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 0bfa82b58e23..2ff238b78eda 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -190,6 +190,8 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
 void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 					u64 end,
 					struct extent_state **cached_state);
+int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
+			       u64 post);
 int __init ordered_data_init(void);
 void __cold ordered_data_exit(void);

From patchwork Fri Jan 15 06:53:27 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021633
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 23/41] btrfs: extend btrfs_rmap_block for specifying a device
Date: Fri, 15 Jan 2021 15:53:27 +0900
Message-Id: <02a9e2dd2d839fb694d5bfeb15fe6cc86a886f8a.1610693037.git.naohiro.aota@wdc.com>

btrfs_rmap_block currently reverse-maps the physical addresses on all
devices to the corresponding logical addresses.

Extend the function so it can also match a single specified device. The
old behavior of querying all devices is kept by passing NULL as the
target device.

We pass a block_device instead of a btrfs_device to __btrfs_rmap_block,
since the function is intended to reverse-map the result of a bio, which
only carries a block_device.

Also export the function for later use.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c            | 20 ++++++++++++++------
 fs/btrfs/block-group.h            |  8 +++-----
 fs/btrfs/tests/extent-map-tests.c |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index f7c85cc81d1e..7083189884de 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1576,8 +1576,11 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 }
 
 /**
- * btrfs_rmap_block - Map a physical disk address to a list of logical addresses
+ * btrfs_rmap_block - Map a physical disk address to a list of logical
+ *		      addresses
  * @chunk_start: logical address of block group
+ * @bdev:	 physical device to resolve. Can be NULL to indicate any
+ *		 device.
  * @physical:	 physical address to map to logical addresses
  * @logical:	 return array of logical addresses which map to @physical
  * @naddrs:	 length of @logical
@@ -1587,9 +1590,9 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
  * Used primarily to exclude those portions of a block group that contain super
  * block copies.
  */
-EXPORT_FOR_TESTS
 int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
-		     u64 physical, u64 **logical, int *naddrs, int *stripe_len)
+		     struct block_device *bdev, u64 physical, u64 **logical,
+		     int *naddrs, int *stripe_len)
 {
 	struct extent_map *em;
 	struct map_lookup *map;
@@ -1607,6 +1610,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 	map = em->map_lookup;
 	data_stripe_length = em->orig_block_len;
 	io_stripe_size = map->stripe_len;
+	chunk_start = em->start;
 
 	/* For RAID5/6 adjust to a full IO stripe length */
 	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
@@ -1621,14 +1625,18 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 	for (i = 0; i < map->num_stripes; i++) {
 		bool already_inserted = false;
 		u64 stripe_nr;
+		u64 offset;
 		int j;
 
 		if (!in_range(physical, map->stripes[i].physical,
 			      data_stripe_length))
 			continue;
 
+		if (bdev && map->stripes[i].dev->bdev != bdev)
+			continue;
+
 		stripe_nr = physical - map->stripes[i].physical;
-		stripe_nr = div64_u64(stripe_nr, map->stripe_len);
+		stripe_nr = div64_u64_rem(stripe_nr, map->stripe_len, &offset);
 
 		if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
 			stripe_nr = stripe_nr * map->num_stripes + i;
@@ -1642,7 +1650,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 		 * instead of map->stripe_len
 		 */
 
-		bytenr = chunk_start + stripe_nr * io_stripe_size;
+		bytenr = chunk_start + stripe_nr * io_stripe_size + offset;
 
 		/* Ensure we don't add duplicate addresses */
 		for (j = 0; j < nr; j++) {
@@ -1684,7 +1692,7 @@ static int exclude_super_stripes(struct btrfs_block_group *cache)
 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		bytenr = btrfs_sb_offset(i);
-		ret = btrfs_rmap_block(fs_info, cache->start,
+		ret = btrfs_rmap_block(fs_info, cache->start, NULL,
 				       bytenr, &logical, &nr, &stripe_len);
 		if (ret)
 			return ret;

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 0f3c62c561bc..9df00ada09f9 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -277,6 +277,9 @@ void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
 int btrfs_free_block_groups(struct btrfs_fs_info *info);
 void btrfs_wait_space_cache_v1_finished(struct btrfs_block_group *cache,
 				struct btrfs_caching_control *caching_ctl);
+int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
+		     struct block_device *bdev, u64 physical, u64 **logical,
+		     int *naddrs, int *stripe_len);
 
 static inline u64 btrfs_data_alloc_profile(struct btrfs_fs_info *fs_info)
 {
@@ -303,9 +306,4 @@ static inline int btrfs_block_group_done(struct btrfs_block_group *cache)
 void btrfs_freeze_block_group(struct btrfs_block_group *cache);
 void btrfs_unfreeze_block_group(struct btrfs_block_group *cache);
 
-#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
-int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
-		     u64 physical, u64 **logical, int *naddrs, int *stripe_len);
-#endif
-
 #endif /* BTRFS_BLOCK_GROUP_H */

diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 57379e96ccc9..c0aefe6dee0b 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -507,7 +507,7 @@ static int test_rmap_block(struct btrfs_fs_info *fs_info,
 		goto out_free;
 	}
 
-	ret = btrfs_rmap_block(fs_info, em->start, btrfs_sb_offset(1),
+	ret = btrfs_rmap_block(fs_info, em->start, NULL, btrfs_sb_offset(1),
 			       &logical, &out_ndaddrs, &out_stripe_len);
 	if (ret || (out_ndaddrs == 0 && test->expected_mapped_addr)) {
 		test_err("didn't rmap anything but expected %d",

From patchwork Fri Jan 15 06:53:28 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021631
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn, Josef Bacik
Subject: [PATCH v12 24/41] btrfs: cache if block-group is on a sequential zone
Date: Fri, 15 Jan 2021 15:53:28 +0900
Message-Id: <327e5c00dd98a9bdb70e3ec7da74a315fd08c551.1610693037.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

In zoned mode, cache whether a block group is on a sequential-write-only
zone. On sequential-write-only zones we can use REQ_OP_ZONE_APPEND for
writing data, so provide btrfs_use_zone_append() to figure out if an I/O
targets such a zone, in which case REQ_OP_ZONE_APPEND can be used for
data writes.

Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
---
 fs/btrfs/block-group.h |  2 ++
 fs/btrfs/zoned.c       | 29 +++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  5 +++++
 3 files changed, 36 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 9df00ada09f9..a1d96c4cfa3b 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -184,6 +184,8 @@ struct btrfs_block_group {
 	/* Record locked full stripes for RAID5/6 block group */
 	struct btrfs_full_stripe_locks_tree full_stripe_locks_root;
 
+	/* Flag indicating this block-group is placed on a sequential zone */
+	bool seq_zone;
 	/*
 	 * Allocation offset for the block group to implement sequential
 	 * allocation. This is used only with ZONED mode enabled.

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 2b02a38b11f9..fed014211f32 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1103,6 +1103,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		}
 	}
 
+	if (num_sequential > 0)
+		cache->seq_zone = true;
+
 	if (num_conventional > 0) {
 		/*
		 * Avoid calling calculate_alloc_pointer() for new BG. It
@@ -1223,3 +1226,29 @@ void btrfs_free_redirty_list(struct btrfs_transaction *trans)
 	}
 	spin_unlock(&trans->releasing_ebs_lock);
 }
+
+bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct btrfs_block_group *cache;
+	bool ret = false;
+
+	if (!btrfs_is_zoned(fs_info))
+		return false;
+
+	if (!fs_info->max_zone_append_size)
+		return false;
+
+	if (!is_data_inode(&inode->vfs_inode))
+		return false;
+
+	cache = btrfs_lookup_block_group(fs_info, em->block_start);
+	ASSERT(cache);
+	if (!cache)
+		return false;
+
+	ret = cache->seq_zone;
+	btrfs_put_block_group(cache);
+
+	return ret;
+}

diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 331951978487..92888eb86055 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -46,6 +46,7 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
 void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 			    struct extent_buffer *eb);
 void btrfs_free_redirty_list(struct btrfs_transaction *trans);
+bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -134,6 +135,10 @@ static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 					  struct extent_buffer *eb) { }
 static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { }
+bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em)
+{
+	return false;
+}
 #endif

From patchwork Fri Jan 15 06:53:29 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021635
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn
Subject: [PATCH v12 25/41] btrfs: save irq flags when looking up an ordered extent
Date: Fri, 15 Jan 2021 15:53:29 +0900
Message-Id: <2006d10556749769d73fde4958dde0d844bb4f8d.1610693037.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

A following patch will add another caller of
btrfs_lookup_ordered_extent() from a bio endio context.

btrfs_lookup_ordered_extent() uses spin_lock_irq(), which
unconditionally disables interrupts.
Change this to spin_lock_irqsave() so interrupts aren't disabled and
re-enabled unconditionally.

Signed-off-by: Johannes Thumshirn
---
 fs/btrfs/ordered-data.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 6e4ffb3861e7..5c0df39d0503 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -745,9 +745,10 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *ino
 	struct btrfs_ordered_inode_tree *tree;
 	struct rb_node *node;
 	struct btrfs_ordered_extent *entry = NULL;
+	unsigned long flags;
 
 	tree = &inode->ordered_tree;
-	spin_lock_irq(&tree->lock);
+	spin_lock_irqsave(&tree->lock, flags);
 	node = tree_search(tree, file_offset);
 	if (!node)
 		goto out;
@@ -758,7 +759,7 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *ino
 	if (entry)
 		refcount_inc(&entry->refs);
 out:
-	spin_unlock_irq(&tree->lock);
+	spin_unlock_irqrestore(&tree->lock, flags);
 
 	return entry;
 }

From patchwork Fri Jan 15 06:53:30 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021637
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik,
    Johannes Thumshirn
Subject: [PATCH v12 26/41] btrfs: use ZONE_APPEND write for ZONED btrfs
Date: Fri, 15 Jan 2021 15:53:30 +0900
Message-Id: <69bc6601a2c82457f5f7e40744d6dbebd328e958.1610693037.git.naohiro.aota@wdc.com>

Enable zone append writing for zoned btrfs. With zone append, a bio is
issued to the start of a target zone and the device decides where inside
the zone to place it. Upon completion, the device reports the actual
written position back to the host.

Three parts are necessary to enable zone append in btrfs. First, modify
the bio to use REQ_OP_ZONE_APPEND in btrfs_submit_bio_hook() and adjust
bi_sector to point at the beginning of the zone.

Second, record the returned physical address (and disk/partno) in the
ordered extent in end_bio_extent_writepage() after the bio has completed.
We cannot resolve the physical address to the logical address there,
because we can neither take locks nor allocate a buffer in the end_bio
context. So the physical address is recorded and resolved later in
btrfs_finish_ordered_io().
And finally, rewrites the logical addresses of the extent mapping and checksum data according to the physical address (using __btrfs_rmap_block). If the returned address matches the originally allocated address, we can skip this rewriting process. Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota --- fs/btrfs/extent_io.c | 11 ++++++- fs/btrfs/file.c | 2 +- fs/btrfs/inode.c | 4 +++ fs/btrfs/ordered-data.c | 3 ++ fs/btrfs/ordered-data.h | 8 +++++ fs/btrfs/volumes.c | 15 +++++++++ fs/btrfs/zoned.c | 68 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 12 ++++++++ 8 files changed, 121 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 41fccfbaee15..214b330dc490 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2735,6 +2735,7 @@ static void end_bio_extent_writepage(struct bio *bio) u64 start; u64 end; struct bvec_iter_all iter_all; + bool first_bvec = true; ASSERT(!bio_flagged(bio, BIO_CLONED)); bio_for_each_segment_all(bvec, bio, iter_all) { @@ -2761,6 +2762,11 @@ static void end_bio_extent_writepage(struct bio *bio) start = page_offset(page); end = start + bvec->bv_offset + bvec->bv_len - 1; + if (first_bvec) { + btrfs_record_physical_zoned(inode, start, bio); + first_bvec = false; + } + end_extent_writepage(page, error, start, end); end_page_writeback(page); } @@ -3580,6 +3586,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, size_t blocksize; int ret = 0; int nr = 0; + int opf = REQ_OP_WRITE; const unsigned int write_flags = wbc_to_write_flags(wbc); bool compressed; @@ -3626,6 +3633,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, offset = em->block_start + extent_offset; block_start = em->block_start; compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags); + if (btrfs_use_zone_append(inode, em)) + opf = REQ_OP_ZONE_APPEND; free_extent_map(em); em = NULL; @@ -3652,7 +3661,7 @@ static 
noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, page->index, cur, end); } - ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc, + ret = submit_extent_page(opf | write_flags, wbc, page, offset, iosize, pg_offset, &epd->bio, end_bio_extent_writepage, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index e65223e3510d..5c120d8d060d 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2176,7 +2176,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) * the current transaction commits before the ordered extents complete * and a power failure happens right after that. */ - if (full_sync) { + if (full_sync || btrfs_is_zoned(fs_info)) { ret = btrfs_wait_ordered_range(inode, start, len); } else { /* diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 4df5900dd197..6b5f273a0d83 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -50,6 +50,7 @@ #include "delalloc-space.h" #include "block-group.h" #include "space-info.h" +#include "zoned.h" struct btrfs_iget_args { u64 ino; @@ -2830,6 +2831,9 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) bool clear_reserved_extent = true; unsigned int clear_bits = EXTENT_DEFRAG; + if (ordered_extent->disk) + btrfs_rewrite_logical_zoned(ordered_extent); + start = ordered_extent->file_offset; end = start + ordered_extent->num_bytes - 1; diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 5c0df39d0503..ac1f9fd348eb 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -199,6 +199,9 @@ static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset entry->compress_type = compress_type; entry->truncated_len = (u64)-1; entry->qgroup_rsv = ret; + entry->physical = (u64)-1; + entry->disk = NULL; + entry->partno = (u8)-1; if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE) set_bit(type, &entry->flags); diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index 
2ff238b78eda..635c398a173f 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -127,6 +127,14 @@ struct btrfs_ordered_extent { struct completion completion; struct btrfs_work flush_work; struct list_head work_list; + + /* + * used to reverse-map physical address returned from ZONE_APPEND + * write command in a workqueue context. + */ + u64 physical; + struct gendisk *disk; + u8 partno; }; /* diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 5752cc470158..c8c94e5081eb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6507,6 +6507,21 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio, btrfs_io_bio(bio)->device = dev; bio->bi_end_io = btrfs_end_bio; bio->bi_iter.bi_sector = physical >> 9; + /* + * For zone append writing, bi_sector must point the beginning of the + * zone + */ + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + if (btrfs_dev_is_sequential(dev, physical)) { + u64 zone_start = round_down(physical, + fs_info->zone_size); + + bio->bi_iter.bi_sector = zone_start >> SECTOR_SHIFT; + } else { + bio->bi_opf &= ~REQ_OP_ZONE_APPEND; + bio->bi_opf |= REQ_OP_WRITE; + } + } btrfs_debug_in_rcu(fs_info, "btrfs_map_bio: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u", bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector, diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index fed014211f32..6d11081fde7d 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1252,3 +1252,71 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em) return ret; } + +void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, + struct bio *bio) +{ + struct btrfs_ordered_extent *ordered; + u64 physical = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT; + + if (bio_op(bio) != REQ_OP_ZONE_APPEND) + return; + + ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset); + if (WARN_ON(!ordered)) + return; + + ordered->physical = physical; + ordered->disk = bio->bi_disk; + ordered->partno = 
bio->bi_partno; + + btrfs_put_ordered_extent(ordered); +} + +void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered) +{ + struct extent_map_tree *em_tree; + struct extent_map *em; + struct inode *inode = ordered->inode; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_ordered_sum *sum; + struct block_device *bdev; + u64 orig_logical = ordered->disk_bytenr; + u64 *logical = NULL; + int nr, stripe_len; + + bdev = bdget_disk(ordered->disk, ordered->partno); + if (WARN_ON(!bdev)) + return; + + if (WARN_ON(btrfs_rmap_block(fs_info, orig_logical, bdev, + ordered->physical, &logical, &nr, + &stripe_len))) + goto out; + + WARN_ON(nr != 1); + + if (orig_logical == *logical) + goto out; + + ordered->disk_bytenr = *logical; + + em_tree = &BTRFS_I(inode)->extent_tree; + write_lock(&em_tree->lock); + em = search_extent_mapping(em_tree, ordered->file_offset, + ordered->num_bytes); + em->block_start = *logical; + free_extent_map(em); + write_unlock(&em_tree->lock); + + list_for_each_entry(sum, &ordered->list, list) { + if (*logical < orig_logical) + sum->bytenr -= orig_logical - *logical; + else + sum->bytenr += *logical - orig_logical; + } + +out: + kfree(logical); + bdput(bdev); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 92888eb86055..cf420964305f 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -47,6 +47,9 @@ void btrfs_redirty_list_add(struct btrfs_transaction *trans, struct extent_buffer *eb); void btrfs_free_redirty_list(struct btrfs_transaction *trans); bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em); +void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, + struct bio *bio); +void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -139,6 +142,15 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map 
*em) { return false; } + +static inline void btrfs_record_physical_zoned(struct inode *inode, + u64 file_offset, struct bio *bio) +{ +} + +static inline void btrfs_rewrite_logical_zoned( + struct btrfs_ordered_extent *ordered) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Fri Jan 15 06:53:31 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021639
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick
J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v12 27/41] btrfs: enable zone append writing for direct IO Date: Fri, 15 Jan 2021 15:53:31 +0900 As with buffered IO, enable zone append writing for direct IO when it is used on a zoned block device. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 6b5f273a0d83..4f0915346c9d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7708,6 +7708,9 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, iomap->bdev = fs_info->fs_devices->latest_bdev; iomap->length = len; + if (write && btrfs_use_zone_append(BTRFS_I(inode), em)) + iomap->flags |= IOMAP_F_ZONE_APPEND; + free_extent_map(em); return 0; @@ -7936,6 +7939,8 @@ static void btrfs_end_dio_bio(struct bio *bio) if (err) dip->dio_bio->bi_status = err; + btrfs_record_physical_zoned(dip->inode, dip->logical_offset, bio); + bio_put(bio); btrfs_dio_private_put(dip); } @@ -8088,6 +8093,18 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, bio->bi_end_io = btrfs_end_dio_bio; btrfs_io_bio(bio)->logical = file_offset; + WARN_ON_ONCE(write && btrfs_is_zoned(fs_info) && + fs_info->max_zone_append_size && + bio_op(bio) != REQ_OP_ZONE_APPEND); + + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + ret = extract_ordered_extent(inode, bio, file_offset); + if (ret) { + bio_put(bio); + goto out_err; + } + } + ASSERT(submit_len >= clone_len); submit_len -= clone_len; From patchwork Fri Jan 15 06:53:32 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021641
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v12 28/41] btrfs: introduce dedicated data write path for ZONED mode Date: Fri, 15 Jan 2021 15:53:32 +0900 Message-Id: <34fed65a5240be2d2eed7b0be01f1db56c1f88d9.1610693037.git.naohiro.aota@wdc.com> If more than one IO is issued for one file extent, these IOs can be written to separate regions on a device.
Since we cannot map one file extent to such separate areas, we need to follow the "one IO == one ordered extent" rule. The normal buffered, uncompressed, non-pre-allocated write path (used by cow_file_range()) sometimes does not follow this rule. It can write only a part of an ordered extent when a region to write is specified, e.g. when it is called from fdatasync(). Introduce a dedicated (uncompressed, buffered) data write path for ZONED mode. This write path CoWs the region and writes it at once. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 34 ++++++++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 4f0915346c9d..cf84fdfd6543 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1400,6 +1400,29 @@ static int cow_file_range_async(struct btrfs_inode *inode, return 0; } +static noinline int run_delalloc_zoned(struct btrfs_inode *inode, + struct page *locked_page, u64 start, + u64 end, int *page_started, + unsigned long *nr_written) +{ + int ret; + + ret = cow_file_range(inode, locked_page, start, end, + page_started, nr_written, 0); + if (ret) + return ret; + + if (*page_started) + return 0; + + __set_page_dirty_nobuffers(locked_page); + account_page_redirty(locked_page); + extent_write_locked_range(&inode->vfs_inode, start, end, WB_SYNC_ALL); + *page_started = 1; + + return 0; +} + static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes) { @@ -1879,17 +1902,24 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page { int ret; int force_cow = need_force_cow(inode, start, end); + const bool do_compress = inode_can_compress(inode) && + inode_need_compress(inode, start, end); + const bool zoned = btrfs_is_zoned(inode->root->fs_info); if (inode->flags & BTRFS_INODE_NODATACOW && !force_cow) { + ASSERT(!zoned); ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 1,
nr_written); } else if (inode->flags & BTRFS_INODE_PREALLOC && !force_cow) { + ASSERT(!zoned); ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 0, nr_written); - } else if (!inode_can_compress(inode) || - !inode_need_compress(inode, start, end)) { + } else if (!do_compress && !zoned) { ret = cow_file_range(inode, locked_page, start, end, page_started, nr_written, 1); + } else if (!do_compress && zoned) { + ret = run_delalloc_zoned(inode, locked_page, start, end, + page_started, nr_written); } else { set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags); ret = cow_file_range_async(inode, wbc, locked_page, start, end, From patchwork Fri Jan 15 06:53:33 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021645
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v12 29/41] btrfs: serialize meta IOs on ZONED mode Date: Fri, 15 Jan 2021 15:53:33 +0900 Message-Id: <5811f1708400c6ad39ed0fd8df1fd6ca961c4ba8.1610693037.git.naohiro.aota@wdc.com> We cannot use zone append for writing metadata, because the B-tree nodes have references to each other using logical addresses. Without knowing the addresses in advance, we cannot construct the tree in the first place. So we need to serialize write IOs for metadata. We cannot add a mutex around allocation and submission because metadata blocks are allocated at an earlier stage to build up the B-trees. Add a zoned_meta_io_lock and hold it during metadata IO submission in btree_write_cache_pages() to serialize IOs. Furthermore, this adds a per-block-group metadata IO submission pointer, "meta_write_pointer", to ensure sequential writing, which could otherwise be broken by writeback of blocks in an unfinished transaction.
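As an illustration outside the kernel, the write-pointer rule this patch enforces can be sketched as a small standalone C model. The struct and function bodies below are illustrative stand-ins, not the btrfs API: only the rule itself (a buffer is accepted only if it starts exactly at the write pointer, which then advances) mirrors btrfs_check_meta_write_pointer()/btrfs_revert_meta_write_pointer().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the per-block-group state added by this patch. */
struct block_group {
	uint64_t meta_write_pointer;	/* next logical address we may write */
};

/*
 * Model of the check: an extent buffer at eb_start may be submitted only
 * if it starts exactly at the write pointer; on success the pointer
 * advances past the buffer.
 */
static bool check_meta_write_pointer(struct block_group *bg,
				     uint64_t eb_start, uint64_t eb_len)
{
	if (bg->meta_write_pointer != eb_start)
		return false;	/* out of order: caller must skip for now */
	bg->meta_write_pointer = eb_start + eb_len;
	return true;
}

/* Model of reverting the pointer when a submission is aborted. */
static void revert_meta_write_pointer(struct block_group *bg,
				      uint64_t eb_start, uint64_t eb_len)
{
	assert(bg->meta_write_pointer == eb_start + eb_len);
	bg->meta_write_pointer = eb_start;
}
```

A buffer that does not continue at the pointer is skipped; per the patch, writeback then returns -EAGAIN for WB_SYNC_ALL so a later transaction commit fills the hole.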
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.h | 1 + fs/btrfs/ctree.h | 1 + fs/btrfs/disk-io.c | 1 + fs/btrfs/extent_io.c | 25 ++++++++++++++++++++- fs/btrfs/zoned.c | 50 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 32 +++++++++++++++++++++++++++ 6 files changed, 109 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index a1d96c4cfa3b..19a22bf930c6 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -192,6 +192,7 @@ struct btrfs_block_group { */ u64 alloc_offset; u64 zone_unusable; + u64 meta_write_pointer; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index cc8b8bab241d..1085f8d9752b 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -976,6 +976,7 @@ struct btrfs_fs_info { /* Max size to emit ZONE_APPEND write command */ u64 max_zone_append_size; + struct mutex zoned_meta_io_lock; #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 1cbcf53ba756..1f0523a796b4 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2704,6 +2704,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) mutex_init(&fs_info->delete_unused_bgs_mutex); mutex_init(&fs_info->reloc_mutex); mutex_init(&fs_info->delalloc_root_mutex); + mutex_init(&fs_info->zoned_meta_io_lock); seqlock_init(&fs_info->profiles_lock); INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 214b330dc490..3d004bae2fa2 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -25,6 +25,7 @@ #include "backref.h" #include "disk-io.h" #include "zoned.h" +#include "block-group.h" static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -4073,6 +4074,7 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, struct extent_buffer 
**eb_context) { struct address_space *mapping = page->mapping; + struct btrfs_block_group *cache = NULL; struct extent_buffer *eb; int ret; @@ -4105,13 +4107,31 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, if (!ret) return 0; + if (!btrfs_check_meta_write_pointer(eb->fs_info, eb, &cache)) { + /* + * If for_sync, this hole will be filled by a + * transaction commit. + */ + if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) + ret = -EAGAIN; + else + ret = 0; + free_extent_buffer(eb); + return ret; + } + *eb_context = eb; ret = lock_extent_buffer_for_io(eb, epd); if (ret <= 0) { + btrfs_revert_meta_write_pointer(cache, eb); + if (cache) + btrfs_put_block_group(cache); free_extent_buffer(eb); return ret; } + if (cache) + btrfs_put_block_group(cache); ret = write_one_eb(eb, wbc, epd); free_extent_buffer(eb); if (ret < 0) @@ -4157,6 +4177,7 @@ int btree_write_cache_pages(struct address_space *mapping, tag = PAGECACHE_TAG_TOWRITE; else tag = PAGECACHE_TAG_DIRTY; + btrfs_zoned_meta_io_lock(fs_info); retry: if (wbc->sync_mode == WB_SYNC_ALL) tag_pages_for_writeback(mapping, index, end); @@ -4197,7 +4218,7 @@ int btree_write_cache_pages(struct address_space *mapping, } if (ret < 0) { end_write_bio(&epd, ret); - return ret; + goto out; } /* * If something went wrong, don't allow any metadata write bio to be @@ -4232,6 +4253,8 @@ int btree_write_cache_pages(struct address_space *mapping, ret = -EROFS; end_write_bio(&epd, ret); } +out: + btrfs_zoned_meta_io_unlock(fs_info); return ret; } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 6d11081fde7d..d4edcc5edcfc 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1161,6 +1161,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) ret = -EIO; } + if (!ret) + cache->meta_write_pointer = cache->alloc_offset + cache->start; + kfree(alloc_offsets); free_extent_map(em); @@ -1320,3 +1323,50 @@ void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent
*ordered) kfree(logical); bdput(bdev); } + +bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret) +{ + struct btrfs_block_group *cache; + bool ret = true; + + if (!btrfs_is_zoned(fs_info)) + return true; + + cache = *cache_ret; + + if (cache && (eb->start < cache->start || + cache->start + cache->length <= eb->start)) { + btrfs_put_block_group(cache); + cache = NULL; + *cache_ret = NULL; + } + + if (!cache) + cache = btrfs_lookup_block_group(fs_info, eb->start); + + if (cache) { + if (cache->meta_write_pointer != eb->start) { + btrfs_put_block_group(cache); + cache = NULL; + ret = false; + } else { + cache->meta_write_pointer = eb->start + eb->len; + } + + *cache_ret = cache; + } + + return ret; +} + +void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, + struct extent_buffer *eb) +{ + if (!btrfs_is_zoned(eb->fs_info) || !cache) + return; + + ASSERT(cache->meta_write_pointer == eb->start + eb->len); + cache->meta_write_pointer = eb->start; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index cf420964305f..a42e120158ab 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -50,6 +50,11 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em); void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, struct bio *bio); void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered); +bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret); +void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, + struct extent_buffer *eb); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -151,6 +156,19 @@ static inline void btrfs_record_physical_zoned(struct inode *inode, static inline void btrfs_rewrite_logical_zoned( struct btrfs_ordered_extent *ordered) { } 
+static inline bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret) +{ + return true; +} + +static inline void btrfs_revert_meta_write_pointer( + struct btrfs_block_group *cache, + struct extent_buffer *eb) +{ +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) @@ -243,4 +261,18 @@ static inline bool btrfs_can_zone_reset(struct btrfs_device *device, return true; } +static inline void btrfs_zoned_meta_io_lock(struct btrfs_fs_info *fs_info) +{ + if (!btrfs_is_zoned(fs_info)) + return; + mutex_lock(&fs_info->zoned_meta_io_lock); +} + +static inline void btrfs_zoned_meta_io_unlock(struct btrfs_fs_info *fs_info) +{ + if (!btrfs_is_zoned(fs_info)) + return; + mutex_unlock(&fs_info->zoned_meta_io_lock); +} + #endif From patchwork Fri Jan 15 06:53:34 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021643
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v12 30/41] btrfs: wait existing extents before truncating Date: Fri, 15 Jan 2021 15:53:34 +0900 Message-Id: <278a7c8f77cbe67e3d36a91e24c9390eec8f0a39.1610693037.git.naohiro.aota@wdc.com> When truncating a file, file buffers which have already been allocated but not yet written may be truncated. Truncating these buffers could break a sequential write pattern in a block group if the truncated blocks are, for example, followed by blocks allocated to another file. To avoid this problem, always wait for the write-out of all unwritten buffers before proceeding with the truncate.
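The wait range used by this patch starts at the new size rounded up to the filesystem sector size, ALIGN(newsize, fs_info->sectorsize), and extends to EOF ((u64)-1). A quick standalone illustration of that arithmetic, with ALIGN re-defined locally after the kernel macro (valid for power-of-two alignments only):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local re-definition of the kernel's ALIGN(): round x up to the next
 * multiple of a, where a is a power of two.
 */
#define ALIGN(x, a) (((x) + ((uint64_t)(a) - 1)) & ~((uint64_t)(a) - 1))
```

So for a truncate to 5000 bytes on a 4096-byte-sector filesystem, ordered extents are awaited from byte 8192 onward; an already-aligned size is left unchanged.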
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/inode.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index cf84fdfd6543..4b29a770bfa5 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5136,6 +5136,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr) btrfs_drew_write_unlock(&root->snapshot_lock); btrfs_end_transaction(trans); } else { + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + + if (btrfs_is_zoned(fs_info)) { + ret = btrfs_wait_ordered_range( + inode, + ALIGN(newsize, fs_info->sectorsize), + (u64)-1); + if (ret) + return ret; + } /* * We're truncating a file that used to have good data down to From patchwork Fri Jan 15 06:53:35 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021647
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 31/41] btrfs: avoid async metadata checksum on ZONED mode
Date: Fri, 15 Jan 2021 15:53:35 +0900

In ZONED mode, btrfs uses the per-FS zoned_meta_io_lock to serialize the
metadata write I/Os. Even with this serialization, write bios sent from
btree_write_cache_pages can be reordered by the async checksum workers, as
these workers are per-CPU and not per-zone.

To preserve the write bio ordering, disable async metadata checksumming on
ZONED mode. This does not result in lower performance with HDDs, as a single
CPU core is fast enough to checksum a single zone write stream at the maximum
possible bandwidth of the device. If multiple zones are written
simultaneously, HDD seek overhead lowers the achievable maximum bandwidth, so
again the per-zone checksum serialization does not affect performance.
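The decision described above can be sketched as a small standalone predicate. This is an illustrative model only, not the kernel code: `FS_ZONED`, `FS_CSUM_FAST`, and `use_async_metadata_csum()` are hypothetical stand-ins for the btrfs fs_info flags and for check_async_write() in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the btrfs fs_info flag bits. */
#define FS_ZONED     (1u << 0)  /* filesystem sits on zoned devices */
#define FS_CSUM_FAST (1u << 1)  /* hardware-accelerated checksumming */

/*
 * Mirror of the check_async_write() logic from the patch: on ZONED,
 * always checksum synchronously so the submit order of metadata bios
 * is preserved; otherwise offload to workers only when it pays off.
 */
static bool use_async_metadata_csum(unsigned int fs_flags, int sync_writers)
{
	if (fs_flags & FS_ZONED)
		return false;	/* keep per-zone write ordering */
	if (sync_writers > 0)
		return false;	/* a caller is waiting, do it inline */
	if (fs_flags & FS_CSUM_FAST)
		return false;	/* fast CPU csum, offload is not worth it */
	return true;		/* slow csum, punt to the async workers */
}
```

Note that the zoned check comes first: even with a fast checksum implementation and no sync writers, ordering wins over parallelism on a zoned device.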
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1f0523a796b4..efcf1a343732 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -814,6 +814,8 @@ static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
 static int check_async_write(struct btrfs_fs_info *fs_info,
			     struct btrfs_inode *bi)
 {
+	if (btrfs_is_zoned(fs_info))
+		return 0;
	if (atomic_read(&bi->sync_writers))
		return 0;
	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))

From patchwork Fri Jan 15 06:53:36 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 32/41] btrfs: mark block groups to copy for device-replace
Date: Fri, 15 Jan 2021 15:53:36 +0900
Message-Id: <8a15ae8f4ac407a54b44d5d1bec80342b28dd047.1610693037.git.naohiro.aota@wdc.com>

This is the 1/4 patch to support device-replace in ZONED mode.

We have two types of I/O during the device-replace process. One is an I/O to
"copy" (by the scrub functions) all the device extents from the source device
to the destination device. The other is an I/O to "clone" (by
handle_ops_on_dev_replace()) new incoming writes from users to the source
device into the target device.

Cloning incoming I/Os can break the sequential write rule on the target
device. When a write is mapped to the middle of a block group, the I/O is
directed to the middle of a target device zone, which breaks the sequential
write rule.

However, the cloning function cannot simply be disabled, since incoming I/Os
targeting already-copied device extents must be cloned so that the I/O is
executed on the target device.

We cannot use dev_replace->cursor_{left,right} to determine whether a bio is
going to a not-yet-copied region. Since there is a time gap between finishing
btrfs_scrub_dev() and rewriting the mapping tree in
btrfs_dev_replace_finishing(), we can have a newly allocated device extent
which is never cloned nor copied. So the point is to copy only
already-existing device extents.

This patch introduces mark_block_group_to_copy() to mark existing block
groups as a target of copying.
Then, handle_ops_on_dev_replace() and dev-replace can check the flag to do
their job.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.h |   1 +
 fs/btrfs/dev-replace.c | 182 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/dev-replace.h |   3 +
 fs/btrfs/scrub.c       |  17 ++++
 4 files changed, 203 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 19a22bf930c6..3dec66ed36cb 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -95,6 +95,7 @@ struct btrfs_block_group {
	unsigned int iref:1;
	unsigned int has_caching_ctl:1;
	unsigned int removed:1;
+	unsigned int to_copy:1;

	int disk_cache_state;

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 324f646d6e5e..b2a6ca206ac0 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -22,6 +22,7 @@
 #include "dev-replace.h"
 #include "sysfs.h"
 #include "zoned.h"
+#include "block-group.h"

 /*
  * Device replace overview
@@ -459,6 +460,183 @@ static char* btrfs_dev_name(struct btrfs_device *device)
	return rcu_str_deref(device->name);
 }

+static int mark_block_group_to_copy(struct btrfs_fs_info *fs_info,
+				    struct btrfs_device *src_dev)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_root *root = fs_info->dev_root;
+	struct btrfs_dev_extent *dev_extent = NULL;
+	struct btrfs_block_group *cache;
+	struct btrfs_trans_handle *trans;
+	int ret = 0;
+	u64 chunk_offset;
+
+	/* Do not use "to_copy" on non-ZONED for now */
+	if (!btrfs_is_zoned(fs_info))
+		return 0;
+
+	mutex_lock(&fs_info->chunk_mutex);
+
+	/* Ensure we don't have pending new block group */
+	spin_lock(&fs_info->trans_lock);
+	while (fs_info->running_transaction &&
+	       !list_empty(&fs_info->running_transaction->dev_update_list)) {
+		spin_unlock(&fs_info->trans_lock);
+		mutex_unlock(&fs_info->chunk_mutex);
+		trans = btrfs_attach_transaction(root);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+			mutex_lock(&fs_info->chunk_mutex);
+			if (ret == -ENOENT)
+				continue;
+			else
+				goto unlock;
+		}
+
+		ret = btrfs_commit_transaction(trans);
+		mutex_lock(&fs_info->chunk_mutex);
+		if (ret)
+			goto unlock;
+
+		spin_lock(&fs_info->trans_lock);
+	}
+	spin_unlock(&fs_info->trans_lock);
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	path->reada = READA_FORWARD;
+	path->search_commit_root = 1;
+	path->skip_locking = 1;
+
+	key.objectid = src_dev->devid;
+	key.offset = 0;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto free_path;
+	if (ret > 0) {
+		if (path->slots[0] >=
+		    btrfs_header_nritems(path->nodes[0])) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret < 0)
+				goto free_path;
+			if (ret > 0) {
+				ret = 0;
+				goto free_path;
+			}
+		} else {
+			ret = 0;
+		}
+	}
+
+	while (1) {
+		struct extent_buffer *l = path->nodes[0];
+		int slot = path->slots[0];
+
+		btrfs_item_key_to_cpu(l, &found_key, slot);
+
+		if (found_key.objectid != src_dev->devid)
+			break;
+
+		if (found_key.type != BTRFS_DEV_EXTENT_KEY)
+			break;
+
+		if (found_key.offset < key.offset)
+			break;
+
+		dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
+
+		chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);
+
+		cache = btrfs_lookup_block_group(fs_info, chunk_offset);
+		if (!cache)
+			goto skip;
+
+		spin_lock(&cache->lock);
+		cache->to_copy = 1;
+		spin_unlock(&cache->lock);
+
+		btrfs_put_block_group(cache);
+
+skip:
+		ret = btrfs_next_item(root, path);
+		if (ret != 0) {
+			if (ret > 0)
+				ret = 0;
+			break;
+		}
+	}
+
+free_path:
+	btrfs_free_path(path);
+unlock:
+	mutex_unlock(&fs_info->chunk_mutex);
+
+	return ret;
+}
+
+bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev,
+				      struct btrfs_block_group *cache,
+				      u64 physical)
+{
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	struct extent_map *em;
+	struct map_lookup *map;
+	u64 chunk_offset = cache->start;
+	int num_extents, cur_extent;
+	int i;
+
+	/* Do not use "to_copy" on non-ZONED for now */
+	if (!btrfs_is_zoned(fs_info))
+		return true;
+
+	spin_lock(&cache->lock);
+	if (cache->removed) {
+		spin_unlock(&cache->lock);
+		return true;
+	}
+	spin_unlock(&cache->lock);
+
+	em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
+	ASSERT(!IS_ERR(em));
+	map = em->map_lookup;
+
+	num_extents = cur_extent = 0;
+	for (i = 0; i < map->num_stripes; i++) {
+		/* We have more device extent to copy */
+		if (srcdev != map->stripes[i].dev)
+			continue;
+
+		num_extents++;
+		if (physical == map->stripes[i].physical)
+			cur_extent = i;
+	}
+
+	free_extent_map(em);
+
+	if (num_extents > 1 && cur_extent < num_extents - 1) {
+		/*
+		 * Has more stripes on this device. Keep this BG
+		 * readonly until we finish all the stripes.
+		 */
+		return false;
+	}
+
+	/* Last stripe on this device */
+	spin_lock(&cache->lock);
+	cache->to_copy = 0;
+	spin_unlock(&cache->lock);
+
+	return true;
+}
+
 static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
		const char *tgtdev_name, u64 srcdevid, const char *srcdev_name,
		int read_src)
@@ -500,6 +678,10 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
	if (ret)
		return ret;

+	ret = mark_block_group_to_copy(fs_info, src_device);
+	if (ret)
+		return ret;
+
	down_write(&dev_replace->rwsem);
	switch (dev_replace->replace_state) {
	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:

diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index 60b70dacc299..3911049a5f23 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -18,5 +18,8 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info);
 void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info);
 int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info);
 int __pure btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace);
+bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev,
+				      struct btrfs_block_group *cache,
+				      u64 physical);

 #endif

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3a0a6b8ed6f2..b57c1184f330 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3564,6 +3564,17 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
		if (!cache)
			goto skip;
+
+		if (sctx->is_dev_replace && btrfs_is_zoned(fs_info)) {
+			spin_lock(&cache->lock);
+			if (!cache->to_copy) {
+				spin_unlock(&cache->lock);
+				ro_set = 0;
+				goto done;
+			}
+			spin_unlock(&cache->lock);
+		}
+
		/*
		 * Make sure that while we are scrubbing the corresponding block
		 * group doesn't get its logical address and its device extents
@@ -3695,6 +3706,12 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,

		scrub_pause_off(fs_info);

+		if (sctx->is_dev_replace &&
+		    !btrfs_finish_block_group_to_copy(dev_replace->srcdev,
+						      cache, found_key.offset))
+			ro_set = 0;
+
+done:
		down_write(&dev_replace->rwsem);
		dev_replace->cursor_left = dev_replace->cursor_right;
		dev_replace->item_needs_writeback = 1;

From patchwork Fri Jan 15 06:53:37 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 33/41] btrfs: implement cloning for ZONED device-replace
Date: Fri, 15 Jan 2021 15:53:37 +0900

This is the 2/4 patch to implement device-replace for ZONED mode.

In ZONED mode, a block group must be either copied (from the source device to
the destination device) or cloned (written to both devices). This commit
implements the cloning part: if a block group targeted by an I/O is marked to
copy, we should not clone the I/O to the destination device, because the
block group is eventually copied by the replace process.

This commit also handles cloning of device resets.
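The copy-or-clone decision described above can be modeled as a pure function. This is a hedged sketch, not the kernel implementation: `replace_write_action()` and its boolean arguments are hypothetical stand-ins for the `to_copy` check performed in handle_ops_on_dev_replace().

```c
#include <stdbool.h>

/* Possible handling of a user write while a zoned replace is running. */
enum write_action {
	WRITE_SOURCE_ONLY,	/* scrub will copy the block group later */
	WRITE_CLONE_TO_TARGET	/* must mirror the write, or it is lost */
};

/*
 * A block group that existed when the replace started carries the
 * "to_copy" flag until the scrub finishes it; writes to such a group
 * must NOT be cloned, since cloning into the middle of a zone would
 * break the sequential write rule and the scrub copies it anyway.
 */
static enum write_action replace_write_action(bool bg_existed_at_start,
					      bool scrub_done_with_bg)
{
	if (bg_existed_at_start && !scrub_done_with_bg)
		return WRITE_SOURCE_ONLY;	/* to_copy still set */
	return WRITE_CLONE_TO_TARGET;		/* new BG, or copy done */
}
```

Block groups allocated after the replace started are never scanned by the scrub, which is exactly why their writes take the clone path.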
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent-tree.c | 57 +++++++++++++++++++++++++++++++-----------
 fs/btrfs/volumes.c     | 33 ++++++++++++++++++++++--
 fs/btrfs/zoned.c       | 11 ++++++++
 3 files changed, 84 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ac24a79ce32a..23d77e3196ca 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -35,6 +35,7 @@
 #include "discard.h"
 #include "rcu-string.h"
 #include "zoned.h"
+#include "dev-replace.h"

 #undef SCRAMBLE_DELAYED_REFS

@@ -1300,6 +1301,46 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
	return ret;
 }

+static int do_discard_extent(struct btrfs_bio_stripe *stripe, u64 *bytes)
+{
+	struct btrfs_device *dev = stripe->dev;
+	struct btrfs_fs_info *fs_info = dev->fs_info;
+	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+	u64 phys = stripe->physical;
+	u64 len = stripe->length;
+	u64 discarded = 0;
+	int ret = 0;
+
+	/* Zone reset in ZONED mode */
+	if (btrfs_can_zone_reset(dev, phys, len)) {
+		u64 src_disc;
+
+		ret = btrfs_reset_device_zone(dev, phys, len, &discarded);
+		if (ret)
+			goto out;
+
+		if (!btrfs_dev_replace_is_ongoing(dev_replace) ||
+		    dev != dev_replace->srcdev)
+			goto out;
+
+		src_disc = discarded;
+
+		/* send to replace target as well */
+		ret = btrfs_reset_device_zone(dev_replace->tgtdev, phys, len,
+					      &discarded);
+		discarded += src_disc;
+	} else if (blk_queue_discard(bdev_get_queue(stripe->dev->bdev))) {
+		ret = btrfs_issue_discard(dev->bdev, phys, len, &discarded);
+	} else {
+		ret = 0;
+		*bytes = 0;
+	}
+
+out:
+	*bytes = discarded;
+	return ret;
+}
+
 int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
			 u64 num_bytes, u64 *actual_bytes)
 {
@@ -1333,28 +1374,14 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
		stripe = bbio->stripes;
		for (i = 0; i < bbio->num_stripes; i++, stripe++) {
-			struct btrfs_device *dev = stripe->dev;
-			u64 physical = stripe->physical;
-			u64 length = stripe->length;
			u64 bytes;
-			struct request_queue *req_q;

			if (!stripe->dev->bdev) {
				ASSERT(btrfs_test_opt(fs_info, DEGRADED));
				continue;
			}

-			req_q = bdev_get_queue(stripe->dev->bdev);
-			/* Zone reset in ZONED mode */
-			if (btrfs_can_zone_reset(dev, physical, length))
-				ret = btrfs_reset_device_zone(dev, physical,
-							      length, &bytes);
-			else if (blk_queue_discard(req_q))
-				ret = btrfs_issue_discard(dev->bdev, physical,
-							  length, &bytes);
-			else
-				continue;
-
+			ret = do_discard_extent(stripe, &bytes);
			if (!ret) {
				discarded_bytes += bytes;
			} else if (ret != -EOPNOTSUPP) {

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c8c94e5081eb..f3ab7ff0769f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5973,9 +5973,29 @@ static int get_extra_mirror_from_replace(struct btrfs_fs_info *fs_info,
	return ret;
 }

+static bool is_block_group_to_copy(struct btrfs_fs_info *fs_info, u64 logical)
+{
+	struct btrfs_block_group *cache;
+	bool ret;
+
+	/* non-ZONED mode does not use "to_copy" flag */
+	if (!btrfs_is_zoned(fs_info))
+		return false;
+
+	cache = btrfs_lookup_block_group(fs_info, logical);
+
+	spin_lock(&cache->lock);
+	ret = cache->to_copy;
+	spin_unlock(&cache->lock);
+
+	btrfs_put_block_group(cache);
+	return ret;
+}
+
 static void handle_ops_on_dev_replace(enum btrfs_map_op op,
				      struct btrfs_bio **bbio_ret,
				      struct btrfs_dev_replace *dev_replace,
+				      u64 logical,
				      int *num_stripes_ret, int *max_errors_ret)
 {
	struct btrfs_bio *bbio = *bbio_ret;
@@ -5988,6 +6008,15 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op,
	if (op == BTRFS_MAP_WRITE) {
		int index_where_to_add;

+		/*
+		 * a block group which have "to_copy" set will
+		 * eventually copied by dev-replace process. We can
+		 * avoid cloning IO here.
+		 */
+		if (is_block_group_to_copy(dev_replace->srcdev->fs_info,
+					   logical))
+			return;
+
		/*
		 * duplicate the write operations while the dev replace
		 * procedure is running. Since the copying of the old disk to
@@ -6383,8 +6412,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,

	if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL &&
	    need_full_stripe(op)) {
-		handle_ops_on_dev_replace(op, &bbio, dev_replace, &num_stripes,
-					  &max_errors);
+		handle_ops_on_dev_replace(op, &bbio, dev_replace, logical,
+					  &num_stripes, &max_errors);
	}

	*bbio_ret = bbio;

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index d4edcc5edcfc..a50c441115ab 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -11,6 +11,7 @@
 #include "disk-io.h"
 #include "block-group.h"
 #include "transaction.h"
+#include "dev-replace.h"

 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -1039,6 +1040,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
	for (i = 0; i < map->num_stripes; i++) {
		bool is_sequential;
		struct blk_zone zone;
+		struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+		int dev_replace_is_ongoing = 0;

		device = map->stripes[i].dev;
		physical = map->stripes[i].physical;
@@ -1065,6 +1068,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
		 */
		btrfs_dev_clear_zone_empty(device, physical);

+		down_read(&dev_replace->rwsem);
+		dev_replace_is_ongoing =
+			btrfs_dev_replace_is_ongoing(dev_replace);
+		if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL)
+			btrfs_dev_clear_zone_empty(dev_replace->tgtdev,
+						   physical);
+		up_read(&dev_replace->rwsem);
+
		/*
		 * The group is mapped to a sequential zone. Get the zone write
		 * pointer to determine the allocation offset within the zone.
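The dispatch in do_discard_extent() above can be reduced to a small decision table. This sketch is illustrative only: `can_zone_reset` and `queue_discard` are hypothetical stand-ins for btrfs_can_zone_reset() and blk_queue_discard(), and `reset_target_too()` mirrors the reset-cloning condition.

```c
#include <stdbool.h>

/* How a freed extent is returned to the device. */
enum discard_action { ACT_ZONE_RESET, ACT_DISCARD, ACT_NOTHING };

/*
 * Sketch of the do_discard_extent() dispatch: prefer a zone reset on
 * ZONED devices, fall back to a block discard when supported, and
 * otherwise do nothing (reporting zero discarded bytes).
 */
static enum discard_action discard_action(bool can_zone_reset,
					  bool queue_discard)
{
	if (can_zone_reset)
		return ACT_ZONE_RESET;	/* ZONED: free by resetting the zone */
	if (queue_discard)
		return ACT_DISCARD;	/* regular device with discard support */
	return ACT_NOTHING;		/* skip silently, *bytes = 0 */
}

/*
 * A zone reset is mirrored to the replace target only when a replace
 * is running and the stripe device is the replace source.
 */
static bool reset_target_too(bool replace_ongoing, bool dev_is_src)
{
	return replace_ongoing && dev_is_src;
}
```

Mirroring the reset keeps the target's zones in the same (empty) state the copy process expects to find them in.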
From patchwork Fri Jan 15 06:53:38 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v12 34/41] btrfs: implement copying for ZONED device-replace
Date: Fri, 15 Jan 2021 15:53:38 +0900
Message-Id: <0ef506b58789744b96b46984f252aaaaca33d820.1610693037.git.naohiro.aota@wdc.com>

This is the 3/4 patch to implement device-replace on ZONED mode.

This commit implements copying, so it tracks the write pointer during the
device-replace process. Since device-replace copies only the used extents on
the source device, we have to fill the gaps between them to honor the
sequential write rule on the target device.

The device-replace process in ZONED mode must copy or clone all the extents
on the source device exactly once. So, we need to ensure that allocations
started just before the dev-replace process have their corresponding extent
information in the B-trees. finish_extent_writes_for_zoned() implements that
functionality, which is basically the code removed in commit 042528f8d840
("Btrfs: fix block group remaining RO forever after error during device
replace").
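The gap-filling rule can be sketched as a pure computation. `writer_pointer_gap()` is a hypothetical helper mirroring only the arithmetic in fill_writer_pointer_gap(), not the kernel function itself (which also issues the zeroout and advances the cached pointer).

```c
#include <stdint.h>

/*
 * A sequential zone can only be written at its write pointer. When the
 * next extent to copy starts beyond the pointer, the hole in between
 * must be zero-filled first. Returns the number of bytes to zero before
 * writing at `physical`; zero when the pointer is already there (or,
 * defensively, past it).
 */
static uint64_t writer_pointer_gap(uint64_t write_pointer, uint64_t physical)
{
	return physical > write_pointer ? physical - write_pointer : 0;
}
```

After the zeroout succeeds, the tracked write pointer is advanced to `physical`, so consecutive extents with no hole between them cost nothing extra.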
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/scrub.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.c | 12 +++++++
 fs/btrfs/zoned.h |  8 +++++
 3 files changed, 106 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index b57c1184f330..b03c3629fb12 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -166,6 +166,7 @@ struct scrub_ctx {
	int			pages_per_rd_bio;

	int			is_dev_replace;
+	u64			write_pointer;

	struct scrub_bio        *wr_curr_bio;
	struct mutex            wr_lock;
@@ -1619,6 +1620,25 @@ static int scrub_write_page_to_dev_replace(struct scrub_block *sblock,
	return scrub_add_page_to_wr_bio(sblock->sctx, spage);
 }

+static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical)
+{
+	int ret = 0;
+	u64 length;
+
+	if (!btrfs_is_zoned(sctx->fs_info))
+		return 0;
+
+	if (sctx->write_pointer < physical) {
+		length = physical - sctx->write_pointer;
+
+		ret = btrfs_zoned_issue_zeroout(sctx->wr_tgtdev,
+						sctx->write_pointer, length);
+		if (!ret)
+			sctx->write_pointer = physical;
+	}
+	return ret;
+}
+
 static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
				    struct scrub_page *spage)
 {
@@ -1641,6 +1661,13 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
	if (sbio->page_count == 0) {
		struct bio *bio;

+		ret = fill_writer_pointer_gap(sctx,
+					      spage->physical_for_dev_replace);
+		if (ret) {
+			mutex_unlock(&sctx->wr_lock);
+			return ret;
+		}
+
		sbio->physical = spage->physical_for_dev_replace;
		sbio->logical = spage->logical;
		sbio->dev = sctx->wr_tgtdev;
@@ -1705,6 +1732,10 @@ static void scrub_wr_submit(struct scrub_ctx *sctx)
	 * doubled the write performance on spinning disks when measured
	 * with Linux 3.5 */
	btrfsic_submit_bio(sbio->bio);
+
+	if (btrfs_is_zoned(sctx->fs_info))
+		sctx->write_pointer = sbio->physical +
+			sbio->page_count * PAGE_SIZE;
 }

 static void scrub_wr_bio_end_io(struct bio *bio)
@@ -3028,6 +3059,21 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
	return ret < 0 ? ret : 0;
 }

+static void sync_replace_for_zoned(struct scrub_ctx *sctx)
+{
+	if (!btrfs_is_zoned(sctx->fs_info))
+		return;
+
+	sctx->flush_all_writes = true;
+	scrub_submit(sctx);
+	mutex_lock(&sctx->wr_lock);
+	scrub_wr_submit(sctx);
+	mutex_unlock(&sctx->wr_lock);
+
+	wait_event(sctx->list_wait,
+		   atomic_read(&sctx->bios_in_flight) == 0);
+}
+
 static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
					   struct map_lookup *map,
					   struct btrfs_device *scrub_dev,
@@ -3168,6 +3214,14 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
	 */
	blk_start_plug(&plug);

+	if (sctx->is_dev_replace &&
+	    btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) {
+		mutex_lock(&sctx->wr_lock);
+		sctx->write_pointer = physical;
+		mutex_unlock(&sctx->wr_lock);
+		sctx->flush_all_writes = true;
+	}
+
	/*
	 * now find all extents for each stripe and scrub them
	 */
@@ -3356,6 +3410,9 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
			if (ret)
				goto out;

+			if (sctx->is_dev_replace)
+				sync_replace_for_zoned(sctx);
+
			if (extent_logical + extent_len <
			    key.objectid + bytes) {
				if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
@@ -3478,6 +3535,25 @@ static noinline_for_stack int scrub_chunk(struct scrub_ctx *sctx,
	return ret;
 }

+static int finish_extent_writes_for_zoned(struct btrfs_root *root,
+					  struct btrfs_block_group *cache)
+{
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	struct btrfs_trans_handle *trans;
+
+	if (!btrfs_is_zoned(fs_info))
+		return 0;
+
+	btrfs_wait_block_group_reservations(cache);
+	btrfs_wait_nocow_writers(cache);
+	btrfs_wait_ordered_roots(fs_info, U64_MAX, cache->start, cache->length);
+
+	trans = btrfs_join_transaction(root);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+	return btrfs_commit_transaction(trans);
+}
+
 static noinline_for_stack
 int scrub_enumerate_chunks(struct scrub_ctx *sctx,
			   struct btrfs_device *scrub_dev, u64 start, u64 end)
@@ -3633,6 +3709,16 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
		 * group is not RO.
		 */
		ret = btrfs_inc_block_group_ro(cache, sctx->is_dev_replace);
+		if (!ret && sctx->is_dev_replace) {
+			ret = finish_extent_writes_for_zoned(root, cache);
+			if (ret) {
+				btrfs_dec_block_group_ro(cache);
+				scrub_pause_off(fs_info);
+				btrfs_put_block_group(cache);
+				break;
+			}
+		}
+
		if (ret == 0) {
			ro_set = 1;
		} else if (ret == -ENOSPC && !sctx->is_dev_replace) {

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index a50c441115ab..9344d49f8b56 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1381,3 +1381,15 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache,
	ASSERT(cache->meta_write_pointer == eb->start + eb->len);
	cache->meta_write_pointer = eb->start;
 }
+
+int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical,
+			      u64 length)
+{
+	if (!btrfs_dev_is_sequential(device, physical))
+		return -EOPNOTSUPP;
+
+	return blkdev_issue_zeroout(device->bdev,
+				    physical >> SECTOR_SHIFT,
+				    length >> SECTOR_SHIFT,
+				    GFP_NOFS, 0);
+}

diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index a42e120158ab..a9698470c08e 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -55,6 +55,8 @@ bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
				    struct btrfs_block_group **cache_ret);
 void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache,
				     struct extent_buffer *eb);
+int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical,
+			      u64 length);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
				     struct blk_zone *zone)
@@ -169,6 +171,12 @@ static inline void btrfs_revert_meta_write_pointer(
 {
 }

+static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device,
+					    u64 physical, u64 length)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif

 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Fri Jan 15 06:53:39 2021
X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021655 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v12 35/41] btrfs: support dev-replace in ZONED mode Date: Fri, 15 Jan 2021 15:53:39 +0900 Message-Id: <30cfb6b35f69048554247f66599e821f769d60a9.1610693037.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This is the 4/4 patch to implement device-replace in ZONED mode. Even after the copying is done, the write pointers of the source and destination devices may not be synchronized. For example, when the last allocated extent is freed before the device-replace process, the extent is not copied, leaving a hole there. This patch synchronizes the write pointers by writing zeros to the destination device. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/scrub.c | 39 +++++++++++++++++++++++++ fs/btrfs/zoned.c | 74 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 9 ++++++ 3 files changed, 122 insertions(+) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index b03c3629fb12..2f577f3b1c31 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1628,6 +1628,9 @@ static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) if (!btrfs_is_zoned(sctx->fs_info)) return 0; + if (!btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) + return 0; + if (sctx->write_pointer < physical) { length = physical - sctx->write_pointer; @@ -3074,6 +3077,31 @@ static void sync_replace_for_zoned(struct scrub_ctx *sctx) atomic_read(&sctx->bios_in_flight) == 0); } +static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical, + u64 physical, u64 physical_end) +{ + struct btrfs_fs_info *fs_info = sctx->fs_info; + int ret = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); + + mutex_lock(&sctx->wr_lock); + if (sctx->write_pointer < physical_end) { + ret =
btrfs_sync_zone_write_pointer(sctx->wr_tgtdev, logical, + physical, + sctx->write_pointer); + if (ret) + btrfs_err(fs_info, "failed to recover write pointer"); + } + mutex_unlock(&sctx->wr_lock); + btrfs_dev_clear_zone_empty(sctx->wr_tgtdev, physical); + + return ret; +} + static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, struct map_lookup *map, struct btrfs_device *scrub_dev, @@ -3480,6 +3508,17 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, blk_finish_plug(&plug); btrfs_free_path(path); btrfs_free_path(ppath); + + if (sctx->is_dev_replace && ret >= 0) { + int ret2; + + ret2 = sync_write_pointer_for_zoned(sctx, base + offset, + map->stripes[num].physical, + physical_end); + if (ret2) + ret = ret2; + } + return ret < 0 ? ret : 0; } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 9344d49f8b56..ecee4a9d2127 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -12,6 +12,7 @@ #include "block-group.h" #include "transaction.h" #include "dev-replace.h" +#include "space-info.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -1393,3 +1394,76 @@ int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, length >> SECTOR_SHIFT, GFP_NOFS, 0); } + +static int read_zone_info(struct btrfs_fs_info *fs_info, u64 logical, + struct blk_zone *zone) +{ + struct btrfs_bio *bbio = NULL; + u64 mapped_length = PAGE_SIZE; + unsigned int nofs_flag; + int nmirrors; + int i, ret; + + ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical, + &mapped_length, &bbio); + if (ret || !bbio || mapped_length < PAGE_SIZE) { + btrfs_put_bbio(bbio); + return -EIO; + } + + if (bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) + return -EINVAL; + + nofs_flag = memalloc_nofs_save(); + nmirrors = (int)bbio->num_stripes; + for (i = 0; i < nmirrors; i++) { + u64 physical = bbio->stripes[i].physical; + struct btrfs_device *dev = bbio->stripes[i].dev; + + /* Missing device */ + 
if (!dev->bdev) + continue; + + ret = btrfs_get_dev_zone(dev, physical, zone); + /* Failing device */ + if (ret == -EIO || ret == -EOPNOTSUPP) + continue; + break; + } + memalloc_nofs_restore(nofs_flag); + + return ret; +} + +/* + * Synchronize write pointer in a zone at @physical_start on @tgt_dev, by + * filling zeros between @physical_pos to a write pointer of dev-replace + * source device. + */ +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos) +{ + struct btrfs_fs_info *fs_info = tgt_dev->fs_info; + struct blk_zone zone; + u64 length; + u64 wp; + int ret; + + if (!btrfs_dev_is_sequential(tgt_dev, physical_pos)) + return 0; + + ret = read_zone_info(fs_info, logical, &zone); + if (ret) + return ret; + + wp = physical_start + ((zone.wp - zone.start) << SECTOR_SHIFT); + + if (physical_pos == wp) + return 0; + + if (physical_pos > wp) + return -EUCLEAN; + + length = wp - physical_pos; + return btrfs_zoned_issue_zeroout(tgt_dev, physical_pos, length); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index a9698470c08e..8c203c0425e0 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -57,6 +57,8 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, struct extent_buffer *eb); int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length); +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -177,6 +179,13 @@ static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device, return -EOPNOTSUPP; } +static inline int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, + u64 logical, u64 physical_start, + u64 physical_pos) +{ + return -EOPNOTSUPP; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From 
patchwork Fri Jan 15 06:53:40 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021657 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v12 36/41] btrfs: enable relocation in ZONED mode Date: Fri, 15 Jan 2021 15:53:40 +0900 Message-Id: <6bda928563e2db015bdee6cd277ac83852bc6054.1610693037.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org To serialize allocation and submit_bio, we introduced a mutex around them. As a result, preallocation must be completely disabled to avoid a deadlock. Since the current relocation process relies on preallocation to move file data extents, it must be handled in another way. In ZONED mode, we just truncate the inode to the size that we wanted to preallocate. Then, we flush the dirty pages on the file before finishing the relocation process. run_delalloc_zoned() will handle all the allocation and submit the IOs to the underlying layers. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/relocation.c | 34 ++++++++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 30a80669647f..ee10cfd590ea 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2553,6 +2553,31 @@ static noinline_for_stack int prealloc_file_extent_cluster( if (ret) return ret; + /* + * In ZONED mode, we cannot preallocate the file region. Instead, we + * dirty and fiemap_write the region.
+ */ + if (btrfs_is_zoned(inode->root->fs_info)) { + struct btrfs_root *root = inode->root; + struct btrfs_trans_handle *trans; + + end = cluster->end - offset + 1; + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + inode->vfs_inode.i_ctime = current_time(&inode->vfs_inode); + i_size_write(&inode->vfs_inode, end); + ret = btrfs_update_inode(trans, root, inode); + if (ret) { + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + return ret; + } + + return btrfs_end_transaction(trans); + } + inode_lock(&inode->vfs_inode); for (nr = 0; nr < cluster->nr; nr++) { start = cluster->boundary[nr] - offset; @@ -2749,6 +2774,8 @@ static int relocate_file_extent_cluster(struct inode *inode, } } WARN_ON(nr != cluster->nr); + if (btrfs_is_zoned(fs_info) && !ret) + ret = btrfs_wait_ordered_range(inode, 0, (u64)-1); out: kfree(ra); return ret; @@ -3384,8 +3411,12 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, struct btrfs_path *path; struct btrfs_inode_item *item; struct extent_buffer *leaf; + u64 flags = BTRFS_INODE_NOCOMPRESS | BTRFS_INODE_PREALLOC; int ret; + if (btrfs_is_zoned(trans->fs_info)) + flags &= ~BTRFS_INODE_PREALLOC; + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -3400,8 +3431,7 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, btrfs_set_inode_generation(leaf, item, 1); btrfs_set_inode_size(leaf, item, 0); btrfs_set_inode_mode(leaf, item, S_IFREG | 0600); - btrfs_set_inode_flags(leaf, item, BTRFS_INODE_NOCOMPRESS | - BTRFS_INODE_PREALLOC); + btrfs_set_inode_flags(leaf, item, flags); btrfs_mark_buffer_dirty(leaf); out: btrfs_free_path(path); From patchwork Fri Jan 15 06:53:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021659 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v12 37/41] btrfs: relocate block group to repair IO failure in ZONED Date: Fri, 15 Jan 2021 15:53:41 +0900 Message-Id: <7daa3aa0dc8a454a49b81380fd6b8a9bb19237a9.1610693037.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org When btrfs finds a checksum error and the file system has a mirror of the damaged data, btrfs reads the correct data from the mirror and writes it back to the damaged blocks. This repairing, however, violates the sequential write requirement of zoned devices.
We can consider three methods to repair an IO failure in ZONED mode: (1) reset and rewrite the damaged zone, (2) allocate a new device extent and replace the damaged device extent with the new one, or (3) relocate the corresponding block group. Method (1) is the most similar to the behavior on regular devices. However, it also wipes non-damaged data in the same device extent, so it unnecessarily degrades non-damaged data. Method (2) is much like device replacing, but done within the same device. It is safe because it keeps the device extent until the replacement finishes. However, extending device replacing this way is non-trivial. It assumes "src_dev->physical == dst_dev->physical". Also, the extent mapping replacing function would have to be extended to support moving a device extent within one device. Method (3) invokes relocation of the damaged block group, so it is straightforward to implement. It relocates all the mirrored device extents, so it is potentially a more costly operation than method (1) or (2). But it relocates only the used extents, which reduces the total IO size. Let's apply method (3) for now. In the future, we can extend device-replace and apply method (2). To protect a block group from being relocated multiple times by multiple IO errors, this commit introduces a "relocating_repair" bit to show that the group is already being relocated to repair IO failures. It also uses a new kthread, "btrfs-relocating-repair", so the relocation does not block the IO path. This commit also supports repairing in the scrub process.
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/block-group.h | 1 + fs/btrfs/extent_io.c | 3 ++ fs/btrfs/scrub.c | 3 ++ fs/btrfs/volumes.c | 71 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.h | 1 + 5 files changed, 79 insertions(+) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 3dec66ed36cb..36654bcd2a83 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -96,6 +96,7 @@ struct btrfs_block_group { unsigned int has_caching_ctl:1; unsigned int removed:1; unsigned int to_copy:1; + unsigned int relocating_repair:1; int disk_cache_state; diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 3d004bae2fa2..c4453cfcbf14 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2260,6 +2260,9 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start, ASSERT(!(fs_info->sb->s_flags & SB_RDONLY)); BUG_ON(!mirror_num); + if (btrfs_is_zoned(fs_info)) + return btrfs_repair_one_zone(fs_info, logical); + bio = btrfs_io_bio_alloc(1); bio->bi_iter.bi_size = 0; map_length = length; diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 2f577f3b1c31..d0c47ef72d46 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -857,6 +857,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) have_csum = sblock_to_check->pagev[0]->have_csum; dev = sblock_to_check->pagev[0]->dev; + if (btrfs_is_zoned(fs_info) && !sctx->is_dev_replace) + return btrfs_repair_one_zone(fs_info, logical); + /* * We must use GFP_NOFS because the scrub task might be waiting for a * worker task executing this function and in turn a transaction commit diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f3ab7ff0769f..dbcc4b66972d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7990,3 +7990,74 @@ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr) spin_unlock(&fs_info->swapfile_pins_lock); return node != NULL; } + +static int relocating_repair_kthread(void 
*data) +{ + struct btrfs_block_group *cache = (struct btrfs_block_group *) data; + struct btrfs_fs_info *fs_info = cache->fs_info; + u64 target; + int ret = 0; + + target = cache->start; + btrfs_put_block_group(cache); + + if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) { + btrfs_info(fs_info, + "zoned: skip relocating block group %llu to repair: EBUSY", + target); + return -EBUSY; + } + + mutex_lock(&fs_info->delete_unused_bgs_mutex); + + /* Ensure Block Group still exists */ + cache = btrfs_lookup_block_group(fs_info, target); + if (!cache) + goto out; + + if (!cache->relocating_repair) + goto out; + + ret = btrfs_may_alloc_data_chunk(fs_info, target); + if (ret < 0) + goto out; + + btrfs_info(fs_info, "zoned: relocating block group %llu to repair IO failure", + target); + ret = btrfs_relocate_chunk(fs_info, target); + +out: + if (cache) + btrfs_put_block_group(cache); + mutex_unlock(&fs_info->delete_unused_bgs_mutex); + btrfs_exclop_finish(fs_info); + + return ret; +} + +int btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical) +{ + struct btrfs_block_group *cache; + + /* Do not attempt to repair in degraded state */ + if (btrfs_test_opt(fs_info, DEGRADED)) + return 0; + + cache = btrfs_lookup_block_group(fs_info, logical); + if (!cache) + return 0; + + spin_lock(&cache->lock); + if (cache->relocating_repair) { + spin_unlock(&cache->lock); + btrfs_put_block_group(cache); + return 0; + } + cache->relocating_repair = 1; + spin_unlock(&cache->lock); + + kthread_run(relocating_repair_kthread, cache, + "btrfs-relocating-repair"); + + return 0; +} diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 0bcf87a9e594..54f475e0c702 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -597,5 +597,6 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, int btrfs_bg_type_to_factor(u64 flags); const char *btrfs_bg_type_to_raid_name(u64 flags); int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info); +int 
btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical); #endif From patchwork Fri Jan 15 06:53:42 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12021661 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota , Josef Bacik , Johannes Thumshirn Subject: [PATCH v12 38/41] btrfs: split alloc_log_tree() Date: Fri, 15 Jan 2021 15:53:42 +0900 Message-Id: <55c972c153fbb041a0a0c1dda329da7d76e4432b.1610693037.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This is a preparation for the next patch. This commit splits alloc_log_tree() into the part allocating the tree structure (which remains in alloc_log_tree()) and the part allocating the tree node (moved into btrfs_alloc_log_tree_node()). The latter is also exported to be used in the next patch. Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 33 +++++++++++++++++++++++++------ fs/btrfs/disk-io.h | 2 ++ 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index efcf1a343732..dc0ddd097c6e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1197,7 +1197,6 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { struct btrfs_root *root; - struct extent_buffer *leaf; root = btrfs_alloc_root(fs_info, BTRFS_TREE_LOG_OBJECTID, GFP_NOFS); if (!root) @@ -1207,6 +1206,14 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, root->root_key.type = BTRFS_ROOT_ITEM_KEY; root->root_key.offset = BTRFS_TREE_LOG_OBJECTID; + return root; +} + +int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, + struct btrfs_root *root) +{ + struct extent_buffer *leaf; + /* * DON'T set SHAREABLE bit for log trees.
* @@ -1219,26 +1226,33 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, leaf = btrfs_alloc_tree_block(trans, root, 0, BTRFS_TREE_LOG_OBJECTID, NULL, 0, 0, 0, BTRFS_NESTING_NORMAL); - if (IS_ERR(leaf)) { - btrfs_put_root(root); - return ERR_CAST(leaf); - } + if (IS_ERR(leaf)) + return PTR_ERR(leaf); root->node = leaf; btrfs_mark_buffer_dirty(root->node); btrfs_tree_unlock(root->node); - return root; + + return 0; } int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { struct btrfs_root *log_root; + int ret; log_root = alloc_log_tree(trans, fs_info); if (IS_ERR(log_root)) return PTR_ERR(log_root); + + ret = btrfs_alloc_log_tree_node(trans, log_root); + if (ret) { + btrfs_put_root(log_root); + return ret; + } + WARN_ON(fs_info->log_root_tree); fs_info->log_root_tree = log_root; return 0; @@ -1250,11 +1264,18 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *log_root; struct btrfs_inode_item *inode_item; + int ret; log_root = alloc_log_tree(trans, fs_info); if (IS_ERR(log_root)) return PTR_ERR(log_root); + ret = btrfs_alloc_log_tree_node(trans, log_root); + if (ret) { + btrfs_put_root(log_root); + return ret; + } + log_root->last_trans = trans->transid; log_root->root_key.offset = root->root_key.objectid; diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 9f4a2a1e3d36..0e7e9526b6a8 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -120,6 +120,8 @@ blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio, extent_submit_bio_start_t *submit_bio_start); blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio, int mirror_num); +int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, + struct btrfs_root *root); int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int btrfs_add_log_tree(struct btrfs_trans_handle *trans, From patchwork 
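The two-phase shape this patch introduces (allocate the tree structure first, allocate its root node separately so a later patch can defer it) can be sketched as a small userspace model. The `model_*` names are hypothetical stand-ins, not btrfs APIs:

```c
#include <stdlib.h>
#include <errno.h>

/* Stand-in for struct btrfs_root: the tree structure plus its
 * (initially absent) root node buffer. */
struct log_tree {
	int objectid;
	char *node;		/* stands in for struct extent_buffer */
};

/* Phase 1, mirroring alloc_log_tree(): allocate and key the tree
 * structure only; no node is allocated yet. */
struct log_tree *model_alloc_log_tree(int objectid)
{
	struct log_tree *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	t->objectid = objectid;
	return t;		/* t->node is still NULL */
}

/* Phase 2, mirroring btrfs_alloc_log_tree_node(): allocate the root
 * node, returning 0 or a negative errno.  On failure the caller frees
 * the tree, like the btrfs_put_root() in the patched error paths. */
int model_alloc_log_tree_node(struct log_tree *t)
{
	t->node = malloc(4096);
	if (!t->node)
		return -ENOMEM;
	return 0;
}
```

Callers compose the two phases and own cleanup on node-allocation failure, which is exactly what btrfs_init_log_root_tree() and btrfs_add_log_tree() do after this patch.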
From patchwork Fri Jan 15 06:53:43 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021663
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik, Johannes Thumshirn
Subject: [PATCH v12 39/41] btrfs: extend zoned allocator to use dedicated tree-log block group
Date: Fri, 15 Jan 2021 15:53:43 +0900
Message-Id: <57606df632b5db50c7de22ce947f21f09ace4232.1610693037.git.naohiro.aota@wdc.com>

This is the 1/3 patch to enable tree log on ZONED mode.

The tree-log feature does not work on ZONED mode as is. Blocks for a
tree-log tree are allocated mixed with other metadata blocks, and btrfs
writes and syncs the tree-log blocks to devices at fsync() time, which
is a different timing from a global transaction commit. As a result,
both writing tree-log blocks and writing other metadata blocks become
non-sequential writes, which ZONED mode must avoid.

Introducing a dedicated block group for tree-log blocks separates the
tree-log write stream from the other metadata write streams, so each
stream can be written to the devices sequentially.

"fs_info->treelog_bg" tracks the dedicated block group, and btrfs
assigns "treelog_bg" on demand at tree-log block allocation time. This
commit extends the zoned block allocator to use the block group.
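The exclusion rule described above (once a dedicated group exists, tree-log allocations may only use it and everything else must stay out of it) reduces to a small predicate. Below is a hedged userspace model of the `skip` test this patch adds to do_allocation_zoned(); the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the skip test in do_allocation_zoned(): log_bytenr is the
 * start of the dedicated tree-log block group (0 if none yet), bytenr
 * is the start of the candidate block group, and for_treelog tells
 * whether this allocation is for the tree-log.  Returning true means
 * "skip this block group".
 */
bool model_skip_block_group(uint64_t bytenr, uint64_t log_bytenr,
			    bool for_treelog)
{
	return log_bytenr != 0 &&
	       ((for_treelog && bytenr != log_bytenr) ||
		(!for_treelog && bytenr == log_bytenr));
}
```

While no dedicated group has been assigned (`log_bytenr == 0`), nothing is skipped; the real code then claims the first suitable empty block group as `treelog_bg` under `fs_info->treelog_bg_lock`.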
Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c |  2 ++
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/disk-io.c     |  1 +
 fs/btrfs/extent-tree.c | 75 +++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/zoned.h       | 14 ++++++++
 5 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 7083189884de..b98a49041b51 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -902,6 +902,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 		btrfs_return_cluster_to_free_space(block_group, cluster);
 	spin_unlock(&cluster->refill_lock);
 
+	btrfs_clear_treelog_bg(block_group);
+
 	path = btrfs_alloc_path();
 	if (!path) {
 		ret = -ENOMEM;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1085f8d9752b..b4485ea90805 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -977,6 +977,8 @@ struct btrfs_fs_info {
 	/* Max size to emit ZONE_APPEND write command */
 	u64 max_zone_append_size;
 	struct mutex zoned_meta_io_lock;
+	spinlock_t treelog_bg_lock;
+	u64 treelog_bg;
 
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dc0ddd097c6e..12c23cb410fd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2722,6 +2722,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	spin_lock_init(&fs_info->super_lock);
 	spin_lock_init(&fs_info->buffer_lock);
 	spin_lock_init(&fs_info->unused_bgs_lock);
+	spin_lock_init(&fs_info->treelog_bg_lock);
 	rwlock_init(&fs_info->tree_mod_log_lock);
 	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	mutex_init(&fs_info->delete_unused_bgs_mutex);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 23d77e3196ca..52fd3090f06a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3590,6 +3590,9 @@ struct find_free_extent_ctl {
 	bool have_caching_bg;
 	bool orig_have_caching_bg;
 
+	/* Allocation is called for tree-log */
+	bool for_treelog;
+
 	/* RAID index, converted from flags */
 	int index;
@@ -3818,6 +3821,22 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group,
 	return find_free_extent_unclustered(block_group, ffe_ctl);
 }
 
+/*
+ * Tree-log Block Group Locking
+ * ============================
+ *
+ * fs_info::treelog_bg_lock protects the fs_info::treelog_bg which
+ * indicates the starting address of a block group, which is reserved only
+ * for tree-log metadata.
+ *
+ * Lock nesting
+ * ============
+ *
+ * space_info::lock
+ *   block_group::lock
+ *     fs_info::treelog_bg_lock
+ */
+
 /*
  * Simple allocator for sequential only block group. It only allows
  * sequential allocation. No need to play with trees. This function
@@ -3827,23 +3846,54 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 			       struct find_free_extent_ctl *ffe_ctl,
 			       struct btrfs_block_group **bg_ret)
 {
+	struct btrfs_fs_info *fs_info = block_group->fs_info;
 	struct btrfs_space_info *space_info = block_group->space_info;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	u64 start = block_group->start;
 	u64 num_bytes = ffe_ctl->num_bytes;
 	u64 avail;
+	u64 bytenr = block_group->start;
+	u64 log_bytenr;
 	int ret = 0;
+	bool skip;
 
 	ASSERT(btrfs_is_zoned(block_group->fs_info));
 
+	/*
+	 * Do not allow non-tree-log blocks in the dedicated tree-log block
+	 * group, and vice versa.
+	 */
+	spin_lock(&fs_info->treelog_bg_lock);
+	log_bytenr = fs_info->treelog_bg;
+	skip = log_bytenr && ((ffe_ctl->for_treelog && bytenr != log_bytenr) ||
+			      (!ffe_ctl->for_treelog && bytenr == log_bytenr));
+	spin_unlock(&fs_info->treelog_bg_lock);
+	if (skip)
+		return 1;
+
 	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
+	spin_lock(&fs_info->treelog_bg_lock);
+
+	ASSERT(!ffe_ctl->for_treelog ||
+	       block_group->start == fs_info->treelog_bg ||
+	       fs_info->treelog_bg == 0);
 
 	if (block_group->ro) {
 		ret = 1;
 		goto out;
 	}
 
+	/*
+	 * Do not allow currently using block group to be tree-log dedicated
+	 * block group.
+	 */
+	if (ffe_ctl->for_treelog && !fs_info->treelog_bg &&
+	    (block_group->used || block_group->reserved)) {
+		ret = 1;
+		goto out;
+	}
+
 	avail = block_group->length - block_group->alloc_offset;
 	if (avail < num_bytes) {
 		ffe_ctl->max_extent_size = avail;
@@ -3851,6 +3901,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 		goto out;
 	}
 
+	if (ffe_ctl->for_treelog && !fs_info->treelog_bg)
+		fs_info->treelog_bg = block_group->start;
+
 	ffe_ctl->found_offset = start + block_group->alloc_offset;
 	block_group->alloc_offset += num_bytes;
 	spin_lock(&ctl->tree_lock);
@@ -3865,6 +3918,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	ffe_ctl->search_start = ffe_ctl->found_offset;
 
 out:
+	if (ret && ffe_ctl->for_treelog)
+		fs_info->treelog_bg = 0;
+	spin_unlock(&fs_info->treelog_bg_lock);
 	spin_unlock(&block_group->lock);
 	spin_unlock(&space_info->lock);
 	return ret;
@@ -4114,7 +4170,12 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info,
 		return prepare_allocation_clustered(fs_info, ffe_ctl,
 						    space_info, ins);
 	case BTRFS_EXTENT_ALLOC_ZONED:
-		/* nothing to do */
+		if (ffe_ctl->for_treelog) {
+			spin_lock(&fs_info->treelog_bg_lock);
+			if (fs_info->treelog_bg)
+				ffe_ctl->hint_byte = fs_info->treelog_bg;
+			spin_unlock(&fs_info->treelog_bg_lock);
+		}
 		return 0;
 	default:
 		BUG();
@@ -4158,6 +4219,7 @@ static noinline int find_free_extent(struct btrfs_root *root,
 	struct find_free_extent_ctl ffe_ctl = {0};
 	struct btrfs_space_info *space_info;
 	bool full_search = false;
+	bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID;
 
 	WARN_ON(num_bytes < fs_info->sectorsize);
@@ -4171,6 +4233,7 @@ static noinline int find_free_extent(struct btrfs_root *root,
 	ffe_ctl.orig_have_caching_bg = false;
 	ffe_ctl.found_offset = 0;
 	ffe_ctl.hint_byte = hint_byte_orig;
+	ffe_ctl.for_treelog = for_treelog;
 	ffe_ctl.policy = BTRFS_EXTENT_ALLOC_CLUSTERED;
 
 	/* For clustered allocation */
@@ -4245,8 +4308,11 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		struct btrfs_block_group *bg_ret;
 
 		/* If the block group is read-only, we can skip it entirely. */
-		if (unlikely(block_group->ro))
+		if (unlikely(block_group->ro)) {
+			if (for_treelog)
+				btrfs_clear_treelog_bg(block_group);
 			continue;
+		}
 
 		btrfs_grab_block_group(block_group, delalloc);
 		ffe_ctl.search_start = block_group->start;
@@ -4434,6 +4500,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 	bool final_tried = num_bytes == min_alloc_size;
 	u64 flags;
 	int ret;
+	bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID;
 
 	flags = get_alloc_profile_by_root(root, is_data);
 again:
@@ -4457,8 +4524,8 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 			sinfo = btrfs_find_space_info(fs_info, flags);
 			btrfs_err(fs_info,
-				  "allocation failed flags %llu, wanted %llu",
-				  flags, num_bytes);
+				  "allocation failed flags %llu, wanted %llu treelog %d",
+				  flags, num_bytes, for_treelog);
 			if (sinfo)
 				btrfs_dump_space_info(fs_info, sinfo,
 						      num_bytes, 1);
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 8c203c0425e0..52789da61fa3 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -7,6 +7,7 @@
 #include
 #include "volumes.h"
 #include "disk-io.h"
+#include "block-group.h"
 
 struct btrfs_zoned_device_info {
 	/*
@@ -292,4 +293,17 @@ static inline void btrfs_zoned_meta_io_unlock(struct btrfs_fs_info *fs_info)
 	mutex_unlock(&fs_info->zoned_meta_io_lock);
 }
 
+static inline void btrfs_clear_treelog_bg(struct btrfs_block_group *bg)
+{
+	struct btrfs_fs_info *fs_info = bg->fs_info;
+
+	if (!btrfs_is_zoned(fs_info))
+		return;
+
+	spin_lock(&fs_info->treelog_bg_lock);
+	if (fs_info->treelog_bg == bg->start)
+		fs_info->treelog_bg = 0;
+	spin_unlock(&fs_info->treelog_bg_lock);
+}
+
 #endif
From patchwork Fri Jan 15 06:53:44 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021757
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v12 40/41] btrfs: serialize log transaction on ZONED mode
Date: Fri, 15 Jan 2021 15:53:44 +0900
Message-Id: <2e011d54e1b85437c27b40201fcf2901e073d1ce.1610693037.git.naohiro.aota@wdc.com>

This is the 2/3 patch to enable tree-log on ZONED mode.
Since we can start more than one log transaction per subvolume
simultaneously, nodes from multiple transactions can be allocated
interleaved. Such mixed allocation results in non-sequential writes at
log transaction commit time. The nodes of the global log root tree
(fs_info->log_root_tree) have the same mixed-allocation problem.

This patch serializes log transactions by waiting for a committing
transaction when someone tries to start a new one, to avoid the mixed
allocation problem. We must also wait for running log transactions of
other subvolumes, but there is no easy way to detect which subvolume
root is running a log transaction. So, this patch forbids starting a
new log transaction when another subvolume has already allocated the
global log root tree.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/tree-log.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 930e752686b4..71a1c0b5bc26 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -105,6 +105,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
 				       struct btrfs_root *log,
 				       struct btrfs_path *path,
 				       u64 dirid, int del_all);
+static void wait_log_commit(struct btrfs_root *root, int transid);
 
 /*
  * tree logging is a special write ahead log used to make sure that
@@ -140,6 +141,7 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_root *tree_root = fs_info->tree_root;
+	const bool zoned = btrfs_is_zoned(fs_info);
 	int ret = 0;
 
 	/*
@@ -160,12 +162,20 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
 
 	mutex_lock(&root->log_mutex);
 
+again:
 	if (root->log_root) {
+		int index = (root->log_transid + 1) % 2;
+
 		if (btrfs_need_log_full_commit(trans)) {
 			ret = -EAGAIN;
 			goto out;
 		}
 
+		if (zoned && atomic_read(&root->log_commit[index])) {
+			wait_log_commit(root, root->log_transid - 1);
+			goto again;
+		}
+
 		if (!root->log_start_pid) {
 			clear_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state);
 			root->log_start_pid = current->pid;
@@ -173,6 +183,17 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
 			set_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state);
 		}
 	} else {
+		if (zoned) {
+			mutex_lock(&fs_info->tree_log_mutex);
+			if (fs_info->log_root_tree)
+				ret = -EAGAIN;
+			else
+				ret = btrfs_init_log_root_tree(trans, fs_info);
+			mutex_unlock(&fs_info->tree_log_mutex);
+		}
+		if (ret)
+			goto out;
+
 		ret = btrfs_add_log_tree(trans, root);
 		if (ret)
 			goto out;
@@ -201,14 +222,22 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
  */
 static int join_running_log_trans(struct btrfs_root *root)
 {
+	const bool zoned = btrfs_is_zoned(root->fs_info);
 	int ret = -ENOENT;
 
 	if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &root->state))
 		return ret;
 
 	mutex_lock(&root->log_mutex);
+again:
 	if (root->log_root) {
+		int index = (root->log_transid + 1) % 2;
+
 		ret = 0;
+		if (zoned && atomic_read(&root->log_commit[index])) {
+			wait_log_commit(root, root->log_transid - 1);
+			goto again;
+		}
 		atomic_inc(&root->log_writers);
 	}
 	mutex_unlock(&root->log_mutex);
From patchwork Fri Jan 15 06:53:45 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12021759
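The wait condition this patch adds can be modeled in isolation. btrfs keeps two log transactions in flight per subvolume (indexed by `log_transid % 2`); on zoned filesystems a starter must wait while the *other* slot is still committing so node allocations do not interleave. This is a hedged userspace sketch with a hypothetical function name, not the kernel code itself:

```c
#include <stdbool.h>

/*
 * Model of the gate added to start_log_trans()/join_running_log_trans():
 * committing[] mirrors root->log_commit[0..1], and the caller must wait
 * (then re-check, the "goto again" loop) when the other transaction
 * slot is still being committed on a zoned filesystem.
 */
bool model_must_wait(bool zoned, int log_transid, const bool committing[2])
{
	int index = (log_transid + 1) % 2;	/* the "other" slot */

	return zoned && committing[index];
}
```

On non-zoned filesystems the function always returns false, matching the unchanged behavior there.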
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik, Johannes Thumshirn
Subject: [PATCH v12 41/41] btrfs: reorder log node allocation
Date: Fri, 15 Jan 2021 15:53:45 +0900
Message-Id: <2bec51e7fed71bf3b386360686b8bd25b1c81e62.1610693037.git.naohiro.aota@wdc.com>

This is the 3/3 patch to enable tree-log on ZONED mode.

The nodes of "fs_info->log_root_tree" and the nodes of "root->log_root"
are not allocated in the same order as they are written out, so the
writes cause unaligned write errors. This patch reorders the allocation
by delaying allocation of the root node of "fs_info->log_root_tree", so
that the node buffers can go out to the devices sequentially.
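The reordering amounts to lazy, first-use allocation of the log root tree's node at sync time instead of at tree-creation time. A minimal userspace model, with hypothetical `model_*` names standing in for the btrfs helpers:

```c
#include <stdlib.h>

/* Stands in for fs_info->log_root_tree with its root node buffer. */
struct model_tree {
	char *node;		/* stands in for log_root_tree->node */
};

static int model_node_allocs;	/* counts node-block allocations */

/* Tree-creation time (btrfs_init_log_root_tree() after this patch):
 * only the structure is set up; no node is allocated here anymore. */
struct model_tree *model_init_log_root_tree(void)
{
	return calloc(1, sizeof(struct model_tree));
}

/* Sync time (the block moved into btrfs_sync_log()): allocate the root
 * node on first use only, so its block is obtained in write order.
 * The real code does this under fs_info->tree_log_mutex. */
int model_sync_log(struct model_tree *t)
{
	if (!t->node) {
		t->node = malloc(4096);
		if (!t->node)
			return -1;	/* -ENOMEM in the kernel */
		model_node_allocs++;
	}
	return 0;
}
```

Repeated syncs reuse the already-allocated node, which is why free_log_tree() must now tolerate a tree whose node was never allocated (`if (log->node)`).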
Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c  |  7 -------
 fs/btrfs/tree-log.c | 24 ++++++++++++++++++------
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 12c23cb410fd..0b403affa59c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1241,18 +1241,11 @@ int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
 			     struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *log_root;
-	int ret;
 
 	log_root = alloc_log_tree(trans, fs_info);
 	if (IS_ERR(log_root))
 		return PTR_ERR(log_root);
 
-	ret = btrfs_alloc_log_tree_node(trans, log_root);
-	if (ret) {
-		btrfs_put_root(log_root);
-		return ret;
-	}
-
 	WARN_ON(fs_info->log_root_tree);
 	fs_info->log_root_tree = log_root;
 	return 0;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 71a1c0b5bc26..d8315363dc1e 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3159,6 +3159,16 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 	list_add_tail(&root_log_ctx.list, &log_root_tree->log_ctxs[index2]);
 	root_log_ctx.log_transid = log_root_tree->log_transid;
 
+	mutex_lock(&fs_info->tree_log_mutex);
+	if (!log_root_tree->node) {
+		ret = btrfs_alloc_log_tree_node(trans, log_root_tree);
+		if (ret) {
+			mutex_unlock(&fs_info->tree_log_mutex);
+			goto out;
+		}
+	}
+	mutex_unlock(&fs_info->tree_log_mutex);
+
 	/*
 	 * Now we are safe to update the log_root_tree because we're under the
 	 * log_mutex, and we're a current writer so we're holding the commit
@@ -3317,12 +3327,14 @@ static void free_log_tree(struct btrfs_trans_handle *trans,
 		.process_func = process_one_buffer
 	};
 
-	ret = walk_log_tree(trans, log, &wc);
-	if (ret) {
-		if (trans)
-			btrfs_abort_transaction(trans, ret);
-		else
-			btrfs_handle_fs_error(log->fs_info, ret, NULL);
+	if (log->node) {
+		ret = walk_log_tree(trans, log, &wc);
+		if (ret) {
+			if (trans)
+				btrfs_abort_transaction(trans, ret);
+			else
+				btrfs_handle_fs_error(log->fs_info, ret,
+						      NULL);
+		}
 	}
 
 	clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1,