From patchwork Tue Nov 10 11:26:04 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894007
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn
Subject: [PATCH v10 01/41] block: add bio_add_zone_append_page
Date: Tue, 10 Nov 2020 20:26:04 +0900
Message-Id: <01fbaba7b2f2404489b5779e1719ebf3d062aadc.1605007036.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Johannes Thumshirn

Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
is intended to be used by file systems that directly add pages to a bio
instead of using bio_iov_iter_get_pages().

Cc: Jens Axboe
Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
---
 block/bio.c         | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/bio.h |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 58d765400226..c8943201c26c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -853,6 +853,44 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio,
 }
 EXPORT_SYMBOL(bio_add_pc_page);
 
+/**
+ * bio_add_zone_append_page - attempt to add page to zone-append bio
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist of a bio that will be submitted
+ * for a zone-append request. This can fail for a number of reasons, such as the
+ * bio being full, the target block device not being a zoned block device, or
+ * other limitations of the target block device. The target block device must
+ * allow bios up to PAGE_SIZE, so it is always possible to add a single page
+ * to an empty bio.
+ *
+ * Returns: number of bytes added to the bio, or 0 in case of a failure.
+ */
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset)
+{
+	struct request_queue *q;
+	bool same_page = false;
+
+	if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND))
+		return 0;
+
+	if (WARN_ON_ONCE(!bio->bi_disk))
+		return 0;
+
+	q = bio->bi_disk->queue;
+
+	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
+		return 0;
+
+	return bio_add_hw_page(q, bio, page, len, offset,
+			       queue_max_zone_append_sectors(q), &same_page);
+}
+EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
+
 /**
  * __bio_try_merge_page - try appending data to an existing bvec.
  * @bio: destination bio

diff --git a/include/linux/bio.h b/include/linux/bio.h
index c6d765382926..7ef300cb4e9a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -442,6 +442,8 @@ void bio_chain(struct bio *, struct bio *);
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *,
			   struct page *, unsigned int, unsigned int);
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset);
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
		unsigned int len, unsigned int off, bool *same_page);
 void __bio_add_page(struct bio *bio, struct page *page,

From patchwork Tue Nov 10 11:26:05 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894011
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v10 02/41] iomap: support REQ_OP_ZONE_APPEND
Date: Tue, 10 Nov 2020 20:26:05 +0900
Message-Id: <72734501cc1d9e08117c215ed60f7b38e3665f14.1605007036.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0

A ZONE_APPEND bio must follow hardware restrictions (e.g. not exceeding
max_zone_append_sectors) so that it is not split. bio_iov_iter_get_pages()
builds such a restricted bio using __bio_iov_append_get_pages() if
bio_op(bio) == REQ_OP_ZONE_APPEND. To utilize it, we need to set the
bio_op before calling bio_iov_iter_get_pages().

This commit introduces IOMAP_F_ZONE_APPEND, so that iomap users can set
the flag to indicate they want REQ_OP_ZONE_APPEND and a restricted bio.

Signed-off-by: Naohiro Aota
Reviewed-by: Darrick J. Wong
---
 fs/iomap/direct-io.c  | 41 +++++++++++++++++++++++++++++++++++------
 include/linux/iomap.h |  1 +
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index c1aafb2ab990..f04572a55a09 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -200,6 +200,34 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	iomap_dio_submit_bio(dio, iomap, bio, pos);
 }
 
+/*
+ * Figure out the bio's operation flags from the dio request, the
+ * mapping, and whether or not we want FUA. Note that we can end up
+ * clearing the WRITE_FUA flag in the dio request.
+ */
+static inline unsigned int
+iomap_dio_bio_opflags(struct iomap_dio *dio, struct iomap *iomap, bool use_fua)
+{
+	unsigned int opflags = REQ_SYNC | REQ_IDLE;
+
+	if (!(dio->flags & IOMAP_DIO_WRITE)) {
+		WARN_ON_ONCE(iomap->flags & IOMAP_F_ZONE_APPEND);
+		return REQ_OP_READ;
+	}
+
+	if (iomap->flags & IOMAP_F_ZONE_APPEND)
+		opflags |= REQ_OP_ZONE_APPEND;
+	else
+		opflags |= REQ_OP_WRITE;
+
+	if (use_fua)
+		opflags |= REQ_FUA;
+	else
+		dio->flags &= ~IOMAP_DIO_WRITE_FUA;
+
+	return opflags;
+}
+
 static loff_t
 iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		struct iomap_dio *dio, struct iomap *iomap)
@@ -278,6 +306,13 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
 
+		/*
+		 * Set the operation flags early so that bio_iov_iter_get_pages
+		 * can set up the page vector appropriately for a ZONE_APPEND
+		 * operation.
+		 */
+		bio->bi_opf = iomap_dio_bio_opflags(dio, iomap, use_fua);
+
 		ret = bio_iov_iter_get_pages(bio, dio->submit.iter);
 		if (unlikely(ret)) {
 			/*
@@ -292,14 +327,8 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 
 		n = bio->bi_iter.bi_size;
 		if (dio->flags & IOMAP_DIO_WRITE) {
-			bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
-			if (use_fua)
-				bio->bi_opf |= REQ_FUA;
-			else
-				dio->flags &= ~IOMAP_DIO_WRITE_FUA;
 			task_io_account_write(n);
 		} else {
-			bio->bi_opf = REQ_OP_READ;
 			if (dio->flags & IOMAP_DIO_DIRTY)
 				bio_set_pages_dirty(bio);
 		}

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4d1d3c3469e9..1bccd1880d0d 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -54,6 +54,7 @@ struct vm_fault;
 #define IOMAP_F_SHARED		0x04
 #define IOMAP_F_MERGED		0x08
 #define IOMAP_F_BUFFER_HEAD	0x10
+#define IOMAP_F_ZONE_APPEND	0x20
 
 /*
  * Flags set by the core iomap code during operations:

From patchwork Tue Nov 10 11:26:06 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894009
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Damien Le Moal,
    Anand Jain, Johannes Thumshirn
Subject: [PATCH v10 03/41] btrfs: introduce ZONED feature flag
Date: Tue, 10 Nov 2020 20:26:06 +0900
Message-Id: <5abeb08ecb3fe5776b359d318641ef5078467070.1605007036.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0

This patch introduces the ZONED incompat flag. The flag indicates that
the volume management will satisfy the constraints imposed by
host-managed zoned block devices.

Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
Reviewed-by: Anand Jain
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/sysfs.c           | 2 ++
 include/uapi/linux/btrfs.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 279d9262b676..828006020bbd 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -263,6 +263,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
 BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
 BTRFS_FEAT_ATTR_INCOMPAT(raid1c34, RAID1C34);
+BTRFS_FEAT_ATTR_INCOMPAT(zoned, ZONED);
 
 static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(mixed_backref),
@@ -278,6 +279,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(metadata_uuid),
 	BTRFS_FEAT_ATTR_PTR(free_space_tree),
 	BTRFS_FEAT_ATTR_PTR(raid1c34),
+	BTRFS_FEAT_ATTR_PTR(zoned),
 	NULL
 };
 
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 2c39d15a2beb..5df73001aad4 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -307,6 +307,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID	(1ULL << 10)
 #define BTRFS_FEATURE_INCOMPAT_RAID1C34		(1ULL << 11)
+#define BTRFS_FEATURE_INCOMPAT_ZONED		(1ULL << 12)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;

From patchwork Tue Nov 10 11:26:07 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894013
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Damien Le Moal,
    Josef Bacik
Subject: [PATCH v10 04/41] btrfs: get zone information of zoned block devices
Date: Tue, 10 Nov 2020 20:26:07 +0900
X-Mailer: git-send-email 2.27.0

If a zoned block device is found, get its zone information (number of
zones and zone size) using the new helper function
btrfs_get_dev_zone_info(). To avoid costly run-time zone report commands
to test the device zone type during block allocation, attach a seq_zones
bitmap to the device structure to indicate whether a zone is sequential
or accepts random writes. It also attaches an empty_zones bitmap to
indicate whether a zone is empty.

This patch also introduces the helper function btrfs_dev_is_sequential()
to test if the zone storing a block is a sequential write required zone,
and btrfs_dev_is_empty_zone() to test if the zone is an empty zone.

Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
Reviewed-by: Anand Jain
---
 fs/btrfs/Makefile      |   1 +
 fs/btrfs/dev-replace.c |   5 ++
 fs/btrfs/super.c       |   5 ++
 fs/btrfs/volumes.c     |  19 ++++-
 fs/btrfs/volumes.h     |   4 +
 fs/btrfs/zoned.c       | 182 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  91 +++++++++++++++++++++
 7 files changed, 305 insertions(+), 2 deletions(-)
 create mode 100644 fs/btrfs/zoned.c
 create mode 100644 fs/btrfs/zoned.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index e738f6206ea5..0497fdc37f90 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -16,6 +16,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
 btrfs-$(CONFIG_BTRFS_FS_REF_VERIFY) += ref-verify.o
+btrfs-$(CONFIG_BLK_DEV_ZONED) += zoned.o
 
 btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \
	tests/extent-buffer-tests.o tests/btrfs-tests.o \

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 20ce1970015f..6f6d77224c2b 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -21,6 +21,7 @@
 #include "rcu-string.h"
 #include "dev-replace.h"
 #include "sysfs.h"
+#include "zoned.h"
 
 /*
  * Device replace overview
@@ -291,6 +292,10 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
 	device->fs_devices = fs_info->fs_devices;
 
+	ret = btrfs_get_dev_zone_info(device);
+	if (ret)
+		goto error;
+
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	list_add(&device->dev_list, &fs_info->fs_devices->devices);
 	fs_info->fs_devices->num_devices++;

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 8840a4fa81eb..ed55014fd1bd 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2462,6 +2462,11 @@ static void __init btrfs_print_mod_info(void)
 #endif
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 			", ref-verify=on"
+#endif
+#ifdef CONFIG_BLK_DEV_ZONED
+			", zoned=yes"
+#else
+			", zoned=no"
 #endif
 			;
 	pr_info("Btrfs loaded, crc32c=%s%s\n", crc32c_impl(), options);

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 58b9c419a2b6..e787bf89f761 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -31,6 +31,7 @@
 #include "space-info.h"
 #include "block-group.h"
 #include "discard.h"
+#include "zoned.h"
 
 const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 	[BTRFS_RAID_RAID10] = {
@@ -374,6 +375,7 @@ void btrfs_free_device(struct btrfs_device *device)
 	rcu_string_free(device->name);
 	extent_io_tree_release(&device->alloc_state);
 	bio_put(device->flush_bio);
+	btrfs_destroy_dev_zone_info(device);
 	kfree(device);
 }
 
@@ -667,6 +669,11 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	device->mode = flags;
 
+	/* Get zone type information of zoned block devices */
+	ret = btrfs_get_dev_zone_info(device);
+	if (ret != 0)
+		goto error_free_page;
+
 	fs_devices->open_devices++;
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
 	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
@@ -1143,6 +1150,7 @@ static void btrfs_close_one_device(struct btrfs_device *device)
 		device->bdev = NULL;
 	}
 	clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+	btrfs_destroy_dev_zone_info(device);
 
 	device->fs_info = NULL;
 	atomic_set(&device->dev_stats_ccnt, 0);
@@ -2543,6 +2551,14 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	}
 	rcu_assign_pointer(device->name, name);
 
+	device->fs_info = fs_info;
+	device->bdev = bdev;
+
+	/* Get zone type information of zoned block devices */
+	ret = btrfs_get_dev_zone_info(device);
+	if (ret)
+		goto error_free_device;
+
 	trans = btrfs_start_transaction(root, 0);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
@@ -2559,8 +2575,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 					 fs_info->sectorsize);
 	device->disk_total_bytes = device->total_bytes;
 	device->commit_total_bytes = device->total_bytes;
-	device->fs_info = fs_info;
-	device->bdev = bdev;
 	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
 	device->mode = FMODE_EXCL;
@@ -2707,6 +2721,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		sb->s_flags |= SB_RDONLY;
 	if (trans)
 		btrfs_end_transaction(trans);
+	btrfs_destroy_dev_zone_info(device);
 error_free_device:
 	btrfs_free_device(device);
 error:

diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index bf27ac07d315..9c07b97a2260 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -51,6 +51,8 @@ struct btrfs_io_geometry {
 #define BTRFS_DEV_STATE_REPLACE_TGT	(3)
 #define BTRFS_DEV_STATE_FLUSH_SENT	(4)
 
+struct btrfs_zoned_device_info;
+
 struct btrfs_device {
 	struct list_head dev_list; /* device_list_mutex */
 	struct list_head dev_alloc_list; /* chunk mutex */
@@ -64,6 +66,8 @@ struct btrfs_device {
 
 	struct block_device *bdev;
 
+	struct btrfs_zoned_device_info *zone_info;
+
 	/* the mode sent to blkdev_get */
 	fmode_t mode;

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
new file mode 100644
index 000000000000..b7ffe6670d3a
--- /dev/null
+++ b/fs/btrfs/zoned.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include "ctree.h"
+#include "volumes.h"
+#include "zoned.h"
+#include "rcu-string.h"
+
+/* Maximum number of zones to report per blkdev_report_zones() call */
+#define BTRFS_REPORT_NR_ZONES	4096
+
+static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx,
+			     void *data)
+{
+	struct blk_zone *zones = data;
+
+	memcpy(&zones[idx], zone, sizeof(*zone));
+
+	return 0;
+}
+
+static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
+			       struct blk_zone *zones, unsigned int *nr_zones)
+{
+	int ret;
+
+	if (!*nr_zones)
+		return 0;
+
+	ret = blkdev_report_zones(device->bdev, pos >> SECTOR_SHIFT, *nr_zones,
+				  copy_zone_info_cb, zones);
+	if (ret < 0) {
+		btrfs_err_in_rcu(device->fs_info,
+				 "zoned: failed to read zone %llu on %s (devid %llu)",
+				 pos, rcu_str_deref(device->name),
+				 device->devid);
+		return ret;
+	}
+	*nr_zones = ret;
+	if (!ret)
+		return -EIO;
+
+	return 0;
+}
+
+int btrfs_get_dev_zone_info(struct btrfs_device *device)
+{
+	struct btrfs_zoned_device_info *zone_info = NULL;
+	struct block_device *bdev = device->bdev;
+	sector_t nr_sectors = bdev->bd_part->nr_sects;
+	sector_t sector = 0;
+	struct blk_zone *zones = NULL;
+	unsigned int i, nreported = 0, nr_zones;
+	unsigned int zone_sectors;
+	int ret;
+
+	if (!bdev_is_zoned(bdev))
+		return 0;
+
+	if (device->zone_info)
+		return 0;
+
+	zone_info = kzalloc(sizeof(*zone_info), GFP_KERNEL);
+	if (!zone_info)
+		return -ENOMEM;
+
+	zone_sectors = bdev_zone_sectors(bdev);
+	ASSERT(is_power_of_2(zone_sectors));
+	zone_info->zone_size = (u64)zone_sectors << SECTOR_SHIFT;
+	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
+	zone_info->nr_zones = nr_sectors >> ilog2(bdev_zone_sectors(bdev));
+	if (!IS_ALIGNED(nr_sectors, zone_sectors))
+		zone_info->nr_zones++;
+
+	zone_info->seq_zones = bitmap_zalloc(zone_info->nr_zones, GFP_KERNEL);
+	if (!zone_info->seq_zones) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	zone_info->empty_zones = bitmap_zalloc(zone_info->nr_zones, GFP_KERNEL);
+	if (!zone_info->empty_zones) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	zones = kcalloc(BTRFS_REPORT_NR_ZONES,
+			sizeof(struct blk_zone), GFP_KERNEL);
+	if (!zones) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Get zones type */
+	while (sector < nr_sectors) {
+		nr_zones = BTRFS_REPORT_NR_ZONES;
+		ret = btrfs_get_dev_zones(device, sector << SECTOR_SHIFT, zones,
+					  &nr_zones);
+		if (ret)
+			goto out;
+
+		for (i = 0; i < nr_zones; i++) {
+			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ)
+				set_bit(nreported, zone_info->seq_zones);
+			if (zones[i].cond == BLK_ZONE_COND_EMPTY)
+				set_bit(nreported, zone_info->empty_zones);
+			nreported++;
+		}
+		sector = zones[nr_zones - 1].start + zones[nr_zones - 1].len;
+	}
+
+	if (nreported != zone_info->nr_zones) {
+		btrfs_err_in_rcu(device->fs_info,
+				 "inconsistent number of zones on %s (%u / %u)",
+				 rcu_str_deref(device->name), nreported,
+				 zone_info->nr_zones);
+		ret = -EIO;
+		goto out;
+	}
+
+	kfree(zones);
+
+	device->zone_info = zone_info;
+
+	/*
+	 * This function is called from open_fs_devices(), which is before
+	 * we set the device->fs_info. So, we use pr_info instead of
+	 * btrfs_info to avoid printing a confusing message like "BTRFS info
+	 * (device ) ..."
+	 */
+
+	rcu_read_lock();
+	if (device->fs_info)
+		btrfs_info(device->fs_info,
+			"host-%s zoned block device %s, %u zones of %llu bytes",
+			bdev_zoned_model(bdev) == BLK_ZONED_HM ? "managed" : "aware",
+			rcu_str_deref(device->name), zone_info->nr_zones,
+			zone_info->zone_size);
+	else
+		pr_info("BTRFS info: host-%s zoned block device %s, %u zones of %llu bytes",
+			bdev_zoned_model(bdev) == BLK_ZONED_HM ? "managed" : "aware",
+			rcu_str_deref(device->name), zone_info->nr_zones,
+			zone_info->zone_size);
+	rcu_read_unlock();
+
+	return 0;
+
+out:
+	kfree(zones);
+	bitmap_free(zone_info->empty_zones);
+	bitmap_free(zone_info->seq_zones);
+	kfree(zone_info);
+
+	return ret;
+}
+
+void btrfs_destroy_dev_zone_info(struct btrfs_device *device)
+{
+	struct btrfs_zoned_device_info *zone_info = device->zone_info;
+
+	if (!zone_info)
+		return;
+
+	bitmap_free(zone_info->seq_zones);
+	bitmap_free(zone_info->empty_zones);
+	kfree(zone_info);
+	device->zone_info = NULL;
+}
+
+int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
+		       struct blk_zone *zone)
+{
+	unsigned int nr_zones = 1;
+	int ret;
+
+	ret = btrfs_get_dev_zones(device, pos, zone, &nr_zones);
+	if (ret != 0 || !nr_zones)
+		return ret ? ret : -EIO;
+
+	return 0;
+}

diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
new file mode 100644
index 000000000000..c9e69ff87ab9
--- /dev/null
+++ b/fs/btrfs/zoned.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef BTRFS_ZONED_H
+#define BTRFS_ZONED_H
+
+#include
+
+struct btrfs_zoned_device_info {
+	/*
+	 * Number of zones, zone size and types of zones if bdev is a
+	 * zoned block device.
+	 */
+	u64 zone_size;
+	u8  zone_size_shift;
+	u32 nr_zones;
+	unsigned long *seq_zones;
+	unsigned long *empty_zones;
+};
+
+#ifdef CONFIG_BLK_DEV_ZONED
+int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
+		       struct blk_zone *zone);
+int btrfs_get_dev_zone_info(struct btrfs_device *device);
+void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
+#else /* CONFIG_BLK_DEV_ZONED */
+static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
+				     struct blk_zone *zone)
+{
+	return 0;
+}
+
+static inline int btrfs_get_dev_zone_info(struct btrfs_device *device)
+{
+	return 0;
+}
+
+static inline void btrfs_destroy_dev_zone_info(struct btrfs_device *device) { }
+
+#endif
+
+static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
+{
+	struct btrfs_zoned_device_info *zone_info = device->zone_info;
+
+	if (!zone_info)
+		return false;
+
+	return test_bit(pos >> zone_info->zone_size_shift,
+			zone_info->seq_zones);
+}
+
+static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos)
+{
+	struct btrfs_zoned_device_info *zone_info = device->zone_info;
+
+	if (!zone_info)
+		return true;
+
+	return test_bit(pos >> zone_info->zone_size_shift,
+			zone_info->empty_zones);
+}
+
+static inline void btrfs_dev_set_empty_zone_bit(struct btrfs_device *device,
+						u64 pos, bool set)
+{
+	struct btrfs_zoned_device_info *zone_info = device->zone_info;
+	unsigned int zno;
+
+	if (!zone_info)
+		return;
+
+	zno = pos >> zone_info->zone_size_shift;
+	if (set)
+		set_bit(zno, zone_info->empty_zones);
+	else
+		clear_bit(zno, zone_info->empty_zones);
+}
+
+static inline void btrfs_dev_set_zone_empty(struct btrfs_device *device,
+					    u64 pos)
+{
+	btrfs_dev_set_empty_zone_bit(device, pos, true);
+}
+
+static inline void btrfs_dev_clear_zone_empty(struct btrfs_device *device,
+					      u64 pos)
+{
+	btrfs_dev_set_empty_zone_bit(device, pos, false);
+}
+
+#endif

From patchwork Tue Nov 10 11:26:08 2020
MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894015
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Johannes Thumshirn , Damien Le Moal , Josef Bacik Subject: [PATCH v10 05/41] btrfs: check and enable ZONED mode Date: Tue, 10 Nov 2020 20:26:08 +0900 Message-Id: <104218b8d66fec2e4121203b90e7673ddac19d6a.1605007036.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This commit introduces the function btrfs_check_zoned_mode() to check if the ZONED flag is enabled on the file system and if the file system consists of zoned devices with equal zone size.
Signed-off-by: Johannes Thumshirn Signed-off-by: Damien Le Moal Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/ctree.h | 11 ++++++ fs/btrfs/dev-replace.c | 7 ++++ fs/btrfs/disk-io.c | 11 ++++++ fs/btrfs/super.c | 1 + fs/btrfs/volumes.c | 5 +++ fs/btrfs/zoned.c | 81 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 26 ++++++++++++++ 7 files changed, 142 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index aac3d6f4e35b..453f41ca024e 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -948,6 +948,12 @@ struct btrfs_fs_info { /* Type of exclusive operation running */ unsigned long exclusive_operation; + /* Zone size when in ZONED mode */ + union { + u64 zone_size; + u64 zoned; + }; + #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; struct rb_root block_tree; @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info) } #endif +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info) +{ + return fs_info->zoned != 0; +} + #endif diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 6f6d77224c2b..db87f1aa604b 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info, return PTR_ERR(bdev); } + if (!btrfs_check_device_zone_type(fs_info, bdev)) { + btrfs_err(fs_info, + "dev-replace: zoned type of target device mismatch with filesystem"); + ret = -EINVAL; + goto error; + } + sync_blockdev(bdev); list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) { diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 764001609a15..e76ac4da208d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -42,6 +42,7 @@ #include "block-group.h" #include "discard.h" #include "space-info.h" +#include "zoned.h" #define BTRFS_SUPER_FLAG_SUPP (BTRFS_HEADER_FLAG_WRITTEN |\ BTRFS_HEADER_FLAG_RELOC |\ @@ -2976,6 +2977,8 @@ int __cold open_ctree(struct super_block *sb, 
struct btrfs_fs_devices *fs_device if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA) btrfs_info(fs_info, "has skinny extents"); + fs_info->zoned = features & BTRFS_FEATURE_INCOMPAT_ZONED; + /* * flag our filesystem as having big metadata blocks if * they are bigger than the page size @@ -3130,7 +3133,15 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device btrfs_free_extra_devids(fs_devices, 1); + ret = btrfs_check_zoned_mode(fs_info); + if (ret) { + btrfs_err(fs_info, "failed to initialize zoned mode: %d", + ret); + goto fail_block_groups; + } + ret = btrfs_sysfs_add_fsid(fs_devices); + if (ret) { btrfs_err(fs_info, "failed to init sysfs fsid interface: %d", ret); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index ed55014fd1bd..3312fe08168f 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -44,6 +44,7 @@ #include "backref.h" #include "space-info.h" #include "sysfs.h" +#include "zoned.h" #include "tests/btrfs-tests.h" #include "block-group.h" #include "discard.h" diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index e787bf89f761..10827892c086 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2518,6 +2518,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path if (IS_ERR(bdev)) return PTR_ERR(bdev); + if (!btrfs_check_device_zone_type(fs_info, bdev)) { + ret = -EINVAL; + goto error; + } + if (fs_devices->seeding) { seeding_dev = 1; down_write(&sb->s_umount); diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index b7ffe6670d3a..1223d5b0e411 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -180,3 +180,84 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, return 0; } + +int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) +{ + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; + struct btrfs_device *device; + u64 zoned_devices = 0; + u64 nr_devices = 0; + u64 zone_size = 0; + const bool incompat_zoned = btrfs_is_zoned(fs_info); + int ret = 0; +
+ /* Count zoned devices */ + list_for_each_entry(device, &fs_devices->devices, dev_list) { + enum blk_zoned_model model; + + if (!device->bdev) + continue; + + model = bdev_zoned_model(device->bdev); + if (model == BLK_ZONED_HM || + (model == BLK_ZONED_HA && incompat_zoned)) { + zoned_devices++; + if (!zone_size) { + zone_size = device->zone_info->zone_size; + } else if (device->zone_info->zone_size != zone_size) { + btrfs_err(fs_info, + "zoned: unequal block device zone sizes: have %llu found %llu", + device->zone_info->zone_size, + zone_size); + ret = -EINVAL; + goto out; + } + } + nr_devices++; + } + + if (!zoned_devices && !incompat_zoned) + goto out; + + if (!zoned_devices && incompat_zoned) { + /* No zoned block device found on ZONED FS */ + btrfs_err(fs_info, + "zoned: no zoned devices found on a zoned filesystem"); + ret = -EINVAL; + goto out; + } + + if (zoned_devices && !incompat_zoned) { + btrfs_err(fs_info, + "zoned: mode not enabled but zoned device found"); + ret = -EINVAL; + goto out; + } + + if (zoned_devices != nr_devices) { + btrfs_err(fs_info, + "zoned: cannot mix zoned and regular devices"); + ret = -EINVAL; + goto out; + } + + /* + * stripe_size is always aligned to BTRFS_STRIPE_LEN in + * __btrfs_alloc_chunk(). Since we want stripe_len == zone_size, + * check the alignment here. 
+ */ + if (!IS_ALIGNED(zone_size, BTRFS_STRIPE_LEN)) { + btrfs_err(fs_info, + "zoned: zone size not aligned to stripe %u", + BTRFS_STRIPE_LEN); + ret = -EINVAL; + goto out; + } + + fs_info->zone_size = zone_size; + + btrfs_info(fs_info, "zoned mode enabled with zone size %llu", + fs_info->zone_size); +out: + return ret; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index c9e69ff87ab9..bcb1cb99a4f3 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -4,6 +4,7 @@ #define BTRFS_ZONED_H #include +#include struct btrfs_zoned_device_info { /* @@ -22,6 +23,7 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone); int btrfs_get_dev_zone_info(struct btrfs_device *device); void btrfs_destroy_dev_zone_info(struct btrfs_device *device); +int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -36,6 +38,15 @@ static inline int btrfs_get_dev_zone_info(struct btrfs_device *device) static inline void btrfs_destroy_dev_zone_info(struct btrfs_device *device) { } +static inline int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) +{ + if (!btrfs_is_zoned(fs_info)) + return 0; + + btrfs_err(fs_info, "Zoned block device support is not enabled"); + return -EOPNOTSUPP; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) @@ -88,4 +99,19 @@ static inline void btrfs_dev_clear_zone_empty(struct btrfs_device *device, btrfs_dev_set_empty_zone_bit(device, pos, false); } +static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info *fs_info, + struct block_device *bdev) +{ + u64 zone_size; + + if (btrfs_is_zoned(fs_info)) { + zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT; + /* Do not allow non-zoned device */ + return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size; + } + + /* Do not allow Host Managed zoned device */ + return bdev_zoned_model(bdev)
!= BLK_ZONED_HM; +} + #endif From patchwork Tue Nov 10 11:26:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894141
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v10 06/41] btrfs: introduce max_zone_append_size Date: Tue, 10 Nov 2020 20:26:09 +0900 Message-Id: <173cd5def63acdf094a3b83ce129696c26fd3a3c.1605007036.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org The zone append write command has a maximum IO size restriction it accepts.
This is because a zone append write command cannot be split, as we ask the device to place the data into a specific target zone and the device responds with the actual written location of the data. Introduce max_zone_append_size to zone_info and fs_info to track the value, so we can limit all I/O to a zoned block device that we want to write using the zone append command to the device's limits. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/ctree.h | 3 +++ fs/btrfs/zoned.c | 17 +++++++++++++++-- fs/btrfs/zoned.h | 1 + 3 files changed, 19 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 453f41ca024e..c70d3fcc62c2 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -954,6 +954,9 @@ struct btrfs_fs_info { u64 zoned; }; + /* Max size to emit ZONE_APPEND write command */ + u64 max_zone_append_size; + #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; struct rb_root block_tree; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 1223d5b0e411..2897432eb43c 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -48,6 +48,7 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device) { struct btrfs_zoned_device_info *zone_info = NULL; struct block_device *bdev = device->bdev; + struct request_queue *queue = bdev_get_queue(bdev); sector_t nr_sectors = bdev->bd_part->nr_sects; sector_t sector = 0; struct blk_zone *zones = NULL; @@ -69,6 +70,8 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device) ASSERT(is_power_of_2(zone_sectors)); zone_info->zone_size = (u64)zone_sectors << SECTOR_SHIFT; zone_info->zone_size_shift = ilog2(zone_info->zone_size); + zone_info->max_zone_append_size = + (u64)queue_max_zone_append_sectors(queue) << SECTOR_SHIFT; zone_info->nr_zones = nr_sectors >> ilog2(bdev_zone_sectors(bdev)); if (!IS_ALIGNED(nr_sectors, zone_sectors)) zone_info->nr_zones++; @@ -188,6 +191,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) u64 zoned_devices = 0; u64 nr_devices = 0; u64 
zone_size = 0; + u64 max_zone_append_size = 0; const bool incompat_zoned = btrfs_is_zoned(fs_info); int ret = 0; @@ -201,10 +205,13 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) model = bdev_zoned_model(device->bdev); if (model == BLK_ZONED_HM || (model == BLK_ZONED_HA && incompat_zoned)) { + struct btrfs_zoned_device_info *zone_info = + device->zone_info; + zoned_devices++; if (!zone_size) { - zone_size = device->zone_info->zone_size; - } else if (device->zone_info->zone_size != zone_size) { + zone_size = zone_info->zone_size; + } else if (zone_info->zone_size != zone_size) { btrfs_err(fs_info, "zoned: unequal block device zone sizes: have %llu found %llu", device->zone_info->zone_size, @@ -212,6 +219,11 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) ret = -EINVAL; goto out; } + if (!max_zone_append_size || + (zone_info->max_zone_append_size && + zone_info->max_zone_append_size < max_zone_append_size)) + max_zone_append_size = + zone_info->max_zone_append_size; } nr_devices++; } @@ -255,6 +267,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) } fs_info->zone_size = zone_size; + fs_info->max_zone_append_size = max_zone_append_size; btrfs_info(fs_info, "zoned mode enabled with zone size %llu", fs_info->zone_size); diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index bcb1cb99a4f3..52aa6af5d8dc 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -13,6 +13,7 @@ struct btrfs_zoned_device_info { */ u64 zone_size; u8 zone_size_shift; + u64 max_zone_append_size; u32 nr_zones; unsigned long *seq_zones; unsigned long *empty_zones; From patchwork Tue Nov 10 11:26:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894143 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9AEA315E6 for ; Tue, 10 Nov 2020 11:30:25 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v10 07/41] btrfs: disallow space_cache in ZONED mode Date: Tue, 10 Nov 2020 20:26:10 +0900 Message-Id: <2276011f71705fff9e6a20966e7f6c601867ecbc.1605007036.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org As updates to space cache v1 are done in place, the space cache cannot be located over sequential zones, and there is no guarantee that the device will have enough conventional zones to store this cache. Resolve this problem by completely disabling space cache v1. This does not introduce any problems with sequential block groups: all the free space is located after the allocation pointer and there is no free space before the pointer, so there is no need for such a cache. Note: we can technically use the free-space-tree (space cache v2) in ZONED mode.
But, since ZONED mode now always allocate extents in a block group sequentially regardless of underlying device zone type, it's no use to enable and maintain the tree. For the same reason, NODATACOW is also disabled. In summary, ZONED will disable: | Disabled features | Reason | |-------------------+-----------------------------------------------------| | RAID/Dup | Cannot handle two zone append writes to different | | | zones | |-------------------+-----------------------------------------------------| | space_cache (v1) | In-place updating | | NODATACOW | In-place updating | |-------------------+-----------------------------------------------------| | fallocate | Reserved extent will be a write hole | |-------------------+-----------------------------------------------------| | MIXED_BG | Allocated metadata region will be write holes for | | | data writes | Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/super.c | 13 +++++++++++-- fs/btrfs/zoned.c | 18 ++++++++++++++++++ fs/btrfs/zoned.h | 6 ++++++ 3 files changed, 35 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 3312fe08168f..1adbbeebc649 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -525,8 +525,15 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options, cache_gen = btrfs_super_cache_generation(info->super_copy); if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE)) btrfs_set_opt(info->mount_opt, FREE_SPACE_TREE); - else if (cache_gen) - btrfs_set_opt(info->mount_opt, SPACE_CACHE); + else if (cache_gen) { + if (btrfs_is_zoned(info)) { + btrfs_info(info, + "zoned: clearing existing space cache"); + btrfs_set_super_cache_generation(info->super_copy, 0); + } else { + btrfs_set_opt(info->mount_opt, SPACE_CACHE); + } + } /* * Even the options are empty, we still need to do extra check @@ -985,6 +992,8 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options, ret = -EINVAL; } + if (!ret) + ret = btrfs_check_mountopts_zoned(info); if 
(!ret && btrfs_test_opt(info, SPACE_CACHE)) btrfs_info(info, "disk space caching is enabled"); if (!ret && btrfs_test_opt(info, FREE_SPACE_TREE)) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 2897432eb43c..d6b8165e2c91 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -274,3 +274,21 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) out: return ret; } + +int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info) +{ + if (!btrfs_is_zoned(info)) + return 0; + + /* + * Space cache writing is not COWed. Disable that to avoid write + * errors in sequential zones. + */ + if (btrfs_test_opt(info, SPACE_CACHE)) { + btrfs_err(info, + "zoned: space cache v1 is not supported"); + return -EINVAL; + } + + return 0; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 52aa6af5d8dc..81c00a3ed202 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -25,6 +25,7 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, int btrfs_get_dev_zone_info(struct btrfs_device *device); void btrfs_destroy_dev_zone_info(struct btrfs_device *device); int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info); +int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -48,6 +49,11 @@ static inline int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) return -EOPNOTSUPP; } +static inline int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Tue Nov 10 11:26:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894029 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F174C697 for ; 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik , Johannes Thumshirn Subject: [PATCH v10 08/41] btrfs: disallow NODATACOW in ZONED mode Date: Tue, 10 Nov 2020 20:26:11 +0900 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org NODATACOW implies overwriting file data on a device in place, which is impossible in sequential-write-required zones. Disable NODATACOW both globally, via the mount options, and per file, by masking out the FS_NOCOW_FL attribute.
Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ioctl.c | 13 +++++++++++++
 fs/btrfs/zoned.c |  5 +++++
 2 files changed, 18 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ab408a23ba32..d13b522e7bb2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -193,6 +193,15 @@ static int check_fsflags(unsigned int old_flags, unsigned int flags)
 	return 0;
 }
 
+static int check_fsflags_compatible(struct btrfs_fs_info *fs_info,
+				    unsigned int flags)
+{
+	if (btrfs_is_zoned(fs_info) && (flags & FS_NOCOW_FL))
+		return -EPERM;
+
+	return 0;
+}
+
 static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 {
 	struct inode *inode = file_inode(file);
@@ -230,6 +239,10 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	if (ret)
 		goto out_unlock;
 
+	ret = check_fsflags_compatible(fs_info, fsflags);
+	if (ret)
+		goto out_unlock;
+
 	binode_flags = binode->flags;
 	if (fsflags & FS_SYNC_FL)
 		binode_flags |= BTRFS_INODE_SYNC;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index d6b8165e2c91..bd153932606e 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -290,5 +290,10 @@ int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info)
 		return -EINVAL;
 	}
 
+	if (btrfs_test_opt(info, NODATACOW)) {
+		btrfs_err(info, "zoned: NODATACOW not supported");
+		return -EINVAL;
+	}
+
 	return 0;
 }

From patchwork Tue Nov 10 11:26:12 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894131
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Johannes Thumshirn , Josef Bacik
Subject: [PATCH v10 09/41] btrfs: disable fallocate in ZONED mode
Date: Tue, 10 Nov 2020 20:26:12 +0900
Message-Id: <5136fb8ba2a9746bfc55247c93b86e33bdd7eb7b.1605007036.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: 
References: 
MIME-Version: 1.0
Precedence: bulk
List-ID: 
X-Mailing-List: linux-fsdevel@vger.kernel.org

fallocate() is implemented by reserving an actual extent instead of a
byte-count reservation. Handing the application a fixed extent up front
can expose the sequential write constraint of host-managed zoned block
devices to the application, which would break the POSIX semantics for
the fallocated file. To avoid this, report fallocate() as not supported
when in ZONED mode for now. In the future, we may be able to implement
an "in-memory" fallocate() in ZONED mode by utilizing
space_info->bytes_may_use or similar.
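To see why a preallocated extent conflicts with the zone model, a toy sequential-write-required zone can be sketched in Python. This is an illustration of the device constraint only, not btrfs code; on a real device an out-of-order write fails with an I/O error rather than raising an exception:

```python
class SequentialZone:
    """Toy model of a sequential-write-required zone: every write must
    land exactly at the current write pointer."""
    def __init__(self, start: int, length: int):
        self.start = start
        self.wp = start          # write pointer advances with each write
        self.end = start + length

    def write(self, pos: int, nbytes: int) -> None:
        if pos != self.wp:
            raise IOError(f"write at {pos} != write pointer {self.wp}")
        if pos + nbytes > self.end:
            raise IOError("write beyond zone capacity")
        self.wp += nbytes

zone = SequentialZone(start=0, length=1 << 20)
zone.write(0, 4096)        # OK: lands at the write pointer
# An application that fallocate()d bytes [65536, 131072) expects to be
# able to write there directly -- but the zone refuses it:
try:
    zone.write(65536, 4096)
except IOError as e:
    print("rejected:", e)
```

This is the mismatch the patch sidesteps by returning `-EOPNOTSUPP`: POSIX lets the application write anywhere inside the fallocated range, while the zone only accepts writes at its write pointer.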
Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
Reviewed-by: Josef Bacik
Reviewed-by: Anand Jain
---
 fs/btrfs/file.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0ff659455b1e..68938a43081e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3341,6 +3341,10 @@ static long btrfs_fallocate(struct file *file, int mode,
 	alloc_end = round_up(offset + len, blocksize);
 	cur_offset = alloc_start;
 
+	/* Do not allow fallocate in ZONED mode */
+	if (btrfs_is_zoned(btrfs_sb(inode->i_sb)))
+		return -EOPNOTSUPP;
+
 	/* Make sure we aren't being give some crap mode */
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
 		     FALLOC_FL_ZERO_RANGE))

From patchwork Tue Nov 10 11:26:13 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894017
From: Naohiro Aota To:
linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik
Subject: [PATCH v10 10/41] btrfs: disallow mixed-bg in ZONED mode
Date: Tue, 10 Nov 2020 20:26:13 +0900
Message-Id: 
X-Mailer: git-send-email 2.27.0
In-Reply-To: 
References: 
MIME-Version: 1.0
Precedence: bulk
List-ID: 
X-Mailing-List: linux-fsdevel@vger.kernel.org

Placing both data and metadata in a block group is impossible in ZONED
mode. For data, we can allocate space for it and write it immediately
after the allocation. For metadata, however, we cannot do so, because
the logical addresses are recorded in other metadata buffers to build
up the trees. As a result, a data buffer can be placed after a metadata
buffer that is not yet written. Writing out the data buffer would break
the sequential write rule.

This commit checks for and disallows the MIXED_GROUPS feature together
with ZONED mode.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
Reviewed-by: Anand Jain
---
 fs/btrfs/zoned.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index bd153932606e..f87d35cb9235 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -266,6 +266,13 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 		goto out;
 	}
 
+	if (btrfs_fs_incompat(fs_info, MIXED_GROUPS)) {
+		btrfs_err(fs_info,
+			"zoned: mixed block groups not supported");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	fs_info->zone_size = zone_size;
 	fs_info->max_zone_append_size = max_zone_append_size;

From patchwork Tue Nov 10 11:26:14 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota
Subject: [PATCH v10 11/41] btrfs: implement log-structured superblock for ZONED mode
Date: Tue, 10 Nov 2020 20:26:14 +0900
Message-Id: <5aa30b45e2e29018e19e47181586f3f436759b69.1605007036.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: 
References: 
MIME-Version: 1.0
Precedence: bulk
List-ID: 
X-Mailing-List: linux-fsdevel@vger.kernel.org

The superblock (and its copies) is the only data structure in btrfs
which has a fixed location on a device. Since we cannot overwrite in a
sequential-write-required zone, we cannot place the superblock in such
a zone.

One easy solution is to limit the superblock and its copies to
conventional zones only. However, this method has two downsides. The
first is a reduced number of superblock copies: the location of the
second copy of the superblock is 256GB, which is in a
sequential-write-required zone on typical devices on the market today,
so the number of superblock copies would be limited to two.
The second downside is that we cannot support devices which have no
conventional zones at all.

To solve these two problems, we employ superblock log writing. It uses
two zones as a circular buffer to write updated superblocks. Once the
first zone is filled up, writing starts in the second zone. Then, when
both zones are filled up and before writing to the first zone starts
again, the first zone is reset. We can determine the position of the
latest superblock by reading the write pointer information from a
device. One corner case is when both zones are full. For this
situation, we read out the last superblock of each zone and compare
them to determine which zone is older.

The following zones are reserved as the circular buffer on ZONED btrfs:

- The primary superblock: zones 0 and 1
- The first copy: zones 16 and 17
- The second copy: the zone at 256GB or zone 1024, whichever is
  smaller, and the zone next to it

If these reserved zones are conventional, the superblock is written at
a fixed location at the start of the zone, without logging.
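The zone-numbering rule and the full/empty decision above can be sketched in Python. This is an illustrative model of the patch's `sb_zone_number()` and of the state table in `sb_write_pointer()`, not the kernel code itself; the mirror offsets follow btrfs's `btrfs_sb_offset()` (64KiB, 64MiB, 256GiB):

```python
SZ_16K = 16 * 1024
BTRFS_SUPER_INFO_OFFSET = 64 * 1024      # primary superblock at 64KiB
BTRFS_SUPER_MIRROR_SHIFT = 12

def btrfs_sb_offset(mirror: int) -> int:
    """Fixed superblock offsets on a regular device: 64KiB, 64MiB, 256GiB."""
    if mirror:
        return SZ_16K << (BTRFS_SUPER_MIRROR_SHIFT * mirror)
    return BTRFS_SUPER_INFO_OFFSET

def sb_zone_number(zone_size_shift: int, mirror: int) -> int:
    """First zone of the two-zone superblock log for a given mirror."""
    if mirror == 0:
        return 0
    if mirror == 1:
        return 16
    # Second copy: the zone holding the 256GB offset, capped at zone 1024.
    return min(btrfs_sb_offset(2) >> zone_size_shift, 1024)

EMPTY, IN_USE, FULL = "empty", "in-use", "full"

def sb_log_position(z0: str, z1: str) -> str:
    """Decision table of sb_write_pointer(): where is the next SB slot?"""
    if z0 == EMPTY and z1 == EMPTY:
        return "none"      # no superblock written yet (-ENOENT)
    if z0 == FULL and z1 == FULL:
        return "compare"   # read last SB of each zone; generations decide
    if z0 != FULL:
        return "zone0"     # zone 0 still accepts sequential writes
    return "zone1"         # zone 0 full, zone 1 in use

# With 256MiB zones (shift 28) the second copy sits exactly at zone 1024;
# with 1GiB zones (shift 30) the 256GB offset maps to zone 256.
print(sb_zone_number(28, 2), sb_zone_number(30, 2))  # 1024 256
```

The `min(..., 1024)` cap is what the commit message means by "the zone at 256GB or zone 1024, whichever is smaller": on devices with small zones the 256GB offset would land far past zone 1024, so the copy is pinned there instead.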
Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 9 ++ fs/btrfs/disk-io.c | 41 ++++- fs/btrfs/scrub.c | 3 + fs/btrfs/volumes.c | 21 ++- fs/btrfs/zoned.c | 329 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 44 ++++++ 6 files changed, 435 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index c0f1d6818df7..6b4831824f51 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1723,6 +1723,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, static int exclude_super_stripes(struct btrfs_block_group *cache) { struct btrfs_fs_info *fs_info = cache->fs_info; + const bool zoned = btrfs_is_zoned(fs_info); u64 bytenr; u64 *logical; int stripe_len; @@ -1744,6 +1745,14 @@ static int exclude_super_stripes(struct btrfs_block_group *cache) if (ret) return ret; + /* Shouldn't have super stripes in sequential zones */ + if (zoned && nr) { + btrfs_err(fs_info, + "zoned: block group %llu must not contain super block", + cache->start); + return -EUCLEAN; + } + while (nr--) { u64 len = min_t(u64, stripe_len, cache->start + cache->length - logical[nr]); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index e76ac4da208d..509085a368bb 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3423,10 +3423,17 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, { struct btrfs_super_block *super; struct page *page; - u64 bytenr; + u64 bytenr, bytenr_orig; struct address_space *mapping = bdev->bd_inode->i_mapping; + int ret; + + bytenr_orig = btrfs_sb_offset(copy_num); + ret = btrfs_sb_log_location_bdev(bdev, copy_num, READ, &bytenr); + if (ret == -ENOENT) + return ERR_PTR(-EINVAL); + else if (ret) + return ERR_PTR(ret); - bytenr = btrfs_sb_offset(copy_num); if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode)) return ERR_PTR(-EINVAL); @@ -3440,7 +3447,7 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, return 
ERR_PTR(-ENODATA); } - if (btrfs_super_bytenr(super) != bytenr) { + if (btrfs_super_bytenr(super) != bytenr_orig) { btrfs_release_disk_super(super); return ERR_PTR(-EINVAL); } @@ -3495,7 +3502,8 @@ static int write_dev_supers(struct btrfs_device *device, SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); int i; int errors = 0; - u64 bytenr; + int ret; + u64 bytenr, bytenr_orig; if (max_mirrors == 0) max_mirrors = BTRFS_SUPER_MIRROR_MAX; @@ -3507,12 +3515,21 @@ static int write_dev_supers(struct btrfs_device *device, struct bio *bio; struct btrfs_super_block *disk_super; - bytenr = btrfs_sb_offset(i); + bytenr_orig = btrfs_sb_offset(i); + ret = btrfs_sb_log_location(device, i, WRITE, &bytenr); + if (ret == -ENOENT) { + continue; + } else if (ret < 0) { + btrfs_err(device->fs_info, "couldn't get super block location for mirror %d", + i); + errors++; + continue; + } if (bytenr + BTRFS_SUPER_INFO_SIZE >= device->commit_total_bytes) break; - btrfs_set_super_bytenr(sb, bytenr); + btrfs_set_super_bytenr(sb, bytenr_orig); crypto_shash_digest(shash, (const char *)sb + BTRFS_CSUM_SIZE, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE, @@ -3557,6 +3574,7 @@ static int write_dev_supers(struct btrfs_device *device, bio->bi_opf |= REQ_FUA; btrfsic_submit_bio(bio); + btrfs_advance_sb_log(device, i); } return errors < i ? 
0 : -1; } @@ -3573,6 +3591,7 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors) int i; int errors = 0; bool primary_failed = false; + int ret; u64 bytenr; if (max_mirrors == 0) @@ -3581,7 +3600,15 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors) for (i = 0; i < max_mirrors; i++) { struct page *page; - bytenr = btrfs_sb_offset(i); + ret = btrfs_sb_log_location(device, i, READ, &bytenr); + if (ret == -ENOENT) { + break; + } else if (ret < 0) { + errors++; + if (i == 0) + primary_failed = true; + continue; + } if (bytenr + BTRFS_SUPER_INFO_SIZE >= device->commit_total_bytes) break; diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index cf63f1e27a27..aa1b36cf5c88 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -20,6 +20,7 @@ #include "rcu-string.h" #include "raid56.h" #include "block-group.h" +#include "zoned.h" /* * This is only the first step towards a full-features scrub. It reads all @@ -3704,6 +3705,8 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx, if (bytenr + BTRFS_SUPER_INFO_SIZE > scrub_dev->commit_total_bytes) break; + if (!btrfs_check_super_location(scrub_dev, bytenr)) + continue; ret = scrub_pages(sctx, bytenr, BTRFS_SUPER_INFO_SIZE, bytenr, scrub_dev, BTRFS_EXTENT_FLAG_SUPER, gen, i, diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 10827892c086..db884b96a5ea 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1282,7 +1282,8 @@ void btrfs_release_disk_super(struct btrfs_super_block *super) } static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev, - u64 bytenr) + u64 bytenr, + u64 bytenr_orig) { struct btrfs_super_block *disk_super; struct page *page; @@ -1313,7 +1314,7 @@ static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev /* align our pointer to the offset of the super block */ disk_super = p + offset_in_page(bytenr); - if (btrfs_super_bytenr(disk_super) != bytenr || + if (btrfs_super_bytenr(disk_super) 
!= bytenr_orig || btrfs_super_magic(disk_super) != BTRFS_MAGIC) { btrfs_release_disk_super(p); return ERR_PTR(-EINVAL); @@ -1348,7 +1349,8 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags, bool new_device_added = false; struct btrfs_device *device = NULL; struct block_device *bdev; - u64 bytenr; + u64 bytenr, bytenr_orig; + int ret; lockdep_assert_held(&uuid_mutex); @@ -1358,14 +1360,18 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags, * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - bytenr = btrfs_sb_offset(0); flags |= FMODE_EXCL; bdev = blkdev_get_by_path(path, flags, holder); if (IS_ERR(bdev)) return ERR_CAST(bdev); - disk_super = btrfs_read_disk_super(bdev, bytenr); + bytenr_orig = btrfs_sb_offset(0); + ret = btrfs_sb_log_location_bdev(bdev, 0, READ, &bytenr); + if (ret) + return ERR_PTR(ret); + + disk_super = btrfs_read_disk_super(bdev, bytenr, bytenr_orig); if (IS_ERR(disk_super)) { device = ERR_CAST(disk_super); goto error_bdev_put; @@ -2029,6 +2035,11 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, if (IS_ERR(disk_super)) continue; + if (bdev_is_zoned(bdev)) { + btrfs_reset_sb_log_zones(bdev, copy_num); + continue; + } + memset(&disk_super->magic, 0, sizeof(disk_super->magic)); page = virt_to_page(disk_super); diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index f87d35cb9235..84ade8c19ddc 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -10,6 +10,9 @@ /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 +/* Number of superblock log zones */ +#define BTRFS_NR_SB_LOG_ZONES 2 + static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data) { @@ -20,6 +23,106 @@ static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, return 0; } +static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones, + u64 *wp_ret) +{ + 
bool empty[BTRFS_NR_SB_LOG_ZONES]; + bool full[BTRFS_NR_SB_LOG_ZONES]; + sector_t sector; + + ASSERT(zones[0].type != BLK_ZONE_TYPE_CONVENTIONAL && + zones[1].type != BLK_ZONE_TYPE_CONVENTIONAL); + + empty[0] = (zones[0].cond == BLK_ZONE_COND_EMPTY); + empty[1] = (zones[1].cond == BLK_ZONE_COND_EMPTY); + full[0] = (zones[0].cond == BLK_ZONE_COND_FULL); + full[1] = (zones[1].cond == BLK_ZONE_COND_FULL); + + /* + * Possible state of log buffer zones + * + * E I F + * E * x 0 + * I 0 x 0 + * F 1 1 C + * + * Row: zones[0] + * Col: zones[1] + * State: + * E: Empty, I: In-Use, F: Full + * Log position: + * *: Special case, no superblock is written + * 0: Use write pointer of zones[0] + * 1: Use write pointer of zones[1] + * C: Compare SBs from zones[0] and zones[1], use the newer one + * x: Invalid state + */ + + if (empty[0] && empty[1]) { + /* Special case to distinguish no superblock to read */ + *wp_ret = zones[0].start << SECTOR_SHIFT; + return -ENOENT; + } else if (full[0] && full[1]) { + /* Compare two super blocks */ + struct address_space *mapping = bdev->bd_inode->i_mapping; + struct page *page[BTRFS_NR_SB_LOG_ZONES]; + struct btrfs_super_block *super[BTRFS_NR_SB_LOG_ZONES]; + int i; + + for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) { + u64 bytenr = ((zones[i].start + zones[i].len) << SECTOR_SHIFT) - + BTRFS_SUPER_INFO_SIZE; + + page[i] = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS); + if (IS_ERR(page[i])) { + if (i == 1) + btrfs_release_disk_super(super[0]); + return PTR_ERR(page[i]); + } + super[i] = page_address(page[i]); + } + + if (super[0]->generation > super[1]->generation) + sector = zones[1].start; + else + sector = zones[0].start; + + for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) + btrfs_release_disk_super(super[i]); + } else if (!full[0] && (empty[1] || full[1])) { + sector = zones[0].wp; + } else if (full[0]) { + sector = zones[1].wp; + } else { + return -EUCLEAN; + } + *wp_ret = sector << SECTOR_SHIFT; + return 0; +} + +/* + * The 
following zones are reserved as the circular buffer on ZONED btrfs. + * - The primary superblock: zones 0 and 1 + * - The first copy: zones 16 and 17 + * - The second copy: zones 1024 or zone at 256GB which is minimum, and + * next to it + */ +static inline u32 sb_zone_number(u8 shift, int mirror) +{ + ASSERT(mirror < BTRFS_SUPER_MIRROR_MAX); + + switch (mirror) { + case 0: + return 0; + case 1: + return 16; + case 2: + return min(btrfs_sb_offset(mirror) >> shift, 1024ULL); + } + + return 0; +} + static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos, struct blk_zone *zones, unsigned int *nr_zones) { @@ -122,6 +225,52 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device) goto out; } + /* Validate superblock log */ + nr_zones = BTRFS_NR_SB_LOG_ZONES; + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + u32 sb_zone = sb_zone_number(zone_info->zone_size_shift, i); + u64 sb_wp; + int sb_pos = BTRFS_NR_SB_LOG_ZONES * i; + + if (sb_zone + 1 >= zone_info->nr_zones) + continue; + + sector = sb_zone << (zone_info->zone_size_shift - SECTOR_SHIFT); + ret = btrfs_get_dev_zones(device, sector << SECTOR_SHIFT, + &zone_info->sb_zones[sb_pos], + &nr_zones); + if (ret) + goto out; + + if (nr_zones != BTRFS_NR_SB_LOG_ZONES) { + btrfs_err_in_rcu(device->fs_info, + "zoned: failed to read super block log zone info at devid %llu zone %u", + device->devid, sb_zone); + ret = -EUCLEAN; + goto out; + } + + /* + * If zones[0] is conventional, always use the beggining of + * the zone to record superblock. No need to validate in + * that case. 
+ */ + if (zone_info->sb_zones[BTRFS_NR_SB_LOG_ZONES * i].type == + BLK_ZONE_TYPE_CONVENTIONAL) + continue; + + ret = sb_write_pointer(device->bdev, + &zone_info->sb_zones[sb_pos], &sb_wp); + if (ret != -ENOENT && ret) { + btrfs_err_in_rcu(device->fs_info, + "zoned: super block log zone corrupted devid %llu zone %u", + device->devid, sb_zone); + ret = -EUCLEAN; + goto out; + } + } + + kfree(zones); device->zone_info = zone_info; @@ -304,3 +453,183 @@ int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info) return 0; } + +static int sb_log_location(struct block_device *bdev, struct blk_zone *zones, + int rw, u64 *bytenr_ret) +{ + u64 wp; + int ret; + + if (zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL) { + *bytenr_ret = zones[0].start << SECTOR_SHIFT; + return 0; + } + + ret = sb_write_pointer(bdev, zones, &wp); + if (ret != -ENOENT && ret < 0) + return ret; + + if (rw == WRITE) { + struct blk_zone *reset = NULL; + + if (wp == zones[0].start << SECTOR_SHIFT) + reset = &zones[0]; + else if (wp == zones[1].start << SECTOR_SHIFT) + reset = &zones[1]; + + if (reset && reset->cond != BLK_ZONE_COND_EMPTY) { + ASSERT(reset->cond == BLK_ZONE_COND_FULL); + + ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET, + reset->start, reset->len, + GFP_NOFS); + if (ret) + return ret; + + reset->cond = BLK_ZONE_COND_EMPTY; + reset->wp = reset->start; + } + } else if (ret != -ENOENT) { + /* For READ, we want the precious one */ + if (wp == zones[0].start << SECTOR_SHIFT) + wp = (zones[1].start + zones[1].len) << SECTOR_SHIFT; + wp -= BTRFS_SUPER_INFO_SIZE; + } + + *bytenr_ret = wp; + return 0; + +} + +int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw, + u64 *bytenr_ret) +{ + struct blk_zone zones[BTRFS_NR_SB_LOG_ZONES]; + unsigned int zone_sectors; + u32 sb_zone; + int ret; + u64 zone_size; + u8 zone_sectors_shift; + sector_t nr_sectors = bdev->bd_part->nr_sects; + u32 nr_zones; + + if (!bdev_is_zoned(bdev)) { + *bytenr_ret = btrfs_sb_offset(mirror); + return 0; 
+ } + + ASSERT(rw == READ || rw == WRITE); + + zone_sectors = bdev_zone_sectors(bdev); + if (!is_power_of_2(zone_sectors)) + return -EINVAL; + zone_size = zone_sectors << SECTOR_SHIFT; + zone_sectors_shift = ilog2(zone_sectors); + nr_zones = nr_sectors >> zone_sectors_shift; + + sb_zone = sb_zone_number(zone_sectors_shift + SECTOR_SHIFT, mirror); + if (sb_zone + 1 >= nr_zones) + return -ENOENT; + + ret = blkdev_report_zones(bdev, sb_zone << zone_sectors_shift, + BTRFS_NR_SB_LOG_ZONES, copy_zone_info_cb, + zones); + if (ret < 0) + return ret; + if (ret != BTRFS_NR_SB_LOG_ZONES) + return -EIO; + + return sb_log_location(bdev, zones, rw, bytenr_ret); +} + +int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw, + u64 *bytenr_ret) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u32 zone_num; + + if (!zinfo) { + *bytenr_ret = btrfs_sb_offset(mirror); + return 0; + } + + zone_num = sb_zone_number(zinfo->zone_size_shift, mirror); + if (zone_num + 1 >= zinfo->nr_zones) + return -ENOENT; + + return sb_log_location(device->bdev, + &zinfo->sb_zones[BTRFS_NR_SB_LOG_ZONES * mirror], + rw, bytenr_ret); +} + +static inline bool is_sb_log_zone(struct btrfs_zoned_device_info *zinfo, + int mirror) +{ + u32 zone_num; + + if (!zinfo) + return false; + + zone_num = sb_zone_number(zinfo->zone_size_shift, mirror); + if (zone_num + 1 >= zinfo->nr_zones) + return false; + + if (!test_bit(zone_num, zinfo->seq_zones)) + return false; + + return true; +} + +void btrfs_advance_sb_log(struct btrfs_device *device, int mirror) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + struct blk_zone *zone; + + if (!is_sb_log_zone(zinfo, mirror)) + return; + + zone = &zinfo->sb_zones[BTRFS_NR_SB_LOG_ZONES * mirror]; + if (zone->cond != BLK_ZONE_COND_FULL) { + + if (zone->cond == BLK_ZONE_COND_EMPTY) + zone->cond = BLK_ZONE_COND_IMP_OPEN; + + zone->wp += (BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT); + + if (zone->wp == zone->start + zone->len) + zone->cond = 
BLK_ZONE_COND_FULL; + + return; + } + + zone++; + ASSERT(zone->cond != BLK_ZONE_COND_FULL); + if (zone->cond == BLK_ZONE_COND_EMPTY) + zone->cond = BLK_ZONE_COND_IMP_OPEN; + + zone->wp += (BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT); + + if (zone->wp == zone->start + zone->len) + zone->cond = BLK_ZONE_COND_FULL; +} + +int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror) +{ + sector_t zone_sectors; + sector_t nr_sectors = bdev->bd_part->nr_sects; + u8 zone_sectors_shift; + u32 sb_zone; + u32 nr_zones; + + zone_sectors = bdev_zone_sectors(bdev); + zone_sectors_shift = ilog2(zone_sectors); + nr_zones = nr_sectors >> zone_sectors_shift; + + sb_zone = sb_zone_number(zone_sectors_shift + SECTOR_SHIFT, mirror); + if (sb_zone + 1 >= nr_zones) + return -ENOENT; + + return blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET, + sb_zone << zone_sectors_shift, + zone_sectors * BTRFS_NR_SB_LOG_ZONES, GFP_NOFS); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 81c00a3ed202..de9d7dd8c351 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -5,6 +5,8 @@ #include #include +#include "volumes.h" +#include "disk-io.h" struct btrfs_zoned_device_info { /* @@ -17,6 +19,7 @@ struct btrfs_zoned_device_info { u32 nr_zones; unsigned long *seq_zones; unsigned long *empty_zones; + struct blk_zone sb_zones[2 * BTRFS_SUPER_MIRROR_MAX]; }; #ifdef CONFIG_BLK_DEV_ZONED @@ -26,6 +29,12 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device); void btrfs_destroy_dev_zone_info(struct btrfs_device *device); int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info); int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info); +int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw, + u64 *bytenr_ret); +int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw, + u64 *bytenr_ret); +void btrfs_advance_sb_log(struct btrfs_device *device, int mirror); +int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror); #else /* CONFIG_BLK_DEV_ZONED */ 
static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -54,6 +63,30 @@ static inline int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info) return 0; } +static inline int btrfs_sb_log_location_bdev(struct block_device *bdev, + int mirror, int rw, + u64 *bytenr_ret) +{ + *bytenr_ret = btrfs_sb_offset(mirror); + return 0; +} + +static inline int btrfs_sb_log_location(struct btrfs_device *device, int mirror, + int rw, u64 *bytenr_ret) +{ + *bytenr_ret = btrfs_sb_offset(mirror); + return 0; +} + +static inline void btrfs_advance_sb_log(struct btrfs_device *device, + int mirror) { } + +static inline int btrfs_reset_sb_log_zones(struct block_device *bdev, + int mirror) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) @@ -121,4 +154,15 @@ static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info *fs_info, return bdev_zoned_model(bdev) != BLK_ZONED_HM; } +static inline bool btrfs_check_super_location(struct btrfs_device *device, + u64 pos) +{ + /* + * On a non-zoned device, any address is OK. On a zoned device, + * non-SEQUENTIAL WRITE REQUIRED zones are capable. 
+ */ + return device->zone_info == NULL || + !btrfs_dev_is_sequential(device, pos); +} + #endif

From patchwork Tue Nov 10 11:26:15 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894027
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v10 12/41] btrfs: implement zoned chunk allocator
Date: Tue, 10 Nov 2020 20:26:15 +0900

This commit implements a zoned chunk/dev_extent allocator.
The zoned allocator aligns the device extents to zone boundaries, so that a zone reset affects only the device extent and does not change the state of blocks in the neighbor device extents. Also, it checks that a region allocation is not overlapping any of the super block zones, and ensures the region is empty. Signed-off-by: Naohiro Aota --- fs/btrfs/volumes.c | 136 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.h | 1 + fs/btrfs/zoned.c | 144 +++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 34 +++++++++++ 4 files changed, 315 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index db884b96a5ea..7831cf6c6da4 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1416,6 +1416,21 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start, return false; } +static inline u64 dev_extent_search_start_zoned(struct btrfs_device *device, + u64 start) +{ + u64 tmp; + + if (device->zone_info->zone_size > SZ_1M) + tmp = device->zone_info->zone_size; + else + tmp = SZ_1M; + if (start < tmp) + start = tmp; + + return btrfs_align_offset_to_zone(device, start); +} + static u64 dev_extent_search_start(struct btrfs_device *device, u64 start) { switch (device->fs_devices->chunk_alloc_policy) { @@ -1426,11 +1441,57 @@ static u64 dev_extent_search_start(struct btrfs_device *device, u64 start) * make sure to start at an offset of at least 1MB. 
*/ return max_t(u64, start, SZ_1M); + case BTRFS_CHUNK_ALLOC_ZONED: + return dev_extent_search_start_zoned(device, start); default: BUG(); } } +static bool dev_extent_hole_check_zoned(struct btrfs_device *device, + u64 *hole_start, u64 *hole_size, + u64 num_bytes) +{ + u64 zone_size = device->zone_info->zone_size; + u64 pos; + int ret; + int changed = 0; + + ASSERT(IS_ALIGNED(*hole_start, zone_size)); + + while (*hole_size > 0) { + pos = btrfs_find_allocatable_zones(device, *hole_start, + *hole_start + *hole_size, + num_bytes); + if (pos != *hole_start) { + *hole_size = *hole_start + *hole_size - pos; + *hole_start = pos; + changed = 1; + if (*hole_size < num_bytes) + break; + } + + ret = btrfs_ensure_empty_zones(device, pos, num_bytes); + + /* Range is ensured to be empty */ + if (!ret) + return changed; + + /* Given hole range was invalid (outside of device) */ + if (ret == -ERANGE) { + *hole_start += *hole_size; + *hole_size = 0; + return 1; + } + + *hole_start += zone_size; + *hole_size -= zone_size; + changed = 1; + } + + return changed; +} + /** * dev_extent_hole_check - check if specified hole is suitable for allocation * @device: the device which we have the hole @@ -1463,6 +1524,10 @@ static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start, case BTRFS_CHUNK_ALLOC_REGULAR: /* No extra check */ break; + case BTRFS_CHUNK_ALLOC_ZONED: + changed |= dev_extent_hole_check_zoned(device, hole_start, + hole_size, num_bytes); + break; default: BUG(); } @@ -1517,6 +1582,9 @@ static int find_free_dev_extent_start(struct btrfs_device *device, search_start = dev_extent_search_start(device, search_start); + WARN_ON(device->zone_info && + !IS_ALIGNED(num_bytes, device->zone_info->zone_size)); + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -4907,6 +4975,37 @@ static void init_alloc_chunk_ctl_policy_regular( ctl->dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes; } +static void init_alloc_chunk_ctl_policy_zoned( + struct btrfs_fs_devices 
*fs_devices, + struct alloc_chunk_ctl *ctl) +{ + u64 zone_size = fs_devices->fs_info->zone_size; + u64 limit; + int min_num_stripes = ctl->devs_min * ctl->dev_stripes; + int min_data_stripes = (min_num_stripes - ctl->nparity) / ctl->ncopies; + u64 min_chunk_size = min_data_stripes * zone_size; + u64 type = ctl->type; + + ctl->max_stripe_size = zone_size; + if (type & BTRFS_BLOCK_GROUP_DATA) { + ctl->max_chunk_size = round_down(BTRFS_MAX_DATA_CHUNK_SIZE, + zone_size); + } else if (type & BTRFS_BLOCK_GROUP_METADATA) { + ctl->max_chunk_size = ctl->max_stripe_size; + } else if (type & BTRFS_BLOCK_GROUP_SYSTEM) { + ctl->max_chunk_size = 2 * ctl->max_stripe_size; + ctl->devs_max = min_t(int, ctl->devs_max, + BTRFS_MAX_DEVS_SYS_CHUNK); + } + + /* We don't want a chunk larger than 10% of writable space */ + limit = max(round_down(div_factor(fs_devices->total_rw_bytes, 1), + zone_size), + min_chunk_size); + ctl->max_chunk_size = min(limit, ctl->max_chunk_size); + ctl->dev_extent_min = zone_size * ctl->dev_stripes; +} + static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices, struct alloc_chunk_ctl *ctl) { @@ -4927,6 +5026,9 @@ static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices, case BTRFS_CHUNK_ALLOC_REGULAR: init_alloc_chunk_ctl_policy_regular(fs_devices, ctl); break; + case BTRFS_CHUNK_ALLOC_ZONED: + init_alloc_chunk_ctl_policy_zoned(fs_devices, ctl); + break; default: BUG(); } @@ -5053,6 +5155,38 @@ static int decide_stripe_size_regular(struct alloc_chunk_ctl *ctl, return 0; } +static int decide_stripe_size_zoned(struct alloc_chunk_ctl *ctl, + struct btrfs_device_info *devices_info) +{ + u64 zone_size = devices_info[0].dev->zone_info->zone_size; + /* Number of stripes that count for block group size */ + int data_stripes; + + /* + * It should hold because: + * dev_extent_min == dev_extent_want == zone_size * dev_stripes + */ + ASSERT(devices_info[ctl->ndevs - 1].max_avail == ctl->dev_extent_min); + + ctl->stripe_size = zone_size; + 
ctl->num_stripes = ctl->ndevs * ctl->dev_stripes; + data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies; + + /* stripe_size is fixed in ZONED. Reduce ndevs instead. */ + if (ctl->stripe_size * data_stripes > ctl->max_chunk_size) { + ctl->ndevs = div_u64(div_u64(ctl->max_chunk_size * ctl->ncopies, + ctl->stripe_size) + ctl->nparity, + ctl->dev_stripes); + ctl->num_stripes = ctl->ndevs * ctl->dev_stripes; + data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies; + ASSERT(ctl->stripe_size * data_stripes <= ctl->max_chunk_size); + } + + ctl->chunk_size = ctl->stripe_size * data_stripes; + + return 0; +} + static int decide_stripe_size(struct btrfs_fs_devices *fs_devices, struct alloc_chunk_ctl *ctl, struct btrfs_device_info *devices_info) @@ -5080,6 +5214,8 @@ static int decide_stripe_size(struct btrfs_fs_devices *fs_devices, switch (fs_devices->chunk_alloc_policy) { case BTRFS_CHUNK_ALLOC_REGULAR: return decide_stripe_size_regular(ctl, devices_info); + case BTRFS_CHUNK_ALLOC_ZONED: + return decide_stripe_size_zoned(ctl, devices_info); default: BUG(); } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 9c07b97a2260..0249aca668fb 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -213,6 +213,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used); enum btrfs_chunk_allocation_policy { BTRFS_CHUNK_ALLOC_REGULAR, + BTRFS_CHUNK_ALLOC_ZONED, }; struct btrfs_fs_devices { diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 84ade8c19ddc..ed5de1c138d7 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1,11 +1,13 @@ // SPDX-License-Identifier: GPL-2.0 +#include #include #include #include "ctree.h" #include "volumes.h" #include "zoned.h" #include "rcu-string.h" +#include "disk-io.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -424,6 +426,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) fs_info->zone_size = zone_size; fs_info->max_zone_append_size = 
max_zone_append_size; + fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED; btrfs_info(fs_info, "zoned mode enabled with zone size %llu", fs_info->zone_size); @@ -633,3 +636,144 @@ int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror) sb_zone << zone_sectors_shift, zone_sectors * BTRFS_NR_SB_LOG_ZONES, GFP_NOFS); } + +/* + * btrfs_check_allocatable_zones - find allocatable zones within give region + * @device: the device to allocate a region + * @hole_start: the position of the hole to allocate the region + * @num_bytes: the size of wanted region + * @hole_size: the size of hole + * + * Allocatable region should not contain any superblock locations. + */ +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u8 shift = zinfo->zone_size_shift; + u64 nzones = num_bytes >> shift; + u64 pos = hole_start; + u64 begin, end; + bool have_sb; + int i; + + ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); + + while (pos < hole_end) { + begin = pos >> shift; + end = begin + nzones; + + if (end > zinfo->nr_zones) + return hole_end; + + /* Check if zones in the region are all empty */ + if (btrfs_dev_is_sequential(device, pos) && + find_next_zero_bit(zinfo->empty_zones, end, begin) != end) { + pos += zinfo->zone_size; + continue; + } + + have_sb = false; + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + u32 sb_zone; + u64 sb_pos; + + sb_zone = sb_zone_number(shift, i); + if (!(end <= sb_zone || + sb_zone + BTRFS_NR_SB_LOG_ZONES <= begin)) { + have_sb = true; + pos = ((u64)sb_zone + BTRFS_NR_SB_LOG_ZONES) << shift; + break; + } + + /* + * We also need to exclude regular superblock + * positions + */ + sb_pos = btrfs_sb_offset(i); + if (!(pos + num_bytes <= sb_pos || + sb_pos + BTRFS_SUPER_INFO_SIZE <= pos)) { + have_sb = true; + pos = ALIGN(sb_pos + BTRFS_SUPER_INFO_SIZE, + 
zinfo->zone_size); + break; + } + } + if (!have_sb) + break; + + } + + return pos; +} + +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes) +{ + int ret; + + *bytes = 0; + ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_RESET, + physical >> SECTOR_SHIFT, length >> SECTOR_SHIFT, + GFP_NOFS); + if (ret) + return ret; + + *bytes = length; + while (length) { + btrfs_dev_set_zone_empty(device, physical); + physical += device->zone_info->zone_size; + length -= device->zone_info->zone_size; + } + + return 0; +} + +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u8 shift = zinfo->zone_size_shift; + unsigned long begin = start >> shift; + unsigned long end = (start + size) >> shift; + u64 pos; + int ret; + + ASSERT(IS_ALIGNED(start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(size, zinfo->zone_size)); + + if (end > zinfo->nr_zones) + return -ERANGE; + + /* All the zones are conventional */ + if (find_next_bit(zinfo->seq_zones, begin, end) == end) + return 0; + + /* All the zones are sequential and empty */ + if (find_next_zero_bit(zinfo->seq_zones, begin, end) == end && + find_next_zero_bit(zinfo->empty_zones, begin, end) == end) + return 0; + + for (pos = start; pos < start + size; pos += zinfo->zone_size) { + u64 reset_bytes; + + if (!btrfs_dev_is_sequential(device, pos) || + btrfs_dev_is_empty_zone(device, pos)) + continue; + + /* Free regions should be empty */ + btrfs_warn_in_rcu( + device->fs_info, + "zoned: resetting device %s (devid %llu) zone %llu for allocation", + rcu_str_deref(device->name), device->devid, + pos >> shift); + WARN_ON_ONCE(1); + + ret = btrfs_reset_device_zone(device, pos, zinfo->zone_size, + &reset_bytes); + if (ret) + return ret; + } + + return 0; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index de9d7dd8c351..ec2391c52d8b 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -35,6 +35,11 @@ int 
btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw, u64 *bytenr_ret); void btrfs_advance_sb_log(struct btrfs_device *device, int mirror); int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror); +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes); +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes); +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -87,6 +92,26 @@ static inline int btrfs_reset_sb_log_zones(struct block_device *bdev, return 0; } +static inline u64 btrfs_find_allocatable_zones(struct btrfs_device *device, + u64 hole_start, u64 hole_end, + u64 num_bytes) +{ + return hole_start; +} + +static inline int btrfs_reset_device_zone(struct btrfs_device *device, + u64 physical, u64 length, u64 *bytes) +{ + *bytes = 0; + return 0; +} + +static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, + u64 start, u64 size) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) @@ -165,4 +190,13 @@ static inline bool btrfs_check_super_location(struct btrfs_device *device, !btrfs_dev_is_sequential(device, pos); } +static inline u64 btrfs_align_offset_to_zone(struct btrfs_device *device, + u64 pos) +{ + if (!device->zone_info) + return pos; + + return ALIGN(pos, device->zone_info->zone_size); +} + #endif From patchwork Tue Nov 10 11:26:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894025 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 38BC5697 for ; Tue, 10 Nov 2020 11:28:37 +0000 (UTC) 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v10 13/41] btrfs: verify device extent is aligned to zone
Date: Tue, 10 Nov 2020 20:26:16 +0900

Add a check in verify_one_dev_extent() to check if a device extent on a zoned block device is aligned to the respective zone boundary.
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik Reviewed-by: Anand Jain --- fs/btrfs/volumes.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 7831cf6c6da4..c0e27c1e2559 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7783,6 +7783,20 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, ret = -EUCLEAN; goto out; } + + if (dev->zone_info) { + u64 zone_size = dev->zone_info->zone_size; + + if (!IS_ALIGNED(physical_offset, zone_size) || + !IS_ALIGNED(physical_len, zone_size)) { + btrfs_err(fs_info, +"zoned: dev extent devid %llu physical offset %llu len %llu is not aligned to device zone", + devid, physical_offset, physical_len); + ret = -EUCLEAN; + goto out; + } + } + out: free_extent_map(em); return ret;

From patchwork Tue Nov 10 11:26:17 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894139
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v10 14/41] btrfs: load zone's allocation offset
Date: Tue, 10 Nov 2020 20:26:17 +0900

Zoned btrfs must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. This commit adds "alloc_offset" to track that logical address, which is populated in btrfs_load_block_group_zone_info() from the write pointers of the corresponding zones.

For now, zoned btrfs supports only the SINGLE profile. Supporting non-SINGLE profiles with zone append writing is not trivial. For example, in the DUP profile, we send a zone append write IO to two zones on a device. The device replies with the written LBA for each IO. If the offsets of the returned addresses from the beginning of the zone differ, the copies end up at different logical addresses. Supporting such diverging physical addresses would need a fine-grained logical-to-physical mapping, and hence an additional metadata type, so non-SINGLE profiles are disabled for now.

This commit handles the case where all the zones in a block group are sequential. The next patch will handle the case where the block group contains a conventional zone.
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik Reviewed-by: Anand Jain --- fs/btrfs/block-group.c | 15 ++++ fs/btrfs/block-group.h | 6 ++ fs/btrfs/zoned.c | 154 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 7 ++ 4 files changed, 182 insertions(+) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 6b4831824f51..ffc64dfbe09e 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -15,6 +15,7 @@ #include "delalloc-space.h" #include "discard.h" #include "raid56.h" +#include "zoned.h" /* * Return target flags in extended format or 0 if restripe for this chunk_type @@ -1935,6 +1936,13 @@ static int read_one_block_group(struct btrfs_fs_info *info, goto error; } + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_err(info, "zoned: failed to load zone info of bg %llu", + cache->start); + goto error; + } + /* * We need to exclude the super stripes now so that the space info has * super bytes accounted for, otherwise we'll think we have more space @@ -2161,6 +2169,13 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, cache->last_byte_to_unpin = (u64)-1; cache->cached = BTRFS_CACHE_FINISHED; cache->needs_free_space = 1; + + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_put_block_group(cache); + return ret; + } + ret = exclude_super_stripes(cache); if (ret) { /* We may have excluded something, so call this just in case */ diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index adfd7583a17b..14e3043c9ce7 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -183,6 +183,12 @@ struct btrfs_block_group { /* Record locked full stripes for RAID5/6 block group */ struct btrfs_full_stripe_locks_tree full_stripe_locks_root; + + /* + * Allocation offset for the block group to implement sequential + * allocation. This is used only with ZONED mode enabled. 
+ */ + u64 alloc_offset; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index ed5de1c138d7..69d3412c4fef 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -3,14 +3,20 @@ #include #include #include +#include #include "ctree.h" #include "volumes.h" #include "zoned.h" #include "rcu-string.h" #include "disk-io.h" +#include "block-group.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 +/* Invalid allocation pointer value for missing devices */ +#define WP_MISSING_DEV ((u64)-1) +/* Pseudo write pointer value for conventional zone */ +#define WP_CONVENTIONAL ((u64)-2) /* Number of superblock log zones */ #define BTRFS_NR_SB_LOG_ZONES 2 @@ -777,3 +783,151 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) return 0; } + +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct extent_map_tree *em_tree = &fs_info->mapping_tree; + struct extent_map *em; + struct map_lookup *map; + struct btrfs_device *device; + u64 logical = cache->start; + u64 length = cache->length; + u64 physical = 0; + int ret; + int i; + unsigned int nofs_flag; + u64 *alloc_offsets = NULL; + u32 num_sequential = 0, num_conventional = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + /* Sanity check */ + if (!IS_ALIGNED(length, fs_info->zone_size)) { + btrfs_err(fs_info, "zoned: block group %llu len %llu unaligned to zone size %llu", + logical, length, fs_info->zone_size); + return -EIO; + } + + /* Get the chunk mapping */ + read_lock(&em_tree->lock); + em = lookup_extent_mapping(em_tree, logical, length); + read_unlock(&em_tree->lock); + + if (!em) + return -EINVAL; + + map = em->map_lookup; + + /* + * Get the zone type: if the group is mapped to a non-sequential zone, + * there is no need for the allocation offset (fit allocation is OK). 
+ */ + alloc_offsets = kcalloc(map->num_stripes, sizeof(*alloc_offsets), + GFP_NOFS); + if (!alloc_offsets) { + free_extent_map(em); + return -ENOMEM; + } + + for (i = 0; i < map->num_stripes; i++) { + bool is_sequential; + struct blk_zone zone; + + device = map->stripes[i].dev; + physical = map->stripes[i].physical; + + if (device->bdev == NULL) { + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } + + is_sequential = btrfs_dev_is_sequential(device, physical); + if (is_sequential) + num_sequential++; + else + num_conventional++; + + if (!is_sequential) { + alloc_offsets[i] = WP_CONVENTIONAL; + continue; + } + + /* + * This zone will be used for allocation, so mark this + * zone non-empty. + */ + btrfs_dev_clear_zone_empty(device, physical); + + /* + * The group is mapped to a sequential zone. Get the zone write + * pointer to determine the allocation offset within the zone. + */ + WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size)); + nofs_flag = memalloc_nofs_save(); + ret = btrfs_get_dev_zone(device, physical, &zone); + memalloc_nofs_restore(nofs_flag); + if (ret == -EIO || ret == -EOPNOTSUPP) { + ret = 0; + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } else if (ret) { + goto out; + } + + switch (zone.cond) { + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + btrfs_err(fs_info, "zoned: offline/readonly zone %llu on device %s (devid %llu)", + physical >> device->zone_info->zone_size_shift, + rcu_str_deref(device->name), device->devid); + alloc_offsets[i] = WP_MISSING_DEV; + break; + case BLK_ZONE_COND_EMPTY: + alloc_offsets[i] = 0; + break; + case BLK_ZONE_COND_FULL: + alloc_offsets[i] = fs_info->zone_size; + break; + default: + /* Partially used zone */ + alloc_offsets[i] = + ((zone.wp - zone.start) << SECTOR_SHIFT); + break; + } + } + + if (num_conventional > 0) { + /* + * Since conventional zones do not have a write pointer, we + * cannot determine alloc_offset from the pointer + */ + ret = -EINVAL; + goto out; + } + + switch (map->type & 
BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + cache->alloc_offset = alloc_offsets[0]; + break; + case BTRFS_BLOCK_GROUP_DUP: + case BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID0: + case BTRFS_BLOCK_GROUP_RAID10: + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID6: + /* non-SINGLE profiles are not supported yet */ + default: + btrfs_err(fs_info, "zoned: profile %s not supported", + btrfs_bg_type_to_raid_name(map->type)); + ret = -EINVAL; + goto out; + } + +out: + kfree(alloc_offsets); + free_extent_map(em); + + return ret; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index ec2391c52d8b..e3338a2f1be9 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -40,6 +40,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -112,6 +113,12 @@ static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, return 0; } +static inline int btrfs_load_block_group_zone_info( + struct btrfs_block_group *cache) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Tue Nov 10 11:26:18 2020 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894137
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota Subject: [PATCH v10 15/41] btrfs: emulate write pointer for conventional zones Date: Tue, 10 Nov 2020 20:26:18 +0900 Message-Id: <4f84109a6857753734228af3cb626bed112703b2.1605007036.git.naohiro.aota@wdc.com> Conventional zones do not have a write pointer, so we cannot use one to determine the allocation offset when a block group contains a conventional zone. Instead, we can consider the end of the last allocated extent in the block group as the allocation offset.
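The emulation rule described above can be modelled outside the kernel: the write pointer of a conventional-zone block group is the end of the last allocated extent, taken relative to the block group start. A minimal sketch with hypothetical types (not the btrfs API; the real code walks the extent tree with btrfs_previous_extent_item()):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an extent item. */
struct extent {
	uint64_t objectid;	/* logical start of the extent */
	uint64_t length;	/* extent length in bytes */
};

/*
 * Emulate the write pointer of a block group backed by conventional
 * zones: the offset is the end of the last allocated extent, relative
 * to the block group start. With no extents, the group is empty and
 * the emulated offset is 0.
 */
uint64_t emulate_write_pointer(uint64_t bg_start, const struct extent *last)
{
	if (!last)
		return 0;
	return last->objectid + last->length - bg_start;
}
```

The same bounds check the patch performs (the extent must lie fully inside the block group) would apply before trusting `last` here.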
Signed-off-by: Naohiro Aota --- fs/btrfs/zoned.c | 80 ++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 74 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 69d3412c4fef..9bf40300e428 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -784,6 +784,61 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) return 0; } +static int emulate_write_pointer(struct btrfs_block_group *cache, + u64 *offset_ret) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct btrfs_root *root = fs_info->extent_root; + struct btrfs_path *path; + struct btrfs_key key; + struct btrfs_key found_key; + int ret; + u64 length; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + key.objectid = cache->start + cache->length; + key.type = 0; + key.offset = 0; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + /* We should not find the exact match */ + if (ret <= 0) { + ret = -EUCLEAN; + goto out; + } + + ret = btrfs_previous_extent_item(root, path, cache->start); + if (ret) { + if (ret == 1) { + ret = 0; + *offset_ret = 0; + } + goto out; + } + + btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]); + + if (found_key.type == BTRFS_EXTENT_ITEM_KEY) + length = found_key.offset; + else + length = fs_info->nodesize; + + if (!(found_key.objectid >= cache->start && + found_key.objectid + length <= cache->start + cache->length)) { + ret = -EUCLEAN; + goto out; + } + *offset_ret = found_key.objectid + length - cache->start; + ret = 0; + +out: + btrfs_free_path(path); + return ret; +} + int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) { struct btrfs_fs_info *fs_info = cache->fs_info; @@ -798,6 +853,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) int i; unsigned int nofs_flag; u64 *alloc_offsets = NULL; + u64 emulated_offset = 0; u32 num_sequential = 0, num_conventional = 0; if (!btrfs_is_zoned(fs_info)) @@ -899,12 +955,16 @@ int 
btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) } if (num_conventional > 0) { - /* - * Since conventional zones do not have a write pointer, we - * cannot determine alloc_offset from the pointer - */ - ret = -EINVAL; - goto out; + ret = emulate_write_pointer(cache, &emulated_offset); + if (ret || map->num_stripes == num_conventional) { + if (!ret) + cache->alloc_offset = emulated_offset; + else + btrfs_err(fs_info, + "zoned: failed to emulate write pointer of bg %llu", + cache->start); + goto out; + } } switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { @@ -926,6 +986,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) } out: + /* An extent is allocated after the write pointer */ + if (num_conventional && emulated_offset > cache->alloc_offset) { + btrfs_err(fs_info, + "zoned: got wrong write pointer in BG %llu: %llu > %llu", + logical, emulated_offset, cache->alloc_offset); + ret = -EIO; + } + kfree(alloc_offsets); free_extent_map(em); From patchwork Tue Nov 10 11:26:19 2020 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894135
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota Subject: [PATCH v10 16/41] btrfs: track unusable bytes for zones Date: Tue, 10 Nov 2020 20:26:19 +0900 In zoned btrfs, a region that was once written and then freed is not usable until we reset the underlying zone. So we need to distinguish such unusable space from usable free space. This commit therefore introduces the "zone_unusable" field in the block group structure and "bytes_zone_unusable" in the space_info structure to track the unusable space. Pinned bytes are always reclaimed to the unusable space. But when an allocated region is returned before being used, e.g. because the block group became read-only between allocation time and reservation time, we can safely return the region to the block group. For that situation, this commit introduces btrfs_add_free_space_unused(). It behaves the same as btrfs_add_free_space() on regular btrfs; on zoned btrfs, it rewinds the allocation offset.
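The accounting split this commit performs can be illustrated in isolation: when a range inside a block group is freed, the part at or beyond `alloc_offset` returns to free space immediately, while the part already written (below `alloc_offset`) becomes zone-unusable until the zone is reset. A standalone model of that partition, mirroring the three cases in `__btrfs_add_free_space_zoned()` (hypothetical helper, not the kernel function):

```c
#include <stdint.h>

/*
 * Split a freed range [offset, offset + size), expressed relative to
 * the block group start, against the group's allocation offset:
 *   - entirely past alloc_offset: all of it is free again;
 *   - entirely below alloc_offset: all of it was written, so it is
 *     unusable until the underlying zone is reset;
 *   - straddling alloc_offset: only the tail past the offset is free.
 */
void split_freed_range(uint64_t alloc_offset, uint64_t offset, uint64_t size,
		       uint64_t *to_free, uint64_t *to_unusable)
{
	if (offset >= alloc_offset)
		*to_free = size;
	else if (offset + size <= alloc_offset)
		*to_free = 0;
	else
		*to_free = offset + size - alloc_offset;
	*to_unusable = size - *to_free;
}
```

In the kernel, `to_free` is added to `free_space_ctl->free_space` and `to_unusable` to `block_group->zone_unusable` under the tree lock.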
Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 21 ++++++++++----- fs/btrfs/block-group.h | 1 + fs/btrfs/extent-tree.c | 10 ++++++- fs/btrfs/free-space-cache.c | 52 +++++++++++++++++++++++++++++++++++++ fs/btrfs/free-space-cache.h | 2 ++ fs/btrfs/space-info.c | 13 ++++++---- fs/btrfs/space-info.h | 4 ++- fs/btrfs/sysfs.c | 2 ++ fs/btrfs/zoned.c | 24 +++++++++++++++++ fs/btrfs/zoned.h | 3 +++ 10 files changed, 119 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index ffc64dfbe09e..723b7c183cd9 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1080,12 +1080,17 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, WARN_ON(block_group->space_info->total_bytes < block_group->length); WARN_ON(block_group->space_info->bytes_readonly - < block_group->length); + < block_group->length - block_group->zone_unusable); + WARN_ON(block_group->space_info->bytes_zone_unusable + < block_group->zone_unusable); WARN_ON(block_group->space_info->disk_total < block_group->length * factor); } block_group->space_info->total_bytes -= block_group->length; - block_group->space_info->bytes_readonly -= block_group->length; + block_group->space_info->bytes_readonly -= + (block_group->length - block_group->zone_unusable); + block_group->space_info->bytes_zone_unusable -= + block_group->zone_unusable; block_group->space_info->disk_total -= block_group->length * factor; spin_unlock(&block_group->space_info->lock); @@ -1229,7 +1234,7 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force) } num_bytes = cache->length - cache->reserved - cache->pinned - - cache->bytes_super - cache->used; + cache->bytes_super - cache->zone_unusable - cache->used; /* * Data never overcommits, even in mixed mode, so do just the straight @@ -1973,6 +1978,8 @@ static int read_one_block_group(struct btrfs_fs_info *info, btrfs_free_excluded_extents(cache); } + btrfs_calc_zone_unusable(cache); + ret = 
btrfs_add_block_group_cache(info, cache); if (ret) { btrfs_remove_free_space_cache(cache); @@ -1980,7 +1987,8 @@ static int read_one_block_group(struct btrfs_fs_info *info, } trace_btrfs_add_block_group(info, cache, 0); btrfs_update_space_info(info, cache->flags, cache->length, - cache->used, cache->bytes_super, &space_info); + cache->used, cache->bytes_super, + cache->zone_unusable, &space_info); cache->space_info = space_info; @@ -2217,7 +2225,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, */ trace_btrfs_add_block_group(fs_info, cache, 1); btrfs_update_space_info(fs_info, cache->flags, size, bytes_used, - cache->bytes_super, &cache->space_info); + cache->bytes_super, 0, &cache->space_info); btrfs_update_global_block_rsv(fs_info); link_block_group(cache); @@ -2325,7 +2333,8 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group *cache) spin_lock(&cache->lock); if (!--cache->ro) { num_bytes = cache->length - cache->reserved - - cache->pinned - cache->bytes_super - cache->used; + cache->pinned - cache->bytes_super - + cache->zone_unusable - cache->used; sinfo->bytes_readonly -= num_bytes; list_del_init(&cache->ro_list); } diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 14e3043c9ce7..5be47f4bfea7 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -189,6 +189,7 @@ struct btrfs_block_group { * allocation. This is used only with ZONED mode enabled. 
*/ u64 alloc_offset; + u64 zone_unusable; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 3b21fee13e77..09439782b9a8 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -34,6 +34,7 @@ #include "block-group.h" #include "discard.h" #include "rcu-string.h" +#include "zoned.h" #undef SCRAMBLE_DELAYED_REFS @@ -2765,6 +2766,9 @@ fetch_cluster_info(struct btrfs_fs_info *fs_info, { struct btrfs_free_cluster *ret = NULL; + if (btrfs_is_zoned(fs_info)) + return NULL; + *empty_cluster = 0; if (btrfs_mixed_space_info(space_info)) return ret; @@ -2846,7 +2850,11 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info, space_info->max_extent_size = 0; percpu_counter_add_batch(&space_info->total_bytes_pinned, -len, BTRFS_TOTAL_BYTES_PINNED_BATCH); - if (cache->ro) { + if (btrfs_is_zoned(fs_info)) { + /* Need reset before reusing in a zoned block group */ + space_info->bytes_zone_unusable += len; + readonly = true; + } else if (cache->ro) { space_info->bytes_readonly += len; readonly = true; } diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index af0013d3df63..f6434794cb0b 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2467,6 +2467,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, int ret = 0; u64 filter_bytes = bytes; + ASSERT(!btrfs_is_zoned(fs_info)); + info = kmem_cache_zalloc(btrfs_free_space_cachep, GFP_NOFS); if (!info) return -ENOMEM; @@ -2524,11 +2526,44 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, return ret; } +static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, + u64 bytenr, u64 size, bool used) +{ + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 offset = bytenr - block_group->start; + u64 to_free, to_unusable; + + spin_lock(&ctl->tree_lock); + if (!used) + to_free = size; + else if (offset >= 
block_group->alloc_offset) + to_free = size; + else if (offset + size <= block_group->alloc_offset) + to_free = 0; + else + to_free = offset + size - block_group->alloc_offset; + to_unusable = size - to_free; + + ctl->free_space += to_free; + block_group->zone_unusable += to_unusable; + spin_unlock(&ctl->tree_lock); + if (!used) { + spin_lock(&block_group->lock); + block_group->alloc_offset -= size; + spin_unlock(&block_group->lock); + } + return 0; +} + int btrfs_add_free_space(struct btrfs_block_group *block_group, u64 bytenr, u64 size) { enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + true); + if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC)) trim_state = BTRFS_TRIM_STATE_TRIMMED; @@ -2537,6 +2572,16 @@ int btrfs_add_free_space(struct btrfs_block_group *block_group, bytenr, size, trim_state); } +int btrfs_add_free_space_unused(struct btrfs_block_group *block_group, + u64 bytenr, u64 size) +{ + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + false); + + return btrfs_add_free_space(block_group, bytenr, size); +} + /* * This is a subtle distinction because when adding free space back in general, * we want it to be added as untrimmed for async. 
But in the case where we add @@ -2547,6 +2592,10 @@ int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group, { enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + true); + if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC) || btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) trim_state = BTRFS_TRIM_STATE_TRIMMED; @@ -2564,6 +2613,9 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group, int ret; bool re_search = false; + if (btrfs_is_zoned(block_group->fs_info)) + return 0; + spin_lock(&ctl->tree_lock); again: diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index e3d5e0ad8f8e..469382529f7e 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -116,6 +116,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, enum btrfs_trim_state trim_state); int btrfs_add_free_space(struct btrfs_block_group *block_group, u64 bytenr, u64 size); +int btrfs_add_free_space_unused(struct btrfs_block_group *block_group, + u64 bytenr, u64 size); int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group, u64 bytenr, u64 size); int btrfs_remove_free_space(struct btrfs_block_group *block_group, diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index 64099565ab8f..bbbf3c1412a4 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -163,6 +163,7 @@ u64 __pure btrfs_space_info_used(struct btrfs_space_info *s_info, ASSERT(s_info); return s_info->bytes_used + s_info->bytes_reserved + s_info->bytes_pinned + s_info->bytes_readonly + + s_info->bytes_zone_unusable + (may_use_included ? 
s_info->bytes_may_use : 0); } @@ -257,7 +258,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info) void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info) { struct btrfs_space_info *found; @@ -273,6 +274,7 @@ void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, found->bytes_used += bytes_used; found->disk_used += bytes_used * factor; found->bytes_readonly += bytes_readonly; + found->bytes_zone_unusable += bytes_zone_unusable; if (total_bytes > 0) found->full = 0; btrfs_try_granting_tickets(info, found); @@ -422,10 +424,10 @@ static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info, info->total_bytes - btrfs_space_info_used(info, true), info->full ? "" : "not "); btrfs_info(fs_info, - "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu", + "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu zone_unusable=%llu", info->total_bytes, info->bytes_used, info->bytes_pinned, info->bytes_reserved, info->bytes_may_use, - info->bytes_readonly); + info->bytes_readonly, info->bytes_zone_unusable); DUMP_BLOCK_RSV(fs_info, global_block_rsv); DUMP_BLOCK_RSV(fs_info, trans_block_rsv); @@ -454,9 +456,10 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info, list_for_each_entry(cache, &info->block_groups[index], list) { spin_lock(&cache->lock); btrfs_info(fs_info, - "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %s", + "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %llu zone_unusable %s", cache->start, cache->length, cache->used, cache->pinned, - cache->reserved, cache->ro ? "[readonly]" : ""); + cache->reserved, cache->zone_unusable, + cache->ro ? 
"[readonly]" : ""); spin_unlock(&cache->lock); btrfs_dump_free_space(cache, bytes); } diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h index 5646393b928c..ee003ffba956 100644 --- a/fs/btrfs/space-info.h +++ b/fs/btrfs/space-info.h @@ -17,6 +17,8 @@ struct btrfs_space_info { u64 bytes_may_use; /* number of bytes that may be used for delalloc/allocations */ u64 bytes_readonly; /* total bytes that are read only */ + u64 bytes_zone_unusable; /* total bytes that are unusable until + resetting the device zone */ u64 max_extent_size; /* This will hold the maximum extent size of the space info if we had an ENOSPC in the @@ -119,7 +121,7 @@ DECLARE_SPACE_INFO_UPDATE(bytes_pinned, "pinned"); int btrfs_init_space_info(struct btrfs_fs_info *fs_info); void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info); struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info, u64 flags); diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 828006020bbd..ea679803da9b 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -635,6 +635,7 @@ SPACE_INFO_ATTR(bytes_pinned); SPACE_INFO_ATTR(bytes_reserved); SPACE_INFO_ATTR(bytes_may_use); SPACE_INFO_ATTR(bytes_readonly); +SPACE_INFO_ATTR(bytes_zone_unusable); SPACE_INFO_ATTR(disk_used); SPACE_INFO_ATTR(disk_total); BTRFS_ATTR(space_info, total_bytes_pinned, @@ -648,6 +649,7 @@ static struct attribute *space_info_attrs[] = { BTRFS_ATTR_PTR(space_info, bytes_reserved), BTRFS_ATTR_PTR(space_info, bytes_may_use), BTRFS_ATTR_PTR(space_info, bytes_readonly), + BTRFS_ATTR_PTR(space_info, bytes_zone_unusable), BTRFS_ATTR_PTR(space_info, disk_used), BTRFS_ATTR_PTR(space_info, disk_total), BTRFS_ATTR_PTR(space_info, total_bytes_pinned), diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 9bf40300e428..5ee26b9fe5b1 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -999,3 
+999,27 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) return ret; } + +void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) +{ + u64 unusable, free; + + if (!btrfs_is_zoned(cache->fs_info)) + return; + + WARN_ON(cache->bytes_super != 0); + unusable = cache->alloc_offset - cache->used; + free = cache->length - cache->alloc_offset; + + /* We only need ->free_space in ALLOC_SEQ BGs */ + cache->last_byte_to_unpin = (u64)-1; + cache->cached = BTRFS_CACHE_FINISHED; + cache->free_space_ctl->free_space = free; + cache->zone_unusable = unusable; + + /* + * Should not have any excluded extents. Just + * in case, though. + */ + btrfs_free_excluded_extents(cache); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index e3338a2f1be9..c86cde1978cd 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -41,6 +41,7 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); +void btrfs_calc_zone_unusable(struct btrfs_block_group *cache); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -119,6 +120,8 @@ static inline int btrfs_load_block_group_zone_info( return 0; } +static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Tue Nov 10 11:26:20 2020 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894043
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik Subject: [PATCH v10 17/41] btrfs: do sequential extent allocation in ZONED mode Date: Tue, 10 Nov 2020 20:26:20 +0900 Message-Id: <604e8da010ec0d5529ee5f8a468681d3f1f0282a.1605007036.git.naohiro.aota@wdc.com> This commit implements a sequential extent allocator for ZONED mode. The allocator only needs to check whether there is enough space left in the block group, so it never manages bitmaps or clusters. ASSERTs are also added to the corresponding functions. Strictly speaking, with zone append writing it is unnecessary to track the allocation offset; checking space availability would suffice. But by tracking the offset and returning the offset as the allocated region, we can skip modification of ordered extents and checksum information when there is no IO reordering.
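The sequential allocator described here reduces to a bump pointer: check how much of the block group's tail remains, then advance `alloc_offset` by the reservation size. A minimal userspace model (hypothetical struct, without the space_info/block-group locking and free-space bookkeeping the real `do_allocation_zoned()` performs):

```c
#include <stdint.h>

/* Hypothetical, simplified view of a zoned block group. */
struct bg {
	uint64_t start;		/* logical start of the block group */
	uint64_t length;	/* total size of the block group */
	uint64_t alloc_offset;	/* bytes already allocated sequentially */
};

/*
 * Sequential-only allocation: succeed iff num_bytes fit past the
 * current allocation offset, return the logical address reserved and
 * bump the offset. Returns (uint64_t)-1 when the group is full
 * (the kernel falls back to another block group or returns ENOSPC).
 */
uint64_t alloc_sequential(struct bg *bg, uint64_t num_bytes)
{
	uint64_t avail = bg->length - bg->alloc_offset;
	uint64_t found;

	if (avail < num_bytes)
		return (uint64_t)-1;
	found = bg->start + bg->alloc_offset;
	bg->alloc_offset += num_bytes;
	return found;
}
```

Because allocations only ever move the offset forward, no per-extent free-space tree is needed; freeing below the offset is handled by the zone-unusable accounting from the previous patch.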
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 4 ++ fs/btrfs/extent-tree.c | 85 ++++++++++++++++++++++++++++++++++--- fs/btrfs/free-space-cache.c | 6 +++ 3 files changed, 89 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 723b7c183cd9..232885261c37 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -683,6 +683,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only struct btrfs_caching_control *caching_ctl; int ret = 0; + /* Allocator for ZONED btrfs does not use the cache at all */ + if (btrfs_is_zoned(fs_info)) + return 0; + caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS); if (!caching_ctl) return -ENOMEM; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 09439782b9a8..ab0ce3ba2b89 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3563,6 +3563,7 @@ btrfs_release_block_group(struct btrfs_block_group *cache, enum btrfs_extent_allocation_policy { BTRFS_EXTENT_ALLOC_CLUSTERED, + BTRFS_EXTENT_ALLOC_ZONED, }; /* @@ -3815,6 +3816,58 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group, return find_free_extent_unclustered(block_group, ffe_ctl); } +/* + * Simple allocator for sequential only block group. It only allows + * sequential allocation. No need to play with trees. This function + * also reserves the bytes as in btrfs_add_reserved_bytes. 
+ */ +static int do_allocation_zoned(struct btrfs_block_group *block_group, + struct find_free_extent_ctl *ffe_ctl, + struct btrfs_block_group **bg_ret) +{ + struct btrfs_space_info *space_info = block_group->space_info; + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 start = block_group->start; + u64 num_bytes = ffe_ctl->num_bytes; + u64 avail; + int ret = 0; + + ASSERT(btrfs_is_zoned(block_group->fs_info)); + + spin_lock(&space_info->lock); + spin_lock(&block_group->lock); + + if (block_group->ro) { + ret = 1; + goto out; + } + + avail = block_group->length - block_group->alloc_offset; + if (avail < num_bytes) { + ffe_ctl->max_extent_size = avail; + ret = 1; + goto out; + } + + ffe_ctl->found_offset = start + block_group->alloc_offset; + block_group->alloc_offset += num_bytes; + spin_lock(&ctl->tree_lock); + ctl->free_space -= num_bytes; + spin_unlock(&ctl->tree_lock); + + /* + * We do not check if found_offset is aligned to stripesize. The + * address is anyway rewritten when using zone append writing. 
+ */ + + ffe_ctl->search_start = ffe_ctl->found_offset; + +out: + spin_unlock(&block_group->lock); + spin_unlock(&space_info->lock); + return ret; +} + static int do_allocation(struct btrfs_block_group *block_group, struct find_free_extent_ctl *ffe_ctl, struct btrfs_block_group **bg_ret) @@ -3822,6 +3875,8 @@ static int do_allocation(struct btrfs_block_group *block_group, switch (ffe_ctl->policy) { case BTRFS_EXTENT_ALLOC_CLUSTERED: return do_allocation_clustered(block_group, ffe_ctl, bg_ret); + case BTRFS_EXTENT_ALLOC_ZONED: + return do_allocation_zoned(block_group, ffe_ctl, bg_ret); default: BUG(); } @@ -3836,6 +3891,9 @@ static void release_block_group(struct btrfs_block_group *block_group, ffe_ctl->retry_clustered = false; ffe_ctl->retry_unclustered = false; break; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Nothing to do */ + break; default: BUG(); } @@ -3864,6 +3922,9 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl, case BTRFS_EXTENT_ALLOC_CLUSTERED: found_extent_clustered(ffe_ctl, ins); break; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Nothing to do */ + break; default: BUG(); } @@ -3879,6 +3940,9 @@ static int chunk_allocation_failed(struct find_free_extent_ctl *ffe_ctl) */ ffe_ctl->loop = LOOP_NO_EMPTY_SIZE; return 0; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Give up here */ + return -ENOSPC; default: BUG(); } @@ -4047,6 +4111,9 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info, case BTRFS_EXTENT_ALLOC_CLUSTERED: return prepare_allocation_clustered(fs_info, ffe_ctl, space_info, ins); + case BTRFS_EXTENT_ALLOC_ZONED: + /* nothing to do */ + return 0; default: BUG(); } @@ -4110,6 +4177,9 @@ static noinline int find_free_extent(struct btrfs_root *root, ffe_ctl.last_ptr = NULL; ffe_ctl.use_cluster = true; + if (btrfs_is_zoned(fs_info)) + ffe_ctl.policy = BTRFS_EXTENT_ALLOC_ZONED; + ins->type = BTRFS_EXTENT_ITEM_KEY; ins->objectid = 0; ins->offset = 0; @@ -4252,20 +4322,23 @@ static noinline int find_free_extent(struct btrfs_root *root, /* move 
on to the next group */ if (ffe_ctl.search_start + num_bytes > block_group->start + block_group->length) { - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - num_bytes); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + num_bytes); goto loop; } if (ffe_ctl.found_offset < ffe_ctl.search_start) - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - ffe_ctl.search_start - ffe_ctl.found_offset); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + ffe_ctl.search_start - ffe_ctl.found_offset); ret = btrfs_add_reserved_bytes(block_group, ram_bytes, num_bytes, delalloc); if (ret == -EAGAIN) { - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - num_bytes); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + num_bytes); goto loop; } btrfs_inc_block_group_reservations(block_group); diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index f6434794cb0b..2161d0ad5cf0 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2903,6 +2903,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group, u64 align_gap_len = 0; enum btrfs_trim_state align_gap_trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + spin_lock(&ctl->tree_lock); entry = find_free_space(ctl, &offset, &bytes_search, block_group->full_stripe_len, max_extent_size); @@ -3034,6 +3036,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group *block_group, struct rb_node *node; u64 ret = 0; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + spin_lock(&cluster->lock); if (bytes > cluster->max_size) goto out; @@ -3810,6 +3814,8 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group, int ret; u64 rem = 0; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + *trimmed = 0; spin_lock(&block_group->lock); From patchwork Tue Nov 10 11:26:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894033
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v10 18/41] btrfs: reset zones of unused block groups
Date: Tue, 10 Nov 2020 20:26:21 +0900
Message-Id: <7b1cd3bcb175cfc8d38b118dbd5bab99a483ca2f.1605007036.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

For a ZONED volume, a block group maps to a zone of the device. When an unused block group is deleted, the zone backing it can be reset, rewinding the zone write pointer to the start of the zone.
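The eligibility test this patch adds (btrfs_can_zone_reset(), in the zoned.h hunk below) boils down to alignment arithmetic. A standalone model of just that arithmetic, with the device lookup replaced by a `sequential` flag:

```c
#include <assert.h>
#include <stdint.h>

/* A zone can only be reset if the range sits on a sequential-write-required
 * zone and covers whole zones: both the physical start and the length must
 * be multiples of the zone size. */
static int can_zone_reset(int sequential, uint64_t physical, uint64_t length,
			  uint64_t zone_size)
{
	if (!sequential)
		return 0; /* conventional zones are not resettable */
	if (physical % zone_size != 0 || length % zone_size != 0)
		return 0; /* partial-zone ranges cannot be reset */
	return 1;
}
```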
Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 8 ++++++-- fs/btrfs/extent-tree.c | 17 ++++++++++++----- fs/btrfs/zoned.h | 16 ++++++++++++++++ 3 files changed, 34 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 232885261c37..31511e59ca74 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1470,8 +1470,12 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) if (!async_trim_enabled && btrfs_test_opt(fs_info, DISCARD_ASYNC)) goto flip_async; - /* DISCARD can flip during remount */ - trimming = btrfs_test_opt(fs_info, DISCARD_SYNC); + /* + * DISCARD can flip during remount. In ZONED mode, we need + * to reset sequential required zones. + */ + trimming = btrfs_test_opt(fs_info, DISCARD_SYNC) || + btrfs_is_zoned(fs_info); /* Implicit trim during transaction commit. */ if (trimming) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ab0ce3ba2b89..11e6483372c3 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1331,6 +1331,9 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, stripe = bbio->stripes; for (i = 0; i < bbio->num_stripes; i++, stripe++) { + struct btrfs_device *dev = stripe->dev; + u64 physical = stripe->physical; + u64 length = stripe->length; u64 bytes; struct request_queue *req_q; @@ -1338,14 +1341,18 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, ASSERT(btrfs_test_opt(fs_info, DEGRADED)); continue; } + req_q = bdev_get_queue(stripe->dev->bdev); - if (!blk_queue_discard(req_q)) + /* Zone reset in ZONED mode */ + if (btrfs_can_zone_reset(dev, physical, length)) + ret = btrfs_reset_device_zone(dev, physical, + length, &bytes); + else if (blk_queue_discard(req_q)) + ret = btrfs_issue_discard(dev->bdev, physical, + length, &bytes); + else continue; - ret = btrfs_issue_discard(stripe->dev->bdev, - stripe->physical, - stripe->length, - &bytes); if (!ret) { discarded_bytes += bytes; } else if (ret != 
-EOPNOTSUPP) { diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index c86cde1978cd..6a07af0c7f6d 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -209,4 +209,20 @@ static inline u64 btrfs_align_offset_to_zone(struct btrfs_device *device, return ALIGN(pos, device->zone_info->zone_size); } +static inline bool btrfs_can_zone_reset(struct btrfs_device *device, + u64 physical, u64 length) +{ + u64 zone_size; + + if (!btrfs_dev_is_sequential(device, physical)) + return false; + + zone_size = device->zone_info->zone_size; + if (!IS_ALIGNED(physical, zone_size) || + !IS_ALIGNED(length, zone_size)) + return false; + + return true; +} + #endif
From patchwork Tue Nov 10 11:26:22 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894133
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v10 19/41] btrfs: redirty released extent buffers in ZONED mode
Date: Tue, 10 Nov 2020 20:26:22 +0900
Message-Id: <7ce858b1ea48d495b4c414d3f3ac12ebf745c781.1605007036.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

Tree-manipulating operations like merging nodes often release once-allocated tree nodes. Btrfs cleans such nodes so that pages in the node are not uselessly written out. On ZONED volumes, however, this optimization blocks subsequent IOs: cancelling the write-out of the freed blocks breaks the sequential write order expected by the device.

This patch introduces a list of clean and unwritten extent buffers that have been released in a transaction. Btrfs redirties those buffers so that btree_write_cache_pages() can send proper bios to the devices. It also clears the entire content of the extent buffer so as not to confuse raw block scanners, e.g. btrfsck. Because the content is cleared, csum_dirty_buffer() would complain about a bytenr mismatch, so skip the check and checksum using the newly introduced buffer flag EXTENT_BUFFER_NO_CHECK.
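The redirtying described above can be sketched as a small standalone model (hypothetical `model_*` names; the kernel version also sets EXTENT_DIRTY in the transaction's dirty_pages tree and takes a reference on each buffer):

```c
#include <assert.h>
#include <string.h>

#define MAX_EBS 8

struct model_eb {
	char data[16]; /* stand-in for the extent buffer content */
	int no_check;  /* EXTENT_BUFFER_NO_CHECK: skip bytenr/csum checks */
	int queued;    /* already on the releasing list */
};

struct model_trans {
	struct model_eb *releasing[MAX_EBS];
	int nr;
};

/* Model of btrfs_redirty_list_add(): zero the content so raw block
 * scanners are not confused, flag the buffer so csum_dirty_buffer()
 * skips it, and queue it on the transaction exactly once. */
static void redirty_list_add(struct model_trans *t, struct model_eb *eb)
{
	if (eb->queued)
		return;
	memset(eb->data, 0, sizeof(eb->data));
	eb->no_check = 1;
	eb->queued = 1;
	t->releasing[t->nr++] = eb;
}

/* Model of btrfs_free_redirty_list(): drain the list at commit time,
 * after all tree blocks of the transaction have been written. */
static int free_redirty_list(struct model_trans *t)
{
	int freed = t->nr;

	for (int i = 0; i < t->nr; i++)
		t->releasing[i]->queued = 0;
	t->nr = 0;
	return freed;
}
```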
Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 8 ++++++++ fs/btrfs/extent-tree.c | 12 +++++++++++- fs/btrfs/extent_io.c | 4 ++++ fs/btrfs/extent_io.h | 2 ++ fs/btrfs/transaction.c | 10 ++++++++++ fs/btrfs/transaction.h | 3 +++ fs/btrfs/tree-log.c | 6 ++++++ fs/btrfs/zoned.c | 37 +++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 7 +++++++ 9 files changed, 88 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 509085a368bb..c0180fbd5c78 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -462,6 +462,12 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page) return 0; found_start = btrfs_header_bytenr(eb); + + if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) { + WARN_ON(found_start != 0); + return 0; + } + /* * Please do not consolidate these warnings into a single if. * It is useful to know what went wrong. @@ -4616,6 +4622,8 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans, EXTENT_DIRTY); btrfs_destroy_pinned_extent(fs_info, &cur_trans->pinned_extents); + btrfs_free_redirty_list(cur_trans); + cur_trans->state =TRANS_STATE_COMPLETED; wake_up(&cur_trans->commit_wait); } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 11e6483372c3..99640dacf8e6 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3422,8 +3422,10 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) { ret = check_ref_cleanup(trans, buf->start); - if (!ret) + if (!ret) { + btrfs_redirty_list_add(trans->transaction, buf); goto out; + } } pin = 0; @@ -3435,6 +3437,13 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, goto out; } + if (btrfs_is_zoned(fs_info)) { + btrfs_redirty_list_add(trans->transaction, buf); + pin_down_extent(trans, cache, buf->start, buf->len, 1); + btrfs_put_block_group(cache); + goto out; + } + WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)); 
btrfs_add_free_space(cache, buf->start, buf->len); @@ -4771,6 +4780,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root, __btrfs_tree_lock(buf, nest); btrfs_clean_tree_block(buf); clear_bit(EXTENT_BUFFER_STALE, &buf->bflags); + clear_bit(EXTENT_BUFFER_NO_CHECK, &buf->bflags); btrfs_set_lock_blocking_write(buf); set_extent_buffer_uptodate(buf); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 60f5f68d892d..e91c504fe973 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -24,6 +24,7 @@ #include "rcu-string.h" #include "backref.h" #include "disk-io.h" +#include "zoned.h" static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -4959,6 +4960,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start, btrfs_leak_debug_add(&fs_info->eb_leak_lock, &eb->leak_list, &fs_info->allocated_ebs); + INIT_LIST_HEAD(&eb->release_list); spin_lock_init(&eb->refs_lock); atomic_set(&eb->refs, 1); @@ -5744,6 +5746,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv, char *src = (char *)srcv; unsigned long i = start >> PAGE_SHIFT; + WARN_ON(test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)); + if (check_eb_range(eb, start, len)) return; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index f39d02e7f7ef..5f2ccfd0205e 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -30,6 +30,7 @@ enum { EXTENT_BUFFER_IN_TREE, /* write IO error */ EXTENT_BUFFER_WRITE_ERR, + EXTENT_BUFFER_NO_CHECK, }; /* these are flags for __process_pages_contig */ @@ -107,6 +108,7 @@ struct extent_buffer { */ wait_queue_head_t read_lock_wq; struct page *pages[INLINE_EXTENT_BUFFER_PAGES]; + struct list_head release_list; #ifdef CONFIG_BTRFS_DEBUG int spinning_writers; atomic_t spinning_readers; diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 52ada47aff50..a8561536cd0d 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -22,6 +22,7 @@ 
#include "qgroup.h" #include "block-group.h" #include "space-info.h" +#include "zoned.h" #define BTRFS_ROOT_TRANS_TAG 0 @@ -336,6 +337,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, spin_lock_init(&cur_trans->dirty_bgs_lock); INIT_LIST_HEAD(&cur_trans->deleted_bgs); spin_lock_init(&cur_trans->dropped_roots_lock); + INIT_LIST_HEAD(&cur_trans->releasing_ebs); + spin_lock_init(&cur_trans->releasing_ebs_lock); list_add_tail(&cur_trans->list, &fs_info->trans_list); extent_io_tree_init(fs_info, &cur_trans->dirty_pages, IO_TREE_TRANS_DIRTY_PAGES, fs_info->btree_inode); @@ -2345,6 +2348,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) goto scrub_continue; } + /* + * At this point, we should have written all the tree blocks + * allocated in this transaction. So it's now safe to free the + * redirtyied extent buffers. + */ + btrfs_free_redirty_list(cur_trans); + ret = write_all_supers(fs_info, 0); /* * the super is written, we can safely allow the tree-loggers diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 858d9153a1cd..380e0aaa15b3 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -92,6 +92,9 @@ struct btrfs_transaction { */ atomic_t pending_ordered; wait_queue_head_t pending_wait; + + spinlock_t releasing_ebs_lock; + struct list_head releasing_ebs; }; #define __TRANS_FREEZABLE (1U << 0) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 56cbc1706b6f..5f585cf57383 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -20,6 +20,7 @@ #include "inode-map.h" #include "block-group.h" #include "space-info.h" +#include "zoned.h" /* magic values for the inode_only field in btrfs_log_inode: * @@ -2742,6 +2743,8 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans, free_extent_buffer(next); return ret; } + btrfs_redirty_list_add( + trans->transaction, next); } else { if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &next->bflags)) clear_extent_buffer_dirty(next); @@ 
-3277,6 +3280,9 @@ static void free_log_tree(struct btrfs_trans_handle *trans, clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1, EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT); extent_io_tree_release(&log->log_csum_range); + + if (trans && log->node) + btrfs_redirty_list_add(trans->transaction, log->node); btrfs_put_root(log); } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 5ee26b9fe5b1..b56bfeaf8744 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -10,6 +10,7 @@ #include "rcu-string.h" #include "disk-io.h" #include "block-group.h" +#include "transaction.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -1023,3 +1024,39 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) */ btrfs_free_excluded_extents(cache); } + +void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct extent_buffer *eb) +{ + struct btrfs_fs_info *fs_info = eb->fs_info; + + if (!btrfs_is_zoned(fs_info) || + btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN) || + !list_empty(&eb->release_list)) + return; + + set_extent_buffer_dirty(eb); + set_extent_bits_nowait(&trans->dirty_pages, eb->start, + eb->start + eb->len - 1, EXTENT_DIRTY); + memzero_extent_buffer(eb, 0, eb->len); + set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags); + + spin_lock(&trans->releasing_ebs_lock); + list_add_tail(&eb->release_list, &trans->releasing_ebs); + spin_unlock(&trans->releasing_ebs_lock); + atomic_inc(&eb->refs); +} + +void btrfs_free_redirty_list(struct btrfs_transaction *trans) +{ + spin_lock(&trans->releasing_ebs_lock); + while (!list_empty(&trans->releasing_ebs)) { + struct extent_buffer *eb; + + eb = list_first_entry(&trans->releasing_ebs, + struct extent_buffer, release_list); + list_del_init(&eb->release_list); + free_extent_buffer(eb); + } + spin_unlock(&trans->releasing_ebs_lock); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 6a07af0c7f6d..a7de80c313be 100644 --- a/fs/btrfs/zoned.h +++ 
b/fs/btrfs/zoned.h @@ -42,6 +42,9 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); void btrfs_calc_zone_unusable(struct btrfs_block_group *cache); +void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct extent_buffer *eb); +void btrfs_free_redirty_list(struct btrfs_transaction *trans); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -122,6 +125,10 @@ static inline int btrfs_load_block_group_zone_info( static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { } +static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct extent_buffer *eb) { } +static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
From patchwork Tue Nov 10 11:26:23 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894031
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v10 20/41] btrfs: extract page adding function
Date: Tue, 10 Nov 2020 20:26:23 +0900
Message-Id:
X-Mailer: git-send-email 2.27.0
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

This commit extracts the page-adding part of submit_extent_page() into a new helper. The page is added only when the bio_flags are the same, the page is contiguous with the bio, and it fits in the same stripe as the pages already in the bio. The condition checks are reordered to allow early returns that avoid the possibly heavy btrfs_bio_fits_in_stripe() call.
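The reordered merge decision can be modeled standalone: cheap checks run first, and the stripe check last, since the commit message motivates the ordering by the cost of btrfs_bio_fits_in_stripe(). Here `fits_in_stripe` is a hypothetical flag standing in for that call's result:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the merge decision in btrfs_bio_add_page(): returns 1 when
 * the page can be appended to the current bio, 0 when the caller must
 * submit the bio and start a new one. */
static int can_merge_page(unsigned long prev_bio_flags, unsigned long bio_flags,
			  uint64_t bio_end_sector, uint64_t page_sector,
			  int fits_in_stripe)
{
	if (prev_bio_flags != bio_flags)
		return 0; /* different flags (e.g. compression): no merge */
	if (bio_end_sector != page_sector)
		return 0; /* not contiguous with the bio's tail */
	if (!fits_in_stripe)
		return 0; /* expensive stripe check, evaluated last */
	return 1;
}
```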
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/extent_io.c | 56 ++++++++++++++++++++++++++++++++------------ 1 file changed, 41 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index e91c504fe973..868ae0874a34 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3012,6 +3012,44 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size) return bio; } +/** + * btrfs_bio_add_page - attempt to add a page to bio + * @bio: destination bio + * @page: page to add to the bio + * @logical: offset of the new bio or to check whether we are adding + * a contiguous page to the previous one + * @pg_offset: starting offset in the page + * @size: portion of page that we want to write + * @prev_bio_flags: flags of previous bio to see if we can merge the current one + * @bio_flags: flags of the current bio to see if we can merge them + * + * Attempt to add a page to bio considering stripe alignment etc. Return + * true if successfully page added. Otherwise, return false. 
+ */ +static bool btrfs_bio_add_page(struct bio *bio, struct page *page, u64 logical, + unsigned int size, unsigned int pg_offset, + unsigned long prev_bio_flags, + unsigned long bio_flags) +{ + sector_t sector = logical >> SECTOR_SHIFT; + bool contig; + + if (prev_bio_flags != bio_flags) + return false; + + if (prev_bio_flags & EXTENT_BIO_COMPRESSED) + contig = bio->bi_iter.bi_sector == sector; + else + contig = bio_end_sector(bio) == sector; + if (!contig) + return false; + + if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags)) + return false; + + return bio_add_page(bio, page, size, pg_offset) == size; +} + /* * @opf: bio REQ_OP_* and REQ_* flags as one value * @wbc: optional writeback control for io accounting @@ -3040,27 +3078,15 @@ static int submit_extent_page(unsigned int opf, int ret = 0; struct bio *bio; size_t page_size = min_t(size_t, size, PAGE_SIZE); - sector_t sector = offset >> 9; struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree; ASSERT(bio_ret); if (*bio_ret) { - bool contig; - bool can_merge = true; - bio = *bio_ret; - if (prev_bio_flags & EXTENT_BIO_COMPRESSED) - contig = bio->bi_iter.bi_sector == sector; - else - contig = bio_end_sector(bio) == sector; - - if (btrfs_bio_fits_in_stripe(page, page_size, bio, bio_flags)) - can_merge = false; - - if (prev_bio_flags != bio_flags || !contig || !can_merge || - force_bio_submit || - bio_add_page(bio, page, page_size, pg_offset) < page_size) { + if (force_bio_submit || + !btrfs_bio_add_page(bio, page, offset, page_size, pg_offset, + prev_bio_flags, bio_flags)) { ret = submit_one_bio(bio, mirror_num, prev_bio_flags); if (ret < 0) { *bio_ret = NULL; From patchwork Tue Nov 10 11:26:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894047 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota Subject: [PATCH v10 21/41] btrfs: use bio_add_zone_append_page for zoned btrfs Date: Tue, 10 Nov 2020 20:26:24 +0900 Message-Id: <190a0c66e8debf9017e91ceccf05320451b4529e.1605007036.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org A zoned device has its own hardware restrictions, e.g. max_zone_append_size when using REQ_OP_ZONE_APPEND. To follow these restrictions, use bio_add_zone_append_page() instead of bio_add_page(). We need the target device to use bio_add_zone_append_page(), so this commit reads the chunk information to memoize the target device in btrfs_io_bio(bio)->device. Currently, zoned btrfs only supports the SINGLE profile. 
In the future, btrfs_io_bio can hold an extent_map and check the restrictions for all the devices the bio will be mapped to. Signed-off-by: Naohiro Aota --- fs/btrfs/extent_io.c | 30 +++++++++++++++++++++++++++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 868ae0874a34..b9b366f4d942 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3033,6 +3033,7 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page, u64 logical, { sector_t sector = logical >> SECTOR_SHIFT; bool contig; + int ret; if (prev_bio_flags != bio_flags) return false; @@ -3047,7 +3048,12 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page, u64 logical, if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags)) return false; - return bio_add_page(bio, page, size, pg_offset) == size; + if (bio_op(bio) == REQ_OP_ZONE_APPEND) + ret = bio_add_zone_append_page(bio, page, size, pg_offset); + else + ret = bio_add_page(bio, page, size, pg_offset); + + return ret == size; } /* @@ -3078,7 +3084,9 @@ static int submit_extent_page(unsigned int opf, int ret = 0; struct bio *bio; size_t page_size = min_t(size_t, size, PAGE_SIZE); - struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree; + struct btrfs_inode *inode = BTRFS_I(page->mapping->host); + struct extent_io_tree *tree = &inode->io_tree; + struct btrfs_fs_info *fs_info = inode->root->fs_info; ASSERT(bio_ret); @@ -3109,11 +3117,27 @@ static int submit_extent_page(unsigned int opf, if (wbc) { struct block_device *bdev; - bdev = BTRFS_I(page->mapping->host)->root->fs_info->fs_devices->latest_bdev; + bdev = fs_info->fs_devices->latest_bdev; bio_set_dev(bio, bdev); wbc_init_bio(wbc, bio); wbc_account_cgroup_owner(wbc, page, page_size); } + if (btrfs_is_zoned(fs_info) && + bio_op(bio) == REQ_OP_ZONE_APPEND) { + struct extent_map *em; + struct map_lookup *map; + + em = btrfs_get_chunk_map(fs_info, offset, page_size); + if (IS_ERR(em)) + return 
PTR_ERR(em); + + map = em->map_lookup; + /* We only support SINGLE profile for now */ + ASSERT(map->num_stripes == 1); + btrfs_io_bio(bio)->device = map->stripes[0].dev; + + free_extent_map(em); + } *bio_ret = bio; From patchwork Tue Nov 10 11:26:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894037 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. 
Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v10 22/41] btrfs: handle REQ_OP_ZONE_APPEND as writing Date: Tue, 10 Nov 2020 20:26:25 +0900 Message-Id: <08be135868502b85ae7612df919624e26a8c0b2d.1605007036.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org ZONED btrfs uses REQ_OP_ZONE_APPEND bios for writing to actual devices. Let btrfs_end_bio() and btrfs_op be aware of it. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 4 ++-- fs/btrfs/inode.c | 10 +++++----- fs/btrfs/volumes.c | 8 ++++---- fs/btrfs/volumes.h | 1 + 4 files changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c0180fbd5c78..8acf1ed75889 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -652,7 +652,7 @@ static void end_workqueue_bio(struct bio *bio) fs_info = end_io_wq->info; end_io_wq->status = bio->bi_status; - if (bio_op(bio) == REQ_OP_WRITE) { + if (btrfs_op(bio) == BTRFS_MAP_WRITE) { if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA) wq = fs_info->endio_meta_write_workers; else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE) @@ -827,7 +827,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio, int async = check_async_write(fs_info, BTRFS_I(inode)); blk_status_t ret; - if (bio_op(bio) != REQ_OP_WRITE) { + if (btrfs_op(bio) != BTRFS_MAP_WRITE) { /* * called for a read, do the setup so that checksum validation * can happen in the async kernel threads diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 936c3137c646..591ca539e444 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2192,7 +2192,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio, if (btrfs_is_free_space_inode(BTRFS_I(inode))) metadata = BTRFS_WQ_ENDIO_FREE_SPACE; - if (bio_op(bio) != REQ_OP_WRITE) { + if (btrfs_op(bio) != BTRFS_MAP_WRITE) { ret = 
btrfs_bio_wq_end_io(fs_info, bio, metadata); if (ret) goto out; @@ -7526,7 +7526,7 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip) if (!refcount_dec_and_test(&dip->refs)) return; - if (bio_op(dip->dio_bio) == REQ_OP_WRITE) { + if (btrfs_op(dip->dio_bio) == BTRFS_MAP_WRITE) { __endio_write_update_ordered(BTRFS_I(dip->inode), dip->logical_offset, dip->bytes, @@ -7695,7 +7695,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct btrfs_dio_private *dip = bio->bi_private; - bool write = bio_op(bio) == REQ_OP_WRITE; + bool write = btrfs_op(bio) == BTRFS_MAP_WRITE; blk_status_t ret; /* Check btrfs_submit_bio_hook() for rules about async submit. */ @@ -7746,7 +7746,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio, struct inode *inode, loff_t file_offset) { - const bool write = (bio_op(dio_bio) == REQ_OP_WRITE); + const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE); const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM); size_t dip_size; struct btrfs_dio_private *dip; @@ -7777,7 +7777,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio, static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, struct bio *dio_bio, loff_t file_offset) { - const bool write = (bio_op(dio_bio) == REQ_OP_WRITE); + const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE); const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); const bool raid56 = (btrfs_data_alloc_profile(fs_info) & diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c0e27c1e2559..683b3ed06226 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6451,7 +6451,7 @@ static void btrfs_end_bio(struct bio *bio) struct btrfs_device *dev = btrfs_io_bio(bio)->device; ASSERT(dev->bdev); - if (bio_op(bio) == REQ_OP_WRITE) + if (btrfs_op(bio) == 
BTRFS_MAP_WRITE) btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS); else if (!(bio->bi_opf & REQ_RAHEAD)) @@ -6564,10 +6564,10 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, atomic_set(&bbio->stripes_pending, bbio->num_stripes); if ((bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) && - ((bio_op(bio) == REQ_OP_WRITE) || (mirror_num > 1))) { + ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) { /* In this case, map_length has been set to the length of a single stripe; not the whole write */ - if (bio_op(bio) == REQ_OP_WRITE) { + if (btrfs_op(bio) == BTRFS_MAP_WRITE) { ret = raid56_parity_write(fs_info, bio, bbio, map_length); } else { @@ -6590,7 +6590,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, dev = bbio->stripes[dev_nr].dev; if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) || - (bio_op(first_bio) == REQ_OP_WRITE && + (btrfs_op(first_bio) == BTRFS_MAP_WRITE && !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) { bbio_error(bbio, first_bio, logical); continue; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 0249aca668fb..cff1f7689eac 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -410,6 +410,7 @@ static inline enum btrfs_map_op btrfs_op(struct bio *bio) case REQ_OP_DISCARD: return BTRFS_MAP_DISCARD; case REQ_OP_WRITE: + case REQ_OP_ZONE_APPEND: return BTRFS_MAP_WRITE; default: WARN_ON_ONCE(1); From patchwork Tue Nov 10 11:26:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894039 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota Subject: [PATCH v10 23/41] btrfs: split ordered extent when bio is sent Date: Tue, 10 Nov 2020 20:26:26 +0900 Message-Id: <4c6d82729e000c4552fceae4a64b2a869c93eb8c.1605007036.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org For a zone append write, the device decides the location the data is written to. Therefore we cannot ensure that two bios are written consecutively on the device. In order to ensure that an ordered extent maps to a contiguous region on disk, we need to maintain the "one bio == one ordered extent" rule. This commit implements the splitting of an ordered extent and extent map on bio submission to adhere to the rule. 
Signed-off-by: Naohiro Aota Reported-by: kernel test robot Reported-by: kernel test robot Reported-by: kernel test robot --- fs/btrfs/inode.c | 89 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ordered-data.c | 76 +++++++++++++++++++++++++++++++++++ fs/btrfs/ordered-data.h | 2 + 3 files changed, 167 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 591ca539e444..df85d8dea37c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2158,6 +2158,86 @@ static blk_status_t btrfs_submit_bio_start(void *private_data, struct bio *bio, return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0); } +int extract_ordered_extent(struct inode *inode, struct bio *bio, + loff_t file_offset) +{ + struct btrfs_ordered_extent *ordered; + struct extent_map *em = NULL, *em_new = NULL; + struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree; + u64 start = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT; + u64 len = bio->bi_iter.bi_size; + u64 end = start + len; + u64 ordered_end; + u64 pre, post; + int ret = 0; + + ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset); + if (WARN_ON_ONCE(!ordered)) + return -EIO; + + /* No need to split */ + if (ordered->disk_num_bytes == len) + goto out; + + /* We cannot split once end_bio'd ordered extent */ + if (WARN_ON_ONCE(ordered->bytes_left != ordered->disk_num_bytes)) { + ret = -EINVAL; + goto out; + } + + /* We cannot split a compressed ordered extent */ + if (WARN_ON_ONCE(ordered->disk_num_bytes != ordered->num_bytes)) { + ret = -EINVAL; + goto out; + } + + /* We cannot split a waited ordered extent */ + if (WARN_ON_ONCE(wq_has_sleeper(&ordered->wait))) { + ret = -EINVAL; + goto out; + } + + ordered_end = ordered->disk_bytenr + ordered->disk_num_bytes; + /* bio must be in one ordered extent */ + if (WARN_ON_ONCE(start < ordered->disk_bytenr || end > ordered_end)) { + ret = -EINVAL; + goto out; + } + + /* Checksum list should be empty */ + if (WARN_ON_ONCE(!list_empty(&ordered->list))) { + ret = -EINVAL; + 
goto out; + } + + pre = start - ordered->disk_bytenr; + post = ordered_end - end; + + btrfs_split_ordered_extent(ordered, pre, post); + + read_lock(&em_tree->lock); + em = lookup_extent_mapping(em_tree, ordered->file_offset, len); + if (!em) { + read_unlock(&em_tree->lock); + ret = -EIO; + goto out; + } + read_unlock(&em_tree->lock); + + ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)); + em_new = create_io_em(BTRFS_I(inode), em->start + pre, len, + em->start + pre, em->block_start + pre, len, + len, len, BTRFS_COMPRESS_NONE, + BTRFS_ORDERED_REGULAR); + free_extent_map(em_new); + +out: + free_extent_map(em); + btrfs_put_ordered_extent(ordered); + + return ret; +} + /* * extent_io.c submission hook. This does the right thing for csum calculation * on write, or reading the csums from the tree before a read. @@ -2192,6 +2272,15 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio, if (btrfs_is_free_space_inode(BTRFS_I(inode))) metadata = BTRFS_WQ_ENDIO_FREE_SPACE; + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + struct page *page = bio_first_bvec_all(bio)->bv_page; + loff_t file_offset = page_offset(page); + + ret = extract_ordered_extent(inode, bio, file_offset); + if (ret) + goto out; + } + if (btrfs_op(bio) != BTRFS_MAP_WRITE) { ret = btrfs_bio_wq_end_io(fs_info, bio, metadata); if (ret) diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 87bac9ecdf4c..35ef25e39561 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -943,6 +943,82 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start, } } +static void clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos, + u64 len) +{ + struct inode *inode = ordered->inode; + u64 file_offset = ordered->file_offset + pos; + u64 disk_bytenr = ordered->disk_bytenr + pos; + u64 num_bytes = len; + u64 disk_num_bytes = len; + int type; + unsigned long flags_masked = + ordered->flags & ~(1 << BTRFS_ORDERED_DIRECT); + int compress_type = 
ordered->compress_type; + unsigned long weight; + + weight = hweight_long(flags_masked); + WARN_ON_ONCE(weight > 1); + if (!weight) + type = 0; + else + type = __ffs(flags_masked); + + if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered->flags)) { + WARN_ON_ONCE(1); + btrfs_add_ordered_extent_compress(BTRFS_I(inode), file_offset, + disk_bytenr, num_bytes, + disk_num_bytes, type, + compress_type); + } else if (test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) { + btrfs_add_ordered_extent_dio(BTRFS_I(inode), file_offset, + disk_bytenr, num_bytes, + disk_num_bytes, type); + } else { + btrfs_add_ordered_extent(BTRFS_I(inode), file_offset, + disk_bytenr, num_bytes, disk_num_bytes, + type); + } +} + +void btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre, + u64 post) +{ + struct inode *inode = ordered->inode; + struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree; + struct rb_node *node; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + + spin_lock_irq(&tree->lock); + /* Remove from tree once */ + node = &ordered->rb_node; + rb_erase(node, &tree->tree); + RB_CLEAR_NODE(node); + if (tree->last == node) + tree->last = NULL; + + ordered->file_offset += pre; + ordered->disk_bytenr += pre; + ordered->num_bytes -= (pre + post); + ordered->disk_num_bytes -= (pre + post); + ordered->bytes_left -= (pre + post); + + /* Re-insert the node */ + node = tree_insert(&tree->tree, ordered->file_offset, + &ordered->rb_node); + if (node) + btrfs_panic(fs_info, -EEXIST, + "zoned: inconsistency in ordered tree at offset %llu", + ordered->file_offset); + + spin_unlock_irq(&tree->lock); + + if (pre) + clone_ordered_extent(ordered, 0, pre); + if (post) + clone_ordered_extent(ordered, pre + ordered->disk_num_bytes, post); +} + int __init ordered_data_init(void) { btrfs_ordered_extent_cache = kmem_cache_create("btrfs_ordered_extent", diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index c3a2325e64a4..e346b03bd66a 100644 --- 
a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -193,6 +193,8 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr, void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start, u64 end, struct extent_state **cached_state); +void btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre, + u64 post); int __init ordered_data_init(void); void __cold ordered_data_exit(void); From patchwork Tue Nov 10 11:26:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894035 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. 
Wong" , Naohiro Aota Subject: [PATCH v10 24/41] btrfs: extend btrfs_rmap_block for specifying a device Date: Tue, 10 Nov 2020 20:26:27 +0900 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org btrfs_rmap_block currently reverse-maps the physical addresses on all devices to the corresponding logical addresses. This commit extends the function to match a specified device. The old functionality of querying all devices is left intact by specifying NULL as the target device. We pass block_device instead of btrfs_device to __btrfs_rmap_block. This function is intended to reverse-map the result of a bio, which only has a block_device. This commit also exports the function for later use. Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 20 ++++++++++++++------ fs/btrfs/block-group.h | 8 +++----- fs/btrfs/tests/extent-map-tests.c | 2 +- 3 files changed, 18 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 31511e59ca74..04bb0602f1cc 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1646,8 +1646,11 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags) } /** - * btrfs_rmap_block - Map a physical disk address to a list of logical addresses + * btrfs_rmap_block - Map a physical disk address to a list of logical + * addresses * @chunk_start: logical address of block group + * @bdev: physical device to resolve. Can be NULL to indicate any + * device. * @physical: physical address to map to logical addresses * @logical: return array of logical addresses which map to @physical * @naddrs: length of @logical @@ -1657,9 +1660,9 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags) * Used primarily to exclude those portions of a block group that contain super * block copies. 
*/ -EXPORT_FOR_TESTS int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, - u64 physical, u64 **logical, int *naddrs, int *stripe_len) + struct block_device *bdev, u64 physical, u64 **logical, + int *naddrs, int *stripe_len) { struct extent_map *em; struct map_lookup *map; @@ -1677,6 +1680,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, map = em->map_lookup; data_stripe_length = em->orig_block_len; io_stripe_size = map->stripe_len; + chunk_start = em->start; /* For RAID5/6 adjust to a full IO stripe length */ if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) @@ -1691,14 +1695,18 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, for (i = 0; i < map->num_stripes; i++) { bool already_inserted = false; u64 stripe_nr; + u64 offset; int j; if (!in_range(physical, map->stripes[i].physical, data_stripe_length)) continue; + if (bdev && map->stripes[i].dev->bdev != bdev) + continue; + stripe_nr = physical - map->stripes[i].physical; - stripe_nr = div64_u64(stripe_nr, map->stripe_len); + stripe_nr = div64_u64_rem(stripe_nr, map->stripe_len, &offset); if (map->type & BTRFS_BLOCK_GROUP_RAID10) { stripe_nr = stripe_nr * map->num_stripes + i; @@ -1712,7 +1720,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, * instead of map->stripe_len */ - bytenr = chunk_start + stripe_nr * io_stripe_size; + bytenr = chunk_start + stripe_nr * io_stripe_size + offset; /* Ensure we don't add duplicate addresses */ for (j = 0; j < nr; j++) { @@ -1754,7 +1762,7 @@ static int exclude_super_stripes(struct btrfs_block_group *cache) for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { bytenr = btrfs_sb_offset(i); - ret = btrfs_rmap_block(fs_info, cache->start, + ret = btrfs_rmap_block(fs_info, cache->start, NULL, bytenr, &logical, &nr, &stripe_len); if (ret) return ret; diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 5be47f4bfea7..9a4009eaaecb 100644 --- a/fs/btrfs/block-group.h +++ 
b/fs/btrfs/block-group.h @@ -275,6 +275,9 @@ void check_system_chunk(struct btrfs_trans_handle *trans, const u64 type); u64 btrfs_get_alloc_profile(struct btrfs_fs_info *fs_info, u64 orig_flags); void btrfs_put_block_group_cache(struct btrfs_fs_info *info); int btrfs_free_block_groups(struct btrfs_fs_info *info); +int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, + struct block_device *bdev, u64 physical, u64 **logical, + int *naddrs, int *stripe_len); static inline u64 btrfs_data_alloc_profile(struct btrfs_fs_info *fs_info) { @@ -301,9 +304,4 @@ static inline int btrfs_block_group_done(struct btrfs_block_group *cache) void btrfs_freeze_block_group(struct btrfs_block_group *cache); void btrfs_unfreeze_block_group(struct btrfs_block_group *cache); -#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS -int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, - u64 physical, u64 **logical, int *naddrs, int *stripe_len); -#endif - #endif /* BTRFS_BLOCK_GROUP_H */ diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c index 57379e96ccc9..c0aefe6dee0b 100644 --- a/fs/btrfs/tests/extent-map-tests.c +++ b/fs/btrfs/tests/extent-map-tests.c @@ -507,7 +507,7 @@ static int test_rmap_block(struct btrfs_fs_info *fs_info, goto out_free; } - ret = btrfs_rmap_block(fs_info, em->start, btrfs_sb_offset(1), + ret = btrfs_rmap_block(fs_info, em->start, NULL, btrfs_sb_offset(1), &logical, &out_ndaddrs, &out_stripe_len); if (ret || (out_ndaddrs == 0 && test->expected_mapped_addr)) { test_err("didn't rmap anything but expected %d", From patchwork Tue Nov 10 11:26:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894045 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik , Johannes Thumshirn Subject: [PATCH v10 25/41] btrfs: use ZONE_APPEND write for ZONED btrfs Date: Tue, 10 Nov 2020 20:26:28 +0900 Message-Id: <3ee842f2daea5bee76e62997b843dd5a49183481.1605007036.git.naohiro.aota@wdc.com> This commit enables zone append writing for zoned btrfs. When using zone append, a bio is issued to the start of a target zone and the device decides where to place it inside the zone. Upon completion the device reports the actual written position back to the host. Three parts are necessary to enable zone append in btrfs. First, modify the bio to use REQ_OP_ZONE_APPEND in btrfs_submit_bio_hook() and adjust the bi_sector to point to the beginning of the zone. 
Second, record the returned physical address (and disk/partno) in the ordered extent in end_bio_extent_writepage() after the bio has been completed. We cannot resolve the physical address to the logical address because we can neither take locks nor allocate a buffer in this end_bio context. So, we need to record the physical address to resolve it later in btrfs_finish_ordered_io(). Finally, rewrite the logical addresses of the extent mapping and checksum data according to the physical address (using __btrfs_rmap_block). If the returned address matches the originally allocated address, we can skip this rewriting process. Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota --- fs/btrfs/extent_io.c | 12 +++++++- fs/btrfs/file.c | 2 +- fs/btrfs/inode.c | 4 +++ fs/btrfs/ordered-data.c | 3 ++ fs/btrfs/ordered-data.h | 8 +++++ fs/btrfs/volumes.c | 15 +++++++++ fs/btrfs/zoned.c | 68 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 11 +++++++ 8 files changed, 121 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index b9b366f4d942..7f94fef3647b 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2743,6 +2743,7 @@ static void end_bio_extent_writepage(struct bio *bio) u64 start; u64 end; struct bvec_iter_all iter_all; + bool first_bvec = true; ASSERT(!bio_flagged(bio, BIO_CLONED)); bio_for_each_segment_all(bvec, bio, iter_all) { @@ -2769,6 +2770,11 @@ static void end_bio_extent_writepage(struct bio *bio) start = page_offset(page); end = start + bvec->bv_offset + bvec->bv_len - 1; + if (first_bvec) { + btrfs_record_physical_zoned(inode, start, bio); + first_bvec = false; + } + end_extent_writepage(page, error, start, end); end_page_writeback(page); } @@ -3525,6 +3531,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, size_t blocksize; int ret = 0; int nr = 0; + int opf = REQ_OP_WRITE; const unsigned int write_flags = wbc_to_write_flags(wbc); 
bool compressed; @@ -3537,6 +3544,9 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, return 1; } + if (btrfs_is_zoned(inode->root->fs_info)) + opf = REQ_OP_ZONE_APPEND; + /* * we don't want to touch the inode after unlocking the page, * so we update the mapping writeback index now @@ -3597,7 +3607,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, page->index, cur, end); } - ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc, + ret = submit_extent_page(opf | write_flags, wbc, page, offset, iosize, pg_offset, &epd->bio, end_bio_extent_writepage, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 68938a43081e..bdc268c91334 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2226,7 +2226,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) * the current transaction commits before the ordered extents complete * and a power failure happens right after that. */ - if (full_sync) { + if (full_sync || btrfs_is_zoned(fs_info)) { ret = btrfs_wait_ordered_range(inode, start, len); } else { /* diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index df85d8dea37c..fe15441278de 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -51,6 +51,7 @@ #include "delalloc-space.h" #include "block-group.h" #include "space-info.h" +#include "zoned.h" struct btrfs_iget_args { u64 ino; @@ -2676,6 +2677,9 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) bool clear_reserved_extent = true; unsigned int clear_bits; + if (ordered_extent->disk) + btrfs_rewrite_logical_zoned(ordered_extent); + start = ordered_extent->file_offset; end = start + ordered_extent->num_bytes - 1; diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 35ef25e39561..1a3b06713d0f 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -199,6 +199,9 @@ static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset entry->compress_type = 
compress_type; entry->truncated_len = (u64)-1; entry->qgroup_rsv = ret; + entry->physical = (u64)-1; + entry->disk = NULL; + entry->partno = (u8)-1; if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE) set_bit(type, &entry->flags); diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index e346b03bd66a..084c609afd83 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -127,6 +127,14 @@ struct btrfs_ordered_extent { struct completion completion; struct btrfs_work flush_work; struct list_head work_list; + + /* + * used to reverse-map physical address returned from ZONE_APPEND + * write command in a workqueue context. + */ + u64 physical; + struct gendisk *disk; + u8 partno; }; /* diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 683b3ed06226..c8187d704c89 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6503,6 +6503,21 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio, btrfs_io_bio(bio)->device = dev; bio->bi_end_io = btrfs_end_bio; bio->bi_iter.bi_sector = physical >> 9; + /* + * For zone append writing, bi_sector must point to the beginning of the + * zone + */ + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + if (btrfs_dev_is_sequential(dev, physical)) { + u64 zone_start = round_down(physical, + fs_info->zone_size); + + bio->bi_iter.bi_sector = zone_start >> SECTOR_SHIFT; + } else { + bio->bi_opf &= ~REQ_OP_ZONE_APPEND; + bio->bi_opf |= REQ_OP_WRITE; + } + } btrfs_debug_in_rcu(fs_info, "btrfs_map_bio: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u", bio_op(bio), bio->bi_opf, (u64)bio->bi_iter.bi_sector, diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index b56bfeaf8744..f38bd0200788 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1060,3 +1060,71 @@ void btrfs_free_redirty_list(struct btrfs_transaction *trans) } spin_unlock(&trans->releasing_ebs_lock); } + +void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, + struct bio *bio) +{ + struct 
btrfs_ordered_extent *ordered; + u64 physical = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT; + + if (bio_op(bio) != REQ_OP_ZONE_APPEND) + return; + + ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset); + if (WARN_ON(!ordered)) + return; + + ordered->physical = physical; + ordered->disk = bio->bi_disk; + ordered->partno = bio->bi_partno; + + btrfs_put_ordered_extent(ordered); +} + +void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered) +{ + struct extent_map_tree *em_tree; + struct extent_map *em; + struct inode *inode = ordered->inode; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_ordered_sum *sum; + struct block_device *bdev; + u64 orig_logical = ordered->disk_bytenr; + u64 *logical = NULL; + int nr, stripe_len; + + bdev = bdget_disk(ordered->disk, ordered->partno); + if (WARN_ON(!bdev)) + return; + + if (WARN_ON(btrfs_rmap_block(fs_info, orig_logical, bdev, + ordered->physical, &logical, &nr, + &stripe_len))) + goto out; + + WARN_ON(nr != 1); + + if (orig_logical == *logical) + goto out; + + ordered->disk_bytenr = *logical; + + em_tree = &BTRFS_I(inode)->extent_tree; + write_lock(&em_tree->lock); + em = search_extent_mapping(em_tree, ordered->file_offset, + ordered->num_bytes); + em->block_start = *logical; + free_extent_map(em); + write_unlock(&em_tree->lock); + + list_for_each_entry(sum, &ordered->list, list) { + if (*logical < orig_logical) + sum->bytenr -= orig_logical - *logical; + else + sum->bytenr += *logical - orig_logical; + } + +out: + kfree(logical); + bdput(bdev); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index a7de80c313be..2872a0cbc847 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -45,6 +45,9 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache); void btrfs_redirty_list_add(struct btrfs_transaction *trans, struct extent_buffer *eb); void btrfs_free_redirty_list(struct btrfs_transaction *trans); +void btrfs_record_physical_zoned(struct inode *inode, u64 
file_offset, + struct bio *bio); +void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -129,6 +132,14 @@ static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans, struct extent_buffer *eb) { } static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { } +static inline void btrfs_record_physical_zoned(struct inode *inode, + u64 file_offset, struct bio *bio) +{ +} + +static inline void btrfs_rewrite_logical_zoned( + struct btrfs_ordered_extent *ordered) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Tue Nov 10 11:26:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894049 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v10 26/41] btrfs: enable zone append writing for direct IO Date: Tue, 10 Nov 2020 20:26:29 +0900 Message-Id: <38ffdd3dad3415079c350e284006d51aced384d6.1605007037.git.naohiro.aota@wdc.com> As with buffered IO, enable zone append writing for direct IO when it is used on a zoned block device. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index fe15441278de..445cb6ba4a59 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7542,6 +7542,9 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, iomap->bdev = fs_info->fs_devices->latest_bdev; iomap->length = len; + if (write && btrfs_is_zoned(fs_info) && fs_info->max_zone_append_size) + iomap->flags |= IOMAP_F_ZONE_APPEND; + free_extent_map(em); return 0; @@ -7779,6 +7782,8 @@ static void btrfs_end_dio_bio(struct bio *bio) if (err) dip->dio_bio->bi_status = err; + btrfs_record_physical_zoned(dip->inode, dip->logical_offset, bio); + bio_put(bio); btrfs_dio_private_put(dip); } @@ -7933,6 +7938,18 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, bio->bi_end_io = btrfs_end_dio_bio; btrfs_io_bio(bio)->logical = file_offset; + WARN_ON_ONCE(write && btrfs_is_zoned(fs_info) && + fs_info->max_zone_append_size && + bio_op(bio) != REQ_OP_ZONE_APPEND); + + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + ret = extract_ordered_extent(inode, bio, file_offset); + if (ret) { + bio_put(bio); + goto out_err; + } + } + ASSERT(submit_len >= clone_len); submit_len -= clone_len; From patchwork Tue Nov 10 11:26:30 2020 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894055 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota Subject: [PATCH v10 27/41] btrfs: introduce dedicated data write path for ZONED mode Date: Tue, 10 Nov 2020 20:26:30 +0900 Message-Id: <446f278b547d02adf1e0fa564d7b6ac76c89b57f.1605007037.git.naohiro.aota@wdc.com> If more than one IO is issued for one file extent, these IOs can be written to separate regions on a device. Since we cannot map one file extent to such separate areas, we need to follow the "one IO == one ordered extent" rule. 
The normal buffered, uncompressed, non-pre-allocated write path (used by cow_file_range()) sometimes does not follow this rule. It can write only part of an ordered extent when a region to write is specified, e.g. when it is called from fdatasync(). This commit introduces a dedicated (uncompressed, buffered) data write path for ZONED mode, which CoWs the region and writes it at once. Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 34 ++++++++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 445cb6ba4a59..991ef2bf018f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1350,6 +1350,29 @@ static int cow_file_range_async(struct btrfs_inode *inode, return 0; } +static noinline int run_delalloc_zoned(struct btrfs_inode *inode, + struct page *locked_page, u64 start, + u64 end, int *page_started, + unsigned long *nr_written) +{ + int ret; + + ret = cow_file_range(inode, locked_page, start, end, + page_started, nr_written, 0); + if (ret) + return ret; + + if (*page_started) + return 0; + + __set_page_dirty_nobuffers(locked_page); + account_page_redirty(locked_page); + extent_write_locked_range(&inode->vfs_inode, start, end, WB_SYNC_ALL); + *page_started = 1; + + return 0; +} + static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes) { @@ -1820,17 +1843,24 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page { int ret; int force_cow = need_force_cow(inode, start, end); + const bool do_compress = inode_can_compress(inode) && + inode_need_compress(inode, start, end); + const bool zoned = btrfs_is_zoned(inode->root->fs_info); if (inode->flags & BTRFS_INODE_NODATACOW && !force_cow) { + ASSERT(!zoned); ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 1, nr_written); } else if (inode->flags & BTRFS_INODE_PREALLOC && !force_cow) { + ASSERT(!zoned); ret = run_delalloc_nocow(inode, locked_page, start, end, 
page_started, 0, nr_written); - } else if (!inode_can_compress(inode) || - !inode_need_compress(inode, start, end)) { + } else if (!do_compress && !zoned) { ret = cow_file_range(inode, locked_page, start, end, page_started, nr_written, 1); + } else if (!do_compress && zoned) { + ret = run_delalloc_zoned(inode, locked_page, start, end, + page_started, nr_written); } else { set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags); ret = cow_file_range_async(inode, wbc, locked_page, start, end, From patchwork Tue Nov 10 11:26:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894053 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v10 28/41] btrfs: serialize meta IOs on ZONED mode Date: Tue, 10 Nov 2020 20:26:31 +0900 Message-Id: <4a7c166980699b0316fc10a8a496ddc68d9297be.1605007037.git.naohiro.aota@wdc.com> We cannot use zone append for writing metadata, because the B-tree nodes have references to each other using the logical address. Without knowing the address in advance, we cannot construct the tree in the first place. So we need to serialize write IOs for metadata. We cannot add a mutex around allocation and submission because metadata blocks are allocated in an earlier stage to build up B-trees. Add a zoned_meta_io_lock and hold it during metadata IO submission in btree_write_cache_pages() to serialize IOs. Furthermore, this adds a per-block group metadata IO submission pointer, "meta_write_pointer", to ensure sequential writing, which can otherwise be broken when writing back blocks in an unfinished transaction. 
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.h |  1 +
 fs/btrfs/ctree.h       |  1 +
 fs/btrfs/disk-io.c     |  1 +
 fs/btrfs/extent_io.c   | 27 ++++++++++++++++++++++-
 fs/btrfs/zoned.c       | 50 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       | 32 +++++++++++++++++++++++++++
 6 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 9a4009eaaecb..44f68e12f863 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -190,6 +190,7 @@ struct btrfs_block_group {
 	 */
 	u64 alloc_offset;
 	u64 zone_unusable;
+	u64 meta_write_pointer;
 };
 
 static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c70d3fcc62c2..8138e932b7cc 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -956,6 +956,7 @@ struct btrfs_fs_info {
 	/* Max size to emit ZONE_APPEND write command */
 	u64 max_zone_append_size;
+	struct mutex zoned_meta_io_lock;
 
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8acf1ed75889..66f90ebfc01f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2652,6 +2652,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	mutex_init(&fs_info->delete_unused_bgs_mutex);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
+	mutex_init(&fs_info->zoned_meta_io_lock);
 	seqlock_init(&fs_info->profiles_lock);
 
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7f94fef3647b..d26c827f39c6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -25,6 +25,7 @@
 #include "backref.h"
 #include "disk-io.h"
 #include "zoned.h"
+#include "block-group.h"
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
@@ -3995,6 +3996,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 				   struct writeback_control *wbc)
 {
 	struct extent_buffer *eb, *prev_eb = NULL;
+	struct btrfs_block_group *cache = NULL;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.extent_locked = 0,
@@ -4029,6 +4031,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 		tag = PAGECACHE_TAG_TOWRITE;
 	else
 		tag = PAGECACHE_TAG_DIRTY;
+	btrfs_zoned_meta_io_lock(fs_info);
 retry:
 	if (wbc->sync_mode == WB_SYNC_ALL)
 		tag_pages_for_writeback(mapping, index, end);
@@ -4071,12 +4074,30 @@ int btree_write_cache_pages(struct address_space *mapping,
 			if (!ret)
 				continue;
 
+			if (!btrfs_check_meta_write_pointer(fs_info, eb,
+							    &cache)) {
+				/*
+				 * If for_sync, this hole will be filled with
+				 * transaction commit.
+				 */
+				if (wbc->sync_mode == WB_SYNC_ALL &&
+				    !wbc->for_sync)
+					ret = -EAGAIN;
+				else
+					ret = 0;
+				done = 1;
+				free_extent_buffer(eb);
+				break;
+			}
+
 			prev_eb = eb;
 			ret = lock_extent_buffer_for_io(eb, &epd);
 			if (!ret) {
+				btrfs_revert_meta_write_pointer(cache, eb);
 				free_extent_buffer(eb);
 				continue;
 			} else if (ret < 0) {
+				btrfs_revert_meta_write_pointer(cache, eb);
 				done = 1;
 				free_extent_buffer(eb);
 				break;
@@ -4109,10 +4130,12 @@ int btree_write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
+	if (cache)
+		btrfs_put_block_group(cache);
 	ASSERT(ret <= 0);
 	if (ret < 0) {
 		end_write_bio(&epd, ret);
-		return ret;
+		goto out;
 	}
 	/*
 	 * If something went wrong, don't allow any metadata write bio to be
@@ -4147,6 +4170,8 @@ int btree_write_cache_pages(struct address_space *mapping,
 		ret = -EROFS;
 		end_write_bio(&epd, ret);
 	}
+out:
+	btrfs_zoned_meta_io_unlock(fs_info);
 	return ret;
 }
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index f38bd0200788..d345c07f5fdf 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -995,6 +995,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache)
 		ret = -EIO;
 	}
 
+	if (!ret)
+		cache->meta_write_pointer = cache->alloc_offset + cache->start;
+
 	kfree(alloc_offsets);
 	free_extent_map(em);
 
@@ -1128,3 +1131,50 @@ void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered)
 	kfree(logical);
 	bdput(bdev);
 }
+
+bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
+				    struct extent_buffer *eb,
+				    struct btrfs_block_group **cache_ret)
+{
+	struct btrfs_block_group *cache;
+	bool ret = true;
+
+	if (!btrfs_is_zoned(fs_info))
+		return true;
+
+	cache = *cache_ret;
+
+	if (cache && (eb->start < cache->start ||
+		      cache->start + cache->length <= eb->start)) {
+		btrfs_put_block_group(cache);
+		cache = NULL;
+		*cache_ret = NULL;
+	}
+
+	if (!cache)
+		cache = btrfs_lookup_block_group(fs_info, eb->start);
+
+	if (cache) {
+		if (cache->meta_write_pointer != eb->start) {
+			btrfs_put_block_group(cache);
+			cache = NULL;
+			ret = false;
+		} else {
+			cache->meta_write_pointer = eb->start + eb->len;
+		}
+
+		*cache_ret = cache;
+	}
+
+	return ret;
+}
+
+void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache,
+				     struct extent_buffer *eb)
+{
+	if (!btrfs_is_zoned(eb->fs_info) || !cache)
+		return;
+
+	ASSERT(cache->meta_write_pointer == eb->start + eb->len);
+	cache->meta_write_pointer = eb->start;
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 2872a0cbc847..41d786a97e40 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -48,6 +48,11 @@ void btrfs_free_redirty_list(struct btrfs_transaction *trans);
 void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset,
 				 struct bio *bio);
 void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered);
+bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
+				    struct extent_buffer *eb,
+				    struct btrfs_block_group **cache_ret);
+void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache,
+				     struct extent_buffer *eb);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -140,6 +145,19 @@ static inline void btrfs_record_physical_zoned(struct inode *inode,
 static inline void btrfs_rewrite_logical_zoned(
 				struct btrfs_ordered_extent *ordered)
 {
 }
+
+static inline bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
+				struct extent_buffer *eb,
+				struct btrfs_block_group **cache_ret)
+{
+	return true;
+}
+
+static inline void btrfs_revert_meta_write_pointer(
+						struct btrfs_block_group *cache,
+						struct extent_buffer *eb)
+{
+}
+
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
@@ -243,4 +261,18 @@ static inline bool btrfs_can_zone_reset(struct btrfs_device *device,
 	return true;
 }
 
+static inline void btrfs_zoned_meta_io_lock(struct btrfs_fs_info *fs_info)
+{
+	if (!btrfs_is_zoned(fs_info))
+		return;
+	mutex_lock(&fs_info->zoned_meta_io_lock);
+}
+
+static inline void btrfs_zoned_meta_io_unlock(struct btrfs_fs_info *fs_info)
+{
+	if (!btrfs_is_zoned(fs_info))
+		return;
+	mutex_unlock(&fs_info->zoned_meta_io_lock);
+}
+
 #endif

From patchwork Tue Nov 10 11:26:32 2020
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v10 29/41] btrfs: wait existing extents before truncating
Date: Tue, 10 Nov 2020 20:26:32 +0900
Message-Id: <47aba0c9cc8686ad4807de22c555f4d13db5dc9a.1605007037.git.naohiro.aota@wdc.com>
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

When truncating a file, file buffers which have already been allocated
but not yet written may be truncated. Truncating these buffers could
break the sequential write pattern in a block group if the truncated
blocks are, for example, followed by blocks allocated to another file.

To avoid this problem, always wait for write out of all unwritten
buffers before proceeding with the truncate execution.
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/inode.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 991ef2bf018f..992aa963592d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4955,6 +4955,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		btrfs_drew_write_unlock(&root->snapshot_lock);
 		btrfs_end_transaction(trans);
 	} else {
+		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+
+		if (btrfs_is_zoned(fs_info)) {
+			ret = btrfs_wait_ordered_range(
+					inode,
+					ALIGN(newsize, fs_info->sectorsize),
+					(u64)-1);
+			if (ret)
+				return ret;
+		}
 
 		/*
 		 * We're truncating a file that used to have good data down to

From patchwork Tue Nov 10 11:26:33 2020
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v10 30/41] btrfs: avoid async metadata checksum on ZONED mode
Date: Tue, 10 Nov 2020 20:26:33 +0900
Message-Id: <88b50c919c7ee1e85c97430a9d53b68610c32fa8.1605007037.git.naohiro.aota@wdc.com>
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

In ZONED mode, btrfs uses the per-FS zoned_meta_io_lock to serialize
metadata write IOs. Even with this serialization, write bios sent from
btree_write_cache_pages can be reordered by the async checksum workers,
because these workers are per-CPU, not per-zone.

To preserve write bio ordering, disable async metadata checksum on
ZONED mode. This does not lower performance with HDDs, as a single CPU
core is fast enough to checksum a single zone write stream at the
maximum possible bandwidth of the device. If multiple zones are being
written simultaneously, HDD seek overhead lowers the achievable maximum
bandwidth, so again the per-zone checksum serialization does not affect
performance.
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 66f90ebfc01f..9490dbbbdb2a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -813,6 +813,8 @@ static blk_status_t btree_submit_bio_start(void *private_data, struct bio *bio,
 static int check_async_write(struct btrfs_fs_info *fs_info,
 			     struct btrfs_inode *bi)
 {
+	if (btrfs_is_zoned(fs_info))
+		return 0;
 	if (atomic_read(&bi->sync_writers))
 		return 0;
 	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))

From patchwork Tue Nov 10 11:26:34 2020
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v10 31/41] btrfs: mark block groups to copy for device-replace
Date: Tue, 10 Nov 2020 20:26:34 +0900
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

This is the first of four patches to support device-replace in ZONED
mode.

We have two types of I/Os during the device-replace process. One is an
I/O to "copy" (by the scrub functions) all the device extents on the
source device to the destination device. The other is an I/O to "clone"
(by handle_ops_on_dev_replace()) new incoming write I/Os from users to
the source device into the target device.

Cloning incoming I/Os can break the sequential write rule on the target
device: when a write is mapped in the middle of a block group, the I/O
lands in the middle of a target device zone, which breaks the sequential
write rule.

However, the cloning function cannot simply be disabled, since incoming
I/Os targeting already-copied device extents must be cloned so that the
I/O is executed on the target device.

We cannot use dev_replace->cursor_{left,right} to determine whether a
bio targets a not-yet-copied region. Since there is a time gap between
finishing btrfs_scrub_dev() and rewriting the mapping tree in
btrfs_dev_replace_finishing(), we can have a newly allocated device
extent which is never cloned nor copied. So the point is to copy only
the already existing device extents.

This patch introduces mark_block_group_to_copy() to mark existing block
groups as targets of copying. Then, handle_ops_on_dev_replace() and
dev-replace can check the flag to do their jobs.
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.h |   1 +
 fs/btrfs/dev-replace.c | 183 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/dev-replace.h |   3 +
 fs/btrfs/scrub.c       |  17 ++++
 4 files changed, 204 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 44f68e12f863..ccbcf37eae9c 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -95,6 +95,7 @@ struct btrfs_block_group {
 	unsigned int iref:1;
 	unsigned int has_caching_ctl:1;
 	unsigned int removed:1;
+	unsigned int to_copy:1;
 
 	int disk_cache_state;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index db87f1aa604b..95e75fc8e266 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -22,6 +22,7 @@
 #include "dev-replace.h"
 #include "sysfs.h"
 #include "zoned.h"
+#include "block-group.h"
 
 /*
  * Device replace overview
@@ -437,6 +438,184 @@ static char* btrfs_dev_name(struct btrfs_device *device)
 	return rcu_str_deref(device->name);
 }
 
+static int mark_block_group_to_copy(struct btrfs_fs_info *fs_info,
+				    struct btrfs_device *src_dev)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_root *root = fs_info->dev_root;
+	struct btrfs_dev_extent *dev_extent = NULL;
+	struct btrfs_block_group *cache;
+	struct btrfs_trans_handle *trans;
+	int ret = 0;
+	u64 chunk_offset, length;
+
+	/* Do not use "to_copy" on non-ZONED for now */
+	if (!btrfs_is_zoned(fs_info))
+		return 0;
+
+	mutex_lock(&fs_info->chunk_mutex);
+
+	/* Ensure we don't have pending new block group */
+	spin_lock(&fs_info->trans_lock);
+	while (fs_info->running_transaction &&
+	       !list_empty(&fs_info->running_transaction->dev_update_list)) {
+		spin_unlock(&fs_info->trans_lock);
+		mutex_unlock(&fs_info->chunk_mutex);
+		trans = btrfs_attach_transaction(root);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+			mutex_lock(&fs_info->chunk_mutex);
+			if (ret == -ENOENT)
+				continue;
+			else
+				goto unlock;
+		}
+
+		ret = btrfs_commit_transaction(trans);
+		mutex_lock(&fs_info->chunk_mutex);
+		if (ret)
+			goto unlock;
+
+		spin_lock(&fs_info->trans_lock);
+	}
+	spin_unlock(&fs_info->trans_lock);
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	path->reada = READA_FORWARD;
+	path->search_commit_root = 1;
+	path->skip_locking = 1;
+
+	key.objectid = src_dev->devid;
+	key.offset = 0;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto free_path;
+	if (ret > 0) {
+		if (path->slots[0] >=
+		    btrfs_header_nritems(path->nodes[0])) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret < 0)
+				goto free_path;
+			if (ret > 0) {
+				ret = 0;
+				goto free_path;
+			}
+		} else {
+			ret = 0;
+		}
+	}
+
+	while (1) {
+		struct extent_buffer *l = path->nodes[0];
+		int slot = path->slots[0];
+
+		btrfs_item_key_to_cpu(l, &found_key, slot);
+
+		if (found_key.objectid != src_dev->devid)
+			break;
+
+		if (found_key.type != BTRFS_DEV_EXTENT_KEY)
+			break;
+
+		if (found_key.offset < key.offset)
+			break;
+
+		dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
+		length = btrfs_dev_extent_length(l, dev_extent);
+
+		chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);
+
+		cache = btrfs_lookup_block_group(fs_info, chunk_offset);
+		if (!cache)
+			goto skip;
+
+		spin_lock(&cache->lock);
+		cache->to_copy = 1;
+		spin_unlock(&cache->lock);
+
+		btrfs_put_block_group(cache);
+
+skip:
+		ret = btrfs_next_item(root, path);
+		if (ret != 0) {
+			if (ret > 0)
+				ret = 0;
+			break;
+		}
+	}
+
+free_path:
+	btrfs_free_path(path);
+unlock:
+	mutex_unlock(&fs_info->chunk_mutex);
+
+	return ret;
+}
+
+bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev,
+				      struct btrfs_block_group *cache,
+				      u64 physical)
+{
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	struct extent_map *em;
+	struct map_lookup *map;
+	u64 chunk_offset = cache->start;
+	int num_extents, cur_extent;
+	int i;
+
+	/* Do not use "to_copy" on non-ZONED for now */
+	if (!btrfs_is_zoned(fs_info))
+		return true;
+
+	spin_lock(&cache->lock);
+	if (cache->removed) {
+		spin_unlock(&cache->lock);
+		return true;
+	}
+	spin_unlock(&cache->lock);
+
+	em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
+	ASSERT(!IS_ERR(em));
+	map = em->map_lookup;
+
+	num_extents = cur_extent = 0;
+	for (i = 0; i < map->num_stripes; i++) {
+		/* We have more device extents to copy */
+		if (srcdev != map->stripes[i].dev)
+			continue;
+
+		num_extents++;
+		if (physical == map->stripes[i].physical)
+			cur_extent = i;
+	}
+
+	free_extent_map(em);
+
+	if (num_extents > 1 && cur_extent < num_extents - 1) {
+		/*
+		 * Has more stripes on this device. Keep this BG
+		 * readonly until we finish all the stripes.
+		 */
+		return false;
+	}
+
+	/* Last stripe on this device */
+	spin_lock(&cache->lock);
+	cache->to_copy = 0;
+	spin_unlock(&cache->lock);
+
+	return true;
+}
+
 static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 		const char *tgtdev_name, u64 srcdevid, const char *srcdev_name,
 		int read_src)
@@ -478,6 +657,10 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	if (ret)
 		return ret;
 
+	ret = mark_block_group_to_copy(fs_info, src_device);
+	if (ret)
+		return ret;
+
 	down_write(&dev_replace->rwsem);
 	switch (dev_replace->replace_state) {
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index 60b70dacc299..3911049a5f23 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -18,5 +18,8 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info);
 void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info);
 int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info);
 int __pure btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace);
+bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev,
+				      struct btrfs_block_group *cache,
+				      u64 physical);
 
 #endif
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index aa1b36cf5c88..d0d7db3c8b0b 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3500,6 +3500,17 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		if (!cache)
 			goto skip;
 
+		if (sctx->is_dev_replace && btrfs_fs_incompat(fs_info, ZONED)) {
+			spin_lock(&cache->lock);
+			if (!cache->to_copy) {
+				spin_unlock(&cache->lock);
+				ro_set = 0;
+				goto done;
+			}
+			spin_unlock(&cache->lock);
+		}
+
 		/*
 		 * Make sure that while we are scrubbing the corresponding block
 		 * group doesn't get its logical address and its device extents
@@ -3631,6 +3642,12 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
 		scrub_pause_off(fs_info);
 
+		if (sctx->is_dev_replace &&
+		    !btrfs_finish_block_group_to_copy(dev_replace->srcdev,
+						      cache, found_key.offset))
+			ro_set = 0;
+
+done:
 		down_write(&dev_replace->rwsem);
 		dev_replace->cursor_left = dev_replace->cursor_right;
 		dev_replace->item_needs_writeback = 1;

From patchwork Tue Nov 10 11:26:35 2020
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v10 32/41] btrfs: implement cloning for ZONED device-replace
Date: Tue, 10 Nov 2020 20:26:35 +0900
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

This is the second of four patches to implement device-replace for
ZONED mode.

In ZONED mode, a block group must be either copied (from the source
device to the destination device) or cloned (to both devices). This
commit implements the cloning part.

If a block group targeted by an IO is marked to copy, we should not
clone the IO to the destination device, because the block group is
eventually copied by the replace process.

This commit also handles cloning of device reset.
Signed-off-by: Naohiro Aota --- fs/btrfs/extent-tree.c | 57 +++++++++++++++++++++++++++++++----------- fs/btrfs/scrub.c | 2 +- fs/btrfs/volumes.c | 33 ++++++++++++++++++++++-- fs/btrfs/zoned.c | 11 ++++++++ 4 files changed, 85 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 99640dacf8e6..2ee21076b641 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -35,6 +35,7 @@ #include "discard.h" #include "rcu-string.h" #include "zoned.h" +#include "dev-replace.h" #undef SCRAMBLE_DELAYED_REFS @@ -1298,6 +1299,46 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len, return ret; } +static int do_discard_extent(struct btrfs_bio_stripe *stripe, u64 *bytes) +{ + struct btrfs_device *dev = stripe->dev; + struct btrfs_fs_info *fs_info = dev->fs_info; + struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace; + u64 phys = stripe->physical; + u64 len = stripe->length; + u64 discarded = 0; + int ret = 0; + + /* Zone reset in ZONED mode */ + if (btrfs_can_zone_reset(dev, phys, len)) { + u64 src_disc; + + ret = btrfs_reset_device_zone(dev, phys, len, &discarded); + if (ret) + goto out; + + if (!btrfs_dev_replace_is_ongoing(dev_replace) || + dev != dev_replace->srcdev) + goto out; + + src_disc = discarded; + + /* send to replace target as well */ + ret = btrfs_reset_device_zone(dev_replace->tgtdev, phys, len, + &discarded); + discarded += src_disc; + } else if (blk_queue_discard(bdev_get_queue(stripe->dev->bdev))) { + ret = btrfs_issue_discard(dev->bdev, phys, len, &discarded); + } else { + ret = 0; + *bytes = 0; + } + +out: + *bytes = discarded; + return ret; +} + int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes, u64 *actual_bytes) { @@ -1331,28 +1372,14 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, stripe = bbio->stripes; for (i = 0; i < bbio->num_stripes; i++, stripe++) { - struct btrfs_device *dev = stripe->dev; - u64 physical = 
stripe->physical; - u64 length = stripe->length; u64 bytes; - struct request_queue *req_q; if (!stripe->dev->bdev) { ASSERT(btrfs_test_opt(fs_info, DEGRADED)); continue; } - req_q = bdev_get_queue(stripe->dev->bdev); - /* Zone reset in ZONED mode */ - if (btrfs_can_zone_reset(dev, physical, length)) - ret = btrfs_reset_device_zone(dev, physical, - length, &bytes); - else if (blk_queue_discard(req_q)) - ret = btrfs_issue_discard(dev->bdev, physical, - length, &bytes); - else - continue; - + ret = do_discard_extent(stripe, &bytes); if (!ret) { discarded_bytes += bytes; } else if (ret != -EOPNOTSUPP) { diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index d0d7db3c8b0b..371bb6437cab 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -3501,7 +3501,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, goto skip; - if (sctx->is_dev_replace && btrfs_fs_incompat(fs_info, ZONED)) { + if (sctx->is_dev_replace && btrfs_is_zoned(fs_info)) { spin_lock(&cache->lock); if (!cache->to_copy) { spin_unlock(&cache->lock); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c8187d704c89..434fc6f758cc 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5969,9 +5969,29 @@ static int get_extra_mirror_from_replace(struct btrfs_fs_info *fs_info, return ret; } +static bool is_block_group_to_copy(struct btrfs_fs_info *fs_info, u64 logical) +{ + struct btrfs_block_group *cache; + bool ret; + + /* non-ZONED mode does not use "to_copy" flag */ + if (!btrfs_is_zoned(fs_info)) + return false; + + cache = btrfs_lookup_block_group(fs_info, logical); + + spin_lock(&cache->lock); + ret = cache->to_copy; + spin_unlock(&cache->lock); + + btrfs_put_block_group(cache); + return ret; +} + static void handle_ops_on_dev_replace(enum btrfs_map_op op, struct btrfs_bio **bbio_ret, struct btrfs_dev_replace *dev_replace, + u64 logical, int *num_stripes_ret, int *max_errors_ret) { struct btrfs_bio *bbio = *bbio_ret; @@ -5984,6 +6004,15 @@ static void handle_ops_on_dev_replace(enum 
btrfs_map_op op, if (op == BTRFS_MAP_WRITE) { int index_where_to_add; + /* + * a block group which have "to_copy" set will + * eventually copied by dev-replace process. We can + * avoid cloning IO here. + */ + if (is_block_group_to_copy(dev_replace->srcdev->fs_info, + logical)) + return; + /* * duplicate the write operations while the dev replace * procedure is running. Since the copying of the old disk to @@ -6379,8 +6408,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL && need_full_stripe(op)) { - handle_ops_on_dev_replace(op, &bbio, dev_replace, &num_stripes, - &max_errors); + handle_ops_on_dev_replace(op, &bbio, dev_replace, logical, + &num_stripes, &max_errors); } *bbio_ret = bbio; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index d345c07f5fdf..8bf5df03ceb8 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -11,6 +11,7 @@ #include "disk-io.h" #include "block-group.h" #include "transaction.h" +#include "dev-replace.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -891,6 +892,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) for (i = 0; i < map->num_stripes; i++) { bool is_sequential; struct blk_zone zone; + struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace; + int dev_replace_is_ongoing = 0; device = map->stripes[i].dev; physical = map->stripes[i].physical; @@ -917,6 +920,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) */ btrfs_dev_clear_zone_empty(device, physical); + down_read(&dev_replace->rwsem); + dev_replace_is_ongoing = + btrfs_dev_replace_is_ongoing(dev_replace); + if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL) + btrfs_dev_clear_zone_empty(dev_replace->tgtdev, + physical); + up_read(&dev_replace->rwsem); + /* * The group is mapped to a sequential zone. Get the zone write * pointer to determine the allocation offset within the zone. 
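The patch above routes each stripe through do_discard_extent(): a stripe on a zone that can be reset gets a zone reset, a stripe on a device whose queue supports discard gets a discard, and otherwise nothing is done and zero bytes are reported. A minimal userspace sketch of that dispatch (the struct and enum names here are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Illustrative model of do_discard_extent()'s three-way dispatch. */
enum discard_action { ACTION_ZONE_RESET, ACTION_DISCARD, ACTION_NONE };

struct mock_stripe {
	bool zone_resettable;	/* what btrfs_can_zone_reset() would report */
	bool queue_discard;	/* what blk_queue_discard() would report    */
	uint64_t length;
};

enum discard_action do_discard_extent_model(const struct mock_stripe *s,
					    uint64_t *bytes)
{
	if (s->zone_resettable) {
		*bytes = s->length;	/* the whole range is reset      */
		return ACTION_ZONE_RESET;
	}
	if (s->queue_discard) {
		*bytes = s->length;	/* the range is discarded        */
		return ACTION_DISCARD;
	}
	*bytes = 0;			/* nothing to do, report 0 bytes */
	return ACTION_NONE;
}
```

The real helper additionally mirrors the zone reset to the replace target while a dev-replace is ongoing; that side effect is omitted here.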
From patchwork Tue Nov 10 11:26:36 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894113
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v10 33/41] btrfs: implement copying for ZONED device-replace
Date: Tue, 10 Nov 2020 20:26:36 +0900
Message-Id: <3518b93d16bfa5f0199a278b4cba6711a96661cc.1605007037.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

This is the 3/4 patch to implement device-replace on ZONED mode. This commit implements the copying: it tracks the write pointer during the device-replace process. 
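The write-pointer bookkeeping this patch adds can be sketched in userspace C (illustrative names, not the kernel API): after each submitted write bio the pointer advances past the written range, and before copying to a physical address beyond the pointer, the gap must be zero-filled so the zone is still written sequentially.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL

/* Illustrative model of the scrub write-pointer tracking on a zoned
 * replace target. */
struct wp_model {
	uint64_t write_pointer;
};

/* Zero-fill the range [write_pointer, physical) so the next write at
 * `physical` is sequential; returns the number of bytes zero-filled. */
uint64_t model_fill_gap(struct wp_model *m, uint64_t physical)
{
	uint64_t gap = 0;

	if (m->write_pointer < physical) {
		gap = physical - m->write_pointer;
		m->write_pointer = physical; /* pointer catches up over zeroes */
	}
	return gap;
}

/* After submitting a write bio of `page_count` pages at `physical`, the
 * pointer sits just past the written range. */
void model_submit(struct wp_model *m, uint64_t physical, int page_count)
{
	m->write_pointer = physical + (uint64_t)page_count * MODEL_PAGE_SIZE;
}
```

In the kernel this split corresponds to fill_writer_pointer_gap() (backed by blkdev zeroout) and the pointer update in scrub_wr_submit().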
Since dev-replace's copying copies only the used extents on the source device, we have to fill the gaps to honor the sequential write rule on the target device.

The dev-replace process in ZONED mode must copy or clone all the extents in the source device exactly once. So, we need to ensure that allocations started just before the dev-replace process have their corresponding extent information in the B-trees. finish_extent_writes_for_zoned() implements that functionality, which basically is the code removed in commit 042528f8d840 ("Btrfs: fix block group remaining RO forever after error during device replace").

Signed-off-by: Naohiro Aota --- fs/btrfs/scrub.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.c | 12 +++++++ fs/btrfs/zoned.h | 8 +++++ 3 files changed, 106 insertions(+) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 371bb6437cab..aaf7882dee06 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -169,6 +169,7 @@ struct scrub_ctx { int pages_per_rd_bio; int is_dev_replace; + u64 write_pointer; struct scrub_bio *wr_curr_bio; struct mutex wr_lock; @@ -1623,6 +1624,25 @@ static int scrub_write_page_to_dev_replace(struct scrub_block *sblock, return scrub_add_page_to_wr_bio(sblock->sctx, spage); } +static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) +{ + int ret = 0; + u64 length; + + if (!btrfs_is_zoned(sctx->fs_info)) + return 0; + + if (sctx->write_pointer < physical) { + length = physical - sctx->write_pointer; + + ret = btrfs_zoned_issue_zeroout(sctx->wr_tgtdev, + sctx->write_pointer, length); + if (!ret) + sctx->write_pointer = physical; + } + return ret; +} + static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, struct scrub_page *spage) { @@ -1645,6 +1665,13 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, if (sbio->page_count == 0) { struct bio *bio; + ret = fill_writer_pointer_gap(sctx, + spage->physical_for_dev_replace); + if (ret) { + mutex_unlock(&sctx->wr_lock); + 
return ret; + } + sbio->physical = spage->physical_for_dev_replace; sbio->logical = spage->logical; sbio->dev = sctx->wr_tgtdev; @@ -1706,6 +1733,10 @@ static void scrub_wr_submit(struct scrub_ctx *sctx) * doubled the write performance on spinning disks when measured * with Linux 3.5 */ btrfsic_submit_bio(sbio->bio); + + if (btrfs_is_zoned(sctx->fs_info)) + sctx->write_pointer = sbio->physical + + sbio->page_count * PAGE_SIZE; } static void scrub_wr_bio_end_io(struct bio *bio) @@ -2973,6 +3004,21 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, return ret < 0 ? ret : 0; } +static void sync_replace_for_zoned(struct scrub_ctx *sctx) +{ + if (!btrfs_is_zoned(sctx->fs_info)) + return; + + sctx->flush_all_writes = true; + scrub_submit(sctx); + mutex_lock(&sctx->wr_lock); + scrub_wr_submit(sctx); + mutex_unlock(&sctx->wr_lock); + + wait_event(sctx->list_wait, + atomic_read(&sctx->bios_in_flight) == 0); +} + static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, struct map_lookup *map, struct btrfs_device *scrub_dev, @@ -3105,6 +3151,14 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, */ blk_start_plug(&plug); + if (sctx->is_dev_replace && + btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) { + mutex_lock(&sctx->wr_lock); + sctx->write_pointer = physical; + mutex_unlock(&sctx->wr_lock); + sctx->flush_all_writes = true; + } + /* * now find all extents for each stripe and scrub them */ @@ -3292,6 +3346,9 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, if (ret) goto out; + if (sctx->is_dev_replace) + sync_replace_for_zoned(sctx); + if (extent_logical + extent_len < key.objectid + bytes) { if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) { @@ -3414,6 +3471,25 @@ static noinline_for_stack int scrub_chunk(struct scrub_ctx *sctx, return ret; } +static int finish_extent_writes_for_zoned(struct btrfs_root *root, + struct btrfs_block_group *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; 
+ struct btrfs_trans_handle *trans; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + btrfs_wait_block_group_reservations(cache); + btrfs_wait_nocow_writers(cache); + btrfs_wait_ordered_roots(fs_info, U64_MAX, cache->start, cache->length); + + trans = btrfs_join_transaction(root); + if (IS_ERR(trans)) + return PTR_ERR(trans); + return btrfs_commit_transaction(trans); +} + static noinline_for_stack int scrub_enumerate_chunks(struct scrub_ctx *sctx, struct btrfs_device *scrub_dev, u64 start, u64 end) @@ -3569,6 +3645,16 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, * group is not RO. */ ret = btrfs_inc_block_group_ro(cache, sctx->is_dev_replace); + if (!ret && sctx->is_dev_replace) { + ret = finish_extent_writes_for_zoned(root, cache); + if (ret) { + btrfs_dec_block_group_ro(cache); + scrub_pause_off(fs_info); + btrfs_put_block_group(cache); + break; + } + } + if (ret == 0) { ro_set = 1; } else if (ret == -ENOSPC && !sctx->is_dev_replace) { diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 8bf5df03ceb8..cce4ddfff5d2 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1189,3 +1189,15 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, ASSERT(cache->meta_write_pointer == eb->start + eb->len); cache->meta_write_pointer = eb->start; } + +int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, + u64 length) +{ + if (!btrfs_dev_is_sequential(device, physical)) + return -EOPNOTSUPP; + + return blkdev_issue_zeroout(device->bdev, + physical >> SECTOR_SHIFT, + length >> SECTOR_SHIFT, + GFP_NOFS, 0); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 41d786a97e40..40204f8310ca 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -53,6 +53,8 @@ bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, struct btrfs_block_group **cache_ret); void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, struct extent_buffer *eb); +int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 
physical, + u64 length); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -158,6 +160,12 @@ static inline void btrfs_revert_meta_write_pointer( { } +static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device, + u64 physical, u64 length) +{ + return -EOPNOTSUPP; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Tue Nov 10 11:26:37 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894111
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v10 34/41] btrfs: support dev-replace in ZONED mode
Date: Tue, 10 Nov 2020 20:26:37 +0900
Message-Id: <25283ad0c9c0206f62ed68f2e7c546bde946fc17.1605007037.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

This is the 4/4 patch to implement device-replace on ZONED mode. Even after the copying is done, the write pointers of the source device and the destination device may not be synchronized. For example, when the last allocated extent is freed before the device-replace process, the extent is not copied, leaving a hole there. This patch synchronizes the write pointers by writing zeros to the destination device.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota --- fs/btrfs/scrub.c | 36 +++++++++++++++++++++++++ fs/btrfs/zoned.c | 69 ++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 9 +++++++ 3 files changed, 114 insertions(+) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index aaf7882dee06..0e2211b9c810 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -3019,6 +3019,31 @@ static void sync_replace_for_zoned(struct scrub_ctx *sctx) atomic_read(&sctx->bios_in_flight) == 0); } +static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical, + u64 physical, u64 physical_end) +{ + struct btrfs_fs_info *fs_info = sctx->fs_info; + int ret = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); + + mutex_lock(&sctx->wr_lock); + if (sctx->write_pointer < physical_end) { + ret = btrfs_sync_zone_write_pointer(sctx->wr_tgtdev, logical, + physical, + sctx->write_pointer); + if (ret) + btrfs_err(fs_info, "failed to recover write pointer"); + } + mutex_unlock(&sctx->wr_lock); + btrfs_dev_clear_zone_empty(sctx->wr_tgtdev, physical); + + return ret; +} + static noinline_for_stack int scrub_stripe(struct scrub_ctx 
*sctx, struct map_lookup *map, struct btrfs_device *scrub_dev, @@ -3416,6 +3441,17 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, blk_finish_plug(&plug); btrfs_free_path(path); btrfs_free_path(ppath); + + if (sctx->is_dev_replace && ret >= 0) { + int ret2; + + ret2 = sync_write_pointer_for_zoned(sctx, base + offset, + map->stripes[num].physical, + physical_end); + if (ret2) + ret = ret2; + } + return ret < 0 ? ret : 0; } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index cce4ddfff5d2..77ca93bda258 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -12,6 +12,7 @@ #include "block-group.h" #include "transaction.h" #include "dev-replace.h" +#include "space-info.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -1201,3 +1202,71 @@ int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, length >> SECTOR_SHIFT, GFP_NOFS, 0); } + +static int read_zone_info(struct btrfs_fs_info *fs_info, u64 logical, + struct blk_zone *zone) +{ + struct btrfs_bio *bbio = NULL; + u64 mapped_length = PAGE_SIZE; + unsigned int nofs_flag; + int nmirrors; + int i, ret; + + ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical, + &mapped_length, &bbio); + if (ret || !bbio || mapped_length < PAGE_SIZE) { + btrfs_put_bbio(bbio); + return -EIO; + } + + if (bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) + return -EINVAL; + + nofs_flag = memalloc_nofs_save(); + nmirrors = (int)bbio->num_stripes; + for (i = 0; i < nmirrors; i++) { + u64 physical = bbio->stripes[i].physical; + struct btrfs_device *dev = bbio->stripes[i].dev; + + /* Missing device */ + if (!dev->bdev) + continue; + + ret = btrfs_get_dev_zone(dev, physical, zone); + /* Failing device */ + if (ret == -EIO || ret == -EOPNOTSUPP) + continue; + break; + } + memalloc_nofs_restore(nofs_flag); + + return ret; +} + +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 
physical_pos) +{ + struct btrfs_fs_info *fs_info = tgt_dev->fs_info; + struct blk_zone zone; + u64 length; + u64 wp; + int ret; + + if (!btrfs_dev_is_sequential(tgt_dev, physical_pos)) + return 0; + + ret = read_zone_info(fs_info, logical, &zone); + if (ret) + return ret; + + wp = physical_start + ((zone.wp - zone.start) << SECTOR_SHIFT); + + if (physical_pos == wp) + return 0; + + if (physical_pos > wp) + return -EUCLEAN; + + length = wp - physical_pos; + return btrfs_zoned_issue_zeroout(tgt_dev, physical_pos, length); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 40204f8310ca..5b61500a0aa9 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -55,6 +55,8 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, struct extent_buffer *eb); int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length); +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -166,6 +168,13 @@ static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device, return -EOPNOTSUPP; } +static inline int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, + u64 logical, u64 physical_start, + u64 physical_pos) +{ + return -EOPNOTSUPP; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Tue Nov 10 11:26:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894129 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E66D7697 for ; Tue, 10 Nov 2020 11:30:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v10 35/41] btrfs: enable relocation in ZONED mode
Date: Tue, 10 Nov 2020 20:26:38 +0900
Message-Id: <34c21befbcb421bc93d3350a027ced670f568c90.1605007037.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

To serialize allocation and submit_bio, we introduced a mutex around them. As a result, preallocation must be completely disabled to avoid a deadlock. Since the current relocation process relies on preallocation to move file data extents, it must be handled in another way. In ZONED mode, we just truncate the inode to the size that we wanted to preallocate. Then, we flush the dirty pages on the file before finishing the relocation process. run_delalloc_zoned() will handle all the allocation and submit the IOs to the underlying layers. 
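Because preallocation is disabled, the orphan inode created for relocation must not carry the PREALLOC inode flag in ZONED mode (the patch below drops it in __insert_orphan_inode()). A toy model of that flag selection, with illustrative stand-in macros rather than the kernel's BTRFS_INODE_* values:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_INODE_NOCOMPRESS (1u << 0) /* stands in for BTRFS_INODE_NOCOMPRESS */
#define MODEL_INODE_PREALLOC   (1u << 1) /* stands in for BTRFS_INODE_PREALLOC   */

/* Orphan inodes used for relocation normally get NOCOMPRESS|PREALLOC;
 * on a zoned filesystem the PREALLOC bit must be cleared. */
uint32_t orphan_inode_flags_model(int is_zoned)
{
	uint32_t flags = MODEL_INODE_NOCOMPRESS | MODEL_INODE_PREALLOC;

	if (is_zoned)
		flags &= ~MODEL_INODE_PREALLOC;
	return flags;
}
```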
Signed-off-by: Naohiro Aota --- fs/btrfs/relocation.c | 35 +++++++++++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 3602806d71bd..44b697b881b6 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2603,6 +2603,32 @@ static noinline_for_stack int prealloc_file_extent_cluster( if (ret) return ret; + /* + * In ZONED mode, we cannot preallocate the file region. Instead, we + * dirty and fiemap_write the region. + */ + + if (btrfs_is_zoned(inode->root->fs_info)) { + struct btrfs_root *root = inode->root; + struct btrfs_trans_handle *trans; + + end = cluster->end - offset + 1; + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + inode->vfs_inode.i_ctime = current_time(&inode->vfs_inode); + i_size_write(&inode->vfs_inode, end); + ret = btrfs_update_inode(trans, root, &inode->vfs_inode); + if (ret) { + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + return ret; + } + + return btrfs_end_transaction(trans); + } + inode_lock(&inode->vfs_inode); for (nr = 0; nr < cluster->nr; nr++) { start = cluster->boundary[nr] - offset; @@ -2799,6 +2825,8 @@ static int relocate_file_extent_cluster(struct inode *inode, } } WARN_ON(nr != cluster->nr); + if (btrfs_is_zoned(fs_info) && !ret) + ret = btrfs_wait_ordered_range(inode, 0, (u64)-1); out: kfree(ra); return ret; @@ -3434,8 +3462,12 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, struct btrfs_path *path; struct btrfs_inode_item *item; struct extent_buffer *leaf; + u64 flags = BTRFS_INODE_NOCOMPRESS | BTRFS_INODE_PREALLOC; int ret; + if (btrfs_is_zoned(trans->fs_info)) + flags &= ~BTRFS_INODE_PREALLOC; + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -3450,8 +3482,7 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, btrfs_set_inode_generation(leaf, item, 1); btrfs_set_inode_size(leaf, item, 0); btrfs_set_inode_mode(leaf, 
item, S_IFREG | 0600); - btrfs_set_inode_flags(leaf, item, BTRFS_INODE_NOCOMPRESS | - BTRFS_INODE_PREALLOC); + btrfs_set_inode_flags(leaf, item, flags); btrfs_mark_buffer_dirty(leaf); out: btrfs_free_path(path); From patchwork Tue Nov 10 11:26:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894127 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 20546697 for ; Tue, 10 Nov 2020 11:30:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EAB8420781 for ; Tue, 10 Nov 2020 11:29:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="DQwkPg4v" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732198AbgKJL36 (ORCPT ); Tue, 10 Nov 2020 06:29:58 -0500 Received: from esa5.hgst.iphmx.com ([216.71.153.144]:12022 "EHLO esa5.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732151AbgKJL3d (ORCPT ); Tue, 10 Nov 2020 06:29:33 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1605007773; x=1636543773; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fO7fczkDPPhLC1awKk+6Qj/uT853qiV2q2oLTyKWl+4=; b=DQwkPg4v2droWZuYLGBvgv00iJvEaqYIREBeumutWNi50JJdx5ldHUTq 1A6DKzGSN0c3KAf0sMY2nXcnIMZZuaCRVKkw3uugL0/wO/hadmOqtYcp6 7ywFTDzT1XvtoUTzIwLhso5ZN9EnGaqmqExGqY4UWzfQkW4Vqhcx10XjY OBSLwC59Z4LUpA7F/cdl46mAYWOrg/1NrGkuzbp1SlFujABGqOrNszIC7 3ZkOEr5e46D9w6kTxdv8Pq8YwgoBXjB7/pUuB7TXvUqlmJHBKV0+l8/Dj +IwAAOqbfWGDHD8E7k9Bi/0EGF8rFsydfd6bdZz8AR2NM8rLpGKAZOhfi w==; IronPort-SDR: 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v10 36/41] btrfs: relocate block group to repair IO failure in ZONED
Date: Tue, 10 Nov 2020 20:26:39 +0900
Message-Id: <49cb37a5f0c1c1ddfbf9389f8038948aec640c37.1605007037.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

When btrfs finds a checksum error and the file system has a mirror of the damaged data, btrfs reads the correct data from the mirror and writes it back to the damaged blocks. This repairing, however, violates the sequential write rule.

We can consider three methods to repair an IO failure in ZONED mode:
(1) Reset and rewrite the damaged zone
(2) Allocate a new device extent and replace the damaged device extent with the new one
(3) Relocate the corresponding block group

Method (1) is most similar to the behavior on regular devices. However, it also wipes non-damaged data in the same device extent, so it unnecessarily degrades non-damaged data. Method (2) is much like device replacing, but done within the same device. It is safe because it keeps the device extent until the replacing finishes. However, extending device-replace for this is non-trivial: it assumes "src_dev->physical == dst_dev->physical", and the extent mapping replacing function would have to be extended to support replacing a device extent's position within one device. Method (3) invokes relocation of the damaged block group, so it is straightforward to implement. It relocates all the mirrored device extents, so it is, potentially, a more costly operation than method (1) or (2). But it relocates only the used extents, which reduces the total IO size. Let's apply method (3) for now. In the future, we can extend device-replace and apply method (2).

To protect a block group from being relocated multiple times by multiple IO errors, this commit introduces a "relocating_repair" bit to show that it is now being relocated to repair IO failures. 
It also uses a new kthread, "btrfs-relocating-repair", so that the
relocation does not block the IO path. This commit also supports
repairing in the scrub process.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/block-group.h |  1 +
 fs/btrfs/extent_io.c   |  3 ++
 fs/btrfs/scrub.c       |  3 ++
 fs/btrfs/volumes.c     | 71 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h     |  1 +
 5 files changed, 79 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index ccbcf37eae9c..25f67fe24746 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -96,6 +96,7 @@ struct btrfs_block_group {
 	unsigned int has_caching_ctl:1;
 	unsigned int removed:1;
 	unsigned int to_copy:1;
+	unsigned int relocating_repair:1;
 
 	int disk_cache_state;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d26c827f39c6..c11cf531ba86 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2268,6 +2268,9 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 	ASSERT(!(fs_info->sb->s_flags & SB_RDONLY));
 	BUG_ON(!mirror_num);
 
+	if (btrfs_is_zoned(fs_info))
+		return btrfs_repair_one_zone(fs_info, logical);
+
 	bio = btrfs_io_bio_alloc(1);
 	bio->bi_iter.bi_size = 0;
 	map_length = length;
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 0e2211b9c810..e6a8df8a8f4f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -861,6 +861,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	have_csum = sblock_to_check->pagev[0]->have_csum;
 	dev = sblock_to_check->pagev[0]->dev;
 
+	if (btrfs_is_zoned(fs_info) && !sctx->is_dev_replace)
+		return btrfs_repair_one_zone(fs_info, logical);
+
 	/*
 	 * We must use GFP_NOFS because the scrub task might be waiting for a
 	 * worker task executing this function and in turn a transaction commit
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 434fc6f758cc..8788dc64ba46 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7984,3 +7984,74 @@ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr)
 	spin_unlock(&fs_info->swapfile_pins_lock);
 	return node != NULL;
 }
+
+static int relocating_repair_kthread(void *data)
+{
+	struct btrfs_block_group *cache = (struct btrfs_block_group *)data;
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	u64 target;
+	int ret = 0;
+
+	target = cache->start;
+	btrfs_put_block_group(cache);
+
+	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) {
+		btrfs_info(fs_info,
+			   "zoned: skip relocating block group %llu to repair: EBUSY",
+			   target);
+		return -EBUSY;
+	}
+
+	mutex_lock(&fs_info->delete_unused_bgs_mutex);
+
+	/* Ensure block group still exists */
+	cache = btrfs_lookup_block_group(fs_info, target);
+	if (!cache)
+		goto out;
+
+	if (!cache->relocating_repair)
+		goto out;
+
+	ret = btrfs_may_alloc_data_chunk(fs_info, target);
+	if (ret < 0)
+		goto out;
+
+	btrfs_info(fs_info, "zoned: relocating block group %llu to repair IO failure",
+		   target);
+	ret = btrfs_relocate_chunk(fs_info, target);
+
+out:
+	if (cache)
+		btrfs_put_block_group(cache);
+	mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+	btrfs_exclop_finish(fs_info);
+
+	return ret;
+}
+
+int btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical)
+{
+	struct btrfs_block_group *cache;
+
+	/* Do not attempt to repair in degraded state */
+	if (btrfs_test_opt(fs_info, DEGRADED))
+		return 0;
+
+	cache = btrfs_lookup_block_group(fs_info, logical);
+	if (!cache)
+		return 0;
+
+	spin_lock(&cache->lock);
+	if (cache->relocating_repair) {
+		spin_unlock(&cache->lock);
+		btrfs_put_block_group(cache);
+		return 0;
+	}
+	cache->relocating_repair = 1;
+	spin_unlock(&cache->lock);
+
+	kthread_run(relocating_repair_kthread, cache,
+		    "btrfs-relocating-repair");
+
+	return 0;
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index cff1f7689eac..7c1ad6901791 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -584,5 +584,6 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
 int btrfs_bg_type_to_factor(u64 flags);
 const char *btrfs_bg_type_to_raid_name(u64 flags);
 int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
+int btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical);
 
 #endif

From patchwork Tue Nov 10 11:26:40 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894121
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Johannes Thumshirn
Subject: [PATCH v10 37/41] btrfs: split alloc_log_tree()
Date: Tue, 10 Nov 2020 20:26:40 +0900

This is a preparation for the next patch. This commit splits
alloc_log_tree() into a part that allocates the tree structure (which
remains in alloc_log_tree()) and a part that allocates the tree node
(moved into btrfs_alloc_log_tree_node()). The latter part is also
exported so that it can be used by the next patch.

Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c | 33 +++++++++++++++++++++++++++------
 fs/btrfs/disk-io.h |  2 ++
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9490dbbbdb2a..97e3deb46cf1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1211,7 +1211,6 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans,
 					 struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *root;
-	struct extent_buffer *leaf;
 
 	root = btrfs_alloc_root(fs_info, BTRFS_TREE_LOG_OBJECTID, GFP_NOFS);
 	if (!root)
@@ -1221,6 +1220,14 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans,
 	root->root_key.type = BTRFS_ROOT_ITEM_KEY;
 	root->root_key.offset = BTRFS_TREE_LOG_OBJECTID;
 
+	return root;
+}
+
+int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans,
+			      struct btrfs_root *root)
+{
+	struct extent_buffer *leaf;
+
 	/*
 	 * DON'T set SHAREABLE bit for log trees.
 	 *
@@ -1233,26 +1240,33 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans,
 	leaf = btrfs_alloc_tree_block(trans, root, 0, BTRFS_TREE_LOG_OBJECTID,
 				      NULL, 0, 0, 0, BTRFS_NESTING_NORMAL);
-	if (IS_ERR(leaf)) {
-		btrfs_put_root(root);
-		return ERR_CAST(leaf);
-	}
+	if (IS_ERR(leaf))
+		return PTR_ERR(leaf);
 
 	root->node = leaf;
 
 	btrfs_mark_buffer_dirty(root->node);
 	btrfs_tree_unlock(root->node);
-	return root;
+
+	return 0;
 }
 
 int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
 			     struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *log_root;
+	int ret;
 
 	log_root = alloc_log_tree(trans, fs_info);
 	if (IS_ERR(log_root))
 		return PTR_ERR(log_root);
+
+	ret = btrfs_alloc_log_tree_node(trans, log_root);
+	if (ret) {
+		btrfs_put_root(log_root);
+		return ret;
+	}
+
 	WARN_ON(fs_info->log_root_tree);
 	fs_info->log_root_tree = log_root;
 	return 0;
@@ -1264,11 +1278,18 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans,
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_root *log_root;
 	struct btrfs_inode_item *inode_item;
+	int ret;
 
 	log_root = alloc_log_tree(trans, fs_info);
 	if (IS_ERR(log_root))
 		return PTR_ERR(log_root);
 
+	ret = btrfs_alloc_log_tree_node(trans, log_root);
+	if (ret) {
+		btrfs_put_root(log_root);
+		return ret;
+	}
+
 	log_root->last_trans = trans->transid;
 	log_root->root_key.offset = root->root_key.objectid;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index fee69ced58b4..b82ae3711c42 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -115,6 +115,8 @@ blk_status_t btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 			extent_submit_bio_start_t *submit_bio_start);
 blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio,
 			int mirror_num);
+int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans,
+			      struct btrfs_root *root);
 int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
 			     struct btrfs_fs_info *fs_info);
 int btrfs_add_log_tree(struct btrfs_trans_handle *trans,

From patchwork Tue Nov 10 11:26:41 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894123
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Johannes Thumshirn
Subject: [PATCH v10 38/41] btrfs: extend zoned allocator to use dedicated tree-log block group
Date: Tue, 10 Nov 2020 20:26:41 +0900
Message-Id: <551fecb79909221d640f9d6d08f5ccc8487717be.1605007037.git.naohiro.aota@wdc.com>

This is the 1/3 patch to enable tree log on ZONED mode. The tree-log
feature does not work on ZONED mode as is.
Blocks for a tree-log tree are allocated mixed with other metadata
blocks, and btrfs writes and syncs the tree-log blocks to devices at
fsync() time, which is a different timing from a global transaction
commit. As a result, both writing tree-log blocks and writing other
metadata blocks become non-sequential writes, which ZONED mode must
avoid.

We can introduce a dedicated block group for tree-log blocks, so that
tree-log blocks and other metadata blocks are separated into their own
write streams. As a result, each write stream can now be written to
devices separately. "fs_info->treelog_bg" tracks the dedicated block
group, and btrfs assigns "treelog_bg" on demand at tree-log block
allocation time.

This commit extends the zoned block allocator to use the block group.

Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c |  7 +++++
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/extent-tree.c | 63 +++++++++++++++++++++++++++++++++++++++---
 3 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 04bb0602f1cc..d222f54eb0c1 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -939,6 +939,13 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	btrfs_return_cluster_to_free_space(block_group, cluster);
 	spin_unlock(&cluster->refill_lock);
 
+	if (btrfs_is_zoned(fs_info)) {
+		spin_lock(&fs_info->treelog_bg_lock);
+		if (fs_info->treelog_bg == block_group->start)
+			fs_info->treelog_bg = 0;
+		spin_unlock(&fs_info->treelog_bg_lock);
+	}
+
 	path = btrfs_alloc_path();
 	if (!path) {
 		ret = -ENOMEM;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8138e932b7cc..2fd7e58343ce 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -957,6 +957,8 @@ struct btrfs_fs_info {
 	/* Max size to emit ZONE_APPEND write command */
 	u64 max_zone_append_size;
 	struct mutex zoned_meta_io_lock;
+	spinlock_t treelog_bg_lock;
+	u64 treelog_bg;
 
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2ee21076b641..69d913ffc425 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3631,6 +3631,9 @@ struct find_free_extent_ctl {
 	bool have_caching_bg;
 	bool orig_have_caching_bg;
 
+	/* Allocation is called for tree-log */
+	bool for_treelog;
+
 	/* RAID index, converted from flags */
 	int index;
@@ -3868,23 +3871,54 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 			       struct find_free_extent_ctl *ffe_ctl,
 			       struct btrfs_block_group **bg_ret)
 {
+	struct btrfs_fs_info *fs_info = block_group->fs_info;
 	struct btrfs_space_info *space_info = block_group->space_info;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	u64 start = block_group->start;
 	u64 num_bytes = ffe_ctl->num_bytes;
 	u64 avail;
+	u64 bytenr = block_group->start;
+	u64 log_bytenr;
 	int ret = 0;
+	bool skip;
 
 	ASSERT(btrfs_is_zoned(block_group->fs_info));
 
+	/*
+	 * Do not allow non-tree-log blocks in the dedicated tree-log block
+	 * group, and vice versa.
+	 */
+	spin_lock(&fs_info->treelog_bg_lock);
+	log_bytenr = fs_info->treelog_bg;
+	skip = log_bytenr && ((ffe_ctl->for_treelog && bytenr != log_bytenr) ||
+			      (!ffe_ctl->for_treelog && bytenr == log_bytenr));
+	spin_unlock(&fs_info->treelog_bg_lock);
+	if (skip)
+		return 1;
+
 	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
+	spin_lock(&fs_info->treelog_bg_lock);
+
+	ASSERT(!ffe_ctl->for_treelog ||
+	       block_group->start == fs_info->treelog_bg ||
+	       fs_info->treelog_bg == 0);
 
 	if (block_group->ro) {
 		ret = 1;
 		goto out;
 	}
 
+	/*
+	 * Do not allow currently using block group to be tree-log dedicated
+	 * block group.
+	 */
+	if (ffe_ctl->for_treelog && !fs_info->treelog_bg &&
+	    (block_group->used || block_group->reserved)) {
+		ret = 1;
+		goto out;
+	}
+
 	avail = block_group->length - block_group->alloc_offset;
 	if (avail < num_bytes) {
 		ffe_ctl->max_extent_size = avail;
@@ -3892,6 +3926,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 		goto out;
 	}
 
+	if (ffe_ctl->for_treelog && !fs_info->treelog_bg)
+		fs_info->treelog_bg = block_group->start;
+
 	ffe_ctl->found_offset = start + block_group->alloc_offset;
 	block_group->alloc_offset += num_bytes;
 	spin_lock(&ctl->tree_lock);
@@ -3906,6 +3943,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	ffe_ctl->search_start = ffe_ctl->found_offset;
 
 out:
+	if (ret && ffe_ctl->for_treelog)
+		fs_info->treelog_bg = 0;
+	spin_unlock(&fs_info->treelog_bg_lock);
 	spin_unlock(&block_group->lock);
 	spin_unlock(&space_info->lock);
 	return ret;
@@ -4155,7 +4195,12 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info,
 		return prepare_allocation_clustered(fs_info, ffe_ctl,
 						    space_info, ins);
 	case BTRFS_EXTENT_ALLOC_ZONED:
-		/* nothing to do */
+		if (ffe_ctl->for_treelog) {
+			spin_lock(&fs_info->treelog_bg_lock);
+			if (fs_info->treelog_bg)
+				ffe_ctl->hint_byte = fs_info->treelog_bg;
+			spin_unlock(&fs_info->treelog_bg_lock);
+		}
 		return 0;
 	default:
 		BUG();
@@ -4199,6 +4244,7 @@ static noinline int find_free_extent(struct btrfs_root *root,
 	struct find_free_extent_ctl ffe_ctl = {0};
 	struct btrfs_space_info *space_info;
 	bool full_search = false;
+	bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID;
 
 	WARN_ON(num_bytes < fs_info->sectorsize);
@@ -4212,6 +4258,7 @@ static noinline int find_free_extent(struct btrfs_root *root,
 	ffe_ctl.orig_have_caching_bg = false;
 	ffe_ctl.found_offset = 0;
 	ffe_ctl.hint_byte = hint_byte_orig;
+	ffe_ctl.for_treelog = for_treelog;
 	ffe_ctl.policy = BTRFS_EXTENT_ALLOC_CLUSTERED;
 
 	/* For clustered allocation */
@@ -4286,8 +4333,15 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		struct btrfs_block_group *bg_ret;
 
 		/* If the block group is read-only, we can skip it entirely. */
-		if (unlikely(block_group->ro))
+		if (unlikely(block_group->ro)) {
+			if (btrfs_is_zoned(fs_info) && for_treelog) {
+				spin_lock(&fs_info->treelog_bg_lock);
+				if (block_group->start == fs_info->treelog_bg)
+					fs_info->treelog_bg = 0;
+				spin_unlock(&fs_info->treelog_bg_lock);
+			}
 			continue;
+		}
 
 		btrfs_grab_block_group(block_group, delalloc);
 		ffe_ctl.search_start = block_group->start;
@@ -4475,6 +4529,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 	bool final_tried = num_bytes == min_alloc_size;
 	u64 flags;
 	int ret;
+	bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID;
 
 	flags = get_alloc_profile_by_root(root, is_data);
 again:
@@ -4498,8 +4553,8 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 			sinfo = btrfs_find_space_info(fs_info, flags);
 			btrfs_err(fs_info,
-				  "allocation failed flags %llu, wanted %llu",
-				  flags, num_bytes);
+				  "allocation failed flags %llu, wanted %llu treelog %d",
+				  flags, num_bytes, for_treelog);
 			if (sinfo)
 				btrfs_dump_space_info(fs_info, sinfo,
 						      num_bytes, 1);

From patchwork Tue Nov 10 11:26:42 2020
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11894117
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v10 39/41] btrfs: serialize log transaction on ZONED mode
Date: Tue, 10 Nov 2020 20:26:42 +0900

This is the 2/3 patch to enable tree-log on ZONED mode.

Since we can start more than one log transaction per subvolume
simultaneously, nodes from multiple transactions can be allocated
interleaved. Such mixed allocation results in non-sequential writes at
the time of log transaction commit. The nodes of the global log root
tree (fs_info->log_root_tree) have the same mixed allocation problem.

This patch serializes log transactions by waiting for a committing
transaction when someone tries to start a new transaction, to avoid the
mixed allocation problem. We must also wait for running log transactions
of other subvolumes, but there is no easy way to detect which subvolume
root is running a log transaction. So, this patch forbids starting a new
log transaction when another subvolume has already allocated the global
log root tree.
Signed-off-by: Naohiro Aota --- fs/btrfs/tree-log.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 5f585cf57383..505de1cc1394 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -106,6 +106,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans, struct btrfs_root *log, struct btrfs_path *path, u64 dirid, int del_all); +static void wait_log_commit(struct btrfs_root *root, int transid); /* * tree logging is a special write ahead log used to make sure that @@ -140,16 +141,25 @@ static int start_log_trans(struct btrfs_trans_handle *trans, struct btrfs_log_ctx *ctx) { struct btrfs_fs_info *fs_info = root->fs_info; + const bool zoned = btrfs_is_zoned(fs_info); int ret = 0; mutex_lock(&root->log_mutex); +again: if (root->log_root) { + int index = (root->log_transid + 1) % 2; + if (btrfs_need_log_full_commit(trans)) { ret = -EAGAIN; goto out; } + if (zoned && atomic_read(&root->log_commit[index])) { + wait_log_commit(root, root->log_transid - 1); + goto again; + } + if (!root->log_start_pid) { clear_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state); root->log_start_pid = current->pid; @@ -158,7 +168,9 @@ static int start_log_trans(struct btrfs_trans_handle *trans, } } else { mutex_lock(&fs_info->tree_log_mutex); - if (!fs_info->log_root_tree) + if (zoned && fs_info->log_root_tree) + ret = -EAGAIN; + else if (!fs_info->log_root_tree) ret = btrfs_init_log_root_tree(trans, fs_info); mutex_unlock(&fs_info->tree_log_mutex); if (ret) @@ -193,14 +205,22 @@ static int start_log_trans(struct btrfs_trans_handle *trans, */ static int join_running_log_trans(struct btrfs_root *root) { + const bool zoned = btrfs_is_zoned(root->fs_info); int ret = -ENOENT; if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &root->state)) return ret; mutex_lock(&root->log_mutex); +again: if (root->log_root) { + int index = (root->log_transid + 1) % 2; + ret = 0; + if (zoned && 
atomic_read(&root->log_commit[index])) { + wait_log_commit(root, root->log_transid - 1); + goto again; + } atomic_inc(&root->log_writers); } mutex_unlock(&root->log_mutex); From patchwork Tue Nov 10 11:26:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11894125 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5BA28697 for ; Tue, 10 Nov 2020 11:29:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 33D7420659 for ; Tue, 10 Nov 2020 11:29:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="JJmrsBkW" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732282AbgKJL34 (ORCPT ); Tue, 10 Nov 2020 06:29:56 -0500 Received: from esa5.hgst.iphmx.com ([216.71.153.144]:12024 "EHLO esa5.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732198AbgKJL3l (ORCPT ); Tue, 10 Nov 2020 06:29:41 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1605007781; x=1636543781; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6m8PjvRrTOtZX6WwjPVwecIv7/RxFMU445Z2iBOh5Q0=; b=JJmrsBkW1we8QHNB9LuYnUxfEZr1TwflrgVDauBn47DX4f9UK5jIGcwR BApNA8Hn4h85zXMc2AJBTMquNIE5ywlSBBfP2oQmINMa90rgbMLEeT8AL 1DI1L2htY6cPQoickf7tERyCztKeRrowoSY8QXs0LS+FGOZigk8kAQdB7 rrrF8WwEmS8eAdqe5n3YsfIh011s0t+S2qNSjwVzLEVyKKdLMsxeKgsM0 EeHgrwxj9LCELPc3eb+R4yBFCvAfGr7UuWg1OW3n05Poi3nqzeNVhuRse qND/pAbVwmmVj9bTemmsA5i+hCKzqSqnH0e5DDe0ZNuOU4nOGxOreT0Ff g==; IronPort-SDR: +tfKEwd7ahUIWwHRuEptT5oZkcdwD2MkFx9Db1FMp+48gDDBxm2woqypqi3Oyq38Oisd6PP3vi 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Naohiro Aota, Josef Bacik, Johannes Thumshirn
Subject: [PATCH v10 40/41] btrfs: reorder log node allocation
Date: Tue, 10 Nov 2020 20:26:43 +0900
Message-Id: <76ce2df7936106a806f05e5e3628c586bd7bc62a.1605007037.git.naohiro.aota@wdc.com>

This is the 3/3 patch to enable tree-log on ZONED mode.

The nodes of "fs_info->log_root_tree" and the nodes of "root->log_root"
are not allocated in the same order in which they are written out, so
writing them causes unaligned write errors. Reorder the allocations by
delaying allocation of the root node of "fs_info->log_root_tree", so
that the node buffers can go out to the devices sequentially.

Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c  |  7 -------
 fs/btrfs/tree-log.c | 24 ++++++++++++++++++------
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 97e3deb46cf1..e896dd564434 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1255,18 +1255,11 @@ int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
 			     struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *log_root;
-	int ret;
 
 	log_root = alloc_log_tree(trans, fs_info);
 	if (IS_ERR(log_root))
 		return PTR_ERR(log_root);
 
-	ret = btrfs_alloc_log_tree_node(trans, log_root);
-	if (ret) {
-		btrfs_put_root(log_root);
-		return ret;
-	}
-
 	WARN_ON(fs_info->log_root_tree);
 	fs_info->log_root_tree = log_root;
 	return 0;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 505de1cc1394..15f9e8a461ee 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3140,6 +3140,16 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 	list_add_tail(&root_log_ctx.list, &log_root_tree->log_ctxs[index2]);
 	root_log_ctx.log_transid = log_root_tree->log_transid;
 
+	mutex_lock(&fs_info->tree_log_mutex);
+	if (!log_root_tree->node) {
+		ret = btrfs_alloc_log_tree_node(trans, log_root_tree);
+		if (ret) {
+			mutex_unlock(&fs_info->tree_log_mutex);
+			goto out;
+		}
+	}
+	mutex_unlock(&fs_info->tree_log_mutex);
+
 	/*
 	 * Now we are safe to update the log_root_tree because we're under the
 	 * log_mutex, and we're a current writer so we're holding the commit
@@ -3289,12 +3299,14 @@ static void free_log_tree(struct btrfs_trans_handle *trans,
 		.process_func = process_one_buffer
 	};
 
-	ret = walk_log_tree(trans, log, &wc);
-	if (ret) {
-		if (trans)
-			btrfs_abort_transaction(trans, ret);
-		else
-			btrfs_handle_fs_error(log->fs_info, ret, NULL);
+	if (log->node) {
+		ret = walk_log_tree(trans, log, &wc);
+		if (ret) {
+			if (trans)
+				btrfs_abort_transaction(trans, ret);
+			else
+				btrfs_handle_fs_error(log->fs_info, ret, NULL);
+		}
 	}
 
 	clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1,

From patchwork Tue Nov 10 11:26:44 2020
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v10 41/41] btrfs: enable to mount ZONED incompat flag
Date: Tue, 10 Nov 2020 20:26:44 +0900
Message-Id: <9792b90dd95d44f86b5ddc3e25373286ec9fbf04.1605007037.git.naohiro.aota@wdc.com>

This final patch adds the ZONED incompat flag to
BTRFS_FEATURE_INCOMPAT_SUPP and enables btrfs to mount a ZONED-flagged
file system.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/ctree.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2fd7e58343ce..935b3470a069 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -302,7 +302,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES	|	\
 	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID	|	\
-	 BTRFS_FEATURE_INCOMPAT_RAID1C34)
+	 BTRFS_FEATURE_INCOMPAT_RAID1C34	|	\
+	 BTRFS_FEATURE_INCOMPAT_ZONED)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)