From patchwork Fri Jan 22 06:21:01 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038351
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn,
    Christoph Hellwig, Josef Bacik
Subject: [PATCH v13 01/42] block: add bio_add_zone_append_page
Date: Fri, 22 Jan 2021 15:21:01 +0900
Message-Id: <94eb00fce052ef7c64a45b067a86904d174e92fb.1611295439.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
is intended to be used by file systems that directly add pages to a bio
instead of using bio_iov_iter_get_pages().

Cc: Jens Axboe
Reviewed-by: Christoph Hellwig
Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
Reviewed-by: Chaitanya Kulkarni
---
 block/bio.c         | 33 +++++++++++++++++++++++++++++++++
 include/linux/bio.h |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..2f21d2958b60 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -851,6 +851,39 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio,
 }
 EXPORT_SYMBOL(bio_add_pc_page);

+/**
+ * bio_add_zone_append_page - attempt to add page to zone-append bio
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist of a bio that will be submitted
+ * for a zone-append request. This can fail for a number of reasons, such as the
+ * bio being full or the target block device is not a zoned block device or
+ * other limitations of the target block device. The target block device must
+ * allow bio's up to PAGE_SIZE, so it is always possible to add a single page
+ * to an empty bio.
+ *
+ * Returns: number of bytes added to the bio, or 0 in case of a failure.
+ */
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset)
+{
+	struct request_queue *q = bio->bi_disk->queue;
+	bool same_page = false;
+
+	if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND))
+		return 0;
+
+	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
+		return 0;
+
+	return bio_add_hw_page(q, bio, page, len, offset,
+			       queue_max_zone_append_sectors(q), &same_page);
+}
+EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
+
 /**
  * __bio_try_merge_page - try appending data to an existing bvec.
  * @bio: destination bio

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1edda614f7ce..de62911473bb 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -455,6 +455,8 @@ void bio_chain(struct bio *, struct bio *);
 extern int bio_add_page(struct bio *, struct page *, unsigned int, unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *,
 			   struct page *, unsigned int, unsigned int);
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset);
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off, bool *same_page);
 void __bio_add_page(struct bio *bio, struct page *page,

From patchwork Fri Jan 22 06:21:02 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038355
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota,
    Christoph Hellwig
Subject: [PATCH v13 02/42] iomap: support REQ_OP_ZONE_APPEND
Date: Fri, 22 Jan 2021 15:21:02 +0900

A ZONE_APPEND bio must follow hardware restrictions (e.g. not exceeding
max_zone_append_sectors) so that it is not split. bio_iov_iter_get_pages()
builds such a restricted bio using __bio_iov_append_get_pages() if
bio_op(bio) == REQ_OP_ZONE_APPEND. To utilize it, we need to set the
bio_op before calling bio_iov_iter_get_pages().

This commit introduces IOMAP_F_ZONE_APPEND, so that iomap users can set
the flag to indicate they want REQ_OP_ZONE_APPEND and a restricted bio.
Reviewed-by: Christoph Hellwig
Signed-off-by: Naohiro Aota
Reviewed-by: Chaitanya Kulkarni
---
 fs/iomap/direct-io.c  | 43 +++++++++++++++++++++++++++++++++++++------
 include/linux/iomap.h |  1 +
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 933f234d5bec..2273120d8ed7 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -201,6 +201,34 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	iomap_dio_submit_bio(dio, iomap, bio, pos);
 }

+/*
+ * Figure out the bio's operation flags from the dio request, the
+ * mapping, and whether or not we want FUA. Note that we can end up
+ * clearing the WRITE_FUA flag in the dio request.
+ */
+static inline unsigned int
+iomap_dio_bio_opflags(struct iomap_dio *dio, struct iomap *iomap, bool use_fua)
+{
+	unsigned int opflags = REQ_SYNC | REQ_IDLE;
+
+	if (!(dio->flags & IOMAP_DIO_WRITE)) {
+		WARN_ON_ONCE(iomap->flags & IOMAP_F_ZONE_APPEND);
+		return REQ_OP_READ;
+	}
+
+	if (iomap->flags & IOMAP_F_ZONE_APPEND)
+		opflags |= REQ_OP_ZONE_APPEND;
+	else
+		opflags |= REQ_OP_WRITE;
+
+	if (use_fua)
+		opflags |= REQ_FUA;
+	else
+		dio->flags &= ~IOMAP_DIO_WRITE_FUA;
+
+	return opflags;
+}
+
 static loff_t
 iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		struct iomap_dio *dio, struct iomap *iomap)
@@ -208,6 +236,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
 	unsigned int fs_block_size = i_blocksize(inode), pad;
 	unsigned int align = iov_iter_alignment(dio->submit.iter);
+	unsigned int bio_opf;
 	struct bio *bio;
 	bool need_zeroout = false;
 	bool use_fua = false;
@@ -263,6 +292,13 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		iomap_dio_zero(dio, iomap, pos - pad, pad);
 	}

+	/*
+	 * Set the operation flags early so that bio_iov_iter_get_pages
+	 * can set up the page vector appropriately for a ZONE_APPEND
+	 * operation.
+	 */
+	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua);
+
 	do {
 		size_t n;
 		if (dio->error) {
@@ -278,6 +314,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		bio->bi_ioprio = dio->iocb->ki_ioprio;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
+		bio->bi_opf = bio_opf;

 		ret = bio_iov_iter_get_pages(bio, dio->submit.iter);
 		if (unlikely(ret)) {
@@ -293,14 +330,8 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		n = bio->bi_iter.bi_size;
 		if (dio->flags & IOMAP_DIO_WRITE) {
-			bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
-			if (use_fua)
-				bio->bi_opf |= REQ_FUA;
-			else
-				dio->flags &= ~IOMAP_DIO_WRITE_FUA;
 			task_io_account_write(n);
 		} else {
-			bio->bi_opf = REQ_OP_READ;
 			if (dio->flags & IOMAP_DIO_DIRTY)
 				bio_set_pages_dirty(bio);
 		}

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 5bd3cac4df9c..8ebb1fa6f3b7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -55,6 +55,7 @@ struct vm_fault;
 #define IOMAP_F_SHARED		0x04
 #define IOMAP_F_MERGED		0x08
 #define IOMAP_F_BUFFER_HEAD	0x10
+#define IOMAP_F_ZONE_APPEND	0x20

 /*
  * Flags set by the core iomap code during operations:

From patchwork Fri Jan 22 06:21:03 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038357
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 03/42] btrfs: defer loading zone info after opening trees
Date: Fri, 22 Jan 2021 15:21:03 +0900
Message-Id: <6863be2df7ec31c8a8266342c5b7e4bfb8c8a5b7.1611295439.git.naohiro.aota@wdc.com>

This is a preparation patch for implementing zone emulation on a regular
device.

To emulate the zoned mode on a regular (non-zoned) device, we need to
decide an emulated zone size. Instead of making it a compile-time static
value, we'll make it configurable at mkfs time. Since we have the one
zone == one device extent restriction, we can determine the emulated zone
size from the size of a device extent. We can extend
btrfs_get_dev_zone_info() to show a regular device filled with
conventional zones once the zone size is decided.

The current call site of btrfs_get_dev_zone_info() during the mount
process is earlier than reading the trees, so we can't slice a regular
device into conventional zones. This patch defers the loading of zone
info to open_ctree() to load the emulated zone size from a device extent.
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c | 13 +++++++++++++
 fs/btrfs/volumes.c |  4 ----
 fs/btrfs/zoned.c   | 24 ++++++++++++++++++++++++
 fs/btrfs/zoned.h   |  7 +++++++
 4 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5473bed6a7e8..39cbe10a81b6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3257,6 +3257,19 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (ret)
 		goto fail_tree_roots;

+	/*
+	 * Get zone type information of zoned block devices. This will also
+	 * handle emulation of the zoned mode for btrfs if a regular device has
+	 * the zoned incompat feature flag set.
+	 */
+	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
+	if (ret) {
+		btrfs_err(fs_info,
+			  "failed to read device zone info: %d", ret);
+		goto fail_block_groups;
+	}
+
 	/*
 	 * If we have a uuid root and we're not being told to rescan we need to
 	 * check the generation here so we can set the

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index badb972919eb..bb3f341f6a22 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -669,10 +669,6 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	device->mode = flags;

-	ret = btrfs_get_dev_zone_info(device);
-	if (ret != 0)
-		goto error_free_page;
-
 	fs_devices->open_devices++;
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
 	    device->devid != BTRFS_DEV_REPLACE_DEVID) {

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index c38846659019..bcabdb2c97f1 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -143,6 +143,30 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
 	return 0;
 }

+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	int ret = 0;
+
+	if (!btrfs_fs_incompat(fs_info, ZONED))
+		return 0;
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		/* We can skip reading of zone info for missing devices */
+		if (!device->bdev)
+			continue;
+
+		ret = btrfs_get_dev_zone_info(device);
+		if (ret)
+			break;
+	}
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	return ret;
+}
+
 int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
 	struct btrfs_zoned_device_info *zone_info = NULL;

diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 8abe2f83272b..5e0e7de84a82 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -25,6 +25,7 @@ struct btrfs_zoned_device_info {
 #ifdef CONFIG_BLK_DEV_ZONED
 int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 		       struct blk_zone *zone);
+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info);
 int btrfs_get_dev_zone_info(struct btrfs_device *device);
 void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
 int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info);
@@ -42,6 +43,12 @@ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 	return 0;
 }

+static inline int btrfs_get_dev_zone_info_all_devices(
+				struct btrfs_fs_info *fs_info)
+{
+	return 0;
+}
+
 static inline int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
 	return 0;

From patchwork Fri Jan 22 06:21:04 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038359
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 04/42] btrfs: use regular SB location on emulated zoned mode
Date: Fri, 22 Jan 2021 15:21:04 +0900

Zoned btrfs puts a superblock at the beginning of the SB logging zones if
the zone is conventional. This difference causes a chicken-and-egg problem
for the emulated zoned mode: since the device is a regular (non-zoned)
device, we cannot know whether the btrfs is regular or emulated zoned
while we read the superblock, but to load the proper superblock, we need
to know whether it is emulated zoned or not.

To solve the problem, we place the SBs at the same locations as regular
btrfs on the emulated zoned mode. This is possible because it is ensured
that all the SB locations are in a conventional zone on the emulated
zoned mode.
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/zoned.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index bcabdb2c97f1..87172ce7173b 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -553,7 +553,13 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
 	struct btrfs_zoned_device_info *zinfo = device->zone_info;
 	u32 zone_num;

-	if (!zinfo) {
+	/*
+	 * With btrfs zoned mode on a non-zoned block device, use the same
+	 * super block locations as regular btrfs. Doing so, the super
+	 * block can always be retrieved and the zoned-mode of the volume
+	 * detected from the super block information.
+	 */
+	if (!bdev_is_zoned(device->bdev)) {
 		*bytenr_ret = btrfs_sb_offset(mirror);
 		return 0;
 	}

From patchwork Fri Jan 22 06:21:05 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038505
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn
Subject: [PATCH v13 05/42] btrfs: release path before calling into btrfs_load_block_group_zone_info
Date: Fri, 22 Jan 2021 15:21:05 +0900
Message-Id: <9caf351d3da77e5b9f781226b2c199b570cccb62.1611295439.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

Since we have no write pointer in conventional zones, we cannot determine
the allocation offset from it. Instead, we set the allocation offset
after the highest addressed extent. This is done by reading the extent
tree in btrfs_load_block_group_zone_info().

However, this function is called from btrfs_read_block_groups(), so the
read lock for the tree node can be taken recursively. To avoid this
unsafe locking scenario, release the path before reading the extent tree
to get the allocation offset.
Signed-off-by: Johannes Thumshirn
Reviewed-by: Josef Bacik
---
 fs/btrfs/block-group.c | 39 ++++++++++++++++++---------------------
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 0886e81e5540..60d843f341aa 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1797,24 +1797,8 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
-static void read_block_group_item(struct btrfs_block_group *cache,
-				  struct btrfs_path *path,
-				  const struct btrfs_key *key)
-{
-	struct extent_buffer *leaf = path->nodes[0];
-	struct btrfs_block_group_item bgi;
-	int slot = path->slots[0];
-
-	cache->length = key->offset;
-
-	read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
-			   sizeof(bgi));
-	cache->used = btrfs_stack_block_group_used(&bgi);
-	cache->flags = btrfs_stack_block_group_flags(&bgi);
-}
-
 static int read_one_block_group(struct btrfs_fs_info *info,
-				struct btrfs_path *path,
+				struct btrfs_block_group_item *bgi,
 				const struct btrfs_key *key,
 				int need_clear)
 {
@@ -1829,7 +1813,9 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 	if (!cache)
 		return -ENOMEM;
 
-	read_block_group_item(cache, path, key);
+	cache->length = key->offset;
+	cache->used = btrfs_stack_block_group_used(bgi);
+	cache->flags = btrfs_stack_block_group_flags(bgi);
 
 	set_free_space_tree_thresholds(cache);
 
@@ -1988,19 +1974,30 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 		need_clear = 1;
 
 	while (1) {
+		struct btrfs_block_group_item bgi;
+		struct extent_buffer *leaf;
+		int slot;
+
 		ret = find_first_block_group(info, path, &key);
 		if (ret > 0)
 			break;
 		if (ret != 0)
 			goto error;
 
-		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-		ret = read_one_block_group(info, path, &key, need_clear);
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+
+		read_extent_buffer(leaf, &bgi,
+				   btrfs_item_ptr_offset(leaf, slot),
+				   sizeof(bgi));
+
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+		btrfs_release_path(path);
+		ret = read_one_block_group(info, &bgi, &key, need_clear);
 		if (ret < 0)
 			goto error;
 		key.objectid += key.offset;
 		key.offset = 0;
-		btrfs_release_path(path);
 	}
 	btrfs_release_path(path);

From patchwork Fri Jan 22 06:21:06 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038361
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Johannes Thumshirn
Subject: [PATCH v13 06/42] btrfs: do not load fs_info->zoned from incompat flag
Date: Fri, 22 Jan 2021 15:21:06 +0900
Message-Id: <44c5468ccdca173216967582abde007af9c3cc9e.1611295439.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

Don't set the zoned flag in fs_info when encountering the BTRFS_FEATURE_INCOMPAT_ZONED flag on mount. The zoned flag in fs_info is in a union together with the zone_size, so setting it too early will result in setting an incorrect zone_size as well. Once the correct zone_size is read from the device, we can rely on the zoned flag in fs_info as well to determine if the filesystem is running in zoned mode.

Signed-off-by: Johannes Thumshirn
Reviewed-by: Josef Bacik
---
 fs/btrfs/disk-io.c | 2 --
 fs/btrfs/zoned.c   | 8 ++++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 39cbe10a81b6..76ab86dacc8d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3136,8 +3136,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
 		btrfs_info(fs_info, "has skinny extents");
 
-	fs_info->zoned = (features & BTRFS_FEATURE_INCOMPAT_ZONED);
-
 	/*
 	 * flag our filesystem as having big metadata blocks if
 	 * they are bigger than the page size
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 87172ce7173b..315cd5189781 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -431,6 +431,14 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	fs_info->zone_size = zone_size;
 	fs_info->max_zone_append_size = max_zone_append_size;
 
+	/*
+	 * Check mount options here, because we might change fs_info->zoned
+	 * from fs_info->zone_size.
+	 */
+	ret = btrfs_check_mountopts_zoned(fs_info);
+	if (ret)
+		goto out;
+
 	btrfs_info(fs_info, "zoned mode enabled with zone size %llu", zone_size);
 out:
 	return ret;

From patchwork Fri Jan 22 06:21:07 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038509
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Naohiro Aota
Subject: [PATCH v13 07/42] btrfs: disallow fitrim in ZONED mode
Date: Fri, 22 Jan 2021 15:21:07 +0900
Message-Id: <51f6f258af8d5de433c3437c26a98936a69eea7e.1611295439.git.naohiro.aota@wdc.com>

The implementation of fitrim depends on the space cache, which is not used and is disabled for the zoned btrfs extent allocator, so the current code does not work with zoned btrfs. In the future, we can implement fitrim for zoned btrfs by enabling the space cache (only for fitrim) or by scanning the extent tree at fitrim time. But for now, disallow fitrim in ZONED mode.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/ioctl.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7f2935ea8d3a..f05b0b8b1595 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -527,6 +527,14 @@ static noinline int btrfs_ioctl_fitrim(struct btrfs_fs_info *fs_info,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	/*
+	 * btrfs_trim_block_group() depends on the space cache, which is
+	 * not available in ZONED mode. So, disallow fitrim in ZONED mode
+	 * for now.
+	 */
+	if (btrfs_is_zoned(fs_info))
+		return -EOPNOTSUPP;
+
 	/*
 	 * If the fs is mounted with nologreplay, which requires it to be
 	 * mounted in RO mode as well, we can not allow discard on free space

From patchwork Fri Jan 22 06:21:08 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038363
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Johannes Thumshirn, Naohiro Aota
Subject: [PATCH v13 08/42] btrfs: allow zoned mode on non-zoned block devices
Date: Fri, 22 Jan 2021 15:21:08 +0900
Message-Id: <6764c8d232325868e47ded876af398053e674f50.1611295439.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

Run zoned btrfs mode on non-zoned devices. This is done by "slicing up" the block device into static-sized chunks and faking a conventional zone on each of them. The emulated zone size is determined from the size of a device extent.

This is mainly aimed at testing parts of the zoned mode, i.e. the zoned chunk allocator, on regular block devices.

Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/zoned.c | 149 +++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/zoned.h |  14 +++--
 2 files changed, 147 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 315cd5189781..f0af88d497c7 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -119,6 +119,37 @@ static inline u32 sb_zone_number(int shift, int mirror)
 	return 0;
 }
 
+/*
+ * Emulate blkdev_report_zones() for a non-zoned device. It slices up
+ * the block device into static-sized chunks and fakes a conventional zone
+ * on each of them.
+ */
+static int emulate_report_zones(struct btrfs_device *device, u64 pos,
+				struct blk_zone *zones, unsigned int nr_zones)
+{
+	const sector_t zone_sectors =
+		device->fs_info->zone_size >> SECTOR_SHIFT;
+	sector_t bdev_size = bdev_nr_sectors(device->bdev);
+	unsigned int i;
+
+	pos >>= SECTOR_SHIFT;
+	for (i = 0; i < nr_zones; i++) {
+		zones[i].start = i * zone_sectors + pos;
+		zones[i].len = zone_sectors;
+		zones[i].capacity = zone_sectors;
+		zones[i].wp = zones[i].start + zone_sectors;
+		zones[i].type = BLK_ZONE_TYPE_CONVENTIONAL;
+		zones[i].cond = BLK_ZONE_COND_NOT_WP;
+
+		if (zones[i].wp >= bdev_size) {
+			i++;
+			break;
+		}
+	}
+
+	return i;
+}
+
 static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
 			       struct blk_zone *zones, unsigned int *nr_zones)
 {
@@ -127,6 +158,12 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
 	if (!*nr_zones)
 		return 0;
 
+	if (!bdev_is_zoned(device->bdev)) {
+		ret = emulate_report_zones(device, pos, zones, *nr_zones);
+		*nr_zones = ret;
+		return 0;
+	}
+
 	ret = blkdev_report_zones(device->bdev, pos >> SECTOR_SHIFT, *nr_zones,
 				  copy_zone_info_cb, zones);
 	if (ret < 0) {
@@ -143,6 +180,50 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
 	return 0;
 }
 
+/* The emulated zone size is determined from the size of device extent. */
+static int calculate_emulated_zone_size(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_path *path;
+	struct btrfs_root *root = fs_info->dev_root;
+	struct btrfs_key key;
+	struct extent_buffer *leaf;
+	struct btrfs_dev_extent *dext;
+	int ret = 0;
+
+	key.objectid = 1;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+	key.offset = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+
+	if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
+		ret = btrfs_next_item(root, path);
+		if (ret < 0)
+			goto out;
+		/* No dev extents at all? Not good */
+		if (ret > 0) {
+			ret = -EUCLEAN;
+			goto out;
+		}
+	}
+
+	leaf = path->nodes[0];
+	dext = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_dev_extent);
+	fs_info->zone_size = btrfs_dev_extent_length(leaf, dext);
+	ret = 0;
+
+out:
+	btrfs_free_path(path);
+
+	return ret;
+}
+
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
@@ -169,6 +250,7 @@ int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
 
 int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
+	struct btrfs_fs_info *fs_info = device->fs_info;
 	struct btrfs_zoned_device_info *zone_info = NULL;
 	struct block_device *bdev = device->bdev;
 	struct request_queue *queue = bdev_get_queue(bdev);
@@ -177,9 +259,14 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 	struct blk_zone *zones = NULL;
 	unsigned int i, nreported = 0, nr_zones;
 	unsigned int zone_sectors;
+	char *model, *emulated;
 	int ret;
 
-	if (!bdev_is_zoned(bdev))
+	/*
+	 * Cannot use btrfs_is_zoned here, since fs_info->zone_size might
+	 * not be set yet.
+	 */
+	if (!btrfs_fs_incompat(fs_info, ZONED))
 		return 0;
 
 	if (device->zone_info)
@@ -189,8 +276,20 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 	if (!zone_info)
 		return -ENOMEM;
 
+	if (!bdev_is_zoned(bdev)) {
+		if (!fs_info->zone_size) {
+			ret = calculate_emulated_zone_size(fs_info);
+			if (ret)
+				goto out;
+		}
+
+		ASSERT(fs_info->zone_size);
+		zone_sectors = fs_info->zone_size >> SECTOR_SHIFT;
+	} else {
+		zone_sectors = bdev_zone_sectors(bdev);
+	}
+
 	nr_sectors = bdev_nr_sectors(bdev);
-	zone_sectors = bdev_zone_sectors(bdev);
 	/* Check if it's power of 2 (see is_power_of_2) */
 	ASSERT(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);
 	zone_info->zone_size = zone_sectors << SECTOR_SHIFT;
@@ -296,12 +395,32 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 
 	device->zone_info = zone_info;
 
-	/* device->fs_info is not safe to use for printing messages */
-	btrfs_info_in_rcu(NULL,
-			"host-%s zoned block device %s, %u zones of %llu bytes",
-			bdev_zoned_model(bdev) == BLK_ZONED_HM ? "managed" : "aware",
-			rcu_str_deref(device->name), zone_info->nr_zones,
-			zone_info->zone_size);
+	switch (bdev_zoned_model(bdev)) {
+	case BLK_ZONED_HM:
+		model = "host-managed zoned";
+		emulated = "";
+		break;
+	case BLK_ZONED_HA:
+		model = "host-aware zoned";
+		emulated = "";
+		break;
+	case BLK_ZONED_NONE:
+		model = "regular";
+		emulated = "emulated ";
+		break;
+	default:
+		/* Just in case */
+		btrfs_err_in_rcu(fs_info, "Unsupported zoned model %d on %s",
+				 bdev_zoned_model(bdev),
+				 rcu_str_deref(device->name));
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	btrfs_info_in_rcu(fs_info,
+		"%s block device %s, %u %szones of %llu bytes",
+		model, rcu_str_deref(device->name), zone_info->nr_zones,
+		emulated, zone_info->zone_size);
 
 	return 0;
 
@@ -348,7 +467,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	u64 nr_devices = 0;
 	u64 zone_size = 0;
 	u64 max_zone_append_size = 0;
-	const bool incompat_zoned = btrfs_is_zoned(fs_info);
+	const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED);
 	int ret = 0;
 
 	/* Count zoned devices */
@@ -359,9 +478,17 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 			continue;
 
 		model = bdev_zoned_model(device->bdev);
+		/*
+		 * A Host-Managed zoned device must be used as a zoned
+		 * device. A Host-Aware zoned device and a non-zoned device
+		 * can be treated as a zoned device, if the ZONED flag is
+		 * enabled in the superblock.
+		 */
 		if (model == BLK_ZONED_HM ||
-		    (model == BLK_ZONED_HA && incompat_zoned)) {
-			struct btrfs_zoned_device_info *zone_info;
+		    (model == BLK_ZONED_HA && incompat_zoned) ||
+		    (model == BLK_ZONED_NONE && incompat_zoned)) {
+			struct btrfs_zoned_device_info *zone_info =
+				device->zone_info;
 
 			zone_info = device->zone_info;
 			zoned_devices++;
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 5e0e7de84a82..058a57317c05 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -143,12 +143,16 @@ static inline void btrfs_dev_clear_zone_empty(struct btrfs_device *device, u64 p
 static inline bool btrfs_check_device_zone_type(const struct btrfs_fs_info *fs_info,
 						struct block_device *bdev)
 {
-	u64 zone_size;
-
 	if (btrfs_is_zoned(fs_info)) {
-		zone_size = bdev_zone_sectors(bdev) << SECTOR_SHIFT;
-		/* Do not allow non-zoned device */
-		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
+		/*
+		 * We can allow a regular device on a zoned btrfs, because
+		 * we will emulate a zoned device on the regular device.
+		 */
+		if (!bdev_is_zoned(bdev))
+			return true;
+
+		return fs_info->zone_size ==
+			(bdev_zone_sectors(bdev) << SECTOR_SHIFT);
 	}
 
 	/* Do not allow Host Managed zoned device */

From patchwork Fri Jan 22 06:21:09 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038365
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J.
Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 09/42] btrfs: implement zoned chunk allocator
Date: Fri, 22 Jan 2021 15:21:09 +0900

This commit implements a zoned chunk/dev_extent allocator. The zoned allocator aligns device extents to zone boundaries, so that a zone reset affects only its device extent and does not change the state of blocks in neighboring device extents. It also checks that a region allocation does not overlap any of the super block zones, and ensures the region is empty.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/volumes.c | 169 ++++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/volumes.h |   1 +
 fs/btrfs/zoned.c   | 144 ++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h   |  25 +++++++
 4 files changed, 323 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bb3f341f6a22..27208139d6e2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1414,11 +1414,62 @@ static u64 dev_extent_search_start(struct btrfs_device *device, u64 start)
 		 * make sure to start at an offset of at least 1MB.
 		 */
 		return max_t(u64, start, SZ_1M);
+	case BTRFS_CHUNK_ALLOC_ZONED:
+		/*
+		 * We don't care about the starting region like the regular
+		 * allocator, because we anyway use/reserve the first two
+		 * zones for superblock logging.
+		 */
+		return ALIGN(start, device->zone_info->zone_size);
 	default:
 		BUG();
 	}
 }
 
+static bool dev_extent_hole_check_zoned(struct btrfs_device *device,
+					u64 *hole_start, u64 *hole_size,
+					u64 num_bytes)
+{
+	u64 zone_size = device->zone_info->zone_size;
+	u64 pos;
+	int ret;
+	int changed = 0;
+
+	ASSERT(IS_ALIGNED(*hole_start, zone_size));
+
+	while (*hole_size > 0) {
+		pos = btrfs_find_allocatable_zones(device, *hole_start,
+						   *hole_start + *hole_size,
+						   num_bytes);
+		if (pos != *hole_start) {
+			*hole_size = *hole_start + *hole_size - pos;
+			*hole_start = pos;
+			changed = 1;
+			if (*hole_size < num_bytes)
+				break;
+		}
+
+		ret = btrfs_ensure_empty_zones(device, pos, num_bytes);
+
+		/* Range is ensured to be empty */
+		if (!ret)
+			return changed;
+
+		/* Given hole range was invalid (outside of device) */
+		if (ret == -ERANGE) {
+			*hole_start += *hole_size;
+			*hole_size = 0;
+			return 1;
+		}
+
+		*hole_start += zone_size;
+		*hole_size -= zone_size;
+		changed = 1;
+	}
+
+	return changed;
+}
+
 /**
  * dev_extent_hole_check - check if specified hole is suitable for allocation
  * @device:	the device which we have the hole
@@ -1435,24 +1486,39 @@ static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
 	bool changed = false;
 	u64 hole_end = *hole_start + *hole_size;
 
-	/*
-	 * Check before we set max_hole_start, otherwise we could end up
-	 * sending back this offset anyway.
-	 */
-	if (contains_pending_extent(device, hole_start, *hole_size)) {
-		if (hole_end >= *hole_start)
-			*hole_size = hole_end - *hole_start;
-		else
-			*hole_size = 0;
-		changed = true;
-	}
+	for (;;) {
+		/*
+		 * Check before we set max_hole_start, otherwise we could end up
+		 * sending back this offset anyway.
+		 */
+		if (contains_pending_extent(device, hole_start, *hole_size)) {
+			if (hole_end >= *hole_start)
+				*hole_size = hole_end - *hole_start;
+			else
+				*hole_size = 0;
+			changed = true;
+		}
+
+		switch (device->fs_devices->chunk_alloc_policy) {
+		case BTRFS_CHUNK_ALLOC_REGULAR:
+			/* No extra check */
+			break;
+		case BTRFS_CHUNK_ALLOC_ZONED:
+			if (dev_extent_hole_check_zoned(device, hole_start,
+							hole_size, num_bytes)) {
+				changed = true;
+				/*
+				 * The changed hole can contain pending
+				 * extent. Loop again to check that.
+				 */
+				continue;
+			}
+			break;
+		default:
+			BUG();
+		}
 
-	switch (device->fs_devices->chunk_alloc_policy) {
-	case BTRFS_CHUNK_ALLOC_REGULAR:
-		/* No extra check */
 		break;
-	default:
-		BUG();
 	}
 
 	return changed;
@@ -1505,6 +1571,9 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 
 	search_start = dev_extent_search_start(device, search_start);
 
+	WARN_ON(device->zone_info &&
+		!IS_ALIGNED(num_bytes, device->zone_info->zone_size));
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -4899,6 +4968,37 @@ static void init_alloc_chunk_ctl_policy_regular(
 	ctl->dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes;
 }
 
+static void init_alloc_chunk_ctl_policy_zoned(
+				      struct btrfs_fs_devices *fs_devices,
+				      struct alloc_chunk_ctl *ctl)
+{
+	u64 zone_size = fs_devices->fs_info->zone_size;
+	u64 limit;
+	int min_num_stripes = ctl->devs_min * ctl->dev_stripes;
+	int min_data_stripes = (min_num_stripes - ctl->nparity) / ctl->ncopies;
+	u64 min_chunk_size = min_data_stripes * zone_size;
+	u64 type = ctl->type;
+
+	ctl->max_stripe_size = zone_size;
+	if (type & BTRFS_BLOCK_GROUP_DATA) {
+		ctl->max_chunk_size = round_down(BTRFS_MAX_DATA_CHUNK_SIZE,
+						 zone_size);
+	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
+		ctl->max_chunk_size = ctl->max_stripe_size;
+	} else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
+		ctl->max_chunk_size = 2 * ctl->max_stripe_size;
+		ctl->devs_max = min_t(int, ctl->devs_max,
+				      BTRFS_MAX_DEVS_SYS_CHUNK);
+	}
+
+	/* We don't want a chunk larger than 10% of writable space */
+	limit = max(round_down(div_factor(fs_devices->total_rw_bytes, 1),
+			       zone_size),
+		    min_chunk_size);
+	ctl->max_chunk_size = min(limit, ctl->max_chunk_size);
+	ctl->dev_extent_min = zone_size * ctl->dev_stripes;
+}
+
 static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices,
 				 struct alloc_chunk_ctl *ctl)
 {
@@ -4919,6 +5019,9 @@ static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices,
 	case BTRFS_CHUNK_ALLOC_REGULAR:
 		init_alloc_chunk_ctl_policy_regular(fs_devices, ctl);
 		break;
+	case BTRFS_CHUNK_ALLOC_ZONED:
+		init_alloc_chunk_ctl_policy_zoned(fs_devices, ctl);
+		break;
 	default:
 		BUG();
 	}
@@ -5045,6 +5148,38 @@ static int decide_stripe_size_regular(struct alloc_chunk_ctl *ctl,
 	return 0;
 }
 
+static int decide_stripe_size_zoned(struct alloc_chunk_ctl *ctl,
+				    struct btrfs_device_info *devices_info)
+{
+	u64 zone_size = devices_info[0].dev->zone_info->zone_size;
+	/* Number of stripes that count for block group size */
+	int data_stripes;
+
+	/*
+	 * It should hold because:
+	 * dev_extent_min == dev_extent_want == zone_size * dev_stripes
+	 */
+	ASSERT(devices_info[ctl->ndevs - 1].max_avail == ctl->dev_extent_min);
+
+	ctl->stripe_size = zone_size;
+	ctl->num_stripes = ctl->ndevs * ctl->dev_stripes;
+	data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies;
+
+	/* stripe_size is fixed in ZONED. Reduce ndevs instead. */
+	if (ctl->stripe_size * data_stripes > ctl->max_chunk_size) {
+		ctl->ndevs = div_u64(div_u64(ctl->max_chunk_size * ctl->ncopies,
+					     ctl->stripe_size) + ctl->nparity,
+				     ctl->dev_stripes);
+		ctl->num_stripes = ctl->ndevs * ctl->dev_stripes;
+		data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies;
+		ASSERT(ctl->stripe_size * data_stripes <= ctl->max_chunk_size);
+	}
+
+	ctl->chunk_size = ctl->stripe_size * data_stripes;
+
+	return 0;
+}
+
 static int decide_stripe_size(struct btrfs_fs_devices *fs_devices,
 			      struct alloc_chunk_ctl *ctl,
 			      struct btrfs_device_info *devices_info)
@@ -5072,6 +5207,8 @@ static int decide_stripe_size(struct btrfs_fs_devices *fs_devices,
 	switch (fs_devices->chunk_alloc_policy) {
 	case BTRFS_CHUNK_ALLOC_REGULAR:
 		return decide_stripe_size_regular(ctl, devices_info);
+	case BTRFS_CHUNK_ALLOC_ZONED:
+		return decide_stripe_size_zoned(ctl, devices_info);
 	default:
 		BUG();
 	}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1997a4649a66..98a447badd6a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -213,6 +213,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
 enum btrfs_chunk_allocation_policy {
 	BTRFS_CHUNK_ALLOC_REGULAR,
+	BTRFS_CHUNK_ALLOC_ZONED,
 };
 
 /*
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index f0af88d497c7..e829fa2df8ac 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1,11 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include <linux/sizes.h>
 #include <linux/blkdev.h>
 #include "ctree.h"
 #include "volumes.h"
 #include "zoned.h"
 #include "rcu-string.h"
+#include "disk-io.h"
 
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -557,6 +559,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 
 	fs_info->zone_size = zone_size;
 	fs_info->max_zone_append_size = max_zone_append_size;
+	fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED;
 
 	/*
 	 * Check mount options here, because we might change fs_info->zoned
@@ -779,3 +782,144 @@ int btrfs_reset_sb_log_zones(struct
block_device *bdev, int mirror) sb_zone << zone_sectors_shift, zone_sectors * BTRFS_NR_SB_LOG_ZONES, GFP_NOFS); } + +/* + * btrfs_find_allocatable_zones - find allocatable zones within a given region + * @device: the device to allocate a region on + * @hole_start: the position of the hole to allocate the region + * @hole_end: the end of the hole + * @num_bytes: the size of the wanted region + * @return: position of allocatable zones + * + * The allocatable region must not contain any superblock locations. + */ +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u8 shift = zinfo->zone_size_shift; + u64 nzones = num_bytes >> shift; + u64 pos = hole_start; + u64 begin, end; + bool have_sb; + int i; + + ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); + + while (pos < hole_end) { + begin = pos >> shift; + end = begin + nzones; + + if (end > zinfo->nr_zones) + return hole_end; + + /* Check if zones in the region are all empty */ + if (btrfs_dev_is_sequential(device, pos) && + find_next_zero_bit(zinfo->empty_zones, end, begin) != end) { + pos += zinfo->zone_size; + continue; + } + + have_sb = false; + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + u32 sb_zone; + u64 sb_pos; + + sb_zone = sb_zone_number(shift, i); + if (!(end <= sb_zone || + sb_zone + BTRFS_NR_SB_LOG_ZONES <= begin)) { + have_sb = true; + pos = ((u64)sb_zone + BTRFS_NR_SB_LOG_ZONES) << shift; + break; + } + + /* + * We also need to exclude regular superblock + * positions + */ + sb_pos = btrfs_sb_offset(i); + if (!(pos + num_bytes <= sb_pos || + sb_pos + BTRFS_SUPER_INFO_SIZE <= pos)) { + have_sb = true; + pos = ALIGN(sb_pos + BTRFS_SUPER_INFO_SIZE, + zinfo->zone_size); + break; + } + } + if (!have_sb) + break; + } + + return pos; +} + +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes) +{ + int
ret; + + *bytes = 0; + ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_RESET, + physical >> SECTOR_SHIFT, length >> SECTOR_SHIFT, + GFP_NOFS); + if (ret) + return ret; + + *bytes = length; + while (length) { + btrfs_dev_set_zone_empty(device, physical); + physical += device->zone_info->zone_size; + length -= device->zone_info->zone_size; + } + + return 0; +} + +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u8 shift = zinfo->zone_size_shift; + unsigned long begin = start >> shift; + unsigned long end = (start + size) >> shift; + u64 pos; + int ret; + + ASSERT(IS_ALIGNED(start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(size, zinfo->zone_size)); + + if (end > zinfo->nr_zones) + return -ERANGE; + + /* All the zones are conventional */ + if (find_next_bit(zinfo->seq_zones, end, begin) == end) + return 0; + + /* All the zones are sequential and empty */ + if (find_next_zero_bit(zinfo->seq_zones, end, begin) == end && + find_next_zero_bit(zinfo->empty_zones, end, begin) == end) + return 0; + + for (pos = start; pos < start + size; pos += zinfo->zone_size) { + u64 reset_bytes; + + if (!btrfs_dev_is_sequential(device, pos) || + btrfs_dev_is_empty_zone(device, pos)) + continue; + + /* Free regions should be empty */ + btrfs_warn_in_rcu( + device->fs_info, + "zoned: resetting device %s (devid %llu) zone %llu for allocation", + rcu_str_deref(device->name), device->devid, + pos >> shift); + WARN_ON_ONCE(1); + + ret = btrfs_reset_device_zone(device, pos, zinfo->zone_size, + &reset_bytes); + if (ret) + return ret; + } + + return 0; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 058a57317c05..de5901f5ae66 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -36,6 +36,11 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw, u64 *bytenr_ret); void btrfs_advance_sb_log(struct btrfs_device *device, int mirror); int btrfs_reset_sb_log_zones(struct
block_device *bdev, int mirror); +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes); +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes); +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -92,6 +97,26 @@ static inline int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror return 0; } +static inline u64 btrfs_find_allocatable_zones(struct btrfs_device *device, + u64 hole_start, u64 hole_end, + u64 num_bytes) +{ + return hole_start; +} + +static inline int btrfs_reset_device_zone(struct btrfs_device *device, + u64 physical, u64 length, u64 *bytes) +{ + *bytes = 0; + return 0; +} + +static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, + u64 start, u64 size) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Fri Jan 22 06:21:10 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038367
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Anand Jain , Josef Bacik Subject: [PATCH v13 10/42] btrfs: verify device extent is aligned to zone Date: Fri, 22 Jan 2021 15:21:10 +0900 Message-Id: <2ffd3693e8d81d619345f9cc31070cd98191c84d.1611295439.git.naohiro.aota@wdc.com> Add a check in verify_one_dev_extent() that a device extent on a zoned block device is aligned to the respective zone boundary.
Signed-off-by: Naohiro Aota Reviewed-by: Anand Jain Reviewed-by: Josef Bacik --- fs/btrfs/volumes.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 27208139d6e2..2d52330f26b5 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7776,6 +7776,20 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, ret = -EUCLEAN; goto out; } + + if (dev->zone_info) { + u64 zone_size = dev->zone_info->zone_size; + + if (!IS_ALIGNED(physical_offset, zone_size) || + !IS_ALIGNED(physical_len, zone_size)) { + btrfs_err(fs_info, +"zoned: dev extent devid %llu physical offset %llu len %llu is not aligned to device zone", + devid, physical_offset, physical_len); + ret = -EUCLEAN; + goto out; + } + } + out: free_extent_map(em); return ret; From patchwork Fri Jan 22 06:21:11 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038369
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik , Anand Jain Subject: [PATCH v13 11/42] btrfs: load zone's allocation offset Date: Fri, 22 Jan 2021 15:21:11 +0900 Message-Id: <18b4014ae7ff556ccc0d2287ea9e68c08dd84643.1611295439.git.naohiro.aota@wdc.com> Zoned btrfs must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. This commit adds "alloc_offset" to track that logical address, populated in btrfs_load_block_group_zone_info() from the write pointers of the corresponding zones. For now, zoned btrfs supports only the SINGLE profile. Supporting non-SINGLE profiles with zone append writing is not trivial. For example, in the DUP profile, we send a zone append write IO to two zones on a device, and the device replies with the written LBA for each IO. If the offsets of the returned addresses from the beginning of their zones differ, the IOs end up at different logical addresses. Supporting such diverging physical addresses would require a fine-grained logical-to-physical mapping, and hence an additional metadata type, so non-SINGLE profiles are disabled for now. This commit handles the case where all the zones in a block group are sequential. The next patch will handle the case of a block group containing a conventional zone.
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik Reviewed-by: Anand Jain --- fs/btrfs/block-group.c | 15 +++++ fs/btrfs/block-group.h | 6 ++ fs/btrfs/zoned.c | 150 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 7 ++ 4 files changed, 178 insertions(+) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 60d843f341aa..1c5ed46d376c 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -15,6 +15,7 @@ #include "delalloc-space.h" #include "discard.h" #include "raid56.h" +#include "zoned.h" /* * Return target flags in extended format or 0 if restripe for this chunk_type @@ -1842,6 +1843,13 @@ static int read_one_block_group(struct btrfs_fs_info *info, goto error; } + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_err(info, "zoned: failed to load zone info of bg %llu", + cache->start); + goto error; + } + /* * We need to exclude the super stripes now so that the space info has * super bytes accounted for, otherwise we'll think we have more space @@ -2129,6 +2137,13 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, cache->cached = BTRFS_CACHE_FINISHED; if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) cache->needs_free_space = 1; + + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_put_block_group(cache); + return ret; + } + ret = exclude_super_stripes(cache); if (ret) { /* We may have excluded something, so call this just in case */ diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 8f74a96074f7..9d026ab1768d 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -183,6 +183,12 @@ struct btrfs_block_group { /* Record locked full stripes for RAID5/6 block group */ struct btrfs_full_stripe_locks_tree full_stripe_locks_root; + + /* + * Allocation offset for the block group to implement sequential + * allocation. This is used only with ZONED mode enabled. 
+ */ + u64 alloc_offset; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index e829fa2df8ac..22c0665ee816 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -3,14 +3,20 @@ #include #include #include +#include #include "ctree.h" #include "volumes.h" #include "zoned.h" #include "rcu-string.h" #include "disk-io.h" +#include "block-group.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 +/* Invalid allocation pointer value for missing devices */ +#define WP_MISSING_DEV ((u64)-1) +/* Pseudo write pointer value for conventional zone */ +#define WP_CONVENTIONAL ((u64)-2) /* Number of superblock log zones */ #define BTRFS_NR_SB_LOG_ZONES 2 @@ -923,3 +929,147 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) return 0; } + +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct extent_map_tree *em_tree = &fs_info->mapping_tree; + struct extent_map *em; + struct map_lookup *map; + struct btrfs_device *device; + u64 logical = cache->start; + u64 length = cache->length; + u64 physical = 0; + int ret; + int i; + unsigned int nofs_flag; + u64 *alloc_offsets = NULL; + u32 num_sequential = 0, num_conventional = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + /* Sanity check */ + if (!IS_ALIGNED(length, fs_info->zone_size)) { + btrfs_err(fs_info, "zoned: block group %llu len %llu unaligned to zone size %llu", + logical, length, fs_info->zone_size); + return -EIO; + } + + /* Get the chunk mapping */ + read_lock(&em_tree->lock); + em = lookup_extent_mapping(em_tree, logical, length); + read_unlock(&em_tree->lock); + + if (!em) + return -EINVAL; + + map = em->map_lookup; + + alloc_offsets = kcalloc(map->num_stripes, sizeof(*alloc_offsets), + GFP_NOFS); + if (!alloc_offsets) { + free_extent_map(em); + return -ENOMEM; + } + + for 
(i = 0; i < map->num_stripes; i++) { + bool is_sequential; + struct blk_zone zone; + + device = map->stripes[i].dev; + physical = map->stripes[i].physical; + + if (device->bdev == NULL) { + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } + + is_sequential = btrfs_dev_is_sequential(device, physical); + if (is_sequential) + num_sequential++; + else + num_conventional++; + + if (!is_sequential) { + alloc_offsets[i] = WP_CONVENTIONAL; + continue; + } + + /* + * This zone will be used for allocation, so mark this + * zone non-empty. + */ + btrfs_dev_clear_zone_empty(device, physical); + + /* + * The group is mapped to a sequential zone. Get the zone write + * pointer to determine the allocation offset within the zone. + */ + WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size)); + nofs_flag = memalloc_nofs_save(); + ret = btrfs_get_dev_zone(device, physical, &zone); + memalloc_nofs_restore(nofs_flag); + if (ret == -EIO || ret == -EOPNOTSUPP) { + ret = 0; + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } else if (ret) { + goto out; + } + + switch (zone.cond) { + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + btrfs_err(fs_info, "zoned: offline/readonly zone %llu on device %s (devid %llu)", + physical >> device->zone_info->zone_size_shift, + rcu_str_deref(device->name), device->devid); + alloc_offsets[i] = WP_MISSING_DEV; + break; + case BLK_ZONE_COND_EMPTY: + alloc_offsets[i] = 0; + break; + case BLK_ZONE_COND_FULL: + alloc_offsets[i] = fs_info->zone_size; + break; + default: + /* Partially used zone */ + alloc_offsets[i] = + ((zone.wp - zone.start) << SECTOR_SHIFT); + break; + } + } + + if (num_conventional > 0) { + /* + * Since conventional zones do not have a write pointer, we + * cannot determine alloc_offset from the pointer + */ + ret = -EINVAL; + goto out; + } + + switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + cache->alloc_offset = alloc_offsets[0]; + break; + case BTRFS_BLOCK_GROUP_DUP: + case 
BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID0: + case BTRFS_BLOCK_GROUP_RAID10: + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID6: + /* non-SINGLE profiles are not supported yet */ + default: + btrfs_err(fs_info, "zoned: profile %s not supported", + btrfs_bg_type_to_raid_name(map->type)); + ret = -EINVAL; + goto out; + } + +out: + kfree(alloc_offsets); + free_extent_map(em); + + return ret; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index de5901f5ae66..491b98c97f48 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -41,6 +41,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -117,6 +118,12 @@ static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, return 0; } + +static inline int btrfs_load_block_group_zone_info( + struct btrfs_block_group *cache) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Fri Jan 22 06:21:12 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038443
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota Subject: [PATCH v13 12/42] btrfs: calculate allocation offset for conventional zones Date: Fri, 22 Jan 2021 15:21:12 +0900 Message-Id: <617bb7d3a62aa5702bbf31f47ec67fbc56576b30.1611295439.git.naohiro.aota@wdc.com> Conventional zones do not have a write pointer, so we cannot use one to determine the allocation offset when a block group contains a conventional zone. Instead, we can take the end of the last allocated extent in the block group as the allocation offset. For a new block group, we cannot calculate the allocation offset by consulting the extent tree, because taking an extent buffer lock after the chunk mutex (which is already held in btrfs_make_block_group()) could deadlock. Since it is a new block group, we can simply set the allocation offset to 0 anyway.
Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 4 +- fs/btrfs/zoned.c | 99 +++++++++++++++++++++++++++++++++++++++--- fs/btrfs/zoned.h | 4 +- 3 files changed, 98 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 1c5ed46d376c..7c210aa5f25f 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1843,7 +1843,7 @@ static int read_one_block_group(struct btrfs_fs_info *info, goto error; } - ret = btrfs_load_block_group_zone_info(cache); + ret = btrfs_load_block_group_zone_info(cache, false); if (ret) { btrfs_err(info, "zoned: failed to load zone info of bg %llu", cache->start); @@ -2138,7 +2138,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) cache->needs_free_space = 1; - ret = btrfs_load_block_group_zone_info(cache); + ret = btrfs_load_block_group_zone_info(cache, true); if (ret) { btrfs_put_block_group(cache); return ret; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 22c0665ee816..1b85a18d8573 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -930,7 +930,68 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) return 0; } -int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) +/* + * Calculate an allocation pointer from the extent allocation information + * for a block group consisting of conventional zones. The pointer is set + * to the end of the last allocated extent in the block group, taken as + * the allocation offset.
+ */ +static int calculate_alloc_pointer(struct btrfs_block_group *cache, + u64 *offset_ret) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct btrfs_root *root = fs_info->extent_root; + struct btrfs_path *path; + struct btrfs_key key; + struct btrfs_key found_key; + int ret; + u64 length; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + key.objectid = cache->start + cache->length; + key.type = 0; + key.offset = 0; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + /* We should not find the exact match */ + if (!ret) + ret = -EUCLEAN; + if (ret < 0) + goto out; + + ret = btrfs_previous_extent_item(root, path, cache->start); + if (ret) { + if (ret == 1) { + ret = 0; + *offset_ret = 0; + } + goto out; + } + + btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]); + + if (found_key.type == BTRFS_EXTENT_ITEM_KEY) + length = found_key.offset; + else + length = fs_info->nodesize; + + if (!(found_key.objectid >= cache->start && + found_key.objectid + length <= cache->start + cache->length)) { + ret = -EUCLEAN; + goto out; + } + *offset_ret = found_key.objectid + length - cache->start; + ret = 0; + +out: + btrfs_free_path(path); + return ret; +} + +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) { struct btrfs_fs_info *fs_info = cache->fs_info; struct extent_map_tree *em_tree = &fs_info->mapping_tree; @@ -944,6 +1005,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) int i; unsigned int nofs_flag; u64 *alloc_offsets = NULL; + u64 last_alloc = 0; u32 num_sequential = 0, num_conventional = 0; if (!btrfs_is_zoned(fs_info)) @@ -1042,11 +1104,30 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) if (num_conventional > 0) { /* - * Since conventional zones do not have a write pointer, we - * cannot determine alloc_offset from the pointer + * Avoid calling calculate_alloc_pointer() for a new BG. It + * is of no use for a new BG, whose alloc_offset must always be 0.
+ * + * Also, we have a lock chain of extent buffer lock -> + * chunk mutex. For new BG, this function is called from + * btrfs_make_block_group() which is already taking the + * chunk mutex. Thus, we cannot call + * calculate_alloc_pointer() which takes extent buffer + * locks to avoid deadlock. */ - ret = -EINVAL; - goto out; + if (new) { + cache->alloc_offset = 0; + goto out; + } + ret = calculate_alloc_pointer(cache, &last_alloc); + if (ret || map->num_stripes == num_conventional) { + if (!ret) + cache->alloc_offset = last_alloc; + else + btrfs_err(fs_info, + "zoned: failed to determine allocation offset of bg %llu", + cache->start); + goto out; + } } switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { @@ -1068,6 +1149,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) } out: + /* An extent is allocated after the write pointer */ + if (num_conventional && last_alloc > cache->alloc_offset) { + btrfs_err(fs_info, + "zoned: got wrong write pointer in BG %llu: %llu > %llu", + logical, last_alloc, cache->alloc_offset); + ret = -EIO; + } + kfree(alloc_offsets); free_extent_map(em); diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 491b98c97f48..b53403ba0b10 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -41,7 +41,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); -int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -119,7 +119,7 @@ static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, } static inline int btrfs_load_block_group_zone_info( - struct btrfs_block_group *cache) + 
struct btrfs_block_group *cache, bool new) { return 0; } From patchwork Fri Jan 22 06:21:13 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038511
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota Subject: [PATCH v13 13/42] btrfs: track unusable bytes for zones Date: Fri, 22 Jan 2021 15:21:13 +0900 In zoned btrfs, a region that was once written and then freed is not usable again until the underlying zone is reset. So we need to distinguish such unusable space from usable free space. Therefore this commit introduces the "zone_unusable" field in the block group structure, and "bytes_zone_unusable" in the space_info structure, to track the unusable space. Pinned bytes are always reclaimed to the unusable space. But when an allocated region is returned before being used (e.g., because the block group becomes read-only between allocation time and reservation time), we can safely return the region to the block group. For this situation, this commit introduces btrfs_add_free_space_unused(). It behaves the same as btrfs_add_free_space() on regular btrfs; on zoned btrfs, it rewinds the allocation offset.
Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 39 ++++++++++++++------- fs/btrfs/block-group.h | 1 + fs/btrfs/extent-tree.c | 10 +++++- fs/btrfs/free-space-cache.c | 67 +++++++++++++++++++++++++++++++++++++ fs/btrfs/free-space-cache.h | 2 ++ fs/btrfs/space-info.c | 13 ++++--- fs/btrfs/space-info.h | 4 ++- fs/btrfs/sysfs.c | 2 ++ fs/btrfs/zoned.c | 24 +++++++++++++ fs/btrfs/zoned.h | 3 ++ 10 files changed, 146 insertions(+), 19 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 7c210aa5f25f..487511e3f000 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1001,12 +1001,17 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, WARN_ON(block_group->space_info->total_bytes < block_group->length); WARN_ON(block_group->space_info->bytes_readonly - < block_group->length); + < block_group->length - block_group->zone_unusable); + WARN_ON(block_group->space_info->bytes_zone_unusable + < block_group->zone_unusable); WARN_ON(block_group->space_info->disk_total < block_group->length * factor); } block_group->space_info->total_bytes -= block_group->length; - block_group->space_info->bytes_readonly -= block_group->length; + block_group->space_info->bytes_readonly -= + (block_group->length - block_group->zone_unusable); + block_group->space_info->bytes_zone_unusable -= + block_group->zone_unusable; block_group->space_info->disk_total -= block_group->length * factor; spin_unlock(&block_group->space_info->lock); @@ -1150,7 +1155,7 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force) } num_bytes = cache->length - cache->reserved - cache->pinned - - cache->bytes_super - cache->used; + cache->bytes_super - cache->zone_unusable - cache->used; /* * Data never overcommits, even in mixed mode, so do just the straight @@ -1863,12 +1868,20 @@ static int read_one_block_group(struct btrfs_fs_info *info, } /* - * Check for two cases, either we are full, and therefore don't need - * to bother with the 
caching work since we won't find any space, or we - * are empty, and we can just add all the space in and be done with it. - * This saves us _a_lot_ of time, particularly in the full case. + * For zoned btrfs, space after the allocation offset is the only + * free space for a block group. So, we don't need any caching + * work. btrfs_calc_zone_unusable() will set the amount of free + * space and zone_unusable space. + * + * For regular btrfs, check for two cases, either we are full, and + * therefore don't need to bother with the caching work since we + * won't find any space, or we are empty, and we can just add all + * the space in and be done with it. This saves us _a_lot_ of + * time, particularly in the full case. */ - if (cache->length == cache->used) { + if (btrfs_is_zoned(info)) { + btrfs_calc_zone_unusable(cache); + } else if (cache->length == cache->used) { cache->last_byte_to_unpin = (u64)-1; cache->cached = BTRFS_CACHE_FINISHED; btrfs_free_excluded_extents(cache); @@ -1887,7 +1900,8 @@ static int read_one_block_group(struct btrfs_fs_info *info, } trace_btrfs_add_block_group(info, cache, 0); btrfs_update_space_info(info, cache->flags, cache->length, - cache->used, cache->bytes_super, &space_info); + cache->used, cache->bytes_super, + cache->zone_unusable, &space_info); cache->space_info = space_info; @@ -1943,7 +1957,7 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info) break; } btrfs_update_space_info(fs_info, bg->flags, em->len, em->len, - 0, &space_info); + 0, 0, &space_info); bg->space_info = space_info; link_block_group(bg); @@ -2185,7 +2199,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, */ trace_btrfs_add_block_group(fs_info, cache, 1); btrfs_update_space_info(fs_info, cache->flags, size, bytes_used, - cache->bytes_super, &cache->space_info); + cache->bytes_super, 0, &cache->space_info); btrfs_update_global_block_rsv(fs_info); link_block_group(cache); @@ -2293,7 +2307,8 @@ void 
btrfs_dec_block_group_ro(struct btrfs_block_group *cache) spin_lock(&cache->lock); if (!--cache->ro) { num_bytes = cache->length - cache->reserved - - cache->pinned - cache->bytes_super - cache->used; + cache->pinned - cache->bytes_super - + cache->zone_unusable - cache->used; sinfo->bytes_readonly -= num_bytes; list_del_init(&cache->ro_list); } diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 9d026ab1768d..0f3c62c561bc 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -189,6 +189,7 @@ struct btrfs_block_group { * allocation. This is used only with ZONED mode enabled. */ u64 alloc_offset; + u64 zone_unusable; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 30b1a630dc2f..071a521927e6 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -34,6 +34,7 @@ #include "block-group.h" #include "discard.h" #include "rcu-string.h" +#include "zoned.h" #undef SCRAMBLE_DELAYED_REFS @@ -2725,6 +2726,9 @@ fetch_cluster_info(struct btrfs_fs_info *fs_info, { struct btrfs_free_cluster *ret = NULL; + if (btrfs_is_zoned(fs_info)) + return NULL; + *empty_cluster = 0; if (btrfs_mixed_space_info(space_info)) return ret; @@ -2808,7 +2812,11 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info, space_info->max_extent_size = 0; percpu_counter_add_batch(&space_info->total_bytes_pinned, -len, BTRFS_TOTAL_BYTES_PINNED_BATCH); - if (cache->ro) { + if (btrfs_is_zoned(fs_info)) { + /* Need reset before reusing in a zoned block group */ + space_info->bytes_zone_unusable += len; + readonly = true; + } else if (cache->ro) { space_info->bytes_readonly += len; readonly = true; } diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index fd6ddd6b8165..8975a3a1ba49 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2465,6 +2465,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, int ret = 0; 
u64 filter_bytes = bytes; + ASSERT(!btrfs_is_zoned(fs_info)); + info = kmem_cache_zalloc(btrfs_free_space_cachep, GFP_NOFS); if (!info) return -ENOMEM; @@ -2522,11 +2524,49 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, return ret; } +static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, + u64 bytenr, u64 size, bool used) +{ + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 offset = bytenr - block_group->start; + u64 to_free, to_unusable; + + spin_lock(&ctl->tree_lock); + if (!used) + to_free = size; + else if (offset >= block_group->alloc_offset) + to_free = size; + else if (offset + size <= block_group->alloc_offset) + to_free = 0; + else + to_free = offset + size - block_group->alloc_offset; + to_unusable = size - to_free; + + ctl->free_space += to_free; + block_group->zone_unusable += to_unusable; + spin_unlock(&ctl->tree_lock); + if (!used) { + spin_lock(&block_group->lock); + block_group->alloc_offset -= size; + spin_unlock(&block_group->lock); + } + + /* All the region is now unusable. 
Mark it as unused and reclaim */ + if (block_group->zone_unusable == block_group->length) + btrfs_mark_bg_unused(block_group); + + return 0; +} + int btrfs_add_free_space(struct btrfs_block_group *block_group, u64 bytenr, u64 size) { enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + true); + if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC)) trim_state = BTRFS_TRIM_STATE_TRIMMED; @@ -2535,6 +2575,16 @@ int btrfs_add_free_space(struct btrfs_block_group *block_group, bytenr, size, trim_state); } +int btrfs_add_free_space_unused(struct btrfs_block_group *block_group, + u64 bytenr, u64 size) +{ + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + false); + + return btrfs_add_free_space(block_group, bytenr, size); +} + /* * This is a subtle distinction because when adding free space back in general, * we want it to be added as untrimmed for async. But in the case where we add @@ -2545,6 +2595,10 @@ int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group, { enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + true); + if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC) || btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) trim_state = BTRFS_TRIM_STATE_TRIMMED; @@ -2562,6 +2616,9 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group, int ret; bool re_search = false; + if (btrfs_is_zoned(block_group->fs_info)) + return 0; + spin_lock(&ctl->tree_lock); again: @@ -2656,6 +2713,16 @@ void btrfs_dump_free_space(struct btrfs_block_group *block_group, struct rb_node *n; int count = 0; + /* + * Zoned btrfs does not use free space tree and cluster. Just print + * out the free space after the allocation offset. 
+ */ + if (btrfs_is_zoned(fs_info)) { + btrfs_info(fs_info, "free space %llu", + block_group->length - block_group->alloc_offset); + return; + } + spin_lock(&ctl->tree_lock); for (n = rb_first(&ctl->free_space_offset); n; n = rb_next(n)) { info = rb_entry(n, struct btrfs_free_space, offset_index); diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index ecb09a02d544..1f23088d43f9 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -107,6 +107,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, enum btrfs_trim_state trim_state); int btrfs_add_free_space(struct btrfs_block_group *block_group, u64 bytenr, u64 size); +int btrfs_add_free_space_unused(struct btrfs_block_group *block_group, + u64 bytenr, u64 size); int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group, u64 bytenr, u64 size); int btrfs_remove_free_space(struct btrfs_block_group *block_group, diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index 84fb94e78a8f..d006fca277ef 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -163,6 +163,7 @@ u64 __pure btrfs_space_info_used(struct btrfs_space_info *s_info, ASSERT(s_info); return s_info->bytes_used + s_info->bytes_reserved + s_info->bytes_pinned + s_info->bytes_readonly + + s_info->bytes_zone_unusable + (may_use_included ? 
s_info->bytes_may_use : 0); } @@ -257,7 +258,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info) void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info) { struct btrfs_space_info *found; @@ -273,6 +274,7 @@ void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, found->bytes_used += bytes_used; found->disk_used += bytes_used * factor; found->bytes_readonly += bytes_readonly; + found->bytes_zone_unusable += bytes_zone_unusable; if (total_bytes > 0) found->full = 0; btrfs_try_granting_tickets(info, found); @@ -422,10 +424,10 @@ static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info, info->total_bytes - btrfs_space_info_used(info, true), info->full ? "" : "not "); btrfs_info(fs_info, - "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu", + "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu zone_unusable=%llu", info->total_bytes, info->bytes_used, info->bytes_pinned, info->bytes_reserved, info->bytes_may_use, - info->bytes_readonly); + info->bytes_readonly, info->bytes_zone_unusable); DUMP_BLOCK_RSV(fs_info, global_block_rsv); DUMP_BLOCK_RSV(fs_info, trans_block_rsv); @@ -454,9 +456,10 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info, list_for_each_entry(cache, &info->block_groups[index], list) { spin_lock(&cache->lock); btrfs_info(fs_info, - "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %s", + "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %llu zone_unusable %s", cache->start, cache->length, cache->used, cache->pinned, - cache->reserved, cache->ro ? "[readonly]" : ""); + cache->reserved, cache->zone_unusable, + cache->ro ? 
"[readonly]" : ""); spin_unlock(&cache->lock); btrfs_dump_free_space(cache, bytes); } diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h index 5646393b928c..ee003ffba956 100644 --- a/fs/btrfs/space-info.h +++ b/fs/btrfs/space-info.h @@ -17,6 +17,8 @@ struct btrfs_space_info { u64 bytes_may_use; /* number of bytes that may be used for delalloc/allocations */ u64 bytes_readonly; /* total bytes that are read only */ + u64 bytes_zone_unusable; /* total bytes that are unusable until + resetting the device zone */ u64 max_extent_size; /* This will hold the maximum extent size of the space info if we had an ENOSPC in the @@ -119,7 +121,7 @@ DECLARE_SPACE_INFO_UPDATE(bytes_pinned, "pinned"); int btrfs_init_space_info(struct btrfs_fs_info *fs_info); void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info); struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info, u64 flags); diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 19b9fffa2c9c..6eb1c50fa98c 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -666,6 +666,7 @@ SPACE_INFO_ATTR(bytes_pinned); SPACE_INFO_ATTR(bytes_reserved); SPACE_INFO_ATTR(bytes_may_use); SPACE_INFO_ATTR(bytes_readonly); +SPACE_INFO_ATTR(bytes_zone_unusable); SPACE_INFO_ATTR(disk_used); SPACE_INFO_ATTR(disk_total); BTRFS_ATTR(space_info, total_bytes_pinned, @@ -679,6 +680,7 @@ static struct attribute *space_info_attrs[] = { BTRFS_ATTR_PTR(space_info, bytes_reserved), BTRFS_ATTR_PTR(space_info, bytes_may_use), BTRFS_ATTR_PTR(space_info, bytes_readonly), + BTRFS_ATTR_PTR(space_info, bytes_zone_unusable), BTRFS_ATTR_PTR(space_info, disk_used), BTRFS_ATTR_PTR(space_info, disk_total), BTRFS_ATTR_PTR(space_info, total_bytes_pinned), diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 1b85a18d8573..c5100c982f41 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ 
-1162,3 +1162,27 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) return ret; } + +void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) +{ + u64 unusable, free; + + if (!btrfs_is_zoned(cache->fs_info)) + return; + + WARN_ON(cache->bytes_super != 0); + unusable = cache->alloc_offset - cache->used; + free = cache->length - cache->alloc_offset; + + /* We only need ->free_space in ALLOC_SEQ BGs */ + cache->last_byte_to_unpin = (u64)-1; + cache->cached = BTRFS_CACHE_FINISHED; + cache->free_space_ctl->free_space = free; + cache->zone_unusable = unusable; + + /* + * Should not have any excluded extents. Just + * in case, though. + */ + btrfs_free_excluded_extents(cache); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index b53403ba0b10..0cc0b27e9437 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -42,6 +42,7 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new); +void btrfs_calc_zone_unusable(struct btrfs_block_group *cache); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -124,6 +125,8 @@ static inline int btrfs_load_block_group_zone_info( return 0; } +static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Fri Jan 22 06:21:14 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038503
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v13 14/42] btrfs: do sequential extent allocation in ZONED mode Date: Fri, 22 Jan 2021 15:21:14 +0900 This commit implements a sequential extent allocator for ZONED mode. This allocator only needs to check whether there is enough space in the block group, so it never manages bitmaps or clusters. Also add ASSERTs to the corresponding functions. Actually, with zone append writing, it is unnecessary to track the allocation offset. It only needs to check space availability.
But, by tracking the offset and returning the offset as an allocated region, we can skip modification of ordered extents and checksum information when there is no IO reordering. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 4 ++ fs/btrfs/extent-tree.c | 92 ++++++++++++++++++++++++++++++++++--- fs/btrfs/free-space-cache.c | 6 +++ 3 files changed, 96 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 487511e3f000..d4c336e470dc 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -717,6 +717,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only struct btrfs_caching_control *caching_ctl = NULL; int ret = 0; + /* Allocator for ZONED btrfs does not use the cache at all */ + if (btrfs_is_zoned(fs_info)) + return 0; + caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS); if (!caching_ctl) return -ENOMEM; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 071a521927e6..00eb42c7f09c 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3522,6 +3522,7 @@ btrfs_release_block_group(struct btrfs_block_group *cache, enum btrfs_extent_allocation_policy { BTRFS_EXTENT_ALLOC_CLUSTERED, + BTRFS_EXTENT_ALLOC_ZONED, }; /* @@ -3774,6 +3775,65 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group, return find_free_extent_unclustered(block_group, ffe_ctl); } +/* + * Simple allocator for sequential only block group. It only allows + * sequential allocation. No need to play with trees. This function + * also reserves the bytes as in btrfs_add_reserved_bytes. 
+ */ +static int do_allocation_zoned(struct btrfs_block_group *block_group, + struct find_free_extent_ctl *ffe_ctl, + struct btrfs_block_group **bg_ret) +{ + struct btrfs_space_info *space_info = block_group->space_info; + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 start = block_group->start; + u64 num_bytes = ffe_ctl->num_bytes; + u64 avail; + int ret = 0; + + ASSERT(btrfs_is_zoned(block_group->fs_info)); + + spin_lock(&space_info->lock); + spin_lock(&block_group->lock); + + if (block_group->ro) { + ret = 1; + goto out; + } + + avail = block_group->length - block_group->alloc_offset; + if (avail < num_bytes) { + if (ffe_ctl->max_extent_size < avail) { + /* + * With sequential allocator, free space is always + * contiguous. + */ + ffe_ctl->max_extent_size = avail; + ffe_ctl->total_free_space = avail; + } + ret = 1; + goto out; + } + + ffe_ctl->found_offset = start + block_group->alloc_offset; + block_group->alloc_offset += num_bytes; + spin_lock(&ctl->tree_lock); + ctl->free_space -= num_bytes; + spin_unlock(&ctl->tree_lock); + + /* + * We do not check if found_offset is aligned to stripesize. The + * address is anyway rewritten when using zone append writing. 
+ */ + + ffe_ctl->search_start = ffe_ctl->found_offset; + +out: + spin_unlock(&block_group->lock); + spin_unlock(&space_info->lock); + return ret; +} + static int do_allocation(struct btrfs_block_group *block_group, struct find_free_extent_ctl *ffe_ctl, struct btrfs_block_group **bg_ret) @@ -3781,6 +3841,8 @@ static int do_allocation(struct btrfs_block_group *block_group, switch (ffe_ctl->policy) { case BTRFS_EXTENT_ALLOC_CLUSTERED: return do_allocation_clustered(block_group, ffe_ctl, bg_ret); + case BTRFS_EXTENT_ALLOC_ZONED: + return do_allocation_zoned(block_group, ffe_ctl, bg_ret); default: BUG(); } @@ -3795,6 +3857,9 @@ static void release_block_group(struct btrfs_block_group *block_group, ffe_ctl->retry_clustered = false; ffe_ctl->retry_unclustered = false; break; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Nothing to do */ + break; default: BUG(); } @@ -3823,6 +3888,9 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl, case BTRFS_EXTENT_ALLOC_CLUSTERED: found_extent_clustered(ffe_ctl, ins); break; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Nothing to do */ + break; default: BUG(); } @@ -3838,6 +3906,9 @@ static int chunk_allocation_failed(struct find_free_extent_ctl *ffe_ctl) */ ffe_ctl->loop = LOOP_NO_EMPTY_SIZE; return 0; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Give up here */ + return -ENOSPC; default: BUG(); } @@ -4006,6 +4077,9 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info, case BTRFS_EXTENT_ALLOC_CLUSTERED: return prepare_allocation_clustered(fs_info, ffe_ctl, space_info, ins); + case BTRFS_EXTENT_ALLOC_ZONED: + /* nothing to do */ + return 0; default: BUG(); } @@ -4069,6 +4143,9 @@ static noinline int find_free_extent(struct btrfs_root *root, ffe_ctl.last_ptr = NULL; ffe_ctl.use_cluster = true; + if (btrfs_is_zoned(fs_info)) + ffe_ctl.policy = BTRFS_EXTENT_ALLOC_ZONED; + ins->type = BTRFS_EXTENT_ITEM_KEY; ins->objectid = 0; ins->offset = 0; @@ -4211,20 +4288,23 @@ static noinline int find_free_extent(struct btrfs_root *root, /* move 
on to the next group */ if (ffe_ctl.search_start + num_bytes > block_group->start + block_group->length) { - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - num_bytes); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + num_bytes); goto loop; } if (ffe_ctl.found_offset < ffe_ctl.search_start) - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - ffe_ctl.search_start - ffe_ctl.found_offset); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + ffe_ctl.search_start - ffe_ctl.found_offset); ret = btrfs_add_reserved_bytes(block_group, ram_bytes, num_bytes, delalloc); if (ret == -EAGAIN) { - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - num_bytes); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + num_bytes); goto loop; } btrfs_inc_block_group_reservations(block_group); diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 8975a3a1ba49..990b0887ea45 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2916,6 +2916,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group, u64 align_gap_len = 0; enum btrfs_trim_state align_gap_trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + spin_lock(&ctl->tree_lock); entry = find_free_space(ctl, &offset, &bytes_search, block_group->full_stripe_len, max_extent_size); @@ -3047,6 +3049,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group *block_group, struct rb_node *node; u64 ret = 0; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + spin_lock(&cluster->lock); if (bytes > cluster->max_size) goto out; @@ -3823,6 +3827,8 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group, int ret; u64 rem = 0; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + *trimmed = 0; spin_lock(&block_group->lock); From patchwork Fri Jan 22 06:21:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038413
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Josef Bacik
Subject: [PATCH v13 15/42] btrfs: redirty released extent buffers in ZONED mode
Date: Fri, 22 Jan 2021 15:21:15 +0900
Message-Id: <72d6fb72365b88d750cd7601a422b2e09b35b54a.1611295439.git.naohiro.aota@wdc.com>

Tree manipulating operations like merging nodes often release once-allocated tree nodes.
Btrfs cleans such nodes so that pages in the nodes are not uselessly written out. On ZONED volumes, however, this optimization blocks subsequent IOs: canceling the write-out of the freed blocks breaks the sequential write order expected by the device.

This patch introduces a list of clean and unwritten extent buffers that have been released in a transaction. Btrfs redirties these buffers so that btree_write_cache_pages() can send proper bios to the devices. It also clears the entire content of the extent buffers so as not to confuse raw block scanners such as btrfsck. Since the content is cleared, csum_dirty_buffer() would complain about a bytenr mismatch, so skip the check and checksum for such buffers using the newly introduced EXTENT_BUFFER_NO_CHECK flag.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c     |  8 ++++++++
 fs/btrfs/extent-tree.c | 12 +++++++++++-
 fs/btrfs/extent_io.c   |  4 ++++
 fs/btrfs/extent_io.h   |  2 ++
 fs/btrfs/transaction.c | 10 ++++++++++
 fs/btrfs/transaction.h |  3 +++
 fs/btrfs/tree-log.c    |  6 ++++++
 fs/btrfs/zoned.c       | 37 +++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  7 +++++++
 9 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 76ab86dacc8d..d530bceb8f9b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -459,6 +459,12 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 		return 0;
 
 	found_start = btrfs_header_bytenr(eb);
+
+	if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) {
+		WARN_ON(found_start != 0);
+		return 0;
+	}
+
 	/*
 	 * Please do not consolidate these warnings into a single if.
 	 * It is useful to know what went wrong.
@@ -4697,6 +4703,8 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
 			     EXTENT_DIRTY);
 	btrfs_destroy_pinned_extent(fs_info, &cur_trans->pinned_extents);
 
+	btrfs_free_redirty_list(cur_trans);
+
 	cur_trans->state = TRANS_STATE_COMPLETED;
 	wake_up(&cur_trans->commit_wait);
 }

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 00eb42c7f09c..9dbc8031c73f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3374,8 +3374,10 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
 		ret = check_ref_cleanup(trans, buf->start);
-		if (!ret)
+		if (!ret) {
+			btrfs_redirty_list_add(trans->transaction, buf);
 			goto out;
+		}
 	}
 
 	pin = 0;
@@ -3387,6 +3389,13 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		goto out;
 	}
 
+	if (btrfs_is_zoned(fs_info)) {
+		btrfs_redirty_list_add(trans->transaction, buf);
+		pin_down_extent(trans, cache, buf->start, buf->len, 1);
+		btrfs_put_block_group(cache);
+		goto out;
+	}
+
 	WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags));
 
 	btrfs_add_free_space(cache, buf->start, buf->len);
@@ -4733,6 +4742,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	__btrfs_tree_lock(buf, nest);
 	btrfs_clean_tree_block(buf);
 	clear_bit(EXTENT_BUFFER_STALE, &buf->bflags);
+	clear_bit(EXTENT_BUFFER_NO_CHECK, &buf->bflags);
 
 	set_extent_buffer_uptodate(buf);

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7f689ad7709c..1e652281afa6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -24,6 +24,7 @@
 #include "rcu-string.h"
 #include "backref.h"
 #include "disk-io.h"
+#include "zoned.h"
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
@@ -5041,6 +5042,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 
 	btrfs_leak_debug_add(&fs_info->eb_leak_lock, &eb->leak_list,
			     &fs_info->allocated_ebs);
+	INIT_LIST_HEAD(&eb->release_list);
 	spin_lock_init(&eb->refs_lock);
 	atomic_set(&eb->refs, 1);
@@ -5828,6 +5830,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 	char *src = (char *)srcv;
 	unsigned long i = get_eb_page_index(start);
 
+	WARN_ON(test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags));
+
 	if (check_eb_range(eb, start, len))
 		return;

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 19221095c635..5a81268c4d8c 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -31,6 +31,7 @@ enum {
 	EXTENT_BUFFER_IN_TREE,
 	/* write IO error */
 	EXTENT_BUFFER_WRITE_ERR,
+	EXTENT_BUFFER_NO_CHECK,
 };
 
 /* these are flags for __process_pages_contig */
@@ -93,6 +94,7 @@ struct extent_buffer {
 	struct rw_semaphore lock;
 
 	struct page *pages[INLINE_EXTENT_BUFFER_PAGES];
+	struct list_head release_list;
 #ifdef CONFIG_BTRFS_DEBUG
 	struct list_head leak_list;
 #endif

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3bcb5444536e..ef4fcb925cb7 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -21,6 +21,7 @@
 #include "qgroup.h"
 #include "block-group.h"
 #include "space-info.h"
+#include "zoned.h"
 
 #define BTRFS_ROOT_TRANS_TAG 0
@@ -375,6 +376,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&cur_trans->dirty_bgs_lock);
 	INIT_LIST_HEAD(&cur_trans->deleted_bgs);
 	spin_lock_init(&cur_trans->dropped_roots_lock);
+	INIT_LIST_HEAD(&cur_trans->releasing_ebs);
+	spin_lock_init(&cur_trans->releasing_ebs_lock);
 	list_add_tail(&cur_trans->list, &fs_info->trans_list);
 	extent_io_tree_init(fs_info, &cur_trans->dirty_pages,
			IO_TREE_TRANS_DIRTY_PAGES, fs_info->btree_inode);
@@ -2336,6 +2339,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		goto scrub_continue;
 	}
 
+	/*
+	 * At this point, we should have written all the tree blocks
+	 * allocated in this transaction. So it's now safe to free the
+	 * redirtied extent buffers.
+	 */
+	btrfs_free_redirty_list(cur_trans);
+
 	ret = write_all_supers(fs_info, 0);
 	/*
 	 * the super is written, we can safely allow the tree-loggers

diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 31ca81bad822..660b4e1f1181 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -92,6 +92,9 @@ struct btrfs_transaction {
 	 */
 	atomic_t pending_ordered;
 	wait_queue_head_t pending_wait;
+
+	spinlock_t releasing_ebs_lock;
+	struct list_head releasing_ebs;
 };
 
 #define __TRANS_FREEZABLE	(1U << 0)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 8ee0700a980f..930e752686b4 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -19,6 +19,7 @@
 #include "qgroup.h"
 #include "block-group.h"
 #include "space-info.h"
+#include "zoned.h"
 
 /* magic values for the inode_only field in btrfs_log_inode:
 *
@@ -2752,6 +2753,8 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans,
 					free_extent_buffer(next);
 					return ret;
 				}
+				btrfs_redirty_list_add(
+					trans->transaction, next);
 			} else {
 				if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &next->bflags))
 					clear_extent_buffer_dirty(next);
@@ -3296,6 +3299,9 @@ static void free_log_tree(struct btrfs_trans_handle *trans,
 	clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1,
		  EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT);
 	extent_io_tree_release(&log->log_csum_range);
+
+	if (trans && log->node)
+		btrfs_redirty_list_add(trans->transaction, log->node);
 	btrfs_put_root(log);
 }

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index c5100c982f41..77ebc4cc5b07 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -10,6 +10,7 @@
 #include "rcu-string.h"
 #include "disk-io.h"
 #include "block-group.h"
+#include "transaction.h"
 
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -1186,3 +1187,39 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache)
 	 */
 	btrfs_free_excluded_extents(cache);
 }
+
+void btrfs_redirty_list_add(struct btrfs_transaction *trans,
+			    struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+
+	if (!btrfs_is_zoned(fs_info) ||
+	    btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN) ||
+	    !list_empty(&eb->release_list))
+		return;
+
+	set_extent_buffer_dirty(eb);
+	set_extent_bits_nowait(&trans->dirty_pages, eb->start,
+			       eb->start + eb->len - 1, EXTENT_DIRTY);
+	memzero_extent_buffer(eb, 0, eb->len);
+	set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags);
+
+	spin_lock(&trans->releasing_ebs_lock);
+	list_add_tail(&eb->release_list, &trans->releasing_ebs);
+	spin_unlock(&trans->releasing_ebs_lock);
+	atomic_inc(&eb->refs);
+}
+
+void btrfs_free_redirty_list(struct btrfs_transaction *trans)
+{
+	spin_lock(&trans->releasing_ebs_lock);
+	while (!list_empty(&trans->releasing_ebs)) {
+		struct extent_buffer *eb;
+
+		eb = list_first_entry(&trans->releasing_ebs,
+				      struct extent_buffer, release_list);
+		list_del_init(&eb->release_list);
+		free_extent_buffer(eb);
+	}
+	spin_unlock(&trans->releasing_ebs_lock);
+}

diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 0cc0b27e9437..b2ce16de0c22 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -43,6 +43,9 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical,
 int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size);
 int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new);
 void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
+void btrfs_redirty_list_add(struct btrfs_transaction *trans,
+			    struct extent_buffer *eb);
+void btrfs_free_redirty_list(struct btrfs_transaction *trans);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
				     struct blk_zone *zone)
@@ -127,6 +130,10 @@ static inline int btrfs_load_block_group_zone_info(
 static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { }
+static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans,
+					  struct extent_buffer *eb) { }
+static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { }
+
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Fri Jan 22 06:21:16 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038371
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Josef Bacik
Subject: [PATCH v13 16/42] btrfs: advance allocation pointer after tree log node
Date: Fri, 22 Jan 2021 15:21:16 +0900

Since the allocation info of a tree log node is not recorded in the extent tree, calculate_alloc_pointer() cannot detect such a node, and the calculated allocation pointer can end up over a tree log node. Replaying the log calls btrfs_remove_free_space() for each node in the log tree, so advance the pointer past the node there.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/free-space-cache.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 990b0887ea45..1af6eec79f66 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2616,8 +2616,22 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group,
 	int ret;
 	bool re_search = false;
 
-	if (btrfs_is_zoned(block_group->fs_info))
+	if (btrfs_is_zoned(block_group->fs_info)) {
+		/*
+		 * This can happen with conventional zones when replaying
+		 * the log. Since the allocation info of tree-log nodes is
+		 * not recorded in the extent tree, calculate_alloc_pointer()
+		 * failed to advance the allocation pointer past the last
+		 * allocated tree log node blocks.
+		 *
+		 * This function is called from
+		 * btrfs_pin_extent_for_log_replay() when replaying the
+		 * log. Advance the pointer not to overwrite the tree-log nodes.
+		 */
+		if (block_group->alloc_offset < offset + bytes)
+			block_group->alloc_offset = offset + bytes;
 		return 0;
+	}
 
 	spin_lock(&ctl->tree_lock);

From patchwork Fri Jan 22 06:21:17 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038373
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Josef Bacik
Subject: [PATCH v13 17/42] btrfs: enable to mount ZONED incompat flag
Date: Fri, 22 Jan 2021 15:21:17 +0900

This final patch adds the ZONED incompat flag to BTRFS_FEATURE_INCOMPAT_SUPP and enables btrfs to mount a ZONED-flagged file system.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/ctree.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1bd416e5a731..4b2e19daed40 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -298,7 +298,8 @@ struct btrfs_super_block {
	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
	 BTRFS_FEATURE_INCOMPAT_NO_HOLES	|	\
	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID	|	\
-	 BTRFS_FEATURE_INCOMPAT_RAID1C34)
+	 BTRFS_FEATURE_INCOMPAT_RAID1C34	|	\
+	 BTRFS_FEATURE_INCOMPAT_ZONED)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)

From patchwork Fri Jan 22 06:21:18 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038407
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Josef Bacik
Subject: [PATCH v13 18/42] btrfs: reset zones of unused block groups
Date: Fri, 22 Jan 2021 15:21:18 +0900
Message-Id: <9c1916a262fd8daf176abfb935de904ad83e1ad4.1611295439.git.naohiro.aota@wdc.com>

On a ZONED volume, a block group maps to a zone of the device. For deleted unused block groups, the zone of the block group can be reset to rewind the zone write pointer to the start of the zone.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c |  8 ++++++--
 fs/btrfs/extent-tree.c | 17 ++++++++++++-----
 fs/btrfs/zoned.h       | 16 ++++++++++++++++
 3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index d4c336e470dc..e05707f2d272 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1391,8 +1391,12 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		if (!async_trim_enabled && btrfs_test_opt(fs_info, DISCARD_ASYNC))
 			goto flip_async;
 
-		/* DISCARD can flip during remount */
-		trimming = btrfs_test_opt(fs_info, DISCARD_SYNC);
+		/*
+		 * DISCARD can flip during remount. In ZONED mode, we need
+		 * to reset sequential-required zones.
+		 */
+		trimming = btrfs_test_opt(fs_info, DISCARD_SYNC) ||
+				btrfs_is_zoned(fs_info);
 
 		/* Implicit trim during transaction commit. */
 		if (trimming)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9dbc8031c73f..6a644f64b22e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1333,6 +1333,9 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 
 		stripe = bbio->stripes;
 		for (i = 0; i < bbio->num_stripes; i++, stripe++) {
+			struct btrfs_device *dev = stripe->dev;
+			u64 physical = stripe->physical;
+			u64 length = stripe->length;
 			u64 bytes;
 			struct request_queue *req_q;
 
@@ -1340,14 +1343,18 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 				ASSERT(btrfs_test_opt(fs_info, DEGRADED));
 				continue;
 			}
+
 			req_q = bdev_get_queue(stripe->dev->bdev);
-			if (!blk_queue_discard(req_q))
+			/* Zone reset in ZONED mode */
+			if (btrfs_can_zone_reset(dev, physical, length))
+				ret = btrfs_reset_device_zone(dev, physical,
+							      length, &bytes);
+			else if (blk_queue_discard(req_q))
+				ret = btrfs_issue_discard(dev->bdev, physical,
+							  length, &bytes);
+			else
 				continue;
 
-			ret = btrfs_issue_discard(stripe->dev->bdev,
-						  stripe->physical,
-						  stripe->length,
-						  &bytes);
 			if (!ret) {
 				discarded_bytes += bytes;
 			} else if (ret != -EOPNOTSUPP) {

diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index b2ce16de0c22..331951978487 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -210,4 +210,20 @@ static inline bool btrfs_check_super_location(struct btrfs_device *device, u64 p
	return device->zone_info == NULL || !btrfs_dev_is_sequential(device, pos);
 }
 
+static inline bool btrfs_can_zone_reset(struct btrfs_device *device,
+					u64 physical, u64 length)
+{
+	u64 zone_size;
+
+	if (!btrfs_dev_is_sequential(device, physical))
+		return false;
+
+	zone_size = device->zone_info->zone_size;
+	if (!IS_ALIGNED(physical, zone_size) ||
+	    !IS_ALIGNED(length, zone_size))
+		return false;
+
+	return true;
+}
+
 #endif

From patchwork Fri Jan 22 06:21:19 2021
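The btrfs_can_zone_reset() helper added in the patch above only allows a reset when the range starts in a sequential zone and both its start and length are whole multiples of the zone size. A user-space analogue of that check (illustrative names, not kernel API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * zone_size must be a power of two, matching the kernel's IS_ALIGNED()
 * usage; this mask trick is only valid under that assumption.
 */
#define ZONE_IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* Can [physical, physical + length) be reset as whole zones? */
static bool can_zone_reset(bool sequential, uint64_t physical,
			   uint64_t length, uint64_t zone_size)
{
	if (!sequential)
		return false; /* conventional zones have no write pointer */

	return ZONE_IS_ALIGNED(physical, zone_size) &&
	       ZONE_IS_ALIGNED(length, zone_size);
}
```

A partially aligned range falls back to discard (if supported) rather than a zone reset, since resetting would wipe data sharing the zone.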
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038409
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Josef Bacik
Subject: [PATCH v13 19/42] btrfs: extract page adding function
Date: Fri, 22 Jan 2021 15:21:19 +0900

This commit extracts the page-adding part of submit_extent_page() into a new helper.
The page is added only when the bio_flags match, the page is contiguous,
and it fits in the same stripe as the pages already in the bio. The
condition checks are reordered to allow an early return, avoiding a
possibly expensive call to btrfs_bio_fits_in_stripe().

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent_io.c | 58 ++++++++++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1e652281afa6..df7665cdbb2c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3059,6 +3059,46 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
 	return bio;
 }
 
+/**
+ * btrfs_bio_add_page - attempt to add a page to bio
+ * @bio:	destination bio
+ * @page:	page to add to the bio
+ * @disk_bytenr: offset of the new bio or to check whether we are adding
+ *		 a contiguous page to the previous one
+ * @pg_offset:	starting offset in the page
+ * @size:	portion of page that we want to write
+ * @prev_bio_flags: flags of previous bio to see if we can merge the current one
+ * @bio_flags:	flags of the current bio to see if we can merge them
+ * @return:	true if page was added, false otherwise
+ *
+ * Attempt to add a page to bio considering stripe alignment etc. Return
+ * true if the page was successfully added, false otherwise.
+ */
+static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
+			       u64 disk_bytenr, unsigned int size,
+			       unsigned int pg_offset,
+			       unsigned long prev_bio_flags,
+			       unsigned long bio_flags)
+{
+	sector_t sector = disk_bytenr >> SECTOR_SHIFT;
+	bool contig;
+
+	if (prev_bio_flags != bio_flags)
+		return false;
+
+	if (prev_bio_flags & EXTENT_BIO_COMPRESSED)
+		contig = bio->bi_iter.bi_sector == sector;
+	else
+		contig = bio_end_sector(bio) == sector;
+	if (!contig)
+		return false;
+
+	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
+		return false;
+
+	return bio_add_page(bio, page, size, pg_offset) == size;
+}
+
 /*
  * @opf:	bio REQ_OP_* and REQ_* flags as one value
  * @wbc:	optional writeback control for io accounting
@@ -3087,27 +3127,15 @@ static int submit_extent_page(unsigned int opf,
 	int ret = 0;
 	struct bio *bio;
 	size_t io_size = min_t(size_t, size, PAGE_SIZE);
-	sector_t sector = disk_bytenr >> 9;
 	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
 
 	ASSERT(bio_ret);
 
 	if (*bio_ret) {
-		bool contig;
-		bool can_merge = true;
-
 		bio = *bio_ret;
-		if (prev_bio_flags & EXTENT_BIO_COMPRESSED)
-			contig = bio->bi_iter.bi_sector == sector;
-		else
-			contig = bio_end_sector(bio) == sector;
-
-		if (btrfs_bio_fits_in_stripe(page, io_size, bio, bio_flags))
-			can_merge = false;
-
-		if (prev_bio_flags != bio_flags || !contig || !can_merge ||
-		    force_bio_submit ||
-		    bio_add_page(bio, page, io_size, pg_offset) < io_size) {
+		if (force_bio_submit ||
+		    !btrfs_bio_add_page(bio, page, disk_bytenr, io_size,
+					pg_offset, prev_bio_flags, bio_flags)) {
 			ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
 			if (ret < 0) {
 				*bio_ret = NULL;

From patchwork Fri Jan 22 06:21:20 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038415
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 20/42] btrfs: use bio_add_zone_append_page for zoned btrfs
Date: Fri, 22 Jan 2021 15:21:20 +0900
Message-Id: <5079f58020ef53a905ffdf8f7ec25b103d88f0bb.1611295439.git.naohiro.aota@wdc.com>

A zoned device has its own hardware restrictions, e.g. max_zone_append_size
when using REQ_OP_ZONE_APPEND. To follow these restrictions, use
bio_add_zone_append_page() instead of bio_add_page().
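Because each REQ_OP_ZONE_APPEND bio is capped at max_zone_append_size, a large write has to be carried by multiple bios. As an editor's aside, an illustrative helper (not a btrfs or block-layer function; the name and the ceiling-division arithmetic are this sketch's own) for the resulting bio count:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of zone-append bios needed for a write of
 * `len` bytes when each bio may carry at most `max_zone_append_size`
 * bytes. Inside the kernel, bio_add_zone_append_page() is what enforces
 * this cap when pages are added to a bio.
 */
static uint64_t zone_append_bio_count(uint64_t len,
				      uint64_t max_zone_append_size)
{
	if (len == 0)
		return 0;

	/* Ceiling division: one extra bio for any partial remainder */
	return (len + max_zone_append_size - 1) / max_zone_append_size;
}
```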
We need the target device to use bio_add_zone_append_page(), so this
commit reads the chunk information and memoizes the target device in
btrfs_io_bio(bio)->device. Currently, zoned btrfs only supports the
SINGLE profile. In the future, btrfs_io_bio can hold the extent_map and
check the restrictions for all the devices the bio will be mapped to.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent_io.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index df7665cdbb2c..9c8faaf260ee 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3082,6 +3082,7 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
 {
 	sector_t sector = disk_bytenr >> SECTOR_SHIFT;
 	bool contig;
+	int ret;
 
 	if (prev_bio_flags != bio_flags)
 		return false;
@@ -3096,7 +3097,12 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
 	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
 		return false;
 
-	return bio_add_page(bio, page, size, pg_offset) == size;
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+		ret = bio_add_zone_append_page(bio, page, size, pg_offset);
+	else
+		ret = bio_add_page(bio, page, size, pg_offset);
+
+	return ret == size;
 }
 
 /*
@@ -3127,7 +3133,9 @@ static int submit_extent_page(unsigned int opf,
 	int ret = 0;
 	struct bio *bio;
 	size_t io_size = min_t(size_t, size, PAGE_SIZE);
-	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
+	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
+	struct extent_io_tree *tree = &inode->io_tree;
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 
 	ASSERT(bio_ret);
 
@@ -3158,11 +3166,27 @@ static int submit_extent_page(unsigned int opf,
 	if (wbc) {
 		struct block_device *bdev;
 
-		bdev = BTRFS_I(page->mapping->host)->root->fs_info->fs_devices->latest_bdev;
+		bdev = fs_info->fs_devices->latest_bdev;
 		bio_set_dev(bio, bdev);
 		wbc_init_bio(wbc, bio);
 		wbc_account_cgroup_owner(wbc, page, io_size);
 	}
+
+	if (btrfs_is_zoned(fs_info) &&
+	    bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		struct extent_map *em;
+		struct map_lookup *map;
+
+		em = btrfs_get_chunk_map(fs_info, disk_bytenr, io_size);
+		if (IS_ERR(em))
+			return PTR_ERR(em);
+
+		map = em->map_lookup;
+		/* We only support SINGLE profile for now */
+		ASSERT(map->num_stripes == 1);
+		btrfs_io_bio(bio)->device = map->stripes[0].dev;
+
+		free_extent_map(em);
+	}
 
 	*bio_ret = bio;

From patchwork Fri Jan 22 06:21:21 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038411
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 21/42] btrfs: handle REQ_OP_ZONE_APPEND as writing
Date: Fri, 22 Jan 2021 15:21:21 +0900
Message-Id: <0b48ccedf7a0982cca507db3b973a8a57a1f277a.1611295439.git.naohiro.aota@wdc.com>

ZONED btrfs uses REQ_OP_ZONE_APPEND bios for writing to actual devices.
Let btrfs_end_bio() and btrfs_op() be aware of it.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c |  4 ++--
 fs/btrfs/inode.c   | 10 +++++-----
 fs/btrfs/volumes.c |  8 ++++----
 fs/btrfs/volumes.h |  1 +
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d530bceb8f9b..ba0ca953f7e5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -652,7 +652,7 @@ static void end_workqueue_bio(struct bio *bio)
 	fs_info = end_io_wq->info;
 	end_io_wq->status = bio->bi_status;
 
-	if (bio_op(bio) == REQ_OP_WRITE) {
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
 			wq = fs_info->endio_meta_write_workers;
 		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
@@ -828,7 +828,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 	int async = check_async_write(fs_info, BTRFS_I(inode));
 	blk_status_t ret;
 
-	if (bio_op(bio) != REQ_OP_WRITE) {
+	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 		/*
 		 * called for a read, do the setup so that checksum validation
 		 * can happen in the async kernel threads

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ef6cb7b620d0..2e1c1f37b3f6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2252,7 +2252,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
 		metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
 
-	if (bio_op(bio) != REQ_OP_WRITE) {
+	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
 		if (ret)
 			goto out;
@@ -7682,7 +7682,7 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 	if (!refcount_dec_and_test(&dip->refs))
 		return;
 
-	if (bio_op(dip->dio_bio) == REQ_OP_WRITE) {
+	if (btrfs_op(dip->dio_bio) == BTRFS_MAP_WRITE) {
 		__endio_write_update_ordered(BTRFS_I(dip->inode),
 					     dip->logical_offset,
 					     dip->bytes,
@@ -7848,7 +7848,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_dio_private *dip = bio->bi_private;
-	bool write = bio_op(bio) == REQ_OP_WRITE;
+	bool write = btrfs_op(bio) == BTRFS_MAP_WRITE;
 	blk_status_t ret;
 
 	/* Check btrfs_submit_bio_hook() for rules about async submit. */
@@ -7898,7 +7898,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 							  struct inode *inode,
 							  loff_t file_offset)
 {
-	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
+	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	size_t dip_size;
 	struct btrfs_dio_private *dip;
@@ -7928,7 +7928,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
 		struct bio *dio_bio, loff_t file_offset)
 {
-	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
+	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
 			     BTRFS_BLOCK_GROUP_RAID56_MASK);

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2d52330f26b5..e69754af2eba 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6455,7 +6455,7 @@ static void btrfs_end_bio(struct bio *bio)
 			struct btrfs_device *dev = btrfs_io_bio(bio)->device;
 
 			ASSERT(dev->bdev);
-			if (bio_op(bio) == REQ_OP_WRITE)
+			if (btrfs_op(bio) == BTRFS_MAP_WRITE)
 				btrfs_dev_stat_inc_and_print(dev,
 						BTRFS_DEV_STAT_WRITE_ERRS);
 			else if (!(bio->bi_opf & REQ_RAHEAD))
@@ -6568,10 +6568,10 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	atomic_set(&bbio->stripes_pending, bbio->num_stripes);
 
 	if ((bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
-	    ((bio_op(bio) == REQ_OP_WRITE) || (mirror_num > 1))) {
+	    ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) {
 		/* In this case, map_length has been set to the length of
 		   a single stripe; not the whole write */
-		if (bio_op(bio) == REQ_OP_WRITE) {
+		if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 			ret = raid56_parity_write(fs_info, bio, bbio,
 						  map_length);
 		} else {
@@ -6594,7 +6594,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		dev = bbio->stripes[dev_nr].dev;
 		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
 						   &dev->dev_state) ||
-		    (bio_op(first_bio) == REQ_OP_WRITE &&
+		    (btrfs_op(first_bio) == BTRFS_MAP_WRITE &&
 		     !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
 			bbio_error(bbio, first_bio, logical);
 			continue;

diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 98a447badd6a..0bcf87a9e594 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -423,6 +423,7 @@ static inline enum btrfs_map_op btrfs_op(struct bio *bio)
 	case REQ_OP_DISCARD:
 		return BTRFS_MAP_DISCARD;
 	case REQ_OP_WRITE:
+	case REQ_OP_ZONE_APPEND:
 		return BTRFS_MAP_WRITE;
 	default:
 		WARN_ON_ONCE(1);

From patchwork Fri Jan 22 06:21:22 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038417
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, kernel test robot
Subject: [PATCH v13 22/42] btrfs: split ordered extent when bio is sent
Date: Fri, 22 Jan 2021 15:21:22 +0900
Message-Id: <25b86d9571b1af386f1711d0d0ae626ae6a86b35.1611295439.git.naohiro.aota@wdc.com>

For a zone append write, the device decides the location the data is
written to. Therefore we cannot ensure that two bios are written
consecutively on the device. In order to ensure that an ordered extent
maps to a contiguous region on disk, we need to maintain a
"one bio == one ordered extent" rule.

This commit implements the splitting of an ordered extent and extent map
on bio submission to adhere to that rule.
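The split points the commit computes can be sketched in isolation: the bio range must lie inside the ordered extent, and `pre`/`post` are the leftover pieces split off before and after it. As an editor's aside, an illustrative stand-in for that arithmetic (the function name and signature are this sketch's own, not the kernel's extract_ordered_extent()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the pre/post computation: the bio [start, start + len) must
 * lie inside the ordered extent [disk_bytenr, disk_bytenr + disk_num_bytes).
 * On success, *pre and *post receive the sizes of the pieces that remain
 * before and after the bio. Simplified stand-in, not the kernel function.
 */
static bool split_points(uint64_t disk_bytenr, uint64_t disk_num_bytes,
			 uint64_t start, uint64_t len,
			 uint64_t *pre, uint64_t *post)
{
	uint64_t end = start + len;
	uint64_t ordered_end = disk_bytenr + disk_num_bytes;

	/* The bio must be contained in a single ordered extent */
	if (start < disk_bytenr || end > ordered_end)
		return false;

	*pre = start - disk_bytenr;
	*post = ordered_end - end;
	return true;
}
```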
[testbot] made extract_ordered_extent static Reported-by: kernel test robot Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 95 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ordered-data.c | 85 ++++++++++++++++++++++++++++++++++++ fs/btrfs/ordered-data.h | 2 + 3 files changed, 182 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 2e1c1f37b3f6..ab97d4349515 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2217,6 +2217,92 @@ static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio, return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0); } +static blk_status_t extract_ordered_extent(struct btrfs_inode *inode, + struct bio *bio, loff_t file_offset) +{ + struct btrfs_ordered_extent *ordered; + struct extent_map *em = NULL, *em_new = NULL; + struct extent_map_tree *em_tree = &inode->extent_tree; + u64 start = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT; + u64 len = bio->bi_iter.bi_size; + u64 end = start + len; + u64 ordered_end; + u64 pre, post; + int ret = 0; + + ordered = btrfs_lookup_ordered_extent(inode, file_offset); + if (WARN_ON_ONCE(!ordered)) + return BLK_STS_IOERR; + + /* No need to split */ + if (ordered->disk_num_bytes == len) + goto out; + + /* We cannot split once end_bio'd ordered extent */ + if (WARN_ON_ONCE(ordered->bytes_left != ordered->disk_num_bytes)) { + ret = -EINVAL; + goto out; + } + + /* We cannot split a compressed ordered extent */ + if (WARN_ON_ONCE(ordered->disk_num_bytes != ordered->num_bytes)) { + ret = -EINVAL; + goto out; + } + + /* We cannot split a waited ordered extent */ + if (WARN_ON_ONCE(wq_has_sleeper(&ordered->wait))) { + ret = -EINVAL; + goto out; + } + + ordered_end = ordered->disk_bytenr + ordered->disk_num_bytes; + /* bio must be in one ordered extent */ + if (WARN_ON_ONCE(start < ordered->disk_bytenr || end > ordered_end)) { + ret = -EINVAL; + goto out; + } + + /* Checksum list should be empty */ + if (WARN_ON_ONCE(!list_empty(&ordered->list))) { + ret = -EINVAL; + goto 
out; + } + + pre = start - ordered->disk_bytenr; + post = ordered_end - end; + + ret = btrfs_split_ordered_extent(ordered, pre, post); + if (ret) + goto out; + + read_lock(&em_tree->lock); + em = lookup_extent_mapping(em_tree, ordered->file_offset, len); + if (!em) { + read_unlock(&em_tree->lock); + ret = -EIO; + goto out; + } + read_unlock(&em_tree->lock); + + ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)); + em_new = create_io_em(inode, em->start + pre, len, + em->start + pre, em->block_start + pre, len, + len, len, BTRFS_COMPRESS_NONE, + BTRFS_ORDERED_REGULAR); + if (IS_ERR(em_new)) { + ret = PTR_ERR(em_new); + goto out; + } + free_extent_map(em_new); + +out: + free_extent_map(em); + btrfs_put_ordered_extent(ordered); + + return errno_to_blk_status(ret); +} + /* * extent_io.c submission hook. This does the right thing for csum calculation * on write, or reading the csums from the tree before a read. @@ -2252,6 +2338,15 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio, if (btrfs_is_free_space_inode(BTRFS_I(inode))) metadata = BTRFS_WQ_ENDIO_FREE_SPACE; + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + struct page *page = bio_first_bvec_all(bio)->bv_page; + loff_t file_offset = page_offset(page); + + ret = extract_ordered_extent(BTRFS_I(inode), bio, file_offset); + if (ret) + goto out; + } + if (btrfs_op(bio) != BTRFS_MAP_WRITE) { ret = btrfs_bio_wq_end_io(fs_info, bio, metadata); if (ret) diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index d5d326c674b1..4dd935d602b8 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -910,6 +910,91 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start, } } +static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos, + u64 len) +{ + struct inode *inode = ordered->inode; + u64 file_offset = ordered->file_offset + pos; + u64 disk_bytenr = ordered->disk_bytenr + pos; + u64 num_bytes = len; + u64 disk_num_bytes = len; + int 
type; + unsigned long flags_masked = + ordered->flags & ~(1 << BTRFS_ORDERED_DIRECT); + int compress_type = ordered->compress_type; + unsigned long weight; + int ret; + + weight = hweight_long(flags_masked); + WARN_ON_ONCE(weight > 1); + if (!weight) + type = 0; + else + type = __ffs(flags_masked); + + if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered->flags)) { + WARN_ON_ONCE(1); + ret = btrfs_add_ordered_extent_compress(BTRFS_I(inode), + file_offset, + disk_bytenr, num_bytes, + disk_num_bytes, type, + compress_type); + } else if (test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) { + ret = btrfs_add_ordered_extent_dio(BTRFS_I(inode), file_offset, + disk_bytenr, num_bytes, + disk_num_bytes, type); + } else { + ret = btrfs_add_ordered_extent(BTRFS_I(inode), file_offset, + disk_bytenr, num_bytes, + disk_num_bytes, type); + } + + return ret; +} + +int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre, + u64 post) +{ + struct inode *inode = ordered->inode; + struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree; + struct rb_node *node; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + int ret = 0; + + spin_lock_irq(&tree->lock); + /* Remove from tree once */ + node = &ordered->rb_node; + rb_erase(node, &tree->tree); + RB_CLEAR_NODE(node); + if (tree->last == node) + tree->last = NULL; + + ordered->file_offset += pre; + ordered->disk_bytenr += pre; + ordered->num_bytes -= (pre + post); + ordered->disk_num_bytes -= (pre + post); + ordered->bytes_left -= (pre + post); + + /* Re-insert the node */ + node = tree_insert(&tree->tree, ordered->file_offset, + &ordered->rb_node); + if (node) + btrfs_panic(fs_info, -EEXIST, + "zoned: inconsistency in ordered tree at offset %llu", + ordered->file_offset); + + spin_unlock_irq(&tree->lock); + + if (pre) + ret = clone_ordered_extent(ordered, 0, pre); + if (post) + ret = clone_ordered_extent(ordered, + pre + ordered->disk_num_bytes, + post); + + return ret; +} + int __init 
ordered_data_init(void) { btrfs_ordered_extent_cache = kmem_cache_create("btrfs_ordered_extent", diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index 46194c2c05d4..3bf2f62fce5c 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -190,6 +190,8 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr, void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start, u64 end, struct extent_state **cached_state); +int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre, + u64 post); int __init ordered_data_init(void); void __cold ordered_data_exit(void); From patchwork Fri Jan 22 06:21:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038421 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C2D8C433E0 for ; Fri, 22 Jan 2021 06:27:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 00E3A235FF for ; Fri, 22 Jan 2021 06:27:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726881AbhAVG1l (ORCPT ); Fri, 22 Jan 2021 01:27:41 -0500 Received: from esa1.hgst.iphmx.com ([68.232.141.245]:51031 "EHLO esa1.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726877AbhAVG0d (ORCPT ); Fri, 22 Jan 2021 01:26:33 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn, Naohiro Aota
Subject: [PATCH v13 23/42] btrfs: check if bio spans across an ordered extent
Date: Fri, 22 Jan 2021 15:21:23 +0900
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Johannes Thumshirn

To ensure that an ordered extent maps to a contiguous region on disk, we
need to maintain a "one bio == one ordered extent" rule. This commit
ensures that a constructed bio does not span across an ordered extent.

Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/ctree.h     |  2 ++
 fs/btrfs/extent_io.c |  9 +++++++--
 fs/btrfs/inode.c     | 29 +++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4b2e19daed40..1b6f66575471 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3116,6 +3116,8 @@ void btrfs_split_delalloc_extent(struct inode *inode,
 				 struct extent_state *orig, u64 split);
 int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio,
 			     unsigned long bio_flags);
+bool btrfs_bio_fits_in_ordered_extent(struct page *page, struct bio *bio,
+				      unsigned int size);
 void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end);
 vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf);
 int btrfs_readpage(struct file *file, struct page *page);

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9c8faaf260ee..b9fefa624760 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3097,10 +3097,15 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
 	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
 		return false;
 
-	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		struct page *first_page = bio_first_bvec_all(bio)->bv_page;
+
+		if (!btrfs_bio_fits_in_ordered_extent(first_page, bio, size))
+			return false;
 		ret = bio_add_zone_append_page(bio, page, size, pg_offset);
-	else
+	} else {
 		ret = bio_add_page(bio, page, size, pg_offset);
+	}
 
 	return ret == size;
 }

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ab97d4349515..286eee122657 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2217,6 +2217,35 @@ static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio,
 	return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
 }
 
+bool btrfs_bio_fits_in_ordered_extent(struct page *page, struct bio *bio,
+				      unsigned int size)
+{
+	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct btrfs_ordered_extent *ordered;
+	u64 len = bio->bi_iter.bi_size + size;
+	bool ret = true;
+
+	ASSERT(btrfs_is_zoned(fs_info));
+	ASSERT(fs_info->max_zone_append_size > 0);
+	ASSERT(bio_op(bio) == REQ_OP_ZONE_APPEND);
+
+	/* Ordered extent not yet created, so we're good */
+	ordered = btrfs_lookup_ordered_extent(inode, page_offset(page));
+	if (!ordered)
+		return ret;
+
+	if ((bio->bi_iter.bi_sector << SECTOR_SHIFT) + len >
+	    ordered->disk_bytenr + ordered->disk_num_bytes)
+		ret = false;
+
+	btrfs_put_ordered_extent(ordered);
+
+	return ret;
+}
+
 static blk_status_t extract_ordered_extent(struct btrfs_inode *inode,
					   struct bio *bio, loff_t file_offset)
 {

From patchwork Fri Jan 22 06:21:24 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038423
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 24/42] btrfs: extend btrfs_rmap_block for specifying a device
Date: Fri, 22 Jan 2021 15:21:24 +0900
X-Mailing-List: linux-fsdevel@vger.kernel.org

btrfs_rmap_block currently reverse-maps the physical addresses on all
devices to the corresponding logical addresses.

This commit extends the function to match a specified device. The old
behavior of querying all devices is kept intact by specifying NULL as the
target device.

We pass a block_device rather than a btrfs_device to __btrfs_rmap_block,
since the function is intended to reverse-map the result of a bio, which
only carries a block_device.

This commit also exports the function for later use.
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c            | 20 ++++++++++++++------
 fs/btrfs/block-group.h            |  8 +++-----
 fs/btrfs/tests/extent-map-tests.c |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index e05707f2d272..56fab3d490b0 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1567,8 +1567,11 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 }
 
 /**
- * btrfs_rmap_block - Map a physical disk address to a list of logical addresses
+ * btrfs_rmap_block - Map a physical disk address to a list of logical
+ *		      addresses
  * @chunk_start: logical address of block group
+ * @bdev:	 physical device to resolve. Can be NULL to indicate any
+ *		 device.
  * @physical:	 physical address to map to logical addresses
  * @logical:	 return array of logical addresses which map to @physical
  * @naddrs:	 length of @logical
@@ -1578,9 +1581,9 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
  * Used primarily to exclude those portions of a block group that contain super
  * block copies.
  */
-EXPORT_FOR_TESTS
 int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
-		     u64 physical, u64 **logical, int *naddrs, int *stripe_len)
+		     struct block_device *bdev, u64 physical, u64 **logical,
+		     int *naddrs, int *stripe_len)
 {
 	struct extent_map *em;
 	struct map_lookup *map;
@@ -1598,6 +1601,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 	map = em->map_lookup;
 	data_stripe_length = em->orig_block_len;
 	io_stripe_size = map->stripe_len;
+	chunk_start = em->start;
 
 	/* For RAID5/6 adjust to a full IO stripe length */
 	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
@@ -1612,14 +1616,18 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 	for (i = 0; i < map->num_stripes; i++) {
 		bool already_inserted = false;
 		u64 stripe_nr;
+		u64 offset;
 		int j;
 
 		if (!in_range(physical, map->stripes[i].physical,
			      data_stripe_length))
 			continue;
 
+		if (bdev && map->stripes[i].dev->bdev != bdev)
+			continue;
+
 		stripe_nr = physical - map->stripes[i].physical;
-		stripe_nr = div64_u64(stripe_nr, map->stripe_len);
+		stripe_nr = div64_u64_rem(stripe_nr, map->stripe_len, &offset);
 
 		if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
 			stripe_nr = stripe_nr * map->num_stripes + i;
@@ -1633,7 +1641,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
		 * instead of map->stripe_len
		 */
 
-		bytenr = chunk_start + stripe_nr * io_stripe_size;
+		bytenr = chunk_start + stripe_nr * io_stripe_size + offset;
 
 		/* Ensure we don't add duplicate addresses */
 		for (j = 0; j < nr; j++) {
@@ -1675,7 +1683,7 @@ static int exclude_super_stripes(struct btrfs_block_group *cache)
 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		bytenr = btrfs_sb_offset(i);
-		ret = btrfs_rmap_block(fs_info, cache->start,
+		ret = btrfs_rmap_block(fs_info, cache->start, NULL,
 				       bytenr, &logical, &nr, &stripe_len);
 		if (ret)
 			return ret;

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 0f3c62c561bc..9df00ada09f9 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -277,6 +277,9 @@ void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
 int btrfs_free_block_groups(struct btrfs_fs_info *info);
 void btrfs_wait_space_cache_v1_finished(struct btrfs_block_group *cache,
				struct btrfs_caching_control *caching_ctl);
+int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
+		     struct block_device *bdev, u64 physical, u64 **logical,
+		     int *naddrs, int *stripe_len);
 
 static inline u64 btrfs_data_alloc_profile(struct btrfs_fs_info *fs_info)
 {
@@ -303,9 +306,4 @@ static inline int btrfs_block_group_done(struct btrfs_block_group *cache)
 void btrfs_freeze_block_group(struct btrfs_block_group *cache);
 void btrfs_unfreeze_block_group(struct btrfs_block_group *cache);
 
-#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
-int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
-		     u64 physical, u64 **logical, int *naddrs, int *stripe_len);
-#endif
-
 #endif /* BTRFS_BLOCK_GROUP_H */

diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 57379e96ccc9..c0aefe6dee0b 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -507,7 +507,7 @@ static int test_rmap_block(struct btrfs_fs_info *fs_info,
 		goto out_free;
 	}
 
-	ret = btrfs_rmap_block(fs_info, em->start, btrfs_sb_offset(1),
+	ret = btrfs_rmap_block(fs_info, em->start, NULL, btrfs_sb_offset(1),
			       &logical, &out_ndaddrs, &out_stripe_len);
 	if (ret || (out_ndaddrs == 0 && test->expected_mapped_addr)) {
 		test_err("didn't rmap anything but expected %d",

From patchwork Fri Jan 22 06:21:25 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038419
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn, Josef Bacik
Subject: [PATCH v13 25/42] btrfs: cache if block-group is on a sequential zone
Date: Fri, 22 Jan 2021 15:21:25 +0900
Message-Id: <7ff0091533246b3b30d692de42aff19a3fc5f72d.1611295439.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Johannes Thumshirn

In zoned mode, cache whether a block group is located on a sequential
write only zone. On such zones we can use REQ_OP_ZONE_APPEND for writing
data, so provide btrfs_use_zone_append() to figure out whether an I/O
targets a sequential write only zone and can therefore use
REQ_OP_ZONE_APPEND for data writing.
Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
---
 fs/btrfs/block-group.h |  2 ++
 fs/btrfs/zoned.c       | 29 +++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  5 +++++
 3 files changed, 36 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 9df00ada09f9..a1d96c4cfa3b 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -184,6 +184,8 @@ struct btrfs_block_group {
 	/* Record locked full stripes for RAID5/6 block group */
 	struct btrfs_full_stripe_locks_tree full_stripe_locks_root;
 
+	/* Flag indicating this block-group is placed on a sequential zone */
+	bool seq_zone;
 	/*
	 * Allocation offset for the block group to implement sequential
	 * allocation. This is used only with ZONED mode enabled.

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 77ebc4cc5b07..d026257a43a9 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1103,6 +1103,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		}
 	}
 
+	if (num_sequential > 0)
+		cache->seq_zone = true;
+
 	if (num_conventional > 0) {
 		/*
		 * Avoid calling calculate_alloc_pointer() for new BG. It
@@ -1223,3 +1226,29 @@ void btrfs_free_redirty_list(struct btrfs_transaction *trans)
 	}
 	spin_unlock(&trans->releasing_ebs_lock);
 }
+
+bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct btrfs_block_group *cache;
+	bool ret = false;
+
+	if (!btrfs_is_zoned(fs_info))
+		return false;
+
+	if (!fs_info->max_zone_append_size)
+		return false;
+
+	if (!is_data_inode(&inode->vfs_inode))
+		return false;
+
+	cache = btrfs_lookup_block_group(fs_info, em->block_start);
+	ASSERT(cache);
+	if (!cache)
+		return false;
+
+	ret = cache->seq_zone;
+	btrfs_put_block_group(cache);
+
+	return ret;
+}

diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 331951978487..92888eb86055 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -46,6 +46,7 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
 void btrfs_redirty_list_add(struct btrfs_transaction *trans,
			    struct extent_buffer *eb);
 void btrfs_free_redirty_list(struct btrfs_transaction *trans);
+bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
				     struct blk_zone *zone)
@@ -134,6 +135,10 @@ static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans,
				  struct extent_buffer *eb) { }
 static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { }
 
+static inline bool btrfs_use_zone_append(struct btrfs_inode *inode,
+					 struct extent_map *em)
+{
+	return false;
+}
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Fri Jan 22 06:21:26 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038425
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn
Subject: [PATCH v13 26/42] btrfs: save irq flags when looking up an ordered extent
Date: Fri, 22 Jan 2021 15:21:26 +0900
Message-Id: <8537c12ab50510bf029ff6c780a0ce2a850fa603.1611295439.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Johannes Thumshirn

A following patch will add another caller of
btrfs_lookup_ordered_extent() from a bio endio context.

btrfs_lookup_ordered_extent() uses spin_lock_irq(), which unconditionally
disables interrupts. Change this to spin_lock_irqsave() so interrupts are
not unconditionally disabled and re-enabled.
Signed-off-by: Johannes Thumshirn
Reviewed-by: Josef Bacik
---
 fs/btrfs/ordered-data.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 4dd935d602b8..538378fe0853 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -757,9 +757,10 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *ino
 	struct btrfs_ordered_inode_tree *tree;
 	struct rb_node *node;
 	struct btrfs_ordered_extent *entry = NULL;
+	unsigned long flags;
 
 	tree = &inode->ordered_tree;
-	spin_lock_irq(&tree->lock);
+	spin_lock_irqsave(&tree->lock, flags);
 	node = tree_search(tree, file_offset);
 	if (!node)
 		goto out;
@@ -770,7 +771,7 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *ino
 	if (entry)
 		refcount_inc(&entry->refs);
 out:
-	spin_unlock_irq(&tree->lock);
+	spin_unlock_irqrestore(&tree->lock, flags);
 
 	return entry;
 }

From patchwork Fri Jan 22 06:21:27 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038439
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Johannes Thumshirn
Subject: [PATCH v13 27/42] btrfs: use ZONE_APPEND write for ZONED btrfs
Date: Fri, 22 Jan 2021 15:21:27 +0900
Message-Id: <3ce68f36407d9aa3665c5d5b444382650a6e1967.1611295439.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

This commit enables zone append writing for zoned btrfs. When using zone
append, a bio is issued to the start of a target zone and the device
decides where inside the zone to place it. Upon completion the device
reports the actual written position back to the host.

Three parts are necessary to enable zone append in btrfs. First, modify
the bio to use REQ_OP_ZONE_APPEND in btrfs_submit_bio_hook() and adjust
bi_sector to point to the beginning of the zone.

Second, record the returned physical address (and disk/partno) in the
ordered extent in end_bio_extent_writepage() after the bio has completed.
We cannot resolve the physical address to the logical address there,
because we can neither take locks nor allocate a buffer in that end_bio
context. So we record the physical address and resolve it later, in
btrfs_finish_ordered_io().

Finally, rewrite the logical addresses of the extent mapping and checksum
data according to the physical address (using __btrfs_rmap_block). If the
returned address matches the originally allocated address, we can skip
the rewriting.

Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/extent_io.c    | 15 +++++++--
 fs/btrfs/file.c         |  6 +++-
 fs/btrfs/inode.c        |  4 +++
 fs/btrfs/ordered-data.c |  3 ++
 fs/btrfs/ordered-data.h |  8 +++++
 fs/btrfs/volumes.c      | 15 +++++++++
 fs/btrfs/zoned.c        | 73 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h        | 12 +++++++
 8 files changed, 133 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b9fefa624760..e0d212fd5678 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2733,6 +2733,7 @@ static void end_bio_extent_writepage(struct bio *bio)
 	u64 start;
 	u64 end;
 	struct bvec_iter_all iter_all;
+	bool first_bvec = true;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
@@ -2759,6 +2760,11 @@ static void end_bio_extent_writepage(struct bio *bio)
 		start = page_offset(page);
 		end = start + bvec->bv_offset + bvec->bv_len - 1;
 
+		if (first_bvec) {
+			btrfs_record_physical_zoned(inode, start, bio);
+			first_bvec = false;
+		}
+
 		end_extent_writepage(page, error, start, end);
 		end_page_writeback(page);
 	}
@@ -3581,6 +3587,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	struct extent_map *em;
 	int ret = 0;
 	int nr = 0;
+	int opf = REQ_OP_WRITE;
 	const unsigned int write_flags = wbc_to_write_flags(wbc);
 	bool compressed;
 
@@ -3627,6 +3634,10 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		/* Note that em_end from extent_map_end() is exclusive */
 		iosize = min(em_end, end + 1) - cur;
+
+		if (btrfs_use_zone_append(inode, em))
+			opf = REQ_OP_ZONE_APPEND;
+
 		free_extent_map(em);
 		em = NULL;
 
@@ -3652,8 +3663,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 				page->index, cur, end);
 		}
 
-		ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc,
-					 page, disk_bytenr, iosize,
+		ret = submit_extent_page(opf | write_flags, wbc, page,
+					 disk_bytenr, iosize,
					 cur - page_offset(page), &epd->bio,
					 end_bio_extent_writepage,
					 0, 0, 0, false);

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d81ae1f518f2..eaa1e473e75e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2174,8 +2174,12 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
	 * commit waits for their completion, to avoid data loss if we fsync,
	 * the current transaction commits before the ordered extents complete
	 * and a power failure happens right after that.
+	 *
+	 * For zoned btrfs, if a write IO uses a ZONE_APPEND command, the
+	 * logical address recorded in the ordered extent may change. We
+	 * need to wait for the IO to stabilize the logical address.
	 */
-	if (full_sync) {
+	if (full_sync || btrfs_is_zoned(fs_info)) {
 		ret = btrfs_wait_ordered_range(inode, start, len);
 	} else {
 		/*

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 286eee122657..c67bfe9a8434 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -50,6 +50,7 @@
 #include "delalloc-space.h"
 #include "block-group.h"
 #include "space-info.h"
+#include "zoned.h"
 
 struct btrfs_iget_args {
 	u64 ino;
@@ -2878,6 +2879,9 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		goto out;
 	}
 
+	if (ordered_extent->disk)
+		btrfs_rewrite_logical_zoned(ordered_extent);
+
 	btrfs_free_io_failure_record(inode, start, end);
 
 	if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) {

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 538378fe0853..e39744a14f0a 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -199,6 +199,9 @@ static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset
 	entry->compress_type = compress_type;
 	entry->truncated_len = (u64)-1;
 	entry->qgroup_rsv = ret;
+	entry->physical = (u64)-1;
+	entry->disk = NULL;
+	entry->partno = (u8)-1;
 
 	if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE)
 		set_bit(type, &entry->flags);

diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 3bf2f62fce5c..a74c459bbfac 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -127,6 +127,14 @@ struct btrfs_ordered_extent {
 	struct completion completion;
 	struct btrfs_work flush_work;
 	struct list_head work_list;
+
+	/*
+	 * used to reverse-map physical address returned from ZONE_APPEND
+	 * write command in a workqueue context.
+	 */
+	u64 physical;
+	struct gendisk *disk;
+	u8 partno;
 };

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e69754af2eba..4cb5e940356e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6507,6 +6507,21 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
 	btrfs_io_bio(bio)->device = dev;
 	bio->bi_end_io = btrfs_end_bio;
 	bio->bi_iter.bi_sector = physical >> 9;
+	/*
+	 * For zone append writing, bi_sector must point the beginning of the
+	 * zone
+	 */
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		if (btrfs_dev_is_sequential(dev, physical)) {
+			u64 zone_start = round_down(physical,
+						    fs_info->zone_size);
+
+			bio->bi_iter.bi_sector = zone_start >> SECTOR_SHIFT;
+		} else {
+			bio->bi_opf &= ~REQ_OP_ZONE_APPEND;
+			bio->bi_opf |= REQ_OP_WRITE;
+		}
+	}
 	btrfs_debug_in_rcu(fs_info,
	"btrfs_map_bio: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u",
		bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector,

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index d026257a43a9..aa158735a1e6 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1252,3 +1252,76 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em)
 
 	return ret;
 }
+
+void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset,
+				 struct bio *bio)
+{
+	struct btrfs_ordered_extent *ordered;
+	u64 physical = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT;
+
+	if (bio_op(bio) != REQ_OP_ZONE_APPEND)
+		return;
+
+	ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset);
+	if
(WARN_ON(!ordered)) + return; + + ordered->physical = physical; + ordered->disk = bio->bi_disk; + ordered->partno = bio->bi_partno; + + btrfs_put_ordered_extent(ordered); +} + +void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered) +{ + struct extent_map_tree *em_tree; + struct extent_map *em; + struct inode *inode = ordered->inode; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_ordered_sum *sum; + struct block_device *bdev; + u64 orig_logical = ordered->disk_bytenr; + u64 *logical = NULL; + int nr, stripe_len; + + /* + * Zoned devices should not have partitions. So, we can assume it + * is 0. + */ + ASSERT(ordered->partno == 0); + bdev = bdgrab(ordered->disk->part0); + if (WARN_ON(!bdev)) + return; + + if (WARN_ON(btrfs_rmap_block(fs_info, orig_logical, bdev, + ordered->physical, &logical, &nr, + &stripe_len))) + goto out; + + WARN_ON(nr != 1); + + if (orig_logical == *logical) + goto out; + + ordered->disk_bytenr = *logical; + + em_tree = &BTRFS_I(inode)->extent_tree; + write_lock(&em_tree->lock); + em = search_extent_mapping(em_tree, ordered->file_offset, + ordered->num_bytes); + em->block_start = *logical; + free_extent_map(em); + write_unlock(&em_tree->lock); + + list_for_each_entry(sum, &ordered->list, list) { + if (*logical < orig_logical) + sum->bytenr -= orig_logical - *logical; + else + sum->bytenr += *logical - orig_logical; + } + +out: + kfree(logical); + bdput(bdev); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 92888eb86055..cf420964305f 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -47,6 +47,9 @@ void btrfs_redirty_list_add(struct btrfs_transaction *trans, struct extent_buffer *eb); void btrfs_free_redirty_list(struct btrfs_transaction *trans); bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em); +void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, + struct bio *bio); +void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered); #else 
/* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -139,6 +142,15 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em) { return false; } + +static inline void btrfs_record_physical_zoned(struct inode *inode, + u64 file_offset, struct bio *bio) +{ +} + +static inline void btrfs_rewrite_logical_zoned( + struct btrfs_ordered_extent *ordered) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Fri Jan 22 06:21:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038433
From: Naohiro Aota To:
linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v13 28/42] btrfs: enable zone append writing for direct IO Date: Fri, 22 Jan 2021 15:21:28 +0900 Message-Id: <0800322f509ef63bf4309d53742f5bfd53a8eb51.1611295439.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org As with buffered IO, enable zone append writing for direct IO when it is used on a zoned block device. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c67bfe9a8434..26de8158fbe8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7741,6 +7741,9 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, iomap->bdev = fs_info->fs_devices->latest_bdev; iomap->length = len; + if (write && btrfs_use_zone_append(BTRFS_I(inode), em)) + iomap->flags |= IOMAP_F_ZONE_APPEND; + free_extent_map(em); return 0; @@ -7967,6 +7970,8 @@ static void btrfs_end_dio_bio(struct bio *bio) if (err) dip->dio_bio->bi_status = err; + btrfs_record_physical_zoned(dip->inode, dip->logical_offset, bio); + bio_put(bio); btrfs_dio_private_put(dip); } @@ -8119,6 +8124,19 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, bio->bi_end_io = btrfs_end_dio_bio; btrfs_io_bio(bio)->logical = file_offset; + WARN_ON_ONCE(write && btrfs_is_zoned(fs_info) && + fs_info->max_zone_append_size && + bio_op(bio) != REQ_OP_ZONE_APPEND); + + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + status = extract_ordered_extent(BTRFS_I(inode), bio, + file_offset); + if (status) { + bio_put(bio); + goto out_err; + } + } + ASSERT(submit_len >= clone_len); submit_len -= clone_len; From patchwork Fri Jan 22 06:21:29 2021 Content-Type:
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038431 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v13 29/42] btrfs: introduce dedicated data write path for ZONED mode Date: Fri, 22 Jan 2021 15:21:29 +0900 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org If more than one IO is issued for one file extent, these IOs can be written to separate regions on a device. Since we cannot map one file extent to such separate areas, we need to follow the "one IO == one ordered extent" rule. The normal buffered, uncompressed, not pre-allocated write path (used by cow_file_range()) sometimes does not follow this rule: it can write only part of an ordered extent when given a specific region to write, e.g., when called from fdatasync(). Introduce a dedicated (uncompressed, buffered) data write path for ZONED mode. This write path CoWs the region and writes it at once. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 34 ++++++++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 26de8158fbe8..a5503af5369b 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1400,6 +1400,29 @@ static int cow_file_range_async(struct btrfs_inode *inode, return 0; } +static noinline int run_delalloc_zoned(struct btrfs_inode *inode, + struct page *locked_page, u64 start, + u64 end, int *page_started, + unsigned long *nr_written) +{ + int ret; + + ret = cow_file_range(inode, locked_page, start, end, + page_started, nr_written, 0); + if (ret) + return ret; + + if (*page_started) + return 0; + + __set_page_dirty_nobuffers(locked_page); + account_page_redirty(locked_page); + extent_write_locked_range(&inode->vfs_inode, start, end, WB_SYNC_ALL); + *page_started = 1; + + return 0; +} + static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes) { @@ -1879,17 +1902,24 @@ int btrfs_run_delalloc_range(struct
btrfs_inode *inode, struct page *locked_page { int ret; int force_cow = need_force_cow(inode, start, end); + const bool do_compress = inode_can_compress(inode) && + inode_need_compress(inode, start, end); + const bool zoned = btrfs_is_zoned(inode->root->fs_info); if (inode->flags & BTRFS_INODE_NODATACOW && !force_cow) { + ASSERT(!zoned); ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 1, nr_written); } else if (inode->flags & BTRFS_INODE_PREALLOC && !force_cow) { + ASSERT(!zoned); ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 0, nr_written); - } else if (!inode_can_compress(inode) || - !inode_need_compress(inode, start, end)) { + } else if (!do_compress && !zoned) { ret = cow_file_range(inode, locked_page, start, end, page_started, nr_written, 1); + } else if (!do_compress && zoned) { + ret = run_delalloc_zoned(inode, locked_page, start, end, + page_started, nr_written); } else { set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags); ret = cow_file_range_async(inode, wbc, locked_page, start, end, From patchwork Fri Jan 22 06:21:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038435
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v13 30/42] btrfs: serialize meta IOs on ZONED mode Date: Fri, 22 Jan 2021 15:21:30 +0900 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org We cannot use zone append for writing metadata, because the B-tree nodes have references to each other using logical addresses. Without knowing the addresses in advance, we cannot construct the tree in the first place. So we need to serialize write IOs for metadata. We cannot add a mutex around allocation and submission because metadata blocks are allocated in an earlier stage to build up B-trees. Add a zoned_meta_io_lock and hold it during metadata IO submission in btree_write_cache_pages() to serialize IOs. Furthermore, add a per-block-group metadata IO submission pointer "meta_write_pointer" to ensure sequential writing, which could otherwise be broken by writing back blocks of an unfinished transaction.
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.h | 1 + fs/btrfs/ctree.h | 1 + fs/btrfs/disk-io.c | 1 + fs/btrfs/extent_io.c | 25 ++++++++++++++++++++- fs/btrfs/zoned.c | 50 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 32 +++++++++++++++++++++++++++ 6 files changed, 109 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index a1d96c4cfa3b..19a22bf930c6 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -192,6 +192,7 @@ struct btrfs_block_group { */ u64 alloc_offset; u64 zone_unusable; + u64 meta_write_pointer; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 1b6f66575471..4e9c55171ddb 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -972,6 +972,7 @@ struct btrfs_fs_info { /* Max size to emit ZONE_APPEND write command */ u64 max_zone_append_size; + struct mutex zoned_meta_io_lock; #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index ba0ca953f7e5..a41bdf9312d6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2704,6 +2704,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) mutex_init(&fs_info->delete_unused_bgs_mutex); mutex_init(&fs_info->reloc_mutex); mutex_init(&fs_info->delalloc_root_mutex); + mutex_init(&fs_info->zoned_meta_io_lock); seqlock_init(&fs_info->profiles_lock); INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index e0d212fd5678..ed976f6e620c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -25,6 +25,7 @@ #include "backref.h" #include "disk-io.h" #include "zoned.h" +#include "block-group.h" static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -4074,6 +4075,7 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, struct extent_buffer 
**eb_context) { struct address_space *mapping = page->mapping; + struct btrfs_block_group *cache = NULL; struct extent_buffer *eb; int ret; @@ -4106,13 +4108,31 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, if (!ret) return 0; + if (!btrfs_check_meta_write_pointer(eb->fs_info, eb, &cache)) { + /* + * If for_sync, this hole will be filled with + * transaction commit. + */ + if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) + ret = -EAGAIN; + else + ret = 0; + free_extent_buffer(eb); + return ret; + } + *eb_context = eb; ret = lock_extent_buffer_for_io(eb, epd); if (ret <= 0) { + btrfs_revert_meta_write_pointer(cache, eb); + if (cache) + btrfs_put_block_group(cache); free_extent_buffer(eb); return ret; } + if (cache) + btrfs_put_block_group(cache); ret = write_one_eb(eb, wbc, epd); free_extent_buffer(eb); if (ret < 0) @@ -4158,6 +4178,7 @@ int btree_write_cache_pages(struct address_space *mapping, tag = PAGECACHE_TAG_TOWRITE; else tag = PAGECACHE_TAG_DIRTY; + btrfs_zoned_meta_io_lock(fs_info); retry: if (wbc->sync_mode == WB_SYNC_ALL) tag_pages_for_writeback(mapping, index, end); @@ -4198,7 +4219,7 @@ int btree_write_cache_pages(struct address_space *mapping, } if (ret < 0) { end_write_bio(&epd, ret); - return ret; + goto out; } /* * If something went wrong, don't allow any metadata write bio to be @@ -4233,6 +4254,8 @@ int btree_write_cache_pages(struct address_space *mapping, ret = -EROFS; end_write_bio(&epd, ret); } +out: + btrfs_zoned_meta_io_unlock(fs_info); return ret; } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index aa158735a1e6..b66f57119068 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1161,6 +1161,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) ret = -EIO; } + if (!ret) + cache->meta_write_pointer = cache->alloc_offset + cache->start; + kfree(alloc_offsets); free_extent_map(em); @@ -1325,3 +1328,50 @@ void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent
*ordered) kfree(logical); bdput(bdev); } + +bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret) +{ + struct btrfs_block_group *cache; + bool ret = true; + + if (!btrfs_is_zoned(fs_info)) + return true; + + cache = *cache_ret; + + if (cache && (eb->start < cache->start || + cache->start + cache->length <= eb->start)) { + btrfs_put_block_group(cache); + cache = NULL; + *cache_ret = NULL; + } + + if (!cache) + cache = btrfs_lookup_block_group(fs_info, eb->start); + + if (cache) { + if (cache->meta_write_pointer != eb->start) { + btrfs_put_block_group(cache); + cache = NULL; + ret = false; + } else { + cache->meta_write_pointer = eb->start + eb->len; + } + + *cache_ret = cache; + } + + return ret; +} + +void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, + struct extent_buffer *eb) +{ + if (!btrfs_is_zoned(eb->fs_info) || !cache) + return; + + ASSERT(cache->meta_write_pointer == eb->start + eb->len); + cache->meta_write_pointer = eb->start; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index cf420964305f..a42e120158ab 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -50,6 +50,11 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em); void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, struct bio *bio); void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered); +bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret); +void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, + struct extent_buffer *eb); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -151,6 +156,19 @@ static inline void btrfs_record_physical_zoned(struct inode *inode, static inline void btrfs_rewrite_logical_zoned( struct btrfs_ordered_extent *ordered) { } 
+static inline bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret) +{ + return true; +} + +static inline void btrfs_revert_meta_write_pointer( + struct btrfs_block_group *cache, + struct extent_buffer *eb) +{ +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) @@ -243,4 +261,18 @@ static inline bool btrfs_can_zone_reset(struct btrfs_device *device, return true; } +static inline void btrfs_zoned_meta_io_lock(struct btrfs_fs_info *fs_info) +{ + if (!btrfs_is_zoned(fs_info)) + return; + mutex_lock(&fs_info->zoned_meta_io_lock); +} + +static inline void btrfs_zoned_meta_io_unlock(struct btrfs_fs_info *fs_info) +{ + if (!btrfs_is_zoned(fs_info)) + return; + mutex_unlock(&fs_info->zoned_meta_io_lock); +} + #endif From patchwork Fri Jan 22 06:21:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038429
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 31/42] btrfs: wait existing extents before truncating
Date: Fri, 22 Jan 2021 15:21:31 +0900
Message-Id: <9abfa1e527228bdc44ef9f285cf0daf221b9a715.1611295439.git.naohiro.aota@wdc.com>

When truncating a file, file buffers which have already been allocated but
not yet written may be truncated. Truncating these buffers could break the
sequential write pattern in a block group if the truncated blocks are, for
example, followed by blocks allocated to another file. To avoid this
problem, always wait for write out of all unwritten buffers before
proceeding with the truncate execution.
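The waited-on range starts at the new size rounded up to a sector boundary. A minimal sketch of that rounding (a stand-in for the kernel's ALIGN() macro; `align_up` is a hypothetical name, not a btrfs symbol):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's ALIGN() macro: round x up to the next
 * multiple of the power-of-two a. The patch waits for ordered extents
 * in [ALIGN(newsize, sectorsize), (u64)-1], i.e. from the first sector
 * boundary at or above the new size to the end of the file. */
static inline uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}
```

With a 4 KiB sector size, a new size of 1 byte rounds up to 4096, so only already-full sectors below the new size are left out of the wait.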
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/inode.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a5503af5369b..06d15d77f170 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5169,6 +5169,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		btrfs_drew_write_unlock(&root->snapshot_lock);
 		btrfs_end_transaction(trans);
 	} else {
+		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+
+		if (btrfs_is_zoned(fs_info)) {
+			ret = btrfs_wait_ordered_range(
+					inode,
+					ALIGN(newsize, fs_info->sectorsize),
+					(u64)-1);
+			if (ret)
+				return ret;
+		}
 
 		/*
 		 * We're truncating a file that used to have good data down to

From patchwork Fri Jan 22 06:21:32 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038427
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 32/42] btrfs: avoid async metadata checksum on ZONED mode
Date: Fri, 22 Jan 2021 15:21:32 +0900

In ZONED mode, btrfs uses the per-FS zoned_meta_io_lock to serialize the
metadata write IOs. Even with this serialization, write bios sent from
btree_write_cache_pages can be reordered by the async checksum workers, as
these workers are per-CPU and not per-zone. To preserve write bio ordering,
disable the async metadata checksum on ZONED mode.

This does not result in lower performance with HDDs, as a single CPU core
is fast enough to checksum a single zone write stream at the maximum
possible bandwidth of the device. If multiple zones are written
simultaneously, HDD seek overhead lowers the achievable maximum bandwidth,
so again the per-zone checksum serialization does not affect performance.
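The resulting decision order in check_async_write() can be modeled as a small predicate. This is a sketch with simplified names, not the kernel structs:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of check_async_write() after this patch: offload checksumming
 * to the async workers only when none of the bail-out conditions hold.
 * The zoned check comes first so write bios keep their submit order. */
static bool use_async_csum(bool zoned, int sync_writers, bool csum_impl_fast)
{
	if (zoned)		/* new: keep bio ordering on ZONED */
		return false;
	if (sync_writers)	/* a synchronous writer is waiting */
		return false;
	if (csum_impl_fast)	/* checksums are cheap, do them inline */
		return false;
	return true;
}
```

On a zoned filesystem the predicate is false regardless of the other conditions, which is exactly the two-line change in the diff below.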
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a41bdf9312d6..5d14100ecf72 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -814,6 +814,8 @@ static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
 static int check_async_write(struct btrfs_fs_info *fs_info,
 			     struct btrfs_inode *bi)
 {
+	if (btrfs_is_zoned(fs_info))
+		return 0;
 	if (atomic_read(&bi->sync_writers))
 		return 0;
 	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))

From patchwork Fri Jan 22 06:21:33 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038451
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 33/42] btrfs: mark block groups to copy for device-replace
Date: Fri, 22 Jan 2021 15:21:33 +0900

This is patch 1/4 to support device-replace in ZONED mode.

We have two types of I/O during the device-replace process. One is an I/O
to "copy" (via the scrub functions) all the device extents from the source
device to the destination device. The other is an I/O to "clone" (via
handle_ops_on_dev_replace()) new incoming write I/Os from users to the
source device into the target device.

Cloning incoming I/Os can break the sequential write rule on the target
device: when a write is mapped into the middle of a block group, the I/O is
directed to the middle of a target device zone, which breaks the sequential
write requirement. However, the cloning function cannot simply be disabled,
since incoming I/Os targeting already-copied device extents must be cloned
so that the I/O is executed on the target device.

We cannot use dev_replace->cursor_{left,right} to determine whether a bio
goes to a not-yet-copied region. Because there is a time gap between
finishing btrfs_scrub_dev() and rewriting the mapping tree in
btrfs_dev_replace_finishing(), we can have a newly allocated device extent
which is never cloned nor copied. So the point is to copy only device
extents that already exist.

This patch introduces mark_block_group_to_copy() to mark existing block
groups as targets of copying. Then, handle_ops_on_dev_replace() and
dev-replace can check the flag to do their jobs.
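The flag is cleared per block group only once the last stripe of that group living on the source device has been scrubbed (btrfs_finish_block_group_to_copy() in the diff below). A sketch of that completion test, with plain arrays standing in for map->stripes[] — an illustration under those assumptions, not the kernel structs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the completion test: to_copy may be cleared only when the
 * stripe just scrubbed is the last stripe of this block group that
 * lives on the source device; otherwise the group stays marked (and
 * read-only) until the remaining stripes are done. */
static bool last_stripe_on_srcdev(const int *stripe_dev,
				  const unsigned long long *stripe_phys,
				  size_t num_stripes, int srcdev,
				  unsigned long long physical)
{
	int num_extents = 0, cur_extent = 0;
	size_t i;

	for (i = 0; i < num_stripes; i++) {
		if (stripe_dev[i] != srcdev)
			continue;
		num_extents++;
		if (stripe_phys[i] == physical)
			cur_extent = (int)i;
	}
	/* more stripes of the source device remain after this one */
	if (num_extents > 1 && cur_extent < num_extents - 1)
		return false;
	return true;
}
```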
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.h |   1 +
 fs/btrfs/dev-replace.c | 182 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/dev-replace.h |   3 +
 fs/btrfs/scrub.c       |  17 ++++
 4 files changed, 203 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 19a22bf930c6..3dec66ed36cb 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -95,6 +95,7 @@ struct btrfs_block_group {
 	unsigned int iref:1;
 	unsigned int has_caching_ctl:1;
 	unsigned int removed:1;
+	unsigned int to_copy:1;
 
 	int disk_cache_state;
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index bc73f798ce3a..b7f84fe45368 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -22,6 +22,7 @@
 #include "dev-replace.h"
 #include "sysfs.h"
 #include "zoned.h"
+#include "block-group.h"
 
 /*
  * Device replace overview
@@ -459,6 +460,183 @@ static char* btrfs_dev_name(struct btrfs_device *device)
 	return rcu_str_deref(device->name);
 }
 
+static int mark_block_group_to_copy(struct btrfs_fs_info *fs_info,
+				    struct btrfs_device *src_dev)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_root *root = fs_info->dev_root;
+	struct btrfs_dev_extent *dev_extent = NULL;
+	struct btrfs_block_group *cache;
+	struct btrfs_trans_handle *trans;
+	int ret = 0;
+	u64 chunk_offset;
+
+	/* Do not use "to_copy" on non-ZONED for now */
+	if (!btrfs_is_zoned(fs_info))
+		return 0;
+
+	mutex_lock(&fs_info->chunk_mutex);
+
+	/* Ensure we don't have pending new block group */
+	spin_lock(&fs_info->trans_lock);
+	while (fs_info->running_transaction &&
+	       !list_empty(&fs_info->running_transaction->dev_update_list)) {
+		spin_unlock(&fs_info->trans_lock);
+		mutex_unlock(&fs_info->chunk_mutex);
+		trans = btrfs_attach_transaction(root);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+			mutex_lock(&fs_info->chunk_mutex);
+			if (ret == -ENOENT)
+				continue;
+			else
+				goto unlock;
+		}
+
+		ret = btrfs_commit_transaction(trans);
+		mutex_lock(&fs_info->chunk_mutex);
+		if (ret)
+			goto unlock;
+
+		spin_lock(&fs_info->trans_lock);
+	}
+	spin_unlock(&fs_info->trans_lock);
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	path->reada = READA_FORWARD;
+	path->search_commit_root = 1;
+	path->skip_locking = 1;
+
+	key.objectid = src_dev->devid;
+	key.offset = 0;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto free_path;
+	if (ret > 0) {
+		if (path->slots[0] >=
+		    btrfs_header_nritems(path->nodes[0])) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret < 0)
+				goto free_path;
+			if (ret > 0) {
+				ret = 0;
+				goto free_path;
+			}
+		} else {
+			ret = 0;
+		}
+	}
+
+	while (1) {
+		struct extent_buffer *l = path->nodes[0];
+		int slot = path->slots[0];
+
+		btrfs_item_key_to_cpu(l, &found_key, slot);
+
+		if (found_key.objectid != src_dev->devid)
+			break;
+
+		if (found_key.type != BTRFS_DEV_EXTENT_KEY)
+			break;
+
+		if (found_key.offset < key.offset)
+			break;
+
+		dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
+
+		chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);
+
+		cache = btrfs_lookup_block_group(fs_info, chunk_offset);
+		if (!cache)
+			goto skip;
+
+		spin_lock(&cache->lock);
+		cache->to_copy = 1;
+		spin_unlock(&cache->lock);
+
+		btrfs_put_block_group(cache);
+
+skip:
+		ret = btrfs_next_item(root, path);
+		if (ret != 0) {
+			if (ret > 0)
+				ret = 0;
+			break;
+		}
+	}
+
+free_path:
+	btrfs_free_path(path);
+unlock:
+	mutex_unlock(&fs_info->chunk_mutex);
+
+	return ret;
+}
+
+bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev,
+				      struct btrfs_block_group *cache,
+				      u64 physical)
+{
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	struct extent_map *em;
+	struct map_lookup *map;
+	u64 chunk_offset = cache->start;
+	int num_extents, cur_extent;
+	int i;
+
+	/* Do not use "to_copy" on non-ZONED for now */
+	if (!btrfs_is_zoned(fs_info))
+		return true;
+
+	spin_lock(&cache->lock);
+	if (cache->removed) {
+		spin_unlock(&cache->lock);
+		return true;
+	}
+	spin_unlock(&cache->lock);
+
+	em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
+	ASSERT(!IS_ERR(em));
+	map = em->map_lookup;
+
+	num_extents = cur_extent = 0;
+	for (i = 0; i < map->num_stripes; i++) {
+		/* We have more device extent to copy */
+		if (srcdev != map->stripes[i].dev)
+			continue;
+
+		num_extents++;
+		if (physical == map->stripes[i].physical)
+			cur_extent = i;
+	}
+
+	free_extent_map(em);
+
+	if (num_extents > 1 && cur_extent < num_extents - 1) {
+		/*
+		 * Has more stripes on this device. Keep this BG
+		 * readonly until we finish all the stripes.
+		 */
+		return false;
+	}
+
+	/* Last stripe on this device */
+	spin_lock(&cache->lock);
+	cache->to_copy = 0;
+	spin_unlock(&cache->lock);
+
+	return true;
+}
+
 static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 		const char *tgtdev_name, u64 srcdevid, const char *srcdev_name,
 		int read_src)
@@ -500,6 +678,10 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	if (ret)
 		return ret;
 
+	ret = mark_block_group_to_copy(fs_info, src_device);
+	if (ret)
+		return ret;
+
 	down_write(&dev_replace->rwsem);
 	switch (dev_replace->replace_state) {
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index 60b70dacc299..3911049a5f23 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -18,5 +18,8 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info);
 void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info);
 int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info);
 int __pure btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace);
+bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev,
+				      struct btrfs_block_group *cache,
+				      u64 physical);
 
 #endif
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3a0a6b8ed6f2..b57c1184f330 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3564,6 +3564,17 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		if (!cache)
 			goto skip;
 
+		if (sctx->is_dev_replace && btrfs_is_zoned(fs_info)) {
+			spin_lock(&cache->lock);
+			if (!cache->to_copy) {
+				spin_unlock(&cache->lock);
+				ro_set = 0;
+				goto done;
+			}
+			spin_unlock(&cache->lock);
+		}
+
 		/*
 		 * Make sure that while we are scrubbing the corresponding block
 		 * group doesn't get its logical address and its device extents
@@ -3695,6 +3706,12 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
 		scrub_pause_off(fs_info);
 
+		if (sctx->is_dev_replace &&
+		    !btrfs_finish_block_group_to_copy(dev_replace->srcdev,
+						      cache, found_key.offset))
+			ro_set = 0;
+
+done:
 		down_write(&dev_replace->rwsem);
 		dev_replace->cursor_left = dev_replace->cursor_right;
 		dev_replace->item_needs_writeback = 1;

From patchwork Fri Jan 22 06:21:34 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038437
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 34/42] btrfs: implement cloning for ZONED device-replace
Date: Fri, 22 Jan 2021 15:21:34 +0900

This is patch 2/4 to implement device-replace for ZONED mode.

On zoned mode, a block group must be either copied (from the source device
to the destination device) or cloned (to both devices). This commit
implements the cloning part: if a block group targeted by an IO is marked
to copy, we should not clone the IO to the destination device, because the
block group will eventually be copied by the replace process.

This commit also handles cloning of device reset.
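The new gate in the write-cloning path reduces to one condition. A sketch with booleans standing in for the fs_info and block-group lookups (simplified names, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the cloning decision: on a zoned filesystem, a write into
 * a block group still marked to_copy is not duplicated to the replace
 * target, since the scrub copy pass will transfer it later; every
 * other write is cloned as before. */
static bool should_clone_write(bool zoned, bool bg_to_copy)
{
	if (zoned && bg_to_copy)
		return false;	/* the scrub copy pass handles this BG */
	return true;		/* duplicate the write to the target */
}
```

This keeps the "copy or clone exactly once" invariant: each block group is handled by exactly one of the two mechanisms.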
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent-tree.c | 57 +++++++++++++++++++++++++++++-------------
 fs/btrfs/volumes.c     | 33 ++++++++++++++++++++++--
 fs/btrfs/zoned.c       | 11 ++++++++
 3 files changed, 84 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6a644f64b22e..1317f5d61024 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -35,6 +35,7 @@
 #include "discard.h"
 #include "rcu-string.h"
 #include "zoned.h"
+#include "dev-replace.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -1300,6 +1301,46 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 	return ret;
 }
 
+static int do_discard_extent(struct btrfs_bio_stripe *stripe, u64 *bytes)
+{
+	struct btrfs_device *dev = stripe->dev;
+	struct btrfs_fs_info *fs_info = dev->fs_info;
+	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+	u64 phys = stripe->physical;
+	u64 len = stripe->length;
+	u64 discarded = 0;
+	int ret = 0;
+
+	/* Zone reset in ZONED mode */
+	if (btrfs_can_zone_reset(dev, phys, len)) {
+		u64 src_disc;
+
+		ret = btrfs_reset_device_zone(dev, phys, len, &discarded);
+		if (ret)
+			goto out;
+
+		if (!btrfs_dev_replace_is_ongoing(dev_replace) ||
+		    dev != dev_replace->srcdev)
+			goto out;
+
+		src_disc = discarded;
+
+		/* send to replace target as well */
+		ret = btrfs_reset_device_zone(dev_replace->tgtdev, phys, len,
+					      &discarded);
+		discarded += src_disc;
+	} else if (blk_queue_discard(bdev_get_queue(stripe->dev->bdev))) {
+		ret = btrfs_issue_discard(dev->bdev, phys, len, &discarded);
+	} else {
+		ret = 0;
+		*bytes = 0;
+	}
+
+out:
+	*bytes = discarded;
+	return ret;
+}
+
 int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 			 u64 num_bytes, u64 *actual_bytes)
 {
@@ -1333,28 +1374,14 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 	stripe = bbio->stripes;
 	for (i = 0; i < bbio->num_stripes; i++, stripe++) {
-		struct btrfs_device *dev = stripe->dev;
-		u64 physical = stripe->physical;
-		u64 length = stripe->length;
 		u64 bytes;
-		struct request_queue *req_q;
 
 		if (!stripe->dev->bdev) {
 			ASSERT(btrfs_test_opt(fs_info, DEGRADED));
 			continue;
 		}
 
-		req_q = bdev_get_queue(stripe->dev->bdev);
-		/* Zone reset in ZONED mode */
-		if (btrfs_can_zone_reset(dev, physical, length))
-			ret = btrfs_reset_device_zone(dev, physical,
-						      length, &bytes);
-		else if (blk_queue_discard(req_q))
-			ret = btrfs_issue_discard(dev->bdev, physical,
-						  length, &bytes);
-		else
-			continue;
-
+		ret = do_discard_extent(stripe, &bytes);
 		if (!ret) {
 			discarded_bytes += bytes;
 		} else if (ret != -EOPNOTSUPP) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4cb5e940356e..a99735dda515 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5973,9 +5973,29 @@ static int get_extra_mirror_from_replace(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
+static bool is_block_group_to_copy(struct btrfs_fs_info *fs_info, u64 logical)
+{
+	struct btrfs_block_group *cache;
+	bool ret;
+
+	/* non-ZONED mode does not use "to_copy" flag */
+	if (!btrfs_is_zoned(fs_info))
+		return false;
+
+	cache = btrfs_lookup_block_group(fs_info, logical);
+
+	spin_lock(&cache->lock);
+	ret = cache->to_copy;
+	spin_unlock(&cache->lock);
+
+	btrfs_put_block_group(cache);
+	return ret;
+}
+
 static void handle_ops_on_dev_replace(enum btrfs_map_op op,
 				      struct btrfs_bio **bbio_ret,
 				      struct btrfs_dev_replace *dev_replace,
+				      u64 logical,
 				      int *num_stripes_ret, int *max_errors_ret)
 {
 	struct btrfs_bio *bbio = *bbio_ret;
@@ -5988,6 +6008,15 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op,
 	if (op == BTRFS_MAP_WRITE) {
 		int index_where_to_add;
 
+		/*
+		 * A block group which has "to_copy" set will eventually be
+		 * copied by the dev-replace process. We can avoid cloning IO
+		 * here.
+		 */
+		if (is_block_group_to_copy(dev_replace->srcdev->fs_info,
+					   logical))
+			return;
+
 		/*
 		 * duplicate the write operations while the dev replace
 		 * procedure is running. Since the copying of the old disk to
@@ -6383,8 +6412,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL &&
 	    need_full_stripe(op)) {
-		handle_ops_on_dev_replace(op, &bbio, dev_replace, &num_stripes,
-					  &max_errors);
+		handle_ops_on_dev_replace(op, &bbio, dev_replace, logical,
+					  &num_stripes, &max_errors);
 	}
 
 	*bbio_ret = bbio;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index b66f57119068..a9079e267676 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -11,6 +11,7 @@
 #include "disk-io.h"
 #include "block-group.h"
 #include "transaction.h"
+#include "dev-replace.h"
 
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -1039,6 +1040,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	for (i = 0; i < map->num_stripes; i++) {
 		bool is_sequential;
 		struct blk_zone zone;
+		struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+		int dev_replace_is_ongoing = 0;
 
 		device = map->stripes[i].dev;
 		physical = map->stripes[i].physical;
@@ -1065,6 +1068,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		 */
 		btrfs_dev_clear_zone_empty(device, physical);
 
+		down_read(&dev_replace->rwsem);
+		dev_replace_is_ongoing =
+			btrfs_dev_replace_is_ongoing(dev_replace);
+		if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL)
+			btrfs_dev_clear_zone_empty(dev_replace->tgtdev,
+						   physical);
+		up_read(&dev_replace->rwsem);
+
 		/*
 		 * The group is mapped to a sequential zone. Get the zone write
 		 * pointer to determine the allocation offset within the zone.
From patchwork Fri Jan 22 06:21:35 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038449
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 35/42] btrfs: implement copying for ZONED device-replace
Date: Fri, 22 Jan 2021 15:21:35 +0900
Message-Id: <6871488842cd4657fac35246c3f3b79cbbbdbe7c.1611295439.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

This is the 3/4 patch to implement device-replace on ZONED mode.

This commit implements the copying: it tracks the write pointer during the
device-replace process. Since device-replace copies only the used extents on
the source device, we have to fill the gaps between them to honor the
sequential write rule on the target device.

The device-replace process in ZONED mode must copy or clone all the extents
in the source device exactly once. So, we need to ensure that allocations
started just before the dev-replace process have their corresponding extent
information in the B-trees. finish_extent_writes_for_zoned() implements that
functionality, which basically is the removed code of commit 042528f8d840
("Btrfs: fix block group remaining RO forever after error during device
replace").
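The sequential-write invariant that fill_writer_pointer_gap() maintains can be sketched in userspace C. This is a hypothetical model, not the kernel code: the `target_zone` struct, `replace_write()`, and the byte-array "device" are invented for illustration; the real implementation issues zeroing through btrfs_zoned_issue_zeroout(), which wraps blkdev_issue_zeroout().

```c
/* Hypothetical userspace model of fill_writer_pointer_gap(): a zoned
 * target only accepts writes at its write pointer, so any gap between
 * the pointer and the next copied extent must be zero-filled first. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ZONE_SIZE 4096

struct target_zone {
    uint8_t data[ZONE_SIZE];
    uint64_t write_pointer;   /* next byte that may be written */
};

/* Zero-fill [write_pointer, physical) so the copied extent at
 * `physical` lands sequentially. */
static int fill_writer_pointer_gap(struct target_zone *z, uint64_t physical)
{
    if (z->write_pointer < physical) {
        memset(z->data + z->write_pointer, 0,
               physical - z->write_pointer);
        z->write_pointer = physical;
    }
    return 0;
}

/* Write one copied extent; the gap must be filled first. */
static int replace_write(struct target_zone *z, uint64_t physical,
                         const uint8_t *buf, uint64_t len)
{
    int ret = fill_writer_pointer_gap(z, physical);

    if (ret)
        return ret;
    assert(z->write_pointer == physical); /* sequential write rule */
    memcpy(z->data + physical, buf, len);
    z->write_pointer = physical + len;
    return 0;
}
```

In the kernel the same check happens per write bio in scrub_add_page_to_wr_bio(), and the pointer advances in scrub_wr_submit(), as the diff below shows.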
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/scrub.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.c | 12 +++++++ fs/btrfs/zoned.h | 8 +++++ 3 files changed, 106 insertions(+) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index b57c1184f330..b03c3629fb12 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -166,6 +166,7 @@ struct scrub_ctx { int pages_per_rd_bio; int is_dev_replace; + u64 write_pointer; struct scrub_bio *wr_curr_bio; struct mutex wr_lock; @@ -1619,6 +1620,25 @@ static int scrub_write_page_to_dev_replace(struct scrub_block *sblock, return scrub_add_page_to_wr_bio(sblock->sctx, spage); } +static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) +{ + int ret = 0; + u64 length; + + if (!btrfs_is_zoned(sctx->fs_info)) + return 0; + + if (sctx->write_pointer < physical) { + length = physical - sctx->write_pointer; + + ret = btrfs_zoned_issue_zeroout(sctx->wr_tgtdev, + sctx->write_pointer, length); + if (!ret) + sctx->write_pointer = physical; + } + return ret; +} + static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, struct scrub_page *spage) { @@ -1641,6 +1661,13 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, if (sbio->page_count == 0) { struct bio *bio; + ret = fill_writer_pointer_gap(sctx, + spage->physical_for_dev_replace); + if (ret) { + mutex_unlock(&sctx->wr_lock); + return ret; + } + sbio->physical = spage->physical_for_dev_replace; sbio->logical = spage->logical; sbio->dev = sctx->wr_tgtdev; @@ -1705,6 +1732,10 @@ static void scrub_wr_submit(struct scrub_ctx *sctx) * doubled the write performance on spinning disks when measured * with Linux 3.5 */ btrfsic_submit_bio(sbio->bio); + + if (btrfs_is_zoned(sctx->fs_info)) + sctx->write_pointer = sbio->physical + + sbio->page_count * PAGE_SIZE; } static void scrub_wr_bio_end_io(struct bio *bio) @@ -3028,6 +3059,21 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, return ret < 0 ? 
ret : 0; } +static void sync_replace_for_zoned(struct scrub_ctx *sctx) +{ + if (!btrfs_is_zoned(sctx->fs_info)) + return; + + sctx->flush_all_writes = true; + scrub_submit(sctx); + mutex_lock(&sctx->wr_lock); + scrub_wr_submit(sctx); + mutex_unlock(&sctx->wr_lock); + + wait_event(sctx->list_wait, + atomic_read(&sctx->bios_in_flight) == 0); +} + static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, struct map_lookup *map, struct btrfs_device *scrub_dev, @@ -3168,6 +3214,14 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, */ blk_start_plug(&plug); + if (sctx->is_dev_replace && + btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) { + mutex_lock(&sctx->wr_lock); + sctx->write_pointer = physical; + mutex_unlock(&sctx->wr_lock); + sctx->flush_all_writes = true; + } + /* * now find all extents for each stripe and scrub them */ @@ -3356,6 +3410,9 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, if (ret) goto out; + if (sctx->is_dev_replace) + sync_replace_for_zoned(sctx); + if (extent_logical + extent_len < key.objectid + bytes) { if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) { @@ -3478,6 +3535,25 @@ static noinline_for_stack int scrub_chunk(struct scrub_ctx *sctx, return ret; } +static int finish_extent_writes_for_zoned(struct btrfs_root *root, + struct btrfs_block_group *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct btrfs_trans_handle *trans; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + btrfs_wait_block_group_reservations(cache); + btrfs_wait_nocow_writers(cache); + btrfs_wait_ordered_roots(fs_info, U64_MAX, cache->start, cache->length); + + trans = btrfs_join_transaction(root); + if (IS_ERR(trans)) + return PTR_ERR(trans); + return btrfs_commit_transaction(trans); +} + static noinline_for_stack int scrub_enumerate_chunks(struct scrub_ctx *sctx, struct btrfs_device *scrub_dev, u64 start, u64 end) @@ -3633,6 +3709,16 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, * group is not RO. 
*/ ret = btrfs_inc_block_group_ro(cache, sctx->is_dev_replace); + if (!ret && sctx->is_dev_replace) { + ret = finish_extent_writes_for_zoned(root, cache); + if (ret) { + btrfs_dec_block_group_ro(cache); + scrub_pause_off(fs_info); + btrfs_put_block_group(cache); + break; + } + } + if (ret == 0) { ro_set = 1; } else if (ret == -ENOSPC && !sctx->is_dev_replace) { diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index a9079e267676..360165bd0396 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1386,3 +1386,15 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, ASSERT(cache->meta_write_pointer == eb->start + eb->len); cache->meta_write_pointer = eb->start; } + +int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, + u64 length) +{ + if (!btrfs_dev_is_sequential(device, physical)) + return -EOPNOTSUPP; + + return blkdev_issue_zeroout(device->bdev, + physical >> SECTOR_SHIFT, + length >> SECTOR_SHIFT, + GFP_NOFS, 0); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index a42e120158ab..a9698470c08e 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -55,6 +55,8 @@ bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, struct btrfs_block_group **cache_ret); void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, struct extent_buffer *eb); +int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, + u64 length); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -169,6 +171,12 @@ static inline void btrfs_revert_meta_write_pointer( { } +static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device, + u64 physical, u64 length) +{ + return -EOPNOTSUPP; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Fri Jan 22 06:21:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12038447
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 36/42] btrfs: support dev-replace in ZONED mode
Date: Fri, 22 Jan 2021 15:21:36 +0900
X-Mailing-List: linux-fsdevel@vger.kernel.org

This is the 4/4 patch to implement device-replace on ZONED mode.

Even after the copying is done, the write pointers of the source device and the destination device may not be synchronized.
For example, when the last allocated extent is freed before device-replace process, the extent is not copied, leaving a hole there. This patch synchronize the write pointers by writing zeros to the destination device. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/scrub.c | 39 +++++++++++++++++++++++++ fs/btrfs/zoned.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 9 ++++++ 3 files changed, 122 insertions(+) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index b03c3629fb12..2f577f3b1c31 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1628,6 +1628,9 @@ static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) if (!btrfs_is_zoned(sctx->fs_info)) return 0; + if (!btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) + return 0; + if (sctx->write_pointer < physical) { length = physical - sctx->write_pointer; @@ -3074,6 +3077,31 @@ static void sync_replace_for_zoned(struct scrub_ctx *sctx) atomic_read(&sctx->bios_in_flight) == 0); } +static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical, + u64 physical, u64 physical_end) +{ + struct btrfs_fs_info *fs_info = sctx->fs_info; + int ret = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); + + mutex_lock(&sctx->wr_lock); + if (sctx->write_pointer < physical_end) { + ret = btrfs_sync_zone_write_pointer(sctx->wr_tgtdev, logical, + physical, + sctx->write_pointer); + if (ret) + btrfs_err(fs_info, "failed to recover write pointer"); + } + mutex_unlock(&sctx->wr_lock); + btrfs_dev_clear_zone_empty(sctx->wr_tgtdev, physical); + + return ret; +} + static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, struct map_lookup *map, struct btrfs_device *scrub_dev, @@ -3480,6 +3508,17 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, blk_finish_plug(&plug); btrfs_free_path(path); btrfs_free_path(ppath); + + if (sctx->is_dev_replace && ret >= 0) { + 
int ret2; + + ret2 = sync_write_pointer_for_zoned(sctx, base + offset, + map->stripes[num].physical, + physical_end); + if (ret2) + ret = ret2; + } + return ret < 0 ? ret : 0; } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 360165bd0396..6b3750abc22a 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -12,6 +12,7 @@ #include "block-group.h" #include "transaction.h" #include "dev-replace.h" +#include "space-info.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -1398,3 +1399,76 @@ int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, length >> SECTOR_SHIFT, GFP_NOFS, 0); } + +static int read_zone_info(struct btrfs_fs_info *fs_info, u64 logical, + struct blk_zone *zone) +{ + struct btrfs_bio *bbio = NULL; + u64 mapped_length = PAGE_SIZE; + unsigned int nofs_flag; + int nmirrors; + int i, ret; + + ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical, + &mapped_length, &bbio); + if (ret || !bbio || mapped_length < PAGE_SIZE) { + btrfs_put_bbio(bbio); + return -EIO; + } + + if (bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) + return -EINVAL; + + nofs_flag = memalloc_nofs_save(); + nmirrors = (int)bbio->num_stripes; + for (i = 0; i < nmirrors; i++) { + u64 physical = bbio->stripes[i].physical; + struct btrfs_device *dev = bbio->stripes[i].dev; + + /* Missing device */ + if (!dev->bdev) + continue; + + ret = btrfs_get_dev_zone(dev, physical, zone); + /* Failing device */ + if (ret == -EIO || ret == -EOPNOTSUPP) + continue; + break; + } + memalloc_nofs_restore(nofs_flag); + + return ret; +} + +/* + * Synchronize write pointer in a zone at @physical_start on @tgt_dev, by + * filling zeros between @physical_pos to a write pointer of dev-replace + * source device. 
+ */ +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos) +{ + struct btrfs_fs_info *fs_info = tgt_dev->fs_info; + struct blk_zone zone; + u64 length; + u64 wp; + int ret; + + if (!btrfs_dev_is_sequential(tgt_dev, physical_pos)) + return 0; + + ret = read_zone_info(fs_info, logical, &zone); + if (ret) + return ret; + + wp = physical_start + ((zone.wp - zone.start) << SECTOR_SHIFT); + + if (physical_pos == wp) + return 0; + + if (physical_pos > wp) + return -EUCLEAN; + + length = wp - physical_pos; + return btrfs_zoned_issue_zeroout(tgt_dev, physical_pos, length); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index a9698470c08e..8c203c0425e0 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -57,6 +57,8 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, struct extent_buffer *eb); int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length); +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -177,6 +179,13 @@ static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device, return -EOPNOTSUPP; } +static inline int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, + u64 logical, u64 physical_start, + u64 physical_pos) +{ + return -EOPNOTSUPP; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Fri Jan 22 06:21:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038445 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, 
DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
 MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT
 autolearn=unavailable autolearn_force=no version=3.4.0
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 37/42] btrfs: enable relocation in ZONED mode
Date: Fri, 22 Jan 2021 15:21:37 +0900
Message-Id: <7186d9b29e59a1658a55e7bc6a1998c89097754e.1611295439.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

To serialize allocation and submit_bio, we introduced a mutex around them. As
a result, preallocation must be completely disabled to avoid a deadlock.

Since the current relocation process relies on preallocation to move file
data extents, it must be handled another way. In ZONED mode, we just truncate
the inode to the size that we wanted to preallocate. Then, we flush dirty
pages on the file before finishing the relocation process.
run_delalloc_zoned() will handle all the allocation and submit IOs to the underlying layers. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/relocation.c | 34 ++++++++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 9f2289bcdde6..702986b83f6c 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2555,6 +2555,31 @@ static noinline_for_stack int prealloc_file_extent_cluster( if (ret) return ret; + /* + * In ZONED mode, we cannot preallocate the file region. Instead, we + * dirty and fiemap_write the region. + */ + if (btrfs_is_zoned(inode->root->fs_info)) { + struct btrfs_root *root = inode->root; + struct btrfs_trans_handle *trans; + + end = cluster->end - offset + 1; + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + inode->vfs_inode.i_ctime = current_time(&inode->vfs_inode); + i_size_write(&inode->vfs_inode, end); + ret = btrfs_update_inode(trans, root, inode); + if (ret) { + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + return ret; + } + + return btrfs_end_transaction(trans); + } + inode_lock(&inode->vfs_inode); for (nr = 0; nr < cluster->nr; nr++) { start = cluster->boundary[nr] - offset; @@ -2751,6 +2776,8 @@ static int relocate_file_extent_cluster(struct inode *inode, } } WARN_ON(nr != cluster->nr); + if (btrfs_is_zoned(fs_info) && !ret) + ret = btrfs_wait_ordered_range(inode, 0, (u64)-1); out: kfree(ra); return ret; @@ -3429,8 +3456,12 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, struct btrfs_path *path; struct btrfs_inode_item *item; struct extent_buffer *leaf; + u64 flags = BTRFS_INODE_NOCOMPRESS | BTRFS_INODE_PREALLOC; int ret; + if (btrfs_is_zoned(trans->fs_info)) + flags &= ~BTRFS_INODE_PREALLOC; + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -3445,8 +3476,7 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, 
btrfs_set_inode_generation(leaf, item, 1); btrfs_set_inode_size(leaf, item, 0); btrfs_set_inode_mode(leaf, item, S_IFREG | 0600); - btrfs_set_inode_flags(leaf, item, BTRFS_INODE_NOCOMPRESS | - BTRFS_INODE_PREALLOC); + btrfs_set_inode_flags(leaf, item, flags); btrfs_mark_buffer_dirty(leaf); out: btrfs_free_path(path); From patchwork Fri Jan 22 06:21:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038441 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 259D4C433DB for ; Fri, 22 Jan 2021 06:30:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D3A48235FF for ; Fri, 22 Jan 2021 06:30:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726993AbhAVGaN (ORCPT ); Fri, 22 Jan 2021 01:30:13 -0500 Received: from esa1.hgst.iphmx.com ([68.232.141.245]:51138 "EHLO esa1.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726980AbhAVG3U (ORCPT ); Fri, 22 Jan 2021 01:29:20 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1611296960; x=1642832960; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KniscivBnmSPeKpF8HfbtvbLDZRK/U7ytCGfrgiswNM=; b=D+YH+zn1mGY+XHULXIf2j4Fojxt9lu38/+OHKl2qOp2DnID/5gO0bXHg uA75YL7NEf/BmRl9ZosybU9uMvaG6kTUL6O/xBersYTKVW5RErgQCQfvH 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
 Christoph Hellwig, "Darrick J.
Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v13 38/42] btrfs: relocate block group to repair IO failure in ZONED
Date: Fri, 22 Jan 2021 15:21:38 +0900
Message-Id: <68afb2cd7ac6399a7b781cda41d9ad4b282288c7.1611295439.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

When btrfs finds a checksum error and the file system has a mirror of the
damaged data, btrfs reads the correct data from the mirror and writes it over
the damaged blocks. This repair, however, goes against the sequential write
rule.

We can consider three methods to repair an IO failure in ZONED mode:
(1) Reset and rewrite the damaged zone
(2) Allocate a new device extent and replace the damaged device extent with
    the new one
(3) Relocate the corresponding block group

Method (1) is most similar to the behavior on regular devices. However, it
also wipes non-damaged data in the same device extent, so it unnecessarily
degrades non-damaged data.

Method (2) is much like device replacing, but done within the same device. It
is safe because it keeps the device extent until the replacing finishes.
However, extending device-replace for this is non-trivial: it assumes
"src_dev->physical == dst_dev->physical", and the extent mapping replacing
function would need to be extended to support replacing a device extent
position within one device.

Method (3) invokes relocation of the damaged block group, so it is
straightforward to implement. It relocates all the mirrored device extents,
so it is, potentially, a more costly operation than method (1) or (2). But it
relocates only the used extents, which reduces the total IO size.

Let's apply method (3) for now. In the future, we can extend device-replace
and apply method (2).

To prevent a block group from being relocated multiple times by multiple IO
errors, this commit introduces a "relocating_repair" bit to show that it is
now being relocated to repair IO failures.
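The once-only trigger behind the "relocating_repair" bit can be sketched in userspace C. This is a hypothetical model, not the kernel code: a pthread mutex stands in for the block group's spinlock, and `claim_relocating_repair()` is an invented name; in btrfs_repair_one_zone() below, the caller that wins the test-and-set is the one that launches the repair kthread.

```c
/* Hypothetical userspace sketch of the relocating_repair bit: a
 * test-and-set under a lock so a block group is queued for repair
 * relocation at most once, however many IO errors report it. */
#include <assert.h>
#include <pthread.h>

struct block_group {
    pthread_mutex_t lock;     /* stands in for the kernel spinlock */
    int relocating_repair;    /* already queued for repair relocation? */
};

/* Returns 1 if this caller claimed the repair (and would start the
 * worker), 0 if a previous error already queued it. */
static int claim_relocating_repair(struct block_group *bg)
{
    int claimed = 0;

    pthread_mutex_lock(&bg->lock);
    if (!bg->relocating_repair) {
        bg->relocating_repair = 1;
        claimed = 1;          /* caller would kthread_run() the worker */
    }
    pthread_mutex_unlock(&bg->lock);
    return claimed;
}
```

Only the first claimant starts the worker; later IO errors on the same block group see the bit set and return without queuing duplicate relocations.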
Also it uses a new kthread "btrfs-relocating-repair", not to block IO path with relocating process. This commit also supports repairing in the scrub process. Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/block-group.h | 1 + fs/btrfs/extent_io.c | 3 ++ fs/btrfs/scrub.c | 3 ++ fs/btrfs/volumes.c | 71 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.h | 1 + 5 files changed, 79 insertions(+) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 3dec66ed36cb..36654bcd2a83 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -96,6 +96,7 @@ struct btrfs_block_group { unsigned int has_caching_ctl:1; unsigned int removed:1; unsigned int to_copy:1; + unsigned int relocating_repair:1; int disk_cache_state; diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index ed976f6e620c..70a296087711 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2258,6 +2258,9 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start, ASSERT(!(fs_info->sb->s_flags & SB_RDONLY)); BUG_ON(!mirror_num); + if (btrfs_is_zoned(fs_info)) + return btrfs_repair_one_zone(fs_info, logical); + bio = btrfs_io_bio_alloc(1); bio->bi_iter.bi_size = 0; map_length = length; diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 2f577f3b1c31..d0c47ef72d46 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -857,6 +857,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) have_csum = sblock_to_check->pagev[0]->have_csum; dev = sblock_to_check->pagev[0]->dev; + if (btrfs_is_zoned(fs_info) && !sctx->is_dev_replace) + return btrfs_repair_one_zone(fs_info, logical); + /* * We must use GFP_NOFS because the scrub task might be waiting for a * worker task executing this function and in turn a transaction commit diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a99735dda515..0f6a79e67666 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7990,3 +7990,74 @@ bool 
btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr) spin_unlock(&fs_info->swapfile_pins_lock); return node != NULL; } + +static int relocating_repair_kthread(void *data) +{ + struct btrfs_block_group *cache = (struct btrfs_block_group *) data; + struct btrfs_fs_info *fs_info = cache->fs_info; + u64 target; + int ret = 0; + + target = cache->start; + btrfs_put_block_group(cache); + + if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) { + btrfs_info(fs_info, + "zoned: skip relocating block group %llu to repair: EBUSY", + target); + return -EBUSY; + } + + mutex_lock(&fs_info->delete_unused_bgs_mutex); + + /* Ensure Block Group still exists */ + cache = btrfs_lookup_block_group(fs_info, target); + if (!cache) + goto out; + + if (!cache->relocating_repair) + goto out; + + ret = btrfs_may_alloc_data_chunk(fs_info, target); + if (ret < 0) + goto out; + + btrfs_info(fs_info, "zoned: relocating block group %llu to repair IO failure", + target); + ret = btrfs_relocate_chunk(fs_info, target); + +out: + if (cache) + btrfs_put_block_group(cache); + mutex_unlock(&fs_info->delete_unused_bgs_mutex); + btrfs_exclop_finish(fs_info); + + return ret; +} + +int btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical) +{ + struct btrfs_block_group *cache; + + /* Do not attempt to repair in degraded state */ + if (btrfs_test_opt(fs_info, DEGRADED)) + return 0; + + cache = btrfs_lookup_block_group(fs_info, logical); + if (!cache) + return 0; + + spin_lock(&cache->lock); + if (cache->relocating_repair) { + spin_unlock(&cache->lock); + btrfs_put_block_group(cache); + return 0; + } + cache->relocating_repair = 1; + spin_unlock(&cache->lock); + + kthread_run(relocating_repair_kthread, cache, + "btrfs-relocating-repair"); + + return 0; +} diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 0bcf87a9e594..54f475e0c702 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -597,5 +597,6 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, int 
btrfs_bg_type_to_factor(u64 flags); const char *btrfs_bg_type_to_raid_name(u64 flags); int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info); +int btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical); #endif From patchwork Fri Jan 22 06:21:39 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038501
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota , Josef Bacik , Johannes Thumshirn Subject: [PATCH v13 39/42] btrfs: split alloc_log_tree() Date: Fri, 22 Jan 2021 15:21:39 +0900 Message-Id: <074f66f9d68a916d5c897b08c80860375fbfb974.1611295439.git.naohiro.aota@wdc.com> X-Mailing-List: linux-fsdevel@vger.kernel.org This is a preparation for the next patch. This commit splits alloc_log_tree() into the part that allocates the tree structure (which remains in alloc_log_tree()) and the part that allocates the tree node (moved into btrfs_alloc_log_tree_node()). The latter is also exported so it can be used by the next patch. Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 33 +++++++++++++++++++++++++++------ fs/btrfs/disk-io.h | 2 ++ 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 5d14100ecf72..2e2f09a46f45 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1197,7 +1197,6 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { struct btrfs_root *root; - struct extent_buffer *leaf; root = btrfs_alloc_root(fs_info, BTRFS_TREE_LOG_OBJECTID, GFP_NOFS); if (!root) @@ -1207,6 +1206,14 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, root->root_key.type = BTRFS_ROOT_ITEM_KEY; root->root_key.offset = BTRFS_TREE_LOG_OBJECTID; + return root; +} + +int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, + struct btrfs_root *root) +{ + struct extent_buffer *leaf; + /* * DON'T set SHAREABLE bit for log trees.
* @@ -1219,26 +1226,33 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, leaf = btrfs_alloc_tree_block(trans, root, 0, BTRFS_TREE_LOG_OBJECTID, NULL, 0, 0, 0, BTRFS_NESTING_NORMAL); - if (IS_ERR(leaf)) { - btrfs_put_root(root); - return ERR_CAST(leaf); - } + if (IS_ERR(leaf)) + return PTR_ERR(leaf); root->node = leaf; btrfs_mark_buffer_dirty(root->node); btrfs_tree_unlock(root->node); - return root; + + return 0; } int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { struct btrfs_root *log_root; + int ret; log_root = alloc_log_tree(trans, fs_info); if (IS_ERR(log_root)) return PTR_ERR(log_root); + + ret = btrfs_alloc_log_tree_node(trans, log_root); + if (ret) { + btrfs_put_root(log_root); + return ret; + } + WARN_ON(fs_info->log_root_tree); fs_info->log_root_tree = log_root; return 0; @@ -1250,11 +1264,18 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *log_root; struct btrfs_inode_item *inode_item; + int ret; log_root = alloc_log_tree(trans, fs_info); if (IS_ERR(log_root)) return PTR_ERR(log_root); + ret = btrfs_alloc_log_tree_node(trans, log_root); + if (ret) { + btrfs_put_root(log_root); + return ret; + } + log_root->last_trans = trans->transid; log_root->root_key.offset = root->root_key.objectid; diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 9f4a2a1e3d36..0e7e9526b6a8 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -120,6 +120,8 @@ blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio, extent_submit_bio_start_t *submit_bio_start); blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio, int mirror_num); +int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, + struct btrfs_root *root); int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int btrfs_add_log_tree(struct btrfs_trans_handle *trans, From patchwork 
Fri Jan 22 06:21:40 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038507 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota , Josef Bacik , Johannes Thumshirn Subject: [PATCH v13 40/42] btrfs: extend zoned allocator to use dedicated tree-log block group Date: Fri, 22 Jan 2021 15:21:40 +0900 Message-Id: <4449f11b454278c57ffd6592b634012675b1539b.1611295439.git.naohiro.aota@wdc.com> X-Mailing-List: linux-fsdevel@vger.kernel.org This is the 1/3 patch to enable tree-log on ZONED mode. The tree-log feature does not work on ZONED mode as is. Blocks for a tree-log tree are allocated mixed with other metadata blocks, and btrfs writes and syncs the tree-log blocks to devices at fsync() time, which differs from the timing of a global transaction commit. As a result, both tree-log blocks and other metadata blocks are written non-sequentially, which ZONED mode must avoid. We can introduce a dedicated block group for tree-log blocks so that tree-log blocks and other metadata blocks form separate write streams. Each write stream can then be written to the devices separately. "fs_info->treelog_bg" tracks the dedicated block group, and btrfs assigns "treelog_bg" on demand at tree-log block allocation time. This commit extends the zoned block allocator to use the block group.
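The skip logic that enforces this separation is compact enough to lift out: once "treelog_bg" is set, do_allocation_zoned() rejects a candidate block group whenever a tree-log allocation points at any other group, or a regular allocation points at the dedicated group. A user-space sketch of that predicate (the function name and integer types here are illustrative stand-ins; in the kernel the check reads fs_info->treelog_bg under fs_info->treelog_bg_lock):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the candidate filter added to do_allocation_zoned():
 * reject a block group when a tree-log allocation targets a group
 * other than the dedicated one, or when a non-tree-log allocation
 * targets the dedicated group. treelog_bg == 0 means no group has
 * been dedicated yet, so nothing is filtered.
 */
static bool skip_block_group(uint64_t treelog_bg, uint64_t bg_start,
			     bool for_treelog)
{
	if (treelog_bg == 0)
		return false;	/* nothing dedicated yet, no filtering */

	if (for_treelog)
		return bg_start != treelog_bg;	/* must use the dedicated group */

	return bg_start == treelog_bg;	/* must stay out of it */
}
```

This mirrors the `skip = log_bytenr && (...)` expression in the patch's do_allocation_zoned() hunk, just unfolded into branches.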
Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 2 ++ fs/btrfs/ctree.h | 2 ++ fs/btrfs/disk-io.c | 1 + fs/btrfs/extent-tree.c | 75 +++++++++++++++++++++++++++++++++++++++--- fs/btrfs/zoned.h | 14 ++++++++ 5 files changed, 90 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 56fab3d490b0..9f768c81d464 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -893,6 +893,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, btrfs_return_cluster_to_free_space(block_group, cluster); spin_unlock(&cluster->refill_lock); + btrfs_clear_treelog_bg(block_group); + path = btrfs_alloc_path(); if (!path) { ret = -ENOMEM; diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 4e9c55171ddb..2deb34a8d65d 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -973,6 +973,8 @@ struct btrfs_fs_info { /* Max size to emit ZONE_APPEND write command */ u64 max_zone_append_size; struct mutex zoned_meta_io_lock; + spinlock_t treelog_bg_lock; + u64 treelog_bg; #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2e2f09a46f45..c3b5cfe4d928 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2722,6 +2722,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) spin_lock_init(&fs_info->super_lock); spin_lock_init(&fs_info->buffer_lock); spin_lock_init(&fs_info->unused_bgs_lock); + spin_lock_init(&fs_info->treelog_bg_lock); rwlock_init(&fs_info->tree_mod_log_lock); mutex_init(&fs_info->unused_bg_unpin_mutex); mutex_init(&fs_info->delete_unused_bgs_mutex); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 1317f5d61024..fe8f1211f74c 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3590,6 +3590,9 @@ struct find_free_extent_ctl { bool have_caching_bg; bool orig_have_caching_bg; + /* Allocation is called for tree-log */ + bool for_treelog; + /* 
RAID index, converted from flags */ int index; @@ -3818,6 +3821,22 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group, return find_free_extent_unclustered(block_group, ffe_ctl); } +/* + * Tree-log Block Group Locking + * ============================ + * + * fs_info::treelog_bg_lock protects the fs_info::treelog_bg which + * indicates the starting address of a block group, which is reserved only + * for tree-log metadata. + * + * Lock nesting + * ============ + * + * space_info::lock + * block_group::lock + * fs_info::treelog_bg_lock + */ + /* * Simple allocator for sequential only block group. It only allows * sequential allocation. No need to play with trees. This function @@ -3827,23 +3846,54 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group, struct find_free_extent_ctl *ffe_ctl, struct btrfs_block_group **bg_ret) { + struct btrfs_fs_info *fs_info = block_group->fs_info; struct btrfs_space_info *space_info = block_group->space_info; struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; u64 start = block_group->start; u64 num_bytes = ffe_ctl->num_bytes; u64 avail; + u64 bytenr = block_group->start; + u64 log_bytenr; int ret = 0; + bool skip; ASSERT(btrfs_is_zoned(block_group->fs_info)); + /* + * Do not allow non-tree-log blocks in the dedicated tree-log block + * group, and vice versa. + */ + spin_lock(&fs_info->treelog_bg_lock); + log_bytenr = fs_info->treelog_bg; + skip = log_bytenr && ((ffe_ctl->for_treelog && bytenr != log_bytenr) || + (!ffe_ctl->for_treelog && bytenr == log_bytenr)); + spin_unlock(&fs_info->treelog_bg_lock); + if (skip) + return 1; + spin_lock(&space_info->lock); spin_lock(&block_group->lock); + spin_lock(&fs_info->treelog_bg_lock); + + ASSERT(!ffe_ctl->for_treelog || + block_group->start == fs_info->treelog_bg || + fs_info->treelog_bg == 0); if (block_group->ro) { ret = 1; goto out; } + /* + * Do not allow currently using block group to be tree-log dedicated + * block group. 
+ */ + if (ffe_ctl->for_treelog && !fs_info->treelog_bg && + (block_group->used || block_group->reserved)) { + ret = 1; + goto out; + } + avail = block_group->length - block_group->alloc_offset; if (avail < num_bytes) { if (ffe_ctl->max_extent_size < avail) { @@ -3858,6 +3908,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group, goto out; } + if (ffe_ctl->for_treelog && !fs_info->treelog_bg) + fs_info->treelog_bg = block_group->start; + ffe_ctl->found_offset = start + block_group->alloc_offset; block_group->alloc_offset += num_bytes; spin_lock(&ctl->tree_lock); @@ -3872,6 +3925,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group, ffe_ctl->search_start = ffe_ctl->found_offset; out: + if (ret && ffe_ctl->for_treelog) + fs_info->treelog_bg = 0; + spin_unlock(&fs_info->treelog_bg_lock); spin_unlock(&block_group->lock); spin_unlock(&space_info->lock); return ret; @@ -4121,7 +4177,12 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info, return prepare_allocation_clustered(fs_info, ffe_ctl, space_info, ins); case BTRFS_EXTENT_ALLOC_ZONED: - /* nothing to do */ + if (ffe_ctl->for_treelog) { + spin_lock(&fs_info->treelog_bg_lock); + if (fs_info->treelog_bg) + ffe_ctl->hint_byte = fs_info->treelog_bg; + spin_unlock(&fs_info->treelog_bg_lock); + } return 0; default: BUG(); @@ -4165,6 +4226,7 @@ static noinline int find_free_extent(struct btrfs_root *root, struct find_free_extent_ctl ffe_ctl = {0}; struct btrfs_space_info *space_info; bool full_search = false; + bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID; WARN_ON(num_bytes < fs_info->sectorsize); @@ -4178,6 +4240,7 @@ static noinline int find_free_extent(struct btrfs_root *root, ffe_ctl.orig_have_caching_bg = false; ffe_ctl.found_offset = 0; ffe_ctl.hint_byte = hint_byte_orig; + ffe_ctl.for_treelog = for_treelog; ffe_ctl.policy = BTRFS_EXTENT_ALLOC_CLUSTERED; /* For clustered allocation */ @@ -4252,8 +4315,11 @@ static noinline int 
find_free_extent(struct btrfs_root *root, struct btrfs_block_group *bg_ret; /* If the block group is read-only, we can skip it entirely. */ - if (unlikely(block_group->ro)) + if (unlikely(block_group->ro)) { + if (for_treelog) + btrfs_clear_treelog_bg(block_group); continue; + } btrfs_grab_block_group(block_group, delalloc); ffe_ctl.search_start = block_group->start; @@ -4441,6 +4507,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, bool final_tried = num_bytes == min_alloc_size; u64 flags; int ret; + bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID; flags = get_alloc_profile_by_root(root, is_data); again: @@ -4464,8 +4531,8 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, sinfo = btrfs_find_space_info(fs_info, flags); btrfs_err(fs_info, - "allocation failed flags %llu, wanted %llu", - flags, num_bytes); + "allocation failed flags %llu, wanted %llu treelog %d", + flags, num_bytes, for_treelog); if (sinfo) btrfs_dump_space_info(fs_info, sinfo, num_bytes, 1); diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 8c203c0425e0..52789da61fa3 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -7,6 +7,7 @@ #include #include "volumes.h" #include "disk-io.h" +#include "block-group.h" struct btrfs_zoned_device_info { /* @@ -292,4 +293,17 @@ static inline void btrfs_zoned_meta_io_unlock(struct btrfs_fs_info *fs_info) mutex_unlock(&fs_info->zoned_meta_io_lock); } +static inline void btrfs_clear_treelog_bg(struct btrfs_block_group *bg) +{ + struct btrfs_fs_info *fs_info = bg->fs_info; + + if (!btrfs_is_zoned(fs_info)) + return; + + spin_lock(&fs_info->treelog_bg_lock); + if (fs_info->treelog_bg == bg->start) + fs_info->treelog_bg = 0; + spin_unlock(&fs_info->treelog_bg_lock); +} + #endif From patchwork Fri Jan 22 06:21:41 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038497
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota Subject: [PATCH v13 41/42] btrfs: serialize log transaction on ZONED mode Date: Fri, 22 Jan 2021 15:21:41 +0900 X-Mailing-List: linux-fsdevel@vger.kernel.org This is the 2/3 patch to enable tree-log on ZONED mode. Since more than one log transaction can be started per subvolume simultaneously, nodes from multiple transactions can be allocated interleaved. Such mixed allocation results in non-sequential writes at the time of log transaction commit.
The nodes of the global log root tree (fs_info->log_root_tree) also have the same mixed allocation problem. To avoid it, this patch serializes log transactions by waiting for a committing transaction when someone tries to start a new transaction. We must also wait for running log transactions of other subvolumes, but there is no easy way to detect which subvolume root is running a log transaction. So this patch forbids starting a new log transaction when other subvolumes have already allocated the global log root tree. Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/tree-log.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 930e752686b4..71a1c0b5bc26 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -105,6 +105,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans, struct btrfs_root *log, struct btrfs_path *path, u64 dirid, int del_all); +static void wait_log_commit(struct btrfs_root *root, int transid); /* * tree logging is a special write ahead log used to make sure that @@ -140,6 +141,7 @@ static int start_log_trans(struct btrfs_trans_handle *trans, { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *tree_root = fs_info->tree_root; + const bool zoned = btrfs_is_zoned(fs_info); int ret = 0; /* @@ -160,12 +162,20 @@ static int start_log_trans(struct btrfs_trans_handle *trans, mutex_lock(&root->log_mutex); +again: if (root->log_root) { + int index = (root->log_transid + 1) % 2; + if (btrfs_need_log_full_commit(trans)) { ret = -EAGAIN; goto out; } + if (zoned && atomic_read(&root->log_commit[index])) { + wait_log_commit(root, root->log_transid - 1); + goto again; + } + if (!root->log_start_pid) { clear_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state); root->log_start_pid = current->pid; @@ -173,6 +183,17 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
set_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state); } } else { + if (zoned) { + mutex_lock(&fs_info->tree_log_mutex); + if (fs_info->log_root_tree) + ret = -EAGAIN; + else + ret = btrfs_init_log_root_tree(trans, fs_info); + mutex_unlock(&fs_info->tree_log_mutex); + } + if (ret) + goto out; + ret = btrfs_add_log_tree(trans, root); if (ret) goto out; @@ -201,14 +222,22 @@ static int start_log_trans(struct btrfs_trans_handle *trans, */ static int join_running_log_trans(struct btrfs_root *root) { + const bool zoned = btrfs_is_zoned(root->fs_info); int ret = -ENOENT; if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &root->state)) return ret; mutex_lock(&root->log_mutex); +again: if (root->log_root) { + int index = (root->log_transid + 1) % 2; + ret = 0; + if (zoned && atomic_read(&root->log_commit[index])) { + wait_log_commit(root, root->log_transid - 1); + goto again; + } atomic_inc(&root->log_writers); } mutex_unlock(&root->log_mutex); From patchwork Fri Jan 22 06:21:42 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12038499
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik , Johannes Thumshirn Subject: [PATCH v13 42/42] btrfs: reorder log node allocation Date: Fri, 22 Jan 2021 15:21:42 +0900 Message-Id: <84e050d142732282ba7356fc06095e82fe5d3b72.1611295439.git.naohiro.aota@wdc.com> X-Mailing-List: linux-fsdevel@vger.kernel.org This is the 3/3 patch to enable tree-log on ZONED mode. The allocation order of the nodes of "fs_info->log_root_tree" and the nodes of "root->log_root" is not the same as their writing order, so writing them out causes unaligned write errors. This patch reorders their allocation by delaying allocation of the root node of "fs_info->log_root_tree", so that the node buffers can go out to the devices sequentially.
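The reorder relies on allocating the log root tree's node lazily at btrfs_sync_log() time instead of at tree creation. Shorn of the btrfs types and the tree_log_mutex locking, the deferred-allocation pattern looks like the following user-space sketch (the struct and function names are illustrative stand-ins, not kernel API):

```c
#include <stdlib.h>

/* Illustrative stand-in for a log tree whose root node is allocated lazily. */
struct log_root {
	void *node;	/* NULL until the first log sync needs it */
};

/*
 * Sketch of the check added to btrfs_sync_log(): allocate the root
 * node of the log root tree only on first use, so its buffer is
 * allocated (and later written out) after the subvolume log nodes.
 * In the kernel this check runs under fs_info->tree_log_mutex.
 */
static int ensure_log_node(struct log_root *root)
{
	if (root->node)		/* already allocated by an earlier sync */
		return 0;

	root->node = malloc(64);	/* stands in for btrfs_alloc_log_tree_node() */
	return root->node ? 0 : -1;
}

/* Exercise the pattern: the second call must reuse the first node. */
static int lazy_alloc_demo(void)
{
	struct log_root root = { 0 };
	void *first;

	if (ensure_log_node(&root) != 0 || root.node == NULL)
		return 1;
	first = root.node;
	if (ensure_log_node(&root) != 0 || root.node != first)
		return 2;
	free(root.node);
	return 0;
}
```

This also explains why free_log_tree() in the patch now tolerates a NULL node: a log tree can be freed before any sync ever allocated its root node.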
Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 7 ------- fs/btrfs/tree-log.c | 24 ++++++++++++++++++------ 2 files changed, 18 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c3b5cfe4d928..d2b30716de84 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1241,18 +1241,11 @@ int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { struct btrfs_root *log_root; - int ret; log_root = alloc_log_tree(trans, fs_info); if (IS_ERR(log_root)) return PTR_ERR(log_root); - ret = btrfs_alloc_log_tree_node(trans, log_root); - if (ret) { - btrfs_put_root(log_root); - return ret; - } - WARN_ON(fs_info->log_root_tree); fs_info->log_root_tree = log_root; return 0; diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 71a1c0b5bc26..d8315363dc1e 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -3159,6 +3159,16 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, list_add_tail(&root_log_ctx.list, &log_root_tree->log_ctxs[index2]); root_log_ctx.log_transid = log_root_tree->log_transid; + mutex_lock(&fs_info->tree_log_mutex); + if (!log_root_tree->node) { + ret = btrfs_alloc_log_tree_node(trans, log_root_tree); + if (ret) { + mutex_unlock(&fs_info->tree_log_mutex); + goto out; + } + } + mutex_unlock(&fs_info->tree_log_mutex); + /* * Now we are safe to update the log_root_tree because we're under the * log_mutex, and we're a current writer so we're holding the commit @@ -3317,12 +3327,14 @@ static void free_log_tree(struct btrfs_trans_handle *trans, .process_func = process_one_buffer }; - ret = walk_log_tree(trans, log, &wc); - if (ret) { - if (trans) - btrfs_abort_transaction(trans, ret); - else - btrfs_handle_fs_error(log->fs_info, ret, NULL); + if (log->node) { + ret = walk_log_tree(trans, log, &wc); + if (ret) { + if (trans) + btrfs_abort_transaction(trans, ret); + else + btrfs_handle_fs_error(log->fs_info, ret, 
NULL); + } } clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1,