From patchwork Wed Dec 2 06:47:57 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11944811
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 01/15] btrfs: rename bio_offset of extent_submit_bio_start_t to opt_file_offset
Date: Wed, 2 Dec 2020 14:47:57 +0800
Message-Id: <20201202064811.100688-2-wqu@suse.com>

The bio_offset parameter of extent_submit_bio_start_t is confusing. If it were really a bio offset (an offset into the bio), it could be u32. In fact it is only used by the dio read path, and there it carries a file offset, which must be u64. Rename it to opt_file_offset, since its only user treats it as a file offset, and add a comment documenting who uses it.
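For illustration only (not part of the patch): a minimal userspace sketch of the two quantities the old name conflated; the values are invented, and this is plain C rather than kernel code.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* A true offset into a bio is bounded by bi_size, a 32-bit value,
	 * so it would fit in a u32: */
	uint32_t offset_in_bio = 64 * 1024;

	/* A file offset can point anywhere in a large file, so it needs
	 * u64 -- and a file offset is what the dio read path passes here: */
	uint64_t file_offset = 5ULL * 1024 * 1024 * 1024; /* 5 GiB, > UINT32_MAX */

	printf("offset in bio: %u, file offset: %llu\n",
	       offset_in_bio, (unsigned long long)file_offset);
	return 0;
}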
Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 17 ++++++++--------- fs/btrfs/disk-io.h | 2 +- fs/btrfs/extent_io.h | 2 +- fs/btrfs/inode.c | 10 +++++----- 4 files changed, 15 insertions(+), 16 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 46dd9e0b077e..504636803bc4 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -113,11 +113,9 @@ struct async_submit_bio { struct bio *bio; extent_submit_bio_start_t *submit_bio_start; int mirror_num; - /* - * bio_offset is optional, can be used if the pages in the bio - * can't tell us where in the file the bio should go - */ - u64 bio_offset; + + /* Optional parameter for submit_bio_start, used by direct io */ + u64 opt_file_offset; struct btrfs_work work; blk_status_t status; }; @@ -697,7 +695,8 @@ static void run_one_async_start(struct btrfs_work *work) blk_status_t ret; async = container_of(work, struct async_submit_bio, work); - ret = async->submit_bio_start(async->inode, async->bio, async->bio_offset); + ret = async->submit_bio_start(async->inode, async->bio, + async->opt_file_offset); if (ret) async->status = ret; } @@ -749,7 +748,7 @@ static void run_one_async_free(struct btrfs_work *work) blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio, int mirror_num, unsigned long bio_flags, - u64 bio_offset, + u64 opt_file_offset, extent_submit_bio_start_t *submit_bio_start) { struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; @@ -767,7 +766,7 @@ blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio, btrfs_init_work(&async->work, run_one_async_start, run_one_async_done, run_one_async_free); - async->bio_offset = bio_offset; + async->opt_file_offset = opt_file_offset; async->status = 0; @@ -797,7 +796,7 @@ static blk_status_t btree_csum_one_bio(struct bio *bio) } static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio, - u64 bio_offset) + u64 opt_file_offset) { /* * when we're called for a write, we're already in the async diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 7b3ecad88d7e..9c2d6469bf25 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -116,7 +116,7 @@ blk_status_t btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio, enum btrfs_wq_endio_type metadata); blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio, int mirror_num, unsigned long bio_flags, - u64 bio_offset, + u64 opt_file_offset, extent_submit_bio_start_t *submit_bio_start); blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio, int mirror_num); diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 66762c3cdf81..d3c7ad02db24 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -72,7 +72,7 @@ typedef blk_status_t (submit_bio_hook_t)(struct inode *inode, struct bio *bio, unsigned long bio_flags); typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode, - struct bio *bio, u64 bio_offset); + struct bio *bio, u64 opt_file_offset); #define INLINE_EXTENT_BUFFER_PAGES 16 #define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0ce42d52d53e..cf27729e41c8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2212,7 +2212,7 @@ int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio, * are inserted into the btree */ static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio, - u64 bio_offset) + u64 opt_file_offset) { return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0); } @@ 
-7795,9 +7795,10 @@ static void __endio_write_update_ordered(struct btrfs_inode *inode, } static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode, - struct bio *bio, u64 offset) + struct bio *bio, + u64 opt_file_offset) { - return btrfs_csum_one_bio(BTRFS_I(inode), bio, offset, 1); + return btrfs_csum_one_bio(BTRFS_I(inode), bio, opt_file_offset, 1); } static void btrfs_end_dio_bio(struct bio *bio) @@ -7846,8 +7847,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, goto map; if (write && async_submit) { - ret = btrfs_wq_submit_bio(inode, bio, 0, 0, - file_offset, + ret = btrfs_wq_submit_bio(inode, bio, 0, 0, file_offset, btrfs_submit_bio_start_direct_io); goto err; } else if (write) {

From patchwork Wed Dec 2 06:47:58 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11944813
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Johannes Thumshirn, Nikolay Borisov, Josef Bacik
Subject: [PATCH v3 02/15] btrfs: pass bio_offset to check_data_csum() directly
Date: Wed, 2 Dec 2020 14:47:58 +0800
Message-Id: <20201202064811.100688-3-wqu@suse.com>

The @icsum parameter of check_data_csum() is a little hard to understand, and so is @phy_offset in btrfs_verify_data_csum(). Both are values pre-calculated for the csum lookup. Instead of passing such calculated values, just pass @bio_offset and let the final and only user, check_data_csum(), calculate whatever it needs.
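A minimal userspace sketch (illustrative only, not kernel code) of the calculation that moves into check_data_csum(): derive the csum array position from the byte offset into the bio. The sector size and csum size here are assumed example values.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint32_t sectorsize_bits = 12; /* assume 4K sectors */
	const uint32_t csum_size = 4;        /* assume crc32c checksums */
	uint32_t bio_offset = 3 * 4096;      /* third sector of the bio */

	/* What check_data_csum() now computes internally, instead of the
	 * caller pre-computing @icsum: */
	uint32_t offset_sectors = bio_offset >> sectorsize_bits;
	uint32_t csum_byte_offset = offset_sectors * csum_size;

	printf("sector index %u, csum byte offset %u\n",
	       offset_sectors, csum_byte_offset);
	return 0;
}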
While we are here, also make the bio_offset parameter and some related variables u32 (unsigned int). Bio size is limited by its bi_size member, which is an unsigned int, and there are extra size limit checks during various bio operations, so we are guaranteed that bio_offset won't overflow u32. Thus for all involved functions, not only rename the parameter from @phy_offset to @bio_offset, but also reduce its width to u32, so we won't have suspicious "u32 = u64 >> sector_bits;" lines anymore.

Reviewed-by: Johannes Thumshirn Reviewed-by: Nikolay Borisov Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo --- fs/btrfs/ctree.h | 2 +- fs/btrfs/extent_io.c | 32 ++++++++++++++++++++------------ fs/btrfs/extent_io.h | 2 +- fs/btrfs/inode.c | 32 ++++++++++++++++++++------------ 4 files changed, 42 insertions(+), 26 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2744e13e8eb9..112c9a2ae47b 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3052,7 +3052,7 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path); /* inode.c */ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio, int mirror_num, unsigned long bio_flags); -int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset, +int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset, struct page *page, u64 start, u64 end, int mirror); struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode, u64 start, u64 len); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 569d50ccf78a..a28b61510265 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2634,7 +2634,7 @@ static bool btrfs_io_needs_validation(struct inode *inode, struct bio *bio) } blk_status_t btrfs_submit_read_repair(struct inode *inode, - struct bio *failed_bio, u64 phy_offset, + struct bio *failed_bio, u32 bio_offset, struct page *page, unsigned int pgoff, u64 start, u64 end, int failed_mirror, submit_bio_hook_t *submit_bio_hook) @@ -2644,7 +2644,7 @@ blk_status_t btrfs_submit_read_repair(struct inode *inode, struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; struct btrfs_io_bio *failed_io_bio = btrfs_io_bio(failed_bio); - const int icsum = phy_offset >> fs_info->sectorsize_bits; + const int icsum = bio_offset >> fs_info->sectorsize_bits; bool need_validation; struct bio *repair_bio; struct btrfs_io_bio *repair_io_bio; @@ -2869,10 +2869,11 @@ static void end_bio_extent_readpage(struct bio *bio) struct btrfs_io_bio *io_bio = btrfs_io_bio(bio); struct extent_io_tree *tree, *failure_tree; struct processed_extent processed = { 0 }; - u64 offset = 0; - u64 start; - u64 end; - u64 len; + /* + * The offset to the beginning of a bio, since one bio can never be + * larger than UINT_MAX, u32 here is enough.
+ */ + u32 bio_offset = 0; int mirror; int ret; struct bvec_iter_all iter_all; @@ -2882,7 +2883,10 @@ static void end_bio_extent_readpage(struct bio *bio) struct page *page = bvec->bv_page; struct inode *inode = page->mapping->host; struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - u32 sectorsize = fs_info->sectorsize; + const u32 sectorsize = fs_info->sectorsize; + u64 start; + u64 end; + u32 len; btrfs_debug(fs_info, "end_bio_extent_readpage: bi_sector=%llu, err=%d, mirror=%u", @@ -2915,8 +2919,9 @@ static void end_bio_extent_readpage(struct bio *bio) mirror = io_bio->mirror_num; if (likely(uptodate)) { if (is_data_inode(inode)) - ret = btrfs_verify_data_csum(io_bio, offset, page, - start, end, mirror); + ret = btrfs_verify_data_csum(io_bio, + bio_offset, page, start, end, + mirror); else ret = btrfs_validate_metadata_buffer(io_bio, page, start, end, mirror); @@ -2944,12 +2949,14 @@ static void end_bio_extent_readpage(struct bio *bio) * If it can't handle the error it will return -EIO and * we remain responsible for that page. */ - if (!btrfs_submit_read_repair(inode, bio, offset, page, + if (!btrfs_submit_read_repair(inode, bio, bio_offset, + page, start - page_offset(page), start, end, mirror, btrfs_submit_data_bio)) { uptodate = !bio->bi_status; - offset += len; + ASSERT(bio_offset + len > bio_offset); + bio_offset += len; continue; } } else { @@ -2974,7 +2981,8 @@ static void end_bio_extent_readpage(struct bio *bio) if (page->index == end_index && off) zero_user_segment(page, off, PAGE_SIZE); } - offset += len; + ASSERT(bio_offset + len > bio_offset); + bio_offset += len; /* Update page status and unlock */ endio_readpage_update_page_status(page, uptodate); diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index d3c7ad02db24..db95468801c7 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -291,7 +291,7 @@ struct io_failure_record { blk_status_t btrfs_submit_read_repair(struct inode *inode, - struct bio *failed_bio, u64 phy_offset, + struct bio *failed_bio, u32 bio_offset, struct page *page, unsigned int pgoff, u64 start, u64 end, int failed_mirror, submit_bio_hook_t *submit_bio_hook); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index cf27729e41c8..5c051b0e58a5 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2942,28 +2942,30 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start, /* * check_data_csum - verify checksum of one sector of uncompressed data - * @inode: the inode + * @inode: inode * @io_bio: btrfs_io_bio which contains the csum - * @icsum: checksum index in the io_bio->csum array, size of csum_size + * @bio_offset: offset to the beginning of the bio (in bytes) * @page: page where is the data to be verified * @pgoff: offset inside the page * * The length of such check is always one sector size. 
*/ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, - int icsum, struct page *page, int pgoff) + u32 bio_offset, struct page *page, int pgoff) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); char *kaddr; u32 len = fs_info->sectorsize; const u32 csum_size = fs_info->csum_size; + unsigned int offset_sectors; u8 *csum_expected; u8 csum[BTRFS_CSUM_SIZE]; ASSERT(pgoff + len <= PAGE_SIZE); - csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size; + offset_sectors = bio_offset >> fs_info->sectorsize_bits; + csum_expected = ((u8 *)io_bio->csum) + offset_sectors * csum_size; kaddr = kmap_atomic(page); shash->tfm = fs_info->csum_shash; @@ -2988,11 +2990,16 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, } /* - * when reads are done, we need to check csums to verify the data is correct + * When reads are done, we need to check csums to verify the data is correct. * if there's a match, we allow the bio to finish. If not, the code in * extent_io.c will try to find good copies for us. + * + * @bio_offset: offset to the beginning of the bio (in bytes) + * @start: file offset of the range start + * @end: file offset of the range end (inclusive) + * @mirror: mirror number */ -int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset, +int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset, struct page *page, u64 start, u64 end, int mirror) { size_t offset = start - page_offset(page); @@ -3017,8 +3024,7 @@ int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset, return 0; } - phy_offset >>= root->fs_info->sectorsize_bits; - return check_data_csum(inode, io_bio, phy_offset, page, offset); + return check_data_csum(inode, io_bio, bio_offset, page, offset); } /* @@ -7712,7 +7718,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, struct bio_vec bvec; struct bvec_iter iter; u64 start = io_bio->logical; - int icsum = 0; + u32 bio_offset = 0; blk_status_t err = BLK_STS_OK; __bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) { @@ -7723,8 +7729,8 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, for (i = 0; i < nr_sectors; i++) { ASSERT(pgoff < PAGE_SIZE); if (uptodate && - (!csum || !check_data_csum(inode, io_bio, icsum, - bvec.bv_page, pgoff))) { + (!csum || !check_data_csum(inode, io_bio, + bio_offset, bvec.bv_page, pgoff))) { clean_io_failure(fs_info, failure_tree, io_tree, start, bvec.bv_page, btrfs_ino(BTRFS_I(inode)), @@ -7732,6 +7738,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, } else { blk_status_t status; + ASSERT((start - io_bio->logical) < UINT_MAX); status = btrfs_submit_read_repair(inode, &io_bio->bio, start - io_bio->logical, @@ -7744,7 +7751,8 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, err = status; } start += sectorsize; - icsum++; + ASSERT(bio_offset + sectorsize > bio_offset); + bio_offset += sectorsize; pgoff += sectorsize; } }

From patchwork Wed Dec 2 06:47:59 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11944809
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Josef Bacik, Goldwyn Rodrigues
Subject: [PATCH v3 03/15] btrfs: inode: make btrfs_verify_data_csum() follow sector size
Date: Wed, 2 Dec 2020 14:47:59 +0800
Message-Id: <20201202064811.100688-4-wqu@suse.com>

Currently btrfs_verify_data_csum() just passes the whole page to check_data_csum(), which is fine since we only support sectorsize == PAGE_SIZE. To support subpage, we need to properly honor per-sector checksum verification, just like what we do in the dio read path. This patch does the csum verification in a for loop, starting with pg_off == start - page_offset(page) and advancing by sectorsize on each iteration. For the sectorsize == PAGE_SIZE case, pg_off will always be 0 and we finish after just one loop. For the subpage case, the loop iterates over each sector, and if any of them fails verification we return an error.

Reviewed-by: Josef Bacik Signed-off-by: Goldwyn Rodrigues Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5c051b0e58a5..255ea28982ff 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2951,7 +2951,7 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start, * The length of such check is always one sector size.
*/ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, - u32 bio_offset, struct page *page, int pgoff) + u32 bio_offset, struct page *page, u32 pgoff) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); @@ -3002,10 +3002,11 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset, struct page *page, u64 start, u64 end, int mirror) { - size_t offset = start - page_offset(page); struct inode *inode = page->mapping->host; struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; struct btrfs_root *root = BTRFS_I(inode)->root; + const u32 sectorsize = root->fs_info->sectorsize; + u32 pg_off; if (PageChecked(page)) { ClearPageChecked(page); @@ -3024,7 +3025,18 @@ int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset, return 0; } - return check_data_csum(inode, io_bio, bio_offset, page, offset); + ASSERT(page_offset(page) <= start && + end <= page_offset(page) + PAGE_SIZE - 1); + for (pg_off = offset_in_page(start); + pg_off < offset_in_page(end); + pg_off += sectorsize, bio_offset += sectorsize) { + int ret; + + ret = check_data_csum(inode, io_bio, bio_offset, page, pg_off); + if (ret < 0) + return -EIO; + } + return 0; } /*

From patchwork Wed Dec 2 06:48:00 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11944807
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Josef Bacik, David Sterba
Subject: [PATCH v3 04/15] btrfs: extent_io: extract the btree page submission code into its own helper function
Date: Wed, 2 Dec 2020 14:48:00 +0800
Message-Id: <20201202064811.100688-5-wqu@suse.com>

In btree_write_cache_pages() we have a btree page submission routine buried deep inside a nested loop. This patch extracts that code into a helper function, submit_eb_page(), which does the same work. Also, since submit_eb_page() can now return >0 for a successful extent buffer submission, remove the "ASSERT(ret <= 0);" line.

Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/extent_io.c | 122 ++++++++++++++++++++++++++----------------- 1 file changed, 75 insertions(+), 47 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index a28b61510265..4b72c824064c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3987,10 +3987,81 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb, return ret; } +/* + * Submit all page(s) of one extent buffer. + * + * @page: The page of one extent buffer + * @eb_context: To determine if we need to submit this page. If current page + * belongs to this eb, we don't need to submit. + * + * The caller should pass each page in their bytenr order, and here we use + * @eb_context to determine if we have submitted pages of one extent buffer. + * + * If have submitted, we just skip until we hit a new page that doesn't belong + * to current @eb_context. + * + * If not yet submitted, we submit all the page(s) of the extent buffer. + * + * Return >0 if we have submitted the extent buffer successfully. + * Return 0 if we don't need to submit the page, as it's already submitted by + * previous call. + * Return <0 for fatal error. + */ +static int submit_eb_page(struct page *page, struct writeback_control *wbc, + struct extent_page_data *epd, + struct extent_buffer **eb_context) +{ + struct address_space *mapping = page->mapping; + struct extent_buffer *eb; + int ret; + + if (!PagePrivate(page)) + return 0; + + spin_lock(&mapping->private_lock); + if (!PagePrivate(page)) { + spin_unlock(&mapping->private_lock); + return 0; + } + + eb = (struct extent_buffer *)page->private; + + /* + * Shouldn't happen and normally this would be a BUG_ON but no sense + * in crashing the users box for something we can survive anyway.
+ */ + if (WARN_ON(!eb)) { + spin_unlock(&mapping->private_lock); + return 0; + } + + if (eb == *eb_context) { + spin_unlock(&mapping->private_lock); + return 0; + } + ret = atomic_inc_not_zero(&eb->refs); + spin_unlock(&mapping->private_lock); + if (!ret) + return 0; + + *eb_context = eb; + + ret = lock_extent_buffer_for_io(eb, epd); + if (ret <= 0) { + free_extent_buffer(eb); + return ret; + } + ret = write_one_eb(eb, wbc, epd); + free_extent_buffer(eb); + if (ret < 0) + return ret; + return 1; +} + int btree_write_cache_pages(struct address_space *mapping, struct writeback_control *wbc) { - struct extent_buffer *eb, *prev_eb = NULL; + struct extent_buffer *eb_context = NULL; struct extent_page_data epd = { .bio = NULL, .extent_locked = 0, @@ -4036,55 +4107,13 @@ int btree_write_cache_pages(struct address_space *mapping, for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; - if (!PagePrivate(page)) - continue; - - spin_lock(&mapping->private_lock); - if (!PagePrivate(page)) { - spin_unlock(&mapping->private_lock); - continue; - } - - eb = (struct extent_buffer *)page->private; - - /* - * Shouldn't happen and normally this would be a BUG_ON - * but no sense in crashing the users box for something - * we can survive anyway. - */ - if (WARN_ON(!eb)) { - spin_unlock(&mapping->private_lock); - continue; - } - - if (eb == prev_eb) { - spin_unlock(&mapping->private_lock); + ret = submit_eb_page(page, wbc, &epd, &eb_context); + if (ret == 0) continue; - } - - ret = atomic_inc_not_zero(&eb->refs); - spin_unlock(&mapping->private_lock); - if (!ret) - continue; - - prev_eb = eb; - ret = lock_extent_buffer_for_io(eb, &epd); - if (!ret) { - free_extent_buffer(eb); - continue; - } else if (ret < 0) { - done = 1; - free_extent_buffer(eb); - break; - } - - ret = write_one_eb(eb, wbc, &epd); - if (ret) { + if (ret < 0) { done = 1; - free_extent_buffer(eb); break; } - free_extent_buffer(eb); /* * the filesystem may choose to bump up nr_to_write. 
@@ -4105,7 +4134,6 @@ int btree_write_cache_pages(struct address_space *mapping, index = 0; goto retry; } - ASSERT(ret <= 0); if (ret < 0) { end_write_bio(&epd, ret); return ret;

From patchwork Wed Dec 2 06:48:01 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11944815
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Johannes Thumshirn, Nikolay Borisov, David Sterba
Subject: [PATCH v3 05/15] btrfs: extent_io: calculate inline extent buffer page size based on page size
Date: Wed, 2 Dec 2020 14:48:01 +0800
Message-Id: <20201202064811.100688-6-wqu@suse.com>

Btrfs only supports 64K as the maximum node size, so on a system with 4K pages we have at most 16 pages for one extent buffer, and on a system using a 64K page size we really need just one single page. Yet we always allocate 16 slots for extent_buffer::pages[], which means on systems using 64K pages we waste memory on 15 slots that will never be utilized. So this patch changes how the extent_buffer::pages[] array size is calculated: it is now derived from BTRFS_MAX_METADATA_BLOCKSIZE and PAGE_SIZE. For systems using 4K pages it stays 16 pages; for systems using 64K pages it becomes just 1 page. While we're here, also move the definition of BTRFS_MAX_METADATA_BLOCKSIZE to btrfs_tree.h, to avoid a circular inclusion of ctree.h.
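The arithmetic, worked out as a small standalone C sketch (illustrative only; the two page sizes are the cases discussed above):

#include <stdio.h>

#define BTRFS_MAX_METADATA_BLOCKSIZE (64 * 1024)

int main(void)
{
	/* The new INLINE_EXTENT_BUFFER_PAGES definition, evaluated for both
	 * page sizes discussed in the patch: */
	unsigned long page_sizes[] = { 4096, 65536 };

	for (int i = 0; i < 2; i++)
		printf("PAGE_SIZE %5lu -> %lu pages in extent_buffer::pages[]\n",
		       page_sizes[i],
		       BTRFS_MAX_METADATA_BLOCKSIZE / page_sizes[i]);
	return 0; /* prints 16 and 1 respectively */
}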
Reviewed-by: Johannes Thumshirn Reviewed-by: Nikolay Borisov Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/ctree.h | 6 ------ fs/btrfs/extent_io.c | 7 +------ fs/btrfs/extent_io.h | 4 ++-- include/uapi/linux/btrfs_tree.h | 4 ++++ 4 files changed, 7 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 112c9a2ae47b..c5ef29078954 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -67,12 +67,6 @@ struct btrfs_ref; #define BTRFS_OLDEST_GENERATION 0ULL -/* - * the max metadata block size. This limit is somewhat artificial, - * but the memmove costs go through the roof for larger blocks. - */ -#define BTRFS_MAX_METADATA_BLOCKSIZE 65536 - /* * we can actually store much bigger names, but lets not confuse the rest * of linux diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4b72c824064c..2bab66b42395 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5053,12 +5053,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start, atomic_set(&eb->refs, 1); atomic_set(&eb->io_pages, 0); - /* - * Sanity checks, currently the maximum is 64k covered by 16x 4k pages - */ - BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE - > MAX_INLINE_EXTENT_BUFFER_SIZE); - BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE); + ASSERT(len <= BTRFS_MAX_METADATA_BLOCKSIZE); return eb; } diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index db95468801c7..02b4786478b6 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -6,6 +6,7 @@ #include #include #include +#include #include "ulist.h" /* @@ -74,8 +75,7 @@ typedef blk_status_t (submit_bio_hook_t)(struct inode *inode, struct bio *bio, typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode, struct bio *bio, u64 opt_file_offset); -#define INLINE_EXTENT_BUFFER_PAGES 16 -#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE) +#define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE) struct extent_buffer { u64 start; unsigned long len; diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index 6b885982ece6..44e22f935b8c 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -4,6 +4,7 @@ #include #include +#include #ifdef __KERNEL__ #include #else @@ -990,4 +991,7 @@ struct btrfs_qgroup_limit_item { __le64 rsv_excl; } __attribute__ ((__packed__)); +/* Maximum metadata block size (nodesize) */ +#define BTRFS_MAX_METADATA_BLOCKSIZE SZ_64K + #endif /* _BTRFS_CTREE_H_ */

From patchwork Wed Dec 2 06:48:02 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11944817
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 06/15] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support
Date: Wed, 2 Dec 2020 14:48:02 +0800
Message-Id: <20201202064811.100688-7-wqu@suse.com>

As a preparation for subpage sector size support (allowing filesystems with a sector size smaller than the page size to be mounted): if the sector size is smaller than the page size, we don't allow a tree block to be read if it crosses a 64K boundary. The 64K is selected because: - We are only going to support 64K page size for subpage for now - 64K is also the max node size btrfs supports. This ensures that tree blocks are always contained in one page for a system with 64K page size, which greatly simplifies the handling; otherwise we would need complex multi-page handling for tree blocks. Currently there is no way to create such tree blocks. The kernel has avoided allocating such tree blocks even with a 4K page size, as it can lead to problems with RAID56 stripe scrubbing. btrfs-progs fixed its chunk allocator in 2016 for convert, and has extra checks to enforce the same behavior as the kernel. Just add such graceful checks for ancient btrfs.
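A standalone sketch of the rejection condition (illustrative only; the 64K page size and all addresses are invented for the example):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 65536UL /* the 64K page size case the patch targets */
#define offset_in_page(p) ((unsigned long)(p) & (PAGE_SIZE - 1))

int main(void)
{
	/* A 16K tree block starting 56K into a page would cross the page
	 * boundary, so it must be rejected: */
	uint64_t start = 16 * PAGE_SIZE + 57344; /* 56K into page 16 */
	unsigned long len = 16384;               /* 16K nodesize */

	if (offset_in_page(start) + len > PAGE_SIZE)
		printf("tree block crosses page boundary, start %llu nodesize %lu\n",
		       (unsigned long long)start, len);
	return 0;
}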
Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 2bab66b42395..8cbd6d43b154 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5272,6 +5272,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info, btrfs_err(fs_info, "bad tree block start %llu", start); return ERR_PTR(-EINVAL); } + if (fs_info->sectorsize < PAGE_SIZE && + offset_in_page(start) + len > PAGE_SIZE) { + btrfs_err(fs_info, + "tree block crosses page boundary, start %llu nodesize %lu", + start, len); + return ERR_PTR(-EINVAL); + } eb = find_extent_buffer(fs_info, start); if (eb)

From patchwork Wed Dec 2 06:48:03 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11944819
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Nikolay Borisov
Subject: [PATCH v3 07/15] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer
Date: Wed, 2 Dec 2020 14:48:03 +0800
Message-Id: <20201202064811.100688-8-wqu@suse.com>

For subpage sized extent buffers, we have ensured that no extent buffer will cross a page boundary, thus we only ever need one page for any extent buffer. This patch updates the function num_extent_pages() to handle such a case: num_extent_pages() now returns 1 for a subpage sized extent buffer.
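The new formula, checked against a few cases in a standalone sketch (illustrative only; page and node sizes are example values):

#include <stdio.h>

static int num_extent_pages(unsigned long len, unsigned long page_size)
{
	/* Mirrors the patched helper (len >> PAGE_SHIFT ?: 1), written with
	 * an explicit division for the sketch: */
	unsigned long pages = len / page_size;

	return pages ? pages : 1;
}

int main(void)
{
	printf("%d\n", num_extent_pages(16384, 4096));  /* 4K pages, 16K node: 4 */
	printf("%d\n", num_extent_pages(65536, 4096));  /* 4K pages, 64K node: 16 */
	printf("%d\n", num_extent_pages(16384, 65536)); /* subpage case: 1 */
	return 0;
}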
Reviewed-by: Nikolay Borisov Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 02b4786478b6..811f44d82c7c 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -203,8 +203,14 @@ void btrfs_readahead_node_child(struct extent_buffer *node, int slot); static inline int num_extent_pages(const struct extent_buffer *eb) { - return (round_up(eb->start + eb->len, PAGE_SIZE) >> PAGE_SHIFT) - - (eb->start >> PAGE_SHIFT); + /* + * For sectorsize == PAGE_SIZE case, since nodesize is always aligned to + * sectorsize, it's just eb->len >> PAGE_SHIFT. + * + * For sectorsize < PAGE_SIZE case, we could have nodesize < PAGE_SIZE, + * thus have to ensure we get at least one page. + */ + return eb->len >> PAGE_SHIFT ?: 1; } static inline int extent_buffer_uptodate(const struct extent_buffer *eb)

From patchwork Wed Dec 2 06:48:04 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11944821
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Goldwyn Rodrigues
Subject: [PATCH v3 08/15] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
Date: Wed, 2 Dec 2020 14:48:04 +0800
Message-Id: <20201202064811.100688-9-wqu@suse.com>

To support the sectorsize < PAGE_SIZE case, we need to take extra care in the extent buffer accessors.
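As a concrete example of the math involved (the helpers introduced just below encapsulate exactly this; all numbers are invented, 64K pages assumed):

#include <stdio.h>

#define PAGE_SIZE 65536UL
#define offset_in_page(p) ((unsigned long)(p) & (PAGE_SIZE - 1))

int main(void)
{
	unsigned long eb_start = 2 * PAGE_SIZE + 32768; /* a 16K tree block 32K into its page */
	unsigned long offset = 100;                     /* offset inside the extent buffer */

	/* Page index: always 0 for subpage, as the whole eb fits in one page. */
	unsigned long index = offset / PAGE_SIZE;

	/* Offset in page: eb->start must be added, or we would read the bytes
	 * of whatever tree block happens to sit at the head of the page. */
	unsigned long in_page = offset_in_page(offset + eb_start);

	printf("page index %lu, offset in page %lu\n", index, in_page); /* 0, 32868 */
	return 0;
}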
Since sectorsize is smaller than PAGE_SIZE, one page can contain multiple tree blocks, so we must use eb->start to determine the real offset to read/write in the extent buffer accessors. This patch introduces two helpers to do this: - get_eb_page_index() This calculates the index to access extent_buffer::pages. It's just a simple wrapper around "start >> PAGE_SHIFT". For the sectorsize == PAGE_SIZE case, nothing changes. For the sectorsize < PAGE_SIZE case, we always get 0 as the index, and the existing page shift also works fine. - get_eb_offset_in_page() This calculates the offset at which to access extent_buffer::pages, and needs to take extent_buffer::start into consideration. For the sectorsize == PAGE_SIZE case, extent_buffer::start is always aligned to PAGE_SIZE, so adding extent_buffer::start to offset_in_page() won't change the result. For the sectorsize < PAGE_SIZE case, adding extent_buffer::start gives us the correct offset to access. This patch touches the following parts to cover all extent buffer accessors: - BTRFS_SETGET_HEADER_FUNCS() - read_extent_buffer() - read_extent_buffer_to_user() - memcmp_extent_buffer() - write_extent_buffer_chunk_tree_uuid() - write_extent_buffer_fsid() - write_extent_buffer() - memzero_extent_buffer() - copy_extent_buffer_full() - copy_extent_buffer() - memcpy_extent_buffer() - memmove_extent_buffer() - btrfs_get_token_##bits() - btrfs_get_##bits() - btrfs_set_token_##bits() - btrfs_set_##bits() - generic_bin_search()

Signed-off-by: Goldwyn Rodrigues Signed-off-by: Qu Wenruo --- fs/btrfs/ctree.c | 3 +- fs/btrfs/ctree.h | 38 ++++++++++++++++++++++-- fs/btrfs/extent_io.c | 64 ++++++++++++++++++++++++----------------- fs/btrfs/struct-funcs.c | 18 ++++++------ 4 files changed, 85 insertions(+), 38 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index e5a0941c4bde..07810891e204 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1683,9 +1683,10 @@ static noinline int generic_bin_search(struct extent_buffer *eb, oip = offset_in_page(offset); if (oip + key_size <= PAGE_SIZE) { - const unsigned long idx = offset >> PAGE_SHIFT; + const unsigned long idx = get_eb_page_index(offset); char *kaddr = page_address(eb->pages[idx]); + oip = get_eb_offset_in_page(eb, offset); tmp = (struct btrfs_disk_key *)(kaddr + oip); } else { read_extent_buffer(eb, &unaligned, offset, key_size); diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c5ef29078954..c9eb6d881064 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1552,13 +1552,14 @@ static inline void btrfs_set_token_##name(struct btrfs_map_token *token,\ #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits) \ static inline u##bits btrfs_##name(const struct extent_buffer *eb) \ { \ - const type *p = page_address(eb->pages[0]); \ + const type *p = page_address(eb->pages[0]) + \ + offset_in_page(eb->start); \ return get_unaligned_le##bits(&p->member); \ } \ static inline void btrfs_set_##name(const struct extent_buffer *eb, \ u##bits val) \ { \ - type *p = page_address(eb->pages[0]); \ + type *p = page_address(eb->pages[0]) + offset_in_page(eb->start); \ put_unaligned_le##bits(val, &p->member); \ } @@ -3366,6 +3367,39 @@ static inline void assertfail(const char *expr, const char* file, int line) { } #define ASSERT(expr) (void)(expr) #endif +/* + * Get the correct offset inside the page of extent buffer. + * + * Will handle both sectorsize == PAGE_SIZE and sectorsize < PAGE_SIZE cases.
+ * + * @eb: The target extent buffer + * @start: The offset inside the extent buffer + */ +static inline size_t get_eb_offset_in_page(const struct extent_buffer *eb, + unsigned long offset) +{ + /* + * For sectorsize == PAGE_SIZE case, eb->start will always be aligned + * to PAGE_SIZE, thus adding it won't cause any difference. + * + * For sectorsize < PAGE_SIZE, we must only read the data belongs to + * the eb, thus we have to take the eb->start into consideration. + */ + return offset_in_page(offset + eb->start); +} + +static inline unsigned long get_eb_page_index(unsigned long offset) +{ + /* + * For sectorsize == PAGE_SIZE case, plain >> PAGE_SHIFT is enough. + * + * For sectorsize < PAGE_SIZE case, we only support 64K PAGE_SIZE, + * and has ensured all tree blocks are contained in one page, thus + * we always get index == 0. + */ + return offset >> PAGE_SHIFT; +} + /* * Use that for functions that are conditionally exported for sanity tests but * otherwise static diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8cbd6d43b154..8719c51bb4c5 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5702,12 +5702,12 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv, struct page *page; char *kaddr; char *dst = (char *)dstv; - unsigned long i = start >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(start); if (check_eb_range(eb, start, len)) return; - offset = offset_in_page(start); + offset = get_eb_offset_in_page(eb, start); while (len > 0) { page = eb->pages[i]; @@ -5732,13 +5732,13 @@ int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb, struct page *page; char *kaddr; char __user *dst = (char __user *)dstv; - unsigned long i = start >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(start); int ret = 0; WARN_ON(start > eb->len); WARN_ON(start + len > eb->start + eb->len); - offset = offset_in_page(start); + offset = get_eb_offset_in_page(eb, start); while (len > 0) { page = eb->pages[i]; @@ -5767,13 +5767,13 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv, struct page *page; char *kaddr; char *ptr = (char *)ptrv; - unsigned long i = start >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(start); int ret = 0; if (check_eb_range(eb, start, len)) return -EINVAL; - offset = offset_in_page(start); + offset = get_eb_offset_in_page(eb, start); while (len > 0) { page = eb->pages[i]; @@ -5799,7 +5799,7 @@ void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb, char *kaddr; WARN_ON(!PageUptodate(eb->pages[0])); - kaddr = page_address(eb->pages[0]); + kaddr = page_address(eb->pages[0]) + get_eb_offset_in_page(eb, 0); memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv, BTRFS_FSID_SIZE); } @@ -5809,7 +5809,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv) char *kaddr; WARN_ON(!PageUptodate(eb->pages[0])); - kaddr = page_address(eb->pages[0]); + kaddr = page_address(eb->pages[0]) + get_eb_offset_in_page(eb, 0); memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv, BTRFS_FSID_SIZE); } @@ -5822,12 +5822,12 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv, struct page *page; char *kaddr; char *src = (char *)srcv; - unsigned long i = start >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(start); if (check_eb_range(eb, start, len)) return; - offset = offset_in_page(start); + offset = get_eb_offset_in_page(eb, start); while (len > 0) { page = eb->pages[i]; @@ -5851,12 +5851,12 @@ void 
memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start, size_t offset; struct page *page; char *kaddr; - unsigned long i = start >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(start); if (check_eb_range(eb, start, len)) return; - offset = offset_in_page(start); + offset = get_eb_offset_in_page(eb, start); while (len > 0) { page = eb->pages[i]; @@ -5880,10 +5880,20 @@ void copy_extent_buffer_full(const struct extent_buffer *dst, ASSERT(dst->len == src->len); - num_pages = num_extent_pages(dst); - for (i = 0; i < num_pages; i++) - copy_page(page_address(dst->pages[i]), - page_address(src->pages[i])); + if (dst->fs_info->sectorsize == PAGE_SIZE) { + num_pages = num_extent_pages(dst); + for (i = 0; i < num_pages; i++) + copy_page(page_address(dst->pages[i]), + page_address(src->pages[i])); + } else { + size_t src_offset = get_eb_offset_in_page(src, 0); + size_t dst_offset = get_eb_offset_in_page(dst, 0); + + ASSERT(src->fs_info->sectorsize < PAGE_SIZE); + memcpy(page_address(dst->pages[0]) + dst_offset, + page_address(src->pages[0]) + src_offset, + src->len); + } } void copy_extent_buffer(const struct extent_buffer *dst, @@ -5896,7 +5906,7 @@ void copy_extent_buffer(const struct extent_buffer *dst, size_t offset; struct page *page; char *kaddr; - unsigned long i = dst_offset >> PAGE_SHIFT; + unsigned long i = get_eb_page_index(dst_offset); if (check_eb_range(dst, dst_offset, len) || check_eb_range(src, src_offset, len)) @@ -5904,7 +5914,7 @@ void copy_extent_buffer(const struct extent_buffer *dst, WARN_ON(src->len != dst_len); - offset = offset_in_page(dst_offset); + offset = get_eb_offset_in_page(dst, dst_offset); while (len > 0) { page = dst->pages[i]; @@ -5948,7 +5958,7 @@ static inline void eb_bitmap_offset(const struct extent_buffer *eb, * the bitmap item in the extent buffer + the offset of the byte in the * bitmap item. 
*/ - offset = start + byte_offset; + offset = start + offset_in_page(eb->start) + byte_offset; *page_index = offset >> PAGE_SHIFT; *page_offset = offset_in_page(offset); @@ -6102,11 +6112,11 @@ void memcpy_extent_buffer(const struct extent_buffer *dst, return; while (len > 0) { - dst_off_in_page = offset_in_page(dst_offset); - src_off_in_page = offset_in_page(src_offset); + dst_off_in_page = get_eb_offset_in_page(dst, dst_offset); + src_off_in_page = get_eb_offset_in_page(dst, src_offset); - dst_i = dst_offset >> PAGE_SHIFT; - src_i = src_offset >> PAGE_SHIFT; + dst_i = get_eb_page_index(dst_offset); + src_i = get_eb_page_index(src_offset); cur = min(len, (unsigned long)(PAGE_SIZE - src_off_in_page)); @@ -6142,11 +6152,11 @@ void memmove_extent_buffer(const struct extent_buffer *dst, return; } while (len > 0) { - dst_i = dst_end >> PAGE_SHIFT; - src_i = src_end >> PAGE_SHIFT; + dst_i = get_eb_page_index(dst_end); + src_i = get_eb_page_index(src_end); - dst_off_in_page = offset_in_page(dst_end); - src_off_in_page = offset_in_page(src_end); + dst_off_in_page = get_eb_offset_in_page(dst, dst_end); + src_off_in_page = get_eb_offset_in_page(dst, src_end); cur = min_t(unsigned long, len, src_off_in_page + 1); cur = min(cur, dst_off_in_page + 1); diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c index c46be27be700..fc282eec7291 100644 --- a/fs/btrfs/struct-funcs.c +++ b/fs/btrfs/struct-funcs.c @@ -57,8 +57,9 @@ u##bits btrfs_get_token_##bits(struct btrfs_map_token *token, \ const void *ptr, unsigned long off) \ { \ const unsigned long member_offset = (unsigned long)ptr + off; \ - const unsigned long idx = member_offset >> PAGE_SHIFT; \ - const unsigned long oip = offset_in_page(member_offset); \ + const unsigned long idx = get_eb_page_index(member_offset); \ + const unsigned long oip = get_eb_offset_in_page(token->eb, \ + member_offset); \ const int size = sizeof(u##bits); \ u8 lebytes[sizeof(u##bits)]; \ const int part = PAGE_SIZE - oip; \ @@ -85,8 +86,8 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \ const void *ptr, unsigned long off) \ { \ const unsigned long member_offset = (unsigned long)ptr + off; \ - const unsigned long oip = offset_in_page(member_offset); \ - const unsigned long idx = member_offset >> PAGE_SHIFT; \ + const unsigned long oip = get_eb_offset_in_page(eb, member_offset); \ + const unsigned long idx = get_eb_page_index(member_offset); \ char *kaddr = page_address(eb->pages[idx]); \ const int size = sizeof(u##bits); \ const int part = PAGE_SIZE - oip; \ @@ -106,8 +107,9 @@ void btrfs_set_token_##bits(struct btrfs_map_token *token, \ u##bits val) \ { \ const unsigned long member_offset = (unsigned long)ptr + off; \ - const unsigned long idx = member_offset >> PAGE_SHIFT; \ - const unsigned long oip = offset_in_page(member_offset); \ + const unsigned long idx = get_eb_page_index(member_offset); \ + const unsigned long oip = get_eb_offset_in_page(token->eb, \ + member_offset); \ const int size = sizeof(u##bits); \ u8 lebytes[sizeof(u##bits)]; \ const int part = PAGE_SIZE - oip; \ @@ -136,8 +138,8 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \ unsigned long off, u##bits val) \ { \ const unsigned long member_offset = (unsigned long)ptr + off; \ - const unsigned long oip = offset_in_page(member_offset); \ - const unsigned long idx = member_offset >> PAGE_SHIFT; \ + const unsigned long oip = get_eb_offset_in_page(eb, member_offset); \ + const unsigned long idx = get_eb_page_index(member_offset); \ char *kaddr = 
page_address(eb->pages[idx]); \ const int size = sizeof(u##bits); \ const int part = PAGE_SIZE - oip; \

From patchwork Wed Dec 2 06:48:05 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Nikolay Borisov
Subject: [PATCH v3 09/15] btrfs: file-item: remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums()
Date: Wed, 2 Dec 2020 14:48:05 +0800
Message-Id: <20201202064811.100688-10-wqu@suse.com>
In-Reply-To: <20201202064811.100688-1-wqu@suse.com>
References: <20201202064811.100688-1-wqu@suse.com>

The function btrfs_lookup_bio_sums() is only called for read bios, while btrfs_find_ordered_sum() searches ordered extent sums, which exist only in the write path.

This means that to read a page we either:

- Submit a read bio if the page is not uptodate
  In that case we only need to search the csum tree for checksums.

- Find the page already uptodate
  It can be uptodate from a previous read, or because it was dirtied (we always mark a dirty page uptodate). In that case we don't need to submit a read bio at all, thus there is no need to search for any checksum.

So remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums(). And since btrfs_lookup_bio_sums() was the only caller of btrfs_find_ordered_sum(), remove its implementation as well.
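The uptodate reasoning above can be restated as a short pseudo-C sketch. read_one_page() and submit_read_bio() are illustrative names only, not the real btrfs call chain; PageUptodate() is the actual kernel predicate:

    /*
     * Sketch: a csum lookup only happens for a read bio we actually
     * submit, and such a bio only exists for pages that are not
     * uptodate -- so ordered extent sums (write path) never matter.
     */
    static void read_one_page(struct page *page, struct bio *bio)
    {
            if (PageUptodate(page)) {
                    /*
                     * Uptodate from a previous read, or dirtied
                     * (dirty implies uptodate): no read bio is
                     * submitted, hence no csum lookup at all.
                     */
                    return;
            }
            /*
             * Not uptodate: the data must come from disk, and its
             * csum can only live in the csum tree; ordered extent
             * sums only exist while a write is still in flight.
             */
            submit_read_bio(bio);   /* illustrative helper */
    }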
Reviewed-by: Nikolay Borisov Signed-off-by: Qu Wenruo --- fs/btrfs/file-item.c | 16 ++++++++++----- fs/btrfs/ordered-data.c | 44 ----------------------------------------- fs/btrfs/ordered-data.h | 2 -- 3 files changed, 11 insertions(+), 51 deletions(-) diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index 8fa98d55fcfd..3df13d0446b9 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -239,7 +239,8 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans, } /** - * btrfs_lookup_bio_sums - Look up checksums for a bio. + * btrfs_lookup_bio_sums - Look up checksums for a read bio. + * * @inode: inode that the bio is for. * @bio: bio to look up. * @offset: Unless (u64)-1, look up checksums for this offset in the file. @@ -274,6 +275,15 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, if (!fs_info->csum_root || (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) return BLK_STS_OK; + /* + * This function is only called for read bio. + * + * This means several things: + * - All of our csums should only be in csum tree + * No ordered extents csums. As ordered extents are only for write + * path. + */ + ASSERT(bio_op(bio) == REQ_OP_READ); path = btrfs_alloc_path(); if (!path) return BLK_STS_RESOURCE; @@ -324,10 +334,6 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, if (page_offsets) offset = page_offset(bvec.bv_page) + bvec.bv_offset; - count = btrfs_find_ordered_sum(BTRFS_I(inode), offset, - disk_bytenr, csum, nblocks); - if (count) - goto found; if (!item || disk_bytenr < item_start_offset || disk_bytenr >= item_last_offset) { diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 0d61f9fefc02..79d366a36223 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -854,50 +854,6 @@ btrfs_lookup_first_ordered_extent(struct btrfs_inode *inode, u64 file_offset) return entry; } -/* - * search the ordered extents for one corresponding to 'offset' and - * try to find a checksum. This is used because we allow pages to - * be reclaimed before their checksum is actually put into the btree - */ -int btrfs_find_ordered_sum(struct btrfs_inode *inode, u64 offset, - u64 disk_bytenr, u8 *sum, int len) -{ - struct btrfs_fs_info *fs_info = inode->root->fs_info; - struct btrfs_ordered_sum *ordered_sum; - struct btrfs_ordered_extent *ordered; - struct btrfs_ordered_inode_tree *tree = &inode->ordered_tree; - unsigned long num_sectors; - unsigned long i; - const u32 csum_size = fs_info->csum_size; - int index = 0; - - ordered = btrfs_lookup_ordered_extent(inode, offset); - if (!ordered) - return 0; - - spin_lock_irq(&tree->lock); - list_for_each_entry_reverse(ordered_sum, &ordered->list, list) { - if (disk_bytenr >= ordered_sum->bytenr && - disk_bytenr < ordered_sum->bytenr + ordered_sum->len) { - i = (disk_bytenr - ordered_sum->bytenr) >> - fs_info->sectorsize_bits; - num_sectors = ordered_sum->len >> fs_info->sectorsize_bits; - num_sectors = min_t(int, len - index, num_sectors - i); - memcpy(sum + index, ordered_sum->sums + i * csum_size, - num_sectors * csum_size); - - index += (int)num_sectors * csum_size; - if (index == len) - goto out; - disk_bytenr += num_sectors * fs_info->sectorsize; - } - } -out: - spin_unlock_irq(&tree->lock); - btrfs_put_ordered_extent(ordered); - return index; -} - /* * btrfs_flush_ordered_range - Lock the passed range and ensures all pending * ordered extents in it are run to completion. 
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index 367269effd6a..0bfa82b58e23 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -183,8 +183,6 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range( u64 len); void btrfs_get_ordered_extents_for_logging(struct btrfs_inode *inode, struct list_head *list); -int btrfs_find_ordered_sum(struct btrfs_inode *inode, u64 offset, - u64 disk_bytenr, u8 *sum, int len); u64 btrfs_wait_ordered_extents(struct btrfs_root *root, u64 nr, const u64 range_start, const u64 range_len); void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,

From patchwork Wed Dec 2 06:48:06 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 10/15] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs
Date: Wed, 2 Dec 2020 14:48:06 +0800
Message-Id: <20201202064811.100688-11-wqu@suse.com>
In-Reply-To: <20201202064811.100688-1-wqu@suse.com>
References: <20201202064811.100688-1-wqu@suse.com>

Refactor btrfs_lookup_bio_sums() by:

- Remove the @file_offset parameter
  Two factors make the @file_offset parameter useless:
  * For a csum lookup in the csum tree, the file offset makes no sense; we only need disk_bytenr, which is unrelated to the file offset.
  * The page_offset (file offset) of each bvec is not contiguous. Pages can be added to the same bio as long as their on-disk bytenr is contiguous, meaning we could have pages at different file offsets in the same bio.
  Thus passing @file_offset no longer makes sense. The only user of the file offset is the data reloc inode, and a new function, search_file_offset_in_bio(), will handle that case.

- Extract the csum tree lookup into search_csum_tree()
  The new function handles the csum search in the csum tree. The return value follows the convention of btrfs_find_ordered_sum(): the number of sectors for which checksums were found.

- Change how we do the main loop
  The only info we need from the bio is:
  * the on-disk bytenr
  * the length
  After extracting the above info, we can do the search without the bio at all, which makes the main loop much simpler:

  for (cur_disk_bytenr = orig_disk_bytenr;
       cur_disk_bytenr < orig_disk_bytenr + orig_len;
       cur_disk_bytenr += count * sectorsize) {
          /* Lookup csum tree */
          count = search_csum_tree(fs_info, path, cur_disk_bytenr,
                                   search_len, csum_dst);
          if (!count) {
                  /* Csum hole handling */
          }
  }

- Use a single variable as the core from which to calculate all other offsets
  Instead of several variables of different types, we use only one core variable, cur_disk_bytenr, which represents the current disk bytenr. All involved values can be calculated from that core variable, and all those variables are only visible in the inner loop.

All of the above refactoring makes btrfs_lookup_bio_sums() much more robust than it used to be, especially with respect to the file offset lookup. Now the file offset lookup is only needed for the data reloc inode; otherwise we don't need to bother with the file offset at all.

Signed-off-by: Qu Wenruo
---
fs/btrfs/compression.c | 5 +- fs/btrfs/ctree.h | 2 +- fs/btrfs/file-item.c | 252 +++++++++++++++++++++++++++-------------- fs/btrfs/inode.c | 5 +- 4 files changed, 173 insertions(+), 91 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 12d50f1cdc58..5ae3fa0386b7 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -719,8 +719,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, */ refcount_inc(&cb->pending_bios); - ret = btrfs_lookup_bio_sums(inode, comp_bio, (u64)-1, - sums); + ret = btrfs_lookup_bio_sums(inode, comp_bio, sums); BUG_ON(ret); /* -ENOMEM */ nr_sectors = DIV_ROUND_UP(comp_bio->bi_iter.bi_size, @@ -746,7 +745,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, ret = btrfs_bio_wq_end_io(fs_info, comp_bio, BTRFS_WQ_ENDIO_DATA); BUG_ON(ret); /* -ENOMEM */ - ret = btrfs_lookup_bio_sums(inode, comp_bio, (u64)-1, sums); + ret = btrfs_lookup_bio_sums(inode, comp_bio, sums); BUG_ON(ret); /* -ENOMEM */ ret = btrfs_map_bio(fs_info, comp_bio, mirror_num); diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c9eb6d881064..d31627449acd 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3014,7 +3014,7 @@ struct btrfs_dio_private; int btrfs_del_csums(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u64 len); blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, - u64 offset, u8 *dst); + u8 *dst); int btrfs_insert_file_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 objectid, u64 pos, diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index 3df13d0446b9..0eaa78800861 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -238,13 +238,118 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans, return ret; } +/* + * Find csums for logical bytenr range + * [disk_bytenr, disk_bytenr + len) and restore the result to @dst. + * + * Return >0 for the number of sectors we found.
+ * Return 0 for the range [disk_bytenr, disk_bytenr + sectorsize) has no csum + * for it. Caller may want to try next sector until one range is hit. + * Return <0 for fatal error. + */ +static int search_csum_tree(struct btrfs_fs_info *fs_info, + struct btrfs_path *path, u64 disk_bytenr, + u64 len, u8 *dst) +{ + struct btrfs_csum_item *item = NULL; + struct btrfs_key key; + const u32 sectorsize = fs_info->sectorsize; + const u32 csum_size = fs_info->csum_size; + u32 itemsize; + int ret; + u64 csum_start; + u64 csum_len; + + ASSERT(IS_ALIGNED(disk_bytenr, sectorsize) && + IS_ALIGNED(len, sectorsize)); + + /* Check if the current csum item covers disk_bytenr */ + if (path->nodes[0]) { + item = btrfs_item_ptr(path->nodes[0], path->slots[0], + struct btrfs_csum_item); + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); + itemsize = btrfs_item_size_nr(path->nodes[0], path->slots[0]); + + csum_start = key.offset; + csum_len = (itemsize / csum_size) * sectorsize; + + if (in_range(disk_bytenr, csum_start, csum_len)) + goto found; + } + + /* Current item doesn't contain the desired range, re-search */ + btrfs_release_path(path); + item = btrfs_lookup_csum(NULL, fs_info->csum_root, path, + disk_bytenr, 0); + if (IS_ERR(item)) { + ret = PTR_ERR(item); + goto out; + } + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); + itemsize = btrfs_item_size_nr(path->nodes[0], path->slots[0]); + + csum_start = key.offset; + csum_len = (itemsize / csum_size) * sectorsize; + ASSERT(in_range(disk_bytenr, csum_start, csum_len)); + +found: + ret = (min(csum_start + csum_len, disk_bytenr + len) - + disk_bytenr) >> fs_info->sectorsize_bits; + read_extent_buffer(path->nodes[0], dst, (unsigned long)item, + ret * csum_size); +out: + if (ret == -ENOENT) + ret = 0; + return ret; +} + +/* + * A helper to locate the file_offset of @cur_disk_bytenr of a @bio. + * + * Bio of btrfs represents read range of + * [bi_sector << 9, bi_sector << 9 + bi_size). + * Knowing this, we can iterate through each bvec to locate the page belong to + * @cur_disk_bytenr and get the file offset. + * + * @inode is used to determine the bvec page really belongs to @inode. + * + * Return 0 if we can't find the file offset; + * Return >0 if we find the file offset and restore it to @file_offset_ret + */ +static int search_file_offset_in_bio(struct bio *bio, struct inode *inode, + u64 disk_bytenr, u64 *file_offset_ret) +{ + struct bvec_iter iter; + struct bio_vec bvec; + u64 cur = bio->bi_iter.bi_sector << 9; + int ret = 0; + + bio_for_each_segment(bvec, bio, iter) { + struct page *page = bvec.bv_page; + + if (cur > disk_bytenr) + break; + if (cur + bvec.bv_len <= disk_bytenr) { + cur += bvec.bv_len; + continue; + } + ASSERT(in_range(disk_bytenr, cur, bvec.bv_len)); + if (page->mapping && page->mapping->host && + page->mapping->host == inode) { + ret = 1; + *file_offset_ret = page_offset(page) + bvec.bv_offset + + disk_bytenr - cur; + break; + } + } + return ret; +} + /** - * btrfs_lookup_bio_sums - Look up checksums for a read bio. + * Lookup the csum for the read bio in csum tree. * * @inode: inode that the bio is for. * @bio: bio to look up. - * @offset: Unless (u64)-1, look up checksums for this offset in the file. - * If (u64)-1, use the page offsets from the bio instead. * @dst: Buffer of size nblocks * btrfs_super_csum_size() used to return * checksum (nblocks = bio->bi_iter.bi_size / fs_info->sectorsize). 
If * NULL, the checksum buffer is allocated and returned in @@ -253,24 +358,19 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans, * Return: BLK_STS_RESOURCE if allocating memory fails, BLK_STS_OK otherwise. */ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, - u64 offset, u8 *dst) + u8 *dst) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct bio_vec bvec; - struct bvec_iter iter; - struct btrfs_csum_item *item = NULL; struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; struct btrfs_path *path; - const bool page_offsets = (offset == (u64)-1); + const u32 sectorsize = fs_info->sectorsize; + const u32 csum_size = fs_info->csum_size; + u32 orig_len = bio->bi_iter.bi_size; + u64 orig_disk_bytenr = bio->bi_iter.bi_sector << 9; + u64 cur_disk_bytenr; u8 *csum; - u64 item_start_offset = 0; - u64 item_last_offset = 0; - u64 disk_bytenr; - u64 page_bytes_left; - u32 diff; - int nblocks; + unsigned int nblocks = orig_len >> fs_info->sectorsize_bits; int count = 0; - const u32 csum_size = fs_info->csum_size; if (!fs_info->csum_root || (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) return BLK_STS_OK; @@ -282,13 +382,16 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, * - All of our csums should only be in csum tree * No ordered extents csums. As ordered extents are only for write * path. + * - No need to bother any other info from bvec + * Since we're looking up csums, the only important info is the + * disk_bytenr and the length, which can all be extracted from + * bi_iter directly. */ ASSERT(bio_op(bio) == REQ_OP_READ); path = btrfs_alloc_path(); if (!path) return BLK_STS_RESOURCE; - nblocks = bio->bi_iter.bi_size >> fs_info->sectorsize_bits; if (!dst) { struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio); @@ -325,81 +428,62 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, path->skip_locking = 1; } - disk_bytenr = bio->bi_iter.bi_sector << 9; + for (cur_disk_bytenr = orig_disk_bytenr; + cur_disk_bytenr < orig_disk_bytenr + orig_len; + cur_disk_bytenr += (count * sectorsize)) { + u64 search_len = orig_disk_bytenr + orig_len - cur_disk_bytenr; + unsigned int sector_offset; + u8 *csum_dst; - bio_for_each_segment(bvec, bio, iter) { - page_bytes_left = bvec.bv_len; - if (count) - goto next; - - if (page_offsets) - offset = page_offset(bvec.bv_page) + bvec.bv_offset; + /* + * Although both cur_disk_bytenr and orig_disk_bytenr is u64, + * we're calculating the offset to the bio start. + * + * Bio size is limited to UINT_MAX, thus unsigned int is + * large enough to contain the raw result, not to mention + * the right shifted result. + */ + ASSERT(cur_disk_bytenr - orig_disk_bytenr < UINT_MAX); + sector_offset = (cur_disk_bytenr - orig_disk_bytenr) >> + fs_info->sectorsize_bits; + csum_dst = csum + sector_offset * csum_size; + + count = search_csum_tree(fs_info, path, cur_disk_bytenr, + search_len, csum_dst); + if (count <= 0) { + /* + * Either we hit a critical error or we didn't find + * the csum. + * Either way, we put zero into the csums dst, and just + * skip to next sector for a better luck. 
+ */ + memset(csum_dst, 0, csum_size); + count = 1; - if (!item || disk_bytenr < item_start_offset || - disk_bytenr >= item_last_offset) { - struct btrfs_key found_key; - u32 item_size; - - if (item) - btrfs_release_path(path); - item = btrfs_lookup_csum(NULL, fs_info->csum_root, - path, disk_bytenr, 0); - if (IS_ERR(item)) { - count = 1; - memset(csum, 0, csum_size); - if (BTRFS_I(inode)->root->root_key.objectid == - BTRFS_DATA_RELOC_TREE_OBJECTID) { - set_extent_bits(io_tree, offset, - offset + fs_info->sectorsize - 1, + /* + * For data reloc inode, we need to mark the + * range NODATASUM so that balance won't report + * false csum error. + */ + if (BTRFS_I(inode)->root->root_key.objectid == + BTRFS_DATA_RELOC_TREE_OBJECTID) { + u64 file_offset; + int ret; + + ret = search_file_offset_in_bio(bio, inode, + cur_disk_bytenr, &file_offset); + if (ret) + set_extent_bits(io_tree, file_offset, + file_offset + sectorsize - 1, EXTENT_NODATASUM); - } else { - btrfs_info_rl(fs_info, - "no csum found for inode %llu start %llu", - btrfs_ino(BTRFS_I(inode)), offset); - } - item = NULL; - btrfs_release_path(path); - goto found; + } else { + btrfs_warn_rl(fs_info, + "csum hole found for disk bytenr range [%llu, %llu)", + cur_disk_bytenr, cur_disk_bytenr + sectorsize); } - btrfs_item_key_to_cpu(path->nodes[0], &found_key, - path->slots[0]); - - item_start_offset = found_key.offset; - item_size = btrfs_item_size_nr(path->nodes[0], - path->slots[0]); - item_last_offset = item_start_offset + - (item_size / csum_size) * - fs_info->sectorsize; - item = btrfs_item_ptr(path->nodes[0], path->slots[0], - struct btrfs_csum_item); - } - /* - * this byte range must be able to fit inside - * a single leaf so it will also fit inside a u32 - */ - diff = disk_bytenr - item_start_offset; - diff = diff >> fs_info->sectorsize_bits; - diff = diff * csum_size; - count = min_t(int, nblocks, (item_last_offset - disk_bytenr) >> - fs_info->sectorsize_bits); - read_extent_buffer(path->nodes[0], csum, - ((unsigned long)item) + diff, - csum_size * count); -found: - csum += count * csum_size; - nblocks -= count; -next: - while (count > 0) { - count--; - disk_bytenr += fs_info->sectorsize; - offset += fs_info->sectorsize; - page_bytes_left -= fs_info->sectorsize; - if (!page_bytes_left) - break; /* move to next bio */ } } - WARN_ON_ONCE(count); btrfs_free_path(path); return BLK_STS_OK; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 255ea28982ff..8fb4b60a0091 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2268,7 +2268,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio, * need to csum or not, which is why we ignore skip_sum * here. */ - ret = btrfs_lookup_bio_sums(inode, bio, (u64)-1, NULL); + ret = btrfs_lookup_bio_sums(inode, bio, NULL); if (ret) goto out; } @@ -7964,8 +7964,7 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, * * If we have csums disabled this will do nothing. 
*/ - status = btrfs_lookup_bio_sums(inode, dio_bio, file_offset, - dip->csums); + status = btrfs_lookup_bio_sums(inode, dio_bio, dip->csums); if (status != BLK_STS_OK) goto out_err; }

From patchwork Wed Dec 2 06:48:07 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 11/15] btrfs: scrub: reduce the width for extent_len/stripe_len from 64 bits to 32 bits
Date: Wed, 2 Dec 2020 14:48:07 +0800
Message-Id: <20201202064811.100688-12-wqu@suse.com>
In-Reply-To: <20201202064811.100688-1-wqu@suse.com>
References: <20201202064811.100688-1-wqu@suse.com>

The btrfs on-disk format chose to use u64 for almost everything, but there are a lot of restrictions that mean we can't actually use more than u32 for things like extent length (the maximum length is 128MiB for non-hole extents) or stripe length (we have a device number limit).

This means that if we don't have extra handling to convert u64 to u32, we will always have questionable operations like "u32 = u64 >> sectorsize_bits" in the code.

This patch addresses the problem by reducing the width of the following members/parameters:

- scrub_parity::stripe_len
- @len of scrub_pages()
- @extent_len of scrub_remap_extent()
- @len of scrub_parity_mark_sectors_error()
- @len of scrub_parity_mark_sectors_data()
- @len of scrub_extent()
- @len of scrub_pages_for_parity()
- @len of scrub_extent_for_parity()

For members extracted from on-disk structures, like map->stripe_len, the width is kept as is,
since that modification would require an on-disk format change. There will be cases like "u32 = u64 - u64" or "u32 = u64"; for such call sites an extra ASSERT() is added, to be extra safe in debug builds.

Signed-off-by: Qu Wenruo
---
fs/btrfs/scrub.c | 54 +++++++++++++++++++++++++++--------------------- 1 file changed, 31 insertions(+), 23 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 78759bc9c980..8026606f7510 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -130,7 +130,7 @@ struct scrub_parity { int nsectors; - u64 stripe_len; + u32 stripe_len; refcount_t refs; @@ -233,7 +233,7 @@ static void scrub_parity_get(struct scrub_parity *sparity); static void scrub_parity_put(struct scrub_parity *sparity); static int scrub_add_page_to_rd_bio(struct scrub_ctx *sctx, struct scrub_page *spage); -static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len, +static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u32 len, u64 physical, struct btrfs_device *dev, u64 flags, u64 gen, int mirror_num, u8 *csum, u64 physical_for_dev_replace); @@ -241,7 +241,7 @@ static void scrub_bio_end_io(struct bio *bio); static void scrub_bio_end_io_worker(struct btrfs_work *work); static void scrub_block_complete(struct scrub_block *sblock); static void scrub_remap_extent(struct btrfs_fs_info *fs_info, - u64 extent_logical, u64 extent_len, + u64 extent_logical, u32 extent_len, u64 *extent_physical, struct btrfs_device **extent_dev, int *extent_mirror_num); @@ -2147,7 +2147,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock) spin_unlock(&sctx->stat_lock); } -static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len, +static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u32 len, u64 physical, struct btrfs_device *dev, u64 flags, u64 gen, int mirror_num, u8 *csum, u64 physical_for_dev_replace) @@ -2171,7 +2171,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len, for (index = 0; len > 0; index++) { struct scrub_page *spage; - u64 l = min_t(u64, len, PAGE_SIZE); + u32 l = min_t(u32, len, PAGE_SIZE); spage = kzalloc(sizeof(*spage), GFP_KERNEL); if (!spage) { @@ -2292,10 +2292,9 @@ static void scrub_bio_end_io_worker(struct btrfs_work *work) static inline void __scrub_mark_bitmap(struct scrub_parity *sparity, unsigned long *bitmap, - u64 start, u64 len) + u64 start, u32 len) { u64 offset; - u64 nsectors64; u32 nsectors; u32 sectorsize_bits = sparity->sctx->fs_info->sectorsize_bits; @@ -2307,10 +2306,7 @@ static inline void __scrub_mark_bitmap(struct scrub_parity *sparity, start -= sparity->logic_start; start = div64_u64_rem(start, sparity->stripe_len, &offset); offset = offset >> sectorsize_bits; - nsectors64 = len >> sectorsize_bits; - - ASSERT(nsectors64 < UINT_MAX); - nsectors = (u32)nsectors64; + nsectors = len >> sectorsize_bits; if (offset + nsectors <= sparity->nsectors) { bitmap_set(bitmap, offset, nsectors); @@ -2322,13 +2318,13 @@ static inline void __scrub_mark_bitmap(struct scrub_parity *sparity, } static inline void scrub_parity_mark_sectors_error(struct scrub_parity *sparity, - u64 start, u64 len) + u64 start, u32 len) { __scrub_mark_bitmap(sparity, sparity->ebitmap, start, len); } static inline void scrub_parity_mark_sectors_data(struct scrub_parity *sparity, - u64 start, u64 len) + u64 start, u32 len) { __scrub_mark_bitmap(sparity, sparity->dbitmap, start, len); } @@ -2356,6 +2352,7 @@ static void scrub_block_complete(struct scrub_block *sblock) u64 end = sblock->pagev[sblock->page_count - 1]->logical + PAGE_SIZE; +
ASSERT(end - start <= U32_MAX); scrub_parity_mark_sectors_error(sblock->sparity, start, end - start); } @@ -2425,7 +2422,7 @@ static int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum) /* scrub extent tries to collect up to 64 kB for each bio */ static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, - u64 logical, u64 len, + u64 logical, u32 len, u64 physical, struct btrfs_device *dev, u64 flags, u64 gen, int mirror_num, u64 physical_for_dev_replace) { @@ -2457,7 +2454,7 @@ static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, } while (len) { - u64 l = min_t(u64, len, blocksize); + u32 l = min(len, blocksize); int have_csum = 0; if (flags & BTRFS_EXTENT_FLAG_DATA) { @@ -2480,7 +2477,7 @@ static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, } static int scrub_pages_for_parity(struct scrub_parity *sparity, - u64 logical, u64 len, + u64 logical, u32 len, u64 physical, struct btrfs_device *dev, u64 flags, u64 gen, int mirror_num, u8 *csum) { @@ -2506,7 +2503,7 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity, for (index = 0; len > 0; index++) { struct scrub_page *spage; - u64 l = min_t(u64, len, PAGE_SIZE); + u32 l = min_t(u32, len, PAGE_SIZE); spage = kzalloc(sizeof(*spage), GFP_KERNEL); if (!spage) { @@ -2564,7 +2561,7 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity, } static int scrub_extent_for_parity(struct scrub_parity *sparity, - u64 logical, u64 len, + u64 logical, u32 len, u64 physical, struct btrfs_device *dev, u64 flags, u64 gen, int mirror_num) { @@ -2588,7 +2585,7 @@ static int scrub_extent_for_parity(struct scrub_parity *sparity, } while (len) { - u64 l = min_t(u64, len, blocksize); + u32 l = min(len, blocksize); int have_csum = 0; if (flags & BTRFS_EXTENT_FLAG_DATA) { @@ -2792,7 +2789,8 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, u64 generation; u64 extent_logical; u64 extent_physical; - u64 extent_len; + /* Check the comment in scrub_stripe() for why u32 is enough here */ + u32 extent_len; u64 mapped_length; struct btrfs_device *extent_dev; struct scrub_parity *sparity; @@ -2801,6 +2799,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, int extent_mirror_num; int stop_loop = 0; + ASSERT(map->stripe_len <= U32_MAX); nsectors = map->stripe_len >> fs_info->sectorsize_bits; bitmap_len = scrub_calc_parity_bitmap_len(nsectors); sparity = kzalloc(sizeof(struct scrub_parity) + 2 * bitmap_len, @@ -2812,6 +2811,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, return -ENOMEM; } + ASSERT(map->stripe_len <= U32_MAX); sparity->stripe_len = map->stripe_len; sparity->nsectors = nsectors; sparity->sctx = sctx; @@ -2906,6 +2906,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, } again: extent_logical = key.objectid; + ASSERT(bytes <= U32_MAX); extent_len = bytes; if (extent_logical < logic_start) { @@ -2984,9 +2985,11 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, logic_start += map->stripe_len; } out: - if (ret < 0) + if (ret < 0) { + ASSERT(logic_end - logic_start <= U32_MAX); scrub_parity_mark_sectors_error(sparity, logic_start, logic_end - logic_start); + } scrub_parity_put(sparity); scrub_submit(sctx); mutex_lock(&sctx->wr_lock); @@ -3028,7 +3031,11 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, u64 offset; u64 extent_logical; u64 extent_physical; - u64 extent_len; + /* + * Unlike chunk length, extent length should 
never go beyond + BTRFS_MAX_EXTENT_SIZE, thus u32 is enough here. + */ + u32 extent_len; u64 stripe_logical; u64 stripe_end; struct btrfs_device *extent_dev; @@ -3277,6 +3284,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, again: extent_logical = key.objectid; + ASSERT(bytes <= U32_MAX); extent_len = bytes; /* @@ -4074,7 +4082,7 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid, } static void scrub_remap_extent(struct btrfs_fs_info *fs_info, - u64 extent_logical, u64 extent_len, + u64 extent_logical, u32 extent_len, u64 *extent_physical, struct btrfs_device **extent_dev, int *extent_mirror_num)

From patchwork Wed Dec 2 06:48:08 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 12/15] btrfs: scrub: always allocate one full page for one sector for RAID56
Date: Wed, 2 Dec 2020 14:48:08 +0800
Message-Id: <20201202064811.100688-13-wqu@suse.com>
In-Reply-To: <20201202064811.100688-1-wqu@suse.com>
References: <20201202064811.100688-1-wqu@suse.com>

For scrub_pages() and scrub_pages_for_parity(), we currently allocate one scrub_page structure per page. This is fine if we only read/write one sector at a time. But for cases like scrubbing RAID56, we need to read/write the full stripe, which is 64K in size.

For subpage sector sizes, we would submit the read in just one page, which is normally a good thing, but the RAID56 code expects to see one sector, not the full stripe, in its endio function.
This could lead to wrong parity checksums for RAID56 on subpage.

To make the existing code work well for the subpage case, we take a shortcut here by always allocating a full page for one sector. This should provide the basis to make RAID56 work for the subpage case.

The cost is pretty obvious: for one RAID56 stripe we now always need 16 pages. For the subpage situation (64K page size, 4K sector size), this means we need a full megabyte to scrub just one RAID56 stripe. And for data scrub, each 4K sector will also need one 64K page.

This is mostly just a workaround; the proper fix is a much larger project, using scrub_block to replace scrub_page, and allowing scrub_block to handle multiple pages, csums, and a csum_bitmap, to avoid allocating one page for each sector.

Signed-off-by: Qu Wenruo
---
fs/btrfs/scrub.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 8026606f7510..efc6f5f2b8a4 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -2153,6 +2153,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u32 len, u64 physical_for_dev_replace) { struct scrub_block *sblock; + const u32 sectorsize = sctx->fs_info->sectorsize; int index; sblock = kzalloc(sizeof(*sblock), GFP_KERNEL); @@ -2171,7 +2172,12 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u32 len, for (index = 0; len > 0; index++) { struct scrub_page *spage; - u32 l = min_t(u32, len, PAGE_SIZE); + /* + * Here we will allocate one page for one sector to scrub. + * This is fine if PAGE_SIZE == sectorsize, but will cost + * more memory for PAGE_SIZE > sectorsize case. + */ + u32 l = min(sectorsize, len); spage = kzalloc(sizeof(*spage), GFP_KERNEL); if (!spage) { @@ -2483,8 +2489,11 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity, { struct scrub_ctx *sctx = sparity->sctx; struct scrub_block *sblock; + u32 sectorsize = sctx->fs_info->sectorsize; int index; + ASSERT(IS_ALIGNED(len, sectorsize)); + sblock = kzalloc(sizeof(*sblock), GFP_KERNEL); if (!sblock) { spin_lock(&sctx->stat_lock); @@ -2503,7 +2512,6 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity, for (index = 0; len > 0; index++) { struct scrub_page *spage; - u32 l = min_t(u32, len, PAGE_SIZE); spage = kzalloc(sizeof(*spage), GFP_KERNEL); if (!spage) { @@ -2538,9 +2546,12 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity, spage->page = alloc_page(GFP_KERNEL); if (!spage->page) goto leave_nomem; - len -= l; - logical += l; - physical += l; + + + /* Iterate over the stripe range in sectorsize steps */ + len -= sectorsize; + logical += sectorsize; + physical += sectorsize; } WARN_ON(sblock->page_count == 0);
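As a sanity check of the memory cost stated in the log above, the arithmetic as a small C sketch. All constants merely restate the mail's numbers (64K page, 4K sector, 64K RAID56 stripe); the macro names are illustrative, not btrfs definitions:

    /* One scrub_page, backed by a full page, is allocated per sector. */
    #define SKETCH_SECTORSIZE        (4 * 1024)   /* subpage sector     */
    #define SKETCH_PAGE_SIZE         (64 * 1024)  /* 64K page system    */
    #define SKETCH_STRIPE_LEN        (64 * 1024)  /* one RAID56 stripe  */

    /* 64K / 4K = 16 scrub_pages per stripe */
    #define SKETCH_SECTORS_PER_STRIPE (SKETCH_STRIPE_LEN / SKETCH_SECTORSIZE)

    /* each sector gets a full 64K page: 16 * 64K = 1MiB per stripe */
    #define SKETCH_STRIPE_MEMORY     (SKETCH_SECTORS_PER_STRIPE * SKETCH_PAGE_SIZE)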
From patchwork Wed Dec 2 06:48:09 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 13/15] btrfs: scrub: support subpage tree block scrub
Date: Wed, 2 Dec 2020 14:48:09 +0800
Message-Id: <20201202064811.100688-14-wqu@suse.com>
In-Reply-To: <20201202064811.100688-1-wqu@suse.com>
References: <20201202064811.100688-1-wqu@suse.com>

To support subpage tree block scrub, scrub_checksum_tree_block() only needs to learn 2 new tricks:

- Follow the sector size
  Now that a scrub_page only represents one sector, we need to follow that properly.

- Run the checksum over all sectors
  Since a scrub_page only represents one sector, we need to run the hash over all sectors of the tree block, no longer just (nodesize >> PAGE_SHIFT) pages.
Signed-off-by: Qu Wenruo
---
fs/btrfs/scrub.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index efc6f5f2b8a4..a4d30106bacb 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1808,15 +1808,20 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock) struct scrub_ctx *sctx = sblock->sctx; struct btrfs_header *h; struct btrfs_fs_info *fs_info = sctx->fs_info; + const u32 sectorsize = sctx->fs_info->sectorsize; SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); u8 calculated_csum[BTRFS_CSUM_SIZE]; u8 on_disk_csum[BTRFS_CSUM_SIZE]; - const int num_pages = sctx->fs_info->nodesize >> PAGE_SHIFT; + const int num_sectors = fs_info->nodesize >> fs_info->sectorsize_bits; int i; struct scrub_page *spage; char *kaddr; BUG_ON(sblock->page_count < 1); + + /* Each pagev[] is in fact just one sector, not a full page */ + ASSERT(sblock->page_count == num_sectors); + spage = sblock->pagev[0]; kaddr = page_address(spage->page); h = (struct btrfs_header *)kaddr; @@ -1845,11 +1850,11 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock) shash->tfm = fs_info->csum_shash; crypto_shash_init(shash); crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE, - PAGE_SIZE - BTRFS_CSUM_SIZE); + sectorsize - BTRFS_CSUM_SIZE); - for (i = 1; i < num_pages; i++) { + for (i = 1; i < num_sectors; i++) { kaddr = page_address(sblock->pagev[i]->page); - crypto_shash_update(shash, kaddr, PAGE_SIZE); + crypto_shash_update(shash, kaddr, sectorsize); } crypto_shash_final(shash, calculated_csum);
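Condensed, the patched function now hashes the tree block like this. This is a readability sketch assembled from the hunks above, not standalone compilable code:

    /* num_sectors = nodesize >> sectorsize_bits; one page per sector */
    crypto_shash_init(shash);
    /* First sector: skip the on-disk csum bytes at the header start */
    crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
                        sectorsize - BTRFS_CSUM_SIZE);
    /* Remaining sectors of the tree block */
    for (i = 1; i < num_sectors; i++) {
            kaddr = page_address(sblock->pagev[i]->page);
            crypto_shash_update(shash, kaddr, sectorsize);
    }
    crypto_shash_final(shash, calculated_csum);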
From patchwork Wed Dec 2 06:48:10 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 14/15] btrfs: scrub: support subpage data scrub
Date: Wed, 2 Dec 2020 14:48:10 +0800
Message-Id: <20201202064811.100688-15-wqu@suse.com>
In-Reply-To: <20201202064811.100688-1-wqu@suse.com>
References: <20201202064811.100688-1-wqu@suse.com>

Btrfs scrub is in fact much more flexible than the buffered data write path, as we can read unaligned subpage data into page offset 0.

This ability makes subpage support much easier: we just need to check each scrub_page::page_len and ensure we only calculate the hash for [0, page_len) of a page, and call it a day for subpage scrub support.

There is a small thing to notice: for the subpage case, we still do sector-by-sector scrub. This means we will submit a read bio for each sector to scrub, resulting in the same number of read bios as on 4K page systems. This behavior can be considered a good thing if we want everything to work the same as on 4K page systems. But it also means we're wasting the ability to submit larger bios using the 64K page size. This is another problem to consider in the future.

Signed-off-by: Qu Wenruo
---
fs/btrfs/scrub.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index a4d30106bacb..8a43e8cb10a6 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1795,11 +1795,15 @@ static int scrub_checksum_data(struct scrub_block *sblock) shash->tfm = fs_info->csum_shash; crypto_shash_init(shash); - crypto_shash_digest(shash, kaddr, PAGE_SIZE, csum); - if (memcmp(csum, spage->csum, sctx->fs_info->csum_size)) - sblock->checksum_error = 1; + /* + * In scrub_pages() and scrub_pages_for_parity() we ensure + * each spage only contains just one sector of data.
+ */ + crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum); + if (memcmp(csum, spage->csum, fs_info->csum_size)) + sblock->checksum_error = 1; return sblock->checksum_error; }

From patchwork Wed Dec 2 06:48:11 2020
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 15/15] btrfs: scrub: allow scrub to work with subpage sectorsize
Date: Wed, 2 Dec 2020 14:48:11 +0800
Message-Id: <20201202064811.100688-16-wqu@suse.com>
In-Reply-To: <20201202064811.100688-1-wqu@suse.com>
References: <20201202064811.100688-1-wqu@suse.com>

Since btrfs scrub uses its own infrastructure to submit reads and writes, scrub is independent from all other routines.

This brings one very neat feature: it allows us to read 4K of data into offset 0 of a 64K page, and the same is true for the writeback routine.

This makes scrub on subpage sector sizes much easier to implement, and thanks to the previous commits, which changed the implementation to always scrub based on sector size, scrub can now handle subpage filesystems without any problem.

This patch just removes the restriction on (sectorsize != PAGE_SIZE) to make scrub finally work on subpage filesystems.
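As an illustration of the "read 4K into offset 0 of a 64K page" point: because scrub builds its own bios, nothing ties the in-page offset to the file or disk offset. A minimal sketch follows; the surrounding function is illustrative, while alloc_page(), bio_add_page() and __free_page() are real kernel APIs of this era:

    /*
     * Sketch: place a single 4K sector at offset 0 of a freshly
     * allocated page, even when PAGE_SIZE is 64K. The page cache
     * cannot do this; scrub's private bios can.
     */
    static int sketch_add_one_sector(struct bio *bio, u32 sectorsize)
    {
            struct page *page = alloc_page(GFP_KERNEL);

            if (!page)
                    return -ENOMEM;
            /* len = sectorsize (e.g. 4K), offset within the page = 0 */
            if (bio_add_page(bio, page, sectorsize, 0) != sectorsize) {
                    __free_page(page);
                    return -EIO;
            }
            return 0;
    }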
Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 8a43e8cb10a6..e0ac0009303d 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -3880,14 +3880,6 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, return -EINVAL; } - if (fs_info->sectorsize != PAGE_SIZE) { - /* not supported for data w/o checksums */ - btrfs_err_rl(fs_info, - "scrub: size assumption sectorsize != PAGE_SIZE (%d != %lu) fails", - fs_info->sectorsize, PAGE_SIZE); - return -EINVAL; - } - if (fs_info->nodesize > PAGE_SIZE * SCRUB_MAX_PAGES_PER_BLOCK || fs_info->sectorsize > PAGE_SIZE * SCRUB_MAX_PAGES_PER_BLOCK) {