From patchwork Mon May 4 23:58:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11527843 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 510B417EF for ; Mon, 4 May 2020 23:58:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 43C02206B8 for ; Mon, 4 May 2020 23:58:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728441AbgEDX6c (ORCPT ); Mon, 4 May 2020 19:58:32 -0400 Received: from mx2.suse.de ([195.135.220.15]:37866 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728278AbgEDX6c (ORCPT ); Mon, 4 May 2020 19:58:32 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 3717DABAD for ; Mon, 4 May 2020 23:58:33 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 1/7] btrfs: block-group: Don't set the wrong READA flag for btrfs_read_block_groups() Date: Tue, 5 May 2020 07:58:19 +0800 Message-Id: <20200504235825.4199-2-wqu@suse.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200504235825.4199-1-wqu@suse.com> References: <20200504235825.4199-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org For regular block group items in extent tree, they are scattered inside the huge tree, thus forward readahead makes no sense. Signed-off-by: Qu Wenruo --- fs/btrfs/block-group.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 5627f53ca115..42dae6473354 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1985,7 +1985,6 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) path = btrfs_alloc_path(); if (!path) return -ENOMEM; - path->reada = READA_FORWARD; cache_gen = btrfs_super_cache_generation(info->super_copy); if (btrfs_test_opt(info, SPACE_CACHE) && From patchwork Mon May 4 23:58:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11527845 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D0908912 for ; Mon, 4 May 2020 23:58:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C3612206B8 for ; Mon, 4 May 2020 23:58:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728468AbgEDX6f (ORCPT ); Mon, 4 May 2020 19:58:35 -0400 Received: from mx2.suse.de ([195.135.220.15]:37894 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728278AbgEDX6e (ORCPT ); Mon, 4 May 2020 19:58:34 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 043BDABAD for ; Mon, 4 May 2020 23:58:34 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 2/7] btrfs: block-group: Refactor how we read one block group item Date: Tue, 5 May 2020 07:58:20 +0800 Message-Id: <20200504235825.4199-3-wqu@suse.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200504235825.4199-1-wqu@suse.com> References: <20200504235825.4199-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Structure btrfs_block_group has the following members which are currently read from on-disk block group item and key: - Length From item key. - Used - Flags From block group item. However for incoming skinny block group tree, we are going to read those members from different sources. This patch will refactor such read by: - Don't initialize btrfs_block_group::length at allocation Caller should initialize them manually. Also to avoid possible (well, only two callers) missing initialization, add extra ASSERT() in btrfs_add_block_group_cache(). - Refactor length/used/flags initialization into one function The new function, fill_one_block_group() will handle the initialization of such members. - Use btrfs_block_group::length to replace key::offset Since skinny block group item would have a different meaning for its key offset. Signed-off-by: Qu Wenruo --- fs/btrfs/block-group.c | 47 ++++++++++++++++++++++++++++-------------- 1 file changed, 32 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 42dae6473354..57483ea07d51 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -161,6 +161,8 @@ static int btrfs_add_block_group_cache(struct btrfs_fs_info *info, struct rb_node *parent = NULL; struct btrfs_block_group *cache; + ASSERT(block_group->length != 0); + spin_lock(&info->block_group_cache_lock); p = &info->block_group_cache_tree.rb_node; @@ -1768,7 +1770,7 @@ static void link_block_group(struct btrfs_block_group *cache) } static struct btrfs_block_group *btrfs_create_block_group_cache( - struct btrfs_fs_info *fs_info, u64 start, u64 size) + struct btrfs_fs_info *fs_info, u64 start) { struct btrfs_block_group *cache; @@ -1784,7 +1786,6 @@ static struct btrfs_block_group *btrfs_create_block_group_cache( } cache->start = start; - cache->length = size; cache->fs_info = fs_info; cache->full_stripe_len = btrfs_full_stripe_len(fs_info, start); @@ -1864,25 +1865,44 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info) return ret; } +static int read_block_group_item(struct btrfs_block_group *cache, + struct btrfs_path *path, + const struct btrfs_key *key) +{ + struct extent_buffer *leaf = path->nodes[0]; + struct btrfs_block_group_item bgi; + int slot = path->slots[0]; + + cache->length = key->offset; + + read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot), + sizeof(bgi)); + cache->used = btrfs_stack_block_group_used(&bgi); + cache->flags = btrfs_stack_block_group_flags(&bgi); + + return 0; +} + static int read_one_block_group(struct btrfs_fs_info *info, struct btrfs_path *path, const struct btrfs_key *key, int need_clear) { - struct extent_buffer *leaf = path->nodes[0]; struct btrfs_block_group *cache; struct btrfs_space_info *space_info; - struct btrfs_block_group_item bgi; const bool mixed = btrfs_fs_incompat(info, MIXED_GROUPS); - int slot = path->slots[0]; int ret; ASSERT(key->type == BTRFS_BLOCK_GROUP_ITEM_KEY); - cache = btrfs_create_block_group_cache(info, key->objectid, key->offset); + cache = btrfs_create_block_group_cache(info, key->objectid); if (!cache) return -ENOMEM; + ret = read_block_group_item(cache, path, key); + if (ret < 0) + goto error; + if (need_clear) { /* * When we mount with old space cache, we need to @@ -1897,10 +1917,6 @@ static int read_one_block_group(struct btrfs_fs_info *info, if (btrfs_test_opt(info, SPACE_CACHE)) cache->disk_cache_state = BTRFS_DC_CLEAR; } - read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot), - sizeof(bgi)); - cache->used = btrfs_stack_block_group_used(&bgi); - cache->flags = btrfs_stack_block_group_flags(&bgi); if (!mixed && ((cache->flags & BTRFS_BLOCK_GROUP_METADATA) && (cache->flags & BTRFS_BLOCK_GROUP_DATA))) { btrfs_err(info, @@ -1928,15 +1944,15 @@ static int read_one_block_group(struct btrfs_fs_info *info, * are empty, and we can just add all the space in and be done with it. * This saves us _a_lot_ of time, particularly in the full case. */ - if (key->offset == cache->used) { + if (cache->length == cache->used) { cache->last_byte_to_unpin = (u64)-1; cache->cached = BTRFS_CACHE_FINISHED; btrfs_free_excluded_extents(cache); } else if (cache->used == 0) { cache->last_byte_to_unpin = (u64)-1; cache->cached = BTRFS_CACHE_FINISHED; - add_new_free_space(cache, key->objectid, - key->objectid + key->offset); + add_new_free_space(cache, cache->start, + cache->start + cache->length); btrfs_free_excluded_extents(cache); } @@ -1946,7 +1962,7 @@ static int read_one_block_group(struct btrfs_fs_info *info, goto error; } trace_btrfs_add_block_group(info, cache, 0); - btrfs_update_space_info(info, cache->flags, key->offset, + btrfs_update_space_info(info, cache->flags, cache->length, cache->used, cache->bytes_super, &space_info); cache->space_info = space_info; @@ -2093,10 +2109,11 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, btrfs_set_log_full_commit(trans); - cache = btrfs_create_block_group_cache(fs_info, chunk_offset, size); + cache = btrfs_create_block_group_cache(fs_info, chunk_offset); if (!cache) return -ENOMEM; + cache->length = size; cache->used = bytes_used; cache->flags = type; cache->last_byte_to_unpin = (u64)-1; From patchwork Mon May 4 23:58:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11527847 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B8E73912 for ; Mon, 4 May 2020 23:58:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A1ACC206B8 for ; Mon, 4 May 2020 23:58:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728476AbgEDX6g (ORCPT ); Mon, 4 May 2020 19:58:36 -0400 Received: from mx2.suse.de ([195.135.220.15]:37908 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728278AbgEDX6g (ORCPT ); Mon, 4 May 2020 19:58:36 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id CC54CAEB9 for ; Mon, 4 May 2020 23:58:36 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 3/7] btrfs: block-group: Refactor how we delete one block group item Date: Tue, 5 May 2020 07:58:21 +0800 Message-Id: <20200504235825.4199-4-wqu@suse.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200504235825.4199-1-wqu@suse.com> References: <20200504235825.4199-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org When deleting a block group item, it's pretty straight forward, just delete the item pointed by the key. However it will not be that straight-forward for incoming skinny block group item. So refactor the block group item deletion into a new function, remove_block_group_item(), also to make the already lengthy btrfs_remove_block_group() a little shorter. Signed-off-by: Qu Wenruo Reviewed-by: Johannes Thumshirn --- fs/btrfs/block-group.c | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 57483ea07d51..914b1c2064ac 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -865,11 +865,34 @@ static void clear_incompat_bg_bits(struct btrfs_fs_info *fs_info, u64 flags) } } +static int remove_block_group_item(struct btrfs_trans_handle *trans, + struct btrfs_path *path, + struct btrfs_block_group *block_group) +{ + struct btrfs_fs_info *fs_info = trans->fs_info; + struct btrfs_root *root; + struct btrfs_key key; + int ret; + + root = fs_info->extent_root; + key.objectid = block_group->start; + key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; + key.offset = block_group->length; + + ret = btrfs_search_slot(trans, root, &key, path, -1, 1); + if (ret > 0) + ret = -ENOENT; + if (ret < 0) + return ret; + + ret = btrfs_del_item(trans, root, path); + return ret; +} + int btrfs_remove_block_group(struct btrfs_trans_handle *trans, u64 group_start, struct extent_map *em) { struct btrfs_fs_info *fs_info = trans->fs_info; - struct btrfs_root *root = fs_info->extent_root; struct btrfs_path *path; struct btrfs_block_group *block_group; struct btrfs_free_cluster *cluster; @@ -1067,10 +1090,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, spin_unlock(&block_group->space_info->lock); - key.objectid = block_group->start; - key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; - key.offset = block_group->length; - mutex_lock(&fs_info->chunk_mutex); spin_lock(&block_group->lock); block_group->removed = 1; @@ -1109,16 +1128,10 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, /* Once for the block groups rbtree */ btrfs_put_block_group(block_group); - ret = btrfs_search_slot(trans, root, &key, path, -1, 1); - if (ret > 0) - ret = -EIO; + ret = remove_block_group_item(trans, path, block_group); if (ret < 0) goto out; - ret = btrfs_del_item(trans, root, path); - if (ret) - goto out; - if (remove_em) { struct extent_map_tree *em_tree; From patchwork Mon May 4 23:58:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11527849 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CCEE081 for ; Mon, 4 May 2020 23:58:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B3339206B8 for ; Mon, 4 May 2020 23:58:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728483AbgEDX6i (ORCPT ); Mon, 4 May 2020 19:58:38 -0400 Received: from mx2.suse.de ([195.135.220.15]:37914 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728278AbgEDX6i (ORCPT ); Mon, 4 May 2020 19:58:38 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id C667EABAD for ; Mon, 4 May 2020 23:58:38 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 4/7] btrfs: block-group: Refactor how we insert a block group item Date: Tue, 5 May 2020 07:58:22 +0800 Message-Id: <20200504235825.4199-5-wqu@suse.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200504235825.4199-1-wqu@suse.com> References: <20200504235825.4199-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Currently the block group item insert is pretty straight forward, fill the block group item structure and insert it into extent tree. However the incoming skinny block group feature is going to change this, so this patch will refactor such insert into a new function, insert_block_group_item(), to make the incoming feature easier to add. Signed-off-by: Qu Wenruo Reviewed-by: Johannes Thumshirn --- fs/btrfs/block-group.c | 41 +++++++++++++++++++++++++---------------- 1 file changed, 25 insertions(+), 16 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 914b1c2064ac..93e7835ca79d 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -2068,13 +2068,32 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) return ret; } +static int insert_block_group_item(struct btrfs_trans_handle *trans, + struct btrfs_block_group *block_group) +{ + struct btrfs_fs_info *fs_info = trans->fs_info; + struct btrfs_block_group_item bgi; + struct btrfs_root *root; + struct btrfs_key key; + + spin_lock(&block_group->lock); + btrfs_set_stack_block_group_used(&bgi, block_group->used); + btrfs_set_stack_block_group_chunk_objectid(&bgi, + BTRFS_FIRST_CHUNK_TREE_OBJECTID); + btrfs_set_stack_block_group_flags(&bgi, block_group->flags); + key.objectid = block_group->start; + key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; + key.offset = block_group->length; + spin_unlock(&block_group->lock); + + root = fs_info->extent_root; + return btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi)); +} + void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_block_group *block_group; - struct btrfs_root *extent_root = fs_info->extent_root; - struct btrfs_block_group_item item; - struct btrfs_key key; int ret = 0; if (!trans->can_flush_pending_bgs) @@ -2087,21 +2106,11 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans) if (ret) goto next; - spin_lock(&block_group->lock); - btrfs_set_stack_block_group_used(&item, block_group->used); - btrfs_set_stack_block_group_chunk_objectid(&item, - BTRFS_FIRST_CHUNK_TREE_OBJECTID); - btrfs_set_stack_block_group_flags(&item, block_group->flags); - key.objectid = block_group->start; - key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; - key.offset = block_group->length; - spin_unlock(&block_group->lock); - - ret = btrfs_insert_item(trans, extent_root, &key, &item, - sizeof(item)); + ret = insert_block_group_item(trans, block_group); if (ret) btrfs_abort_transaction(trans, ret); - ret = btrfs_finish_chunk_alloc(trans, key.objectid, key.offset); + ret = btrfs_finish_chunk_alloc(trans, block_group->start, + block_group->length); if (ret) btrfs_abort_transaction(trans, ret); add_block_group_free_space(trans, block_group); From patchwork Mon May 4 23:58:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11527851 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E2B65912 for ; Mon, 4 May 2020 23:58:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CA882206B8 for ; Mon, 4 May 2020 23:58:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728489AbgEDX6l (ORCPT ); Mon, 4 May 2020 19:58:41 -0400 Received: from mx2.suse.de ([195.135.220.15]:37920 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728278AbgEDX6k (ORCPT ); Mon, 4 May 2020 19:58:40 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 75A5AAEB9 for ; Mon, 4 May 2020 23:58:40 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 5/7] btrfs: block-group: Rename write_one_cahce_group() Date: Tue, 5 May 2020 07:58:23 +0800 Message-Id: <20200504235825.4199-6-wqu@suse.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200504235825.4199-1-wqu@suse.com> References: <20200504235825.4199-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The name of this function contains the word "cache", which is left from the era where btrfs_block_group is called btrfs_block_group_cache. Now this "cache" doesn't match any thing, and we have better namings for functions like read/insert/remove_block_group_item(). So rename this function to update_block_group_item(). Since we're here, also rename the local variables to be more like a Chrismas tree, and rename @extent_root to @root for later reuse. Signed-off-by: Qu Wenruo Reviewed-by: Johannes Thumshirn --- fs/btrfs/block-group.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 93e7835ca79d..46846749b631 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -2346,23 +2346,24 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group *cache) spin_unlock(&sinfo->lock); } -static int write_one_cache_group(struct btrfs_trans_handle *trans, - struct btrfs_path *path, - struct btrfs_block_group *cache) +static int update_block_group_item(struct btrfs_trans_handle *trans, + struct btrfs_path *path, + struct btrfs_block_group *cache) { struct btrfs_fs_info *fs_info = trans->fs_info; - int ret; - struct btrfs_root *extent_root = fs_info->extent_root; - unsigned long bi; - struct extent_buffer *leaf; struct btrfs_block_group_item bgi; + struct extent_buffer *leaf; + struct btrfs_root *root; struct btrfs_key key; + unsigned long bi; + int ret; key.objectid = cache->start; key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; key.offset = cache->length; + root = fs_info->extent_root; - ret = btrfs_search_slot(trans, extent_root, &key, path, 0, 1); + ret = btrfs_search_slot(trans, root, &key, path, 0, 1); if (ret) { if (ret > 0) ret = -ENOENT; @@ -2674,7 +2675,7 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans) } } if (!ret) { - ret = write_one_cache_group(trans, path, cache); + ret = update_block_group_item(trans, path, cache); /* * Our block group might still be attached to the list * of new block groups in the transaction handle of some @@ -2823,7 +2824,7 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans) } } if (!ret) { - ret = write_one_cache_group(trans, path, cache); + ret = update_block_group_item(trans, path, cache); /* * One of the free space endio workers might have * created a new block group while updating a free space @@ -2840,7 +2841,7 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans) if (ret == -ENOENT) { wait_event(cur_trans->writer_wait, atomic_read(&cur_trans->num_writers) == 1); - ret = write_one_cache_group(trans, path, cache); + ret = update_block_group_item(trans, path, cache); } if (ret) btrfs_abort_transaction(trans, ret); From patchwork Mon May 4 23:58:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11527853 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 31DDF912 for ; Mon, 4 May 2020 23:58:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 160B3206B8 for ; Mon, 4 May 2020 23:58:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728502AbgEDX6o (ORCPT ); Mon, 4 May 2020 19:58:44 -0400 Received: from mx2.suse.de ([195.135.220.15]:37928 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728278AbgEDX6n (ORCPT ); Mon, 4 May 2020 19:58:43 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 68EABABAD for ; Mon, 4 May 2020 23:58:42 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 6/7] btrfs: Introduce new incompat feature, SKINNY_BG_TREE, to hugely reduce mount time Date: Tue, 5 May 2020 07:58:24 +0800 Message-Id: <20200504235825.4199-7-wqu@suse.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200504235825.4199-1-wqu@suse.com> References: <20200504235825.4199-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The overall idea of the new SKINNY_BG_TREE is: - Put block group items into a separate tree - Reduce the size of block group item to minimal This brings one obvious enhancement: - Reduce mount time of large fs to minimal With all these good news, there are still something we need extra attention: - Need extra chunk lookup when read block group To get the block group length and block group flags, but that's all in-memory lookup, thus way faster than regular tree search. - We can no longer rely on key->offset for various block group context Only key->objectid is still the same. The good news is, in previous refactors, we have already killed such call sites. For now, if an existing fs want to take advantage of BG_TREE feature, btrfs-progs will provide offline convertion tool. [[Benchmark]] Physical device: NVMe SSD VM device: VirtIO block device, backup by sparse file Nodesize: 4K (to bump up tree height) Extent data size: 4M Fs size used: 1T All file extents on disk is in 4M size, preallocated to reduce space usage (as the VM uses loopback block device backed by sparse file) Without patchset: Use ftrace function graph: 7) | open_ctree [btrfs]() { 7) | btrfs_read_block_groups [btrfs]() { 7) @ 805851.8 us | } 7) @ 911890.2 us | } btrfs_read_block_groups() takes 88% of the total mount time, With patchset, and use -O bg-tree mkfs option: 5) | open_ctree [btrfs]() { 5) | btrfs_read_block_groups [btrfs]() { 5) * 63395.75 us | } 5) @ 143106.9 us | } open_ctree() time is only 15% of original mount time. And btrfs_read_block_groups() only takes 7% of total open_ctree() execution time. The reason is pretty obvious when considering how many tree blocks needs to be read from disk: | Extent tree | Skinny bg tree | ---------------------------------------------- nodes | 55 | 1 | leaves | 1025 | 7 | total | 1080 | 8 | The skinny bg tree reduces half of its tree blocks compared to regular bg tree, less than 1% of extent tree implementation. Not to mention all the tree blocks readahead works pretty fine for regular and skinny bg tree, as we will read every item. While readahead for extent tree will just be a diaster, as all block groups are scatter across the whole extent tree. The reduction of mount time is already obvious even on super fast NVMe disk with memory cache. It would be even more obvious if the fs is on spinning rust. Signed-off-by: Qu Wenruo --- fs/btrfs/block-group.c | 168 +++++++++++++++++++++++++++++--- fs/btrfs/block-rsv.c | 2 + fs/btrfs/ctree.h | 4 +- fs/btrfs/disk-io.c | 18 +++- fs/btrfs/sysfs.c | 2 + include/uapi/linux/btrfs.h | 1 + include/uapi/linux/btrfs_tree.h | 11 +++ 7 files changed, 190 insertions(+), 16 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 46846749b631..fad9118c2c1e 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -874,6 +874,28 @@ static int remove_block_group_item(struct btrfs_trans_handle *trans, struct btrfs_key key; int ret; + if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) { + root = fs_info->bg_root; + key.objectid = block_group->start; + key.type = BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY; + key.offset = (u64)-1; + + ret = btrfs_search_slot(trans, root, &key, path, -1, 1); + if (ret == 0) { + btrfs_release_path(path); + return -EUCLEAN; + } + if (ret < 0) + return ret; + ret = btrfs_previous_item(root, path, key.objectid, key.type); + if (ret > 0) + ret = -ENOENT; + if (ret < 0) + return ret; + ret = btrfs_del_item(trans, root, path); + return ret; + } + root = fs_info->extent_root; key.objectid = block_group->start; key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; @@ -1883,9 +1905,38 @@ static int read_block_group_item(struct btrfs_block_group *cache, const struct btrfs_key *key) { struct extent_buffer *leaf = path->nodes[0]; + struct btrfs_fs_info *fs_info = leaf->fs_info; struct btrfs_block_group_item bgi; int slot = path->slots[0]; + if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) { + struct extent_map *em; + + ASSERT(key->type == BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY); + em = btrfs_get_chunk_map(fs_info, key->objectid, 1); + if (IS_ERR(em)) + return PTR_ERR(em); + if (em->start != key->objectid) { + btrfs_err(fs_info, "no chunk found for bytenr %llu", + key->objectid); + free_extent_map(em); + return -EUCLEAN; + } + /* key->offset is used space */ + if (key->offset > em->len) { + btrfs_err(fs_info, + "invalid bg used, have %llu expect [0, %llu]", + key->offset, em->len); + free_extent_map(em); + return -EUCLEAN; + } + cache->length = em->len; + cache->flags = em->map_lookup->type; + cache->used = key->offset; + free_extent_map(em); + return 0; + } + ASSERT(key->type == BTRFS_BLOCK_GROUP_ITEM_KEY); cache->length = key->offset; read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot), @@ -1906,7 +1957,10 @@ static int read_one_block_group(struct btrfs_fs_info *info, const bool mixed = btrfs_fs_incompat(info, MIXED_GROUPS); int ret; - ASSERT(key->type == BTRFS_BLOCK_GROUP_ITEM_KEY); + ASSERT((!btrfs_fs_incompat(info, SKINNY_BG_TREE) && + key->type == BTRFS_BLOCK_GROUP_ITEM_KEY) || + (btrfs_fs_incompat(info, SKINNY_BG_TREE) && + key->type == BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY)); cache = btrfs_create_block_group_cache(info, key->objectid); if (!cache) @@ -1998,6 +2052,54 @@ static int read_one_block_group(struct btrfs_fs_info *info, return ret; } +static int read_skinny_block_groups(struct btrfs_fs_info *fs_info, + struct btrfs_path *path, + int need_clear) +{ + struct btrfs_root *root = fs_info->bg_root; + struct btrfs_key key; + int ret; + + key.objectid = 0; + key.type = 0; + key.offset = 0; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + if (ret < 0) + return ret; + if (ret == 0) { + btrfs_err(fs_info, + "found invalid key (0, 0, 0) in block group tree"); + ret = -EUCLEAN; + goto out; + } + + while (1) { + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); + if (key.type != BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY) { + btrfs_err(fs_info, + "found invalid key (%llu, %u, %llu) in block group tree", + key.objectid, key.type, key.offset); + ret = -EUCLEAN; + goto out; + } + + ret = read_one_block_group(fs_info, path, &key, need_clear); + if (ret < 0) + goto out; + ret = btrfs_next_item(root, path); + if (ret < 0) + goto out; + if (ret > 0) { + ret = 0; + goto out; + } + } +out: + btrfs_release_path(path); + return ret; +} + int btrfs_read_block_groups(struct btrfs_fs_info *info) { struct btrfs_path *path; @@ -2022,20 +2124,27 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) if (btrfs_test_opt(info, CLEAR_CACHE)) need_clear = 1; - while (1) { - ret = find_first_block_group(info, path, &key); - if (ret > 0) - break; - if (ret != 0) - goto error; + if (btrfs_fs_incompat(info, SKINNY_BG_TREE)) { + path->reada = READA_FORWARD; + ret = read_skinny_block_groups(info, path, need_clear); + } else { + while (1) { + ret = find_first_block_group(info, path, &key); + if (ret > 0) + break; + if (ret != 0) + goto error; - btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); - ret = read_one_block_group(info, path, &key, need_clear); - if (ret < 0) - goto error; - key.objectid += key.offset; - key.offset = 0; - btrfs_release_path(path); + btrfs_item_key_to_cpu(path->nodes[0], &key, + path->slots[0]); + ret = read_one_block_group(info, path, &key, + need_clear); + if (ret < 0) + goto error; + key.objectid += key.offset; + key.offset = 0; + btrfs_release_path(path); + } } rcu_read_lock(); @@ -2076,6 +2185,16 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans, struct btrfs_root *root; struct btrfs_key key; + if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) { + spin_lock(&block_group->lock); + key.objectid = block_group->start; + key.type = BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY; + key.offset = block_group->used; + spin_unlock(&block_group->lock); + + root = fs_info->bg_root; + return btrfs_insert_item(trans, root, &key, 0, 0); + } spin_lock(&block_group->lock); btrfs_set_stack_block_group_used(&bgi, block_group->used); btrfs_set_stack_block_group_chunk_objectid(&bgi, @@ -2358,6 +2477,27 @@ static int update_block_group_item(struct btrfs_trans_handle *trans, unsigned long bi; int ret; + if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) { + key.objectid = cache->start; + key.type = BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY; + key.offset = (u64)-1; + root = fs_info->bg_root; + + ret = btrfs_search_slot(trans, root, &key, path, 0, 1); + if (ret == 0) + ret = -EUCLEAN; + if (ret < 0) + goto fail; + ret = btrfs_previous_item(root, path, key.objectid, key.type); + if (ret > 0) + ret = -ENOENT; + if (ret < 0) + goto fail; + key.offset = cache->used; + btrfs_set_item_key_safe(fs_info, path, &key); + btrfs_release_path(path); + return 0; + } key.objectid = cache->start; key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; key.offset = cache->length; diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c index dbba53e712e6..f0d36f1c8c6e 100644 --- a/fs/btrfs/block-rsv.c +++ b/fs/btrfs/block-rsv.c @@ -432,6 +432,8 @@ void btrfs_init_global_block_rsv(struct btrfs_fs_info *fs_info) fs_info->tree_root->block_rsv = &fs_info->global_block_rsv; if (fs_info->quota_root) fs_info->quota_root->block_rsv = &fs_info->global_block_rsv; + if (fs_info->bg_root) + fs_info->bg_root->block_rsv = &fs_info->global_block_rsv; fs_info->chunk_root->block_rsv = &fs_info->chunk_block_rsv; btrfs_update_global_block_rsv(fs_info); diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 03ea7370aea7..e40131d662b8 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -302,7 +302,8 @@ struct btrfs_super_block { BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \ BTRFS_FEATURE_INCOMPAT_NO_HOLES | \ BTRFS_FEATURE_INCOMPAT_METADATA_UUID | \ - BTRFS_FEATURE_INCOMPAT_RAID1C34) + BTRFS_FEATURE_INCOMPAT_RAID1C34 | \ + BTRFS_FEATURE_INCOMPAT_SKINNY_BG_TREE) #define BTRFS_FEATURE_INCOMPAT_SAFE_SET \ (BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF) @@ -582,6 +583,7 @@ struct btrfs_fs_info { struct btrfs_root *quota_root; struct btrfs_root *uuid_root; struct btrfs_root *free_space_root; + struct btrfs_root *bg_root; /* the log root tree is a directory of all the other log roots */ struct btrfs_root *log_root_tree; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8ad451695d49..56675d3cd23a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1516,6 +1516,7 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info) kfree(fs_info->balance_ctl); kfree(fs_info->delayed_root); btrfs_put_root(fs_info->extent_root); + btrfs_put_root(fs_info->bg_root); btrfs_put_root(fs_info->tree_root); btrfs_put_root(fs_info->chunk_root); btrfs_put_root(fs_info->dev_root); @@ -1531,7 +1532,6 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info) kvfree(fs_info); } - struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, struct btrfs_key *location, bool check_ref) @@ -1560,6 +1560,10 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, if (location->objectid == BTRFS_FREE_SPACE_TREE_OBJECTID) return btrfs_grab_root(fs_info->free_space_root) ? fs_info->free_space_root : ERR_PTR(-ENOENT); + if (location->objectid == BTRFS_BLOCK_GROUP_TREE_OBJECTID) + return btrfs_grab_root(fs_info->bg_root) ? + fs_info->bg_root : ERR_PTR(-ENOENT); + again: root = btrfs_lookup_fs_root(fs_info, location->objectid); if (root) { @@ -1983,6 +1987,7 @@ static void free_root_pointers(struct btrfs_fs_info *info, bool free_chunk_root) if (free_chunk_root) free_root_extent_buffers(info->chunk_root); free_root_extent_buffers(info->free_space_root); + free_root_extent_buffers(info->bg_root); } void btrfs_put_root(struct btrfs_root *root) @@ -2267,6 +2272,17 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info) set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state); fs_info->extent_root = root; + if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) { + location.objectid = BTRFS_BLOCK_GROUP_TREE_OBJECTID; + root = btrfs_read_tree_root(tree_root, &location); + if (IS_ERR(root)) { + ret = PTR_ERR(root); + goto out; + } + set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state); + fs_info->bg_root = root; + } + location.objectid = BTRFS_DEV_TREE_OBJECTID; root = btrfs_read_tree_root(tree_root, &location); if (IS_ERR(root)) { diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index a39bff64ff24..d71bd4949636 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -261,6 +261,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES); BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID); BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE); BTRFS_FEAT_ATTR_INCOMPAT(raid1c34, RAID1C34); +BTRFS_FEAT_ATTR_INCOMPAT(skinny_bg_tree, SKINNY_BG_TREE); static struct attribute *btrfs_supported_feature_attrs[] = { BTRFS_FEAT_ATTR_PTR(mixed_backref), @@ -276,6 +277,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = { BTRFS_FEAT_ATTR_PTR(metadata_uuid), BTRFS_FEAT_ATTR_PTR(free_space_tree), BTRFS_FEAT_ATTR_PTR(raid1c34), + BTRFS_FEAT_ATTR_PTR(skinny_bg_tree), NULL }; diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index e6b6cb0f8bc6..dc8675d892a4 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -290,6 +290,7 @@ struct btrfs_ioctl_fs_info_args { #define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9) #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID (1ULL << 10) #define BTRFS_FEATURE_INCOMPAT_RAID1C34 (1ULL << 11) +#define BTRFS_FEATURE_INCOMPAT_SKINNY_BG_TREE (1ULL << 12) struct btrfs_ioctl_feature_flags { __u64 compat_flags; diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index 8e322e2c7e78..c1b63a63db9f 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -48,6 +48,9 @@ /* tracks free space in block groups. */ #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL +/* sotre SKINNY_BLOCK_GROUPs in a seprate tree */ +#define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL + /* device stats in the device tree */ #define BTRFS_DEV_STATS_OBJECTID 0ULL @@ -182,6 +185,14 @@ */ #define BTRFS_BLOCK_GROUP_ITEM_KEY 192 +/* + * A more optimized structure for block group item. + * key.objectid = bg->start + * key.offset = bg->used + * No item needed, all other info can be extracted from corresponding chunk. + */ +#define BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY 193 + /* * Every block group is represented in the free space tree by a free space info * item, which stores some accounting information. It is keyed on From patchwork Mon May 4 23:58:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11527855 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1616C81 for ; Mon, 4 May 2020 23:58:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0861B20675 for ; Mon, 4 May 2020 23:58:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728497AbgEDX6o (ORCPT ); Mon, 4 May 2020 19:58:44 -0400 Received: from mx2.suse.de ([195.135.220.15]:37920 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728490AbgEDX6n (ORCPT ); Mon, 4 May 2020 19:58:43 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 32BAAAEB9 for ; Mon, 4 May 2020 23:58:44 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v4 7/7] btrfs: tree-checker: Introduce checks for skinny block group item Date: Tue, 5 May 2020 07:58:25 +0800 Message-Id: <20200504235825.4199-8-wqu@suse.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200504235825.4199-1-wqu@suse.com> References: <20200504235825.4199-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The check is pretty simple, just two checks: - Tree owner check Skinny block group item should only exist in block group tree. - Used num bytes (key->offset) To avoid possible later chunk size change, here we use super large value (64T) as threshold to reduce false alert. - Duplicated skinny block group items There shouldn't be duplicated items for the same block group. Signed-off-by: Qu Wenruo --- fs/btrfs/tree-checker.c | 56 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 51 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index 517b44300a05..f5eb91b4139f 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -621,11 +621,18 @@ static void block_group_err(const struct extent_buffer *eb, int slot, vaf.fmt = fmt; vaf.va = &args; - btrfs_crit(fs_info, - "corrupt %s: root=%llu block=%llu slot=%d bg_start=%llu bg_len=%llu, %pV", - btrfs_header_level(eb) == 0 ? "leaf" : "node", - btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, - key.objectid, key.offset, &vaf); + if (key.type == BTRFS_BLOCK_GROUP_ITEM_KEY) + btrfs_crit(fs_info, +"corrupt %s: root=%llu block=%llu slot=%d bg_start=%llu bg_len=%llu, %pV", + btrfs_header_level(eb) == 0 ? "leaf" : "node", + btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, + key.objectid, key.offset, &vaf); + else + btrfs_crit(fs_info, +"corrupt %s: root=%llu block=%llu slot=%d bg_start=%llu, %pV", + btrfs_header_level(eb) == 0 ? "leaf" : "node", + btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, + key.objectid, &vaf); va_end(args); } @@ -698,6 +705,42 @@ static int check_block_group_item(struct extent_buffer *leaf, return 0; } +static int check_skinny_block_group_item(struct extent_buffer *leaf, + struct btrfs_key *key, + struct btrfs_key *prev_key, int slot) +{ + if (btrfs_header_owner(leaf) != BTRFS_BLOCK_GROUP_TREE_OBJECTID) { + block_group_err(leaf, slot, + "bad tree owner for skinny block group item, have %llu expect %llu", + btrfs_header_owner(leaf), + BTRFS_BLOCK_GROUP_TREE_OBJECTID); + return -EUCLEAN; + } + /* + * We can't do direct key->offset check, but at least we shouldn't + * have anything larger than max chunk size. + * Here we use something way larger than that to ensure we only catch + * obviouly wrong result. + */ + if (key->offset >= SZ_64T) { + block_group_err(leaf, slot, + "too large used num bytes, have %llu expect [0, %llu)", + key->offset, BTRFS_MAX_DATA_CHUNK_SIZE); + return -EUCLEAN; + } + + /* + * There shouldn't be any duplicated skinny bg items + * (same objectid but different offset) + */ + if (slot > 0 && prev_key->objectid == key->objectid) { + block_group_err(leaf, slot, + "duplicated skinny block group items found"); + return -EUCLEAN; + } + return 0; +} + __printf(4, 5) __cold static void chunk_err(const struct extent_buffer *leaf, @@ -1511,6 +1554,9 @@ static int check_leaf_item(struct extent_buffer *leaf, case BTRFS_BLOCK_GROUP_ITEM_KEY: ret = check_block_group_item(leaf, key, slot); break; + case BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY: + ret = check_skinny_block_group_item(leaf, key, prev_key, slot); + break; case BTRFS_CHUNK_ITEM_KEY: chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk); ret = check_leaf_chunk_item(leaf, chunk, key, slot);