From patchwork Tue Jul 7 08:15:28 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 6730991
From: Qu Wenruo
To:
CC:
Subject: [PATCH 7/7] btrfs-progs: mkfs: Cleanup temporary chunk to avoid strange balance behavior.
Date: Tue, 7 Jul 2015 16:15:28 +0800
Message-ID: <1436256928-23812-8-git-send-email-quwenruo@cn.fujitsu.com>
X-Mailer: git-send-email 2.4.4
In-Reply-To: <1436256928-23812-1-git-send-email-quwenruo@cn.fujitsu.com>
References: <1436256928-23812-1-git-send-email-quwenruo@cn.fujitsu.com>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

[BUG]
 # mkfs.btrfs /dev/sdb /dev/sdd -m raid0 -d raid0
 # mount /dev/sdb /mnt/btrfs
 # btrfs balance start /mnt/btrfs
 # btrfs fi df /mnt/btrfs
 Data, single: total=1.00GiB, used=320.00KiB
 System, single: total=32.00MiB, used=16.00KiB
 Metadata, RAID0: total=256.00MiB, used=112.00KiB
 GlobalReserve, single: total=16.00MiB, used=0.00B

Only metadata stays RAID0. Data and system go from RAID0 to single.

[REASON]
The problem is caused by the temporary single chunks.
mkfs always creates single data/metadata/system chunks first and then adds
the devices to the temporary btrfs.

During a balance of all chunks, the data and system chunks are almost
empty, so balance moves their contents into the single chunks and removes
the old RAID0 chunks.

Metadata holds more data and triggers metadata chunk preallocation, so a
new RAID0 chunk is allocated and the old metadata is moved there.
The old RAID0 and single chunks are removed.

[FIX]
Add a new function that cleans up the temporary chunks at the end of the
mkfs routine.
It removes every chunk that is empty and whose profile differs from the
mkfs profile.

So during balance, btrfs always allocates a new chunk that keeps the
profile, rather than moving data into the single chunk.

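For readers without the btrfs-progs tree at hand, the rule described in
[FIX] (empty, and profile different from the one requested at mkfs time)
can be sketched as a standalone predicate. This is only an illustration
under assumptions: every demo_/DEMO_ name below is hypothetical, the flag
bits are stand-ins rather than values taken from the btrfs headers, and
SINGLE is modelled as "no profile bit set".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in flags, NOT copied from btrfs-progs headers. */
#define DEMO_TYPE_DATA		(1ULL << 0)
#define DEMO_TYPE_SYSTEM	(1ULL << 1)
#define DEMO_TYPE_METADATA	(1ULL << 2)
#define DEMO_PROFILE_RAID0	(1ULL << 3)
#define DEMO_TYPE_MASK		(DEMO_TYPE_DATA | DEMO_TYPE_SYSTEM | \
				 DEMO_TYPE_METADATA)
#define DEMO_PROFILE_MASK	(DEMO_PROFILE_RAID0)

/* Hypothetical in-memory view of one chunk (block group). */
struct demo_chunk {
	uint64_t flags;	/* type bits | profile bits, SINGLE = no profile bit */
	uint64_t used;	/* bytes in use */
};

/*
 * Return 1 if the chunk looks like a leftover temporary chunk:
 * it is empty and its profile differs from the profile requested
 * for its type. Return 0 otherwise.
 */
static int demo_is_temp_chunk(const struct demo_chunk *c,
			      uint64_t data_profile, uint64_t meta_profile,
			      uint64_t sys_profile)
{
	uint64_t type, profile, want;

	assert(c != NULL);
	type = c->flags & DEMO_TYPE_MASK;
	profile = c->flags & DEMO_PROFILE_MASK;

	/* A chunk that holds anything is never treated as temporary. */
	if (c->used != 0)
		return 0;

	switch (type) {
	case DEMO_TYPE_DATA:
	case DEMO_TYPE_DATA | DEMO_TYPE_METADATA:	/* mixed block group */
		want = data_profile;
		break;
	case DEMO_TYPE_METADATA:
		want = meta_profile;
		break;
	case DEMO_TYPE_SYSTEM:
		want = sys_profile;
		break;
	default:
		return 0;
	}
	return profile != (want & DEMO_PROFILE_MASK);
}
```

With all three requested profiles set to DEMO_PROFILE_RAID0, an empty
chunk whose flags are just DEMO_TYPE_DATA is reported as temporary, while
an empty DEMO_TYPE_DATA | DEMO_PROFILE_RAID0 chunk is kept.
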
Signed-off-by: Qu Wenruo
---
 mkfs.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)

diff --git a/mkfs.c b/mkfs.c
index b60fc5a..ee8a3cb 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1182,6 +1182,149 @@ static void list_all_devices(struct btrfs_root *root)
 	printf("\n");
 }
 
+static int is_temp_block_group(struct extent_buffer *node,
+			       struct btrfs_block_group_item *bgi,
+			       u64 data_profile, u64 meta_profile,
+			       u64 sys_profile)
+{
+	u64 flag = btrfs_disk_block_group_flags(node, bgi);
+	u64 flag_type = flag & BTRFS_BLOCK_GROUP_TYPE_MASK;
+	u64 flag_profile = flag & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+	u64 used = btrfs_disk_block_group_used(node, bgi);
+
+	/*
+	 * A chunk that meets all of the following conditions is a temp chunk:
+	 * 1) Empty chunk
+	 * Temp chunk is always empty.
+	 *
+	 * 2) Profile mismatches the mkfs profile.
+	 * Temp chunk is always in SINGLE
+	 *
+	 * 3) Size differs from mkfs_alloc
+	 * Special case for SINGLE/SINGLE btrfs.
+	 * In that case, temp data chunk and real data chunk are always empty.
+	 * So we need to use mkfs_alloc to be sure which chunk is the newly
+	 * allocated.
+	 *
+	 * Normally, new chunk size is equal to mkfs one (One chunk)
+	 * If it has multiple chunks, we just refuse to delete any one.
+	 * As they are all single, so no real problem will happen.
+	 * So only use condition 1) and 2) to judge them.
+	 */
+	if (used != 0)
+		return 0;
+	switch (flag_type) {
+	case BTRFS_BLOCK_GROUP_DATA:
+	case BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA:
+		data_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+		if (flag_profile != data_profile)
+			return 1;
+		break;
+	case BTRFS_BLOCK_GROUP_METADATA:
+		meta_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+		if (flag_profile != meta_profile)
+			return 1;
+		break;
+	case BTRFS_BLOCK_GROUP_SYSTEM:
+		sys_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+		if (flag_profile != sys_profile)
+			return 1;
+		break;
+	}
+	return 0;
+}
+
+/* Note: if current is a block group, it will skip it anyway */
+static int next_block_group(struct btrfs_root *root,
+			    struct btrfs_path *path)
+{
+	struct btrfs_key key;
+	int ret = 0;
+
+	while (1) {
+		ret = btrfs_next_item(root, path);
+		if (ret)
+			goto out;
+
+		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+		if (key.type == BTRFS_BLOCK_GROUP_ITEM_KEY)
+			goto out;
+	}
+out:
+	return ret;
+}
+
+/* Free the empty temporary chunks whose profile differs from the mkfs one */
+static int cleanup_temp_chunks(struct btrfs_fs_info *fs_info,
+			       struct mkfs_allocation *alloc,
+			       u64 data_profile, u64 meta_profile,
+			       u64 sys_profile)
+{
+	struct btrfs_trans_handle *trans = NULL;
+	struct btrfs_block_group_item *bgi;
+	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_path *path;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	trans = btrfs_start_transaction(root, 1);
+
+	key.objectid = 0;
+	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+	key.offset = 0;
+
+	while (1) {
+		/*
+		 * as the rest of the loop may modify the tree, we need to
+		 * start a new search each time.
+		 */
+		ret = btrfs_search_slot(trans, root, &key, path, 0, 0);
+		if (ret < 0)
+			goto out;
+
+		btrfs_item_key_to_cpu(path->nodes[0], &found_key,
+				      path->slots[0]);
+		if (found_key.objectid < key.objectid)
+			goto out;
+		if (found_key.type != BTRFS_BLOCK_GROUP_ITEM_KEY) {
+			ret = next_block_group(root, path);
+			if (ret < 0)
+				goto out;
+			if (ret > 0) {
+				ret = 0;
+				goto out;
+			}
+			btrfs_item_key_to_cpu(path->nodes[0], &found_key,
+					      path->slots[0]);
+		}
+
+		bgi = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				     struct btrfs_block_group_item);
+		if (is_temp_block_group(path->nodes[0], bgi,
+					data_profile, meta_profile,
+					sys_profile)) {
+			ret = btrfs_free_block_group(trans, fs_info,
+					found_key.objectid, found_key.offset);
+			if (ret < 0)
+				goto out;
+		}
+		btrfs_release_path(path);
+		key.objectid = found_key.objectid + found_key.offset;
+	}
+out:
+	if (trans)
+		btrfs_commit_transaction(trans, root);
+	btrfs_free_path(path);
+	return ret;
+}
+
 int main(int ac, char **av)
 {
 	char *file;
@@ -1669,6 +1812,12 @@ skip_multidev:
 		ret = make_image(source_dir, root, fd);
 		BUG_ON(ret);
 	}
+	ret = cleanup_temp_chunks(root->fs_info, &allocation, data_profile,
+				  metadata_profile, metadata_profile);
+	if (ret < 0) {
+		fprintf(stderr, "Failed to cleanup temporary chunks\n");
+		goto out;
+	}
 
 	if (verbose) {
 		char features_buf[64];
@@ -1703,6 +1852,7 @@ skip_multidev:
 		list_all_devices(root);
 	}
 
+out:
 	ret = close_ctree(root);
 	BUG_ON(ret);
 	free(label);
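
The scan in cleanup_temp_chunks() restarts its search on every iteration
because freeing a block group modifies the tree, and it resumes from
start + length of the block group it just visited. A minimal standalone
sketch of that pattern follows, using a sorted array instead of the
extent tree. All demo_ names are hypothetical and nothing here calls into
btrfs-progs; the is_temp field stands in for the is_temp_block_group()
check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one block group item, sorted by start. */
struct demo_bg {
	uint64_t start;	/* plays the role of key.objectid */
	uint64_t len;	/* plays the role of key.offset */
	int is_temp;	/* result of the "temporary chunk" check */
};

/* Return the index of the first entry with start >= key, or n if none. */
static size_t demo_search(const struct demo_bg *bgs, size_t n, uint64_t key)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (bgs[i].start >= key)
			break;
	return i;
}

/*
 * Remove every temporary entry and return the new entry count.
 * Because a removal shifts the array, the position is looked up again
 * on each pass instead of keeping an index across the modification.
 */
static size_t demo_cleanup(struct demo_bg *bgs, size_t n)
{
	uint64_t key = 0;

	assert(bgs != NULL || n == 0);
	for (;;) {
		size_t i = demo_search(bgs, n, key);
		uint64_t next;

		if (i == n)
			break;
		/* Remember where to resume before the entry can go away. */
		next = bgs[i].start + bgs[i].len;
		if (bgs[i].is_temp) {
			memmove(&bgs[i], &bgs[i + 1],
				(n - i - 1) * sizeof(*bgs));
			n--;
		}
		key = next;
	}
	return n;
}
```

Computing the resume key before the removal is the part that matters:
once the entry is gone, its start and length can no longer be read back
from the container.
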