From patchwork Thu Oct 1 05:57:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11810901 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A27C139A for ; Thu, 1 Oct 2020 05:57:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 047CE221E7 for ; Thu, 1 Oct 2020 05:57:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="nqnxdwo1" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730476AbgJAF5z (ORCPT ); Thu, 1 Oct 2020 01:57:55 -0400 Received: from mx2.suse.de ([195.135.220.15]:40050 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730385AbgJAF5z (ORCPT ); Thu, 1 Oct 2020 01:57:55 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1601531873; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=r55sHVqQj+jXdCTel7rkLS/jMKiSBEsFR+BROdHTMzs=; b=nqnxdwo1ByM5OxksCLqmKfor/SAj3yzUt+JMO/jhjk+5xLuTPPHxZSg818hi0NqoNtsUrV n+6RmFnsnM+DcPOFaBheDSONuOymR6FtSXiZiHAXvHpwpBQBnld3fP0mVqMOTVCC3/tWri 0kf1R6kohbzi2d7JjWOC3QrsHpNDrOQ= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 7C562B31D for ; Thu, 1 Oct 2020 05:57:53 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 9 01/12] btrfs: block-group: cleanup btrfs_add_block_group_cache() Date: Thu, 1 Oct 2020 13:57:33 +0800 Message-Id: <20201001055744.103261-2-wqu@suse.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: 
<20201001055744.103261-1-wqu@suse.com> References: <20201001055744.103261-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org We can cleanup btrfs_add_block_group_cache() by: - Remove the "btrfs_" prefix Since it's not exported, and only used inside block-group.c - Remove the "_cache" suffix We have renamed struct btrfs_block_group_cache to btrfs_block_group, thus no need to keep the "_cache" suffix. - Sink the btrfs_fs_info parameter Since commit aac0023c2106 ("btrfs: move basic block_group definitions to their own header") we can grab btrfs_fs_info from struct btrfs_block_group directly. Signed-off-by: Qu Wenruo --- fs/btrfs/block-group.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index ea8aaf36647e..585843d39e06 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -150,9 +150,9 @@ void btrfs_put_block_group(struct btrfs_block_group *cache) /* * This adds the block group to the fs_info rb tree for the block group cache */ -static int btrfs_add_block_group_cache(struct btrfs_fs_info *info, - struct btrfs_block_group *block_group) +static int add_block_group(struct btrfs_block_group *block_group) { + struct btrfs_fs_info *info = block_group->fs_info; struct rb_node **p; struct rb_node *parent = NULL; struct btrfs_block_group *cache; @@ -1966,7 +1966,7 @@ static int read_one_block_group(struct btrfs_fs_info *info, btrfs_free_excluded_extents(cache); } - ret = btrfs_add_block_group_cache(info, cache); + ret = add_block_group(cache); if (ret) { btrfs_remove_free_space_cache(cache); goto error; @@ -2167,7 +2167,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, cache->space_info = btrfs_find_space_info(fs_info, cache->flags); ASSERT(cache->space_info); - ret = btrfs_add_block_group_cache(fs_info, cache); + ret = add_block_group(cache); if (ret) { btrfs_remove_free_space_cache(cache); 
 		btrfs_put_block_group(cache);

From patchwork Thu Oct 1 05:57:34 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11810903
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6D559174A
	for ; Thu, 1 Oct 2020 05:57:59 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 3B806221E7
	for ; Thu, 1 Oct 2020 05:57:59 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key)
	header.d=suse.com header.i=@suse.com header.b="a4G3eVf3"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1730501AbgJAF55 (ORCPT ); Thu, 1 Oct 2020 01:57:57 -0400
Received: from mx2.suse.de ([195.135.220.15]:40070 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1730498AbgJAF55 (ORCPT );
	Thu, 1 Oct 2020 01:57:57 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1;
	t=1601531875;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	to:to:cc:mime-version:mime-version:
	content-transfer-encoding:content-transfer-encoding:
	in-reply-to:in-reply-to:references:references;
	bh=VnFZbVw8sNN840vtB6i++jsztgAWLo+nfEWHXTrf8wc=;
	b=a4G3eVf3gzVuG3ifvvFempkjNxCOcqmqAsBKKjvVr0BHADs6uvoKMGe4h58KYiF8tpvS9H
	iANHfw0DjOSqv0sc1pAzLnT2nYxpnyvnN4cmAkZiN/pI/SyMZyRS4m1ABXxKgYDHpmvf11
	MXn33xBiWPYJyh0AyJ0smnlWPtH1j5k=
Received: from relay2.suse.de (unknown [195.135.221.27])
	by mx2.suse.de (Postfix) with ESMTP id 61BB5B328
	for ; Thu, 1 Oct 2020 05:57:55 +0000 (UTC)
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 9 02/12] btrfs: block-group: extract the code to delete block group from fs_info rb tree
Date: Thu, 1 Oct 2020 13:57:34 +0800
Message-Id:
 <20201001055744.103261-3-wqu@suse.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20201001055744.103261-1-wqu@suse.com>
References: <20201001055744.103261-1-wqu@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

Extract the common code into a new function, del_block_group(), which
deletes a block group from the fs_info rb tree.

The function removes the block group from the rb tree and updates the
logical bytenr hint in fs_info.

There is only one caller for now, btrfs_remove_block_group().

Signed-off-by: Qu Wenruo
---
 fs/btrfs/block-group.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 585843d39e06..831855c85419 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -187,6 +187,21 @@ static int add_block_group(struct btrfs_block_group *block_group)
 	return 0;
 }
 
+/* This removes block group from fs_info rb tree */
+static void del_block_group(struct btrfs_block_group *block_group)
+{
+	struct btrfs_fs_info *fs_info = block_group->fs_info;
+
+	spin_lock(&fs_info->block_group_cache_lock);
+	rb_erase(&block_group->cache_node,
+		 &fs_info->block_group_cache_tree);
+	RB_CLEAR_NODE(&block_group->cache_node);
+
+	if (fs_info->first_logical_byte == block_group->start)
+		fs_info->first_logical_byte = (u64)-1;
+	spin_unlock(&fs_info->block_group_cache_lock);
+}
+
 /*
  * This will return the block group at or after bytenr if contains is 0, else
  * it will return the block group that contains the bytenr
@@ -1008,18 +1023,10 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 		btrfs_release_path(path);
 	}
 
-	spin_lock(&fs_info->block_group_cache_lock);
-	rb_erase(&block_group->cache_node,
-		 &fs_info->block_group_cache_tree);
-	RB_CLEAR_NODE(&block_group->cache_node);
-
+	del_block_group(block_group);
 	/* Once for the block groups rbtree */
 	btrfs_put_block_group(block_group);
 
-	if (fs_info->first_logical_byte == block_group->start)
-
fs_info->first_logical_byte = (u64)-1; - spin_unlock(&fs_info->block_group_cache_lock); - down_write(&block_group->space_info->groups_sem); /* * we must use list_del_init so people can check to see if they From patchwork Thu Oct 1 05:57:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11810905 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C4274139A for ; Thu, 1 Oct 2020 05:58:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A09F0221E7 for ; Thu, 1 Oct 2020 05:58:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="QpIMO+Rv" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730560AbgJAF56 (ORCPT ); Thu, 1 Oct 2020 01:57:58 -0400 Received: from mx2.suse.de ([195.135.220.15]:40098 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730534AbgJAF56 (ORCPT ); Thu, 1 Oct 2020 01:57:58 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1601531877; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MEoxUABMqVRLswpNAFRpcjJgbPjxSsANud/cpgqWsUI=; b=QpIMO+RvuI3/6vHCjO3V98pDk5TjLuHxWfpX6fAX3T1Cs5spzz/xDFeJ9TDhNaabHyokvN CPUed8is+limE56PiU87/A5tRSU9TJ9JyXqU8oW1NOvkvEsPVK6rr8E1+bq0UO4jCOpU63 REgK+ZnKtQdM33tFfdUkI8hiDAvsLwY= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 43DDFB31D for ; Thu, 1 Oct 2020 05:57:57 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: 
[PATCH 9 03/12] btrfs: block-group: make link_block_group() to handle avail alloc bits Date: Thu, 1 Oct 2020 13:57:35 +0800 Message-Id: <20201001055744.103261-4-wqu@suse.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20201001055744.103261-1-wqu@suse.com> References: <20201001055744.103261-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org When we call link_block_group(), we also call set_avail_alloc_bits() after that. Thus we can merge the set_avail_alloc_bits() into link_block_group(). Signed-off-by: Qu Wenruo --- fs/btrfs/block-group.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 831855c85419..cb6be9a3d1dc 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1771,6 +1771,7 @@ static int exclude_super_stripes(struct btrfs_block_group *cache) static void link_block_group(struct btrfs_block_group *cache) { + struct btrfs_fs_info *fs_info = cache->fs_info; struct btrfs_space_info *space_info = cache->space_info; int index = btrfs_bg_flags_to_raid_index(cache->flags); bool first = false; @@ -1783,6 +1784,8 @@ static void link_block_group(struct btrfs_block_group *cache) if (first) btrfs_sysfs_add_block_group_type(cache); + + set_avail_alloc_bits(fs_info, cache->flags); } static struct btrfs_block_group *btrfs_create_block_group_cache( @@ -1986,7 +1989,6 @@ static int read_one_block_group(struct btrfs_fs_info *info, link_block_group(cache); - set_avail_alloc_bits(info, cache->flags); if (btrfs_chunk_readonly(info, cache->start)) { inc_block_group_ro(cache, 1); } else if (cache->used == 0) { @@ -2196,7 +2198,6 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, trans->delayed_ref_updates++; btrfs_update_delayed_refs_rsv(trans); - set_avail_alloc_bits(fs_info, type); return 0; } From patchwork Thu Oct 1 05:57:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11810907 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4D4CA112C for ; Thu, 1 Oct 2020 05:58:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 29F59221EC for ; Thu, 1 Oct 2020 05:58:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="hbyowyEY" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730660AbgJAF6C (ORCPT ); Thu, 1 Oct 2020 01:58:02 -0400 Received: from mx2.suse.de ([195.135.220.15]:40152 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730534AbgJAF6C (ORCPT ); Thu, 1 Oct 2020 01:58:02 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1601531880; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Igp57LS3VRplvrf/mkOyjiCoHTwtGZu7czGgMCdpLXg=; b=hbyowyEYdfBCoSjuLqFnj4RX44+WcQjrP6vpWSkX6XRiTzBuoMpqUY7oUV2m/XNLkdtgkw jzMuLanbetNkEG4rvQdEtSyQWHt0js/02NmyIKyC8R58mmo1s7z40BPIS1iXZ/1+cGDhbn oh9IH1cXsLL9zF5o8JpqiLe5M6S3FyY= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id BAB0CB32D for ; Thu, 1 Oct 2020 05:58:00 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 9 04/12] btrfs: block-group: extract the code to unlink block group from space info Date: Thu, 1 Oct 2020 13:57:36 +0800 Message-Id: <20201001055744.103261-5-wqu@suse.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20201001055744.103261-1-wqu@suse.com> References: 
<20201001055744.103261-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Introduce a new helper, unlink_block_group(), to unlink a block group from space info. The function will remove the block group from space info, and cleanup the kobject if that block group is the last one of the space info. There are two callers, btrfs_free_block_groups() and btrfs_remove_block_group() for now. Signed-off-by: Qu Wenruo --- fs/btrfs/block-group.c | 50 +++++++++++++++++++++++------------------- 1 file changed, 27 insertions(+), 23 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index cb6be9a3d1dc..262805b96b9b 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -900,6 +900,31 @@ static int remove_block_group_item(struct btrfs_trans_handle *trans, return ret; } +static void unlink_block_group(struct btrfs_block_group *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct kobject *kobj = NULL; + int index = btrfs_bg_flags_to_raid_index(cache->flags); + + down_write(&cache->space_info->groups_sem); + /* + * we must use list_del_init so people can check to see if they + * are still on the list after taking the semaphore + */ + list_del_init(&cache->list); + if (list_empty(&cache->space_info->block_groups[index])) { + kobj = cache->space_info->block_group_kobjs[index]; + cache->space_info->block_group_kobjs[index] = NULL; + clear_avail_alloc_bits(fs_info, cache->flags); + } + up_write(&cache->space_info->groups_sem); + clear_incompat_bg_bits(fs_info, cache->flags); + if (kobj) { + kobject_del(kobj); + kobject_put(kobj); + } +} + int btrfs_remove_block_group(struct btrfs_trans_handle *trans, u64 group_start, struct extent_map *em) { @@ -910,9 +935,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, struct btrfs_root *tree_root = fs_info->tree_root; struct btrfs_key key; struct inode *inode; - struct kobject *kobj = NULL; int ret; - int index; int factor; 
struct btrfs_caching_control *caching_ctl = NULL; bool remove_em; @@ -931,7 +954,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, btrfs_free_ref_tree_range(fs_info, block_group->start, block_group->length); - index = btrfs_bg_flags_to_raid_index(block_group->flags); factor = btrfs_bg_type_to_factor(block_group->flags); /* make sure this block group isn't part of an allocation cluster */ @@ -1027,23 +1049,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, /* Once for the block groups rbtree */ btrfs_put_block_group(block_group); - down_write(&block_group->space_info->groups_sem); - /* - * we must use list_del_init so people can check to see if they - * are still on the list after taking the semaphore - */ - list_del_init(&block_group->list); - if (list_empty(&block_group->space_info->block_groups[index])) { - kobj = block_group->space_info->block_group_kobjs[index]; - block_group->space_info->block_group_kobjs[index] = NULL; - clear_avail_alloc_bits(fs_info, block_group->flags); - } - up_write(&block_group->space_info->groups_sem); - clear_incompat_bg_bits(fs_info, block_group->flags); - if (kobj) { - kobject_del(kobj); - kobject_put(kobj); - } + unlink_block_group(block_group); if (block_group->has_caching_ctl) caching_ctl = btrfs_get_caching_control(block_group); @@ -3322,9 +3328,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) RB_CLEAR_NODE(&block_group->cache_node); spin_unlock(&info->block_group_cache_lock); - down_write(&block_group->space_info->groups_sem); - list_del(&block_group->list); - up_write(&block_group->space_info->groups_sem); + unlink_block_group(block_group); /* * We haven't cached this block group, which means we could From patchwork Thu Oct 1 05:57:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11810909 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org 
[172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D28BD112C for ; Thu, 1 Oct 2020 05:58:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A97A6221EC for ; Thu, 1 Oct 2020 05:58:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="gvL1yRzm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730679AbgJAF6F (ORCPT ); Thu, 1 Oct 2020 01:58:05 -0400 Received: from mx2.suse.de ([195.135.220.15]:40216 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730534AbgJAF6F (ORCPT ); Thu, 1 Oct 2020 01:58:05 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1601531883; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LK7eHJRxARkK6h4xsTdp8OjW14GAmuDTVrMBlQ+ahRI=; b=gvL1yRzmhgBzjwG5VpOZuo6ExVj55xcQUOCi+WfXRsKfAGQnvlGXjkJbK9ICaT3WZ4Q/ly 3GqF2yxrdQMMifj0BbWF98bzCSWn1cYJG9g49YBBwh9Mk7ldkXr+CdLpmGra1CY2391AKM 52iC6ZbM4tdUPZUEhpp+W7zChU28LMw= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 87081B31D for ; Thu, 1 Oct 2020 05:58:03 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 9 05/12] btrfs: space-info: update btrfs_update_space_info() to handle block group removal Date: Thu, 1 Oct 2020 13:57:37 +0800 Message-Id: <20201001055744.103261-6-wqu@suse.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20201001055744.103261-1-wqu@suse.com> References: <20201001055744.103261-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Update btrfs_update_space_info() to handle block group removal, by 
adding a new parameter, @add, to indicate whether we're adding or removing
a block group.

This allows btrfs_remove_block_group() to call btrfs_update_space_info()
instead of updating the counters manually.

While we're here, also sink the parameters: since btrfs_update_space_info()
is always called with values extracted from a block group, just pass the
btrfs_block_group in directly.
This also removes the btrfs_fs_info parameter.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/block-group.c | 23 +++----------------
 fs/btrfs/space-info.c  | 50 +++++++++++++++++++++++++++++-------------
 fs/btrfs/space-info.h  |  4 +---
 3 files changed, 39 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 262805b96b9b..bbe3c4cd28d8 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1085,22 +1085,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 	btrfs_remove_free_space_cache(block_group);
 
-	spin_lock(&block_group->space_info->lock);
-	list_del_init(&block_group->ro_list);
-
-	if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
-		WARN_ON(block_group->space_info->total_bytes
-			< block_group->length);
-		WARN_ON(block_group->space_info->bytes_readonly
-			< block_group->length);
-		WARN_ON(block_group->space_info->disk_total
-			< block_group->length * factor);
-	}
-	block_group->space_info->total_bytes -= block_group->length;
-	block_group->space_info->bytes_readonly -= block_group->length;
-	block_group->space_info->disk_total -= block_group->length * factor;
-
-	spin_unlock(&block_group->space_info->lock);
+	btrfs_update_space_info(block_group, false, NULL);
 
 	/*
 	 * Remove the free space for the block group from the free space tree
@@ -1988,8 +1973,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 		goto error;
 	}
 	trace_btrfs_add_block_group(info, cache, 0);
-	btrfs_update_space_info(info, cache->flags, cache->length,
-				cache->used, cache->bytes_super, &space_info);
+	btrfs_update_space_info(cache, true, &space_info);
cache->space_info = space_info; @@ -2194,8 +2178,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, * the rbtree, update the space info's counters. */ trace_btrfs_add_block_group(fs_info, cache, 1); - btrfs_update_space_info(fs_info, cache->flags, size, bytes_used, - cache->bytes_super, &cache->space_info); + btrfs_update_space_info(cache, true, &cache->space_info); btrfs_update_global_block_rsv(fs_info); link_block_group(cache); diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index 475968ccbd1d..c86baa331612 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -257,29 +257,49 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info) return ret; } -void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, - u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, +void btrfs_update_space_info(struct btrfs_block_group *bg, bool add, struct btrfs_space_info **space_info) { + struct btrfs_fs_info *info = bg->fs_info; struct btrfs_space_info *found; int factor; - factor = btrfs_bg_type_to_factor(flags); - - found = btrfs_find_space_info(info, flags); + factor = btrfs_bg_type_to_factor(bg->flags); + found = btrfs_find_space_info(info, bg->flags); ASSERT(found); spin_lock(&found->lock); - found->total_bytes += total_bytes; - found->disk_total += total_bytes * factor; - found->bytes_used += bytes_used; - found->disk_used += bytes_used * factor; - found->bytes_readonly += bytes_readonly; - if (total_bytes > 0) - found->full = 0; - btrfs_try_granting_tickets(info, found); + if (add) { + found->total_bytes += bg->length; + found->disk_total += bg->length * factor; + found->bytes_used += bg->used; + found->disk_used += bg->used * factor; + found->bytes_readonly += bg->bytes_super; + if (bg->length > 0) + found->full = 0; + btrfs_try_granting_tickets(info, found); + } else { + /* The block group to be removed should be empty */ + WARN_ON(bg->used || !bg->ro); + + /* For removal, we need more overflow check */ + 
if (btrfs_test_opt(info, ENOSPC_DEBUG)) { + WARN_ON(found->total_bytes < bg->length); + WARN_ON(found->bytes_readonly < bg->length); + WARN_ON(found->disk_total < bg->length * factor); + } + found->total_bytes -= bg ->length; + found->bytes_readonly -= bg->length; + found->disk_total -= bg->length * factor; + + /* + * Also remove the block group from ro list since we're + * delete it from the space info accounting. + */ + list_del_init(&bg->ro_list); + } spin_unlock(&found->lock); - *space_info = found; + if (space_info) + *space_info = found; } struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info, diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h index c3c64019950a..3b5081511d7a 100644 --- a/fs/btrfs/space-info.h +++ b/fs/btrfs/space-info.h @@ -117,9 +117,7 @@ DECLARE_SPACE_INFO_UPDATE(bytes_may_use, "space_info"); DECLARE_SPACE_INFO_UPDATE(bytes_pinned, "pinned"); int btrfs_init_space_info(struct btrfs_fs_info *fs_info); -void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, - u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, +void btrfs_update_space_info(struct btrfs_block_group *bg, bool add, struct btrfs_space_info **space_info); struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info, u64 flags); From patchwork Thu Oct 1 05:57:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11810911 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C0269112C for ; Thu, 1 Oct 2020 05:58:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A3FCE221EC for ; Thu, 1 Oct 2020 05:58:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="cO5gLgRU" Received: 
 (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1730723AbgJAF6H (ORCPT ); Thu, 1 Oct 2020 01:58:07 -0400
Received: from mx2.suse.de ([195.135.220.15]:40244 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1730534AbgJAF6H (ORCPT );
	Thu, 1 Oct 2020 01:58:07 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1;
	t=1601531885;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	to:to:cc:mime-version:mime-version:
	content-transfer-encoding:content-transfer-encoding:
	in-reply-to:in-reply-to:references:references;
	bh=6bRolufmTc+LjcKcO4eEj82qYiqTWj/g8cfI+brliHo=;
	b=cO5gLgRUK4rXHW3bINGKXDfrW4yR8t8lzQBArUo/YZ/5m5tnjStO3DmpLvEqNQ9PoBlm5D
	jZf/x+emUXSK/2GUeCAUkDfFgrY+JalLBWmo0LlKRsIUKOmcOIM/+RW+OJ8Rz59+FIbAY0
	Ff/HCUKaNDp4yVBhOCxH/fJzmv3McFY=
Received: from relay2.suse.de (unknown [195.135.221.27])
	by mx2.suse.de (Postfix) with ESMTP id C08C1B320
	for ; Thu, 1 Oct 2020 05:58:05 +0000 (UTC)
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 9 06/12] btrfs: block-group: introduce btrfs_revert_block_group()
Date: Thu, 1 Oct 2020 13:57:38 +0800
Message-Id: <20201001055744.103261-7-wqu@suse.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20201001055744.103261-1-wqu@suse.com>
References: <20201001055744.103261-1-wqu@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

This patch introduces a new function, btrfs_revert_block_group(), to
revert a newly created but not yet finished block group.

It is meant for error handling, where btrfs_make_block_group() has just
been called but a later step failed.

Signed-off-by: Qu Wenruo --- fs/btrfs/block-group.c | 33 +++++++++++++++++++++++++++++++++ fs/btrfs/block-group.h | 1 + fs/btrfs/space-info.c | 12 +++++++++--- 3 files changed, 43 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index bbe3c4cd28d8..dc70d3581bf0 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -2190,6 +2190,39 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, return 0; } +/* + * This is a function to revert the newly created block group, mostly for error + * handling. + * + * Unlike btrfs_remove_block_group(), since the new block group hasn't + * finished creating, it's much easier to remove it. + */ +void btrfs_revert_block_group(struct btrfs_trans_handle *trans, u64 bytenr) +{ + struct btrfs_block_group *bg; + + bg = btrfs_lookup_block_group(trans->fs_info, bytenr); + + if (!bg) + return; + trace_btrfs_remove_block_group(bg); + + btrfs_update_space_info(bg, false, NULL); + unlink_block_group(bg); + + btrfs_delayed_refs_rsv_release(trans->fs_info, 1); + list_del_init(&bg->bg_list); + + del_block_group(bg); + + /* One for the lookup reference */ + btrfs_put_block_group(bg); + + /* Finally free the last reference */ + WARN_ON(refcount_read(&bg->refs) != 1); + btrfs_put_block_group(bg); +} + /* * Mark one block group RO, can be called several times for the same block * group. 
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index adfd7583a17b..619ca97254fb 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -248,6 +248,7 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg); int btrfs_read_block_groups(struct btrfs_fs_info *info); int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, u64 type, u64 chunk_offset, u64 size); +void btrfs_revert_block_group(struct btrfs_trans_handle *trans, u64 bytenr); void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans); int btrfs_inc_block_group_ro(struct btrfs_block_group *cache, bool do_chunk_alloc); diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index c86baa331612..64b6e1d44f47 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -278,8 +278,14 @@ void btrfs_update_space_info(struct btrfs_block_group *bg, bool add, found->full = 0; btrfs_try_granting_tickets(info, found); } else { + /* We get called for either removing an unused bg, or a newly + * created bg. + * Use their ro bit to determine which the case is. + */ + bool ro = bg->ro; + /* The block group to be removed should be empty */ - WARN_ON(bg->used || !bg->ro); + WARN_ON(bg->used); /* For removal, we need more overflow check */ if (btrfs_test_opt(info, ENOSPC_DEBUG)) { @@ -288,9 +294,9 @@ void btrfs_update_space_info(struct btrfs_block_group *bg, bool add, WARN_ON(found->disk_total < bg->length * factor); } found->total_bytes -= bg ->length; - found->bytes_readonly -= bg->length; found->disk_total -= bg->length * factor; - + if (ro) + found->bytes_readonly -= bg->length; /* * Also remove the block group from ro list since we're * delete it from the space info accounting. 
From patchwork Thu Oct 1 05:57:39 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11810913
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 9 07/12] btrfs: volumes: introduce the device layout aware per-profile available space infrastructure
Date: Thu, 1 Oct 2020 13:57:39 +0800
Message-Id: <20201001055744.103261-8-wqu@suse.com>
In-Reply-To: <20201001055744.103261-1-wqu@suse.com>
References: <20201001055744.103261-1-wqu@suse.com>

[PROBLEM]
There are some locations in btrfs requiring an accurate estimation of how
many new bytes can be allocated from unallocated space.

We have two types of estimation:
- Factor based calculation
  Just use all unallocated space, divided by the profile factor.
  One obvious user is can_overcommit().

- Chunk allocator like calculation
  This emulates the chunk allocator behavior to get a proper estimation.
  The only user is btrfs_calc_avail_data_space(), utilized by
  btrfs_statfs().
  The problem is that this function is not generic enough; it can't
  handle things like RAID5/6.

The current factor based calculation can't handle the following case:

  devid 1 unallocated:	1T
  devid 2 unallocated:	10T
  metadata type:	RAID1

Using the factor, we would assume (1T + 10T) / 2 = 5.5T free space for
metadata.
But in fact we can only get 1T free space, as RAID1 is limited by the
smallest device.

[SOLUTION]
This patch introduces a per-profile available space calculation, which
gives an estimation based on chunk-allocator-like behavior.

The difference between it and the chunk allocator is mostly in rounding
and [0, 1M) reserved space handling, which shouldn't have any practical
impact.

The newly introduced per-profile available space calculation will
calculate the available space for each type, using a chunk-allocator-like
calculation.

With that facility, for the above device layout we get the full available
space array:

  RAID10:	0  (not enough devices)
  RAID1:	1T
  RAID1C3:	0  (not enough devices)
  RAID1C4:	0  (not enough devices)
  DUP:		5.5T
  RAID0:	2T
  SINGLE:	11T
  RAID5:	1T
  RAID6:	0  (not enough devices)

Or for a more complex example:

  devid 1 unallocated:	1T
  devid 2 unallocated:	1T
  devid 3 unallocated:	10T

We will get an array of:

  RAID10:	0  (not enough devices)
  RAID1:	2T
  RAID1C3:	1T
  RAID1C4:	0  (not enough devices)
  DUP:		6T
  RAID0:	3T
  SINGLE:	12T
  RAID5:	2T
  RAID6:	0  (not enough devices)

For each profile, we do a chunk allocator level calculation.
The pseudo code looks like:

  clear_virtual_used_space_of_all_rw_devices();
  do {
  	/*
  	 * Same as the chunk allocator, but besides used space we also
  	 * take virtual used space into consideration.
  	 */
  	sort_device_with_virtual_free_space();

  	/*
  	 * Unlike the chunk allocator, we don't need to bother with
  	 * hole/stripe size, so we use the smallest selected device to
  	 * make sure we can allocate as many stripes as the regular
  	 * chunk allocator.
  	 */
  	stripe_size = device_with_smallest_free->avail_space;
  	stripe_size = min(stripe_size, to_alloc / ndevs);

  	/*
  	 * Allocate a virtual chunk. An allocated virtual chunk will
  	 * increase virtual used space, allowing the next iteration to
  	 * properly emulate chunk allocator behavior.
  	 */
  	ret = alloc_virtual_chunk(stripe_size, &allocated_size);
  	if (ret == 0)
  		avail += allocated_size;
  } while (ret == 0);

As devices are sorted by free space and the stripe size is taken from the
smallest selected device, the device with the most space will be the first
to be utilized, just like in the chunk allocator.
For the above 1T + 10T layout, we will allocate a 1T virtual chunk in the
first iteration, then run out of devices in the next iteration.

Thus we only get 1T free space for the RAID1 type, just like what the chunk
allocator would do.

This patch just introduces the infrastructure, no hooks are executed yet.

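As a rough, userspace-only illustration of the loop described above (not the kernel code in this patch), the following C sketch emulates the virtual chunk allocation for a list of unallocated sizes and compares it with the factor based estimate. struct raid_profile, the helper names and the RAID1/RAID5 attribute values are assumptions made for this example; the [0, 1M) reserved range and sector rounding are deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TIB		(1ULL << 40)
#define MAX_DEVS	16

/* Hypothetical, simplified stand-in for struct btrfs_raid_attr */
struct raid_profile {
	int devs_min;		/* minimum devices needed for one chunk */
	int devs_max;		/* maximum devices per chunk, 0 = unlimited */
	int devs_increment;	/* device count must be a multiple of this */
	int ncopies;		/* number of copies of each byte */
	int nparity;		/* parity stripes per chunk */
};

/* Attribute values assumed for this example */
const struct raid_profile raid1_profile = { 2, 2, 2, 2, 0 };
const struct raid_profile raid5_profile = { 2, 0, 1, 1, 1 };

/* Sort free space in descending order */
static int cmp_free_desc(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x < y) - (x > y);
}

/*
 * Emulate one virtual chunk allocation and update free_space[] in place.
 * Returns the usable bytes of the chunk, or 0 if nothing can be allocated.
 */
static uint64_t alloc_virtual_chunk(uint64_t *free_space, int total_devs,
				    const struct raid_profile *p)
{
	uint64_t stripe_size;
	int ndevs = 0;
	int i;

	/* Devices with the most free space come first */
	qsort(free_space, total_devs, sizeof(*free_space), cmp_free_desc);
	while (ndevs < total_devs && free_space[ndevs] > 0)
		ndevs++;

	ndevs -= ndevs % p->devs_increment;
	if (ndevs < p->devs_min)
		return 0;
	if (p->devs_max && ndevs > p->devs_max)
		ndevs = p->devs_max;

	/* The smallest selected device limits the stripe size */
	stripe_size = free_space[ndevs - 1];
	for (i = 0; i < ndevs; i++)
		free_space[i] -= stripe_size;

	return stripe_size * (ndevs - p->nparity) / p->ncopies;
}

/* Sum up virtual chunks until no more can be allocated */
uint64_t profile_avail(const uint64_t *unallocated, int ndevs,
		       const struct raid_profile *p)
{
	uint64_t free_space[MAX_DEVS];
	uint64_t total = 0;
	uint64_t got;

	assert(ndevs >= 0 && ndevs <= MAX_DEVS);
	assert(p->devs_min >= 1 && p->devs_increment >= 1 && p->ncopies >= 1);
	memcpy(free_space, unallocated, ndevs * sizeof(*free_space));

	while ((got = alloc_virtual_chunk(free_space, ndevs, p)) != 0)
		total += got;
	return total;
}

/* The factor based estimate: all unallocated space divided by the factor */
uint64_t factor_avail(const uint64_t *unallocated, int ndevs, int factor)
{
	uint64_t sum = 0;
	int i;

	for (i = 0; i < ndevs; i++)
		sum += unallocated[i];
	return sum / factor;
}
```

Under these assumptions, profile_avail() on the 1T + 10T layout with raid1_profile yields 1T, while factor_avail() with a factor of 2 yields the misleading 5.5T.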
Signed-off-by: Qu Wenruo
---
 fs/btrfs/volumes.c | 181 ++++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/volumes.h |  10 +++
 2 files changed, 172 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 214856c4ccb1..28636cf01190 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2038,6 +2038,168 @@ static void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
 	update_dev_time(device_path);
 }

+/*
+ * sort the devices in descending order by max_avail, total_avail
+ */
+static int btrfs_cmp_device_info(const void *a, const void *b)
+{
+	const struct btrfs_device_info *di_a = a;
+	const struct btrfs_device_info *di_b = b;
+
+	if (di_a->max_avail > di_b->max_avail)
+		return -1;
+	if (di_a->max_avail < di_b->max_avail)
+		return 1;
+	if (di_a->total_avail > di_b->total_avail)
+		return -1;
+	if (di_a->total_avail < di_b->total_avail)
+		return 1;
+	return 0;
+}
+
+/*
+ * Return 0 if we allocated any balloon(*) chunk, and store the size in
+ * @allocated (the last parameter).
+ * Return -ENOSPC if we have no more space to allocate a virtual chunk.
+ *
+ * *: Balloon chunks are placeholders for the per-profile available space
+ *    allocator. They won't really take on-disk space, but only emulate chunk
+ *    allocator behavior to get an accurate estimation of available space.
+ */
+static int alloc_virtual_chunk(struct btrfs_fs_info *fs_info,
+			       struct btrfs_device_info *devices_info,
+			       enum btrfs_raid_types type,
+			       u64 *allocated)
+{
+	const struct btrfs_raid_attr *raid_attr = &btrfs_raid_array[type];
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	u64 stripe_size;
+	int i;
+	int ndevs = 0;
+
+	lockdep_assert_held(&fs_info->chunk_mutex);
+
+	/* Go through devices to collect their unallocated space */
+	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
+		u64 avail;
+		if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
+					&device->dev_state) ||
+		    test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state))
+			continue;
+
+		if (device->total_bytes > device->bytes_used +
+				device->ballon_allocated)
+			avail = device->total_bytes - device->bytes_used -
+				device->ballon_allocated;
+		else
+			avail = 0;
+
+		/* And exclude the [0, 1M) reserved space */
+		if (avail > SZ_1M)
+			avail -= SZ_1M;
+		else
+			avail = 0;
+
+		if (avail < fs_info->sectorsize)
+			continue;
+		/*
+		 * Unlike chunk allocator, we don't care about stripe or hole
+		 * size, so here we use @avail directly
+		 */
+		devices_info[ndevs].dev_offset = 0;
+		devices_info[ndevs].total_avail = avail;
+		devices_info[ndevs].max_avail = avail;
+		devices_info[ndevs].dev = device;
+		++ndevs;
+	}
+	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
+	     btrfs_cmp_device_info, NULL);
+	ndevs = rounddown(ndevs, raid_attr->devs_increment);
+	if (ndevs < raid_attr->devs_min)
+		return -ENOSPC;
+	if (raid_attr->devs_max)
+		ndevs = min(ndevs, (int)raid_attr->devs_max);
+	else
+		ndevs = min(ndevs, (int)BTRFS_MAX_DEVS(fs_info));
+
+	/*
+	 * Now allocate a virtual chunk using the unallocated space of the
+	 * device with the least unallocated space.
+	 */
+	stripe_size = round_down(devices_info[ndevs - 1].total_avail,
+				 fs_info->sectorsize);
+	for (i = 0; i < ndevs; i++)
+		devices_info[i].dev->ballon_allocated += stripe_size;
+	*allocated = stripe_size * (ndevs - raid_attr->nparity) /
+		     raid_attr->ncopies;
+	return 0;
+}
+
+static int calc_one_profile_avail(struct btrfs_fs_info *fs_info,
+				  enum btrfs_raid_types type,
+				  u64 *result_ret)
+{
+	struct btrfs_device_info *devices_info = NULL;
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	u64 allocated;
+	u64 result = 0;
+	int ret = 0;
+
+	lockdep_assert_held(&fs_info->chunk_mutex);
+	ASSERT(type >= 0 && type < BTRFS_NR_RAID_TYPES);
+
+	/* Not enough devices, quick exit, just update the result */
+	if (fs_devices->rw_devices < btrfs_raid_array[type].devs_min)
+		goto out;
+
+	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
+			       GFP_NOFS);
+	if (!devices_info) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	/* Clear virtual chunk used space for each device */
+	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list)
+		device->ballon_allocated = 0;
+
+	while (!alloc_virtual_chunk(fs_info, devices_info, type, &allocated))
+		result += allocated;
+
+out:
+	kfree(devices_info);
+	if (ret < 0 && ret != -ENOSPC)
+		return ret;
+	*result_ret = result;
+	return 0;
+}
+
+/*
+ * Update the per-profile available space array.
+ *
+ * Return 0 if we succeeded updating the array.
+ * Return <0 if something went wrong (ENOMEM), and the array is not
+ * updated.
+ */
+int btrfs_update_per_profile_avail(struct btrfs_fs_info *fs_info)
+{
+	u64 results[BTRFS_NR_RAID_TYPES];
+	int i;
+	int ret;
+
+	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
+		ret = calc_one_profile_avail(fs_info, i, &results[i]);
+		if (ret < 0)
+			return ret;
+	}
+
+	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
+		atomic64_set(&fs_info->fs_devices->per_profile_avail[i],
+			     results[i]);
+	return ret;
+}
+
 int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 		    u64 devid)
 {
@@ -4785,25 +4947,6 @@ static int btrfs_add_system_chunk(struct btrfs_fs_info *fs_info,
 	return 0;
 }

-/*
- * sort the devices in descending order by max_avail, total_avail
- */
-static int btrfs_cmp_device_info(const void *a, const void *b)
-{
-	const struct btrfs_device_info *di_a = a;
-	const struct btrfs_device_info *di_b = b;
-
-	if (di_a->max_avail > di_b->max_avail)
-		return -1;
-	if (di_a->max_avail < di_b->max_avail)
-		return 1;
-	if (di_a->total_avail > di_b->total_avail)
-		return -1;
-	if (di_a->total_avail < di_b->total_avail)
-		return 1;
-	return 0;
-}
-
 static void check_raid56_incompat_flag(struct btrfs_fs_info *info, u64 type)
 {
 	if (!(type & BTRFS_BLOCK_GROUP_RAID56_MASK))
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 5eea93916fbf..cd213c5e16cf 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -138,6 +138,12 @@ struct btrfs_device {
 	struct completion kobj_unregister;
 	/* For sysfs/FSID/devinfo/devid/ */
 	struct kobject devid_kobj;
+
+	/*
+	 * The balloon allocated space, to emulate chunk allocator to get
+	 * an estimation on available space.
+	 */
+	u64 ballon_allocated;
 };

 /*
@@ -264,6 +270,9 @@ struct btrfs_fs_devices {
 	struct completion kobj_unregister;

 	enum btrfs_chunk_allocation_policy chunk_alloc_policy;
+
+	/* Records accurate per-type available space */
+	atomic64_t per_profile_avail[BTRFS_NR_RAID_TYPES];
 };

 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
@@ -577,5 +586,6 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
 int btrfs_bg_type_to_factor(u64 flags);
 const char *btrfs_bg_type_to_raid_name(u64 flags);
 int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
+int btrfs_update_per_profile_avail(struct btrfs_fs_info *fs_info);

 #endif

From patchwork Thu Oct 1 05:57:40 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11810915
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 9 08/12] btrfs: volumes: update per-profile available space at mount time
Date: Thu, 1 Oct 2020 13:57:40 +0800
Message-Id: <20201001055744.103261-9-wqu@suse.com>
In-Reply-To: <20201001055744.103261-1-wqu@suse.com>
References: <20201001055744.103261-1-wqu@suse.com>

This patch calculates the initial per-profile available space at mount
time.

An error (-ENOMEM) leads to mount failure. If we can't even allocate
memory at this point, refusing to mount is good for everyone.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/volumes.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 28636cf01190..e28d6a304f87 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7860,6 +7860,13 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)

 	/* Ensure all chunks have corresponding dev extents */
 	ret = verify_chunk_dev_extent_mapping(fs_info);
+	if (ret < 0)
+		goto out;
+
+	/* All dev extents are verified, update per-profile available space */
+	mutex_lock(&fs_info->chunk_mutex);
+	ret = btrfs_update_per_profile_avail(fs_info);
+	mutex_unlock(&fs_info->chunk_mutex);
 out:
 	btrfs_free_path(path);
 	return ret;

From patchwork Thu Oct 1 05:57:41 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11810917
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 9 09/12] btrfs: volumes: call btrfs_update_per_profile_avail() for chunk allocation and removal
Date: Thu, 1 Oct 2020 13:57:41 +0800
Message-Id: <20201001055744.103261-10-wqu@suse.com>
In-Reply-To: <20201001055744.103261-1-wqu@suse.com>
References: <20201001055744.103261-1-wqu@suse.com>

For chunk allocation, if we fail to update the per-profile available
space, we need to revert the newly created block group, revert the device
status, then return the error.

For chunk removal, if we fail we just abort the transaction, like all
other error patterns in btrfs_remove_chunk().

Signed-off-by: Qu Wenruo
---
 fs/btrfs/volumes.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e28d6a304f87..12c08648f5b6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3135,7 +3135,13 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
 					device->bytes_used - dev_extent_len);
 			atomic64_add(dev_extent_len, &fs_info->free_chunk_space);
 			btrfs_clear_space_info_full(fs_info);
+			ret = btrfs_update_per_profile_avail(fs_info);
 			mutex_unlock(&fs_info->chunk_mutex);
+			if (ret < 0) {
+				mutex_unlock(&fs_devices->device_list_mutex);
+				btrfs_abort_transaction(trans, ret);
+				goto out;
+			}
 		}

 		ret = btrfs_update_device(trans, device);
@@ -5275,6 +5281,12 @@ static int create_chunk(struct btrfs_trans_handle *trans,
 				      &trans->transaction->dev_update_list);
 	}

+	ret = btrfs_update_per_profile_avail(info);
+	if (ret < 0) {
+		btrfs_revert_block_group(trans, start);
+		goto error_revert_devices;
+	}
+
 	atomic64_sub(ctl->stripe_size * map->num_stripes,
 		     &info->free_chunk_space);

@@ -5284,6 +5296,13 @@ static int create_chunk(struct btrfs_trans_handle *trans,

 	return 0;

+error_revert_devices:
+	for (i = 0; i < map->num_stripes; i++) {
+		struct btrfs_device *dev = map->stripes[i].dev;
+
+		btrfs_device_set_bytes_used(dev,
+				dev->bytes_used - ctl->stripe_size);
+	}
 error_del_extent:
 	write_lock(&em_tree->lock);
 	remove_extent_mapping(em_tree, em);

From patchwork Thu Oct 1 05:57:42 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11810919
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 9 10/12] btrfs: volumes: update per-profile available space for device update
Date: Thu, 1 Oct 2020 13:57:42 +0800
Message-Id: <20201001055744.103261-11-wqu@suse.com>
In-Reply-To: <20201001055744.103261-1-wqu@suse.com>
References: <20201001055744.103261-1-wqu@suse.com>

Four locations are involved, and we need to handle the extra error in
each of them:

- device removal
  The existing error handling is good enough to revert.

- device add
  We abort the transaction on failure, just like the existing error
  patterns.

- device grow
  We revert the device size if we fail.

- device shrink
  The existing error handling is good enough to revert the device size.

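The "device grow" case above follows a simple pattern: apply the new size, recompute the per-profile numbers, and restore the old size if the recomputation fails. The following userspace C sketch shows that pattern only. struct mini_fs, struct mini_device, update_profile_avail() and the fail_next_update test hook are hypothetical names made up for this illustration and are not btrfs interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical miniature of the state touched when growing a device */
struct mini_device {
	uint64_t total_bytes;
};

struct mini_fs {
	struct mini_device dev;
	uint64_t super_total_bytes;	/* filesystem-wide total size */
	uint64_t profile_avail;		/* stand-in for per_profile_avail[] */
	int fail_next_update;		/* test hook: force one -ENOMEM */
};

/* Stand-in for the per-profile update, which can fail with -ENOMEM */
static int update_profile_avail(struct mini_fs *fs)
{
	if (fs->fail_next_update) {
		fs->fail_next_update = 0;
		return -ENOMEM;
	}
	fs->profile_avail = fs->dev.total_bytes;
	return 0;
}

int grow_device(struct mini_fs *fs, uint64_t new_size)
{
	uint64_t old_dev_size = fs->dev.total_bytes;
	int ret;

	if (new_size <= old_dev_size)
		return -EINVAL;

	/* Apply the new size first so the recalculation can see it */
	fs->dev.total_bytes = new_size;
	ret = update_profile_avail(fs);
	if (ret < 0) {
		/* Revert the device size, nothing else was touched yet */
		fs->dev.total_bytes = old_dev_size;
		return ret;
	}

	/* Only commit the filesystem-wide total after the update succeeded */
	fs->super_total_bytes += new_size - old_dev_size;
	return 0;
}
```

The point of the ordering is that every change made before the fallible call is trivially undoable, and everything else is deferred until after it succeeds.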
Signed-off-by: Qu Wenruo
---
 fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 12c08648f5b6..77276a6b172a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2251,7 +2251,10 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 		mutex_lock(&fs_info->chunk_mutex);
 		list_del_init(&device->dev_alloc_list);
 		device->fs_devices->rw_devices--;
+		ret = btrfs_update_per_profile_avail(fs_info);
 		mutex_unlock(&fs_info->chunk_mutex);
+		if (ret < 0)
+			goto error_undo;
 	}

 	mutex_unlock(&uuid_mutex);
@@ -2777,14 +2780,21 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	/* add sysfs device entry */
 	btrfs_sysfs_add_devices_dir(fs_devices, device);

-	/*
-	 * we've got more storage, clear any full flags on the space
-	 * infos
-	 */
-	btrfs_clear_space_info_full(fs_info);
+	ret = btrfs_update_per_profile_avail(fs_info);
+
+	if (!ret)
+		/*
+		 * we've got more storage, clear any full flags on the space
+		 * infos
+		 */
+		btrfs_clear_space_info_full(fs_info);

 	mutex_unlock(&fs_info->chunk_mutex);
 	mutex_unlock(&fs_devices->device_list_mutex);
+	if (ret < 0) {
+		btrfs_abort_transaction(trans, ret);
+		goto error_sysfs;
+	}

 	if (seeding_dev) {
 		mutex_lock(&fs_info->chunk_mutex);
@@ -2937,8 +2947,10 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
 	struct btrfs_super_block *super_copy = fs_info->super_copy;
+	u64 old_dev_size;
 	u64 old_total;
 	u64 diff;
+	int ret;

 	if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
 		return -EACCES;
@@ -2947,6 +2959,7 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,

 	mutex_lock(&fs_info->chunk_mutex);
 	old_total = btrfs_super_total_bytes(super_copy);
+	old_dev_size = device->total_bytes;
 	diff = round_down(new_size - device->total_bytes, fs_info->sectorsize);

 	if (new_size <= device->total_bytes ||
@@ -2955,17 +2968,26 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,
 		return -EINVAL;
 	}

+	btrfs_device_set_total_bytes(device, new_size);
+	btrfs_device_set_disk_total_bytes(device, new_size);
+	ret = btrfs_update_per_profile_avail(fs_info);
+	if (ret < 0) {
+		btrfs_device_set_total_bytes(device, old_dev_size);
+		btrfs_device_set_disk_total_bytes(device, old_dev_size);
+		mutex_unlock(&fs_info->chunk_mutex);
+		return ret;
+	}
+
 	btrfs_set_super_total_bytes(super_copy,
 			round_down(old_total + diff, fs_info->sectorsize));
 	device->fs_devices->total_rw_bytes += diff;
-
-	btrfs_device_set_total_bytes(device, new_size);
-	btrfs_device_set_disk_total_bytes(device, new_size);
 	btrfs_clear_space_info_full(device->fs_info);
 	if (list_empty(&device->post_commit_list))
 		list_add_tail(&device->post_commit_list,
 			      &trans->transaction->dev_update_list);
 	mutex_unlock(&fs_info->chunk_mutex);
+	if (ret < 0)
+		return ret;

 	return btrfs_update_device(trans, device);
 }
@@ -4784,6 +4806,12 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 		device->fs_devices->total_rw_bytes -= diff;
 		atomic64_sub(diff, &fs_info->free_chunk_space);
 	}
+	ret = btrfs_update_per_profile_avail(fs_info);
+	if (ret < 0) {
+		mutex_unlock(&fs_info->chunk_mutex);
+		btrfs_end_transaction(trans);
+		goto done;
+	}

 	/*
 	 * Once the device's size has been set to the new size, ensure all

From patchwork Thu Oct 1 05:57:43 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11810921
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: stable@vger.kernel.org, Marc Lehmann, Josef Bacik
Subject: [PATCH 9 11/12] btrfs: space-info: Use per-profile available space in can_overcommit()
Date: Thu, 1 Oct 2020 13:57:43 +0800
Message-Id: <20201001055744.103261-12-wqu@suse.com>
In-Reply-To: <20201001055744.103261-1-wqu@suse.com>
References: <20201001055744.103261-1-wqu@suse.com>

For the following disk layout, can_overcommit() can cause false
confidence in available space:

  devid 1 unallocated:	1T
  devid 2 unallocated:	10T
  metadata type:	RAID1

Since can_overcommit() simply divides the unallocated space by the profile
factor to calculate the allocatable metadata chunk size, it believes we
still have 5.5T for metadata chunks, while in truth we only have 1T
available for metadata chunks.
This can lead to ENOSPC at run_delalloc_range() and cause a transaction
abort.

Since the factor based calculation can't distinguish RAID1/RAID10 from DUP
at all, we need proper chunk-allocator level awareness to do such an
estimation.

Thankfully, we already have the per-profile available space calculated, so
just use that facility to avoid such false confidence.

CC: stable@vger.kernel.org # 5.4+
Reported-by: Marc Lehmann
Signed-off-by: Qu Wenruo
Reviewed-by: Josef Bacik
---
 fs/btrfs/space-info.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 64b6e1d44f47..4bb4e3c3531f 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -336,25 +336,21 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
 			  struct btrfs_space_info *space_info,
 			  enum btrfs_reserve_flush_enum flush)
 {
+	enum btrfs_raid_types index;
 	u64 profile;
 	u64 avail;
-	int factor;

 	if (space_info->flags & BTRFS_BLOCK_GROUP_SYSTEM)
 		profile = btrfs_system_alloc_profile(fs_info);
 	else
 		profile = btrfs_metadata_alloc_profile(fs_info);

-	avail = atomic64_read(&fs_info->free_chunk_space);
-
 	/*
-	 * If we have dup, raid1 or raid10 then only half of the free
-	 * space is actually usable.  For raid56, the space info used
-	 * doesn't include the parity drive, so we don't have to
-	 * change the math
+	 * Grab avail space from per-profile array which should be as accurate
+	 * as chunk allocator.
 	 */
-	factor = btrfs_bg_type_to_factor(profile);
-	avail = div_u64(avail, factor);
+	index = btrfs_bg_flags_to_raid_index(profile);
+	avail = atomic64_read(&fs_info->fs_devices->per_profile_avail[index]);

 	/*
 	 * If we aren't flushing all things, let us overcommit up to

From patchwork Thu Oct 1 05:57:44 2020
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 11810923
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: stable@vger.kernel.org
Subject: [PATCH 9 12/12] btrfs: statfs: Use pre-calculated per-profile available space
Date: Thu, 1 Oct 2020 13:57:44 +0800
Message-Id: <20201001055744.103261-13-wqu@suse.com>
In-Reply-To: <20201001055744.103261-1-wqu@suse.com>
References: <20201001055744.103261-1-wqu@suse.com>

Although btrfs_calc_avail_data_space() tries to estimate how many data
chunks it can allocate, the estimation is far from perfect:

- Metadata over-commit is not considered at all
- Chunk allocation doesn't take RAID5/6 into consideration

This patch replaces btrfs_calc_avail_data_space() with the pre-calculated
per-profile available space.

This provides the following benefits:

- Accurate unallocated data space estimation
  It's as accurate as the chunk allocator, and can handle RAID5/6 and the
  newly introduced RAID1C3/C4.

For the metadata over-commit part, we don't take that into consideration
yet. Metadata over-commit only happens when we have enough unallocated
space, and in most cases we won't use that much metadata space at all.

And we still have the existing 0-available space check, to prevent us from
reporting a too optimistic f_bavail result.

Since we're keeping the old lock-free design, statfs should not experience
any extra delay.

CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo
---
 fs/btrfs/super.c | 131 +++--------------------------------------------
 1 file changed, 7 insertions(+), 124 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 25967ecaaf0a..355e4f6a2fd4 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2016,124 +2016,6 @@ static inline void btrfs_descending_sort_devices(
 	     btrfs_cmp_device_free_bytes, NULL);
 }

-/*
- * The helper to calc the free space on the devices that can be used to store
- * file data.
- */
-static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
-					      u64 *free_bytes)
-{
-	struct btrfs_device_info *devices_info;
-	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
-	struct btrfs_device *device;
-	u64 type;
-	u64 avail_space;
-	u64 min_stripe_size;
-	int num_stripes = 1;
-	int i = 0, nr_devices;
-	const struct btrfs_raid_attr *rattr;
-
-	/*
-	 * We aren't under the device list lock, so this is racy-ish, but good
-	 * enough for our purposes.
-	 */
-	nr_devices = fs_info->fs_devices->open_devices;
-	if (!nr_devices) {
-		smp_mb();
-		nr_devices = fs_info->fs_devices->open_devices;
-		ASSERT(nr_devices);
-		if (!nr_devices) {
-			*free_bytes = 0;
-			return 0;
-		}
-	}
-
-	devices_info = kmalloc_array(nr_devices, sizeof(*devices_info),
-			       GFP_KERNEL);
-	if (!devices_info)
-		return -ENOMEM;
-
-	/* calc min stripe number for data space allocation */
-	type = btrfs_data_alloc_profile(fs_info);
-	rattr = &btrfs_raid_array[btrfs_bg_flags_to_raid_index(type)];
-
-	if (type & BTRFS_BLOCK_GROUP_RAID0)
-		num_stripes = nr_devices;
-	else if (type & BTRFS_BLOCK_GROUP_RAID1)
-		num_stripes = 2;
-	else if (type & BTRFS_BLOCK_GROUP_RAID1C3)
-		num_stripes = 3;
-	else if (type & BTRFS_BLOCK_GROUP_RAID1C4)
-		num_stripes = 4;
-	else if (type & BTRFS_BLOCK_GROUP_RAID10)
-		num_stripes = 4;
-
-	/* Adjust for more than 1 stripe per device */
-	min_stripe_size = rattr->dev_stripes * BTRFS_STRIPE_LEN;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
-		if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
-						&device->dev_state) ||
-		    !device->bdev ||
-		    test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state))
-			continue;
-
-		if (i >= nr_devices)
-			break;
-
-		avail_space = device->total_bytes - device->bytes_used;
-
-		/* align with stripe_len */
-		avail_space = rounddown(avail_space, BTRFS_STRIPE_LEN);
-
-		/*
-		 * In order to avoid overwriting the superblock on the drive,
-		 * btrfs starts at an offset of at least 1MB when doing chunk
-		 * allocation.
-		 *
-		 * This ensures we have at least min_stripe_size free space
-		 * after excluding 1MB.
-		 */
-		if (avail_space <= SZ_1M + min_stripe_size)
-			continue;
-
-		avail_space -= SZ_1M;
-
-		devices_info[i].dev = device;
-		devices_info[i].max_avail = avail_space;
-
-		i++;
-	}
-	rcu_read_unlock();
-
-	nr_devices = i;
-
-	btrfs_descending_sort_devices(devices_info, nr_devices);
-
-	i = nr_devices - 1;
-	avail_space = 0;
-	while (nr_devices >= rattr->devs_min) {
-		num_stripes = min(num_stripes, nr_devices);
-
-		if (devices_info[i].max_avail >= min_stripe_size) {
-			int j;
-			u64 alloc_size;
-
-			avail_space += devices_info[i].max_avail * num_stripes;
-			alloc_size = devices_info[i].max_avail;
-			for (j = i + 1 - num_stripes; j <= i; j++)
-				devices_info[j].max_avail -= alloc_size;
-		}
-		i--;
-		nr_devices--;
-	}
-
-	kfree(devices_info);
-	*free_bytes = avail_space;
-	return 0;
-}
-
 /*
  * Calculate numbers for 'df', pessimistic in case of mixed raid profiles.
  *
@@ -2150,6 +2032,7 @@ static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
 static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb);
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	struct btrfs_super_block *disk_super = fs_info->super_copy;
 	struct btrfs_space_info *found;
 	u64 total_used = 0;
@@ -2159,7 +2042,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	__be32 *fsid = (__be32 *)fs_info->fs_devices->fsid;
 	unsigned factor = 1;
 	struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
-	int ret;
+	enum btrfs_raid_types data_type;
 	u64 thresh = 0;
 	int mixed = 0;

@@ -2208,11 +2091,11 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 		buf->f_bfree = 0;
 	spin_unlock(&block_rsv->lock);

-	buf->f_bavail = div_u64(total_free_data, factor);
-	ret = btrfs_calc_avail_data_space(fs_info, &total_free_data);
-	if (ret)
-		return ret;
-	buf->f_bavail += div_u64(total_free_data,
factor); + data_type = btrfs_bg_flags_to_raid_index( + btrfs_data_alloc_profile(fs_info)); + + buf->f_bavail = total_free_data + + atomic64_read(&fs_devices->per_profile_avail[data_type]); buf->f_bavail = buf->f_bavail >> bits; /*