[RFC,v2.1,16/16] btrfs-progs: fsck: Introduce low memory mode

Message ID 1463477930-3925-17-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive)
State Accepted

Commit Message

Qu Wenruo May 17, 2016, 9:38 a.m. UTC
From: Lu Fengqi <lufq.fnst@cn.fujitsu.com>

Introduce a new fsck mode: low memory mode.

The old btrfsck check is quite efficient, but it uses some memory for
each extent item.
The old method ensures that extents are iterated only once during the
extent/chunk tree check.

But since it uses a little memory for each extent item, on a large fs
with several TB of metadata this can easily eat up memory and cause OOM.

To handle this limitation and improve scalability, the new low-memory
mode does not use any heap memory to record which extents have been
checked.
Instead, it uses extent backrefs to avoid most of the unneeded checks on
shared fs/subvolume tree blocks.
And by cross-checking forward and backward references, we can also
ensure that every tree block is checked at least once.
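
A minimal sketch of the backref idea described above, not code from this
series. It assumes a toy model in which each tree block carries the list
of root IDs found in its backrefs. The selection rule (the lowest-numbered
referencing root checks a shared block) and all toy_* names are
hypothetical and only illustrate how a check can be skipped without
recording anything on the heap:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a tree block and the roots listed in its backrefs. */
struct toy_block {
	uint64_t bytenr;           /* logical address of the block */
	const uint64_t *ref_roots; /* root IDs found in the block's backrefs */
	size_t nr_refs;
};

/*
 * Decide whether the walk of cur_root should check this block.
 *
 * Assumed rule (hypothetical, for illustration only): a shared block is
 * checked by the lowest-numbered root that references it; every other
 * referencing root skips it. A root with no matching backref always
 * checks the block, so the mismatch can be reported.
 */
bool toy_should_check(const struct toy_block *blk, uint64_t cur_root)
{
	uint64_t lowest = UINT64_MAX;
	bool referenced = false;
	size_t i;

	for (i = 0; i < blk->nr_refs; i++) {
		if (blk->ref_roots[i] == cur_root)
			referenced = true;
		if (blk->ref_roots[i] < lowest)
			lowest = blk->ref_roots[i];
	}
	if (!referenced)
		return true;
	return cur_root == lowest;
}

/* Count how often a block gets checked when every given root walks over it. */
size_t toy_count_checks(const struct toy_block *blk,
			const uint64_t *roots, size_t nr_roots)
{
	size_t checked = 0;
	size_t i;

	for (i = 0; i < nr_roots; i++)
		if (toy_should_check(blk, roots[i]))
			checked++;
	return checked;
}
```

In this model the decision depends only on data stored with the block
itself, so a block shared by several subvolumes is checked once no matter
how many roots reach it.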

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 Documentation/btrfs-check.asciidoc |  6 +++
 cmds-check.c                       | 80 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 84 insertions(+), 2 deletions(-)

Comments

Josef Bacik May 17, 2016, 3:29 p.m. UTC | #1
On 05/17/2016 05:38 AM, Qu Wenruo wrote:
> From: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
>

You can add

Reviewed-by: Josef Bacik <jbacik@fb.com>

To the whole series.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo May 18, 2016, 12:58 a.m. UTC | #2
Josef Bacik wrote on 2016/05/17 11:29 -0400:
> On 05/17/2016 05:38 AM, Qu Wenruo wrote:
>> From: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
>>
>
> You can add
>
> Reviewed-by: Josef Bacik <jbacik@fb.com>
>
> To the whole series.  Thanks,
>
> Josef
>
>
Thanks for the review.

I'll add it to github branch to avoid unneeded mail bombing.

Thanks
Qu


David Sterba May 19, 2016, 2:51 p.m. UTC | #3
On Wed, May 18, 2016 at 08:58:57AM +0800, Qu Wenruo wrote:
> Josef Bacik wrote on 2016/05/17 11:29 -0400:
> > On 05/17/2016 05:38 AM, Qu Wenruo wrote:
> >
> Thanks for the review.
> 
> I'll add it to github branch to avoid unneeded mail bombing.

Thanks.  I did one more pass and fixed the error messages and some minor
formatting. Branch merged in devel, any fixups, please send as separate
patches. I'd like to give enough time for testing, so ETA for 4.6 will
be the end of the next week.
Qu Wenruo May 20, 2016, 2:33 a.m. UTC | #4
David Sterba wrote on 2016/05/19 16:51 +0200:
> On Wed, May 18, 2016 at 08:58:57AM +0800, Qu Wenruo wrote:
>> Josef Bacik wrote on 2016/05/17 11:29 -0400:
>>> On 05/17/2016 05:38 AM, Qu Wenruo wrote:
>>>
>> Thanks for the review.
>>
>> I'll add it to github branch to avoid unneeded mail bombing.
>
> Thanks.  I did one more pass and fixed the error messages and some minor
> formatting. Branch merged in devel, any fixups, please send as separate
> patches. I'd like to give enough time for testing, so ETA for 4.6 will
> be the end of the next week.

Thanks a lot for all the work.

We'll enrich the test cases for current low memory mode.

Only once the current extent tree part is completely OK will we try to
implement the fs tree one.
(Currently the fs tree check still eats a lot of memory, which makes the
memory saved in the extent tree check a little meaningless.)

Thanks,
Qu


David Sterba May 23, 2016, 11:08 a.m. UTC | #5
On Fri, May 20, 2016 at 10:33:55AM +0800, Qu Wenruo wrote:
> We'll enrich the test cases for current low memory mode.

I started something to add optional default options for a few basic
commands (mkfs, fsck, convert) to extend the coverage. I'm not finished,
the idea is to call the commands via some wrapper that will grab the
defaults from a file or from environment.
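
A rough sketch of what such a wrapper could do, assuming the defaults
arrive as a single string. The function name and the idea of passing the
string in as a parameter are hypothetical, not the actual test-suite code:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<cmd> [defaults] <target>" into out.
 *
 * defaults may be NULL or empty, in which case the command runs with
 * its built-in behaviour. Returns the length of the resulting string,
 * or -1 if it did not fit into out.
 */
int toy_build_cmdline(char *out, size_t outlen, const char *cmd,
		      const char *defaults, const char *target)
{
	int n;

	if (defaults && *defaults)
		n = snprintf(out, outlen, "%s %s %s", cmd, defaults, target);
	else
		n = snprintf(out, outlen, "%s %s", cmd, target);

	/* snprintf reports the length it wanted; treat truncation as failure */
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return n;
}
```

A caller could pass the result of getenv() on some agreed variable name as
defaults, so that the same test can be run once with and once without an
option such as --low-memory.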
Qu Wenruo May 24, 2016, 3:19 a.m. UTC | #6
David Sterba wrote on 2016/05/23 13:08 +0200:
> On Fri, May 20, 2016 at 10:33:55AM +0800, Qu Wenruo wrote:
>> We'll enrich the test cases for current low memory mode.
>
> I started something to add optional default options for a few basic
> commands (mkfs, fsck, convert) to extend the coverage. I'm not finished,
> the idea is to call the commands via some wrapper that will grab the
> defaults from a file or from environment.
>
>
Thank you a lot.

But that's still not enough for low memory fsck.

Even if we add the --low-memory option so that btrfsck runs on those
images, we still have the following problems:

1) Lack of support for repair
    Repair support for low memory mode is quite tricky, as we need to do
    a lot of record keeping beyond just calling
    btrfs_previous/next_item().

    This won't be implemented any time soon, and it will make almost
    all repair function tests fail for the low memory backend.

2) btrfs-image bug causing missing chunk stripe
    We're actively working on this before low memory mode for the fs
    tree check.

    In fact the problem has been there for a long time, and another bug
    in btrfsck, which ignores the error returned from the dev_extent
    check, lets btrfsck pass the fsck test images.

    Unfortunately (or fortunately?) low memory mode won't ignore such an
    error and always reports a missing chunk for dev_extent.

    Unless we fix btrfs-image (only the restore part is affected), low
    memory mode will always report an error on btrfs-image restored
    images.

3) Extra images
    During the development of low memory mode, we found that the current
    test images all target specific fix cases.

    There is no check on healthy images, not to mention tests on all
    possible extent backrefs.

    We have built such images for internal low memory mode tests, and
    hope to push them into the current test suite.

    But since we don't have infrastructure for such check-only test
    cases, and due to the bug in 2), we still need some work for this.

So we still have some work to do.
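
The ignored error in 2) comes down to a pattern like the following. This
is a minimal sketch with hypothetical toy_* names, not the actual btrfsck
code; it only contrasts dropping a check's return value with accumulating
it the way the low memory code does with "err |= ret":

```c
#include <stddef.h>

typedef int (*toy_check_fn)(void);

/* Two stand-in checks: one passes, one reports a missing chunk. */
int toy_check_ok(void)
{
	return 0;
}

int toy_check_missing_chunk(void)
{
	return 1;
}

/* Buggy variant: runs every check but drops the return values. */
int toy_run_checks_ignoring(const toy_check_fn *checks, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		checks[i](); /* result silently lost */
	return 0;
}

/* Accumulating variant: any failing check makes the whole run fail. */
int toy_run_checks(const toy_check_fn *checks, size_t nr)
{
	int err = 0;
	int ret;
	size_t i;

	for (i = 0; i < nr; i++) {
		ret = checks[i]();
		err |= ret;
	}
	return err;
}
```

With the first variant an image with a missing chunk stripe still passes,
which is why the existing test images never exposed the btrfs-image
problem.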

Thanks,
Qu


Patch

diff --git a/Documentation/btrfs-check.asciidoc b/Documentation/btrfs-check.asciidoc
index 74a2ad2..4e27863 100644
--- a/Documentation/btrfs-check.asciidoc
+++ b/Documentation/btrfs-check.asciidoc
@@ -93,6 +93,12 @@  build the extent tree from scratch
 +
 NOTE: Do not use unless you know what you're doing.
 
+--low-memory::
+check fs in low memory usage mode (experimental).
+May take longer than a normal check.
++
+NOTE: Doesn't work with '--repair' option yet.
+
 EXIT STATUS
 -----------
 *btrfs check* returns a zero exit status if it succeeds. Non zero is
diff --git a/cmds-check.c b/cmds-check.c
index 6a49f07..7a3026c 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -71,6 +71,7 @@  static int repair = 0;
 static int no_holes = 0;
 static int init_extent_tree = 0;
 static int check_data_csum = 0;
+static int low_memory = 0;
 static struct btrfs_fs_info *global_info;
 static struct task_ctx ctx = { 0 };
 static struct cache_tree *roots_info_cache = NULL;
@@ -9795,6 +9796,63 @@  static int traversal_tree_block(struct btrfs_root *root,
 	return err;
 }
 
+/*
+ * Low memory usage version of check_chunks_and_extents().
+ */
+static int check_chunks_and_extents_v2(struct btrfs_root *root)
+{
+	struct btrfs_path path;
+	struct btrfs_key key;
+	struct btrfs_root *root1;
+	struct btrfs_root *cur_root;
+	int err = 0;
+	int ret;
+
+	root1 = root->fs_info->chunk_root;
+	ret = traversal_tree_block(root1, root1->node);
+	err |= ret;
+
+	root1 = root->fs_info->tree_root;
+	ret = traversal_tree_block(root1, root1->node);
+	err |= ret;
+
+	btrfs_init_path(&path);
+	key.objectid = BTRFS_EXTENT_TREE_OBJECTID;
+	key.offset = 0;
+	key.type = BTRFS_ROOT_ITEM_KEY;
+
+	ret = btrfs_search_slot(NULL, root1, &key, &path, 0, 0);
+	if (ret) {
+		error("couldn't find extent_tree_root from tree_root");
+		goto out;
+	}
+
+	while (1) {
+		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+		if (key.type != BTRFS_ROOT_ITEM_KEY)
+			goto next;
+		key.offset = (u64)-1;
+
+		cur_root = btrfs_read_fs_root(root->fs_info, &key);
+		if (IS_ERR(cur_root) || !cur_root) {
+			error("failed to read tree: %llu", key.objectid);
+			goto next;
+		}
+
+		ret = traversal_tree_block(cur_root, cur_root->node);
+		err |= ret;
+
+next:
+		ret = btrfs_next_item(root1, &path);
+		if (ret)
+			goto out;
+	}
+
+out:
+	btrfs_release_path(&path);
+	return err;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root, int overwrite)
 {
@@ -10911,6 +10969,7 @@  const char * const cmd_check_usage[] = {
 	"--readonly                  run in read-only mode (default)",
 	"--init-csum-tree            create a new CRC tree",
 	"--init-extent-tree          create a new extent tree",
+	"--low-memory                check in low memory usage mode (experimental)",
 	"--check-data-csum           verify checksums of data blocks",
 	"-Q|--qgroup-report           print a report on qgroup consistency",
 	"-E|--subvol-extents <subvolid>",
@@ -10942,7 +11001,8 @@  int cmd_check(int argc, char **argv)
 		int c;
 		enum { GETOPT_VAL_REPAIR = 257, GETOPT_VAL_INIT_CSUM,
 			GETOPT_VAL_INIT_EXTENT, GETOPT_VAL_CHECK_CSUM,
-			GETOPT_VAL_READONLY, GETOPT_VAL_CHUNK_TREE };
+			GETOPT_VAL_READONLY, GETOPT_VAL_CHUNK_TREE,
+			GETOPT_VAL_LOW_MEMORY };
 		static const struct option long_options[] = {
 			{ "super", required_argument, NULL, 's' },
 			{ "repair", no_argument, NULL, GETOPT_VAL_REPAIR },
@@ -10960,6 +11020,8 @@  int cmd_check(int argc, char **argv)
 			{ "chunk-root", required_argument, NULL,
 				GETOPT_VAL_CHUNK_TREE },
 			{ "progress", no_argument, NULL, 'p' },
+			{ "low-memory", no_argument, NULL,
+				GETOPT_VAL_LOW_MEMORY },
 			{ NULL, 0, NULL, 0}
 		};
 
@@ -11024,6 +11086,9 @@  int cmd_check(int argc, char **argv)
 			case GETOPT_VAL_CHECK_CSUM:
 				check_data_csum = 1;
 				break;
+			case GETOPT_VAL_LOW_MEMORY:
+				low_memory = 1;
+				break;
 		}
 	}
 
@@ -11041,6 +11106,14 @@  int cmd_check(int argc, char **argv)
 		exit(1);
 	}
 
+	/*
+	 * Not supported yet
+	 */
+	if (repair && low_memory) {
+		error("Low memory mode doesn't support repair yet");
+		exit(1);
+	}
+
 	radix_tree_init();
 	cache_tree_init(&root_cache);
 
@@ -11164,7 +11237,10 @@  int cmd_check(int argc, char **argv)
 
 	if (!ctx.progress_enabled)
 		fprintf(stderr, "checking extents\n");
-	ret = check_chunks_and_extents(root);
+	if (low_memory)
+		ret = check_chunks_and_extents_v2(root);
+	else
+		ret = check_chunks_and_extents(root);
 	if (ret)
 		fprintf(stderr, "Errors found in extent allocation tree or chunk allocation\n");