
[v2,3/4] Btrfs-progs: fsck: deal with corrupted csum root

Message ID 1401357597-9494-2-git-send-email-wangsl.fnst@cn.fujitsu.com (mailing list archive)
State Accepted
Delegated to: David Sterba

Commit Message

Wang Shilong May 29, 2014, 9:59 a.m. UTC
If the checksum root is corrupted, fsck crashes with a segmentation fault.
This is because when loading the checksum root fails, the root's node is
NULL, which causes NULL pointer dereferences later.

To fix this problem, do the same thing as for extent tree rebuilding:
allocate a new root node and clear its uptodate flag. A sanity check is
then done before fsck goes on.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
---
v1->v2: fix typo for output message.
---
 cmds-check.c | 5 +++++
 disk-io.c    | 7 +++++++
 2 files changed, 12 insertions(+)

Comments

David Sterba June 2, 2014, 5:27 p.m. UTC | #1
On Thu, May 29, 2014 at 05:59:57PM +0800, Wang Shilong wrote:
> If the checksum root is corrupted, fsck crashes with a segmentation fault.
> This is because when loading the checksum root fails, the root's node is
> NULL, which causes NULL pointer dereferences later.
> 
> To fix this problem, do the same thing as for extent tree rebuilding:
> allocate a new root node and clear its uptodate flag. A sanity check is
> then done before fsck goes on.

I'm a bit worried about recommending --init-csum-tree, though in this
case there's not much else left to do. A filesystem with an initialized
csum tree will mount, but reading non-inline data will produce 'csum
missing' errors.

> --- a/cmds-check.c
> +++ b/cmds-check.c
> @@ -6963,6 +6963,11 @@ int cmd_check(int argc, char **argv)
>  		ret = -EIO;
>  		goto close_out;
>  	}
> +	if (!extent_buffer_uptodate(info->csum_root->node)) {
> +		fprintf(stderr, "Checksum root corrupted, rerun with --init-csum-tree option\n");
> +		ret = -EIO;
> +		goto close_out;

So this should prevent segfaults due to missing csum tree, fine. The
error message can copy what the broken extent tree reports a few lines
above.

And now that I'm looking at other extent_buffer_uptodate(tree) checks in
the function, for clarity, each root check should be done separately and
followed by a message that says which tree is broken.

The idea behind this is to improve the error reporting and then
document what type of breakage can be fixed and how.

I'm CCing Chris, as this is a matter of design and direction of fsck,
and more opinions are desirable.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wang Shilong June 3, 2014, 3:25 a.m. UTC | #2
On 06/03/2014 01:27 AM, David Sterba wrote:
> On Thu, May 29, 2014 at 05:59:57PM +0800, Wang Shilong wrote:
>> If the checksum root is corrupted, fsck crashes with a segmentation fault.
>> This is because when loading the checksum root fails, the root's node is
>> NULL, which causes NULL pointer dereferences later.
>>
>> To fix this problem, do the same thing as for extent tree rebuilding:
>> allocate a new root node and clear its uptodate flag. A sanity check is
>> then done before fsck goes on.
> I'm a bit worried about recommending --init-csum-tree, though in this
> case there's not much else left to do. A filesystem with an initialized
> csum tree will mount, but reading non-inline data will produce 'csum
> missing' errors.
Agree.
>> --- a/cmds-check.c
>> +++ b/cmds-check.c
>> @@ -6963,6 +6963,11 @@ int cmd_check(int argc, char **argv)
>>   		ret = -EIO;
>>   		goto close_out;
>>   	}
>> +	if (!extent_buffer_uptodate(info->csum_root->node)) {
>> +		fprintf(stderr, "Checksum root corrupted, rerun with --init-csum-tree option\n");
>> +		ret = -EIO;
>> +		goto close_out;
> So this should prevent segfaults due to missing csum tree, fine. The
> error message can copy what the broken extent tree reports a few lines
> above.
>
> And now that I'm looking at other extent_buffer_uptodate(tree) checks in
> the function, for clarity, each root check should be done separately and
> followed by a message that says which tree is broken.
Normally, extent_buffer_uptodate(tree) is checked right after reading.
We need to check it again in fsck because fsck may reinitialize the
extent tree and the csum tree.

Checking it again makes sure the root node has been set up properly and
fsck can go further.


>
> The idea behind this is to improve the error reporting and then
> document what type of breakage can be fixed and how.
>
> I'm CCing Chris, as this is a matter of design and direction of fsck,
> and more opinions are desirable.

David Sterba June 3, 2014, 4:21 p.m. UTC | #3
On Tue, Jun 03, 2014 at 11:25:49AM +0800, Wang Shilong wrote:
> On 06/03/2014 01:27 AM, David Sterba wrote:
> >On Thu, May 29, 2014 at 05:59:57PM +0800, Wang Shilong wrote:
> >>If the checksum root is corrupted, fsck crashes with a segmentation fault.
> >>This is because when loading the checksum root fails, the root's node is
> >>NULL, which causes NULL pointer dereferences later.
> >>
> >>To fix this problem, do the same thing as for extent tree rebuilding:
> >>allocate a new root node and clear its uptodate flag. A sanity check is
> >>then done before fsck goes on.
> >I'm a bit worried about recommending --init-csum-tree, though in this
> >case there's not much else left to do. A filesystem with an initialized
> >csum tree will mount, but reading non-inline data will produce 'csum
> >missing' errors.
> Agree.

Are you ok with removing the "rerun with --init-csum-tree option" part
of the message?

> >>--- a/cmds-check.c
> >>+++ b/cmds-check.c
> >>@@ -6963,6 +6963,11 @@ int cmd_check(int argc, char **argv)
> >>  		ret = -EIO;
> >>  		goto close_out;
> >>  	}
> >>+	if (!extent_buffer_uptodate(info->csum_root->node)) {
> >>+		fprintf(stderr, "Checksum root corrupted, rerun with --init-csum-tree option\n");
> >>+		ret = -EIO;
> >>+		goto close_out;
> >So this should prevent segfaults due to missing csum tree, fine. The
> >error message can copy what the broken extent tree reports a few lines
> >above.
> >
> >And now that I'm looking at other extent_buffer_uptodate(tree) checks in
> >the function, for clarity, each root check should be done separately and
> >followed by a message that says which tree is broken.
> Normally, extent_buffer_uptodate(tree) is checked right after reading.
> We need to check it again in fsck because fsck may reinitialize the
> extent tree and the csum tree.
> 
> Checking it again makes sure the root node has been set up properly and
> fsck can go further.

Yeah, I see how it works now, thanks.

I've reorganized the patches in integration so the ones for fsck are
grouped together. Fsck is scary and obviously needs more reviews, so the
patches will be pushed towards the release branches as they get reviewed
(or tested, so to say). I appreciate your work in that area and hope you
understand the slow progress with your patches.
Wang Shilong June 4, 2014, 1:43 a.m. UTC | #4
On 06/04/2014 12:21 AM, David Sterba wrote:
> On Tue, Jun 03, 2014 at 11:25:49AM +0800, Wang Shilong wrote:
>> On 06/03/2014 01:27 AM, David Sterba wrote:
>>> On Thu, May 29, 2014 at 05:59:57PM +0800, Wang Shilong wrote:
>>>> If the checksum root is corrupted, fsck crashes with a segmentation fault.
>>>> This is because when loading the checksum root fails, the root's node is
>>>> NULL, which causes NULL pointer dereferences later.
>>>>
>>>> To fix this problem, do the same thing as for extent tree rebuilding:
>>>> allocate a new root node and clear its uptodate flag. A sanity check is
>>>> then done before fsck goes on.
>>> I'm a bit worried about recommending --init-csum-tree, though in this
>>> case there's not much else left to do. A filesystem with an initialized
>>> csum tree will mount, but reading non-inline data will produce 'csum
>>> missing' errors.
>> Agree.
> Are you ok with removing the "rerun with --init-csum-tree option" part
> of the message?
Yes, recommending it is not good; I agree with your point here.
>
>>>> --- a/cmds-check.c
>>>> +++ b/cmds-check.c
>>>> @@ -6963,6 +6963,11 @@ int cmd_check(int argc, char **argv)
>>>>   		ret = -EIO;
>>>>   		goto close_out;
>>>>   	}
>>>> +	if (!extent_buffer_uptodate(info->csum_root->node)) {
>>>> +		fprintf(stderr, "Checksum root corrupted, rerun with --init-csum-tree option\n");
>>>> +		ret = -EIO;
>>>> +		goto close_out;
>>> So this should prevent segfaults due to missing csum tree, fine. The
>>> error message can copy what the broken extent tree reports a few lines
>>> above.
>>>
>>> And now that I'm looking at other extent_buffer_uptodate(tree) checks in
>>> the function, for clarity, each root check should be done separately and
>>> followed by a message that says which tree is broken.
>> Normally, extent_buffer_uptodate(tree) is checked right after reading.
>> We need to check it again in fsck because fsck may reinitialize the
>> extent tree and the csum tree.
>>
>> Checking it again makes sure the root node has been set up properly and
>> fsck can go further.
> Yeah, I see how it works now, thanks.
>
> I've reorganized the patches in integration so the ones for fsck are
> grouped together. Fsck is scary and obviously needs more reviews, so the
> patches will be pushed towards the release branches as they get reviewed
> (or tested, so to say). I appreciate your work in that area and hope you
> understand the slow progress with your patches.
That's OK with me, thanks for your review and comments ^_^



Patch

diff --git a/cmds-check.c b/cmds-check.c
index 0e4e042..ad5514e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -6963,6 +6963,11 @@ int cmd_check(int argc, char **argv)
 		ret = -EIO;
 		goto close_out;
 	}
+	if (!extent_buffer_uptodate(info->csum_root->node)) {
+		fprintf(stderr, "Checksum root corrupted, rerun with --init-csum-tree option\n");
+		ret = -EIO;
+		goto close_out;
+	}
 
 	fprintf(stderr, "checking extents\n");
 	ret = check_chunks_and_extents(root);
diff --git a/disk-io.c b/disk-io.c
index 63e153d..bbfd8e7 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -914,6 +914,13 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 		printk("Couldn't setup csum tree\n");
 		if (!(flags & OPEN_CTREE_PARTIAL))
 			return -EIO;
+		/* do the same thing as extent tree rebuilding */
+		fs_info->csum_root->node =
+			btrfs_find_create_tree_block(fs_info->extent_root, 0,
+						     leafsize);
+		if (!fs_info->csum_root->node)
+			return -ENOMEM;
+		clear_extent_buffer_uptodate(NULL, fs_info->csum_root->node);
 	}
 	fs_info->csum_root->track_dirty = 1;