diff mbox

[1/2] btrfs-progs: Add support for btrfs-image + corrupt script fsck test case.

Message ID 1418615699-18169-1-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Qu Wenruo Dec. 15, 2014, 3:54 a.m. UTC
Although the btrfsck test cases support raw image dumps (tar.xz), such a
dump is still too large for some images; e.g., a small 64M image with
about 3 levels (levels 0-2) of metadata produces about 2.6M after xz
compression, which is too large for a single binary commit.

However, btrfs-image -c9 works much better: a btrfs-image dump of the
same image is less than 200K, which is quite reasonable.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 tests/fsck-tests.sh | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)
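For context on the size comparison above, the two ways of capturing a test filesystem can be sketched roughly as follows (device and file names are hypothetical; `-c9` selects btrfs-image's maximum compression level, and `-r` restores a dump into a plain image, as fsck-tests.sh itself does):

```shell
# Raw-image route: xz-compress the loop device's whole backing file.
# Flexible, but the tarball can reach megabytes even for a 64M fs.
tar cJf 000-case.tar.xz test.img

# btrfs-image route: dump only the metadata with maximum compression,
# then restore it into a plain image that btrfsck can run against.
btrfs-image -c9 /dev/loop0 000-case.img
btrfs-image -r 000-case.img test.img
```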

Comments

Qu Wenruo Dec. 15, 2014, 6:09 a.m. UTC | #1
It seems the second patch, which is about 225K including a btrfs-image
dump, can't pass the mailing list's size limit.

To David:
I've created a pull request to your github repo:
https://github.com/kdave/btrfs-progs/pull/3

Thanks,
Qu

-------- Original Message --------
Subject: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt 
script fsck test case.
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <linux-btrfs@vger.kernel.org>
Date: 2014-12-15 11:54
> Although the btrfsck test cases support raw image dumps (tar.xz), such a
> dump is still too large for some images; e.g., a small 64M image with
> about 3 levels (levels 0-2) of metadata produces about 2.6M after xz
> compression, which is too large for a single binary commit.
>
> However, btrfs-image -c9 works much better: a btrfs-image dump of the
> same image is less than 200K, which is quite reasonable.
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>   tests/fsck-tests.sh | 34 ++++++++++++++++++++++++++++++++--
>   1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/tests/fsck-tests.sh b/tests/fsck-tests.sh
> index 8987d04..007e5b0 100644
> --- a/tests/fsck-tests.sh
> +++ b/tests/fsck-tests.sh
> @@ -22,16 +22,38 @@ run_check()
>   	"$@" >> $RESULT 2>&1 || _fail "failed: $@"
>   }
>   
> +# For complicated fsck repair cases: even when the repair succeeds,
> +# it may still report problems before or after repairing, since the
> +# repair needs several loops to fix all the problems, but the report
> +# checks run before all the repair loops are done
> +run_check_no_fail()
> +{
> +	echo "############### $@" >> $RESULT 2>&1
> +	"$@" >> $RESULT 2>&1
> +}
> +
>   rm -f $RESULT
>   
>   # test rely on corrupting blocks tool
>   run_check make btrfs-corrupt-block
>   
> +# Supported test image formats:
> +# 1) btrfs-image dump(.img files)
>   # Some broken filesystem images are kept as .img files, created by the tool
> -# btrfs-image, and others are kept as .tar.xz files that contain raw filesystem
> +# btrfs-image
> +#
> +# 2) binary image dump only (only test.img in .tar.xz)
> +# Some are kept as .tar.xz files that contain raw filesystem
>   # image (the backing file of a loop device, as a sparse file). The reason for
>   # keeping some as tarballs of raw images is that for these cases btrfs-image
>   # isn't able to preserve all the (bad) filesystem structure for some reason.
> +# This provides great flexibility at the cost of large file size.
> +#
> +# 3) script-generated dump (generate_image.sh + needed files in .tar.gz)
> +# The image is generated by the generate_image.sh script along with the
> +# needed files in the tarball, normally a quite small btrfs-image dump.
> +# This combines the advantage of a relatively small btrfs-image dump
> +# with the flexibility to support corrupted images.
>   for i in $(find $here/tests/fsck-tests -name '*.img' -o -name '*.tar.xz' | sort)
>   do
>   	echo "     [TEST]    $(basename $i)"
> @@ -39,16 +61,24 @@ do
>   
>   	extension=${i#*.}
>   
> +	if [ -f generate_image.sh ]; then
> +		rm generate_image.sh
> +	fi
> +
>   	if [ $extension == "img" ]; then
>   		run_check $here/btrfs-image -r $i test.img
>   	else
>   		run_check tar xJf $i
>   	fi
>   
> +	if [ -x generate_image.sh ]; then
> +		./generate_image.sh
> +	fi
> +
>   	$here/btrfsck test.img >> $RESULT 2>&1
>   	[ $? -eq 0 ] && _fail "btrfsck should have detected corruption"
>   
> -	run_check $here/btrfsck --repair test.img
> +	run_check_no_fail $here/btrfsck --repair test.img
>   	run_check $here/btrfsck test.img
>   done
>   
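As an aside on the dispatch logic in the loop above: `${i#*.}` strips the shortest leading match of `*.`, i.e. everything up to and including the first dot, so a multi-part extension like `tar.xz` survives intact. A minimal sketch with hypothetical file names:

```shell
#!/bin/sh
# Demonstrates the extension dispatch used by fsck-tests.sh above
# (the file names are hypothetical examples).
for i in 001-simple.img 002-bad_leaf.tar.xz; do
	extension=${i#*.}	# "img" for the first name, "tar.xz" for the second
	if [ "$extension" = "img" ]; then
		echo "$i: restore with btrfs-image -r"
	else
		echo "$i: unpack with tar xJf"
	fi
done
```

Note that `${i#*.}` cuts at the first dot anywhere in the string, so the dispatch implicitly assumes the directory part of `$here` contains no dots.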

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana Dec. 15, 2014, 9 a.m. UTC | #2
On Mon, Dec 15, 2014 at 3:54 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> diff --git a/tests/fsck-tests.sh b/tests/fsck-tests.sh
> index 8987d04..007e5b0 100644
> --- a/tests/fsck-tests.sh
> +++ b/tests/fsck-tests.sh
> @@ -22,16 +22,38 @@ run_check()
>         "$@" >> $RESULT 2>&1 || _fail "failed: $@"
>  }
>
> +# For complicated fsck repair cases: even when the repair succeeds,
> +# it may still report problems before or after repairing, since the
> +# repair needs several loops to fix all the problems, but the report
> +# checks run before all the repair loops are done
> +run_check_no_fail()
> +{
> +       echo "############### $@" >> $RESULT 2>&1
> +       "$@" >> $RESULT 2>&1
> +}

I'm confused by this function, why it's needed, and the respective comment.
I can interpret it as either:

1) The several loops means fsck --repair does multiple passes
internally to fix some issues?
 If this is the case, we (user or script) only need to call fsck
--repair once, which should exit with status 0 if it was able to fix
all the issues, right? If so, then we should check that fsck --repair
exits with status 0, removing the need for this new function.

2) The several loops means a user or script must call fsck --repair
multiple times to fix all the issues? If this is the case, then you're
only calling this function once, for a single fsck --repair, in the
code below, which confuses me and makes this new function redundant
too.

Thanks

Filipe Manana Dec. 15, 2014, 9:36 a.m. UTC | #3
On Mon, Dec 15, 2014 at 3:54 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>         $here/btrfsck test.img >> $RESULT 2>&1
>         [ $? -eq 0 ] && _fail "btrfsck should have detected corruption"
>
> -       run_check $here/btrfsck --repair test.img
> +       run_check_no_fail $here/btrfsck --repair test.img
>         run_check $here/btrfsck test.img
>  done

So another thing I would like to see is doing a more comprehensive
verification that the repair code worked as expected. Currently we
only check that a readonly fsck, after running fsck --repair, returns
0.

For the improvements you've been doing, it's equally important to
verify that --repair recovered the inodes, links, etc to the
lost+found directory (or whatever is the directory's name).

So perhaps adding a verify.sh script to the tarball for example?

Thanks

Qu Wenruo Dec. 15, 2014, 9:40 a.m. UTC | #4
-------- Original Message --------
Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + 
corrupt script fsck test case.
From: Filipe David Manana <fdmanana@gmail.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2014-12-15 17:00
> On Mon, Dec 15, 2014 at 3:54 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> +# For complicated fsck repair cases: even when the repair succeeds,
>> +# it may still report problems before or after repairing, since the
>> +# repair needs several loops to fix all the problems, but the report
>> +# checks run before all the repair loops are done
>> +run_check_no_fail()
>> +{
>> +       echo "############### $@" >> $RESULT 2>&1
>> +       "$@" >> $RESULT 2>&1
>> +}
> I'm confused by this function, why it's needed, and the respective comment.
> I can interpret it as either:
>
> 1) The several loops means fsck --repair does multiple passes
> internally to fix some issues?
>   If this is the case, we (user or script) only need to call fsck
> --repair once, which should exit with status 0 if it was able to fix
> all the issues, right? If so, then we should check that fsck --repair
> exits with status 0, removing the need for this new function.
Sorry for the poor explanation.

The problem is that there are some checks run before we do the repair,
and those check results are bad, so btrfsck thinks there is an error
even though it will be repaired later.

So the result is that, especially in the corrupted-leaf case, btrfsck
--repair will fix all the problems but still return 1, while the next
btrfsck without --repair returns 0.

I think there are still a lot of other cases causing things like this
(multiple different errors combined together), so just discard the
return value of btrfsck --repair and focus on the return value of the
second btrfsck.

Thanks,
Qu
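The discarded-exit-status pattern described above can be illustrated with stub commands (hypothetical stand-ins for btrfsck; the real harness runs the actual binaries):

```shell
#!/bin/sh
# Stub model of the check / repair / re-check sequence discussed above.
# fake_fsck fails while a "corrupted" marker file exists; fake_repair
# actually fixes the corruption but still exits non-zero, like
# btrfsck --repair in the corrupted-leaf case described in this thread.
fake_fsck() { [ ! -e corrupted ]; }
fake_repair() { rm -f corrupted; return 1; }

touch corrupted
if fake_fsck; then echo "FAIL: corruption not detected"; fi
fake_repair || true	# run_check_no_fail: ignore the repair exit status
if fake_fsck; then echo "clean after repair"; fi
```

Only the final read-only check decides pass/fail, which is exactly what the second btrfsck invocation in the test loop does.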
Filipe Manana Dec. 15, 2014, 9:43 a.m. UTC | #5
On Mon, Dec 15, 2014 at 9:40 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> Sorry for the poor explanation.
>
> The problem is that there are some checks run before we do the repair,
> and those check results are bad, so btrfsck thinks there is an error
> even though it will be repaired later.
>
> So the result is that, especially in the corrupted-leaf case, btrfsck
> --repair will fix all the problems but still return 1, while the next
> btrfsck without --repair returns 0.

That seems wrong to me. If --repair was able to fix all the problems,
it should exit with status 0.
A script running fsck --repair would otherwise incorrectly assume that
--repair failed.

Filipe Manana Dec. 15, 2014, 10:13 a.m. UTC | #6
On Mon, Dec 15, 2014 at 9:36 AM, Filipe David Manana <fdmanana@gmail.com> wrote:
> So another thing I would like to see is doing a more comprehensive
> verification that the repair code worked as expected. Currently we
> only check that a readonly fsck, after running fsck --repair, returns
> 0.
>
> For the improvements you've been doing, it's equally important to
> verify that --repair recovered the inodes, links, etc to the
> lost+found directory (or whatever is the directory's name).
>
> So perhaps adding a verify.sh script to the tarball for example?

Or, I forgot to mention this before: it might be better to do such
verification/tests in xfstests, since we can create the fs and use the
new btrfs-progs programs to corrupt leaves/nodes. xfstests has a lot of
infrastructure already and is probably run by a lot more people
(compared to the fsck tests of btrfs-progs).

David Sterba Dec. 15, 2014, 5:35 p.m. UTC | #7
On Mon, Dec 15, 2014 at 09:36:51AM +0000, Filipe David Manana wrote:
> So another thing I would like to see is doing a more comprehensive
> verification that the repair code worked as expected. Currently we
> only check that a readonly fsck, after running fsck --repair, returns
> 0.
> 
> For the improvements you've been doing, it's equally important to
> verify that --repair recovered the inodes, links, etc to the
> lost+found directory (or whatever is the directory's name).
> 
> So perhaps adding a verify.sh script to the tarball for example?

A verifier script would be good, but I'd rather not put it into the
tarball. We might want to edit it, do cleanups, etc., and that would
require regenerating the image each time; the changes would also be
hard to review.

We can use the base image name and add -verify.sh suffix instead, eg.
007-bad_root_items_fs_skinny.tar.xz and
007-bad_root_items_fs_skinny-verify.sh
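The side-car naming scheme suggested above could be wired into the test loop along these lines (a sketch; the derivation is plain shell parameter expansion, and the image name is a hypothetical example):

```shell
#!/bin/sh
# Derive the side-car verifier name from a test image name, per the
# suggested "-verify.sh" convention.
img=007-bad_root_items_fs_skinny.tar.xz
verify="${img%.tar.xz}-verify.sh"	# strip .tar.xz, append -verify.sh
echo "$verify"
# A harness would then run it only if present and executable:
#   [ -x "$verify" ] && "./$verify" test.img
```

This keeps the verifier editable and reviewable in git without regenerating the binary image each time.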


David Sterba Dec. 15, 2014, 6:19 p.m. UTC | #8
On Mon, Dec 15, 2014 at 10:13:45AM +0000, Filipe David Manana wrote:
> >
> > So another thing I would like to see is doing a more comprehensive
> > verification that the repair code worked as expected. Currently we
> > only check that a readonly fsck, after running fsck --repair, returns
> > 0.
> >
> > For the improvements you've been doing, it's equally important to
> > verify that --repair recovered the inodes, links, etc to the
> > lost+found directory (or whatever is the directory's name).
> >
> > So perhaps adding a verify.sh script to the tarball for example?
> 
> Or, I forgot to mention this before: it might be better to do such
> verification/tests in xfstests, since we can create the fs and use the
> new btrfs-progs programs to corrupt leaves/nodes. xfstests has a lot of
> infrastructure already and is probably run by a lot more people
> (compared to the fsck tests of btrfs-progs).

I'm thinking about the best way to integrate that, but it seems there
will always be some level of code or infrastructure duplication (or
other hassle).

btrfs-corrupt-block is not installed by default (make install), and it's
not the type of utility I'd consider for default installations. The
tests would be skipped in the absence of the utility, so there will be
test environments where "install xfstests, install btrfs-progs" will not
add the desired test coverage. Solvable by packaging the extra progs.

Adding corrupt-block into xfsprogs is infeasible (IMO too much code from
btrfs-progs would have to be added).

I don't know how much infrastructure code we'd have to either write or
copy from fstests, but I think it would not be that much. Ideally we
could write the tests within btrfs-progs and then submit them to fstests
once they're considered reliable. If we keep the same "syntax" of the
tests, provide stubs where applicable, the code duplication in test
itself would be zero. We'd only have to write the stubs in btrfs-progs
and probably extend fstests to provide helpers for preparing/unpacking
the images.

Also, collecting the crafted binary images may bloat the git repo
nicely, even if we use xz.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Dec. 16, 2014, 12:58 a.m. UTC | #9
-------- Original Message --------
Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + 
corrupt script fsck test case.
From: David Sterba <dsterba@suse.cz>
To: Filipe David Manana <fdmanana@gmail.com>
Date: 2014-12-16 01:35
> On Mon, Dec 15, 2014 at 09:36:51AM +0000, Filipe David Manana wrote:
>> So another thing I would like to see is doing a more comprehensive
>> verification that the repair code worked as expected. Currently we
>> only check that a readonly fsck, after running fsck --repair, returns
>> 0.
>>
>> For the improvements you've been doing, it's equally important to
>> verify that --repair recovered the inodes, links, etc to the
>> lost+found directory (or whatever is the directory's name).
>>
>> So perhaps adding a verify.sh script to the tarball for example?
> A verifier script would be good, but I'd rather not put it into the
> tarball. We might want to edit it, do cleanups etc, this would require
> to regenerate the image each time and the changes would be hard to
> review.
>
> We can use the base image name and add -verify.sh suffix instead, eg.
> 007-bad_root_items_fs_skinny.tar.xz and
> 007-bad_root_items_fs_skinny-verify.sh
>
>
I'd like to add a verify script too, especially if it is kept out of the
tarball.
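That naming convention could be wired into the test loop with a small helper. A sketch only, with hypothetical paths and variable names:

```shell
#!/bin/sh
# Hypothetical helper: map an image tarball to the name of its optional
# out-of-tarball verify script (007-foo.tar.xz -> 007-foo-verify.sh).
verify_script_for()
{
	echo "${1%.tar.xz}-verify.sh"
}

# In fsck-tests.sh the loop could then do something like (untested sketch;
# $TEST_DEV and _fail are assumed from the existing harness):
#   vs=$(verify_script_for "$image")
#   if [ -x "$vs" ]; then
#       "$vs" "$TEST_DEV" || _fail "verify failed for $image"
#   fi
verify_script_for "007-bad_root_items_fs_skinny.tar.xz"
# prints 007-bad_root_items_fs_skinny-verify.sh
```

Keeping the script next to the tarball, rather than inside it, means edits to the verifier show up as normal reviewable diffs.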

But for the leaf-corruption case, it seems a little overkill to me.

1) The object of leaf-corruption recovery is not to salvage data.
Although most of the patches try their best to salvage as much data as
possible, from the inode number to the file type and even the extent data,
the patchset's main objective is to make the btrfs metadata consistent.
The data recovery is just an optional addition.
(Originally, it was designed to delete every inode whose metadata was lost
in a corrupted leaf.)
So it is the second btrfsck's return value, not the contents of
lost+found, that matters.

2) The recovery is *lossy*; a verify script is better suited to *lossless*
recovery.
Leaf-corruption repair is based on btree recovery, which introduces data
loss (at least one leaf), so we can't guarantee anything.
And in some cases, repair_inode_backref() will even repair a backref
before the nlink repair, which may introduce some randomness
(if an inode_item in a leaf is not corrupted, a backref may be repaired
without moving it to the lost+found dir).
So for *lossy* repair, I prefer not to add a verify script.

I generally agree with adding verify script support, but only for the
lossless recovery case.

Thanks,
Qu
Qu Wenruo Dec. 16, 2014, 1 a.m. UTC | #10
-------- Original Message --------
Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + 
corrupt script fsck test case.
From: Filipe David Manana <fdmanana@gmail.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2014-12-15 17:43
> On Mon, Dec 15, 2014 at 9:40 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> -------- Original Message --------
>> Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt
>> script fsck test case.
>> From: Filipe David Manana <fdmanana@gmail.com>
>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Date: 2014-12-15 17:00
>>> On Mon, Dec 15, 2014 at 3:54 AM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> wrote:
>>>> Although btrfsck test case support pure image dump(tar.xz), it is still
>>>> too large for some images, e.g, a small 64M image with about 3 levels
>>>> (level 0~2) metadata will produce about 2.6M after xz zip, which is too
>>>> large for a single binary commit.
>>>>
>>>> However btrfs-image -c9 will works much finer, the above image with
>>>> btrfs-image dump will only be less than 200K, which is quite reasonable.
>>>>
>>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>> ---
>>>>    tests/fsck-tests.sh | 34 ++++++++++++++++++++++++++++++++--
>>>>    1 file changed, 32 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tests/fsck-tests.sh b/tests/fsck-tests.sh
>>>> index 8987d04..007e5b0 100644
>>>> --- a/tests/fsck-tests.sh
>>>> +++ b/tests/fsck-tests.sh
>>>> @@ -22,16 +22,38 @@ run_check()
>>>>           "$@" >> $RESULT 2>&1 || _fail "failed: $@"
>>>>    }
>>>>
>>>> +# For complicated fsck repair case,
>>>> +# where even repairing is OK, it may still report problem before or
>>>> after
>>>> +# reparing since the repair needs several loops to repair all the
>>>> problems
>>>> +# but report checks it before all repair loops done
>>>> +run_check_no_fail()
>>>> +{
>>>> +       echo "############### $@" >> $RESULT 2>&1
>>>> +       "$@" >> $RESULT 2>&1
>>>> +}
>>> I'm confused with this function, why it's needed and the respective
>>> comment.
>>> So I can interpret it as either:
>>>
>>> 1) The several loops means fsck --repair does multiple passages
>>> internally to fix some issues?
>>>    If this is the case, we (user or script) only need to call fsck
>>> --repair once, which should exit with status 0 if it was able to fix
>>> all the issues, right? If so, then we should check that fsck --repair
>>> exits with status 0, removing the need for this new function.
>> Sorry for the poor explanation.
>>
>> The problem is, there are some checks run before we do the repair, and
>> those checks report errors, so
>> btrfsck thinks there is an error even though it will be repaired later.
>>
>> So the result is, especially in the corrupted-leaf case, btrfsck
>> --repair will fix all the problems but
>> still return 1, and the next btrfsck without --repair will return 0.
> That seems wrong to me. If --repair was able to fix all problems it
> should exit with status 0.
> If a script is running fsck --repair it would incorrectly assume
> --repair failed.
That's right, I'll look into it and try to fix the return value issue
before I send the v2 test case.
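Once the exit status is fixed, the test case should be able to drop run_check_no_fail() entirely and assert success on both passes. A minimal sketch, mirroring run_check() from tests/fsck-tests.sh (the btrfsck invocations are commented out so the fragment stays self-contained):

```shell
#!/bin/sh
RESULT=fsck-tests-results.txt

# Same shape as run_check() in tests/fsck-tests.sh: log the command and
# fail hard on any non-zero exit status.
run_check()
{
	echo "############### $*" >> "$RESULT" 2>&1
	"$@" >> "$RESULT" 2>&1 || { echo "failed: $*" >&2; exit 1; }
}

# With correct return values, both invocations must exit 0:
# run_check "$TOP/btrfsck" --repair "$TEST_DEV"
# run_check "$TOP/btrfsck" "$TEST_DEV"
run_check true   # stand-in command so the sketch runs as-is
```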

Thanks,
Qu
Qu Wenruo Dec. 16, 2014, 1:35 a.m. UTC | #11
-------- Original Message --------
Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + 
corrupt script fsck test case.
From: David Sterba <dsterba@suse.cz>
To: Filipe David Manana <fdmanana@gmail.com>
Date: 2014?12?16? 02:19
> On Mon, Dec 15, 2014 at 10:13:45AM +0000, Filipe David Manana wrote:
>>> So another thing I would like to see is doing a more comprehensive
>>> verification that the repair code worked as expected. Currently we
>>> only check that a readonly fsck, after running fsck --repair, returns
>>> 0.
>>>
>>> For the improvements you've been doing, it's equally important to
>>> verify that --repair recovered the inodes, links, etc to the
>>> lost+found directory (or whatever is the directory's name).
>>>
>>> So perhaps adding a verify.sh script to the tarball for example?
>> Or, forgot before, it might be better to do such verification/test in
>> xfstests since we can create the fs and use the new btrfs-progs
>> programs to corrupt leafs/nodes. xfstests has a lot of infrastructure
>> already and probably run by a lot more people (compared to the fsck
>> tests of btrfs-progs).
> I'm thinking about the best way how to integrate that, but it seems that
> there will be always some level of code or infrastructure duplication
> (or other hassle).
>
> btrfs-corrupt-block is not installed by default (make install) and it's
> not a type of utility I'd consider for default installations. The tests
> would be skipped in absence of the utility, so there will be test
> environments where "install xfstests, install btrfspprogs" will not add
> the desired test coverage. Solvable by packaging the extra progs.
>
> Adding corrupt-block into xfsprogs is infeasible (IMO too much code from
> btrfs-progs to be added).
>
> I don't know how much infrastructure code we'd have to either write or
> copy from fstests, but I think it would not be that much. Ideally we
> could write the tests within btrfs-progs and then submit them to fstests
> once they're considered reliable. If we keep the same "syntax" of the
> tests, provide stubs where applicable, the code duplication in test
> itself would be zero. We'd only have to write the stubs in btrfs-progs
> and probably extend fstests to provide helpers for preparing/unpacking
> the images.
In my wildest idea, if we had a good enough btrfs debugger (maybe even
stronger than debugfs), which could do almost everything from reading a
key/item to corrupting a given structure, then we could resolve all of
this. No binary images would be needed, since both the corruption and the
verification could be done with it.
(OK, it's just a daydream.)

But IMHO, isn't xfstests designed mainly to detect kernel defects?
I don't see any fsck tool test cases in it.

Thanks,
Qu
>
> Also, collecting the crafted binary images may bloat the git repo
> nicely, even if we use xz.
Filipe Manana Dec. 16, 2014, 1:55 p.m. UTC | #12
On Tue, Dec 16, 2014 at 12:58 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
> -------- Original Message --------
> Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt
> script fsck test case.
> From: David Sterba <dsterba@suse.cz>
> To: Filipe David Manana <fdmanana@gmail.com>
> Date: 2014-12-16 01:35
>>
>> On Mon, Dec 15, 2014 at 09:36:51AM +0000, Filipe David Manana wrote:
>>>
>>> So another thing I would like to see is doing a more comprehensive
>>> verification that the repair code worked as expected. Currently we
>>> only check that a readonly fsck, after running fsck --repair, returns
>>> 0.
>>>
>>> For the improvements you've been doing, it's equally important to
>>> verify that --repair recovered the inodes, links, etc to the
>>> lost+found directory (or whatever is the directory's name).
>>>
>>> So perhaps adding a verify.sh script to the tarball for example?
>>
>> A verifier script would be good, but I'd rather not put it into the
>> tarball. We might want to edit it, do cleanups etc, this would require
>> to regenerate the image each time and the changes would be hard to
>> review.
>>
>> We can use the base image name and add -verify.sh suffix instead, eg.
>> 007-bad_root_items_fs_skinny.tar.xz and
>> 007-bad_root_items_fs_skinny-verify.sh
>>
>>
> I'd like to add verify script too, especially when it is put out of the
> tarball.
>
> But to the leaf-corruption case, it seems a little overkilled for me.
>
> 1) The object of leaf-corrupt recover is not to salvage data.
> Although most of the patches are trying its best to salvage as much data as
> possible ,
> from ino to file type or even later extent data, but in fact, the patchset's
> main object is to make the metadata
> of the btrfs consistent. The data recovery is just a optional addition.
> (Original, it's designed to delete every inode whose metadata is lost in a
> corrupted leaf)
> So the second btrfsck's return value instead of the contents in lost+found
> is the important.
>
> 2) The recovery is *lossy*, verify would better be called on *lossless*
> recovery
> Leaf-corruption is based on the btree recovery, which will introduce data
> loss(at least a leaf),
> so we can't ensure anything.
> And in some case, repair_inode_backref() will even repair backref before
> nlink repair,
> which may introduce some randomness
> (if a inode_item is not corrupted in a leaf, then a backref maybe repaired
> without move it to lost+found dir)
> So for *lossy* repair, I prefer not to add verify script.

From the moment we have code that accomplishes something, it doesn't
matter if it was part of a primary or secondary goal of a patch, nor
if it does full or partial recovery. If we have code that does
something (intentionally) we should always try to have tests for it -
if we don't care about what the code does exactly, then we probably
shouldn't have it in the first place.
Otherwise the code will break more easily with future changes. Having
manual tests done on each release (or ideally after each btrfs-progs
or fsck change, at least) is error prone...

>
> I generally agree to add verify script support, but only for lossless
> recovery case.
>
> Thanks,
> Qu
Filipe Manana Dec. 16, 2014, 2:08 p.m. UTC | #13
On Tue, Dec 16, 2014 at 1:35 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
> -------- Original Message --------
> Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt
> script fsck test case.
> From: David Sterba <dsterba@suse.cz>
> To: Filipe David Manana <fdmanana@gmail.com>
> Date: 2014-12-16 02:19
>>
>> On Mon, Dec 15, 2014 at 10:13:45AM +0000, Filipe David Manana wrote:
>>>>
>>>> So another thing I would like to see is doing a more comprehensive
>>>> verification that the repair code worked as expected. Currently we
>>>> only check that a readonly fsck, after running fsck --repair, returns
>>>> 0.
>>>>
>>>> For the improvements you've been doing, it's equally important to
>>>> verify that --repair recovered the inodes, links, etc to the
>>>> lost+found directory (or whatever is the directory's name).
>>>>
>>>> So perhaps adding a verify.sh script to the tarball for example?
>>>
>>> Or, forgot before, it might be better to do such verification/test in
>>> xfstests since we can create the fs and use the new btrfs-progs
>>> programs to corrupt leafs/nodes. xfstests has a lot of infrastructure
>>> already and probably run by a lot more people (compared to the fsck
>>> tests of btrfs-progs).
>>
>> I'm thinking about the best way how to integrate that, but it seems that
>> there will be always some level of code or infrastructure duplication
>> (or other hassle).
>>
>> btrfs-corrupt-block is not installed by default (make install) and it's
>> not a type of utility I'd consider for default installations. The tests
>> would be skipped in absence of the utility, so there will be test
>> environments where "install xfstests, install btrfspprogs" will not add
>> the desired test coverage. Solvable by packaging the extra progs.
>>
>> Adding corrupt-block into xfsprogs is infeasible (IMO too much code from
>> btrfs-progs to be added).
>>
>> I don't know how much infrastructure code we'd have to either write or
>> copy from fstests, but I think it would not be that much. Ideally we
>> could write the tests within btrfs-progs and then submit them to fstests
>> once they're considered reliable. If we keep the same "syntax" of the
>> tests, provide stubs where applicable, the code duplication in test
>> itself would be zero. We'd only have to write the stubs in btrfs-progs
>> and probably extend fstests to provide helpers for preparing/unpacking
>> the images.
>
> In my wildest idea, if we have a good enough btrfs debugger(maybe even
> stronger than debugfs), which can
> do almost everything from read key/item to corrupt given structure, then we
> can resolve them all.
> No binary image since corruption can be done by it and verify can also done
> by it.
> (OK, it's just a daydream)
>
> But IMHO, isn't xfstests designed to mainly detect kernel defeats?
> I don't see any fsck tool test case in it.

I don't think xfstests is specific to test the kernel implementation
of filesystems. I believe it includes user space code too, but I might
be wrong so I'm CCing fstests and Dave to get an authoritative answer.

And I don't see a big problem with btrfs-corrupt not being built by
default when running "make" on btrfs-progs. We can make the xfstest
not run and print an informative message if the btrfs-corrupt program
isn't available - several xfstests do this, requiring some
executable which is in neither the xfstests nor the xfsprogs
repositories - for example generic/241, which requires 'dbench', and
generic/299, which requires 'fio'.

Sure, it would be ideal not to require people to go into btrfs-progs,
type "make btrfs-corrupt" and put the binary in a $PATH directory. But
that's certainly much better than not having automated tests, and I think
even with that extra "work" developers at least would be willing to do
it on their test machines. I also have a slight preference for getting
all tests in the same place (xfstests) rather than in multiple
repositories (btrfs-progs, xfstests).

thanks

>
> Thanks,
> Qu
>
>>
>> Also, collecting the crafted binary images may bloat the git repo
>> nicely, even if we use xz.
Qu Wenruo Dec. 17, 2014, 12:49 a.m. UTC | #14
-------- Original Message --------
Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + 
corrupt script fsck test case.
From: Filipe David Manana <fdmanana@gmail.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2014-12-16 21:55
> On Tue, Dec 16, 2014 at 12:58 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> -------- Original Message --------
>> Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt
>> script fsck test case.
>> From: David Sterba <dsterba@suse.cz>
>> To: Filipe David Manana <fdmanana@gmail.com>
>> Date: 2014-12-16 01:35
>>> On Mon, Dec 15, 2014 at 09:36:51AM +0000, Filipe David Manana wrote:
>>>> So another thing I would like to see is doing a more comprehensive
>>>> verification that the repair code worked as expected. Currently we
>>>> only check that a readonly fsck, after running fsck --repair, returns
>>>> 0.
>>>>
>>>> For the improvements you've been doing, it's equally important to
>>>> verify that --repair recovered the inodes, links, etc to the
>>>> lost+found directory (or whatever is the directory's name).
>>>>
>>>> So perhaps adding a verify.sh script to the tarball for example?
>>> A verifier script would be good, but I'd rather not put it into the
>>> tarball. We might want to edit it, do cleanups etc, this would require
>>> to regenerate the image each time and the changes would be hard to
>>> review.
>>>
>>> We can use the base image name and add -verify.sh suffix instead, eg.
>>> 007-bad_root_items_fs_skinny.tar.xz and
>>> 007-bad_root_items_fs_skinny-verify.sh
>>>
>>>
>> I'd like to add verify script too, especially when it is put out of the
>> tarball.
>>
>> But to the leaf-corruption case, it seems a little overkilled for me.
>>
>> 1) The object of leaf-corrupt recover is not to salvage data.
>> Although most of the patches are trying its best to salvage as much data as
>> possible ,
>> from ino to file type or even later extent data, but in fact, the patchset's
>> main object is to make the metadata
>> of the btrfs consistent. The data recovery is just a optional addition.
>> (Original, it's designed to delete every inode whose metadata is lost in a
>> corrupted leaf)
>> So the second btrfsck's return value instead of the contents in lost+found
>> is the important.
>>
>> 2) The recovery is *lossy*, verify would better be called on *lossless*
>> recovery
>> Leaf-corruption is based on the btree recovery, which will introduce data
>> loss(at least a leaf),
>> so we can't ensure anything.
>> And in some case, repair_inode_backref() will even repair backref before
>> nlink repair,
>> which may introduce some randomness
>> (if a inode_item is not corrupted in a leaf, then a backref maybe repaired
>> without move it to lost+found dir)
>> So for *lossy* repair, I prefer not to add verify script.
>  From the moment we have code that accomplishes something, it doesn't
> matter if it was part of a primary or secondary goal of a patch, nor
> if it does full or partial recovery. If we have code that does
> something (intentionally) we should always try to have tests for it -
> if we don't care about what the code does exactly, then we probably
> shouldn't have it in the first place.
First, please let me clarify what you actually mean by the verify script.
Which case in the leaf-corruption recovery do you mean?

1) A generic script verifying that all or part of the inodes in the
original image were recovered.
If you mean this *GENERIC* one, it is impractical for leaf-corruption
recovery and any other
lossy recovery.

2) A specific script only for one specific damaged image.
This one is suitable for the lossy recovery case, but it would be a
restrictive verify script: it only states
what the result should be after recovery for that specific image. And it
may serve only as a reminder
for new patches modifying the existing recovery code.

If you mean 1), IMHO it's not practical.
If you mean 2), I am OK with implementing the verify script, but I doubt
its necessity.
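For case 2), a per-image verify script could stay tiny: it hardcodes the expected post-repair state of that one image. A sketch, where the lost+found layout and the expected entry count are purely illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical 007-foo-verify.sh: encodes the exact expected outcome of
# repairing this one specific image.
expected=2   # entries this image is known to leave in lost+found

check_lost_found()
{
	# Count directory entries (including dotfiles) and compare with
	# the hardcoded expectation for this image.
	actual=$(ls -A "$1" 2>/dev/null | wc -l)
	[ "$actual" -eq "$expected" ]
}

# The harness would mount the repaired image and call:
#   check_lost_found "$TEST_MNT/lost+found" || exit 1
```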

Thanks,
Qu
> Otherwise code will break more easily with future changes. Having
> manual tests done on each release (or ideally after each btrfs-progs
> or fsck at least) is error prone...
>
>> I generally agree to add verify script support, but only for lossless
>> recovery case.
>>
>> Thanks,
>> Qu
>
>

David Sterba Dec. 18, 2014, 5:16 p.m. UTC | #15
On Mon, Dec 15, 2014 at 02:09:02PM +0800, Qu Wenruo wrote:
> It seems the second patch, which about 225K including a btrfs-image 
> dump, can't pass the ml's size limit.

I've cherry-picked the commit with the test image.
Dave Chinner Dec. 24, 2014, 12:03 a.m. UTC | #16
[ Sorry to take some time to get to this, it got caught by a spam
filter and I only just noticed. ]

On Tue, Dec 16, 2014 at 02:08:53PM +0000, Filipe David Manana wrote:
> On Tue, Dec 16, 2014 at 1:35 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> >
> > -------- Original Message --------
> > Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt
> > script fsck test case.
> > From: David Sterba <dsterba@suse.cz>
> > To: Filipe David Manana <fdmanana@gmail.com>
> > Date: 2014-12-16 02:19
> >>
> >> On Mon, Dec 15, 2014 at 10:13:45AM +0000, Filipe David Manana wrote:
> >>>>
> >>>> So another thing I would like to see is doing a more comprehensive
> >>>> verification that the repair code worked as expected. Currently we
> >>>> only check that a readonly fsck, after running fsck --repair, returns
> >>>> 0.
> >>>>
> >>>> For the improvements you've been doing, it's equally important to
> >>>> verify that --repair recovered the inodes, links, etc to the
> >>>> lost+found directory (or whatever is the directory's name).
> >>>>
> >>>> So perhaps adding a verify.sh script to the tarball for example?
> >>>
> >>> Or, forgot before, it might be better to do such verification/test in
> >>> xfstests since we can create the fs and use the new btrfs-progs
> >>> programs to corrupt leafs/nodes. xfstests has a lot of infrastructure
> >>> already and probably run by a lot more people (compared to the fsck
> >>> tests of btrfs-progs).
> >>
> >> I'm thinking about the best way how to integrate that, but it seems that
> >> there will be always some level of code or infrastructure duplication
> >> (or other hassle).
> >>
> >> btrfs-corrupt-block is not installed by default (make install) and it's
> >> not a type of utility I'd consider for default installations. The tests
> >> would be skipped in absence of the utility, so there will be test
> >> environments where "install xfstests, install btrfspprogs" will not add
> >> the desired test coverage. Solvable by packaging the extra progs.
> >>
> >> Adding corrupt-block into xfsprogs is infeasible (IMO too much code from
> >> btrfs-progs to be added).
> >>
> >> I don't know how much infrastructure code we'd have to either write or
> >> copy from fstests, but I think it would not be that much. Ideally we
> >> could write the tests within btrfs-progs and then submit them to fstests
> >> once they're considered reliable. If we keep the same "syntax" of the
> >> tests, provide stubs where applicable, the code duplication in test
> >> itself would be zero. We'd only have to write the stubs in btrfs-progs
> >> and probably extend fstests to provide helpers for preparing/unpacking
> >> the images.
> >
> > In my wildest idea, if we have a good enough btrfs debugger(maybe even
> > stronger than debugfs), which can
> > do almost everything from read key/item to corrupt given structure, then we
> > can resolve them all.
> > No binary image since corruption can be done by it and verify can also done
> > by it.
> > (OK, it's just a daydream)
> >
> > But IMHO, isn't xfstests designed to mainly detect kernel defeats?
> > I don't see any fsck tool test case in it.
> 
> I don't think xfstests is specific to test the kernel implementation
> of filesystems. I believe it includes user space code too, but I might
> be wrong so I'm CCing fstests and Dave to get an authoritative answer.

We use fstests to test everything we ship for XFS - kernel and
userspace. i.e. we have tests that corrupt filesystems with xfs_db
and then test that xfs_repair can fix them, and once fixed the
filesystem can be mounted and used by the kernel...

i.e. fstests is for testing both the kernel code and the utilities
used to manipulate filesystems.

> And I don't see a big problem with btrfs-corrupt not being built by
> default when running "make" on btrfs-progs. We can make the xfstest
> not run and print an informative message if the btrfs-corrupt program
> isn't available - several xfstests do this, they require some
> executable which isn't either in the xfstests nor xfsprogs
> repositories - for example generic/241 which requires 'dbench' and
> generic/299 which requires 'fio'.

_require_btrfs_corrupt_prog()

Just like we do with lots of other optional userspace tools that are
required for various tests to run.
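Such a helper, in the style of fstests' common/rc, might look like the sketch below. The helper name follows Dave's suggestion; the _notrun stub stands in for fstests' real skip primitive so the fragment is self-contained:

```shell
#!/bin/sh
# Stand-in for fstests' real _notrun, which records the skip reason and
# ends the test as "not run".
_notrun()
{
	echo "$*"
}

# Hypothetical requirement helper: skip the test when the optional
# btrfs-corrupt-block utility is not installed.
_require_btrfs_corrupt_prog()
{
	command -v btrfs-corrupt-block >/dev/null 2>&1 || \
		_notrun "btrfs-corrupt-block not found in \$PATH"
}

_require_btrfs_corrupt_prog
```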

> I also have a slight preference to get all
> tests in the same place (xfstests) rather than in multiple
> repositories (btrfs-progs, xfstests).

Definitely my preference as well.

Cheers,

Dave.
Qu Wenruo Dec. 24, 2014, 2:56 a.m. UTC | #17
-------- Original Message --------
Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + 
corrupt script fsck test case.
From: Dave Chinner <david@fromorbit.com>
To: Filipe David Manana <fdmanana@gmail.com>
Date: 2014-12-24 08:03
> [ Sorry to take some time to get to this, it got caught by a spam
> filter and I only just noticed. ]
>
> On Tue, Dec 16, 2014 at 02:08:53PM +0000, Filipe David Manana wrote:
>> On Tue, Dec 16, 2014 at 1:35 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>> -------- Original Message --------
>>> Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt
>>> script fsck test case.
>>> From: David Sterba <dsterba@suse.cz>
>>> To: Filipe David Manana <fdmanana@gmail.com>
>>> Date: 2014-12-16 02:19
>>>> On Mon, Dec 15, 2014 at 10:13:45AM +0000, Filipe David Manana wrote:
>>>>>> So another thing I would like to see is doing a more comprehensive
>>>>>> verification that the repair code worked as expected. Currently we
>>>>>> only check that a readonly fsck, after running fsck --repair, returns
>>>>>> 0.
>>>>>>
>>>>>> For the improvements you've been doing, it's equally important to
>>>>>> verify that --repair recovered the inodes, links, etc to the
>>>>>> lost+found directory (or whatever is the directory's name).
>>>>>>
>>>>>> So perhaps adding a verify.sh script to the tarball for example?
>>>>> Or, forgot before, it might be better to do such verification/test in
>>>>> xfstests since we can create the fs and use the new btrfs-progs
>>>>> programs to corrupt leafs/nodes. xfstests has a lot of infrastructure
>>>>> already and probably run by a lot more people (compared to the fsck
>>>>> tests of btrfs-progs).
>>>> I'm thinking about the best way how to integrate that, but it seems that
>>>> there will be always some level of code or infrastructure duplication
>>>> (or other hassle).
>>>>
>>>> btrfs-corrupt-block is not installed by default (make install) and it's
>>>> not a type of utility I'd consider for default installations. The tests
>>>> would be skipped in absence of the utility, so there will be test
>>>> environments where "install xfstests, install btrfspprogs" will not add
>>>> the desired test coverage. Solvable by packaging the extra progs.
>>>>
>>>> Adding corrupt-block into xfsprogs is infeasible (IMO too much code from
>>>> btrfs-progs to be added).
>>>>
>>>> I don't know how much infrastructure code we'd have to either write or
>>>> copy from fstests, but I think it would not be that much. Ideally we
>>>> could write the tests within btrfs-progs and then submit them to fstests
>>>> once they're considered reliable. If we keep the same "syntax" of the
>>>> tests, provide stubs where applicable, the code duplication in test
>>>> itself would be zero. We'd only have to write the stubs in btrfs-progs
>>>> and probably extend fstests to provide helpers for preparing/unpacking
>>>> the images.
>>> In my wildest idea, if we have a good enough btrfs debugger(maybe even
>>> stronger than debugfs), which can
>>> do almost everything from read key/item to corrupt given structure, then we
>>> can resolve them all.
>>> No binary image since corruption can be done by it and verify can also done
>>> by it.
>>> (OK, it's just a daydream)
>>>
>>> But IMHO, isn't xfstests designed to mainly detect kernel defeats?
>>> I don't see any fsck tool test case in it.
>> I don't think xfstests is specific to test the kernel implementation
>> of filesystems. I believe it includes user space code too, but I might
>> be wrong so I'm CCing fstests and Dave to get an authoritative answer.
> We use fstests to test everything we ship for XFS - kernel and
> userspace. i.e. we have tests that corrupt filesystems with xfs_db
> and then test that xfs_repair can fix them, and once fixed the
> filesystem can be mounted and used by the kernel...
>
> i.e. fstests is for testing both the kernel code and the utilities
> used to manipulate filesystems.
That's great.

But what will happen if some btrfs cases need a binary image (still huge
even compressed) or a
btrfs-image dump (some existing dumps are already several MB)?
Will that be OK for fstests?

Or should we wait until btrfs-progs has a good debug tool like xfs_db
or debugfs and use
it to generate the corrupted image, as the xfs test cases do?

Thanks,
Qu

>
>> And I don't see a big problem with btrfs-corrupt not being built by
>> default when running "make" on btrfs-progs. We can make the xfstest
>> not run and print an informative message if the btrfs-corrupt program
>> isn't available - several xfstests do this, they require some
>> executable which isn't either in the xfstests nor xfsprogs
>> repositories - for example generic/241 which requires 'dbench' and
>> generic/299 which requires 'fio'.
> _require_btrfs_corrupt_prog()
>
> Just like we do with lots of other optional userspace tools that are
> required for various tests to run.
>
>> I also have a slight preference to get all
>> tests in the same place (xfstests) rather than in multiple
>> repositories (btrfs-progs, xfstests).
> Definitely my preference as well.
>
> Cheers,
>
> Dave.

Dave Chinner Dec. 24, 2014, 3:27 a.m. UTC | #18
On Wed, Dec 24, 2014 at 10:56:04AM +0800, Qu Wenruo wrote:
> 
> -------- Original Message --------
> Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image +
> corrupt script fsck test case.
> From: Dave Chinner <david@fromorbit.com>
> To: Filipe David Manana <fdmanana@gmail.com>
> Date: 2014-12-24 08:03
> >[ Sorry to take some time to get to this, it got caught by a spam
> >filter and I only just noticed. ]
> >
> >On Tue, Dec 16, 2014 at 02:08:53PM +0000, Filipe David Manana wrote:
> >>On Tue, Dec 16, 2014 at 1:35 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> >>>-------- Original Message --------
> >>>Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt
> >>>script fsck test case.
> >>>From: David Sterba <dsterba@suse.cz>
> >>>To: Filipe David Manana <fdmanana@gmail.com>
> >>>Date: 2014-12-16 02:19
> >>>>On Mon, Dec 15, 2014 at 10:13:45AM +0000, Filipe David Manana wrote:
> >>>>>>So another thing I would like to see is doing a more comprehensive
> >>>>>>verification that the repair code worked as expected. Currently we
> >>>>>>only check that a readonly fsck, after running fsck --repair, returns
> >>>>>>0.
> >>>>>>
> >>>>>>For the improvements you've been doing, it's equally important to
> >>>>>>verify that --repair recovered the inodes, links, etc to the
> >>>>>>lost+found directory (or whatever is the directory's name).
> >>>>>>
> >>>>>>So perhaps adding a verify.sh script to the tarball for example?
> >>>>>Or, forgot before, it might be better to do such verification/test in
> >>>>>xfstests since we can create the fs and use the new btrfs-progs
> >>>>>programs to corrupt leafs/nodes. xfstests has a lot of infrastructure
> >>>>>already and probably run by a lot more people (compared to the fsck
> >>>>>tests of btrfs-progs).
> >>>>I'm thinking about the best way how to integrate that, but it seems that
> >>>>there will be always some level of code or infrastructure duplication
> >>>>(or other hassle).
> >>>>
> >>>>btrfs-corrupt-block is not installed by default (make install) and it's
> >>>>not a type of utility I'd consider for default installations. The tests
> >>>>would be skipped in absence of the utility, so there will be test
> >>>>environments where "install xfstests, install btrfs-progs" will not add
> >>>>the desired test coverage. Solvable by packaging the extra progs.
> >>>>
> >>>>Adding corrupt-block into xfsprogs is infeasible (IMO too much code from
> >>>>btrfs-progs to be added).
> >>>>
> >>>>I don't know how much infrastructure code we'd have to either write or
> >>>>copy from fstests, but I think it would not be that much. Ideally we
> >>>>could write the tests within btrfs-progs and then submit them to fstests
> >>>>once they're considered reliable. If we keep the same "syntax" of the
> >>>>tests, provide stubs where applicable, the code duplication in test
> >>>>itself would be zero. We'd only have to write the stubs in btrfs-progs
> >>>>and probably extend fstests to provide helpers for preparing/unpacking
> >>>>the images.
> >>>In my wildest dreams, if we had a good enough btrfs debugger (maybe even
> >>>stronger than debugfs), which could do almost everything from reading a
> >>>key/item to corrupting a given structure, then we could resolve them all.
> >>>No binary images would be needed, since both the corruption and the
> >>>verification could be done by it.
> >>>(OK, it's just a daydream)
> >>>
> >>>But IMHO, isn't xfstests mainly designed to detect kernel defects?
> >>>I don't see any fsck tool test cases in it.
> >>I don't think xfstests is specific to test the kernel implementation
> >>of filesystems. I believe it includes user space code too, but I might
> >>be wrong so I'm CCing fstests and Dave to get an authoritative answer.
> >We use fstests to test everything we ship for XFS - kernel and
> >userspace. i.e. we have tests that corrupt filesystems with xfs_db
> >and then test that xfs_repair can fix them, and once fixed the
> >filesystem can be mounted and used by the kernel...
> >
> >i.e. fstests is for testing both the kernel code and the utilities
> >used to manipulate filesystems.
> That's great.
> 
> But what will happen if some btrfs cases need a binary image (still huge
> even compressed) or a btrfs-image dump (some existing dumps are already
> several MB)?
> Will that be OK for fstests?

No. Random filesystem images don't make a regression test. They are
just a different encoding of the game of whack-a-mole. Corruption
recovery tests work best with simple, obvious corruptions. That
makes it easy to verify correct detection and recovery behaviour
without any other complications.

e.g. overwriting the primary superblock with zeros is a simple test
that every filesystem recovery tool should be able to handle.
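As a concrete sketch of that kind of simple, repeatable corruption (the file name and sizes here are placeholders; a real test would run mkfs.btrfs on the image first): btrfs keeps its primary superblock at byte offset 65536, so zeroing it is a single dd invocation.

```shell
# Stand-in 1M image filled with 0xff bytes; a real test would mkfs it.
dd if=/dev/zero bs=1M count=1 2>/dev/null | tr '\0' '\377' > test.img
# Zero 4096 bytes at offset 64K, where btrfs keeps its primary superblock.
dd if=/dev/zero of=test.img bs=4096 seek=16 count=1 conv=notrunc 2>/dev/null
```

On a real (and sufficiently large) filesystem, btrfsck would then be expected to detect the damage and fall back to one of the superblock mirrors.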

> Or should we wait until btrfs-progs has a good debug tool like xfs_db
> or debugfs, and use it to generate corrupted images the way the xfs
> test cases do?

You don't need a tool like xfs_db to do repeatable structure
corruption - you can do this with dd by writing to known offsets.

xfs_db just makes that really easy: it has a filesystem format parser,
so finding the offset we want to write to is trivial....
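For comparison, the xfs_db route might look roughly like this. This is a hedged sketch: the image path is hypothetical, and the call is guarded so it is skipped wherever xfsprogs or the image is absent. The -x flag enables expert mode, which permits writes.

```shell
corrupt_xfs_sb() {
    # Skip quietly when xfs_db or the target image is unavailable.
    command -v xfs_db >/dev/null 2>&1 || return 0
    [ -f "$1" ] || return 0
    # "sb 0" addresses the primary superblock; "write magicnum 0"
    # clobbers its magic number: a simple, repeatable corruption.
    xfs_db -x -c "sb 0" -c "write magicnum 0" "$1"
}
corrupt_xfs_sb scratch.img   # "scratch.img" is a placeholder image
```

The parser does the offset arithmetic; the test script never hard-codes a byte position.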

Cheers,

Dave.

Patch

diff --git a/tests/fsck-tests.sh b/tests/fsck-tests.sh
index 8987d04..007e5b0 100644
--- a/tests/fsck-tests.sh
+++ b/tests/fsck-tests.sh
@@ -22,16 +22,38 @@  run_check()
 	"$@" >> $RESULT 2>&1 || _fail "failed: $@"
 }
 
+# For complicated fsck repair cases, even when the repair succeeds it may
+# still report problems before or after repairing, since the repair needs
+# several loops to fix all the problems, but the report checks run before
+# all the repair loops are done.
+run_check_no_fail()
+{
+	echo "############### $@" >> $RESULT 2>&1
+	"$@" >> $RESULT 2>&1
+}
+
 rm -f $RESULT
 
 # test rely on corrupting blocks tool
 run_check make btrfs-corrupt-block
 
+# Supported test image formats:
+# 1) btrfs-image dump (.img files)
 # Some broken filesystem images are kept as .img files, created by the tool
-# btrfs-image, and others are kept as .tar.xz files that contain raw filesystem
+# btrfs-image.
+#
+# 2) binary image dump (only test.img, in .tar.xz)
+# Some are kept as .tar.xz files that contain raw filesystem
 # image (the backing file of a loop device, as a sparse file). The reason for
 # keeping some as tarballs of raw images is that for these cases btrfs-image
 # isn't able to preserve all the (bad) filesystem structure for some reason.
+# This provides great flexibility at the cost of large file size.
+#
+# 3) script-generated dump (generate_image.sh + needed files in .tar.xz)
+# The image is generated by the generate_image.sh script along with the
+# needed files in the tarball, normally a quite small btrfs-image dump.
+# This one combines the advantage of a relatively small btrfs-image dump
+# with the flexibility to support corrupted images.
 for i in $(find $here/tests/fsck-tests -name '*.img' -o -name '*.tar.xz' | sort)
 do
 	echo "     [TEST]    $(basename $i)"
@@ -39,16 +61,24 @@  do
 
 	extension=${i#*.}
 
+	if [ -f generate_image.sh ]; then
+		rm generate_image.sh
+	fi
+
 	if [ $extension == "img" ]; then
 		run_check $here/btrfs-image -r $i test.img
 	else
 		run_check tar xJf $i
 	fi
 
+	if [ -x generate_image.sh ]; then
+		./generate_image.sh
+	fi
+
 	$here/btrfsck test.img >> $RESULT 2>&1
 	[ $? -eq 0 ] && _fail "btrfsck should have detected corruption"
 
-	run_check $here/btrfsck --repair test.img
+	run_check_no_fail $here/btrfsck --repair test.img
 	run_check $here/btrfsck test.img
 done
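To illustrate format 3 above: a generate_image.sh shipped inside a tarball might look roughly like the following. This is a sketch, not code from the patch; the dump name default_case.img and the corruption offset are illustrative, and the work is guarded so it degrades to a no-op wherever btrfs-image or the dump is missing.

```shell
#!/bin/sh
# Hypothetical generate_image.sh as extracted from a fsck-tests tarball.
generate_image() {
    # Nothing to do without the tool or the shipped metadata dump.
    command -v btrfs-image >/dev/null 2>&1 || return 0
    [ -f default_case.img ] || return 0
    # Restore the small compressed metadata dump into a raw image...
    btrfs-image -r default_case.img test.img || return 1
    # ...then apply a deterministic corruption at a known byte offset.
    dd if=/dev/zero of=test.img bs=1 count=8 seek=131072 conv=notrunc
}
generate_image
```

The harness then runs ./generate_image.sh after unpacking the tarball, as the second hunk of the patch shows.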