fstests: btrfs: Test send on heavily deduped file

Message ID 20160719024402.19324-1-quwenruo@cn.fujitsu.com (mailing list archive)
State Not Applicable

Commit Message

Qu Wenruo July 19, 2016, 2:44 a.m. UTC
For a fully deduped file, whose file extents all point to the same
extent, a btrfs backref walk can be very time consuming, long enough to
trigger a soft lockup.

Unfortunately, btrfs send is one of the callers of such a backref walk,
and it calls it inside an O(n) loop, making the total time complexity
O(n^3) or more.

Even worse, btrfs send allocates memory inside that loop, which triggers
OOM on systems with little memory (<4G).

This test case checks whether btrfs send causes these problems.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
To: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
To Filipe:
  For the soft lockup, I will try my best to figure out some method to
  avoid it (though the send will still be very time consuming).

  But for the OOM problem, would you mind disabling clone/reflink
  detection in btrfs send?

  In fact we should really avoid doing a full backref walk inside an O(n)
  loop (just like the previous fiemap ioctl test case), and avoid any full
  backref walk if possible.
  So I'm afraid that's the only solution so far.

Thanks,
Qu
---
 tests/btrfs/127     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/127.out |  3 ++
 tests/btrfs/group   |  1 +
 3 files changed, 93 insertions(+)
 create mode 100755 tests/btrfs/127
 create mode 100644 tests/btrfs/127.out

Comments

Eryu Guan July 19, 2016, 4:35 a.m. UTC | #1
On Tue, Jul 19, 2016 at 10:44:02AM +0800, Qu Wenruo wrote:
> For fully deduped file, whose file extents are all pointing to the same
> extent, btrfs backref walk can be very time consuming, long enough to
> trigger softlock.
> 
> Unfortunately, btrfs send is one of the caller of such backref walk
> under an O(n) loop, making the total time complexity to O(n^3) or more.
> 
> And even worse, btrfs send will allocate memory in such loop, to trigger
> OOM on system with small memory(<4G).
> 
> This test case will check if btrfs send will cause these problems.
> 
> Reporeted-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
> To: Filipe Manana <fdmanana@gmail.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
> To Filipe:
>   For the soft lockup, I will try my best to figure out some method to
>   avoid such lockup (but it will still be very time consuming though).
> 
>   But for the OOM problem, would you mind disabling clone/reflink
>   detection in btrfs send?
> 
>   In fact we should really avoid doing full backref walk inside an O(n)
>   loop (just like previous fiemap ioctl test case), and avoid any full
>   backref walk if possible.
>   So I'm afraid that's the only solution yet.
> 
> Thanks,
> Qu
> ---
>  tests/btrfs/127     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/127.out |  3 ++
>  tests/btrfs/group   |  1 +
>  3 files changed, 93 insertions(+)
>  create mode 100755 tests/btrfs/127
>  create mode 100644 tests/btrfs/127.out
> 
> diff --git a/tests/btrfs/127 b/tests/btrfs/127
> new file mode 100755
> index 0000000..a31a653
> --- /dev/null
> +++ b/tests/btrfs/127
> @@ -0,0 +1,89 @@
> +#! /bin/bash
> +# FS QA Test 127
> +#
> +# Check if btrfs send can handle large deduped file, whose file extents
> +# are all pointing to one extent.
> +# Such file structure will cause quite large pressure to any operation which
> +# iterates all backref of one extent.
> +# And unfortunately, btrfs send is one of these operations, and will cause
> +# softlock or OOM on systems with small memory(<4G).
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2016 Fujitsu. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/reflink
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +_require_scratch_reflink
> +
> +_scratch_mkfs > /dev/null 2>&1
> +_scratch_mount
> +
> +nr_extents=$((4096 * $LOAD_FACTOR))
> +
> +# Use 128K blocksize, the default value of both deduperemove or
> +# inband dedupe
> +blocksize=$((128 * 1024))
> +file=$SCRATCH_MNT/foobar
> +
> +# create the initial file, whose file extents are all point to one extent
> +_pwrite_byte 0xcdcdcdcd 0 $blocksize  $file | _filter_xfs_io
> +
> +for i in $(seq 1 $(($nr_extents - 1))); do
> +	_reflink_range $file 0 $file $(($i * $blocksize)) $blocksize \
> +		> /dev/null 2>&1
> +done
> +
> +# create a RO snapshot, so we can send out the snapshot
> +_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/ro_snap
> +
> +# send out the subvolume, and it will either:
> +# 1) OOM since memory is allocated inside a O(n^3) loop
> +# 2) Softlock since time consuming backref walk is called without scheduling.
> +# the send destination is not important, just send will cause the problem
> +_run_btrfs_util_prog send $SCRATCH_MNT/ro_snap > /dev/null 2>&1
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/127.out b/tests/btrfs/127.out
> new file mode 100644
> index 0000000..8b08bf8
> --- /dev/null
> +++ b/tests/btrfs/127.out
> @@ -0,0 +1,3 @@
> +QA output created by 127
> +wrote 131072/131072 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index a21a80a..d9174b5 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -129,3 +129,4 @@
>  124 auto replace
>  125 auto replace
>  126 auto quick qgroup
> +127 auto clone send

This test uses $LOAD_FACTOR, so it should be in the 'stress' group. It
also hangs the latest kernel and stops other tests from running, so I
think we can add it to the 'dangerous' group as well.
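
As a rough illustration of why $LOAD_FACTOR matters here (a sketch only,
not part of the patch): the constants 4096 and 128K are taken from the
test above, while the helper names extents_for_load and file_size_mib
are made up for this example. The size printed is the apparent file
size; every range is reflinked to the same 128K extent.

```shell
#!/bin/bash
# Hypothetical helpers, not part of fstests.

# Number of reflinked ranges the test creates for a given LOAD_FACTOR.
extents_for_load()
{
	echo $((4096 * $1))
}

# Apparent size of the test file in MiB (128K per range).
file_size_mib()
{
	local blocksize=$((128 * 1024))
	echo $(( $(extents_for_load "$1") * blocksize / 1024 / 1024 ))
}

for lf in 1 2 4; do
	echo "LOAD_FACTOR=$lf: $(extents_for_load $lf) extents," \
	     "$(file_size_mib $lf) MiB apparent size"
done
```

If the O(n^3) estimate in the commit message holds, doubling
$LOAD_FACTOR would mean roughly eight times the backref-walk work.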

I can fix these at merge time if there are no other major updates to be
done. (I'll let the patch sit on the list a while longer, in case
others have more review comments.)

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo July 19, 2016, 5:06 a.m. UTC | #2
Adding Filipe to the recipients, as the "To:" tag doesn't add him automatically.

Thanks,
Qu

Qu Wenruo July 19, 2016, 5:42 a.m. UTC | #3
At 07/19/2016 12:35 PM, Eryu Guan wrote:
>
> This test uses $LOAD_FACTOR, so it should be in 'stress' group. And it
> hangs the latest kernel, stop other tests from running, I think we can
> add it to 'dangerous' group as well.
>

Thanks for this info.
I'm completely OK with adding this test to the 'stress' and 'dangerous'
groups.


However, I'm a little curious about the meaning/standard of these groups.

Does 'dangerous' conflict with 'auto'?
In most cases a tester would just execute './check -g auto', and the
system would then hang at this test case.
So I'm a little confused about the 'auto' group.

BTW, I also hope there will be some documentation explaining the
standard for these groups, so that people like me can avoid wasting the
maintainers' time.

Thanks,
Qu

> I can fix them at merge time, if there's no other major updates to be
> done. (I'll let the patch sitting in the list for more time, in case
> others have more review comments).
>
> Thanks,
> Eryu
>
>


Eryu Guan July 20, 2016, 7:01 a.m. UTC | #4
On Tue, Jul 19, 2016 at 01:42:03PM +0800, Qu Wenruo wrote:
> > 
> > This test uses $LOAD_FACTOR, so it should be in 'stress' group. And it
> > hangs the latest kernel, stop other tests from running, I think we can
> > add it to 'dangerous' group as well.
> > 
> 
> Thanks for this info.
> I'm completely OK to add this group to 'stress' and 'dangerous'.
> 
> 
> However I'm a little curious about the meaning/standard of these groups.
> 
> Does 'dangerous' conflicts with 'auto'?
> Since under most case, tester would just execute './check -g auto' and the
> system hangs at the test case.
> So I'm a little confused with the 'auto' group.

I quote my previous email here to explain the 'auto' group
http://www.spinics.net/lists/fstests/msg03262.html

"
I searched for Dave's explanations of the 'auto' group in his reviews, and
got the following definitions:

- it should be a valid & reliable test (it's finished and has
  deterministic output) [1]
- it passes on current upstream kernels; if it fails, the failure is
  likely to be resolved in the foreseeable future [2]
- it should take no longer than 5 minutes to finish [3]
"

And "The only difference between quick and auto group criteria is the
test runtime." Usually 'quick' tests finish within 30s.

The 'dangerous' group was added in commit 3f28d55c3954 ("add
freeze and dangerous groups"), and it seems it never had a very
clear definition[*]. But I think any test that could hang/crash recent
kernels is considered dangerous.

* http://oss.sgi.com/archives/xfs/2012-03/msg00073.html

This test triggers a soft lockup on the latest 4.7-rc7 kernel and
prevents further tests from running, so it belongs in 'dangerous'. And
this bug will be fixed in the foreseeable future, right? So it's OK to
keep it in the 'auto' group as well. We can always remove the 'dangerous'
group from tests once we find they only crash old kernels, e.g. commit
8c94797 ("ext4: move 30[1234] from the dangerous to the auto group").

For running tests, "./check -g auto -x dangerous" might fit your needs.
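
To illustrate what that include/exclude selection amounts to, here is a
minimal sketch that filters a group file by hand. It is not how ./check
is implemented; select_tests is a hypothetical helper, and the sample
group file assumes test 127 ends up tagged as proposed in this thread.

```shell
#!/bin/bash
# Hypothetical helper, not part of fstests: print the tests in a group
# file that are in group $2 but not in group $3.
select_tests()
{
	local group_file=$1 include=$2 exclude=$3

	awk -v inc="$include" -v exc="$exclude" '
		/^#/ || NF == 0 { next }   # skip comments and blank lines
		{
			want = 0; skip = 0
			for (i = 2; i <= NF; i++) {
				if ($i == inc) want = 1
				if ($i == exc) skip = 1
			}
			if (want && !skip) print $1
		}' "$group_file"
}

# Sample group file (assumed tagging for 127, see above).
sample=$(mktemp)
cat > "$sample" <<EOF
124 auto replace
125 auto replace
126 auto quick qgroup
127 auto clone send stress dangerous
EOF

# Lists 124, 125 and 126; 127 is filtered out by the exclusion.
select_tests "$sample" auto dangerous
rm -f "$sample"
```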

Thanks,
Eryu
Qu Wenruo July 20, 2016, 7:40 a.m. UTC | #5
At 07/20/2016 03:01 PM, Eryu Guan wrote:
> On Tue, Jul 19, 2016 at 01:42:03PM +0800, Qu Wenruo wrote:
>>>
>>> This test uses $LOAD_FACTOR, so it should be in 'stress' group. And it
>>> hangs the latest kernel, stop other tests from running, I think we can
>>> add it to 'dangerous' group as well.
>>>
>>
>> Thanks for this info.
>> I'm completely OK to add this group to 'stress' and 'dangerous'.
>>
>>
>> However I'm a little curious about the meaning/standard of these groups.
>>
>> Does 'dangerous' conflicts with 'auto'?
>> Since under most case, tester would just execute './check -g auto' and the
>> system hangs at the test case.
>> So I'm a little confused with the 'auto' group.
>
> I quote my previous email here to explain the 'auto' group
> http://www.spinics.net/lists/fstests/msg03262.html
>
> "
> I searched for Dave's explainations on 'auto' group in his reviews, and
> got the following definitions:
>
> - it should be a valid & reliable test (it's finished and have
>   deterministic output) [1]
> - it passes on current upstream kernels, if it fails, it's likely to be
>   resolved in forseeable future [2]
> - it should take no longer than 5 minutes to finish [3]
> "
>
> And "The only difference between quick and auto group criteria is the
> test runtime." Usually 'quick' tests finish within 30s.
>
> For the 'dangerous' group, it was added in commit 3f28d55c3954 ("add
> freeze and dangerous groups"), and seems that it didn't have a very
> clear definition[*]. But I think any test that could hang/crash recent
> kernels is considered as dangerous.
>
> * http://oss.sgi.com/archives/xfs/2012-03/msg00073.html

Thanks for all the info, it really helps a lot,
especially the difference between quick and auto.

Would you mind if I apply this standard to the current btrfs test cases?

BTW, does the standard apply to *ALL* possible mount options or just
the *default* mount options?

For example, btrfs/011 can finish in about 5 minutes with the default
mount options, but with 'nodatasum' it can take up to 2 hours.
So should it belong to 'auto'?

Thanks,
Qu

>
> For this test, it triggers soft lockup on latest 4.7-rc7 kernel and
> prevents further tests from running, so it's part of dangerous. And this
> bug will be fixed in forseeable future, right? So it's OK to add 'auto'
> group. And we can always remove 'dangerous' group from tests when we
> find they're only crashing old kernels, e.g. commit 8c94797 ext4: move
> 30[1234] from the dangerous to the auto group
>
> For running tests, "./check -g auto -x dangerous" might fit your need.
>
> Thanks,
> Eryu
>
>


Dave Chinner July 20, 2016, 11:30 p.m. UTC | #6
On Wed, Jul 20, 2016 at 03:01:00PM +0800, Eryu Guan wrote:
> For running tests, "./check -g auto -x dangerous" might fit your need.

Yes, that's precisely the way the dangerous group is intended to be
used: as an exclusion filter that gets applied to other test group
definitions.

Cheers,

Dave.
Dave Chinner July 20, 2016, 11:37 p.m. UTC | #7
On Wed, Jul 20, 2016 at 03:40:29PM +0800, Qu Wenruo wrote:
> At 07/20/2016 03:01 PM, Eryu Guan wrote:
> >On Tue, Jul 19, 2016 at 01:42:03PM +0800, Qu Wenruo wrote:
> >>>
> >>>This test uses $LOAD_FACTOR, so it should be in 'stress' group. And it
> >>>hangs the latest kernel, stop other tests from running, I think we can
> >>>add it to 'dangerous' group as well.
> >>>
> >>
> >>Thanks for this info.
> >>I'm completely OK to add this group to 'stress' and 'dangerous'.
> >>
> >>
> >>However I'm a little curious about the meaning/standard of these groups.
> >>
> >>Does 'dangerous' conflicts with 'auto'?
> >>Since under most case, tester would just execute './check -g auto' and the
> >>system hangs at the test case.
> >>So I'm a little confused with the 'auto' group.
> >
> >I quote my previous email here to explain the 'auto' group
> >http://www.spinics.net/lists/fstests/msg03262.html
> >
> >"
> >I searched for Dave's explainations on 'auto' group in his reviews, and
> >got the following definitions:
> >
> >- it should be a valid & reliable test (it's finished and have
> >  deterministic output) [1]
> >- it passes on current upstream kernels, if it fails, it's likely to be
> >  resolved in forseeable future [2]
> >- it should take no longer than 5 minutes to finish [3]
> >"
> >
> >And "The only difference between quick and auto group criteria is the
> >test runtime." Usually 'quick' tests finish within 30s.
> >
> >For the 'dangerous' group, it was added in commit 3f28d55c3954 ("add
> >freeze and dangerous groups"), and seems that it didn't have a very
> >clear definition[*]. But I think any test that could hang/crash recent
> >kernels is considered as dangerous.
> >
> >* http://oss.sgi.com/archives/xfs/2012-03/msg00073.html
> 
> Thanks for all the info, really helps a lot.
> 
> Especially for quick and auto difference.
> 
> Would you mind me applying this standard to current btrfs test cases?

It should be applied to all test cases, regardless of the filesystem
type.

> BTW, does the standard apply to *ALL* possible mount options or just
> *deafult* mount option?

Generally it applies to the default case.

> For example, btrfs/011 can finish in about 5min with default mount
> option, but for 'nodatasum' it can take up to 2 hours.
> So should it belong to 'auto'?

Yes. Also, keep in mind that runtime is dependent on the type of
storage you are testing on. The general idea is that the
"< 30s quick, < 5m auto" rule is based on how long the test takes to
run on a local single spindle SATA drive, as that is the basic
hardware we'd expect people to be testing against. This means that a
test that takes 20s on your SSD might not be a "quick" test because
it takes 5 minutes on spinning rust....
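
The rule of thumb can be written down as a small sketch. classify_runtime
is a hypothetical helper, not something in fstests; the input is assumed
to be the runtime in seconds measured on a single-spindle SATA drive with
default mount options. Treating exactly 300s as still 'auto' is an
assumption based on the "no longer than 5 minutes" wording quoted
earlier, and "neither" is only a placeholder, not a real group name.

```shell
#!/bin/bash
# Hypothetical helper, not part of fstests: map a runtime in seconds
# to the runtime-based groups discussed in this thread.
classify_runtime()
{
	local secs=$1

	if [ "$secs" -lt 30 ]; then
		echo "auto quick"       # quick tests are also auto tests
	elif [ "$secs" -le 300 ]; then
		echo "auto"
	else
		echo "neither"          # too slow for either group
	fi
}

classify_runtime 20     # the same test on an SSD
classify_runtime 300    # ... and on spinning rust
```

So the 20s-on-SSD test from the example would be tagged by its 300s
result on the slower drive, and would not count as 'quick'.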

Cheers,

Dave.
Qu Wenruo July 21, 2016, 2:05 a.m. UTC | #8
At 07/21/2016 07:37 AM, Dave Chinner wrote:
> On Wed, Jul 20, 2016 at 03:40:29PM +0800, Qu Wenruo wrote:
>> At 07/20/2016 03:01 PM, Eryu Guan wrote:
>>> On Tue, Jul 19, 2016 at 01:42:03PM +0800, Qu Wenruo wrote:
>>>>>
>>>>> This test uses $LOAD_FACTOR, so it should be in 'stress' group. And it
>>>>> hangs the latest kernel, stop other tests from running, I think we can
>>>>> add it to 'dangerous' group as well.
>>>>>
>>>>
>>>> Thanks for this info.
>>>> I'm completely OK to add this group to 'stress' and 'dangerous'.
>>>>
>>>>
>>>> However I'm a little curious about the meaning/standard of these groups.
>>>>
>>>> Does 'dangerous' conflicts with 'auto'?
>>>> Since under most case, tester would just execute './check -g auto' and the
>>>> system hangs at the test case.
>>>> So I'm a little confused with the 'auto' group.
>>>
>>> I quote my previous email here to explain the 'auto' group
>>> http://www.spinics.net/lists/fstests/msg03262.html
>>>
>>> "
>>> I searched for Dave's explainations on 'auto' group in his reviews, and
>>> got the following definitions:
>>>
>>> - it should be a valid & reliable test (it's finished and have
>>>  deterministic output) [1]
>>> - it passes on current upstream kernels, if it fails, it's likely to be
>>>  resolved in forseeable future [2]
>>> - it should take no longer than 5 minutes to finish [3]
>>> "
>>>
>>> And "The only difference between quick and auto group criteria is the
>>> test runtime." Usually 'quick' tests finish within 30s.
>>>
>>> For the 'dangerous' group, it was added in commit 3f28d55c3954 ("add
>>> freeze and dangerous groups"), and seems that it didn't have a very
>>> clear definition[*]. But I think any test that could hang/crash recent
>>> kernels is considered as dangerous.
>>>
>>> * http://oss.sgi.com/archives/xfs/2012-03/msg00073.html
>>
>> Thanks for all the info, really helps a lot.
>>
>> Especially for quick and auto difference.
>>
>> Would you mind me applying this standard to current btrfs test cases?
>
> It shoul dbe applied to all test cases, regardless of the filesystem
> type.

That's straightforward for fs-specific test cases.

But for generic tests, I'm a little concerned about the quick/auto standard.
Should we use the results from one single fs (and which one? I assume xfs),
or from all of them, to determine the quick/auto group?

For example, generic/127 involves quite a lot of metadata operations, and
on some filesystems (OK, btrfs again) metadata operations are quite slow
compared to other, stable filesystems like ext4 or xfs.

So that makes the quick/auto tag quite hard to determine.

Thanks,
Qu

>
>> BTW, does the standard apply to *ALL* possible mount options or just
>> *deafult* mount option?
>
> Generally it applies to the default case.
>
>> For example, btrfs/011 can finish in about 5min with default mount
>> option, but for 'nodatasum' it can take up to 2 hours.
>> So should it belong to 'auto'?
>
> Yes. Also, keep in mind that runtime is dependent on the type of
> storage you are testing on. The general idea is that the
> "< 30s quick, < 5m auto" rule is based on how long the test takes to
> run on a local single spindle SATA drive, as that is the basic
> hardware we'd expect people to be testing against. This means that a
> test that takes 20s on your SSD might not be a "quick" test because
> it takes 5 minutes on spinning rust....
>
> Cheers,
>
> Dave.
>


Dave Chinner July 21, 2016, 10:57 p.m. UTC | #9
On Thu, Jul 21, 2016 at 10:05:25AM +0800, Qu Wenruo wrote:
> 
> 
> At 07/21/2016 07:37 AM, Dave Chinner wrote:
> >On Wed, Jul 20, 2016 at 03:40:29PM +0800, Qu Wenruo wrote:
> >>At 07/20/2016 03:01 PM, Eryu Guan wrote:
> >>>On Tue, Jul 19, 2016 at 01:42:03PM +0800, Qu Wenruo wrote:
> >>>>>
> >>>>>This test uses $LOAD_FACTOR, so it should be in 'stress' group. And it
> >>>>>hangs the latest kernel, stop other tests from running, I think we can
> >>>>>add it to 'dangerous' group as well.
> >>>>>
> >>>>
> >>>>Thanks for this info.
> >>>>I'm completely OK to add this group to 'stress' and 'dangerous'.
> >>>>
> >>>>
> >>>>However I'm a little curious about the meaning/standard of these groups.
> >>>>
> >>>>Does 'dangerous' conflict with 'auto'?
> >>>>In most cases a tester would just execute './check -g auto', and the
> >>>>system would hang at this test case.
> >>>>So I'm a little confused about the 'auto' group.
> >>>
> >>>I quote my previous email here to explain the 'auto' group
> >>>http://www.spinics.net/lists/fstests/msg03262.html
> >>>
> >>>"
> >>>I searched for Dave's explanations of the 'auto' group in his reviews, and
> >>>got the following definitions:
> >>>
> >>>- it should be a valid & reliable test (it's finished and has
> >>> deterministic output) [1]
> >>>- it passes on current upstream kernels; if it fails, it's likely to be
> >>> resolved in the foreseeable future [2]
> >>>- it should take no longer than 5 minutes to finish [3]
> >>>"
> >>>
> >>>And "The only difference between quick and auto group criteria is the
> >>>test runtime." Usually 'quick' tests finish within 30s.
> >>>
> >>>For the 'dangerous' group, it was added in commit 3f28d55c3954 ("add
> >>>freeze and dangerous groups"), and it seems that it didn't have a very
> >>>clear definition[*]. But I think any test that could hang/crash recent
> >>>kernels is considered dangerous.
> >>>
> >>>* http://oss.sgi.com/archives/xfs/2012-03/msg00073.html
> >>
> >>Thanks for all the info, really helps a lot.
> >>
> >>Especially for quick and auto difference.
> >>
> >>Would you mind me applying this standard to current btrfs test cases?
> >
> >It should be applied to all test cases, regardless of the filesystem
> >type.
> 
> It's straightforward for specific fs test cases.
> 
> But for generic tests, I'm a little concerned about the quick/auto standard.
> Should we use the result of one single fs (and which fs? I assume xfs
> though) or all fs, to determine the quick/auto group?

It's up to the person introducing the new test to determine how it
should be classified. If this causes problems for other people, then
they can send patches to reclassify it appropriately based on their
runtime numbers and configuration.

Historically speaking, we've tended to ignore btrfs performance
because it's been so randomly terrible. It's so often been a massive
outlier that we've generally considered btrfs performance to be a
bug that needs fixing and not something that requires the test to be
reclassified for everyone else.

> So it makes quick/auto tag quite hard to determine.

It's quite straightforward, really. Send patches with numbers for
the tests you want reclassified, and lots of people will say "yes, I
see that too" or "no, that only takes 2s on my smallest, slowest
test machine, it should remain as a quick test". And that's about
it.

Cheers,

Dave.
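
As an illustration of the runtime rule discussed in this thread (it is not
code from fstests or from any poster), the "< 30s quick, < 5m auto"
guideline can be sketched as a small shell helper. The function name
classify_runtime and its output strings are hypothetical; the thresholds
are the ones quoted above, and the runtime is assumed to have been measured
on the reference hardware (a local single spindle SATA drive).

```shell
#!/bin/bash
# Hypothetical helper, not part of fstests: map a measured runtime in
# seconds to the groups suggested by the "< 30s quick, < 5m auto" rule.
classify_runtime()
{
	local secs=$1

	if [ "$secs" -lt 30 ]; then
		# quick tests differ from auto tests only in runtime
		echo "auto quick"
	elif [ "$secs" -lt 300 ]; then
		echo "auto"
	else
		echo "neither"
	fi
}

classify_runtime 20	# prints "auto quick"
classify_runtime 120	# prints "auto"
classify_runtime 7200	# prints "neither"
```

Runtime is only one of the criteria; a test that can hang or crash recent
kernels would still be a candidate for 'dangerous' whatever this reports.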

Patch

diff --git a/tests/btrfs/127 b/tests/btrfs/127
new file mode 100755
index 0000000..a31a653
--- /dev/null
+++ b/tests/btrfs/127
@@ -0,0 +1,89 @@ 
+#! /bin/bash
+# FS QA Test 127
+#
+# Check if btrfs send can handle a large deduped file, whose file extents
+# all point to one extent.
+# Such a file structure puts heavy pressure on any operation which
+# iterates over all backrefs of one extent.
+# Unfortunately, btrfs send is one of these operations, and will cause
+# a soft lockup or OOM on systems with small memory (<4G).
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_scratch_reflink
+
+_scratch_mkfs > /dev/null 2>&1
+_scratch_mount
+
+nr_extents=$((4096 * $LOAD_FACTOR))
+
+# Use 128K blocksize, the default value of both duperemove and
+# inband dedupe
+blocksize=$((128 * 1024))
+file=$SCRATCH_MNT/foobar
+
+# create the initial file, whose file extents all point to one extent
+_pwrite_byte 0xcdcdcdcd 0 $blocksize  $file | _filter_xfs_io
+
+for i in $(seq 1 $(($nr_extents - 1))); do
+	_reflink_range $file 0 $file $(($i * $blocksize)) $blocksize \
+		> /dev/null 2>&1
+done
+
+# create a RO snapshot, so we can send out the snapshot
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/ro_snap
+
+# send out the subvolume, and it will either:
+# 1) OOM since memory is allocated inside an O(n^3) loop
+# 2) Soft lockup since time consuming backref walk is called without scheduling.
+# the send destination is not important, just send will cause the problem
+_run_btrfs_util_prog send $SCRATCH_MNT/ro_snap > /dev/null 2>&1
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/127.out b/tests/btrfs/127.out
new file mode 100644
index 0000000..8b08bf8
--- /dev/null
+++ b/tests/btrfs/127.out
@@ -0,0 +1,3 @@ 
+QA output created by 127
+wrote 131072/131072 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index a21a80a..d9174b5 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -129,3 +129,4 @@ 
 124 auto replace
 125 auto replace
 126 auto quick qgroup
+127 auto clone send
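
For a sense of scale, the layout the test builds can be worked out from the
values in the patch: nr_extents = 4096 * LOAD_FACTOR and a 128K blocksize.
The sketch below only reproduces that arithmetic outside of fstests; the
function name dedupe_layout is hypothetical, and nothing here touches a
filesystem.

```shell
#!/bin/bash
# Sketch only: compute the size of the deduped file created by the test
# for a given load factor. Values mirror the patch (4096 extents per unit
# of load factor, 128K per extent); the function name is made up.
dedupe_layout()
{
	local load_factor=$1
	local nr_extents=$((4096 * load_factor))
	local blocksize=$((128 * 1024))
	local size_mib=$((nr_extents * blocksize / 1024 / 1024))

	# The first extent is written; every other one is a reflink of it.
	local reflinks=$((nr_extents - 1))

	echo "$nr_extents $size_mib $reflinks"
}

dedupe_layout 1		# prints "4096 512 4095"
```

So with the default load factor the file is 512 MiB of logical data backed
by a single 128K extent that carries 4096 references, which is the shape
that makes the backref walk in send so expensive.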