fstests: btrfs: Add regression test for reserved space leak.

Message ID 1438669637-3666-1-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive)
State Not Applicable

Commit Message

Qu Wenruo Aug. 4, 2015, 6:27 a.m. UTC
The regression was introduced in v4.2-rc1 with the big btrfs qgroup
change.
The problem is that qgroup reserved space is never freed, so even after
we increase the limit, we can still hit EDQUOT much sooner than we
should.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 tests/btrfs/089     | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/089.out |  5 ++++
 tests/btrfs/group   |  1 +
 3 files changed, 89 insertions(+)
 create mode 100755 tests/btrfs/089
 create mode 100644 tests/btrfs/089.out

Comments

Filipe Manana Aug. 4, 2015, 1:16 p.m. UTC | #1
On Tue, Aug 4, 2015 at 7:27 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> The regression is introduced in v4.2-rc1, with the big btrfs qgroup
> change.
> The problem is, qgroup reserved space is never freed, causing even we
> increase the limit, we can still hit the EDQUOT much faster than it
> should.
>
> Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Thanks for doing this Qu.
The test fails without the btrfs fix and passes with it, as expected.
However, one question below:

> ---
>  tests/btrfs/089     | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/089.out |  5 ++++
>  tests/btrfs/group   |  1 +
>  3 files changed, 89 insertions(+)
>  create mode 100755 tests/btrfs/089
>  create mode 100644 tests/btrfs/089.out
>
> diff --git a/tests/btrfs/089 b/tests/btrfs/089
> new file mode 100755
> index 0000000..0c018f2
> --- /dev/null
> +++ b/tests/btrfs/089
> @@ -0,0 +1,83 @@
> +#! /bin/bash
> +# FS QA Test 089
> +#
> +# Regression test for btrfs qgroup reserved space leak.
> +#
> +# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not
> +# over limit after previous write.
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2015 Fujitsu. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1       # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +       cd /
> +       rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +_need_to_be_root
> +
> +# Use big blocksize to ensure there is still enough space left
> +# for metadata reserve after hitting EDQUOT
> +BLOCKSIZE=$(( 2 * 1024 * 1024 ))
> +FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes
> +
> +# The last block won't be able to finish write, as metadata takes
> +# $NODESIZE space, causing the last block triggering EDQUOT
> +LENGTH=$(( $FILESIZE - $BLOCKSIZE ))
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024))
> +
> +_run_btrfs_util_prog quota enable $SCRATCH_MNT
> +_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT
> +
> +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \
> +       $SCRATCH_MNT/foo | _filter_xfs_io
> +sync

Why is the sync needed here? Can you add a comment explaining why? It
isn't trivial/obvious (for me at least), especially because without the
call to "sync" the test passes even without the btrfs fix.

thanks

> +
> +# Double the limit to allow further write
> +_run_btrfs_util_prog qgroup limit $(($FILESIZE * 2)) 5 $SCRATCH_MNT
> +
> +# Test whether further write can succeed
> +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE $LENGTH $LENGTH" \
> +       $SCRATCH_MNT/foo | _filter_xfs_io
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/089.out b/tests/btrfs/089.out
> new file mode 100644
> index 0000000..396888f
> --- /dev/null
> +++ b/tests/btrfs/089.out
> @@ -0,0 +1,5 @@
> +QA output created by 089
> +wrote 132120576/132120576 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 132120576/132120576 bytes at offset 132120576
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index ffe18bf..225b532 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -91,6 +91,7 @@
>  086 auto quick clone
>  087 auto quick send
>  088 auto quick metadata
> +089 auto quick qgroup
>  090 auto quick metadata
>  091 auto quick qgroup
>  092 auto quick send
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Aug. 5, 2015, 12:57 a.m. UTC | #2
Filipe David Manana wrote on 2015/08/04 14:16 +0100:
> On Tue, Aug 4, 2015 at 7:27 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> The regression is introduced in v4.2-rc1, with the big btrfs qgroup
>> change.
>> The problem is, qgroup reserved space is never freed, causing even we
>> increase the limit, we can still hit the EDQUOT much faster than it
>> should.
>>
>> Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>
> Thanks for doing this Qu.
> The test fails without the btrfs fix and passes with it, as expected.
> However, one question below:
Thanks for the review, Filipe.

I'll explain it inline below.
>
>> ---
>>   tests/btrfs/089     | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   tests/btrfs/089.out |  5 ++++
>>   tests/btrfs/group   |  1 +
>>   3 files changed, 89 insertions(+)
>>   create mode 100755 tests/btrfs/089
>>   create mode 100644 tests/btrfs/089.out
>>
>> diff --git a/tests/btrfs/089 b/tests/btrfs/089
>> new file mode 100755
>> index 0000000..0c018f2
>> --- /dev/null
>> +++ b/tests/btrfs/089
>> @@ -0,0 +1,83 @@
>> +#! /bin/bash
>> +# FS QA Test 089
>> +#
>> +# Regression test for btrfs qgroup reserved space leak.
>> +#
>> +# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not
>> +# over limit after previous write.
>> +#
>> +#-----------------------------------------------------------------------
>> +# Copyright (c) 2015 Fujitsu. All Rights Reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#-----------------------------------------------------------------------
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1       # failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> +       cd /
>> +       rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +
>> +# real QA test starts here
>> +
>> +# Modify as appropriate.
>> +_supported_fs btrfs
>> +_supported_os Linux
>> +_require_scratch
>> +_need_to_be_root
>> +
>> +# Use big blocksize to ensure there is still enough space left
>> +# for metadata reserve after hitting EDQUOT
>> +BLOCKSIZE=$(( 2 * 1024 * 1024 ))
>> +FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes
>> +
>> +# The last block won't be able to finish write, as metadata takes
>> +# $NODESIZE space, causing the last block triggering EDQUOT
>> +LENGTH=$(( $FILESIZE - $BLOCKSIZE ))
>> +
>> +_scratch_mkfs >>$seqres.full 2>&1
>> +_scratch_mount
>> +_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024))
>> +
>> +_run_btrfs_util_prog quota enable $SCRATCH_MNT
>> +_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT
>> +
>> +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \
>> +       $SCRATCH_MNT/foo | _filter_xfs_io
>> +sync
>
> Why is the sync needed here? Can you add a comment explaining why? It
> isn't trivial/obvious (for me at least), specially because without the
> call to "sync" the test passes without the btrfs fix.
>
> thanks
No problem, I'll send a v2 patch with an explanation of the sync.

The reason is that, without the sync, it's very likely the data has not
yet been flushed to disk, and the reserved space is correct until the
data is written.

Here is the current write flow with sync, without the fix patch:
1) Want to write the first 126M
    Reserve 126M space

    Qgroup 5: reserved = 126M, rfer = 0(*), rfer_max = 128M
    *: Just ignore metadata, as the 2M blocksize is much larger than
       the nodesize (16K)

2) Sync
    Data writeback and metadata change
     |- Run delayed refs
        |- Qgroup accounting
    Qgroup 5: reserved = 126M, rfer = 126M, rfer_max = 128M
              ^^ Should be 0, as the reserved data has been written to disk.

3) Increase limit to 256M
    Qgroup 5: reserved = 126M, rfer = 126M, rfer_max = 256M

4) Want to write the next 126M
    Reserve 126M space.

    But qgroup 5 has less than 4M of available space:
    rfer_max - (reserved + rfer) = 4M

    So the reserve fails with EDQUOT.

On the other hand, if there is no sync:
1) Want to write the first 126M
    Reserve 126M space

    Qgroup 5: reserved = 126M, rfer = 0(*), rfer_max = 128M
    *: Ignore metadata again.
    Also, we assume memory is large enough to hold that many dirty
    pages without triggering a page flush.

3) Increase limit to 256M
    Qgroup 5: reserved = 126M, rfer = 0, rfer_max = 256M

    Rfer is only increased at commit_transaction() time,
    so it stays 0 until a manual sync or the number of dirty pages
    triggers a flush.

4) Want to write the next 126M
    Reserve 126M space.

    Now qgroup 5 has 256 - 126 = 130M of available space,
    so the reserve succeeds without a problem.
    That's why the test passes when there is neither a sync nor the
    fix patch.
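
The two flows above come down to the same available-space check with
different inputs. Below is a toy model of that arithmetic, with sizes in
MiB. Note that qgroup_avail is a hypothetical helper for illustration
only, not a btrfs or fstests function, and metadata is ignored just as
in the walkthrough.

```bash
#!/bin/bash
# Toy model of the qgroup check: avail = rfer_max - (reserved + rfer).
# qgroup_avail is a hypothetical helper, not part of btrfs or fstests.
qgroup_avail()
{
	local rfer_max=$1 reserved=$2 rfer=$3
	echo $(( rfer_max - (reserved + rfer) ))
}

# With sync, unfixed kernel: the 126M reservation leaks after writeback.
avail_sync=$(qgroup_avail 256 126 126)
# Without sync: data is still dirty, so rfer has not been updated yet.
avail_nosync=$(qgroup_avail 256 126 0)
# With sync, fixed kernel: the reservation is released after writeback.
avail_fixed=$(qgroup_avail 256 0 126)

for avail in $avail_sync $avail_nosync $avail_fixed; do
	if [ "$avail" -ge 126 ]; then
		echo "avail=${avail}M: next 126M write fits"
	else
		echo "avail=${avail}M: next 126M write hits EDQUOT"
	fi
done
```

Only the first case leaves too little room for the second write, which
is why the sync is what exposes the leak.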

Thanks,
Qu
>
>> +
>> +# Double the limit to allow further write
>> +_run_btrfs_util_prog qgroup limit $(($FILESIZE * 2)) 5 $SCRATCH_MNT
>> +
>> +# Test whether further write can succeed
>> +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE $LENGTH $LENGTH" \
>> +       $SCRATCH_MNT/foo | _filter_xfs_io
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/btrfs/089.out b/tests/btrfs/089.out
>> new file mode 100644
>> index 0000000..396888f
>> --- /dev/null
>> +++ b/tests/btrfs/089.out
>> @@ -0,0 +1,5 @@
>> +QA output created by 089
>> +wrote 132120576/132120576 bytes at offset 0
>> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> +wrote 132120576/132120576 bytes at offset 132120576
>> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> diff --git a/tests/btrfs/group b/tests/btrfs/group
>> index ffe18bf..225b532 100644
>> --- a/tests/btrfs/group
>> +++ b/tests/btrfs/group
>> @@ -91,6 +91,7 @@
>>   086 auto quick clone
>>   087 auto quick send
>>   088 auto quick metadata
>> +089 auto quick qgroup
>>   090 auto quick metadata
>>   091 auto quick qgroup
>>   092 auto quick send
>> --
>> 1.8.3.1
>>
>
>
>
Patch

diff --git a/tests/btrfs/089 b/tests/btrfs/089
new file mode 100755
index 0000000..0c018f2
--- /dev/null
+++ b/tests/btrfs/089
@@ -0,0 +1,83 @@ 
+#! /bin/bash
+# FS QA Test 089
+#
+# Regression test for btrfs qgroup reserved space leak.
+#
+# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not
+# over limit after previous write.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2015 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+# Use big blocksize to ensure there is still enough space left
+# for metadata reserve after hitting EDQUOT
+BLOCKSIZE=$(( 2 * 1024 * 1024 ))
+FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes 
+
+# The last block won't be able to finish write, as metadata takes
+# $NODESIZE space, causing the last block triggering EDQUOT
+LENGTH=$(( $FILESIZE - $BLOCKSIZE ))
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024))
+
+_run_btrfs_util_prog quota enable $SCRATCH_MNT
+_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT
+
+$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \
+	$SCRATCH_MNT/foo | _filter_xfs_io
+sync
+
+# Double the limit to allow further write
+_run_btrfs_util_prog qgroup limit $(($FILESIZE * 2)) 5 $SCRATCH_MNT
+
+# Test whether further write can succeed
+$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE $LENGTH $LENGTH" \
+	$SCRATCH_MNT/foo | _filter_xfs_io
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/089.out b/tests/btrfs/089.out
new file mode 100644
index 0000000..396888f
--- /dev/null
+++ b/tests/btrfs/089.out
@@ -0,0 +1,5 @@ 
+QA output created by 089
+wrote 132120576/132120576 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 132120576/132120576 bytes at offset 132120576
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index ffe18bf..225b532 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -91,6 +91,7 @@ 
 086 auto quick clone
 087 auto quick send
 088 auto quick metadata
+089 auto quick qgroup
 090 auto quick metadata
 091 auto quick qgroup
 092 auto quick send
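
As a sanity check on the expected output, the byte counts in 089.out can
be rederived from the sizes the test defines. This is a minimal sketch
that reuses the test's own variable names. The ops-per-write figure is
simply LENGTH divided by BLOCKSIZE and is an illustration, not something
the test itself prints.

```bash
#!/bin/bash
# Same size definitions as tests/btrfs/089.
BLOCKSIZE=$(( 2 * 1024 * 1024 ))
FILESIZE=$(( 128 * 1024 * 1024 ))
# One block short of the limit, so metadata still fits under the quota.
LENGTH=$(( FILESIZE - BLOCKSIZE ))

echo "first write:  $LENGTH bytes at offset 0"
echo "second write: $LENGTH bytes at offset $LENGTH"
echo "ops per write: $(( LENGTH / BLOCKSIZE ))"
```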