diff mbox

[v2] fstests: generic: Check if a bull fallocate will change extent number

Message ID 1443519264-19184-1-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive)
State New, archived
Headers show

Commit Message

Qu Wenruo Sept. 29, 2015, 9:34 a.m. UTC
Normally, a bull fallocate call on a fully written and synced file
should not add an extent.

But not all filesystem follows the correct behavior.

Btrfs has a bug to always truncate the last page if the fallocate start
offset is smaller than inode size.

So add this test case to detect such malfunction.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Add author info...
  Fix some comment typo
---
 tests/generic/110     | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/110.out |  3 ++
 tests/generic/group   |  1 +
 3 files changed, 82 insertions(+)
 create mode 100755 tests/generic/110
 create mode 100644 tests/generic/110.out

Comments

Eryu Guan Sept. 29, 2015, 9:55 a.m. UTC | #1
On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
> Normally, a bull fallocate call on a fully written and synced file
> should not add an extent.
> 
> But not all filesystem follows the correct behavior.
> 
> Btrfs has a bug to always truncate the last page if the fallocate start
> offset is smaller than inode size.
> 
> So add this test case to detect such malfunction.
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
> v2:
>   Add author info...
>   Fix some comment typo

(I was going to point these out then saw a v2 :)

> ---
>  tests/generic/110     | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/110.out |  3 ++
>  tests/generic/group   |  1 +
>  3 files changed, 82 insertions(+)
>  create mode 100755 tests/generic/110
>  create mode 100644 tests/generic/110.out
> 
> diff --git a/tests/generic/110 b/tests/generic/110
> new file mode 100755
> index 0000000..b2b140c
> --- /dev/null
> +++ b/tests/generic/110
> @@ -0,0 +1,78 @@
> +#! /bin/bash
> +# FS QA Test 110
> +#
> +# Test if fallocate will create uneeded extra tailing extent
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2015 Fujitsu.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/defrag
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +_supported_fs generic
> +_supported_os IRIX Linux
> +_require_scratch
> +_need_to_be_root

need a '_require_xfs_io_command "falloc"' here, so test _notrun
correctly on ext2/3/NFS/CIFS that don't support fallocate.

> +
> +# Use 64K file size to match any sectorsize
> +# And with a unaligned tailing range to ensure it will be at least 2 pages
> +filesize=$(( 64 * 1024 + 1024 ))
> +
> +_scratch_mkfs > /dev/null 2>&1
> +_scratch_mount
> +$XFS_IO_PROG -f -c "pwrite 0 $filesize" $SCRATCH_MNT/foo | _filter_xfs_io
> +sync
> +orig_extent_nr=`_extent_count $SCRATCH_MNT/foo`
> +
> +# As all space are allocated and even written to disk, this falloc
> +# should do nothing with extent modification.
> +$XFS_IO_PROG -f -c "falloc 0 $filesize" $SCRATCH_MNT/foo
> +sync
> +new_extent_nr=`_extent_count $SCRATCH_MNT/foo`
> +
> +echo "orig: $orig_extent_nr, new: $new_extent_nr" >> $seqres.full
> +
> +if [ "x$orig_extent_nr" != "x$new_extent_nr" ]; then
> +	echo "number of extents mis-match after bull fallocate"

print out the $orig_extent_nr and $new_extent_nr in this failure case? I
think it's useful to see the difference just from the output diff, don't
have to check the full file.

Thanks,
Eryu

> +fi
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/110.out b/tests/generic/110.out
> new file mode 100644
> index 0000000..64049da
> --- /dev/null
> +++ b/tests/generic/110.out
> @@ -0,0 +1,3 @@
> +QA output created by 110
> +wrote 66560/66560 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> diff --git a/tests/generic/group b/tests/generic/group
> index 4ae256f..428f3e3 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -112,6 +112,7 @@
>  107 auto quick metadata
>  108 auto quick rw
>  109 auto metadata dir
> +110 auto quick prealloc
>  112 rw aio auto quick
>  113 rw aio auto quick
>  117 attr auto quick
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hugo Mills Sept. 29, 2015, 10 a.m. UTC | #2
On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
> Normally, a bull fallocate call on a fully written and synced file
> should not add an extent.

   What's a "bull" fallocate call? Is it a typo, or simply something
I'm not familiar with?

   Hugo.

> But not all filesystem follows the correct behavior.
> 
> Btrfs has a bug to always truncate the last page if the fallocate start
> offset is smaller than inode size.
> 
> So add this test case to detect such malfunction.
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
> v2:
>   Add author info...
>   Fix some comment typo
> ---
>  tests/generic/110     | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/110.out |  3 ++
>  tests/generic/group   |  1 +
>  3 files changed, 82 insertions(+)
>  create mode 100755 tests/generic/110
>  create mode 100644 tests/generic/110.out
> 
> diff --git a/tests/generic/110 b/tests/generic/110
> new file mode 100755
> index 0000000..b2b140c
> --- /dev/null
> +++ b/tests/generic/110
> @@ -0,0 +1,78 @@
> +#! /bin/bash
> +# FS QA Test 110
> +#
> +# Test if fallocate will create uneeded extra tailing extent
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2015 Fujitsu.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/defrag
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +_supported_fs generic
> +_supported_os IRIX Linux
> +_require_scratch
> +_need_to_be_root
> +
> +# Use 64K file size to match any sectorsize
> +# And with a unaligned tailing range to ensure it will be at least 2 pages
> +filesize=$(( 64 * 1024 + 1024 ))
> +
> +_scratch_mkfs > /dev/null 2>&1
> +_scratch_mount
> +$XFS_IO_PROG -f -c "pwrite 0 $filesize" $SCRATCH_MNT/foo | _filter_xfs_io
> +sync
> +orig_extent_nr=`_extent_count $SCRATCH_MNT/foo`
> +
> +# As all space are allocated and even written to disk, this falloc
> +# should do nothing with extent modification.
> +$XFS_IO_PROG -f -c "falloc 0 $filesize" $SCRATCH_MNT/foo
> +sync
> +new_extent_nr=`_extent_count $SCRATCH_MNT/foo`
> +
> +echo "orig: $orig_extent_nr, new: $new_extent_nr" >> $seqres.full
> +
> +if [ "x$orig_extent_nr" != "x$new_extent_nr" ]; then
> +	echo "number of extents mis-match after bull fallocate"
> +fi
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/110.out b/tests/generic/110.out
> new file mode 100644
> index 0000000..64049da
> --- /dev/null
> +++ b/tests/generic/110.out
> @@ -0,0 +1,3 @@
> +QA output created by 110
> +wrote 66560/66560 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> diff --git a/tests/generic/group b/tests/generic/group
> index 4ae256f..428f3e3 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -112,6 +112,7 @@
>  107 auto quick metadata
>  108 auto quick rw
>  109 auto metadata dir
> +110 auto quick prealloc
>  112 rw aio auto quick
>  113 rw aio auto quick
>  117 attr auto quick
Qu Wenruo Sept. 29, 2015, 10:13 a.m. UTC | #3
? 2015?09?29? 18:00, Hugo Mills ??:
> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>> Normally, a bull fallocate call on a fully written and synced file
>> should not add an extent.
>
>     What's a "bull" fallocate call? Is it a typo, or simply something
> I'm not familiar with?
>
>     Hugo.

Oh, it should be null.
But null still seems not appropriate here.

I mean a fallocate call which doesn't really allocate any space...

Any good ideas?

Thanks,
Qu
>
>> But not all filesystem follows the correct behavior.
>>
>> Btrfs has a bug to always truncate the last page if the fallocate start
>> offset is smaller than inode size.
>>
>> So add this test case to detect such malfunction.
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>> v2:
>>    Add author info...
>>    Fix some comment typo
>> ---
>>   tests/generic/110     | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   tests/generic/110.out |  3 ++
>>   tests/generic/group   |  1 +
>>   3 files changed, 82 insertions(+)
>>   create mode 100755 tests/generic/110
>>   create mode 100644 tests/generic/110.out
>>
>> diff --git a/tests/generic/110 b/tests/generic/110
>> new file mode 100755
>> index 0000000..b2b140c
>> --- /dev/null
>> +++ b/tests/generic/110
>> @@ -0,0 +1,78 @@
>> +#! /bin/bash
>> +# FS QA Test 110
>> +#
>> +# Test if fallocate will create uneeded extra tailing extent
>> +#
>> +#-----------------------------------------------------------------------
>> +# Copyright (c) 2015 Fujitsu.  All Rights Reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#-----------------------------------------------------------------------
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1	# failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> +	cd /
>> +	rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +. ./common/defrag
>> +
>> +# remove previous $seqres.full before test
>> +rm -f $seqres.full
>> +
>> +# real QA test starts here
>> +
>> +_supported_fs generic
>> +_supported_os IRIX Linux
>> +_require_scratch
>> +_need_to_be_root
>> +
>> +# Use 64K file size to match any sectorsize
>> +# And with a unaligned tailing range to ensure it will be at least 2 pages
>> +filesize=$(( 64 * 1024 + 1024 ))
>> +
>> +_scratch_mkfs > /dev/null 2>&1
>> +_scratch_mount
>> +$XFS_IO_PROG -f -c "pwrite 0 $filesize" $SCRATCH_MNT/foo | _filter_xfs_io
>> +sync
>> +orig_extent_nr=`_extent_count $SCRATCH_MNT/foo`
>> +
>> +# As all space are allocated and even written to disk, this falloc
>> +# should do nothing with extent modification.
>> +$XFS_IO_PROG -f -c "falloc 0 $filesize" $SCRATCH_MNT/foo
>> +sync
>> +new_extent_nr=`_extent_count $SCRATCH_MNT/foo`
>> +
>> +echo "orig: $orig_extent_nr, new: $new_extent_nr" >> $seqres.full
>> +
>> +if [ "x$orig_extent_nr" != "x$new_extent_nr" ]; then
>> +	echo "number of extents mis-match after bull fallocate"
>> +fi
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/generic/110.out b/tests/generic/110.out
>> new file mode 100644
>> index 0000000..64049da
>> --- /dev/null
>> +++ b/tests/generic/110.out
>> @@ -0,0 +1,3 @@
>> +QA output created by 110
>> +wrote 66560/66560 bytes at offset 0
>> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> diff --git a/tests/generic/group b/tests/generic/group
>> index 4ae256f..428f3e3 100644
>> --- a/tests/generic/group
>> +++ b/tests/generic/group
>> @@ -112,6 +112,7 @@
>>   107 auto quick metadata
>>   108 auto quick rw
>>   109 auto metadata dir
>> +110 auto quick prealloc
>>   112 rw aio auto quick
>>   113 rw aio auto quick
>>   117 attr auto quick
>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Sept. 29, 2015, 10:16 a.m. UTC | #4
? 2015?09?29? 17:55, Eryu Guan ??:
> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>> Normally, a bull fallocate call on a fully written and synced file
>> should not add an extent.
>>
>> But not all filesystem follows the correct behavior.
>>
>> Btrfs has a bug to always truncate the last page if the fallocate start
>> offset is smaller than inode size.
>>
>> So add this test case to detect such malfunction.
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>> v2:
>>    Add author info...
>>    Fix some comment typo
>
> (I was going to point these out then saw a v2 :)
>
>> ---
>>   tests/generic/110     | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   tests/generic/110.out |  3 ++
>>   tests/generic/group   |  1 +
>>   3 files changed, 82 insertions(+)
>>   create mode 100755 tests/generic/110
>>   create mode 100644 tests/generic/110.out
>>
>> diff --git a/tests/generic/110 b/tests/generic/110
>> new file mode 100755
>> index 0000000..b2b140c
>> --- /dev/null
>> +++ b/tests/generic/110
>> @@ -0,0 +1,78 @@
>> +#! /bin/bash
>> +# FS QA Test 110
>> +#
>> +# Test if fallocate will create uneeded extra tailing extent
>> +#
>> +#-----------------------------------------------------------------------
>> +# Copyright (c) 2015 Fujitsu.  All Rights Reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#-----------------------------------------------------------------------
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1	# failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> +	cd /
>> +	rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +. ./common/defrag
>> +
>> +# remove previous $seqres.full before test
>> +rm -f $seqres.full
>> +
>> +# real QA test starts here
>> +
>> +_supported_fs generic
>> +_supported_os IRIX Linux
>> +_require_scratch
>> +_need_to_be_root
>
> need a '_require_xfs_io_command "falloc"' here, so test _notrun
> correctly on ext2/3/NFS/CIFS that don't support fallocate.

Will add it.

>
>> +
>> +# Use 64K file size to match any sectorsize
>> +# And with a unaligned tailing range to ensure it will be at least 2 pages
>> +filesize=$(( 64 * 1024 + 1024 ))
>> +
>> +_scratch_mkfs > /dev/null 2>&1
>> +_scratch_mount
>> +$XFS_IO_PROG -f -c "pwrite 0 $filesize" $SCRATCH_MNT/foo | _filter_xfs_io
>> +sync
>> +orig_extent_nr=`_extent_count $SCRATCH_MNT/foo`
>> +
>> +# As all space are allocated and even written to disk, this falloc
>> +# should do nothing with extent modification.
>> +$XFS_IO_PROG -f -c "falloc 0 $filesize" $SCRATCH_MNT/foo
>> +sync
>> +new_extent_nr=`_extent_count $SCRATCH_MNT/foo`
>> +
>> +echo "orig: $orig_extent_nr, new: $new_extent_nr" >> $seqres.full
>> +
>> +if [ "x$orig_extent_nr" != "x$new_extent_nr" ]; then
>> +	echo "number of extents mis-match after bull fallocate"
>
> print out the $orig_extent_nr and $new_extent_nr in this failure case? I
> think it's useful to see the difference just from the output diff, don't
> have to check the full file.

The problem is, we can't ensure orig/new_extent_nr always be a constant 
value(1 for btrfs case).

Maybe there is some file system which can only allocate extent up to 
64K, then the orig_extent_nr will be 2 and fails on that filesystem.

So I follow the method used in btrfs/010 to do it.

Thanks,
Qu
>
> Thanks,
> Eryu
>
>> +fi
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/generic/110.out b/tests/generic/110.out
>> new file mode 100644
>> index 0000000..64049da
>> --- /dev/null
>> +++ b/tests/generic/110.out
>> @@ -0,0 +1,3 @@
>> +QA output created by 110
>> +wrote 66560/66560 bytes at offset 0
>> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> diff --git a/tests/generic/group b/tests/generic/group
>> index 4ae256f..428f3e3 100644
>> --- a/tests/generic/group
>> +++ b/tests/generic/group
>> @@ -112,6 +112,7 @@
>>   107 auto quick metadata
>>   108 auto quick rw
>>   109 auto metadata dir
>> +110 auto quick prealloc
>>   112 rw aio auto quick
>>   113 rw aio auto quick
>>   117 attr auto quick
>> --
>> 1.8.3.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana Sept. 29, 2015, 10:24 a.m. UTC | #5
On Tue, Sep 29, 2015 at 11:13 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> ? 2015?09?29? 18:00, Hugo Mills ??:
>>
>> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>>>
>>> Normally, a bull fallocate call on a fully written and synced file
>>> should not add an extent.
>>
>>
>>     What's a "bull" fallocate call? Is it a typo, or simply something
>> I'm not familiar with?
>>
>>     Hugo.
>
>
> Oh, it should be null.
> But null still seems not appropriate here.
>
> I mean a fallocate call which doesn't really allocate any space...
>
> Any good ideas?

"Test if fallocate will create uneeded extra tailing extent"

change it to:

"Test that calling fallocate against a range which is already
allocated does not create new file extents".

No need to categorize such case for fallocate imho (but I'm not a
native English speaker either).

>
> Thanks,
> Qu
>
>>
>>> But not all filesystem follows the correct behavior.
>>>
>>> Btrfs has a bug to always truncate the last page if the fallocate start
>>> offset is smaller than inode size.
>>>
>>> So add this test case to detect such malfunction.
>>>
>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> ---
>>> v2:
>>>    Add author info...
>>>    Fix some comment typo
>>> ---
>>>   tests/generic/110     | 78
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   tests/generic/110.out |  3 ++
>>>   tests/generic/group   |  1 +
>>>   3 files changed, 82 insertions(+)
>>>   create mode 100755 tests/generic/110
>>>   create mode 100644 tests/generic/110.out
>>>
>>> diff --git a/tests/generic/110 b/tests/generic/110
>>> new file mode 100755
>>> index 0000000..b2b140c
>>> --- /dev/null
>>> +++ b/tests/generic/110
>>> @@ -0,0 +1,78 @@
>>> +#! /bin/bash
>>> +# FS QA Test 110
>>> +#
>>> +# Test if fallocate will create uneeded extra tailing extent
>>> +#
>>> +#-----------------------------------------------------------------------
>>> +# Copyright (c) 2015 Fujitsu.  All Rights Reserved.
>>> +#
>>> +# This program is free software; you can redistribute it and/or
>>> +# modify it under the terms of the GNU General Public License as
>>> +# published by the Free Software Foundation.
>>> +#
>>> +# This program is distributed in the hope that it would be useful,
>>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> +# GNU General Public License for more details.
>>> +#
>>> +# You should have received a copy of the GNU General Public License
>>> +# along with this program; if not, write the Free Software Foundation,
>>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>>> +#-----------------------------------------------------------------------
>>> +#
>>> +
>>> +seq=`basename $0`
>>> +seqres=$RESULT_DIR/$seq
>>> +echo "QA output created by $seq"
>>> +
>>> +here=`pwd`
>>> +tmp=/tmp/$$
>>> +status=1       # failure is the default!
>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>>> +
>>> +_cleanup()
>>> +{
>>> +       cd /
>>> +       rm -f $tmp.*
>>> +}
>>> +
>>> +# get standard environment, filters and checks
>>> +. ./common/rc
>>> +. ./common/filter
>>> +. ./common/defrag
>>> +
>>> +# remove previous $seqres.full before test
>>> +rm -f $seqres.full
>>> +
>>> +# real QA test starts here
>>> +
>>> +_supported_fs generic
>>> +_supported_os IRIX Linux
>>> +_require_scratch
>>> +_need_to_be_root
>>> +
>>> +# Use 64K file size to match any sectorsize
>>> +# And with a unaligned tailing range to ensure it will be at least 2
>>> pages
>>> +filesize=$(( 64 * 1024 + 1024 ))
>>> +
>>> +_scratch_mkfs > /dev/null 2>&1
>>> +_scratch_mount
>>> +$XFS_IO_PROG -f -c "pwrite 0 $filesize" $SCRATCH_MNT/foo |
>>> _filter_xfs_io
>>> +sync
>>> +orig_extent_nr=`_extent_count $SCRATCH_MNT/foo`
>>> +
>>> +# As all space are allocated and even written to disk, this falloc
>>> +# should do nothing with extent modification.
>>> +$XFS_IO_PROG -f -c "falloc 0 $filesize" $SCRATCH_MNT/foo
>>> +sync
>>> +new_extent_nr=`_extent_count $SCRATCH_MNT/foo`
>>> +
>>> +echo "orig: $orig_extent_nr, new: $new_extent_nr" >> $seqres.full
>>> +
>>> +if [ "x$orig_extent_nr" != "x$new_extent_nr" ]; then
>>> +       echo "number of extents mis-match after bull fallocate"
>>> +fi
>>> +
>>> +# success, all done
>>> +status=0
>>> +exit
>>> diff --git a/tests/generic/110.out b/tests/generic/110.out
>>> new file mode 100644
>>> index 0000000..64049da
>>> --- /dev/null
>>> +++ b/tests/generic/110.out
>>> @@ -0,0 +1,3 @@
>>> +QA output created by 110
>>> +wrote 66560/66560 bytes at offset 0
>>> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>> diff --git a/tests/generic/group b/tests/generic/group
>>> index 4ae256f..428f3e3 100644
>>> --- a/tests/generic/group
>>> +++ b/tests/generic/group
>>> @@ -112,6 +112,7 @@
>>>   107 auto quick metadata
>>>   108 auto quick rw
>>>   109 auto metadata dir
>>> +110 auto quick prealloc
>>>   112 rw aio auto quick
>>>   113 rw aio auto quick
>>>   117 attr auto quick
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hugo Mills Sept. 29, 2015, 10:24 a.m. UTC | #6
On Tue, Sep 29, 2015 at 06:13:37PM +0800, Qu Wenruo wrote:
> 
> 
> ? 2015?09?29? 18:00, Hugo Mills ??:
> >On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
> >>Normally, a bull fallocate call on a fully written and synced file
> >>should not add an extent.
> >
> >    What's a "bull" fallocate call? Is it a typo, or simply something
> >I'm not familiar with?
> >
> >    Hugo.
> 
> Oh, it should be null.
> But null still seems not appropriate here.
> 
> I mean a fallocate call which doesn't really allocate any space...
> 
> Any good ideas?

   Not in a single word. Maybe one of the following:

Normally, an fallocate call which only touches existing extents should
not add any more extents on a fully written and synced file.

Normally, an fallocate call which does not allocate outside existing
extents should not add any more extents on a fully written and synced
file.

   I think I slightly prefer the second option.

   Hugo

> Thanks,
> Qu
> >
> >>But not all filesystem follows the correct behavior.
> >>
> >>Btrfs has a bug to always truncate the last page if the fallocate start
> >>offset is smaller than inode size.
> >>
> >>So add this test case to detect such malfunction.
> >>
> >>Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> >>---
> >>v2:
> >>   Add author info...
> >>   Fix some comment typo
> >>---
> >>  tests/generic/110     | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  tests/generic/110.out |  3 ++
> >>  tests/generic/group   |  1 +
> >>  3 files changed, 82 insertions(+)
> >>  create mode 100755 tests/generic/110
> >>  create mode 100644 tests/generic/110.out
> >>
> >>diff --git a/tests/generic/110 b/tests/generic/110
> >>new file mode 100755
> >>index 0000000..b2b140c
> >>--- /dev/null
> >>+++ b/tests/generic/110
> >>@@ -0,0 +1,78 @@
> >>+#! /bin/bash
> >>+# FS QA Test 110
> >>+#
> >>+# Test if fallocate will create uneeded extra tailing extent
> >>+#
> >>+#-----------------------------------------------------------------------
> >>+# Copyright (c) 2015 Fujitsu.  All Rights Reserved.
> >>+#
> >>+# This program is free software; you can redistribute it and/or
> >>+# modify it under the terms of the GNU General Public License as
> >>+# published by the Free Software Foundation.
> >>+#
> >>+# This program is distributed in the hope that it would be useful,
> >>+# but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >>+# GNU General Public License for more details.
> >>+#
> >>+# You should have received a copy of the GNU General Public License
> >>+# along with this program; if not, write the Free Software Foundation,
> >>+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> >>+#-----------------------------------------------------------------------
> >>+#
> >>+
> >>+seq=`basename $0`
> >>+seqres=$RESULT_DIR/$seq
> >>+echo "QA output created by $seq"
> >>+
> >>+here=`pwd`
> >>+tmp=/tmp/$$
> >>+status=1	# failure is the default!
> >>+trap "_cleanup; exit \$status" 0 1 2 3 15
> >>+
> >>+_cleanup()
> >>+{
> >>+	cd /
> >>+	rm -f $tmp.*
> >>+}
> >>+
> >>+# get standard environment, filters and checks
> >>+. ./common/rc
> >>+. ./common/filter
> >>+. ./common/defrag
> >>+
> >>+# remove previous $seqres.full before test
> >>+rm -f $seqres.full
> >>+
> >>+# real QA test starts here
> >>+
> >>+_supported_fs generic
> >>+_supported_os IRIX Linux
> >>+_require_scratch
> >>+_need_to_be_root
> >>+
> >>+# Use 64K file size to match any sectorsize
> >>+# And with a unaligned tailing range to ensure it will be at least 2 pages
> >>+filesize=$(( 64 * 1024 + 1024 ))
> >>+
> >>+_scratch_mkfs > /dev/null 2>&1
> >>+_scratch_mount
> >>+$XFS_IO_PROG -f -c "pwrite 0 $filesize" $SCRATCH_MNT/foo | _filter_xfs_io
> >>+sync
> >>+orig_extent_nr=`_extent_count $SCRATCH_MNT/foo`
> >>+
> >>+# As all space are allocated and even written to disk, this falloc
> >>+# should do nothing with extent modification.
> >>+$XFS_IO_PROG -f -c "falloc 0 $filesize" $SCRATCH_MNT/foo
> >>+sync
> >>+new_extent_nr=`_extent_count $SCRATCH_MNT/foo`
> >>+
> >>+echo "orig: $orig_extent_nr, new: $new_extent_nr" >> $seqres.full
> >>+
> >>+if [ "x$orig_extent_nr" != "x$new_extent_nr" ]; then
> >>+	echo "number of extents mis-match after bull fallocate"
> >>+fi
> >>+
> >>+# success, all done
> >>+status=0
> >>+exit
> >>diff --git a/tests/generic/110.out b/tests/generic/110.out
> >>new file mode 100644
> >>index 0000000..64049da
> >>--- /dev/null
> >>+++ b/tests/generic/110.out
> >>@@ -0,0 +1,3 @@
> >>+QA output created by 110
> >>+wrote 66560/66560 bytes at offset 0
> >>+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >>diff --git a/tests/generic/group b/tests/generic/group
> >>index 4ae256f..428f3e3 100644
> >>--- a/tests/generic/group
> >>+++ b/tests/generic/group
> >>@@ -112,6 +112,7 @@
> >>  107 auto quick metadata
> >>  108 auto quick rw
> >>  109 auto metadata dir
> >>+110 auto quick prealloc
> >>  112 rw aio auto quick
> >>  113 rw aio auto quick
> >>  117 attr auto quick
> >
Eryu Guan Sept. 29, 2015, 10:33 a.m. UTC | #7
On Tue, Sep 29, 2015 at 06:16:11PM +0800, Qu Wenruo wrote:
>
<snip>
> >>+
> >>+if [ "x$orig_extent_nr" != "x$new_extent_nr" ]; then
> >>+	echo "number of extents mis-match after bull fallocate"
> >
> >print out the $orig_extent_nr and $new_extent_nr in this failure case? I
> >think it's useful to see the difference just from the output diff, don't
> >have to check the full file.
> 
> The problem is, we can't ensure orig/new_extent_nr always be a constant
> value(1 for btrfs case).

Sorry, I might be unclear, I mean print the extent number in the error
path, e.g.

echo "number of extents mis-match after null fallocate"
echo "old: $orig_extent_nr, new: $new_extent_nr"

not matching the number with golden output.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Sept. 29, 2015, 10:46 a.m. UTC | #8
? 2015?09?29? 18:33, Eryu Guan ??:
> On Tue, Sep 29, 2015 at 06:16:11PM +0800, Qu Wenruo wrote:
>>
> <snip>
>>>> +
>>>> +if [ "x$orig_extent_nr" != "x$new_extent_nr" ]; then
>>>> +	echo "number of extents mis-match after bull fallocate"
>>>
>>> print out the $orig_extent_nr and $new_extent_nr in this failure case? I
>>> think it's useful to see the difference just from the output diff, don't
>>> have to check the full file.
>>
>> The problem is, we can't ensure orig/new_extent_nr always be a constant
>> value(1 for btrfs case).
>
> Sorry, I might be unclear, I mean print the extent number in the error
> path, e.g.
>
> echo "number of extents mis-match after null fallocate"
> echo "old: $orig_extent_nr, new: $new_extent_nr"
>
> not matching the number with golden output.
>
> Thanks,
> Eryu

That sounds pretty nice.
Will change it in next version.

Thanks,
Qu
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Sept. 29, 2015, 10:48 a.m. UTC | #9
? 2015?09?29? 18:24, Filipe Manana ??:
> On Tue, Sep 29, 2015 at 11:13 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> ? 2015?09?29? 18:00, Hugo Mills ??:
>>>
>>> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>>>>
>>>> Normally, a bull fallocate call on a fully written and synced file
>>>> should not add an extent.
>>>
>>>
>>>      What's a "bull" fallocate call? Is it a typo, or simply something
>>> I'm not familiar with?
>>>
>>>      Hugo.
>>
>>
>> Oh, it should be null.
>> But null still seems not appropriate here.
>>
>> I mean a fallocate call which doesn't really allocate any space...
>>
>> Any good ideas?
>
> "Test if fallocate will create uneeded extra tailing extent"
>
> change it to:
>
> "Test that calling fallocate against a range which is already
> allocated does not create new file extents".
>
> No need to categorize such case for fallocate imho (but I'm not a
> native English speaker either).

Thank you, Filipe and Huge,

Both gives quite good expression, I'll pick one of them.

Thanks,
Qu
>
>>
>> Thanks,
>> Qu
>>
>>>
>>>> But not all filesystem follows the correct behavior.
>>>>
>>>> Btrfs has a bug to always truncate the last page if the fallocate start
>>>> offset is smaller than inode size.
>>>>
>>>> So add this test case to detect such malfunction.
>>>>
>>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>> ---
>>>> v2:
>>>>     Add author info...
>>>>     Fix some comment typo
>>>> ---
>>>>    tests/generic/110     | 78
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    tests/generic/110.out |  3 ++
>>>>    tests/generic/group   |  1 +
>>>>    3 files changed, 82 insertions(+)
>>>>    create mode 100755 tests/generic/110
>>>>    create mode 100644 tests/generic/110.out
>>>>
>>>> diff --git a/tests/generic/110 b/tests/generic/110
>>>> new file mode 100755
>>>> index 0000000..b2b140c
>>>> --- /dev/null
>>>> +++ b/tests/generic/110
>>>> @@ -0,0 +1,78 @@
>>>> +#! /bin/bash
>>>> +# FS QA Test 110
>>>> +#
>>>> +# Test if fallocate will create uneeded extra tailing extent
>>>> +#
>>>> +#-----------------------------------------------------------------------
>>>> +# Copyright (c) 2015 Fujitsu.  All Rights Reserved.
>>>> +#
>>>> +# This program is free software; you can redistribute it and/or
>>>> +# modify it under the terms of the GNU General Public License as
>>>> +# published by the Free Software Foundation.
>>>> +#
>>>> +# This program is distributed in the hope that it would be useful,
>>>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> +# GNU General Public License for more details.
>>>> +#
>>>> +# You should have received a copy of the GNU General Public License
>>>> +# along with this program; if not, write the Free Software Foundation,
>>>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>>>> +#-----------------------------------------------------------------------
>>>> +#
>>>> +
>>>> +seq=`basename $0`
>>>> +seqres=$RESULT_DIR/$seq
>>>> +echo "QA output created by $seq"
>>>> +
>>>> +here=`pwd`
>>>> +tmp=/tmp/$$
>>>> +status=1       # failure is the default!
>>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>>>> +
>>>> +_cleanup()
>>>> +{
>>>> +       cd /
>>>> +       rm -f $tmp.*
>>>> +}
>>>> +
>>>> +# get standard environment, filters and checks
>>>> +. ./common/rc
>>>> +. ./common/filter
>>>> +. ./common/defrag
>>>> +
>>>> +# remove previous $seqres.full before test
>>>> +rm -f $seqres.full
>>>> +
>>>> +# real QA test starts here
>>>> +
>>>> +_supported_fs generic
>>>> +_supported_os IRIX Linux
>>>> +_require_scratch
>>>> +_need_to_be_root
>>>> +
>>>> +# Use 64K file size to match any sectorsize
>>>> +# And with a unaligned tailing range to ensure it will be at least 2
>>>> pages
>>>> +filesize=$(( 64 * 1024 + 1024 ))
>>>> +
>>>> +_scratch_mkfs > /dev/null 2>&1
>>>> +_scratch_mount
>>>> +$XFS_IO_PROG -f -c "pwrite 0 $filesize" $SCRATCH_MNT/foo |
>>>> _filter_xfs_io
>>>> +sync
>>>> +orig_extent_nr=`_extent_count $SCRATCH_MNT/foo`
>>>> +
>>>> +# As all space are allocated and even written to disk, this falloc
>>>> +# should do nothing with extent modification.
>>>> +$XFS_IO_PROG -f -c "falloc 0 $filesize" $SCRATCH_MNT/foo
>>>> +sync
>>>> +new_extent_nr=`_extent_count $SCRATCH_MNT/foo`
>>>> +
>>>> +echo "orig: $orig_extent_nr, new: $new_extent_nr" >> $seqres.full
>>>> +
>>>> +if [ "x$orig_extent_nr" != "x$new_extent_nr" ]; then
>>>> +       echo "number of extents mis-match after bull fallocate"
>>>> +fi
>>>> +
>>>> +# success, all done
>>>> +status=0
>>>> +exit
>>>> diff --git a/tests/generic/110.out b/tests/generic/110.out
>>>> new file mode 100644
>>>> index 0000000..64049da
>>>> --- /dev/null
>>>> +++ b/tests/generic/110.out
>>>> @@ -0,0 +1,3 @@
>>>> +QA output created by 110
>>>> +wrote 66560/66560 bytes at offset 0
>>>> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> diff --git a/tests/generic/group b/tests/generic/group
>>>> index 4ae256f..428f3e3 100644
>>>> --- a/tests/generic/group
>>>> +++ b/tests/generic/group
>>>> @@ -112,6 +112,7 @@
>>>>    107 auto quick metadata
>>>>    108 auto quick rw
>>>>    109 auto metadata dir
>>>> +110 auto quick prealloc
>>>>    112 rw aio auto quick
>>>>    113 rw aio auto quick
>>>>    117 attr auto quick
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Sept. 29, 2015, 9:51 p.m. UTC | #10
On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
> Normally, a bull fallocate call on a fully written and synced file
> should not add an extent.

Why not? Filesystems can do whatever they want with extents during
a fallocate call. e.g. if the blocks are shared, then fallocate
might break the block sharing so future overwrites don't get
ENOSPC. This is a requirement set down by posix_fallocate(3)

"After a successful call to posix_fallocate(), subsequent writes to
bytes in the specified range are guaranteed not  to fail because of
lack of disk space."

Hence if you've got a file with shared blocks, a "full fallocate"
must change the extent layout to break the sharing. As such, the
premise of this test is wrong.

That's not to say that btrfs has a bug:

> Btrfs has a bug to always truncate the last page if the fallocate start
> offset is smaller than inode size.

But it' not clear that this behaviour is actually a bug if it's not
changing the file data.

Cheers,

Dave.
Qu Wenruo Sept. 30, 2015, 1:05 a.m. UTC | #11
Dave Chinner wrote on 2015/09/30 07:51 +1000:
> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>> Normally, a bull fallocate call on a fully written and synced file
>> should not add an extent.
>
> Why not? Filesystems can do whatever they want with extents during
> a fallocate call. e.g. if the blocks are shared, then fallocate
> might break the block sharing so future overwrites don't get
> ENOSPC. This is a requirement set down by posix_fallocate(3)
>
> "After a successful call to posix_fallocate(), subsequent writes to
> bytes in the specified range are guaranteed not  to fail because of
> lack of disk space."
>
> Hence if you've got a file with shared blocks, a "full fallocate"
> must change the extent layout to break the sharing. As such, the
> premise of this test is wrong.

First, btrfs never meets the posix_fallocate requirement by its COW nature.

Btrfs fallocate can only ensure at most next write will not cause ENOSPC.
Only when btrfs fallocate a new prealloc extent, next write into it may 
use the space if they are not shared between different snapshots.

Under most case, btrfs fallocate follows the behavior of other non-COW 
filesystems.
Which means, btrfs won't alloc new extent if there is existing extent, 
not matter if it's shared or not.

As a result, fallocate in btrfs only works in a limited use-case, and 
can easily break posix requirement.
Like the following case without snapshots:
1)Fallocate 0~50M
2)Write 0~50M		<- Will not return ENOSPC
3)Write 0~25M		<- COW happens, allocate another 25M,
			   may cause ENOSPC.
Or even easier with snapshot:
1)Fallocate 0~50M in subvol A
2)Snapshot subvol A into snap B
3)Write 0~25M in subvol A *OR* snap B <- COW happens, may cause ENOSPC

As in step 3), fallocated 50M is shared, so write will be forced COW.

So I'd prefer to make btrfs follows the behavior of other non-COW 
filesystems, as posix standard doesn't fit well here.
>
> That's not to say that btrfs has a bug:
>
>> Btrfs has a bug to always truncate the last page if the fallocate start
>> offset is smaller than inode size.
>
> But it' not clear that this behaviour is actually a bug if it's not
> changing the file data.
File data is not changed, as btrfs just COW the last tailing page, as 
reset the last already 0 part.

Like the follow ascii arts:

0)Before
0	4K	8K	12K	16K
|///////////////////////////|000|
|<----------Extent A----------->|
The file is 14K size, on disk(to be accurate, btrfs logical address 
space) it takes 16K, with last 2K padding 0.

And all that 16K is in extent A.

1)Fallocate 0~14K
In fact, all space in range 0~14K is allocated, so there is no need to 
reallocate any space.

2)But in btrfs
Result will be:
0	4K	8K	12K	16K
|///////////////////////////|000|
|<------Extent A------->|<--B-->|

Btrfs has a wrong judgment, which will always re-padding the last page.
Causing a new extent, extent B to be created.
Even the contents is the same with original last page.

It's OK not to consider it as a bug, at least data is not corrupted.
But IMO the btrfs behavior is not needed and need optimization.
So kernel patch is submitted to btrfs ml:
https://patchwork.kernel.org/patch/7284431/

And if fstests is not the proper place, any idea where such "test case" 
should belong?

Thanks,
Qu

>
> Cheers,
>
> Dave.
>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tsutomu Itoh Sept. 30, 2015, 1:45 a.m. UTC | #12
On 2015/09/30 10:05, Qu Wenruo wrote:
> Dave Chinner wrote on 2015/09/30 07:51 +1000:
>> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>>> Normally, a bull fallocate call on a fully written and synced file
>>> should not add an extent.
>>
>> Why not? Filesystems can do whatever they want with extents during
>> a fallocate call. e.g. if the blocks are shared, then fallocate
>> might break the block sharing so future overwrites don't get
>> ENOSPC. This is a requirement set down by posix_fallocate(3)
>>
>> "After a successful call to posix_fallocate(), subsequent writes to
>> bytes in the specified range are guaranteed not  to fail because of
>> lack of disk space."
>>
>> Hence if you've got a file with shared blocks, a "full fallocate"
>> must change the extent layout to break the sharing. As such, the
>> premise of this test is wrong.
>
> First, btrfs never meets the posix_fallocate requirement by its COW nature.
>
> Btrfs fallocate can only ensure at most next write will not cause ENOSPC.
> Only when btrfs fallocate a new prealloc extent, next write into it may use the space if they are not shared between different snapshots.
>
> Under most case, btrfs fallocate follows the behavior of other non-COW filesystems.
> Which means, btrfs won't alloc new extent if there is existing extent, not matter if it's shared or not.
>
> As a result, fallocate in btrfs only works in a limited use-case, and can easily break posix requirement.
> Like the following case without snapshots:
> 1)Fallocate 0~50M
> 2)Write 0~50M        <- Will not return ENOSPC
> 3)Write 0~25M        <- COW happens, allocate another 25M,
>                 may cause ENOSPC.
> Or even easier with snapshot:
> 1)Fallocate 0~50M in subvol A
> 2)Snapshot subvol A into snap B
> 3)Write 0~25M in subvol A *OR* snap B <- COW happens, may cause ENOSPC
>
> As in step 3), fallocated 50M is shared, so write will be forced COW.
>
> So I'd prefer to make btrfs follows the behavior of other non-COW filesystems, as posix standard doesn't fit well here.
>>
>> That's not to say that btrfs has a bug:
>>
>>> Btrfs has a bug to always truncate the last page if the fallocate start
>>> offset is smaller than inode size.
>>
>> But it' not clear that this behaviour is actually a bug if it's not
>> changing the file data.
> File data is not changed, as btrfs just COW the last tailing page, as reset the last already 0 part.
>
> Like the follow ascii arts:
>
> 0)Before
> 0    4K    8K    12K    16K
> |///////////////////////////|000|
> |<----------Extent A----------->|
> The file is 14K size, on disk(to be accurate, btrfs logical address space) it takes 16K, with last 2K padding 0.
>
> And all that 16K is in extent A.
>
> 1)Fallocate 0~14K
> In fact, all space in range 0~14K is allocated, so there is no need to reallocate any space.
>
> 2)But in btrfs
> Result will be:
> 0    4K    8K    12K    16K
> |///////////////////////////|000|
> |<------Extent A------->|<--B-->|
>
> Btrfs has a wrong judgment, which will always re-padding the last page.
> Causing a new extent, extent B to be created.
> Even the contents is the same with original last page.
>
> It's OK not to consider it as a bug, at least data is not corrupted.
> But IMO the btrfs behavior is not needed and need optimization.

> So kernel patch is submitted to btrfs ml:
> https://patchwork.kernel.org/patch/7284431/

Is this https://patchwork.kernel.org/patch/7283461/ ?  Right?

Thanks,
Tsutomu

>
> And if fstests is not the proper place, any idea where such "test case" should belong?
>
> Thanks,
> Qu
>
>>
>> Cheers,
>>
>> Dave.
>>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Sept. 30, 2015, 1:49 a.m. UTC | #13
Tsutomu Itoh wrote on 2015/09/30 10:45 +0900:
> On 2015/09/30 10:05, Qu Wenruo wrote:
>> Dave Chinner wrote on 2015/09/30 07:51 +1000:
>>> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>>>> Normally, a bull fallocate call on a fully written and synced file
>>>> should not add an extent.
>>>
>>> Why not? Filesystems can do whatever they want with extents during
>>> a fallocate call. e.g. if the blocks are shared, then fallocate
>>> might break the block sharing so future overwrites don't get
>>> ENOSPC. This is a requirement set down by posix_fallocate(3)
>>>
>>> "After a successful call to posix_fallocate(), subsequent writes to
>>> bytes in the specified range are guaranteed not  to fail because of
>>> lack of disk space."
>>>
>>> Hence if you've got a file with shared blocks, a "full fallocate"
>>> must change the extent layout to break the sharing. As such, the
>>> premise of this test is wrong.
>>
>> First, btrfs never meets the posix_fallocate requirement by its COW
>> nature.
>>
>> Btrfs fallocate can only ensure at most next write will not cause ENOSPC.
>> Only when btrfs fallocate a new prealloc extent, next write into it
>> may use the space if they are not shared between different snapshots.
>>
>> Under most case, btrfs fallocate follows the behavior of other non-COW
>> filesystems.
>> Which means, btrfs won't alloc new extent if there is existing extent,
>> not matter if it's shared or not.
>>
>> As a result, fallocate in btrfs only works in a limited use-case, and
>> can easily break posix requirement.
>> Like the following case without snapshots:
>> 1)Fallocate 0~50M
>> 2)Write 0~50M        <- Will not return ENOSPC
>> 3)Write 0~25M        <- COW happens, allocate another 25M,
>>                 may cause ENOSPC.
>> Or even easier with snapshot:
>> 1)Fallocate 0~50M in subvol A
>> 2)Snapshot subvol A into snap B
>> 3)Write 0~25M in subvol A *OR* snap B <- COW happens, may cause ENOSPC
>>
>> As in step 3), fallocated 50M is shared, so write will be forced COW.
>>
>> So I'd prefer to make btrfs follows the behavior of other non-COW
>> filesystems, as posix standard doesn't fit well here.
>>>
>>> That's not to say that btrfs has a bug:
>>>
>>>> Btrfs has a bug to always truncate the last page if the fallocate start
>>>> offset is smaller than inode size.
>>>
>>> But it' not clear that this behaviour is actually a bug if it's not
>>> changing the file data.
>> File data is not changed, as btrfs just COW the last tailing page, as
>> reset the last already 0 part.
>>
>> Like the follow ascii arts:
>>
>> 0)Before
>> 0    4K    8K    12K    16K
>> |///////////////////////////|000|
>> |<----------Extent A----------->|
>> The file is 14K size, on disk(to be accurate, btrfs logical address
>> space) it takes 16K, with last 2K padding 0.
>>
>> And all that 16K is in extent A.
>>
>> 1)Fallocate 0~14K
>> In fact, all space in range 0~14K is allocated, so there is no need to
>> reallocate any space.
>>
>> 2)But in btrfs
>> Result will be:
>> 0    4K    8K    12K    16K
>> |///////////////////////////|000|
>> |<------Extent A------->|<--B-->|
>>
>> Btrfs has a wrong judgment, which will always re-padding the last page.
>> Causing a new extent, extent B to be created.
>> Even the contents is the same with original last page.
>>
>> It's OK not to consider it as a bug, at least data is not corrupted.
>> But IMO the btrfs behavior is not needed and need optimization.
>
>> So kernel patch is submitted to btrfs ml:
>> https://patchwork.kernel.org/patch/7284431/
>
> Is this https://patchwork.kernel.org/patch/7283461/ ?  Right?
>
> Thanks,
> Tsutomu

Oh, my fault, 728461 is the right patch...

Thanks for pointing this out,
Qu
>
>>
>> And if fstests is not the proper place, any idea where such "test
>> case" should belong?
>>
>> Thanks,
>> Qu
>>
>>>
>>> Cheers,
>>>
>>> Dave.
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Sept. 30, 2015, 4:20 a.m. UTC | #14
On Wed, Sep 30, 2015 at 09:05:15AM +0800, Qu Wenruo wrote:
> 
> 
> Dave Chinner wrote on 2015/09/30 07:51 +1000:
> >On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
> >>Normally, a bull fallocate call on a fully written and synced file
> >>should not add an extent.
> >
> >Why not? Filesystems can do whatever they want with extents during
> >a fallocate call. e.g. if the blocks are shared, then fallocate
> >might break the block sharing so future overwrites don't get
> >ENOSPC. This is a requirement set down by posix_fallocate(3)
> >
> >"After a successful call to posix_fallocate(), subsequent writes to
> >bytes in the specified range are guaranteed not  to fail because of
> >lack of disk space."
> >
> >Hence if you've got a file with shared blocks, a "full fallocate"
> >must change the extent layout to break the sharing. As such, the
> >premise of this test is wrong.
> 
> First, btrfs never meets the posix_fallocate requirement by its COW nature.
> 
> Btrfs fallocate can only ensure at most next write will not cause ENOSPC.

Which, it can be successfully argued, meets the basic requirement of
posix_fallocate().  i.e. "subsequent writes" is "one or more" future
writes.

But in trying to explain how COW works you've completely missed the
point I was making about a fundamental principle that COW is based
on - overwrite requires allocation. Hence fallocate must be allowed
modify the underlying layout of a file, even if the file is already
full of allocated blocks and data.

This isn't just btrfs - any filesystem that does dedupe or reflink
or snapshots or compression or any other sort of operation that can
cause extent reallocation on overwrite *may* change the file layout
during a fallocate call to guarantee the next write succeeds.

> It's OK not to consider it as a bug, at least data is not corrupted.
> But IMO the btrfs behavior is not needed and need optimization.
> So kernel patch is submitted to btrfs ml:
> https://patchwork.kernel.org/patch/7284431/

That's a link to the fstests patch, not a btrfs kernel patch. :/

> And if fstests is not the proper place, any idea where such "test
> case" should belong?

You still haven't understood what I said. If you want to test that
btrfs does not truncate extents beyond EOF when fallocate is called,
then it's a btrfs test. Yes, You can do whatever you want with
btrfs, but you've proposed a generic test that applies a constraint
to a generic operation that has no such constraint defined in it's
API. If you want to constrain fallocate behaviour like this, then
take it to linux-fsdevel and get everyone to agree on the
constraint, and then I'll take it as a generic test...

Cheers,

Dave.
Qu Wenruo Sept. 30, 2015, 5:48 a.m. UTC | #15
Dave Chinner wrote on 2015/09/30 14:20 +1000:
> On Wed, Sep 30, 2015 at 09:05:15AM +0800, Qu Wenruo wrote:
>>
>>
>> Dave Chinner wrote on 2015/09/30 07:51 +1000:
>>> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>>>> Normally, a bull fallocate call on a fully written and synced file
>>>> should not add an extent.
>>>
>>> Why not? Filesystems can do whatever they want with extents during
>>> a fallocate call. e.g. if the blocks are shared, then fallocate
>>> might break the block sharing so future overwrites don't get
>>> ENOSPC. This is a requirement set down by posix_fallocate(3)
>>>
>>> "After a successful call to posix_fallocate(), subsequent writes to
>>> bytes in the specified range are guaranteed not  to fail because of
>>> lack of disk space."
>>>
>>> Hence if you've got a file with shared blocks, a "full fallocate"
>>> must change the extent layout to break the sharing. As such, the
>>> premise of this test is wrong.
>>
>> First, btrfs never meets the posix_fallocate requirement by its COW nature.
>>
>> Btrfs fallocate can only ensure at most next write will not cause ENOSPC.
>
> Which, it can be successfully argued, meets the basic requirement of
> posix_fallocate().  i.e. "subsequent writes" is "one or more" future
> writes.
>
> But in trying to explain how COW works you've completely missed the
> point I was making about a fundamental principle that COW is based
> on - overwrite requires allocation. Hence fallocate must be allowed
> modify the underlying layout of a file, even if the file is already
> full of allocated blocks and data.
>
> This isn't just btrfs - any filesystem that does dedupe or reflink
> or snapshots or compression or any other sort of operation that can
> cause extent reallocation on overwrite *may* change the file layout
> during a fallocate call to guarantee the next write succeeds.
>
>> It's OK not to consider it as a bug, at least data is not corrupted.
>> But IMO the btrfs behavior is not needed and need optimization.
>> So kernel patch is submitted to btrfs ml:
>> https://patchwork.kernel.org/patch/7284431/
>
> That's a link to the fstests patch, not a btrfs kernel patch. :/

Sorry, the real patch is https://patchwork.kernel.org/patch/7283461/

>
>> And if fstests is not the proper place, any idea where such "test
>> case" should belong?
>
> You still haven't understood what I said. If you want to test that
> btrfs does not truncate extents beyond EOF when fallocate is called,
> then it's a btrfs test. Yes, You can do whatever you want with
> btrfs, but you've proposed a generic test that applies a constraint
> to a generic operation that has no such constraint defined in it's
> API. If you want to constrain fallocate behaviour like this, then
> take it to linux-fsdevel and get everyone to agree on the
> constraint, and then I'll take it as a generic test...

Now, I think I understand the point.
The test case is not for generic, but for Btrfs only.

The reason is:
Fallocate should be able to change extent layout
Even all extent is allocated, for its purpose.

So the assumption for the test case is not valid, and how a filesystem 
handle the last page truncate should follow its own standard.

This is convincing now.

I'll change it to btrfs test.

BTW, at least xfs doesn't truncate beyond EOF, will xfs also need such test?

Thanks,
Qu

>
> Cheers,
>
> Dave.
>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Oct. 2, 2015, 8:35 a.m. UTC | #16
Hi Dave, I updated the patch and moved it to btrfs.

But I still has some question about the fallocate behavior.

Just as the new btrfs test case, I changed the fallocate range, not to 
cover the last part, to make the problem more obvious:

Btrfs will truncate beyond EOF even that's *not covered* by the 
fallocate range.

It's OK for a fs to modify the extent layout during fallocate, but is it 
OK to modify extent layout completely *out* of the fallocate range?

Thanks,
Qu

? 2015?09?30? 05:51, Dave Chinner ??:
> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>> Normally, a bull fallocate call on a fully written and synced file
>> should not add an extent.
>
> Why not? Filesystems can do whatever they want with extents during
> a fallocate call. e.g. if the blocks are shared, then fallocate
> might break the block sharing so future overwrites don't get
> ENOSPC. This is a requirement set down by posix_fallocate(3)
>
> "After a successful call to posix_fallocate(), subsequent writes to
> bytes in the specified range are guaranteed not  to fail because of
> lack of disk space."
>
> Hence if you've got a file with shared blocks, a "full fallocate"
> must change the extent layout to break the sharing. As such, the
> premise of this test is wrong.
>
> That's not to say that btrfs has a bug:
>
>> Btrfs has a bug to always truncate the last page if the fallocate start
>> offset is smaller than inode size.
>
> But it' not clear that this behaviour is actually a bug if it's not
> changing the file data.
>
> Cheers,
>
> Dave.
>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Oct. 6, 2015, 1:31 a.m. UTC | #17
On Fri, Oct 02, 2015 at 04:35:53PM +0800, Qu Wenruo wrote:
> Hi Dave, I updated the patch and moved it to btrfs.
> 
> But I still has some question about the fallocate behavior.
> 
> Just as the new btrfs test case, I changed the fallocate range, not
> to cover the last part, to make the problem more obvious:
> 
> Btrfs will truncate beyond EOF even that's *not covered* by the
> fallocate range.
> 
> It's OK for a fs to modify the extent layout during fallocate, but
> is it OK to modify extent layout completely *out* of the fallocate
> range?

It's beyond EOF, so the filesystem can make whatever changes to
allocated blocks in that space whenever it likes because userspace
cannot access it.

In XFS, we don't remove unwritten extents beyond EOF if they were
placed there by fallocate except via explicit truncate() or
fallocate() operations (e.g. hole punch) from userspace that
manipulate that extent range beyond EOF.

What other filesytems do with blocks beyond EOF in any given
operation is up to the filesystem, really. If btrfs wants to
truncate away all extents beyond EOF when punching a hole that spans
EOF, then you can do that. It might not be what the user expects,
but blocks beyond EOF don't fall under posix_fallocate() because
it explicitly states:

"If the size of the file is less than offset+len, then the file is
increased to this size; otherwise the file size is left unchanged."

which means it can't allocate blocks beyond EOF....

Cheers,

Dave.
diff mbox

Patch

diff --git a/tests/generic/110 b/tests/generic/110
new file mode 100755
index 0000000..b2b140c
--- /dev/null
+++ b/tests/generic/110
@@ -0,0 +1,78 @@ 
+#! /bin/bash
+# FS QA Test 110
+#
+# Test if fallocate will create uneeded extra tailing extent
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2015 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/defrag
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs generic
+_supported_os IRIX Linux
+_require_scratch
+_need_to_be_root
+
+# Use 64K file size to match any sectorsize
+# And with a unaligned tailing range to ensure it will be at least 2 pages
+filesize=$(( 64 * 1024 + 1024 ))
+
+_scratch_mkfs > /dev/null 2>&1
+_scratch_mount
+$XFS_IO_PROG -f -c "pwrite 0 $filesize" $SCRATCH_MNT/foo | _filter_xfs_io
+sync
+orig_extent_nr=`_extent_count $SCRATCH_MNT/foo`
+
+# As all space are allocated and even written to disk, this falloc
+# should do nothing with extent modification.
+$XFS_IO_PROG -f -c "falloc 0 $filesize" $SCRATCH_MNT/foo
+sync
+new_extent_nr=`_extent_count $SCRATCH_MNT/foo`
+
+echo "orig: $orig_extent_nr, new: $new_extent_nr" >> $seqres.full
+
+if [ "x$orig_extent_nr" != "x$new_extent_nr" ]; then
+	echo "number of extents mis-match after bull fallocate"
+fi
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/110.out b/tests/generic/110.out
new file mode 100644
index 0000000..64049da
--- /dev/null
+++ b/tests/generic/110.out
@@ -0,0 +1,3 @@ 
+QA output created by 110
+wrote 66560/66560 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/generic/group b/tests/generic/group
index 4ae256f..428f3e3 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -112,6 +112,7 @@ 
 107 auto quick metadata
 108 auto quick rw
 109 auto metadata dir
+110 auto quick prealloc
 112 rw aio auto quick
 113 rw aio auto quick
 117 attr auto quick