diff mbox series

[8/8] xfs: test for premature ENOSPC with large cow delalloc extents

Message ID 170899915333.896550.18395785595853879309.stgit@frogsfrogsfrogs (mailing list archive)
State New, archived
Headers show
Series [1/8] generic/604: try to make race occur reliably | expand

Commit Message

Darrick J. Wong Feb. 27, 2024, 2:02 a.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

On a higly fragmented filesystem a Direct IO write can fail with -ENOSPC error
even though the filesystem has sufficient number of free blocks.

This occurs if the file offset range on which the write operation is being
performed has a delalloc extent in the cow fork and this delalloc extent
begins much before the Direct IO range.

In such a scenario, xfs_reflink_allocate_cow() invokes xfs_bmapi_write() to
allocate the blocks mapped by the delalloc extent. The extent thus allocated
may not cover the beginning of file offset range on which the Direct IO write
was issued. Hence xfs_reflink_allocate_cow() ends up returning -ENOSPC.

This test addresses this issue.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/rc          |   14 +++++++++
 tests/xfs/1923     |   85 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/1923.out |    8 +++++
 3 files changed, 107 insertions(+)
 create mode 100755 tests/xfs/1923
 create mode 100644 tests/xfs/1923.out

Comments

Zorro Lang Feb. 27, 2024, 6 a.m. UTC | #1
On Mon, Feb 26, 2024 at 06:02:37PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> On a higly fragmented filesystem a Direct IO write can fail with -ENOSPC error
> even though the filesystem has sufficient number of free blocks.
> 
> This occurs if the file offset range on which the write operation is being
> performed has a delalloc extent in the cow fork and this delalloc extent
> begins much before the Direct IO range.
> 
> In such a scenario, xfs_reflink_allocate_cow() invokes xfs_bmapi_write() to
> allocate the blocks mapped by the delalloc extent. The extent thus allocated
> may not cover the beginning of file offset range on which the Direct IO write
> was issued. Hence xfs_reflink_allocate_cow() ends up returning -ENOSPC.
> 
> This test addresses this issue.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  common/rc          |   14 +++++++++
>  tests/xfs/1923     |   85 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/1923.out |    8 +++++
>  3 files changed, 107 insertions(+)
>  create mode 100755 tests/xfs/1923
>  create mode 100644 tests/xfs/1923.out
> 
> 
> diff --git a/common/rc b/common/rc
> index 30c44dddd9..d3a2a0718b 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -1873,6 +1873,20 @@ _require_scratch_delalloc()
>  	_scratch_unmount
>  }
>  
> +# Require test fs supports delay allocation.
> +_require_test_delalloc()
> +{
> +	_require_command "$FILEFRAG_PROG" filefrag
> +
> +	rm -f $TEST_DIR/testy
> +	$XFS_IO_PROG -f -c 'pwrite 0 64k' $TEST_DIR/testy &> /dev/null
> +	$FILEFRAG_PROG -v $TEST_DIR/testy 2>&1 | grep -q delalloc

I'm wondering if it's a 100% reliable way to get the "delalloc" flag when
the delalloc is supported? If not, is it worth testing with several files
or a loop, then return 0 if one of them get delalloc ?

> +	res=$?
> +	rm -f $TEST_DIR/testy
> +	test $res -eq 0 || \
> +		_notrun "test requires delayed allocation buffered writes"
> +}
> +
>  # this test needs a test partition - check we're ok & mount it
>  #
>  _require_test()
> diff --git a/tests/xfs/1923 b/tests/xfs/1923
> new file mode 100755
> index 0000000000..4e494ad8c2
> --- /dev/null
> +++ b/tests/xfs/1923
> @@ -0,0 +1,85 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
> +#
> +# FS QA Test No. 1923
> +#
> +# This is a regression test for "xfs: Fix false ENOSPC when performing direct
> +# write on a delalloc extent in cow fork".  If there is a lot of free space but
> +# it is very fragmented, it's possible that a very large delalloc reservation
> +# could be created in the CoW fork by a buffered write.  If a directio write
> +# tries to convert the delalloc reservation to a real extent, it's possible
> +# that the allocation will succeed but fail to convert even the first block of
> +# the directio write range.  In this case, XFS will return ENOSPC even though
> +# all it needed to do was to keep converting until the allocator returns ENOSPC
> +# or the first block of the direct write got some space.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick clone
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $file1 $file2 $fragmentedfile
> +}
> +
> +# Import common functions.
> +. ./common/reflink
> +. ./common/inject
> +
> +# real QA test starts here
> +_fixed_by_kernel_commit XXXXX \
> +	"xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork"

Is it this commit below, or a new fix with same subject?

commit d62113303d691bcd8d0675ae4ac63e7769afc56c
Author: Chandan Babu R <chandan.babu@oracle.com>
Date:   Thu Aug 4 08:59:27 2022 -0700

    xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork

If it's an old commit, please replace the "XXXXX" to "d62113303d69".

> +
> +# Modify as appropriate.
> +_supported_fs generic

"xfs"? As it's in tests/xfs/ directory.

> +_require_test_program "punch-alternating"
> +_require_test_reflink
> +_require_xfs_io_error_injection "bmap_alloc_minlen_extent"
> +_require_test_delalloc
> +
> +file1=$TEST_DIR/file1.$seq
> +file2=$TEST_DIR/file2.$seq
> +fragmentedfile=$TEST_DIR/fragmentedfile.$seq

As you use the $TEST_DIR, it might be worth making sure these files
aren't existed. Due to the files (from other cases) in $TEST_DIR might
not be cleaned, especially if they don't take much free space.

So if these 3 files won't take much space, we can keep them, don't need
a specific _cleanup(). And move the "rm -f $file1 $file2 $fragmentedfile"
at here.

> +
> +# COW operates on pages, so we must not perform operations in units smaller
> +# than a page.
> +blksz=$(_get_file_block_size $TEST_DIR)
> +pagesz=$(_get_page_size)
> +if (( $blksz < $pagesz )); then
> +	blksz=$pagesz
> +fi

Just curious, this's a xfs specific test case, can xfs support blocksize >
pagesize? If not, can we just use pagesz directly at here ?

> +
> +echo "Create source file"
> +$XFS_IO_PROG -f -c "pwrite 0 $((blksz * 256))" $file1 >> $seqres.full
> +
> +sync
> +
> +echo "Create Reflinked file"
> +_cp_reflink $file1 $file2 >> $seqres.full
> +#$XFS_IO_PROG -f -c "reflink $file1" $file2 >> $seqres.full

There's a "#", do you hope to run it or not?

> +
> +echo "Set cowextsize"
> +$XFS_IO_PROG -c "cowextsize $((blksz * 128))" -c stat $file1 >> $seqres.full
> +
> +echo "Fragment FS"
> +$XFS_IO_PROG -f -c "pwrite 0 $((blksz * 512))" $fragmentedfile >> $seqres.full
> +sync
> +$here/src/punch-alternating $fragmentedfile
> +
> +echo "Allocate block sized extent from now onwards"
> +_test_inject_error bmap_alloc_minlen_extent 1
> +
> +echo "Create big delalloc extent in CoW fork"
> +$XFS_IO_PROG -c "pwrite 0 $blksz" $file1 >> $seqres.full
> +
> +sync
> +
> +$XFS_IO_PROG -c 'bmap -elpv' -c 'bmap -celpv' $file1 >> $seqres.full
> +
> +echo "Direct I/O write at offset 3FSB"
> +$XFS_IO_PROG -d -c "pwrite $((blksz * 3)) $((blksz * 2))" $file1 >> $seqres.full
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/xfs/1923.out b/tests/xfs/1923.out
> new file mode 100644
> index 0000000000..a0553cf3ee
> --- /dev/null
> +++ b/tests/xfs/1923.out
> @@ -0,0 +1,8 @@
> +QA output created by 1923
> +Create source file
> +Create Reflinked file
> +Set cowextsize
> +Fragment FS
> +Allocate block sized extent from now onwards
> +Create big delalloc extent in CoW fork
> +Direct I/O write at offset 3FSB
>
Darrick J. Wong Feb. 28, 2024, 1:36 a.m. UTC | #2
On Tue, Feb 27, 2024 at 02:00:21PM +0800, Zorro Lang wrote:
> On Mon, Feb 26, 2024 at 06:02:37PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > On a higly fragmented filesystem a Direct IO write can fail with -ENOSPC error
> > even though the filesystem has sufficient number of free blocks.
> > 
> > This occurs if the file offset range on which the write operation is being
> > performed has a delalloc extent in the cow fork and this delalloc extent
> > begins much before the Direct IO range.
> > 
> > In such a scenario, xfs_reflink_allocate_cow() invokes xfs_bmapi_write() to
> > allocate the blocks mapped by the delalloc extent. The extent thus allocated
> > may not cover the beginning of file offset range on which the Direct IO write
> > was issued. Hence xfs_reflink_allocate_cow() ends up returning -ENOSPC.
> > 
> > This test addresses this issue.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  common/rc          |   14 +++++++++
> >  tests/xfs/1923     |   85 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/1923.out |    8 +++++
> >  3 files changed, 107 insertions(+)
> >  create mode 100755 tests/xfs/1923
> >  create mode 100644 tests/xfs/1923.out
> > 
> > 
> > diff --git a/common/rc b/common/rc
> > index 30c44dddd9..d3a2a0718b 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -1873,6 +1873,20 @@ _require_scratch_delalloc()
> >  	_scratch_unmount
> >  }
> >  
> > +# Require test fs supports delay allocation.
> > +_require_test_delalloc()
> > +{
> > +	_require_command "$FILEFRAG_PROG" filefrag
> > +
> > +	rm -f $TEST_DIR/testy
> > +	$XFS_IO_PROG -f -c 'pwrite 0 64k' $TEST_DIR/testy &> /dev/null
> > +	$FILEFRAG_PROG -v $TEST_DIR/testy 2>&1 | grep -q delalloc
> 
> I'm wondering if it's a 100% reliable way to get the "delalloc" flag when
> the delalloc is supported? If not, is it worth testing with several files
> or a loop, then return 0 if one of them get delalloc ?
> 
> > +	res=$?
> > +	rm -f $TEST_DIR/testy
> > +	test $res -eq 0 || \
> > +		_notrun "test requires delayed allocation buffered writes"
> > +}
> > +
> >  # this test needs a test partition - check we're ok & mount it
> >  #
> >  _require_test()
> > diff --git a/tests/xfs/1923 b/tests/xfs/1923
> > new file mode 100755
> > index 0000000000..4e494ad8c2
> > --- /dev/null
> > +++ b/tests/xfs/1923
> > @@ -0,0 +1,85 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
> > +#
> > +# FS QA Test No. 1923
> > +#
> > +# This is a regression test for "xfs: Fix false ENOSPC when performing direct
> > +# write on a delalloc extent in cow fork".  If there is a lot of free space but
> > +# it is very fragmented, it's possible that a very large delalloc reservation
> > +# could be created in the CoW fork by a buffered write.  If a directio write
> > +# tries to convert the delalloc reservation to a real extent, it's possible
> > +# that the allocation will succeed but fail to convert even the first block of
> > +# the directio write range.  In this case, XFS will return ENOSPC even though
> > +# all it needed to do was to keep converting until the allocator returns ENOSPC
> > +# or the first block of the direct write got some space.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto quick clone
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	rm -f $file1 $file2 $fragmentedfile
> > +}
> > +
> > +# Import common functions.
> > +. ./common/reflink
> > +. ./common/inject
> > +
> > +# real QA test starts here
> > +_fixed_by_kernel_commit XXXXX \
> > +	"xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork"
> 
> Is it this commit below, or a new fix with same subject?
> 
> commit d62113303d691bcd8d0675ae4ac63e7769afc56c
> Author: Chandan Babu R <chandan.babu@oracle.com>
> Date:   Thu Aug 4 08:59:27 2022 -0700
> 
>     xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork
> 
> If it's an old commit, please replace the "XXXXX" to "d62113303d69".

Ah, yes, it is that one.  Sorry I forgot to update that.  As you can
tell, this is a really old test that ... I guess we never sent
upstream.

> > +
> > +# Modify as appropriate.
> > +_supported_fs generic
> 
> "xfs"? As it's in tests/xfs/ directory.

<nod>

> > +_require_test_program "punch-alternating"
> > +_require_test_reflink
> > +_require_xfs_io_error_injection "bmap_alloc_minlen_extent"
> > +_require_test_delalloc
> > +
> > +file1=$TEST_DIR/file1.$seq
> > +file2=$TEST_DIR/file2.$seq
> > +fragmentedfile=$TEST_DIR/fragmentedfile.$seq
> 
> As you use the $TEST_DIR, it might be worth making sure these files
> aren't existed. Due to the files (from other cases) in $TEST_DIR might
> not be cleaned, especially if they don't take much free space.

Ok.

> So if these 3 files won't take much space, we can keep them, don't need
> a specific _cleanup(). And move the "rm -f $file1 $file2 $fragmentedfile"
> at here.

Well you make a good point that we should remove those files so that
they end up in a known state (i.e. nonexistence) where we can't get
tripped up by someone creating, say, a /dev/sda symlink to
$TEST_DIR/file1.$seq.

I don't think the files are all that big, but I'll continue removing
them anyway.  Deletion is part of aging.

> > +
> > +# COW operates on pages, so we must not perform operations in units smaller
> > +# than a page.
> > +blksz=$(_get_file_block_size $TEST_DIR)
> > +pagesz=$(_get_page_size)
> > +if (( $blksz < $pagesz )); then
> > +	blksz=$pagesz
> > +fi
> 
> Just curious, this's a xfs specific test case, can xfs support blocksize >
> pagesize? If not, can we just use pagesz directly at here ?

It doesn't support bs > ps yet, but the point of clamping $blksz here
is the fact that we'll never try to writeback less than a page, so we
might as well run with those units.

(This test has nothing to do with the ongoing bs>ps effort.)

> > +
> > +echo "Create source file"
> > +$XFS_IO_PROG -f -c "pwrite 0 $((blksz * 256))" $file1 >> $seqres.full
> > +
> > +sync
> > +
> > +echo "Create Reflinked file"
> > +_cp_reflink $file1 $file2 >> $seqres.full
> > +#$XFS_IO_PROG -f -c "reflink $file1" $file2 >> $seqres.full
> 
> There's a "#", do you hope to run it or not?

Oops.  That's redundant with the _cp_reflink.

--D

> > +
> > +echo "Set cowextsize"
> > +$XFS_IO_PROG -c "cowextsize $((blksz * 128))" -c stat $file1 >> $seqres.full
> > +
> > +echo "Fragment FS"
> > +$XFS_IO_PROG -f -c "pwrite 0 $((blksz * 512))" $fragmentedfile >> $seqres.full
> > +sync
> > +$here/src/punch-alternating $fragmentedfile
> > +
> > +echo "Allocate block sized extent from now onwards"
> > +_test_inject_error bmap_alloc_minlen_extent 1
> > +
> > +echo "Create big delalloc extent in CoW fork"
> > +$XFS_IO_PROG -c "pwrite 0 $blksz" $file1 >> $seqres.full
> > +
> > +sync
> > +
> > +$XFS_IO_PROG -c 'bmap -elpv' -c 'bmap -celpv' $file1 >> $seqres.full
> > +
> > +echo "Direct I/O write at offset 3FSB"
> > +$XFS_IO_PROG -d -c "pwrite $((blksz * 3)) $((blksz * 2))" $file1 >> $seqres.full
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/xfs/1923.out b/tests/xfs/1923.out
> > new file mode 100644
> > index 0000000000..a0553cf3ee
> > --- /dev/null
> > +++ b/tests/xfs/1923.out
> > @@ -0,0 +1,8 @@
> > +QA output created by 1923
> > +Create source file
> > +Create Reflinked file
> > +Set cowextsize
> > +Fragment FS
> > +Allocate block sized extent from now onwards
> > +Create big delalloc extent in CoW fork
> > +Direct I/O write at offset 3FSB
> > 
> 
>
diff mbox series

Patch

diff --git a/common/rc b/common/rc
index 30c44dddd9..d3a2a0718b 100644
--- a/common/rc
+++ b/common/rc
@@ -1873,6 +1873,20 @@  _require_scratch_delalloc()
 	_scratch_unmount
 }
 
+# Require test fs supports delay allocation.
+_require_test_delalloc()
+{
+	_require_command "$FILEFRAG_PROG" filefrag
+
+	rm -f $TEST_DIR/testy
+	$XFS_IO_PROG -f -c 'pwrite 0 64k' $TEST_DIR/testy &> /dev/null
+	$FILEFRAG_PROG -v $TEST_DIR/testy 2>&1 | grep -q delalloc
+	res=$?
+	rm -f $TEST_DIR/testy
+	test $res -eq 0 || \
+		_notrun "test requires delayed allocation buffered writes"
+}
+
 # this test needs a test partition - check we're ok & mount it
 #
 _require_test()
diff --git a/tests/xfs/1923 b/tests/xfs/1923
new file mode 100755
index 0000000000..4e494ad8c2
--- /dev/null
+++ b/tests/xfs/1923
@@ -0,0 +1,85 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
+#
+# FS QA Test No. 1923
+#
+# This is a regression test for "xfs: Fix false ENOSPC when performing direct
+# write on a delalloc extent in cow fork".  If there is a lot of free space but
+# it is very fragmented, it's possible that a very large delalloc reservation
+# could be created in the CoW fork by a buffered write.  If a directio write
+# tries to convert the delalloc reservation to a real extent, it's possible
+# that the allocation will succeed but fail to convert even the first block of
+# the directio write range.  In this case, XFS will return ENOSPC even though
+# all it needed to do was to keep converting until the allocator returns ENOSPC
+# or the first block of the direct write got some space.
+#
+. ./common/preamble
+_begin_fstest auto quick clone
+
+_cleanup()
+{
+	cd /
+	rm -f $file1 $file2 $fragmentedfile
+}
+
+# Import common functions.
+. ./common/reflink
+. ./common/inject
+
+# real QA test starts here
+_fixed_by_kernel_commit XXXXX \
+	"xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork"
+
+# Modify as appropriate.
+_supported_fs generic
+_require_test_program "punch-alternating"
+_require_test_reflink
+_require_xfs_io_error_injection "bmap_alloc_minlen_extent"
+_require_test_delalloc
+
+file1=$TEST_DIR/file1.$seq
+file2=$TEST_DIR/file2.$seq
+fragmentedfile=$TEST_DIR/fragmentedfile.$seq
+
+# COW operates on pages, so we must not perform operations in units smaller
+# than a page.
+blksz=$(_get_file_block_size $TEST_DIR)
+pagesz=$(_get_page_size)
+if (( $blksz < $pagesz )); then
+	blksz=$pagesz
+fi
+
+echo "Create source file"
+$XFS_IO_PROG -f -c "pwrite 0 $((blksz * 256))" $file1 >> $seqres.full
+
+sync
+
+echo "Create Reflinked file"
+_cp_reflink $file1 $file2 >> $seqres.full
+#$XFS_IO_PROG -f -c "reflink $file1" $file2 >> $seqres.full
+
+echo "Set cowextsize"
+$XFS_IO_PROG -c "cowextsize $((blksz * 128))" -c stat $file1 >> $seqres.full
+
+echo "Fragment FS"
+$XFS_IO_PROG -f -c "pwrite 0 $((blksz * 512))" $fragmentedfile >> $seqres.full
+sync
+$here/src/punch-alternating $fragmentedfile
+
+echo "Allocate block sized extent from now onwards"
+_test_inject_error bmap_alloc_minlen_extent 1
+
+echo "Create big delalloc extent in CoW fork"
+$XFS_IO_PROG -c "pwrite 0 $blksz" $file1 >> $seqres.full
+
+sync
+
+$XFS_IO_PROG -c 'bmap -elpv' -c 'bmap -celpv' $file1 >> $seqres.full
+
+echo "Direct I/O write at offset 3FSB"
+$XFS_IO_PROG -d -c "pwrite $((blksz * 3)) $((blksz * 2))" $file1 >> $seqres.full
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/1923.out b/tests/xfs/1923.out
new file mode 100644
index 0000000000..a0553cf3ee
--- /dev/null
+++ b/tests/xfs/1923.out
@@ -0,0 +1,8 @@ 
+QA output created by 1923
+Create source file
+Create Reflinked file
+Set cowextsize
+Fragment FS
+Allocate block sized extent from now onwards
+Create big delalloc extent in CoW fork
+Direct I/O write at offset 3FSB