
[v2] _require_sparse_files: add a safeguard against media wearout

Message ID 20231217210053.9180-1-patrakov@gmail.com (mailing list archive)
State New, archived

Commit Message

Alexander Patrakov Dec. 17, 2023, 9 p.m. UTC
_require_sparse_files is implemented as a list of filesystems known not to
support sparse files, and therefore it misses some cases.

However, if sparse files do not work as expected during a test, the risk
is that the test will write out to the disk all the zeros that would
normally be unwritten. This amounts to at least 4 TB for the generic/129
test, and therefore there is a significant media wearout concern here.

Adding more filesystems to the list of exclusions would not scale and
would not work anyway because CIFS backed by SAMBA is safe, while CIFS
backed by Windows Server 2022 is not.

In other words, Windows reserves the right to sometimes (!) ignore our
intent to create a sparse file.

More discussion: https://lore.kernel.org/fstests/20231206184759.GA3964019@frogsfrogsfrogs/T/#t

Mitigate this risk by doing a small-scale test that reliably triggers
Windows misbehavior and checking if the resulting file ends up being
not sufficiently sparse.

Signed-off-by: Alexander Patrakov <patrakov@gmail.com>
---
 common/rc | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

Darrick J. Wong Dec. 18, 2023, 5:45 p.m. UTC | #1
On Mon, Dec 18, 2023 at 05:00:53AM +0800, Alexander Patrakov wrote:
> _require_sparse_files is implemented as a list of filesystems known not to
> support sparse files, and therefore it misses some cases.
> 
> However, if sparse files do not work as expected during a test, the risk
> is that the test will write out to the disk all the zeros that would
> normally be unwritten. This amounts to at least 4 TB for the generic/129
> test, and therefore there is a significant media wearout concern here.
> 
> Adding more filesystems to the list of exclusions would not scale and
> would not work anyway because CIFS backed by SAMBA is safe, while CIFS
> backed by Windows Server 2022 is not.
> 
> In other words, Windows reserves the right to sometimes (!) ignore our
> intent to create a sparse file.
> 
> More discussion: https://lore.kernel.org/fstests/20231206184759.GA3964019@frogsfrogsfrogs/T/#t
> 
> Mitigate this risk by doing a small-scale test that reliably triggers
> Windows misbehavior and checking if the resulting file ends up being
> not sufficiently sparse.
> 
> Signed-off-by: Alexander Patrakov <patrakov@gmail.com>
> ---
>  common/rc | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/common/rc b/common/rc
> index cc92fe06..5d27602a 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -2871,6 +2871,12 @@ _require_fs_space()
>  # Check if the filesystem supports sparse files.
>  #
>  # Unfortunately there is no better way to do this than a manual black list.

FWIW if the truncate/write/nblocks check strategy below can be made more
general, then the comment above (and the FSTYP-filtering) become
unnecessary, right?

> +# However, letting tests expand all the holes and write terabytes of zeros to
> +# the media is also not acceptable due to wearout concerns.
> +#
> +# Note: even though CIFS supports sparse files, this test will mark it as
> +# failing the requirement if we can coax the server into allocating and writing
> +# the ranges where holes are expected. This happens with Windows servers.
>  #
>  _require_sparse_files()
>  {
> @@ -2881,6 +2887,23 @@ _require_sparse_files()
>      *)
>          ;;
>      esac
> +
> +    local testfile="$TEST_DIR/$$.sparsefiletest"
> +    rm -f "$testfile"
> +
> +    # A small-scale version of looptest - known to trigger Microsoft SMB server
> +    # into the decision to write zeros to the disk. Also creates a non-sparse file
> +    # on vfat.
> +    # See also the discussion at https://lore.kernel.org/fstests/20231206184759.GA3964019@frogsfrogsfrogs/T/#t
> +    $XFS_IO_PROG -f \
> +	-c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 102400 102400' \
> +	-c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 204800 102400' \
> +	-c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 307200 102400' \
> +	-c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 409600 102400' "$testfile" >/dev/null
> +    resulting_file_size_kb=$( du -sk "$testfile" | cut -f 1 )
> +    rm -f "$testfile"
> +    [ $resulting_file_size_kb -ge 300 ] && \

I might be missing something here because I've long forgotten how CIFS
and Windows work, but -- why is it necessary to truncate and write past
eof four times?  Won't the truncates free all the blocks associated with
the file?

Also, why isn't it sufficient to check that the du output doesn't exceed
~110K (adding 10% overhead)?

> +	_notrun "Sparse files do not work as expected, skipping test due to media wearout concerns"

I think the notrun message should be restricted to stating that sparse
files do not work as expected -- the other callers aren't necessarily
worried about wearout.

--D

>  }
>  
>  _require_debugfs()
> -- 
> 2.43.0
> 
>
Alexander Patrakov Dec. 18, 2023, 6:27 p.m. UTC | #2
Hello Darrick, and thanks for the feedback.

On Tue, Dec 19, 2023 at 1:45 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Mon, Dec 18, 2023 at 05:00:53AM +0800, Alexander Patrakov wrote:
> > _require_sparse_files is implemented as a list of filesystems known not to
> > support sparse files, and therefore it misses some cases.
> >
> > However, if sparse files do not work as expected during a test, the risk
> > is that the test will write out to the disk all the zeros that would
> > normally be unwritten. This amounts to at least 4 TB for the generic/129
> > test, and therefore there is a significant media wearout concern here.
> >
> > Adding more filesystems to the list of exclusions would not scale and
> > would not work anyway because CIFS backed by SAMBA is safe, while CIFS
> > backed by Windows Server 2022 is not.
> >
> > In other words, Windows reserves the right to sometimes (!) ignore our
> > intent to create a sparse file.
> >
> > More discussion: https://lore.kernel.org/fstests/20231206184759.GA3964019@frogsfrogsfrogs/T/#t
> >
> > Mitigate this risk by doing a small-scale test that reliably triggers
> > Windows misbehavior and checking if the resulting file ends up being
> > not sufficiently sparse.
> >
> > Signed-off-by: Alexander Patrakov <patrakov@gmail.com>
> > ---
> >  common/rc | 23 +++++++++++++++++++++++
> >  1 file changed, 23 insertions(+)
> >
> > diff --git a/common/rc b/common/rc
> > index cc92fe06..5d27602a 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -2871,6 +2871,12 @@ _require_fs_space()
> >  # Check if the filesystem supports sparse files.
> >  #
> >  # Unfortunately there is no better way to do this than a manual black list.
>
> FWIW if the truncate/write/nblocks check strategy below can be made more
> general, then the comment above (and the FSTYP-filtering) become
> unnecessary, right?

I think so. I will retest with exfat and hfsplus, and, if their
failures are caught by the code that I added, I will delete the black
list.

>
> > +# However, letting tests expand all the holes and write terabytes of zeros to
> > +# the media is also not acceptable due to wearout concerns.
> > +#
> > +# Note: even though CIFS supports sparse files, this test will mark it as
> > +# failing the requirement if we can coax the server into allocating and writing
> > +# the ranges where holes are expected. This happens with Windows servers.
> >  #
> >  _require_sparse_files()
> >  {
> > @@ -2881,6 +2887,23 @@ _require_sparse_files()
> >      *)
> >          ;;
> >      esac
> > +
> > +    local testfile="$TEST_DIR/$$.sparsefiletest"
> > +    rm -f "$testfile"
> > +
> > +    # A small-scale version of looptest - known to trigger Microsoft SMB server
> > +    # into the decision to write zeros to the disk. Also creates a non-sparse file
> > +    # on vfat.
> > +    # See also the discussion at https://lore.kernel.org/fstests/20231206184759.GA3964019@frogsfrogsfrogs/T/#t
> > +    $XFS_IO_PROG -f \
> > +     -c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 102400 102400' \
> > +     -c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 204800 102400' \
> > +     -c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 307200 102400' \
> > +     -c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 409600 102400' "$testfile" >/dev/null
> > +    resulting_file_size_kb=$( du -sk "$testfile" | cut -f 1 )
> > +    rm -f "$testfile"
> > +    [ $resulting_file_size_kb -ge 300 ] && \
>
> I might be missing something here because I've long forgotten how CIFS
> and Windows work, but -- why is it necessary to truncate and write past
> eof four times?  Won't the truncates free all the blocks associated with
> the file?

No, Windows has special not-fully-understood logic (or maybe a race)
here. If I leave only the last truncate/pwrite pair, Windows
successfully makes a sparse file here, and then writes multiple
terabytes of real zeros during generic/129, which is what I am trying
to avoid.

In other words, CIFS mounts from Windows do support sparse files in
general, but this particular pattern of writes triggers the "your file
shall not be sparse" logic.

Let me quote from
https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ni-winioctl-fsctl_set_zero_data:

"""
If you use the FSCTL_SET_ZERO_DATA control code to write zeros (0) to
a non-sparse file, zeros (0) are written to the file. The system
allocates disk storage for all of the zero (0) range, which is
equivalent to using the WriteFile function to write zeros (0) to a
file.
"""

I guess here is what happens: initially the file is not created as a
sparse one (although I am not sure why; it would make more sense to
add another iteration at the beginning to confirm that). Then,
apparently, truncation to zero size does not reset the non-sparse
status. Windows then sees that the file is not sparse, and so any
write that seeks past the EOF is, according to the Microsoft rule
quoted above, translated into writing real zeros.

>
> Also, why isn't it sufficient to check that the du output doesn't exceed
> ~110K (adding 10% overhead)?

The overhead might be larger if the filesystem supports sparse files
but has, e.g., 64 KB native cluster size. Then a single badly-aligned
100 KB blob (which is what happens here) might actually occupy three
clusters and thus use 192 KB on the disk - which is still fine as long
as all preceding zeros are not written.
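As a sanity check on the 300 KB threshold, the worst-case allocation for
a badly aligned 100 KB write on a 64 KB-cluster filesystem can be
computed directly (the numbers here are the ones from the example
above, not taken from any particular filesystem):

```shell
# A misaligned blob can straddle one extra cluster beyond the minimum
# ceil(blob/cluster) needed to hold it.
cluster_kb=64
blob_kb=100
clusters=$(( (blob_kb + cluster_kb - 1) / cluster_kb + 1 ))
worst_kb=$(( clusters * cluster_kb ))
echo "$worst_kb"    # 192 KB, comfortably under the 300 KB cutoff
```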

>
> > +     _notrun "Sparse files do not work as expected, skipping test due to media wearout concerns"
>
> I think the notrun message should be restricted to stating that sparse
> files do not work as expected -- the other callers aren't necessarily
> worried about wearout.

OK, the v3 will have this reworded.

>
> --D
>
> >  }
> >
> >  _require_debugfs()
> > --
> > 2.43.0
> >
> >

Patch

diff --git a/common/rc b/common/rc
index cc92fe06..5d27602a 100644
--- a/common/rc
+++ b/common/rc
@@ -2871,6 +2871,12 @@  _require_fs_space()
 # Check if the filesystem supports sparse files.
 #
 # Unfortunately there is no better way to do this than a manual black list.
+# However, letting tests expand all the holes and write terabytes of zeros to
+# the media is also not acceptable due to wearout concerns.
+#
+# Note: even though CIFS supports sparse files, this test will mark it as
+# failing the requirement if we can coax the server into allocating and writing
+# the ranges where holes are expected. This happens with Windows servers.
 #
 _require_sparse_files()
 {
@@ -2881,6 +2887,23 @@  _require_sparse_files()
     *)
         ;;
     esac
+
+    local testfile="$TEST_DIR/$$.sparsefiletest"
+    rm -f "$testfile"
+
+    # A small-scale version of looptest - known to trigger Microsoft SMB server
+    # into the decision to write zeros to the disk. Also creates a non-sparse file
+    # on vfat.
+    # See also the discussion at https://lore.kernel.org/fstests/20231206184759.GA3964019@frogsfrogsfrogs/T/#t
+    $XFS_IO_PROG -f \
+	-c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 102400 102400' \
+	-c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 204800 102400' \
+	-c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 307200 102400' \
+	-c 'truncate 0' -c 'pwrite -b 102400 -S 0x61 409600 102400' "$testfile" >/dev/null
+    resulting_file_size_kb=$( du -sk "$testfile" | cut -f 1 )
+    rm -f "$testfile"
+    [ $resulting_file_size_kb -ge 300 ] && \
+	_notrun "Sparse files do not work as expected, skipping test due to media wearout concerns"
 }
 
 _require_debugfs()