diff mbox series

generic: add missing $FSX_AVOID to fsx invocations

Message ID 20221105182918.24099-1-tytso@mit.edu (mailing list archive)
State New, archived
Headers show
Series generic: add missing $FSX_AVOID to fsx invocations | expand

Commit Message

Theodore Ts'o Nov. 5, 2022, 6:29 p.m. UTC
From: Eric Whitney <enwlinux@gmail.com>

generic/455 fails when run on an ext4 bigalloc file system.  Its
fsx invocations can make insert range and collapse range calls whose
arguments are not cluster aligned, and ext4 will fail those calls for
bigalloc.  They can be suppressed by adding the FSX_AVOID environment
variable to the fsx invocation and setting its value appropriately in
the test environment, as is done for other fsx-based tests.  This
avoids the need to exclude the test to avoid failures and makes it
possible to take advantage of the remainder of its coverage.

[ Also fix generic/457, as requested by Dave Chinner -- TYT]

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---

This is a respin of "generic/455: add $FSX_AVOID" which Eric posted here:
https://lore.kernel.org/r/20221021211950.510006-1-enwlinux@gmail.com
it adds a similar fix for generic/457, as requested by Dave and Zorro.

 tests/generic/455 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Zorro Lang Nov. 6, 2022, 12:10 p.m. UTC | #1
On Sat, Nov 05, 2022 at 02:29:18PM -0400, Theodore Ts'o wrote:
> From: Eric Whitney <enwlinux@gmail.com>
> 
> generic/455 fails when run on an ext4 bigalloc file system.  Its
> fsx invocations can make insert range and collapse range calls whose
> arguments are not cluster aligned, and ext4 will fail those calls for
> bigalloc.  They can be suppressed by adding the FSX_AVOID environment
> variable to the fsx invocation and setting its value appropriately in
> the test environment, as is done for other fsx-based tests.  This
> avoids the need to exclude the test to avoid failures and makes it
> possible to take advantage of the remainder of its coverage.
> 
> [ Also fix generic/457, as requested by Dave Chinner -- TYT]
> 
> Signed-off-by: Eric Whitney <enwlinux@gmail.com>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
> 
> This is a respin of "generic/455: add $FSX_AVOID" which Eric posted here:
> https://lore.kernel.org/r/20221021211950.510006-1-enwlinux@gmail.com
> it adds a similar fix for generic/457, as requested by Dave and Zorro.

Thanks Ted, actually I'm going to merge this patch (with g/457 fix by myself)
this week.

But looks like you missed the change on g/457 (might forgot to commit). Anyway,
I think it's not worth wasting one more week for this small change, I'll help
to change g/457 when I merge this patch.

Thanks,
Zorro

> 
>  tests/generic/455 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/generic/455 b/tests/generic/455
> index 649b54108..c13d872c6 100755
> --- a/tests/generic/455
> +++ b/tests/generic/455
> @@ -77,7 +77,7 @@ FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV"
>  seeds=(0 0 0 0)
>  # Run fsx for a while
>  for j in `seq 0 $((NUM_FILES-1))`; do
> -	run_check $here/ltp/fsx $FSX_OPTS -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j &
> +	run_check $here/ltp/fsx $FSX_OPTS $FSX_AVOID -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j &
>  done
>  wait
>  
> -- 
> 2.31.0
>
Theodore Ts'o Nov. 6, 2022, 9:44 p.m. UTC | #2
On Sun, Nov 06, 2022 at 08:10:31PM +0800, Zorro Lang wrote:
> 
> Thanks Ted, actually I'm going to merge this patch (with g/457 fix by myself)
> this week.
> 
> But looks like you missed the change on g/457 (might forgot to commit). Anyway,
> I think it's not worth wasting one more week for this small change, I'll help
> to change g/457 when I merge this patch.

Argh, yeah, sorry, I forgot to do the "git add -u".  Anyway, here's
the leftover change that was in my git working directory, if it saves
you 5 seconds or so.  :-)

BTW, I noticed there were a number of fsx --replay-ops invocations
where we could potentially add the $FSX_AVOID.  OTOH, it would
probably make those tests completely pointless, so it might be easier
just for the test runners to relay on a group-based exclusion in those
cases.   What do you think?

				- Ted

diff --git a/tests/generic/457 b/tests/generic/457
index da75798f1..ca0f5e622 100755
--- a/tests/generic/457
+++ b/tests/generic/457
@@ -83,7 +83,7 @@ FSX_OPTS="-N $NUM_OPS -d -k -P $SANITY_DIR -i $LOGWRITES_DMDEV"
 for j in `seq 0 $((NUM_FILES-1))`; do
 	# clone the clone from prev iteration which may have already mutated
 	_cp_reflink $SCRATCH_MNT/testfile$((j-1)) $SCRATCH_MNT/testfile$j
-	run_check $here/ltp/fsx $FSX_OPTS -S 0 -j $j $SCRATCH_MNT/testfile$j &
+	run_check $here/ltp/fsx $FSX_OPTS $FSX_AVOID -S 0 -j $j $SCRATCH_MNT/testfile$j &
 done
 wait
Zorro Lang Nov. 7, 2022, 2:02 a.m. UTC | #3
On Sun, Nov 06, 2022 at 04:44:05PM -0500, Theodore Ts'o wrote:
> On Sun, Nov 06, 2022 at 08:10:31PM +0800, Zorro Lang wrote:
> > 
> > Thanks Ted, actually I'm going to merge this patch (with g/457 fix by myself)
> > this week.
> > 
> > But looks like you missed the change on g/457 (might forgot to commit). Anyway,
> > I think it's not worth wasting one more week for this small change, I'll help
> > to change g/457 when I merge this patch.
> 
> Argh, yeah, sorry, I forgot to do the "git add -u".  Anyway, here's
> the leftover change that was in my git working directory, if it saves
> you 5 seconds or so.  :-)

Hi Ted,

Thanks, I've pushed this patch last night :)

> 
> BTW, I noticed there were a number of fsx --replay-ops invocations
> where we could potentially add the $FSX_AVOID.  OTOH, it would
> probably make those tests completely pointless, so it might be easier
> just for the test runners to relay on a group-based exclusion in those
> cases.   What do you think?

I think it doesn't make sense to use $FSX_AVOID in `fsx --replay-ops` cases.
Due to generally the operations which a cases would like to replay are exact
steps to reproduce to a known bug. If we skip some operations (e.g. -F), it
doesn't make sense for this reproducer.

The recommended way for this kind of cases is making sure current fs/system
support the operations will be run by fsx, especially those features are not
common on different fs/system. Likes g/456, it does:

  write 0x137dd 0xdc69 0x0
  fallocate 0xb531 0xb5ad 0x21446
  collapse_range 0x1c000 0x4000 0x21446
  write 0x3e5ec 0x1a14 0x21446
  zero_range 0x20fac 0x6d9c 0x40000 keep_size
  mapwrite 0x216ad 0x274f 0x40000

So it uses below _require_* helpers to make sure these operations are supported,
before testing:

  _require_xfs_io_command "falloc"
  _require_xfs_io_command "falloc" "-k"
  _require_xfs_io_command "fzero"
  _require_xfs_io_command "fcollapse"

That's my point, hope I didn't misunderstand what you said :)

Thanks,
Zorro

> 
> 				- Ted
> 
> diff --git a/tests/generic/457 b/tests/generic/457
> index da75798f1..ca0f5e622 100755
> --- a/tests/generic/457
> +++ b/tests/generic/457
> @@ -83,7 +83,7 @@ FSX_OPTS="-N $NUM_OPS -d -k -P $SANITY_DIR -i $LOGWRITES_DMDEV"
>  for j in `seq 0 $((NUM_FILES-1))`; do
>  	# clone the clone from prev iteration which may have already mutated
>  	_cp_reflink $SCRATCH_MNT/testfile$((j-1)) $SCRATCH_MNT/testfile$j
> -	run_check $here/ltp/fsx $FSX_OPTS -S 0 -j $j $SCRATCH_MNT/testfile$j &
> +	run_check $here/ltp/fsx $FSX_OPTS $FSX_AVOID -S 0 -j $j $SCRATCH_MNT/testfile$j &
>  done
>  wait
>  
>
Theodore Ts'o Nov. 7, 2022, 4:35 p.m. UTC | #4
On Mon, Nov 07, 2022 at 10:02:36AM +0800, Zorro Lang wrote:
> I think it doesn't make sense to use $FSX_AVOID in `fsx --replay-ops` cases.
> Due to generally the operations which a cases would like to replay are exact
> steps to reproduce to a known bug. If we skip some operations (e.g. -F), it
> doesn't make sense for this reproducer.
> 
> The recommended way for this kind of cases is making sure current fs/system
> support the operations will be run by fsx, especially those features are not
> common on different fs/system....
>
> So it uses below _require_* helpers to make sure these operations are supported,
> before testing:
> 
>   _require_xfs_io_command "falloc"
>   _require_xfs_io_command "falloc" "-k"
>   _require_xfs_io_command "fzero"
>   _require_xfs_io_command "fcollapse"
> 
> That's my point, hope I didn't misunderstand what you said :)

No, you didn't understand me.  :-)

For context, I have an out of tree patch (see attached), which I had
tried upstreaming a while back, but it got rejected, so I've continued
to keep it in my personal tree.  The basic idea is sometimes you might
want to suppress a test even *though* _require_xfs_io_command seems to
indicate that operation was supported.

This might either be because the test didn't know about ext4
bigalloc's cluster alignment requirements, or because a particular
operation might just be *buggy* and being able to run tests as if a
particular command wasn't supported was useful.

It was rejected because the claim was that you could just exclude by
group instead (e.g., "punch", "collapse") but I didn't trust that the
group list would be kept up to date, so I never really agreed with
that line of reasoning.  These days, given that group declaration are
kept in the test script, it's much less likely to happen, but I've
kept the patch in my tree because it's occasionally useful.

At this point, it's admittedly pretty rarely needed since ext4's
collapse and insert range commands are pretty solid modulo tests not
understanding cluster alignment, but still, it's not much effort for
me to keep carrying the patch and I don't expect it will ever get
upstreamed.

					- Ted

commit c9d25475a94d5e53d7f18d247a17088999522862
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sat Oct 17 14:39:26 2015 -0400

    common: introduce XFS_IO_AVOID env var
    
    Like FSSTRESS_AVOID and FSX_AVOID, XFS_IO_AVOID can be used to avoid
    using various advanced file system features such as "fpunch"
    "fcollapse", "finsert", or "zero".  Tests that require an xfs_io
    command which is included in the space-separated list found in the
    XFS_IO_AVOID environment variable will be skipped using _notrun.
    
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

diff --git a/README b/README
index 4c4f22f85..42baff07b 100644
--- a/README
+++ b/README
@@ -245,6 +245,10 @@ Misc:
    this option is supported for all filesystems currently only -overlay is
    expected to run without issues. For other filesystems additional patches
    and fixes to the test suite might be needed.
+ - setenv XFS_IO_AVOID, which may contain a list of space separated
+   xfs_io commands which will be avoided in case you want to exclude
+   tests that require the use of certain file system operations such
+   as "fpunch", "fcollapse", "finsert", or "zero".
 
 ______________________
 USING THE FSQA SUITE
diff --git a/common/rc b/common/rc
index eb67e0cdc..d1c07a4d0 100644
--- a/common/rc
+++ b/common/rc
@@ -2485,6 +2485,11 @@ _require_xfs_io_command()
 	local opts=""
 	local attr_info=""
 
+	if echo "$XFS_IO_AVOID" | grep -wq -- "$command"
+	then
+		_notrun "Avoiding xfs_io $command"
+	fi
+
 	local testfile=$TEST_DIR/$$.xfs_io
 	local testio
 	case $command in
Darrick J. Wong Nov. 7, 2022, 8:09 p.m. UTC | #5
On Mon, Nov 07, 2022 at 11:35:16AM -0500, Theodore Ts'o wrote:
> On Mon, Nov 07, 2022 at 10:02:36AM +0800, Zorro Lang wrote:
> > I think it doesn't make sense to use $FSX_AVOID in `fsx --replay-ops` cases.
> > Due to generally the operations which a cases would like to replay are exact
> > steps to reproduce to a known bug. If we skip some operations (e.g. -F), it
> > doesn't make sense for this reproducer.
> > 
> > The recommended way for this kind of cases is making sure current fs/system
> > support the operations will be run by fsx, especially those features are not
> > common on different fs/system....
> >
> > So it uses below _require_* helpers to make sure these operations are supported,
> > before testing:
> > 
> >   _require_xfs_io_command "falloc"
> >   _require_xfs_io_command "falloc" "-k"
> >   _require_xfs_io_command "fzero"
> >   _require_xfs_io_command "fcollapse"
> > 
> > That's my point, hope I didn't misunderstand what you said :)
> 
> No, you didn't understand me.  :-)
> 
> For context, I have an out of tree patch (see attached), which I had
> tried upstreaming a while back, but it got rejected, so I've continued
> to keep it in my personal tree.  The basic idea is sometimes you might
> want to suppress a test even *though* _require_xfs_io_command seems to
> indicate that operation was supported.
> 
> This might either be because the test didn't know about ext4
> bigalloc's cluster alignment requirements, or because a particular
> operation might just be *buggy* and being able to run tests as if a
> particular command wasn't supported was useful.
> 
> It was rejected because the claim was that you could just exclude by
> group instead (e.g., "punch", "collapse") but I didn't trust that the
> group list would be kept up to date, so I never really agreed with
> that line of reasoning.  These days, given that group declaration are
> kept in the test script, it's much less likely to happen, but I've
> kept the patch in my tree because it's occasionally useful.
> 
> At this point, it's admittedly pretty rarely needed since ext4's
> collapse and insert range commands are pretty solid modulo tests not
> understanding cluster alignment, but still, it's not much effort for
> me to keep carrying the patch and I don't expect it will ever get
> upstreamed.

If it's collapse/insert range you're specifically worried about, perhaps
its time to implement _get_file_block_size for ext4 so that
_test_congruent_file_oplen can exclude those tests that will get the
alignment wrong?

--D

> 
> 					- Ted
> 
> commit c9d25475a94d5e53d7f18d247a17088999522862
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Sat Oct 17 14:39:26 2015 -0400
> 
>     common: introduce XFS_IO_AVOID env var
>     
>     Like FSSTRESS_AVOID and FSX_AVOID, XFS_IO_AVOID can be used to avoid
>     using various advanced file system features such as "fpunch"
>     "fcollapse", "finsert", or "zero".  Tests that require an xfs_io
>     command which is included in the space-separated list found in the
>     XFS_IO_AVOID environment variable will be skipped using _notrun.
>     
>     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> 
> diff --git a/README b/README
> index 4c4f22f85..42baff07b 100644
> --- a/README
> +++ b/README
> @@ -245,6 +245,10 @@ Misc:
>     this option is supported for all filesystems currently only -overlay is
>     expected to run without issues. For other filesystems additional patches
>     and fixes to the test suite might be needed.
> + - setenv XFS_IO_AVOID, which may contain a list of space separated
> +   xfs_io commands which will be avoided in case you want to exclude
> +   tests that require the use of certain file system operations such
> +   as "fpunch", "fcollapse", "finsert", or "zero".
>  
>  ______________________
>  USING THE FSQA SUITE
> diff --git a/common/rc b/common/rc
> index eb67e0cdc..d1c07a4d0 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -2485,6 +2485,11 @@ _require_xfs_io_command()
>  	local opts=""
>  	local attr_info=""
>  
> +	if echo "$XFS_IO_AVOID" | grep -wq -- "$command"
> +	then
> +		_notrun "Avoiding xfs_io $command"
> +	fi
> +
>  	local testfile=$TEST_DIR/$$.xfs_io
>  	local testio
>  	case $command in
Zorro Lang Nov. 8, 2022, 2:44 a.m. UTC | #6
On Mon, Nov 07, 2022 at 12:09:58PM -0800, Darrick J. Wong wrote:
> On Mon, Nov 07, 2022 at 11:35:16AM -0500, Theodore Ts'o wrote:
> > On Mon, Nov 07, 2022 at 10:02:36AM +0800, Zorro Lang wrote:
> > > I think it doesn't make sense to use $FSX_AVOID in `fsx --replay-ops` cases.
> > > Due to generally the operations which a cases would like to replay are exact
> > > steps to reproduce to a known bug. If we skip some operations (e.g. -F), it
> > > doesn't make sense for this reproducer.
> > > 
> > > The recommended way for this kind of cases is making sure current fs/system
> > > support the operations will be run by fsx, especially those features are not
> > > common on different fs/system....
> > >
> > > So it uses below _require_* helpers to make sure these operations are supported,
> > > before testing:
> > > 
> > >   _require_xfs_io_command "falloc"
> > >   _require_xfs_io_command "falloc" "-k"
> > >   _require_xfs_io_command "fzero"
> > >   _require_xfs_io_command "fcollapse"
> > > 
> > > That's my point, hope I didn't misunderstand what you said :)
> > 
> > No, you didn't understand me.  :-)

Wow, sorry I didn't realize that's a longer story :)

> > 
> > For context, I have an out of tree patch (see attached), which I had
> > tried upstreaming a while back, but it got rejected, so I've continued
> > to keep it in my personal tree.  The basic idea is sometimes you might
> > want to suppress a test even *though* _require_xfs_io_command seems to
> > indicate that operation was supported.
> > 
> > This might either be because the test didn't know about ext4
> > bigalloc's cluster alignment requirements, or because a particular
> > operation might just be *buggy* and being able to run tests as if a
> > particular command wasn't supported was useful.
> > 
> > It was rejected because the claim was that you could just exclude by
> > group instead (e.g., "punch", "collapse") but I didn't trust that the
> > group list would be kept up to date, so I never really agreed with

I agree that most of people don't pay much attention to the group names
when they write a test case. And there're new group names sometimes,
so we always need to supplement some group names later. But the group
name is still helpful, so if you feel some cases missed some groups, please
feel free to tell us :)

> > that line of reasoning.  These days, given that group declaration are
> > kept in the test script, it's much less likely to happen, but I've
> > kept the patch in my tree because it's occasionally useful.
> > 
> > At this point, it's admittedly pretty rarely needed since ext4's
> > collapse and insert range commands are pretty solid modulo tests not
> > understanding cluster alignment, but still, it's not much effort for
> > me to keep carrying the patch and I don't expect it will ever get
> > upstreamed.
> 
> If it's collapse/insert range you're specifically worried about, perhaps
> its time to implement _get_file_block_size for ext4 so that
> _test_congruent_file_oplen can exclude those tests that will get the
> alignment wrong?

Thanks Darrick, I'm thinking about this helper you wrote recently :)
(The real name is _require_congruent_file_oplen in common/rc.)

Hi Ted, is this helpful if you write a _ext4_get_file_block_size (refer to
_xfs_get_file_block_size in common/xfs), then call it in _get_file_block_size
to help _require_congruent_file_oplen to get the correct length which is an
integer multiple of the file's allocation unit size ?

If this's helpful for your first concern [1], please tell me. Then we can talk
about your second concern [2], if it's still your main concern now :)

"This might *[1]* either be because the test didn't know about ext4
 bigalloc's cluster alignment requirements, *[2]* or because a particular
 operation might just be *buggy* and being able to run tests as if a
 particular command wasn't supported was useful."

Thanks,
Zorro

> 
> --D
> 
> > 
> > 					- Ted
> > 
> > commit c9d25475a94d5e53d7f18d247a17088999522862
> > Author: Theodore Ts'o <tytso@mit.edu>
> > Date:   Sat Oct 17 14:39:26 2015 -0400
> > 
> >     common: introduce XFS_IO_AVOID env var
> >     
> >     Like FSSTRESS_AVOID and FSX_AVOID, XFS_IO_AVOID can be used to avoid
> >     using various advanced file system features such as "fpunch"
> >     "fcollapse", "finsert", or "zero".  Tests that require an xfs_io
> >     command which is included in the space-separated list found in the
> >     XFS_IO_AVOID environment variable will be skipped using _notrun.
> >     
> >     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> > 
> > diff --git a/README b/README
> > index 4c4f22f85..42baff07b 100644
> > --- a/README
> > +++ b/README
> > @@ -245,6 +245,10 @@ Misc:
> >     this option is supported for all filesystems currently only -overlay is
> >     expected to run without issues. For other filesystems additional patches
> >     and fixes to the test suite might be needed.
> > + - setenv XFS_IO_AVOID, which may contain a list of space separated
> > +   xfs_io commands which will be avoided in case you want to exclude
> > +   tests that require the use of certain file system operations such
> > +   as "fpunch", "fcollapse", "finsert", or "zero".
> >  
> >  ______________________
> >  USING THE FSQA SUITE
> > diff --git a/common/rc b/common/rc
> > index eb67e0cdc..d1c07a4d0 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -2485,6 +2485,11 @@ _require_xfs_io_command()
> >  	local opts=""
> >  	local attr_info=""
> >  
> > +	if echo "$XFS_IO_AVOID" | grep -wq -- "$command"
> > +	then
> > +		_notrun "Avoiding xfs_io $command"
> > +	fi
> > +
> >  	local testfile=$TEST_DIR/$$.xfs_io
> >  	local testio
> >  	case $command in
>
Theodore Ts'o Nov. 8, 2022, 3:08 p.m. UTC | #7
On Tue, Nov 08, 2022 at 10:44:55AM +0800, Zorro Lang wrote:
> > If it's collapse/insert range you're specifically worried about, perhaps
> > its time to implement _get_file_block_size for ext4 so that
> > _test_congruent_file_oplen can exclude those tests that will get the
> > alignment wrong?
> 
> Thanks Darrick, I'm thinking about this helper you wrote recently :)
> (The real name is _require_congruent_file_oplen in common/rc.)
> 
> Hi Ted, is this helpful if you write a _ext4_get_file_block_size (refer to
> _xfs_get_file_block_size in common/xfs), then call it in _get_file_block_size
> to help _require_congruent_file_oplen to get the correct length which is an
> integer multiple of the file's allocation unit size ?

Well, it's a bit more complicated than that, since there's a
distinction between the cluster size and block size.  The cluster size
is the allocation unit.  However, other things are done in units of
the block size, including how we report fiemap results, etc.

For example, take a look at generic/206.  It will try to create a file
system where the file system block size is the one fourth of the page
size --- so for x86, 1k.  However, for ext4, the default cluster size
for 16 times the block size, so for this file system the allocation
unit size is 16k.  If we make _get_file_block_size return 64k for all
files, then generic/206 will fail:

    pagesz=$(getconf PAGE_SIZE)
    blksz=$((pagesz / 4))
    ...
    _scratch_mkfs_blocksized $blksz > $seqres.full 2>&1
    ...
    real_blksz=$(_get_file_block_size $testdir)
    test $real_blksz != $blksz && _notrun "Failed to format with small blocksize."

I assume this works for xfs because only files in the real-time block
size will have a larger allocation unit size, but the root directory
has a allocation unit size of the "real" block size?

I suppose I could make a hypothetical _ext4_get_file_block_size lie
and return the "real" block size when it's called on the mount point,
but that seems kinda of gross, and it's also a lie, since the
allocation unit size for the root directory actually is the cluster
size, not the block size.

If I were going to be doing things from scratch, I'd make a
distinction between _get_file_allocation_size and
_get_file_system_block_size, which would be a lot *clearer* about what
is going on.  Even then, I could imagine some tests getting confused
with how, say, fiemap behaves with an ext4 file system with a 4k block
size and a 64k allocation unit size, so I'm not sure it's a complete
solution.  And doing this now would require quite a bit of code churn
in xfstests.

> If this's helpful for your first concern [1], please tell me. Then we can talk
> about your second concern [2], if it's still your main concern now :)
> 
> "This might *[1]* either be because the test didn't know about ext4
>  bigalloc's cluster alignment requirements, *[2]* or because a particular
>  operation might just be *buggy* and being able to run tests as if a
>  particular command wasn't supported was useful."

So [1] is a real concern, and at the moment, we just suppress all of
the tests that try to use collapse/insert range.  For example, from
the ext4/bigalloc config file:

    # Until we can teach xfstests the difference between cluster size and
    # block size, avoid collapse_range and insert_range since these will
    # fail due the fact that these operations require cluster-aligned
    # ranges.
    export FSX_AVOID="-C -I"
    export FSSTRESS_AVOID="-f collapse=0 -f insert=0"
    export XFS_IO_AVOID="fcollapse finsert"
    TEST_SET_EXCLUDE="-x collapse,insert"

That's not ideal, and it's been on my todo list to try to fix it, when
I could get one of those round tuit's.  However, I had assumed that we
would split _get_file_block_size somehow, given the observation I've
made above.  So this was always been something that has put me off,
because it looked like a much larger project than say, "gee, I have an
hour or two on a weekend, let me see if I can fix this".

[2] is practically not _really_ a concern any more.  It used to be
that one of the ways that I would root cause a failing test was to
suppress one of the advanced fallocate modes, whether it be
collapse/insert range, or going even further back, punch hole.  If
test then passed, then I could say, "ah, hah; the problem can probably
be localized to a certain part of the fs code."

However, that's mainly tests that are using fsstress and fsx, and we
have solutions for that.  And for other tests, I can examine the test
and see whether or not it's using collapse/insert range by inspection,
so it's really not that big of a deal.  I could imagine other file
systems who might find this useful in the future, if they were trying
to growing support for the more advanced fallocate modes --- but that
wouldn't *my* concern, and arguably those file systems could solve
problems alternate ways, such as having mount options that suppress
those fallocate modes entirely which could be used when running
xfstests.

						- Ted

P.S.  I have noticed some tests that use collapse/insert range but
don't declare that they are in the collapse or insert groups.
Fortunately, all of these tests are also in the clone group, and so
they don't apply to ext4, so _I_ don't care.  :-)
Zorro Lang Nov. 8, 2022, 3:56 p.m. UTC | #8
On Tue, Nov 08, 2022 at 10:08:03AM -0500, Theodore Ts'o wrote:
> On Tue, Nov 08, 2022 at 10:44:55AM +0800, Zorro Lang wrote:
> > > If it's collapse/insert range you're specifically worried about, perhaps
> > > its time to implement _get_file_block_size for ext4 so that
> > > _test_congruent_file_oplen can exclude those tests that will get the
> > > alignment wrong?
> > 
> > Thanks Darrick, I'm thinking about this helper you wrote recently :)
> > (The real name is _require_congruent_file_oplen in common/rc.)
> > 
> > Hi Ted, is this helpful if you write a _ext4_get_file_block_size (refer to
> > _xfs_get_file_block_size in common/xfs), then call it in _get_file_block_size
> > to help _require_congruent_file_oplen to get the correct length which is an
> > integer multiple of the file's allocation unit size ?
> 
> Well, it's a bit more complicated than that, since there's a
> distinction between the cluster size and block size.  The cluster size
> is the allocation unit.  However, other things are done in units of
> the block size, including how we report fiemap results, etc.
> 
> For example, take a look at generic/206.  It will try to create a file
> system where the file system block size is the one fourth of the page
> size --- so for x86, 1k.  However, for ext4, the default cluster size
> for 16 times the block size, so for this file system the allocation
> unit size is 16k.  If we make _get_file_block_size return 64k for all
> files, then generic/206 will fail:
> 
>     pagesz=$(getconf PAGE_SIZE)
>     blksz=$((pagesz / 4))
>     ...
>     _scratch_mkfs_blocksized $blksz > $seqres.full 2>&1
>     ...
>     real_blksz=$(_get_file_block_size $testdir)

Hmm... I'm wondering is it possible that g/206 need _get_block_size() at here?

Due to from the logic of _get_file_block_size() [1], extN call _get_block_size()
directly. So if you feel a specific _ext4_get_file_block_size() isn't suit for
g/206, maybe it turely want _get_block_size() at here?

[1]
_get_file_block_size()
{
        if [ -z $1 ] || [ ! -d $1 ]; then
                echo "Missing mount point argument for _get_file_block_size"
                exit 1
        fi

        case "$FSTYP" in
        "ocfs2")
                stat -c '%o' $1
                ;;
        "xfs")
                _xfs_get_file_block_size $1
                ;;
        *)
                _get_block_size $1
                ;;
        esac
}

>     test $real_blksz != $blksz && _notrun "Failed to format with small blocksize."
> 
> I assume this works for xfs because only files in the real-time block
> size will have a larger allocation unit size, but the root directory
> has a allocation unit size of the "real" block size?
> 
> I suppose I could make a hypothetical _ext4_get_file_block_size lie
> and return the "real" block size when it's called on the mount point,
> but that seems kinda of gross, and it's also a lie, since the
> allocation unit size for the root directory actually is the cluster
> size, not the block size.
> 
> If I were going to be doing things from scratch, I'd make a
> distinction between _get_file_allocation_size and
> _get_file_system_block_size, which would be a lot *clearer* about what

Currently _get_block_size is more like _get_file_system_block_size, and
_get_file_block_size is more like _get_file_allocation_size. Actually
I'm confused about these two names before :)

> is going on.  Even then, I could imagine some tests getting confused
> with how, say, fiemap behaves with an ext4 file system with a 4k block
> size and a 64k allocation unit size, so I'm not sure it's a complete
> solution.  And doing this now would require quite a bit of code churn
> in xfstests.
> 
> > If this's helpful for your first concern [1], please tell me. Then we can talk
> > about your second concern [2], if it's still your main concern now :)
> > 
> > "This might *[1]* either be because the test didn't know about ext4
> >  bigalloc's cluster alignment requirements, *[2]* or because a particular
> >  operation might just be *buggy* and being able to run tests as if a
> >  particular command wasn't supported was useful."
> 
> So [1] is a real concern, and at the moment, we just suppress all of
> the tests that try to use collapse/insert range.  For example, from
> the ext4/bigalloc config file:
> 
>     # Until we can teach xfstests the difference between cluster size and
>     # block size, avoid collapse_range and insert_range since these will
>     # fail due the fact that these operations require cluster-aligned
>     # ranges.
>     export FSX_AVOID="-C -I"
>     export FSSTRESS_AVOID="-f collapse=0 -f insert=0"
>     export XFS_IO_AVOID="fcollapse finsert"
>     TEST_SET_EXCLUDE="-x collapse,insert"

Thanks for pointing this out, I'll help to check all fcollapse and finsert
related cases, and add them into collapse or insert group. We're keep improving
group problem, most of cases really not take too much care about its group.
And I'll pay more attention about the groups when I review patches.

> 
> That's not ideal, and it's been on my todo list to try to fix it, when
> I could get one of those round tuit's.  However, I had assumed that we
> would split _get_file_block_size somehow, given the observation I've
> made above.  So this was always been something that has put me off,
> because it looked like a much larger project than say, "gee, I have an
> hour or two on a weekend, let me see if I can fix this".

I'd like to help your team to deal with the fstests problems you hit in your
using procedure, feel free to tell us the part you hope to fix/improve. I
have to care about other teams too, so might need more time to evaluate and
talk, sorry about that.

> 
> [2] is practically not _really_ a concern any more.  It used to be
> that one of the ways that I would root cause a failing test was to
> suppress one of the advanced fallocate modes, whether it be
> collapse/insert range, or going even further back, punch hole.  If
> test then passed, then I could say, "ah, hah; the problem can probably
> be localized to a certain part of the fs code."
> 
> However, that's mainly tests that are using fsstress and fsx, and we
> have solutions for that.  And for other tests, I can examine the test
> and see whether or not it's using collapse/insert range by inspection,
> so it's really not that big of a deal.  I could imagine other file
> systems who might find this useful in the future, if they were trying
> to growing support for the more advanced fallocate modes --- but that
> wouldn't *my* concern, and arguably those file systems could solve
> problems alternate ways, such as having mount options that suppress
> those fallocate modes entirely which could be used when running
> xfstests.
> 
> 						- Ted
> 
> P.S.  I have noticed some tests that use collapse/insert range but
> don't declare that they are in the collapse or insert groups.
> Fortunately, all of these tests are also in the clone group, and so
> they don't apply to ext4, so _I_ don't care.  :-)

Thanks, I'll try to check all collapse/insert related cases.

Thanks,
Zorro

>
Darrick J. Wong Nov. 8, 2022, 4:45 p.m. UTC | #9
On Tue, Nov 08, 2022 at 11:56:05PM +0800, Zorro Lang wrote:
> On Tue, Nov 08, 2022 at 10:08:03AM -0500, Theodore Ts'o wrote:
> > On Tue, Nov 08, 2022 at 10:44:55AM +0800, Zorro Lang wrote:
> > > > If it's collapse/insert range you're specifically worried about, perhaps
> > > > its time to implement _get_file_block_size for ext4 so that
> > > > _test_congruent_file_oplen can exclude those tests that will get the
> > > > alignment wrong?
> > > 
> > > Thanks Darrick, I'm thinking about this helper you wrote recently :)
> > > (The real name is _require_congruent_file_oplen in common/rc.)
> > > 
> > > Hi Ted, is this helpful if you write a _ext4_get_file_block_size (refer to
> > > _xfs_get_file_block_size in common/xfs), then call it in _get_file_block_size
> > > to help _require_congruent_file_oplen to get the correct length which is an
> > > integer multiple of the file's allocation unit size ?
> > 
> > Well, it's a bit more complicated than that, since there's a
> > distinction between the cluster size and block size.  The cluster size
> > is the allocation unit.  However, other things are done in units of
> > the block size, including how we report fiemap results, etc.
> > 
> > For example, take a look at generic/206.  It will try to create a file
> > system where the file system block size is the one fourth of the page
> > size --- so for x86, 1k.  However, for ext4, the default cluster size
> > for 16 times the block size, so for this file system the allocation
> > unit size is 16k.  If we make _get_file_block_size return 64k for all
> > files, then generic/206 will fail:
> > 
> >     pagesz=$(getconf PAGE_SIZE)
> >     blksz=$((pagesz / 4))
> >     ...
> >     _scratch_mkfs_blocksized $blksz > $seqres.full 2>&1
> >     ...
> >     real_blksz=$(_get_file_block_size $testdir)
> 
> Hmm... I'm wondering is it possible that g/206 need _get_block_size() at here?
> 
> Due to from the logic of _get_file_block_size() [1], extN call _get_block_size()
> directly. So if you feel a specific _ext4_get_file_block_size() isn't suit for
> g/206, maybe it turely want _get_block_size() at here?
> 
> [1]
> _get_file_block_size()
> {
>         if [ -z $1 ] || [ ! -d $1 ]; then
>                 echo "Missing mount point argument for _get_file_block_size"
>                 exit 1
>         fi
> 
>         case "$FSTYP" in
>         "ocfs2")
>                 stat -c '%o' $1
>                 ;;
>         "xfs")
>                 _xfs_get_file_block_size $1
>                 ;;
>         *)
>                 _get_block_size $1
>                 ;;
>         esac
> }
> 
> >     test $real_blksz != $blksz && _notrun "Failed to format with small blocksize."
> > 
> > I assume this works for xfs because only files in the real-time block
> > size will have a larger allocation unit size, but the root directory
> > has a allocation unit size of the "real" block size?
> > 
> > I suppose I could make a hypothetical _ext4_get_file_block_size lie
> > and return the "real" block size when it's called on the mount point,
> > but that seems kinda of gross, and it's also a lie, since the
> > allocation unit size for the root directory actually is the cluster
> > size, not the block size.
> > 
> > If I were going to be doing things from scratch, I'd make a
> > distinction between _get_file_allocation_size and
> > _get_file_system_block_size, which would be a lot *clearer* about what
> 
> Currently _get_block_size is more like _get_file_system_block_size, and
> _get_file_block_size is more like _get_file_allocation_size. Actually
> I'm confused about these two names before :)

Yes, both statements are correct.  They're both crappily named, one of
them by me. :(

> > is going on.  Even then, I could imagine some tests getting confused
> > with how, say, fiemap behaves with an ext4 file system with a 4k block
> > size and a 64k allocation unit size, so I'm not sure it's a complete
> > solution.  And doing this now would require quite a bit of code churn
> > in xfstests.

Yes, it has been, uh, fun to fix fstests so that xfs realtime with a 28k
allocation unit doesn't spray false failures everywhere.  I'm mostly
done now, I hope? ;)

(Or at least I haven't heard any complaints from Leah in a solid 2
months...)

> > > If this's helpful for your first concern [1], please tell me. Then we can talk
> > > about your second concern [2], if it's still your main concern now :)
> > > 
> > > "This might *[1]* either be because the test didn't know about ext4
> > >  bigalloc's cluster alignment requirements, *[2]* or because a particular
> > >  operation might just be *buggy* and being able to run tests as if a
> > >  particular command wasn't supported was useful."
> > 
> > So [1] is a real concern, and at the moment, we just suppress all of
> > the tests that try to use collapse/insert range.  For example, from
> > the ext4/bigalloc config file:
> > 
> >     # Until we can teach xfstests the difference between cluster size and
> >     # block size, avoid collapse_range and insert_range since these will
> >     # fail due the fact that these operations require cluster-aligned
> >     # ranges.
> >     export FSX_AVOID="-C -I"
> >     export FSSTRESS_AVOID="-f collapse=0 -f insert=0"
> >     export XFS_IO_AVOID="fcollapse finsert"
> >     TEST_SET_EXCLUDE="-x collapse,insert"
> 
> Thanks for pointing this out, I'll help to check all fcollapse and finsert
> related cases, and add them into collapse or insert group. We're keep improving
> group problem, most of cases really not take too much care about its group.
> And I'll pay more attention about the groups when I review patches.

<nod>

> > 
> > That's not ideal, and it's been on my todo list to try to fix it, when
> > I could get one of those round tuit's.  However, I had assumed that we
> > would split _get_file_block_size somehow, given the observation I've
> > made above.  So this was always been something that has put me off,
> > because it looked like a much larger project than say, "gee, I have an
> > hour or two on a weekend, let me see if I can fix this".
> 
> I'd like to help your team to deal with the fstests problems you hit in your
> using procedure, feel free to tell us the part you hope to fix/improve. I
> have to care about other teams too, so might need more time to evaluate and
> talk, sorry about that.
> 
> > 
> > [2] is practically not _really_ a concern any more.  It used to be
> > that one of the ways that I would root cause a failing test was to
> > suppress one of the advanced fallocate modes, whether it be
> > collapse/insert range, or going even further back, punch hole.  If
> > test then passed, then I could say, "ah, hah; the problem can probably
> > be localized to a certain part of the fs code."
> > 
> > However, that's mainly tests that are using fsstress and fsx, and we
> > have solutions for that.  And for other tests, I can examine the test
> > and see whether or not it's using collapse/insert range by inspection,
> > so it's really not that big of a deal.  I could imagine other file
> > systems who might find this useful in the future, if they were trying
> > to growing support for the more advanced fallocate modes --- but that
> > wouldn't *my* concern, and arguably those file systems could solve
> > problems alternate ways, such as having mount options that suppress
> > those fallocate modes entirely which could be used when running
> > xfstests.
> > 
> > 						- Ted
> > 
> > P.S.  I have noticed some tests that use collapse/insert range but
> > don't declare that they are in the collapse or insert groups.
> > Fortunately, all of these tests are also in the clone group, and so
> > they don't apply to ext4, so _I_ don't care.  :-)
> 
> Thanks, I'll try to check all collapse/insert related cases.

<nod> Bad throwback to the days when we didn't scrutinize group names
all that closely.  Someone probably ought to write a dumb linter that
can look for obvious patterns in a testcase and nominate group names.

--D

> Thanks,
> Zorro
> 
> > 
>
Zorro Lang Nov. 11, 2022, 2:25 a.m. UTC | #10
On Tue, Nov 08, 2022 at 08:45:04AM -0800, Darrick J. Wong wrote:
> On Tue, Nov 08, 2022 at 11:56:05PM +0800, Zorro Lang wrote:
> > On Tue, Nov 08, 2022 at 10:08:03AM -0500, Theodore Ts'o wrote:
> > > On Tue, Nov 08, 2022 at 10:44:55AM +0800, Zorro Lang wrote:
> > > > > If it's collapse/insert range you're specifically worried about, perhaps
> > > > > its time to implement _get_file_block_size for ext4 so that
> > > > > _test_congruent_file_oplen can exclude those tests that will get the
> > > > > alignment wrong?
> > > > 
> > > > Thanks Darrick, I'm thinking about this helper you wrote recently :)
> > > > (The real name is _require_congruent_file_oplen in common/rc.)
> > > > 
> > > > Hi Ted, is this helpful if you write a _ext4_get_file_block_size (refer to
> > > > _xfs_get_file_block_size in common/xfs), then call it in _get_file_block_size
> > > > to help _require_congruent_file_oplen to get the correct length which is an
> > > > integer multiple of the file's allocation unit size ?
> > > 
> > > Well, it's a bit more complicated than that, since there's a
> > > distinction between the cluster size and block size.  The cluster size
> > > is the allocation unit.  However, other things are done in units of
> > > the block size, including how we report fiemap results, etc.
> > > 
> > > For example, take a look at generic/206.  It will try to create a file
> > > system where the file system block size is the one fourth of the page
> > > size --- so for x86, 1k.  However, for ext4, the default cluster size
> > > for 16 times the block size, so for this file system the allocation
> > > unit size is 16k.  If we make _get_file_block_size return 64k for all
> > > files, then generic/206 will fail:
> > > 
> > >     pagesz=$(getconf PAGE_SIZE)
> > >     blksz=$((pagesz / 4))
> > >     ...
> > >     _scratch_mkfs_blocksized $blksz > $seqres.full 2>&1
> > >     ...
> > >     real_blksz=$(_get_file_block_size $testdir)
> > 
> > Hmm... I'm wondering is it possible that g/206 need _get_block_size() at here?
> > 
> > Due to from the logic of _get_file_block_size() [1], extN call _get_block_size()
> > directly. So if you feel a specific _ext4_get_file_block_size() isn't suit for
> > g/206, maybe it turely want _get_block_size() at here?

I just checked g/206 and other similar cases, I think they really need
_get_file_block_size, not the _get_block_size.

But for extN, as Ted said "it's a bit more complicated ...". If the
_get_file_block_size should return different allocate size according
to different later operations, how about let _get_file_block_size
support an optional argument, so specify what kind of allocate size
should be returned? XFS part can ignore this optional argument, if
it doesn't care.

Thanks,
Zorro

> > 
> > [1]
> > _get_file_block_size()
> > {
> >         if [ -z $1 ] || [ ! -d $1 ]; then
> >                 echo "Missing mount point argument for _get_file_block_size"
> >                 exit 1
> >         fi
> > 
> >         case "$FSTYP" in
> >         "ocfs2")
> >                 stat -c '%o' $1
> >                 ;;
> >         "xfs")
> >                 _xfs_get_file_block_size $1
> >                 ;;
> >         *)
> >                 _get_block_size $1
> >                 ;;
> >         esac
> > }
> > 
> > >     test $real_blksz != $blksz && _notrun "Failed to format with small blocksize."
> > > 
> > > I assume this works for xfs because only files in the real-time block
> > > size will have a larger allocation unit size, but the root directory
> > > has a allocation unit size of the "real" block size?
> > > 
> > > I suppose I could make a hypothetical _ext4_get_file_block_size lie
> > > and return the "real" block size when it's called on the mount point,
> > > but that seems kinda of gross, and it's also a lie, since the
> > > allocation unit size for the root directory actually is the cluster
> > > size, not the block size.
> > > 
> > > If I were going to be doing things from scratch, I'd make a
> > > distinction between _get_file_allocation_size and
> > > _get_file_system_block_size, which would be a lot *clearer* about what
> > 
> > Currently _get_block_size is more like _get_file_system_block_size, and
> > _get_file_block_size is more like _get_file_allocation_size. Actually
> > I'm confused about these two names before :)
> 
> Yes, both statements are correct.  They're both crappily named, one of
> them by me. :(
> 
> > > is going on.  Even then, I could imagine some tests getting confused
> > > with how, say, fiemap behaves with an ext4 file system with a 4k block
> > > size and a 64k allocation unit size, so I'm not sure it's a complete
> > > solution.  And doing this now would require quite a bit of code churn
> > > in xfstests.
> 
> Yes, it has been, uh, fun to fix fstests so that xfs realtime with a 28k
> allocation unit doesn't spray false failures everywhere.  I'm mostly
> done now, I hope? ;)
> 
> (Or at least I haven't heard any complaints from Leah in a solid 2
> months...)
> 
> > > > If this's helpful for your first concern [1], please tell me. Then we can talk
> > > > about your second concern [2], if it's still your main concern now :)
> > > > 
> > > > "This might *[1]* either be because the test didn't know about ext4
> > > >  bigalloc's cluster alignment requirements, *[2]* or because a particular
> > > >  operation might just be *buggy* and being able to run tests as if a
> > > >  particular command wasn't supported was useful."
> > > 
> > > So [1] is a real concern, and at the moment, we just suppress all of
> > > the tests that try to use collapse/insert range.  For example, from
> > > the ext4/bigalloc config file:
> > > 
> > >     # Until we can teach xfstests the difference between cluster size and
> > >     # block size, avoid collapse_range and insert_range since these will
> > >     # fail due the fact that these operations require cluster-aligned
> > >     # ranges.
> > >     export FSX_AVOID="-C -I"
> > >     export FSSTRESS_AVOID="-f collapse=0 -f insert=0"
> > >     export XFS_IO_AVOID="fcollapse finsert"
> > >     TEST_SET_EXCLUDE="-x collapse,insert"
> > 
> > Thanks for pointing this out, I'll help to check all fcollapse and finsert
> > related cases, and add them into collapse or insert group. We're keep improving
> > group problem, most of cases really not take too much care about its group.
> > And I'll pay more attention about the groups when I review patches.
> 
> <nod>
> 
> > > 
> > > That's not ideal, and it's been on my todo list to try to fix it, when
> > > I could get one of those round tuit's.  However, I had assumed that we
> > > would split _get_file_block_size somehow, given the observation I've
> > > made above.  So this was always been something that has put me off,
> > > because it looked like a much larger project than say, "gee, I have an
> > > hour or two on a weekend, let me see if I can fix this".
> > 
> > I'd like to help your team to deal with the fstests problems you hit in your
> > using procedure, feel free to tell us the part you hope to fix/improve. I
> > have to care about other teams too, so might need more time to evaluate and
> > talk, sorry about that.
> > 
> > > 
> > > [2] is practically not _really_ a concern any more.  It used to be
> > > that one of the ways that I would root cause a failing test was to
> > > suppress one of the advanced fallocate modes, whether it be
> > > collapse/insert range, or going even further back, punch hole.  If
> > > test then passed, then I could say, "ah, hah; the problem can probably
> > > be localized to a certain part of the fs code."
> > > 
> > > However, that's mainly tests that are using fsstress and fsx, and we
> > > have solutions for that.  And for other tests, I can examine the test
> > > and see whether or not it's using collapse/insert range by inspection,
> > > so it's really not that big of a deal.  I could imagine other file
> > > systems who might find this useful in the future, if they were trying
> > > to growing support for the more advanced fallocate modes --- but that
> > > wouldn't *my* concern, and arguably those file systems could solve
> > > problems alternate ways, such as having mount options that suppress
> > > those fallocate modes entirely which could be used when running
> > > xfstests.
> > > 
> > > 						- Ted
> > > 
> > > P.S.  I have noticed some tests that use collapse/insert range but
> > > don't declare that they are in the collapse or insert groups.
> > > Fortunately, all of these tests are also in the clone group, and so
> > > they don't apply to ext4, so _I_ don't care.  :-)
> > 
> > Thanks, I'll try to check all collapse/insert related cases.
> 
> <nod> Bad throwback to the days when we didn't scrutinize group names
> all that closely.  Someone probably ought to write a dumb linter that
> can look for obvious patterns in a testcase and nominate group names.
> 
> --D
> 
> > Thanks,
> > Zorro
> > 
> > > 
> > 
>
Zorro Lang Feb. 7, 2023, 6:26 p.m. UTC | #11
On Mon, Nov 07, 2022 at 11:35:16AM -0500, Theodore Ts'o wrote:
> On Mon, Nov 07, 2022 at 10:02:36AM +0800, Zorro Lang wrote:
> > I think it doesn't make sense to use $FSX_AVOID in `fsx --replay-ops` cases.
> > Due to generally the operations which a cases would like to replay are exact
> > steps to reproduce to a known bug. If we skip some operations (e.g. -F), it
> > doesn't make sense for this reproducer.
> > 
> > The recommended way for this kind of cases is making sure current fs/system
> > support the operations will be run by fsx, especially those features are not
> > common on different fs/system....
> >
> > So it uses below _require_* helpers to make sure these operations are supported,
> > before testing:
> > 
> >   _require_xfs_io_command "falloc"
> >   _require_xfs_io_command "falloc" "-k"
> >   _require_xfs_io_command "fzero"
> >   _require_xfs_io_command "fcollapse"
> > 
> > That's my point, hope I didn't misunderstand what you said :)
> 
> No, you didn't understand me.  :-)
> 
> For context, I have an out of tree patch (see attached), which I had
> tried upstreaming a while back, but it got rejected, so I've continued
> to keep it in my personal tree.  The basic idea is sometimes you might
> want to suppress a test even *though* _require_xfs_io_command seems to
> indicate that operation was supported.
> 
> This might either be because the test didn't know about ext4
> bigalloc's cluster alignment requirements, or because a particular
> operation might just be *buggy* and being able to run tests as if a
> particular command wasn't supported was useful.
> 
> It was rejected because the claim was that you could just exclude by
> group instead (e.g., "punch", "collapse") but I didn't trust that the
> group list would be kept up to date, so I never really agreed with
> that line of reasoning.  These days, given that group declaration are
> kept in the test script, it's much less likely to happen, but I've
> kept the patch in my tree because it's occasionally useful.
> 
> At this point, it's admittedly pretty rarely needed since ext4's
> collapse and insert range commands are pretty solid modulo tests not
> understanding cluster alignment, but still, it's not much effort for
> me to keep carrying the patch and I don't expect it will ever get
> upstreamed.
> 
> 					- Ted
> 
> commit c9d25475a94d5e53d7f18d247a17088999522862
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Sat Oct 17 14:39:26 2015 -0400
> 
>     common: introduce XFS_IO_AVOID env var

Hi Ted,

fstests has merged below change 2 month ago:
  [PATCH] fstests: update group name according to xfs_io command requirement
  https://lore.kernel.org/fstests/20221108183242.3362013-1-zlang@kernel.org/

So I'd like to check if it helps for the problem you described above?
If not, I think we can think about the patch you metioned above.

Thanks,
Zorro

>     
>     Like FSSTRESS_AVOID and FSX_AVOID, XFS_IO_AVOID can be used to avoid
>     using various advanced file system features such as "fpunch"
>     "fcollapse", "finsert", or "zero".  Tests that require an xfs_io
>     command which is included in the space-separated list found in the
>     XFS_IO_AVOID environment variable will be skipped using _notrun.
>     
>     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> 
> diff --git a/README b/README
> index 4c4f22f85..42baff07b 100644
> --- a/README
> +++ b/README
> @@ -245,6 +245,10 @@ Misc:
>     this option is supported for all filesystems currently only -overlay is
>     expected to run without issues. For other filesystems additional patches
>     and fixes to the test suite might be needed.
> + - setenv XFS_IO_AVOID, which may contain a list of space separated
> +   xfs_io commands which will be avoided in case you want to exclude
> +   tests that require the use of certain file system operations such
> +   as "fpunch", "fcollapse", "finsert", or "zero".
>  
>  ______________________
>  USING THE FSQA SUITE
> diff --git a/common/rc b/common/rc
> index eb67e0cdc..d1c07a4d0 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -2485,6 +2485,11 @@ _require_xfs_io_command()
>  	local opts=""
>  	local attr_info=""
>  
> +	if echo "$XFS_IO_AVOID" | grep -wq -- "$command"
> +	then
> +		_notrun "Avoiding xfs_io $command"
> +	fi
> +
>  	local testfile=$TEST_DIR/$$.xfs_io
>  	local testio
>  	case $command in
>
Theodore Ts'o July 28, 2023, 8:30 p.m. UTC | #12
On Wed, Feb 08, 2023 at 02:26:56AM +0800, Zorro Lang wrote:
> fstests has merged below change 2 month ago:
>   [PATCH] fstests: update group name according to xfs_io command requirement
>   https://lore.kernel.org/fstests/20221108183242.3362013-1-zlang@kernel.org/
> 
> So I'd like to check if it helps for the problem you described above?
> If not, I think we can think about the patch you metioned above.

Oops, sorry, this got lost in my inbox.  :-(

It definitely helped, thanks.  My one observation about this patch is
that it's a one-time fix-up.  I tried rerunning the script referenced
in the patch, and there were 11 tests that it "fixed up".  Now, they
were all adding tests to the "prealloc" group, which I think you had
deliberately excluded, because they weren't actually testing prealloc,
but it's the worry that future fstests developers might forget to set
the group name correctly, which is why I still have "common: introduce
XFS_IO_AVOID env var"[1] as an out of tree patch.

[1] https://lore.kernel.org/all/1445107518-32022-1-git-send-email-tytso@mit.edu/

It's a small change, it almost never conflicts with upstream changes
(generally the only time I have to deal with a conflcit when rebasing
is when a new environment variable is added to the documentation in
README), and it means that when I run "kvm-xfstests --no-collapse", my
wrapper scripts do this:

    no_collapse)
        ALL_FSSTRESS_AVOID="$ALL_FSSTRESS_AVOID -f collapse=0"
        ALL_FSX_AVOID="$ALL_FSX_AVOID -C"
        ALL_XFS_IO_AVOID="$ALL_XFS_IO_AVOID fcollapse"
        FSTESTSET="$FSTESTSET -x collapse"
        ;;

and I'm *guaranteed* to make sure that any tests involving
collapse_range will be skipped.  Do I strictly speaking need the
out-of-tree patch in [1], probably not, assuming the group list is
always kept up to date, and to be honest it's been a *long* time since
I've never needed to use gce-xfstests --no-collapse or --no-insert.

However, the cost of keeping the out-of-tree patch in my local
xfstests git repo is quite low, so I've just kept it.  But do I *need*
it?  Arguably, no, which is why I haven't been bugging you about it.
:-)

						- Ted
Zorro Lang Aug. 1, 2023, 8:27 a.m. UTC | #13
On Fri, Jul 28, 2023 at 04:30:40PM -0400, Theodore Ts'o wrote:
> On Wed, Feb 08, 2023 at 02:26:56AM +0800, Zorro Lang wrote:
> > fstests has merged below change 2 month ago:
> >   [PATCH] fstests: update group name according to xfs_io command requirement
> >   https://lore.kernel.org/fstests/20221108183242.3362013-1-zlang@kernel.org/
> > 
> > So I'd like to check if it helps for the problem you described above?
> > If not, I think we can think about the patch you metioned above.
> 
> Oops, sorry, this got lost in my inbox.  :-(
> 
> It definitely helped, thanks.  My one observation about this patch is
> that it's a one-time fix-up.  I tried rerunning the script referenced
> in the patch, and there were 11 tests that it "fixed up".  Now, they
> were all adding tests to the "prealloc" group, which I think you had
> deliberately excluded, because they weren't actually testing prealloc,
> but it's the worry that future fstests developers might forget to set
> the group name correctly, which is why I still have "common: introduce
> XFS_IO_AVOID env var"[1] as an out of tree patch.
> 
> [1] https://lore.kernel.org/all/1445107518-32022-1-git-send-email-tytso@mit.edu/
> 
> It's a small change, it almost never conflicts with upstream changes
> (generally the only time I have to deal with a conflcit when rebasing
> is when a new environment variable is added to the documentation in
> README), and it means that when I run "kvm-xfstests --no-collapse", my
> wrapper scripts do this:
> 
>     no_collapse)
>         ALL_FSSTRESS_AVOID="$ALL_FSSTRESS_AVOID -f collapse=0"
>         ALL_FSX_AVOID="$ALL_FSX_AVOID -C"
>         ALL_XFS_IO_AVOID="$ALL_XFS_IO_AVOID fcollapse"
>         FSTESTSET="$FSTESTSET -x collapse"
>         ;;
> 
> and I'm *guaranteed* to make sure that any tests involving
> collapse_range will be skipped.  Do I strictly speaking need the
> out-of-tree patch in [1], probably not, assuming the group list is
> always kept up to date, and to be honest it's been a *long* time since
> I've never needed to use gce-xfstests --no-collapse or --no-insert.

Many thanks, glad to know that helps.

Tell the truth, the "XFS_IO_AVOID" is more like a trick of the "exclude
individual tests (./check -X)". If a case contains an operation (e.g. collapse),
we can't skip it by group name, but can do that through a trick. That
cause fstests leave a "group name missing bug" there, and we even try to hide
it.

So the best way I think is anyone who can't skip a test properly by a
group name, please report that bug to fstests. Let's fix it. BTW, I really
tried to notice the missed group name from that day when I review new cases,
especially if there're some obvious xfs_io operations. But some operations
might be hide, feel free to report/fix that if anyone find :)

Thanks,
Zorro

> 
> However, the cost of keeping the out-of-tree patch in my local
> xfstests git repo is quite low, so I've just kept it.  But do I *need*
> it?  Arguably, no, which is why I haven't been bugging you about it.
> :-)
> 
> 						- Ted
>
diff mbox series

Patch

diff --git a/tests/generic/455 b/tests/generic/455
index 649b54108..c13d872c6 100755
--- a/tests/generic/455
+++ b/tests/generic/455
@@ -77,7 +77,7 @@  FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV"
 seeds=(0 0 0 0)
 # Run fsx for a while
 for j in `seq 0 $((NUM_FILES-1))`; do
-	run_check $here/ltp/fsx $FSX_OPTS -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j &
+	run_check $here/ltp/fsx $FSX_OPTS $FSX_AVOID -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j &
 done
 wait