fstests: fix btrfs/255 to fail on deadlock

Message ID 20220216100535.4231-1-gniebler@suse.com (mailing list archive)
State New, archived

Commit Message

Gabriel Niebler Feb. 16, 2022, 10:05 a.m. UTC
In its current implementation, the test btrfs/255 would hang forever
on any kernel w/o patch "btrfs: fix deadlock between quota disable
and qgroup rescan worker", rather than failing, as it should.
Fix this by introducing generous timeouts.

Signed-off-by: Gabriel Niebler <gniebler@suse.com>
---
 tests/btrfs/255 | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Eryu Guan Feb. 20, 2022, 5:07 p.m. UTC | #1
On Wed, Feb 16, 2022 at 11:05:35AM +0100, Gabriel Niebler wrote:
> In its current implementation, the test btrfs/255 would hang forever
> on any kernel w/o patch "btrfs: fix deadlock between quota disable
> and qgroup rescan worker", rather than failing, as it should.
> Fix this by introducing generous timeouts.
> 
> Signed-off-by: Gabriel Niebler <gniebler@suse.com>

If the deadlock has already been triggered, I don't think killing the
userspace program with timeout will help: the kernel is already deadlocked,
and the filesystem and/or device can't be used by the next test either.

I think we should just exclude the test when running on an unpatched
kernel.

Thanks,
Eryu

> ---
>  tests/btrfs/255 | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/btrfs/255 b/tests/btrfs/255
> index 7e70944a..4c779458 100755
> --- a/tests/btrfs/255
> +++ b/tests/btrfs/255
> @@ -14,6 +14,7 @@ _begin_fstest auto qgroup balance
>  
>  # real QA test starts here
>  _supported_fs btrfs
> +_require_command "$TIMEOUT_PROG" timeout
>  _require_scratch
>  
>  _scratch_mkfs >> $seqres.full 2>&1
> @@ -37,15 +38,23 @@ done
>  _btrfs_stress_balance $SCRATCH_MNT >> $seqres.full &
>  balance_pid=$!
>  echo $balance_pid >> $seqres.full
> +timeout=$((30 * 60))
>  for ((i = 0; i < 20; i++)); do
> -	$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
> -	$BTRFS_UTIL_PROG quota disable $SCRATCH_MNT
> +	$TIMEOUT_PROG -s KILL ${timeout}s $BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
> +	[ $? -eq 0 ] || _fail "quota enable timed out"
> +	$TIMEOUT_PROG -s KILL ${timeout}s $BTRFS_UTIL_PROG quota disable $SCRATCH_MNT
> +	[ $? -eq 0 ] || _fail "quota disable timed out"
>  done
>  kill $balance_pid &> /dev/null
> -wait
> +
>  # wait for the balance operation to finish
> +elapsed=0
>  while ps aux | grep "balance start" | grep -qv grep; do
> +	if [ $elapsed -gt $timeout ]; then
> +		_fail "balance not finished after $timeout seconds"
> +	fi
>  	sleep 1
> +	elapsed=$(( ++elapsed ))
>  done
>  
>  echo "Silence is golden"
> -- 
> 2.35.1
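
Editorial note: the quoted patch reports every non-zero exit status as a timeout. With GNU coreutils timeout and "-s KILL", a run that hits the deadline exits with status 137 (128 + 9), so the two cases can be told apart. The sketch below is only an illustration under that assumption; describe_timeout_status and run_with_deadline are hypothetical names, not fstests helpers. Status 137 also appears if something else sends SIGKILL to the command, and none of this addresses the objection above that a deadlocked kernel stays deadlocked.

```shell
#!/bin/bash
# Hypothetical helper (not part of fstests): describe an exit status
# returned by `timeout -s KILL <duration> <command>`.
describe_timeout_status() {
	local status=$1
	case "$status" in
	0)   echo "ok" ;;
	137) echo "timed out" ;;  # 128 + 9: the command was sent SIGKILL
	*)   echo "failed" ;;     # the command itself returned an error
	esac
}

# Hypothetical wrapper: run a command under a hard deadline given in
# seconds and print how it ended.
run_with_deadline() {
	local seconds=$1
	shift
	timeout -s KILL "${seconds}s" "$@"
	describe_timeout_status $?
}
```

In the test, something like `run_with_deadline 1800 $BTRFS_UTIL_PROG quota enable $SCRATCH_MNT` could then produce separate failure messages for "timed out" and "failed".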
David Sterba Feb. 23, 2022, 5:11 p.m. UTC | #2
On Mon, Feb 21, 2022 at 01:07:35AM +0800, Eryu Guan wrote:
> On Wed, Feb 16, 2022 at 11:05:35AM +0100, Gabriel Niebler wrote:
> > In its current implementation, the test btrfs/255 would hang forever
> > on any kernel w/o patch "btrfs: fix deadlock between quota disable
> > and qgroup rescan worker", rather than failing, as it should.
> > Fix this by introducing generous timeouts.
> > 
> > Signed-off-by: Gabriel Niebler <gniebler@suse.com>
> 
> If the deadlock has already been triggered, I don't think killing the
> userspace program with timeout will help: the kernel is already deadlocked,
> and the filesystem and/or device can't be used by the next test either.
> 
> I think we should just exclude the test when running on an unpatched
> kernel.

I don't see a way to detect it at runtime. Or do you mean using the
expunge files?
Eryu Guan March 20, 2022, 3:33 p.m. UTC | #3
On Wed, Feb 23, 2022 at 06:11:26PM +0100, David Sterba wrote:
> On Mon, Feb 21, 2022 at 01:07:35AM +0800, Eryu Guan wrote:
> > On Wed, Feb 16, 2022 at 11:05:35AM +0100, Gabriel Niebler wrote:
> > > In its current implementation, the test btrfs/255 would hang forever
> > > on any kernel w/o patch "btrfs: fix deadlock between quota disable
> > > and qgroup rescan worker", rather than failing, as it should.
> > > Fix this by introducing generous timeouts.
> > > 
> > > Signed-off-by: Gabriel Niebler <gniebler@suse.com>
> > 
> > If the deadlock has already been triggered, I don't think killing the
> > userspace program with timeout will help: the kernel is already deadlocked,
> > and the filesystem and/or device can't be used by the next test either.
> > 
> > I think we should just exclude the test when running on an unpatched
> > kernel.
> 
> I don't see a way to detect it at runtime. Or do you mean using the
> expunge files?

Yes, use an expunge file and run fstests with './check -E <path_to_expunge_file>'.

Thanks,
Eryu
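
Editorial note: a minimal sketch of the expunge-file approach suggested above, assuming the usual convention of one test name per line. The file location and name are hypothetical; only the `-E` option comes from the reply itself. The script writes the file and prints the command to run from the top of an fstests checkout rather than invoking it.

```shell
#!/bin/bash
# Hypothetical location for the expunge file; any readable path works.
expunge_file="${TMPDIR:-/tmp}/expunge.unpatched-kernel"

# List the tests to skip, one per line. btrfs/255 hangs on kernels
# without "btrfs: fix deadlock between quota disable and qgroup rescan
# worker".
cat > "$expunge_file" <<'EOF'
btrfs/255
EOF

# Command to run from the fstests source directory.
echo "./check -E $expunge_file"
```

Keeping one such file per kernel under test makes it easy to drop the entry once the fix has been backported.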
Patch

diff --git a/tests/btrfs/255 b/tests/btrfs/255
index 7e70944a..4c779458 100755
--- a/tests/btrfs/255
+++ b/tests/btrfs/255
@@ -14,6 +14,7 @@  _begin_fstest auto qgroup balance
 
 # real QA test starts here
 _supported_fs btrfs
+_require_command "$TIMEOUT_PROG" timeout
 _require_scratch
 
 _scratch_mkfs >> $seqres.full 2>&1
@@ -37,15 +38,23 @@  done
 _btrfs_stress_balance $SCRATCH_MNT >> $seqres.full &
 balance_pid=$!
 echo $balance_pid >> $seqres.full
+timeout=$((30 * 60))
 for ((i = 0; i < 20; i++)); do
-	$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
-	$BTRFS_UTIL_PROG quota disable $SCRATCH_MNT
+	$TIMEOUT_PROG -s KILL ${timeout}s $BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
+	[ $? -eq 0 ] || _fail "quota enable timed out"
+	$TIMEOUT_PROG -s KILL ${timeout}s $BTRFS_UTIL_PROG quota disable $SCRATCH_MNT
+	[ $? -eq 0 ] || _fail "quota disable timed out"
 done
 kill $balance_pid &> /dev/null
-wait
+
 # wait for the balance operation to finish
+elapsed=0
 while ps aux | grep "balance start" | grep -qv grep; do
+	if [ $elapsed -gt $timeout ]; then
+		_fail "balance not finished after $timeout seconds"
+	fi
 	sleep 1
+	elapsed=$(( ++elapsed ))
 done
 
 echo "Silence is golden"