Message ID | 20220216100535.4231-1-gniebler@suse.com (mailing list archive) |
---|---|
State | New, archived |
Series | fstests: fix btrfs/255 to fail on deadlock |
On Wed, Feb 16, 2022 at 11:05:35AM +0100, Gabriel Niebler wrote:
> In its current implementation, the test btrfs/255 would hang forever
> on any kernel w/o patch "btrfs: fix deadlock between quota disable
> and qgroup rescan worker", rather than failing, as it should.
> Fix this by introducing generous timeouts.
>
> Signed-off-by: Gabriel Niebler <gniebler@suse.com>

If deadlock was already triggered, I don't think killing the userspace
program with timeout will help, as the kernel already deadlocked, and
filesystem and/or device can't be used by next test either.

I think we should just exclude the test when running tests on unpatched
kernel.

Thanks,
Eryu

> ---
>  tests/btrfs/255 | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/tests/btrfs/255 b/tests/btrfs/255
> index 7e70944a..4c779458 100755
> --- a/tests/btrfs/255
> +++ b/tests/btrfs/255
> @@ -14,6 +14,7 @@ _begin_fstest auto qgroup balance
>
>  # real QA test starts here
>  _supported_fs btrfs
> +_require_command "$TIMEOUT_PROG" timeout
>  _require_scratch
>
>  _scratch_mkfs >> $seqres.full 2>&1
> @@ -37,15 +38,23 @@ done
>  _btrfs_stress_balance $SCRATCH_MNT >> $seqres.full &
>  balance_pid=$!
>  echo $balance_pid >> $seqres.full
> +timeout=$((30 * 60))
>  for ((i = 0; i < 20; i++)); do
> -	$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
> -	$BTRFS_UTIL_PROG quota disable $SCRATCH_MNT
> +	$TIMEOUT_PROG -s KILL ${timeout}s $BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
> +	[ $? -eq 0 ] || _fail "quota enable timed out"
> +	$TIMEOUT_PROG -s KILL ${timeout}s $BTRFS_UTIL_PROG quota disable $SCRATCH_MNT
> +	[ $? -eq 0 ] || _fail "quota disable timed out"
>  done
>  kill $balance_pid &> /dev/null
> -wait
> +
>  # wait for the balance operation to finish
> +elapsed=0
>  while ps aux | grep "balance start" | grep -qv grep; do
> +	if [ $elapsed -gt $timeout ]; then
> +		_fail "balance not finished after $timeout seconds"
> +	fi
>  	sleep 1
> +	elapsed=$(( ++elapsed ))
>  done
>
>  echo "Silence is golden"
> --
> 2.35.1
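Eryu's objection rests on how the kernel treats a task that is blocked inside it. As a rough illustration (not part of the patch or of fstests), the sketch below lists tasks that `ps` reports in state `D` (uninterruptible sleep). A task deadlocked in the kernel typically shows up there and may not act on the SIGKILL that `timeout -s KILL` sends, so the filesystem stays stuck even after the userspace command is "killed". The function name `list_d_state_tasks` is hypothetical, and the sketch assumes a procps-style `ps`.

```shell
#!/bin/bash
# Hypothetical helper: print PID and command name of every task whose
# state starts with "D" (uninterruptible sleep). Such a task usually
# does not handle signals while blocked in the kernel, so a SIGKILL
# from timeout(1) may leave it (and the locks it holds) in place.
list_d_state_tasks() {
	# "=" after each field name suppresses the header line (procps ps)
	ps -eo pid=,stat=,comm= | awk '$2 ~ /^D/ { print $1, $3 }'
}

list_d_state_tasks
```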
On Mon, Feb 21, 2022 at 01:07:35AM +0800, Eryu Guan wrote:
> On Wed, Feb 16, 2022 at 11:05:35AM +0100, Gabriel Niebler wrote:
> > In its current implementation, the test btrfs/255 would hang forever
> > on any kernel w/o patch "btrfs: fix deadlock between quota disable
> > and qgroup rescan worker", rather than failing, as it should.
> > Fix this by introducing generous timeouts.
> >
> > Signed-off-by: Gabriel Niebler <gniebler@suse.com>
>
> If deadlock was already triggered, I don't think killing the userspace
> program with timeout will help, as the kernel already deadlocked, and
> filesystem and/or device can't be used by next test either.
>
> I think we should just exclude the test when running tests on unpatched
> kernel.

I don't see a way how to detect it at runtime, or do you mean to use the
expunge files?
On Wed, Feb 23, 2022 at 06:11:26PM +0100, David Sterba wrote:
> On Mon, Feb 21, 2022 at 01:07:35AM +0800, Eryu Guan wrote:
> > On Wed, Feb 16, 2022 at 11:05:35AM +0100, Gabriel Niebler wrote:
> > > In its current implementation, the test btrfs/255 would hang forever
> > > on any kernel w/o patch "btrfs: fix deadlock between quota disable
> > > and qgroup rescan worker", rather than failing, as it should.
> > > Fix this by introducing generous timeouts.
> > >
> > > Signed-off-by: Gabriel Niebler <gniebler@suse.com>
> >
> > If deadlock was already triggered, I don't think killing the userspace
> > program with timeout will help, as the kernel already deadlocked, and
> > filesystem and/or device can't be used by next test either.
> >
> > I think we should just exclude the test when running tests on unpatched
> > kernel.
>
> I don't see a way how to detect it at runtime, or do you mean to use the
> expunge files?

Yes, use expunge file and run fstests with './check -E <path_to_expunge_file>'

Thanks,
Eryu
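A minimal sketch of that workflow, assuming it runs from the top of an fstests checkout: the expunge file lists one test per line in the `btrfs/255` form, and `-E` (as Eryu describes) tells `./check` to skip the listed tests. The file path and the `-g qgroup` group selection are illustrative choices, not taken from the thread; btrfs/255 is in the `qgroup` group according to its `_begin_fstest auto qgroup balance` line.

```shell
#!/bin/bash
# Hypothetical location for the expunge file; any readable path works.
expunge_file=/tmp/btrfs-255.expunge

# One test name per line, in the same "<fs>/<number>" form ./check uses.
printf '%s\n' 'btrfs/255' > "$expunge_file"

# Run the qgroup tests minus btrfs/255, but only from an fstests checkout.
if [ -x ./check ]; then
	./check -E "$expunge_file" -g qgroup
else
	echo "./check not found; from an fstests checkout run: ./check -E $expunge_file -g qgroup"
fi
```

Keeping the expunge file next to the kernel configuration under test means it can be dropped once that kernel carries the deadlock fix.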
diff --git a/tests/btrfs/255 b/tests/btrfs/255
index 7e70944a..4c779458 100755
--- a/tests/btrfs/255
+++ b/tests/btrfs/255
@@ -14,6 +14,7 @@ _begin_fstest auto qgroup balance
 
 # real QA test starts here
 _supported_fs btrfs
+_require_command "$TIMEOUT_PROG" timeout
 _require_scratch
 
 _scratch_mkfs >> $seqres.full 2>&1
@@ -37,15 +38,23 @@ done
 _btrfs_stress_balance $SCRATCH_MNT >> $seqres.full &
 balance_pid=$!
 echo $balance_pid >> $seqres.full
+timeout=$((30 * 60))
 for ((i = 0; i < 20; i++)); do
-	$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
-	$BTRFS_UTIL_PROG quota disable $SCRATCH_MNT
+	$TIMEOUT_PROG -s KILL ${timeout}s $BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
+	[ $? -eq 0 ] || _fail "quota enable timed out"
+	$TIMEOUT_PROG -s KILL ${timeout}s $BTRFS_UTIL_PROG quota disable $SCRATCH_MNT
+	[ $? -eq 0 ] || _fail "quota disable timed out"
 done
 kill $balance_pid &> /dev/null
-wait
+
 # wait for the balance operation to finish
+elapsed=0
 while ps aux | grep "balance start" | grep -qv grep; do
+	if [ $elapsed -gt $timeout ]; then
+		_fail "balance not finished after $timeout seconds"
+	fi
 	sleep 1
+	elapsed=$(( ++elapsed ))
 done
 
 echo "Silence is golden"
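The patch bounds the final wait by counting `sleep 1` iterations against `$timeout`. The sketch below expresses the same bounded-poll idea as a standalone function, using `pgrep -f` in place of the `ps aux | grep | grep -v grep` pipeline. The helper name `wait_for_pattern_gone` is hypothetical; it returns a status instead of calling fstests' `_fail` so it can run outside the harness, and it assumes procps `pgrep` is installed.

```shell
#!/bin/bash
# Hypothetical helper: poll once per second until no process command
# line matches PATTERN. Returns 0 once they are gone, or 1 after LIMIT
# seconds, leaving the failure policy (e.g. _fail) to the caller.
wait_for_pattern_gone() {
	local pattern=$1 limit=$2 elapsed=0

	# pgrep -f matches the full command line and never reports itself,
	# so no extra "grep -v grep" stage is needed.
	while pgrep -f "$pattern" > /dev/null; do
		if [ "$elapsed" -ge "$limit" ]; then
			return 1
		fi
		sleep 1
		elapsed=$((elapsed + 1))
	done
	return 0
}
```

Inside the test this could read `wait_for_pattern_gone "balance start" "$timeout" || _fail "balance not finished after $timeout seconds"`, which keeps the patch's behavior while isolating the polling logic.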
In its current implementation, the test btrfs/255 would hang forever
on any kernel w/o patch "btrfs: fix deadlock between quota disable
and qgroup rescan worker", rather than failing, as it should.
Fix this by introducing generous timeouts.

Signed-off-by: Gabriel Niebler <gniebler@suse.com>
---
 tests/btrfs/255 | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
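One detail of the proposed change is that `[ $? -eq 0 ] || _fail "quota enable timed out"` reports any non-zero exit as a timeout, including an ordinary failure of `btrfs quota`. The sketch below is one possible way to tell the two apart, assuming GNU coreutils `timeout`: it exits with 124 when the limit expires, and with `-s KILL` the caller usually sees 137 (128 + 9) instead, because timeout signals its own process group. The helper `run_limited` and its return codes are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run a command under timeout(1) and report whether
# it failed because the limit expired or for another reason.
# Returns 0 on success, 2 on timeout, 1 on any other failure.
run_limited() {
	local limit=$1
	shift
	timeout -s KILL "$limit" "$@"
	local status=$?
	case $status in
	0)
		return 0 ;;
	124|137)
		# 124: limit expired; 137: limit expired with -s KILL
		# (137 can also mean the command itself got SIGKILL)
		echo "timed out after $limit: $*" >&2
		return 2 ;;
	*)
		echo "failed with status $status: $*" >&2
		return 1 ;;
	esac
}
```

In the test, a call such as `run_limited ${timeout}s $BTRFS_UTIL_PROG quota enable $SCRATCH_MNT || _fail "quota enable failed"` would fail in both cases while the message says which one occurred.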