diff mbox series

btrfs: add test for enable/disable quota and create/destroy qgroup repeatedly

Message ID 20220301151930.1315-1-realwakka@gmail.com (mailing list archive)
State New, archived
Headers show
Series btrfs: add test for enable/disable quota and create/destroy qgroup repeatedly | expand

Commit Message

Sidong Yang March 1, 2022, 3:19 p.m. UTC
Test enabling/disable quota and creating/destroying qgroup repeatedly
in asynchronous and confirm it does not cause kernel hang. This is a
regression test for the problem reported to linux-btrfs list [1].

The hang was recreated using the test case and fixed by kernel patch
titled

  btrfs: qgroup: fix deadlock between rescan worker and remove qgroup

[1] https://lore.kernel.org/linux-btrfs/20220228014340.21309-1-realwakka@gmail.com/

Signed-off-by: Sidong Yang <realwakka@gmail.com>
---
 tests/btrfs/262     | 54 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/262.out |  2 ++
 2 files changed, 56 insertions(+)
 create mode 100755 tests/btrfs/262
 create mode 100644 tests/btrfs/262.out

Comments

Shinichiro Kawasaki March 2, 2022, 4:43 a.m. UTC | #1
Hi Sidong,

I tried this patch and observed that it recreates the hang and confirms the fix.
Thanks. Here's my comments for improvements.

On Mar 01, 2022 / 15:19, Sidong Yang wrote:
> Test enabling/disable quota and creating/destroying qgroup repeatedly

nit: gerund (...ing) and base form are mixed. Base form would be the better to
be same as the code comment.

> in asynchronous and confirm it does not cause kernel hang. This is a
> regression test for the problem reported to linux-btrfs list [1].
> 
> The hang was recreated using the test case and fixed by kernel patch
> titled
> 
>   btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
> 
> [1] https://lore.kernel.org/linux-btrfs/20220228014340.21309-1-realwakka@gmail.com/
> 
> Signed-off-by: Sidong Yang <realwakka@gmail.com>
> ---
>  tests/btrfs/262     | 54 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/262.out |  2 ++
>  2 files changed, 56 insertions(+)
>  create mode 100755 tests/btrfs/262
>  create mode 100644 tests/btrfs/262.out
> 
> diff --git a/tests/btrfs/262 b/tests/btrfs/262
> new file mode 100755
> index 00000000..9be380f9
> --- /dev/null
> +++ b/tests/btrfs/262
> @@ -0,0 +1,54 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 YOUR NAME HERE.  All Rights Reserved.

It's the better to put your name there :)

> +#
> +# FS QA Test 262
> +#
> +# Test the deadlock between qgroup and quota commands
> +#
> +. ./common/preamble
> +_begin_fstest auto qgroup
> +
> +# Import common functions.
> +. ./common/filter
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.

Can we remove this comment?

> +_supported_fs btrfs
> +
> +_require_scratch
> +
> +# Run command that enable/disable quota and create/destroy qgroup asynchronously
> +qgroup_deadlock_test()
> +{
> +	_scratch_mkfs > /dev/null 2>&1
> +	_scratch_mount
> +	echo "=== qgroup deadlock test ===" >> $seqres.full
> +
> +	pids=()
> +	for ((i = 0; i < 200; i++)); do
> +		$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT 2>> $seqres.full &
> +		pids+=($!)
> +		$BTRFS_UTIL_PROG qgroup create 1/0 $SCRATCH_MNT 2>> $seqres.full &
> +		pids+=($!)
> +		$BTRFS_UTIL_PROG qgroup destroy 1/0 $SCRATCH_MNT 2>> $seqres.full &
> +		pids+=($!)
> +		$BTRFS_UTIL_PROG quota disable $SCRATCH_MNT 2>> $seqres.full &
> +		pids+=($!)		

Trailing tabs in the line above.

> +	done
> +
> +	for pid in "${pids[@]}"; do
> +		wait $pid
> +	done

I think simple "wait" command does what the for loop does.

> +
> +	_scratch_unmount
> +	_check_scratch_fs

I think _scratch_unmount and _check_scratch_fs are required only when
the test case repeats scratch mount and unmount. This test case looks
simple for me and the qgroup_deadlock_test function may not be necessary.

> +}
> +
> +qgroup_deadlock_test
> +
> +# success, all done
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/btrfs/262.out b/tests/btrfs/262.out
> new file mode 100644
> index 00000000..404badc3
> --- /dev/null
> +++ b/tests/btrfs/262.out
> @@ -0,0 +1,2 @@
> +QA output created by 262
> +Silence is golden
> -- 
> 2.25.1
>
Sidong Yang March 2, 2022, 6:26 a.m. UTC | #2
On Wed, Mar 02, 2022 at 04:43:35AM +0000, Shinichiro Kawasaki wrote:

Hi, Shinichiro.

Thanks for reply!

> Hi Sidong,
> 
> I tried this patch and observed that it recreates the hang and confirms the fix.
> Thanks. Here's my comments for improvements.
> 
> On Mar 01, 2022 / 15:19, Sidong Yang wrote:
> > Test enabling/disable quota and creating/destroying qgroup repeatedly
> 
> nit: gerund (...ing) and base form are mixed. Base form would be the better to
> be same as the code comment.

Yeah, 'disable' should be disabling.
> 
> > in asynchronous and confirm it does not cause kernel hang. This is a
> > regression test for the problem reported to linux-btrfs list [1].
> > 
> > The hang was recreated using the test case and fixed by kernel patch
> > titled
> > 
> >   btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
> > 
> > [1] https://lore.kernel.org/linux-btrfs/20220228014340.21309-1-realwakka@gmail.com/
> > 
> > Signed-off-by: Sidong Yang <realwakka@gmail.com>
> > ---
> >  tests/btrfs/262     | 54 +++++++++++++++++++++++++++++++++++++++++++++
> >  tests/btrfs/262.out |  2 ++
> >  2 files changed, 56 insertions(+)
> >  create mode 100755 tests/btrfs/262
> >  create mode 100644 tests/btrfs/262.out
> > 
> > diff --git a/tests/btrfs/262 b/tests/btrfs/262
> > new file mode 100755
> > index 00000000..9be380f9
> > --- /dev/null
> > +++ b/tests/btrfs/262
> > @@ -0,0 +1,54 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2022 YOUR NAME HERE.  All Rights Reserved.
> 
> It's the better to put your name there :)

Oops, 
> 
> > +#
> > +# FS QA Test 262
> > +#
> > +# Test the deadlock between qgroup and quota commands
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto qgroup
> > +
> > +# Import common functions.
> > +. ./common/filter
> > +
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> 
> Can we remove this comment?
Sure!
> 
> > +_supported_fs btrfs
> > +
> > +_require_scratch
> > +
> > +# Run command that enable/disable quota and create/destroy qgroup asynchronously
> > +qgroup_deadlock_test()
> > +{
> > +	_scratch_mkfs > /dev/null 2>&1
> > +	_scratch_mount
> > +	echo "=== qgroup deadlock test ===" >> $seqres.full
> > +
> > +	pids=()
> > +	for ((i = 0; i < 200; i++)); do
> > +		$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT 2>> $seqres.full &
> > +		pids+=($!)
> > +		$BTRFS_UTIL_PROG qgroup create 1/0 $SCRATCH_MNT 2>> $seqres.full &
> > +		pids+=($!)
> > +		$BTRFS_UTIL_PROG qgroup destroy 1/0 $SCRATCH_MNT 2>> $seqres.full &
> > +		pids+=($!)
> > +		$BTRFS_UTIL_PROG quota disable $SCRATCH_MNT 2>> $seqres.full &
> > +		pids+=($!)		
> 
> Trailing tabs in the line above.
Oops..
> 
> > +	done
> > +
> > +	for pid in "${pids[@]}"; do
> > +		wait $pid
> > +	done
> 
> I think simple "wait" command does what the for loop does.

I didn't know that "wait" command with no parameter waits all background
processes to finish. So it seems that we don't need pids it can be
deleted. Thanks.

Actually I've been agony about this. Does it needs timeout? When I tried
to command like this "timeout 10s wait", This command couldn't be
executed becase "wait" command is not binary. How can I insert timeout?

> 
> > +
> > +	_scratch_unmount
> > +	_check_scratch_fs
> 
> I think _scratch_unmount and _check_scratch_fs are required only when
> the test case repeats scratch mount and unmount. This test case looks
> simple for me and the qgroup_deadlock_test function may not be necessary.

I agree. This case is very simple. We don't need to use function. It
would be better that it's written in simple.

> 
> > +}
> > +
> > +qgroup_deadlock_test
> > +
> > +# success, all done
> > +echo "Silence is golden"
> > +status=0
> > +exit
> > diff --git a/tests/btrfs/262.out b/tests/btrfs/262.out
> > new file mode 100644
> > index 00000000..404badc3
> > --- /dev/null
> > +++ b/tests/btrfs/262.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 262
> > +Silence is golden
> > -- 
> > 2.25.1
> > 
> 
> -- 
> Best Regards,
> Shin'ichiro Kawasaki
Shinichiro Kawasaki March 2, 2022, 8:24 a.m. UTC | #3
On Mar 02, 2022 / 06:26, Sidong Yang wrote:
> On Wed, Mar 02, 2022 at 04:43:35AM +0000, Shinichiro Kawasaki wrote:
> 
> Hi, Shinichiro.
> 
> Thanks for reply!
> 
> > Hi Sidong,
> > 
> > I tried this patch and observed that it recreates the hang and confirms the fix.
> > Thanks. Here's my comments for improvements.
> > 
> > On Mar 01, 2022 / 15:19, Sidong Yang wrote:
> > > Test enabling/disable quota and creating/destroying qgroup repeatedly
> > 
> > nit: gerund (...ing) and base form are mixed. Base form would be the better to
> > be same as the code comment.
> 
> Yeah, 'disable' should be disabling.
> > 
> > > in asynchronous and confirm it does not cause kernel hang. This is a
> > > regression test for the problem reported to linux-btrfs list [1].
> > > 
> > > The hang was recreated using the test case and fixed by kernel patch
> > > titled
> > > 
> > >   btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
> > > 
> > > [1] https://lore.kernel.org/linux-btrfs/20220228014340.21309-1-realwakka@gmail.com/
> > > 
> > > Signed-off-by: Sidong Yang <realwakka@gmail.com>
> > > ---

(snip)

> > > +	done
> > > +
> > > +	for pid in "${pids[@]}"; do
> > > +		wait $pid
> > > +	done
> > 
> > I think simple "wait" command does what the for loop does.
> 
> I didn't know that "wait" command with no parameter waits all background
> processes to finish. So it seems that we don't need pids it can be
> deleted. Thanks.
> 
> Actually I've been agony about this. Does it needs timeout? When I tried
> to command like this "timeout 10s wait", This command couldn't be
> executed becase "wait" command is not binary. How can I insert timeout?

I think recent discussion on the list is a good reference [1]. A patch was
posted to add timeout to btrfs/255.

More importantly, it was discussed that such timeout of user space program will
not help. Eryu pointed out that once "the kernel already deadlocked, and
filesystem and/or device can't be used by next test either". IMHO, your new case
will not require timeout either with same reasoning.

[1] https://lore.kernel.org/fstests/20220223171126.GQ12643@twin.jikos.cz/T/#me349d62ff367a0a6a28076bdd5b89263fc8109c0
Filipe Manana March 2, 2022, 12:10 p.m. UTC | #4
On Tue, Mar 01, 2022 at 03:19:30PM +0000, Sidong Yang wrote:
> Test enabling/disable quota and creating/destroying qgroup repeatedly
> in asynchronous and confirm it does not cause kernel hang. This is a

in asynchronous -> in parallel

> regression test for the problem reported to linux-btrfs list [1].

It's worth mentioning the deadlock only happens starting with kernel 5.17-rc3.

> 
> The hang was recreated using the test case and fixed by kernel patch
> titled
> 
>   btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
> 
> [1] https://lore.kernel.org/linux-btrfs/20220228014340.21309-1-realwakka@gmail.com/
> 
> Signed-off-by: Sidong Yang <realwakka@gmail.com>

In addition to Shinichiro's comments...

> ---
>  tests/btrfs/262     | 54 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/262.out |  2 ++
>  2 files changed, 56 insertions(+)
>  create mode 100755 tests/btrfs/262
>  create mode 100644 tests/btrfs/262.out
> 
> diff --git a/tests/btrfs/262 b/tests/btrfs/262
> new file mode 100755
> index 00000000..9be380f9
> --- /dev/null
> +++ b/tests/btrfs/262
> @@ -0,0 +1,54 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 YOUR NAME HERE.  All Rights Reserved.
> +#
> +# FS QA Test 262
> +#
> +# Test the deadlock between qgroup and quota commands

The test description should be a lot more clear.

"the deadlock" is vague a gives the wrong idea we only ever had a single
deadlock related to qgroups. "qgroup and quota commands" is confusing,
and "qgroup" and "quota" are pretty much synonyms, and it should mention
which commands.

Plus what we want to test is that we can run some qgroup operations in
parallel without triggering a deadlock, crash, etc.

Perhaps something like:

"""
Test that running qgroup enable, create, destroy and disable commands in
parallel does not result in a deadlock, a crash or any filesystem
inconsistency.
"""


> +#
> +. ./common/preamble
> +_begin_fstest auto qgroup

Can also be added to the "quick" group. It takes 1 second in my slowest vm.

> +
> +# Import common functions.
> +. ./common/filter
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +
> +_require_scratch
> +
> +# Run command that enable/disable quota and create/destroy qgroup asynchronously

With the more clear test description above, this can go away.

> +qgroup_deadlock_test()
> +{
> +	_scratch_mkfs > /dev/null 2>&1
> +	_scratch_mount
> +	echo "=== qgroup deadlock test ===" >> $seqres.full

There's no point in echoing this message to the .full file, it provides no
value at all, as testing that is all that this testcase does.

> +
> +	pids=()
> +	for ((i = 0; i < 200; i++)); do
> +		$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT 2>> $seqres.full &
> +		pids+=($!)
> +		$BTRFS_UTIL_PROG qgroup create 1/0 $SCRATCH_MNT 2>> $seqres.full &
> +		pids+=($!)
> +		$BTRFS_UTIL_PROG qgroup destroy 1/0 $SCRATCH_MNT 2>> $seqres.full &
> +		pids+=($!)
> +		$BTRFS_UTIL_PROG quota disable $SCRATCH_MNT 2>> $seqres.full &
> +		pids+=($!)		
> +	done
> +
> +	for pid in "${pids[@]}"; do
> +		wait $pid
> +	done

As pointed before by Shinichiro, a simple 'wait' here is enough, no need to
keep track of the PIDs.

> +
> +	_scratch_unmount
> +	_check_scratch_fs

Not needed, the fstests framework automatically runs 'btrfs check' when a test
finishes. Doing this explicitly is only necessary when we need to do several
mount/unmount operations and want to check the fs is fine after each unmount
and before the next mount.

> +}
> +
> +qgroup_deadlock_test

There's no point in putting all the test code in a function, as the function
is only called once.

Otherwise it looks good, and the test works as advertised, it triggers a
deadlock on 5.17-rc3+ kernel and passes on a patched kernel.

Thanks for converting the reproducer into a test case.

> +
> +# success, all done
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/btrfs/262.out b/tests/btrfs/262.out
> new file mode 100644
> index 00000000..404badc3
> --- /dev/null
> +++ b/tests/btrfs/262.out
> @@ -0,0 +1,2 @@
> +QA output created by 262
> +Silence is golden
> -- 
> 2.25.1
>
Sidong Yang March 2, 2022, 1:43 p.m. UTC | #5
On Wed, Mar 02, 2022 at 12:10:08PM +0000, Filipe Manana wrote:

Hi, Filipe!
Thanks for review.
> On Tue, Mar 01, 2022 at 03:19:30PM +0000, Sidong Yang wrote:
> > Test enabling/disable quota and creating/destroying qgroup repeatedly
> > in asynchronous and confirm it does not cause kernel hang. This is a
> 
> in asynchronous -> in parallel

Sure, Thanks!
> 
> > regression test for the problem reported to linux-btrfs list [1].
> 
> It's worth mentioning the deadlock only happens starting with kernel 5.17-rc3.

It only happens in 5.17-rc3 version? I didn't know about it. I'll add
mention about this.
> 
> > 
> > The hang was recreated using the test case and fixed by kernel patch
> > titled
> > 
> >   btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
> > 
> > [1] https://lore.kernel.org/linux-btrfs/20220228014340.21309-1-realwakka@gmail.com/
> > 
> > Signed-off-by: Sidong Yang <realwakka@gmail.com>
> 
> In addition to Shinichiro's comments...
> 
> > ---
> >  tests/btrfs/262     | 54 +++++++++++++++++++++++++++++++++++++++++++++
> >  tests/btrfs/262.out |  2 ++
> >  2 files changed, 56 insertions(+)
> >  create mode 100755 tests/btrfs/262
> >  create mode 100644 tests/btrfs/262.out
> > 
> > diff --git a/tests/btrfs/262 b/tests/btrfs/262
> > new file mode 100755
> > index 00000000..9be380f9
> > --- /dev/null
> > +++ b/tests/btrfs/262
> > @@ -0,0 +1,54 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2022 YOUR NAME HERE.  All Rights Reserved.
> > +#
> > +# FS QA Test 262
> > +#
> > +# Test the deadlock between qgroup and quota commands
> 
> The test description should be a lot more clear.
> 
> "the deadlock" is vague a gives the wrong idea we only ever had a single
> deadlock related to qgroups. "qgroup and quota commands" is confusing,
> and "qgroup" and "quota" are pretty much synonyms, and it should mention
> which commands.
> 
> Plus what we want to test is that we can run some qgroup operations in
> parallel without triggering a deadlock, crash, etc.
> 
> Perhaps something like:
> 
> """
> Test that running qgroup enable, create, destroy and disable commands in
> parallel does not result in a deadlock, a crash or any filesystem
> inconsistency.
> """
> 
Yeah, It was not clear. I found that this test is not only for checking
deadlock. But it checks that test runs without any problem.

> 
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto qgroup
> 
> Can also be added to the "quick" group. It takes 1 second in my slowest vm.

Okay, Thanks!
> 
> > +
> > +# Import common functions.
> > +. ./common/filter
> > +
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> > +_supported_fs btrfs
> > +
> > +_require_scratch
> > +
> > +# Run command that enable/disable quota and create/destroy qgroup asynchronously
> 
> With the more clear test description above, this can go away.

Sure!
> 
> > +qgroup_deadlock_test()
> > +{
> > +	_scratch_mkfs > /dev/null 2>&1
> > +	_scratch_mount
> > +	echo "=== qgroup deadlock test ===" >> $seqres.full
> 
> There's no point in echoing this message to the .full file, it provides no
> value at all, as testing that is all that this testcase does.

I agree. This is pointless because it's simple test.
> 
> > +
> > +	pids=()
> > +	for ((i = 0; i < 200; i++)); do
> > +		$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT 2>> $seqres.full &
> > +		pids+=($!)
> > +		$BTRFS_UTIL_PROG qgroup create 1/0 $SCRATCH_MNT 2>> $seqres.full &
> > +		pids+=($!)
> > +		$BTRFS_UTIL_PROG qgroup destroy 1/0 $SCRATCH_MNT 2>> $seqres.full &
> > +		pids+=($!)
> > +		$BTRFS_UTIL_PROG quota disable $SCRATCH_MNT 2>> $seqres.full &
> > +		pids+=($!)		
> > +	done
> > +
> > +	for pid in "${pids[@]}"; do
> > +		wait $pid
> > +	done
> 
> As pointed before by Shinichiro, a simple 'wait' here is enough, no need to
> keep track of the PIDs.

Yeah, I don't have to go hard way.
> 
> > +
> > +	_scratch_unmount
> > +	_check_scratch_fs
> 
> Not needed, the fstests framework automatically runs 'btrfs check' when a test
> finishes. Doing this explicitly is only necessary when we need to do several
> mount/unmount operations and want to check the fs is fine after each unmount
> and before the next mount.
> 

I didn't know about that. I don't need to check manually.
> > +}
> > +
> > +qgroup_deadlock_test
> 
> There's no point in putting all the test code in a function, as the function
> is only called once.

Of course!
> 
> Otherwise it looks good, and the test works as advertised, it triggers a
> deadlock on 5.17-rc3+ kernel and passes on a patched kernel.
> 
> Thanks for converting the reproducer into a test case.
> 

Thanks for detailed review. I'll back soon with v2.

Thanks,
Sidong
> > +
> > +# success, all done
> > +echo "Silence is golden"
> > +status=0
> > +exit
> > diff --git a/tests/btrfs/262.out b/tests/btrfs/262.out
> > new file mode 100644
> > index 00000000..404badc3
> > --- /dev/null
> > +++ b/tests/btrfs/262.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 262
> > +Silence is golden
> > -- 
> > 2.25.1
> >
Sidong Yang March 2, 2022, 1:47 p.m. UTC | #6
On Wed, Mar 02, 2022 at 08:24:48AM +0000, Shinichiro Kawasaki wrote:
> On Mar 02, 2022 / 06:26, Sidong Yang wrote:
> > On Wed, Mar 02, 2022 at 04:43:35AM +0000, Shinichiro Kawasaki wrote:
> > 
> > Hi, Shinichiro.
> > 
> > Thanks for reply!
> > 
> > > Hi Sidong,
> > > 
> > > I tried this patch and observed that it recreates the hang and confirms the fix.
> > > Thanks. Here's my comments for improvements.
> > > 
> > > On Mar 01, 2022 / 15:19, Sidong Yang wrote:
> > > > Test enabling/disable quota and creating/destroying qgroup repeatedly
> > > 
> > > nit: gerund (...ing) and base form are mixed. Base form would be the better to
> > > be same as the code comment.
> > 
> > Yeah, 'disable' should be disabling.
> > > 
> > > > in asynchronous and confirm it does not cause kernel hang. This is a
> > > > regression test for the problem reported to linux-btrfs list [1].
> > > > 
> > > > The hang was recreated using the test case and fixed by kernel patch
> > > > titled
> > > > 
> > > >   btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
> > > > 
> > > > [1] https://lore.kernel.org/linux-btrfs/20220228014340.21309-1-realwakka@gmail.com/
> > > > 
> > > > Signed-off-by: Sidong Yang <realwakka@gmail.com>
> > > > ---
> 
> (snip)
> 
> > > > +	done
> > > > +
> > > > +	for pid in "${pids[@]}"; do
> > > > +		wait $pid
> > > > +	done
> > > 
> > > I think simple "wait" command does what the for loop does.
> > 
> > I didn't know that "wait" command with no parameter waits all background
> > processes to finish. So it seems that we don't need pids it can be
> > deleted. Thanks.
> > 
> > Actually I've been agony about this. Does it needs timeout? When I tried
> > to command like this "timeout 10s wait", This command couldn't be
> > executed becase "wait" command is not binary. How can I insert timeout?
> 
> I think recent discussion on the list is a good reference [1]. A patch was
> posted to add timeout to btrfs/255.
> 
> More importantly, it was discussed that such timeout of user space program will
> not help. Eryu pointed out that once "the kernel already deadlocked, and
> filesystem and/or device can't be used by next test either". IMHO, your new case
> will not require timeout either with same reasoning.
> 
> [1] https://lore.kernel.org/fstests/20220223171126.GQ12643@twin.jikos.cz/T/#me349d62ff367a0a6a28076bdd5b89263fc8109c0
> 
Thanks for a good example. I understand that there is no help in
userspace for the kernel deadlocked. In this case, Just "wait" is
enough.

Thanks,
Sidong
> -- 
> Best Regards,
> Shin'ichiro Kawasaki
Filipe Manana March 2, 2022, 3:12 p.m. UTC | #7
On Wed, Mar 02, 2022 at 01:43:16PM +0000, Sidong Yang wrote:
> On Wed, Mar 02, 2022 at 12:10:08PM +0000, Filipe Manana wrote:
> 
> Hi, Filipe!
> Thanks for review.
> > On Tue, Mar 01, 2022 at 03:19:30PM +0000, Sidong Yang wrote:
> > > Test enabling/disable quota and creating/destroying qgroup repeatedly
> > > in asynchronous and confirm it does not cause kernel hang. This is a
> > 
> > in asynchronous -> in parallel
> 
> Sure, Thanks!
> > 
> > > regression test for the problem reported to linux-btrfs list [1].
> > 
> > It's worth mentioning the deadlock only happens starting with kernel 5.17-rc3.
> 
> It only happens in 5.17-rc3 version? I didn't know about it. I'll add
> mention about this.

Well, in the kernel patch we have:

  Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")

And that commit was introduced in 5.17-rc3. Maybe it deadlocked in a different
way before that commit, perhaps in the way that e804861bd4e6 describes. However
I haven't checked how it behaves on a kernel without that commit. But at least we
know that currently it deadlocks at 5.17-rc3+.

> > 
> > > 
> > > The hang was recreated using the test case and fixed by kernel patch
> > > titled
> > > 
> > >   btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
> > > 
> > > [1] https://lore.kernel.org/linux-btrfs/20220228014340.21309-1-realwakka@gmail.com/
> > > 
> > > Signed-off-by: Sidong Yang <realwakka@gmail.com>
> > 
> > In addition to Shinichiro's comments...
> > 
> > > ---
> > >  tests/btrfs/262     | 54 +++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/btrfs/262.out |  2 ++
> > >  2 files changed, 56 insertions(+)
> > >  create mode 100755 tests/btrfs/262
> > >  create mode 100644 tests/btrfs/262.out
> > > 
> > > diff --git a/tests/btrfs/262 b/tests/btrfs/262
> > > new file mode 100755
> > > index 00000000..9be380f9
> > > --- /dev/null
> > > +++ b/tests/btrfs/262
> > > @@ -0,0 +1,54 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2022 YOUR NAME HERE.  All Rights Reserved.
> > > +#
> > > +# FS QA Test 262
> > > +#
> > > +# Test the deadlock between qgroup and quota commands
> > 
> > The test description should be a lot more clear.
> > 
> > "the deadlock" is vague a gives the wrong idea we only ever had a single
> > deadlock related to qgroups. "qgroup and quota commands" is confusing,
> > and "qgroup" and "quota" are pretty much synonyms, and it should mention
> > which commands.
> > 
> > Plus what we want to test is that we can run some qgroup operations in
> > parallel without triggering a deadlock, crash, etc.
> > 
> > Perhaps something like:
> > 
> > """
> > Test that running qgroup enable, create, destroy and disable commands in
> > parallel does not result in a deadlock, a crash or any filesystem
> > inconsistency.
> > """
> > 
> Yeah, It was not clear. I found that this test is not only for checking
> deadlock. But it checks that test runs without any problem.
> 
> > 
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest auto qgroup
> > 
> > Can also be added to the "quick" group. It takes 1 second in my slowest vm.
> 
> Okay, Thanks!
> > 
> > > +
> > > +# Import common functions.
> > > +. ./common/filter
> > > +
> > > +# real QA test starts here
> > > +
> > > +# Modify as appropriate.
> > > +_supported_fs btrfs
> > > +
> > > +_require_scratch
> > > +
> > > +# Run command that enable/disable quota and create/destroy qgroup asynchronously
> > 
> > With the more clear test description above, this can go away.
> 
> Sure!
> > 
> > > +qgroup_deadlock_test()
> > > +{
> > > +	_scratch_mkfs > /dev/null 2>&1
> > > +	_scratch_mount
> > > +	echo "=== qgroup deadlock test ===" >> $seqres.full
> > 
> > There's no point in echoing this message to the .full file, it provides no
> > value at all, as testing that is all that this testcase does.
> 
> I agree. This is pointless because it's simple test.
> > 
> > > +
> > > +	pids=()
> > > +	for ((i = 0; i < 200; i++)); do
> > > +		$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT 2>> $seqres.full &
> > > +		pids+=($!)
> > > +		$BTRFS_UTIL_PROG qgroup create 1/0 $SCRATCH_MNT 2>> $seqres.full &
> > > +		pids+=($!)
> > > +		$BTRFS_UTIL_PROG qgroup destroy 1/0 $SCRATCH_MNT 2>> $seqres.full &
> > > +		pids+=($!)
> > > +		$BTRFS_UTIL_PROG quota disable $SCRATCH_MNT 2>> $seqres.full &
> > > +		pids+=($!)		
> > > +	done
> > > +
> > > +	for pid in "${pids[@]}"; do
> > > +		wait $pid
> > > +	done
> > 
> > As pointed before by Shinichiro, a simple 'wait' here is enough, no need to
> > keep track of the PIDs.
> 
> Yeah, I don't have to go hard way.
> > 
> > > +
> > > +	_scratch_unmount
> > > +	_check_scratch_fs
> > 
> > Not needed, the fstests framework automatically runs 'btrfs check' when a test
> > finishes. Doing this explicitly is only necessary when we need to do several
> > mount/unmount operations and want to check the fs is fine after each unmount
> > and before the next mount.
> > 
> 
> I didn't know about that. I don't need to check manually.
> > > +}
> > > +
> > > +qgroup_deadlock_test
> > 
> > There's no point in putting all the test code in a function, as the function
> > is only called once.
> 
> Of course!
> > 
> > Otherwise it looks good, and the test works as advertised, it triggers a
> > deadlock on 5.17-rc3+ kernel and passes on a patched kernel.
> > 
> > Thanks for converting the reproducer into a test case.
> > 
> 
> Thanks for detailed review. I'll back soon with v2.
> 
> Thanks,
> Sidong
> > > +
> > > +# success, all done
> > > +echo "Silence is golden"
> > > +status=0
> > > +exit
> > > diff --git a/tests/btrfs/262.out b/tests/btrfs/262.out
> > > new file mode 100644
> > > index 00000000..404badc3
> > > --- /dev/null
> > > +++ b/tests/btrfs/262.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 262
> > > +Silence is golden
> > > -- 
> > > 2.25.1
> > >
diff mbox series

Patch

diff --git a/tests/btrfs/262 b/tests/btrfs/262
new file mode 100755
index 00000000..9be380f9
--- /dev/null
+++ b/tests/btrfs/262
@@ -0,0 +1,54 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 YOUR NAME HERE.  All Rights Reserved.
+#
+# FS QA Test 262
+#
+# Test the deadlock between qgroup and quota commands
+#
+. ./common/preamble
+_begin_fstest auto qgroup
+
+# Import common functions.
+. ./common/filter
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+
+_require_scratch
+
+# Run command that enable/disable quota and create/destroy qgroup asynchronously
+qgroup_deadlock_test()
+{
+	_scratch_mkfs > /dev/null 2>&1
+	_scratch_mount
+	echo "=== qgroup deadlock test ===" >> $seqres.full
+
+	pids=()
+	for ((i = 0; i < 200; i++)); do
+		$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT 2>> $seqres.full &
+		pids+=($!)
+		$BTRFS_UTIL_PROG qgroup create 1/0 $SCRATCH_MNT 2>> $seqres.full &
+		pids+=($!)
+		$BTRFS_UTIL_PROG qgroup destroy 1/0 $SCRATCH_MNT 2>> $seqres.full &
+		pids+=($!)
+		$BTRFS_UTIL_PROG quota disable $SCRATCH_MNT 2>> $seqres.full &
+		pids+=($!)		
+	done
+
+	for pid in "${pids[@]}"; do
+		wait $pid
+	done
+
+	_scratch_unmount
+	_check_scratch_fs
+}
+
+qgroup_deadlock_test
+
+# success, all done
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/btrfs/262.out b/tests/btrfs/262.out
new file mode 100644
index 00000000..404badc3
--- /dev/null
+++ b/tests/btrfs/262.out
@@ -0,0 +1,2 @@ 
+QA output created by 262
+Silence is golden