diff mbox series

[v3,2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT

Message ID db6c5d87-5a47-75bd-4d24-a135e6bcd783@gmail.com (mailing list archive)
State Superseded
Headers show
Series xfstests: add deadlock between the AGI and AGF with RENAME_WHITEOUT test | expand

Commit Message

Kaixu Xia Sept. 18, 2019, 11:49 a.m. UTC
There is ABBA deadlock bug between the AGI and AGF when performing
rename() with RENAME_WHITEOUT flag, and add this testcase to make
sure the rename() call works well.

Signed-off-by: kaixuxia <kaixuxia@tencent.com>
---
 tests/xfs/512     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/512.out |  2 ++
 tests/xfs/group   |  1 +
 3 files changed, 99 insertions(+)
 create mode 100755 tests/xfs/512
 create mode 100644 tests/xfs/512.out

Comments

Brian Foster Sept. 18, 2019, 1:59 p.m. UTC | #1
On Wed, Sep 18, 2019 at 07:49:22PM +0800, kaixuxia wrote:
> There is ABBA deadlock bug between the AGI and AGF when performing
> rename() with RENAME_WHITEOUT flag, and add this testcase to make
> sure the rename() call works well.
> 
> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
> ---

FYI, for some reason your patch series isn't threaded on the mailing
list. I thought git send-email did this by default. Assuming you're not
explicitly using --no-thread, you might have to use the --thread option
so this gets posted as a proper series.

>  tests/xfs/512     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/512.out |  2 ++
>  tests/xfs/group   |  1 +
>  3 files changed, 99 insertions(+)
>  create mode 100755 tests/xfs/512
>  create mode 100644 tests/xfs/512.out
> 
> diff --git a/tests/xfs/512 b/tests/xfs/512
> new file mode 100755
> index 0000000..a2089f0
> --- /dev/null
> +++ b/tests/xfs/512
> @@ -0,0 +1,96 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2019 Tencent.  All Rights Reserved.
> +#
> +# FS QA Test 512
> +#
> +# Test the ABBA deadlock case between the AGI and AGF When performing
> +# rename operation with RENAME_WHITEOUT flag.
> +#
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/renameat2
> +
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +_supported_fs xfs
> +_supported_os Linux
> +# single AG will cause default xfs_repair to fail. This test need a
> +# single AG fs, so ignore the check.
> +_require_scratch_nocheck
> +_requires_renameat2 whiteout
> +
> +filter_enospc() {
> +	sed -e '/^.*No space left on device.*/d'
> +}
> +
> +create_file()
> +{
> +	local target_dir=$1
> +	local files_count=$2
> +
> +	for i in $(seq 1 $files_count); do
> +		echo > $target_dir/f$i >/dev/null 2>&1 | filter_enospc
> +	done
> +}
> +
> +rename_whiteout()
> +{
> +	local target_dir=$1
> +	local files_count=$2
> +
> +	# a long filename could increase the possibility that target_dp
> +	# allocate new blocks(acquire the AGF lock) to store the filename
> +	longnamepre=`$PERL_PROG -e 'print "a"x200;'`
> +
> +	# now try to do rename with RENAME_WHITEOUT flag
> +	for i in $(seq 1 $files_count); do
> +		src/renameat2 -w $SCRATCH_MNT/f$i $target_dir/$longnamepre$i >/dev/null 2>&1
> +	done
> +}
> +
> +_scratch_mkfs_xfs -d agcount=1 >> $seqres.full 2>&1 ||
> +	_fail "mkfs failed"

This appears to be the only XFS specific bit. Could it be
conditionalized using FSTYP such that this test could go under
tests/generic?

> +_scratch_mount
> +
> +# set the rename and create file counts
> +file_count=50000
> +
> +# create the necessary directory for create and rename operations
> +createdir=$SCRATCH_MNT/createdir
> +mkdir $createdir
> +renamedir=$SCRATCH_MNT/renamedir
> +mkdir $renamedir
> +
> +# create many small files for the rename with RENAME_WHITEOUT
> +create_file $SCRATCH_MNT $file_count
> +
> +# try to create files at the same time to hit the deadlock
> +rename_whiteout $renamedir $file_count &
> +create_file $createdir $file_count &
> +

When I ran this test I noticed that the rename_whiteout task completed
renaming the 50k files before the create_file task created even 30k of
the 50k files. There's no risk of deadlock once one of these tasks
completes, right? If so, that seems like something that could be fixed
up.

Beyond that though, the test itself ran for almost 19 minutes on a vm
with the deadlock fix. That seems like overkill to me for a test that's
so narrowly focused on a particular bug that it's unlikely to fail in
the future. If we can't find a way to get this down to a reasonable time
while still reproducing the deadlock, I'm kind of wondering if there's a
better approach to get more rename coverage from existing tests. For
example, could we add this support to fsstress and see if any of the
existing stress tests might trigger the original problem? Even if we
needed to add a new rename/create focused fsstress test, that might at
least be general enough to provide broader coverage.

Alternatively, what if this test ran a create/rename workload (on a
smaller fileset) for a fixed time of a minute or two and then exited? I
think it would be a reasonable compromise if the test still reproduced
on some smaller frequency, it's just not clear to me how effective such
a test would be without actually trying it. Maybe Eryu has additional
thoughts..

Brian

> +wait
> +echo Silence is golden
> +
> +# Failure comes in the form of a deadlock.
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/xfs/512.out b/tests/xfs/512.out
> new file mode 100644
> index 0000000..0aabdef
> --- /dev/null
> +++ b/tests/xfs/512.out
> @@ -0,0 +1,2 @@
> +QA output created by 512
> +Silence is golden
> diff --git a/tests/xfs/group b/tests/xfs/group
> index a7ad300..ed250d6 100644
> --- a/tests/xfs/group
> +++ b/tests/xfs/group
> @@ -509,3 +509,4 @@
>  509 auto ioctl
>  510 auto ioctl quick
>  511 auto quick quota
> +512 auto rename
> -- 
> 1.8.3.1
> 
> -- 
> kaixuxia
Eryu Guan Sept. 19, 2019, 2:37 a.m. UTC | #2
On Wed, Sep 18, 2019 at 09:59:47AM -0400, Brian Foster wrote:
> On Wed, Sep 18, 2019 at 07:49:22PM +0800, kaixuxia wrote:
> > There is ABBA deadlock bug between the AGI and AGF when performing
> > rename() with RENAME_WHITEOUT flag, and add this testcase to make
> > sure the rename() call works well.
> > 
> > Signed-off-by: kaixuxia <kaixuxia@tencent.com>
> > ---
> 
> FYI, for some reason your patch series isn't threaded on the mailing
> list. I thought git send-email did this by default. Assuming you're not
> explicitly using --no-thread, you might have to use the --thread option
> so this gets posted as a proper series.
> 
> >  tests/xfs/512     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/512.out |  2 ++
> >  tests/xfs/group   |  1 +
> >  3 files changed, 99 insertions(+)
> >  create mode 100755 tests/xfs/512
> >  create mode 100644 tests/xfs/512.out
> > 
> > diff --git a/tests/xfs/512 b/tests/xfs/512
> > new file mode 100755
> > index 0000000..a2089f0
> > --- /dev/null
> > +++ b/tests/xfs/512
> > @@ -0,0 +1,96 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2019 Tencent.  All Rights Reserved.
> > +#
> > +# FS QA Test 512
> > +#
> > +# Test the ABBA deadlock case between the AGI and AGF When performing
> > +# rename operation with RENAME_WHITEOUT flag.
> > +#
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1	# failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	rm -f $tmp.*
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +. ./common/filter
> > +. ./common/renameat2
> > +
> > +rm -f $seqres.full
> > +
> > +# real QA test starts here
> > +_supported_fs xfs
> > +_supported_os Linux
> > +# single AG will cause default xfs_repair to fail. This test need a
> > +# single AG fs, so ignore the check.
> > +_require_scratch_nocheck
> > +_requires_renameat2 whiteout
> > +
> > +filter_enospc() {
> > +	sed -e '/^.*No space left on device.*/d'
> > +}
> > +
> > +create_file()
> > +{
> > +	local target_dir=$1
> > +	local files_count=$2
> > +
> > +	for i in $(seq 1 $files_count); do
> > +		echo > $target_dir/f$i >/dev/null 2>&1 | filter_enospc
> > +	done
> > +}
> > +
> > +rename_whiteout()
> > +{
> > +	local target_dir=$1
> > +	local files_count=$2
> > +
> > +	# a long filename could increase the possibility that target_dp
> > +	# allocate new blocks(acquire the AGF lock) to store the filename
> > +	longnamepre=`$PERL_PROG -e 'print "a"x200;'`
> > +
> > +	# now try to do rename with RENAME_WHITEOUT flag
> > +	for i in $(seq 1 $files_count); do
> > +		src/renameat2 -w $SCRATCH_MNT/f$i $target_dir/$longnamepre$i >/dev/null 2>&1
> > +	done
> > +}
> > +
> > +_scratch_mkfs_xfs -d agcount=1 >> $seqres.full 2>&1 ||
> > +	_fail "mkfs failed"
> 
> This appears to be the only XFS specific bit. Could it be
> conditionalized using FSTYP such that this test could go under
> tests/generic?

Looks like so. I'm wondering if the deadlock is only possible with
single ag xfs (case 1) or single ag is just more easier to hit the
deadlock (case 2).

If it's case 1, I think we could make the test generic by only adding
extra xfs-specific mkfs option when FSTYP is xfs, e.g.

mkfs_opts=""
# here goes comments explaining why xfs needs single ag
if [ "$FSTYP" == "xfs" ]; then
	mkfs_opts="-d agcount=1"
fi

If it's case 2, that depends on what's the posibility to reproduce
without single ag. I can afford 5%-10% or even lower reproduce
posibility (given that the deadlock is a corner case and isn't likely to
happen with normally formatted xfs) to drop "-d agcount=1". So the test
has nothing xfs-specific.

If the reproduce posibility is near 0, then we could go to case 1.

> 
> > +_scratch_mount
> > +
> > +# set the rename and create file counts
> > +file_count=50000
> > +
> > +# create the necessary directory for create and rename operations
> > +createdir=$SCRATCH_MNT/createdir
> > +mkdir $createdir
> > +renamedir=$SCRATCH_MNT/renamedir
> > +mkdir $renamedir
> > +
> > +# create many small files for the rename with RENAME_WHITEOUT
> > +create_file $SCRATCH_MNT $file_count
> > +
> > +# try to create files at the same time to hit the deadlock
> > +rename_whiteout $renamedir $file_count &
> > +create_file $createdir $file_count &
> > +
> 
> When I ran this test I noticed that the rename_whiteout task completed
> renaming the 50k files before the create_file task created even 30k of
> the 50k files. There's no risk of deadlock once one of these tasks
> completes, right? If so, that seems like something that could be fixed
> up.
> 
> Beyond that though, the test itself ran for almost 19 minutes on a vm
> with the deadlock fix. That seems like overkill to me for a test that's
> so narrowly focused on a particular bug that it's unlikely to fail in
> the future. If we can't find a way to get this down to a reasonable time
> while still reproducing the deadlock, I'm kind of wondering if there's a
> better approach to get more rename coverage from existing tests. For
> example, could we add this support to fsstress and see if any of the
> existing stress tests might trigger the original problem? Even if we
> needed to add a new rename/create focused fsstress test, that might at
> least be general enough to provide broader coverage.

yeah, adding renameat2 operations to fsstress looks like a good idea to
extend the test coverage. But that belongs to another patch.

> 
> Alternatively, what if this test ran a create/rename workload (on a
> smaller fileset) for a fixed time of a minute or two and then exited? I
> think it would be a reasonable compromise if the test still reproduced
> on some smaller frequency, it's just not clear to me how effective such
> a test would be without actually trying it. Maybe Eryu has additional
> thoughts..

I agreed, unless the frequency is zero :) As mentioned above, I can
afford smaller frequency if test runs faster. Also, we could scale the
test time based on TIME_FACTOR and add 'stress' group too. (You could
grep for TIME_FACTOR in tests/ dir for examples.)

Thanks,
Eryu

> 
> Brian
> 
> > +wait
> > +echo Silence is golden
> > +
> > +# Failure comes in the form of a deadlock.
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/xfs/512.out b/tests/xfs/512.out
> > new file mode 100644
> > index 0000000..0aabdef
> > --- /dev/null
> > +++ b/tests/xfs/512.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 512
> > +Silence is golden
> > diff --git a/tests/xfs/group b/tests/xfs/group
> > index a7ad300..ed250d6 100644
> > --- a/tests/xfs/group
> > +++ b/tests/xfs/group
> > @@ -509,3 +509,4 @@
> >  509 auto ioctl
> >  510 auto ioctl quick
> >  511 auto quick quota
> > +512 auto rename
> > -- 
> > 1.8.3.1
> > 
> > -- 
> > kaixuxia
Kaixu Xia Sept. 19, 2019, 9:08 a.m. UTC | #3
On 2019/9/18 21:59, Brian Foster wrote:
> On Wed, Sep 18, 2019 at 07:49:22PM +0800, kaixuxia wrote:
>> There is ABBA deadlock bug between the AGI and AGF when performing
>> rename() with RENAME_WHITEOUT flag, and add this testcase to make
>> sure the rename() call works well.
>>
>> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
>> ---
> 
> FYI, for some reason your patch series isn't threaded on the mailing
> list. I thought git send-email did this by default. Assuming you're not
> explicitly using --no-thread, you might have to use the --thread option
> so this gets posted as a proper series.
> 
Yeah, thanks!
>>  tests/xfs/512     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  tests/xfs/512.out |  2 ++
>>  tests/xfs/group   |  1 +
>>  3 files changed, 99 insertions(+)
>>  create mode 100755 tests/xfs/512
>>  create mode 100644 tests/xfs/512.out
>>
>> diff --git a/tests/xfs/512 b/tests/xfs/512
>> new file mode 100755
>> index 0000000..a2089f0
>> --- /dev/null
>> +++ b/tests/xfs/512
>> @@ -0,0 +1,96 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (c) 2019 Tencent.  All Rights Reserved.
>> +#
>> +# FS QA Test 512
>> +#
>> +# Test the ABBA deadlock case between the AGI and AGF When performing
>> +# rename operation with RENAME_WHITEOUT flag.
>> +#
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1	# failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> +	cd /
>> +	rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +. ./common/renameat2
>> +
>> +rm -f $seqres.full
>> +
>> +# real QA test starts here
>> +_supported_fs xfs
>> +_supported_os Linux
>> +# single AG will cause default xfs_repair to fail. This test need a
>> +# single AG fs, so ignore the check.
>> +_require_scratch_nocheck
>> +_requires_renameat2 whiteout
>> +
>> +filter_enospc() {
>> +	sed -e '/^.*No space left on device.*/d'
>> +}
>> +
>> +create_file()
>> +{
>> +	local target_dir=$1
>> +	local files_count=$2
>> +
>> +	for i in $(seq 1 $files_count); do
>> +		echo > $target_dir/f$i >/dev/null 2>&1 | filter_enospc
>> +	done
>> +}
>> +
>> +rename_whiteout()
>> +{
>> +	local target_dir=$1
>> +	local files_count=$2
>> +
>> +	# a long filename could increase the possibility that target_dp
>> +	# allocate new blocks(acquire the AGF lock) to store the filename
>> +	longnamepre=`$PERL_PROG -e 'print "a"x200;'`
>> +
>> +	# now try to do rename with RENAME_WHITEOUT flag
>> +	for i in $(seq 1 $files_count); do
>> +		src/renameat2 -w $SCRATCH_MNT/f$i $target_dir/$longnamepre$i >/dev/null 2>&1
>> +	done
>> +}
>> +
>> +_scratch_mkfs_xfs -d agcount=1 >> $seqres.full 2>&1 ||
>> +	_fail "mkfs failed"
> 
> This appears to be the only XFS specific bit. Could it be
> conditionalized using FSTYP such that this test could go under
> tests/generic?
> 
OK, I'll move this test to tests/generic by using FSTYP.

>> +_scratch_mount
>> +
>> +# set the rename and create file counts
>> +file_count=50000
>> +
>> +# create the necessary directory for create and rename operations
>> +createdir=$SCRATCH_MNT/createdir
>> +mkdir $createdir
>> +renamedir=$SCRATCH_MNT/renamedir
>> +mkdir $renamedir
>> +
>> +# create many small files for the rename with RENAME_WHITEOUT
>> +create_file $SCRATCH_MNT $file_count
>> +
>> +# try to create files at the same time to hit the deadlock
>> +rename_whiteout $renamedir $file_count &
>> +create_file $createdir $file_count &
>> +
> 
> When I ran this test I noticed that the rename_whiteout task completed
> renaming the 50k files before the create_file task created even 30k of
> the 50k files. There's no risk of deadlock once one of these tasks
> completes, right? If so, that seems like something that could be fixed
> up.
> 
> Beyond that though, the test itself ran for almost 19 minutes on a vm
> with the deadlock fix. That seems like overkill to me for a test that's
> so narrowly focused on a particular bug that it's unlikely to fail in
> the future. If we can't find a way to get this down to a reasonable time
> while still reproducing the deadlock, I'm kind of wondering if there's a
> better approach to get more rename coverage from existing tests. For
> example, could we add this support to fsstress and see if any of the
> existing stress tests might trigger the original problem? Even if we
> needed to add a new rename/create focused fsstress test, that might at
> least be general enough to provide broader coverage.
> 
Yeah, rename_whiteout task run faster than create_file task, so maybe
we can set two different files counts for them to reduce the test run
time. This test ran for 380s on my vm with the fixed kernel, but we
still need to find a way to reduce the run time, like the 19 minutes
case. Actually, in most cases, the deadlock happened when the
rename_whiteout task completed renaming hundreds of files. 50000
is set just because this test take 380s on my vm which is acceptable
and the reproduce possibility is near 100%. So maybe we can choose a
proper files count to make the test runs faster. Of course, I'll
also try to use fsstresss and the TIME_FACTOR if they can help to
reduce the run time.
 
> Alternatively, what if this test ran a create/rename workload (on a
> smaller fileset) for a fixed time of a minute or two and then exited? I
> think it would be a reasonable compromise if the test still reproduced
> on some smaller frequency, it's just not clear to me how effective such
> a test would be without actually trying it. Maybe Eryu has additional
> thoughts..
> 
> Brian
> 
>> +wait
>> +echo Silence is golden
>> +
>> +# Failure comes in the form of a deadlock.
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/xfs/512.out b/tests/xfs/512.out
>> new file mode 100644
>> index 0000000..0aabdef
>> --- /dev/null
>> +++ b/tests/xfs/512.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 512
>> +Silence is golden
>> diff --git a/tests/xfs/group b/tests/xfs/group
>> index a7ad300..ed250d6 100644
>> --- a/tests/xfs/group
>> +++ b/tests/xfs/group
>> @@ -509,3 +509,4 @@
>>  509 auto ioctl
>>  510 auto ioctl quick
>>  511 auto quick quota
>> +512 auto rename
>> -- 
>> 1.8.3.1
>>
>> -- 
>> kaixuxia
Brian Foster Sept. 19, 2019, 10:47 a.m. UTC | #4
On Thu, Sep 19, 2019 at 05:08:04PM +0800, kaixuxia wrote:
> 
> 
> On 2019/9/18 21:59, Brian Foster wrote:
> > On Wed, Sep 18, 2019 at 07:49:22PM +0800, kaixuxia wrote:
> >> There is ABBA deadlock bug between the AGI and AGF when performing
> >> rename() with RENAME_WHITEOUT flag, and add this testcase to make
> >> sure the rename() call works well.
> >>
> >> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
> >> ---
> > 
> > FYI, for some reason your patch series isn't threaded on the mailing
> > list. I thought git send-email did this by default. Assuming you're not
> > explicitly using --no-thread, you might have to use the --thread option
> > so this gets posted as a proper series.
> > 
> Yeah, thanks!
> >>  tests/xfs/512     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  tests/xfs/512.out |  2 ++
> >>  tests/xfs/group   |  1 +
> >>  3 files changed, 99 insertions(+)
> >>  create mode 100755 tests/xfs/512
> >>  create mode 100644 tests/xfs/512.out
> >>
> >> diff --git a/tests/xfs/512 b/tests/xfs/512
> >> new file mode 100755
> >> index 0000000..a2089f0
> >> --- /dev/null
> >> +++ b/tests/xfs/512
> >> @@ -0,0 +1,96 @@
> >> +#! /bin/bash
> >> +# SPDX-License-Identifier: GPL-2.0
> >> +# Copyright (c) 2019 Tencent.  All Rights Reserved.
> >> +#
> >> +# FS QA Test 512
> >> +#
> >> +# Test the ABBA deadlock case between the AGI and AGF When performing
> >> +# rename operation with RENAME_WHITEOUT flag.
> >> +#
> >> +seq=`basename $0`
> >> +seqres=$RESULT_DIR/$seq
> >> +echo "QA output created by $seq"
> >> +
> >> +here=`pwd`
> >> +tmp=/tmp/$$
> >> +status=1	# failure is the default!
> >> +trap "_cleanup; exit \$status" 0 1 2 3 15
> >> +
> >> +_cleanup()
> >> +{
> >> +	cd /
> >> +	rm -f $tmp.*
> >> +}
> >> +
> >> +# get standard environment, filters and checks
> >> +. ./common/rc
> >> +. ./common/filter
> >> +. ./common/renameat2
> >> +
> >> +rm -f $seqres.full
> >> +
> >> +# real QA test starts here
> >> +_supported_fs xfs
> >> +_supported_os Linux
> >> +# single AG will cause default xfs_repair to fail. This test need a
> >> +# single AG fs, so ignore the check.
> >> +_require_scratch_nocheck
> >> +_requires_renameat2 whiteout
> >> +
> >> +filter_enospc() {
> >> +	sed -e '/^.*No space left on device.*/d'
> >> +}
> >> +
> >> +create_file()
> >> +{
> >> +	local target_dir=$1
> >> +	local files_count=$2
> >> +
> >> +	for i in $(seq 1 $files_count); do
> >> +		echo > $target_dir/f$i >/dev/null 2>&1 | filter_enospc
> >> +	done
> >> +}
> >> +
> >> +rename_whiteout()
> >> +{
> >> +	local target_dir=$1
> >> +	local files_count=$2
> >> +
> >> +	# a long filename could increase the possibility that target_dp
> >> +	# allocate new blocks(acquire the AGF lock) to store the filename
> >> +	longnamepre=`$PERL_PROG -e 'print "a"x200;'`
> >> +
> >> +	# now try to do rename with RENAME_WHITEOUT flag
> >> +	for i in $(seq 1 $files_count); do
> >> +		src/renameat2 -w $SCRATCH_MNT/f$i $target_dir/$longnamepre$i >/dev/null 2>&1
> >> +	done
> >> +}
> >> +
> >> +_scratch_mkfs_xfs -d agcount=1 >> $seqres.full 2>&1 ||
> >> +	_fail "mkfs failed"
> > 
> > This appears to be the only XFS specific bit. Could it be
> > conditionalized using FSTYP such that this test could go under
> > tests/generic?
> > 
> OK, I'll move this test to tests/generic by using FSTYP.
> 
> >> +_scratch_mount
> >> +
> >> +# set the rename and create file counts
> >> +file_count=50000
> >> +
> >> +# create the necessary directory for create and rename operations
> >> +createdir=$SCRATCH_MNT/createdir
> >> +mkdir $createdir
> >> +renamedir=$SCRATCH_MNT/renamedir
> >> +mkdir $renamedir
> >> +
> >> +# create many small files for the rename with RENAME_WHITEOUT
> >> +create_file $SCRATCH_MNT $file_count
> >> +
> >> +# try to create files at the same time to hit the deadlock
> >> +rename_whiteout $renamedir $file_count &
> >> +create_file $createdir $file_count &
> >> +
> > 
> > When I ran this test I noticed that the rename_whiteout task completed
> > renaming the 50k files before the create_file task created even 30k of
> > the 50k files. There's no risk of deadlock once one of these tasks
> > completes, right? If so, that seems like something that could be fixed
> > up.
> > 
> > Beyond that though, the test itself ran for almost 19 minutes on a vm
> > with the deadlock fix. That seems like overkill to me for a test that's
> > so narrowly focused on a particular bug that it's unlikely to fail in
> > the future. If we can't find a way to get this down to a reasonable time
> > while still reproducing the deadlock, I'm kind of wondering if there's a
> > better approach to get more rename coverage from existing tests. For
> > example, could we add this support to fsstress and see if any of the
> > existing stress tests might trigger the original problem? Even if we
> > needed to add a new rename/create focused fsstress test, that might at
> > least be general enough to provide broader coverage.
> > 
> Yeah, rename_whiteout task run faster than create_file task, so maybe
> we can set two different files counts for them to reduce the test run
> time. This test ran for 380s on my vm with the fixed kernel, but we
> still need to find a way to reduce the run time, like the 19 minutes
> case. Actually, in most cases, the deadlock happened when the
> rename_whiteout task completed renaming hundreds of files. 50000
> is set just because this test take 380s on my vm which is acceptable
> and the reproduce possibility is near 100%. So maybe we can choose a
> proper files count to make the test runs faster. Of course, I'll
> also try to use fsstresss and the TIME_FACTOR if they can help to
> reduce the run time.
>  

I think using different file counts as such is too unpredictable across
different test environments. If we end up with something like the
current test, I'd rather see explicit logic in the test to terminate the
workload thread when the rename thread completes. This probably would
have knocked 2-3 minutes off the slow runtime I reported above.

That aside, I think the fsstress approach is preferable because there is
at least potential to avoid the need for a new test. The relevant
questions to me are:

1.) If you add renameat2 support to fsstress, do any of the existing
fsstress tests reproduce the original problem?

2.) If not, can fsstress reproduce the problem using customized
parameters (i.e., limited to rename and creates)? If so, we may still
need a separate test, but it would be trivial in that it just invokes
fsstress with particular flags for a period of time.

3.) If not, then we need to find a way for this test to run quicker. At
this point, I'm curious how long it takes for this test to reproduce the
problem (on a broken kernel) on average once the setup portion
completes. More than a minute or two, for example, or tens of minutes..
etc.?

Brian

> > Alternatively, what if this test ran a create/rename workload (on a
> > smaller fileset) for a fixed time of a minute or two and then exited? I
> > think it would be a reasonable compromise if the test still reproduced
> > on some smaller frequency, it's just not clear to me how effective such
> > a test would be without actually trying it. Maybe Eryu has additional
> > thoughts..
> > 
> > Brian
> > 
> >> +wait
> >> +echo Silence is golden
> >> +
> >> +# Failure comes in the form of a deadlock.
> >> +
> >> +# success, all done
> >> +status=0
> >> +exit
> >> diff --git a/tests/xfs/512.out b/tests/xfs/512.out
> >> new file mode 100644
> >> index 0000000..0aabdef
> >> --- /dev/null
> >> +++ b/tests/xfs/512.out
> >> @@ -0,0 +1,2 @@
> >> +QA output created by 512
> >> +Silence is golden
> >> diff --git a/tests/xfs/group b/tests/xfs/group
> >> index a7ad300..ed250d6 100644
> >> --- a/tests/xfs/group
> >> +++ b/tests/xfs/group
> >> @@ -509,3 +509,4 @@
> >>  509 auto ioctl
> >>  510 auto ioctl quick
> >>  511 auto quick quota
> >> +512 auto rename
> >> -- 
> >> 1.8.3.1
> >>
> >> -- 
> >> kaixuxia
> 
> -- 
> kaixuxia
Kaixu Xia Sept. 19, 2019, 12:14 p.m. UTC | #5
On 2019/9/19 18:47, Brian Foster wrote:
> On Thu, Sep 19, 2019 at 05:08:04PM +0800, kaixuxia wrote:
>>
>>
>> On 2019/9/18 21:59, Brian Foster wrote:
>>> On Wed, Sep 18, 2019 at 07:49:22PM +0800, kaixuxia wrote:
>>>> There is ABBA deadlock bug between the AGI and AGF when performing
>>>> rename() with RENAME_WHITEOUT flag, and add this testcase to make
>>>> sure the rename() call works well.
>>>>
>>>> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
>>>> ---
>>>
>>> FYI, for some reason your patch series isn't threaded on the mailing
>>> list. I thought git send-email did this by default. Assuming you're not
>>> explicitly using --no-thread, you might have to use the --thread option
>>> so this gets posted as a proper series.
>>>
>> Yeah, thanks!
>>>>  tests/xfs/512     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  tests/xfs/512.out |  2 ++
>>>>  tests/xfs/group   |  1 +
>>>>  3 files changed, 99 insertions(+)
>>>>  create mode 100755 tests/xfs/512
>>>>  create mode 100644 tests/xfs/512.out
>>>>
>>>> diff --git a/tests/xfs/512 b/tests/xfs/512
>>>> new file mode 100755
>>>> index 0000000..a2089f0
>>>> --- /dev/null
>>>> +++ b/tests/xfs/512
>>>> @@ -0,0 +1,96 @@
>>>> +#! /bin/bash
>>>> +# SPDX-License-Identifier: GPL-2.0
>>>> +# Copyright (c) 2019 Tencent.  All Rights Reserved.
>>>> +#
>>>> +# FS QA Test 512
>>>> +#
>>>> +# Test the ABBA deadlock case between the AGI and AGF When performing
>>>> +# rename operation with RENAME_WHITEOUT flag.
>>>> +#
>>>> +seq=`basename $0`
>>>> +seqres=$RESULT_DIR/$seq
>>>> +echo "QA output created by $seq"
>>>> +
>>>> +here=`pwd`
>>>> +tmp=/tmp/$$
>>>> +status=1	# failure is the default!
>>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>>>> +
>>>> +_cleanup()
>>>> +{
>>>> +	cd /
>>>> +	rm -f $tmp.*
>>>> +}
>>>> +
>>>> +# get standard environment, filters and checks
>>>> +. ./common/rc
>>>> +. ./common/filter
>>>> +. ./common/renameat2
>>>> +
>>>> +rm -f $seqres.full
>>>> +
>>>> +# real QA test starts here
>>>> +_supported_fs xfs
>>>> +_supported_os Linux
>>>> +# single AG will cause default xfs_repair to fail. This test need a
>>>> +# single AG fs, so ignore the check.
>>>> +_require_scratch_nocheck
>>>> +_requires_renameat2 whiteout
>>>> +
>>>> +filter_enospc() {
>>>> +	sed -e '/^.*No space left on device.*/d'
>>>> +}
>>>> +
>>>> +create_file()
>>>> +{
>>>> +	local target_dir=$1
>>>> +	local files_count=$2
>>>> +
>>>> +	for i in $(seq 1 $files_count); do
>>>> +		echo > $target_dir/f$i >/dev/null 2>&1 | filter_enospc
>>>> +	done
>>>> +}
>>>> +
>>>> +rename_whiteout()
>>>> +{
>>>> +	local target_dir=$1
>>>> +	local files_count=$2
>>>> +
>>>> +	# a long filename could increase the possibility that target_dp
>>>> +	# allocate new blocks(acquire the AGF lock) to store the filename
>>>> +	longnamepre=`$PERL_PROG -e 'print "a"x200;'`
>>>> +
>>>> +	# now try to do rename with RENAME_WHITEOUT flag
>>>> +	for i in $(seq 1 $files_count); do
>>>> +		src/renameat2 -w $SCRATCH_MNT/f$i $target_dir/$longnamepre$i >/dev/null 2>&1
>>>> +	done
>>>> +}
>>>> +
>>>> +_scratch_mkfs_xfs -d agcount=1 >> $seqres.full 2>&1 ||
>>>> +	_fail "mkfs failed"
>>>
>>> This appears to be the only XFS specific bit. Could it be
>>> conditionalized using FSTYP such that this test could go under
>>> tests/generic?
>>>
>> OK, I'll move this test to tests/generic by using FSTYP.
>>
>>>> +_scratch_mount
>>>> +
>>>> +# set the rename and create file counts
>>>> +file_count=50000
>>>> +
>>>> +# create the necessary directory for create and rename operations
>>>> +createdir=$SCRATCH_MNT/createdir
>>>> +mkdir $createdir
>>>> +renamedir=$SCRATCH_MNT/renamedir
>>>> +mkdir $renamedir
>>>> +
>>>> +# create many small files for the rename with RENAME_WHITEOUT
>>>> +create_file $SCRATCH_MNT $file_count
>>>> +
>>>> +# try to create files at the same time to hit the deadlock
>>>> +rename_whiteout $renamedir $file_count &
>>>> +create_file $createdir $file_count &
>>>> +
>>>
>>> When I ran this test I noticed that the rename_whiteout task completed
>>> renaming the 50k files before the create_file task created even 30k of
>>> the 50k files. There's no risk of deadlock once one of these tasks
>>> completes, right? If so, that seems like something that could be fixed
>>> up.
>>>
>>> Beyond that though, the test itself ran for almost 19 minutes on a vm
>>> with the deadlock fix. That seems like overkill to me for a test that's
>>> so narrowly focused on a particular bug that it's unlikely to fail in
>>> the future. If we can't find a way to get this down to a reasonable time
>>> while still reproducing the deadlock, I'm kind of wondering if there's a
>>> better approach to get more rename coverage from existing tests. For
>>> example, could we add this support to fsstress and see if any of the
>>> existing stress tests might trigger the original problem? Even if we
>>> needed to add a new rename/create focused fsstress test, that might at
>>> least be general enough to provide broader coverage.
>>>
>> Yeah, rename_whiteout task run faster than create_file task, so maybe
>> we can set two different files counts for them to reduce the test run
>> time. This test ran for 380s on my vm with the fixed kernel, but we
>> still need to find a way to reduce the run time, like the 19 minutes
>> case. Actually, in most cases, the deadlock happened when the
>> rename_whiteout task completed renaming hundreds of files. 50000
>> is set just because this test take 380s on my vm which is acceptable
>> and the reproduce possibility is near 100%. So maybe we can choose a
>> proper files count to make the test runs faster. Of course, I'll
>> also try to use fsstresss and the TIME_FACTOR if they can help to
>> reduce the run time.
>>  
> 
> I think using different file counts as such is too unpredictable across
> different test environments. If we end up with something like the
> current test, I'd rather see explicit logic in the test to terminate the
> workload thread when the rename thread completes. This probably would
> have knocked 2-3 minutes off the slow runtime I reported above.
> 
> That aside, I think the fsstress approach is preferable because there is
> at least potential to avoid the need for a new test. The relevant
> questions to me are:
> 
> 1.) If you add renameat2 support to fsstress, do any of the existing
> fsstress tests reproduce the original problem?

Not sure about this, need to do research whether there are existing
fsstress tests can reproduce the problem.	
> 
> 2.) If not, can fsstress reproduce the problem using customized
> parameters (i.e., limited to rename and creates)? If so, we may still
> need a separate test, but it would be trivial in that it just invokes
> fsstress with particular flags for a period of time.
> 
> 3.) If not, then we need to find a way for this test to run quicker. At
> this point, I'm curious how long it takes for this test to reproduce the
> problem (on a broken kernel) on average once the setup portion
> completes. More than a minute or two, for example, or tens of minutes..
> etc.?
> 
About five minutes with 50000 files count on a broken kernel to reproduce
the deadlock on my vm, and the most time is preparing 50000 empty files for
the rename operation.

A example for deadlock happened when renaming 2729 files.
call trace: 
root  31829  ... D+ ... /renamedir/aaaaaaaaaaaaaaaaaaa...aaaaaaaaaaaaaaa2729
# cat /proc/31829/stack 
[<0>] xfs_buf_lock+0x34/0xf0 [xfs]
[<0>] xfs_buf_find+0x215/0x6c0 [xfs]
[<0>] xfs_buf_get_map+0x37/0x230 [xfs]
[<0>] xfs_buf_read_map+0x29/0x190 [xfs]
[<0>] xfs_trans_read_buf_map+0x13d/0x520 [xfs]
[<0>] xfs_read_agi+0xa8/0x160 [xfs]
[<0>] xfs_iunlink_remove+0x6f/0x2a0 [xfs]
[<0>] xfs_rename+0x57a/0xae0 [xfs]
[<0>] xfs_vn_rename+0xe4/0x150 [xfs]
[<0>] vfs_rename+0x1f4/0x7b0
[<0>] do_renameat2+0x431/0x4c0
[<0>] __x64_sys_renameat2+0x20/0x30
[<0>] do_syscall_64+0x49/0x120
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

> Brian
> 
>>> Alternatively, what if this test ran a create/rename workload (on a
>>> smaller fileset) for a fixed time of a minute or two and then exited? I
>>> think it would be a reasonable compromise if the test still reproduced
>>> on some smaller frequency, it's just not clear to me how effective such
>>> a test would be without actually trying it. Maybe Eryu has additional
>>> thoughts..
>>>
>>> Brian
>>>
>>>> +wait
>>>> +echo Silence is golden
>>>> +
>>>> +# Failure comes in the form of a deadlock.
>>>> +
>>>> +# success, all done
>>>> +status=0
>>>> +exit
>>>> diff --git a/tests/xfs/512.out b/tests/xfs/512.out
>>>> new file mode 100644
>>>> index 0000000..0aabdef
>>>> --- /dev/null
>>>> +++ b/tests/xfs/512.out
>>>> @@ -0,0 +1,2 @@
>>>> +QA output created by 512
>>>> +Silence is golden
>>>> diff --git a/tests/xfs/group b/tests/xfs/group
>>>> index a7ad300..ed250d6 100644
>>>> --- a/tests/xfs/group
>>>> +++ b/tests/xfs/group
>>>> @@ -509,3 +509,4 @@
>>>>  509 auto ioctl
>>>>  510 auto ioctl quick
>>>>  511 auto quick quota
>>>> +512 auto rename
>>>> -- 
>>>> 1.8.3.1
>>>>
>>>> -- 
>>>> kaixuxia
>>
>> -- 
>> kaixuxia
Brian Foster Sept. 19, 2019, 12:26 p.m. UTC | #6
On Thu, Sep 19, 2019 at 08:14:10PM +0800, kaixuxia wrote:
> 
> 
> On 2019/9/19 18:47, Brian Foster wrote:
> > On Thu, Sep 19, 2019 at 05:08:04PM +0800, kaixuxia wrote:
> >>
> >>
> >> On 2019/9/18 21:59, Brian Foster wrote:
> >>> On Wed, Sep 18, 2019 at 07:49:22PM +0800, kaixuxia wrote:
> >>>> There is ABBA deadlock bug between the AGI and AGF when performing
> >>>> rename() with RENAME_WHITEOUT flag, and add this testcase to make
> >>>> sure the rename() call works well.
> >>>>
> >>>> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
> >>>> ---
> >>>
> >>> FYI, for some reason your patch series isn't threaded on the mailing
> >>> list. I thought git send-email did this by default. Assuming you're not
> >>> explicitly using --no-thread, you might have to use the --thread option
> >>> so this gets posted as a proper series.
> >>>
> >> Yeah, thanks!
> >>>>  tests/xfs/512     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  tests/xfs/512.out |  2 ++
> >>>>  tests/xfs/group   |  1 +
> >>>>  3 files changed, 99 insertions(+)
> >>>>  create mode 100755 tests/xfs/512
> >>>>  create mode 100644 tests/xfs/512.out
> >>>>
> >>>> diff --git a/tests/xfs/512 b/tests/xfs/512
> >>>> new file mode 100755
> >>>> index 0000000..a2089f0
> >>>> --- /dev/null
> >>>> +++ b/tests/xfs/512
> >>>> @@ -0,0 +1,96 @@
> >>>> +#! /bin/bash
> >>>> +# SPDX-License-Identifier: GPL-2.0
> >>>> +# Copyright (c) 2019 Tencent.  All Rights Reserved.
> >>>> +#
> >>>> +# FS QA Test 512
> >>>> +#
> >>>> +# Test the ABBA deadlock case between the AGI and AGF When performing
> >>>> +# rename operation with RENAME_WHITEOUT flag.
> >>>> +#
> >>>> +seq=`basename $0`
> >>>> +seqres=$RESULT_DIR/$seq
> >>>> +echo "QA output created by $seq"
> >>>> +
> >>>> +here=`pwd`
> >>>> +tmp=/tmp/$$
> >>>> +status=1	# failure is the default!
> >>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
> >>>> +
> >>>> +_cleanup()
> >>>> +{
> >>>> +	cd /
> >>>> +	rm -f $tmp.*
> >>>> +}
> >>>> +
> >>>> +# get standard environment, filters and checks
> >>>> +. ./common/rc
> >>>> +. ./common/filter
> >>>> +. ./common/renameat2
> >>>> +
> >>>> +rm -f $seqres.full
> >>>> +
> >>>> +# real QA test starts here
> >>>> +_supported_fs xfs
> >>>> +_supported_os Linux
> >>>> +# single AG will cause default xfs_repair to fail. This test need a
> >>>> +# single AG fs, so ignore the check.
> >>>> +_require_scratch_nocheck
> >>>> +_requires_renameat2 whiteout
> >>>> +
> >>>> +filter_enospc() {
> >>>> +	sed -e '/^.*No space left on device.*/d'
> >>>> +}
> >>>> +
> >>>> +create_file()
> >>>> +{
> >>>> +	local target_dir=$1
> >>>> +	local files_count=$2
> >>>> +
> >>>> +	for i in $(seq 1 $files_count); do
> >>>> +		echo > $target_dir/f$i >/dev/null 2>&1 | filter_enospc
> >>>> +	done
> >>>> +}
> >>>> +
> >>>> +rename_whiteout()
> >>>> +{
> >>>> +	local target_dir=$1
> >>>> +	local files_count=$2
> >>>> +
> >>>> +	# a long filename could increase the possibility that target_dp
> >>>> +	# allocate new blocks(acquire the AGF lock) to store the filename
> >>>> +	longnamepre=`$PERL_PROG -e 'print "a"x200;'`
> >>>> +
> >>>> +	# now try to do rename with RENAME_WHITEOUT flag
> >>>> +	for i in $(seq 1 $files_count); do
> >>>> +		src/renameat2 -w $SCRATCH_MNT/f$i $target_dir/$longnamepre$i >/dev/null 2>&1
> >>>> +	done
> >>>> +}
> >>>> +
> >>>> +_scratch_mkfs_xfs -d agcount=1 >> $seqres.full 2>&1 ||
> >>>> +	_fail "mkfs failed"
> >>>
> >>> This appears to be the only XFS specific bit. Could it be
> >>> conditionalized using FSTYP such that this test could go under
> >>> tests/generic?
> >>>
> >> OK, I'll move this test to tests/generic by using FSTYP.
> >>
> >>>> +_scratch_mount
> >>>> +
> >>>> +# set the rename and create file counts
> >>>> +file_count=50000
> >>>> +
> >>>> +# create the necessary directory for create and rename operations
> >>>> +createdir=$SCRATCH_MNT/createdir
> >>>> +mkdir $createdir
> >>>> +renamedir=$SCRATCH_MNT/renamedir
> >>>> +mkdir $renamedir
> >>>> +
> >>>> +# create many small files for the rename with RENAME_WHITEOUT
> >>>> +create_file $SCRATCH_MNT $file_count
> >>>> +
> >>>> +# try to create files at the same time to hit the deadlock
> >>>> +rename_whiteout $renamedir $file_count &
> >>>> +create_file $createdir $file_count &
> >>>> +
> >>>
> >>> When I ran this test I noticed that the rename_whiteout task completed
> >>> renaming the 50k files before the create_file task created even 30k of
> >>> the 50k files. There's no risk of deadlock once one of these tasks
> >>> completes, right? If so, that seems like something that could be fixed
> >>> up.
> >>>
> >>> Beyond that though, the test itself ran for almost 19 minutes on a vm
> >>> with the deadlock fix. That seems like overkill to me for a test that's
> >>> so narrowly focused on a particular bug that it's unlikely to fail in
> >>> the future. If we can't find a way to get this down to a reasonable time
> >>> while still reproducing the deadlock, I'm kind of wondering if there's a
> >>> better approach to get more rename coverage from existing tests. For
> >>> example, could we add this support to fsstress and see if any of the
> >>> existing stress tests might trigger the original problem? Even if we
> >>> needed to add a new rename/create focused fsstress test, that might at
> >>> least be general enough to provide broader coverage.
> >>>
> >> Yeah, rename_whiteout task run faster than create_file task, so maybe
> >> we can set two different files counts for them to reduce the test run
> >> time. This test ran for 380s on my vm with the fixed kernel, but we
> >> still need to find a way to reduce the run time, like the 19 minutes
> >> case. Actually, in most cases, the deadlock happened when the
> >> rename_whiteout task completed renaming hundreds of files. 50000
> >> is set just because this test take 380s on my vm which is acceptable
> >> and the reproduce possibility is near 100%. So maybe we can choose a
> >> proper files count to make the test runs faster. Of course, I'll
> >> also try to use fsstresss and the TIME_FACTOR if they can help to
> >> reduce the run time.
> >>  
> > 
> > I think using different file counts as such is too unpredictable across
> > different test environments. If we end up with something like the
> > current test, I'd rather see explicit logic in the test to terminate the
> > workload thread when the rename thread completes. This probably would
> > have knocked 2-3 minutes off the slow runtime I reported above.
> > 
> > That aside, I think the fsstress approach is preferable because there is
> > at least potential to avoid the need for a new test. The relevant
> > questions to me are:
> > 
> > 1.) If you add renameat2 support to fsstress, do any of the existing
> > fsstress tests reproduce the original problem?
> 
> Not sure about this, need to do research whether there are existing
> fsstress tests can reproduce the problem.	

Right, but this will require some work to fsstress to add renameat2
support. To Eryu's earlier point, however, that is probably a useful
patch regardless of what approach we take below.

> > 
> > 2.) If not, can fsstress reproduce the problem using customized
> > parameters (i.e., limited to rename and creates)? If so, we may still
> > need a separate test, but it would be trivial in that it just invokes
> > fsstress with particular flags for a period of time.
> > 
> > 3.) If not, then we need to find a way for this test to run quicker. At
> > this point, I'm curious how long it takes for this test to reproduce the
> > problem (on a broken kernel) on average once the setup portion
> > completes. More than a minute or two, for example, or tens of minutes..
> > etc.?
> > 
> About five minutes with 50000 files count on a broken kernel to reproduce
> the deadlock on my vm, and the most time is preparing 50000 empty files for
> the rename operation.
> 

Ok, so how much time is that outside of the file creation part? You
could add some timestamps to the test to figure that out if necessary.
How many files were renamed before the deadlock occurred?

Brian

> A example for deadlock happened when renaming 2729 files.
> call trace: 
> root  31829  ... D+ ... /renamedir/aaaaaaaaaaaaaaaaaaa...aaaaaaaaaaaaaaa2729
> # cat /proc/31829/stack 
> [<0>] xfs_buf_lock+0x34/0xf0 [xfs]
> [<0>] xfs_buf_find+0x215/0x6c0 [xfs]
> [<0>] xfs_buf_get_map+0x37/0x230 [xfs]
> [<0>] xfs_buf_read_map+0x29/0x190 [xfs]
> [<0>] xfs_trans_read_buf_map+0x13d/0x520 [xfs]
> [<0>] xfs_read_agi+0xa8/0x160 [xfs]
> [<0>] xfs_iunlink_remove+0x6f/0x2a0 [xfs]
> [<0>] xfs_rename+0x57a/0xae0 [xfs]
> [<0>] xfs_vn_rename+0xe4/0x150 [xfs]
> [<0>] vfs_rename+0x1f4/0x7b0
> [<0>] do_renameat2+0x431/0x4c0
> [<0>] __x64_sys_renameat2+0x20/0x30
> [<0>] do_syscall_64+0x49/0x120
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> > Brian
> > 
> >>> Alternatively, what if this test ran a create/rename workload (on a
> >>> smaller fileset) for a fixed time of a minute or two and then exited? I
> >>> think it would be a reasonable compromise if the test still reproduced
> >>> on some smaller frequency, it's just not clear to me how effective such
> >>> a test would be without actually trying it. Maybe Eryu has additional
> >>> thoughts..
> >>>
> >>> Brian
> >>>
> >>>> +wait
> >>>> +echo Silence is golden
> >>>> +
> >>>> +# Failure comes in the form of a deadlock.
> >>>> +
> >>>> +# success, all done
> >>>> +status=0
> >>>> +exit
> >>>> diff --git a/tests/xfs/512.out b/tests/xfs/512.out
> >>>> new file mode 100644
> >>>> index 0000000..0aabdef
> >>>> --- /dev/null
> >>>> +++ b/tests/xfs/512.out
> >>>> @@ -0,0 +1,2 @@
> >>>> +QA output created by 512
> >>>> +Silence is golden
> >>>> diff --git a/tests/xfs/group b/tests/xfs/group
> >>>> index a7ad300..ed250d6 100644
> >>>> --- a/tests/xfs/group
> >>>> +++ b/tests/xfs/group
> >>>> @@ -509,3 +509,4 @@
> >>>>  509 auto ioctl
> >>>>  510 auto ioctl quick
> >>>>  511 auto quick quota
> >>>> +512 auto rename
> >>>> -- 
> >>>> 1.8.3.1
> >>>>
> >>>> -- 
> >>>> kaixuxia
> >>
> >> -- 
> >> kaixuxia
> 
> -- 
> kaixuxia
Kaixu Xia Sept. 20, 2019, 9:18 a.m. UTC | #7
On 2019/9/19 20:26, Brian Foster wrote:
> On Thu, Sep 19, 2019 at 08:14:10PM +0800, kaixuxia wrote:
>>
>>
>> On 2019/9/19 18:47, Brian Foster wrote:
>>> On Thu, Sep 19, 2019 at 05:08:04PM +0800, kaixuxia wrote:
>>>>
>>>>
>>>> On 2019/9/18 21:59, Brian Foster wrote:
>>>>> On Wed, Sep 18, 2019 at 07:49:22PM +0800, kaixuxia wrote:
>>>>>> There is ABBA deadlock bug between the AGI and AGF when performing
>>>>>> rename() with RENAME_WHITEOUT flag, and add this testcase to make
>>>>>> sure the rename() call works well.
>>>>>>
>>>>>> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
>>>>>> ---
>>>>>
>>>>> FYI, for some reason your patch series isn't threaded on the mailing
>>>>> list. I thought git send-email did this by default. Assuming you're not
>>>>> explicitly using --no-thread, you might have to use the --thread option
>>>>> so this gets posted as a proper series.
>>>>>
>>>> Yeah, thanks!
>>>>>>  tests/xfs/512     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  tests/xfs/512.out |  2 ++
>>>>>>  tests/xfs/group   |  1 +
>>>>>>  3 files changed, 99 insertions(+)
>>>>>>  create mode 100755 tests/xfs/512
>>>>>>  create mode 100644 tests/xfs/512.out
>>>>>>
>>>>>> diff --git a/tests/xfs/512 b/tests/xfs/512
>>>>>> new file mode 100755
>>>>>> index 0000000..a2089f0
>>>>>> --- /dev/null
>>>>>> +++ b/tests/xfs/512
>>>>>> @@ -0,0 +1,96 @@
>>>>>> +#! /bin/bash
>>>>>> +# SPDX-License-Identifier: GPL-2.0
>>>>>> +# Copyright (c) 2019 Tencent.  All Rights Reserved.
>>>>>> +#
>>>>>> +# FS QA Test 512
>>>>>> +#
>>>>>> +# Test the ABBA deadlock case between the AGI and AGF When performing
>>>>>> +# rename operation with RENAME_WHITEOUT flag.
>>>>>> +#
>>>>>> +seq=`basename $0`
>>>>>> +seqres=$RESULT_DIR/$seq
>>>>>> +echo "QA output created by $seq"
>>>>>> +
>>>>>> +here=`pwd`
>>>>>> +tmp=/tmp/$$
>>>>>> +status=1	# failure is the default!
>>>>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>>>>>> +
>>>>>> +_cleanup()
>>>>>> +{
>>>>>> +	cd /
>>>>>> +	rm -f $tmp.*
>>>>>> +}
>>>>>> +
>>>>>> +# get standard environment, filters and checks
>>>>>> +. ./common/rc
>>>>>> +. ./common/filter
>>>>>> +. ./common/renameat2
>>>>>> +
>>>>>> +rm -f $seqres.full
>>>>>> +
>>>>>> +# real QA test starts here
>>>>>> +_supported_fs xfs
>>>>>> +_supported_os Linux
>>>>>> +# single AG will cause default xfs_repair to fail. This test need a
>>>>>> +# single AG fs, so ignore the check.
>>>>>> +_require_scratch_nocheck
>>>>>> +_requires_renameat2 whiteout
>>>>>> +
>>>>>> +filter_enospc() {
>>>>>> +	sed -e '/^.*No space left on device.*/d'
>>>>>> +}
>>>>>> +
>>>>>> +create_file()
>>>>>> +{
>>>>>> +	local target_dir=$1
>>>>>> +	local files_count=$2
>>>>>> +
>>>>>> +	for i in $(seq 1 $files_count); do
>>>>>> +		echo > $target_dir/f$i >/dev/null 2>&1 | filter_enospc
>>>>>> +	done
>>>>>> +}
>>>>>> +
>>>>>> +rename_whiteout()
>>>>>> +{
>>>>>> +	local target_dir=$1
>>>>>> +	local files_count=$2
>>>>>> +
>>>>>> +	# a long filename could increase the possibility that target_dp
>>>>>> +	# allocate new blocks(acquire the AGF lock) to store the filename
>>>>>> +	longnamepre=`$PERL_PROG -e 'print "a"x200;'`
>>>>>> +
>>>>>> +	# now try to do rename with RENAME_WHITEOUT flag
>>>>>> +	for i in $(seq 1 $files_count); do
>>>>>> +		src/renameat2 -w $SCRATCH_MNT/f$i $target_dir/$longnamepre$i >/dev/null 2>&1
>>>>>> +	done
>>>>>> +}
>>>>>> +
>>>>>> +_scratch_mkfs_xfs -d agcount=1 >> $seqres.full 2>&1 ||
>>>>>> +	_fail "mkfs failed"
>>>>>
>>>>> This appears to be the only XFS specific bit. Could it be
>>>>> conditionalized using FSTYP such that this test could go under
>>>>> tests/generic?
>>>>>
>>>> OK, I'll move this test to tests/generic by using FSTYP.
>>>>
>>>>>> +_scratch_mount
>>>>>> +
>>>>>> +# set the rename and create file counts
>>>>>> +file_count=50000
>>>>>> +
>>>>>> +# create the necessary directory for create and rename operations
>>>>>> +createdir=$SCRATCH_MNT/createdir
>>>>>> +mkdir $createdir
>>>>>> +renamedir=$SCRATCH_MNT/renamedir
>>>>>> +mkdir $renamedir
>>>>>> +
>>>>>> +# create many small files for the rename with RENAME_WHITEOUT
>>>>>> +create_file $SCRATCH_MNT $file_count
>>>>>> +
>>>>>> +# try to create files at the same time to hit the deadlock
>>>>>> +rename_whiteout $renamedir $file_count &
>>>>>> +create_file $createdir $file_count &
>>>>>> +
>>>>>
>>>>> When I ran this test I noticed that the rename_whiteout task completed
>>>>> renaming the 50k files before the create_file task created even 30k of
>>>>> the 50k files. There's no risk of deadlock once one of these tasks
>>>>> completes, right? If so, that seems like something that could be fixed
>>>>> up.
>>>>>
>>>>> Beyond that though, the test itself ran for almost 19 minutes on a vm
>>>>> with the deadlock fix. That seems like overkill to me for a test that's
>>>>> so narrowly focused on a particular bug that it's unlikely to fail in
>>>>> the future. If we can't find a way to get this down to a reasonable time
>>>>> while still reproducing the deadlock, I'm kind of wondering if there's a
>>>>> better approach to get more rename coverage from existing tests. For
>>>>> example, could we add this support to fsstress and see if any of the
>>>>> existing stress tests might trigger the original problem? Even if we
>>>>> needed to add a new rename/create focused fsstress test, that might at
>>>>> least be general enough to provide broader coverage.
>>>>>
>>>> Yeah, rename_whiteout task run faster than create_file task, so maybe
>>>> we can set two different files counts for them to reduce the test run
>>>> time. This test ran for 380s on my vm with the fixed kernel, but we
>>>> still need to find a way to reduce the run time, like the 19 minutes
>>>> case. Actually, in most cases, the deadlock happened when the
>>>> rename_whiteout task completed renaming hundreds of files. 50000
>>>> is set just because this test take 380s on my vm which is acceptable
>>>> and the reproduce possibility is near 100%. So maybe we can choose a
>>>> proper files count to make the test runs faster. Of course, I'll
>>>> also try to use fsstresss and the TIME_FACTOR if they can help to
>>>> reduce the run time.
>>>>  
>>>
>>> I think using different file counts as such is too unpredictable across
>>> different test environments. If we end up with something like the
>>> current test, I'd rather see explicit logic in the test to terminate the
>>> workload thread when the rename thread completes. This probably would
>>> have knocked 2-3 minutes off the slow runtime I reported above.
>>>
>>> That aside, I think the fsstress approach is preferable because there is
>>> at least potential to avoid the need for a new test. The relevant
>>> questions to me are:
>>>
>>> 1.) If you add renameat2 support to fsstress, do any of the existing
>>> fsstress tests reproduce the original problem?
>>
>> Not sure about this, need to do research whether there are existing
>> fsstress tests can reproduce the problem.	
> 
> Right, but this will require some work to fsstress to add renameat2
> support. To Eryu's earlier point, however, that is probably a useful
> patch regardless of what approach we take below.

Yeah, if taking the fsstress approach to reproduce the deadlock we need
to add renameat2 support for fsstress. Given that the deadlock is a corner
case and need some special settings for higher reproduce possibility, such
as the smaller bsize and agcount, the longer rename target filename, and all
the rename call need have the same target_dp to acquire the AGF lock
(xfs_dir_createname()), so we may need a separate test with using customized
parameters(limited to rename(whiteout) and creates). If not, the possibility
would be lower and also need longer run time.     

> 
>>>
>>> 2.) If not, can fsstress reproduce the problem using customized
>>> parameters (i.e., limited to rename and creates)? If so, we may still
>>> need a separate test, but it would be trivial in that it just invokes
>>> fsstress with particular flags for a period of time.
>>>
>>> 3.) If not, then we need to find a way for this test to run quicker. At
>>> this point, I'm curious how long it takes for this test to reproduce the
>>> problem (on a broken kernel) on average once the setup portion
>>> completes. More than a minute or two, for example, or tens of minutes..
>>> etc.?
>>>
>> About five minutes with 50000 files count on a broken kernel to reproduce
>> the deadlock on my vm, and the most time is preparing 50000 empty files for
>> the rename operation.
>>
> 
> Ok, so how much time is that outside of the file creation part? You
> could add some timestamps to the test to figure that out if necessary.
> How many files were renamed before the deadlock occurred?
> 
About one or two minutes that outside of the file creation on my vm. 
The number of renamed files before the deadlock occurred is not fixed,
the most common range is from 500 to 10000 files. Of course, this range
maybe change on different test environments...   

> Brian
> 
>> A example for deadlock happened when renaming 2729 files.
>> call trace: 
>> root  31829  ... D+ ... /renamedir/aaaaaaaaaaaaaaaaaaa...aaaaaaaaaaaaaaa2729
>> # cat /proc/31829/stack 
>> [<0>] xfs_buf_lock+0x34/0xf0 [xfs]
>> [<0>] xfs_buf_find+0x215/0x6c0 [xfs]
>> [<0>] xfs_buf_get_map+0x37/0x230 [xfs]
>> [<0>] xfs_buf_read_map+0x29/0x190 [xfs]
>> [<0>] xfs_trans_read_buf_map+0x13d/0x520 [xfs]
>> [<0>] xfs_read_agi+0xa8/0x160 [xfs]
>> [<0>] xfs_iunlink_remove+0x6f/0x2a0 [xfs]
>> [<0>] xfs_rename+0x57a/0xae0 [xfs]
>> [<0>] xfs_vn_rename+0xe4/0x150 [xfs]
>> [<0>] vfs_rename+0x1f4/0x7b0
>> [<0>] do_renameat2+0x431/0x4c0
>> [<0>] __x64_sys_renameat2+0x20/0x30
>> [<0>] do_syscall_64+0x49/0x120
>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>>> Brian
>>>
>>>>> Alternatively, what if this test ran a create/rename workload (on a
>>>>> smaller fileset) for a fixed time of a minute or two and then exited? I
>>>>> think it would be a reasonable compromise if the test still reproduced
>>>>> on some smaller frequency, it's just not clear to me how effective such
>>>>> a test would be without actually trying it. Maybe Eryu has additional
>>>>> thoughts..
>>>>>
>>>>> Brian
>>>>>
>>>>>> +wait
>>>>>> +echo Silence is golden
>>>>>> +
>>>>>> +# Failure comes in the form of a deadlock.
>>>>>> +
>>>>>> +# success, all done
>>>>>> +status=0
>>>>>> +exit
>>>>>> diff --git a/tests/xfs/512.out b/tests/xfs/512.out
>>>>>> new file mode 100644
>>>>>> index 0000000..0aabdef
>>>>>> --- /dev/null
>>>>>> +++ b/tests/xfs/512.out
>>>>>> @@ -0,0 +1,2 @@
>>>>>> +QA output created by 512
>>>>>> +Silence is golden
>>>>>> diff --git a/tests/xfs/group b/tests/xfs/group
>>>>>> index a7ad300..ed250d6 100644
>>>>>> --- a/tests/xfs/group
>>>>>> +++ b/tests/xfs/group
>>>>>> @@ -509,3 +509,4 @@
>>>>>>  509 auto ioctl
>>>>>>  510 auto ioctl quick
>>>>>>  511 auto quick quota
>>>>>> +512 auto rename
>>>>>> -- 
>>>>>> 1.8.3.1
>>>>>>
>>>>>> -- 
>>>>>> kaixuxia
>>>>
>>>> -- 
>>>> kaixuxia
>>
>> -- 
>> kaixuxia
Brian Foster Sept. 20, 2019, 12:20 p.m. UTC | #8
On Fri, Sep 20, 2019 at 05:18:00PM +0800, kaixuxia wrote:
> 
> On 2019/9/19 20:26, Brian Foster wrote:
> > On Thu, Sep 19, 2019 at 08:14:10PM +0800, kaixuxia wrote:
> >>
> >>
> >> On 2019/9/19 18:47, Brian Foster wrote:
> >>> On Thu, Sep 19, 2019 at 05:08:04PM +0800, kaixuxia wrote:
> >>>>
> >>>>
> >>>> On 2019/9/18 21:59, Brian Foster wrote:
> >>>>> On Wed, Sep 18, 2019 at 07:49:22PM +0800, kaixuxia wrote:
> >>>>>> There is ABBA deadlock bug between the AGI and AGF when performing
> >>>>>> rename() with RENAME_WHITEOUT flag, and add this testcase to make
> >>>>>> sure the rename() call works well.
> >>>>>>
> >>>>>> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
> >>>>>> ---
> >>>>>
> >>>>> FYI, for some reason your patch series isn't threaded on the mailing
> >>>>> list. I thought git send-email did this by default. Assuming you're not
> >>>>> explicitly using --no-thread, you might have to use the --thread option
> >>>>> so this gets posted as a proper series.
> >>>>>
> >>>> Yeah, thanks!
> >>>>>>  tests/xfs/512     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>  tests/xfs/512.out |  2 ++
> >>>>>>  tests/xfs/group   |  1 +
> >>>>>>  3 files changed, 99 insertions(+)
> >>>>>>  create mode 100755 tests/xfs/512
> >>>>>>  create mode 100644 tests/xfs/512.out
> >>>>>>
> >>>>>> diff --git a/tests/xfs/512 b/tests/xfs/512
> >>>>>> new file mode 100755
> >>>>>> index 0000000..a2089f0
> >>>>>> --- /dev/null
> >>>>>> +++ b/tests/xfs/512
> >>>>>> @@ -0,0 +1,96 @@
> >>>>>> +#! /bin/bash
> >>>>>> +# SPDX-License-Identifier: GPL-2.0
> >>>>>> +# Copyright (c) 2019 Tencent.  All Rights Reserved.
> >>>>>> +#
> >>>>>> +# FS QA Test 512
> >>>>>> +#
> >>>>>> +# Test the ABBA deadlock case between the AGI and AGF When performing
> >>>>>> +# rename operation with RENAME_WHITEOUT flag.
> >>>>>> +#
> >>>>>> +seq=`basename $0`
> >>>>>> +seqres=$RESULT_DIR/$seq
> >>>>>> +echo "QA output created by $seq"
> >>>>>> +
> >>>>>> +here=`pwd`
> >>>>>> +tmp=/tmp/$$
> >>>>>> +status=1	# failure is the default!
> >>>>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
> >>>>>> +
> >>>>>> +_cleanup()
> >>>>>> +{
> >>>>>> +	cd /
> >>>>>> +	rm -f $tmp.*
> >>>>>> +}
> >>>>>> +
> >>>>>> +# get standard environment, filters and checks
> >>>>>> +. ./common/rc
> >>>>>> +. ./common/filter
> >>>>>> +. ./common/renameat2
> >>>>>> +
> >>>>>> +rm -f $seqres.full
> >>>>>> +
> >>>>>> +# real QA test starts here
> >>>>>> +_supported_fs xfs
> >>>>>> +_supported_os Linux
> >>>>>> +# single AG will cause default xfs_repair to fail. This test need a
> >>>>>> +# single AG fs, so ignore the check.
> >>>>>> +_require_scratch_nocheck
> >>>>>> +_requires_renameat2 whiteout
> >>>>>> +
> >>>>>> +filter_enospc() {
> >>>>>> +	sed -e '/^.*No space left on device.*/d'
> >>>>>> +}
> >>>>>> +
> >>>>>> +create_file()
> >>>>>> +{
> >>>>>> +	local target_dir=$1
> >>>>>> +	local files_count=$2
> >>>>>> +
> >>>>>> +	for i in $(seq 1 $files_count); do
> >>>>>> +		echo > $target_dir/f$i >/dev/null 2>&1 | filter_enospc
> >>>>>> +	done
> >>>>>> +}
> >>>>>> +
> >>>>>> +rename_whiteout()
> >>>>>> +{
> >>>>>> +	local target_dir=$1
> >>>>>> +	local files_count=$2
> >>>>>> +
> >>>>>> +	# a long filename could increase the possibility that target_dp
> >>>>>> +	# allocate new blocks(acquire the AGF lock) to store the filename
> >>>>>> +	longnamepre=`$PERL_PROG -e 'print "a"x200;'`
> >>>>>> +
> >>>>>> +	# now try to do rename with RENAME_WHITEOUT flag
> >>>>>> +	for i in $(seq 1 $files_count); do
> >>>>>> +		src/renameat2 -w $SCRATCH_MNT/f$i $target_dir/$longnamepre$i >/dev/null 2>&1
> >>>>>> +	done
> >>>>>> +}
> >>>>>> +
> >>>>>> +_scratch_mkfs_xfs -d agcount=1 >> $seqres.full 2>&1 ||
> >>>>>> +	_fail "mkfs failed"
> >>>>>
> >>>>> This appears to be the only XFS specific bit. Could it be
> >>>>> conditionalized using FSTYP such that this test could go under
> >>>>> tests/generic?
> >>>>>
> >>>> OK, I'll move this test to tests/generic by using FSTYP.
> >>>>
> >>>>>> +_scratch_mount
> >>>>>> +
> >>>>>> +# set the rename and create file counts
> >>>>>> +file_count=50000
> >>>>>> +
> >>>>>> +# create the necessary directory for create and rename operations
> >>>>>> +createdir=$SCRATCH_MNT/createdir
> >>>>>> +mkdir $createdir
> >>>>>> +renamedir=$SCRATCH_MNT/renamedir
> >>>>>> +mkdir $renamedir
> >>>>>> +
> >>>>>> +# create many small files for the rename with RENAME_WHITEOUT
> >>>>>> +create_file $SCRATCH_MNT $file_count
> >>>>>> +
> >>>>>> +# try to create files at the same time to hit the deadlock
> >>>>>> +rename_whiteout $renamedir $file_count &
> >>>>>> +create_file $createdir $file_count &
> >>>>>> +
> >>>>>
> >>>>> When I ran this test I noticed that the rename_whiteout task completed
> >>>>> renaming the 50k files before the create_file task created even 30k of
> >>>>> the 50k files. There's no risk of deadlock once one of these tasks
> >>>>> completes, right? If so, that seems like something that could be fixed
> >>>>> up.
> >>>>>
> >>>>> Beyond that though, the test itself ran for almost 19 minutes on a vm
> >>>>> with the deadlock fix. That seems like overkill to me for a test that's
> >>>>> so narrowly focused on a particular bug that it's unlikely to fail in
> >>>>> the future. If we can't find a way to get this down to a reasonable time
> >>>>> while still reproducing the deadlock, I'm kind of wondering if there's a
> >>>>> better approach to get more rename coverage from existing tests. For
> >>>>> example, could we add this support to fsstress and see if any of the
> >>>>> existing stress tests might trigger the original problem? Even if we
> >>>>> needed to add a new rename/create focused fsstress test, that might at
> >>>>> least be general enough to provide broader coverage.
> >>>>>
> >>>> Yeah, rename_whiteout task run faster than create_file task, so maybe
> >>>> we can set two different files counts for them to reduce the test run
> >>>> time. This test ran for 380s on my vm with the fixed kernel, but we
> >>>> still need to find a way to reduce the run time, like the 19 minutes
> >>>> case. Actually, in most cases, the deadlock happened when the
> >>>> rename_whiteout task completed renaming hundreds of files. 50000
> >>>> is set just because this test take 380s on my vm which is acceptable
> >>>> and the reproduce possibility is near 100%. So maybe we can choose a
> >>>> proper files count to make the test runs faster. Of course, I'll
> >>>> also try to use fsstresss and the TIME_FACTOR if they can help to
> >>>> reduce the run time.
> >>>>  
> >>>
> >>> I think using different file counts as such is too unpredictable across
> >>> different test environments. If we end up with something like the
> >>> current test, I'd rather see explicit logic in the test to terminate the
> >>> workload thread when the rename thread completes. This probably would
> >>> have knocked 2-3 minutes off the slow runtime I reported above.
> >>>
> >>> That aside, I think the fsstress approach is preferable because there is
> >>> at least potential to avoid the need for a new test. The relevant
> >>> questions to me are:
> >>>
> >>> 1.) If you add renameat2 support to fsstress, do any of the existing
> >>> fsstress tests reproduce the original problem?
> >>
> >> Not sure about this, need to do research whether there are existing
> >> fsstress tests can reproduce the problem.	
> > 
> > Right, but this will require some work to fsstress to add renameat2
> > support. To Eryu's earlier point, however, that is probably a useful
> > patch regardless of what approach we take below.
> 
> Yeah, if taking the fsstress approach to reproduce the deadlock we need
> to add renameat2 support for fsstress. Given that the deadlock is a corner
> case and need some special settings for higher reproduce possibility, such
> as the smaller bsize and agcount, the longer rename target filename, and all
> the rename call need have the same target_dp to acquire the AGF lock
> (xfs_dir_createname()), so we may need a separate test with using customized
> parameters(limited to rename(whiteout) and creates). If not, the possibility
> would be lower and also need longer run time.     
> 
> > 
> >>>
> >>> 2.) If not, can fsstress reproduce the problem using customized
> >>> parameters (i.e., limited to rename and creates)? If so, we may still
> >>> need a separate test, but it would be trivial in that it just invokes
> >>> fsstress with particular flags for a period of time.
> >>>
> >>> 3.) If not, then we need to find a way for this test to run quicker. At
> >>> this point, I'm curious how long it takes for this test to reproduce the
> >>> problem (on a broken kernel) on average once the setup portion
> >>> completes. More than a minute or two, for example, or tens of minutes..
> >>> etc.?
> >>>
> >> About five minutes with 50000 files count on a broken kernel to reproduce
> >> the deadlock on my vm, and the most time is preparing 50000 empty files for
> >> the rename operation.
> >>
> > 
> > Ok, so how much time is that outside of the file creation part? You
> > could add some timestamps to the test to figure that out if necessary.
> > How many files were renamed before the deadlock occurred?
> > 
> About one or two minutes that outside of the file creation on my vm. 
> The number of renamed files before the deadlock occurred is not fixed,
> the most common range is from 500 to 10000 files. Of course, this range
> maybe change on different test environments...   
> 

Ok. So FWIW a 10k file count and immediate exit after the renameat task
completes is a less noticeable 3m45s on my low end vm. I'd still prefer
to see an fsstress test if possible, but that might be a reasonable
fallback option.

Brian

> > Brian
> > 
> >> A example for deadlock happened when renaming 2729 files.
> >> call trace: 
> >> root  31829  ... D+ ... /renamedir/aaaaaaaaaaaaaaaaaaa...aaaaaaaaaaaaaaa2729
> >> # cat /proc/31829/stack 
> >> [<0>] xfs_buf_lock+0x34/0xf0 [xfs]
> >> [<0>] xfs_buf_find+0x215/0x6c0 [xfs]
> >> [<0>] xfs_buf_get_map+0x37/0x230 [xfs]
> >> [<0>] xfs_buf_read_map+0x29/0x190 [xfs]
> >> [<0>] xfs_trans_read_buf_map+0x13d/0x520 [xfs]
> >> [<0>] xfs_read_agi+0xa8/0x160 [xfs]
> >> [<0>] xfs_iunlink_remove+0x6f/0x2a0 [xfs]
> >> [<0>] xfs_rename+0x57a/0xae0 [xfs]
> >> [<0>] xfs_vn_rename+0xe4/0x150 [xfs]
> >> [<0>] vfs_rename+0x1f4/0x7b0
> >> [<0>] do_renameat2+0x431/0x4c0
> >> [<0>] __x64_sys_renameat2+0x20/0x30
> >> [<0>] do_syscall_64+0x49/0x120
> >> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>
> >>> Brian
> >>>
> >>>>> Alternatively, what if this test ran a create/rename workload (on a
> >>>>> smaller fileset) for a fixed time of a minute or two and then exited? I
> >>>>> think it would be a reasonable compromise if the test still reproduced
> >>>>> on some smaller frequency, it's just not clear to me how effective such
> >>>>> a test would be without actually trying it. Maybe Eryu has additional
> >>>>> thoughts..
> >>>>>
> >>>>> Brian
> >>>>>
> >>>>>> +wait
> >>>>>> +echo Silence is golden
> >>>>>> +
> >>>>>> +# Failure comes in the form of a deadlock.
> >>>>>> +
> >>>>>> +# success, all done
> >>>>>> +status=0
> >>>>>> +exit
> >>>>>> diff --git a/tests/xfs/512.out b/tests/xfs/512.out
> >>>>>> new file mode 100644
> >>>>>> index 0000000..0aabdef
> >>>>>> --- /dev/null
> >>>>>> +++ b/tests/xfs/512.out
> >>>>>> @@ -0,0 +1,2 @@
> >>>>>> +QA output created by 512
> >>>>>> +Silence is golden
> >>>>>> diff --git a/tests/xfs/group b/tests/xfs/group
> >>>>>> index a7ad300..ed250d6 100644
> >>>>>> --- a/tests/xfs/group
> >>>>>> +++ b/tests/xfs/group
> >>>>>> @@ -509,3 +509,4 @@
> >>>>>>  509 auto ioctl
> >>>>>>  510 auto ioctl quick
> >>>>>>  511 auto quick quota
> >>>>>> +512 auto rename
> >>>>>> -- 
> >>>>>> 1.8.3.1
> >>>>>>
> >>>>>> -- 
> >>>>>> kaixuxia
> >>>>
> >>>> -- 
> >>>> kaixuxia
> >>
> >> -- 
> >> kaixuxia
> 
> -- 
> kaixuxia
diff mbox series

Patch

diff --git a/tests/xfs/512 b/tests/xfs/512
new file mode 100755
index 0000000..a2089f0
--- /dev/null
+++ b/tests/xfs/512
@@ -0,0 +1,96 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2019 Tencent.  All Rights Reserved.
+#
+# FS QA Test 512
+#
+# Test the ABBA deadlock case between the AGI and AGF When performing
+# rename operation with RENAME_WHITEOUT flag.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/renameat2
+
+rm -f $seqres.full
+
+# real QA test starts here
+_supported_fs xfs
+_supported_os Linux
+# single AG will cause default xfs_repair to fail. This test need a
+# single AG fs, so ignore the check.
+_require_scratch_nocheck
+_requires_renameat2 whiteout
+
+filter_enospc() {
+	sed -e '/^.*No space left on device.*/d'
+}
+
+create_file()
+{
+	local target_dir=$1
+	local files_count=$2
+
+	for i in $(seq 1 $files_count); do
+		echo > $target_dir/f$i >/dev/null 2>&1 | filter_enospc
+	done
+}
+
+rename_whiteout()
+{
+	local target_dir=$1
+	local files_count=$2
+
+	# a long filename could increase the possibility that target_dp
+	# allocate new blocks(acquire the AGF lock) to store the filename
+	longnamepre=`$PERL_PROG -e 'print "a"x200;'`
+
+	# now try to do rename with RENAME_WHITEOUT flag
+	for i in $(seq 1 $files_count); do
+		src/renameat2 -w $SCRATCH_MNT/f$i $target_dir/$longnamepre$i >/dev/null 2>&1
+	done
+}
+
+_scratch_mkfs_xfs -d agcount=1 >> $seqres.full 2>&1 ||
+	_fail "mkfs failed"
+_scratch_mount
+
+# set the rename and create file counts
+file_count=50000
+
+# create the necessary directory for create and rename operations
+createdir=$SCRATCH_MNT/createdir
+mkdir $createdir
+renamedir=$SCRATCH_MNT/renamedir
+mkdir $renamedir
+
+# create many small files for the rename with RENAME_WHITEOUT
+create_file $SCRATCH_MNT $file_count
+
+# try to create files at the same time to hit the deadlock
+rename_whiteout $renamedir $file_count &
+create_file $createdir $file_count &
+
+wait
+echo Silence is golden
+
+# Failure comes in the form of a deadlock.
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/512.out b/tests/xfs/512.out
new file mode 100644
index 0000000..0aabdef
--- /dev/null
+++ b/tests/xfs/512.out
@@ -0,0 +1,2 @@ 
+QA output created by 512
+Silence is golden
diff --git a/tests/xfs/group b/tests/xfs/group
index a7ad300..ed250d6 100644
--- a/tests/xfs/group
+++ b/tests/xfs/group
@@ -509,3 +509,4 @@ 
 509 auto ioctl
 510 auto ioctl quick
 511 auto quick quota
+512 auto rename