diff mbox series

xfs: test mkfs.xfs sizing of internal logs that

Message ID Yo03mZ12X1nLGihK@magnolia (mailing list archive)
State New, archived
Headers show
Series xfs: test mkfs.xfs sizing of internal logs that | expand

Commit Message

Darrick J. Wong May 24, 2022, 7:52 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

This is a regression test that exercises the mkfs.xfs code that creates
log sizes that are very close to the AG size when stripe units are in
play and/or when the log is forced to be in AG 0.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/843     |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/843.out |    2 ++
 2 files changed, 58 insertions(+)
 create mode 100755 tests/xfs/843
 create mode 100644 tests/xfs/843.out

Comments

Dave Chinner May 24, 2022, 11:44 p.m. UTC | #1
On Tue, May 24, 2022 at 12:52:57PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This is a regression test that exercises the mkfs.xfs code that creates
> log sizes that are very close to the AG size when stripe units are in
> play and/or when the log is forced to be in AG 0.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/xfs/843     |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/843.out |    2 ++
>  2 files changed, 58 insertions(+)
>  create mode 100755 tests/xfs/843
>  create mode 100644 tests/xfs/843.out
> 
> diff --git a/tests/xfs/843 b/tests/xfs/843
> new file mode 100755
> index 00000000..3384b1aa
> --- /dev/null
> +++ b/tests/xfs/843
> @@ -0,0 +1,56 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 Oracle.  All Rights Reserved.
> +#
> +# FS QA Test 843
> +#
> +# Now that we've increased the default log size calculation, test mkfs with
> +# various stripe units and filesystem sizes to see if we can provoke mkfs into
> +# breaking.
> +#
> +. ./common/preamble
> +_begin_fstest auto mkfs
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -r -f $tmp.* $testfile
> +}

I'd omit this completely.

> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs xfs
> +_require_test
> +
> +testfile=$TEST_DIR/a
> +rm -f $testfile
> +
> +test_format() {
> +	local tag="$1"
> +	shift
> +
> +	echo "$tag" >> $seqres.full
> +	$MKFS_XFS_PROG $@ -d file,name=$testfile &>> $seqres.full
> +	local res=$?
> +	test $res -eq 0 || echo "$tag FAIL $res" | tee -a $seqres.full

What breakage are you trying to provoke? Just the log size
calculation? If so, why do we need to actually write the filesystem
to disk? Won't "-N" still calculate everything and fail if it's
broken or quit with success without needing to write anything to
disk?

> +}
> +
> +# First we try various small filesystems and stripe sizes.
> +for M in `seq 298 302` `seq 490 520`; do
> +	for S in `seq 32 4 64`; do
> +		test_format "M=$M S=$S" -dsu=${S}k,sw=1,size=${M}m
> +	done
> +done
> +
> +# log so large it pushes the root dir into AG 1
> +test_format "log pushes rootdir into AG 1" -d agcount=3200,size=6366g -lagnum=0
> +
> +# log end rounded beyond EOAG due to stripe unit
> +test_format "log end beyond eoag" -d agcount=3200,size=6366g -d su=256k,sw=4
> +
> +echo Silence is golden

Put this at the top where the test is being set up (i.e. where you
define testfile). That tells the reader straight away that no output
is expected on a successful run before they start reading the test
code....

Cheers,

Dave.
Darrick J. Wong May 24, 2022, 11:49 p.m. UTC | #2
On Wed, May 25, 2022 at 09:44:26AM +1000, Dave Chinner wrote:
> On Tue, May 24, 2022 at 12:52:57PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > This is a regression test that exercises the mkfs.xfs code that creates
> > log sizes that are very close to the AG size when stripe units are in
> > play and/or when the log is forced to be in AG 0.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  tests/xfs/843     |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/843.out |    2 ++
> >  2 files changed, 58 insertions(+)
> >  create mode 100755 tests/xfs/843
> >  create mode 100644 tests/xfs/843.out
> > 
> > diff --git a/tests/xfs/843 b/tests/xfs/843
> > new file mode 100755
> > index 00000000..3384b1aa
> > --- /dev/null
> > +++ b/tests/xfs/843
> > @@ -0,0 +1,56 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2022 Oracle.  All Rights Reserved.
> > +#
> > +# FS QA Test 843
> > +#
> > +# Now that we've increased the default log size calculation, test mkfs with
> > +# various stripe units and filesystem sizes to see if we can provoke mkfs into
> > +# breaking.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto mkfs
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	rm -r -f $tmp.* $testfile
> > +}
> 
> I'd omit this completely.
> 
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> > +_supported_fs xfs
> > +_require_test
> > +
> > +testfile=$TEST_DIR/a
> > +rm -f $testfile
> > +
> > +test_format() {
> > +	local tag="$1"
> > +	shift
> > +
> > +	echo "$tag" >> $seqres.full
> > +	$MKFS_XFS_PROG $@ -d file,name=$testfile &>> $seqres.full
> > +	local res=$?
> > +	test $res -eq 0 || echo "$tag FAIL $res" | tee -a $seqres.full
> 
> What breakage are you trying to provoke? Just the log size
> calculation? If so, why do we need to actually write the filesystem
> to disk? Won't "-N" still calculate everything and fail if it's
> broken or quit with success without needing to write anything to
> disk?

It will, but...

> > +}
> > +
> > +# First we try various small filesystems and stripe sizes.
> > +for M in `seq 298 302` `seq 490 520`; do
> > +	for S in `seq 32 4 64`; do
> > +		test_format "M=$M S=$S" -dsu=${S}k,sw=1,size=${M}m
> > +	done
> > +done
> > +
> > +# log so large it pushes the root dir into AG 1
> > +test_format "log pushes rootdir into AG 1" -d agcount=3200,size=6366g -lagnum=0

...this particular check in mkfs only happens after we allocate the root
directory, which an -N invocation doesn't do.

> > +
> > +# log end rounded beyond EOAG due to stripe unit
> > +test_format "log end beyond eoag" -d agcount=3200,size=6366g -d su=256k,sw=4
> > +
> > +echo Silence is golden
> 
> Put this at the top where the test is being set up (i.e. where you
> define testfile). That tells the reader straight away that no output
> is expected on a successful run before they start reading the test
> code....

Hmm, we probably ought to update the ./new template too.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Dave Chinner May 25, 2022, 12:24 a.m. UTC | #3
On Tue, May 24, 2022 at 04:49:30PM -0700, Darrick J. Wong wrote:
> On Wed, May 25, 2022 at 09:44:26AM +1000, Dave Chinner wrote:
> > On Tue, May 24, 2022 at 12:52:57PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > This is a regression test that exercises the mkfs.xfs code that creates
> > > log sizes that are very close to the AG size when stripe units are in
> > > play and/or when the log is forced to be in AG 0.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  tests/xfs/843     |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/xfs/843.out |    2 ++
> > >  2 files changed, 58 insertions(+)
> > >  create mode 100755 tests/xfs/843
> > >  create mode 100644 tests/xfs/843.out
> > > 
> > > diff --git a/tests/xfs/843 b/tests/xfs/843
> > > new file mode 100755
> > > index 00000000..3384b1aa
> > > --- /dev/null
> > > +++ b/tests/xfs/843
> > > @@ -0,0 +1,56 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2022 Oracle.  All Rights Reserved.
> > > +#
> > > +# FS QA Test 843
> > > +#
> > > +# Now that we've increased the default log size calculation, test mkfs with
> > > +# various stripe units and filesystem sizes to see if we can provoke mkfs into
> > > +# breaking.
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest auto mkfs
> > > +
> > > +_cleanup()
> > > +{
> > > +	cd /
> > > +	rm -r -f $tmp.* $testfile
> > > +}
> > 
> > I'd omit this completely.
> > 
> > > +# real QA test starts here
> > > +
> > > +# Modify as appropriate.
> > > +_supported_fs xfs
> > > +_require_test
> > > +
> > > +testfile=$TEST_DIR/a
> > > +rm -f $testfile
> > > +
> > > +test_format() {
> > > +	local tag="$1"
> > > +	shift
> > > +
> > > +	echo "$tag" >> $seqres.full
> > > +	$MKFS_XFS_PROG $@ -d file,name=$testfile &>> $seqres.full
> > > +	local res=$?
> > > +	test $res -eq 0 || echo "$tag FAIL $res" | tee -a $seqres.full
> > 
> > What breakage are you trying to provoke? Just the log size
> > calculation? If so, why do we need to actually write the filesystem
> > to disk? Won't "-N" still calculate everything and fail if it's
> > broken or quit with success without needing to write anything to
> > disk?
> 
> It will, but...
> 
> > > +}
> > > +
> > > +# First we try various small filesystems and stripe sizes.
> > > +for M in `seq 298 302` `seq 490 520`; do
> > > +	for S in `seq 32 4 64`; do
> > > +		test_format "M=$M S=$S" -dsu=${S}k,sw=1,size=${M}m
> > > +	done
> > > +done
> > > +
> > > +# log so large it pushes the root dir into AG 1
> > > +test_format "log pushes rootdir into AG 1" -d agcount=3200,size=6366g -lagnum=0
> 
> ...this particular check in mkfs only happens after we allocate the root
> directory, which an -N invocation doesn't do.

Ok, so for this test can we drop the -N? We don't need to do 30 IOs
and write 64MB logs for every config we test - I think there's ~35 *
8 invocations of test_format in the loop above before we get here...

Also, why do we need a 6.3TB filesystem with 2.1GB AGs and a 2GB log
to trigger this? That means we have to write 2GB to disk, plus
~20,000 write IOs for the AG headers and btree root blocks before we
get to the failure case, yes?

Why not just exercise the failure case with something like this:

# mkfs.xfs -d agcount=2,size=64M -l size=8180b,agnum=0 -d file,name=test.img
meta-data=test.img               isize=512    agcount=2, agsize=8192 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=16384, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=8180, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: root inode created in AG 1, not AG 0
# echo $?
1
#

Otherwise I don't exactly understand what this specific case is
supposed to be testing, so maybe some more explaining is necessary?

Cheers,

Dave.
Darrick J. Wong May 25, 2022, 12:32 a.m. UTC | #4
On Wed, May 25, 2022 at 10:24:13AM +1000, Dave Chinner wrote:
> On Tue, May 24, 2022 at 04:49:30PM -0700, Darrick J. Wong wrote:
> > On Wed, May 25, 2022 at 09:44:26AM +1000, Dave Chinner wrote:
> > > On Tue, May 24, 2022 at 12:52:57PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > This is a regression test that exercises the mkfs.xfs code that creates
> > > > log sizes that are very close to the AG size when stripe units are in
> > > > play and/or when the log is forced to be in AG 0.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > > >  tests/xfs/843     |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  tests/xfs/843.out |    2 ++
> > > >  2 files changed, 58 insertions(+)
> > > >  create mode 100755 tests/xfs/843
> > > >  create mode 100644 tests/xfs/843.out
> > > > 
> > > > diff --git a/tests/xfs/843 b/tests/xfs/843
> > > > new file mode 100755
> > > > index 00000000..3384b1aa
> > > > --- /dev/null
> > > > +++ b/tests/xfs/843
> > > > @@ -0,0 +1,56 @@
> > > > +#! /bin/bash
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +# Copyright (c) 2022 Oracle.  All Rights Reserved.
> > > > +#
> > > > +# FS QA Test 843
> > > > +#
> > > > +# Now that we've increased the default log size calculation, test mkfs with
> > > > +# various stripe units and filesystem sizes to see if we can provoke mkfs into
> > > > +# breaking.
> > > > +#
> > > > +. ./common/preamble
> > > > +_begin_fstest auto mkfs
> > > > +
> > > > +_cleanup()
> > > > +{
> > > > +	cd /
> > > > +	rm -r -f $tmp.* $testfile
> > > > +}
> > > 
> > > I'd omit this completely.
> > > 
> > > > +# real QA test starts here
> > > > +
> > > > +# Modify as appropriate.
> > > > +_supported_fs xfs
> > > > +_require_test
> > > > +
> > > > +testfile=$TEST_DIR/a
> > > > +rm -f $testfile
> > > > +
> > > > +test_format() {
> > > > +	local tag="$1"
> > > > +	shift
> > > > +
> > > > +	echo "$tag" >> $seqres.full
> > > > +	$MKFS_XFS_PROG $@ -d file,name=$testfile &>> $seqres.full
> > > > +	local res=$?
> > > > +	test $res -eq 0 || echo "$tag FAIL $res" | tee -a $seqres.full
> > > 
> > > What breakage are you trying to provoke? Just the log size
> > > calculation? If so, why do we need to actually write the filesystem
> > > to disk? Won't "-N" still calculate everything and fail if it's
> > > broken or quit with success without needing to write anything to
> > > disk?
> > 
> > It will, but...
> > 
> > > > +}
> > > > +
> > > > +# First we try various small filesystems and stripe sizes.
> > > > +for M in `seq 298 302` `seq 490 520`; do
> > > > +	for S in `seq 32 4 64`; do
> > > > +		test_format "M=$M S=$S" -dsu=${S}k,sw=1,size=${M}m
> > > > +	done
> > > > +done
> > > > +
> > > > +# log so large it pushes the root dir into AG 1
> > > > +test_format "log pushes rootdir into AG 1" -d agcount=3200,size=6366g -lagnum=0
> > 
> > ...this particular check in mkfs only happens after we allocate the root
> > directory, which an -N invocation doesn't do.
> 
> Ok, so for this test can we drop the -N? We don't need to do 30 IOs
> and write 64MB logs for every config we test - I think there's ~35 *
> 8 invocations of test_format in the loop above before we get here...
> 
> Also, why do we need a 6.3TB filesystem with 2.1GB AGs and a 2GB log
> to trigger this? That means we have to write 2GB to disk, plus
> ~20,000 write IOs for the AG headers and btree root blocks before we
> get to the failure case, yes?

So long as the test filesystem is itself XFS and a reasonably new
vintage, mkfs.xfs will call FALLOC_FL_ZERO_RANGE to zero the log, which
means we're not writing 2GB to storage with every format.

> Why not just exercise the failure case with something like this:
> 
> # mkfs.xfs -d agcount=2,size=64M -l size=8180b,agnum=0 -d file,name=test.img
> meta-data=test.img               isize=512    agcount=2, agsize=8192 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=1    bigtime=0 inobtcount=0
> data     =                       bsize=4096   blocks=16384, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=8180, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> mkfs.xfs: root inode created in AG 1, not AG 0
> # echo $?
> 1
> #
> 
> Otherwise I don't exactly understand what this specific case is
> supposed to be testing, so maybe some more explaining is necessary?

To be honest, I hadn't thought of that; this test encodes all the weird
scenarios that Eric and I came up with over many hours of bluejeans
chatting to see if we could trip up all the mkfs log sizing code and
related fixes.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Dave Chinner May 25, 2022, 12:59 a.m. UTC | #5
On Tue, May 24, 2022 at 05:32:50PM -0700, Darrick J. Wong wrote:
> On Wed, May 25, 2022 at 10:24:13AM +1000, Dave Chinner wrote:
> > On Tue, May 24, 2022 at 04:49:30PM -0700, Darrick J. Wong wrote:
> > > On Wed, May 25, 2022 at 09:44:26AM +1000, Dave Chinner wrote:
> > > > On Tue, May 24, 2022 at 12:52:57PM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > 
> > > > > This is a regression test that exercises the mkfs.xfs code that creates
> > > > > log sizes that are very close to the AG size when stripe units are in
> > > > > play and/or when the log is forced to be in AG 0.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > > ---
> > > > >  tests/xfs/843     |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  tests/xfs/843.out |    2 ++
> > > > >  2 files changed, 58 insertions(+)
> > > > >  create mode 100755 tests/xfs/843
> > > > >  create mode 100644 tests/xfs/843.out
> > > > > 
> > > > > diff --git a/tests/xfs/843 b/tests/xfs/843
> > > > > new file mode 100755
> > > > > index 00000000..3384b1aa
> > > > > --- /dev/null
> > > > > +++ b/tests/xfs/843
> > > > > @@ -0,0 +1,56 @@
> > > > > +#! /bin/bash
> > > > > +# SPDX-License-Identifier: GPL-2.0
> > > > > +# Copyright (c) 2022 Oracle.  All Rights Reserved.
> > > > > +#
> > > > > +# FS QA Test 843
> > > > > +#
> > > > > +# Now that we've increased the default log size calculation, test mkfs with
> > > > > +# various stripe units and filesystem sizes to see if we can provoke mkfs into
> > > > > +# breaking.
> > > > > +#
> > > > > +. ./common/preamble
> > > > > +_begin_fstest auto mkfs
> > > > > +
> > > > > +_cleanup()
> > > > > +{
> > > > > +	cd /
> > > > > +	rm -r -f $tmp.* $testfile
> > > > > +}
> > > > 
> > > > I'd omit this completely.
> > > > 
> > > > > +# real QA test starts here
> > > > > +
> > > > > +# Modify as appropriate.
> > > > > +_supported_fs xfs
> > > > > +_require_test
> > > > > +
> > > > > +testfile=$TEST_DIR/a
> > > > > +rm -f $testfile
> > > > > +
> > > > > +test_format() {
> > > > > +	local tag="$1"
> > > > > +	shift
> > > > > +
> > > > > +	echo "$tag" >> $seqres.full
> > > > > +	$MKFS_XFS_PROG $@ -d file,name=$testfile &>> $seqres.full
> > > > > +	local res=$?
> > > > > +	test $res -eq 0 || echo "$tag FAIL $res" | tee -a $seqres.full
> > > > 
> > > > What breakage are you trying to provoke? Just the log size
> > > > calculation? If so, why do we need to actually write the filesystem
> > > > to disk? Won't "-N" still calculate everything and fail if it's
> > > > broken or quit with success without needing to write anything to
> > > > disk?
> > > 
> > > It will, but...
> > > 
> > > > > +}
> > > > > +
> > > > > +# First we try various small filesystems and stripe sizes.
> > > > > +for M in `seq 298 302` `seq 490 520`; do
> > > > > +	for S in `seq 32 4 64`; do
> > > > > +		test_format "M=$M S=$S" -dsu=${S}k,sw=1,size=${M}m
> > > > > +	done
> > > > > +done
> > > > > +
> > > > > +# log so large it pushes the root dir into AG 1
> > > > > +test_format "log pushes rootdir into AG 1" -d agcount=3200,size=6366g -lagnum=0
> > > 
> > > ...this particular check in mkfs only happens after we allocate the root
> > > directory, which an -N invocation doesn't do.
> > 
> > Ok, so for this test can we drop the -N? We don't need to do 30 IOs
> > and write 64MB logs for every config we test - I think there's ~35 *
> > 8 invocations of test_format in the loop above before we get here...
> > 
> > Also, why do we need a 6.3TB filesystem with 2.1GB AGs and a 2GB log
> > to trigger this? That means we have to write 2GB to disk, plus
> > ~20,000 write IOs for the AG headers and btree root blocks before we
> > get to the failure case, yes?
> 
> So long as the test filesystem is itself XFS and a reasonably new
> vintage, mkfs.xfs will call FALLOC_FL_ZERO_RANGE to zero the log, which
> means we're not writing 2GB to storage with every format.

Oh, cycle = XLOG_INIT_CYCLE short-cuts stamping the head/tail LSNs
into every sector of the log. Ok, so we don't have to write the log
in this case, just issue 20,000 IOs to write out the AG headers...

> > Why not just exercise the failure case with something like this:
> > 
> > # mkfs.xfs -d agcount=2,size=64M -l size=8180b,agnum=0 -d file,name=test.img
> > meta-data=test.img               isize=512    agcount=2, agsize=8192 blks
> >          =                       sectsz=512   attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=1, rmapbt=0
> >          =                       reflink=1    bigtime=0 inobtcount=0
> > data     =                       bsize=4096   blocks=16384, imaxpct=25
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> > log      =internal log           bsize=4096   blocks=8180, version=2
> >          =                       sectsz=512   sunit=0 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > mkfs.xfs: root inode created in AG 1, not AG 0
> > # echo $?
> > 1
> > #
> > 
> > Otherwise I don't exactly understand what this specific case is
> > supposed to be testing, so maybe some more explaining is necessary?
> 
> To be honest, I hadn't thought of that; this test encodes all the weird
> scenarios that Eric and I came up with over many hours of bluejeans
> chatting to see if we could trip up all the mkfs log sizing code and
> related fixes.

*nod*. I figured it was probably either an oversight or me being
incredibly stupid again. EIther way, it looks like the only way we
can trip over this is incredibly contrived mkfs configs, so I'd
argue that the smaller, faster option is just as good as the bigger,
slower one...

Cheers,

Dave.
Darrick J. Wong May 25, 2022, 11:21 p.m. UTC | #6
On Wed, May 25, 2022 at 10:24:13AM +1000, Dave Chinner wrote:
> On Tue, May 24, 2022 at 04:49:30PM -0700, Darrick J. Wong wrote:
> > On Wed, May 25, 2022 at 09:44:26AM +1000, Dave Chinner wrote:
> > > On Tue, May 24, 2022 at 12:52:57PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > This is a regression test that exercises the mkfs.xfs code that creates
> > > > log sizes that are very close to the AG size when stripe units are in
> > > > play and/or when the log is forced to be in AG 0.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > > >  tests/xfs/843     |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  tests/xfs/843.out |    2 ++
> > > >  2 files changed, 58 insertions(+)
> > > >  create mode 100755 tests/xfs/843
> > > >  create mode 100644 tests/xfs/843.out
> > > > 
> > > > diff --git a/tests/xfs/843 b/tests/xfs/843
> > > > new file mode 100755
> > > > index 00000000..3384b1aa
> > > > --- /dev/null
> > > > +++ b/tests/xfs/843
> > > > @@ -0,0 +1,56 @@
> > > > +#! /bin/bash
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +# Copyright (c) 2022 Oracle.  All Rights Reserved.
> > > > +#
> > > > +# FS QA Test 843
> > > > +#
> > > > +# Now that we've increased the default log size calculation, test mkfs with
> > > > +# various stripe units and filesystem sizes to see if we can provoke mkfs into
> > > > +# breaking.
> > > > +#
> > > > +. ./common/preamble
> > > > +_begin_fstest auto mkfs
> > > > +
> > > > +_cleanup()
> > > > +{
> > > > +	cd /
> > > > +	rm -r -f $tmp.* $testfile
> > > > +}
> > > 
> > > I'd omit this completely.
> > > 
> > > > +# real QA test starts here
> > > > +
> > > > +# Modify as appropriate.
> > > > +_supported_fs xfs
> > > > +_require_test
> > > > +
> > > > +testfile=$TEST_DIR/a
> > > > +rm -f $testfile
> > > > +
> > > > +test_format() {
> > > > +	local tag="$1"
> > > > +	shift
> > > > +
> > > > +	echo "$tag" >> $seqres.full
> > > > +	$MKFS_XFS_PROG $@ -d file,name=$testfile &>> $seqres.full
> > > > +	local res=$?
> > > > +	test $res -eq 0 || echo "$tag FAIL $res" | tee -a $seqres.full
> > > 
> > > What breakage are you trying to provoke? Just the log size
> > > calculation? If so, why do we need to actually write the filesystem
> > > to disk? Won't "-N" still calculate everything and fail if it's
> > > broken or quit with success without needing to write anything to
> > > disk?
> > 
> > It will, but...
> > 
> > > > +}
> > > > +
> > > > +# First we try various small filesystems and stripe sizes.
> > > > +for M in `seq 298 302` `seq 490 520`; do
> > > > +	for S in `seq 32 4 64`; do
> > > > +		test_format "M=$M S=$S" -dsu=${S}k,sw=1,size=${M}m
> > > > +	done
> > > > +done
> > > > +
> > > > +# log so large it pushes the root dir into AG 1
> > > > +test_format "log pushes rootdir into AG 1" -d agcount=3200,size=6366g -lagnum=0
> > 
> > ...this particular check in mkfs only happens after we allocate the root
> > directory, which an -N invocation doesn't do.
> 
> Ok, so for this test can we drop the -N? We don't need to do 30 IOs
> and write 64MB logs for every config we test - I think there's ~35 *
> 8 invocations of test_format in the loop above before we get here...
> 
> Also, why do we need a 6.3TB filesystem with 2.1GB AGs and a 2GB log
> to trigger this? That means we have to write 2GB to disk, plus
> ~20,000 write IOs for the AG headers and btree root blocks before we
> get to the failure case, yes?
> 
> Why not just exercise the failure case with something like this:
> 
> # mkfs.xfs -d agcount=2,size=64M -l size=8180b,agnum=0 -d file,name=test.img
> meta-data=test.img               isize=512    agcount=2, agsize=8192 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=1    bigtime=0 inobtcount=0
> data     =                       bsize=4096   blocks=16384, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=8180, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> mkfs.xfs: root inode created in AG 1, not AG 0

Welll... a better reason for why this test can't do that -- one of my
fixes also made mkfs reject -l size=XXX when XXX is not possible.

That said... -d agcount=3200,size=6366g -lagnum=0 -N seems to work?

--D

> # echo $?
> 1
> #
> 
> Otherwise I don't exactly understand what this specific case is
> supposed to be testing, so maybe some more explaining is necessary?
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Dave Chinner May 26, 2022, 12:11 a.m. UTC | #7
On Wed, May 25, 2022 at 04:21:36PM -0700, Darrick J. Wong wrote:
> On Wed, May 25, 2022 at 10:24:13AM +1000, Dave Chinner wrote:
> > On Tue, May 24, 2022 at 04:49:30PM -0700, Darrick J. Wong wrote:
> > > On Wed, May 25, 2022 at 09:44:26AM +1000, Dave Chinner wrote:
> > > > On Tue, May 24, 2022 at 12:52:57PM -0700, Darrick J. Wong wrote:
> > > > > +# First we try various small filesystems and stripe sizes.
> > > > > +for M in `seq 298 302` `seq 490 520`; do
> > > > > +	for S in `seq 32 4 64`; do
> > > > > +		test_format "M=$M S=$S" -dsu=${S}k,sw=1,size=${M}m
> > > > > +	done
> > > > > +done
> > > > > +
> > > > > +# log so large it pushes the root dir into AG 1
> > > > > +test_format "log pushes rootdir into AG 1" -d agcount=3200,size=6366g -lagnum=0
> > > 
> > > ...this particular check in mkfs only happens after we allocate the root
> > > directory, which an -N invocation doesn't do.
> > 
> > Ok, so for this test can we drop the -N? We don't need to do 30 IOs
> > and write 64MB logs for every config we test - I think there's ~35 *
> > 8 invocations of test_format in the loop above before we get here...
> > 
> > Also, why do we need a 6.3TB filesystem with 2.1GB AGs and a 2GB log
> > to trigger this? That means we have to write 2GB to disk, plus
> > ~20,000 write IOs for the AG headers and btree root blocks before we
> > get to the failure case, yes?
> > 
> > Why not just exercise the failure case with something like this:
> > 
> > # mkfs.xfs -d agcount=2,size=64M -l size=8180b,agnum=0 -d file,name=test.img
> > meta-data=test.img               isize=512    agcount=2, agsize=8192 blks
> >          =                       sectsz=512   attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=1, rmapbt=0
> >          =                       reflink=1    bigtime=0 inobtcount=0
> > data     =                       bsize=4096   blocks=16384, imaxpct=25
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> > log      =internal log           bsize=4096   blocks=8180, version=2
> >          =                       sectsz=512   sunit=0 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > mkfs.xfs: root inode created in AG 1, not AG 0
> 
> Welll... a better reason for why this test can't do that -- one of my
> fixes also made mkfs reject -l size=XXX when XXX is not possible.

So you make other changes to mkfs that aren't yet upstream that
check whether the log causes the root inode to change position?
I mean, '-l size=8180b' results in a log that fits perfectly fine in
an empty AG, and so you must have added more checks to detect this
moves the root inode somehow?

> That said... -d agcount=3200,size=6366g -lagnum=0 -N seems to work?

On a new mkfs binary with those checks that you mention above?  If
so, it seems to me that those new checks trigger for this case, too,
before we write anything?

IOWs, this won't fail on older mkfs.xfs binaries that don't have
these new log size checks because we never get around to allocating
the root inode and checking it's location, right?  Yet what we
really want here is for it to fail on an old mkfs binary?

Cheers,

Dave.
Darrick J. Wong May 26, 2022, 1:35 a.m. UTC | #8
On Thu, May 26, 2022 at 10:11:49AM +1000, Dave Chinner wrote:
> On Wed, May 25, 2022 at 04:21:36PM -0700, Darrick J. Wong wrote:
> > On Wed, May 25, 2022 at 10:24:13AM +1000, Dave Chinner wrote:
> > > On Tue, May 24, 2022 at 04:49:30PM -0700, Darrick J. Wong wrote:
> > > > On Wed, May 25, 2022 at 09:44:26AM +1000, Dave Chinner wrote:
> > > > > On Tue, May 24, 2022 at 12:52:57PM -0700, Darrick J. Wong wrote:
> > > > > > +# First we try various small filesystems and stripe sizes.
> > > > > > +for M in `seq 298 302` `seq 490 520`; do
> > > > > > +	for S in `seq 32 4 64`; do
> > > > > > +		test_format "M=$M S=$S" -dsu=${S}k,sw=1,size=${M}m
> > > > > > +	done
> > > > > > +done
> > > > > > +
> > > > > > +# log so large it pushes the root dir into AG 1
> > > > > > +test_format "log pushes rootdir into AG 1" -d agcount=3200,size=6366g -lagnum=0
> > > > 
> > > > ...this particular check in mkfs only happens after we allocate the root
> > > > directory, which an -N invocation doesn't do.
> > > 
> > > Ok, so for this test can we drop the -N? We don't need to do 30 IOs
> > > and write 64MB logs for every config we test - I think there's ~35 *
> > > 8 invocations of test_format in the loop above before we get here...
> > > 
> > > Also, why do we need a 6.3TB filesystem with 2.1GB AGs and a 2GB log
> > > to trigger this? That means we have to write 2GB to disk, plus
> > > ~20,000 write IOs for the AG headers and btree root blocks before we
> > > get to the failure case, yes?
> > > 
> > > Why not just exercise the failure case with something like this:
> > > 
> > > # mkfs.xfs -d agcount=2,size=64M -l size=8180b,agnum=0 -d file,name=test.img
> > > meta-data=test.img               isize=512    agcount=2, agsize=8192 blks
> > >          =                       sectsz=512   attr=2, projid32bit=1
> > >          =                       crc=1        finobt=1, sparse=1, rmapbt=0
> > >          =                       reflink=1    bigtime=0 inobtcount=0
> > > data     =                       bsize=4096   blocks=16384, imaxpct=25
> > >          =                       sunit=0      swidth=0 blks
> > > naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> > > log      =internal log           bsize=4096   blocks=8180, version=2
> > >          =                       sectsz=512   sunit=0 blks, lazy-count=1
> > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > mkfs.xfs: root inode created in AG 1, not AG 0
> > 
> > Welll... a better reason for why this test can't do that -- one of my
> > fixes also made mkfs reject -l size=XXX when XXX is not possible.
> 
> So you make other changes to mkfs that aren't yet upstream that
> check whether the log causes the root inode to change position?

It's in for-next, is that not good enough?

> I mean, '-l size=8180b' results in a log that fits perfectly fine in
> an empty AG, and so you must have added more checks to detect this
> moves the root inode somehow?
> 
> > That said... -d agcount=3200,size=6366g -lagnum=0 -N seems to work?
> 
> On a new mkfs binary with those checks that you mention above?  If
> so, it seems to me that those new checks trigger for this case, too,
> before we write anything?
>
> IOWs, this won't fail on older mkfs.xfs binaries that don't have
> these new log size checks because we never get around to allocating
> the root inode and checking it's location, right?  Yet what we
> really want here is for it to fail on an old mkfs binary?

Yeah, something like that.  It's been so long since Eric and I were
actively poking on this that I don't remember so well anymore.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Dave Chinner May 26, 2022, 2:27 a.m. UTC | #9
On Wed, May 25, 2022 at 06:35:20PM -0700, Darrick J. Wong wrote:
> On Thu, May 26, 2022 at 10:11:49AM +1000, Dave Chinner wrote:
> > On Wed, May 25, 2022 at 04:21:36PM -0700, Darrick J. Wong wrote:
> > > On Wed, May 25, 2022 at 10:24:13AM +1000, Dave Chinner wrote:
> > > > On Tue, May 24, 2022 at 04:49:30PM -0700, Darrick J. Wong wrote:
> > > > > On Wed, May 25, 2022 at 09:44:26AM +1000, Dave Chinner wrote:
> > > > > > On Tue, May 24, 2022 at 12:52:57PM -0700, Darrick J. Wong wrote:
> > > > > > > +# First we try various small filesystems and stripe sizes.
> > > > > > > +for M in `seq 298 302` `seq 490 520`; do
> > > > > > > +	for S in `seq 32 4 64`; do
> > > > > > > +		test_format "M=$M S=$S" -dsu=${S}k,sw=1,size=${M}m
> > > > > > > +	done
> > > > > > > +done
> > > > > > > +
> > > > > > > +# log so large it pushes the root dir into AG 1
> > > > > > > +test_format "log pushes rootdir into AG 1" -d agcount=3200,size=6366g -lagnum=0
> > > > > 
> > > > > ...this particular check in mkfs only happens after we allocate the root
> > > > > directory, which an -N invocation doesn't do.
> > > > 
> > > > Ok, so for this test can we drop the -N? We don't need to do 30 IOs
> > > > and write 64MB logs for every config we test - I think there's ~35 *
> > > > 8 invocations of test_format in the loop above before we get here...
> > > > 
> > > > Also, why do we need a 6.3TB filesystem with 2.1GB AGs and a 2GB log
> > > > to trigger this? That means we have to write 2GB to disk, plus
> > > > ~20,000 write IOs for the AG headers and btree root blocks before we
> > > > get to the failure case, yes?
> > > > 
> > > > Why not just exercise the failure case with something like this:
> > > > 
> > > > # mkfs.xfs -d agcount=2,size=64M -l size=8180b,agnum=0 -d file,name=test.img
> > > > meta-data=test.img               isize=512    agcount=2, agsize=8192 blks
> > > >          =                       sectsz=512   attr=2, projid32bit=1
> > > >          =                       crc=1        finobt=1, sparse=1, rmapbt=0
> > > >          =                       reflink=1    bigtime=0 inobtcount=0
> > > > data     =                       bsize=4096   blocks=16384, imaxpct=25
> > > >          =                       sunit=0      swidth=0 blks
> > > > naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> > > > log      =internal log           bsize=4096   blocks=8180, version=2
> > > >          =                       sectsz=512   sunit=0 blks, lazy-count=1
> > > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > > mkfs.xfs: root inode created in AG 1, not AG 0
> > > 
> > > Welll... a better reason for why this test can't do that -- one of my
> > > fixes also made mkfs reject -l size=XXX when XXX is not possible.
> > 
> > So you make other changes to mkfs that aren't yet upstream that
> > check whether the log causes the root inode to change position?
> 
> It's in for-next, is that not good enough?

Sorry, I didn't realise Eric had pulled it in. I'm not keep a close
eye on exactly what non-libxfs-sync stuff going into xfsprogs right
now - I've had to filter that stuff out to concentrate on the kernel
cycle stuff...

As it is, it's fine however we reject this case, isn't? it doesn't
have to be the root inode error message that causes it to abort, as
long as we get a non-zero exit value. Hence the above method would
fail with both old and new mkfs binaries, even if they are detecting
the issue differently.

> > I mean, '-l size=8180b' results in a log that fits perfectly fine in
> > an empty AG, and so you must have added more checks to detect this
> > moves the root inode somehow?
> > 
> > > That said... -d agcount=3200,size=6366g -lagnum=0 -N seems to work?
> > 
> > On a new mkfs binary with those checks that you mention above?  If
> > so, it seems to me that those new checks trigger for this case, too,
> > before we write anything?
> >
> > IOWs, this won't fail on older mkfs.xfs binaries that don't have
> > these new log size checks because we never get around to allocating
> > the root inode and checking it's location, right?  Yet what we
> > really want here is for it to fail on an old mkfs binary?
> 
> Yeah, something like that.  It's been so long since Eric and I were
> actively poking on this that I don't remember so well anymore.

Worth checking, because ISTM that if it doesn't fail on old mkfs,
then we'd still the non "-N" version or the targetted log size
variant like above. Of course, I could still be misunderstanding
what you're trying to exercise here...

Cheers,

Dave.
diff mbox series

Patch

diff --git a/tests/xfs/843 b/tests/xfs/843
new file mode 100755
index 00000000..3384b1aa
--- /dev/null
+++ b/tests/xfs/843
@@ -0,0 +1,56 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle.  All Rights Reserved.
+#
+# FS QA Test 843
+#
+# Now that we've increased the default log size calculation, test mkfs with
+# various stripe units and filesystem sizes to see if we can provoke mkfs into
+# breaking.
+#
+. ./common/preamble
+_begin_fstest auto mkfs
+
+_cleanup()
+{
+	cd /
+	rm -r -f $tmp.* $testfile
+}
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs xfs
+_require_test
+
+testfile=$TEST_DIR/a
+rm -f $testfile
+
+test_format() {
+	local tag="$1"
+	shift
+
+	echo "$tag" >> $seqres.full
+	$MKFS_XFS_PROG $@ -d file,name=$testfile &>> $seqres.full
+	local res=$?
+	test $res -eq 0 || echo "$tag FAIL $res" | tee -a $seqres.full
+}
+
+# First we try various small filesystems and stripe sizes.
+for M in `seq 298 302` `seq 490 520`; do
+	for S in `seq 32 4 64`; do
+		test_format "M=$M S=$S" -dsu=${S}k,sw=1,size=${M}m
+	done
+done
+
+# log so large it pushes the root dir into AG 1
+test_format "log pushes rootdir into AG 1" -d agcount=3200,size=6366g -lagnum=0
+
+# log end rounded beyond EOAG due to stripe unit
+test_format "log end beyond eoag" -d agcount=3200,size=6366g -d su=256k,sw=4
+
+echo Silence is golden
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/843.out b/tests/xfs/843.out
new file mode 100644
index 00000000..87c13504
--- /dev/null
+++ b/tests/xfs/843.out
@@ -0,0 +1,2 @@ 
+QA output created by 843
+Silence is golden