common/xfs: wipe the XFS superblock of each AGs

Message ID 20190919150024.8346-1-zlang@redhat.com (mailing list archive)
State Superseded
Series common/xfs: wipe the XFS superblock of each AGs

Commit Message

Zorro Lang Sept. 19, 2019, 3 p.m. UTC
xfs/030 always fails since d0e484ac699f ("check: wipe scratch devices
between tests") was merged.

xfs/030 does a sized (100m) mkfs. Before the above commit was merged,
when mkfs.xfs detected an old primary superblock it wrote zeroes to
all of the old superblocks before formatting the new filesystem. That
no longer happens now that the first superblock is wiped beforehand.

That means that if we make a smaller XFS after wipefs, the *old*
superblocks created by the previous mkfs.xfs are left on disk. Then,
when xfs_repair can't find the first SB, it goes looking for other
superblocks and finds those *old* SBs first. Once it finds them,
everything goes wrong.

So get the default XFS AG geometry and try to erase all of the
superblocks. Thanks to Darrick J. Wong for helping to analyze this
issue.

Signed-off-by: Zorro Lang <zlang@redhat.com>
---
 common/rc  |  4 ++++
 common/xfs | 23 +++++++++++++++++++++++
 2 files changed, 27 insertions(+)

Comments

Darrick J. Wong Sept. 19, 2019, 4:02 p.m. UTC | #1
On Thu, Sep 19, 2019 at 11:00:24PM +0800, Zorro Lang wrote:
> xfs/030 always fails after d0e484ac699f ("check: wipe scratch devices
> between tests") get merged.
> 
> Due to xfs/030 does a sized(100m) mkfs. Before we merge above commit,
> mkfs.xfs detects an old primary superblock, it will write zeroes to
> all superblocks before formatting the new filesystem. But this won't
> be done if we wipe the first superblock(by merging above commit).
> 
> That means if we make a (smaller) sized xfs after wipefs, those *old*
> superblocks which created by last time mkfs.xfs will be left on disk.

One thing missing from this patch -- if the test formatted the scratch
device with non-default geometry, the backup superblocks from that
filesystem will not be erased.  Going back to my example from the email
thread, if the scratch disk has:

  SB0 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
      SB'1 [1G space] SB'2 [1G space] SB'3 [1G space]

Where SB[0-5] are the ones written by xfs/030 and SB'[1-3] were written
by a previous test that did the default scratch device mkfs, then this
patch will wipe out SB'[1-3] and SB0:

  000 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
      0000 [1G space] 0000 [1G space] 0000 [1G space]

But that still leaves SB[1-5] which xfs_repair could stumble over later.
For example, if the next test to be run formats a filesystem with 24MB
AGs (instead of 16) and zaps the superblock, then repair will eventually
try a linear scan looking for superblocks and find the ones from the
16MB filesystem first.
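
To make the overlap concrete, here is a minimal sketch (not part of the
patch) that lists where each AG's superblock starts for a given geometry
and reports which offsets from an older geometry a wipe at the newer
geometry's offsets would miss. The function names sb_offsets and
stale_sb_offsets are hypothetical, and the 4096-byte block size used in
the example below is an assumption:

```shell
# Print the byte offset of each AG's superblock, one per line.
# Arguments: agcount, agsize (in filesystem blocks), block size (bytes).
sb_offsets()
{
	local agcount="$1" agsize="$2" dbsize="$3" i=0

	while [ "$i" -lt "$agcount" ]; do
		echo $((i * agsize * dbsize))
		i=$((i + 1))
	done
}

# Print the old-geometry superblock offsets that a wipe at the
# new-geometry offsets would leave untouched.
# Arguments: old agcount/agsize/dbsize, then new agcount/agsize/dbsize.
stale_sb_offsets()
{
	local new off

	new=$(sb_offsets "$4" "$5" "$6")
	for off in $(sb_offsets "$1" "$2" "$3"); do
		echo "$new" | grep -qxF "$off" || echo "$off"
	done
}
```

For example, "stale_sb_offsets 6 4096 4096 4 6144 4096" compares six
16MB AGs with four 24MB AGs and prints 16777216, 33554432, 67108864 and
83886080. Only the superblocks at 0 and at 48MB coincide, so the other
four old superblocks would survive.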

There isn't a sequence of tests that do this, but so long as we're
fixing this we might as well zap as much as we can.  So I propose adding
to try_wipe_scratch_xfs() the following:

	dbsize=
	_scratch_xfs_db -c 'sb 0' -c 'p blocksize agblocks agcount' 2>&1 | \
		sed -e 's/ = /=/g' -e 's/blocksize/dbsize/g' \
		    -e 's/agblocks/agsize/g' > $tmp.mkfs
	. $tmp.mkfs

and then repeat the for loop.  If there isn't a filesystem then
$tmp.mkfs will be an empty file and the loop won't run.
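
As a rough illustration of the sed step above, the sketch below feeds
sample "field = value" lines through the same substitutions and
evaluates the result. The sample values stand in for real xfs_db output,
sb_fields_to_vars is a hypothetical name, and the grep filter is an
extra precaution that is not part of the proposal:

```shell
# Turn "field = value" lines into shell assignments, renaming the
# fields to the variable names used by the wipe loop.  The trailing
# grep (an addition for this sketch) drops anything that is not a
# plain name=number line before it gets evaluated.
sb_fields_to_vars()
{
	sed -e 's/ = /=/g' -e 's/blocksize/dbsize/g' \
	    -e 's/agblocks/agsize/g' | grep -E '^[a-z]+=[0-9]+$'
}

# Sample input standing in for the output of:
#   _scratch_xfs_db -c 'sb 0' -c 'p blocksize agblocks agcount'
sample_output='blocksize = 4096
agblocks = 4096
agcount = 6'

dbsize=
eval "$(echo "$sample_output" | sb_fields_to_vars)"
echo "dbsize=$dbsize agsize=$agsize agcount=$agcount"
```

With no input lines the function prints nothing, so dbsize stays empty
and the guarded loop is skipped, which matches the behaviour described
above.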

> Then when we do xfs_repair, if xfs_repair can't find the first SB, it
> will go to find those *old* SB at first. When it finds them,
> everyting goes wrong.
> 
> So I try to get XFS AG geometry(by default) and then try to erase all
> superblocks. Thanks Darrick J. Wong helped to analyze this issue.
> 
> Signed-off-by: Zorro Lang <zlang@redhat.com>
> ---
>  common/rc  |  4 ++++
>  common/xfs | 23 +++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/common/rc b/common/rc
> index 66c7fd4d..fe13f659 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4048,6 +4048,10 @@ _try_wipe_scratch_devs()
>  	for dev in $SCRATCH_DEV_POOL $SCRATCH_DEV $SCRATCH_LOGDEV $SCRATCH_RTDEV; do
>  		test -b $dev && $WIPEFS_PROG -a $dev
>  	done
> +
> +	if [ "$FSTYP" = "xfs" ];then
> +		try_wipe_scratch_xfs
> +	fi

We probably ought to delegate all wiping to try_wipe_scratch_xfs, i.e.:

	test -b $dev || continue
	case "$FSTYP" in
	"xfs")
		_try_wipe_scratch_xfs
		;;
	*)
		$WIPEFS_PROG -a $dev
		;;
	esac

and add the WIPEFS_PROG call to _try_wipe_scratch_xfs.

>  }
>  
>  # Only run this on xfs if xfs_scrub is available and has the unicode checker
> diff --git a/common/xfs b/common/xfs
> index 1bce3c18..34516f82 100644
> --- a/common/xfs
> +++ b/common/xfs
> @@ -884,3 +884,26 @@ _xfs_mount_agcount()
>  {
>  	$XFS_INFO_PROG "$1" | grep agcount= | sed -e 's/^.*agcount=\([0-9]*\),.*$/\1/g'
>  }
> +
> +# wipe the superblock of each XFS AGs
> +try_wipe_scratch_xfs()

Common helper functions should start with a '_'

> +{
> +	local tmp=`mktemp -u`
> +
> +	_scratch_mkfs_xfs -N 2>/dev/null | perl -ne '
> +		if (/^meta-data=.*\s+agcount=(\d+), agsize=(\d+) blks/) {
> +			print STDOUT "agcount=$1\nagsize=$2\n";
> +		}
> +		if (/^data\s+=\s+bsize=(\d+)\s/) {
> +			print STDOUT "dbsize=$1\n";
> +		}' > $tmp.mkfs
> +
> +	. $tmp.mkfs
> +	if [ -n "$agcount" -a -n "$agsize" -a -n "$dbsize" ];then
> +		for ((i = 0; i < agcount; i++)); do
> +			$XFS_IO_PROG -c "pwrite $((i * dbsize * agsize)) $dbsize" \
> +				$SCRATCH_DEV >/dev/null;
> +		done
> +       fi
> +       rm -f $tmp.mkfs

Add code as discussed above.

--D

> +}
> -- 
> 2.20.1
>
Zorro Lang Sept. 19, 2019, 5:27 p.m. UTC | #2
On Thu, Sep 19, 2019 at 09:02:06AM -0700, Darrick J. Wong wrote:
> On Thu, Sep 19, 2019 at 11:00:24PM +0800, Zorro Lang wrote:
> > xfs/030 always fails after d0e484ac699f ("check: wipe scratch devices
> > between tests") get merged.
> > 
> > Due to xfs/030 does a sized(100m) mkfs. Before we merge above commit,
> > mkfs.xfs detects an old primary superblock, it will write zeroes to
> > all superblocks before formatting the new filesystem. But this won't
> > be done if we wipe the first superblock(by merging above commit).
> > 
> > That means if we make a (smaller) sized xfs after wipefs, those *old*
> > superblocks which created by last time mkfs.xfs will be left on disk.
> 
> One thing missing from this patch -- if the test formatted the scratch
> device with non-default geometry, the backup superblocks from that

Makes sense, I didn't think about non-default geometry.

> filesystem will not be erased.  Going back to my example from the email
> thread, if the scratch disk has:
> 
>   SB0 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
>       SB'1 [1G space] SB'2 [1G space] SB'3 [1G space]
> 
> Where SB[0-5] are the ones written by xfs/030 and SB'[1-3] were written
> by a previous test that did the default scratch device mkfs, then this
> patch will wipe out SB'[1-3] and SB0:
> 
>   000 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
>       0000 [1G space] 0000 [1G space] 0000 [1G space]
> 
> But that still leaves SB[1-5] which xfs_repair could stumble over later.
> For example, if the next test to be run formats a filesystem with 24MB
> AGs (instead of 16) and zaps the superblock, then repair will eventually
> try a linear scan looking for superblocks and find the ones from the
> 16MB filesystem first.
> 
> There isn't a sequence of tests that do this, but so long as we're
> fixing this we might as well zap as much as we can.  So I propose adding
> to try_wipe_scratch_xfs() the following:
> 
> 	dbsize=
> 	_scratch_xfs_db -c 'sb 0' -c 'p blocksize agblocks agcount' 2>&1 | \
> 		sed -e 's/ = /=/g' -e 's/blocksize/dbsize/g' \
> 		    -e 's/agblocks/agsize/g' > $tmp.mkfs
> 	. $tmp.mkfs
> 
> and then repeat the for loop.  If there isn't a filesystem then
> $tmp.mkfs will be an empty file and the loop won't run.

Sure, although I don't know why we must change the variable's name :)

> 
> > Then when we do xfs_repair, if xfs_repair can't find the first SB, it
> > will go to find those *old* SB at first. When it finds them,
> > everyting goes wrong.
> > 
> > So I try to get XFS AG geometry(by default) and then try to erase all
> > superblocks. Thanks Darrick J. Wong helped to analyze this issue.
> > 
> > Signed-off-by: Zorro Lang <zlang@redhat.com>
> > ---
> >  common/rc  |  4 ++++
> >  common/xfs | 23 +++++++++++++++++++++++
> >  2 files changed, 27 insertions(+)
> > 
> > diff --git a/common/rc b/common/rc
> > index 66c7fd4d..fe13f659 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -4048,6 +4048,10 @@ _try_wipe_scratch_devs()
> >  	for dev in $SCRATCH_DEV_POOL $SCRATCH_DEV $SCRATCH_LOGDEV $SCRATCH_RTDEV; do
> >  		test -b $dev && $WIPEFS_PROG -a $dev
> >  	done
> > +
> > +	if [ "$FSTYP" = "xfs" ];then
> > +		try_wipe_scratch_xfs
> > +	fi
> 
> We probably ought to delegate all wiping to try_wipe_scratch_xfs, i.e.:
> 
> 	test -b $dev || continue
> 	case "$FSTYP" in
> 	"xfs")
> 		_try_wipe_scratch_xfs
> 		;;
> 	*)
> 		$WIPEFS_PROG -a $dev
> 		;;
> 	esac
> 
> and add the WIPEFS_PROG call to _try_wipe_scratch_xfs.

Sure,

Thanks!
Zorro

> 
> >  }
> >  
> >  # Only run this on xfs if xfs_scrub is available and has the unicode checker
> > diff --git a/common/xfs b/common/xfs
> > index 1bce3c18..34516f82 100644
> > --- a/common/xfs
> > +++ b/common/xfs
> > @@ -884,3 +884,26 @@ _xfs_mount_agcount()
> >  {
> >  	$XFS_INFO_PROG "$1" | grep agcount= | sed -e 's/^.*agcount=\([0-9]*\),.*$/\1/g'
> >  }
> > +
> > +# wipe the superblock of each XFS AGs
> > +try_wipe_scratch_xfs()
> 
> Common helper functions should start with a '_'
> 
> > +{
> > +	local tmp=`mktemp -u`
> > +
> > +	_scratch_mkfs_xfs -N 2>/dev/null | perl -ne '
> > +		if (/^meta-data=.*\s+agcount=(\d+), agsize=(\d+) blks/) {
> > +			print STDOUT "agcount=$1\nagsize=$2\n";
> > +		}
> > +		if (/^data\s+=\s+bsize=(\d+)\s/) {
> > +			print STDOUT "dbsize=$1\n";
> > +		}' > $tmp.mkfs
> > +
> > +	. $tmp.mkfs
> > +	if [ -n "$agcount" -a -n "$agsize" -a -n "$dbsize" ];then
> > +		for ((i = 0; i < agcount; i++)); do
> > +			$XFS_IO_PROG -c "pwrite $((i * dbsize * agsize)) $dbsize" \
> > +				$SCRATCH_DEV >/dev/null;
> > +		done
> > +       fi
> > +       rm -f $tmp.mkfs
> 
> Add code as discussed above.
> 
> --D
> 
> > +}
> > -- 
> > 2.20.1
> >
Yang Xu Sept. 20, 2019, 1:52 a.m. UTC | #3
on 2019/09/19 23:00, Zorro Lang wrote:
> xfs/030 always fails after d0e484ac699f ("check: wipe scratch devices
> between tests") get merged.
> 
> Due to xfs/030 does a sized(100m) mkfs. Before we merge above commit,
> mkfs.xfs detects an old primary superblock, it will write zeroes to
> all superblocks before formatting the new filesystem. But this won't
> be done if we wipe the first superblock(by merging above commit).
> 
> That means if we make a (smaller) sized xfs after wipefs, those *old*
> superblocks which created by last time mkfs.xfs will be left on disk.
> Then when we do xfs_repair, if xfs_repair can't find the first SB, it
> will go to find those *old* SB at first. When it finds them,
> everyting goes wrong.
> 
> So I try to get XFS AG geometry(by default) and then try to erase all
> superblocks. Thanks Darrick J. Wong helped to analyze this issue.
Feel free to add Reported-by.
> 
> Signed-off-by: Zorro Lang <zlang@redhat.com>
> ---
>   common/rc  |  4 ++++
>   common/xfs | 23 +++++++++++++++++++++++
>   2 files changed, 27 insertions(+)
> 
> diff --git a/common/rc b/common/rc
> index 66c7fd4d..fe13f659 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4048,6 +4048,10 @@ _try_wipe_scratch_devs()
>   	for dev in $SCRATCH_DEV_POOL $SCRATCH_DEV $SCRATCH_LOGDEV $SCRATCH_RTDEV; do
>   		test -b $dev && $WIPEFS_PROG -a $dev
>   	done
> +
> +	if [ "$FSTYP" = "xfs" ];then
> +		try_wipe_scratch_xfs
I think we should add a short comment explaining why this is needed.

PS: calling _scratch_mkfs_xfs here also makes the test pass, so we could
use that instead and add a comment. Both the try_wipe_scratch_xfs
approach and the _scratch_mkfs_xfs approach are acceptable to me.
> +	fi
>   }
>   
>   # Only run this on xfs if xfs_scrub is available and has the unicode checker
> diff --git a/common/xfs b/common/xfs
> index 1bce3c18..34516f82 100644
> --- a/common/xfs
> +++ b/common/xfs
> @@ -884,3 +884,26 @@ _xfs_mount_agcount()
>   {
>   	$XFS_INFO_PROG "$1" | grep agcount= | sed -e 's/^.*agcount=\([0-9]*\),.*$/\1/g'
>   }
> +
> +# wipe the superblock of each XFS AGs
> +try_wipe_scratch_xfs()
> +{
> +	local tmp=`mktemp -u`
> +
> +	_scratch_mkfs_xfs -N 2>/dev/null | perl -ne '
> +		if (/^meta-data=.*\s+agcount=(\d+), agsize=(\d+) blks/) {
> +			print STDOUT "agcount=$1\nagsize=$2\n";
> +		}
> +		if (/^data\s+=\s+bsize=(\d+)\s/) {
> +			print STDOUT "dbsize=$1\n";
> +		}' > $tmp.mkfs
> +
> +	. $tmp.mkfs
> +	if [ -n "$agcount" -a -n "$agsize" -a -n "$dbsize" ];then
> +		for ((i = 0; i < agcount; i++)); do
> +			$XFS_IO_PROG -c "pwrite $((i * dbsize * agsize)) $dbsize" \
> +				$SCRATCH_DEV >/dev/null;
> +		done
> +       fi
> +       rm -f $tmp.mkfs
> +}
>
Darrick J. Wong Sept. 20, 2019, 2:48 a.m. UTC | #4
On Fri, Sep 20, 2019 at 09:52:11AM +0800, Yang Xu wrote:
> 
> 
> on 2019/09/19 23:00, Zorro Lang wrote:
> > xfs/030 always fails after d0e484ac699f ("check: wipe scratch devices
> > between tests") get merged.
> > 
> > Due to xfs/030 does a sized(100m) mkfs. Before we merge above commit,
> > mkfs.xfs detects an old primary superblock, it will write zeroes to
> > all superblocks before formatting the new filesystem. But this won't
> > be done if we wipe the first superblock(by merging above commit).
> > 
> > That means if we make a (smaller) sized xfs after wipefs, those *old*
> > superblocks which created by last time mkfs.xfs will be left on disk.
> > Then when we do xfs_repair, if xfs_repair can't find the first SB, it
> > will go to find those *old* SB at first. When it finds them,
> > everyting goes wrong.
> > 
> > So I try to get XFS AG geometry(by default) and then try to erase all
> > superblocks. Thanks Darrick J. Wong helped to analyze this issue.
> Feel free to add Reported-by.
> > 
> > Signed-off-by: Zorro Lang <zlang@redhat.com>
> > ---
> >   common/rc  |  4 ++++
> >   common/xfs | 23 +++++++++++++++++++++++
> >   2 files changed, 27 insertions(+)
> > 
> > diff --git a/common/rc b/common/rc
> > index 66c7fd4d..fe13f659 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -4048,6 +4048,10 @@ _try_wipe_scratch_devs()
> >   	for dev in $SCRATCH_DEV_POOL $SCRATCH_DEV $SCRATCH_LOGDEV $SCRATCH_RTDEV; do
> >   		test -b $dev && $WIPEFS_PROG -a $dev
> >   	done
> > +
> > +	if [ "$FSTYP" = "xfs" ];then
> > +		try_wipe_scratch_xfs
> I think we should add a simple comment for why we add it.
> 
> ps:_scratch_mkfs_xfs also can make case pass. We can use it and add comment.
> the  try_wipe_scratch_xfs method and the _scratch_mkfs_xfs method are all
> acceptable for me.

Yes, I suppose formatting and then wiping per below would also achieve
our means, but it would come at the extra cost of zeroing the log.  I'm
not too eager to increase xfstest runtime even more.

Hmmm, I wonder if xfs_db could just grow a 'wipe all superblocks'
command....

--D

> > +	fi
> >   }
> >   # Only run this on xfs if xfs_scrub is available and has the unicode checker
> > diff --git a/common/xfs b/common/xfs
> > index 1bce3c18..34516f82 100644
> > --- a/common/xfs
> > +++ b/common/xfs
> > @@ -884,3 +884,26 @@ _xfs_mount_agcount()
> >   {
> >   	$XFS_INFO_PROG "$1" | grep agcount= | sed -e 's/^.*agcount=\([0-9]*\),.*$/\1/g'
> >   }
> > +
> > +# wipe the superblock of each XFS AGs
> > +try_wipe_scratch_xfs()
> > +{
> > +	local tmp=`mktemp -u`
> > +
> > +	_scratch_mkfs_xfs -N 2>/dev/null | perl -ne '
> > +		if (/^meta-data=.*\s+agcount=(\d+), agsize=(\d+) blks/) {
> > +			print STDOUT "agcount=$1\nagsize=$2\n";
> > +		}
> > +		if (/^data\s+=\s+bsize=(\d+)\s/) {
> > +			print STDOUT "dbsize=$1\n";
> > +		}' > $tmp.mkfs
> > +
> > +	. $tmp.mkfs
> > +	if [ -n "$agcount" -a -n "$agsize" -a -n "$dbsize" ];then
> > +		for ((i = 0; i < agcount; i++)); do
> > +			$XFS_IO_PROG -c "pwrite $((i * dbsize * agsize)) $dbsize" \
> > +				$SCRATCH_DEV >/dev/null;
> > +		done
> > +       fi
> > +       rm -f $tmp.mkfs
> > +}
> > 
> 
>
Yang Xu Sept. 20, 2019, 3:44 a.m. UTC | #5
on 2019/09/20 10:48, Darrick J. Wong wrote:
> On Fri, Sep 20, 2019 at 09:52:11AM +0800, Yang Xu wrote:
>>
>>
>> on 2019/09/19 23:00, Zorro Lang wrote:
>>> xfs/030 always fails after d0e484ac699f ("check: wipe scratch devices
>>> between tests") get merged.
>>>
>>> Due to xfs/030 does a sized(100m) mkfs. Before we merge above commit,
>>> mkfs.xfs detects an old primary superblock, it will write zeroes to
>>> all superblocks before formatting the new filesystem. But this won't
>>> be done if we wipe the first superblock(by merging above commit).
>>>
>>> That means if we make a (smaller) sized xfs after wipefs, those *old*
>>> superblocks which created by last time mkfs.xfs will be left on disk.
>>> Then when we do xfs_repair, if xfs_repair can't find the first SB, it
>>> will go to find those *old* SB at first. When it finds them,
>>> everyting goes wrong.
>>>
>>> So I try to get XFS AG geometry(by default) and then try to erase all
>>> superblocks. Thanks Darrick J. Wong helped to analyze this issue.
>> Feel free to add Reported-by.
>>>
>>> Signed-off-by: Zorro Lang <zlang@redhat.com>
>>> ---
>>>    common/rc  |  4 ++++
>>>    common/xfs | 23 +++++++++++++++++++++++
>>>    2 files changed, 27 insertions(+)
>>>
>>> diff --git a/common/rc b/common/rc
>>> index 66c7fd4d..fe13f659 100644
>>> --- a/common/rc
>>> +++ b/common/rc
>>> @@ -4048,6 +4048,10 @@ _try_wipe_scratch_devs()
>>>    	for dev in $SCRATCH_DEV_POOL $SCRATCH_DEV $SCRATCH_LOGDEV $SCRATCH_RTDEV; do
>>>    		test -b $dev && $WIPEFS_PROG -a $dev
>>>    	done
>>> +
>>> +	if [ "$FSTYP" = "xfs" ];then
>>> +		try_wipe_scratch_xfs
>> I think we should add a simple comment for why we add it.
>>
>> ps:_scratch_mkfs_xfs also can make case pass. We can use it and add comment.
>> the  try_wipe_scratch_xfs method and the _scratch_mkfs_xfs method are all
>> acceptable for me.
> 
> Yes, I suppose formatting and then wiping per below would also achieve
> our means, but it would come at the extra cost of zeroing the log.  I'm
> not too eager to increase xfstest runtime even more.
> 
I see. Thanks.
> Hmmm, I wonder if xfs_db could just grow a 'wipe all superblocks'
> command....
Good idea.
>
> --D
> 
>>> +	fi
>>>    }
>>>    # Only run this on xfs if xfs_scrub is available and has the unicode checker
>>> diff --git a/common/xfs b/common/xfs
>>> index 1bce3c18..34516f82 100644
>>> --- a/common/xfs
>>> +++ b/common/xfs
>>> @@ -884,3 +884,26 @@ _xfs_mount_agcount()
>>>    {
>>>    	$XFS_INFO_PROG "$1" | grep agcount= | sed -e 's/^.*agcount=\([0-9]*\),.*$/\1/g'
>>>    }
>>> +
>>> +# wipe the superblock of each XFS AGs
>>> +try_wipe_scratch_xfs()
>>> +{
>>> +	local tmp=`mktemp -u`
>>> +
>>> +	_scratch_mkfs_xfs -N 2>/dev/null | perl -ne '
>>> +		if (/^meta-data=.*\s+agcount=(\d+), agsize=(\d+) blks/) {
>>> +			print STDOUT "agcount=$1\nagsize=$2\n";
>>> +		}
>>> +		if (/^data\s+=\s+bsize=(\d+)\s/) {
>>> +			print STDOUT "dbsize=$1\n";
>>> +		}' > $tmp.mkfs
>>> +
>>> +	. $tmp.mkfs
>>> +	if [ -n "$agcount" -a -n "$agsize" -a -n "$dbsize" ];then
>>> +		for ((i = 0; i < agcount; i++)); do
>>> +			$XFS_IO_PROG -c "pwrite $((i * dbsize * agsize)) $dbsize" \
>>> +				$SCRATCH_DEV >/dev/null;
>>> +		done
>>> +       fi
>>> +       rm -f $tmp.mkfs
>>> +}
>>>
>>
>>
> 
>
Darrick J. Wong Sept. 20, 2019, 4:29 a.m. UTC | #6
On Fri, Sep 20, 2019 at 12:31:39PM +0800, Zorro Lang wrote:
> On Thu, Sep 19, 2019 at 07:48:36PM -0700, Darrick J. Wong wrote:
> > On Fri, Sep 20, 2019 at 09:52:11AM +0800, Yang Xu wrote:
> > > 
> > > 
> > > on 2019/09/19 23:00, Zorro Lang wrote:
> > > > xfs/030 always fails after d0e484ac699f ("check: wipe scratch devices
> > > > between tests") get merged.
> > > > 
> > > > Due to xfs/030 does a sized(100m) mkfs. Before we merge above commit,
> > > > mkfs.xfs detects an old primary superblock, it will write zeroes to
> > > > all superblocks before formatting the new filesystem. But this won't
> > > > be done if we wipe the first superblock(by merging above commit).
> > > > 
> > > > That means if we make a (smaller) sized xfs after wipefs, those *old*
> > > > superblocks which created by last time mkfs.xfs will be left on disk.
> > > > Then when we do xfs_repair, if xfs_repair can't find the first SB, it
> > > > will go to find those *old* SB at first. When it finds them,
> > > > everyting goes wrong.
> > > > 
> > > > So I try to get XFS AG geometry(by default) and then try to erase all
> > > > superblocks. Thanks Darrick J. Wong helped to analyze this issue.
> > > Feel free to add Reported-by.
> > > > 
> > > > Signed-off-by: Zorro Lang <zlang@redhat.com>
> > > > ---
> > > >   common/rc  |  4 ++++
> > > >   common/xfs | 23 +++++++++++++++++++++++
> > > >   2 files changed, 27 insertions(+)
> > > > 
> > > > diff --git a/common/rc b/common/rc
> > > > index 66c7fd4d..fe13f659 100644
> > > > --- a/common/rc
> > > > +++ b/common/rc
> > > > @@ -4048,6 +4048,10 @@ _try_wipe_scratch_devs()
> > > >   	for dev in $SCRATCH_DEV_POOL $SCRATCH_DEV $SCRATCH_LOGDEV $SCRATCH_RTDEV; do
> > > >   		test -b $dev && $WIPEFS_PROG -a $dev
> > > >   	done
> > > > +
> > > > +	if [ "$FSTYP" = "xfs" ];then
> > > > +		try_wipe_scratch_xfs
> > > I think we should add a simple comment for why we add it.
> > > 
> > > ps:_scratch_mkfs_xfs also can make case pass. We can use it and add comment.
> > > the  try_wipe_scratch_xfs method and the _scratch_mkfs_xfs method are all
> > > acceptable for me.
> > 
> > Yes, I suppose formatting and then wiping per below would also achieve
> > our means, but it would come at the extra cost of zeroing the log.  I'm
> > not too eager to increase xfstest runtime even more.
> > 
> > Hmmm, I wonder if xfs_db could just grow a 'wipe all superblocks'
> > command....
> 
> Haha, I was thinking about that too, and I tried this:
> --
> agc=`_scratch_xfs_get_sb_field agcount`
> wipe_xfs_cmd="$XFS_DB_PROG -x"
> for ((i=0; i<agc; i++)); do
> 	wipe_xfs_cmd="$wipe_xfs_cmd -c \"sb $i\" -c \"write -c magicnum 0x00000000\""
> done
> wipe_xfs_cmd="$wipe_xfs_cmd $SCRATCH_DEV"
> eval $wipe_xfs_cmd
> --
> 
> The only one problem about this, I think it's the max length of bash command:)

Yeah... I mean, the downside of all this is that a filesystem could have
thousands of AGs, though I don't imagine there are many people who set
up a 1PB array just to run xfstests ;)

--D

> Thanks,
> Zorro
> 
> > 
> > --D
> > 
> > > > +	fi
> > > >   }
> > > >   # Only run this on xfs if xfs_scrub is available and has the unicode checker
> > > > diff --git a/common/xfs b/common/xfs
> > > > index 1bce3c18..34516f82 100644
> > > > --- a/common/xfs
> > > > +++ b/common/xfs
> > > > @@ -884,3 +884,26 @@ _xfs_mount_agcount()
> > > >   {
> > > >   	$XFS_INFO_PROG "$1" | grep agcount= | sed -e 's/^.*agcount=\([0-9]*\),.*$/\1/g'
> > > >   }
> > > > +
> > > > +# wipe the superblock of each XFS AGs
> > > > +try_wipe_scratch_xfs()
> > > > +{
> > > > +	local tmp=`mktemp -u`
> > > > +
> > > > +	_scratch_mkfs_xfs -N 2>/dev/null | perl -ne '
> > > > +		if (/^meta-data=.*\s+agcount=(\d+), agsize=(\d+) blks/) {
> > > > +			print STDOUT "agcount=$1\nagsize=$2\n";
> > > > +		}
> > > > +		if (/^data\s+=\s+bsize=(\d+)\s/) {
> > > > +			print STDOUT "dbsize=$1\n";
> > > > +		}' > $tmp.mkfs
> > > > +
> > > > +	. $tmp.mkfs
> > > > +	if [ -n "$agcount" -a -n "$agsize" -a -n "$dbsize" ];then
> > > > +		for ((i = 0; i < agcount; i++)); do
> > > > +			$XFS_IO_PROG -c "pwrite $((i * dbsize * agsize)) $dbsize" \
> > > > +				$SCRATCH_DEV >/dev/null;
> > > > +		done
> > > > +       fi
> > > > +       rm -f $tmp.mkfs
> > > > +}
> > > > 
> > > 
> > >
Zorro Lang Sept. 20, 2019, 4:31 a.m. UTC | #7
On Thu, Sep 19, 2019 at 07:48:36PM -0700, Darrick J. Wong wrote:
> On Fri, Sep 20, 2019 at 09:52:11AM +0800, Yang Xu wrote:
> > 
> > 
> > on 2019/09/19 23:00, Zorro Lang wrote:
> > > xfs/030 always fails after d0e484ac699f ("check: wipe scratch devices
> > > between tests") get merged.
> > > 
> > > Due to xfs/030 does a sized(100m) mkfs. Before we merge above commit,
> > > mkfs.xfs detects an old primary superblock, it will write zeroes to
> > > all superblocks before formatting the new filesystem. But this won't
> > > be done if we wipe the first superblock(by merging above commit).
> > > 
> > > That means if we make a (smaller) sized xfs after wipefs, those *old*
> > > superblocks which created by last time mkfs.xfs will be left on disk.
> > > Then when we do xfs_repair, if xfs_repair can't find the first SB, it
> > > will go to find those *old* SB at first. When it finds them,
> > > everyting goes wrong.
> > > 
> > > So I try to get XFS AG geometry(by default) and then try to erase all
> > > superblocks. Thanks Darrick J. Wong helped to analyze this issue.
> > Feel free to add Reported-by.
> > > 
> > > Signed-off-by: Zorro Lang <zlang@redhat.com>
> > > ---
> > >   common/rc  |  4 ++++
> > >   common/xfs | 23 +++++++++++++++++++++++
> > >   2 files changed, 27 insertions(+)
> > > 
> > > diff --git a/common/rc b/common/rc
> > > index 66c7fd4d..fe13f659 100644
> > > --- a/common/rc
> > > +++ b/common/rc
> > > @@ -4048,6 +4048,10 @@ _try_wipe_scratch_devs()
> > >   	for dev in $SCRATCH_DEV_POOL $SCRATCH_DEV $SCRATCH_LOGDEV $SCRATCH_RTDEV; do
> > >   		test -b $dev && $WIPEFS_PROG -a $dev
> > >   	done
> > > +
> > > +	if [ "$FSTYP" = "xfs" ];then
> > > +		try_wipe_scratch_xfs
> > I think we should add a simple comment for why we add it.
> > 
> > ps:_scratch_mkfs_xfs also can make case pass. We can use it and add comment.
> > the  try_wipe_scratch_xfs method and the _scratch_mkfs_xfs method are all
> > acceptable for me.
> 
> Yes, I suppose formatting and then wiping per below would also achieve
> our means, but it would come at the extra cost of zeroing the log.  I'm
> not too eager to increase xfstest runtime even more.
> 
> Hmmm, I wonder if xfs_db could just grow a 'wipe all superblocks'
> command....

Haha, I was thinking about that too, and I tried this:
--
agc=`_scratch_xfs_get_sb_field agcount`
wipe_xfs_cmd="$XFS_DB_PROG -x"
for ((i=0; i<agc; i++)); do
	wipe_xfs_cmd="$wipe_xfs_cmd -c \"sb $i\" -c \"write -c magicnum 0x00000000\""
done
wipe_xfs_cmd="$wipe_xfs_cmd $SCRATCH_DEV"
eval $wipe_xfs_cmd
--

The only problem with this, I think, is the maximum length of a bash command :)
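
One way to sidestep the eval and the nested quoting is to build the
argument list word by word. The sketch below does that with the
positional parameters and only prints the resulting command. The name
build_wipe_cmd and the literal xfs_db program name are assumptions made
for illustration; a real helper would use $XFS_DB_PROG and $SCRATCH_DEV:

```shell
# Build one xfs_db invocation that clears the magic number of every AG
# superblock, keeping each argument as a separate word so that no eval
# is needed.  Prints the command instead of running it.
# Arguments: agcount, device path.
build_wipe_cmd()
{
	local agc="$1" dev="$2" i=0

	set -- xfs_db -x
	while [ "$i" -lt "$agc" ]; do
		set -- "$@" -c "sb $i" -c "write -c magicnum 0x00000000"
		i=$((i + 1))
	done
	set -- "$@" "$dev"

	# One argument per line; replace this printf with "$@" to execute.
	printf '%s\n' "$@"
}
```

This does not lift the system limit on total argument length, so a
filesystem with a very large number of AGs would still need the commands
split into batches.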

Thanks,
Zorro

> 
> --D
> 
> > > +	fi
> > >   }
> > >   # Only run this on xfs if xfs_scrub is available and has the unicode checker
> > > diff --git a/common/xfs b/common/xfs
> > > index 1bce3c18..34516f82 100644
> > > --- a/common/xfs
> > > +++ b/common/xfs
> > > @@ -884,3 +884,26 @@ _xfs_mount_agcount()
> > >   {
> > >   	$XFS_INFO_PROG "$1" | grep agcount= | sed -e 's/^.*agcount=\([0-9]*\),.*$/\1/g'
> > >   }
> > > +
> > > +# wipe the superblock of each XFS AGs
> > > +try_wipe_scratch_xfs()
> > > +{
> > > +	local tmp=`mktemp -u`
> > > +
> > > +	_scratch_mkfs_xfs -N 2>/dev/null | perl -ne '
> > > +		if (/^meta-data=.*\s+agcount=(\d+), agsize=(\d+) blks/) {
> > > +			print STDOUT "agcount=$1\nagsize=$2\n";
> > > +		}
> > > +		if (/^data\s+=\s+bsize=(\d+)\s/) {
> > > +			print STDOUT "dbsize=$1\n";
> > > +		}' > $tmp.mkfs
> > > +
> > > +	. $tmp.mkfs
> > > +	if [ -n "$agcount" -a -n "$agsize" -a -n "$dbsize" ];then
> > > +		for ((i = 0; i < agcount; i++)); do
> > > +			$XFS_IO_PROG -c "pwrite $((i * dbsize * agsize)) $dbsize" \
> > > +				$SCRATCH_DEV >/dev/null;
> > > +		done
> > > +       fi
> > > +       rm -f $tmp.mkfs
> > > +}
> > > 
> > 
> >
Patch

diff --git a/common/rc b/common/rc
index 66c7fd4d..fe13f659 100644
--- a/common/rc
+++ b/common/rc
@@ -4048,6 +4048,10 @@  _try_wipe_scratch_devs()
 	for dev in $SCRATCH_DEV_POOL $SCRATCH_DEV $SCRATCH_LOGDEV $SCRATCH_RTDEV; do
 		test -b $dev && $WIPEFS_PROG -a $dev
 	done
+
+	if [ "$FSTYP" = "xfs" ];then
+		try_wipe_scratch_xfs
+	fi
 }
 
 # Only run this on xfs if xfs_scrub is available and has the unicode checker
diff --git a/common/xfs b/common/xfs
index 1bce3c18..34516f82 100644
--- a/common/xfs
+++ b/common/xfs
@@ -884,3 +884,26 @@  _xfs_mount_agcount()
 {
 	$XFS_INFO_PROG "$1" | grep agcount= | sed -e 's/^.*agcount=\([0-9]*\),.*$/\1/g'
 }
+
+# wipe the superblock of each XFS AGs
+try_wipe_scratch_xfs()
+{
+	local tmp=`mktemp -u`
+
+	_scratch_mkfs_xfs -N 2>/dev/null | perl -ne '
+		if (/^meta-data=.*\s+agcount=(\d+), agsize=(\d+) blks/) {
+			print STDOUT "agcount=$1\nagsize=$2\n";
+		}
+		if (/^data\s+=\s+bsize=(\d+)\s/) {
+			print STDOUT "dbsize=$1\n";
+		}' > $tmp.mkfs
+
+	. $tmp.mkfs
+	if [ -n "$agcount" -a -n "$agsize" -a -n "$dbsize" ];then
+		for ((i = 0; i < agcount; i++)); do
+			$XFS_IO_PROG -c "pwrite $((i * dbsize * agsize)) $dbsize" \
+				$SCRATCH_DEV >/dev/null;
+		done
+	fi
+	rm -f $tmp.mkfs
+}
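
The per-AG loop in the patch can be tried out without xfs_io or a block
device. The following is a minimal sketch under stated assumptions: dd
writing zeroes stands in for the xfs_io pwrite call, a regular image
file stands in for $SCRATCH_DEV, and wipe_ag_headers is a hypothetical
name that does not appear in the patch:

```shell
# Overwrite one block at the start of every AG in an image file.
# Arguments: image path, agcount, agsize (blocks), block size (bytes).
wipe_ag_headers()
{
	local img="$1" agcount="$2" agsize="$3" dbsize="$4" i=0

	while [ "$i" -lt "$agcount" ]; do
		# seek= counts in units of bs, so block i * agsize is
		# byte offset i * agsize * dbsize, as in the patch.
		dd if=/dev/zero of="$img" bs="$dbsize" count=1 \
			seek=$((i * agsize)) conv=notrunc 2>/dev/null
		i=$((i + 1))
	done
}
```

Calling "wipe_ag_headers disk.img 3 8 512" on a scratch image zeroes
the 512-byte blocks at byte offsets 0, 4096 and 8192 and leaves the
rest of the file alone.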