diff mbox series

common/populate: Ensure that S_IFDIR.FMT_BTREE is in btree format

Message ID 20221201081208.40147-1-hsiangkao@linux.alibaba.com (mailing list archive)
State New, archived
Headers show
Series common/populate: Ensure that S_IFDIR.FMT_BTREE is in btree format | expand

Commit Message

Gao Xiang Dec. 1, 2022, 8:12 a.m. UTC
Sometimes "$((128 * dblksz / 40))" dirents cannot make sure that
S_IFDIR.FMT_BTREE could become btree format for its DATA fork.

Actually we just observed it can fail after apply our inode
extent-to-btree workaround. The root cause is that the kernel may be
too good at allocating consecutive blocks so that the data fork is
still in extents format.

Therefore instead of using a fixed number, let's make sure the number
of extents is large enough than (inode size - inode core size) /
sizeof(xfs_bmbt_rec_t).

Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
Cc: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 common/populate | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong Dec. 1, 2022, 3:52 p.m. UTC | #1
On Thu, Dec 01, 2022 at 04:12:08PM +0800, Gao Xiang wrote:
> Sometimes "$((128 * dblksz / 40))" dirents cannot make sure that
> S_IFDIR.FMT_BTREE could become btree format for its DATA fork.
> 
> Actually we just observed it can fail after apply our inode
> extent-to-btree workaround. The root cause is that the kernel may be
> too good at allocating consecutive blocks so that the data fork is
> still in extents format.
> 
> Therefore instead of using a fixed number, let's make sure the number
> of extents is large enough than (inode size - inode core size) /
> sizeof(xfs_bmbt_rec_t).
> 
> Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
> Cc: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  common/populate | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/common/populate b/common/populate
> index 6e004997..e179a300 100644
> --- a/common/populate
> +++ b/common/populate
> @@ -71,6 +71,25 @@ __populate_create_dir() {
>  	done
>  }
>  
> +# Create a large directory and ensure that it's a btree format
> +__populate_create_btree_dir() {

Since this encodes behavior specific to xfs, this ought to be called

__populate_xfs_create_btree_dir

or something like that.

> +	name="$1"
> +	isize="$2"

These ought to be local variables, e.g.

	local name="$1"
	local isize="$2"

So that they don't pollute the global name scope.  Yay bash.

> +
> +	mkdir -p "${name}"
> +	d=0
> +	while true; do
> +		creat=mkdir
> +		test "$((d % 20))" -eq 0 && creat=touch
> +		$creat "${name}/$(printf "%.08d" "$d")"
> +		if [ "$((d % 40))" -eq 0 ]; then
> +			nexts="$($XFS_IO_PROG -c "stat" $name | grep 'fsxattr.nextents' | sed -e 's/^.*nextents = //g' -e 's/\([0-9]*\).*$/\1/g')"

_xfs_get_fsxattr...

> +			[ "$nexts" -gt "$(((isize - 176) / 16))" ] && break

Only need to calculate this once if you declare this at the top:

	# We need enough extents to guarantee that the data fork is in
	# btree format.  Cycling the mount to use xfs_db is too slow, so
	# watch for when the extent count exceeds the space after the
	# inode core.
	local max_nextents="$(((isize - 176) / 16))"

and then you can do:

			[[ $nexts -gt $max_nextents ]] && break

Also not a fan of hardcoding 176 around fstests, but I don't know how
we'd detect that at all.

# Number of bytes reserved for only the inode record, excluding the
# immediate fork areas.
_xfs_inode_core_bytes()
{
	echo 176
}

I guess?  Or extract it from tests/xfs/122.out?

> +		fi
> +		d=$((d+1))
> +	done
> +}
> +
>  # Add a bunch of attrs to a file
>  __populate_create_attr() {
>  	name="$1"
> @@ -176,6 +195,7 @@ _scratch_xfs_populate() {
>  
>  	blksz="$(stat -f -c '%s' "${SCRATCH_MNT}")"
>  	dblksz="$(_xfs_get_dir_blocksize "$SCRATCH_MNT")"
> +	isize="$($XFS_INFO_PROG "${SCRATCH_MNT}" | grep meta-data=.*isize | sed -e 's/^.*isize=//g' -e 's/\([0-9]*\).*$/\1/g')"

Please hoist this to common/xfs:

# Number of bytes reserved for a full inode record, which includes the
# immediate fork areas.
_xfs_inode_size()
{
	local mntpoint="$1"

	$XFS_INFO_PROG "$mountpoint" | grep 'meta-data=.*isize' | sed -e 's/^.*isize=\([0-9]*\).*$/\1/g')"
}

Otherwise this looks reasonable.

--D

>  	crc="$(_xfs_has_feature "$SCRATCH_MNT" crc -v)"
>  	if [ $crc -eq 1 ]; then
>  		leaf_hdr_size=64
> @@ -226,7 +246,7 @@ _scratch_xfs_populate() {
>  
>  	# - BTREE
>  	echo "+ btree dir"
> -	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BTREE" "$((128 * dblksz / 40))" true
> +	__populate_create_btree_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BTREE" "$isize"
>  
>  	# Symlinks
>  	# - FMT_LOCAL
> -- 
> 2.24.4
>
Gao Xiang Dec. 2, 2022, 2:23 a.m. UTC | #2
Hi Darrick,

On Thu, Dec 01, 2022 at 07:52:44AM -0800, Darrick J. Wong wrote:
> On Thu, Dec 01, 2022 at 04:12:08PM +0800, Gao Xiang wrote:
> > Sometimes "$((128 * dblksz / 40))" dirents cannot make sure that
> > S_IFDIR.FMT_BTREE could become btree format for its DATA fork.
> > 
> > Actually we just observed it can fail after apply our inode
> > extent-to-btree workaround. The root cause is that the kernel may be
> > too good at allocating consecutive blocks so that the data fork is
> > still in extents format.
> > 
> > Therefore instead of using a fixed number, let's make sure the number
> > of extents is large enough than (inode size - inode core size) /
> > sizeof(xfs_bmbt_rec_t).
> > 
> > Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
> > Cc: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> >  common/populate | 22 +++++++++++++++++++++-
> >  1 file changed, 21 insertions(+), 1 deletion(-)
> > 
> > diff --git a/common/populate b/common/populate
> > index 6e004997..e179a300 100644
> > --- a/common/populate
> > +++ b/common/populate
> > @@ -71,6 +71,25 @@ __populate_create_dir() {
> >  	done
> >  }
> >  
> > +# Create a large directory and ensure that it's a btree format
> > +__populate_create_btree_dir() {
> 
> Since this encodes behavior specific to xfs, this ought to be called
> 
> __populate_xfs_create_btree_dir
> 
> or something like that.
> 
> > +	name="$1"
> > +	isize="$2"
> 
> These ought to be local variables, e.g.
> 
> 	local name="$1"
> 	local isize="$2"
> 
> So that they don't pollute the global name scope.  Yay bash.
> 
> > +
> > +	mkdir -p "${name}"
> > +	d=0
> > +	while true; do
> > +		creat=mkdir
> > +		test "$((d % 20))" -eq 0 && creat=touch
> > +		$creat "${name}/$(printf "%.08d" "$d")"
> > +		if [ "$((d % 40))" -eq 0 ]; then
> > +			nexts="$($XFS_IO_PROG -c "stat" $name | grep 'fsxattr.nextents' | sed -e 's/^.*nextents = //g' -e 's/\([0-9]*\).*$/\1/g')"
> 
> _xfs_get_fsxattr...
> 
> > +			[ "$nexts" -gt "$(((isize - 176) / 16))" ] && break
> 
> Only need to calculate this once if you declare this at the top:
> 
> 	# We need enough extents to guarantee that the data fork is in
> 	# btree format.  Cycling the mount to use xfs_db is too slow, so
> 	# watch for when the extent count exceeds the space after the
> 	# inode core.
> 	local max_nextents="$(((isize - 176) / 16))"
> 
> and then you can do:
> 
> 			[[ $nexts -gt $max_nextents ]] && break
> 
> Also not a fan of hardcoding 176 around fstests, but I don't know how
> we'd detect that at all.
> 
> # Number of bytes reserved for only the inode record, excluding the
> # immediate fork areas.
> _xfs_inode_core_bytes()
> {
> 	echo 176
> }
> 
> I guess?  Or extract it from tests/xfs/122.out?

Thanks for your comments.

I guess hard-coded 176 in _xfs_inode_core_bytes() is fine for now
(It seems a bit weird to extract a number from a test expected result..)

Otherwise I agree with your comments.  I will ask Ziyang to follow your
comments and send out v2.

Thanks,
Gao Xiang
Dave Chinner Dec. 6, 2022, 11:34 p.m. UTC | #3
On Fri, Dec 02, 2022 at 10:23:07AM +0800, Gao Xiang wrote:
> Hi Darrick,
> 
> On Thu, Dec 01, 2022 at 07:52:44AM -0800, Darrick J. Wong wrote:
> > On Thu, Dec 01, 2022 at 04:12:08PM +0800, Gao Xiang wrote:
> > > Sometimes "$((128 * dblksz / 40))" dirents cannot make sure that
> > > S_IFDIR.FMT_BTREE could become btree format for its DATA fork.
> > > 
> > > Actually we just observed it can fail after apply our inode
> > > extent-to-btree workaround. The root cause is that the kernel may be
> > > too good at allocating consecutive blocks so that the data fork is
> > > still in extents format.
> > > 
> > > Therefore instead of using a fixed number, let's make sure the number
> > > of extents is large enough than (inode size - inode core size) /
> > > sizeof(xfs_bmbt_rec_t).
> > > 
> > > Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
> > > Cc: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
> > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > ---
> > >  common/populate | 22 +++++++++++++++++++++-
> > >  1 file changed, 21 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/common/populate b/common/populate
> > > index 6e004997..e179a300 100644
> > > --- a/common/populate
> > > +++ b/common/populate
> > > @@ -71,6 +71,25 @@ __populate_create_dir() {
> > >  	done
> > >  }
> > >  
> > > +# Create a large directory and ensure that it's a btree format
> > > +__populate_create_btree_dir() {
> > 
> > Since this encodes behavior specific to xfs, this ought to be called
> > 
> > __populate_xfs_create_btree_dir
> > 
> > or something like that.
> > 
> > > +	name="$1"
> > > +	isize="$2"
> > 
> > These ought to be local variables, e.g.
> > 
> > 	local name="$1"
> > 	local isize="$2"
> > 
> > So that they don't pollute the global name scope.  Yay bash.
> > 
> > > +
> > > +	mkdir -p "${name}"
> > > +	d=0
> > > +	while true; do
> > > +		creat=mkdir
> > > +		test "$((d % 20))" -eq 0 && creat=touch
> > > +		$creat "${name}/$(printf "%.08d" "$d")"
> > > +		if [ "$((d % 40))" -eq 0 ]; then
> > > +			nexts="$($XFS_IO_PROG -c "stat" $name | grep 'fsxattr.nextents' | sed -e 's/^.*nextents = //g' -e 's/\([0-9]*\).*$/\1/g')"
> > 
> > _xfs_get_fsxattr...

The grep/sed expression is also overly complex - it can easily be
replaced with just this:

	nexts=`$XFS_IO_PROG -c "stat" $name | sed -ne 's/^fsxattr.nextents = //p'

The "-n" option to sed suppresses the printing of the stream
(pattern space) to the output as it processes the input, which gets
rid of the need for using grep to suppress non-matching input. The "p"
suffix to the search string forces matched patterns to be printed to
the output.

This results in only matched, substituted pattern spaces to be
printed, avoiding the need for grep and multiple sed regexes to be
matched to strip the text down to just the integer string.

> > > +			[ "$nexts" -gt "$(((isize - 176) / 16))" ] && break
> > 
> > Only need to calculate this once if you declare this at the top:
> > 
> > 	# We need enough extents to guarantee that the data fork is in
> > 	# btree format.  Cycling the mount to use xfs_db is too slow, so
> > 	# watch for when the extent count exceeds the space after the
> > 	# inode core.
> > 	local max_nextents="$(((isize - 176) / 16))"
> > 
> > and then you can do:
> > 
> > 			[[ $nexts -gt $max_nextents ]] && break
> > 
> > Also not a fan of hardcoding 176 around fstests, but I don't know how
> > we'd detect that at all.
> > 
> > # Number of bytes reserved for only the inode record, excluding the
> > # immediate fork areas.
> > _xfs_inode_core_bytes()
> > {
> > 	echo 176
> > }
> > 
> > I guess?  Or extract it from tests/xfs/122.out?
> 
> Thanks for your comments.
> 
> I guess hard-coded 176 in _xfs_inode_core_bytes() is fine for now
> (It seems a bit weird to extract a number from a test expected result..)

Which is wrong when testing a v4 filesystem - in that case the inode
core size is 96 bytes and so max extents may be larger on v4
filesystems than v5 filesystems....

Cheers,

Dave.
Gao Xiang Dec. 7, 2022, 2:11 a.m. UTC | #4
Hi Dave,

On Wed, Dec 07, 2022 at 10:34:17AM +1100, Dave Chinner wrote:
> On Fri, Dec 02, 2022 at 10:23:07AM +0800, Gao Xiang wrote:
> > Hi Darrick,
> > 
> > On Thu, Dec 01, 2022 at 07:52:44AM -0800, Darrick J. Wong wrote:
> > > On Thu, Dec 01, 2022 at 04:12:08PM +0800, Gao Xiang wrote:
> > > > Sometimes "$((128 * dblksz / 40))" dirents cannot make sure that
> > > > S_IFDIR.FMT_BTREE could become btree format for its DATA fork.
> > > > 
> > > > Actually we just observed it can fail after apply our inode
> > > > extent-to-btree workaround. The root cause is that the kernel may be
> > > > too good at allocating consecutive blocks so that the data fork is
> > > > still in extents format.
> > > > 
> > > > Therefore instead of using a fixed number, let's make sure the number
> > > > of extents is large enough than (inode size - inode core size) /
> > > > sizeof(xfs_bmbt_rec_t).
> > > > 
> > > > Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
> > > > Cc: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
> > > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > > ---
> > > >  common/populate | 22 +++++++++++++++++++++-
> > > >  1 file changed, 21 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/common/populate b/common/populate
> > > > index 6e004997..e179a300 100644
> > > > --- a/common/populate
> > > > +++ b/common/populate
> > > > @@ -71,6 +71,25 @@ __populate_create_dir() {
> > > >  	done
> > > >  }
> > > >  
> > > > +# Create a large directory and ensure that it's a btree format
> > > > +__populate_create_btree_dir() {
> > > 
> > > Since this encodes behavior specific to xfs, this ought to be called
> > > 
> > > __populate_xfs_create_btree_dir
> > > 
> > > or something like that.
> > > 
> > > > +	name="$1"
> > > > +	isize="$2"
> > > 
> > > These ought to be local variables, e.g.
> > > 
> > > 	local name="$1"
> > > 	local isize="$2"
> > > 
> > > So that they don't pollute the global name scope.  Yay bash.
> > > 
> > > > +
> > > > +	mkdir -p "${name}"
> > > > +	d=0
> > > > +	while true; do
> > > > +		creat=mkdir
> > > > +		test "$((d % 20))" -eq 0 && creat=touch
> > > > +		$creat "${name}/$(printf "%.08d" "$d")"
> > > > +		if [ "$((d % 40))" -eq 0 ]; then
> > > > +			nexts="$($XFS_IO_PROG -c "stat" $name | grep 'fsxattr.nextents' | sed -e 's/^.*nextents = //g' -e 's/\([0-9]*\).*$/\1/g')"
> > > 
> > > _xfs_get_fsxattr...
> 
> The grep/sed expression is also overly complex - it can easily be
> replaced with just this:
> 
> 	nexts=`$XFS_IO_PROG -c "stat" $name | sed -ne 's/^fsxattr.nextents = //p'
> 
> The "-n" option to sed suppresses the printing of the stream
> (pattern space) to the output as it processes the input, which gets
> rid of the need for using grep to suppress non-matching input. The "p"
> suffix to the search string forces matched patterns to be printed to
> the output.
> 
> This results in only matched, substituted pattern spaces to be
> printed, avoiding the need for grep and multiple sed regexes to be
> matched to strip the text down to just the integer string.

I just copied it from another reference at that time as a copy-and-paste
engineer.. Also note that Ziyang's new patch already use
_xfs_get_fsxattr to get this field.

> 
> > > > +			[ "$nexts" -gt "$(((isize - 176) / 16))" ] && break
> > > 
> > > Only need to calculate this once if you declare this at the top:
> > > 
> > > 	# We need enough extents to guarantee that the data fork is in
> > > 	# btree format.  Cycling the mount to use xfs_db is too slow, so
> > > 	# watch for when the extent count exceeds the space after the
> > > 	# inode core.
> > > 	local max_nextents="$(((isize - 176) / 16))"
> > > 
> > > and then you can do:
> > > 
> > > 			[[ $nexts -gt $max_nextents ]] && break
> > > 
> > > Also not a fan of hardcoding 176 around fstests, but I don't know how
> > > we'd detect that at all.
> > > 
> > > # Number of bytes reserved for only the inode record, excluding the
> > > # immediate fork areas.
> > > _xfs_inode_core_bytes()
> > > {
> > > 	echo 176
> > > }
> > > 
> > > I guess?  Or extract it from tests/xfs/122.out?
> > 
> > Thanks for your comments.
> > 
> > I guess hard-coded 176 in _xfs_inode_core_bytes() is fine for now
> > (It seems a bit weird to extract a number from a test expected result..)
> 
> Which is wrong when testing a v4 filesystem - in that case the inode
> core size is 96 bytes and so max extents may be larger on v4
> filesystems than v5 filesystems....

Do we really care v4 fs for now since it's deprecated?... Darrick once also 
suggested using (isize / 16) but it seems it could take unnecessary time to
prepare.. Or we could just use (isize - 96) / 16 to keep v4 work.

Thanks,
Gao Xiang


> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Darrick J. Wong Dec. 7, 2022, 2:17 a.m. UTC | #5
On Wed, Dec 07, 2022 at 10:11:49AM +0800, Gao Xiang wrote:
> Hi Dave,
> 
> On Wed, Dec 07, 2022 at 10:34:17AM +1100, Dave Chinner wrote:
> > On Fri, Dec 02, 2022 at 10:23:07AM +0800, Gao Xiang wrote:
> > > Hi Darrick,
> > > 
> > > On Thu, Dec 01, 2022 at 07:52:44AM -0800, Darrick J. Wong wrote:
> > > > On Thu, Dec 01, 2022 at 04:12:08PM +0800, Gao Xiang wrote:
> > > > > Sometimes "$((128 * dblksz / 40))" dirents cannot make sure that
> > > > > S_IFDIR.FMT_BTREE could become btree format for its DATA fork.
> > > > > 
> > > > > Actually we just observed it can fail after apply our inode
> > > > > extent-to-btree workaround. The root cause is that the kernel may be
> > > > > too good at allocating consecutive blocks so that the data fork is
> > > > > still in extents format.
> > > > > 
> > > > > Therefore instead of using a fixed number, let's make sure the number
> > > > > of extents is large enough than (inode size - inode core size) /
> > > > > sizeof(xfs_bmbt_rec_t).
> > > > > 
> > > > > Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
> > > > > Cc: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
> > > > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > > > ---
> > > > >  common/populate | 22 +++++++++++++++++++++-
> > > > >  1 file changed, 21 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/common/populate b/common/populate
> > > > > index 6e004997..e179a300 100644
> > > > > --- a/common/populate
> > > > > +++ b/common/populate
> > > > > @@ -71,6 +71,25 @@ __populate_create_dir() {
> > > > >  	done
> > > > >  }
> > > > >  
> > > > > +# Create a large directory and ensure that it's a btree format
> > > > > +__populate_create_btree_dir() {
> > > > 
> > > > Since this encodes behavior specific to xfs, this ought to be called
> > > > 
> > > > __populate_xfs_create_btree_dir
> > > > 
> > > > or something like that.
> > > > 
> > > > > +	name="$1"
> > > > > +	isize="$2"
> > > > 
> > > > These ought to be local variables, e.g.
> > > > 
> > > > 	local name="$1"
> > > > 	local isize="$2"
> > > > 
> > > > So that they don't pollute the global name scope.  Yay bash.
> > > > 
> > > > > +
> > > > > +	mkdir -p "${name}"
> > > > > +	d=0
> > > > > +	while true; do
> > > > > +		creat=mkdir
> > > > > +		test "$((d % 20))" -eq 0 && creat=touch
> > > > > +		$creat "${name}/$(printf "%.08d" "$d")"
> > > > > +		if [ "$((d % 40))" -eq 0 ]; then
> > > > > +			nexts="$($XFS_IO_PROG -c "stat" $name | grep 'fsxattr.nextents' | sed -e 's/^.*nextents = //g' -e 's/\([0-9]*\).*$/\1/g')"
> > > > 
> > > > _xfs_get_fsxattr...
> > 
> > The grep/sed expression is also overly complex - it can easily be
> > replaced with just this:
> > 
> > 	nexts=`$XFS_IO_PROG -c "stat" $name | sed -ne 's/^fsxattr.nextents = //p'
> > 
> > The "-n" option to sed suppresses the printing of the stream
> > (pattern space) to the output as it processes the input, which gets
> > rid of the need for using grep to suppress non-matching input. The "p"
> > suffix to the search string forces matched patterns to be printed to
> > the output.
> > 
> > This results in only matched, substituted pattern spaces to be
> > printed, avoiding the need for grep and multiple sed regexes to be
> > matched to strip the text down to just the integer string.
> 
> I just copied it from another reference at that time as a copy-and-paste
> engineer.. Also note that Ziyang's new patch already use
> _xfs_get_fsxattr to get this field.
> 
> > 
> > > > > +			[ "$nexts" -gt "$(((isize - 176) / 16))" ] && break
> > > > 
> > > > Only need to calculate this once if you declare this at the top:
> > > > 
> > > > 	# We need enough extents to guarantee that the data fork is in
> > > > 	# btree format.  Cycling the mount to use xfs_db is too slow, so
> > > > 	# watch for when the extent count exceeds the space after the
> > > > 	# inode core.
> > > > 	local max_nextents="$(((isize - 176) / 16))"
> > > > 
> > > > and then you can do:
> > > > 
> > > > 			[[ $nexts -gt $max_nextents ]] && break
> > > > 
> > > > Also not a fan of hardcoding 176 around fstests, but I don't know how
> > > > we'd detect that at all.
> > > > 
> > > > # Number of bytes reserved for only the inode record, excluding the
> > > > # immediate fork areas.
> > > > _xfs_inode_core_bytes()
> > > > {
> > > > 	echo 176
> > > > }
> > > > 
> > > > I guess?  Or extract it from tests/xfs/122.out?
> > > 
> > > Thanks for your comments.
> > > 
> > > I guess hard-coded 176 in _xfs_inode_core_bytes() is fine for now
> > > (It seems a bit weird to extract a number from a test expected result..)
> > 
> > Which is wrong when testing a v4 filesystem - in that case the inode
> > core size is 96 bytes and so max extents may be larger on v4
> > filesystems than v5 filesystems....
> 
> Do we really care v4 fs for now since it's deprecated?... Darrick once also 
> suggested using (isize / 16) but it seems it could take unnecessary time to
> prepare.. Or we could just use (isize - 96) / 16 to keep v4 work.

Well you /could/ make _xfs_inode_core_bytes grep xfs_info for 'crc=1'
and switch 176/96 on that.  The only reason why the existing callers
hardcoded 176 is that (I think) they all require crc=1.

(Or they're h***** bash scripts and we've just gotten lucky the whole
time.)

--D

> Thanks,
> Gao Xiang
> 
> 
> > 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
Dave Chinner Dec. 7, 2022, 9:48 p.m. UTC | #6
On Wed, Dec 07, 2022 at 10:11:49AM +0800, Gao Xiang wrote:
> On Wed, Dec 07, 2022 at 10:34:17AM +1100, Dave Chinner wrote:
> > On Fri, Dec 02, 2022 at 10:23:07AM +0800, Gao Xiang wrote:
> > > > > +			[ "$nexts" -gt "$(((isize - 176) / 16))" ] && break
> > > > 
> > > > Only need to calculate this once if you declare this at the top:
> > > > 
> > > > 	# We need enough extents to guarantee that the data fork is in
> > > > 	# btree format.  Cycling the mount to use xfs_db is too slow, so
> > > > 	# watch for when the extent count exceeds the space after the
> > > > 	# inode core.
> > > > 	local max_nextents="$(((isize - 176) / 16))"
> > > > 
> > > > and then you can do:
> > > > 
> > > > 			[[ $nexts -gt $max_nextents ]] && break
> > > > 
> > > > Also not a fan of hardcoding 176 around fstests, but I don't know how
> > > > we'd detect that at all.
> > > > 
> > > > # Number of bytes reserved for only the inode record, excluding the
> > > > # immediate fork areas.
> > > > _xfs_inode_core_bytes()
> > > > {
> > > > 	echo 176
> > > > }
> > > > 
> > > > I guess?  Or extract it from tests/xfs/122.out?
> > > 
> > > Thanks for your comments.
> > > 
> > > I guess hard-coded 176 in _xfs_inode_core_bytes() is fine for now
> > > (It seems a bit weird to extract a number from a test expected result..)
> > 
> > Which is wrong when testing a v4 filesystem - in that case the inode
> > core size is 96 bytes and so max extents may be larger on v4
> > filesystems than v5 filesystems....
> 
> Do we really care v4 fs for now since it's deprecated?...

Yes, there are still lots of v4 filesystems in production
environments. There shouldn't be many new ones, but there is a long
tail of existing storage containing v4 filesystems that we must not
break.

We have to support v4 filesystems for another few years yet, hence
we still need solid test coverage on them to ensure we don't
accidentally break something that is going to bite users before they
migrate to newer filesystems....

> Darrick once also 
> suggested using (isize / 16) but it seems it could take unnecessary time to
> prepare.. Or we could just use (isize - 96) / 16 to keep v4 work.

It's taken me longer to write this email than it does to write the
code to make it work properly. e.g.:

	xfs_info $scratch | sed -ne 's/.*crc=\([01]\).*/\1/p'

And now we have 0 = v4, 1 = v5, and it's trivial to return the
correct inode size.

You can even do this trivially with grep:

	xfs_info $scratch | grep -wq "crc=1"
	if [ $? -eq 0 ]; then
		echo 176
	else
		echo 96
	fi

and now the return value tells us if we have a v4 or v5 filesystem.

-Dave.
Ziyang Zhang Dec. 8, 2022, 6:29 a.m. UTC | #7
On 2022/12/8 05:48, Dave Chinner wrote:
> On Wed, Dec 07, 2022 at 10:11:49AM +0800, Gao Xiang wrote:
>> On Wed, Dec 07, 2022 at 10:34:17AM +1100, Dave Chinner wrote:
>>> On Fri, Dec 02, 2022 at 10:23:07AM +0800, Gao Xiang wrote:
>>>>>> +			[ "$nexts" -gt "$(((isize - 176) / 16))" ] && break
>>>>>
>>>>> Only need to calculate this once if you declare this at the top:
>>>>>
>>>>> 	# We need enough extents to guarantee that the data fork is in
>>>>> 	# btree format.  Cycling the mount to use xfs_db is too slow, so
>>>>> 	# watch for when the extent count exceeds the space after the
>>>>> 	# inode core.
>>>>> 	local max_nextents="$(((isize - 176) / 16))"
>>>>>
>>>>> and then you can do:
>>>>>
>>>>> 			[[ $nexts -gt $max_nextents ]] && break
>>>>>
>>>>> Also not a fan of hardcoding 176 around fstests, but I don't know how
>>>>> we'd detect that at all.
>>>>>
>>>>> # Number of bytes reserved for only the inode record, excluding the
>>>>> # immediate fork areas.
>>>>> _xfs_inode_core_bytes()
>>>>> {
>>>>> 	echo 176
>>>>> }
>>>>>
>>>>> I guess?  Or extract it from tests/xfs/122.out?
>>>>
>>>> Thanks for your comments.
>>>>
>>>> I guess hard-coded 176 in _xfs_inode_core_bytes() is fine for now
>>>> (It seems a bit weird to extract a number from a test expected result..)
>>>
>>> Which is wrong when testing a v4 filesystem - in that case the inode
>>> core size is 96 bytes and so max extents may be larger on v4
>>> filesystems than v5 filesystems....
>>
>> Do we really care v4 fs for now since it's deprecated?...
> 
> Yes, there are still lots of v4 filesystems in production
> environments. There shouldn't be many new ones, but there is a long
> tail of existing storage containing v4 filesystems that we must not
> break.
> 
> We have to support v4 filesystems for another few years yet, hence
> we still need solid test coverage on them to ensure we don't
> accidentally break something that is going to bite users before they
> migrate to newer filesystems....
> 
>> Darrick once also 
>> suggested using (isize / 16) but it seems it could take unnecessary time to
>> prepare.. Or we could just use (isize - 96) / 16 to keep v4 work.
> 
> It's taken me longer to write this email than it does to write the
> code to make it work properly. e.g.:
> 
> 	xfs_info $scratch | sed -ne 's/.*crc=\([01]\).*/\1/p'
> 
> And now we have 0 = v4, 1 = v5, and it's trivial to return the
> correct inode size.
> 
> You can even do this trivially with grep:
> 
> 	xfs_info $scratch | grep -wq "crc=1"
> 	if [ $? -eq 0 ]; then
> 		echo 176
> 	else
> 		echo 96
> 	fi
> 
> and now the return value tells us if we have a v4 or v5 filesystem.
> 
> -Dave.

Hi, David

I have written new versions, please see:
https://lore.kernel.org/fstests/20221207093147.1634425-1-ZiyangZhang@linux.alibaba.com/

Regards,
Zhang
diff mbox series

Patch

diff --git a/common/populate b/common/populate
index 6e004997..e179a300 100644
--- a/common/populate
+++ b/common/populate
@@ -71,6 +71,25 @@  __populate_create_dir() {
 	done
 }
 
+# Create a large directory and ensure that it's a btree format
+__populate_create_btree_dir() {
+	name="$1"
+	isize="$2"
+
+	mkdir -p "${name}"
+	d=0
+	while true; do
+		creat=mkdir
+		test "$((d % 20))" -eq 0 && creat=touch
+		$creat "${name}/$(printf "%.08d" "$d")"
+		if [ "$((d % 40))" -eq 0 ]; then
+			nexts="$($XFS_IO_PROG -c "stat" $name | grep 'fsxattr.nextents' | sed -e 's/^.*nextents = //g' -e 's/\([0-9]*\).*$/\1/g')"
+			[ "$nexts" -gt "$(((isize - 176) / 16))" ] && break
+		fi
+		d=$((d+1))
+	done
+}
+
 # Add a bunch of attrs to a file
 __populate_create_attr() {
 	name="$1"
@@ -176,6 +195,7 @@  _scratch_xfs_populate() {
 
 	blksz="$(stat -f -c '%s' "${SCRATCH_MNT}")"
 	dblksz="$(_xfs_get_dir_blocksize "$SCRATCH_MNT")"
+	isize="$($XFS_INFO_PROG "${SCRATCH_MNT}" | grep meta-data=.*isize | sed -e 's/^.*isize=//g' -e 's/\([0-9]*\).*$/\1/g')"
 	crc="$(_xfs_has_feature "$SCRATCH_MNT" crc -v)"
 	if [ $crc -eq 1 ]; then
 		leaf_hdr_size=64
@@ -226,7 +246,7 @@  _scratch_xfs_populate() {
 
 	# - BTREE
 	echo "+ btree dir"
-	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BTREE" "$((128 * dblksz / 40))" true
+	__populate_create_btree_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BTREE" "$isize"
 
 	# Symlinks
 	# - FMT_LOCAL