xfs: don't use in-core per-cpu fdblocks for !lazysbcount

Message ID	20210416091023.2143162-1-hsiangkao@redhat.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <linux-xfs-owner@kernel.org> From: Gao Xiang <hsiangkao@redhat.com> To: linux-xfs@vger.kernel.org Cc: Gao Xiang <hsiangkao@redhat.com>, Zorro Lang <zlang@redhat.com> Subject: [PATCH] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Date: Fri, 16 Apr 2021 17:10:23 +0800 Message-Id: <20210416091023.2143162-1-hsiangkao@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	xfs: don't use in-core per-cpu fdblocks for !lazysbcount \| expand xfs: don't use in-core per-cpu fdblocks for !lazysbcount

Gao Xiang April 16, 2021, 9:10 a.m. UTC

There are many paths which could trigger xfs_log_sb(), e.g.
  xfs_bmap_add_attrfork()
    -> xfs_log_sb()
, which overrided on-disk fdblocks by in-core per-CPU fdblocks.

However, for !lazysbcount cases, on-disk fdblocks is actually updated
by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
in-core fdblocks due to xfs_reserve_block() or whatever, see the
comment in xfs_unmountfs().

It could be observed by the following steps reported by Zorro [1]:

1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
2. mount $dev $mnt
3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
4. umount $mnt
5. xfs_repair -n $dev

yet due to commit f46e5a174655("xfs: fold sbcount quiesce logging
into log covering"), xfs_sync_sb() will be triggered even !lazysbcount
but xfs_log_need_covered() case when xfs_unmountfs(), so hard to
reproduce on kernel 5.12+.

After this patch, I've seen no strange so far on older kernels
for the testcase above without lazysbcount.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1949515

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
---
 fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Carlos Maiolino April 16, 2021, 2:10 p.m. UTC | #1

On Fri, Apr 16, 2021 at 05:10:23PM +0800, Gao Xiang wrote:
> There are many paths which could trigger xfs_log_sb(), e.g.
>   xfs_bmap_add_attrfork()
>     -> xfs_log_sb()
> , which overrided on-disk fdblocks by in-core per-CPU fdblocks.
> 
> However, for !lazysbcount cases, on-disk fdblocks is actually updated
> by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> in-core fdblocks due to xfs_reserve_block() or whatever, see the
> comment in xfs_unmountfs().
> 
> It could be observed by the following steps reported by Zorro [1]:
> 
> 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> 2. mount $dev $mnt
> 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> 4. umount $mnt
> 5. xfs_repair -n $dev
> 
> yet due to commit f46e5a174655("xfs: fold sbcount quiesce logging
> into log covering"),

> ... xfs_sync_sb() will be triggered even !lazysbcount
> but xfs_log_need_covered() case when xfs_unmountfs(), so hard to
> reproduce on kernel 5.12+.

I think this could be rephrased, but I am not native english-speaker either, so
I can't say much. Maybe...

"xfs_sync_sb() will be triggered if no log covering is needed and !lazysbcount."

> Reported-by: Zorro Lang <zlang@redhat.com>
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 60e6d255e5e2..423dada3f64c 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -928,7 +928,13 @@ xfs_log_sb(
>  
>  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
>  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> +		struct xfs_dsb	*dsb = bp->b_addr;
> +
> +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> +	} else {
> +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	}

The patch looks good to me, feel free to add:

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

Darrick J. Wong April 16, 2021, 4 p.m. UTC | #2

On Fri, Apr 16, 2021 at 05:10:23PM +0800, Gao Xiang wrote:
> There are many paths which could trigger xfs_log_sb(), e.g.
>   xfs_bmap_add_attrfork()
>     -> xfs_log_sb()
> , which overrided on-disk fdblocks by in-core per-CPU fdblocks.
> 
> However, for !lazysbcount cases, on-disk fdblocks is actually updated
> by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> in-core fdblocks due to xfs_reserve_block() or whatever, see the
> comment in xfs_unmountfs().
> 
> It could be observed by the following steps reported by Zorro [1]:
> 
> 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> 2. mount $dev $mnt
> 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> 4. umount $mnt
> 5. xfs_repair -n $dev
> 
> yet due to commit f46e5a174655("xfs: fold sbcount quiesce logging
> into log covering"), xfs_sync_sb() will be triggered even !lazysbcount
> but xfs_log_need_covered() case when xfs_unmountfs(), so hard to
> reproduce on kernel 5.12+.

Um, I can't understand this(?), possibly because I can't get to RHBZ and
therefore have very little context to start from. :(

Are you saying that because the f46e commit removed the xfs_sync_sb
calls from unmountfs for !lazysb filesystems, we no longer log the
summary counters at unmount?  Which means that we no longer write the
incore percpu fdblocks count to disk at unmount after we've torn down
all the incore space reservations (when sb_fdblocks == m_fdblocks)?

So that means that for !lazysb fses, the only time we log the sb
counters is during transactions, and when we do log the counters we
actually log the wrong value, since the incore reservations should never
escape to disk?  Hence the fix below?

And then by extension, is the reason that nobody noticed before is that
we always used to log the correct value at unmount, so fses with clean
logs always have the correct value, and fses with dirty logs will
recompute fdblocks after log recovery by summing the AGF free blocks
counts?

(Or possibly nobody uses !lazysb filesystems anymore?)

I /think/ the code change looks ok, but as you might surmise from the
large quantity of questions, I'm not ready to RVB this yet.  The commit
message seems like a good place to answer those questions.

> After this patch, I've seen no strange so far on older kernels
> for the testcase above without lazysbcount.
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1949515

This strangely <cough> doesn't seem to be accessible to the public at
large, since <cough> someone at RedHat decided to block all Oracle IPs
<cough>.

--D

> 
> Reported-by: Zorro Lang <zlang@redhat.com>
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 60e6d255e5e2..423dada3f64c 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -928,7 +928,13 @@ xfs_log_sb(
>  
>  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
>  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> +		struct xfs_dsb	*dsb = bp->b_addr;
> +
> +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> +	} else {
> +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	}
>  
>  	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
>  	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
> -- 
> 2.27.0
>

Gao Xiang April 16, 2021, 8:45 p.m. UTC | #3

Hi Carlos,

On Fri, Apr 16, 2021 at 04:10:18PM +0200, Carlos Maiolino wrote:
> On Fri, Apr 16, 2021 at 05:10:23PM +0800, Gao Xiang wrote:
> > There are many paths which could trigger xfs_log_sb(), e.g.
> >   xfs_bmap_add_attrfork()
> >     -> xfs_log_sb()
> > , which overrided on-disk fdblocks by in-core per-CPU fdblocks.
> > 
> > However, for !lazysbcount cases, on-disk fdblocks is actually updated
> > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> > in-core fdblocks due to xfs_reserve_block() or whatever, see the
> > comment in xfs_unmountfs().
> > 
> > It could be observed by the following steps reported by Zorro [1]:
> > 
> > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> > 2. mount $dev $mnt
> > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> > 4. umount $mnt
> > 5. xfs_repair -n $dev
> > 
> > yet due to commit f46e5a174655("xfs: fold sbcount quiesce logging
> > into log covering"),
> 
> > ... xfs_sync_sb() will be triggered even !lazysbcount
> > but xfs_log_need_covered() case when xfs_unmountfs(), so hard to
> > reproduce on kernel 5.12+.
> 
> I think this could be rephrased, but I am not native english-speaker either, so
> I can't say much. Maybe...
> 
> "xfs_sync_sb() will be triggered if no log covering is needed and !lazysbcount."

Thanks for your suggestion, I will update the description as...
 "xfs_sync_sb() will also be triggered if log covering is needed
  and !lazysbcount."

> 
> > Reported-by: Zorro Lang <zlang@redhat.com>
> > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > index 60e6d255e5e2..423dada3f64c 100644
> > --- a/fs/xfs/libxfs/xfs_sb.c
> > +++ b/fs/xfs/libxfs/xfs_sb.c
> > @@ -928,7 +928,13 @@ xfs_log_sb(
> >  
> >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > +		struct xfs_dsb	*dsb = bp->b_addr;
> > +
> > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> > +	} else {
> > +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > +	}
> 
> The patch looks good to me, feel free to add:
> 
> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> 

Thanks for your review!

Thanks,
Gao Xiang

> -- 
> Carlos
>

Gao Xiang April 16, 2021, 9:13 p.m. UTC | #4

Hi Darrick,

On Fri, Apr 16, 2021 at 09:00:13AM -0700, Darrick J. Wong wrote:
> On Fri, Apr 16, 2021 at 05:10:23PM +0800, Gao Xiang wrote:
> > There are many paths which could trigger xfs_log_sb(), e.g.
> >   xfs_bmap_add_attrfork()
> >     -> xfs_log_sb()
> > , which overrided on-disk fdblocks by in-core per-CPU fdblocks.
> > 
> > However, for !lazysbcount cases, on-disk fdblocks is actually updated
> > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> > in-core fdblocks due to xfs_reserve_block() or whatever, see the
> > comment in xfs_unmountfs().
> > 
> > It could be observed by the following steps reported by Zorro [1]:
> > 
> > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> > 2. mount $dev $mnt
> > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> > 4. umount $mnt
> > 5. xfs_repair -n $dev
> > 
> > yet due to commit f46e5a174655("xfs: fold sbcount quiesce logging
> > into log covering"), xfs_sync_sb() will be triggered even !lazysbcount
> > but xfs_log_need_covered() case when xfs_unmountfs(), so hard to
> > reproduce on kernel 5.12+.
> 
> Um, I can't understand this(?), possibly because I can't get to RHBZ and
> therefore have very little context to start from. :(

Very sorry about that.. I realized it doesn't access at all without some
permission after sending out the patch. :(

> 
> Are you saying that because the f46e commit removed the xfs_sync_sb
> calls from unmountfs for !lazysb filesystems, we no longer log the
> summary counters at unmount?  Which means that we no longer write the
> incore percpu fdblocks count to disk at unmount after we've torn down
> all the incore space reservations (when sb_fdblocks == m_fdblocks)?

Er.. I think that is by reverse, before commit f46e, we no longer logged
the summary counters at unmount, due to 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_mount.c?h=v5.11#n1177
  xfs_unmountfs
    -> xfs_log_sbcount
      -> !xfs_sb_version_haslazysbcount
        -> return 0 (xfs_sync_sb bypassed).

So the only time we update the ondisk fdblocks was during transactions,
but xfs_log_sb() corrupted this (due to no summary counters logging at
unmount).

After f46e, it became
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_log.c?h=v5.12-rc2#n982
  xfs_unmountfs
    -> xfs_log_unmount
      -> xfs_log_clean
        -> xfs_log_cover

So if xfs_log_need_covered(mp) == true and
!xfs_sb_version_haslazysbcount(&mp->m_sb),
xfs_sync_sb() will be triggered to cover the log, So
it's hard to reproduce on the current kernel (at least on my side.)

But I have no idea xfs_log_need_covered(mp) is always true at that time,
and the patchset seems a bit large and (possibly) hard to backport...

> 
> So that means that for !lazysb fses, the only time we log the sb
> counters is during transactions, and when we do log the counters we
> actually log the wrong value, since the incore reservations should never
> escape to disk?  Hence the fix below?

Yes

> 
> And then by extension, is the reason that nobody noticed before is that
> we always used to log the correct value at unmount, so fses with clean
> logs always have the correct value, and fses with dirty logs will
> recompute fdblocks after log recovery by summing the AGF free blocks
> counts?

Nope, prior to 5.12-rc1, I think it was broken for a very long time...

> 
> (Or possibly nobody uses !lazysb filesystems anymore?)
> 

Zorro found this days ago on rhel 8 kernel (4.18, maybe he's doing
some new testcases to cover this), and I think it was broken for much
much long time (I don't know which version it was broken first), maybe
it has little impact since it's just a freespace block counter.

So I think it should be backported to many stable kernel versions (?)
But I have no idea when it was broken...

> I /think/ the code change looks ok, but as you might surmise from the
> large quantity of questions, I'm not ready to RVB this yet.  The commit
> message seems like a good place to answer those questions.
> 
> > After this patch, I've seen no strange so far on older kernels
> > for the testcase above without lazysbcount.
> > 
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1949515
> 
> This strangely <cough> doesn't seem to be accessible to the public at
> large, since <cough> someone at RedHat decided to block all Oracle IPs
> <cough>.

<cough> will get rid of it the next time...

Thanks,
Gao Xiang

> 
> --D
> 
> > 
> > Reported-by: Zorro Lang <zlang@redhat.com>
> > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > index 60e6d255e5e2..423dada3f64c 100644
> > --- a/fs/xfs/libxfs/xfs_sb.c
> > +++ b/fs/xfs/libxfs/xfs_sb.c
> > @@ -928,7 +928,13 @@ xfs_log_sb(
> >  
> >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > +		struct xfs_dsb	*dsb = bp->b_addr;
> > +
> > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> > +	} else {
> > +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > +	}
> >  
> >  	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> >  	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
> > -- 
> > 2.27.0
> > 
>

Gao Xiang April 16, 2021, 9:36 p.m. UTC | #5

On Sat, Apr 17, 2021 at 05:13:20AM +0800, Gao Xiang wrote:
> Hi Darrick,
> 
> On Fri, Apr 16, 2021 at 09:00:13AM -0700, Darrick J. Wong wrote:
> > On Fri, Apr 16, 2021 at 05:10:23PM +0800, Gao Xiang wrote:
> > > There are many paths which could trigger xfs_log_sb(), e.g.
> > >   xfs_bmap_add_attrfork()
> > >     -> xfs_log_sb()
> > > , which overrided on-disk fdblocks by in-core per-CPU fdblocks.
> > > 
> > > However, for !lazysbcount cases, on-disk fdblocks is actually updated
> > > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> > > in-core fdblocks due to xfs_reserve_block() or whatever, see the
> > > comment in xfs_unmountfs().
> > > 
> > > It could be observed by the following steps reported by Zorro [1]:
> > > 
> > > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> > > 2. mount $dev $mnt
> > > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> > > 4. umount $mnt
> > > 5. xfs_repair -n $dev
> > > 
> > > yet due to commit f46e5a174655("xfs: fold sbcount quiesce logging
> > > into log covering"), xfs_sync_sb() will be triggered even !lazysbcount
> > > but xfs_log_need_covered() case when xfs_unmountfs(), so hard to
> > > reproduce on kernel 5.12+.
> > 
> > Um, I can't understand this(?), possibly because I can't get to RHBZ and
> > therefore have very little context to start from. :(
> 
> Very sorry about that.. I realized it doesn't access at all without some
> permission after sending out the patch. :(
> 
> > 
> > Are you saying that because the f46e commit removed the xfs_sync_sb
> > calls from unmountfs for !lazysb filesystems, we no longer log the
> > summary counters at unmount?  Which means that we no longer write the
> > incore percpu fdblocks count to disk at unmount after we've torn down
> > all the incore space reservations (when sb_fdblocks == m_fdblocks)?
> 
> Er.. I think that is by reverse, before commit f46e, we no longer logged
> the summary counters at unmount, due to 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_mount.c?h=v5.11#n1177
>   xfs_unmountfs
>     -> xfs_log_sbcount
>       -> !xfs_sb_version_haslazysbcount
>         -> return 0 (xfs_sync_sb bypassed).
> 
> So the only time we update the ondisk fdblocks was during transactions,
> but xfs_log_sb() corrupted this (due to no summary counters logging at
> unmount).
> 
> After f46e, it became
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_log.c?h=v5.12-rc2#n982
>   xfs_unmountfs
>     -> xfs_log_unmount
>       -> xfs_log_clean
>         -> xfs_log_cover
> 
> So if xfs_log_need_covered(mp) == true and
> !xfs_sb_version_haslazysbcount(&mp->m_sb),
> xfs_sync_sb() will be triggered to cover the log, So
> it's hard to reproduce on the current kernel (at least on my side.)
> 
> But I have no idea xfs_log_need_covered(mp) is always true at that time,
> and the patchset seems a bit large and (possibly) hard to backport...
> 

btw, after checking xfs_check_summary_counts(), I think by now,
sb_fdblocks won't be recalculated if !xfs_sb_version_haslazysbcount
when suddenly power outages (with dirty log...)

So for !xfs_sb_version_haslazysbcount cases, we really rely on the
correct on-disk sb_fdblocks all the time...

Anyway, I also think we should warn !lazysb fs deprecated runtimely
by now (even we have XFS_SUPPORT_V4 build config.)

> > 
> > So that means that for !lazysb fses, the only time we log the sb
> > counters is during transactions, and when we do log the counters we
> > actually log the wrong value, since the incore reservations should never
> > escape to disk?  Hence the fix below?
> 
> Yes
> 
> > 
> > And then by extension, is the reason that nobody noticed before is that
> > we always used to log the correct value at unmount, so fses with clean
> > logs always have the correct value, and fses with dirty logs will
> > recompute fdblocks after log recovery by summing the AGF free blocks
> > counts?
> 
> Nope, prior to 5.12-rc1, I think it was broken for a very long time...
> 
> > 
> > (Or possibly nobody uses !lazysb filesystems anymore?)
> > 
> 
> Zorro found this days ago on rhel 8 kernel (4.18, maybe he's doing
> some new testcases to cover this), and I think it was broken for much
> much long time (I don't know which version it was broken first), maybe
> it has little impact since it's just a freespace block counter.
> 
> So I think it should be backported to many stable kernel versions (?)
> But I have no idea when it was broken...
> 
> > I /think/ the code change looks ok, but as you might surmise from the
> > large quantity of questions, I'm not ready to RVB this yet.  The commit
> > message seems like a good place to answer those questions.
> > 
> > > After this patch, I've seen no strange so far on older kernels
> > > for the testcase above without lazysbcount.
> > > 
> > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1949515
> > 
> > This strangely <cough> doesn't seem to be accessible to the public at
> > large, since <cough> someone at RedHat decided to block all Oracle IPs
> > <cough>.
> 
> <cough> will get rid of it the next time...
> 
> Thanks,
> Gao Xiang
> 
> > 
> > --D
> > 
> > > 
> > > Reported-by: Zorro Lang <zlang@redhat.com>
> > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > > index 60e6d255e5e2..423dada3f64c 100644
> > > --- a/fs/xfs/libxfs/xfs_sb.c
> > > +++ b/fs/xfs/libxfs/xfs_sb.c
> > > @@ -928,7 +928,13 @@ xfs_log_sb(
> > >  
> > >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > +		struct xfs_dsb	*dsb = bp->b_addr;
> > > +
> > > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> > > +	} else {
> > > +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > +	}
> > >  
> > >  	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > >  	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
> > > -- 
> > > 2.27.0
> > > 
> >

Darrick J. Wong April 17, 2021, 12:19 a.m. UTC | #6

On Sat, Apr 17, 2021 at 05:13:20AM +0800, Gao Xiang wrote:
> Hi Darrick,
> 
> On Fri, Apr 16, 2021 at 09:00:13AM -0700, Darrick J. Wong wrote:
> > On Fri, Apr 16, 2021 at 05:10:23PM +0800, Gao Xiang wrote:
> > > There are many paths which could trigger xfs_log_sb(), e.g.
> > >   xfs_bmap_add_attrfork()
> > >     -> xfs_log_sb()
> > > , which overrided on-disk fdblocks by in-core per-CPU fdblocks.
> > > 
> > > However, for !lazysbcount cases, on-disk fdblocks is actually updated
> > > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> > > in-core fdblocks due to xfs_reserve_block() or whatever, see the
> > > comment in xfs_unmountfs().
> > > 
> > > It could be observed by the following steps reported by Zorro [1]:
> > > 
> > > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> > > 2. mount $dev $mnt
> > > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> > > 4. umount $mnt
> > > 5. xfs_repair -n $dev
> > > 
> > > yet due to commit f46e5a174655("xfs: fold sbcount quiesce logging
> > > into log covering"), xfs_sync_sb() will be triggered even !lazysbcount
> > > but xfs_log_need_covered() case when xfs_unmountfs(), so hard to
> > > reproduce on kernel 5.12+.
> > 
> > Um, I can't understand this(?), possibly because I can't get to RHBZ and
> > therefore have very little context to start from. :(
> 
> Very sorry about that.. I realized it doesn't access at all without some
> permission after sending out the patch. :(

To be fair, I don't think it's part of the standard training that even
the public bugzilla bugs aren't visible to certain least-favored
nations. ;)

> > 
> > Are you saying that because the f46e commit removed the xfs_sync_sb
> > calls from unmountfs for !lazysb filesystems, we no longer log the
> > summary counters at unmount?  Which means that we no longer write the
> > incore percpu fdblocks count to disk at unmount after we've torn down
> > all the incore space reservations (when sb_fdblocks == m_fdblocks)?
> 
> Er.. I think that is by reverse, before commit f46e, we no longer logged
> the summary counters at unmount, due to 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_mount.c?h=v5.11#n1177
>   xfs_unmountfs
>     -> xfs_log_sbcount
>       -> !xfs_sb_version_haslazysbcount
>         -> return 0 (xfs_sync_sb bypassed).
> 
> So the only time we update the ondisk fdblocks was during transactions,
> but xfs_log_sb() corrupted this (due to no summary counters logging at
> unmount).

*OH* ok, so this isn't a fix for a regression in Brian's log covering
refactoring series that went into 5.12; this is a fix for a years old
bug that may very well have been there since the introduction of ...
delayed allocation?  I guess?

At least that makes the justification easier -- in !lazysbcount mode, we
must only update the primary super's fdblocks counter to reflect
whatever update we made to the ondisk metadata, which means that we have
to use mp->m_sb.sb_fdblocks.

(Whereas in lazysbcount mode where we only update the sb counters as
part of cleanly unmounting the log after purging all the incore
reservations and therefore can use m_fdblocks...)

> 
> After f46e, it became
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_log.c?h=v5.12-rc2#n982
>   xfs_unmountfs
>     -> xfs_log_unmount
>       -> xfs_log_clean
>         -> xfs_log_cover
> 
> So if xfs_log_need_covered(mp) == true and
> !xfs_sb_version_haslazysbcount(&mp->m_sb),
> xfs_sync_sb() will be triggered to cover the log, So
> it's hard to reproduce on the current kernel (at least on my side.)

Ah

> But I have no idea xfs_log_need_covered(mp) is always true at that time,
> and the patchset seems a bit large and (possibly) hard to backport...

I wouldn't backport that to a stable series. :)

> > So that means that for !lazysb fses, the only time we log the sb
> > counters is during transactions, and when we do log the counters we
> > actually log the wrong value, since the incore reservations should never
> > escape to disk?  Hence the fix below?
> 
> Yes
> 
> > 
> > And then by extension, is the reason that nobody noticed before is that
> > we always used to log the correct value at unmount, so fses with clean
> > logs always have the correct value, and fses with dirty logs will
> > recompute fdblocks after log recovery by summing the AGF free blocks
> > counts?
> 
> Nope, prior to 5.12-rc1, I think it was broken for a very long time...

Yeah, I got that backwards. :(

> > 
> > (Or possibly nobody uses !lazysb filesystems anymore?)
> > 
> 
> Zorro found this days ago on rhel 8 kernel (4.18, maybe he's doing
> some new testcases to cover this), and I think it was broken for much
> much long time (I don't know which version it was broken first), maybe
> it has little impact since it's just a freespace block counter.

Wrong counters mean wrong ENOSPC decisions...

> So I think it should be backported to many stable kernel versions (?)
> But I have no idea when it was broken...
> 
> > I /think/ the code change looks ok, but as you might surmise from the
> > large quantity of questions, I'm not ready to RVB this yet.  The commit
> > message seems like a good place to answer those questions.
> > 
> > > After this patch, I've seen no strange so far on older kernels
> > > for the testcase above without lazysbcount.
> > > 
> > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1949515
> > 
> > This strangely <cough> doesn't seem to be accessible to the public at
> > large, since <cough> someone at RedHat decided to block all Oracle IPs
> > <cough>.
> 
> <cough> will get rid of it the next time...
> 
> Thanks,
> Gao Xiang
> 
> > 
> > --D
> > 
> > > 
> > > Reported-by: Zorro Lang <zlang@redhat.com>
> > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > > index 60e6d255e5e2..423dada3f64c 100644
> > > --- a/fs/xfs/libxfs/xfs_sb.c
> > > +++ b/fs/xfs/libxfs/xfs_sb.c
> > > @@ -928,7 +928,13 @@ xfs_log_sb(
> > >  
> > >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > +		struct xfs_dsb	*dsb = bp->b_addr;
> > > +
> > > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);

Hmm... is this really needed?  I thought in !lazysbcount mode,
xfs_trans_apply_sb_deltas updates the ondisk super buffer directly.
So aren't all three of these updates unnecessary?

--D

> > > +	} else {
> > > +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > +	}
> > >  
> > >  	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > >  	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
> > > -- 
> > > 2.27.0
> > > 
> > 
>

Dave Chinner April 17, 2021, 1:57 a.m. UTC | #7

On Fri, Apr 16, 2021 at 05:19:41PM -0700, Darrick J. Wong wrote:
> On Sat, Apr 17, 2021 at 05:13:20AM +0800, Gao Xiang wrote:
> > Hi Darrick,
> > 
> > On Fri, Apr 16, 2021 at 09:00:13AM -0700, Darrick J. Wong wrote:
> > > On Fri, Apr 16, 2021 at 05:10:23PM +0800, Gao Xiang wrote:
.....
> > > Are you saying that because the f46e commit removed the xfs_sync_sb
> > > calls from unmountfs for !lazysb filesystems, we no longer log the
> > > summary counters at unmount?  Which means that we no longer write the
> > > incore percpu fdblocks count to disk at unmount after we've torn down
> > > all the incore space reservations (when sb_fdblocks == m_fdblocks)?
> > 
> > Er.. I think that is by reverse, before commit f46e, we no longer logged
> > the summary counters at unmount, due to 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_mount.c?h=v5.11#n1177
> >   xfs_unmountfs
> >     -> xfs_log_sbcount
> >       -> !xfs_sb_version_haslazysbcount
> >         -> return 0 (xfs_sync_sb bypassed).
> > 
> > So the only time we update the ondisk fdblocks was during transactions,
> > but xfs_log_sb() corrupted this (due to no summary counters logging at
> > unmount).
> 
> *OH* ok, so this isn't a fix for a regression in Brian's log covering
> refactoring series that went into 5.12; this is a fix for a years old
> bug that may very well have been there since the introduction of ...
> delayed allocation?  I guess?

No, xfs_trans_apply_sb_deltas() modifies the counters in the on disk
superblock buffer directly. These counters don't contain the
in-memory reservations for things like delalloc that are held in the
counters in the xfs_mount (either percpu or mp->m_sb.sb_*).

The whole point of lazy superblock counters was avoiding this disk
buffer update, because it required taking the superblock buffer lock
and that serialised every transaction commit that did
allocation/freeing. This serialised the entire transaction system,
and effectively globally limited XFS to around 20,000 commits/sec....

The original code was this:

/*
 * xfs_log_sbcount
 *
 * Called either periodically to keep the on disk superblock values
 * roughly up to date or from unmount to make sure the values are
 * correct on a clean unmount.
 *
 * Note this code can be called during the process of freezing, so
 * we may need to use the transaction allocator which does not not
 * block when the transaction subsystem is in its frozen state.
 */
int
xfs_log_sbcount(
       xfs_mount_t     *mp,
       uint            sync)
{
       xfs_trans_t     *tp;
       int             error;

       if (!xfs_fs_writable(mp))
               return 0;

       xfs_icsb_sync_counters(mp);

       /*
        * we don't need to do this if we are updating the superblock
        * counters on every modification.
        */
       if (!xfs_sb_version_haslazysbcount(&mp->m_sb))
               return 0;

       tp = _xfs_trans_alloc(mp, XFS_TRANS_SB_COUNT);
       error = xfs_trans_reserve(tp, 0, mp->m_sb.sb_sectsize + 128, 0, 0,
                                       XFS_DEFAULT_LOG_COUNT);
       if (error) {
               xfs_trans_cancel(tp, 0);
               return error;
       }

       xfs_mod_sb(tp, XFS_SB_IFREE | XFS_SB_ICOUNT | XFS_SB_FDBLOCKS);
       if (sync)
               xfs_trans_set_sync(tp);
       xfs_trans_commit(tp, 0);

       return 0;
}

And that slowly ended up being modified and merged into
xfs_log_sb()/xfs_sync_sb() and xfs_log_sb() gained more callers that
didn't have the lazysbcount check. i.e. we lost the original reason
for doing this counter update, and so we end up using it
incorrectly.  Brian's log covering rework got rid of the explicit
xfs_log_sbcount() call in the unmount path and replaced it with this
"log covering is not needed sometimes when lazysbcounters are not in
use". So now there isn't any of the original "counter updates are
only necessary for lazysbcount filesystems" code left...

Going back further, we always used to write the superblock at
unmount, and this always used to happen after all the reservations
had been released. Hence it didn't matter whether lazy-sb was
enabled or not, the unmount always wrote out the correct value to
the superblock.

This explicit superblock write was removed from xfs_unmountfs() back
in 2012 by commit 211e4d434bd7 ("xfs: implement freezing by emptying
the AIL") where is was replaced in the unmount path by a call to
xfs_ail_push_all_sync(mp->m_ail). With an explicit call to
xfs_log_sbcount() in the unmountfs path, it was clear that this
would result in the superblock getting written if counters needed
syncing.

IOWs, these changes in 2012 meant that if the superblock is not
dirty, we don't actually write it at unmount. Which means that, for
a non-lazy sb counter filesyste, if the last thing to modify the
superblock was a call to xfs_log_sbcount(), there's every chance
that the superblock that is written back contains the in-memory
values in it from the percpu counters, not the value from
mp->m_sb.sb_fdblocks....

It's only taken ~10 years of testing to find an escape on a clean
unmount.... :/

So, yeah, this seems like a combination of changes over a period of
years adding up to incorrectly logging the superblock counts and not
noticing because unmount on lazycount filesystems rewrites them to
be correct on a clean unmount. BUt with lazysbcount == 0, we've
ended up with an escape because we don't actually update and write
the superblock at unmount anymore.

> > But I have no idea xfs_log_need_covered(mp) is always true at that time,
> > and the patchset seems a bit large and (possibly) hard to backport...
> 
> I wouldn't backport that to a stable series. :)

Nor is it necessary to fix the problem.

> > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > > > index 60e6d255e5e2..423dada3f64c 100644
> > > > --- a/fs/xfs/libxfs/xfs_sb.c
> > > > +++ b/fs/xfs/libxfs/xfs_sb.c
> > > > @@ -928,7 +928,13 @@ xfs_log_sb(
> > > >  
> > > >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > > >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > > +		struct xfs_dsb	*dsb = bp->b_addr;
> > > > +
> > > > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> 
> Hmm... is this really needed?  I thought in !lazysbcount mode,
> xfs_trans_apply_sb_deltas updates the ondisk super buffer directly.
> So aren't all three of these updates unnecessary?

Yup, now I understand the issue, the fix is simply to avoid these
updates for !lazysb. i.e. it should just be:

	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
		mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
	}
	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);

Cheers,

Dave.

Gao Xiang April 17, 2021, 2:20 a.m. UTC | #8

Hi Darrick and Dave,

On Sat, Apr 17, 2021 at 11:57:02AM +1000, Dave Chinner wrote:
> On Fri, Apr 16, 2021 at 05:19:41PM -0700, Darrick J. Wong wrote:
> > On Sat, Apr 17, 2021 at 05:13:20AM +0800, Gao Xiang wrote:

...

> 
> Nor is it necessary to fix the problem.
> 
> > > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > > > > index 60e6d255e5e2..423dada3f64c 100644
> > > > > --- a/fs/xfs/libxfs/xfs_sb.c
> > > > > +++ b/fs/xfs/libxfs/xfs_sb.c
> > > > > @@ -928,7 +928,13 @@ xfs_log_sb(
> > > > >  
> > > > >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > > > >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > > > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > > > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > > > +		struct xfs_dsb	*dsb = bp->b_addr;
> > > > > +
> > > > > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> > 
> > Hmm... is this really needed?  I thought in !lazysbcount mode,
> > xfs_trans_apply_sb_deltas updates the ondisk super buffer directly.
> > So aren't all three of these updates unnecessary?
> 
> Yup, now I understand the issue, the fix is simply to avoid these
> updates for !lazysb. i.e. it should just be:
> 
> 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> 		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> 		mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> 		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> 	}
> 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);

I did as this because xfs_sb_to_disk() will override them, see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_sb.c#n629

...
	to->sb_icount = cpu_to_be64(from->sb_icount);
	to->sb_ifree = cpu_to_be64(from->sb_ifree);
	to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);

As an alternative, I was once to wrap it as:

xfs_sb_to_disk() {
...
	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
		to->sb_icount = cpu_to_be64(from->sb_icount);
		to->sb_ifree = cpu_to_be64(from->sb_ifree);
		to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
	}
...
}

Yet after I observed the other callers of xfs_sb_to_disk() (e.g. growfs
and online repair), I think a better modification is the way I proposed
here, so no need to update xfs_sb_to_disk() and the other callers (since
!lazysbcount is not recommended at all.)

It's easier to backport and less conflict, and btw !lazysbcount also need
to be warned out and deprecated from now.

Thanks,
Gao Xiang

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

Dave Chinner April 17, 2021, 10:32 p.m. UTC | #9

On Sat, Apr 17, 2021 at 10:20:13AM +0800, Gao Xiang wrote:
> Hi Darrick and Dave,
> 
> On Sat, Apr 17, 2021 at 11:57:02AM +1000, Dave Chinner wrote:
> > On Fri, Apr 16, 2021 at 05:19:41PM -0700, Darrick J. Wong wrote:
> > > On Sat, Apr 17, 2021 at 05:13:20AM +0800, Gao Xiang wrote:
> 
> ...
> 
> > 
> > Nor is it necessary to fix the problem.
> > 
> > > > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > > > > > index 60e6d255e5e2..423dada3f64c 100644
> > > > > > --- a/fs/xfs/libxfs/xfs_sb.c
> > > > > > +++ b/fs/xfs/libxfs/xfs_sb.c
> > > > > > @@ -928,7 +928,13 @@ xfs_log_sb(
> > > > > >  
> > > > > >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > > > > >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > > > > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > > > > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > > > > +		struct xfs_dsb	*dsb = bp->b_addr;
> > > > > > +
> > > > > > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> > > 
> > > Hmm... is this really needed?  I thought in !lazysbcount mode,
> > > xfs_trans_apply_sb_deltas updates the ondisk super buffer directly.
> > > So aren't all three of these updates unnecessary?
> > 
> > Yup, now I understand the issue, the fix is simply to avoid these
> > updates for !lazysb. i.e. it should just be:
> > 
> > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > 		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > 		mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > 		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > 	}
> > 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> 
> I did as this because xfs_sb_to_disk() will override them, see:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_sb.c#n629
> 
> ...
> 	to->sb_icount = cpu_to_be64(from->sb_icount);
> 	to->sb_ifree = cpu_to_be64(from->sb_ifree);
> 	to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);

> As an alternative, I was once to wrap it as:
> 
> xfs_sb_to_disk() {
> ...
> 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> 		to->sb_icount = cpu_to_be64(from->sb_icount);
> 		to->sb_ifree = cpu_to_be64(from->sb_ifree);
> 		to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> 	}
> ...
> }

This goes back to a commit in 2015 dropping the fields parameter from
xfs_sb_to_disk(). Originally, we only formatted the requested
parameters to the on-disk buffer from the in-memory superblock amd
this was removed in 2015 by commit 4d11a4023940 ("xfs: remove bitfield
based superblock updates") which meant all superblock modification
calls updated the entire on-disk log.

Up to that point, only xfs_log_sbcount() updated the on-disk
counters in the superblock buffer, and only for lazy-count enabled
filesystems. And xfs_bmap_add_attrfork() would only update the
features fields in the superblock, and nothing else. Now every
modification to the sueprblock updates everythign from the in-memory
state.

However, there are two sets of in-memory state for the superblock
accounting - the superblock fields and the per-cpu coutners. The
per-cpu counters are the ones we apply reservations to and the ones
we use for space tracking. The counters in the mp->m_sb are updated
in the same manner as the on-disk counters.

That is, xfs_trans_apply_sb_deltas() only applies deltas to the
directly to the in-memory superblock in the case of !lazy-count, so
these counters are actually a correct representation of the on-disk
value of the accounting when lazy-count=0.

Hence we should always be able to write the counters in mp->m_sb
directly to the on-disk superblock buffer in the case of
lazy-count=0 and the values should be correct. lazy-count=1 only
updates the mp->m_sb counters from the per-cpu counters so that the
on-disk counters aren't wildly inaccruate, and so that when we
unmount/freeze/etc the counters are actually correct.

Long story short, I think xfs_sb_to_disk() always updating the
on-disk superblock from mp->m_sb is safe to do as the counters in
mp->m_sb are updated in the same manner during transaction commit as
the superblock buffer counters for lazy-count=0....

> Yet after I observed the other callers of xfs_sb_to_disk() (e.g. growfs
> and online repair), I think a better modification is the way I proposed
> here, so no need to update xfs_sb_to_disk() and the other callers (since
> !lazysbcount is not recommended at all.)

Yup that's the original reason for having a fields flag to do
condition update of the on-disk buffer from the in-memory state.
Different code has diferrent requirements, but it looked like this
didn't matter for lazy-count filesystems because other checks
avoided the update of m_sb fields. What was missed in that
optimisation was the fact lazy-count=0 never updated the counters
directly.

/me is now wondering why we even bother with !lazy-count anymore.

WE've updated the agr btree block accounting unconditionally since
lazy-count was added, and scrub will always report a mismatch in
counts if they exist regardless of lazy-count. So why don't we just
start ignoring the on-disk value and always use lazy-count based
updates?

We only added it as mkfs option/feature bit because of the recovery
issue with not being able to account for btree blocks properly at
mount time, but now we have mechanisms for counting blocks in btrees
so even that has gone away. So we could actually just turn
on lazy-count at mount time, and we could get rid of this whole
set of subtle conditional behaviours we clearly aren't able to
exercise effectively...

> It's easier to backport and less conflict, and btw !lazysbcount also need
> to be warned out and deprecated from now.

You have to use -m crc=0 to turn off lazycount, and the deprecation
warning should come from -m crc=0...

Cheers,

Dave.

Gao Xiang April 17, 2021, 11:59 p.m. UTC | #10

Hi Dave,

On Sun, Apr 18, 2021 at 08:32:01AM +1000, Dave Chinner wrote:
> On Sat, Apr 17, 2021 at 10:20:13AM +0800, Gao Xiang wrote:

...

> > > > Hmm... is this really needed?  I thought in !lazysbcount mode,
> > > > xfs_trans_apply_sb_deltas updates the ondisk super buffer directly.
> > > > So aren't all three of these updates unnecessary?
> > > 
> > > Yup, now I understand the issue, the fix is simply to avoid these
> > > updates for !lazysb. i.e. it should just be:
> > > 
> > > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > 		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > > 		mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > 		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > 	}
> > > 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > 
> > I did as this because xfs_sb_to_disk() will override them, see:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_sb.c#n629
> > 
> > ...
> > 	to->sb_icount = cpu_to_be64(from->sb_icount);
> > 	to->sb_ifree = cpu_to_be64(from->sb_ifree);
> > 	to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> 
> > As an alternative, I was once to wrap it as:
> > 
> > xfs_sb_to_disk() {
> > ...
> > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > 		to->sb_icount = cpu_to_be64(from->sb_icount);
> > 		to->sb_ifree = cpu_to_be64(from->sb_ifree);
> > 		to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> > 	}
> > ...
> > }
> 

...

> 
> That is, xfs_trans_apply_sb_deltas() only applies deltas to the
> directly to the in-memory superblock in the case of !lazy-count, so
> these counters are actually a correct representation of the on-disk
> value of the accounting when lazy-count=0.
> 
> Hence we should always be able to write the counters in mp->m_sb
> directly to the on-disk superblock buffer in the case of
> lazy-count=0 and the values should be correct. lazy-count=1 only
> updates the mp->m_sb counters from the per-cpu counters so that the
> on-disk counters aren't wildly inaccruate, and so that when we
> unmount/freeze/etc the counters are actually correct.
> 
> Long story short, I think xfs_sb_to_disk() always updating the
> on-disk superblock from mp->m_sb is safe to do as the counters in
> mp->m_sb are updated in the same manner during transaction commit as
> the superblock buffer counters for lazy-count=0....

Thanks for your long words, I have to say I don't quite get what's
your thought here, if my understanding is correct,
xfs_trans_apply_sb_deltas() for !lazy-count case just directly
update on-disk superblock (rather than in-memory superblock), see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_trans.c?h=v5.12-rc2#n501

	if (!xfs_sb_version_haslazysbcount(&(tp->t_mountp->m_sb))) {
		if (tp->t_icount_delta)
			be64_add_cpu(&sbp->sb_icount, tp->t_icount_delta);
		if (tp->t_ifree_delta)
			be64_add_cpu(&sbp->sb_ifree, tp->t_ifree_delta);
		if (tp->t_fdblocks_delta)
			be64_add_cpu(&sbp->sb_fdblocks, tp->t_fdblocks_delta);
		if (tp->t_res_fdblocks_delta)
			be64_add_cpu(&sbp->sb_fdblocks, tp->t_res_fdblocks_delta);
	}

That is why I think in-memory mp->m_sb.sb_icount, mp->m_sb.sb_ifree,
mp->m_sb.sb_fdblocks are all outdated at all (kindly correct me if I'm wrong
here)... so xfs_sb_to_disk() will replace on-disk fields with outdated
in-memory counters.

> 
> > Yet after I observed the other callers of xfs_sb_to_disk() (e.g. growfs
> > and online repair), I think a better modification is the way I proposed
> > here, so no need to update xfs_sb_to_disk() and the other callers (since
> > !lazysbcount is not recommended at all.)
> 
> Yup that's the original reason for having a fields flag to do
> condition update of the on-disk buffer from the in-memory state.
> Different code has diferrent requirements, but it looked like this
> didn't matter for lazy-count filesystems because other checks
> avoided the update of m_sb fields. What was missed in that
> optimisation was the fact lazy-count=0 never updated the counters
> directly.
> 
> /me is now wondering why we even bother with !lazy-count anymore.
> 
> WE've updated the agr btree block accounting unconditionally since
> lazy-count was added, and scrub will always report a mismatch in
> counts if they exist regardless of lazy-count. So why don't we just
> start ignoring the on-disk value and always use lazy-count based
> updates?
> 
> We only added it as mkfs option/feature bit because of the recovery
> issue with not being able to account for btree blocks properly at
> mount time, but now we have mechanisms for counting blocks in btrees
> so even that has gone away. So we could actually just turn
> on lazy-count at mount time, and we could get rid of this whole
> set of subtle conditional behaviours we clearly aren't able to
> exercise effectively...

If my understanding of the words above is correct, maybe that could
be unfriendly when users turned back to some old kernels. But
considering lazysbcount has been landed for quite quite long time,
I think that is practical as 2 patches:
 1) fix sb counters for !lazysbcount;
 2) turn on lazysbcount at the mount time from now (and warn users).

> 
> > It's easier to backport and less conflict, and btw !lazysbcount also need
> > to be warned out and deprecated from now.
> 
> You have to use -m crc=0 to turn off lazycount, and the deprecation
> warning should come from -m crc=0...

Yes, but I think 2030 is too far for this !lazysbcount feature, since
it seems easy to cause potential bugs. I think maybe we could get rid
of it as soon as possible.

Thanks,
Gao Xiang

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

Dave Chinner April 18, 2021, 10:08 p.m. UTC | #11

On Sun, Apr 18, 2021 at 07:59:48AM +0800, Gao Xiang wrote:
> Hi Dave,
> 
> On Sun, Apr 18, 2021 at 08:32:01AM +1000, Dave Chinner wrote:
> > On Sat, Apr 17, 2021 at 10:20:13AM +0800, Gao Xiang wrote:
> 
> ...
> 
> > > > > Hmm... is this really needed?  I thought in !lazysbcount mode,
> > > > > xfs_trans_apply_sb_deltas updates the ondisk super buffer directly.
> > > > > So aren't all three of these updates unnecessary?
> > > > 
> > > > Yup, now I understand the issue, the fix is simply to avoid these
> > > > updates for !lazysb. i.e. it should just be:
> > > > 
> > > > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > > 		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > > > 		mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > > 		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > > 	}
> > > > 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > > 
> > > I did as this because xfs_sb_to_disk() will override them, see:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_sb.c#n629
> > > 
> > > ...
> > > 	to->sb_icount = cpu_to_be64(from->sb_icount);
> > > 	to->sb_ifree = cpu_to_be64(from->sb_ifree);
> > > 	to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> > 
> > > As an alternative, I was once to wrap it as:
> > > 
> > > xfs_sb_to_disk() {
> > > ...
> > > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > 		to->sb_icount = cpu_to_be64(from->sb_icount);
> > > 		to->sb_ifree = cpu_to_be64(from->sb_ifree);
> > > 		to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> > > 	}
> > > ...
> > > }
> > 
> 
> ...
> 
> > 
> > That is, xfs_trans_apply_sb_deltas() only applies deltas to the
> > directly to the in-memory superblock in the case of !lazy-count, so
> > these counters are actually a correct representation of the on-disk
> > value of the accounting when lazy-count=0.
> > 
> > Hence we should always be able to write the counters in mp->m_sb
> > directly to the on-disk superblock buffer in the case of
> > lazy-count=0 and the values should be correct. lazy-count=1 only
> > updates the mp->m_sb counters from the per-cpu counters so that the
> > on-disk counters aren't wildly inaccruate, and so that when we
> > unmount/freeze/etc the counters are actually correct.
> > 
> > Long story short, I think xfs_sb_to_disk() always updating the
> > on-disk superblock from mp->m_sb is safe to do as the counters in
> > mp->m_sb are updated in the same manner during transaction commit as
> > the superblock buffer counters for lazy-count=0....
> 
> Thanks for your long words, I have to say I don't quite get what's
> your thought here, if my understanding is correct,
> xfs_trans_apply_sb_deltas() for !lazy-count case just directly
> update on-disk superblock (rather than in-memory superblock), see:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_trans.c?h=v5.12-rc2#n501
> 
> 	if (!xfs_sb_version_haslazysbcount(&(tp->t_mountp->m_sb))) {
> 		if (tp->t_icount_delta)
> 			be64_add_cpu(&sbp->sb_icount, tp->t_icount_delta);
> 		if (tp->t_ifree_delta)
> 			be64_add_cpu(&sbp->sb_ifree, tp->t_ifree_delta);
> 		if (tp->t_fdblocks_delta)
> 			be64_add_cpu(&sbp->sb_fdblocks, tp->t_fdblocks_delta);
> 		if (tp->t_res_fdblocks_delta)
> 			be64_add_cpu(&sbp->sb_fdblocks, tp->t_res_fdblocks_delta);
> 	}

Yeah, I think I misread this jumping between diffs, commits, the
historic tree, etc. got tangled up in the twisty, gnarly branches of
the code...

> > /me is now wondering why we even bother with !lazy-count anymore.
> > 
> > WE've updated the agr btree block accounting unconditionally since
> > lazy-count was added, and scrub will always report a mismatch in
> > counts if they exist regardless of lazy-count. So why don't we just
> > start ignoring the on-disk value and always use lazy-count based
> > updates?
> > 
> > We only added it as mkfs option/feature bit because of the recovery
> > issue with not being able to account for btree blocks properly at
> > mount time, but now we have mechanisms for counting blocks in btrees
> > so even that has gone away. So we could actually just turn
> > on lazy-count at mount time, and we could get rid of this whole
> > set of subtle conditional behaviours we clearly aren't able to
> > exercise effectively...
> 
> If my understanding of the words above is correct, maybe that could
> be unfriendly when users turned back to some old kernels. But
> considering lazysbcount has been landed for quite quite long time,
> I think that is practical as 2 patches:
>  1) fix sb counters for !lazysbcount;
>  2) turn on lazysbcount at the mount time from now (and warn users).

Yup, that seems reasonable to me - getting rid of all the
lazysbcount checks everywhere except the mount path would simplify
the code a lot...

> > You have to use -m crc=0 to turn off lazycount, and the deprecation
> > warning should come from -m crc=0...
> 
> Yes, but I think 2030 is too far for this !lazysbcount feature, since
> it seems easy to cause potential bugs. I think maybe we could get rid
> of it as soon as possible.

Yeah, that's why I think we just turn it on unconditionally. It's
already deprecated, and all supported long term kernels support lazy
counters, so there's no reason for needing lazy-count=0 anymore...

Cheers,

Dave.

Gao Xiang April 19, 2021, 12:38 a.m. UTC | #12

On Mon, Apr 19, 2021 at 08:08:31AM +1000, Dave Chinner wrote:
> On Sun, Apr 18, 2021 at 07:59:48AM +0800, Gao Xiang wrote:
> > Hi Dave,
> > 
> > On Sun, Apr 18, 2021 at 08:32:01AM +1000, Dave Chinner wrote:
> > > On Sat, Apr 17, 2021 at 10:20:13AM +0800, Gao Xiang wrote:
> > 
> > ...
> > 
> > > > > > Hmm... is this really needed?  I thought in !lazysbcount mode,
> > > > > > xfs_trans_apply_sb_deltas updates the ondisk super buffer directly.
> > > > > > So aren't all three of these updates unnecessary?
> > > > > 
> > > > > Yup, now I understand the issue, the fix is simply to avoid these
> > > > > updates for !lazysb. i.e. it should just be:
> > > > > 
> > > > > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > > > 		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > > > > 		mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > > > 		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > > > 	}
> > > > > 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > > > 
> > > > I did as this because xfs_sb_to_disk() will override them, see:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_sb.c#n629
> > > > 
> > > > ...
> > > > 	to->sb_icount = cpu_to_be64(from->sb_icount);
> > > > 	to->sb_ifree = cpu_to_be64(from->sb_ifree);
> > > > 	to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> > > 
> > > > As an alternative, I was once to wrap it as:
> > > > 
> > > > xfs_sb_to_disk() {
> > > > ...
> > > > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > > 		to->sb_icount = cpu_to_be64(from->sb_icount);
> > > > 		to->sb_ifree = cpu_to_be64(from->sb_ifree);
> > > > 		to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> > > > 	}
> > > > ...
> > > > }
> > > 
> > 
> > ...
> > 
> > > 
> > > That is, xfs_trans_apply_sb_deltas() only applies deltas to the
> > > directly to the in-memory superblock in the case of !lazy-count, so
> > > these counters are actually a correct representation of the on-disk
> > > value of the accounting when lazy-count=0.
> > > 
> > > Hence we should always be able to write the counters in mp->m_sb
> > > directly to the on-disk superblock buffer in the case of
> > > lazy-count=0 and the values should be correct. lazy-count=1 only
> > > updates the mp->m_sb counters from the per-cpu counters so that the
> > > on-disk counters aren't wildly inaccruate, and so that when we
> > > unmount/freeze/etc the counters are actually correct.
> > > 
> > > Long story short, I think xfs_sb_to_disk() always updating the
> > > on-disk superblock from mp->m_sb is safe to do as the counters in
> > > mp->m_sb are updated in the same manner during transaction commit as
> > > the superblock buffer counters for lazy-count=0....
> > 
> > Thanks for your long words, I have to say I don't quite get what's
> > your thought here, if my understanding is correct,
> > xfs_trans_apply_sb_deltas() for !lazy-count case just directly
> > update on-disk superblock (rather than in-memory superblock), see:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_trans.c?h=v5.12-rc2#n501
> > 
> > 	if (!xfs_sb_version_haslazysbcount(&(tp->t_mountp->m_sb))) {
> > 		if (tp->t_icount_delta)
> > 			be64_add_cpu(&sbp->sb_icount, tp->t_icount_delta);
> > 		if (tp->t_ifree_delta)
> > 			be64_add_cpu(&sbp->sb_ifree, tp->t_ifree_delta);
> > 		if (tp->t_fdblocks_delta)
> > 			be64_add_cpu(&sbp->sb_fdblocks, tp->t_fdblocks_delta);
> > 		if (tp->t_res_fdblocks_delta)
> > 			be64_add_cpu(&sbp->sb_fdblocks, tp->t_res_fdblocks_delta);
> > 	}
> 
> Yeah, I think I misread this jumping between diffs, commits, the
> historic tree, etc. got tangled up in the twisty, gnarly branches of
> the code...
> 
> > > /me is now wondering why we even bother with !lazy-count anymore.
> > > 
> > > WE've updated the agr btree block accounting unconditionally since
> > > lazy-count was added, and scrub will always report a mismatch in
> > > counts if they exist regardless of lazy-count. So why don't we just
> > > start ignoring the on-disk value and always use lazy-count based
> > > updates?
> > > 
> > > We only added it as mkfs option/feature bit because of the recovery
> > > issue with not being able to account for btree blocks properly at
> > > mount time, but now we have mechanisms for counting blocks in btrees
> > > so even that has gone away. So we could actually just turn
> > > on lazy-count at mount time, and we could get rid of this whole
> > > set of subtle conditional behaviours we clearly aren't able to
> > > exercise effectively...
> > 
> > If my understanding of the words above is correct, maybe that could
> > be unfriendly when users turned back to some old kernels. But
> > considering lazysbcount has been landed for quite quite long time,
> > I think that is practical as 2 patches:
> >  1) fix sb counters for !lazysbcount;
> >  2) turn on lazysbcount at the mount time from now (and warn users).
> 
> Yup, that seems reasonable to me - getting rid of all the
> lazysbcount checks everywhere except the mount path would simplify
> the code a lot...

Okay, let me investigate to turn on lazysb feature today as well, so
clean up all the code.

Thanks,
Gao Xiang

> 
> > > You have to use -m crc=0 to turn off lazycount, and the deprecation
> > > warning should come from -m crc=0...
> > 
> > Yes, but I think 2030 is too far for this !lazysbcount feature, since
> > it seems easy to cause potential bugs. I think maybe we could get rid
> > of it as soon as possible.
> 
> Yeah, that's why I think we just turn it on unconditionally. It's
> already deprecated, and all supported long term kernels support lazy
> counters, so there's no reason for needing lazy-count=0 anymore...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

Darrick J. Wong April 20, 2021, 5:17 p.m. UTC | #13

On Sun, Apr 18, 2021 at 08:32:01AM +1000, Dave Chinner wrote:
> On Sat, Apr 17, 2021 at 10:20:13AM +0800, Gao Xiang wrote:
> > Hi Darrick and Dave,
> > 
> > On Sat, Apr 17, 2021 at 11:57:02AM +1000, Dave Chinner wrote:
> > > On Fri, Apr 16, 2021 at 05:19:41PM -0700, Darrick J. Wong wrote:
> > > > On Sat, Apr 17, 2021 at 05:13:20AM +0800, Gao Xiang wrote:
> > 
> > ...
> > 
> > > 
> > > Nor is it necessary to fix the problem.
> > > 
> > > > > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > > > > > > index 60e6d255e5e2..423dada3f64c 100644
> > > > > > > --- a/fs/xfs/libxfs/xfs_sb.c
> > > > > > > +++ b/fs/xfs/libxfs/xfs_sb.c
> > > > > > > @@ -928,7 +928,13 @@ xfs_log_sb(
> > > > > > >  
> > > > > > >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > > > > > >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > > > > > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > > > > > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > > > > > +		struct xfs_dsb	*dsb = bp->b_addr;
> > > > > > > +
> > > > > > > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> > > > 
> > > > Hmm... is this really needed?  I thought in !lazysbcount mode,
> > > > xfs_trans_apply_sb_deltas updates the ondisk super buffer directly.
> > > > So aren't all three of these updates unnecessary?
> > > 
> > > Yup, now I understand the issue, the fix is simply to avoid these
> > > updates for !lazysb. i.e. it should just be:
> > > 
> > > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > 		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > > 		mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > 		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > 	}
> > > 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > 
> > I did as this because xfs_sb_to_disk() will override them, see:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_sb.c#n629
> > 
> > ...
> > 	to->sb_icount = cpu_to_be64(from->sb_icount);
> > 	to->sb_ifree = cpu_to_be64(from->sb_ifree);
> > 	to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> 
> > As an alternative, I was once to wrap it as:
> > 
> > xfs_sb_to_disk() {
> > ...
> > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > 		to->sb_icount = cpu_to_be64(from->sb_icount);
> > 		to->sb_ifree = cpu_to_be64(from->sb_ifree);
> > 		to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> > 	}
> > ...
> > }
> 
> This goes back to a commit in 2015 dropping the fields parameter from
> xfs_sb_to_disk(). Originally, we only formatted the requested
> parameters to the on-disk buffer from the in-memory superblock amd
> this was removed in 2015 by commit 4d11a4023940 ("xfs: remove bitfield
> based superblock updates") which meant all superblock modification
> calls updated the entire on-disk log.
> 
> Up to that point, only xfs_log_sbcount() updated the on-disk
> counters in the superblock buffer, and only for lazy-count enabled
> filesystems. And xfs_bmap_add_attrfork() would only update the
> features fields in the superblock, and nothing else. Now every
> modification to the sueprblock updates everythign from the in-memory
> state.
> 
> However, there are two sets of in-memory state for the superblock
> accounting - the superblock fields and the per-cpu coutners. The
> per-cpu counters are the ones we apply reservations to and the ones
> we use for space tracking. The counters in the mp->m_sb are updated
> in the same manner as the on-disk counters.
> 
> That is, xfs_trans_apply_sb_deltas() only applies deltas to the
> directly to the in-memory superblock in the case of !lazy-count, so
> these counters are actually a correct representation of the on-disk
> value of the accounting when lazy-count=0.
> 
> Hence we should always be able to write the counters in mp->m_sb
> directly to the on-disk superblock buffer in the case of
> lazy-count=0 and the values should be correct. lazy-count=1 only
> updates the mp->m_sb counters from the per-cpu counters so that the
> on-disk counters aren't wildly inaccruate, and so that when we
> unmount/freeze/etc the counters are actually correct.
> 
> Long story short, I think xfs_sb_to_disk() always updating the
> on-disk superblock from mp->m_sb is safe to do as the counters in
> mp->m_sb are updated in the same manner during transaction commit as
> the superblock buffer counters for lazy-count=0....
> 
> > Yet after I observed the other callers of xfs_sb_to_disk() (e.g. growfs
> > and online repair), I think a better modification is the way I proposed
> > here, so no need to update xfs_sb_to_disk() and the other callers (since
> > !lazysbcount is not recommended at all.)
> 
> Yup that's the original reason for having a fields flag to do
> condition update of the on-disk buffer from the in-memory state.
> Different code has diferrent requirements, but it looked like this
> didn't matter for lazy-count filesystems because other checks
> avoided the update of m_sb fields. What was missed in that
> optimisation was the fact lazy-count=0 never updated the counters
> directly.
> 
> /me is now wondering why we even bother with !lazy-count anymore.
> 
> WE've updated the agr btree block accounting unconditionally since
> lazy-count was added, and scrub will always report a mismatch in
> counts if they exist regardless of lazy-count. So why don't we just
> start ignoring the on-disk value and always use lazy-count based
> updates?
> 
> We only added it as mkfs option/feature bit because of the recovery
> issue with not being able to account for btree blocks properly at
> mount time, but now we have mechanisms for counting blocks in btrees
> so even that has gone away. So we could actually just turn
> on lazy-count at mount time, and we could get rid of this whole
> set of subtle conditional behaviours we clearly aren't able to
> exercise effectively...
> 
> > It's easier to backport and less conflict, and btw !lazysbcount also need
> > to be warned out and deprecated from now.
> 
> You have to use -m crc=0 to turn off lazycount, and the deprecation
> warning should come from -m crc=0...

Can someone send Eric a patch adding a warning into mkfs about
formatting a V4 filesystem, please? :)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

xfs: don't use in-core per-cpu fdblocks for !lazysbcount

Commit Message

Comments

Patch