diff mbox

[v2] xfs: detect agfl count corruption and reset agfl

Message ID 20180315174539.48142-1-bfoster@redhat.com (mailing list archive)
State Accepted
Headers show

Commit Message

Brian Foster March 15, 2018, 5:45 p.m. UTC
The struct xfs_agfl v5 header was originally introduced with
unexpected padding that caused the AGFL to operate with one less
slot than intended. The header has since been packed, but the fix
left an incompatibility for users who upgrade from an old kernel
with the unpacked header to a newer kernel with the packed header
while the AGFL happens to wrap around the end. The newer kernel
recognizes one extra slot at the physical end of the AGFL that the
previous kernel did not. The new kernel will eventually attempt to
allocate a block from that slot, which contains invalid data, and
cause a crash.

This condition can be detected by comparing the active range of the
AGFL to the count. While this detects a padding mismatch, it can
also trigger false positives for unrelated flcount corruption. Since
we cannot distinguish a size mismatch due to padding from unrelated
corruption, we can't trust the AGFL enough to simply repopulate the
empty slot.

Instead, avoid unnecessarily complex detection logic and and use a
solution that can handle any form of flcount corruption that slips
through read verifiers: distrust the entire AGFL and reset it to an
empty state. Any valid blocks within the AGFL are intentionally
leaked. This requires xfs_repair to rectify (which was already
necessary based on the state the AGFL was found in). The reset
mitigates the side effect of the padding mismatch problem from a
filesystem crash to a free space accounting inconsistency. The
generic approach also means that this patch can be safely backported
to kernels with or without a packed struct xfs_agfl.

Check the AGF for an invalid freelist count on initial read from
disk. If detected, set a flag on the xfs_perag to indicate that a
reset is required before the AGFL can be used. In the first
transaction that attempts to use a flagged AGFL, reset it to empty,
warn the user about the inconsistency and allow the freelist fixup
code to repopulate the AGFL with new blocks. The xfs_perag flag is
cleared to eliminate the need for repeated checks on each block
allocation operation.

Fixes: 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
CC: <stable@vger.kernel.org>
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
---

v2:
- Function rename and logic cleanup.
- CC stable.
v1: https://marc.info/?l=linux-xfs&m=152104784709048&w=2
- Use a simplified AGFL reset mechanism.
- Trigger on AGFL fixup rather than get/put.
- Various refactors, cleanups and comments.
rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2

 fs/xfs/libxfs/xfs_alloc.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.h        |  1 +
 fs/xfs/xfs_trace.h        |  9 ++++-
 3 files changed, 103 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong March 15, 2018, 7:31 p.m. UTC | #1
On Thu, Mar 15, 2018 at 01:45:39PM -0400, Brian Foster wrote:
> The struct xfs_agfl v5 header was originally introduced with
> unexpected padding that caused the AGFL to operate with one less
> slot than intended. The header has since been packed, but the fix
> left an incompatibility for users who upgrade from an old kernel
> with the unpacked header to a newer kernel with the packed header
> while the AGFL happens to wrap around the end. The newer kernel
> recognizes one extra slot at the physical end of the AGFL that the
> previous kernel did not. The new kernel will eventually attempt to
> allocate a block from that slot, which contains invalid data, and
> cause a crash.
> 
> This condition can be detected by comparing the active range of the
> AGFL to the count. While this detects a padding mismatch, it can
> also trigger false positives for unrelated flcount corruption. Since
> we cannot distinguish a size mismatch due to padding from unrelated
> corruption, we can't trust the AGFL enough to simply repopulate the
> empty slot.
> 
> Instead, avoid unnecessarily complex detection logic and and use a
> solution that can handle any form of flcount corruption that slips
> through read verifiers: distrust the entire AGFL and reset it to an
> empty state. Any valid blocks within the AGFL are intentionally
> leaked. This requires xfs_repair to rectify (which was already
> necessary based on the state the AGFL was found in). The reset
> mitigates the side effect of the padding mismatch problem from a
> filesystem crash to a free space accounting inconsistency. The
> generic approach also means that this patch can be safely backported
> to kernels with or without a packed struct xfs_agfl.
> 
> Check the AGF for an invalid freelist count on initial read from
> disk. If detected, set a flag on the xfs_perag to indicate that a
> reset is required before the AGFL can be used. In the first
> transaction that attempts to use a flagged AGFL, reset it to empty,
> warn the user about the inconsistency and allow the freelist fixup
> code to repopulate the AGFL with new blocks. The xfs_perag flag is
> cleared to eliminate the need for repeated checks on each block
> allocation operation.
> 
> Fixes: 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
> CC: <stable@vger.kernel.org>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>

Thanks, will test and apply to 4.17.

--D

> ---
> 
> v2:
> - Function rename and logic cleanup.
> - CC stable.
> v1: https://marc.info/?l=linux-xfs&m=152104784709048&w=2
> - Use a simplified AGFL reset mechanism.
> - Trigger on AGFL fixup rather than get/put.
> - Various refactors, cleanups and comments.
> rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2
> 
>  fs/xfs/libxfs/xfs_alloc.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_mount.h        |  1 +
>  fs/xfs/xfs_trace.h        |  9 ++++-
>  3 files changed, 103 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 3db90b707fb2..39387bdd225d 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2063,6 +2063,93 @@ xfs_alloc_space_available(
>  }
>  
>  /*
> + * Check the agfl fields of the agf for inconsistency or corruption. The purpose
> + * is to detect an agfl header padding mismatch between current and early v5
> + * kernels. This problem manifests as a 1-slot size difference between the
> + * on-disk flcount and the active [first, last] range of a wrapped agfl. This
> + * may also catch variants of agfl count corruption unrelated to padding. Either
> + * way, we'll reset the agfl and warn the user.
> + *
> + * Return true if a reset is required before the agfl can be used, false
> + * otherwise.
> + */
> +static bool
> +xfs_agfl_needs_reset(
> +	struct xfs_mount	*mp,
> +	struct xfs_agf		*agf)
> +{
> +	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
> +	uint32_t		l = be32_to_cpu(agf->agf_fllast);
> +	uint32_t		c = be32_to_cpu(agf->agf_flcount);
> +	int			agfl_size = xfs_agfl_size(mp);
> +	int			active;
> +
> +	/* no agfl header on v4 supers */
> +	if (!xfs_sb_version_hascrc(&mp->m_sb))
> +		return false;
> +
> +	/*
> +	 * The agf read verifier catches severe corruption of these fields.
> +	 * Repeat some sanity checks to cover a packed -> unpacked mismatch if
> +	 * the verifier allows it.
> +	 */
> +	if (f >= agfl_size || l >= agfl_size)
> +		return true;
> +	if (c > agfl_size)
> +		return true;
> +
> +	/*
> +	 * Check consistency between the on-disk count and the active range. An
> +	 * agfl padding mismatch manifests as an inconsistent flcount.
> +	 */
> +	if (c && l >= f)
> +		active = l - f + 1;
> +	else if (c)
> +		active = agfl_size - f + l + 1;
> +	else
> +		active = 0;
> +
> +	return active != c;
> +}
> +
> +/*
> + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
> + * agfl content cannot be trusted. Warn the user that a repair is required to
> + * recover leaked blocks.
> + *
> + * The purpose of this mechanism is to handle filesystems affected by the agfl
> + * header padding mismatch problem. A reset keeps the filesystem online with a
> + * relatively minor free space accounting inconsistency rather than suffer the
> + * inevitable crash from use of an invalid agfl block.
> + */
> +static void
> +xfs_agfl_reset(
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agbp,
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> +
> +	ASSERT(pag->pagf_agflreset);
> +	trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
> +
> +	xfs_warn(mp,
> +	       "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
> +	       "Please unmount and run xfs_repair.",
> +	         pag->pag_agno, pag->pagf_flcount);
> +
> +	agf->agf_flfirst = 0;
> +	agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1);
> +	agf->agf_flcount = 0;
> +	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
> +				    XFS_AGF_FLCOUNT);
> +
> +	pag->pagf_flcount = 0;
> +	pag->pagf_agflreset = false;
> +}
> +
> +/*
>   * Decide whether to use this allocation group for this allocation.
>   * If so, fix up the btree freelist's size.
>   */
> @@ -2123,6 +2210,10 @@ xfs_alloc_fix_freelist(
>  		}
>  	}
>  
> +	/* reset a padding mismatched agfl before final free space check */
> +	if (pag->pagf_agflreset)
> +		xfs_agfl_reset(tp, agbp, pag);
> +
>  	/* If there isn't enough total space or single-extent, reject it. */
>  	need = xfs_alloc_min_freelist(mp, pag);
>  	if (!xfs_alloc_space_available(args, need, flags))
> @@ -2279,6 +2370,7 @@ xfs_alloc_get_freelist(
>  		agf->agf_flfirst = 0;
>  
>  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> +	ASSERT(!pag->pagf_agflreset);
>  	be32_add_cpu(&agf->agf_flcount, -1);
>  	xfs_trans_agflist_delta(tp, -1);
>  	pag->pagf_flcount--;
> @@ -2390,6 +2482,7 @@ xfs_alloc_put_freelist(
>  		agf->agf_fllast = 0;
>  
>  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> +	ASSERT(!pag->pagf_agflreset);
>  	be32_add_cpu(&agf->agf_flcount, 1);
>  	xfs_trans_agflist_delta(tp, 1);
>  	pag->pagf_flcount++;
> @@ -2597,6 +2690,7 @@ xfs_alloc_read_agf(
>  		pag->pagb_count = 0;
>  		pag->pagb_tree = RB_ROOT;
>  		pag->pagf_init = 1;
> +		pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf);
>  	}
>  #ifdef DEBUG
>  	else if (!XFS_FORCED_SHUTDOWN(mp)) {
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 1808f56decaa..10b90bbc5162 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -353,6 +353,7 @@ typedef struct xfs_perag {
>  	char		pagi_inodeok;	/* The agi is ok for inodes */
>  	uint8_t		pagf_levels[XFS_BTNUM_AGF];
>  					/* # of levels in bno & cnt btree */
> +	bool		pagf_agflreset; /* agfl requires reset before use */
>  	uint32_t	pagf_flcount;	/* count of blocks in freelist */
>  	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
>  	xfs_extlen_t	pagf_longest;	/* longest free space */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 945de08af7ba..a982c0b623d0 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim,
>  		  __entry->tlen)
>  );
>  
> -TRACE_EVENT(xfs_agf,
> +DECLARE_EVENT_CLASS(xfs_agf_class,
>  	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
>  		 unsigned long caller_ip),
>  	TP_ARGS(mp, agf, flags, caller_ip),
> @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf,
>  		  __entry->longest,
>  		  (void *)__entry->caller_ip)
>  );
> +#define DEFINE_AGF_EVENT(name) \
> +DEFINE_EVENT(xfs_agf_class, name, \
> +	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
> +		 unsigned long caller_ip), \
> +	TP_ARGS(mp, agf, flags, caller_ip))
> +DEFINE_AGF_EVENT(xfs_agf);
> +DEFINE_AGF_EVENT(xfs_agfl_reset);
>  
>  TRACE_EVENT(xfs_free_extent,
>  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
> -- 
> 2.13.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Foster March 16, 2018, noon UTC | #2
On Thu, Mar 15, 2018 at 12:31:30PM -0700, Darrick J. Wong wrote:
> On Thu, Mar 15, 2018 at 01:45:39PM -0400, Brian Foster wrote:
> > The struct xfs_agfl v5 header was originally introduced with
> > unexpected padding that caused the AGFL to operate with one less
> > slot than intended. The header has since been packed, but the fix
> > left an incompatibility for users who upgrade from an old kernel
> > with the unpacked header to a newer kernel with the packed header
> > while the AGFL happens to wrap around the end. The newer kernel
> > recognizes one extra slot at the physical end of the AGFL that the
> > previous kernel did not. The new kernel will eventually attempt to
> > allocate a block from that slot, which contains invalid data, and
> > cause a crash.
> > 
> > This condition can be detected by comparing the active range of the
> > AGFL to the count. While this detects a padding mismatch, it can
> > also trigger false positives for unrelated flcount corruption. Since
> > we cannot distinguish a size mismatch due to padding from unrelated
> > corruption, we can't trust the AGFL enough to simply repopulate the
> > empty slot.
> > 
> > Instead, avoid unnecessarily complex detection logic and and use a
> > solution that can handle any form of flcount corruption that slips
> > through read verifiers: distrust the entire AGFL and reset it to an
> > empty state. Any valid blocks within the AGFL are intentionally
> > leaked. This requires xfs_repair to rectify (which was already
> > necessary based on the state the AGFL was found in). The reset
> > mitigates the side effect of the padding mismatch problem from a
> > filesystem crash to a free space accounting inconsistency. The
> > generic approach also means that this patch can be safely backported
> > to kernels with or without a packed struct xfs_agfl.
> > 
> > Check the AGF for an invalid freelist count on initial read from
> > disk. If detected, set a flag on the xfs_perag to indicate that a
> > reset is required before the AGFL can be used. In the first
> > transaction that attempts to use a flagged AGFL, reset it to empty,
> > warn the user about the inconsistency and allow the freelist fixup
> > code to repopulate the AGFL with new blocks. The xfs_perag flag is
> > cleared to eliminate the need for repeated checks on each block
> > allocation operation.
> > 
> > Fixes: 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
> > CC: <stable@vger.kernel.org>
> > Suggested-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
> 
> Thanks, will test and apply to 4.17.
> 

Thanks. Re: Dave's comment on the 'Fixes:' tag [1]...

I've locally dropped the tag and appended the following sentence to the
last paragraph in the commit log:

"This allows kernels that include the packing fix commit 96f859d52bcb
("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
to handle older unpacked AGFL formats without a filesystem crash."

I think that demonstrates intent without making the (already long)
commit log too much longer. Thoughts? I can send that as v3 if it's
easier, but I'd prefer to get this nailed down first if there are no
further comments on the patch.

Brian

[1] https://marc.info/?l=linux-xfs&m=152115408908495&w=2

> --D
> 
> > ---
> > 
> > v2:
> > - Function rename and logic cleanup.
> > - CC stable.
> > v1: https://marc.info/?l=linux-xfs&m=152104784709048&w=2
> > - Use a simplified AGFL reset mechanism.
> > - Trigger on AGFL fixup rather than get/put.
> > - Various refactors, cleanups and comments.
> > rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2
> > 
> >  fs/xfs/libxfs/xfs_alloc.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_mount.h        |  1 +
> >  fs/xfs/xfs_trace.h        |  9 ++++-
> >  3 files changed, 103 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 3db90b707fb2..39387bdd225d 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -2063,6 +2063,93 @@ xfs_alloc_space_available(
> >  }
> >  
> >  /*
> > + * Check the agfl fields of the agf for inconsistency or corruption. The purpose
> > + * is to detect an agfl header padding mismatch between current and early v5
> > + * kernels. This problem manifests as a 1-slot size difference between the
> > + * on-disk flcount and the active [first, last] range of a wrapped agfl. This
> > + * may also catch variants of agfl count corruption unrelated to padding. Either
> > + * way, we'll reset the agfl and warn the user.
> > + *
> > + * Return true if a reset is required before the agfl can be used, false
> > + * otherwise.
> > + */
> > +static bool
> > +xfs_agfl_needs_reset(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_agf		*agf)
> > +{
> > +	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
> > +	uint32_t		l = be32_to_cpu(agf->agf_fllast);
> > +	uint32_t		c = be32_to_cpu(agf->agf_flcount);
> > +	int			agfl_size = xfs_agfl_size(mp);
> > +	int			active;
> > +
> > +	/* no agfl header on v4 supers */
> > +	if (!xfs_sb_version_hascrc(&mp->m_sb))
> > +		return false;
> > +
> > +	/*
> > +	 * The agf read verifier catches severe corruption of these fields.
> > +	 * Repeat some sanity checks to cover a packed -> unpacked mismatch if
> > +	 * the verifier allows it.
> > +	 */
> > +	if (f >= agfl_size || l >= agfl_size)
> > +		return true;
> > +	if (c > agfl_size)
> > +		return true;
> > +
> > +	/*
> > +	 * Check consistency between the on-disk count and the active range. An
> > +	 * agfl padding mismatch manifests as an inconsistent flcount.
> > +	 */
> > +	if (c && l >= f)
> > +		active = l - f + 1;
> > +	else if (c)
> > +		active = agfl_size - f + l + 1;
> > +	else
> > +		active = 0;
> > +
> > +	return active != c;
> > +}
> > +
> > +/*
> > + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
> > + * agfl content cannot be trusted. Warn the user that a repair is required to
> > + * recover leaked blocks.
> > + *
> > + * The purpose of this mechanism is to handle filesystems affected by the agfl
> > + * header padding mismatch problem. A reset keeps the filesystem online with a
> > + * relatively minor free space accounting inconsistency rather than suffer the
> > + * inevitable crash from use of an invalid agfl block.
> > + */
> > +static void
> > +xfs_agfl_reset(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_buf		*agbp,
> > +	struct xfs_perag	*pag)
> > +{
> > +	struct xfs_mount	*mp = tp->t_mountp;
> > +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> > +
> > +	ASSERT(pag->pagf_agflreset);
> > +	trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
> > +
> > +	xfs_warn(mp,
> > +	       "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
> > +	       "Please unmount and run xfs_repair.",
> > +	         pag->pag_agno, pag->pagf_flcount);
> > +
> > +	agf->agf_flfirst = 0;
> > +	agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1);
> > +	agf->agf_flcount = 0;
> > +	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
> > +				    XFS_AGF_FLCOUNT);
> > +
> > +	pag->pagf_flcount = 0;
> > +	pag->pagf_agflreset = false;
> > +}
> > +
> > +/*
> >   * Decide whether to use this allocation group for this allocation.
> >   * If so, fix up the btree freelist's size.
> >   */
> > @@ -2123,6 +2210,10 @@ xfs_alloc_fix_freelist(
> >  		}
> >  	}
> >  
> > +	/* reset a padding mismatched agfl before final free space check */
> > +	if (pag->pagf_agflreset)
> > +		xfs_agfl_reset(tp, agbp, pag);
> > +
> >  	/* If there isn't enough total space or single-extent, reject it. */
> >  	need = xfs_alloc_min_freelist(mp, pag);
> >  	if (!xfs_alloc_space_available(args, need, flags))
> > @@ -2279,6 +2370,7 @@ xfs_alloc_get_freelist(
> >  		agf->agf_flfirst = 0;
> >  
> >  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > +	ASSERT(!pag->pagf_agflreset);
> >  	be32_add_cpu(&agf->agf_flcount, -1);
> >  	xfs_trans_agflist_delta(tp, -1);
> >  	pag->pagf_flcount--;
> > @@ -2390,6 +2482,7 @@ xfs_alloc_put_freelist(
> >  		agf->agf_fllast = 0;
> >  
> >  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > +	ASSERT(!pag->pagf_agflreset);
> >  	be32_add_cpu(&agf->agf_flcount, 1);
> >  	xfs_trans_agflist_delta(tp, 1);
> >  	pag->pagf_flcount++;
> > @@ -2597,6 +2690,7 @@ xfs_alloc_read_agf(
> >  		pag->pagb_count = 0;
> >  		pag->pagb_tree = RB_ROOT;
> >  		pag->pagf_init = 1;
> > +		pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf);
> >  	}
> >  #ifdef DEBUG
> >  	else if (!XFS_FORCED_SHUTDOWN(mp)) {
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index 1808f56decaa..10b90bbc5162 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -353,6 +353,7 @@ typedef struct xfs_perag {
> >  	char		pagi_inodeok;	/* The agi is ok for inodes */
> >  	uint8_t		pagf_levels[XFS_BTNUM_AGF];
> >  					/* # of levels in bno & cnt btree */
> > +	bool		pagf_agflreset; /* agfl requires reset before use */
> >  	uint32_t	pagf_flcount;	/* count of blocks in freelist */
> >  	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
> >  	xfs_extlen_t	pagf_longest;	/* longest free space */
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index 945de08af7ba..a982c0b623d0 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim,
> >  		  __entry->tlen)
> >  );
> >  
> > -TRACE_EVENT(xfs_agf,
> > +DECLARE_EVENT_CLASS(xfs_agf_class,
> >  	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
> >  		 unsigned long caller_ip),
> >  	TP_ARGS(mp, agf, flags, caller_ip),
> > @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf,
> >  		  __entry->longest,
> >  		  (void *)__entry->caller_ip)
> >  );
> > +#define DEFINE_AGF_EVENT(name) \
> > +DEFINE_EVENT(xfs_agf_class, name, \
> > +	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
> > +		 unsigned long caller_ip), \
> > +	TP_ARGS(mp, agf, flags, caller_ip))
> > +DEFINE_AGF_EVENT(xfs_agf);
> > +DEFINE_AGF_EVENT(xfs_agfl_reset);
> >  
> >  TRACE_EVENT(xfs_free_extent,
> >  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
> > -- 
> > 2.13.6
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong March 16, 2018, 3:49 p.m. UTC | #3
On Fri, Mar 16, 2018 at 08:00:02AM -0400, Brian Foster wrote:
> On Thu, Mar 15, 2018 at 12:31:30PM -0700, Darrick J. Wong wrote:
> > On Thu, Mar 15, 2018 at 01:45:39PM -0400, Brian Foster wrote:
> > > The struct xfs_agfl v5 header was originally introduced with
> > > unexpected padding that caused the AGFL to operate with one less
> > > slot than intended. The header has since been packed, but the fix
> > > left an incompatibility for users who upgrade from an old kernel
> > > with the unpacked header to a newer kernel with the packed header
> > > while the AGFL happens to wrap around the end. The newer kernel
> > > recognizes one extra slot at the physical end of the AGFL that the
> > > previous kernel did not. The new kernel will eventually attempt to
> > > allocate a block from that slot, which contains invalid data, and
> > > cause a crash.
> > > 
> > > This condition can be detected by comparing the active range of the
> > > AGFL to the count. While this detects a padding mismatch, it can
> > > also trigger false positives for unrelated flcount corruption. Since
> > > we cannot distinguish a size mismatch due to padding from unrelated
> > > corruption, we can't trust the AGFL enough to simply repopulate the
> > > empty slot.
> > > 
> > > Instead, avoid unnecessarily complex detection logic and and use a
> > > solution that can handle any form of flcount corruption that slips
> > > through read verifiers: distrust the entire AGFL and reset it to an
> > > empty state. Any valid blocks within the AGFL are intentionally
> > > leaked. This requires xfs_repair to rectify (which was already
> > > necessary based on the state the AGFL was found in). The reset
> > > mitigates the side effect of the padding mismatch problem from a
> > > filesystem crash to a free space accounting inconsistency. The
> > > generic approach also means that this patch can be safely backported
> > > to kernels with or without a packed struct xfs_agfl.
> > > 
> > > Check the AGF for an invalid freelist count on initial read from
> > > disk. If detected, set a flag on the xfs_perag to indicate that a
> > > reset is required before the AGFL can be used. In the first
> > > transaction that attempts to use a flagged AGFL, reset it to empty,
> > > warn the user about the inconsistency and allow the freelist fixup
> > > code to repopulate the AGFL with new blocks. The xfs_perag flag is
> > > cleared to eliminate the need for repeated checks on each block
> > > allocation operation.
> > > 
> > > Fixes: 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
> > > CC: <stable@vger.kernel.org>
> > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
> > 
> > Thanks, will test and apply to 4.17.
> > 
> 
> Thanks. Re: Dave's comment on the 'Fixes:' tag [1]...
> 
> I've locally dropped the tag and appended the following sentence to the
> last paragraph in the commit log:
> 
> "This allows kernels that include the packing fix commit 96f859d52bcb
> ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
> to handle older unpacked AGFL formats without a filesystem crash."
> 
> I think that demonstrates intent without making the (already long)
> commit log too much longer. Thoughts? I can send that as v3 if it's
> easier, but I'd prefer to get this nailed down first if there are no
> further comments on the patch.

That seems fine to me.  If there aren't any other changes to this patch
I'll simply edit the commit message in my tree to drop the Fixes: tag
and add the blurb above.  Once this has stewed in upstream for a few
weeks we can start to backport this to older kernels.  I volunteer for
4.1 and 4.14, any takers for the others?

I will also send out my xfstest for review.

--D

> Brian
> 
> [1] https://marc.info/?l=linux-xfs&m=152115408908495&w=2
> 
> > --D
> > 
> > > ---
> > > 
> > > v2:
> > > - Function rename and logic cleanup.
> > > - CC stable.
> > > v1: https://marc.info/?l=linux-xfs&m=152104784709048&w=2
> > > - Use a simplified AGFL reset mechanism.
> > > - Trigger on AGFL fixup rather than get/put.
> > > - Various refactors, cleanups and comments.
> > > rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2
> > > 
> > >  fs/xfs/libxfs/xfs_alloc.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_mount.h        |  1 +
> > >  fs/xfs/xfs_trace.h        |  9 ++++-
> > >  3 files changed, 103 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index 3db90b707fb2..39387bdd225d 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > @@ -2063,6 +2063,93 @@ xfs_alloc_space_available(
> > >  }
> > >  
> > >  /*
> > > + * Check the agfl fields of the agf for inconsistency or corruption. The purpose
> > > + * is to detect an agfl header padding mismatch between current and early v5
> > > + * kernels. This problem manifests as a 1-slot size difference between the
> > > + * on-disk flcount and the active [first, last] range of a wrapped agfl. This
> > > + * may also catch variants of agfl count corruption unrelated to padding. Either
> > > + * way, we'll reset the agfl and warn the user.
> > > + *
> > > + * Return true if a reset is required before the agfl can be used, false
> > > + * otherwise.
> > > + */
> > > +static bool
> > > +xfs_agfl_needs_reset(
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_agf		*agf)
> > > +{
> > > +	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
> > > +	uint32_t		l = be32_to_cpu(agf->agf_fllast);
> > > +	uint32_t		c = be32_to_cpu(agf->agf_flcount);
> > > +	int			agfl_size = xfs_agfl_size(mp);
> > > +	int			active;
> > > +
> > > +	/* no agfl header on v4 supers */
> > > +	if (!xfs_sb_version_hascrc(&mp->m_sb))
> > > +		return false;
> > > +
> > > +	/*
> > > +	 * The agf read verifier catches severe corruption of these fields.
> > > +	 * Repeat some sanity checks to cover a packed -> unpacked mismatch if
> > > +	 * the verifier allows it.
> > > +	 */
> > > +	if (f >= agfl_size || l >= agfl_size)
> > > +		return true;
> > > +	if (c > agfl_size)
> > > +		return true;
> > > +
> > > +	/*
> > > +	 * Check consistency between the on-disk count and the active range. An
> > > +	 * agfl padding mismatch manifests as an inconsistent flcount.
> > > +	 */
> > > +	if (c && l >= f)
> > > +		active = l - f + 1;
> > > +	else if (c)
> > > +		active = agfl_size - f + l + 1;
> > > +	else
> > > +		active = 0;
> > > +
> > > +	return active != c;
> > > +}
> > > +
> > > +/*
> > > + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
> > > + * agfl content cannot be trusted. Warn the user that a repair is required to
> > > + * recover leaked blocks.
> > > + *
> > > + * The purpose of this mechanism is to handle filesystems affected by the agfl
> > > + * header padding mismatch problem. A reset keeps the filesystem online with a
> > > + * relatively minor free space accounting inconsistency rather than suffer the
> > > + * inevitable crash from use of an invalid agfl block.
> > > + */
> > > +static void
> > > +xfs_agfl_reset(
> > > +	struct xfs_trans	*tp,
> > > +	struct xfs_buf		*agbp,
> > > +	struct xfs_perag	*pag)
> > > +{
> > > +	struct xfs_mount	*mp = tp->t_mountp;
> > > +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> > > +
> > > +	ASSERT(pag->pagf_agflreset);
> > > +	trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
> > > +
> > > +	xfs_warn(mp,
> > > +	       "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
> > > +	       "Please unmount and run xfs_repair.",
> > > +	         pag->pag_agno, pag->pagf_flcount);
> > > +
> > > +	agf->agf_flfirst = 0;
> > > +	agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1);
> > > +	agf->agf_flcount = 0;
> > > +	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
> > > +				    XFS_AGF_FLCOUNT);
> > > +
> > > +	pag->pagf_flcount = 0;
> > > +	pag->pagf_agflreset = false;
> > > +}
> > > +
> > > +/*
> > >   * Decide whether to use this allocation group for this allocation.
> > >   * If so, fix up the btree freelist's size.
> > >   */
> > > @@ -2123,6 +2210,10 @@ xfs_alloc_fix_freelist(
> > >  		}
> > >  	}
> > >  
> > > +	/* reset a padding mismatched agfl before final free space check */
> > > +	if (pag->pagf_agflreset)
> > > +		xfs_agfl_reset(tp, agbp, pag);
> > > +
> > >  	/* If there isn't enough total space or single-extent, reject it. */
> > >  	need = xfs_alloc_min_freelist(mp, pag);
> > >  	if (!xfs_alloc_space_available(args, need, flags))
> > > @@ -2279,6 +2370,7 @@ xfs_alloc_get_freelist(
> > >  		agf->agf_flfirst = 0;
> > >  
> > >  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > > +	ASSERT(!pag->pagf_agflreset);
> > >  	be32_add_cpu(&agf->agf_flcount, -1);
> > >  	xfs_trans_agflist_delta(tp, -1);
> > >  	pag->pagf_flcount--;
> > > @@ -2390,6 +2482,7 @@ xfs_alloc_put_freelist(
> > >  		agf->agf_fllast = 0;
> > >  
> > >  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > > +	ASSERT(!pag->pagf_agflreset);
> > >  	be32_add_cpu(&agf->agf_flcount, 1);
> > >  	xfs_trans_agflist_delta(tp, 1);
> > >  	pag->pagf_flcount++;
> > > @@ -2597,6 +2690,7 @@ xfs_alloc_read_agf(
> > >  		pag->pagb_count = 0;
> > >  		pag->pagb_tree = RB_ROOT;
> > >  		pag->pagf_init = 1;
> > > +		pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf);
> > >  	}
> > >  #ifdef DEBUG
> > >  	else if (!XFS_FORCED_SHUTDOWN(mp)) {
> > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > > index 1808f56decaa..10b90bbc5162 100644
> > > --- a/fs/xfs/xfs_mount.h
> > > +++ b/fs/xfs/xfs_mount.h
> > > @@ -353,6 +353,7 @@ typedef struct xfs_perag {
> > >  	char		pagi_inodeok;	/* The agi is ok for inodes */
> > >  	uint8_t		pagf_levels[XFS_BTNUM_AGF];
> > >  					/* # of levels in bno & cnt btree */
> > > +	bool		pagf_agflreset; /* agfl requires reset before use */
> > >  	uint32_t	pagf_flcount;	/* count of blocks in freelist */
> > >  	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
> > >  	xfs_extlen_t	pagf_longest;	/* longest free space */
> > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > index 945de08af7ba..a982c0b623d0 100644
> > > --- a/fs/xfs/xfs_trace.h
> > > +++ b/fs/xfs/xfs_trace.h
> > > @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim,
> > >  		  __entry->tlen)
> > >  );
> > >  
> > > -TRACE_EVENT(xfs_agf,
> > > +DECLARE_EVENT_CLASS(xfs_agf_class,
> > >  	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
> > >  		 unsigned long caller_ip),
> > >  	TP_ARGS(mp, agf, flags, caller_ip),
> > > @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf,
> > >  		  __entry->longest,
> > >  		  (void *)__entry->caller_ip)
> > >  );
> > > +#define DEFINE_AGF_EVENT(name) \
> > > +DEFINE_EVENT(xfs_agf_class, name, \
> > > +	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
> > > +		 unsigned long caller_ip), \
> > > +	TP_ARGS(mp, agf, flags, caller_ip))
> > > +DEFINE_AGF_EVENT(xfs_agf);
> > > +DEFINE_AGF_EVENT(xfs_agfl_reset);
> > >  
> > >  TRACE_EVENT(xfs_free_extent,
> > >  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
> > > -- 
> > > 2.13.6
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eryu Guan March 23, 2018, 5:11 a.m. UTC | #4
Hi Brian,

On Thu, Mar 15, 2018 at 01:45:39PM -0400, Brian Foster wrote:
> The struct xfs_agfl v5 header was originally introduced with
> unexpected padding that caused the AGFL to operate with one less
> slot than intended. The header has since been packed, but the fix
> left an incompatibility for users who upgrade from an old kernel
> with the unpacked header to a newer kernel with the packed header
> while the AGFL happens to wrap around the end. The newer kernel
> recognizes one extra slot at the physical end of the AGFL that the
> previous kernel did not. The new kernel will eventually attempt to
> allocate a block from that slot, which contains invalid data, and
> cause a crash.
> 
> This condition can be detected by comparing the active range of the
> AGFL to the count. While this detects a padding mismatch, it can
> also trigger false positives for unrelated flcount corruption. Since
> we cannot distinguish a size mismatch due to padding from unrelated
> corruption, we can't trust the AGFL enough to simply repopulate the
> empty slot.
> 
> Instead, avoid unnecessarily complex detection logic and and use a
> solution that can handle any form of flcount corruption that slips
> through read verifiers: distrust the entire AGFL and reset it to an
> empty state. Any valid blocks within the AGFL are intentionally
> leaked. This requires xfs_repair to rectify (which was already
> necessary based on the state the AGFL was found in). The reset
> mitigates the side effect of the padding mismatch problem from a
> filesystem crash to a free space accounting inconsistency. The
> generic approach also means that this patch can be safely backported
> to kernels with or without a packed struct xfs_agfl.
> 
> Check the AGF for an invalid freelist count on initial read from
> disk. If detected, set a flag on the xfs_perag to indicate that a
> reset is required before the AGFL can be used. In the first
> transaction that attempts to use a flagged AGFL, reset it to empty,
> warn the user about the inconsistency and allow the freelist fixup
> code to repopulate the AGFL with new blocks. The xfs_perag flag is
> cleared to eliminate the need for repeated checks on each block
> allocation operation.
> 
> Fixes: 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
> CC: <stable@vger.kernel.org>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
> ---
> 
> v2:
> - Function rename and logic cleanup.
> - CC stable.
> v1: https://marc.info/?l=linux-xfs&m=152104784709048&w=2
> - Use a simplified AGFL reset mechanism.
> - Trigger on AGFL fixup rather than get/put.
> - Various refactors, cleanups and comments.
> rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2
> 
>  fs/xfs/libxfs/xfs_alloc.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_mount.h        |  1 +
>  fs/xfs/xfs_trace.h        |  9 ++++-
>  3 files changed, 103 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 3db90b707fb2..39387bdd225d 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2063,6 +2063,93 @@ xfs_alloc_space_available(
>  }
>  
>  /*
> + * Check the agfl fields of the agf for inconsistency or corruption. The purpose
> + * is to detect an agfl header padding mismatch between current and early v5
> + * kernels. This problem manifests as a 1-slot size difference between the
> + * on-disk flcount and the active [first, last] range of a wrapped agfl. This
> + * may also catch variants of agfl count corruption unrelated to padding. Either
> + * way, we'll reset the agfl and warn the user.
> + *
> + * Return true if a reset is required before the agfl can be used, false
> + * otherwise.
> + */
> +static bool
> +xfs_agfl_needs_reset(
> +	struct xfs_mount	*mp,
> +	struct xfs_agf		*agf)
> +{
> +	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
> +	uint32_t		l = be32_to_cpu(agf->agf_fllast);
> +	uint32_t		c = be32_to_cpu(agf->agf_flcount);
> +	int			agfl_size = xfs_agfl_size(mp);
                                            ^^^^^^^^^^^^^
Should this be XFS_AGFL_SIZE(mp)? Otherwise I got compile error (based
on 4.15-rc5 kernel):

fs/xfs/libxfs/xfs_alloc.c: In function ‘xfs_agfl_needs_reset’:
fs/xfs/libxfs/xfs_alloc.c:2075:20: error: implicit declaration of function ‘xfs_agfl_size’; did you mean ‘xfs_agfl_verify’? [-Werror=implicit-function-declaration]
  int   agfl_size = xfs_agfl_size(mp);
                    ^~~~~~~~~~~~~
                    xfs_agfl_verify
cc1: some warnings being treated as errors

> +	int			active;
> +
> +	/* no agfl header on v4 supers */
> +	if (!xfs_sb_version_hascrc(&mp->m_sb))
> +		return false;
> +
> +	/*
> +	 * The agf read verifier catches severe corruption of these fields.
> +	 * Repeat some sanity checks to cover a packed -> unpacked mismatch if
> +	 * the verifier allows it.
> +	 */
> +	if (f >= agfl_size || l >= agfl_size)
> +		return true;
> +	if (c > agfl_size)
> +		return true;
> +
> +	/*
> +	 * Check consistency between the on-disk count and the active range. An
> +	 * agfl padding mismatch manifests as an inconsistent flcount.
> +	 */
> +	if (c && l >= f)
> +		active = l - f + 1;
> +	else if (c)
> +		active = agfl_size - f + l + 1;
> +	else
> +		active = 0;
> +
> +	return active != c;
> +}
> +
> +/*
> + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
> + * agfl content cannot be trusted. Warn the user that a repair is required to
> + * recover leaked blocks.
> + *
> + * The purpose of this mechanism is to handle filesystems affected by the agfl
> + * header padding mismatch problem. A reset keeps the filesystem online with a
> + * relatively minor free space accounting inconsistency rather than suffer the
> + * inevitable crash from use of an invalid agfl block.
> + */
> +static void
> +xfs_agfl_reset(
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agbp,
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> +
> +	ASSERT(pag->pagf_agflreset);
> +	trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
> +
> +	xfs_warn(mp,
> +	       "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
> +	       "Please unmount and run xfs_repair.",
> +	         pag->pag_agno, pag->pagf_flcount);
> +
> +	agf->agf_flfirst = 0;
> +	agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1);

Same here.

Thanks,
Eryu

> +	agf->agf_flcount = 0;
> +	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
> +				    XFS_AGF_FLCOUNT);
> +
> +	pag->pagf_flcount = 0;
> +	pag->pagf_agflreset = false;
> +}
> +
> +/*
>   * Decide whether to use this allocation group for this allocation.
>   * If so, fix up the btree freelist's size.
>   */
> @@ -2123,6 +2210,10 @@ xfs_alloc_fix_freelist(
>  		}
>  	}
>  
> +	/* reset a padding mismatched agfl before final free space check */
> +	if (pag->pagf_agflreset)
> +		xfs_agfl_reset(tp, agbp, pag);
> +
>  	/* If there isn't enough total space or single-extent, reject it. */
>  	need = xfs_alloc_min_freelist(mp, pag);
>  	if (!xfs_alloc_space_available(args, need, flags))
> @@ -2279,6 +2370,7 @@ xfs_alloc_get_freelist(
>  		agf->agf_flfirst = 0;
>  
>  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> +	ASSERT(!pag->pagf_agflreset);
>  	be32_add_cpu(&agf->agf_flcount, -1);
>  	xfs_trans_agflist_delta(tp, -1);
>  	pag->pagf_flcount--;
> @@ -2390,6 +2482,7 @@ xfs_alloc_put_freelist(
>  		agf->agf_fllast = 0;
>  
>  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> +	ASSERT(!pag->pagf_agflreset);
>  	be32_add_cpu(&agf->agf_flcount, 1);
>  	xfs_trans_agflist_delta(tp, 1);
>  	pag->pagf_flcount++;
> @@ -2597,6 +2690,7 @@ xfs_alloc_read_agf(
>  		pag->pagb_count = 0;
>  		pag->pagb_tree = RB_ROOT;
>  		pag->pagf_init = 1;
> +		pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf);
>  	}
>  #ifdef DEBUG
>  	else if (!XFS_FORCED_SHUTDOWN(mp)) {
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 1808f56decaa..10b90bbc5162 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -353,6 +353,7 @@ typedef struct xfs_perag {
>  	char		pagi_inodeok;	/* The agi is ok for inodes */
>  	uint8_t		pagf_levels[XFS_BTNUM_AGF];
>  					/* # of levels in bno & cnt btree */
> +	bool		pagf_agflreset; /* agfl requires reset before use */
>  	uint32_t	pagf_flcount;	/* count of blocks in freelist */
>  	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
>  	xfs_extlen_t	pagf_longest;	/* longest free space */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 945de08af7ba..a982c0b623d0 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim,
>  		  __entry->tlen)
>  );
>  
> -TRACE_EVENT(xfs_agf,
> +DECLARE_EVENT_CLASS(xfs_agf_class,
>  	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
>  		 unsigned long caller_ip),
>  	TP_ARGS(mp, agf, flags, caller_ip),
> @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf,
>  		  __entry->longest,
>  		  (void *)__entry->caller_ip)
>  );
> +#define DEFINE_AGF_EVENT(name) \
> +DEFINE_EVENT(xfs_agf_class, name, \
> +	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
> +		 unsigned long caller_ip), \
> +	TP_ARGS(mp, agf, flags, caller_ip))
> +DEFINE_AGF_EVENT(xfs_agf);
> +DEFINE_AGF_EVENT(xfs_agfl_reset);
>  
>  TRACE_EVENT(xfs_free_extent,
>  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
> -- 
> 2.13.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Foster March 23, 2018, 11:46 a.m. UTC | #5
On Fri, Mar 23, 2018 at 01:11:38PM +0800, Eryu Guan wrote:
> Hi Brian,
> 
> On Thu, Mar 15, 2018 at 01:45:39PM -0400, Brian Foster wrote:
> > The struct xfs_agfl v5 header was originally introduced with
> > unexpected padding that caused the AGFL to operate with one less
> > slot than intended. The header has since been packed, but the fix
> > left an incompatibility for users who upgrade from an old kernel
> > with the unpacked header to a newer kernel with the packed header
> > while the AGFL happens to wrap around the end. The newer kernel
> > recognizes one extra slot at the physical end of the AGFL that the
> > previous kernel did not. The new kernel will eventually attempt to
> > allocate a block from that slot, which contains invalid data, and
> > cause a crash.
> > 
> > This condition can be detected by comparing the active range of the
> > AGFL to the count. While this detects a padding mismatch, it can
> > also trigger false positives for unrelated flcount corruption. Since
> > we cannot distinguish a size mismatch due to padding from unrelated
> > corruption, we can't trust the AGFL enough to simply repopulate the
> > empty slot.
> > 
> > Instead, avoid unnecessarily complex detection logic and and use a
> > solution that can handle any form of flcount corruption that slips
> > through read verifiers: distrust the entire AGFL and reset it to an
> > empty state. Any valid blocks within the AGFL are intentionally
> > leaked. This requires xfs_repair to rectify (which was already
> > necessary based on the state the AGFL was found in). The reset
> > mitigates the side effect of the padding mismatch problem from a
> > filesystem crash to a free space accounting inconsistency. The
> > generic approach also means that this patch can be safely backported
> > to kernels with or without a packed struct xfs_agfl.
> > 
> > Check the AGF for an invalid freelist count on initial read from
> > disk. If detected, set a flag on the xfs_perag to indicate that a
> > reset is required before the AGFL can be used. In the first
> > transaction that attempts to use a flagged AGFL, reset it to empty,
> > warn the user about the inconsistency and allow the freelist fixup
> > code to repopulate the AGFL with new blocks. The xfs_perag flag is
> > cleared to eliminate the need for repeated checks on each block
> > allocation operation.
> > 
> > Fixes: 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
> > CC: <stable@vger.kernel.org>
> > Suggested-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
> > ---
> > 
> > v2:
> > - Function rename and logic cleanup.
> > - CC stable.
> > v1: https://marc.info/?l=linux-xfs&m=152104784709048&w=2
> > - Use a simplified AGFL reset mechanism.
> > - Trigger on AGFL fixup rather than get/put.
> > - Various refactors, cleanups and comments.
> > rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2
> > 
> >  fs/xfs/libxfs/xfs_alloc.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_mount.h        |  1 +
> >  fs/xfs/xfs_trace.h        |  9 ++++-
> >  3 files changed, 103 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 3db90b707fb2..39387bdd225d 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -2063,6 +2063,93 @@ xfs_alloc_space_available(
> >  }
> >  
> >  /*
> > + * Check the agfl fields of the agf for inconsistency or corruption. The purpose
> > + * is to detect an agfl header padding mismatch between current and early v5
> > + * kernels. This problem manifests as a 1-slot size difference between the
> > + * on-disk flcount and the active [first, last] range of a wrapped agfl. This
> > + * may also catch variants of agfl count corruption unrelated to padding. Either
> > + * way, we'll reset the agfl and warn the user.
> > + *
> > + * Return true if a reset is required before the agfl can be used, false
> > + * otherwise.
> > + */
> > +static bool
> > +xfs_agfl_needs_reset(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_agf		*agf)
> > +{
> > +	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
> > +	uint32_t		l = be32_to_cpu(agf->agf_fllast);
> > +	uint32_t		c = be32_to_cpu(agf->agf_flcount);
> > +	int			agfl_size = xfs_agfl_size(mp);
>                                             ^^^^^^^^^^^^^
> Should this be XFS_AGFL_SIZE(mp)? Otherwise I got compile error (based
> on 4.15-rc5 kernel):
> 

This patch was rebased onto for-next which now includes commit
a78ee256c325e ("xfs: convert XFS_AGFL_SIZE to a helper function"). That
commit refactors the macro into a helper function.

Brian

> fs/xfs/libxfs/xfs_alloc.c: In function ‘xfs_agfl_needs_reset’:
> fs/xfs/libxfs/xfs_alloc.c:2075:20: error: implicit declaration of function ‘xfs_agfl_size’; did you mean ‘xfs_agfl_verify’? [-Werror=implicit-function-declaration]
>   int   agfl_size = xfs_agfl_size(mp);
>                     ^~~~~~~~~~~~~
>                     xfs_agfl_verify
> cc1: some warnings being treated as errors
> 
> > +	int			active;
> > +
> > +	/* no agfl header on v4 supers */
> > +	if (!xfs_sb_version_hascrc(&mp->m_sb))
> > +		return false;
> > +
> > +	/*
> > +	 * The agf read verifier catches severe corruption of these fields.
> > +	 * Repeat some sanity checks to cover a packed -> unpacked mismatch if
> > +	 * the verifier allows it.
> > +	 */
> > +	if (f >= agfl_size || l >= agfl_size)
> > +		return true;
> > +	if (c > agfl_size)
> > +		return true;
> > +
> > +	/*
> > +	 * Check consistency between the on-disk count and the active range. An
> > +	 * agfl padding mismatch manifests as an inconsistent flcount.
> > +	 */
> > +	if (c && l >= f)
> > +		active = l - f + 1;
> > +	else if (c)
> > +		active = agfl_size - f + l + 1;
> > +	else
> > +		active = 0;
> > +
> > +	return active != c;
> > +}
> > +
> > +/*
> > + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
> > + * agfl content cannot be trusted. Warn the user that a repair is required to
> > + * recover leaked blocks.
> > + *
> > + * The purpose of this mechanism is to handle filesystems affected by the agfl
> > + * header padding mismatch problem. A reset keeps the filesystem online with a
> > + * relatively minor free space accounting inconsistency rather than suffer the
> > + * inevitable crash from use of an invalid agfl block.
> > + */
> > +static void
> > +xfs_agfl_reset(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_buf		*agbp,
> > +	struct xfs_perag	*pag)
> > +{
> > +	struct xfs_mount	*mp = tp->t_mountp;
> > +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> > +
> > +	ASSERT(pag->pagf_agflreset);
> > +	trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
> > +
> > +	xfs_warn(mp,
> > +	       "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
> > +	       "Please unmount and run xfs_repair.",
> > +	         pag->pag_agno, pag->pagf_flcount);
> > +
> > +	agf->agf_flfirst = 0;
> > +	agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1);
> 
> Same here.
> 
> Thanks,
> Eryu
> 
> > +	agf->agf_flcount = 0;
> > +	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
> > +				    XFS_AGF_FLCOUNT);
> > +
> > +	pag->pagf_flcount = 0;
> > +	pag->pagf_agflreset = false;
> > +}
> > +
> > +/*
> >   * Decide whether to use this allocation group for this allocation.
> >   * If so, fix up the btree freelist's size.
> >   */
> > @@ -2123,6 +2210,10 @@ xfs_alloc_fix_freelist(
> >  		}
> >  	}
> >  
> > +	/* reset a padding mismatched agfl before final free space check */
> > +	if (pag->pagf_agflreset)
> > +		xfs_agfl_reset(tp, agbp, pag);
> > +
> >  	/* If there isn't enough total space or single-extent, reject it. */
> >  	need = xfs_alloc_min_freelist(mp, pag);
> >  	if (!xfs_alloc_space_available(args, need, flags))
> > @@ -2279,6 +2370,7 @@ xfs_alloc_get_freelist(
> >  		agf->agf_flfirst = 0;
> >  
> >  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > +	ASSERT(!pag->pagf_agflreset);
> >  	be32_add_cpu(&agf->agf_flcount, -1);
> >  	xfs_trans_agflist_delta(tp, -1);
> >  	pag->pagf_flcount--;
> > @@ -2390,6 +2482,7 @@ xfs_alloc_put_freelist(
> >  		agf->agf_fllast = 0;
> >  
> >  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > +	ASSERT(!pag->pagf_agflreset);
> >  	be32_add_cpu(&agf->agf_flcount, 1);
> >  	xfs_trans_agflist_delta(tp, 1);
> >  	pag->pagf_flcount++;
> > @@ -2597,6 +2690,7 @@ xfs_alloc_read_agf(
> >  		pag->pagb_count = 0;
> >  		pag->pagb_tree = RB_ROOT;
> >  		pag->pagf_init = 1;
> > +		pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf);
> >  	}
> >  #ifdef DEBUG
> >  	else if (!XFS_FORCED_SHUTDOWN(mp)) {
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index 1808f56decaa..10b90bbc5162 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -353,6 +353,7 @@ typedef struct xfs_perag {
> >  	char		pagi_inodeok;	/* The agi is ok for inodes */
> >  	uint8_t		pagf_levels[XFS_BTNUM_AGF];
> >  					/* # of levels in bno & cnt btree */
> > +	bool		pagf_agflreset; /* agfl requires reset before use */
> >  	uint32_t	pagf_flcount;	/* count of blocks in freelist */
> >  	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
> >  	xfs_extlen_t	pagf_longest;	/* longest free space */
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index 945de08af7ba..a982c0b623d0 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim,
> >  		  __entry->tlen)
> >  );
> >  
> > -TRACE_EVENT(xfs_agf,
> > +DECLARE_EVENT_CLASS(xfs_agf_class,
> >  	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
> >  		 unsigned long caller_ip),
> >  	TP_ARGS(mp, agf, flags, caller_ip),
> > @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf,
> >  		  __entry->longest,
> >  		  (void *)__entry->caller_ip)
> >  );
> > +#define DEFINE_AGF_EVENT(name) \
> > +DEFINE_EVENT(xfs_agf_class, name, \
> > +	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
> > +		 unsigned long caller_ip), \
> > +	TP_ARGS(mp, agf, flags, caller_ip))
> > +DEFINE_AGF_EVENT(xfs_agf);
> > +DEFINE_AGF_EVENT(xfs_agfl_reset);
> >  
> >  TRACE_EVENT(xfs_free_extent,
> >  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
> > -- 
> > 2.13.6
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eryu Guan March 25, 2018, 6:02 a.m. UTC | #6
On Fri, Mar 23, 2018 at 07:46:22AM -0400, Brian Foster wrote:
> On Fri, Mar 23, 2018 at 01:11:38PM +0800, Eryu Guan wrote:
> > Hi Brian,
> > 
> > On Thu, Mar 15, 2018 at 01:45:39PM -0400, Brian Foster wrote:
> > > The struct xfs_agfl v5 header was originally introduced with
> > > unexpected padding that caused the AGFL to operate with one less
> > > slot than intended. The header has since been packed, but the fix
> > > left an incompatibility for users who upgrade from an old kernel
> > > with the unpacked header to a newer kernel with the packed header
> > > while the AGFL happens to wrap around the end. The newer kernel
> > > recognizes one extra slot at the physical end of the AGFL that the
> > > previous kernel did not. The new kernel will eventually attempt to
> > > allocate a block from that slot, which contains invalid data, and
> > > cause a crash.
> > > 
> > > This condition can be detected by comparing the active range of the
> > > AGFL to the count. While this detects a padding mismatch, it can
> > > also trigger false positives for unrelated flcount corruption. Since
> > > we cannot distinguish a size mismatch due to padding from unrelated
> > > corruption, we can't trust the AGFL enough to simply repopulate the
> > > empty slot.
> > > 
> > > Instead, avoid unnecessarily complex detection logic and and use a
> > > solution that can handle any form of flcount corruption that slips
> > > through read verifiers: distrust the entire AGFL and reset it to an
> > > empty state. Any valid blocks within the AGFL are intentionally
> > > leaked. This requires xfs_repair to rectify (which was already
> > > necessary based on the state the AGFL was found in). The reset
> > > mitigates the side effect of the padding mismatch problem from a
> > > filesystem crash to a free space accounting inconsistency. The
> > > generic approach also means that this patch can be safely backported
> > > to kernels with or without a packed struct xfs_agfl.
> > > 
> > > Check the AGF for an invalid freelist count on initial read from
> > > disk. If detected, set a flag on the xfs_perag to indicate that a
> > > reset is required before the AGFL can be used. In the first
> > > transaction that attempts to use a flagged AGFL, reset it to empty,
> > > warn the user about the inconsistency and allow the freelist fixup
> > > code to repopulate the AGFL with new blocks. The xfs_perag flag is
> > > cleared to eliminate the need for repeated checks on each block
> > > allocation operation.
> > > 
> > > Fixes: 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
> > > CC: <stable@vger.kernel.org>
> > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
> > > ---
> > > 
> > > v2:
> > > - Function rename and logic cleanup.
> > > - CC stable.
> > > v1: https://marc.info/?l=linux-xfs&m=152104784709048&w=2
> > > - Use a simplified AGFL reset mechanism.
> > > - Trigger on AGFL fixup rather than get/put.
> > > - Various refactors, cleanups and comments.
> > > rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2
> > > 
> > >  fs/xfs/libxfs/xfs_alloc.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_mount.h        |  1 +
> > >  fs/xfs/xfs_trace.h        |  9 ++++-
> > >  3 files changed, 103 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index 3db90b707fb2..39387bdd225d 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > @@ -2063,6 +2063,93 @@ xfs_alloc_space_available(
> > >  }
> > >  
> > >  /*
> > > + * Check the agfl fields of the agf for inconsistency or corruption. The purpose
> > > + * is to detect an agfl header padding mismatch between current and early v5
> > > + * kernels. This problem manifests as a 1-slot size difference between the
> > > + * on-disk flcount and the active [first, last] range of a wrapped agfl. This
> > > + * may also catch variants of agfl count corruption unrelated to padding. Either
> > > + * way, we'll reset the agfl and warn the user.
> > > + *
> > > + * Return true if a reset is required before the agfl can be used, false
> > > + * otherwise.
> > > + */
> > > +static bool
> > > +xfs_agfl_needs_reset(
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_agf		*agf)
> > > +{
> > > +	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
> > > +	uint32_t		l = be32_to_cpu(agf->agf_fllast);
> > > +	uint32_t		c = be32_to_cpu(agf->agf_flcount);
> > > +	int			agfl_size = xfs_agfl_size(mp);
> >                                             ^^^^^^^^^^^^^
> > Should this be XFS_AGFL_SIZE(mp)? Otherwise I got compile error (based
> > on 4.15-rc5 kernel):
> > 
> 
> This patch was rebased onto for-next which now includes commit
> a78ee256c325e ("xfs: convert XFS_AGFL_SIZE to a helper function"). That
> commit refactors the macro into a helper function.

Ah, thanks! I did search the list (roughly) to see if there's any patch
introduced xfs_agfl_size but apparently missed this patch.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 3db90b707fb2..39387bdd225d 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2063,6 +2063,93 @@  xfs_alloc_space_available(
 }
 
 /*
+ * Check the agfl fields of the agf for inconsistency or corruption. The purpose
+ * is to detect an agfl header padding mismatch between current and early v5
+ * kernels. This problem manifests as a 1-slot size difference between the
+ * on-disk flcount and the active [first, last] range of a wrapped agfl. This
+ * may also catch variants of agfl count corruption unrelated to padding. Either
+ * way, we'll reset the agfl and warn the user.
+ *
+ * Return true if a reset is required before the agfl can be used, false
+ * otherwise.
+ */
+static bool
+xfs_agfl_needs_reset(
+	struct xfs_mount	*mp,
+	struct xfs_agf		*agf)
+{
+	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
+	uint32_t		l = be32_to_cpu(agf->agf_fllast);
+	uint32_t		c = be32_to_cpu(agf->agf_flcount);
+	int			agfl_size = xfs_agfl_size(mp);
+	int			active;
+
+	/* no agfl header on v4 supers */
+	if (!xfs_sb_version_hascrc(&mp->m_sb))
+		return false;
+
+	/*
+	 * The agf read verifier catches severe corruption of these fields.
+	 * Repeat some sanity checks to cover a packed -> unpacked mismatch if
+	 * the verifier allows it.
+	 */
+	if (f >= agfl_size || l >= agfl_size)
+		return true;
+	if (c > agfl_size)
+		return true;
+
+	/*
+	 * Check consistency between the on-disk count and the active range. An
+	 * agfl padding mismatch manifests as an inconsistent flcount.
+	 */
+	if (c && l >= f)
+		active = l - f + 1;
+	else if (c)
+		active = agfl_size - f + l + 1;
+	else
+		active = 0;
+
+	return active != c;
+}
+
+/*
+ * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
+ * agfl content cannot be trusted. Warn the user that a repair is required to
+ * recover leaked blocks.
+ *
+ * The purpose of this mechanism is to handle filesystems affected by the agfl
+ * header padding mismatch problem. A reset keeps the filesystem online with a
+ * relatively minor free space accounting inconsistency rather than suffer the
+ * inevitable crash from use of an invalid agfl block.
+ */
+static void
+xfs_agfl_reset(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	struct xfs_perag	*pag)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+
+	ASSERT(pag->pagf_agflreset);
+	trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
+
+	xfs_warn(mp,
+	       "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
+	       "Please unmount and run xfs_repair.",
+	         pag->pag_agno, pag->pagf_flcount);
+
+	agf->agf_flfirst = 0;
+	agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1);
+	agf->agf_flcount = 0;
+	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
+				    XFS_AGF_FLCOUNT);
+
+	pag->pagf_flcount = 0;
+	pag->pagf_agflreset = false;
+}
+
+/*
  * Decide whether to use this allocation group for this allocation.
  * If so, fix up the btree freelist's size.
  */
@@ -2123,6 +2210,10 @@  xfs_alloc_fix_freelist(
 		}
 	}
 
+	/* reset a padding mismatched agfl before final free space check */
+	if (pag->pagf_agflreset)
+		xfs_agfl_reset(tp, agbp, pag);
+
 	/* If there isn't enough total space or single-extent, reject it. */
 	need = xfs_alloc_min_freelist(mp, pag);
 	if (!xfs_alloc_space_available(args, need, flags))
@@ -2279,6 +2370,7 @@  xfs_alloc_get_freelist(
 		agf->agf_flfirst = 0;
 
 	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+	ASSERT(!pag->pagf_agflreset);
 	be32_add_cpu(&agf->agf_flcount, -1);
 	xfs_trans_agflist_delta(tp, -1);
 	pag->pagf_flcount--;
@@ -2390,6 +2482,7 @@  xfs_alloc_put_freelist(
 		agf->agf_fllast = 0;
 
 	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+	ASSERT(!pag->pagf_agflreset);
 	be32_add_cpu(&agf->agf_flcount, 1);
 	xfs_trans_agflist_delta(tp, 1);
 	pag->pagf_flcount++;
@@ -2597,6 +2690,7 @@  xfs_alloc_read_agf(
 		pag->pagb_count = 0;
 		pag->pagb_tree = RB_ROOT;
 		pag->pagf_init = 1;
+		pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf);
 	}
 #ifdef DEBUG
 	else if (!XFS_FORCED_SHUTDOWN(mp)) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 1808f56decaa..10b90bbc5162 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -353,6 +353,7 @@  typedef struct xfs_perag {
 	char		pagi_inodeok;	/* The agi is ok for inodes */
 	uint8_t		pagf_levels[XFS_BTNUM_AGF];
 					/* # of levels in bno & cnt btree */
+	bool		pagf_agflreset; /* agfl requires reset before use */
 	uint32_t	pagf_flcount;	/* count of blocks in freelist */
 	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
 	xfs_extlen_t	pagf_longest;	/* longest free space */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 945de08af7ba..a982c0b623d0 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1477,7 +1477,7 @@  TRACE_EVENT(xfs_extent_busy_trim,
 		  __entry->tlen)
 );
 
-TRACE_EVENT(xfs_agf,
+DECLARE_EVENT_CLASS(xfs_agf_class,
 	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
 		 unsigned long caller_ip),
 	TP_ARGS(mp, agf, flags, caller_ip),
@@ -1533,6 +1533,13 @@  TRACE_EVENT(xfs_agf,
 		  __entry->longest,
 		  (void *)__entry->caller_ip)
 );
+#define DEFINE_AGF_EVENT(name) \
+DEFINE_EVENT(xfs_agf_class, name, \
+	TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
+		 unsigned long caller_ip), \
+	TP_ARGS(mp, agf, flags, caller_ip))
+DEFINE_AGF_EVENT(xfs_agf);
+DEFINE_AGF_EVENT(xfs_agfl_reset);
 
 TRACE_EVENT(xfs_free_extent,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,