Message ID | 20180314171724.41951-1-bfoster@redhat.com (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote: > The struct xfs_agfl v5 header was originally introduced with > unexpected padding that caused the AGFL to operate with one less > slot than intended. The header has since been packed, but the fix > left an incompatibility for users who upgrade from an old kernel > with the unpacked header to a newer kernel with the packed header > while the AGFL happens to wrap around the end. The newer kernel > recognizes one extra slot at the physical end of the AGFL that the > previous kernel did not. The new kernel will eventually attempt to > allocate a block from that slot, which contains invalid data, and > cause a crash. > > This condition can be detected by comparing the active range of the > AGFL to the count. While this detects a padding mismatch, it can > also trigger false positives for unrelated flcount corruption. Since > we cannot distinguish a size mismatch due to padding from unrelated > corruption, we can't trust the AGFL enough to simply repopulate the > empty slot. > > Instead, avoid unnecessarily complex detection logic and and use a > solution that can handle any form of flcount corruption that slips > through read verifiers: distrust the entire AGFL and reset it to an > empty state. Any valid blocks within the AGFL are intentionally > leaked. This requires xfs_repair to rectify (which was already > necessary based on the state the AGFL was found in). The reset > mitigates the side effect of the padding mismatch problem from a > filesystem crash to a free space accounting inconsistency. The > generic approach also means that this patch can be safely backported > to kernels with or without a packed struct xfs_agfl. > > Check the AGF for an invalid freelist count on initial read from > disk. If detected, set a flag on the xfs_perag to indicate that a > reset is required before the AGFL can be used. In the first > transaction that attempts to use a flagged AGFL, reset it to empty, > warn the user about the inconsistency and allow the freelist fixup > code to repopulate the AGFL with new blocks. The xfs_perag flag is > cleared to eliminate the need for repeated checks on each block > allocation operation. > > Suggested-by: Dave Chinner <david@fromorbit.com> > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > > Hi all, > > Here's an actual patch for the AGFL reset approach. This survives a > normal xfstests run. I also ran some quick xfstests runs with hacks in > place to unconditionally trigger an agfl reset on first read. This > caused a ton of test failures due to the resulting accounting > inconsistency on the scratch device, but otherwise it survived from a > torture test perspective and didn't introduce any unrelated regressions > that I could see. > > Thoughts, reviews, flames appreciated. > > Brian > > v1: > - Use a simplified AGFL reset mechanism. > - Trigger on AGFL fixup rather than get/put. > - Various refactors, cleanups and comments. > rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2 > > fs/xfs/libxfs/xfs_alloc.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++ > fs/xfs/xfs_mount.h | 1 + > fs/xfs/xfs_trace.h | 9 ++++- > 3 files changed, 105 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index 3db90b707fb2..7dfec92f28dc 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -2063,6 +2063,95 @@ xfs_alloc_space_available( > } > > /* > + * Check the agfl fields of the agf for inconsistency or corruption. The purpose > + * is to detect an agfl header padding mismatch between current and early v5 > + * kernels. This problem manifests as a 1-slot size difference between the > + * on-disk flcount and the active [first, last] range of a wrapped agfl. This > + * may also catch variants of agfl count corruption unrelated to padding. Either > + * way, we'll reset the agfl and warn the user. > + * > + * Return true if a reset is required before the agfl can be used, false > + * otherwise. > + */ > +static bool > +xfs_agfl_need_reset( > + struct xfs_mount *mp, > + struct xfs_agf *agf) > +{ > + uint32_t f = be32_to_cpu(agf->agf_flfirst); > + uint32_t l = be32_to_cpu(agf->agf_fllast); > + uint32_t c = be32_to_cpu(agf->agf_flcount); > + int agfl_size = xfs_agfl_size(mp); > + int active; > + > + /* no agfl header on v4 supers */ > + if (!xfs_sb_version_hascrc(&mp->m_sb)) > + return false; > + > + /* > + * The agf read verifier catches severe corruption of these fields. > + * Repeat some sanity checks to cover a packed -> unpacked mismatch if > + * the verifier allows it. > + */ > + if (f >= agfl_size || l >= agfl_size) > + return true; > + if (c > agfl_size) > + return true; > + > + /* > + * Check consistency between the on-disk count and the active range. An > + * agfl padding mismatch manifests as an inconsistent flcount. > + */ > + if (c && l >= f) > + active = l - f + 1; > + else if (c) > + active = agfl_size - f + l + 1; > + else > + active = 0; > + if (active != c) > + return true; > + > + return false; This could become "return active != c;" But otherwise looks ok enough to try it out, Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> --D > +} > + > +/* > + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the > + * agfl content cannot be trusted. Warn the user that a repair is required to > + * recover leaked blocks. > + * > + * The purpose of this mechanism is to handle filesystems affected by the agfl > + * header padding mismatch problem. A reset keeps the filesystem online with a > + * relatively minor free space accounting inconsistency rather than suffer the > + * inevitable crash from use of an invalid agfl block. > + */ > +static void > +xfs_agfl_reset( > + struct xfs_trans *tp, > + struct xfs_buf *agbp, > + struct xfs_perag *pag) > +{ > + struct xfs_mount *mp = tp->t_mountp; > + struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp); > + > + ASSERT(pag->pagf_agflreset); > + trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_); > + > + xfs_warn(mp, > + "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. " > + "Please unmount and run xfs_repair.", > + pag->pag_agno, pag->pagf_flcount); > + > + agf->agf_flfirst = 0; > + agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1); > + agf->agf_flcount = 0; > + xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST | > + XFS_AGF_FLCOUNT); > + > + pag->pagf_flcount = 0; > + pag->pagf_agflreset = false; > +} > + > +/* > * Decide whether to use this allocation group for this allocation. > * If so, fix up the btree freelist's size. > */ > @@ -2123,6 +2212,10 @@ xfs_alloc_fix_freelist( > } > } > > + /* reset a padding mismatched agfl before final free space check */ > + if (pag->pagf_agflreset) > + xfs_agfl_reset(tp, agbp, pag); > + > /* If there isn't enough total space or single-extent, reject it. */ > need = xfs_alloc_min_freelist(mp, pag); > if (!xfs_alloc_space_available(args, need, flags)) > @@ -2279,6 +2372,7 @@ xfs_alloc_get_freelist( > agf->agf_flfirst = 0; > > pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno)); > + ASSERT(!pag->pagf_agflreset); > be32_add_cpu(&agf->agf_flcount, -1); > xfs_trans_agflist_delta(tp, -1); > pag->pagf_flcount--; > @@ -2390,6 +2484,7 @@ xfs_alloc_put_freelist( > agf->agf_fllast = 0; > > pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno)); > + ASSERT(!pag->pagf_agflreset); > be32_add_cpu(&agf->agf_flcount, 1); > xfs_trans_agflist_delta(tp, 1); > pag->pagf_flcount++; > @@ -2597,6 +2692,7 @@ xfs_alloc_read_agf( > pag->pagb_count = 0; > pag->pagb_tree = RB_ROOT; > pag->pagf_init = 1; > + pag->pagf_agflreset = xfs_agfl_need_reset(mp, agf); > } > #ifdef DEBUG > else if (!XFS_FORCED_SHUTDOWN(mp)) { > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > index 1808f56decaa..10b90bbc5162 100644 > --- a/fs/xfs/xfs_mount.h > +++ b/fs/xfs/xfs_mount.h > @@ -353,6 +353,7 @@ typedef struct xfs_perag { > char pagi_inodeok; /* The agi is ok for inodes */ > uint8_t pagf_levels[XFS_BTNUM_AGF]; > /* # of levels in bno & cnt btree */ > + bool pagf_agflreset; /* agfl requires reset before use */ > uint32_t pagf_flcount; /* count of blocks in freelist */ > xfs_extlen_t pagf_freeblks; /* total free blocks */ > xfs_extlen_t pagf_longest; /* longest free space */ > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h > index 945de08af7ba..a982c0b623d0 100644 > --- a/fs/xfs/xfs_trace.h > +++ b/fs/xfs/xfs_trace.h > @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim, > __entry->tlen) > ); > > -TRACE_EVENT(xfs_agf, > +DECLARE_EVENT_CLASS(xfs_agf_class, > TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, > unsigned long caller_ip), > TP_ARGS(mp, agf, flags, caller_ip), > @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf, > __entry->longest, > (void *)__entry->caller_ip) > ); > +#define DEFINE_AGF_EVENT(name) \ > +DEFINE_EVENT(xfs_agf_class, name, \ > + TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \ > + unsigned long caller_ip), \ > + TP_ARGS(mp, agf, flags, caller_ip)) > +DEFINE_AGF_EVENT(xfs_agf); > +DEFINE_AGF_EVENT(xfs_agfl_reset); > > TRACE_EVENT(xfs_free_extent, > TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno, > -- > 2.13.6 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote: > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote: >> The struct xfs_agfl v5 header was originally introduced with >> unexpected padding that caused the AGFL to operate with one less >> slot than intended. The header has since been packed, but the fix >> left an incompatibility for users who upgrade from an old kernel >> with the unpacked header to a newer kernel with the packed header >> while the AGFL happens to wrap around the end. The newer kernel >> recognizes one extra slot at the physical end of the AGFL that the >> previous kernel did not. The new kernel will eventually attempt to >> allocate a block from that slot, which contains invalid data, and >> cause a crash. >> >> This condition can be detected by comparing the active range of the >> AGFL to the count. While this detects a padding mismatch, it can >> also trigger false positives for unrelated flcount corruption. Since >> we cannot distinguish a size mismatch due to padding from unrelated >> corruption, we can't trust the AGFL enough to simply repopulate the >> empty slot. >> >> Instead, avoid unnecessarily complex detection logic and and use a >> solution that can handle any form of flcount corruption that slips >> through read verifiers: distrust the entire AGFL and reset it to an >> empty state. Any valid blocks within the AGFL are intentionally >> leaked. This requires xfs_repair to rectify (which was already >> necessary based on the state the AGFL was found in). The reset >> mitigates the side effect of the padding mismatch problem from a >> filesystem crash to a free space accounting inconsistency. The >> generic approach also means that this patch can be safely backported >> to kernels with or without a packed struct xfs_agfl. >> >> Check the AGF for an invalid freelist count on initial read from >> disk. If detected, set a flag on the xfs_perag to indicate that a >> reset is required before the AGFL can be used. In the first >> transaction that attempts to use a flagged AGFL, reset it to empty, >> warn the user about the inconsistency and allow the freelist fixup >> code to repopulate the AGFL with new blocks. The xfs_perag flag is >> cleared to eliminate the need for repeated checks on each block >> allocation operation. >> >> Suggested-by: Dave Chinner <david@fromorbit.com> >> Signed-off-by: Brian Foster <bfoster@redhat.com> >> --- >> >> Hi all, >> >> Here's an actual patch for the AGFL reset approach. This survives a >> normal xfstests run. I also ran some quick xfstests runs with hacks in >> place to unconditionally trigger an agfl reset on first read. This >> caused a ton of test failures due to the resulting accounting >> inconsistency on the scratch device, but otherwise it survived from a >> torture test perspective and didn't introduce any unrelated regressions >> that I could see. >> >> Thoughts, reviews, flames appreciated. >> >> Brian >> >> v1: >> - Use a simplified AGFL reset mechanism. >> - Trigger on AGFL fixup rather than get/put. >> - Various refactors, cleanups and comments. >> rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2 >> >> fs/xfs/libxfs/xfs_alloc.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++ >> fs/xfs/xfs_mount.h | 1 + >> fs/xfs/xfs_trace.h | 9 ++++- >> 3 files changed, 105 insertions(+), 1 deletion(-) >> >> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c >> index 3db90b707fb2..7dfec92f28dc 100644 >> --- a/fs/xfs/libxfs/xfs_alloc.c >> +++ b/fs/xfs/libxfs/xfs_alloc.c >> @@ -2063,6 +2063,95 @@ xfs_alloc_space_available( >> } >> >> /* >> + * Check the agfl fields of the agf for inconsistency or corruption. The purpose >> + * is to detect an agfl header padding mismatch between current and early v5 >> + * kernels. This problem manifests as a 1-slot size difference between the >> + * on-disk flcount and the active [first, last] range of a wrapped agfl. This >> + * may also catch variants of agfl count corruption unrelated to padding. Either >> + * way, we'll reset the agfl and warn the user. >> + * >> + * Return true if a reset is required before the agfl can be used, false >> + * otherwise. >> + */ >> +static bool >> +xfs_agfl_need_reset( Nitpick: Rename xfs_agfl_need_reset to xfs_agfl_needs_reset for better English and flow. >> + struct xfs_mount *mp, >> + struct xfs_agf *agf) >> +{ >> + uint32_t f = be32_to_cpu(agf->agf_flfirst); >> + uint32_t l = be32_to_cpu(agf->agf_fllast); >> + uint32_t c = be32_to_cpu(agf->agf_flcount); >> + int agfl_size = xfs_agfl_size(mp); >> + int active; >> + >> + /* no agfl header on v4 supers */ >> + if (!xfs_sb_version_hascrc(&mp->m_sb)) >> + return false; >> + >> + /* >> + * The agf read verifier catches severe corruption of these fields. >> + * Repeat some sanity checks to cover a packed -> unpacked mismatch if >> + * the verifier allows it. >> + */ >> + if (f >= agfl_size || l >= agfl_size) >> + return true; >> + if (c > agfl_size) >> + return true; >> + >> + /* >> + * Check consistency between the on-disk count and the active range. An >> + * agfl padding mismatch manifests as an inconsistent flcount. >> + */ >> + if (c && l >= f) >> + active = l - f + 1; >> + else if (c) >> + active = agfl_size - f + l + 1; >> + else >> + active = 0; >> + if (active != c) >> + return true; >> + >> + return false; > > This could become "return active != c;" > > But otherwise looks ok enough to try it out, > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > --D > >> +} >> + >> +/* >> + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the >> + * agfl content cannot be trusted. Warn the user that a repair is required to >> + * recover leaked blocks. >> + * >> + * The purpose of this mechanism is to handle filesystems affected by the agfl >> + * header padding mismatch problem. A reset keeps the filesystem online with a >> + * relatively minor free space accounting inconsistency rather than suffer the >> + * inevitable crash from use of an invalid agfl block. >> + */ >> +static void >> +xfs_agfl_reset( >> + struct xfs_trans *tp, >> + struct xfs_buf *agbp, >> + struct xfs_perag *pag) >> +{ >> + struct xfs_mount *mp = tp->t_mountp; >> + struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp); >> + >> + ASSERT(pag->pagf_agflreset); >> + trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_); >> + >> + xfs_warn(mp, >> + "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. " >> + "Please unmount and run xfs_repair.", >> + pag->pag_agno, pag->pagf_flcount); >> + >> + agf->agf_flfirst = 0; >> + agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1); >> + agf->agf_flcount = 0; >> + xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST | >> + XFS_AGF_FLCOUNT); >> + >> + pag->pagf_flcount = 0; >> + pag->pagf_agflreset = false; >> +} >> + >> +/* >> * Decide whether to use this allocation group for this allocation. >> * If so, fix up the btree freelist's size. >> */ >> @@ -2123,6 +2212,10 @@ xfs_alloc_fix_freelist( >> } >> } >> >> + /* reset a padding mismatched agfl before final free space check */ >> + if (pag->pagf_agflreset) >> + xfs_agfl_reset(tp, agbp, pag); >> + >> /* If there isn't enough total space or single-extent, reject it. */ >> need = xfs_alloc_min_freelist(mp, pag); >> if (!xfs_alloc_space_available(args, need, flags)) >> @@ -2279,6 +2372,7 @@ xfs_alloc_get_freelist( >> agf->agf_flfirst = 0; >> >> pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno)); >> + ASSERT(!pag->pagf_agflreset); >> be32_add_cpu(&agf->agf_flcount, -1); >> xfs_trans_agflist_delta(tp, -1); >> pag->pagf_flcount--; >> @@ -2390,6 +2484,7 @@ xfs_alloc_put_freelist( >> agf->agf_fllast = 0; >> >> pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno)); >> + ASSERT(!pag->pagf_agflreset); >> be32_add_cpu(&agf->agf_flcount, 1); >> xfs_trans_agflist_delta(tp, 1); >> pag->pagf_flcount++; >> @@ -2597,6 +2692,7 @@ xfs_alloc_read_agf( >> pag->pagb_count = 0; >> pag->pagb_tree = RB_ROOT; >> pag->pagf_init = 1; >> + pag->pagf_agflreset = xfs_agfl_need_reset(mp, agf); >> } >> #ifdef DEBUG >> else if (!XFS_FORCED_SHUTDOWN(mp)) { >> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h >> index 1808f56decaa..10b90bbc5162 100644 >> --- a/fs/xfs/xfs_mount.h >> +++ b/fs/xfs/xfs_mount.h >> @@ -353,6 +353,7 @@ typedef struct xfs_perag { >> char pagi_inodeok; /* The agi is ok for inodes */ >> uint8_t pagf_levels[XFS_BTNUM_AGF]; >> /* # of levels in bno & cnt btree */ >> + bool pagf_agflreset; /* agfl requires reset before use */ >> uint32_t pagf_flcount; /* count of blocks in freelist */ >> xfs_extlen_t pagf_freeblks; /* total free blocks */ >> xfs_extlen_t pagf_longest; /* longest free space */ >> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h >> index 945de08af7ba..a982c0b623d0 100644 >> --- a/fs/xfs/xfs_trace.h >> +++ b/fs/xfs/xfs_trace.h >> @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim, >> __entry->tlen) >> ); >> >> -TRACE_EVENT(xfs_agf, >> +DECLARE_EVENT_CLASS(xfs_agf_class, >> TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, >> unsigned long caller_ip), >> TP_ARGS(mp, agf, flags, caller_ip), >> @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf, >> __entry->longest, >> (void *)__entry->caller_ip) >> ); >> +#define DEFINE_AGF_EVENT(name) \ >> +DEFINE_EVENT(xfs_agf_class, name, \ >> + TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \ >> + unsigned long caller_ip), \ >> + TP_ARGS(mp, agf, flags, caller_ip)) >> +DEFINE_AGF_EVENT(xfs_agf); >> +DEFINE_AGF_EVENT(xfs_agfl_reset); >> >> TRACE_EVENT(xfs_free_extent, >> TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno, >> -- >> 2.13.6 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com> I'm also assuming this will get submitted back to the linux-stable trees as the agfl packing change is already causing issues in the stable trees. If you do not intend to push it into the linux-stable trees let me know and I'll take care of at least the major ones. Thanks, Dave -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote: > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong > <darrick.wong@oracle.com> wrote: > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote: ... > > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com> > > I'm also assuming this will get submitted back to the linux-stable > trees as the agfl packing change is already causing issues in the > stable trees. If you do not intend to push it into the linux-stable > trees let me know and I'll take care of at least the major ones. > Yeah, I can cc stable in the next post along with the other minor fixes. My question is how far back should this fix go? Was the plan to only go back to v4.5 because that is where the packing fix first went in? Or should this go back further because it looks like the packing fix was backported to v3.10: $ git show 96f859d52bcb1 commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30 Author: Darrick J. Wong <darrick.wong@oracle.com> Date: Mon Jan 4 16:13:21 2016 +1100 libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct ... cc: <stable@vger.kernel.org> # 3.10 - 4.4 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Brian > Thanks, > Dave > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 15, 2018 at 06:38:39AM -0400, Brian Foster wrote: > On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote: > > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong > > <darrick.wong@oracle.com> wrote: > > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote: > ... > > > > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com> > > > > I'm also assuming this will get submitted back to the linux-stable > > trees as the agfl packing change is already causing issues in the > > stable trees. If you do not intend to push it into the linux-stable > > trees let me know and I'll take care of at least the major ones. > > > > Yeah, I can cc stable in the next post along with the other minor fixes. > My question is how far back should this fix go? Was the plan to only go > back to v4.5 because that is where the packing fix first went in? Or > should this go back further because it looks like the packing fix was > backported to v3.10: > > $ git show 96f859d52bcb1 > commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30 > Author: Darrick J. Wong <darrick.wong@oracle.com> > Date: Mon Jan 4 16:13:21 2016 +1100 > > libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct > > ... > > cc: <stable@vger.kernel.org> # 3.10 - 4.4 > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > Reviewed-by: Dave Chinner <dchinner@redhat.com> > Signed-off-by: Dave Chinner <david@fromorbit.com> Hmmm, I'm assuming that you'd want 3.10 at least for RHEL, but I'll let you all figure that one out. As far as the upstream kernels, 4.14.27, 4.9.87, 4.4.121, and 4.1.50 have that packing patch so I guess they'll all need some version of this. --D > > Brian > > > Thanks, > > Dave > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 15, 2018 at 10:46 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote: > On Thu, Mar 15, 2018 at 06:38:39AM -0400, Brian Foster wrote: >> On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote: >> > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong >> > <darrick.wong@oracle.com> wrote: >> > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote: >> ... >> > >> > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com> >> > >> > I'm also assuming this will get submitted back to the linux-stable >> > trees as the agfl packing change is already causing issues in the >> > stable trees. If you do not intend to push it into the linux-stable >> > trees let me know and I'll take care of at least the major ones. >> > >> >> Yeah, I can cc stable in the next post along with the other minor fixes. >> My question is how far back should this fix go? Was the plan to only go >> back to v4.5 because that is where the packing fix first went in? Or >> should this go back further because it looks like the packing fix was >> backported to v3.10: >> >> $ git show 96f859d52bcb1 >> commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30 >> Author: Darrick J. Wong <darrick.wong@oracle.com> >> Date: Mon Jan 4 16:13:21 2016 +1100 >> >> libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct >> >> ... >> >> cc: <stable@vger.kernel.org> # 3.10 - 4.4 >> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> >> Reviewed-by: Dave Chinner <dchinner@redhat.com> >> Signed-off-by: Dave Chinner <david@fromorbit.com> > > Hmmm, I'm assuming that you'd want 3.10 at least for RHEL, but I'll let > you all figure that one out. > > As far as the upstream kernels, 4.14.27, 4.9.87, 4.4.121, and 4.1.50 > have that packing patch so I guess they'll all need some version of this. > > --D > >> >> Brian >> >> > Thanks, >> > Dave >> > -- RHEL is actually fine for now, since they explicitly remove the packing patch in their kernel, and xfsprogs. Once you submit the patches to linux-stable the ubuntu-kernel team monitors and includes patches for the releases that they are stable maintainers of *(they are downstream for 4.4 of gregkh, but currently maintain a 3.13, 4.13, and 4.15 tree). Also please add a Fixes line to your commit so it's obvious what patch it helps remediate. Fixes is actually not a great word here, but that looks to be what the submitting-patches.txt doc calls for. Fixes: 96f859d52bcb libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct This way stable maintainers understand that the fix resolves an issue that was introduced by that patch, and can apply/not apply appropriately. Although in all honesty the patch really applies to all stable kernels regardless of if the packing patch has been applied or not. Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 15, 2018 at 06:38:39AM -0400, Brian Foster wrote: > On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote: > > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong > > <darrick.wong@oracle.com> wrote: > > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote: > ... > > > > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com> > > > > I'm also assuming this will get submitted back to the linux-stable > > trees as the agfl packing change is already causing issues in the > > stable trees. If you do not intend to push it into the linux-stable > > trees let me know and I'll take care of at least the major ones. > > > > Yeah, I can cc stable in the next post along with the other minor fixes. > My question is how far back should this fix go? Was the plan to only go > back to v4.5 because that is where the packing fix first went in? Or > should this go back further because it looks like the packing fix was > backported to v3.10: > > $ git show 96f859d52bcb1 > commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30 > Author: Darrick J. Wong <darrick.wong@oracle.com> > Date: Mon Jan 4 16:13:21 2016 +1100 > > libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct > > ... > > cc: <stable@vger.kernel.org> # 3.10 - 4.4 > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > Reviewed-by: Dave Chinner <dchinner@redhat.com> > Signed-off-by: Dave Chinner <david@fromorbit.com> 3.10 was when the problem was first introduced. I have no idea whether it got backported that far but the stable kernel maintainers, so you'll have to manually audit all current long-term stable kernels to determine what kernels need backports. Cheers, Dave.
On Thu, Mar 15, 2018 at 11:27:02AM -0500, Dave Chiluk wrote: > On Thu, Mar 15, 2018 at 10:46 AM, Darrick J. Wong > <darrick.wong@oracle.com> wrote: > > On Thu, Mar 15, 2018 at 06:38:39AM -0400, Brian Foster wrote: > >> On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote: > >> > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong > >> > <darrick.wong@oracle.com> wrote: > >> > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote: > >> ... > >> > > >> > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com> > >> > > >> > I'm also assuming this will get submitted back to the linux-stable > >> > trees as the agfl packing change is already causing issues in the > >> > stable trees. If you do not intend to push it into the linux-stable > >> > trees let me know and I'll take care of at least the major ones. > >> > > >> > >> Yeah, I can cc stable in the next post along with the other minor fixes. > >> My question is how far back should this fix go? Was the plan to only go > >> back to v4.5 because that is where the packing fix first went in? Or > >> should this go back further because it looks like the packing fix was > >> backported to v3.10: > >> > >> $ git show 96f859d52bcb1 > >> commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30 > >> Author: Darrick J. Wong <darrick.wong@oracle.com> > >> Date: Mon Jan 4 16:13:21 2016 +1100 > >> > >> libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct > >> > >> ... > >> > >> cc: <stable@vger.kernel.org> # 3.10 - 4.4 > >> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > >> Reviewed-by: Dave Chinner <dchinner@redhat.com> > >> Signed-off-by: Dave Chinner <david@fromorbit.com> > > > > Hmmm, I'm assuming that you'd want 3.10 at least for RHEL, but I'll let > > you all figure that one out. > > > > As far as the upstream kernels, 4.14.27, 4.9.87, 4.4.121, and 4.1.50 > > have that packing patch so I guess they'll all need some version of this. > > > > --D > > > >> > >> Brian > >> > >> > Thanks, > >> > Dave > >> > -- > > RHEL is actually fine for now, since they explicitly remove the > packing patch in their kernel, and xfsprogs. Once you submit the > patches to linux-stable the ubuntu-kernel team monitors and includes > patches for the releases that they are stable maintainers of *(they > are downstream for 4.4 of gregkh, but currently maintain a 3.13, 4.13, > and 4.15 tree). > > Also please add a Fixes line to your commit so it's obvious what patch > it helps remediate. Fixes is actually not a great word here, but that > looks to be what the submitting-patches.txt doc calls for. > > Fixes: 96f859d52bcb libxfs: pack the agfl header structure so > XFS_AGFL_SIZE is correct No, please don't dumb down a complex issue to a simple, naive metadata tag like this. Explain the issue fully in the commit message, mentioning/referencing commits when appropriate. As I've mentioned in another thread recently about backports - it you are relying on "fixes" tags to determine what needs backporting, your backporting process is fundamentally broken. I don't care what the kernel documentatin says - it frequently does not apply because it's written by someone else for their own reasons and requirements that aren't relevant to us. They are guidelines, not rules, for that reason. > This way stable maintainers understand that the fix resolves an issue > that was introduced by that patch, and can apply/not apply > appropriately. I simply don't trust the stable process to get complex XFS backports right and correctly tested. e.g. We've had this problem before with things like error numbers changing sign @ 3.16 - patches from >3.16 were getting backported with negative errnos to kernels <3.16, and they were breaking because errors were not being correctly detected. Because nobody in the stable process was regression testing filesystem backports other than booting kernels, it wasn't until users installed and started reporting stable kernel regressions to us that we were able to identify the bugs and the process issues that caused them. Put simply: the stable kernel maintainers are not filesystem experts and they don't run filesystem regression tests to determine that the fixes don't have any unexpected side effects. What that means is that stable kernel backports need to be done under the eye of an XFS developer who then follows up by reviewing the backports once merged and running regression tests agtainst the resulting kernel as we cannot rely on the stable process to do this. It's a serious amount of work for something as critical as fixing an on-disk format problem, and we simply can't trust anyone else to do the job properly. Cheers, Dave.
On Fri, Mar 16, 2018 at 09:26:05AM +1100, Dave Chinner wrote: > On Thu, Mar 15, 2018 at 06:38:39AM -0400, Brian Foster wrote: > > On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote: > > > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong > > > <darrick.wong@oracle.com> wrote: > > > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote: > > ... > > > > > > Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com> > > > > > > I'm also assuming this will get submitted back to the linux-stable > > > trees as the agfl packing change is already causing issues in the > > > stable trees. If you do not intend to push it into the linux-stable > > > trees let me know and I'll take care of at least the major ones. > > > > > > > Yeah, I can cc stable in the next post along with the other minor fixes. > > My question is how far back should this fix go? Was the plan to only go > > back to v4.5 because that is where the packing fix first went in? Or > > should this go back further because it looks like the packing fix was > > backported to v3.10: > > > > $ git show 96f859d52bcb1 > > commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30 > > Author: Darrick J. Wong <darrick.wong@oracle.com> > > Date: Mon Jan 4 16:13:21 2016 +1100 > > > > libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct > > > > ... > > > > cc: <stable@vger.kernel.org> # 3.10 - 4.4 > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > Signed-off-by: Dave Chinner <david@fromorbit.com> > > 3.10 was when the problem was first introduced. I have no idea > whether it got backported that far but the stable kernel > maintainers, so you'll have to manually audit all current long-term > stable kernels to determine what kernels need backports. > The earliest I saw it backported was to 3.16, which looks like Luis had perhaps done that one. Otherwise, the kernels Darrick pointed out seem to have the fix. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 3db90b707fb2..7dfec92f28dc 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2063,6 +2063,95 @@ xfs_alloc_space_available( } /* + * Check the agfl fields of the agf for inconsistency or corruption. The purpose + * is to detect an agfl header padding mismatch between current and early v5 + * kernels. This problem manifests as a 1-slot size difference between the + * on-disk flcount and the active [first, last] range of a wrapped agfl. This + * may also catch variants of agfl count corruption unrelated to padding. Either + * way, we'll reset the agfl and warn the user. + * + * Return true if a reset is required before the agfl can be used, false + * otherwise. + */ +static bool +xfs_agfl_need_reset( + struct xfs_mount *mp, + struct xfs_agf *agf) +{ + uint32_t f = be32_to_cpu(agf->agf_flfirst); + uint32_t l = be32_to_cpu(agf->agf_fllast); + uint32_t c = be32_to_cpu(agf->agf_flcount); + int agfl_size = xfs_agfl_size(mp); + int active; + + /* no agfl header on v4 supers */ + if (!xfs_sb_version_hascrc(&mp->m_sb)) + return false; + + /* + * The agf read verifier catches severe corruption of these fields. + * Repeat some sanity checks to cover a packed -> unpacked mismatch if + * the verifier allows it. + */ + if (f >= agfl_size || l >= agfl_size) + return true; + if (c > agfl_size) + return true; + + /* + * Check consistency between the on-disk count and the active range. An + * agfl padding mismatch manifests as an inconsistent flcount. + */ + if (c && l >= f) + active = l - f + 1; + else if (c) + active = agfl_size - f + l + 1; + else + active = 0; + if (active != c) + return true; + + return false; +} + +/* + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the + * agfl content cannot be trusted. Warn the user that a repair is required to + * recover leaked blocks. + * + * The purpose of this mechanism is to handle filesystems affected by the agfl + * header padding mismatch problem. A reset keeps the filesystem online with a + * relatively minor free space accounting inconsistency rather than suffer the + * inevitable crash from use of an invalid agfl block. + */ +static void +xfs_agfl_reset( + struct xfs_trans *tp, + struct xfs_buf *agbp, + struct xfs_perag *pag) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp); + + ASSERT(pag->pagf_agflreset); + trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_); + + xfs_warn(mp, + "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. " + "Please unmount and run xfs_repair.", + pag->pag_agno, pag->pagf_flcount); + + agf->agf_flfirst = 0; + agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1); + agf->agf_flcount = 0; + xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST | + XFS_AGF_FLCOUNT); + + pag->pagf_flcount = 0; + pag->pagf_agflreset = false; +} + +/* * Decide whether to use this allocation group for this allocation. * If so, fix up the btree freelist's size. */ @@ -2123,6 +2212,10 @@ xfs_alloc_fix_freelist( } } + /* reset a padding mismatched agfl before final free space check */ + if (pag->pagf_agflreset) + xfs_agfl_reset(tp, agbp, pag); + /* If there isn't enough total space or single-extent, reject it. */ need = xfs_alloc_min_freelist(mp, pag); if (!xfs_alloc_space_available(args, need, flags)) @@ -2279,6 +2372,7 @@ xfs_alloc_get_freelist( agf->agf_flfirst = 0; pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno)); + ASSERT(!pag->pagf_agflreset); be32_add_cpu(&agf->agf_flcount, -1); xfs_trans_agflist_delta(tp, -1); pag->pagf_flcount--; @@ -2390,6 +2484,7 @@ xfs_alloc_put_freelist( agf->agf_fllast = 0; pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno)); + ASSERT(!pag->pagf_agflreset); be32_add_cpu(&agf->agf_flcount, 1); xfs_trans_agflist_delta(tp, 1); pag->pagf_flcount++; @@ -2597,6 +2692,7 @@ xfs_alloc_read_agf( pag->pagb_count = 0; pag->pagb_tree = RB_ROOT; pag->pagf_init = 1; + pag->pagf_agflreset = xfs_agfl_need_reset(mp, agf); } #ifdef DEBUG else if (!XFS_FORCED_SHUTDOWN(mp)) { diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 1808f56decaa..10b90bbc5162 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -353,6 +353,7 @@ typedef struct xfs_perag { char pagi_inodeok; /* The agi is ok for inodes */ uint8_t pagf_levels[XFS_BTNUM_AGF]; /* # of levels in bno & cnt btree */ + bool pagf_agflreset; /* agfl requires reset before use */ uint32_t pagf_flcount; /* count of blocks in freelist */ xfs_extlen_t pagf_freeblks; /* total free blocks */ xfs_extlen_t pagf_longest; /* longest free space */ diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 945de08af7ba..a982c0b623d0 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim, __entry->tlen) ); -TRACE_EVENT(xfs_agf, +DECLARE_EVENT_CLASS(xfs_agf_class, TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, unsigned long caller_ip), TP_ARGS(mp, agf, flags, caller_ip), @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf, __entry->longest, (void *)__entry->caller_ip) ); +#define DEFINE_AGF_EVENT(name) \ +DEFINE_EVENT(xfs_agf_class, name, \ + TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \ + unsigned long caller_ip), \ + TP_ARGS(mp, agf, flags, caller_ip)) +DEFINE_AGF_EVENT(xfs_agf); +DEFINE_AGF_EVENT(xfs_agfl_reset); TRACE_EVENT(xfs_free_extent, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
The struct xfs_agfl v5 header was originally introduced with unexpected padding that caused the AGFL to operate with one less slot than intended. The header has since been packed, but the fix left an incompatibility for users who upgrade from an old kernel with the unpacked header to a newer kernel with the packed header while the AGFL happens to wrap around the end. The newer kernel recognizes one extra slot at the physical end of the AGFL that the previous kernel did not. The new kernel will eventually attempt to allocate a block from that slot, which contains invalid data, and cause a crash. This condition can be detected by comparing the active range of the AGFL to the count. While this detects a padding mismatch, it can also trigger false positives for unrelated flcount corruption. Since we cannot distinguish a size mismatch due to padding from unrelated corruption, we can't trust the AGFL enough to simply repopulate the empty slot. Instead, avoid unnecessarily complex detection logic and and use a solution that can handle any form of flcount corruption that slips through read verifiers: distrust the entire AGFL and reset it to an empty state. Any valid blocks within the AGFL are intentionally leaked. This requires xfs_repair to rectify (which was already necessary based on the state the AGFL was found in). The reset mitigates the side effect of the padding mismatch problem from a filesystem crash to a free space accounting inconsistency. The generic approach also means that this patch can be safely backported to kernels with or without a packed struct xfs_agfl. Check the AGF for an invalid freelist count on initial read from disk. If detected, set a flag on the xfs_perag to indicate that a reset is required before the AGFL can be used. In the first transaction that attempts to use a flagged AGFL, reset it to empty, warn the user about the inconsistency and allow the freelist fixup code to repopulate the AGFL with new blocks. The xfs_perag flag is cleared to eliminate the need for repeated checks on each block allocation operation. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Brian Foster <bfoster@redhat.com> --- Hi all, Here's an actual patch for the AGFL reset approach. This survives a normal xfstests run. I also ran some quick xfstests runs with hacks in place to unconditionally trigger an agfl reset on first read. This caused a ton of test failures due to the resulting accounting inconsistency on the scratch device, but otherwise it survived from a torture test perspective and didn't introduce any unrelated regressions that I could see. Thoughts, reviews, flames appreciated. Brian v1: - Use a simplified AGFL reset mechanism. - Trigger on AGFL fixup rather than get/put. - Various refactors, cleanups and comments. rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2 fs/xfs/libxfs/xfs_alloc.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_mount.h | 1 + fs/xfs/xfs_trace.h | 9 ++++- 3 files changed, 105 insertions(+), 1 deletion(-)