[RFC] xfs: warn instead of fail verifier on empty attr3 leaf block
diff mbox series

Message ID 20200511185016.33684-1-bfoster@redhat.com
State New
Headers show
Series
  • [RFC] xfs: warn instead of fail verifier on empty attr3 leaf block
Related show

Commit Message

Brian Foster May 11, 2020, 6:50 p.m. UTC
Signed-off-by: Brian Foster <bfoster@redhat.com>
---

What do folks think of something like this? We have a user report of a
corresponding read verifier failure while processing unlinked inodes.
This presumably means the attr fork was put in this state because the
format conversion and xattr set are not atomic. For example, the
filesystem crashed after the format conversion transaction hit the log
but before the xattr set transaction. The subsequent recovery succeeds
according to the logic below, but if the attr didn't hit the log the
leaf block remains empty and sets a landmine for the next read attempt.
This either prevents further xattr operations on the inode or prevents
the inode from being removed from the unlinked list due to xattr
inactivation failure.

I've not confirmed that this is how the user got into this state, but
I've confirmed that it's possible. We have a couple band aids now (this
and the writeback variant) that intend to deal with this problem and
still haven't quite got it right, so personally I'm inclined to accept
the reality that an empty attr leaf block is an expected state based on
our current xattr implementation and just remove the check from the
verifier (at least until we have atomic sets). I turned it into a
warning/comment for the purpose of discussion. Thoughts?

Brian

 fs/xfs/libxfs/xfs_attr_leaf.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Comments

Christoph Hellwig May 12, 2020, 8:10 a.m. UTC | #1
On Mon, May 11, 2020 at 02:50:16PM -0400, Brian Foster wrote:
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> What do folks think of something like this? We have a user report of a
> corresponding read verifier failure while processing unlinked inodes.
> This presumably means the attr fork was put in this state because the
> format conversion and xattr set are not atomic. For example, the
> filesystem crashed after the format conversion transaction hit the log
> but before the xattr set transaction. The subsequent recovery succeeds
> according to the logic below, but if the attr didn't hit the log the
> leaf block remains empty and sets a landmine for the next read attempt.
> This either prevents further xattr operations on the inode or prevents
> the inode from being removed from the unlinked list due to xattr
> inactivation failure.
> 
> I've not confirmed that this is how the user got into this state, but
> I've confirmed that it's possible. We have a couple band aids now (this
> and the writeback variant) that intend to deal with this problem and
> still haven't quite got it right, so personally I'm inclined to accept
> the reality that an empty attr leaf block is an expected state based on
> our current xattr implementation and just remove the check from the
> verifier (at least until we have atomic sets). I turned it into a
> warning/comment for the purpose of discussion. Thoughts?

If the transaction is not atomic I don't think we should even
warn in this case, even if it is unlikely to happen..
Darrick J. Wong May 12, 2020, 3:53 p.m. UTC | #2
On Tue, May 12, 2020 at 01:10:37AM -0700, Christoph Hellwig wrote:
> On Mon, May 11, 2020 at 02:50:16PM -0400, Brian Foster wrote:
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > What do folks think of something like this? We have a user report of a
> > corresponding read verifier failure while processing unlinked inodes.
> > This presumably means the attr fork was put in this state because the
> > format conversion and xattr set are not atomic. For example, the
> > filesystem crashed after the format conversion transaction hit the log
> > but before the xattr set transaction. The subsequent recovery succeeds
> > according to the logic below, but if the attr didn't hit the log the
> > leaf block remains empty and sets a landmine for the next read attempt.
> > This either prevents further xattr operations on the inode or prevents
> > the inode from being removed from the unlinked list due to xattr
> > inactivation failure.
> > 
> > I've not confirmed that this is how the user got into this state, but
> > I've confirmed that it's possible. We have a couple band aids now (this
> > and the writeback variant) that intend to deal with this problem and
> > still haven't quite got it right, so personally I'm inclined to accept
> > the reality that an empty attr leaf block is an expected state based on
> > our current xattr implementation and just remove the check from the
> > verifier (at least until we have atomic sets). I turned it into a
> > warning/comment for the purpose of discussion. Thoughts?
> 
> If the transaction is not atomic I don't think we should even
> warn in this case, even if it is unlikely to happen..

I was gonna say, I think we've messed this up enough that I think we
just have to accept empty attr leaf blocks. :/

I also think we should improve the ability to scan for and invalidate
incore buffers so that we can invalidate and truncate the attr fork
extents directly from an extent walk loop.  It seems a little silly that
we have to walk the dabtree just to find out where multiblock remote
attr value structures might be hiding.

--D
Christoph Hellwig May 12, 2020, 4:03 p.m. UTC | #3
On Tue, May 12, 2020 at 08:53:20AM -0700, Darrick J. Wong wrote:
> I was gonna say, I think we've messed this up enough that I think we
> just have to accept empty attr leaf blocks. :/
> 
> I also think we should improve the ability to scan for and invalidate
> incore buffers so that we can invalidate and truncate the attr fork
> extents directly from an extent walk loop.  It seems a little silly that
> we have to walk the dabtree just to find out where multiblock remote
> attr value structures might be hiding.

The buffers are indexed by the physical block number.  Unless you
want to move to logical indexing that's what we'll need to do.
Darrick J. Wong May 12, 2020, 4:19 p.m. UTC | #4
On Tue, May 12, 2020 at 09:03:00AM -0700, Christoph Hellwig wrote:
> On Tue, May 12, 2020 at 08:53:20AM -0700, Darrick J. Wong wrote:
> > I was gonna say, I think we've messed this up enough that I think we
> > just have to accept empty attr leaf blocks. :/
> > 
> > I also think we should improve the ability to scan for and invalidate
> > incore buffers so that we can invalidate and truncate the attr fork
> > extents directly from an extent walk loop.  It seems a little silly that
> > we have to walk the dabtree just to find out where multiblock remote
> > attr value structures might be hiding.
> 
> The buffers are indexed by the physical block number.  Unless you
> want to move to logical indexing that's what we'll need to do.

<shrug> I modded xfs_buf_incore and _xfs_buf_obj_cmp to return an
xfs_buf that matches map->bm_bn regardless of length and it seems fine
so far...

--D
Brian Foster May 12, 2020, 5:20 p.m. UTC | #5
On Tue, May 12, 2020 at 08:53:20AM -0700, Darrick J. Wong wrote:
> On Tue, May 12, 2020 at 01:10:37AM -0700, Christoph Hellwig wrote:
> > On Mon, May 11, 2020 at 02:50:16PM -0400, Brian Foster wrote:
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > 
> > > What do folks think of something like this? We have a user report of a
> > > corresponding read verifier failure while processing unlinked inodes.
> > > This presumably means the attr fork was put in this state because the
> > > format conversion and xattr set are not atomic. For example, the
> > > filesystem crashed after the format conversion transaction hit the log
> > > but before the xattr set transaction. The subsequent recovery succeeds
> > > according to the logic below, but if the attr didn't hit the log the
> > > leaf block remains empty and sets a landmine for the next read attempt.
> > > This either prevents further xattr operations on the inode or prevents
> > > the inode from being removed from the unlinked list due to xattr
> > > inactivation failure.
> > > 
> > > I've not confirmed that this is how the user got into this state, but
> > > I've confirmed that it's possible. We have a couple band aids now (this
> > > and the writeback variant) that intend to deal with this problem and
> > > still haven't quite got it right, so personally I'm inclined to accept
> > > the reality that an empty attr leaf block is an expected state based on
> > > our current xattr implementation and just remove the check from the
> > > verifier (at least until we have atomic sets). I turned it into a
> > > warning/comment for the purpose of discussion. Thoughts?
> > 
> > If the transaction is not atomic I don't think we should even
> > warn in this case, even if it is unlikely to happen..
> 
> I was gonna say, I think we've messed this up enough that I think we
> just have to accept empty attr leaf blocks. :/
> 

That makes at least 3 votes (including me) to drop the check so I'll
send a real patch after some regression testing. Thanks.

Brian

> I also think we should improve the ability to scan for and invalidate
> incore buffers so that we can invalidate and truncate the attr fork
> extents directly from an extent walk loop.  It seems a little silly that
> we have to walk the dabtree just to find out where multiblock remote
> attr value structures might be hiding.
> 
> --D
>

Patch
diff mbox series

diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 863444e2dda7..71cee43669e1 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -309,12 +309,18 @@  xfs_attr3_leaf_verify(
 		return fa;
 
 	/*
-	 * In recovery there is a transient state where count == 0 is valid
-	 * because we may have transitioned an empty shortform attr to a leaf
-	 * if the attr didn't fit in shortform.
+	 * There is a valid count == 0 state if we transitioned an empty
+	 * shortform attr to leaf format because an attr didn't fit in
+	 * shortform. This is intended to transient during recovery, but in
+	 * reality is not because the attr comes in a separate transaction from
+	 * format conversion and may not have hit the log. Warn if we encounter
+	 * this outside of recovery just to inform the user something might be
+	 * off.
 	 */
 	if (!xfs_log_in_recovery(mp) && ichdr.count == 0)
-		return __this_address;
+		xfs_warn(mp,
+	"Empty attr leaf block (bno 0x%llx). attr fork in unexpected format\n",
+			 bp->b_bn);
 
 	/*
 	 * firstused is the block offset of the first name info structure.