diff mbox series

[2/7] xfs: shut down filesystem if we xfs_trans_cancel with deferred work items

Message ID 163961696648.3129691.5075630610079213754.stgit@magnolia (mailing list archive)
State Accepted
Headers show
Series xfs: random fixes for 5.17 | expand

Commit Message

Darrick J. Wong Dec. 16, 2021, 1:09 a.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

While debugging some very strange rmap corruption reports in connection
with the online directory repair code.  I root-caused the error to the
following incorrect sequence:

<start repair transaction>
<expand directory, causing a deferred rmap to be queued>
<roll transaction>
<cancel transaction>

Obviously, we should have committed the transaction instead of
cancelling it.  Thinking more broadly, however, xfs_trans_cancel should
have warned us that we were throwing away work item that we already
committed to performing.  This is not correct, and we need to shut down
the filesystem.

Change xfs_trans_cancel to complain in the loudest manner if we're
cancelling any transaction with deferred work items attached.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_trans.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Dave Chinner Dec. 16, 2021, 4:57 a.m. UTC | #1
On Wed, Dec 15, 2021 at 05:09:26PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> While debugging some very strange rmap corruption reports in connection
> with the online directory repair code.  I root-caused the error to the
> following incorrect sequence:
> 
> <start repair transaction>
> <expand directory, causing a deferred rmap to be queued>
> <roll transaction>
> <cancel transaction>
> 
> Obviously, we should have committed the transaction instead of
> cancelling it.  Thinking more broadly, however, xfs_trans_cancel should
> have warned us that we were throwing away work item that we already
> committed to performing.  This is not correct, and we need to shut down
> the filesystem.
> 
> Change xfs_trans_cancel to complain in the loudest manner if we're
> cancelling any transaction with deferred work items attached.

Yup, seems reasonable.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig Dec. 24, 2021, 7:16 a.m. UTC | #2
On Wed, Dec 15, 2021 at 05:09:26PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> While debugging some very strange rmap corruption reports in connection
> with the online directory repair code.  I root-caused the error to the
> following incorrect sequence:
> 
> <start repair transaction>
> <expand directory, causing a deferred rmap to be queued>
> <roll transaction>
> <cancel transaction>
> 
> Obviously, we should have committed the transaction instead of
> cancelling it.  Thinking more broadly, however, xfs_trans_cancel should
> have warned us that we were throwing away work item that we already
> committed to performing.  This is not correct, and we need to shut down
> the filesystem.
> 
> Change xfs_trans_cancel to complain in the loudest manner if we're
> cancelling any transaction with deferred work items attached.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 234a9d9c2f43..59e2f9031b9f 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -942,8 +942,17 @@  xfs_trans_cancel(
 
 	trace_xfs_trans_cancel(tp, _RET_IP_);
 
-	if (tp->t_flags & XFS_TRANS_PERM_LOG_RES)
+	/*
+	 * It's never valid to cancel a transaction with deferred ops attached,
+	 * because the transaction is effectively dirty.  Complain about this
+	 * loudly before freeing the in-memory defer items.
+	 */
+	if (!list_empty(&tp->t_dfops)) {
+		ASSERT(xfs_is_shutdown(mp) || list_empty(&tp->t_dfops));
+		ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
+		dirty = true;
 		xfs_defer_cancel(tp);
+	}
 
 	/*
 	 * See if the caller is relying on us to shut down the