diff mbox series

[15/14] xfs: capture inode generation numbers in the ondisk exchmaps log item

Message ID 20240410000528.GR6390@frogsfrogsfrogs (mailing list archive)
State New, archived
Headers show
Series [01/14] vfs: export remap and write check helpers | expand

Commit Message

Darrick J. Wong April 10, 2024, 12:05 a.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Per some very late review comments, capture the generation numbers of
both inodes involved in a file content exchange operation so that we
don't accidentally target files with have been reallocated.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
I'm throwing this one on the pile since I guess it's not so hard to add
the generation number to a brand new log item.
---
 fs/xfs/libxfs/xfs_log_format.h |    2 ++
 fs/xfs/xfs_exchmaps_item.c     |   12 ++++++++++++
 2 files changed, 14 insertions(+)

Comments

Christoph Hellwig April 10, 2024, 4 a.m. UTC | #1
On Tue, Apr 09, 2024 at 05:05:28PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Per some very late review comments, capture the generation numbers of
> both inodes involved in a file content exchange operation so that we
> don't accidentally target files with have been reallocated.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> I'm throwing this one on the pile since I guess it's not so hard to add
> the generation number to a brand new log item.

It does looks fine to me, but it leaves the question open:  why here
and not elsewhere.  And the answer based on the previous discussions
is that this is the first new log item after the problem was known
and we'll need to eventually rev the other ino based items as well.
Maybe capture this in a comment?
Darrick J. Wong April 10, 2024, 6:39 p.m. UTC | #2
On Wed, Apr 10, 2024 at 06:00:58AM +0200, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 05:05:28PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Per some very late review comments, capture the generation numbers of
> > both inodes involved in a file content exchange operation so that we
> > don't accidentally target files with have been reallocated.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > I'm throwing this one on the pile since I guess it's not so hard to add
> > the generation number to a brand new log item.
> 
> It does looks fine to me, but it leaves the question open:  why here
> and not elsewhere.  And the answer based on the previous discussions
> is that this is the first new log item after the problem was known
> and we'll need to eventually rev the other ino based items as well.
> Maybe capture this in a comment?

	/*
	 * This log intent item targets inodes, which means that it effectively
	 * contains a file handle.  Check that the generation numbers match the
	 * intent item like we do for other file handles.  This is the first
	 * new log intent item to be defined after this validation weakness was
	 * identified, which is why recovery for other items do not check this.
	 */

How about that?

--D
Christoph Hellwig April 11, 2024, 3:25 a.m. UTC | #3
On Wed, Apr 10, 2024 at 11:39:31AM -0700, Darrick J. Wong wrote:
> 	/*
> 	 * This log intent item targets inodes, which means that it effectively
> 	 * contains a file handle.  Check that the generation numbers match the
> 	 * intent item like we do for other file handles.  This is the first
> 	 * new log intent item to be defined after this validation weakness was
> 	 * identified, which is why recovery for other items do not check this.
> 	 */
> 
> How about that?

Sounds good.
Dave Chinner April 17, 2024, 11:42 p.m. UTC | #4
On Tue, Apr 09, 2024 at 05:05:28PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Per some very late review comments, capture the generation numbers of
> both inodes involved in a file content exchange operation so that we
> don't accidentally target files with have been reallocated.

<sigh>

That's a really poor commit message, and that's ignoring the fact
the change is actually unnecessary.

The issue raised during review didn't need code to be added - it was
a question about inode lifecycles and interactions with user driven
intent chains.  Instead of discussing and working through the issue
raised to determine if it was a real issue or not, you immediately
assumed everything had to change everywhere and started changing
code. Then it turned into a hot mess and you started ranting and
lecturing people about how you do want critical reviews of this
code, and now we have this completely unexplained, unnecessary patch
in the series.

Yes, I did perform a followup investigation to that was needed
to answer the question I had posed during review. The question was
whether the intent recovery at the end of replay is subject to inode
life-cycle events during the post-intent, pre-done portion of
recovery.

Fundamentally, intent chains run in a context that holds an inode
reference aren't subject to inode life cycle issues and so we don't
need the generation number in the intent to identify the inode. I'd
largely forgotten all this because I haven't looked at BUIs and
intent extent maps for a -long- time and so I forgot all about the
inode numbers they encode and the reasons they don't need generation
numbers.

i.e. because we can't free an inode while there is an open,
unresolved intent chain running, there can't be any life cycle
issues with inode numbers in the journal. In the case of exchange:

- exchange is done with a reference to the inode via open file
  descriptors.
- the ofds cannot be released until the exchange operation returns to
  userspace.
- the last reference to the inode is therefore held until after the
  entire intent chain is committed to the journal.
- therefore, inode freeing can only occur after the exchange returns
  to userspace and so can only occur in the journal -after- the
  intent chain is complete in the journal.

Therefore: if the intent chain in the journal is not complete
we are guaranteed that the inode in the exchange items is live and
valid in the filesytem and the intent chain is acting on the current
lifecycle instance of the inode.

So, yeah, we don't need inode generation numbers in intent items
that are acting on an inode, and we probably should document that
somewhere so we don't forget about it again...

-Dave.
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 8dbe1f997dfd5..accba2acd623d 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -896,6 +896,8 @@  struct xfs_xmi_log_format {
 
 	uint64_t		xmi_inode1;	/* inumber of first file */
 	uint64_t		xmi_inode2;	/* inumber of second file */
+	uint32_t		xmi_igen1;	/* generation of first file */
+	uint32_t		xmi_igen2;	/* generation of second file */
 	uint64_t		xmi_startoff1;	/* block offset into file1 */
 	uint64_t		xmi_startoff2;	/* block offset into file2 */
 	uint64_t		xmi_blockcount;	/* number of blocks */
diff --git a/fs/xfs/xfs_exchmaps_item.c b/fs/xfs/xfs_exchmaps_item.c
index a40216f33214c..3c4bb9601c3e0 100644
--- a/fs/xfs/xfs_exchmaps_item.c
+++ b/fs/xfs/xfs_exchmaps_item.c
@@ -231,7 +231,9 @@  xfs_exchmaps_create_intent(
 	xlf = &xmi_lip->xmi_format;
 
 	xlf->xmi_inode1 = xmi->xmi_ip1->i_ino;
+	xlf->xmi_igen1 = VFS_I(xmi->xmi_ip1)->i_generation;
 	xlf->xmi_inode2 = xmi->xmi_ip2->i_ino;
+	xlf->xmi_igen2 = VFS_I(xmi->xmi_ip2)->i_generation;
 	xlf->xmi_startoff1 = xmi->xmi_startoff1;
 	xlf->xmi_startoff2 = xmi->xmi_startoff2;
 	xlf->xmi_blockcount = xmi->xmi_blockcount;
@@ -377,6 +379,14 @@  xfs_xmi_item_recover_intent(
 	if (error)
 		goto err_rele1;
 
+	if (VFS_I(ip1)->i_generation != xlf->xmi_igen1 ||
+	    VFS_I(ip2)->i_generation != xlf->xmi_igen2) {
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				xlf, sizeof(*xlf));
+		error = -EFSCORRUPTED;
+		goto err_rele2;
+	}
+
 	req->ip1 = ip1;
 	req->ip2 = ip2;
 	req->startoff1 = xlf->xmi_startoff1;
@@ -485,6 +495,8 @@  xfs_exchmaps_relog_intent(
 
 	new_xlf->xmi_inode1	= old_xlf->xmi_inode1;
 	new_xlf->xmi_inode2	= old_xlf->xmi_inode2;
+	new_xlf->xmi_igen1	= old_xlf->xmi_igen1;
+	new_xlf->xmi_igen2	= old_xlf->xmi_igen2;
 	new_xlf->xmi_startoff1	= old_xlf->xmi_startoff1;
 	new_xlf->xmi_startoff2	= old_xlf->xmi_startoff2;
 	new_xlf->xmi_blockcount	= old_xlf->xmi_blockcount;