Message ID | 20240410000528.GR6390@frogsfrogsfrogs (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/14] vfs: export remap and write check helpers | expand |
On Tue, Apr 09, 2024 at 05:05:28PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > Per some very late review comments, capture the generation numbers of > both inodes involved in a file content exchange operation so that we > don't accidentally target files with have been reallocated. > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > --- > I'm throwing this one on the pile since I guess it's not so hard to add > the generation number to a brand new log item. It does looks fine to me, but it leaves the question open: why here and not elsewhere. And the answer based on the previous discussions is that this is the first new log item after the problem was known and we'll need to eventually rev the other ino based items as well. Maybe capture this in a comment?
On Wed, Apr 10, 2024 at 06:00:58AM +0200, Christoph Hellwig wrote: > On Tue, Apr 09, 2024 at 05:05:28PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > > > Per some very late review comments, capture the generation numbers of > > both inodes involved in a file content exchange operation so that we > > don't accidentally target files with have been reallocated. > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > --- > > I'm throwing this one on the pile since I guess it's not so hard to add > > the generation number to a brand new log item. > > It does looks fine to me, but it leaves the question open: why here > and not elsewhere. And the answer based on the previous discussions > is that this is the first new log item after the problem was known > and we'll need to eventually rev the other ino based items as well. > Maybe capture this in a comment? /* * This log intent item targets inodes, which means that it effectively * contains a file handle. Check that the generation numbers match the * intent item like we do for other file handles. This is the first * new log intent item to be defined after this validation weakness was * identified, which is why recovery for other items do not check this. */ How about that? --D
On Wed, Apr 10, 2024 at 11:39:31AM -0700, Darrick J. Wong wrote: > /* > * This log intent item targets inodes, which means that it effectively > * contains a file handle. Check that the generation numbers match the > * intent item like we do for other file handles. This is the first > * new log intent item to be defined after this validation weakness was > * identified, which is why recovery for other items do not check this. > */ > > How about that? Sounds good.
On Tue, Apr 09, 2024 at 05:05:28PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > Per some very late review comments, capture the generation numbers of > both inodes involved in a file content exchange operation so that we > don't accidentally target files with have been reallocated. <sigh> That's a really poor commit message, and that's ignoring the fact the change is actually unnecessary. The issue raised during review didn't need code to be added - it was a question about inode lifecycles and interactions with user driven intent chains. Instead of discussing and working through the issue raised to determine if it was a real issue or not, you immediately assumed everything had to change everywhere and started changing code. Then it turned into a hot mess and you started ranting and lecturing people about how you do want critical reviews of this code, and now we have this completely unexplained, unnecessary patch in the series. Yes, I did perform a followup investigation to that was needed to answer the question I had posed during review. The question was whether the intent recovery at the end of replay is subject to inode life-cycle events during the post-intent, pre-done portion of recovery. Fundamentally, intent chains run in a context that holds an inode reference aren't subject to inode life cycle issues and so we don't need the generation number in the intent to identify the inode. I'd largely forgotten all this because I haven't looked at BUIs and intent extent maps for a -long- time and so I forgot all about the inode numbers they encode and the reasons they don't need generation numbers. i.e. because we can't free an inode while there is an open, unresolved intent chain running, there can't be any life cycle issues with inode numbers in the journal. In the case of exchange: - exchange is done with a reference to the inode via open file descriptors. - the ofds cannot be released until the exchange operation returns to userspace. - the last reference to the inode is therefore held until after the entire intent chain is committed to the journal. - therefore, inode freeing can only occur after the exchange returns to userspace and so can only occur in the journal -after- the intent chain is complete in the journal. Therefore: if the intent chain in the journal is not complete we are guaranteed that the inode in the exchange items is live and valid in the filesytem and the intent chain is acting on the current lifecycle instance of the inode. So, yeah, we don't need inode generation numbers in intent items that are acting on an inode, and we probably should document that somewhere so we don't forget about it again... -Dave.
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index 8dbe1f997dfd5..accba2acd623d 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -896,6 +896,8 @@ struct xfs_xmi_log_format { uint64_t xmi_inode1; /* inumber of first file */ uint64_t xmi_inode2; /* inumber of second file */ + uint32_t xmi_igen1; /* generation of first file */ + uint32_t xmi_igen2; /* generation of second file */ uint64_t xmi_startoff1; /* block offset into file1 */ uint64_t xmi_startoff2; /* block offset into file2 */ uint64_t xmi_blockcount; /* number of blocks */ diff --git a/fs/xfs/xfs_exchmaps_item.c b/fs/xfs/xfs_exchmaps_item.c index a40216f33214c..3c4bb9601c3e0 100644 --- a/fs/xfs/xfs_exchmaps_item.c +++ b/fs/xfs/xfs_exchmaps_item.c @@ -231,7 +231,9 @@ xfs_exchmaps_create_intent( xlf = &xmi_lip->xmi_format; xlf->xmi_inode1 = xmi->xmi_ip1->i_ino; + xlf->xmi_igen1 = VFS_I(xmi->xmi_ip1)->i_generation; xlf->xmi_inode2 = xmi->xmi_ip2->i_ino; + xlf->xmi_igen2 = VFS_I(xmi->xmi_ip2)->i_generation; xlf->xmi_startoff1 = xmi->xmi_startoff1; xlf->xmi_startoff2 = xmi->xmi_startoff2; xlf->xmi_blockcount = xmi->xmi_blockcount; @@ -377,6 +379,14 @@ xfs_xmi_item_recover_intent( if (error) goto err_rele1; + if (VFS_I(ip1)->i_generation != xlf->xmi_igen1 || + VFS_I(ip2)->i_generation != xlf->xmi_igen2) { + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, + xlf, sizeof(*xlf)); + error = -EFSCORRUPTED; + goto err_rele2; + } + req->ip1 = ip1; req->ip2 = ip2; req->startoff1 = xlf->xmi_startoff1; @@ -485,6 +495,8 @@ xfs_exchmaps_relog_intent( new_xlf->xmi_inode1 = old_xlf->xmi_inode1; new_xlf->xmi_inode2 = old_xlf->xmi_inode2; + new_xlf->xmi_igen1 = old_xlf->xmi_igen1; + new_xlf->xmi_igen2 = old_xlf->xmi_igen2; new_xlf->xmi_startoff1 = old_xlf->xmi_startoff1; new_xlf->xmi_startoff2 = old_xlf->xmi_startoff2; new_xlf->xmi_blockcount = old_xlf->xmi_blockcount;