Message ID | 20240812052352.3786445-2-hch@lst.de (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/2] xfs: fix handling of RCU freed inodes from other AGs in xfs_icwalk_ag | expand |
On Mon, Aug 12, 2024 at 07:23:00AM +0200, Christoph Hellwig wrote: > When xfs_icwalk_ag skips an inode because it was RCU freed from another > AG, the slot for the inode in the batch array needs to be zeroed. We > also really shouldn't try to grab the inode in that case (or at very > least undo the grab), so move the call to xfs_icwalk_ag after this sanity > check. > > Fixes: 1a3e8f3da09c ("xfs: convert inode cache lookups to use RCU locking") > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/xfs/xfs_icache.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c > index ae3c049fd3a216..3ee92d3d1770db 100644 > --- a/fs/xfs/xfs_icache.c > +++ b/fs/xfs/xfs_icache.c > @@ -1701,9 +1701,6 @@ xfs_icwalk_ag( > for (i = 0; i < nr_found; i++) { > struct xfs_inode *ip = batch[i]; > > - if (done || !xfs_icwalk_igrab(goal, ip, icw)) > - batch[i] = NULL; > - > /* > * Update the index for the next lookup. Catch > * overflows into the next AG range which can occur if > @@ -1716,8 +1713,14 @@ xfs_icwalk_ag( > * us to see this inode, so another lookup from the > * same index will not find it again. > */ > - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) > + if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) { > + batch[i] = NULL; > continue; > + } > + > + if (done || !xfs_icwalk_igrab(goal, ip, icw)) > + batch[i] = NULL; IOWs, if @ip has been freed and reallocated to a different AG, then we don't want to touch it at all, not even to check IRECLAIMABLE in igrab. I think that sounds correct so Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > + > first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1); > if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino)) > done = true; > -- > 2.43.0 > >
On Mon, Aug 12, 2024 at 07:23:00AM +0200, Christoph Hellwig wrote: > When xfs_icwalk_ag skips an inode because it was RCU freed from another > AG, the slot for the inode in the batch array needs to be zeroed. We > also really shouldn't try to grab the inode in that case (or at very > least undo the grab), so move the call to xfs_icwalk_ag after this sanity > check. I think this change is invalid and needs to be reworked. It moves the actual "has the inode been RCU freed in this grace period" checks to after some other check on the inode is performed. From that perspective, this is a bad change to make. From another perspective, this is a poor change to make because.... > Fixes: 1a3e8f3da09c ("xfs: convert inode cache lookups to use RCU locking") > Signed-off-by: Christoph Hellwig <hch@lst.de> ... that commit and the current code isn't actually broken. In commit 1a3e8f3da09c, xfs_inode_ag_walk_grab() added explicit checks against using RCU freed inodes, and that nulls out the inode in the batch array: /* * check for stale RCU freed inode * * If the inode has been reallocated, it doesn't matter if it's not in * the AG we are walking - we are walking for writeback, so if it * passes all the "valid inode" checks and is dirty, then we'll write * it back anyway. If it has been reallocated and still being * initialised, the XFS_INEW check below will catch it. */ spin_lock(&ip->i_flags_lock); if (!ip->i_ino) goto out_unlock_noent; Indeed, xfs_icwalk_igrab() -> xfs_blockgc_igrab() still has this check: /* Check for stale RCU freed inode */ spin_lock(&ip->i_flags_lock); if (!ip->i_ino) goto out_unlock_noent; However, that commit used a different check for reclaimable inodes; it uses XFS_IRECLAIM to indicate that the inode is either being reclaimed or has been RCU freed and so reclaim should not touch it. xfs_reclaim_igrab() still retains that check: spin_lock(&ip->i_flags_lock); if (!__xfs_iflags_test(ip, XFS_IRECLAIMABLE) || __xfs_iflags_test(ip, XFS_IRECLAIM)) { /* not a reclaim candidate. */ spin_unlock(&ip->i_flags_lock); return false; } XFS_IRECLAIM is always set on RCU freed inodes (it's set at the same time ip->i_ino is set to zero in xfs_reclaim_inode()) before the inode is rcu freed, and so neither of these grab functions will allow RCU freed inodes to be grabbed. Hence: > --- > fs/xfs/xfs_icache.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c > index ae3c049fd3a216..3ee92d3d1770db 100644 > --- a/fs/xfs/xfs_icache.c > +++ b/fs/xfs/xfs_icache.c > @@ -1701,9 +1701,6 @@ xfs_icwalk_ag( > for (i = 0; i < nr_found; i++) { > struct xfs_inode *ip = batch[i]; > > - if (done || !xfs_icwalk_igrab(goal, ip, icw)) > - batch[i] = NULL; > - The existing code here -always- catches RCU freed inodes that have not yet been freed due to the grace period still running. This code replaces RCU freed inodes with NULL in the batch array, and so after this point the inode is guaranteed to be valid. > /* > * Update the index for the next lookup. Catch > * overflows into the next AG range which can occur if > @@ -1716,8 +1713,14 @@ xfs_icwalk_ag( > * us to see this inode, so another lookup from the > * same index will not find it again. > */ > - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) > + if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) { > + batch[i] = NULL; > continue; This check is not checking for RCU freed inodes - it was checking for indoes that are both freed and reallocated within a single RCU grace period. I added this check in 1a3e8f3da09c (backin 2010) because I wasn't 100% certain of the RCU existence guarantees and those interacted with read side critical sections. It was defensive, "just in case" code. Looking at this code now - with another 14 years of RCU algorithms experience under my belt - it is obvious that this code is unnecessary and always has been. From RCU first principles, we cannot see a new inode with a different inode number magically appear at this memory address during a rcu_read_lock() side critical period. Such behaviour would require the inode to be removed from the radix tree, freed and then reallocated and reinitialised to a different identity within a single RCU grace period. That is impossible because the object we found during the lookup won't get freed until the RCU grace period ends sometime after we drop the rcu_read_lock().... Hence the right thing to do here is to remove this check completely and leave the rest of the code alone.... -Dave.
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index ae3c049fd3a216..3ee92d3d1770db 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -1701,9 +1701,6 @@ xfs_icwalk_ag( for (i = 0; i < nr_found; i++) { struct xfs_inode *ip = batch[i]; - if (done || !xfs_icwalk_igrab(goal, ip, icw)) - batch[i] = NULL; - /* * Update the index for the next lookup. Catch * overflows into the next AG range which can occur if @@ -1716,8 +1713,14 @@ xfs_icwalk_ag( * us to see this inode, so another lookup from the * same index will not find it again. */ - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) + if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) { + batch[i] = NULL; continue; + } + + if (done || !xfs_icwalk_igrab(goal, ip, icw)) + batch[i] = NULL; + first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1); if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino)) done = true;
When xfs_icwalk_ag skips an inode because it was RCU freed from another AG, the slot for the inode in the batch array needs to be zeroed. We also really shouldn't try to grab the inode in that case (or at very least undo the grab), so move the call to xfs_icwalk_ag after this sanity check. Fixes: 1a3e8f3da09c ("xfs: convert inode cache lookups to use RCU locking") Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_icache.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)