Message ID | 166473482943.1084685.12751834399982118437.stgit@magnolia (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | xfs: fix iget/irele usage in online fsck | expand |
On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > Right now, there are statements scattered all over the online fsck > codebase about how we can't use XFS_IGET_DONTCACHE because of concerns > about scrub's unusual practice of releasing inodes with transactions > held. > > However, iget is the wrong place to handle this -- the DONTCACHE state > doesn't matter at all until we try to *release* the inode, and here we > get things wrong in multiple ways: > > First, if we /do/ have a transaction, we must NOT drop the inode, > because the inode could have dirty pages, dropping the inode will > trigger writeback, and writeback can trigger a nested transaction. > > Second, if the inode already had an active reference and the DONTCACHE > flag set, the icache hit when scrub grabs another ref will not clear > DONTCACHE. This is sort of by design, since DONTCACHE is now used to > initiate cache drops so that sysadmins can change a file's access mode > between pagecache and DAX. > > Third, if we do actually have the last active reference to the inode, we > can set DONTCACHE to avoid polluting the cache. This is the /one/ case > where we actually want that flag. > > Create an xchk_irele helper to encode all that logic and switch the > online fsck code to use it. Since this now means that nearly all > scrubbers use the same xfs_iget flags, we can wrap them too. > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> Ok, I can see what needs to be done here. It seems a bit fragile, but I don't see a better way at the moment. That said... > diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c > index ab182a5cd0c0..38ea04e66468 100644 > --- a/fs/xfs/scrub/parent.c > +++ b/fs/xfs/scrub/parent.c > @@ -131,7 +131,6 @@ xchk_parent_validate( > xfs_ino_t dnum, > bool *try_again) > { > - struct xfs_mount *mp = sc->mp; > struct xfs_inode *dp = NULL; > xfs_nlink_t expected_nlink; > xfs_nlink_t nlink; > @@ -168,7 +167,7 @@ xchk_parent_validate( > * -EFSCORRUPTED or -EFSBADCRC then the parent is corrupt which is a > * cross referencing error. Any other error is an operational error. > */ > - error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_UNTRUSTED, 0, &dp); > + error = xchk_iget(sc, dnum, &dp); > if (error == -EINVAL || error == -ENOENT) { > error = -EFSCORRUPTED; > xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error); > @@ -253,7 +252,7 @@ xchk_parent_validate( > out_unlock: > xfs_iunlock(dp, XFS_IOLOCK_SHARED); > out_rele: > - xfs_irele(dp); > + xchk_irele(sc, dp); > out: > return error; > } Didn't you miss a couple of cases here? THe current upstream code looks like: ....... 237 /* Drat, parent changed. Try again! */ 238 if (dnum != dp->i_ino) { 239 xfs_irele(dp); 240 *try_again = true; 241 return 0; 242 } 243 xfs_irele(dp); 244 245 /* 246 * '..' didn't change, so check that there was only one entry 247 * for us in the parent. 248 */ 249 if (nlink != expected_nlink) 250 xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0); 251 return error; 252 253 out_unlock: 254 xfs_iunlock(dp, XFS_IOLOCK_SHARED); 255 out_rele: 256 xfs_irele(dp); 257 out: 258 return error; 259 } So it looks like you missed the conversion at lines 239 and 243. Of course, these may have been removed in a prior patchset I've looked at and forgotten about, but on the surface this looks like missed conversions. Cheers, Dave.
On Tue, Nov 15, 2022 at 02:13:18PM +1100, Dave Chinner wrote: > On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > > > Right now, there are statements scattered all over the online fsck > > codebase about how we can't use XFS_IGET_DONTCACHE because of concerns > > about scrub's unusual practice of releasing inodes with transactions > > held. > > > > However, iget is the wrong place to handle this -- the DONTCACHE state > > doesn't matter at all until we try to *release* the inode, and here we > > get things wrong in multiple ways: > > > > First, if we /do/ have a transaction, we must NOT drop the inode, > > because the inode could have dirty pages, dropping the inode will > > trigger writeback, and writeback can trigger a nested transaction. > > > > Second, if the inode already had an active reference and the DONTCACHE > > flag set, the icache hit when scrub grabs another ref will not clear > > DONTCACHE. This is sort of by design, since DONTCACHE is now used to > > initiate cache drops so that sysadmins can change a file's access mode > > between pagecache and DAX. > > > > Third, if we do actually have the last active reference to the inode, we > > can set DONTCACHE to avoid polluting the cache. This is the /one/ case > > where we actually want that flag. > > > > Create an xchk_irele helper to encode all that logic and switch the > > online fsck code to use it. Since this now means that nearly all > > scrubbers use the same xfs_iget flags, we can wrap them too. > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > Ok, I can see what needs to be done here. It seems a bit fragile, > but I don't see a better way at the moment. > > That said... > > > diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c > > index ab182a5cd0c0..38ea04e66468 100644 > > --- a/fs/xfs/scrub/parent.c > > +++ b/fs/xfs/scrub/parent.c > > @@ -131,7 +131,6 @@ xchk_parent_validate( > > xfs_ino_t dnum, > > bool *try_again) > > { > > - struct xfs_mount *mp = sc->mp; > > struct xfs_inode *dp = NULL; > > xfs_nlink_t expected_nlink; > > xfs_nlink_t nlink; > > @@ -168,7 +167,7 @@ xchk_parent_validate( > > * -EFSCORRUPTED or -EFSBADCRC then the parent is corrupt which is a > > * cross referencing error. Any other error is an operational error. > > */ > > - error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_UNTRUSTED, 0, &dp); > > + error = xchk_iget(sc, dnum, &dp); > > if (error == -EINVAL || error == -ENOENT) { > > error = -EFSCORRUPTED; > > xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error); > > @@ -253,7 +252,7 @@ xchk_parent_validate( > > out_unlock: > > xfs_iunlock(dp, XFS_IOLOCK_SHARED); > > out_rele: > > - xfs_irele(dp); > > + xchk_irele(sc, dp); > > out: > > return error; > > } > > Didn't you miss a couple of cases here? THe current upstream code > looks like: > > ....... > 237 /* Drat, parent changed. Try again! */ > 238 if (dnum != dp->i_ino) { > 239 xfs_irele(dp); > 240 *try_again = true; > 241 return 0; > 242 } > 243 xfs_irele(dp); > 244 > 245 /* > 246 * '..' didn't change, so check that there was only one entry > 247 * for us in the parent. > 248 */ > 249 if (nlink != expected_nlink) > 250 xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0); > 251 return error; > 252 > 253 out_unlock: > 254 xfs_iunlock(dp, XFS_IOLOCK_SHARED); > 255 out_rele: > 256 xfs_irele(dp); > 257 out: > 258 return error; > 259 } > > So it looks like you missed the conversion at lines 239 and 243. Of > course, these may have been removed in a prior patchset I've looked > at and forgotten about, but on the surface this looks like missed > conversions. Actually, I probably missed it because one of the follow-on fixpatches in the v23.1 patchbomb removes it entirely: https://lore.kernel.org/linux-xfs/166473483278.1084804.14032671424392139245.stgit@magnolia/ --D > Cheers, > > Dave. > > -- > Dave Chinner > david@fromorbit.com
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 73ac38c1126e..42a25488bd25 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -710,6 +710,16 @@ xchk_checkpoint_log( return 0; } +/* Verify that an inode is allocated ondisk, then return its cached inode. */ +int +xchk_iget( + struct xfs_scrub *sc, + xfs_ino_t inum, + struct xfs_inode **ipp) +{ + return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp); +} + /* * Given an inode and the scrub control structure, grab either the * inode referenced in the control structure or the inode passed in. @@ -734,8 +744,7 @@ xchk_get_inode( /* Look up the inode, see if the generation number matches. */ if (xfs_internal_inum(mp, sc->sm->sm_ino)) return -ENOENT; - error = xfs_iget(mp, NULL, sc->sm->sm_ino, - XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &ip); + error = xchk_iget(sc, sc->sm->sm_ino, &ip); switch (error) { case -ENOENT: /* Inode doesn't exist, just bail out. */ @@ -757,7 +766,7 @@ xchk_get_inode( * that it no longer exists. */ error = xfs_imap(sc->mp, sc->tp, sc->sm->sm_ino, &imap, - XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE); + XFS_IGET_UNTRUSTED); if (error) return -ENOENT; error = -EFSCORRUPTED; @@ -770,7 +779,7 @@ xchk_get_inode( return error; } if (VFS_I(ip)->i_generation != sc->sm->sm_gen) { - xfs_irele(ip); + xchk_irele(sc, ip); return -ENOENT; } @@ -778,6 +787,41 @@ xchk_get_inode( return 0; } +/* Release an inode, possibly dropping it in the process. */ +void +xchk_irele( + struct xfs_scrub *sc, + struct xfs_inode *ip) +{ + if (current->journal_info != NULL) { + ASSERT(current->journal_info == sc->tp); + + /* + * If we are in a transaction, we /cannot/ drop the inode + * ourselves, because the VFS will trigger writeback, which + * can require a transaction. Clear DONTCACHE to force the + * inode to the LRU, where someone else can take care of + * dropping it. + * + * Note that when we grabbed our reference to the inode, it + * could have had an active ref and DONTCACHE set if a sysadmin + * is trying to coerce a change in file access mode. icache + * hits do not clear DONTCACHE, so we must do it here. + */ + spin_lock(&VFS_I(ip)->i_lock); + VFS_I(ip)->i_state &= ~I_DONTCACHE; + spin_unlock(&VFS_I(ip)->i_lock); + } else if (atomic_read(&VFS_I(ip)->i_count) == 1) { + /* + * If this is the last reference to the inode and the caller + * permits it, set DONTCACHE to avoid thrashing. + */ + d_mark_dontcache(VFS_I(ip)); + } + + xfs_irele(ip); +} + /* Set us up to scrub a file's contents. */ int xchk_setup_inode_contents( diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 0efe6b947d88..7472c41d9cfe 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -137,6 +137,9 @@ int xchk_get_inode(struct xfs_scrub *sc); int xchk_setup_inode_contents(struct xfs_scrub *sc, unsigned int resblks); void xchk_buffer_recheck(struct xfs_scrub *sc, struct xfs_buf *bp); +int xchk_iget(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp); +void xchk_irele(struct xfs_scrub *sc, struct xfs_inode *ip); + /* * Don't bother cross-referencing if we already found corruption or cross * referencing discrepancies. diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c index 58b9761db48d..43a1cbf2ac67 100644 --- a/fs/xfs/scrub/dir.c +++ b/fs/xfs/scrub/dir.c @@ -86,7 +86,7 @@ xchk_dir_check_ftype( xfs_mode_to_ftype(VFS_I(ip)->i_mode)); if (ino_dtype != dtype) xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset); - xfs_irele(ip); + xchk_irele(sdc->sc, ip); out: return error; } diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c index ab182a5cd0c0..38ea04e66468 100644 --- a/fs/xfs/scrub/parent.c +++ b/fs/xfs/scrub/parent.c @@ -131,7 +131,6 @@ xchk_parent_validate( xfs_ino_t dnum, bool *try_again) { - struct xfs_mount *mp = sc->mp; struct xfs_inode *dp = NULL; xfs_nlink_t expected_nlink; xfs_nlink_t nlink; @@ -168,7 +167,7 @@ xchk_parent_validate( * -EFSCORRUPTED or -EFSBADCRC then the parent is corrupt which is a * cross referencing error. Any other error is an operational error. */ - error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_UNTRUSTED, 0, &dp); + error = xchk_iget(sc, dnum, &dp); if (error == -EINVAL || error == -ENOENT) { error = -EFSCORRUPTED; xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error); @@ -253,7 +252,7 @@ xchk_parent_validate( out_unlock: xfs_iunlock(dp, XFS_IOLOCK_SHARED); out_rele: - xfs_irele(dp); + xchk_irele(sc, dp); out: return error; } diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 7a3557a69fe0..bc9638c7a379 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -181,7 +181,7 @@ xchk_teardown( xfs_iunlock(sc->ip, sc->ilock_flags); if (sc->ip != ip_in && !xfs_internal_inum(sc->mp, sc->ip->i_ino)) - xfs_irele(sc->ip); + xchk_irele(sc, sc->ip); sc->ip = NULL; } if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)