Message ID | 163961697197.3129691.1911552605195534271.stgit@magnolia (mailing list archive) |
---|---|
State | Accepted, archived |
Series | xfs: random fixes for 5.17 |

On Wed, Dec 15, 2021 at 05:09:32PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> When xfs_scrub encounters a directory with a leaf1 block, it tries to
> validate that the leaf1 block's bestcount (aka the best free count of
> each directory data block) is the correct size. Previously, this author
> believed that comparing bestcount to the directory isize (since
> directory data blocks are under isize, and leaf/bestfree blocks are
> above it) was sufficient.
>
> Unfortunately during testing of online repair, it was discovered that it
> is possible to create a directory with a hole between the last directory
> block and isize.

We have xfs_da3_swap_lastblock() that can leave an -empty- da block
between the last referenced block and isize, but that's not a "hole"
in the file. If you don't mean xfs_da3_swap_lastblock(), then can
you clarify what you mean by a "hole" here and explain to me how the
situation it occurs in comes about?

Cheers,

Dave.

On Thu, Dec 16, 2021 at 04:05:37PM +1100, Dave Chinner wrote:
> On Wed, Dec 15, 2021 at 05:09:32PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > When xfs_scrub encounters a directory with a leaf1 block, it tries to
> > validate that the leaf1 block's bestcount (aka the best free count of
> > each directory data block) is the correct size. Previously, this author
> > believed that comparing bestcount to the directory isize (since
> > directory data blocks are under isize, and leaf/bestfree blocks are
> > above it) was sufficient.
> >
> > Unfortunately during testing of online repair, it was discovered that it
> > is possible to create a directory with a hole between the last directory
> > block and isize.
>
> We have xfs_da3_swap_lastblock() that can leave an -empty- da block
> between the last referenced block and isize, but that's not a "hole"
> in the file. If you don't mean xfs_da3_swap_lastblock(), then can
> you clarify what you mean by a "hole" here and explain to me how the
> situation it occurs in comes about?

I don't actually know how it comes about. I wrote a test that sets up
fsstress to expand and contract directories and races xfs_scrub -n, and
noticed that I'd periodically get complaints about directories (usually
$SCRATCH_MNT/p$CPU) where the last block(s) before i_size were actually
holes.

I began reading the dir2 code to try to figure out how this came about
(clearly we're not updating i_size somewhere) but then took the shortcut
of seeing if xfs_repair or xfs_check complained about this situation.
Neither of them did, and I found a couple more directories in a similar
situation on my crash test dummy machine, and concluded "Wellllp, I
guess this is part of the ondisk format!" and committed the patch.

Also, I thought xfs_da3_swap_lastblock only operates on leaf and da
btree blocks, not the blocks containing directory entries? I /think/
the actual explanation is that something goes wrong in
xfs_dir2_shrink_inode (maybe?) such that the mapping goes away but
i_disk_size doesn't get updated? Not sure how /that/ can happen,
though...

--D

On Thu, Dec 16, 2021 at 11:25:49AM -0800, Darrick J. Wong wrote:
> On Thu, Dec 16, 2021 at 04:05:37PM +1100, Dave Chinner wrote:
> > On Wed, Dec 15, 2021 at 05:09:32PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > >
> > > When xfs_scrub encounters a directory with a leaf1 block, it tries to
> > > validate that the leaf1 block's bestcount (aka the best free count of
> > > each directory data block) is the correct size. Previously, this author
> > > believed that comparing bestcount to the directory isize (since
> > > directory data blocks are under isize, and leaf/bestfree blocks are
> > > above it) was sufficient.
> > >
> > > Unfortunately during testing of online repair, it was discovered that it
> > > is possible to create a directory with a hole between the last directory
> > > block and isize.
> >
> > We have xfs_da3_swap_lastblock() that can leave an -empty- da block
> > between the last referenced block and isize, but that's not a "hole"
> > in the file. If you don't mean xfs_da3_swap_lastblock(), then can
> > you clarify what you mean by a "hole" here and explain to me how the
> > situation it occurs in comes about?
>
> I don't actually know how it comes about. I wrote a test that sets up
> fsstress to expand and contract directories and races xfs_scrub -n, and
> noticed that I'd periodically get complaints about directories (usually
> $SCRATCH_MNT/p$CPU) where the last block(s) before i_size were actually
> holes.

Is that test getting to ENOSPC at all?

> I began reading the dir2 code to try to figure out how this came about
> (clearly we're not updating i_size somewhere) but then took the shortcut
> of seeing if xfs_repair or xfs_check complained about this situation.
> Neither of them did, and I found a couple more directories in a similar
> situation on my crash test dummy machine, and concluded "Wellllp, I
> guess this is part of the ondisk format!" and committed the patch.
>
> Also, I thought xfs_da3_swap_lastblock only operates on leaf and da
> btree blocks, not the blocks containing directory entries?

Ah, right you are. I noticed xfs_da_shrink_inode() being called from
leaf_to_block() and thought it might be swapping the leaf with the
last data block that we probably just removed. Looking at the code,
that is not going to happen AFAICT...

> I /think/
> the actual explanation is that something goes wrong in
> xfs_dir2_shrink_inode (maybe?) such that the mapping goes away but
> i_disk_size doesn't get updated? Not sure how /that/ can happen,
> though...

Actually, the ENOSPC case in xfs_dir2_shrink_inode is the likely
case. If we can't free the block because bunmapi gets ENOSPC due
to xfs_dir_rename() being called without a block reservation, it'll
just get left there as an empty data block. If all the other dir
data blocks around it get removed properly, it could eventually end
up between the last valid entry and isize....

There are lots of weird corner cases around ENOSPC in the directory
code, perhaps this is just another of them...

Cheers,

Dave.

On Fri, Dec 17, 2021 at 08:17:48AM +1100, Dave Chinner wrote:
> On Thu, Dec 16, 2021 at 11:25:49AM -0800, Darrick J. Wong wrote:
> > On Thu, Dec 16, 2021 at 04:05:37PM +1100, Dave Chinner wrote:
> > > On Wed, Dec 15, 2021 at 05:09:32PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > >
> > > > When xfs_scrub encounters a directory with a leaf1 block, it tries to
> > > > validate that the leaf1 block's bestcount (aka the best free count of
> > > > each directory data block) is the correct size. Previously, this author
> > > > believed that comparing bestcount to the directory isize (since
> > > > directory data blocks are under isize, and leaf/bestfree blocks are
> > > > above it) was sufficient.
> > > >
> > > > Unfortunately during testing of online repair, it was discovered that it
> > > > is possible to create a directory with a hole between the last directory
> > > > block and isize.
> > >
> > > We have xfs_da3_swap_lastblock() that can leave an -empty- da block
> > > between the last referenced block and isize, but that's not a "hole"
> > > in the file. If you don't mean xfs_da3_swap_lastblock(), then can
> > > you clarify what you mean by a "hole" here and explain to me how the
> > > situation it occurs in comes about?
> >
> > I don't actually know how it comes about. I wrote a test that sets up
> > fsstress to expand and contract directories and races xfs_scrub -n, and
> > noticed that I'd periodically get complaints about directories (usually
> > $SCRATCH_MNT/p$CPU) where the last block(s) before i_size were actually
> > holes.
>
> Is that test getting to ENOSPC at all?

Yes. That particular VM has a generous 8GB of SCRATCH_DEV to make the
repairs more interesting.

> > I began reading the dir2 code to try to figure out how this came about
> > (clearly we're not updating i_size somewhere) but then took the shortcut
> > of seeing if xfs_repair or xfs_check complained about this situation.
> > Neither of them did, and I found a couple more directories in a similar
> > situation on my crash test dummy machine, and concluded "Wellllp, I
> > guess this is part of the ondisk format!" and committed the patch.
> >
> > Also, I thought xfs_da3_swap_lastblock only operates on leaf and da
> > btree blocks, not the blocks containing directory entries?
>
> Ah, right you are. I noticed xfs_da_shrink_inode() being called from
> leaf_to_block() and thought it might be swapping the leaf with the
> last data block that we probably just removed. Looking at the code,
> that is not going to happen AFAICT...
>
> > I /think/
> > the actual explanation is that something goes wrong in
> > xfs_dir2_shrink_inode (maybe?) such that the mapping goes away but
> > i_disk_size doesn't get updated? Not sure how /that/ can happen,
> > though...
>
> Actually, the ENOSPC case in xfs_dir2_shrink_inode is the likely
> case. If we can't free the block because bunmapi gets ENOSPC due
> to xfs_dir_rename() being called without a block reservation, it'll
> just get left there as an empty data block. If all the other dir
> data blocks around it get removed properly, it could eventually end
> up between the last valid entry and isize....
>
> There are lots of weird corner cases around ENOSPC in the directory
> code, perhaps this is just another of them...

<nod> The next time I reproduce it, I'll send you a metadump.

--D

On Wed, Dec 15, 2021 at 05:09:32PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> When xfs_scrub encounters a directory with a leaf1 block, it tries to
> validate that the leaf1 block's bestcount (aka the best free count of
> each directory data block) is the correct size. Previously, this author
> believed that comparing bestcount to the directory isize (since
> directory data blocks are under isize, and leaf/bestfree blocks are
> above it) was sufficient.
>
> Unfortunately during testing of online repair, it was discovered that it
> is possible to create a directory with a hole between the last directory
> block and isize. The directory code seems to handle this situation just
> fine and xfs_repair doesn't complain, which effectively makes this quirk
> part of the disk format.
>
> Fix the check to work properly.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

With the "we're not sure how this happens" discussion out of the way,
the change to handle the empty space between the last block and isize
looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

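To make the commit message's arithmetic concrete, here is a small user-space model of the old and new expectations for bestcount. It is a sketch, not kernel code: the helper names are hypothetical, `dirblklog` stands for log2 of the directory block size in bytes, and the old rule is modelled on the removed `xfs_dir2_byte_to_db(geo, sc->ip->i_disk_size)` comparison (a right shift of the byte offset by the directory block log).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model, not kernel code.  Old rule: one bestfree slot per
 * directory data block that fits under i_disk_size.
 */
static uint64_t old_expected_bestcount(uint64_t i_disk_size,
				       unsigned int dirblklog)
{
	assert(dirblklog < 64);		/* keep the shift well defined */
	return i_disk_size >> dirblklog;
}

/*
 * New rule: one bestfree slot per directory data block up to and
 * including the last data block actually found while scanning.
 */
static uint64_t new_expected_bestcount(uint64_t last_data_db)
{
	return last_data_db + 1;
}

/* True if scrub would flag the leaf1 block as corrupt. */
static bool bestcount_is_corrupt(uint64_t bestcount, uint64_t expected)
{
	return bestcount != expected;
}
```

For example, with 4096-byte directory blocks and an i_disk_size spanning data blocks 0 to 2 where block 2 is a hole, the old rule expects three slots. Assuming the leaf1 bests array covers only blocks 0 and 1 (bestcount 2, which is what the patched check expects), the old rule reports corruption, while the new rule with last_data_db = 1 does not.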
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
```diff
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 200a63f58fe7..9a16932d77ce 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -497,6 +497,7 @@ STATIC int
 xchk_directory_leaf1_bestfree(
 	struct xfs_scrub		*sc,
 	struct xfs_da_args		*args,
+	xfs_dir2_db_t			last_data_db,
 	xfs_dablk_t			lblk)
 {
 	struct xfs_dir3_icleaf_hdr	leafhdr;
@@ -534,10 +535,14 @@ xchk_directory_leaf1_bestfree(
 	}
 
 	/*
-	 * There should be as many bestfree slots as there are dir data
-	 * blocks that can fit under i_size.
+	 * There must be enough bestfree slots to cover all the directory data
+	 * blocks that we scanned.  It is possible for there to be a hole
+	 * between the last data block and i_disk_size.  This seems like an
+	 * oversight to the scrub author, but as we have been writing out
+	 * directories like this (and xfs_repair doesn't mind them) for years,
+	 * that's what we have to check.
 	 */
-	if (bestcount != xfs_dir2_byte_to_db(geo, sc->ip->i_disk_size)) {
+	if (bestcount != last_data_db + 1) {
 		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
 		goto out;
 	}
@@ -669,6 +674,7 @@ xchk_directory_blocks(
 	xfs_fileoff_t		lblk;
 	struct xfs_iext_cursor	icur;
 	xfs_dablk_t		dabno;
+	xfs_dir2_db_t		last_data_db = 0;
 	bool			found;
 	int			is_block = 0;
 	int			error;
@@ -712,6 +718,7 @@ xchk_directory_blocks(
 				args.geo->fsbcount);
 		     lblk < got.br_startoff + got.br_blockcount;
 		     lblk += args.geo->fsbcount) {
+			last_data_db = xfs_dir2_da_to_db(mp->m_dir_geo, lblk);
 			error = xchk_directory_data_bestfree(sc, lblk,
 					is_block);
 			if (error)
@@ -734,7 +741,7 @@ xchk_directory_blocks(
 		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
 		goto out;
 	}
-	error = xchk_directory_leaf1_bestfree(sc, &args,
+	error = xchk_directory_leaf1_bestfree(sc, &args, last_data_db,
 			leaf_lblk);
 	if (error)
 		goto out;
```
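The bookkeeping the diff adds to xchk_directory_blocks() can be modelled in user space roughly as below. This is a sketch under stated assumptions, not the kernel code: `struct extent` is a hypothetical stand-in for the kernel's `struct xfs_bmbt_irec`, only mappings in the directory's data region are passed in (in file-offset order), and `xfs_dir2_da_to_db()` is modelled as division by `fsbcount`, the number of filesystem blocks per directory block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one data-fork mapping. */
struct extent {
	uint64_t startoff;	/* first file block of the mapping */
	uint64_t blockcount;	/* number of file blocks mapped */
};

/*
 * Visit every directory block boundary inside each mapped extent, as
 * the patched loop does, and remember the data block number of the
 * last one visited.  Unmapped ranges (holes) are never visited, so a
 * hole just below i_disk_size does not advance last_data_db.
 */
static uint64_t scan_last_data_db(const struct extent *ext, size_t nr,
				  uint64_t fsbcount)
{
	uint64_t last_data_db = 0;	/* same initial value as the patch */

	assert(fsbcount > 0);
	for (size_t i = 0; i < nr; i++) {
		/* Round up to the first dir block boundary in this mapping. */
		uint64_t lblk = (ext[i].startoff + fsbcount - 1) /
				fsbcount * fsbcount;

		for (; lblk < ext[i].startoff + ext[i].blockcount;
		     lblk += fsbcount)
			last_data_db = lblk / fsbcount;	/* models da_to_db */
	}
	return last_data_db;
}

/* Patched leaf1 rule: the bests array must cover db 0..last_data_db. */
static bool leaf1_bestcount_ok(uint64_t bestcount, uint64_t last_data_db)
{
	return bestcount == last_data_db + 1;
}
```

The design choice in the patch is to derive the expected bestcount from what the scan actually found rather than from i_disk_size, so the check tolerates the trailing hole that xfs_repair already accepts.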