Message ID | 38fc8e93-a4be-7eef-ebd6-fa3cb31b9dee@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xfs: fix the problem of mount failure caused by not refreshing mp->m_sb | expand |
On Tue, May 23, 2023 at 02:25:54PM +0800, Wu Guanghao wrote: > After testing xfs_growfs + fsstress + fault injection, the following stack appeared > when mounting the filesystem: > > [ 149.902032] XFS (loop0): xfs_buf_map_verify: daddr 0x200001 out of range, EOFS 0x200000 > [ 149.902072] WARNING: CPU: 12 PID: 3045 at fs/xfs/xfs_buf.c:535 xfs_buf_get_map+0x5ae/0x650 [xfs] > ... > [ 149.902473] xfs_buf_read_map+0x59/0x330 [xfs] > [ 149.902621] ? xlog_recover_items_pass2+0x55/0xd0 [xfs] > [ 149.902809] xlog_recover_buf_commit_pass2+0xff/0x640 [xfs] > [ 149.902959] ? xlog_recover_items_pass2+0x55/0xd0 [xfs] > [ 149.903104] xlog_recover_items_pass2+0x55/0xd0 [xfs] > [ 149.903247] xlog_recover_commit_trans+0x2e0/0x330 [xfs] > [ 149.903390] xlog_recovery_process_trans+0x8e/0xf0 [xfs] > [ 149.903531] xlog_recover_process_data+0x9c/0x130 [xfs] > [ 149.903687] xlog_do_recovery_pass+0x3cc/0x5d0 [xfs] > [ 149.903843] xlog_do_log_recovery+0x5c/0x80 [xfs] > [ 149.903984] xlog_do_recover+0x33/0x1c0 [xfs] > [ 149.904125] xlog_recover+0xdd/0x190 [xfs] > [ 149.904265] xfs_log_mount+0x125/0x2f0 [xfs] > [ 149.904410] xfs_mountfs+0x41a/0x910 [xfs] > [ 149.904558] ? __pfx_xfs_fstrm_free_func+0x10/0x10 [xfs] > [ 149.904725] xfs_fs_fill_super+0x4b7/0x940 [xfs] > [ 149.904873] ? __pfx_xfs_fs_fill_super+0x10/0x10 [xfs] > [ 149.905016] get_tree_bdev+0x19a/0x280 > [ 149.905020] vfs_get_tree+0x29/0xd0 > [ 149.905023] path_mount+0x69e/0x9b0 > [ 149.905026] do_mount+0x7d/0xa0 > [ 149.905029] __x64_sys_mount+0xdc/0x100 > [ 149.905032] do_syscall_64+0x3e/0x90 > [ 149.905035] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > The trigger process is as follows: > > 1. Growfs size from 0x200000 to 0x300000 > 2. Using the space range of 0x200000~0x300000 > 3. The above operations have only been written to the log area on disk > 4. Fault injection and shutdown filesystem > 5. Mount the filesystem and replay the log about growfs, but only modify the > superblock buffer without modifying the mp->m_sb structure in memory > 6. Continuing the log replay, at this point we are replaying operation 2, then > it was discovered that the blocks used more than mp->m_sb.sb_dblocks > > Therefore, during log replay, if there are any modifications made to the > superblock, we should refresh the information recorded in the mp->m_sb. Heh, clever. Thanks for supplying a patch. :) > Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com> > --- > fs/xfs/xfs_buf_item_recover.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c > index 43167f543afc..2ac3d2083188 100644 > --- a/fs/xfs/xfs_buf_item_recover.c > +++ b/fs/xfs/xfs_buf_item_recover.c > @@ -22,6 +22,8 @@ > #include "xfs_inode.h" > #include "xfs_dir2.h" > #include "xfs_quota.h" > +#include "xfs_sb.h" > +#include "xfs_ag.h" > > /* > * This is the number of entries in the l_buf_cancel_table used during > @@ -969,6 +971,29 @@ xlog_recover_buf_commit_pass2( > goto out_release; > } else { > xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn); > + /* > + * If the superblock buffer is modified, we also need to modify the > + * content of the mp. > + */ > + if (bp->b_maps[0].bm_bn == XFS_SB_DADDR && bp->b_ops) { > + struct xfs_dsb *sb = bp->b_addr; I think the body of this if statement ought to be a separate function, e.g. STATIC int xlog_recover_sb_buffer(...) { struct xfs_dsb *sb = bp->b_addr; bp->b_ops->verify_write(bp); ... } Also, I think the callsite is better placed at the end of xlog_recover_do_reg_buffer. > + > + bp->b_ops->verify_write(bp); I was about to ask why you ran the full write verifier here instead of calling ->verify_struct (which skips the crc computation), but then I realized: const struct xfs_buf_ops xfs_sb_buf_ops = { .name = "xfs_sb", .magic = { cpu_to_be32(XFS_SB_MAGIC), cpu_to_be32(XFS_SB_MAGIC) }, .verify_read = xfs_sb_read_verify, .verify_write = xfs_sb_write_verify, }; So, heh. No structure verifier. I think for completeness xfs_sb_buf_ops ought to have one, but I'm willing to live with this for now. > + error = bp->b_error; > + if (error) > + goto out_release; > + > + if (be32_to_cpu(sb->sb_agcount) > mp->m_sb.sb_agcount) { > + error = xfs_initialize_perag(mp, > + be32_to_cpu(sb->sb_agcount), > + be64_to_cpu(sb->sb_dblocks), > + &mp->m_maxagi); > + if (error) > + goto out_release; Might want to log a message here that the perag init is what killed log recovery. Other than those reorganization suggestions, I think this looks correct. Would you mind submitting your testcase to fstests? --D > + } > + > + xfs_sb_from_disk(&mp->m_sb, sb); > + } > } > > /* > -- > 2.27.0
On Tue, May 23, 2023 at 02:25:54PM +0800, Wu Guanghao wrote: > After testing xfs_growfs + fsstress + fault injection, the following stack appeared > when mounting the filesystem: > > [ 149.902032] XFS (loop0): xfs_buf_map_verify: daddr 0x200001 out of range, EOFS 0x200000 > [ 149.902072] WARNING: CPU: 12 PID: 3045 at fs/xfs/xfs_buf.c:535 xfs_buf_get_map+0x5ae/0x650 [xfs] > ... > [ 149.902473] xfs_buf_read_map+0x59/0x330 [xfs] > [ 149.902621] ? xlog_recover_items_pass2+0x55/0xd0 [xfs] > [ 149.902809] xlog_recover_buf_commit_pass2+0xff/0x640 [xfs] > [ 149.902959] ? xlog_recover_items_pass2+0x55/0xd0 [xfs] > [ 149.903104] xlog_recover_items_pass2+0x55/0xd0 [xfs] > [ 149.903247] xlog_recover_commit_trans+0x2e0/0x330 [xfs] > [ 149.903390] xlog_recovery_process_trans+0x8e/0xf0 [xfs] > [ 149.903531] xlog_recover_process_data+0x9c/0x130 [xfs] > [ 149.903687] xlog_do_recovery_pass+0x3cc/0x5d0 [xfs] > [ 149.903843] xlog_do_log_recovery+0x5c/0x80 [xfs] > [ 149.903984] xlog_do_recover+0x33/0x1c0 [xfs] > [ 149.904125] xlog_recover+0xdd/0x190 [xfs] > [ 149.904265] xfs_log_mount+0x125/0x2f0 [xfs] > [ 149.904410] xfs_mountfs+0x41a/0x910 [xfs] > [ 149.904558] ? __pfx_xfs_fstrm_free_func+0x10/0x10 [xfs] > [ 149.904725] xfs_fs_fill_super+0x4b7/0x940 [xfs] > [ 149.904873] ? __pfx_xfs_fs_fill_super+0x10/0x10 [xfs] > [ 149.905016] get_tree_bdev+0x19a/0x280 > [ 149.905020] vfs_get_tree+0x29/0xd0 > [ 149.905023] path_mount+0x69e/0x9b0 > [ 149.905026] do_mount+0x7d/0xa0 > [ 149.905029] __x64_sys_mount+0xdc/0x100 > [ 149.905032] do_syscall_64+0x3e/0x90 > [ 149.905035] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > The trigger process is as follows: > > 1. Growfs size from 0x200000 to 0x300000 > 2. Using the space range of 0x200000~0x300000 > 3. The above operations have only been written to the log area on disk > 4. Fault injection and shutdown filesystem > 5. Mount the filesystem and replay the log about growfs, but only modify the > superblock buffer without modifying the mp->m_sb structure in memory > 6. Continuing the log replay, at this point we are replaying operation 2, then > it was discovered that the blocks used more than mp->m_sb.sb_dblocks > > Therefore, during log replay, if there are any modifications made to the > superblock, we should refresh the information recorded in the mp->m_sb. > > Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com> There are a bunch of things we need to re-init post recovery if the superblock contents change during recovery. See xlog_do_recover() - if we are moving the sb log item recovery updates from post-recovery to "at log item recovery", then we need to be moving everything else in xlog_do_recover() here as well. That said.... > --- > fs/xfs/xfs_buf_item_recover.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c > index 43167f543afc..2ac3d2083188 100644 > --- a/fs/xfs/xfs_buf_item_recover.c > +++ b/fs/xfs/xfs_buf_item_recover.c > @@ -22,6 +22,8 @@ > #include "xfs_inode.h" > #include "xfs_dir2.h" > #include "xfs_quota.h" > +#include "xfs_sb.h" > +#include "xfs_ag.h" > > /* > * This is the number of entries in the l_buf_cancel_table used during > @@ -969,6 +971,29 @@ xlog_recover_buf_commit_pass2( > goto out_release; > } else { > xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn); > + /* > + * If the superblock buffer is modified, we also need to modify the > + * content of the mp. > + */ > + if (bp->b_maps[0].bm_bn == XFS_SB_DADDR && bp->b_ops) { > + struct xfs_dsb *sb = bp->b_addr; > + > + bp->b_ops->verify_write(bp); > + error = bp->b_error; > + if (error) > + goto out_release; > + > + if (be32_to_cpu(sb->sb_agcount) > mp->m_sb.sb_agcount) { > + error = xfs_initialize_perag(mp, > + be32_to_cpu(sb->sb_agcount), > + be64_to_cpu(sb->sb_dblocks), > + &mp->m_maxagi); > + if (error) > + goto out_release; > + } > + > + xfs_sb_from_disk(&mp->m_sb, sb); Ok, so what are we supposed to do here if the filesystem was shrunk? How do we guarantee that the mods that the shrink might do beyond the new end of device in the same checkpoint have already been replayed by the time the superblock change of size is replayed here? What if feature bits in the superblock were changed? e.g. we add a feature bit the filesystem doesn't understand? Or we have items for recovery in this checkpoint that are in the log after the superblock that depend on that feature bit not yet having been changed? What if the superblock got re-logged and the feature bit change subsequent objects rely on so gets moved to later in the checkpoint? i.e. there appears to be some important item recovery ordering issues we likely need to address here before we move the in-memory state updates from post-recovery to mid-recovery. I suspect that fixing this problem so that superblock updates also guarantee log recovery ordering is also going to need changing how we update feature and geometry state in the superblock.... -Dave.
在 2023/5/24 0:22, Darrick J. Wong 写道: > On Tue, May 23, 2023 at 02:25:54PM +0800, Wu Guanghao wrote: >> After testing xfs_growfs + fsstress + fault injection, the following stack appeared >> when mounting the filesystem: >> >> [ 149.902032] XFS (loop0): xfs_buf_map_verify: daddr 0x200001 out of range, EOFS 0x200000 >> [ 149.902072] WARNING: CPU: 12 PID: 3045 at fs/xfs/xfs_buf.c:535 xfs_buf_get_map+0x5ae/0x650 [xfs] >> ... >> [ 149.902473] xfs_buf_read_map+0x59/0x330 [xfs] >> [ 149.902621] ? xlog_recover_items_pass2+0x55/0xd0 [xfs] >> [ 149.902809] xlog_recover_buf_commit_pass2+0xff/0x640 [xfs] >> [ 149.902959] ? xlog_recover_items_pass2+0x55/0xd0 [xfs] >> [ 149.903104] xlog_recover_items_pass2+0x55/0xd0 [xfs] >> [ 149.903247] xlog_recover_commit_trans+0x2e0/0x330 [xfs] >> [ 149.903390] xlog_recovery_process_trans+0x8e/0xf0 [xfs] >> [ 149.903531] xlog_recover_process_data+0x9c/0x130 [xfs] >> [ 149.903687] xlog_do_recovery_pass+0x3cc/0x5d0 [xfs] >> [ 149.903843] xlog_do_log_recovery+0x5c/0x80 [xfs] >> [ 149.903984] xlog_do_recover+0x33/0x1c0 [xfs] >> [ 149.904125] xlog_recover+0xdd/0x190 [xfs] >> [ 149.904265] xfs_log_mount+0x125/0x2f0 [xfs] >> [ 149.904410] xfs_mountfs+0x41a/0x910 [xfs] >> [ 149.904558] ? __pfx_xfs_fstrm_free_func+0x10/0x10 [xfs] >> [ 149.904725] xfs_fs_fill_super+0x4b7/0x940 [xfs] >> [ 149.904873] ? __pfx_xfs_fs_fill_super+0x10/0x10 [xfs] >> [ 149.905016] get_tree_bdev+0x19a/0x280 >> [ 149.905020] vfs_get_tree+0x29/0xd0 >> [ 149.905023] path_mount+0x69e/0x9b0 >> [ 149.905026] do_mount+0x7d/0xa0 >> [ 149.905029] __x64_sys_mount+0xdc/0x100 >> [ 149.905032] do_syscall_64+0x3e/0x90 >> [ 149.905035] entry_SYSCALL_64_after_hwframe+0x72/0xdc >> >> The trigger process is as follows: >> >> 1. Growfs size from 0x200000 to 0x300000 >> 2. Using the space range of 0x200000~0x300000 >> 3. The above operations have only been written to the log area on disk >> 4. Fault injection and shutdown filesystem >> 5. Mount the filesystem and replay the log about growfs, but only modify the >> superblock buffer without modifying the mp->m_sb structure in memory >> 6. Continuing the log replay, at this point we are replaying operation 2, then >> it was discovered that the blocks used more than mp->m_sb.sb_dblocks >> >> Therefore, during log replay, if there are any modifications made to the >> superblock, we should refresh the information recorded in the mp->m_sb. > > Heh, clever. Thanks for supplying a patch. :) > >> Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com> >> --- >> fs/xfs/xfs_buf_item_recover.c | 25 +++++++++++++++++++++++++ >> 1 file changed, 25 insertions(+) >> >> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c >> index 43167f543afc..2ac3d2083188 100644 >> --- a/fs/xfs/xfs_buf_item_recover.c >> +++ b/fs/xfs/xfs_buf_item_recover.c >> @@ -22,6 +22,8 @@ >> #include "xfs_inode.h" >> #include "xfs_dir2.h" >> #include "xfs_quota.h" >> +#include "xfs_sb.h" >> +#include "xfs_ag.h" >> >> /* >> * This is the number of entries in the l_buf_cancel_table used during >> @@ -969,6 +971,29 @@ xlog_recover_buf_commit_pass2( >> goto out_release; >> } else { >> xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn); >> + /* >> + * If the superblock buffer is modified, we also need to modify the >> + * content of the mp. >> + */ >> + if (bp->b_maps[0].bm_bn == XFS_SB_DADDR && bp->b_ops) { >> + struct xfs_dsb *sb = bp->b_addr; > > I think the body of this if statement ought to be a separate function, > e.g. > > STATIC int > xlog_recover_sb_buffer(...) > { > struct xfs_dsb *sb = bp->b_addr; > > bp->b_ops->verify_write(bp); > ... > } > > Also, I think the callsite is better placed at the end of > xlog_recover_do_reg_buffer. > >> + >> + bp->b_ops->verify_write(bp); > > I was about to ask why you ran the full write verifier here instead of > calling ->verify_struct (which skips the crc computation), but then I > realized: > > const struct xfs_buf_ops xfs_sb_buf_ops = { > .name = "xfs_sb", > .magic = { cpu_to_be32(XFS_SB_MAGIC), cpu_to_be32(XFS_SB_MAGIC) }, > .verify_read = xfs_sb_read_verify, > .verify_write = xfs_sb_write_verify, > }; > > So, heh. No structure verifier. I think for completeness > xfs_sb_buf_ops ought to have one, but I'm willing to live with this for > now. > >> + error = bp->b_error; >> + if (error) >> + goto out_release; >> + >> + if (be32_to_cpu(sb->sb_agcount) > mp->m_sb.sb_agcount) { >> + error = xfs_initialize_perag(mp, >> + be32_to_cpu(sb->sb_agcount), >> + be64_to_cpu(sb->sb_dblocks), >> + &mp->m_maxagi); >> + if (error) >> + goto out_release; > > Might want to log a message here that the perag init is what killed log > recovery. > > Other than those reorganization suggestions, I think this looks correct. > Would you mind submitting your testcase to fstests? > > --D Thank you for your suggestion. I will make the changes in the v2 version. And we will try to submit test cases in the future. Thanks, Guanghao > >> + } >> + >> + xfs_sb_from_disk(&mp->m_sb, sb); >> + } >> } >> >> /* >> -- >> 2.27.0 > . >
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index 43167f543afc..2ac3d2083188 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -22,6 +22,8 @@ #include "xfs_inode.h" #include "xfs_dir2.h" #include "xfs_quota.h" +#include "xfs_sb.h" +#include "xfs_ag.h" /* * This is the number of entries in the l_buf_cancel_table used during @@ -969,6 +971,29 @@ xlog_recover_buf_commit_pass2( goto out_release; } else { xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn); + /* + * If the superblock buffer is modified, we also need to modify the + * content of the mp. + */ + if (bp->b_maps[0].bm_bn == XFS_SB_DADDR && bp->b_ops) { + struct xfs_dsb *sb = bp->b_addr; + + bp->b_ops->verify_write(bp); + error = bp->b_error; + if (error) + goto out_release; + + if (be32_to_cpu(sb->sb_agcount) > mp->m_sb.sb_agcount) { + error = xfs_initialize_perag(mp, + be32_to_cpu(sb->sb_agcount), + be64_to_cpu(sb->sb_dblocks), + &mp->m_maxagi); + if (error) + goto out_release; + } + + xfs_sb_from_disk(&mp->m_sb, sb); + } } /*
After testing xfs_growfs + fsstress + fault injection, the following stack appeared when mounting the filesystem: [ 149.902032] XFS (loop0): xfs_buf_map_verify: daddr 0x200001 out of range, EOFS 0x200000 [ 149.902072] WARNING: CPU: 12 PID: 3045 at fs/xfs/xfs_buf.c:535 xfs_buf_get_map+0x5ae/0x650 [xfs] ... [ 149.902473] xfs_buf_read_map+0x59/0x330 [xfs] [ 149.902621] ? xlog_recover_items_pass2+0x55/0xd0 [xfs] [ 149.902809] xlog_recover_buf_commit_pass2+0xff/0x640 [xfs] [ 149.902959] ? xlog_recover_items_pass2+0x55/0xd0 [xfs] [ 149.903104] xlog_recover_items_pass2+0x55/0xd0 [xfs] [ 149.903247] xlog_recover_commit_trans+0x2e0/0x330 [xfs] [ 149.903390] xlog_recovery_process_trans+0x8e/0xf0 [xfs] [ 149.903531] xlog_recover_process_data+0x9c/0x130 [xfs] [ 149.903687] xlog_do_recovery_pass+0x3cc/0x5d0 [xfs] [ 149.903843] xlog_do_log_recovery+0x5c/0x80 [xfs] [ 149.903984] xlog_do_recover+0x33/0x1c0 [xfs] [ 149.904125] xlog_recover+0xdd/0x190 [xfs] [ 149.904265] xfs_log_mount+0x125/0x2f0 [xfs] [ 149.904410] xfs_mountfs+0x41a/0x910 [xfs] [ 149.904558] ? __pfx_xfs_fstrm_free_func+0x10/0x10 [xfs] [ 149.904725] xfs_fs_fill_super+0x4b7/0x940 [xfs] [ 149.904873] ? __pfx_xfs_fs_fill_super+0x10/0x10 [xfs] [ 149.905016] get_tree_bdev+0x19a/0x280 [ 149.905020] vfs_get_tree+0x29/0xd0 [ 149.905023] path_mount+0x69e/0x9b0 [ 149.905026] do_mount+0x7d/0xa0 [ 149.905029] __x64_sys_mount+0xdc/0x100 [ 149.905032] do_syscall_64+0x3e/0x90 [ 149.905035] entry_SYSCALL_64_after_hwframe+0x72/0xdc The trigger process is as follows: 1. Growfs size from 0x200000 to 0x300000 2. Using the space range of 0x200000~0x300000 3. The above operations have only been written to the log area on disk 4. Fault injection and shutdown filesystem 5. Mount the filesystem and replay the log about growfs, but only modify the superblock buffer without modifying the mp->m_sb structure in memory 6. Continuing the log replay, at this point we are replaying operation 2, then it was discovered that the blocks used more than mp->m_sb.sb_dblocks Therefore, during log replay, if there are any modifications made to the superblock, we should refresh the information recorded in the mp->m_sb. Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com> --- fs/xfs/xfs_buf_item_recover.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) -- 2.27.0