Message ID | 896a0202-aac8-e43f-7ea6-3718591e32aa@sandeen.net (mailing list archive) |
---|---|
State | New, archived |
On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
> Now that unlinked inode recovery is done outside of
> log recovery, there is no need to dirty the log on
> snapshots just to handle unlinked inodes. This means
> that readonly snapshots can be mounted without requiring
> -o ro,norecovery to avoid the log replay that can't happen
> on a readonly block device.
>
> (unlinked inodes will just hang out in the agi buckets until
> the next writable mount)

FWIW I put these two in a test kernel to see what would happen and
generic/311 failures popped up. It looked like the _check_scratch_fs
found incorrect block counts on the snapshot(?)

--D

> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 93588ea..5669525 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1419,9 +1419,10 @@ struct proc_xfs_info {
>
>  /*
>   * Second stage of a freeze. The data is already frozen so we only
> - * need to take care of the metadata. Once that's done sync the superblock
> - * to the log to dirty it in case of a crash while frozen. This ensures that we
> - * will recover the unlinked inode lists on the next mount.
> + * need to take care of the metadata.
> + * Any unlinked inode lists will remain at this point, and be recovered
> + * on the next writable mount if we crash while frozen, or create
> + * a snapshot from the frozen filesystem.
>   */
>  STATIC int
>  xfs_fs_freeze(
> @@ -1431,7 +1432,7 @@ struct proc_xfs_info {
>
>  	xfs_save_resvblks(mp);
>  	xfs_quiesce_attr(mp);
> -	return xfs_sync_sb(mp, true);
> +	return 0;
>  }
>
>  STATIC int
>
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Mar 24, 2018 at 09:20:49AM -0700, Darrick J. Wong wrote:
> On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
> > Now that unlinked inode recovery is done outside of
> > log recovery, there is no need to dirty the log on
> > snapshots just to handle unlinked inodes. This means
> > that readonly snapshots can be mounted without requiring
> > -o ro,norecovery to avoid the log replay that can't happen
> > on a readonly block device.
> >
> > (unlinked inodes will just hang out in the agi buckets until
> > the next writable mount)
>
> FWIW I put these two in a test kernel to see what would happen and
> generic/311 failures popped up. It looked like the _check_scratch_fs
> found incorrect block counts on the snapshot(?)
>

Interesting. Just a wild guess, but perhaps it has something to do with
lazy sb accounting..? I see we call xfs_initialize_perag_data() when
mounting an unclean fs.

Brian

> --D
>
> > Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > ---
> >
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 93588ea..5669525 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1419,9 +1419,10 @@ struct proc_xfs_info {
> >
> >  /*
> >   * Second stage of a freeze. The data is already frozen so we only
> > - * need to take care of the metadata. Once that's done sync the superblock
> > - * to the log to dirty it in case of a crash while frozen. This ensures that we
> > - * will recover the unlinked inode lists on the next mount.
> > + * need to take care of the metadata.
> > + * Any unlinked inode lists will remain at this point, and be recovered
> > + * on the next writable mount if we crash while frozen, or create
> > + * a snapshot from the frozen filesystem.
> >   */
> >  STATIC int
> >  xfs_fs_freeze(
> > @@ -1431,7 +1432,7 @@ struct proc_xfs_info {
> >
> >  	xfs_save_resvblks(mp);
> >  	xfs_quiesce_attr(mp);
> > -	return xfs_sync_sb(mp, true);
> > +	return 0;
> >  }
> >
> >  STATIC int
> >
On Mon, Mar 26, 2018 at 08:46:49AM -0400, Brian Foster wrote:
> On Sat, Mar 24, 2018 at 09:20:49AM -0700, Darrick J. Wong wrote:
> > On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
> > > Now that unlinked inode recovery is done outside of
> > > log recovery, there is no need to dirty the log on
> > > snapshots just to handle unlinked inodes. This means
> > > that readonly snapshots can be mounted without requiring
> > > -o ro,norecovery to avoid the log replay that can't happen
> > > on a readonly block device.
> > >
> > > (unlinked inodes will just hang out in the agi buckets until
> > > the next writable mount)
> >
> > FWIW I put these two in a test kernel to see what would happen and
> > generic/311 failures popped up. It looked like the _check_scratch_fs
> > found incorrect block counts on the snapshot(?)
> >
>
> Interesting. Just a wild guess, but perhaps it has something to do with
> lazy sb accounting..? I see we call xfs_initialize_perag_data() when
> mounting an unclean fs.

The freeze calls xfs_log_sbcount() which should update the
superblock counters from the in-memory counters and write them to
disk.

If they are out, I'm guessing it's because the in-memory per-ag
reservations are not being returned to the global pool before the
in-memory counters are summed during a freeze....

Cheers,

Dave.
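Dave's guess above can be illustrated with a small toy model. This is only a sketch and not XFS code: every `toy_*` name is hypothetical, and it assumes that a per-AG reservation hides blocks from the global in-memory free-block counter until the reservation is returned. Under that assumption, a freeze that writes the in-memory counter to the superblock while reservations are still held records fewer free blocks than a checker finds by walking the AGs.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_AGS 4

/* Toy model only; none of these names exist in XFS. */
struct toy_ag {
	uint64_t free_blocks;	/* blocks actually free in this AG */
	uint64_t resv_blocks;	/* blocks hidden by an in-memory reservation */
};

struct toy_mount {
	struct toy_ag	ags[TOY_NR_AGS];
	uint64_t	incore_fdblocks;	/* global in-memory free-block counter */
};

/* Start with free_per_ag free blocks in every AG and nothing reserved. */
void toy_mount_init(struct toy_mount *mp, uint64_t free_per_ag)
{
	mp->incore_fdblocks = 0;
	for (int i = 0; i < TOY_NR_AGS; i++) {
		mp->ags[i].free_blocks = free_per_ag;
		mp->ags[i].resv_blocks = 0;
		mp->incore_fdblocks += free_per_ag;
	}
}

/* Take a per-AG reservation: the blocks vanish from the global counter. */
void toy_reserve_ag_blocks(struct toy_mount *mp, uint64_t resv_per_ag)
{
	for (int i = 0; i < TOY_NR_AGS; i++) {
		assert(resv_per_ag <=
		       mp->ags[i].free_blocks - mp->ags[i].resv_blocks);
		mp->ags[i].resv_blocks += resv_per_ag;
		mp->incore_fdblocks -= resv_per_ag;
	}
}

/* Return every per-AG reservation to the global pool. */
void toy_unreserve_ag_blocks(struct toy_mount *mp)
{
	for (int i = 0; i < TOY_NR_AGS; i++) {
		mp->incore_fdblocks += mp->ags[i].resv_blocks;
		mp->ags[i].resv_blocks = 0;
	}
}

/* What a freeze would write to the superblock: the counter as it stands. */
uint64_t toy_freeze_sb_fdblocks(const struct toy_mount *mp)
{
	return mp->incore_fdblocks;
}

/* What a checker would compute by summing each AG's free space. */
uint64_t toy_check_fdblocks(const struct toy_mount *mp)
{
	uint64_t sum = 0;

	for (int i = 0; i < TOY_NR_AGS; i++)
		sum += mp->ags[i].free_blocks;
	return sum;
}
```

In this model, calling `toy_freeze_sb_fdblocks()` while reservations are held gives a value smaller than `toy_check_fdblocks()`; calling `toy_unreserve_ag_blocks()` first makes the two agree, which is the ordering Dave suspects is missing.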
Hi folks,

On Wed, Mar 28, 2018 at 08:17:28AM +1100, Dave Chinner wrote:
> On Mon, Mar 26, 2018 at 08:46:49AM -0400, Brian Foster wrote:
> > On Sat, Mar 24, 2018 at 09:20:49AM -0700, Darrick J. Wong wrote:
> > > On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
> > > > Now that unlinked inode recovery is done outside of
> > > > log recovery, there is no need to dirty the log on
> > > > snapshots just to handle unlinked inodes. This means
> > > > that readonly snapshots can be mounted without requiring
> > > > -o ro,norecovery to avoid the log replay that can't happen
> > > > on a readonly block device.
> > > >
> > > > (unlinked inodes will just hang out in the agi buckets until
> > > > the next writable mount)
> > >
> > > FWIW I put these two in a test kernel to see what would happen and
> > > generic/311 failures popped up. It looked like the _check_scratch_fs
> > > found incorrect block counts on the snapshot(?)
> > >
> >
> > Interesting. Just a wild guess, but perhaps it has something to do with
> > lazy sb accounting..? I see we call xfs_initialize_perag_data() when
> > mounting an unclean fs.
>
> The freeze calls xfs_log_sbcount() which should update the
> superblock counters from the in-memory counters and write them to
> disk.
>
> If they are out, I'm guessing it's because the in-memory per-ag
> reservations are not being returned to the global pool before the
> in-memory counters are summed during a freeze....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

I spent some time tracking down this problem. I've made a quick
modification to the per-AG reservation handling and tested it with
generic/311; it seems fine. My current question is how such fsfrozen
images (with a clean mount) work with old kernels without [PATCH 1/1].
I'm afraid orphan inodes won't be freed with such old kernels....
Am I missing something?

Thanks,
Gao Xiang

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 06041834daa3..79d6d8858dcf 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -963,12 +963,17 @@ xfs_log_quiesce(
 	return xfs_log_cover(mp);
 }

-void
+int
 xfs_log_clean(
 	struct xfs_mount	*mp)
 {
-	xfs_log_quiesce(mp);
+	int ret;
+
+	ret = xfs_log_quiesce(mp);
+	if (ret)
+		return ret;
 	xfs_log_unmount_write(mp);
+	return 0;
 }

 /*
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 044e02cb8921..4061a219bfde 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -139,7 +139,7 @@ bool xfs_log_item_in_current_chkpt(struct xfs_log_item *lip);

 void	xfs_log_work_queue(struct xfs_mount *mp);
 int	xfs_log_quiesce(struct xfs_mount *mp);
-void	xfs_log_clean(struct xfs_mount *mp);
+int	xfs_log_clean(struct xfs_mount *mp);
 bool	xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t);
 bool	xfs_log_in_recovery(struct xfs_mount *);

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 97f31308de03..3ef21f589d6b 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3478,6 +3478,7 @@ xlog_recover_finish(
 			: "internal");
 		log->l_flags &= ~XLOG_RECOVERY_NEEDED;
 	} else {
+		xlog_recover_process_iunlinks(log);
 		xfs_info(log->l_mp, "Ending clean mount");
 	}
 	return 0;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 21b1d034aca3..0db1e7e0e0c8 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -884,7 +884,7 @@ xfs_fs_freeze(
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 	unsigned int		flags;
-	int			ret;
+	int			error;

 	/*
 	 * The filesystem is now frozen far enough that memory reclaim
@@ -893,10 +893,25 @@ xfs_fs_freeze(
 	 */
 	flags = memalloc_nofs_save();
 	xfs_blockgc_stop(mp);
+
+	/* Get rid of any leftover CoW reservations... */
+	error = xfs_blockgc_free_space(mp, NULL);
+	if (error) {
+		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+		return error;
+	}
+
+	/* Free the per-AG metadata reservation pool. */
+	error = xfs_fs_unreserve_ag_blocks(mp);
+	if (error) {
+		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+		return error;
+	}
+
 	xfs_save_resvblks(mp);
-	ret = xfs_log_quiesce(mp);
+	error = xfs_log_clean(mp);
 	memalloc_nofs_restore(flags);
-	return ret;
+	return error;
 }

 STATIC int
@@ -904,10 +919,26 @@ xfs_fs_unfreeze(
 	struct super_block	*sb)
 {
 	struct xfs_mount	*mp = XFS_M(sb);
+	int			error;

 	xfs_restore_resvblks(mp);
 	xfs_log_work_queue(mp);
+
+	/* Recover any CoW blocks that never got remapped. */
+	error = xfs_reflink_recover_cow(mp);
+	if (error) {
+		xfs_err(mp,
+	"Error %d recovering leftover CoW allocations.", error);
+		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+		return error;
+	}
+
 	xfs_blockgc_start(mp);
+
+	/* Create the per-AG metadata reservation pool. */
+	error = xfs_fs_reserve_ag_blocks(mp);
+	if (error && error != -ENOSPC)
+		return error;
 	return 0;
 }

@@ -1440,7 +1471,6 @@ xfs_fs_fill_super(
 #endif
 	}

-	/* Filesystem claims it needs repair, so refuse the mount. */
 	if (xfs_sb_version_needsrepair(&mp->m_sb)) {
 		xfs_warn(mp, "Filesystem needs repair. Please run xfs_repair.");
 		error = -EFSCORRUPTED;
On 2/23/21 7:42 AM, Gao Xiang wrote:
> Hi folks,
>
> On Wed, Mar 28, 2018 at 08:17:28AM +1100, Dave Chinner wrote:
>> On Mon, Mar 26, 2018 at 08:46:49AM -0400, Brian Foster wrote:
>>> On Sat, Mar 24, 2018 at 09:20:49AM -0700, Darrick J. Wong wrote:
>>>> On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
>>>>> Now that unlinked inode recovery is done outside of
>>>>> log recovery, there is no need to dirty the log on
>>>>> snapshots just to handle unlinked inodes. This means
>>>>> that readonly snapshots can be mounted without requiring
>>>>> -o ro,norecovery to avoid the log replay that can't happen
>>>>> on a readonly block device.
>>>>>
>>>>> (unlinked inodes will just hang out in the agi buckets until
>>>>> the next writable mount)
>>>>
>>>> FWIW I put these two in a test kernel to see what would happen and
>>>> generic/311 failures popped up. It looked like the _check_scratch_fs
>>>> found incorrect block counts on the snapshot(?)
>>>>
>>>
>>> Interesting. Just a wild guess, but perhaps it has something to do with
>>> lazy sb accounting..? I see we call xfs_initialize_perag_data() when
>>> mounting an unclean fs.
>>
>> The freeze calls xfs_log_sbcount() which should update the
>> superblock counters from the in-memory counters and write them to
>> disk.
>>
>> If they are out, I'm guessing it's because the in-memory per-ag
>> reservations are not being returned to the global pool before the
>> in-memory counters are summed during a freeze....
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@fromorbit.com
>
> I spent some time tracking down this problem. I've made a quick
> modification to the per-AG reservation handling and tested it with
> generic/311; it seems fine. My current question is how such fsfrozen
> images (with a clean mount) work with old kernels without [PATCH 1/1].
> I'm afraid orphan inodes won't be freed with such old kernels....
> Am I missing something?

It's true, a snapshot created with these patches will not have its unlinked
inodes processed if mounted on an older kernel. I'm not sure how much of a
problem that is; the filesystem is not inconsistent, but some space is lost,
I guess. I'm not sure it's common to take a snapshot of a frozen filesystem on
one kernel and then move it back to an older kernel. Maybe others have
thoughts on this.

-Eric
On Tue, Feb 23, 2021 at 08:40:56AM -0600, Eric Sandeen wrote:
> On 2/23/21 7:42 AM, Gao Xiang wrote:
> > Hi folks,
> >
> > On Wed, Mar 28, 2018 at 08:17:28AM +1100, Dave Chinner wrote:
> >> On Mon, Mar 26, 2018 at 08:46:49AM -0400, Brian Foster wrote:
> >>> On Sat, Mar 24, 2018 at 09:20:49AM -0700, Darrick J. Wong wrote:
> >>>> On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
> >>>>> Now that unlinked inode recovery is done outside of
> >>>>> log recovery, there is no need to dirty the log on
> >>>>> snapshots just to handle unlinked inodes. This means
> >>>>> that readonly snapshots can be mounted without requiring
> >>>>> -o ro,norecovery to avoid the log replay that can't happen
> >>>>> on a readonly block device.
> >>>>>
> >>>>> (unlinked inodes will just hang out in the agi buckets until
> >>>>> the next writable mount)
> >>>>
> >>>> FWIW I put these two in a test kernel to see what would happen and
> >>>> generic/311 failures popped up. It looked like the _check_scratch_fs
> >>>> found incorrect block counts on the snapshot(?)
> >>>>
> >>>
> >>> Interesting. Just a wild guess, but perhaps it has something to do with
> >>> lazy sb accounting..? I see we call xfs_initialize_perag_data() when
> >>> mounting an unclean fs.
> >>
> >> The freeze calls xfs_log_sbcount() which should update the
> >> superblock counters from the in-memory counters and write them to
> >> disk.
> >>
> >> If they are out, I'm guessing it's because the in-memory per-ag
> >> reservations are not being returned to the global pool before the
> >> in-memory counters are summed during a freeze....
> >>
> >> Cheers,
> >>
> >> Dave.
> >> --
> >> Dave Chinner
> >> david@fromorbit.com
> >
> > I spent some time tracking down this problem. I've made a quick
> > modification to the per-AG reservation handling and tested it with
> > generic/311; it seems fine. My current question is how such fsfrozen
> > images (with a clean mount) work with old kernels without [PATCH 1/1].
> > I'm afraid orphan inodes won't be freed with such old kernels....
> > Am I missing something?
>
> It's true, a snapshot created with these patches will not have its unlinked
> inodes processed if mounted on an older kernel. I'm not sure how much of a
> problem that is; the filesystem is not inconsistent, but some space is lost,
> I guess. I'm not sure it's common to take a snapshot of a frozen filesystem on
> one kernel and then move it back to an older kernel. Maybe others have
> thoughts on this.

My current thought is to write a clean log when freezing only if
there are no unlinked inodes, but to leave the log dirty if any
unlinked inodes exist, as Brian mentioned before, and not handle
that case (?). I'd like to hear more comments about this as well.

Thanks,
Gao Xiang

>
> -Eric
>
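The options being weighed in this part of the thread can be written down as a small decision function. This is a hedged sketch rather than kernel code: all `toy_*` names are hypothetical, and the bucket count (64 per AGI) and the all-ones "empty bucket" marker are assumptions made for the model. It compares three policies: always leave a dirty log on freeze (the behavior before the 2018 patch), never leave one (the 2018 patch), and leave one only when some unlinked bucket is non-empty (the idea floated above).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_UNLINKED_BUCKETS	64		/* assumed buckets per AGI */
#define TOY_NULLAGINO		((uint32_t)-1)	/* assumed "empty bucket" value */

/* Toy stand-in for an AGI header; not the real on-disk layout. */
struct toy_agi {
	uint32_t unlinked[TOY_UNLINKED_BUCKETS];
};

enum toy_freeze_policy {
	TOY_ALWAYS_DIRTY,	/* behavior before the 2018 patch */
	TOY_NEVER_DIRTY,	/* behavior with the 2018 patch */
	TOY_DIRTY_IF_UNLINKED,	/* conditional idea from this thread */
};

/* Mark every unlinked bucket in the AGI as empty. */
void toy_agi_init(struct toy_agi *agi)
{
	for (size_t i = 0; i < TOY_UNLINKED_BUCKETS; i++)
		agi->unlinked[i] = TOY_NULLAGINO;
}

/* True if any bucket in any AG still points at an unlinked inode. */
bool toy_fs_has_unlinked(const struct toy_agi *agis, size_t nr_ags)
{
	for (size_t ag = 0; ag < nr_ags; ag++)
		for (size_t i = 0; i < TOY_UNLINKED_BUCKETS; i++)
			if (agis[ag].unlinked[i] != TOY_NULLAGINO)
				return true;
	return false;
}

/* Decide whether a freeze under the given policy leaves the log dirty. */
bool toy_freeze_leaves_dirty_log(enum toy_freeze_policy policy,
				 const struct toy_agi *agis, size_t nr_ags)
{
	switch (policy) {
	case TOY_ALWAYS_DIRTY:
		return true;
	case TOY_NEVER_DIRTY:
		return false;
	case TOY_DIRTY_IF_UNLINKED:
		return toy_fs_has_unlinked(agis, nr_ags);
	}
	return true;	/* unknown policy: fall back to the dirty log */
}
```

Only the third policy makes the result depend on filesystem state at freeze time, so the same freeze command can produce either a clean or a dirty log on the snapshot.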
On 2/23/21 9:03 AM, Gao Xiang wrote:
> On Tue, Feb 23, 2021 at 08:40:56AM -0600, Eric Sandeen wrote:
>> On 2/23/21 7:42 AM, Gao Xiang wrote:
>>> Hi folks,
>>>
>>> On Wed, Mar 28, 2018 at 08:17:28AM +1100, Dave Chinner wrote:
>>>> On Mon, Mar 26, 2018 at 08:46:49AM -0400, Brian Foster wrote:
>>>>> On Sat, Mar 24, 2018 at 09:20:49AM -0700, Darrick J. Wong wrote:
>>>>>> On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
>>>>>>> Now that unlinked inode recovery is done outside of
>>>>>>> log recovery, there is no need to dirty the log on
>>>>>>> snapshots just to handle unlinked inodes. This means
>>>>>>> that readonly snapshots can be mounted without requiring
>>>>>>> -o ro,norecovery to avoid the log replay that can't happen
>>>>>>> on a readonly block device.
>>>>>>>
>>>>>>> (unlinked inodes will just hang out in the agi buckets until
>>>>>>> the next writable mount)
>>>>>>
>>>>>> FWIW I put these two in a test kernel to see what would happen and
>>>>>> generic/311 failures popped up. It looked like the _check_scratch_fs
>>>>>> found incorrect block counts on the snapshot(?)
>>>>>>
>>>>>
>>>>> Interesting. Just a wild guess, but perhaps it has something to do with
>>>>> lazy sb accounting..? I see we call xfs_initialize_perag_data() when
>>>>> mounting an unclean fs.
>>>>
>>>> The freeze calls xfs_log_sbcount() which should update the
>>>> superblock counters from the in-memory counters and write them to
>>>> disk.
>>>>
>>>> If they are out, I'm guessing it's because the in-memory per-ag
>>>> reservations are not being returned to the global pool before the
>>>> in-memory counters are summed during a freeze....
>>>>
>>>> Cheers,
>>>>
>>>> Dave.
>>>> --
>>>> Dave Chinner
>>>> david@fromorbit.com
>>>
>>> I spent some time tracking down this problem. I've made a quick
>>> modification to the per-AG reservation handling and tested it with
>>> generic/311; it seems fine. My current question is how such fsfrozen
>>> images (with a clean mount) work with old kernels without [PATCH 1/1].
>>> I'm afraid orphan inodes won't be freed with such old kernels....
>>> Am I missing something?
>>
>> It's true, a snapshot created with these patches will not have its unlinked
>> inodes processed if mounted on an older kernel. I'm not sure how much of a
>> problem that is; the filesystem is not inconsistent, but some space is lost,
>> I guess. I'm not sure it's common to take a snapshot of a frozen filesystem on
>> one kernel and then move it back to an older kernel. Maybe others have
>> thoughts on this.
>
> My current thought is to write a clean log when freezing only if
> there are no unlinked inodes, but to leave the log dirty if any
> unlinked inodes exist, as Brian mentioned before, and not handle
> that case (?). I'd like to hear more comments about this as well.

I don't know if I had made this comment before ;) but I feel like that's even
more "surprise" (as in: gets further from the principle of least surprise)
and TBH I would rather not have that somewhat unpredictable behavior.

I think I'd rather /always/ make a dirty log than sometimes do it, other
times not. It'd just be more confusion for the admin IMHO.

Thanks,
-Eric

> Thanks,
> Gao Xiang
>
>>
>> -Eric
>>
>
On Tue, Feb 23, 2021 at 09:46:38AM -0600, Eric Sandeen wrote:
>
>
> On 2/23/21 9:03 AM, Gao Xiang wrote:
> > On Tue, Feb 23, 2021 at 08:40:56AM -0600, Eric Sandeen wrote:
> >> On 2/23/21 7:42 AM, Gao Xiang wrote:
> >>> Hi folks,
> >>>
> >>> On Wed, Mar 28, 2018 at 08:17:28AM +1100, Dave Chinner wrote:
> >>>> On Mon, Mar 26, 2018 at 08:46:49AM -0400, Brian Foster wrote:
> >>>>> On Sat, Mar 24, 2018 at 09:20:49AM -0700, Darrick J. Wong wrote:
> >>>>>> On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
> >>>>>>> Now that unlinked inode recovery is done outside of
> >>>>>>> log recovery, there is no need to dirty the log on
> >>>>>>> snapshots just to handle unlinked inodes. This means
> >>>>>>> that readonly snapshots can be mounted without requiring
> >>>>>>> -o ro,norecovery to avoid the log replay that can't happen
> >>>>>>> on a readonly block device.
> >>>>>>>
> >>>>>>> (unlinked inodes will just hang out in the agi buckets until
> >>>>>>> the next writable mount)
> >>>>>>
> >>>>>> FWIW I put these two in a test kernel to see what would happen and
> >>>>>> generic/311 failures popped up. It looked like the _check_scratch_fs
> >>>>>> found incorrect block counts on the snapshot(?)
> >>>>>>
> >>>>>
> >>>>> Interesting. Just a wild guess, but perhaps it has something to do with
> >>>>> lazy sb accounting..? I see we call xfs_initialize_perag_data() when
> >>>>> mounting an unclean fs.
> >>>>
> >>>> The freeze calls xfs_log_sbcount() which should update the
> >>>> superblock counters from the in-memory counters and write them to
> >>>> disk.
> >>>>
> >>>> If they are out, I'm guessing it's because the in-memory per-ag
> >>>> reservations are not being returned to the global pool before the
> >>>> in-memory counters are summed during a freeze....
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Dave.
> >>>> --
> >>>> Dave Chinner
> >>>> david@fromorbit.com
> >>>
> >>> I spent some time tracking down this problem. I've made a quick
> >>> modification to the per-AG reservation handling and tested it with
> >>> generic/311; it seems fine. My current question is how such fsfrozen
> >>> images (with a clean mount) work with old kernels without [PATCH 1/1].
> >>> I'm afraid orphan inodes won't be freed with such old kernels....
> >>> Am I missing something?
> >>
> >> It's true, a snapshot created with these patches will not have its unlinked
> >> inodes processed if mounted on an older kernel. I'm not sure how much of a
> >> problem that is; the filesystem is not inconsistent, but some space is lost,
> >> I guess. I'm not sure it's common to take a snapshot of a frozen filesystem on
> >> one kernel and then move it back to an older kernel. Maybe others have
> >> thoughts on this.
> >
> > My current thought is to write a clean log when freezing only if
> > there are no unlinked inodes, but to leave the log dirty if any
> > unlinked inodes exist, as Brian mentioned before, and not handle
> > that case (?). I'd like to hear more comments about this as well.
>
> I don't know if I had made this comment before ;) but I feel like that's even
> more "surprise" (as in: gets further from the principle of least surprise)
> and TBH I would rather not have that somewhat unpredictable behavior.
>

Yeah, I saw that comment as well....

> I think I'd rather /always/ make a dirty log than sometimes do it, other
> times not. It'd just be more confusion for the admin IMHO.

OK, the other alternative approaches I can think of aren't trivial
(e.g. some hack on log recovery, etc.). Any ideas or thoughts about
this are welcome :)

Thanks,
Gao Xiang

>
> Thanks,
> -Eric
>
> > Thanks,
> > Gao Xiang
> >
> >>
> >> -Eric
> >>
> >
>
On Tue, Feb 23, 2021 at 09:46:38AM -0600, Eric Sandeen wrote:
>
>
> On 2/23/21 9:03 AM, Gao Xiang wrote:
> > On Tue, Feb 23, 2021 at 08:40:56AM -0600, Eric Sandeen wrote:
> >> On 2/23/21 7:42 AM, Gao Xiang wrote:
> >>> Hi folks,
> >>>
> >>> On Wed, Mar 28, 2018 at 08:17:28AM +1100, Dave Chinner wrote:
> >>>> On Mon, Mar 26, 2018 at 08:46:49AM -0400, Brian Foster wrote:
> >>>>> On Sat, Mar 24, 2018 at 09:20:49AM -0700, Darrick J. Wong wrote:
> >>>>>> On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
> >>>>>>> Now that unlinked inode recovery is done outside of
> >>>>>>> log recovery, there is no need to dirty the log on
> >>>>>>> snapshots just to handle unlinked inodes. This means
> >>>>>>> that readonly snapshots can be mounted without requiring
> >>>>>>> -o ro,norecovery to avoid the log replay that can't happen
> >>>>>>> on a readonly block device.
> >>>>>>>
> >>>>>>> (unlinked inodes will just hang out in the agi buckets until
> >>>>>>> the next writable mount)
> >>>>>>
> >>>>>> FWIW I put these two in a test kernel to see what would happen and
> >>>>>> generic/311 failures popped up. It looked like the _check_scratch_fs
> >>>>>> found incorrect block counts on the snapshot(?)
> >>>>>>
> >>>>>
> >>>>> Interesting. Just a wild guess, but perhaps it has something to do with
> >>>>> lazy sb accounting..? I see we call xfs_initialize_perag_data() when
> >>>>> mounting an unclean fs.
> >>>>
> >>>> The freeze calls xfs_log_sbcount() which should update the
> >>>> superblock counters from the in-memory counters and write them to
> >>>> disk.
> >>>>
> >>>> If they are out, I'm guessing it's because the in-memory per-ag
> >>>> reservations are not being returned to the global pool before the
> >>>> in-memory counters are summed during a freeze....
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Dave.
> >>>> --
> >>>> Dave Chinner
> >>>> david@fromorbit.com
> >>>
> >>> I spent some time tracking down this problem. I've made a quick
> >>> modification to the per-AG reservation handling and tested it with
> >>> generic/311; it seems fine. My current question is how such fsfrozen
> >>> images (with a clean mount) work with old kernels without [PATCH 1/1].
> >>> I'm afraid orphan inodes won't be freed with such old kernels....
> >>> Am I missing something?
> >>
> >> It's true, a snapshot created with these patches will not have its unlinked
> >> inodes processed if mounted on an older kernel. I'm not sure how much of a
> >> problem that is; the filesystem is not inconsistent, but some space is lost,
> >> I guess. I'm not sure it's common to take a snapshot of a frozen filesystem on
> >> one kernel and then move it back to an older kernel. Maybe others have
> >> thoughts on this.

Yes, I know of cloudy image generation factories that use old versions
of RHEL to generate images that are then frozen and copied to a
deployment system without an unmount. I don't understand why they
insist that unmount is "too slow" but freeze isn't, nor why they then
file bugs that their instance deploy process is unacceptably slow
because of log recovery.

> > My current thought is to write a clean log when freezing only if
> > there are no unlinked inodes, but to leave the log dirty if any
> > unlinked inodes exist, as Brian mentioned before, and not handle
> > that case (?). I'd like to hear more comments about this as well.
>
> I don't know if I had made this comment before ;) but I feel like that's even
> more "surprise" (as in: gets further from the principle of least surprise)
> and TBH I would rather not have that somewhat unpredictable behavior.
>
> I think I'd rather /always/ make a dirty log than sometimes do it, other
> times not. It'd just be more confusion for the admin IMHO.

...but the next time anyone wants to introduce a new in/rocompat feature
flag for something inode related, then you can disable the "leave a
dirty log on freeze if there are unlinked inodes" behavior.

--D

>
> Thanks,
> -Eric
>
> > Thanks,
> > Gao Xiang
> >
> >>
> >> -Eric
> >>
> >
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 93588ea..5669525 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1419,9 +1419,10 @@ struct proc_xfs_info {

 /*
  * Second stage of a freeze. The data is already frozen so we only
- * need to take care of the metadata. Once that's done sync the superblock
- * to the log to dirty it in case of a crash while frozen. This ensures that we
- * will recover the unlinked inode lists on the next mount.
+ * need to take care of the metadata.
+ * Any unlinked inode lists will remain at this point, and be recovered
+ * on the next writable mount if we crash while frozen, or create
+ * a snapshot from the frozen filesystem.
  */
 STATIC int
 xfs_fs_freeze(
@@ -1431,7 +1432,7 @@ struct proc_xfs_info {

 	xfs_save_resvblks(mp);
 	xfs_quiesce_attr(mp);
-	return xfs_sync_sb(mp, true);
+	return 0;
 }

 STATIC int
Now that unlinked inode recovery is done outside of
log recovery, there is no need to dirty the log on
snapshots just to handle unlinked inodes. This means
that readonly snapshots can be mounted without requiring
-o ro,norecovery to avoid the log replay that can't happen
on a readonly block device.

(unlinked inodes will just hang out in the agi buckets until
the next writable mount)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---