Message ID | 1474589500-13584-1-git-send-email-david@fromorbit.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On 9/22/16 7:11 PM, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > Recently we've had a number of reports where log recovery on a v5 > filesystem has reported corruptions that looked to be caused by > recovery being re-run over the top of an already-recovered > metadata. This has uncovered a bug in recovery (fixed elsewhere) > but the vector that caused this was largely unknown. > > A kdump test started tripping over this problem - the system > would be crashed, the kdump kernel and environment would boot and > dump the kernel core image, and then the system would reboot. After > reboot, the root filesystem was triggering log recovery and > corruptions were being detected. The metadumps indicated the above > log recovery issue. > > What is happening is that the kdump kernel and environment is > mounting the root device read-only to find the binaries needed to do > it's work. The result of this is that it is running log recovery. > However, because there were unlinked files and EFIs to be processed > by recovery, the completion of phase 1 of log recovery could not > mark the log clean. And because it's a read-only mount, the unmount > process does not write records to the log to mark it clean, either. > Hence on the next mount of the filesystem, log recovery was run > again across all the metadata that had already been recovered and > this is what triggered corruption warnings. > > To avoid this problem, we need to ensure that a read-only mount > always updates the log when it completes the second phase of > recovery. We already handle this sort of issue with rw->ro remount > transitions, so the solution is as simple as quiescing the > filesystem at the appropriate time during the mount process. This > results in the log being marked clean so the mount behaviour > recorded in the logs on repeated RO mounts will change (i.e. log > recovery will no longer be run on every mount until a RW mount is > done). This is a user visible change in behaviour, but it is > harmless. Excellent idea :) Reviewed-by: Eric Sandeen <sandeen@redhat.com> > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/xfs_mount.c | 14 ++++++++++++++ > fs/xfs/xfs_super.c | 2 +- > fs/xfs/xfs_super.h | 1 + > 3 files changed, 16 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > index faeead6..56e85a6 100644 > --- a/fs/xfs/xfs_mount.c > +++ b/fs/xfs/xfs_mount.c > @@ -934,6 +934,20 @@ xfs_mountfs( > } > > /* > + * Now the log is fully replayed, we can transition to full read-only > + * mode for read-only mounts. This will sync all the metadata and clean > + * the log so that the recovery we just performed does not have to be > + * replayed again on the next mount. > + * > + * We use the same quiesce mechanism as the rw->ro remount, as they are > + * semantically identical operations. > + */ > + if ((mp->m_flags & (XFS_MOUNT_RDONLY|XFS_MOUNT_NORECOVERY)) == > + XFS_MOUNT_RDONLY) { > + xfs_quiesce_attr(mp); > + } > + > + /* > * Complete the quota initialisation, post-log-replay component. > */ > if (quotamount) { > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index 3409753..2d092f9 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -1137,7 +1137,7 @@ xfs_restore_resvblks(struct xfs_mount *mp) > * Note: xfs_log_quiesce() stops background log work - the callers must ensure > * it is started again when appropriate. > */ > -static void > +void > xfs_quiesce_attr( > struct xfs_mount *mp) > { > diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h > index 529bce9..b6418ab 100644 > --- a/fs/xfs/xfs_super.h > +++ b/fs/xfs/xfs_super.h > @@ -61,6 +61,7 @@ struct xfs_mount; > struct xfs_buftarg; > struct block_device; > > +extern void xfs_quiesce_attr(struct xfs_mount *mp); > extern void xfs_flush_inodes(struct xfs_mount *mp); > extern void xfs_blkdev_issue_flush(struct xfs_buftarg *); > extern xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *, >
On 9/22/16 8:39 PM, Eric Sandeen wrote: > On 9/22/16 7:11 PM, Dave Chinner wrote: >> From: Dave Chinner <dchinner@redhat.com> ... >> To avoid this problem, we need to ensure that a read-only mount >> always updates the log when it completes the second phase of >> recovery. We already handle this sort of issue with rw->ro remount >> transitions, so the solution is as simple as quiescing the >> filesystem at the appropriate time during the mount process. This >> results in the log being marked clean so the mount behaviour >> recorded in the logs on repeated RO mounts will change (i.e. log >> recovery will no longer be run on every mount until a RW mount is >> done). This is a user visible change in behaviour, but it is >> harmless. > > Excellent idea :) > > Reviewed-by: Eric Sandeen <sandeen@redhat.com> Actually ... after playing with this a bit ... Should we restrict this to only when the log got replayed, with a !XFS_LAST_UNMOUNT_WAS_CLEAN(mp) test? Maybe it's harmless as it is, but it seems we should restrict it to the log-got-replayed case. -Eric >> Signed-off-by: Dave Chinner <dchinner@redhat.com> >> --- >> fs/xfs/xfs_mount.c | 14 ++++++++++++++ >> fs/xfs/xfs_super.c | 2 +- >> fs/xfs/xfs_super.h | 1 + >> 3 files changed, 16 insertions(+), 1 deletion(-) >> >> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c >> index faeead6..56e85a6 100644 >> --- a/fs/xfs/xfs_mount.c >> +++ b/fs/xfs/xfs_mount.c >> @@ -934,6 +934,20 @@ xfs_mountfs( >> } >> >> /* >> + * Now the log is fully replayed, we can transition to full read-only >> + * mode for read-only mounts. This will sync all the metadata and clean >> + * the log so that the recovery we just performed does not have to be >> + * replayed again on the next mount. >> + * >> + * We use the same quiesce mechanism as the rw->ro remount, as they are >> + * semantically identical operations. >> + */ >> + if ((mp->m_flags & (XFS_MOUNT_RDONLY|XFS_MOUNT_NORECOVERY)) == >> + XFS_MOUNT_RDONLY) { >> + xfs_quiesce_attr(mp); >> + } >> + >> + /* >> * Complete the quota initialisation, post-log-replay component. >> */ >> if (quotamount) { >> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c >> index 3409753..2d092f9 100644 >> --- a/fs/xfs/xfs_super.c >> +++ b/fs/xfs/xfs_super.c >> @@ -1137,7 +1137,7 @@ xfs_restore_resvblks(struct xfs_mount *mp) >> * Note: xfs_log_quiesce() stops background log work - the callers must ensure >> * it is started again when appropriate. >> */ >> -static void >> +void >> xfs_quiesce_attr( >> struct xfs_mount *mp) >> { >> diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h >> index 529bce9..b6418ab 100644 >> --- a/fs/xfs/xfs_super.h >> +++ b/fs/xfs/xfs_super.h >> @@ -61,6 +61,7 @@ struct xfs_mount; >> struct xfs_buftarg; >> struct block_device; >> >> +extern void xfs_quiesce_attr(struct xfs_mount *mp); >> extern void xfs_flush_inodes(struct xfs_mount *mp); >> extern void xfs_blkdev_issue_flush(struct xfs_buftarg *); >> extern xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *, >> > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs >
On Fri, Sep 23, 2016 at 11:24:53AM -0500, Eric Sandeen wrote: > On 9/22/16 8:39 PM, Eric Sandeen wrote: > > On 9/22/16 7:11 PM, Dave Chinner wrote: > >> From: Dave Chinner <dchinner@redhat.com> > > ... > > >> To avoid this problem, we need to ensure that a read-only mount > >> always updates the log when it completes the second phase of > >> recovery. We already handle this sort of issue with rw->ro remount > >> transitions, so the solution is as simple as quiescing the > >> filesystem at the appropriate time during the mount process. This > >> results in the log being marked clean so the mount behaviour > >> recorded in the logs on repeated RO mounts will change (i.e. log > >> recovery will no longer be run on every mount until a RW mount is > >> done). This is a user visible change in behaviour, but it is > >> harmless. > > > > Excellent idea :) > > > > Reviewed-by: Eric Sandeen <sandeen@redhat.com> > > Actually ... after playing with this a bit ... > > Should we restrict this to only when the log got > replayed, with a !XFS_LAST_UNMOUNT_WAS_CLEAN(mp) test? > > Maybe it's harmless as it is, but it seems we should restrict > it to the log-got-replayed case. Well, writing another unmount record should be harmless. The thing is that the quiesce call also serves to shut down periodic log functions that are started when the second phase of recovery is run, These are started regardless of whether the mount was clean or not. Hence we are currently running periodic log work on ro mounts and tryin gto cover/sync the log every so often. For rw mounts this is needed, but for ro mounts once we know everything is clean we can shut them down, jus tlike we do for a rw->ro remount. So I think that just treating the ro mount like a rw mount followed by a rw->ro transition after potential mount time modifications are complete is the best approach for consistency of RO behaviour.... Cheers, Dave.
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index faeead6..56e85a6 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -934,6 +934,20 @@ xfs_mountfs( } /* + * Now the log is fully replayed, we can transition to full read-only + * mode for read-only mounts. This will sync all the metadata and clean + * the log so that the recovery we just performed does not have to be + * replayed again on the next mount. + * + * We use the same quiesce mechanism as the rw->ro remount, as they are + * semantically identical operations. + */ + if ((mp->m_flags & (XFS_MOUNT_RDONLY|XFS_MOUNT_NORECOVERY)) == + XFS_MOUNT_RDONLY) { + xfs_quiesce_attr(mp); + } + + /* * Complete the quota initialisation, post-log-replay component. */ if (quotamount) { diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 3409753..2d092f9 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1137,7 +1137,7 @@ xfs_restore_resvblks(struct xfs_mount *mp) * Note: xfs_log_quiesce() stops background log work - the callers must ensure * it is started again when appropriate. */ -static void +void xfs_quiesce_attr( struct xfs_mount *mp) { diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h index 529bce9..b6418ab 100644 --- a/fs/xfs/xfs_super.h +++ b/fs/xfs/xfs_super.h @@ -61,6 +61,7 @@ struct xfs_mount; struct xfs_buftarg; struct block_device; +extern void xfs_quiesce_attr(struct xfs_mount *mp); extern void xfs_flush_inodes(struct xfs_mount *mp); extern void xfs_blkdev_issue_flush(struct xfs_buftarg *); extern xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *,