[2/2] xfs: don't dirty snapshot logs for unlinked inode recovery

Message ID 896a0202-aac8-e43f-7ea6-3718591e32aa@sandeen.net
State New

Commit Message

Eric Sandeen March 7, 2018, 11:33 p.m. UTC
Now that unlinked inode recovery is done outside of
log recovery, there is no need to dirty the log on
snapshots just to handle unlinked inodes.  This means
that readonly snapshots can be mounted without requiring
-o ro,norecovery to avoid the log replay that can't happen
on a readonly block device.

(unlinked inodes will just hang out in the agi buckets until
the next writable mount)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
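
Illustration only, not part of the patch: a toy userspace model of the
behaviour described above, where unlinked inodes sit on per-bucket chains
and are only processed by a writable mount. Every name here (toy_agi,
toy_iunlink, toy_mount, MAX_INODES) is hypothetical; only the bucket count
of 64 and the "hash by inode number modulo bucket count" idea mirror the
on-disk AGI layout. The real recovery code does far more work than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UNLINKED_BUCKETS 64          /* same count as XFS_AGI_UNLINKED_BUCKETS */
#define NULL_INO         UINT32_MAX  /* stand-in for the NULLAGINO terminator */
#define MAX_INODES       256         /* arbitrary toy limit (assumption) */

struct toy_agi {
	uint32_t unlinked[UNLINKED_BUCKETS]; /* head of each bucket chain */
	uint32_t next[MAX_INODES];           /* per-inode "next unlinked" link */
};

void toy_agi_init(struct toy_agi *agi)
{
	for (int i = 0; i < UNLINKED_BUCKETS; i++)
		agi->unlinked[i] = NULL_INO;
	for (int i = 0; i < MAX_INODES; i++)
		agi->next[i] = NULL_INO;
}

/* Park an unlinked-but-still-referenced inode at the head of its bucket. */
void toy_iunlink(struct toy_agi *agi, uint32_t ino)
{
	assert(ino < MAX_INODES);
	uint32_t b = ino % UNLINKED_BUCKETS;

	agi->next[ino] = agi->unlinked[b];
	agi->unlinked[b] = ino;
}

/* Count the inodes currently parked on any bucket chain. */
int toy_count_unlinked(const struct toy_agi *agi)
{
	int n = 0;

	for (int b = 0; b < UNLINKED_BUCKETS; b++)
		for (uint32_t ino = agi->unlinked[b]; ino != NULL_INO;
		     ino = agi->next[ino])
			n++;
	return n;
}

/*
 * Mount-time pass. A read-only mount leaves the chains untouched; a
 * writable mount drains every bucket. Returns the number of inodes freed.
 */
int toy_mount(struct toy_agi *agi, bool readonly)
{
	int freed = 0;

	if (readonly)
		return 0;

	for (int b = 0; b < UNLINKED_BUCKETS; b++) {
		while (agi->unlinked[b] != NULL_INO) {
			uint32_t ino = agi->unlinked[b];

			agi->unlinked[b] = agi->next[ino];
			agi->next[ino] = NULL_INO;
			freed++;
		}
	}
	return freed;
}
```

In this model, calling toy_mount(&agi, true) on a "snapshot" changes
nothing, and the parked inodes are only released by a later
toy_mount(&agi, false).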


--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Darrick J. Wong March 24, 2018, 4:20 p.m. UTC | #1
On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
> Now that unlinked inode recovery is done outside of
> log recovery, there is no need to dirty the log on
> snapshots just to handle unlinked inodes.  This means
> that readonly snapshots can be mounted without requiring
> -o ro,norecovery to avoid the log replay that can't happen
> on a readonly block device.
> 
> (unlinked inodes will just hang out in the agi buckets until
> the next writable mount)

FWIW I put these two in a test kernel to see what would happen and
generic/311 failures popped up.  It looked like the _check_scratch_fs
found incorrect block counts on the snapshot(?)

--D

> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 93588ea..5669525 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1419,9 +1419,10 @@ struct proc_xfs_info {
>  
>  /*
>   * Second stage of a freeze. The data is already frozen so we only
> - * need to take care of the metadata. Once that's done sync the superblock
> - * to the log to dirty it in case of a crash while frozen. This ensures that we
> - * will recover the unlinked inode lists on the next mount.
> + * need to take care of the metadata.
> + * Any unlinked inode lists will remain at this point, and be recovered
> + * on the next writable mount if we crash while frozen, or create
> + * a snapshot from the frozen filesystem.
>   */
>  STATIC int
>  xfs_fs_freeze(
> @@ -1431,7 +1432,7 @@ struct proc_xfs_info {
>  
>  	xfs_save_resvblks(mp);
>  	xfs_quiesce_attr(mp);
> -	return xfs_sync_sb(mp, true);
> +	return 0;
>  }
>  
>  STATIC int
> 
Brian Foster March 26, 2018, 12:46 p.m. UTC | #2
On Sat, Mar 24, 2018 at 09:20:49AM -0700, Darrick J. Wong wrote:
> On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
> > Now that unlinked inode recovery is done outside of
> > log recovery, there is no need to dirty the log on
> > snapshots just to handle unlinked inodes.  This means
> > that readonly snapshots can be mounted without requiring
> > -o ro,norecovery to avoid the log replay that can't happen
> > on a readonly block device.
> > 
> > (unlinked inodes will just hang out in the agi buckets until
> > the next writable mount)
> 
> FWIW I put these two in a test kernel to see what would happen and
> generic/311 failures popped up.  It looked like the _check_scratch_fs
> found incorrect block counts on the snapshot(?)
> 

Interesting. Just a wild guess, but perhaps it has something to do with
lazy sb accounting..? I see we call xfs_initialize_perag_data() when
mounting an unclean fs.
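
To make that guess concrete, here is a toy userspace sketch (not kernel
code) of the general idea: after an unclean shutdown the superblock free
block counter is rebuilt from per-AG values, while a clean log means the
on-disk counter is trusted as written. The struct and function names are
hypothetical, and the single per-AG "freeblks" field is a simplification
of whatever the real per-AG summation uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ag {
	uint64_t freeblks;   /* free blocks recorded in this AG's header */
};

struct toy_sb {
	uint64_t fdblocks;   /* free block count stored in the superblock */
};

/* Sum the per-AG free block counts. */
uint64_t toy_sum_ag_free(const struct toy_ag *ags, size_t nags)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nags; i++)
		total += ags[i].freeblks;
	return total;
}

/*
 * Mount-time counter setup. Only a dirty (unclean) log triggers the
 * rebuild from the AG headers; with a clean log a stale superblock
 * value is never corrected.
 */
void toy_mount_counters(struct toy_sb *sb, const struct toy_ag *ags,
			size_t nags, bool log_was_dirty)
{
	if (log_was_dirty)
		sb->fdblocks = toy_sum_ag_free(ags, nags);
}
```

If that is what is going on, a snapshot that now mounts with a clean log
would skip the rebuild, so any error already in the superblock counter
would show up in a later check.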

Brian

> --D
> 
> > Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > ---
> > 
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 93588ea..5669525 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1419,9 +1419,10 @@ struct proc_xfs_info {
> >  
> >  /*
> >   * Second stage of a freeze. The data is already frozen so we only
> > - * need to take care of the metadata. Once that's done sync the superblock
> > - * to the log to dirty it in case of a crash while frozen. This ensures that we
> > - * will recover the unlinked inode lists on the next mount.
> > + * need to take care of the metadata.
> > + * Any unlinked inode lists will remain at this point, and be recovered
> > + * on the next writable mount if we crash while frozen, or create
> > + * a snapshot from the frozen filesystem.
> >   */
> >  STATIC int
> >  xfs_fs_freeze(
> > @@ -1431,7 +1432,7 @@ struct proc_xfs_info {
> >  
> >  	xfs_save_resvblks(mp);
> >  	xfs_quiesce_attr(mp);
> > -	return xfs_sync_sb(mp, true);
> > +	return 0;
> >  }
> >  
> >  STATIC int
> > 
Dave Chinner March 27, 2018, 9:17 p.m. UTC | #3
On Mon, Mar 26, 2018 at 08:46:49AM -0400, Brian Foster wrote:
> On Sat, Mar 24, 2018 at 09:20:49AM -0700, Darrick J. Wong wrote:
> > On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
> > > Now that unlinked inode recovery is done outside of
> > > log recovery, there is no need to dirty the log on
> > > snapshots just to handle unlinked inodes.  This means
> > > that readonly snapshots can be mounted without requiring
> > > -o ro,norecovery to avoid the log replay that can't happen
> > > on a readonly block device.
> > > 
> > > (unlinked inodes will just hang out in the agi buckets until
> > > the next writable mount)
> > 
> > FWIW I put these two in a test kernel to see what would happen and
> > generic/311 failures popped up.  It looked like the _check_scratch_fs
> > found incorrect block counts on the snapshot(?)
> > 
> 
> Interesting. Just a wild guess, but perhaps it has something to do with
> lazy sb accounting..? I see we call xfs_initialize_perag_data() when
> mounting an unclean fs.

The freeze calls xfs_log_sbcount(), which should update the
superblock counters from the in-memory counters and write them to
disk.

If they are out, I'm guessing it's because the in-memory per-ag
reservations are not being returned to the global pool before the
in-memory counters are summed during a freeze....
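
A toy userspace sketch of that suspicion (not kernel code, all names
hypothetical): blocks set aside for per-AG reservations are subtracted
from the in-memory free counter, so writing that counter to disk before
the reservations are handed back records a free block count that is too
low by the total reserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NAGS 4   /* arbitrary number of AGs for the model */

struct toy_mount {
	uint64_t fdblocks;           /* in-memory free block counter */
	uint64_t ag_resv[TOY_NAGS];  /* blocks currently reserved per AG */
	uint64_t sb_fdblocks;        /* value last written to the superblock */
};

/* Set aside n blocks for one AG, taking them out of the global pool. */
void toy_resv_take(struct toy_mount *mp, int ag, uint64_t n)
{
	assert(ag >= 0 && ag < TOY_NAGS);
	assert(n <= mp->fdblocks);
	mp->fdblocks -= n;
	mp->ag_resv[ag] += n;
}

/* Hand every per-AG reservation back to the global pool. */
void toy_resv_release_all(struct toy_mount *mp)
{
	for (int ag = 0; ag < TOY_NAGS; ag++) {
		mp->fdblocks += mp->ag_resv[ag];
		mp->ag_resv[ag] = 0;
	}
}

/*
 * Freeze: write the in-memory counter to the "superblock". The flag
 * selects whether reservations are returned before the counter is read.
 */
void toy_freeze(struct toy_mount *mp, bool release_first)
{
	if (release_first)
		toy_resv_release_all(mp);
	mp->sb_fdblocks = mp->fdblocks;
}
```

With 1000 free blocks and 50 held in reservations, toy_freeze(&mp, false)
records 950 on disk, while toy_freeze(&mp, true) records 1000.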

Cheers,

Dave.

Patch

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 93588ea..5669525 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1419,9 +1419,10 @@  struct proc_xfs_info {
 
 /*
  * Second stage of a freeze. The data is already frozen so we only
- * need to take care of the metadata. Once that's done sync the superblock
- * to the log to dirty it in case of a crash while frozen. This ensures that we
- * will recover the unlinked inode lists on the next mount.
+ * need to take care of the metadata.
+ * Any unlinked inode lists will remain at this point, and be recovered
+ * on the next writable mount if we crash while frozen, or create
+ * a snapshot from the frozen filesystem.
  */
 STATIC int
 xfs_fs_freeze(
@@ -1431,7 +1432,7 @@  struct proc_xfs_info {
 
 	xfs_save_resvblks(mp);
 	xfs_quiesce_attr(mp);
-	return xfs_sync_sb(mp, true);
+	return 0;
 }
 
 STATIC int