diff mbox series

[01/12] xfs: free COW staging extents when freezing filesystem

Message ID 154630901708.16693.4337908735292154622.stgit@magnolia (mailing list archive)
State New, archived
Headers show
Series xfs: deferred inode inactivation | expand

Commit Message

Darrick J. Wong Jan. 1, 2019, 2:16 a.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

When we're freezing the filesystem, free all the COW staging extents
before we shut the log down so that we can minimize the amount of
recovery work that will be necessary during the next mount.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_icache.c |   17 ++++++++++++-----
 fs/xfs/xfs_super.c  |   18 ++++++++++++++++++
 2 files changed, 30 insertions(+), 5 deletions(-)

Comments

Brian Foster Jan. 11, 2019, 4:28 p.m. UTC | #1
On Mon, Dec 31, 2018 at 06:16:57PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we're freezing the filesystem, free all the COW staging extents
> before we shut the log down so that we can minimize the amount of
> recovery work that will be necessary during the next mount.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_icache.c |   17 ++++++++++++-----
>  fs/xfs/xfs_super.c  |   18 ++++++++++++++++++
>  2 files changed, 30 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 245483cc282b..36d986087abb 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1741,6 +1741,7 @@ xfs_inode_free_cowblocks(
>  	void			*args)
>  {
>  	struct xfs_eofblocks	*eofb = args;
> +	uint			lock_mode = 0;
>  	int			match;
>  	int			ret = 0;
>  
> @@ -1761,9 +1762,15 @@ xfs_inode_free_cowblocks(
>  			return 0;
>  	}
>  
> -	/* Free the CoW blocks */
> -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> +	/*
> +	 * Free the CoW blocks.  We don't need to lock the inode if we're in
> +	 * the process of freezing the filesystem because we've already locked
> +	 * out writes and page faults.
> +	 */
> +	if (ip->i_mount->m_super->s_writers.frozen == SB_UNFROZEN)
> +		lock_mode = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;

Ok, but is there a problem with actually locking in this context? If so,
I'd expect the comment to say something like: "Free the COW blocks.
Don't lock the inode if we're in the process of freezing the filesystem
because <some bad thing happens>. This is safe because we've already
locked out writes and page faults."

> +	if (lock_mode)
> +		xfs_ilock(ip, lock_mode);
>  
>  	/*
>  	 * Check again, nobody else should be able to dirty blocks or change
> @@ -1772,8 +1779,8 @@ xfs_inode_free_cowblocks(
>  	if (xfs_prep_free_cowblocks(ip))
>  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
>  
> -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> -	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> +	if (lock_mode)
> +		xfs_iunlock(ip, lock_mode);
>  
>  	return ret;
>  }
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 6903b36eac5d..bb4953cfd683 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1102,6 +1102,24 @@ xfs_fs_sync_fs(
>  	if (!wait)
>  		return 0;
>  
> +	/*
> +	 * If we're in the middle of freezing the filesystem, this is our last
> +	 * chance to run regular transactions.  Clear all the COW staging
> +	 * extents so that we can freeze the filesystem with as little recovery
> +	 * work to do at the next mount as possible.  It's safe to do this
> +	 * without locking inodes because the freezer code flushed all the
> +	 * dirty data from the page cache and locked out writes and page faults.
> +	 */

Well, we have XFS_TRANS_NO_WRITECOUNT. It seems we could use an eofb
freeze flag or some such to notify the scan that we're in a special
freeze context and in turn use that to tweak the locking (instead of the
SB_FREEZE_* checks) and avoid blocking on transaction allocation. Would
that allow us to run this in xfs_fs_freeze() context?

I'm also a little curious about the high-level tradeoff we're making.
This patch means that a freeze, and thus something like an LVM snapshot,
would clear/reset all COW blocks in the fs, right? If so, ISTM that
could be annoying if the user had a bunch of reflinked vdisk images or
something on the fs.

Is the tradeoff just that if the user freezes and then doesn't unfreeze
before the next mount that log recovery will have to deal with COW
blocks, or are there considerations for a typical freeze/unfreeze cycle
as well? Is this more expensive at recovery time than at freeze time?

Brian

> +	if (sb->s_writers.frozen == SB_FREEZE_PAGEFAULT) {
> +		int		error;
> +
> +		error = xfs_icache_free_cowblocks(mp, NULL);
> +		if (error) {
> +			xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> +			return error;
> +		}
> +	}
> +
>  	xfs_log_force(mp, XFS_LOG_SYNC);
>  	if (laptop_mode) {
>  		/*
>
Darrick J. Wong Jan. 17, 2019, 5:24 p.m. UTC | #2
On Fri, Jan 11, 2019 at 11:28:38AM -0500, Brian Foster wrote:
> On Mon, Dec 31, 2018 at 06:16:57PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When we're freezing the filesystem, free all the COW staging extents
> > before we shut the log down so that we can minimize the amount of
> > recovery work that will be necessary during the next mount.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_icache.c |   17 ++++++++++++-----
> >  fs/xfs/xfs_super.c  |   18 ++++++++++++++++++
> >  2 files changed, 30 insertions(+), 5 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 245483cc282b..36d986087abb 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -1741,6 +1741,7 @@ xfs_inode_free_cowblocks(
> >  	void			*args)
> >  {
> >  	struct xfs_eofblocks	*eofb = args;
> > +	uint			lock_mode = 0;
> >  	int			match;
> >  	int			ret = 0;
> >  
> > @@ -1761,9 +1762,15 @@ xfs_inode_free_cowblocks(
> >  			return 0;
> >  	}
> >  
> > -	/* Free the CoW blocks */
> > -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> > +	/*
> > +	 * Free the CoW blocks.  We don't need to lock the inode if we're in
> > +	 * the process of freezing the filesystem because we've already locked
> > +	 * out writes and page faults.
> > +	 */
> > +	if (ip->i_mount->m_super->s_writers.frozen == SB_UNFROZEN)
> > +		lock_mode = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> 
> Ok, but is there a problem with actually locking in this context? If so,
> I'd expect the comment to say something like: "Free the COW blocks.
> Don't lock the inode if we're in the process of freezing the filesystem
> because <some bad thing happens>. This is safe because we've already
> locked out writes and page faults."

Hmmm, it's now been so long since I wrote this patch I don't remember
why I did this.  Wait something's coming back...

> > +	if (lock_mode)
> > +		xfs_ilock(ip, lock_mode);
> >  
> >  	/*
> >  	 * Check again, nobody else should be able to dirty blocks or change
> > @@ -1772,8 +1779,8 @@ xfs_inode_free_cowblocks(
> >  	if (xfs_prep_free_cowblocks(ip))
> >  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
> >  
> > -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > -	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > +	if (lock_mode)
> > +		xfs_iunlock(ip, lock_mode);
> >  
> >  	return ret;
> >  }
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 6903b36eac5d..bb4953cfd683 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1102,6 +1102,24 @@ xfs_fs_sync_fs(
> >  	if (!wait)
> >  		return 0;
> >  
> > +	/*
> > +	 * If we're in the middle of freezing the filesystem, this is our last
> > +	 * chance to run regular transactions.  Clear all the COW staging
> > +	 * extents so that we can freeze the filesystem with as little recovery
> > +	 * work to do at the next mount as possible.  It's safe to do this
> > +	 * without locking inodes because the freezer code flushed all the
> > +	 * dirty data from the page cache and locked out writes and page faults.
> > +	 */
> 
> Well, we have XFS_TRANS_NO_WRITECOUNT. It seems we could use an eofb
> freeze flag or some such to notify the scan that we're in a special
> freeze context and in turn use that to tweak the locking (instead of the
> SB_FREEZE_* checks) and avoid blocking on transaction allocation. Would
> that allow us to run this in xfs_fs_freeze() context?

...oh, right.  Another thread could have taken the io/mmap locks (say
for fallocate) before the fs entered SB_FROZEN, and is now blocked
in xfs_trans_alloc waiting for the fs to unfreeze.  I can't remember the
exact circumstances though.

> I'm also a little curious about the high-level tradeoff we're making.
> This patch means that a freeze, and thus something like an LVM snapshot,
> would clear/reset all COW blocks in the fs, right? If so, ISTM that
> could be annoying if the user had a bunch of reflinked vdisk images or
> something on the fs.

Yeah, it's sort of painful to drop all those speculative preallocations.

> Is the tradeoff just that if the user freezes and then doesn't unfreeze
> before the next mount that log recovery will have to deal with COW
> blocks, or are there considerations for a typical freeze/unfreeze cycle
> as well? Is this more expensive at recovery time than at freeze time?

It's entirely to reduce the recovery work necessary after freezing the
fs, snapshotting the fs (via lvm or whatever) and then mounting the
snapshot separately.  TBH this patch has languished for years now
because it's only necessary to optimize this one usage.

Maybe it's easier to drop this one, unless anyone feels particularly
passionate about clearing out cow staging extents before a freeze?

--D

> Brian
> 
> > +	if (sb->s_writers.frozen == SB_FREEZE_PAGEFAULT) {
> > +		int		error;
> > +
> > +		error = xfs_icache_free_cowblocks(mp, NULL);
> > +		if (error) {
> > +			xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > +			return error;
> > +		}
> > +	}
> > +
> >  	xfs_log_force(mp, XFS_LOG_SYNC);
> >  	if (laptop_mode) {
> >  		/*
> >
Brian Foster Jan. 17, 2019, 6:14 p.m. UTC | #3
On Thu, Jan 17, 2019 at 09:24:28AM -0800, Darrick J. Wong wrote:
> On Fri, Jan 11, 2019 at 11:28:38AM -0500, Brian Foster wrote:
> > On Mon, Dec 31, 2018 at 06:16:57PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > When we're freezing the filesystem, free all the COW staging extents
> > > before we shut the log down so that we can minimize the amount of
> > > recovery work that will be necessary during the next mount.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/xfs_icache.c |   17 ++++++++++++-----
> > >  fs/xfs/xfs_super.c  |   18 ++++++++++++++++++
> > >  2 files changed, 30 insertions(+), 5 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > index 245483cc282b..36d986087abb 100644
> > > --- a/fs/xfs/xfs_icache.c
> > > +++ b/fs/xfs/xfs_icache.c
> > > @@ -1741,6 +1741,7 @@ xfs_inode_free_cowblocks(
> > >  	void			*args)
> > >  {
> > >  	struct xfs_eofblocks	*eofb = args;
> > > +	uint			lock_mode = 0;
> > >  	int			match;
> > >  	int			ret = 0;
> > >  
> > > @@ -1761,9 +1762,15 @@ xfs_inode_free_cowblocks(
> > >  			return 0;
> > >  	}
> > >  
> > > -	/* Free the CoW blocks */
> > > -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> > > +	/*
> > > +	 * Free the CoW blocks.  We don't need to lock the inode if we're in
> > > +	 * the process of freezing the filesystem because we've already locked
> > > +	 * out writes and page faults.
> > > +	 */
> > > +	if (ip->i_mount->m_super->s_writers.frozen == SB_UNFROZEN)
> > > +		lock_mode = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> > 
> > Ok, but is there a problem with actually locking in this context? If so,
> > I'd expect the comment to say something like: "Free the COW blocks.
> > Don't lock the inode if we're in the process of freezing the filesystem
> > because <some bad thing happens>. This is safe because we've already
> > locked out writes and page faults."
> 
> Hmmm, it's now been so long since I wrote this patch I don't remember
> why I did this.  Wait something's coming back...
> 
> > > +	if (lock_mode)
> > > +		xfs_ilock(ip, lock_mode);
> > >  
> > >  	/*
> > >  	 * Check again, nobody else should be able to dirty blocks or change
> > > @@ -1772,8 +1779,8 @@ xfs_inode_free_cowblocks(
> > >  	if (xfs_prep_free_cowblocks(ip))
> > >  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
> > >  
> > > -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > > -	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > > +	if (lock_mode)
> > > +		xfs_iunlock(ip, lock_mode);
> > >  
> > >  	return ret;
> > >  }
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index 6903b36eac5d..bb4953cfd683 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -1102,6 +1102,24 @@ xfs_fs_sync_fs(
> > >  	if (!wait)
> > >  		return 0;
> > >  
> > > +	/*
> > > +	 * If we're in the middle of freezing the filesystem, this is our last
> > > +	 * chance to run regular transactions.  Clear all the COW staging
> > > +	 * extents so that we can freeze the filesystem with as little recovery
> > > +	 * work to do at the next mount as possible.  It's safe to do this
> > > +	 * without locking inodes because the freezer code flushed all the
> > > +	 * dirty data from the page cache and locked out writes and page faults.
> > > +	 */
> > 
> > Well, we have XFS_TRANS_NO_WRITECOUNT. It seems we could use an eofb
> > freeze flag or some such to notify the scan that we're in a special
> > freeze context and in turn use that to tweak the locking (instead of the
> > SB_FREEZE_* checks) and avoid blocking on transaction allocation. Would
> > that allow us to run this in xfs_fs_freeze() context?
> 
> ...oh, right.  Another thread could have taken the io/mmap locks (say
> for fallocate) before the fs entered SB_FROZEN, and is now blocked
> in xfs_trans_alloc waiting for the fs to unfreeze.  I can't remember the
> exact circumstances though.
> 

Ok, so we'd basically deadlock on the freeze IIUC. That's exactly what
we'd want to document in the comment. :P

> > I'm also a little curious about the high-level tradeoff we're making.
> > This patch means that a freeze, and thus something like an LVM snapshot,
> > would clear/reset all COW blocks in the fs, right? If so, ISTM that
> > could be annoying if the user had a bunch of reflinked vdisk images or
> > something on the fs.
> 
> Yeah, it's sort of painful to drop all those speculative preallocations.
> 
> > Is the tradeoff just that if the user freezes and then doesn't unfreeze
> > before the next mount that log recovery will have to deal with COW
> > blocks, or are there considerations for a typical freeze/unfreeze cycle
> > as well? Is this more expensive at recovery time than at freeze time?
> 
> It's entirely to reduce the recovery work necessary after freezing the
> fs, snapshotting the fs (via lvm or whatever) and then mounting the
> snapshot separately.  TBH this patch has languished for years now
> because it's only necessary to optimize this one usage.
> 

So the snapshot will run log recovery and end up freeing all of these
extents "the hard way," right?

> Maybe it's easier to drop this one, unless anyone feels particularly
> passionate about clearing out cow staging extents before a freeze?
> 

Not really.

On principle, it seems more appropriate to me to put the work onto the
snapshot rather than disrupt the primary fs. I could see doing something
like this without much thought if the blocks were already freed or
something and the change was thus invisible to a user, but ISTM that
losing the preallocation can have a significant affect on the primary
workload.

Then again, this doesn't affect me much and I'm not terribly familiar
with the use case or whether there have been complaints or anything. I
don't feel strongly about it so long as the tradeoffs are documented
somewhere so it's clear what needs fixing if this behavior became
problematic down the road.

Brian

> --D
> 
> > Brian
> > 
> > > +	if (sb->s_writers.frozen == SB_FREEZE_PAGEFAULT) {
> > > +		int		error;
> > > +
> > > +		error = xfs_icache_free_cowblocks(mp, NULL);
> > > +		if (error) {
> > > +			xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > +			return error;
> > > +		}
> > > +	}
> > > +
> > >  	xfs_log_force(mp, XFS_LOG_SYNC);
> > >  	if (laptop_mode) {
> > >  		/*
> > >
Darrick J. Wong Jan. 17, 2019, 8:20 p.m. UTC | #4
On Thu, Jan 17, 2019 at 01:14:06PM -0500, Brian Foster wrote:
> On Thu, Jan 17, 2019 at 09:24:28AM -0800, Darrick J. Wong wrote:
> > On Fri, Jan 11, 2019 at 11:28:38AM -0500, Brian Foster wrote:
> > > On Mon, Dec 31, 2018 at 06:16:57PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > When we're freezing the filesystem, free all the COW staging extents
> > > > before we shut the log down so that we can minimize the amount of
> > > > recovery work that will be necessary during the next mount.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/xfs_icache.c |   17 ++++++++++++-----
> > > >  fs/xfs/xfs_super.c  |   18 ++++++++++++++++++
> > > >  2 files changed, 30 insertions(+), 5 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > > index 245483cc282b..36d986087abb 100644
> > > > --- a/fs/xfs/xfs_icache.c
> > > > +++ b/fs/xfs/xfs_icache.c
> > > > @@ -1741,6 +1741,7 @@ xfs_inode_free_cowblocks(
> > > >  	void			*args)
> > > >  {
> > > >  	struct xfs_eofblocks	*eofb = args;
> > > > +	uint			lock_mode = 0;
> > > >  	int			match;
> > > >  	int			ret = 0;
> > > >  
> > > > @@ -1761,9 +1762,15 @@ xfs_inode_free_cowblocks(
> > > >  			return 0;
> > > >  	}
> > > >  
> > > > -	/* Free the CoW blocks */
> > > > -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > > > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> > > > +	/*
> > > > +	 * Free the CoW blocks.  We don't need to lock the inode if we're in
> > > > +	 * the process of freezing the filesystem because we've already locked
> > > > +	 * out writes and page faults.
> > > > +	 */
> > > > +	if (ip->i_mount->m_super->s_writers.frozen == SB_UNFROZEN)
> > > > +		lock_mode = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> > > 
> > > Ok, but is there a problem with actually locking in this context? If so,
> > > I'd expect the comment to say something like: "Free the COW blocks.
> > > Don't lock the inode if we're in the process of freezing the filesystem
> > > because <some bad thing happens>. This is safe because we've already
> > > locked out writes and page faults."
> > 
> > Hmmm, it's now been so long since I wrote this patch I don't remember
> > why I did this.  Wait something's coming back...
> > 
> > > > +	if (lock_mode)
> > > > +		xfs_ilock(ip, lock_mode);
> > > >  
> > > >  	/*
> > > >  	 * Check again, nobody else should be able to dirty blocks or change
> > > > @@ -1772,8 +1779,8 @@ xfs_inode_free_cowblocks(
> > > >  	if (xfs_prep_free_cowblocks(ip))
> > > >  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
> > > >  
> > > > -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > > > -	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > > > +	if (lock_mode)
> > > > +		xfs_iunlock(ip, lock_mode);
> > > >  
> > > >  	return ret;
> > > >  }
> > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > index 6903b36eac5d..bb4953cfd683 100644
> > > > --- a/fs/xfs/xfs_super.c
> > > > +++ b/fs/xfs/xfs_super.c
> > > > @@ -1102,6 +1102,24 @@ xfs_fs_sync_fs(
> > > >  	if (!wait)
> > > >  		return 0;
> > > >  
> > > > +	/*
> > > > +	 * If we're in the middle of freezing the filesystem, this is our last
> > > > +	 * chance to run regular transactions.  Clear all the COW staging
> > > > +	 * extents so that we can freeze the filesystem with as little recovery
> > > > +	 * work to do at the next mount as possible.  It's safe to do this
> > > > +	 * without locking inodes because the freezer code flushed all the
> > > > +	 * dirty data from the page cache and locked out writes and page faults.
> > > > +	 */
> > > 
> > > Well, we have XFS_TRANS_NO_WRITECOUNT. It seems we could use an eofb
> > > freeze flag or some such to notify the scan that we're in a special
> > > freeze context and in turn use that to tweak the locking (instead of the
> > > SB_FREEZE_* checks) and avoid blocking on transaction allocation. Would
> > > that allow us to run this in xfs_fs_freeze() context?
> > 
> > ...oh, right.  Another thread could have taken the io/mmap locks (say
> > for fallocate) before the fs entered SB_FROZEN, and is now blocked
> > in xfs_trans_alloc waiting for the fs to unfreeze.  I can't remember the
> > exact circumstances though.
> > 
> 
> Ok, so we'd basically deadlock on the freeze IIUC. That's exactly what
> we'd want to document in the comment. :P
> 
> > > I'm also a little curious about the high-level tradeoff we're making.
> > > This patch means that a freeze, and thus something like an LVM snapshot,
> > > would clear/reset all COW blocks in the fs, right? If so, ISTM that
> > > could be annoying if the user had a bunch of reflinked vdisk images or
> > > something on the fs.
> > 
> > Yeah, it's sort of painful to drop all those speculative preallocations.
> > 
> > > Is the tradeoff just that if the user freezes and then doesn't unfreeze
> > > before the next mount that log recovery will have to deal with COW
> > > blocks, or are there considerations for a typical freeze/unfreeze cycle
> > > as well? Is this more expensive at recovery time than at freeze time?
> > 
> > It's entirely to reduce the recovery work necessary after freezing the
> > fs, snapshotting the fs (via lvm or whatever) and then mounting the
> > snapshot separately.  TBH this patch has languished for years now
> > because it's only necessary to optimize this one usage.
> > 
> 
> So the snapshot will run log recovery and end up freeing all of these
> extents "the hard way," right?

Yes.  I suspect it's faster to let log recovery clean it up because it
doesn't have to walk the cowblocks inodes to free cow extents piece by
piece.

> > Maybe it's easier to drop this one, unless anyone feels particularly
> > passionate about clearing out cow staging extents before a freeze?
> > 
> 
> Not really.
> 
> On principle, it seems more appropriate to me to put the work onto the
> snapshot rather than disrupt the primary fs. I could see doing something
> like this without much thought if the blocks were already freed or
> something and the change was thus invisible to a user, but ISTM that
> losing the preallocation can have a significant affect on the primary
> workload.

Agreed.

> Then again, this doesn't affect me much and I'm not terribly familiar
> with the use case or whether there have been complaints or anything. I
> don't feel strongly about it so long as the tradeoffs are documented
> somewhere so it's clear what needs fixing if this behavior became
> problematic down the road.

Welll... if nobody else yells I'm happy to set this patch aside and let
the code be preserved in mailing list lore for a while.

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > > +	if (sb->s_writers.frozen == SB_FREEZE_PAGEFAULT) {
> > > > +		int		error;
> > > > +
> > > > +		error = xfs_icache_free_cowblocks(mp, NULL);
> > > > +		if (error) {
> > > > +			xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > > +			return error;
> > > > +		}
> > > > +	}
> > > > +
> > > >  	xfs_log_force(mp, XFS_LOG_SYNC);
> > > >  	if (laptop_mode) {
> > > >  		/*
> > > >
diff mbox series

Patch

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 245483cc282b..36d986087abb 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1741,6 +1741,7 @@  xfs_inode_free_cowblocks(
 	void			*args)
 {
 	struct xfs_eofblocks	*eofb = args;
+	uint			lock_mode = 0;
 	int			match;
 	int			ret = 0;
 
@@ -1761,9 +1762,15 @@  xfs_inode_free_cowblocks(
 			return 0;
 	}
 
-	/* Free the CoW blocks */
-	xfs_ilock(ip, XFS_IOLOCK_EXCL);
-	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
+	/*
+	 * Free the CoW blocks.  We don't need to lock the inode if we're in
+	 * the process of freezing the filesystem because we've already locked
+	 * out writes and page faults.
+	 */
+	if (ip->i_mount->m_super->s_writers.frozen == SB_UNFROZEN)
+		lock_mode = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+	if (lock_mode)
+		xfs_ilock(ip, lock_mode);
 
 	/*
 	 * Check again, nobody else should be able to dirty blocks or change
@@ -1772,8 +1779,8 @@  xfs_inode_free_cowblocks(
 	if (xfs_prep_free_cowblocks(ip))
 		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
 
-	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
-	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+	if (lock_mode)
+		xfs_iunlock(ip, lock_mode);
 
 	return ret;
 }
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 6903b36eac5d..bb4953cfd683 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1102,6 +1102,24 @@  xfs_fs_sync_fs(
 	if (!wait)
 		return 0;
 
+	/*
+	 * If we're in the middle of freezing the filesystem, this is our last
+	 * chance to run regular transactions.  Clear all the COW staging
+	 * extents so that we can freeze the filesystem with as little recovery
+	 * work to do at the next mount as possible.  It's safe to do this
+	 * without locking inodes because the freezer code flushed all the
+	 * dirty data from the page cache and locked out writes and page faults.
+	 */
+	if (sb->s_writers.frozen == SB_FREEZE_PAGEFAULT) {
+		int		error;
+
+		error = xfs_icache_free_cowblocks(mp, NULL);
+		if (error) {
+			xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+			return error;
+		}
+	}
+
 	xfs_log_force(mp, XFS_LOG_SYNC);
 	if (laptop_mode) {
 		/*