[3/3] xfs: rework collapse range into an atomic operation

Message ID	20191213171258.36934-4-bfoster@redhat.com (mailing list archive)
State	Accepted
Headers	show Return-Path: <SRS0=jVwY=2D=vger.kernel.org=linux-xfs-owner@kernel.org> From: Brian Foster <bfoster@redhat.com> To: linux-xfs@vger.kernel.org Subject: [PATCH 3/3] xfs: rework collapse range into an atomic operation Date: Fri, 13 Dec 2019 12:12:58 -0500 Message-Id: <20191213171258.36934-4-bfoster@redhat.com> In-Reply-To: <20191213171258.36934-1-bfoster@redhat.com> References: <20191213171258.36934-1-bfoster@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk
Series	xfs: hold ilock across insert and collapse range \| expand [0/3] xfs: hold ilock across insert and collapse range [1/3] xfs: open code insert range extent split helper [2/3] xfs: rework insert range into an atomic operation [3/3] xfs: rework collapse range into an atomic operation

Brian Foster Dec. 13, 2019, 5:12 p.m. UTC

The collapse range operation uses a unique transaction and ilock
cycle for the hole punch and each extent shift iteration of the
overall operation. While the hole punch is safe as a separate
operation due to the iolock, cycling the ilock after each extent
shift is risky similar to insert range.

To avoid this problem, make collapse range atomic with respect to
ilock. Hold the ilock across the entire operation, replace the
individual transactions with a single rolling transaction sequence
and finish dfops on each iteration to perform pending frees and roll
the transaction. Remove the unnecessary quota reservation as
collapse range can only ever merge extents (and thus remove extent
records and potentially free bmap blocks). The dfops call
automatically relogs the inode to keep it moving in the log. This
guarantees that nothing else can change the extent mapping of an
inode while a collapse range operation is in progress.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_bmap_util.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

Darrick J. Wong Dec. 18, 2019, 2:39 a.m. UTC | #1

On Fri, Dec 13, 2019 at 12:12:58PM -0500, Brian Foster wrote:
> The collapse range operation uses a unique transaction and ilock
> cycle for the hole punch and each extent shift iteration of the
> overall operation. While the hole punch is safe as a separate
> operation due to the iolock, cycling the ilock after each extent
> shift is risky similar to insert range.

It is?  I thought collapse range was safe because we started by punching
out the doomed range and shifting downwards, which eliminates the
problems that come with two adjacent mappings that could be combined?

<confused?>

--D

> To avoid this problem, make collapse range atomic with respect to
> ilock. Hold the ilock across the entire operation, replace the
> individual transactions with a single rolling transaction sequence
> and finish dfops on each iteration to perform pending frees and roll
> the transaction. Remove the unnecessary quota reservation as
> collapse range can only ever merge extents (and thus remove extent
> records and potentially free bmap blocks). The dfops call
> automatically relogs the inode to keep it moving in the log. This
> guarantees that nothing else can change the extent mapping of an
> inode while a collapse range operation is in progress.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_bmap_util.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 555c8b49a223..1c34a34997ca 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1050,7 +1050,6 @@ xfs_collapse_file_space(
>  	int			error;
>  	xfs_fileoff_t		next_fsb = XFS_B_TO_FSB(mp, offset + len);
>  	xfs_fileoff_t		shift_fsb = XFS_B_TO_FSB(mp, len);
> -	uint			resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
>  	bool			done = false;
>  
>  	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> @@ -1066,32 +1065,34 @@ xfs_collapse_file_space(
>  	if (error)
>  		return error;
>  
> -	while (!error && !done) {
> -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0,
> -					&tp);
> -		if (error)
> -			break;
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, 0, &tp);
> +	if (error)
> +		return error;
>  
> -		xfs_ilock(ip, XFS_ILOCK_EXCL);
> -		error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot,
> -				ip->i_gdquot, ip->i_pdquot, resblks, 0,
> -				XFS_QMOPT_RES_REGBLKS);
> -		if (error)
> -			goto out_trans_cancel;
> -		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +	xfs_trans_ijoin(tp, ip, 0);
>  
> +	while (!done) {
>  		error = xfs_bmap_collapse_extents(tp, ip, &next_fsb, shift_fsb,
>  				&done);
>  		if (error)
>  			goto out_trans_cancel;
> +		if (done)
> +			break;
>  
> -		error = xfs_trans_commit(tp);
> +		/* finish any deferred frees and roll the transaction */
> +		error = xfs_defer_finish(&tp);
> +		if (error)
> +			goto out_trans_cancel;
>  	}
>  
> +	error = xfs_trans_commit(tp);
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return error;
>  
>  out_trans_cancel:
>  	xfs_trans_cancel(tp);
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return error;
>  }
>  
> -- 
> 2.20.1
>

Brian Foster Dec. 18, 2019, 12:11 p.m. UTC | #2

On Tue, Dec 17, 2019 at 06:39:58PM -0800, Darrick J. Wong wrote:
> On Fri, Dec 13, 2019 at 12:12:58PM -0500, Brian Foster wrote:
> > The collapse range operation uses a unique transaction and ilock
> > cycle for the hole punch and each extent shift iteration of the
> > overall operation. While the hole punch is safe as a separate
> > operation due to the iolock, cycling the ilock after each extent
> > shift is risky similar to insert range.
> 
> It is?  I thought collapse range was safe because we started by punching
> out the doomed range and shifting downwards, which eliminates the
> problems that come with two adjacent mappings that could be combined?
> 
> <confused?>
> 

This is somewhat vague wording. I don't mean to say the same bug is
possible on collapse. Indeed, I don't think that it is. What I mean to
say is just that cycling the ilock generally opens the operation up to
concurrent extent changes, similar to the behavior that resulted in the
insert range bug and against the general design principle of the
operation (as implied by the iolock hold -> flush -> unmap preparation
sequence).

IOW, it seems to me that a similar behavior is possible on collapse, it
just might occur after an extent has been shifted into its new target
range rather than before. That wouldn't be a corruption/bug because it
doesn't change the semantics of the shift operation or the content of
the file, but it's subtle and arguably a misbehavior and/or landmine.

Brian

> --D
> 
> > To avoid this problem, make collapse range atomic with respect to
> > ilock. Hold the ilock across the entire operation, replace the
> > individual transactions with a single rolling transaction sequence
> > and finish dfops on each iteration to perform pending frees and roll
> > the transaction. Remove the unnecessary quota reservation as
> > collapse range can only ever merge extents (and thus remove extent
> > records and potentially free bmap blocks). The dfops call
> > automatically relogs the inode to keep it moving in the log. This
> > guarantees that nothing else can change the extent mapping of an
> > inode while a collapse range operation is in progress.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_bmap_util.c | 29 +++++++++++++++--------------
> >  1 file changed, 15 insertions(+), 14 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 555c8b49a223..1c34a34997ca 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -1050,7 +1050,6 @@ xfs_collapse_file_space(
> >  	int			error;
> >  	xfs_fileoff_t		next_fsb = XFS_B_TO_FSB(mp, offset + len);
> >  	xfs_fileoff_t		shift_fsb = XFS_B_TO_FSB(mp, len);
> > -	uint			resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
> >  	bool			done = false;
> >  
> >  	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> > @@ -1066,32 +1065,34 @@ xfs_collapse_file_space(
> >  	if (error)
> >  		return error;
> >  
> > -	while (!error && !done) {
> > -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0,
> > -					&tp);
> > -		if (error)
> > -			break;
> > +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, 0, &tp);
> > +	if (error)
> > +		return error;
> >  
> > -		xfs_ilock(ip, XFS_ILOCK_EXCL);
> > -		error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot,
> > -				ip->i_gdquot, ip->i_pdquot, resblks, 0,
> > -				XFS_QMOPT_RES_REGBLKS);
> > -		if (error)
> > -			goto out_trans_cancel;
> > -		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > +	xfs_trans_ijoin(tp, ip, 0);
> >  
> > +	while (!done) {
> >  		error = xfs_bmap_collapse_extents(tp, ip, &next_fsb, shift_fsb,
> >  				&done);
> >  		if (error)
> >  			goto out_trans_cancel;
> > +		if (done)
> > +			break;
> >  
> > -		error = xfs_trans_commit(tp);
> > +		/* finish any deferred frees and roll the transaction */
> > +		error = xfs_defer_finish(&tp);
> > +		if (error)
> > +			goto out_trans_cancel;
> >  	}
> >  
> > +	error = xfs_trans_commit(tp);
> > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >  	return error;
> >  
> >  out_trans_cancel:
> >  	xfs_trans_cancel(tp);
> > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >  	return error;
> >  }
> >  
> > -- 
> > 2.20.1
> > 
>

Darrick J. Wong Dec. 18, 2019, 9:19 p.m. UTC | #3

On Wed, Dec 18, 2019 at 07:11:20AM -0500, Brian Foster wrote:
> On Tue, Dec 17, 2019 at 06:39:58PM -0800, Darrick J. Wong wrote:
> > On Fri, Dec 13, 2019 at 12:12:58PM -0500, Brian Foster wrote:
> > > The collapse range operation uses a unique transaction and ilock
> > > cycle for the hole punch and each extent shift iteration of the
> > > overall operation. While the hole punch is safe as a separate
> > > operation due to the iolock, cycling the ilock after each extent
> > > shift is risky similar to insert range.
> > 
> > It is?  I thought collapse range was safe because we started by punching
> > out the doomed range and shifting downwards, which eliminates the
> > problems that come with two adjacent mappings that could be combined?
> > 
> > <confused?>
> > 
> 
> This is somewhat vague wording. I don't mean to say the same bug is
> possible on collapse. Indeed, I don't think that it is. What I mean to
> say is just that cycling the ilock generally opens the operation up to
> concurrent extent changes, similar to the behavior that resulted in the
> insert range bug and against the general design principle of the
> operation (as implied by the iolock hold -> flush -> unmap preparation
> sequence).

Oh, ok, you're merely trying to prevent anyone from seeing the inode
metadata while we're in the middle of a collapse-range operation.  I
wonder then if we need to take a look at the remap range operations, but
oh wow is that a gnarly mess of inode locking. :)

> IOW, it seems to me that a similar behavior is possible on collapse, it
> just might occur after an extent has been shifted into its new target
> range rather than before. That wouldn't be a corruption/bug because it
> doesn't change the semantics of the shift operation or the content of
> the file, but it's subtle and arguably a misbehavior and/or landmine.

<nod> Ok, just making sure. :)

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> Brian
> 
> > --D
> > 
> > > To avoid this problem, make collapse range atomic with respect to
> > > ilock. Hold the ilock across the entire operation, replace the
> > > individual transactions with a single rolling transaction sequence
> > > and finish dfops on each iteration to perform pending frees and roll
> > > the transaction. Remove the unnecessary quota reservation as
> > > collapse range can only ever merge extents (and thus remove extent
> > > records and potentially free bmap blocks). The dfops call
> > > automatically relogs the inode to keep it moving in the log. This
> > > guarantees that nothing else can change the extent mapping of an
> > > inode while a collapse range operation is in progress.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > >  fs/xfs/xfs_bmap_util.c | 29 +++++++++++++++--------------
> > >  1 file changed, 15 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > index 555c8b49a223..1c34a34997ca 100644
> > > --- a/fs/xfs/xfs_bmap_util.c
> > > +++ b/fs/xfs/xfs_bmap_util.c
> > > @@ -1050,7 +1050,6 @@ xfs_collapse_file_space(
> > >  	int			error;
> > >  	xfs_fileoff_t		next_fsb = XFS_B_TO_FSB(mp, offset + len);
> > >  	xfs_fileoff_t		shift_fsb = XFS_B_TO_FSB(mp, len);
> > > -	uint			resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
> > >  	bool			done = false;
> > >  
> > >  	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> > > @@ -1066,32 +1065,34 @@ xfs_collapse_file_space(
> > >  	if (error)
> > >  		return error;
> > >  
> > > -	while (!error && !done) {
> > > -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0,
> > > -					&tp);
> > > -		if (error)
> > > -			break;
> > > +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, 0, &tp);
> > > +	if (error)
> > > +		return error;
> > >  
> > > -		xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > -		error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot,
> > > -				ip->i_gdquot, ip->i_pdquot, resblks, 0,
> > > -				XFS_QMOPT_RES_REGBLKS);
> > > -		if (error)
> > > -			goto out_trans_cancel;
> > > -		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > +	xfs_trans_ijoin(tp, ip, 0);
> > >  
> > > +	while (!done) {
> > >  		error = xfs_bmap_collapse_extents(tp, ip, &next_fsb, shift_fsb,
> > >  				&done);
> > >  		if (error)
> > >  			goto out_trans_cancel;
> > > +		if (done)
> > > +			break;
> > >  
> > > -		error = xfs_trans_commit(tp);
> > > +		/* finish any deferred frees and roll the transaction */
> > > +		error = xfs_defer_finish(&tp);
> > > +		if (error)
> > > +			goto out_trans_cancel;
> > >  	}
> > >  
> > > +	error = xfs_trans_commit(tp);
> > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > >  	return error;
> > >  
> > >  out_trans_cancel:
> > >  	xfs_trans_cancel(tp);
> > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > >  	return error;
> > >  }
> > >  
> > > -- 
> > > 2.20.1
> > > 
> > 
>

Brian Foster Dec. 19, 2019, 11:56 a.m. UTC | #4

On Wed, Dec 18, 2019 at 01:19:05PM -0800, Darrick J. Wong wrote:
> On Wed, Dec 18, 2019 at 07:11:20AM -0500, Brian Foster wrote:
> > On Tue, Dec 17, 2019 at 06:39:58PM -0800, Darrick J. Wong wrote:
> > > On Fri, Dec 13, 2019 at 12:12:58PM -0500, Brian Foster wrote:
> > > > The collapse range operation uses a unique transaction and ilock
> > > > cycle for the hole punch and each extent shift iteration of the
> > > > overall operation. While the hole punch is safe as a separate
> > > > operation due to the iolock, cycling the ilock after each extent
> > > > shift is risky similar to insert range.
> > > 
> > > It is?  I thought collapse range was safe because we started by punching
> > > out the doomed range and shifting downwards, which eliminates the
> > > problems that come with two adjacent mappings that could be combined?
> > > 
> > > <confused?>
> > > 
> > 
> > This is somewhat vague wording. I don't mean to say the same bug is
> > possible on collapse. Indeed, I don't think that it is. What I mean to
> > say is just that cycling the ilock generally opens the operation up to
> > concurrent extent changes, similar to the behavior that resulted in the
> > insert range bug and against the general design principle of the
> > operation (as implied by the iolock hold -> flush -> unmap preparation
> > sequence).
> 
> Oh, ok, you're merely trying to prevent anyone from seeing the inode
> metadata while we're in the middle of a collapse-range operation.  I
> wonder then if we need to take a look at the remap range operations, but
> oh wow is that a gnarly mess of inode locking. :)
> 

Yeah, that's actually a much better way to put it. ;) Feel free to tweak
the commit log along those lines if my current description is too
vague/confusing. Thanks for the reviews..

Brian

> > IOW, it seems to me that a similar behavior is possible on collapse, it
> > just might occur after an extent has been shifted into its new target
> > range rather than before. That wouldn't be a corruption/bug because it
> > doesn't change the semantics of the shift operation or the content of
> > the file, but it's subtle and arguably a misbehavior and/or landmine.
> 
> <nod> Ok, just making sure. :)
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > > To avoid this problem, make collapse range atomic with respect to
> > > > ilock. Hold the ilock across the entire operation, replace the
> > > > individual transactions with a single rolling transaction sequence
> > > > and finish dfops on each iteration to perform pending frees and roll
> > > > the transaction. Remove the unnecessary quota reservation as
> > > > collapse range can only ever merge extents (and thus remove extent
> > > > records and potentially free bmap blocks). The dfops call
> > > > automatically relogs the inode to keep it moving in the log. This
> > > > guarantees that nothing else can change the extent mapping of an
> > > > inode while a collapse range operation is in progress.
> > > > 
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > >  fs/xfs/xfs_bmap_util.c | 29 +++++++++++++++--------------
> > > >  1 file changed, 15 insertions(+), 14 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > > index 555c8b49a223..1c34a34997ca 100644
> > > > --- a/fs/xfs/xfs_bmap_util.c
> > > > +++ b/fs/xfs/xfs_bmap_util.c
> > > > @@ -1050,7 +1050,6 @@ xfs_collapse_file_space(
> > > >  	int			error;
> > > >  	xfs_fileoff_t		next_fsb = XFS_B_TO_FSB(mp, offset + len);
> > > >  	xfs_fileoff_t		shift_fsb = XFS_B_TO_FSB(mp, len);
> > > > -	uint			resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
> > > >  	bool			done = false;
> > > >  
> > > >  	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> > > > @@ -1066,32 +1065,34 @@ xfs_collapse_file_space(
> > > >  	if (error)
> > > >  		return error;
> > > >  
> > > > -	while (!error && !done) {
> > > > -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0,
> > > > -					&tp);
> > > > -		if (error)
> > > > -			break;
> > > > +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, 0, &tp);
> > > > +	if (error)
> > > > +		return error;
> > > >  
> > > > -		xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > > -		error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot,
> > > > -				ip->i_gdquot, ip->i_pdquot, resblks, 0,
> > > > -				XFS_QMOPT_RES_REGBLKS);
> > > > -		if (error)
> > > > -			goto out_trans_cancel;
> > > > -		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > > > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > > +	xfs_trans_ijoin(tp, ip, 0);
> > > >  
> > > > +	while (!done) {
> > > >  		error = xfs_bmap_collapse_extents(tp, ip, &next_fsb, shift_fsb,
> > > >  				&done);
> > > >  		if (error)
> > > >  			goto out_trans_cancel;
> > > > +		if (done)
> > > > +			break;
> > > >  
> > > > -		error = xfs_trans_commit(tp);
> > > > +		/* finish any deferred frees and roll the transaction */
> > > > +		error = xfs_defer_finish(&tp);
> > > > +		if (error)
> > > > +			goto out_trans_cancel;
> > > >  	}
> > > >  
> > > > +	error = xfs_trans_commit(tp);
> > > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > >  	return error;
> > > >  
> > > >  out_trans_cancel:
> > > >  	xfs_trans_cancel(tp);
> > > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > >  	return error;
> > > >  }
> > > >  
> > > > -- 
> > > > 2.20.1
> > > > 
> > > 
> > 
>

Christoph Hellwig Dec. 24, 2019, 11:29 a.m. UTC | #5

Looks good modulo any commit log issues:

Reviewed-by: Christoph Hellwig <hch@lst.de>

[3/3] xfs: rework collapse range into an atomic operation

Commit Message

Comments

Patch