Message ID | 20210405145903.629152-2-bfoster@redhat.com (mailing list archive) |
---|---|
State | Superseded, archived |
Series | xfs: ioend batching log reservation deadlock |
> @@ -182,12 +155,10 @@ xfs_end_ioend(
>  		error = xfs_reflink_end_cow(ip, offset, size);
>  	else if (ioend->io_type == IOMAP_UNWRITTEN)
>  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> -	else
> -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);

I first thought we'd now call xfs_setfilesize for unwritten extents as
well, but as those have already updated di_size we are fine here.

As a future enhancement it would be useful to let xfs_reflink_end_cow
update the file size similar to what we do for the unwritten case.

>  done:
> -	if (ioend->io_private)
> -		error = xfs_setfilesize_ioend(ioend, error);
> +	if (!error && xfs_ioend_is_append(ioend))
> +		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
>  	iomap_finish_ioends(ioend, error);

The done label can move after the call to xfs_setfilesize now.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
On Wed, Apr 07, 2021 at 07:33:21AM +0100, Christoph Hellwig wrote:
> > @@ -182,12 +155,10 @@ xfs_end_ioend(
> >  		error = xfs_reflink_end_cow(ip, offset, size);
> >  	else if (ioend->io_type == IOMAP_UNWRITTEN)
> >  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> > -	else
> > -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
> 
> I first thought we'd now call xfs_setfilesize for unwritten extents
> as well, but as those have already updated di_size we are fine here.
> 
> As a future enhancement it would be useful to let xfs_reflink_end_cow
> update the file size similar to what we do for the unwritten case.
> 

Agreed. I noticed that when first passing through the code but didn't
want to get into further functional changes.

> > done:
> > -	if (ioend->io_private)
> > -		error = xfs_setfilesize_ioend(ioend, error);
> > +	if (!error && xfs_ioend_is_append(ioend))
> > +		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
> >  	iomap_finish_ioends(ioend, error);
> 
> The done label can move after the call to xfs_setfilesize now.
> 

Fixed.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 

Thanks.

Brian
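For readers following the label discussion above, the tail of xfs_end_ioend()
with this patch applied and the "done" label moved below the size update would
read roughly as follows. This is only an illustrative sketch assembled from the
hunks quoted in this thread; the leading error check and the
xfs_reflink_cancel_cow_range() call are recalled from the surrounding function
of that era and may differ in detail from what was ultimately merged:

	/* I/O errors bypass conversion and the on-disk size update entirely. */
	error = blk_status_to_errno(ioend->io_bio->bi_status);
	if (unlikely(error)) {
		if (ioend->io_flags & IOMAP_F_SHARED)
			xfs_reflink_cancel_cow_range(ip, offset, size, true);
		goto done;
	}

	if (ioend->io_flags & IOMAP_F_SHARED)
		error = xfs_reflink_end_cow(ip, offset, size);
	else if (ioend->io_type == IOMAP_UNWRITTEN)
		error = xfs_iomap_write_unwritten(ip, offset, size, false);

	/* Update the on-disk size only after a successful conversion/remap. */
	if (!error && xfs_ioend_is_append(ioend))
		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
done:
	iomap_finish_ioends(ioend, error);
	memalloc_nofs_restore(nofs_flag);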
On Mon, Apr 05, 2021 at 10:59:00AM -0400, Brian Foster wrote:
> Per-inode ioend completion batching has a log reservation deadlock
> vector between preallocated append transactions and transactions
> that are acquired at completion time for other purposes (i.e.,
> unwritten extent conversion or COW fork remaps). For example, if the
> ioend completion workqueue task executes on a batch of ioends that
> are sorted such that an append ioend sits at the tail, it's possible
> for the outstanding append transaction reservation to block
> allocation of transactions required to process preceding ioends in
> the list.
> 
> Append ioend completion is historically the common path for on-disk
> inode size updates. While file extending writes may have completed
> sometime earlier, the on-disk inode size is only updated after
> successful writeback completion. These transactions are preallocated
> serially from writeback context to mitigate concurrency and
> associated log reservation pressure across completions processed by
> multi-threaded workqueue tasks.
> 
> However, now that delalloc blocks unconditionally map to unwritten
> extents at physical block allocation time, size updates via append
> ioends are relatively rare. This means that inode size updates most
> commonly occur as part of the preexisting completion time
> transaction to convert unwritten extents. As a result, there is no
> longer a strong need to preallocate size update transactions.
> 
> Remove the preallocation of inode size update transactions to avoid
> the ioend completion processing log reservation deadlock. Instead,
> continue to send all potential size extending ioends to workqueue
> context for completion and allocate the transaction from that
> context. This ensures that no outstanding log reservation is owned
> by the ioend completion worker task when it begins to process
> ioends.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_aops.c | 45 +++------------------------------------------
>  1 file changed, 3 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 1cc7c36d98e9..c1951975bd6a 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -39,33 +39,6 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
>  		XFS_I(ioend->io_inode)->i_d.di_size;
>  }
>  
> -STATIC int
> -xfs_setfilesize_trans_alloc(
> -	struct iomap_ioend *ioend)
> -{
> -	struct xfs_mount *mp = XFS_I(ioend->io_inode)->i_mount;
> -	struct xfs_trans *tp;
> -	int error;
> -
> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
> -	if (error)
> -		return error;
> -
> -	ioend->io_private = tp;
> -
> -	/*
> -	 * We may pass freeze protection with a transaction. So tell lockdep
> -	 * we released it.
> -	 */
> -	__sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
> -	/*
> -	 * We hand off the transaction to the completion thread now, so
> -	 * clear the flag here.
> -	 */
> -	xfs_trans_clear_context(tp);
> -	return 0;
> -}
> -
>  /*
>   * Update on-disk file size now that data has been written to disk.
>   */
> @@ -182,12 +155,10 @@ xfs_end_ioend(
>  		error = xfs_reflink_end_cow(ip, offset, size);

Seems reasonable to me. xfs_reflink_end_cow_extent should probably
learn how to extend the ondisk EOF as patch 6/4.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  	else if (ioend->io_type == IOMAP_UNWRITTEN)
>  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> -	else
> -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
>  
>  done:
> -	if (ioend->io_private)
> -		error = xfs_setfilesize_ioend(ioend, error);
> +	if (!error && xfs_ioend_is_append(ioend))
> +		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
>  	iomap_finish_ioends(ioend, error);
>  	memalloc_nofs_restore(nofs_flag);
>  }
> @@ -237,7 +208,7 @@ xfs_end_io(
>  
>  static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
>  {
> -	return ioend->io_private ||
> +	return xfs_ioend_is_append(ioend) ||
>  		ioend->io_type == IOMAP_UNWRITTEN ||
>  		(ioend->io_flags & IOMAP_F_SHARED);
>  }
> @@ -250,8 +221,6 @@ xfs_end_bio(
>  	struct xfs_inode *ip = XFS_I(ioend->io_inode);
>  	unsigned long flags;
>  
> -	ASSERT(xfs_ioend_needs_workqueue(ioend));
> -
>  	spin_lock_irqsave(&ip->i_ioend_lock, flags);
>  	if (list_empty(&ip->i_ioend_list))
>  		WARN_ON_ONCE(!queue_work(ip->i_mount->m_unwritten_workqueue,
> @@ -501,14 +470,6 @@ xfs_prepare_ioend(
>  				ioend->io_offset, ioend->io_size);
>  	}
>  
> -	/* Reserve log space if we might write beyond the on-disk inode size. */
> -	if (!status &&
> -	    ((ioend->io_flags & IOMAP_F_SHARED) ||
> -	     ioend->io_type != IOMAP_UNWRITTEN) &&
> -	    xfs_ioend_is_append(ioend) &&
> -	    !ioend->io_private)
> -		status = xfs_setfilesize_trans_alloc(ioend);
> -
>  	memalloc_nofs_restore(nofs_flag);
>  
>  	if (xfs_ioend_needs_workqueue(ioend))
> -- 
> 2.26.3
> 
On Wed, Apr 07, 2021 at 08:31:52AM -0700, Darrick J. Wong wrote:
> On Mon, Apr 05, 2021 at 10:59:00AM -0400, Brian Foster wrote:
> > Per-inode ioend completion batching has a log reservation deadlock
> > vector between preallocated append transactions and transactions
> > that are acquired at completion time for other purposes (i.e.,
> > unwritten extent conversion or COW fork remaps). For example, if the
> > ioend completion workqueue task executes on a batch of ioends that
> > are sorted such that an append ioend sits at the tail, it's possible
> > for the outstanding append transaction reservation to block
> > allocation of transactions required to process preceding ioends in
> > the list.
> > 
> > Append ioend completion is historically the common path for on-disk
> > inode size updates. While file extending writes may have completed
> > sometime earlier, the on-disk inode size is only updated after
> > successful writeback completion. These transactions are preallocated
> > serially from writeback context to mitigate concurrency and
> > associated log reservation pressure across completions processed by
> > multi-threaded workqueue tasks.
> > 
> > However, now that delalloc blocks unconditionally map to unwritten
> > extents at physical block allocation time, size updates via append
> > ioends are relatively rare. This means that inode size updates most
> > commonly occur as part of the preexisting completion time
> > transaction to convert unwritten extents. As a result, there is no
> > longer a strong need to preallocate size update transactions.
> > 
> > Remove the preallocation of inode size update transactions to avoid
> > the ioend completion processing log reservation deadlock. Instead,
> > continue to send all potential size extending ioends to workqueue
> > context for completion and allocate the transaction from that
> > context. This ensures that no outstanding log reservation is owned
> > by the ioend completion worker task when it begins to process
> > ioends.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_aops.c | 45 +++------------------------------------------
> >  1 file changed, 3 insertions(+), 42 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 1cc7c36d98e9..c1951975bd6a 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -39,33 +39,6 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
> >  		XFS_I(ioend->io_inode)->i_d.di_size;
> >  }
> >  
> > -STATIC int
> > -xfs_setfilesize_trans_alloc(
> > -	struct iomap_ioend *ioend)
> > -{
> > -	struct xfs_mount *mp = XFS_I(ioend->io_inode)->i_mount;
> > -	struct xfs_trans *tp;
> > -	int error;
> > -
> > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
> > -	if (error)
> > -		return error;
> > -
> > -	ioend->io_private = tp;
> > -
> > -	/*
> > -	 * We may pass freeze protection with a transaction. So tell lockdep
> > -	 * we released it.
> > -	 */
> > -	__sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
> > -	/*
> > -	 * We hand off the transaction to the completion thread now, so
> > -	 * clear the flag here.
> > -	 */
> > -	xfs_trans_clear_context(tp);
> > -	return 0;
> > -}
> > -
> >  /*
> >   * Update on-disk file size now that data has been written to disk.
> >   */
> > @@ -182,12 +155,10 @@ xfs_end_ioend(
> >  		error = xfs_reflink_end_cow(ip, offset, size);
> 
> Seems reasonable to me. xfs_reflink_end_cow_extent should probably
> learn how to extend the ondisk EOF as patch 6/4.
> 

After looking into this since multiple people asked for it, I'm not as
convinced it's the right approach as I was on first thought. The reason
being that the reflink completion code actually remaps in reverse order
(I suspect due to depending on the same behavior of the bmap unmap code,
which is used to pull extents from the COW fork..).

That means that while inter-ioend completion takes some care (even if
not necessarily guaranteed) to order on-disk size updates against
completion within the associated i_size, a size update along with the
COW fork remap transaction could result in the opposite in some cases
(i.e. if the cow fork is fragmented relative to the data fork). I
suspect this is a rare situation and not a critical problem, but at the
same time I find it a bit odd to implement intra-ioend ordering
inconsistent with the broader inter-ioend logic.

I could probably be convinced otherwise, but I think this warrants more
thought and is not necessarily a straightforward cleanup. I'm also not
sure how common disk size updates via COW fork I/O completion really are
in the first place, so I'm not sure it's worth the oddness to save an
extra transaction. In any event, I'll probably leave this off from this
series for now because it doesn't have much to do with fixing the
deadlock problem and can revisit it if anybody has thoughts on any of
the above..

Brian

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> --D
> 
> >  	else if (ioend->io_type == IOMAP_UNWRITTEN)
> >  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> > -	else
> > -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
> >  
> >  done:
> > -	if (ioend->io_private)
> > -		error = xfs_setfilesize_ioend(ioend, error);
> > +	if (!error && xfs_ioend_is_append(ioend))
> > +		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
> >  	iomap_finish_ioends(ioend, error);
> >  	memalloc_nofs_restore(nofs_flag);
> >  }
> > @@ -237,7 +208,7 @@ xfs_end_io(
> >  
> >  static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
> >  {
> > -	return ioend->io_private ||
> > +	return xfs_ioend_is_append(ioend) ||
> > 		ioend->io_type == IOMAP_UNWRITTEN ||
> > 		(ioend->io_flags & IOMAP_F_SHARED);
> >  }
> > @@ -250,8 +221,6 @@ xfs_end_bio(
> >  	struct xfs_inode *ip = XFS_I(ioend->io_inode);
> >  	unsigned long flags;
> >  
> > -	ASSERT(xfs_ioend_needs_workqueue(ioend));
> > -
> >  	spin_lock_irqsave(&ip->i_ioend_lock, flags);
> >  	if (list_empty(&ip->i_ioend_list))
> >  		WARN_ON_ONCE(!queue_work(ip->i_mount->m_unwritten_workqueue,
> > @@ -501,14 +470,6 @@ xfs_prepare_ioend(
> >  				ioend->io_offset, ioend->io_size);
> >  	}
> >  
> > -	/* Reserve log space if we might write beyond the on-disk inode size. */
> > -	if (!status &&
> > -	    ((ioend->io_flags & IOMAP_F_SHARED) ||
> > -	     ioend->io_type != IOMAP_UNWRITTEN) &&
> > -	    xfs_ioend_is_append(ioend) &&
> > -	    !ioend->io_private)
> > -		status = xfs_setfilesize_trans_alloc(ioend);
> > -
> >  	memalloc_nofs_restore(nofs_flag);
> >  
> >  	if (xfs_ioend_needs_workqueue(ioend))
> > -- 
> > 2.26.3
> > 
> 
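To make the reverse-order behavior described above concrete: the
completion-time remap loop walks the byte range from its end back toward its
start, so the highest-offset COW extent is remapped into the data fork first.
A minimal sketch of that loop shape, paraphrased from the xfs_reflink_end_cow()
of this era (setup, locking and error handling omitted; details may differ):

	xfs_fileoff_t	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
	xfs_fileoff_t	end_fsb = XFS_B_TO_FSB(ip->i_mount, offset + count);
	int		error = 0;

	/*
	 * Walk backwards until we're out of the I/O range.  Each call remaps
	 * one COW fork extent from the top of the remaining range and moves
	 * end_fsb downward past it, allocating one transaction per extent.
	 */
	while (end_fsb > offset_fsb && !error)
		error = xfs_reflink_end_cow_extent(ip, offset_fsb, &end_fsb);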
On Fri, Apr 09, 2021 at 09:47:24AM -0400, Brian Foster wrote:
> On Wed, Apr 07, 2021 at 08:31:52AM -0700, Darrick J. Wong wrote:
> > On Mon, Apr 05, 2021 at 10:59:00AM -0400, Brian Foster wrote:
> > > Per-inode ioend completion batching has a log reservation deadlock
> > > vector between preallocated append transactions and transactions
> > > that are acquired at completion time for other purposes (i.e.,
> > > unwritten extent conversion or COW fork remaps). For example, if the
> > > ioend completion workqueue task executes on a batch of ioends that
> > > are sorted such that an append ioend sits at the tail, it's possible
> > > for the outstanding append transaction reservation to block
> > > allocation of transactions required to process preceding ioends in
> > > the list.
> > > 
> > > Append ioend completion is historically the common path for on-disk
> > > inode size updates. While file extending writes may have completed
> > > sometime earlier, the on-disk inode size is only updated after
> > > successful writeback completion. These transactions are preallocated
> > > serially from writeback context to mitigate concurrency and
> > > associated log reservation pressure across completions processed by
> > > multi-threaded workqueue tasks.
> > > 
> > > However, now that delalloc blocks unconditionally map to unwritten
> > > extents at physical block allocation time, size updates via append
> > > ioends are relatively rare. This means that inode size updates most
> > > commonly occur as part of the preexisting completion time
> > > transaction to convert unwritten extents. As a result, there is no
> > > longer a strong need to preallocate size update transactions.
> > > 
> > > Remove the preallocation of inode size update transactions to avoid
> > > the ioend completion processing log reservation deadlock. Instead,
> > > continue to send all potential size extending ioends to workqueue
> > > context for completion and allocate the transaction from that
> > > context. This ensures that no outstanding log reservation is owned
> > > by the ioend completion worker task when it begins to process
> > > ioends.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > >  fs/xfs/xfs_aops.c | 45 +++------------------------------------------
> > >  1 file changed, 3 insertions(+), 42 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > index 1cc7c36d98e9..c1951975bd6a 100644
> > > --- a/fs/xfs/xfs_aops.c
> > > +++ b/fs/xfs/xfs_aops.c
> > > @@ -39,33 +39,6 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
> > >  		XFS_I(ioend->io_inode)->i_d.di_size;
> > >  }
> > >  
> > > -STATIC int
> > > -xfs_setfilesize_trans_alloc(
> > > -	struct iomap_ioend *ioend)
> > > -{
> > > -	struct xfs_mount *mp = XFS_I(ioend->io_inode)->i_mount;
> > > -	struct xfs_trans *tp;
> > > -	int error;
> > > -
> > > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
> > > -	if (error)
> > > -		return error;
> > > -
> > > -	ioend->io_private = tp;
> > > -
> > > -	/*
> > > -	 * We may pass freeze protection with a transaction. So tell lockdep
> > > -	 * we released it.
> > > -	 */
> > > -	__sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
> > > -	/*
> > > -	 * We hand off the transaction to the completion thread now, so
> > > -	 * clear the flag here.
> > > -	 */
> > > -	xfs_trans_clear_context(tp);
> > > -	return 0;
> > > -}
> > > -
> > >  /*
> > >   * Update on-disk file size now that data has been written to disk.
> > >   */
> > > @@ -182,12 +155,10 @@ xfs_end_ioend(
> > >  		error = xfs_reflink_end_cow(ip, offset, size);
> > 
> > Seems reasonable to me. xfs_reflink_end_cow_extent should probably
> > learn how to extend the ondisk EOF as patch 6/4.
> > 
> 
> After looking into this since multiple people asked for it, I'm not as
> convinced it's the right approach as I was on first thought. The reason
> being that the reflink completion code actually remaps in reverse order
> (I suspect due to depending on the same behavior of the bmap unmap code,
> which is used to pull extents from the COW fork..).

Yes. There's no algorithmic reason why _end_cow has to call bunmapi
directly (and thereby remap in reverse order) -- it could log a bunmap
intent item for the data fork before logging the refcount btree intent
and the bmap remap intent. With such a rework in place, we would
effectively be doing the remap in order of increasing file offset,
instead of forwards-backwards like we do now.

If you want to chase that, you might also want to look at this patch[1]
that eliminates the need to requeue bunmap operations.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=reflink-speedups&id=323b0acb563ea50bfff79801083f5ba8544c7985

> That means that while inter-ioend completion takes some care (even if
> not necessarily guaranteed) to order on-disk size updates against
> completion within the associated i_size, a size update along with the
> COW fork remap transaction could result in the opposite in some cases
> (i.e. if the cow fork is fragmented relative to the data fork). I
> suspect this is a rare situation and not a critical problem, but at the
> same time I find it a bit odd to implement intra-ioend ordering
> inconsistent with the broader inter-ioend logic.
> 
> I could probably be convinced otherwise, but I think this warrants more
> thought and is not necessarily a straightforward cleanup. I'm also not
> sure how common disk size updates via COW fork I/O completion really are
> in the first place, so I'm not sure it's worth the oddness to save an
> extra transaction. In any event, I'll probably leave this off from this
> series for now because it doesn't have much to do with fixing the
> deadlock problem and can revisit it if anybody has thoughts on any of
> the above..

The only case I can think of is an extending write to a reflinked file
if (say) you snapshot /var on a live system.

--D

> 
> Brian
> 
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > 
> > --D
> > 
> > >  	else if (ioend->io_type == IOMAP_UNWRITTEN)
> > >  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> > > -	else
> > > -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
> > >  
> > >  done:
> > > -	if (ioend->io_private)
> > > -		error = xfs_setfilesize_ioend(ioend, error);
> > > +	if (!error && xfs_ioend_is_append(ioend))
> > > +		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
> > >  	iomap_finish_ioends(ioend, error);
> > >  	memalloc_nofs_restore(nofs_flag);
> > >  }
> > > @@ -237,7 +208,7 @@ xfs_end_io(
> > >  
> > >  static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
> > >  {
> > > -	return ioend->io_private ||
> > > +	return xfs_ioend_is_append(ioend) ||
> > > 		ioend->io_type == IOMAP_UNWRITTEN ||
> > > 		(ioend->io_flags & IOMAP_F_SHARED);
> > >  }
> > > @@ -250,8 +221,6 @@ xfs_end_bio(
> > >  	struct xfs_inode *ip = XFS_I(ioend->io_inode);
> > >  	unsigned long flags;
> > >  
> > > -	ASSERT(xfs_ioend_needs_workqueue(ioend));
> > > -
> > >  	spin_lock_irqsave(&ip->i_ioend_lock, flags);
> > >  	if (list_empty(&ip->i_ioend_list))
> > >  		WARN_ON_ONCE(!queue_work(ip->i_mount->m_unwritten_workqueue,
> > > @@ -501,14 +470,6 @@ xfs_prepare_ioend(
> > >  				ioend->io_offset, ioend->io_size);
> > >  	}
> > >  
> > > -	/* Reserve log space if we might write beyond the on-disk inode size. */
> > > -	if (!status &&
> > > -	    ((ioend->io_flags & IOMAP_F_SHARED) ||
> > > -	     ioend->io_type != IOMAP_UNWRITTEN) &&
> > > -	    xfs_ioend_is_append(ioend) &&
> > > -	    !ioend->io_private)
> > > -		status = xfs_setfilesize_trans_alloc(ioend);
> > > -
> > >  	memalloc_nofs_restore(nofs_flag);
> > >  
> > >  	if (xfs_ioend_needs_workqueue(ioend))
> > > -- 
> > > 2.26.3
> > > 
> > 
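Since the series moves the on-disk size update into the completion-time
transaction path, it may help to recall what xfs_setfilesize() itself does. A
simplified sketch, paraphrased from memory of the function around this time
(the real code splits the work across a helper, and details may differ):

	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
	if (error)
		return error;

	xfs_ilock(ip, XFS_ILOCK_EXCL);
	isize = xfs_new_eof(ip, offset + size);
	if (!isize) {
		/* The on-disk size already covers this range; nothing to do. */
		xfs_iunlock(ip, XFS_ILOCK_EXCL);
		xfs_trans_cancel(tp);
		return 0;
	}

	/* Extend the on-disk size and commit; the ILOCK is dropped at commit. */
	ip->i_d.di_size = isize;
	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
	return xfs_trans_commit(tp);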
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1cc7c36d98e9..c1951975bd6a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -39,33 +39,6 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
 		XFS_I(ioend->io_inode)->i_d.di_size;
 }
 
-STATIC int
-xfs_setfilesize_trans_alloc(
-	struct iomap_ioend *ioend)
-{
-	struct xfs_mount *mp = XFS_I(ioend->io_inode)->i_mount;
-	struct xfs_trans *tp;
-	int error;
-
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
-	if (error)
-		return error;
-
-	ioend->io_private = tp;
-
-	/*
-	 * We may pass freeze protection with a transaction. So tell lockdep
-	 * we released it.
-	 */
-	__sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
-	/*
-	 * We hand off the transaction to the completion thread now, so
-	 * clear the flag here.
-	 */
-	xfs_trans_clear_context(tp);
-	return 0;
-}
-
 /*
  * Update on-disk file size now that data has been written to disk.
  */
@@ -182,12 +155,10 @@ xfs_end_ioend(
 		error = xfs_reflink_end_cow(ip, offset, size);
 	else if (ioend->io_type == IOMAP_UNWRITTEN)
 		error = xfs_iomap_write_unwritten(ip, offset, size, false);
-	else
-		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
 
 done:
-	if (ioend->io_private)
-		error = xfs_setfilesize_ioend(ioend, error);
+	if (!error && xfs_ioend_is_append(ioend))
+		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
 	iomap_finish_ioends(ioend, error);
 	memalloc_nofs_restore(nofs_flag);
 }
@@ -237,7 +208,7 @@ xfs_end_io(
 
 static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
 {
-	return ioend->io_private ||
+	return xfs_ioend_is_append(ioend) ||
 		ioend->io_type == IOMAP_UNWRITTEN ||
 		(ioend->io_flags & IOMAP_F_SHARED);
 }
@@ -250,8 +221,6 @@ xfs_end_bio(
 	struct xfs_inode *ip = XFS_I(ioend->io_inode);
 	unsigned long flags;
 
-	ASSERT(xfs_ioend_needs_workqueue(ioend));
-
 	spin_lock_irqsave(&ip->i_ioend_lock, flags);
 	if (list_empty(&ip->i_ioend_list))
 		WARN_ON_ONCE(!queue_work(ip->i_mount->m_unwritten_workqueue,
@@ -501,14 +470,6 @@ xfs_prepare_ioend(
 				ioend->io_offset, ioend->io_size);
 	}
 
-	/* Reserve log space if we might write beyond the on-disk inode size. */
-	if (!status &&
-	    ((ioend->io_flags & IOMAP_F_SHARED) ||
-	     ioend->io_type != IOMAP_UNWRITTEN) &&
-	    xfs_ioend_is_append(ioend) &&
-	    !ioend->io_private)
-		status = xfs_setfilesize_trans_alloc(ioend);
-
 	memalloc_nofs_restore(nofs_flag);
 
 	if (xfs_ioend_needs_workqueue(ioend))
Per-inode ioend completion batching has a log reservation deadlock
vector between preallocated append transactions and transactions
that are acquired at completion time for other purposes (i.e.,
unwritten extent conversion or COW fork remaps). For example, if the
ioend completion workqueue task executes on a batch of ioends that
are sorted such that an append ioend sits at the tail, it's possible
for the outstanding append transaction reservation to block
allocation of transactions required to process preceding ioends in
the list.

Append ioend completion is historically the common path for on-disk
inode size updates. While file extending writes may have completed
sometime earlier, the on-disk inode size is only updated after
successful writeback completion. These transactions are preallocated
serially from writeback context to mitigate concurrency and
associated log reservation pressure across completions processed by
multi-threaded workqueue tasks.

However, now that delalloc blocks unconditionally map to unwritten
extents at physical block allocation time, size updates via append
ioends are relatively rare. This means that inode size updates most
commonly occur as part of the preexisting completion time
transaction to convert unwritten extents. As a result, there is no
longer a strong need to preallocate size update transactions.

Remove the preallocation of inode size update transactions to avoid
the ioend completion processing log reservation deadlock. Instead,
continue to send all potential size extending ioends to workqueue
context for completion and allocate the transaction from that
context. This ensures that no outstanding log reservation is owned
by the ioend completion worker task when it begins to process
ioends.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_aops.c | 45 +++------------------------------------------
 1 file changed, 3 insertions(+), 42 deletions(-)
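For context on the batching described in the first paragraph, the per-inode
completion worker that processes these ioend lists has roughly the following
shape (a simplified sketch; the real xfs_end_io() of this era also merges
adjacent ioends before completing them). The list is sorted by file offset, so
a size-extending ioend tends to land at the tail of the batch, which is how a
preallocated append transaction reservation could sit held while the earlier
entries try to allocate their own transactions:

static void
xfs_end_io(
	struct work_struct	*work)
{
	struct xfs_inode	*ip =
		container_of(work, struct xfs_inode, i_ioend_work);
	struct iomap_ioend	*ioend;
	struct list_head	tmp;
	unsigned long		flags;

	/* Pull the inode's pending ioend list off in one shot. */
	spin_lock_irqsave(&ip->i_ioend_lock, flags);
	list_replace_init(&ip->i_ioend_list, &tmp);
	spin_unlock_irqrestore(&ip->i_ioend_lock, flags);

	/* Process the batch in ascending file offset order. */
	iomap_sort_ioends(&tmp);
	while ((ioend = list_first_entry_or_null(&tmp, struct iomap_ioend,
			io_list))) {
		list_del_init(&ioend->io_list);
		xfs_end_ioend(ioend);
	}
}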