[25/29] xfs: support nowait for xfs_buf_item_init()

Message ID	20230825135431.1317785-26-hao.xu@linux.dev (mailing list archive)
State	New
Headers	show Return-Path: <owner-linux-mm@kvack.org> From: Hao Xu <hao.xu@linux.dev> To: io-uring@vger.kernel.org, Jens Axboe <axboe@kernel.dk> Cc: Dominique Martinet <asmadeus@codewreck.org>, Pavel Begunkov <asml.silence@gmail.com>, Christian Brauner <brauner@kernel.org>, Alexander Viro <viro@zeniv.linux.org.uk>, Stefan Roesch <shr@fb.com>, Clay Harris <bugs@claycon.org>, Dave Chinner <david@fromorbit.com>, "Darrick J . Wong" <djwong@kernel.org>, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-cachefs@redhat.com, ecryptfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-unionfs@vger.kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, codalist@coda.cs.cmu.edu, linux-f2fs-devel@lists.sourceforge.net, cluster-devel@redhat.com, linux-mm@kvack.org, linux-nilfs@vger.kernel.org, devel@lists.orangefs.org, linux-cifs@vger.kernel.org, samba-technical@lists.samba.org, linux-mtd@lists.infradead.org, Wanpeng Li <wanpengli@tencent.com> Subject: [PATCH 25/29] xfs: support nowait for xfs_buf_item_init() Date: Fri, 25 Aug 2023 21:54:27 +0800 Message-Id: <20230825135431.1317785-26-hao.xu@linux.dev> In-Reply-To: <20230825135431.1317785-1-hao.xu@linux.dev> References: <20230825135431.1317785-1-hao.xu@linux.dev> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	io_uring getdents \| expand [RFC,v5,00/29] io_uring getdents [01/29] fs: split off vfs_getdents function of getdents64 syscall [02/29] xfs: rename XBF_TRYLOCK to XBF_NOWAIT [03/29] xfs: add NOWAIT semantics for readdir [04/29] vfs: add nowait flag for struct dir_context [05/29] vfs: add a vfs helper for io_uring file pos lock [06/29] vfs: add file_pos_unlock() for io_uring usage [07/29] vfs: add a nowait parameter for touch_atime() [08/29] vfs: add nowait parameter for file_accessed() [09/29] vfs: move file_accessed() to the beginning of iterate_dir() [10/29] vfs: add S_NOWAIT for nowait time update [11/29] vfs: trylock inode->i_rwsem in iterate_dir() to support nowait [12/29] xfs: enforce GFP_NOIO implicitly during nowait time update [13/29] xfs: make xfs_trans_alloc() support nowait semantics [14/29] xfs: support nowait for xfs_log_reserve() [15/29] xfs: don't wait for free space in xlog_grant_head_check() in nowait case [16/29] xfs: add nowait parameter for xfs_inode_item_init() [17/29] xfs: make xfs_trans_ijoin() error out -EAGAIN [18/29] xfs: set XBF_NOWAIT for xfs_buf_read_map if necessary [19/29] xfs: support nowait memory allocation in _xfs_buf_alloc() [20/29] xfs: distinguish error type of memory allocation failure for nowait case [21/29] xfs: return -EAGAIN when bulk memory allocation fails in nowait case [22/29] xfs: comment page allocation for nowait case in xfs_buf_find_insert() [23/29] xfs: don't print warn info for -EAGAIN error in xfs_buf_get_map() [24/29] xfs: support nowait for xfs_buf_read_map() [25/29] xfs: support nowait for xfs_buf_item_init() [26/29] xfs: return -EAGAIN when nowait meets sync in transaction commit [27/29] xfs: add a comment for xlog_kvmalloc() [28/29] xfs: support nowait semantics for xc_ctx_lock in xlog_cil_commit() [29/29] io_uring: add support for getdents

Message ID

20230825135431.1317785-26-hao.xu@linux.dev (mailing list archive)

State

New

Headers

From: Hao Xu <hao.xu@linux.dev>
To: io-uring@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>
Cc: Dominique Martinet <asmadeus@codewreck.org>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Christian Brauner <brauner@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Stefan Roesch <shr@fb.com>,
	Clay Harris <bugs@claycon.org>,
	Dave Chinner <david@fromorbit.com>,
	"Darrick J . Wong" <djwong@kernel.org>,
	linux-fsdevel@vger.kernel.org,
	linux-xfs@vger.kernel.org,
	linux-ext4@vger.kernel.org,
	linux-cachefs@redhat.com,
	ecryptfs@vger.kernel.org,
	linux-nfs@vger.kernel.org,
	linux-unionfs@vger.kernel.org,
	bpf@vger.kernel.org,
	netdev@vger.kernel.org,
	linux-s390@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org,
	linux-btrfs@vger.kernel.org,
	codalist@coda.cs.cmu.edu,
	linux-f2fs-devel@lists.sourceforge.net,
	cluster-devel@redhat.com,
	linux-mm@kvack.org,
	linux-nilfs@vger.kernel.org,
	devel@lists.orangefs.org,
	linux-cifs@vger.kernel.org,
	samba-technical@lists.samba.org,
	linux-mtd@lists.infradead.org,
	Wanpeng Li <wanpengli@tencent.com>
Subject: [PATCH 25/29] xfs: support nowait for xfs_buf_item_init()
Date: Fri, 25 Aug 2023 21:54:27 +0800
Message-Id: <20230825135431.1317785-26-hao.xu@linux.dev>
In-Reply-To: <20230825135431.1317785-1-hao.xu@linux.dev>
References: <20230825135431.1317785-1-hao.xu@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

io_uring getdents | expand

Commit Message

Hao Xu Aug. 25, 2023, 1:54 p.m. UTC

From: Hao Xu <howeyxu@tencent.com>

support nowait for xfs_buf_item_init() and error out -EAGAIN to
_xfs_trans_bjoin() when it would block.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
---
 fs/xfs/xfs_buf_item.c         |  9 +++++++--
 fs/xfs/xfs_buf_item.h         |  2 +-
 fs/xfs/xfs_buf_item_recover.c |  2 +-
 fs/xfs/xfs_trans_buf.c        | 16 +++++++++++++---
 4 files changed, 22 insertions(+), 7 deletions(-)

Comments

Dave Chinner Aug. 25, 2023, 10:16 p.m. UTC | #1

On Fri, Aug 25, 2023 at 09:54:27PM +0800, Hao Xu wrote:
> From: Hao Xu <howeyxu@tencent.com>
> 
> support nowait for xfs_buf_item_init() and error out -EAGAIN to
> _xfs_trans_bjoin() when it would block.
> 
> Signed-off-by: Hao Xu <howeyxu@tencent.com>
> ---
>  fs/xfs/xfs_buf_item.c         |  9 +++++++--
>  fs/xfs/xfs_buf_item.h         |  2 +-
>  fs/xfs/xfs_buf_item_recover.c |  2 +-
>  fs/xfs/xfs_trans_buf.c        | 16 +++++++++++++---
>  4 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 023d4e0385dd..b1e63137d65b 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -827,7 +827,8 @@ xfs_buf_item_free_format(
>  int
>  xfs_buf_item_init(
>  	struct xfs_buf	*bp,
> -	struct xfs_mount *mp)
> +	struct xfs_mount *mp,
> +	bool   nowait)
>  {
>  	struct xfs_buf_log_item	*bip = bp->b_log_item;
>  	int			chunks;
> @@ -847,7 +848,11 @@ xfs_buf_item_init(
>  		return 0;
>  	}
>  
> -	bip = kmem_cache_zalloc(xfs_buf_item_cache, GFP_KERNEL | __GFP_NOFAIL);
> +	bip = kmem_cache_zalloc(xfs_buf_item_cache,
> +				GFP_KERNEL | (nowait ? 0 : __GFP_NOFAIL));
> +	if (!bip)
> +		return -EAGAIN;
> +
>  	xfs_log_item_init(mp, &bip->bli_item, XFS_LI_BUF, &xfs_buf_item_ops);
>  	bip->bli_buf = bp;

I see filesystem shutdowns....

> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 016371f58f26..a1e4f2e8629a 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -57,13 +57,14 @@ xfs_trans_buf_item_match(
>   * If the buffer does not yet have a buf log item associated with it,
>   * then allocate one for it.  Then add the buf item to the transaction.
>   */
> -STATIC void
> +STATIC int
>  _xfs_trans_bjoin(
>  	struct xfs_trans	*tp,
>  	struct xfs_buf		*bp,
>  	int			reset_recur)
>  {
>  	struct xfs_buf_log_item	*bip;
> +	int ret;
>  
>  	ASSERT(bp->b_transp == NULL);
>  
> @@ -72,7 +73,11 @@ _xfs_trans_bjoin(
>  	 * it doesn't have one yet, then allocate one and initialize it.
>  	 * The checks to see if one is there are in xfs_buf_item_init().
>  	 */
> -	xfs_buf_item_init(bp, tp->t_mountp);
> +	ret = xfs_buf_item_init(bp, tp->t_mountp,
> +				tp->t_flags & XFS_TRANS_NOWAIT);
> +	if (ret < 0)
> +		return ret;
> +
>  	bip = bp->b_log_item;
>  	ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
>  	ASSERT(!(bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
> @@ -92,6 +97,7 @@ _xfs_trans_bjoin(
>  	xfs_trans_add_item(tp, &bip->bli_item);
>  	bp->b_transp = tp;
>  
> +	return 0;
>  }
>  
>  void
> @@ -309,7 +315,11 @@ xfs_trans_read_buf_map(
>  	}
>  
>  	if (tp) {
> -		_xfs_trans_bjoin(tp, bp, 1);
> +		error = _xfs_trans_bjoin(tp, bp, 1);
> +		if (error) {
> +			xfs_buf_relse(bp);
> +			return error;
> +		}
>  		trace_xfs_trans_read_buf(bp->b_log_item);

So what happens at the callers when we have a dirty transaction and
joining a buffer fails with -EAGAIN?

Apart from the fact this may well propagate -EAGAIN up to userspace,
cancelling a dirty transaction at this point will result in a
filesystem shutdown....

Indeed, this can happen in the "simple" timestamp update case that
this "nowait" semantic is being aimed at. We log the inode in the
timestamp update, which dirties the log item and registers a
precommit operation to be run. We commit the
transaction, which then runs xfs_inode_item_precommit() and that
may need to attach the inode to the inode cluster buffer. This
results in:

xfs_inode_item_precommit
  xfs_imap_to_bp
    xfs_trans_read_buf_map
      _xfs_trans_bjoin
        xfs_buf_item_init(XFS_TRANS_NOWAIT)
	  kmem_cache_zalloc(GFP_NOFS)
	  <memory allocation fails>
      gets -EAGAIN error
    propagates -EAGAIN
  fails due to -EAGAIN

And now xfs_trans_commit() fails with a dirty transaction and the
filesystem shuts down.

IOWs, XFS_TRANS_NOWAIT as it stands is fundamentally broken. Once we
dirty an item in a transaction, we *cannot* back out of the
transaction. We *must block* in every place that could fail -
locking, memory allocation and/or IO - until the transaction
completes because we cannot undo the changes we've already made to
the dirty items in the transaction....

It's even worse than that - once we have committed intents, the
whole chain of intent processing must be run to completionr. Hence
we can't tolerate backing out of that defered processing chain half
way through because we might have to block.

Until we can roll back partial dirty transactions and partially
completed defered intent chains at any random point of completion,
XFS_TRANS_NOWAIT will not work.

-Dave.

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 023d4e0385dd..b1e63137d65b 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -827,7 +827,8 @@  xfs_buf_item_free_format(
 int
 xfs_buf_item_init(
 	struct xfs_buf	*bp,
-	struct xfs_mount *mp)
+	struct xfs_mount *mp,
+	bool   nowait)
 {
 	struct xfs_buf_log_item	*bip = bp->b_log_item;
 	int			chunks;
@@ -847,7 +848,11 @@  xfs_buf_item_init(
 		return 0;
 	}
 
-	bip = kmem_cache_zalloc(xfs_buf_item_cache, GFP_KERNEL | __GFP_NOFAIL);
+	bip = kmem_cache_zalloc(xfs_buf_item_cache,
+				GFP_KERNEL | (nowait ? 0 : __GFP_NOFAIL));
+	if (!bip)
+		return -EAGAIN;
+
 	xfs_log_item_init(mp, &bip->bli_item, XFS_LI_BUF, &xfs_buf_item_ops);
 	bip->bli_buf = bp;
 
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 4d8a6aece995..b1daf8988280 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -47,7 +47,7 @@  struct xfs_buf_log_item {
 	struct xfs_buf_log_format __bli_format;	/* embedded in-log header */
 };
 
-int	xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *);
+int	xfs_buf_item_init(struct xfs_buf *bp, struct xfs_mount *mp, bool nowait);
 void	xfs_buf_item_done(struct xfs_buf *bp);
 void	xfs_buf_item_relse(struct xfs_buf *);
 bool	xfs_buf_item_put(struct xfs_buf_log_item *);
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index 43167f543afc..aa64d5a499d6 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -429,7 +429,7 @@  xlog_recover_validate_buf_type(
 		struct xfs_buf_log_item	*bip;
 
 		bp->b_flags |= _XBF_LOGRECOVERY;
-		xfs_buf_item_init(bp, mp);
+		xfs_buf_item_init(bp, mp, false);
 		bip = bp->b_log_item;
 		bip->bli_item.li_lsn = current_lsn;
 	}
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 016371f58f26..a1e4f2e8629a 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -57,13 +57,14 @@  xfs_trans_buf_item_match(
  * If the buffer does not yet have a buf log item associated with it,
  * then allocate one for it.  Then add the buf item to the transaction.
  */
-STATIC void
+STATIC int
 _xfs_trans_bjoin(
 	struct xfs_trans	*tp,
 	struct xfs_buf		*bp,
 	int			reset_recur)
 {
 	struct xfs_buf_log_item	*bip;
+	int ret;
 
 	ASSERT(bp->b_transp == NULL);
 
@@ -72,7 +73,11 @@  _xfs_trans_bjoin(
 	 * it doesn't have one yet, then allocate one and initialize it.
 	 * The checks to see if one is there are in xfs_buf_item_init().
 	 */
-	xfs_buf_item_init(bp, tp->t_mountp);
+	ret = xfs_buf_item_init(bp, tp->t_mountp,
+				tp->t_flags & XFS_TRANS_NOWAIT);
+	if (ret < 0)
+		return ret;
+
 	bip = bp->b_log_item;
 	ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
 	ASSERT(!(bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
@@ -92,6 +97,7 @@  _xfs_trans_bjoin(
 	xfs_trans_add_item(tp, &bip->bli_item);
 	bp->b_transp = tp;
 
+	return 0;
 }
 
 void
@@ -309,7 +315,11 @@  xfs_trans_read_buf_map(
 	}
 
 	if (tp) {
-		_xfs_trans_bjoin(tp, bp, 1);
+		error = _xfs_trans_bjoin(tp, bp, 1);
+		if (error) {
+			xfs_buf_relse(bp);
+			return error;
+		}
 		trace_xfs_trans_read_buf(bp->b_log_item);
 	}
 	ASSERT(bp->b_ops != NULL || ops == NULL);

[25/29] xfs: support nowait for xfs_buf_item_init()

Commit Message

Comments

Patch