
[20/33] lustre: llite: Fix return for non-queued aio

Message ID 20250202204633.1148872-21-jsimmons@infradead.org (mailing list archive)
State New
Series lustre: sync to OpenSFS branch May 31, 2023

Commit Message

James Simmons Feb. 2, 2025, 8:46 p.m. UTC
From: Patrick Farrell <pfarrell@whamcloud.com>

If an AIO fails or is completed synchronously (even
partially), the VFS will handle calling the completion
callback to finish the AIO, and so Lustre needs to return
the number of bytes successfully completed to the VFS.

This fixes a bug where if an AIO was racing with buffered
I/O, the AIO would fall back to buffered I/O, causing it to
complete before returning to the VFS rather than being
queued.  In this case, Lustre would return 0 to the VFS, and
the VFS would complete the AIO and report 0 bytes moved.

This patch corrects the return handling for that case.
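
For illustration only (not part of the patch), the convention the fix
enforces can be sketched in a small standalone C program.  The helper
aio_return_to_vfs() is a made-up name, and EIOCBQUEUED is redefined here
only because it is a kernel-internal errno that userspace headers do not
provide; the real decision lives in ll_file_io_generic():

/* Standalone sketch of the return convention; not llite code. */
#include <stdio.h>

#define EIOCBQUEUED	529	/* kernel-internal value, redefined for the sketch */

/*
 * What ultimately reaches the VFS caller:
 *  - AIO queued asynchronously            -> -EIOCBQUEUED; Lustre's end_io
 *                                            will call aio_complete() itself
 *  - AIO failed or completed synchronously
 *    (even partially)                     -> bytes moved if any, else the
 *                                            error; the VFS completes the AIO
 */
static long aio_return_to_vfs(int is_aio, int rc, long bytes_moved)
{
	if (is_aio && rc == -EIOCBQUEUED)
		return -EIOCBQUEUED;
	if (bytes_moved > 0)
		return bytes_moved;
	return rc;
}

int main(void)
{
	printf("queued aio       -> %ld\n", aio_return_to_vfs(1, -EIOCBQUEUED, 0));
	printf("sync aio, 4096 B -> %ld\n", aio_return_to_vfs(1, 0, 4096));
	printf("failed aio       -> %ld\n", aio_return_to_vfs(1, -5 /* -EIO */, 0));
	return 0;
}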

WC-bug-id: https://jira.whamcloud.com/browse/LU-13805
Lustre-commit: 8a5bb81f774b9d41f ("LU-13805 llite: Fix return for non-queued aio")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49915
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 51 +++++++++++++++++++++++-------------------
 fs/lustre/llite/rw26.c |  1 -
 2 files changed, 28 insertions(+), 24 deletions(-)

Patch

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index fc9095279a4b..e745dc8c53a5 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -1832,21 +1832,12 @@  ll_file_io_generic(const struct lu_env *env, struct vvp_io_args *args,
 		range_locked = false;
 	}
 
-	/*
-	 * In order to move forward AIO, ci_nob was increased,
-	 * but that doesn't mean io have been finished, it just
-	 * means io have been submited, we will always return
-	 * EIOCBQUEUED to the caller, So we could only return
-	 * number of bytes in non-AIO case.
-	 */
 	if (io->ci_nob > 0) {
-		if (!is_aio) {
-			if (rc2 == 0) {
-				result += io->ci_nob;
-				*ppos = io->u.ci_wr.wr.crw_pos; /* for splice */
-			} else if (rc2) {
-				result = 0;
-			}
+		if (rc2 == 0) {
+			result += io->ci_nob;
+			*ppos = io->u.ci_wr.wr.crw_pos; /* for splice */
+		} else if (rc2) {
+			result = 0;
 		}
 		count -= io->ci_nob;
 
@@ -1886,22 +1877,36 @@  ll_file_io_generic(const struct lu_env *env, struct vvp_io_args *args,
 	}
 
 	if (io->ci_dio_aio) {
+		/* set the number of bytes successfully moved in the aio */
+		if (result > 0)
+			io->ci_dio_aio->cda_bytes = result;
 		/*
 		 * VFS will call aio_complete() if no -EIOCBQUEUED
 		 * is returned for AIO, so we can not call aio_complete()
-		 * in our end_io().
+		 * in our end_io().  (cda_no_aio_complete is always set for
+		 * normal DIO.)
 		 *
-		 * NB: This is safe because the atomic_dec_and_lock  in
-		 * cl_sync_io_init has implicit memory barriers, so this will
-		 * be seen by whichever thread completes the DIO/AIO, even if
-		 * it's not this one
+		 * NB: Setting cda_no_aio_complete like this is safe because
+		 * the atomic_dec_and_lock in cl_sync_io_note has implicit
+		 * memory barriers, so this will be seen by whichever thread
+		 * completes the DIO/AIO, even if it's not this one.
 		 */
-		if (rc != -EIOCBQUEUED)
+		if (is_aio && rc != -EIOCBQUEUED)
 			io->ci_dio_aio->cda_no_aio_complete = 1;
+		/* if an aio enqueued successfully (-EIOCBQUEUED), then Lustre
+		 * will call aio_complete rather than the vfs, so we return 0
+		 * to tell the VFS we're handling it
+		 */
+		else if (is_aio) /* rc == -EIOCBQUEUED */
+			result = 0;
 		/**
-		 * Drop one extra reference so that end_io() could be
-		 * called for this IO context, we could call it after
-		 * we make sure all AIO requests have been proceed.
+		 * Drop the reference held by the llite layer on this top level
+		 * IO context.
+		 *
+		 * For DIO, this frees it here, since IO is complete, and for
+		 * AIO, we will call aio_complete() (and then free this top
+		 * level context) once all the outstanding chunks of this AIO
+		 * have completed.
 		 */
 		cl_sync_io_note(env, &io->ci_dio_aio->cda_sync,
 				rc == -EIOCBQUEUED ? 0 : rc);
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index ad7308a8c902..9239e029276e 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -425,7 +425,6 @@  static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	}
 
 out:
-	ll_dio_aio->cda_bytes += tot_bytes;
 
 	if (rw == WRITE)
 		vio->u.readwrite.vui_written += tot_bytes;