diff mbox series

nfs: Avoid flushing many pages with NFS_FILE_SYNC

Message ID 20240524161419.18448-1-jack@suse.cz (mailing list archive)
State New
Series nfs: Avoid flushing many pages with NFS_FILE_SYNC

Commit Message

Jan Kara May 24, 2024, 4:14 p.m. UTC
When we are doing WB_SYNC_ALL writeback, nfs submits write requests with
NFS_FILE_SYNC flag to the server (which then generally treats it as an
O_SYNC write). This helps to reduce latency for single requests but when
submitting more requests, additional fsyncs on the server side hurt
latency. NFS generally avoids this additional overhead by not setting
NFS_FILE_SYNC if desc->pg_moreio is set.

However, this logic doesn't always work. When we do random 4k writes to a
huge file and then call fsync(2), each page writeback is sent with
NFS_FILE_SYNC: after preparing one page for writeback and moving on to the
next, nfs_do_writepage() calls nfs_pageio_cond_complete(), which finds the
page is not contiguous with the previously prepared IO and submits it
*without* setting desc->pg_moreio. Hence NFS_FILE_SYNC is used, resulting
in poor performance.

Fix the problem by setting desc->pg_moreio in nfs_pageio_cond_complete() before
submitting outstanding IO. This improves throughput of
fsync-after-random-writes on my test SSD from ~70MB/s to ~250MB/s.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/nfs/pagelist.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Trond Myklebust May 24, 2024, 4:25 p.m. UTC | #1
On Fri, 2024-05-24 at 18:14 +0200, Jan Kara wrote:
> When we are doing WB_SYNC_ALL writeback, nfs submits write requests with
> NFS_FILE_SYNC flag to the server (which then generally treats it as an
> O_SYNC write). This helps to reduce latency for single requests but when
> submitting more requests, additional fsyncs on the server side hurt
> latency. NFS generally avoids this additional overhead by not setting
> NFS_FILE_SYNC if desc->pg_moreio is set.
> 
> However, this logic doesn't always work. When we do random 4k writes to a
> huge file and then call fsync(2), each page writeback is sent with
> NFS_FILE_SYNC: after preparing one page for writeback and moving on to the
> next, nfs_do_writepage() calls nfs_pageio_cond_complete(), which finds the
> page is not contiguous with the previously prepared IO and submits it
> *without* setting desc->pg_moreio. Hence NFS_FILE_SYNC is used, resulting
> in poor performance.
> 
> Fix the problem by setting desc->pg_moreio in nfs_pageio_cond_complete()
> before submitting outstanding IO. This improves throughput of
> fsync-after-random-writes on my test SSD from ~70MB/s to ~250MB/s.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/nfs/pagelist.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 6efb5068c116..040b6b79c75e 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -1545,6 +1545,11 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>  					continue;
>  			} else if (index == prev->wb_index + 1)
>  				continue;
> +			/*
> +			 * We will submit more requests after these. Indicate
> +			 * this to the underlying layers.
> +			 */
> +			desc->pg_moreio = 1;
>  			nfs_pageio_complete(desc);
>  			break;
>  		}

Thanks!
Jeff Layton May 26, 2024, 11:36 a.m. UTC | #2
On Fri, 2024-05-24 at 18:14 +0200, Jan Kara wrote:
> When we are doing WB_SYNC_ALL writeback, nfs submits write requests with
> NFS_FILE_SYNC flag to the server (which then generally treats it as an
> O_SYNC write). This helps to reduce latency for single requests but when
> submitting more requests, additional fsyncs on the server side hurt
> latency. NFS generally avoids this additional overhead by not setting
> NFS_FILE_SYNC if desc->pg_moreio is set.
> 
> However, this logic doesn't always work. When we do random 4k writes to a
> huge file and then call fsync(2), each page writeback is sent with
> NFS_FILE_SYNC: after preparing one page for writeback and moving on to the
> next, nfs_do_writepage() calls nfs_pageio_cond_complete(), which finds the
> page is not contiguous with the previously prepared IO and submits it
> *without* setting desc->pg_moreio. Hence NFS_FILE_SYNC is used, resulting
> in poor performance.
> 
> Fix the problem by setting desc->pg_moreio in nfs_pageio_cond_complete() before
> submitting outstanding IO. This improves throughput of
> fsync-after-random-writes on my test SSD from ~70MB/s to ~250MB/s.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/nfs/pagelist.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 6efb5068c116..040b6b79c75e 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -1545,6 +1545,11 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>  					continue;
>  			} else if (index == prev->wb_index + 1)
>  				continue;
> +			/*
> +			 * We will submit more requests after these. Indicate
> +			 * this to the underlying layers.
> +			 */
> +			desc->pg_moreio = 1;
>  			nfs_pageio_complete(desc);
>  			break;
>  		}

Nice work!

Reviewed-by: Jeff Layton <jlayton@kernel.org>

Patch

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 6efb5068c116..040b6b79c75e 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -1545,6 +1545,11 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
 					continue;
 			} else if (index == prev->wb_index + 1)
 				continue;
+			/*
+			 * We will submit more requests after these. Indicate
+			 * this to the underlying layers.
+			 */
+			desc->pg_moreio = 1;
 			nfs_pageio_complete(desc);
 			break;
 		}