[02/23] MM: extend block-plugging to cover all swap reads with read-ahead

Message ID: 164299611274.26253.13900771841681128440.stgit@noble.brown
State: New
Series: Repair SWAP-over_NFS

Commit Message

NeilBrown Jan. 24, 2022, 3:48 a.m. UTC
Code that does swap read-ahead uses blk_start_plug() and
blk_finish_plug() to allow lower levels to combine multiple read-ahead
pages into a single request, but calls blk_finish_plug() *before*
submitting the original (non-ahead) read request.
This missed an opportunity to combine read requests.

This patch moves the blk_finish_plug to *after* all the reads.
This will likely combine the primary read with some of the "ahead"
reads, and that may slightly increase the latency of that read, but it
should more than make up for this by making more efficient use of the
storage path.

The patch mostly makes the code look more consistent.  Performance
change is unlikely to be noticeable.

Fixes-no-auto-backport: 3fb5c298b04e ("swap: allow swap readahead to be merged")
Signed-off-by: NeilBrown <neilb@suse.de>
---
 mm/swap_state.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
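
For readers less familiar with block plugging, the pattern at issue
looks roughly like this (a simplified sketch of swap_cluster_readahead()
as it stands after this patch, paraphrasing the diff below rather than
quoting the kernel source verbatim):

    struct blk_plug plug;

    blk_start_plug(&plug);
    for (offset = start_offset; offset <= end_offset; offset++) {
        /* Queue one readahead page; the plug holds the I/O back
         * so the block layer can merge adjacent requests. */
        page = __read_swap_cache_async(
                swp_entry(swp_type(entry), offset),
                gfp_mask, vma, addr, &page_allocated);
        if (!page)
            continue;
        if (page_allocated)
            swap_readpage(page, false);
        put_page(page);
    }
    lru_add_drain();
    /* The "primary" read; with this patch it is issued before
     * blk_finish_plug(), so it too can be merged. */
    page = read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll);
    blk_finish_plug(&plug);
    return page;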

Comments

Christoph Hellwig Jan. 24, 2022, 7:27 a.m. UTC | #1
On Mon, Jan 24, 2022 at 02:48:32PM +1100, NeilBrown wrote:
> Code that does swap read-ahead uses blk_start_plug() and
> blk_finish_plug() to allow lower levels to combine multiple read-ahead
> pages into a single request, but calls blk_finish_plug() *before*
> submitting the original (non-ahead) read request.
> This missed an opportunity to combine read requests.
> 
> This patch moves the blk_finish_plug to *after* all the reads.
> This will likely combine the primary read with some of the "ahead"
> reads, and that may slightly increase the latency of that read, but it
> should more than make up for this by making more efficient use of the
> storage path.
> 
> The patch mostly makes the code look more consistent.  Performance
> change is unlikely to be noticeable.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

> Fixes-no-auto-backport: 3fb5c298b04e ("swap: allow swap readahead to be merged")

Is this really a thing?
NeilBrown Jan. 26, 2022, 9:47 p.m. UTC | #2
On Mon, 24 Jan 2022, Christoph Hellwig wrote:
> On Mon, Jan 24, 2022 at 02:48:32PM +1100, NeilBrown wrote:
> > Code that does swap read-ahead uses blk_start_plug() and
> > blk_finish_plug() to allow lower levels to combine multiple read-ahead
> > pages into a single request, but calls blk_finish_plug() *before*
> > submitting the original (non-ahead) read request.
> > This missed an opportunity to combine read requests.
> > 
> > This patch moves the blk_finish_plug to *after* all the reads.
> > This will likely combine the primary read with some of the "ahead"
> > reads, and that may slightly increase the latency of that read, but it
> > should more than make up for this by making more efficient use of the
> > storage path.
> > 
> > The patch mostly makes the code look more consistent.  Performance
> > change is unlikely to be noticeable.
> 
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks.
> 
> > Fixes-no-auto-backport: 3fb5c298b04e ("swap: allow swap readahead to be merged")
> 
> Is this really a thing?
Maybe it should be.....
As I'm sure you guessed, I think it is valuable to record this
connection between commits, but I don't like the hasty automatic
backporting of patches where the (unknown) risk might exceed the (known)
value.  This is how I choose to state my displeasure.

Thanks,
NeilBrown
Hugh Dickins Jan. 26, 2022, 11:09 p.m. UTC | #3
On Thu, 27 Jan 2022, NeilBrown wrote:
> On Mon, 24 Jan 2022, Christoph Hellwig wrote:
> > On Mon, Jan 24, 2022 at 02:48:32PM +1100, NeilBrown wrote:
> > > Code that does swap read-ahead uses blk_start_plug() and
> > > blk_finish_plug() to allow lower levels to combine multiple read-ahead
> > > pages into a single request, but calls blk_finish_plug() *before*
> > > submitting the original (non-ahead) read request.
> > > This missed an opportunity to combine read requests.

No, you're misunderstanding there.  All the necessary reads are issued
within the loop, between the plug and unplug: it does not skip over
the target page in the loop, but issues its read along with the rest.

But it has not kept any of those pages locked, nor even kept any
refcounts raised: so at the end has to look up the target page again
with the final read_swap_cache_async() (which also copes with the
highly unlikely case that the page got swapped out again meanwhile).
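
For reference, read_swap_cache_async() is (roughly, in the kernel this
series is based on) a thin wrapper that only issues I/O when the page
was not already in the swap cache:

    struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
                struct vm_area_struct *vma, unsigned long addr,
                bool do_poll)
    {
        bool page_was_allocated;
        struct page *retpage = __read_swap_cache_async(entry, gfp_mask,
                vma, addr, &page_was_allocated);

        /* Only a newly allocated page needs reading.  A swap-cache
         * hit (the common case here, since the loop already read the
         * target page) issues no new I/O. */
        if (page_was_allocated)
            swap_readpage(retpage, do_poll);

        return retpage;
    }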

> > > 
> > > This patch moves the blk_finish_plug to *after* all the reads.
> > > This will likely combine the primary read with some of the "ahead"
> > > reads, and that may slightly increase the latency of that read, but it
> > > should more than make up for this by making more efficient use of the
> > > storage path.
> > > 
> > > The patch mostly makes the code look more consistent.  Performance
> > > change is unlikely to be noticeable.
> > 
> > Looks good:
> > 
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Thanks.
> > 
> > > Fixes-no-auto-backport: 3fb5c298b04e ("swap: allow swap readahead to be merged")
> > 
> > Is this really a thing?
> Maybe it should be.....
> As I'm sure you guessed, I think it is valuable to record this
> connection between commits, but I don't like the hasty automatic
> backporting of patches where the (unknown) risk might exceed the (known)
> value.  This is how I choose to state my displeasure.

I don't suppose your patch does any actual harm (beyond propagating a
misunderstanding), but it's certainly not a fix, and I think should
simply be dropped from the series.

(But please don't expect any comment from me on the rest:
SWP_FS_OPS has always been beyond my understanding.)

Hugh
NeilBrown Jan. 27, 2022, 12:32 a.m. UTC | #4
On Thu, 27 Jan 2022, Hugh Dickins wrote:
> On Thu, 27 Jan 2022, NeilBrown wrote:
> > On Mon, 24 Jan 2022, Christoph Hellwig wrote:
> > > On Mon, Jan 24, 2022 at 02:48:32PM +1100, NeilBrown wrote:
> > > > Code that does swap read-ahead uses blk_start_plug() and
> > > > blk_finish_plug() to allow lower levels to combine multiple read-ahead
> > > > pages into a single request, but calls blk_finish_plug() *before*
> > > > submitting the original (non-ahead) read request.
> > > > This missed an opportunity to combine read requests.
> 
> No, you're misunderstanding there.  All the necessary reads are issued
> within the loop, between the plug and unplug: it does not skip over
> the target page in the loop, but issues its read along with the rest.
> 
> But it has not kept any of those pages locked, nor even kept any
> refcounts raised: so at the end has to look up the target page again
> with the final read_swap_cache_async() (which also copes with the
> highly unlikely case that the page got swapped out again meanwhile).
> 
....
> 
> I don't suppose your patch does any actual harm (beyond propagating a
> misunderstanding), but it's certainly not a fix, and I think should
> simply be dropped from the series.

Thanks - I had missed that.  The code is correct, but looks wrong (to
me).
I've dropped the patch, but added a comment when I add
"swap_read_unplug()" to explain while plugging isn't needed for that
final read_swap_cache_async().
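
Roughly the reasoning such a comment needs to capture (a paraphrase of
Hugh's point above, not the actual comment text):

    /*
     * No plug is needed around this read: the target page was already
     * read inside the plugged loop above, so read_swap_cache_async()
     * normally just finds it in the swap cache and issues no new I/O.
     */
    page = read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll);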

> 
> (But please don't expect any comment from me on the rest:
> SWP_FS_OPS has always been beyond my understanding.)

:-)

Thanks,
NeilBrown

Patch

diff --git a/mm/swap_state.c b/mm/swap_state.c
index bb38453425c7..093ecf864200 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -625,6 +625,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long addr = vmf->address;
 
+	blk_start_plug(&plug);
 	mask = swapin_nr_pages(offset) - 1;
 	if (!mask)
 		goto skip;
@@ -638,7 +639,6 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	if (end_offset >= si->max)
 		end_offset = si->max - 1;
 
-	blk_start_plug(&plug);
 	for (offset = start_offset; offset <= end_offset ; offset++) {
 		/* Ok, do the async read-ahead now */
 		page = __read_swap_cache_async(
@@ -655,11 +655,12 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		}
 		put_page(page);
 	}
-	blk_finish_plug(&plug);
 
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 skip:
-	return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll);
+	page = read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll);
+	blk_finish_plug(&plug);
+	return page;
 }
 
 int init_swap_address_space(unsigned int type, unsigned long nr_pages)
@@ -800,11 +801,11 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 		.win = 1,
 	};
 
+	blk_start_plug(&plug);
 	swap_ra_info(vmf, &ra_info);
 	if (ra_info.win == 1)
 		goto skip;
 
-	blk_start_plug(&plug);
 	for (i = 0, pte = ra_info.ptes; i < ra_info.nr_pte;
 	     i++, pte++) {
 		pentry = *pte;
@@ -828,11 +829,12 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 		}
 		put_page(page);
 	}
-	blk_finish_plug(&plug);
 	lru_add_drain();
 skip:
-	return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address,
+	page = read_swap_cache_async(fentry, gfp_mask, vma, vmf->address,
 				     ra_info.win == 1);
+	blk_finish_plug(&plug);
+	return page;
 }
 
 /**