diff mbox series

[2/2] mm/filemap: fix mapping_seek_hole_data on THP & 32-bit

Message ID alpine.LSU.2.11.2104211737410.3299@eggly.anvils (mailing list archive)
State New
Headers show
Series [1/2] mm/filemap: fix find_lock_entries hang on 32-bit THP | expand

Commit Message

Hugh Dickins April 22, 2021, 12:39 a.m. UTC
No problem on 64-bit without huge pages, but xfstests generic/285
and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs,
and on 32-bit architectures, with the new mapping_seek_hole_data().
Several different bugs turned out to need fixing.

u64 casts added to stop unfortunate sign-extension when shifting
(and let's use shifts throughout, rather than mixed with * and /).

Use round_up() when advancing pos, to stop assuming that pos was
already THP-aligned when advancing it by THP-size.  (But I believe
this use of round_up() assumes that any THP must be THP-aligned:
true while tmpfs enforces that alignment, and is the only fs with
FS_THP_SUPPORT; but might need to be generalized in the future?
If I try to generalize it right now, I'm sure to get it wrong!)

Use xas_set() when iterating away from a THP, so that xa_index stays
in synch with start, instead of drifting away to return bogus offset.

Check start against end to avoid wrapping 32-bit xa_index to 0 (and
to handle these additional cases, seek_data or not, it's easier to
break the loop than goto: so rearrange exit from the function).

Fixes: 41139aa4c3a3 ("mm/filemap: add mapping_seek_hole_data")
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/filemap.c |   23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

Comments

Matthew Wilcox April 22, 2021, 1:16 a.m. UTC | #1
On Wed, Apr 21, 2021 at 05:39:14PM -0700, Hugh Dickins wrote:
> No problem on 64-bit without huge pages, but xfstests generic/285
> and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs,
> and on 32-bit architectures, with the new mapping_seek_hole_data().
> Several different bugs turned out to need fixing.
> 
> u64 casts added to stop unfortunate sign-extension when shifting
> (and let's use shifts throughout, rather than mixed with * and /).

That confuses me.  loff_t is a signed long long, but it can't be negative
(... right?)  So how does casting it to an u64 before dividing by
PAGE_SIZE help?

> Use round_up() when advancing pos, to stop assuming that pos was
> already THP-aligned when advancing it by THP-size.  (But I believe
> this use of round_up() assumes that any THP must be THP-aligned:
> true while tmpfs enforces that alignment, and is the only fs with
> FS_THP_SUPPORT; but might need to be generalized in the future?
> If I try to generalize it right now, I'm sure to get it wrong!)

No generalisation needed in future.  Folios must be naturally aligned
within a file.

> @@ -2681,7 +2681,8 @@ loff_t mapping_seek_hole_data(struct add
>  
>  	rcu_read_lock();
>  	while ((page = find_get_entry(&xas, max, XA_PRESENT))) {
> -		loff_t pos = xas.xa_index * PAGE_SIZE;
> +		loff_t pos = (u64)xas.xa_index << PAGE_SHIFT;
> +		unsigned int seek_size;

I've been preferring size_t for 'number of bytes in a page' because
I'm sure somebody is going to want a page larger than 2GB in the next
ten years.
Hugh Dickins April 22, 2021, 5:55 a.m. UTC | #2
On Thu, 22 Apr 2021, Matthew Wilcox wrote:
> On Wed, Apr 21, 2021 at 05:39:14PM -0700, Hugh Dickins wrote:
> > No problem on 64-bit without huge pages, but xfstests generic/285
> > and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs,
> > and on 32-bit architectures, with the new mapping_seek_hole_data().
> > Several different bugs turned out to need fixing.
> > 
> > u64 casts added to stop unfortunate sign-extension when shifting
> > (and let's use shifts throughout, rather than mixed with * and /).
> 
> That confuses me.  loff_t is a signed long long, but it can't be negative
> (... right?)  So how does casting it to an u64 before dividing by
> PAGE_SIZE help?

That is a good question. Sprinkling u64s was the first thing I tried,
and I'd swear that it made a good difference at the time; but perhaps
that was all down to just the one on xas.xa_index << PAGE_SHIFT. Or
is it possible that one of the other bugs led to a negative loff_t,
and the casts got better behaviour out of that? Doubtful.

What I certainly recall from yesterday was leaving out one (which?)
of the casts as unnecessary, and wasting quite a bit of time until I
put it back in. Did I really choose precisely the only one necessary?

Taking most of them out did give me good quick runs just now: I'll
go over them again and try full runs on all machines. You'll think me
crazy, but yesterday's experience leaves me reluctant to change without
full testing - but agree it's not good to leave ignorant magic in.

> 
> > Use round_up() when advancing pos, to stop assuming that pos was
> > already THP-aligned when advancing it by THP-size.  (But I believe
> > this use of round_up() assumes that any THP must be THP-aligned:
> > true while tmpfs enforces that alignment, and is the only fs with
> > FS_THP_SUPPORT; but might need to be generalized in the future?
> > If I try to generalize it right now, I'm sure to get it wrong!)
> 
> No generalisation needed in future.  Folios must be naturally aligned
> within a file.

Thanks for the info: I did search around in your various patch series
from last October, and failed to find a decider there: I imagined that
when you started on compound pages for more efficient I/O, there would
be no necessity to align them (whereas huge pmd mappings of shared
files make the alignment important). Anyway, assuming natural alignment
is easiest - but it's remarkable how few places need to rely on it.

> 
> > @@ -2681,7 +2681,8 @@ loff_t mapping_seek_hole_data(struct add
> >  
> >  	rcu_read_lock();
> >  	while ((page = find_get_entry(&xas, max, XA_PRESENT))) {
> > -		loff_t pos = xas.xa_index * PAGE_SIZE;
> > +		loff_t pos = (u64)xas.xa_index << PAGE_SHIFT;
> > +		unsigned int seek_size;
> 
> I've been preferring size_t for 'number of bytes in a page' because
> I'm sure somebody is going to want a page larger than 2GB in the next
> ten years.

Ah, there I was simply following what the author of seek_page_size()
had chosen, and I think that's the right thing to do in today's tree:
let's see who that author was... hmm, someone called Matthew Wilcox :)
Hugh Dickins April 22, 2021, 8:46 p.m. UTC | #3
On Wed, 21 Apr 2021, Hugh Dickins wrote:
> On Thu, 22 Apr 2021, Matthew Wilcox wrote:
> > On Wed, Apr 21, 2021 at 05:39:14PM -0700, Hugh Dickins wrote:
> > > No problem on 64-bit without huge pages, but xfstests generic/285
> > > and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs,
> > > and on 32-bit architectures, with the new mapping_seek_hole_data().
> > > Several different bugs turned out to need fixing.
> > > 
> > > u64 casts added to stop unfortunate sign-extension when shifting
> > > (and let's use shifts throughout, rather than mixed with * and /).
> > 
> > That confuses me.  loff_t is a signed long long, but it can't be negative
> > (... right?)  So how does casting it to an u64 before dividing by
> > PAGE_SIZE help?
> 
> That is a good question. Sprinkling u64s was the first thing I tried,
> and I'd swear that it made a good difference at the time; but perhaps
> that was all down to just the one on xas.xa_index << PAGE_SHIFT. Or
> is it possible that one of the other bugs led to a negative loff_t,
> and the casts got better behaviour out of that? Doubtful.
> 
> What I certainly recall from yesterday was leaving out one (which?)
> of the casts as unnecessary, and wasting quite a bit of time until I
> put it back in. Did I really choose precisely the only one necessary?
> 
> Taking most of them out did give me good quick runs just now: I'll
> go over them again and try full runs on all machines. You'll think me
> crazy, but yesterday's experience leaves me reluctant to change without
> full testing - but agree it's not good to leave ignorant magic in.

And you'll be unsurprised to hear that the test runs went fine,
with all but one of those u64 casts removed. And I did locate the
version of filemap.c where I'd left out one "unnecessary" cast:
I had indeed chosen to remove the only one that's necessary.

v2 coming up now, thanks,

Hugh
diff mbox series

Patch

--- 5.12-rc8/mm/filemap.c	2021-02-26 19:42:39.812156085 -0800
+++ linux/mm/filemap.c	2021-04-20 23:20:20.509464440 -0700
@@ -2671,8 +2671,8 @@  unsigned int seek_page_size(struct xa_st
 loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
 		loff_t end, int whence)
 {
-	XA_STATE(xas, &mapping->i_pages, start >> PAGE_SHIFT);
-	pgoff_t max = (end - 1) / PAGE_SIZE;
+	XA_STATE(xas, &mapping->i_pages, (u64)start >> PAGE_SHIFT);
+	pgoff_t max = (u64)(end - 1) >> PAGE_SHIFT;
 	bool seek_data = (whence == SEEK_DATA);
 	struct page *page;
 
@@ -2681,7 +2681,8 @@  loff_t mapping_seek_hole_data(struct add
 
 	rcu_read_lock();
 	while ((page = find_get_entry(&xas, max, XA_PRESENT))) {
-		loff_t pos = xas.xa_index * PAGE_SIZE;
+		loff_t pos = (u64)xas.xa_index << PAGE_SHIFT;
+		unsigned int seek_size;
 
 		if (start < pos) {
 			if (!seek_data)
@@ -2689,25 +2690,25 @@  loff_t mapping_seek_hole_data(struct add
 			start = pos;
 		}
 
-		pos += seek_page_size(&xas, page);
+		seek_size = seek_page_size(&xas, page);
+		pos = round_up((u64)pos + 1, seek_size);
 		start = page_seek_hole_data(&xas, mapping, page, start, pos,
 				seek_data);
 		if (start < pos)
 			goto unlock;
+		if (start >= end)
+			break;
+		if (seek_size > PAGE_SIZE)
+			xas_set(&xas, (u64)pos >> PAGE_SHIFT);
 		if (!xa_is_value(page))
 			put_page(page);
 	}
-	rcu_read_unlock();
-
 	if (seek_data)
-		return -ENXIO;
-	goto out;
-
+		start = -ENXIO;
 unlock:
 	rcu_read_unlock();
-	if (!xa_is_value(page))
+	if (page && !xa_is_value(page))
 		put_page(page);
-out:
 	if (start > end)
 		return end;
 	return start;