Message ID | alpine.LSU.2.11.2104211737410.3299@eggly.anvils (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] mm/filemap: fix find_lock_entries hang on 32-bit THP | expand |
On Wed, Apr 21, 2021 at 05:39:14PM -0700, Hugh Dickins wrote: > No problem on 64-bit without huge pages, but xfstests generic/285 > and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs, > and on 32-bit architectures, with the new mapping_seek_hole_data(). > Several different bugs turned out to need fixing. > > u64 casts added to stop unfortunate sign-extension when shifting > (and let's use shifts throughout, rather than mixed with * and /). That confuses me. loff_t is a signed long long, but it can't be negative (... right?) So how does casting it to an u64 before dividing by PAGE_SIZE help? > Use round_up() when advancing pos, to stop assuming that pos was > already THP-aligned when advancing it by THP-size. (But I believe > this use of round_up() assumes that any THP must be THP-aligned: > true while tmpfs enforces that alignment, and is the only fs with > FS_THP_SUPPORT; but might need to be generalized in the future? > If I try to generalize it right now, I'm sure to get it wrong!) No generalisation needed in future. Folios must be naturally aligned within a file. > @@ -2681,7 +2681,8 @@ loff_t mapping_seek_hole_data(struct add > > rcu_read_lock(); > while ((page = find_get_entry(&xas, max, XA_PRESENT))) { > - loff_t pos = xas.xa_index * PAGE_SIZE; > + loff_t pos = (u64)xas.xa_index << PAGE_SHIFT; > + unsigned int seek_size; I've been preferring size_t for 'number of bytes in a page' because I'm sure somebody is going to want a page larger than 2GB in the next ten years.
On Thu, 22 Apr 2021, Matthew Wilcox wrote: > On Wed, Apr 21, 2021 at 05:39:14PM -0700, Hugh Dickins wrote: > > No problem on 64-bit without huge pages, but xfstests generic/285 > > and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs, > > and on 32-bit architectures, with the new mapping_seek_hole_data(). > > Several different bugs turned out to need fixing. > > > > u64 casts added to stop unfortunate sign-extension when shifting > > (and let's use shifts throughout, rather than mixed with * and /). > > That confuses me. loff_t is a signed long long, but it can't be negative > (... right?) So how does casting it to an u64 before dividing by > PAGE_SIZE help? That is a good question. Sprinkling u64s was the first thing I tried, and I'd swear that it made a good difference at the time; but perhaps that was all down to just the one on xas.xa_index << PAGE_SHIFT. Or is it possible that one of the other bugs led to a negative loff_t, and the casts got better behaviour out of that? Doubtful. What I certainly recall from yesterday was leaving out one (which?) of the casts as unnecessary, and wasting quite a bit of time until I put it back in. Did I really choose precisely the only one necessary? Taking most of them out did give me good quick runs just now: I'll go over them again and try full runs on all machines. You'll think me crazy, but yesterday's experience leaves me reluctant to change without full testing - but agree it's not good to leave ignorant magic in. > > > Use round_up() when advancing pos, to stop assuming that pos was > > already THP-aligned when advancing it by THP-size. (But I believe > > this use of round_up() assumes that any THP must be THP-aligned: > > true while tmpfs enforces that alignment, and is the only fs with > > FS_THP_SUPPORT; but might need to be generalized in the future? > > If I try to generalize it right now, I'm sure to get it wrong!) > > No generalisation needed in future. Folios must be naturally aligned > within a file. Thanks for the info: I did search around in your various patch series from last October, and failed to find a decider there: I imagined that when you started on compound pages for more efficient I/O, there would be no necessity to align them (whereas huge pmd mappings of shared files make the alignment important). Anyway, assuming natural alignment is easiest - but it's remarkable how few places need to rely on it. > > > @@ -2681,7 +2681,8 @@ loff_t mapping_seek_hole_data(struct add > > > > rcu_read_lock(); > > while ((page = find_get_entry(&xas, max, XA_PRESENT))) { > > - loff_t pos = xas.xa_index * PAGE_SIZE; > > + loff_t pos = (u64)xas.xa_index << PAGE_SHIFT; > > + unsigned int seek_size; > > I've been preferring size_t for 'number of bytes in a page' because > I'm sure somebody is going to want a page larger than 2GB in the next > ten years. Ah, there I was simply following what the author of seek_page_size() had chosen, and I think that's the right thing to do in today's tree: let's see who that author was... hmm, someone called Matthew Wilcox :)
On Wed, 21 Apr 2021, Hugh Dickins wrote: > On Thu, 22 Apr 2021, Matthew Wilcox wrote: > > On Wed, Apr 21, 2021 at 05:39:14PM -0700, Hugh Dickins wrote: > > > No problem on 64-bit without huge pages, but xfstests generic/285 > > > and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs, > > > and on 32-bit architectures, with the new mapping_seek_hole_data(). > > > Several different bugs turned out to need fixing. > > > > > > u64 casts added to stop unfortunate sign-extension when shifting > > > (and let's use shifts throughout, rather than mixed with * and /). > > > > That confuses me. loff_t is a signed long long, but it can't be negative > > (... right?) So how does casting it to an u64 before dividing by > > PAGE_SIZE help? > > That is a good question. Sprinkling u64s was the first thing I tried, > and I'd swear that it made a good difference at the time; but perhaps > that was all down to just the one on xas.xa_index << PAGE_SHIFT. Or > is it possible that one of the other bugs led to a negative loff_t, > and the casts got better behaviour out of that? Doubtful. > > What I certainly recall from yesterday was leaving out one (which?) > of the casts as unnecessary, and wasting quite a bit of time until I > put it back in. Did I really choose precisely the only one necessary? > > Taking most of them out did give me good quick runs just now: I'll > go over them again and try full runs on all machines. You'll think me > crazy, but yesterday's experience leaves me reluctant to change without > full testing - but agree it's not good to leave ignorant magic in. And you'll be unsurprised to hear that the test runs went fine, with all but one of those u64 casts removed. And I did locate the version of filemap.c where I'd left out one "unnecessary" cast: I had indeed chosen to remove the only one that's necessary. v2 coming up now, thanks, Hugh
--- 5.12-rc8/mm/filemap.c 2021-02-26 19:42:39.812156085 -0800 +++ linux/mm/filemap.c 2021-04-20 23:20:20.509464440 -0700 @@ -2671,8 +2671,8 @@ unsigned int seek_page_size(struct xa_st loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start, loff_t end, int whence) { - XA_STATE(xas, &mapping->i_pages, start >> PAGE_SHIFT); - pgoff_t max = (end - 1) / PAGE_SIZE; + XA_STATE(xas, &mapping->i_pages, (u64)start >> PAGE_SHIFT); + pgoff_t max = (u64)(end - 1) >> PAGE_SHIFT; bool seek_data = (whence == SEEK_DATA); struct page *page; @@ -2681,7 +2681,8 @@ loff_t mapping_seek_hole_data(struct add rcu_read_lock(); while ((page = find_get_entry(&xas, max, XA_PRESENT))) { - loff_t pos = xas.xa_index * PAGE_SIZE; + loff_t pos = (u64)xas.xa_index << PAGE_SHIFT; + unsigned int seek_size; if (start < pos) { if (!seek_data) @@ -2689,25 +2690,25 @@ loff_t mapping_seek_hole_data(struct add start = pos; } - pos += seek_page_size(&xas, page); + seek_size = seek_page_size(&xas, page); + pos = round_up((u64)pos + 1, seek_size); start = page_seek_hole_data(&xas, mapping, page, start, pos, seek_data); if (start < pos) goto unlock; + if (start >= end) + break; + if (seek_size > PAGE_SIZE) + xas_set(&xas, (u64)pos >> PAGE_SHIFT); if (!xa_is_value(page)) put_page(page); } - rcu_read_unlock(); - if (seek_data) - return -ENXIO; - goto out; - + start = -ENXIO; unlock: rcu_read_unlock(); - if (!xa_is_value(page)) + if (page && !xa_is_value(page)) put_page(page); -out: if (start > end) return end; return start;
No problem on 64-bit without huge pages, but xfstests generic/285 and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs, and on 32-bit architectures, with the new mapping_seek_hole_data(). Several different bugs turned out to need fixing. u64 casts added to stop unfortunate sign-extension when shifting (and let's use shifts throughout, rather than mixed with * and /). Use round_up() when advancing pos, to stop assuming that pos was already THP-aligned when advancing it by THP-size. (But I believe this use of round_up() assumes that any THP must be THP-aligned: true while tmpfs enforces that alignment, and is the only fs with FS_THP_SUPPORT; but might need to be generalized in the future? If I try to generalize it right now, I'm sure to get it wrong!) Use xas_set() when iterating away from a THP, so that xa_index stays in synch with start, instead of drifting away to return bogus offset. Check start against end to avoid wrapping 32-bit xa_index to 0 (and to handle these additional cases, seek_data or not, it's easier to break the loop than goto: so rearrange exit from the function). Fixes: 41139aa4c3a3 ("mm/filemap: add mapping_seek_hole_data") Signed-off-by: Hugh Dickins <hughd@google.com> --- mm/filemap.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-)