Message ID | 20230602222445.2284892-8-willy@infradead.org (mailing list archive) |
---|---|
State | Superseded, archived |
Series | Create large folios in iomap buffered write path |
On Fri, Jun 02, 2023 at 11:24:44PM +0100, Matthew Wilcox (Oracle) wrote:
> If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
> Start at the maximum page cache size and shrink by half every time we
> hit the "we are short on memory" problem.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a10f9c037515..10434b07e0f9 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
>  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  {
>  	loff_t length = iomap_length(iter);
> +	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
>  	loff_t pos = iter->pos;
>  	ssize_t written = 0;
>  	long status = 0;
> @@ -776,15 +777,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>
>  	do {
>  		struct folio *folio;
> -		struct page *page;
> -		unsigned long offset;	/* Offset into pagecache page */
> -		unsigned long bytes;	/* Bytes to write to page */
> +		size_t offset;		/* Offset into folio */
> +		unsigned long bytes;	/* Bytes to write to folio */
>  		size_t copied;		/* Bytes copied from user */
>
> -		offset = offset_in_page(pos);
> -		bytes = min_t(unsigned long, PAGE_SIZE - offset,
> -				iov_iter_count(i));
>  again:
> +		offset = pos & (chunk - 1);
> +		bytes = min(chunk - offset, iov_iter_count(i));
>  		status = balance_dirty_pages_ratelimited_flags(mapping,
>  				bdp_flags);
>  		if (unlikely(status))
> @@ -814,11 +813,14 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		if (iter->iomap.flags & IOMAP_F_STALE)
>  			break;
>
> -		page = folio_file_page(folio, pos >> PAGE_SHIFT);
> +		offset = offset_in_folio(folio, pos);
> +		if (bytes > folio_size(folio) - offset)
> +			bytes = folio_size(folio) - offset;
> +
>  		if (mapping_writably_mapped(mapping))
> -			flush_dcache_page(page);
> +			flush_dcache_folio(folio);
>
> -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);

I think I've gotten lost in the weeds.  Does copy_page_from_iter_atomic
actually know how to deal with a multipage folio?  AFAICT it takes a
page, kmaps it, and copies @bytes starting at @offset in the page.  If
a caller feeds it a multipage folio, does that all work correctly?  Or
will the pagecache split multipage folios as needed to make it work
right?

If we create a 64k folio at pos 0 and then want to write a byte at pos
40k, does __filemap_get_folio break up the 64k folio so that the folio
returned by iomap_get_folio starts at 40k?  Or can the iter code handle
jumping ten pages into a 16-page folio and I just can't see it?

(Allergies suddenly went from 0 to 9, engage braindead mode...)

--D

>
>  		status = iomap_write_end(iter, pos, bytes, copied, folio);
>
> @@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		 */
>  		if (copied)
>  			bytes = copied;
> +		if (chunk > PAGE_SIZE)
> +			chunk /= 2;
>  		goto again;
>  	}
>  	pos += status;
> --
> 2.39.2
>
On Sun, Jun 04, 2023 at 11:29:52AM -0700, Darrick J. Wong wrote:
> On Fri, Jun 02, 2023 at 11:24:44PM +0100, Matthew Wilcox (Oracle) wrote:
> > -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> > +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
>
> I think I've gotten lost in the weeds.  Does copy_page_from_iter_atomic
> actually know how to deal with a multipage folio?  AFAICT it takes a
> page, kmaps it, and copies @bytes starting at @offset in the page.  If
> a caller feeds it a multipage folio, does that all work correctly?  Or
> will the pagecache split multipage folios as needed to make it work
> right?

It's a smidgen inefficient, but it does work.  First, it calls
page_copy_sane() to check that offset & n fit within the compound page
(ie this all predates folios).

... Oh.  copy_page_from_iter() handles this correctly.
copy_page_from_iter_atomic() doesn't.  I'll have to fix this
first.  Looks like Al fixed copy_page_from_iter() in c03f05f183cd
and didn't fix copy_page_from_iter_atomic().

> If we create a 64k folio at pos 0 and then want to write a byte at pos
> 40k, does __filemap_get_folio break up the 64k folio so that the folio
> returned by iomap_get_folio starts at 40k?  Or can the iter code handle
> jumping ten pages into a 16-page folio and I just can't see it?

Well ... it handles it fine unless it's highmem.  p is kaddr + offset,
so if offset is 40k, it works correctly on !highmem.
On 6/5/2023 6:11 AM, Matthew Wilcox wrote:
> On Sun, Jun 04, 2023 at 11:29:52AM -0700, Darrick J. Wong wrote:
>> If we create a 64k folio at pos 0 and then want to write a byte at pos
>> 40k, does __filemap_get_folio break up the 64k folio so that the folio
>> returned by iomap_get_folio starts at 40k?  Or can the iter code handle
>> jumping ten pages into a 16-page folio and I just can't see it?
>
> Well ... it handles it fine unless it's highmem.  p is kaddr + offset,
> so if offset is 40k, it works correctly on !highmem.
So is it better to have implementations for !highmem and highmem? And for
!highmem, we don't need the kmap_local_page()/kunmap_local() and chunk
size per copy is not limited to PAGE_SIZE. Thanks.


Regards
Yin, Fengwei
On Mon, Jun 05, 2023 at 04:25:22PM +0800, Yin, Fengwei wrote:
> So is it better to have implementations for !highmem and highmem? And for
> !highmem, we don't need the kmap_local_page()/kunmap_local() and chunk
> size per copy is not limited to PAGE_SIZE. Thanks.

No, that's not needed; we can handle that just fine.  Maybe this can
use kmap_local_page() instead of kmap_atomic().  Al, what do you think?

I haven't tested this yet; need to figure out a qemu config with highmem ...

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 960223ed9199..d3d6a0789625 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -857,24 +857,36 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 }
 EXPORT_SYMBOL(iov_iter_zero);

-size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t bytes,
-				  struct iov_iter *i)
+size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
+		size_t bytes, struct iov_iter *i)
 {
-	char *kaddr = kmap_atomic(page), *p = kaddr + offset;
-	if (!page_copy_sane(page, offset, bytes)) {
-		kunmap_atomic(kaddr);
+	size_t n = bytes, copied = 0;
+
+	if (!page_copy_sane(page, offset, bytes))
 		return 0;
-	}
-	if (WARN_ON_ONCE(!i->data_source)) {
-		kunmap_atomic(kaddr);
+	if (WARN_ON_ONCE(!i->data_source))
 		return 0;
+
+	page += offset / PAGE_SIZE;
+	offset %= PAGE_SIZE;
+	if (PageHighMem(page))
+		n = min_t(size_t, bytes, PAGE_SIZE);
+	while (1) {
+		char *kaddr = kmap_atomic(page) + offset;
+		iterate_and_advance(i, n, base, len, off,
+			copyin(kaddr + off, base, len),
+			memcpy_from_iter(i, kaddr + off, base, len)
+		)
+		kunmap_atomic(kaddr);
+		copied += n;
+		if (!PageHighMem(page) || copied == bytes || n == 0)
+			break;
+		offset += n;
+		page += offset / PAGE_SIZE;
+		offset %= PAGE_SIZE;
+		n = min_t(size_t, bytes - copied, PAGE_SIZE);
 	}
-	iterate_and_advance(i, bytes, base, len, off,
-		copyin(p + off, base, len),
-		memcpy_from_iter(i, p + off, base, len)
-	)
-	kunmap_atomic(kaddr);
-	return bytes;
+	return copied;
 }
 EXPORT_SYMBOL(copy_page_from_iter_atomic);
On 6/7/23 02:07, Matthew Wilcox wrote:
> No, that's not needed; we can handle that just fine.  Maybe this can
> use kmap_local_page() instead of kmap_atomic().  Al, what do you think?
>
> I haven't tested this yet; need to figure out a qemu config with highmem ...
>
> +	page += offset / PAGE_SIZE;
> +	offset %= PAGE_SIZE;
> +	if (PageHighMem(page))
> +		n = min_t(size_t, bytes, PAGE_SIZE);
This is smart.

> +	while (1) {
> +		char *kaddr = kmap_atomic(page) + offset;
> +		iterate_and_advance(i, n, base, len, off,
> +			copyin(kaddr + off, base, len),
> +			memcpy_from_iter(i, kaddr + off, base, len)
> +		)
> +		kunmap_atomic(kaddr);
> +		copied += n;
> +		if (!PageHighMem(page) || copied == bytes || n == 0)
> +			break;
My understanding is copied == bytes could cover !PageHighMem(page).

> +		offset += n;
> +		page += offset / PAGE_SIZE;
Should be page += n / PAGE_SIZE? Thanks.


Regards
Yin, Fengwei
On 6/7/2023 10:21 AM, Yin Fengwei wrote:
> On 6/7/23 02:07, Matthew Wilcox wrote:
>> +		offset += n;
>> +		page += offset / PAGE_SIZE;
> Should be page += n / PAGE_SIZE? Thanks.
offset / PAGE_SIZE is correct. Sorry for the noise.


Regards
Yin, Fengwei
On 6/7/23 02:07, Matthew Wilcox wrote:
> +	page += offset / PAGE_SIZE;
> +	offset %= PAGE_SIZE;
> +	if (PageHighMem(page))
> +		n = min_t(size_t, bytes, PAGE_SIZE);
Should be PAGE_SIZE - offset instead of PAGE_SIZE?

> +		offset += n;
> +		page += offset / PAGE_SIZE;
> +		offset %= PAGE_SIZE;
> +		n = min_t(size_t, bytes - copied, PAGE_SIZE);
Should be PAGE_SIZE - offset instead of PAGE_SIZE? Thanks.


Regards
Yin, Fengwei
On Wed, Jun 07, 2023 at 10:21:41AM +0800, Yin Fengwei wrote:
> On 6/7/23 02:07, Matthew Wilcox wrote:
> > +	page += offset / PAGE_SIZE;
> > +	offset %= PAGE_SIZE;
> > +	if (PageHighMem(page))
> > +		n = min_t(size_t, bytes, PAGE_SIZE);
> This is smart.

Thanks ;-)

> > +		copied += n;
> > +		if (!PageHighMem(page) || copied == bytes || n == 0)
> > +			break;
> My understanding is copied == bytes could cover !PageHighMem(page).

It could!  But the PageHighMem test serves two purposes.  One is that
it tells the human reader that this is all because of HighMem.  The
other is that on !HIGHMEM systems it compiles to false:

PAGEFLAG_FALSE(HighMem, highmem)

static inline int Page##uname(const struct page *page) { return 0; }

So it tells the _compiler_ that all of this code is ignorable and
it can optimise it out.  Now, you and I know that it will always
be true, but it lets the compiler remove the test.  Hopefully the
compiler can also see that:

	while (1) {
		...
		if (true)
			break;
	}

means it can optimise away the entire loop structure and just produce
the same code that it always did.
On Wed, Jun 07, 2023 at 02:40:02PM +0800, Yin Fengwei wrote:
> > +	page += offset / PAGE_SIZE;
> > +	offset %= PAGE_SIZE;
> > +	if (PageHighMem(page))
> > +		n = min_t(size_t, bytes, PAGE_SIZE);
> Should be PAGE_SIZE - offset instead of PAGE_SIZE?

Yes, it should.  Thanks.
On 6/7/23 23:55, Matthew Wilcox wrote:
> On Wed, Jun 07, 2023 at 10:21:41AM +0800, Yin Fengwei wrote:
>> My understanding is copied == bytes could cover !PageHighMem(page).
>
> It could!  But the PageHighMem test serves two purposes.  One is that
> it tells the human reader that this is all because of HighMem.  The
> other is that on !HIGHMEM systems it compiles to false:
>
> PAGEFLAG_FALSE(HighMem, highmem)
>
> static inline int Page##uname(const struct page *page) { return 0; }
>
> So it tells the _compiler_ that all of this code is ignorable and
> it can optimise it out.  Now, you and I know that it will always
> be true, but it lets the compiler remove the test.
I thought about the first purpose. But the second purpose is a new thing
I learned from this thread. Thanks a lot for the detailed explanation.


Regards
Yin, Fengwei
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a10f9c037515..10434b07e0f9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 {
 	loff_t length = iomap_length(iter);
+	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
 	loff_t pos = iter->pos;
 	ssize_t written = 0;
 	long status = 0;
@@ -776,15 +777,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)

 	do {
 		struct folio *folio;
-		struct page *page;
-		unsigned long offset;	/* Offset into pagecache page */
-		unsigned long bytes;	/* Bytes to write to page */
+		size_t offset;		/* Offset into folio */
+		unsigned long bytes;	/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */

-		offset = offset_in_page(pos);
-		bytes = min_t(unsigned long, PAGE_SIZE - offset,
-				iov_iter_count(i));
 again:
+		offset = pos & (chunk - 1);
+		bytes = min(chunk - offset, iov_iter_count(i));
 		status = balance_dirty_pages_ratelimited_flags(mapping,
 				bdp_flags);
 		if (unlikely(status))
@@ -814,11 +813,14 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		if (iter->iomap.flags & IOMAP_F_STALE)
 			break;

-		page = folio_file_page(folio, pos >> PAGE_SHIFT);
+		offset = offset_in_folio(folio, pos);
+		if (bytes > folio_size(folio) - offset)
+			bytes = folio_size(folio) - offset;
+
 		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
+			flush_dcache_folio(folio);

-		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
+		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);

 		status = iomap_write_end(iter, pos, bytes, copied, folio);

@@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		 */
 		if (copied)
 			bytes = copied;
+		if (chunk > PAGE_SIZE)
+			chunk /= 2;
 		goto again;
 	}
 	pos += status;
If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
Start at the maximum page cache size and shrink by half every time we
hit the "we are short on memory" problem.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)