
[v2,7/7] iomap: Copy larger chunks from userspace

Message ID 20230602222445.2284892-8-willy@infradead.org (mailing list archive)
State Superseded
Series Create large folios in iomap buffered write path

Commit Message

Matthew Wilcox June 2, 2023, 10:24 p.m. UTC
If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
Start at the maximum page cache size and shrink by half every time we
hit the "we are short on memory" problem.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)
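The strategy in isolation: attempt the copy with the largest chunk the page
cache can produce, and halve the chunk each time the atomic usercopy makes no
progress (typically because the source pages need faulting in). A minimal
sketch of just that retry logic, with illustrative names (try_copy() is a
hypothetical stand-in for the get-folio/copy/write-end sequence):

	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;

	for (;;) {
		size_t offset = pos & (chunk - 1);	/* chunk is a power of two */
		size_t bytes = min(chunk - offset, iov_iter_count(i));

		if (try_copy(pos, bytes))		/* hypothetical helper */
			break;
		if (chunk > PAGE_SIZE)			/* never shrink below one page */
			chunk /= 2;
	}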

Comments

Darrick J. Wong June 4, 2023, 6:29 p.m. UTC | #1
On Fri, Jun 02, 2023 at 11:24:44PM +0100, Matthew Wilcox (Oracle) wrote:
> If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
> Start at the maximum page cache size and shrink by half every time we
> hit the "we are short on memory" problem.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a10f9c037515..10434b07e0f9 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
>  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  {
>  	loff_t length = iomap_length(iter);
> +	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
>  	loff_t pos = iter->pos;
>  	ssize_t written = 0;
>  	long status = 0;
> @@ -776,15 +777,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  
>  	do {
>  		struct folio *folio;
> -		struct page *page;
> -		unsigned long offset;	/* Offset into pagecache page */
> -		unsigned long bytes;	/* Bytes to write to page */
> +		size_t offset;		/* Offset into folio */
> +		unsigned long bytes;	/* Bytes to write to folio */
>  		size_t copied;		/* Bytes copied from user */
>  
> -		offset = offset_in_page(pos);
> -		bytes = min_t(unsigned long, PAGE_SIZE - offset,
> -						iov_iter_count(i));
>  again:
> +		offset = pos & (chunk - 1);
> +		bytes = min(chunk - offset, iov_iter_count(i));
>  		status = balance_dirty_pages_ratelimited_flags(mapping,
>  							       bdp_flags);
>  		if (unlikely(status))
> @@ -814,11 +813,14 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		if (iter->iomap.flags & IOMAP_F_STALE)
>  			break;
>  
> -		page = folio_file_page(folio, pos >> PAGE_SHIFT);
> +		offset = offset_in_folio(folio, pos);
> +		if (bytes > folio_size(folio) - offset)
> +			bytes = folio_size(folio) - offset;
> +
>  		if (mapping_writably_mapped(mapping))
> -			flush_dcache_page(page);
> +			flush_dcache_folio(folio);
>  
> -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);

I think I've gotten lost in the weeds.  Does copy_page_from_iter_atomic
actually know how to deal with a multipage folio?  AFAICT it takes a
page, kmaps it, and copies @bytes starting at @offset in the page.  If
a caller feeds it a multipage folio, does that all work correctly?  Or
will the pagecache split multipage folios as needed to make it work
right?

If we create a 64k folio at pos 0 and then want to write a byte at pos
40k, does __filemap_get_folio break up the 64k folio so that the folio
returned by iomap_get_folio starts at 40k?  Or can the iter code handle
jumping ten pages into a 16-page folio and I just can't see it?

(Allergies suddenly went from 0 to 9, engage braindead mode...)

--D

>  
>  		status = iomap_write_end(iter, pos, bytes, copied, folio);
>  
> @@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  			 */
>  			if (copied)
>  				bytes = copied;
> +			if (chunk > PAGE_SIZE)
> +				chunk /= 2;
>  			goto again;
>  		}
>  		pos += status;
> -- 
> 2.39.2
>
Matthew Wilcox June 4, 2023, 10:11 p.m. UTC | #2
On Sun, Jun 04, 2023 at 11:29:52AM -0700, Darrick J. Wong wrote:
> On Fri, Jun 02, 2023 at 11:24:44PM +0100, Matthew Wilcox (Oracle) wrote:
> > -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> > +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
> 
> I think I've gotten lost in the weeds.  Does copy_page_from_iter_atomic
> actually know how to deal with a multipage folio?  AFAICT it takes a
> page, kmaps it, and copies @bytes starting at @offset in the page.  If
> a caller feeds it a multipage folio, does that all work correctly?  Or
> will the pagecache split multipage folios as needed to make it work
> right?

It's a smidgen inefficient, but it does work.  First, it calls
page_copy_sane() to check that offset & n fit within the compound page
(ie this all predates folios).

... Oh.  copy_page_from_iter() handles this correctly.
copy_page_from_iter_atomic() doesn't.  I'll have to fix this
first.  Looks like Al fixed copy_page_from_iter() in c03f05f183cd
and didn't fix copy_page_from_iter_atomic().

> If we create a 64k folio at pos 0 and then want to write a byte at pos
> 40k, does __filemap_get_folio break up the 64k folio so that the folio
> returned by iomap_get_folio starts at 40k?  Or can the iter code handle
> jumping ten pages into a 16-page folio and I just can't see it?

Well ... it handles it fine unless it's highmem.  p is kaddr + offset,
so if offset is 40k, it works correctly on !highmem.
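
Spelled out: for a lowmem page, kmap_atomic() degenerates to page_address(),
which points into the kernel's linear mapping, and a compound page is
physically contiguous there, so an offset larger than PAGE_SIZE still lands
inside the same mapping. Roughly:

	/* Assuming !CONFIG_HIGHMEM, kmap_atomic(page) is page_address(page). */
	char *kaddr = kmap_atomic(&folio->page);
	char *p = kaddr + offset;	/* offset may exceed PAGE_SIZE; the folio's
					   pages are contiguous in the linear map */

Highmem is the problem case, because each kmap window maps only a single page.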
Yin, Fengwei June 5, 2023, 8:25 a.m. UTC | #3
On 6/5/2023 6:11 AM, Matthew Wilcox wrote:
> On Sun, Jun 04, 2023 at 11:29:52AM -0700, Darrick J. Wong wrote:
>> On Fri, Jun 02, 2023 at 11:24:44PM +0100, Matthew Wilcox (Oracle) wrote:
>>> -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
>>> +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
>>
>> I think I've gotten lost in the weeds.  Does copy_page_from_iter_atomic
>> actually know how to deal with a multipage folio?  AFAICT it takes a
>> page, kmaps it, and copies @bytes starting at @offset in the page.  If
>> a caller feeds it a multipage folio, does that all work correctly?  Or
>> will the pagecache split multipage folios as needed to make it work
>> right?
> 
> It's a smidgen inefficient, but it does work.  First, it calls
> page_copy_sane() to check that offset & n fit within the compound page
> (ie this all predates folios).
> 
> ... Oh.  copy_page_from_iter() handles this correctly.
> copy_page_from_iter_atomic() doesn't.  I'll have to fix this
> first.  Looks like Al fixed copy_page_from_iter() in c03f05f183cd
> and didn't fix copy_page_from_iter_atomic().
> 
>> If we create a 64k folio at pos 0 and then want to write a byte at pos
>> 40k, does __filemap_get_folio break up the 64k folio so that the folio
>> returned by iomap_get_folio starts at 40k?  Or can the iter code handle
>> jumping ten pages into a 16-page folio and I just can't see it?
> 
> Well ... it handles it fine unless it's highmem.  p is kaddr + offset,
> so if offset is 40k, it works correctly on !highmem.
So is it better to have implementations for !highmem and highmem? And for
!highmem, we don't need the kmap_local_page()/kunmap_local() and chunk
size per copy is not limited to PAGE_SIZE. Thanks.


Regards
Yin, Fengwei
Matthew Wilcox June 6, 2023, 6:07 p.m. UTC | #4
On Mon, Jun 05, 2023 at 04:25:22PM +0800, Yin, Fengwei wrote:
> On 6/5/2023 6:11 AM, Matthew Wilcox wrote:
> > On Sun, Jun 04, 2023 at 11:29:52AM -0700, Darrick J. Wong wrote:
> >> On Fri, Jun 02, 2023 at 11:24:44PM +0100, Matthew Wilcox (Oracle) wrote:
> >>> -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> >>> +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
> >>
> >> I think I've gotten lost in the weeds.  Does copy_page_from_iter_atomic
> >> actually know how to deal with a multipage folio?  AFAICT it takes a
> >> page, kmaps it, and copies @bytes starting at @offset in the page.  If
> >> a caller feeds it a multipage folio, does that all work correctly?  Or
> >> will the pagecache split multipage folios as needed to make it work
> >> right?
> > 
> > It's a smidgen inefficient, but it does work.  First, it calls
> > page_copy_sane() to check that offset & n fit within the compound page
> > (ie this all predates folios).
> > 
> > ... Oh.  copy_page_from_iter() handles this correctly.
> > copy_page_from_iter_atomic() doesn't.  I'll have to fix this
> > first.  Looks like Al fixed copy_page_from_iter() in c03f05f183cd
> > and didn't fix copy_page_from_iter_atomic().
> > 
> >> If we create a 64k folio at pos 0 and then want to write a byte at pos
> >> 40k, does __filemap_get_folio break up the 64k folio so that the folio
> >> returned by iomap_get_folio starts at 40k?  Or can the iter code handle
> >> jumping ten pages into a 16-page folio and I just can't see it?
> > 
> > Well ... it handles it fine unless it's highmem.  p is kaddr + offset,
> > so if offset is 40k, it works correctly on !highmem.
> So is it better to have implementations for !highmem and highmem? And for
> !highmem, we don't need the kmap_local_page()/kunmap_local() and chunk
> size per copy is not limited to PAGE_SIZE. Thanks.

No, that's not needed; we can handle that just fine.  Maybe this can
use kmap_local_page() instead of kmap_atomic().  Al, what do you think?
I haven't tested this yet; need to figure out a qemu config with highmem ...

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 960223ed9199..d3d6a0789625 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -857,24 +857,36 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 }
 EXPORT_SYMBOL(iov_iter_zero);
 
-size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t bytes,
-				  struct iov_iter *i)
+size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
+		size_t bytes, struct iov_iter *i)
 {
-	char *kaddr = kmap_atomic(page), *p = kaddr + offset;
-	if (!page_copy_sane(page, offset, bytes)) {
-		kunmap_atomic(kaddr);
+	size_t n = bytes, copied = 0;
+
+	if (!page_copy_sane(page, offset, bytes))
 		return 0;
-	}
-	if (WARN_ON_ONCE(!i->data_source)) {
-		kunmap_atomic(kaddr);
+	if (WARN_ON_ONCE(!i->data_source))
 		return 0;
+
+	page += offset / PAGE_SIZE;
+	offset %= PAGE_SIZE;
+	if (PageHighMem(page))
+		n = min_t(size_t, bytes, PAGE_SIZE);
+	while (1) {
+		char *kaddr = kmap_atomic(page) + offset;
+		iterate_and_advance(i, n, base, len, off,
+			copyin(kaddr + off, base, len),
+			memcpy_from_iter(i, kaddr + off, base, len)
+		)
+		kunmap_atomic(kaddr);
+		copied += n;
+		if (!PageHighMem(page) || copied == bytes || n == 0)
+			break;
+		offset += n;
+		page += offset / PAGE_SIZE;
+		offset %= PAGE_SIZE;
+		n = min_t(size_t, bytes - copied, PAGE_SIZE);
 	}
-	iterate_and_advance(i, bytes, base, len, off,
-		copyin(p + off, base, len),
-		memcpy_from_iter(i, p + off, base, len)
-	)
-	kunmap_atomic(kaddr);
-	return bytes;
+	return copied;
 }
 EXPORT_SYMBOL(copy_page_from_iter_atomic);
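
Tracing Darrick's earlier example through this version, hypothetically writing
100 bytes at pos 40k into a 64k highmem folio (PAGE_SIZE = 4096):

	page += 40960 / 4096;	/* advance 10 subpages into the folio */
	offset = 40960 % 4096;	/* 0 */
	n = min(100, 4096);	/* highmem: clamp to one kmap window */
	/* one pass copies 100 bytes; copied == bytes, so the loop exits */

A copy that crossed a subpage boundary would loop instead, re-dividing the
advanced offset to step to the next subpage; on !highmem the !PageHighMem()
check breaks out after a single full-length pass.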
Yin, Fengwei June 7, 2023, 2:21 a.m. UTC | #5
On 6/7/23 02:07, Matthew Wilcox wrote:
> On Mon, Jun 05, 2023 at 04:25:22PM +0800, Yin, Fengwei wrote:
>> On 6/5/2023 6:11 AM, Matthew Wilcox wrote:
>>> On Sun, Jun 04, 2023 at 11:29:52AM -0700, Darrick J. Wong wrote:
>>>> On Fri, Jun 02, 2023 at 11:24:44PM +0100, Matthew Wilcox (Oracle) wrote:
>>>>> -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
>>>>> +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
>>>>
>>>> I think I've gotten lost in the weeds.  Does copy_page_from_iter_atomic
>>>> actually know how to deal with a multipage folio?  AFAICT it takes a
>>>> page, kmaps it, and copies @bytes starting at @offset in the page.  If
>>>> a caller feeds it a multipage folio, does that all work correctly?  Or
>>>> will the pagecache split multipage folios as needed to make it work
>>>> right?
>>>
>>> It's a smidgen inefficient, but it does work.  First, it calls
>>> page_copy_sane() to check that offset & n fit within the compound page
>>> (ie this all predates folios).
>>>
>>> ... Oh.  copy_page_from_iter() handles this correctly.
>>> copy_page_from_iter_atomic() doesn't.  I'll have to fix this
>>> first.  Looks like Al fixed copy_page_from_iter() in c03f05f183cd
>>> and didn't fix copy_page_from_iter_atomic().
>>>
>>>> If we create a 64k folio at pos 0 and then want to write a byte at pos
>>>> 40k, does __filemap_get_folio break up the 64k folio so that the folio
>>>> returned by iomap_get_folio starts at 40k?  Or can the iter code handle
>>>> jumping ten pages into a 16-page folio and I just can't see it?
>>>
>>> Well ... it handles it fine unless it's highmem.  p is kaddr + offset,
>>> so if offset is 40k, it works correctly on !highmem.
>> So is it better to have implementations for !highmem and highmem? And for
>> !highmem, we don't need the kmap_local_page()/kunmap_local() and chunk
>> size per copy is not limited to PAGE_SIZE. Thanks.
> 
> No, that's not needed; we can handle that just fine.  Maybe this can
> use kmap_local_page() instead of kmap_atomic().  Al, what do you think?
> I haven't tested this yet; need to figure out a qemu config with highmem ...
> 
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 960223ed9199..d3d6a0789625 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -857,24 +857,36 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
>  }
>  EXPORT_SYMBOL(iov_iter_zero);
>  
> -size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t bytes,
> -				  struct iov_iter *i)
> +size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
> +		size_t bytes, struct iov_iter *i)
>  {
> -	char *kaddr = kmap_atomic(page), *p = kaddr + offset;
> -	if (!page_copy_sane(page, offset, bytes)) {
> -		kunmap_atomic(kaddr);
> +	size_t n = bytes, copied = 0;
> +
> +	if (!page_copy_sane(page, offset, bytes))
>  		return 0;
> -	}
> -	if (WARN_ON_ONCE(!i->data_source)) {
> -		kunmap_atomic(kaddr);
> +	if (WARN_ON_ONCE(!i->data_source))
>  		return 0;
> +
> +	page += offset / PAGE_SIZE;
> +	offset %= PAGE_SIZE;
> +	if (PageHighMem(page))
> +		n = min_t(size_t, bytes, PAGE_SIZE);
This is smart.

> +	while (1) {
> +		char *kaddr = kmap_atomic(page) + offset;
> +		iterate_and_advance(i, n, base, len, off,
> +			copyin(kaddr + off, base, len),
> +			memcpy_from_iter(i, kaddr + off, base, len)
> +		)
> +		kunmap_atomic(kaddr);
> +		copied += n;
> +		if (!PageHighMem(page) || copied == bytes || n == 0)
> +			break;
My understanding is copied == bytes could cover !PageHighMem(page).

> +		offset += n;
> +		page += offset / PAGE_SIZE;
Should be page += n / PAGE_SIZE? Thanks.


Regards
Yin, Fengwei

> +		offset %= PAGE_SIZE;
> +		n = min_t(size_t, bytes - copied, PAGE_SIZE);
>  	}
> -	iterate_and_advance(i, bytes, base, len, off,
> -		copyin(p + off, base, len),
> -		memcpy_from_iter(i, p + off, base, len)
> -	)
> -	kunmap_atomic(kaddr);
> -	return bytes;
> +	return copied;
>  }
>  EXPORT_SYMBOL(copy_page_from_iter_atomic);
>
Yin, Fengwei June 7, 2023, 5:33 a.m. UTC | #6
On 6/7/2023 10:21 AM, Yin Fengwei wrote:
> 
> 
> On 6/7/23 02:07, Matthew Wilcox wrote:
>> On Mon, Jun 05, 2023 at 04:25:22PM +0800, Yin, Fengwei wrote:
>>> On 6/5/2023 6:11 AM, Matthew Wilcox wrote:
>>>> On Sun, Jun 04, 2023 at 11:29:52AM -0700, Darrick J. Wong wrote:
>>>>> On Fri, Jun 02, 2023 at 11:24:44PM +0100, Matthew Wilcox (Oracle) wrote:
>>>>>> -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
>>>>>> +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
>>>>>
>>>>> I think I've gotten lost in the weeds.  Does copy_page_from_iter_atomic
>>>>> actually know how to deal with a multipage folio?  AFAICT it takes a
>>>>> page, kmaps it, and copies @bytes starting at @offset in the page.  If
>>>>> a caller feeds it a multipage folio, does that all work correctly?  Or
>>>>> will the pagecache split multipage folios as needed to make it work
>>>>> right?
>>>>
>>>> It's a smidgen inefficient, but it does work.  First, it calls
>>>> page_copy_sane() to check that offset & n fit within the compound page
>>>> (ie this all predates folios).
>>>>
>>>> ... Oh.  copy_page_from_iter() handles this correctly.
>>>> copy_page_from_iter_atomic() doesn't.  I'll have to fix this
>>>> first.  Looks like Al fixed copy_page_from_iter() in c03f05f183cd
>>>> and didn't fix copy_page_from_iter_atomic().
>>>>
>>>>> If we create a 64k folio at pos 0 and then want to write a byte at pos
>>>>> 40k, does __filemap_get_folio break up the 64k folio so that the folio
>>>>> returned by iomap_get_folio starts at 40k?  Or can the iter code handle
>>>>> jumping ten pages into a 16-page folio and I just can't see it?
>>>>
>>>> Well ... it handles it fine unless it's highmem.  p is kaddr + offset,
>>>> so if offset is 40k, it works correctly on !highmem.
>>> So is it better to have implementations for !highmem and highmem? And for
>>> !highmem, we don't need the kmap_local_page()/kunmap_local() and chunk
>>> size per copy is not limited to PAGE_SIZE. Thanks.
>>
>> No, that's not needed; we can handle that just fine.  Maybe this can
>> use kmap_local_page() instead of kmap_atomic().  Al, what do you think?
>> I haven't tested this yet; need to figure out a qemu config with highmem ...
>>
>> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
>> index 960223ed9199..d3d6a0789625 100644
>> --- a/lib/iov_iter.c
>> +++ b/lib/iov_iter.c
>> @@ -857,24 +857,36 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
>>  }
>>  EXPORT_SYMBOL(iov_iter_zero);
>>  
>> -size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t bytes,
>> -				  struct iov_iter *i)
>> +size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
>> +		size_t bytes, struct iov_iter *i)
>>  {
>> -	char *kaddr = kmap_atomic(page), *p = kaddr + offset;
>> -	if (!page_copy_sane(page, offset, bytes)) {
>> -		kunmap_atomic(kaddr);
>> +	size_t n = bytes, copied = 0;
>> +
>> +	if (!page_copy_sane(page, offset, bytes))
>>  		return 0;
>> -	}
>> -	if (WARN_ON_ONCE(!i->data_source)) {
>> -		kunmap_atomic(kaddr);
>> +	if (WARN_ON_ONCE(!i->data_source))
>>  		return 0;
>> +
>> +	page += offset / PAGE_SIZE;
>> +	offset %= PAGE_SIZE;
>> +	if (PageHighMem(page))
>> +		n = min_t(size_t, bytes, PAGE_SIZE);
> This is smart.
> 
>> +	while (1) {
>> +		char *kaddr = kmap_atomic(page) + offset;
>> +		iterate_and_advance(i, n, base, len, off,
>> +			copyin(kaddr + off, base, len),
>> +			memcpy_from_iter(i, kaddr + off, base, len)
>> +		)
>> +		kunmap_atomic(kaddr);
>> +		copied += n;
>> +		if (!PageHighMem(page) || copied == bytes || n == 0)
>> +			break;
> My understanding is copied == bytes could cover !PageHighMem(page).
> 
>> +		offset += n;
>> +		page += offset / PAGE_SIZE;
> Should be page += n / PAGE_SIZE? Thanks.
offset / PAGE_SIZE is correct. Sorry for the noise.

Regards
Yin, Fengwei

> 
> 
> Regards
> Yin, Fengwei
> 
>> +		offset %= PAGE_SIZE;
>> +		n = min_t(size_t, bytes - copied, PAGE_SIZE);
>>  	}
>> -	iterate_and_advance(i, bytes, base, len, off,
>> -		copyin(p + off, base, len),
>> -		memcpy_from_iter(i, p + off, base, len)
>> -	)
>> -	kunmap_atomic(kaddr);
>> -	return bytes;
>> +	return copied;
>>  }
>>  EXPORT_SYMBOL(copy_page_from_iter_atomic);
>>
Yin, Fengwei June 7, 2023, 6:40 a.m. UTC | #7
On 6/7/23 02:07, Matthew Wilcox wrote:
> On Mon, Jun 05, 2023 at 04:25:22PM +0800, Yin, Fengwei wrote:
>> On 6/5/2023 6:11 AM, Matthew Wilcox wrote:
>>> On Sun, Jun 04, 2023 at 11:29:52AM -0700, Darrick J. Wong wrote:
>>>> On Fri, Jun 02, 2023 at 11:24:44PM +0100, Matthew Wilcox (Oracle) wrote:
>>>>> -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
>>>>> +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
>>>>
>>>> I think I've gotten lost in the weeds.  Does copy_page_from_iter_atomic
>>>> actually know how to deal with a multipage folio?  AFAICT it takes a
>>>> page, kmaps it, and copies @bytes starting at @offset in the page.  If
>>>> a caller feeds it a multipage folio, does that all work correctly?  Or
>>>> will the pagecache split multipage folios as needed to make it work
>>>> right?
>>>
>>> It's a smidgen inefficient, but it does work.  First, it calls
>>> page_copy_sane() to check that offset & n fit within the compound page
>>> (ie this all predates folios).
>>>
>>> ... Oh.  copy_page_from_iter() handles this correctly.
>>> copy_page_from_iter_atomic() doesn't.  I'll have to fix this
>>> first.  Looks like Al fixed copy_page_from_iter() in c03f05f183cd
>>> and didn't fix copy_page_from_iter_atomic().
>>>
>>>> If we create a 64k folio at pos 0 and then want to write a byte at pos
>>>> 40k, does __filemap_get_folio break up the 64k folio so that the folio
>>>> returned by iomap_get_folio starts at 40k?  Or can the iter code handle
>>>> jumping ten pages into a 16-page folio and I just can't see it?
>>>
>>> Well ... it handles it fine unless it's highmem.  p is kaddr + offset,
>>> so if offset is 40k, it works correctly on !highmem.
>> So is it better to have implementations for !highmem and highmem? And for
>> !highmem, we don't need the kmap_local_page()/kunmap_local() and chunk
>> size per copy is not limited to PAGE_SIZE. Thanks.
> 
> No, that's not needed; we can handle that just fine.  Maybe this can
> use kmap_local_page() instead of kmap_atomic().  Al, what do you think?
> I haven't tested this yet; need to figure out a qemu config with highmem ...
> 
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 960223ed9199..d3d6a0789625 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -857,24 +857,36 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
>  }
>  EXPORT_SYMBOL(iov_iter_zero);
>  
> -size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t bytes,
> -				  struct iov_iter *i)
> +size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
> +		size_t bytes, struct iov_iter *i)
>  {
> -	char *kaddr = kmap_atomic(page), *p = kaddr + offset;
> -	if (!page_copy_sane(page, offset, bytes)) {
> -		kunmap_atomic(kaddr);
> +	size_t n = bytes, copied = 0;
> +
> +	if (!page_copy_sane(page, offset, bytes))
>  		return 0;
> -	}
> -	if (WARN_ON_ONCE(!i->data_source)) {
> -		kunmap_atomic(kaddr);
> +	if (WARN_ON_ONCE(!i->data_source))
>  		return 0;
> +
> +	page += offset / PAGE_SIZE;
> +	offset %= PAGE_SIZE;
> +	if (PageHighMem(page))
> +		n = min_t(size_t, bytes, PAGE_SIZE);
Should be PAGE_SIZE - offset instead of PAGE_SIZE?

> +	while (1) {
> +		char *kaddr = kmap_atomic(page) + offset;
> +		iterate_and_advance(i, n, base, len, off,
> +			copyin(kaddr + off, base, len),
> +			memcpy_from_iter(i, kaddr + off, base, len)
> +		)
> +		kunmap_atomic(kaddr);
> +		copied += n;
> +		if (!PageHighMem(page) || copied == bytes || n == 0)
> +			break;
> +		offset += n;
> +		page += offset / PAGE_SIZE;
> +		offset %= PAGE_SIZE;
> +		n = min_t(size_t, bytes - copied, PAGE_SIZE);

Should be PAGE_SIZE - offset instead of PAGE_SIZE? Thanks.


Regards
Yin, Fengwei

>  	}
> -	iterate_and_advance(i, bytes, base, len, off,
> -		copyin(p + off, base, len),
> -		memcpy_from_iter(i, p + off, base, len)
> -	)
> -	kunmap_atomic(kaddr);
> -	return bytes;
> +	return copied;
>  }
>  EXPORT_SYMBOL(copy_page_from_iter_atomic);
>
Matthew Wilcox June 7, 2023, 3:55 p.m. UTC | #8
On Wed, Jun 07, 2023 at 10:21:41AM +0800, Yin Fengwei wrote:
> On 6/7/23 02:07, Matthew Wilcox wrote:
> > +++ b/lib/iov_iter.c
> > @@ -857,24 +857,36 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
> >  }
> >  EXPORT_SYMBOL(iov_iter_zero);
> >  
> > -size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t bytes,
> > -				  struct iov_iter *i)
> > +size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
> > +		size_t bytes, struct iov_iter *i)
> >  {
> > -	char *kaddr = kmap_atomic(page), *p = kaddr + offset;
> > -	if (!page_copy_sane(page, offset, bytes)) {
> > -		kunmap_atomic(kaddr);
> > +	size_t n = bytes, copied = 0;
> > +
> > +	if (!page_copy_sane(page, offset, bytes))
> >  		return 0;
> > -	}
> > -	if (WARN_ON_ONCE(!i->data_source)) {
> > -		kunmap_atomic(kaddr);
> > +	if (WARN_ON_ONCE(!i->data_source))
> >  		return 0;
> > +
> > +	page += offset / PAGE_SIZE;
> > +	offset %= PAGE_SIZE;
> > +	if (PageHighMem(page))
> > +		n = min_t(size_t, bytes, PAGE_SIZE);
> This is smart.

Thanks ;-)

> > +	while (1) {
> > +		char *kaddr = kmap_atomic(page) + offset;
> > +		iterate_and_advance(i, n, base, len, off,
> > +			copyin(kaddr + off, base, len),
> > +			memcpy_from_iter(i, kaddr + off, base, len)
> > +		)
> > +		kunmap_atomic(kaddr);
> > +		copied += n;
> > +		if (!PageHighMem(page) || copied == bytes || n == 0)
> > +			break;
> My understanding is copied == bytes could cover !PageHighMem(page).

It could!  But the PageHighMem test serves two purposes.  One is that
it tells the human reader that this is all because of HighMem.  The
other is that on !HIGHMEM systems it compiles to false:

PAGEFLAG_FALSE(HighMem, highmem)

static inline int Page##uname(const struct page *page) { return 0; }

So it tells the _compiler_ that all of this code is ignorable and
it can optimise it out.  Now, you and I know that it will always
be true, but it lets the compiler remove the test.  Hopefully the
compiler can also see that:

	while (1) {
		...
		if (true)
			break;
	}

means it can optimise away the entire loop structure and just produce
the same code that it always did.
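
For instance, on a CONFIG_HIGHMEM=n build the loop effectively reduces to
(a sketch of the constant-folded result, not actual generated code):

	while (1) {
		char *kaddr = kmap_atomic(page) + offset;
		iterate_and_advance(i, n, base, len, off,
			copyin(kaddr + off, base, len),
			memcpy_from_iter(i, kaddr + off, base, len)
		)
		kunmap_atomic(kaddr);
		copied += n;
		break;		/* !PageHighMem(page) is constant-true */
	}

with n still equal to bytes, i.e. a single full-length pass, matching the
pre-patch behaviour.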
Matthew Wilcox June 7, 2023, 3:56 p.m. UTC | #9
On Wed, Jun 07, 2023 at 02:40:02PM +0800, Yin Fengwei wrote:
> > +	page += offset / PAGE_SIZE;
> > +	offset %= PAGE_SIZE;
> > +	if (PageHighMem(page))
> > +		n = min_t(size_t, bytes, PAGE_SIZE);
> Should be PAGE_SIZE - offset instead of PAGE_SIZE?

Yes, it should.  Thanks.
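
Folding that fix into both clamp sites gives (untested sketch of the
corrected lines only):

	if (PageHighMem(page))
		n = min_t(size_t, bytes, PAGE_SIZE - offset);
	...
	n = min_t(size_t, bytes - copied, PAGE_SIZE - offset);

so a chunk that starts mid-page is capped at the end of the kmap'd page.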
Yin, Fengwei June 8, 2023, 1:22 a.m. UTC | #10
On 6/7/23 23:55, Matthew Wilcox wrote:
> On Wed, Jun 07, 2023 at 10:21:41AM +0800, Yin Fengwei wrote:
>> On 6/7/23 02:07, Matthew Wilcox wrote:
>>> +++ b/lib/iov_iter.c
>>> @@ -857,24 +857,36 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
>>>  }
>>>  EXPORT_SYMBOL(iov_iter_zero);
>>>  
>>> -size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t bytes,
>>> -				  struct iov_iter *i)
>>> +size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
>>> +		size_t bytes, struct iov_iter *i)
>>>  {
>>> -	char *kaddr = kmap_atomic(page), *p = kaddr + offset;
>>> -	if (!page_copy_sane(page, offset, bytes)) {
>>> -		kunmap_atomic(kaddr);
>>> +	size_t n = bytes, copied = 0;
>>> +
>>> +	if (!page_copy_sane(page, offset, bytes))
>>>  		return 0;
>>> -	}
>>> -	if (WARN_ON_ONCE(!i->data_source)) {
>>> -		kunmap_atomic(kaddr);
>>> +	if (WARN_ON_ONCE(!i->data_source))
>>>  		return 0;
>>> +
>>> +	page += offset / PAGE_SIZE;
>>> +	offset %= PAGE_SIZE;
>>> +	if (PageHighMem(page))
>>> +		n = min_t(size_t, bytes, PAGE_SIZE);
>> This is smart.
> 
> Thanks ;-)
> 
>>> +	while (1) {
>>> +		char *kaddr = kmap_atomic(page) + offset;
>>> +		iterate_and_advance(i, n, base, len, off,
>>> +			copyin(kaddr + off, base, len),
>>> +			memcpy_from_iter(i, kaddr + off, base, len)
>>> +		)
>>> +		kunmap_atomic(kaddr);
>>> +		copied += n;
>>> +		if (!PageHighMem(page) || copied == bytes || n == 0)
>>> +			break;
>> My understanding is copied == bytes could cover !PageHighMem(page).
> 
> It could!  But the PageHighMem test serves two purposes.  One is that
> it tells the human reader that this is all because of HighMem.  The
> other is that on !HIGHMEM systems it compiles to false:
> 
> PAGEFLAG_FALSE(HighMem, highmem)
> 
> static inline int Page##uname(const struct page *page) { return 0; }
> 
> So it tells the _compiler_ that all of this code is ignorable and
> it can optimise it out.  Now, you and I know that it will always
> be true, but it lets the compiler remove the test.  Hopefully the
> compiler can also see that:
> 
> 	while (1) {
> 		...
> 		if (true)
> 			break;
> 	}
> 
> means it can optimise away the entire loop structure and just produce
> the same code that it always did.
I thought about the first purpose, but the second purpose is a new thing
I learned from this thread. Thanks a lot for the detailed explanation.


Regards
Yin, Fengwei

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a10f9c037515..10434b07e0f9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -768,6 +768,7 @@  static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 {
 	loff_t length = iomap_length(iter);
+	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
 	loff_t pos = iter->pos;
 	ssize_t written = 0;
 	long status = 0;
@@ -776,15 +777,13 @@  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 
 	do {
 		struct folio *folio;
-		struct page *page;
-		unsigned long offset;	/* Offset into pagecache page */
-		unsigned long bytes;	/* Bytes to write to page */
+		size_t offset;		/* Offset into folio */
+		unsigned long bytes;	/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */
 
-		offset = offset_in_page(pos);
-		bytes = min_t(unsigned long, PAGE_SIZE - offset,
-						iov_iter_count(i));
 again:
+		offset = pos & (chunk - 1);
+		bytes = min(chunk - offset, iov_iter_count(i));
 		status = balance_dirty_pages_ratelimited_flags(mapping,
 							       bdp_flags);
 		if (unlikely(status))
@@ -814,11 +813,14 @@  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		if (iter->iomap.flags & IOMAP_F_STALE)
 			break;
 
-		page = folio_file_page(folio, pos >> PAGE_SHIFT);
+		offset = offset_in_folio(folio, pos);
+		if (bytes > folio_size(folio) - offset)
+			bytes = folio_size(folio) - offset;
+
 		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
+			flush_dcache_folio(folio);
 
-		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
+		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
 
 		status = iomap_write_end(iter, pos, bytes, copied, folio);
 
@@ -835,6 +837,8 @@  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 			 */
 			if (copied)
 				bytes = copied;
+			if (chunk > PAGE_SIZE)
+				chunk /= 2;
 			goto again;
 		}
 		pos += status;