diff mbox series

[v2,4/6] iomap: correct the dirty length in page mkwrite

Message ID 20240812121159.3775074-5-yi.zhang@huaweicloud.com (mailing list archive)
State Accepted, archived
Headers show
Series iomap: some minor non-critical fixes and improvements when block size < folio size | expand

Commit Message

Zhang Yi Aug. 12, 2024, 12:11 p.m. UTC
From: Zhang Yi <yi.zhang@huawei.com>

When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire
folio by folio_mark_dirty() even the map length is shorter than one
folio. However, on the filesystem with more than one blocks per folio,
we'd better to only set counterpart block's dirty bit according to
iomap_length(), so open code folio_mark_dirty() and pass the correct
length.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/iomap/buffered-io.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong Aug. 12, 2024, 4:45 p.m. UTC | #1
On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire
> folio by folio_mark_dirty() even the map length is shorter than one
> folio. However, on the filesystem with more than one blocks per folio,
> we'd better to only set counterpart block's dirty bit according to
> iomap_length(), so open code folio_mark_dirty() and pass the correct
> length.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
>  fs/iomap/buffered-io.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 79031b7517e5..ac762de9a27f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1492,7 +1492,10 @@ static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter,
>  		block_commit_write(&folio->page, 0, length);
>  	} else {
>  		WARN_ON_ONCE(!folio_test_uptodate(folio));
> -		folio_mark_dirty(folio);
> +
> +		ifs_alloc(iter->inode, folio, 0);
> +		iomap_set_range_dirty(folio, 0, length);
> +		filemap_dirty_folio(iter->inode->i_mapping, folio);

Is it correct to be doing a lot more work by changing folio_mark_dirty
to filemap_dirty_folio?  Now pagefaults call __mark_inode_dirty which
they did not before.  Also, the folio itself must be marked dirty if any
of the ifs bitmap is marked dirty, so I don't understand the change
here.

--D

>  	}
>  
>  	return length;
> -- 
> 2.39.2
> 
>
Zhang Yi Aug. 13, 2024, 2:49 a.m. UTC | #2
On 2024/8/13 0:45, Darrick J. Wong wrote:
> On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire
>> folio by folio_mark_dirty() even the map length is shorter than one
>> folio. However, on the filesystem with more than one blocks per folio,
>> we'd better to only set counterpart block's dirty bit according to
>> iomap_length(), so open code folio_mark_dirty() and pass the correct
>> length.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>> ---
>>  fs/iomap/buffered-io.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 79031b7517e5..ac762de9a27f 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -1492,7 +1492,10 @@ static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter,
>>  		block_commit_write(&folio->page, 0, length);
>>  	} else {
>>  		WARN_ON_ONCE(!folio_test_uptodate(folio));
>> -		folio_mark_dirty(folio);
>> +
>> +		ifs_alloc(iter->inode, folio, 0);
>> +		iomap_set_range_dirty(folio, 0, length);
>> +		filemap_dirty_folio(iter->inode->i_mapping, folio);
> 
> Is it correct to be doing a lot more work by changing folio_mark_dirty
> to filemap_dirty_folio?  Now pagefaults call __mark_inode_dirty which
> they did not before.  Also, the folio itself must be marked dirty if any
> of the ifs bitmap is marked dirty, so I don't understand the change
> here.
> 

This change is just open code iomap_dirty_folio() and correct the length
that passing to iomap_set_range_dirty().

bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio)
{
	struct inode *inode = mapping->host;
	size_t len = folio_size(folio);
...
	ifs_alloc(inode, folio, 0);
	iomap_set_range_dirty(folio, 0, len);
	return filemap_dirty_folio(mapping, folio);
}

Before this change, the code also call filemap_dirty_folio() (though
folio_mark_dirty()->iomap_dirty_folio()->filemap_dirty_folio()), so it call
__mark_inode_dirty() too. After this change, filemap_dirty_folio()->
folio_test_set_dirty() will mark the folio dirty. Hence there is no
difference between the two points you mentioned. Am I missing something?

Thanks,
Yi.

> 
>>  	}
>>  
>>  	return length;
>> -- 
>> 2.39.2
>>
>>
Christoph Hellwig Aug. 14, 2024, 5:36 a.m. UTC | #3
On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire
> folio by folio_mark_dirty() even the map length is shorter than one
> folio. However, on the filesystem with more than one blocks per folio,
> we'd better to only set counterpart block's dirty bit according to
> iomap_length(), so open code folio_mark_dirty() and pass the correct
> length.

What about moving the folio_mark_dirty out of the loop and directly
into iomap_page_mkwrite so that it is exactly called once?  The
iterator then does nothing for the !buffer_head case (but we still
need to call it to allocate the blocks).
Zhang Yi Aug. 14, 2024, 7:49 a.m. UTC | #4
On 2024/8/14 13:36, Christoph Hellwig wrote:
> On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire
>> folio by folio_mark_dirty() even the map length is shorter than one
>> folio. However, on the filesystem with more than one blocks per folio,
>> we'd better to only set counterpart block's dirty bit according to
>> iomap_length(), so open code folio_mark_dirty() and pass the correct
>> length.
> 
> What about moving the folio_mark_dirty out of the loop and directly
> into iomap_page_mkwrite so that it is exactly called once?  The
> iterator then does nothing for the !buffer_head case (but we still
> need to call it to allocate the blocks).
> 

Sorry, this makes me confused. How does this could prevent setting
redundant dirty bits?

Suppose we have a 3K regular file on a filesystem with 1K block size.
In iomap_page_mkwrite(), the iter.len is 3K, if the folio size is 4K,
folio_mark_dirty() will also mark all 4 bits of ifs dirty. And then,
if we expand this file size to 4K, and this will still lead to a hole
with dirty bit set but without any block allocated/reserved. Am I
missing something?

Thanks,
Yi.
Christoph Hellwig Aug. 15, 2024, 5:59 a.m. UTC | #5
On Wed, Aug 14, 2024 at 03:49:41PM +0800, Zhang Yi wrote:
> Sorry, this makes me confused. How does this could prevent setting
> redundant dirty bits?
> 
> Suppose we have a 3K regular file on a filesystem with 1K block size.
> In iomap_page_mkwrite(), the iter.len is 3K, if the folio size is 4K,
> folio_mark_dirty() will also mark all 4 bits of ifs dirty. And then,
> if we expand this file size to 4K, and this will still lead to a hole
> with dirty bit set but without any block allocated/reserved. Am I
> missing something?

No, we still need the ifs manipulation in the loop indeed.  But
the filemap_dirty_folio (and the not uptodate warning) can still
move outside the iterator.
Zhang Yi Aug. 16, 2024, 2:19 a.m. UTC | #6
On 2024/8/15 13:59, Christoph Hellwig wrote:
> On Wed, Aug 14, 2024 at 03:49:41PM +0800, Zhang Yi wrote:
>> Sorry, this makes me confused. How does this could prevent setting
>> redundant dirty bits?
>>
>> Suppose we have a 3K regular file on a filesystem with 1K block size.
>> In iomap_page_mkwrite(), the iter.len is 3K, if the folio size is 4K,
>> folio_mark_dirty() will also mark all 4 bits of ifs dirty. And then,
>> if we expand this file size to 4K, and this will still lead to a hole
>> with dirty bit set but without any block allocated/reserved. Am I
>> missing something?
> 
> No, we still need the ifs manipulation in the loop indeed.  But
> the filemap_dirty_folio (and the not uptodate warning) can still
> move outside the iterator.
> 
Yes, right, I misunderstood.

Thanks,
Yi.
Matthew Wilcox Aug. 17, 2024, 4:45 a.m. UTC | #7
On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire
> folio by folio_mark_dirty() even the map length is shorter than one
> folio. However, on the filesystem with more than one blocks per folio,
> we'd better to only set counterpart block's dirty bit according to
> iomap_length(), so open code folio_mark_dirty() and pass the correct
> length.

We shouldn't waste any time trying to optimise writing to files through
mmap().  People have written papers about what a terrible programming
model this is.  eg https://db.cs.cmu.edu/mmap-cidr2022/

There are some programs that do it, but they are few and far between.
Zhang Yi Aug. 17, 2024, 6:43 a.m. UTC | #8
On 2024/8/17 12:45, Matthew Wilcox wrote:
> On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire
>> folio by folio_mark_dirty() even the map length is shorter than one
>> folio. However, on the filesystem with more than one blocks per folio,
>> we'd better to only set counterpart block's dirty bit according to
>> iomap_length(), so open code folio_mark_dirty() and pass the correct
>> length.
> 
> We shouldn't waste any time trying to optimise writing to files through
> mmap().  People have written papers about what a terrible programming
> model this is.  eg https://db.cs.cmu.edu/mmap-cidr2022/
> 
> There are some programs that do it, but they are few and far between.
> 

I don't think it's an optimization, this is a fix, the issue is the same
to the one that previous 2 patches want to fix: there left a hole with
ifs dirty bit set but without any valid data and without any block
allocated/reserved. This mmap() path is just once case that could lead
to this issue (The case if as I said in my reply to Christoph, suppose
we have a 3K regular file on a filesystem with 1K block size. If the
folio size is 4K, In iomap_page_mkwrite(), it will mark all 4 bits of ifs
dirty, the last one will become redundant if we expand the file to 4K
later).

Hence in my opinion, we need to fix this, or it's gonna be a potential
problem one day.

Thanks,
Yi.
diff mbox series

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 79031b7517e5..ac762de9a27f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1492,7 +1492,10 @@  static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter,
 		block_commit_write(&folio->page, 0, length);
 	} else {
 		WARN_ON_ONCE(!folio_test_uptodate(folio));
-		folio_mark_dirty(folio);
+
+		ifs_alloc(iter->inode, folio, 0);
+		iomap_set_range_dirty(folio, 0, length);
+		filemap_dirty_folio(iter->inode->i_mapping, folio);
 	}
 
 	return length;