Message ID | 20240812121159.3775074-5-yi.zhang@huaweicloud.com (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
Series | iomap: some minor non-critical fixes and improvements when block size < folio size | expand |
On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote: > From: Zhang Yi <yi.zhang@huawei.com> > > When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire > folio by folio_mark_dirty() even the map length is shorter than one > folio. However, on the filesystem with more than one blocks per folio, > we'd better to only set counterpart block's dirty bit according to > iomap_length(), so open code folio_mark_dirty() and pass the correct > length. > > Signed-off-by: Zhang Yi <yi.zhang@huawei.com> > --- > fs/iomap/buffered-io.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index 79031b7517e5..ac762de9a27f 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -1492,7 +1492,10 @@ static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter, > block_commit_write(&folio->page, 0, length); > } else { > WARN_ON_ONCE(!folio_test_uptodate(folio)); > - folio_mark_dirty(folio); > + > + ifs_alloc(iter->inode, folio, 0); > + iomap_set_range_dirty(folio, 0, length); > + filemap_dirty_folio(iter->inode->i_mapping, folio); Is it correct to be doing a lot more work by changing folio_mark_dirty to filemap_dirty_folio? Now pagefaults call __mark_inode_dirty which they did not before. Also, the folio itself must be marked dirty if any of the ifs bitmap is marked dirty, so I don't understand the change here. --D > } > > return length; > -- > 2.39.2 > >
On 2024/8/13 0:45, Darrick J. Wong wrote: > On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote: >> From: Zhang Yi <yi.zhang@huawei.com> >> >> When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire >> folio by folio_mark_dirty() even the map length is shorter than one >> folio. However, on the filesystem with more than one blocks per folio, >> we'd better to only set counterpart block's dirty bit according to >> iomap_length(), so open code folio_mark_dirty() and pass the correct >> length. >> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> >> --- >> fs/iomap/buffered-io.c | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c >> index 79031b7517e5..ac762de9a27f 100644 >> --- a/fs/iomap/buffered-io.c >> +++ b/fs/iomap/buffered-io.c >> @@ -1492,7 +1492,10 @@ static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter, >> block_commit_write(&folio->page, 0, length); >> } else { >> WARN_ON_ONCE(!folio_test_uptodate(folio)); >> - folio_mark_dirty(folio); >> + >> + ifs_alloc(iter->inode, folio, 0); >> + iomap_set_range_dirty(folio, 0, length); >> + filemap_dirty_folio(iter->inode->i_mapping, folio); > > Is it correct to be doing a lot more work by changing folio_mark_dirty > to filemap_dirty_folio? Now pagefaults call __mark_inode_dirty which > they did not before. Also, the folio itself must be marked dirty if any > of the ifs bitmap is marked dirty, so I don't understand the change > here. > This change is just open code iomap_dirty_folio() and correct the length that passing to iomap_set_range_dirty(). bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio) { struct inode *inode = mapping->host; size_t len = folio_size(folio); ... ifs_alloc(inode, folio, 0); iomap_set_range_dirty(folio, 0, len); return filemap_dirty_folio(mapping, folio); } Before this change, the code also call filemap_dirty_folio() (though folio_mark_dirty()->iomap_dirty_folio()->filemap_dirty_folio()), so it call __mark_inode_dirty() too. After this change, filemap_dirty_folio()-> folio_test_set_dirty() will mark the folio dirty. Hence there is no difference between the two points you mentioned. Am I missing something? Thanks, Yi. > >> } >> >> return length; >> -- >> 2.39.2 >> >>
On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote: > From: Zhang Yi <yi.zhang@huawei.com> > > When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire > folio by folio_mark_dirty() even the map length is shorter than one > folio. However, on the filesystem with more than one blocks per folio, > we'd better to only set counterpart block's dirty bit according to > iomap_length(), so open code folio_mark_dirty() and pass the correct > length. What about moving the folio_mark_dirty out of the loop and directly into iomap_page_mkwrite so that it is exactly called once? The iterator then does nothing for the !buffer_head case (but we still need to call it to allocate the blocks).
On 2024/8/14 13:36, Christoph Hellwig wrote: > On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote: >> From: Zhang Yi <yi.zhang@huawei.com> >> >> When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire >> folio by folio_mark_dirty() even the map length is shorter than one >> folio. However, on the filesystem with more than one blocks per folio, >> we'd better to only set counterpart block's dirty bit according to >> iomap_length(), so open code folio_mark_dirty() and pass the correct >> length. > > What about moving the folio_mark_dirty out of the loop and directly > into iomap_page_mkwrite so that it is exactly called once? The > iterator then does nothing for the !buffer_head case (but we still > need to call it to allocate the blocks). > Sorry, this makes me confused. How does this could prevent setting redundant dirty bits? Suppose we have a 3K regular file on a filesystem with 1K block size. In iomap_page_mkwrite(), the iter.len is 3K, if the folio size is 4K, folio_mark_dirty() will also mark all 4 bits of ifs dirty. And then, if we expand this file size to 4K, and this will still lead to a hole with dirty bit set but without any block allocated/reserved. Am I missing something? Thanks, Yi.
On Wed, Aug 14, 2024 at 03:49:41PM +0800, Zhang Yi wrote: > Sorry, this makes me confused. How does this could prevent setting > redundant dirty bits? > > Suppose we have a 3K regular file on a filesystem with 1K block size. > In iomap_page_mkwrite(), the iter.len is 3K, if the folio size is 4K, > folio_mark_dirty() will also mark all 4 bits of ifs dirty. And then, > if we expand this file size to 4K, and this will still lead to a hole > with dirty bit set but without any block allocated/reserved. Am I > missing something? No, we still need the ifs manipulation in the loop indeed. But the filemap_dirty_folio (and the not uptodate warning) can still move outside the iterator.
On 2024/8/15 13:59, Christoph Hellwig wrote: > On Wed, Aug 14, 2024 at 03:49:41PM +0800, Zhang Yi wrote: >> Sorry, this makes me confused. How does this could prevent setting >> redundant dirty bits? >> >> Suppose we have a 3K regular file on a filesystem with 1K block size. >> In iomap_page_mkwrite(), the iter.len is 3K, if the folio size is 4K, >> folio_mark_dirty() will also mark all 4 bits of ifs dirty. And then, >> if we expand this file size to 4K, and this will still lead to a hole >> with dirty bit set but without any block allocated/reserved. Am I >> missing something? > > No, we still need the ifs manipulation in the loop indeed. But > the filemap_dirty_folio (and the not uptodate warning) can still > move outside the iterator. > Yes, right, I misunderstood. Thanks, Yi.
On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote: > From: Zhang Yi <yi.zhang@huawei.com> > > When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire > folio by folio_mark_dirty() even the map length is shorter than one > folio. However, on the filesystem with more than one blocks per folio, > we'd better to only set counterpart block's dirty bit according to > iomap_length(), so open code folio_mark_dirty() and pass the correct > length. We shouldn't waste any time trying to optimise writing to files through mmap(). People have written papers about what a terrible programming model this is. eg https://db.cs.cmu.edu/mmap-cidr2022/ There are some programs that do it, but they are few and far between.
On 2024/8/17 12:45, Matthew Wilcox wrote: > On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote: >> From: Zhang Yi <yi.zhang@huawei.com> >> >> When doing page mkwrite, iomap_folio_mkwrite_iter() dirty the entire >> folio by folio_mark_dirty() even the map length is shorter than one >> folio. However, on the filesystem with more than one blocks per folio, >> we'd better to only set counterpart block's dirty bit according to >> iomap_length(), so open code folio_mark_dirty() and pass the correct >> length. > > We shouldn't waste any time trying to optimise writing to files through > mmap(). People have written papers about what a terrible programming > model this is. eg https://db.cs.cmu.edu/mmap-cidr2022/ > > There are some programs that do it, but they are few and far between. > I don't think it's an optimization, this is a fix, the issue is the same to the one that previous 2 patches want to fix: there left a hole with ifs dirty bit set but without any valid data and without any block allocated/reserved. This mmap() path is just once case that could lead to this issue (The case if as I said in my reply to Christoph, suppose we have a 3K regular file on a filesystem with 1K block size. If the folio size is 4K, In iomap_page_mkwrite(), it will mark all 4 bits of ifs dirty, the last one will become redundant if we expand the file to 4K later). Hence in my opinion, we need to fix this, or it's gonna be a potential problem one day. Thanks, Yi.
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 79031b7517e5..ac762de9a27f 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1492,7 +1492,10 @@ static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter, block_commit_write(&folio->page, 0, length); } else { WARN_ON_ONCE(!folio_test_uptodate(folio)); - folio_mark_dirty(folio); + + ifs_alloc(iter->inode, folio, 0); + iomap_set_range_dirty(folio, 0, length); + filemap_dirty_folio(iter->inode->i_mapping, folio); } return length;