Message ID | 20230202204428.3267832-2-willy@infradead.org (mailing list archive) |
---|---|
State | New |
Series | Fix a minor POSIX conformance problem |
On Thu, Feb 02, 2023 at 08:44:23PM +0000, Matthew Wilcox (Oracle) wrote:
> POSIX requires that "If the file size is increased, the extended area
> shall appear as if it were zero-filled". It is possible to use mmap to
> write past EOF and that data will become visible instead of zeroes.
> This fixes the problem for the filesystems which simply call
> truncate_setsize(). More complex filesystems will need their own
> patches.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/truncate.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 7b4ea4c4a46b..cebfc5415e9a 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -763,9 +763,12 @@ void truncate_setsize(struct inode *inode, loff_t newsize)
>  	loff_t oldsize = inode->i_size;
>
>  	i_size_write(inode, newsize);
> -	if (newsize > oldsize)
> +	if (newsize > oldsize) {
>  		pagecache_isize_extended(inode, oldsize, newsize);
> -	truncate_pagecache(inode, newsize);
> +		truncate_pagecache(inode, oldsize);
> +	} else {
> +		truncate_pagecache(inode, newsize);
> +	}

I don't think this alone quite addresses the problem. Looking at ext4
for example, if the eof page is dirty and writeback occurs between the
i_size update (because writeback also zeroes the post-eof portion of the
page) and the truncate_setsize() call, we end up with pagecache
inconsistency because pagecache truncate doesn't dirty the page it
zeroes.

So for example, with this series plus a nefariously placed
filemap_flush() in ext4_setattr():

# xfs_io -fc "truncate 1" -c "mmap 0 1k" -c "mwrite 0 10" -c "truncate 5" -c "mread -v 0 5" /mnt/file
00000000: 58 00 00 00 00 X....
# umount /mnt/; mount <dev> /mnt/
# xfs_io -c "mmap 0 1k" -c "mread -v 0 5" /mnt/file
00000000: 58 58 58 58 58 XXXXX

Brian

>  }
>  EXPORT_SYMBOL(truncate_setsize);
>
> --
> 2.35.1
>
>
On Fri, Feb 03, 2023 at 08:00:16AM -0500, Brian Foster wrote:
> On Thu, Feb 02, 2023 at 08:44:23PM +0000, Matthew Wilcox (Oracle) wrote:
> > POSIX requires that "If the file size is increased, the extended area
> > shall appear as if it were zero-filled". It is possible to use mmap to
> > write past EOF and that data will become visible instead of zeroes.
> > This fixes the problem for the filesystems which simply call
> > truncate_setsize(). More complex filesystems will need their own
> > patches.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  mm/truncate.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/truncate.c b/mm/truncate.c
> > index 7b4ea4c4a46b..cebfc5415e9a 100644
> > --- a/mm/truncate.c
> > +++ b/mm/truncate.c
> > @@ -763,9 +763,12 @@ void truncate_setsize(struct inode *inode, loff_t newsize)
> >  	loff_t oldsize = inode->i_size;
> >
> >  	i_size_write(inode, newsize);
> > -	if (newsize > oldsize)
> > +	if (newsize > oldsize) {
> >  		pagecache_isize_extended(inode, oldsize, newsize);
> > -	truncate_pagecache(inode, newsize);
> > +		truncate_pagecache(inode, oldsize);
> > +	} else {
> > +		truncate_pagecache(inode, newsize);
> > +	}
>
> I don't think this alone quite addresses the problem. Looking at ext4
> for example, if the eof page is dirty and writeback occurs between the
> i_size update (because writeback also zeroes the post-eof portion of the
> page) and the truncate_setsize() call, we end up with pagecache
> inconsistency because pagecache truncate doesn't dirty the page it
> zeroes.
>
> So for example, with this series plus a nefariously placed
> filemap_flush() in ext4_setattr():
>
> # xfs_io -fc "truncate 1" -c "mmap 0 1k" -c "mwrite 0 10" -c "truncate 5" -c "mread -v 0 5" /mnt/file
> 00000000: 58 00 00 00 00 X....
> # umount /mnt/; mount <dev> /mnt/
> # xfs_io -c "mmap 0 1k" -c "mread -v 0 5" /mnt/file
> 00000000: 58 58 58 58 58 XXXXX

Hm, so switch the order of i_size_write() and truncate_pagecache()?
There could still be a store between old-EOF and new-EOF from another
thread, which would then be visible, but I don't think you could prove
that store should have been zeroed. Not from the thread doing the
ftruncate() anyway -- I think the thread doing the store could prove it,
but that thread is relying on undefined behaviour anyway.
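To make the ordering argument above easier to follow, here is a toy userspace model of the EOF page. It is only a sketch under stated assumptions: `struct toy_file`, `toy_mwrite()`, `toy_writeback()`, `toy_truncate_pagecache()`, `toy_remount()` and `toy_run()` are hypothetical names invented for illustration, not kernel APIs. The model assumes exactly what the thread describes: writeback zeroes the post-EOF part of the page and cleans it, and pagecache truncate zeroes without dirtying. It does not model the concurrent store from another thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE 16	/* toy page size, far smaller than a real page */

/* Hypothetical model of one file whose data fits in a single page. */
struct toy_file {
	unsigned char cache[TOY_PAGE];	/* page cache copy of the EOF page */
	unsigned char disk[TOY_PAGE];	/* what has reached stable storage */
	size_t i_size;
	bool dirty;
};

/* mmap store of 'X' bytes, possibly past EOF; dirties the page. */
static void toy_mwrite(struct toy_file *f, size_t len)
{
	assert(len <= TOY_PAGE);
	memset(f->cache, 'X', len);
	f->dirty = true;
}

/* Writeback: only a dirty page is written; bytes past i_size are zeroed. */
static void toy_writeback(struct toy_file *f)
{
	if (!f->dirty)
		return;
	memset(f->cache + f->i_size, 0, TOY_PAGE - f->i_size);
	memcpy(f->disk, f->cache, TOY_PAGE);
	f->dirty = false;
}

/* Pagecache truncate: zero from 'from' to page end WITHOUT dirtying. */
static void toy_truncate_pagecache(struct toy_file *f, size_t from)
{
	assert(from <= TOY_PAGE);
	memset(f->cache + from, 0, TOY_PAGE - from);
}

/* umount + mount: flush what is dirty, then reread the page from disk. */
static void toy_remount(struct toy_file *f)
{
	toy_writeback(f);
	memcpy(f->cache, f->disk, TOY_PAGE);
}

/*
 * Replay "truncate 1; mwrite 0 10; truncate 5" with a writeback forced
 * between the two steps of the size change. 'swapped' selects the order
 * suggested in the reply (pagecache truncate before the i_size update).
 * The first five cached bytes are copied out before and after a remount.
 */
void toy_run(bool swapped, unsigned char before[5], unsigned char after[5])
{
	struct toy_file f = { .i_size = 1 };

	toy_mwrite(&f, 10);
	if (swapped) {
		toy_truncate_pagecache(&f, f.i_size);
		toy_writeback(&f);	/* the "nefariously placed" flush */
		f.i_size = 5;
	} else {
		f.i_size = 5;
		toy_writeback(&f);	/* the "nefariously placed" flush */
		toy_truncate_pagecache(&f, 1);
	}
	memcpy(before, f.cache, 5);
	toy_remount(&f);
	memcpy(after, f.cache, 5);
}
```

In this model, `toy_run(false, ...)` gives a cache that disagrees with the disk (zeroes before the remount, `X` bytes after it), matching the xfs_io output quoted above, while `toy_run(true, ...)` shows zeroes in both places. Whether the real kernel paths behave the same way is exactly what the thread is discussing.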
diff --git a/mm/truncate.c b/mm/truncate.c
index 7b4ea4c4a46b..cebfc5415e9a 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -763,9 +763,12 @@ void truncate_setsize(struct inode *inode, loff_t newsize)
 	loff_t oldsize = inode->i_size;
 
 	i_size_write(inode, newsize);
-	if (newsize > oldsize)
+	if (newsize > oldsize) {
 		pagecache_isize_extended(inode, oldsize, newsize);
-	truncate_pagecache(inode, newsize);
+		truncate_pagecache(inode, oldsize);
+	} else {
+		truncate_pagecache(inode, newsize);
+	}
 }
 EXPORT_SYMBOL(truncate_setsize);
POSIX requires that "If the file size is increased, the extended area
shall appear as if it were zero-filled". It is possible to use mmap to
write past EOF and that data will become visible instead of zeroes.
This fixes the problem for the filesystems which simply call
truncate_setsize(). More complex filesystems will need their own
patches.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/truncate.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
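The behaviour described in the commit message can be probed from userspace. The following is a minimal sketch, not part of the patch: `stale_bytes_after_extend()` is a hypothetical helper name, and the sketch assumes the file's old and new sizes both fit in the first page. It stores through a shared mapping past EOF, extends the file with ftruncate(), and counts how many bytes of the extended area read back as non-zero.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): returns the number of non-zero
 * bytes seen in [old_size, new_size) after extending the file, or -1 on
 * error. POSIX expects 0; the result depends on the kernel and filesystem.
 * The file at 'path' is created or truncated and is left in place.
 */
int stale_bytes_after_extend(const char *path, off_t old_size, off_t new_size)
{
	long page = sysconf(_SC_PAGESIZE);
	char *map;
	int fd, stale = -1;

	/* old_size must be > 0 so the first page is backed by the file. */
	assert(old_size > 0 && old_size < new_size);
	if (page <= 0 || new_size > page)
		return -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, old_size) != 0)
		goto out_close;

	map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* Store past EOF, but still inside the page that contains EOF. */
	memset(map, 'X', (size_t)new_size);

	if (ftruncate(fd, new_size) == 0) {
		stale = 0;
		/* Read the extended area back through the same mapping. */
		for (off_t i = old_size; i < new_size; i++)
			if (map[i] != 0)
				stale++;
	}

	munmap(map, (size_t)page);
out_close:
	close(fd);
	return stale;
}
```

Calling `stale_bytes_after_extend("/mnt/file", 1, 5)` mirrors the sizes used in the xfs_io example in this thread; the path is only an example. A non-zero return would indicate that data written past the old EOF became visible in the extended area.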