Message ID | 1474994615-29553-7-git-send-email-jack@suse.cz (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Tue, Sep 27, 2016 at 06:43:35PM +0200, Jan Kara wrote: > @@ -1410,6 +1422,17 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED)) > return -EIO; > > + /* > + * Write can allocate block for an area which has a hole page mapped > + * into page tables. We have to tear down these mappings so that data > + * written by write(2) is visible in mmap. > + */ > + if (iomap->flags & IOMAP_F_NEW && inode->i_mapping->nrpages) { gcc should be throwing warnings about that: if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) { Cheers, Dave.
On Tue, Sep 27, 2016 at 06:43:35PM +0200, Jan Kara wrote: > Currently each filesystem (possibly through generic_file_direct_write() > or iomap_dax_rw()) takes care of invalidating page tables and evicting > hole pages from the radix tree when write(2) to the file happens. This > invalidation is only necessary when there is some block allocation > resulting from write(2). Furthermore in current place the invalidation > is racy wrt page fault instantiating a hole page just after we have > invalidated it. > > So perform the page invalidation inside dax_do_io() where we can do it > only when really necessary and after blocks have been allocated so > nobody will be instantiating new hole pages anymore. > > Signed-off-by: Jan Kara <jack@suse.cz> This looks fine with the comment from Dave addressed: Reviewed-by: Christoph Hellwig <hch@lst.de> > + if (buffer_new(bh) && > + inode->i_mapping->nrpages) { Btw, it would be nice if the nrpages check could move into invalidate_inode_pages2_range instead of having to bother with it in the callers.
On Wed 28-09-16 10:20:34, Dave Chinner wrote: > On Tue, Sep 27, 2016 at 06:43:35PM +0200, Jan Kara wrote: > > @@ -1410,6 +1422,17 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > > if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED)) > > return -EIO; > > > > + /* > > + * Write can allocate block for an area which has a hole page mapped > > + * into page tables. We have to tear down these mappings so that data > > + * written by write(2) is visible in mmap. > > + */ > > + if (iomap->flags & IOMAP_F_NEW && inode->i_mapping->nrpages) { > > gcc should be throwing warnings about that: > > if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) { Actually the bitwise '&' takes precedense over the logical '&&' so the evaluation order ends up being correct. But I agree it's better to be explicit with parenthesis here. Fixed. Honza
On Fri 30-09-16 01:55:44, Christoph Hellwig wrote: > On Tue, Sep 27, 2016 at 06:43:35PM +0200, Jan Kara wrote: > > Currently each filesystem (possibly through generic_file_direct_write() > > or iomap_dax_rw()) takes care of invalidating page tables and evicting > > hole pages from the radix tree when write(2) to the file happens. This > > invalidation is only necessary when there is some block allocation > > resulting from write(2). Furthermore in current place the invalidation > > is racy wrt page fault instantiating a hole page just after we have > > invalidated it. > > > > So perform the page invalidation inside dax_do_io() where we can do it > > only when really necessary and after blocks have been allocated so > > nobody will be instantiating new hole pages anymore. > > > > Signed-off-by: Jan Kara <jack@suse.cz> > > This looks fine with the comment from Dave addressed: > > Reviewed-by: Christoph Hellwig <hch@lst.de> Thanks. > > + if (buffer_new(bh) && > > + inode->i_mapping->nrpages) { > > Btw, it would be nice if the nrpages check could move into > invalidate_inode_pages2_range instead of having to bother with it in > the callers. We cannot do that - someone can be possibly calling invalidate_inode_pages2_range() to invalidate exceptional entries in the given range (although I don't think there's currently any such caller). DAX code currently needs it only to invalidate hole pages so that's why it can do an optimization to call invalidate_inode_pages2_range() only when mapping->nrpages > 0. But in general it would be a trap for invalidate_inode_pages2_range() to work only if mapping->nrpages > 0... Honza
On Mon, Oct 03, 2016 at 04:58:02PM +0200, Jan Kara wrote: > On Wed 28-09-16 10:20:34, Dave Chinner wrote: > > On Tue, Sep 27, 2016 at 06:43:35PM +0200, Jan Kara wrote: > > > @@ -1410,6 +1422,17 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > > > if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED)) > > > return -EIO; > > > > > > + /* > > > + * Write can allocate block for an area which has a hole page mapped > > > + * into page tables. We have to tear down these mappings so that data > > > + * written by write(2) is visible in mmap. > > > + */ > > > + if (iomap->flags & IOMAP_F_NEW && inode->i_mapping->nrpages) { > > > > gcc should be throwing warnings about that: > > > > if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) { > > Actually the bitwise '&' takes precedense over the logical '&&' so the > evaluation order ends up being correct. Yes, I know that. However, my concern is that such expressions don't indicate the /intent/ of the author and so it can be difficult when reading the code to determine if the logic is correct or whether it is a typo and is wrong. In many cases, & and && will function identically for the tested cases, so neither cursory review or testing would catch something like this being wrong. > But I agree it's better to be > explicit with parenthesis here. Fixed. Yup, a little bit of "documentation" goes a long way :P Cheers, Dave.
diff --git a/fs/dax.c b/fs/dax.c index c8a639d2214e..2f69ca891aab 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -186,6 +186,18 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter, */ WARN_ON_ONCE(rw == WRITE && buffer_unwritten(bh)); + /* + * Write can allocate block for an area which + * has a hole page mapped into page tables. We + * have to tear down these mappings so that + * data written by write(2) is visible in mmap. + */ + if (buffer_new(bh) && + inode->i_mapping->nrpages) { + invalidate_inode_pages2_range( + inode->i_mapping, page, + (bh_max - 1) >> PAGE_SHIFT); + } } else { unsigned done = bh->b_size - (bh_max - (pos - first)); @@ -1410,6 +1422,17 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data, if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED)) return -EIO; + /* + * Write can allocate block for an area which has a hole page mapped + * into page tables. We have to tear down these mappings so that data + * written by write(2) is visible in mmap. + */ + if (iomap->flags & IOMAP_F_NEW && inode->i_mapping->nrpages) { + invalidate_inode_pages2_range(inode->i_mapping, + pos >> PAGE_SHIFT, + (end - 1) >> PAGE_SHIFT); + } + while (pos < end) { unsigned offset = pos & (PAGE_SIZE - 1); struct blk_dax_ctl dax = { 0 }; @@ -1469,23 +1492,6 @@ iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter, if (iov_iter_rw(iter) == WRITE) flags |= IOMAP_WRITE; - /* - * Yes, even DAX files can have page cache attached to them: A zeroed - * page is inserted into the pagecache when we have to serve a write - * fault on a hole. It should never be dirtied and can simply be - * dropped from the pagecache once we get real data for the page. - * - * XXX: This is racy against mmap, and there's nothing we can do about - * it. We'll eventually need to shift this down even further so that - * we can check if we allocated blocks over a hole first. - */ - if (mapping->nrpages) { - ret = invalidate_inode_pages2_range(mapping, - pos >> PAGE_SHIFT, - (pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT); - WARN_ON_ONCE(ret); - } - while (iov_iter_count(iter)) { ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops, iter, iomap_dax_actor);
Currently each filesystem (possibly through generic_file_direct_write() or iomap_dax_rw()) takes care of invalidating page tables and evicting hole pages from the radix tree when write(2) to the file happens. This invalidation is only necessary when there is some block allocation resulting from write(2). Furthermore in current place the invalidation is racy wrt page fault instantiating a hole page just after we have invalidated it. So perform the page invalidation inside dax_do_io() where we can do it only when really necessary and after blocks have been allocated so nobody will be instantiating new hole pages anymore. Signed-off-by: Jan Kara <jack@suse.cz> --- fs/dax.c | 40 +++++++++++++++++++++++----------------- 1 file changed, 23 insertions(+), 17 deletions(-)