Message ID | 20190809225833.6657-7-ira.weiny@intel.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | RDMA/FS DAX truncate proposal V1,000,002 ;-) | expand |
On Fri, Aug 09, 2019 at 03:58:20PM -0700, ira.weiny@intel.com wrote: > From: Ira Weiny <ira.weiny@intel.com> > > Callers of dax_layout_busy_page() are only rarely operating on the > entire file of concern. > > Teach dax_layout_busy_page() to operate on a sub-range of the > address_space provided. Specifying 0 - ULONG_MAX however, will continue > to operate on the "entire file" and XFS is split out to a separate patch > by this method. > > This could potentially speed up dax_layout_busy_page() as well. I need this functionality as well for virtio_fs and posted a patch for this. https://lkml.org/lkml/2019/8/21/825 Given this is an optimization which existing users can benefit from already, this patch could probably be pushed upstream independently. > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > > --- > Changes from RFC v1 > Fix 0-day build errors > > fs/dax.c | 15 +++++++++++---- > fs/ext4/ext4.h | 2 +- > fs/ext4/extents.c | 6 +++--- > fs/ext4/inode.c | 19 ++++++++++++------- > fs/xfs/xfs_file.c | 3 ++- > include/linux/dax.h | 6 ++++-- > 6 files changed, 33 insertions(+), 18 deletions(-) > > diff --git a/fs/dax.c b/fs/dax.c > index a14ec32255d8..3ad19c384454 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -573,8 +573,11 @@ bool dax_mapping_is_dax(struct address_space *mapping) > EXPORT_SYMBOL_GPL(dax_mapping_is_dax); > > /** > - * dax_layout_busy_page - find first pinned page in @mapping > + * dax_layout_busy_page - find first pinned page in @mapping within > + * the range @off - @off + @len > * @mapping: address space to scan for a page with ref count > 1 > + * @off: offset to start at > + * @len: length to scan through > * > * DAX requires ZONE_DEVICE mapped pages. These pages are never > * 'onlined' to the page allocator so they are considered idle when > @@ -587,9 +590,13 @@ EXPORT_SYMBOL_GPL(dax_mapping_is_dax); > * to be able to run unmap_mapping_range() and subsequently not race > * mapping_mapped() becoming true. > */ > -struct page *dax_layout_busy_page(struct address_space *mapping) > +struct page *dax_layout_busy_page(struct address_space *mapping, > + loff_t off, loff_t len) > { > - XA_STATE(xas, &mapping->i_pages, 0); > + unsigned long start_idx = off >> PAGE_SHIFT; > + unsigned long end_idx = (len == ULONG_MAX) ? ULONG_MAX > + : start_idx + (len >> PAGE_SHIFT); > + XA_STATE(xas, &mapping->i_pages, start_idx); > void *entry; > unsigned int scanned = 0; > struct page *page = NULL; > @@ -612,7 +619,7 @@ struct page *dax_layout_busy_page(struct address_space *mapping) > unmap_mapping_range(mapping, 0, 0, 1); Should we unmap only those pages which fall in the range specified by caller. Unmapping whole file seems to be less efficient. Thanks Vivek
On Fri, Aug 23, 2019 at 11:18:26AM -0400, Vivek Goyal wrote: > On Fri, Aug 09, 2019 at 03:58:20PM -0700, ira.weiny@intel.com wrote: > > From: Ira Weiny <ira.weiny@intel.com> > > > > Callers of dax_layout_busy_page() are only rarely operating on the > > entire file of concern. > > > > Teach dax_layout_busy_page() to operate on a sub-range of the > > address_space provided. Specifying 0 - ULONG_MAX however, will continue > > to operate on the "entire file" and XFS is split out to a separate patch > > by this method. > > > > This could potentially speed up dax_layout_busy_page() as well. > > I need this functionality as well for virtio_fs and posted a patch for > this. > > https://lkml.org/lkml/2019/8/21/825 > > Given this is an optimization which existing users can benefit from already, > this patch could probably be pushed upstream independently. I'm ok with that. However, this patch does not apply cleanly to head as I had some other additions to dax.h. > > > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > > > > --- > > Changes from RFC v1 > > Fix 0-day build errors > > > > fs/dax.c | 15 +++++++++++---- > > fs/ext4/ext4.h | 2 +- > > fs/ext4/extents.c | 6 +++--- > > fs/ext4/inode.c | 19 ++++++++++++------- > > fs/xfs/xfs_file.c | 3 ++- > > include/linux/dax.h | 6 ++++-- > > 6 files changed, 33 insertions(+), 18 deletions(-) > > > > diff --git a/fs/dax.c b/fs/dax.c > > index a14ec32255d8..3ad19c384454 100644 > > --- a/fs/dax.c > > +++ b/fs/dax.c > > @@ -573,8 +573,11 @@ bool dax_mapping_is_dax(struct address_space *mapping) > > EXPORT_SYMBOL_GPL(dax_mapping_is_dax); > > > > /** > > - * dax_layout_busy_page - find first pinned page in @mapping > > + * dax_layout_busy_page - find first pinned page in @mapping within > > + * the range @off - @off + @len > > * @mapping: address space to scan for a page with ref count > 1 > > + * @off: offset to start at > > + * @len: length to scan through > > * > > * DAX requires ZONE_DEVICE mapped pages. These pages are never > > * 'onlined' to the page allocator so they are considered idle when > > @@ -587,9 +590,13 @@ EXPORT_SYMBOL_GPL(dax_mapping_is_dax); > > * to be able to run unmap_mapping_range() and subsequently not race > > * mapping_mapped() becoming true. > > */ > > -struct page *dax_layout_busy_page(struct address_space *mapping) > > +struct page *dax_layout_busy_page(struct address_space *mapping, > > + loff_t off, loff_t len) > > { > > - XA_STATE(xas, &mapping->i_pages, 0); > > + unsigned long start_idx = off >> PAGE_SHIFT; > > + unsigned long end_idx = (len == ULONG_MAX) ? ULONG_MAX > > + : start_idx + (len >> PAGE_SHIFT); > > + XA_STATE(xas, &mapping->i_pages, start_idx); > > void *entry; > > unsigned int scanned = 0; > > struct page *page = NULL; > > @@ -612,7 +619,7 @@ struct page *dax_layout_busy_page(struct address_space *mapping) > > unmap_mapping_range(mapping, 0, 0, 1); > > Should we unmap only those pages which fall in the range specified by caller. > Unmapping whole file seems to be less efficient. Seems reasonable to me. I was focused on getting pages which were busy not necessarily on what got unmapped. So I did not consider this. Thanks for the suggestion. However, I don't understand the math you do for length? Is this comment/code correct? + /* length is being calculated from lstart and not start. + * This is due to behavior of unmap_mapping_range(). If + * start is say 4094 and end is on 4093 then want to + * unamp two pages, idx 0 and 1. But unmap_mapping_range() + * will unmap only page at idx 0. If we calculate len + * from the rounded down start, this problem should not + * happen. + */ + len = end - lstart + 1; How can end (4093) be < start (4094)? Is that valid? And why would a start of 4094 unmap idx 0? Ira
diff --git a/fs/dax.c b/fs/dax.c index a14ec32255d8..3ad19c384454 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -573,8 +573,11 @@ bool dax_mapping_is_dax(struct address_space *mapping) EXPORT_SYMBOL_GPL(dax_mapping_is_dax); /** - * dax_layout_busy_page - find first pinned page in @mapping + * dax_layout_busy_page - find first pinned page in @mapping within + * the range @off - @off + @len * @mapping: address space to scan for a page with ref count > 1 + * @off: offset to start at + * @len: length to scan through * * DAX requires ZONE_DEVICE mapped pages. These pages are never * 'onlined' to the page allocator so they are considered idle when @@ -587,9 +590,13 @@ EXPORT_SYMBOL_GPL(dax_mapping_is_dax); * to be able to run unmap_mapping_range() and subsequently not race * mapping_mapped() becoming true. */ -struct page *dax_layout_busy_page(struct address_space *mapping) +struct page *dax_layout_busy_page(struct address_space *mapping, + loff_t off, loff_t len) { - XA_STATE(xas, &mapping->i_pages, 0); + unsigned long start_idx = off >> PAGE_SHIFT; + unsigned long end_idx = (len == ULONG_MAX) ? ULONG_MAX + : start_idx + (len >> PAGE_SHIFT); + XA_STATE(xas, &mapping->i_pages, start_idx); void *entry; unsigned int scanned = 0; struct page *page = NULL; @@ -612,7 +619,7 @@ struct page *dax_layout_busy_page(struct address_space *mapping) unmap_mapping_range(mapping, 0, 0, 1); xas_lock_irq(&xas); - xas_for_each(&xas, entry, ULONG_MAX) { + xas_for_each(&xas, entry, end_idx) { if (WARN_ON_ONCE(!xa_is_value(entry))) continue; if (unlikely(dax_is_locked(entry))) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 9c7f4036021b..32738ccdac1d 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2578,7 +2578,7 @@ extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *); extern int ext4_inode_attach_jinode(struct inode *inode); extern int ext4_can_truncate(struct inode *inode); extern int ext4_truncate(struct inode *); -extern int ext4_break_layouts(struct inode *); +extern int ext4_break_layouts(struct inode *inode, loff_t offset, loff_t len); extern int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length); extern int ext4_truncate_restart_trans(handle_t *, struct inode *, int nblocks); extern void ext4_set_inode_flags(struct inode *); diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 92266a2da7d6..ded4b1d92299 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4736,7 +4736,7 @@ static long ext4_zero_range(struct file *file, loff_t offset, */ down_write(&EXT4_I(inode)->i_mmap_sem); - ret = ext4_break_layouts(inode); + ret = ext4_break_layouts(inode, offset, len); if (ret) { up_write(&EXT4_I(inode)->i_mmap_sem); goto out_mutex; @@ -5419,7 +5419,7 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) */ down_write(&EXT4_I(inode)->i_mmap_sem); - ret = ext4_break_layouts(inode); + ret = ext4_break_layouts(inode, offset, len); if (ret) goto out_mmap; @@ -5572,7 +5572,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len) */ down_write(&EXT4_I(inode)->i_mmap_sem); - ret = ext4_break_layouts(inode); + ret = ext4_break_layouts(inode, offset, len); if (ret) goto out_mmap; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f08f48de52c5..d3fc6035428c 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4262,7 +4262,7 @@ static void ext4_wait_dax_page(struct ext4_inode_info *ei) down_write(&ei->i_mmap_sem); } -int ext4_break_layouts(struct inode *inode) +int ext4_break_layouts(struct inode *inode, loff_t offset, loff_t len) { struct ext4_inode_info *ei = EXT4_I(inode); struct page *page; @@ -4279,7 +4279,7 @@ int ext4_break_layouts(struct inode *inode) } do { - page = dax_layout_busy_page(inode->i_mapping); + page = dax_layout_busy_page(inode->i_mapping, offset, len); if (!page) return 0; @@ -4366,7 +4366,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) */ down_write(&EXT4_I(inode)->i_mmap_sem); - ret = ext4_break_layouts(inode); + ret = ext4_break_layouts(inode, offset, length); if (ret) goto out_dio; @@ -5657,10 +5657,15 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) down_write(&EXT4_I(inode)->i_mmap_sem); - rc = ext4_break_layouts(inode); - if (rc) { - up_write(&EXT4_I(inode)->i_mmap_sem); - return rc; + if (shrink) { + loff_t off = attr->ia_size; + loff_t len = inode->i_size - attr->ia_size; + + rc = ext4_break_layouts(inode, off, len); + if (rc) { + up_write(&EXT4_I(inode)->i_mmap_sem); + return rc; + } } if (attr->ia_size != inode->i_size) { diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 28101bbc0b78..8f8d478f9ec6 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -740,7 +740,8 @@ xfs_break_dax_layouts( ASSERT(xfs_isilocked(XFS_I(inode), XFS_MMAPLOCK_EXCL)); - page = dax_layout_busy_page(inode->i_mapping); + /* We default to the "whole file" */ + page = dax_layout_busy_page(inode->i_mapping, 0, ULONG_MAX); if (!page) return 0; diff --git a/include/linux/dax.h b/include/linux/dax.h index da0768b34b48..f34616979e45 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -144,7 +144,8 @@ int dax_writeback_mapping_range(struct address_space *mapping, struct block_device *bdev, struct writeback_control *wbc); bool dax_mapping_is_dax(struct address_space *mapping); -struct page *dax_layout_busy_page(struct address_space *mapping); +struct page *dax_layout_busy_page(struct address_space *mapping, + loff_t off, loff_t len); dax_entry_t dax_lock_page(struct page *page); void dax_unlock_page(struct page *page, dax_entry_t cookie); #else @@ -180,7 +181,8 @@ static inline bool dax_mapping_is_dax(struct address_space *mapping) return false; } -static inline struct page *dax_layout_busy_page(struct address_space *mapping) +static inline struct page *dax_layout_busy_page(struct address_space *mapping, + loff_t off, loff_t len) { return NULL; }