diff mbox series

[v3,6/6] mm,thp: handle writes to file with THP in pagecache

Message ID 20190619062424.3486524-7-songliubraving@fb.com (mailing list archive)
State New, archived
Headers show
Series Enable THP for text section of non-shmem files | expand

Commit Message

Song Liu June 19, 2019, 6:24 a.m. UTC
In previous patch, an application could put part of its text section in
THP via madvise(). These THPs will be protected from writes when the
application is still running (TXTBSY). However, after the application
exits, the file is available for writes.

This patch briefly handles such writes by truncate the whole file in
sys_open(). A new counter nr_thps is added to struct address_space.
in truncate_pagecache(), if nr_thps is not zero, we force truncate of
the whole file.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 fs/inode.c         |  3 +++
 include/linux/fs.h | 31 +++++++++++++++++++++++++++++++
 mm/filemap.c       |  1 +
 mm/khugepaged.c    |  4 +++-
 mm/truncate.c      |  7 ++++++-
 5 files changed, 44 insertions(+), 2 deletions(-)

Comments

Rik van Riel June 20, 2019, 1:39 a.m. UTC | #1
On Tue, 2019-06-18 at 23:24 -0700, Song Liu wrote:

> index 8563339041f6..bab8d9eef46c 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -790,7 +790,11 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages2);
>  void truncate_pagecache(struct inode *inode, loff_t newsize)
>  {
>  	struct address_space *mapping = inode->i_mapping;
> -	loff_t holebegin = round_up(newsize, PAGE_SIZE);
> +	loff_t holebegin;
> +
> +	/* if non-shmem file has thp, truncate the whole file */
> +	if (filemap_nr_thps(mapping))
> +		newsize = 0;
>  

I don't get it. Sometimes truncate is used to
increase the size of a file, or to change it
to a non-zero size.

Won't forcing the newsize to zero break applications,
when the file is truncated to a different size than
they expect?
Song Liu June 20, 2019, 2:10 a.m. UTC | #2
> On Jun 19, 2019, at 6:39 PM, Rik van Riel <riel@fb.com> wrote:
> 
> On Tue, 2019-06-18 at 23:24 -0700, Song Liu wrote:
> 
>> index 8563339041f6..bab8d9eef46c 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -790,7 +790,11 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages2);
>> void truncate_pagecache(struct inode *inode, loff_t newsize)
>> {
>> 	struct address_space *mapping = inode->i_mapping;
>> -	loff_t holebegin = round_up(newsize, PAGE_SIZE);
>> +	loff_t holebegin;
>> +
>> +	/* if non-shmem file has thp, truncate the whole file */
>> +	if (filemap_nr_thps(mapping))
>> +		newsize = 0;
>> 
> 
> I don't get it. Sometimes truncate is used to
> increase the size of a file, or to change it
> to a non-zero size.
> 
> Won't forcing the newsize to zero break applications,
> when the file is truncated to a different size than
> they expect?

This is not truncate the file. It only drops page cache. 
truncate_setsize() will still set correct size. I don't 
think this breaks anything. 

We can probably make it smarter and only drop the clean
huge pages (dirty page should not exist). 

Thanks,
Song
Rik van Riel June 20, 2019, 1 p.m. UTC | #3
On Wed, 2019-06-19 at 22:10 -0400, Song Liu wrote:
> > On Jun 19, 2019, at 6:39 PM, Rik van Riel <riel@fb.com> wrote:
> > 
> > On Tue, 2019-06-18 at 23:24 -0700, Song Liu wrote:
> > 
> > > index 8563339041f6..bab8d9eef46c 100644
> > > --- a/mm/truncate.c
> > > +++ b/mm/truncate.c
> > > @@ -790,7 +790,11 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages2);
> > > void truncate_pagecache(struct inode *inode, loff_t newsize)
> > > {
> > > 	struct address_space *mapping = inode->i_mapping;
> > > -	loff_t holebegin = round_up(newsize, PAGE_SIZE);
> > > +	loff_t holebegin;
> > > +
> > > +	/* if non-shmem file has thp, truncate the whole file */
> > > +	if (filemap_nr_thps(mapping))
> > > +		newsize = 0;
> > > 
> > 
> > I don't get it. Sometimes truncate is used to
> > increase the size of a file, or to change it
> > to a non-zero size.
> > 
> > Won't forcing the newsize to zero break applications,
> > when the file is truncated to a different size than
> > they expect?
> 
> This is not truncate the file. It only drops page cache. 
> truncate_setsize() will still set correct size. I don't 
> think this breaks anything. 

Ahhh, indeed. Good point.

I wonder if the dropping of the page cache could be
done automatically from open(), if it determines that
there are no more readonly THP users of the file, and
the new opener wants to write to the file?

That magic could be in a helper function, so it would
be just a one line change in the same spot where it
currently denies the permission :)

> We can probably make it smarter and only drop the clean
> huge pages (dirty page should not exist).
William Kucharski June 20, 2019, 1:55 p.m. UTC | #4
> On Jun 19, 2019, at 8:10 PM, Song Liu <songliubraving@fb.com> wrote:
> 
> This is not truncate the file. It only drops page cache. 
> truncate_setsize() will still set correct size. I don't 
> think this breaks anything. 
> 
> We can probably make it smarter and only drop the clean
> huge pages (dirty page should not exist). 

It sounds like I will need this change for my THP work as well for the
same reason; once a RO THP text page is in the page cache, if the file is
marked writable strange things will occur.
diff mbox series

Patch

diff --git a/fs/inode.c b/fs/inode.c
index df6542ec3b88..518113a4e219 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -181,6 +181,9 @@  int inode_init_always(struct super_block *sb, struct inode *inode)
 	mapping->flags = 0;
 	mapping->wb_err = 0;
 	atomic_set(&mapping->i_mmap_writable, 0);
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+	atomic_set(&mapping->nr_thps, 0);
+#endif
 	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
 	mapping->private_data = NULL;
 	mapping->writeback_index = 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f7fdfe93e25d..3edf4ee42eee 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -444,6 +444,10 @@  struct address_space {
 	struct xarray		i_pages;
 	gfp_t			gfp_mask;
 	atomic_t		i_mmap_writable;
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+	/* number of thp, only for non-shmem files */
+	atomic_t		nr_thps;
+#endif
 	struct rb_root_cached	i_mmap;
 	struct rw_semaphore	i_mmap_rwsem;
 	unsigned long		nrpages;
@@ -2790,6 +2794,33 @@  static inline errseq_t filemap_sample_wb_err(struct address_space *mapping)
 	return errseq_sample(&mapping->wb_err);
 }
 
+static inline int filemap_nr_thps(struct address_space *mapping)
+{
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+	return atomic_read(&mapping->nr_thps);
+#else
+	return 0;
+#endif
+}
+
+static inline void filemap_nr_thps_inc(struct address_space *mapping)
+{
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+	atomic_inc(&mapping->nr_thps);
+#else
+	WARN_ON_ONCE(1);
+#endif
+}
+
+static inline void filemap_nr_thps_dec(struct address_space *mapping)
+{
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+	atomic_dec(&mapping->nr_thps);
+#else
+	WARN_ON_ONCE(1);
+#endif
+}
+
 extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end,
 			   int datasync);
 extern int vfs_fsync(struct file *file, int datasync);
diff --git a/mm/filemap.c b/mm/filemap.c
index e79ceccdc6df..a8e86c136381 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -205,6 +205,7 @@  static void unaccount_page_cache_page(struct address_space *mapping,
 			__dec_node_page_state(page, NR_SHMEM_THPS);
 	} else if (PageTransHuge(page)) {
 		__dec_node_page_state(page, NR_FILE_THPS);
+		filemap_nr_thps_dec(mapping);
 	}
 
 	/*
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fbcff5a1d65a..17ebe9da56ce 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1500,8 +1500,10 @@  static void collapse_file(struct vm_area_struct *vma,
 
 	if (is_shmem)
 		__inc_node_page_state(new_page, NR_SHMEM_THPS);
-	else
+	else {
 		__inc_node_page_state(new_page, NR_FILE_THPS);
+		filemap_nr_thps_inc(mapping);
+	}
 
 	if (nr_none) {
 		struct zone *zone = page_zone(new_page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 8563339041f6..bab8d9eef46c 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -790,7 +790,11 @@  EXPORT_SYMBOL_GPL(invalidate_inode_pages2);
 void truncate_pagecache(struct inode *inode, loff_t newsize)
 {
 	struct address_space *mapping = inode->i_mapping;
-	loff_t holebegin = round_up(newsize, PAGE_SIZE);
+	loff_t holebegin;
+
+	/* if non-shmem file has thp, truncate the whole file */
+	if (filemap_nr_thps(mapping))
+		newsize = 0;
 
 	/*
 	 * unmap_mapping_range is called twice, first simply for
@@ -801,6 +805,7 @@  void truncate_pagecache(struct inode *inode, loff_t newsize)
 	 * truncate_inode_pages finishes, hence the second
 	 * unmap_mapping_range call must be made for correctness.
 	 */
+	holebegin = round_up(newsize, PAGE_SIZE);
 	unmap_mapping_range(mapping, holebegin, 0, 1);
 	truncate_inode_pages(mapping, newsize);
 	unmap_mapping_range(mapping, holebegin, 0, 1);