diff mbox series

[2/2] fs: hugetlbfs: Fix logic to skip allocation on hit in page cache

Message ID f15f57def8e69cf288f0646f819b784fe15fabe2.1683069252.git.ackerleytng@google.com (mailing list archive)
State Mainlined, archived
Headers show
Series Fix fallocate error in hugetlbfs when fallocating again | expand

Commit Message

Ackerley Tng May 2, 2023, 11:56 p.m. UTC
When fallocate() is called twice on the same offset in the file, the
second fallocate() should succeed.

page_cache_next_miss() always advances index before returning, so even
on a page cache hit, the check would set present to false.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 fs/hugetlbfs/inode.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

Comments

Mike Kravetz May 3, 2023, 3:05 a.m. UTC | #1
On 05/02/23 23:56, Ackerley Tng wrote:
> When fallocate() is called twice on the same offset in the file, the
> second fallocate() should succeed.
> 
> page_cache_next_miss() always advances index before returning, so even
> on a page cache hit, the check would set present to false.

Thank you Ackerley for finding this!

When I read the description of page_cache_next_miss(), I assumed

	present = page_cache_next_miss(mapping, index, 1) != index;

would tell us if there was a page at index in the cache.

However, when looking closer at the code it does not check for a page
at index, but rather starts looking at index+1.  Perhaps that is why
it is named next?

Matthew, I think the use of the above statement was your suggestion.
And you know the xarray code better than anyone.  I just want to make
sure page_cache_next_miss is operating as designed/expected.  If so,
then the changes suggested here make sense.

In addition, the same code is in hugetlbfs_pagecache_present and will
have this same issue.
Mike Kravetz May 4, 2023, 12:14 a.m. UTC | #2
On 05/02/23 20:05, Mike Kravetz wrote:
> On 05/02/23 23:56, Ackerley Tng wrote:
> > When fallocate() is called twice on the same offset in the file, the
> > second fallocate() should succeed.
> > 
> > page_cache_next_miss() always advances index before returning, so even
> > on a page cache hit, the check would set present to false.
> 
> Thank you Ackerley for finding this!
> 
> When I read the description of page_cache_next_miss(), I assumed
> 
> 	present = page_cache_next_miss(mapping, index, 1) != index;
> 
> would tell us if there was a page at index in the cache.
> 
> However, when looking closer at the code it does not check for a page
> at index, but rather starts looking at index+1.  Perhaps that is why
> it is named next?
> 
> Matthew, I think the use of the above statement was your suggestion.
> And you know the xarray code better than anyone.  I just want to make
> sure page_cache_next_miss is operating as designed/expected.  If so,
> then the changes suggested here make sense.

I took a closer look at the code today.

page_cache_next_miss has a 'special case' for index 0.  The function
description says:

 * Return: The index of the gap if found, otherwise an index outside the
 * range specified (in which case 'return - index >= max_scan' will be true).
 * In the rare case of index wrap-around, 0 will be returned.

And, the loop in the routine does:

	while (max_scan--) {
		void *entry = xas_next(&xas);
		if (!entry || xa_is_value(entry))
			break;
		if (xas.xa_index == 0)
			break;
	}

At first glance, I thought xas_next always went to the next entry but
now see that is not the case here because this is a new state with
xa_node = XAS_RESTART.  So, xas_next is effectively a xas_load.

This means in the case were index == 0,

	page_cache_next_miss(mapping, index, 1)

will ALWAYS return zero even if a page is present.

I need to look at the xarray code and this rare index wrap-around case
to see if we can somehow modify that check for xas.xa_index == 0 in
page_cache_next_miss.
diff mbox series

Patch

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ecfdfb2529a3..f640cff1bbce 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -821,7 +821,6 @@  static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 */
 		struct folio *folio;
 		unsigned long addr;
-		bool present;
 
 		cond_resched();
 
@@ -845,10 +844,7 @@  static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 		/* See if already present in mapping to avoid alloc/free */
-		rcu_read_lock();
-		present = page_cache_next_miss(mapping, index, 1) != index;
-		rcu_read_unlock();
-		if (present) {
+		if (filemap_has_folio(mapping, index)) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			hugetlb_drop_vma_policy(&pseudo_vma);
 			continue;