Message ID | f15f57def8e69cf288f0646f819b784fe15fabe2.1683069252.git.ackerleytng@google.com (mailing list archive) |
---|---|
State | Mainlined, archived |
Headers | show |
Series | Fix fallocate error in hugetlbfs when fallocating again | expand |
On 05/02/23 23:56, Ackerley Tng wrote: > When fallocate() is called twice on the same offset in the file, the > second fallocate() should succeed. > > page_cache_next_miss() always advances index before returning, so even > on a page cache hit, the check would set present to false. Thank you Ackerley for finding this! When I read the description of page_cache_next_miss(), I assumed present = page_cache_next_miss(mapping, index, 1) != index; would tell us if there was a page at index in the cache. However, when looking closer at the code it does not check for a page at index, but rather starts looking at index+1. Perhaps that is why it is named next? Matthew, I think the use of the above statement was your suggestion. And you know the xarray code better than anyone. I just want to make sure page_cache_next_miss is operating as designed/expected. If so, then the changes suggested here make sense. In addition, the same code is in hugetlbfs_pagecache_present and will have this same issue.
On 05/02/23 20:05, Mike Kravetz wrote: > On 05/02/23 23:56, Ackerley Tng wrote: > > When fallocate() is called twice on the same offset in the file, the > > second fallocate() should succeed. > > > > page_cache_next_miss() always advances index before returning, so even > > on a page cache hit, the check would set present to false. > > Thank you Ackerley for finding this! > > When I read the description of page_cache_next_miss(), I assumed > > present = page_cache_next_miss(mapping, index, 1) != index; > > would tell us if there was a page at index in the cache. > > However, when looking closer at the code it does not check for a page > at index, but rather starts looking at index+1. Perhaps that is why > it is named next? > > Matthew, I think the use of the above statement was your suggestion. > And you know the xarray code better than anyone. I just want to make > sure page_cache_next_miss is operating as designed/expected. If so, > then the changes suggested here make sense. I took a closer look at the code today. page_cache_next_miss has a 'special case' for index 0. The function description says: * Return: The index of the gap if found, otherwise an index outside the * range specified (in which case 'return - index >= max_scan' will be true). * In the rare case of index wrap-around, 0 will be returned. And, the loop in the routine does: while (max_scan--) { void *entry = xas_next(&xas); if (!entry || xa_is_value(entry)) break; if (xas.xa_index == 0) break; } At first glance, I thought xas_next always went to the next entry but now see that is not the case here because this is a new state with xa_node = XAS_RESTART. So, xas_next is effectively a xas_load. This means in the case were index == 0, page_cache_next_miss(mapping, index, 1) will ALWAYS return zero even if a page is present. I need to look at the xarray code and this rare index wrap-around case to see if we can somehow modify that check for xas.xa_index == 0 in page_cache_next_miss.
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index ecfdfb2529a3..f640cff1bbce 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -821,7 +821,6 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, */ struct folio *folio; unsigned long addr; - bool present; cond_resched(); @@ -845,10 +844,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, mutex_lock(&hugetlb_fault_mutex_table[hash]); /* See if already present in mapping to avoid alloc/free */ - rcu_read_lock(); - present = page_cache_next_miss(mapping, index, 1) != index; - rcu_read_unlock(); - if (present) { + if (filemap_has_folio(mapping, index)) { mutex_unlock(&hugetlb_fault_mutex_table[hash]); hugetlb_drop_vma_policy(&pseudo_vma); continue;
When fallocate() is called twice on the same offset in the file, the second fallocate() should succeed. page_cache_next_miss() always advances index before returning, so even on a page cache hit, the check would set present to false. Signed-off-by: Ackerley Tng <ackerleytng@google.com> --- fs/hugetlbfs/inode.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)