mm/hugetlb: revert use of page_cache_next_miss()

Message ID 20230505185301.534259-1-sidhartha.kumar@oracle.com (mailing list archive)
State New

Commit Message

Sidhartha Kumar May 5, 2023, 6:53 p.m. UTC
As reported by Ackerley[1], the use of page_cache_next_miss() in
hugetlbfs_fallocate() introduces a bug where a second fallocate() call to
the same offset fails with -EEXIST. Revert this change and go back to the
previous method of getting the folio from the page cache and then dropping
the reference on success.

hugetlbfs_pagecache_present() was also refactored to use
page_cache_next_miss(); revert the usage there as well.

User-visible impacts include hugetlb fallocate incorrectly returning
EEXIST if pages are already present in the file. In addition, hugetlb
pages will not be included in core dumps if they need to be brought in via
GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages
already present in the cache. It may try to allocate a new page and
potentially return ENOMEM as opposed to EEXIST.

Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
Cc: <stable@vger.kernel.org> #v6.3+
Reported-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>

[1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
---
This patch is meant to fix stable v6.3.1 as safely as possible by doing a
simple revert.

The patch "page cache: fix page_cache_next/prev_miss off by one" by Mike is
a potential fix that will allow the use of page_cache_next_miss() and is
awaiting review.

The patch "Fix fallocate error in hugetlbfs when fallocating again" by
Ackerley is another fix, but it introduces a new function and is also
awaiting review.

 fs/hugetlbfs/inode.c |  8 +++-----
 mm/hugetlb.c         | 11 +++++------
 2 files changed, 8 insertions(+), 11 deletions(-)

Comments

Mike Kravetz May 5, 2023, 9:58 p.m. UTC | #1
On 05/05/23 11:53, Sidhartha Kumar wrote:
> As reported by Ackerley[1], the use of page_cache_next_miss() in
> hugetlbfs_fallocate() introduces a bug where a second fallocate() call to
> same offset fails with -EEXIST. Revert this change and go back to the
> previous method of using get from the page cache and then dropping the
> reference on success.
> 
> hugetlbfs_pagecache_present() was also refactored to use
> page_cache_next_miss(), revert the usage there as well.
> 
> User visible impacts include hugetlb fallocate incorrectly returning
> EEXIST if pages are already present in the file. In addition, hugetlb
> pages will not be included in core dumps if they need to be brought in via
> GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages
> already present in the cache. It may try to allocate a new page and
> potentially return ENOMEM as opposed to EEXIST.
> 
> Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")

Small nit and a question for people more familiar with stable backports.

d0ce0e47b323 added the usage of page_cache_next_miss to hugetlb fallocate.
91a2fb956ad99 added the usage to hugetlbfs_pagecache_present.  Both are
in v6.3 and d0ce0e47b323 (referenced here) comes later.  So, I 'think' it
is OK to fix both instances with a single patch and reference the commit
where both are present.  Or should there be two patches, which would be more
technically correct?

> Cc: <stable@vger.kernel.org> #v6.3+
> Reported-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> 
> [1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
> ---
> This patch is meant to fix stable v6.3.1 as safe as possible by doing a
> simple revert.
> 
> Patch page cache: fix page_cache_next/prev_miss off by one by Mike is a
> potential fix that will allow the use of page_cache_next_miss() and is
> awaiting review.
> 
> Patch Fix fallocate error in hugetlbfs when fallocating again by Ackerley
> is another fix but introduces a new function and is also awaiting review.
> 
>  fs/hugetlbfs/inode.c |  8 +++-----
>  mm/hugetlb.c         | 11 +++++------
>  2 files changed, 8 insertions(+), 11 deletions(-)

IMO, this is the safest and simplest way of fixing v6.3.  My proposed changes
to page_cache_next/prev_miss have the potential to impact readahead, so they
really need review/testing by someone more familiar with that.  If a fix is
urgently needed, I would suggest using this for backport and then either
use my patch or expand Ackerley's proposal to move forward.

As a backport to stable,
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Mike Kravetz May 16, 2023, 11:12 p.m. UTC | #2
On 05/05/23 14:58, Mike Kravetz wrote:
> IMO, this is safest and simplest way of fixing v6.3.  My proposed changes to
> page_cache_next/prev_miss have the potential to impact readahead, so really
> need review/testing by someone more familiar with that.  If a fix is
> urgently needed, I would suggest using this for backport and then either
> use my patch or expand Ackerley's proposal to move forward.
> 
> As a backport to stable,
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> -- 
> Mike Kravetz

Any objection to using this patch to fix v6.3 while we decide what is the best
way to move forward?

kernel test robot May 23, 2023, 5 a.m. UTC | #3
Hello,

kernel test robot noticed "BUG:KASAN:null-ptr-deref_in_hugetlbfs_fallocate" on:

commit: 1f944358dbb5e9a6493fd7b1f77ee64376d2bdf1 ("[PATCH] mm/hugetlb: revert use of page_cache_next_miss()")
url: https://github.com/intel-lab-lkp/linux/commits/Sidhartha-Kumar/mm-hugetlb-revert-use-of-page_cache_next_miss/20230506-025434
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 78b421b6a7c6dbb6a213877c742af52330f5026d
patch link: https://lore.kernel.org/all/20230505185301.534259-1-sidhartha.kumar@oracle.com/
patch subject: [PATCH] mm/hugetlb: revert use of page_cache_next_miss()

in testcase: trinity
version: trinity-x86_64-abe9de86-1_20230501
with following parameters:

	runtime: 600s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


compiler: clang-14
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202305231207.35d53791-oliver.sang@intel.com


[ 144.098719][ T1547] BUG: KASAN: null-ptr-deref in hugetlbfs_fallocate (inode.c:?) 
[  144.099404][ T1547] Read of size 4 at addr 0000000000000032 by task trinity-c1/1547
[  144.100071][ T1547]
[  144.100282][ T1547] CPU: 0 PID: 1547 Comm: trinity-c1 Not tainted 6.3.0-13165-g1f944358dbb5 #1 1f0cfaa9708c3e99bb7e2ecf8f7fd22c51fc3e3b
[  144.101310][ T1547] Call Trace:
[  144.101602][ T1547]  <TASK>
[ 144.101858][ T1547] dump_stack_lvl (??:?) 
[ 144.102269][ T1547] print_report (report.c:?) 
[ 144.102655][ T1547] ? start_report (report.c:?) 
[ 144.103044][ T1547] ? hugetlbfs_fallocate (inode.c:?) 
[ 144.103497][ T1547] ? hugetlbfs_fallocate (inode.c:?) 
[ 144.103937][ T1547] kasan_report (??:?) 
[ 144.104270][ T1547] ? filemap_get_entry (??:?) 
[ 144.104656][ T1547] ? hugetlbfs_fallocate (inode.c:?) 
[ 144.105082][ T1547] kasan_check_range (??:?) 
[ 144.105503][ T1547] hugetlbfs_fallocate (inode.c:?) 
[ 144.105921][ T1547] vfs_fallocate (??:?) 
[ 144.106317][ T1547] ksys_fallocate (??:?) 
[ 144.106702][ T1547] __x64_sys_fallocate (??:?) 
[ 144.107121][ T1547] do_syscall_64 (??:?) 
[ 144.107521][ T1547] entry_SYSCALL_64_after_hwframe (??:?) 
[  144.108022][ T1547] RIP: 0033:0x7fedb9a039b9
[ 144.108398][ T1547] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
All code
========
   0:	00 c3                	add    %al,%bl
   2:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
   9:	00 00 00 
   c:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  11:	48 89 f8             	mov    %rdi,%rax
  14:	48 89 f7             	mov    %rsi,%rdi
  17:	48 89 d6             	mov    %rdx,%rsi
  1a:	48 89 ca             	mov    %rcx,%rdx
  1d:	4d 89 c2             	mov    %r8,%r10
  20:	4d 89 c8             	mov    %r9,%r8
  23:	4c 8b 4c 24 08       	mov    0x8(%rsp),%r9
  28:	0f 05                	syscall 
  2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
  30:	73 01                	jae    0x33
  32:	c3                   	retq   
  33:	48 8b 0d a7 54 0c 00 	mov    0xc54a7(%rip),%rcx        # 0xc54e1
  3a:	f7 d8                	neg    %eax
  3c:	64 89 01             	mov    %eax,%fs:(%rcx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
   6:	73 01                	jae    0x9
   8:	c3                   	retq   
   9:	48 8b 0d a7 54 0c 00 	mov    0xc54a7(%rip),%rcx        # 0xc54b7
  10:	f7 d8                	neg    %eax
  12:	64 89 01             	mov    %eax,%fs:(%rcx)
  15:	48                   	rex.W
[  144.109953][ T1547] RSP: 002b:00007ffdf492f6a8 EFLAGS: 00000246 ORIG_RAX: 000000000000011d
[  144.110612][ T1547] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fedb9a039b9
[  144.111233][ T1547] RDX: 0000000000000008 RSI: 0000000000000000 RDI: 000000000000011a
[  144.111870][ T1547] RBP: 00007fedb839a000 R08: 0000000000000020 R09: 0000000000000090
[  144.112514][ T1547] R10: 0000000000000800 R11: 0000000000000246 R12: 000000000000011d
[  144.113168][ T1547] R13: 00007fedb9ad1580 R14: 00007fedb839a058 R15: 00007fedb839a000
[  144.113814][ T1547]  </TASK>
[  144.114073][ T1547] ==================================================================
[  144.114752][ T1547] Disabling lock debugging due to kernel taint
[  144.115284][ T1547] general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] KASAN
[  144.116161][ T1547] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
[  144.116830][ T1547] CPU: 0 PID: 1547 Comm: trinity-c1 Tainted: G    B              6.3.0-13165-g1f944358dbb5 #1 1f0cfaa9708c3e99bb7e2ecf8f7fd22c51fc3e3b
[ 144.117939][ T1547] RIP: 0010:hugetlbfs_fallocate (inode.c:?) 
[ 144.118431][ T1547] Code: 84 9c 00 00 00 48 89 c5 48 8d 58 34 48 89 df be 04 00 00 00 e8 d5 83 ca ff 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <8a> 04 08 84 c0 0f 85 d8 01 00 00 83 3b 00 0f 84 3a 07 00 00 48 89
All code
========
   0:	84 9c 00 00 00 48 89 	test   %bl,-0x76b80000(%rax,%rax,1)
   7:	c5 48 8d             	(bad)  
   a:	58                   	pop    %rax
   b:	34 48                	xor    $0x48,%al
   d:	89 df                	mov    %ebx,%edi
   f:	be 04 00 00 00       	mov    $0x4,%esi
  14:	e8 d5 83 ca ff       	callq  0xffffffffffca83ee
  19:	48 89 d8             	mov    %rbx,%rax
  1c:	48 c1 e8 03          	shr    $0x3,%rax
  20:	48 b9 00 00 00 00 00 	movabs $0xdffffc0000000000,%rcx
  27:	fc ff df 
  2a:*	8a 04 08             	mov    (%rax,%rcx,1),%al		<-- trapping instruction
  2d:	84 c0                	test   %al,%al
  2f:	0f 85 d8 01 00 00    	jne    0x20d
  35:	83 3b 00             	cmpl   $0x0,(%rbx)
  38:	0f 84 3a 07 00 00    	je     0x778
  3e:	48                   	rex.W
  3f:	89                   	.byte 0x89

Code starting with the faulting instruction
===========================================
   0:	8a 04 08             	mov    (%rax,%rcx,1),%al
   3:	84 c0                	test   %al,%al
   5:	0f 85 d8 01 00 00    	jne    0x1e3
   b:	83 3b 00             	cmpl   $0x0,(%rbx)
   e:	0f 84 3a 07 00 00    	je     0x74e
  14:	48                   	rex.W
  15:	89                   	.byte 0x89
[  144.120027][ T1547] RSP: 0018:ffff88812ba3fd48 EFLAGS: 00010206
[  144.120545][ T1547] RAX: 0000000000000006 RBX: 0000000000000032 RCX: dffffc0000000000
[  144.121198][ T1547] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff8a927100
[  144.121864][ T1547] RBP: fffffffffffffffe R08: dffffc0000000000 R09: fffffbfff1524e21
[  144.122535][ T1547] R10: 0000000000000000 R11: dffff7fff1524e22 R12: 0000000000000000
[  144.123214][ T1547] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000fffffffc
[  144.123947][ T1547] FS:  00007fedb9ad1600(0000) GS:ffffffff87f0a000(0000) knlGS:0000000000000000
[  144.124701][ T1547] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  144.125263][ T1547] CR2: 00007fedb95005fc CR3: 000000012dfd0000 CR4: 00000000000406f0
[  144.125925][ T1547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  144.126601][ T1547] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  144.127277][ T1547] Call Trace:
[  144.127584][ T1547]  <TASK>
[ 144.127848][ T1547] vfs_fallocate (??:?) 
[ 144.128251][ T1547] ksys_fallocate (??:?) 
[ 144.128646][ T1547] __x64_sys_fallocate (??:?) 
[ 144.129072][ T1547] do_syscall_64 (??:?) 
[ 144.129460][ T1547] entry_SYSCALL_64_after_hwframe (??:?) 
[  144.129972][ T1547] RIP: 0033:0x7fedb9a039b9
[ 144.130359][ T1547] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
All code
========
   0:	00 c3                	add    %al,%bl
   2:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
   9:	00 00 00 
   c:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  11:	48 89 f8             	mov    %rdi,%rax
  14:	48 89 f7             	mov    %rsi,%rdi
  17:	48 89 d6             	mov    %rdx,%rsi
  1a:	48 89 ca             	mov    %rcx,%rdx
  1d:	4d 89 c2             	mov    %r8,%r10
  20:	4d 89 c8             	mov    %r9,%r8
  23:	4c 8b 4c 24 08       	mov    0x8(%rsp),%r9
  28:	0f 05                	syscall 
  2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
  30:	73 01                	jae    0x33
  32:	c3                   	retq   
  33:	48 8b 0d a7 54 0c 00 	mov    0xc54a7(%rip),%rcx        # 0xc54e1
  3a:	f7 d8                	neg    %eax
  3c:	64 89 01             	mov    %eax,%fs:(%rcx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
   6:	73 01                	jae    0x9
   8:	c3                   	retq   
   9:	48 8b 0d a7 54 0c 00 	mov    0xc54a7(%rip),%rcx        # 0xc54b7
  10:	f7 d8                	neg    %eax
  12:	64 89 01             	mov    %eax,%fs:(%rcx)
  15:	48                   	rex.W


To reproduce:

        # build kernel
	cd linux
	cp config-6.3.0-13165-g1f944358dbb5 .config
	make HOSTCC=clang-14 CC=clang-14 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
	make HOSTCC=clang-14 CC=clang-14 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
	cd <mod-install-dir>
	find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

Sidhartha Kumar May 23, 2023, 6:14 a.m. UTC | #4
On 5/22/23 10:00 PM, kernel test robot wrote:
> 
> Hello,
> 
> kernel test robot noticed "BUG:KASAN:null-ptr-deref_in_hugetlbfs_fallocate" on:
> 
> commit: 1f944358dbb5e9a6493fd7b1f77ee64376d2bdf1 ("[PATCH] mm/hugetlb: revert use of page_cache_next_miss()")
> url: https://github.com/intel-lab-lkp/linux/commits/Sidhartha-Kumar/mm-hugetlb-revert-use-of-page_cache_next_miss/20230506-025434
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 78b421b6a7c6dbb6a213877c742af52330f5026d
> patch link: https://lore.kernel.org/all/20230505185301.534259-1-sidhartha.kumar@oracle.com/
> patch subject: [PATCH] mm/hugetlb: revert use of page_cache_next_miss()
> 

This test is using 6.4-rc1 as its base, where __filemap_get_folio() has
been converted to return an ERR_PTR() rather than NULL. I believe this
report can be fixed by doing:

		if (!IS_ERR(folio)) {
			folio_put(folio);
			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
			hugetlb_drop_vma_policy(&pseudo_vma);
			continue;
		}


However, I was targeting this patch to be applied to stable 6.3, as it's
the simplest way to fix the current user-visible bugs mentioned in the
commit. Because it's unclear which direction upstream will take to fix
this issue, as there is also the option to take Ackerley's patch[1]
rather than using this fix, I'm not sure if I should send a version of
this patch with 6.4-rc1 context. Please let me know how to proceed.


Thanks,
Sidhartha Kumar

[1]: https://lore.kernel.org/linux-mm/98624c2f481966492b4eb8272aef747790229b73.1683069252.git.ackerleytng@google.com/

Patch

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9062da6da5675..6d6cd8f26d76d 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -821,7 +821,6 @@  static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 */
 		struct folio *folio;
 		unsigned long addr;
-		bool present;
 
 		cond_resched();
 
@@ -845,10 +844,9 @@  static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 		/* See if already present in mapping to avoid alloc/free */
-		rcu_read_lock();
-		present = page_cache_next_miss(mapping, index, 1) != index;
-		rcu_read_unlock();
-		if (present) {
+		folio = filemap_get_folio(mapping, index);
+		if (folio) {
+			folio_put(folio);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			hugetlb_drop_vma_policy(&pseudo_vma);
 			continue;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 245038a9fe4ea..29ab27d2a3ef5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5666,13 +5666,12 @@  static bool hugetlbfs_pagecache_present(struct hstate *h,
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	pgoff_t idx = vma_hugecache_offset(h, vma, address);
-	bool present;
-
-	rcu_read_lock();
-	present = page_cache_next_miss(mapping, idx, 1) != idx;
-	rcu_read_unlock();
+	struct folio *folio;
 
-	return present;
+	folio = filemap_get_folio(mapping, idx);
+	if (folio)
+		folio_put(folio);
+	return folio != NULL;
 }
 
 int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,