diff mbox series

[bpf-next,v4] bpf: fix lockdep warning triggered by stack_map_get_build_id_offset()

Message ID 20210909060245.2966358-1-yhs@fb.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [bpf-next,v4] bpf: fix lockdep warning triggered by stack_map_get_build_id_offset() | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers fail 1 blamed authors not CCed: akpm@linux-foundation.org; 6 maintainers not CCed: akpm@linux-foundation.org kpsingh@kernel.org john.fastabend@gmail.com songliubraving@fb.com netdev@vger.kernel.org kafai@fb.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes fail Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 19 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 3 this patch: 3
netdev/header_inline success Link
bpf/vmtest-bpf-next success VM_Test
bpf/vmtest-bpf-next-PR success PR summary

Commit Message

Yonghong Song Sept. 9, 2021, 6:02 a.m. UTC
Current bpf-next bpf selftest "get_stack_raw_tp" triggered the warning:

  [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0
  [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod]
  [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G        W  O      5.14.0+ #53
  [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0
  [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8
  [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246
  [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000
  [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
  [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe
  [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0
  [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0
  [ 1411.304501] FS:  00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000
  [ 1411.304504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0
  [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 1411.304512] Call Trace:
  [ 1411.304517]  stack_map_get_build_id_offset+0x17c/0x260
  [ 1411.304528]  __bpf_get_stack+0x18f/0x230
  [ 1411.304541]  bpf_get_stack_raw_tp+0x5a/0x70
  [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000
  [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
  [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04
  [ 1411.306978]  bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c
  [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0
  [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108
  [ 1411.309074]  bpf_trace_run2+0x8f/0x1a0
  [ 1411.309891] FS:  00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000
  [ 1411.309896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1411.311221]  syscall_trace_enter.isra.20+0x161/0x1f0
  [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0
  [ 1411.312291]  do_syscall_64+0x15/0x80
  [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 1411.313803]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 1411.315082] RIP: 0033:0x7f00ad80a0e0
  [ 1411.315626] Call Trace:
  [ 1411.315632]  stack_map_get_build_id_offset+0x17c/0x260

To reproduce, first build `test_progs` binary:
  make -C tools/testing/selftests/bpf -j60
and then run the binary at tools/testing/selftests/bpf directory:
  ./test_progs -t get_stack_raw_tp

The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked()
annotations to find_vma*()") which added mmap_assert_locked() in find_vma() function.
The mmap_assert_locked() function asserts that mm->mmap_lock needs to be held. But this
is not the case for bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c),
which uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held
in bpf_get_stack[id]() use case, the above warning is emitted during test run.

This patch fixed the issue by (1). using mmap_read_trylock() instead of
mmap_read_trylock_non_owner() to satisfy lockdep checking in find_vma(),
and (2). droping lockdep for mmap_lock right before the irq_work_queue().

Cc: Luigi Rizzo <lrizzo@google.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Fixes: 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/stackmap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Jason Gunthorpe Sept. 9, 2021, 12:34 p.m. UTC | #1
On Wed, Sep 08, 2021 at 11:02:45PM -0700, Yonghong Song wrote:
> Current bpf-next bpf selftest "get_stack_raw_tp" triggered the warning:
> 
>   [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0
>   [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod]
>   [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G        W  O      5.14.0+ #53
>   [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>   [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0
>   [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8
>   [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246
>   [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000
>   [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
>   [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe
>   [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0
>   [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0
>   [ 1411.304501] FS:  00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000
>   [ 1411.304504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0
>   [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   [ 1411.304512] Call Trace:
>   [ 1411.304517]  stack_map_get_build_id_offset+0x17c/0x260
>   [ 1411.304528]  __bpf_get_stack+0x18f/0x230
>   [ 1411.304541]  bpf_get_stack_raw_tp+0x5a/0x70
>   [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000
>   [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
>   [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04
>   [ 1411.306978]  bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c
>   [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0
>   [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108
>   [ 1411.309074]  bpf_trace_run2+0x8f/0x1a0
>   [ 1411.309891] FS:  00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000
>   [ 1411.309896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   [ 1411.311221]  syscall_trace_enter.isra.20+0x161/0x1f0
>   [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0
>   [ 1411.312291]  do_syscall_64+0x15/0x80
>   [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   [ 1411.313803]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>   [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   [ 1411.315082] RIP: 0033:0x7f00ad80a0e0
>   [ 1411.315626] Call Trace:
>   [ 1411.315632]  stack_map_get_build_id_offset+0x17c/0x260
> 
> To reproduce, first build `test_progs` binary:
>   make -C tools/testing/selftests/bpf -j60
> and then run the binary at tools/testing/selftests/bpf directory:
>   ./test_progs -t get_stack_raw_tp
> 
> The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked()
> annotations to find_vma*()") which added mmap_assert_locked() in find_vma() function.
> The mmap_assert_locked() function asserts that mm->mmap_lock needs to be held. But this
> is not the case for bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c),
> which uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held
> in bpf_get_stack[id]() use case, the above warning is emitted during test run.
> 
> This patch fixed the issue by (1). using mmap_read_trylock() instead of
> mmap_read_trylock_non_owner() to satisfy lockdep checking in find_vma(),
> and (2). droping lockdep for mmap_lock right before the irq_work_queue().
> 
> Cc: Luigi Rizzo <lrizzo@google.com>
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Fixes: 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
> Signed-off-by: Yonghong Song <yhs@fb.com>
>  kernel/bpf/stackmap.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index e8eefdf8cf3e..7b7500a3724b 100644
> +++ b/kernel/bpf/stackmap.c
> @@ -179,7 +179,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>  	 * with build_id.
>  	 */
>  	if (!user || !current || !current->mm || irq_work_busy ||
> -	    !mmap_read_trylock_non_owner(current->mm)) {
> +	    !mmap_read_trylock(current->mm)) {

This was the only caller of mmap_read_trylock_non_owner(), so please
delete it too

Jason
Matthew Wilcox Sept. 9, 2021, 1:33 p.m. UTC | #2
On Wed, Sep 08, 2021 at 11:02:45PM -0700, Yonghong Song wrote:
> @@ -204,9 +204,10 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>  	}
>  
>  	if (!work) {
> -		mmap_read_unlock_non_owner(current->mm);
> +		mmap_read_unlock(current->mm);
>  	} else {
>  		work->mm = current->mm;
> +		rwsem_release(&current->mm->mmap_lock.dep_map, _RET_IP_);
>  		irq_work_queue(&work->irq_work);

This needs a comment before the rwsem_release().  Something like:

		/*
		 * The lock will be released once we're out of interrupt
		 * context.  Tell lockdep that we've released it now so
		 * it doesn't complain that we forgot to release it.
		 */
Yonghong Song Sept. 9, 2021, 2:54 p.m. UTC | #3
On 9/9/21 5:34 AM, Jason Gunthorpe wrote:
> On Wed, Sep 08, 2021 at 11:02:45PM -0700, Yonghong Song wrote:
>> Current bpf-next bpf selftest "get_stack_raw_tp" triggered the warning:
>>
>>    [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0
>>    [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod]
>>    [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G        W  O      5.14.0+ #53
>>    [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>>    [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0
>>    [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8
>>    [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246
>>    [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000
>>    [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
>>    [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe
>>    [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0
>>    [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0
>>    [ 1411.304501] FS:  00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000
>>    [ 1411.304504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>    [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0
>>    [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>    [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>    [ 1411.304512] Call Trace:
>>    [ 1411.304517]  stack_map_get_build_id_offset+0x17c/0x260
>>    [ 1411.304528]  __bpf_get_stack+0x18f/0x230
>>    [ 1411.304541]  bpf_get_stack_raw_tp+0x5a/0x70
>>    [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000
>>    [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
>>    [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04
>>    [ 1411.306978]  bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c
>>    [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0
>>    [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108
>>    [ 1411.309074]  bpf_trace_run2+0x8f/0x1a0
>>    [ 1411.309891] FS:  00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000
>>    [ 1411.309896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>    [ 1411.311221]  syscall_trace_enter.isra.20+0x161/0x1f0
>>    [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0
>>    [ 1411.312291]  do_syscall_64+0x15/0x80
>>    [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>    [ 1411.313803]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>    [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>    [ 1411.315082] RIP: 0033:0x7f00ad80a0e0
>>    [ 1411.315626] Call Trace:
>>    [ 1411.315632]  stack_map_get_build_id_offset+0x17c/0x260
>>
>> To reproduce, first build `test_progs` binary:
>>    make -C tools/testing/selftests/bpf -j60
>> and then run the binary at tools/testing/selftests/bpf directory:
>>    ./test_progs -t get_stack_raw_tp
>>
>> The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked()
>> annotations to find_vma*()") which added mmap_assert_locked() in find_vma() function.
>> The mmap_assert_locked() function asserts that mm->mmap_lock needs to be held. But this
>> is not the case for bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c),
>> which uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held
>> in bpf_get_stack[id]() use case, the above warning is emitted during test run.
>>
>> This patch fixed the issue by (1). using mmap_read_trylock() instead of
>> mmap_read_trylock_non_owner() to satisfy lockdep checking in find_vma(),
>> and (2). droping lockdep for mmap_lock right before the irq_work_queue().
>>
>> Cc: Luigi Rizzo <lrizzo@google.com>
>> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
>> Fixes: 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>   kernel/bpf/stackmap.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index e8eefdf8cf3e..7b7500a3724b 100644
>> +++ b/kernel/bpf/stackmap.c
>> @@ -179,7 +179,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>>   	 * with build_id.
>>   	 */
>>   	if (!user || !current || !current->mm || irq_work_busy ||
>> -	    !mmap_read_trylock_non_owner(current->mm)) {
>> +	    !mmap_read_trylock(current->mm)) {
> 
> This was the only caller of mmap_read_trylock_non_owner(), so please
> delete it too

Thanks. Will do.

> 
> Jason
>
Yonghong Song Sept. 9, 2021, 2:55 p.m. UTC | #4
On 9/9/21 6:33 AM, Matthew Wilcox wrote:
> On Wed, Sep 08, 2021 at 11:02:45PM -0700, Yonghong Song wrote:
>> @@ -204,9 +204,10 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>>   	}
>>   
>>   	if (!work) {
>> -		mmap_read_unlock_non_owner(current->mm);
>> +		mmap_read_unlock(current->mm);
>>   	} else {
>>   		work->mm = current->mm;
>> +		rwsem_release(&current->mm->mmap_lock.dep_map, _RET_IP_);
>>   		irq_work_queue(&work->irq_work);
> 
> This needs a comment before the rwsem_release().  Something like:
> 
> 		/*
> 		 * The lock will be released once we're out of interrupt
> 		 * context.  Tell lockdep that we've released it now so
> 		 * it doesn't complain that we forgot to release it.
> 		 */

Thanks! Will add it in the next revision.
diff mbox series

Patch

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index e8eefdf8cf3e..7b7500a3724b 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -179,7 +179,7 @@  static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	 * with build_id.
 	 */
 	if (!user || !current || !current->mm || irq_work_busy ||
-	    !mmap_read_trylock_non_owner(current->mm)) {
+	    !mmap_read_trylock(current->mm)) {
 		/* cannot access current->mm, fall back to ips */
 		for (i = 0; i < trace_nr; i++) {
 			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
@@ -204,9 +204,10 @@  static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	}
 
 	if (!work) {
-		mmap_read_unlock_non_owner(current->mm);
+		mmap_read_unlock(current->mm);
 	} else {
 		work->mm = current->mm;
+		rwsem_release(&current->mm->mmap_lock.dep_map, _RET_IP_);
 		irq_work_queue(&work->irq_work);
 	}
 }