Message ID | 20210907062650.1309736-1-yhs@fb.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [mm] mm/mmap: relax mmap_assert_locked() for find_vma() | expand |
* Yonghong Song <yhs@fb.com> [210907 02:27]: > Current bpf-next bpf selftest "get_stack_raw_tp" triggered the warning: > > [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0 > [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod] > [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G W O 5.14.0+ #53 > [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 > [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0 > [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8 > [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246 > [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000 > [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e > [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe > [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0 > [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0 > [ 1411.304501] FS: 00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000 > [ 1411.304504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0 > [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 1411.304512] Call Trace: > [ 1411.304517] stack_map_get_build_id_offset+0x17c/0x260 > [ 1411.304528] __bpf_get_stack+0x18f/0x230 > [ 1411.304541] bpf_get_stack_raw_tp+0x5a/0x70 > [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000 > [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e > [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04 > [ 1411.306978] bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c > [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0 > [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108 > [ 1411.309074] bpf_trace_run2+0x8f/0x1a0 > [ 1411.309891] FS: 00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000 > [ 1411.309896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1411.311221] syscall_trace_enter.isra.20+0x161/0x1f0 > [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0 > [ 1411.312291] do_syscall_64+0x15/0x80 > [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1411.313803] entry_SYSCALL_64_after_hwframe+0x44/0xae > [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 1411.315082] RIP: 0033:0x7f00ad80a0e0 > [ 1411.315626] Call Trace: > [ 1411.315632] stack_map_get_build_id_offset+0x17c/0x260 > > To reproduce, first build `test_progs` binary: > make -C tools/testing/selftests/bpf -j60 > and then run the binary at tools/testing/selftests/bpf directory: > ./test_progs -t get_stack_raw_tp > > The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") > which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function > asserts that mm->mmap_lock needs to be held. But this is not the case for > bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which > uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held > in bpf_get_stack[id]() use case, the above warning is emitted during test run. > > This patch fixed the issue by replacing mmap_assert_locked() with rwsem_is_locked(&mm->mmap_lock) > in find_vma(). I'm not a fan of removing the lockdep check. I'd rather see find_vma() do the lockdep and call another internal function which does exactly what you have below. Then you could call the one and everyone else could keep the code they have. More importantly, since this is the only user of the mmap_read_trylock_non_owner(), is there a build bot that could be used to catch errors introduced in this path? Thanks, Liam > > Cc: Luigi Rizzo <lrizzo@google.com> > Fixes: 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") > Signed-off-by: Yonghong Song <yhs@fb.com> > --- > mm/mmap.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Alternatively we could add a bool argument to find_vma to indicate whether > mm->mmap_lock has been held or not. But I would like to report and try the > simpler solution first. > > diff --git a/mm/mmap.c b/mm/mmap.c > index 88dcc5c25225..8c78b85475b1 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -2275,7 +2275,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr) > struct rb_node *rb_node; > struct vm_area_struct *vma; > > - mmap_assert_locked(mm); > + VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); > /* Check the cache first. */ > vma = vmacache_find(mm, addr); > if (likely(vma)) > -- > 2.30.2 > >
On 9/7/21 8:09 PM, Liam Howlett wrote: > * Yonghong Song <yhs@fb.com> [210907 02:27]: >> Current bpf-next bpf selftest "get_stack_raw_tp" triggered the warning: >> >> [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0 >> [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod] >> [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G W O 5.14.0+ #53 >> [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 >> [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0 >> [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8 >> [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246 >> [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000 >> [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e >> [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe >> [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0 >> [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0 >> [ 1411.304501] FS: 00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000 >> [ 1411.304504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0 >> [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> [ 1411.304512] Call Trace: >> [ 1411.304517] stack_map_get_build_id_offset+0x17c/0x260 >> [ 1411.304528] __bpf_get_stack+0x18f/0x230 >> [ 1411.304541] bpf_get_stack_raw_tp+0x5a/0x70 >> [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000 >> [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e >> [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04 >> [ 1411.306978] bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c >> [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0 >> [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108 >> [ 1411.309074] bpf_trace_run2+0x8f/0x1a0 >> [ 1411.309891] FS: 00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000 >> [ 1411.309896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 1411.311221] syscall_trace_enter.isra.20+0x161/0x1f0 >> [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0 >> [ 1411.312291] do_syscall_64+0x15/0x80 >> [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 1411.313803] entry_SYSCALL_64_after_hwframe+0x44/0xae >> [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> [ 1411.315082] RIP: 0033:0x7f00ad80a0e0 >> [ 1411.315626] Call Trace: >> [ 1411.315632] stack_map_get_build_id_offset+0x17c/0x260 >> >> To reproduce, first build `test_progs` binary: >> make -C tools/testing/selftests/bpf -j60 >> and then run the binary at tools/testing/selftests/bpf directory: >> ./test_progs -t get_stack_raw_tp >> >> The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") >> which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function >> asserts that mm->mmap_lock needs to be held. But this is not the case for >> bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which >> uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held >> in bpf_get_stack[id]() use case, the above warning is emitted during test run. >> >> This patch fixed the issue by replacing mmap_assert_locked() with rwsem_is_locked(&mm->mmap_lock) >> in find_vma(). > > I'm not a fan of removing the lockdep check. I'd rather see find_vma() > do the lockdep and call another internal function which does exactly > what you have below. Then you could call the one and everyone else > could keep the code they have. Okay, will do that in the next revision. > > More importantly, since this is the only user of the > mmap_read_trylock_non_owner(), is there a build bot that could be used > to catch errors introduced in this > path? We do have libbpf ci to test selftest against current *bpf_next* and some past old kernels. In this case, I think it is linux-next first introduced issue, and then the change went to net-next and then bpf-next. So to catch this earlier, we need to do libbpf ci on linux-next and/or net-next. We will discuss how we could incorporate this. > > Thanks, > Liam > >> >> Cc: Luigi Rizzo <lrizzo@google.com> >> Fixes: 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") >> Signed-off-by: Yonghong Song <yhs@fb.com> >> --- >> mm/mmap.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> Alternatively we could add a bool argument to find_vma to indicate whether >> mm->mmap_lock has been held or not. But I would like to report and try the >> simpler solution first. >> >> diff --git a/mm/mmap.c b/mm/mmap.c >> index 88dcc5c25225..8c78b85475b1 100644 >> --- a/mm/mmap.c >> +++ b/mm/mmap.c >> @@ -2275,7 +2275,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr) >> struct rb_node *rb_node; >> struct vm_area_struct *vma; >> >> - mmap_assert_locked(mm); >> + VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); >> /* Check the cache first. */ >> vma = vmacache_find(mm, addr); >> if (likely(vma)) >> -- >> 2.30.2 >>
diff --git a/mm/mmap.c b/mm/mmap.c index 88dcc5c25225..8c78b85475b1 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2275,7 +2275,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr) struct rb_node *rb_node; struct vm_area_struct *vma; - mmap_assert_locked(mm); + VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); /* Check the cache first. */ vma = vmacache_find(mm, addr); if (likely(vma))
Current bpf-next bpf selftest "get_stack_raw_tp" triggered the warning: [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0 [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod] [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G W O 5.14.0+ #53 [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0 [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8 [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246 [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000 [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0 [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0 [ 1411.304501] FS: 00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000 [ 1411.304504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0 [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1411.304512] Call Trace: [ 1411.304517] stack_map_get_build_id_offset+0x17c/0x260 [ 1411.304528] __bpf_get_stack+0x18f/0x230 [ 1411.304541] bpf_get_stack_raw_tp+0x5a/0x70 [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000 [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04 [ 1411.306978] bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0 [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108 [ 1411.309074] bpf_trace_run2+0x8f/0x1a0 [ 1411.309891] FS: 00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000 [ 1411.309896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1411.311221] syscall_trace_enter.isra.20+0x161/0x1f0 [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0 [ 1411.312291] do_syscall_64+0x15/0x80 [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1411.313803] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1411.315082] RIP: 0033:0x7f00ad80a0e0 [ 1411.315626] Call Trace: [ 1411.315632] stack_map_get_build_id_offset+0x17c/0x260 To reproduce, first build `test_progs` binary: make -C tools/testing/selftests/bpf -j60 and then run the binary at tools/testing/selftests/bpf directory: ./test_progs -t get_stack_raw_tp The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function asserts that mm->mmap_lock needs to be held. But this is not the case for bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held in bpf_get_stack[id]() use case, the above warning is emitted during test run. This patch fixed the issue by replacing mmap_assert_locked() with rwsem_is_locked(&mm->mmap_lock) in find_vma(). Cc: Luigi Rizzo <lrizzo@google.com> Fixes: 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") Signed-off-by: Yonghong Song <yhs@fb.com> --- mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Alternatively we could add a bool argument to find_vma to indicate whether mm->mmap_lock has been held or not. But I would like to report and try the simpler solution first.