Message ID | 7cce156160c4da8062e3cc8c5e9d5b7880feaafd.1725284500.git.leonro@nvidia.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | [rdma-next] RDMA/core: Skip initialized but not leaked GID entries | expand |
在 2024/9/2 21:42, Leon Romanovsky 写道: > From: Leon Romanovsky <leonro@nvidia.com> > > Failure in driver initialization can lead to a situation where the GID > entries are set but not used yet. In this case, the kref will be equal to 1, > which will trigger a false positive leak detection. I have also delved into this problem. And I agree with you. This is a false positive leak detection. Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Zhu Yanjun > > For example, these messages are printed during the driver initialization > and followed by release_gid_table() call: > > infiniband syz1: ib_query_port failed (-19) > infiniband syz1: Couldn't set up InfiniBand P_Key/GID cache > ------------[ cut here ]------------ > GID entry ref leak for dev syz1 index 0 ref=1 > WARNING: CPU: 0 PID: 19837 at drivers/infiniband/core/cache.c:806 gid_table_release_one+0x387/0x4b0 > Modules linked in: > CPU: 0 UID: 0 PID: 19837 Comm: syz.1.3934 Not tainted 6.11.0-rc5-syzkaller-00079-g928f79a188aa #0 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 > RIP: 0010:gid_table_release_one+0x387/0x4b0 > Code: 78 07 00 00 48 85 f6 74 2a 48 89 74 24 38 > e8 b0 0a 76 f9 48 8b 74 24 38 44 89 f9 89 da 48 c7 c7 c0 69 51 8c e8 5a > c3 38 f9 90 <0f> 0b 90 90 e9 6f fe ff ff e8 8b 0a 76 f9 49 8d bc 24 28 > 07 00 00 > RSP: 0018:ffffc900042b7080 EFLAGS: 00010286 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002811e000 > RDX: 0000000000040000 RSI: ffffffff814dd406 RDI: 0000000000000001 > RBP: ffff88807ebaaf00 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000000 R12: ffff888051860000 > R13: dffffc0000000000 R14: ffffed100fd755fb R15: 0000000000000001 > FS: 0000000000000000(0000) GS:ffff88802c000000(0063) knlGS:00000000f56c6b40 > CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 > CR2: 000000002effcff8 CR3: 0000000060c5e000 CR4: 0000000000350ef0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400 > Call Trace: > <TASK> > ? show_regs+0x8c/0xa0 > ? __warn+0xe5/0x3c0 > ? gid_table_release_one+0x387/0x4b0 > ? report_bug+0x3c0/0x580 > ? handle_bug+0x3d/0x70 > ? exc_invalid_op+0x17/0x50 > ? asm_exc_invalid_op+0x1a/0x20 > ? __warn_printk+0x1a6/0x350 > ? gid_table_release_one+0x387/0x4b0 > ib_device_release+0xef/0x1e0 > ? __pfx_ib_device_release+0x10/0x10 > device_release+0xa1/0x240 > kobject_put+0x1e4/0x5a0 > put_device+0x1f/0x30 > rxe_net_add+0xe0/0x110 > rxe_newlink+0x70/0x190 > nldev_newlink+0x373/0x5e0 > ? __pfx_nldev_newlink+0x10/0x10 > ? aa_get_newest_label+0x376/0x680 > ? __pfx_lock_acquire+0x10/0x10 > ? __pfx_aa_get_newest_label+0x10/0x10 > ? __pfx_rwsem_read_trylock+0x10/0x10 > ? __pfx___might_resched+0x10/0x10 > ? apparmor_capable+0x114/0x1d0 > ? ns_capable+0xd7/0x110 > ? __pfx_nldev_newlink+0x10/0x10 > rdma_nl_rcv_msg+0x388/0x6e0 > ? __pfx_rdma_nl_rcv_msg+0x10/0x10 > ? __pfx___lock_acquire+0x10/0x10 > ? find_held_lock+0x2d/0x110 > rdma_nl_rcv_skb.constprop.0.isra.0+0x2e6/0x450 > ? __pfx_rdma_nl_rcv_skb.constprop.0.isra.0+0x10/0x10 > ? netlink_deliver_tap+0x1ae/0xcf0 > netlink_unicast+0x53c/0x7f0 > ? __pfx_netlink_unicast+0x10/0x10 > ? __phys_addr_symbol+0x30/0x80 > ? __check_object_size+0x497/0x720 > netlink_sendmsg+0x8b8/0xd70 > ? __pfx_netlink_sendmsg+0x10/0x10 > ? bpf_lsm_socket_sendmsg+0x9/0x10 > ____sys_sendmsg+0x9b4/0xb50 > ? __pfx_____sys_sendmsg+0x10/0x10 > ? get_compat_msghdr+0x11b/0x170 > ? __pfx___lock_acquire+0x10/0x10 > ? try_to_wake_up+0xc08/0x13e0 > ___sys_sendmsg+0x135/0x1e0 > ? __pfx____sys_sendmsg+0x10/0x10 > ? __fget_light+0x173/0x210 > __sys_sendmsg+0x117/0x1f0 > ? __pfx___sys_sendmsg+0x10/0x10 > ? __ia32_sys_futex_time32+0x1da/0x460 > __do_fast_syscall_32+0x73/0x120 > do_fast_syscall_32+0x32/0x80 > entry_SYSENTER_compat_after_hwframe+0x84/0x8e > RIP: 0023:0xf7f20579 > Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 > 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 > 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 > 00 00 00 > RSP: 002b:00000000f56c656c EFLAGS: 00000296 ORIG_RAX: 0000000000000172 > RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00000000200003c0 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000 > SYZFAIL: failed to recv rpc fd=3 want=4 recv=0 n=0 (errno 9: Bad file descriptor) > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > </TASK> > Kernel panic - not syncing: kernel: panic_on_warn set ... > > In order to fix it, don't print kern > > Fixes: a92fbeac7e94 ("RDMA/cache: Release GID table even if leak is detected") > Reported-by: syzbot+b8b7a6774bf40cf8296b@syzkaller.appspotmail.com > Closes: https://lore.kernel.org/all/000000000000d70eed06211ac86b@google.com > Signed-off-by: Leon Romanovsky <leonro@nvidia.com> > --- > drivers/infiniband/core/cache.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c > index b7c078b7f7cf..c6aec2e04d4c 100644 > --- a/drivers/infiniband/core/cache.c > +++ b/drivers/infiniband/core/cache.c > @@ -800,13 +800,15 @@ static void release_gid_table(struct ib_device *device, > return; > > for (i = 0; i < table->sz; i++) { > + int gid_kref; > + > if (is_gid_entry_free(table->data_vec[i])) > continue; > > - WARN_ONCE(true, > + gid_kref = kref_read(&table->data_vec[i]->kref); > + WARN_ONCE(gid_kref > 1, > "GID entry ref leak for dev %s index %d ref=%u\n", > - dev_name(&device->dev), i, > - kref_read(&table->data_vec[i]->kref)); > + dev_name(&device->dev), i, gid_kref); > } > > mutex_destroy(&table->lock);
On Tue, Sep 03, 2024 at 06:46:24AM +0800, Zhu Yanjun wrote: > 在 2024/9/2 21:42, Leon Romanovsky 写道: > > From: Leon Romanovsky <leonro@nvidia.com> > > > > Failure in driver initialization can lead to a situation where the GID > > entries are set but not used yet. In this case, the kref will be equal to 1, > > which will trigger a false positive leak detection. > > I have also delved into this problem. And I agree with you. This is a false > positive leak detection. > > Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Thanks for the review.
On Tue, Sep 03, 2024 at 06:46:24AM +0800, Zhu Yanjun wrote: > 在 2024/9/2 21:42, Leon Romanovsky 写道: > > From: Leon Romanovsky <leonro@nvidia.com> > > > > Failure in driver initialization can lead to a situation where the GID > > entries are set but not used yet. In this case, the kref will be equal to 1, > > which will trigger a false positive leak detection. > > I have also delved into this problem. And I agree with you. This is a false > positive leak detection. I retested after applying commit 1403c8b14765 ("IB/core: Fix ib_cache_setup_one error flow cleanup") and it looks like this patch is not needed anymore. I will drop it. Thanks > > Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> > > Zhu Yanjun > > > > > For example, these messages are printed during the driver initialization > > and followed by release_gid_table() call: > > > > infiniband syz1: ib_query_port failed (-19) > > infiniband syz1: Couldn't set up InfiniBand P_Key/GID cache > > ------------[ cut here ]------------ > > GID entry ref leak for dev syz1 index 0 ref=1 > > WARNING: CPU: 0 PID: 19837 at drivers/infiniband/core/cache.c:806 gid_table_release_one+0x387/0x4b0 > > Modules linked in: > > CPU: 0 UID: 0 PID: 19837 Comm: syz.1.3934 Not tainted 6.11.0-rc5-syzkaller-00079-g928f79a188aa #0 > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 > > RIP: 0010:gid_table_release_one+0x387/0x4b0 > > Code: 78 07 00 00 48 85 f6 74 2a 48 89 74 24 38 > > e8 b0 0a 76 f9 48 8b 74 24 38 44 89 f9 89 da 48 c7 c7 c0 69 51 8c e8 5a > > c3 38 f9 90 <0f> 0b 90 90 e9 6f fe ff ff e8 8b 0a 76 f9 49 8d bc 24 28 > > 07 00 00 > > RSP: 0018:ffffc900042b7080 EFLAGS: 00010286 > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002811e000 > > RDX: 0000000000040000 RSI: ffffffff814dd406 RDI: 0000000000000001 > > RBP: ffff88807ebaaf00 R08: 0000000000000001 R09: 0000000000000000 > > R10: 0000000000000001 R11: 0000000000000000 R12: ffff888051860000 > > R13: dffffc0000000000 R14: ffffed100fd755fb R15: 0000000000000001 > > FS: 0000000000000000(0000) GS:ffff88802c000000(0063) knlGS:00000000f56c6b40 > > CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 > > CR2: 000000002effcff8 CR3: 0000000060c5e000 CR4: 0000000000350ef0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400 > > Call Trace: > > <TASK> > > ? show_regs+0x8c/0xa0 > > ? __warn+0xe5/0x3c0 > > ? gid_table_release_one+0x387/0x4b0 > > ? report_bug+0x3c0/0x580 > > ? handle_bug+0x3d/0x70 > > ? exc_invalid_op+0x17/0x50 > > ? asm_exc_invalid_op+0x1a/0x20 > > ? __warn_printk+0x1a6/0x350 > > ? gid_table_release_one+0x387/0x4b0 > > ib_device_release+0xef/0x1e0 > > ? __pfx_ib_device_release+0x10/0x10 > > device_release+0xa1/0x240 > > kobject_put+0x1e4/0x5a0 > > put_device+0x1f/0x30 > > rxe_net_add+0xe0/0x110 > > rxe_newlink+0x70/0x190 > > nldev_newlink+0x373/0x5e0 > > ? __pfx_nldev_newlink+0x10/0x10 > > ? aa_get_newest_label+0x376/0x680 > > ? __pfx_lock_acquire+0x10/0x10 > > ? __pfx_aa_get_newest_label+0x10/0x10 > > ? __pfx_rwsem_read_trylock+0x10/0x10 > > ? __pfx___might_resched+0x10/0x10 > > ? apparmor_capable+0x114/0x1d0 > > ? ns_capable+0xd7/0x110 > > ? __pfx_nldev_newlink+0x10/0x10 > > rdma_nl_rcv_msg+0x388/0x6e0 > > ? __pfx_rdma_nl_rcv_msg+0x10/0x10 > > ? __pfx___lock_acquire+0x10/0x10 > > ? find_held_lock+0x2d/0x110 > > rdma_nl_rcv_skb.constprop.0.isra.0+0x2e6/0x450 > > ? __pfx_rdma_nl_rcv_skb.constprop.0.isra.0+0x10/0x10 > > ? netlink_deliver_tap+0x1ae/0xcf0 > > netlink_unicast+0x53c/0x7f0 > > ? __pfx_netlink_unicast+0x10/0x10 > > ? __phys_addr_symbol+0x30/0x80 > > ? __check_object_size+0x497/0x720 > > netlink_sendmsg+0x8b8/0xd70 > > ? __pfx_netlink_sendmsg+0x10/0x10 > > ? bpf_lsm_socket_sendmsg+0x9/0x10 > > ____sys_sendmsg+0x9b4/0xb50 > > ? __pfx_____sys_sendmsg+0x10/0x10 > > ? get_compat_msghdr+0x11b/0x170 > > ? __pfx___lock_acquire+0x10/0x10 > > ? try_to_wake_up+0xc08/0x13e0 > > ___sys_sendmsg+0x135/0x1e0 > > ? __pfx____sys_sendmsg+0x10/0x10 > > ? __fget_light+0x173/0x210 > > __sys_sendmsg+0x117/0x1f0 > > ? __pfx___sys_sendmsg+0x10/0x10 > > ? __ia32_sys_futex_time32+0x1da/0x460 > > __do_fast_syscall_32+0x73/0x120 > > do_fast_syscall_32+0x32/0x80 > > entry_SYSENTER_compat_after_hwframe+0x84/0x8e > > RIP: 0023:0xf7f20579 > > Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 > > 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 > > 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 > > 00 00 00 > > RSP: 002b:00000000f56c656c EFLAGS: 00000296 ORIG_RAX: 0000000000000172 > > RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00000000200003c0 > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000 > > SYZFAIL: failed to recv rpc fd=3 want=4 recv=0 n=0 (errno 9: Bad file descriptor) > > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > > </TASK> > > Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > In order to fix it, don't print kern > > > > Fixes: a92fbeac7e94 ("RDMA/cache: Release GID table even if leak is detected") > > Reported-by: syzbot+b8b7a6774bf40cf8296b@syzkaller.appspotmail.com > > Closes: https://lore.kernel.org/all/000000000000d70eed06211ac86b@google.com > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com> > > --- > > drivers/infiniband/core/cache.c | 8 +++++--- > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c > > index b7c078b7f7cf..c6aec2e04d4c 100644 > > --- a/drivers/infiniband/core/cache.c > > +++ b/drivers/infiniband/core/cache.c > > @@ -800,13 +800,15 @@ static void release_gid_table(struct ib_device *device, > > return; > > for (i = 0; i < table->sz; i++) { > > + int gid_kref; > > + > > if (is_gid_entry_free(table->data_vec[i])) > > continue; > > - WARN_ONCE(true, > > + gid_kref = kref_read(&table->data_vec[i]->kref); > > + WARN_ONCE(gid_kref > 1, > > "GID entry ref leak for dev %s index %d ref=%u\n", > > - dev_name(&device->dev), i, > > - kref_read(&table->data_vec[i]->kref)); > > + dev_name(&device->dev), i, gid_kref); > > } > > mutex_destroy(&table->lock); > >
On Mon, Sep 02, 2024 at 04:42:52PM +0300, Leon Romanovsky wrote: > From: Leon Romanovsky <leonro@nvidia.com> > > Failure in driver initialization can lead to a situation where the GID > entries are set but not used yet. In this case, the kref will be equal to 1, > which will trigger a false positive leak detection. Why does that happen?? > For example, these messages are printed during the driver initialization > and followed by release_gid_table() call: > > infiniband syz1: ib_query_port failed (-19) > infiniband syz1: Couldn't set up InfiniBand P_Key/GID cache Okay, but who set the ref=1? > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c > index b7c078b7f7cf..c6aec2e04d4c 100644 > --- a/drivers/infiniband/core/cache.c > +++ b/drivers/infiniband/core/cache.c > @@ -800,13 +800,15 @@ static void release_gid_table(struct ib_device *device, > return; > > for (i = 0; i < table->sz; i++) { > + int gid_kref; > + > if (is_gid_entry_free(table->data_vec[i])) > continue; > > - WARN_ONCE(true, > + gid_kref = kref_read(&table->data_vec[i]->kref); > + WARN_ONCE(gid_kref > 1, > "GID entry ref leak for dev %s index %d ref=%u\n", > - dev_name(&device->dev), i, > - kref_read(&table->data_vec[i]->kref)); > + dev_name(&device->dev), i, gid_kref); > } I'm not convinced, I think the bug here is something wrong on the refcounting side not the freeing side. Ref should not be 1. Seems like missing error unwinding in the init side. Jason
On Wed, Sep 04, 2024 at 11:31:13AM -0300, Jason Gunthorpe wrote: > On Mon, Sep 02, 2024 at 04:42:52PM +0300, Leon Romanovsky wrote: > > From: Leon Romanovsky <leonro@nvidia.com> > > > > Failure in driver initialization can lead to a situation where the GID > > entries are set but not used yet. In this case, the kref will be equal to 1, > > which will trigger a false positive leak detection. > > Why does that happen?? > > > > For example, these messages are printed during the driver initialization > > and followed by release_gid_table() call: > > > > infiniband syz1: ib_query_port failed (-19) > > infiniband syz1: Couldn't set up InfiniBand P_Key/GID cache > > Okay, but who set the ref=1? > > > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c > > index b7c078b7f7cf..c6aec2e04d4c 100644 > > --- a/drivers/infiniband/core/cache.c > > +++ b/drivers/infiniband/core/cache.c > > @@ -800,13 +800,15 @@ static void release_gid_table(struct ib_device *device, > > return; > > > > for (i = 0; i < table->sz; i++) { > > + int gid_kref; > > + > > if (is_gid_entry_free(table->data_vec[i])) > > continue; > > > > - WARN_ONCE(true, > > + gid_kref = kref_read(&table->data_vec[i]->kref); > > + WARN_ONCE(gid_kref > 1, > > "GID entry ref leak for dev %s index %d ref=%u\n", > > - dev_name(&device->dev), i, > > - kref_read(&table->data_vec[i]->kref)); > > + dev_name(&device->dev), i, gid_kref); > > } > > I'm not convinced, I think the bug here is something wrong on the > refcounting side not the freeing side. Ref should not be 1. Seems like > missing error unwinding in the init side. I dropped this patch as the real fix is here 1403c8b14765 ("IB/core: Fix ib_cache_setup_one error flow cleanup") Thanks > > Jason >
在 2024/9/4 23:34, Leon Romanovsky 写道: > On Wed, Sep 04, 2024 at 11:31:13AM -0300, Jason Gunthorpe wrote: >> On Mon, Sep 02, 2024 at 04:42:52PM +0300, Leon Romanovsky wrote: >>> From: Leon Romanovsky <leonro@nvidia.com> >>> >>> Failure in driver initialization can lead to a situation where the GID >>> entries are set but not used yet. In this case, the kref will be equal to 1, >>> which will trigger a false positive leak detection. >> >> Why does that happen?? >> >> >>> For example, these messages are printed during the driver initialization >>> and followed by release_gid_table() call: >>> >>> infiniband syz1: ib_query_port failed (-19) >>> infiniband syz1: Couldn't set up InfiniBand P_Key/GID cache >> >> Okay, but who set the ref=1? >> >>> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c >>> index b7c078b7f7cf..c6aec2e04d4c 100644 >>> --- a/drivers/infiniband/core/cache.c >>> +++ b/drivers/infiniband/core/cache.c >>> @@ -800,13 +800,15 @@ static void release_gid_table(struct ib_device *device, >>> return; >>> >>> for (i = 0; i < table->sz; i++) { >>> + int gid_kref; >>> + >>> if (is_gid_entry_free(table->data_vec[i])) >>> continue; >>> >>> - WARN_ONCE(true, >>> + gid_kref = kref_read(&table->data_vec[i]->kref); >>> + WARN_ONCE(gid_kref > 1, >>> "GID entry ref leak for dev %s index %d ref=%u\n", >>> - dev_name(&device->dev), i, >>> - kref_read(&table->data_vec[i]->kref)); >>> + dev_name(&device->dev), i, gid_kref); >>> } >> >> I'm not convinced, I think the bug here is something wrong on the >> refcounting side not the freeing side. Ref should not be 1. Seems like >> missing error unwinding in the init side. > > I dropped this patch as the real fix is here 1403c8b14765 ("IB/core: Fix ib_cache_setup_one error flow cleanup") The commit 1403c8b14765 ("IB/core: Fix ib_cache_setup_one error flow cleanup") is in the link https://patchwork.kernel.org/project/linux-rdma/patch/79137687d829899b0b1c9835fcb4b258004c439a.1725273354.git.leon@kernel.org/ Zhu Yanjun > > Thanks > >> >> Jason >>
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index b7c078b7f7cf..c6aec2e04d4c 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -800,13 +800,15 @@ static void release_gid_table(struct ib_device *device, return; for (i = 0; i < table->sz; i++) { + int gid_kref; + if (is_gid_entry_free(table->data_vec[i])) continue; - WARN_ONCE(true, + gid_kref = kref_read(&table->data_vec[i]->kref); + WARN_ONCE(gid_kref > 1, "GID entry ref leak for dev %s index %d ref=%u\n", - dev_name(&device->dev), i, - kref_read(&table->data_vec[i]->kref)); + dev_name(&device->dev), i, gid_kref); } mutex_destroy(&table->lock);