diff mbox series

[1/1] gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp()

Message ID 20240124101404.161655-2-kovalev@altlinux.org (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp() | expand

Checks

Context Check Description
netdev/series_format warning Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1077 this patch: 1077
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 0 of 0 maintainers
netdev/build_clang success Errors and warnings before: 1095 this patch: 1095
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1095 this patch: 1095
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 45 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest pending net-next-2024-01-25--00-00

Commit Message

Vasiliy Kovalev Jan. 24, 2024, 10:14 a.m. UTC
From: Vasiliy Kovalev <kovalev@altlinux.org>

After unloading the module, an instance continues to exist that accesses
outdated memory addresses.

To prevent this, the dump_pdp_en flag has been added, which blocks the
dump of pdp contexts by a false value. And only after these checks can
the net_generic() function be called.

These errors were found using the syzkaller program:

Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug.
gtp: GTP module loaded (pdp ctx size 104 bytes)
gtp: GTP module unloaded
general protection fault, probably for non-canonical address
0xdffffc0000000001:0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 2782 Comm: syz-executor139 Not tainted 5.10.200-std-def-alt1 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-alt1
RIP: 0010:gtp_genl_dump_pdp+0x1b1/0x790 [gtp]
...
Call Trace:
 genl_lock_dumpit+0x6b/0xa0 net/netlink/genetlink.c:623
 netlink_dump+0x575/0xc70 net/netlink/af_netlink.c:2271
 __netlink_dump_start+0x64e/0x910 net/netlink/af_netlink.c:2376
 genl_family_rcv_msg_dumpit+0x2b8/0x310 net/netlink/genetlink.c:686
 genl_family_rcv_msg net/netlink/genetlink.c:780 [inline]
 genl_rcv_msg+0x450/0x5a0 net/netlink/genetlink.c:800
 netlink_rcv_skb+0x150/0x440 net/netlink/af_netlink.c:2497
 genl_rcv+0x29/0x40 net/netlink/genetlink.c:811
 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
 netlink_unicast+0x54e/0x800 net/netlink/af_netlink.c:1348
 netlink_sendmsg+0x914/0xe00 net/netlink/af_netlink.c:1916
 sock_sendmsg_nosec net/socket.c:651 [inline]
 __sock_sendmsg+0x159/0x190 net/socket.c:663
 ____sys_sendmsg+0x712/0x870 net/socket.c:2376
 ___sys_sendmsg+0xf8/0x170 net/socket.c:2430
 __sys_sendmsg+0xea/0x1b0 net/socket.c:2459
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x62/0xc7
RIP: 0033:0x7f2ea16c2d49

Fixes: 94a6d9fb88df ("gtp: fix wrong condition in gtp_genl_dump_pdp()")
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
---
 drivers/net/gtp.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Eric Dumazet Jan. 24, 2024, 10:42 a.m. UTC | #1
On Wed, Jan 24, 2024 at 11:14 AM <kovalev@altlinux.org> wrote:
>
> From: Vasiliy Kovalev <kovalev@altlinux.org>
>
> After unloading the module, an instance continues to exist that accesses
> outdated memory addresses.
>
> To prevent this, the dump_pdp_en flag has been added, which blocks the
> dump of pdp contexts by a false value. And only after these checks can
> the net_generic() function be called.
>
> These errors were found using the syzkaller program:
>
> Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug.
> gtp: GTP module loaded (pdp ctx size 104 bytes)
> gtp: GTP module unloaded
> general protection fault, probably for non-canonical address
> 0xdffffc0000000001:0000 [#1] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> CPU: 0 PID: 2782 Comm: syz-executor139 Not tainted 5.10.200-std-def-alt1 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-alt1
> RIP: 0010:gtp_genl_dump_pdp+0x1b1/0x790 [gtp]
> ...
> Call Trace:
>  genl_lock_dumpit+0x6b/0xa0 net/netlink/genetlink.c:623
>  netlink_dump+0x575/0xc70 net/netlink/af_netlink.c:2271
>  __netlink_dump_start+0x64e/0x910 net/netlink/af_netlink.c:2376
>  genl_family_rcv_msg_dumpit+0x2b8/0x310 net/netlink/genetlink.c:686
>  genl_family_rcv_msg net/netlink/genetlink.c:780 [inline]
>  genl_rcv_msg+0x450/0x5a0 net/netlink/genetlink.c:800
>  netlink_rcv_skb+0x150/0x440 net/netlink/af_netlink.c:2497
>  genl_rcv+0x29/0x40 net/netlink/genetlink.c:811
>  netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
>  netlink_unicast+0x54e/0x800 net/netlink/af_netlink.c:1348
>  netlink_sendmsg+0x914/0xe00 net/netlink/af_netlink.c:1916
>  sock_sendmsg_nosec net/socket.c:651 [inline]
>  __sock_sendmsg+0x159/0x190 net/socket.c:663
>  ____sys_sendmsg+0x712/0x870 net/socket.c:2376
>  ___sys_sendmsg+0xf8/0x170 net/socket.c:2430
>  __sys_sendmsg+0xea/0x1b0 net/socket.c:2459
>  do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x62/0xc7
> RIP: 0033:0x7f2ea16c2d49
>
> Fixes: 94a6d9fb88df ("gtp: fix wrong condition in gtp_genl_dump_pdp()")
> Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
> ---
>  drivers/net/gtp.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
> index 477b4d4f860bd3..3fc4639711cd83 100644
> --- a/drivers/net/gtp.c
> +++ b/drivers/net/gtp.c
> @@ -1675,6 +1675,8 @@ static int gtp_genl_get_pdp(struct sk_buff *skb, struct genl_info *info)
>         return err;
>  }
>
> +static bool dump_pdp_en;
> +

Hmm, it seems there is a missing try_module_get() somewhere...

__netlink_dump_start() does one, so perhaps we reach __netlink_dump_start()
with a NULL in control->module ?
Eric Dumazet Jan. 24, 2024, 10:57 a.m. UTC | #2
On Wed, Jan 24, 2024 at 11:14 AM <kovalev@altlinux.org> wrote:
>
> From: Vasiliy Kovalev <kovalev@altlinux.org>
>
> After unloading the module, an instance continues to exist that accesses
> outdated memory addresses.
>
> To prevent this, the dump_pdp_en flag has been added, which blocks the
> dump of pdp contexts by a false value. And only after these checks can
> the net_generic() function be called.
>
> These errors were found using the syzkaller program:
>
> Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug.
> gtp: GTP module loaded (pdp ctx size 104 bytes)
> gtp: GTP module unloaded
> general protection fault, probably for non-canonical address
> 0xdffffc0000000001:0000 [#1] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> CPU: 0 PID: 2782 Comm: syz-executor139 Not tainted 5.10.200-std-def-alt1 #1

Oh wait, this is a 5.10 kernel ?

Please generate a stack trace using a recent tree, it is possible the
bug has been fixed already.
Vasiliy Kovalev Jan. 24, 2024, 11:20 a.m. UTC | #3
24.01.2024 13:57, Eric Dumazet wrote:
> Oh wait, this is a 5.10 kernel ?
Yes, but the bug is reproduced on the latest stable kernels.
> Please generate a stack trace using a recent tree, it is possible the
> bug has been fixed already.

See [PATCH 0/1] above, there's a stack for the 6.6.13 kernel at the 
bottom of the message.

[  523.915255] Call Trace:
[  523.915255]  <TASK>
[  523.915255]  ? __die+0x1f/0x70
[  523.915255]  ? page_fault_oops+0x14d/0x4a0
[  523.915255]  ? exc_page_fault+0x7b/0x180
[  523.915255]  ? asm_exc_page_fault+0x22/0x30
[  523.915255]  ? gtp_genl_dump_pdp+0x82/0x190 [gtp]
[  523.915255]  ? gtp_genl_dump_pdp+0x82/0x190 [gtp]
[  523.915255]  genl_dumpit+0x2f/0x90
[  523.915255]  netlink_dump+0x126/0x320
[  523.915255]  __netlink_dump_start+0x1da/0x2a0
[  523.915255]  genl_family_rcv_msg_dumpit+0x93/0x100
[  523.915255]  ? __pfx_genl_start+0x10/0x10
[  523.915255]  ? __pfx_genl_dumpit+0x10/0x10
[  523.915255]  ? __pfx_genl_done+0x10/0x10
[  523.915255]  genl_rcv_msg+0x112/0x2a0
[  523.915255]  ? __pfx_gtp_genl_dump_pdp+0x10/0x10 [gtp]
[  523.915255]  ? __pfx_genl_rcv_msg+0x10/0x10
[  523.915255]  netlink_rcv_skb+0x54/0x110
[  523.915255]  genl_rcv+0x24/0x40
[  523.915255]  netlink_unicast+0x19f/0x290
[  523.915255]  netlink_sendmsg+0x250/0x4e0
[  523.915255]  ____sys_sendmsg+0x376/0x3b0
[  523.915255]  ? copy_msghdr_from_user+0x6d/0xb0
[  523.915255]  ___sys_sendmsg+0x86/0xe0
[  523.915255]  ? do_fault+0x296/0x470
[  523.915255]  ? __handle_mm_fault+0x771/0xda0
[  523.915255]  __sys_sendmsg+0x57/0xb0
[  523.915255]  do_syscall_64+0x59/0x90
[  523.915255]  ? ct_kernel_exit.isra.0+0x71/0x90
[  523.915255]  ? __ct_user_enter+0x5a/0xd0
[  523.915255]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  523.915255] RIP: 0033:0x7f2bcb93cd49
Eric Dumazet Jan. 24, 2024, 11:52 a.m. UTC | #4
On Wed, Jan 24, 2024 at 12:20 PM <kovalev@altlinux.org> wrote:
>
> 24.01.2024 13:57, Eric Dumazet wrote:
> > Oh wait, this is a 5.10 kernel ?
> Yes, but the bug is reproduced on the latest stable kernels.
> > Please generate a stack trace using a recent tree, it is possible the
> > bug has been fixed already.
>
> See [PATCH 0/1] above, there's a stack for the 6.6.13 kernel at the
> bottom of the message.

Ah, ok. Not sure why you sent a cover letter for a single patch...

Setting a boolean, in a module that can disappear will not prevent the
module from disappearing.

This work around might work, or might not work, depending on timing,
preemptions, ....

Thanks.
Pablo Neira Ayuso Jan. 29, 2024, 12:02 p.m. UTC | #5
Hi,

On Wed, Jan 24, 2024 at 02:20:04PM +0300, kovalev@altlinux.org wrote:
> 24.01.2024 13:57, Eric Dumazet wrote:
> > Oh wait, this is a 5.10 kernel ?
>
> Yes, but the bug is reproduced on the latest stable kernels.
>
> > Please generate a stack trace using a recent tree, it is possible the
> > bug has been fixed already.

__netlink_dump_start() is called at the beginning of the dump, which is
grabbing a reference on this module.

do you have a reproducer?

> See [PATCH 0/1] above, there's a stack for the 6.6.13 kernel at the bottom
> of the message.
> 
> [  523.915255] Call Trace:
> [  523.915255]  <TASK>
> [  523.915255]  ? __die+0x1f/0x70
> [  523.915255]  ? page_fault_oops+0x14d/0x4a0
> [  523.915255]  ? exc_page_fault+0x7b/0x180
> [  523.915255]  ? asm_exc_page_fault+0x22/0x30
> [  523.915255]  ? gtp_genl_dump_pdp+0x82/0x190 [gtp]
> [  523.915255]  ? gtp_genl_dump_pdp+0x82/0x190 [gtp]
> [  523.915255]  genl_dumpit+0x2f/0x90
> [  523.915255]  netlink_dump+0x126/0x320
> [  523.915255]  __netlink_dump_start+0x1da/0x2a0
> [  523.915255]  genl_family_rcv_msg_dumpit+0x93/0x100
> [  523.915255]  ? __pfx_genl_start+0x10/0x10
> [  523.915255]  ? __pfx_genl_dumpit+0x10/0x10
> [  523.915255]  ? __pfx_genl_done+0x10/0x10
> [  523.915255]  genl_rcv_msg+0x112/0x2a0
> [  523.915255]  ? __pfx_gtp_genl_dump_pdp+0x10/0x10 [gtp]
> [  523.915255]  ? __pfx_genl_rcv_msg+0x10/0x10
> [  523.915255]  netlink_rcv_skb+0x54/0x110
> [  523.915255]  genl_rcv+0x24/0x40
> [  523.915255]  netlink_unicast+0x19f/0x290
> [  523.915255]  netlink_sendmsg+0x250/0x4e0
> [  523.915255]  ____sys_sendmsg+0x376/0x3b0
> [  523.915255]  ? copy_msghdr_from_user+0x6d/0xb0
> [  523.915255]  ___sys_sendmsg+0x86/0xe0
> [  523.915255]  ? do_fault+0x296/0x470
> [  523.915255]  ? __handle_mm_fault+0x771/0xda0
> [  523.915255]  __sys_sendmsg+0x57/0xb0
> [  523.915255]  do_syscall_64+0x59/0x90
> [  523.915255]  ? ct_kernel_exit.isra.0+0x71/0x90
> [  523.915255]  ? __ct_user_enter+0x5a/0xd0
> [  523.915255]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [  523.915255] RIP: 0033:0x7f2bcb93cd49
> 
> -- 
> Regards,
> Vasiliy Kovalev
>
Vasiliy Kovalev Jan. 29, 2024, 4:53 p.m. UTC | #6
29.01.2024 15:02, Pablo Neira Ayuso wrote:
> __netlink_dump_start() is called at the beginning of the dump, which is
> grabbing a reference on this module.
>
> do you have a reproducer?
Hi, yes, in the cover letter [PATCH 0/1] 
https://lore.kernel.org/all/20240124101404.161655-1-kovalev@altlinux.org/
Vasiliy Kovalev Feb. 9, 2024, 6:16 p.m. UTC | #7
Hi,

24.01.2024 14:52, Eric Dumazet wrote:
> On Wed, Jan 24, 2024 at 12:20 PM <kovalev@altlinux.org> wrote:
>> 24.01.2024 13:57, Eric Dumazet wrote:
>>> Oh wait, this is a 5.10 kernel ?
>> Yes, but the bug is reproduced on the latest stable kernels.
>>> Please generate a stack trace using a recent tree, it is possible the
>>> bug has been fixed already.
>> See [PATCH 0/1] above, there's a stack for the 6.6.13 kernel at the
>> bottom of the message.
> Ah, ok. Not sure why you sent a cover letter for a single patch...
>
> Setting a boolean, in a module that can disappear will not prevent the
> module from disappearing.
>
> This work around might work, or might not work, depending on timing,
> preemptions, ....
>
> Thanks.

I tested running the reproducer [1] on the 6.8-rc3 kernel, the crash 
occurs in less than 10 seconds and the qemu VM restarts:

dmesg -w:

[  106.941736] gtp: GTP module unloaded
[  106.962548] gtp: GTP module loaded (pdp ctx size 104 bytes)
[  107.014691] gtp: GTP module unloaded
[  107.041554] gtp: GTP module loaded (pdp ctx size 104 bytes)
[  107.082283] gtp: GTP module unloaded
[  107.123268] general protection fault, probably for non-canonical 
address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN NOPTI
[  107.124050] KASAN: null-ptr-deref in range 
[0x0000000000000010-0x0000000000000017]
[  107.124339] CPU: 1 PID: 5826 Comm: gtp Not tainted 
6.8.0-rc3-std-def-alt1 #1
[  107.124604] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.0-alt1 04/01/2014
[  107.124916] RIP: 0010:gtp_genl_dump_pdp+0x1be/0x800 [gtp]
[  107.125141] Code: c6 89 c6 e8 64 e9 86 df 58 45 85 f6 0f 85 4e 04 00 
00 e8 c5 ee 86 df 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 
03 <80> 3c 02 00 0f 85 de 05 00 00 48 8b 44 24 18 4c 8b 30 4c 39 f0 74
[  107.125960] RSP: 0018:ffff888014107220 EFLAGS: 00010202
[  107.126164] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 
0000000000000000
[  107.126434] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 
0000000000000000
[  107.126707] RBP: 0000000000000000 R08: 0000000000000000 R09: 
0000000000000000
[  107.126976] R10: 0000000000000000 R11: 0000000000000000 R12: 
0000000000000000
[  107.127245] R13: ffff88800fcda588 R14: 0000000000000001 R15: 
0000000000000000
[  107.127515] FS:  00007f1be4eb05c0(0000) GS:ffff88806ce80000(0000) 
knlGS:0000000000000000
[  107.127955] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  107.128177] CR2: 00007f1be4e766cf CR3: 000000000c33e000 CR4: 
0000000000750ef0
[  107.128450] PKRU: 55555554
[  107.128577] Call Trace:
[  107.128699]  <TASK>
[  107.128790]  ? show_regs+0x90/0xa0
[  107.128935]  ? die_addr+0x50/0xd0
[  107.129075]  ? exc_general_protection+0x148/0x220
[  107.129267]  ? asm_exc_general_protection+0x22/0x30
[  107.129469]  ? gtp_genl_dump_pdp+0x1be/0x800 [gtp]
[  107.129677]  ? __alloc_skb+0x1dd/0x350
[  107.129831]  ? __pfx___alloc_skb+0x10/0x10
[  107.129999]  genl_dumpit+0x11d/0x230
[  107.130150]  netlink_dump+0x5b9/0xce0
[  107.130301]  ? lockdep_hardirqs_on_prepare+0x253/0x430
[  107.130503]  ? __pfx_netlink_dump+0x10/0x10
[  107.130686]  ? kasan_save_track+0x10/0x40
[  107.130849]  ? __kasan_kmalloc+0x9b/0xa0
[  107.131009]  ? genl_start+0x675/0x970
[  107.131162]  __netlink_dump_start+0x6fc/0x9f0
[  107.131341]  genl_family_rcv_msg_dumpit+0x1bb/0x2d0
[  107.131538]  ? __pfx_genl_family_rcv_msg_dumpit+0x10/0x10
[  107.131754]  ? genl_op_from_small+0x2a/0x440
[  107.131972]  ? cap_capable+0x1d0/0x240
[  107.132127]  ? __pfx_genl_start+0x10/0x10
[  107.132292]  ? __pfx_genl_dumpit+0x10/0x10
[  107.132461]  ? __pfx_genl_done+0x10/0x10
[  107.132645]  ? security_capable+0x9d/0xe0

With the proposed patch applied, such a crash is not observed during 
long-term testing.

[1] 
https://lore.kernel.org/lkml/20240124101404.161655-1-kovalev@altlinux.org/T/#mf9b411baec52858b1c9118c671f26a6dc424e7b4
Eric Dumazet Feb. 9, 2024, 7:21 p.m. UTC | #8
On Fri, Feb 9, 2024 at 7:16 PM <kovalev@altlinux.org> wrote:
>
> Hi,
>
> 24.01.2024 14:52, Eric Dumazet wrote:
> > On Wed, Jan 24, 2024 at 12:20 PM <kovalev@altlinux.org> wrote:
> >> 24.01.2024 13:57, Eric Dumazet wrote:
> >>> Oh wait, this is a 5.10 kernel ?
> >> Yes, but the bug is reproduced on the latest stable kernels.
> >>> Please generate a stack trace using a recent tree, it is possible the
> >>> bug has been fixed already.
> >> See [PATCH 0/1] above, there's a stack for the 6.6.13 kernel at the
> >> bottom of the message.
> > Ah, ok. Not sure why you sent a cover letter for a single patch...
> >
> > Setting a boolean, in a module that can disappear will not prevent the
> > module from disappearing.
> >
> > This work around might work, or might not work, depending on timing,
> > preemptions, ....
> >
> > Thanks.
>
> I tested running the reproducer [1] on the 6.8-rc3 kernel, the crash
> occurs in less than 10 seconds and the qemu VM restarts:
>
> dmesg -w:
>
> [  106.941736] gtp: GTP module unloaded
> [  106.962548] gtp: GTP module loaded (pdp ctx size 104 bytes)
> [  107.014691] gtp: GTP module unloaded
> [  107.041554] gtp: GTP module loaded (pdp ctx size 104 bytes)
> [  107.082283] gtp: GTP module unloaded
> [  107.123268] general protection fault, probably for non-canonical
> address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN NOPTI
> [  107.124050] KASAN: null-ptr-deref in range
> [0x0000000000000010-0x0000000000000017]
> [  107.124339] CPU: 1 PID: 5826 Comm: gtp Not tainted
> 6.8.0-rc3-std-def-alt1 #1
> [  107.124604] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.16.0-alt1 04/01/2014
> [  107.124916] RIP: 0010:gtp_genl_dump_pdp+0x1be/0x800 [gtp]
> [  107.125141] Code: c6 89 c6 e8 64 e9 86 df 58 45 85 f6 0f 85 4e 04 00
> 00 e8 c5 ee 86 df 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df 48 c1 ea
> 03 <80> 3c 02 00 0f 85 de 05 00 00 48 8b 44 24 18 4c 8b 30 4c 39 f0 74
> [  107.125960] RSP: 0018:ffff888014107220 EFLAGS: 00010202
> [  107.126164] RAX: dffffc0000000000 RBX: 0000000000000000 RCX:
> 0000000000000000
> [  107.126434] RDX: 0000000000000002 RSI: 0000000000000000 RDI:
> 0000000000000000
> [  107.126707] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> [  107.126976] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000000
> [  107.127245] R13: ffff88800fcda588 R14: 0000000000000001 R15:
> 0000000000000000
> [  107.127515] FS:  00007f1be4eb05c0(0000) GS:ffff88806ce80000(0000)
> knlGS:0000000000000000
> [  107.127955] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  107.128177] CR2: 00007f1be4e766cf CR3: 000000000c33e000 CR4:
> 0000000000750ef0
> [  107.128450] PKRU: 55555554
> [  107.128577] Call Trace:
> [  107.128699]  <TASK>
> [  107.128790]  ? show_regs+0x90/0xa0
> [  107.128935]  ? die_addr+0x50/0xd0
> [  107.129075]  ? exc_general_protection+0x148/0x220
> [  107.129267]  ? asm_exc_general_protection+0x22/0x30
> [  107.129469]  ? gtp_genl_dump_pdp+0x1be/0x800 [gtp]
> [  107.129677]  ? __alloc_skb+0x1dd/0x350
> [  107.129831]  ? __pfx___alloc_skb+0x10/0x10
> [  107.129999]  genl_dumpit+0x11d/0x230
> [  107.130150]  netlink_dump+0x5b9/0xce0
> [  107.130301]  ? lockdep_hardirqs_on_prepare+0x253/0x430
> [  107.130503]  ? __pfx_netlink_dump+0x10/0x10
> [  107.130686]  ? kasan_save_track+0x10/0x40
> [  107.130849]  ? __kasan_kmalloc+0x9b/0xa0
> [  107.131009]  ? genl_start+0x675/0x970
> [  107.131162]  __netlink_dump_start+0x6fc/0x9f0
> [  107.131341]  genl_family_rcv_msg_dumpit+0x1bb/0x2d0
> [  107.131538]  ? __pfx_genl_family_rcv_msg_dumpit+0x10/0x10
> [  107.131754]  ? genl_op_from_small+0x2a/0x440
> [  107.131972]  ? cap_capable+0x1d0/0x240
> [  107.132127]  ? __pfx_genl_start+0x10/0x10
> [  107.132292]  ? __pfx_genl_dumpit+0x10/0x10
> [  107.132461]  ? __pfx_genl_done+0x10/0x10
> [  107.132645]  ? security_capable+0x9d/0xe0
>
> With the proposed patch applied, such a crash is not observed during
> long-term testing.

Maybe, but the patch is not good, I think I and Pablo gave feedback on this ?

Please trace __netlink_dump_start() content of control->module

gtp_genl_family.module should be set, and we should get it.

Otherwise, if the bug is in the core, we would need a dozen of 'work
arounds because it is better than nothing'

Thank you.
Vasiliy Kovalev Feb. 14, 2024, 4:50 p.m. UTC | #9
09.02.2024 22:21, Eric Dumazet wrote:

> Maybe, but the patch is not good, I think I and Pablo gave feedback on this ?
>
> Please trace __netlink_dump_start() content of control->module
>
> gtp_genl_family.module should be set, and we should get it.
>
> Otherwise, if the bug is in the core, we would need a dozen of 'work
> arounds because it is better than nothing'
>
> Thank you.

Thanks.

I tracked the moment when the __netlink_dump_start() function was 
called, it turned out that in the gtp_init() initialization function 
before registering pernet subsystem (gtp_net_ops), therefore, outdated 
data is used, which leads to a crash.

The documentation says that ops structure must be assigned before 
registering a generic netlink family [1].

I have fixed and sent a new patch [2].

[1] 
https://elixir.bootlin.com/linux/v6.8-rc4/source/net/netlink/genetlink.c#L773

[2] 
https://lore.kernel.org/netdev/20240214162733.34214-1-kovalev@altlinux.org/T/#u
Eric Dumazet Feb. 15, 2024, 8:32 p.m. UTC | #10
On Wed, Feb 14, 2024 at 5:50 PM <kovalev@altlinux.org> wrote:
>
> 09.02.2024 22:21, Eric Dumazet wrote:
>
> > Maybe, but the patch is not good, I think I and Pablo gave feedback on this ?
> >
> > Please trace __netlink_dump_start() content of control->module
> >
> > gtp_genl_family.module should be set, and we should get it.
> >
> > Otherwise, if the bug is in the core, we would need a dozen of 'work
> > arounds because it is better than nothing'
> >
> > Thank you.
>
> Thanks.
>
> I tracked the moment when the __netlink_dump_start() function was
> called, it turned out that in the gtp_init() initialization function
> before registering pernet subsystem (gtp_net_ops), therefore, outdated
> data is used, which leads to a crash.
>
> The documentation says that ops structure must be assigned before
> registering a generic netlink family [1].
>
> I have fixed and sent a new patch [2].
>
> [1]
> https://elixir.bootlin.com/linux/v6.8-rc4/source/net/netlink/genetlink.c#L773
>
> [2]
> https://lore.kernel.org/netdev/20240214162733.34214-1-kovalev@altlinux.org/T/#u
>

Excellent detective work, thanks a lot !
diff mbox series

Patch

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 477b4d4f860bd3..3fc4639711cd83 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -1675,6 +1675,8 @@  static int gtp_genl_get_pdp(struct sk_buff *skb, struct genl_info *info)
 	return err;
 }
 
+static bool dump_pdp_en;
+
 static int gtp_genl_dump_pdp(struct sk_buff *skb,
 				struct netlink_callback *cb)
 {
@@ -1684,12 +1686,19 @@  static int gtp_genl_dump_pdp(struct sk_buff *skb,
 	struct pdp_ctx *pctx;
 	struct gtp_net *gn;
 
-	gn = net_generic(net, gtp_net_id);
-
-	if (cb->args[4])
+	/* Do not allow further operations if the module is
+	 * unloaded before or after the process is blocked.
+	 */
+	if (!dump_pdp_en)
 		return 0;
 
 	rcu_read_lock();
+	if (!dump_pdp_en || cb->args[4]) {
+		rcu_read_unlock();
+		return 0;
+	}
+	gn = net_generic(net, gtp_net_id);
+
 	list_for_each_entry_rcu(gtp, &gn->gtp_dev_list, list) {
 		if (last_gtp && last_gtp != gtp)
 			continue;
@@ -1914,6 +1923,8 @@  static int __init gtp_init(void)
 	if (err < 0)
 		goto unreg_genl_family;
 
+	dump_pdp_en = true;
+
 	pr_info("GTP module loaded (pdp ctx size %zd bytes)\n",
 		sizeof(struct pdp_ctx));
 	return 0;
@@ -1930,6 +1941,7 @@  late_initcall(gtp_init);
 
 static void __exit gtp_fini(void)
 {
+	dump_pdp_en = false;
 	genl_unregister_family(&gtp_genl_family);
 	rtnl_link_unregister(&gtp_link_ops);
 	unregister_pernet_subsys(&gtp_net_ops);