mbox series

[0/3] wifi: Un-embed ath10k and ath11k dummy netdev

Message ID 20240405122123.4156104-1-leitao@debian.org (mailing list archive)
Headers show
Series wifi: Un-embed ath10k and ath11k dummy netdev | expand

Message

Breno Leitao April 5, 2024, 12:21 p.m. UTC
struct net_device shouldn't be embedded into any structure, instead,
the owner should use the private space to embed their state into
net_device.

This patch set fixes the problem above for ath10k and ath11k. This also
fixes the conversion of qtnfmac driver to the new helper.

This patch set depends on a series that is still under review:
https://lore.kernel.org/all/20240404114854.2498663-1-leitao@debian.org/#t

If it helps, I've pushed the tree to
https://github.com/leitao/linux/tree/wireless-dummy

PS: Due to lack of hardware, unfortunately all these patches are
compiled tested only.

Breno Leitao (3):
  wifi: qtnfmac: Use netdev dummy allocator helper
  wifi: ath10k: allocate dummy net_device dynamically
  wifi: ath11k: allocate dummy net_device dynamically

 drivers/net/wireless/ath/ath10k/core.c        |  9 ++++++--
 drivers/net/wireless/ath/ath10k/core.h        |  2 +-
 drivers/net/wireless/ath/ath10k/pci.c         |  2 +-
 drivers/net/wireless/ath/ath10k/sdio.c        |  2 +-
 drivers/net/wireless/ath/ath10k/snoc.c        |  4 ++--
 drivers/net/wireless/ath/ath10k/usb.c         |  2 +-
 drivers/net/wireless/ath/ath11k/ahb.c         |  9 ++++++--
 drivers/net/wireless/ath/ath11k/core.h        |  2 +-
 drivers/net/wireless/ath/ath11k/pcic.c        | 21 +++++++++++++++----
 .../wireless/quantenna/qtnfmac/pcie/pcie.c    |  3 +--
 10 files changed, 39 insertions(+), 17 deletions(-)

Comments

Kalle Valo April 5, 2024, 3:15 p.m. UTC | #1
Breno Leitao <leitao@debian.org> writes:

> struct net_device shouldn't be embedded into any structure, instead,
> the owner should use the private space to embed their state into
> net_device.
>
> This patch set fixes the problem above for ath10k and ath11k. This also
> fixes the conversion of qtnfmac driver to the new helper.
>
> This patch set depends on a series that is still under review:
> https://lore.kernel.org/all/20240404114854.2498663-1-leitao@debian.org/#t
>
> If it helps, I've pushed the tree to
> https://github.com/leitao/linux/tree/wireless-dummy
>
> PS: Due to lack of hardware, unfortunately all these patches are
> compiled tested only.
>
> Breno Leitao (3):
>   wifi: qtnfmac: Use netdev dummy allocator helper
>   wifi: ath10k: allocate dummy net_device dynamically
>   wifi: ath11k: allocate dummy net_device dynamically

Thanks for setting up the branch, that makes the testing very easy. I
now tested the branch using the commit below with ath11k WCN6855 hw2.0
on an x86 box:

5be9a125d8e7 wifi: ath11k: allocate dummy net_device dynamically

But unfortunately it crashes, the stack trace below. I can easily test
your branches, just let me know what to test. A direct 'git pull'
command is the best.

[  238.886002] rmmod ath11k_pci
[  239.530328] ------------[ cut here ]------------
[  239.530433] kernel BUG at net/core/dev.c:11066!
[  239.530621] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[  239.530709] CPU: 5 PID: 1668 Comm: rmmod Not tainted 6.9.0-rc2+ #1367
[  239.530762] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021
[  239.530848] RIP: 0010:free_netdev (net/core/dev.c:11066 (discriminator 1)) 
[ 239.530893] Code: 08 84 d2 0f 85 3c 01 00 00 0f b7 83 3a 03 00 00 48 29 c3 48 89 df e8 27 10 21 fe 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 44 0f b6 25 fd 91 53 02 41 80 fc 01 0f 87 1f 14 94 00 41 83
All code
========
   0:	08 84 d2 0f 85 3c 01 	or     %al,0x13c850f(%rdx,%rdx,8)
   7:	00 00                	add    %al,(%rax)
   9:	0f b7 83 3a 03 00 00 	movzwl 0x33a(%rbx),%eax
  10:	48 29 c3             	sub    %rax,%rbx
  13:	48 89 df             	mov    %rbx,%rdi
  16:	e8 27 10 21 fe       	call   0xfffffffffe211042
  1b:	48 83 c4 10          	add    $0x10,%rsp
  1f:	5b                   	pop    %rbx
  20:	41 5c                	pop    %r12
  22:	41 5d                	pop    %r13
  24:	41 5e                	pop    %r14
  26:	41 5f                	pop    %r15
  28:	5d                   	pop    %rbp
  29:	c3                   	ret
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	44 0f b6 25 fd 91 53 	movzbl 0x25391fd(%rip),%r12d        # 0x2539231
  33:	02 
  34:	41 80 fc 01          	cmp    $0x1,%r12b
  38:	0f 87 1f 14 94 00    	ja     0x94145d
  3e:	41                   	rex.B
  3f:	83                   	.byte 0x83

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	44 0f b6 25 fd 91 53 	movzbl 0x25391fd(%rip),%r12d        # 0x2539207
   9:	02 
   a:	41 80 fc 01          	cmp    $0x1,%r12b
   e:	0f 87 1f 14 94 00    	ja     0x941433
  14:	41                   	rex.B
  15:	83                   	.byte 0x83
[  239.530977] RSP: 0018:ffffc90002dffb70 EFLAGS: 00010202
[  239.531023] RAX: 0000000000000005 RBX: ffff88810c819000 RCX: 0000000000000000
[  239.531072] RDX: 1ffff110219032d1 RSI: ffffffff87e79780 RDI: 0000000000000000
[  239.531120] RBP: ffffc90002dffba8 R08: 0000000000000001 R09: 0000000000000001
[  239.531169] R10: ffffffff897e3f17 R11: 00000000000000bb R12: ffff88810c8194e0
[  239.531224] R13: ffff88810c819178 R14: dffffc0000000000 R15: ffff88810c819688
[  239.531274] FS:  00007f462c4c4740(0000) GS:ffff888232c00000(0000) knlGS:0000000000000000
[  239.531326] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  239.531371] CR2: 000055fc93483308 CR3: 0000000113a99004 CR4: 00000000003706f0
[  239.531420] Call Trace:
[  239.531484]  <TASK>
[  239.531550] ? show_regs (arch/x86/kernel/dumpstack.c:479) 
[  239.531592] ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447) 
[  239.531630] ? do_trap (arch/x86/kernel/traps.c:114 arch/x86/kernel/traps.c:155) 
[  239.531669] ? do_error_trap (./arch/x86/include/asm/traps.h:58 arch/x86/kernel/traps.c:176) 
[  239.531708] ? free_netdev (net/core/dev.c:11066 (discriminator 1)) 
[  239.531749] ? handle_invalid_op (arch/x86/kernel/traps.c:214) 
[  239.531789] ? free_netdev (net/core/dev.c:11066 (discriminator 1)) 
[  239.531829] ? exc_invalid_op (arch/x86/kernel/traps.c:267) 
[  239.531868] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) 
[  239.531910] ? free_netdev (net/core/dev.c:11066 (discriminator 1)) 
[  239.531952] ath11k_pcic_free_irq (drivers/net/wireless/ath/ath11k/pcic.c:312 (discriminator 2) drivers/net/wireless/ath/ath11k/pcic.c:334 (discriminator 2)) ath11k
[  239.532029] ath11k_pci_remove (drivers/net/wireless/ath/ath11k/pci.c:478 drivers/net/wireless/ath/ath11k/pci.c:987) ath11k_pci
[  239.532075] pci_device_remove (./include/linux/pm_runtime.h:129 drivers/pci/pci-driver.c:475) 
[  239.532116] device_remove (drivers/base/dd.c:569) 
[  239.532155] device_release_driver_internal (drivers/base/dd.c:1272 drivers/base/dd.c:1293) 
[  239.532198] ? __kasan_check_read (mm/kasan/shadow.c:32) 
[  239.532241] driver_detach (drivers/base/dd.c:1357) 
[  239.532281] bus_remove_driver (drivers/base/bus.c:736) 
[  239.532322] driver_unregister (drivers/base/driver.c:275) 
[  239.532362] pci_unregister_driver (./include/linux/spinlock.h:351 drivers/pci/pci-driver.c:85 drivers/pci/pci-driver.c:1467) 
[  239.532402] ? find_module_all (kernel/module/main.c:357 (discriminator 1)) 
[  239.532443] ath11k_pci_exit (drivers/net/wireless/ath/ath11k/pci.c:1069) ath11k_pci
[  239.532543] __do_sys_delete_module (kernel/module/main.c:756) 
[  239.532584] ? __kasan_slab_free (mm/kasan/common.c:274) 
[  239.532625] ? module_flags (kernel/module/main.c:700) 
[  239.532666] ? kmem_cache_free (mm/slub.c:4280 (discriminator 3) mm/slub.c:4344 (discriminator 3)) 
[  239.533366] ? __fput (fs/file_table.c:436) 
[  239.534132] ? debug_smp_processor_id (lib/smp_processor_id.c:61) 
[  239.534855] __x64_sys_delete_module (kernel/module/main.c:698) 
[  239.535620] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) 
[  239.536334] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) 
[  239.537628] RIP: 0033:0x7f462c611c8b
[ 239.538510] Code: 73 01 c3 48 8b 0d 05 c2 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 c1 0c 00 f7 d8 64 89 01 48
All code
========
   0:	73 01                	jae    0x3
   2:	c3                   	ret
   3:	48 8b 0d 05 c2 0c 00 	mov    0xcc205(%rip),%rcx        # 0xcc20f
   a:	f7 d8                	neg    %eax
   c:	64 89 01             	mov    %eax,%fs:(%rcx)
   f:	48 83 c8 ff          	or     $0xffffffffffffffff,%rax
  13:	c3                   	ret
  14:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
  1b:	00 00 00 
  1e:	90                   	nop
  1f:	f3 0f 1e fa          	endbr64
  23:	b8 b0 00 00 00       	mov    $0xb0,%eax
  28:	0f 05                	syscall
  2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
  30:	73 01                	jae    0x33
  32:	c3                   	ret
  33:	48 8b 0d d5 c1 0c 00 	mov    0xcc1d5(%rip),%rcx        # 0xcc20f
  3a:	f7 d8                	neg    %eax
  3c:	64 89 01             	mov    %eax,%fs:(%rcx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
   6:	73 01                	jae    0x9
   8:	c3                   	ret
   9:	48 8b 0d d5 c1 0c 00 	mov    0xcc1d5(%rip),%rcx        # 0xcc1e5
  10:	f7 d8                	neg    %eax
  12:	64 89 01             	mov    %eax,%fs:(%rcx)
  15:	48                   	rex.W
[  239.540022] RSP: 002b:00007fff08acd828 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  239.540791] RAX: ffffffffffffffda RBX: 0000555f9f0647d0 RCX: 00007f462c611c8b
[  239.541556] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000555f9f064838
[  239.542319] RBP: 00007fff08acd888 R08: 0000000000000000 R09: 0000000000000000
[  239.543087] R10: 00007f462c68dac0 R11: 0000000000000206 R12: 00007fff08acda60
[  239.543859] R13: 00007fff08acdeb7 R14: 0000555f9f0632a0 R15: 0000555f9f0647d0
[  239.544615]  </TASK>
[  239.545337] Modules linked in: ath11k_pci(-) ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr nvme nvme_core
[  239.546153] ---[ end trace 0000000000000000 ]---
[  239.568717] RIP: 0010:free_netdev (net/core/dev.c:11066 (discriminator 1)) 
[ 239.569476] Code: 08 84 d2 0f 85 3c 01 00 00 0f b7 83 3a 03 00 00 48 29 c3 48 89 df e8 27 10 21 fe 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 44 0f b6 25 fd 91 53 02 41 80 fc 01 0f 87 1f 14 94 00 41 83
All code
========
   0:	08 84 d2 0f 85 3c 01 	or     %al,0x13c850f(%rdx,%rdx,8)
   7:	00 00                	add    %al,(%rax)
   9:	0f b7 83 3a 03 00 00 	movzwl 0x33a(%rbx),%eax
  10:	48 29 c3             	sub    %rax,%rbx
  13:	48 89 df             	mov    %rbx,%rdi
  16:	e8 27 10 21 fe       	call   0xfffffffffe211042
  1b:	48 83 c4 10          	add    $0x10,%rsp
  1f:	5b                   	pop    %rbx
  20:	41 5c                	pop    %r12
  22:	41 5d                	pop    %r13
  24:	41 5e                	pop    %r14
  26:	41 5f                	pop    %r15
  28:	5d                   	pop    %rbp
  29:	c3                   	ret
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	44 0f b6 25 fd 91 53 	movzbl 0x25391fd(%rip),%r12d        # 0x2539231
  33:	02 
  34:	41 80 fc 01          	cmp    $0x1,%r12b
  38:	0f 87 1f 14 94 00    	ja     0x94145d
  3e:	41                   	rex.B
  3f:	83                   	.byte 0x83

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	44 0f b6 25 fd 91 53 	movzbl 0x25391fd(%rip),%r12d        # 0x2539207
   9:	02 
   a:	41 80 fc 01          	cmp    $0x1,%r12b
   e:	0f 87 1f 14 94 00    	ja     0x941433
  14:	41                   	rex.B
  15:	83                   	.byte 0x83
[  239.571082] RSP: 0018:ffffc90002dffb70 EFLAGS: 00010202
[  239.571929] RAX: 0000000000000005 RBX: ffff88810c819000 RCX: 0000000000000000
[  239.572970] RDX: 1ffff110219032d1 RSI: ffffffff87e79780 RDI: 0000000000000000
[  239.573792] RBP: ffffc90002dffba8 R08: 0000000000000001 R09: 0000000000000001
[  239.574600] R10: ffffffff897e3f17 R11: 00000000000000bb R12: ffff88810c8194e0
[  239.575368] R13: ffff88810c819178 R14: dffffc0000000000 R15: ffff88810c819688
[  239.576136] FS:  00007f462c4c4740(0000) GS:ffff888231c00000(0000) knlGS:0000000000000000
[  239.576909] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  239.577723] CR2: 0000557ff5c54118 CR3: 0000000113a99003 CR4: 00000000003706f0
[  239.578508] Kernel panic - not syncing: Fatal exception
[  239.579273] Kernel Offset: 0x3e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  239.608975] Rebooting in 10 seconds..
Breno Leitao April 8, 2024, 1:33 p.m. UTC | #2
Hello Kalle,

On Fri, Apr 05, 2024 at 06:15:05PM +0300, Kalle Valo wrote:
> Breno Leitao <leitao@debian.org> writes:
> 
> > struct net_device shouldn't be embedded into any structure, instead,
> > the owner should use the private space to embed their state into
> > net_device.
> >
> > This patch set fixes the problem above for ath10k and ath11k. This also
> > fixes the conversion of qtnfmac driver to the new helper.
> >
> > This patch set depends on a series that is still under review:
> > https://lore.kernel.org/all/20240404114854.2498663-1-leitao@debian.org/#t
> >
> > If it helps, I've pushed the tree to
> > https://github.com/leitao/linux/tree/wireless-dummy
> >
> > PS: Due to lack of hardware, unfortunately all these patches are
> > compiled tested only.
> >
> > Breno Leitao (3):
> >   wifi: qtnfmac: Use netdev dummy allocator helper
> >   wifi: ath10k: allocate dummy net_device dynamically
> >   wifi: ath11k: allocate dummy net_device dynamically
> 
> Thanks for setting up the branch, that makes the testing very easy. I
> now tested the branch using the commit below with ath11k WCN6855 hw2.0
> on an x86 box:
> 
> 5be9a125d8e7 wifi: ath11k: allocate dummy net_device dynamically
> 
> But unfortunately it crashes, the stack trace below. I can easily test
> your branches, just let me know what to test. A direct 'git pull'
> command is the best.

Thanks for the test.

Reading the issue, I am afraid that freeing netdev explicitly
(free_netdev()) might not be the best approach at the exit path.

I would like to try to leverage the ->needs_free_netdev netdev
mechanism to do the clean-up, if that makes sense. I've updated the
ath11k patch, and I am curious if that is what we want.

Would you mind testing a net patch I have, please?

  https://github.com/leitao/linux/tree/wireless-dummy_v2

PS: I didn't updated the other drivers (ath10k, qtnfmac, etc).

Thank you!
Kalle Valo April 8, 2024, 4:43 p.m. UTC | #3
Breno Leitao <leitao@debian.org> writes:

> Hello Kalle,
>
> On Fri, Apr 05, 2024 at 06:15:05PM +0300, Kalle Valo wrote:
>> Breno Leitao <leitao@debian.org> writes:
>> 
>> > struct net_device shouldn't be embedded into any structure, instead,
>> > the owner should use the private space to embed their state into
>> > net_device.
>> >
>> > This patch set fixes the problem above for ath10k and ath11k. This also
>> > fixes the conversion of qtnfmac driver to the new helper.
>> >
>> > This patch set depends on a series that is still under review:
>> > https://lore.kernel.org/all/20240404114854.2498663-1-leitao@debian.org/#t
>> >
>> > If it helps, I've pushed the tree to
>> > https://github.com/leitao/linux/tree/wireless-dummy
>> >
>> > PS: Due to lack of hardware, unfortunately all these patches are
>> > compiled tested only.
>> >
>> > Breno Leitao (3):
>> >   wifi: qtnfmac: Use netdev dummy allocator helper
>> >   wifi: ath10k: allocate dummy net_device dynamically
>> >   wifi: ath11k: allocate dummy net_device dynamically
>> 
>> Thanks for setting up the branch, that makes the testing very easy. I
>> now tested the branch using the commit below with ath11k WCN6855 hw2.0
>> on an x86 box:
>> 
>> 5be9a125d8e7 wifi: ath11k: allocate dummy net_device dynamically
>> 
>> But unfortunately it crashes, the stack trace below. I can easily test
>> your branches, just let me know what to test. A direct 'git pull'
>> command is the best.
>
> Thanks for the test.
>
> Reading the issue, I am afraid that freeing netdev explicitly
> (free_netdev()) might not be the best approach at the exit path.
>
> I would like to try to leverage the ->needs_free_netdev netdev
> mechanism to do the clean-up, if that makes sense. I've updated the
> ath11k patch, and I am curious if that is what we want.
>
> Would you mind testing a net patch I have, please?
>
>   https://github.com/leitao/linux/tree/wireless-dummy_v2

I tested this again with my WCN6855 hw2.0 x86 test box on this commit:

a87674ac820e wifi: ath11k: allocate dummy net_device dynamically

It passes my tests and doesn't crash, but I see this kmemleak warning a
lot:

unreferenced object 0xffff888127109400 (size 128):
  comm "insmod", pid 2813, jiffies 4294926528
  hex dump (first 32 bytes):
    d0 93 d5 0a 81 88 ff ff d0 93 d5 0a 81 88 ff ff  ................
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 870e4f12):
    [<ffffffff99bcd375>] kmemleak_alloc+0x45/0x80
    [<ffffffff975707a8>] kmalloc_trace+0x278/0x2c0
    [<ffffffff992904c5>] __hw_addr_create+0x55/0x260
    [<ffffffff992909cb>] __hw_addr_add_ex+0x2fb/0x6d0
    [<ffffffff99294004>] dev_addr_init+0x144/0x230
    [<ffffffff992629ee>] alloc_netdev_mqs+0x12e/0xfe0
    [<ffffffff992638c5>] alloc_netdev_dummy+0x25/0x30
    [<ffffffffc0b6b0cd>] ath11k_pcic_ext_irq_config+0x1ad/0xc10 [ath11k]
    [<ffffffffc0b6c431>] ath11k_pcic_config_irq+0x2f1/0x4b0 [ath11k]
    [<ffffffffc0cb8314>] ath11k_pci_probe+0x874/0x1210 [ath11k_pci]
    [<ffffffff97febf06>] local_pci_probe+0xd6/0x180
    [<ffffffff97feefaa>] pci_call_probe+0x15a/0x400
    [<ffffffff97ff03d6>] pci_device_probe+0xa6/0x100
    [<ffffffff98abe315>] really_probe+0x1d5/0x920
    [<ffffffff98abed48>] __driver_probe_device+0x2e8/0x3f0
    [<ffffffff98abee9a>] driver_probe_device+0x4a/0x140
Breno Leitao April 8, 2024, 7:33 p.m. UTC | #4
On Mon, Apr 08, 2024 at 07:43:42PM +0300, Kalle Valo wrote:
> Breno Leitao <leitao@debian.org> writes:
> > On Fri, Apr 05, 2024 at 06:15:05PM +0300, Kalle Valo wrote:
> >> Breno Leitao <leitao@debian.org> writes:
> >> 
> >> > struct net_device shouldn't be embedded into any structure, instead,
> >> > the owner should use the private space to embed their state into
> >> > net_device.
> >> >
> >> > This patch set fixes the problem above for ath10k and ath11k. This also
> >> > fixes the conversion of qtnfmac driver to the new helper.
> >> >
> >> > This patch set depends on a series that is still under review:
> >> > https://lore.kernel.org/all/20240404114854.2498663-1-leitao@debian.org/#t
> >> >
> >> > If it helps, I've pushed the tree to
> >> > https://github.com/leitao/linux/tree/wireless-dummy
> >> >
> >> > PS: Due to lack of hardware, unfortunately all these patches are
> >> > compiled tested only.
> >> >
> >> > Breno Leitao (3):
> >> >   wifi: qtnfmac: Use netdev dummy allocator helper
> >> >   wifi: ath10k: allocate dummy net_device dynamically
> >> >   wifi: ath11k: allocate dummy net_device dynamically
> >> 
> >> Thanks for setting up the branch, that makes the testing very easy. I
> >> now tested the branch using the commit below with ath11k WCN6855 hw2.0
> >> on an x86 box:
> >> 
> >> 5be9a125d8e7 wifi: ath11k: allocate dummy net_device dynamically
> >> 
> >> But unfortunately it crashes, the stack trace below. I can easily test
> >> your branches, just let me know what to test. A direct 'git pull'
> >> command is the best.
> >
> > Thanks for the test.
> >
> > Reading the issue, I am afraid that freeing netdev explicitly
> > (free_netdev()) might not be the best approach at the exit path.
> >
> > I would like to try to leverage the ->needs_free_netdev netdev
> > mechanism to do the clean-up, if that makes sense. I've updated the
> > ath11k patch, and I am curious if that is what we want.
> >
> > Would you mind testing a net patch I have, please?
> >
> >   https://github.com/leitao/linux/tree/wireless-dummy_v2
> 
> I tested this again with my WCN6855 hw2.0 x86 test box on this commit:
> 
> a87674ac820e wifi: ath11k: allocate dummy net_device dynamically
> 
> It passes my tests and doesn't crash, but I see this kmemleak warning a
> lot:

Thanks Kalle, that was helpful. The device is not being clean-up
automatically.

Chatting with Jakub, he suggested coming back to the original approach,
but, adding a additional patch, at the free_netdev().

Would you mind running another test, please?

	https://github.com/leitao/linux/tree/wireless-dummy_v3

The branch above is basically the original branch (as in this patch
series), with this additional patch:

	Author: Breno Leitao <leitao@debian.org>
	Date:   Mon Apr 8 11:37:32 2024 -0700

	    net: free_netdev: exit earlier if dummy

	    For dummy devices, exit earlier at free_netdev() instead of executing
	    the whole function. This is necessary, because dummy devices are
	    special, and shouldn't have the second part of the function executed.

	    Otherwise reg_state, which is NETREG_DUMMY, will be overwritten and
	    there will be no way to identify that this is a dummy device. Also, this
	    device do not need the final put_device(), since dummy devices are not
	    registered (through register_netdevice()), where the device reference is
	    increased (at netdev_register_kobject()/device_add()).

	    Suggested-by: Jakub Kicinski <kuba@kernel.org>
	    Signed-off-by: Breno Leitao <leitao@debian.org>

	diff --git a/net/core/dev.c b/net/core/dev.c
	index 2b82bd1cd2f8..5d2cb97d0ae6 100644
	--- a/net/core/dev.c
	+++ b/net/core/dev.c
	@@ -11058,7 +11058,8 @@ void free_netdev(struct net_device *dev)
		dev->xdp_bulkq = NULL;

		/*  Compatibility with error handling in drivers */
	-       if (dev->reg_state == NETREG_UNINITIALIZED) {
	+       if (dev->reg_state == NETREG_UNINITIALIZED ||
	+           dev->reg_state == NETREG_DUMMY) {
			netdev_freemem(dev);
			return;
		}
Kalle Valo April 9, 2024, 10:03 a.m. UTC | #5
Breno Leitao <leitao@debian.org> writes:

>> > Reading the issue, I am afraid that freeing netdev explicitly
>> > (free_netdev()) might not be the best approach at the exit path.
>> >
>> > I would like to try to leverage the ->needs_free_netdev netdev
>> > mechanism to do the clean-up, if that makes sense. I've updated the
>> > ath11k patch, and I am curious if that is what we want.
>> >
>> > Would you mind testing a net patch I have, please?
>> >
>> >   https://github.com/leitao/linux/tree/wireless-dummy_v2
>> 
>> I tested this again with my WCN6855 hw2.0 x86 test box on this commit:
>> 
>> a87674ac820e wifi: ath11k: allocate dummy net_device dynamically
>> 
>> It passes my tests and doesn't crash, but I see this kmemleak warning a
>> lot:
>
> Thanks Kalle, that was helpful. The device is not being clean-up
> automatically.
>
> Chatting with Jakub, he suggested coming back to the original approach,
> but, adding a additional patch, at the free_netdev().
>
> Would you mind running another test, please?
>
> 	https://github.com/leitao/linux/tree/wireless-dummy_v3
>
> The branch above is basically the original branch (as in this patch
> series), with this additional patch:
>
> 	Author: Breno Leitao <leitao@debian.org>
> 	Date:   Mon Apr 8 11:37:32 2024 -0700
>
> 	    net: free_netdev: exit earlier if dummy

I tested with the same ath11k hardware and this one passes all my
(simple) ath11k tests, no issues found. I used this commit:

1c10aebaa8ce net: free_netdev: exit earlier if dummy
Breno Leitao April 9, 2024, 10:51 a.m. UTC | #6
On Tue, Apr 09, 2024 at 01:03:21PM +0300, Kalle Valo wrote:
> Breno Leitao <leitao@debian.org> writes:
> 
> >> > Reading the issue, I am afraid that freeing netdev explicitly
> >> > (free_netdev()) might not be the best approach at the exit path.
> >> >
> >> > I would like to try to leverage the ->needs_free_netdev netdev
> >> > mechanism to do the clean-up, if that makes sense. I've updated the
> >> > ath11k patch, and I am curious if that is what we want.
> >> >
> >> > Would you mind testing a net patch I have, please?
> >> >
> >> >   https://github.com/leitao/linux/tree/wireless-dummy_v2
> >> 
> >> I tested this again with my WCN6855 hw2.0 x86 test box on this commit:
> >> 
> >> a87674ac820e wifi: ath11k: allocate dummy net_device dynamically
> >> 
> >> It passes my tests and doesn't crash, but I see this kmemleak warning a
> >> lot:
> >
> > Thanks Kalle, that was helpful. The device is not being clean-up
> > automatically.
> >
> > Chatting with Jakub, he suggested coming back to the original approach,
> > but, adding a additional patch, at the free_netdev().
> >
> > Would you mind running another test, please?
> >
> > 	https://github.com/leitao/linux/tree/wireless-dummy_v3
> >
> > The branch above is basically the original branch (as in this patch
> > series), with this additional patch:
> >
> > 	Author: Breno Leitao <leitao@debian.org>
> > 	Date:   Mon Apr 8 11:37:32 2024 -0700
> >
> > 	    net: free_netdev: exit earlier if dummy
> 
> I tested with the same ath11k hardware and this one passes all my
> (simple) ath11k tests, no issues found. I used this commit:
> 
> 1c10aebaa8ce net: free_netdev: exit earlier if dummy

Thank you so much for the test.

I will respin a v2 of this patchset with the additional patch included.
Kalle Valo April 9, 2024, 11:40 a.m. UTC | #7
Breno Leitao <leitao@debian.org> writes:

> On Tue, Apr 09, 2024 at 01:03:21PM +0300, Kalle Valo wrote:
>
>> Breno Leitao <leitao@debian.org> writes:
>> 
>> >> > Reading the issue, I am afraid that freeing netdev explicitly
>> >> > (free_netdev()) might not be the best approach at the exit path.
>> >> >
>> >> > I would like to try to leverage the ->needs_free_netdev netdev
>> >> > mechanism to do the clean-up, if that makes sense. I've updated the
>> >> > ath11k patch, and I am curious if that is what we want.
>> >> >
>> >> > Would you mind testing a net patch I have, please?
>> >> >
>> >> >   https://github.com/leitao/linux/tree/wireless-dummy_v2
>> >> 
>> >> I tested this again with my WCN6855 hw2.0 x86 test box on this commit:
>> >> 
>> >> a87674ac820e wifi: ath11k: allocate dummy net_device dynamically
>> >> 
>> >> It passes my tests and doesn't crash, but I see this kmemleak warning a
>> >> lot:
>> >
>> > Thanks Kalle, that was helpful. The device is not being clean-up
>> > automatically.
>> >
>> > Chatting with Jakub, he suggested coming back to the original approach,
>> > but, adding a additional patch, at the free_netdev().
>> >
>> > Would you mind running another test, please?
>> >
>> > 	https://github.com/leitao/linux/tree/wireless-dummy_v3
>> >
>> > The branch above is basically the original branch (as in this patch
>> > series), with this additional patch:
>> >
>> > 	Author: Breno Leitao <leitao@debian.org>
>> > 	Date:   Mon Apr 8 11:37:32 2024 -0700
>> >
>> > 	    net: free_netdev: exit earlier if dummy
>> 
>> I tested with the same ath11k hardware and this one passes all my
>> (simple) ath11k tests, no issues found. I used this commit:
>> 
>> 1c10aebaa8ce net: free_netdev: exit earlier if dummy
>
> Thank you so much for the test.
>
> I will respin a v2 of this patchset with the additional patch included.

Sounds good. Feel free to add:

Tested-by: Kalle Valo <kvalo@kernel.org>