[1/1] Net: bcm.c: Remove Subtree Instead of Entry

Message ID: 20240808202658.5933-1-david.hunter.linux@gmail.com
State: Awaiting Upstream
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/series_format            warning  Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection           success  Guessed tree name to be net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 29 this patch: 29
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 7 of 7 maintainers
netdev/build_clang              success  Errors and warnings before: 29 this patch: 29
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 29 this patch: 29
netdev/checkpatch               warning  WARNING: The commit message has 'Call Trace:', perhaps it also needs a 'Fixes:' tag?
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-08-09--15-00 (tests: 705)

Commit Message

David Hunter Aug. 8, 2024, 8:26 p.m. UTC
Fix a warning in bcm.c that is triggered when the can-bcm proc directory
is removed. If the directory still has a child entry, a warning is
generated:

remove_proc_entry: removing non-empty directory 'net/can-bcm'...
WARNING: CPU: 1 PID: 71 at fs/proc/generic.c:717 remove_proc_entry
Call Trace:
remove_proc_entry
canbcm_pernet_exit
ops_exit_list

Instead of removing only the entry, remove the entire subtree. The child
entry is still removed, but without triggering the warning.

This patch was compiled and the code traced with gdb to confirm that the
tree was removed. The code was run to confirm that the warning no longer
appears. In addition, the code was tested with the kselftest net
subsystem. No regressions were detected.

Signed-off-by: David Hunter <david.hunter.linux@gmail.com>
---
 net/can/bcm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
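
The difference between the two removal calls in this patch can be sketched with a small userspace model. This is not kernel code: every model_* name below is hypothetical, and the behaviour is only an approximation of what the commit message and the warning text describe (removing a single entry warns and leaves children behind, removing a subtree takes the children with it).

```c
/* Userspace model only, NOT kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct model_entry {
	char name[32];
	struct model_entry *parent;
	struct model_entry *first_child;
	struct model_entry *next_sibling;
};

/* Allocate an entry and link it under parent (parent may be NULL for a root). */
static struct model_entry *model_create(const char *name, struct model_entry *parent)
{
	struct model_entry *e = calloc(1, sizeof(*e));

	assert(e != NULL);
	strncpy(e->name, name, sizeof(e->name) - 1);
	e->parent = parent;
	if (parent) {
		e->next_sibling = parent->first_child;
		parent->first_child = e;
	}
	return e;
}

/* Look up a direct child of parent by name. */
static struct model_entry *model_find(struct model_entry *parent, const char *name)
{
	struct model_entry *c;

	for (c = parent->first_child; c; c = c->next_sibling)
		if (strcmp(c->name, name) == 0)
			return c;
	return NULL;
}

/* Detach e from its parent's child list; e must have a parent. */
static void model_unlink(struct model_entry *e)
{
	struct model_entry **pp = &e->parent->first_child;

	while (*pp != e)
		pp = &(*pp)->next_sibling;
	*pp = e->next_sibling;
	e->next_sibling = NULL;
	e->parent = NULL;
}

/* Free e and all of its descendants; returns the number of entries freed. */
static int model_free_tree(struct model_entry *e)
{
	int freed = 1;
	struct model_entry *c = e->first_child;

	while (c) {
		struct model_entry *next = c->next_sibling;

		freed += model_free_tree(c);
		c = next;
	}
	free(e);
	return freed;
}

/*
 * Models removing a single entry: the entry goes away, its children do not.
 * Returns the number of children left behind, or -1 if name was not found.
 */
static int model_remove_entry(const char *name, struct model_entry *parent)
{
	struct model_entry *e = model_find(parent, name);
	struct model_entry *c;
	int leaked = 0;

	if (!e)
		return -1;
	model_unlink(e);
	for (c = e->first_child; c; c = c->next_sibling)
		leaked++;
	if (leaked)
		fprintf(stderr, "model: removing non-empty directory '%s'\n", name);
	free(e);	/* children are deliberately not freed: they are leaked */
	return leaked;
}

/*
 * Models removing a whole subtree: the entry and every descendant are freed.
 * Returns the number of entries removed, or -1 if name was not found.
 */
static int model_remove_subtree(const char *name, struct model_entry *parent)
{
	struct model_entry *e = model_find(parent, name);

	if (!e)
		return -1;
	model_unlink(e);
	return model_free_tree(e);
}
```

In this model, calling model_remove_entry() on a directory that still has one child prints the warning and returns 1, while model_remove_subtree() on the same layout returns the total number of entries it freed and prints nothing. The model says nothing about why the child was still present, which is the question raised in the comments.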

Comments

Oliver Hartkopp Aug. 9, 2024, 9:57 a.m. UTC | #1
Hello David,

many thanks for the patch and the description.

Btw. the data structures of the elements inside that bcm proc dir should 
have been removed at that point, so that the can-bcm dir should be empty.

I'm not sure what happens to the open sockets that are (later) removed 
in bcm_release() when we use remove_proc_subtree() as suggested. 
Removing this warning probably does not heal the root cause of the issue.
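
To illustrate the concern, here is a minimal userspace sketch, not kernel code, of the ordering described above. All model_* names are hypothetical, and the "lost handle" step is only an assumed example of how an entry could outlive its socket; it is not a statement about what bcm.c actually does.

```c
/* Userspace model only, NOT kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Models a per-namespace proc directory as a simple entry counter. */
struct model_dir {
	int nr_entries;
};

/* Models a socket that may own one entry in the directory. */
struct model_sock {
	bool owns_entry;
};

/* Socket setup: create an entry and remember that this socket owns it. */
static void model_connect(struct model_dir *dir, struct model_sock *sk)
{
	dir->nr_entries++;
	sk->owns_entry = true;
}

/* Assumed bug for illustration: the socket forgets its entry without removing it. */
static void model_lose_handle(struct model_sock *sk)
{
	sk->owns_entry = false;
}

/* Socket release: remove the entry only if the socket still knows about it. */
static void model_release(struct model_dir *dir, struct model_sock *sk)
{
	if (sk->owns_entry) {
		dir->nr_entries--;
		sk->owns_entry = false;
	}
}

/*
 * Namespace teardown: the directory is expected to be empty by now.
 * Returns the number of entries still present; nonzero models the warning.
 */
static int model_pernet_exit(const struct model_dir *dir)
{
	if (dir->nr_entries)
		fprintf(stderr, "model: directory not empty at exit (%d left)\n",
			dir->nr_entries);
	return dir->nr_entries;
}
```

With connect followed by release, model_pernet_exit() returns 0. If model_lose_handle() runs in between, release has nothing to remove and model_pernet_exit() returns 1. Removing the whole subtree at exit would hide that leftover entry rather than explain why it was there.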

What did you do to trigger the warning? Did you work with network 
namespaces or LXC/Docker and purged an entire namespace?

Best regards,
Oliver

On 08.08.24 22:26, David Hunter wrote:
> Fix a warning with bcm.c that is caused by removing an entry. If the
> entry had a process as a child, a warning is generated:
> 
> remove_proc_entry: removing non-empty directory 'net/can-bcm'...
> WARNING: CPU: 1 PID: 71 at fs/proc/generic.c:717 remove_proc_entry
> Call Trace:
> remove_proc_entry
> canbcm_pernet_exit
> ops_exit_list
> 
> Instead of simply removing the entry, remove the entire subdirectory.
> The child process will still be removed, but without a warning occurring.
> 
> This patch was compiled and the code traced with gdb to see that the
> tree  was removed. The code was run to see that the warning was removed.
> In addition, the code was tested with the kselftest
> net subsystem. No regressions were detected.
> 
> Signed-off-by: David Hunter <david.hunter.linux@gmail.com>
> ---
>   net/can/bcm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/can/bcm.c b/net/can/bcm.c
> index 27d5fcf0eac9..fea48fd793e5 100644
> --- a/net/can/bcm.c
> +++ b/net/can/bcm.c
> @@ -1779,7 +1779,7 @@ static void canbcm_pernet_exit(struct net *net)
>   #if IS_ENABLED(CONFIG_PROC_FS)
>   	/* remove /proc/net/can-bcm directory */
>   	if (net->can.bcmproc_dir)
> -		remove_proc_entry("can-bcm", net->proc_net);
> +		remove_proc_subtree("can-bcm", net->proc_net);
>   #endif /* CONFIG_PROC_FS */
>   }
>

David Hunter Aug. 9, 2024, 4:21 p.m. UTC | #2
Hello Oliver, 

> What did you do to trigger the warning? 

I am in the Linux Kernel Internship Program for the Linux Foundation. Our goal is to fix outstanding bugs with the kernel. I found the following bug on syzbot: 

https://syzkaller.appspot.com/bug?extid=df49d48077305d17519a

This specific link is for a separate issue, for which I will soon send a separate patch; however, I found the bug for this patch after I set the kernel command-line parameter panic_on_warn to 0.

If you wish to reproduce the error, you can do the following steps:
	1) compile and install a kernel with the config file from the link
	2) pass the kernel parameter panic_on_warn=0
	3) build and run the C reproducer for the bug.

As best as I can tell, the C reproducer simply made a system call that resulted in the can-bcm directory entry being deleted. I am still wrapping my head around the code (I am new to kernel programming), but here is the full stacktrace.

[  156.449047][   T71] Call Trace:
[  156.450067][   T71]  <TASK>
[  156.451076][   T71]  ? show_regs+0x84/0x8b
[  156.452490][   T71]  ? __warn+0x150/0x29e
[  156.453754][   T71]  ? remove_proc_entry+0x335/0x385
[  156.456485][   T71]  ? report_bug+0x33d/0x431
[  156.457994][   T71]  ? remove_proc_entry+0x335/0x385
[  156.459845][   T71]  ? handle_bug+0x3d/0x66
[  156.461230][   T71]  ? exc_invalid_op+0x17/0x3e
[  156.462672][   T71]  ? asm_exc_invalid_op+0x1a/0x20
[  156.464282][   T71]  ? __warn_printk+0x26d/0x2aa
[  156.465759][   T71]  ? remove_proc_entry+0x335/0x385
[  156.467233][   T71]  ? remove_proc_entry+0x334/0x385
[  156.468821][   T71]  ? proc_readdir+0x11a/0x11a
[  156.470122][   T71]  ? __sanitizer_cov_trace_pc+0x1e/0x42
[  156.471697][   T71]  ? cgw_remove_all_jobs+0xa5/0x16f
[  156.474096][   T71]  canbcm_pernet_exit+0x73/0x79
[  156.476732][   T71]  ops_exit_list+0xf1/0x146
[  156.478358][   T71]  cleanup_net+0x333/0x570
[  156.479856][   T71]  ? setup_net+0x7ba/0x7ba
[  156.481479][   T71]  ? process_scheduled_works+0x652/0xbab
[  156.483592][   T71]  process_scheduled_works+0x7b8/0xbab
[  156.486039][   T71]  ? drain_workqueue+0x33b/0x33b
[  156.487841][   T71]  ? __sanitizer_cov_trace_pc+0x1e/0x42
[  156.489742][   T71]  ? move_linked_works+0x9f/0x108
[  156.491376][   T71]  worker_thread+0x5bd/0x6cc
[  156.492877][   T71]  ? rescuer_thread+0x64d/0x64d
[  156.494350][   T71]  kthread+0x30a/0x31e
[  156.495769][   T71]  ? kthread_complete_and_exit+0x35/0x35
[  156.497977][   T71]  ret_from_fork+0x34/0x6b
[  156.499734][   T71]  ? kthread_complete_and_exit+0x35/0x35
[  156.501494][   T71]  ret_from_fork_asm+0x11/0x20

> Removing this warning probably does not heal the root cause of the issue.

I would love to work on the root cause of the issue if at all possible. Do you think that the C reproducer went down an unlikely avenue, and therefore, further work is not needed, or do you think that this is an issue that requires some attention? 

I appreciate the response to my patch. I am learning a lot. 

Thanks, 
David

Kuniyuki Iwashima Aug. 9, 2024, 8:22 p.m. UTC | #3
From: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Fri, 9 Aug 2024 11:57:41 +0200
> Hello David,
> 
> many thanks for the patch and the description.
> 
> Btw. the data structures of the elements inside that bcm proc dir should 
> have been removed at that point, so that the can-bcm dir should be empty.
> 
> I'm not sure what happens to the open sockets that are (later) removed 
> in bcm_release() when we use remove_proc_subtree() as suggested. 
> Removing this warning probably does not heal the root cause of the issue.

I posted a patch to fix bcm's proc entry leak a few weeks ago, and this might
be related.
https://lore.kernel.org/netdev/20240722192842.37421-1-kuniyu@amazon.com/

Oliver, could you take this patch into the can tree?

Kuniyuki Iwashima Aug. 9, 2024, 8:28 p.m. UTC | #4
From: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Fri, 9 Aug 2024 13:22:49 -0700
> From: Oliver Hartkopp <socketcan@hartkopp.net>
> Date: Fri, 9 Aug 2024 11:57:41 +0200
> > Hello David,
> > 
> > many thanks for the patch and the description.
> > 
> > Btw. the data structures of the elements inside that bcm proc dir should 
> > have been removed at that point, so that the can-bcm dir should be empty.
> > 
> > I'm not sure what happens to the open sockets that are (later) removed 
> > in bcm_release() when we use remove_proc_subtree() as suggested. 
> > Removing this warning probably does not heal the root cause of the issue.
> 
> I posted a patch to fix bcm's proc entry leak few weeks ago, and this might
> be related.
> https://lore.kernel.org/netdev/20240722192842.37421-1-kuniyu@amazon.com/

I just noticed the syzbot report that David pointed out has the same
splat, so this is the same issue that my patch fixes.

https://syzkaller.appspot.com/bug?extid=df49d48077305d17519a

Marc Kleine-Budde Aug. 30, 2024, 10:56 a.m. UTC | #5
On 09.08.2024 13:22:49, Kuniyuki Iwashima wrote:
> From: Oliver Hartkopp <socketcan@hartkopp.net>
> Date: Fri, 9 Aug 2024 11:57:41 +0200
> > Hello David,
> > 
> > many thanks for the patch and the description.
> > 
> > Btw. the data structures of the elements inside that bcm proc dir should 
> > have been removed at that point, so that the can-bcm dir should be empty.
> > 
> > I'm not sure what happens to the open sockets that are (later) removed 
> > in bcm_release() when we use remove_proc_subtree() as suggested. 
> > Removing this warning probably does not heal the root cause of the issue.
> 
> I posted a patch to fix bcm's proc entry leak few weeks ago, and this might
> be related.
> https://lore.kernel.org/netdev/20240722192842.37421-1-kuniyu@amazon.com/
> 
> Oliver, could you take this patch to can tree ?

That patch is included in my latest PR to net:

https://lore.kernel.org/all/20240829192947.1186760-1-mkl@pengutronix.de

Marc

Patch

diff --git a/net/can/bcm.c b/net/can/bcm.c
index 27d5fcf0eac9..fea48fd793e5 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1779,7 +1779,7 @@  static void canbcm_pernet_exit(struct net *net)
 #if IS_ENABLED(CONFIG_PROC_FS)
 	/* remove /proc/net/can-bcm directory */
 	if (net->can.bcmproc_dir)
-		remove_proc_entry("can-bcm", net->proc_net);
+		remove_proc_subtree("can-bcm", net->proc_net);
 #endif /* CONFIG_PROC_FS */
 }