diff mbox series

configs/debug: make sure PROVE_RCU_LIST=y takes effect

Message ID 20241016011144.3058445-1-kuba@kernel.org (mailing list archive)
State New
Headers show
Series configs/debug: make sure PROVE_RCU_LIST=y takes effect | expand

Commit Message

Jakub Kicinski Oct. 16, 2024, 1:11 a.m. UTC
Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
added CONFIG_PROVE_RCU_LIST=y to the common CI config,
but RCU_EXPERT is not set, and it's a dependency for
CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
indicate that it does catch bugs.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
I'd be slightly tempted to still send it to Linux for v6.12.

CC: paulmck@kernel.org
CC: frederic@kernel.org
CC: neeraj.upadhyay@kernel.org
CC: joel@joelfernandes.org
CC: rcu@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: kees@kernel.org
CC: matttbe@kernel.org
---
 kernel/configs/debug.config | 1 +
 1 file changed, 1 insertion(+)

Comments

Joel Fernandes Oct. 16, 2024, 2:23 a.m. UTC | #1
On Tue, Oct 15, 2024 at 06:11:44PM -0700, Jakub Kicinski wrote:
> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
> but RCU_EXPERT is not set, and it's a dependency for
> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
> indicate that it does catch bugs.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel

> ---
> I'd be slightly tempted to still send it to Linux for v6.12.
> 
> CC: paulmck@kernel.org
> CC: frederic@kernel.org
> CC: neeraj.upadhyay@kernel.org
> CC: joel@joelfernandes.org
> CC: rcu@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: kees@kernel.org
> CC: matttbe@kernel.org
> ---
>  kernel/configs/debug.config | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/configs/debug.config b/kernel/configs/debug.config
> index 509ee703de15..20552f163930 100644
> --- a/kernel/configs/debug.config
> +++ b/kernel/configs/debug.config
> @@ -103,6 +103,7 @@ CONFIG_BUG_ON_DATA_CORRUPTION=y
>  #
>  # RCU Debugging
>  #
> +CONFIG_RCU_EXPERT=y
>  CONFIG_PROVE_RCU=y
>  CONFIG_PROVE_RCU_LIST=y
>  #
> -- 
> 2.46.2
>
Matthieu Baerts (NGI0) Oct. 16, 2024, 3:18 p.m. UTC | #2
Hi Jakub,

On 16/10/2024 03:11, Jakub Kicinski wrote:
> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
> but RCU_EXPERT is not set, and it's a dependency for
> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
> indicate that it does catch bugs.

Good catch! I confirm it fixes the issue:

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> I'd be slightly tempted to still send it to Linux for v6.12.

Good idea, it sounds like a fix because I guess if it was on the list,
it was supposed to be used. But be careful it might detect quite a few
issues: I just enabled it on MPTCP tree, and it found issues.

It might be good to check the impact first, e.g. at least enabling it
when running the Networking tests. Please note that the CI didn't pick
up this patch, because it is marked for RCU (PATCH rcu):


https://patchwork.kernel.org/project/netdevbpf/patch/20241016011144.3058445-1-kuba@kernel.org/

If the impact is important, it might be better to target linux-next
first, no?

Cheers,
Matt
Matthieu Baerts (NGI0) Oct. 16, 2024, 3:29 p.m. UTC | #3
On 16/10/2024 17:18, Matthieu Baerts wrote:
> Hi Jakub,
> 
> On 16/10/2024 03:11, Jakub Kicinski wrote:
>> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
>> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
>> but RCU_EXPERT is not set, and it's a dependency for
>> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
>> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
>> indicate that it does catch bugs.
> 
> Good catch! I confirm it fixes the issue:
> 
> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> 
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> ---
>> I'd be slightly tempted to still send it to Linux for v6.12.
> 
> Good idea, it sounds like a fix because I guess if it was on the list,
> it was supposed to be used. But be careful it might detect quite a few
> issues: I just enabled it on MPTCP tree, and it found issues.
Sorry, I forgot to mention that it found 3 issues: one specific to
MPTCP, one in Netfilter, but also one when shutting down the VM:


> # /usr/lib/klibc/bin/poweroff
> [ 2360.588763][T11825] 
> [ 2360.589006][T11825] =============================
> [ 2360.589424][T11825] WARNING: suspicious RCU usage
> [ 2360.589952][T11825] 6.12.0-rc2+ #1 Tainted: G                 N
> [ 2360.590522][T11825] -----------------------------
> [ 2360.590896][T11825] kernel/events/core.c:13962 RCU-list traversed in non-reader section!!
> [ 2360.592341][T11825] 
> [ 2360.592341][T11825] other info that might help us debug this:
> [ 2360.592341][T11825] 
> [ 2360.593343][T11825] 
> [ 2360.593343][T11825] rcu_scheduler_active = 2, debug_locks = 1
> [ 2360.593951][T11825] 3 locks held by poweroff/11825:
> [2360.594481][T11825] #0: ffffffff89641e28 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot (kernel/reboot.c:770) 
> [2360.594997][T11825] #1: ffffffff8963eab0 ((reboot_notifier_list).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain (kernel/notifier.c:388) 
> [2360.595859][T11825] #2: ffffffff898d1a68 (pmus_lock){+.+.}-{3:3}, at: perf_event_exit_cpu_context (kernel/events/core.c:13983) 
> [ 2360.596645][T11825] 
> [ 2360.596645][T11825] stack backtrace:
> [ 2360.597476][T11825] CPU: 3 UID: 0 PID: 11825 Comm: poweroff Tainted: G                 N 6.12.0-rc2+ #1
> [ 2360.597987][T11825] Tainted: [N]=TEST
> [ 2360.598291][T11825] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 2360.598890][T11825] Call Trace:
> [ 2360.599295][T11825]  <TASK>
> [2360.599568][T11825] dump_stack_lvl (lib/dump_stack.c:123) 
> [2360.600032][T11825] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) 
> [2360.600518][T11825] perf_event_clear_cpumask (kernel/events/core.c:13962 (discriminator 9)) 
> [2360.600972][T11825] ? __pfx_perf_event_clear_cpumask (kernel/events/core.c:13939) 
> [2360.601607][T11825] ? acpi_execute_simple_method (drivers/acpi/utils.c:679) 
> [2360.601988][T11825] ? __pfx_acpi_execute_simple_method (drivers/acpi/utils.c:679) 
> [2360.602577][T11825] ? md_notify_reboot (drivers/md/md.c:9860) 
> [2360.603043][T11825] perf_event_exit_cpu_context (kernel/events/core.c:13984 (discriminator 1)) 
> [2360.603543][T11825] perf_reboot (kernel/events/core.c:14066 (discriminator 3)) 
> [2360.603979][T11825] ? trace_notifier_run (include/trace/events/notifier.h:59 (discriminator 2)) 
> [2360.604454][T11825] notifier_call_chain (kernel/notifier.c:93) 
> [2360.604833][T11825] blocking_notifier_call_chain (kernel/notifier.c:389) 
> [2360.605522][T11825] kernel_power_off (kernel/reboot.c:294) 
> [2360.605908][T11825] __do_sys_reboot (kernel/reboot.c:771) 
> [2360.606320][T11825] ? __pfx___do_sys_reboot (kernel/reboot.c:717) 
> [2360.606739][T11825] ? __pfx_ksys_sync (fs/sync.c:98) 
> [2360.607113][T11825] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1)) 
> [2360.607551][T11825] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) 
> [ 2360.607911][T11825] RIP: 0033:0x20e505
> [ 2360.608320][T11825] Code: 48 89 77 38 c3 89 f0 48 8b 1f 48 8b 67 08 48 8b 6f 10 4c 8b 67 18 4c 8b 6f 20 4c 8b 77 28 4c 8b 7f 30 ff 67 38 49 89 ca 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 f7 d8 89 05 d6 8e 00 00 48 83 c8 ff c3
> All code
> ========
>    0:	48 89 77 38          	mov    %rsi,0x38(%rdi)
>    4:	c3                   	ret
>    5:	89 f0                	mov    %esi,%eax
>    7:	48 8b 1f             	mov    (%rdi),%rbx
>    a:	48 8b 67 08          	mov    0x8(%rdi),%rsp
>    e:	48 8b 6f 10          	mov    0x10(%rdi),%rbp
>   12:	4c 8b 67 18          	mov    0x18(%rdi),%r12
>   16:	4c 8b 6f 20          	mov    0x20(%rdi),%r13
>   1a:	4c 8b 77 28          	mov    0x28(%rdi),%r14
>   1e:	4c 8b 7f 30          	mov    0x30(%rdi),%r15
>   22:	ff 67 38             	jmp    *0x38(%rdi)
>   25:	49 89 ca             	mov    %rcx,%r10
>   28:	0f 05                	syscall
>   2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
>   30:	73 01                	jae    0x33
>   32:	c3                   	ret
>   33:	f7 d8                	neg    %eax
>   35:	89 05 d6 8e 00 00    	mov    %eax,0x8ed6(%rip)        # 0x8f11
>   3b:	48 83 c8 ff          	or     $0xffffffffffffffff,%rax
>   3f:	c3                   	ret
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
>    6:	73 01                	jae    0x9
>    8:	c3                   	ret
>    9:	f7 d8                	neg    %eax
>    b:	89 05 d6 8e 00 00    	mov    %eax,0x8ed6(%rip)        # 0x8ee7
>   11:	48 83 c8 ff          	or     $0xffffffffffffffff,%rax
>   15:	c3                   	ret
> [ 2360.609571][T11825] RSP: 002b:00007ffd898b0d58 EFLAGS: 00000217 ORIG_RAX: 00000000000000a9
> [ 2360.610171][T11825] RAX: ffffffffffffffda RBX: 000000004321fedc RCX: 000000000020e505
> [ 2360.610763][T11825] RDX: 000000004321fedc RSI: 0000000028121969 RDI: 00000000fee1dead
> [ 2360.611461][T11825] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000213020
> [ 2360.611987][T11825] R10: 0000000000000000 R11: 0000000000000217 R12: 00007ffd898b0db8
> [ 2360.612583][T11825] R13: 0000000000401000 R14: 0000000000000000 R15: 0000000000000000
> [ 2360.613142][T11825]  </TASK>
> [ 2360.615641][T11825] ACPI: PM: Preparing to enter system sleep state S5
> [ 2360.617134][T11825] kvm: exiting hardware virtualization
> [ 2360.617715][T11825] reboot: Power down


I can easily reproduce this one: this means that all "debug" VMs will
report issues!

Cheers,
Matt
Paul E. McKenney Oct. 16, 2024, 3:50 p.m. UTC | #4
On Tue, Oct 15, 2024 at 06:11:44PM -0700, Jakub Kicinski wrote:
> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
> but RCU_EXPERT is not set, and it's a dependency for
> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
> indicate that it does catch bugs.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
> I'd be slightly tempted to still send it to Linux for v6.12.
> 
> CC: paulmck@kernel.org
> CC: frederic@kernel.org
> CC: neeraj.upadhyay@kernel.org
> CC: joel@joelfernandes.org
> CC: rcu@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: kees@kernel.org
> CC: matttbe@kernel.org
> ---
>  kernel/configs/debug.config | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/configs/debug.config b/kernel/configs/debug.config
> index 509ee703de15..20552f163930 100644
> --- a/kernel/configs/debug.config
> +++ b/kernel/configs/debug.config
> @@ -103,6 +103,7 @@ CONFIG_BUG_ON_DATA_CORRUPTION=y
>  #
>  # RCU Debugging
>  #
> +CONFIG_RCU_EXPERT=y
>  CONFIG_PROVE_RCU=y
>  CONFIG_PROVE_RCU_LIST=y
>  #
> -- 
> 2.46.2
>
Simon Horman Oct. 17, 2024, 3:36 p.m. UTC | #5
On Tue, Oct 15, 2024 at 06:11:44PM -0700, Jakub Kicinski wrote:
> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
> but RCU_EXPERT is not set, and it's a dependency for
> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
> indicate that it does catch bugs.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

I exercised this in conjunction with tools/testing/selftests/net/config
and the resulting .config is as described before and after.

Reviewed-by: Simon Horman <horms@kernel.org>
diff mbox series

Patch

diff --git a/kernel/configs/debug.config b/kernel/configs/debug.config
index 509ee703de15..20552f163930 100644
--- a/kernel/configs/debug.config
+++ b/kernel/configs/debug.config
@@ -103,6 +103,7 @@  CONFIG_BUG_ON_DATA_CORRUPTION=y
 #
 # RCU Debugging
 #
+CONFIG_RCU_EXPERT=y
 CONFIG_PROVE_RCU=y
 CONFIG_PROVE_RCU_LIST=y
 #