[v2,2/4] rcutree: Add checks for dynticks counters in rcu_is_cpu_rrupt_from_idle
diff mbox series

Message ID 20190326192411.198070-2-joel@joelfernandes.org
State Mainlined
Commit eddded80121f2a7bda810f65bf7cb648a709ed11
Headers show
Series
  • [v2,1/4] lockdep: Add assertion to check if in an interrupt
Related show

Commit Message

Joel Fernandes March 26, 2019, 7:24 p.m. UTC
In the future we would like to combine the dynticks and dynticks_nesting
counters thus leading to simplifying the code. At the moment we cannot
do that due to concerns about usermode upcalls appearing to RCU as half
of an interrupt. Byungchul tried to do it in [1] but the
"half-interrupt" concern was raised. It is half because, what RCU
expects is rcu_irq_enter() and rcu_irq_exit() pairs when the usermode
exception happens. However, only rcu_irq_enter() is observed. This
concern may not be valid anymore, but at least it used to be the case.

Out of abundance of caution, Paul added warnings [2] in the RCU code
which if not fired by 2021 may allow us to assume that such
half-interrupt scenario cannot happen any more, which can lead to
simplification of this code.

Summary of the changes are the following:

(1) In preparation for this combination of counters in the future, we
first need to first be sure that rcu_rrupt_from_idle cannot be called
from anywhere but a hard-interrupt because previously, the comments
suggested otherwise so let us be sure. We discussed this here [3]. We
use the services of lockdep to accomplish this.

(2) Further rcu_rrupt_from_idle() is not explicit about how it is using
the counters which can lead to weird future bugs. This patch therefore
makes it more explicit about the specific counter values being tested

(3) Lastly, we check for counter underflows just to be sure these are
not happening, because the previous code in rcu_rrupt_from_idle() was
allowing the case where the counters can underflow, and the function
would still return true. Now we are checking for specific values so let
us be confident by additional checking, that such underflows don't
happen. Any case, if they do, we should fix them and the screaming
warning is appropriate. All these checks checks are NOOPs if PROVE_RCU
and PROVE_LOCKING are disabled.

[1] https://lore.kernel.org/patchwork/patch/952349/
[2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts")
[3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/

Cc: byungchul.park@lge.com
Cc: kernel-team@android.com
Cc: rcu@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

Comments

Paul E. McKenney March 27, 2019, 2:47 a.m. UTC | #1
On Tue, Mar 26, 2019 at 03:24:09PM -0400, Joel Fernandes (Google) wrote:
> In the future we would like to combine the dynticks and dynticks_nesting
> counters thus leading to simplifying the code. At the moment we cannot
> do that due to concerns about usermode upcalls appearing to RCU as half
> of an interrupt. Byungchul tried to do it in [1] but the
> "half-interrupt" concern was raised. It is half because, what RCU
> expects is rcu_irq_enter() and rcu_irq_exit() pairs when the usermode
> exception happens. However, only rcu_irq_enter() is observed. This
> concern may not be valid anymore, but at least it used to be the case.
> 
> Out of abundance of caution, Paul added warnings [2] in the RCU code
> which if not fired by 2021 may allow us to assume that such
> half-interrupt scenario cannot happen any more, which can lead to
> simplification of this code.
> 
> Summary of the changes are the following:
> 
> (1) In preparation for this combination of counters in the future, we
> first need to first be sure that rcu_rrupt_from_idle cannot be called
> from anywhere but a hard-interrupt because previously, the comments
> suggested otherwise so let us be sure. We discussed this here [3]. We
> use the services of lockdep to accomplish this.
> 
> (2) Further rcu_rrupt_from_idle() is not explicit about how it is using
> the counters which can lead to weird future bugs. This patch therefore
> makes it more explicit about the specific counter values being tested
> 
> (3) Lastly, we check for counter underflows just to be sure these are
> not happening, because the previous code in rcu_rrupt_from_idle() was
> allowing the case where the counters can underflow, and the function
> would still return true. Now we are checking for specific values so let
> us be confident by additional checking, that such underflows don't
> happen. Any case, if they do, we should fix them and the screaming
> warning is appropriate. All these checks checks are NOOPs if PROVE_RCU
> and PROVE_LOCKING are disabled.
> 
> [1] https://lore.kernel.org/patchwork/patch/952349/
> [2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts")
> [3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/
> 
> Cc: byungchul.park@lge.com
> Cc: kernel-team@android.com
> Cc: rcu@vger.kernel.org
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Color me stupid:

[   48.845724] ------------[ cut here ]------------
[   48.846619] Not in hardirq as expected
[   48.847322] WARNING: CPU: 5 PID: 34 at /home/git/linux-2.6-tip/kernel/rcu/tree.c:388 rcu_is_cpu_rrupt_from_idle+0xea/0x110
[   48.849302] Modules linked in:
[   48.849869] CPU: 5 PID: 34 Comm: cpuhp/5 Not tainted 5.1.0-rc1+ #1
[   48.850985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[   48.852436] RIP: 0010:rcu_is_cpu_rrupt_from_idle+0xea/0x110
[   48.853455] Code: 85 c0 0f 85 59 ff ff ff 80 3d 33 55 68 01 00 0f 85 4c ff ff ff 48 c7 c7 48 d8 cc 8e 31 c0 c6 05 1d 55 68 01 01 e8 66 54 f8 ff <0f> 0b e9 30 ff ff ff 65 48 8b 05 df 58 54 72 48 85 c0 0f 94 c0 0f
[   48.856783] RSP: 0000:ffffbc46802dfdc0 EFLAGS: 00010082
[   48.857735] RAX: 000000000000001a RBX: 0000000000022b80 RCX: 0000000000000000
[   48.859028] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8dac906c
[   48.860313] RBP: ffffbc46802dfe20 R08: 0000000000000001 R09: 0000000000000001
[   48.861607] R10: 000000007d53d16d R11: ffffbc46802dfb48 R12: ffff9e7d7eb62b80
[   48.862898] R13: 0000000000000005 R14: ffffffff8dae2ac0 R15: 00000000000000c9
[   48.864191] FS:  0000000000000000(0000) GS:ffff9e7d7eb40000(0000) knlGS:0000000000000000
[   48.865663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.866702] CR2: 0000000000000000 CR3: 0000000021022000 CR4: 00000000000006e0
[   48.867993] Call Trace:
[   48.868450]  rcu_exp_handler+0x35/0x90
[   48.869147]  generic_exec_single+0xab/0x100
[   48.869918]  ? rcu_barrier+0x240/0x240
[   48.870607]  smp_call_function_single+0x8e/0xd0
[   48.871441]  rcutree_online_cpu+0x80/0x90
[   48.872181]  cpuhp_invoke_callback+0xb5/0x890
[   48.872979]  cpuhp_thread_fun+0x172/0x210
[   48.873722]  ? cpuhp_thread_fun+0x2a/0x210
[   48.874474]  smpboot_thread_fn+0x10d/0x160
[   48.875224]  kthread+0xf3/0x130
[   48.875804]  ? sort_range+0x20/0x20
[   48.876446]  ? kthread_cancel_delayed_work_sync+0x10/0x10
[   48.877445]  ret_from_fork+0x3a/0x50
[   48.878124] irq event stamp: 734
[   48.878717] hardirqs last  enabled at (733): [<ffffffff8e4f332d>] _raw_spin_unlock_irqrestore+0x2d/0x40
[   48.880402] hardirqs last disabled at (734): [<ffffffff8db0110a>] generic_exec_single+0x9a/0x100
[   48.881986] softirqs last  enabled at (0): [<ffffffff8da5feaf>] copy_process.part.56+0x61f/0x2110
[   48.883540] softirqs last disabled at (0): [<0000000000000000>]           (null)
[   48.884840] ---[ end trace 00b4c1d2f816f4ed ]---

If a CPU invokes generic_exec_single() on itself, the "IPI handler" will
be invoked directly, triggering your new lockdep check.  Which is a bit
wasteful.  My thought is to add code to sync_rcu_exp_select_node_cpus()
to check the CPU with preemption disabled, avoiding the call to
smp_call_function_single() in that case.

I have queued all four of your patches, and am trying the fix to
the caller of smp_call_function_single() shown below.  Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 9c990df880d1..51d61028abf1 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -384,7 +384,13 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
 			mask_ofl_test |= mask;
 			continue;
 		}
+		preempt_disable();
+		if (smp_processor_id() == cpu) {
+			preempt_enable();
+			continue;
+		}
 		ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
+		preempt_enable();
 		if (!ret) {
 			mask_ofl_ipi &= ~mask;
 			continue;
Joel Fernandes March 27, 2019, 3:34 p.m. UTC | #2
On Tue, Mar 26, 2019 at 07:47:51PM -0700, Paul E. McKenney wrote:
> On Tue, Mar 26, 2019 at 03:24:09PM -0400, Joel Fernandes (Google) wrote:
> > In the future we would like to combine the dynticks and dynticks_nesting
> > counters thus leading to simplifying the code. At the moment we cannot
> > do that due to concerns about usermode upcalls appearing to RCU as half
> > of an interrupt. Byungchul tried to do it in [1] but the
> > "half-interrupt" concern was raised. It is half because, what RCU
> > expects is rcu_irq_enter() and rcu_irq_exit() pairs when the usermode
> > exception happens. However, only rcu_irq_enter() is observed. This
> > concern may not be valid anymore, but at least it used to be the case.
> > 
> > Out of abundance of caution, Paul added warnings [2] in the RCU code
> > which if not fired by 2021 may allow us to assume that such
> > half-interrupt scenario cannot happen any more, which can lead to
> > simplification of this code.
> > 
> > Summary of the changes are the following:
> > 
> > (1) In preparation for this combination of counters in the future, we
> > first need to first be sure that rcu_rrupt_from_idle cannot be called
> > from anywhere but a hard-interrupt because previously, the comments
> > suggested otherwise so let us be sure. We discussed this here [3]. We
> > use the services of lockdep to accomplish this.
> > 
> > (2) Further rcu_rrupt_from_idle() is not explicit about how it is using
> > the counters which can lead to weird future bugs. This patch therefore
> > makes it more explicit about the specific counter values being tested
> > 
> > (3) Lastly, we check for counter underflows just to be sure these are
> > not happening, because the previous code in rcu_rrupt_from_idle() was
> > allowing the case where the counters can underflow, and the function
> > would still return true. Now we are checking for specific values so let
> > us be confident by additional checking, that such underflows don't
> > happen. Any case, if they do, we should fix them and the screaming
> > warning is appropriate. All these checks checks are NOOPs if PROVE_RCU
> > and PROVE_LOCKING are disabled.
> > 
> > [1] https://lore.kernel.org/patchwork/patch/952349/
> > [2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts")
> > [3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/
> > 
> > Cc: byungchul.park@lge.com
> > Cc: kernel-team@android.com
> > Cc: rcu@vger.kernel.org
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Color me stupid:
> 
> [   48.845724] ------------[ cut here ]------------
> [   48.846619] Not in hardirq as expected
> [   48.847322] WARNING: CPU: 5 PID: 34 at /home/git/linux-2.6-tip/kernel/rcu/tree.c:388 rcu_is_cpu_rrupt_from_idle+0xea/0x110
> [   48.849302] Modules linked in:
> [   48.849869] CPU: 5 PID: 34 Comm: cpuhp/5 Not tainted 5.1.0-rc1+ #1
> [   48.850985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [   48.852436] RIP: 0010:rcu_is_cpu_rrupt_from_idle+0xea/0x110
> [   48.853455] Code: 85 c0 0f 85 59 ff ff ff 80 3d 33 55 68 01 00 0f 85 4c ff ff ff 48 c7 c7 48 d8 cc 8e 31 c0 c6 05 1d 55 68 01 01 e8 66 54 f8 ff <0f> 0b e9 30 ff ff ff 65 48 8b 05 df 58 54 72 48 85 c0 0f 94 c0 0f
> [   48.856783] RSP: 0000:ffffbc46802dfdc0 EFLAGS: 00010082
> [   48.857735] RAX: 000000000000001a RBX: 0000000000022b80 RCX: 0000000000000000
> [   48.859028] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8dac906c
> [   48.860313] RBP: ffffbc46802dfe20 R08: 0000000000000001 R09: 0000000000000001
> [   48.861607] R10: 000000007d53d16d R11: ffffbc46802dfb48 R12: ffff9e7d7eb62b80
> [   48.862898] R13: 0000000000000005 R14: ffffffff8dae2ac0 R15: 00000000000000c9
> [   48.864191] FS:  0000000000000000(0000) GS:ffff9e7d7eb40000(0000) knlGS:0000000000000000
> [   48.865663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   48.866702] CR2: 0000000000000000 CR3: 0000000021022000 CR4: 00000000000006e0
> [   48.867993] Call Trace:
> [   48.868450]  rcu_exp_handler+0x35/0x90
> [   48.869147]  generic_exec_single+0xab/0x100
> [   48.869918]  ? rcu_barrier+0x240/0x240
> [   48.870607]  smp_call_function_single+0x8e/0xd0
> [   48.871441]  rcutree_online_cpu+0x80/0x90
> [   48.872181]  cpuhp_invoke_callback+0xb5/0x890
> [   48.872979]  cpuhp_thread_fun+0x172/0x210
> [   48.873722]  ? cpuhp_thread_fun+0x2a/0x210
> [   48.874474]  smpboot_thread_fn+0x10d/0x160
> [   48.875224]  kthread+0xf3/0x130
> [   48.875804]  ? sort_range+0x20/0x20
> [   48.876446]  ? kthread_cancel_delayed_work_sync+0x10/0x10
> [   48.877445]  ret_from_fork+0x3a/0x50
> [   48.878124] irq event stamp: 734
> [   48.878717] hardirqs last  enabled at (733): [<ffffffff8e4f332d>] _raw_spin_unlock_irqrestore+0x2d/0x40
> [   48.880402] hardirqs last disabled at (734): [<ffffffff8db0110a>] generic_exec_single+0x9a/0x100
> [   48.881986] softirqs last  enabled at (0): [<ffffffff8da5feaf>] copy_process.part.56+0x61f/0x2110
> [   48.883540] softirqs last disabled at (0): [<0000000000000000>]           (null)
> [   48.884840] ---[ end trace 00b4c1d2f816f4ed ]---
> 
> If a CPU invokes generic_exec_single() on itself, the "IPI handler" will
> be invoked directly, triggering your new lockdep check.  Which is a bit
> wasteful.  My thought is to add code to sync_rcu_exp_select_node_cpus()
> to check the CPU with preemption disabled, avoiding the call to
> smp_call_function_single() in that case.
> 
> I have queued all four of your patches, and am trying the fix to
> the caller of smp_call_function_single() shown below.  Thoughts?

Oh interesting. Your fix makes sense. I will go through these paths more as
well since I'm not super familiar with this area of the RCU code. But I had
one small nit below.

Also thanks for pulling the patches, I tested TREE09 and TASKS02 which
disable SMP and both passed.

> ------------------------------------------------------------------------
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 9c990df880d1..51d61028abf1 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -384,7 +384,13 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
>  			mask_ofl_test |= mask;
>  			continue;
>  		}
> +		preempt_disable();
> +		if (smp_processor_id() == cpu) {

Can be this?
		if (get_cpu() == cpu) {
			put_cpu();
			continue;
		}

> +			preempt_enable();
> +			continue;
> +		}
>  		ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
> +		preempt_enable();

and here:
		put_cpu();

>  		if (!ret) {
>  			mask_ofl_ipi &= ~mask;
>  			continue;
> 

thanks,

 - Joel
Paul E. McKenney March 27, 2019, 3:53 p.m. UTC | #3
On Wed, Mar 27, 2019 at 11:34:01AM -0400, Joel Fernandes wrote:
> On Tue, Mar 26, 2019 at 07:47:51PM -0700, Paul E. McKenney wrote:
> > On Tue, Mar 26, 2019 at 03:24:09PM -0400, Joel Fernandes (Google) wrote:
> > > In the future we would like to combine the dynticks and dynticks_nesting
> > > counters thus leading to simplifying the code. At the moment we cannot
> > > do that due to concerns about usermode upcalls appearing to RCU as half
> > > of an interrupt. Byungchul tried to do it in [1] but the
> > > "half-interrupt" concern was raised. It is half because, what RCU
> > > expects is rcu_irq_enter() and rcu_irq_exit() pairs when the usermode
> > > exception happens. However, only rcu_irq_enter() is observed. This
> > > concern may not be valid anymore, but at least it used to be the case.
> > > 
> > > Out of abundance of caution, Paul added warnings [2] in the RCU code
> > > which if not fired by 2021 may allow us to assume that such
> > > half-interrupt scenario cannot happen any more, which can lead to
> > > simplification of this code.
> > > 
> > > Summary of the changes are the following:
> > > 
> > > (1) In preparation for this combination of counters in the future, we
> > > first need to first be sure that rcu_rrupt_from_idle cannot be called
> > > from anywhere but a hard-interrupt because previously, the comments
> > > suggested otherwise so let us be sure. We discussed this here [3]. We
> > > use the services of lockdep to accomplish this.
> > > 
> > > (2) Further rcu_rrupt_from_idle() is not explicit about how it is using
> > > the counters which can lead to weird future bugs. This patch therefore
> > > makes it more explicit about the specific counter values being tested
> > > 
> > > (3) Lastly, we check for counter underflows just to be sure these are
> > > not happening, because the previous code in rcu_rrupt_from_idle() was
> > > allowing the case where the counters can underflow, and the function
> > > would still return true. Now we are checking for specific values so let
> > > us be confident by additional checking, that such underflows don't
> > > happen. Any case, if they do, we should fix them and the screaming
> > > warning is appropriate. All these checks checks are NOOPs if PROVE_RCU
> > > and PROVE_LOCKING are disabled.
> > > 
> > > [1] https://lore.kernel.org/patchwork/patch/952349/
> > > [2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts")
> > > [3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/
> > > 
> > > Cc: byungchul.park@lge.com
> > > Cc: kernel-team@android.com
> > > Cc: rcu@vger.kernel.org
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > Color me stupid:
> > 
> > [   48.845724] ------------[ cut here ]------------
> > [   48.846619] Not in hardirq as expected
> > [   48.847322] WARNING: CPU: 5 PID: 34 at /home/git/linux-2.6-tip/kernel/rcu/tree.c:388 rcu_is_cpu_rrupt_from_idle+0xea/0x110
> > [   48.849302] Modules linked in:
> > [   48.849869] CPU: 5 PID: 34 Comm: cpuhp/5 Not tainted 5.1.0-rc1+ #1
> > [   48.850985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [   48.852436] RIP: 0010:rcu_is_cpu_rrupt_from_idle+0xea/0x110
> > [   48.853455] Code: 85 c0 0f 85 59 ff ff ff 80 3d 33 55 68 01 00 0f 85 4c ff ff ff 48 c7 c7 48 d8 cc 8e 31 c0 c6 05 1d 55 68 01 01 e8 66 54 f8 ff <0f> 0b e9 30 ff ff ff 65 48 8b 05 df 58 54 72 48 85 c0 0f 94 c0 0f
> > [   48.856783] RSP: 0000:ffffbc46802dfdc0 EFLAGS: 00010082
> > [   48.857735] RAX: 000000000000001a RBX: 0000000000022b80 RCX: 0000000000000000
> > [   48.859028] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8dac906c
> > [   48.860313] RBP: ffffbc46802dfe20 R08: 0000000000000001 R09: 0000000000000001
> > [   48.861607] R10: 000000007d53d16d R11: ffffbc46802dfb48 R12: ffff9e7d7eb62b80
> > [   48.862898] R13: 0000000000000005 R14: ffffffff8dae2ac0 R15: 00000000000000c9
> > [   48.864191] FS:  0000000000000000(0000) GS:ffff9e7d7eb40000(0000) knlGS:0000000000000000
> > [   48.865663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   48.866702] CR2: 0000000000000000 CR3: 0000000021022000 CR4: 00000000000006e0
> > [   48.867993] Call Trace:
> > [   48.868450]  rcu_exp_handler+0x35/0x90
> > [   48.869147]  generic_exec_single+0xab/0x100
> > [   48.869918]  ? rcu_barrier+0x240/0x240
> > [   48.870607]  smp_call_function_single+0x8e/0xd0
> > [   48.871441]  rcutree_online_cpu+0x80/0x90
> > [   48.872181]  cpuhp_invoke_callback+0xb5/0x890
> > [   48.872979]  cpuhp_thread_fun+0x172/0x210
> > [   48.873722]  ? cpuhp_thread_fun+0x2a/0x210
> > [   48.874474]  smpboot_thread_fn+0x10d/0x160
> > [   48.875224]  kthread+0xf3/0x130
> > [   48.875804]  ? sort_range+0x20/0x20
> > [   48.876446]  ? kthread_cancel_delayed_work_sync+0x10/0x10
> > [   48.877445]  ret_from_fork+0x3a/0x50
> > [   48.878124] irq event stamp: 734
> > [   48.878717] hardirqs last  enabled at (733): [<ffffffff8e4f332d>] _raw_spin_unlock_irqrestore+0x2d/0x40
> > [   48.880402] hardirqs last disabled at (734): [<ffffffff8db0110a>] generic_exec_single+0x9a/0x100
> > [   48.881986] softirqs last  enabled at (0): [<ffffffff8da5feaf>] copy_process.part.56+0x61f/0x2110
> > [   48.883540] softirqs last disabled at (0): [<0000000000000000>]           (null)
> > [   48.884840] ---[ end trace 00b4c1d2f816f4ed ]---
> > 
> > If a CPU invokes generic_exec_single() on itself, the "IPI handler" will
> > be invoked directly, triggering your new lockdep check.  Which is a bit
> > wasteful.  My thought is to add code to sync_rcu_exp_select_node_cpus()
> > to check the CPU with preemption disabled, avoiding the call to
> > smp_call_function_single() in that case.
> > 
> > I have queued all four of your patches, and am trying the fix to
> > the caller of smp_call_function_single() shown below.  Thoughts?
> 
> Oh interesting. Your fix makes sense. I will go through these paths more as
> well since I'm not super familiar with this area of the RCU code. But I had
> one small nit below.

Very good, applying that change.  I have a similar issue in the CPU-hotplug
code that I will also be fixing.

Are there other places where I should be using get_cpu()?

> Also thanks for pulling the patches, I tested TREE09 and TASKS02 which
> disable SMP and both passed.

Sounds good!

							Thanx, Paul

> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 9c990df880d1..51d61028abf1 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -384,7 +384,13 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> >  			mask_ofl_test |= mask;
> >  			continue;
> >  		}
> > +		preempt_disable();
> > +		if (smp_processor_id() == cpu) {
> 
> Can be this?
> 		if (get_cpu() == cpu) {
> 			put_cpu();
> 			continue;
> 		}
> 
> > +			preempt_enable();
> > +			continue;
> > +		}
> >  		ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
> > +		preempt_enable();
> 
> and here:
> 		put_cpu();
> 
> >  		if (!ret) {
> >  			mask_ofl_ipi &= ~mask;
> >  			continue;
> > 
> 
> thanks,
> 
>  - Joel
>
Joel Fernandes March 27, 2019, 5:45 p.m. UTC | #4
On Wed, Mar 27, 2019 at 08:53:51AM -0700, Paul E. McKenney wrote:
> On Wed, Mar 27, 2019 at 11:34:01AM -0400, Joel Fernandes wrote:
> > On Tue, Mar 26, 2019 at 07:47:51PM -0700, Paul E. McKenney wrote:
> > > On Tue, Mar 26, 2019 at 03:24:09PM -0400, Joel Fernandes (Google) wrote:
> > > > In the future we would like to combine the dynticks and dynticks_nesting
> > > > counters thus leading to simplifying the code. At the moment we cannot
> > > > do that due to concerns about usermode upcalls appearing to RCU as half
> > > > of an interrupt. Byungchul tried to do it in [1] but the
> > > > "half-interrupt" concern was raised. It is half because, what RCU
> > > > expects is rcu_irq_enter() and rcu_irq_exit() pairs when the usermode
> > > > exception happens. However, only rcu_irq_enter() is observed. This
> > > > concern may not be valid anymore, but at least it used to be the case.
> > > > 
> > > > Out of abundance of caution, Paul added warnings [2] in the RCU code
> > > > which if not fired by 2021 may allow us to assume that such
> > > > half-interrupt scenario cannot happen any more, which can lead to
> > > > simplification of this code.
> > > > 
> > > > Summary of the changes are the following:
> > > > 
> > > > (1) In preparation for this combination of counters in the future, we
> > > > first need to first be sure that rcu_rrupt_from_idle cannot be called
> > > > from anywhere but a hard-interrupt because previously, the comments
> > > > suggested otherwise so let us be sure. We discussed this here [3]. We
> > > > use the services of lockdep to accomplish this.
> > > > 
> > > > (2) Further rcu_rrupt_from_idle() is not explicit about how it is using
> > > > the counters which can lead to weird future bugs. This patch therefore
> > > > makes it more explicit about the specific counter values being tested
> > > > 
> > > > (3) Lastly, we check for counter underflows just to be sure these are
> > > > not happening, because the previous code in rcu_rrupt_from_idle() was
> > > > allowing the case where the counters can underflow, and the function
> > > > would still return true. Now we are checking for specific values so let
> > > > us be confident by additional checking, that such underflows don't
> > > > happen. Any case, if they do, we should fix them and the screaming
> > > > warning is appropriate. All these checks checks are NOOPs if PROVE_RCU
> > > > and PROVE_LOCKING are disabled.
> > > > 
> > > > [1] https://lore.kernel.org/patchwork/patch/952349/
> > > > [2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts")
> > > > [3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/
> > > > 
> > > > Cc: byungchul.park@lge.com
> > > > Cc: kernel-team@android.com
> > > > Cc: rcu@vger.kernel.org
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > 
> > > Color me stupid:
> > > 
> > > [   48.845724] ------------[ cut here ]------------
> > > [   48.846619] Not in hardirq as expected
> > > [   48.847322] WARNING: CPU: 5 PID: 34 at /home/git/linux-2.6-tip/kernel/rcu/tree.c:388 rcu_is_cpu_rrupt_from_idle+0xea/0x110
> > > [   48.849302] Modules linked in:
> > > [   48.849869] CPU: 5 PID: 34 Comm: cpuhp/5 Not tainted 5.1.0-rc1+ #1
> > > [   48.850985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > [   48.852436] RIP: 0010:rcu_is_cpu_rrupt_from_idle+0xea/0x110
> > > [   48.853455] Code: 85 c0 0f 85 59 ff ff ff 80 3d 33 55 68 01 00 0f 85 4c ff ff ff 48 c7 c7 48 d8 cc 8e 31 c0 c6 05 1d 55 68 01 01 e8 66 54 f8 ff <0f> 0b e9 30 ff ff ff 65 48 8b 05 df 58 54 72 48 85 c0 0f 94 c0 0f
> > > [   48.856783] RSP: 0000:ffffbc46802dfdc0 EFLAGS: 00010082
> > > [   48.857735] RAX: 000000000000001a RBX: 0000000000022b80 RCX: 0000000000000000
> > > [   48.859028] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8dac906c
> > > [   48.860313] RBP: ffffbc46802dfe20 R08: 0000000000000001 R09: 0000000000000001
> > > [   48.861607] R10: 000000007d53d16d R11: ffffbc46802dfb48 R12: ffff9e7d7eb62b80
> > > [   48.862898] R13: 0000000000000005 R14: ffffffff8dae2ac0 R15: 00000000000000c9
> > > [   48.864191] FS:  0000000000000000(0000) GS:ffff9e7d7eb40000(0000) knlGS:0000000000000000
> > > [   48.865663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   48.866702] CR2: 0000000000000000 CR3: 0000000021022000 CR4: 00000000000006e0
> > > [   48.867993] Call Trace:
> > > [   48.868450]  rcu_exp_handler+0x35/0x90
> > > [   48.869147]  generic_exec_single+0xab/0x100
> > > [   48.869918]  ? rcu_barrier+0x240/0x240
> > > [   48.870607]  smp_call_function_single+0x8e/0xd0
> > > [   48.871441]  rcutree_online_cpu+0x80/0x90
> > > [   48.872181]  cpuhp_invoke_callback+0xb5/0x890
> > > [   48.872979]  cpuhp_thread_fun+0x172/0x210
> > > [   48.873722]  ? cpuhp_thread_fun+0x2a/0x210
> > > [   48.874474]  smpboot_thread_fn+0x10d/0x160
> > > [   48.875224]  kthread+0xf3/0x130
> > > [   48.875804]  ? sort_range+0x20/0x20
> > > [   48.876446]  ? kthread_cancel_delayed_work_sync+0x10/0x10
> > > [   48.877445]  ret_from_fork+0x3a/0x50
> > > [   48.878124] irq event stamp: 734
> > > [   48.878717] hardirqs last  enabled at (733): [<ffffffff8e4f332d>] _raw_spin_unlock_irqrestore+0x2d/0x40
> > > [   48.880402] hardirqs last disabled at (734): [<ffffffff8db0110a>] generic_exec_single+0x9a/0x100
> > > [   48.881986] softirqs last  enabled at (0): [<ffffffff8da5feaf>] copy_process.part.56+0x61f/0x2110
> > > [   48.883540] softirqs last disabled at (0): [<0000000000000000>]           (null)
> > > [   48.884840] ---[ end trace 00b4c1d2f816f4ed ]---
> > > 
> > > If a CPU invokes generic_exec_single() on itself, the "IPI handler" will
> > > be invoked directly, triggering your new lockdep check.  Which is a bit
> > > wasteful.  My thought is to add code to sync_rcu_exp_select_node_cpus()
> > > to check the CPU with preemption disabled, avoiding the call to
> > > smp_call_function_single() in that case.
> > > 
> > > I have queued all four of your patches, and am trying the fix to
> > > the caller of smp_call_function_single() shown below.  Thoughts?
> > 
> > Oh interesting. Your fix makes sense. I will go through these paths more as
> > well since I'm not super familiar with this area of the RCU code. But I had
> > one small nit below.
> 
> Very good, applying that change.  I have a similar issue in the CPU-hotplug
> code that I will also be fixing.
> 
> Are there other places where I should be using get_cpu()?

I will check other usages. I wonder if this path is problematic:

rcu_do_batch AIUI can be called from process-context if boost is enabled.
In that case rcu_do_batch()-> invoke_rcu_core()-> smp_processor_id() might be
problematic. I will double confirm this situation is possible and send a
get/put_cpu patch as well if that's the case. Other paths seem to be
disabling interrupts or softirqs so they are fine. But I will go through it
in more detail later today (sorry for slow responses, currently catching a plane).

CONFIG_DEBUG_PREEMPT should be able to catch these kinds of issues since
smp_processor_id() checks this internally. And it seems rcutorture configs do
enable these, so it may not be an issue after all, or that DEBUG_PREEMPT
checking needs some investigation to see why it doesn't warn if at all :-)

> > Also thanks for pulling the patches, I tested TREE09 and TASKS02 which
> > disable SMP and both passed.
> 
> Sounds good!

thanks!

 - Joel
Paul E. McKenney March 27, 2019, 6:44 p.m. UTC | #5
On Wed, Mar 27, 2019 at 01:45:45PM -0400, Joel Fernandes wrote:
> On Wed, Mar 27, 2019 at 08:53:51AM -0700, Paul E. McKenney wrote:
> > On Wed, Mar 27, 2019 at 11:34:01AM -0400, Joel Fernandes wrote:
> > > On Tue, Mar 26, 2019 at 07:47:51PM -0700, Paul E. McKenney wrote:
> > > > On Tue, Mar 26, 2019 at 03:24:09PM -0400, Joel Fernandes (Google) wrote:
> > > > > In the future we would like to combine the dynticks and dynticks_nesting
> > > > > counters thus leading to simplifying the code. At the moment we cannot
> > > > > do that due to concerns about usermode upcalls appearing to RCU as half
> > > > > of an interrupt. Byungchul tried to do it in [1] but the
> > > > > "half-interrupt" concern was raised. It is half because, what RCU
> > > > > expects is rcu_irq_enter() and rcu_irq_exit() pairs when the usermode
> > > > > exception happens. However, only rcu_irq_enter() is observed. This
> > > > > concern may not be valid anymore, but at least it used to be the case.
> > > > > 
> > > > > Out of abundance of caution, Paul added warnings [2] in the RCU code
> > > > > which if not fired by 2021 may allow us to assume that such
> > > > > half-interrupt scenario cannot happen any more, which can lead to
> > > > > simplification of this code.
> > > > > 
> > > > > Summary of the changes are the following:
> > > > > 
> > > > > (1) In preparation for this combination of counters in the future, we
> > > > > first need to first be sure that rcu_rrupt_from_idle cannot be called
> > > > > from anywhere but a hard-interrupt because previously, the comments
> > > > > suggested otherwise so let us be sure. We discussed this here [3]. We
> > > > > use the services of lockdep to accomplish this.
> > > > > 
> > > > > (2) Further rcu_rrupt_from_idle() is not explicit about how it is using
> > > > > the counters which can lead to weird future bugs. This patch therefore
> > > > > makes it more explicit about the specific counter values being tested
> > > > > 
> > > > > (3) Lastly, we check for counter underflows just to be sure these are
> > > > > not happening, because the previous code in rcu_rrupt_from_idle() was
> > > > > allowing the case where the counters can underflow, and the function
> > > > > would still return true. Now we are checking for specific values so let
> > > > > us be confident by additional checking, that such underflows don't
> > > > > happen. Any case, if they do, we should fix them and the screaming
> > > > > warning is appropriate. All these checks checks are NOOPs if PROVE_RCU
> > > > > and PROVE_LOCKING are disabled.
> > > > > 
> > > > > [1] https://lore.kernel.org/patchwork/patch/952349/
> > > > > [2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts")
> > > > > [3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/
> > > > > 
> > > > > Cc: byungchul.park@lge.com
> > > > > Cc: kernel-team@android.com
> > > > > Cc: rcu@vger.kernel.org
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > 
> > > > Color me stupid:
> > > > 
> > > > [   48.845724] ------------[ cut here ]------------
> > > > [   48.846619] Not in hardirq as expected
> > > > [   48.847322] WARNING: CPU: 5 PID: 34 at /home/git/linux-2.6-tip/kernel/rcu/tree.c:388 rcu_is_cpu_rrupt_from_idle+0xea/0x110
> > > > [   48.849302] Modules linked in:
> > > > [   48.849869] CPU: 5 PID: 34 Comm: cpuhp/5 Not tainted 5.1.0-rc1+ #1
> > > > [   48.850985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > [   48.852436] RIP: 0010:rcu_is_cpu_rrupt_from_idle+0xea/0x110
> > > > [   48.853455] Code: 85 c0 0f 85 59 ff ff ff 80 3d 33 55 68 01 00 0f 85 4c ff ff ff 48 c7 c7 48 d8 cc 8e 31 c0 c6 05 1d 55 68 01 01 e8 66 54 f8 ff <0f> 0b e9 30 ff ff ff 65 48 8b 05 df 58 54 72 48 85 c0 0f 94 c0 0f
> > > > [   48.856783] RSP: 0000:ffffbc46802dfdc0 EFLAGS: 00010082
> > > > [   48.857735] RAX: 000000000000001a RBX: 0000000000022b80 RCX: 0000000000000000
> > > > [   48.859028] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8dac906c
> > > > [   48.860313] RBP: ffffbc46802dfe20 R08: 0000000000000001 R09: 0000000000000001
> > > > [   48.861607] R10: 000000007d53d16d R11: ffffbc46802dfb48 R12: ffff9e7d7eb62b80
> > > > [   48.862898] R13: 0000000000000005 R14: ffffffff8dae2ac0 R15: 00000000000000c9
> > > > [   48.864191] FS:  0000000000000000(0000) GS:ffff9e7d7eb40000(0000) knlGS:0000000000000000
> > > > [   48.865663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [   48.866702] CR2: 0000000000000000 CR3: 0000000021022000 CR4: 00000000000006e0
> > > > [   48.867993] Call Trace:
> > > > [   48.868450]  rcu_exp_handler+0x35/0x90
> > > > [   48.869147]  generic_exec_single+0xab/0x100
> > > > [   48.869918]  ? rcu_barrier+0x240/0x240
> > > > [   48.870607]  smp_call_function_single+0x8e/0xd0
> > > > [   48.871441]  rcutree_online_cpu+0x80/0x90
> > > > [   48.872181]  cpuhp_invoke_callback+0xb5/0x890
> > > > [   48.872979]  cpuhp_thread_fun+0x172/0x210
> > > > [   48.873722]  ? cpuhp_thread_fun+0x2a/0x210
> > > > [   48.874474]  smpboot_thread_fn+0x10d/0x160
> > > > [   48.875224]  kthread+0xf3/0x130
> > > > [   48.875804]  ? sort_range+0x20/0x20
> > > > [   48.876446]  ? kthread_cancel_delayed_work_sync+0x10/0x10
> > > > [   48.877445]  ret_from_fork+0x3a/0x50
> > > > [   48.878124] irq event stamp: 734
> > > > [   48.878717] hardirqs last  enabled at (733): [<ffffffff8e4f332d>] _raw_spin_unlock_irqrestore+0x2d/0x40
> > > > [   48.880402] hardirqs last disabled at (734): [<ffffffff8db0110a>] generic_exec_single+0x9a/0x100
> > > > [   48.881986] softirqs last  enabled at (0): [<ffffffff8da5feaf>] copy_process.part.56+0x61f/0x2110
> > > > [   48.883540] softirqs last disabled at (0): [<0000000000000000>]           (null)
> > > > [   48.884840] ---[ end trace 00b4c1d2f816f4ed ]---
> > > > 
> > > > If a CPU invokes generic_exec_single() on itself, the "IPI handler" will
> > > > be invoked directly, triggering your new lockdep check.  Which is a bit
> > > > wasteful.  My thought is to add code to sync_rcu_exp_select_node_cpus()
> > > > to check the CPU with preemption disabled, avoiding the call to
> > > > smp_call_function_single() in that case.
> > > > 
> > > > I have queued all four of your patches, and am trying the fix to
> > > > the caller of smp_call_function_single() shown below.  Thoughts?
> > > 
> > > Oh interesting. Your fix makes sense. I will go through these paths more as
> > > well since I'm not super familiar with this area of the RCU code. But I had
> > > one small nit below.
> > 
> > Very good, applying that change.  I have a similar issue in the CPU-hotplug
> > code that I will also be fixing.
> > 
> > Are there other places where I should be using get_cpu()?
> 
> I will check other usages. I wonder if this path is problematic:
> 
> rcu_do_batch AIUI can be called from process-context if boost is enabled.
> In that case rcu_do_batch()-> invoke_rcu_core()-> smp_processor_id() might be
> problematic. I will double confirm this situation is possible and send a
> get/put_cpu patch as well if that's the case. Other paths seem to be
> disabling interrupts or softirqs so they are fine. But I will go through it
> in more detail later today (sorry for slow responses, currently catching a plane).

The theory is that the case where it is invoked from process context,
it is invoked from an rcuc kthread, which is bound to a single CPU.
Wouldn't hurt to check, though!

> CONFIG_DEBUG_PREEMPT should be able to catch these kinds of issues since
> smp_processor_id() checks this internally. And it seems rcutorture configs do
> enable these, so it may not be an issue after all, or that DEBUG_PREEMPT
> checking needs some investigation to see why it doesn't warn if at all :-)

Or maybe I don't have CONFIG_DEBUG_PREEMPT enabled on the scenario that
needs it.  ;-)

And please see below for an additional patch to make the world safe for
rcu_is_cpu_rrupt_from_idle().  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit a8d8c1e6e09a9a9521e3248a92f5fbb9eb2cf988
Author: Paul E. McKenney <paulmck@linux.ibm.com>
Date:   Wed Mar 27 10:03:12 2019 -0700

    rcu: Avoid self-IPI in sync_sched_exp_online_cleanup()
    
    The sync_sched_exp_online_cleanup() is invoked at online time to handle
    the case where the start of an expedited grace period ran concurrently
    with a CPU being taken offline and then immediately being placed online.
    It checks to see if RCU needs an expedited quiescent state from the
    incoming CPU, sending it an IPI if so.  However, it is quite possible
    that sync_sched_exp_online_cleanup() is running on that CPU, in which
    case it is considerably less overhead to simply request the quiescent
    state locally instead of simulating a self-IPI.
    
    This commit therefore places the last few lines of rcu_exp_handler()
    into a new rcu_exp_need_qs() function, which is invoked both by
    rcu_exp_handler() and by sync_sched_exp_online_cleanup() in the self-IPI
    case.
    
    This also reduces the rcu_exp_handler() function's state space by
    removing the direct call that this smp_call_function_single() uses to
    emulate the requested self-IPI.  This in turn will allow tighter error
    checking in rcu_is_cpu_rrupt_from_idle().
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 5390618787b6..de1b4acf6979 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -699,6 +699,16 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
 
 #else /* #ifdef CONFIG_PREEMPT_RCU */
 
+/* Request an expedited quiescent state. */
+static void rcu_exp_need_qs(void)
+{
+	__this_cpu_write(rcu_data.cpu_no_qs.b.exp, true);
+	/* Store .exp before .rcu_urgent_qs. */
+	smp_store_release(this_cpu_ptr(&rcu_data.rcu_urgent_qs), true);
+	set_tsk_need_resched(current);
+	set_preempt_need_resched();
+}
+
 /* Invoked on each online non-idle CPU for expedited quiescent state. */
 static void rcu_exp_handler(void *unused)
 {
@@ -714,25 +724,38 @@ static void rcu_exp_handler(void *unused)
 		rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
 		return;
 	}
-	__this_cpu_write(rcu_data.cpu_no_qs.b.exp, true);
-	/* Store .exp before .rcu_urgent_qs. */
-	smp_store_release(this_cpu_ptr(&rcu_data.rcu_urgent_qs), true);
-	set_tsk_need_resched(current);
-	set_preempt_need_resched();
+	rcu_exp_need_qs();
 }
 
 /* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
 static void sync_sched_exp_online_cleanup(int cpu)
 {
+	unsigned long flags;
+	int my_cpu;
 	struct rcu_data *rdp;
 	int ret;
 	struct rcu_node *rnp;
 
 	rdp = per_cpu_ptr(&rcu_data, cpu);
 	rnp = rdp->mynode;
-	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask))
+	my_cpu = get_cpu();
+	/* Quiescent state either not needed or already requested, leave. */
+	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
+	    __this_cpu_read(rcu_data.cpu_no_qs.b.exp)) {
+		put_cpu();
 		return;
+	}
+	/* Quiescent state needed on current CPU, so set it up locally. */
+	if (my_cpu == cpu) {
+		local_irq_save(flags);
+		rcu_exp_need_qs();
+		local_irq_restore(flags);
+		put_cpu();
+		return;
+	}
+	/* Quiescent state needed on some other CPU, send IPI. */
 	ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
+	put_cpu();
 	WARN_ON_ONCE(ret);
 }
Joel Fernandes March 27, 2019, 9:38 p.m. UTC | #6
On Wed, Mar 27, 2019 at 11:44:37AM -0700, Paul E. McKenney wrote:
> On Wed, Mar 27, 2019 at 01:45:45PM -0400, Joel Fernandes wrote:
> > On Wed, Mar 27, 2019 at 08:53:51AM -0700, Paul E. McKenney wrote:
> > > On Wed, Mar 27, 2019 at 11:34:01AM -0400, Joel Fernandes wrote:
> > > > On Tue, Mar 26, 2019 at 07:47:51PM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Mar 26, 2019 at 03:24:09PM -0400, Joel Fernandes (Google) wrote:
> > > > > > In the future we would like to combine the dynticks and dynticks_nesting
> > > > > > counters thus leading to simplifying the code. At the moment we cannot
> > > > > > do that due to concerns about usermode upcalls appearing to RCU as half
> > > > > > of an interrupt. Byungchul tried to do it in [1] but the
> > > > > > "half-interrupt" concern was raised. It is half because, what RCU
> > > > > > expects is rcu_irq_enter() and rcu_irq_exit() pairs when the usermode
> > > > > > exception happens. However, only rcu_irq_enter() is observed. This
> > > > > > concern may not be valid anymore, but at least it used to be the case.
> > > > > > 
> > > > > > Out of abundance of caution, Paul added warnings [2] in the RCU code
> > > > > > which if not fired by 2021 may allow us to assume that such
> > > > > > half-interrupt scenario cannot happen any more, which can lead to
> > > > > > simplification of this code.
> > > > > > 
> > > > > > Summary of the changes are the following:
> > > > > > 
> > > > > > (1) In preparation for this combination of counters in the future, we
> > > > > > first need to first be sure that rcu_rrupt_from_idle cannot be called
> > > > > > from anywhere but a hard-interrupt because previously, the comments
> > > > > > suggested otherwise so let us be sure. We discussed this here [3]. We
> > > > > > use the services of lockdep to accomplish this.
> > > > > > 
> > > > > > (2) Further rcu_rrupt_from_idle() is not explicit about how it is using
> > > > > > the counters which can lead to weird future bugs. This patch therefore
> > > > > > makes it more explicit about the specific counter values being tested
> > > > > > 
> > > > > > (3) Lastly, we check for counter underflows just to be sure these are
> > > > > > not happening, because the previous code in rcu_rrupt_from_idle() was
> > > > > > allowing the case where the counters can underflow, and the function
> > > > > > would still return true. Now we are checking for specific values so let
> > > > > > us be confident by additional checking, that such underflows don't
> > > > > > happen. Any case, if they do, we should fix them and the screaming
> > > > > > warning is appropriate. All these checks checks are NOOPs if PROVE_RCU
> > > > > > and PROVE_LOCKING are disabled.
> > > > > > 
> > > > > > [1] https://lore.kernel.org/patchwork/patch/952349/
> > > > > > [2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts")
> > > > > > [3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/
> > > > > > 
> > > > > > Cc: byungchul.park@lge.com
> > > > > > Cc: kernel-team@android.com
> > > > > > Cc: rcu@vger.kernel.org
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > 
> > > > > Color me stupid:
> > > > > 
> > > > > [   48.845724] ------------[ cut here ]------------
> > > > > [   48.846619] Not in hardirq as expected
> > > > > [   48.847322] WARNING: CPU: 5 PID: 34 at /home/git/linux-2.6-tip/kernel/rcu/tree.c:388 rcu_is_cpu_rrupt_from_idle+0xea/0x110
> > > > > [   48.849302] Modules linked in:
> > > > > [   48.849869] CPU: 5 PID: 34 Comm: cpuhp/5 Not tainted 5.1.0-rc1+ #1
> > > > > [   48.850985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > > [   48.852436] RIP: 0010:rcu_is_cpu_rrupt_from_idle+0xea/0x110
> > > > > [   48.853455] Code: 85 c0 0f 85 59 ff ff ff 80 3d 33 55 68 01 00 0f 85 4c ff ff ff 48 c7 c7 48 d8 cc 8e 31 c0 c6 05 1d 55 68 01 01 e8 66 54 f8 ff <0f> 0b e9 30 ff ff ff 65 48 8b 05 df 58 54 72 48 85 c0 0f 94 c0 0f
> > > > > [   48.856783] RSP: 0000:ffffbc46802dfdc0 EFLAGS: 00010082
> > > > > [   48.857735] RAX: 000000000000001a RBX: 0000000000022b80 RCX: 0000000000000000
> > > > > [   48.859028] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8dac906c
> > > > > [   48.860313] RBP: ffffbc46802dfe20 R08: 0000000000000001 R09: 0000000000000001
> > > > > [   48.861607] R10: 000000007d53d16d R11: ffffbc46802dfb48 R12: ffff9e7d7eb62b80
> > > > > [   48.862898] R13: 0000000000000005 R14: ffffffff8dae2ac0 R15: 00000000000000c9
> > > > > [   48.864191] FS:  0000000000000000(0000) GS:ffff9e7d7eb40000(0000) knlGS:0000000000000000
> > > > > [   48.865663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [   48.866702] CR2: 0000000000000000 CR3: 0000000021022000 CR4: 00000000000006e0
> > > > > [   48.867993] Call Trace:
> > > > > [   48.868450]  rcu_exp_handler+0x35/0x90
> > > > > [   48.869147]  generic_exec_single+0xab/0x100
> > > > > [   48.869918]  ? rcu_barrier+0x240/0x240
> > > > > [   48.870607]  smp_call_function_single+0x8e/0xd0
> > > > > [   48.871441]  rcutree_online_cpu+0x80/0x90
> > > > > [   48.872181]  cpuhp_invoke_callback+0xb5/0x890
> > > > > [   48.872979]  cpuhp_thread_fun+0x172/0x210
> > > > > [   48.873722]  ? cpuhp_thread_fun+0x2a/0x210
> > > > > [   48.874474]  smpboot_thread_fn+0x10d/0x160
> > > > > [   48.875224]  kthread+0xf3/0x130
> > > > > [   48.875804]  ? sort_range+0x20/0x20
> > > > > [   48.876446]  ? kthread_cancel_delayed_work_sync+0x10/0x10
> > > > > [   48.877445]  ret_from_fork+0x3a/0x50
> > > > > [   48.878124] irq event stamp: 734
> > > > > [   48.878717] hardirqs last  enabled at (733): [<ffffffff8e4f332d>] _raw_spin_unlock_irqrestore+0x2d/0x40
> > > > > [   48.880402] hardirqs last disabled at (734): [<ffffffff8db0110a>] generic_exec_single+0x9a/0x100
> > > > > [   48.881986] softirqs last  enabled at (0): [<ffffffff8da5feaf>] copy_process.part.56+0x61f/0x2110
> > > > > [   48.883540] softirqs last disabled at (0): [<0000000000000000>]           (null)
> > > > > [   48.884840] ---[ end trace 00b4c1d2f816f4ed ]---
> > > > > 
> > > > > If a CPU invokes generic_exec_single() on itself, the "IPI handler" will
> > > > > be invoked directly, triggering your new lockdep check.  Which is a bit
> > > > > wasteful.  My thought is to add code to sync_rcu_exp_select_node_cpus()
> > > > > to check the CPU with preemption disabled, avoiding the call to
> > > > > smp_call_function_single() in that case.
> > > > > 
> > > > > I have queued all four of your patches, and am trying the fix to
> > > > > the caller of smp_call_function_single() shown below.  Thoughts?
> > > > 
> > > > Oh interesting. Your fix makes sense. I will go through these paths more as
> > > > well since I'm not super familiar with this area of the RCU code. But I had
> > > > one small nit below.
> > > 
> > > Very good, applying that change.  I have a similar issue in the CPU-hotplug
> > > code that I will also be fixing.
> > > 
> > > Are there other places where I should be using get_cpu()?
> > 
> > I will check other usages. I wonder if this path is problematic:
> > 
> > rcu_do_batch AIUI can be called from process-context if boost is enabled.
> > In that case rcu_do_batch()-> invoke_rcu_core()-> smp_processor_id() might be
> > problematic. I will double confirm this situation is possible and send a
> > get/put_cpu patch as well if that's the case. Other paths seem to be
> > disabling interrupts or softirqs so they are fine. But I will go through it
> > in more detail later today (sorry for slow responses, currently catching a plane).
> 
> The theory is that the case where it is invoked from process context,
> it is invoked from an rcuc kthread, which is bound to a single CPU.
> Wouldn't hurt to check, though!
> 
> > CONFIG_DEBUG_PREEMPT should be able to catch these kinds of issues since
> > smp_processor_id() checks this internally. And it seems rcutorture configs do
> > enable these, so it may not be an issue after all, or that DEBUG_PREEMPT
> > checking needs some investigation to see why it doesn't warn if at all :-)
> 
> Or maybe I don't have CONFIG_DEBUG_PREEMPT enabled on the scenario that
> needs it.  ;-)
> 
> And please see below for an additional patch to make the world safe for
> rcu_is_cpu_rrupt_from_idle().  ;-)
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit a8d8c1e6e09a9a9521e3248a92f5fbb9eb2cf988
> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> Date:   Wed Mar 27 10:03:12 2019 -0700
> 
>     rcu: Avoid self-IPI in sync_sched_exp_online_cleanup()
>     
>     The sync_sched_exp_online_cleanup() is invoked at online time to handle
>     the case where the start of an expedited grace period ran concurrently
>     with a CPU being taken offline and then immediately being placed online.
>     It checks to see if RCU needs an expedited quiescent state from the
>     incoming CPU, sending it an IPI if so.  However, it is quite possible
>     that sync_sched_exp_online_cleanup() is running on that CPU, in which
>     case it is considerably less overhead to simply request the quiescent
>     state locally instead of simulating a self-IPI.
>     
>     This commit therefore places the last few lines of rcu_exp_handler()
>     into a new rcu_exp_need_qs() function, which is invoked both by
>     rcu_exp_handler() and by sync_sched_exp_online_cleanup() in the self-IPI
>     case.
>     
>     This also reduces the rcu_exp_handler() function's state space by
>     removing the direct call that this smp_call_function_single() uses to
>     emulate the requested self-IPI.  This in turn will allow tighter error
>     checking in rcu_is_cpu_rrupt_from_idle().
>     
>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 5390618787b6..de1b4acf6979 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -699,6 +699,16 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
>  
>  #else /* #ifdef CONFIG_PREEMPT_RCU */
>  
> +/* Request an expedited quiescent state. */
> +static void rcu_exp_need_qs(void)
> +{
> +	__this_cpu_write(rcu_data.cpu_no_qs.b.exp, true);
> +	/* Store .exp before .rcu_urgent_qs. */
> +	smp_store_release(this_cpu_ptr(&rcu_data.rcu_urgent_qs), true);
> +	set_tsk_need_resched(current);
> +	set_preempt_need_resched();
> +}
> +
>  /* Invoked on each online non-idle CPU for expedited quiescent state. */
>  static void rcu_exp_handler(void *unused)
>  {
> @@ -714,25 +724,38 @@ static void rcu_exp_handler(void *unused)
>  		rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
>  		return;
>  	}
> -	__this_cpu_write(rcu_data.cpu_no_qs.b.exp, true);
> -	/* Store .exp before .rcu_urgent_qs. */
> -	smp_store_release(this_cpu_ptr(&rcu_data.rcu_urgent_qs), true);
> -	set_tsk_need_resched(current);
> -	set_preempt_need_resched();
> +	rcu_exp_need_qs();
>  }
>  
>  /* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
>  static void sync_sched_exp_online_cleanup(int cpu)
>  {
> +	unsigned long flags;
> +	int my_cpu;
>  	struct rcu_data *rdp;
>  	int ret;
>  	struct rcu_node *rnp;
>  
>  	rdp = per_cpu_ptr(&rcu_data, cpu);
>  	rnp = rdp->mynode;
> -	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask))
> +	my_cpu = get_cpu();
> +	/* Quiescent state either not needed or already requested, leave. */
> +	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> +	    __this_cpu_read(rcu_data.cpu_no_qs.b.exp)) {
> +		put_cpu();
>  		return;
> +	}
> +	/* Quiescent state needed on current CPU, so set it up locally. */
> +	if (my_cpu == cpu) {
> +		local_irq_save(flags);
> +		rcu_exp_need_qs();
> +		local_irq_restore(flags);
> +		put_cpu();
> +		return;

This looks good to me, thanks. I love it that we can avoid the self-ipi and
reduce the overhead, and nice to see the lockdep check we added triggered
this optimization.

I still need to go through and understand the "PREEMPT=n hotplug clean up"
work. :-)

Also, you could add to the patch:
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

(and I'm going to go through the other places where get_cpu should be used)

thanks,

 - Joel


> +	}
> +	/* Quiescent state needed on some other CPU, send IPI. */
>  	ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
> +	put_cpu();
>  	WARN_ON_ONCE(ret);
>  }
>  
>
Paul E. McKenney March 28, 2019, 3:06 p.m. UTC | #7
On Wed, Mar 27, 2019 at 05:38:46PM -0400, Joel Fernandes wrote:
> On Wed, Mar 27, 2019 at 11:44:37AM -0700, Paul E. McKenney wrote:
> > On Wed, Mar 27, 2019 at 01:45:45PM -0400, Joel Fernandes wrote:
> > > On Wed, Mar 27, 2019 at 08:53:51AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Mar 27, 2019 at 11:34:01AM -0400, Joel Fernandes wrote:
> > > > > On Tue, Mar 26, 2019 at 07:47:51PM -0700, Paul E. McKenney wrote:
> > > > > > On Tue, Mar 26, 2019 at 03:24:09PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > In the future we would like to combine the dynticks and dynticks_nesting
> > > > > > > counters thus leading to simplifying the code. At the moment we cannot
> > > > > > > do that due to concerns about usermode upcalls appearing to RCU as half
> > > > > > > of an interrupt. Byungchul tried to do it in [1] but the
> > > > > > > "half-interrupt" concern was raised. It is half because, what RCU
> > > > > > > expects is rcu_irq_enter() and rcu_irq_exit() pairs when the usermode
> > > > > > > exception happens. However, only rcu_irq_enter() is observed. This
> > > > > > > concern may not be valid anymore, but at least it used to be the case.
> > > > > > > 
> > > > > > > Out of abundance of caution, Paul added warnings [2] in the RCU code
> > > > > > > which if not fired by 2021 may allow us to assume that such
> > > > > > > half-interrupt scenario cannot happen any more, which can lead to
> > > > > > > simplification of this code.
> > > > > > > 
> > > > > > > Summary of the changes are the following:
> > > > > > > 
> > > > > > > (1) In preparation for this combination of counters in the future, we
> > > > > > > first need to first be sure that rcu_rrupt_from_idle cannot be called
> > > > > > > from anywhere but a hard-interrupt because previously, the comments
> > > > > > > suggested otherwise so let us be sure. We discussed this here [3]. We
> > > > > > > use the services of lockdep to accomplish this.
> > > > > > > 
> > > > > > > (2) Further rcu_rrupt_from_idle() is not explicit about how it is using
> > > > > > > the counters which can lead to weird future bugs. This patch therefore
> > > > > > > makes it more explicit about the specific counter values being tested
> > > > > > > 
> > > > > > > (3) Lastly, we check for counter underflows just to be sure these are
> > > > > > > not happening, because the previous code in rcu_rrupt_from_idle() was
> > > > > > > allowing the case where the counters can underflow, and the function
> > > > > > > would still return true. Now we are checking for specific values so let
> > > > > > > us be confident by additional checking, that such underflows don't
> > > > > > > happen. Any case, if they do, we should fix them and the screaming
> > > > > > > warning is appropriate. All these checks checks are NOOPs if PROVE_RCU
> > > > > > > and PROVE_LOCKING are disabled.
> > > > > > > 
> > > > > > > [1] https://lore.kernel.org/patchwork/patch/952349/
> > > > > > > [2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts")
> > > > > > > [3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/
> > > > > > > 
> > > > > > > Cc: byungchul.park@lge.com
> > > > > > > Cc: kernel-team@android.com
> > > > > > > Cc: rcu@vger.kernel.org
> > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > 
> > > > > > Color me stupid:
> > > > > > 
> > > > > > [   48.845724] ------------[ cut here ]------------
> > > > > > [   48.846619] Not in hardirq as expected
> > > > > > [   48.847322] WARNING: CPU: 5 PID: 34 at /home/git/linux-2.6-tip/kernel/rcu/tree.c:388 rcu_is_cpu_rrupt_from_idle+0xea/0x110
> > > > > > [   48.849302] Modules linked in:
> > > > > > [   48.849869] CPU: 5 PID: 34 Comm: cpuhp/5 Not tainted 5.1.0-rc1+ #1
> > > > > > [   48.850985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > > > [   48.852436] RIP: 0010:rcu_is_cpu_rrupt_from_idle+0xea/0x110
> > > > > > [   48.853455] Code: 85 c0 0f 85 59 ff ff ff 80 3d 33 55 68 01 00 0f 85 4c ff ff ff 48 c7 c7 48 d8 cc 8e 31 c0 c6 05 1d 55 68 01 01 e8 66 54 f8 ff <0f> 0b e9 30 ff ff ff 65 48 8b 05 df 58 54 72 48 85 c0 0f 94 c0 0f
> > > > > > [   48.856783] RSP: 0000:ffffbc46802dfdc0 EFLAGS: 00010082
> > > > > > [   48.857735] RAX: 000000000000001a RBX: 0000000000022b80 RCX: 0000000000000000
> > > > > > [   48.859028] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8dac906c
> > > > > > [   48.860313] RBP: ffffbc46802dfe20 R08: 0000000000000001 R09: 0000000000000001
> > > > > > [   48.861607] R10: 000000007d53d16d R11: ffffbc46802dfb48 R12: ffff9e7d7eb62b80
> > > > > > [   48.862898] R13: 0000000000000005 R14: ffffffff8dae2ac0 R15: 00000000000000c9
> > > > > > [   48.864191] FS:  0000000000000000(0000) GS:ffff9e7d7eb40000(0000) knlGS:0000000000000000
> > > > > > [   48.865663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [   48.866702] CR2: 0000000000000000 CR3: 0000000021022000 CR4: 00000000000006e0
> > > > > > [   48.867993] Call Trace:
> > > > > > [   48.868450]  rcu_exp_handler+0x35/0x90
> > > > > > [   48.869147]  generic_exec_single+0xab/0x100
> > > > > > [   48.869918]  ? rcu_barrier+0x240/0x240
> > > > > > [   48.870607]  smp_call_function_single+0x8e/0xd0
> > > > > > [   48.871441]  rcutree_online_cpu+0x80/0x90
> > > > > > [   48.872181]  cpuhp_invoke_callback+0xb5/0x890
> > > > > > [   48.872979]  cpuhp_thread_fun+0x172/0x210
> > > > > > [   48.873722]  ? cpuhp_thread_fun+0x2a/0x210
> > > > > > [   48.874474]  smpboot_thread_fn+0x10d/0x160
> > > > > > [   48.875224]  kthread+0xf3/0x130
> > > > > > [   48.875804]  ? sort_range+0x20/0x20
> > > > > > [   48.876446]  ? kthread_cancel_delayed_work_sync+0x10/0x10
> > > > > > [   48.877445]  ret_from_fork+0x3a/0x50
> > > > > > [   48.878124] irq event stamp: 734
> > > > > > [   48.878717] hardirqs last  enabled at (733): [<ffffffff8e4f332d>] _raw_spin_unlock_irqrestore+0x2d/0x40
> > > > > > [   48.880402] hardirqs last disabled at (734): [<ffffffff8db0110a>] generic_exec_single+0x9a/0x100
> > > > > > [   48.881986] softirqs last  enabled at (0): [<ffffffff8da5feaf>] copy_process.part.56+0x61f/0x2110
> > > > > > [   48.883540] softirqs last disabled at (0): [<0000000000000000>]           (null)
> > > > > > [   48.884840] ---[ end trace 00b4c1d2f816f4ed ]---
> > > > > > 
> > > > > > If a CPU invokes generic_exec_single() on itself, the "IPI handler" will
> > > > > > be invoked directly, triggering your new lockdep check.  Which is a bit
> > > > > > wasteful.  My thought is to add code to sync_rcu_exp_select_node_cpus()
> > > > > > to check the CPU with preemption disabled, avoiding the call to
> > > > > > smp_call_function_single() in that case.
> > > > > > 
> > > > > > I have queued all four of your patches, and am trying the fix to
> > > > > > the caller of smp_call_function_single() shown below.  Thoughts?
> > > > > 
> > > > > Oh interesting. Your fix makes sense. I will go through these paths more as
> > > > > well since I'm not super familiar with this area of the RCU code. But I had
> > > > > one small nit below.
> > > > 
> > > > Very good, applying that change.  I have a similar issue in the CPU-hotplug
> > > > code that I will also be fixing.
> > > > 
> > > > Are there other places where I should be using get_cpu()?
> > > 
> > > I will check other usages. I wonder if this path is problematic:
> > > 
> > > rcu_do_batch AIUI can be called from process-context if boost is enabled.
> > > In that case rcu_do_batch()-> invoke_rcu_core()-> smp_processor_id() might be
> > > problematic. I will double confirm this situation is possible and send a
> > > get/put_cpu patch as well if that's the case. Other paths seem to be
> > > disabling interrupts or softirqs so they are fine. But I will go through it
> > > in more detail later today (sorry for slow responses, currently catching a plane).
> > 
> > The theory is that the case where it is invoked from process context,
> > it is invoked from an rcuc kthread, which is bound to a single CPU.
> > Wouldn't hurt to check, though!
> > 
> > > CONFIG_DEBUG_PREEMPT should be able to catch these kinds of issues since
> > > smp_processor_id() checks this internally. And it seems rcutorture configs do
> > > enable these, so it may not be an issue after all, or that DEBUG_PREEMPT
> > > checking needs some investigation to see why it doesn't warn if at all :-)
> > 
> > Or maybe I don't have CONFIG_DEBUG_PREEMPT enabled on the scenario that
> > needs it.  ;-)
> > 
> > And please see below for an additional patch to make the world safe for
> > rcu_is_cpu_rrupt_from_idle().  ;-)
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit a8d8c1e6e09a9a9521e3248a92f5fbb9eb2cf988
> > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > Date:   Wed Mar 27 10:03:12 2019 -0700
> > 
> >     rcu: Avoid self-IPI in sync_sched_exp_online_cleanup()
> >     
> >     The sync_sched_exp_online_cleanup() is invoked at online time to handle
> >     the case where the start of an expedited grace period ran concurrently
> >     with a CPU being taken offline and then immediately being placed online.
> >     It checks to see if RCU needs an expedited quiescent state from the
> >     incoming CPU, sending it an IPI if so.  However, it is quite possible
> >     that sync_sched_exp_online_cleanup() is running on that CPU, in which
> >     case it is considerably less overhead to simply request the quiescent
> >     state locally instead of simulating a self-IPI.
> >     
> >     This commit therefore places the last few lines of rcu_exp_handler()
> >     into a new rcu_exp_need_qs() function, which is invoked both by
> >     rcu_exp_handler() and by sync_sched_exp_online_cleanup() in the self-IPI
> >     case.
> >     
> >     This also reduces the rcu_exp_handler() function's state space by
> >     removing the direct call that this smp_call_function_single() uses to
> >     emulate the requested self-IPI.  This in turn will allow tighter error
> >     checking in rcu_is_cpu_rrupt_from_idle().
> >     
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > 
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 5390618787b6..de1b4acf6979 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -699,6 +699,16 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> >  
> >  #else /* #ifdef CONFIG_PREEMPT_RCU */
> >  
> > +/* Request an expedited quiescent state. */
> > +static void rcu_exp_need_qs(void)
> > +{
> > +	__this_cpu_write(rcu_data.cpu_no_qs.b.exp, true);
> > +	/* Store .exp before .rcu_urgent_qs. */
> > +	smp_store_release(this_cpu_ptr(&rcu_data.rcu_urgent_qs), true);
> > +	set_tsk_need_resched(current);
> > +	set_preempt_need_resched();
> > +}
> > +
> >  /* Invoked on each online non-idle CPU for expedited quiescent state. */
> >  static void rcu_exp_handler(void *unused)
> >  {
> > @@ -714,25 +724,38 @@ static void rcu_exp_handler(void *unused)
> >  		rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
> >  		return;
> >  	}
> > -	__this_cpu_write(rcu_data.cpu_no_qs.b.exp, true);
> > -	/* Store .exp before .rcu_urgent_qs. */
> > -	smp_store_release(this_cpu_ptr(&rcu_data.rcu_urgent_qs), true);
> > -	set_tsk_need_resched(current);
> > -	set_preempt_need_resched();
> > +	rcu_exp_need_qs();
> >  }
> >  
> >  /* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
> >  static void sync_sched_exp_online_cleanup(int cpu)
> >  {
> > +	unsigned long flags;
> > +	int my_cpu;
> >  	struct rcu_data *rdp;
> >  	int ret;
> >  	struct rcu_node *rnp;
> >  
> >  	rdp = per_cpu_ptr(&rcu_data, cpu);
> >  	rnp = rdp->mynode;
> > -	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask))
> > +	my_cpu = get_cpu();
> > +	/* Quiescent state either not needed or already requested, leave. */
> > +	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> > +	    __this_cpu_read(rcu_data.cpu_no_qs.b.exp)) {
> > +		put_cpu();
> >  		return;
> > +	}
> > +	/* Quiescent state needed on current CPU, so set it up locally. */
> > +	if (my_cpu == cpu) {
> > +		local_irq_save(flags);
> > +		rcu_exp_need_qs();
> > +		local_irq_restore(flags);
> > +		put_cpu();
> > +		return;
> 
> This looks good to me, thanks. I love it that we can avoid the self-ipi and
> reduce the overhead, and nice to see the lockdep check we added triggered
> this optimization.

Here is hoping...  Passed light testing overnight, which is a good sign.

> I still need to go through and understand the "PREEMPT=n hotplug clean up"
> work. :-)

A review of that code would be quite welcome!

> Also, you could add to the patch:
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Done, thank you!

> (and I'm going to go through the other places where get_cpu should be used)

Very good, looking forward to it!

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> > +	}
> > +	/* Quiescent state needed on some other CPU, send IPI. */
> >  	ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
> > +	put_cpu();
> >  	WARN_ON_ONCE(ret);
> >  }
> >  
> > 
>

Patch
diff mbox series

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9180158756d2..dbff8a274c46 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -381,16 +381,29 @@  static void __maybe_unused rcu_momentary_dyntick_idle(void)
 }
 
 /**
- * rcu_is_cpu_rrupt_from_idle - see if idle or immediately interrupted from idle
+ * rcu_is_cpu_rrupt_from_idle - see if interrupted from idle
  *
- * If the current CPU is idle or running at a first-level (not nested)
+ * If the current CPU is idle and running at a first-level (not nested)
  * interrupt from idle, return true.  The caller must have at least
  * disabled preemption.
  */
 static int rcu_is_cpu_rrupt_from_idle(void)
 {
-	return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
-	       __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
+	/* Called only from within the scheduling-clock interrupt */
+	lockdep_assert_in_irq();
+
+	/* Check for counter underflows */
+	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
+			 "RCU dynticks_nesting counter underflow!");
+	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
+			 "RCU dynticks_nmi_nesting counter underflow/zero!");
+
+	/* Are we at first interrupt nesting level? */
+	if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
+		return false;
+
+	/* Does CPU appear to be idle from an RCU standpoint? */
+	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
 }
 
 #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch. */