diff mbox series

[v7,4/6] rcu: Add RCU stall diagnosis information

Message ID 20221111130709.247-5-thunder.leizhen@huawei.com (mailing list archive)
State Superseded
Series rcu: Add RCU stall diagnosis information

Commit Message

Leizhen (ThunderTown) Nov. 11, 2022, 1:07 p.m. UTC
Because RCU CPU stall warnings are driven from the scheduling-clock
interrupt handler, a workload consisting of a very large number of
short-duration hardware interrupts can result in misleading stall-warning
messages.  On systems supporting only a single level of interrupts,
that is, where interrupts handlers cannot be interrupted, this can
produce misleading diagnostics.  The stack traces will show the
innocent-bystander interrupted task, not the interrupts that are
at the very least exacerbating the stall.

This situation can be improved by displaying the number of interrupts
and the CPU time that they have consumed.  Diagnosing other types
of stalls can be eased by also providing the count of softirqs and
the CPU time that they consumed as well as the number of context
switches and the task-level CPU time consumed.

Consider the following output given this change:

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:     0-....: (1250 ticks this GP) <omitted>
rcu:          hardirqs   softirqs   csw/system
rcu:  number:      624         45            0
rcu: cputime:       69          1         2425   ==> 2500(ms)

This output shows that the numbers of hard and soft interrupts are small,
that there were no context switches, and that almost all of the CPU time was
consumed at task level in the kernel ("system"). This indicates that the
current task is looping with preemption disabled.

The impact on system performance is negligible because the snapshot is
recorded only once for all continuous RCU stalls.

This added debugging information is suppressed by default and can be
enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or
by booting with rcupdate.rcu_cpu_stall_cputime=1.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  6 ++++
 kernel/rcu/Kconfig.debug                      | 11 +++++++
 kernel/rcu/rcu.h                              |  1 +
 kernel/rcu/tree.c                             | 18 +++++++++++
 kernel/rcu/tree.h                             | 19 ++++++++++++
 kernel/rcu/tree_stall.h                       | 31 +++++++++++++++++++
 kernel/rcu/update.c                           |  2 ++
 7 files changed, 88 insertions(+)

Comments

Frederic Weisbecker Nov. 14, 2022, 11:24 a.m. UTC | #1
On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
> Because RCU CPU stall warnings are driven from the scheduling-clock
> interrupt handler, a workload consisting of a very large number of
> short-duration hardware interrupts can result in misleading stall-warning
> messages.  On systems supporting only a single level of interrupts,
> that is, where interrupts handlers cannot be interrupted, this can
> produce misleading diagnostics.  The stack traces will show the
> innocent-bystander interrupted task, not the interrupts that are
> at the very least exacerbating the stall.
> 
> This situation can be improved by displaying the number of interrupts
> and the CPU time that they have consumed.  Diagnosing other types
> of stalls can be eased by also providing the count of softirqs and
> the CPU time that they consumed as well as the number of context
> switches and the task-level CPU time consumed.
> 
> Consider the following output given this change:
> 
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu:     0-....: (1250 ticks this GP) <omitted>
> rcu:          hardirqs   softirqs   csw/system
> rcu:  number:      624         45            0
> rcu: cputime:       69          1         2425   ==> 2500(ms)
> 
> This output shows that the number of hard and soft interrupts is small,
> there are no context switches, and the system takes up a lot of time. This
> indicates that the current task is looping with preemption disabled.
> 
> The impact on system performance is negligible because snapshot is
> recorded only once for all continuous RCU stalls.
> 
> This added debugging information is suppressed by default and can be
> enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or
> by booting with rcupdate.rcu_cpu_stall_cputime=1.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  .../admin-guide/kernel-parameters.txt         |  6 ++++
>  kernel/rcu/Kconfig.debug                      | 11 +++++++
>  kernel/rcu/rcu.h                              |  1 +
>  kernel/rcu/tree.c                             | 18 +++++++++++
>  kernel/rcu/tree.h                             | 19 ++++++++++++
>  kernel/rcu/tree_stall.h                       | 31 +++++++++++++++++++
>  kernel/rcu/update.c                           |  2 ++
>  7 files changed, 88 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 811b2e6d4672685..ee7d9d962591c5d 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5084,6 +5084,12 @@
>  			rcupdate.rcu_cpu_stall_timeout to be used (after
>  			conversion from seconds to milliseconds).
>  
> +	rcupdate.rcu_cpu_stall_cputime= [KNL]
> +			Provide statistics on the cputime and count of
> +			interrupts and tasks during the sampling period. For
> +			multiple continuous RCU stalls, all sampling periods
> +			begin at half of the first RCU stall timeout.
> +
>  	rcupdate.rcu_expedited= [KNL]
>  			Use expedited grace-period primitives, for
>  			example, synchronize_rcu_expedited() instead
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 1b0c41d490f0588..025566a9ba44667 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -95,6 +95,17 @@ config RCU_EXP_CPU_STALL_TIMEOUT
>  	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
>  	  seconds to milliseconds.
>  
> +config RCU_CPU_STALL_CPUTIME
> +	bool "Provide additional RCU stall debug information"
> +	depends on RCU_STALL_COMMON
> +	default n
> +	help
> +	  Collect statistics during the sampling period, such as the number of
> +	  (hard interrupts, soft interrupts, task switches) and the cputime of
> +	  (hard interrupts, soft interrupts, kernel tasks) are added to the
> +	  RCU stall report. For multiple continuous RCU stalls, all sampling
> +	  periods begin at half of the first RCU stall timeout.
> +
>  config RCU_TRACE
>  	bool "Enable tracing for RCU"
>  	depends on DEBUG_KERNEL
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 96122f203187f39..4844dec36bddb48 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -231,6 +231,7 @@ extern int rcu_cpu_stall_ftrace_dump;
>  extern int rcu_cpu_stall_suppress;
>  extern int rcu_cpu_stall_timeout;
>  extern int rcu_exp_cpu_stall_timeout;
> +extern int rcu_cpu_stall_cputime;
>  int rcu_jiffies_till_stall_check(void);
>  int rcu_exp_jiffies_till_stall_check(void);
>  
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index ed93ddb8203d42c..3921aacfd421ba9 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -866,6 +866,24 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  			rdp->rcu_iw_gp_seq = rnp->gp_seq;
>  			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
>  		}
> +
> +		if (rcu_cpu_stall_cputime && rdp->snap_record.gp_seq != rdp->gp_seq) {
> +			int cpu = rdp->cpu;
> +			struct rcu_snap_record *rsrp;
> +			struct kernel_cpustat *kcsp;
> +
> +			kcsp = &kcpustat_cpu(cpu);
> +
> +			rsrp = &rdp->snap_record;
> +			rsrp->cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
> +			rsrp->cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
> +			rsrp->cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
> +			rsrp->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
> +			rsrp->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);

Getting the sum of all CPUs' IRQs, with even two iterations over all of them, looks
costly. So I have to ask: why is this information useful, and why can't we deduce
it from the other CPUs' stall reports?

I'm also asking because this rcu_cpu_stall_cputime is likely to be very useful for
distros, to the point that I expect it to be turned on by default, since taking a
snapshot of the kcpustat fields is cheap. But doing that system-wide CPU snapshot is
definitely going to be an unbearable overhead.

Thanks.
Leizhen (ThunderTown) Nov. 14, 2022, 12:32 p.m. UTC | #2
On 2022/11/14 19:24, Frederic Weisbecker wrote:
> On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
>> Because RCU CPU stall warnings are driven from the scheduling-clock
>> interrupt handler, a workload consisting of a very large number of
>> short-duration hardware interrupts can result in misleading stall-warning
>> messages.  On systems supporting only a single level of interrupts,
>> that is, where interrupts handlers cannot be interrupted, this can
>> produce misleading diagnostics.  The stack traces will show the
>> innocent-bystander interrupted task, not the interrupts that are
>> at the very least exacerbating the stall.
>>
>> This situation can be improved by displaying the number of interrupts
>> and the CPU time that they have consumed.  Diagnosing other types
>> of stalls can be eased by also providing the count of softirqs and
>> the CPU time that they consumed as well as the number of context
>> switches and the task-level CPU time consumed.
>>
>> Consider the following output given this change:
>>
>> rcu: INFO: rcu_preempt self-detected stall on CPU
>> rcu:     0-....: (1250 ticks this GP) <omitted>
>> rcu:          hardirqs   softirqs   csw/system
>> rcu:  number:      624         45            0
>> rcu: cputime:       69          1         2425   ==> 2500(ms)
>>
>> This output shows that the number of hard and soft interrupts is small,
>> there are no context switches, and the system takes up a lot of time. This
>> indicates that the current task is looping with preemption disabled.
>>
>> The impact on system performance is negligible because snapshot is
>> recorded only once for all continuous RCU stalls.
>>
>> This added debugging information is suppressed by default and can be
>> enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or
>> by booting with rcupdate.rcu_cpu_stall_cputime=1.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>> ---
>>  .../admin-guide/kernel-parameters.txt         |  6 ++++
>>  kernel/rcu/Kconfig.debug                      | 11 +++++++
>>  kernel/rcu/rcu.h                              |  1 +
>>  kernel/rcu/tree.c                             | 18 +++++++++++
>>  kernel/rcu/tree.h                             | 19 ++++++++++++
>>  kernel/rcu/tree_stall.h                       | 31 +++++++++++++++++++
>>  kernel/rcu/update.c                           |  2 ++
>>  7 files changed, 88 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 811b2e6d4672685..ee7d9d962591c5d 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -5084,6 +5084,12 @@
>>  			rcupdate.rcu_cpu_stall_timeout to be used (after
>>  			conversion from seconds to milliseconds).
>>  
>> +	rcupdate.rcu_cpu_stall_cputime= [KNL]
>> +			Provide statistics on the cputime and count of
>> +			interrupts and tasks during the sampling period. For
>> +			multiple continuous RCU stalls, all sampling periods
>> +			begin at half of the first RCU stall timeout.
>> +
>>  	rcupdate.rcu_expedited= [KNL]
>>  			Use expedited grace-period primitives, for
>>  			example, synchronize_rcu_expedited() instead
>> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
>> index 1b0c41d490f0588..025566a9ba44667 100644
>> --- a/kernel/rcu/Kconfig.debug
>> +++ b/kernel/rcu/Kconfig.debug
>> @@ -95,6 +95,17 @@ config RCU_EXP_CPU_STALL_TIMEOUT
>>  	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
>>  	  seconds to milliseconds.
>>  
>> +config RCU_CPU_STALL_CPUTIME
>> +	bool "Provide additional RCU stall debug information"
>> +	depends on RCU_STALL_COMMON
>> +	default n
>> +	help
>> +	  Collect statistics during the sampling period, such as the number of
>> +	  (hard interrupts, soft interrupts, task switches) and the cputime of
>> +	  (hard interrupts, soft interrupts, kernel tasks) are added to the
>> +	  RCU stall report. For multiple continuous RCU stalls, all sampling
>> +	  periods begin at half of the first RCU stall timeout.
>> +
>>  config RCU_TRACE
>>  	bool "Enable tracing for RCU"
>>  	depends on DEBUG_KERNEL
>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
>> index 96122f203187f39..4844dec36bddb48 100644
>> --- a/kernel/rcu/rcu.h
>> +++ b/kernel/rcu/rcu.h
>> @@ -231,6 +231,7 @@ extern int rcu_cpu_stall_ftrace_dump;
>>  extern int rcu_cpu_stall_suppress;
>>  extern int rcu_cpu_stall_timeout;
>>  extern int rcu_exp_cpu_stall_timeout;
>> +extern int rcu_cpu_stall_cputime;
>>  int rcu_jiffies_till_stall_check(void);
>>  int rcu_exp_jiffies_till_stall_check(void);
>>  
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index ed93ddb8203d42c..3921aacfd421ba9 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -866,6 +866,24 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>>  			rdp->rcu_iw_gp_seq = rnp->gp_seq;
>>  			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
>>  		}
>> +
>> +		if (rcu_cpu_stall_cputime && rdp->snap_record.gp_seq != rdp->gp_seq) {
>> +			int cpu = rdp->cpu;
>> +			struct rcu_snap_record *rsrp;
>> +			struct kernel_cpustat *kcsp;
>> +
>> +			kcsp = &kcpustat_cpu(cpu);
>> +
>> +			rsrp = &rdp->snap_record;
>> +			rsrp->cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
>> +			rsrp->cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
>> +			rsrp->cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
>> +			rsrp->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
>> +			rsrp->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
> 
> Getting the sum of all CPU's IRQs, with even two iterations on all of them, look
> costly. So I have to ask: why is this information useful and why can't we deduce
> it from other CPUs stall reports?

Only the RCU-stalled CPUs are recorded. Why all CPUs?

static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
{
	rcu_for_each_leaf_node(rnp) {
		if (rnp->qsmask == 0) {
			continue;
		}
		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
			if (f(rdp))

> 
> I'm also asking because this rcu_cpu_stall_cputime is likely to be very useful for
> distros, to the point that I expect it to be turned on by default as doing a
> snapshot of kcpustat fields is cheap. But doing that wide CPU snapshot is
> definetly going to be an unbearable overhead.

I purposely added a test printk; snapshots are taken and differentials computed
only for the RCU-stalled CPUs.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d1f0d857dc85df5..693e7c83bd17d1e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -872,6 +872,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
                        struct rcu_snap_record *rsrp;
                        struct kernel_cpustat *kcsp;

+                       printk("fixme: cpu=%d\n", smp_processor_id());
                        kcsp = &kcpustat_cpu(cpu);

                        rsrp = &rdp->snap_record;

> 
> Thanks.
> .
>
Frederic Weisbecker Nov. 14, 2022, 12:46 p.m. UTC | #3
On Mon, Nov 14, 2022 at 08:32:19PM +0800, Leizhen (ThunderTown) wrote:
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index ed93ddb8203d42c..3921aacfd421ba9 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -866,6 +866,24 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >>  			rdp->rcu_iw_gp_seq = rnp->gp_seq;
> >>  			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
> >>  		}
> >> +
> >> +		if (rcu_cpu_stall_cputime && rdp->snap_record.gp_seq != rdp->gp_seq) {
> >> +			int cpu = rdp->cpu;
> >> +			struct rcu_snap_record *rsrp;
> >> +			struct kernel_cpustat *kcsp;
> >> +
> >> +			kcsp = &kcpustat_cpu(cpu);
> >> +
> >> +			rsrp = &rdp->snap_record;
> >> +			rsrp->cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
> >> +			rsrp->cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
> >> +			rsrp->cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
> >> +			rsrp->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
> >> +			rsrp->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
> > 
> > Getting the sum of all CPU's IRQs, with even two iterations on all of them, look
> > costly. So I have to ask: why is this information useful and why can't we deduce
> > it from other CPUs stall reports?
> 
> Only the RCU stalled CPUs are recorded. Why all CPUs?

Bah, I misread the contents of kstat_cpu_softirqs_sum() and kstat_cpu_irqs_sum().
Sorry about that, my brainfart... :-)
Frederic Weisbecker Nov. 16, 2022, 10:39 p.m. UTC | #4
On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
> @@ -262,6 +279,8 @@ struct rcu_data {
>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
> +					    /* the first RCU stall timeout */

This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME

> +static void print_cpu_stat_info(int cpu)
> +{
> +	struct rcu_snap_record rsr, *rsrp;
> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +	struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);
> +
> +	if (!rcu_cpu_stall_cputime)
> +		return;
> +
> +	rsrp = &rdp->snap_record;
> +	if (rsrp->gp_seq != rdp->gp_seq)
> +		return;
> +
> +	rsr.cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
> +	rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
> +	rsr.cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
> +
> +	pr_err("\t         hardirqs   softirqs   csw/system\n");
> +	pr_err("\t number: %8ld %10d %12lld\n",
> +		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
> +		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
> +		nr_context_switches_cpu(cpu) - rsrp->nr_csw);
> +	pr_err("\tcputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
> +		div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC),
> +		div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC),
> +		div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC),
> +		jiffies64_to_msecs(jiffies - rsrp->jiffies));

jiffies_to_msecs() should be enough.

Thanks.
Leizhen (ThunderTown) Nov. 17, 2022, 1:57 a.m. UTC | #5
On 2022/11/17 6:39, Frederic Weisbecker wrote:
> On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
>> @@ -262,6 +279,8 @@ struct rcu_data {
>>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
>>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
>>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
>> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
>> +					    /* the first RCU stall timeout */
> 
> This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME

This will not work for now because we also support the boot option rcupdate.rcu_cpu_stall_cputime.

> 
>> +static void print_cpu_stat_info(int cpu)
>> +{
>> +	struct rcu_snap_record rsr, *rsrp;
>> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>> +	struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);
>> +
>> +	if (!rcu_cpu_stall_cputime)
>> +		return;
>> +
>> +	rsrp = &rdp->snap_record;
>> +	if (rsrp->gp_seq != rdp->gp_seq)
>> +		return;
>> +
>> +	rsr.cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
>> +	rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
>> +	rsr.cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
>> +
>> +	pr_err("\t         hardirqs   softirqs   csw/system\n");
>> +	pr_err("\t number: %8ld %10d %12lld\n",
>> +		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
>> +		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
>> +		nr_context_switches_cpu(cpu) - rsrp->nr_csw);
>> +	pr_err("\tcputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
>> +		div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC),
>> +		div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC),
>> +		div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC),
>> +		jiffies64_to_msecs(jiffies - rsrp->jiffies));
> 
> jiffies_to_msecs() should be enough.

OK, thanks.

> 
> Thanks.
> 
> .
>
Frederic Weisbecker Nov. 17, 2022, 12:22 p.m. UTC | #6
On Thu, Nov 17, 2022 at 09:57:18AM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/11/17 6:39, Frederic Weisbecker wrote:
> > On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
> >> @@ -262,6 +279,8 @@ struct rcu_data {
> >>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
> >>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
> >>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
> >> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
> >> +					    /* the first RCU stall timeout */
> > 
> > This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME
> 
> This will not work for now because we also support boot option
> rcupdate.rcu_cpu_stall_cputime.

I'm confused. If CONFIG_RCU_CPU_STALL_CPUTIME=n then rcupdate.rcu_cpu_stall_cputime has
no effect, right?

Thanks.

> 
> > 
> >> +static void print_cpu_stat_info(int cpu)
> >> +{
> >> +	struct rcu_snap_record rsr, *rsrp;
> >> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> >> +	struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);
> >> +
> >> +	if (!rcu_cpu_stall_cputime)
> >> +		return;
> >> +
> >> +	rsrp = &rdp->snap_record;
> >> +	if (rsrp->gp_seq != rdp->gp_seq)
> >> +		return;
> >> +
> >> +	rsr.cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
> >> +	rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
> >> +	rsr.cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
> >> +
> >> +	pr_err("\t         hardirqs   softirqs   csw/system\n");
> >> +	pr_err("\t number: %8ld %10d %12lld\n",
> >> +		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
> >> +		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
> >> +		nr_context_switches_cpu(cpu) - rsrp->nr_csw);
> >> +	pr_err("\tcputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
> >> +		div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC),
> >> +		div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC),
> >> +		div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC),
> >> +		jiffies64_to_msecs(jiffies - rsrp->jiffies));
> > 
> > jiffies_to_msecs() should be enough.
> 
> OK, thanks.
> 
> > 
> > Thanks.
> > 
> > .
> > 
> 
> -- 
> Regards,
>   Zhen Lei
Leizhen (ThunderTown) Nov. 17, 2022, 1:25 p.m. UTC | #7
On 2022/11/17 20:22, Frederic Weisbecker wrote:
> On Thu, Nov 17, 2022 at 09:57:18AM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/11/17 6:39, Frederic Weisbecker wrote:
>>> On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
>>>> @@ -262,6 +279,8 @@ struct rcu_data {
>>>>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
>>>>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
>>>>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
>>>> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
>>>> +					    /* the first RCU stall timeout */
>>>
>>> This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME
>>
>> This will not work for now because we also support boot option
>> rcupdate.rcu_cpu_stall_cputime.
> 
> I'm confused. If CONFIG_RCU_CPU_STALL_CPUTIME=n then rcupdate.rcu_cpu_stall_cputime has
> no effect, right?

No, rcupdate.rcu_cpu_stall_cputime overrides CONFIG_RCU_CPU_STALL_CPUTIME. Because
the default value of CONFIG_RCU_CPU_STALL_CPUTIME is n, in most cases we need
rcupdate.rcu_cpu_stall_cputime as the escape route.

If CONFIG_RCU_CPU_STALL_CPUTIME=y is default, your suggestion is more appropriate.

> 
> Thanks.
> 
>>
>>>
>>>> +static void print_cpu_stat_info(int cpu)
>>>> +{
>>>> +	struct rcu_snap_record rsr, *rsrp;
>>>> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>>>> +	struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);
>>>> +
>>>> +	if (!rcu_cpu_stall_cputime)
>>>> +		return;
>>>> +
>>>> +	rsrp = &rdp->snap_record;
>>>> +	if (rsrp->gp_seq != rdp->gp_seq)
>>>> +		return;
>>>> +
>>>> +	rsr.cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
>>>> +	rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
>>>> +	rsr.cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
>>>> +
>>>> +	pr_err("\t         hardirqs   softirqs   csw/system\n");
>>>> +	pr_err("\t number: %8ld %10d %12lld\n",
>>>> +		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
>>>> +		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
>>>> +		nr_context_switches_cpu(cpu) - rsrp->nr_csw);
>>>> +	pr_err("\tcputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
>>>> +		div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC),
>>>> +		div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC),
>>>> +		div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC),
>>>> +		jiffies64_to_msecs(jiffies - rsrp->jiffies));
>>>
>>> jiffies_to_msecs() should be enough.
>>
>> OK, thanks.
>>
>>>
>>> Thanks.
>>>
>>> .
>>>
>>
>> -- 
>> Regards,
>>   Zhen Lei
> .
>
Frederic Weisbecker Nov. 17, 2022, 2:26 p.m. UTC | #8
On Thu, Nov 17, 2022 at 09:25:44PM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/11/17 20:22, Frederic Weisbecker wrote:
> > On Thu, Nov 17, 2022 at 09:57:18AM +0800, Leizhen (ThunderTown) wrote:
> >>
> >>
> >> On 2022/11/17 6:39, Frederic Weisbecker wrote:
> >>> On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
> >>>> @@ -262,6 +279,8 @@ struct rcu_data {
> >>>>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
> >>>>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
> >>>>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
> >>>> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
> >>>> +					    /* the first RCU stall timeout */
> >>>
> >>> This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME
> >>
> >> This will not work for now because we also support boot option
> >> rcupdate.rcu_cpu_stall_cputime.
> > 
> > I'm confused. If CONFIG_RCU_CPU_STALL_CPUTIME=n then rcupdate.rcu_cpu_stall_cputime has
> > no effect, right?
> 
> No, rcupdate.rcu_cpu_stall_cputime override CONFIG_RCU_CPU_STALL_CPUTIME. Because
> the default value of CONFIG_RCU_CPU_STALL_CPUTIME is n, so in most cases, we need
> rcupdate.rcu_cpu_stall_cputime as the escape route.
> 
> If CONFIG_RCU_CPU_STALL_CPUTIME=y is default, your suggestion is more
> appropriate.

Oh OK, I thought it was purely a Kconfig switch.

Then please just mention that rcupdate.rcu_cpu_stall_cputime overrides
CONFIG_RCU_CPU_STALL_CPUTIME behaviour in the Kconfig help text.

Thanks.
Leizhen (ThunderTown) Nov. 18, 2022, 2:03 a.m. UTC | #9
On 2022/11/17 22:26, Frederic Weisbecker wrote:
> On Thu, Nov 17, 2022 at 09:25:44PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/11/17 20:22, Frederic Weisbecker wrote:
>>> On Thu, Nov 17, 2022 at 09:57:18AM +0800, Leizhen (ThunderTown) wrote:
>>>>
>>>>
>>>> On 2022/11/17 6:39, Frederic Weisbecker wrote:
>>>>> On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
>>>>>> @@ -262,6 +279,8 @@ struct rcu_data {
>>>>>>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
>>>>>>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
>>>>>>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
>>>>>> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
>>>>>> +					    /* the first RCU stall timeout */
>>>>>
>>>>> This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME
>>>>
>>>> This will not work for now because we also support boot option
>>>> rcupdate.rcu_cpu_stall_cputime.
>>>
>>> I'm confused. If CONFIG_RCU_CPU_STALL_CPUTIME=n then rcupdate.rcu_cpu_stall_cputime has
>>> no effect, right?
>>
>> No, rcupdate.rcu_cpu_stall_cputime override CONFIG_RCU_CPU_STALL_CPUTIME. Because
>> the default value of CONFIG_RCU_CPU_STALL_CPUTIME is n, so in most cases, we need
>> rcupdate.rcu_cpu_stall_cputime as the escape route.
>>
>> If CONFIG_RCU_CPU_STALL_CPUTIME=y is default, your suggestion is more
>> appropriate.
> 
> Oh ok I thought it was a support Kconfig switch.
> 
> Then please just mention that rcupdate.rcu_cpu_stall_cputime overrides
> CONFIG_RCU_CPU_STALL_CPUTIME behaviour in the Kconfig help text.

Okay, I'll add the description.

> 
> Thanks.
> .
>

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 811b2e6d4672685..ee7d9d962591c5d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5084,6 +5084,12 @@ 
 			rcupdate.rcu_cpu_stall_timeout to be used (after
 			conversion from seconds to milliseconds).
 
+	rcupdate.rcu_cpu_stall_cputime= [KNL]
+			Provide statistics on the cputime and count of
+			interrupts and tasks during the sampling period. For
+			multiple continuous RCU stalls, all sampling periods
+			begin at half of the first RCU stall timeout.
+
 	rcupdate.rcu_expedited= [KNL]
 			Use expedited grace-period primitives, for
 			example, synchronize_rcu_expedited() instead
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 1b0c41d490f0588..025566a9ba44667 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -95,6 +95,17 @@  config RCU_EXP_CPU_STALL_TIMEOUT
 	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
 	  seconds to milliseconds.
 
+config RCU_CPU_STALL_CPUTIME
+	bool "Provide additional RCU stall debug information"
+	depends on RCU_STALL_COMMON
+	default n
+	help
+	  Collect statistics during the sampling period, such as the number of
+	  (hard interrupts, soft interrupts, task switches) and the cputime of
+	  (hard interrupts, soft interrupts, kernel tasks) are added to the
+	  RCU stall report. For multiple continuous RCU stalls, all sampling
+	  periods begin at half of the first RCU stall timeout.
+
 config RCU_TRACE
 	bool "Enable tracing for RCU"
 	depends on DEBUG_KERNEL
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 96122f203187f39..4844dec36bddb48 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -231,6 +231,7 @@  extern int rcu_cpu_stall_ftrace_dump;
 extern int rcu_cpu_stall_suppress;
 extern int rcu_cpu_stall_timeout;
 extern int rcu_exp_cpu_stall_timeout;
+extern int rcu_cpu_stall_cputime;
 int rcu_jiffies_till_stall_check(void);
 int rcu_exp_jiffies_till_stall_check(void);
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ed93ddb8203d42c..3921aacfd421ba9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -866,6 +866,24 @@  static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 			rdp->rcu_iw_gp_seq = rnp->gp_seq;
 			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
 		}
+
+		if (rcu_cpu_stall_cputime && rdp->snap_record.gp_seq != rdp->gp_seq) {
+			int cpu = rdp->cpu;
+			struct rcu_snap_record *rsrp;
+			struct kernel_cpustat *kcsp;
+
+			kcsp = &kcpustat_cpu(cpu);
+
+			rsrp = &rdp->snap_record;
+			rsrp->cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
+			rsrp->cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
+			rsrp->cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
+			rsrp->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
+			rsrp->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
+			rsrp->nr_csw = nr_context_switches_cpu(rdp->cpu);
+			rsrp->jiffies = jiffies;
+			rsrp->gp_seq = rdp->gp_seq;
+		}
 	}
 
 	return 0;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index fcb5d696eb1700d..192536916f9a607 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -158,6 +158,23 @@  union rcu_noqs {
 	u16 s; /* Set of bits, aggregate OR here. */
 };
 
+/*
+ * Record the snapshot of the core stats at half of the first RCU stall timeout.
+ * The member gp_seq is used to ensure that all members are updated only once
+ * during the sampling period. The snapshot is taken only if this gp_seq is not
+ * equal to rdp->gp_seq.
+ */
+struct rcu_snap_record {
+	unsigned long	gp_seq;		/* Track rdp->gp_seq counter */
+	u64		cputime_irq;	/* Accumulated cputime of hard irqs */
+	u64		cputime_softirq;/* Accumulated cputime of soft irqs */
+	u64		cputime_system; /* Accumulated cputime of kernel tasks */
+	unsigned long	nr_hardirqs;	/* Accumulated number of hard irqs */
+	unsigned int	nr_softirqs;	/* Accumulated number of soft irqs */
+	unsigned long long nr_csw;	/* Accumulated number of task switches */
+	unsigned long   jiffies;	/* Track jiffies value */
+};
+
 /* Per-CPU data for read-copy update. */
 struct rcu_data {
 	/* 1) quiescent-state and grace-period handling : */
@@ -262,6 +279,8 @@  struct rcu_data {
 	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
 	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
 	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
+	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
+					    /* the first RCU stall timeout */
 
 	long lazy_len;			/* Length of buffered lazy callbacks. */
 	int cpu;
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 5653560573e22d6..a91e844872e59d2 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -428,6 +428,35 @@  static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp
 	return j > 2 * HZ;
 }
 
+static void print_cpu_stat_info(int cpu)
+{
+	struct rcu_snap_record rsr, *rsrp;
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);
+
+	if (!rcu_cpu_stall_cputime)
+		return;
+
+	rsrp = &rdp->snap_record;
+	if (rsrp->gp_seq != rdp->gp_seq)
+		return;
+
+	rsr.cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
+	rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
+	rsr.cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
+
+	pr_err("\t         hardirqs   softirqs   csw/system\n");
+	pr_err("\t number: %8ld %10d %12lld\n",
+		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
+		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
+		nr_context_switches_cpu(cpu) - rsrp->nr_csw);
+	pr_err("\tcputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
+		div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC),
+		div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC),
+		div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC),
+		jiffies64_to_msecs(jiffies - rsrp->jiffies));
+}
+
 /*
  * Print out diagnostic information for the specified stalled CPU.
  *
@@ -484,6 +513,8 @@  static void print_cpu_stall_info(int cpu)
 	       data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
 	       rcuc_starved ? buf : "",
 	       falsepositive ? " (false positive?)" : "");
+
+	print_cpu_stat_info(cpu);
 }
 
 /* Complain about starvation of grace-period kthread.  */
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index a05e23648c6b99f..76f9848a21cd5be 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -508,6 +508,8 @@  int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT;
 module_param(rcu_cpu_stall_timeout, int, 0644);
 int rcu_exp_cpu_stall_timeout __read_mostly = CONFIG_RCU_EXP_CPU_STALL_TIMEOUT;
 module_param(rcu_exp_cpu_stall_timeout, int, 0644);
+int rcu_cpu_stall_cputime __read_mostly = IS_ENABLED(CONFIG_RCU_CPU_STALL_CPUTIME);
+module_param(rcu_cpu_stall_cputime, int, 0644);
 #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
 
 // Suppress boot-time RCU CPU stall warnings and rcutorture writer stall