diff mbox series

kallsyms: Fix sleeping function called from invalid context when CONFIG_KALLSYMS_SELFTEST=y

Message ID 20221220063923.1937-1-thunder.leizhen@huawei.com (mailing list archive)
State New, archived
Headers show
Series kallsyms: Fix sleeping function called from invalid context when CONFIG_KALLSYMS_SELFTEST=y | expand

Commit Message

Leizhen (ThunderTown) Dec. 20, 2022, 6:39 a.m. UTC
[T58] BUG: sleeping function called from invalid context at kernel/kallsyms.c:305
[T58] in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 58, name: kallsyms_test
[T58] preempt_count: 0, expected: 0
[T58] RCU nest depth: 0, expected: 0
[T58] no locks held by kallsyms_test/58.
[T58] irq event stamp: 18899904
[T58] hardirqs last enabled at (18899903): finish_task_switch.isra.0 (core.c:?)
[T58] hardirqs last disabled at (18899904): test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
[T58] softirqs last enabled at (18899886): __do_softirq (??:?)
[T58] softirqs last disabled at (18899879): ____do_softirq (irq.c:?)
[T58] CPU: 0 PID: 58 Comm: kallsyms_test Tainted: G T  6.1.0-next-20221215 #2
[T58] Hardware name: linux,dummy-virt (DT)
[T58] Call trace:
[T58] dump_backtrace (??:?)
[T58] show_stack (??:?)
[T58] dump_stack_lvl (??:?)
[T58] dump_stack (??:?)
[T58] __might_resched (??:?)
[T58] kallsyms_on_each_symbol (??:?)
[T58] test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
[T58] test_entry (kallsyms_selftest.c:?)
[T58] kthread (kthread.c:?)
[T58] ret_from_fork (??:?)
[T58] kallsyms_selftest: kallsyms_on_each_symbol() traverse all: 5744310840 ns
[T58] kallsyms_selftest: kallsyms_on_each_match_symbol() traverse all: 1164580 ns
[T58] kallsyms_selftest: finish

Functions kallsyms_on_each_symbol() and kallsyms_on_each_match_symbol()
call the user-registered hook function for each symbol that meets the
requirements. Because it is uncertain how long that hook function will
execute, they call cond_resched() to avoid consuming CPU resources for a
long time. However, irqs need to be disabled during the performance test
to ensure the accuracy of test data. Because the performance test hook is
very clear, very simple function, let's do not call cond_resched() when
CONFIG_KALLSYMS_SELFTEST=y.

Fixes: 30f3bb09778d ("kallsyms: Add self-test facility")
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/kallsyms.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Christophe Leroy Dec. 20, 2022, 8:15 a.m. UTC | #1
Le 20/12/2022 à 07:39, Zhen Lei a écrit :
> [T58] BUG: sleeping function called from invalid context at kernel/kallsyms.c:305
> [T58] in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 58, name: kallsyms_test
> [T58] preempt_count: 0, expected: 0
> [T58] RCU nest depth: 0, expected: 0
> [T58] no locks held by kallsyms_test/58.
> [T58] irq event stamp: 18899904
> [T58] hardirqs last enabled at (18899903): finish_task_switch.isra.0 (core.c:?)
> [T58] hardirqs last disabled at (18899904): test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
> [T58] softirqs last enabled at (18899886): __do_softirq (??:?)
> [T58] softirqs last disabled at (18899879): ____do_softirq (irq.c:?)
> [T58] CPU: 0 PID: 58 Comm: kallsyms_test Tainted: G T  6.1.0-next-20221215 #2
> [T58] Hardware name: linux,dummy-virt (DT)
> [T58] Call trace:
> [T58] dump_backtrace (??:?)
> [T58] show_stack (??:?)
> [T58] dump_stack_lvl (??:?)
> [T58] dump_stack (??:?)
> [T58] __might_resched (??:?)
> [T58] kallsyms_on_each_symbol (??:?)
> [T58] test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
> [T58] test_entry (kallsyms_selftest.c:?)
> [T58] kthread (kthread.c:?)
> [T58] ret_from_fork (??:?)
> [T58] kallsyms_selftest: kallsyms_on_each_symbol() traverse all: 5744310840 ns
> [T58] kallsyms_selftest: kallsyms_on_each_match_symbol() traverse all: 1164580 ns
> [T58] kallsyms_selftest: finish
> 
> Functions kallsyms_on_each_symbol() and kallsyms_on_each_match_symbol()
> call the user-registered hook function for each symbol that meets the
> requirements. Because it is uncertain how long that hook function will
> execute, they call cond_resched() to avoid consuming CPU resources for a
> long time. However, irqs need to be disabled during the performance test
> to ensure the accuracy of test data. Because the performance test hook is
> very clear, very simple function, let's do not call cond_resched() when
> CONFIG_KALLSYMS_SELFTEST=y.

I don't think it is appropriate to change the behaviour of a core 
function based on whether a compile time option related to tests is 
selected or not, because you will change the behaviour for all users, 
not only for the tests.

If the problem is that IRQs are disabled, maybe the solution is

	if (!irqs_disabled())
		cond_resched();

Or try to disable the call to cond_resched() in a way or another during 
the run of selftests.

> 
> Fixes: 30f3bb09778d ("kallsyms: Add self-test facility")
> Reported-by: Anders Roxell <anders.roxell@linaro.org>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   kernel/kallsyms.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 83f499182c9aa31..a49e344a686517b 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -302,7 +302,8 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
>   		ret = fn(data, namebuf, NULL, kallsyms_sym_address(i));
>   		if (ret != 0)
>   			return ret;
> -		cond_resched();
> +		if (!IS_ENABLED(CONFIG_KALLSYMS_SELFTEST))
> +			cond_resched();
>   	}
>   	return 0;
>   }
> @@ -319,7 +320,8 @@ int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long),
>   
>   	for (i = start; !ret && i <= end; i++) {
>   		ret = fn(data, kallsyms_sym_address(get_symbol_seq(i)));
> -		cond_resched();
> +		if (!IS_ENABLED(CONFIG_KALLSYMS_SELFTEST))
> +			cond_resched();
>   	}
>   
>   	return ret;
Petr Mladek Dec. 20, 2022, 11:01 a.m. UTC | #2
On Tue 2022-12-20 08:15:40, Christophe Leroy wrote:
> 
> 
> Le 20/12/2022 à 07:39, Zhen Lei a écrit :
> > [T58] BUG: sleeping function called from invalid context at kernel/kallsyms.c:305
> > [T58] in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 58, name: kallsyms_test
> > [T58] preempt_count: 0, expected: 0
> > [T58] RCU nest depth: 0, expected: 0
> > [T58] no locks held by kallsyms_test/58.
> > [T58] irq event stamp: 18899904
> > [T58] hardirqs last enabled at (18899903): finish_task_switch.isra.0 (core.c:?)
> > [T58] hardirqs last disabled at (18899904): test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
> > [T58] softirqs last enabled at (18899886): __do_softirq (??:?)
> > [T58] softirqs last disabled at (18899879): ____do_softirq (irq.c:?)
> > [T58] CPU: 0 PID: 58 Comm: kallsyms_test Tainted: G T  6.1.0-next-20221215 #2
> > [T58] Hardware name: linux,dummy-virt (DT)
> > [T58] Call trace:
> > [T58] dump_backtrace (??:?)
> > [T58] show_stack (??:?)
> > [T58] dump_stack_lvl (??:?)
> > [T58] dump_stack (??:?)
> > [T58] __might_resched (??:?)
> > [T58] kallsyms_on_each_symbol (??:?)
> > [T58] test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
> > [T58] test_entry (kallsyms_selftest.c:?)
> > [T58] kthread (kthread.c:?)
> > [T58] ret_from_fork (??:?)
> > [T58] kallsyms_selftest: kallsyms_on_each_symbol() traverse all: 5744310840 ns
> > [T58] kallsyms_selftest: kallsyms_on_each_match_symbol() traverse all: 1164580 ns
> > [T58] kallsyms_selftest: finish
> > 
> > Functions kallsyms_on_each_symbol() and kallsyms_on_each_match_symbol()
> > call the user-registered hook function for each symbol that meets the
> > requirements. Because it is uncertain how long that hook function will
> > execute, they call cond_resched() to avoid consuming CPU resources for a
> > long time. However, irqs need to be disabled during the performance test
> > to ensure the accuracy of test data. Because the performance test hook is
> > very clear, very simple function, let's do not call cond_resched() when
> > CONFIG_KALLSYMS_SELFTEST=y.
> 
> I don't think it is appropriate to change the behaviour of a core 
> function based on whether a compile time option related to tests is 
> selected or not, because you will change the behaviour for all users, 
> not only for the tests.

I agree. This is very bad idea. It would change the behavior for
the entire system.

> If the problem is that IRQs are disabled, maybe the solution is
> 
> 	if (!irqs_disabled())
> 		cond_resched();
> 
> Or try to disable the call to cond_resched() in a way or another during 
> the run of selftests.

If I get it correctly, the problem is this code in kernel/kallsyms_selftest.c:

static int lookup_name(void *data, const char *name, struct module *mod, unsigned long addr)
{
[...]
	local_irq_save(flags);
	t0 = sched_clock();
	(void)kallsyms_lookup_name(name);
	t1 = sched_clock();
	local_irq_restore(flags);
[...]

and IRQs are disabled to measure the time spent in this function
without interruption and rescheduling.

I am sure that there are better ways how to measure the time.
Even the "time" command in userspace is able to show time how much CPU
time a command used.

I am not familiar with it. But task_cputime() in
kernel/sched/cputime.c looks promising. And there must be
the interface how the user space get this information.
Some is available via /proc/<PID>/... I am not sure
if there is a syscall.

Best Regards,
Petr
Leizhen (ThunderTown) Dec. 20, 2022, 2:29 p.m. UTC | #3
On 2022/12/20 19:01, Petr Mladek wrote:
> On Tue 2022-12-20 08:15:40, Christophe Leroy wrote:
>>
>>
>> Le 20/12/2022 à 07:39, Zhen Lei a écrit :
>>> [T58] BUG: sleeping function called from invalid context at kernel/kallsyms.c:305
>>> [T58] in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 58, name: kallsyms_test
>>> [T58] preempt_count: 0, expected: 0
>>> [T58] RCU nest depth: 0, expected: 0
>>> [T58] no locks held by kallsyms_test/58.
>>> [T58] irq event stamp: 18899904
>>> [T58] hardirqs last enabled at (18899903): finish_task_switch.isra.0 (core.c:?)
>>> [T58] hardirqs last disabled at (18899904): test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
>>> [T58] softirqs last enabled at (18899886): __do_softirq (??:?)
>>> [T58] softirqs last disabled at (18899879): ____do_softirq (irq.c:?)
>>> [T58] CPU: 0 PID: 58 Comm: kallsyms_test Tainted: G T  6.1.0-next-20221215 #2
>>> [T58] Hardware name: linux,dummy-virt (DT)
>>> [T58] Call trace:
>>> [T58] dump_backtrace (??:?)
>>> [T58] show_stack (??:?)
>>> [T58] dump_stack_lvl (??:?)
>>> [T58] dump_stack (??:?)
>>> [T58] __might_resched (??:?)
>>> [T58] kallsyms_on_each_symbol (??:?)
>>> [T58] test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
>>> [T58] test_entry (kallsyms_selftest.c:?)
>>> [T58] kthread (kthread.c:?)
>>> [T58] ret_from_fork (??:?)
>>> [T58] kallsyms_selftest: kallsyms_on_each_symbol() traverse all: 5744310840 ns
>>> [T58] kallsyms_selftest: kallsyms_on_each_match_symbol() traverse all: 1164580 ns
>>> [T58] kallsyms_selftest: finish
>>>
>>> Functions kallsyms_on_each_symbol() and kallsyms_on_each_match_symbol()
>>> call the user-registered hook function for each symbol that meets the
>>> requirements. Because it is uncertain how long that hook function will
>>> execute, they call cond_resched() to avoid consuming CPU resources for a
>>> long time. However, irqs need to be disabled during the performance test
>>> to ensure the accuracy of test data. Because the performance test hook is
>>> very clear, very simple function, let's do not call cond_resched() when
>>> CONFIG_KALLSYMS_SELFTEST=y.
>>
>> I don't think it is appropriate to change the behaviour of a core 
>> function based on whether a compile time option related to tests is 
>> selected or not, because you will change the behaviour for all users, 
>> not only for the tests.
> 
> I agree. This is very bad idea. It would change the behavior for
> the entire system.

It just doesn't look so good, but it doesn't affect the entire system,
and the proposed changes below will.

> 
>> If the problem is that IRQs are disabled, maybe the solution is
>>
>> 	if (!irqs_disabled())
>> 		cond_resched();

If irqs is disabled by the upper layer, this error cannot be easily detected.

>>
>> Or try to disable the call to cond_resched() in a way or another during 
>> the run of selftests.
> 
> If I get it correctly, the problem is this code in kernel/kallsyms_selftest.c:

Yes, another method is to remove the interrupt protection.

> 
> static int lookup_name(void *data, const char *name, struct module *mod, unsigned long addr)
> {
> [...]
> 	local_irq_save(flags);
> 	t0 = sched_clock();
> 	(void)kallsyms_lookup_name(name);
> 	t1 = sched_clock();
> 	local_irq_restore(flags);
> [...]
> 
> and IRQs are disabled to measure the time spent in this function
> without interruption and rescheduling.
> 
> I am sure that there are better ways how to measure the time.
> Even the "time" command in userspace is able to show time how much CPU
> time a command used.

I've got an idea:

local_irq_save(flags);
//get the count and cputime of interrupts
t0 = sched_clock();
local_irq_restore(flags);

(void)kallsyms_lookup_name(name);

local_irq_save(flags);
t1 = sched_clock();
//get the count and cputime of interrupts
local_irq_restore(flags);

minus the cputime of local_irq_save(flags)/local_irq_restore(flags)

if count changed, minus the cputime of interrupts

> 
> I am not familiar with it. But task_cputime() in
> kernel/sched/cputime.c looks promising. And there must be
> the interface how the user space get this information.
> Some is available via /proc/<PID>/... I am not sure
> if there is a syscall.
> 
> Best Regards,
> Petr
> .
>
Petr Mladek Dec. 20, 2022, 3:43 p.m. UTC | #4
On Tue 2022-12-20 22:29:31, Leizhen (ThunderTown) wrote:
> On 2022/12/20 19:01, Petr Mladek wrote:
> > On Tue 2022-12-20 08:15:40, Christophe Leroy wrote:
> >>
> >>
> >> Le 20/12/2022 à 07:39, Zhen Lei a écrit :
> >>> [T58] BUG: sleeping function called from invalid context at kernel/kallsyms.c:305
> >>> [T58] in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 58, name: kallsyms_test
> >>> [T58] preempt_count: 0, expected: 0
> >>> [T58] RCU nest depth: 0, expected: 0
> >>> [T58] no locks held by kallsyms_test/58.
> >>> [T58] irq event stamp: 18899904
> >>> [T58] hardirqs last enabled at (18899903): finish_task_switch.isra.0 (core.c:?)
> >>> [T58] hardirqs last disabled at (18899904): test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
> >>> [T58] softirqs last enabled at (18899886): __do_softirq (??:?)
> >>> [T58] softirqs last disabled at (18899879): ____do_softirq (irq.c:?)
> >>> [T58] CPU: 0 PID: 58 Comm: kallsyms_test Tainted: G T  6.1.0-next-20221215 #2
> >>> [T58] Hardware name: linux,dummy-virt (DT)
> >>> [T58] Call trace:
> >>> [T58] dump_backtrace (??:?)
> >>> [T58] show_stack (??:?)
> >>> [T58] dump_stack_lvl (??:?)
> >>> [T58] dump_stack (??:?)
> >>> [T58] __might_resched (??:?)
> >>> [T58] kallsyms_on_each_symbol (??:?)
> >>> [T58] test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
> >>> [T58] test_entry (kallsyms_selftest.c:?)
> >>> [T58] kthread (kthread.c:?)
> >>> [T58] ret_from_fork (??:?)
> >>> [T58] kallsyms_selftest: kallsyms_on_each_symbol() traverse all: 5744310840 ns
> >>> [T58] kallsyms_selftest: kallsyms_on_each_match_symbol() traverse all: 1164580 ns
> >>> [T58] kallsyms_selftest: finish
> >>>
> >>> Functions kallsyms_on_each_symbol() and kallsyms_on_each_match_symbol()
> >>> call the user-registered hook function for each symbol that meets the
> >>> requirements. Because it is uncertain how long that hook function will
> >>> execute, they call cond_resched() to avoid consuming CPU resources for a
> >>> long time. However, irqs need to be disabled during the performance test
> >>> to ensure the accuracy of test data. Because the performance test hook is
> >>> very clear, very simple function, let's do not call cond_resched() when
> >>> CONFIG_KALLSYMS_SELFTEST=y.
> >>
> >> I don't think it is appropriate to change the behaviour of a core 
> >> function based on whether a compile time option related to tests is 
> >> selected or not, because you will change the behaviour for all users, 
> >> not only for the tests.
> > 
> > I agree. This is very bad idea. It would change the behavior for
> > the entire system.
> 
> It just doesn't look so good, but it doesn't affect the entire system,
> and the proposed changes below will.

I do not understand this. The proposed patch did:

--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -302,7 +302,8 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 		ret = fn(data, namebuf, NULL, kallsyms_sym_address(i));
 		if (ret != 0)
 			return ret;
-		cond_resched();
+		if (!IS_ENABLED(CONFIG_KALLSYMS_SELFTEST))
+			cond_resched();
 	}
 	return 0;
 }
@@ -319,7 +320,8 @@ int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long),
 
 	for (i = start; !ret && i <= end; i++) {
 		ret = fn(data, kallsyms_sym_address(get_symbol_seq(i)));
-		cond_resched();
+		if (!IS_ENABLED(CONFIG_KALLSYMS_SELFTEST))
+			cond_resched();
 	}
 
 	return ret;

It changes the behavior for any kallsyms_on_each_symbol()
and kallsyms_on_each_match_symbol() caller. And this _is_ the problem!

The cond_resched() is there for a reason. See the commit
f5bdb34bf0c9314548f2 ("livepatch: Avoid CPU hogging with
cond_resched").

If a function explicitely includes cond_resched() because it
delayed something by >1s that it must never be called with
interrupts enabled except when there is a good reason to
do so, e.g. for debugging. And measuring CPU time
in a selftest is not a good reason, definitely!

> > If I get it correctly, the problem is this code in kernel/kallsyms_selftest.c:
> 
> Yes, another method is to remove the interrupt protection.
> > 
> > static int lookup_name(void *data, const char *name, struct module *mod, unsigned long addr)
> > {
> > [...]
> > 	local_irq_save(flags);
> > 	t0 = sched_clock();
> > 	(void)kallsyms_lookup_name(name);
> > 	t1 = sched_clock();
> > 	local_irq_restore(flags);
> > [...]
> > 
> > and IRQs are disabled to measure the time spent in this function
> > without interruption and rescheduling.
> > 
> > I am sure that there are better ways how to measure the time.
> > Even the "time" command in userspace is able to show time how much CPU
> > time a command used.
> 
> I've got an idea:
> 
> local_irq_save(flags);
> //get the count and cputime of interrupts
> t0 = sched_clock();
> local_irq_restore(flags);
> 
> (void)kallsyms_lookup_name(name);
> 
> local_irq_save(flags);
> t1 = sched_clock();
> //get the count and cputime of interrupts
> local_irq_restore(flags);

Why do you need to disable IRQs around sched_clock(), please?
It works well in any context.

Is sched_clock() needed at all?
Honestly, I do not understand why you are using sched_clock() for
this.

> > I am not familiar with it. But task_cputime() in
> > kernel/sched/cputime.c looks promising. And there must be
> > the interface how the user space get this information.
> > Some is available via /proc/<PID>/... I am not sure
> > if there is a syscall.

Please, try to investigate the above hints.

Honestly, it looks like you do not know what you are doing.

I NACK this patch.

Best Regards,
Petr
Leizhen (ThunderTown) Dec. 21, 2022, 12:37 p.m. UTC | #5
On 2022/12/20 23:43, Petr Mladek wrote:
> On Tue 2022-12-20 22:29:31, Leizhen (ThunderTown) wrote:
>> On 2022/12/20 19:01, Petr Mladek wrote:
>>> On Tue 2022-12-20 08:15:40, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 20/12/2022 à 07:39, Zhen Lei a écrit :
>>>>> [T58] BUG: sleeping function called from invalid context at kernel/kallsyms.c:305
>>>>> [T58] in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 58, name: kallsyms_test
>>>>> [T58] preempt_count: 0, expected: 0
>>>>> [T58] RCU nest depth: 0, expected: 0
>>>>> [T58] no locks held by kallsyms_test/58.
>>>>> [T58] irq event stamp: 18899904
>>>>> [T58] hardirqs last enabled at (18899903): finish_task_switch.isra.0 (core.c:?)
>>>>> [T58] hardirqs last disabled at (18899904): test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
>>>>> [T58] softirqs last enabled at (18899886): __do_softirq (??:?)
>>>>> [T58] softirqs last disabled at (18899879): ____do_softirq (irq.c:?)
>>>>> [T58] CPU: 0 PID: 58 Comm: kallsyms_test Tainted: G T  6.1.0-next-20221215 #2
>>>>> [T58] Hardware name: linux,dummy-virt (DT)
>>>>> [T58] Call trace:
>>>>> [T58] dump_backtrace (??:?)
>>>>> [T58] show_stack (??:?)
>>>>> [T58] dump_stack_lvl (??:?)
>>>>> [T58] dump_stack (??:?)
>>>>> [T58] __might_resched (??:?)
>>>>> [T58] kallsyms_on_each_symbol (??:?)
>>>>> [T58] test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?)
>>>>> [T58] test_entry (kallsyms_selftest.c:?)
>>>>> [T58] kthread (kthread.c:?)
>>>>> [T58] ret_from_fork (??:?)
>>>>> [T58] kallsyms_selftest: kallsyms_on_each_symbol() traverse all: 5744310840 ns
>>>>> [T58] kallsyms_selftest: kallsyms_on_each_match_symbol() traverse all: 1164580 ns
>>>>> [T58] kallsyms_selftest: finish
>>>>>
>>>>> Functions kallsyms_on_each_symbol() and kallsyms_on_each_match_symbol()
>>>>> call the user-registered hook function for each symbol that meets the
>>>>> requirements. Because it is uncertain how long that hook function will
>>>>> execute, they call cond_resched() to avoid consuming CPU resources for a
>>>>> long time. However, irqs need to be disabled during the performance test
>>>>> to ensure the accuracy of test data. Because the performance test hook is
>>>>> very clear, very simple function, let's do not call cond_resched() when
>>>>> CONFIG_KALLSYMS_SELFTEST=y.
>>>>
>>>> I don't think it is appropriate to change the behaviour of a core 
>>>> function based on whether a compile time option related to tests is 
>>>> selected or not, because you will change the behaviour for all users, 
>>>> not only for the tests.
>>>
>>> I agree. This is very bad idea. It would change the behavior for
>>> the entire system.
>>
>> It just doesn't look so good, but it doesn't affect the entire system,
>> and the proposed changes below will.

I mean "the proposed changes below":
	if (!irqs_disabled())
		cond_resched();

> 
> I do not understand this. The proposed patch did:
> 
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -302,7 +302,8 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
>  		ret = fn(data, namebuf, NULL, kallsyms_sym_address(i));
>  		if (ret != 0)
>  			return ret;
> -		cond_resched();
> +		if (!IS_ENABLED(CONFIG_KALLSYMS_SELFTEST))
> +			cond_resched();
>  	}
>  	return 0;
>  }
> @@ -319,7 +320,8 @@ int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long),
>  
>  	for (i = start; !ret && i <= end; i++) {
>  		ret = fn(data, kallsyms_sym_address(get_symbol_seq(i)));
> -		cond_resched();
> +		if (!IS_ENABLED(CONFIG_KALLSYMS_SELFTEST))
> +			cond_resched();
>  	}
>  
>  	return ret;
> 
> It changes the behavior for any kallsyms_on_each_symbol()
> and kallsyms_on_each_match_symbol() caller. And this _is_ the problem!

!IS_ENABLED(CONFIG_KALLSYMS_SELFTEST) ==> CONFIG_KALLSYMS_SELFTEST=n

So when CONFIG_KALLSYMS_SELFTEST=n, cond_resched() is called, the
original function has not changed.

And when CONFIG_KALLSYMS_SELFTEST=y, cond_resched() is not called.

kallsyms_selftest: kallsyms_on_each_symbol() traverse all: 18864181 ns

The duration for disabling interruption does not exceed 20 ms.
It's not a big problem for test. CONFIG_KALLSYMS_SELFTEST is set only
in the lab.

> 
> The cond_resched() is there for a reason. See the commit
> f5bdb34bf0c9314548f2 ("livepatch: Avoid CPU hogging with
> cond_resched").
> 
> If a function explicitely includes cond_resched() because it
> delayed something by >1s that it must never be called with
> interrupts enabled except when there is a good reason to
> do so, e.g. for debugging. And measuring CPU time
> in a selftest is not a good reason, definitely!

OK, thanks.

> 
>>> If I get it correctly, the problem is this code in kernel/kallsyms_selftest.c:
>>
>> Yes, another method is to remove the interrupt protection.
>>>
>>> static int lookup_name(void *data, const char *name, struct module *mod, unsigned long addr)
>>> {
>>> [...]
>>> 	local_irq_save(flags);
>>> 	t0 = sched_clock();
>>> 	(void)kallsyms_lookup_name(name);
>>> 	t1 = sched_clock();
>>> 	local_irq_restore(flags);
>>> [...]
>>>
>>> and IRQs are disabled to measure the time spent in this function
>>> without interruption and rescheduling.
>>>
>>> I am sure that there are better ways how to measure the time.
>>> Even the "time" command in userspace is able to show time how much CPU
>>> time a command used.
>>
>> I've got an idea:
>>
>> local_irq_save(flags);
>> //get the count and cputime of interrupts
>> t0 = sched_clock();
>> local_irq_restore(flags);
>>
>> (void)kallsyms_lookup_name(name);
>>
>> local_irq_save(flags);
>> t1 = sched_clock();
>> //get the count and cputime of interrupts
>> local_irq_restore(flags);
> 
> Why do you need to disable IRQs around sched_clock(), please?
> It works well in any context.

I need to calculate the difference accurately.

//get the count and cputime of interrupts
        <-------- (1) maybe interrupted
t0 = sched_clock();
(void)kallsyms_lookup_name(name);  <------- (2) maybe interrupted
t1 = sched_clock();
        <-------- (3) maybe interrupted
//get the count and cputime of interrupts

Interrupted at different points, resulting in inaccurate
calculation results. (1) and (3) also need to be considered.

> 
> Is sched_clock() needed at all?

Yes, because I'm counting the execution time of a single function, the
time is short.

> Honestly, I do not understand why you are using sched_clock() for
> this.

The unit of sched_clock() is nanosecond, high precision, no conversion required.

> 
>>> I am not familiar with it. But task_cputime() in
>>> kernel/sched/cputime.c looks promising. And there must be
>>> the interface how the user space get this information.
>>> Some is available via /proc/<PID>/... I am not sure
>>> if there is a syscall.
> 
> Please, try to investigate the above hints.

Yes, I will. Thanks.

> 
> Honestly, it looks like you do not know what you are doing.
> 
> I NACK this patch.
> 
> Best Regards,
> Petr
> .
>
Leizhen (ThunderTown) Dec. 27, 2022, 7:16 a.m. UTC | #6
On 2022/12/21 20:37, Leizhen (ThunderTown) wrote:
>>>> I am not familiar with it. But task_cputime() in
>>>> kernel/sched/cputime.c looks promising. And there must be
>>>> the interface how the user space get this information.
>>>> Some is available via /proc/<PID>/... I am not sure
>>>> if there is a syscall.
>> Please, try to investigate the above hints.
> Yes, I will. Thanks.

Function task_cputime() works well when CONFIG_VIRT_CPU_ACCOUNTING_GEN=y,
but not when CONFIG_TICK_CPU_ACCOUNTING=y. The latter seems to be based on
tick interrupt sampling, rather than recording timestamps during task switches.

>
diff mbox series

Patch

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 83f499182c9aa31..a49e344a686517b 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -302,7 +302,8 @@  int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 		ret = fn(data, namebuf, NULL, kallsyms_sym_address(i));
 		if (ret != 0)
 			return ret;
-		cond_resched();
+		if (!IS_ENABLED(CONFIG_KALLSYMS_SELFTEST))
+			cond_resched();
 	}
 	return 0;
 }
@@ -319,7 +320,8 @@  int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long),
 
 	for (i = start; !ret && i <= end; i++) {
 		ret = fn(data, kallsyms_sym_address(get_symbol_seq(i)));
-		cond_resched();
+		if (!IS_ENABLED(CONFIG_KALLSYMS_SELFTEST))
+			cond_resched();
 	}
 
 	return ret;