diff mbox series

[V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()

Message ID 9df267db-e647-a81d-16bb-b8bfb06c2624@huawei.com (mailing list archive)
State New, archived
Headers show
Series [V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill() | expand

Commit Message

Yunfeng Ye Oct. 9, 2019, 4:45 a.m. UTC
If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
take so long in the right case. Use usleep_range() instead of msleep(),
reduce the waiting time, and give a chance to busy wait before sleep.

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
---
V1->V2:
- use usleep_range() instead of udelay() after waiting for a while

 arch/arm64/kernel/psci.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

Comments

Sudeep Holla Oct. 16, 2019, 3:32 p.m. UTC | #1
On Wed, Oct 09, 2019 at 12:45:16PM +0800, Yunfeng Ye wrote:
> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
> take so long in the right case. Use usleep_range() instead of msleep(),
> reduce the waiting time, and give a chance to busy wait before sleep.
>
> Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
> ---
> V1->V2:
> - use usleep_range() instead of udelay() after waiting for a while
>
>  arch/arm64/kernel/psci.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> index c9f72b2..99b3122 100644
> --- a/arch/arm64/kernel/psci.c
> +++ b/arch/arm64/kernel/psci.c
> @@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
>  static int cpu_psci_cpu_kill(unsigned int cpu)
>  {
>  	int err, i;
> +	unsigned long timeout;
>
>  	if (!psci_ops.affinity_info)
>  		return 0;
> @@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
>  	 * while it is dying. So, try again a few times.
>  	 */
>
> -	for (i = 0; i < 10; i++) {
> +	i = 0;
> +	timeout = jiffies + msecs_to_jiffies(100);
> +	do {
>  		err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
>  		if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
>  			pr_info("CPU%d killed.\n", cpu);
>  			return 0;
>  		}
>
> -		msleep(10);
> -		pr_info("Retrying again to check for CPU kill\n");

You dropped this message, any particular reason ?

> -	}
> +		/* busy-wait max 1ms */
> +		if (i++ < 100) {
> +			cond_resched();
> +			udelay(10);
> +			continue;

Why can't it be simple like loop of 100 * msleep(1) instead of loop of
10 * msleep(10). The above initial busy wait for 1 ms looks too much
optimised for your setup where it takes 50-500us, what if it take just
over 1 ms ?

We need more generic solution.

--
Regards,
Sudeep
Yunfeng Ye Oct. 17, 2019, 1:26 p.m. UTC | #2
On 2019/10/16 23:32, Sudeep Holla wrote:
> On Wed, Oct 09, 2019 at 12:45:16PM +0800, Yunfeng Ye wrote:
>> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
>> take so long in the right case. Use usleep_range() instead of msleep(),
>> reduce the waiting time, and give a chance to busy wait before sleep.
>>
>> Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
>> ---
>> V1->V2:
>> - use usleep_range() instead of udelay() after waiting for a while
>>
>>  arch/arm64/kernel/psci.c | 17 +++++++++++++----
>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
>> index c9f72b2..99b3122 100644
>> --- a/arch/arm64/kernel/psci.c
>> +++ b/arch/arm64/kernel/psci.c
>> @@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
>>  static int cpu_psci_cpu_kill(unsigned int cpu)
>>  {
>>  	int err, i;
>> +	unsigned long timeout;
>>
>>  	if (!psci_ops.affinity_info)
>>  		return 0;
>> @@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
>>  	 * while it is dying. So, try again a few times.
>>  	 */
>>
>> -	for (i = 0; i < 10; i++) {
>> +	i = 0;
>> +	timeout = jiffies + msecs_to_jiffies(100);
>> +	do {
>>  		err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
>>  		if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
>>  			pr_info("CPU%d killed.\n", cpu);
>>  			return 0;
>>  		}
>>
>> -		msleep(10);
>> -		pr_info("Retrying again to check for CPU kill\n");
> 
> You dropped this message, any particular reason ?
> 
When reduce the time interval to 1ms, the print message maybe increase 10 times.
on the other hand, cpu_psci_cpu_kill() will print message on success or failure, which
this retry log is not very necessary. of cource, I think use pr_info_once() instead of
pr_info() is better.

thanks.

>> -	}
>> +		/* busy-wait max 1ms */
>> +		if (i++ < 100) {
>> +			cond_resched();
>> +			udelay(10);
>> +			continue;
> 
> Why can't it be simple like loop of 100 * msleep(1) instead of loop of
> 10 * msleep(10). The above initial busy wait for 1 ms looks too much
> optimised for your setup where it takes 50-500us, what if it take just
> over 1 ms ?
> 
msleep() is implemented by jiffies. when HZ=100 or HZ=250, msleep(1) is not
accurate. so I think usleep_range() is better. 1 ms looks simple and good, but how
about 100us is better? I refer a function sunxi_mc_smp_cpu_kill(), it use
usleep_range(50, 100).

thanks.

> We need more generic solution.
> 
> --
> Regards,
> Sudeep
> 
> .
>
Sudeep Holla Oct. 17, 2019, 1:54 p.m. UTC | #3
On Thu, Oct 17, 2019 at 09:26:15PM +0800, Yunfeng Ye wrote:
>
>
> On 2019/10/16 23:32, Sudeep Holla wrote:
> > On Wed, Oct 09, 2019 at 12:45:16PM +0800, Yunfeng Ye wrote:
> >> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
> >> take so long in the right case. Use usleep_range() instead of msleep(),
> >> reduce the waiting time, and give a chance to busy wait before sleep.
> >>
> >> Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
> >> ---
> >> V1->V2:
> >> - use usleep_range() instead of udelay() after waiting for a while
> >>
> >>  arch/arm64/kernel/psci.c | 17 +++++++++++++----
> >>  1 file changed, 13 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> >> index c9f72b2..99b3122 100644
> >> --- a/arch/arm64/kernel/psci.c
> >> +++ b/arch/arm64/kernel/psci.c
> >> @@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
> >>  static int cpu_psci_cpu_kill(unsigned int cpu)
> >>  {
> >>  	int err, i;
> >> +	unsigned long timeout;
> >>
> >>  	if (!psci_ops.affinity_info)
> >>  		return 0;
> >> @@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
> >>  	 * while it is dying. So, try again a few times.
> >>  	 */
> >>
> >> -	for (i = 0; i < 10; i++) {
> >> +	i = 0;
> >> +	timeout = jiffies + msecs_to_jiffies(100);
> >> +	do {
> >>  		err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
> >>  		if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
> >>  			pr_info("CPU%d killed.\n", cpu);
> >>  			return 0;
> >>  		}
> >>
> >> -		msleep(10);
> >> -		pr_info("Retrying again to check for CPU kill\n");
> >
> > You dropped this message, any particular reason ?
> >
> When reduce the time interval to 1ms, the print message maybe increase 10
> times. on the other hand, cpu_psci_cpu_kill() will print message on success
> or failure, which this retry log is not very necessary. of cource, I think
> use pr_info_once() instead of pr_info() is better.
>

Yes changing it to pr_info_once is better than dropping it as it gives
some indication to the firmware if there's scope for improvement.

> >> -	}
> >> +		/* busy-wait max 1ms */
> >> +		if (i++ < 100) {
> >> +			cond_resched();
> >> +			udelay(10);
> >> +			continue;
> >
> > Why can't it be simple like loop of 100 * msleep(1) instead of loop of
> > 10 * msleep(10). The above initial busy wait for 1 ms looks too much
> > optimised for your setup where it takes 50-500us, what if it take just
> > over 1 ms ?
> >
> msleep() is implemented by jiffies. when HZ=100 or HZ=250, msleep(1) is not
> accurate. so I think usleep_range() is better. 1 ms looks simple and good, but how
> about 100us is better? I refer a function sunxi_mc_smp_cpu_kill(), it use
> usleep_range(50, 100).
>

Again that's specific to sunxi platforms and may work well. While I agree
msleep(1) may not be accurate, I am still inclined to have a max value
of 1000(i.e. 1ms) for usleep_range.

--
Regards,
Sudeep
David Laight Oct. 17, 2019, 2 p.m. UTC | #4
From: Yunfeng Ye
> Sent: 17 October 2019 14:26
...
> >> -	for (i = 0; i < 10; i++) {
> >> +	i = 0;
> >> +	timeout = jiffies + msecs_to_jiffies(100);
> >> +	do {
> >>  		err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
> >>  		if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
> >>  			pr_info("CPU%d killed.\n", cpu);
> >>  			return 0;
> >>  		}
> >>
> >> -		msleep(10);
> >> -		pr_info("Retrying again to check for CPU kill\n");
> >
> > You dropped this message, any particular reason ?
> >
> When reduce the time interval to 1ms, the print message maybe increase 10 times.
> on the other hand, cpu_psci_cpu_kill() will print message on success or failure, which
> this retry log is not very necessary. of cource, I think use pr_info_once() instead of
> pr_info() is better.

Maybe you should print in on (say) the 10th time around the loop.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Yunfeng Ye Oct. 17, 2019, 2:19 p.m. UTC | #5
On 2019/10/17 22:00, David Laight wrote:
> From: Yunfeng Ye
>> Sent: 17 October 2019 14:26
> ...
>>>> -	for (i = 0; i < 10; i++) {
>>>> +	i = 0;
>>>> +	timeout = jiffies + msecs_to_jiffies(100);
>>>> +	do {
>>>>  		err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
>>>>  		if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
>>>>  			pr_info("CPU%d killed.\n", cpu);
>>>>  			return 0;
>>>>  		}
>>>>
>>>> -		msleep(10);
>>>> -		pr_info("Retrying again to check for CPU kill\n");
>>>
>>> You dropped this message, any particular reason ?
>>>
>> When reduce the time interval to 1ms, the print message maybe increase 10 times.
>> on the other hand, cpu_psci_cpu_kill() will print message on success or failure, which
>> this retry log is not very necessary. of cource, I think use pr_info_once() instead of
>> pr_info() is better.
> 
> Maybe you should print in on (say) the 10th time around the loop.
> 
Can it like this:
  pr_info("CPU%d killed with %d loops.\n", cpu, loops);

If put the number of waiting times in the successful printing message, it is
not necessary to print the "Retrying ..." message.

thanks.

> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>
Yunfeng Ye Oct. 17, 2019, 2:24 p.m. UTC | #6
On 2019/10/17 21:54, Sudeep Holla wrote:
> On Thu, Oct 17, 2019 at 09:26:15PM +0800, Yunfeng Ye wrote:
>>
>>
>> On 2019/10/16 23:32, Sudeep Holla wrote:
>>> On Wed, Oct 09, 2019 at 12:45:16PM +0800, Yunfeng Ye wrote:
>>>> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
>>>> take so long in the right case. Use usleep_range() instead of msleep(),
>>>> reduce the waiting time, and give a chance to busy wait before sleep.
>>>>
>>>> Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
>>>> ---
>>>> V1->V2:
>>>> - use usleep_range() instead of udelay() after waiting for a while
>>>>
>>>>  arch/arm64/kernel/psci.c | 17 +++++++++++++----
>>>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
>>>> index c9f72b2..99b3122 100644
>>>> --- a/arch/arm64/kernel/psci.c
>>>> +++ b/arch/arm64/kernel/psci.c
>>>> @@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
>>>>  static int cpu_psci_cpu_kill(unsigned int cpu)
>>>>  {
>>>>  	int err, i;
>>>> +	unsigned long timeout;
>>>>
>>>>  	if (!psci_ops.affinity_info)
>>>>  		return 0;
>>>> @@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
>>>>  	 * while it is dying. So, try again a few times.
>>>>  	 */
>>>>
>>>> -	for (i = 0; i < 10; i++) {
>>>> +	i = 0;
>>>> +	timeout = jiffies + msecs_to_jiffies(100);
>>>> +	do {
>>>>  		err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
>>>>  		if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
>>>>  			pr_info("CPU%d killed.\n", cpu);
>>>>  			return 0;
>>>>  		}
>>>>
>>>> -		msleep(10);
>>>> -		pr_info("Retrying again to check for CPU kill\n");
>>>
>>> You dropped this message, any particular reason ?
>>>
>> When reduce the time interval to 1ms, the print message maybe increase 10
>> times. on the other hand, cpu_psci_cpu_kill() will print message on success
>> or failure, which this retry log is not very necessary. of cource, I think
>> use pr_info_once() instead of pr_info() is better.
>>
> 
> Yes changing it to pr_info_once is better than dropping it as it gives
> some indication to the firmware if there's scope for improvement.
> 
>>>> -	}
>>>> +		/* busy-wait max 1ms */
>>>> +		if (i++ < 100) {
>>>> +			cond_resched();
>>>> +			udelay(10);
>>>> +			continue;
>>>
>>> Why can't it be simple like loop of 100 * msleep(1) instead of loop of
>>> 10 * msleep(10). The above initial busy wait for 1 ms looks too much
>>> optimised for your setup where it takes 50-500us, what if it take just
>>> over 1 ms ?
>>>
>> msleep() is implemented by jiffies. when HZ=100 or HZ=250, msleep(1) is not
>> accurate. so I think usleep_range() is better. 1 ms looks simple and good, but how
>> about 100us is better? I refer a function sunxi_mc_smp_cpu_kill(), it use
>> usleep_range(50, 100).
>>
> 
> Again that's specific to sunxi platforms and may work well. While I agree
> msleep(1) may not be accurate, I am still inclined to have a max value
> of 1000(i.e. 1ms) for usleep_range.
> 
ok, I will send a new version patch that waiting max 1ms.
thanks.

> --
> Regards,
> Sudeep
> 
> .
>
David Laight Oct. 17, 2019, 2:25 p.m. UTC | #7
From: Yunfeng Ye
 > Sent: 17 October 2019 15:20
> On 2019/10/17 22:00, David Laight wrote:
> > From: Yunfeng Ye
> >> Sent: 17 October 2019 14:26
> > ...
> >>>> -	for (i = 0; i < 10; i++) {
> >>>> +	i = 0;
> >>>> +	timeout = jiffies + msecs_to_jiffies(100);
> >>>> +	do {
> >>>>  		err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
> >>>>  		if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
> >>>>  			pr_info("CPU%d killed.\n", cpu);
> >>>>  			return 0;
> >>>>  		}
> >>>>
> >>>> -		msleep(10);
> >>>> -		pr_info("Retrying again to check for CPU kill\n");
> >>>
> >>> You dropped this message, any particular reason ?
> >>>
> >> When reduce the time interval to 1ms, the print message maybe increase 10 times.
> >> on the other hand, cpu_psci_cpu_kill() will print message on success or failure, which
> >> this retry log is not very necessary. of cource, I think use pr_info_once() instead of
> >> pr_info() is better.
> >
> > Maybe you should print in on (say) the 10th time around the loop.
> >
> Can it like this:
>   pr_info("CPU%d killed with %d loops.\n", cpu, loops);
> 
> If put the number of waiting times in the successful printing message, it is
> not necessary to print the "Retrying ..." message.

That depends on whether you want to know how long it took or why the system
is 'stuck'.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff mbox series

Patch

diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
index c9f72b2..99b3122 100644
--- a/arch/arm64/kernel/psci.c
+++ b/arch/arm64/kernel/psci.c
@@ -82,6 +82,7 @@  static void cpu_psci_cpu_die(unsigned int cpu)
 static int cpu_psci_cpu_kill(unsigned int cpu)
 {
 	int err, i;
+	unsigned long timeout;

 	if (!psci_ops.affinity_info)
 		return 0;
@@ -91,16 +92,24 @@  static int cpu_psci_cpu_kill(unsigned int cpu)
 	 * while it is dying. So, try again a few times.
 	 */

-	for (i = 0; i < 10; i++) {
+	i = 0;
+	timeout = jiffies + msecs_to_jiffies(100);
+	do {
 		err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
 		if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
 			pr_info("CPU%d killed.\n", cpu);
 			return 0;
 		}

-		msleep(10);
-		pr_info("Retrying again to check for CPU kill\n");
-	}
+		/* busy-wait max 1ms */
+		if (i++ < 100) {
+			cond_resched();
+			udelay(10);
+			continue;
+		}
+
+		usleep_range(100, 1000);
+	} while (time_before(jiffies, timeout));

 	pr_warn("CPU%d may not have shut down cleanly (AFFINITY_INFO reports %d)\n",
 			cpu, err);