diff mbox series

[RFC,v4,5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()

Message ID 20241125132029.7241-6-patryk.wlazlyn@linux.intel.com (mailing list archive)
State Changes Requested, archived
Headers show
Series SRF: Fix offline CPU preventing pc6 entry | expand

Commit Message

Patryk Wlazlyn Nov. 25, 2024, 1:20 p.m. UTC
The algorithm based on cpuid leaf 0x5 in the mwait_play_dead(),
for looking up the mwait hint for the deepest cstate may calculate the
wrong hint on recent Intel x86 platforms.

Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and
only fallback to the later in case of missing appropriate enter_dead()
handler.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
---
 arch/x86/kernel/smpboot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Peter Zijlstra Nov. 25, 2024, 1:45 p.m. UTC | #1
On Mon, Nov 25, 2024 at 02:20:25PM +0100, Patryk Wlazlyn wrote:
> The algorithm based on cpuid leaf 0x5 in the mwait_play_dead(),
> for looking up the mwait hint for the deepest cstate may calculate the
> wrong hint on recent Intel x86 platforms.
> 
> Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and
> only fallback to the later in case of missing appropriate enter_dead()
> handler.

So 7/8 should go first, such that picking cpuidle doesn't end up in some
weird state.

> 
> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
> ---
>  arch/x86/kernel/smpboot.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index d0464c7a0af5..2627b56fb9bc 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1413,9 +1413,10 @@ void native_play_dead(void)
>  	play_dead_common();
>  	tboot_shutdown(TB_SHUTDOWN_WFS);
>  
> +	/* The first successful play_dead() will not return */
> +	cpuidle_play_dead();
>  	mwait_play_dead();
> -	if (cpuidle_play_dead())
> -		hlt_play_dead();
> +	hlt_play_dead();
>  }
>  
>  #else /* ... !CONFIG_HOTPLUG_CPU */
> -- 
> 2.47.0
>
Rafael J. Wysocki Nov. 25, 2024, 1:56 p.m. UTC | #2
On Mon, Nov 25, 2024 at 2:21 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> The algorithm based on cpuid leaf 0x5 in the mwait_play_dead(),
> for looking up the mwait hint for the deepest cstate may calculate the
> wrong hint on recent Intel x86 platforms.
>
> Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and
> only fallback to the later in case of missing appropriate enter_dead()
> handler.
>
> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
> ---
>  arch/x86/kernel/smpboot.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index d0464c7a0af5..2627b56fb9bc 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1413,9 +1413,10 @@ void native_play_dead(void)
>         play_dead_common();
>         tboot_shutdown(TB_SHUTDOWN_WFS);
>
> +       /* The first successful play_dead() will not return */
> +       cpuidle_play_dead();
>         mwait_play_dead();
> -       if (cpuidle_play_dead())
> -               hlt_play_dead();
> +       hlt_play_dead();
>  }
>
>  #else /* ... !CONFIG_HOTPLUG_CPU */
> --

If you first make intel_idle provide :enter_dead() for all CPUs on all
platforms and implement it by calling mwait_play_dead_with_hint(), you
won't need mwait_play_dead() any more.
Patryk Wlazlyn Nov. 25, 2024, 2:43 p.m. UTC | #3
> If you first make intel_idle provide :enter_dead() for all CPUs on all
> platforms and implement it by calling mwait_play_dead_with_hint(), you
> won't need mwait_play_dead() any more.
Crossed my mind, but because mwait_play_dead doesn't filter on Intel
vendor specifically, I thought that I might break some edge case.


I am all for simplifying.
Rafael J. Wysocki Nov. 25, 2024, 2:48 p.m. UTC | #4
On Mon, Nov 25, 2024 at 3:43 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> > If you first make intel_idle provide :enter_dead() for all CPUs on all
> > platforms and implement it by calling mwait_play_dead_with_hint(), you
> > won't need mwait_play_dead() any more.
> Crossed my mind, but because mwait_play_dead doesn't filter on Intel
> vendor specifically,

In practice, it does.

The vendor check in it is equivalent to "if Intel".

> I thought that I might break some edge case.
>
>
> I am all for simplifying.
Patryk Wlazlyn Nov. 26, 2024, 11:56 a.m. UTC | #5
>>> If you first make intel_idle provide :enter_dead() for all CPUs on all
>>> platforms and implement it by calling mwait_play_dead_with_hint(), you
>>> won't need mwait_play_dead() any more.
>> Crossed my mind, but because mwait_play_dead doesn't filter on Intel
>> vendor specifically,
>
> In practice, it does.
>
> The vendor check in it is equivalent to "if Intel".

Actually, what about INTEL_IDLE=n?
We might hit acpi_idle, which would call mwait_play_dead_with_hint() now, but
if we don't, don't we want to try mwait_play_dead before hlt or is it too
unrealistic to happen?
Rafael J. Wysocki Nov. 26, 2024, 12:04 p.m. UTC | #6
On Tue, Nov 26, 2024 at 12:56 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> >>> If you first make intel_idle provide :enter_dead() for all CPUs on all
> >>> platforms and implement it by calling mwait_play_dead_with_hint(), you
> >>> won't need mwait_play_dead() any more.
> >> Crossed my mind, but because mwait_play_dead doesn't filter on Intel
> >> vendor specifically,
> >
> > In practice, it does.
> >
> > The vendor check in it is equivalent to "if Intel".
>
> Actually, what about INTEL_IDLE=n?
> We might hit acpi_idle, which would call mwait_play_dead_with_hint() now, but
> if we don't, don't we want to try mwait_play_dead before hlt or is it too
> unrealistic to happen?

In that case the hint to use would not be known anyway, so
hlt_play_dead() is the right choice IMV.
Patryk Wlazlyn Nov. 26, 2024, 1:10 p.m. UTC | #7
>>>>> If you first make intel_idle provide :enter_dead() for all CPUs on all
>>>>> platforms and implement it by calling mwait_play_dead_with_hint(), you
>>>>> won't need mwait_play_dead() any more.
>>>> Crossed my mind, but because mwait_play_dead doesn't filter on Intel
>>>> vendor specifically,
>>>
>>> In practice, it does.
>>>
>>> The vendor check in it is equivalent to "if Intel".
>>
>> Actually, what about INTEL_IDLE=n?
>> We might hit acpi_idle, which would call mwait_play_dead_with_hint() now, but
>> if we don't, don't we want to try mwait_play_dead before hlt or is it too
>> unrealistic to happen?
>
> In that case the hint to use would not be known anyway, so
> hlt_play_dead() is the right choice IMV.

Fair, it's not known, but we could fallback to the old, cpuid leaf 0x5 based
algorithm, which works on a lot of hardware.

That being said, I think it's cleaner to get rid of the old algorithm entirely
and rely on idle drivers to do the right thing from this point forward.

If we could bring the CPU out of the mwait loop into the hlt loop somehow (via
interrupt for example) we could remove the old kexec hack altogether.
Rafael J. Wysocki Nov. 26, 2024, 1:38 p.m. UTC | #8
On Tue, Nov 26, 2024 at 2:10 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> >>>>> If you first make intel_idle provide :enter_dead() for all CPUs on all
> >>>>> platforms and implement it by calling mwait_play_dead_with_hint(), you
> >>>>> won't need mwait_play_dead() any more.
> >>>> Crossed my mind, but because mwait_play_dead doesn't filter on Intel
> >>>> vendor specifically,
> >>>
> >>> In practice, it does.
> >>>
> >>> The vendor check in it is equivalent to "if Intel".
> >>
> >> Actually, what about INTEL_IDLE=n?
> >> We might hit acpi_idle, which would call mwait_play_dead_with_hint() now, but
> >> if we don't, don't we want to try mwait_play_dead before hlt or is it too
> >> unrealistic to happen?
> >
> > In that case the hint to use would not be known anyway, so
> > hlt_play_dead() is the right choice IMV.
>
> Fair, it's not known, but we could fallback to the old, cpuid leaf 0x5 based
> algorithm, which works on a lot of hardware.
>
> That being said, I think it's cleaner to get rid of the old algorithm entirely
> and rely on idle drivers to do the right thing from this point forward.
>
> If we could bring the CPU out of the mwait loop into the hlt loop somehow (via
> interrupt for example) we could remove the old kexec hack altogether.

Well, the problem is that CPUs may be woken up from MWAIT waits for
various reasons and the reason for a given wakeup is not known a
priori.
diff mbox series

Patch

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d0464c7a0af5..2627b56fb9bc 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1413,9 +1413,10 @@  void native_play_dead(void)
 	play_dead_common();
 	tboot_shutdown(TB_SHUTDOWN_WFS);
 
+	/* The first successful play_dead() will not return */
+	cpuidle_play_dead();
 	mwait_play_dead();
-	if (cpuidle_play_dead())
-		hlt_play_dead();
+	hlt_play_dead();
 }
 
 #else /* ... !CONFIG_HOTPLUG_CPU */