diff mbox series

intel_idle: Add Jasper Lake and Elkhart Lake support

Message ID 20240726062601.674078-1-kai.heng.feng@canonical.com (mailing list archive)
State Changes Requested, archived
Headers show
Series intel_idle: Add Jasper Lake and Elkhart Lake support | expand

Commit Message

Kai-Heng Feng July 26, 2024, 6:26 a.m. UTC
Without proper C-state support, the CPU can take long time to exit to C0
to handle IRQ and perform DMA.

The data collect via wult shows the latency is similar to Broxton, so
use the existing table to support C-state on JSL and EHL.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219023
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/idle/intel_idle.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Zhang, Rui July 26, 2024, 6:51 a.m. UTC | #1
On Fri, 2024-07-26 at 14:26 +0800, Kai-Heng Feng wrote:
> Without proper C-state support, the CPU can take long time to exit to
> C0
> to handle IRQ and perform DMA.

Can you provide more details?

Say, what cstate is entered w/ and w/o this patch?

can you show the output of "grep .
/sys/devices/system/cpu/cpu0/cpuidle/state*/*" without this patch?

> 
> The data collect via wult shows the latency is similar to Broxton, so
> use the existing table to support C-state on JSL and EHL.

so you have done cstate measurement on the EHL using wult?
can you share more details about the measurement results?

thanks,
rui

> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219023
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
>  drivers/idle/intel_idle.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 9aab7abc2ae9..eb6975a1d083 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -1538,6 +1538,8 @@ static const struct x86_cpu_id intel_idle_ids[]
> __initconst = {
>         X86_MATCH_VFM(INTEL_ATOM_GOLDMONT,      &idle_cpu_bxt),
>         X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_PLUS, &idle_cpu_bxt),
>         X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_D,    &idle_cpu_dnv),
> +       X86_MATCH_VFM(INTEL_ATOM_TREMONT,       &idle_cpu_bxt),
> +       X86_MATCH_VFM(INTEL_ATOM_TREMONT_L,     &idle_cpu_bxt),
>         X86_MATCH_VFM(INTEL_ATOM_TREMONT_D,     &idle_cpu_snr),
>         X86_MATCH_VFM(INTEL_ATOM_CRESTMONT,     &idle_cpu_grr),
>         X86_MATCH_VFM(INTEL_ATOM_CRESTMONT_X,   &idle_cpu_srf),
Zhang, Rui July 26, 2024, 6:54 a.m. UTC | #2
BTW, Add Rafael.

On Fri, 2024-07-26 at 14:51 +0800, Zhang, Rui wrote:
> On Fri, 2024-07-26 at 14:26 +0800, Kai-Heng Feng wrote:
> > Without proper C-state support, the CPU can take long time to exit
> > to
> > C0
> > to handle IRQ and perform DMA.
> 
> Can you provide more details?
> 
> Say, what cstate is entered w/ and w/o this patch?
> 
> can you show the output of "grep .
> /sys/devices/system/cpu/cpu0/cpuidle/state*/*" without this patch?
> 
> > 
> > The data collect via wult shows the latency is similar to Broxton,
> > so
> > use the existing table to support C-state on JSL and EHL.
> 
> so you have done cstate measurement on the EHL using wult?
> can you share more details about the measurement results?
> 
> thanks,
> rui
> 
> > 
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=219023
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > ---
> >  drivers/idle/intel_idle.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> > index 9aab7abc2ae9..eb6975a1d083 100644
> > --- a/drivers/idle/intel_idle.c
> > +++ b/drivers/idle/intel_idle.c
> > @@ -1538,6 +1538,8 @@ static const struct x86_cpu_id
> > intel_idle_ids[]
> > __initconst = {
> >         X86_MATCH_VFM(INTEL_ATOM_GOLDMONT,      &idle_cpu_bxt),
> >         X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_PLUS, &idle_cpu_bxt),
> >         X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_D,    &idle_cpu_dnv),
> > +       X86_MATCH_VFM(INTEL_ATOM_TREMONT,       &idle_cpu_bxt),
> > +       X86_MATCH_VFM(INTEL_ATOM_TREMONT_L,     &idle_cpu_bxt),
> >         X86_MATCH_VFM(INTEL_ATOM_TREMONT_D,     &idle_cpu_snr),
> >         X86_MATCH_VFM(INTEL_ATOM_CRESTMONT,     &idle_cpu_grr),
> >         X86_MATCH_VFM(INTEL_ATOM_CRESTMONT_X,   &idle_cpu_srf),
>
Kai-Heng Feng July 29, 2024, 4:07 a.m. UTC | #3
[+Cc Rafael, Srinivas]

On Fri, Jul 26, 2024 at 2:52 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Fri, 2024-07-26 at 14:26 +0800, Kai-Heng Feng wrote:
> > Without proper C-state support, the CPU can take long time to exit to
> > C0
> > to handle IRQ and perform DMA.
>
> Can you provide more details?
>
> Say, what cstate is entered w/ and w/o this patch?

Without the patch it's ACPI C1, C2 and C3.

>
> can you show the output of "grep .
> /sys/devices/system/cpu/cpu0/cpuidle/state*/*" without this patch?

/sys/devices/system/cpu/cpu0/cpuidle$ grep . */*
state0/above:0
state0/below:631
state0/default_status:enabled
state0/desc:CPUIDLE CORE POLL IDLE
state0/disable:0
state0/latency:0
state0/name:POLL
state0/power:4294967295
state0/rejected:0
state0/residency:0
state0/time:19513
state0/usage:635
state1/above:26
state1/below:12437
state1/default_status:enabled
state1/desc:ACPI FFH MWAIT 0x0
state1/disable:0
state1/latency:1
state1/name:C1_ACPI
state1/power:0
state1/rejected:0
state1/residency:1
grep: state1/s2idle: Is a directory
state1/time:18621370
state1/usage:74523
state2/above:1690
state2/below:17
state2/default_status:enabled
state2/desc:ACPI FFH MWAIT 0x31
state2/disable:0
state2/latency:253
state2/name:C2_ACPI
state2/power:0
state2/rejected:0
state2/residency:759
grep: state2/s2idle: Is a directory
state2/time:7063052
state2/usage:7909
state3/above:13111
state3/below:0
state3/default_status:enabled
state3/desc:ACPI FFH MWAIT 0x60
state3/disable:0
state3/latency:1048
state3/name:C3_ACPI
state3/power:0
state3/rejected:0
state3/residency:3144
grep: state3/s2idle: Is a directory
state3/time:4451519230
state3/usage:55467


>
> >
> > The data collect via wult shows the latency is similar to Broxton, so
> > use the existing table to support C-state on JSL and EHL.
>
> so you have done cstate measurement on the EHL using wult?
> can you share more details about the measurement results?

I look at the "spikes" of the aggregated graph. Not sure if it's the
right way to interpret the graph though.

It will be much better if Intel can add the proper C-states table for
JSL and EHL.

Kai-Heng

>
> thanks,
> rui
>
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=219023
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > ---
> >  drivers/idle/intel_idle.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> > index 9aab7abc2ae9..eb6975a1d083 100644
> > --- a/drivers/idle/intel_idle.c
> > +++ b/drivers/idle/intel_idle.c
> > @@ -1538,6 +1538,8 @@ static const struct x86_cpu_id intel_idle_ids[]
> > __initconst = {
> >         X86_MATCH_VFM(INTEL_ATOM_GOLDMONT,      &idle_cpu_bxt),
> >         X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_PLUS, &idle_cpu_bxt),
> >         X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_D,    &idle_cpu_dnv),
> > +       X86_MATCH_VFM(INTEL_ATOM_TREMONT,       &idle_cpu_bxt),
> > +       X86_MATCH_VFM(INTEL_ATOM_TREMONT_L,     &idle_cpu_bxt),
> >         X86_MATCH_VFM(INTEL_ATOM_TREMONT_D,     &idle_cpu_snr),
> >         X86_MATCH_VFM(INTEL_ATOM_CRESTMONT,     &idle_cpu_grr),
> >         X86_MATCH_VFM(INTEL_ATOM_CRESTMONT_X,   &idle_cpu_srf),
>
Rafael J. Wysocki July 30, 2024, 1:59 p.m. UTC | #4
On Mon, Jul 29, 2024 at 6:08 AM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
> [+Cc Rafael, Srinivas]
>
> On Fri, Jul 26, 2024 at 2:52 PM Zhang, Rui <rui.zhang@intel.com> wrote:
> >
> > On Fri, 2024-07-26 at 14:26 +0800, Kai-Heng Feng wrote:
> > > Without proper C-state support, the CPU can take long time to exit to
> > > C0
> > > to handle IRQ and perform DMA.
> >
> > Can you provide more details?
> >
> > Say, what cstate is entered w/ and w/o this patch?
>
> Without the patch it's ACPI C1, C2 and C3.

They are called like this because they come from ACPI _CST.  You need
to use turbostat (or equivalent) to check what C-states really are
entered.

>
> >
> > can you show the output of "grep .
> > /sys/devices/system/cpu/cpu0/cpuidle/state*/*" without this patch?
>
> /sys/devices/system/cpu/cpu0/cpuidle$ grep . */*
> state0/above:0
> state0/below:631
> state0/default_status:enabled
> state0/desc:CPUIDLE CORE POLL IDLE
> state0/disable:0
> state0/latency:0
> state0/name:POLL
> state0/power:4294967295
> state0/rejected:0
> state0/residency:0
> state0/time:19513
> state0/usage:635
> state1/above:26
> state1/below:12437
> state1/default_status:enabled
> state1/desc:ACPI FFH MWAIT 0x0

This is C1 AFAICS.

> state1/disable:0
> state1/latency:1
> state1/name:C1_ACPI
> state1/power:0
> state1/rejected:0
> state1/residency:1
> grep: state1/s2idle: Is a directory
> state1/time:18621370
> state1/usage:74523
> state2/above:1690
> state2/below:17
> state2/default_status:enabled
> state2/desc:ACPI FFH MWAIT 0x31

This looks like something that used to be called "C7s".

> state2/disable:0
> state2/latency:253
> state2/name:C2_ACPI
> state2/power:0
> state2/rejected:0
> state2/residency:759
> grep: state2/s2idle: Is a directory
> state2/time:7063052
> state2/usage:7909
> state3/above:13111
> state3/below:0
> state3/default_status:enabled
> state3/desc:ACPI FFH MWAIT 0x60

And this looks like C10.

> state3/disable:0
> state3/latency:1048
> state3/name:C3_ACPI
> state3/power:0
> state3/rejected:0
> state3/residency:3144
> grep: state3/s2idle: Is a directory
> state3/time:4451519230
> state3/usage:55467
>
>
> >
> > >
> > > The data collect via wult shows the latency is similar to Broxton, so
> > > use the existing table to support C-state on JSL and EHL.
> >
> > so you have done cstate measurement on the EHL using wult?
> > can you share more details about the measurement results?
>
> I look at the "spikes" of the aggregated graph. Not sure if it's the
> right way to interpret the graph though.
>
> It will be much better if Intel can add the proper C-states table for
> JSL and EHL.

So what's missing in the above, from the technical standpoint?
Kai-Heng Feng July 31, 2024, 6:17 a.m. UTC | #5
On Tue, Jul 30, 2024 at 9:59 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Jul 29, 2024 at 6:08 AM Kai-Heng Feng
> <kai.heng.feng@canonical.com> wrote:
> >
> > [+Cc Rafael, Srinivas]
> >
> > On Fri, Jul 26, 2024 at 2:52 PM Zhang, Rui <rui.zhang@intel.com> wrote:
> > >
> > > On Fri, 2024-07-26 at 14:26 +0800, Kai-Heng Feng wrote:
> > > > Without proper C-state support, the CPU can take long time to exit to
> > > > C0
> > > > to handle IRQ and perform DMA.
> > >
> > > Can you provide more details?
> > >
> > > Say, what cstate is entered w/ and w/o this patch?
> >
> > Without the patch it's ACPI C1, C2 and C3.
>
> They are called like this because they come from ACPI _CST.  You need
> to use turbostat (or equivalent) to check what C-states really are
> entered.

Both C1, C2 and C3 have some residencies.

>
> >
> > >
> > > can you show the output of "grep .
> > > /sys/devices/system/cpu/cpu0/cpuidle/state*/*" without this patch?
> >
> > /sys/devices/system/cpu/cpu0/cpuidle$ grep . */*
> > state0/above:0
> > state0/below:631
> > state0/default_status:enabled
> > state0/desc:CPUIDLE CORE POLL IDLE
> > state0/disable:0
> > state0/latency:0
> > state0/name:POLL
> > state0/power:4294967295
> > state0/rejected:0
> > state0/residency:0
> > state0/time:19513
> > state0/usage:635
> > state1/above:26
> > state1/below:12437
> > state1/default_status:enabled
> > state1/desc:ACPI FFH MWAIT 0x0
>
> This is C1 AFAICS.
>
> > state1/disable:0
> > state1/latency:1
> > state1/name:C1_ACPI
> > state1/power:0
> > state1/rejected:0
> > state1/residency:1
> > grep: state1/s2idle: Is a directory
> > state1/time:18621370
> > state1/usage:74523
> > state2/above:1690
> > state2/below:17
> > state2/default_status:enabled
> > state2/desc:ACPI FFH MWAIT 0x31
>
> This looks like something that used to be called "C7s".
>
> > state2/disable:0
> > state2/latency:253
> > state2/name:C2_ACPI
> > state2/power:0
> > state2/rejected:0
> > state2/residency:759
> > grep: state2/s2idle: Is a directory
> > state2/time:7063052
> > state2/usage:7909
> > state3/above:13111
> > state3/below:0
> > state3/default_status:enabled
> > state3/desc:ACPI FFH MWAIT 0x60
>
> And this looks like C10.
>
> > state3/disable:0
> > state3/latency:1048
> > state3/name:C3_ACPI
> > state3/power:0
> > state3/rejected:0
> > state3/residency:3144
> > grep: state3/s2idle: Is a directory
> > state3/time:4451519230
> > state3/usage:55467
> >
> >
> > >
> > > >
> > > > The data collect via wult shows the latency is similar to Broxton, so
> > > > use the existing table to support C-state on JSL and EHL.
> > >
> > > so you have done cstate measurement on the EHL using wult?
> > > can you share more details about the measurement results?
> >
> > I look at the "spikes" of the aggregated graph. Not sure if it's the
> > right way to interpret the graph though.
> >
> > It will be much better if Intel can add the proper C-states table for
> > JSL and EHL.
>
> So what's missing in the above, from the technical standpoint?

The crucial part to make the issue (i.e. slow ethernet) is
".disable_promotion_to_c1e = true".

Can we use that for EHL and JSL?

Kai-Heng
Len Brown July 31, 2024, 4:28 p.m. UTC | #6
On Wed, Jul 31, 2024 at 2:18 AM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
> The crucial part to make the issue (i.e. slow ethernet) is
> ".disable_promotion_to_c1e = true".

Okay, so the problem statement is that on this machine with some
ethernet controller and some workload,
performance is better when you use just C1 and not C1E (or deeper) states.

And so you want to have the option of accessing C1 without the overhead of C1E?

Presumably you don't care about the power savings of the deeper states,
or you are using PM_QOS to avoid deep c-states at run time?

> Can we use that for EHL and JSL?

Yes.

You may also have a BIOS option to achieve the same goal, depending on
the platform.)
Kai-Heng Feng Aug. 1, 2024, 6:54 a.m. UTC | #7
On Thu, Aug 1, 2024 at 12:28 AM Len Brown <lenb@kernel.org> wrote:
>
> On Wed, Jul 31, 2024 at 2:18 AM Kai-Heng Feng
> <kai.heng.feng@canonical.com> wrote:
> >
> > The crucial part to make the issue (i.e. slow ethernet) is
> > ".disable_promotion_to_c1e = true".
>
> Okay, so the problem statement is that on this machine with some
> ethernet controller and some workload,
> performance is better when you use just C1 and not C1E (or deeper) states.
>
> And so you want to have the option of accessing C1 without the overhead of C1E?

Yes, that's the case here.

>
> Presumably you don't care about the power savings of the deeper states,
> or you are using PM_QOS to avoid deep c-states at run time?

I tried to use cpu_latency_qos in the network driver's NAPI poll, but
only saw marginal improvement to around 830Mbps. Hitting the 940Mbps
is still the goal here.

>
> > Can we use that for EHL and JSL?
>
> Yes.

Is it plausible to disable C1E promotion while using ACPI idle driver?
Or provide a C-state table in intel_idle and update the states via
_CST?

>
> You may also have a BIOS option to achieve the same goal, depending on
> the platform.)

I am seeing three different platforms from different vendors hitting
the same issue, so it's better to disable C1e for these platforms.

Kai-Heng

>
> --
> Len Brown, Intel
>
Rafael J. Wysocki Aug. 1, 2024, 10:20 a.m. UTC | #8
On Thu, Aug 1, 2024 at 8:55 AM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
> On Thu, Aug 1, 2024 at 12:28 AM Len Brown <lenb@kernel.org> wrote:
> >
> > On Wed, Jul 31, 2024 at 2:18 AM Kai-Heng Feng
> > <kai.heng.feng@canonical.com> wrote:
> > >
> > > The crucial part to make the issue (i.e. slow ethernet) is
> > > ".disable_promotion_to_c1e = true".
> >
> > Okay, so the problem statement is that on this machine with some
> > ethernet controller and some workload,
> > performance is better when you use just C1 and not C1E (or deeper) states.
> >
> > And so you want to have the option of accessing C1 without the overhead of C1E?
>
> Yes, that's the case here.
>
> >
> > Presumably you don't care about the power savings of the deeper states,
> > or you are using PM_QOS to avoid deep c-states at run time?
>
> I tried to use cpu_latency_qos in the network driver's NAPI poll, but
> only saw marginal improvement to around 830Mbps. Hitting the 940Mbps
> is still the goal here.
>
> >
> > > Can we use that for EHL and JSL?
> >
> > Yes.
>
> Is it plausible to disable C1E promotion while using ACPI idle driver?

It is in general.

It would require the driver to recognize the processors in question
without providing C-state tables, but it should be doable.

Next week I'll be offline, I'll look into this when I'm back.

> Or provide a C-state table in intel_idle and update the states via _CST?

That should be possible either.

> >
> > You may also have a BIOS option to achieve the same goal, depending on
> > the platform.)
>
> I am seeing three different platforms from different vendors hitting
> the same issue, so it's better to disable C1e for these platforms.

I see.

Thanks!
diff mbox series

Patch

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 9aab7abc2ae9..eb6975a1d083 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1538,6 +1538,8 @@  static const struct x86_cpu_id intel_idle_ids[] __initconst = {
 	X86_MATCH_VFM(INTEL_ATOM_GOLDMONT,	&idle_cpu_bxt),
 	X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_PLUS,	&idle_cpu_bxt),
 	X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_D,	&idle_cpu_dnv),
+	X86_MATCH_VFM(INTEL_ATOM_TREMONT,	&idle_cpu_bxt),
+	X86_MATCH_VFM(INTEL_ATOM_TREMONT_L,	&idle_cpu_bxt),
 	X86_MATCH_VFM(INTEL_ATOM_TREMONT_D,	&idle_cpu_snr),
 	X86_MATCH_VFM(INTEL_ATOM_CRESTMONT,	&idle_cpu_grr),
 	X86_MATCH_VFM(INTEL_ATOM_CRESTMONT_X,	&idle_cpu_srf),