[v4] topology: make core_mask include at least cluster_siblings

Message ID 3d58dc946a4fa1cc696d05baad1cf05ae686a86d.1649115057.git.darren@os.amperecomputing.com (mailing list archive)
State New, archived

Commit Message

Darren Hart April 4, 2022, 11:40 p.m. UTC
Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
Control Unit, but have no shared CPU-side last level cache.

cpu_coregroup_mask() will return a cpumask with weight 1, while
cpu_clustergroup_mask() will return a cpumask with weight 2.

As a result, build_sched_domain() will report a BUG once per CPU with:

BUG: arch topology borken
the CLS domain not a subset of the MC domain

The MC level cpumask is then extended to that of the CLS child, and is
later removed entirely as redundant. This sched domain topology is an
improvement over previous topologies, or those built without
SCHED_CLUSTER, particularly for certain latency sensitive workloads.
With the current scheduler model and heuristics, this is a desirable
default topology for Ampere Altra and Altra Max systems.

Rather than create a custom sched domains topology structure and
introduce new logic in arch/arm64 to detect these systems, update the
core_mask so coregroup is never a proper subset of clustergroup, extending it
to cluster_siblings if necessary. Only do this if CONFIG_SCHED_CLUSTER
is enabled to avoid also changing the topology (MC) when
CONFIG_SCHED_CLUSTER is disabled.

This has the added benefit over a custom topology of working for both
symmetric and asymmetric topologies. It does not address systems where
the CLUSTER topology is above a populated MC topology, but these are not
considered today and can be addressed separately if and when they
appear.

The final sched domain topology for a 2 socket Ampere Altra system is
unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:

For CPU0:

CONFIG_SCHED_CLUSTER=y
CLS  [0-1]
DIE  [0-79]
NUMA [0-159]

CONFIG_SCHED_CLUSTER is not set
DIE  [0-79]
NUMA [0-159]

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: D. Scott Phillips <scott@os.amperecomputing.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Carl Worth <carl@os.amperecomputing.com>
Cc: <stable@vger.kernel.org> # 5.16.x
Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
---
v1: Drop MC level if coregroup weight == 1
v2: New sd topo in arch/arm64/kernel/smp.c
v3: No new topo, extend core_mask to cluster_siblings
v4: Rebase on 5.18-rc1 for GregKH to pull. Add IS_ENABLED(CONFIG_SCHED_CLUSTER).

 drivers/base/arch_topology.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Darren Hart April 5, 2022, 2:29 a.m. UTC | #1
On Mon, Apr 04, 2022 at 04:40:37PM -0700, Darren Hart wrote:
> Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> Control Unit, but have no shared CPU-side last level cache.
> 
> cpu_coregroup_mask() will return a cpumask with weight 1, while
> cpu_clustergroup_mask() will return a cpumask with weight 2.
> 
> As a result, build_sched_domain() will BUG() once per CPU with:
> 
> BUG: arch topology borken
> the CLS domain not a subset of the MC domain
> 
> The MC level cpumask is then extended to that of the CLS child, and is
> later removed entirely as redundant. This sched domain topology is an
> improvement over previous topologies, or those built without
> SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> With the current scheduler model and heuristics, this is a desirable
> default topology for Ampere Altra and Altra Max system.
> 
> Rather than create a custom sched domains topology structure and
> introduce new logic in arch/arm64 to detect these systems, update the
> core_mask so coregroup is never a subset of clustergroup, extending it
> to cluster_siblings if necessary. Only do this if CONFIG_SCHED_CLUSTER
> is enabled to avoid also changing the topology (MC) when
> CONFIG_SCHED_CLUSTER is disabled.
> 
> This has the added benefit over a custom topology of working for both
> symmetric and asymmetric topologies. It does not address systems where
> the CLUSTER topology is above a populated MC topology, but these are not
> considered today and can be addressed separately if and when they
> appear.
> 
> The final sched domain topology for a 2 socket Ampere Altra system is
> unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> 
> For CPU0:
> 
> CONFIG_SCHED_CLUSTER=y
> CLS  [0-1]
> DIE  [0-79]
> NUMA [0-159]
> 
> CONFIG_SCHED_CLUSTER is not set
> DIE  [0-79]
> NUMA [0-159]
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Barry Song <song.bao.hua@hisilicon.com>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> Cc: Carl Worth <carl@os.amperecomputing.com>
> Cc: <stable@vger.kernel.org> # 5.16.x
> Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
> Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> ---
> v1: Drop MC level if coregroup weight == 1
> v2: New sd topo in arch/arm64/kernel/smp.c
> v3: No new topo, extend core_mask to cluster_siblings
> v4: Rebase on 5.18-rc1 for GregKH to pull. Add IS_ENABLED(CONFIG_SCHED_CLUSTER).

A bit more context on the state of review:

Several folks reviewed, but I didn't add their Reviewed-by tags because I added
the IS_ENABLED(CONFIG_SCHED_CLUSTER) test after they last reviewed it. This test
preserves the stated intent of the patch when CONFIG_SCHED_CLUSTER is disabled.

Barry Song - Suggested this approach
Vincent Guittot - informal review with reservations
Sudeep Holla - Acked-by
Dietmar Eggemann - informal review (added to Cc, apologies for the omission Dietmar)

All but Barry's recommendation captured in the v3 thread:
https://lore.kernel.org/linux-arm-kernel/f1deaeabfd31fdf512ff6502f38186ef842c2b1f.1646413117.git.darren@os.amperecomputing.com/

Thanks,

> 
>  drivers/base/arch_topology.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 1d6636ebaac5..5497c5ab7318 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -667,6 +667,15 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
>  			core_mask = &cpu_topology[cpu].llc_sibling;
>  	}
>  
> +	/*
> +	 * For systems with no shared cpu-side LLC but with clusters defined,
> +	 * extend core_mask to cluster_siblings. The sched domain builder will
> +	 * then remove MC as redundant with CLS if SCHED_CLUSTER is enabled.
> +	 */
> +	if (IS_ENABLED(CONFIG_SCHED_CLUSTER) &&
> +	    cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
> +		core_mask = &cpu_topology[cpu].cluster_sibling;
> +
>  	return core_mask;
>  }
>
Barry Song April 5, 2022, 6:38 a.m. UTC | #2
On Tue, Apr 5, 2022 at 3:46 PM Darren Hart
<darren@os.amperecomputing.com> wrote:
>
> On Mon, Apr 04, 2022 at 04:40:37PM -0700, Darren Hart wrote:
> > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > Control Unit, but have no shared CPU-side last level cache.
> >
> > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > cpu_clustergroup_mask() will return a cpumask with weight 2.
> >
> > As a result, build_sched_domain() will BUG() once per CPU with:
> >
> > BUG: arch topology borken
> > the CLS domain not a subset of the MC domain
> >
> > The MC level cpumask is then extended to that of the CLS child, and is
> > later removed entirely as redundant. This sched domain topology is an
> > improvement over previous topologies, or those built without
> > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > With the current scheduler model and heuristics, this is a desirable
> > default topology for Ampere Altra and Altra Max system.
> >
> > Rather than create a custom sched domains topology structure and
> > introduce new logic in arch/arm64 to detect these systems, update the
> > core_mask so coregroup is never a subset of clustergroup, extending it
> > to cluster_siblings if necessary. Only do this if CONFIG_SCHED_CLUSTER
> > is enabled to avoid also changing the topology (MC) when
> > CONFIG_SCHED_CLUSTER is disabled.
> >
> > This has the added benefit over a custom topology of working for both
> > symmetric and asymmetric topologies. It does not address systems where
> > the CLUSTER topology is above a populated MC topology, but these are not
> > considered today and can be addressed separately if and when they
> > appear.
> >
> > The final sched domain topology for a 2 socket Ampere Altra system is
> > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> >
> > For CPU0:
> >
> > CONFIG_SCHED_CLUSTER=y
> > CLS  [0-1]
> > DIE  [0-79]
> > NUMA [0-159]
> >
> > CONFIG_SCHED_CLUSTER is not set
> > DIE  [0-79]
> > NUMA [0-159]
> >
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Barry Song <song.bao.hua@hisilicon.com>
> > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> > Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> > Cc: Carl Worth <carl@os.amperecomputing.com>
> > Cc: <stable@vger.kernel.org> # 5.16.x
> > Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
> > Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> > ---
> > v1: Drop MC level if coregroup weight == 1
> > v2: New sd topo in arch/arm64/kernel/smp.c
> > v3: No new topo, extend core_mask to cluster_siblings
> > v4: Rebase on 5.18-rc1 for GregKH to pull. Add IS_ENABLED(CONFIG_SCHED_CLUSTER).
>
> A bit more context on the state of review:
>
> Several folks reviewed, but I didn't add their Reviewed-by since I added the
> IS_ENABLED(CONFIG_SCHED_CLUSTER) test since they reviewed it last. This change
> preserves the stated intent of the change when CONFIG_SCHED_CLUSTER is disabled.

Everything still works even without IS_ENABLED(CONFIG_SCHED_CLUSTER), right?
Anyway, adding IS_ENABLED(CONFIG_SCHED_CLUSTER) seems right as well.
But wouldn't it still be a good idea to include all the Reviewed-by and
Acked-by tags you got on v3? I don't think the added IS_ENABLED will change
their decisions.

>
> Barry Song - Suggested this approach
> Vincent Guittot - informal review with reservations
> Sudeep Holla - Acked-by
> Dietmar Eggemann - informal review (added to Cc, apologies for the omission Dietmar)
>
> All but Barry's recommendation captured in the v3 thread:
> https://lore.kernel.org/linux-arm-kernel/f1deaeabfd31fdf512ff6502f38186ef842c2b1f.1646413117.git.darren@os.amperecomputing.com/
>
> Thanks,
>
> >
> >  drivers/base/arch_topology.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> > index 1d6636ebaac5..5497c5ab7318 100644
> > --- a/drivers/base/arch_topology.c
> > +++ b/drivers/base/arch_topology.c
> > @@ -667,6 +667,15 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
> >                       core_mask = &cpu_topology[cpu].llc_sibling;
> >       }
> >
> > +     /*
> > +      * For systems with no shared cpu-side LLC but with clusters defined,
> > +      * extend core_mask to cluster_siblings. The sched domain builder will
> > +      * then remove MC as redundant with CLS if SCHED_CLUSTER is enabled.
> > +      */
> > +     if (IS_ENABLED(CONFIG_SCHED_CLUSTER) &&
> > +         cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
> > +             core_mask = &cpu_topology[cpu].cluster_sibling;
> > +
> >       return core_mask;
> >  }
> >
> --
> Darren Hart
> Ampere Computing / OS and Kernel

Thanks
Barry
Darren Hart April 5, 2022, 2:54 p.m. UTC | #3
On Tue, Apr 05, 2022 at 06:38:01PM +1200, Barry Song wrote:
> On Tue, Apr 5, 2022 at 3:46 PM Darren Hart
> <darren@os.amperecomputing.com> wrote:
> >
> > On Mon, Apr 04, 2022 at 04:40:37PM -0700, Darren Hart wrote:
> > > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > > Control Unit, but have no shared CPU-side last level cache.
> > >
> > > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > > cpu_clustergroup_mask() will return a cpumask with weight 2.
> > >
> > > As a result, build_sched_domain() will BUG() once per CPU with:
> > >
> > > BUG: arch topology borken
> > > the CLS domain not a subset of the MC domain
> > >
> > > The MC level cpumask is then extended to that of the CLS child, and is
> > > later removed entirely as redundant. This sched domain topology is an
> > > improvement over previous topologies, or those built without
> > > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > > With the current scheduler model and heuristics, this is a desirable
> > > default topology for Ampere Altra and Altra Max system.
> > >
> > > Rather than create a custom sched domains topology structure and
> > > introduce new logic in arch/arm64 to detect these systems, update the
> > > core_mask so coregroup is never a subset of clustergroup, extending it
> > > to cluster_siblings if necessary. Only do this if CONFIG_SCHED_CLUSTER
> > > is enabled to avoid also changing the topology (MC) when
> > > CONFIG_SCHED_CLUSTER is disabled.
> > >
> > > This has the added benefit over a custom topology of working for both
> > > symmetric and asymmetric topologies. It does not address systems where
> > > the CLUSTER topology is above a populated MC topology, but these are not
> > > considered today and can be addressed separately if and when they
> > > appear.
> > >
> > > The final sched domain topology for a 2 socket Ampere Altra system is
> > > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> > >
> > > For CPU0:
> > >
> > > CONFIG_SCHED_CLUSTER=y
> > > CLS  [0-1]
> > > DIE  [0-79]
> > > NUMA [0-159]
> > >
> > > CONFIG_SCHED_CLUSTER is not set
> > > DIE  [0-79]
> > > NUMA [0-159]
> > >
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > Cc: Barry Song <song.bao.hua@hisilicon.com>
> > > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > > Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> > > Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> > > Cc: Carl Worth <carl@os.amperecomputing.com>
> > > Cc: <stable@vger.kernel.org> # 5.16.x
> > > Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
> > > Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> > > ---
> > > v1: Drop MC level if coregroup weight == 1
> > > v2: New sd topo in arch/arm64/kernel/smp.c
> > > v3: No new topo, extend core_mask to cluster_siblings
> > > v4: Rebase on 5.18-rc1 for GregKH to pull. Add IS_ENABLED(CONFIG_SCHED_CLUSTER).
> >
> > A bit more context on the state of review:
> >
> > Several folks reviewed, but I didn't add their Reviewed-by since I added the
> > IS_ENABLED(CONFIG_SCHED_CLUSTER) test since they reviewed it last. This change
> > preserves the stated intent of the change when CONFIG_SCHED_CLUSTER is disabled.
> 
> Everything still works even without IS_ENABLED(CONFIG_SCHED_CLUSTER), right?
> Anyway, putting IS_ENABLED(CONFIG_SCHED_CLUSTER) seems to be right as
> well.

Hi Barry,

Without the additional IS_ENABLED check, if CONFIG_SCHED_CLUSTER is disabled
then rather than a topology of:

DIE  [0-79]
NUMA [0-159]

We end up expanding the MC span and get:

MC   [0-1]
DIE  [0-79]
NUMA [0-159]

This isn't "bad", but it wasn't the stated intent, and I'd prefer that users be
able to choose between the two with the CONFIG_SCHED_CLUSTER option.

> But it seems it is still a good choice to put all these reviewed-by
> and acked-by you got in
> v3? I don't  think the added IS_ENABLED will change their decisions.

I think Sudeep is the only one who wrote an actual tag, and in my experience
tags should be explicitly volunteered rather than assumed, particularly a
Reviewed-by, and particularly after a change is made. [1] reinforces this with "Hence
patch mergers will sometimes manually convert an acker’s “yep, looks good to me”
into an Acked-by: (but note that it is usually better to ask for an explicit
ack)."

Greg, since I'm asking you to pull this - please let me know if I'm being overly
cautious with tags here.

> 
> >
> > Barry Song - Suggested this approach

Can we add your Reviewed-by here Barry?

Thanks,

Darren

1. https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by

> > Vincent Guittot - informal review with reservations
> > Sudeep Holla - Acked-by
> > Dietmar Eggemann - informal review (added to Cc, apologies for the omission Dietmar)
> >
> > All but Barry's recommendation captured in the v3 thread:
> > https://lore.kernel.org/linux-arm-kernel/f1deaeabfd31fdf512ff6502f38186ef842c2b1f.1646413117.git.darren@os.amperecomputing.com/
> >
> > Thanks,
> >
> > >
> > >  drivers/base/arch_topology.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> > > index 1d6636ebaac5..5497c5ab7318 100644
> > > --- a/drivers/base/arch_topology.c
> > > +++ b/drivers/base/arch_topology.c
> > > @@ -667,6 +667,15 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
> > >                       core_mask = &cpu_topology[cpu].llc_sibling;
> > >       }
> > >
> > > +     /*
> > > +      * For systems with no shared cpu-side LLC but with clusters defined,
> > > +      * extend core_mask to cluster_siblings. The sched domain builder will
> > > +      * then remove MC as redundant with CLS if SCHED_CLUSTER is enabled.
> > > +      */
> > > +     if (IS_ENABLED(CONFIG_SCHED_CLUSTER) &&
> > > +         cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
> > > +             core_mask = &cpu_topology[cpu].cluster_sibling;
> > > +
> > >       return core_mask;
> > >  }
> > >
> > --
> > Darren Hart
> > Ampere Computing / OS and Kernel
> 
> Thanks
> Barry
Darren Hart April 5, 2022, 3:02 p.m. UTC | #4
On Mon, Apr 04, 2022 at 07:29:20PM -0700, Darren Hart wrote:
...
> A bit more context on the state of review:
> 
> Several folks reviewed, but I didn't add their Reviewed-by since I added the
> IS_ENABLED(CONFIG_SCHED_CLUSTER) test since they reviewed it last. This change
> preserves the stated intent of the change when CONFIG_SCHED_CLUSTER is disabled.
> 
> Barry Song - Suggested this approach
> Vincent Guittot - informal review with reservations
> Sudeep Holla - Acked-by
> Dietmar Eggemann - informal review (added to Cc, apologies for the omission Dietmar)

Dietmar responded with his:

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Barry Song April 6, 2022, 6:14 a.m. UTC | #5
On Wed, Apr 6, 2022 at 2:55 AM Darren Hart
<darren@os.amperecomputing.com> wrote:
>
> On Tue, Apr 05, 2022 at 06:38:01PM +1200, Barry Song wrote:
> > On Tue, Apr 5, 2022 at 3:46 PM Darren Hart
> > <darren@os.amperecomputing.com> wrote:
> > >
> > > On Mon, Apr 04, 2022 at 04:40:37PM -0700, Darren Hart wrote:
> > > > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > > > Control Unit, but have no shared CPU-side last level cache.
> > > >
> > > > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > > > cpu_clustergroup_mask() will return a cpumask with weight 2.
> > > >
> > > > As a result, build_sched_domain() will BUG() once per CPU with:
> > > >
> > > > BUG: arch topology borken
> > > > the CLS domain not a subset of the MC domain
> > > >
> > > > The MC level cpumask is then extended to that of the CLS child, and is
> > > > later removed entirely as redundant. This sched domain topology is an
> > > > improvement over previous topologies, or those built without
> > > > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > > > With the current scheduler model and heuristics, this is a desirable
> > > > default topology for Ampere Altra and Altra Max system.
> > > >
> > > > Rather than create a custom sched domains topology structure and
> > > > introduce new logic in arch/arm64 to detect these systems, update the
> > > > core_mask so coregroup is never a subset of clustergroup, extending it
> > > > to cluster_siblings if necessary. Only do this if CONFIG_SCHED_CLUSTER
> > > > is enabled to avoid also changing the topology (MC) when
> > > > CONFIG_SCHED_CLUSTER is disabled.
> > > >
> > > > This has the added benefit over a custom topology of working for both
> > > > symmetric and asymmetric topologies. It does not address systems where
> > > > the CLUSTER topology is above a populated MC topology, but these are not
> > > > considered today and can be addressed separately if and when they
> > > > appear.
> > > >
> > > > The final sched domain topology for a 2 socket Ampere Altra system is
> > > > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> > > >
> > > > For CPU0:
> > > >
> > > > CONFIG_SCHED_CLUSTER=y
> > > > CLS  [0-1]
> > > > DIE  [0-79]
> > > > NUMA [0-159]
> > > >
> > > > CONFIG_SCHED_CLUSTER is not set
> > > > DIE  [0-79]
> > > > NUMA [0-159]
> > > >
> > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > > > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > > Cc: Barry Song <song.bao.hua@hisilicon.com>
> > > > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > > > Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> > > > Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> > > > Cc: Carl Worth <carl@os.amperecomputing.com>
> > > > Cc: <stable@vger.kernel.org> # 5.16.x
> > > > Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
> > > > Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> > > > ---
> > > > v1: Drop MC level if coregroup weight == 1
> > > > v2: New sd topo in arch/arm64/kernel/smp.c
> > > > v3: No new topo, extend core_mask to cluster_siblings
> > > > v4: Rebase on 5.18-rc1 for GregKH to pull. Add IS_ENABLED(CONFIG_SCHED_CLUSTER).
> > >
> > > A bit more context on the state of review:
> > >
> > > Several folks reviewed, but I didn't add their Reviewed-by since I added the
> > > IS_ENABLED(CONFIG_SCHED_CLUSTER) test since they reviewed it last. This change
> > > preserves the stated intent of the change when CONFIG_SCHED_CLUSTER is disabled.
> >
> > Everything still works even without IS_ENABLED(CONFIG_SCHED_CLUSTER), right?
> > Anyway, putting IS_ENABLED(CONFIG_SCHED_CLUSTER) seems to be right as
> > well.
>
> Hi Barry,
>
> Without the additional IS_ENABLED check, if CONFIG_SCHED_CLUSTER is disabled
> then rather than a topology of:
>
> DIE  [0-79]
> NUMA [0-159]
>
> We end up expanding the MC span and get:
>
> MC   [0-1]
> DIE  [0-79]
> NUMA [0-159]
>
> This isn't "bad", but it wasn't the stated intent, and I prefer users can choose
> between the two by using the CONFIG_SCHED_CLUSTER option.
>
> > But it seems it is still a good choice to put all these reviewed-by
> > and acked-by you got in
> > v3? I don't  think the added IS_ENABLED will change their decisions.
>
> I think Sudeep is the only one that wrote the actual tag, and in my experience
> those tags should be explicitly volunteered rather than assumed, especially if a
> change is made, especially for Reviewed-by. [1] reinforces this with "Hence
> patch mergers will sometimes manually convert an acker’s “yep, looks good to me”
> into an Acked-by: (but note that it is usually better to ask for an explicit
> ack)."
>
> Greg, since I'm asking you to pull this - please let me know if I'm being overly
> cautious with tags here.
>
> >
> > >
> > > Barry Song - Suggested this approach
>
> Can we add your Reviewed-by here Barry?

Yes, please.

I think you should add

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
according to:
https://lore.kernel.org/lkml/e91bcc83-37c8-dcca-e088-8b3fcd737b2c@arm.com/

Acked-by: Sudeep Holla <sudeep.holla@arm.com>
according to:
https://lore.kernel.org/lkml/YiczzB92EcShyvLh@bogus/

>
> Thanks,
>
> Darren
>
> 1. https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by
>
> > > Vincent Guittot - informal review with reservations
> > > Sudeep Holla - Acked-by
> > > Dietmar Eggemann - informal review (added to Cc, apologies for the omission Dietmar)
> > >
> > > All but Barry's recommendation captured in the v3 thread:
> > > https://lore.kernel.org/linux-arm-kernel/f1deaeabfd31fdf512ff6502f38186ef842c2b1f.1646413117.git.darren@os.amperecomputing.com/
> > >
> > > Thanks,
> > >
> > > >
> > > >  drivers/base/arch_topology.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> > > > index 1d6636ebaac5..5497c5ab7318 100644
> > > > --- a/drivers/base/arch_topology.c
> > > > +++ b/drivers/base/arch_topology.c
> > > > @@ -667,6 +667,15 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
> > > >                       core_mask = &cpu_topology[cpu].llc_sibling;
> > > >       }
> > > >
> > > > +     /*
> > > > +      * For systems with no shared cpu-side LLC but with clusters defined,
> > > > +      * extend core_mask to cluster_siblings. The sched domain builder will
> > > > +      * then remove MC as redundant with CLS if SCHED_CLUSTER is enabled.
> > > > +      */
> > > > +     if (IS_ENABLED(CONFIG_SCHED_CLUSTER) &&
> > > > +         cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
> > > > +             core_mask = &cpu_topology[cpu].cluster_sibling;
> > > > +
> > > >       return core_mask;
> > > >  }
> > > >
> > > --
> > > Darren Hart
> > > Ampere Computing / OS and Kernel
> >

Thanks
Barry
Darren Hart April 8, 2022, 6:15 p.m. UTC | #6
On Wed, Apr 06, 2022 at 06:14:16PM +1200, Barry Song wrote:
> On Wed, Apr 6, 2022 at 2:55 AM Darren Hart
> <darren@os.amperecomputing.com> wrote:
...

> > Can we add your Reviewed-by here Barry?
> 
> Yes, please.
> 
> I think you should add
> 
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> according to:
> https://lore.kernel.org/lkml/e91bcc83-37c8-dcca-e088-8b3fcd737b2c@arm.com/
> 
> Acked-by: Sudeep Holla <sudeep.holla@arm.com>
> according to:
> https://lore.kernel.org/lkml/YiczzB92EcShyvLh@bogus/

Thanks Barry,

Greg, I am assuming you prefer I not resend the same patch with these added and
that your tooling automates most of this (b4 or similar). Please let me know if
you prefer a resend.

Thanks,

Patch

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 1d6636ebaac5..5497c5ab7318 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -667,6 +667,15 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 			core_mask = &cpu_topology[cpu].llc_sibling;
 	}
 
+	/*
+	 * For systems with no shared cpu-side LLC but with clusters defined,
+	 * extend core_mask to cluster_siblings. The sched domain builder will
+	 * then remove MC as redundant with CLS if SCHED_CLUSTER is enabled.
+	 */
+	if (IS_ENABLED(CONFIG_SCHED_CLUSTER) &&
+	    cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
+		core_mask = &cpu_topology[cpu].cluster_sibling;
+
 	return core_mask;
 }