diff mbox series

[v3] ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512

Message ID 37099a57-b655-3b3a-56d0-5f7fbd49d7db@gentwo.org (mailing list archive)
State New, archived
Headers show
Series [v3] ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 | expand

Commit Message

Christoph Lameter (Ampere) March 7, 2024, 1:45 a.m. UTC
Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
Therefore this patch increases the default NR_CPUS from 256 to 512.

As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
this might have a significant impact on stack usage due to code which places
cpumasks on the stack. To mitigate that concern, we can select
CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
NR_CPUS=256, we only select this when NR_CPUS > 256.

CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
dynamically allocated. This was used in the X86 architecture in the
past to enable support for larger CPU configurations up to 8k cpus.

With that is becomes possible to dynamically size the allocation of
the cpu bitmaps depending on the quantity of processors detected on
bootup. Memory used for cpumasks will increase if the kernel is
run on a machine with more cores.

Further increases may be needed if ARM processor vendors start
supporting more processors. Given the current inflationary trends
in core counts from multiple processor manufacturers this may occur.

There are minor regressions for hackbench. The kernel data size
for 512 cpus is smaller with offstack than with onstack.

Benchmark results using hackbench average over 10 runs of

 	hackbench -s 512 -l 2000 -g 15 -f 25 -P

on Altra 80 Core

Support for 256 CPUs on stack. Baseline

 	7.8564 sec

Support for 512 CUs on stack.

 	7.8713 sec + 0.18%

512 CPUS offstack

 	7.8916 sec + 0.44%

Kernel size comparison:

    text		   data	    filename				Difference to onstack256 baseline
25755648	9589248	    vmlinuz-6.8.0-rc4-onstack256
25755648	9607680	    vmlinuz-6.8.0-rc4-onstack512	+0.19%
25755648	9603584	    vmlinuz-6.8.0-rc4-offstack512	+0.14%

Tested-by: Eric Mackay <eric.mackay@oracle.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com>
---


Original post: https://www.spinics.net/lists/linux-mm/msg369701.html
V2: https://lkml.org/lkml/2024/2/7/505


V1->V2

- Keep quotation marks
- Remove whiltespace damage
- Add tested by

V2->V3:
- Add test results
- Rework descriptions


  arch/arm64/Kconfig | 16 +++++++++++++++-
  1 file changed, 15 insertions(+), 1 deletion(-)

Comments

Mark Rutland March 7, 2024, 5:49 p.m. UTC | #1
Hi Christoph,

On Wed, Mar 06, 2024 at 05:45:04PM -0800, Christoph Lameter (Ampere) wrote:
> Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
> Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
> these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
> Therefore this patch increases the default NR_CPUS from 256 to 512.
> 
> As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
> this might have a significant impact on stack usage due to code which places
> cpumasks on the stack. To mitigate that concern, we can select
> CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
> NR_CPUS=256, we only select this when NR_CPUS > 256.
> 
> CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
> dynamically allocated. This was used in the X86 architecture in the
> past to enable support for larger CPU configurations up to 8k cpus.
> 
> With that is becomes possible to dynamically size the allocation of
> the cpu bitmaps depending on the quantity of processors detected on
> bootup. Memory used for cpumasks will increase if the kernel is
> run on a machine with more cores.
> 
> Further increases may be needed if ARM processor vendors start
> supporting more processors. Given the current inflationary trends
> in core counts from multiple processor manufacturers this may occur.
> 
> There are minor regressions for hackbench. The kernel data size
> for 512 cpus is smaller with offstack than with onstack.
> 
> Benchmark results using hackbench average over 10 runs of
> 
> 	hackbench -s 512 -l 2000 -g 15 -f 25 -P
> 
> on Altra 80 Core
> 
> Support for 256 CPUs on stack. Baseline
> 
> 	7.8564 sec
> 
> Support for 512 CUs on stack.
> 
> 	7.8713 sec + 0.18%
> 
> 512 CPUS offstack
> 
> 	7.8916 sec + 0.44%
> 
> Kernel size comparison:
> 
>    text		   data	    filename				Difference to onstack256 baseline
> 25755648	9589248	    vmlinuz-6.8.0-rc4-onstack256
> 25755648	9607680	    vmlinuz-6.8.0-rc4-onstack512	+0.19%
> 25755648	9603584	    vmlinuz-6.8.0-rc4-offstack512	+0.14%

Thanks for this data; I think that's a strong justification that this isn't
likely to cause a big problem for us, and so I thing it makes sense to go with
this.

I have two minor comments below.

> Tested-by: Eric Mackay <eric.mackay@oracle.com>
> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com>
> ---
> 
> 
> Original post: https://www.spinics.net/lists/linux-mm/msg369701.html
> V2: https://lkml.org/lkml/2024/2/7/505
> 
> 
> V1->V2
> 
> - Keep quotation marks
> - Remove whiltespace damage
> - Add tested by
> 
> V2->V3:
> - Add test results
> - Rework descriptions
> 
> 
>  arch/arm64/Kconfig | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index aa7c1d435139..4e767dede47d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1427,7 +1427,21 @@ config SCHED_SMT
>  config NR_CPUS
>  	int "Maximum number of CPUs (2-4096)"
>  	range 2 4096
> -	default "256"
> +	default "512"
> +
> +#
> +# Determines the placement of cpumasks.
> +#
> +# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated.
> +# Useful for machines with lots of core because it avoids increasing
> +# the size of many of the data structures in the kernel.
> +#
> +# If this is off then the cpumasks have a static sizes and are
> +# embedded within data structures.
> +#
> +	config CPUMASK_OFFSTACK
> +	def_bool y
> +	depends on NR_CPUS > 256

As before, can we please delete the comment? That's the general semantic of
CPUMASK_OFFSTACK, not why we're selecting it.

That aside, this config option is defined in lib/Kconfig, so we should select
it rather than redefining it. i.e. this should be:

	select CPUMASK_OFFSTACK if NR_CPUS > 256

Sorry for not spotting that before.

With those changes:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Catalin, are you happy to fix that up when applying?

Mark.

> 
>  config HOTPLUG_CPU
>  	bool "Support for hot-pluggable CPUs"
> -- 
> 2.39.2
>
Catalin Marinas March 7, 2024, 7:07 p.m. UTC | #2
On Wed, 06 Mar 2024 17:45:04 -0800, Christoph Lameter (Ampere) wrote:
> Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
> Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
> these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
> Therefore this patch increases the default NR_CPUS from 256 to 512.
> 
> As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
> this might have a significant impact on stack usage due to code which places
> cpumasks on the stack. To mitigate that concern, we can select
> CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
> NR_CPUS=256, we only select this when NR_CPUS > 256.
> 
> [...]

Applied to arm64 (for-next/misc), thanks!

I dropped the config entry and comment, replaced it with a select as per
Mark's suggestion.

[1/1] ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
      https://git.kernel.org/arm64/c/0499a78369ad
Marek Szyprowski March 8, 2024, 2:01 p.m. UTC | #3
Dear All,

On 07.03.2024 02:45, Christoph Lameter (Ampere) wrote:
> Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
> Computing) are planning to ship systems with 512 CPUs. So that all 
> CPUs on
> these systems can be used with defconfig, we'd like to bump NR_CPUS to 
> 512.
> Therefore this patch increases the default NR_CPUS from 256 to 512.
>
> As increasing NR_CPUS will increase the size of cpumasks, there's a 
> fear that
> this might have a significant impact on stack usage due to code which 
> places
> cpumasks on the stack. To mitigate that concern, we can select
> CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
> NR_CPUS=256, we only select this when NR_CPUS > 256.
>
> CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
> dynamically allocated. This was used in the X86 architecture in the
> past to enable support for larger CPU configurations up to 8k cpus.
>
> With that is becomes possible to dynamically size the allocation of
> the cpu bitmaps depending on the quantity of processors detected on
> bootup. Memory used for cpumasks will increase if the kernel is
> run on a machine with more cores.
>
> Further increases may be needed if ARM processor vendors start
> supporting more processors. Given the current inflationary trends
> in core counts from multiple processor manufacturers this may occur.
>
> There are minor regressions for hackbench. The kernel data size
> for 512 cpus is smaller with offstack than with onstack.
>
> Benchmark results using hackbench average over 10 runs of
>
>     hackbench -s 512 -l 2000 -g 15 -f 25 -P
>
> on Altra 80 Core
>
> Support for 256 CPUs on stack. Baseline
>
>     7.8564 sec
>
> Support for 512 CUs on stack.
>
>     7.8713 sec + 0.18%
>
> 512 CPUS offstack
>
>     7.8916 sec + 0.44%
>
> Kernel size comparison:
>
>    text           data        filename                Difference to 
> onstack256 baseline
> 25755648    9589248        vmlinuz-6.8.0-rc4-onstack256
> 25755648    9607680        vmlinuz-6.8.0-rc4-onstack512    +0.19%
> 25755648    9603584        vmlinuz-6.8.0-rc4-offstack512    +0.14%
>
> Tested-by: Eric Mackay <eric.mackay@oracle.com>
> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com>


This patch landed in today's linux-next as commit 0499a78369ad ("ARM64: 
Dynamically allocate cpumasks and increase supported CPUs to 512"). 
Unfortunately it triggers the following warning during boot on most of 
my ARM64-based test boards. Here is an example from Odroid-N2 board:

  ------------[ cut here ]------------
  WARNING: CPU: 4 PID: 63 at drivers/opp/core.c:2554 
dev_pm_opp_set_config+0x390/0x710
  Modules linked in: dw_hdmi_i2s_audio meson_gxl smsc onboard_usb_hub(+) 
rtc_pcf8563 panfrost snd_soc_meson_axg_sound_card drm_shmem_helper 
crct10dif_ce dwmac_generic snd_soc_meson_card_utils gpu_sched 
snd_soc_meson_g12a_toacodec snd_soc_meson_g12a_tohdmitx rc_odroid 
snd_soc_meson_codec_glue pwm_meson ao_cec_g12a meson_gxbb_wdt meson_ir 
snd_soc_meson_axg_frddr snd_soc_meson_axg_toddr snd_soc_meson_axg_tdmin 
meson_vdec(C) v4l2_mem2mem videobuf2_dma_contig snd_soc_meson_axg_tdmout 
videobuf2_memops axg_audio videobuf2_v4l2 sclk_div videodev 
reset_meson_audio_arb snd_soc_meson_axg_fifo clk_phase dwmac_meson8b 
stmmac_platform videobuf2_common mdio_mux_meson_g12a meson_drm 
meson_dw_hdmi rtc_meson_vrtc stmmac meson_ddr_pmu_g12 mc dw_hdmi 
drm_display_helper pcs_xpcs snd_soc_meson_t9015 meson_canvas gpio_fan 
display_connector snd_soc_meson_axg_tdm_interface 
snd_soc_simple_amplifier snd_soc_meson_axg_tdm_formatter nvmem_meson_efuse
  hub 1-1:1.0: USB hub found
  CPU: 4 PID: 63 Comm: kworker/u12:5 Tainted: G         C 6.8.0-rc3+ #14677
  Hardware name: Hardkernel ODROID-N2 (DT)
  Workqueue: events_unbound deferred_probe_work_func
  pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : dev_pm_opp_set_config+0x390/0x710
  lr : dev_pm_opp_set_config+0x5c/0x710
  ...
  Call trace:
   dev_pm_opp_set_config+0x390/0x710
   dt_cpufreq_probe+0x268/0x480
   platform_probe+0x68/0xd8
   really_probe+0x148/0x2b4
   __driver_probe_device+0x78/0x12c
   driver_probe_device+0xdc/0x164
   __device_attach_driver+0xb8/0x138
   bus_for_each_drv+0x84/0xe0
   __device_attach+0xa8/0x1b0
   device_initial_probe+0x14/0x20
   bus_probe_device+0xb0/0xb4
   deferred_probe_work_func+0x8c/0xc8
   process_one_work+0x1ec/0x53c
   worker_thread+0x298/0x408
   kthread+0x124/0x128
   ret_from_fork+0x10/0x20
  irq event stamp: 317178
  hardirqs last  enabled at (317177): [<ffff8000801788d4>] 
ktime_get_coarse_real_ts64+0x10c/0x110
  hardirqs last disabled at (317178): [<ffff800081222030>] el1_dbg+0x24/0x8c
  softirqs last  enabled at (315802): [<ffff800080010a60>] 
__do_softirq+0x4a0/0x4e8
  softirqs last disabled at (315793): [<ffff8000800169b0>] 
____do_softirq+0x10/0x1c
  ---[ end trace 0000000000000000 ]---
  cpu cpu2: error -EBUSY: failed to set regulators
  cpufreq-dt: probe of cpufreq-dt failed with error -16

It looks that cpufreq-dt and/or opp drivers needs some adjustments 
related with this change.


> ---
>
>
> Original post: https://www.spinics.net/lists/linux-mm/msg369701.html
> V2: https://lkml.org/lkml/2024/2/7/505
>
>
> V1->V2
>
> - Keep quotation marks
> - Remove whiltespace damage
> - Add tested by
>
> V2->V3:
> - Add test results
> - Rework descriptions
>
>
>  arch/arm64/Kconfig | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index aa7c1d435139..4e767dede47d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1427,7 +1427,21 @@ config SCHED_SMT
>  config NR_CPUS
>      int "Maximum number of CPUs (2-4096)"
>      range 2 4096
> -    default "256"
> +    default "512"
> +
> +#
> +# Determines the placement of cpumasks.
> +#
> +# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated.
> +# Useful for machines with lots of core because it avoids increasing
> +# the size of many of the data structures in the kernel.
> +#
> +# If this is off then the cpumasks have a static sizes and are
> +# embedded within data structures.
> +#
> +    config CPUMASK_OFFSTACK
> +    def_bool y
> +    depends on NR_CPUS > 256
>
>  config HOTPLUG_CPU
>      bool "Support for hot-pluggable CPUs"

Best regards
Catalin Marinas March 8, 2024, 2:51 p.m. UTC | #4
On Fri, Mar 08, 2024 at 03:01:28PM +0100, Marek Szyprowski wrote:
> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64: 
> Dynamically allocate cpumasks and increase supported CPUs to 512"). 
> Unfortunately it triggers the following warning during boot on most of 
> my ARM64-based test boards. Here is an example from Odroid-N2 board:
> 
>   ------------[ cut here ]------------
>   WARNING: CPU: 4 PID: 63 at drivers/opp/core.c:2554 
> dev_pm_opp_set_config+0x390/0x710
>   Modules linked in: dw_hdmi_i2s_audio meson_gxl smsc onboard_usb_hub(+) 
> rtc_pcf8563 panfrost snd_soc_meson_axg_sound_card drm_shmem_helper 
> crct10dif_ce dwmac_generic snd_soc_meson_card_utils gpu_sched 
> snd_soc_meson_g12a_toacodec snd_soc_meson_g12a_tohdmitx rc_odroid 
> snd_soc_meson_codec_glue pwm_meson ao_cec_g12a meson_gxbb_wdt meson_ir 
> snd_soc_meson_axg_frddr snd_soc_meson_axg_toddr snd_soc_meson_axg_tdmin 
> meson_vdec(C) v4l2_mem2mem videobuf2_dma_contig snd_soc_meson_axg_tdmout 
> videobuf2_memops axg_audio videobuf2_v4l2 sclk_div videodev 
> reset_meson_audio_arb snd_soc_meson_axg_fifo clk_phase dwmac_meson8b 
> stmmac_platform videobuf2_common mdio_mux_meson_g12a meson_drm 
> meson_dw_hdmi rtc_meson_vrtc stmmac meson_ddr_pmu_g12 mc dw_hdmi 
> drm_display_helper pcs_xpcs snd_soc_meson_t9015 meson_canvas gpio_fan 
> display_connector snd_soc_meson_axg_tdm_interface 
> snd_soc_simple_amplifier snd_soc_meson_axg_tdm_formatter nvmem_meson_efuse
>   hub 1-1:1.0: USB hub found
>   CPU: 4 PID: 63 Comm: kworker/u12:5 Tainted: G         C 6.8.0-rc3+ #14677
>   Hardware name: Hardkernel ODROID-N2 (DT)
>   Workqueue: events_unbound deferred_probe_work_func
>   pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : dev_pm_opp_set_config+0x390/0x710
>   lr : dev_pm_opp_set_config+0x5c/0x710
>   ...
>   Call trace:
>    dev_pm_opp_set_config+0x390/0x710
>    dt_cpufreq_probe+0x268/0x480
>    platform_probe+0x68/0xd8
>    really_probe+0x148/0x2b4
>    __driver_probe_device+0x78/0x12c
>    driver_probe_device+0xdc/0x164
>    __device_attach_driver+0xb8/0x138
>    bus_for_each_drv+0x84/0xe0
>    __device_attach+0xa8/0x1b0
>    device_initial_probe+0x14/0x20
>    bus_probe_device+0xb0/0xb4
>    deferred_probe_work_func+0x8c/0xc8
>    process_one_work+0x1ec/0x53c
>    worker_thread+0x298/0x408
>    kthread+0x124/0x128
>    ret_from_fork+0x10/0x20
>   irq event stamp: 317178
>   hardirqs last  enabled at (317177): [<ffff8000801788d4>] 
> ktime_get_coarse_real_ts64+0x10c/0x110
>   hardirqs last disabled at (317178): [<ffff800081222030>] el1_dbg+0x24/0x8c
>   softirqs last  enabled at (315802): [<ffff800080010a60>] 
> __do_softirq+0x4a0/0x4e8
>   softirqs last disabled at (315793): [<ffff8000800169b0>] 
> ____do_softirq+0x10/0x1c
>   ---[ end trace 0000000000000000 ]---
>   cpu cpu2: error -EBUSY: failed to set regulators
>   cpufreq-dt: probe of cpufreq-dt failed with error -16
> 
> It looks that cpufreq-dt and/or opp drivers needs some adjustments 
> related with this change.

That's strange. Is this with defconfig? I wonder whether NR_CPUS being
larger caused the issue with this specific code. Otherwise
CPUMASK_OFFSTACK may not work that well on arm64.

> 
> 
> > ---
> >
> >
> > Original post: https://www.spinics.net/lists/linux-mm/msg369701.html
> > V2: https://lkml.org/lkml/2024/2/7/505
> >
> >
> > V1->V2
> >
> > - Keep quotation marks
> > - Remove whiltespace damage
> > - Add tested by
> >
> > V2->V3:
> > - Add test results
> > - Rework descriptions
> >
> >
> >  arch/arm64/Kconfig | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index aa7c1d435139..4e767dede47d 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1427,7 +1427,21 @@ config SCHED_SMT
> >  config NR_CPUS
> >      int "Maximum number of CPUs (2-4096)"
> >      range 2 4096
> > -    default "256"
> > +    default "512"
> > +
> > +#
> > +# Determines the placement of cpumasks.
> > +#
> > +# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated.
> > +# Useful for machines with lots of core because it avoids increasing
> > +# the size of many of the data structures in the kernel.
> > +#
> > +# If this is off then the cpumasks have a static sizes and are
> > +# embedded within data structures.
> > +#
> > +    config CPUMASK_OFFSTACK
> > +    def_bool y
> > +    depends on NR_CPUS > 256
> >
> >  config HOTPLUG_CPU
> >      bool "Support for hot-pluggable CPUs"
> 
> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>
Marek Szyprowski March 8, 2024, 4:21 p.m. UTC | #5
On 08.03.2024 15:51, Catalin Marinas wrote:
> On Fri, Mar 08, 2024 at 03:01:28PM +0100, Marek Szyprowski wrote:
>> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64:
>> Dynamically allocate cpumasks and increase supported CPUs to 512").
>> Unfortunately it triggers the following warning during boot on most of
>> my ARM64-based test boards. Here is an example from Odroid-N2 board:
>>
>>    ------------[ cut here ]------------
>>    WARNING: CPU: 4 PID: 63 at drivers/opp/core.c:2554
>> dev_pm_opp_set_config+0x390/0x710
>>    Modules linked in: dw_hdmi_i2s_audio meson_gxl smsc onboard_usb_hub(+)
>> rtc_pcf8563 panfrost snd_soc_meson_axg_sound_card drm_shmem_helper
>> crct10dif_ce dwmac_generic snd_soc_meson_card_utils gpu_sched
>> snd_soc_meson_g12a_toacodec snd_soc_meson_g12a_tohdmitx rc_odroid
>> snd_soc_meson_codec_glue pwm_meson ao_cec_g12a meson_gxbb_wdt meson_ir
>> snd_soc_meson_axg_frddr snd_soc_meson_axg_toddr snd_soc_meson_axg_tdmin
>> meson_vdec(C) v4l2_mem2mem videobuf2_dma_contig snd_soc_meson_axg_tdmout
>> videobuf2_memops axg_audio videobuf2_v4l2 sclk_div videodev
>> reset_meson_audio_arb snd_soc_meson_axg_fifo clk_phase dwmac_meson8b
>> stmmac_platform videobuf2_common mdio_mux_meson_g12a meson_drm
>> meson_dw_hdmi rtc_meson_vrtc stmmac meson_ddr_pmu_g12 mc dw_hdmi
>> drm_display_helper pcs_xpcs snd_soc_meson_t9015 meson_canvas gpio_fan
>> display_connector snd_soc_meson_axg_tdm_interface
>> snd_soc_simple_amplifier snd_soc_meson_axg_tdm_formatter nvmem_meson_efuse
>>    hub 1-1:1.0: USB hub found
>>    CPU: 4 PID: 63 Comm: kworker/u12:5 Tainted: G         C 6.8.0-rc3+ #14677
>>    Hardware name: Hardkernel ODROID-N2 (DT)
>>    Workqueue: events_unbound deferred_probe_work_func
>>    pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>    pc : dev_pm_opp_set_config+0x390/0x710
>>    lr : dev_pm_opp_set_config+0x5c/0x710
>>    ...
>>    Call trace:
>>     dev_pm_opp_set_config+0x390/0x710
>>     dt_cpufreq_probe+0x268/0x480
>>     platform_probe+0x68/0xd8
>>     really_probe+0x148/0x2b4
>>     __driver_probe_device+0x78/0x12c
>>     driver_probe_device+0xdc/0x164
>>     __device_attach_driver+0xb8/0x138
>>     bus_for_each_drv+0x84/0xe0
>>     __device_attach+0xa8/0x1b0
>>     device_initial_probe+0x14/0x20
>>     bus_probe_device+0xb0/0xb4
>>     deferred_probe_work_func+0x8c/0xc8
>>     process_one_work+0x1ec/0x53c
>>     worker_thread+0x298/0x408
>>     kthread+0x124/0x128
>>     ret_from_fork+0x10/0x20
>>    irq event stamp: 317178
>>    hardirqs last  enabled at (317177): [<ffff8000801788d4>]
>> ktime_get_coarse_real_ts64+0x10c/0x110
>>    hardirqs last disabled at (317178): [<ffff800081222030>] el1_dbg+0x24/0x8c
>>    softirqs last  enabled at (315802): [<ffff800080010a60>]
>> __do_softirq+0x4a0/0x4e8
>>    softirqs last disabled at (315793): [<ffff8000800169b0>]
>> ____do_softirq+0x10/0x1c
>>    ---[ end trace 0000000000000000 ]---
>>    cpu cpu2: error -EBUSY: failed to set regulators
>>    cpufreq-dt: probe of cpufreq-dt failed with error -16
>>
>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
>> related with this change.
> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
> larger caused the issue with this specific code. Otherwise
> CPUMASK_OFFSTACK may not work that well on arm64.

I've used defconfig with some debug options enabled and some drivers 
compiled-in:

make ARCH=arm64 defconfig

./scripts/config -e BLK_DEV_RAM --set-val BLK_DEV_RAM_COUNT 4 --set-val 
BLK_DEV_RAM_SIZE 81920 --set-val CMA_SIZE_MBYTES 96 -e PROVE_LOCKING -e 
DEBUG_ATOMIC_SLEEP -e STAGING -e I2C_GPIO -e PM_DEBUG -e 
PM_ADVANCED_DEBUG -e USB_GADGET -e USB_ETH -e CONFIG_DEVFREQ_THERMAL -e 
CONFIG_BRCMFMAC_PCIE -e CONFIG_NFC -d ARCH_SUNXI -d ARCH_ALPINE -d 
DRM_NOUVEAU -d ARCH_BCM_IPROC -d ARCH_BERLIN -d ARCH_BRCMSTB -d 
ARCH_LAYERSCAPE -d ARCH_LG1K -d ARCH_HISI -d ARCH_MEDIATEK -d ARCH_MVEBU 
-d ARCH_SEATTLE -d ARCH_SYNQUACER -d ARCH_RENESAS -d ARCH_STRATIX10 -d 
ARCH_TEGRA -d ARCH_SPRD -d ARCH_THUNDER -d ARCH_THUNDER2 -d 
ARCH_UNIPHIER -d ARCH_XGENE -d ARCH_ZX -d ARCH_ZYNQMP -d HIBERNATION -d 
CLK_SUNXI -d CONFIG_EFI -d CONFIG_TEE -e FW_CFG_SYSFS

Best regards
Christoph Lameter (Ampere) March 8, 2024, 5:08 p.m. UTC | #6
On Fri, 8 Mar 2024, Marek Szyprowski wrote:

>>>
>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
>>> related with this change.
>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
>> larger caused the issue with this specific code. Otherwise
>> CPUMASK_OFFSTACK may not work that well on arm64.

cpumask handling must use the accessor functions provided in 
include/linux/cpumask.h for declaring and accessing cpumasks. It is likely 
related to the driver opencoding one of the accessors.

I.e. you must use alloc_cpumask_var() and not allocate yourself on the 
stack.
Mark Rutland March 11, 2024, 12:12 p.m. UTC | #7
On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote:
> On Fri, 8 Mar 2024, Marek Szyprowski wrote:
> 
> > > > 
> > > > It looks that cpufreq-dt and/or opp drivers needs some adjustments
> > > > related with this change.
> > > That's strange. Is this with defconfig? I wonder whether NR_CPUS being
> > > larger caused the issue with this specific code. Otherwise
> > > CPUMASK_OFFSTACK may not work that well on arm64.
> 
> cpumask handling must use the accessor functions provided in
> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely
> related to the driver opencoding one of the accessors.

I took a look at both the OPP code and the cpufreq-dt code and it looks like
those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use
the cpumask accessors, and use the cpumask_var_*() functions to dynamically
allocate/free cpumasks). Maybe I've missed something, but superficially those
look right.

Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512?
That'll have CPUMASK_OFFSTACK=n, and:

* If that blows up, we know the problem is independent of CPUMASK_OFFSTACK, and
  has something to do with large cpumasks (either a driver bug, or elsewhere).

* If that doesn't blow up, it suggests the problem is related to
  CPUMASK_OFFSTACK rather than with large cpumasks specifically.

Either way, we probably need to revert this patch for now, as this won't have
enough time to soak in linux-next in time for v6.9.

Mark.
Marek Szyprowski March 11, 2024, 2:56 p.m. UTC | #8
On 11.03.2024 13:12, Mark Rutland wrote:
> On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote:
>> On Fri, 8 Mar 2024, Marek Szyprowski wrote:
>>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
>>>>> related with this change.
>>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
>>>> larger caused the issue with this specific code. Otherwise
>>>> CPUMASK_OFFSTACK may not work that well on arm64.
>> cpumask handling must use the accessor functions provided in
>> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely
>> related to the driver opencoding one of the accessors.
> I took a look at both the OPP code and the cpufreq-dt code and it looks like
> those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use
> the cpumask accessors, and use the cpumask_var_*() functions to dynamically
> allocate/free cpumasks). Maybe I've missed something, but superficially those
> look right.
>
> Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512?

Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works 
fine, so it must be something else broken.

> That'll have CPUMASK_OFFSTACK=n, and:
>
> * If that blows up, we know the problem is independent of CPUMASK_OFFSTACK, and
>    has something to do with large cpumasks (either a driver bug, or elsewhere).
>
> * If that doesn't blow up, it suggests the problem is related to
>    CPUMASK_OFFSTACK rather than with large cpumasks specifically.
>
> Either way, we probably need to revert this patch for now, as this won't have
> enough time to soak in linux-next in time for v6.9.

Best regards
Catalin Marinas March 11, 2024, 3:22 p.m. UTC | #9
On Mon, Mar 11, 2024 at 03:56:37PM +0100, Marek Szyprowski wrote:
> On 11.03.2024 13:12, Mark Rutland wrote:
> > On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote:
> >> On Fri, 8 Mar 2024, Marek Szyprowski wrote:
> >>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
> >>>>> related with this change.
> >>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
> >>>> larger caused the issue with this specific code. Otherwise
> >>>> CPUMASK_OFFSTACK may not work that well on arm64.
> >> cpumask handling must use the accessor functions provided in
> >> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely
> >> related to the driver opencoding one of the accessors.
> > I took a look at both the OPP code and the cpufreq-dt code and it looks like
> > those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use
> > the cpumask accessors, and use the cpumask_var_*() functions to dynamically
> > allocate/free cpumasks). Maybe I've missed something, but superficially those
> > look right.
> >
> > Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512?
> 
> Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works 
> fine, so it must be something else broken.

Thanks for confirming. Would you mind testing the problematic commit
with CONFIG_DEBUG_PER_CPU_MAPS enabled? If it doesn't show anything
obvious that can be fixed quickly, I'll revert the commit and queue it
again after -rc1 for 6.10 (I haven't sent 6.9 the pull request yet).
Marek Szyprowski March 11, 2024, 4:51 p.m. UTC | #10
Hi Catalin,

On 11.03.2024 16:22, Catalin Marinas wrote:
> On Mon, Mar 11, 2024 at 03:56:37PM +0100, Marek Szyprowski wrote:
>> On 11.03.2024 13:12, Mark Rutland wrote:
>>> On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote:
>>>> On Fri, 8 Mar 2024, Marek Szyprowski wrote:
>>>>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
>>>>>>> related with this change.
>>>>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
>>>>>> larger caused the issue with this specific code. Otherwise
>>>>>> CPUMASK_OFFSTACK may not work that well on arm64.
>>>> cpumask handling must use the accessor functions provided in
>>>> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely
>>>> related to the driver opencoding one of the accessors.
>>> I took a look at both the OPP code and the cpufreq-dt code and it looks like
>>> those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use
>>> the cpumask accessors, and use the cpumask_var_*() functions to dynamically
>>> allocate/free cpumasks). Maybe I've missed something, but superficially those
>>> look right.
>>>
>>> Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512?
>> Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works
>> fine, so it must be something else broken.
> Thanks for confirming. Would you mind testing the problematic commit
> with CONFIG_DEBUG_PER_CPU_MAPS enabled? If it doesn't show anything
> obvious that can be fixed quickly, I'll revert the commit and queue it
> again after -rc1 for 6.10 (I haven't sent 6.9 the pull request yet).

I've enabled this option, but unfortunately it didn't reveal anything 
more besides the warning and error I've posted in my initial report. I 
will try to analyze this issue further, but I won't manage to do this today.

Best regards
Catalin Marinas March 11, 2024, 5:08 p.m. UTC | #11
On Mon, Mar 11, 2024 at 05:51:04PM +0100, Marek Szyprowski wrote:
> On 11.03.2024 16:22, Catalin Marinas wrote:
> > On Mon, Mar 11, 2024 at 03:56:37PM +0100, Marek Szyprowski wrote:
> >> On 11.03.2024 13:12, Mark Rutland wrote:
> >>> On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote:
> >>>> On Fri, 8 Mar 2024, Marek Szyprowski wrote:
> >>>>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
> >>>>>>> related with this change.
> >>>>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
> >>>>>> larger caused the issue with this specific code. Otherwise
> >>>>>> CPUMASK_OFFSTACK may not work that well on arm64.
> >>>> cpumask handling must use the accessor functions provided in
> >>>> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely
> >>>> related to the driver opencoding one of the accessors.
> >>> I took a look at both the OPP code and the cpufreq-dt code and it looks like
> >>> those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use
> >>> the cpumask accessors, and use the cpumask_var_*() functions to dynamically
> >>> allocate/free cpumasks). Maybe I've missed something, but superficially those
> >>> look right.
> >>>
> >>> Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512?
> >> Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works
> >> fine, so it must be something else broken.
> > Thanks for confirming. Would you mind testing the problematic commit
> > with CONFIG_DEBUG_PER_CPU_MAPS enabled? If it doesn't show anything
> > obvious that can be fixed quickly, I'll revert the commit and queue it
> > again after -rc1 for 6.10 (I haven't sent 6.9 the pull request yet).
> 
> I've enabled this option, but unfortunately it didn't reveal anything 
> more besides the warning and error I've posted in my initial report. I 
> will try to analyze this issue further, but I won't manage to do this today.

No worries, thanks for giving this a try.
Catalin Marinas March 11, 2024, 6:55 p.m. UTC | #12
On Fri, Mar 08, 2024 at 03:01:28PM +0100, Marek Szyprowski wrote:
> On 07.03.2024 02:45, Christoph Lameter (Ampere) wrote:
> > Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
> > Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
> > these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
> > Therefore this patch increases the default NR_CPUS from 256 to 512.
> >
> > As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
> > this might have a significant impact on stack usage due to code which places
> > cpumasks on the stack. To mitigate that concern, we can select
> > CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
> > NR_CPUS=256, we only select this when NR_CPUS > 256.
> >
> > CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
> > dynamically allocated. This was used in the X86 architecture in the
> > past to enable support for larger CPU configurations up to 8k cpus.
[...]
> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64: 
> Dynamically allocate cpumasks and increase supported CPUs to 512"). 
> Unfortunately it triggers the following warning during boot on most of 
> my ARM64-based test boards. Here is an example from Odroid-N2 board:

I spent a big part of this afternoon going through the code paths but
there's nothing obvious that triggered this problem. My suspicion is
some memory corruption, algorithmically I can't see anything that could
go wrong with CPUMASK_OFFSTACK. Unfortunately I could not reproduce it
yet to be able to add some debug info.

So I decided to revert this patch. If we get to the bottom of it during
the merging window, I can still revive it. Otherwise we'll add it to
linux-next post -rc1.

Thanks for reporting it and subsequent debugging.
Christoph Lameter (Ampere) March 11, 2024, 9:07 p.m. UTC | #13
On Mon, 11 Mar 2024, Catalin Marinas wrote:

>> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64:
>> Dynamically allocate cpumasks and increase supported CPUs to 512").
>> Unfortunately it triggers the following warning during boot on most of
>> my ARM64-based test boards. Here is an example from Odroid-N2 board:
>
> I spent a big part of this afternoon going through the code paths but
> there's nothing obvious that triggered this problem. My suspicion is
> some memory corruption, algorithmically I can't see anything that could
> go wrong with CPUMASK_OFFSTACK. Unfortunately I could not reproduce it
> yet to be able to add some debug info.
>
> So I decided to revert this patch. If we get to the bottom of it during
> the merging window, I can still revive it. Otherwise we'll add it to
> linux-next post -rc1.

I also looked through the opp source and I cannot find even anything that
even uses the functionality changed by the OFFSTACK option.

This could be an issue in the ARM64 arch code itself where there maybe 
an assumption elsewhere that a cpumask can always store up to NR_CPU cpus 
and not only nr_cpu_ids as OFFSTACK does.

How can I exercise the opp driver in order to recreate the problem?

I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if 
there is an issue with OFFSTACK in opp then it should fail with kernel 
default configuration on that platform.
Christoph Lameter (Ampere) March 12, 2024, 5:06 p.m. UTC | #14
On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:

> This could be an issue in the ARM64 arch code itself where there maybe an 
> assumption elsewhere that a cpumask can always store up to NR_CPU cpus and 
> not only nr_cpu_ids as OFFSTACK does.
>
> How can I exercise the opp driver in order to recreate the problem?
>
> I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if there 
> is an issue with OFFSTACK in opp then it should fail with kernel default 
> configuration on that platform.

I checked the ARM64 arch sources use of NR_CPUS and its all fine.

Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.

No warnings in the kernel log during those tests.

How to reproduce this?
Catalin Marinas March 12, 2024, 5:55 p.m. UTC | #15
On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote:
> On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:
> 
> > This could be an issue in the ARM64 arch code itself where there maybe
> > an assumption elsewhere that a cpumask can always store up to NR_CPU
> > cpus and not only nr_cpu_ids as OFFSTACK does.
> > 
> > How can I exercise the opp driver in order to recreate the problem?
> > 
> > I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if
> > there is an issue with OFFSTACK in opp then it should fail with kernel
> > default configuration on that platform.
> 
> I checked the ARM64 arch sources use of NR_CPUS and its all fine.
> 
> Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.
> 
> No warnings in the kernel log during those tests.
> 
> How to reproduce this?

I guess you need a platform with a dts that has an "operating-points-v2"
property. I don't have any around.

Sudeep was trying to trigger this code path earlier, not sure where he
got to.
Sudeep Holla March 13, 2024, 2:35 p.m. UTC | #16
On Tue, Mar 12, 2024 at 05:55:49PM +0000, Catalin Marinas wrote:
> On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote:
> > On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:
> >
> > > This could be an issue in the ARM64 arch code itself where there maybe
> > > an assumption elsewhere that a cpumask can always store up to NR_CPU
> > > cpus and not only nr_cpu_ids as OFFSTACK does.
> > >
> > > How can I exercise the opp driver in order to recreate the problem?
> > >
> > > I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if
> > > there is an issue with OFFSTACK in opp then it should fail with kernel
> > > default configuration on that platform.
> >
> > I checked the ARM64 arch sources use of NR_CPUS and its all fine.
> >
> > Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.
> >
> > No warnings in the kernel log during those tests.
> >
> > How to reproduce this?
>
> I guess you need a platform with a dts that has an "operating-points-v2"
> property. I don't have any around.
>
> Sudeep was trying to trigger this code path earlier, not sure where he
> got to.

I did try to trigger this on FVP by adding OPPs + some hacks to add dummy
clock provider to successfully probe this driver. I couldn't hit the issue
reported 
Marek Szyprowski March 13, 2024, 4:22 p.m. UTC | #17
On 13.03.2024 15:35, Sudeep Holla wrote:
> On Tue, Mar 12, 2024 at 05:55:49PM +0000, Catalin Marinas wrote:
>> On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote:
>>> On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:
>>>
>>>> This could be an issue in the ARM64 arch code itself where there maybe
>>>> an assumption elsewhere that a cpumask can always store up to NR_CPU
>>>> cpus and not only nr_cpu_ids as OFFSTACK does.
>>>>
>>>> How can I exercise the opp driver in order to recreate the problem?
>>>>
>>>> I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if
>>>> there is an issue with OFFSTACK in opp then it should fail with kernel
>>>> default configuration on that platform.
>>> I checked the ARM64 arch sources use of NR_CPUS and its all fine.
>>>
>>> Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.
>>>
>>> No warnings in the kernel log during those tests.
>>>
>>> How to reproduce this?
>> I guess you need a platform with a dts that has an "operating-points-v2"
>> property. I don't have any around.
>>
>> Sudeep was trying to trigger this code path earlier, not sure where he
>> got to.
> I did try to trigger this on FVP by adding OPPs + some hacks to add dummy
> clock provider to successfully probe this driver. I couldn't hit the issue
> reported 
Christoph Lameter (Ampere) March 13, 2024, 4:39 p.m. UTC | #18
On Wed, 13 Mar 2024, Marek Szyprowski wrote:

>> I did try to trigger this on FVP by adding OPPs + some hacks to add dummy
>> clock provider to successfully probe this driver. I couldn't hit the issue
>> reported 
Russell King (Oracle) March 13, 2024, 5:13 p.m. UTC | #19
On Wed, Mar 13, 2024 at 05:22:33PM +0100, Marek Szyprowski wrote:
> On 13.03.2024 15:35, Sudeep Holla wrote:
> > On Tue, Mar 12, 2024 at 05:55:49PM +0000, Catalin Marinas wrote:
> >> On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote:
> >>> On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:
> >>>
> >>>> This could be an issue in the ARM64 arch code itself where there maybe
> >>>> an assumption elsewhere that a cpumask can always store up to NR_CPU
> >>>> cpus and not only nr_cpu_ids as OFFSTACK does.
> >>>>
> >>>> How can I exercise the opp driver in order to recreate the problem?
> >>>>
> >>>> I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if
> >>>> there is an issue with OFFSTACK in opp then it should fail with kernel
> >>>> default configuration on that platform.
> >>> I checked the ARM64 arch sources use of NR_CPUS and its all fine.
> >>>
> >>> Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.
> >>>
> >>> No warnings in the kernel log during those tests.
> >>>
> >>> How to reproduce this?
> >> I guess you need a platform with a dts that has an "operating-points-v2"
> >> property. I don't have any around.
> >>
> >> Sudeep was trying to trigger this code path earlier, not sure where he
> >> got to.
> > I did try to trigger this on FVP by adding OPPs + some hacks to add dummy
> > clock provider to successfully probe this driver. I couldn't hit the issue
> > reported 
Marek Szyprowski March 13, 2024, 8:18 p.m. UTC | #20
On 13.03.2024 17:39, Christoph Lameter (Ampere) wrote:
> On Wed, 13 Mar 2024, Marek Szyprowski wrote:
>
>>> I did try to trigger this on FVP by adding OPPs + some hacks to add 
>>> dummy
>>> clock provider to successfully probe this driver. I couldn't hit the 
>>> issue
>>> reported 
Catalin Marinas March 14, 2024, 8:39 a.m. UTC | #21
On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote:
> So, I wonder whether what you're seeing is a latent bug which is
> being tickled by the presence of the CPU masks being off-stack
> changing the kernel timing.
> 
> I would suggest the printk debug approach may help here to see when
> the OPPs are begun to be parsed, when they're created etc and their
> timing relationship to being used. Given the suspicion, it's possible
> that the mere addition of printk() may "fix" the problem, which again
> would be another semi-useful data point.

It might be an init order problem. Passing "initcall_debug" on the
cmdline might help a bit.

It would also be useful in dev_pm_opp_set_config(), in the WARN_ON
block, to print opp_table->opp_list.next to get an idea whether it looks
like a valid pointer or memory corruption.
Marek Szyprowski March 14, 2024, 12:28 p.m. UTC | #22
Dear All,

On 14.03.2024 09:39, Catalin Marinas wrote:
> On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote:
>> So, I wonder whether what you're seeing is a latent bug which is
>> being tickled by the presence of the CPU masks being off-stack
>> changing the kernel timing.
>>
>> I would suggest the printk debug approach may help here to see when
>> the OPPs are begun to be parsed, when they're created etc and their
>> timing relationship to being used. Given the suspicion, it's possible
>> that the mere addition of printk() may "fix" the problem, which again
>> would be another semi-useful data point.
> It might be an init order problem. Passing "initcall_debug" on the
> cmdline might help a bit.
>
> It would also be useful in dev_pm_opp_set_config(), in the WARN_ON
> block, to print opp_table->opp_list.next to get an idea whether it looks
> like a valid pointer or memory corruption.

I've finally found some time to do the step-by-step printk-based 
debugging of this issue and finally found what's broken!

Here is the fix:

diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index 8bd6e5e8f121..2d83bbc65dd0 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev, 
int cpu)
         if (!priv)
                 return -ENOMEM;

-       if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL))
+       if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL))
                 return -ENOMEM;

         cpumask_set_cpu(cpu, priv->cpus);


It is really surprising that this didn't blow up for anyone else so 
far... This means that the $subject patch is fine.

I will send a proper patch fixing this issue in a few minutes.

Best regards
Russell King (Oracle) March 14, 2024, 1:17 p.m. UTC | #23
On Thu, Mar 14, 2024 at 01:28:40PM +0100, Marek Szyprowski wrote:
> Dear All,
> 
> On 14.03.2024 09:39, Catalin Marinas wrote:
> > On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote:
> >> So, I wonder whether what you're seeing is a latent bug which is
> >> being tickled by the presence of the CPU masks being off-stack
> >> changing the kernel timing.
> >>
> >> I would suggest the printk debug approach may help here to see when
> >> the OPPs are begun to be parsed, when they're created etc and their
> >> timing relationship to being used. Given the suspicion, it's possible
> >> that the mere addition of printk() may "fix" the problem, which again
> >> would be another semi-useful data point.
> > It might be an init order problem. Passing "initcall_debug" on the
> > cmdline might help a bit.
> >
> > It would also be useful in dev_pm_opp_set_config(), in the WARN_ON
> > block, to print opp_table->opp_list.next to get an idea whether it looks
> > like a valid pointer or memory corruption.
> 
> I've finally found some time to do the step-by-step printk-based 
> debugging of this issue and finally found what's broken!
> 
> Here is the fix:
> 
> diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> index 8bd6e5e8f121..2d83bbc65dd0 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev, 
> int cpu)
>          if (!priv)
>                  return -ENOMEM;
> 
> -       if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL))
> +       if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL))
>                  return -ENOMEM;
> 
>          cpumask_set_cpu(cpu, priv->cpus);
> 
> 
> It is really surprising that this didn't blow up for anyone else so 
> far... This means that the $subject patch is fine.

Wow. I guess we've been lucky with that allocation hitting memory
containing zeros. Well done at tracking it down!
Catalin Marinas March 14, 2024, 1:57 p.m. UTC | #24
On Thu, Mar 14, 2024 at 01:28:40PM +0100, Marek Szyprowski wrote:
> On 14.03.2024 09:39, Catalin Marinas wrote:
> > On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote:
> >> So, I wonder whether what you're seeing is a latent bug which is
> >> being tickled by the presence of the CPU masks being off-stack
> >> changing the kernel timing.
> >>
> >> I would suggest the printk debug approach may help here to see when
> >> the OPPs are begun to be parsed, when they're created etc and their
> >> timing relationship to being used. Given the suspicion, it's possible
> >> that the mere addition of printk() may "fix" the problem, which again
> >> would be another semi-useful data point.
> > It might be an init order problem. Passing "initcall_debug" on the
> > cmdline might help a bit.
> >
> > It would also be useful in dev_pm_opp_set_config(), in the WARN_ON
> > block, to print opp_table->opp_list.next to get an idea whether it looks
> > like a valid pointer or memory corruption.
> 
> I've finally found some time to do the step-by-step printk-based 
> debugging of this issue and finally found what's broken!
> 
> Here is the fix:
> 
> diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> index 8bd6e5e8f121..2d83bbc65dd0 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev, 
> int cpu)
>          if (!priv)
>                  return -ENOMEM;
> 
> -       if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL))
> +       if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL))
>                  return -ENOMEM;
> 
>          cpumask_set_cpu(cpu, priv->cpus);
> 
> 
> It is really surprising that this didn't blow up for anyone else so 
> far... This means that the $subject patch is fine.
> 
> I will send a proper patch fixing this issue in a few minutes.

Nice. Many thanks for tracking this down. I'll revert the revert of the
CPUMASK_OFFSTACK in the second part of the merging window (I already
sent the pull request).
Christoph Lameter (Ampere) March 14, 2024, 5:01 p.m. UTC | #25
On Thu, 14 Mar 2024, Russell King (Oracle) wrote:

>> It is really surprising that this didn't blow up for anyone else so
>> far... This means that the $subject patch is fine.
>
> Wow. I guess we've been lucky with that allocation hitting memory
> containing zeros. Well done at tracking it down!

It would have blown up with slub_debug because that includes poisoning the 
contents of all allocations via the slab allocator. Why did that not 
occur? We should have seen a backtrace with data in registers etc showing 
poisoning values for an unitialized object.

Note that this was indeed triggered by OFFSTACK because 
(z)alloc_cpumask_var() only generates a kmalloc allocation if that option 
is set.

The config option was never set before my patch was applied on ARM64.
Catalin Marinas March 18, 2024, 6:17 p.m. UTC | #26
On Thu, Mar 07, 2024 at 07:07:07PM +0000, Catalin Marinas wrote:
> On Wed, 06 Mar 2024 17:45:04 -0800, Christoph Lameter (Ampere) wrote:
> > Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
> > Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
> > these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
> > Therefore this patch increases the default NR_CPUS from 256 to 512.
> > 
> > As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
> > this might have a significant impact on stack usage due to code which places
> > cpumasks on the stack. To mitigate that concern, we can select
> > CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
> > NR_CPUS=256, we only select this when NR_CPUS > 256.
> > 
> > [...]
> 
> Applied to arm64 (for-next/misc), thanks!
> 
> I dropped the config entry and comment, replaced it with a select as per
> Mark's suggestion.
> 
> [1/1] ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
>       https://git.kernel.org/arm64/c/0499a78369ad

I re-instated this patch in arm64 for-next/core as:

https://git.kernel.org/arm64/c/3fbd56f0e7c1
diff mbox series

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index aa7c1d435139..4e767dede47d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1427,7 +1427,21 @@  config SCHED_SMT
  config NR_CPUS
  	int "Maximum number of CPUs (2-4096)"
  	range 2 4096
-	default "256"
+	default "512"
+
+#
+# Determines the placement of cpumasks.
+#
+# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated.
+# Useful for machines with lots of core because it avoids increasing
+# the size of many of the data structures in the kernel.
+#
+# If this is off then the cpumasks have a static sizes and are
+# embedded within data structures.
+#
+	config CPUMASK_OFFSTACK
+	def_bool y
+	depends on NR_CPUS > 256

  config HOTPLUG_CPU
  	bool "Support for hot-pluggable CPUs"