[2/2] arch_topology: Sanity check cpumask in thermal pressure update

Message ID 20220118185612.2067031-2-bjorn.andersson@linaro.org (mailing list archive)
State New, archived
Delegated to: viresh kumar
Series [1/2] cpufreq: qcom-hw: Use initialized cpumask for thermal pressure update

Commit Message

Bjorn Andersson Jan. 18, 2022, 6:56 p.m. UTC
Occasionally during boot the Qualcomm cpufreq driver was able to cause
an invalid memory access in topology_update_thermal_pressure() on the
line:

	if (max_freq <= capped_freq)

It turns out that this was caused by a race, which resulted in the
cpumask passed to the function being empty, in which case
cpumask_first() will return a cpu beyond the number of valid cpus, which
when used to access the per_cpu max_freq would return an invalid pointer.

The bug in the Qualcomm cpufreq driver is being fixed, but having a
sanity check of the arguments would have saved quite a bit of time and
it's not unlikely that others will run into the same issue.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---
 drivers/base/arch_topology.c | 3 +++
 1 file changed, 3 insertions(+)
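
To illustrate the failure mode described in the commit message, here is a minimal userspace sketch, not kernel code. It assumes an 8-bit mask in place of struct cpumask and a plain array in place of the per-CPU freq_factor; every sim_* name and SIM_NR_CPUS are hypothetical. It models the one documented property that matters here: the "first set bit" lookup on an empty mask returns a value that is not a valid CPU index, so using it as an index without a range check reads out of bounds.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NR_CPUS 8u	/* hypothetical CPU count for this model */

/* Hypothetical stand-in for per-CPU data such as freq_factor. */
static const uint32_t sim_freq_factor[SIM_NR_CPUS] = {
	1800, 1800, 1800, 1800, 2400, 2400, 2400, 2400
};

/*
 * Model of a "first set bit" lookup: returns the index of the lowest
 * set bit, or SIM_NR_CPUS (one past the last valid index) when the
 * mask is empty.
 */
unsigned int sim_cpumask_first(uint8_t mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return SIM_NR_CPUS;
}

/*
 * Looks up the frequency factor of the first CPU in the mask.
 * Returns 0 and stores the value in *freq on success, or -1 when the
 * mask is empty and no valid index exists.
 */
int sim_first_freq(uint8_t mask, uint32_t *freq)
{
	unsigned int cpu = sim_cpumask_first(mask);

	/* Without this check, sim_freq_factor[cpu] would be read out of bounds. */
	if (cpu >= SIM_NR_CPUS)
		return -1;

	assert(cpu < SIM_NR_CPUS);
	*freq = sim_freq_factor[cpu];
	return 0;
}
```

In this model, sim_first_freq(0x00, &f) reports an error instead of indexing past the array, while sim_first_freq(0x30, &f) succeeds with CPU 4.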

Comments

Greg Kroah-Hartman Jan. 19, 2022, 10:25 a.m. UTC | #1
On Tue, Jan 18, 2022 at 10:56:12AM -0800, Bjorn Andersson wrote:
> Occasionally during boot the Qualcomm cpufreq driver was able to cause
> an invalid memory access in topology_update_thermal_pressure() on the
> line:
> 
> 	if (max_freq <= capped_freq)
> 
> It turns out that this was caused by a race, which resulted in the
> cpumask passed to the function being empty, in which case
> cpumask_first() will return a cpu beyond the number of valid cpus, which
> when used to access the per_cpu max_freq would return an invalid pointer.
> 
> The bug in the Qualcomm cpufreq driver is being fixed, but having a
> sanity check of the arguments would have saved quite a bit of time and
> it's not unlikely that others will run into the same issue.
> 
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
>  drivers/base/arch_topology.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 976154140f0b..6560a0c3b969 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -177,6 +177,9 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
>  	u32 max_freq;
>  	int cpu;
>  
> +	if (WARN_ON(cpumask_empty(cpus)))
> +		return;

Sorry, but I do not want to add any more WARN_ON() calls to the kernel
unless really needed.  We don't try to save the kernel from itself all
the time by validating the parameters of every internal API call.

thanks,

greg k-h
Sudeep Holla Jan. 19, 2022, 2:43 p.m. UTC | #2
On Tue, Jan 18, 2022 at 10:56:12AM -0800, Bjorn Andersson wrote:
> Occasionally during boot the Qualcomm cpufreq driver was able to cause
> an invalid memory access in topology_update_thermal_pressure() on the
> line:
> 
> 	if (max_freq <= capped_freq)
> 
> It turns out that this was caused by a race, which resulted in the
> cpumask passed to the function being empty, in which case
> cpumask_first() will return a cpu beyond the number of valid cpus, which
> when used to access the per_cpu max_freq would return an invalid pointer.
> 
> The bug in the Qualcomm cpufreq driver is being fixed, but having a
> sanity check of the arguments would have saved quite a bit of time and
> it's not unlikely that others will run into the same issue.
> 
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
>  drivers/base/arch_topology.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 976154140f0b..6560a0c3b969 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -177,6 +177,9 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
>  	u32 max_freq;
>  	int cpu;
>  
> +	if (WARN_ON(cpumask_empty(cpus)))
> +		return;
> +

Why can't the caller check and call this only when cpus is not empty ?
IIUC there are many such APIs that use cpumask and could result in similar
issues if called with empty cpus. Probably we could add a note that cpus
must not be empty if that helps the callers ?
Bjorn Andersson Jan. 19, 2022, 3:21 p.m. UTC | #3
On Wed 19 Jan 06:43 PST 2022, Sudeep Holla wrote:

> On Tue, Jan 18, 2022 at 10:56:12AM -0800, Bjorn Andersson wrote:
> > Occasionally during boot the Qualcomm cpufreq driver was able to cause
> > an invalid memory access in topology_update_thermal_pressure() on the
> > line:
> > 
> > 	if (max_freq <= capped_freq)
> > 
> > It turns out that this was caused by a race, which resulted in the
> > cpumask passed to the function being empty, in which case
> > cpumask_first() will return a cpu beyond the number of valid cpus, which
> > when used to access the per_cpu max_freq would return an invalid pointer.
> > 
> > The bug in the Qualcomm cpufreq driver is being fixed, but having a
> > sanity check of the arguments would have saved quite a bit of time and
> > it's not unlikely that others will run into the same issue.
> > 
> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> > ---
> >  drivers/base/arch_topology.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> > index 976154140f0b..6560a0c3b969 100644
> > --- a/drivers/base/arch_topology.c
> > +++ b/drivers/base/arch_topology.c
> > @@ -177,6 +177,9 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
> >  	u32 max_freq;
> >  	int cpu;
> >  
> > +	if (WARN_ON(cpumask_empty(cpus)))
> > +		return;
> > +
> 
> Why can't the caller check and call this only when cpus is not empty ?
> IIUC there are many such APIs that use cpumask and could result in similar
> issues if called with empty cpus. Probably we could add a note that cpus
> must not be empty if that helps the callers ?
> 

As indicated in the commit message, it took me a while to conclude that
the cause for a memory fault on what seemed to be a comparison between
two variables on the stack was actually caused by this race - which
isn't trivially reproducible, unless you know what the bug is.

Now _I_ know better and will hopefully recognize the oops signature
right away, but my hope was to put the sanity check on this side to save
the next caller of this API some time. Updating the comment probably
would have saved me a minute or two at the end, probably as confirmation
of my findings after the fact...

If you prefer to keep topology_update_thermal_pressure() clean(er) and
exciting I can hack around the issue in the Qualcomm driver.

PS. I'm onboard with Greg's objection to the WARN_ON()...

Regards,
Bjorn
Patch

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 976154140f0b..6560a0c3b969 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -177,6 +177,9 @@  void topology_update_thermal_pressure(const struct cpumask *cpus,
 	u32 max_freq;
 	int cpu;
 
+	if (WARN_ON(cpumask_empty(cpus)))
+		return;
+
 	cpu = cpumask_first(cpus);
 	max_capacity = arch_scale_cpu_capacity(cpu);
 	max_freq = per_cpu(freq_factor, cpu);
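
The alternative raised in the review, having the caller check the mask before calling, could look roughly like the sketch below. This is a userspace model under stated assumptions, not the Qualcomm driver's code: an 8-bit mask stands in for struct cpumask, and struct sim_state, sim_update_thermal_pressure() and sim_driver_notify() are hypothetical names. The assert() only documents the "cpus must not be empty" precondition inside the model; it is not a proposal to add a check to the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping so the effect of a call can be observed. */
struct sim_state {
	unsigned int calls;	/* number of updates performed */
	uint8_t last_mask;	/* mask passed to the last update */
};

/*
 * Stand-in for the callee. Its contract in this model: cpus must not
 * be empty. The assert documents that contract for the sketch only.
 */
void sim_update_thermal_pressure(struct sim_state *st, uint8_t cpus)
{
	assert(cpus != 0);
	st->calls++;
	st->last_mask = cpus;
}

/*
 * Caller-side guard: skip the update when the mask has not been
 * populated yet (for example, because initialization is still in
 * progress). Returns true if the update was performed.
 */
bool sim_driver_notify(struct sim_state *st, uint8_t related_cpus)
{
	if (related_cpus == 0)
		return false;

	sim_update_thermal_pressure(st, related_cpus);
	return true;
}
```

With this shape the callee stays free of argument validation, and the caller that owns the mask's lifetime decides what an empty mask means.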