diff mbox series

[V2,2/3] cpufreq: Panic if policy is active in cpufreq_policy_free()

Message ID 426bf6edc80b2e944d459fa7b8dffbe8b73bb3d9.1653623526.git.viresh.kumar@linaro.org (mailing list archive)
State Changes Requested, archived
Headers show
Series None | expand

Commit Message

Viresh Kumar May 27, 2022, 3:53 a.m. UTC
With the new design in place, to avoid potential races show() and
store() callbacks check if the policy is active or not before proceeding
any further. And in order to guarantee that cpufreq_policy_free() must
be called after clearing the policy->cpus mask, i.e. by marking it
inactive.

Lets make sure we don't get a bug around this later and catch this early
by putting a BUG_ON() within cpufreq_policy_free().

Also update cpufreq_online() a bit to make sure we clear the cpus mask
for each error case before calling cpufreq_policy_free().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
V2: Update cpufreq_online() and changelog.

 drivers/cpufreq/cpufreq.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Rafael J. Wysocki June 14, 2022, 1:59 p.m. UTC | #1
On Fri, May 27, 2022 at 5:53 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> With the new design in place, to avoid potential races show() and
> store() callbacks check if the policy is active or not before proceeding
> any further. And in order to guarantee that cpufreq_policy_free() must
> be called after clearing the policy->cpus mask, i.e. by marking it
> inactive.
>
> Lets make sure we don't get a bug around this later and catch this early
> by putting a BUG_ON() within cpufreq_policy_free().
>
> Also update cpufreq_online() a bit to make sure we clear the cpus mask
> for each error case before calling cpufreq_policy_free().
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> V2: Update cpufreq_online() and changelog.
>
>  drivers/cpufreq/cpufreq.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index e24aa5d4bca5..0f8245731783 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1284,6 +1284,12 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
>         unsigned long flags;
>         int cpu;
>
> +       /*
> +        * The callers must ensure the policy is inactive by now, to avoid any
> +        * races with show()/store() callbacks.
> +        */
> +       BUG_ON(!policy_is_inactive(policy));

I'm not a super-big fan of this change.

First off, crashing the kernel outright here because of possible races
appears a bit excessive to me.

Second, it looks like we are worrying about the code running before
the wait_for_completion() call in cpufreq_policy_put_kobj(), because
after that call no one can be running show() or store().  So why don't
we reorder the wait_for_completion() call with respect to the code in
question instead?

> +
>         /* Remove policy from list */
>         write_lock_irqsave(&cpufreq_driver_lock, flags);
>         list_del(&policy->policy_list);
> @@ -1538,8 +1544,6 @@ static int cpufreq_online(unsigned int cpu)
>         for_each_cpu(j, policy->real_cpus)
>                 remove_cpu_dev_symlink(policy, j, get_cpu_device(j));
>
> -       cpumask_clear(policy->cpus);
> -
>  out_offline_policy:
>         if (cpufreq_driver->offline)
>                 cpufreq_driver->offline(policy);
> @@ -1549,6 +1553,7 @@ static int cpufreq_online(unsigned int cpu)
>                 cpufreq_driver->exit(policy);
>
>  out_free_policy:
> +       cpumask_clear(policy->cpus);
>         up_write(&policy->rwsem);
>
>         cpufreq_policy_free(policy);
> --
> 2.31.1.272.g89b43f80a514
>
Viresh Kumar June 15, 2022, 4:59 a.m. UTC | #2
On 14-06-22, 15:59, Rafael J. Wysocki wrote:
> On Fri, May 27, 2022 at 5:53 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > With the new design in place, to avoid potential races show() and
> > store() callbacks check if the policy is active or not before proceeding
> > any further. And in order to guarantee that cpufreq_policy_free() must
> > be called after clearing the policy->cpus mask, i.e. by marking it
> > inactive.
> >
> > Lets make sure we don't get a bug around this later and catch this early
> > by putting a BUG_ON() within cpufreq_policy_free().
> >
> > Also update cpufreq_online() a bit to make sure we clear the cpus mask
> > for each error case before calling cpufreq_policy_free().
> >
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> > V2: Update cpufreq_online() and changelog.
> >
> >  drivers/cpufreq/cpufreq.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index e24aa5d4bca5..0f8245731783 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -1284,6 +1284,12 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
> >         unsigned long flags;
> >         int cpu;
> >
> > +       /*
> > +        * The callers must ensure the policy is inactive by now, to avoid any
> > +        * races with show()/store() callbacks.
> > +        */
> > +       BUG_ON(!policy_is_inactive(policy));
> 
> I'm not a super-big fan of this change.
> 
> First off, crashing the kernel outright here because of possible races
> appears a bit excessive to me.
> 
> Second, it looks like we are worrying about the code running before
> the wait_for_completion() call in cpufreq_policy_put_kobj(), because
> after that call no one can be running show() or store().  So why don't
> we reorder the wait_for_completion() call with respect to the code in
> question instead?

No, I am not worrying about that race. I am just trying to make sure some change
in future doesn't break this assumption (that policy should be inactive by this
point). That's all. It all looks good for now.

May be a WARN instead of BUG if we don't want to crash.
Rafael J. Wysocki July 6, 2022, 1:49 p.m. UTC | #3
On Wed, Jun 15, 2022 at 7:00 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 14-06-22, 15:59, Rafael J. Wysocki wrote:
> > On Fri, May 27, 2022 at 5:53 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > With the new design in place, to avoid potential races show() and
> > > store() callbacks check if the policy is active or not before proceeding
> > > any further. And in order to guarantee that cpufreq_policy_free() must
> > > be called after clearing the policy->cpus mask, i.e. by marking it
> > > inactive.
> > >
> > > Lets make sure we don't get a bug around this later and catch this early
> > > by putting a BUG_ON() within cpufreq_policy_free().
> > >
> > > Also update cpufreq_online() a bit to make sure we clear the cpus mask
> > > for each error case before calling cpufreq_policy_free().
> > >
> > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > > ---
> > > V2: Update cpufreq_online() and changelog.
> > >
> > >  drivers/cpufreq/cpufreq.c | 9 +++++++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > index e24aa5d4bca5..0f8245731783 100644
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -1284,6 +1284,12 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
> > >         unsigned long flags;
> > >         int cpu;
> > >
> > > +       /*
> > > +        * The callers must ensure the policy is inactive by now, to avoid any
> > > +        * races with show()/store() callbacks.
> > > +        */
> > > +       BUG_ON(!policy_is_inactive(policy));
> >
> > I'm not a super-big fan of this change.
> >
> > First off, crashing the kernel outright here because of possible races
> > appears a bit excessive to me.
> >
> > Second, it looks like we are worrying about the code running before
> > the wait_for_completion() call in cpufreq_policy_put_kobj(), because
> > after that call no one can be running show() or store().  So why don't
> > we reorder the wait_for_completion() call with respect to the code in
> > question instead?
>
> No, I am not worrying about that race. I am just trying to make sure some change
> in future doesn't break this assumption (that policy should be inactive by this
> point). That's all. It all looks good for now.
>
> May be a WARN instead of BUG if we don't want to crash.

WARN_ON() would be somewhat better, but then I'm not sure if having a
full call trace in this case is really useful, because we know when
cpufreq_policy_free() can be called anyway.

Maybe just print a warning message.
Viresh Kumar July 6, 2022, 3:23 p.m. UTC | #4
On 06-07-22, 15:49, Rafael J. Wysocki wrote:
> WARN_ON() would be somewhat better, but then I'm not sure if having a
> full call trace in this case is really useful, because we know when
> cpufreq_policy_free() can be called anyway.
> 
> Maybe just print a warning message.

The warning will get printed, yes, but I am sure everyone will end up
ignoring it, once it happens.

One of the benefits of printing the call-stack is people will take it
seriously and report it, and we won't miss a bug, if one gets in
somehow.
Rafael J. Wysocki July 6, 2022, 3:34 p.m. UTC | #5
On Wed, Jul 6, 2022 at 5:23 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 06-07-22, 15:49, Rafael J. Wysocki wrote:
> > WARN_ON() would be somewhat better, but then I'm not sure if having a
> > full call trace in this case is really useful, because we know when
> > cpufreq_policy_free() can be called anyway.
> >
> > Maybe just print a warning message.
>
> The warning will get printed, yes, but I am sure everyone will end up
> ignoring it, once it happens.
>
> One of the benefits of printing the call-stack is people will take it
> seriously and report it, and we won't miss a bug, if one gets in
> somehow.

I'd rather not go into discussing things that people may or may not do and why.

My point is that if WARN_ON() gets converted to panic(), they will not
see the message at all and if the message gets printed, they will have
a chance to see it even in that case.  Whether or not they use that
chance as desirable is beyond the scope of engineering IMV.
diff mbox series

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index e24aa5d4bca5..0f8245731783 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1284,6 +1284,12 @@  static void cpufreq_policy_free(struct cpufreq_policy *policy)
 	unsigned long flags;
 	int cpu;
 
+	/*
+	 * The callers must ensure the policy is inactive by now, to avoid any
+	 * races with show()/store() callbacks.
+	 */
+	BUG_ON(!policy_is_inactive(policy));
+
 	/* Remove policy from list */
 	write_lock_irqsave(&cpufreq_driver_lock, flags);
 	list_del(&policy->policy_list);
@@ -1538,8 +1544,6 @@  static int cpufreq_online(unsigned int cpu)
 	for_each_cpu(j, policy->real_cpus)
 		remove_cpu_dev_symlink(policy, j, get_cpu_device(j));
 
-	cpumask_clear(policy->cpus);
-
 out_offline_policy:
 	if (cpufreq_driver->offline)
 		cpufreq_driver->offline(policy);
@@ -1549,6 +1553,7 @@  static int cpufreq_online(unsigned int cpu)
 		cpufreq_driver->exit(policy);
 
 out_free_policy:
+	cpumask_clear(policy->cpus);
 	up_write(&policy->rwsem);
 
 	cpufreq_policy_free(policy);