opp: Reinitialize the list_kref before adding the static OPPs again
diff mbox series

Message ID 2700308706c0d46ca06eeb973079a1f18bf553dd.1571390916.git.viresh.kumar@linaro.org
State New
Delegated to: viresh kumar
Headers show
Series
  • opp: Reinitialize the list_kref before adding the static OPPs again
Related show

Commit Message

Viresh Kumar Oct. 18, 2019, 9:28 a.m. UTC
The list_kref reaches a count of 0 when all the static OPPs are removed,
for example when dev_pm_opp_of_cpumask_remove_table() is called, though
the actual OPP table may not get freed as it may still be referenced by
other parts of the kernel, like from a call to
dev_pm_opp_set_supported_hw(). And if we call
dev_pm_opp_of_cpumask_add_table() again at this point, we must
reinitialize the list_kref otherwise the kernel will hit a WARN() in
kref infrastructure for incrementing a kref with value 0.

Fixes: 11e1a1648298 ("opp: Don't decrement uninitialized list_kref")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/opp/of.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Dmitry Osipenko Oct. 18, 2019, 3:35 p.m. UTC | #1
18.10.2019 12:28, Viresh Kumar пишет:
> The list_kref reaches a count of 0 when all the static OPPs are removed,
> for example when dev_pm_opp_of_cpumask_remove_table() is called, though
> the actual OPP table may not get freed as it may still be referenced by
> other parts of the kernel, like from a call to
> dev_pm_opp_set_supported_hw(). And if we call
> dev_pm_opp_of_cpumask_add_table() again at this point, we must
> reinitialize the list_kref otherwise the kernel will hit a WARN() in
> kref infrastructure for incrementing a kref with value 0.
> 
> Fixes: 11e1a1648298 ("opp: Don't decrement uninitialized list_kref")
> Reported-by: Dmitry Osipenko <digetx@gmail.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/opp/of.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index 6dc41faf74b5..1cbb58240b80 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -663,6 +663,13 @@ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
>  		return 0;
>  	}
>  
> +	/*
> +	 * Re-initialize list_kref every time we add static OPPs to the OPP
> +	 * table as the reference count may be 0 after the last tie static OPPs
> +	 * were removed.
> +	 */
> +	kref_init(&opp_table->list_kref);
> +
>  	/* We have opp-table node now, iterate over it and add OPPs */
>  	for_each_available_child_of_node(opp_table->np, np) {
>  		opp = _opp_add_static_v2(opp_table, dev, np);
> 

I don't see the warning anymore, cpufreq-dt module reloading works fine
now. Thank you very much!

Tested-by: Dmitry Osipenko <digetx@gmail.com>
Stephen Boyd Oct. 18, 2019, 9:12 p.m. UTC | #2
Quoting Viresh Kumar (2019-10-18 02:28:41)
> The list_kref reaches a count of 0 when all the static OPPs are removed,
> for example when dev_pm_opp_of_cpumask_remove_table() is called, though
> the actual OPP table may not get freed as it may still be referenced by
> other parts of the kernel, like from a call to
> dev_pm_opp_set_supported_hw(). And if we call
> dev_pm_opp_of_cpumask_add_table() again at this point, we must
> reinitialize the list_kref otherwise the kernel will hit a WARN() in
> kref infrastructure for incrementing a kref with value 0.
> 
> Fixes: 11e1a1648298 ("opp: Don't decrement uninitialized list_kref")
> Reported-by: Dmitry Osipenko <digetx@gmail.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/opp/of.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index 6dc41faf74b5..1cbb58240b80 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -663,6 +663,13 @@ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
>                 return 0;
>         }
>  
> +       /*
> +        * Re-initialize list_kref every time we add static OPPs to the OPP
> +        * table as the reference count may be 0 after the last tie static OPPs

s/tie/time/

> +        * were removed.
> +        */
> +       kref_init(&opp_table->list_kref);

It seems racy. Why are we doing this vs. making an entirely new and
different OPP structure? Or why is the count reaching 0 when something
is obviously still referencing it?

> +
>         /* We have opp-table node now, iterate over it and add OPPs */
>         for_each_available_child_of_node(opp_table->np, np) {
>                 opp = _opp_add_static_v2(opp_table, dev, np);
Viresh Kumar Oct. 21, 2019, 2:18 a.m. UTC | #3
On 18-10-19, 18:35, Dmitry Osipenko wrote:
> 18.10.2019 12:28, Viresh Kumar пишет:
> > The list_kref reaches a count of 0 when all the static OPPs are removed,
> > for example when dev_pm_opp_of_cpumask_remove_table() is called, though
> > the actual OPP table may not get freed as it may still be referenced by
> > other parts of the kernel, like from a call to
> > dev_pm_opp_set_supported_hw(). And if we call
> > dev_pm_opp_of_cpumask_add_table() again at this point, we must
> > reinitialize the list_kref otherwise the kernel will hit a WARN() in
> > kref infrastructure for incrementing a kref with value 0.
> > 
> > Fixes: 11e1a1648298 ("opp: Don't decrement uninitialized list_kref")
> > Reported-by: Dmitry Osipenko <digetx@gmail.com>
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> >  drivers/opp/of.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> > index 6dc41faf74b5..1cbb58240b80 100644
> > --- a/drivers/opp/of.c
> > +++ b/drivers/opp/of.c
> > @@ -663,6 +663,13 @@ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
> >  		return 0;
> >  	}
> >  
> > +	/*
> > +	 * Re-initialize list_kref every time we add static OPPs to the OPP
> > +	 * table as the reference count may be 0 after the last tie static OPPs
> > +	 * were removed.
> > +	 */
> > +	kref_init(&opp_table->list_kref);
> > +
> >  	/* We have opp-table node now, iterate over it and add OPPs */
> >  	for_each_available_child_of_node(opp_table->np, np) {
> >  		opp = _opp_add_static_v2(opp_table, dev, np);
> > 
> 
> I don't see the warning anymore, cpufreq-dt module reloading works fine
> now. Thank you very much!
> 
> Tested-by: Dmitry Osipenko <digetx@gmail.com>

Thanks for testing.
Viresh Kumar Oct. 21, 2019, 2:25 a.m. UTC | #4
On 18-10-19, 14:12, Stephen Boyd wrote:
> Quoting Viresh Kumar (2019-10-18 02:28:41)
> > The list_kref reaches a count of 0 when all the static OPPs are removed,
> > for example when dev_pm_opp_of_cpumask_remove_table() is called, though
> > the actual OPP table may not get freed as it may still be referenced by
> > other parts of the kernel, like from a call to
> > dev_pm_opp_set_supported_hw(). And if we call
> > dev_pm_opp_of_cpumask_add_table() again at this point, we must
> > reinitialize the list_kref otherwise the kernel will hit a WARN() in
> > kref infrastructure for incrementing a kref with value 0.
> > 
> > Fixes: 11e1a1648298 ("opp: Don't decrement uninitialized list_kref")
> > Reported-by: Dmitry Osipenko <digetx@gmail.com>
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> >  drivers/opp/of.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> > index 6dc41faf74b5..1cbb58240b80 100644
> > --- a/drivers/opp/of.c
> > +++ b/drivers/opp/of.c
> > @@ -663,6 +663,13 @@ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
> >                 return 0;
> >         }
> >  
> > +       /*
> > +        * Re-initialize list_kref every time we add static OPPs to the OPP
> > +        * table as the reference count may be 0 after the last tie static OPPs
> 
> s/tie/time/
> 
> > +        * were removed.
> > +        */
> > +       kref_init(&opp_table->list_kref);
> 
> It seems racy.

I am not sure if I see a race here, but maybe I am missing something.
Care to explain ?

> Why are we doing this vs. making an entirely new and
> different OPP structure? Or why is the count reaching 0 when something
> is obviously still referencing it?

The kref for the opp table is opp_table->kref and the one here is
different. This is list_kref which is used for freeing OPPs added
statically from the DT. The static OPPs get added to the OPP table
when one calls dev_pm_opp_of_cpumask_add_table() and must be removed
on a call to dev_pm_opp_of_cpumask_remove_table(). The opp table
structure may not get freed at this moment though as it is still
referenced by the caller of dev_pm_opp_set_supported_hw().

And now when we try to add the static OPPs again (re-insertion of
cpufreq module), we need to reinitialize the list_kref again as its
count reached 0 earlier and the resources (static OPPs) were freed.
Stephen Boyd Oct. 28, 2019, 12:01 p.m. UTC | #5
Quoting Viresh Kumar (2019-10-20 19:25:16)
> On 18-10-19, 14:12, Stephen Boyd wrote:
> > Quoting Viresh Kumar (2019-10-18 02:28:41)
> > > The list_kref reaches a count of 0 when all the static OPPs are removed,
> > > for example when dev_pm_opp_of_cpumask_remove_table() is called, though
> > > the actual OPP table may not get freed as it may still be referenced by
> > > other parts of the kernel, like from a call to
> > > dev_pm_opp_set_supported_hw(). And if we call
> > > dev_pm_opp_of_cpumask_add_table() again at this point, we must
> > > reinitialize the list_kref otherwise the kernel will hit a WARN() in
> > > kref infrastructure for incrementing a kref with value 0.
> > > 
> > > Fixes: 11e1a1648298 ("opp: Don't decrement uninitialized list_kref")
> > > Reported-by: Dmitry Osipenko <digetx@gmail.com>
> > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > > ---
> > >  drivers/opp/of.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> > > index 6dc41faf74b5..1cbb58240b80 100644
> > > --- a/drivers/opp/of.c
> > > +++ b/drivers/opp/of.c
> > > @@ -663,6 +663,13 @@ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
> > >                 return 0;
> > >         }
> > >  
> > > +       /*
> > > +        * Re-initialize list_kref every time we add static OPPs to the OPP
> > > +        * table as the reference count may be 0 after the last tie static OPPs
> > 
> > s/tie/time/
> > 
> > > +        * were removed.
> > > +        */
> > > +       kref_init(&opp_table->list_kref);
> > 
> > It seems racy.
> 
> I am not sure if I see a race here, but maybe I am missing something.
> Care to explain ?

Some static OPP is removed at the same time that this function is
called?

> 
> > Why are we doing this vs. making an entirely new and
> > different OPP structure? Or why is the count reaching 0 when something
> > is obviously still referencing it?
> 
> The kref for the opp table is opp_table->kref and the one here is
> different. This is list_kref which is used for freeing OPPs added
> statically from the DT. The static OPPs get added to the OPP table
> when one calls dev_pm_opp_of_cpumask_add_table() and must be removed
> on a call to dev_pm_opp_of_cpumask_remove_table(). The opp table
> structure may not get freed at this moment though as it is still
> referenced by the caller of dev_pm_opp_set_supported_hw().
> 
> And now when we try to add the static OPPs again (re-insertion of
> cpufreq module), we need to reinitialize the list_kref again as its
> count reached 0 earlier and the resources (static OPPs) were freed.
> 

Right. I don't understand why the count reaches 0 if we can still get a
pointer to something. I guess we've got this kref thing that has a
lifetime beyond the life of what it's tracking, which is weird. Usually
the kref is embedded inside the pointer that is returned by the "get"
call, but here it's outside it and used to track when we should free
static OPPs. Why are we removing static OPPs? Shouldn't they just stick
around forever until the device is deleted vs. populated over and over
again?
Viresh Kumar Oct. 30, 2019, 7:50 a.m. UTC | #6
On Mon, 28 Oct 2019 at 17:31, Stephen Boyd <sboyd@kernel.org> wrote:
> Quoting Viresh Kumar (2019-10-20 19:25:16)

> Some static OPP is removed at the same time that this function is
> called?

Hmm, not just this line but yeah this can be racy in principle though
not in practice.
As both addition and removal of the static OPPs happen from the same
driver, like
during cpufreq registration and unregistration.

> Right. I don't understand why the count reaches 0 if we can still get a
> pointer to something. I guess we've got this kref thing that has a
> lifetime beyond the life of what it's tracking, which is weird.

Something is weird here for sure as the kref is not protecting a
specific object here.
Maybe we should use a simple counter protected with opp-table lock here.

> Usually
> the kref is embedded inside the pointer that is returned by the "get"
> call, but here it's outside it and used to track when we should free
> static OPPs.

> Why are we removing static OPPs? Shouldn't they just stick
> around forever until the device is deleted vs. populated over and over
> again?

Because the only use of the static OPPs is gone and so freeing them is the
right thing to do. Also, it is possible in principle to change the supported-hw
values after removing the cpufreq driver and adding it back, which means
it is possible to get a new set of OPPs.

--
viresh
Viresh Kumar Nov. 11, 2019, 8:21 a.m. UTC | #7
On 30-10-19, 07:33, Stephen Boyd wrote:
> I agree a simple refcount_t makes more sense here instead of using a
> kref. That would be clearer.

I was using kref as I wanted to call the cleanup routine when kref
reaches 0. A refcount_t will have the same problem as the warning in
this came from refcount mechanism only (which is used by kref). You
can't increment a refcount_t if it is zero :)

Any other suggestions other than local counting ?
Viresh Kumar Nov. 11, 2019, 11:31 a.m. UTC | #8
On 11-11-19, 13:51, Viresh Kumar wrote:
> On 30-10-19, 07:33, Stephen Boyd wrote:
> > I agree a simple refcount_t makes more sense here instead of using a
> > kref. That would be clearer.
> 
> I was using kref as I wanted to call the cleanup routine when kref
> reaches 0. A refcount_t will have the same problem as the warning in
> this came from refcount mechanism only (which is used by kref). You
> can't increment a refcount_t if it is zero :)
> 
> Any other suggestions other than local counting ?

i.e. something like this, untested.

---
 drivers/opp/core.c | 26 ++++++++------------------
 drivers/opp/of.c   | 14 +++++---------
 drivers/opp/opp.h  |  6 ++----
 3 files changed, 15 insertions(+), 31 deletions(-)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 467b2348a289..ea1d89177511 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -988,7 +988,6 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
 	BLOCKING_INIT_NOTIFIER_HEAD(&opp_table->head);
 	INIT_LIST_HEAD(&opp_table->opp_list);
 	kref_init(&opp_table->kref);
-	kref_init(&opp_table->list_kref);
 
 	/* Secure the device table modification */
 	list_add(&opp_table->node, &opp_tables);
@@ -1076,27 +1075,18 @@ static void _opp_remove_all_static(struct opp_table *opp_table)
 {
 	struct dev_pm_opp *opp, *tmp;
 
+	mutex_lock(&opp_table->lock);
+
+	if (!opp_table->parsed_static_opps || --opp_table->parsed_static_opps)
+		goto unlock;
+
 	list_for_each_entry_safe(opp, tmp, &opp_table->opp_list, node) {
 		if (!opp->dynamic)
 			dev_pm_opp_put(opp);
 	}
 
-	opp_table->parsed_static_opps = false;
-}
-
-static void _opp_table_list_kref_release(struct kref *kref)
-{
-	struct opp_table *opp_table = container_of(kref, struct opp_table,
-						   list_kref);
-
-	_opp_remove_all_static(opp_table);
-	mutex_unlock(&opp_table_lock);
-}
-
-void _put_opp_list_kref(struct opp_table *opp_table)
-{
-	kref_put_mutex(&opp_table->list_kref, _opp_table_list_kref_release,
-		       &opp_table_lock);
+unlock:
+	mutex_unlock(&opp_table->lock);
 }
 
 void dev_pm_opp_put_opp_table(struct opp_table *opp_table)
@@ -2276,7 +2266,7 @@ void _dev_pm_opp_find_and_remove_table(struct device *dev)
 		return;
 	}
 
-	_put_opp_list_kref(opp_table);
+	_opp_remove_all_static(opp_table);
 
 	/* Drop reference taken by _find_opp_table() */
 	dev_pm_opp_put_opp_table(opp_table);
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 1cbb58240b80..2c433e9f9223 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -658,17 +658,15 @@ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
 	struct dev_pm_opp *opp;
 
 	/* OPP table is already initialized for the device */
+	mutex_lock(&opp_table->lock);
 	if (opp_table->parsed_static_opps) {
-		kref_get(&opp_table->list_kref);
+		opp_table->parsed_static_opps++;
+		mutex_unlock(&opp_table->lock);
 		return 0;
 	}
 
-	/*
-	 * Re-initialize list_kref every time we add static OPPs to the OPP
-	 * table as the reference count may be 0 after the last tie static OPPs
-	 * were removed.
-	 */
-	kref_init(&opp_table->list_kref);
+	opp_table->parsed_static_opps = 1;
+	mutex_unlock(&opp_table->lock);
 
 	/* We have opp-table node now, iterate over it and add OPPs */
 	for_each_available_child_of_node(opp_table->np, np) {
@@ -701,8 +699,6 @@ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
 	if (pstate_count)
 		opp_table->genpd_performance_state = true;
 
-	opp_table->parsed_static_opps = true;
-
 	return 0;
 }
 
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
index 51ad942d1b6b..4e69b855a6a0 100644
--- a/drivers/opp/opp.h
+++ b/drivers/opp/opp.h
@@ -127,11 +127,10 @@ enum opp_table_access {
  * @dev_list:	list of devices that share these OPPs
  * @opp_list:	table of opps
  * @kref:	for reference count of the table.
- * @list_kref:	for reference count of the OPP list.
  * @lock:	mutex protecting the opp_list and dev_list.
  * @np:		struct device_node pointer for opp's DT node.
  * @clock_latency_ns_max: Max clock latency in nanoseconds.
- * @parsed_static_opps: True if OPPs are initialized from DT.
+ * @parsed_static_opps: Count of devices for which OPPs are initialized from DT.
  * @shared_opp: OPP is shared between multiple devices.
  * @suspend_opp: Pointer to OPP to be used during device suspend.
  * @genpd_virt_dev_lock: Mutex protecting the genpd virtual device pointers.
@@ -167,7 +166,6 @@ struct opp_table {
 	struct list_head dev_list;
 	struct list_head opp_list;
 	struct kref kref;
-	struct kref list_kref;
 	struct mutex lock;
 
 	struct device_node *np;
@@ -176,7 +174,7 @@ struct opp_table {
 	/* For backward compatibility with v1 bindings */
 	unsigned int voltage_tolerance_v1;
 
-	bool parsed_static_opps;
+	unsigned int parsed_static_opps;
 	enum opp_table_access shared_opp;
 	struct dev_pm_opp *suspend_opp;

Patch
diff mbox series

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 6dc41faf74b5..1cbb58240b80 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -663,6 +663,13 @@  static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
 		return 0;
 	}
 
+	/*
+	 * Re-initialize list_kref every time we add static OPPs to the OPP
+	 * table as the reference count may be 0 after the last tie static OPPs
+	 * were removed.
+	 */
+	kref_init(&opp_table->list_kref);
+
 	/* We have opp-table node now, iterate over it and add OPPs */
 	for_each_available_child_of_node(opp_table->np, np) {
 		opp = _opp_add_static_v2(opp_table, dev, np);