diff mbox series

[03/13] opp: Keep track of currently programmed OPP

Message ID 96b57316a2a307a5cc5ff7302b3cd0084123a2ed.1611227342.git.viresh.kumar@linaro.org
State New
Delegated to: viresh kumar
Series opp: Implement dev_pm_opp_set_opp()

Commit Message

Viresh Kumar Jan. 21, 2021, 11:17 a.m. UTC
The dev_pm_opp_set_rate() helper needs to know the currently programmed
OPP to make a few decisions, and currently we try to find it on every
invocation of this routine.

Let's start keeping track of the current_opp programmed for the devices
of the OPP table; that will be quite useful going forward.

If we fail to find the current OPP, we pick the first one available in
the list, as the list is in ascending order of frequencies, level, or
bandwidth and that's the best guess we can make anyway.

Note that dev_pm_opp_set_rate() previously compared frequencies early
on; it now compares the target OPP with the current one instead, which
should be more accurate anyway.

We need to make sure that current_opp's memory doesn't get freed while
it is being used, so we hold a reference to it for as long as it is in
use.

Now that current_opp will always be set, we can drop some unnecessary
checks as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/opp/core.c | 83 +++++++++++++++++++++++++++++-----------------
 drivers/opp/opp.h  |  2 ++
 2 files changed, 55 insertions(+), 30 deletions(-)
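
The bookkeeping described in the commit message can be sketched as a small user-space model. This is only an illustration under stated assumptions: `toy_opp`, `toy_table`, and every `toy_*` function are hypothetical stand-ins rather than kernel APIs, the refcount is a plain `int` instead of a kref, there is no locking, the `enabled` flag is left out, and the table is assumed to hold at least one OPP.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for struct dev_pm_opp and struct opp_table (not kernel types). */
struct toy_opp {
	unsigned long rate;          /* Hz */
	int refcount;                /* plain int here; the kernel uses a kref */
};

struct toy_table {
	struct toy_opp *opps;        /* sorted in ascending order of rate */
	size_t count;                /* assumed to be >= 1 */
	unsigned long clk_rate;      /* what the "hardware" clock reports */
	struct toy_opp *current_opp; /* NULL until first use */
};

static void toy_opp_get(struct toy_opp *opp) { opp->refcount++; }
static void toy_opp_put(struct toy_opp *opp) { opp->refcount--; }

/* Return the lowest OPP with rate >= freq, holding a reference, or NULL. */
static struct toy_opp *toy_find_freq_ceil(struct toy_table *t, unsigned long freq)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->opps[i].rate >= freq) {
			toy_opp_get(&t->opps[i]);
			return &t->opps[i];
		}
	}
	return NULL;
}

/* Guess the current OPP from the clock rate; fall back to the first entry. */
static void toy_find_current_opp(struct toy_table *t)
{
	struct toy_opp *opp = toy_find_freq_ceil(t, t->clk_rate);

	if (!opp) {
		opp = &t->opps[0];
		toy_opp_get(opp);
	}
	t->current_opp = opp;
}

/* Switch to the OPP matching target_freq; returns 0 on success, -1 if none. */
static int toy_set_rate(struct toy_table *t, unsigned long target_freq)
{
	struct toy_opp *opp, *old_opp;

	if (!t->current_opp)
		toy_find_current_opp(t);

	opp = toy_find_freq_ceil(t, target_freq);
	if (!opp)
		return -1;

	old_opp = t->current_opp;
	if (old_opp != opp) {            /* compare OPPs, not raw frequencies */
		t->clk_rate = opp->rate; /* "program" the hardware */
		toy_opp_put(old_opp);    /* table drops its reference on the old OPP */
		toy_opp_get(opp);        /* table takes its own reference on the new one */
		t->current_opp = opp;
	}
	toy_opp_put(opp);                /* drop the lookup reference */
	return 0;
}
```

In this model the table always owns exactly one reference, on whichever OPP it considers current, so the separate cleanup path for the old OPP is no longer needed.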

Comments

Dmitry Osipenko Jan. 21, 2021, 9:41 p.m. UTC | #1
21.01.2021 14:17, Viresh Kumar wrote:
> @@ -1074,15 +1091,18 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
>  
>  	if (!ret) {
>  		ret = _set_opp_bw(opp_table, opp, dev, false);
> -		if (!ret)
> +		if (!ret) {
>  			opp_table->enabled = true;
> +			dev_pm_opp_put(old_opp);
> +
> +			/* Make sure current_opp doesn't get freed */
> +			dev_pm_opp_get(opp);
> +			opp_table->current_opp = opp;
> +		}
>  	}

I'm a bit surprised that _set_opp_bw() isn't used similarly to
_set_opp_voltage() in _generic_set_opp_regulator().

I'd expect the BW requirement to be raised before the clock rate goes UP.
Viresh Kumar Jan. 22, 2021, 4:45 a.m. UTC | #2
On 22-01-21, 00:41, Dmitry Osipenko wrote:
> 21.01.2021 14:17, Viresh Kumar wrote:
> > @@ -1074,15 +1091,18 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
> >  
> >  	if (!ret) {
> >  		ret = _set_opp_bw(opp_table, opp, dev, false);
> > -		if (!ret)
> > +		if (!ret) {
> >  			opp_table->enabled = true;
> > +			dev_pm_opp_put(old_opp);
> > +
> > +			/* Make sure current_opp doesn't get freed */
> > +			dev_pm_opp_get(opp);
> > +			opp_table->current_opp = opp;
> > +		}
> >  	}
> 
> I'm a bit surprised that _set_opp_bw() isn't used similarly to
> _set_opp_voltage() in _generic_set_opp_regulator().
> 
> I'd expect the BW requirement to be raised before the clock rate goes UP.

I remember discussing that earlier when this stuff came in, and this I
believe is the reason for that.

We need to scale regulators before/after frequency because when we
increase the frequency a regulator may _not_ be providing enough power
to sustain that (even for a short while) and this may have undesired
effects on the hardware and so it is important to prevent that
malfunction.

In case of bandwidth such issues will not happen (AFAIK) and doing it
just once is normally enough. It is just about allowing more data to
be transmitted, and won't make the hardware behave badly.
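
The regulator ordering described above can be sketched as follows. This is a minimal user-space model, not the kernel's code: `toy_dev` and the `toy_*` helpers are hypothetical, and the log merely records the order in which voltage ('V') and frequency ('F') are programmed.

```c
#include <assert.h>
#include <string.h>

/* Toy device: records the order in which voltage and frequency are programmed. */
struct toy_dev {
	unsigned long freq_hz;
	unsigned long volt_uv;
	char log[8];   /* 'V' = voltage step, 'F' = frequency step; NUL-terminated */
	int n;
};

static void toy_log(struct toy_dev *d, char step)
{
	/* Keep the last byte as the terminator. */
	if (d->n < (int)sizeof(d->log) - 1)
		d->log[d->n++] = step;
}

static void toy_set_voltage(struct toy_dev *d, unsigned long uv)
{
	d->volt_uv = uv;
	toy_log(d, 'V');
}

static void toy_set_freq(struct toy_dev *d, unsigned long hz)
{
	d->freq_hz = hz;
	toy_log(d, 'F');
}

/* Raise the voltage before speeding up; lower it only after slowing down. */
static void toy_scale(struct toy_dev *d, unsigned long new_hz, unsigned long new_uv)
{
	unsigned long old_hz = d->freq_hz;

	if (new_hz >= old_hz)
		toy_set_voltage(d, new_uv);

	toy_set_freq(d, new_hz);

	if (new_hz < old_hz)
		toy_set_voltage(d, new_uv);
}
```

With this ordering the supply is never below what the running frequency needs, in either direction.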
Dmitry Osipenko Jan. 22, 2021, 2:31 p.m. UTC | #3
22.01.2021 07:45, Viresh Kumar wrote:
> On 22-01-21, 00:41, Dmitry Osipenko wrote:
>> 21.01.2021 14:17, Viresh Kumar wrote:
>>> @@ -1074,15 +1091,18 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
>>>  
>>>  	if (!ret) {
>>>  		ret = _set_opp_bw(opp_table, opp, dev, false);
>>> -		if (!ret)
>>> +		if (!ret) {
>>>  			opp_table->enabled = true;
>>> +			dev_pm_opp_put(old_opp);
>>> +
>>> +			/* Make sure current_opp doesn't get freed */
>>> +			dev_pm_opp_get(opp);
>>> +			opp_table->current_opp = opp;
>>> +		}
>>>  	}
>>
>> I'm a bit surprised that _set_opp_bw() isn't used similarly to
>> _set_opp_voltage() in _generic_set_opp_regulator().
>>
>> I'd expect the BW requirement to be raised before the clock rate goes UP.
> 
> I remember discussing that earlier when this stuff came in, and this I
> believe is the reason for that.
> 
> We need to scale regulators before/after frequency because when we
> increase the frequency a regulator may _not_ be providing enough power
> to sustain that (even for a short while) and this may have undesired
> effects on the hardware and so it is important to prevent that
> malfunction.
> 
> In case of bandwidth such issues will not happen (AFAIK) and doing it
> just once is normally enough. It is just about allowing more data to
> be transmitted, and won't make the hardware behave badly.
> 

This may not be true for all kinds of hardware; a display controller is
one example. If the display's pixclock is raised before the memory
bandwidth of the display's memory client, then the display controller may
hit a memory underflow since it won't be able to fetch memory fast enough,
and it's not possible to pause data transmission to the display panel.
The display panel may then get out of sync, and a full hardware reset will
be needed in order to recover. At least this is the case for NVIDIA Tegra SoCs.

I guess it's not a real problem for any of OPP API users right now, but
this is something to keep in mind.
Viresh Kumar Jan. 25, 2021, 3:12 a.m. UTC | #4
On 22-01-21, 17:31, Dmitry Osipenko wrote:
> This may not be true for all kinds of hardware; a display controller is
> one example. If the display's pixclock is raised before the memory
> bandwidth of the display's memory client, then the display controller may
> hit a memory underflow since it won't be able to fetch memory fast enough,
> and it's not possible to pause data transmission to the display panel.
> The display panel may then get out of sync, and a full hardware reset will
> be needed in order to recover. At least this is the case for NVIDIA Tegra SoCs.

Hmm, but I expected that the request for more data would only come after the
opp-set-rate has finished and not in between. Maybe I am wrong. There is
nothing wrong in doing it the regulator way if required.

> I guess it's not a real problem for any of OPP API users right now, but
> this is something to keep in mind.

Sure, I am not against it. Just that we thought it isn't worth the code.
Akhil P Oommen Jan. 27, 2021, 4:31 p.m. UTC | #5
On 1/22/2021 10:15 AM, Viresh Kumar wrote:
> On 22-01-21, 00:41, Dmitry Osipenko wrote:
>> 21.01.2021 14:17, Viresh Kumar wrote:
>>> @@ -1074,15 +1091,18 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
>>>   
>>>   	if (!ret) {
>>>   		ret = _set_opp_bw(opp_table, opp, dev, false);
>>> -		if (!ret)
>>> +		if (!ret) {
>>>   			opp_table->enabled = true;
>>> +			dev_pm_opp_put(old_opp);
>>> +
>>> +			/* Make sure current_opp doesn't get freed */
>>> +			dev_pm_opp_get(opp);
>>> +			opp_table->current_opp = opp;
>>> +		}
>>>   	}
>>
>> I'm a bit surprised that _set_opp_bw() isn't used similarly to
>> _set_opp_voltage() in _generic_set_opp_regulator().
>>
>> I'd expect the BW requirement to be raised before the clock rate goes UP.
> 
> I remember discussing that earlier when this stuff came in, and this I
> believe is the reason for that.
> 
> We need to scale regulators before/after frequency because when we
> increase the frequency a regulator may _not_ be providing enough power
> to sustain that (even for a short while) and this may have undesired
> effects on the hardware and so it is important to prevent that
> malfunction.
> 
> In case of bandwidth such issues will not happen (AFAIK) and doing it
> just once is normally enough. It is just about allowing more data to
> be transmitted, and won't make the hardware behave badly.
> 
I agree with Dmitry. BW is a shared resource in a lot of architectures.
Raising clk before increasing the bw can lead to a scenario where this
client saturates the entire BW for whatever small duration it may be.
This will impact the latency requirements of other clients.

-Akhil.
Viresh Kumar Jan. 28, 2021, 4:14 a.m. UTC | #6
On 27-01-21, 22:01, Akhil P Oommen wrote:
> On 1/22/2021 10:15 AM, Viresh Kumar wrote:
> > On 22-01-21, 00:41, Dmitry Osipenko wrote:
> > > 21.01.2021 14:17, Viresh Kumar wrote:
> > > > @@ -1074,15 +1091,18 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
> > > >   	if (!ret) {
> > > >   		ret = _set_opp_bw(opp_table, opp, dev, false);
> > > > -		if (!ret)
> > > > +		if (!ret) {
> > > >   			opp_table->enabled = true;
> > > > +			dev_pm_opp_put(old_opp);
> > > > +
> > > > +			/* Make sure current_opp doesn't get freed */
> > > > +			dev_pm_opp_get(opp);
> > > > +			opp_table->current_opp = opp;
> > > > +		}
> > > >   	}
> > > 
> > > I'm a bit surprised that _set_opp_bw() isn't used similarly to
> > > _set_opp_voltage() in _generic_set_opp_regulator().
> > > 
> > > I'd expect the BW requirement to be raised before the clock rate goes UP.
> > 
> > I remember discussing that earlier when this stuff came in, and this I
> > believe is the reason for that.
> > 
> > We need to scale regulators before/after frequency because when we
> > increase the frequency a regulator may _not_ be providing enough power
> > to sustain that (even for a short while) and this may have undesired
> > effects on the hardware and so it is important to prevent that
> > malfunction.
> > 
> > In case of bandwidth such issues will not happen (AFAIK) and doing it
> > just once is normally enough. It is just about allowing more data to
> > be transmitted, and won't make the hardware behave badly.
> > 
> I agree with Dmitry. BW is a shared resource in a lot of architectures.
> Raising clk before increasing the bw can lead to a scenario where this
> client saturates the entire BW for whatever small duration it may be. This
> will impact the latency requirements of other clients.

I see. I will make the necessary changes then to fix it. Thanks guys.
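
The ordering suggested by the reviewers can be compared with the order in the quoted hunk using a toy model. Everything here is an assumption made for illustration: `toy_client`, the `toy_*` functions, and the rule of 1 kB/s of bandwidth per kHz of clock are hypothetical, and this is not the follow-up change that was promised in the thread.

```c
#include <assert.h>

/* Toy bandwidth client: counts how often the clock ran ahead of its bandwidth vote. */
struct toy_client {
	unsigned long clk_hz;
	unsigned long bw_kBps;   /* bandwidth currently voted for */
	int underprovisioned;    /* times the clock exceeded what the vote covers */
};

/* Hypothetical requirement: 1 kB/s of bandwidth per kHz of clock. */
static unsigned long toy_bw_needed(unsigned long clk_hz)
{
	return clk_hz / 1000;
}

static void toy_set_clk(struct toy_client *c, unsigned long hz)
{
	c->clk_hz = hz;
	if (c->bw_kBps < toy_bw_needed(hz))
		c->underprovisioned++;
}

static void toy_set_bw(struct toy_client *c, unsigned long kBps)
{
	c->bw_kBps = kBps;
}

/* Order in the quoted hunk: clock first, bandwidth afterwards. */
static void toy_scale_bw_last(struct toy_client *c, unsigned long hz)
{
	toy_set_clk(c, hz);
	toy_set_bw(c, toy_bw_needed(hz));
}

/* Order suggested in the thread: mirror the regulator sequence. */
static void toy_scale_bw_ordered(struct toy_client *c, unsigned long hz)
{
	int scaling_down = hz < c->clk_hz;

	if (!scaling_down)
		toy_set_bw(c, toy_bw_needed(hz)); /* raise the vote before the clock */

	toy_set_clk(c, hz);

	if (scaling_down)
		toy_set_bw(c, toy_bw_needed(hz)); /* lower the vote after the clock */
}
```

In this model the clock-first order leaves a window on every upward transition where the bandwidth vote is too low, while the mirrored order never does.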

Patch

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index cb5b67ccf5cf..4ee598344e6a 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -788,8 +788,7 @@  static int _generic_set_opp_regulator(struct opp_table *opp_table,
 			__func__, old_freq);
 restore_voltage:
 	/* This shouldn't harm even if the voltages weren't updated earlier */
-	if (old_supply)
-		_set_opp_voltage(dev, reg, old_supply);
+	_set_opp_voltage(dev, reg, old_supply);
 
 	return ret;
 }
@@ -839,10 +838,7 @@  static int _set_opp_custom(const struct opp_table *opp_table,
 
 	data->old_opp.rate = old_freq;
 	size = sizeof(*old_supply) * opp_table->regulator_count;
-	if (!old_supply)
-		memset(data->old_opp.supplies, 0, size);
-	else
-		memcpy(data->old_opp.supplies, old_supply, size);
+	memcpy(data->old_opp.supplies, old_supply, size);
 
 	data->new_opp.rate = freq;
 	memcpy(data->new_opp.supplies, new_supply, size);
@@ -943,6 +939,31 @@  int dev_pm_opp_set_bw(struct device *dev, struct dev_pm_opp *opp)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_set_bw);
 
+static void _find_current_opp(struct device *dev, struct opp_table *opp_table)
+{
+	struct dev_pm_opp *opp = ERR_PTR(-ENODEV);
+	unsigned long freq;
+
+	if (!IS_ERR(opp_table->clk)) {
+		freq = clk_get_rate(opp_table->clk);
+		opp = _find_freq_ceil(opp_table, &freq);
+	}
+
+	/*
+	 * Unable to find the current OPP ? Pick the first from the list since
+	 * it is in ascending order, otherwise rest of the code will need to
+	 * make special checks to validate current_opp.
+	 */
+	if (IS_ERR(opp)) {
+		mutex_lock(&opp_table->lock);
+		opp = list_first_entry(&opp_table->opp_list, struct dev_pm_opp, node);
+		dev_pm_opp_get(opp);
+		mutex_unlock(&opp_table->lock);
+	}
+
+	opp_table->current_opp = opp;
+}
+
 static int _disable_opp_table(struct device *dev, struct opp_table *opp_table)
 {
 	int ret;
@@ -1004,16 +1025,6 @@  int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 	if ((long)freq <= 0)
 		freq = target_freq;
 
-	old_freq = clk_get_rate(opp_table->clk);
-
-	/* Return early if nothing to do */
-	if (opp_table->enabled && old_freq == freq) {
-		dev_dbg(dev, "%s: old/new frequencies (%lu Hz) are same, nothing to do\n",
-			__func__, freq);
-		ret = 0;
-		goto put_opp_table;
-	}
-
 	/*
 	 * For IO devices which require an OPP on some platforms/SoCs
 	 * while just needing to scale the clock on some others
@@ -1026,12 +1037,9 @@  int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 		goto put_opp_table;
 	}
 
-	temp_freq = old_freq;
-	old_opp = _find_freq_ceil(opp_table, &temp_freq);
-	if (IS_ERR(old_opp)) {
-		dev_err(dev, "%s: failed to find current OPP for freq %lu (%ld)\n",
-			__func__, old_freq, PTR_ERR(old_opp));
-	}
+	/* Find the currently set OPP if we don't know already */
+	if (unlikely(!opp_table->current_opp))
+		_find_current_opp(dev, opp_table);
 
 	temp_freq = freq;
 	opp = _find_freq_ceil(opp_table, &temp_freq);
@@ -1039,7 +1047,17 @@  int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 		ret = PTR_ERR(opp);
 		dev_err(dev, "%s: failed to find OPP for freq %lu (%d)\n",
 			__func__, freq, ret);
-		goto put_old_opp;
+		goto put_opp_table;
+	}
+
+	old_opp = opp_table->current_opp;
+	old_freq = old_opp->rate;
+
+	/* Return early if nothing to do */
+	if (opp_table->enabled && old_opp == opp) {
+		dev_dbg(dev, "%s: OPPs are same, nothing to do\n", __func__);
+		ret = 0;
+		goto put_opp;
 	}
 
 	dev_dbg(dev, "%s: switching OPP: %lu Hz --> %lu Hz\n", __func__,
@@ -1054,11 +1072,10 @@  int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 
 	if (opp_table->set_opp) {
 		ret = _set_opp_custom(opp_table, dev, old_freq, freq,
-				      IS_ERR(old_opp) ? NULL : old_opp->supplies,
-				      opp->supplies);
+				      old_opp->supplies, opp->supplies);
 	} else if (opp_table->regulators) {
 		ret = _generic_set_opp_regulator(opp_table, dev, old_freq, freq,
-						 IS_ERR(old_opp) ? NULL : old_opp->supplies,
+						 old_opp->supplies,
 						 opp->supplies);
 	} else {
 		/* Only frequency scaling */
@@ -1074,15 +1091,18 @@  int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 
 	if (!ret) {
 		ret = _set_opp_bw(opp_table, opp, dev, false);
-		if (!ret)
+		if (!ret) {
 			opp_table->enabled = true;
+			dev_pm_opp_put(old_opp);
+
+			/* Make sure current_opp doesn't get freed */
+			dev_pm_opp_get(opp);
+			opp_table->current_opp = opp;
+		}
 	}
 
 put_opp:
 	dev_pm_opp_put(opp);
-put_old_opp:
-	if (!IS_ERR(old_opp))
-		dev_pm_opp_put(old_opp);
 put_opp_table:
 	dev_pm_opp_put_opp_table(opp_table);
 	return ret;
@@ -1276,6 +1296,9 @@  static void _opp_table_kref_release(struct kref *kref)
 	list_del(&opp_table->node);
 	mutex_unlock(&opp_table_lock);
 
+	if (opp_table->current_opp)
+		dev_pm_opp_put(opp_table->current_opp);
+
 	_of_clear_opp_table(opp_table);
 
 	/* Release clk */
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
index 4408cfcb0f31..359fd89d5770 100644
--- a/drivers/opp/opp.h
+++ b/drivers/opp/opp.h
@@ -135,6 +135,7 @@  enum opp_table_access {
  * @clock_latency_ns_max: Max clock latency in nanoseconds.
  * @parsed_static_opps: Count of devices for which OPPs are initialized from DT.
  * @shared_opp: OPP is shared between multiple devices.
+ * @current_opp: Currently configured OPP for the table.
  * @suspend_opp: Pointer to OPP to be used during device suspend.
  * @genpd_virt_dev_lock: Mutex protecting the genpd virtual device pointers.
  * @genpd_virt_devs: List of virtual devices for multiple genpd support.
@@ -183,6 +184,7 @@  struct opp_table {
 
 	unsigned int parsed_static_opps;
 	enum opp_table_access shared_opp;
+	struct dev_pm_opp *current_opp;
 	struct dev_pm_opp *suspend_opp;
 
 	struct mutex genpd_virt_dev_lock;