[v3,0/6] Introduce Bandwidth OPPs for interconnect paths

Message ID: 20190703011020.151615-1-saravanak@google.com

Message

Saravana Kannan July 3, 2019, 1:10 a.m. UTC
Interconnects and interconnect paths quantify their performance levels in
terms of bandwidth and not in terms of frequency. So similar to how we have
frequency based OPP tables in DT and in the OPP framework, we need
bandwidth OPP table support in the OPP framework and in DT. Since there can
be more than one interconnect path used by a device, we also need a way to
assign a bandwidth OPP table to an interconnect path.

This patch series:
- Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
- Adds interconnect-opp-table property to interconnect DT bindings
- Adds OPP helper functions for bandwidth OPP tables
- Adds icc_get_opp_table() to get the OPP table for an interconnect path

So with the DT bindings added in this patch series, the DT for a GPU
that does bandwidth voting from GPU to Cache and GPU to DDR would look
something like this:

gpu_cache_opp_table: gpu_cache_opp_table {
	compatible = "operating-points-v2";

	gpu_cache_3000: opp-3000 {
		opp-peak-KBps = <3000>;
		opp-avg-KBps = <1000>;
	};
	gpu_cache_6000: opp-6000 {
		opp-peak-KBps = <6000>;
		opp-avg-KBps = <2000>;
	};
	gpu_cache_9000: opp-9000 {
		opp-peak-KBps = <9000>;
		opp-avg-KBps = <9000>;
	};
};

gpu_ddr_opp_table: gpu_ddr_opp_table {
	compatible = "operating-points-v2";

	gpu_ddr_1525: opp-1525 {
		opp-peak-KBps = <1525>;
		opp-avg-KBps = <452>;
	};
	gpu_ddr_3051: opp-3051 {
		opp-peak-KBps = <3051>;
		opp-avg-KBps = <915>;
	};
	gpu_ddr_7500: opp-7500 {
		opp-peak-KBps = <7500>;
		opp-avg-KBps = <3000>;
	};
};

gpu_opp_table: gpu_opp_table {
	compatible = "operating-points-v2";
	opp-shared;

	opp-200000000 {
		opp-hz = /bits/ 64 <200000000>;
	};
	opp-400000000 {
		opp-hz = /bits/ 64 <400000000>;
	};
};

gpu@7864000 {
	...
	operating-points-v2 = <&gpu_opp_table>, <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>;
	interconnects = <&mmnoc MASTER_GPU_1 &bimc SLAVE_SYSTEM_CACHE>,
			<&mmnoc MASTER_GPU_1 &bimc SLAVE_DDR>;
	interconnect-names = "gpu-cache", "gpu-mem";
	interconnect-opp-table = <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>;
};
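
For illustration, a consumer such as the GPU driver could then look up a
path and the BW OPP table attached to it, and vote along these lines.
This is only a rough sketch: icc_get_opp_table() is the getter this
series adds (its exact return type is assumed here), and the row picked
below is hard-coded just to show the final vote.

#include <linux/device.h>
#include <linux/err.h>
#include <linux/interconnect.h>
#include <linux/pm_opp.h>

static int gpu_vote_ddr_bw(struct device *dev)
{
	struct icc_path *ddr_path;
	struct opp_table *ddr_bw_table;
	int ret;

	/* "gpu-mem" matches the interconnect-names entry above */
	ddr_path = of_icc_get(dev, "gpu-mem");
	if (IS_ERR(ddr_path))
		return PTR_ERR(ddr_path);

	/* BW OPP table tied to this path via interconnect-opp-table */
	ddr_bw_table = icc_get_opp_table(ddr_path);
	if (!ddr_bw_table)
		dev_warn(dev, "no BW OPP table for the gpu-mem path\n");

	/*
	 * The driver's heuristic would pick one row of gpu_ddr_opp_table
	 * from ddr_bw_table; the opp-3051 row is hard-coded here.
	 */
	ret = icc_set_bw(ddr_path, 915, 3051);	/* avg, peak in KB/s */

	/* A real driver would keep ddr_path around and icc_put() it on remove */
	return ret;
}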

Cheers,
Saravana

Saravana Kannan (6):
  dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  OPP: Add support for bandwidth OPP tables
  OPP: Add helper function for bandwidth OPP tables
  OPP: Add API to find an OPP table from its DT node
  dt-bindings: interconnect: Add interconnect-opp-table property
  interconnect: Add OPP table support for interconnects

 .../bindings/interconnect/interconnect.txt    |  8 ++
 Documentation/devicetree/bindings/opp/opp.txt | 15 +++-
 drivers/interconnect/core.c                   | 27 ++++++-
 drivers/opp/core.c                            | 51 +++++++++++++
 drivers/opp/of.c                              | 76 ++++++++++++++++---
 drivers/opp/opp.h                             |  4 +-
 include/linux/interconnect.h                  |  7 ++
 include/linux/pm_opp.h                        | 26 +++++++
 8 files changed, 199 insertions(+), 15 deletions(-)

Comments

Viresh Kumar July 3, 2019, 6:36 a.m. UTC | #1
On 02-07-19, 18:10, Saravana Kannan wrote:
> Interconnects and interconnect paths quantify their performance levels in
> terms of bandwidth and not in terms of frequency. So similar to how we have
> frequency based OPP tables in DT and in the OPP framework, we need
> bandwidth OPP table support in the OPP framework and in DT. Since there can
> be more than one interconnect path used by a device, we also need a way to
> assign a bandwidth OPP table to an interconnect path.
> 
> This patch series:
> - Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
> - Adds interconnect-opp-table property to interconnect DT bindings
> - Adds OPP helper functions for bandwidth OPP tables
> - Adds icc_get_opp_table() to get the OPP table for an interconnect path
> 
> So with the DT bindings added in this patch series, the DT for a GPU
> that does bandwidth voting from GPU to Cache and GPU to DDR would look
> something like this:

And what changed since V2 ?
Saravana Kannan July 3, 2019, 8:35 p.m. UTC | #2
On Tue, Jul 2, 2019 at 11:36 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 02-07-19, 18:10, Saravana Kannan wrote:
> > Interconnects and interconnect paths quantify their performance levels in
> > terms of bandwidth and not in terms of frequency. So similar to how we have
> > frequency based OPP tables in DT and in the OPP framework, we need
> > bandwidth OPP table support in the OPP framework and in DT. Since there can
> > be more than one interconnect path used by a device, we also need a way to
> > assign a bandwidth OPP table to an interconnect path.
> >
> > This patch series:
> > - Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
> > - Adds interconnect-opp-table property to interconnect DT bindings
> > - Adds OPP helper functions for bandwidth OPP tables
> > - Adds icc_get_opp_table() to get the OPP table for an interconnect path
> >
> > So with the DT bindings added in this patch series, the DT for a GPU
> > that does bandwidth voting from GPU to Cache and GPU to DDR would look
> > something like this:
>
> And what changed since V2 ?

Sorry, forgot to put that in. I just dropped a lot of patches that
weren't relevant to the idea of BW OPPs and tying them to
interconnects.

-Saravana
Viresh Kumar July 17, 2019, 10:32 a.m. UTC | #3
On 02-07-19, 18:10, Saravana Kannan wrote:
> Interconnects and interconnect paths quantify their performance levels in
> terms of bandwidth and not in terms of frequency. So similar to how we have
> frequency based OPP tables in DT and in the OPP framework, we need
> bandwidth OPP table support in the OPP framework and in DT. Since there can
> be more than one interconnect path used by a device, we also need a way to
> assign a bandwidth OPP table to an interconnect path.
> 
> This patch series:
> - Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
> - Adds interconnect-opp-table property to interconnect DT bindings
> - Adds OPP helper functions for bandwidth OPP tables
> - Adds icc_get_opp_table() to get the OPP table for an interconnect path
> 
> So with the DT bindings added in this patch series, the DT for a GPU
> that does bandwidth voting from GPU to Cache and GPU to DDR would look
> something like this:
> 
> gpu_cache_opp_table: gpu_cache_opp_table {
> 	compatible = "operating-points-v2";
> 
> 	gpu_cache_3000: opp-3000 {
> 		opp-peak-KBps = <3000>;
> 		opp-avg-KBps = <1000>;
> 	};
> 	gpu_cache_6000: opp-6000 {
> 		opp-peak-KBps = <6000>;
> 		opp-avg-KBps = <2000>;
> 	};
> 	gpu_cache_9000: opp-9000 {
> 		opp-peak-KBps = <9000>;
> 		opp-avg-KBps = <9000>;
> 	};
> };
> 
> gpu_ddr_opp_table: gpu_ddr_opp_table {
> 	compatible = "operating-points-v2";
> 
> 	gpu_ddr_1525: opp-1525 {
> 		opp-peak-KBps = <1525>;
> 		opp-avg-KBps = <452>;
> 	};
> 	gpu_ddr_3051: opp-3051 {
> 		opp-peak-KBps = <3051>;
> 		opp-avg-KBps = <915>;
> 	};
> 	gpu_ddr_7500: opp-7500 {
> 		opp-peak-KBps = <7500>;
> 		opp-avg-KBps = <3000>;
> 	};
> };

Who is going to use the above tables and how ? These are the maximum
BW available over these paths, right ?

> gpu_opp_table: gpu_opp_table {
> 	compatible = "operating-points-v2";
> 	opp-shared;
> 
> 	opp-200000000 {
> 		opp-hz = /bits/ 64 <200000000>;
> 	};
> 	opp-400000000 {
> 		opp-hz = /bits/ 64 <400000000>;
> 	};
> };

Shouldn't this link back to the above tables via required-opp, etc ?
How will we know how much BW is required by the GPU device for all the
paths ?

> gpu@7864000 {
> 	...
> 	operating-points-v2 = <&gpu_opp_table>, <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>;
> 	interconnects = <&mmnoc MASTER_GPU_1 &bimc SLAVE_SYSTEM_CACHE>,
> 			<&mmnoc MASTER_GPU_1 &bimc SLAVE_DDR>;
> 	interconnect-names = "gpu-cache", "gpu-mem";
> 	interconnect-opp-table = <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>
> };
Saravana Kannan July 17, 2019, 8:34 p.m. UTC | #4
On Wed, Jul 17, 2019 at 3:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 02-07-19, 18:10, Saravana Kannan wrote:
> > Interconnects and interconnect paths quantify their performance levels in
> > terms of bandwidth and not in terms of frequency. So similar to how we have
> > frequency based OPP tables in DT and in the OPP framework, we need
> > bandwidth OPP table support in the OPP framework and in DT. Since there can
> > be more than one interconnect path used by a device, we also need a way to
> > assign a bandwidth OPP table to an interconnect path.
> >
> > This patch series:
> > - Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
> > - Adds interconnect-opp-table property to interconnect DT bindings
> > - Adds OPP helper functions for bandwidth OPP tables
> > - Adds icc_get_opp_table() to get the OPP table for an interconnect path
> >
> > So with the DT bindings added in this patch series, the DT for a GPU
> > that does bandwidth voting from GPU to Cache and GPU to DDR would look
> > something like this:
> >
> > gpu_cache_opp_table: gpu_cache_opp_table {
> >       compatible = "operating-points-v2";
> >
> >       gpu_cache_3000: opp-3000 {
> >               opp-peak-KBps = <3000>;
> >               opp-avg-KBps = <1000>;
> >       };
> >       gpu_cache_6000: opp-6000 {
> >               opp-peak-KBps = <6000>;
> >               opp-avg-KBps = <2000>;
> >       };
> >       gpu_cache_9000: opp-9000 {
> >               opp-peak-KBps = <9000>;
> >               opp-avg-KBps = <9000>;
> >       };
> > };
> >
> > gpu_ddr_opp_table: gpu_ddr_opp_table {
> >       compatible = "operating-points-v2";
> >
> >       gpu_ddr_1525: opp-1525 {
> >               opp-peak-KBps = <1525>;
> >               opp-avg-KBps = <452>;
> >       };
> >       gpu_ddr_3051: opp-3051 {
> >               opp-peak-KBps = <3051>;
> >               opp-avg-KBps = <915>;
> >       };
> >       gpu_ddr_7500: opp-7500 {
> >               opp-peak-KBps = <7500>;
> >               opp-avg-KBps = <3000>;
> >       };
> > };
>
> Who is going to use the above tables and how ?

In this example the GPU driver would use these. It'll go through these
and then decide what peak and average bandwidth to pick based on
whatever criteria it uses.

> These are the maximum
> BW available over these paths, right ?

I wouldn't call them "maximum" because there can't be multiple
maximums :) But yes, these are the meaningful bandwidth levels, from
the GPU's perspective, to use over these paths.

>
> > gpu_opp_table: gpu_opp_table {
> >       compatible = "operating-points-v2";
> >       opp-shared;
> >
> >       opp-200000000 {
> >               opp-hz = /bits/ 64 <200000000>;
> >       };
> >       opp-400000000 {
> >               opp-hz = /bits/ 64 <400000000>;
> >       };
> > };
>
> Shouldn't this link back to the above tables via required-opp, etc ?
> How will we know how much BW is required by the GPU device for all the
> paths ?

If that's what the GPU driver wants to do, then yes. But the GPU
driver could also choose to scale the bandwidth for these paths based
on multiple other signals. E.g.: bus port busy percentage, measured
bandwidth, etc.

Or they might even decide to pick the BW OPP that matches the index of
their GPU OPP. It's up to them how they actually do the heuristic.
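
To make the index-matching option a bit more concrete, a rough sketch
is below. It is not from this series: struct my_gpu and the probe-time
arrays are assumptions; only dev_pm_opp_set_rate() and icc_set_bw()
are existing APIs.

#include <linux/interconnect.h>
#include <linux/pm_opp.h>
#include <linux/types.h>

/*
 * Hypothetical sketch of index matching: the driver caches the rows of
 * gpu_opp_table and gpu_ddr_opp_table in parallel arrays at probe time
 * (in ascending order) and votes for the BW row whose index matches
 * the frequency row it just picked.
 */
struct my_gpu {
	struct device *dev;
	struct icc_path *ddr_path;	/* "gpu-mem" path */
	unsigned long *freqs;		/* Hz, from gpu_opp_table */
	struct {
		u32 avg_kbps;
		u32 peak_kbps;
	} *bw_levels;			/* from gpu_ddr_opp_table */
};

static int gpu_set_freq_and_bw(struct my_gpu *gpu, int idx)
{
	int ret;

	ret = dev_pm_opp_set_rate(gpu->dev, gpu->freqs[idx]);
	if (ret)
		return ret;

	/* Same index into the BW OPP table as into the freq OPP table */
	return icc_set_bw(gpu->ddr_path, gpu->bw_levels[idx].avg_kbps,
			  gpu->bw_levels[idx].peak_kbps);
}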

-Saravana
Viresh Kumar July 18, 2019, 5:37 a.m. UTC | #5
I know you have explained lots of things earlier as well, but they are
spread across multiple threads and I don't know where to reply now :)

Let's have a proper discussion (once again) here and be done with it.
Sorry for the trouble of explaining things again.

On 17-07-19, 13:34, Saravana Kannan wrote:
> On Wed, Jul 17, 2019 at 3:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > On 02-07-19, 18:10, Saravana Kannan wrote:
> > > gpu_cache_opp_table: gpu_cache_opp_table {
> > >       compatible = "operating-points-v2";
> > >
> > >       gpu_cache_3000: opp-3000 {
> > >               opp-peak-KBps = <3000>;
> > >               opp-avg-KBps = <1000>;
> > >       };
> > >       gpu_cache_6000: opp-6000 {
> > >               opp-peak-KBps = <6000>;
> > >               opp-avg-KBps = <2000>;
> > >       };
> > >       gpu_cache_9000: opp-9000 {
> > >               opp-peak-KBps = <9000>;
> > >               opp-avg-KBps = <9000>;
> > >       };
> > > };
> > >
> > > gpu_ddr_opp_table: gpu_ddr_opp_table {
> > >       compatible = "operating-points-v2";
> > >
> > >       gpu_ddr_1525: opp-1525 {
> > >               opp-peak-KBps = <1525>;
> > >               opp-avg-KBps = <452>;
> > >       };
> > >       gpu_ddr_3051: opp-3051 {
> > >               opp-peak-KBps = <3051>;
> > >               opp-avg-KBps = <915>;
> > >       };
> > >       gpu_ddr_7500: opp-7500 {
> > >               opp-peak-KBps = <7500>;
> > >               opp-avg-KBps = <3000>;
> > >       };
> > > };
> >
> > Who is going to use the above tables and how ?
> 
> In this example the GPU driver would use these. It'll go through these
> and then decide what peak and average bw to pick based on whatever
> criteria.

Are you saying that the GPU driver will decide which bandwidth to
choose while running at a particular frequency (say 2 GHz) ? And that
it can choose 1525 or 3051 or 7500 from the DDR path ?

Will it be possible to publicly share how we arrive at these decisions ?

The thing is, I don't like these separate OPP tables which will not be
used by anyone else but the GPU (or a single device). I would like
to put this data in the GPU OPP table only. What about putting a
range in the GPU OPP table for the bandwidth, if it can change so much
for the same frequency ?

> > These are the maximum
> > BW available over these paths, right ?
> 
> I wouldn't call them "maximum" because there can't be multiple
> maximums :) But yes, these are the meaningful bandwidth from the GPU's
> perspective to use over these paths.
> 
> >
> > > gpu_opp_table: gpu_opp_table {
> > >       compatible = "operating-points-v2";
> > >       opp-shared;
> > >
> > >       opp-200000000 {
> > >               opp-hz = /bits/ 64 <200000000>;
> > >       };
> > >       opp-400000000 {
> > >               opp-hz = /bits/ 64 <400000000>;
> > >       };
> > > };
> >
> > Shouldn't this link back to the above tables via required-opp, etc ?
> > How will we know how much BW is required by the GPU device for all the
> > paths ?
> 
> If that's what the GPU driver wants to do, then yes. But the GPU
> driver could also choose to scale the bandwidth for these paths based
> on multiple other signals. Eg: bus port busy percentage, measure
> bandwidth, etc.

Let's say that the GPU is running at 2 GHz right now and, based on the
above inputs, it wants to increase the bandwidth to 7500 for the DDR
path. Now does it make sense to run at 4 GHz instead of 2 GHz so we
utilize the bandwidth to the best of our ability and waste less power ?

If something like that is acceptable, then what about keeping the
bandwidth fixed per frequency and instead scaling the frequency of the
GPU based on the inputs you provided (like bus port busy percentage,
etc.) ?

The current proposal makes me wonder why we should try to reuse OPP
tables for providing these bandwidth values, as the OPP tables for
interconnect paths aren't really a lot of data (only bandwidth every
time), and there is no linking from the device's OPP table either.
Saravana Kannan July 19, 2019, 4:12 a.m. UTC | #6
On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> I know you have explained lots of things earlier as well, but they are
> available over multiple threads and I don't know where to reply now :)
>
> Lets have proper discussion (once again) here and be done with it.
> Sorry for the trouble of explaining things again.
>
> On 17-07-19, 13:34, Saravana Kannan wrote:
> > On Wed, Jul 17, 2019 at 3:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > On 02-07-19, 18:10, Saravana Kannan wrote:
> > > > gpu_cache_opp_table: gpu_cache_opp_table {
> > > >       compatible = "operating-points-v2";
> > > >
> > > >       gpu_cache_3000: opp-3000 {
> > > >               opp-peak-KBps = <3000>;
> > > >               opp-avg-KBps = <1000>;
> > > >       };
> > > >       gpu_cache_6000: opp-6000 {
> > > >               opp-peak-KBps = <6000>;
> > > >               opp-avg-KBps = <2000>;
> > > >       };
> > > >       gpu_cache_9000: opp-9000 {
> > > >               opp-peak-KBps = <9000>;
> > > >               opp-avg-KBps = <9000>;
> > > >       };
> > > > };
> > > >
> > > > gpu_ddr_opp_table: gpu_ddr_opp_table {
> > > >       compatible = "operating-points-v2";
> > > >
> > > >       gpu_ddr_1525: opp-1525 {
> > > >               opp-peak-KBps = <1525>;
> > > >               opp-avg-KBps = <452>;
> > > >       };
> > > >       gpu_ddr_3051: opp-3051 {
> > > >               opp-peak-KBps = <3051>;
> > > >               opp-avg-KBps = <915>;
> > > >       };
> > > >       gpu_ddr_7500: opp-7500 {
> > > >               opp-peak-KBps = <7500>;
> > > >               opp-avg-KBps = <3000>;
> > > >       };
> > > > };
> > >
> > > Who is going to use the above tables and how ?
> >
> > In this example the GPU driver would use these. It'll go through these
> > and then decide what peak and average bw to pick based on whatever
> > criteria.
>
> Are you saying that the GPU driver will decide which bandwidth to
> choose while running at a particular frequency (say 2 GHz) ? And that
> it can choose 1525 or 3051 or 7500 from the ddr path ?
>
> Will it be possible to publicly share how we derive to these decisions
> ?

GPU is just an example. So I can't really speak for how a random GPU
driver might decide the bandwidth to pick.

But one obvious way is to start at the lowest bandwidth and check the
bus port busy%. If it's > 80% busy, it'll pick the next bandwidth
level, etc. So, something like what the cpufreq ondemand or
conservative governors used to do.
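
A rough sketch of that kind of heuristic, just to make it concrete.
None of this is from this series: read_port_busy_pct() is a made-up
platform hook, and levels[] is assumed to have been filled at probe
time by walking the path's BW OPP table; only icc_set_bw() is an
existing API.

#include <linux/interconnect.h>
#include <linux/types.h>

#define BUSY_UP_PCT	80
#define BUSY_DOWN_PCT	40

struct bw_level {
	u32 avg_kbps;
	u32 peak_kbps;
};

struct bw_gov {
	struct icc_path *path;
	struct bw_level *levels;	/* rows of the BW OPP table, ascending */
	int num_levels;
	int cur;
};

/* Hypothetical platform-specific counter read, returns busy% (0-100) */
unsigned int read_port_busy_pct(struct icc_path *path);

/*
 * Simple ondemand/conservative-style bandwidth governor: the BW OPP
 * table gives the list of "useful" levels, and the governor just steps
 * up or down through them based on bus port busy%.
 */
static void bw_gov_poll(struct bw_gov *gov)
{
	unsigned int busy = read_port_busy_pct(gov->path);

	if (busy > BUSY_UP_PCT && gov->cur < gov->num_levels - 1)
		gov->cur++;		/* too busy: next useful level */
	else if (busy < BUSY_DOWN_PCT && gov->cur > 0)
		gov->cur--;		/* mostly idle: back off */

	icc_set_bw(gov->path, gov->levels[gov->cur].avg_kbps,
		   gov->levels[gov->cur].peak_kbps);
}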

> The thing is I don't like these separate OPP tables which will not be
> used by anyone else, but just GPU (or a single device).

The BW OPP table isn't always a secondary OPP table. It can be a
primary OPP table too. For example, if you have a bandwidth monitoring
device/HW IP that can measure the bandwidth for a path and make
requests for that path, it'll have a BW OPP table and it'll pick one
of those BW OPP levels based on the hardware measurements. It will
have its own device driver. This is basically no different from a
device being the only user of a freq OPP table.

> I would like
> to put this data in the GPU OPP table only. What about putting a
> range in the GPU OPP table for the Bandwidth if it can change so much
> for the same frequency.

I don't think the range is going to work. If a GPU is doing purely
computational work, it's not unreasonable for it to vote for the
lowest bandwidth for any GPU frequency.

>
> > > These are the maximum
> > > BW available over these paths, right ?
> >
> > I wouldn't call them "maximum" because there can't be multiple
> > maximums :) But yes, these are the meaningful bandwidth from the GPU's
> > perspective to use over these paths.
> >
> > >
> > > > gpu_opp_table: gpu_opp_table {
> > > >       compatible = "operating-points-v2";
> > > >       opp-shared;
> > > >
> > > >       opp-200000000 {
> > > >               opp-hz = /bits/ 64 <200000000>;
> > > >       };
> > > >       opp-400000000 {
> > > >               opp-hz = /bits/ 64 <400000000>;
> > > >       };
> > > > };
> > >
> > > Shouldn't this link back to the above tables via required-opp, etc ?
> > > How will we know how much BW is required by the GPU device for all the
> > > paths ?
> >
> > If that's what the GPU driver wants to do, then yes. But the GPU
> > driver could also choose to scale the bandwidth for these paths based
> > on multiple other signals. Eg: bus port busy percentage, measure
> > bandwidth, etc.
>
> Lets say that the GPU is running at 2 GHz right now and based on above
> inputs it wants to increase the bandwidth to 7500 for ddr path, now
> does it make sense to run at 4 GHz instead of 2 so we utilize the
> bandwidth to the best of our ability and waste less power ?

This is kinda hard to explain, but I'll try.

Firstly, the GPU power increase might be so high that you might not
want to do this anyway.

Also, what you are proposing *might* improve the perf/mW (efficiency)
but it doesn't decrease the actual power consumption. So, this doesn't
really work towards saving power for mobile devices.

Also, if the GPU is generating a lot of traffic to DDR and you
increase the GPU frequency, it's only going to generate even more
traffic. So you'll end up in a positive feedback loop that maxes out
the frequency and bandwidth. Definitely not something you want for a
mobile device.

> If something like that is acceptable, then what about keeping the
> bandwidth fixed for frequencies and rather scale the frequency of the
> GPU on the inputs your provided (like bus port busy percentage, etc).

I don't think it's acceptable.

> The current proposal makes me wonder on why should we try to reuse OPP
> tables for providing these bandwidth values as the OPP tables for
> interconnect paths isn't really a lot of data, only bandwidth all the
> time and there is no linking from the device's OPP table as well.

I think everyone is getting too tied up on mapping device frequency to
bandwidth requests. That's useful for a limited set of cases. But it
doesn't work for a lot of use cases.

Couple of benefits of using BW OPPs instead of listing bandwidth
values as part of frequency OPP tables:
- Works better when the interconnect path has more useful levels than
the device has frequency levels. I think this might even be true on
the SDM845 for GPU and DDR. The link from freq OPP to BW OPP could
list the minimum bandwidth level to use for a particular device freq
and then let the hardware monitoring heuristic take it higher from
there.
- Works even if no freq to bandwidth mapping heuristic is used but the
device needs to skip certain bandwidth levels for the platform's
power/perf reasons.
- More scalable as more properties are added to BW OPP levels. Traffic
priority is one natural extension of the BW OPP "rows". Explicit
latency is another possibility.
- Currently devices that have use case specific bandwidth levels
(that aren't computed at runtime) have no way of capturing those
bandwidth needs in DT. Everyone is inventing their own scheme. Having
a BW OPP table would allow them to capture all the use case specific
bandwidth levels in DT and then pick one using an index/phandle/etc.
We could even allow naming OPP rows and picking them that way. Not
saying this is a main reason for BW OPP tables or that we should do
this, but it is a possibility to consider.

Long story short, BW OPP tables make a lot of sense for anyone who has
actually done bandwidth scaling on a commercial platform.

If people are getting too tied up about the interconnect-opp-table we
can just drop that. I just added it to avoid having any implicit
ordering of tables in the operating-points-v2 property vs the
interconnects property, and to call the mapping out more explicitly.
But it's not a hill worth dying on.

-Saravana
Viresh Kumar July 29, 2019, 9:24 a.m. UTC | #7
On 18-07-19, 21:12, Saravana Kannan wrote:
> On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > I would like
> > to put this data in the GPU OPP table only. What about putting a
> > range in the GPU OPP table for the Bandwidth if it can change so much
> > for the same frequency.
> 
> I don't think the range is going to work.

Any specific reason for that ?

> If a GPU is doing purely
> computational work, it's not unreasonable for it to vote for the
> lowest bandwidth for any GPU frequency.

I think that is fine, but if the GPU is able to find how much
bandwidth it needs why can't it just pass that value without needing
to have another OPP table for the path ?

The interconnect can then take all the requests and have its own OPP
table to get help on deciding what stable bandwidth to use.
Saravana Kannan July 29, 2019, 8:12 p.m. UTC | #8
On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 18-07-19, 21:12, Saravana Kannan wrote:
> > On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > I would like
> > > to put this data in the GPU OPP table only. What about putting a
> > > range in the GPU OPP table for the Bandwidth if it can change so much
> > > for the same frequency.
> >
> > I don't think the range is going to work.
>
> Any specific reason for that ?

The next sentence was literally explaining this :) Fine to debate
that, but ignoring that and asking this question is kinda funny.

> > If a GPU is doing purely
> > computational work, it's not unreasonable for it to vote for the
> > lowest bandwidth for any GPU frequency.
>
> I think that is fine, but if the GPU is able to find how much
> bandwidth it needs why can't it just pass that value without needing
> to have another OPP table for the path ?

You were asking this question in the context of "can the GPU OPP just
list all the range of bandwidth it might use per GPU frequency". My point
is that the range would be useless because it would be the entire
available bandwidth range (because purely compute work might not need
any bandwidth).

Whereas, what the GPU's algorithm actually needs might be the list of
"useful" bandwidth levels to use.

Also, as we add more ICC request properties, this range idea will not scale.

-Saravana
Viresh Kumar July 30, 2019, 3:01 a.m. UTC | #9
On 29-07-19, 13:12, Saravana Kannan wrote:
> On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 18-07-19, 21:12, Saravana Kannan wrote:
> > > On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > > I would like
> > > > to put this data in the GPU OPP table only. What about putting a
> > > > range in the GPU OPP table for the Bandwidth if it can change so much
> > > > for the same frequency.
> > >
> > > I don't think the range is going to work.
> >
> > Any specific reason for that ?
> 
> The next sentence was literally explaining this :) Fine to debate
> that, but ignoring that and asking this question is kinda funny.

Okay, but ...
 
> > > If a GPU is doing purely
> > > computational work, it's not unreasonable for it to vote for the
> > > lowest bandwidth for any GPU frequency.

... it wasn't clear to me even after reading this sentence again now
:)

I understand that you may have to vote for the lowest bandwidth but
that doesn't explain why a range can't work (sorry if it was just me
who didn't understand it :)).

> > I think that is fine, but if the GPU is able to find how much
> > bandwidth it needs why can't it just pass that value without needing
> > to have another OPP table for the path ?
> 
> You were asking this question in the context of "can the GPU OPP just
> list all the range of bandwidth it might use per GPU frequency". My point
> is that the range would be useless because it would the entire
> available bandwidth range (because purely compute work might not need
> any bandwidth).

If it is useless to have the entire range here, then why bother
providing one ? Why can't the GPU request what it needs in exact
terms, based on its calculations ? And then, based on these requests,
let the interconnect find the best/stable values it really wants to
program the path for (and for that the interconnect can use its own
OPP table, which would be fine).

> Whereas, what the GPU's algorithm actually needs might be the list of
> "useful" bandwidth levels to use.

Hmm, I am not sure the GPU's algorithm needs this table, AFAIU, based
on all the conversations we have had until now. It is very capable of
finding how much bandwidth it needs; you just want the GPU driver to
finally align that with a stable bandwidth for the platform later on.
And what I am asking is that the end driver is not required to look
for stable values; it just requests what it wants and lets the
interconnect core/driver decide the stable values.

Very much like the clock framework, most of the user drivers just ask
for a clk value to be programmed and it is the clock driver which
keeps a table of the stable values and then aligns the requested value
to one of those.

> Also, as we add more ICC request properties, this range idea will not scale.

I am not sure about what kind of properties there can be and where
they should land. My feedback is completely based on what you have
presented in this patchset.
Saravana Kannan Aug. 3, 2019, 7:36 a.m. UTC | #10
Resending due to HTML.

On Mon, Jul 29, 2019 at 8:02 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 29-07-19, 13:12, Saravana Kannan wrote:
> > On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 18-07-19, 21:12, Saravana Kannan wrote:
> > > > On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > > > I would like
> > > > > to put this data in the GPU OPP table only. What about putting a
> > > > > range in the GPU OPP table for the Bandwidth if it can change so much
> > > > > for the same frequency.
> > > >
> > > > I don't think the range is going to work.
> > >
> > > Any specific reason for that ?
> >
> > The next sentence was literally explaining this :) Fine to debate
> > that, but ignoring that and asking this question is kinda funny.
>
> Okay, but ...
>
> > > > If a GPU is doing purely
> > > > computational work, it's not unreasonable for it to vote for the
> > > > lowest bandwidth for any GPU frequency.
>
> ... it wasn't clear to me even after reading this sentence again now
> :)
>
> I understand that you may have to vote for the lowest bandwidth but
> that doesn't explain why a range can't work (sorry if it was just me
> who doesn't understood it :)).

Well, doesn't work as in, it doesn't give any additional info. I can
just vote for 0 or UINT_MAX if I want to stay at the lowest or highest
bandwidth. Having the actual values of the lowest or highest point
doesn't help for cases where you need to skip intermediate bandwidth
levels when going from low to high (as the need increases).

>
> > > I think that is fine, but if the GPU is able to find how much
> > > bandwidth it needs why can't it just pass that value without needing
> > > to have another OPP table for the path ?
> >
> > You were asking this question in the context of "can the GPU OPP just
> > list all the range of bandwidth it might use per GPU frequency". My point
> > is that the range would be useless because it would the entire
> > available bandwidth range (because purely compute work might not need
> > any bandwidth).
>
> If it is useless to have entire range here, then why bother providing
> one ? Why can't the GPU request what it needs in exact terms, based on
> its calculations ? And then based on these requests, let the
> interconnect find what's the best/stable values it really wants to
> program the path for (and for that the interconnect can use its own
> OPP table, which would be fine).

Let's say the actual path can support 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 GB/s.

Let's say 2, 3, and 4 need the same voltage level as 5 for this path.
So, for the GPU's needs, using 2, 3 and 4 GB/s might not be good
because the power savings from the frequency difference are not worth
the performance and power impact (if you run the interconnect slower,
the GPU would run faster to achieve the same performance) compared to
running the interconnect at 5 GB/s. Similarly, it might skip 6 GB/s.
So even if the GPU can somehow calculate the exact bandwidth required
(or, say, measure it), it'll need to know to skip 2, 3 and 4 because
they aren't power/perf efficient levels to use.

But all these bandwidth levels might be usable for a smaller HW IP
whose power cost isn't high. So the power savings from running the
interconnect at 3 GB/s might be worth it -- because even if the small
HW IP ran faster to achieve the same performance, the power increase
in the HW IP won't be higher than the power savings from running the
interconnect slower.

> > Whereas, what the GPU's algorithm actually needs might be the list of
> > "useful" bandwidth levels to use.
>
> Hmm, I am not sure GPU's algorithm needs this table AFAIU based on all
> the conversations we had until now. It is very capable of finding how
> much bandwidth it needs,

Not really. If you have a monitor that can actually measure the
bandwidth, yes. Most often that's not the case. If you just have a
monitor that can give you the bus port busy% then it'll have to use
this table to pick the useful ones. As in, in the example above, if
the bus is still too busy at 1 GB/s it would directly ask for 5 GB/s
instead of going through 2, 3 and 4.

> you just want the GPU driver to finally align
> that with a stable bandwidth for the platform later on. And what I am
> asking is that it is not required for the end driver to look for
> stable values, it just requests what it wants and let the interconnect
> core/driver decide the stable values.

I think you've misunderstood my prior statements then.

The interconnect driver would then still have to aggregate the
requests and pick the final frequency for the interconnect. That's
where it comes in -- executing/implementing the requests of all the
clients.

> Very much like the clock framework, most of the user drivers just ask
> for a clk value to be programmed and it is the clock driver which
> keeps a table of the stable values and then aligns the requested value
> to one of those.

Think of this as similar to the clock API and the OPP tables for CPUs.
The clock API could run the CPU at many different frequencies, but the
CPU driver uses the CPU OPP table to pick a certain set of frequencies
that are "useful". If your CPU clock is shared with another block, say
L3, and there's an L3 driver that's requesting a different frequency
range (using clk_set_rate_range(x, UINT_MAX)), that's where the clock
driver aggregates their requests and sets the final clock frequency.

Similarly, the GPU driver wants to pick useful interconnect path
bandwidth levels using BW OPP tables. And the interconnect framework
aggregates the requests across multiple drivers requesting different
bandwidths.

Does that make sense?

-Saravana