Message ID | E1cn8jj-0002mp-SO@rmk-PC.armlinux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Am Sonntag, den 12.03.2017, 19:00 +0000 schrieb Russell King: > Each Vivante GPU contains a clock divider which can divide the GPU clock > by 2^n, which can lower the power dissipation from the GPU. It has been > suggested that the GC600 on Dove is responsible for 20-30% of the power > dissipation from the SoC, so lowering the GPU clock rate provides a way > to throttle the power dissiptation, and reduce the temperature when the > SoC gets hot. > > This patch hooks the Etnaviv driver into the kernel's thermal management > to allow the GPUs to be throttled when necessary, allowing a reduction in > GPU clock rate from /1 to /64 in power of 2 steps. Are those power of 2 steps a hardware limitation, or is it something you implemented this way to get a smaller number of steps, with a more meaningful difference in clock speed? My understanding was that the FSCALE value is just a regular divider with all steps values in the range of 1-64 being usable. Regards, Lucas > > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> > --- > drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 84 ++++++++++++++++++++++++++++------- > drivers/gpu/drm/etnaviv/etnaviv_gpu.h | 2 + > 2 files changed, 71 insertions(+), 15 deletions(-) > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c > index 130d7d517a19..bd95182d0852 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c > @@ -18,6 +18,7 @@ > #include <linux/dma-fence.h> > #include <linux/moduleparam.h> > #include <linux/of_device.h> > +#include <linux/thermal.h> > > #include "etnaviv_cmdbuf.h" > #include "etnaviv_dump.h" > @@ -409,6 +410,17 @@ static void etnaviv_gpu_load_clock(struct etnaviv_gpu *gpu, u32 clock) > gpu_write(gpu, VIVS_HI_CLOCK_CONTROL, clock); > } > > +static void etnaviv_gpu_update_clock(struct etnaviv_gpu *gpu) > +{ > + unsigned int fscale = 1 << (6 - gpu->freq_scale); > + u32 clock; > + > + clock = 
VIVS_HI_CLOCK_CONTROL_DISABLE_DEBUG_REGISTERS | > + VIVS_HI_CLOCK_CONTROL_FSCALE_VAL(fscale); > + > + etnaviv_gpu_load_clock(gpu, clock); > +} > + > static int etnaviv_hw_reset(struct etnaviv_gpu *gpu) > { > u32 control, idle; > @@ -426,11 +438,10 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu) > timeout = jiffies + msecs_to_jiffies(1000); > > while (time_is_after_jiffies(timeout)) { > - control = VIVS_HI_CLOCK_CONTROL_DISABLE_DEBUG_REGISTERS | > - VIVS_HI_CLOCK_CONTROL_FSCALE_VAL(0x40); > - > /* enable clock */ > - etnaviv_gpu_load_clock(gpu, control); > + etnaviv_gpu_update_clock(gpu); > + > + control = gpu_read(gpu, VIVS_HI_CLOCK_CONTROL); > > /* Wait for stable clock. Vivante's code waited for 1ms */ > usleep_range(1000, 10000); > @@ -490,11 +501,7 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu) > } > > /* We rely on the GPU running, so program the clock */ > - control = VIVS_HI_CLOCK_CONTROL_DISABLE_DEBUG_REGISTERS | > - VIVS_HI_CLOCK_CONTROL_FSCALE_VAL(0x40); > - > - /* enable clock */ > - etnaviv_gpu_load_clock(gpu, control); > + etnaviv_gpu_update_clock(gpu); > > return 0; > } > @@ -1526,17 +1533,13 @@ static int etnaviv_gpu_hw_suspend(struct etnaviv_gpu *gpu) > #ifdef CONFIG_PM > static int etnaviv_gpu_hw_resume(struct etnaviv_gpu *gpu) > { > - u32 clock; > int ret; > > ret = mutex_lock_killable(&gpu->lock); > if (ret) > return ret; > > - clock = VIVS_HI_CLOCK_CONTROL_DISABLE_DEBUG_REGISTERS | > - VIVS_HI_CLOCK_CONTROL_FSCALE_VAL(0x40); > - > - etnaviv_gpu_load_clock(gpu, clock); > + etnaviv_gpu_update_clock(gpu); > etnaviv_gpu_hw_init(gpu); > > gpu->switch_context = true; > @@ -1548,6 +1551,47 @@ static int etnaviv_gpu_hw_resume(struct etnaviv_gpu *gpu) > } > #endif > > +static int > +etnaviv_gpu_cooling_get_max_state(struct thermal_cooling_device *cdev, > + unsigned long *state) > +{ > + *state = 6; > + > + return 0; > +} > + > +static int > +etnaviv_gpu_cooling_get_cur_state(struct thermal_cooling_device *cdev, > + unsigned long *state) 
> +{ > + struct etnaviv_gpu *gpu = cdev->devdata; > + > + *state = gpu->freq_scale; > + > + return 0; > +} > + > +static int > +etnaviv_gpu_cooling_set_cur_state(struct thermal_cooling_device *cdev, > + unsigned long state) > +{ > + struct etnaviv_gpu *gpu = cdev->devdata; > + > + mutex_lock(&gpu->lock); > + gpu->freq_scale = state; > + if (!pm_runtime_suspended(gpu->dev)) > + etnaviv_gpu_update_clock(gpu); > + mutex_unlock(&gpu->lock); > + > + return 0; > +} > + > +static struct thermal_cooling_device_ops cooling_ops = { > + .get_max_state = etnaviv_gpu_cooling_get_max_state, > + .get_cur_state = etnaviv_gpu_cooling_get_cur_state, > + .set_cur_state = etnaviv_gpu_cooling_set_cur_state, > +}; > + > static int etnaviv_gpu_bind(struct device *dev, struct device *master, > void *data) > { > @@ -1556,13 +1600,20 @@ static int etnaviv_gpu_bind(struct device *dev, struct device *master, > struct etnaviv_gpu *gpu = dev_get_drvdata(dev); > int ret; > > + gpu->cooling = thermal_of_cooling_device_register(dev->of_node, > + (char *)dev_name(dev), gpu, &cooling_ops); > + if (IS_ERR(gpu->cooling)) > + return PTR_ERR(gpu->cooling); > + > #ifdef CONFIG_PM > ret = pm_runtime_get_sync(gpu->dev); > #else > ret = etnaviv_gpu_clk_enable(gpu); > #endif > - if (ret < 0) > + if (ret < 0) { > + thermal_cooling_device_unregister(gpu->cooling); > return ret; > + } > > gpu->drm = drm; > gpu->fence_context = dma_fence_context_alloc(1); > @@ -1616,6 +1667,9 @@ static void etnaviv_gpu_unbind(struct device *dev, struct device *master, > } > > gpu->drm = NULL; > + > + thermal_cooling_device_unregister(gpu->cooling); > + gpu->cooling = NULL; > } > > static const struct component_ops gpu_ops = { > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h > index 1c0606ea7d5e..6a1e68eec24c 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h > @@ -97,6 +97,7 @@ struct etnaviv_cmdbuf; > > struct etnaviv_gpu { > struct 
drm_device *drm; > + struct thermal_cooling_device *cooling; > struct device *dev; > struct mutex lock; > struct etnaviv_chip_identity identity; > @@ -150,6 +151,7 @@ struct etnaviv_gpu { > u32 hangcheck_fence; > u32 hangcheck_dma_addr; > struct work_struct recover_work; > + unsigned int freq_scale; > }; > > static inline void gpu_write(struct etnaviv_gpu *gpu, u32 reg, u32 data)
On Wed, Mar 15, 2017 at 02:03:09PM +0100, Lucas Stach wrote: > Am Sonntag, den 12.03.2017, 19:00 +0000 schrieb Russell King: > > Each Vivante GPU contains a clock divider which can divide the GPU clock > > by 2^n, which can lower the power dissipation from the GPU. It has been > > suggested that the GC600 on Dove is responsible for 20-30% of the power > > dissipation from the SoC, so lowering the GPU clock rate provides a way > > to throttle the power dissiptation, and reduce the temperature when the > > SoC gets hot. > > > > This patch hooks the Etnaviv driver into the kernel's thermal management > > to allow the GPUs to be throttled when necessary, allowing a reduction in > > GPU clock rate from /1 to /64 in power of 2 steps. > > Are those power of 2 steps a hardware limitation, or is it something you > implemented this way to get a smaller number of steps, with a more > meaningful difference in clock speed? > My understanding was that the FSCALE value is just a regular divider > with all steps values in the range of 1-64 being usable. I don't share your understanding. The Vivante GAL kernel driver only ever sets power-of-two values. I have no evidence to support your suggestion. There's evidence that says your understanding is incorrect however. It isn't a divider. A value of 0x40 gives the fastest clock rate, a value of 0x01 gives the slowest. If it were a binary divider, a value of 0x7f would give the slowest rate - so why doesn't Vivante use that in galcore when putting the GPU into idle/lower power - why do they just 0x01. This all leads me to believe that it's not a binary divider, but a set of bits that select the clock from a set of divide-by-two stages, and having more than one bit set is invalid. However, without definitive information from Vivante, we'll never really know. We're unlikely to get that.
I don't have any input on this binary divider subject but I do want to bring up some observations regarding Etnaviv GPU power management that seems relevant. I've done some comparisons between the Freescale Vivante GPU driver stack (1) and the Marvell PXA1928 Vivante GPU driver stack (2) and see more functionality in the PXA1928 stack than the Freescale i.MX6 stack that may be of value for Etnaviv. When I look at the Marvell PXA1928 Vivante GPU driver stack, (2) I see "gpufreq" code (3) that includes support for conservative, ondemand, performance, powersave, and userspace governors. Additionally, AFAIK the key feature needed to support a gpufreq driver and associated governors is to be able to know what the load is on the GPU. When looking at the PXA1928 driver, it seems that it is looking at some load counters within the GPU that are likely to be common across platforms. (Check "gpufreq_get_gpu_load" (4) in gpufreq.c.) Also, given the wealth of counters present in the 3DGPU and my understanding that there are 3 different controllable GPU frequencies (at least with the i.MX6), it seems that one could dynamically adjust each of these 3 different controllable frequencies independently based on associated load counters. The i.MX6 has 3 different frequencies, IIRC, AXI, 3DGPU core, and 3DGPU shader. I believe there are counters associated with each of these GPU sub-blocks so it seems feasible to adjust each of the 3 buses based on the sub-block load. (I'm no expert by any means with any of this so this may be crazy talk...) If my observations are correct that the gpufreq functionality present in the PXA1928 driver is portable across SoC platforms with the Vivante 3D GPUs, does it make sense to add a gpufreq driver with the Etnaviv driver? What are the benefits and drawbacks of implementing a gpufreq driver with associated governors in comparison to adding this cooling device driver functionality? 
(It seems to me that a gpufreq driver is more proactive and the cooling device is more reactive.) Can and should gpufreq driver functionality (such as that present in the PXA1928 driver) and the proposed cooling device functionality co-exist? (1) - https://github.com/etnaviv/vivante_kernel_drivers/tree/master/imx6_v4_0_0 (2) - https://github.com/etnaviv/vivante_kernel_drivers/tree/master/pxa1928 (3) - https://github.com/etnaviv/vivante_kernel_drivers/tree/master/pxa1928/hal/os/linux/kernel/gpufreq (4) - https://github.com/etnaviv/vivante_kernel_drivers/blob/master/pxa1928/hal/os/linux/kernel/gpufreq/gpufreq.c#L1294 On Wed, Mar 15, 2017 at 7:05 AM, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > On Wed, Mar 15, 2017 at 02:03:09PM +0100, Lucas Stach wrote: >> Am Sonntag, den 12.03.2017, 19:00 +0000 schrieb Russell King: >> > Each Vivante GPU contains a clock divider which can divide the GPU clock >> > by 2^n, which can lower the power dissipation from the GPU. It has been >> > suggested that the GC600 on Dove is responsible for 20-30% of the power >> > dissipation from the SoC, so lowering the GPU clock rate provides a way >> > to throttle the power dissiptation, and reduce the temperature when the >> > SoC gets hot. >> > >> > This patch hooks the Etnaviv driver into the kernel's thermal management >> > to allow the GPUs to be throttled when necessary, allowing a reduction in >> > GPU clock rate from /1 to /64 in power of 2 steps. >> >> Are those power of 2 steps a hardware limitation, or is it something you >> implemented this way to get a smaller number of steps, with a more >> meaningful difference in clock speed? >> My understanding was that the FSCALE value is just a regular divider >> with all steps values in the range of 1-64 being usable. > > I don't share your understanding. The Vivante GAL kernel driver only > ever sets power-of-two values. I have no evidence to support your > suggestion. 
> > There's evidence that says your understanding is incorrect however. > It isn't a divider. A value of 0x40 gives the fastest clock rate, > a value of 0x01 gives the slowest. If it were a binary divider, > a value of 0x7f would give the slowest rate - so why doesn't Vivante > use that in galcore when putting the GPU into idle/lower power - why > do they just 0x01. > > This all leads me to believe that it's not a binary divider, but a > set of bits that select the clock from a set of divide-by-two stages, > and having more than one bit set is invalid. > > However, without definitive information from Vivante, we'll never > really know. We're unlikely to get that. > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. > _______________________________________________ > etnaviv mailing list > etnaviv@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/etnaviv
On Sun, Mar 19, 2017 at 01:03:42PM -0700, Chris Healy wrote: > I don't have any input on this binary divider subject but I do want to > bring up some observations regarding Etnaviv GPU power management that > seems relevant. GPU cooling isn't really to do with GPU power management, it's more to do with helping avoid the SoC overheating, and going into emergency shutdown. So, it's solving a completely different problem. However, thank you for the links, it'll be something to be researched at some point.
Am Sonntag, den 19.03.2017, 13:03 -0700 schrieb Chris Healy: > I don't have any input on this binary divider subject but I do want to > bring up some observations regarding Etnaviv GPU power management that > seems relevant. > > I've done some comparisons between the Freescale Vivante GPU driver > stack (1) and the Marvell PXA1928 Vivante GPU driver stack (2) and see > more functionality in the PXA1928 stack than the Freescale i.MX6 stack > that may be of value for Etnaviv. When I look at the Marvell PXA1928 > Vivante GPU driver stack, (2) I see "gpufreq" code (3) that includes > support for conservative, ondemand, performance, powersave, and > userspace governors. Additionally, AFAIK the key feature needed to > support a gpufreq driver and associated governors is to be able to > know what the load is on the GPU. When looking at the PXA1928 driver, > it seems that it is looking at some load counters within the GPU that > are likely to be common across platforms. (Check > "gpufreq_get_gpu_load" (4) in gpufreq.c.) > > Also, given the wealth of counters present in the 3DGPU and my > understanding that there are 3 different controllable GPU frequencies > (at least with the i.MX6), it seems that one could dynamically adjust > each of these 3 different controllable frequencies independently based > on associated load counters. The i.MX6 has 3 different frequencies, > IIRC, AXI, 3DGPU core, and 3DGPU shader. I believe there are counters > associated with each of these GPU sub-blocks so it seems feasible to > adjust each of the 3 buses based on the sub-block load. (I'm no > expert by any means with any of this so this may be crazy talk...) > > If my observations are correct that the gpufreq functionality present > in the PXA1928 driver is portable across SoC platforms with the > Vivante 3D GPUs, does it make sense to add a gpufreq driver with the > Etnaviv driver? 
> > What are the benefits and drawbacks of implementing a gpufreq driver > with associated governors in comparison to adding this cooling device > driver functionality? (It seems to me that a gpufreq driver is more > proactive and the cooling device is more reactive.) > > Can and should gpufreq driver functionality (such as that present in > the PXA1928 driver) and the proposed cooling device functionality > co-exist? Yes, probably we want to have both at some point. The cooling-device stuff is about throttling the GPU when the SoC reaches critical temperatures. The devfreq governors are about providing exactly the right performance point. Though as I have not yet seen any SoCs where the voltage would be scaled with GPU frequency, downclocking the GPU is of limited value. For most of those SoCs a race-to-idle policy is probably okay, as this allows other components of the system like the DRAM to go into lower power operating modes when the GPU is idle. Regards, Lucas
Am Mittwoch, den 15.03.2017, 14:05 +0000 schrieb Russell King - ARM Linux: > On Wed, Mar 15, 2017 at 02:03:09PM +0100, Lucas Stach wrote: > > Am Sonntag, den 12.03.2017, 19:00 +0000 schrieb Russell King: > > > Each Vivante GPU contains a clock divider which can divide the GPU clock > > > by 2^n, which can lower the power dissipation from the GPU. It has been > > > suggested that the GC600 on Dove is responsible for 20-30% of the power > > > dissipation from the SoC, so lowering the GPU clock rate provides a way > > > to throttle the power dissiptation, and reduce the temperature when the > > > SoC gets hot. > > > > > > This patch hooks the Etnaviv driver into the kernel's thermal management > > > to allow the GPUs to be throttled when necessary, allowing a reduction in > > > GPU clock rate from /1 to /64 in power of 2 steps. > > > > Are those power of 2 steps a hardware limitation, or is it something you > > implemented this way to get a smaller number of steps, with a more > > meaningful difference in clock speed? > > My understanding was that the FSCALE value is just a regular divider > > with all steps values in the range of 1-64 being usable. > > I don't share your understanding. The Vivante GAL kernel driver only > ever sets power-of-two values. I have no evidence to support your > suggestion. > > There's evidence that says your understanding is incorrect however. > It isn't a divider. A value of 0x40 gives the fastest clock rate, > a value of 0x01 gives the slowest. If it were a binary divider, > a value of 0x7f would give the slowest rate - so why doesn't Vivante > use that in galcore when putting the GPU into idle/lower power - why > do they just 0x01. > > This all leads me to believe that it's not a binary divider, but a > set of bits that select the clock from a set of divide-by-two stages, > and having more than one bit set is invalid. > > However, without definitive information from Vivante, we'll never > really know. We're unlikely to get that. 
> Yes, it seems your understanding is correct and my mental model of this FSCALE thing was off. I'll pick up this patch for the next round of etnaviv features. Regards, Lucas
On Mon, Mar 20, 2017 at 10:19:35AM +0100, Lucas Stach wrote: > Yes, probably we want to have both at some point. The cooling-device > stuff is about throttling the GPU when the SoC reaches critical > temperatures. > > The devfreq governors are about providing exactly the right performance > point. > Though as I have not yet seen any SoCs where the voltage would be scaled > with GPU frequency, downclocking the GPU is of limited value. For most > of those SoCs a race-to-idle policy is probably okay, as this allows > other components of the system like the DRAM to go into lower power > operating modes when the GPU is idle. However, since we already have runtime PM support in etnaviv (where we even power the GPU off on some SoCs) the lack of voltage scaling does suggest that GPU frequency scaling would be of limited value in terms of overall power usage. It's also worth bearing in mind that Vivante GPUs internally control clock gates to various modules when they're on.
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c index 130d7d517a19..bd95182d0852 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c @@ -18,6 +18,7 @@ #include <linux/dma-fence.h> #include <linux/moduleparam.h> #include <linux/of_device.h> +#include <linux/thermal.h> #include "etnaviv_cmdbuf.h" #include "etnaviv_dump.h" @@ -409,6 +410,17 @@ static void etnaviv_gpu_load_clock(struct etnaviv_gpu *gpu, u32 clock) gpu_write(gpu, VIVS_HI_CLOCK_CONTROL, clock); } +static void etnaviv_gpu_update_clock(struct etnaviv_gpu *gpu) +{ + unsigned int fscale = 1 << (6 - gpu->freq_scale); + u32 clock; + + clock = VIVS_HI_CLOCK_CONTROL_DISABLE_DEBUG_REGISTERS | + VIVS_HI_CLOCK_CONTROL_FSCALE_VAL(fscale); + + etnaviv_gpu_load_clock(gpu, clock); +} + static int etnaviv_hw_reset(struct etnaviv_gpu *gpu) { u32 control, idle; @@ -426,11 +438,10 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu) timeout = jiffies + msecs_to_jiffies(1000); while (time_is_after_jiffies(timeout)) { - control = VIVS_HI_CLOCK_CONTROL_DISABLE_DEBUG_REGISTERS | - VIVS_HI_CLOCK_CONTROL_FSCALE_VAL(0x40); - /* enable clock */ - etnaviv_gpu_load_clock(gpu, control); + etnaviv_gpu_update_clock(gpu); + + control = gpu_read(gpu, VIVS_HI_CLOCK_CONTROL); /* Wait for stable clock. 
Vivante's code waited for 1ms */ usleep_range(1000, 10000); @@ -490,11 +501,7 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu) } /* We rely on the GPU running, so program the clock */ - control = VIVS_HI_CLOCK_CONTROL_DISABLE_DEBUG_REGISTERS | - VIVS_HI_CLOCK_CONTROL_FSCALE_VAL(0x40); - - /* enable clock */ - etnaviv_gpu_load_clock(gpu, control); + etnaviv_gpu_update_clock(gpu); return 0; } @@ -1526,17 +1533,13 @@ static int etnaviv_gpu_hw_suspend(struct etnaviv_gpu *gpu) #ifdef CONFIG_PM static int etnaviv_gpu_hw_resume(struct etnaviv_gpu *gpu) { - u32 clock; int ret; ret = mutex_lock_killable(&gpu->lock); if (ret) return ret; - clock = VIVS_HI_CLOCK_CONTROL_DISABLE_DEBUG_REGISTERS | - VIVS_HI_CLOCK_CONTROL_FSCALE_VAL(0x40); - - etnaviv_gpu_load_clock(gpu, clock); + etnaviv_gpu_update_clock(gpu); etnaviv_gpu_hw_init(gpu); gpu->switch_context = true; @@ -1548,6 +1551,47 @@ static int etnaviv_gpu_hw_resume(struct etnaviv_gpu *gpu) } #endif +static int +etnaviv_gpu_cooling_get_max_state(struct thermal_cooling_device *cdev, + unsigned long *state) +{ + *state = 6; + + return 0; +} + +static int +etnaviv_gpu_cooling_get_cur_state(struct thermal_cooling_device *cdev, + unsigned long *state) +{ + struct etnaviv_gpu *gpu = cdev->devdata; + + *state = gpu->freq_scale; + + return 0; +} + +static int +etnaviv_gpu_cooling_set_cur_state(struct thermal_cooling_device *cdev, + unsigned long state) +{ + struct etnaviv_gpu *gpu = cdev->devdata; + + mutex_lock(&gpu->lock); + gpu->freq_scale = state; + if (!pm_runtime_suspended(gpu->dev)) + etnaviv_gpu_update_clock(gpu); + mutex_unlock(&gpu->lock); + + return 0; +} + +static struct thermal_cooling_device_ops cooling_ops = { + .get_max_state = etnaviv_gpu_cooling_get_max_state, + .get_cur_state = etnaviv_gpu_cooling_get_cur_state, + .set_cur_state = etnaviv_gpu_cooling_set_cur_state, +}; + static int etnaviv_gpu_bind(struct device *dev, struct device *master, void *data) { @@ -1556,13 +1600,20 @@ static int 
etnaviv_gpu_bind(struct device *dev, struct device *master, struct etnaviv_gpu *gpu = dev_get_drvdata(dev); int ret; + gpu->cooling = thermal_of_cooling_device_register(dev->of_node, + (char *)dev_name(dev), gpu, &cooling_ops); + if (IS_ERR(gpu->cooling)) + return PTR_ERR(gpu->cooling); + #ifdef CONFIG_PM ret = pm_runtime_get_sync(gpu->dev); #else ret = etnaviv_gpu_clk_enable(gpu); #endif - if (ret < 0) + if (ret < 0) { + thermal_cooling_device_unregister(gpu->cooling); return ret; + } gpu->drm = drm; gpu->fence_context = dma_fence_context_alloc(1); @@ -1616,6 +1667,9 @@ static void etnaviv_gpu_unbind(struct device *dev, struct device *master, } gpu->drm = NULL; + + thermal_cooling_device_unregister(gpu->cooling); + gpu->cooling = NULL; } static const struct component_ops gpu_ops = { diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h index 1c0606ea7d5e..6a1e68eec24c 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h @@ -97,6 +97,7 @@ struct etnaviv_cmdbuf; struct etnaviv_gpu { struct drm_device *drm; + struct thermal_cooling_device *cooling; struct device *dev; struct mutex lock; struct etnaviv_chip_identity identity; @@ -150,6 +151,7 @@ struct etnaviv_gpu { u32 hangcheck_fence; u32 hangcheck_dma_addr; struct work_struct recover_work; + unsigned int freq_scale; }; static inline void gpu_write(struct etnaviv_gpu *gpu, u32 reg, u32 data)
Each Vivante GPU contains a clock divider which can divide the GPU clock by 2^n, which can lower the power dissipation from the GPU. It has been suggested that the GC600 on Dove is responsible for 20-30% of the power dissipation from the SoC, so lowering the GPU clock rate provides a way to throttle the power dissipation, and reduce the temperature when the SoC gets hot. This patch hooks the Etnaviv driver into the kernel's thermal management to allow the GPUs to be throttled when necessary, allowing a reduction in GPU clock rate from /1 to /64 in power of 2 steps. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> --- drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 84 ++++++++++++++++++++++++++++------- drivers/gpu/drm/etnaviv/etnaviv_gpu.h | 2 + 2 files changed, 71 insertions(+), 15 deletions(-)