diff mbox

drm/tegra: add support for runtime pm

Message ID 1369662568-22390-1-git-send-email-mkulkarni@nvidia.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mayuresh Kulkarni May 27, 2013, 1:49 p.m. UTC
- as of now host1x and gr2d module's clocks are enabled
in their driver's .probe and never disabled
- this commit adds support for runtime pm in host1x and
gr2d drivers
- during boot-up clocks are enabled in .probe and disabled
on its exit
- when new work is submitted, clocks are enabled in submit
and disabled when submit completes
- parent->child relation between host1x and gr2d ensures
that host1x clock is also enabled before enabling gr2d clock
and it is turned off after gr2d clock is turned off

Signed-off-by: Mayuresh Kulkarni <mkulkarni@nvidia.com>
---
 drivers/gpu/host1x/channel.c  |  5 ++++
 drivers/gpu/host1x/dev.c      | 58 ++++++++++++++++++++++++++++++++++++++----
 drivers/gpu/host1x/drm/gr2d.c | 59 ++++++++++++++++++++++++++++++++++++++-----
 drivers/gpu/host1x/intr.c     |  9 +++++--
 4 files changed, 118 insertions(+), 13 deletions(-)

Comments

Terje Bergstrom May 28, 2013, 5:45 a.m. UTC | #1
On 27.05.2013 18:45, Thierry Reding wrote:
> On Mon, May 27, 2013 at 07:19:28PM +0530, Mayuresh Kulkarni wrote:
>> +#ifdef CONFIG_PM_RUNTIME
>> +static int host1x_runtime_suspend(struct device *dev)
>> +{
>> +	struct host1x *host;
>> +
>> +	host = dev_get_drvdata(dev);
>> +	if (IS_ERR_OR_NULL(host))
> 
> I think a simple
> 
> 	if (!host)
> 		return -EINVAL;
> 
> would be enough here. The driver-data of the device should never be an
> ERR_PTR()-encoded value, but either a valid pointer to a host1x object
> or NULL.

True, we should avoid IS_ERR_OR_NULL() like plague. We always know if
the called API returns a NULL on error or an error code. In case of
error code we should just propagate that.

> Same comments apply here. Also I think it might be a good idea to split
> the host1x and gr2d changes into separate patches.

That's a bit tricky, but doable. We just need to enable it for 2D first,
and then host1x to keep bisectability.

>>  static void action_submit_complete(struct host1x_waitlist *waiter)
>>  {
>> +	int completed = waiter->count;
>>  	struct host1x_channel *channel = waiter->data;
>>  
>> +	/* disable clocks for all the submits that got completed in this lot */
>> +	while (completed--)
>> +		pm_runtime_put(channel->dev);
>> +
>>  	host1x_cdma_update(&channel->cdma);
>>  
>> -	/*  Add nr_completed to trace */
>> +	/* Add nr_completed to trace */
>>  	trace_host1x_channel_submit_complete(dev_name(channel->dev),
>>  					     waiter->count, waiter->thresh);
>> -
>>  }
> 
> This feels hackish. But I can't see any better place to do this. Terje,
> Arto: any ideas how we can do this in a cleaner way? If there's nothing
> better then maybe moving the code into a separate function, say
> host1x_waitlist_complete(), might make this less awkward?

Yeah, it's a bit awkward. action_submit_complete() actually does handle
completion of multiple jobs, and we do one pm_runtime_get() per job.

We could do pm_runtime_put() in host1x_cdma_update(). It anyway goes
through each job that is completed, so while freeing the job it could as
well call runtime PM. That way we could even remove the waiter->count
variable altogether as it's not needed anymore.

The not-so-beautiful aspect is that we do pm_runtime_get() in
host1x_channel.c and pm_runtime_put() in host1x_cdma.c. For code
readability it's be great to have them in the same file. I actually get
questions every now and then because in downstream because of doing
these operations in different files.

Terje
Thierry Reding May 28, 2013, 9:10 a.m. UTC | #2
On Tue, May 28, 2013 at 08:45:03AM +0300, Terje Bergström wrote:
> On 27.05.2013 18:45, Thierry Reding wrote:
> > On Mon, May 27, 2013 at 07:19:28PM +0530, Mayuresh Kulkarni wrote:
> >> +#ifdef CONFIG_PM_RUNTIME
> >> +static int host1x_runtime_suspend(struct device *dev)
> >> +{
> >> +	struct host1x *host;
> >> +
> >> +	host = dev_get_drvdata(dev);
> >> +	if (IS_ERR_OR_NULL(host))
> > 
> > I think a simple
> > 
> > 	if (!host)
> > 		return -EINVAL;
> > 
> > would be enough here. The driver-data of the device should never be an
> > ERR_PTR()-encoded value, but either a valid pointer to a host1x object
> > or NULL.
> 
> True, we should avoid IS_ERR_OR_NULL() like plague. We always know if
> the called API returns a NULL on error or an error code. In case of
> error code we should just propagate that.

Yes, that's the case in general. In this specific case the value
obtained by dev_get_drvdata() should either be a valid pointer or NULL,
never an error code. We can easily make sure by only setting the data
(using platform_set_drvdata()) when the pointer is valid.

Thinking about it some more, I don't think we can ever get NULL here. A
device's .runtime_suspend() cannot be called when the device has been
removed, right? That's the only case where the value returned might be
NULL. It would be NULL too if host1x wasn't initialized yet, but that's
already dealt with by the proper ordering in .probe().

> > Same comments apply here. Also I think it might be a good idea to split
> > the host1x and gr2d changes into separate patches.
> 
> That's a bit tricky, but doable. We just need to enable it for 2D first,
> and then host1x to keep bisectability.

Right, there's a dependency. But I'd still prefer to have them separate.
Unless it gets really messy.

> >>  static void action_submit_complete(struct host1x_waitlist *waiter)
> >>  {
> >> +	int completed = waiter->count;
> >>  	struct host1x_channel *channel = waiter->data;
> >>  
> >> +	/* disable clocks for all the submits that got completed in this lot */
> >> +	while (completed--)
> >> +		pm_runtime_put(channel->dev);
> >> +
> >>  	host1x_cdma_update(&channel->cdma);
> >>  
> >> -	/*  Add nr_completed to trace */
> >> +	/* Add nr_completed to trace */
> >>  	trace_host1x_channel_submit_complete(dev_name(channel->dev),
> >>  					     waiter->count, waiter->thresh);
> >> -
> >>  }
> > 
> > This feels hackish. But I can't see any better place to do this. Terje,
> > Arto: any ideas how we can do this in a cleaner way? If there's nothing
> > better then maybe moving the code into a separate function, say
> > host1x_waitlist_complete(), might make this less awkward?
> 
> Yeah, it's a bit awkward. action_submit_complete() actually does handle
> completion of multiple jobs, and we do one pm_runtime_get() per job.
> 
> We could do pm_runtime_put() in host1x_cdma_update(). It anyway goes
> through each job that is completed, so while freeing the job it could as
> well call runtime PM. That way we could even remove the waiter->count
> variable altogether as it's not needed anymore.

That sounds a lot better. We could add a helper (host1x_job_finish()
perhaps) with the following from update_cdma_locked():

	/* Unpin the memory */
	host1x_job_unpin(job);

	/* Pop push buffer slots */
	if (job->num_slots) {
		struct push_buffer *pb = &cdma->push_buffer;
		host1x_pushbuffer_pop(pb, job->num_slots);
		if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE)
			signal = true;
	}

	list_del(&job->list);

And add pm_runtime_put() (as well as potentially other stuff) in there.
That'll prevent update_cdma_unlocked() from growing too much. It isn't
too bad right now, so maybe a helper isn't warranted yet, but I don't
think it'll hurt.

> The not-so-beautiful aspect is that we do pm_runtime_get() in
> host1x_channel.c and pm_runtime_put() in host1x_cdma.c. For code
> readability it's be great to have them in the same file. I actually get
> questions every now and then because in downstream because of doing
> these operations in different files.

With the above helper in place, we could move host1x_job_submit() to
job.c instead and have all the code in one file.

Thierry
Mayuresh Kulkarni June 4, 2013, 8:41 a.m. UTC | #3
On Tuesday 28 May 2013 02:40 PM, Thierry Reding wrote:
> * PGP Signed by an unknown key
>
> On Tue, May 28, 2013 at 08:45:03AM +0300, Terje Bergström wrote:
>> On 27.05.2013 18:45, Thierry Reding wrote:
>>> On Mon, May 27, 2013 at 07:19:28PM +0530, Mayuresh Kulkarni wrote:
>>>> +#ifdef CONFIG_PM_RUNTIME
>>>> +static int host1x_runtime_suspend(struct device *dev)
>>>> +{
>>>> +	struct host1x *host;
>>>> +
>>>> +	host = dev_get_drvdata(dev);
>>>> +	if (IS_ERR_OR_NULL(host))
>>>
>>> I think a simple
>>>
>>> 	if (!host)
>>> 		return -EINVAL;
>>>
>>> would be enough here. The driver-data of the device should never be an
>>> ERR_PTR()-encoded value, but either a valid pointer to a host1x object
>>> or NULL.
>>
>> True, we should avoid IS_ERR_OR_NULL() like plague. We always know if
>> the called API returns a NULL on error or an error code. In case of
>> error code we should just propagate that.
>
> Yes, that's the case in general. In this specific case the value
> obtained by dev_get_drvdata() should either be a valid pointer or NULL,
> never an error code. We can easily make sure by only setting the data
> (using platform_set_drvdata()) when the pointer is valid.
>
> Thinking about it some more, I don't think we can ever get NULL here. A
> device's .runtime_suspend() cannot be called when the device has been
> removed, right? That's the only case where the value returned might be
> NULL. It would be NULL too if host1x wasn't initialized yet, but that's
> already dealt with by the proper ordering in .probe().
>
>>> Same comments apply here. Also I think it might be a good idea to split
>>> the host1x and gr2d changes into separate patches.
>>
>> That's a bit tricky, but doable. We just need to enable it for 2D first,
>> and then host1x to keep bisectability.
>
> Right, there's a dependency. But I'd still prefer to have them separate.
> Unless it gets really messy.
>
>>>>   static void action_submit_complete(struct host1x_waitlist *waiter)
>>>>   {
>>>> +	int completed = waiter->count;
>>>>   	struct host1x_channel *channel = waiter->data;
>>>>
>>>> +	/* disable clocks for all the submits that got completed in this lot */
>>>> +	while (completed--)
>>>> +		pm_runtime_put(channel->dev);
>>>> +
>>>>   	host1x_cdma_update(&channel->cdma);
>>>>
>>>> -	/*  Add nr_completed to trace */
>>>> +	/* Add nr_completed to trace */
>>>>   	trace_host1x_channel_submit_complete(dev_name(channel->dev),
>>>>   					     waiter->count, waiter->thresh);
>>>> -
>>>>   }
>>>
>>> This feels hackish. But I can't see any better place to do this. Terje,
>>> Arto: any ideas how we can do this in a cleaner way? If there's nothing
>>> better then maybe moving the code into a separate function, say
>>> host1x_waitlist_complete(), might make this less awkward?
>>
>> Yeah, it's a bit awkward. action_submit_complete() actually does handle
>> completion of multiple jobs, and we do one pm_runtime_get() per job.
>>
>> We could do pm_runtime_put() in host1x_cdma_update(). It anyway goes
>> through each job that is completed, so while freeing the job it could as
>> well call runtime PM. That way we could even remove the waiter->count
>> variable altogether as it's not needed anymore.
>
> That sounds a lot better. We could add a helper (host1x_job_finish()
> perhaps) with the following from update_cdma_locked():
>
> 	/* Unpin the memory */
> 	host1x_job_unpin(job);
>
> 	/* Pop push buffer slots */
> 	if (job->num_slots) {
> 		struct push_buffer *pb = &cdma->push_buffer;
> 		host1x_pushbuffer_pop(pb, job->num_slots);
> 		if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE)
> 			signal = true;
> 	}
>
> 	list_del(&job->list);
>
> And add pm_runtime_put() (as well as potentially other stuff) in there.
> That'll prevent update_cdma_unlocked() from growing too much. It isn't
> too bad right now, so maybe a helper isn't warranted yet, but I don't
> think it'll hurt.
>
>> The not-so-beautiful aspect is that we do pm_runtime_get() in
>> host1x_channel.c and pm_runtime_put() in host1x_cdma.c. For code
>> readability it's be great to have them in the same file. I actually get
>> questions every now and then because in downstream because of doing
>> these operations in different files.
>
> With the above helper in place, we could move host1x_job_submit() to
> job.c instead and have all the code in one file.
>
> Thierry
>
> * Unknown Key
> * 0x7F3EB3A1
>

In downstream, we have 2 APIs which are wrapper over runtime PM calls. 
We call those from _submit and job complete.

I wonder if we should follow the same here?
Thierry Reding June 10, 2013, 8:43 p.m. UTC | #4
On Tue, Jun 04, 2013 at 02:11:27PM +0530, Mayuresh Kulkarni wrote:
> On Tuesday 28 May 2013 02:40 PM, Thierry Reding wrote:
> >* PGP Signed by an unknown key
> >
> >On Tue, May 28, 2013 at 08:45:03AM +0300, Terje Bergström wrote:
> >>On 27.05.2013 18:45, Thierry Reding wrote:
> >>>On Mon, May 27, 2013 at 07:19:28PM +0530, Mayuresh Kulkarni wrote:
> >>>>+#ifdef CONFIG_PM_RUNTIME
> >>>>+static int host1x_runtime_suspend(struct device *dev)
> >>>>+{
> >>>>+	struct host1x *host;
> >>>>+
> >>>>+	host = dev_get_drvdata(dev);
> >>>>+	if (IS_ERR_OR_NULL(host))
> >>>
> >>>I think a simple
> >>>
> >>>	if (!host)
> >>>		return -EINVAL;
> >>>
> >>>would be enough here. The driver-data of the device should never be an
> >>>ERR_PTR()-encoded value, but either a valid pointer to a host1x object
> >>>or NULL.
> >>
> >>True, we should avoid IS_ERR_OR_NULL() like plague. We always know if
> >>the called API returns a NULL on error or an error code. In case of
> >>error code we should just propagate that.
> >
> >Yes, that's the case in general. In this specific case the value
> >obtained by dev_get_drvdata() should either be a valid pointer or NULL,
> >never an error code. We can easily make sure by only setting the data
> >(using platform_set_drvdata()) when the pointer is valid.
> >
> >Thinking about it some more, I don't think we can ever get NULL here. A
> >device's .runtime_suspend() cannot be called when the device has been
> >removed, right? That's the only case where the value returned might be
> >NULL. It would be NULL too if host1x wasn't initialized yet, but that's
> >already dealt with by the proper ordering in .probe().
> >
> >>>Same comments apply here. Also I think it might be a good idea to split
> >>>the host1x and gr2d changes into separate patches.
> >>
> >>That's a bit tricky, but doable. We just need to enable it for 2D first,
> >>and then host1x to keep bisectability.
> >
> >Right, there's a dependency. But I'd still prefer to have them separate.
> >Unless it gets really messy.
> >
> >>>>  static void action_submit_complete(struct host1x_waitlist *waiter)
> >>>>  {
> >>>>+	int completed = waiter->count;
> >>>>  	struct host1x_channel *channel = waiter->data;
> >>>>
> >>>>+	/* disable clocks for all the submits that got completed in this lot */
> >>>>+	while (completed--)
> >>>>+		pm_runtime_put(channel->dev);
> >>>>+
> >>>>  	host1x_cdma_update(&channel->cdma);
> >>>>
> >>>>-	/*  Add nr_completed to trace */
> >>>>+	/* Add nr_completed to trace */
> >>>>  	trace_host1x_channel_submit_complete(dev_name(channel->dev),
> >>>>  					     waiter->count, waiter->thresh);
> >>>>-
> >>>>  }
> >>>
> >>>This feels hackish. But I can't see any better place to do this. Terje,
> >>>Arto: any ideas how we can do this in a cleaner way? If there's nothing
> >>>better then maybe moving the code into a separate function, say
> >>>host1x_waitlist_complete(), might make this less awkward?
> >>
> >>Yeah, it's a bit awkward. action_submit_complete() actually does handle
> >>completion of multiple jobs, and we do one pm_runtime_get() per job.
> >>
> >>We could do pm_runtime_put() in host1x_cdma_update(). It anyway goes
> >>through each job that is completed, so while freeing the job it could as
> >>well call runtime PM. That way we could even remove the waiter->count
> >>variable altogether as it's not needed anymore.
> >
> >That sounds a lot better. We could add a helper (host1x_job_finish()
> >perhaps) with the following from update_cdma_locked():
> >
> >	/* Unpin the memory */
> >	host1x_job_unpin(job);
> >
> >	/* Pop push buffer slots */
> >	if (job->num_slots) {
> >		struct push_buffer *pb = &cdma->push_buffer;
> >		host1x_pushbuffer_pop(pb, job->num_slots);
> >		if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE)
> >			signal = true;
> >	}
> >
> >	list_del(&job->list);
> >
> >And add pm_runtime_put() (as well as potentially other stuff) in there.
> >That'll prevent update_cdma_unlocked() from growing too much. It isn't
> >too bad right now, so maybe a helper isn't warranted yet, but I don't
> >think it'll hurt.
> >
> >>The not-so-beautiful aspect is that we do pm_runtime_get() in
> >>host1x_channel.c and pm_runtime_put() in host1x_cdma.c. For code
> >>readability it's be great to have them in the same file. I actually get
> >>questions every now and then because in downstream because of doing
> >>these operations in different files.
> >
> >With the above helper in place, we could move host1x_job_submit() to
> >job.c instead and have all the code in one file.
> >
> >Thierry
> >
> >* Unknown Key
> >* 0x7F3EB3A1
> >
> 
> In downstream, we have 2 APIs which are wrapper over runtime PM
> calls. We call those from _submit and job complete.
> 
> I wonder if we should follow the same here?

Can you post the corresponding wrappers to make it easier to discuss
them? If they just wrap runtime PM calls then they don't solve the
locality problems that Terje brought up.

Thierry
Terje Bergstrom June 11, 2013, 6:23 a.m. UTC | #5
On 10.06.2013 23:43, Thierry Reding wrote:
> Can you post the corresponding wrappers to make it easier to discuss
> them? If they just wrap runtime PM calls then they don't solve the
> locality problems that Terje brought up.

I don't think the wrappers are applicable here. They're there in
downstream to fix a problem with runtime PM core: some host1x clients
require host1x to be active when they're processing, and some (dc) need
it only when CPU is programming DC or when a signal towards host1x is
pending. Runtime PM wants to keep parent enabled when any of the clients
are enabled.

We solved this downstream by enabling ignore_children for host1x. As a
side effect, we need to explicitly enable host1x when we're enabling one
of its clients that need host1x, and that's what the wrapper does.

We can fix this particular problem by moving calls to runtime PM to job.c.

Mayuresh, once you've managed to test your patches, please send the v2.

Terje
diff mbox

Patch

diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
index 83ea51b..b71caec 100644
--- a/drivers/gpu/host1x/channel.c
+++ b/drivers/gpu/host1x/channel.c
@@ -18,6 +18,7 @@ 
 
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/pm_runtime.h>
 
 #include "channel.h"
 #include "dev.h"
@@ -41,6 +42,10 @@  int host1x_job_submit(struct host1x_job *job)
 {
 	struct host1x *host = dev_get_drvdata(job->channel->dev->parent);
 
+	/* enable clocks
+	 * they will be disabled in action_submit_complete */
+	pm_runtime_get_sync(job->channel->dev);
+
 	return host1x_hw_channel_submit(host, job);
 }
 
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 28e28a2..93d9d72 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -23,6 +23,7 @@ 
 #include <linux/of_device.h>
 #include <linux/clk.h>
 #include <linux/io.h>
+#include <linux/pm_runtime.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/host1x.h>
@@ -143,11 +144,11 @@  static int host1x_probe(struct platform_device *pdev)
 		return err;
 	}
 
-	err = clk_prepare_enable(host->clk);
-	if (err < 0) {
-		dev_err(&pdev->dev, "failed to enable clock\n");
-		return err;
-	}
+	/* enable runtime pm */
+	pm_runtime_enable(&pdev->dev);
+
+	/* enable clocks */
+	pm_runtime_get_sync(&pdev->dev);
 
 	err = host1x_syncpt_init(host);
 	if (err) {
@@ -165,6 +166,9 @@  static int host1x_probe(struct platform_device *pdev)
 
 	host1x_drm_alloc(pdev);
 
+	/* disable clocks */
+	pm_runtime_put(&pdev->dev);
+
 	return 0;
 
 fail_deinit_syncpt:
@@ -183,6 +187,47 @@  static int __exit host1x_remove(struct platform_device *pdev)
 	return 0;
 }
 
+#ifdef CONFIG_PM_RUNTIME
+static int host1x_runtime_suspend(struct device *dev)
+{
+	struct host1x *host;
+
+	host = dev_get_drvdata(dev);
+	if (IS_ERR_OR_NULL(host))
+		return -EINVAL;
+
+	clk_disable_unprepare(host->clk);
+
+	return 0;
+}
+
+static int host1x_runtime_resume(struct device *dev)
+{
+	int err = 0;
+	struct host1x *host;
+
+	host = dev_get_drvdata(dev);
+	if (IS_ERR_OR_NULL(host))
+		return -EINVAL;
+
+	err = clk_prepare_enable(host->clk);
+	if (err < 0)
+		dev_err(dev, "failed to enable clock\n");
+
+	return err;
+}
+#endif /* CONFIG_PM_RUNTIME */
+
+#ifdef CONFIG_PM
+static const struct dev_pm_ops host1x_pm_ops = {
+#ifdef CONFIG_PM_RUNTIME
+	.runtime_suspend = host1x_runtime_suspend,
+	.runtime_resume = host1x_runtime_resume,
+#endif
+};
+
+#endif /* CONFIG_PM */
+
 static struct platform_driver tegra_host1x_driver = {
 	.probe = host1x_probe,
 	.remove = __exit_p(host1x_remove),
@@ -190,6 +235,9 @@  static struct platform_driver tegra_host1x_driver = {
 		.owner = THIS_MODULE,
 		.name = "tegra-host1x",
 		.of_match_table = host1x_of_match,
+#ifdef CONFIG_PM
+		.pm = &host1x_pm_ops,
+#endif
 	},
 };
 
diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c
index 6a45ae0..172a654 100644
--- a/drivers/gpu/host1x/drm/gr2d.c
+++ b/drivers/gpu/host1x/drm/gr2d.c
@@ -22,6 +22,7 @@ 
 #include <linux/of.h>
 #include <linux/of_device.h>
 #include <linux/clk.h>
+#include <linux/pm_runtime.h>
 
 #include "channel.h"
 #include "drm.h"
@@ -271,11 +272,13 @@  static int gr2d_probe(struct platform_device *pdev)
 		return PTR_ERR(gr2d->clk);
 	}
 
-	err = clk_prepare_enable(gr2d->clk);
-	if (err) {
-		dev_err(dev, "cannot turn on clock\n");
-		return err;
-	}
+	platform_set_drvdata(pdev, gr2d);
+
+	/* enable runtime pm */
+	pm_runtime_enable(&pdev->dev);
+
+	/* enable clocks */
+	pm_runtime_get_sync(&pdev->dev);
 
 	gr2d->channel = host1x_channel_request(dev);
 	if (!gr2d->channel)
@@ -301,7 +304,8 @@  static int gr2d_probe(struct platform_device *pdev)
 
 	gr2d_init_addr_reg_map(dev, gr2d);
 
-	platform_set_drvdata(pdev, gr2d);
+	/* disable clocks */
+	pm_runtime_put(&pdev->dev);
 
 	return 0;
 }
@@ -328,6 +332,46 @@  static int __exit gr2d_remove(struct platform_device *pdev)
 	return 0;
 }
 
+#ifdef CONFIG_PM_RUNTIME
+static int gr2d_runtime_suspend(struct device *dev)
+{
+	struct gr2d *gr2d;
+
+	gr2d = dev_get_drvdata(dev);
+	if (IS_ERR_OR_NULL(gr2d))
+		return -EINVAL;
+
+	clk_disable_unprepare(gr2d->clk);
+
+	return 0;
+}
+
+static int gr2d_runtime_resume(struct device *dev)
+{
+	int err = 0;
+	struct gr2d *gr2d;
+
+	gr2d = dev_get_drvdata(dev);
+	if (IS_ERR_OR_NULL(gr2d))
+		return -EINVAL;
+
+	err = clk_prepare_enable(gr2d->clk);
+	if (err < 0)
+		dev_err(dev, "failed to enable clock\n");
+
+	return err;
+}
+#endif /* CONFIG_PM_RUNTIME */
+
+#ifdef CONFIG_PM
+static const struct dev_pm_ops gr2d_pm_ops = {
+#ifdef CONFIG_PM_RUNTIME
+	.runtime_suspend = gr2d_runtime_suspend,
+	.runtime_resume = gr2d_runtime_resume,
+#endif
+};
+#endif /* CONFIG_PM */
+
 struct platform_driver tegra_gr2d_driver = {
 	.probe = gr2d_probe,
 	.remove = __exit_p(gr2d_remove),
@@ -335,5 +379,8 @@  struct platform_driver tegra_gr2d_driver = {
 		.owner = THIS_MODULE,
 		.name = "gr2d",
 		.of_match_table = gr2d_match,
+#ifdef CONFIG_PM
+		.pm = &gr2d_pm_ops,
+#endif
 	}
 };
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index 2491bf8..c23fb64 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -20,6 +20,7 @@ 
 #include <linux/interrupt.h>
 #include <linux/slab.h>
 #include <linux/irq.h>
+#include <linux/pm_runtime.h>
 
 #include <trace/events/host1x.h>
 #include "channel.h"
@@ -109,14 +110,18 @@  static void reset_threshold_interrupt(struct host1x *host,
 
 static void action_submit_complete(struct host1x_waitlist *waiter)
 {
+	int completed = waiter->count;
 	struct host1x_channel *channel = waiter->data;
 
+	/* disable clocks for all the submits that got completed in this lot */
+	while (completed--)
+		pm_runtime_put(channel->dev);
+
 	host1x_cdma_update(&channel->cdma);
 
-	/*  Add nr_completed to trace */
+	/* Add nr_completed to trace */
 	trace_host1x_channel_submit_complete(dev_name(channel->dev),
 					     waiter->count, waiter->thresh);
-
 }
 
 static void action_wakeup(struct host1x_waitlist *waiter)