[1/2] drm/i915/guc: Add GuC Load time to dmesg log.
diff mbox

Message ID 1509495081-30501-1-git-send-email-anusha.srivatsa@intel.com
State New
Headers show

Commit Message

Srivatsa, Anusha Nov. 1, 2017, 12:11 a.m. UTC
Calculate the time that GuC takes to load using
jiffies. This information could be very useful in
determining if GuC is taking unreasonably long time
to load in a certain platforms.

v2: Calculate time before logs are collected.
Move the guc_load_time variable as a part of
intel_uc_fw struct. Store only final result
which is to be exported to debugfs. (Michal)
Add the load time in the print message as well.

v3: Remove debugfs entry. Remove local variable
guc_finish_load. (Daniel, Tvrtko)

v4: Use ktime_get() instead of jiffies. Use DRM_NOTE
if time taken to load is more than the threshold. On
load times within acceptable range, use DRM_DEBUG_DRIVER
(Tvrtko)

v5: Rebased. Do not expose the load time variable in a global
struct (Tvrtko, Joonas)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_fw.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Michal Wajdeczko Nov. 1, 2017, 1:14 p.m. UTC | #1
On Wed, 01 Nov 2017 01:11:20 +0100, Anusha Srivatsa  
<anusha.srivatsa@intel.com> wrote:

> Calculate the time that GuC takes to load using
> jiffies. This information could be very useful in
   ^^^^^^^
This is no longer true.

> determining if GuC is taking unreasonably long time
> to load in a certain platforms.
>
> v2: Calculate time before logs are collected.
> Move the guc_load_time variable as a part of
> intel_uc_fw struct. Store only final result
> which is to be exported to debugfs. (Michal)
> Add the load time in the print message as well.
>
> v3: Remove debugfs entry. Remove local variable
> guc_finish_load. (Daniel, Tvrtko)
>
> v4: Use ktime_get() instead of jiffies. Use DRM_NOTE
> if time taken to load is more than the threshold. On
> load times within acceptable range, use DRM_DEBUG_DRIVER
> (Tvrtko)
>
> v5: Rebased. Do not expose the load time variable in a global
> struct (Tvrtko, Joonas)
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Oscar Mateo <oscar.mateo@intel.com>
> Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_guc_fw.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c  
> b/drivers/gpu/drm/i915/intel_guc_fw.c
> index ef67a36..4ce9a30 100644
> --- a/drivers/gpu/drm/i915/intel_guc_fw.c
> +++ b/drivers/gpu/drm/i915/intel_guc_fw.c
> @@ -133,7 +133,8 @@ static int guc_ucode_xfer_dma(struct  
> drm_i915_private *dev_priv,
>  	unsigned long offset;
>  	struct sg_table *sg = vma->pages;
>  	u32 status, rsa[UOS_RSA_SCRATCH_MAX_COUNT];
> -	int i, ret = 0;
> +	int i, ret = 0, load_time;

Note that ktime_ms_delta() return type is s64 not int.

> +	ktime_t start_load;

s/start_load/now ?

> 	/* where RSA signature starts */
>  	offset = guc_fw->rsa_offset;
> @@ -160,6 +161,7 @@ static int guc_ucode_xfer_dma(struct  
> drm_i915_private *dev_priv,
>  	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
> 	/* Finally start the DMA */
> +	start_load = ktime_get();

Maybe we should either update comment with note about taking start time
or move ktime_get call before that comment to avoid confusion..

>  	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
> 	/*
> @@ -172,13 +174,18 @@ static int guc_ucode_xfer_dma(struct  
> drm_i915_private *dev_priv,
>  	 */
>  	ret = wait_for(guc_ucode_response(dev_priv, &status), 100);
> +	load_time = ktime_ms_delta(ktime_get(), start_load);
> +
>  	DRM_DEBUG_DRIVER("DMA status 0x%x, GuC status 0x%x\n",
>  			I915_READ(DMA_CTRL), status);
> 	if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
>  		DRM_ERROR("GuC firmware signature verification failed\n");
>  		ret = -ENOEXEC;
> -	}
> +	} else if (load_time > 20)
> +		DRM_NOTE("GuC load takes more than acceptable threshold\n");

Please add threshold and actual load time to the message to let user
know that times

> +	else
> +		DRM_DEBUG_DRIVER("GuC loaded in %d ms\n", load_time);

And I'm not sure that we can rely on 'load_time' on timeout in wait_for.

> 	DRM_DEBUG_DRIVER("returning %d\n", ret);
Chris Wilson Nov. 1, 2017, 1:24 p.m. UTC | #2
Quoting Michal Wajdeczko (2017-11-01 13:14:33)
> On Wed, 01 Nov 2017 01:11:20 +0100, Anusha Srivatsa  
> > @@ -172,13 +174,18 @@ static int guc_ucode_xfer_dma(struct  
> > drm_i915_private *dev_priv,
> >        */
> >       ret = wait_for(guc_ucode_response(dev_priv, &status), 100);
> > +     load_time = ktime_ms_delta(ktime_get(), start_load);
> > +
> >       DRM_DEBUG_DRIVER("DMA status 0x%x, GuC status 0x%x\n",
> >                       I915_READ(DMA_CTRL), status);
> >       if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
> >               DRM_ERROR("GuC firmware signature verification failed\n");
> >               ret = -ENOEXEC;
> > -     }
> > +     } else if (load_time > 20)
> > +             DRM_NOTE("GuC load takes more than acceptable threshold\n");
> 
> Please add threshold and actual load time to the message to let user
> know that times

The more important question is why are we telling the user this at all;
a significant but normal condition. What do we expect them to do? It
doesn't impair any functionality of the driver, it just took longer than
you expected -- which may be simply because the system was doing
something else and we slept for longer.
-Chris
David Weinehall Nov. 1, 2017, 1:38 p.m. UTC | #3
On Tue, Oct 31, 2017 at 05:11:20PM -0700, Anusha Srivatsa wrote:
> Calculate the time that GuC takes to load using
> jiffies. This information could be very useful in
> determining if GuC is taking unreasonably long time
> to load in a certain platforms.
> 
> v2: Calculate time before logs are collected.
> Move the guc_load_time variable as a part of
> intel_uc_fw struct. Store only final result
> which is to be exported to debugfs. (Michal)
> Add the load time in the print message as well.
> 
> v3: Remove debugfs entry. Remove local variable
> guc_finish_load. (Daniel, Tvrtko)
> 
> v4: Use ktime_get() instead of jiffies. Use DRM_NOTE
> if time taken to load is more than the threshold. On
> load times within acceptable range, use DRM_DEBUG_DRIVER
> (Tvrtko)
> 
> v5: Rebased. Do not expose the load time variable in a global
> struct (Tvrtko, Joonas)

From my point of view (measuring suspend/resume times) knowing
how much time is spent loading GuC & HuC is really useful,
so it's definitely useful information.


Kind regards, David Weinehall
Chris Wilson Nov. 1, 2017, 1:48 p.m. UTC | #4
Quoting David Weinehall (2017-11-01 13:38:48)
> On Tue, Oct 31, 2017 at 05:11:20PM -0700, Anusha Srivatsa wrote:
> > Calculate the time that GuC takes to load using
> > jiffies. This information could be very useful in
> > determining if GuC is taking unreasonably long time
> > to load in a certain platforms.
> > 
> > v2: Calculate time before logs are collected.
> > Move the guc_load_time variable as a part of
> > intel_uc_fw struct. Store only final result
> > which is to be exported to debugfs. (Michal)
> > Add the load time in the print message as well.
> > 
> > v3: Remove debugfs entry. Remove local variable
> > guc_finish_load. (Daniel, Tvrtko)
> > 
> > v4: Use ktime_get() instead of jiffies. Use DRM_NOTE
> > if time taken to load is more than the threshold. On
> > load times within acceptable range, use DRM_DEBUG_DRIVER
> > (Tvrtko)
> > 
> > v5: Rebased. Do not expose the load time variable in a global
> > struct (Tvrtko, Joonas)
> 
> From my point of view (measuring suspend/resume times) knowing
> how much time is spent loading GuC & HuC is really useful,
> so it's definitely useful information.

This information could be gleaned from ftrace...
-Chris
Srivatsa, Anusha Nov. 2, 2017, 6:04 p.m. UTC | #5
On Wed, Nov 01, 2017 at 02:14:33PM +0100, Michal Wajdeczko wrote:
> On Wed, 01 Nov 2017 01:11:20 +0100, Anusha Srivatsa
> <anusha.srivatsa@intel.com> wrote:
> 
> >Calculate the time that GuC takes to load using
> >jiffies. This information could be very useful in
>   ^^^^^^^
> This is no longer true.

True. Will sending an all new patch with updated 
approach(using ktime instead of jiffies) be good?
Or stick to this with change in commit message?

> >determining if GuC is taking unreasonably long time
> >to load in a certain platforms.
> >
> >v2: Calculate time before logs are collected.
> >Move the guc_load_time variable as a part of
> >intel_uc_fw struct. Store only final result
> >which is to be exported to debugfs. (Michal)
> >Add the load time in the print message as well.
> >
> >v3: Remove debugfs entry. Remove local variable
> >guc_finish_load. (Daniel, Tvrtko)
> >
> >v4: Use ktime_get() instead of jiffies. Use DRM_NOTE
> >if time taken to load is more than the threshold. On
> >load times within acceptable range, use DRM_DEBUG_DRIVER
> >(Tvrtko)
> >
> >v5: Rebased. Do not expose the load time variable in a global
> >struct (Tvrtko, Joonas)
> >
> >Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >Cc: Oscar Mateo <oscar.mateo@intel.com>
> >Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
> >---
> > drivers/gpu/drm/i915/intel_guc_fw.c | 11 +++++++++--
> > 1 file changed, 9 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c
> >b/drivers/gpu/drm/i915/intel_guc_fw.c
> >index ef67a36..4ce9a30 100644
> >--- a/drivers/gpu/drm/i915/intel_guc_fw.c
> >+++ b/drivers/gpu/drm/i915/intel_guc_fw.c
> >@@ -133,7 +133,8 @@ static int guc_ucode_xfer_dma(struct drm_i915_private
> >*dev_priv,
> > 	unsigned long offset;
> > 	struct sg_table *sg = vma->pages;
> > 	u32 status, rsa[UOS_RSA_SCRATCH_MAX_COUNT];
> >-	int i, ret = 0;
> >+	int i, ret = 0, load_time;
> 
> Note that ktime_ms_delta() return type is s64 not int.
> 
> >+	ktime_t start_load;
> 
> s/start_load/now ?

I think start_load is verbose. 
 
> >	/* where RSA signature starts */
> > 	offset = guc_fw->rsa_offset;
> >@@ -160,6 +161,7 @@ static int guc_ucode_xfer_dma(struct drm_i915_private
> >*dev_priv,
> > 	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
> >	/* Finally start the DMA */
> >+	start_load = ktime_get();
> 
> Maybe we should either update comment with note about taking start time
> or move ktime_get call before that comment to avoid confusion..

I prefer the latter.
 
> > 	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
> >	/*
> >@@ -172,13 +174,18 @@ static int guc_ucode_xfer_dma(struct
> >drm_i915_private *dev_priv,
> > 	 */
> > 	ret = wait_for(guc_ucode_response(dev_priv, &status), 100);
> >+	load_time = ktime_ms_delta(ktime_get(), start_load);
> >+
> > 	DRM_DEBUG_DRIVER("DMA status 0x%x, GuC status 0x%x\n",
> > 			I915_READ(DMA_CTRL), status);
> >	if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
> > 		DRM_ERROR("GuC firmware signature verification failed\n");
> > 		ret = -ENOEXEC;
> >-	}
> >+	} else if (load_time > 20)
> >+		DRM_NOTE("GuC load takes more than acceptable threshold\n");
> 
> Please add threshold and actual load time to the message to let user
> know that times
 
> >+	else
> >+		DRM_DEBUG_DRIVER("GuC loaded in %d ms\n", load_time);
> 
> And I'm not sure that we can rely on 'load_time' on timeout in wait_for.

Hmm.... we  are checking the DMA status right after that which means
the firmware load should have happened by then.... thats why I 
am calculating it there....

 
> >	DRM_DEBUG_DRIVER("returning %d\n", ret);
Srivatsa, Anusha Nov. 2, 2017, 8:28 p.m. UTC | #6
On Wed, Nov 01, 2017 at 01:24:15PM +0000, Chris Wilson wrote:
> Quoting Michal Wajdeczko (2017-11-01 13:14:33)
> > On Wed, 01 Nov 2017 01:11:20 +0100, Anusha Srivatsa  
> > > @@ -172,13 +174,18 @@ static int guc_ucode_xfer_dma(struct  
> > > drm_i915_private *dev_priv,
> > >        */
> > >       ret = wait_for(guc_ucode_response(dev_priv, &status), 100);
> > > +     load_time = ktime_ms_delta(ktime_get(), start_load);
> > > +
> > >       DRM_DEBUG_DRIVER("DMA status 0x%x, GuC status 0x%x\n",
> > >                       I915_READ(DMA_CTRL), status);
> > >       if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
> > >               DRM_ERROR("GuC firmware signature verification failed\n");
> > >               ret = -ENOEXEC;
> > > -     }
> > > +     } else if (load_time > 20)
> > > +             DRM_NOTE("GuC load takes more than acceptable threshold\n");
> > 
> > Please add threshold and actual load time to the message to let user
> > know that times
> 
> The more important question is why are we telling the user this at all;
> a significant but normal condition. What do we expect them to do? It
> doesn't impair any functionality of the driver, it just took longer than
> you expected -- which may be simply because the system was doing
> something else and we slept for longer.

Chris, I am inclining to have this info more for us than the user. It is more of
a debug print to give us some data. We can see if firmware takes peculiarly
long time to load. We know its normal to be within 20ms range. So, alert
if it takes longer than that...

> -Chris
Chris Wilson Nov. 2, 2017, 8:33 p.m. UTC | #7
Quoting Anusha Srivatsa (2017-11-02 20:28:10)
> On Wed, Nov 01, 2017 at 01:24:15PM +0000, Chris Wilson wrote:
> > Quoting Michal Wajdeczko (2017-11-01 13:14:33)
> > > On Wed, 01 Nov 2017 01:11:20 +0100, Anusha Srivatsa  
> > > > @@ -172,13 +174,18 @@ static int guc_ucode_xfer_dma(struct  
> > > > drm_i915_private *dev_priv,
> > > >        */
> > > >       ret = wait_for(guc_ucode_response(dev_priv, &status), 100);
> > > > +     load_time = ktime_ms_delta(ktime_get(), start_load);
> > > > +
> > > >       DRM_DEBUG_DRIVER("DMA status 0x%x, GuC status 0x%x\n",
> > > >                       I915_READ(DMA_CTRL), status);
> > > >       if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
> > > >               DRM_ERROR("GuC firmware signature verification failed\n");
> > > >               ret = -ENOEXEC;
> > > > -     }
> > > > +     } else if (load_time > 20)
> > > > +             DRM_NOTE("GuC load takes more than acceptable threshold\n");
> > > 
> > > Please add threshold and actual load time to the message to let user
> > > know that times
> > 
> > The more important question is why are we telling the user this at all;
> > a significant but normal condition. What do we expect them to do? It
> > doesn't impair any functionality of the driver, it just took longer than
> > you expected -- which may be simply because the system was doing
> > something else and we slept for longer.
> 
> Chris, I am inclining to have this info more for us than the user. It is more of
> a debug print to give us some data. We can see if firmware takes peculiarly
> long time to load. We know its normal to be within 20ms range. So, alert
> if it takes longer than that...

Sure, but the impact is that this is a user facing message, even marked
as a significant message. We are quite capable of parsing debug
messages; even capable of setting up ftrace to extract this timing info
without adding the dmesg in the first place...
-Chris

Patch
diff mbox

diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c b/drivers/gpu/drm/i915/intel_guc_fw.c
index ef67a36..4ce9a30 100644
--- a/drivers/gpu/drm/i915/intel_guc_fw.c
+++ b/drivers/gpu/drm/i915/intel_guc_fw.c
@@ -133,7 +133,8 @@  static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv,
 	unsigned long offset;
 	struct sg_table *sg = vma->pages;
 	u32 status, rsa[UOS_RSA_SCRATCH_MAX_COUNT];
-	int i, ret = 0;
+	int i, ret = 0, load_time;
+	ktime_t start_load;
 
 	/* where RSA signature starts */
 	offset = guc_fw->rsa_offset;
@@ -160,6 +161,7 @@  static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv,
 	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
 
 	/* Finally start the DMA */
+	start_load = ktime_get();
 	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
 
 	/*
@@ -172,13 +174,18 @@  static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv,
 	 */
 	ret = wait_for(guc_ucode_response(dev_priv, &status), 100);
 
+	load_time = ktime_ms_delta(ktime_get(), start_load);
+
 	DRM_DEBUG_DRIVER("DMA status 0x%x, GuC status 0x%x\n",
 			I915_READ(DMA_CTRL), status);
 
 	if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
 		DRM_ERROR("GuC firmware signature verification failed\n");
 		ret = -ENOEXEC;
-	}
+	} else if (load_time > 20)
+		DRM_NOTE("GuC load takes more than acceptable threshold\n");
+	else
+		DRM_DEBUG_DRIVER("GuC loaded in %d ms\n", load_time);
 
 	DRM_DEBUG_DRIVER("returning %d\n", ret);