Message ID: 20231011000248.2181018-3-jonathan.cavitt@intel.com (mailing list archive)
State: New, archived
Series: drm/i915: Define and use GuC and CTB TLB invalidation routines
On 10/11/2023 2:02 AM, Jonathan Cavitt wrote:
> Add a helper function to the GuC CT buffer that reports the expected
> time to process all outstanding requests. As of now, there is no
> functionality to check number of requests in the buffer, so the helper
> function just reports 2 seconds, or 1ms per request up to the maximum
> number of requests the CT buffer can store.
>
> Suggested-by: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 27 +++++++++++++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  2 ++
>  2 files changed, 29 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index c33210ead1ef7..03b616ba4ebb7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -103,6 +103,33 @@ enum { CTB_SEND = 0, CTB_RECV = 1 };
>
>  enum { CTB_OWNER_HOST = 0 };
>
> +/*
> + * Some H2G commands involve a synchronous response that the driver needs
> + * to wait for. In such cases, a timeout is required to prevent the driver
> + * from waiting forever in the case of an error (either no error response
> + * is defined in the protocol or something has died and requires a reset).
> + * The specific command may be defined as having a time bound response but
> + * the CT is a queue and that time guarantee only starts from the point
> + * when the command reaches the head of the queue and is processed by GuC.
> + *
> + * Ideally there would be a helper to report the progress of a given
> + * command through the CT. However, that would require a significant
> + * amount of work in the CT layer. In the meantime, provide a reasonable
> + * estimation of the worst case latency it should take for the entire
> + * queue to drain. And therefore, how long a caller should wait before
> + * giving up on their request. The current estimate is based on empirical
> + * measurement of a test that fills the buffer with context creation and
> + * destruction requests as they seem to be the slowest operation.
> + */
> +long intel_guc_ct_max_queue_time_jiffies(void)
> +{
> +	/*
> +	 * A 4KB buffer full of context destroy commands takes a little
> +	 * over a second to process so bump that to 2s to be super safe.
> +	 */
> +	return (CTB_H2G_BUFFER_SIZE * HZ) / SZ_2K;
> +}
> +
>  static void ct_receive_tasklet_func(struct tasklet_struct *t);
>  static void ct_incoming_request_worker_func(struct work_struct *w);
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 58e42901ff498..2c4bb9a941be6 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -104,6 +104,8 @@ struct intel_guc_ct {
>  #endif
>  };
>
> +long intel_guc_ct_max_queue_time_jiffies(void);
> +
>  void intel_guc_ct_init_early(struct intel_guc_ct *ct);
>  int intel_guc_ct_init(struct intel_guc_ct *ct);
>  void intel_guc_ct_fini(struct intel_guc_ct *ct);
On 10/10/2023 17:02, Jonathan Cavitt wrote:
> Add a helper function to the GuC CT buffer that reports the expected
> time to process all outstanding requests. As of now, there is no
> functionality to check number of requests in the buffer, so the helper
> function just reports 2 seconds, or 1ms per request up to the maximum
> number of requests the CT buffer can store.

This comment is inaccurate. The buffer is 4K bytes. If it were only 1ms
per request, then a 2s total would mean 2000 requests in the buffer, or
2 bytes per request. The smallest request possible is 2 words, or 8 bytes
(and that would be a request with no data at all). The average request
size is more likely at least 4 words, which means only about 250 requests
per queue and therefore a maximum of 8ms per request to hit a 2s total.

It would be better to simply say: "As of now, there is no mechanism for
tracking a given request's progress through the queue. Instead, add a
helper that returns an estimated maximum time the queue should take to
drain if completely full." The description in the code itself gives the
full details; there is no need to repeat all that in the commit message.

With that updated:
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

>
> Suggested-by: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 27 +++++++++++++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  2 ++
>  2 files changed, 29 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index c33210ead1ef7..03b616ba4ebb7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -103,6 +103,33 @@ enum { CTB_SEND = 0, CTB_RECV = 1 };
>
>  enum { CTB_OWNER_HOST = 0 };
>
> +/*
> + * Some H2G commands involve a synchronous response that the driver needs
> + * to wait for. In such cases, a timeout is required to prevent the driver
> + * from waiting forever in the case of an error (either no error response
> + * is defined in the protocol or something has died and requires a reset).
> + * The specific command may be defined as having a time bound response but
> + * the CT is a queue and that time guarantee only starts from the point
> + * when the command reaches the head of the queue and is processed by GuC.
> + *
> + * Ideally there would be a helper to report the progress of a given
> + * command through the CT. However, that would require a significant
> + * amount of work in the CT layer. In the meantime, provide a reasonable
> + * estimation of the worst case latency it should take for the entire
> + * queue to drain. And therefore, how long a caller should wait before
> + * giving up on their request. The current estimate is based on empirical
> + * measurement of a test that fills the buffer with context creation and
> + * destruction requests as they seem to be the slowest operation.
> + */
> +long intel_guc_ct_max_queue_time_jiffies(void)
> +{
> +	/*
> +	 * A 4KB buffer full of context destroy commands takes a little
> +	 * over a second to process so bump that to 2s to be super safe.
> +	 */
> +	return (CTB_H2G_BUFFER_SIZE * HZ) / SZ_2K;
> +}
> +
>  static void ct_receive_tasklet_func(struct tasklet_struct *t);
>  static void ct_incoming_request_worker_func(struct work_struct *w);
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 58e42901ff498..2c4bb9a941be6 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -104,6 +104,8 @@ struct intel_guc_ct {
>  #endif
>  };
>
> +long intel_guc_ct_max_queue_time_jiffies(void);
> +
>  void intel_guc_ct_init_early(struct intel_guc_ct *ct);
>  int intel_guc_ct_init(struct intel_guc_ct *ct);
>  void intel_guc_ct_fini(struct intel_guc_ct *ct);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index c33210ead1ef7..03b616ba4ebb7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -103,6 +103,33 @@ enum { CTB_SEND = 0, CTB_RECV = 1 };
 
 enum { CTB_OWNER_HOST = 0 };
 
+/*
+ * Some H2G commands involve a synchronous response that the driver needs
+ * to wait for. In such cases, a timeout is required to prevent the driver
+ * from waiting forever in the case of an error (either no error response
+ * is defined in the protocol or something has died and requires a reset).
+ * The specific command may be defined as having a time bound response but
+ * the CT is a queue and that time guarantee only starts from the point
+ * when the command reaches the head of the queue and is processed by GuC.
+ *
+ * Ideally there would be a helper to report the progress of a given
+ * command through the CT. However, that would require a significant
+ * amount of work in the CT layer. In the meantime, provide a reasonable
+ * estimation of the worst case latency it should take for the entire
+ * queue to drain. And therefore, how long a caller should wait before
+ * giving up on their request. The current estimate is based on empirical
+ * measurement of a test that fills the buffer with context creation and
+ * destruction requests as they seem to be the slowest operation.
+ */
+long intel_guc_ct_max_queue_time_jiffies(void)
+{
+	/*
+	 * A 4KB buffer full of context destroy commands takes a little
+	 * over a second to process so bump that to 2s to be super safe.
+	 */
+	return (CTB_H2G_BUFFER_SIZE * HZ) / SZ_2K;
+}
+
 static void ct_receive_tasklet_func(struct tasklet_struct *t);
 static void ct_incoming_request_worker_func(struct work_struct *w);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 58e42901ff498..2c4bb9a941be6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -104,6 +104,8 @@ struct intel_guc_ct {
 #endif
 };
 
+long intel_guc_ct_max_queue_time_jiffies(void);
+
 void intel_guc_ct_init_early(struct intel_guc_ct *ct);
 int intel_guc_ct_init(struct intel_guc_ct *ct);
 void intel_guc_ct_fini(struct intel_guc_ct *ct);