[04/47] drm/i915/guc: Add non blocking CTB send function

Message ID	20210624070516.21893-5-matthew.brost@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=foSz=LS=lists.freedesktop.org=intel-gfx-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BA82F61351 IronPort-SDR: 7L/SqCWZkVMEXxVpv9TfLqkpn4ZuZF3Mss0oZXE9/KV/L60v7P5xPPkdghZTH+6M1h1Yq8GABN UDCvcvthCc8A== IronPort-SDR: W6gpkkGz0TrFZGDPBsL7BWx4rx/jULj8lyv+Xp2agrgfqrdbK2qDYhGqB6YNmfvqj1O0iXc2h7 HmATd0bbDnXQ== From: Matthew Brost <matthew.brost@intel.com> To: <intel-gfx@lists.freedesktop.org>, <dri-devel@lists.freedesktop.org> Date: Thu, 24 Jun 2021 00:04:33 -0700 Message-Id: <20210624070516.21893-5-matthew.brost@intel.com> In-Reply-To: <20210624070516.21893-1-matthew.brost@intel.com> References: <20210624070516.21893-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 04/47] drm/i915/guc: Add non blocking CTB send function Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
Series	GuC submission support \| expand [00/47] GuC submission support [01/47] drm/i915/guc: Relax CTB response timeout [02/47] drm/i915/guc: Improve error message for unsolicited CT response [03/47] drm/i915/guc: Increase size of CTB buffers [04/47] drm/i915/guc: Add non blocking CTB send function [05/47] drm/i915/guc: Add stall timer to non blocking CTB send function [06/47] drm/i915/guc: Optimize CTB writes and reads [07/47] drm/i915/guc: Module load failure test for CT buffer creation [08/47] drm/i915/guc: Add new GuC interface defines and structures [09/47] drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor [10/47] drm/i915/guc: Add lrc descriptor context lookup array [11/47] drm/i915/guc: Implement GuC submission tasklet [12/47] drm/i915/guc: Add bypass tasklet submission path to GuC [13/47] drm/i915/guc: Implement GuC context operations for new inteface [14/47] drm/i915/guc: Insert fence on context when deregistering [15/47] drm/i915/guc: Defer context unpin until scheduling is disabled [16/47] drm/i915/guc: Disable engine barriers with GuC during unpin [17/47] drm/i915/guc: Extend deregistration fence to schedule disable [18/47] drm/i915: Disable preempt busywait when using GuC scheduling [19/47] drm/i915/guc: Ensure request ordering via completion fences [20/47] drm/i915/guc: Disable semaphores when using GuC scheduling [21/47] drm/i915/guc: Ensure G2H response has space in buffer [22/47] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC [23/47] drm/i915/guc: Update GuC debugfs to support new GuC [24/47] drm/i915/guc: Add several request trace points [25/47] drm/i915: Add intel_context tracing [26/47] drm/i915/guc: GuC virtual engines [27/47] drm/i915: Track 'serial' counts for virtual engines [28/47] drm/i915: Hold reference to intel_context over life of i915_request [29/47] drm/i915/guc: Disable bonding extension with GuC submission [30/47] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs [31/47] drm/i915/guc: Reset implementation for new GuC interface [32/47] drm/i915: Reset GPU immediately if submission is disabled [33/47] drm/i915/guc: Add disable interrupts to guc sanitize [34/47] drm/i915/guc: Suspend/resume implementation for new interface [35/47] drm/i915/guc: Handle context reset notification [36/47] drm/i915/guc: Handle engine reset failure notification [37/47] drm/i915/guc: Enable the timer expired interrupt for GuC [38/47] drm/i915/guc: Provide mmio list to be saved/restored on engine reset [39/47] drm/i915/guc: Don't complain about reset races [40/47] drm/i915/guc: Enable GuC engine reset [41/47] drm/i915/guc: Capture error state on context reset [42/47] drm/i915/guc: Fix for error capture after full GPU reset with GuC [43/47] drm/i915/guc: Hook GuC scheduling policies up [44/47] drm/i915/guc: Connect reset modparam updates to GuC policy flags [45/47] drm/i915/guc: Include scheduling policies in the debugfs state dump [46/47] drm/i915/guc: Add golden context to GuC ADS [47/47] drm/i915/guc: Unblock GuC submission on Gen11+

Matthew Brost June 24, 2021, 7:04 a.m. UTC

Add non blocking CTB send function, intel_guc_send_nb. GuC submission
will send CTBs in the critical path and does not need to wait for these
CTBs to complete before moving on, hence the need for this new function.

The non-blocking CTB now must have a flow control mechanism to ensure
the buffer isn't overrun. A lazy spin wait is used as we believe the
flow control condition should be rare with a properly sized buffer.

The function, intel_guc_send_nb, is exported in this patch but unused.
Several patches later in the series make use of this function.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
 3 files changed, 82 insertions(+), 10 deletions(-)

Michal Wajdeczko June 24, 2021, 2:48 p.m. UTC | #1

On 24.06.2021 09:04, Matthew Brost wrote:
> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
> will send CTBs in the critical path and does not need to wait for these
> CTBs to complete before moving on, hence the need for this new function.
> 
> The non-blocking CTB now must have a flow control mechanism to ensure
> the buffer isn't overrun. A lazy spin wait is used as we believe the
> flow control condition should be rare with a properly sized buffer.
> 
> The function, intel_guc_send_nb, is exported in this patch but unused.
> Several patches later in the series make use of this function.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
>  3 files changed, 82 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 4abc59f6f3cd..24b1df6ad4ae 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
>  static
>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
>  {
> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> +}
> +
> +#define INTEL_GUC_SEND_NB		BIT(31)

hmm, this flag really belongs to intel_guc_ct_send() so it should be
defined as CTB flag near that function declaration

> +static
> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> +{
> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> +				 INTEL_GUC_SEND_NB);
>  }
>  
>  static inline int
> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>  			   u32 *response_buf, u32 response_buf_size)
>  {
>  	return intel_guc_ct_send(&guc->ct, action, len,
> -				 response_buf, response_buf_size);
> +				 response_buf, response_buf_size, 0);
>  }
>  
>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index a17215920e58..c9a65d05911f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -3,6 +3,11 @@
>   * Copyright © 2016-2019 Intel Corporation
>   */
>  
> +#include <linux/circ_buf.h>
> +#include <linux/ktime.h>
> +#include <linux/time64.h>
> +#include <linux/timekeeping.h>
> +
>  #include "i915_drv.h"
>  #include "intel_guc_ct.h"
>  #include "gt/intel_gt.h"
> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
>  static int ct_write(struct intel_guc_ct *ct,
>  		    const u32 *action,
>  		    u32 len /* in dwords */,
> -		    u32 fence)
> +		    u32 fence, u32 flags)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
>  
> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));

or as we already switched to accept and return whole HXG messages in
guc_send_mmio() maybe we should do the same for CTB variant too and
instead of using extra flag just let caller to prepare proper HXG header
with HXG_EVENT type and then in CTB code just look at this type to make
decision which code path to use

note that existing callers should not be impacted, as full HXG header
for the REQUEST message looks exactly the same as "action" code alone.

>  
>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>  	return err;
>  }
>  
> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> +{
> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> +	u32 head = READ_ONCE(desc->head);
> +	u32 space;
> +
> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> +
> +	return space >= len_dw;

here you are returning true(1) as has room

> +}
> +
> +static int ct_send_nb(struct intel_guc_ct *ct,
> +		      const u32 *action,
> +		      u32 len,
> +		      u32 flags)
> +{
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> +	unsigned long spin_flags;
> +	u32 fence;
> +	int ret;
> +
> +	spin_lock_irqsave(&ctb->lock, spin_flags);
> +
> +	ret = h2g_has_room(ctb, len + 1);

but here you treat "1" it as en error

and this "1" is GUC_HXG_MSG_MIN_LEN, right ?

> +	if (unlikely(ret))
> +		goto out;
> +
> +	fence = ct_get_next_fence(ct);
> +	ret = ct_write(ct, action, len, fence, flags);
> +	if (unlikely(ret))
> +		goto out;
> +
> +	intel_guc_notify(ct_to_guc(ct));
> +
> +out:
> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> +
> +	return ret;
> +}
> +
>  static int ct_send(struct intel_guc_ct *ct,
>  		   const u32 *action,
>  		   u32 len,
> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  		   u32 response_buf_size,
>  		   u32 *status)
>  {
> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>  	struct ct_request request;
>  	unsigned long flags;
>  	u32 fence;
> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
>  	GEM_BUG_ON(!len);
>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>  	GEM_BUG_ON(!response_buf && response_buf_size);
> +	might_sleep();
>  
> +	/*
> +	 * We use a lazy spin wait loop here as we believe that if the CT
> +	 * buffers are sized correctly the flow control condition should be
> +	 * rare.

shouldn't we at least try to log such cases with RATE_LIMITED to find
out how "rare" it is, or if really unlikely just return -EBUSY as in
case of non-blocking send ?

> +	 */
> +retry:
>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> +		cond_resched();
> +		goto retry;
> +	}
>  
>  	fence = ct_get_next_fence(ct);
>  	request.fence = fence;
> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
>  	list_add_tail(&request.link, &ct->requests.pending);
>  	spin_unlock(&ct->requests.lock);
>  
> -	err = ct_write(ct, action, len, fence);
> +	err = ct_write(ct, action, len, fence, 0);
>  
>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>  
> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
>   * Command Transport (CT) buffer based GuC send function.
>   */
>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> -		      u32 *response_buf, u32 response_buf_size)
> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
>  {
>  	u32 status = ~0; /* undefined */
>  	int ret;
> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>  		return -ENODEV;
>  	}
>  
> +	if (flags & INTEL_GUC_SEND_NB)
> +		return ct_send_nb(ct, action, len, flags);
> +
>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>  	if (unlikely(ret < 0)) {
>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 1ae2dde6db93..eb69263324ba 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
>  	bool broken;
>  };
>  
> -
>  /** Top-level structure for Command Transport related data
>   *
>   * Includes a pair of CT buffers for bi-directional communication and tracking
> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
>  }
>  
>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> -		      u32 *response_buf, u32 response_buf_size);
> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
>  
>  #endif /* _INTEL_GUC_CT_H_ */
>

Matthew Brost June 24, 2021, 3:49 p.m. UTC | #2

On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 09:04, Matthew Brost wrote:
> > Add non blocking CTB send function, intel_guc_send_nb. GuC submission
> > will send CTBs in the critical path and does not need to wait for these
> > CTBs to complete before moving on, hence the need for this new function.
> > 
> > The non-blocking CTB now must have a flow control mechanism to ensure
> > the buffer isn't overrun. A lazy spin wait is used as we believe the
> > flow control condition should be rare with a properly sized buffer.
> > 
> > The function, intel_guc_send_nb, is exported in this patch but unused.
> > Several patches later in the series make use of this function.
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
> >  3 files changed, 82 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 4abc59f6f3cd..24b1df6ad4ae 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> >  static
> >  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
> >  {
> > -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> > +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> > +}
> > +
> > +#define INTEL_GUC_SEND_NB		BIT(31)
> 
> hmm, this flag really belongs to intel_guc_ct_send() so it should be
> defined as CTB flag near that function declaration
> 

I can move this up a few lines.

> > +static
> > +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> > +{
> > +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> > +				 INTEL_GUC_SEND_NB);
> >  }
> >  
> >  static inline int
> > @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >  			   u32 *response_buf, u32 response_buf_size)
> >  {
> >  	return intel_guc_ct_send(&guc->ct, action, len,
> > -				 response_buf, response_buf_size);
> > +				 response_buf, response_buf_size, 0);
> >  }
> >  
> >  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index a17215920e58..c9a65d05911f 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -3,6 +3,11 @@
> >   * Copyright © 2016-2019 Intel Corporation
> >   */
> >  
> > +#include <linux/circ_buf.h>
> > +#include <linux/ktime.h>
> > +#include <linux/time64.h>
> > +#include <linux/timekeeping.h>
> > +
> >  #include "i915_drv.h"
> >  #include "intel_guc_ct.h"
> >  #include "gt/intel_gt.h"
> > @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
> >  static int ct_write(struct intel_guc_ct *ct,
> >  		    const u32 *action,
> >  		    u32 len /* in dwords */,
> > -		    u32 fence)
> > +		    u32 fence, u32 flags)
> >  {
> >  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
> >  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
> >  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
> >  
> > -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> > -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> > -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> > +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> > +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> > +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> > +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> > +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> > +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> > +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
> 
> or as we already switched to accept and return whole HXG messages in
> guc_send_mmio() maybe we should do the same for CTB variant too and
> instead of using extra flag just let caller to prepare proper HXG header
> with HXG_EVENT type and then in CTB code just look at this type to make
> decision which code path to use
>

Not sure I follow. Anyways could this be done in a follow up by you if
want this change.
 
> note that existing callers should not be impacted, as full HXG header
> for the REQUEST message looks exactly the same as "action" code alone.
> 
> >  
> >  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
> >  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> > @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >  	return err;
> >  }
> >  
> > +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> > +{
> > +	struct guc_ct_buffer_desc *desc = ctb->desc;
> > +	u32 head = READ_ONCE(desc->head);
> > +	u32 space;
> > +
> > +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> > +
> > +	return space >= len_dw;
> 
> here you are returning true(1) as has room
>

See below.
 
> > +}
> > +
> > +static int ct_send_nb(struct intel_guc_ct *ct,
> > +		      const u32 *action,
> > +		      u32 len,
> > +		      u32 flags)
> > +{
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> > +	unsigned long spin_flags;
> > +	u32 fence;
> > +	int ret;
> > +
> > +	spin_lock_irqsave(&ctb->lock, spin_flags);
> > +
> > +	ret = h2g_has_room(ctb, len + 1);
> 
> but here you treat "1" it as en error
> 

Yes, this patch is broken but fixed in a follow up one. Regardless I'll
fix this patch in place.

> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
>

Not exactly. This is following how ct_send() uses the action + len
field. Action[0] field goes in the HXG header and extra + 1 is for the
CT header.

> > +	if (unlikely(ret))
> > +		goto out;
> > +
> > +	fence = ct_get_next_fence(ct);
> > +	ret = ct_write(ct, action, len, fence, flags);
> > +	if (unlikely(ret))
> > +		goto out;
> > +
> > +	intel_guc_notify(ct_to_guc(ct));
> > +
> > +out:
> > +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> > +
> > +	return ret;
> > +}
> > +
> >  static int ct_send(struct intel_guc_ct *ct,
> >  		   const u32 *action,
> >  		   u32 len,
> > @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >  		   u32 response_buf_size,
> >  		   u32 *status)
> >  {
> > +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >  	struct ct_request request;
> >  	unsigned long flags;
> >  	u32 fence;
> > @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >  	GEM_BUG_ON(!len);
> >  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >  	GEM_BUG_ON(!response_buf && response_buf_size);
> > +	might_sleep();
> >  
> > +	/*
> > +	 * We use a lazy spin wait loop here as we believe that if the CT
> > +	 * buffers are sized correctly the flow control condition should be
> > +	 * rare.
> 
> shouldn't we at least try to log such cases with RATE_LIMITED to find
> out how "rare" it is, or if really unlikely just return -EBUSY as in
> case of non-blocking send ?
>

Definitely not return -EBUSY as this a blocking call. Perhaps we can log
this, but IGTs likely can hit rather easily. It really is only
interesting if real workloads hit this. Regardless that can be a follow
up.

Matt
 
> > +	 */
> > +retry:
> >  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> > +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> > +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> > +		cond_resched();
> > +		goto retry;
> > +	}
> >  
> >  	fence = ct_get_next_fence(ct);
> >  	request.fence = fence;
> > @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >  	list_add_tail(&request.link, &ct->requests.pending);
> >  	spin_unlock(&ct->requests.lock);
> >  
> > -	err = ct_write(ct, action, len, fence);
> > +	err = ct_write(ct, action, len, fence, 0);
> >  
> >  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >  
> > @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >   * Command Transport (CT) buffer based GuC send function.
> >   */
> >  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > -		      u32 *response_buf, u32 response_buf_size)
> > +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> >  {
> >  	u32 status = ~0; /* undefined */
> >  	int ret;
> > @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >  		return -ENODEV;
> >  	}
> >  
> > +	if (flags & INTEL_GUC_SEND_NB)
> > +		return ct_send_nb(ct, action, len, flags);
> > +
> >  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >  	if (unlikely(ret < 0)) {
> >  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index 1ae2dde6db93..eb69263324ba 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
> >  	bool broken;
> >  };
> >  
> > -
> >  /** Top-level structure for Command Transport related data
> >   *
> >   * Includes a pair of CT buffers for bi-directional communication and tracking
> > @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> >  }
> >  
> >  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> > -		      u32 *response_buf, u32 response_buf_size);
> > +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> >  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> >  
> >  #endif /* _INTEL_GUC_CT_H_ */
> >

Michal Wajdeczko June 24, 2021, 5:02 p.m. UTC | #3

On 24.06.2021 17:49, Matthew Brost wrote:
> On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
>>
>>
>> On 24.06.2021 09:04, Matthew Brost wrote:
>>> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
>>> will send CTBs in the critical path and does not need to wait for these
>>> CTBs to complete before moving on, hence the need for this new function.
>>>
>>> The non-blocking CTB now must have a flow control mechanism to ensure
>>> the buffer isn't overrun. A lazy spin wait is used as we believe the
>>> flow control condition should be rare with a properly sized buffer.
>>>
>>> The function, intel_guc_send_nb, is exported in this patch but unused.
>>> Several patches later in the series make use of this function.
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
>>>  3 files changed, 82 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index 4abc59f6f3cd..24b1df6ad4ae 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
>>>  static
>>>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
>>>  {
>>> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
>>> +}
>>> +
>>> +#define INTEL_GUC_SEND_NB		BIT(31)
>>
>> hmm, this flag really belongs to intel_guc_ct_send() so it should be
>> defined as CTB flag near that function declaration
>>
> 
> I can move this up a few lines.
> 
>>> +static
>>> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
>>> +{
>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
>>> +				 INTEL_GUC_SEND_NB);
>>>  }
>>>  
>>>  static inline int
>>> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>>>  			   u32 *response_buf, u32 response_buf_size)
>>>  {
>>>  	return intel_guc_ct_send(&guc->ct, action, len,
>>> -				 response_buf, response_buf_size);
>>> +				 response_buf, response_buf_size, 0);
>>>  }
>>>  
>>>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index a17215920e58..c9a65d05911f 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -3,6 +3,11 @@
>>>   * Copyright © 2016-2019 Intel Corporation
>>>   */
>>>  
>>> +#include <linux/circ_buf.h>
>>> +#include <linux/ktime.h>
>>> +#include <linux/time64.h>
>>> +#include <linux/timekeeping.h>
>>> +
>>>  #include "i915_drv.h"
>>>  #include "intel_guc_ct.h"
>>>  #include "gt/intel_gt.h"
>>> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
>>>  static int ct_write(struct intel_guc_ct *ct,
>>>  		    const u32 *action,
>>>  		    u32 len /* in dwords */,
>>> -		    u32 fence)
>>> +		    u32 fence, u32 flags)
>>>  {
>>>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
>>>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
>>>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
>>>  
>>> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
>>> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
>>> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
>>> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
>>> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
>>> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
>>> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
>>> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
>>
>> or as we already switched to accept and return whole HXG messages in
>> guc_send_mmio() maybe we should do the same for CTB variant too and
>> instead of using extra flag just let caller to prepare proper HXG header
>> with HXG_EVENT type and then in CTB code just look at this type to make
>> decision which code path to use
>>
> 
> Not sure I follow. Anyways could this be done in a follow up by you if
> want this change.
>  
>> note that existing callers should not be impacted, as full HXG header
>> for the REQUEST message looks exactly the same as "action" code alone.
>>
>>>  
>>>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
>>>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
>>> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>>>  	return err;
>>>  }
>>>  
>>> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>>> +{
>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> +	u32 head = READ_ONCE(desc->head);
>>> +	u32 space;
>>> +
>>> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
>>> +
>>> +	return space >= len_dw;
>>
>> here you are returning true(1) as has room
>>
> 
> See below.
>  
>>> +}
>>> +
>>> +static int ct_send_nb(struct intel_guc_ct *ct,
>>> +		      const u32 *action,
>>> +		      u32 len,
>>> +		      u32 flags)
>>> +{
>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>> +	unsigned long spin_flags;
>>> +	u32 fence;
>>> +	int ret;
>>> +
>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
>>> +
>>> +	ret = h2g_has_room(ctb, len + 1);
>>
>> but here you treat "1" it as en error
>>
> 
> Yes, this patch is broken but fixed in a follow up one. Regardless I'll
> fix this patch in place.
> 
>> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
>>
> 
> Not exactly. This is following how ct_send() uses the action + len
> field. Action[0] field goes in the HXG header and extra + 1 is for the
> CT header.

well, "len" already counts "action" so by treating input as full HXG
message (including HXG header) will make it cleaner

we can try do it later but by doing it right now we would avoid
introducing this send_nb() function and deprecating them long term again

> 
>>> +	if (unlikely(ret))
>>> +		goto out;
>>> +
>>> +	fence = ct_get_next_fence(ct);
>>> +	ret = ct_write(ct, action, len, fence, flags);
>>> +	if (unlikely(ret))
>>> +		goto out;
>>> +
>>> +	intel_guc_notify(ct_to_guc(ct));
>>> +
>>> +out:
>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>>  static int ct_send(struct intel_guc_ct *ct,
>>>  		   const u32 *action,
>>>  		   u32 len,
>>> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>  		   u32 response_buf_size,
>>>  		   u32 *status)
>>>  {
>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>  	struct ct_request request;
>>>  	unsigned long flags;
>>>  	u32 fence;
>>> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>  	GEM_BUG_ON(!len);
>>>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>  	GEM_BUG_ON(!response_buf && response_buf_size);
>>> +	might_sleep();
>>>  
>>> +	/*
>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>> +	 * buffers are sized correctly the flow control condition should be
>>> +	 * rare.
>>
>> shouldn't we at least try to log such cases with RATE_LIMITED to find
>> out how "rare" it is, or if really unlikely just return -EBUSY as in
>> case of non-blocking send ?
>>
> 
> Definitely not return -EBUSY as this a blocking call. Perhaps we can log

blocking calls still can fail for various reasons, full CTB is one of
them, and if we return error (now broken) for non-blocking variant then
we should do the same for blocking variant as well and let the caller
decide about next steps

> this, but IGTs likely can hit rather easily. It really is only
> interesting if real workloads hit this. Regardless that can be a follow
> up.

if we hide retry in a silent loop then we will not find it out if we hit
this condition (IGT or real WL) or not

> 
> Matt
>  
>>> +	 */
>>> +retry:
>>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>> +		cond_resched();
>>> +		goto retry;
>>> +	}
>>>  
>>>  	fence = ct_get_next_fence(ct);
>>>  	request.fence = fence;
>>> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>  	list_add_tail(&request.link, &ct->requests.pending);
>>>  	spin_unlock(&ct->requests.lock);
>>>  
>>> -	err = ct_write(ct, action, len, fence);
>>> +	err = ct_write(ct, action, len, fence, 0);
>>>  
>>>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>  
>>> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>   * Command Transport (CT) buffer based GuC send function.
>>>   */
>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>> -		      u32 *response_buf, u32 response_buf_size)
>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
>>>  {
>>>  	u32 status = ~0; /* undefined */
>>>  	int ret;
>>> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>  		return -ENODEV;
>>>  	}
>>>  
>>> +	if (flags & INTEL_GUC_SEND_NB)
>>> +		return ct_send_nb(ct, action, len, flags);
>>> +
>>>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>>>  	if (unlikely(ret < 0)) {
>>>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index 1ae2dde6db93..eb69263324ba 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
>>>  	bool broken;
>>>  };
>>>  
>>> -
>>>  /** Top-level structure for Command Transport related data
>>>   *
>>>   * Includes a pair of CT buffers for bi-directional communication and tracking
>>> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
>>>  }
>>>  
>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>> -		      u32 *response_buf, u32 response_buf_size);
>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
>>>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
>>>  
>>>  #endif /* _INTEL_GUC_CT_H_ */
>>>

Matthew Brost June 24, 2021, 10:41 p.m. UTC | #4

On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 17:49, Matthew Brost wrote:
> > On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
> >>
> >>
> >> On 24.06.2021 09:04, Matthew Brost wrote:
> >>> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
> >>> will send CTBs in the critical path and does not need to wait for these
> >>> CTBs to complete before moving on, hence the need for this new function.
> >>>
> >>> The non-blocking CTB now must have a flow control mechanism to ensure
> >>> the buffer isn't overrun. A lazy spin wait is used as we believe the
> >>> flow control condition should be rare with a properly sized buffer.
> >>>
> >>> The function, intel_guc_send_nb, is exported in this patch but unused.
> >>> Several patches later in the series make use of this function.
> >>>
> >>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>> ---
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
> >>>  3 files changed, 82 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> index 4abc59f6f3cd..24b1df6ad4ae 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> >>>  static
> >>>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
> >>>  {
> >>> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> >>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> >>> +}
> >>> +
> >>> +#define INTEL_GUC_SEND_NB		BIT(31)
> >>
> >> hmm, this flag really belongs to intel_guc_ct_send() so it should be
> >> defined as CTB flag near that function declaration
> >>
> > 
> > I can move this up a few lines.
> > 
> >>> +static
> >>> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> >>> +{
> >>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> >>> +				 INTEL_GUC_SEND_NB);
> >>>  }
> >>>  
> >>>  static inline int
> >>> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >>>  			   u32 *response_buf, u32 response_buf_size)
> >>>  {
> >>>  	return intel_guc_ct_send(&guc->ct, action, len,
> >>> -				 response_buf, response_buf_size);
> >>> +				 response_buf, response_buf_size, 0);
> >>>  }
> >>>  
> >>>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> index a17215920e58..c9a65d05911f 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> @@ -3,6 +3,11 @@
> >>>   * Copyright © 2016-2019 Intel Corporation
> >>>   */
> >>>  
> >>> +#include <linux/circ_buf.h>
> >>> +#include <linux/ktime.h>
> >>> +#include <linux/time64.h>
> >>> +#include <linux/timekeeping.h>
> >>> +
> >>>  #include "i915_drv.h"
> >>>  #include "intel_guc_ct.h"
> >>>  #include "gt/intel_gt.h"
> >>> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
> >>>  static int ct_write(struct intel_guc_ct *ct,
> >>>  		    const u32 *action,
> >>>  		    u32 len /* in dwords */,
> >>> -		    u32 fence)
> >>> +		    u32 fence, u32 flags)
> >>>  {
> >>>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
> >>>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
> >>>  
> >>> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> >>> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> >>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> >>> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> >>> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> >>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
> >>
> >> or as we already switched to accept and return whole HXG messages in
> >> guc_send_mmio() maybe we should do the same for CTB variant too and
> >> instead of using extra flag just let caller to prepare proper HXG header
> >> with HXG_EVENT type and then in CTB code just look at this type to make
> >> decision which code path to use
> >>
> > 
> > Not sure I follow. Anyways could this be done in a follow up by you if
> > want this change.
> >  
> >> note that existing callers should not be impacted, as full HXG header
> >> for the REQUEST message looks exactly the same as "action" code alone.
> >>
> >>>  
> >>>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
> >>>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> >>> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >>>  	return err;
> >>>  }
> >>>  
> >>> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >>> +{
> >>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>> +	u32 head = READ_ONCE(desc->head);
> >>> +	u32 space;
> >>> +
> >>> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> >>> +
> >>> +	return space >= len_dw;
> >>
> >> here you are returning true(1) as has room
> >>
> > 
> > See below.
> >  
> >>> +}
> >>> +
> >>> +static int ct_send_nb(struct intel_guc_ct *ct,
> >>> +		      const u32 *action,
> >>> +		      u32 len,
> >>> +		      u32 flags)
> >>> +{
> >>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>> +	unsigned long spin_flags;
> >>> +	u32 fence;
> >>> +	int ret;
> >>> +
> >>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
> >>> +
> >>> +	ret = h2g_has_room(ctb, len + 1);
> >>
> >> but here you treat "1" it as en error
> >>
> > 
> > Yes, this patch is broken but fixed in a follow up one. Regardless I'll
> > fix this patch in place.
> > 
> >> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
> >>
> > 
> > Not exactly. This is following how ct_send() uses the action + len
> > field. Action[0] field goes in the HXG header and extra + 1 is for the
> > CT header.
> 
> well, "len" already counts "action" so by treating input as full HXG
> message (including HXG header) will make it cleaner
> 

Yes, I know. See above. To me GUC_HXG_MSG_MIN_LEN makes zero sense and
it is worse than adding + 1. This + 1 accounts for the CT header not the
HXG header. If any we add a new define, GUC_CT_HDR_LEN, and add that.

Matt

> we can try do it later but by doing it right now we would avoid
> introducing this send_nb() function and deprecating them long term again
> 
> > 
> >>> +	if (unlikely(ret))
> >>> +		goto out;
> >>> +
> >>> +	fence = ct_get_next_fence(ct);
> >>> +	ret = ct_write(ct, action, len, fence, flags);
> >>> +	if (unlikely(ret))
> >>> +		goto out;
> >>> +
> >>> +	intel_guc_notify(ct_to_guc(ct));
> >>> +
> >>> +out:
> >>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>>  static int ct_send(struct intel_guc_ct *ct,
> >>>  		   const u32 *action,
> >>>  		   u32 len,
> >>> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  		   u32 response_buf_size,
> >>>  		   u32 *status)
> >>>  {
> >>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>  	struct ct_request request;
> >>>  	unsigned long flags;
> >>>  	u32 fence;
> >>> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  	GEM_BUG_ON(!len);
> >>>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>  	GEM_BUG_ON(!response_buf && response_buf_size);
> >>> +	might_sleep();
> >>>  
> >>> +	/*
> >>> +	 * We use a lazy spin wait loop here as we believe that if the CT
> >>> +	 * buffers are sized correctly the flow control condition should be
> >>> +	 * rare.
> >>
> >> shouldn't we at least try to log such cases with RATE_LIMITED to find
> >> out how "rare" it is, or if really unlikely just return -EBUSY as in
> >> case of non-blocking send ?
> >>
> > 
> > Definitely not return -EBUSY as this a blocking call. Perhaps we can log
> 
> blocking calls still can fail for various reasons, full CTB is one of
> them, and if we return error (now broken) for non-blocking variant then
> we should do the same for blocking variant as well and let the caller
> decide about next steps
> 
> > this, but IGTs likely can hit rather easily. It really is only
> > interesting if real workloads hit this. Regardless that can be a follow
> > up.
> 
> if we hide retry in a silent loop then we will not find it out if we hit
> this condition (IGT or real WL) or not
> 
> > 
> > Matt
> >  
> >>> +	 */
> >>> +retry:
> >>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> >>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>> +		cond_resched();
> >>> +		goto retry;
> >>> +	}
> >>>  
> >>>  	fence = ct_get_next_fence(ct);
> >>>  	request.fence = fence;
> >>> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  	list_add_tail(&request.link, &ct->requests.pending);
> >>>  	spin_unlock(&ct->requests.lock);
> >>>  
> >>> -	err = ct_write(ct, action, len, fence);
> >>> +	err = ct_write(ct, action, len, fence, 0);
> >>>  
> >>>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>  
> >>> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>   * Command Transport (CT) buffer based GuC send function.
> >>>   */
> >>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>> -		      u32 *response_buf, u32 response_buf_size)
> >>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> >>>  {
> >>>  	u32 status = ~0; /* undefined */
> >>>  	int ret;
> >>> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>>  		return -ENODEV;
> >>>  	}
> >>>  
> >>> +	if (flags & INTEL_GUC_SEND_NB)
> >>> +		return ct_send_nb(ct, action, len, flags);
> >>> +
> >>>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >>>  	if (unlikely(ret < 0)) {
> >>>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> index 1ae2dde6db93..eb69263324ba 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
> >>>  	bool broken;
> >>>  };
> >>>  
> >>> -
> >>>  /** Top-level structure for Command Transport related data
> >>>   *
> >>>   * Includes a pair of CT buffers for bi-directional communication and tracking
> >>> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> >>>  }
> >>>  
> >>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>> -		      u32 *response_buf, u32 response_buf_size);
> >>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> >>>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> >>>  
> >>>  #endif /* _INTEL_GUC_CT_H_ */
> >>>

Matthew Brost June 24, 2021, 10:47 p.m. UTC | #5

On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 24.06.2021 17:49, Matthew Brost wrote:
> > On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
> >>
> >>
> >> On 24.06.2021 09:04, Matthew Brost wrote:
> >>> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
> >>> will send CTBs in the critical path and does not need to wait for these
> >>> CTBs to complete before moving on, hence the need for this new function.
> >>>
> >>> The non-blocking CTB now must have a flow control mechanism to ensure
> >>> the buffer isn't overrun. A lazy spin wait is used as we believe the
> >>> flow control condition should be rare with a properly sized buffer.
> >>>
> >>> The function, intel_guc_send_nb, is exported in this patch but unused.
> >>> Several patches later in the series make use of this function.
> >>>
> >>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>> ---
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
> >>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
> >>>  3 files changed, 82 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> index 4abc59f6f3cd..24b1df6ad4ae 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> >>>  static
> >>>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
> >>>  {
> >>> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> >>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> >>> +}
> >>> +
> >>> +#define INTEL_GUC_SEND_NB		BIT(31)
> >>
> >> hmm, this flag really belongs to intel_guc_ct_send() so it should be
> >> defined as CTB flag near that function declaration
> >>
> > 
> > I can move this up a few lines.
> > 
> >>> +static
> >>> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> >>> +{
> >>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> >>> +				 INTEL_GUC_SEND_NB);
> >>>  }
> >>>  
> >>>  static inline int
> >>> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >>>  			   u32 *response_buf, u32 response_buf_size)
> >>>  {
> >>>  	return intel_guc_ct_send(&guc->ct, action, len,
> >>> -				 response_buf, response_buf_size);
> >>> +				 response_buf, response_buf_size, 0);
> >>>  }
> >>>  
> >>>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> index a17215920e58..c9a65d05911f 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>> @@ -3,6 +3,11 @@
> >>>   * Copyright © 2016-2019 Intel Corporation
> >>>   */
> >>>  
> >>> +#include <linux/circ_buf.h>
> >>> +#include <linux/ktime.h>
> >>> +#include <linux/time64.h>
> >>> +#include <linux/timekeeping.h>
> >>> +
> >>>  #include "i915_drv.h"
> >>>  #include "intel_guc_ct.h"
> >>>  #include "gt/intel_gt.h"
> >>> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
> >>>  static int ct_write(struct intel_guc_ct *ct,
> >>>  		    const u32 *action,
> >>>  		    u32 len /* in dwords */,
> >>> -		    u32 fence)
> >>> +		    u32 fence, u32 flags)
> >>>  {
> >>>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
> >>>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
> >>>  
> >>> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> >>> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> >>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> >>> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> >>> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> >>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
> >>
> >> or as we already switched to accept and return whole HXG messages in
> >> guc_send_mmio() maybe we should do the same for CTB variant too and
> >> instead of using extra flag just let caller to prepare proper HXG header
> >> with HXG_EVENT type and then in CTB code just look at this type to make
> >> decision which code path to use
> >>
> > 
> > Not sure I follow. Anyways could this be done in a follow up by you if
> > want this change.
> >  
> >> note that existing callers should not be impacted, as full HXG header
> >> for the REQUEST message looks exactly the same as "action" code alone.
> >>
> >>>  
> >>>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
> >>>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> >>> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >>>  	return err;
> >>>  }
> >>>  
> >>> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >>> +{
> >>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>> +	u32 head = READ_ONCE(desc->head);
> >>> +	u32 space;
> >>> +
> >>> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> >>> +
> >>> +	return space >= len_dw;
> >>
> >> here you are returning true(1) as has room
> >>
> > 
> > See below.
> >  
> >>> +}
> >>> +
> >>> +static int ct_send_nb(struct intel_guc_ct *ct,
> >>> +		      const u32 *action,
> >>> +		      u32 len,
> >>> +		      u32 flags)
> >>> +{
> >>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>> +	unsigned long spin_flags;
> >>> +	u32 fence;
> >>> +	int ret;
> >>> +
> >>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
> >>> +
> >>> +	ret = h2g_has_room(ctb, len + 1);
> >>
> >> but here you treat "1" it as en error
> >>
> > 
> > Yes, this patch is broken but fixed in a follow up one. Regardless I'll
> > fix this patch in place.
> > 
> >> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
> >>
> > 
> > Not exactly. This is following how ct_send() uses the action + len
> > field. Action[0] field goes in the HXG header and extra + 1 is for the
> > CT header.
> 
> well, "len" already counts "action" so by treating input as full HXG
> message (including HXG header) will make it cleaner
> 
> we can try do it later but by doing it right now we would avoid
> introducing this send_nb() function and deprecating them long term again
> 
> > 
> >>> +	if (unlikely(ret))
> >>> +		goto out;
> >>> +
> >>> +	fence = ct_get_next_fence(ct);
> >>> +	ret = ct_write(ct, action, len, fence, flags);
> >>> +	if (unlikely(ret))
> >>> +		goto out;
> >>> +
> >>> +	intel_guc_notify(ct_to_guc(ct));
> >>> +
> >>> +out:
> >>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>>  static int ct_send(struct intel_guc_ct *ct,
> >>>  		   const u32 *action,
> >>>  		   u32 len,
> >>> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  		   u32 response_buf_size,
> >>>  		   u32 *status)
> >>>  {
> >>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>  	struct ct_request request;
> >>>  	unsigned long flags;
> >>>  	u32 fence;
> >>> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  	GEM_BUG_ON(!len);
> >>>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>  	GEM_BUG_ON(!response_buf && response_buf_size);
> >>> +	might_sleep();
> >>>  
> >>> +	/*
> >>> +	 * We use a lazy spin wait loop here as we believe that if the CT
> >>> +	 * buffers are sized correctly the flow control condition should be
> >>> +	 * rare.
> >>
> >> shouldn't we at least try to log such cases with RATE_LIMITED to find
> >> out how "rare" it is, or if really unlikely just return -EBUSY as in
> >> case of non-blocking send ?
> >>
> > 
> > Definitely not return -EBUSY as this a blocking call. Perhaps we can log
> 
> blocking calls still can fail for various reasons, full CTB is one of
> them, and if we return error (now broken) for non-blocking variant then
> we should do the same for blocking variant as well and let the caller
> decide about next steps
> 

And have to rewrite reset of the stack with the new behavior, that seems
wrong. This function is allowed to block, so let it.

If you want to do this but all means go ahead but I'll likely NACK it as
over engineered.

> > this, but IGTs likely can hit rather easily. It really is only
> > interesting if real workloads hit this. Regardless that can be a follow
> > up.
> 
> if we hide retry in a silent loop then we will not find it out if we hit
> this condition (IGT or real WL) or not
>

We don't care if this is hit in IGTs as this isn't a real world use
case and will spam the dmesg if we hit this. I can make a note of this
as an open and we can revisit later.

Matt
 
> > 
> > Matt
> >  
> >>> +	 */
> >>> +retry:
> >>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> >>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>> +		cond_resched();
> >>> +		goto retry;
> >>> +	}
> >>>  
> >>>  	fence = ct_get_next_fence(ct);
> >>>  	request.fence = fence;
> >>> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>  	list_add_tail(&request.link, &ct->requests.pending);
> >>>  	spin_unlock(&ct->requests.lock);
> >>>  
> >>> -	err = ct_write(ct, action, len, fence);
> >>> +	err = ct_write(ct, action, len, fence, 0);
> >>>  
> >>>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>  
> >>> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>   * Command Transport (CT) buffer based GuC send function.
> >>>   */
> >>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>> -		      u32 *response_buf, u32 response_buf_size)
> >>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> >>>  {
> >>>  	u32 status = ~0; /* undefined */
> >>>  	int ret;
> >>> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>>  		return -ENODEV;
> >>>  	}
> >>>  
> >>> +	if (flags & INTEL_GUC_SEND_NB)
> >>> +		return ct_send_nb(ct, action, len, flags);
> >>> +
> >>>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >>>  	if (unlikely(ret < 0)) {
> >>>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> index 1ae2dde6db93..eb69263324ba 100644
> >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
> >>>  	bool broken;
> >>>  };
> >>>  
> >>> -
> >>>  /** Top-level structure for Command Transport related data
> >>>   *
> >>>   * Includes a pair of CT buffers for bi-directional communication and tracking
> >>> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> >>>  }
> >>>  
> >>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>> -		      u32 *response_buf, u32 response_buf_size);
> >>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> >>>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> >>>  
> >>>  #endif /* _INTEL_GUC_CT_H_ */
> >>>

Michal Wajdeczko June 25, 2021, 11:50 a.m. UTC | #6

On 25.06.2021 00:41, Matthew Brost wrote:
> On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
>>
>>
>> On 24.06.2021 17:49, Matthew Brost wrote:
>>> On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
>>>>
>>>>
>>>> On 24.06.2021 09:04, Matthew Brost wrote:
>>>>> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
>>>>> will send CTBs in the critical path and does not need to wait for these
>>>>> CTBs to complete before moving on, hence the need for this new function.
>>>>>
>>>>> The non-blocking CTB now must have a flow control mechanism to ensure
>>>>> the buffer isn't overrun. A lazy spin wait is used as we believe the
>>>>> flow control condition should be rare with a properly sized buffer.
>>>>>
>>>>> The function, intel_guc_send_nb, is exported in this patch but unused.
>>>>> Several patches later in the series make use of this function.
>>>>>
>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>> ---
>>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
>>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
>>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
>>>>>  3 files changed, 82 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>>> index 4abc59f6f3cd..24b1df6ad4ae 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>>> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
>>>>>  static
>>>>>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
>>>>>  {
>>>>> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
>>>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
>>>>> +}
>>>>> +
>>>>> +#define INTEL_GUC_SEND_NB		BIT(31)
>>>>
>>>> hmm, this flag really belongs to intel_guc_ct_send() so it should be
>>>> defined as CTB flag near that function declaration
>>>>
>>>
>>> I can move this up a few lines.
>>>
>>>>> +static
>>>>> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
>>>>> +{
>>>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
>>>>> +				 INTEL_GUC_SEND_NB);
>>>>>  }
>>>>>  
>>>>>  static inline int
>>>>> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>>>>>  			   u32 *response_buf, u32 response_buf_size)
>>>>>  {
>>>>>  	return intel_guc_ct_send(&guc->ct, action, len,
>>>>> -				 response_buf, response_buf_size);
>>>>> +				 response_buf, response_buf_size, 0);
>>>>>  }
>>>>>  
>>>>>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> index a17215920e58..c9a65d05911f 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>>> @@ -3,6 +3,11 @@
>>>>>   * Copyright © 2016-2019 Intel Corporation
>>>>>   */
>>>>>  
>>>>> +#include <linux/circ_buf.h>
>>>>> +#include <linux/ktime.h>
>>>>> +#include <linux/time64.h>
>>>>> +#include <linux/timekeeping.h>
>>>>> +
>>>>>  #include "i915_drv.h"
>>>>>  #include "intel_guc_ct.h"
>>>>>  #include "gt/intel_gt.h"
>>>>> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
>>>>>  static int ct_write(struct intel_guc_ct *ct,
>>>>>  		    const u32 *action,
>>>>>  		    u32 len /* in dwords */,
>>>>> -		    u32 fence)
>>>>> +		    u32 fence, u32 flags)
>>>>>  {
>>>>>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>>> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
>>>>>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
>>>>>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
>>>>>  
>>>>> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
>>>>> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
>>>>> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
>>>>> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
>>>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
>>>>> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
>>>>> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
>>>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
>>>>> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
>>>>> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
>>>>
>>>> or as we already switched to accept and return whole HXG messages in
>>>> guc_send_mmio() maybe we should do the same for CTB variant too and
>>>> instead of using extra flag just let caller to prepare proper HXG header
>>>> with HXG_EVENT type and then in CTB code just look at this type to make
>>>> decision which code path to use
>>>>
>>>
>>> Not sure I follow. Anyways could this be done in a follow up by you if
>>> want this change.
>>>  
>>>> note that existing callers should not be impacted, as full HXG header
>>>> for the REQUEST message looks exactly the same as "action" code alone.
>>>>
>>>>>  
>>>>>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
>>>>>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
>>>>> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>>>>>  	return err;
>>>>>  }
>>>>>  
>>>>> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
>>>>> +{
>>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>>> +	u32 head = READ_ONCE(desc->head);
>>>>> +	u32 space;
>>>>> +
>>>>> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
>>>>> +
>>>>> +	return space >= len_dw;
>>>>
>>>> here you are returning true(1) as has room
>>>>
>>>
>>> See below.
>>>  
>>>>> +}
>>>>> +
>>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
>>>>> +		      const u32 *action,
>>>>> +		      u32 len,
>>>>> +		      u32 flags)
>>>>> +{
>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>> +	unsigned long spin_flags;
>>>>> +	u32 fence;
>>>>> +	int ret;
>>>>> +
>>>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
>>>>> +
>>>>> +	ret = h2g_has_room(ctb, len + 1);
>>>>
>>>> but here you treat "1" it as en error
>>>>
>>>
>>> Yes, this patch is broken but fixed in a follow up one. Regardless I'll
>>> fix this patch in place.
>>>
>>>> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
>>>>
>>>
>>> Not exactly. This is following how ct_send() uses the action + len
>>> field. Action[0] field goes in the HXG header and extra + 1 is for the
>>> CT header.
>>
>> well, "len" already counts "action" so by treating input as full HXG
>> message (including HXG header) will make it cleaner
>>
> 
> Yes, I know. See above. To me GUC_HXG_MSG_MIN_LEN makes zero sense and
> it is worse than adding + 1. This + 1 accounts for the CT header not the
> HXG header. If any we add a new define, GUC_CT_HDR_LEN, and add that.

you mean GUC_CTB_MSG_MIN_LEN ? it's already there [1]

[1]
https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h#n82

> 
> Matt
> 
>> we can try do it later but by doing it right now we would avoid
>> introducing this send_nb() function and deprecating them long term again
>>
>>>
>>>>> +	if (unlikely(ret))
>>>>> +		goto out;
>>>>> +
>>>>> +	fence = ct_get_next_fence(ct);
>>>>> +	ret = ct_write(ct, action, len, fence, flags);
>>>>> +	if (unlikely(ret))
>>>>> +		goto out;
>>>>> +
>>>>> +	intel_guc_notify(ct_to_guc(ct));
>>>>> +
>>>>> +out:
>>>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +
>>>>>  static int ct_send(struct intel_guc_ct *ct,
>>>>>  		   const u32 *action,
>>>>>  		   u32 len,
>>>>> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>  		   u32 response_buf_size,
>>>>>  		   u32 *status)
>>>>>  {
>>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
>>>>>  	struct ct_request request;
>>>>>  	unsigned long flags;
>>>>>  	u32 fence;
>>>>> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>  	GEM_BUG_ON(!len);
>>>>>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>>>  	GEM_BUG_ON(!response_buf && response_buf_size);
>>>>> +	might_sleep();
>>>>>  
>>>>> +	/*
>>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>>> +	 * buffers are sized correctly the flow control condition should be
>>>>> +	 * rare.
>>>>
>>>> shouldn't we at least try to log such cases with RATE_LIMITED to find
>>>> out how "rare" it is, or if really unlikely just return -EBUSY as in
>>>> case of non-blocking send ?
>>>>
>>>
>>> Definitely not return -EBUSY as this a blocking call. Perhaps we can log
>>
>> blocking calls still can fail for various reasons, full CTB is one of
>> them, and if we return error (now broken) for non-blocking variant then
>> we should do the same for blocking variant as well and let the caller
>> decide about next steps
>>
>>> this, but IGTs likely can hit rather easily. It really is only
>>> interesting if real workloads hit this. Regardless that can be a follow
>>> up.
>>
>> if we hide retry in a silent loop then we will not find it out if we hit
>> this condition (IGT or real WL) or not
>>
>>>
>>> Matt
>>>  
>>>>> +	 */
>>>>> +retry:
>>>>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
>>>>> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
>>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>> +		cond_resched();
>>>>> +		goto retry;
>>>>> +	}
>>>>>  
>>>>>  	fence = ct_get_next_fence(ct);
>>>>>  	request.fence = fence;
>>>>> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>  	list_add_tail(&request.link, &ct->requests.pending);
>>>>>  	spin_unlock(&ct->requests.lock);
>>>>>  
>>>>> -	err = ct_write(ct, action, len, fence);
>>>>> +	err = ct_write(ct, action, len, fence, 0);
>>>>>  
>>>>>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
>>>>>  
>>>>> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
>>>>>   * Command Transport (CT) buffer based GuC send function.
>>>>>   */
>>>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>>> -		      u32 *response_buf, u32 response_buf_size)
>>>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
>>>>>  {
>>>>>  	u32 status = ~0; /* undefined */
>>>>>  	int ret;
>>>>> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>>>  		return -ENODEV;
>>>>>  	}
>>>>>  
>>>>> +	if (flags & INTEL_GUC_SEND_NB)
>>>>> +		return ct_send_nb(ct, action, len, flags);
>>>>> +
>>>>>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>>>>>  	if (unlikely(ret < 0)) {
>>>>>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> index 1ae2dde6db93..eb69263324ba 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>>> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
>>>>>  	bool broken;
>>>>>  };
>>>>>  
>>>>> -
>>>>>  /** Top-level structure for Command Transport related data
>>>>>   *
>>>>>   * Includes a pair of CT buffers for bi-directional communication and tracking
>>>>> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
>>>>>  }
>>>>>  
>>>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>>>>> -		      u32 *response_buf, u32 response_buf_size);
>>>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
>>>>>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
>>>>>  
>>>>>  #endif /* _INTEL_GUC_CT_H_ */
>>>>>

Matthew Brost June 25, 2021, 5:53 p.m. UTC | #7

On Fri, Jun 25, 2021 at 01:50:21PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 25.06.2021 00:41, Matthew Brost wrote:
> > On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
> >>
> >>
> >> On 24.06.2021 17:49, Matthew Brost wrote:
> >>> On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
> >>>>
> >>>>
> >>>> On 24.06.2021 09:04, Matthew Brost wrote:
> >>>>> Add non blocking CTB send function, intel_guc_send_nb. GuC submission
> >>>>> will send CTBs in the critical path and does not need to wait for these
> >>>>> CTBs to complete before moving on, hence the need for this new function.
> >>>>>
> >>>>> The non-blocking CTB now must have a flow control mechanism to ensure
> >>>>> the buffer isn't overrun. A lazy spin wait is used as we believe the
> >>>>> flow control condition should be rare with a properly sized buffer.
> >>>>>
> >>>>> The function, intel_guc_send_nb, is exported in this patch but unused.
> >>>>> Several patches later in the series make use of this function.
> >>>>>
> >>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>>>> ---
> >>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
> >>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
> >>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
> >>>>>  3 files changed, 82 insertions(+), 10 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>>> index 4abc59f6f3cd..24b1df6ad4ae 100644
> >>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>>> @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> >>>>>  static
> >>>>>  inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
> >>>>>  {
> >>>>> -	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
> >>>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
> >>>>> +}
> >>>>> +
> >>>>> +#define INTEL_GUC_SEND_NB		BIT(31)
> >>>>
> >>>> hmm, this flag really belongs to intel_guc_ct_send() so it should be
> >>>> defined as CTB flag near that function declaration
> >>>>
> >>>
> >>> I can move this up a few lines.
> >>>
> >>>>> +static
> >>>>> +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
> >>>>> +{
> >>>>> +	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
> >>>>> +				 INTEL_GUC_SEND_NB);
> >>>>>  }
> >>>>>  
> >>>>>  static inline int
> >>>>> @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >>>>>  			   u32 *response_buf, u32 response_buf_size)
> >>>>>  {
> >>>>>  	return intel_guc_ct_send(&guc->ct, action, len,
> >>>>> -				 response_buf, response_buf_size);
> >>>>> +				 response_buf, response_buf_size, 0);
> >>>>>  }
> >>>>>  
> >>>>>  static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> >>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>>> index a17215920e58..c9a65d05911f 100644
> >>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>>> @@ -3,6 +3,11 @@
> >>>>>   * Copyright © 2016-2019 Intel Corporation
> >>>>>   */
> >>>>>  
> >>>>> +#include <linux/circ_buf.h>
> >>>>> +#include <linux/ktime.h>
> >>>>> +#include <linux/time64.h>
> >>>>> +#include <linux/timekeeping.h>
> >>>>> +
> >>>>>  #include "i915_drv.h"
> >>>>>  #include "intel_guc_ct.h"
> >>>>>  #include "gt/intel_gt.h"
> >>>>> @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
> >>>>>  static int ct_write(struct intel_guc_ct *ct,
> >>>>>  		    const u32 *action,
> >>>>>  		    u32 len /* in dwords */,
> >>>>> -		    u32 fence)
> >>>>> +		    u32 fence, u32 flags)
> >>>>>  {
> >>>>>  	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>>>> @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
> >>>>>  		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
> >>>>>  		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
> >>>>>  
> >>>>> -	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>>>> -	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>>>> -			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
> >>>>> +	hxg = (flags & INTEL_GUC_SEND_NB) ?
> >>>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
> >>>>> +		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
> >>>>> +			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
> >>>>> +		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> >>>>> +		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
> >>>>> +			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
> >>>>
> >>>> or as we already switched to accept and return whole HXG messages in
> >>>> guc_send_mmio() maybe we should do the same for CTB variant too and
> >>>> instead of using extra flag just let caller to prepare proper HXG header
> >>>> with HXG_EVENT type and then in CTB code just look at this type to make
> >>>> decision which code path to use
> >>>>
> >>>
> >>> Not sure I follow. Anyways could this be done in a follow up by you if
> >>> want this change.
> >>>  
> >>>> note that existing callers should not be impacted, as full HXG header
> >>>> for the REQUEST message looks exactly the same as "action" code alone.
> >>>>
> >>>>>  
> >>>>>  	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
> >>>>>  		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
> >>>>> @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >>>>>  	return err;
> >>>>>  }
> >>>>>  
> >>>>> +static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
> >>>>> +{
> >>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> >>>>> +	u32 head = READ_ONCE(desc->head);
> >>>>> +	u32 space;
> >>>>> +
> >>>>> +	space = CIRC_SPACE(desc->tail, head, ctb->size);
> >>>>> +
> >>>>> +	return space >= len_dw;
> >>>>
> >>>> here you are returning true(1) as has room
> >>>>
> >>>
> >>> See below.
> >>>  
> >>>>> +}
> >>>>> +
> >>>>> +static int ct_send_nb(struct intel_guc_ct *ct,
> >>>>> +		      const u32 *action,
> >>>>> +		      u32 len,
> >>>>> +		      u32 flags)
> >>>>> +{
> >>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>> +	unsigned long spin_flags;
> >>>>> +	u32 fence;
> >>>>> +	int ret;
> >>>>> +
> >>>>> +	spin_lock_irqsave(&ctb->lock, spin_flags);
> >>>>> +
> >>>>> +	ret = h2g_has_room(ctb, len + 1);
> >>>>
> >>>> but here you treat "1" it as en error
> >>>>
> >>>
> >>> Yes, this patch is broken but fixed in a follow up one. Regardless I'll
> >>> fix this patch in place.
> >>>
> >>>> and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
> >>>>
> >>>
> >>> Not exactly. This is following how ct_send() uses the action + len
> >>> field. Action[0] field goes in the HXG header and extra + 1 is for the
> >>> CT header.
> >>
> >> well, "len" already counts "action" so by treating input as full HXG
> >> message (including HXG header) will make it cleaner
> >>
> > 
> > Yes, I know. See above. To me GUC_HXG_MSG_MIN_LEN makes zero sense and
> > it is worse than adding + 1. This + 1 accounts for the CT header not the
> > HXG header. If any we add a new define, GUC_CT_HDR_LEN, and add that.
> 
> you mean GUC_CTB_MSG_MIN_LEN ? it's already there [1]
> 

Kinda? I think we should have a define GUC_CTB_HDR_LEN which is 1 and
GUC_CTB_MSG_MIN_LEN is defined as GUC_CTB_HDR_LEN. 'GUC_CTB_HDR_LEN'
makes it clear that the + 1 is referring to the header. I've done this
branch of these patches already.

Matt

> [1]
> https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h#n82
> 
> > 
> > Matt
> > 
> >> we can try do it later but by doing it right now we would avoid
> >> introducing this send_nb() function and deprecating them long term again
> >>
> >>>
> >>>>> +	if (unlikely(ret))
> >>>>> +		goto out;
> >>>>> +
> >>>>> +	fence = ct_get_next_fence(ct);
> >>>>> +	ret = ct_write(ct, action, len, fence, flags);
> >>>>> +	if (unlikely(ret))
> >>>>> +		goto out;
> >>>>> +
> >>>>> +	intel_guc_notify(ct_to_guc(ct));
> >>>>> +
> >>>>> +out:
> >>>>> +	spin_unlock_irqrestore(&ctb->lock, spin_flags);
> >>>>> +
> >>>>> +	return ret;
> >>>>> +}
> >>>>> +
> >>>>>  static int ct_send(struct intel_guc_ct *ct,
> >>>>>  		   const u32 *action,
> >>>>>  		   u32 len,
> >>>>> @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>  		   u32 response_buf_size,
> >>>>>  		   u32 *status)
> >>>>>  {
> >>>>> +	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
> >>>>>  	struct ct_request request;
> >>>>>  	unsigned long flags;
> >>>>>  	u32 fence;
> >>>>> @@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>  	GEM_BUG_ON(!len);
> >>>>>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> >>>>>  	GEM_BUG_ON(!response_buf && response_buf_size);
> >>>>> +	might_sleep();
> >>>>>  
> >>>>> +	/*
> >>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
> >>>>> +	 * buffers are sized correctly the flow control condition should be
> >>>>> +	 * rare.
> >>>>
> >>>> shouldn't we at least try to log such cases with RATE_LIMITED to find
> >>>> out how "rare" it is, or if really unlikely just return -EBUSY as in
> >>>> case of non-blocking send ?
> >>>>
> >>>
> >>> Definitely not return -EBUSY as this a blocking call. Perhaps we can log
> >>
> >> blocking calls still can fail for various reasons, full CTB is one of
> >> them, and if we return error (now broken) for non-blocking variant then
> >> we should do the same for blocking variant as well and let the caller
> >> decide about next steps
> >>
> >>> this, but IGTs likely can hit rather easily. It really is only
> >>> interesting if real workloads hit this. Regardless that can be a follow
> >>> up.
> >>
> >> if we hide retry in a silent loop then we will not find it out if we hit
> >> this condition (IGT or real WL) or not
> >>
> >>>
> >>> Matt
> >>>  
> >>>>> +	 */
> >>>>> +retry:
> >>>>>  	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
> >>>>> +	if (unlikely(!h2g_has_room(ctb, len + 1))) {
> >>>>> +		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>>> +		cond_resched();
> >>>>> +		goto retry;
> >>>>> +	}
> >>>>>  
> >>>>>  	fence = ct_get_next_fence(ct);
> >>>>>  	request.fence = fence;
> >>>>> @@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>  	list_add_tail(&request.link, &ct->requests.pending);
> >>>>>  	spin_unlock(&ct->requests.lock);
> >>>>>  
> >>>>> -	err = ct_write(ct, action, len, fence);
> >>>>> +	err = ct_write(ct, action, len, fence, 0);
> >>>>>  
> >>>>>  	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
> >>>>>  
> >>>>> @@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
> >>>>>   * Command Transport (CT) buffer based GuC send function.
> >>>>>   */
> >>>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>>>> -		      u32 *response_buf, u32 response_buf_size)
> >>>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags)
> >>>>>  {
> >>>>>  	u32 status = ~0; /* undefined */
> >>>>>  	int ret;
> >>>>> @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>>>>  		return -ENODEV;
> >>>>>  	}
> >>>>>  
> >>>>> +	if (flags & INTEL_GUC_SEND_NB)
> >>>>> +		return ct_send_nb(ct, action, len, flags);
> >>>>> +
> >>>>>  	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
> >>>>>  	if (unlikely(ret < 0)) {
> >>>>>  		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
> >>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>>>> index 1ae2dde6db93..eb69263324ba 100644
> >>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> >>>>> @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
> >>>>>  	bool broken;
> >>>>>  };
> >>>>>  
> >>>>> -
> >>>>>  /** Top-level structure for Command Transport related data
> >>>>>   *
> >>>>>   * Includes a pair of CT buffers for bi-directional communication and tracking
> >>>>> @@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
> >>>>>  }
> >>>>>  
> >>>>>  int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
> >>>>> -		      u32 *response_buf, u32 response_buf_size);
> >>>>> +		      u32 *response_buf, u32 response_buf_size, u32 flags);
> >>>>>  void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
> >>>>>  
> >>>>>  #endif /* _INTEL_GUC_CT_H_ */
> >>>>>

[04/47] drm/i915/guc: Add non blocking CTB send function

Commit Message

Comments

Patch