[25/46] drm/i915/guc: Update debugfs for GuC multi-lrc

Message ID	20210803222943.27686-26-matthew.brost@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=Vnqj=M2=lists.freedesktop.org=intel-gfx-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org E102760184 From: Matthew Brost <matthew.brost@intel.com> To: <intel-gfx@lists.freedesktop.org>, <dri-devel@lists.freedesktop.org> Date: Tue, 3 Aug 2021 15:29:22 -0700 Message-Id: <20210803222943.27686-26-matthew.brost@intel.com> In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com> References: <20210803222943.27686-1-matthew.brost@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Intel-gfx] [PATCH 25/46] drm/i915/guc: Update debugfs for GuC multi-lrc Precedence: list Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
Series	Parallel submission aka multi-bb execbuf \| expand [00/46] Parallel submission aka multi-bb execbuf [01/46] drm/i915/guc: Allow flexible number of context ids [02/46] drm/i915/guc: Connect the number of guc_ids to debugfs [03/46] drm/i915/guc: Don't return -EAGAIN to user when guc_ids exhausted [04/46] drm/i915/guc: Don't allow requests not ready to consume all guc_ids [05/46] drm/i915/guc: Introduce guc_submit_engine object [06/46] drm/i915/guc: Check return of __xa_store when registering a context [07/46] drm/i915/guc: Non-static lrc descriptor registration buffer [08/46] drm/i915/guc: Take GT PM ref when deregistering context [09/46] drm/i915: Add GT PM unpark worker [10/46] drm/i915/guc: Take engine PM when a context is pinned with GuC submission [11/46] drm/i915/guc: Don't call switch_to_kernel_context with GuC submission [12/46] drm/i915/guc: Selftest for GuC flow control [13/46] drm/i915: Add logical engine mapping [14/46] drm/i915: Expose logical engine instance to user [15/46] drm/i915/guc: Introduce context parent-child relationship [16/46] drm/i915/guc: Implement GuC parent-child context pin / unpin functions [17/46] drm/i915/guc: Add multi-lrc context registration [18/46] drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts [19/46] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids [20/46] drm/i915/guc: Add hang check to GuC submit engine [21/46] drm/i915/guc: Add guc_child_context_destroy [22/46] drm/i915/guc: Implement multi-lrc submission [23/46] drm/i915/guc: Insert submit fences between requests in parent-child relationship [24/46] drm/i915/guc: Implement multi-lrc reset [25/46] drm/i915/guc: Update debugfs for GuC multi-lrc [26/46] drm/i915: Connect UAPI to GuC multi-lrc interface [27/46] drm/i915/doc: Update parallel submit doc to point to i915_drm.h [28/46] drm/i915/guc: Add basic GuC multi-lrc selftest [29/46] drm/i915/guc: Extend GuC flow control selftest for multi-lrc [30/46] drm/i915/guc: Implement no mid batch preemption for multi-lrc [31/46] drm/i915: Move secure execbuf check to execbuf2 [32/46] drm/i915: Move input/exec fence handling to i915_gem_execbuffer2 [33/46] drm/i915: Move output fence handling to i915_gem_execbuffer2 [34/46] drm/i915: Return output fence from i915_gem_do_execbuffer [35/46] drm/i915: Store batch index in struct i915_execbuffer [36/46] drm/i915: Allow callers of i915_gem_do_execbuffer to override the batch index [37/46] drm/i915: Teach execbuf there can be more than one batch in the objects list [38/46] drm/i915: Only track object dependencies on first request [39/46] drm/i915: Force parallel contexts to use copy engine for reloc [40/46] drm/i915: Multi-batch execbuffer2 [41/46] drm/i915: Eliminate unnecessary VMA calls for multi-BB submission [42/46] drm/i915: Hold all parallel requests until last request, properly handle error [43/46] drm/i915/guc: Handle errors in multi-lrc requests [44/46] drm/i915: Enable multi-bb execbuf [45/46] drm/i915/execlists: Weak parallel submission support for execlists [46/46] drm/i915/guc: Add delay before disabling scheduling on contexts

Matthew Brost Aug. 3, 2021, 10:29 p.m. UTC

Display the workqueue status in debugfs for GuC contexts that are in
parent-child relationship.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +++++++++++++------
 1 file changed, 39 insertions(+), 17 deletions(-)

Daniel Vetter Aug. 9, 2021, 4:36 p.m. UTC | #1

On Tue, Aug 03, 2021 at 03:29:22PM -0700, Matthew Brost wrote:
> Display the workqueue status in debugfs for GuC contexts that are in
> parent-child relationship.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +++++++++++++------
>  1 file changed, 39 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 30df1c8db491..44a7582c9aed 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -4527,31 +4527,53 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
>  		gse_log_submission_info(guc->gse[i], p, i);
>  }
>  
> +static inline void guc_log_context(struct drm_printer *p,
> +				   struct intel_context *ce)
> +{
> +	drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> +	drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> +	drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> +		   ce->ring->head,
> +		   ce->lrc_reg_state[CTX_RING_HEAD]);
> +	drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> +		   ce->ring->tail,
> +		   ce->lrc_reg_state[CTX_RING_TAIL]);
> +	drm_printf(p, "\t\tContext Pin Count: %u\n",
> +		   atomic_read(&ce->pin_count));
> +	drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> +		   atomic_read(&ce->guc_id_ref));
> +	drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> +		   atomic_read(&ce->guc_num_rq_not_ready));
> +	drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> +		   ce->guc_state.sched_state,
> +		   atomic_read(&ce->guc_sched_state_no_lock));

It's all debugfs, but I think proper locking even there is good. It at
least reduces the confusion when the locking scheme is largely
undocumented. Also given how much we have rcu for everything would be good
to double-check all pointer dererences are properly protected.

> +}
> +
>  void intel_guc_submission_print_context_info(struct intel_guc *guc,
>  					     struct drm_printer *p)
>  {
>  	struct intel_context *ce;
>  	unsigned long index;
>  	xa_for_each(&guc->context_lookup, index, ce) {

xa_for_each doesn't provide any guarantees, so doesn't protect against
concurrent removeal or anything like that. We need to do better than that.
-Daniel

> -		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> -		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> -		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> -			   ce->ring->head,
> -			   ce->lrc_reg_state[CTX_RING_HEAD]);
> -		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> -			   ce->ring->tail,
> -			   ce->lrc_reg_state[CTX_RING_TAIL]);
> -		drm_printf(p, "\t\tContext Pin Count: %u\n",
> -			   atomic_read(&ce->pin_count));
> -		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> -			   atomic_read(&ce->guc_id_ref));
> -		drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> -			   atomic_read(&ce->guc_num_rq_not_ready));
> -		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> -			   ce->guc_state.sched_state,
> -			   atomic_read(&ce->guc_sched_state_no_lock));
> +		GEM_BUG_ON(intel_context_is_child(ce));
>  
> +		guc_log_context(p, ce);
>  		guc_log_context_priority(p, ce);
> +
> +		if (intel_context_is_parent(ce)) {
> +			struct guc_process_desc *desc = __get_process_desc(ce);
> +			struct intel_context *child;
> +
> +			drm_printf(p, "\t\tWQI Head: %u\n",
> +				   READ_ONCE(desc->head));
> +			drm_printf(p, "\t\tWQI Tail: %u\n",
> +				   READ_ONCE(desc->tail));
> +			drm_printf(p, "\t\tWQI Status: %u\n\n",
> +				   READ_ONCE(desc->wq_status));
> +
> +			for_each_child(ce, child)
> +				guc_log_context(p, child);
> +		}
>  	}
>  }
>  
> -- 
> 2.28.0
>

Matthew Brost Aug. 9, 2021, 7:13 p.m. UTC | #2

On Mon, Aug 09, 2021 at 06:36:44PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:22PM -0700, Matthew Brost wrote:
> > Display the workqueue status in debugfs for GuC contexts that are in
> > parent-child relationship.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +++++++++++++------
> >  1 file changed, 39 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 30df1c8db491..44a7582c9aed 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -4527,31 +4527,53 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> >  		gse_log_submission_info(guc->gse[i], p, i);
> >  }
> >  
> > +static inline void guc_log_context(struct drm_printer *p,
> > +				   struct intel_context *ce)
> > +{
> > +	drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > +	drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > +	drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > +		   ce->ring->head,
> > +		   ce->lrc_reg_state[CTX_RING_HEAD]);
> > +	drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > +		   ce->ring->tail,
> > +		   ce->lrc_reg_state[CTX_RING_TAIL]);
> > +	drm_printf(p, "\t\tContext Pin Count: %u\n",
> > +		   atomic_read(&ce->pin_count));
> > +	drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > +		   atomic_read(&ce->guc_id_ref));
> > +	drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > +		   atomic_read(&ce->guc_num_rq_not_ready));
> > +	drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > +		   ce->guc_state.sched_state,
> > +		   atomic_read(&ce->guc_sched_state_no_lock));
> 
> It's all debugfs, but I think proper locking even there is good. It at
> least reduces the confusion when the locking scheme is largely
> undocumented. Also given how much we have rcu for everything would be good
> to double-check all pointer dererences are properly protected.
>

Not sure if I 100% follow this but I don't think any of the pointers
dref here are RCU protected. Certainly none of the GuC ones are.

Will double before the next respin though.

> > +}
> > +
> >  void intel_guc_submission_print_context_info(struct intel_guc *guc,
> >  					     struct drm_printer *p)
> >  {
> >  	struct intel_context *ce;
> >  	unsigned long index;
> >  	xa_for_each(&guc->context_lookup, index, ce) {
> 
> xa_for_each doesn't provide any guarantees, so doesn't protect against
> concurrent removeal or anything like that. We need to do better than that.

https://elixir.bootlin.com/linux/latest/source/include/linux/xarray.h#L498
'It is safe to modify the array during the iteration.'

Matt

> -Daniel
> 
> > -		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > -		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > -		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > -			   ce->ring->head,
> > -			   ce->lrc_reg_state[CTX_RING_HEAD]);
> > -		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > -			   ce->ring->tail,
> > -			   ce->lrc_reg_state[CTX_RING_TAIL]);
> > -		drm_printf(p, "\t\tContext Pin Count: %u\n",
> > -			   atomic_read(&ce->pin_count));
> > -		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > -			   atomic_read(&ce->guc_id_ref));
> > -		drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > -			   atomic_read(&ce->guc_num_rq_not_ready));
> > -		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > -			   ce->guc_state.sched_state,
> > -			   atomic_read(&ce->guc_sched_state_no_lock));
> > +		GEM_BUG_ON(intel_context_is_child(ce));
> >  
> > +		guc_log_context(p, ce);
> >  		guc_log_context_priority(p, ce);
> > +
> > +		if (intel_context_is_parent(ce)) {
> > +			struct guc_process_desc *desc = __get_process_desc(ce);
> > +			struct intel_context *child;
> > +
> > +			drm_printf(p, "\t\tWQI Head: %u\n",
> > +				   READ_ONCE(desc->head));
> > +			drm_printf(p, "\t\tWQI Tail: %u\n",
> > +				   READ_ONCE(desc->tail));
> > +			drm_printf(p, "\t\tWQI Status: %u\n\n",
> > +				   READ_ONCE(desc->wq_status));
> > +
> > +			for_each_child(ce, child)
> > +				guc_log_context(p, child);
> > +		}
> >  	}
> >  }
> >  
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Daniel Vetter Aug. 10, 2021, 9:23 a.m. UTC | #3

On Mon, Aug 09, 2021 at 07:13:11PM +0000, Matthew Brost wrote:
> On Mon, Aug 09, 2021 at 06:36:44PM +0200, Daniel Vetter wrote:
> > On Tue, Aug 03, 2021 at 03:29:22PM -0700, Matthew Brost wrote:
> > > Display the workqueue status in debugfs for GuC contexts that are in
> > > parent-child relationship.
> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +++++++++++++------
> > >  1 file changed, 39 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index 30df1c8db491..44a7582c9aed 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -4527,31 +4527,53 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> > >  		gse_log_submission_info(guc->gse[i], p, i);
> > >  }
> > >  
> > > +static inline void guc_log_context(struct drm_printer *p,
> > > +				   struct intel_context *ce)
> > > +{
> > > +	drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > > +	drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > > +	drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > > +		   ce->ring->head,
> > > +		   ce->lrc_reg_state[CTX_RING_HEAD]);
> > > +	drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > > +		   ce->ring->tail,
> > > +		   ce->lrc_reg_state[CTX_RING_TAIL]);
> > > +	drm_printf(p, "\t\tContext Pin Count: %u\n",
> > > +		   atomic_read(&ce->pin_count));
> > > +	drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > > +		   atomic_read(&ce->guc_id_ref));
> > > +	drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > > +		   atomic_read(&ce->guc_num_rq_not_ready));
> > > +	drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > > +		   ce->guc_state.sched_state,
> > > +		   atomic_read(&ce->guc_sched_state_no_lock));
> > 
> > It's all debugfs, but I think proper locking even there is good. It at
> > least reduces the confusion when the locking scheme is largely
> > undocumented. Also given how much we have rcu for everything would be good
> > to double-check all pointer dererences are properly protected.
> >
> 
> Not sure if I 100% follow this but I don't think any of the pointers
> dref here are RCU protected. Certainly none of the GuC ones are.
> 
> Will double before the next respin though.
> 
> > > +}
> > > +
> > >  void intel_guc_submission_print_context_info(struct intel_guc *guc,
> > >  					     struct drm_printer *p)
> > >  {
> > >  	struct intel_context *ce;
> > >  	unsigned long index;
> > >  	xa_for_each(&guc->context_lookup, index, ce) {
> > 
> > xa_for_each doesn't provide any guarantees, so doesn't protect against
> > concurrent removeal or anything like that. We need to do better than that.
> 
> https://elixir.bootlin.com/linux/latest/source/include/linux/xarray.h#L498
> 'It is safe to modify the array during the iteration.'

The xarray. Not the thing you're dereferencing, because the xarray only
stores pointers, not your data structure. So yeah correct statement is
that it doesn't provide you any guarantees beyond "the iterator wont be
confused if the xarray itself is modified during iteration". Which isn't
what you need here, you need a lot more.
-Daniel

> 
> Matt
> 
> > -Daniel
> > 
> > > -		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > > -		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > > -		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > > -			   ce->ring->head,
> > > -			   ce->lrc_reg_state[CTX_RING_HEAD]);
> > > -		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > > -			   ce->ring->tail,
> > > -			   ce->lrc_reg_state[CTX_RING_TAIL]);
> > > -		drm_printf(p, "\t\tContext Pin Count: %u\n",
> > > -			   atomic_read(&ce->pin_count));
> > > -		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > > -			   atomic_read(&ce->guc_id_ref));
> > > -		drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > > -			   atomic_read(&ce->guc_num_rq_not_ready));
> > > -		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > > -			   ce->guc_state.sched_state,
> > > -			   atomic_read(&ce->guc_sched_state_no_lock));
> > > +		GEM_BUG_ON(intel_context_is_child(ce));
> > >  
> > > +		guc_log_context(p, ce);
> > >  		guc_log_context_priority(p, ce);
> > > +
> > > +		if (intel_context_is_parent(ce)) {
> > > +			struct guc_process_desc *desc = __get_process_desc(ce);
> > > +			struct intel_context *child;
> > > +
> > > +			drm_printf(p, "\t\tWQI Head: %u\n",
> > > +				   READ_ONCE(desc->head));
> > > +			drm_printf(p, "\t\tWQI Tail: %u\n",
> > > +				   READ_ONCE(desc->tail));
> > > +			drm_printf(p, "\t\tWQI Status: %u\n\n",
> > > +				   READ_ONCE(desc->wq_status));
> > > +
> > > +			for_each_child(ce, child)
> > > +				guc_log_context(p, child);
> > > +		}
> > >  	}
> > >  }
> > >  
> > > -- 
> > > 2.28.0
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

Daniel Vetter Aug. 10, 2021, 9:27 a.m. UTC | #4

On Tue, Aug 10, 2021 at 11:23:39AM +0200, Daniel Vetter wrote:
> On Mon, Aug 09, 2021 at 07:13:11PM +0000, Matthew Brost wrote:
> > On Mon, Aug 09, 2021 at 06:36:44PM +0200, Daniel Vetter wrote:
> > > On Tue, Aug 03, 2021 at 03:29:22PM -0700, Matthew Brost wrote:
> > > > Display the workqueue status in debugfs for GuC contexts that are in
> > > > parent-child relationship.
> > > > 
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +++++++++++++------
> > > >  1 file changed, 39 insertions(+), 17 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > index 30df1c8db491..44a7582c9aed 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > @@ -4527,31 +4527,53 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> > > >  		gse_log_submission_info(guc->gse[i], p, i);
> > > >  }
> > > >  
> > > > +static inline void guc_log_context(struct drm_printer *p,
> > > > +				   struct intel_context *ce)
> > > > +{
> > > > +	drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > > > +	drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > > > +	drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > > > +		   ce->ring->head,
> > > > +		   ce->lrc_reg_state[CTX_RING_HEAD]);
> > > > +	drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > > > +		   ce->ring->tail,
> > > > +		   ce->lrc_reg_state[CTX_RING_TAIL]);
> > > > +	drm_printf(p, "\t\tContext Pin Count: %u\n",
> > > > +		   atomic_read(&ce->pin_count));
> > > > +	drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > > > +		   atomic_read(&ce->guc_id_ref));
> > > > +	drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > > > +		   atomic_read(&ce->guc_num_rq_not_ready));
> > > > +	drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > > > +		   ce->guc_state.sched_state,
> > > > +		   atomic_read(&ce->guc_sched_state_no_lock));
> > > 
> > > It's all debugfs, but I think proper locking even there is good. It at
> > > least reduces the confusion when the locking scheme is largely
> > > undocumented. Also given how much we have rcu for everything would be good
> > > to double-check all pointer dererences are properly protected.
> > >
> > 
> > Not sure if I 100% follow this but I don't think any of the pointers
> > dref here are RCU protected. Certainly none of the GuC ones are.
> > 
> > Will double before the next respin though.
> > 
> > > > +}
> > > > +
> > > >  void intel_guc_submission_print_context_info(struct intel_guc *guc,
> > > >  					     struct drm_printer *p)
> > > >  {
> > > >  	struct intel_context *ce;
> > > >  	unsigned long index;
> > > >  	xa_for_each(&guc->context_lookup, index, ce) {
> > > 
> > > xa_for_each doesn't provide any guarantees, so doesn't protect against
> > > concurrent removeal or anything like that. We need to do better than that.
> > 
> > https://elixir.bootlin.com/linux/latest/source/include/linux/xarray.h#L498
> > 'It is safe to modify the array during the iteration.'
> 
> The xarray. Not the thing you're dereferencing, because the xarray only
> stores pointers, not your data structure. So yeah correct statement is
> that it doesn't provide you any guarantees beyond "the iterator wont be
> confused if the xarray itself is modified during iteration". Which isn't
> what you need here, you need a lot more.

Or spelled out: The pointer you get could become immediately meaningless,
before you can look at it, due to a concurrent removal/release. All the
xa_for_each guarantees you is that on the next round you get the next
pointer, until you got them all (plus/minus concurrent changes). But that
next pointer could have become meaningless right away too.

So you need your own locking to make use of these pointers you got and
make sure they're not immediately meaningless before your loop body even
started.

One of the reasons why I think this is so important is that debugfs files
nest a lot of loops fairly often, so are good cheat-sheet for the locking
if it happens to be undocumented (which also shouldn't be the case). Ofc
if there's no locking in debugfs, no cheat-sheet :-)

Cheers, Daniel

> -Daniel
> 
> > 
> > Matt
> > 
> > > -Daniel
> > > 
> > > > -		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > > > -		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > > > -		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > > > -			   ce->ring->head,
> > > > -			   ce->lrc_reg_state[CTX_RING_HEAD]);
> > > > -		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > > > -			   ce->ring->tail,
> > > > -			   ce->lrc_reg_state[CTX_RING_TAIL]);
> > > > -		drm_printf(p, "\t\tContext Pin Count: %u\n",
> > > > -			   atomic_read(&ce->pin_count));
> > > > -		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > > > -			   atomic_read(&ce->guc_id_ref));
> > > > -		drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > > > -			   atomic_read(&ce->guc_num_rq_not_ready));
> > > > -		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > > > -			   ce->guc_state.sched_state,
> > > > -			   atomic_read(&ce->guc_sched_state_no_lock));
> > > > +		GEM_BUG_ON(intel_context_is_child(ce));
> > > >  
> > > > +		guc_log_context(p, ce);
> > > >  		guc_log_context_priority(p, ce);
> > > > +
> > > > +		if (intel_context_is_parent(ce)) {
> > > > +			struct guc_process_desc *desc = __get_process_desc(ce);
> > > > +			struct intel_context *child;
> > > > +
> > > > +			drm_printf(p, "\t\tWQI Head: %u\n",
> > > > +				   READ_ONCE(desc->head));
> > > > +			drm_printf(p, "\t\tWQI Tail: %u\n",
> > > > +				   READ_ONCE(desc->tail));
> > > > +			drm_printf(p, "\t\tWQI Status: %u\n\n",
> > > > +				   READ_ONCE(desc->wq_status));
> > > > +
> > > > +			for_each_child(ce, child)
> > > > +				guc_log_context(p, child);
> > > > +		}
> > > >  	}
> > > >  }
> > > >  
> > > > -- 
> > > > 2.28.0
> > > > 
> > > 
> > > -- 
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Matthew Brost Aug. 10, 2021, 5:29 p.m. UTC | #5

On Tue, Aug 10, 2021 at 11:27:31AM +0200, Daniel Vetter wrote:
> On Tue, Aug 10, 2021 at 11:23:39AM +0200, Daniel Vetter wrote:
> > On Mon, Aug 09, 2021 at 07:13:11PM +0000, Matthew Brost wrote:
> > > On Mon, Aug 09, 2021 at 06:36:44PM +0200, Daniel Vetter wrote:
> > > > On Tue, Aug 03, 2021 at 03:29:22PM -0700, Matthew Brost wrote:
> > > > > Display the workqueue status in debugfs for GuC contexts that are in
> > > > > parent-child relationship.
> > > > > 
> > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > ---
> > > > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +++++++++++++------
> > > > >  1 file changed, 39 insertions(+), 17 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > index 30df1c8db491..44a7582c9aed 100644
> > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > @@ -4527,31 +4527,53 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> > > > >  		gse_log_submission_info(guc->gse[i], p, i);
> > > > >  }
> > > > >  
> > > > > +static inline void guc_log_context(struct drm_printer *p,
> > > > > +				   struct intel_context *ce)
> > > > > +{
> > > > > +	drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > > > > +	drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > > > > +	drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > > > > +		   ce->ring->head,
> > > > > +		   ce->lrc_reg_state[CTX_RING_HEAD]);
> > > > > +	drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > > > > +		   ce->ring->tail,
> > > > > +		   ce->lrc_reg_state[CTX_RING_TAIL]);
> > > > > +	drm_printf(p, "\t\tContext Pin Count: %u\n",
> > > > > +		   atomic_read(&ce->pin_count));
> > > > > +	drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > > > > +		   atomic_read(&ce->guc_id_ref));
> > > > > +	drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > > > > +		   atomic_read(&ce->guc_num_rq_not_ready));
> > > > > +	drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > > > > +		   ce->guc_state.sched_state,
> > > > > +		   atomic_read(&ce->guc_sched_state_no_lock));
> > > > 
> > > > It's all debugfs, but I think proper locking even there is good. It at
> > > > least reduces the confusion when the locking scheme is largely
> > > > undocumented. Also given how much we have rcu for everything would be good
> > > > to double-check all pointer dererences are properly protected.
> > > >
> > > 
> > > Not sure if I 100% follow this but I don't think any of the pointers
> > > dref here are RCU protected. Certainly none of the GuC ones are.
> > > 
> > > Will double before the next respin though.
> > > 
> > > > > +}
> > > > > +
> > > > >  void intel_guc_submission_print_context_info(struct intel_guc *guc,
> > > > >  					     struct drm_printer *p)
> > > > >  {
> > > > >  	struct intel_context *ce;
> > > > >  	unsigned long index;
> > > > >  	xa_for_each(&guc->context_lookup, index, ce) {
> > > > 
> > > > xa_for_each doesn't provide any guarantees, so doesn't protect against
> > > > concurrent removeal or anything like that. We need to do better than that.
> > > 
> > > https://elixir.bootlin.com/linux/latest/source/include/linux/xarray.h#L498
> > > 'It is safe to modify the array during the iteration.'
> > 
> > The xarray. Not the thing you're dereferencing, because the xarray only
> > stores pointers, not your data structure. So yeah correct statement is
> > that it doesn't provide you any guarantees beyond "the iterator wont be
> > confused if the xarray itself is modified during iteration". Which isn't
> > what you need here, you need a lot more.
> 
> Or spelled out: The pointer you get could become immediately meaningless,
> before you can look at it, due to a concurrent removal/release. All the
> xa_for_each guarantees you is that on the next round you get the next
> pointer, until you got them all (plus/minus concurrent changes). But that
> next pointer could have become meaningless right away too.
> 
> So you need your own locking to make use of these pointers you got and
> make sure they're not immediately meaningless before your loop body even
> started.
> 

Ok, I think I see your point. Likely whenever we do a xa_for_each over
&guc->context_lookup we should just grab its lock as if it is in the
xarray we have reference to object looked up. Also everytime we use
xa_for_each on &guc->context_lookup it is a corner case we it is ok to
block anyone else from using this (e.g. during a reset, checking
debugfs, etc...). Does that sound correct?

Matt

> One of the reasons why I think this is so important is that debugfs files
> nest a lot of loops fairly often, so are good cheat-sheet for the locking
> if it happens to be undocumented (which also shouldn't be the case). Ofc
> if there's no locking in debugfs, no cheat-sheet :-)
> 
> Cheers, Daniel
> 
> > -Daniel
> > 
> > > 
> > > Matt
> > > 
> > > > -Daniel
> > > > 
> > > > > -		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > > > > -		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > > > > -		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > > > > -			   ce->ring->head,
> > > > > -			   ce->lrc_reg_state[CTX_RING_HEAD]);
> > > > > -		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > > > > -			   ce->ring->tail,
> > > > > -			   ce->lrc_reg_state[CTX_RING_TAIL]);
> > > > > -		drm_printf(p, "\t\tContext Pin Count: %u\n",
> > > > > -			   atomic_read(&ce->pin_count));
> > > > > -		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > > > > -			   atomic_read(&ce->guc_id_ref));
> > > > > -		drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > > > > -			   atomic_read(&ce->guc_num_rq_not_ready));
> > > > > -		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > > > > -			   ce->guc_state.sched_state,
> > > > > -			   atomic_read(&ce->guc_sched_state_no_lock));
> > > > > +		GEM_BUG_ON(intel_context_is_child(ce));
> > > > >  
> > > > > +		guc_log_context(p, ce);
> > > > >  		guc_log_context_priority(p, ce);
> > > > > +
> > > > > +		if (intel_context_is_parent(ce)) {
> > > > > +			struct guc_process_desc *desc = __get_process_desc(ce);
> > > > > +			struct intel_context *child;
> > > > > +
> > > > > +			drm_printf(p, "\t\tWQI Head: %u\n",
> > > > > +				   READ_ONCE(desc->head));
> > > > > +			drm_printf(p, "\t\tWQI Tail: %u\n",
> > > > > +				   READ_ONCE(desc->tail));
> > > > > +			drm_printf(p, "\t\tWQI Status: %u\n\n",
> > > > > +				   READ_ONCE(desc->wq_status));
> > > > > +
> > > > > +			for_each_child(ce, child)
> > > > > +				guc_log_context(p, child);
> > > > > +		}
> > > > >  	}
> > > > >  }
> > > > >  
> > > > > -- 
> > > > > 2.28.0
> > > > > 
> > > > 
> > > > -- 
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Daniel Vetter Aug. 11, 2021, 10:04 a.m. UTC | #6

On Tue, Aug 10, 2021 at 05:29:46PM +0000, Matthew Brost wrote:
> On Tue, Aug 10, 2021 at 11:27:31AM +0200, Daniel Vetter wrote:
> > On Tue, Aug 10, 2021 at 11:23:39AM +0200, Daniel Vetter wrote:
> > > On Mon, Aug 09, 2021 at 07:13:11PM +0000, Matthew Brost wrote:
> > > > On Mon, Aug 09, 2021 at 06:36:44PM +0200, Daniel Vetter wrote:
> > > > > On Tue, Aug 03, 2021 at 03:29:22PM -0700, Matthew Brost wrote:
> > > > > > Display the workqueue status in debugfs for GuC contexts that are in
> > > > > > parent-child relationship.
> > > > > > 
> > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > ---
> > > > > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +++++++++++++------
> > > > > >  1 file changed, 39 insertions(+), 17 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > > index 30df1c8db491..44a7582c9aed 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > > @@ -4527,31 +4527,53 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> > > > > >  		gse_log_submission_info(guc->gse[i], p, i);
> > > > > >  }
> > > > > >  
> > > > > > +static inline void guc_log_context(struct drm_printer *p,
> > > > > > +				   struct intel_context *ce)
> > > > > > +{
> > > > > > +	drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > > > > > +	drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > > > > > +	drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > > > > > +		   ce->ring->head,
> > > > > > +		   ce->lrc_reg_state[CTX_RING_HEAD]);
> > > > > > +	drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > > > > > +		   ce->ring->tail,
> > > > > > +		   ce->lrc_reg_state[CTX_RING_TAIL]);
> > > > > > +	drm_printf(p, "\t\tContext Pin Count: %u\n",
> > > > > > +		   atomic_read(&ce->pin_count));
> > > > > > +	drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > > > > > +		   atomic_read(&ce->guc_id_ref));
> > > > > > +	drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > > > > > +		   atomic_read(&ce->guc_num_rq_not_ready));
> > > > > > +	drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > > > > > +		   ce->guc_state.sched_state,
> > > > > > +		   atomic_read(&ce->guc_sched_state_no_lock));
> > > > > 
> > > > > It's all debugfs, but I think proper locking even there is good. It at
> > > > > least reduces the confusion when the locking scheme is largely
> > > > > undocumented. Also given how much we have rcu for everything would be good
> > > > > to double-check all pointer dererences are properly protected.
> > > > >
> > > > 
> > > > Not sure if I 100% follow this but I don't think any of the pointers
> > > > dref here are RCU protected. Certainly none of the GuC ones are.
> > > > 
> > > > Will double before the next respin though.
> > > > 
> > > > > > +}
> > > > > > +
> > > > > >  void intel_guc_submission_print_context_info(struct intel_guc *guc,
> > > > > >  					     struct drm_printer *p)
> > > > > >  {
> > > > > >  	struct intel_context *ce;
> > > > > >  	unsigned long index;
> > > > > >  	xa_for_each(&guc->context_lookup, index, ce) {
> > > > > 
> > > > > xa_for_each doesn't provide any guarantees, so doesn't protect against
> > > > > concurrent removeal or anything like that. We need to do better than that.
> > > > 
> > > > https://elixir.bootlin.com/linux/latest/source/include/linux/xarray.h#L498
> > > > 'It is safe to modify the array during the iteration.'
> > > 
> > > The xarray. Not the thing you're dereferencing, because the xarray only
> > > stores pointers, not your data structure. So yeah correct statement is
> > > that it doesn't provide you any guarantees beyond "the iterator wont be
> > > confused if the xarray itself is modified during iteration". Which isn't
> > > what you need here, you need a lot more.
> > 
> > Or spelled out: The pointer you get could become immediately meaningless,
> > before you can look at it, due to a concurrent removal/release. All the
> > xa_for_each guarantees you is that on the next round you get the next
> > pointer, until you got them all (plus/minus concurrent changes). But that
> > next pointer could have become meaningless right away too.
> > 
> > So you need your own locking to make use of these pointers you got and
> > make sure they're not immediately meaningless before your loop body even
> > started.
> > 
> 
> Ok, I think I see your point. Likely whenever we do a xa_for_each over
> &guc->context_lookup we should just grab its lock as if it is in the
> xarray we have reference to object looked up. Also everytime we use
> xa_for_each on &guc->context_lookup it is a corner case we it is ok to
> block anyone else from using this (e.g. during a reset, checking
> debugfs, etc...). Does that sound correct?

Yup, generally the simplest is to just hold the lock for the
list/xarray/whatever to keep the object alive. Next up in complexity is to
grab a temporary reference. This is usually required if the next step is
taking a mutex, and your lookup lock is a spinlock. Or if you have some
other locking inversion.

And yes anywhere in debugfs, or anywhere else where performance doesn't
matter just use proper locking, no tricks with rcu or lockless or
whatever.

Finally a word on gpu reset: It is currently not annotated, but gpu reset
is a dma_fence signalling critical section (if we fail to get through gpu
reset dma_fence are potentially stuck). That means any lock you take in
gpu reset is very encumbered, so needs an audit to make sure you're not
creating an inversion anywhere. While I bring this up, I noticed you're
using i915_sw_fence instead of dma_fence directly in a bunch of places in
GuC code. We're kinda aiming to get rid of i915_sw_fence (and maybe move
the remaining useful bits into drivers/dma-buf/), so using less of the
i915-NIH-isms would be really good in general. There's unfortunately way
too much of that too.
-Daniel

> 
> Matt
> 
> > One of the reasons why I think this is so important is that debugfs files
> > nest a lot of loops fairly often, so are good cheat-sheet for the locking
> > if it happens to be undocumented (which also shouldn't be the case). Ofc
> > if there's no locking in debugfs, no cheat-sheet :-)
> > 
> > Cheers, Daniel
> > 
> > > -Daniel
> > > 
> > > > 
> > > > Matt
> > > > 
> > > > > -Daniel
> > > > > 
> > > > > > -		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > > > > > -		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > > > > > -		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > > > > > -			   ce->ring->head,
> > > > > > -			   ce->lrc_reg_state[CTX_RING_HEAD]);
> > > > > > -		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > > > > > -			   ce->ring->tail,
> > > > > > -			   ce->lrc_reg_state[CTX_RING_TAIL]);
> > > > > > -		drm_printf(p, "\t\tContext Pin Count: %u\n",
> > > > > > -			   atomic_read(&ce->pin_count));
> > > > > > -		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > > > > > -			   atomic_read(&ce->guc_id_ref));
> > > > > > -		drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > > > > > -			   atomic_read(&ce->guc_num_rq_not_ready));
> > > > > > -		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > > > > > -			   ce->guc_state.sched_state,
> > > > > > -			   atomic_read(&ce->guc_sched_state_no_lock));
> > > > > > +		GEM_BUG_ON(intel_context_is_child(ce));
> > > > > >  
> > > > > > +		guc_log_context(p, ce);
> > > > > >  		guc_log_context_priority(p, ce);
> > > > > > +
> > > > > > +		if (intel_context_is_parent(ce)) {
> > > > > > +			struct guc_process_desc *desc = __get_process_desc(ce);
> > > > > > +			struct intel_context *child;
> > > > > > +
> > > > > > +			drm_printf(p, "\t\tWQI Head: %u\n",
> > > > > > +				   READ_ONCE(desc->head));
> > > > > > +			drm_printf(p, "\t\tWQI Tail: %u\n",
> > > > > > +				   READ_ONCE(desc->tail));
> > > > > > +			drm_printf(p, "\t\tWQI Status: %u\n\n",
> > > > > > +				   READ_ONCE(desc->wq_status));
> > > > > > +
> > > > > > +			for_each_child(ce, child)
> > > > > > +				guc_log_context(p, child);
> > > > > > +		}
> > > > > >  	}
> > > > > >  }
> > > > > >  
> > > > > > -- 
> > > > > > 2.28.0
> > > > > > 
> > > > > 
> > > > > -- 
> > > > > Daniel Vetter
> > > > > Software Engineer, Intel Corporation
> > > > > http://blog.ffwll.ch
> > > 
> > > -- 
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

Matthew Brost Aug. 11, 2021, 5:35 p.m. UTC | #7

On Wed, Aug 11, 2021 at 12:04:04PM +0200, Daniel Vetter wrote:
> On Tue, Aug 10, 2021 at 05:29:46PM +0000, Matthew Brost wrote:
> > On Tue, Aug 10, 2021 at 11:27:31AM +0200, Daniel Vetter wrote:
> > > On Tue, Aug 10, 2021 at 11:23:39AM +0200, Daniel Vetter wrote:
> > > > On Mon, Aug 09, 2021 at 07:13:11PM +0000, Matthew Brost wrote:
> > > > > On Mon, Aug 09, 2021 at 06:36:44PM +0200, Daniel Vetter wrote:
> > > > > > On Tue, Aug 03, 2021 at 03:29:22PM -0700, Matthew Brost wrote:
> > > > > > > Display the workqueue status in debugfs for GuC contexts that are in
> > > > > > > parent-child relationship.
> > > > > > > 
> > > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > > ---
> > > > > > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +++++++++++++------
> > > > > > >  1 file changed, 39 insertions(+), 17 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > > > index 30df1c8db491..44a7582c9aed 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > > > > @@ -4527,31 +4527,53 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> > > > > > >  		gse_log_submission_info(guc->gse[i], p, i);
> > > > > > >  }
> > > > > > >  
> > > > > > > +static inline void guc_log_context(struct drm_printer *p,
> > > > > > > +				   struct intel_context *ce)
> > > > > > > +{
> > > > > > > +	drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > > > > > > +	drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > > > > > > +	drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > > > > > > +		   ce->ring->head,
> > > > > > > +		   ce->lrc_reg_state[CTX_RING_HEAD]);
> > > > > > > +	drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > > > > > > +		   ce->ring->tail,
> > > > > > > +		   ce->lrc_reg_state[CTX_RING_TAIL]);
> > > > > > > +	drm_printf(p, "\t\tContext Pin Count: %u\n",
> > > > > > > +		   atomic_read(&ce->pin_count));
> > > > > > > +	drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > > > > > > +		   atomic_read(&ce->guc_id_ref));
> > > > > > > +	drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > > > > > > +		   atomic_read(&ce->guc_num_rq_not_ready));
> > > > > > > +	drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > > > > > > +		   ce->guc_state.sched_state,
> > > > > > > +		   atomic_read(&ce->guc_sched_state_no_lock));
> > > > > > 
> > > > > > It's all debugfs, but I think proper locking even there is good. It at
> > > > > > least reduces the confusion when the locking scheme is largely
> > > > > > undocumented. Also given how much we have rcu for everything would be good
> > > > > > to double-check all pointer dererences are properly protected.
> > > > > >
> > > > > 
> > > > > Not sure if I 100% follow this but I don't think any of the pointers
> > > > > dref here are RCU protected. Certainly none of the GuC ones are.
> > > > > 
> > > > > Will double before the next respin though.
> > > > > 
> > > > > > > +}
> > > > > > > +
> > > > > > >  void intel_guc_submission_print_context_info(struct intel_guc *guc,
> > > > > > >  					     struct drm_printer *p)
> > > > > > >  {
> > > > > > >  	struct intel_context *ce;
> > > > > > >  	unsigned long index;
> > > > > > >  	xa_for_each(&guc->context_lookup, index, ce) {
> > > > > > 
> > > > > > xa_for_each doesn't provide any guarantees, so doesn't protect against
> > > > > > concurrent removeal or anything like that. We need to do better than that.
> > > > > 
> > > > > https://elixir.bootlin.com/linux/latest/source/include/linux/xarray.h#L498
> > > > > 'It is safe to modify the array during the iteration.'
> > > > 
> > > > The xarray. Not the thing you're dereferencing, because the xarray only
> > > > stores pointers, not your data structure. So yeah correct statement is
> > > > that it doesn't provide you any guarantees beyond "the iterator wont be
> > > > confused if the xarray itself is modified during iteration". Which isn't
> > > > what you need here, you need a lot more.
> > > 
> > > Or spelled out: The pointer you get could become immediately meaningless,
> > > before you can look at it, due to a concurrent removal/release. All the
> > > xa_for_each guarantees you is that on the next round you get the next
> > > pointer, until you got them all (plus/minus concurrent changes). But that
> > > next pointer could have become meaningless right away too.
> > > 
> > > So you need your own locking to make use of these pointers you got and
> > > make sure they're not immediately meaningless before your loop body even
> > > started.
> > > 
> > 
> > Ok, I think I see your point. Likely whenever we do a xa_for_each over
> > &guc->context_lookup we should just grab its lock as if it is in the
> > xarray we have reference to object looked up. Also everytime we use
> > xa_for_each on &guc->context_lookup it is a corner case we it is ok to
> > block anyone else from using this (e.g. during a reset, checking
> > debugfs, etc...). Does that sound correct?
> 
> Yup, generally the simplest is to just hold the lock for the
> list/xarray/whatever to keep the object alive. Next up in complexity is to
> grab a temporary reference. This is usually required if the next step is
> taking a mutex, and your lookup lock is a spinlock. Or if you have some
> other locking inversion.
> 
> And yes anywhere in debugfs, or anywhere else where performance doesn't
> matter just use proper locking, no tricks with rcu or lockless or
> whatever.
> 
> Finally a word on gpu reset: It is currently not annotated, but gpu reset
> is a dma_fence signalling critical section (if we fail to get through gpu
> reset dma_fence are potentially stuck). That means any lock you take in
> gpu reset is very encumbered, so needs an audit to make sure you're not
> creating an inversion anywhere. While I bring this up, I noticed you're
> using i915_sw_fence instead of dma_fence directly in a bunch of places in
> GuC code. We're kinda aiming to get rid of i915_sw_fence (and maybe move
> the remaining useful bits into drivers/dma-buf/), so using less of the
> i915-NIH-isms would be really good in general. There's unfortunately way
> too much of that too.

Yes, I'm aware of trying to get rid of the i915_sw_fence as a long term
goal. That is going to take quite a while to unwind.

Matt 

> -Daniel
> 
> > 
> > Matt
> > 
> > > One of the reasons why I think this is so important is that debugfs files
> > > nest a lot of loops fairly often, so are good cheat-sheet for the locking
> > > if it happens to be undocumented (which also shouldn't be the case). Ofc
> > > if there's no locking in debugfs, no cheat-sheet :-)
> > > 
> > > Cheers, Daniel
> > > 
> > > > -Daniel
> > > > 
> > > > > 
> > > > > Matt
> > > > > 
> > > > > > -Daniel
> > > > > > 
> > > > > > > -		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > > > > > > -		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > > > > > > -		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > > > > > > -			   ce->ring->head,
> > > > > > > -			   ce->lrc_reg_state[CTX_RING_HEAD]);
> > > > > > > -		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > > > > > > -			   ce->ring->tail,
> > > > > > > -			   ce->lrc_reg_state[CTX_RING_TAIL]);
> > > > > > > -		drm_printf(p, "\t\tContext Pin Count: %u\n",
> > > > > > > -			   atomic_read(&ce->pin_count));
> > > > > > > -		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > > > > > > -			   atomic_read(&ce->guc_id_ref));
> > > > > > > -		drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > > > > > > -			   atomic_read(&ce->guc_num_rq_not_ready));
> > > > > > > -		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > > > > > > -			   ce->guc_state.sched_state,
> > > > > > > -			   atomic_read(&ce->guc_sched_state_no_lock));
> > > > > > > +		GEM_BUG_ON(intel_context_is_child(ce));
> > > > > > >  
> > > > > > > +		guc_log_context(p, ce);
> > > > > > >  		guc_log_context_priority(p, ce);
> > > > > > > +
> > > > > > > +		if (intel_context_is_parent(ce)) {
> > > > > > > +			struct guc_process_desc *desc = __get_process_desc(ce);
> > > > > > > +			struct intel_context *child;
> > > > > > > +
> > > > > > > +			drm_printf(p, "\t\tWQI Head: %u\n",
> > > > > > > +				   READ_ONCE(desc->head));
> > > > > > > +			drm_printf(p, "\t\tWQI Tail: %u\n",
> > > > > > > +				   READ_ONCE(desc->tail));
> > > > > > > +			drm_printf(p, "\t\tWQI Status: %u\n\n",
> > > > > > > +				   READ_ONCE(desc->wq_status));
> > > > > > > +
> > > > > > > +			for_each_child(ce, child)
> > > > > > > +				guc_log_context(p, child);
> > > > > > > +		}
> > > > > > >  	}
> > > > > > >  }
> > > > > > >  
> > > > > > > -- 
> > > > > > > 2.28.0
> > > > > > > 
> > > > > > 
> > > > > > -- 
> > > > > > Daniel Vetter
> > > > > > Software Engineer, Intel Corporation
> > > > > > http://blog.ffwll.ch
> > > > 
> > > > -- 
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> > > 
> > > -- 
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

[25/46] drm/i915/guc: Update debugfs for GuC multi-lrc

Commit Message

Comments

Patch