diff mbox series

[11/46] drm/i915/guc: Don't call switch_to_kernel_context with GuC submission

Message ID 20210803222943.27686-12-matthew.brost@intel.com (mailing list archive)
State New, archived
Headers show
Series Parallel submission aka multi-bb execbuf | expand

Commit Message

Matthew Brost Aug. 3, 2021, 10:29 p.m. UTC
Calling switch_to_kernel_context isn't needed if the engine PM reference
is taken while all contexts are pinned. By not calling
switch_to_kernel_context we save on issuing a request to the engine.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Daniel Vetter Aug. 9, 2021, 2:27 p.m. UTC | #1
On Tue, Aug 03, 2021 at 03:29:08PM -0700, Matthew Brost wrote:
> Calling switch_to_kernel_context isn't needed if the engine PM reference
> is taken while all contexts are pinned. By not calling
> switch_to_kernel_context we save on issuing a request to the engine.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> index 1f07ac4e0672..58099de6bf07 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> @@ -162,6 +162,10 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
>  	unsigned long flags;
>  	bool result = true;
>  
> +	/* No need to switch_to_kernel_context if GuC submission */

Maybe whack a big FIXME on here that we should unravel this properly.
Currently the execlist backend assumptions are leaked all over the place,
leading to stuff like this. Which means extremely fragile code.

I currently don't have a great idea on how exactly we should do that, but
oh well.

btw just in case we ever want to make guc lrc properly evictable (which as
the og use-case for this function, way, way back), would we need to fully
unregister them from guc? At least I'm assuming there's no other trick
like the below one.

Another aside: How does the perf/OA patching work on GuC?

Anyway, patch looks legit:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


> +	if (intel_engine_uses_guc(engine))
> +		return true;
> +
>  	/* GPU is pointing to the void, as good as in the kernel context. */
>  	if (intel_gt_is_wedged(engine->gt))
>  		return true;
> -- 
> 2.28.0
>
Matthew Brost Aug. 9, 2021, 6:20 p.m. UTC | #2
On Mon, Aug 09, 2021 at 04:27:01PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:08PM -0700, Matthew Brost wrote:
> > Calling switch_to_kernel_context isn't needed if the engine PM reference
> > is taken while all contexts are pinned. By not calling
> > switch_to_kernel_context we save on issuing a request to the engine.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > index 1f07ac4e0672..58099de6bf07 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > @@ -162,6 +162,10 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
> >  	unsigned long flags;
> >  	bool result = true;
> >  
> > +	/* No need to switch_to_kernel_context if GuC submission */
> 
> Maybe whack a big FIXME on here that we should unravel this properly.

Sure, can add a FIXME here.

> Currently the execlist backend assumptions are leaked all over the place,
> leading to stuff like this. Which means extremely fragile code.
>

Yes, this something required for execlists implemented in what should be
generic code. 

> I currently don't have a great idea on how exactly we should do that, but
> oh well.

Me either, it will be a process.

> 
> btw just in case we ever want to make guc lrc properly evictable (which as
> the og use-case for this function, way, way back), would we need to fully

Can you explain what you mean by fully evictable? Not getting what you
mean in this context.

> unregister them from guc? At least I'm assuming there's no other trick

If scheduling is disabled on the context (currently done on unpin) you are
free move anything around as the GuC is guaranteed not to touch the
context state. If on re-pin something has moved (e.g. the LRC vaddr is
different), you need to unregister and re-register the context with the
GuC.

> like the below one.
> 
> Another aside: How does the perf/OA patching work on GuC?
>

Not my area of expertise but perf somewhat a WIP. The plan is for the
GuC to write out some stats to HWSP I think? John Harrison is working to
get this fully implemented.

OA is working afaik, with Umesh Nerlige Ramappa being the expert here.

Matt

> Anyway, patch looks legit:
> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> 
> > +	if (intel_engine_uses_guc(engine))
> > +		return true;
> > +
> >  	/* GPU is pointing to the void, as good as in the kernel context. */
> >  	if (intel_gt_is_wedged(engine->gt))
> >  		return true;
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter Aug. 10, 2021, 6:47 a.m. UTC | #3
On Mon, Aug 09, 2021 at 06:20:51PM +0000, Matthew Brost wrote:
> On Mon, Aug 09, 2021 at 04:27:01PM +0200, Daniel Vetter wrote:
> > On Tue, Aug 03, 2021 at 03:29:08PM -0700, Matthew Brost wrote:
> > > Calling switch_to_kernel_context isn't needed if the engine PM reference
> > > is taken while all contexts are pinned. By not calling
> > > switch_to_kernel_context we save on issuing a request to the engine.
> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > > index 1f07ac4e0672..58099de6bf07 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > > @@ -162,6 +162,10 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
> > >  	unsigned long flags;
> > >  	bool result = true;
> > >  
> > > +	/* No need to switch_to_kernel_context if GuC submission */
> > 
> > Maybe whack a big FIXME on here that we should unravel this properly.
> 
> Sure, can add a FIXME here.
> 
> > Currently the execlist backend assumptions are leaked all over the place,
> > leading to stuff like this. Which means extremely fragile code.
> >
> 
> Yes, this something required for execlists implemented in what should be
> generic code. 
> 
> > I currently don't have a great idea on how exactly we should do that, but
> > oh well.
> 
> Me either, it will be a process.
> 
> > 
> > btw just in case we ever want to make guc lrc properly evictable (which as
> > the og use-case for this function, way, way back), would we need to fully
> 
> Can you explain what you mean by fully evictable? Not getting what you
> mean in this context.
> 
> > unregister them from guc? At least I'm assuming there's no other trick
> 
> If scheduling is disabled on the context (currently done on unpin) you are
> free move anything around as the GuC is guaranteed not to touch the
> context state. If on re-pin something has moved (e.g. the LRC vaddr is
> different), you need to unregister and re-register the context with the
> GuC.

So at that point GuC also guarantees that it's not left in the hw engine?
Execlist has this barrier request to fully unload the ctx from the hw, and
that's also why I cam on the topic of OA.

> > like the below one.
> > 
> > Another aside: How does the perf/OA patching work on GuC?
> >
> 
> Not my area of expertise but perf somewhat a WIP. The plan is for the
> GuC to write out some stats to HWSP I think? John Harrison is working to
> get this fully implemented.
> 
> OA is working afaik, with Umesh Nerlige Ramappa being the expert here.

I think it's OA that I'm thinking of here: We have code in i915_perf.c to
patch all the ctx currently in the system, so that they have a consistent
OA config. That's also relying on this barrier stuff, and I was wondering
how that will work with GuC.
-Daniel

> 
> Matt
> 
> > Anyway, patch looks legit:
> > 
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > 
> > 
> > > +	if (intel_engine_uses_guc(engine))
> > > +		return true;
> > > +
> > >  	/* GPU is pointing to the void, as good as in the kernel context. */
> > >  	if (intel_gt_is_wedged(engine->gt))
> > >  		return true;
> > > -- 
> > > 2.28.0
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
Matthew Brost Aug. 11, 2021, 5:47 p.m. UTC | #4
On Tue, Aug 10, 2021 at 08:47:10AM +0200, Daniel Vetter wrote:
> On Mon, Aug 09, 2021 at 06:20:51PM +0000, Matthew Brost wrote:
> > On Mon, Aug 09, 2021 at 04:27:01PM +0200, Daniel Vetter wrote:
> > > On Tue, Aug 03, 2021 at 03:29:08PM -0700, Matthew Brost wrote:
> > > > Calling switch_to_kernel_context isn't needed if the engine PM reference
> > > > is taken while all contexts are pinned. By not calling
> > > > switch_to_kernel_context we save on issuing a request to the engine.
> > > > 
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > > > index 1f07ac4e0672..58099de6bf07 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > > > @@ -162,6 +162,10 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
> > > >  	unsigned long flags;
> > > >  	bool result = true;
> > > >  
> > > > +	/* No need to switch_to_kernel_context if GuC submission */
> > > 
> > > Maybe whack a big FIXME on here that we should unravel this properly.
> > 
> > Sure, can add a FIXME here.
> > 
> > > Currently the execlist backend assumptions are leaked all over the place,
> > > leading to stuff like this. Which means extremely fragile code.
> > >
> > 
> > Yes, this something required for execlists implemented in what should be
> > generic code. 
> > 
> > > I currently don't have a great idea on how exactly we should do that, but
> > > oh well.
> > 
> > Me either, it will be a process.
> > 
> > > 
> > > btw just in case we ever want to make guc lrc properly evictable (which as
> > > the og use-case for this function, way, way back), would we need to fully
> > 
> > Can you explain what you mean by fully evictable? Not getting what you
> > mean in this context.
> > 
> > > unregister them from guc? At least I'm assuming there's no other trick
> > 
> > If scheduling is disabled on the context (currently done on unpin) you are
> > free move anything around as the GuC is guaranteed not to touch the
> > context state. If on re-pin something has moved (e.g. the LRC vaddr is
> > different), you need to unregister and re-register the context with the
> > GuC.
> 
> So at that point GuC also guarantees that it's not left in the hw engine?
> Execlist has this barrier request to fully unload the ctx from the hw, and
> that's also why I cam on the topic of OA.
> 
> > > like the below one.
> > > 
> > > Another aside: How does the perf/OA patching work on GuC?
> > >
> > 
> > Not my area of expertise but perf somewhat a WIP. The plan is for the
> > GuC to write out some stats to HWSP I think? John Harrison is working to
> > get this fully implemented.
> > 
> > OA is working afaik, with Umesh Nerlige Ramappa being the expert here.
> 
> I think it's OA that I'm thinking of here: We have code in i915_perf.c to
> patch all the ctx currently in the system, so that they have a consistent
> OA config. That's also relying on this barrier stuff, and I was wondering
> how that will work with GuC.
> -Daniel
> 

Not an OA expert at all but glanced at the code I don't see anything in
there that prevents it working with GuC submission. We certainly have
this working internally. If you have questions about this I'd reach out to
Umesh Nerlige Ramappa as he likely has the answers.

Matt

> > 
> > Matt
> > 
> > > Anyway, patch looks legit:
> > > 
> > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > 
> > > 
> > > > +	if (intel_engine_uses_guc(engine))
> > > > +		return true;
> > > > +
> > > >  	/* GPU is pointing to the void, as good as in the kernel context. */
> > > >  	if (intel_gt_is_wedged(engine->gt))
> > > >  		return true;
> > > > -- 
> > > > 2.28.0
> > > > 
> > > 
> > > -- 
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 1f07ac4e0672..58099de6bf07 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -162,6 +162,10 @@  static bool switch_to_kernel_context(struct intel_engine_cs *engine)
 	unsigned long flags;
 	bool result = true;
 
+	/* No need to switch_to_kernel_context if GuC submission */
+	if (intel_engine_uses_guc(engine))
+		return true;
+
 	/* GPU is pointing to the void, as good as in the kernel context. */
 	if (intel_gt_is_wedged(engine->gt))
 		return true;