[v2,1/7] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

Message ID	1432922605-5893-2-git-send-email-arun.siluvery@linux.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Arun Siluvery <arun.siluvery@linux.intel.com> To: intel-gfx@lists.freedesktop.org Date: Fri, 29 May 2015 19:03:19 +0100 Message-Id: <1432922605-5893-2-git-send-email-arun.siluvery@linux.intel.com> In-Reply-To: <1432922605-5893-1-git-send-email-arun.siluvery@linux.intel.com> References: <1432922605-5893-1-git-send-email-arun.siluvery@linux.intel.com> Cc: Namrta <namrta.salonie@intel.com> Subject: [Intel-gfx] [PATCH v2 1/7] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

arun.siluvery@linux.intel.com May 29, 2015, 6:03 p.m. UTC

This patch adds functions to setup WA batch buffers but they are not yet
enabled in this patch. Some of the WA are to be applied during context save
but before restore and some at the end of context save/restore but before
executing the instructions in the ring, WA batch buffers are created for
this purpose and these WA cannot be applied using normal means.

Signed-off-by: Namrta <namrta.salonie@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |   3 ++
 drivers/gpu/drm/i915/intel_lrc.c | 101 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+)

Chris Wilson May 29, 2015, 6:16 p.m. UTC | #1

On Fri, May 29, 2015 at 07:03:19PM +0100, Arun Siluvery wrote:
> This patch adds functions to setup WA batch buffers but they are not yet
> enabled in this patch. Some of the WA are to be applied during context save
> but before restore and some at the end of context save/restore but before
> executing the instructions in the ring, WA batch buffers are created for
> this purpose and these WA cannot be applied using normal means.
> 
> Signed-off-by: Namrta <namrta.salonie@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h  |   3 ++
>  drivers/gpu/drm/i915/intel_lrc.c | 101 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 104 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 731b5ce..dd4b31d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -814,6 +814,9 @@ struct intel_context {
>  
>  	/* Execlists */
>  	bool rcs_initialized;
> +	struct intel_ringbuffer *indirect_ctx_wa_bb;
> +	struct intel_ringbuffer *per_ctx_wa_bb;

Eh? They are only command sequences whose starting addresses you encode
into the execlists context. Why have you allocated a ringbuf not an
object? Why have you allocated 2 pages when you only need one, and could
even find space elsewhere in the context....

And these should be pinned alongside the context *not permanently*.

I want a debug mode that limits us to say 16M of GGTT space so that
these address space leaks are easier to demonstrate in practice.
-Chris

arun.siluvery@linux.intel.com June 1, 2015, 10:01 a.m. UTC | #2

On 29/05/2015 19:16, Chris Wilson wrote:
> On Fri, May 29, 2015 at 07:03:19PM +0100, Arun Siluvery wrote:
>> This patch adds functions to setup WA batch buffers but they are not yet
>> enabled in this patch. Some of the WA are to be applied during context save
>> but before restore and some at the end of context save/restore but before
>> executing the instructions in the ring, WA batch buffers are created for
>> this purpose and these WA cannot be applied using normal means.
>>
>> Signed-off-by: Namrta <namrta.salonie@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h  |   3 ++
>>   drivers/gpu/drm/i915/intel_lrc.c | 101 +++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 104 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 731b5ce..dd4b31d 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -814,6 +814,9 @@ struct intel_context {
>>
>>   	/* Execlists */
>>   	bool rcs_initialized;
>> +	struct intel_ringbuffer *indirect_ctx_wa_bb;
>> +	struct intel_ringbuffer *per_ctx_wa_bb;
>
> Eh? They are only command sequences whose starting addresses you encode
> into the execlists context. Why have you allocated a ringbuf not an
> object? Why have you allocated 2 pages when you only need one, and could
> even find space elsewhere in the context....

ringbuf is only used so that I can use logical_ring_*(), object can also 
be used.
Single page is enough but since we have two batch buffers and need to 
provide offsets in two different registers, two pages are used for 
simplifying things, I guess we can manage with single page, I will try this.

Your idea of using space in context itself simplifies many things but 
the context size varies across Gens, is it safe to pick last page or 
increase the size by one more page and use that to load these 
instructions? I think using an additional page is safe to avoid the risk 
of HW overwriting that page or do you have any other recommendation? I 
will first try and see if it works.

>
> And these should be pinned alongside the context *not permanently*.
right, I will correct this but this won't be required if we use the 
space in context.

>
> I want a debug mode that limits us to say 16M of GGTT space so that
> these address space leaks are easier to demonstrate in practice.
> -Chris
>

regards
Arun

Chris Wilson June 1, 2015, 10:07 a.m. UTC | #3

On Mon, Jun 01, 2015 at 11:01:08AM +0100, Siluvery, Arun wrote:
> On 29/05/2015 19:16, Chris Wilson wrote:
> >On Fri, May 29, 2015 at 07:03:19PM +0100, Arun Siluvery wrote:
> >>This patch adds functions to setup WA batch buffers but they are not yet
> >>enabled in this patch. Some of the WA are to be applied during context save
> >>but before restore and some at the end of context save/restore but before
> >>executing the instructions in the ring, WA batch buffers are created for
> >>this purpose and these WA cannot be applied using normal means.
> >>
> >>Signed-off-by: Namrta <namrta.salonie@intel.com>
> >>Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>---
> >>  drivers/gpu/drm/i915/i915_drv.h  |   3 ++
> >>  drivers/gpu/drm/i915/intel_lrc.c | 101 +++++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 104 insertions(+)
> >>
> >>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>index 731b5ce..dd4b31d 100644
> >>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>@@ -814,6 +814,9 @@ struct intel_context {
> >>
> >>  	/* Execlists */
> >>  	bool rcs_initialized;
> >>+	struct intel_ringbuffer *indirect_ctx_wa_bb;
> >>+	struct intel_ringbuffer *per_ctx_wa_bb;
> >
> >Eh? They are only command sequences whose starting addresses you encode
> >into the execlists context. Why have you allocated a ringbuf not an
> >object? Why have you allocated 2 pages when you only need one, and could
> >even find space elsewhere in the context....
> 
> ringbuf is only used so that I can use logical_ring_*(), object can
> also be used.
> Single page is enough but since we have two batch buffers and need
> to provide offsets in two different registers, two pages are used
> for simplifying things, I guess we can manage with single page, I
> will try this.
> 
> Your idea of using space in context itself simplifies many things
> but the context size varies across Gens, is it safe to pick last
> page or increase the size by one more page and use that to load
> these instructions? I think using an additional page is safe to
> avoid the risk of HW overwriting that page or do you have any other
> recommendation? I will first try and see if it works.
> 
> >
> >And these should be pinned alongside the context *not permanently*.
> right, I will correct this but this won't be required if we use the
> space in context.

Indeed, allocating an extra scratch page in the context would simplify
vma/mm management. A trick might be to allocate the scratch page at the
start, then offset the lrc regs etc - that would then be consistent
amongst gen and be easy enough to extend if we need more per-context
scratch space in future.
-Chris

Thomas Daniel June 1, 2015, 10:22 a.m. UTC | #4

> 

> Indeed, allocating an extra scratch page in the context would simplify

> vma/mm management. A trick might be to allocate the scratch page at the

> start, then offset the lrc regs etc - that would then be consistent

> amongst gen and be easy enough to extend if we need more per-context

> scratch space in future.

> -Chris


Yes, I think we already have another use for more per-context space at the start.  The GuC is planning to do this.  Arun, you probably should work with Alex Dai and Dave Gordon to avoid conflicts here.

Thomas.

arun.siluvery@linux.intel.com June 2, 2015, 6:36 p.m. UTC | #5

On 01/06/2015 11:22, Daniel, Thomas wrote:
>>
>> Indeed, allocating an extra scratch page in the context would simplify
>> vma/mm management. A trick might be to allocate the scratch page at the
>> start, then offset the lrc regs etc - that would then be consistent
>> amongst gen and be easy enough to extend if we need more per-context
>> scratch space in future.
>> -Chris
>
> Yes, I think we already have another use for more per-context space at the start.  The GuC is planning to do this.  Arun, you probably should work with Alex Dai and Dave Gordon to avoid conflicts here.
>
> Thomas.
>

Thanks for the heads-up Thomas.
I have discussed with Dave and agreed to share this page;
GuC probably doesn't need whole page so first half is reserved for it's 
use and second half is used for WA.

I have modified my patches to use context page for applying these WA and 
don't see any issues.

During the discussions Dave proposed another approach. Even though these 
WA are called per context they are only initialized once and not changed 
afterwards, same set of WA are applied for each context so instead of 
adding them in each context, does it make sense to create a separate 
page and share across all contexts? but ofcourse GuC will anyway add a 
new page to context so I might as well share that page.

Chris/Dave, do you see any problems with sharing page with GuC or you 
prefer to allocate a separate page for these WA and share across all 
contexts? Please give your comments.

regards
Arun

Dave Gordon June 2, 2015, 6:47 p.m. UTC | #6

On 02/06/15 19:36, Siluvery, Arun wrote:
> On 01/06/2015 11:22, Daniel, Thomas wrote:
>>>
>>> Indeed, allocating an extra scratch page in the context would simplify
>>> vma/mm management. A trick might be to allocate the scratch page at the
>>> start, then offset the lrc regs etc - that would then be consistent
>>> amongst gen and be easy enough to extend if we need more per-context
>>> scratch space in future.
>>> -Chris
>>
>> Yes, I think we already have another use for more per-context space at
>> the start.  The GuC is planning to do this.  Arun, you probably should
>> work with Alex Dai and Dave Gordon to avoid conflicts here.
>>
>> Thomas.
>>
> 
> Thanks for the heads-up Thomas.
> I have discussed with Dave and agreed to share this page;
> GuC probably doesn't need whole page so first half is reserved for it's
> use and second half is used for WA.
> 
> I have modified my patches to use context page for applying these WA and
> don't see any issues.
> 
> During the discussions Dave proposed another approach. Even though these
> WA are called per context they are only initialized once and not changed
> afterwards, same set of WA are applied for each context so instead of
> adding them in each context, does it make sense to create a separate
> page and share across all contexts? but of course GuC will anyway add a
> new page to context so I might as well share that page.
> 
> Chris/Dave, do you see any problems with sharing page with GuC or you
> prefer to allocate a separate page for these WA and share across all
> contexts? Please give your comments.
> 
> regards
> Arun

I think we have to consider which is more future-proof i.e. which is
least likely:
(1) the area shared with the GuC grows (definitely still in flux), or
(2) workarounds need to be context-specific (possible, but unlikely)

So I'd prefer a single area set up just once to contain the pre- and
post-context restore workaround batches. If necessary, the one area
could contain multiple batches at different offsets, so we could point
different contexts at different (shared) batches as required. I think
they're unlikely to actually need per-context customisation[*], but
there might be a need for different workarounds according to workload
type or privilege level or some other criterion ... ?

.Dave.

[*] unless they need per-context memory addresses coded into them?

arun.siluvery@linux.intel.com June 4, 2015, 2:30 p.m. UTC | #7

On 02/06/2015 19:47, Dave Gordon wrote:
> On 02/06/15 19:36, Siluvery, Arun wrote:
>> On 01/06/2015 11:22, Daniel, Thomas wrote:
>>>>
>>>> Indeed, allocating an extra scratch page in the context would simplify
>>>> vma/mm management. A trick might be to allocate the scratch page at the
>>>> start, then offset the lrc regs etc - that would then be consistent
>>>> amongst gen and be easy enough to extend if we need more per-context
>>>> scratch space in future.
>>>> -Chris
>>>
>>> Yes, I think we already have another use for more per-context space at
>>> the start.  The GuC is planning to do this.  Arun, you probably should
>>> work with Alex Dai and Dave Gordon to avoid conflicts here.
>>>
>>> Thomas.
>>>
>>
>> Thanks for the heads-up Thomas.
>> I have discussed with Dave and agreed to share this page;
>> GuC probably doesn't need whole page so first half is reserved for it's
>> use and second half is used for WA.
>>
>> I have modified my patches to use context page for applying these WA and
>> don't see any issues.
>>
>> During the discussions Dave proposed another approach. Even though these
>> WA are called per context they are only initialized once and not changed
>> afterwards, same set of WA are applied for each context so instead of
>> adding them in each context, does it make sense to create a separate
>> page and share across all contexts? but of course GuC will anyway add a
>> new page to context so I might as well share that page.
>>
>> Chris/Dave, do you see any problems with sharing page with GuC or you
>> prefer to allocate a separate page for these WA and share across all
>> contexts? Please give your comments.
>>
>> regards
>> Arun
>
> I think we have to consider which is more future-proof i.e. which is
> least likely:
> (1) the area shared with the GuC grows (definitely still in flux), or
> (2) workarounds need to be context-specific (possible, but unlikely)
>
> So I'd prefer a single area set up just once to contain the pre- and
> post-context restore workaround batches. If necessary, the one area
> could contain multiple batches at different offsets, so we could point
> different contexts at different (shared) batches as required. I think
> they're unlikely to actually need per-context customisation[*], but
> there might be a need for different workarounds according to workload
> type or privilege level or some other criterion ... ?
>
> .Dave.
>
> [*] unless they need per-context memory addresses coded into them?
>

Considering these WA are initialized only once and not changed 
afterwards and GuC area probably grows in future which may run into the 
space used by WA, independent single area setup makes senses.
I also checked spec and it is not clear whether any customization is 
going to be required for different contexts.
I have modified patches to setup a single page with WA when 
default_context is initialized and this is used by all contexts.

I will send patches but please let me know if there are any other comments.

regards
Arun
>
>

Daniel Vetter June 15, 2015, 10:40 a.m. UTC | #8

On Fri, May 29, 2015 at 07:16:38PM +0100, Chris Wilson wrote:
> On Fri, May 29, 2015 at 07:03:19PM +0100, Arun Siluvery wrote:
> > This patch adds functions to setup WA batch buffers but they are not yet
> > enabled in this patch. Some of the WA are to be applied during context save
> > but before restore and some at the end of context save/restore but before
> > executing the instructions in the ring, WA batch buffers are created for
> > this purpose and these WA cannot be applied using normal means.
> > 
> > Signed-off-by: Namrta <namrta.salonie@intel.com>
> > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h  |   3 ++
> >  drivers/gpu/drm/i915/intel_lrc.c | 101 +++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 104 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 731b5ce..dd4b31d 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -814,6 +814,9 @@ struct intel_context {
> >  
> >  	/* Execlists */
> >  	bool rcs_initialized;
> > +	struct intel_ringbuffer *indirect_ctx_wa_bb;
> > +	struct intel_ringbuffer *per_ctx_wa_bb;
> 
> Eh? They are only command sequences whose starting addresses you encode
> into the execlists context. Why have you allocated a ringbuf not an
> object? Why have you allocated 2 pages when you only need one, and could
> even find space elsewhere in the context....
> 
> And these should be pinned alongside the context *not permanently*.
> 
> I want a debug mode that limits us to say 16M of GGTT space so that
> these address space leaks are easier to demonstrate in practice.

Yeah the "fix up execlist context to use active list tracking" series
is still pending too ... And artificially limiting ggtt space would indeed
be a neat debug trick. Maybe we'd need to have different limits for
display and other buffers just to avoid making 4k completely unuseable
with this debug knob enabled.
-Daniel

Daniel Vetter June 15, 2015, 10:41 a.m. UTC | #9

On Thu, Jun 04, 2015 at 03:30:56PM +0100, Siluvery, Arun wrote:
> On 02/06/2015 19:47, Dave Gordon wrote:
> >On 02/06/15 19:36, Siluvery, Arun wrote:
> >>On 01/06/2015 11:22, Daniel, Thomas wrote:
> >>>>
> >>>>Indeed, allocating an extra scratch page in the context would simplify
> >>>>vma/mm management. A trick might be to allocate the scratch page at the
> >>>>start, then offset the lrc regs etc - that would then be consistent
> >>>>amongst gen and be easy enough to extend if we need more per-context
> >>>>scratch space in future.
> >>>>-Chris
> >>>
> >>>Yes, I think we already have another use for more per-context space at
> >>>the start.  The GuC is planning to do this.  Arun, you probably should
> >>>work with Alex Dai and Dave Gordon to avoid conflicts here.
> >>>
> >>>Thomas.
> >>>
> >>
> >>Thanks for the heads-up Thomas.
> >>I have discussed with Dave and agreed to share this page;
> >>GuC probably doesn't need whole page so first half is reserved for it's
> >>use and second half is used for WA.
> >>
> >>I have modified my patches to use context page for applying these WA and
> >>don't see any issues.
> >>
> >>During the discussions Dave proposed another approach. Even though these
> >>WA are called per context they are only initialized once and not changed
> >>afterwards, same set of WA are applied for each context so instead of
> >>adding them in each context, does it make sense to create a separate
> >>page and share across all contexts? but of course GuC will anyway add a
> >>new page to context so I might as well share that page.
> >>
> >>Chris/Dave, do you see any problems with sharing page with GuC or you
> >>prefer to allocate a separate page for these WA and share across all
> >>contexts? Please give your comments.
> >>
> >>regards
> >>Arun
> >
> >I think we have to consider which is more future-proof i.e. which is
> >least likely:
> >(1) the area shared with the GuC grows (definitely still in flux), or
> >(2) workarounds need to be context-specific (possible, but unlikely)
> >
> >So I'd prefer a single area set up just once to contain the pre- and
> >post-context restore workaround batches. If necessary, the one area
> >could contain multiple batches at different offsets, so we could point
> >different contexts at different (shared) batches as required. I think
> >they're unlikely to actually need per-context customisation[*], but
> >there might be a need for different workarounds according to workload
> >type or privilege level or some other criterion ... ?
> >
> >.Dave.
> >
> >[*] unless they need per-context memory addresses coded into them?
> >
> 
> Considering these WA are initialized only once and not changed afterwards
> and GuC area probably grows in future which may run into the space used by
> WA, independent single area setup makes senses.
> I also checked spec and it is not clear whether any customization is going
> to be required for different contexts.
> I have modified patches to setup a single page with WA when default_context
> is initialized and this is used by all contexts.
> 
> I will send patches but please let me know if there are any other comments.

Yeah if the wa batches aren't ctx specific, then there's really no need to
allocate one of them per ctx. One global buffer with all the wa combined
should really be all we need.
-Daniel

arun.siluvery@linux.intel.com June 15, 2015, 2:14 p.m. UTC | #10

On 15/06/2015 11:41, Daniel Vetter wrote:
> On Thu, Jun 04, 2015 at 03:30:56PM +0100, Siluvery, Arun wrote:
>> On 02/06/2015 19:47, Dave Gordon wrote:
>>> On 02/06/15 19:36, Siluvery, Arun wrote:
>>>> On 01/06/2015 11:22, Daniel, Thomas wrote:
>>>>>>
>>>>>> Indeed, allocating an extra scratch page in the context would simplify
>>>>>> vma/mm management. A trick might be to allocate the scratch page at the
>>>>>> start, then offset the lrc regs etc - that would then be consistent
>>>>>> amongst gen and be easy enough to extend if we need more per-context
>>>>>> scratch space in future.
>>>>>> -Chris
>>>>>
>>>>> Yes, I think we already have another use for more per-context space at
>>>>> the start.  The GuC is planning to do this.  Arun, you probably should
>>>>> work with Alex Dai and Dave Gordon to avoid conflicts here.
>>>>>
>>>>> Thomas.
>>>>>
>>>>
>>>> Thanks for the heads-up Thomas.
>>>> I have discussed with Dave and agreed to share this page;
>>>> GuC probably doesn't need whole page so first half is reserved for it's
>>>> use and second half is used for WA.
>>>>
>>>> I have modified my patches to use context page for applying these WA and
>>>> don't see any issues.
>>>>
>>>> During the discussions Dave proposed another approach. Even though these
>>>> WA are called per context they are only initialized once and not changed
>>>> afterwards, same set of WA are applied for each context so instead of
>>>> adding them in each context, does it make sense to create a separate
>>>> page and share across all contexts? but of course GuC will anyway add a
>>>> new page to context so I might as well share that page.
>>>>
>>>> Chris/Dave, do you see any problems with sharing page with GuC or you
>>>> prefer to allocate a separate page for these WA and share across all
>>>> contexts? Please give your comments.
>>>>
>>>> regards
>>>> Arun
>>>
>>> I think we have to consider which is more future-proof i.e. which is
>>> least likely:
>>> (1) the area shared with the GuC grows (definitely still in flux), or
>>> (2) workarounds need to be context-specific (possible, but unlikely)
>>>
>>> So I'd prefer a single area set up just once to contain the pre- and
>>> post-context restore workaround batches. If necessary, the one area
>>> could contain multiple batches at different offsets, so we could point
>>> different contexts at different (shared) batches as required. I think
>>> they're unlikely to actually need per-context customisation[*], but
>>> there might be a need for different workarounds according to workload
>>> type or privilege level or some other criterion ... ?
>>>
>>> .Dave.
>>>
>>> [*] unless they need per-context memory addresses coded into them?
>>>
>>
>> Considering these WA are initialized only once and not changed afterwards
>> and GuC area probably grows in future which may run into the space used by
>> WA, independent single area setup makes senses.
>> I also checked spec and it is not clear whether any customization is going
>> to be required for different contexts.
>> I have modified patches to setup a single page with WA when default_context
>> is initialized and this is used by all contexts.
>>
>> I will send patches but please let me know if there are any other comments.
>
> Yeah if the wa batches aren't ctx specific, then there's really no need to
> allocate one of them per ctx. One global buffer with all the wa combined
> should really be all we need.
> -Daniel
>
Hi Daniel,

Agree, this is already taken into account in the next revision v3 
(already sent to mailing list).

I can see you are still going through the list but when you get there, 
please let me know if you have any other comments.

regards
Arun

[v2,1/7] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

Commit Message

Comments

Patch