diff mbox series

[02/22] drm/i915/gem: Specify address type for chained reloc batches

Message ID 20200504044903.7626-2-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [01/22] drm/i915: Allow some leniency in PCU reads | expand

Commit Message

Chris Wilson May 4, 2020, 4:48 a.m. UTC
It is required that a chained batch be in the same address domain as its
parent, and also that must be specified in the command for earlier gen
as it is not inferred from the chaining until gen6.

Fixes: 964a9b0f611e ("drm/i915/gem: Use chained reloc batches")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Tvrtko Ursulin May 4, 2020, 11:49 a.m. UTC | #1
On 04/05/2020 05:48, Chris Wilson wrote:
> It is required that a chained batch be in the same address domain as its
> parent, and also that must be specified in the command for earlier gen
> as it is not inferred from the chaining until gen6.
> 
> Fixes: 964a9b0f611e ("drm/i915/gem: Use chained reloc batches")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index cce7df231cb9..ab0d4df13c0b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -1004,14 +1004,14 @@ static int reloc_gpu_chain(struct reloc_cache *cache)
>   	GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE  / sizeof(u32));
>   	cmd = cache->rq_cmd + cache->rq_size;
>   	*cmd++ = MI_ARB_CHECK;
> -	if (cache->gen >= 8) {
> +	if (cache->gen >= 8)
>   		*cmd++ = MI_BATCH_BUFFER_START_GEN8;
> -		*cmd++ = lower_32_bits(batch->node.start);
> -		*cmd++ = upper_32_bits(batch->node.start);
> -	} else {
> +	else if (cache->gen >= 6)
>   		*cmd++ = MI_BATCH_BUFFER_START;
> -		*cmd++ = lower_32_bits(batch->node.start);
> -	}
> +	else
> +		*cmd++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
> +	*cmd++ = lower_32_bits(batch->node.start);
> +	*cmd++ = upper_32_bits(batch->node.start);

MI_NOOP between batches on gen < 8 ?

Regards,

Tvrtko

>   	i915_gem_object_flush_map(cache->rq_vma->obj);
>   	i915_gem_object_unpin_map(cache->rq_vma->obj);
>   	cache->rq_vma = NULL;
>
Chris Wilson May 4, 2020, 11:53 a.m. UTC | #2
Quoting Tvrtko Ursulin (2020-05-04 12:49:03)
> 
> On 04/05/2020 05:48, Chris Wilson wrote:
> > It is required that a chained batch be in the same address domain as its
> > parent, and also that must be specified in the command for earlier gen
> > as it is not inferred from the chaining until gen6.
> > 
> > Fixes: 964a9b0f611e ("drm/i915/gem: Use chained reloc batches")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++++++------
> >   1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index cce7df231cb9..ab0d4df13c0b 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -1004,14 +1004,14 @@ static int reloc_gpu_chain(struct reloc_cache *cache)
> >       GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE  / sizeof(u32));
> >       cmd = cache->rq_cmd + cache->rq_size;
> >       *cmd++ = MI_ARB_CHECK;
> > -     if (cache->gen >= 8) {
> > +     if (cache->gen >= 8)
> >               *cmd++ = MI_BATCH_BUFFER_START_GEN8;
> > -             *cmd++ = lower_32_bits(batch->node.start);
> > -             *cmd++ = upper_32_bits(batch->node.start);
> > -     } else {
> > +     else if (cache->gen >= 6)
> >               *cmd++ = MI_BATCH_BUFFER_START;
> > -             *cmd++ = lower_32_bits(batch->node.start);
> > -     }
> > +     else
> > +             *cmd++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
> > +     *cmd++ = lower_32_bits(batch->node.start);
> > +     *cmd++ = upper_32_bits(batch->node.start);
> 
> MI_NOOP between batches on gen < 8 ?

batch->node.start is 32b on gen<8, so this is effectively a trailing NOOP
after the BB_START. However, since it is after the 2 dword BB_START on
earlier gen it is never even evaluated by the CS.
-Chris
Tvrtko Ursulin May 4, 2020, 12:15 p.m. UTC | #3
On 04/05/2020 12:53, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-05-04 12:49:03)
>>
>> On 04/05/2020 05:48, Chris Wilson wrote:
>>> It is required that a chained batch be in the same address domain as its
>>> parent, and also that must be specified in the command for earlier gen
>>> as it is not inferred from the chaining until gen6.
>>>
>>> Fixes: 964a9b0f611e ("drm/i915/gem: Use chained reloc batches")
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++++++------
>>>    1 file changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> index cce7df231cb9..ab0d4df13c0b 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> @@ -1004,14 +1004,14 @@ static int reloc_gpu_chain(struct reloc_cache *cache)
>>>        GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE  / sizeof(u32));
>>>        cmd = cache->rq_cmd + cache->rq_size;
>>>        *cmd++ = MI_ARB_CHECK;
>>> -     if (cache->gen >= 8) {
>>> +     if (cache->gen >= 8)
>>>                *cmd++ = MI_BATCH_BUFFER_START_GEN8;
>>> -             *cmd++ = lower_32_bits(batch->node.start);
>>> -             *cmd++ = upper_32_bits(batch->node.start);
>>> -     } else {
>>> +     else if (cache->gen >= 6)
>>>                *cmd++ = MI_BATCH_BUFFER_START;
>>> -             *cmd++ = lower_32_bits(batch->node.start);
>>> -     }
>>> +     else
>>> +             *cmd++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
>>> +     *cmd++ = lower_32_bits(batch->node.start);
>>> +     *cmd++ = upper_32_bits(batch->node.start);
>>
>> MI_NOOP between batches on gen < 8 ?
> 
> batch->node.start is 32b on gen<8, so this is effectively a trailing NOOP
> after the BB_START. However, since it is after the 2 dword BB_START on
> earlier gen it is never even evaluated by the CS.

Right, okay, like I remembered how it worked from last week! :) Best to 
drop a comment still.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index cce7df231cb9..ab0d4df13c0b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1004,14 +1004,14 @@  static int reloc_gpu_chain(struct reloc_cache *cache)
 	GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE  / sizeof(u32));
 	cmd = cache->rq_cmd + cache->rq_size;
 	*cmd++ = MI_ARB_CHECK;
-	if (cache->gen >= 8) {
+	if (cache->gen >= 8)
 		*cmd++ = MI_BATCH_BUFFER_START_GEN8;
-		*cmd++ = lower_32_bits(batch->node.start);
-		*cmd++ = upper_32_bits(batch->node.start);
-	} else {
+	else if (cache->gen >= 6)
 		*cmd++ = MI_BATCH_BUFFER_START;
-		*cmd++ = lower_32_bits(batch->node.start);
-	}
+	else
+		*cmd++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
+	*cmd++ = lower_32_bits(batch->node.start);
+	*cmd++ = upper_32_bits(batch->node.start);
 	i915_gem_object_flush_map(cache->rq_vma->obj);
 	i915_gem_object_unpin_map(cache->rq_vma->obj);
 	cache->rq_vma = NULL;