
drm/i915/cmdparser: Limit clflush to active cachelines

Message ID 20170310094212.3496-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson March 10, 2017, 9:42 a.m. UTC
We only need to clflush those cachelines that we have validated to be
read by the GPU. Userspace typically fills in the batch length correctly;
the exceptions tend to be explicit tests within igt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Mika Kuoppala March 10, 2017, 9:58 a.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> [...]
> @@ -1331,7 +1331,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
>  	}
>  
>  	if (ret == 0 && needs_clflush_after)
> -		drm_clflush_virt_range(shadow_batch_obj->mm.mapping, batch_len);
> +		drm_clflush_virt_range(shadow_batch_obj->mm.mapping,
> +				       (void *)cmd - shadow_batch_obj->mm.mapping);

(void *)cmd - ptr_mask_bits(shadow_batch_obj->mm.mapping)

We get away with it as the wb mapping tag is zero, but mask for correctness.

-Mika

Chris Wilson March 10, 2017, 10:04 a.m. UTC | #2
On Fri, Mar 10, 2017 at 11:58:44AM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > [...]
> 
> (void *)cmd - ptr_mask_bits(shadow_batch_obj->mm.mapping)
> 
> We get away with it as the wb mapping tag is zero, but mask for correctness.

The low bits of mm.mapping cannot change the cacheline and so don't
affect the clflush. Same as before.
-Chris
Chris Wilson March 10, 2017, 10:19 a.m. UTC | #3
On Fri, Mar 10, 2017 at 10:04:16AM +0000, Chris Wilson wrote:
> On Fri, Mar 10, 2017 at 11:58:44AM +0200, Mika Kuoppala wrote:
> > [...]
> > 
> > (void *)cmd - ptr_mask_bits(shadow_batch_obj->mm.mapping)
> > 
> > We get away with it as the wb mapping tag is zero, but mask for correctness.
> 
> The low bits of mm.mapping cannot change the cacheline and so don't
> affect the clflush. Same as before.

Although it's not quite the same as before: previously we didn't use a
fixed end-point, and so the flush could overshoot by a cacheline, even
across a page boundary.
-Chris
Mika Kuoppala March 10, 2017, 10:26 a.m. UTC | #4
Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Fri, Mar 10, 2017 at 10:04:16AM +0000, Chris Wilson wrote:
>> On Fri, Mar 10, 2017 at 11:58:44AM +0200, Mika Kuoppala wrote:
>> > Chris Wilson <chris@chris-wilson.co.uk> writes:
>> > 
>> > > [...]
>> > 
>> > (void *)cmd - ptr_mask_bits(shadow_batch_obj->mm.mapping)
>> > 
>> > We get away with it as the wb mapping tag is zero, but mask for correctness.
>> 
>> The low bits of mm.mapping cannot change the cacheline and so don't
>> affect the clflush. Same as before.
>
> Although it's not quite the same as before: previously we didn't use a
> fixed end-point, and so the flush could overshoot by a cacheline, even
> across a page boundary.

And there could be PAGE_MASK worth of low bits.
-Mika

Chris Wilson March 10, 2017, 10:39 a.m. UTC | #5
On Fri, Mar 10, 2017 at 12:26:10PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > [...]
> 
> And there could be PAGE_MASK worth of low bits.

There can? I wasn't anticipating adding a few thousand memory types and
then sharing a single cache. :-p
-Chris
Mika Kuoppala March 10, 2017, 10:42 a.m. UTC | #6
Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Fri, Mar 10, 2017 at 12:26:10PM +0200, Mika Kuoppala wrote:
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > [...]
>> 
>> And there could be PAGE_MASK worth of low bits.
>
> There can? Wasn't anticipating adding a few thousand memory types, and
> then sharing a single cache. :-p

#define ptr_mask_bits(ptr) ({					\
	unsigned long __v = (unsigned long)(ptr);	\
	(typeof(ptr))(__v & PAGE_MASK);			\
})

Chris Wilson March 10, 2017, 10:50 a.m. UTC | #7
On Fri, Mar 10, 2017 at 12:42:34PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > [...]
> 
> #define ptr_mask_bits(ptr) ({					\
> 	unsigned long __v = (unsigned long)(ptr);	\
> 	(typeof(ptr))(__v & PAGE_MASK);			\
> })

Yes, I didn't name that function very well. The low bits of
obj->mm.mapping currently represent an index into the cache.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 21b1cd917d81..b9ce9a6881ea 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1331,7 +1331,8 @@  int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 	}
 
 	if (ret == 0 && needs_clflush_after)
-		drm_clflush_virt_range(shadow_batch_obj->mm.mapping, batch_len);
+		drm_clflush_virt_range(shadow_batch_obj->mm.mapping,
+				       (void *)cmd - shadow_batch_obj->mm.mapping);
 	i915_gem_object_unpin_map(shadow_batch_obj);
 
 	return ret;