Message ID | 20170310094212.3496-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Chris Wilson <chris@chris-wilson.co.uk> writes: > We only need to clflush those cachelines that we have validated to be > read by the GPU. Userspace typically fills the batch length in > correctly, the exceptions tend to be explicit tests within igt. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_cmd_parser.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c > index 21b1cd917d81..b9ce9a6881ea 100644 > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c > @@ -1331,7 +1331,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, > } > > if (ret == 0 && needs_clflush_after) > - drm_clflush_virt_range(shadow_batch_obj->mm.mapping, batch_len); > + drm_clflush_virt_range(shadow_batch_obj->mm.mapping, > + (void *)cmd - shadow_batch_obj->mm.mapping); (void *)cmd - ptr_mask_bit(shadow_batch_obj->mm.mapping) We get away as the wb mapping being zero but for correctness. -Mika > i915_gem_object_unpin_map(shadow_batch_obj); > > return ret; > -- > 2.11.0 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Mar 10, 2017 at 11:58:44AM +0200, Mika Kuoppala wrote: > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > We only need to clflush those cachelines that we have validated to be > > read by the GPU. Userspace typically fills the batch length in > > correctly, the exceptions tend to be explicit tests within igt. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/i915_cmd_parser.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c > > index 21b1cd917d81..b9ce9a6881ea 100644 > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c > > @@ -1331,7 +1331,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, > > } > > > > if (ret == 0 && needs_clflush_after) > > - drm_clflush_virt_range(shadow_batch_obj->mm.mapping, batch_len); > > + drm_clflush_virt_range(shadow_batch_obj->mm.mapping, > > + (void *)cmd - shadow_batch_obj->mm.mapping); > > (void *)cmd - ptr_mask_bit(shadow_batch_obj->mm.mapping) > > We get away as the wb mapping being zero but for correctness. The low bits of mm.mapping cannot change the cacheline and so doesn't affect the clflush. Same as before. -Chris
On Fri, Mar 10, 2017 at 10:04:16AM +0000, Chris Wilson wrote: > On Fri, Mar 10, 2017 at 11:58:44AM +0200, Mika Kuoppala wrote: > > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > > > We only need to clflush those cachelines that we have validated to be > > > read by the GPU. Userspace typically fills the batch length in > > > correctly, the exceptions tend to be explicit tests within igt. > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > --- > > > drivers/gpu/drm/i915/i915_cmd_parser.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c > > > index 21b1cd917d81..b9ce9a6881ea 100644 > > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c > > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c > > > @@ -1331,7 +1331,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, > > > } > > > > > > if (ret == 0 && needs_clflush_after) > > > - drm_clflush_virt_range(shadow_batch_obj->mm.mapping, batch_len); > > > + drm_clflush_virt_range(shadow_batch_obj->mm.mapping, > > > + (void *)cmd - shadow_batch_obj->mm.mapping); > > > > (void *)cmd - ptr_mask_bit(shadow_batch_obj->mm.mapping) > > > > We get away as the wb mapping being zero but for correctness. > > The low bits of mm.mapping cannot change the cacheline and so doesn't > affect the clflush. Same as before. Although, not quite the same as before as before we didn't use a fixed end-point and so it could overshoot by a cacheline, even across a page boundary. -Chris
Chris Wilson <chris@chris-wilson.co.uk> writes: > On Fri, Mar 10, 2017 at 10:04:16AM +0000, Chris Wilson wrote: >> On Fri, Mar 10, 2017 at 11:58:44AM +0200, Mika Kuoppala wrote: >> > Chris Wilson <chris@chris-wilson.co.uk> writes: >> > >> > > We only need to clflush those cachelines that we have validated to be >> > > read by the GPU. Userspace typically fills the batch length in >> > > correctly, the exceptions tend to be explicit tests within igt. >> > > >> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >> > > --- >> > > drivers/gpu/drm/i915/i915_cmd_parser.c | 3 ++- >> > > 1 file changed, 2 insertions(+), 1 deletion(-) >> > > >> > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c >> > > index 21b1cd917d81..b9ce9a6881ea 100644 >> > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c >> > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c >> > > @@ -1331,7 +1331,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, >> > > } >> > > >> > > if (ret == 0 && needs_clflush_after) >> > > - drm_clflush_virt_range(shadow_batch_obj->mm.mapping, batch_len); >> > > + drm_clflush_virt_range(shadow_batch_obj->mm.mapping, >> > > + (void *)cmd - shadow_batch_obj->mm.mapping); >> > >> > (void *)cmd - ptr_mask_bit(shadow_batch_obj->mm.mapping) >> > >> > We get away as the wb mapping being zero but for correctness. >> >> The low bits of mm.mapping cannot change the cacheline and so doesn't >> affect the clflush. Same as before. > > Although, not quite the same as before as before we didn't use a fixed > end-point and so it could overshoot by a cacheline, even across a page > boundary. And there could be PAGE_MASK worth of lowbits. -Mika > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre
On Fri, Mar 10, 2017 at 12:26:10PM +0200, Mika Kuoppala wrote: > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > On Fri, Mar 10, 2017 at 10:04:16AM +0000, Chris Wilson wrote: > >> On Fri, Mar 10, 2017 at 11:58:44AM +0200, Mika Kuoppala wrote: > >> > Chris Wilson <chris@chris-wilson.co.uk> writes: > >> > > >> > > We only need to clflush those cachelines that we have validated to be > >> > > read by the GPU. Userspace typically fills the batch length in > >> > > correctly, the exceptions tend to be explicit tests within igt. > >> > > > >> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >> > > --- > >> > > drivers/gpu/drm/i915/i915_cmd_parser.c | 3 ++- > >> > > 1 file changed, 2 insertions(+), 1 deletion(-) > >> > > > >> > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c > >> > > index 21b1cd917d81..b9ce9a6881ea 100644 > >> > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c > >> > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c > >> > > @@ -1331,7 +1331,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, > >> > > } > >> > > > >> > > if (ret == 0 && needs_clflush_after) > >> > > - drm_clflush_virt_range(shadow_batch_obj->mm.mapping, batch_len); > >> > > + drm_clflush_virt_range(shadow_batch_obj->mm.mapping, > >> > > + (void *)cmd - shadow_batch_obj->mm.mapping); > >> > > >> > (void *)cmd - ptr_mask_bit(shadow_batch_obj->mm.mapping) > >> > > >> > We get away as the wb mapping being zero but for correctness. > >> > >> The low bits of mm.mapping cannot change the cacheline and so doesn't > >> affect the clflush. Same as before. > > > > Although, not quite the same as before as before we didn't use a fixed > > end-point and so it could overshoot by a cacheline, even across a page > > boundary. > > And there could be PAGE_MASK worth of lowbits. There can? Wasn't anticipating adding a few thousand memory types, and then sharing a single cache. :-p -Chris
Chris Wilson <chris@chris-wilson.co.uk> writes: > On Fri, Mar 10, 2017 at 12:26:10PM +0200, Mika Kuoppala wrote: >> Chris Wilson <chris@chris-wilson.co.uk> writes: >> >> > On Fri, Mar 10, 2017 at 10:04:16AM +0000, Chris Wilson wrote: >> >> On Fri, Mar 10, 2017 at 11:58:44AM +0200, Mika Kuoppala wrote: >> >> > Chris Wilson <chris@chris-wilson.co.uk> writes: >> >> > >> >> > > We only need to clflush those cachelines that we have validated to be >> >> > > read by the GPU. Userspace typically fills the batch length in >> >> > > correctly, the exceptions tend to be explicit tests within igt. >> >> > > >> >> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >> >> > > --- >> >> > > drivers/gpu/drm/i915/i915_cmd_parser.c | 3 ++- >> >> > > 1 file changed, 2 insertions(+), 1 deletion(-) >> >> > > >> >> > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c >> >> > > index 21b1cd917d81..b9ce9a6881ea 100644 >> >> > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c >> >> > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c >> >> > > @@ -1331,7 +1331,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, >> >> > > } >> >> > > >> >> > > if (ret == 0 && needs_clflush_after) >> >> > > - drm_clflush_virt_range(shadow_batch_obj->mm.mapping, batch_len); >> >> > > + drm_clflush_virt_range(shadow_batch_obj->mm.mapping, >> >> > > + (void *)cmd - shadow_batch_obj->mm.mapping); >> >> > >> >> > (void *)cmd - ptr_mask_bit(shadow_batch_obj->mm.mapping) >> >> > >> >> > We get away as the wb mapping being zero but for correctness. >> >> >> >> The low bits of mm.mapping cannot change the cacheline and so doesn't >> >> affect the clflush. Same as before. >> > >> > Although, not quite the same as before as before we didn't use a fixed >> > end-point and so it could overshoot by a cacheline, even across a page >> > boundary. >> >> And there could be PAGE_MASK worth of lowbits. > > There can? Wasn't anticipating adding a few thousand memory types, and > then sharing a single cache. :-p #define ptr_mask_bits(ptr) ({ \ unsigned long __v = (unsigned long)(ptr); \ (typeof(ptr))(__v & PAGE_MASK); \ }) > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre
On Fri, Mar 10, 2017 at 12:42:34PM +0200, Mika Kuoppala wrote: > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > On Fri, Mar 10, 2017 at 12:26:10PM +0200, Mika Kuoppala wrote: > >> Chris Wilson <chris@chris-wilson.co.uk> writes: > >> > >> > On Fri, Mar 10, 2017 at 10:04:16AM +0000, Chris Wilson wrote: > >> >> On Fri, Mar 10, 2017 at 11:58:44AM +0200, Mika Kuoppala wrote: > >> >> > Chris Wilson <chris@chris-wilson.co.uk> writes: > >> >> > > >> >> > > We only need to clflush those cachelines that we have validated to be > >> >> > > read by the GPU. Userspace typically fills the batch length in > >> >> > > correctly, the exceptions tend to be explicit tests within igt. > >> >> > > > >> >> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >> >> > > --- > >> >> > > drivers/gpu/drm/i915/i915_cmd_parser.c | 3 ++- > >> >> > > 1 file changed, 2 insertions(+), 1 deletion(-) > >> >> > > > >> >> > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c > >> >> > > index 21b1cd917d81..b9ce9a6881ea 100644 > >> >> > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c > >> >> > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c > >> >> > > @@ -1331,7 +1331,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, > >> >> > > } > >> >> > > > >> >> > > if (ret == 0 && needs_clflush_after) > >> >> > > - drm_clflush_virt_range(shadow_batch_obj->mm.mapping, batch_len); > >> >> > > + drm_clflush_virt_range(shadow_batch_obj->mm.mapping, > >> >> > > + (void *)cmd - shadow_batch_obj->mm.mapping); > >> >> > > >> >> > (void *)cmd - ptr_mask_bit(shadow_batch_obj->mm.mapping) > >> >> > > >> >> > We get away as the wb mapping being zero but for correctness. > >> >> > >> >> The low bits of mm.mapping cannot change the cacheline and so doesn't > >> >> affect the clflush. Same as before. > >> > > >> > Although, not quite the same as before as before we didn't use a fixed > >> > end-point and so it could overshoot by a cacheline, even across a page > >> > boundary. > >> > >> And there could be PAGE_MASK worth of lowbits. > > > > There can? Wasn't anticipating adding a few thousand memory types, and > > then sharing a single cache. :-p > > #define ptr_mask_bits(ptr) ({ > \ > unsigned long __v = (unsigned long)(ptr); > \ > (typeof(ptr))(__v & PAGE_MASK); > \ > }) Yes, I didn't name that function very well. The low bits of obj->mm.mapping currently represent and index into the cache. -Chris
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index 21b1cd917d81..b9ce9a6881ea 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -1331,7 +1331,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, } if (ret == 0 && needs_clflush_after) - drm_clflush_virt_range(shadow_batch_obj->mm.mapping, batch_len); + drm_clflush_virt_range(shadow_batch_obj->mm.mapping, + (void *)cmd - shadow_batch_obj->mm.mapping); i915_gem_object_unpin_map(shadow_batch_obj); return ret;
We only need to clflush those cachelines that we have validated to be read by the GPU. Userspace typically fills the batch length in correctly, the exceptions tend to be explicit tests within igt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_cmd_parser.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)