diff mbox

[i-g-t] lib/gpgpu_fill: Adding missing configuration parameters for gpgpu_fill function

Message ID 20180321142305.14168-1-katarzyna.dec@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Katarzyna Dec March 21, 2018, 2:23 p.m. UTC
During debugging gpgpu_fill test on various platforms, I found out few
things that can affect newer gens:
  Seting number of threads in TS in gen8_fill_interface_descriptor to 1.
This field was omitted in earlier platforms (apparently without any side
effects). Not setting it for newer platforms results in GPU hang.
  Adding pipeline selection mask to gen9_gpgpu_fillfunc, which is
necessary from SKL.
  Changes gen7_emit_interface_descriptor_load to gen8 version.

Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com>
Cc: Lukasz Kalamarz <lukasz.kalamarz@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 lib/gpgpu_fill.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Daniele Ceraolo Spurio March 26, 2018, 9:28 p.m. UTC | #1
+ igt-dev

On 21/03/18 07:23, Katarzyna Dec wrote:
> During debugging gpgpu_fill test on various platforms, I found out few
> things that can affect newer gens:

this is slightly confusing, as you start with saying that you've found 
something but then below you start with what you've changed. I'd reword 
to make it clearer what were the issues and how they've been fixed.

>    Seting number of threads in TS in gen8_fill_interface_descriptor to 1.
> This field was omitted in earlier platforms (apparently without any side
> effects). Not setting it for newer platforms results in GPU hang.
>    Adding pipeline selection mask to gen9_gpgpu_fillfunc, which is
> necessary from SKL.
>    Changes gen7_emit_interface_descriptor_load to gen8 version.

This last point applies to the gen9 fillfunc only. I'd also mention that 
the gen8 interface descriptor applies to gen9+ as well for extra clarity.

> 
> Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com>
> Cc: Lukasz Kalamarz <lukasz.kalamarz@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   lib/gpgpu_fill.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/gpgpu_fill.c b/lib/gpgpu_fill.c
> index 4d98643d..e7a1edaa 100644
> --- a/lib/gpgpu_fill.c
> +++ b/lib/gpgpu_fill.c
> @@ -336,6 +336,8 @@ gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *
>   	idd->desc5.constant_urb_entry_read_offset = 0;
>   	idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */
>   
> +	idd->desc6.num_threads_in_tg = 1;
> +

1 seems like a safe value which applies to all platforms and it doesn't 
change execution time (gem_gpgpu_fill still completes in ~0.2s on SKL 
with this change applied) so it seems reasonable to me to use it, but 
it'd be nice to get an ack by someone more experienced with the gpgpu 
pipeline.

>   	return offset;
>   }
>   
> @@ -790,12 +792,13 @@ gen9_gpgpu_fillfunc(struct intel_batchbuffer *batch,
>   	batch->ptr = batch->buffer;
>   
>   	/* GPGPU pipeline */
> -	OUT_BATCH(GEN7_PIPELINE_SELECT | PIPELINE_SELECT_GPGPU);
> +	OUT_BATCH(GEN7_PIPELINE_SELECT | GEN9_PIPELINE_SELECTION_MASK |
> +	PIPELINE_SELECT_GPGPU);

need more indent here.

Apart from the various nitpicks the change looks good to me.

Daniele

>   
>   	gen9_emit_state_base_address(batch);
>   	gen8_emit_vfe_state_gpgpu(batch);
>   	gen7_emit_curbe_load(batch, curbe_buffer);
> -	gen7_emit_interface_descriptor_load(batch, interface_descriptor);
> +	gen8_emit_interface_descriptor_load(batch, interface_descriptor);
>   	gen8_emit_gpgpu_walk(batch, x, y, width, height);
>   
>   	OUT_BATCH(MI_BATCH_BUFFER_END);
>
Katarzyna Dec March 27, 2018, 7:43 a.m. UTC | #2
On Mon, Mar 26, 2018 at 02:28:48PM -0700, Daniele Ceraolo Spurio wrote:
> + igt-dev
> 
> On 21/03/18 07:23, Katarzyna Dec wrote:
> > During debugging gpgpu_fill test on various platforms, I found out few
> > things that can affect newer gens:
> 
> this is slightly confusing, as you start with saying that you've found
> something but then below you start with what you've changed. I'd reword to
> make it clearer what were the issues and how they've been fixed.
>
Actually, this changes will be included in refactoring series and I will
explain everything (I hope) in there.

> >    Seting number of threads in TS in gen8_fill_interface_descriptor to 1.
> > This field was omitted in earlier platforms (apparently without any side
> > effects). Not setting it for newer platforms results in GPU hang.
> >    Adding pipeline selection mask to gen9_gpgpu_fillfunc, which is
> > necessary from SKL.
> >    Changes gen7_emit_interface_descriptor_load to gen8 version.
> 
> This last point applies to the gen9 fillfunc only. I'd also mention that the
> gen8 interface descriptor applies to gen9+ as well for extra clarity.
>
Gen9+ functions will be based on Gen9, so this change will be included there
there as well.
> > 
> > Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com>
> > Cc: Lukasz Kalamarz <lukasz.kalamarz@intel.com>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > ---
> >   lib/gpgpu_fill.c | 7 +++++--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/gpgpu_fill.c b/lib/gpgpu_fill.c
> > index 4d98643d..e7a1edaa 100644
> > --- a/lib/gpgpu_fill.c
> > +++ b/lib/gpgpu_fill.c
> > @@ -336,6 +336,8 @@ gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *
> >   	idd->desc5.constant_urb_entry_read_offset = 0;
> >   	idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */
> > +	idd->desc6.num_threads_in_tg = 1;
> > +
> 
> 1 seems like a safe value which applies to all platforms and it doesn't
> change execution time (gem_gpgpu_fill still completes in ~0.2s on SKL with
> this change applied) so it seems reasonable to me to use it, but it'd be
> nice to get an ack by someone more experienced with the gpgpu pipeline.
> 
Missing seting number of threads in thread group seems like a 'bug' here.
According to documentation from BDW this field cannot be set to 0. We cannot
be sure that it is not zero unless we set it explicitly.

> >   	return offset;
> >   }
> > @@ -790,12 +792,13 @@ gen9_gpgpu_fillfunc(struct intel_batchbuffer *batch,
> >   	batch->ptr = batch->buffer;
> >   	/* GPGPU pipeline */
> > -	OUT_BATCH(GEN7_PIPELINE_SELECT | PIPELINE_SELECT_GPGPU);
> > +	OUT_BATCH(GEN7_PIPELINE_SELECT | GEN9_PIPELINE_SELECTION_MASK |
> > +	PIPELINE_SELECT_GPGPU);
> 
> need more indent here.
> 
> Apart from the various nitpicks the change looks good to me.
> 
> Daniele
> 
Refactoring of these libraries will be a good oportunity to adjust code style :)

Thanks for important feedback!
Kasia
> >   	gen9_emit_state_base_address(batch);
> >   	gen8_emit_vfe_state_gpgpu(batch);
> >   	gen7_emit_curbe_load(batch, curbe_buffer);
> > -	gen7_emit_interface_descriptor_load(batch, interface_descriptor);
> > +	gen8_emit_interface_descriptor_load(batch, interface_descriptor);
> >   	gen8_emit_gpgpu_walk(batch, x, y, width, height);
> >   	OUT_BATCH(MI_BATCH_BUFFER_END);
> >
diff mbox

Patch

diff --git a/lib/gpgpu_fill.c b/lib/gpgpu_fill.c
index 4d98643d..e7a1edaa 100644
--- a/lib/gpgpu_fill.c
+++ b/lib/gpgpu_fill.c
@@ -336,6 +336,8 @@  gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *
 	idd->desc5.constant_urb_entry_read_offset = 0;
 	idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */
 
+	idd->desc6.num_threads_in_tg = 1;
+
 	return offset;
 }
 
@@ -790,12 +792,13 @@  gen9_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 	batch->ptr = batch->buffer;
 
 	/* GPGPU pipeline */
-	OUT_BATCH(GEN7_PIPELINE_SELECT | PIPELINE_SELECT_GPGPU);
+	OUT_BATCH(GEN7_PIPELINE_SELECT | GEN9_PIPELINE_SELECTION_MASK |
+	PIPELINE_SELECT_GPGPU);
 
 	gen9_emit_state_base_address(batch);
 	gen8_emit_vfe_state_gpgpu(batch);
 	gen7_emit_curbe_load(batch, curbe_buffer);
-	gen7_emit_interface_descriptor_load(batch, interface_descriptor);
+	gen8_emit_interface_descriptor_load(batch, interface_descriptor);
 	gen8_emit_gpgpu_walk(batch, x, y, width, height);
 
 	OUT_BATCH(MI_BATCH_BUFFER_END);