Message ID | 20180321142305.14168-1-katarzyna.dec@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
+ igt-dev On 21/03/18 07:23, Katarzyna Dec wrote: > During debugging gpgpu_fill test on various platforms, I found out few > things that can affect newer gens: this is slightly confusing, as you start with saying that you've found something but then below you start with what you've changed. I'd reword to make it clearer what were the issues and how they've been fixed. > Seting number of threads in TS in gen8_fill_interface_descriptor to 1. > This field was omitted in earlier platforms (apparently without any side > effects). Not setting it for newer platforms results in GPU hang. > Adding pipeline selection mask to gen9_gpgpu_fillfunc, which is > necessary from SKL. > Changes gen7_emit_interface_descriptor_load to gen8 version. This last point applies to the gen9 fillfunc only. I'd also mention that the gen8 interface descriptor applies to gen9+ as well for extra clarity. > > Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com> > Cc: Lukasz Kalamarz <lukasz.kalamarz@intel.com> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > --- > lib/gpgpu_fill.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/lib/gpgpu_fill.c b/lib/gpgpu_fill.c > index 4d98643d..e7a1edaa 100644 > --- a/lib/gpgpu_fill.c > +++ b/lib/gpgpu_fill.c > @@ -336,6 +336,8 @@ gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf * > idd->desc5.constant_urb_entry_read_offset = 0; > idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */ > > + idd->desc6.num_threads_in_tg = 1; > + 1 seems like a safe value which applies to all platforms and it doesn't change execution time (gem_gpgpu_fill still completes in ~0.2s on SKL with this change applied) so it seems reasonable to me to use it, but it'd be nice to get an ack by someone more experienced with the gpgpu pipeline. > return offset; > } > > @@ -790,12 +792,13 @@ gen9_gpgpu_fillfunc(struct intel_batchbuffer *batch, > batch->ptr = batch->buffer; > > /* GPGPU pipeline */ > - OUT_BATCH(GEN7_PIPELINE_SELECT | PIPELINE_SELECT_GPGPU); > + OUT_BATCH(GEN7_PIPELINE_SELECT | GEN9_PIPELINE_SELECTION_MASK | > + PIPELINE_SELECT_GPGPU); need more indent here. Apart from the various nitpicks the change looks good to me. Daniele > > gen9_emit_state_base_address(batch); > gen8_emit_vfe_state_gpgpu(batch); > gen7_emit_curbe_load(batch, curbe_buffer); > - gen7_emit_interface_descriptor_load(batch, interface_descriptor); > + gen8_emit_interface_descriptor_load(batch, interface_descriptor); > gen8_emit_gpgpu_walk(batch, x, y, width, height); > > OUT_BATCH(MI_BATCH_BUFFER_END); >
On Mon, Mar 26, 2018 at 02:28:48PM -0700, Daniele Ceraolo Spurio wrote: > + igt-dev > > On 21/03/18 07:23, Katarzyna Dec wrote: > > During debugging gpgpu_fill test on various platforms, I found out few > > things that can affect newer gens: > > this is slightly confusing, as you start with saying that you've found > something but then below you start with what you've changed. I'd reword to > make it clearer what were the issues and how they've been fixed. > Actually, this changes will be included in refactoring series and I will explain everything (I hope) in there. > > Seting number of threads in TS in gen8_fill_interface_descriptor to 1. > > This field was omitted in earlier platforms (apparently without any side > > effects). Not setting it for newer platforms results in GPU hang. > > Adding pipeline selection mask to gen9_gpgpu_fillfunc, which is > > necessary from SKL. > > Changes gen7_emit_interface_descriptor_load to gen8 version. > > This last point applies to the gen9 fillfunc only. I'd also mention that the > gen8 interface descriptor applies to gen9+ as well for extra clarity. > Gen9+ functions will be based on Gen9, so this change will be included there there as well. > > > > Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com> > > Cc: Lukasz Kalamarz <lukasz.kalamarz@intel.com> > > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > > --- > > lib/gpgpu_fill.c | 7 +++++-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/lib/gpgpu_fill.c b/lib/gpgpu_fill.c > > index 4d98643d..e7a1edaa 100644 > > --- a/lib/gpgpu_fill.c > > +++ b/lib/gpgpu_fill.c > > @@ -336,6 +336,8 @@ gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf * > > idd->desc5.constant_urb_entry_read_offset = 0; > > idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */ > > + idd->desc6.num_threads_in_tg = 1; > > + > > 1 seems like a safe value which applies to all platforms and it doesn't > change execution time (gem_gpgpu_fill still completes in ~0.2s on SKL with > this change applied) so it seems reasonable to me to use it, but it'd be > nice to get an ack by someone more experienced with the gpgpu pipeline. > Missing seting number of threads in thread group seems like a 'bug' here. According to documentation from BDW this field cannot be set to 0. We cannot be sure that it is not zero unless we set it explicitly. > > return offset; > > } > > @@ -790,12 +792,13 @@ gen9_gpgpu_fillfunc(struct intel_batchbuffer *batch, > > batch->ptr = batch->buffer; > > /* GPGPU pipeline */ > > - OUT_BATCH(GEN7_PIPELINE_SELECT | PIPELINE_SELECT_GPGPU); > > + OUT_BATCH(GEN7_PIPELINE_SELECT | GEN9_PIPELINE_SELECTION_MASK | > > + PIPELINE_SELECT_GPGPU); > > need more indent here. > > Apart from the various nitpicks the change looks good to me. > > Daniele > Refactoring of these libraries will be a good oportunity to adjust code style :) Thanks for important feedback! Kasia > > gen9_emit_state_base_address(batch); > > gen8_emit_vfe_state_gpgpu(batch); > > gen7_emit_curbe_load(batch, curbe_buffer); > > - gen7_emit_interface_descriptor_load(batch, interface_descriptor); > > + gen8_emit_interface_descriptor_load(batch, interface_descriptor); > > gen8_emit_gpgpu_walk(batch, x, y, width, height); > > OUT_BATCH(MI_BATCH_BUFFER_END); > >
diff --git a/lib/gpgpu_fill.c b/lib/gpgpu_fill.c index 4d98643d..e7a1edaa 100644 --- a/lib/gpgpu_fill.c +++ b/lib/gpgpu_fill.c @@ -336,6 +336,8 @@ gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf * idd->desc5.constant_urb_entry_read_offset = 0; idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */ + idd->desc6.num_threads_in_tg = 1; + return offset; } @@ -790,12 +792,13 @@ gen9_gpgpu_fillfunc(struct intel_batchbuffer *batch, batch->ptr = batch->buffer; /* GPGPU pipeline */ - OUT_BATCH(GEN7_PIPELINE_SELECT | PIPELINE_SELECT_GPGPU); + OUT_BATCH(GEN7_PIPELINE_SELECT | GEN9_PIPELINE_SELECTION_MASK | + PIPELINE_SELECT_GPGPU); gen9_emit_state_base_address(batch); gen8_emit_vfe_state_gpgpu(batch); gen7_emit_curbe_load(batch, curbe_buffer); - gen7_emit_interface_descriptor_load(batch, interface_descriptor); + gen8_emit_interface_descriptor_load(batch, interface_descriptor); gen8_emit_gpgpu_walk(batch, x, y, width, height); OUT_BATCH(MI_BATCH_BUFFER_END);
During debugging gpgpu_fill test on various platforms, I found out few things that can affect newer gens: Seting number of threads in TS in gen8_fill_interface_descriptor to 1. This field was omitted in earlier platforms (apparently without any side effects). Not setting it for newer platforms results in GPU hang. Adding pipeline selection mask to gen9_gpgpu_fillfunc, which is necessary from SKL. Changes gen7_emit_interface_descriptor_load to gen8 version. Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com> Cc: Lukasz Kalamarz <lukasz.kalamarz@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> --- lib/gpgpu_fill.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)