diff mbox series

drm/i915/perf: don't forget noa wait after oa config

Message ID 20191113154639.27144-1-lionel.g.landwerlin@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/perf: don't forget noa wait after oa config | expand

Commit Message

Lionel Landwerlin Nov. 13, 2019, 3:46 p.m. UTC
I'm observing incoherence metric values, changing from run to run.

It appears the patches introducing noa wait & reconfiguration from
command stream switched places in the series multiple times during the
review. This lead to the dependency of one onto the order to go
missing...

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream")
---
 drivers/gpu/drm/i915/i915_perf.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Chris Wilson Nov. 13, 2019, 4:35 p.m. UTC | #1
Quoting Lionel Landwerlin (2019-11-13 15:46:39)
> I'm observing incoherence metric values, changing from run to run.
> 
> It appears the patches introducing noa wait & reconfiguration from
> command stream switched places in the series multiple times during the
> review. This lead to the dependency of one onto the order to go
> missing...

I don't think I dropped it; if I did my apologies. I do feel the
egg-on-face for writing a selftest to verify that noa_wait does what you
said it did, but completely missing that it went unused :)
 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream")
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 507236bd41ae..31e47ee23357 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
>         config_length += num_lri_dwords(oa_config->mux_regs_len);
>         config_length += num_lri_dwords(oa_config->b_counter_regs_len);
>         config_length += num_lri_dwords(oa_config->flex_regs_len);
> -       config_length++; /* MI_BATCH_BUFFER_END */
> +       config_length += 3; /* MI_BATCH_BUFFER_START */
>         config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE);
>  
>         obj = i915_gem_object_create_shmem(stream->perf->i915, config_length);
> @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
>                              oa_config->flex_regs,
>                              oa_config->flex_regs_len);
>  
> -       *cs++ = MI_BATCH_BUFFER_END;
> +       /* Jump into the active wait. */
> +       *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ?
> +                MI_BATCH_BUFFER_START :
> +                MI_BATCH_BUFFER_START_GEN8);
> +       *cs++ = i915_ggtt_offset(stream->noa_wait);
> +       *cs++ = 0;

Yikes, stream->noa_wait is unused.

Hmm, the noa_wait doesn't have any arbitration points internally, so we
probably do need to make it non-preemptable as well?

With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

We need to wrap emit_oa_config() in a similar selftest and verify
that a read of the oa regs are correct and that the TIMESTAMP indicates
the appropriate delay before the read. If you feel bored.
-Chris
Lionel Landwerlin Nov. 13, 2019, 6:07 p.m. UTC | #2
On 13/11/2019 18:35, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-11-13 15:46:39)
>> I'm observing incoherence metric values, changing from run to run.
>>
>> It appears the patches introducing noa wait & reconfiguration from
>> command stream switched places in the series multiple times during the
>> review. This lead to the dependency of one onto the order to go
>> missing...
> I don't think I dropped it; if I did my apologies. I do feel the
> egg-on-face for writing a selftest to verify that noa_wait does what you
> said it did, but completely missing that it went unused :)
>   
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream")
>> ---
>>   drivers/gpu/drm/i915/i915_perf.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 507236bd41ae..31e47ee23357 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
>>          config_length += num_lri_dwords(oa_config->mux_regs_len);
>>          config_length += num_lri_dwords(oa_config->b_counter_regs_len);
>>          config_length += num_lri_dwords(oa_config->flex_regs_len);
>> -       config_length++; /* MI_BATCH_BUFFER_END */
>> +       config_length += 3; /* MI_BATCH_BUFFER_START */
>>          config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE);
>>   
>>          obj = i915_gem_object_create_shmem(stream->perf->i915, config_length);
>> @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
>>                               oa_config->flex_regs,
>>                               oa_config->flex_regs_len);
>>   
>> -       *cs++ = MI_BATCH_BUFFER_END;
>> +       /* Jump into the active wait. */
>> +       *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ?
>> +                MI_BATCH_BUFFER_START :
>> +                MI_BATCH_BUFFER_START_GEN8);
>> +       *cs++ = i915_ggtt_offset(stream->noa_wait);
>> +       *cs++ = 0;
> Yikes, stream->noa_wait is unused.
>
> Hmm, the noa_wait doesn't have any arbitration points internally, so we
> probably do need to make it non-preemptable as well?
>
> With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config,
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> We need to wrap emit_oa_config() in a similar selftest and verify
> that a read of the oa regs are correct and that the TIMESTAMP indicates
> the appropriate delay before the read. If you feel bored.
> -Chris


As long as we wait long enough, it should be okay.

Why making it nopreempt?


-Lionel
Chris Wilson Nov. 13, 2019, 6:10 p.m. UTC | #3
Quoting Lionel Landwerlin (2019-11-13 18:07:59)
> On 13/11/2019 18:35, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-11-13 15:46:39)
> >> I'm observing incoherence metric values, changing from run to run.
> >>
> >> It appears the patches introducing noa wait & reconfiguration from
> >> command stream switched places in the series multiple times during the
> >> review. This lead to the dependency of one onto the order to go
> >> missing...
> > I don't think I dropped it; if I did my apologies. I do feel the
> > egg-on-face for writing a selftest to verify that noa_wait does what you
> > said it did, but completely missing that it went unused :)
> >   
> >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> >> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream")
> >> ---
> >>   drivers/gpu/drm/i915/i915_perf.c | 9 +++++++--
> >>   1 file changed, 7 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >> index 507236bd41ae..31e47ee23357 100644
> >> --- a/drivers/gpu/drm/i915/i915_perf.c
> >> +++ b/drivers/gpu/drm/i915/i915_perf.c
> >> @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
> >>          config_length += num_lri_dwords(oa_config->mux_regs_len);
> >>          config_length += num_lri_dwords(oa_config->b_counter_regs_len);
> >>          config_length += num_lri_dwords(oa_config->flex_regs_len);
> >> -       config_length++; /* MI_BATCH_BUFFER_END */
> >> +       config_length += 3; /* MI_BATCH_BUFFER_START */
> >>          config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE);
> >>   
> >>          obj = i915_gem_object_create_shmem(stream->perf->i915, config_length);
> >> @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
> >>                               oa_config->flex_regs,
> >>                               oa_config->flex_regs_len);
> >>   
> >> -       *cs++ = MI_BATCH_BUFFER_END;
> >> +       /* Jump into the active wait. */
> >> +       *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ?
> >> +                MI_BATCH_BUFFER_START :
> >> +                MI_BATCH_BUFFER_START_GEN8);
> >> +       *cs++ = i915_ggtt_offset(stream->noa_wait);
> >> +       *cs++ = 0;
> > Yikes, stream->noa_wait is unused.
> >
> > Hmm, the noa_wait doesn't have any arbitration points internally, so we
> > probably do need to make it non-preemptable as well?
> >
> > With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config,
> > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> >
> > We need to wrap emit_oa_config() in a similar selftest and verify
> > that a read of the oa regs are correct and that the TIMESTAMP indicates
> > the appropriate delay before the read. If you feel bored.
> > -Chris
> 
> 
> As long as we wait long enough, it should be okay.
> 
> Why making it nopreempt?

Aiui, the batch buffer has no arbitration points so the delay may incur
the wrath of the forced preemption. That is another request (of higher
priority) wishing to run, but not being able to.
-Chris
Chris Wilson Nov. 13, 2019, 6:11 p.m. UTC | #4
Quoting Chris Wilson (2019-11-13 18:10:22)
> Quoting Lionel Landwerlin (2019-11-13 18:07:59)
> > On 13/11/2019 18:35, Chris Wilson wrote:
> > > Quoting Lionel Landwerlin (2019-11-13 15:46:39)
> > >> I'm observing incoherence metric values, changing from run to run.
> > >>
> > >> It appears the patches introducing noa wait & reconfiguration from
> > >> command stream switched places in the series multiple times during the
> > >> review. This lead to the dependency of one onto the order to go
> > >> missing...
> > > I don't think I dropped it; if I did my apologies. I do feel the
> > > egg-on-face for writing a selftest to verify that noa_wait does what you
> > > said it did, but completely missing that it went unused :)
> > >   
> > >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> > >> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream")
> > >> ---
> > >>   drivers/gpu/drm/i915/i915_perf.c | 9 +++++++--
> > >>   1 file changed, 7 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > >> index 507236bd41ae..31e47ee23357 100644
> > >> --- a/drivers/gpu/drm/i915/i915_perf.c
> > >> +++ b/drivers/gpu/drm/i915/i915_perf.c
> > >> @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
> > >>          config_length += num_lri_dwords(oa_config->mux_regs_len);
> > >>          config_length += num_lri_dwords(oa_config->b_counter_regs_len);
> > >>          config_length += num_lri_dwords(oa_config->flex_regs_len);
> > >> -       config_length++; /* MI_BATCH_BUFFER_END */
> > >> +       config_length += 3; /* MI_BATCH_BUFFER_START */
> > >>          config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE);
> > >>   
> > >>          obj = i915_gem_object_create_shmem(stream->perf->i915, config_length);
> > >> @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
> > >>                               oa_config->flex_regs,
> > >>                               oa_config->flex_regs_len);
> > >>   
> > >> -       *cs++ = MI_BATCH_BUFFER_END;
> > >> +       /* Jump into the active wait. */
> > >> +       *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ?
> > >> +                MI_BATCH_BUFFER_START :
> > >> +                MI_BATCH_BUFFER_START_GEN8);
> > >> +       *cs++ = i915_ggtt_offset(stream->noa_wait);
> > >> +       *cs++ = 0;
> > > Yikes, stream->noa_wait is unused.
> > >
> > > Hmm, the noa_wait doesn't have any arbitration points internally, so we
> > > probably do need to make it non-preemptable as well?
> > >
> > > With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config,
> > > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> > >
> > > We need to wrap emit_oa_config() in a similar selftest and verify
> > > that a read of the oa regs are correct and that the TIMESTAMP indicates
> > > the appropriate delay before the read. If you feel bored.
> > > -Chris
> > 
> > 
> > As long as we wait long enough, it should be okay.
> > 
> > Why making it nopreempt?
> 
> Aiui, the batch buffer has no arbitration points so the delay may incur
> the wrath of the forced preemption. That is another request (of higher
> priority) wishing to run, but not being able to.

The alternative would be adding a MI_ARB_CHECK at the start of the loop
if you happy with being preempted out.
-Chris
Lionel Landwerlin Nov. 13, 2019, 6:33 p.m. UTC | #5
On 13/11/2019 20:11, Chris Wilson wrote:
> Quoting Chris Wilson (2019-11-13 18:10:22)
>> Quoting Lionel Landwerlin (2019-11-13 18:07:59)
>>> On 13/11/2019 18:35, Chris Wilson wrote:
>>>> Quoting Lionel Landwerlin (2019-11-13 15:46:39)
>>>>> I'm observing incoherence metric values, changing from run to run.
>>>>>
>>>>> It appears the patches introducing noa wait & reconfiguration from
>>>>> command stream switched places in the series multiple times during the
>>>>> review. This lead to the dependency of one onto the order to go
>>>>> missing...
>>>> I don't think I dropped it; if I did my apologies. I do feel the
>>>> egg-on-face for writing a selftest to verify that noa_wait does what you
>>>> said it did, but completely missing that it went unused :)


It was probably my mistake :)


>>>>    
>>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream")
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_perf.c | 9 +++++++--
>>>>>    1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>>>> index 507236bd41ae..31e47ee23357 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>> @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
>>>>>           config_length += num_lri_dwords(oa_config->mux_regs_len);
>>>>>           config_length += num_lri_dwords(oa_config->b_counter_regs_len);
>>>>>           config_length += num_lri_dwords(oa_config->flex_regs_len);
>>>>> -       config_length++; /* MI_BATCH_BUFFER_END */
>>>>> +       config_length += 3; /* MI_BATCH_BUFFER_START */
>>>>>           config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE);
>>>>>    
>>>>>           obj = i915_gem_object_create_shmem(stream->perf->i915, config_length);
>>>>> @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
>>>>>                                oa_config->flex_regs,
>>>>>                                oa_config->flex_regs_len);
>>>>>    
>>>>> -       *cs++ = MI_BATCH_BUFFER_END;
>>>>> +       /* Jump into the active wait. */
>>>>> +       *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ?
>>>>> +                MI_BATCH_BUFFER_START :
>>>>> +                MI_BATCH_BUFFER_START_GEN8);
>>>>> +       *cs++ = i915_ggtt_offset(stream->noa_wait);
>>>>> +       *cs++ = 0;
>>>> Yikes, stream->noa_wait is unused.
>>>>
>>>> Hmm, the noa_wait doesn't have any arbitration points internally, so we
>>>> probably do need to make it non-preemptable as well?
>>>>
>>>> With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config,
>>>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> We need to wrap emit_oa_config() in a similar selftest and verify
>>>> that a read of the oa regs are correct and that the TIMESTAMP indicates
>>>> the appropriate delay before the read. If you feel bored.
>>>> -Chris
>>>
>>> As long as we wait long enough, it should be okay.
>>>
>>> Why making it nopreempt?
>> Aiui, the batch buffer has no arbitration points so the delay may incur
>> the wrath of the forced preemption. That is another request (of higher
>> priority) wishing to run, but not being able to.
> The alternative would be adding a MI_ARB_CHECK at the start of the loop
> if you happy with being preempted out.
> -Chris

I guess I'll do that :)


-Lionel
Chris Wilson Nov. 14, 2019, 7:19 a.m. UTC | #6
Quoting Lionel Landwerlin (2019-11-13 18:33:59)
> On 13/11/2019 20:11, Chris Wilson wrote:
> > Quoting Chris Wilson (2019-11-13 18:10:22)
> >> Quoting Lionel Landwerlin (2019-11-13 18:07:59)
> >>> On 13/11/2019 18:35, Chris Wilson wrote:
> >>>> Quoting Lionel Landwerlin (2019-11-13 15:46:39)
> >>>>> I'm observing incoherence metric values, changing from run to run.
> >>>>>
> >>>>> It appears the patches introducing noa wait & reconfiguration from
> >>>>> command stream switched places in the series multiple times during the
> >>>>> review. This lead to the dependency of one onto the order to go
> >>>>> missing...
> >>>> I don't think I dropped it; if I did my apologies. I do feel the
> >>>> egg-on-face for writing a selftest to verify that noa_wait does what you
> >>>> said it did, but completely missing that it went unused :)
> 
> 
> It was probably my mistake :)
> 
> 
> >>>>    
> >>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> >>>>> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream")
> >>>>> ---
> >>>>>    drivers/gpu/drm/i915/i915_perf.c | 9 +++++++--
> >>>>>    1 file changed, 7 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >>>>> index 507236bd41ae..31e47ee23357 100644
> >>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
> >>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
> >>>>> @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
> >>>>>           config_length += num_lri_dwords(oa_config->mux_regs_len);
> >>>>>           config_length += num_lri_dwords(oa_config->b_counter_regs_len);
> >>>>>           config_length += num_lri_dwords(oa_config->flex_regs_len);
> >>>>> -       config_length++; /* MI_BATCH_BUFFER_END */
> >>>>> +       config_length += 3; /* MI_BATCH_BUFFER_START */
> >>>>>           config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE);
> >>>>>    
> >>>>>           obj = i915_gem_object_create_shmem(stream->perf->i915, config_length);
> >>>>> @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream,
> >>>>>                                oa_config->flex_regs,
> >>>>>                                oa_config->flex_regs_len);
> >>>>>    
> >>>>> -       *cs++ = MI_BATCH_BUFFER_END;
> >>>>> +       /* Jump into the active wait. */
> >>>>> +       *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ?
> >>>>> +                MI_BATCH_BUFFER_START :
> >>>>> +                MI_BATCH_BUFFER_START_GEN8);
> >>>>> +       *cs++ = i915_ggtt_offset(stream->noa_wait);
> >>>>> +       *cs++ = 0;
> >>>> Yikes, stream->noa_wait is unused.
> >>>>
> >>>> Hmm, the noa_wait doesn't have any arbitration points internally, so we
> >>>> probably do need to make it non-preemptable as well?
> >>>>
> >>>> With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config,
> >>>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>
> >>>> We need to wrap emit_oa_config() in a similar selftest and verify
> >>>> that a read of the oa regs are correct and that the TIMESTAMP indicates
> >>>> the appropriate delay before the read. If you feel bored.
> >>>> -Chris
> >>>
> >>> As long as we wait long enough, it should be okay.
> >>>
> >>> Why making it nopreempt?
> >> Aiui, the batch buffer has no arbitration points so the delay may incur
> >> the wrath of the forced preemption. That is another request (of higher
> >> priority) wishing to run, but not being able to.
> > The alternative would be adding a MI_ARB_CHECK at the start of the loop
> > if you happy with being preempted out.
> > -Chris
> 
> I guess I'll do that :)

Ok, then have an 
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
on the understanding you'll look at adding an MI_ARB_CHECK next :)
-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 507236bd41ae..31e47ee23357 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1870,7 +1870,7 @@  alloc_oa_config_buffer(struct i915_perf_stream *stream,
 	config_length += num_lri_dwords(oa_config->mux_regs_len);
 	config_length += num_lri_dwords(oa_config->b_counter_regs_len);
 	config_length += num_lri_dwords(oa_config->flex_regs_len);
-	config_length++; /* MI_BATCH_BUFFER_END */
+	config_length += 3; /* MI_BATCH_BUFFER_START */
 	config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE);
 
 	obj = i915_gem_object_create_shmem(stream->perf->i915, config_length);
@@ -1895,7 +1895,12 @@  alloc_oa_config_buffer(struct i915_perf_stream *stream,
 			     oa_config->flex_regs,
 			     oa_config->flex_regs_len);
 
-	*cs++ = MI_BATCH_BUFFER_END;
+	/* Jump into the active wait. */
+	*cs++ = (INTEL_GEN(stream->perf->i915) < 8 ?
+		 MI_BATCH_BUFFER_START :
+		 MI_BATCH_BUFFER_START_GEN8);
+	*cs++ = i915_ggtt_offset(stream->noa_wait);
+	*cs++ = 0;
 
 	i915_gem_object_flush_map(obj);
 	i915_gem_object_unpin_map(obj);