diff mbox series

[2/5] drm/i915/perf: allow holding preemption on filtered ctx

Message ID 20190521140855.3957-3-lionel.g.landwerlin@intel.com (mailing list archive)
State New, archived
Series drm/i915: Vulkan performance query support

Commit Message

Lionel Landwerlin May 21, 2019, 2:08 p.m. UTC
We would like to make use of perf in Vulkan. The Vulkan API is much
lower level than OpenGL, with applications directly exposed to the
concept of command buffers (pretty much equivalent to our batch
buffers). In Vulkan, queries are always limited in scope to a command
buffer. In OpenGL, the lack of command buffer concept meant that
queries' duration could span multiple command buffers.

With that restriction gone in Vulkan, we would like to simplify
measuring performance down to computing the delta between the counter
snapshots written by 2 MI_REPORT_PERF_COUNT commands, rather than the
more complex scheme we currently have in the GL driver: 2
MI_REPORT_PERF_COUNT commands plus post-processing of the stream of OA
reports coming from the global OA buffer to remove any unrelated deltas
in between the 2 MI_REPORT_PERF_COUNT commands.

Disabling preemption applies only to the single context we want to
query performance counters for, and is considered a privileged
operation, by default protected by CAP_SYS_ADMIN. It is possible to
enable it for a normal user by disabling the paranoid stream setting.

v2: Store preemption setting in intel_context (Chris)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |  1 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  3 ++
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  2 +-
 drivers/gpu/drm/i915/i915_drv.c               |  2 +-
 drivers/gpu/drm/i915/i915_perf.c              | 35 +++++++++++++++----
 include/uapi/drm/i915_drm.h                   | 10 ++++++
 6 files changed, 44 insertions(+), 9 deletions(-)

Comments

Chris Wilson May 21, 2019, 4:36 p.m. UTC | #1
Quoting Lionel Landwerlin (2019-05-21 15:08:52)
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index f263a8374273..2ad95977f7a8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2085,7 +2085,7 @@ static int gen9_emit_bb_start(struct i915_request *rq,
>         if (IS_ERR(cs))
>                 return PTR_ERR(cs);
>  
> -       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
> +       *cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;

My prediction is that this will result in this context being reset due
to preemption timeouts and the context under profile being banned. Note
that preemption timeouts will be the primary means for hang detection
for endless batches.
-Chris
Lionel Landwerlin May 21, 2019, 4:50 p.m. UTC | #2
On 21/05/2019 17:36, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-05-21 15:08:52)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> index f263a8374273..2ad95977f7a8 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> @@ -2085,7 +2085,7 @@ static int gen9_emit_bb_start(struct i915_request *rq,
>>          if (IS_ERR(cs))
>>                  return PTR_ERR(cs);
>>   
>> -       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>> +       *cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
> My prediction is that this will result in this context being reset due 
> to preemption timeouts and the context under profile being banned. 
> Note that preemption timeouts will be the primary means for hang 
> detection for endless batches. -Chris

Thanks,

One question: how is that dealt with for compute workloads at the moment?
I thought those were still not fully preemptible.

I need to rework this with a more "software" approach to holding off
preemption. Does adding a condition in intel_lrc.c need_preempt() look
like the right direction?

Cheers,

-Lionel
Chris Wilson May 21, 2019, 5:17 p.m. UTC | #3
Quoting Lionel Landwerlin (2019-05-21 17:50:30)
> On 21/05/2019 17:36, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-05-21 15:08:52)
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> index f263a8374273..2ad95977f7a8 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> @@ -2085,7 +2085,7 @@ static int gen9_emit_bb_start(struct i915_request *rq,
> >>          if (IS_ERR(cs))
> >>                  return PTR_ERR(cs);
> >>   
> >> -       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
> >> +       *cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
> > My prediction is that this will result in this context being reset due 
> > to preemption timeouts and the context under profile being banned. 
> > Note that preemption timeouts will be the primary means for hang 
> > detection for endless batches. -Chris
> 
> Thanks,
> 
> One question: how is that dealt with for compute workloads at the moment?
> I thought those were still not fully preemptible.

Not blocking is the condition under which they get to use endless...
compute jobs are preemptible from gen9 afaik, gen8 was problematic and so
disabled.
 
> I need to rework this with a more "software" approach to holding off
> preemption. Does adding a condition in intel_lrc.c need_preempt() look
> like the right direction?

Even less if that is our means of hangcheck.
-Chris
Lionel Landwerlin May 21, 2019, 5:52 p.m. UTC | #4
On 21/05/2019 18:17, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-05-21 17:50:30)
>> On 21/05/2019 17:36, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2019-05-21 15:08:52)
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>> index f263a8374273..2ad95977f7a8 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>> @@ -2085,7 +2085,7 @@ static int gen9_emit_bb_start(struct i915_request *rq,
>>>>           if (IS_ERR(cs))
>>>>                   return PTR_ERR(cs);
>>>>    
>>>> -       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>>>> +       *cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
>>> My prediction is that this will result in this context being reset due
>>> to preemption timeouts and the context under profile being banned.
>>> Note that preemption timeouts will be the primary means for hang
>>> detection for endless batches. -Chris
>> Thanks,
>>
>> One question: how is that dealt with for compute workloads at the moment?
>> I thought those were still not fully preemptible.
> Not blocking is the condition under which they get to use endless...
> compute jobs are preemptible from gen9 afaik, gen8 was problematic and so
> disabled.
>   
>> I need to rework this with a more "software" approach to holding off
>> preemption. Does adding a condition in intel_lrc.c need_preempt() look
>> like the right direction?
> Even less if that is our means of hangcheck.
> -Chris
>

Can we differentiate between a hangcheck & a high priority request?

If I remember correctly, we can set the hangcheck timeout somewhere in /sys.

I think it's fine to ban the context doing a perf query if it's taking 
too long.


If a user runs into that scenario we can tell them to increase the timeout.


-Lionel
Lionel Landwerlin May 24, 2019, 9:28 a.m. UTC | #5
On 21/05/2019 17:36, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-05-21 15:08:52)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> index f263a8374273..2ad95977f7a8 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> @@ -2085,7 +2085,7 @@ static int gen9_emit_bb_start(struct i915_request *rq,
>>          if (IS_ERR(cs))
>>                  return PTR_ERR(cs);
>>   
>> -       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>> +       *cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
> My prediction is that this will result in this context being reset due
> to preemption timeouts and the context under profile being banned. Note
> that preemption timeouts will be the primary means for hang detection
> for endless batches.
> -Chris
>

Another thought :

What if we ran with the max priority?
It would be fine to have the hangcheck preempt the workload (it's pretty 
short and shouldn't affect perf counters from the 3d/compute pipeline much) 
as long as we ensure nothing else runs.

-Lionel
Chris Wilson May 24, 2019, 9:42 a.m. UTC | #6
Quoting Lionel Landwerlin (2019-05-24 10:28:16)
> On 21/05/2019 17:36, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-05-21 15:08:52)
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> index f263a8374273..2ad95977f7a8 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> @@ -2085,7 +2085,7 @@ static int gen9_emit_bb_start(struct i915_request *rq,
> >>          if (IS_ERR(cs))
> >>                  return PTR_ERR(cs);
> >>   
> >> -       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
> >> +       *cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
> > My prediction is that this will result in this context being reset due
> > to preemption timeouts and the context under profile being banned. Note
> > that preemption timeouts will be the primary means for hang detection
> > for endless batches.
> > -Chris
> >
> 
> Another thought :
> 
> What if we ran with the max priority?
> It would be fine to have the hangcheck preempt the workload (it's pretty 
> short and shouldn't affect perf counters from the 3d/compute pipeline much) 
> as long as we ensure nothing else runs.

It's certainly safer from the pov that we don't block preemption and so
don't incur forced resets. Not keen on the system being perturbed by the
act of observing it, and I still dislike the notion of permitting one
client to hog the GPU so easily. Makes me think of RT throttling, and
generally throwing out the absolute priority system (in exchange for
computed deadlines or something).
-Chris
Lionel Landwerlin May 24, 2019, 9:51 a.m. UTC | #7
On 24/05/2019 10:42, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-05-24 10:28:16)
>> On 21/05/2019 17:36, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2019-05-21 15:08:52)
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>> index f263a8374273..2ad95977f7a8 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>> @@ -2085,7 +2085,7 @@ static int gen9_emit_bb_start(struct i915_request *rq,
>>>>           if (IS_ERR(cs))
>>>>                   return PTR_ERR(cs);
>>>>    
>>>> -       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>>>> +       *cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
>>> My prediction is that this will result in this context being reset due
>>> to preemption timeouts and the context under profile being banned. Note
>>> that preemption timeouts will be the primary means for hang detection
>>> for endless batches.
>>> -Chris
>>>
>> Another thought :
>>
>> What if we ran with the max priority?
>> It would be fine to have the hangcheck preempt the workload (it's pretty
>> short and shouldn't affect perf counters from the 3d/compute pipeline much)
>> as long as we ensure nothing else runs.
> It's certainly safer from the pov that we don't block preemption and so
> don't incur forced resets. Not keen on the system being perturbed by the
> act of observing it, and I still dislike the notion of permitting one
> client to hog the GPU so easily. Makes me think of RT throttling, and
> generally throwing out the absolute priority system (in exchange for
> computed deadlines or something).
> -Chris
>
I don't like it much either but I can't see how to do otherwise with the 
hardware we currently have.

I'm thinking of two priority values: one for scheduling, one once running.

Most contexts would have both values equal.

Could mitigate the issue a bit?


-Lionel
Chris Wilson May 24, 2019, 10:07 a.m. UTC | #8
Quoting Lionel Landwerlin (2019-05-24 10:51:49)
> On 24/05/2019 10:42, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-05-24 10:28:16)
> >> On 21/05/2019 17:36, Chris Wilson wrote:
> >>> Quoting Lionel Landwerlin (2019-05-21 15:08:52)
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>>> index f263a8374273..2ad95977f7a8 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>>> @@ -2085,7 +2085,7 @@ static int gen9_emit_bb_start(struct i915_request *rq,
> >>>>           if (IS_ERR(cs))
> >>>>                   return PTR_ERR(cs);
> >>>>    
> >>>> -       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
> >>>> +       *cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
> >>> My prediction is that this will result in this context being reset due
> >>> to preemption timeouts and the context under profile being banned. Note
> >>> that preemption timeouts will be the primary means for hang detection
> >>> for endless batches.
> >>> -Chris
> >>>
> >> Another thought :
> >>
> >> What if we ran with the max priority?
> >> It would be fine to have the hangcheck preempt the workload (it's pretty
> >> short and shouldn't affect perf counters from the 3d/compute pipeline much)
> >> as long as we ensure nothing else runs.
> > It's certainly safer from the pov that we don't block preemption and so
> > don't incur forced resets. Not keen on the system being perturbed by the
> > act of observing it, and I still dislike the notion of permitting one
> > client to hog the GPU so easily. Makes me think of RT throttling, and
> > generally throwing out the absolute priority system (in exchange for
> > computed deadlines or something).
> > -Chris
> >
> I don't like it much either but I can't see how to do otherwise with the 
> hardware we currently have.
> 
> I'm thinking of two priority values: one for scheduling, one once running.

It's not quite that easy as you may start running concurrently with one
of your dependencies and must therefore manage the priority inversion if
you boost yourself. And I've just gone through and thrown out the
current complexity of manipulating priority as they run because it made
timeslicing much harder (where the priority was changing between
evaluating the need for the context switch and the context switch
occurring -- such mistakes can be noticed in throughput sensitive
transcode workloads).

> Most contexts would have both values equal.
> 
> Could mitigate the issue a bit?

A bit, it gives you a soft notion of a no-preempt flag without queue
jumping. rq_prio(rq) | intel_context->effective_priority or somesuch.
-Chris
Lionel Landwerlin May 27, 2019, 10:11 p.m. UTC | #9
On 24/05/2019 11:07, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-05-24 10:51:49)
>> On 24/05/2019 10:42, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2019-05-24 10:28:16)
>>>> On 21/05/2019 17:36, Chris Wilson wrote:
>>>>> Quoting Lionel Landwerlin (2019-05-21 15:08:52)
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>> index f263a8374273..2ad95977f7a8 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>> @@ -2085,7 +2085,7 @@ static int gen9_emit_bb_start(struct i915_request *rq,
>>>>>>            if (IS_ERR(cs))
>>>>>>                    return PTR_ERR(cs);
>>>>>>     
>>>>>> -       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>>>>>> +       *cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
>>>>> My prediction is that this will result in this context being reset due
>>>>> to preemption timeouts and the context under profile being banned. Note
>>>>> that preemption timeouts will be the primary means for hang detection
>>>>> for endless batches.
>>>>> -Chris
>>>>>
>>>> Another thought :
>>>>
>>>> What if we ran with the max priority?
>>>> It would be fine to have the hangcheck preempt the workload (it's pretty
>>>> short and shouldn't affect perf counters from 3d/compute pipeline much)
>>>> as long as ensure nothing else runs.
>>> It's certainly safer from the pov that we don't block preemption and so
>>> don't incur forced resets. Not keen on the system being perturbed by the
>>> act of observing it, and I still dislike the notion of permitting one
>>> client to hog the GPU so easily. Makes me think of RT throttling, and
>>> generally throwing out the absolute priority system (in exchange for
>>> computed deadlines or something).
>>> -Chris
>>>
>> I don't like it much either but I can't see how to do otherwise with the
>> hardware we currently have.
>>
>> I'm thinking of two priority values: one for scheduling, one once running.
> It's not quite that easy as you may start running concurrently with one
> of your dependencies and must therefore manage the priority inversion if
> you boost yourself. And I've just gone through and thrown out the
> current complexity of manipulating priority as they run because it made
> timeslicing much harder (where the priority was changing between
> evaluating the need for the context switch and the context switch
> occurring -- such mistakes can be noticed in throughput sensitive
> transcode workloads).


It's like you wrote a scheduler before!


Here is how I could see this work.

I can see the 3 different stages of a request :

   - waiting on dependencies

   - in the engine queue

   - in the HW


The request would maintain its normal/default priority until it hits the HW.

When hitting the HW for the first time, its priority is upgraded to perf 
priority so that it sticks to the HW until completion (or until some other 
timeout kicks it off the HW).


Does that still sound broken?


Thanks a lot,


-Lionel


>
>> Most contexts would have both values equal.
>>
>> Could mitigate the issue a bit?
> A bit, it gives you a soft notion of a no-preempt flag without queue
> jumping. rq_prio(rq) | intel_context->effective_priority or somesuch.
> -Chris
>

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 5b31e1e05ddd..68a4b888fb1a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -117,6 +117,7 @@  intel_context_init(struct intel_context *ce,
 	ce->ops = engine->cops;
 	ce->sseu = engine->sseu;
 	ce->saturated = 0;
+	ce->arb_enable = MI_ARB_ENABLE;
 
 	INIT_LIST_HEAD(&ce->signal_link);
 	INIT_LIST_HEAD(&ce->signals);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 963a312430e6..07f586e3608d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -65,6 +65,9 @@  struct intel_context {
 
 	/** sseu: Control eu/slice partitioning */
 	struct intel_sseu sseu;
+
+	/** arb_enable: Control preemption */
+	u32 arb_enable;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index f263a8374273..2ad95977f7a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2085,7 +2085,7 @@  static int gen9_emit_bb_start(struct i915_request *rq,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+	*cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
 
 	*cs++ = MI_BATCH_BUFFER_START_GEN8 |
 		(flags & I915_DISPATCH_SECURE ? 0 : BIT(8));
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index f309a0b2ccfc..5871e0cfbab0 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -470,7 +470,7 @@  static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 		value = INTEL_INFO(dev_priv)->has_coherent_ggtt;
 		break;
 	case I915_PARAM_PERF_REVISION:
-		value = 1;
+		value = 2;
 		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c4995d5a16d2..8c7fa7f7014b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -355,6 +355,7 @@  struct perf_open_properties {
 	u32 sample_flags;
 
 	u64 single_context:1;
+	u64 context_disable_preemption:1;
 	u64 ctx_handle;
 
 	/* OA sampling state */
@@ -1201,7 +1202,8 @@  static int i915_oa_read(struct i915_perf_stream *stream,
 }
 
 static struct intel_context *oa_pin_context(struct drm_i915_private *i915,
-					    struct i915_gem_context *ctx)
+					    struct i915_gem_context *ctx,
+					    bool disable_preemption)
 {
 	struct i915_gem_engines_iter it;
 	struct intel_context *ce;
@@ -1222,6 +1224,7 @@  static struct intel_context *oa_pin_context(struct drm_i915_private *i915,
 		err = intel_context_pin(ce);
 		if (err == 0) {
 			i915->perf.oa.pinned_ctx = ce;
+			ce->arb_enable = MI_ARB_DISABLE;
 			break;
 		}
 	}
@@ -1237,19 +1240,22 @@  static struct intel_context *oa_pin_context(struct drm_i915_private *i915,
 /**
  * oa_get_render_ctx_id - determine and hold ctx hw id
  * @stream: An i915-perf stream opened for OA metrics
+ * @disable_preemption: Whether to disable preemption on the context
  *
  * Determine the render context hw id, and ensure it remains fixed for the
  * lifetime of the stream. This ensures that we don't have to worry about
- * updating the context ID in OACONTROL on the fly.
+ * updating the context ID in OACONTROL on the fly. Also disable preemption on
+ * the context if needed.
  *
  * Returns: zero on success or a negative error code
  */
-static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
+static int oa_get_render_ctx_id(struct i915_perf_stream *stream,
+				bool disable_preemption)
 {
 	struct drm_i915_private *i915 = stream->dev_priv;
 	struct intel_context *ce;
 
-	ce = oa_pin_context(i915, stream->ctx);
+	ce = oa_pin_context(i915, stream->ctx, disable_preemption);
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
@@ -1337,6 +1343,7 @@  static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
 	ce = fetch_and_zero(&dev_priv->perf.oa.pinned_ctx);
 	if (ce) {
 		mutex_lock(&dev_priv->drm.struct_mutex);
+		ce->arb_enable = MI_ARB_ENABLE;
 		intel_context_unpin(ce);
 		mutex_unlock(&dev_priv->drm.struct_mutex);
 	}
@@ -2085,7 +2092,7 @@  static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		dev_priv->perf.oa.period_exponent = props->oa_period_exponent;
 
 	if (stream->ctx) {
-		ret = oa_get_render_ctx_id(stream);
+		ret = oa_get_render_ctx_id(stream, props->context_disable_preemption);
 		if (ret) {
 			DRM_DEBUG("Invalid context id to filter with\n");
 			return ret;
@@ -2583,6 +2590,15 @@  i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 		}
 	}
 
+	if (props->context_disable_preemption) {
+		if (!props->single_context) {
+			DRM_DEBUG("preemption disable with no context\n");
+			ret = -EINVAL;
+			goto err;
+		}
+		privileged_op = true;
+	}
+
 	/*
 	 * On Haswell the OA unit supports clock gating off for a specific
 	 * context and in this mode there's no visibility of metrics for the
@@ -2597,8 +2613,10 @@  i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 	 * MI_REPORT_PERF_COUNT commands and so consider it a privileged op to
 	 * enable the OA unit by default.
 	 */
-	if (IS_HASWELL(dev_priv) && specific_ctx)
+	if (IS_HASWELL(dev_priv) && specific_ctx &&
+	    !props->context_disable_preemption) {
 		privileged_op = false;
+	}
 
 	/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
 	 * we check a dev.i915.perf_stream_paranoid sysctl option
@@ -2607,7 +2625,7 @@  i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 	 */
 	if (privileged_op &&
 	    i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
-		DRM_DEBUG("Insufficient privileges to open system-wide i915 perf stream\n");
+		DRM_DEBUG("Insufficient privileges to open i915 perf stream\n");
 		ret = -EACCES;
 		goto err_ctx;
 	}
@@ -2799,6 +2817,9 @@  static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			props->oa_periodic = true;
 			props->oa_period_exponent = value;
 			break;
+		case DRM_I915_PERF_PROP_HOLD_PREEMPTION:
+			props->context_disable_preemption = value != 0 ? 1 : 0;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ad8a3e4f6355..5601dc688295 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1727,6 +1727,16 @@  enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_OA_EXPONENT,
 
+	/**
+	 * Specifying this property is only valid when also specifying a
+	 * context to filter with DRM_I915_PERF_PROP_CTX_HANDLE. Setting it
+	 * holds preemption on the particular context we want to gather
+	 * performance data about.
+	 *
+	 * This property is available in perf revision 2.
+	 */
+	DRM_I915_PERF_PROP_HOLD_PREEMPTION,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };