diff mbox

[RFC,v1] drm/i915: Add Exec param to control data port coherency.

Message ID 1521463055-5325-2-git-send-email-tomasz.lis@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Lis, Tomasz March 19, 2018, 12:37 p.m. UTC
The patch adds a parameter to control the data port coherency functionality
on a per-exec call basis. When data port coherency flag value is different
than what it was in previous call for the context, a command to switch data
port coherency state is added before the buffer to be executed.

Bspec: 11419
---
 drivers/gpu/drm/i915/i915_drv.c            |  3 ++
 drivers/gpu/drm/i915/i915_gem_context.h    |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 17 ++++++++++
 drivers/gpu/drm/i915/intel_lrc.c           | 53 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h           |  3 ++
 include/uapi/drm/i915_drm.h                | 12 ++++++-
 6 files changed, 88 insertions(+), 1 deletion(-)

Comments

Chris Wilson March 19, 2018, 12:43 p.m. UTC | #1
Quoting Tomasz Lis (2018-03-19 12:37:35)
> The patch adds a parameter to control the data port coherency functionality
> on a per-exec call basis. When data port coherency flag value is different
> than what it was in previous call for the context, a command to switch data
> port coherency state is added before the buffer to be executed.

So this is part of the context? Why do it at exec level? If exec level
is desired, why not whitelist it?
-Chris
Lis, Tomasz March 19, 2018, 2:14 p.m. UTC | #2
On 2018-03-19 13:43, Chris Wilson wrote:
> Quoting Tomasz Lis (2018-03-19 12:37:35)
>> The patch adds a parameter to control the data port coherency functionality
>> on a per-exec call basis. When data port coherency flag value is different
>> than what it was in previous call for the context, a command to switch data
>> port coherency state is added before the buffer to be executed.
> So this is part of the context? Why do it at exec level?

It is part of the context, stored within HDC chicken bit register.
The exec level was requested by the OCL team, due to concerns about 
performance cost of context setparam calls.

>   If exec level
> is desired, why not whitelist it?
> -Chris

If we have no issue in whitelisting the register, I'm sure OCL will 
agree to that.
I assumed the whitelisting will be unacceptable because of security 
concerns with some options.
The register also changes its position and content between gens, which 
makes whitelisting hard to manage.

Main purpose of chicken bit registers, in general, is to allow work 
around for hardware features which could  be buggy or could have 
unintended influence on the platform.
The data port coherency functionality landed there for the same reasons; 
then it twisted itself in a way that we now need user space to switch it.
Is it really ok to whitelist chicken bit registers?
-Tomasz
Chris Wilson March 19, 2018, 2:26 p.m. UTC | #3
Quoting Lis, Tomasz (2018-03-19 14:14:19)
> 
> 
> On 2018-03-19 13:43, Chris Wilson wrote:
> > Quoting Tomasz Lis (2018-03-19 12:37:35)
> >> The patch adds a parameter to control the data port coherency functionality
> >> on a per-exec call basis. When data port coherency flag value is different
> >> than what it was in previous call for the context, a command to switch data
> >> port coherency state is added before the buffer to be executed.
> > So this is part of the context? Why do it at exec level?
> 
> It is part of the context, stored within HDC chicken bit register.
> The exec level was requested by the OCL team, due to concerns about 
> performance cost of context setparam calls.

What? Oh dear, oh dear, thrice oh dear. The context setparam would look
like:

	if (arg != context->value) {
		rq = request_alloc(context, RCS);
		cs = ring_begin(rq, 4);
		cs++ = MI_LRI;
		cs++ = reg;
		cs++ = magic;
		cs++ = MI_NOOP;
		request_add(rq);
		context->value = arg
	}

The argument is whether stuffing it into a crowded, v.frequently
executed execbuf is better than an irregular setparam. If they want to
flip it on every batch, use execbuf. If it's going to be very
infrequent, setparam.

That discussion must be part of the rationale in the commitlog.

Otoh, execbuf3 would accept it as a command packet. Hmm.

> >   If exec level
> > is desired, why not whitelist it?
> 
> If we have no issue in whitelisting the register, I'm sure OCL will 
> agree to that.
> I assumed the whitelisting will be unacceptable because of security 
> concerns with some options.
> The register also changes its position and content between gens, which 
> makes whitelisting hard to manage.
> 
> Main purpose of chicken bit registers, in general, is to allow work 
> around for hardware features which could  be buggy or could have 
> unintended influence on the platform.
> The data port coherency functionality landed there for the same reasons; 
> then it twisted itself in a way that we now need user space to switch it.
> Is it really ok to whitelist chicken bit registers?

It all depends on whether it breaks segregation. If the only users
affected are themselves, fine. Otherwise, no.
-Chris
Lis, Tomasz March 20, 2018, 5:23 p.m. UTC | #4
On 2018-03-19 15:26, Chris Wilson wrote:
> Quoting Lis, Tomasz (2018-03-19 14:14:19)
>>
>> On 2018-03-19 13:43, Chris Wilson wrote:
>>> Quoting Tomasz Lis (2018-03-19 12:37:35)
>>>> The patch adds a parameter to control the data port coherency functionality
>>>> on a per-exec call basis. When data port coherency flag value is different
>>>> than what it was in previous call for the context, a command to switch data
>>>> port coherency state is added before the buffer to be executed.
>>> So this is part of the context? Why do it at exec level?
>> It is part of the context, stored within HDC chicken bit register.
>> The exec level was requested by the OCL team, due to concerns about
>> performance cost of context setparam calls.
> What? Oh dear, oh dear, thrice oh dear. The context setparam would look
> like:
>
> 	if (arg != context->value) {
> 		rq = request_alloc(context, RCS);
> 		cs = ring_begin(rq, 4);
> 		cs++ = MI_LRI;
> 		cs++ = reg;
> 		cs++ = magic;
> 		cs++ = MI_NOOP;
> 		request_add(rq);
> 		context->value = arg
> 	}
>
> The argument is whether stuffing it into a crowded, v.frequently
> executed execbuf is better than an irregular setparam. If they want to
> flip it on every batch, use execbuf. If it's going to be very
> infrequent, setparam.
Implementing the data port coherency switch as context setparam would 
not be a problem, I agree.
But this is not a solution OCL is willing to accept. Any additional 
IOCTL call is a concern for the OCL developers.

For more explanation on switch frequency - please look at the cover 
letter I provided; here's the related part of it:
(note: the data port coherency is called fine grain coherency within UMD)

    3. Will coherency switch be used frequently?

    There are scenarios that will require frequent toggling of the coherency
    switch.
    E.g. an application has two OCL compute kernels: kern_master and kern_worker.
    kern_master uses, concurrently with CPU, some fine grain SVM resources
    (CL_MEM_SVM_FINE_GRAIN_BUFFER). These resources contain descriptors of
    computational work that needs to be executed. kern_master analyzes incoming
    work descriptors and populates a plain OCL buffer (non-fine-grain) with payload
    for kern_worker. Once kern_master is done, kern_worker kicks-in and processes
    the payload that kern_master produced. These two kernels work in a loop, one
    after another. Since only kern_master requires coherency, kern_worker should
    not be forced to pay for it. This means that we need to have the ability to
    toggle coherency switch on or off per each GPU submission:
    (ENABLE COHERENCY) kern_master -> (DISABLE COHERENCY)kern_worker -> (ENABLE
    COHERENCY) kern_master -> (DISABLE COHERENCY)kern_worker -> ...

> That discussion must be part of the rationale in the commitlog.
Will add.
Should I place the whole text from cover letter within the commit comment?
> Otoh, execbuf3 would accept it as a command packet. Hmm.
I know we have execbuf2, but execbuf3? Are you proposing to add 
something like that?
>>>    If exec level
>>> is desired, why not whitelist it?
>> If we have no issue in whitelisting the register, I'm sure OCL will
>> agree to that.
>> I assumed the whitelisting will be unacceptable because of security
>> concerns with some options.
>> The register also changes its position and content between gens, which
>> makes whitelisting hard to manage.
>>
>> Main purpose of chicken bit registers, in general, is to allow work
>> around for hardware features which could  be buggy or could have
>> unintended influence on the platform.
>> The data port coherency functionality landed there for the same reasons;
>> then it twisted itself in a way that we now need user space to switch it.
>> Is it really ok to whitelist chicken bit registers?
> It all depends on whether it breaks segregation. If the only users
> affected are themselves, fine. Otherwise, no.
> -Chris
Chicken Bit registers are definitely not planned as safe for use. While 
meaning of bits within HDC_CHICKEN0 change between gens, I doubt any of 
the registers *can't* be used to cause GPU hung.
-Tomasz
oscar.mateo@intel.com March 20, 2018, 6:43 p.m. UTC | #5
On 3/19/2018 7:14 AM, Lis, Tomasz wrote:
>
>
> On 2018-03-19 13:43, Chris Wilson wrote:
>> Quoting Tomasz Lis (2018-03-19 12:37:35)
>>> The patch adds a parameter to control the data port coherency 
>>> functionality
>>> on a per-exec call basis. When data port coherency flag value is 
>>> different
>>> than what it was in previous call for the context, a command to 
>>> switch data
>>> port coherency state is added before the buffer to be executed.
>> So this is part of the context? Why do it at exec level?
>
> It is part of the context, stored within HDC chicken bit register.
> The exec level was requested by the OCL team, due to concerns about 
> performance cost of context setparam calls.
>
>>   If exec level
>> is desired, why not whitelist it?
>> -Chris
>
> If we have no issue in whitelisting the register, I'm sure OCL will 
> agree to that.
> I assumed the whitelisting will be unacceptable because of security 
> concerns with some options.
> The register also changes its position and content between gens, which 
> makes whitelisting hard to manage.
>

I think a security analysis of this register was already done, and the 
result was that it contains some other bits that could be dangerous. In 
CNL those bits were moved out of the way and the HDC_CHICKEN0 register 
can be whitelisted (WaAllowUMDToControlCoherency). In ICL the register 
should already be non-privileged.

> Main purpose of chicken bit registers, in general, is to allow work 
> around for hardware features which could  be buggy or could have 
> unintended influence on the platform.
> The data port coherency functionality landed there for the same 
> reasons; then it twisted itself in a way that we now need user space 
> to switch it.
> Is it really ok to whitelist chicken bit registers?
> -Tomasz
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson March 21, 2018, 10:16 a.m. UTC | #6
Quoting Oscar Mateo (2018-03-20 18:43:45)
> 
> 
> On 3/19/2018 7:14 AM, Lis, Tomasz wrote:
> >
> >
> > On 2018-03-19 13:43, Chris Wilson wrote:
> >> Quoting Tomasz Lis (2018-03-19 12:37:35)
> >>> The patch adds a parameter to control the data port coherency 
> >>> functionality
> >>> on a per-exec call basis. When data port coherency flag value is 
> >>> different
> >>> than what it was in previous call for the context, a command to 
> >>> switch data
> >>> port coherency state is added before the buffer to be executed.
> >> So this is part of the context? Why do it at exec level?
> >
> > It is part of the context, stored within HDC chicken bit register.
> > The exec level was requested by the OCL team, due to concerns about 
> > performance cost of context setparam calls.
> >
> >>   If exec level
> >> is desired, why not whitelist it?
> >> -Chris
> >
> > If we have no issue in whitelisting the register, I'm sure OCL will 
> > agree to that.
> > I assumed the whitelisting will be unacceptable because of security 
> > concerns with some options.
> > The register also changes its position and content between gens, which 
> > makes whitelisting hard to manage.
> >
> 
> I think a security analysis of this register was already done, and the 
> result was that it contains some other bits that could be dangerous. In 
> CNL those bits were moved out of the way and the HDC_CHICKEN0 register 
> can be whitelisted (WaAllowUMDToControlCoherency). In ICL the register 
> should already be non-privileged.

The previous alternative to whitelisting was running through a command
parser for validation. That's a very general mechanism suitable for a
wide range of sins.
-Chris
oscar.mateo@intel.com March 21, 2018, 7:42 p.m. UTC | #7
On 3/21/2018 3:16 AM, Chris Wilson wrote:
> Quoting Oscar Mateo (2018-03-20 18:43:45)
>>
>> On 3/19/2018 7:14 AM, Lis, Tomasz wrote:
>>>
>>> On 2018-03-19 13:43, Chris Wilson wrote:
>>>> Quoting Tomasz Lis (2018-03-19 12:37:35)
>>>>> The patch adds a parameter to control the data port coherency
>>>>> functionality
>>>>> on a per-exec call basis. When data port coherency flag value is
>>>>> different
>>>>> than what it was in previous call for the context, a command to
>>>>> switch data
>>>>> port coherency state is added before the buffer to be executed.
>>>> So this is part of the context? Why do it at exec level?
>>> It is part of the context, stored within HDC chicken bit register.
>>> The exec level was requested by the OCL team, due to concerns about
>>> performance cost of context setparam calls.
>>>
>>>>    If exec level
>>>> is desired, why not whitelist it?
>>>> -Chris
>>> If we have no issue in whitelisting the register, I'm sure OCL will
>>> agree to that.
>>> I assumed the whitelisting will be unacceptable because of security
>>> concerns with some options.
>>> The register also changes its position and content between gens, which
>>> makes whitelisting hard to manage.
>>>
>> I think a security analysis of this register was already done, and the
>> result was that it contains some other bits that could be dangerous. In
>> CNL those bits were moved out of the way and the HDC_CHICKEN0 register
>> can be whitelisted (WaAllowUMDToControlCoherency). In ICL the register
>> should already be non-privileged.
> The previous alternative to whitelisting was running through a command
> parser for validation. That's a very general mechanism suitable for a
> wide range of sins.
> -Chris

Are you suggesting that we enable the cmd parser for every Gen < CNL for 
this particular usage only? :P
Lis, Tomasz March 27, 2018, 5:41 p.m. UTC | #8
On 2018-03-21 20:42, Oscar Mateo wrote:
>
>
> On 3/21/2018 3:16 AM, Chris Wilson wrote:
>> Quoting Oscar Mateo (2018-03-20 18:43:45)
>>>
>>> On 3/19/2018 7:14 AM, Lis, Tomasz wrote:
>>>>
>>>> On 2018-03-19 13:43, Chris Wilson wrote:
>>>>> Quoting Tomasz Lis (2018-03-19 12:37:35)
>>>>>> The patch adds a parameter to control the data port coherency
>>>>>> functionality
>>>>>> on a per-exec call basis. When data port coherency flag value is
>>>>>> different
>>>>>> than what it was in previous call for the context, a command to
>>>>>> switch data
>>>>>> port coherency state is added before the buffer to be executed.
>>>>> So this is part of the context? Why do it at exec level?
>>>> It is part of the context, stored within HDC chicken bit register.
>>>> The exec level was requested by the OCL team, due to concerns about
>>>> performance cost of context setparam calls.
>>>>
>>>>>    If exec level
>>>>> is desired, why not whitelist it?
>>>>> -Chris
>>>> If we have no issue in whitelisting the register, I'm sure OCL will
>>>> agree to that.
>>>> I assumed the whitelisting will be unacceptable because of security
>>>> concerns with some options.
>>>> The register also changes its position and content between gens, which
>>>> makes whitelisting hard to manage.
>>>>
>>> I think a security analysis of this register was already done, and the
>>> result was that it contains some other bits that could be dangerous. In
>>> CNL those bits were moved out of the way and the HDC_CHICKEN0 register
>>> can be whitelisted (WaAllowUMDToControlCoherency). In ICL the register
>>> should already be non-privileged.
>> The previous alternative to whitelisting was running through a command
>> parser for validation. That's a very general mechanism suitable for a
>> wide range of sins.
>> -Chris
>
> Are you suggesting that we enable the cmd parser for every Gen < CNL 
> for this particular usage only? :P
>
It is a solution that would allow to do what we want without any 
additions to module interface.
It may be worth considering if we think the coherency setting will be 
temporary and removed in future gens, as we wouldn't want to have 
obsolete flags.

I think the setting will stay with us, as it is needed to support 
CL_MEM_SVM_FINE_GRAIN_BUFFER flag from OpenCL spec.
Keeping coherency will always cost performance, so we will likely always 
have a hardware setting to switch it.
But the bspec says coherency override control will be removed in future 
projects...
Joonas Lahtinen May 4, 2018, 9:24 a.m. UTC | #9
Quoting Lis, Tomasz (2018-03-20 19:23:03)
> 
> 
> On 2018-03-19 15:26, Chris Wilson wrote:
> 
>     Quoting Lis, Tomasz (2018-03-19 14:14:19)
> 
> 
>         On 2018-03-19 13:43, Chris Wilson wrote:
> 
>             Quoting Tomasz Lis (2018-03-19 12:37:35)
> 
>                 The patch adds a parameter to control the data port coherency functionality
>                 on a per-exec call basis. When data port coherency flag value is different
>                 than what it was in previous call for the context, a command to switch data
>                 port coherency state is added before the buffer to be executed.
> 
>             So this is part of the context? Why do it at exec level?
> 
>         It is part of the context, stored within HDC chicken bit register.
>         The exec level was requested by the OCL team, due to concerns about
>         performance cost of context setparam calls.
> 
>     What? Oh dear, oh dear, thrice oh dear. The context setparam would look
>     like:
> 
>             if (arg != context->value) {
>                     rq = request_alloc(context, RCS);
>                     cs = ring_begin(rq, 4);
>                     cs++ = MI_LRI;
>                     cs++ = reg;
>                     cs++ = magic;
>                     cs++ = MI_NOOP;
>                     request_add(rq);
>                     context->value = arg
>             }
> 
>     The argument is whether stuffing it into a crowded, v.frequently
>     executed execbuf is better than an irregular setparam. If they want to
>     flip it on every batch, use execbuf. If it's going to be very
>     infrequent, setparam.
> 
> Implementing the data port coherency switch as context setparam would not be a
> problem, I agree.
> But this is not a solution OCL is willing to accept. Any additional IOCTL call
> is a concern for the OCL developers.

Being part of hardware context is a good indication that GEM context is
the right place for the bit. Stuffing more into execbuf for
one-usecase-one-platform scenario doesn't sound very future looking in
terms of overall driver development.

I would truly imagine that the IOCTL execution time should not be
meaningful compared to compute kernel execution times. If they are
already having a large amount of IOCTL calls between each patch, I guess
that is something to be discussed separately.

Regards, Joonas

> 
> For more explanation on switch frequency - please look at the cover letter I
> provided; here's the related part of it:
> (note: the data port coherency is called fine grain coherency within UMD)
> 
>     3. Will coherency switch be used frequently?
> 
>     There are scenarios that will require frequent toggling of the coherency
>     switch.
>     E.g. an application has two OCL compute kernels: kern_master and kern_worker.
>     kern_master uses, concurrently with CPU, some fine grain SVM resources
>     (CL_MEM_SVM_FINE_GRAIN_BUFFER). These resources contain descriptors of
>     computational work that needs to be executed. kern_master analyzes incoming
>     work descriptors and populates a plain OCL buffer (non-fine-grain) with payload
>     for kern_worker. Once kern_master is done, kern_worker kicks-in and processes
>     the payload that kern_master produced. These two kernels work in a loop, one
>     after another. Since only kern_master requires coherency, kern_worker should
>     not be forced to pay for it. This means that we need to have the ability to
>     toggle coherency switch on or off per each GPU submission:
>     (ENABLE COHERENCY) kern_master -> (DISABLE COHERENCY)kern_worker -> (ENABLE
>     COHERENCY) kern_master -> (DISABLE COHERENCY)kern_worker -> ...
> 
>     That discussion must be part of the rationale in the commitlog.
> 
> Will add.
> Should I place the whole text from cover letter within the commit comment?
> 
>     Otoh, execbuf3 would accept it as a command packet. Hmm.
> 
> I know we have execbuf2, but execbuf3? Are you proposing to add something like
> that?
> 
>               If exec level
>             is desired, why not whitelist it?
> 
>         If we have no issue in whitelisting the register, I'm sure OCL will
>         agree to that.
>         I assumed the whitelisting will be unacceptable because of security
>         concerns with some options.
>         The register also changes its position and content between gens, which
>         makes whitelisting hard to manage.
> 
>         Main purpose of chicken bit registers, in general, is to allow work
>         around for hardware features which could  be buggy or could have
>         unintended influence on the platform.
>         The data port coherency functionality landed there for the same reasons;
>         then it twisted itself in a way that we now need user space to switch it.
>         Is it really ok to whitelist chicken bit registers?
> 
>     It all depends on whether it breaks segregation. If the only users
>     affected are themselves, fine. Otherwise, no.
>     -Chris
> 
> Chicken Bit registers are definitely not planned as safe for use. While meaning
> of bits within HDC_CHICKEN0 change between gens, I doubt any of the registers
> *can't* be used to cause GPU hung.
> -Tomasz
>
Lis, Tomasz June 20, 2018, 3:03 p.m. UTC | #10
The OCL Team agreed to use IOCTL instead of Exec flag to switch coherency
settings.

Also:
1. I will follow this patch with IGT test for the new functionality.
2. The OCL Team will publish UMD patch for it.

Tomasz Lis (1):
  drm/i915: Add IOCTL Param to control data port coherency.

 drivers/gpu/drm/i915/i915_gem_context.c | 41 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h |  6 ++++
 drivers/gpu/drm/i915/intel_lrc.c        | 51 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |  4 +++
 include/uapi/drm/i915_drm.h             |  1 +
 5 files changed, 103 insertions(+)
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 3df5193..fcb3547 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -436,6 +436,9 @@  static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_CS_TIMESTAMP_FREQUENCY:
 		value = 1000 * INTEL_INFO(dev_priv)->cs_timestamp_frequency_khz;
 		break;
+	case I915_PARAM_HAS_EXEC_DATA_PORT_COHERENCY:
+		value = (INTEL_GEN(dev_priv) >= 9);
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 7854262..00aa309 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -118,6 +118,7 @@  struct i915_gem_context {
 #define CONTEXT_BANNABLE		3
 #define CONTEXT_BANNED			4
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	5
+#define CONTEXT_DATA_PORT_COHERENT	6
 
 	/**
 	 * @hw_id: - unique identifier for the context
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8c170db..f848f14 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2245,6 +2245,18 @@  i915_gem_do_execbuffer(struct drm_device *dev,
 		eb.batch_flags |= I915_DISPATCH_RS;
 	}
 
+	if (args->flags & I915_EXEC_DATA_PORT_COHERENT) {
+		if (INTEL_GEN(eb.i915) < 9) {
+			DRM_DEBUG("Data Port Coherency is only allowed for Gen9 and above\n");
+			return -EINVAL;
+		}
+		if (eb.engine->class != RENDER_CLASS) {
+			DRM_DEBUG("Data Port Coherency is not available on %s\n",
+				 eb.engine->name);
+			return -EINVAL;
+		}
+	}
+
 	if (args->flags & I915_EXEC_FENCE_IN) {
 		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
 		if (!in_fence)
@@ -2371,6 +2383,11 @@  i915_gem_do_execbuffer(struct drm_device *dev,
 		goto err_batch_unpin;
 	}
 
+	/* Emit the switch of data port coherency state if needed */
+	err = intel_lr_context_modify_data_port_coherency(eb.request,
+			(args->flags & I915_EXEC_DATA_PORT_COHERENT) != 0);
+	GEM_WARN_ON(err);
+
 	if (in_fence) {
 		err = i915_request_await_dma_fence(eb.request, in_fence);
 		if (err < 0)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 53f1c00..b847798 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -254,6 +254,59 @@  intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
 	ce->lrc_desc = desc;
 }
 
+static int emit_set_data_port_coherency(struct i915_request *req, bool enable)
+{
+	u32 *cs;
+	i915_reg_t reg;
+
+	GEM_BUG_ON(req->engine->class != RENDER_CLASS);
+	GEM_BUG_ON(INTEL_GEN(req->i915) < 9);
+
+	cs = intel_ring_begin(req, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	if (INTEL_GEN(req->i915) >= 10)
+		reg = CNL_HDC_CHICKEN0;
+	else
+		reg = HDC_CHICKEN0;
+
+	*cs++ = MI_LOAD_REGISTER_IMM(1);
+	*cs++ = i915_mmio_reg_offset(reg);
+	/* Enabling coherency means disabling the bit which forces it off */
+	if (enable)
+		*cs++ = _MASKED_BIT_DISABLE(HDC_FORCE_NON_COHERENT);
+	else
+		*cs++ = _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT);
+	*cs++ = MI_NOOP;
+
+	intel_ring_advance(req, cs);
+
+	return 0;
+}
+
+int
+intel_lr_context_modify_data_port_coherency(struct i915_request *req,
+					bool enable)
+{
+	struct i915_gem_context *ctx = req->ctx;
+	int ret;
+
+	if (test_bit(CONTEXT_DATA_PORT_COHERENT, &ctx->flags) == enable)
+		return 0;
+
+	ret = emit_set_data_port_coherency(req, enable);
+
+	if (!ret) {
+		if (enable)
+			__set_bit(CONTEXT_DATA_PORT_COHERENT, &ctx->flags);
+		else
+			__clear_bit(CONTEXT_DATA_PORT_COHERENT, &ctx->flags);
+	}
+
+	return ret;
+}
+
 static struct i915_priolist *
 lookup_priolist(struct intel_engine_cs *engine,
 		struct i915_priotree *pt,
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 59d7b86..c46b239 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -111,4 +111,7 @@  intel_lr_context_descriptor(struct i915_gem_context *ctx,
 	return ctx->engine[engine->id].lrc_desc;
 }
 
+int intel_lr_context_modify_data_port_coherency(struct i915_request *req,
+						bool enable);
+
 #endif /* _INTEL_LRC_H_ */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634c..a5fed1f 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -529,6 +529,11 @@  typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_CS_TIMESTAMP_FREQUENCY 51
 
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to switch
+ * Data Cache access into Data Port Coherency mode.
+ */
+#define I915_PARAM_HAS_EXEC_DATA_PORT_COHERENCY 52
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
@@ -1048,7 +1053,12 @@  struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_ARRAY   (1<<19)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
+/* Data Port Coherency capability will be switched before an exec call
+ * which has this flag different than previous call for the context.
+ */
+#define I915_EXEC_DATA_PORT_COHERENT   (1<<20)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_DATA_PORT_COHERENT<<1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \