Message ID: 1522430994-15366-1-git-send-email-tomasz.lis@intel.com (mailing list archive)
State: New, archived
Hi Tomasz,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on v4.16-rc7 next-20180329]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Tomasz-Lis/drm-i915-Add-Exec-param-to-control-data-port-coherency/20180401-021313
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-randconfig-x010-201813 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu//drm/i915/i915_request.h:30:0,
                    from drivers/gpu//drm/i915/i915_gem_timeline.h:30,
                    from drivers/gpu//drm/i915/intel_ringbuffer.h:8,
                    from drivers/gpu//drm/i915/intel_lrc.h:27,
                    from drivers/gpu//drm/i915/i915_drv.h:63,
                    from drivers/gpu//drm/i915/i915_gem_execbuffer.c:38:
   drivers/gpu//drm/i915/i915_gem_execbuffer.c: In function 'i915_gem_do_execbuffer':
   drivers/gpu//drm/i915/i915_gem.h:47:54: warning: statement with no effect [-Wunused-value]
    #define GEM_WARN_ON(expr) (BUILD_BUG_ON_INVALID(expr), 0)
                              ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
>> drivers/gpu//drm/i915/i915_gem_execbuffer.c:2389:2: note: in expansion of macro 'GEM_WARN_ON'
     GEM_WARN_ON(err);
     ^~~~~~~~~~~

vim +/GEM_WARN_ON +2389 drivers/gpu//drm/i915/i915_gem_execbuffer.c

  2182
  2183  static int
  2184  i915_gem_do_execbuffer(struct drm_device *dev,
  2185                         struct drm_file *file,
  2186                         struct drm_i915_gem_execbuffer2 *args,
  2187                         struct drm_i915_gem_exec_object2 *exec,
  2188                         struct drm_syncobj **fences)
  2189  {
  2190          struct i915_execbuffer eb;
  2191          struct dma_fence *in_fence = NULL;
  2192          struct sync_file *out_fence = NULL;
  2193          int out_fence_fd = -1;
  2194          int err;
  2195
  2196          BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_ILLEGAL_FLAGS);
  2197          BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS &
  2198                       ~__EXEC_OBJECT_UNKNOWN_FLAGS);
  2199
  2200          eb.i915 = to_i915(dev);
  2201          eb.file = file;
  2202          eb.args = args;
  2203          if (DBG_FORCE_RELOC || !(args->flags & I915_EXEC_NO_RELOC))
  2204                  args->flags |= __EXEC_HAS_RELOC;
  2205
  2206          eb.exec = exec;
  2207          eb.vma = (struct i915_vma **)(exec + args->buffer_count + 1);
  2208          eb.vma[0] = NULL;
  2209          eb.flags = (unsigned int *)(eb.vma + args->buffer_count + 1);
  2210
  2211          eb.invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
  2212          if (USES_FULL_PPGTT(eb.i915))
  2213                  eb.invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
  2214          reloc_cache_init(&eb.reloc_cache, eb.i915);
  2215
  2216          eb.buffer_count = args->buffer_count;
  2217          eb.batch_start_offset = args->batch_start_offset;
  2218          eb.batch_len = args->batch_len;
  2219
  2220          eb.batch_flags = 0;
  2221          if (args->flags & I915_EXEC_SECURE) {
  2222                  if (!drm_is_current_master(file) || !capable(CAP_SYS_ADMIN))
  2223                          return -EPERM;
  2224
  2225                  eb.batch_flags |= I915_DISPATCH_SECURE;
  2226          }
  2227          if (args->flags & I915_EXEC_IS_PINNED)
  2228                  eb.batch_flags |= I915_DISPATCH_PINNED;
  2229
  2230          eb.engine = eb_select_engine(eb.i915, file, args);
  2231          if (!eb.engine)
  2232                  return -EINVAL;
  2233
  2234          if (args->flags & I915_EXEC_RESOURCE_STREAMER) {
  2235                  if (!HAS_RESOURCE_STREAMER(eb.i915)) {
  2236                          DRM_DEBUG("RS is only allowed for Haswell, Gen8 and above\n");
  2237                          return -EINVAL;
  2238                  }
  2239                  if (eb.engine->id != RCS) {
  2240                          DRM_DEBUG("RS is not available on %s\n",
  2241                                    eb.engine->name);
  2242                          return -EINVAL;
  2243                  }
  2244
  2245                  eb.batch_flags |= I915_DISPATCH_RS;
  2246          }
  2247
  2248          if (args->flags & I915_EXEC_DATA_PORT_COHERENT) {
  2249                  if (INTEL_GEN(eb.i915) < 9) {
  2250                          DRM_DEBUG("Data Port Coherency is only allowed for Gen9 and above\n");
  2251                          return -EINVAL;
  2252                  }
  2253                  if (eb.engine->class != RENDER_CLASS) {
  2254                          DRM_DEBUG("Data Port Coherency is not available on %s\n",
  2255                                    eb.engine->name);
  2256                          return -EINVAL;
  2257                  }
  2258          }
  2259
  2260          if (args->flags & I915_EXEC_FENCE_IN) {
  2261                  in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
  2262                  if (!in_fence)
  2263                          return -EINVAL;
  2264          }
  2265
  2266          if (args->flags & I915_EXEC_FENCE_OUT) {
  2267                  out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
  2268                  if (out_fence_fd < 0) {
  2269                          err = out_fence_fd;
  2270                          goto err_in_fence;
  2271                  }
  2272          }
  2273
  2274          err = eb_create(&eb);
  2275          if (err)
  2276                  goto err_out_fence;
  2277
  2278          GEM_BUG_ON(!eb.lut_size);
  2279
  2280          err = eb_select_context(&eb);
  2281          if (unlikely(err))
  2282                  goto err_destroy;
  2283
  2284          /*
  2285           * Take a local wakeref for preparing to dispatch the execbuf as
  2286           * we expect to access the hardware fairly frequently in the
  2287           * process. Upon first dispatch, we acquire another prolonged
  2288           * wakeref that we hold until the GPU has been idle for at least
  2289           * 100ms.
  2290           */
  2291          intel_runtime_pm_get(eb.i915);
  2292
  2293          err = i915_mutex_lock_interruptible(dev);
  2294          if (err)
  2295                  goto err_rpm;
  2296
  2297          err = eb_relocate(&eb);
  2298          if (err) {
  2299                  /*
  2300                   * If the user expects the execobject.offset and
  2301                   * reloc.presumed_offset to be an exact match,
  2302                   * as for using NO_RELOC, then we cannot update
  2303                   * the execobject.offset until we have completed
  2304                   * relocation.
  2305                   */
  2306                  args->flags &= ~__EXEC_HAS_RELOC;
  2307                  goto err_vma;
  2308          }
  2309
  2310          if (unlikely(*eb.batch->exec_flags & EXEC_OBJECT_WRITE)) {
  2311                  DRM_DEBUG("Attempting to use self-modifying batch buffer\n");
  2312                  err = -EINVAL;
  2313                  goto err_vma;
  2314          }
  2315          if (eb.batch_start_offset > eb.batch->size ||
  2316              eb.batch_len > eb.batch->size - eb.batch_start_offset) {
  2317                  DRM_DEBUG("Attempting to use out-of-bounds batch\n");
  2318                  err = -EINVAL;
  2319                  goto err_vma;
  2320          }
  2321
  2322          if (eb_use_cmdparser(&eb)) {
  2323                  struct i915_vma *vma;
  2324
  2325                  vma = eb_parse(&eb, drm_is_current_master(file));
  2326                  if (IS_ERR(vma)) {
  2327                          err = PTR_ERR(vma);
  2328                          goto err_vma;
  2329                  }
  2330
  2331                  if (vma) {
  2332                          /*
  2333                           * Batch parsed and accepted:
  2334                           *
  2335                           * Set the DISPATCH_SECURE bit to remove the NON_SECURE
  2336                           * bit from MI_BATCH_BUFFER_START commands issued in
  2337                           * the dispatch_execbuffer implementations. We
  2338                           * specifically don't want that set on batches the
  2339                           * command parser has accepted.
  2340                           */
  2341                          eb.batch_flags |= I915_DISPATCH_SECURE;
  2342                          eb.batch_start_offset = 0;
  2343                          eb.batch = vma;
  2344                  }
  2345          }
  2346
  2347          if (eb.batch_len == 0)
  2348                  eb.batch_len = eb.batch->size - eb.batch_start_offset;
  2349
  2350          /*
  2351           * snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
  2352           * batch" bit. Hence we need to pin secure batches into the global gtt.
  2353           * hsw should have this fixed, but bdw mucks it up again. */
  2354          if (eb.batch_flags & I915_DISPATCH_SECURE) {
  2355                  struct i915_vma *vma;
  2356
  2357                  /*
  2358                   * So on first glance it looks freaky that we pin the batch here
  2359                   * outside of the reservation loop. But:
  2360                   * - The batch is already pinned into the relevant ppgtt, so we
  2361                   *   already have the backing storage fully allocated.
  2362                   * - No other BO uses the global gtt (well contexts, but meh),
  2363                   *   so we don't really have issues with multiple objects not
  2364                   *   fitting due to fragmentation.
  2365                   * So this is actually safe.
  2366                   */
  2367                  vma = i915_gem_object_ggtt_pin(eb.batch->obj, NULL, 0, 0, 0);
  2368                  if (IS_ERR(vma)) {
  2369                          err = PTR_ERR(vma);
  2370                          goto err_vma;
  2371                  }
  2372
  2373                  eb.batch = vma;
  2374          }
  2375
  2376          /* All GPU relocation batches must be submitted prior to the user rq */
  2377          GEM_BUG_ON(eb.reloc_cache.rq);
  2378
  2379          /* Allocate a request for this batch buffer nice and early. */
  2380          eb.request = i915_request_alloc(eb.engine, eb.ctx);
  2381          if (IS_ERR(eb.request)) {
  2382                  err = PTR_ERR(eb.request);
  2383                  goto err_batch_unpin;
  2384          }
  2385
  2386          /* Emit the switch of data port coherency state if needed */
  2387          err = intel_lr_context_modify_data_port_coherency(eb.request,
  2388                          (args->flags & I915_EXEC_DATA_PORT_COHERENT) != 0);
> 2389          GEM_WARN_ON(err);
  2390
  2391          if (in_fence) {
  2392                  err = i915_request_await_dma_fence(eb.request, in_fence);
  2393                  if (err < 0)
  2394                          goto err_request;
  2395          }
  2396
  2397          if (fences) {
  2398                  err = await_fence_array(&eb, fences);
  2399                  if (err)
  2400                          goto err_request;
  2401          }
  2402
  2403          if (out_fence_fd != -1) {
  2404                  out_fence = sync_file_create(&eb.request->fence);
  2405                  if (!out_fence) {
  2406                          err = -ENOMEM;
  2407                          goto err_request;
  2408                  }
  2409          }
  2410
  2411          /*
  2412           * Whilst this request exists, batch_obj will be on the
  2413           * active_list, and so will hold the active reference. Only when this
  2414           * request is retired will the the batch_obj be moved onto the
  2415           * inactive_list and lose its active reference. Hence we do not need
  2416           * to explicitly hold another reference here.
  2417           */
  2418          eb.request->batch = eb.batch;
  2419
  2420          trace_i915_request_queue(eb.request, eb.batch_flags);
  2421          err = eb_submit(&eb);
  2422  err_request:
  2423          __i915_request_add(eb.request, err == 0);
  2424          add_to_client(eb.request, file);
  2425
  2426          if (fences)
  2427                  signal_fence_array(&eb, fences);
  2428
  2429          if (out_fence) {
  2430                  if (err == 0) {
  2431                          fd_install(out_fence_fd, out_fence->file);
  2432                          args->rsvd2 &= GENMASK_ULL(31, 0); /* keep in-fence */
  2433                          args->rsvd2 |= (u64)out_fence_fd << 32;
  2434                          out_fence_fd = -1;
  2435                  } else {
  2436                          fput(out_fence->file);
  2437                  }
  2438          }
  2439
  2440  err_batch_unpin:
  2441          if (eb.batch_flags & I915_DISPATCH_SECURE)
  2442                  i915_vma_unpin(eb.batch);
  2443  err_vma:
  2444          if (eb.exec)
  2445                  eb_release_vmas(&eb);
  2446          mutex_unlock(&dev->struct_mutex);
  2447  err_rpm:
  2448          intel_runtime_pm_put(eb.i915);
  2449          i915_gem_context_put(eb.ctx);
  2450  err_destroy:
  2451          eb_destroy(&eb);
  2452  err_out_fence:
  2453          if (out_fence_fd != -1)
  2454                  put_unused_fd(out_fence_fd);
  2455  err_in_fence:
  2456          dma_fence_put(in_fence);
  2457          return err;
  2458  }
  2459

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
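The warning is easy to reproduce outside the tree: with CONFIG_DRM_I915_DEBUG_GEM
disabled, GEM_WARN_ON() reduces to a value-only comma expression, so a bare
"GEM_WARN_ON(err);" statement both trips -Wunused-value and silently discards
the error. The sketch below is a minimal standalone illustration; the DEBUG_GEM
branch and BUILD_BUG_ON_INVALID are simplified stand-ins, not the kernel's exact
definitions:

#include <stdio.h>

/* Simplified stand-in: only checks that the expression compiles. */
#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((long)(e))))

#ifdef DEBUG_GEM
/* Debug builds: evaluate and report, yielding the truth value. */
#define GEM_WARN_ON(expr) \
        ({ int __e = !!(expr); \
           if (__e) fprintf(stderr, "WARN: %s\n", #expr); \
           __e; })
#else
/* Non-debug builds: a comma expression yielding 0 with no side
 * effects -- used as a bare statement, gcc emits -Wunused-value. */
#define GEM_WARN_ON(expr) (BUILD_BUG_ON_INVALID(expr), 0)
#endif

int main(void)
{
        int err = -22;

        GEM_WARN_ON(err);      /* statement with no effect -> warning   */

        if (GEM_WARN_ON(err))  /* consuming the value silences gcc, but */
                return 1;      /* err is still dropped when !DEBUG_GEM  */

        return 0;
}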
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index d354627..030854e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -436,6 +436,9 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_CS_TIMESTAMP_FREQUENCY:
 		value = 1000 * INTEL_INFO(dev_priv)->cs_timestamp_frequency_khz;
 		break;
+	case I915_PARAM_HAS_EXEC_DATA_PORT_COHERENCY:
+		value = (INTEL_GEN(dev_priv) >= 9);
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 7854262..00aa309 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -118,6 +118,7 @@ struct i915_gem_context {
 #define CONTEXT_BANNABLE		3
 #define CONTEXT_BANNED			4
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	5
+#define CONTEXT_DATA_PORT_COHERENT	6
 
 	/**
 	 * @hw_id: - unique identifier for the context
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8c170db..e3a2f9e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2245,6 +2245,18 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		eb.batch_flags |= I915_DISPATCH_RS;
 	}
 
+	if (args->flags & I915_EXEC_DATA_PORT_COHERENT) {
+		if (INTEL_GEN(eb.i915) < 9) {
+			DRM_DEBUG("Data Port Coherency is only allowed for Gen9 and above\n");
+			return -EINVAL;
+		}
+		if (eb.engine->class != RENDER_CLASS) {
+			DRM_DEBUG("Data Port Coherency is not available on %s\n",
+				  eb.engine->name);
+			return -EINVAL;
+		}
+	}
+
 	if (args->flags & I915_EXEC_FENCE_IN) {
 		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
 		if (!in_fence)
@@ -2371,6 +2383,11 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		goto err_batch_unpin;
 	}
 
+	/* Emit the switch of data port coherency state if needed */
+	err = intel_lr_context_modify_data_port_coherency(eb.request,
+			(args->flags & I915_EXEC_DATA_PORT_COHERENT) != 0);
+	GEM_WARN_ON(err);
+
 	if (in_fence) {
 		err = i915_request_await_dma_fence(eb.request, in_fence);
 		if (err < 0)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f60b61b..2094494 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -254,6 +254,59 @@ intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
 	ce->lrc_desc = desc;
 }
 
+static int emit_set_data_port_coherency(struct i915_request *req, bool enable)
+{
+	u32 *cs;
+	i915_reg_t reg;
+
+	GEM_BUG_ON(req->engine->class != RENDER_CLASS);
+	GEM_BUG_ON(INTEL_GEN(req->i915) < 9);
+
+	cs = intel_ring_begin(req, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	if (INTEL_GEN(req->i915) >= 10)
+		reg = CNL_HDC_CHICKEN0;
+	else
+		reg = HDC_CHICKEN0;
+
+	*cs++ = MI_LOAD_REGISTER_IMM(1);
+	*cs++ = i915_mmio_reg_offset(reg);
+	/* Enabling coherency means disabling the bit which forces it off */
+	if (enable)
+		*cs++ = _MASKED_BIT_DISABLE(HDC_FORCE_NON_COHERENT);
+	else
+		*cs++ = _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT);
+	*cs++ = MI_NOOP;
+
+	intel_ring_advance(req, cs);
+
+	return 0;
+}
+
+int
+intel_lr_context_modify_data_port_coherency(struct i915_request *req,
+					    bool enable)
+{
+	struct i915_gem_context *ctx = req->ctx;
+	int ret;
+
+	if (test_bit(CONTEXT_DATA_PORT_COHERENT, &ctx->flags) == enable)
+		return 0;
+
+	ret = emit_set_data_port_coherency(req, enable);
+
+	if (!ret) {
+		if (enable)
+			__set_bit(CONTEXT_DATA_PORT_COHERENT, &ctx->flags);
+		else
+			__clear_bit(CONTEXT_DATA_PORT_COHERENT, &ctx->flags);
+	}
+
+	return ret;
+}
+
 static struct i915_priolist *
 lookup_priolist(struct intel_engine_cs *engine,
 		struct i915_priotree *pt,
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 59d7b86..c46b239 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -111,4 +111,7 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
 	return ctx->engine[engine->id].lrc_desc;
 }
 
+int intel_lr_context_modify_data_port_coherency(struct i915_request *req,
+						bool enable);
+
 #endif /* _INTEL_LRC_H_ */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634c..0f52793 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -529,6 +529,11 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_CS_TIMESTAMP_FREQUENCY 51
 
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to switch
+ * Data Cache access into Data Port Coherency mode.
+ */
+#define I915_PARAM_HAS_EXEC_DATA_PORT_COHERENCY 52
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
@@ -1048,7 +1053,12 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_ARRAY   (1<<19)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
+/* Data Port Coherency capability will be switched before an exec call
+ * which has this flag different than previous call for the context.
+ */
+#define I915_EXEC_DATA_PORT_COHERENT	(1<<20)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_DATA_PORT_COHERENT<<1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
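A side note on the MI_LOAD_REGISTER_IMM payload built in the intel_lrc.c hunk
above: HDC_CHICKEN0 is a masked register, where the upper 16 bits of a write
select which of the lower 16 bits take effect, so a single LRI can flip one
chicken bit without a read-modify-write. A small standalone sketch of that
convention follows; the macro and bit definitions here are illustrative
stand-ins, not copied from i915_reg.h:

#include <stdint.h>
#include <stdio.h>

#define MASKED_FIELD(mask, value)  (((mask) << 16) | (value))
#define MASKED_BIT_ENABLE(a)       MASKED_FIELD((a), (a))
#define MASKED_BIT_DISABLE(a)      MASKED_FIELD((a), 0)

#define HDC_FORCE_NON_COHERENT     (1 << 4)   /* bit position illustrative */

int main(void)
{
        /* Coherency on: clear the force-non-coherent override bit. */
        uint32_t coherent     = MASKED_BIT_DISABLE(HDC_FORCE_NON_COHERENT);
        /* Coherency off: set the override bit. */
        uint32_t non_coherent = MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT);

        printf("LRI data, coherent:     0x%08x\n", coherent);      /* 0x00100000 */
        printf("LRI data, non-coherent: 0x%08x\n", non_coherent);  /* 0x00100010 */
        return 0;
}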
The patch adds a parameter to control the data port coherency functionality
on a per-exec call basis. When the data port coherency flag value differs
from the one in the previous call for the context, a command to switch the
data port coherency state is emitted before the buffer to be executed.

Rationale:

The OpenCL driver developers requested functionality to control cache
coherency at the data port level. Keeping the coherency at that level is
disabled by default due to its performance costs. The OpenCL driver plans
to enable it for a small subset of submissions, when such functionality is
required.

Below are answers to basic questions explaining the background of the
functionality and the reasoning for the proposed implementation:

1. Why do we need a coherency enable/disable switch for memory that is
shared between CPU and GEN (GPU)?

Memory coherency between CPU and GEN, while being a great feature that
enables the CL_MEM_SVM_FINE_GRAIN_BUFFER OCL capability on Intel GEN
architecture, adds overhead related to tracking (snooping) memory inside
different cache units (L1$, L2$, L3$, LLC$, etc.). At the same time, only a
minority of modern OCL applications actually use
CL_MEM_SVM_FINE_GRAIN_BUFFER (and hence require memory coherency between
CPU and GPU). The goal of the coherency enable/disable switch is to remove
the overhead of memory coherency when it is not needed.

2. Why do we need a global coherency switch?

In order to support I/O commands from within EUs (Execution Units), Intel
GEN ISA (GEN Instruction Set Assembly) contains dedicated "send"
instructions. These send instructions provide several addressing models.
One of these addressing models (named "stateless") provides the most
flexible I/O using plain virtual addresses (as opposed to
buffer_handle+offset models). This "stateless" model is similar to regular
memory load/store operations available on typical CPUs. Since this model
provides I/O using arbitrary virtual addresses, it enables algorithmic
designs that are based on pointer-to-pointer (e.g. buffer of pointers)
concepts. For instance, it allows creating tree-like data structures such
as:

                     ________________
                    |      NODE1     |
                    | uint64_t data  |
                    +----------------|
                    | NODE*  | NODE* |
                    +--------+-------+
                      /            \
     ________________/              \________________
    |      NODE2     |              |      NODE3     |
    | uint64_t data  |              | uint64_t data  |
    +----------------|              +----------------|
    | NODE*  | NODE* |              | NODE*  | NODE* |
    +--------+-------+              +--------+-------+

Please note that pointers inside such structures can point to memory
locations in different OCL allocations - e.g. NODE1 and NODE2 can reside in
one OCL allocation while NODE3 resides in a completely separate one.
Additionally, such pointers can be shared with the CPU (i.e. using SVM -
the Shared Virtual Memory feature). Using pointers from different
allocations doesn't affect the stateless addressing model, which even
allows scattered reading from different allocations at the same time (i.e.
by utilizing the SIMD nature of send instructions).

When it comes to coherency programming, send instructions in the stateless
model can be encoded (at the ISA level) to either use or disable coherency.
However, for generic OCL applications (such as the example with the
tree-like data structure), the OCL compiler is not able to determine the
origin of memory pointed to by an arbitrary pointer - i.e. it is not able
to track a given pointer back to a specific allocation. As such, it is not
able to decide whether coherency is needed for a specific pointer (or a
specific I/O instruction).
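To make the compiler's dilemma concrete, here is a hypothetical C rendering
of the node type from the diagram above (names invented for illustration).
With fine-grain SVM, the child pointers are plain virtual addresses; nothing
in the pointer value tells the compiler which allocation - coherent or not -
it points into:

#include <stdint.h>

struct node {
        uint64_t data;
        struct node *left;      /* may live in a different OCL allocation */
        struct node *right;     /* may even be a CPU-shared (SVM) address */
};

/* Walking the tree, every dereference becomes a stateless send whose
 * target allocation is unknowable at compile time -- so the compiler
 * must encode each one as coherent to stay correct. */
static uint64_t sum(const struct node *n)
{
        if (!n)
                return 0;
        return n->data + sum(n->left) + sum(n->right);
}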
As a result, the compiler encodes all stateless sends as coherent (doing
otherwise would lead to functional issues resulting from data corruption).
Please note that it would be possible to work around this (e.g. based on an
allocations map and pointer bounds checking prior to each I/O instruction),
but the performance cost of such a workaround would be many times greater
than the cost of keeping coherency always enabled. As such,
enabling/disabling memory coherency at the GEN ISA level is not feasible
and an alternative method is needed.

Such an alternative solution is to have a global coherency switch that
allows disabling coherency for a single (though entire) GPU submission.
This is beneficial because this way we:

* can enable (and pay for) coherency only in submissions that actually need
  it (submissions that use CL_MEM_SVM_FINE_GRAIN_BUFFER resources)
* don't care about coherency at GEN ISA granularity (no performance impact)

3. Will the coherency switch be used frequently?

There are scenarios that will require frequent toggling of the coherency
switch. E.g. an application has two OCL compute kernels: kern_master and
kern_worker. kern_master uses, concurrently with the CPU, some fine-grain
SVM resources (CL_MEM_SVM_FINE_GRAIN_BUFFER). These resources contain
descriptors of computational work that needs to be executed. kern_master
analyzes incoming work descriptors and populates a plain OCL buffer
(non-fine-grain) with payload for kern_worker. Once kern_master is done,
kern_worker kicks in and processes the payload that kern_master produced.
These two kernels work in a loop, one after another.

Since only kern_master requires coherency, kern_worker should not be forced
to pay for it. This means that we need the ability to toggle the coherency
switch on or off per each GPU submission:

(ENABLE COHERENCY) kern_master -> (DISABLE COHERENCY) kern_worker ->
(ENABLE COHERENCY) kern_master -> (DISABLE COHERENCY) kern_worker -> ...

4. Why was the exec flag approach chosen?

There are two other ways of providing the functionality to UMDs, besides
the exec flag:

a) Chicken bit register whitelisting. This approach would allow adding the
functionality without any change to the KMD's interface. Also, it has been
determined that whitelisting is safe for gen10 and gen11. The issue is with
gen9, where hardware whitelisting cannot be used, and the OCL driver needs
support for it. A workaround there would be to use the command parser,
which verifies buffers before execution - but such parsing comes at a
considerable performance cost.

b) Providing the flag as a context IOCTL setting. The data port coherency
switch could be implemented as a context parameter, which would schedule
submission of a buffer to switch the coherency flag. That is an elegant
solution which binds the flag to the context, matching the hardware
placement of the feature. This solution was not accepted because of OCL
driver performance concerns: the OCL driver is built around small but very
frequent submissions, and with such an architecture, adding an IOCTL
setparam call before each submission has a considerable impact on
performance.

Bspec: 11419
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c            |  3 ++
 drivers/gpu/drm/i915/i915_gem_context.h    |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 17 ++++++++++
 drivers/gpu/drm/i915/intel_lrc.c           | 53 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h           |  3 ++
 include/uapi/drm/i915_drm.h                | 12 ++++++-
 6 files changed, 88 insertions(+), 1 deletion(-)
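For completeness, this is roughly how a UMD would consume the new uAPI:
probe the getparam first, then set the flag on individual exec calls. A
minimal sketch with error handling trimmed; "fd" is assumed to be an open
i915 render node, and the two new constants are defined locally since they
only exist with this patch applied:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

#ifndef I915_PARAM_HAS_EXEC_DATA_PORT_COHERENCY
#define I915_PARAM_HAS_EXEC_DATA_PORT_COHERENCY 52
#endif
#ifndef I915_EXEC_DATA_PORT_COHERENT
#define I915_EXEC_DATA_PORT_COHERENT (1 << 20)
#endif

/* Probe the kernel for the capability before relying on the flag. */
static int has_data_port_coherency(int fd)
{
        struct drm_i915_getparam gp;
        int value = 0;

        memset(&gp, 0, sizeof(gp));
        gp.param = I915_PARAM_HAS_EXEC_DATA_PORT_COHERENCY;
        gp.value = &value;
        if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
                return 0;
        return value;
}

/* Submit with coherency requested; "execbuf" is assumed to be fully
 * prepared by the caller (buffers_ptr, buffer_count, context id), as
 * for any other execbuffer2 call. Omitting the flag on a later call
 * switches the context back to the non-coherent (default) state. */
static int submit_coherent(int fd, struct drm_i915_gem_execbuffer2 *execbuf)
{
        execbuf->flags |= I915_EXEC_DATA_PORT_COHERENT;
        return ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, execbuf);
}

Alternating calls with and without the flag reproduces the
kern_master/kern_worker toggling pattern from point 3 above.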