From patchwork Sat Jan 9 11:30:21 2016
X-Patchwork-Submitter: akash.goel@intel.com
X-Patchwork-Id: 7992111
From: akash.goel@intel.com
To: intel-gfx@lists.freedesktop.org
Cc: Akash Goel
Date: Sat, 9 Jan 2016 17:00:21 +0530
Message-Id: <1452339021-3177-1-git-send-email-akash.goel@intel.com>
Subject: [Intel-gfx] [PATCH] drm/i915: Support to enable TRTT on GEN9
List-Id: Intel graphics driver community testing & development

From: Akash Goel

Gen9 has additional address translation hardware in the form of the
Tiled Resource Translation Table (TR-TT), which provides an extra level
of abstraction over PPGTT. This is useful for mapping Sparse/Tiled
texture resources. Sparse resources are created as virtual-only
allocations. Regions of the resource that the application intends to
use are bound to physical memory on the fly and can be re-bound to
different memory allocations over the lifetime of the resource.

TR-TT is tightly coupled with PPGTT: a new instance of TR-TT is required
for each new PPGTT instance, but TR-TT may not be enabled for every
context. 1/16th of the 48-bit PPGTT space is earmarked for translation
by TR-TT; which of the 16 chunks to use is conveyed to HW through a
register. Any GFX address that lies in that reserved 44-bit range is
translated through TR-TT first and then through PPGTT to get the actual
physical address, so the output of the TR-TT translation is a PPGTT
offset.

TR-TT is constructed as a 3-level tile table. Each tile is 64KB in size,
which leaves 44-16=28 address bits. The 28 bits are partitioned as
9+9+10, and each level is contained within a 4KB page, hence L3 and L2
are each composed of 512 64-bit entries and L1 is composed of 1024
32-bit entries.

There is a provision to keep the TR-TT tables in virtual space, where
the pages of the TR-TT tables are mapped to PPGTT. Currently this is the
supported mode; in this mode UMD has full control over TR-TT management,
with bare minimum support from KMD.
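
To make the 9+9+10 split described above concrete, here is a minimal
sketch (not part of the patch) of how a GFX address inside the TR-TT
segment could be broken into table indices. The helper name, the struct
and the assumption that L3 is indexed by the most significant bits are
illustrative only; the bit widths come from the description.

```c
#include <assert.h>
#include <stdint.h>

/* Widths taken from the description: 44-bit segment, 64KB tiles, 9+9+10. */
#define TRTT_SEGMENT_BITS 44
#define TRTT_TILE_SHIFT   16	/* 64KB tile */
#define TRTT_L1_BITS      10	/* 1024 entries per L1 page */
#define TRTT_L2_BITS      9	/* 512 entries per L2 page */
#define TRTT_L3_BITS      9	/* 512 entries per L3 page */

/* Hypothetical helper type, not defined anywhere in the patch. */
struct trtt_walk {
	unsigned int l3_index;
	unsigned int l2_index;
	unsigned int l1_index;
	uint32_t tile_offset;
};

/*
 * Split a GFX address lying in the TR-TT segment into table indices.
 * Assumption: L3 is indexed by the most significant bits of the
 * 44-bit segment offset, L1 by the bits just above the tile offset.
 */
static struct trtt_walk trtt_decompose(uint64_t gfx_addr)
{
	uint64_t off = gfx_addr & ((1ULL << TRTT_SEGMENT_BITS) - 1);
	struct trtt_walk w;

	/* The index widths plus the tile shift must cover the segment. */
	assert(TRTT_TILE_SHIFT + TRTT_L1_BITS + TRTT_L2_BITS +
	       TRTT_L3_BITS == TRTT_SEGMENT_BITS);

	w.tile_offset = (uint32_t)(off & ((1u << TRTT_TILE_SHIFT) - 1));
	off >>= TRTT_TILE_SHIFT;
	w.l1_index = (unsigned int)(off & ((1u << TRTT_L1_BITS) - 1));
	off >>= TRTT_L1_BITS;
	w.l2_index = (unsigned int)(off & ((1u << TRTT_L2_BITS) - 1));
	off >>= TRTT_L2_BITS;
	w.l3_index = (unsigned int)(off & ((1u << TRTT_L3_BITS) - 1));

	return w;
}
```

With these widths, one L1 page covers 1024 * 64KB = 64MB of the segment
and one L2 page covers 512 * 64MB = 32GB.
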
So the entries of the L3 table contain the PPGTT offsets of the L2 table
pages, and similarly the entries of the L2 table contain the PPGTT
offsets of the L1 table pages. The entries of the L1 table contain the
PPGTT offsets of the BOs actually backing the Sparse resources.

The assumption here is that UMD alone does the complete PPGTT address
space management and uses the Soft Pin API for all the buffer objects
associated with a given context. So UMD also has to allocate the
L3/L2/L1 table pages as regular GEM BOs and assign them a PPGTT address
through the Soft Pin API. UMD has to emit MI_STORE_DATA_IMM commands in
the batch buffer to program the relevant entries of the L3/L2/L1 tables.

Any space in the TR-TT segment not bound to any Sparse texture is
handled through the Invalid tile; the user is expected to initialize the
entries of a new L3/L2/L1 table page with the Invalid tile pattern. The
entries corresponding to the holes in the Sparse texture resource are
set with the Null tile pattern. Improper programming of TR-TT should
only lead to a recoverable GPU hang, eventually leading to banning of
the culprit context without victimizing others.

The association of any Sparse resource with the BOs is known only to
UMD, and only the Sparse resources shall be assigned an offset from the
TR-TT segment by UMD. The use of the TR-TT segment and the mapping of
Sparse resources are abstracted from KMD; UMD can do the address
assignment from the TR-TT segment autonomously and KMD is oblivious of
it. The BOs must not be assigned an address from the TR-TT segment; they
are mapped to PPGTT in the regular way by KMD, using the Soft Pin offset
provided by UMD.

This patch provides an interface through which UMD can ask KMD to enable
TR-TT for a given context. A new I915_CONTEXT_PARAM_ENABLE_TRTT param
has been added to the I915_GEM_CONTEXT_SETPARAM ioctl for that purpose.
UMD has to pass the GFX address of the L3 table page and the pattern
values for the Null & Invalid tile registers.
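
As an illustration of the Invalid/Null tile convention described above,
the following sketch (not part of the patch) fills a freshly allocated
L1 table page on the CPU. In the model described here UMD would program
the entries with MI_STORE_DATA_IMM from a batch buffer, so the helper
names below are hypothetical and only show the page layout: 1024 32-bit
entries per 4KB page, all Invalid by default, with holes marked Null.
The pattern values are chosen by UMD and are not specified here; the
encoding of Invalid entries in the 64-bit L3/L2 tables is not shown
because the description does not spell it out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRTT_TABLE_PAGE_SIZE 4096u
/* 4KB page of 32-bit entries: 1024 L1 entries, as in the description. */
#define TRTT_L1_ENTRIES (TRTT_TABLE_PAGE_SIZE / sizeof(uint32_t))

/* Hypothetical helper: mark every entry of a new L1 page as Invalid. */
static void trtt_l1_page_init(uint32_t *page, uint32_t invd_tile_val)
{
	size_t i;

	for (i = 0; i < TRTT_L1_ENTRIES; i++)
		page[i] = invd_tile_val;
}

/*
 * Hypothetical helper: mark a run of tiles as holes of a Sparse
 * resource by writing the Null tile pattern. Returns 0 on success,
 * -1 if the run does not fit inside the page.
 */
static int trtt_l1_page_set_null(uint32_t *page, size_t first, size_t count,
				 uint32_t null_tile_val)
{
	size_t i;

	if (first > TRTT_L1_ENTRIES || count > TRTT_L1_ENTRIES - first)
		return -1;

	for (i = first; i < first + count; i++)
		page[i] = null_tile_val;

	return 0;
}
```

The same two pattern values would be the ones handed to KMD for the
Null and Invalid tile registers, so that HW can recognise them.
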

Testcase: igt/gem_trtt
Signed-off-by: Akash Goel
---
 drivers/gpu/drm/i915/i915_dma.c         |  3 ++
 drivers/gpu/drm/i915/i915_drv.h         | 12 +++++++
 drivers/gpu/drm/i915/i915_gem_context.c | 45 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.c     | 57 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h     |  6 ++++
 drivers/gpu/drm/i915/i915_reg.h         | 19 +++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 41 ++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h             |  8 +++++
 8 files changed, 191 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 988a380..c247c25 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_SOFTPIN:
 		value = 1;
 		break;
+	case I915_PARAM_HAS_TRTT:
+		value = HAS_TRTT(dev);
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c6dd4db..12c612e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -839,6 +839,7 @@ struct i915_ctx_hang_stats {
 #define DEFAULT_CONTEXT_HANDLE 0
 
 #define CONTEXT_NO_ZEROMAP (1<<0)
+#define CONTEXT_USE_TRTT (1<<1)
 /**
  * struct intel_context - as the name implies, represents a context.
  * @ref: reference count.
@@ -881,6 +882,15 @@ struct intel_context {
 		int pin_count;
 	} engine[I915_NUM_RINGS];
 
+	/* TRTT info */
+	struct {
+		uint32_t invd_tile_val;
+		uint32_t null_tile_val;
+		uint64_t l3_table_address;
+		struct i915_vma *vma;
+		bool update_trtt_params;
+	} trtt_info;
+
 	struct list_head link;
 };
 
@@ -2626,6 +2636,8 @@ struct drm_i915_cmd_table {
 				 !IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
 				 !IS_BROXTON(dev))
 
+#define HAS_TRTT(dev) (IS_GEN9(dev))
+
 #define INTEL_PCH_DEVICE_ID_MASK 0xff00
 #define INTEL_PCH_IBX_DEVICE_ID_TYPE 0x3b00
 #define INTEL_PCH_CPT_DEVICE_ID_TYPE 0x1c00
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 900ffd0..ae9fc34 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -146,6 +146,9 @@ static void i915_gem_context_clean(struct intel_context *ctx)
 		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
 			break;
 	}
+
+	if (ctx->flags & CONTEXT_USE_TRTT)
+		i915_gem_destroy_trtt_vma(ctx->trtt_info.vma);
 }
 
 void i915_gem_context_free(struct kref *ctx_ref)
@@ -512,6 +515,35 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
 	return ctx;
 }
 
+static int
+i915_setup_trtt_ctx(struct intel_context *ctx,
+		    struct drm_i915_gem_context_trtt_param *trtt_params)
+{
+	if (ctx->flags & CONTEXT_USE_TRTT)
+		return -EEXIST;
+
+	/* basic sanity checks for the l3 table pointer */
+	if ((ctx->trtt_info.l3_table_address >= GEN9_TRTT_SEGMENT_START) &&
+	    (ctx->trtt_info.l3_table_address <
+	     (GEN9_TRTT_SEGMENT_START + GEN9_TRTT_SEGMENT_SIZE)))
+		return -EINVAL;
+
+	if (ctx->trtt_info.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK)
+		return -EINVAL;
+
+	ctx->trtt_info.vma = i915_gem_setup_trtt_vma(&ctx->ppgtt->base);
+	if (IS_ERR(ctx->trtt_info.vma))
+		return PTR_ERR(ctx->trtt_info.vma);
+
+	ctx->trtt_info.null_tile_val = trtt_params->null_tile_val;
+	ctx->trtt_info.invd_tile_val = trtt_params->invd_tile_val;
+	ctx->trtt_info.l3_table_address = trtt_params->l3_table_address;
+	ctx->trtt_info.update_trtt_params = 1;
+
+	ctx->flags |= CONTEXT_USE_TRTT;
+	return 0;
+}
+
 static inline int
 mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 {
@@ -952,6 +984,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct drm_i915_gem_context_param *args = data;
+	struct drm_i915_gem_context_trtt_param trtt_params;
 	struct intel_context *ctx;
 	int ret;
 
@@ -983,6 +1016,18 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 			ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : 0;
 		}
 		break;
+	case I915_CONTEXT_PARAM_ENABLE_TRTT:
+		if (args->size < sizeof(trtt_params))
+			ret = -EINVAL;
+		else if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev))
+			ret = -ENODEV;
+		else if (copy_from_user(&trtt_params,
+					to_user_ptr(args->value),
+					sizeof(trtt_params)))
+			ret = -EFAULT;
+		else
+			ret = i915_setup_trtt_ctx(ctx, &trtt_params);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 56f4f2e..28fc1ea 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2146,6 +2146,13 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 
 int i915_ppgtt_init_hw(struct drm_device *dev)
 {
+	if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
+		struct drm_i915_private *dev_priv = dev->dev_private;
+
+		I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
+			   GEN9_TRTT_BYPASS_DISABLE);
+	}
+
 	/* In the case of execlists, PPGTT is enabled by the context descriptor
 	 * and the PDPs are contained within the context itself. We don't
 	 * need to do anything here. */
@@ -3328,6 +3335,56 @@ i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,
 
 }
 
+void i915_gem_destroy_trtt_vma(struct i915_vma *vma)
+{
+	struct i915_address_space *vm = vma->vm;
+
+	WARN_ON(!list_empty(&vma->vma_link));
+	WARN_ON(!list_empty(&vma->mm_list));
+	WARN_ON(!list_empty(&vma->exec_list));
+
+	drm_mm_remove_node(&vma->node);
+	i915_ppgtt_put(i915_vm_to_ppgtt(vm));
+	kmem_cache_free(to_i915(vm->dev)->vmas, vma);
+}
+
+struct i915_vma *
+i915_gem_setup_trtt_vma(struct i915_address_space *vm)
+{
+	struct i915_vma *vma;
+	int ret;
+
+	vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL);
+	if (vma == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&vma->vma_link);
+	INIT_LIST_HEAD(&vma->mm_list);
+	INIT_LIST_HEAD(&vma->exec_list);
+	vma->vm = vm;
+	i915_ppgtt_get(i915_vm_to_ppgtt(vm));
+
+	/* Mark the vma as perennially pinned */
+	vma->pin_count = 1;
+
+	/* Reserve from the 48 bit PPGTT space */
+	vma->node.start = GEN9_TRTT_SEGMENT_START;
+	vma->node.size = GEN9_TRTT_SEGMENT_SIZE;
+	ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+	if (ret) {
+		ret = i915_gem_evict_for_vma(vma);
+		if (ret == 0)
+			ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+	}
+	if (ret) {
+		DRM_ERROR("Reservation for TRTT segment failed: %i\n", ret);
+		i915_gem_destroy_trtt_vma(vma);
+		return ERR_PTR(ret);
+	}
+
+	return vma;
+}
+
 static struct scatterlist *
 rotate_pages(dma_addr_t *in, unsigned int offset,
 	     unsigned int width, unsigned int height,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b448ad8..acb942d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -129,6 +129,10 @@ typedef uint64_t gen8_ppgtt_pml4e_t;
 #define GEN8_PPAT_ELLC_OVERRIDE (0<<2)
 #define GEN8_PPAT(i, x) ((uint64_t) (x) << ((i) * 8))
 
+/* Lies at the top of 48 bit PPGTT space */
+#define GEN9_TRTT_SEGMENT_START ((1ULL << 48) - (1ULL << 44))
+#define GEN9_TRTT_SEGMENT_SIZE (1ULL << 44)
+
 enum i915_ggtt_view_type {
 	I915_GGTT_VIEW_NORMAL = 0,
 	I915_GGTT_VIEW_ROTATED,
@@ -559,4 +563,6 @@ size_t
 i915_ggtt_view_size(struct drm_i915_gem_object *obj,
 		    const struct i915_ggtt_view *view);
 
+struct i915_vma *i915_gem_setup_trtt_vma(struct i915_address_space *vm);
+void i915_gem_destroy_trtt_vma(struct i915_vma *vma);
 #endif
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 007ae83..5859be6 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -186,6 +186,25 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define GEN8_RPCS_EU_MIN_SHIFT 0
 #define GEN8_RPCS_EU_MIN_MASK (0xf << GEN8_RPCS_EU_MIN_SHIFT)
 
+#define GEN9_TR_CHICKEN_BIT_VECTOR _MMIO(0x4DFC)
+#define GEN9_TRTT_BYPASS_DISABLE (1<<0)
+
+/* TRTT registers in the H/W Context */
+#define GEN9_TRTT_L3_POINTER_DW0 _MMIO(0x4DE0)
+#define GEN9_TRTT_L3_POINTER_DW1 _MMIO(0x4DE4)
+#define GEN9_TRTT_L3_GFXADDR_MASK 0xFFFFFFFF0000
+
+#define GEN9_TRTT_NULL_TILE_REG _MMIO(0x4DE8)
+#define GEN9_TRTT_INVD_TILE_REG _MMIO(0x4DEC)
+
+#define GEN9_TRTT_VA_MASKDATA _MMIO(0x4DF0)
+#define GEN9_TRVA_MASK_VALUE 0xF0
+#define GEN9_TRVA_DATA_VALUE 0xF
+
+#define GEN9_TRTT_TABLE_CONTROL _MMIO(0x4DF4)
+#define GEN9_TRTT_IN_GFX_VA_SPACE (1<<1)
+#define GEN9_TRTT_ENABLE (1<<0)
+
 #define GAM_ECOCHK _MMIO(0x4090)
 #define BDW_DISABLE_HDC_INVALIDATION (1<<25)
 #define ECOCHK_SNB_BIT (1<<10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8096c6a..a8b795d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -183,6 +183,12 @@
 #define CTX_LRI_HEADER_2 0x41
 #define CTX_R_PWR_CLK_STATE 0x42
 #define CTX_GPGPU_CSR_BASE_ADDRESS 0x44
+#define CTX_TRTT_L3_PTR_DW0 0x202
+#define CTX_TRTT_L3_PTR_DW1 0x204
+#define CTX_TRTT_NULL_TILE 0x206
+#define CTX_TRTT_INVD_TILE 0x208
+#define CTX_TRTT_VA_MASKDATA 0x20A
+#define CTX_TRTT_TBL_CTL 0x20C
 
 #define GEN8_CTX_VALID (1<<0)
 #define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
@@ -228,6 +234,8 @@ enum {
 static int intel_lr_context_pin(struct drm_i915_gem_request *rq);
 static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *default_ctx_obj);
+static void populate_lr_context_trtt(struct intel_context *ctx,
+		uint32_t *reg_state);
 
 
 /**
@@ -390,6 +398,14 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
 	}
 
+	if (ring->id == RCS && rq->ctx->trtt_info.update_trtt_params) {
+		/* The same page of the context object also contain fields
+		 * related for TRTT setup.
+		 */
+		populate_lr_context_trtt(rq->ctx, reg_state);
+		rq->ctx->trtt_info.update_trtt_params = 0;
+	}
+
 	kunmap_atomic(reg_state);
 
 	return 0;
@@ -2247,6 +2263,31 @@ make_rpcs(struct drm_device *dev)
 	return rpcs;
 }
 
+static void
+populate_lr_context_trtt(struct intel_context *ctx, uint32_t *reg_state)
+{
+	unsigned long masked_l3_gfx_address =
+		ctx->trtt_info.l3_table_address & GEN9_TRTT_L3_GFXADDR_MASK;
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_L3_PTR_DW0, GEN9_TRTT_L3_POINTER_DW0,
+		       lower_32_bits(masked_l3_gfx_address));
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_L3_PTR_DW1, GEN9_TRTT_L3_POINTER_DW1,
+		       upper_32_bits(masked_l3_gfx_address));
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_NULL_TILE, GEN9_TRTT_NULL_TILE_REG,
+		       ctx->trtt_info.null_tile_val);
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_INVD_TILE, GEN9_TRTT_INVD_TILE_REG,
+		       ctx->trtt_info.invd_tile_val);
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_VA_MASKDATA, GEN9_TRTT_VA_MASKDATA,
+		       GEN9_TRVA_MASK_VALUE | GEN9_TRVA_DATA_VALUE);
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_TBL_CTL, GEN9_TRTT_TABLE_CONTROL,
+		       GEN9_TRTT_IN_GFX_VA_SPACE | GEN9_TRTT_ENABLE);
+}
+
 static int
 populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
 		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index acf2102..6d6f448 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -357,6 +357,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_GPU_RESET 35
 #define I915_PARAM_HAS_RESOURCE_STREAMER 36
 #define I915_PARAM_HAS_EXEC_SOFTPIN 37
+#define I915_PARAM_HAS_TRTT 38
 
 typedef struct drm_i915_getparam {
 	__s32 param;
@@ -1140,7 +1141,14 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_BAN_PERIOD 0x1
 #define I915_CONTEXT_PARAM_NO_ZEROMAP 0x2
 #define I915_CONTEXT_PARAM_GTT_SIZE 0x3
+#define I915_CONTEXT_PARAM_ENABLE_TRTT 0x4
 	__u64 value;
 };
 
+struct drm_i915_gem_context_trtt_param {
+	__u64 l3_table_address;
+	__u32 invd_tile_val;
+	__u32 null_tile_val;
+};
+
 #endif /* _UAPI_I915_DRM_H_ */
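
For illustration only, here is a user-space sketch of the constraints an
L3 table address has to meet before it is passed through
I915_CONTEXT_PARAM_ENABLE_TRTT. The struct mirrors the uAPI struct added
above with fixed-width types, and the constants copy the GEN9_TRTT_*
values from the patch; the names trtt_param, trtt_l3_address_ok and
trtt_fill_param are hypothetical. The sketch assumes the intended
checks are applied to the value supplied by UMD: the L3 page must lie
outside the TR-TT segment and must fit the L3 GFX address mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copies of the values defined in the patch. */
#define TRTT_SEGMENT_START   ((1ULL << 48) - (1ULL << 44))
#define TRTT_SEGMENT_SIZE    (1ULL << 44)
#define TRTT_L3_GFXADDR_MASK 0xFFFFFFFF0000ULL

/* Local mirror of struct drm_i915_gem_context_trtt_param. */
struct trtt_param {
	uint64_t l3_table_address;
	uint32_t invd_tile_val;
	uint32_t null_tile_val;
};

/* Hypothetical helper: can this address be used for the L3 table page? */
static bool trtt_l3_address_ok(uint64_t addr)
{
	/* The table pages are regular BOs and must not live in the segment. */
	if (addr >= TRTT_SEGMENT_START &&
	    addr < TRTT_SEGMENT_START + TRTT_SEGMENT_SIZE)
		return false;

	/* Bits outside the mask (low 16 bits, bits above 47) must be clear. */
	if (addr & ~TRTT_L3_GFXADDR_MASK)
		return false;

	return true;
}

/* Hypothetical helper: fill the param block, or return -1 if rejected. */
static int trtt_fill_param(struct trtt_param *p, uint64_t l3_addr,
			   uint32_t invd_tile_val, uint32_t null_tile_val)
{
	if (!trtt_l3_address_ok(l3_addr))
		return -1;

	p->l3_table_address = l3_addr;
	p->invd_tile_val = invd_tile_val;
	p->null_tile_val = null_tile_val;
	return 0;
}
```

In this interface UMD would then point the value field of
struct drm_i915_gem_context_param at such a block and set size to at
least its size before issuing the SETPARAM ioctl.
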