diff mbox

drm/i915: Support to enable TRTT on GEN9

Message ID 1452339021-3177-1-git-send-email-akash.goel@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

akash.goel@intel.com Jan. 9, 2016, 11:30 a.m. UTC
From: Akash Goel <akash.goel@intel.com>

Gen9 has an additional address translation hardware support in form of
Tiled Resource Translation Table (TR-TT) which provides an extra level
of abstraction over PPGTT.
This is useful for mapping Sparse/Tiled texture resources.
Sparse resources are created as virtual-only allocations. Regions of the
resource that the application intends to use is bound to the physical memory
on the fly and can be re-bound to different memory allocations over the
lifetime of the resource.

TR-TT is tightly coupled with PPGTT, a new instance of TR-TT will be required
for a new PPGTT instance, but TR-TT may not enabled for every context.
1/16th of the 48bit PPGTT space is earmarked for the translation by TR-TT,
which such chunk to use is conveyed to HW through a register.
Any GFX address, which lies in that reserved 44 bit range will be translated
through TR-TT first and then through PPGTT to get the actual physical address,
so the output of translation from TR-TT will be a PPGTT offset.

TRTT is constructed as a 3 level tile Table. Each tile is 64KB is size which
leaves behind 44-16=28 address bits. 28bits are partitioned as 9+9+10, and
each level is contained within a 4KB page hence L3 and L2 is composed of
512 64b entries and L1 is composed of 1024 32b entries.

There is a provision to keep TR-TT Tables in virtual space, where the pages of
TRTT tables will be mapped to PPGTT.
Currently this is the supported mode, in this mode UMD will have a full control
on TR-TT management, with bare minimum support from KMD.
So the entries of L3 table will contain the PPGTT offset of L2 Table pages,
similarly entries of L2 table will contain the PPGTT offset of L1 Table pages.
The entries of L1 table will contain the PPGTT offset of BOs actually backing
the Sparse resources.
The assumption here is that UMD only will do the complete PPGTT address space
management and use the Soft Pin API for all the buffer objects associated with
a given Context. So UMD will also have to allocate the L3/L2/L1 table pages
as a regular GEM BO only & assign them a PPGTT address through the Soft Pin API.
UMD would have to emit the MI_STORE_DATA_IMM commands in the batch buffer to
program the relevant entries of L3/L2/L1 tables.

Any space in TR-TT segment not bound to any Sparse texture, will be handled
through Invalid tile, User is expected to initialize the entries of a new
L3/L2/L1 table page with the Invalid tile pattern. The entries corresponding to
the holes in the Sparse texture resource will be set with the Null tile pattern
The improper programming of TRTT should only lead to a recoverable GPU hang,
eventually leading to banning of the culprit context without victimizing others.

The association of any Sparse resource with the BOs will be known only to UMD,
and only the Sparse resources shall be assigned an offset from the TR-TT segment
by UMD. The use of TR-TT segment or mapping of Sparse resources will be
abstracted from the KMD, UMD can do the address assignment from TR-TT segment
autonomously and KMD will be oblivious of it.
The BOs must not be assigned an address from TR-TT segment, they will be mapped
to PPGTT in a regular way by KMD, using the Soft Pin offset provided by UMD.

This patch provides an interface through which UMD can convey KMD to enable
TR-TT for a given context. A new I915_CONTEXT_PARAM_ENABLE_TRTT param has been
added to I915_GEM_CONTEXT_SETPARAM ioctl for that purpose.
UMD will have to pass the GFX address of L3 table page, pattern value for the
Null & invalid Tile registers.

Testcase: igt/gem_trtt

Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c         |  3 ++
 drivers/gpu/drm/i915/i915_drv.h         | 12 +++++++
 drivers/gpu/drm/i915/i915_gem_context.c | 45 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.c     | 57 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h     |  6 ++++
 drivers/gpu/drm/i915/i915_reg.h         | 19 +++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 41 ++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h             |  8 +++++
 8 files changed, 191 insertions(+)

Comments

Chris Wilson Jan. 10, 2016, 5:39 p.m. UTC | #1
On Sat, Jan 09, 2016 at 05:00:21PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> Gen9 has an additional address translation hardware support in form of
> Tiled Resource Translation Table (TR-TT) which provides an extra level
> of abstraction over PPGTT.
> This is useful for mapping Sparse/Tiled texture resources.
> Sparse resources are created as virtual-only allocations. Regions of the
> resource that the application intends to use is bound to the physical memory
> on the fly and can be re-bound to different memory allocations over the
> lifetime of the resource.
> 
> TR-TT is tightly coupled with PPGTT, a new instance of TR-TT will be required
> for a new PPGTT instance, but TR-TT may not enabled for every context.
> 1/16th of the 48bit PPGTT space is earmarked for the translation by TR-TT,
> which such chunk to use is conveyed to HW through a register.
> Any GFX address, which lies in that reserved 44 bit range will be translated
> through TR-TT first and then through PPGTT to get the actual physical address,
> so the output of translation from TR-TT will be a PPGTT offset.
> 
> TRTT is constructed as a 3 level tile Table. Each tile is 64KB is size which
> leaves behind 44-16=28 address bits. 28bits are partitioned as 9+9+10, and
> each level is contained within a 4KB page hence L3 and L2 is composed of
> 512 64b entries and L1 is composed of 1024 32b entries.
> 
> There is a provision to keep TR-TT Tables in virtual space, where the pages of
> TRTT tables will be mapped to PPGTT.
> Currently this is the supported mode, in this mode UMD will have a full control
> on TR-TT management, with bare minimum support from KMD.
> So the entries of L3 table will contain the PPGTT offset of L2 Table pages,
> similarly entries of L2 table will contain the PPGTT offset of L1 Table pages.
> The entries of L1 table will contain the PPGTT offset of BOs actually backing
> the Sparse resources.

> The assumption here is that UMD only will do the complete PPGTT address space
> management and use the Soft Pin API for all the buffer objects associated with
> a given Context.

That is a poor assumption, and not one required for this to work.

> So UMD will also have to allocate the L3/L2/L1 table pages
> as a regular GEM BO only & assign them a PPGTT address through the Soft Pin API.
> UMD would have to emit the MI_STORE_DATA_IMM commands in the batch buffer to
> program the relevant entries of L3/L2/L1 tables.

This only applies to te TR-TT L1-L3 cache, right?

> Any space in TR-TT segment not bound to any Sparse texture, will be handled
> through Invalid tile, User is expected to initialize the entries of a new
> L3/L2/L1 table page with the Invalid tile pattern. The entries corresponding to
> the holes in the Sparse texture resource will be set with the Null tile pattern
> The improper programming of TRTT should only lead to a recoverable GPU hang,
> eventually leading to banning of the culprit context without victimizing others.
> 
> The association of any Sparse resource with the BOs will be known only to UMD,
> and only the Sparse resources shall be assigned an offset from the TR-TT segment
> by UMD. The use of TR-TT segment or mapping of Sparse resources will be
> abstracted from the KMD,

s/abstracted from/transparent to/ s/,/;/

> UMD can do the address assignment from TR-TT segment

s/can/will/

> autonomously and KMD will be oblivious of it.
> The BOs must not be assigned an address from TR-TT segment, they will be mapped

s/The BOs/Any object/

> to PPGTT in a regular way by KMD

s/using the Soft Pin offset provided by UMD// as this is irrelevant.

> This patch provides an interface through which UMD can convey KMD to enable
> TR-TT for a given context. A new I915_CONTEXT_PARAM_ENABLE_TRTT param has been
> added to I915_GEM_CONTEXT_SETPARAM ioctl for that purpose.
> UMD will have to pass the GFX address of L3 table page,

+along with the

> pattern value for the
> Null & invalid Tile registers.
> 
> Testcase: igt/gem_trtt
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_dma.c         |  3 ++
>  drivers/gpu/drm/i915/i915_drv.h         | 12 +++++++
>  drivers/gpu/drm/i915/i915_gem_context.c | 45 ++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.c     | 57 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.h     |  6 ++++
>  drivers/gpu/drm/i915/i915_reg.h         | 19 +++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        | 41 ++++++++++++++++++++++++
>  include/uapi/drm/i915_drm.h             |  8 +++++
>  8 files changed, 191 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 988a380..c247c25 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  	case I915_PARAM_HAS_EXEC_SOFTPIN:
>  		value = 1;
>  		break;
> +	case I915_PARAM_HAS_TRTT:
> +		value = HAS_TRTT(dev);
> +		break;

Should we do this here, or just query the context? In fact you are
missing the context getparam path any way.

>  	default:
>  		DRM_DEBUG("Unknown parameter %d\n", param->param);
>  		return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c6dd4db..12c612e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -839,6 +839,7 @@ struct i915_ctx_hang_stats {
>  #define DEFAULT_CONTEXT_HANDLE 0
>  
>  #define CONTEXT_NO_ZEROMAP (1<<0)
> +#define CONTEXT_USE_TRTT   (1<<1)

Make flags unsigned whilst you are here, and fix the holes!

>  /**
>   * struct intel_context - as the name implies, represents a context.
>   * @ref: reference count.
> @@ -881,6 +882,15 @@ struct intel_context {
>  		int pin_count;
>  	} engine[I915_NUM_RINGS];
>  
> +	/* TRTT info */
> +	struct {

Give this a name now, we will be thankful in the future.

> +		uint32_t invd_tile_val;
> +		uint32_t null_tile_val;
> +		uint64_t l3_table_address;
> +		struct i915_vma *vma;
> +		bool update_trtt_params;
> +	} trtt_info;
> +
>  	struct list_head link;
>  };
>  
> @@ -2626,6 +2636,8 @@ struct drm_i915_cmd_table {
>  				 !IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
>  				 !IS_BROXTON(dev))
>  
> +#define HAS_TRTT(dev)		(IS_GEN9(dev))
> +
>  #define INTEL_PCH_DEVICE_ID_MASK		0xff00
>  #define INTEL_PCH_IBX_DEVICE_ID_TYPE		0x3b00
>  #define INTEL_PCH_CPT_DEVICE_ID_TYPE		0x1c00
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 900ffd0..ae9fc34 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -146,6 +146,9 @@ static void i915_gem_context_clean(struct intel_context *ctx)
>  		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
>  			break;
>  	}
> +
> +	if (ctx->flags & CONTEXT_USE_TRTT)
> +		i915_gem_destroy_trtt_vma(ctx->trtt_info.vma);

Sould be in context free.

>  }
>  
>  void i915_gem_context_free(struct kref *ctx_ref)
> @@ -512,6 +515,35 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
>  	return ctx;
>  }
>  
> +static int
> +i915_setup_trtt_ctx(struct intel_context *ctx,
> +		    struct drm_i915_gem_context_trtt_param *trtt_params)
> +{
> +	if (ctx->flags & CONTEXT_USE_TRTT)
> +		return -EEXIST;
> +
> +	/* basic sanity checks for the l3 table pointer */
> +	if ((ctx->trtt_info.l3_table_address >= GEN9_TRTT_SEGMENT_START) &&
> +	    (ctx->trtt_info.l3_table_address <
> +			(GEN9_TRTT_SEGMENT_START + GEN9_TRTT_SEGMENT_SIZE)))

Presumably l3_table has an actual size and you want to do a range
overlap test, not just the start address.

> +		return -EINVAL;
> +
> +	if (ctx->trtt_info.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK)
> +		return -EINVAL;

These are worth adding DRM_DEBUG() or even better start using dev_debug()
so that we can debug userspace startup issues.

> @@ -952,6 +984,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>  {
>  	struct drm_i915_file_private *file_priv = file->driver_priv;
>  	struct drm_i915_gem_context_param *args = data;
> +	struct drm_i915_gem_context_trtt_param trtt_params;
>  	struct intel_context *ctx;
>  	int ret;
>  
> @@ -983,6 +1016,18 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>  			ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : 0;
>  		}
>  		break;
> +	case I915_CONTEXT_PARAM_ENABLE_TRTT:

Bump this case to i915_setup_trtt_ctx.
i.e. just have
	ret = i915_setup_trtt_ctx(ctx, args);
	break;

Otherwise this function will become very unwieldly very quickly.

>  int i915_ppgtt_init_hw(struct drm_device *dev)
>  {
> +	if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
> +		struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +		I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
> +			   GEN9_TRTT_BYPASS_DISABLE);

Shouldn't this be a context specific register? In which case you need to
set it in the context image instead.

Hmm. given you already do the context image tweaks, how does work with
non-trtt contexts?

> +struct i915_vma *
> +i915_gem_setup_trtt_vma(struct i915_address_space *vm)
> +{
> +	struct i915_vma *vma;
> +	int ret;
> +
> +	vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL);
> +	if (vma == NULL)
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&vma->vma_link);
> +	INIT_LIST_HEAD(&vma->mm_list);
> +	INIT_LIST_HEAD(&vma->exec_list);
> +	vma->vm = vm;
> +	i915_ppgtt_get(i915_vm_to_ppgtt(vm));

Tempted to write a patch to allow

	vma->vm = i915_ppggtt_get(i915_vm_to_ppgtt(vm));
?

> +	/* Mark the vma as perennially pinned */

s/perennially/permanently/

We don't want to lose the reservation as opposed to having it grow back
next year.

> +	vma->pin_count = 1;
> +
> +	/* Reserve from the 48 bit PPGTT space */
> +	vma->node.start = GEN9_TRTT_SEGMENT_START;
> +	vma->node.size = GEN9_TRTT_SEGMENT_SIZE;
> +	ret = drm_mm_reserve_node(&vm->mm, &vma->node);
> +	if (ret) {
> +		ret = i915_gem_evict_for_vma(vma);
> +		if (ret == 0)
> +			ret = drm_mm_reserve_node(&vm->mm, &vma->node);

Good. I think we want i915_vm_reserve_node(vm, START, SIZE, &vma) - but
have a look at the other callsites to see if we have a common interface.
Looks like this would improve i915_vgpu.

> +struct drm_i915_gem_context_trtt_param {
> +	__u64 l3_table_address;
> +	__u32 invd_tile_val;
> +	__u32 null_tile_val;
> +};

Passes the ABI structure sanity checks.
-Chris
akash.goel@intel.com Jan. 11, 2016, 7:39 a.m. UTC | #2
On 1/10/2016 11:09 PM, Chris Wilson wrote:
> On Sat, Jan 09, 2016 at 05:00:21PM +0530, akash.goel@intel.com wrote:
>> From: Akash Goel <akash.goel@intel.com>
>>
>> Gen9 has an additional address translation hardware support in form of
>> Tiled Resource Translation Table (TR-TT) which provides an extra level
>> of abstraction over PPGTT.
>> This is useful for mapping Sparse/Tiled texture resources.
>> Sparse resources are created as virtual-only allocations. Regions of the
>> resource that the application intends to use is bound to the physical memory
>> on the fly and can be re-bound to different memory allocations over the
>> lifetime of the resource.
>>
>> TR-TT is tightly coupled with PPGTT, a new instance of TR-TT will be required
>> for a new PPGTT instance, but TR-TT may not enabled for every context.
>> 1/16th of the 48bit PPGTT space is earmarked for the translation by TR-TT,
>> which such chunk to use is conveyed to HW through a register.
>> Any GFX address, which lies in that reserved 44 bit range will be translated
>> through TR-TT first and then through PPGTT to get the actual physical address,
>> so the output of translation from TR-TT will be a PPGTT offset.
>>
>> TRTT is constructed as a 3 level tile Table. Each tile is 64KB is size which
>> leaves behind 44-16=28 address bits. 28bits are partitioned as 9+9+10, and
>> each level is contained within a 4KB page hence L3 and L2 is composed of
>> 512 64b entries and L1 is composed of 1024 32b entries.
>>
>> There is a provision to keep TR-TT Tables in virtual space, where the pages of
>> TRTT tables will be mapped to PPGTT.
>> Currently this is the supported mode, in this mode UMD will have a full control
>> on TR-TT management, with bare minimum support from KMD.
>> So the entries of L3 table will contain the PPGTT offset of L2 Table pages,
>> similarly entries of L2 table will contain the PPGTT offset of L1 Table pages.
>> The entries of L1 table will contain the PPGTT offset of BOs actually backing
>> the Sparse resources.
>
>> The assumption here is that UMD only will do the complete PPGTT address space
>> management and use the Soft Pin API for all the buffer objects associated with
>> a given Context.
>
> That is a poor assumption, and not one required for this to work.
>
This is not a strict requirement.
But I thought that conflicts will be minimized if UMD itself can do the 
full address space management.
At least UMD has to ensure that PPGTT offset of L3 table remains same 
throughout.

>> So UMD will also have to allocate the L3/L2/L1 table pages
>> as a regular GEM BO only & assign them a PPGTT address through the Soft Pin API.
>> UMD would have to emit the MI_STORE_DATA_IMM commands in the batch buffer to
>> program the relevant entries of L3/L2/L1 tables.
>
> This only applies to te TR-TT L1-L3 cache, right?
>
Yes applies only to the TR-TT L1-L3 tables.
The backing pages of L3/L2/L1 tables shall be allocated as a BO, which 
should be assigned a PPGTT address.
The table entries could be written directly also by UMD by mmapping the 
table BOs, but adding MI_STORE_DATA_IMM commands in the batch buffer 
itself would help to achieve serialization (implicitly).

>> Any space in TR-TT segment not bound to any Sparse texture, will be handled
>> through Invalid tile, User is expected to initialize the entries of a new
>> L3/L2/L1 table page with the Invalid tile pattern. The entries corresponding to
>> the holes in the Sparse texture resource will be set with the Null tile pattern
>> The improper programming of TRTT should only lead to a recoverable GPU hang,
>> eventually leading to banning of the culprit context without victimizing others.
>>
>> The association of any Sparse resource with the BOs will be known only to UMD,
>> and only the Sparse resources shall be assigned an offset from the TR-TT segment
>> by UMD. The use of TR-TT segment or mapping of Sparse resources will be
>> abstracted from the KMD,
>
> s/abstracted from/transparent to/ s/,/;/
>
Ok will rephrase as 'transparent to KMD;'
>> UMD can do the address assignment from TR-TT segment
>
> s/can/will/
>
Ok

>> autonomously and KMD will be oblivious of it.
>> The BOs must not be assigned an address from TR-TT segment, they will be mapped
>
> s/The BOs/Any object/
>
Ok will use 'Any object'
>> to PPGTT in a regular way by KMD
>
> s/using the Soft Pin offset provided by UMD// as this is irrelevant.
>
You mean to say that it is needless or inappropriate to state that KMD 
will use the Soft PIN offset provided by UMD, it doesn't matter that 
whether the Soft PIN offset is used or KMD itself assigns an address.

>> This patch provides an interface through which UMD can convey KMD to enable
>> TR-TT for a given context. A new I915_CONTEXT_PARAM_ENABLE_TRTT param has been
>> added to I915_GEM_CONTEXT_SETPARAM ioctl for that purpose.
>> UMD will have to pass the GFX address of L3 table page,
>
> +along with the
>
Ok.

>> pattern value for the
>> Null & invalid Tile registers.
>>
>> Testcase: igt/gem_trtt
>>
>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_dma.c         |  3 ++
>>   drivers/gpu/drm/i915/i915_drv.h         | 12 +++++++
>>   drivers/gpu/drm/i915/i915_gem_context.c | 45 ++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_gem_gtt.c     | 57 +++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_gem_gtt.h     |  6 ++++
>>   drivers/gpu/drm/i915/i915_reg.h         | 19 +++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c        | 41 ++++++++++++++++++++++++
>>   include/uapi/drm/i915_drm.h             |  8 +++++
>>   8 files changed, 191 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>> index 988a380..c247c25 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>>   	case I915_PARAM_HAS_EXEC_SOFTPIN:
>>   		value = 1;
>>   		break;
>> +	case I915_PARAM_HAS_TRTT:
>> +		value = HAS_TRTT(dev);
>> +		break;
>
> Should we do this here, or just query the context? In fact you are
> missing the context getparam path any way.
>
Sorry, do you mean to say that with -ENODEV error also, on context 
setparam, User can make out the TR-TT support, so no need to have an 
explicit getparam case.

Would the context getparam path be really useful for TR-TT?.
If its needed, then would be better to rename 
I915_CONTEXT_PARAM_ENABLE_TRTT to I915_CONTEXT_PARAM_TRTT_INFO ?


>>   	default:
>>   		DRM_DEBUG("Unknown parameter %d\n", param->param);
>>   		return -EINVAL;
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index c6dd4db..12c612e 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -839,6 +839,7 @@ struct i915_ctx_hang_stats {
>>   #define DEFAULT_CONTEXT_HANDLE 0
>>
>>   #define CONTEXT_NO_ZEROMAP (1<<0)
>> +#define CONTEXT_USE_TRTT   (1<<1)
>
> Make flags unsigned whilst you are here, and fix the holes!
>

Ok will change the type of 'flags' field inside 'intel_context' to unsigned.
Sorry, but apart from this anything else required here ?

>>   /**
>>    * struct intel_context - as the name implies, represents a context.
>>    * @ref: reference count.
>> @@ -881,6 +882,15 @@ struct intel_context {
>>   		int pin_count;
>>   	} engine[I915_NUM_RINGS];
>>
>> +	/* TRTT info */
>> +	struct {
>
> Give this a name now, we will be thankful in the future.
>
Would ctx_trtt_params be fine ?

>> +		uint32_t invd_tile_val;
>> +		uint32_t null_tile_val;
>> +		uint64_t l3_table_address;
>> +		struct i915_vma *vma;
>> +		bool update_trtt_params;
>> +	} trtt_info;
>> +
>>   	struct list_head link;
>>   };
>>
>> @@ -2626,6 +2636,8 @@ struct drm_i915_cmd_table {
>>   				 !IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
>>   				 !IS_BROXTON(dev))
>>
>> +#define HAS_TRTT(dev)		(IS_GEN9(dev))
>> +
>>   #define INTEL_PCH_DEVICE_ID_MASK		0xff00
>>   #define INTEL_PCH_IBX_DEVICE_ID_TYPE		0x3b00
>>   #define INTEL_PCH_CPT_DEVICE_ID_TYPE		0x1c00
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> index 900ffd0..ae9fc34 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -146,6 +146,9 @@ static void i915_gem_context_clean(struct intel_context *ctx)
>>   		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
>>   			break;
>>   	}
>> +
>> +	if (ctx->flags & CONTEXT_USE_TRTT)
>> +		i915_gem_destroy_trtt_vma(ctx->trtt_info.vma);
>
> Sould be in context free.

Fine, will move it to the gem_context_free()

>
>>   }
>>
>>   void i915_gem_context_free(struct kref *ctx_ref)
>> @@ -512,6 +515,35 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
>>   	return ctx;
>>   }
>>
>> +static int
>> +i915_setup_trtt_ctx(struct intel_context *ctx,
>> +		    struct drm_i915_gem_context_trtt_param *trtt_params)
>> +{
>> +	if (ctx->flags & CONTEXT_USE_TRTT)
>> +		return -EEXIST;
>> +
>> +	/* basic sanity checks for the l3 table pointer */
>> +	if ((ctx->trtt_info.l3_table_address >= GEN9_TRTT_SEGMENT_START) &&
>> +	    (ctx->trtt_info.l3_table_address <
>> +			(GEN9_TRTT_SEGMENT_START + GEN9_TRTT_SEGMENT_SIZE)))
>
> Presumably l3_table has an actual size and you want to do a range
> overlap test, not just the start address.
>
Yes intend to do a range overlap test only. But since L3 table size is 
fixed as 4KB, thought there is no real need to also include the size in 
the range check, considering the allocations are always in multiple of 4KB.

>> +		return -EINVAL;
>> +
>> +	if (ctx->trtt_info.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK)
>> +		return -EINVAL;
>
> These are worth adding DRM_DEBUG() or even better start using dev_debug()
> so that we can debug userspace startup issues.
>
Fine, I think DRM_DEBUG_DRIVER will be more appropriate compared to 
DRM_DEBUG.
Or
	dev_dbg(dev->dev, "invalid l3 table address\n");

>> @@ -952,6 +984,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>   {
>>   	struct drm_i915_file_private *file_priv = file->driver_priv;
>>   	struct drm_i915_gem_context_param *args = data;
>> +	struct drm_i915_gem_context_trtt_param trtt_params;
>>   	struct intel_context *ctx;
>>   	int ret;
>>
>> @@ -983,6 +1016,18 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>   			ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : 0;
>>   		}
>>   		break;
>> +	case I915_CONTEXT_PARAM_ENABLE_TRTT:
>
> Bump this case to i915_setup_trtt_ctx.
> i.e. just have
> 	ret = i915_setup_trtt_ctx(ctx, args);
> 	break;
>
> Otherwise this function will become very unwieldly very quickly.
>
Fine, this will be much better.

>>   int i915_ppgtt_init_hw(struct drm_device *dev)
>>   {
>> +	if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
>> +		struct drm_i915_private *dev_priv = dev->dev_private;
>> +
>> +		I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
>> +			   GEN9_TRTT_BYPASS_DISABLE);
>
> Shouldn't this be a context specific register? In which case you need to
> set it in the context image instead.
>
> Hmm. given you already do the context image tweaks, how does work with
> non-trtt contexts?
>

GEN9_TR_CHICKEN_BIT_VECTOR is not a context specific register.
It globally enables TR-TT support in Hw. Still TR-TT enabling on per 
context basis is required.
Non-trtt contexts are not affected by this setting.

>> +struct i915_vma *
>> +i915_gem_setup_trtt_vma(struct i915_address_space *vm)
>> +{
>> +	struct i915_vma *vma;
>> +	int ret;
>> +
>> +	vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL);
>> +	if (vma == NULL)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	INIT_LIST_HEAD(&vma->vma_link);
>> +	INIT_LIST_HEAD(&vma->mm_list);
>> +	INIT_LIST_HEAD(&vma->exec_list);
>> +	vma->vm = vm;
>> +	i915_ppgtt_get(i915_vm_to_ppgtt(vm));
>
> Tempted to write a patch to allow
>
> 	vma->vm = i915_ppggtt_get(i915_vm_to_ppgtt(vm));
> ?
>
>> +	/* Mark the vma as perennially pinned */
>
> s/perennially/permanently/
>

Thanks, will rephrase as 'permanently pinned'.
> We don't want to lose the reservation as opposed to having it grow back
> next year.
>
>> +	vma->pin_count = 1;
>> +
>> +	/* Reserve from the 48 bit PPGTT space */
>> +	vma->node.start = GEN9_TRTT_SEGMENT_START;
>> +	vma->node.size = GEN9_TRTT_SEGMENT_SIZE;
>> +	ret = drm_mm_reserve_node(&vm->mm, &vma->node);
>> +	if (ret) {
>> +		ret = i915_gem_evict_for_vma(vma);
>> +		if (ret == 0)
>> +			ret = drm_mm_reserve_node(&vm->mm, &vma->node);
>
> Good. I think we want i915_vm_reserve_node(vm, START, SIZE, &vma) - but
> have a look at the other callsites to see if we have a common interface.
> Looks like this would improve i915_vgpu.
>

Ok so need to define a new wrapper function,
	i915_vm_reserve_node(vm, START, SIZE, &vma).

After looking at the other callsites of drm_mm_reserve_node, including 
i915_vgpu, I think it would be better to have the prototype as,
    i915_vm_reserve_node(vm, &node);

However this should be done as a separate patch ?

>> +struct drm_i915_gem_context_trtt_param {
>> +	__u64 l3_table_address;
>> +	__u32 invd_tile_val;
>> +	__u32 null_tile_val;
>> +};
>
> Passes the ABI structure sanity checks.

Should we allow User to also choose the location of TR-TT segment (size 
is anyways fixed as 1<<44).

Best regards
Akash
> -Chris
>
Chris Wilson Jan. 11, 2016, 8:49 a.m. UTC | #3
On Mon, Jan 11, 2016 at 01:09:50PM +0530, Goel, Akash wrote:
> 
> 
> On 1/10/2016 11:09 PM, Chris Wilson wrote:
> >On Sat, Jan 09, 2016 at 05:00:21PM +0530, akash.goel@intel.com wrote:
> >>From: Akash Goel <akash.goel@intel.com>
> >>
> >>Gen9 has an additional address translation hardware support in form of
> >>Tiled Resource Translation Table (TR-TT) which provides an extra level
> >>of abstraction over PPGTT.
> >>This is useful for mapping Sparse/Tiled texture resources.
> >>Sparse resources are created as virtual-only allocations. Regions of the
> >>resource that the application intends to use is bound to the physical memory
> >>on the fly and can be re-bound to different memory allocations over the
> >>lifetime of the resource.
> >>
> >>TR-TT is tightly coupled with PPGTT, a new instance of TR-TT will be required
> >>for a new PPGTT instance, but TR-TT may not enabled for every context.
> >>1/16th of the 48bit PPGTT space is earmarked for the translation by TR-TT,
> >>which such chunk to use is conveyed to HW through a register.
> >>Any GFX address, which lies in that reserved 44 bit range will be translated
> >>through TR-TT first and then through PPGTT to get the actual physical address,
> >>so the output of translation from TR-TT will be a PPGTT offset.
> >>
> >>TRTT is constructed as a 3 level tile Table. Each tile is 64KB is size which
> >>leaves behind 44-16=28 address bits. 28bits are partitioned as 9+9+10, and
> >>each level is contained within a 4KB page hence L3 and L2 is composed of
> >>512 64b entries and L1 is composed of 1024 32b entries.
> >>
> >>There is a provision to keep TR-TT Tables in virtual space, where the pages of
> >>TRTT tables will be mapped to PPGTT.
> >>Currently this is the supported mode, in this mode UMD will have a full control
> >>on TR-TT management, with bare minimum support from KMD.
> >>So the entries of L3 table will contain the PPGTT offset of L2 Table pages,
> >>similarly entries of L2 table will contain the PPGTT offset of L1 Table pages.
> >>The entries of L1 table will contain the PPGTT offset of BOs actually backing
> >>the Sparse resources.
> >
> >>The assumption here is that UMD only will do the complete PPGTT address space
> >>management and use the Soft Pin API for all the buffer objects associated with
> >>a given Context.
> >
> >That is a poor assumption, and not one required for this to work.
> >
> This is not a strict requirement.
> But I thought that conflicts will be minimized if UMD itself can do
> the full address space management.
> At least UMD has to ensure that PPGTT offset of L3 table remains
> same throughout.

Yes, userspace must control that object, and that would require softpin
to preserve it across execbuffer calls. The kernel does not require that
all addresses be handled in userspace afterwards, that's the language I
wish to avoid. (Hence I don't like using "assumption" as that just
invites userspace to break the kernel.)
 
> >>So UMD will also have to allocate the L3/L2/L1 table pages
> >>as a regular GEM BO only & assign them a PPGTT address through the Soft Pin API.
> >>UMD would have to emit the MI_STORE_DATA_IMM commands in the batch buffer to
> >>program the relevant entries of L3/L2/L1 tables.
> >
> >This only applies to te TR-TT L1-L3 cache, right?
> >
> Yes applies only to the TR-TT L1-L3 tables.
> The backing pages of L3/L2/L1 tables shall be allocated as a BO,
> which should be assigned a PPGTT address.
> The table entries could be written directly also by UMD by mmapping
> the table BOs, but adding MI_STORE_DATA_IMM commands in the batch
> buffer itself would help to achieve serialization (implicitly).

Can you tighten up the phrasing here? My first read was that you indeed
for all PTE writes to be in userspace, which is scary.

"UMD will then allocate the L3/L32/L1 page tables for TR-TT as a regular
bo, and will use softpin to assign it to the l3_table_address when used.
UMD will also maintain the entries in the TR-TT page tables using
regular batch commands (MI_STORE_DATA_IMM), or via mmapping of the page
table bo."

> >>autonomously and KMD will be oblivious of it.
> >>The BOs must not be assigned an address from TR-TT segment, they will be mapped
> >
> >s/The BOs/Any object/
> >
> Ok will use 'Any object'
> >>to PPGTT in a regular way by KMD
> >
> >s/using the Soft Pin offset provided by UMD// as this is irrelevant.
> >
> You mean to say that it is needless or inappropriate to state that
> KMD will use the Soft PIN offset provided by UMD, it doesn't matter
> that whether the Soft PIN offset is used or KMD itself assigns an
> address.

I just want to avoid implying that userspace must use softpin on every
single bo for this to work. (Mainly because I don't really want
userspace to have to do full address space management, as we will always
have to do the double check inside the kernel. Unless there is a real
need (e.g. svm), I'd rather improve the kernel allocator/verification, rather
than try and circumvent it.)

> >>@@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
> >>  	case I915_PARAM_HAS_EXEC_SOFTPIN:
> >>  		value = 1;
> >>  		break;
> >>+	case I915_PARAM_HAS_TRTT:
> >>+		value = HAS_TRTT(dev);
> >>+		break;
> >
> >Should we do this here, or just query the context? In fact you are
> >missing the context getparam path any way.
> >
> Sorry, do you mean to say that with -ENODEV error also, on context
> setparam, User can make out the TR-TT support, so no need to have an
> explicit getparam case.
> 
> Would the context getparam path be really useful for TR-TT?.
> If its needed, then would be better to rename
> I915_CONTEXT_PARAM_ENABLE_TRTT to I915_CONTEXT_PARAM_TRTT_INFO ?

The question I have is do we want:

GETPARAM + CONTEXT_SETPARAM

or

CONTEXT_GETPARAM + CONTEXT_SETPARAM

the latter seems more symmetric and flexible, and we can use as a double
check later on that we set the right address etc.

Indeed, hindsight says ENABLE_TRTT is a bad name :)

I915_CONTEXT_PARAM_TRTT (let's assume for now  this will be the last,
any future PARAM_TRTT can think of a good name for its extension).

> >>  	default:
> >>  		DRM_DEBUG("Unknown parameter %d\n", param->param);
> >>  		return -EINVAL;
> >>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>index c6dd4db..12c612e 100644
> >>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>@@ -839,6 +839,7 @@ struct i915_ctx_hang_stats {
> >>  #define DEFAULT_CONTEXT_HANDLE 0
> >>
> >>  #define CONTEXT_NO_ZEROMAP (1<<0)
> >>+#define CONTEXT_USE_TRTT   (1<<1)
> >
> >Make flags unsigned whilst you are here, and fix the holes!
> >
> 
> Ok will change the type of 'flags' field inside 'intel_context' to unsigned.
> Sorry, but apart from this anything else required here ?

No, it's just belated anger about the silly "int flags".

> 
> >>  /**
> >>   * struct intel_context - as the name implies, represents a context.
> >>   * @ref: reference count.
> >>@@ -881,6 +882,15 @@ struct intel_context {
> >>  		int pin_count;
> >>  	} engine[I915_NUM_RINGS];
> >>
> >>+	/* TRTT info */
> >>+	struct {
> >
> >Give this a name now, we will be thankful in the future.
> >
> Would ctx_trtt_params be fine ?

struct intel_context_trtt

(Avoid using params for internals, let's keep those for uAPI - that
helps us distinguish pieces of code / context.)

> >>
> >>  void i915_gem_context_free(struct kref *ctx_ref)
> >>@@ -512,6 +515,35 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
> >>  	return ctx;
> >>  }
> >>
> >>+static int
> >>+i915_setup_trtt_ctx(struct intel_context *ctx,
> >>+		    struct drm_i915_gem_context_trtt_param *trtt_params)
> >>+{
> >>+	if (ctx->flags & CONTEXT_USE_TRTT)
> >>+		return -EEXIST;
> >>+
> >>+	/* basic sanity checks for the l3 table pointer */
> >>+	if ((ctx->trtt_info.l3_table_address >= GEN9_TRTT_SEGMENT_START) &&
> >>+	    (ctx->trtt_info.l3_table_address <
> >>+			(GEN9_TRTT_SEGMENT_START + GEN9_TRTT_SEGMENT_SIZE)))
> >
> >Presumably l3_table has an actual size and you want to do a range
> >overlap test, not just the start address.
> >
> Yes intend to do a range overlap test only. But since L3 table size
> is fixed as 4KB, thought there is no real need to also include the
> size in the range check, considering the allocations are always in
> multiple of 4KB.

Ok. You have a choice of writting that up as a comment, or just doing
the page overlap test :) Honestly, I would just go for the range test
since this is a one-off init path and the reader then doesn't even have
to think.

> >>+		return -EINVAL;
> >>+
> >>+	if (ctx->trtt_info.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK)
> >>+		return -EINVAL;
> >
> >These are worth adding DRM_DEBUG() or even better start using dev_debug()
> >so that we can debug userspace startup issues.
> >
> Fine, I think DRM_DEBUG_DRIVER will be more appropriate compared to
> DRM_DEBUG.

No, these are userspace errors for which we use DRM_DEBUG.
DRM_DEBUG_DRIVER is for a driver error :)

> Or
> 	dev_dbg(dev->dev, "invalid l3 table address\n");

Include as much info as you can (without giving away kernel internals),
since the user gave use the l3_address that we reject, report it. That
makes it easier to spot if it is the same address as they expected.

dev_dbg() would be my preference.

#define i915_dbg(DEV, args...) dev_dbg(__I915__(DEV)->dev->dev, ##args)
(not the prettiest yet, the pointer dancing is in the wrong direction!)

and let's get the ball rolling.

> >>  int i915_ppgtt_init_hw(struct drm_device *dev)
> >>  {
> >>+	if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
> >>+		struct drm_i915_private *dev_priv = dev->dev_private;
> >>+
> >>+		I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
> >>+			   GEN9_TRTT_BYPASS_DISABLE);
> >
> >Shouldn't this be a context specific register? In which case you need to
> >set it in the context image instead.
> >
> >Hmm. given you already do the context image tweaks, how does work with
> >non-trtt contexts?
> >
> 
> GEN9_TR_CHICKEN_BIT_VECTOR is not a context specific register.
> It globally enables TR-TT support in Hw. Still TR-TT enabling on per
> context basis is required.
> Non-trtt contexts are not affected by this setting.

Please add that as a comment here. What are the downsides, potential
regressions? It's behind a chicken bit after all...

> Ok so need to define a new wrapper function,
> 	i915_vm_reserve_node(vm, START, SIZE, &vma).
> 
> After looking at the other callsites of drm_mm_reserve_node,
> including i915_vgpu, I think it would be better to have the
> prototype as,
>    i915_vm_reserve_node(vm, &node);
> 
> However this should be done as a separate patch ?

Yes, I was just recognising the code duplication and found 3 places
where we could use it - 3 being the magic number to refactor.
 
> >>+struct drm_i915_gem_context_trtt_param {
> >>+	__u64 l3_table_address;
> >>+	__u32 invd_tile_val;
> >>+	__u32 null_tile_val;
> >>+};
> >
> >Passes the ABI structure sanity checks.
> 
> Should we allow User to also choose the location of TR-TT segment
> (size is anyways fixed as 1<<44).

The kernel is much more agnostic with your approach than I anticipated,
so from our pov, allowing the user to shoot themselves in the foot is
ok.

There is only one sensible location, but that one location may be
sensible for a few things.

i.e. it shouldn't be below 1<<40 so that you can do full aliasing
between CPU and GPU addresses, and you want to avoid cutting your
address space in two, so it has to go at the ends, ergo it should be at
the very top.
-Chris
akash.goel@intel.com Jan. 11, 2016, 12:29 p.m. UTC | #4
On 1/11/2016 2:19 PM, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 01:09:50PM +0530, Goel, Akash wrote:
>>
>>
>> On 1/10/2016 11:09 PM, Chris Wilson wrote:
>>> On Sat, Jan 09, 2016 at 05:00:21PM +0530, akash.goel@intel.com wrote:
>>>> From: Akash Goel <akash.goel@intel.com>
>>>>
>>>> Gen9 has an additional address translation hardware support in form of
>>>> Tiled Resource Translation Table (TR-TT) which provides an extra level
>>>> of abstraction over PPGTT.
>>>> This is useful for mapping Sparse/Tiled texture resources.
>>>> Sparse resources are created as virtual-only allocations. Regions of the
>>>> resource that the application intends to use is bound to the physical memory
>>>> on the fly and can be re-bound to different memory allocations over the
>>>> lifetime of the resource.
>>>>
>>>> TR-TT is tightly coupled with PPGTT, a new instance of TR-TT will be required
>>>> for a new PPGTT instance, but TR-TT may not enabled for every context.
>>>> 1/16th of the 48bit PPGTT space is earmarked for the translation by TR-TT,
>>>> which such chunk to use is conveyed to HW through a register.
>>>> Any GFX address, which lies in that reserved 44 bit range will be translated
>>>> through TR-TT first and then through PPGTT to get the actual physical address,
>>>> so the output of translation from TR-TT will be a PPGTT offset.
>>>>
>>>> TRTT is constructed as a 3 level tile Table. Each tile is 64KB is size which
>>>> leaves behind 44-16=28 address bits. 28bits are partitioned as 9+9+10, and
>>>> each level is contained within a 4KB page hence L3 and L2 is composed of
>>>> 512 64b entries and L1 is composed of 1024 32b entries.
>>>>
>>>> There is a provision to keep TR-TT Tables in virtual space, where the pages of
>>>> TRTT tables will be mapped to PPGTT.
>>>> Currently this is the supported mode, in this mode UMD will have a full control
>>>> on TR-TT management, with bare minimum support from KMD.
>>>> So the entries of L3 table will contain the PPGTT offset of L2 Table pages,
>>>> similarly entries of L2 table will contain the PPGTT offset of L1 Table pages.
>>>> The entries of L1 table will contain the PPGTT offset of BOs actually backing
>>>> the Sparse resources.
>>>
>>>> The assumption here is that UMD only will do the complete PPGTT address space
>>>> management and use the Soft Pin API for all the buffer objects associated with
>>>> a given Context.
>>>
>>> That is a poor assumption, and not one required for this to work.
>>>
>> This is not a strict requirement.
>> But I thought that conflicts will be minimized if UMD itself can do
>> the full address space management.
>> At least UMD has to ensure that PPGTT offset of L3 table remains
>> same throughout.
>
> Yes, userspace must control that object, and that would require softpin
> to preserve it across execbuffer calls. The kernel does not require that
> all addresses be handled in userspace afterwards, that's the language I
> wish to avoid. (Hence I don't like using "assumption" as that just
> invites userspace to break the kernel.)
>
Fine will remove the word "assumption", instead can I put it as "UMD may 
do the complete PPGTT address space management, on the pretext that it 
could help to might minimize the conflicts".

>>>> So UMD will also have to allocate the L3/L2/L1 table pages
>>>> as a regular GEM BO only & assign them a PPGTT address through the Soft Pin API.
>>>> UMD would have to emit the MI_STORE_DATA_IMM commands in the batch buffer to
>>>> program the relevant entries of L3/L2/L1 tables.
>>>
>>> This only applies to te TR-TT L1-L3 cache, right?
>>>
>> Yes applies only to the TR-TT L1-L3 tables.
>> The backing pages of L3/L2/L1 tables shall be allocated as a BO,
>> which should be assigned a PPGTT address.
>> The table entries could be written directly also by UMD by mmapping
>> the table BOs, but adding MI_STORE_DATA_IMM commands in the batch
>> buffer itself would help to achieve serialization (implicitly).
>
> Can you tighten up the phrasing here? My first read was that you indeed
> for all PTE writes to be in userspace, which is scary.
>
> "UMD will then allocate the L3/L32/L1 page tables for TR-TT as a regular
> bo, and will use softpin to assign it to the l3_table_address when used.
> UMD will also maintain the entries in the TR-TT page tables using
> regular batch commands (MI_STORE_DATA_IMM), or via mmapping of the page
> table bo."
>

Yes, UMD will have to use softpin to assign l3_table_address to L3 table BO.
Similarly the softpin will be needed for L2/L1 table BOs.

>>>> autonomously and KMD will be oblivious of it.
>>>> The BOs must not be assigned an address from TR-TT segment, they will be mapped
>>>
>>> s/The BOs/Any object/
>>>
>> Ok will use 'Any object'
>>>> to PPGTT in a regular way by KMD
>>>
>>> s/using the Soft Pin offset provided by UMD// as this is irrelevant.
>>>
>> You mean to say that it is needless or inappropriate to state that
>> KMD will use the Soft PIN offset provided by UMD, it doesn't matter
>> that whether the Soft PIN offset is used or KMD itself assigns an
>> address.
>
> I just want to avoid implying that userspace must use softpin on every
> single bo for this to work. (Mainly because I don't really want
> userspace to have to do full address space management, as we will always
> have to do the double check inside the kernel. Unless there is a real
> need (e.g. svm), I'd rather improve the kernel allocator/verification, rather
> than try and circumvent it.)
>
>>>> @@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>>>>   	case I915_PARAM_HAS_EXEC_SOFTPIN:
>>>>   		value = 1;
>>>>   		break;
>>>> +	case I915_PARAM_HAS_TRTT:
>>>> +		value = HAS_TRTT(dev);
>>>> +		break;
>>>
>>> Should we do this here, or just query the context? In fact you are
>>> missing the context getparam path any way.
>>>
>> Sorry, do you mean to say that with -ENODEV error also, on context
>> setparam, User can make out the TR-TT support, so no need to have an
>> explicit getparam case.
>>
>> Would the context getparam path be really useful for TR-TT?.
>> If its needed, then would be better to rename
>> I915_CONTEXT_PARAM_ENABLE_TRTT to I915_CONTEXT_PARAM_TRTT_INFO ?
>
> The question I have is do we want:
>
> GETPARAM + CONTEXT_SETPARAM
>
> or
>
> CONTEXT_GETPARAM + CONTEXT_SETPARAM
>
> the latter seems more symmetric and flexible, and we can use as a double
> check later on that we set the right address etc.
>
> Indeed, hindsight says ENABLE_TRTT is a bad name :)
>
> I915_CONTEXT_PARAM_TRTT (let's assume for now  this will be the last,
> any future PARAM_TRTT can think of a good name for its extension).
>

fine, will rename as I915_CONTEXT_PARAM_TRTT.
And use the CONTEXT_GETPARAM + CONTEXT_SETPARAM pair.

Would the following change be fine ?

@@ -974,6 +987,24 @@ int i915_gem_context_getparam_ioctl(struct 
drm_device *dev, void *data,
  		else
  			args->value = to_i915(dev)->gtt.base.total;
  		break;
+	case I915_CONTEXT_PARAM_TRTT:
+		if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev))
+			return -ENODEV;
+		else if (args->size < sizeof(trtt_params))
+			ret = -EINVAL;
+		else {
+			trtt_params.l3_table_address =
+				ctx->trtt_info.l3_table_address;
+			trtt_params.null_tile_val =
+				ctx->trtt_info.null_tile_val;
+			trtt_params.invd_tile_val =
+				ctx->trtt_info.invd_tile_val;
+
+			if (__copy_to_user(to_user_ptr(args->value),
+					   &trtt_params,
+					   sizeof(trtt_params)))
+				ret = -EFAULT;
+		}
  	default:

>>>>   	default:
>>>>   		DRM_DEBUG("Unknown parameter %d\n", param->param);
>>>>   		return -EINVAL;
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>> index c6dd4db..12c612e 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -839,6 +839,7 @@ struct i915_ctx_hang_stats {
>>>>   #define DEFAULT_CONTEXT_HANDLE 0
>>>>
>>>>   #define CONTEXT_NO_ZEROMAP (1<<0)
>>>> +#define CONTEXT_USE_TRTT   (1<<1)
>>>
>>> Make flags unsigned whilst you are here, and fix the holes!
>>>
>>
>> Ok will change the type of 'flags' field inside 'intel_context' to unsigned.
>> Sorry, but apart from this anything else required here ?
>
> No, it's just belated anger about the silly "int flags".
>

Fine will modify the type in this patch only.

>>
>>>>   /**
>>>>    * struct intel_context - as the name implies, represents a context.
>>>>    * @ref: reference count.
>>>> @@ -881,6 +882,15 @@ struct intel_context {
>>>>   		int pin_count;
>>>>   	} engine[I915_NUM_RINGS];
>>>>
>>>> +	/* TRTT info */
>>>> +	struct {
>>>
>>> Give this a name now, we will be thankful in the future.
>>>
>> Would ctx_trtt_params be fine ?
>
> struct intel_context_trtt
>
> (Avoid using params for internals, let's keep those for uAPI - that
> helps us distinguish pieces of code / context.)
>
Thanks, intel_context_trtt is more appropriate.

>>>>
>>>>   void i915_gem_context_free(struct kref *ctx_ref)
>>>> @@ -512,6 +515,35 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
>>>>   	return ctx;
>>>>   }
>>>>
>>>> +static int
>>>> +i915_setup_trtt_ctx(struct intel_context *ctx,
>>>> +		    struct drm_i915_gem_context_trtt_param *trtt_params)
>>>> +{
>>>> +	if (ctx->flags & CONTEXT_USE_TRTT)
>>>> +		return -EEXIST;
>>>> +
>>>> +	/* basic sanity checks for the l3 table pointer */
>>>> +	if ((ctx->trtt_info.l3_table_address >= GEN9_TRTT_SEGMENT_START) &&
>>>> +	    (ctx->trtt_info.l3_table_address <
>>>> +			(GEN9_TRTT_SEGMENT_START + GEN9_TRTT_SEGMENT_SIZE)))
>>>
>>> Presumably l3_table has an actual size and you want to do a range
>>> overlap test, not just the start address.
>>>
>> Yes intend to do a range overlap test only. But since L3 table size
>> is fixed as 4KB, thought there is no real need to also include the
>> size in the range check, considering the allocations are always in
>> multiple of 4KB.
>
> Ok. You have a choice of writting that up as a comment, or just doing
> the page overlap test :) Honestly, I would just go for the range test
> since this is a one-off init path and the reader then doesn't even have
> to think.
>

Fine, will do the page overlap test.

>>>> +		return -EINVAL;
>>>> +
>>>> +	if (ctx->trtt_info.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK)
>>>> +		return -EINVAL;
>>>
>>> These are worth adding DRM_DEBUG() or even better start using dev_debug()
>>> so that we can debug userspace startup issues.
>>>
>> Fine, I think DRM_DEBUG_DRIVER will be more appropriate compared to
>> DRM_DEBUG.
>
> No, these are userspace errors for which we use DRM_DEBUG.
> DRM_DEBUG_DRIVER is for a driver error :)
>
>> Or
>> 	dev_dbg(dev->dev, "invalid l3 table address\n");
>
> Include as much info as you can (without giving away kernel internals),
> since the user gave use the l3_address that we reject, report it. That
> makes it easier to spot if it is the same address as they expected.
>
> dev_dbg() would be my preference.
>
> #define i915_dbg(DEV, args...) dev_dbg(__I915__(DEV)->dev->dev, ##args)
> (not the prettiest yet, the pointer dancing is in the wrong direction!)
>
> and let's get the ball rolling.
>

This will also go as a separate patch.
One doubt here, by using dev_dbg() we intend to avoid dependency on the 
value of drm.debug parameter and always get certain error messages ?. 
Sorry just want to understand the rationale behind it.

>>>>   int i915_ppgtt_init_hw(struct drm_device *dev)
>>>>   {
>>>> +	if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
>>>> +		struct drm_i915_private *dev_priv = dev->dev_private;
>>>> +
>>>> +		I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
>>>> +			   GEN9_TRTT_BYPASS_DISABLE);
>>>
>>> Shouldn't this be a context specific register? In which case you need to
>>> set it in the context image instead.
>>>
>>> Hmm. given you already do the context image tweaks, how does work with
>>> non-trtt contexts?
>>>
>>
>> GEN9_TR_CHICKEN_BIT_VECTOR is not a context specific register.
>> It globally enables TR-TT support in Hw. Still TR-TT enabling on per
>> context basis is required.
>> Non-trtt contexts are not affected by this setting.
>
> Please add that as a comment here. What are the downsides, potential
> regressions? It's behind a chicken bit after all...
>
Fine, will add a comment here for clarity.
So not aware of downsides, this setting should come into picture only 
when TR-TT is enabled for a context.

>> Ok so need to define a new wrapper function,
>> 	i915_vm_reserve_node(vm, START, SIZE, &vma).
>>
>> After looking at the other callsites of drm_mm_reserve_node,
>> including i915_vgpu, I think it would be better to have the
>> prototype as,
>>     i915_vm_reserve_node(vm, &node);
>>
>> However this should be done as a separate patch ?
>
> Yes, I was just recognising the code duplication and found 3 places
> where we could use it - 3 being the magic number to refactor.
>
>>>> +struct drm_i915_gem_context_trtt_param {
>>>> +	__u64 l3_table_address;
>>>> +	__u32 invd_tile_val;
>>>> +	__u32 null_tile_val;
>>>> +};
>>>
>>> Passes the ABI structure sanity checks.
>>
>> Should we allow User to also choose the location of TR-TT segment
>> (size is anyways fixed as 1<<44).
>
> The kernel is much more agnostic with your approach than I anticipated,
> so from our pov, allowing the user to shoot themselves in the foot is
> ok.
>
> There is only one sensible location, but that one location may be
> sensible for a few things.
>
> i.e. it shouldn't be below 1<<40 so that you can do full aliasing
> between CPU and GPU addresses, and you want to avoid cutting your
> address space in two, so it has to go at the ends, ergo it should be at
> the very top.

So Top most region is the most suitable location, hence should be used 
always.

Best regards
Akash

> -Chris
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 988a380..c247c25 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -172,6 +172,9 @@  static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_SOFTPIN:
 		value = 1;
 		break;
+	case I915_PARAM_HAS_TRTT:
+		value = HAS_TRTT(dev);
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c6dd4db..12c612e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -839,6 +839,7 @@  struct i915_ctx_hang_stats {
 #define DEFAULT_CONTEXT_HANDLE 0
 
 #define CONTEXT_NO_ZEROMAP (1<<0)
+#define CONTEXT_USE_TRTT   (1<<1)
 /**
  * struct intel_context - as the name implies, represents a context.
  * @ref: reference count.
@@ -881,6 +882,15 @@  struct intel_context {
 		int pin_count;
 	} engine[I915_NUM_RINGS];
 
+	/* TRTT info */
+	struct {
+		uint32_t invd_tile_val;
+		uint32_t null_tile_val;
+		uint64_t l3_table_address;
+		struct i915_vma *vma;
+		bool update_trtt_params;
+	} trtt_info;
+
 	struct list_head link;
 };
 
@@ -2626,6 +2636,8 @@  struct drm_i915_cmd_table {
 				 !IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
 				 !IS_BROXTON(dev))
 
+#define HAS_TRTT(dev)		(IS_GEN9(dev))
+
 #define INTEL_PCH_DEVICE_ID_MASK		0xff00
 #define INTEL_PCH_IBX_DEVICE_ID_TYPE		0x3b00
 #define INTEL_PCH_CPT_DEVICE_ID_TYPE		0x1c00
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 900ffd0..ae9fc34 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -146,6 +146,9 @@  static void i915_gem_context_clean(struct intel_context *ctx)
 		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
 			break;
 	}
+
+	if (ctx->flags & CONTEXT_USE_TRTT)
+		i915_gem_destroy_trtt_vma(ctx->trtt_info.vma);
 }
 
 void i915_gem_context_free(struct kref *ctx_ref)
@@ -512,6 +515,35 @@  i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
 	return ctx;
 }
 
+static int
+i915_setup_trtt_ctx(struct intel_context *ctx,
+		    struct drm_i915_gem_context_trtt_param *trtt_params)
+{
+	if (ctx->flags & CONTEXT_USE_TRTT)
+		return -EEXIST;
+
+	/* basic sanity checks for the l3 table pointer */
+	if ((ctx->trtt_info.l3_table_address >= GEN9_TRTT_SEGMENT_START) &&
+	    (ctx->trtt_info.l3_table_address <
+			(GEN9_TRTT_SEGMENT_START + GEN9_TRTT_SEGMENT_SIZE)))
+		return -EINVAL;
+
+	if (ctx->trtt_info.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK)
+		return -EINVAL;
+
+	ctx->trtt_info.vma = i915_gem_setup_trtt_vma(&ctx->ppgtt->base);
+	if (IS_ERR(ctx->trtt_info.vma))
+		return PTR_ERR(ctx->trtt_info.vma);
+
+	ctx->trtt_info.null_tile_val = trtt_params->null_tile_val;
+	ctx->trtt_info.invd_tile_val = trtt_params->invd_tile_val;
+	ctx->trtt_info.l3_table_address = trtt_params->l3_table_address;
+	ctx->trtt_info.update_trtt_params = 1;
+
+	ctx->flags |= CONTEXT_USE_TRTT;
+	return 0;
+}
+
 static inline int
 mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 {
@@ -952,6 +984,7 @@  int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct drm_i915_gem_context_param *args = data;
+	struct drm_i915_gem_context_trtt_param trtt_params;
 	struct intel_context *ctx;
 	int ret;
 
@@ -983,6 +1016,18 @@  int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 			ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : 0;
 		}
 		break;
+	case I915_CONTEXT_PARAM_ENABLE_TRTT:
+		if (args->size < sizeof(trtt_params))
+			ret = -EINVAL;
+		else if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev))
+			ret = -ENODEV;
+		else if (copy_from_user(&trtt_params,
+					to_user_ptr(args->value),
+					sizeof(trtt_params)))
+			ret = -EFAULT;
+		else
+			ret = i915_setup_trtt_ctx(ctx, &trtt_params);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 56f4f2e..28fc1ea 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2146,6 +2146,13 @@  int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 
 int i915_ppgtt_init_hw(struct drm_device *dev)
 {
+	if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
+		struct drm_i915_private *dev_priv = dev->dev_private;
+
+		I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
+			   GEN9_TRTT_BYPASS_DISABLE);
+	}
+
 	/* In the case of execlists, PPGTT is enabled by the context descriptor
 	 * and the PDPs are contained within the context itself.  We don't
 	 * need to do anything here. */
@@ -3328,6 +3335,56 @@  i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,
 
 }
 
+void i915_gem_destroy_trtt_vma(struct i915_vma *vma)
+{
+	struct i915_address_space *vm = vma->vm;
+
+	WARN_ON(!list_empty(&vma->vma_link));
+	WARN_ON(!list_empty(&vma->mm_list));
+	WARN_ON(!list_empty(&vma->exec_list));
+
+	drm_mm_remove_node(&vma->node);
+	i915_ppgtt_put(i915_vm_to_ppgtt(vm));
+	kmem_cache_free(to_i915(vm->dev)->vmas, vma);
+}
+
+struct i915_vma *
+i915_gem_setup_trtt_vma(struct i915_address_space *vm)
+{
+	struct i915_vma *vma;
+	int ret;
+
+	vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL);
+	if (vma == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&vma->vma_link);
+	INIT_LIST_HEAD(&vma->mm_list);
+	INIT_LIST_HEAD(&vma->exec_list);
+	vma->vm = vm;
+	i915_ppgtt_get(i915_vm_to_ppgtt(vm));
+
+	/* Mark the vma as perennially pinned */
+	vma->pin_count = 1;
+
+	/* Reserve from the 48 bit PPGTT space */
+	vma->node.start = GEN9_TRTT_SEGMENT_START;
+	vma->node.size = GEN9_TRTT_SEGMENT_SIZE;
+	ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+	if (ret) {
+		ret = i915_gem_evict_for_vma(vma);
+		if (ret == 0)
+			ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+	}
+	if (ret) {
+		DRM_ERROR("Reservation for TRTT segment failed: %i\n", ret);
+		i915_gem_destroy_trtt_vma(vma);
+		return ERR_PTR(ret);
+	}
+
+	return vma;
+}
+
 static struct scatterlist *
 rotate_pages(dma_addr_t *in, unsigned int offset,
 	     unsigned int width, unsigned int height,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b448ad8..acb942d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -129,6 +129,10 @@  typedef uint64_t gen8_ppgtt_pml4e_t;
 #define GEN8_PPAT_ELLC_OVERRIDE		(0<<2)
 #define GEN8_PPAT(i, x)			((uint64_t) (x) << ((i) * 8))
 
+/* Lies at the top of 48 bit PPGTT space */
+#define GEN9_TRTT_SEGMENT_START		((1ULL << 48) - (1ULL << 44))
+#define GEN9_TRTT_SEGMENT_SIZE		(1ULL << 44)
+
 enum i915_ggtt_view_type {
 	I915_GGTT_VIEW_NORMAL = 0,
 	I915_GGTT_VIEW_ROTATED,
@@ -559,4 +563,6 @@  size_t
 i915_ggtt_view_size(struct drm_i915_gem_object *obj,
 		    const struct i915_ggtt_view *view);
 
+struct i915_vma *i915_gem_setup_trtt_vma(struct i915_address_space *vm);
+void i915_gem_destroy_trtt_vma(struct i915_vma *vma);
 #endif
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 007ae83..5859be6 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -186,6 +186,25 @@  static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define   GEN8_RPCS_EU_MIN_SHIFT	0
 #define   GEN8_RPCS_EU_MIN_MASK		(0xf << GEN8_RPCS_EU_MIN_SHIFT)
 
+#define GEN9_TR_CHICKEN_BIT_VECTOR	_MMIO(0x4DFC)
+#define   GEN9_TRTT_BYPASS_DISABLE	(1<<0)
+
+/* TRTT registers in the H/W Context */
+#define GEN9_TRTT_L3_POINTER_DW0	_MMIO(0x4DE0)
+#define GEN9_TRTT_L3_POINTER_DW1	_MMIO(0x4DE4)
+#define   GEN9_TRTT_L3_GFXADDR_MASK	0xFFFFFFFF0000
+
+#define GEN9_TRTT_NULL_TILE_REG		_MMIO(0x4DE8)
+#define GEN9_TRTT_INVD_TILE_REG		_MMIO(0x4DEC)
+
+#define GEN9_TRTT_VA_MASKDATA		_MMIO(0x4DF0)
+#define   GEN9_TRVA_MASK_VALUE		0xF0
+#define   GEN9_TRVA_DATA_VALUE		0xF
+
+#define GEN9_TRTT_TABLE_CONTROL		_MMIO(0x4DF4)
+#define   GEN9_TRTT_IN_GFX_VA_SPACE	(1<<1)
+#define   GEN9_TRTT_ENABLE		(1<<0)
+
 #define GAM_ECOCHK			_MMIO(0x4090)
 #define   BDW_DISABLE_HDC_INVALIDATION	(1<<25)
 #define   ECOCHK_SNB_BIT		(1<<10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8096c6a..a8b795d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -183,6 +183,12 @@ 
 #define CTX_LRI_HEADER_2		0x41
 #define CTX_R_PWR_CLK_STATE		0x42
 #define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
+#define CTX_TRTT_L3_PTR_DW0		0x202
+#define CTX_TRTT_L3_PTR_DW1		0x204
+#define CTX_TRTT_NULL_TILE		0x206
+#define CTX_TRTT_INVD_TILE		0x208
+#define CTX_TRTT_VA_MASKDATA		0x20A
+#define CTX_TRTT_TBL_CTL		0x20C
 
 #define GEN8_CTX_VALID (1<<0)
 #define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
@@ -228,6 +234,8 @@  enum {
 static int intel_lr_context_pin(struct drm_i915_gem_request *rq);
 static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *default_ctx_obj);
+static void populate_lr_context_trtt(struct intel_context *ctx,
+		uint32_t *reg_state);
 
 
 /**
@@ -390,6 +398,14 @@  static int execlists_update_context(struct drm_i915_gem_request *rq)
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
 	}
 
+	if (ring->id == RCS && rq->ctx->trtt_info.update_trtt_params) {
+		/* The same page of the context object also contain fields
+		 * related for TRTT setup.
+		 */
+		populate_lr_context_trtt(rq->ctx, reg_state);
+		rq->ctx->trtt_info.update_trtt_params = 0;
+	}
+
 	kunmap_atomic(reg_state);
 
 	return 0;
@@ -2247,6 +2263,31 @@  make_rpcs(struct drm_device *dev)
 	return rpcs;
 }
 
+static void
+populate_lr_context_trtt(struct intel_context *ctx, uint32_t *reg_state)
+{
+	unsigned long masked_l3_gfx_address =
+		ctx->trtt_info.l3_table_address & GEN9_TRTT_L3_GFXADDR_MASK;
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_L3_PTR_DW0, GEN9_TRTT_L3_POINTER_DW0,
+		       lower_32_bits(masked_l3_gfx_address));
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_L3_PTR_DW1, GEN9_TRTT_L3_POINTER_DW1,
+		       upper_32_bits(masked_l3_gfx_address));
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_NULL_TILE, GEN9_TRTT_NULL_TILE_REG,
+		       ctx->trtt_info.null_tile_val);
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_INVD_TILE, GEN9_TRTT_INVD_TILE_REG,
+		       ctx->trtt_info.invd_tile_val);
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_VA_MASKDATA, GEN9_TRTT_VA_MASKDATA,
+		       GEN9_TRVA_MASK_VALUE | GEN9_TRVA_DATA_VALUE);
+
+	ASSIGN_CTX_REG(reg_state, CTX_TRTT_TBL_CTL, GEN9_TRTT_TABLE_CONTROL,
+		       GEN9_TRTT_IN_GFX_VA_SPACE | GEN9_TRTT_ENABLE);
+}
+
 static int
 populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
 		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index acf2102..6d6f448 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -357,6 +357,7 @@  typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_GPU_RESET	 35
 #define I915_PARAM_HAS_RESOURCE_STREAMER 36
 #define I915_PARAM_HAS_EXEC_SOFTPIN	 37
+#define I915_PARAM_HAS_TRTT		 38
 
 typedef struct drm_i915_getparam {
 	__s32 param;
@@ -1140,7 +1141,14 @@  struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_BAN_PERIOD	0x1
 #define I915_CONTEXT_PARAM_NO_ZEROMAP	0x2
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
+#define I915_CONTEXT_PARAM_ENABLE_TRTT	0x4
 	__u64 value;
 };
 
+struct drm_i915_gem_context_trtt_param {
+	__u64 l3_table_address;
+	__u32 invd_tile_val;
+	__u32 null_tile_val;
+};
+
 #endif /* _UAPI_I915_DRM_H_ */