diff mbox series

[PATCHv3,7/7] drm/msm/a6xx: Add support for using system cache(LLC)

Message ID 449a6544b10f0035d191ac52283198343187c153.1593344120.git.saiprakash.ranjan@codeaurora.org (mailing list archive)
State Superseded
Headers show
Series System Cache support for GPU and required SMMU support | expand

Commit Message

Sai Prakash Ranjan June 29, 2020, 3:52 p.m. UTC
From: Sharat Masetty <smasetty@codeaurora.org>

The last level system cache can be partitioned to 32 different
slices of which GPU has two slices preallocated. One slice is
used for caching GPU buffers and the other slice is used for
caching the GPU SMMU pagetables. This talks to the core system
cache driver to acquire the slice handles, configure the SCID's
to those slices and activates and deactivates the slices upon
GPU power collapse and restore.

Some support from the IOMMU driver is also needed to make use
of the system cache. IOMMU_SYS_CACHE_ONLY is a buffer protection
flag which enables caching GPU data buffers in the system cache
with memory attributes such as outer cacheable, read-allocate,
write-allocate for buffers. The GPU then has the ability to
override a few cacheability parameters which it does to override
write-allocate to write-no-allocate as the GPU hardware does not
benefit much from it.

Similarly DOMAIN_ATTR_SYS_CACHE is another domain level attribute
used by the IOMMU driver to set the right attributes to cache the
hardware pagetables into the system cache.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
(sai: fix to set attr before device attach to IOMMU and rebase)
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 82 +++++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |  3 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 23 ++++++-
 drivers/gpu/drm/msm/msm_iommu.c         |  3 +
 drivers/gpu/drm/msm/msm_mmu.h           |  4 ++
 5 files changed, 114 insertions(+), 1 deletion(-)

Comments

Will Deacon July 3, 2020, 1:37 p.m. UTC | #1
On Mon, Jun 29, 2020 at 09:22:50PM +0530, Sai Prakash Ranjan wrote:
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index f455c597f76d..bd1d58229cc2 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -218,6 +218,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
>  		iova |= GENMASK_ULL(63, 49);
>  
>  
> +	if (mmu->features & MMU_FEATURE_USE_SYSTEM_CACHE)
> +		prot |= IOMMU_SYS_CACHE_ONLY;

Given that I think this is the only user of IOMMU_SYS_CACHE_ONLY, then it
looks like it should actually be a property on the domain because we never
need to configure it on a per-mapping basis within a domain, and therefore
it shouldn't be exposed by the IOMMU API as a prot flag.

Do you agree?

Will
Sai Prakash Ranjan July 3, 2020, 2:53 p.m. UTC | #2
Hi Will,

On 2020-07-03 19:07, Will Deacon wrote:
> On Mon, Jun 29, 2020 at 09:22:50PM +0530, Sai Prakash Ranjan wrote:
>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
>> b/drivers/gpu/drm/msm/msm_iommu.c
>> index f455c597f76d..bd1d58229cc2 100644
>> --- a/drivers/gpu/drm/msm/msm_iommu.c
>> +++ b/drivers/gpu/drm/msm/msm_iommu.c
>> @@ -218,6 +218,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, 
>> uint64_t iova,
>>  		iova |= GENMASK_ULL(63, 49);
>> 
>> 
>> +	if (mmu->features & MMU_FEATURE_USE_SYSTEM_CACHE)
>> +		prot |= IOMMU_SYS_CACHE_ONLY;
> 
> Given that I think this is the only user of IOMMU_SYS_CACHE_ONLY, then 
> it
> looks like it should actually be a property on the domain because we 
> never
> need to configure it on a per-mapping basis within a domain, and 
> therefore
> it shouldn't be exposed by the IOMMU API as a prot flag.
> 
> Do you agree?
> 

GPU being the only user is for now, but there are other clients which 
can use this.
Plus how do we set the memory attributes if we do not expose this as 
prot flag?

Thanks,
Sai
Will Deacon July 3, 2020, 3:40 p.m. UTC | #3
On Fri, Jul 03, 2020 at 08:23:07PM +0530, Sai Prakash Ranjan wrote:
> On 2020-07-03 19:07, Will Deacon wrote:
> > On Mon, Jun 29, 2020 at 09:22:50PM +0530, Sai Prakash Ranjan wrote:
> > > diff --git a/drivers/gpu/drm/msm/msm_iommu.c
> > > b/drivers/gpu/drm/msm/msm_iommu.c
> > > index f455c597f76d..bd1d58229cc2 100644
> > > --- a/drivers/gpu/drm/msm/msm_iommu.c
> > > +++ b/drivers/gpu/drm/msm/msm_iommu.c
> > > @@ -218,6 +218,9 @@ static int msm_iommu_map(struct msm_mmu *mmu,
> > > uint64_t iova,
> > >  		iova |= GENMASK_ULL(63, 49);
> > > 
> > > 
> > > +	if (mmu->features & MMU_FEATURE_USE_SYSTEM_CACHE)
> > > +		prot |= IOMMU_SYS_CACHE_ONLY;
> > 
> > Given that I think this is the only user of IOMMU_SYS_CACHE_ONLY, then
> > it
> > looks like it should actually be a property on the domain because we
> > never
> > need to configure it on a per-mapping basis within a domain, and
> > therefore
> > it shouldn't be exposed by the IOMMU API as a prot flag.
> > 
> > Do you agree?
> > 
> 
> GPU being the only user is for now, but there are other clients which can
> use this.
> Plus how do we set the memory attributes if we do not expose this as prot
> flag?

I just don't understand the need for it to be per-map operation. Put another
way, if we extended the domain attribute to apply to cacheable mappings
on the domain and not just the table walk, what would break?

Will
Rob Clark July 3, 2020, 4:04 p.m. UTC | #4
On Fri, Jul 3, 2020 at 7:53 AM Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> Hi Will,
>
> On 2020-07-03 19:07, Will Deacon wrote:
> > On Mon, Jun 29, 2020 at 09:22:50PM +0530, Sai Prakash Ranjan wrote:
> >> diff --git a/drivers/gpu/drm/msm/msm_iommu.c
> >> b/drivers/gpu/drm/msm/msm_iommu.c
> >> index f455c597f76d..bd1d58229cc2 100644
> >> --- a/drivers/gpu/drm/msm/msm_iommu.c
> >> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> >> @@ -218,6 +218,9 @@ static int msm_iommu_map(struct msm_mmu *mmu,
> >> uint64_t iova,
> >>              iova |= GENMASK_ULL(63, 49);
> >>
> >>
> >> +    if (mmu->features & MMU_FEATURE_USE_SYSTEM_CACHE)
> >> +            prot |= IOMMU_SYS_CACHE_ONLY;
> >
> > Given that I think this is the only user of IOMMU_SYS_CACHE_ONLY, then
> > it
> > looks like it should actually be a property on the domain because we
> > never
> > need to configure it on a per-mapping basis within a domain, and
> > therefore
> > it shouldn't be exposed by the IOMMU API as a prot flag.
> >
> > Do you agree?
> >
>
> GPU being the only user is for now, but there are other clients which
> can use this.
> Plus how do we set the memory attributes if we do not expose this as
> prot flag?

It does appear that the downstream kgsl driver sets this for basically
all mappings.. well there is some conditional stuff around
DOMAIN_ATTR_USE_LLC_NWA but it seems based on the property of the
domain.  (Jordan may know more about what that is about.)  But looks
like there are a lot of different paths into iommu_map in kgsl so I
might have missed something.

Assuming there isn't some case where we specifically don't want to use
the system cache for some mapping, I think it could be a domain
attribute that sets an io_pgtable_cfg::quirks flag

BR,
-R
Sai Prakash Ranjan July 3, 2020, 4:59 p.m. UTC | #5
On 2020-07-03 21:34, Rob Clark wrote:
> On Fri, Jul 3, 2020 at 7:53 AM Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
>> 
>> Hi Will,
>> 
>> On 2020-07-03 19:07, Will Deacon wrote:
>> > On Mon, Jun 29, 2020 at 09:22:50PM +0530, Sai Prakash Ranjan wrote:
>> >> diff --git a/drivers/gpu/drm/msm/msm_iommu.c
>> >> b/drivers/gpu/drm/msm/msm_iommu.c
>> >> index f455c597f76d..bd1d58229cc2 100644
>> >> --- a/drivers/gpu/drm/msm/msm_iommu.c
>> >> +++ b/drivers/gpu/drm/msm/msm_iommu.c
>> >> @@ -218,6 +218,9 @@ static int msm_iommu_map(struct msm_mmu *mmu,
>> >> uint64_t iova,
>> >>              iova |= GENMASK_ULL(63, 49);
>> >>
>> >>
>> >> +    if (mmu->features & MMU_FEATURE_USE_SYSTEM_CACHE)
>> >> +            prot |= IOMMU_SYS_CACHE_ONLY;
>> >
>> > Given that I think this is the only user of IOMMU_SYS_CACHE_ONLY, then
>> > it
>> > looks like it should actually be a property on the domain because we
>> > never
>> > need to configure it on a per-mapping basis within a domain, and
>> > therefore
>> > it shouldn't be exposed by the IOMMU API as a prot flag.
>> >
>> > Do you agree?
>> >
>> 
>> GPU being the only user is for now, but there are other clients which
>> can use this.
>> Plus how do we set the memory attributes if we do not expose this as
>> prot flag?
> 
> It does appear that the downstream kgsl driver sets this for basically
> all mappings.. well there is some conditional stuff around
> DOMAIN_ATTR_USE_LLC_NWA but it seems based on the property of the
> domain.  (Jordan may know more about what that is about.)  But looks
> like there are a lot of different paths into iommu_map in kgsl so I
> might have missed something.
> 
> Assuming there isn't some case where we specifically don't want to use
> the system cache for some mapping, I think it could be a domain
> attribute that sets an io_pgtable_cfg::quirks flag
> 

Ok then we are good to remove unused sys cache prot flag which Will has 
posted.

Thanks,
Sai
Jordan Crouse July 9, 2020, 4:13 p.m. UTC | #6
On Fri, Jul 03, 2020 at 09:04:49AM -0700, Rob Clark wrote:
> On Fri, Jul 3, 2020 at 7:53 AM Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
> >
> > Hi Will,
> >
> > On 2020-07-03 19:07, Will Deacon wrote:
> > > On Mon, Jun 29, 2020 at 09:22:50PM +0530, Sai Prakash Ranjan wrote:
> > >> diff --git a/drivers/gpu/drm/msm/msm_iommu.c
> > >> b/drivers/gpu/drm/msm/msm_iommu.c
> > >> index f455c597f76d..bd1d58229cc2 100644
> > >> --- a/drivers/gpu/drm/msm/msm_iommu.c
> > >> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> > >> @@ -218,6 +218,9 @@ static int msm_iommu_map(struct msm_mmu *mmu,
> > >> uint64_t iova,
> > >>              iova |= GENMASK_ULL(63, 49);
> > >>
> > >>
> > >> +    if (mmu->features & MMU_FEATURE_USE_SYSTEM_CACHE)
> > >> +            prot |= IOMMU_SYS_CACHE_ONLY;
> > >
> > > Given that I think this is the only user of IOMMU_SYS_CACHE_ONLY, then
> > > it
> > > looks like it should actually be a property on the domain because we
> > > never
> > > need to configure it on a per-mapping basis within a domain, and
> > > therefore
> > > it shouldn't be exposed by the IOMMU API as a prot flag.
> > >
> > > Do you agree?
> > >
> >
> > GPU being the only user is for now, but there are other clients which
> > can use this.
> > Plus how do we set the memory attributes if we do not expose this as
> > prot flag?
> 
> It does appear that the downstream kgsl driver sets this for basically
> all mappings.. well there is some conditional stuff around
> DOMAIN_ATTR_USE_LLC_NWA but it seems based on the property of the
> domain.  (Jordan may know more about what that is about.)  But looks
> like there are a lot of different paths into iommu_map in kgsl so I
> might have missed something.

Downstream does set it universally. There are some theoretical use cases
where it might be beneficial to set it on a per-mapping basis with a bunch
of hinting from userspace and nobody has tried to characterize this on real
hardware so it is not clear to me if it is worth it.

I think a domain wide attribute works for now but if a compelling per-mapping
use case does comes down the pipeline we need to have a backup in mind -
possibly a prot flag to disable NWA?

Jordan

> Assuming there isn't some case where we specifically don't want to use
> the system cache for some mapping, I think it could be a domain
> attribute that sets an io_pgtable_cfg::quirks flag
> 
> BR,
> -R
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 6bee70853ea8..c33cd2a588e6 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -9,6 +9,8 @@ 
 #include "a6xx_gmu.xml.h"
 
 #include <linux/devfreq.h>
+#include <linux/bitfield.h>
+#include <linux/soc/qcom/llcc-qcom.h>
 
 #define GPU_PAS_ID 13
 
@@ -808,6 +810,79 @@  static const u32 a6xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_CNTL, REG_A6XX_CP_RB_CNTL),
 };
 
+static void a6xx_llc_rmw(struct a6xx_gpu *a6xx_gpu, u32 reg, u32 mask, u32 or)
+{
+	return msm_rmw(a6xx_gpu->llc_mmio + (reg << 2), mask, or);
+}
+
+static void a6xx_llc_write(struct a6xx_gpu *a6xx_gpu, u32 reg, u32 value)
+{
+	return msm_writel(value, a6xx_gpu->llc_mmio + (reg << 2));
+}
+
+static void a6xx_llc_deactivate(struct a6xx_gpu *a6xx_gpu)
+{
+	llcc_slice_deactivate(a6xx_gpu->llc_slice);
+	llcc_slice_deactivate(a6xx_gpu->htw_llc_slice);
+}
+
+static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
+{
+	u32 cntl1_regval = 0;
+
+	if (IS_ERR(a6xx_gpu->llc_mmio))
+		return;
+
+	if (!llcc_slice_activate(a6xx_gpu->llc_slice)) {
+		u32 gpu_scid = llcc_get_slice_id(a6xx_gpu->llc_slice);
+
+		gpu_scid &= 0x1f;
+		cntl1_regval = (gpu_scid << 0) | (gpu_scid << 5) | (gpu_scid << 10) |
+			       (gpu_scid << 15) | (gpu_scid << 20);
+	}
+
+	if (!llcc_slice_activate(a6xx_gpu->htw_llc_slice)) {
+		u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice);
+
+		gpuhtw_scid &= 0x1f;
+		cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid);
+	}
+
+	if (cntl1_regval) {
+		/*
+		 * Program the slice IDs for the various GPU blocks and GPU MMU
+		 * pagetables
+		 */
+		a6xx_llc_write(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval);
+
+		/*
+		 * Program cacheability overrides to not allocate cache lines on
+		 * a write miss
+		 */
+		a6xx_llc_rmw(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03);
+	}
+}
+
+static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu)
+{
+	llcc_slice_putd(a6xx_gpu->llc_slice);
+	llcc_slice_putd(a6xx_gpu->htw_llc_slice);
+}
+
+static void a6xx_llc_slices_init(struct platform_device *pdev,
+		struct a6xx_gpu *a6xx_gpu)
+{
+	a6xx_gpu->llc_mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
+	if (IS_ERR(a6xx_gpu->llc_mmio))
+		return;
+
+	a6xx_gpu->llc_slice = llcc_slice_getd(LLCC_GPU);
+	a6xx_gpu->htw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
+
+	if (IS_ERR(a6xx_gpu->llc_slice) && IS_ERR(a6xx_gpu->htw_llc_slice))
+		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
+}
+
 static int a6xx_pm_resume(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -822,6 +897,8 @@  static int a6xx_pm_resume(struct msm_gpu *gpu)
 
 	msm_gpu_resume_devfreq(gpu);
 
+	a6xx_llc_activate(a6xx_gpu);
+
 	return 0;
 }
 
@@ -830,6 +907,8 @@  static int a6xx_pm_suspend(struct msm_gpu *gpu)
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 
+	a6xx_llc_deactivate(a6xx_gpu);
+
 	devfreq_suspend_device(gpu->devfreq.devfreq);
 
 	return a6xx_gmu_stop(a6xx_gpu);
@@ -868,6 +947,7 @@  static void a6xx_destroy(struct msm_gpu *gpu)
 		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
 	}
 
+	a6xx_llc_slices_destroy(a6xx_gpu);
 	a6xx_gmu_remove(a6xx_gpu);
 
 	adreno_gpu_cleanup(adreno_gpu);
@@ -962,6 +1042,8 @@  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = NULL;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;
 
+	a6xx_llc_slices_init(pdev, a6xx_gpu);
+
 	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 7239b8b60939..90043448fab1 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -21,6 +21,9 @@  struct a6xx_gpu {
 	struct msm_ringbuffer *cur_ring;
 
 	struct a6xx_gmu gmu;
+	void __iomem *llc_mmio;
+	void *llc_slice;
+	void *htw_llc_slice;
 };
 
 #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 3e717c1ebb7f..4666d2df8e65 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -190,10 +190,31 @@  adreno_iommu_create_address_space(struct msm_gpu *gpu,
 		struct platform_device *pdev)
 {
 	struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
-	struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, iommu);
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 	struct msm_gem_address_space *aspace;
+	struct msm_mmu *mmu;
 	u64 start, size;
 
+	/*
+	 * This allows GPU to set the bus attributes required to use system
+	 * cache on behalf of the iommu page table walker.
+	 */
+	if (!IS_ERR(a6xx_gpu->htw_llc_slice)) {
+		int gpu_htw_llc = 1;
+
+		iommu_domain_set_attr(iommu, DOMAIN_ATTR_SYS_CACHE, &gpu_htw_llc);
+	}
+
+	mmu = msm_iommu_new(&pdev->dev, iommu);
+	if (IS_ERR(mmu)) {
+		iommu_domain_free(iommu);
+		return ERR_CAST(mmu);
+	}
+
+	if (!IS_ERR(a6xx_gpu->llc_slice))
+		mmu->features |= MMU_FEATURE_USE_SYSTEM_CACHE;
+
 	/*
 	 * Use the aperture start or SZ_16M, whichever is greater. This will
 	 * ensure that we align with the allocated pagetable range while still
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index f455c597f76d..bd1d58229cc2 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -218,6 +218,9 @@  static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 		iova |= GENMASK_ULL(63, 49);
 
 
+	if (mmu->features & MMU_FEATURE_USE_SYSTEM_CACHE)
+		prot |= IOMMU_SYS_CACHE_ONLY;
+
 	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
 	WARN_ON(!ret);
 
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 61ade89d9e48..90965241e567 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -23,12 +23,16 @@  enum msm_mmu_type {
 	MSM_MMU_IOMMU_PAGETABLE,
 };
 
+/* MMU features */
+#define MMU_FEATURE_USE_SYSTEM_CACHE	BIT(0)
+
 struct msm_mmu {
 	const struct msm_mmu_funcs *funcs;
 	struct device *dev;
 	int (*handler)(void *arg, unsigned long iova, int flags);
 	void *arg;
 	enum msm_mmu_type type;
+	u32 features;
 };
 
 static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,