diff mbox series

[v2,1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE

Message ID 8df778355378127ea7eccc9521d6427e3e48d4f2.1600780574.git.robin.murphy@arm.com
State New
Headers show
Series drm: panfrost: Coherency support | expand

Commit Message

Robin Murphy Sept. 22, 2020, 2:16 p.m. UTC
Midgard GPUs have ACE-Lite master interfaces which allows systems to
integrate them in an I/O-coherent manner. It seems that from the GPU's
viewpoint, the rest of the system is its outer shareable domain, and so
even when snoop signals are wired up, they are only emitted for outer
shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
indeed get coherent pagetable walks working nicely for the coherent
T620 in the Arm Juno SoC.

Reviewed-by: Steven Price <steven.price@arm.com>
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/io-pgtable-arm.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Will Deacon Sept. 28, 2020, 2:59 p.m. UTC | #1
On Tue, Sep 22, 2020 at 03:16:48PM +0100, Robin Murphy wrote:
> Midgard GPUs have ACE-Lite master interfaces which allows systems to
> integrate them in an I/O-coherent manner. It seems that from the GPU's
> viewpoint, the rest of the system is its outer shareable domain, and so
> even when snoop signals are wired up, they are only emitted for outer
> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> indeed get coherent pagetable walks working nicely for the coherent
> T620 in the Arm Juno SoC.
> 
> Reviewed-by: Steven Price <steven.price@arm.com>
> Tested-by: Neil Armstrong <narmstrong@baylibre.com>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/io-pgtable-arm.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index dc7bcf858b6d..b4072a18e45d 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>  	}
>  
> -	if (prot & IOMMU_CACHE)
> +	/*
> +	 * Also Mali has its own notions of shareability wherein its Inner
> +	 * domain covers the cores within the GPU, and its Outer domain is
> +	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
> +	 * terms, depending on coherency).
> +	 */
> +	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
>  		pte |= ARM_LPAE_PTE_SH_IS;
>  	else
>  		pte |= ARM_LPAE_PTE_SH_OS;
> @@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>  	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
>  					  ARM_MALI_LPAE_TTBR_READ_INNER |
>  					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
> +	if (cfg->coherent_walk)
> +		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
> +

Acked-by: Will Deacon <will@kernel.org>

I'm assuming I'm not the right person to merge this, and it needs to go
alongside the other patches in this series.

Will
Boris Brezillon Oct. 5, 2020, 2:50 p.m. UTC | #2
On Tue, 22 Sep 2020 15:16:48 +0100
Robin Murphy <robin.murphy@arm.com> wrote:

> Midgard GPUs have ACE-Lite master interfaces which allows systems to
> integrate them in an I/O-coherent manner. It seems that from the GPU's
> viewpoint, the rest of the system is its outer shareable domain, and so
> even when snoop signals are wired up, they are only emitted for outer
> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> indeed get coherent pagetable walks working nicely for the coherent
> T620 in the Arm Juno SoC.
> 
> Reviewed-by: Steven Price <steven.price@arm.com>
> Tested-by: Neil Armstrong <narmstrong@baylibre.com>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/io-pgtable-arm.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index dc7bcf858b6d..b4072a18e45d 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>  	}
>  
> -	if (prot & IOMMU_CACHE)
> +	/*
> +	 * Also Mali has its own notions of shareability wherein its Inner
> +	 * domain covers the cores within the GPU, and its Outer domain is
> +	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
> +	 * terms, depending on coherency).
> +	 */
> +	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
>  		pte |= ARM_LPAE_PTE_SH_IS;
>  	else
>  		pte |= ARM_LPAE_PTE_SH_OS;

Actually, it still doesn't work on s922x :-/. For it to work I
correctly, I need to drop the outer shareable flag here.

> @@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>  	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
>  					  ARM_MALI_LPAE_TTBR_READ_INNER |
>  					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
> +	if (cfg->coherent_walk)
> +		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
> +
>  	return &data->iop;
>  
>  out_free_data:
Steven Price Oct. 5, 2020, 3:16 p.m. UTC | #3
On 05/10/2020 15:50, Boris Brezillon wrote:
> On Tue, 22 Sep 2020 15:16:48 +0100
> Robin Murphy <robin.murphy@arm.com> wrote:
> 
>> Midgard GPUs have ACE-Lite master interfaces which allows systems to
>> integrate them in an I/O-coherent manner. It seems that from the GPU's
>> viewpoint, the rest of the system is its outer shareable domain, and so
>> even when snoop signals are wired up, they are only emitted for outer
>> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
>> indeed get coherent pagetable walks working nicely for the coherent
>> T620 in the Arm Juno SoC.
>>
>> Reviewed-by: Steven Price <steven.price@arm.com>
>> Tested-by: Neil Armstrong <narmstrong@baylibre.com>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/iommu/io-pgtable-arm.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index dc7bcf858b6d..b4072a18e45d 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>>   				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>>   	}
>>   
>> -	if (prot & IOMMU_CACHE)
>> +	/*
>> +	 * Also Mali has its own notions of shareability wherein its Inner
>> +	 * domain covers the cores within the GPU, and its Outer domain is
>> +	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
>> +	 * terms, depending on coherency).
>> +	 */
>> +	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
>>   		pte |= ARM_LPAE_PTE_SH_IS;
>>   	else
>>   		pte |= ARM_LPAE_PTE_SH_OS;
> 
> Actually, it still doesn't work on s922x :-/. For it to work I
> correctly, I need to drop the outer shareable flag here.

The logic here does seem a bit odd. Originally it was:

IOMMU_CACHE -> Inner shared (value 3)
!IOMMU_CACHE -> Outer shared (value 2)

For Mali we're forcing everything to the second option. But Mali being 
Mali doesn't do things the same as LPAE, so for Mali we have:

0 - not shared
1 - reserved
2 - inner(*) and outer shareable
3 - inner shareable only

(*) where "inner" means internal to the GPU, and "outer" means shared 
with the CPU "inner". Very confusing!

So originally we had:
IOMMU_CACHE -> not shared with CPU (only internally in the GPU)
!IOMMU_CACHE -> shared with CPU

The change above gets us to "always shared", dropping the SH_OS bit 
would get us to not even shareable between cores (which doesn't sound 
like what we want).

It's not at all clear to me why the change helps, but I suspect we want 
at least "inner" shareable.

Steve

>> @@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>>   	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
>>   					  ARM_MALI_LPAE_TTBR_READ_INNER |
>>   					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
>> +	if (cfg->coherent_walk)
>> +		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
>> +
>>   	return &data->iop;
>>   
>>   out_free_data:
>
Boris Brezillon Oct. 5, 2020, 3:52 p.m. UTC | #4
On Mon, 5 Oct 2020 16:16:32 +0100
Steven Price <steven.price@arm.com> wrote:

> On 05/10/2020 15:50, Boris Brezillon wrote:
> > On Tue, 22 Sep 2020 15:16:48 +0100
> > Robin Murphy <robin.murphy@arm.com> wrote:
> >   
> >> Midgard GPUs have ACE-Lite master interfaces which allows systems to
> >> integrate them in an I/O-coherent manner. It seems that from the GPU's
> >> viewpoint, the rest of the system is its outer shareable domain, and so
> >> even when snoop signals are wired up, they are only emitted for outer
> >> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> >> indeed get coherent pagetable walks working nicely for the coherent
> >> T620 in the Arm Juno SoC.
> >>
> >> Reviewed-by: Steven Price <steven.price@arm.com>
> >> Tested-by: Neil Armstrong <narmstrong@baylibre.com>
> >> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >> ---
> >>   drivers/iommu/io-pgtable-arm.c | 11 ++++++++++-
> >>   1 file changed, 10 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> >> index dc7bcf858b6d..b4072a18e45d 100644
> >> --- a/drivers/iommu/io-pgtable-arm.c
> >> +++ b/drivers/iommu/io-pgtable-arm.c
> >> @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> >>   				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
> >>   	}
> >>   
> >> -	if (prot & IOMMU_CACHE)
> >> +	/*
> >> +	 * Also Mali has its own notions of shareability wherein its Inner
> >> +	 * domain covers the cores within the GPU, and its Outer domain is
> >> +	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
> >> +	 * terms, depending on coherency).
> >> +	 */
> >> +	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
> >>   		pte |= ARM_LPAE_PTE_SH_IS;
> >>   	else
> >>   		pte |= ARM_LPAE_PTE_SH_OS;  
> > 
> > Actually, it still doesn't work on s922x :-/. For it to work I
> > correctly, I need to drop the outer shareable flag here.  
> 
> The logic here does seem a bit odd. Originally it was:
> 
> IOMMU_CACHE -> Inner shared (value 3)
> !IOMMU_CACHE -> Outer shared (value 2)
> 
> For Mali we're forcing everything to the second option. But Mali being 
> Mali doesn't do things the same as LPAE, so for Mali we have:
> 
> 0 - not shared
> 1 - reserved
> 2 - inner(*) and outer shareable
> 3 - inner shareable only
> 
> (*) where "inner" means internal to the GPU, and "outer" means shared 
> with the CPU "inner". Very confusing!
> 
> So originally we had:
> IOMMU_CACHE -> not shared with CPU (only internally in the GPU)
> !IOMMU_CACHE -> shared with CPU
> 
> The change above gets us to "always shared", dropping the SH_OS bit 
> would get us to not even shareable between cores (which doesn't sound 
> like what we want).

Thanks for this explanation.

> 
> It's not at all clear to me why the change helps, but I suspect we want 
> at least "inner" shareable.

Right. Looks like all this was caused by a bad conflict resolution
during a rebase. Sorry for the noise :-/.
diff mbox series

Patch

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dc7bcf858b6d..b4072a18e45d 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -440,7 +440,13 @@  static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 	}
 
-	if (prot & IOMMU_CACHE)
+	/*
+	 * Also Mali has its own notions of shareability wherein its Inner
+	 * domain covers the cores within the GPU, and its Outer domain is
+	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
+	 * terms, depending on coherency).
+	 */
+	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
 		pte |= ARM_LPAE_PTE_SH_IS;
 	else
 		pte |= ARM_LPAE_PTE_SH_OS;
@@ -1049,6 +1055,9 @@  arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
 					  ARM_MALI_LPAE_TTBR_READ_INNER |
 					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
+	if (cfg->coherent_walk)
+		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
+
 	return &data->iop;
 
 out_free_data: