Message ID | 8df778355378127ea7eccc9521d6427e3e48d4f2.1600780574.git.robin.murphy@arm.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm: panfrost: Coherency support |

On Tue, Sep 22, 2020 at 03:16:48PM +0100, Robin Murphy wrote:
> Midgard GPUs have ACE-Lite master interfaces which allows systems to
> integrate them in an I/O-coherent manner. It seems that from the GPU's
> viewpoint, the rest of the system is its outer shareable domain, and so
> even when snoop signals are wired up, they are only emitted for outer
> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> indeed get coherent pagetable walks working nicely for the coherent
> T620 in the Arm Juno SoC.
>
> Reviewed-by: Steven Price <steven.price@arm.com>
> Tested-by: Neil Armstrong <narmstrong@baylibre.com>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/io-pgtable-arm.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index dc7bcf858b6d..b4072a18e45d 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>  	}
>
> -	if (prot & IOMMU_CACHE)
> +	/*
> +	 * Also Mali has its own notions of shareability wherein its Inner
> +	 * domain covers the cores within the GPU, and its Outer domain is
> +	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
> +	 * terms, depending on coherency).
> +	 */
> +	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
>  		pte |= ARM_LPAE_PTE_SH_IS;
>  	else
>  		pte |= ARM_LPAE_PTE_SH_OS;
> @@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>  	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
>  					  ARM_MALI_LPAE_TTBR_READ_INNER |
>  					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
> +	if (cfg->coherent_walk)
> +		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
> +

Acked-by: Will Deacon <will@kernel.org>

I'm assuming I'm not the right person to merge this, and it needs to go
alongside the other patches in this series.

Will

On Tue, 22 Sep 2020 15:16:48 +0100
Robin Murphy <robin.murphy@arm.com> wrote:

> Midgard GPUs have ACE-Lite master interfaces which allows systems to
> integrate them in an I/O-coherent manner. It seems that from the GPU's
> viewpoint, the rest of the system is its outer shareable domain, and so
> even when snoop signals are wired up, they are only emitted for outer
> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> indeed get coherent pagetable walks working nicely for the coherent
> T620 in the Arm Juno SoC.
>
> Reviewed-by: Steven Price <steven.price@arm.com>
> Tested-by: Neil Armstrong <narmstrong@baylibre.com>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/io-pgtable-arm.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index dc7bcf858b6d..b4072a18e45d 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>  	}
>
> -	if (prot & IOMMU_CACHE)
> +	/*
> +	 * Also Mali has its own notions of shareability wherein its Inner
> +	 * domain covers the cores within the GPU, and its Outer domain is
> +	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
> +	 * terms, depending on coherency).
> +	 */
> +	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
>  		pte |= ARM_LPAE_PTE_SH_IS;
>  	else
>  		pte |= ARM_LPAE_PTE_SH_OS;

Actually, it still doesn't work on s922x :-/. For it to work correctly,
I need to drop the outer shareable flag here.

> @@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>  	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
>  					  ARM_MALI_LPAE_TTBR_READ_INNER |
>  					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
> +	if (cfg->coherent_walk)
> +		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
> +
>  	return &data->iop;
>
>  out_free_data:

On 05/10/2020 15:50, Boris Brezillon wrote:
> On Tue, 22 Sep 2020 15:16:48 +0100
> Robin Murphy <robin.murphy@arm.com> wrote:
>
>> Midgard GPUs have ACE-Lite master interfaces which allows systems to
>> integrate them in an I/O-coherent manner. It seems that from the GPU's
>> viewpoint, the rest of the system is its outer shareable domain, and so
>> even when snoop signals are wired up, they are only emitted for outer
>> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
>> indeed get coherent pagetable walks working nicely for the coherent
>> T620 in the Arm Juno SoC.
>>
>> Reviewed-by: Steven Price <steven.price@arm.com>
>> Tested-by: Neil Armstrong <narmstrong@baylibre.com>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>  drivers/iommu/io-pgtable-arm.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index dc7bcf858b6d..b4072a18e45d 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>>  	}
>>
>> -	if (prot & IOMMU_CACHE)
>> +	/*
>> +	 * Also Mali has its own notions of shareability wherein its Inner
>> +	 * domain covers the cores within the GPU, and its Outer domain is
>> +	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
>> +	 * terms, depending on coherency).
>> +	 */
>> +	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
>>  		pte |= ARM_LPAE_PTE_SH_IS;
>>  	else
>>  		pte |= ARM_LPAE_PTE_SH_OS;
>
> Actually, it still doesn't work on s922x :-/. For it to work
> correctly, I need to drop the outer shareable flag here.

The logic here does seem a bit odd. Originally it was:

IOMMU_CACHE -> Inner shared (value 3)
!IOMMU_CACHE -> Outer shared (value 2)

For Mali we're forcing everything to the second option. But Mali being
Mali doesn't do things the same as LPAE, so for Mali we have:

0 - not shared
1 - reserved
2 - inner(*) and outer shareable
3 - inner shareable only

(*) where "inner" means internal to the GPU, and "outer" means shared
with the CPU "inner". Very confusing!

So originally we had:
IOMMU_CACHE -> not shared with CPU (only internally in the GPU)
!IOMMU_CACHE -> shared with CPU

The change above gets us to "always shared", dropping the SH_OS bit
would get us to not even shareable between cores (which doesn't sound
like what we want).

It's not at all clear to me why the change helps, but I suspect we want
at least "inner" shareable.

Steve

>> @@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>>  	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
>>  					  ARM_MALI_LPAE_TTBR_READ_INNER |
>>  					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
>> +	if (cfg->coherent_walk)
>> +		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
>> +
>>  	return &data->iop;
>>
>>  out_free_data:
>

On Mon, 5 Oct 2020 16:16:32 +0100
Steven Price <steven.price@arm.com> wrote:

> On 05/10/2020 15:50, Boris Brezillon wrote:
> > On Tue, 22 Sep 2020 15:16:48 +0100
> > Robin Murphy <robin.murphy@arm.com> wrote:
> >
> >> Midgard GPUs have ACE-Lite master interfaces which allows systems to
> >> integrate them in an I/O-coherent manner. It seems that from the GPU's
> >> viewpoint, the rest of the system is its outer shareable domain, and so
> >> even when snoop signals are wired up, they are only emitted for outer
> >> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> >> indeed get coherent pagetable walks working nicely for the coherent
> >> T620 in the Arm Juno SoC.
> >>
> >> Reviewed-by: Steven Price <steven.price@arm.com>
> >> Tested-by: Neil Armstrong <narmstrong@baylibre.com>
> >> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >> ---
> >>  drivers/iommu/io-pgtable-arm.c | 11 ++++++++++-
> >>  1 file changed, 10 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> >> index dc7bcf858b6d..b4072a18e45d 100644
> >> --- a/drivers/iommu/io-pgtable-arm.c
> >> +++ b/drivers/iommu/io-pgtable-arm.c
> >> @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> >>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
> >>  	}
> >>
> >> -	if (prot & IOMMU_CACHE)
> >> +	/*
> >> +	 * Also Mali has its own notions of shareability wherein its Inner
> >> +	 * domain covers the cores within the GPU, and its Outer domain is
> >> +	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
> >> +	 * terms, depending on coherency).
> >> +	 */
> >> +	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
> >>  		pte |= ARM_LPAE_PTE_SH_IS;
> >>  	else
> >>  		pte |= ARM_LPAE_PTE_SH_OS;
> >
> > Actually, it still doesn't work on s922x :-/. For it to work
> > correctly, I need to drop the outer shareable flag here.
>
> The logic here does seem a bit odd. Originally it was:
>
> IOMMU_CACHE -> Inner shared (value 3)
> !IOMMU_CACHE -> Outer shared (value 2)
>
> For Mali we're forcing everything to the second option. But Mali being
> Mali doesn't do things the same as LPAE, so for Mali we have:
>
> 0 - not shared
> 1 - reserved
> 2 - inner(*) and outer shareable
> 3 - inner shareable only
>
> (*) where "inner" means internal to the GPU, and "outer" means shared
> with the CPU "inner". Very confusing!
>
> So originally we had:
> IOMMU_CACHE -> not shared with CPU (only internally in the GPU)
> !IOMMU_CACHE -> shared with CPU
>
> The change above gets us to "always shared", dropping the SH_OS bit
> would get us to not even shareable between cores (which doesn't sound
> like what we want).

Thanks for this explanation.

>
> It's not at all clear to me why the change helps, but I suspect we want
> at least "inner" shareable.

Right. Looks like all this was caused by a bad conflict resolution
during a rebase. Sorry for the noise :-/.

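The two readings of the 2-bit shareability field discussed in this thread can be restated as a small lookup. This is only an illustrative sketch: the names `pt_format`, `sh_describe` and `mali_shared_with_cpu` are hypothetical and do not exist in the kernel, and the Mali descriptions are taken from Steven's table above rather than from any hardware documentation.

```c
#include <stdbool.h>

/* Hypothetical format tag for illustration; not the kernel's enum. */
enum pt_format { PT_ARM_LPAE, PT_MALI_LPAE };

/*
 * Describe a 2-bit SH field value under each page-table format.
 * The Mali column follows the table given in this thread.
 */
const char *sh_describe(enum pt_format fmt, unsigned int sh)
{
	static const char *const arm[4] = {
		"non-shareable", "reserved",
		"outer shareable", "inner shareable",
	};
	static const char *const mali[4] = {
		"not shared", "reserved",
		"inner and outer shareable", "inner shareable only",
	};

	sh &= 3u;	/* only the low two bits are meaningful */
	return fmt == PT_MALI_LPAE ? mali[sh] : arm[sh];
}

/*
 * Per the thread, only value 2 makes a Mali mapping shareable with
 * the CPU; value 3 is shareable among the GPU's own cores only.
 */
bool mali_shared_with_cpu(unsigned int sh)
{
	return (sh & 3u) == 2u;
}
```

Under these assumptions, `sh_describe(PT_ARM_LPAE, 3)` and `sh_describe(PT_MALI_LPAE, 3)` give different answers for the same bits, which is the source of the confusion described above.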
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dc7bcf858b6d..b4072a18e45d 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 	}
 
-	if (prot & IOMMU_CACHE)
+	/*
+	 * Also Mali has its own notions of shareability wherein its Inner
+	 * domain covers the cores within the GPU, and its Outer domain is
+	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
+	 * terms, depending on coherency).
+	 */
+	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
 		pte |= ARM_LPAE_PTE_SH_IS;
 	else
 		pte |= ARM_LPAE_PTE_SH_OS;
@@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
 					  ARM_MALI_LPAE_TTBR_READ_INNER |
 					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
+	if (cfg->coherent_walk)
+		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
+
 	return &data->iop;
 
 out_free_data:
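
The effect of the first hunk on the SH field choice can be modelled in isolation. The following is a simplified sketch, not the kernel code: `sh_before` and `sh_after` are hypothetical helpers, the `cacheable` flag stands in for `prot & IOMMU_CACHE`, `is_mali` stands in for `data->iop.fmt == ARM_MALI_LPAE`, and the constants are the unshifted field values 2 and 3 quoted in the thread.

```c
#include <stdbool.h>

/* Unshifted SH field values as quoted in the thread (assumed here). */
#define SH_OS 2u	/* ARM LPAE: outer shareable */
#define SH_IS 3u	/* ARM LPAE: inner shareable */

/* Selection before the patch: cacheable mappings get SH_IS. */
unsigned int sh_before(bool cacheable)
{
	return cacheable ? SH_IS : SH_OS;
}

/*
 * Selection after the patch: the Mali format always gets SH_OS,
 * which on Mali means shareable with the CPU as well as within
 * the GPU; other formats keep the old behaviour.
 */
unsigned int sh_after(bool is_mali, bool cacheable)
{
	if (cacheable && !is_mali)
		return SH_IS;
	return SH_OS;
}
```

With this model, a cacheable Mali mapping moves from value 3 to value 2, while non-Mali formats are unchanged, which matches the "always shared" outcome described in the review.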