mbox series

[0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

Message ID cover.1610372717.git.saiprakash.ranjan@codeaurora.org (mailing list archive)
Headers show
Series iommu/drm/msm: Allow non-coherent masters to use system cache | expand

Message

Sai Prakash Ranjan Jan. 11, 2021, 2:15 p.m. UTC
commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
the memory type setting required for the non-coherent masters to use
system cache. Now that system cache support for GPU is added, we will
need to set the right PTE attribute for GPU buffers to be sys cached.
Without this, the system cache lines are not allocated for GPU.

So the patches in this series introduces a new prot flag IOMMU_LLC,
renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
and makes GPU the user of this protection flag.

The series slightly depends on following 2 patches posted earlier and
is based on msm-next branch:
 * https://lore.kernel.org/patchwork/patch/1363008/
 * https://lore.kernel.org/patchwork/patch/1363010/

Sai Prakash Ranjan (3):
  iommu/io-pgtable: Rename last-level cache quirk to
    IO_PGTABLE_QUIRK_PTW_LLC
  iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 3 +++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
 drivers/gpu/drm/msm/msm_iommu.c         | 3 +++
 drivers/gpu/drm/msm/msm_mmu.h           | 4 ++++
 drivers/iommu/io-pgtable-arm.c          | 9 ++++++---
 include/linux/io-pgtable.h              | 6 +++---
 include/linux/iommu.h                   | 6 ++++++
 7 files changed, 26 insertions(+), 7 deletions(-)


base-commit: 00fd44a1a4700718d5d962432b55c09820f7e709

Comments

Sai Prakash Ranjan Jan. 20, 2021, 5:18 a.m. UTC | #1
On 2021-01-11 19:45, Sai Prakash Ranjan wrote:
> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> the memory type setting required for the non-coherent masters to use
> system cache. Now that system cache support for GPU is added, we will
> need to set the right PTE attribute for GPU buffers to be sys cached.
> Without this, the system cache lines are not allocated for GPU.
> 
> So the patches in this series introduces a new prot flag IOMMU_LLC,
> renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> and makes GPU the user of this protection flag.
> 
> The series slightly depends on following 2 patches posted earlier and
> is based on msm-next branch:
>  * https://lore.kernel.org/patchwork/patch/1363008/
>  * https://lore.kernel.org/patchwork/patch/1363010/
> 
> Sai Prakash Ranjan (3):
>   iommu/io-pgtable: Rename last-level cache quirk to
>     IO_PGTABLE_QUIRK_PTW_LLC
>   iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
>   drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers
> 
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 3 +++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
>  drivers/gpu/drm/msm/msm_iommu.c         | 3 +++
>  drivers/gpu/drm/msm/msm_mmu.h           | 4 ++++
>  drivers/iommu/io-pgtable-arm.c          | 9 ++++++---
>  include/linux/io-pgtable.h              | 6 +++---
>  include/linux/iommu.h                   | 6 ++++++
>  7 files changed, 26 insertions(+), 7 deletions(-)
> 
> 
> base-commit: 00fd44a1a4700718d5d962432b55c09820f7e709


Gentle Ping!

Thanks,
Sai
Sai Prakash Ranjan Jan. 29, 2021, 8:53 a.m. UTC | #2
On 2021-01-20 10:48, Sai Prakash Ranjan wrote:
> On 2021-01-11 19:45, Sai Prakash Ranjan wrote:
>> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> the memory type setting required for the non-coherent masters to use
>> system cache. Now that system cache support for GPU is added, we will
>> need to set the right PTE attribute for GPU buffers to be sys cached.
>> Without this, the system cache lines are not allocated for GPU.
>> 
>> So the patches in this series introduces a new prot flag IOMMU_LLC,
>> renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>> and makes GPU the user of this protection flag.
>> 
>> The series slightly depends on following 2 patches posted earlier and
>> is based on msm-next branch:
>>  * https://lore.kernel.org/patchwork/patch/1363008/
>>  * https://lore.kernel.org/patchwork/patch/1363010/
>> 
>> Sai Prakash Ranjan (3):
>>   iommu/io-pgtable: Rename last-level cache quirk to
>>     IO_PGTABLE_QUIRK_PTW_LLC
>>   iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
>>   drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers
>> 
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 3 +++
>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
>>  drivers/gpu/drm/msm/msm_iommu.c         | 3 +++
>>  drivers/gpu/drm/msm/msm_mmu.h           | 4 ++++
>>  drivers/iommu/io-pgtable-arm.c          | 9 ++++++---
>>  include/linux/io-pgtable.h              | 6 +++---
>>  include/linux/iommu.h                   | 6 ++++++
>>  7 files changed, 26 insertions(+), 7 deletions(-)
>> 
>> 
>> base-commit: 00fd44a1a4700718d5d962432b55c09820f7e709
> 
> 
> Gentle Ping!
> 

Gentle Ping!!

Thanks,
Sai
Georgi Djakov July 28, 2021, 2 p.m. UTC | #3
On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> the memory type setting required for the non-coherent masters to use
> system cache. Now that system cache support for GPU is added, we will
> need to set the right PTE attribute for GPU buffers to be sys cached.
> Without this, the system cache lines are not allocated for GPU.
> 
> So the patches in this series introduces a new prot flag IOMMU_LLC,
> renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> and makes GPU the user of this protection flag.

Hi Sai,

Thank you for the patchset! Are you planning to refresh it, as it does
not apply anymore?

Thanks,
Georgi

> 
> The series slightly depends on following 2 patches posted earlier and
> is based on msm-next branch:
>  * https://lore.kernel.org/patchwork/patch/1363008/
>  * https://lore.kernel.org/patchwork/patch/1363010/
> 
> Sai Prakash Ranjan (3):
>   iommu/io-pgtable: Rename last-level cache quirk to
>     IO_PGTABLE_QUIRK_PTW_LLC
>   iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
>   drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers
> 
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 3 +++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
>  drivers/gpu/drm/msm/msm_iommu.c         | 3 +++
>  drivers/gpu/drm/msm/msm_mmu.h           | 4 ++++
>  drivers/iommu/io-pgtable-arm.c          | 9 ++++++---
>  include/linux/io-pgtable.h              | 6 +++---
>  include/linux/iommu.h                   | 6 ++++++
>  7 files changed, 26 insertions(+), 7 deletions(-)
> 
> 
> base-commit: 00fd44a1a4700718d5d962432b55c09820f7e709
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
>
Sai Prakash Ranjan July 29, 2021, 4:38 a.m. UTC | #4
Hi Georgi,

On 2021-07-28 19:30, Georgi Djakov wrote:
> On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
>> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> the memory type setting required for the non-coherent masters to use
>> system cache. Now that system cache support for GPU is added, we will
>> need to set the right PTE attribute for GPU buffers to be sys cached.
>> Without this, the system cache lines are not allocated for GPU.
>> 
>> So the patches in this series introduces a new prot flag IOMMU_LLC,
>> renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>> and makes GPU the user of this protection flag.
> 
> Hi Sai,
> 
> Thank you for the patchset! Are you planning to refresh it, as it does
> not apply anymore?
> 

I was waiting on Will's reply [1]. If there are no changes needed, then
I can repost the patch.

[1] 
https://lore.kernel.org/lkml/21239ba603d0bdc4e4c696588a905f88@codeaurora.org/

Thanks,
Sai

> 
>> 
>> The series slightly depends on following 2 patches posted earlier and
>> is based on msm-next branch:
>>  * https://lore.kernel.org/patchwork/patch/1363008/
>>  * https://lore.kernel.org/patchwork/patch/1363010/
>> 
>> Sai Prakash Ranjan (3):
>>   iommu/io-pgtable: Rename last-level cache quirk to
>>     IO_PGTABLE_QUIRK_PTW_LLC
>>   iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
>>   drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers
>> 
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 3 +++
>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
>>  drivers/gpu/drm/msm/msm_iommu.c         | 3 +++
>>  drivers/gpu/drm/msm/msm_mmu.h           | 4 ++++
>>  drivers/iommu/io-pgtable-arm.c          | 9 ++++++---
>>  include/linux/io-pgtable.h              | 6 +++---
>>  include/linux/iommu.h                   | 6 ++++++
>>  7 files changed, 26 insertions(+), 7 deletions(-)
>> 
>> 
>> base-commit: 00fd44a1a4700718d5d962432b55c09820f7e709
>> --
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
>> member
>> of Code Aurora Forum, hosted by The Linux Foundation
>>
Will Deacon Aug. 2, 2021, 10:55 a.m. UTC | #5
On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> On 2021-07-28 19:30, Georgi Djakov wrote:
> > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > the memory type setting required for the non-coherent masters to use
> > > system cache. Now that system cache support for GPU is added, we will
> > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > Without this, the system cache lines are not allocated for GPU.
> > > 
> > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > and makes GPU the user of this protection flag.
> > 
> > Thank you for the patchset! Are you planning to refresh it, as it does
> > not apply anymore?
> > 
> 
> I was waiting on Will's reply [1]. If there are no changes needed, then
> I can repost the patch.

I still think you need to handle the mismatched alias, no? You're adding
a new memory type to the SMMU which doesn't exist on the CPU side. That
can't be right.

Will
Rob Clark Aug. 2, 2021, 3:08 p.m. UTC | #6
On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
>
> On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > the memory type setting required for the non-coherent masters to use
> > > > system cache. Now that system cache support for GPU is added, we will
> > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > Without this, the system cache lines are not allocated for GPU.
> > > >
> > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > and makes GPU the user of this protection flag.
> > >
> > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > not apply anymore?
> > >
> >
> > I was waiting on Will's reply [1]. If there are no changes needed, then
> > I can repost the patch.
>
> I still think you need to handle the mismatched alias, no? You're adding
> a new memory type to the SMMU which doesn't exist on the CPU side. That
> can't be right.
>

Just curious, and maybe this is a dumb question, but what is your
concern about mismatched aliases?  I mean the cache hierarchy on the
GPU device side (anything beyond the LLC) is pretty different and
doesn't really care about the smmu pgtable attributes..

BR,
-R
Will Deacon Aug. 2, 2021, 3:14 p.m. UTC | #7
On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > the memory type setting required for the non-coherent masters to use
> > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > Without this, the system cache lines are not allocated for GPU.
> > > > >
> > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > and makes GPU the user of this protection flag.
> > > >
> > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > not apply anymore?
> > > >
> > >
> > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > I can repost the patch.
> >
> > I still think you need to handle the mismatched alias, no? You're adding
> > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > can't be right.
> >
> 
> Just curious, and maybe this is a dumb question, but what is your
> concern about mismatched aliases?  I mean the cache hierarchy on the
> GPU device side (anything beyond the LLC) is pretty different and
> doesn't really care about the smmu pgtable attributes..

If the CPU accesses a shared buffer with different attributes to those which
the device is using then you fall into the "mismatched memory attributes"
part of the Arm architecture. It's reasonably unforgiving (you should go and
read it) and in some cases can apply to speculative accesses as well, but
the end result is typically loss of coherency.

Will
Rob Clark Aug. 3, 2021, 1:36 a.m. UTC | #8
On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > >
> > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > and makes GPU the user of this protection flag.
> > > > >
> > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > not apply anymore?
> > > > >
> > > >
> > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > I can repost the patch.
> > >
> > > I still think you need to handle the mismatched alias, no? You're adding
> > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > can't be right.
> > >
> >
> > Just curious, and maybe this is a dumb question, but what is your
> > concern about mismatched aliases?  I mean the cache hierarchy on the
> > GPU device side (anything beyond the LLC) is pretty different and
> > doesn't really care about the smmu pgtable attributes..
>
> If the CPU accesses a shared buffer with different attributes to those which
> the device is using then you fall into the "mismatched memory attributes"
> part of the Arm architecture. It's reasonably unforgiving (you should go and
> read it) and in some cases can apply to speculative accesses as well, but
> the end result is typically loss of coherency.

Ok, I might have a few other sections to read first to decipher the
terminology..

But my understanding of LLC is that it looks just like system memory
to the CPU and GPU (I think that would make it "the point of
coherence" between the GPU and CPU?)  If that is true, shouldn't it be
invisible from the point of view of different CPU mapping options?

BR,
-R
Will Deacon Aug. 9, 2021, 2:56 p.m. UTC | #9
On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > > >
> > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > >
> > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > and makes GPU the user of this protection flag.
> > > > > >
> > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > > not apply anymore?
> > > > > >
> > > > >
> > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > > I can repost the patch.
> > > >
> > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > can't be right.
> > > >
> > >
> > > Just curious, and maybe this is a dumb question, but what is your
> > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > GPU device side (anything beyond the LLC) is pretty different and
> > > doesn't really care about the smmu pgtable attributes..
> >
> > If the CPU accesses a shared buffer with different attributes to those which
> > the device is using then you fall into the "mismatched memory attributes"
> > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > read it) and in some cases can apply to speculative accesses as well, but
> > the end result is typically loss of coherency.
> 
> Ok, I might have a few other sections to read first to decipher the
> terminology..
> 
> But my understanding of LLC is that it looks just like system memory
> to the CPU and GPU (I think that would make it "the point of
> coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> invisible from the point of view of different CPU mapping options?

You could certainly build a system where mismatched attributes don't cause
loss of coherence, but as it's not guaranteed by the architecture and the
changes proposed here affect APIs which are exposed across SoCs, then I
don't think it helps much.

Will
Rob Clark Aug. 9, 2021, 4:57 p.m. UTC | #10
On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > > >
> > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > and makes GPU the user of this protection flag.
> > > > > > >
> > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > > > not apply anymore?
> > > > > > >
> > > > > >
> > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > > > I can repost the patch.
> > > > >
> > > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > > can't be right.
> > > > >
> > > >
> > > > Just curious, and maybe this is a dumb question, but what is your
> > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > doesn't really care about the smmu pgtable attributes..
> > >
> > > If the CPU accesses a shared buffer with different attributes to those which
> > > the device is using then you fall into the "mismatched memory attributes"
> > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > > read it) and in some cases can apply to speculative accesses as well, but
> > > the end result is typically loss of coherency.
> >
> > Ok, I might have a few other sections to read first to decipher the
> > terminology..
> >
> > But my understanding of LLC is that it looks just like system memory
> > to the CPU and GPU (I think that would make it "the point of
> > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > invisible from the point of view of different CPU mapping options?
>
> You could certainly build a system where mismatched attributes don't cause
> loss of coherence, but as it's not guaranteed by the architecture and the
> changes proposed here affect APIs which are exposed across SoCs, then I
> don't think it helps much.
>

Hmm, the description of the new mapping flag is that it applies only
to transparent outer level cache:

+/*
+ * Non-coherent masters can use this page protection flag to set cacheable
+ * memory attributes for only a transparent outer level of cache, also known as
+ * the last-level or system cache.
+ */
+#define IOMMU_LLC      (1 << 6)

But I suppose we could call it instead IOMMU_QCOM_LLC or something
like that to make it more clear that it is not necessarily something
that would work with a different outer level cache implementation?

BR,
-R
Will Deacon Aug. 9, 2021, 5:05 p.m. UTC | #11
On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
> > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > > > >
> > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > >
> > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > > > > not apply anymore?
> > > > > > > >
> > > > > > >
> > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > > > > I can repost the patch.
> > > > > >
> > > > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > > > can't be right.
> > > > > >
> > > > >
> > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > doesn't really care about the smmu pgtable attributes..
> > > >
> > > > If the CPU accesses a shared buffer with different attributes to those which
> > > > the device is using then you fall into the "mismatched memory attributes"
> > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > > > read it) and in some cases can apply to speculative accesses as well, but
> > > > the end result is typically loss of coherency.
> > >
> > > Ok, I might have a few other sections to read first to decipher the
> > > terminology..
> > >
> > > But my understanding of LLC is that it looks just like system memory
> > > to the CPU and GPU (I think that would make it "the point of
> > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > invisible from the point of view of different CPU mapping options?
> >
> > You could certainly build a system where mismatched attributes don't cause
> > loss of coherence, but as it's not guaranteed by the architecture and the
> > changes proposed here affect APIs which are exposed across SoCs, then I
> > don't think it helps much.
> >
> 
> Hmm, the description of the new mapping flag is that it applies only
> to transparent outer level cache:
> 
> +/*
> + * Non-coherent masters can use this page protection flag to set cacheable
> + * memory attributes for only a transparent outer level of cache, also known as
> + * the last-level or system cache.
> + */
> +#define IOMMU_LLC      (1 << 6)
> 
> But I suppose we could call it instead IOMMU_QCOM_LLC or something
> like that to make it more clear that it is not necessarily something
> that would work with a different outer level cache implementation?

... or we could just deal with the problem so that other people can reuse
the code. I haven't really understood the reluctance to solve this properly.

Am I missing some reason this isn't solvable?

Will
Rob Clark Aug. 9, 2021, 5:18 p.m. UTC | #12
On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
> > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > > > > >
> > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > > >
> > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > > > > > not apply anymore?
> > > > > > > > >
> > > > > > > >
> > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > > > > > I can repost the patch.
> > > > > > >
> > > > > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > > > > can't be right.
> > > > > > >
> > > > > >
> > > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > > doesn't really care about the smmu pgtable attributes..
> > > > >
> > > > > If the CPU accesses a shared buffer with different attributes to those which
> > > > > the device is using then you fall into the "mismatched memory attributes"
> > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > > > > read it) and in some cases can apply to speculative accesses as well, but
> > > > > the end result is typically loss of coherency.
> > > >
> > > > Ok, I might have a few other sections to read first to decipher the
> > > > terminology..
> > > >
> > > > But my understanding of LLC is that it looks just like system memory
> > > > to the CPU and GPU (I think that would make it "the point of
> > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > > invisible from the point of view of different CPU mapping options?
> > >
> > > You could certainly build a system where mismatched attributes don't cause
> > > loss of coherence, but as it's not guaranteed by the architecture and the
> > > changes proposed here affect APIs which are exposed across SoCs, then I
> > > don't think it helps much.
> > >
> >
> > Hmm, the description of the new mapping flag is that it applies only
> > to transparent outer level cache:
> >
> > +/*
> > + * Non-coherent masters can use this page protection flag to set cacheable
> > + * memory attributes for only a transparent outer level of cache, also known as
> > + * the last-level or system cache.
> > + */
> > +#define IOMMU_LLC      (1 << 6)
> >
> > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > like that to make it more clear that it is not necessarily something
> > that would work with a different outer level cache implementation?
>
> ... or we could just deal with the problem so that other people can reuse
> the code. I haven't really understood the reluctance to solve this properly.
>
> Am I missing some reason this isn't solvable?
>

Oh, was there another way to solve it (other than foregoing setting
INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
corresponding setting on the MMU pgtables side of things?

BR,
-R
Will Deacon Aug. 9, 2021, 5:40 p.m. UTC | #13
On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > > > > > >
> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > > > >
> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > > > > > > not apply anymore?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > > > > > > I can repost the patch.
> > > > > > > >
> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > > > > > can't be right.
> > > > > > > >
> > > > > > >
> > > > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > > > doesn't really care about the smmu pgtable attributes..
> > > > > >
> > > > > > If the CPU accesses a shared buffer with different attributes to those which
> > > > > > the device is using then you fall into the "mismatched memory attributes"
> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > > > > > read it) and in some cases can apply to speculative accesses as well, but
> > > > > > the end result is typically loss of coherency.
> > > > >
> > > > > Ok, I might have a few other sections to read first to decipher the
> > > > > terminology..
> > > > >
> > > > > But my understanding of LLC is that it looks just like system memory
> > > > > to the CPU and GPU (I think that would make it "the point of
> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > > > invisible from the point of view of different CPU mapping options?
> > > >
> > > > You could certainly build a system where mismatched attributes don't cause
> > > > loss of coherence, but as it's not guaranteed by the architecture and the
> > > > changes proposed here affect APIs which are exposed across SoCs, then I
> > > > don't think it helps much.
> > > >
> > >
> > > Hmm, the description of the new mapping flag is that it applies only
> > > to transparent outer level cache:
> > >
> > > +/*
> > > + * Non-coherent masters can use this page protection flag to set cacheable
> > > + * memory attributes for only a transparent outer level of cache, also known as
> > > + * the last-level or system cache.
> > > + */
> > > +#define IOMMU_LLC      (1 << 6)
> > >
> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > > like that to make it more clear that it is not necessarily something
> > > that would work with a different outer level cache implementation?
> >
> > ... or we could just deal with the problem so that other people can reuse
> > the code. I haven't really understood the reluctance to solve this properly.
> >
> > Am I missing some reason this isn't solvable?
> 
> Oh, was there another way to solve it (other than foregoing setting
> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
> corresponding setting on the MMU pgtables side of things?

Right -- we just need to program the CPU's MMU with the matching memory
attributes! It's a bit more fiddly if you're just using ioremap_wc()
though, as it's usually the DMA API which handles the attributes under the
hood.

Anyway, sorry, I should've said that explicitly earlier on. We've done this
sort of thing in the Android tree so I assumed Sai knew what needed to be
done and then I didn't think to explain to you :(

Will
Sai Prakash Ranjan Aug. 9, 2021, 5:47 p.m. UTC | #14
On 2021-08-09 23:10, Will Deacon wrote:
> On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
>> >
>> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
>> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
>> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
>> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
>> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
>> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
>> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
>> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
>> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
>> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
>> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
>> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
>> > > > > > > > > > >
>> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
>> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>> > > > > > > > > > > and makes GPU the user of this protection flag.
>> > > > > > > > > >
>> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
>> > > > > > > > > > not apply anymore?
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
>> > > > > > > > > I can repost the patch.
>> > > > > > > >
>> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
>> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
>> > > > > > > > can't be right.
>> > > > > > > >
>> > > > > > >
>> > > > > > > Just curious, and maybe this is a dumb question, but what is your
>> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
>> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
>> > > > > > > doesn't really care about the smmu pgtable attributes..
>> > > > > >
>> > > > > > If the CPU accesses a shared buffer with different attributes to those which
>> > > > > > the device is using then you fall into the "mismatched memory attributes"
>> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
>> > > > > > read it) and in some cases can apply to speculative accesses as well, but
>> > > > > > the end result is typically loss of coherency.
>> > > > >
>> > > > > Ok, I might have a few other sections to read first to decipher the
>> > > > > terminology..
>> > > > >
>> > > > > But my understanding of LLC is that it looks just like system memory
>> > > > > to the CPU and GPU (I think that would make it "the point of
>> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
>> > > > > invisible from the point of view of different CPU mapping options?
>> > > >
>> > > > You could certainly build a system where mismatched attributes don't cause
>> > > > loss of coherence, but as it's not guaranteed by the architecture and the
>> > > > changes proposed here affect APIs which are exposed across SoCs, then I
>> > > > don't think it helps much.
>> > > >
>> > >
>> > > Hmm, the description of the new mapping flag is that it applies only
>> > > to transparent outer level cache:
>> > >
>> > > +/*
>> > > + * Non-coherent masters can use this page protection flag to set cacheable
>> > > + * memory attributes for only a transparent outer level of cache, also known as
>> > > + * the last-level or system cache.
>> > > + */
>> > > +#define IOMMU_LLC      (1 << 6)
>> > >
>> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> > > like that to make it more clear that it is not necessarily something
>> > > that would work with a different outer level cache implementation?
>> >
>> > ... or we could just deal with the problem so that other people can reuse
>> > the code. I haven't really understood the reluctance to solve this properly.
>> >
>> > Am I missing some reason this isn't solvable?
>> 
>> Oh, was there another way to solve it (other than foregoing setting
>> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
>> corresponding setting on the MMU pgtables side of things?
> 
> Right -- we just need to program the CPU's MMU with the matching memory
> attributes! It's a bit more fiddly if you're just using ioremap_wc()
> though, as it's usually the DMA API which handles the attributes under 
> the
> hood.
> 
> Anyway, sorry, I should've said that explicitly earlier on. We've done 
> this
> sort of thing in the Android tree so I assumed Sai knew what needed to 
> be
> done and then I didn't think to explain to you :(
> 

Right I was aware of that but even in the android tree there is no user 
:)
I think we can't have a new memory type without any user right in 
upstream
like android tree?

@Rob, I think you  already tried adding a new MT and used 
pgprot_syscached()
in GPU driver but it was crashing?

Thanks,
Sai
Rob Clark Aug. 9, 2021, 6:07 p.m. UTC | #15
On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> On 2021-08-09 23:10, Will Deacon wrote:
> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
> >> >
> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> >> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
> >> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> >> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
> >> > > > > > > > > > >
> >> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> >> > > > > > > > > > > and makes GPU the user of this protection flag.
> >> > > > > > > > > >
> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> >> > > > > > > > > > not apply anymore?
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> >> > > > > > > > > I can repost the patch.
> >> > > > > > > >
> >> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> >> > > > > > > > can't be right.
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > Just curious, and maybe this is a dumb question, but what is your
> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
> >> > > > > > > doesn't really care about the smmu pgtable attributes..
> >> > > > > >
> >> > > > > > If the CPU accesses a shared buffer with different attributes to those which
> >> > > > > > the device is using then you fall into the "mismatched memory attributes"
> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> >> > > > > > read it) and in some cases can apply to speculative accesses as well, but
> >> > > > > > the end result is typically loss of coherency.
> >> > > > >
> >> > > > > Ok, I might have a few other sections to read first to decipher the
> >> > > > > terminology..
> >> > > > >
> >> > > > > But my understanding of LLC is that it looks just like system memory
> >> > > > > to the CPU and GPU (I think that would make it "the point of
> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> >> > > > > invisible from the point of view of different CPU mapping options?
> >> > > >
> >> > > > You could certainly build a system where mismatched attributes don't cause
> >> > > > loss of coherence, but as it's not guaranteed by the architecture and the
> >> > > > changes proposed here affect APIs which are exposed across SoCs, then I
> >> > > > don't think it helps much.
> >> > > >
> >> > >
> >> > > Hmm, the description of the new mapping flag is that it applies only
> >> > > to transparent outer level cache:
> >> > >
> >> > > +/*
> >> > > + * Non-coherent masters can use this page protection flag to set cacheable
> >> > > + * memory attributes for only a transparent outer level of cache, also known as
> >> > > + * the last-level or system cache.
> >> > > + */
> >> > > +#define IOMMU_LLC      (1 << 6)
> >> > >
> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> >> > > like that to make it more clear that it is not necessarily something
> >> > > that would work with a different outer level cache implementation?
> >> >
> >> > ... or we could just deal with the problem so that other people can reuse
> >> > the code. I haven't really understood the reluctance to solve this properly.
> >> >
> >> > Am I missing some reason this isn't solvable?
> >>
> >> Oh, was there another way to solve it (other than foregoing setting
> >> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
> >> corresponding setting on the MMU pgtables side of things?
> >
> > Right -- we just need to program the CPU's MMU with the matching memory
> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
> > though, as it's usually the DMA API which handles the attributes under
> > the
> > hood.
> >
> > Anyway, sorry, I should've said that explicitly earlier on. We've done
> > this
> > sort of thing in the Android tree so I assumed Sai knew what needed to
> > be
> > done and then I didn't think to explain to you :(
> >
>
> Right I was aware of that but even in the android tree there is no user
> :)
> I think we can't have a new memory type without any user right in
> upstream
> like android tree?
>
> @Rob, I think you  already tried adding a new MT and used
> pgprot_syscached()
> in GPU driver but it was crashing?

Correct, but IIRC there were some differences in the code for memory
types compared to the android tree.. I couldn't figure out the
necessary patches to cherry-pick to get the android patch to apply
cleanly, so I tried re-implementing it without having much of a clue
about how that code works (which was probably the issue) ;-)

BR,
-R
Sai Prakash Ranjan Aug. 9, 2021, 6:10 p.m. UTC | #16
On 2021-08-09 23:37, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
>> 
>> On 2021-08-09 23:10, Will Deacon wrote:
>> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
>> >> >
>> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
>> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
>> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
>> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
>> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
>> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
>> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
>> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
>> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> >> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
>> >> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
>> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
>> >> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
>> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>> >> > > > > > > > > > > and makes GPU the user of this protection flag.
>> >> > > > > > > > > >
>> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
>> >> > > > > > > > > > not apply anymore?
>> >> > > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
>> >> > > > > > > > > I can repost the patch.
>> >> > > > > > > >
>> >> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
>> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
>> >> > > > > > > > can't be right.
>> >> > > > > > > >
>> >> > > > > > >
>> >> > > > > > > Just curious, and maybe this is a dumb question, but what is your
>> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
>> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
>> >> > > > > > > doesn't really care about the smmu pgtable attributes..
>> >> > > > > >
>> >> > > > > > If the CPU accesses a shared buffer with different attributes to those which
>> >> > > > > > the device is using then you fall into the "mismatched memory attributes"
>> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
>> >> > > > > > read it) and in some cases can apply to speculative accesses as well, but
>> >> > > > > > the end result is typically loss of coherency.
>> >> > > > >
>> >> > > > > Ok, I might have a few other sections to read first to decipher the
>> >> > > > > terminology..
>> >> > > > >
>> >> > > > > But my understanding of LLC is that it looks just like system memory
>> >> > > > > to the CPU and GPU (I think that would make it "the point of
>> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
>> >> > > > > invisible from the point of view of different CPU mapping options?
>> >> > > >
>> >> > > > You could certainly build a system where mismatched attributes don't cause
>> >> > > > loss of coherence, but as it's not guaranteed by the architecture and the
>> >> > > > changes proposed here affect APIs which are exposed across SoCs, then I
>> >> > > > don't think it helps much.
>> >> > > >
>> >> > >
>> >> > > Hmm, the description of the new mapping flag is that it applies only
>> >> > > to transparent outer level cache:
>> >> > >
>> >> > > +/*
>> >> > > + * Non-coherent masters can use this page protection flag to set cacheable
>> >> > > + * memory attributes for only a transparent outer level of cache, also known as
>> >> > > + * the last-level or system cache.
>> >> > > + */
>> >> > > +#define IOMMU_LLC      (1 << 6)
>> >> > >
>> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> >> > > like that to make it more clear that it is not necessarily something
>> >> > > that would work with a different outer level cache implementation?
>> >> >
>> >> > ... or we could just deal with the problem so that other people can reuse
>> >> > the code. I haven't really understood the reluctance to solve this properly.
>> >> >
>> >> > Am I missing some reason this isn't solvable?
>> >>
>> >> Oh, was there another way to solve it (other than foregoing setting
>> >> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
>> >> corresponding setting on the MMU pgtables side of things?
>> >
>> > Right -- we just need to program the CPU's MMU with the matching memory
>> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
>> > though, as it's usually the DMA API which handles the attributes under
>> > the
>> > hood.
>> >
>> > Anyway, sorry, I should've said that explicitly earlier on. We've done
>> > this
>> > sort of thing in the Android tree so I assumed Sai knew what needed to
>> > be
>> > done and then I didn't think to explain to you :(
>> >
>> 
>> Right I was aware of that but even in the android tree there is no 
>> user
>> :)
>> I think we can't have a new memory type without any user right in
>> upstream
>> like android tree?
>> 
>> @Rob, I think you  already tried adding a new MT and used
>> pgprot_syscached()
>> in GPU driver but it was crashing?
> 
> Correct, but IIRC there were some differences in the code for memory
> types compared to the android tree.. I couldn't figure out the
> necessary patches to cherry-pick to get the android patch to apply
> cleanly, so I tried re-implementing it without having much of a clue
> about how that code works (which was probably the issue) ;-)
> 

Hehe no, even I get the same crash after porting/modifying the required
patches from android ;) and I think crashes would be seen in android as
well, its just that they don't have any user exercising that code.

Thing is I can't make head and tail of the GPU crash logs, maybe you 
know
how to decode those errors, if not I can start a thread with QC GPU team
and ask them to decode?

Thanks,
Sai
Rob Clark Aug. 9, 2021, 6:30 p.m. UTC | #17
On Mon, Aug 9, 2021 at 11:11 AM Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> On 2021-08-09 23:37, Rob Clark wrote:
> > On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
> > <saiprakash.ranjan@codeaurora.org> wrote:
> >>
> >> On 2021-08-09 23:10, Will Deacon wrote:
> >> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> >> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
> >> >> >
> >> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> >> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
> >> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> >> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> >> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> >> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> >> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> >> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> >> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> >> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> >> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> >> >> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
> >> >> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
> >> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> >> >> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> >> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> >> >> > > > > > > > > > > and makes GPU the user of this protection flag.
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> >> >> > > > > > > > > > not apply anymore?
> >> >> > > > > > > > > >
> >> >> > > > > > > > >
> >> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> >> >> > > > > > > > > I can repost the patch.
> >> >> > > > > > > >
> >> >> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
> >> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> >> >> > > > > > > > can't be right.
> >> >> > > > > > > >
> >> >> > > > > > >
> >> >> > > > > > > Just curious, and maybe this is a dumb question, but what is your
> >> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> >> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
> >> >> > > > > > > doesn't really care about the smmu pgtable attributes..
> >> >> > > > > >
> >> >> > > > > > If the CPU accesses a shared buffer with different attributes to those which
> >> >> > > > > > the device is using then you fall into the "mismatched memory attributes"
> >> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> >> >> > > > > > read it) and in some cases can apply to speculative accesses as well, but
> >> >> > > > > > the end result is typically loss of coherency.
> >> >> > > > >
> >> >> > > > > Ok, I might have a few other sections to read first to decipher the
> >> >> > > > > terminology..
> >> >> > > > >
> >> >> > > > > But my understanding of LLC is that it looks just like system memory
> >> >> > > > > to the CPU and GPU (I think that would make it "the point of
> >> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> >> >> > > > > invisible from the point of view of different CPU mapping options?
> >> >> > > >
> >> >> > > > You could certainly build a system where mismatched attributes don't cause
> >> >> > > > loss of coherence, but as it's not guaranteed by the architecture and the
> >> >> > > > changes proposed here affect APIs which are exposed across SoCs, then I
> >> >> > > > don't think it helps much.
> >> >> > > >
> >> >> > >
> >> >> > > Hmm, the description of the new mapping flag is that it applies only
> >> >> > > to transparent outer level cache:
> >> >> > >
> >> >> > > +/*
> >> >> > > + * Non-coherent masters can use this page protection flag to set cacheable
> >> >> > > + * memory attributes for only a transparent outer level of cache, also known as
> >> >> > > + * the last-level or system cache.
> >> >> > > + */
> >> >> > > +#define IOMMU_LLC      (1 << 6)
> >> >> > >
> >> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> >> >> > > like that to make it more clear that it is not necessarily something
> >> >> > > that would work with a different outer level cache implementation?
> >> >> >
> >> >> > ... or we could just deal with the problem so that other people can reuse
> >> >> > the code. I haven't really understood the reluctance to solve this properly.
> >> >> >
> >> >> > Am I missing some reason this isn't solvable?
> >> >>
> >> >> Oh, was there another way to solve it (other than foregoing setting
> >> >> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
> >> >> corresponding setting on the MMU pgtables side of things?
> >> >
> >> > Right -- we just need to program the CPU's MMU with the matching memory
> >> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
> >> > though, as it's usually the DMA API which handles the attributes under
> >> > the
> >> > hood.
> >> >
> >> > Anyway, sorry, I should've said that explicitly earlier on. We've done
> >> > this
> >> > sort of thing in the Android tree so I assumed Sai knew what needed to
> >> > be
> >> > done and then I didn't think to explain to you :(
> >> >
> >>
> >> Right I was aware of that but even in the android tree there is no
> >> user
> >> :)
> >> I think we can't have a new memory type without any user right in
> >> upstream
> >> like android tree?
> >>
> >> @Rob, I think you  already tried adding a new MT and used
> >> pgprot_syscached()
> >> in GPU driver but it was crashing?
> >
> > Correct, but IIRC there were some differences in the code for memory
> > types compared to the android tree.. I couldn't figure out the
> > necessary patches to cherry-pick to get the android patch to apply
> > cleanly, so I tried re-implementing it without having much of a clue
> > about how that code works (which was probably the issue) ;-)
> >
>
> Hehe no, even I get the same crash after porting/modifying the required
> patches from android ;) and I think crashes would be seen in android as
> well, its just that they don't have any user exercising that code.
>
> Thing is I can't make head and tail of the GPU crash logs, maybe you
> know
> how to decode those errors, if not I can start a thread with QC GPU team
> and ask them to decode?
>

If you have a gpu devcore dump, I can take a look at it with
crashdec.. otherwise I can try to find the branch where I had that
patch backported.

I'm more familiar with using crashdec to figure out mesa bugs, but
maybe I could spot something where what the GPU is seeing disagrees
with what the CPU expects it to be seeing.

BR,
-R
Sai Prakash Ranjan Aug. 9, 2021, 6:32 p.m. UTC | #18
On 2021-08-10 00:00, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 11:11 AM Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
>> 
>> On 2021-08-09 23:37, Rob Clark wrote:
>> > On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
>> > <saiprakash.ranjan@codeaurora.org> wrote:
>> >>
>> >> On 2021-08-09 23:10, Will Deacon wrote:
>> >> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> >> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
>> >> >> >
>> >> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> >> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
>> >> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
>> >> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
>> >> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
>> >> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
>> >> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
>> >> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
>> >> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
>> >> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> >> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> >> >> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
>> >> >> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
>> >> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
>> >> >> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
>> >> >> > > > > > > > > > >
>> >> >> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
>> >> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>> >> >> > > > > > > > > > > and makes GPU the user of this protection flag.
>> >> >> > > > > > > > > >
>> >> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
>> >> >> > > > > > > > > > not apply anymore?
>> >> >> > > > > > > > > >
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
>> >> >> > > > > > > > > I can repost the patch.
>> >> >> > > > > > > >
>> >> >> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
>> >> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
>> >> >> > > > > > > > can't be right.
>> >> >> > > > > > > >
>> >> >> > > > > > >
>> >> >> > > > > > > Just curious, and maybe this is a dumb question, but what is your
>> >> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
>> >> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
>> >> >> > > > > > > doesn't really care about the smmu pgtable attributes..
>> >> >> > > > > >
>> >> >> > > > > > If the CPU accesses a shared buffer with different attributes to those which
>> >> >> > > > > > the device is using then you fall into the "mismatched memory attributes"
>> >> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
>> >> >> > > > > > read it) and in some cases can apply to speculative accesses as well, but
>> >> >> > > > > > the end result is typically loss of coherency.
>> >> >> > > > >
>> >> >> > > > > Ok, I might have a few other sections to read first to decipher the
>> >> >> > > > > terminology..
>> >> >> > > > >
>> >> >> > > > > But my understanding of LLC is that it looks just like system memory
>> >> >> > > > > to the CPU and GPU (I think that would make it "the point of
>> >> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
>> >> >> > > > > invisible from the point of view of different CPU mapping options?
>> >> >> > > >
>> >> >> > > > You could certainly build a system where mismatched attributes don't cause
>> >> >> > > > loss of coherence, but as it's not guaranteed by the architecture and the
>> >> >> > > > changes proposed here affect APIs which are exposed across SoCs, then I
>> >> >> > > > don't think it helps much.
>> >> >> > > >
>> >> >> > >
>> >> >> > > Hmm, the description of the new mapping flag is that it applies only
>> >> >> > > to transparent outer level cache:
>> >> >> > >
>> >> >> > > +/*
>> >> >> > > + * Non-coherent masters can use this page protection flag to set cacheable
>> >> >> > > + * memory attributes for only a transparent outer level of cache, also known as
>> >> >> > > + * the last-level or system cache.
>> >> >> > > + */
>> >> >> > > +#define IOMMU_LLC      (1 << 6)
>> >> >> > >
>> >> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> >> >> > > like that to make it more clear that it is not necessarily something
>> >> >> > > that would work with a different outer level cache implementation?
>> >> >> >
>> >> >> > ... or we could just deal with the problem so that other people can reuse
>> >> >> > the code. I haven't really understood the reluctance to solve this properly.
>> >> >> >
>> >> >> > Am I missing some reason this isn't solvable?
>> >> >>
>> >> >> Oh, was there another way to solve it (other than foregoing setting
>> >> >> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
>> >> >> corresponding setting on the MMU pgtables side of things?
>> >> >
>> >> > Right -- we just need to program the CPU's MMU with the matching memory
>> >> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
>> >> > though, as it's usually the DMA API which handles the attributes under
>> >> > the
>> >> > hood.
>> >> >
>> >> > Anyway, sorry, I should've said that explicitly earlier on. We've done
>> >> > this
>> >> > sort of thing in the Android tree so I assumed Sai knew what needed to
>> >> > be
>> >> > done and then I didn't think to explain to you :(
>> >> >
>> >>
>> >> Right I was aware of that but even in the android tree there is no
>> >> user
>> >> :)
>> >> I think we can't have a new memory type without any user right in
>> >> upstream
>> >> like android tree?
>> >>
>> >> @Rob, I think you  already tried adding a new MT and used
>> >> pgprot_syscached()
>> >> in GPU driver but it was crashing?
>> >
>> > Correct, but IIRC there were some differences in the code for memory
>> > types compared to the android tree.. I couldn't figure out the
>> > necessary patches to cherry-pick to get the android patch to apply
>> > cleanly, so I tried re-implementing it without having much of a clue
>> > about how that code works (which was probably the issue) ;-)
>> >
>> 
>> Hehe no, even I get the same crash after porting/modifying the 
>> required
>> patches from android ;) and I think crashes would be seen in android 
>> as
>> well, its just that they don't have any user exercising that code.
>> 
>> Thing is I can't make head and tail of the GPU crash logs, maybe you
>> know
>> how to decode those errors, if not I can start a thread with QC GPU 
>> team
>> and ask them to decode?
>> 
> 
> If you have a gpu devcore dump, I can take a look at it with
> crashdec.. otherwise I can try to find the branch where I had that
> patch backported.
> 
> I'm more familiar with using crashdec to figure out mesa bugs, but
> maybe I could spot something where what the GPU is seeing disagrees
> with what the CPU expects it to be seeing.
> 

Sure, I will get a devcoredump tomorrow and attach in the bug, currently
I don't have it handy.

Thanks,
Sai
Will Deacon Aug. 10, 2021, 9:16 a.m. UTC | #19
On Mon, Aug 09, 2021 at 11:17:40PM +0530, Sai Prakash Ranjan wrote:
> On 2021-08-09 23:10, Will Deacon wrote:
> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> > > On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
> > > > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > > > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > > > > like that to make it more clear that it is not necessarily something
> > > > > that would work with a different outer level cache implementation?
> > > >
> > > > ... or we could just deal with the problem so that other people can reuse
> > > > the code. I haven't really understood the reluctance to solve this properly.
> > > >
> > > > Am I missing some reason this isn't solvable?
> > > 
> > > Oh, was there another way to solve it (other than foregoing setting
> > > INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
> > > corresponding setting on the MMU pgtables side of things?
> > 
> > Right -- we just need to program the CPU's MMU with the matching memory
> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
> > though, as it's usually the DMA API which handles the attributes under
> > the
> > hood.
> > 
> > Anyway, sorry, I should've said that explicitly earlier on. We've done
> > this
> > sort of thing in the Android tree so I assumed Sai knew what needed to
> > be
> > done and then I didn't think to explain to you :(
> > 
> 
> Right I was aware of that but even in the android tree there is no user :)

I'm assuming there are vendor modules using it there, otherwise we wouldn't
have been asked to put it in. Since you work at Qualcomm, maybe you could
talk to your colleagues (Isaac and Patrick) directly?

> I think we can't have a new memory type without any user right in upstream
> like android tree?

Correct. But I don't think we should be adding IOMMU_* anything upstream
if we don't have a user.

Will
Sai Prakash Ranjan Aug. 10, 2021, 9:54 a.m. UTC | #20
On 2021-08-10 14:46, Will Deacon wrote:
> On Mon, Aug 09, 2021 at 11:17:40PM +0530, Sai Prakash Ranjan wrote:
>> On 2021-08-09 23:10, Will Deacon wrote:
>> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> > > On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
>> > > > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> > > > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> > > > > like that to make it more clear that it is not necessarily something
>> > > > > that would work with a different outer level cache implementation?
>> > > >
>> > > > ... or we could just deal with the problem so that other people can reuse
>> > > > the code. I haven't really understood the reluctance to solve this properly.
>> > > >
>> > > > Am I missing some reason this isn't solvable?
>> > >
>> > > Oh, was there another way to solve it (other than foregoing setting
>> > > INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
>> > > corresponding setting on the MMU pgtables side of things?
>> >
>> > Right -- we just need to program the CPU's MMU with the matching memory
>> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
>> > though, as it's usually the DMA API which handles the attributes under
>> > the
>> > hood.
>> >
>> > Anyway, sorry, I should've said that explicitly earlier on. We've done
>> > this
>> > sort of thing in the Android tree so I assumed Sai knew what needed to
>> > be
>> > done and then I didn't think to explain to you :(
>> >
>> 
>> Right I was aware of that but even in the android tree there is no 
>> user :)
> 
> I'm assuming there are vendor modules using it there, otherwise we 
> wouldn't
> have been asked to put it in. Since you work at Qualcomm, maybe you 
> could
> talk to your colleagues (Isaac and Patrick) directly?
> 

Right I will check with them regarding the vendor modules in android.

>> I think we can't have a new memory type without any user right in 
>> upstream
>> like android tree?
> 
> Correct. But I don't think we should be adding IOMMU_* anything 
> upstream
> if we don't have a user.
> 

Agreed, once we have the fix for GPU crash I can continue further on 
using
this properly.

Thanks,
Sai