Message ID | 20211001143427.1564786-5-boris.brezillon@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/panfrost: Add extra GPU-usage flags | expand |
On 01/10/2021 15:34, Boris Brezillon wrote: > This lets the driver reduce shareability domain of the MMU mapping, > which can in turn reduce access time and save power on cache-coherent > systems. > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> This seems reasonable to me - it matches the kbase BASE_MEM_COHERENT_SYSTEM (only backwards obviously) and it worked reasonably well for the blob. But I'm wondering if we need to do anything special to deal with the fact we will now have some non-coherent mappings on an otherwise coherent device. There are certainly some oddities around how these buffers will be mapped into user space if requested, e.g. panfrost_gem_create_object() sets 'map_wc' based on pfdev->coherent which is arguably wrong for GPUONLY. So there are two things we could consider: a) Actually prevent user space mapping GPUONLY flagged buffers. Which matches the intention of the name. b) Attempt to provide user space with the tools to safely interact with the buffers (this is the kbase approach). This does have the benefit of allowing *mostly* GPU access. An example here is the tiler heap where the CPU could zero out as necessary but mostly the GPU has ownership and the CPU never reads the contents. GPUONLY/DEVONLY might not be the best name in that case. Any thoughts? Thanks, Steve > --- > drivers/gpu/drm/panfrost/panfrost_drv.c | 3 ++- > drivers/gpu/drm/panfrost/panfrost_gem.c | 1 + > drivers/gpu/drm/panfrost/panfrost_gem.h | 1 + > drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 +++ > include/uapi/drm/panfrost_drm.h | 1 + > 5 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c > index b29ac942ae2d..b176921b9392 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_drv.c > +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c > @@ -77,7 +77,8 @@ static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct > > #define PANFROST_BO_FLAGS \ > (PANFROST_BO_NOEXEC | PANFROST_BO_HEAP | \ > - PANFROST_BO_NOREAD | PANFROST_BO_NOWRITE) > + PANFROST_BO_NOREAD | PANFROST_BO_NOWRITE | \ > + PANFROST_BO_GPUONLY) > > static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data, > struct drm_file *file) > diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c > index d6c1bb1445f2..4b1f85c0b98f 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_gem.c > +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c > @@ -254,6 +254,7 @@ panfrost_gem_create_with_handle(struct drm_file *file_priv, > bo->noread = !!(flags & PANFROST_BO_NOREAD); > bo->nowrite = !!(flags & PANFROST_BO_NOWRITE); > bo->is_heap = !!(flags & PANFROST_BO_HEAP); > + bo->gpuonly = !!(flags & PANFROST_BO_GPUONLY); > > /* > * Allocate an id of idr table where the obj is registered > diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h > index 6246b5fef446..e332d5a4c24f 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_gem.h > +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h > @@ -40,6 +40,7 @@ struct panfrost_gem_object { > bool noread :1; > bool nowrite :1; > bool is_heap :1; > + bool gpuonly :1; > }; > > struct panfrost_gem_mapping { > diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c > index 6a5c9d94d6f2..89eee8e80aa5 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c > +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c > @@ -321,6 +321,9 @@ int panfrost_mmu_map(struct panfrost_gem_mapping *mapping) > if (!bo->noread) > prot |= IOMMU_READ; > > + if (bo->gpuonly) > + prot |= IOMMU_DEVONLY; > + > sgt = drm_gem_shmem_get_pages_sgt(obj); > if (WARN_ON(IS_ERR(sgt))) > return PTR_ERR(sgt); > diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h > index a2de81225125..538b58b2d095 100644 > --- a/include/uapi/drm/panfrost_drm.h > +++ b/include/uapi/drm/panfrost_drm.h > @@ -88,6 +88,7 @@ struct drm_panfrost_wait_bo { > #define PANFROST_BO_HEAP 2 > #define PANFROST_BO_NOREAD 4 > #define PANFROST_BO_NOWRITE 8 > +#define PANFROST_BO_GPUONLY 0x10 > > /** > * struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs. >
On Fri, 1 Oct 2021 16:13:42 +0100 Steven Price <steven.price@arm.com> wrote: > On 01/10/2021 15:34, Boris Brezillon wrote: > > This lets the driver reduce shareability domain of the MMU mapping, > > which can in turn reduce access time and save power on cache-coherent > > systems. > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > This seems reasonable to me - it matches the kbase > BASE_MEM_COHERENT_SYSTEM (only backwards obviously) and it worked > reasonably well for the blob. > > But I'm wondering if we need to do anything special to deal with the > fact we will now have some non-coherent mappings on an otherwise > coherent device. > > There are certainly some oddities around how these buffers will be > mapped into user space if requested, e.g. panfrost_gem_create_object() > sets 'map_wc' based on pfdev->coherent which is arguably wrong for > GPUONLY. So there are two things we could consider: > > a) Actually prevent user space mapping GPUONLY flagged buffers. Which > matches the intention of the name. I intended to do that, just forgot to add wrappers around drm_gem_shmem_{mmap,vmap}() to forbid CPU-mappings on gpuonly buffers. > > b) Attempt to provide user space with the tools to safely interact with > the buffers (this is the kbase approach). This does have the benefit of > allowing *mostly* GPU access. An example here is the tiler heap where > the CPU could zero out as necessary but mostly the GPU has ownership and > the CPU never reads the contents. GPUONLY/DEVONLY might not be the best > name in that case. Uh, right, I forgot we had to zero the tiler heap on Midgard (most of the time done with a WRITE_VALUE job, but there's an exception on some old Midgard GPUs IIRC).
> > This seems reasonable to me - it matches the kbase > > BASE_MEM_COHERENT_SYSTEM (only backwards obviously) and it worked > > reasonably well for the blob. Oh, is that what that was for? I remember seeing it set on Midgard for varyings. Good to go full circle now. > > But I'm wondering if we need to do anything special to deal with the > > fact we will now have some non-coherent mappings on an otherwise > > coherent device. > > > > There are certainly some oddities around how these buffers will be > > mapped into user space if requested, e.g. panfrost_gem_create_object() > > sets 'map_wc' based on pfdev->coherent which is arguably wrong for > > GPUONLY. So there are two things we could consider: > > > > a) Actually prevent user space mapping GPUONLY flagged buffers. Which > > matches the intention of the name. > > I intended to do that, just forgot to add wrappers around > drm_gem_shmem_{mmap,vmap}() to forbid CPU-mappings on gpuonly buffers. This feels like the cleaner solution to me. > > b) Attempt to provide user space with the tools to safely interact with > > the buffers (this is the kbase approach). This does have the benefit of > > allowing *mostly* GPU access. An example here is the tiler heap where > > the CPU could zero out as necessary but mostly the GPU has ownership and > > the CPU never reads the contents. GPUONLY/DEVONLY might not be the best > > name in that case. > > Uh, right, I forgot we had to zero the tiler heap on Midgard (most of > the time done with a WRITE_VALUE job, but there's an exception on some > old Midgard GPUs IIRC). "Attempt" is the key word here :| We indeed only touch the tiler heap from the CPU on v4, and life's too short to care about new optimizations for v4. Unless the patch is trivial, my vote is for a) preventing the mappings and only setting GPUONLY on the tiler_heap starting with v5.
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index b29ac942ae2d..b176921b9392 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -77,7 +77,8 @@ static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct #define PANFROST_BO_FLAGS \ (PANFROST_BO_NOEXEC | PANFROST_BO_HEAP | \ - PANFROST_BO_NOREAD | PANFROST_BO_NOWRITE) + PANFROST_BO_NOREAD | PANFROST_BO_NOWRITE | \ + PANFROST_BO_GPUONLY) static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index d6c1bb1445f2..4b1f85c0b98f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -254,6 +254,7 @@ panfrost_gem_create_with_handle(struct drm_file *file_priv, bo->noread = !!(flags & PANFROST_BO_NOREAD); bo->nowrite = !!(flags & PANFROST_BO_NOWRITE); bo->is_heap = !!(flags & PANFROST_BO_HEAP); + bo->gpuonly = !!(flags & PANFROST_BO_GPUONLY); /* * Allocate an id of idr table where the obj is registered diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index 6246b5fef446..e332d5a4c24f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -40,6 +40,7 @@ struct panfrost_gem_object { bool noread :1; bool nowrite :1; bool is_heap :1; + bool gpuonly :1; }; struct panfrost_gem_mapping { diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 6a5c9d94d6f2..89eee8e80aa5 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -321,6 +321,9 @@ int panfrost_mmu_map(struct panfrost_gem_mapping *mapping) if (!bo->noread) prot |= IOMMU_READ; + if (bo->gpuonly) + prot |= IOMMU_DEVONLY; + sgt = drm_gem_shmem_get_pages_sgt(obj); if (WARN_ON(IS_ERR(sgt))) return PTR_ERR(sgt); diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h index a2de81225125..538b58b2d095 100644 --- a/include/uapi/drm/panfrost_drm.h +++ b/include/uapi/drm/panfrost_drm.h @@ -88,6 +88,7 @@ struct drm_panfrost_wait_bo { #define PANFROST_BO_HEAP 2 #define PANFROST_BO_NOREAD 4 #define PANFROST_BO_NOWRITE 8 +#define PANFROST_BO_GPUONLY 0x10 /** * struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs.
This lets the driver reduce shareability domain of the MMU mapping, which can in turn reduce access time and save power on cache-coherent systems. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> --- drivers/gpu/drm/panfrost/panfrost_drv.c | 3 ++- drivers/gpu/drm/panfrost/panfrost_gem.c | 1 + drivers/gpu/drm/panfrost/panfrost_gem.h | 1 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 +++ include/uapi/drm/panfrost_drm.h | 1 + 5 files changed, 8 insertions(+), 1 deletion(-)