Message ID: 20190529065121.13485-1-boris.brezillon@collabora.com
State:      New, archived
Series:     drm/gem_shmem: Use a writecombine mapping for ->vaddr
On Wed, May 29, 2019 at 08:51:21AM +0200, Boris Brezillon wrote:
> Right now, the BO is mapped as a cached region when ->vmap() is called
> and the underlying object is not a dmabuf.
> Doing that makes cache management a bit more complicated (you'd need
> to call dma_map/unmap_sg() on the ->sgt field everytime the BO is about
> to be passed to the GPU/CPU), so let's map the BO with writecombine
> attributes instead (as done in most drivers).
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
> Found this issue while working on panfrost perfcnt where the GPU dumps
> perf counter values in memory and the CPU reads them back in
> kernel-space. This patch seems to solve the unpredictable behavior I
> had.
>
> I can also go for the other option (call dma_map/unmap_sg() when
> needed) if you think that's more appropriate.

Uh, I guess shmem helpers (or gem helpers in general) need some concept
about what kind of cpu mapping is desired. Since some drivers (like e.g.
i915) do actually want cached mode for everything. Same is needed for
the userspace mmap, those should all agree. Default probably best if we
go with uncached.
-Daniel

> ---
>  drivers/gpu/drm/drm_gem_shmem_helper.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
> index 1ee208c2c85e..472ea5d81f82 100644
> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> @@ -255,7 +255,8 @@ static void *drm_gem_shmem_vmap_locked(struct drm_gem_shmem_object *shmem)
>  	if (obj->import_attach)
>  		shmem->vaddr = dma_buf_vmap(obj->import_attach->dmabuf);
>  	else
> -		shmem->vaddr = vmap(shmem->pages, obj->size >> PAGE_SHIFT, VM_MAP, PAGE_KERNEL);
> +		shmem->vaddr = vmap(shmem->pages, obj->size >> PAGE_SHIFT,
> +				    VM_MAP, pgprot_writecombine(PAGE_KERNEL));
>
>  	if (!shmem->vaddr) {
>  		DRM_DEBUG_KMS("Failed to vmap pages\n");
> --
> 2.20.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
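The "concept about what kind of cpu mapping is desired" Daniel asks for could take roughly this shape — a per-object mapping-mode selector that both the vmap and mmap paths consult. This is a hypothetical sketch, not in-tree API at the time of this thread; the enum and field names are invented for illustration:

```
/* Hypothetical knob (names invented): let a driver pick the CPU
 * caching mode for a shmem BO, honored by both kernel vmap and
 * userspace mmap so the two mappings always agree. */
enum drm_gem_shmem_map_mode {
	DRM_GEM_SHMEM_MAP_WC,		/* default: write-combined */
	DRM_GEM_SHMEM_MAP_CACHED,	/* for fully coherent devices */
};

struct drm_gem_shmem_object {
	/* ... existing fields ... */
	enum drm_gem_shmem_map_mode map_mode;
};
```

With something like this, a driver such as i915 (which wants cached mappings everywhere) could opt out of the write-combined default without open-coding its own vmap path.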
Boris Brezillon <boris.brezillon@collabora.com> writes:
> Right now, the BO is mapped as a cached region when ->vmap() is called
> and the underlying object is not a dmabuf.
> [...]
> I can also go for the other option (call dma_map/unmap_sg() when
> needed) if you think that's more appropriate.

writecombined was the intent, and this makes kernel vmap match the
userspace mmap path.

Reviewed-by: Eric Anholt <eric@anholt.net>
On Fri, May 31, 2019 at 08:46:58AM -0700, Eric Anholt wrote:
> Boris Brezillon <boris.brezillon@collabora.com> writes:
> > Right now, the BO is mapped as a cached region when ->vmap() is called
> > and the underlying object is not a dmabuf.
> > [...]
>
> writecombined was the intent, and this makes kernel vmap match the
> userspace mmap path.

Since I missed that obviously: Where do the shmem helpers set write
combined mode for userspace mmap?
-Daniel

> Reviewed-by: Eric Anholt <eric@anholt.net>
Daniel Vetter <daniel@ffwll.ch> writes:
> On Fri, May 31, 2019 at 08:46:58AM -0700, Eric Anholt wrote:
>> [...]
>> writecombined was the intent, and this makes kernel vmap match the
>> userspace mmap path.
>
> Since I missed that obviously: Where do the shmem helpers set write
> combined mode for userspace mmap?

That was the trick when I asked the question, too. drm_gem.c is what
sets WC by default.
On Mon, Jun 3, 2019 at 5:43 PM Eric Anholt <eric@anholt.net> wrote:
> > Since I missed that obviously: Where do the shmem helpers set write
> > combined mode for userspace mmap?
>
> That was the trick when I asked the question, too. drm_gem.c is what
> sets WC by default.

TIL. And looks like this has been the case forever, it landed in

commit a2c0a97b784f837300f7b0869c82ab712c600952
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Wed Nov 5 10:31:53 2008 -0800

    drm: GEM mmap support

I'll retract my concern, making this wc everywhere is the right thing
to do.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cheers, Daniel
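For reference, the userspace default Eric and Daniel are pointing at lives in the common GEM mmap path. The fragment below is a paraphrased sketch of what drm_gem_mmap_obj() in drivers/gpu/drm/drm_gem.c did around this time (kernel-internal code, not compilable standalone; check the tree for the exact lines):

```
/* drm_gem_mmap_obj() (paraphrased): every GEM mmap that goes
 * through the common helper gets a write-combined mapping. */
vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
```

This is why the cached PAGE_KERNEL vmap stood out: the kernel-side mapping disagreed with the WC attributes userspace had been getting since the 2008 commit above.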
On Wed, May 29, 2019 at 12:51 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
> Right now, the BO is mapped as a cached region when ->vmap() is called
> and the underlying object is not a dmabuf.
> [...]
> I can also go for the other option (call dma_map/unmap_sg() when
> needed) if you think that's more appropriate.

Using writecombine everywhere was the intention, but I missed this
spot. I've applied this, adding a Fixes tag.

Rob
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 1ee208c2c85e..472ea5d81f82 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -255,7 +255,8 @@ static void *drm_gem_shmem_vmap_locked(struct drm_gem_shmem_object *shmem)
 	if (obj->import_attach)
 		shmem->vaddr = dma_buf_vmap(obj->import_attach->dmabuf);
 	else
-		shmem->vaddr = vmap(shmem->pages, obj->size >> PAGE_SHIFT, VM_MAP, PAGE_KERNEL);
+		shmem->vaddr = vmap(shmem->pages, obj->size >> PAGE_SHIFT,
+				    VM_MAP, pgprot_writecombine(PAGE_KERNEL));
 
 	if (!shmem->vaddr) {
 		DRM_DEBUG_KMS("Failed to vmap pages\n");
Right now, the BO is mapped as a cached region when ->vmap() is called
and the underlying object is not a dmabuf. Doing that makes cache
management a bit more complicated (you'd need to call dma_map/unmap_sg()
on the ->sgt field every time the BO is about to be passed to the
GPU/CPU), so let's map the BO with writecombine attributes instead (as
done in most drivers).

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
Found this issue while working on panfrost perfcnt where the GPU dumps
perf counter values in memory and the CPU reads them back in
kernel-space. This patch seems to solve the unpredictable behavior I
had.

I can also go for the other option (call dma_map/unmap_sg() when
needed) if you think that's more appropriate.
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
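The rejected "other option" would have kept the cached kernel mapping and bracketed every GPU access with explicit cache maintenance on the BO's sg table. A rough sketch of what that would look like for the perfcnt use case (GPU writes, CPU reads) — the function name and surrounding context are invented for illustration; only the dma_sync_sg_for_{device,cpu} calls are real DMA API:

```
/* Hypothetical sketch of the cached-mapping alternative:
 * explicit cache maintenance around each perf-counter dump. */
static void perfcnt_dump_sketch(struct device *dev,
				struct drm_gem_shmem_object *shmem)
{
	/* hand the BO to the GPU: make CPU caches coherent first */
	dma_sync_sg_for_device(dev, shmem->sgt->sgl,
			       shmem->sgt->nents, DMA_FROM_DEVICE);

	/* ... submit the dump job and wait for the GPU to finish ... */

	/* hand it back: invalidate stale CPU cache lines before
	 * reading the counter values through the cached vmap */
	dma_sync_sg_for_cpu(dev, shmem->sgt->sgl,
			    shmem->sgt->nents, DMA_FROM_DEVICE);
}
```

The writecombine mapping sidesteps all of this: CPU reads always go to memory, at the cost of uncached read performance, which is why it is the simpler fix for a buffer the CPU only reads back occasionally.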