diff mbox series

[v3,3/3] drm/vgem: use normal cached mmap'ings

Message ID 20190716213746.4670-3-robdclark@gmail.com (mailing list archive)
State New, archived
Series [v3,1/3] drm/gem: don't force writecombine mmap'ing

Commit Message

Rob Clark July 16, 2019, 9:37 p.m. UTC
From: Rob Clark <robdclark@chromium.org>

Since there is no real device associated with VGEM, it is impossible to
end up with appropriate dev->dma_ops, meaning that we have no way to
invalidate the shmem pages allocated by VGEM.  So, at least on platforms
without drm_clflush_pages(), we end up with corruption when cache lines
from previous usage of VGEM bo pages get evicted to memory.

The only sane option is to use cached mappings.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
v3: rebased on drm-tip

 drivers/gpu/drm/vgem/vgem_drv.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

Comments

Eric Anholt July 16, 2019, 11:39 p.m. UTC | #1
Rob Clark <robdclark@gmail.com> writes:

> From: Rob Clark <robdclark@chromium.org>
>
> Since there is no real device associated with VGEM, it is impossible to
> end up with appropriate dev->dma_ops, meaning that we have no way to
> invalidate the shmem pages allocated by VGEM.  So, at least on platforms
> without drm_clflush_pages(), we end up with corruption when cache lines
> from previous usage of VGEM bo pages get evicted to memory.
>
> The only sane option is to use cached mappings.

This may be an improvement, but...

pin/unpin is only on attaching/closing the dma-buf, right?  So, great,
you flushed the cached map once after exporting the vgem dma-buf to the
actual GPU device, but from then on you still have no interface for
getting coherent access through VGEM's mapping again, which still
exists.

I feel like this is papering over something that's really just broken,
and we should stop providing VGEM just because someone wants to write
dma-buf test code without driver-specific BO alloc ioctl code.

Rob Clark July 17, 2019, 12:13 a.m. UTC | #2
On Tue, Jul 16, 2019 at 4:39 PM Eric Anholt <eric@anholt.net> wrote:
>
> Rob Clark <robdclark@gmail.com> writes:
>
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Since there is no real device associated with VGEM, it is impossible to
> > end up with appropriate dev->dma_ops, meaning that we have no way to
> > invalidate the shmem pages allocated by VGEM.  So, at least on platforms
> > without drm_clflush_pages(), we end up with corruption when cache lines
> > from previous usage of VGEM bo pages get evicted to memory.
> >
> > The only sane option is to use cached mappings.
>
> This may be an improvement, but...
>
> pin/unpin is only on attaching/closing the dma-buf, right?  So, great,
> you flushed the cached map once after exporting the vgem dma-buf to the
> actual GPU device, but from then on you still have no interface for
> getting coherent access through VGEM's mapping again, which still
> exists.

In *theory* one would detach before doing further CPU access to
buffer, and then re-attach when passing back to GPU.

Ofc that isn't how actual drivers do things.  But maybe it is enough
for vgem to serve its purpose (i.e. test code).

> I feel like this is papering over something that's really just broken,
> and we should stop providing VGEM just because someone wants to write
> dma-buf test code without driver-specific BO alloc ioctl code.

yup, it is vgem that is fundamentally broken (or maybe more
specifically doesn't fit in w/ dma-mapping's view of how to do cache
maint), and I'm just papering over it because people and CI systems
want to be able to use it to do some dma-buf tests ;-)

I'm kinda wondering, at least for arm/dt based systems, if there is a
way (other than in early boot) that we can inject a vgem device node
into the dtb.  That isn't a thing drivers should normally do, but (if
possible) since vgem is really just test infrastructure, it could be a
way to make dma-mapping happily think vgem is a real device.

BR,
-R
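
As a rough illustration of the detach/re-attach handoff described above, here is a user-space model (not kernel code, and not how vgem or any real driver is implemented). All names (`model_bo`, `model_attach`, `model_detach`, `model_cpu_write`, `model_device_write`) are hypothetical. The `cache` array stands in for dirty CPU cache lines and `memory` for what a non-coherent device would see in RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: which side may currently touch the buffer. */
enum model_owner { MODEL_OWNER_CPU, MODEL_OWNER_DEVICE };

struct model_bo {
	unsigned char cache[64];  /* stands in for CPU cache contents */
	unsigned char memory[64]; /* stands in for what the device sees */
	enum model_owner owner;
};

/* Models attach/pin: write CPU data back so the device sees it. */
static void model_attach(struct model_bo *bo)
{
	memcpy(bo->memory, bo->cache, sizeof(bo->memory));
	bo->owner = MODEL_OWNER_DEVICE;
}

/* Models detach/unpin: drop the stale CPU copy and re-read memory. */
static void model_detach(struct model_bo *bo)
{
	memcpy(bo->cache, bo->memory, sizeof(bo->cache));
	bo->owner = MODEL_OWNER_CPU;
}

/* A CPU write is only allowed while the CPU owns the buffer. */
static bool model_cpu_write(struct model_bo *bo, size_t off, unsigned char v)
{
	if (bo->owner != MODEL_OWNER_CPU || off >= sizeof(bo->cache))
		return false;
	bo->cache[off] = v;
	return true;
}

/* A device write is only allowed while the device owns the buffer. */
static bool model_device_write(struct model_bo *bo, size_t off, unsigned char v)
{
	if (bo->owner != MODEL_OWNER_DEVICE || off >= sizeof(bo->memory))
		return false;
	bo->memory[off] = v;
	return true;
}
```

In this model a CPU write made while the device owns the buffer is rejected. A real cached mapping has no such guard, so an access made while the buffer is still attached has no point at which it could be synchronized.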

Daniel Vetter July 19, 2019, 9:13 a.m. UTC | #3
On Tue, Jul 16, 2019 at 05:13:10PM -0700, Rob Clark wrote:
> On Tue, Jul 16, 2019 at 4:39 PM Eric Anholt <eric@anholt.net> wrote:
> >
> > Rob Clark <robdclark@gmail.com> writes:
> >
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > Since there is no real device associated with VGEM, it is impossible to
> > > end up with appropriate dev->dma_ops, meaning that we have no way to
> > > invalidate the shmem pages allocated by VGEM.  So, at least on platforms
> > > without drm_clflush_pages(), we end up with corruption when cache lines
> > > from previous usage of VGEM bo pages get evicted to memory.
> > >
> > > The only sane option is to use cached mappings.
> >
> > This may be an improvement, but...
> >
> > pin/unpin is only on attaching/closing the dma-buf, right?  So, great,
> > you flushed the cached map once after exporting the vgem dma-buf to the
> > actual GPU device, but from then on you still have no interface for
> > getting coherent access through VGEM's mapping again, which still
> > exists.
> 
> In *theory* one would detach before doing further CPU access to
> buffer, and then re-attach when passing back to GPU.
> 
> Ofc that isn't how actual drivers do things.  But maybe it is enough
> for vgem to serve its purpose (i.e. test code).
> 
> > I feel like this is papering over something that's really just broken,
> > and we should stop providing VGEM just because someone wants to write
> > dma-buf test code without driver-specific BO alloc ioctl code.
> 
> yup, it is vgem that is fundamentally broken (or maybe more
> specifically doesn't fit in w/ dma-mapping's view of how to do cache
> maint), and I'm just papering over it because people and CI systems
> want to be able to use it to do some dma-buf tests ;-)
> 
> I'm kinda wondering, at least for arm/dt based systems, if there is a
> way (other than in early boot) that we can inject a vgem device node
> into the dtb.  That isn't a thing drivers should normally do, but (if
> possible) since vgem is really just test infrastructure, it could be a
> way to make dma-mapping happily think vgem is a real device.

Or we just extend drm_clflush_pages with the cache flushing we need (at least
for those arms where this is possible, let's ignore the others) and accept
for a few more years that dma-api doesn't fit?

Note this would need to be a full copypasta of what the arch code has
(since just exporting the function was shot down before), but I really
don't care about the resulting wailing if we do this.
-Daniel

Patch

diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
index e7d12e93b1f0..84262e2bd7f7 100644
--- a/drivers/gpu/drm/vgem/vgem_drv.c
+++ b/drivers/gpu/drm/vgem/vgem_drv.c
@@ -259,9 +259,6 @@  static int vgem_mmap(struct file *filp, struct vm_area_struct *vma)
 	if (ret)
 		return ret;
 
-	/* Keep the WC mmaping set by drm_gem_mmap() but our pages
-	 * are ordinary and not special.
-	 */
 	vma->vm_flags = flags | VM_DONTEXPAND | VM_DONTDUMP;
 	return 0;
 }
@@ -310,17 +307,17 @@  static void vgem_unpin_pages(struct drm_vgem_gem_object *bo)
 static int vgem_prime_pin(struct drm_gem_object *obj, struct device *dev)
 {
 	struct drm_vgem_gem_object *bo = to_vgem_bo(obj);
-	long n_pages = obj->size >> PAGE_SHIFT;
+	long i, n_pages = obj->size >> PAGE_SHIFT;
 	struct page **pages;
 
 	pages = vgem_pin_pages(bo);
 	if (IS_ERR(pages))
 		return PTR_ERR(pages);
 
-	/* Flush the object from the CPU cache so that importers can rely
-	 * on coherent indirect access via the exported dma-address.
-	 */
-	drm_clflush_pages(pages, n_pages);
+	for (i = 0; i < n_pages; i++) {
+		dma_sync_single_for_device(dev, page_to_phys(pages[i]),
+					   PAGE_SIZE, DMA_BIDIRECTIONAL);
+	}
 
 	return 0;
 }
@@ -328,6 +325,13 @@  static int vgem_prime_pin(struct drm_gem_object *obj, struct device *dev)
 static void vgem_prime_unpin(struct drm_gem_object *obj, struct device *dev)
 {
 	struct drm_vgem_gem_object *bo = to_vgem_bo(obj);
+	long i, n_pages = obj->size >> PAGE_SHIFT;
+	struct page **pages = bo->pages;
+
+	for (i = 0; i < n_pages; i++) {
+		dma_sync_single_for_cpu(dev, page_to_phys(pages[i]),
+					PAGE_SIZE, DMA_BIDIRECTIONAL);
+	}
 
 	vgem_unpin_pages(bo);
 }
@@ -382,7 +386,7 @@  static void *vgem_prime_vmap(struct drm_gem_object *obj)
 	if (IS_ERR(pages))
 		return NULL;
 
-	return vmap(pages, n_pages, 0, pgprot_writecombine(PAGE_KERNEL));
+	return vmap(pages, n_pages, 0, PAGE_KERNEL);
 }
 
 static void vgem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
@@ -411,7 +415,7 @@  static int vgem_prime_mmap(struct drm_gem_object *obj,
 	fput(vma->vm_file);
 	vma->vm_file = get_file(obj->filp);
 	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
-	vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
 
 	return 0;
 }
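
The pin/unpin hunks above walk the object one page at a time, since shmem pages are not guaranteed to be physically contiguous. A minimal user-space sketch of that loop shape follows. It is not kernel code: `MODEL_PAGE_SHIFT`, `model_sync_fn`, `model_sync_pages` and `model_count_sync` are hypothetical names, and the callback only stands in for a per-page sync call.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB pages for this model; real kernels use PAGE_SHIFT. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Hypothetical stand-in for a per-page cache sync operation. */
typedef void (*model_sync_fn)(unsigned long phys, size_t len, void *ctx);

/*
 * Call sync() once per whole page of an object of obj_size bytes.
 * page_phys[] holds one (possibly non-contiguous) address per page.
 * Returns the number of pages visited.
 */
static long model_sync_pages(const unsigned long *page_phys, size_t obj_size,
			     model_sync_fn sync, void *ctx)
{
	long i, n_pages = (long)(obj_size >> MODEL_PAGE_SHIFT);

	for (i = 0; i < n_pages; i++)
		sync(page_phys[i], MODEL_PAGE_SIZE, ctx);

	return n_pages;
}

/* Example callback: accumulate the number of bytes "synced". */
static void model_count_sync(unsigned long phys, size_t len, void *ctx)
{
	(void)phys;
	*(size_t *)ctx += len;
}
```

The shift drops any partial trailing page, so the sketch assumes the object size is a multiple of the page size.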