[2/2] drm/vgem: use normal cached mmap'ings

Message ID 20190716164221.15436-2-robdclark@gmail.com (mailing list archive)
State New, archived
Series [1/2] drm/gem: don't force writecombine mmap'ing

Commit Message

Rob Clark July 16, 2019, 4:42 p.m. UTC
From: Rob Clark <robdclark@chromium.org>

Since there is no real device associated with vgem, it is impossible to
end up with appropriate dev->dma_ops, meaning that we have no way to
invalidate the shmem pages allocated by vgem.  So, at least on platforms
without drm_clflush_pages(), we end up with corruption when cache lines
from previous usage of vgem bo pages get evicted to memory.

The only sane option is to use cached mappings.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
Possibly we could dma_sync_*_for_{device,cpu}() on dmabuf attach/detach,
although the ->gem_prime_{pin,unpin}() API isn't quite ideal for that as
it is.  And that doesn't really help for drivers that don't attach/
detach for each use.

But AFAICT vgem is mainly used for dmabuf testing, so maybe we don't
need to care too much about use of cached mmap'ings.
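
For illustration, a rough and untested sketch of the attach-time half
of that idea (assuming vgem's existing vgem_pin_pages() helper;
unpin/cleanup paths omitted):

static int vgem_dmabuf_attach(struct dma_buf *dmabuf,
			      struct dma_buf_attachment *attach)
{
	struct drm_vgem_gem_object *bo = to_vgem_bo(dmabuf->priv);
	unsigned int n_pages = dmabuf->size >> PAGE_SHIFT;
	struct page **pages;
	struct sg_table *sgt;

	pages = vgem_pin_pages(bo);
	if (IS_ERR(pages))
		return PTR_ERR(pages);

	sgt = drm_prime_pages_to_sg(pages, n_pages);
	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	/* Flush dirty CPU cache lines to memory before the importer
	 * DMAs from the pages.  vgem has no device of its own, so the
	 * importer's device has to stand in.
	 */
	dma_sync_sg_for_device(attach->dev, sgt->sgl, sgt->nents,
			       DMA_BIDIRECTIONAL);

	sg_free_table(sgt);
	kfree(sgt);
	return 0;
}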

 drivers/gpu/drm/vgem/vgem_drv.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Comments

Chris Wilson July 16, 2019, 4:59 p.m. UTC | #1
Quoting Rob Clark (2019-07-16 17:42:15)
> From: Rob Clark <robdclark@chromium.org>
> 
> Since there is no real device associated with vgem, it is impossible to
> end up with appropriate dev->dma_ops, meaning that we have no way to
> invalidate the shmem pages allocated by vgem.  So, at least on platforms
> without drm_clflush_pages(), we end up with corruption when cache lines
> from previous usage of vgem bo pages get evicted to memory.
> 
> The only sane option is to use cached mappings.
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
> Possibly we could dma_sync_*_for_{device,cpu}() on dmabuf attach/detach,
> although the ->gem_prime_{pin,unpin}() API isn't quite ideal for that as
> it is.  And that doesn't really help for drivers that don't attach/
> detach for each use.
> 
> But AFAICT vgem is mainly used for dmabuf testing, so maybe we don't
> need to care too much about use of cached mmap'ings.

Sadly this regresses with i915 interop.

Starting subtest: 4KiB-tiny-vgem-blt-early-read-child
(gem_concurrent_blit:8309) CRITICAL: Test assertion failure function dmabuf_cmp_bo, file ../tests/i915/gem_concurrent_all.c:408:
(gem_concurrent_blit:8309) CRITICAL: Failed assertion: v[((y)*(b->width) + (((y) + pass)%(b->width)))] == val
(gem_concurrent_blit:8309) CRITICAL: error: 0 != 0xdeadbeef

and igt/prime_vgem

Can you please cc intel-gfx so CI can pick up these changes?
-Chris
Rob Clark July 16, 2019, 5:03 p.m. UTC | #2
On Tue, Jul 16, 2019 at 10:01 AM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Quoting Rob Clark (2019-07-16 17:42:15)
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Since there is no real device associated with vgem, it is impossible to
> > end up with appropriate dev->dma_ops, meaning that we have no way to
> > invalidate the shmem pages allocated by vgem.  So, at least on platforms
> > without drm_clflush_pages(), we end up with corruption when cache lines
> > from previous usage of vgem bo pages get evicted to memory.
> >
> > The only sane option is to use cached mappings.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> > Possibly we could dma_sync_*_for_{device,cpu}() on dmabuf attach/detach,
> > although the ->gem_prime_{pin,unpin}() API isn't quite ideal for that as
> > it is.  And that doesn't really help for drivers that don't attach/
> > detach for each use.
> >
> > But AFAICT vgem is mainly used for dmabuf testing, so maybe we don't
> > need to care too much about use of cached mmap'ings.
>
> Sadly this regresses with i915 interop.
>
> Starting subtest: 4KiB-tiny-vgem-blt-early-read-child
> (gem_concurrent_blit:8309) CRITICAL: Test assertion failure function dmabuf_cmp_bo, file ../tests/i915/gem_concurrent_all.c:408:
> (gem_concurrent_blit:8309) CRITICAL: Failed assertion: v[((y)*(b->width) + (((y) + pass)%(b->width)))] == val
> (gem_concurrent_blit:8309) CRITICAL: error: 0 != 0xdeadbeef
>
> and igt/prime_vgem
>
> Can you please cc intel-gfx so CI can pick up these changes?
> -Chris

I suppose CI is actually reading the imported VGEM bo from the GPU?  I
can try to wire up the attach/detach dma_sync, which might help...
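
The detach side would be the mirror image, roughly (untested; sgt and
attach as in the attach-time sketch above):

	/* Invalidate CPU cache lines after the importer is done, so
	 * the CPU doesn't read back stale data that DMA overwrote.
	 */
	dma_sync_sg_for_cpu(attach->dev, sgt->sgl, sgt->nents,
			    DMA_BIDIRECTIONAL);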

BR,
-R
Daniel Vetter July 19, 2019, 9:09 a.m. UTC | #3
On Tue, Jul 16, 2019 at 09:42:15AM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Since there is no real device associated with vgem, it is impossible to
> end up with appropriate dev->dma_ops, meaning that we have no way to
> invalidate the shmem pages allocated by vgem.  So, at least on platforms
> without drm_clflush_pages(), we end up with corruption when cache lines
> from previous usage of vgem bo pages get evicted to memory.
> 
> The only sane option is to use cached mappings.
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
> Possibly we could dma_sync_*_for_{device,cpu}() on dmabuf attach/detach,
> although the ->gem_prime_{pin,unpin}() API isn't quite ideal for that as
> it is.  And that doesn't really help for drivers that don't attach/
> detach for each use.
> 
> But AFAICT vgem is mainly used for dmabuf testing, so maybe we don't
> need to care too much about use of cached mmap'ings.

Isn't this going to horribly break testing buffer sharing with SoC
devices? I'd assume they all expect writecombining mode to make sure stuff
is coherent?

Also could we get away with this by simply extending drm_clflush_pages for
those arm platforms where we do have a clflush instruction? Yes I know
that'll get people screaming, I'll shrug :-)
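
A hand-wavy, untested sketch of what that could look like on arm64,
where __flush_dcache_area() does a clean+invalidate to the point of
coherency:

void drm_clflush_pages(struct page *pages[], unsigned long num_pages)
{
#if defined(CONFIG_X86)
	/* ... existing clflush/wbinvd paths ... */
#elif defined(CONFIG_ARM64)
	unsigned long i;

	for (i = 0; i < num_pages; i++) {
		void *virt = kmap_atomic(pages[i]);

		/* Clean + invalidate to PoC so DMA sees what the CPU
		 * wrote, and the CPU won't read back stale lines.
		 */
		__flush_dcache_area(virt, PAGE_SIZE);
		kunmap_atomic(virt);
	}
#endif
}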

If all we need patch 1/2 for is this vgem patch then the auditing needed for
patch 1 doesn't look appealing ...
-Daniel

> 
>  drivers/gpu/drm/vgem/vgem_drv.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
> index 11a8f99ba18c..ccf0c3fbd586 100644
> --- a/drivers/gpu/drm/vgem/vgem_drv.c
> +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> @@ -259,9 +259,6 @@ static int vgem_mmap(struct file *filp, struct vm_area_struct *vma)
>  	if (ret)
>  		return ret;
>  
> -	/* Keep the WC mmaping set by drm_gem_mmap() but our pages
> -	 * are ordinary and not special.
> -	 */
>  	vma->vm_flags = flags | VM_DONTEXPAND | VM_DONTDUMP;
>  	return 0;
>  }
> @@ -382,7 +379,7 @@ static void *vgem_prime_vmap(struct drm_gem_object *obj)
>  	if (IS_ERR(pages))
>  		return NULL;
>  
> -	return vmap(pages, n_pages, 0, pgprot_writecombine(PAGE_KERNEL));
> +	return vmap(pages, n_pages, 0, PAGE_KERNEL);
>  }
>  
>  static void vgem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
> @@ -411,7 +408,7 @@ static int vgem_prime_mmap(struct drm_gem_object *obj,
>  	fput(vma->vm_file);
>  	vma->vm_file = get_file(obj->filp);
>  	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
> -	vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
> +	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>  
>  	return 0;
>  }
> -- 
> 2.21.0
>
Rob Clark July 19, 2019, 3:04 p.m. UTC | #4
On Fri, Jul 19, 2019 at 2:09 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Jul 16, 2019 at 09:42:15AM -0700, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Since there is no real device associated with vgem, it is impossible to
> > end up with appropriate dev->dma_ops, meaning that we have no way to
> > invalidate the shmem pages allocated by vgem.  So, at least on platforms
> > without drm_clflush_pages(), we end up with corruption when cache lines
> > from previous usage of vgem bo pages get evicted to memory.
> >
> > The only sane option is to use cached mappings.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> > Possibly we could dma_sync_*_for_{device,cpu}() on dmabuf attach/detach,
> > although the ->gem_prime_{pin,unpin}() API isn't quite ideal for that as
> > it is.  And that doesn't really help for drivers that don't attach/
> > detach for each use.
> >
> > But AFAICT vgem is mainly used for dmabuf testing, so maybe we don't
> > need to care too much about use of cached mmap'ings.
>
> Isn't this going to horribly break testing buffer sharing with SoC
> devices? I'd assume they all expect writecombining mode to make sure stuff
> is coherent?
>
> Also could we get away with this by simply extending drm_clflush_pages for
> those arm platforms where we do have a clflush instruction? Yes I know
> that'll get people screaming, I'll shrug :-)
>
> If all we need patch 1/2 for is this vgem patch then the auditing needed for
> patch 1 doesn't look appealing ...

I think we should go with the simpler approach that keeps WC (though it
kinda relies on an implementation detail of dma-mapping, i.e.
dev->dma_ops==NULL => dma_direct).

IMO the first patch in this series is probably something we should try
to do somehow; it is a bit rude that core helpers force WC.  But I'm
not sure how to land that smoothly.  Perhaps it's worth adding to the
TODO list at any rate.
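
For reference, the implementation detail in question, paraphrased from
the 5.2-era dma-mapping core (kernel/dma/mapping.c):

dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
		size_t offset, size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	/* No per-device dma_ops installed => take the dma-direct
	 * path, which includes the arch cache maintenance we want.
	 * Nothing guarantees this fallback stays this way, hence
	 * "relies on an implementation detail".
	 */
	if (dma_is_direct(ops))
		return dma_direct_map_page(dev, page, offset, size,
					   dir, attrs);

	return ops->map_page(dev, page, offset, size, dir, attrs);
}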

BR,
-R

> -Daniel
>
> >
> >  drivers/gpu/drm/vgem/vgem_drv.c | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
> > index 11a8f99ba18c..ccf0c3fbd586 100644
> > --- a/drivers/gpu/drm/vgem/vgem_drv.c
> > +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> > @@ -259,9 +259,6 @@ static int vgem_mmap(struct file *filp, struct vm_area_struct *vma)
> >       if (ret)
> >               return ret;
> >
> > -     /* Keep the WC mmaping set by drm_gem_mmap() but our pages
> > -      * are ordinary and not special.
> > -      */
> >       vma->vm_flags = flags | VM_DONTEXPAND | VM_DONTDUMP;
> >       return 0;
> >  }
> > @@ -382,7 +379,7 @@ static void *vgem_prime_vmap(struct drm_gem_object *obj)
> >       if (IS_ERR(pages))
> >               return NULL;
> >
> > -     return vmap(pages, n_pages, 0, pgprot_writecombine(PAGE_KERNEL));
> > +     return vmap(pages, n_pages, 0, PAGE_KERNEL);
> >  }
> >
> >  static void vgem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
> > @@ -411,7 +408,7 @@ static int vgem_prime_mmap(struct drm_gem_object *obj,
> >       fput(vma->vm_file);
> >       vma->vm_file = get_file(obj->filp);
> >       vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
> > -     vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
> > +     vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> >
> >       return 0;
> >  }
> > --
> > 2.21.0
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Patch

diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
index 11a8f99ba18c..ccf0c3fbd586 100644
--- a/drivers/gpu/drm/vgem/vgem_drv.c
+++ b/drivers/gpu/drm/vgem/vgem_drv.c
@@ -259,9 +259,6 @@  static int vgem_mmap(struct file *filp, struct vm_area_struct *vma)
 	if (ret)
 		return ret;
 
-	/* Keep the WC mmaping set by drm_gem_mmap() but our pages
-	 * are ordinary and not special.
-	 */
 	vma->vm_flags = flags | VM_DONTEXPAND | VM_DONTDUMP;
 	return 0;
 }
@@ -382,7 +379,7 @@  static void *vgem_prime_vmap(struct drm_gem_object *obj)
 	if (IS_ERR(pages))
 		return NULL;
 
-	return vmap(pages, n_pages, 0, pgprot_writecombine(PAGE_KERNEL));
+	return vmap(pages, n_pages, 0, PAGE_KERNEL);
 }
 
 static void vgem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
@@ -411,7 +408,7 @@  static int vgem_prime_mmap(struct drm_gem_object *obj,
 	fput(vma->vm_file);
 	vma->vm_file = get_file(obj->filp);
 	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
-	vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
 
 	return 0;
 }