[v2,15/15] drm/bochs: reserve bo for pin/unpin

Message ID 20190108112519.27473-16-kraxel@redhat.com (mailing list archive)
State New, archived
Series drm/bochs: cleanups, atomic modesetting, generic fbdev.

Commit Message

Gerd Hoffmann Jan. 8, 2019, 11:25 a.m. UTC
The buffer object must be reserved before calling
ttm_bo_validate for pinning/unpinning.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/gpu/drm/bochs/bochs_mm.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Daniel Vetter Jan. 9, 2019, 10:10 a.m. UTC | #1
On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote:
> The buffer object must be reserved before calling
> ttm_bo_validate for pinning/unpinning.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Seems a bit of a bisect fumble in your series here: the legacy kms code
reserved the ttm bo before calling bochs_bo_pin/unpin, but your atomic
code doesn't. I think pushing this into bochs_bo_pin/unpin makes sense
for atomic, but to avoid a bisect failure I think you need to have these
reservations temporarily in your cleanup/prepare_plane functions too.
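
For illustration, a minimal sketch of what such a temporary reservation in
the atomic prepare_fb hook could look like. It would live only in the
earlier patches of the series and be dropped again by this one, since once
bochs_bo_pin/unpin reserve internally an outer reserve would deadlock
(helper names like gem_to_bochs_bo follow the bochs driver, but the exact
code is an assumption, not part of the posted series):

	/* Interim fix only, mirroring the legacy kms path; needs the bochs
	 * driver headers plus <drm/drm_plane.h> and <drm/ttm/ttm_bo_api.h>.
	 */
	static int bochs_plane_prepare_fb(struct drm_plane *plane,
					  struct drm_plane_state *new_state)
	{
		struct bochs_bo *bo;
		int ret;

		if (!new_state->fb)
			return 0;
		bo = gem_to_bochs_bo(new_state->fb->obj[0]);

		/* Reserve around the pin, as the legacy code did ... */
		ret = ttm_bo_reserve(&bo->bo, true, false, NULL);
		if (ret)
			return ret;
		ret = bochs_bo_pin(bo, TTM_PL_FLAG_VRAM);
		/* ... and drop the reservation again afterwards. */
		ttm_bo_unreserve(&bo->bo);
		return ret;
	}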

Looked through the entire series, this here is the only issue I think
should be fixed before merging (making atomic_enable optional can be done
as a follow-up if you feel like it). With that addressed on the series:

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/bochs/bochs_mm.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
> index cfe061c25f..970a591908 100644
> --- a/drivers/gpu/drm/bochs/bochs_mm.c
> +++ b/drivers/gpu/drm/bochs/bochs_mm.c
> @@ -223,7 +223,11 @@ int bochs_bo_pin(struct bochs_bo *bo, u32 pl_flag)
>  	bochs_ttm_placement(bo, pl_flag);
>  	for (i = 0; i < bo->placement.num_placement; i++)
>  		bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;
> +	ret = ttm_bo_reserve(&bo->bo, true, false, NULL);
> +	if (ret)
> +		return ret;
>  	ret = ttm_bo_validate(&bo->bo, &bo->placement, &ctx);
> +	ttm_bo_unreserve(&bo->bo);
>  	if (ret)
>  		return ret;
>  
> @@ -247,7 +251,11 @@ int bochs_bo_unpin(struct bochs_bo *bo)
>  
>  	for (i = 0; i < bo->placement.num_placement; i++)
>  		bo->placements[i].flags &= ~TTM_PL_FLAG_NO_EVICT;
> +	ret = ttm_bo_reserve(&bo->bo, true, false, NULL);
> +	if (ret)
> +		return ret;
>  	ret = ttm_bo_validate(&bo->bo, &bo->placement, &ctx);
> +	ttm_bo_unreserve(&bo->bo);
>  	if (ret)
>  		return ret;
>  
> -- 
> 2.9.3
> 
Gerd Hoffmann Jan. 9, 2019, 2:54 p.m. UTC | #2
On Wed, Jan 09, 2019 at 11:10:44AM +0100, Daniel Vetter wrote:
> On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote:
> > The buffer object must be reserved before calling
> > ttm_bo_validate for pinning/unpinning.
> > 
> > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> 
> Seems a bit of a bisect fumble in your series here: the legacy kms code
> reserved the ttm bo before calling bochs_bo_pin/unpin, but your atomic
> code doesn't. I think pushing this into bochs_bo_pin/unpin makes sense
> for atomic, but to avoid a bisect failure I think you need to have these
> reservations temporarily in your cleanup/prepare_plane functions too.

I think I've sorted that.  Have some other changes too, will probably
send v3 tomorrow.

> Looked through the entire series, this here is the only issue I think
> should be fixed before merging (making atomic_enable optional can be done
> as a follow-up if you feel like it). With that addressed on the series:
> 
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Thanks.

While we're at it:  I'm also looking at dma-buf export and import
support for the qemu drivers.

Right now both qxl and virtio have gem_prime_get_sg_table and
gem_prime_import_sg_table handlers which throw a WARN_ONCE() and return
an error.

If I understand things correctly it is valid to set all import/export
callbacks (prime_handle_to_fd, prime_fd_to_handle,
gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
supporting dma-buf import/export and still advertise DRIVER_PRIME to
indicate the other prime callbacks are supported (so generic fbdev
emulation can use gem_prime_vmap etc).  Is that correct?
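
For concreteness, the driver setup being asked about would look roughly
like this (the foo_* names are placeholders, and whether this combination
is actually valid is exactly the question):

	static struct drm_driver foo_driver = {
		.driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_PRIME,
		/* .prime_handle_to_fd / .prime_fd_to_handle left NULL, so
		 * userspace should not see the import/export caps ... */
		/* .gem_prime_get_sg_table / .gem_prime_import_sg_table left
		 * NULL as well, instead of returning an error */
		.gem_prime_vmap   = foo_gem_prime_vmap,
		.gem_prime_vunmap = foo_gem_prime_vunmap,
		.gem_prime_mmap   = foo_gem_prime_mmap,
		/* ... remaining fops, gem and modeset callbacks ... */
	};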

On exporting:

TTM_PL_TT should be easy, just pin the buffer, grab the pages list and
feed that into drm_prime_pages_to_sg.  Didn't try yet though.  Is that
approach correct?
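
A hedged sketch of that untried approach, assuming the bo is pinned first,
its ttm->pages array is populated, and the current drm_prime_pages_to_sg()
signature; foo_bo and gem_to_foo_bo are placeholder names:

	static struct sg_table *foo_gem_prime_get_sg_table(struct drm_gem_object *obj)
	{
		struct foo_bo *bo = gem_to_foo_bo(obj);

		/* only works for populated TTM_PL_TT objects */
		if (!bo->bo.ttm || !bo->bo.ttm->pages)
			return ERR_PTR(-EINVAL);

		return drm_prime_pages_to_sg(bo->bo.ttm->pages,
					     bo->bo.ttm->num_pages);
	}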

Is it possible to export TTM_PL_VRAM objects (with backing storage being
a pci memory bar)?  If so, how?

On importing:

Importing into TTM_PL_TT object looks easy again, at least when the
object is actually stored in RAM.  What if not?

Importing into TTM_PL_VRAM:  Impossible I think, without copying over
the data.  Should that be done?  If so, how?  Or is it better to just
not support import then?

thanks,
  Gerd
Daniel Vetter Jan. 9, 2019, 5:35 p.m. UTC | #3
On Wed, Jan 9, 2019 at 3:54 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> On Wed, Jan 09, 2019 at 11:10:44AM +0100, Daniel Vetter wrote:
> > On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote:
> > > The buffer object must be reserved before calling
> > > ttm_bo_validate for pinning/unpinning.
> > >
> > > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> >
> > Seems a bit of a bisect fumble in your series here: the legacy kms code
> > reserved the ttm bo before calling bochs_bo_pin/unpin, but your atomic
> > code doesn't. I think pushing this into bochs_bo_pin/unpin makes sense
> > for atomic, but to avoid a bisect failure I think you need to have these
> > reservations temporarily in your cleanup/prepare_plane functions too.
>
> I think I've sorted that.  Have some other changes too, will probably
> send v3 tomorrow.
>
> > Looked through the entire series, this here is the only issue I think
> > should be fixed before merging (making atomic_enable optional can be done
> > as a follow-up if you feel like it). With that addressed on the series:
> >
> > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Thanks.
>
> While we're at it:  I'm also looking at dma-buf export and import
> support for the qemu drivers.
>
> Right now both qxl and virtio have gem_prime_get_sg_table and
> gem_prime_import_sg_table handlers which throw a WARN_ONCE() and return
> an error.
>
> If I understand things correctly it is valid to set all import/export
> callbacks (prime_handle_to_fd, prime_fd_to_handle,
> gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
> supporting dma-buf import/export and still advertise DRIVER_PRIME to
> indicate the other prime callbacks are supported (so generic fbdev
> emulation can use gem_prime_vmap etc).  Is that correct?

I'm not sure how much that's a good idea ... Never thought about it
tbh. All the fbdev/dma-buf stuff has plenty of hacks and
inconsistencies still, so I guess we can't make it much worse really.

> On exporting:
>
> TTM_PL_TT should be easy, just pin the buffer, grab the pages list and
> feed that into drm_prime_pages_to_sg.  Didn't try yet though.  Is that
> approach correct?
>
> Is it possible to export TTM_PL_VRAM objects (with backing storage being
> a pci memory bar)?  If so, how?

Not really in general. amdgpu upcasts to amdgpu_bo (if it's an amdgpu BO)
and then knows the internals so it can do a proper pci peer2peer
mapping. Or at least there's been lots of patches floating around to
make that happen.

I think other drivers migrate the bo out of VRAM.

> On importing:
>
> Importing into TTM_PL_TT object looks easy again, at least when the
> object is actually stored in RAM.  What if not?

They are all supposed to be stored in RAM. Note that all current ttm
importers totally break the abstraction, by taking the sg list,
throwing the dma mapping away and assuming there's a struct page
backing it. Would be good if we could stop spreading that abuse - the
dma-buf interfaces have been modelled after the ttm bo interfaces, so
shouldn't be too hard to wire this up correctly.

> Importing into TTM_PL_VRAM:  Impossible I think, without copying over
> the data.  Should that be done?  If so, how?  Or is it better to just
> not support import then?

Hm, since you ask about TTM concepts and not what this means in terms
of dma-buf: As long as you upcast to the ttm_bo you can do whatever
you want to really. But with plain dma-buf this doesn't work right now
(not least because ttm assumes it gets system RAM on import; in theory you
could put the peer2peer dma mapping into the sg list and it should
work).
-Daniel

>
> thanks,
>   Gerd
>
Alex Deucher Jan. 9, 2019, 6:45 p.m. UTC | #4
On Wed, Jan 9, 2019 at 12:36 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Jan 9, 2019 at 3:54 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
> >
> > On Wed, Jan 09, 2019 at 11:10:44AM +0100, Daniel Vetter wrote:
> > > On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote:
> > > > The buffer object must be reserved before calling
> > > > ttm_bo_validate for pinning/unpinning.
> > > >
> > > > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> > >
> > > Seems a bit of a bisect fumble in your series here: the legacy kms code
> > > reserved the ttm bo before calling bochs_bo_pin/unpin, but your atomic
> > > code doesn't. I think pushing this into bochs_bo_pin/unpin makes sense
> > > for atomic, but to avoid a bisect failure I think you need to have these
> > > reservations temporarily in your cleanup/prepare_plane functions too.
> >
> > I think I've sorted that.  Have some other changes too, will probably
> > send v3 tomorrow.
> >
> > > Looked through the entire series, this here is the only issue I think
> > > should be fixed before merging (making atomic_enable optional can be done
> > > as a follow-up if you feel like it). With that addressed on the series:
> > >
> > > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >
> > Thanks.
> >
> > While we're at it:  I'm also looking at dma-buf export and import
> > support for the qemu drivers.
> >
> > Right now both qxl and virtio have gem_prime_get_sg_table and
> > gem_prime_import_sg_table handlers which throw a WARN_ONCE() and return
> > an error.
> >
> > If I understand things correctly it is valid to set all import/export
> > callbacks (prime_handle_to_fd, prime_fd_to_handle,
> > gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
> > supporting dma-buf import/export and still advertise DRIVER_PRIME to
> > indicate the other prime callbacks are supported (so generic fbdev
> > emulation can use gem_prime_vmap etc).  Is that correct?
>
> I'm not sure how much that's a good idea ... Never thought about it
> tbh. All the fbdev/dma-buf stuff has plenty of hacks and
> inconsistencies still, so I guess we can't make it much worse really.
>
> > On exporting:
> >
> > TTM_PL_TT should be easy, just pin the buffer, grab the pages list and
> > feed that into drm_prime_pages_to_sg.  Didn't try yet though.  Is that
> > approach correct?
> >
> > Is it possible to export TTM_PL_VRAM objects (with backing storage being
> > a pci memory bar)?  If so, how?
>
> Not really in general. amdgpu upcasts to amdgpu_bo (if it's an amdgpu BO)
> and then knows the internals so it can do a proper pci peer2peer
> mapping. Or at least there's been lots of patches floating around to
> make that happen.

Here's Christian's WIP stuff for adding device memory support to dma-buf:
https://cgit.freedesktop.org/~deathsimple/linux/log/?h=p2p

Alex

>
> I think other drivers migrate the bo out of VRAM.
>
> > On importing:
> >
> > Importing into TTM_PL_TT object looks easy again, at least when the
> > object is actually stored in RAM.  What if not?
>
> They are all supposed to be stored in RAM. Note that all current ttm
> importers totally break the abstraction, by taking the sg list,
> throwing the dma mapping away and assuming there's a struct page
> backing it. Would be good if we could stop spreading that abuse - the
> dma-buf interfaces have been modelled after the ttm bo interfaces, so
> shouldn't be too hard to wire this up correctly.
>
> > Importing into TTM_PL_VRAM:  Impossible I think, without copying over
> > the data.  Should that be done?  If so, how?  Or is it better to just
> > not support import then?
>
> Hm, since you ask about TTM concepts and not what this means in terms
> of dma-buf: As long as you upcast to the ttm_bo you can do whatever
> you want to really. But with plain dma-buf this doesn't work right now
> (not least because ttm assumes it gets system RAM on import; in theory you
> could put the peer2peer dma mapping into the sg list and it should
> work).
> -Daniel
>
> >
> > thanks,
> >   Gerd
> >
Gerd Hoffmann Jan. 9, 2019, 8:51 p.m. UTC | #5
Hi,

> > If I understand things correctly it is valid to set all import/export
> > callbacks (prime_handle_to_fd, prime_fd_to_handle,
> > gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
> > supporting dma-buf import/export and still advertise DRIVER_PRIME to
> > indicate the other prime callbacks are supported (so generic fbdev
> > emulation can use gem_prime_vmap etc).  Is that correct?
> 
> I'm not sure how much that's a good idea ... Never thought about it
> tbh. All the fbdev/dma-buf stuff has plenty of hacks and
> inconsistencies still, so I guess we can't make it much worse really.

Setting prime_handle_to_fd + prime_fd_to_handle to NULL has the effect
that drm stops advertising DRM_PRIME_CAP_{IMPORT,EXPORT} to userspace.

Which looks better to me than telling userspace we support it and then
throwing errors unconditionally when userspace tries to use it.
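
The logic in question sits in drm_getcap() in drm_ioctl.c; at the time of
writing it looks roughly like this, so leaving both callbacks NULL clears
both capability bits:

	case DRM_CAP_PRIME:
		req->value |= dev->driver->prime_fd_to_handle ?
			DRM_PRIME_CAP_IMPORT : 0;
		req->value |= dev->driver->prime_handle_to_fd ?
			DRM_PRIME_CAP_EXPORT : 0;
		break;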

> > Is it possible to export TTM_PL_VRAM objects (with backing storage being
> > a pci memory bar)?  If so, how?
> 
> Not really in general. amdgpu upcasts to amdgpu_bo (if it's an amdgpu BO)
> and then knows the internals so it can do a proper pci peer2peer
> mapping. Or at least there's been lots of patches floating around to
> make that happen.

That is limited to bo sharing between two amdgpu devices, correct?

> I think other drivers migrate the bo out of VRAM.

Well, that doesn't look too useful.  bochs and qxl virtual hardware
can't access buffers outside VRAM.  So, while I could migrate the
buffers to RAM (via memcpy) when exporting, they would at the same
time become unusable for the GPU ...

> > On importing:
> >
> > Importing into TTM_PL_TT object looks easy again, at least when the
> > object is actually stored in RAM.  What if not?
> 
> They are all supposed to be stored in RAM. Note that all current ttm
> importers totally break the abstraction, by taking the sg list,
> throwing the dma mapping away and assuming there's a struct page
> backing it. Would be good if we could stop spreading that abuse - the
> dma-buf interfaces have been modelled after the ttm bo interfaces, so
> shouldn't be too hard to wire this up correctly.

Ok.  With virtio-gpu (where objects are backed by RAM pages anyway)
wiring this up should be easy.

But given there is no correct sample code I can look at, it would be cool
if you could give some more hints on how this is supposed to work.  The
gem_prime_import_sg_table() callback gets a sg list passed in after all,
so I probably would have tried to take the sg list too ...

> > Importing into TTM_PL_VRAM:  Impossible I think, without copying over
> > the data.  Should that be done?  If so, how?  Or is it better to just
> > not support import then?
> 
> Hm, since you ask about TTM concepts and not what this means in terms
> of dma-buf:

Ok, more details on the question:

dma-buf: whatever the driver gets passed into the
gem_prime_import_sg_table() callback.

import into TTM_PL_VRAM: a qemu driver which supports VRAM storage only
(bochs, qxl), so the buffer has to be stored there if we want to do
something with it (like scanning out to a crtc).

> As long as you upcast to the ttm_bo you can do whatever
> you want to really.

Well, if the dma-buf comes from another device (say, export a vgem bo,
then try to import it into bochs/qxl/virtio) I can't upcast.

When the dma-buf comes from the same device drm_gem_prime_import_dev()
will notice and take a shortcut (skip import, just increase refcount
instead), so I don't have to worry about that case in the
gem_prime_import_sg_table() callback.
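
That shortcut in drm_prime.c's drm_gem_prime_import_dev() looks roughly
like this at the time of writing:

	if (dma_buf->ops == &drm_gem_prime_dmabuf_ops) {
		obj = dma_buf->priv;
		if (obj->dev == dev) {
			/* importing our own dmabuf: take a gem object
			 * reference instead of a dma-buf file reference */
			drm_gem_object_get(obj);
			return obj;
		}
	}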

> But with plain dma-buf this doesn't work right now
> (not least because ttm assumes it gets system RAM on import; in theory you
> could put the peer2peer dma mapping into the sg list and it should
> work).

Well, qemu display devices don't have peer2peer dma support.
So I guess the answer is "doesn't work".

cheers,
  Gerd
Daniel Vetter Jan. 9, 2019, 9:45 p.m. UTC | #6
On Wed, Jan 9, 2019 at 9:52 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
>
>   Hi,
>
> > > If I understand things correctly it is valid to set all import/export
> > > callbacks (prime_handle_to_fd, prime_fd_to_handle,
> > > gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
> > > supporting dma-buf import/export and still advertise DRIVER_PRIME to
> > > indicate the other prime callbacks are supported (so generic fbdev
> > > emulation can use gem_prime_vmap etc).  Is that correct?
> >
> > I'm not sure how much that's a good idea ... Never thought about it
> > tbh. All the fbdev/dma-buf stuff has plenty of hacks and
> > inconsistencies still, so I guess we can't make it much worse really.
>
> Setting prime_handle_to_fd + prime_fd_to_handle to NULL has the effect
> that drm stops advertising DRM_PRIME_CAP_{IMPORT,EXPORT} to userspace.
>
> Which looks better to me than telling userspace we support it and then
> throwing errors unconditionally when userspace tries to use it.
>
> > > Is it possible to export TTM_PL_VRAM objects (with backing storage being
> > > a pci memory bar)?  If so, how?
> >
> > Not really in general. amdgpu upcasts to amdgpu_bo (if it's an amdgpu BO)
> > and then knows the internals so it can do a proper pci peer2peer
> > mapping. Or at least there's been lots of patches floating around to
> > make that happen.
>
> That is limited to bo sharing between two amdgpu devices, correct?
>
> > I think other drivers migrate the bo out of VRAM.
>
> Well, that doesn't look too useful.  bochs and qxl virtual hardware
> can't access buffers outside VRAM.  So, while I could migrate the
> buffers to RAM (via memcpy) when exporting, they would at the same
> time become unusable for the GPU ...
>
> > > On importing:
> > >
> > > Importing into TTM_PL_TT object looks easy again, at least when the
> > > object is actually stored in RAM.  What if not?
> >
> > They are all supposed to be stored in RAM. Note that all current ttm
> > importers totally break the abstraction, by taking the sg list,
> > throwing the dma mapping away and assuming there's a struct page
> > backing it. Would be good if we could stop spreading that abuse - the
> > dma-buf interfaces have been modelled after the ttm bo interfaces, so
> > shouldn't be too hard to wire this up correctly.
>
> Ok.  With virtio-gpu (where objects are backed by RAM pages anyway)
> wiring this up should be easy.
>
> But given there is no correct sample code I can look at, it would be cool
> if you could give some more hints on how this is supposed to work.  The
> gem_prime_import_sg_table() callback gets a sg list passed in after all,
> so I probably would have tried to take the sg list too ...

I'm not a fan of that helper either; that's really the broken part
imo. i915 doesn't use it. It's a midlayer so that the nvidia blob can
avoid directly touching the EXPORT_SYMBOL_GPL dma-buf symbols; afaiui
there's really no other solid reason for it. What the new gem cma
helper does is imo much better (it still uses the import_sg_table
midlayer, but oh well).

For ttm you'd need to make sure that all the various ttm cpu-side
access functions go through the relevant dma-buf interfaces, and not
through the struct page list fished out of the sgt. That was at least
the idea, long ago.
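
A hedged sketch of that idea for a vmap-style access path; foo_bo and
foo_bo_vmap_ttm are placeholder names, and the point is only that the
imported case defers to the exporter via <linux/dma-buf.h> instead of
touching struct pages from the sgt:

	static void *foo_bo_vmap(struct foo_bo *bo)
	{
		/* imported buffer: let the exporter do the cpu mapping ... */
		if (bo->gem.import_attach)
			return dma_buf_vmap(bo->gem.import_attach->dmabuf);

		/* ... native buffer: use the driver's own ttm kmap path */
		return foo_bo_vmap_ttm(bo);
	}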

> > > Importing into TTM_PL_VRAM:  Impossible I think, without copying over
> > > the data.  Should that be done?  If so, how?  Or is it better to just
> > > not support import then?
> >
> > Hm, since you ask about TTM concepts and not what this means in terms
> > of dma-buf:
>
> Ok, more details on the question:
>
> dma-buf: whatever the driver gets passed into the
> gem_prime_import_sg_table() callback.
>
> import into TTM_PL_VRAM: a qemu driver which supports VRAM storage only
> (bochs, qxl), so the buffer has to be stored there if we want to do
> something with it (like scanning out to a crtc).
>
> > As long as you upcast to the ttm_bo you can do whatever
> > you want to really.
>
> Well, if the dma-buf comes from another device (say, export a vgem bo,
> then try to import it into bochs/qxl/virtio) I can't upcast.

In that case you'll in practice only get system RAM, and you're not
allowed to move it (dma-buf is meant to be zero-copy after all). If
your hw can't scan these out directly, then userspace needs to arrange
for a buffer copy into a native buffer somehow (that's how Xorg prime
works, at least I think). No idea whether your virtual gpus can make
use of that directly. You might also get some pci peer2peer range in
the future, but it's strictly opt-in (because there are too many
dma-buf importers that just blindly assume there's a struct page
behind the sgt).

> When the dma-buf comes from the same device drm_gem_prime_import_dev()
> will notice and take a shortcut (skip import, just increase refcount
> instead), so I don't have to worry about that case in the
> gem_prime_import_sg_table() callback.

You can also upcast if it's from the same driver, not just the same device.
-Daniel

> > But with plain dma-buf this doesn't work right now
> > (not least because ttm assumes it gets system RAM on import; in theory you
> > could put the peer2peer dma mapping into the sg list and it should
> > work).
>
> Well, qemu display devices don't have peer2peer dma support.
> So I guess the answer is "doesn't work".
>
> cheers,
>   Gerd
>
Patch

diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index cfe061c25f..970a591908 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -223,7 +223,11 @@ int bochs_bo_pin(struct bochs_bo *bo, u32 pl_flag)
 	bochs_ttm_placement(bo, pl_flag);
 	for (i = 0; i < bo->placement.num_placement; i++)
 		bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;
+	ret = ttm_bo_reserve(&bo->bo, true, false, NULL);
+	if (ret)
+		return ret;
 	ret = ttm_bo_validate(&bo->bo, &bo->placement, &ctx);
+	ttm_bo_unreserve(&bo->bo);
 	if (ret)
 		return ret;
 
@@ -247,7 +251,11 @@ int bochs_bo_unpin(struct bochs_bo *bo)
 
 	for (i = 0; i < bo->placement.num_placement; i++)
 		bo->placements[i].flags &= ~TTM_PL_FLAG_NO_EVICT;
+	ret = ttm_bo_reserve(&bo->bo, true, false, NULL);
+	if (ret)
+		return ret;
 	ret = ttm_bo_validate(&bo->bo, &bo->placement, &ctx);
+	ttm_bo_unreserve(&bo->bo);
 	if (ret)
 		return ret;