diff mbox series

[1/3] virtio-gpu uapi: Add VIRTIO_GPU_F_EXPLICIT_FLUSH feature

Message ID 20210511083610.367541-1-vivek.kasireddy@intel.com (mailing list archive)
State New, archived
Headers show
Series [1/3] virtio-gpu uapi: Add VIRTIO_GPU_F_EXPLICIT_FLUSH feature | expand

Commit Message

Vivek Kasireddy May 11, 2021, 8:36 a.m. UTC
This feature enables the Guest to wait until a flush has been
performed on a buffer it has submitted to the Host.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 include/uapi/linux/virtio_gpu.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Gerd Hoffmann May 11, 2021, 10:29 a.m. UTC | #1
On Tue, May 11, 2021 at 01:36:08AM -0700, Vivek Kasireddy wrote:
> This feature enables the Guest to wait until a flush has been
> performed on a buffer it has submitted to the Host.

This needs a virtio-spec update documenting the new feature.

> +	VIRTIO_GPU_CMD_WAIT_FLUSH,

Why a new command?

If I understand it correctly you want wait until
VIRTIO_GPU_CMD_RESOURCE_FLUSH is done.  We could
extend the VIRTIO_GPU_CMD_RESOURCE_FLUSH command
for that instead.

take care,
  Gerd
Vivek Kasireddy May 11, 2021, 8:53 p.m. UTC | #2
Hi Gerd,

> On Tue, May 11, 2021 at 01:36:08AM -0700, Vivek Kasireddy wrote:
> > This feature enables the Guest to wait until a flush has been
> > performed on a buffer it has submitted to the Host.
> 
> This needs a virtio-spec update documenting the new feature.
[Kasireddy, Vivek] Yes, I was planning to do that after getting your 
thoughts on this feature.

> 
> > +	VIRTIO_GPU_CMD_WAIT_FLUSH,
> 
> Why a new command?
> 
> If I understand it correctly you want wait until
> VIRTIO_GPU_CMD_RESOURCE_FLUSH is done.  We could
> extend the VIRTIO_GPU_CMD_RESOURCE_FLUSH command
> for that instead.
[Kasireddy, Vivek] VIRTIO_GPU_CMD_RESOURCE_FLUSH can trigger/queue a
redraw that may be performed synchronously or asynchronously depending on the
UI (Glarea is async and gtk-egl is sync but can be made async). I'd like to make the
Guest wait until the actual redraw happens (until GlFLush or eglSwapBuffers, again
depending on the UI). 

However, as part of this feature (explicit flush), I'd like to make the Guest wait until
the current resource (as specified by resource_flush or set_scanout) is flushed or
synchronized. But for a different feature I am thinking of (explicit sync), I'd like to
make the Guest wait for the previous buffer/resource submitted (available via 
old_state->fb).

I think it may be possible to accomplish both features by overloading resource_flush
but given the various combinations of Guests (Android/Chrome OS, Windows, Linux)
and Hosts (Android/Chrome OS, Linux) that are or will be supported with virtio-gpu +
i915, I figured adding a new command might be cleaner.

Thanks,
Vivek


> 
> take care,
>   Gerd
Gerd Hoffmann May 12, 2021, 6:44 a.m. UTC | #3
Hi,

> However, as part of this feature (explicit flush), I'd like to make the Guest wait until
> the current resource (as specified by resource_flush or set_scanout) is flushed or
> synchronized. But for a different feature I am thinking of (explicit sync), I'd like to
> make the Guest wait for the previous buffer/resource submitted (available via 
> old_state->fb).

For page-flipping I guess?  i.e. you want submit a new framebuffer, then
wait until the host doesn't need the previous one?  That is likewise
linked to a command, although it is set_scanout this time.

So, right now qemu simply queues the request and completes the command
when a guest sends a resource_flush our set_scanout command.  You want
be notified when the host is actually done processing the request.

I still think it makes sense extend the resource_flush and set_scanout
commands for that, for example by adding a flag for the flags field in
the request header.  That way it is clear what exactly you are waiting
for.  You can also attach a fence to the request which you can later
wait on.

take care,
  Gerd
Vivek Kasireddy May 12, 2021, 9:18 p.m. UTC | #4
Hi Gerd,

> > However, as part of this feature (explicit flush), I'd like to make the Guest wait until
> > the current resource (as specified by resource_flush or set_scanout) is flushed or
> > synchronized. But for a different feature I am thinking of (explicit sync), I'd like to
> > make the Guest wait for the previous buffer/resource submitted (available via
> > old_state->fb).
> 
> For page-flipping I guess?  i.e. you want submit a new framebuffer, then
> wait until the host doesn't need the previous one?  That is likewise
> linked to a command, although it is set_scanout this time.
[Kasireddy, Vivek] Mainly for page-flipping but I'd also like to have fbcon, Xorg that
do frontbuffer rendering/updates to work seamlessly as well.

> 
> So, right now qemu simply queues the request and completes the command
> when a guest sends a resource_flush our set_scanout command.  You want
> be notified when the host is actually done processing the request.
[Kasireddy, Vivek] Correct, that is exactly what I want -- make the Guest wait
until it gets notified that the Host is completely done processing/using the fb.
However, there can be two resources the guest can be made to wait on: wait for
the new/current fb that is being submitted to be processed (explicit flush) or wait
for the previous fb that was submitted earlier (in the previous repaint cycle)
to be processed (explicit sync).

IIUC, Explicit sync only makes sense if 1) the Host windowing system also supports
that feature/protocol (currently only upstream Weston does but I'd like to add it to
Mutter if no one else does) or if there is a way to figure out (dma-buf sync file?) if
the Host has completely processed the fb and 2) if Qemu UI is not doing a blit and
instead submitting the guest fb/dmabuf directly to the Host windowing system.
As you are aware, 2) can possibly be done with dbus/pipewire Qemu UI backends
(I'll explore this soon) but not with GTK or SDL. 

Ideally, I'd like to have Explicit sync but until 2) above happens, Explicit flush can
be a reasonable alternative given the blit that happens with GTK/SDL backends. 
By making the Guest wait until the UI has submitted the buffer to the Host 
windowing system, we can theoretically tie the Guest repaint freq to the Host
vblank and thus have Guest apps render at 60 FPS by default instead of 90 FPS
that I see without expflush. This feature would also help Windows guests that
can only do frontbuffer rendering with Blobs.

> 
> I still think it makes sense extend the resource_flush and set_scanout
> commands for that, for example by adding a flag for the flags field in
> the request header.  That way it is clear what exactly you are waiting
> for.  You can also attach a fence to the request which you can later
> wait on.
[Kasireddy, Vivek] Yeah, I am reluctant to add a new cmd as well but I want
it to work for all Guests and Hosts. Anyway, let me try your idea of adding
specific flags and see if it works.

Thanks,
Vivek

> 
> take care,
>   Gerd
Gerd Hoffmann May 14, 2021, 10:38 a.m. UTC | #5
On Wed, May 12, 2021 at 09:18:37PM +0000, Kasireddy, Vivek wrote:
> Hi Gerd,
> 
> > > However, as part of this feature (explicit flush), I'd like to make the Guest wait until
> > > the current resource (as specified by resource_flush or set_scanout) is flushed or
> > > synchronized. But for a different feature I am thinking of (explicit sync), I'd like to
> > > make the Guest wait for the previous buffer/resource submitted (available via
> > > old_state->fb).
> > 
> > For page-flipping I guess?  i.e. you want submit a new framebuffer, then
> > wait until the host doesn't need the previous one?  That is likewise
> > linked to a command, although it is set_scanout this time.
> [Kasireddy, Vivek] Mainly for page-flipping but I'd also like to have fbcon, Xorg that
> do frontbuffer rendering/updates to work seamlessly as well.
> 
> > 
> > So, right now qemu simply queues the request and completes the command
> > when a guest sends a resource_flush our set_scanout command.  You want
> > be notified when the host is actually done processing the request.
> [Kasireddy, Vivek] Correct, that is exactly what I want -- make the Guest wait
> until it gets notified that the Host is completely done processing/using the fb.
> However, there can be two resources the guest can be made to wait on: wait for
> the new/current fb that is being submitted to be processed (explicit flush)

That would be wait on resource_flush case, right?

> or wait for the previous fb that was submitted earlier (in the
> previous repaint cycle) to be processed (explicit sync).

That would be the wait on set_scanout case, right?

And it would effectively wait on the previous fb not being needed by the
host any more (because the page-flip to the new fb completed) so the
guest can re-use the previous fb to render the next frame, right?

(also when doing front-buffer rendering with xorg/fbcon and then doing a
virtual console switch the guest could wait for the console switch being
completed).

> IIUC, Explicit sync only makes sense if 1) the Host windowing system also supports
> that feature/protocol (currently only upstream Weston does but I'd like to add it to
> Mutter if no one else does) or if there is a way to figure out (dma-buf sync file?) if
> the Host has completely processed the fb and 2) if Qemu UI is not doing a blit and
> instead submitting the guest fb/dmabuf directly to the Host windowing system.
> As you are aware, 2) can possibly be done with dbus/pipewire Qemu UI backends
> (I'll explore this soon) but not with GTK or SDL. 

Well, I think we need to clearly define the wait flag semantics.  Should
resource_flush with wait flag wait until the host is done reading the
resource (blit done)?  Or should it wait until the host screen has been
updated (gtk draw callback completed)?

Everything else will be a host/guest implementation detail then, and
of course this needs some integration with the UI on the host side and
different UIs might have to do different things.

On the guest side integrating this with fences will give us enough
flexibility on how we want handle the waits.  Simplest would be to just
block.  We could implement virtual vblanks, which would probably make
most userspace work fine without explicit virtio-gpu support.  If needed
we could even give userspace access to the fence so it can choose how to
wait.

take care,
  Gerd
Vivek Kasireddy May 18, 2021, 5:45 a.m. UTC | #6
Hi Gerd,

> > [Kasireddy, Vivek] Correct, that is exactly what I want -- make the Guest wait
> > until it gets notified that the Host is completely done processing/using the fb.
> > However, there can be two resources the guest can be made to wait on: wait for
> > the new/current fb that is being submitted to be processed (explicit flush)
> 
> That would be wait on resource_flush case, right?
[Kasireddy, Vivek] Yes, correct.

> 
> > or wait for the previous fb that was submitted earlier (in the
> > previous repaint cycle) to be processed (explicit sync).
> 
> That would be the wait on set_scanout case, right?
[Kasireddy, Vivek] Right.

> 
> And it would effectively wait on the previous fb not being needed by the
> host any more (because the page-flip to the new fb completed) so the
> guest can re-use the previous fb to render the next frame, right?
[Kasireddy, Vivek] Yup.

> 
> (also when doing front-buffer rendering with xorg/fbcon and then doing a
> virtual console switch the guest could wait for the console switch being
> completed).
> 
> > IIUC, Explicit sync only makes sense if 1) the Host windowing system also supports
> > that feature/protocol (currently only upstream Weston does but I'd like to add it to
> > Mutter if no one else does) or if there is a way to figure out (dma-buf sync file?) if
> > the Host has completely processed the fb and 2) if Qemu UI is not doing a blit and
> > instead submitting the guest fb/dmabuf directly to the Host windowing system.
> > As you are aware, 2) can possibly be done with dbus/pipewire Qemu UI backends
> > (I'll explore this soon) but not with GTK or SDL.
> 
> Well, I think we need to clearly define the wait flag semantics. 
[Kasireddy, Vivek] At-least with our passthrough use-case (maybe not with Virgl),
I think we need to ensure the following criteria:
1) With Blobs, ensure that the Guest and Host would never use the dmabuf/FB at
the same time. 
2) The Guest should not render more frames than the refresh rate of the Host so
that GPU resources are not wasted.

> Should resource_flush with wait flag wait until the host is done reading the
> resource (blit done)?
[Kasireddy, Vivek] I started with this but did not find it useful as it did not meet
2) above. However, I think we could have a flag for this if the Guest is using a
virtual vblank/timer and only wants to wait until the blit is done.

> Or should it wait until the host screen has been
> updated (gtk draw callback completed)?
[Kasireddy, Vivek] This is what the last 7 patches of my Blob series (v3) do. So,
we'd want to have a separate flag for this as well. And, lastly, we are going to
need another flag for the set_scanout case where we wait for the previous
fb to be synchronized.

> 
> Everything else will be a host/guest implementation detail then, and
> of course this needs some integration with the UI on the host side and
> different UIs might have to do different things.
[Kasireddy, Vivek] Sure, I think we can start with GTK and go from there.

> 
> On the guest side integrating this with fences will give us enough
> flexibility on how we want handle the waits.  Simplest would be to just
> block. 
[Kasireddy, Vivek] I agree; simply blocking (dma_fence_wait) is more than
enough for most use-cases.

>We could implement virtual vblanks, which would probably make
> most userspace work fine without explicit virtio-gpu support.  If needed
> we could even give userspace access to the fence so it can choose how to
> wait.
[Kasireddy, Vivek] Virtual vblanks is not a bad idea but I think blocking with
fences in the Guest kernel space seems more simpler. And, sharing fences with
the Guest compositor is also very interesting but I suspect we might need to
modify the compositor for this use-case, which might be a non-starter. Lastly,
even with virtual vblanks, we still need to make sure that we meet the two
criteria mentioned above. 

Thanks,
Vivek
Vivek Kasireddy May 24, 2021, 7:55 p.m. UTC | #7
Hi Gerd,
Any further comments on this?

Thanks,
Vivek

> 
> Hi Gerd,
> 
> > > [Kasireddy, Vivek] Correct, that is exactly what I want -- make the
> > > Guest wait until it gets notified that the Host is completely done processing/using the
> fb.
> > > However, there can be two resources the guest can be made to wait
> > > on: wait for the new/current fb that is being submitted to be
> > > processed (explicit flush)
> >
> > That would be wait on resource_flush case, right?
> [Kasireddy, Vivek] Yes, correct.
> 
> >
> > > or wait for the previous fb that was submitted earlier (in the
> > > previous repaint cycle) to be processed (explicit sync).
> >
> > That would be the wait on set_scanout case, right?
> [Kasireddy, Vivek] Right.
> 
> >
> > And it would effectively wait on the previous fb not being needed by
> > the host any more (because the page-flip to the new fb completed) so
> > the guest can re-use the previous fb to render the next frame, right?
> [Kasireddy, Vivek] Yup.
> 
> >
> > (also when doing front-buffer rendering with xorg/fbcon and then doing
> > a virtual console switch the guest could wait for the console switch
> > being completed).
> >
> > > IIUC, Explicit sync only makes sense if 1) the Host windowing system
> > > also supports that feature/protocol (currently only upstream Weston
> > > does but I'd like to add it to Mutter if no one else does) or if
> > > there is a way to figure out (dma-buf sync file?) if the Host has
> > > completely processed the fb and 2) if Qemu UI is not doing a blit and instead
> submitting the guest fb/dmabuf directly to the Host windowing system.
> > > As you are aware, 2) can possibly be done with dbus/pipewire Qemu UI
> > > backends (I'll explore this soon) but not with GTK or SDL.
> >
> > Well, I think we need to clearly define the wait flag semantics.
> [Kasireddy, Vivek] At-least with our passthrough use-case (maybe not with Virgl), I think
> we need to ensure the following criteria:
> 1) With Blobs, ensure that the Guest and Host would never use the dmabuf/FB at the same
> time.
> 2) The Guest should not render more frames than the refresh rate of the Host so that GPU
> resources are not wasted.
> 
> > Should resource_flush with wait flag wait until the host is done
> > reading the resource (blit done)?
> [Kasireddy, Vivek] I started with this but did not find it useful as it did not meet
> 2) above. However, I think we could have a flag for this if the Guest is using a virtual
> vblank/timer and only wants to wait until the blit is done.
> 
> > Or should it wait until the host screen has been updated (gtk draw
> > callback completed)?
> [Kasireddy, Vivek] This is what the last 7 patches of my Blob series (v3) do. So, we'd want
> to have a separate flag for this as well. And, lastly, we are going to need another flag for
> the set_scanout case where we wait for the previous fb to be synchronized.
> 
> >
> > Everything else will be a host/guest implementation detail then, and
> > of course this needs some integration with the UI on the host side and
> > different UIs might have to do different things.
> [Kasireddy, Vivek] Sure, I think we can start with GTK and go from there.
> 
> >
> > On the guest side integrating this with fences will give us enough
> > flexibility on how we want handle the waits.  Simplest would be to
> > just block.
> [Kasireddy, Vivek] I agree; simply blocking (dma_fence_wait) is more than enough for
> most use-cases.
> 
> >We could implement virtual vblanks, which would probably make  most
> >userspace work fine without explicit virtio-gpu support.  If needed  we
> >could even give userspace access to the fence so it can choose how to
> >wait.
> [Kasireddy, Vivek] Virtual vblanks is not a bad idea but I think blocking with fences in the
> Guest kernel space seems more simpler. And, sharing fences with the Guest compositor is
> also very interesting but I suspect we might need to modify the compositor for this use-
> case, which might be a non-starter. Lastly, even with virtual vblanks, we still need to make
> sure that we meet the two criteria mentioned above.
> 
> Thanks,
> Vivek
diff mbox series

Patch

diff --git a/include/uapi/linux/virtio_gpu.h b/include/uapi/linux/virtio_gpu.h
index 97523a95781d..8fe58657a473 100644
--- a/include/uapi/linux/virtio_gpu.h
+++ b/include/uapi/linux/virtio_gpu.h
@@ -60,6 +60,11 @@ 
  */
 #define VIRTIO_GPU_F_RESOURCE_BLOB       3
 
+/*
+ * VIRTIO_GPU_CMD_WAIT_FLUSH
+ */
+#define VIRTIO_GPU_F_EXPLICIT_FLUSH      4
+
 enum virtio_gpu_ctrl_type {
 	VIRTIO_GPU_UNDEFINED = 0,
 
@@ -78,6 +83,7 @@  enum virtio_gpu_ctrl_type {
 	VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID,
 	VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB,
 	VIRTIO_GPU_CMD_SET_SCANOUT_BLOB,
+	VIRTIO_GPU_CMD_WAIT_FLUSH,
 
 	/* 3d commands */
 	VIRTIO_GPU_CMD_CTX_CREATE = 0x0200,
@@ -441,4 +447,10 @@  struct virtio_gpu_resource_unmap_blob {
 	__le32 padding;
 };
 
+/* VIRTIO_GPU_CMD_WAIT_FLUSH */
+struct virtio_gpu_wait_flush {
+	struct virtio_gpu_ctrl_hdr hdr;
+	__le32 resource_id;
+};
+
 #endif