Message ID | 20230712224424.30158-4-dongwon.kim@intel.com (mailing list archive)
---|---
State | New, archived
Series | drm/virtio: synchronous guest framebuffer update
On 7/13/23 01:44, Dongwon Kim wrote:
> This helper is needed for framebuffer synchronization. Old framebuffer data
> is often displayed on the guest display without this helper.
>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
> Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
> ---
>  drivers/gpu/drm/virtio/virtgpu_plane.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
> index a063f06ab6c5..e197299489ce 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_plane.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
> @@ -26,6 +26,7 @@
>  #include <drm/drm_atomic_helper.h>
>  #include <drm/drm_damage_helper.h>
>  #include <drm/drm_fourcc.h>
> +#include <drm/drm_gem_atomic_helper.h>
>
>  #include "virtgpu_drv.h"
>
> @@ -271,6 +272,9 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
>  	vgfb = to_virtio_gpu_framebuffer(new_state->fb);
>  	vgplane_st = to_virtio_gpu_plane_state(new_state);
>  	bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
> +
> +	drm_gem_plane_helper_prepare_fb(plane, new_state);

The implicit display BO sync should happen on the host side, unless you're
rendering with Venus and then displaying with virgl. Doing it on the guest
side should be a major performance hit. Please provide a complete
description of your setup: what VMM you use, config options, what tests
you're running.
On 8/17/2023 7:33 PM, Dmitry Osipenko wrote:
> On 7/13/23 01:44, Dongwon Kim wrote:
>> This helper is needed for framebuffer synchronization. Old framebuffer data
>> is often displayed on the guest display without this helper.
>>
>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
>> Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
>> ---
>>  drivers/gpu/drm/virtio/virtgpu_plane.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
>> index a063f06ab6c5..e197299489ce 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_plane.c
>> +++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
>> @@ -26,6 +26,7 @@
>>  #include <drm/drm_atomic_helper.h>
>>  #include <drm/drm_damage_helper.h>
>>  #include <drm/drm_fourcc.h>
>> +#include <drm/drm_gem_atomic_helper.h>
>>
>>  #include "virtgpu_drv.h"
>>
>> @@ -271,6 +272,9 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
>>  	vgfb = to_virtio_gpu_framebuffer(new_state->fb);
>>  	vgplane_st = to_virtio_gpu_plane_state(new_state);
>>  	bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
>> +
>> +	drm_gem_plane_helper_prepare_fb(plane, new_state);
> The implicit display BO sync should happen on a host side, unless you're
> rendering with Venus and then displaying with virgl. Doing it on guest
> side should be a major performance hit. Please provide a complete
> description of your setup: what VMM you use, config options, what tests
> you're running.

In our setup we use virtio-gpu as the KMS device and i915 as the render
device, and we use QEMU as the VMM. The virtio-gpu driver flushes the
scanout to QEMU as a blob resource (a reference to the buffer). QEMU then
creates a dmabuf for the blob using udmabuf and renders it as a texture
with OpenGL. The test I ran is simple: start a terminal app and type to see
whether there is any frame regression. I believe this helper is required
since the BO on the guest is basically a dmabuf shared between the i915 and
virtio-gpu drivers. I hadn't thought about the performance impact. If the
impact is too high to be acceptable, are there other suggestions or tests I
can try?

Thanks!
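For reference, the udmabuf flow described above can be sketched as follows:
the VMM wraps the guest pages backing the blob (exposed through a memfd) in
a dmabuf via /dev/udmabuf. This is only a minimal illustration of the
mechanism, not QEMU's actual implementation:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/udmabuf.h>

/* Wrap a page-aligned range of a sealed memfd (guest memory) in a dmabuf.
 * Returns the dmabuf fd, or -1 on error. Simplified sketch only. */
static int blob_to_dmabuf(int memfd, __u64 offset, __u64 size)
{
	struct udmabuf_create create = {
		.memfd  = memfd,   /* memfd must have F_SEAL_SHRINK set */
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = offset,  /* page-aligned offset into the memfd */
		.size   = size,    /* page-aligned size of the blob */
	};
	int devfd, buffd;

	devfd = open("/dev/udmabuf", O_RDWR);
	if (devfd < 0)
		return -1;

	buffd = ioctl(devfd, UDMABUF_CREATE, &create);
	close(devfd);
	return buffd;
}

The returned fd can then be imported into the host GL stack (for example via
EGL_EXT_image_dma_buf_import) and sampled as a texture.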
On 8/20/23 23:58, Kim, Dongwon wrote:
> On 8/17/2023 7:33 PM, Dmitry Osipenko wrote:
>> On 7/13/23 01:44, Dongwon Kim wrote:
>>> This helper is needed for framebuffer synchronization. Old
>>> framebuffer data
>>> is often displayed on the guest display without this helper.
>>>
>>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>>> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
>>> Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
>>> ---
>>>  drivers/gpu/drm/virtio/virtgpu_plane.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c
>>> b/drivers/gpu/drm/virtio/virtgpu_plane.c
>>> index a063f06ab6c5..e197299489ce 100644
>>> --- a/drivers/gpu/drm/virtio/virtgpu_plane.c
>>> +++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
>>> @@ -26,6 +26,7 @@
>>>  #include <drm/drm_atomic_helper.h>
>>>  #include <drm/drm_damage_helper.h>
>>>  #include <drm/drm_fourcc.h>
>>> +#include <drm/drm_gem_atomic_helper.h>
>>>  #include "virtgpu_drv.h"
>>> @@ -271,6 +272,9 @@ static int virtio_gpu_plane_prepare_fb(struct
>>> drm_plane *plane,
>>>  	vgfb = to_virtio_gpu_framebuffer(new_state->fb);
>>>  	vgplane_st = to_virtio_gpu_plane_state(new_state);
>>>  	bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
>>> +
>>> +	drm_gem_plane_helper_prepare_fb(plane, new_state);
>> The implicit display BO sync should happen on a host side, unless you're
>> rendering with Venus and then displaying with virgl. Doing it on guest
>> side should be a major performance hit. Please provide a complete
>> description of your setup: what VMM you use, config options, what tests
>> you're running.
>
> We use virtio-gpu as a kms device while using i915 as the render device
> in our setup. And we use QEMU as VMM. Virtio-gpu driver flushes the
> scanout to QEMU as a blob resource (reference to the buffer). QEMU then
> creates a dmabuf using udmabuf for the blob then renders it as a texture
> using OGL. The test I ran is simple. Just starting terminal app and
> typing things to see if there is any frame regression. I believe this
> helper is required since the BO on the guest is basically dmabuf that is
> being shared between i915 and virtio-gpu driver. I didn't think about
> the performance impact. If the impact is too much and that is not
> acceptable, is there any other suggestions or some tests I can try?

You can do fence-wait in the guest userspace/Mesa after blitting/drawing
to the udmabuf.

You may run popular vk/gl gfx benchmarks using gl/sdl outputs to see the
fps impact.

Virglrenderer today supports native contexts. The method you're using for
GPU priming was proven to be slow in comparison to multi-GPU native
contexts. There is ongoing work on supporting fence passing from guest to
host [1], which allows fence-syncing to be done on the host. You'll find
links to the WIP virtio-intel native context in [1] as well. You won't
find GPU priming support using native contexts in [1]; those patches
haven't been published yet.

[1] https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1138

Note that in general it's not acceptable to upstream patches that serve
downstream only. Your display sync issue is irrelevant to the upstream
stack unless you're going to upstream all the VMM and guest userspace
patches, and in that case you should always publish all the patches and
provide links.

So, you need to check the performance impact and publish all the patches
to the relevant upstream projects.
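As a rough illustration of the fence-wait being suggested here, guest
userspace could block on a GL sync object after drawing into the shared
buffer and before handing it to the display path. This is only a sketch of
the idea using standard GL sync objects, with epoxy assumed as the GL
loader; it is not the actual Mesa change under discussion:

#include <epoxy/gl.h>

/* Wait (up to 50 ms) for all GL commands that render into the shared
 * dmabuf-backed framebuffer to complete before the buffer is flipped. */
static void wait_for_render_done(void)
{
	GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

	if (!fence)
		return;

	/* Flush so the fence is actually submitted, then block on it. */
	glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
			 50 * 1000 * 1000 /* ns */);
	glDeleteSync(fence);
}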
On 8/23/2023 8:52 PM, Dmitry Osipenko wrote:
> On 8/20/23 23:58, Kim, Dongwon wrote:
>> On 8/17/2023 7:33 PM, Dmitry Osipenko wrote:
>>> On 7/13/23 01:44, Dongwon Kim wrote:
>>>> This helper is needed for framebuffer synchronization. Old
>>>> framebuffer data
>>>> is often displayed on the guest display without this helper.
>>>>
>>>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>>>> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
>>>> Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
>>>> ---
>>>>  drivers/gpu/drm/virtio/virtgpu_plane.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c
>>>> b/drivers/gpu/drm/virtio/virtgpu_plane.c
>>>> index a063f06ab6c5..e197299489ce 100644
>>>> --- a/drivers/gpu/drm/virtio/virtgpu_plane.c
>>>> +++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
>>>> @@ -26,6 +26,7 @@
>>>>  #include <drm/drm_atomic_helper.h>
>>>>  #include <drm/drm_damage_helper.h>
>>>>  #include <drm/drm_fourcc.h>
>>>> +#include <drm/drm_gem_atomic_helper.h>
>>>>  #include "virtgpu_drv.h"
>>>> @@ -271,6 +272,9 @@ static int virtio_gpu_plane_prepare_fb(struct
>>>> drm_plane *plane,
>>>>  	vgfb = to_virtio_gpu_framebuffer(new_state->fb);
>>>>  	vgplane_st = to_virtio_gpu_plane_state(new_state);
>>>>  	bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
>>>> +
>>>> +	drm_gem_plane_helper_prepare_fb(plane, new_state);
>>> The implicit display BO sync should happen on a host side, unless you're
>>> rendering with Venus and then displaying with virgl. Doing it on guest
>>> side should be a major performance hit. Please provide a complete
>>> description of your setup: what VMM you use, config options, what tests
>>> you're running.
>> We use virtio-gpu as a kms device while using i915 as the render device
>> in our setup. And we use QEMU as VMM. Virtio-gpu driver flushes the
>> scanout to QEMU as a blob resource (reference to the buffer). QEMU then
>> creates a dmabuf using udmabuf for the blob then renders it as a texture
>> using OGL. The test I ran is simple. Just starting terminal app and
>> typing things to see if there is any frame regression. I believe this
>> helper is required since the BO on the guest is basically dmabuf that is
>> being shared between i915 and virtio-gpu driver. I didn't think about
>> the performance impact. If the impact is too much and that is not
>> acceptable, is there any other suggestions or some tests I can try?
> You can do fence-wait in the guest userspace/Mesa after blitting/drawing
> to the udmabuf.

There is already synchronization between QEMU and the virtio-gpu driver on
the guest. Upon resource flush, the virtio-gpu driver waits for QEMU's
response to the command, and QEMU sends the response once rendering is
done. The problem we are seeing is not that the rendering side reuses the
buffer before QEMU displays it. The problem we are facing is rather that a
frame is often not yet finished when "resource-flush" is issued. So unless
there is a way for QEMU to wait for this fence (guest DRM), I think we
should have some synchronization point on the guest side.

I saw that other DRM drivers (omap, tegra, vc4 and so on) do something
similar, so I guess this is a generic solution for such cases. But I do
understand your concern, as the primary use case of the virtio-gpu driver
is virgl, and this extra wait would cost some performance. I have a couple
of points here as well:

1. Wouldn't the extra wait caused by drm_gem_plane_helper_prepare_fb be
minimal, since the actual rendering is done on the host?

2. Can we call this helper only when virgl is not used as the 3D driver?

> You may run popular vk/gl gfx benchmarks using gl/sdl outputs to see the
> fps impact.

ok

> Virglrender today supports native contexts. The method you're using for
> GPU priming was proven to be slow in comparison to multi-gpu native
> contexts. There is ongoing work for supporting fence passing from guest
> to host [1] that allows to do fence-syncing on host. You'll find links
> to the WIP virtio-intel native context in [1] as well. You won't find
> GPU priming support using native context in [1], patches hasn't been
> published yet.
>
> [1]
> https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1138
>
> Note that in general it's not acceptable to upstream patches that serve
> downstream only. Yours display sync issue is irrelevant to the upstream
> stack unless you're going to upstream all the VMM and guest userspace
> patches, and in such case you should always publish all the patches and
> provide links.
>
> So, you need to check the performance impact and publish all the patches
> to the relevant upstream projects.

QEMU already has all the patches for this (blob scanout support), but the
guest Mesa patch for KMSRO is still outstanding:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592

I understand this specific patch will need more discussion/justification,
but what about the other two? Are you generally OK with those in the same
series?

Thanks!!
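For the question in point 2 of the mail above, one hypothetical way to gate
the wait is on the 3D capability flag the driver already keeps in its
device structure. The following is only a sketch of the idea, not a tested
patch; the field and helper names follow the upstream virtio-gpu driver and
the usual driver includes are assumed:

/* Sketch: attach/wait on the implicit fence of the framebuffer BO only
 * when virgl 3D is not in use, i.e. in the blob-scanout (GPU priming)
 * configuration discussed in this thread. */
static int virtio_gpu_plane_prepare_fb_sync(struct drm_plane *plane,
					    struct drm_plane_state *new_state)
{
	struct virtio_gpu_device *vgdev = plane->dev->dev_private;

	if (!new_state->fb)
		return 0;

	/* With virgl the host is expected to handle implicit sync. */
	if (vgdev->has_virgl_3d)
		return 0;

	return drm_gem_plane_helper_prepare_fb(plane, new_state);
}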
On 8/24/23 20:58, Kim, Dongwon wrote:
...
>> You can do fence-wait in the guest userspace/Mesa after blitting/drawing
>> to the udmabuf.
>
> There is already synchronization between QEMU and virtio-gpu driver on
> the guest. Upon resource flush, virtio-gpu waits for the response for
> the message from the QEMU and QEMU sends out the response once rendering
> is done. The problem we are seeing is not that the rendering part is
> reusing the buffer before it's displayed by the QEMU. Problem we are
> facing is more like some frame is often not finished when
> "resource-flush" is issued. So unless there is a way for QEMU to wait
> for this fence (guest drm), I think we should have some synchronization
> point in the guest side.
>
> I saw other DRM drivers, omap, tegra, vc4 and so on are doing the
> similar so I guess this is a generic solution for such cases. But I do
> understand your concern as the primary use case of virtio-gpu driver is
> for virgl. So this extra wait would cost some performance drop. But I
> have a couple of points here as well.
>
> 1. Wouldn't this extra wait caused by drm_gem_plane_helper_prepare_fb be
> minimal as the actual rendering is done in the host?
>
> 2. Can we just make this helper called only if virgl is not used as 3D
> driver?

The problem you described above shouldn't be resolved by your patch. You
need to wait for the FB to be released by the host's display, not for the
GPU to finish rendering on the guest. I.e. you're swapping display buffers
and your dGPU starts rendering to the buffer that is in active use by the
host's display, correct?

Maybe you need to do glFinish() on the host after swapping buffers? But
that will block the host for a long time. For now I don't have a solution.

>> Virglrender today supports native contexts. The method you're using for
>> GPU priming was proven to be slow in comparison to multi-gpu native
>> contexts. There is ongoing work for supporting fence passing from guest
>> to host [1] that allows to do fence-syncing on host. You'll find links
>> to the WIP virtio-intel native context in [1] as well. You won't find
>> GPU priming support using native context in [1], patches hasn't been
>> published yet.
>>
>> [1]
>> https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1138
>>
>> Note that in general it's not acceptable to upstream patches that serve
>> downstream only. Yours display sync issue is irrelevant to the upstream
>> stack unless you're going to upstream all the VMM and guest userspace
>> patches, and in such case you should always publish all the patches and
>> provide links.
>>
>> So, you need to check the performance impact and publish all the patches
>> to the relevant upstream projects.
>
> QEMU has all patches regarding this (blob scanout support) but guest Mesa
> patch for KMSRO is still outstanding.
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592
>
> I understand this specific patch would need more discussion/justification
> but what about other two, are you generally ok with those in the same
> series?

The first patch should be dropped. The second one could be useful; you'll
need to provide step-by-step instructions for how to reproduce the
multi-display issue. Please write them in the cover letter for the next
patch version.
Hi Dmitry,

On 8/31/2023 3:51 PM, Dmitry Osipenko wrote:
> On 8/24/23 20:58, Kim, Dongwon wrote:
> ...
>>> You can do fence-wait in the guest userspace/Mesa after blitting/drawing
>>> to the udmabuf.
>> There is already synchronization between QEMU and virtio-gpu driver on
>> the guest. Upon resource flush, virtio-gpu waits for the response for
>> the message from the QEMU and QEMU sends out the response once rendering
>> is done. The problem we are seeing is not that the rendering part is
>> reusing the buffer before it's displayed by the QEMU. Problem we are
>> facing is more like some frame is often not finished when
>> "resource-flush" is issued. So unless there is a way for QEMU to wait
>> for this fence (guest drm), I think we should have some synchronization
>> point in the guest side.
>>
>> I saw other DRM drivers, omap, tegra, vc4 and so on are doing the
>> similar so I guess this is a generic solution for such cases. But I do
>> understand your concern as the primary use case of virtio-gpu driver is
>> for virgl. So this extra wait would cost some performance drop. But I
>> have a couple of points here as well.
>>
>> 1. Wouldn't this extra wait caused by drm_gem_plane_helper_prepare_fb be
>> minimal as the actual rendering is done in the host?
>>
>> 2. Can we just make this helper called only if virgl is not used as 3D
>> driver?
> The problem you described above shouldn't be resolved by your patch. You
> need to wait for FB to be released by the host's display and not to
> before GPU finished rendering on guest. I.e. you're swapping display
> buffers and your dGPU starts rendering to the buffer that is in active
> use by host's display, correct?

I don't believe the guest will start rendering to the same FB while the
host is consuming it, because the virtio-gpu driver on the guest won't
release the FB for the next frame before it gets the virtio response for
the resource-flush command, and the host (QEMU) holds the response until
the rendering is finished.

And adding this helper clearly fixes the issue we encountered (an old frame
is sometimes shown). But you might be right; the way I understood the
original problem might not be 100% correct, since it is based on my
assumption about how things were fixed by adding the helper. Can you help
me approach the real problem? If it's really fixed by the helper, what
would the original issue be?

> Maybe you need to do glFinish() on host after swapping buffers? But that
> will block host for a long time.

I can try glFinish() every time a frame is drawn on the host. But if that
were the issue, wouldn't I expect some skipped frames rather than old
frames?

> For now I don't have solution.
>
>>> Virglrender today supports native contexts. The method you're using for
>>> GPU priming was proven to be slow in comparison to multi-gpu native
>>> contexts. There is ongoing work for supporting fence passing from guest
>>> to host [1] that allows to do fence-syncing on host. You'll find links
>>> to the WIP virtio-intel native context in [1] as well. You won't find
>>> GPU priming support using native context in [1], patches hasn't been
>>> published yet.
>>>
>>> [1]
>>> https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1138
>>>
>>> Note that in general it's not acceptable to upstream patches that serve
>>> downstream only. Yours display sync issue is irrelevant to the upstream
>>> stack unless you're going to upstream all the VMM and guest userspace
>>> patches, and in such case you should always publish all the patches and
>>> provide links.
>>>
>>> So, you need to check the performance impact and publish all the patches
>>> to the relevant upstream projects.
>> QEMU has all patches regarding this (blob scanout support) but guest Mesa
>> patch for KMSRO is still outstanding.
>>
>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592
>>
>> I understand this specific patch would need more discussion/justification
>> but what about other two, are you generally ok with those in the same
>> series?
> The first patch should be dropped. The second one could be useful,

Yes, the first one will be gone. I will upload a revised version with only
the second one, along with a proper explanation.

> you'll need to provide step-by-step instruction for how to reproduce the
> multi-display issue, please write it in the cover-letter for the next
> patch version.

Thanks. Always appreciate your feedback!!
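To illustrate the glFinish() idea being discussed, the host display code
could force the GPU to drain after drawing the guest framebuffer texture
and before acknowledging the resource flush. This is only a minimal sketch
under the assumption of a GL-based scanout path; it is not QEMU's actual
display code, and the function name is made up for illustration:

#include <epoxy/gl.h>

/* Hypothetical host-side hook: after the draw call that samples the guest
 * framebuffer texture, block until the GPU has fully consumed it, so the
 * guest can safely reuse the buffer once it receives the flush response.
 * glFinish() waits for every queued command to complete, which is why it
 * is safe but potentially expensive. */
static void host_flush_guest_fb(GLuint fb_texture)
{
	glBindTexture(GL_TEXTURE_2D, fb_texture);
	/* ... draw call sampling fb_texture goes here ... */

	glFinish();
}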
On 9/6/23 00:08, Kim, Dongwon wrote:
>
> I don't believe the guest will start rendering on the same FB while host is
> consuming it because the virtio-gpu driver on the guest won't release
> the FB for the next frame before it gets the virtio resp for the resource
> flush command and the host (QEMU) will hold the response until the
> rendering is finished.

The virgl_cmd_set_scanout() shouldn't hold the response if you're using
SDL, because frame swapping won't be vsynced. It may hold the response
implicitly if you're using GTK for QEMU's display. Are you using SDL?
Hi Dmitry,

Resource flush is what waits for the fence to be signaled (in the current
code, before my patches are applied):

static void virtio_gpu_resource_flush(struct drm_plane *plane,
				      uint32_t x, uint32_t y,
				      uint32_t width, uint32_t height)
	......
	virtio_gpu_notify(vgdev);
	dma_fence_wait_timeout(&vgfb->fence->f, true,
			       msecs_to_jiffies(50));
	.......

We use GTK on the host.

Thanks!

-----Original Message-----
From: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Sent: Wednesday, October 4, 2023 4:32 PM
To: Kim, Dongwon <dongwon.kim@intel.com>; dri-devel@lists.freedesktop.org
Cc: Kasireddy, Vivek <vivek.kasireddy@intel.com>; kraxel@redhat.com
Subject: Re: [RFC PATCH 3/3] drm/virtio: drm_gem_plane_helper_prepare_fb for obj synchronization

On 9/6/23 00:08, Kim, Dongwon wrote:
>
> I don't believe the guest will start rendering on the same FB while
> host is consuming it because the virtio-gpu driver on the guest won't
> release the FB for the next frame before it gets the virtio resp for
> the resource flush command and the host (QEMU) will hold the response
> until the rendering is finished.

The virgl_cmd_set_scanout() shouldn't hold the response if you're using SDL
because frame swapping won't be vsynced. It may hold the response implicitly
if you're using GTK for the Qemu's display.

Are you using SDL?

--
Best regards,
Dmitry
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index a063f06ab6c5..e197299489ce 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -26,6 +26,7 @@
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_damage_helper.h>
 #include <drm/drm_fourcc.h>
+#include <drm/drm_gem_atomic_helper.h>

 #include "virtgpu_drv.h"

@@ -271,6 +272,9 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
 	vgfb = to_virtio_gpu_framebuffer(new_state->fb);
 	vgplane_st = to_virtio_gpu_plane_state(new_state);
 	bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
+
+	drm_gem_plane_helper_prepare_fb(plane, new_state);
+
 	if (!bo || (plane->type == DRM_PLANE_TYPE_PRIMARY && !bo->guest_blob))
 		return 0;
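For context, drm_gem_plane_helper_prepare_fb() essentially takes the
implicit (dma-resv) fences attached to the framebuffer's GEM object and
stores them in the plane state, so the atomic commit machinery waits for
them before the new framebuffer is scanned out. A simplified sketch of that
behaviour is below; the real helper lives in
drivers/gpu/drm/drm_gem_atomic_helper.c and additionally walks all
framebuffer planes and handles error paths, so treat this as an
approximation rather than the actual implementation:

/* Simplified sketch of what drm_gem_plane_helper_prepare_fb() does. */
static int prepare_fb_sketch(struct drm_plane *plane,
			     struct drm_plane_state *state)
{
	struct drm_gem_object *obj;
	struct dma_fence *fence;
	int ret;

	if (!state->fb)
		return 0;

	obj = drm_gem_fb_get_obj(state->fb, 0);

	/* Collect the pending write fences on the BO into a single fence... */
	ret = dma_resv_get_singleton(obj->resv, DMA_RESV_USAGE_WRITE, &fence);
	if (ret)
		return ret;

	/* ...and hand it to the atomic core, which waits on it before the
	 * plane update (and thus the host-side flush) is performed. */
	drm_atomic_set_fence_for_plane(state, fence);
	return 0;
}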
This helper is needed for framebuffer synchronization. Without it, old
(stale) framebuffer data is often displayed on the guest display.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
---
 drivers/gpu/drm/virtio/virtgpu_plane.c | 4 ++++
 1 file changed, 4 insertions(+)