[RfC] Add udmabuf misc device

Message ID	20180313154826.20436-1-kraxel@redhat.com (mailing list archive)
State	New, archived
Headers
Return-Path: <linux-media-owner@kernel.org>
From: Gerd Hoffmann <kraxel@redhat.com>
To: dri-devel@lists.freedesktop.org
Cc: qemu-devel@nongnu.org, Gerd Hoffmann <kraxel@redhat.com>, David Airlie <airlied@linux.ie>, Tomeu Vizoso <tomeu.vizoso@collabora.com>, Sumit Semwal <sumit.semwal@linaro.org>, linux-kernel@vger.kernel.org (open list), linux-media@vger.kernel.org (open list:DMA BUFFER SHARING FRAMEWORK), linaro-mm-sig@lists.linaro.org (moderated list:DMA BUFFER SHARING FRAMEWORK)
Subject: [RfC PATCH] Add udmabuf misc device
Date: Tue, 13 Mar 2018 16:48:26 +0100
Message-Id: <20180313154826.20436-1-kraxel@redhat.com>
Sender: linux-media-owner@vger.kernel.org
Precedence: bulk

Gerd Hoffmann March 13, 2018, 3:48 p.m. UTC

A driver to let userspace turn iovecs into dma-bufs.

Use case:  Allows qemu to pass around dmabufs for the guest framebuffer.
https://www.kraxel.org/cgit/qemu/log/?h=sirius/udmabuf has an
experimental patch.

Also allows qemu to export guest virtio-gpu resources as host dmabufs.
Should be possible to use it to display guest wayland windows on the
host display server.  virtio-gpu resources can be chunked so we will
actually need multiple iovec entries.  UNTESTED.

I want to collect some feedback on the general approach with this RfC series.
Can this work?  If not, better ideas?

Question:  Must this be hooked into some kind of mlock accounting, to
limit the amount of memory userspace is allowed to pin this way?  Or will
get_user_pages_fast() handle that for me?

Known issue:  Driver API isn't complete yet.  Need to add some flags, for
example to support read-only buffers.

Cc: David Airlie <airlied@linux.ie>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 include/uapi/linux/udmabuf.h |  21 ++++
 drivers/dma-buf/udmabuf.c    | 250 +++++++++++++++++++++++++++++++++++++++++++
 drivers/dma-buf/Kconfig      |   7 ++
 drivers/dma-buf/Makefile     |   1 +
 4 files changed, 279 insertions(+)
 create mode 100644 include/uapi/linux/udmabuf.h
 create mode 100644 drivers/dma-buf/udmabuf.c
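
For illustration, a minimal userspace sketch of how the proposed ioctl could be driven with several chunks. The struct layout and ioctl number are copied from the include/uapi/linux/udmabuf.h added by this RfC patch, so they are a proposal and not a stable ABI. The helper names build_create_req() and create_udmabuf() are hypothetical, and the /dev/udmabuf path is an assumption based on the misc device name in the patch.

```c
/*
 * Userspace sketch for the proposed UDMABUF_CREATE ioctl.
 * Structs and ioctl number mirror the RfC header (with stdint types);
 * build_create_req() and create_udmabuf() are hypothetical helpers.
 */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct udmabuf_iovec {
	uint64_t base;
	uint64_t len;
};

struct udmabuf_create {
	uint32_t flags;
	uint32_t niov;
	struct udmabuf_iovec iovs[];
};

#define UDMABUF_CREATE _IOW(0x42, 0x23, struct udmabuf_create)

/*
 * Allocate and fill a request describing niov chunks.  Returns NULL with
 * errno set to EINVAL if any base or length is not page aligned (the RfC
 * driver rejects such requests), or NULL if the allocation fails.
 */
static struct udmabuf_create *build_create_req(const struct udmabuf_iovec *iovs,
					       uint32_t niov, uint64_t page_size)
{
	struct udmabuf_create *req;
	uint32_t i;

	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	for (i = 0; i < niov; i++) {
		if ((iovs[i].base | iovs[i].len) & (page_size - 1)) {
			errno = EINVAL;
			return NULL;
		}
	}

	req = calloc(1, sizeof(*req) + (size_t)niov * sizeof(req->iovs[0]));
	if (!req)
		return NULL;
	req->flags = 0;
	req->niov = niov;
	for (i = 0; i < niov; i++)
		req->iovs[i] = iovs[i];
	return req;
}

/* Returns a dma-buf fd on success, -1 with errno set on failure. */
static int create_udmabuf(const struct udmabuf_iovec *iovs, uint32_t niov)
{
	struct udmabuf_create *req;
	int devfd, buffd, saved;

	req = build_create_req(iovs, niov, (uint64_t)sysconf(_SC_PAGESIZE));
	if (!req)
		return -1;

	devfd = open("/dev/udmabuf", O_RDWR);	/* assumed device node */
	if (devfd < 0) {
		saved = errno;
		free(req);
		errno = saved;
		return -1;
	}

	buffd = ioctl(devfd, UDMABUF_CREATE, req);
	saved = errno;
	close(devfd);
	free(req);
	errno = saved;
	return buffd;
}
```

A chunked virtio-gpu resource would map to one udmabuf_iovec per chunk. Checking the alignment in userspace first only gives an earlier error; the driver still has to validate everything itself.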

Daniel Vetter March 13, 2018, 4:10 p.m. UTC | #1

On Tue, Mar 13, 2018 at 04:48:26PM +0100, Gerd Hoffmann wrote:
> A driver to let userspace turn iovecs into dma-bufs.
> 
> Use case:  Allows qemu to pass around dmabufs for the guest framebuffer.
> https://www.kraxel.org/cgit/qemu/log/?h=sirius/udmabuf has an
> experimental patch.
> 
> Also allows qemu to export guest virtio-gpu resources as host dmabufs.
> Should be possible to use it to display guest wayland windows on the
> host display server.  virtio-gpu resources can be chunked so we will
> actually need multiple iovec entries.  UNTESTED.
> 
> I want to collect some feedback on the general approach with this RfC series.
> Can this work?  If not, better ideas?
> 
> Question:  Must this be hooked into some kind of mlock accounting, to
> limit the amount of memory userspace is allowed to pin this way?  Or will
> get_user_pages_fast() handle that for me?

Either do mlock accounting (because the memory is de facto mlocked);
get_user_pages won't do that for you.

Or you write the full-blown userptr implementation, including mmu_notifier
support (see i915 or amdgpu), but that also requires Christian König's
latest ->invalidate_mapping RFC for dma-buf (since atm exporting userptr
buffers is a no-go).

> 
> Known issue:  Driver API isn't complete yet.  Need to add some flags, for
> example to support read-only buffers.

dma-buf has no concept of read-only. I don't think we can even enforce
that (not many iommus can enforce this iirc), so pretty much need to
require r/w memory.

> Cc: David Airlie <airlied@linux.ie>
> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

btw there's also the hyperdmabuf stuff from the xen folks, but imo their
solution of forwarding the entire dma-buf api is over the top. This here
looks _much_ better, pls cc all the hyperdmabuf people on your next
version.

Overall I like the idea, but too lazy to review. Can maybe be bribed :-)

Oh, some kselftests for this stuff would be lovely.
-Daniel
> ---
>  include/uapi/linux/udmabuf.h |  21 ++++
>  drivers/dma-buf/udmabuf.c    | 250 +++++++++++++++++++++++++++++++++++++++++++
>  drivers/dma-buf/Kconfig      |   7 ++
>  drivers/dma-buf/Makefile     |   1 +
>  4 files changed, 279 insertions(+)
>  create mode 100644 include/uapi/linux/udmabuf.h
>  create mode 100644 drivers/dma-buf/udmabuf.c
> 
> diff --git a/include/uapi/linux/udmabuf.h b/include/uapi/linux/udmabuf.h
> new file mode 100644
> index 0000000000..fd2fa441fe
> --- /dev/null
> +++ b/include/uapi/linux/udmabuf.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_UDMABUF_H
> +#define _UAPI_LINUX_UDMABUF_H
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +struct udmabuf_iovec {
> +	__u64 base;
> +	__u64 len;
> +};
> +
> +struct udmabuf_create {
> +	__u32 flags;
> +	__u32 niov;
> +	struct udmabuf_iovec iovs[];
> +};
> +
> +#define UDMABUF_CREATE _IOW(0x42, 0x23, struct udmabuf_create)
> +
> +#endif /* _UAPI_LINUX_UDMABUF_H */
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> new file mode 100644
> index 0000000000..ec012d7ac7
> --- /dev/null
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -0,0 +1,250 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +#include <linux/miscdevice.h>
> +#include <linux/dma-buf.h>
> +#include <linux/highmem.h>
> +
> +#include <uapi/linux/udmabuf.h>
> +
> +struct udmabuf {
> +	u32 pagecount;
> +	struct page **pages;
> +};
> +
> +static int udmabuf_vm_fault(struct vm_fault *vmf)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct udmabuf *ubuf = vma->vm_private_data;
> +
> +	if (WARN_ON(vmf->pgoff >= ubuf->pagecount))
> +		return VM_FAULT_SIGBUS;
> +
> +	vmf->page = ubuf->pages[vmf->pgoff];
> +	get_page(vmf->page);
> +	return 0;
> +}
> +
> +static const struct vm_operations_struct udmabuf_vm_ops = {
> +	.fault = udmabuf_vm_fault,
> +};
> +
> +static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
> +{
> +	struct udmabuf *ubuf = buf->priv;
> +
> +	if ((vma->vm_flags & VM_SHARED) == 0)
> +		return -EINVAL;
> +
> +	vma->vm_ops = &udmabuf_vm_ops;
> +	vma->vm_private_data = ubuf;
> +	return 0;
> +}
> +
> +static struct sg_table *map_udmabuf(struct dma_buf_attachment *at,
> +				    enum dma_data_direction direction)
> +{
> +	struct udmabuf *ubuf = at->dmabuf->priv;
> +	struct sg_table *sg;
> +
> +	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
> +	if (!sg)
> +		goto err1;
> +	if (sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount,
> +				      0, ubuf->pagecount << PAGE_SHIFT,
> +				      GFP_KERNEL) < 0)
> +		goto err2;
> +	if (!dma_map_sg(at->dev, sg->sgl, sg->nents, direction))
> +		goto err3;
> +
> +	return sg;
> +
> +err3:
> +	sg_free_table(sg);
> +err2:
> +	kfree(sg);
> +err1:
> +	return ERR_PTR(-ENOMEM);
> +}
> +
> +static void unmap_udmabuf(struct dma_buf_attachment *at,
> +			  struct sg_table *sg,
> +			  enum dma_data_direction direction)
> +{
> +	sg_free_table(sg);
> +	kfree(sg);
> +}
> +
> +static void release_udmabuf(struct dma_buf *buf)
> +{
> +	struct udmabuf *ubuf = buf->priv;
> +	pgoff_t pg;
> +
> +	for (pg = 0; pg < ubuf->pagecount; pg++)
> +		put_page(ubuf->pages[pg]);
> +	kfree(ubuf->pages);
> +	kfree(ubuf);
> +}
> +
> +static void *kmap_atomic_udmabuf(struct dma_buf *buf, unsigned long offset)
> +{
> +	struct udmabuf *ubuf = buf->priv;
> +	struct page *page = ubuf->pages[offset >> PAGE_SHIFT];
> +
> +	return kmap_atomic(page);
> +}
> +
> +static void *kmap_udmabuf(struct dma_buf *buf, unsigned long offset)
> +{
> +	struct udmabuf *ubuf = buf->priv;
> +	struct page *page = ubuf->pages[offset >> PAGE_SHIFT];
> +
> +	return kmap(page);
> +}
> +
> +static struct dma_buf_ops udmabuf_ops = {
> +	.map_dma_buf	  = map_udmabuf,
> +	.unmap_dma_buf	  = unmap_udmabuf,
> +	.release	  = release_udmabuf,
> +	.map_atomic	  = kmap_atomic_udmabuf,
> +	.map		  = kmap_udmabuf,
> +	.mmap		  = mmap_udmabuf,
> +};
> +
> +static long udmabuf_ioctl_create(struct file *filp, unsigned long arg)
> +{
> +	struct udmabuf_create create;
> +	struct udmabuf_iovec *iovs;
> +	struct udmabuf *ubuf;
> +	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> +	struct dma_buf *buf;
> +	pgoff_t pgoff, pgcnt;
> +	u32 iov;
> +	int ret;
> +
> +	if (copy_from_user(&create, (void __user *)arg,
> +			   sizeof(struct udmabuf_create)))
> +		return -EFAULT;
> +
> +	iovs = kmalloc_array(create.niov, sizeof(struct udmabuf_iovec),
> +			     GFP_KERNEL);
> +	if (!iovs)
> +		return -ENOMEM;
> +
> +	arg += offsetof(struct udmabuf_create, iovs);
> +	ret = -EFAULT;
> +	if (copy_from_user(iovs, (void __user *)arg,
> +			   create.niov * sizeof(struct udmabuf_iovec)))
> +		goto err_free_iovs;
> +
> +	ubuf = kzalloc(sizeof(struct udmabuf), GFP_KERNEL);
> +	if (!ubuf)
> +		goto err_free_iovs;
> +
> +	ret = -EINVAL;
> +	for (iov = 0; iov < create.niov; iov++) {
> +		if (!IS_ALIGNED(iovs[iov].base, PAGE_SIZE))
> +			goto err_free_buf;
> +		if (!IS_ALIGNED(iovs[iov].len, PAGE_SIZE))
> +			goto err_free_buf;
> +		ubuf->pagecount += iovs[iov].len >> PAGE_SHIFT;
> +	}
> +
> +	ret = -ENOMEM;
> +	ubuf->pages = kmalloc_array(ubuf->pagecount, sizeof(struct page*),
> +				    GFP_KERNEL);
> +	if (!ubuf->pages)
> +		goto err_free_buf;
> +
> +	pgoff = 0;
> +	for (iov = 0; iov < create.niov; iov++) {
> +		pgcnt = iovs[iov].len >> PAGE_SHIFT;
> +		while (pgcnt > 0) {
> +			ret = get_user_pages_fast(iovs[iov].base, pgcnt,
> +						  true, /* write */
> +						  ubuf->pages + pgoff);
> +			if (ret < 0)
> +				goto err_put_pages;
> +			pgoff += ret;
> +			pgcnt -= ret;
> +		}
> +	}
> +
> +	exp_info.ops  = &udmabuf_ops;
> +	exp_info.size = ubuf->pagecount << PAGE_SHIFT;
> +	exp_info.priv = ubuf;
> +
> +	buf = dma_buf_export(&exp_info);
> +	if (IS_ERR(buf)) {
> +		ret = PTR_ERR(buf);
> +		goto err_put_pages;
> +	}
> +
> +	kfree(iovs);
> +	return dma_buf_fd(buf, 0);
> +
> +err_put_pages:
> +	while (pgoff > 0)
> +		put_page(ubuf->pages[--pgoff]);
> +err_free_buf:
> +	kfree(ubuf->pages);
> +	kfree(ubuf);
> +err_free_iovs:
> +	kfree(iovs);
> +	return ret;
> +}
> +
> +static long udmabuf_ioctl(struct file *filp, unsigned int ioctl,
> +				  unsigned long arg)
> +{
> +	long ret;
> +
> +	switch (ioctl) {
> +	case UDMABUF_CREATE:
> +		ret = udmabuf_ioctl_create(filp, arg);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}
> +
> +static const struct file_operations udmabuf_fops = {
> +	.owner		= THIS_MODULE,
> +	.unlocked_ioctl = udmabuf_ioctl,
> +};
> +
> +static struct miscdevice udmabuf_misc = {
> +	.minor          = MISC_DYNAMIC_MINOR,
> +	.name           = "udmabuf",
> +	.fops           = &udmabuf_fops,
> +};
> +
> +static int __init udmabuf_dev_init(void)
> +{
> +	int ret;
> +
> +	ret = misc_register(&udmabuf_misc);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +
> +static void __exit udmabuf_dev_exit(void)
> +{
> +	misc_deregister(&udmabuf_misc);
> +}
> +
> +module_init(udmabuf_dev_init)
> +module_exit(udmabuf_dev_exit)
> +
> +MODULE_LICENSE("GPL v2");
> diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
> index ed3b785bae..5876b52554 100644
> --- a/drivers/dma-buf/Kconfig
> +++ b/drivers/dma-buf/Kconfig
> @@ -30,4 +30,11 @@ config SW_SYNC
>  	  WARNING: improper use of this can result in deadlocking kernel
>  	  drivers from userspace. Intended for test and debug only.
>  
> +config UDMABUF
> +	tristate "userspace dmabuf misc driver"
> +	default n
> +	depends on DMA_SHARED_BUFFER
> +	---help---
> +	  A driver to let userspace turn iovs into dma-bufs.
> +
>  endmenu
> diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
> index c33bf88631..0913a6ccab 100644
> --- a/drivers/dma-buf/Makefile
> +++ b/drivers/dma-buf/Makefile
> @@ -1,3 +1,4 @@
>  obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
>  obj-$(CONFIG_SYNC_FILE)		+= sync_file.o
>  obj-$(CONFIG_SW_SYNC)		+= sw_sync.o sync_debug.o
> +obj-$(CONFIG_UDMABUF)		+= udmabuf.o
> -- 
> 2.9.3
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

Gerd Hoffmann March 14, 2018, 8:03 a.m. UTC | #2

Hi,

> Either do mlock accounting (because the memory is de facto mlocked);
> get_user_pages won't do that for you.
> 
> Or you write the full-blown userptr implementation, including mmu_notifier
> support (see i915 or amdgpu), but that also requires Christian König's
> latest ->invalidate_mapping RFC for dma-buf (since atm exporting userptr
> buffers is a no-go).

I guess I'll look at mlock accounting for starters then.  Easier for
now, and leaves the door open to switch to userptr later as this should
be transparent to userspace.
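
As a rough illustration of what mlock-style accounting means here, the following is a userspace model of the bookkeeping only: pages are charged against a limit before they are pinned and uncharged when the buffer is released. The names struct pin_acct, pin_acct_init(), pin_charge() and pin_uncharge() are hypothetical. A real driver would keep the counter in kernel data structures, take the limit from RLIMIT_MEMLOCK, and might let privileged callers bypass the check.

```c
/*
 * Userspace model of RLIMIT_MEMLOCK-style accounting for pinned pages.
 * All names are hypothetical; this only demonstrates the arithmetic.
 */
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct pin_acct {
	uint64_t pinned_pages;	/* pages currently charged */
	uint64_t limit_pages;	/* memlock limit, converted to pages */
};

static void pin_acct_init(struct pin_acct *a, uint64_t limit_bytes,
			  uint64_t page_size)
{
	assert(page_size != 0);
	a->pinned_pages = 0;
	a->limit_pages = limit_bytes / page_size;
}

/* Charge npages before pinning.  Returns 0, or -ENOMEM if over the limit. */
static int pin_charge(struct pin_acct *a, uint64_t npages)
{
	/*
	 * pinned_pages never exceeds limit_pages, so the subtraction cannot
	 * underflow and the comparison avoids overflow in a sum.
	 */
	if (npages > a->limit_pages - a->pinned_pages)
		return -ENOMEM;
	a->pinned_pages += npages;
	return 0;
}

/* Undo a previous charge, e.g. when the dma-buf is released. */
static void pin_uncharge(struct pin_acct *a, uint64_t npages)
{
	assert(npages <= a->pinned_pages);
	a->pinned_pages -= npages;
}
```

In the RfC driver the charge would belong before the get_user_pages_fast() loop and the uncharge in the release and error paths, so a failed create does not leave pages accounted.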

> > Known issue:  Driver API isn't complete yet.  Need to add some flags, for
> > example to support read-only buffers.
> 
> dma-buf has no concept of read-only. I don't think we can even enforce
> that (not many iommus can enforce this iirc), so pretty much need to
> require r/w memory.

Ah, ok.  Just saw the 'write' arg for get_user_pages_fast and figured we
might support that, but if iommus can't handle that anyway it's
pointless indeed.

> > Cc: David Airlie <airlied@linux.ie>
> > Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> 
> btw there's also the hyperdmabuf stuff from the xen folks, but imo their
> solution of forwarding the entire dma-buf api is over the top. This here
> looks _much_ better, pls cc all the hyperdmabuf people on your next
> version.

Fun fact: googling for "hyperdmabuf" found me your mail and nothing else :-o
(Trying "hyper dmabuf" instead worked better then).

Yes, will cc them on the next version.  Not sure it'll help much on xen
though due to the memory management being very different.  Basically xen
owns the memory, not the kernel of the control domain (dom0), so
creating dmabufs for guest memory chunks isn't that simple ...

Also it's not clear whether they really need guest -> guest exports or
guest -> dom0 exports.

> Overall I like the idea, but too lazy to review.

Cool.  General comments on the idea were all I was looking for at the
moment.  Spare your review cycles for the next version ;)

> Oh, some kselftests for this stuff would be lovely.

I'll look into it.

thanks,
  Gerd

Daniel Vetter April 5, 2018, 8:32 p.m. UTC | #3

Pulling this out of the shadows again.

We now also have xen-zcopy from Oleksandr and the hyper dmabuf stuff
from Matt and Dongwon.

At least from the Intel side there seems to be the idea to just have 1
special device that can handle cross-guest/host sharing for all kinds
of hypervisors, so I guess you all need to work together :-)

Or we throw out the idea that hyper dmabuf will be cross-hypervisor
(not sure how useful/reasonable that is, someone please convince me
one way or the other).

Cheers, Daniel


Matt Roper April 6, 2018, 12:11 a.m. UTC | #4

On Thu, Apr 05, 2018 at 10:32:04PM +0200, Daniel Vetter wrote:
> Pulling this out of the shadows again.
> 
> We now also have xen-zcopy from Oleksandr and the hyper dmabuf stuff
> from Matt and Dongwon.
> 
> At least from the Intel side there seems to be the idea to just have 1
> special device that can handle cross-guest/host sharing for all kinds
> of hypervisors, so I guess you all need to work together :-)
> 
> Or we throw out the idea that hyper dmabuf will be cross-hypervisor
> (not sure how useful/reasonable that is, someone please convince me
> one way or the other).
> 
> Cheers, Daniel

Dongwon (DW) is the one doing all the real work on hyper_dmabuf, but I'm
familiar with the use cases he's trying to address, and I think there
are a couple high-level goals of his work that are worth calling out as
we discuss the various options for sharing buffers produced in one VM
with a consumer running in another VM:

 * We should try to keep the interface/usage separate from the
   underlying hypervisor implementation details.  I.e., in DW's design
   the sink/source drivers that handle the actual buffer passing in the
   two VM's should provide a generic interface that does not depend on a
   specific hypervisor.  Behind the scenes there could be various
   implementations for specific hypervisors (Xen, KVM, ACRN, etc.), and
   some of those backends may have additional restrictions, but it would
   be best if userspace didn't have to know the specific hypervisor
   running on the system and could just query the general capabilities
   available to it.  We've already got projects in flight that are
   wanting this functionality on Xen and ACRN today.

 * The general interface should be able to express sharing from any
   guest:guest, not just guest:host.  Arbitrary G:G sharing might be
   something some hypervisors simply aren't able to support, but the
   userspace API itself shouldn't make assumptions or restrict that.  I
   think ideally the sharing API would include some kind of
   query_targets interface that would return a list of VM's that your
   current OS is allowed to share with; that list would depend on the
   policy established by the system integrator, but obviously wouldn't
   include targets that the hypervisor itself wouldn't be capable of
   handling.

 * A lot of the initial use cases are in the realm of graphics, but this
   shouldn't be a graphics-specific API.  Buffers might contain other
   types of content as well (e.g., audio).  Really the content producer
   could potentially be any driver (or userspace) running in the VM that
   knows how to import/export dma_buf's (or maybe just import given
   danvet's suggestion that we should make the sink driver do all the
   actual memory allocation for any buffers that may be shared).

 * We need to be able to handle cross-VM coordination of buffer usage as
   well, so I think we'd want to include fence forwarding support in the
   API to signal back and forth about production/consumption
   completion.  And of course document really well what should happen
   if, for example, the entire VM you're sharing with/from dies.

 * The sharing API could be used to share multiple kinds of content in a
   single system.  The sharing sink driver running in the content
   producer's VM should accept some additional metadata that will be
   passed over to the target VM as well.  The sharing source driver
   running in the content consumer's VM would then be able to use this
   metadata to determine the purpose of a new buffer that arrives and
   filter/dispatch it to the appropriate consumer.

For reference, the terminology I'm using is

 /----------\  dma_buf   /------\ HV /--------\  dma_buf   /----------\
 | Producer |----------->| Sink | HV | Source |----------->| Consumer |
 \----------/   ioctls   \------/ HV \--------/  uevents   \----------/

In the realm of graphics, "Producer" could potentially be something like
an EGL client that sends the buffer at context setup and then signals
with fences on each SwapBuffers.  "Consumer" could be a Wayland client
that proxies the buffers into surfaces or dispatches them to other
userspace software that's waiting for buffers.

With the hyper_dmabuf approach, there are a lot of ABI details that need
to be worked out and really clearly documented before we worry too much
about the backend hypervisor-specific stuff.

I'm not super familiar with xen-zcopy and udmabuf, but it sounds like
they're approaching similar problems from slightly different directions,
so we should make sure we can come up with something that satisfies
everyone's requirements. 

Matt


Oleksandr Andrushchenko April 6, 2018, 6:55 a.m. UTC | #5

On 04/06/2018 03:11 AM, Matt Roper wrote:
> On Thu, Apr 05, 2018 at 10:32:04PM +0200, Daniel Vetter wrote:
>> Pulling this out of the shadows again.
>>
>> We now also have xen-zcopy from Oleksandr and the hyper dmabuf stuff
>> from Matt and Dongwon.
>>
>> At least from the Intel side there seems to be the idea to just have 1
>> special device that can handle cross-guest/host sharing for all kinds
>> of hypervisors, so I guess you all need to work together :-)
>>
>> Or we throw out the idea that hyper dmabuf will be cross-hypervisor
>> (not sure how useful/reasonable that is, someone please convince me
>> one way or the other).
>>
>> Cheers, Daniel
> Dongwon (DW) is the one doing all the real work on hyper_dmabuf, but I'm
> familiar with the use cases he's trying to address, and I think there
> are a couple high-level goals of his work that are worth calling out as
> we discuss the various options for sharing buffers produced in one VM
> with a consumer running in another VM:
>
>   * We should try to keep the interface/usage separate from the
>     underlying hypervisor implementation details.  I.e., in DW's design
>     the sink/source drivers that handle the actual buffer passing in the
>     two VM's should provide a generic interface that does not depend on a
>     specific hypervisor.
This is what we did for display, sound and multi-touch on Xen:
we have implemented generic protocols which are OS agnostic.
Have you started prototyping such a protocol for hyper-dmabuf yet?

>   Behind the scenes there could be various
>     implementations for specific hypervisors (Xen, KVM, ACRN, etc.), and
>     some of those backends may have additional restrictions, but it would
>     be best if userspace didn't have to know the specific hypervisor
>     running on the system and could just query the general capabilities
>     available to it.  We've already got projects in flight that are
>     wanting this functionality on Xen and ACRN today.
Should we add the corresponding communities to the discussion then?

>
>   * The general interface should be able to express sharing from any
>     guest:guest, not just guest:host.  Arbitrary G:G sharing might be
>     something some hypervisors simply aren't able to support, but the
>     userspace API itself shouldn't make assumptions or restrict that.  I
>     think ideally the sharing API would include some kind of
>     query_targets interface that would return a list of VM's that your
>     current OS is allowed to share with; that list would depend on the
>     policy established by the system integrator, but obviously wouldn't
>     include targets that the hypervisor itself wouldn't be capable of
>     handling.
Can you give a use-case for this? I mean that the system integrator
is the one who defines which guests/hosts talk to each other,
but querying means that it is possible that VMs have some sort
of discovery mechanism, so they can decide on their own whom
to connect to.
>     
>   * A lot of the initial use cases are in the realm of graphics, but this
>     shouldn't be a graphics-specific API.  Buffers might contain other
>     types of content as well (e.g., audio).  Really the content producer
>     could potentially be any driver (or userspace) running in the VM that
>     knows how to import/export dma_buf's (or maybe just import given
>     danvet's suggestion that we should make the sink driver do all the
>     actual memory allocation for any buffers that may be shared).
>
>   * We need to be able to handle cross-VM coordination of buffer usage as
>     well, so I think we'd want to include fence forwarding support in the
>     API as well to signal back and forth about production/consumption
>     completion.  And of course document really well what should happen
>     if, for example, the entire VM you're sharing with/from dies.
>
>   * The sharing API could be used to share multiple kinds of content in a
>     single system.  The sharing sink driver running in the content
>     producer's VM should accept some additional metadata that will be
>     passed over to the target VM as well.  The sharing source driver
>     running in the content consumer's VM would then be able to use this
>     metadata to determine the purpose of a new buffer that arrives and
>     filter/dispatch it to the appropriate consumer.
>
>
> For reference, the terminology I'm using is
>
>   /----------\  dma_buf   /------\ HV /--------\  dma_buf   /----------\
>   | Producer |----------->| Sink | HV | Source |----------->| Consumer |
>   \----------/   ioctls   \------/ HV \--------/  uevents   \----------/
>
>
>
> In the realm of graphics, "Producer" could potentially be something like
> an EGL client that sends the buffer at context setup and then signals
> with fences on each SwapBuffers.  "Consumer" could be a Wayland client
> that proxies the buffers into surfaces or dispatches them to other
> userspace software that's waiting for buffers.
>
> With the hyper_dmabuf approach, there's a lot of ABI details that need
> to be worked out and really clearly documented before we worry too much
> about the backend hypervisor-specific stuff.
>
> I'm not super familiar with xen-zcopy

Let me describe the rationale and some implementation details of the Xen
zero-copy driver I posted recently [1].

The main requirement for us to implement such a helper driver was the ability
to avoid memory copying for large buffers in display use-cases. This is why
we only focused on DRM use-cases and did not try to implement something
generic. As a result, the driver is somewhat coupled to the Xen
para-virtualized DRM driver [2] through the grant reference sharing mechanism
of the Xen para-virtual display protocol [3]: the backend receives an array of
Xen grant references to the frontend's buffer pages. These grant references
are then used to construct a PRIME buffer. The same mechanism is used when the
backend shares a buffer with the frontend, but in the other direction. More
details on the UAPI of the driver are available at [1].

So, when discussing a possibility to share dma-bufs in a generic way I would
also like to have the following considered:

1. We are targeting ARM and one of the major requirements for the buffer
sharing is the ability to allocate physically contiguous buffers, which gets
even more complicated for systems not backed with an IOMMU. So, for some
use-cases it is enough to make the buffers contiguous in terms of IPA and
sometimes those need to be contiguous in terms of PA.
(The use-case is that you use Wayland-DRM/KMS or share the buffer with
the driver implemented with DRM CMA helpers).

2. For Xen we would love to see a UAPI to create a dma-buf from the grant
references provided, so we can use this generic solution to implement
zero-copying without breaking the existing Xen protocols. This can probably
be extended to other hypervisors as well.
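
To make the IPA vs. PA distinction in point 1 concrete, here is a minimal
sketch of a contiguity check over page frame numbers. The function name and
the idea of comparing two frame lists for the same buffer are illustrative
assumptions, not part of xen-zcopy or any existing UAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the n page frame numbers form one
 * contiguous run. An empty or single-page buffer is trivially contiguous.
 */
bool frames_contiguous(const uint64_t *frames, size_t n)
{
    assert(frames != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        if (frames[i] != frames[i - 1] + 1)
            return false;
    }
    return true;
}
```

The same buffer can pass this check when given its IPA frame numbers and fail
it when given its PA frame numbers, which is the case where a device without
an IOMMU could not use it.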

Thank you,
Oleksandr Andrushchenko


>   and udmabuf, but it sounds like
> they're approaching similar problems from slightly different directions,
> so we should make sure we can come up with something that satisfies
> everyone's requirements.
>
>
> Matt
>
>> On Wed, Mar 14, 2018 at 9:03 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
>>>    Hi,
>>>
>>>> Either mlock account (because it's mlocked defacto), and get_user_pages
>>>> won't do that for you.
>>>>
>>>> Or you write the full-blown userptr implementation, including mmu_notifier
>>>> support (see i915 or amdgpu), but that also requires Christian Königs
>>>> latest ->invalidate_mapping RFC for dma-buf (since atm exporting userptr
>>>> buffers is a no-go).
>>> I guess I'll look at mlock accounting for starters then.  Easier for
>>> now, and leaves the door open to switch to userptr later as this should
>>> be transparent to userspace.
>>>
>>>>> Known issue:  Driver API isn't complete yet.  Need add some flags, for
>>>>> example to support read-only buffers.
>>>> dma-buf has no concept of read-only. I don't think we can even enforce
>>>> that (not many iommus can enforce this iirc), so pretty much need to
>>>> require r/w memory.
>>> Ah, ok.  Just saw the 'write' arg for get_user_pages_fast and figured we
>>> might support that, but if iommus can't handle that anyway it's
>>> pointless indeed.
>>>
>>>>> Cc: David Airlie <airlied@linux.ie>
>>>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
>>>>> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
>>>> btw there's also the hyperdmabuf stuff from the xen folks, but imo their
>>>> solution of forwarding the entire dma-buf api is over the top. This here
>>>> looks _much_ better, pls cc all the hyperdmabuf people on your next
>>>> version.
>>> Fun fact: googling for "hyperdmabuf" found me your mail and nothing else :-o
>>> (Trying "hyper dmabuf" instead worked better then).
>>>
>>> Yes, will cc them on the next version.  Not sure it'll help much on xen
>>> though due to the memory management being very different.  Basically xen
>>> owns the memory, not the kernel of the control domain (dom0), so
>>> creating dmabufs for guest memory chunks isn't that simple ...
>>>
>>> Also it's not clear whenever they really need guest -> guest exports or
>>> guest -> dom0 exports.
>>>
>>>> Overall I like the idea, but too lazy to review.
>>> Cool.  General comments on the idea was all I was looking for for the
>>> moment.  Spare yor review cycles for the next version ;)
>>>
>>>> Oh, some kselftests for this stuff would be lovely.
>>> I'll look into it.
>>>
>>> thanks,
>>>    Gerd
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>>
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
[1] https://patchwork.freedesktop.org/series/40880/
[2] https://cgit.freedesktop.org/drm/drm-misc/commit/?id=c575b7eeb89f94356997abd62d6d5a0590e259b7
[3] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/displif.h

Gerd Hoffmann April 6, 2018, 9:07 a.m. UTC | #6

Hi,

> >   * The general interface should be able to express sharing from any
> >     guest:guest, not just guest:host.  Arbitrary G:G sharing might be
> >     something some hypervisors simply aren't able to support, but the
> >     userspace API itself shouldn't make assumptions or restrict that.  I
> >     think ideally the sharing API would include some kind of
> >     query_targets interface that would return a list of VM's that your
> >     current OS is allowed to share with; that list would be depend on the
> >     policy established by the system integrator, but obviously wouldn't
> >     include targets that the hypervisor itself wouldn't be capable of
> >     handling.

> Can you give a use-case for this? I mean that the system integrator
> is the one who defines which guests/hosts talk to each other,
> but querying means that it is possible that VMs have some sort
> of discovery mechanism, so they can decide on their own whom
> to connect to.

Note that vsock (created by vmware, these days it also has a virtio
transport for kvm) started with support for both guest <=> host and
guest <=> guest communication.  But later on guest <=> guest was dropped.
As far as I know the reasons were (a) lack of use cases and (b) security.

So, I likewise would like to know more details on the use cases you have
in mind here.  Unless we have a compelling use case here I'd suggest
dropping the guest <=> guest requirement as it makes the whole thing a lot
more complex.

> >   * The sharing API could be used to share multiple kinds of content in a
> >     single system.  The sharing sink driver running in the content
> >     producer's VM should accept some additional metadata that will be
> >     passed over to the target VM as well.

Not sure this should be part of hyper-dmabuf.  A dma-buf is nothing but
a block of data, period.  Therefore protocols with dma-buf support
(wayland for example) typically already send over metadata describing
the content, so duplicating that in hyper-dmabuf looks pointless.

> 1. We are targeting ARM and one of the major requirements for the buffer
> sharing is the ability to allocate physically contiguous buffers, which gets
> even more complicated for systems not backed with an IOMMU. So, for some
> use-cases it is enough to make the buffers contiguous in terms of IPA and
> sometimes those need to be contiguous in terms of PA.

Which pretty much implies the host must do the allocation.

> 2. For Xen we would love to see UAPI to create a dma-buf from grant
> references provided, so we can use this generic solution to implement
> zero-copying without breaking the existing Xen protocols. This can
> probably be extended to other hypervizors as well.

I'm not sure we can create something which works on both kvm and xen.
The memory management model is quite different ...


On xen the hypervisor manages all memory.  Guests can allow other guests
to access specific pages (using grant tables).  In theory any guest <=>
guest communication is possible.  In practice it is mostly guest <=> dom0
because guests access their virtual hardware that way.  dom0 is the
privileged guest which owns any hardware not managed by xen itself.

Xen guests can ask the hypervisor to update the mapping of guest
physical pages.  They can balloon down (unmap and free pages).  They can
balloon up (ask the hypervisor to map fresh pages).  They can map pages
exported by other guests using grant tables.  xen-zcopy makes heavy use
of this.  It balloons down to make room in the guest physical address
space, then maps the exported pages there, and finally composes a
dma-buf.


On kvm qemu manages all guest memory.  qemu also has all guest memory
mapped, so a grant-table like mechanism isn't needed to implement
virtual devices.  qemu can decide how it backs memory for the guest.
qemu propagates the guest memory map to the kvm driver in the linux
kernel.  kvm guests have some control over the guest memory map, for
example they can map pci bars wherever they want in their guest physical
address space by programming the base registers accordingly, but unlike
xen guests they can't ask the host to remap individual pages.

Due to qemu having all guest memory mapped, virtual devices are typically
designed to have the guest allocate resources, then notify the host
where they are located.  This is where the udmabuf idea comes from:
Guest tells the host (qemu) where the gem object is, and qemu then can
create a dmabuf backed by those pages to pass it on to other processes
such as the wayland display server.  Possibly even without the guest
explicitly asking for it, i.e. export the framebuffer placed by the
guest in the (virtual) vga pci memory bar as dma-buf.  And I can imagine
that this is useful outside virtualization too.


I fail to see any common ground for xen-zcopy and udmabuf ...

Besides that there is the problem that the udmabuf idea has its own share
of issues, for example the fork() issue pointed out by Christian
König[1].  So I still need to find something which can work for kvm ...

cheers,
  Gerd

[1] https://www.spinics.net/lists/dri-devel/msg169442.html

Oleksandr Andrushchenko April 6, 2018, 9:34 a.m. UTC | #7

On 04/06/2018 12:07 PM, Gerd Hoffmann wrote:
>    Hi,
>
>>>    * The general interface should be able to express sharing from any
>>>      guest:guest, not just guest:host.  Arbitrary G:G sharing might be
>>>      something some hypervisors simply aren't able to support, but the
>>>      userspace API itself shouldn't make assumptions or restrict that.  I
>>>      think ideally the sharing API would include some kind of
>>>      query_targets interface that would return a list of VM's that your
>>>      current OS is allowed to share with; that list would be depend on the
>>>      policy established by the system integrator, but obviously wouldn't
>>>      include targets that the hypervisor itself wouldn't be capable of
>>>      handling.
>> Can you give a use-case for this? I mean that the system integrator
>> is the one who defines which guests/hosts talk to each other,
>> but querying means that it is possible that VMs have some sort
>> of discovery mechanism, so they can decide on their own whom
>> to connect to.
> Note that vsock (created by vmware, these days also has a virtio
> transport for kvm) started with support for both guest <=> host and
> guest <=> guest support.  But later on guest <=> guest was dropped.
> As far I know the reasons where (a) lack of use cases and (b) security.
>
> So, I likewise would know more details on the use cases you have in mind
> here.  Unless we have a compelling use case here I'd suggest to drop the
> guest <=> guest requirement as it makes the whole thing alot more
> complex.
This is exactly the use-case we have: in our setup Dom0 doesn't
own any HW at all and all the HW is passed into a dedicated
driver domain (DomD) which is still a guest domain.
Then, buffers are shared between two guests, for example,
DomD and DomA (Android guest).
>>>    * The sharing API could be used to share multiple kinds of content in a
>>>      single system.  The sharing sink driver running in the content
>>>      producer's VM should accept some additional metadata that will be
>>>      passed over to the target VM as well.
> Not sure this should be part of hyper-dmabuf.  A dma-buf is nothing but
> a block of data, period.  Therefore protocols with dma-buf support
> (wayland for example) typically already send over metadata describing
> the content, so duplicating that in hyper-dmabuf looks pointless.
>
>> 1. We are targeting ARM and one of the major requirements for the buffer
>> sharing is the ability to allocate physically contiguous buffers, which gets
>> even more complicated for systems not backed with an IOMMU. So, for some
>> use-cases it is enough to make the buffers contiguous in terms of IPA and
>> sometimes those need to be contiguous in terms of PA.
> Which pretty much implies the host must to the allocation.
>
>> 2. For Xen we would love to see UAPI to create a dma-buf from grant
>> references provided, so we can use this generic solution to implement
>> zero-copying without breaking the existing Xen protocols. This can
>> probably be extended to other hypervizors as well.
> I'm not sure we can create something which works on both kvm and xen.
> The memory management model is quite different ...
>
>
> On xen the hypervisor manages all memory.  Guests can allow other guests
> to access specific pages (using grant tables).  In theory any guest <=>
> guest communication is possible.  In practice is mostly guest <=> dom0
> because guests access their virtual hardware that way.  dom0 is the
> priviledged guest which owns any hardware not managed by xen itself.
Please see above for our setup, with DomD owning the HW and Dom0 being
a generic ARMv8 domain with no HW.
> Xen guests can ask the hypervisor to update the mapping of guest
> physical pages.  They can ballon down (unmap and free pages).  They can
> ballon up (ask the hypervisor to map fresh pages).  They can map pages
> exported by other guests using grant tables.  xen-zcopy makes heavy use
> of this.  It balloons down, to make room in the guest physical address
> space, then goes map the exported pages there, finally composes a
> dma-buf.
This is what it does
>
> On kvm qemu manages all guest memory.  qemu also has all guest memory
> mapped, so a grant-table like mechanism isn't needed to implement
> virtual devices.  qemu can decide how it backs memory for the guest.
> qemu propagates the guest memory map to the kvm driver in the linux
> kernel.  kvm guests have some control over the guest memory map, for
> example they can map pci bars wherever they want in their guest physical
> address space by programming the base registers accordingly, but unlike
> xen guests they can't ask the host to remap individual pages.
>
> Due to qemu having all guest memory mapped virtual devices are typically
> designed to have the guest allocate resources, then notify the host
> where they are located.  This is where the udmabuf idea comes from:
> Guest tells the host (qemu) where the gem object is, and qemu then can
> create a dmabuf backed by those pages to pass it on to other processes
> such as the wayland display server.  Possibly even without the guest
> explicitly asking for it, i.e. export the framebuffer placed by the
> guest in the (virtual) vga pci memory bar as dma-buf.  And I can imagine
> that this is useful outsize virtualization too.
>
>
> I fail to see any common ground for xen-zcopy and udmabuf ...
>
> Beside that there is the problem that the udmabuf idea has its own share
> of issues, for example the fork() issue pointed out by Christian
> König[1].  So I still need to find something which can work for kvm ...
>
> cheers,
>    Gerd
>
> [1] https://www.spinics.net/lists/dri-devel/msg169442.html

Daniel Stone April 6, 2018, 9:52 a.m. UTC | #8

Hi Gerd,

On 14 March 2018 at 08:03, Gerd Hoffmann <kraxel@redhat.com> wrote:
>> Either mlock account (because it's mlocked defacto), and get_user_pages
>> won't do that for you.
>>
>> Or you write the full-blown userptr implementation, including mmu_notifier
>> support (see i915 or amdgpu), but that also requires Christian Königs
>> latest ->invalidate_mapping RFC for dma-buf (since atm exporting userptr
>> buffers is a no-go).
>
> I guess I'll look at mlock accounting for starters then.  Easier for
> now, and leaves the door open to switch to userptr later as this should
> be transparent to userspace.

Out of interest, do you have usecases for full userptr support? Maybe
another way would be to allow creation of dmabufs from memfds.
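
A minimal sketch of what the userspace side of the memfd idea could look
like, assuming a dma-buf-from-memfd interface would want a buffer whose size
cannot shrink underneath it. memfd_create(2) and file seals are existing Linux
interfaces; the function name and the hand-off step in the comment are
hypothetical, since no such driver interface is defined in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a memfd of `size` bytes and seal it against shrinking.
 * Returns the file descriptor, or -1 on error.
 */
int create_sealed_memfd(const char *name, off_t size)
{
    int fd;

    assert(name != NULL && size > 0);

    fd = memfd_create(name, MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, size) < 0 ||
        fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
        close(fd);
        return -1;
    }

    /*
     * Hypothetical next step: hand fd plus an offset and a length to a
     * driver that wraps those pages into a dma-buf.
     */
    return fd;
}
```

With the shrink seal in place, a later ftruncate() to a smaller size fails, so
whoever holds pages of the buffer does not have to worry about them being
truncated away.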

Cheers,
Daniel

Gerd Hoffmann April 6, 2018, 10:54 a.m. UTC | #9

On Fri, Apr 06, 2018 at 10:52:21AM +0100, Daniel Stone wrote:
> Hi Gerd,
> 
> On 14 March 2018 at 08:03, Gerd Hoffmann <kraxel@redhat.com> wrote:
> >> Either mlock account (because it's mlocked defacto), and get_user_pages
> >> won't do that for you.
> >>
> >> Or you write the full-blown userptr implementation, including mmu_notifier
> >> support (see i915 or amdgpu), but that also requires Christian Königs
> >> latest ->invalidate_mapping RFC for dma-buf (since atm exporting userptr
> >> buffers is a no-go).
> >
> > I guess I'll look at mlock accounting for starters then.  Easier for
> > now, and leaves the door open to switch to userptr later as this should
> > be transparent to userspace.
> 
> Out of interest, do you have usecases for full userptr support? Maybe
> another way would be to allow creation of dmabufs from memfds.

I have two things in mind.

One is vga emulation.  I have a virtual pci memory bar for the virtual
vga.  qemu backs vga memory with anonymous pages right now, switching
that to shmem should be easy though if that makes things easier.  Guest
places the framebuffer somewhere in the pci bar, and I want to export the
chunk which represents the framebuffer as dma-buf to display it on the
host without copying around data.  Framebuffer is linear in guest
physical memory, so a single block only.  That is the simpler case.

The more difficult one is virtio-gpu resources.  virtio-gpu resources
live in host memory (guest has no direct access).  The guest can
optionally specify guest memory pages as backing storage for the
resource.  Guest backing storage is allowed to be scattered.  Commands
exist to copy both ways between host storage and guest backing.

With virgl (opengl acceleration) enabled the guest will send rendering
commands to fill the framebuffer resource, so there is no need to copy
content to the framebuffer resource.  The guest may fill other
resources such as textures used for rendering with copy commands.

Without acceleration the guest does software-rendering to the backing
storage, then sends a command to copy the framebuffer content from guest
backing storage to host resource.

Now it would be useful to allow a shared mapping, so no copying between
guest backing storage and host resource is needed, especially for the
software rendering case (i.e. dumb gem buffers).  Being able to export
guest dumb buffers to other host processes would be useful too, for
example to display guest windows seamlessly on the host wayland server.

So getting a dma-buf for the guest backing storage via udmabuf looked
like a useful approach.  We can export the guest gem buffers to other
host processes that way.  qemu itself could map it too, to get a linear
representation of the scattered guest backing storage.
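
As a rough sketch of the "scattered guest backing storage" part: qemu would
translate the guest's chunk list into iovecs pointing into its own mapping of
guest memory, which is what a driver turning iovecs into dma-bufs would
consume. The struct, the function name, and the assumption that host_base maps
guest physical address 0 are all illustrative; only struct iovec is a real
interface here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical description of one chunk of guest backing storage. */
struct guest_chunk {
    uint64_t gpa;   /* guest physical address of the chunk */
    uint64_t len;   /* length in bytes */
};

/*
 * Translate guest chunks into iovecs inside the host mapping of guest RAM.
 * Chunks that are adjacent in guest physical memory are merged into one
 * iovec. Returns the number of iovecs written, or 0 if iov is too small.
 */
size_t chunks_to_iov(uint8_t *host_base,
                     const struct guest_chunk *chunks, size_t nchunks,
                     struct iovec *iov, size_t max_iov)
{
    size_t n = 0;

    assert(host_base != NULL && iov != NULL);

    for (size_t i = 0; i < nchunks; i++) {
        uint8_t *start = host_base + chunks[i].gpa;

        /* Extend the previous iovec if this chunk directly follows it. */
        if (n > 0 &&
            (uint8_t *)iov[n - 1].iov_base + iov[n - 1].iov_len == start) {
            iov[n - 1].iov_len += chunks[i].len;
            continue;
        }
        if (n == max_iov)
            return 0;
        iov[n].iov_base = start;
        iov[n].iov_len = chunks[i].len;
        n++;
    }
    return n;
}
```

For the vga case the list collapses to a single iovec; for virtio-gpu
resources the number of entries depends on how fragmented the guest's backing
pages are.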

The other obvious approach would be to do it the other way around and
allow the guest to map the host resource somehow.  On the host side qemu
could use vgem to allocate resource memory, so it'll be a gem object
already.  Mapping that into the guest isn't that straight-forward
though.  The guest manages its physical address space, so the guest
would need to find a free spot and ask the host to place the resource
there.  Then the guest needs page structs covering the mapped resource,
so it can work with it.  Didn't investigate how difficult that is.  Use
memory hotplug maybe?  Can we easily unmap the resource then?  Also I
think updating the guest's physical memory layout (which we would need to
do on every resource map/unmap) isn't exactly a cheap operation ...

cheers,
  Gerd

Oleksandr Andrushchenko April 6, 2018, 11:17 a.m. UTC | #10

On 04/06/2018 12:07 PM, Gerd Hoffmann wrote:
> I'm not sure we can create something which works on both kvm and xen.
> The memory management model is quite different ...
>
>
> On xen the hypervisor manages all memory.  Guests can allow other guests
> to access specific pages (using grant tables).  In theory any guest <=>
> guest communication is possible.  In practice is mostly guest <=> dom0
> because guests access their virtual hardware that way.  dom0 is the
> priviledged guest which owns any hardware not managed by xen itself.
>
> Xen guests can ask the hypervisor to update the mapping of guest
> physical pages.  They can ballon down (unmap and free pages).  They can
> ballon up (ask the hypervisor to map fresh pages).  They can map pages
> exported by other guests using grant tables.  xen-zcopy makes heavy use
> of this.  It balloons down, to make room in the guest physical address
> space, then goes map the exported pages there, finally composes a
> dma-buf.
>
>
> On kvm qemu manages all guest memory.  qemu also has all guest memory
> mapped, so a grant-table like mechanism isn't needed to implement
> virtual devices.  qemu can decide how it backs memory for the guest.
> qemu propagates the guest memory map to the kvm driver in the linux
> kernel.  kvm guests have some control over the guest memory map, for
> example they can map pci bars wherever they want in their guest physical
> address space by programming the base registers accordingly, but unlike
> xen guests they can't ask the host to remap individual pages.
>
> Due to qemu having all guest memory mapped virtual devices are typically
> designed to have the guest allocate resources, then notify the host
> where they are located.  This is where the udmabuf idea comes from:
> Guest tells the host (qemu) where the gem object is, and qemu then can
> create a dmabuf backed by those pages to pass it on to other processes
> such as the wayland display server.  Possibly even without the guest
> explicitly asking for it, i.e. export the framebuffer placed by the
> guest in the (virtual) vga pci memory bar as dma-buf.  And I can imagine
> that this is useful outsize virtualization too.
>
>
> I fail to see any common ground for xen-zcopy and udmabuf ...
Does the above mean you can assume that xen-zcopy and udmabuf
can co-exist as two different solutions?
And what about hyper-dmabuf?

Thank you,
Oleksandr

Gerd Hoffmann April 6, 2018, 11:57 a.m. UTC | #11

Hi,

> > I fail to see any common ground for xen-zcopy and udmabuf ...

> Does the above mean you can assume that xen-zcopy and udmabuf
> can co-exist as two different solutions?

Well, udmabuf route isn't fully clear yet, but yes.

See also gvt (intel vgpu), where the hypervisor interface is abstracted
away into separate kernel modules even though most of the actual vgpu
emulation code is common.

> And what about hyper-dmabuf?

No idea, didn't look at it in detail.

Looks pretty complex from a distant view.  Maybe because it tries to
build a communication framework using dma-bufs instead of a simple
dma-buf passing mechanism.

Like xen-zcopy it seems to depend on the idea that the hypervisor
manages all memory, so it is easy for guests to share pages with the help
of the hypervisor.  Which simply isn't the case on kvm.

hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf could
build on top of xen-zcopy.

cheers,
  Gerd

Oleksandr Andrushchenko April 6, 2018, 12:36 p.m. UTC | #12

On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
>    Hi,
>
>>> I fail to see any common ground for xen-zcopy and udmabuf ...
>> Does the above mean you can assume that xen-zcopy and udmabuf
>> can co-exist as two different solutions?
> Well, udmabuf route isn't fully clear yet, but yes.
>
> See also gvt (intel vgpu), where the hypervisor interface is abstracted
> away into a separate kernel modules even though most of the actual vgpu
> emulation code is common.
Thank you for your input, I'm just trying to figure out
which of the three z-copy solutions intersect and how much
>> And what about hyper-dmabuf?
> No idea, didn't look at it in detail.
>
> Looks pretty complex from a distant view.  Maybe because it tries to
> build a communication framework using dma-bufs instead of a simple
> dma-buf passing mechanism.
Yes, I am looking at it now, trying to figure out the full story
and its implementation. BTW, Intel guys were about to share some
test application for hyper-dmabuf, maybe I have missed one.
It could probably better explain the use-cases and the complexity
they have in hyper-dmabuf.
>
> Like xen-zcopy it seems to depend on the idea that the hypervisor
> manages all memory it is easy for guests to share pages with the help of
> the hypervisor.
So, for xen-zcopy we were not trying to make it generic,
it just solves display (dumb) zero-copying use-cases for Xen.
We implemented it as a DRM helper driver because we can't see any
other use-cases as of now.
For example, we also have Xen para-virtualized sound driver, but
its buffer memory usage is not comparable to what display wants
and it works somewhat differently (e.g. there is no "frame done"
event, so one can't tell when the sound buffer can be "flipped").
At the same time, we do not use virtio-gpu, so this could probably
be one more candidate for shared dma-bufs some day.
>    Which simply isn't the case on kvm.
>
> hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
> on top of xen-zcopy.
Hm, I can imagine that: xen-zcopy could be library code for hyper-dmabuf
in terms of implementing all that page sharing fun in multiple directions,
e.g. Host->Guest, Guest->Host, Guest<->Guest.
But I'll let Matt and Dongwon comment on that.

>
> cheers,
>    Gerd
>
Thank you,
Oleksandr

P.S. Sorry for turning your original mail thread into a discussion of
things much broader than your RFC...

Kim, Dongwon April 6, 2018, 6:57 p.m. UTC | #13

On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
> On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
> >   Hi,
> >
> >>>I fail to see any common ground for xen-zcopy and udmabuf ...
> >>Does the above mean you can assume that xen-zcopy and udmabuf
> >>can co-exist as two different solutions?
> >Well, udmabuf route isn't fully clear yet, but yes.
> >
> >See also gvt (intel vgpu), where the hypervisor interface is abstracted
> >away into a separate kernel modules even though most of the actual vgpu
> >emulation code is common.
> Thank you for your input, I'm just trying to figure out
> which of the three z-copy solutions intersect and how much
> >>And what about hyper-dmabuf?

The xen z-copy solution is fundamentally pretty similar to hyper_dmabuf
in terms of these core sharing features:

1. the sharing process - import prime/dmabuf from the producer -> extract
underlying pages and get those shared -> return references for shared pages

2. the page sharing mechanism - it uses Xen-grant-table.

And to give you a quick summary of differences as far as I understand
between two implementations (please correct me if I am wrong, Oleksandr.)

1. xen-zcopy is DRM specific - it can import only DRM prime buffers,
while hyper_dmabuf can export any dmabuf regardless of originator.

2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs,
while hyper_dmabuf (what danvet called remote dmabuf api sharing) sends a
synchronization message to the exporting VM.

3. 1-level references - when using the grant-table for sharing pages, there
will be the same # of refs (each 8 byte) as # of shared pages, and in the case
of xen-zcopy these are passed to userspace to be shared with the importing VM.
Compared to this, hyper_dmabuf does multiple level addressing to generate only
one reference id that represents all shared pages.

4. inter VM messaging (hyper_dmabuf only) - hyper_dmabuf has inter-vm msg
communication defined for dmabuf synchronization and private data (the meta
info that Matt Roper mentioned) exchange.

5. driver-to-driver notification (hyper_dmabuf only) - the importing VM gets
notified when a new dmabuf is exported from another VM - a uevent can
optionally be generated when this happens.

6. structure - hyper_dmabuf is targeting a generic solution for
inter-domain dmabuf sharing for most hypervisors, which is why it has two
layers as mattrope mentioned: a front-end that contains the standard API and a
backend that is specific to the hypervisor.
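
To put rough numbers on point 3, here is a small illustrative model of flat
versus multi-level references. It assumes 4 KiB pages and the 8-byte reference
size mentioned above; the function names and the "pack refs into pages until
one page is left" scheme are assumptions for the sake of the arithmetic, not
hyper_dmabuf's actual layout.

```c
#include <assert.h>
#include <stddef.h>

/* Model parameters (assumptions): 4 KiB pages, 8-byte references. */
enum {
    MODEL_PAGE_SIZE = 4096,
    MODEL_REF_SIZE  = 8,
    REFS_PER_PAGE   = MODEL_PAGE_SIZE / MODEL_REF_SIZE
};

/* Flat scheme: one reference per shared page has to be passed along. */
size_t flat_refs_to_pass(size_t npages)
{
    return npages;
}

/*
 * Multi-level scheme: pack the references into pages, share those pages
 * as well, and repeat until a single page remains. Returns the number of
 * extra pages that hold references; only one reference (to the top page)
 * then needs to be passed to the importer.
 */
size_t indirection_pages(size_t npages)
{
    size_t extra = 0;
    size_t count = npages;

    assert(npages > 0);

    do {
        count = (count + REFS_PER_PAGE - 1) / REFS_PER_PAGE;
        extra += count;
    } while (count > 1);

    return extra;
}
```

Under these assumptions a 1920x1080 buffer at 4 bytes per pixel (2025 pages)
needs 2025 references in the flat scheme, versus 5 extra pages and a single
reference in the multi-level one.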

> >No idea, didn't look at it in detail.
> >
> >Looks pretty complex from a distant view.  Maybe because it tries to
> >build a communication framework using dma-bufs instead of a simple
> >dma-buf passing mechanism.

We started with simple dma-buf sharing but realized there are many
things we need to consider in real use-cases, so we added communication,
notification and dma-buf synchronization, then re-structured it into a
front-end and a back-end (this made things more complicated..) since Xen
was not our only target. Also, we thought passing the reference for the
buffer (hyper_dmabuf_id) is not secure, so we added the uevent mechanism later.

> Yes, I am looking at it now, trying to figure out the full story
> and its implementation. BTW, Intel guys were about to share some
> test application for hyper-dmabuf, maybe I have missed one.
> It could probably better explain the use-cases and the complexity
> they have in hyper-dmabuf.

One example is actually on github. If you want to take a look at it, please
visit:

https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export

> >
> >Like xen-zcopy it seems to depend on the idea that the hypervisor
> >manages all memory it is easy for guests to share pages with the help of
> >the hypervisor.
> So, for xen-zcopy we were not trying to make it generic,
> it just solves display (dumb) zero-copying use-cases for Xen.
> We implemented it as a DRM helper driver because we can't see any
> other use-cases as of now.
> For example, we also have Xen para-virtualized sound driver, but
> its buffer memory usage is not comparable to what display wants
> and it works somewhat differently (e.g. there is no "frame done"
> event, so one can't tell when the sound buffer can be "flipped").
> At the same time, we do not use virtio-gpu, so this could probably
> be one more candidate for shared dma-bufs some day.
> >   Which simply isn't the case on kvm.
> >
> >hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
> >on top of xen-zcopy.
> Hm, I can imagine that: xen-zcopy could be a library code for hyper-dmabuf
> in terms of implementing all that page sharing fun in multiple directions,
> e.g. Host->Guest, Guest->Host, Guest<->Guest.
> But I'll let Matt and Dongwon to comment on that.

I think we can definitely collaborate. Especially, maybe we are using some
outdated sharing mechanism/grant-table mechanism in our Xen backend (thanks
for bringing that up Oleksandr). However, the question is once we collaborate
somehow, can xen-zcopy's usecase use the standard API that hyper_dmabuf
provides? I don't think we need different IOCTLs that do the same in the final
solution.

> 
> >
> >cheers,
> >   Gerd
> >
> Thank you,
> Oleksandr
> 
> P.S. Sorry for making your original mail thread to discuss things much
> broader than your RFC...
>

Daniel Vetter April 9, 2018, 8 a.m. UTC | #14

On Fri, Apr 06, 2018 at 12:54:22PM +0200, Gerd Hoffmann wrote:
> On Fri, Apr 06, 2018 at 10:52:21AM +0100, Daniel Stone wrote:
> > Hi Gerd,
> > 
> > On 14 March 2018 at 08:03, Gerd Hoffmann <kraxel@redhat.com> wrote:
> > >> Either mlock account (because it's mlocked defacto), and get_user_pages
> > >> won't do that for you.
> > >>
> > >> Or you write the full-blown userptr implementation, including mmu_notifier
> > >> support (see i915 or amdgpu), but that also requires Christian Königs
> > >> latest ->invalidate_mapping RFC for dma-buf (since atm exporting userptr
> > >> buffers is a no-go).
> > >
> > > I guess I'll look at mlock accounting for starters then.  Easier for
> > > now, and leaves the door open to switch to userptr later as this should
> > > be transparent to userspace.
> > 
> > Out of interest, do you have usecases for full userptr support? Maybe
> > another way would be to allow creation of dmabufs from memfds.
> 
> I have two things in mind.
> 
> One is vga emulation.  I have virtual pci memory bar for the virtual
> vga.  qemu backs vga memory with anonymous pages right now, switching
> that to shmem should be easy though if that makes things easier.  Guest
> places the framebuffer somewhere in the pci bar, and I want export the
> chunk which represents the framebuffer as dma-buf to display it on the
> host without copying around data.  Framebuffer is linear in guest
> physical memory, so a single block only.  That is the simpler case.
> 
> The more difficuilt one is virtio-gpu ressources.  virtio-gpu resources
> live in host memory (guest has no direct access).  The guest can
> optionally specify guest memory pages as backing storage for the
> resource.  Guest backing storage is allowed to be scattered.  Commands
> exist to copy both ways between host storage and guest backing.
> 
> With virgl (opengl acceleration) enabled the guest will send rendering
> commands to fill the framebuffer resource, so there is no need to copy
> content to the framebuffer resource.  The guest may fill other
> resources such as textures used for rendering with copy commands.
> 
> Without acceleration the guest does software-rendering to the backing
> storage, then sends a command to copy the framebuffer content from guest
> backing storage to the host resource.
> 
> Now it would be useful to allow a shared mapping, so no copying between
> guest backing storage and host resource is needed, especially for the
> software rendering case (i.e. dumb gem buffers).  Being able to export
> guest dumb buffers to other host processes would be useful too, for
> example to display guest windows seamlessly on the host wayland server.
> 
> So getting a dma-buf for the guest backing storage via udmabuf looked
> like a useful approach.  We can export the guest gem buffers to other
> host processes that way.  qemu itself could map it too, to get a linear
> representation of the scattered guest backing storage.
> 
> The other obvious approach would be to do it the other way around and
> allow the guest to map the host resource somehow.  On the host side qemu
> could use vgem to allocate resource memory, so it'll be a gem object
> already.  Mapping that into the guest isn't that straight-forward
> though.  The guest manages its physical address space, so the guest
> would need to find a free spot and ask the host to place the resource
> there.  Then the guest needs page structs covering the mapped resource,
> so it can work with it.  Didn't investigate how difficult that is.  Use
> memory hotplug maybe?  Can we easily unmap the resource then?  Also I
> think updating the guest's physical memory layout (which we would need to
> do on every resource map/unmap) isn't exactly a cheap operation ...

Generally we try to cache mappings as much as possible. And wrt finding a
slot: Create a sufficiently sized BAR on the virgl device, just for that?
-Daniel

Daniel Vetter April 9, 2018, 8:12 a.m. UTC | #15

On Thu, Apr 05, 2018 at 05:11:17PM -0700, Matt Roper wrote:
> On Thu, Apr 05, 2018 at 10:32:04PM +0200, Daniel Vetter wrote:
> > Pulling this out of the shadows again.
> > 
> > We now also have xen-zcopy from Oleksandr and the hyper dmabuf stuff
> > from Matt and Dongwon.
> > 
> > At least from the intel side there seems to be the idea to just have 1
> > special device that can handle cross-guest/host sharing for all kinds
> > of hypervisors, so I guess you all need to work together :-)
> > 
> > Or we throw out the idea that hyper dmabuf will be cross-hypervisor
> > (not sure how useful/reasonable that is, someone please convince me
> > one way or the other).
> > 
> > Cheers, Daniel
> 
> Dongwon (DW) is the one doing all the real work on hyper_dmabuf, but I'm
> familiar with the use cases he's trying to address, and I think there
> are a couple high-level goals of his work that are worth calling out as
> we discuss the various options for sharing buffers produced in one VM
> with a consumer running in another VM:
> 
>  * We should try to keep the interface/usage separate from the
>    underlying hypervisor implementation details.  I.e., in DW's design
>    the sink/source drivers that handle the actual buffer passing in the
>    two VM's should provide a generic interface that does not depend on a
>    specific hypervisor.  Behind the scenes there could be various
>    implementations for specific hypervisors (Xen, KVM, ACRN, etc.), and
>    some of those backends may have additional restrictions, but it would
>    be best if userspace didn't have to know the specific hypervisor
>    running on the system and could just query the general capabilities
>    available to it.  We've already got projects in flight that are
>    wanting this functionality on Xen and ACRN today.

Two comments on this:

- Just because it's in drivers/gpu doesn't mean you can't use it for
  anything else. E.g. the xen-zcopy driver can very much be used for any
  dma-buf, there's nothing gpu specific with it - well besides that it
  reuses some useful DRM ioctls, but if that annoys you just do a #define
  TOTALLY_GENERIC DRM and be done :-)

- Especially the kvm memory and hypervisor model seems totally different
  from other hypervisors, e.g. no real use for guest-guest sharing (which
  doesn't go through the host) and other cases. So trying to make
  something 100% generic seems like a bad idea.

  Wrt making it generic: Just use generic interfaces - if you can somehow
  use xen-front for the display sharing, then a) no need for hyper-dmabuf
  and b) already fully generic since it looks like a normal drm device to
  the guest userspace.

>  * The general interface should be able to express sharing from any
>    guest:guest, not just guest:host.  Arbitrary G:G sharing might be
>    something some hypervisors simply aren't able to support, but the
>    userspace API itself shouldn't make assumptions or restrict that.  I
>    think ideally the sharing API would include some kind of
>    query_targets interface that would return a list of VM's that your
>    current OS is allowed to share with; that list would depend on the
>    policy established by the system integrator, but obviously wouldn't
>    include targets that the hypervisor itself wouldn't be capable of
>    handling.

Uh ... has a proper security architect analyzed this idea?

>  * A lot of the initial use cases are in the realm of graphics, but this
>    shouldn't be a graphics-specific API.  Buffers might contain other
>    types of content as well (e.g., audio).  Really the content producer
>    could potentially be any driver (or userspace) running in the VM that
>    knows how to import/export dma_buf's (or maybe just import given
>    danvet's suggestion that we should make the sink driver do all the
>    actual memory allocation for any buffers that may be shared).

See above, just because it uses drm ioctls doesn't make it gfx specific.

Otoh making it even more graphics specific might be even better, i.e. just
sharing the backend tech (grant tables or whatever), but having dedicated
front-ends for each use-case so there's less code to type.

>  * We need to be able to handle cross-VM coordination of buffer usage as
>    well, so I think we'd want to include fence forwarding support in the
>    API as well to signal back and forth about production/consumption
>    completion.  And of course document really well what should happen
>    if, for example, the entire VM you're sharing with/from dies.

Implicit fencing has been proven to be a bad idea. Please do explicit
passing of dma_fences (plus assorted protocol).

>  * The sharing API could be used to share multiple kinds of content in a
>    single system.  The sharing sink driver running in the content
>    producer's VM should accept some additional metadata that will be
>    passed over to the target VM as well.  The sharing source driver
>    running in the content consumer's VM would then be able to use this
>    metadata to determine the purpose of a new buffer that arrives and
>    filter/dispatch it to the appropriate consumer.

If you want metadata, why not use xen-front or something similar to have a
well-defined means to transfer everything? One of the key design decisions
of dma-buf was to _not_ have metadata, just buffer sharing.
-Daniel

> 
> 
> For reference, the terminology I'm using is
> 
>  /----------\  dma_buf   /------\ HV /--------\  dma_buf   /----------\
>  | Producer |----------->| Sink | HV | Source |----------->| Consumer |
>  \----------/   ioctls   \------/ HV \--------/  uevents   \----------/
> 
> 
> 
> In the realm of graphics, "Producer" could potentially be something like
> an EGL client that sends the buffer at context setup and then signals
> with fences on each SwapBuffers.  "Consumer" could be a Wayland client
> that proxies the buffers into surfaces or dispatches them to other
> userspace software that's waiting for buffers.
> 
> With the hyper_dmabuf approach, there's a lot of ABI details that need
> to be worked out and really clearly documented before we worry too much
> about the backend hypervisor-specific stuff.
> 
> I'm not super familiar with xen-zcopy and udmabuf, but it sounds like
> they're approaching similar problems from slightly different directions,
> so we should make sure we can come up with something that satisfies
> everyone's requirements. 
> 
> 
> Matt
> 
> > 
> > On Wed, Mar 14, 2018 at 9:03 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> > >   Hi,
> > >
> > >> Either mlock account (because it's mlocked defacto), and get_user_pages
> > >> won't do that for you.
> > >>
> > >> Or you write the full-blown userptr implementation, including mmu_notifier
> > >> support (see i915 or amdgpu), but that also requires Christian Königs
> > >> latest ->invalidate_mapping RFC for dma-buf (since atm exporting userptr
> > >> buffers is a no-go).
> > >
> > > I guess I'll look at mlock accounting for starters then.  Easier for
> > > now, and leaves the door open to switch to userptr later as this should
> > > be transparent to userspace.
> > >
> > >> > Known issue:  Driver API isn't complete yet.  Need add some flags, for
> > >> > example to support read-only buffers.
> > >>
> > >> dma-buf has no concept of read-only. I don't think we can even enforce
> > >> that (not many iommus can enforce this iirc), so pretty much need to
> > >> require r/w memory.
> > >
> > > Ah, ok.  Just saw the 'write' arg for get_user_pages_fast and figured we
> > > might support that, but if iommus can't handle that anyway it's
> > > pointless indeed.
> > >
> > >> > Cc: David Airlie <airlied@linux.ie>
> > >> > Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> > >> > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> > >>
> > >> btw there's also the hyperdmabuf stuff from the xen folks, but imo their
> > >> solution of forwarding the entire dma-buf api is over the top. This here
> > >> looks _much_ better, pls cc all the hyperdmabuf people on your next
> > >> version.
> > >
> > > Fun fact: googling for "hyperdmabuf" found me your mail and nothing else :-o
> > > (Trying "hyper dmabuf" instead worked better then).
> > >
> > > Yes, will cc them on the next version.  Not sure it'll help much on xen
> > > though due to the memory management being very different.  Basically xen
> > > owns the memory, not the kernel of the control domain (dom0), so
> > > creating dmabufs for guest memory chunks isn't that simple ...
> > >
> > > Also it's not clear whether they really need guest -> guest exports or
> > > guest -> dom0 exports.
> > >
> > >> Overall I like the idea, but too lazy to review.
> > >
> > > Cool.  General comments on the idea were all I was looking for for the
> > > moment.  Spare your review cycles for the next version ;)
> > >
> > >> Oh, some kselftests for this stuff would be lovely.
> > >
> > > I'll look into it.
> > >
> > > thanks,
> > >   Gerd
> > >
> > 
> > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> IoTG Platform Enabling & Development
> Intel Corporation
> (916) 356-2795

Oleksandr Andrushchenko April 10, 2018, 6:37 a.m. UTC | #16

On 04/06/2018 09:57 PM, Dongwon Kim wrote:
> On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
>> On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
>>>    Hi,
>>>
>>>>> I fail to see any common ground for xen-zcopy and udmabuf ...
>>>> Does the above mean you can assume that xen-zcopy and udmabuf
>>>> can co-exist as two different solutions?
>>> Well, udmabuf route isn't fully clear yet, but yes.
>>>
>>> See also gvt (intel vgpu), where the hypervisor interface is abstracted
>>> away into a separate kernel modules even though most of the actual vgpu
>>> emulation code is common.
>> Thank you for your input, I'm just trying to figure out
>> which of the three z-copy solutions intersect and how much
>>>> And what about hyper-dmabuf?
> xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
> in terms of these core sharing features:
>
> 1. the sharing process - import prime/dmabuf from the producer -> extract
> underlying pages and get those shared -> return references for shared pages
>
> 2. the page sharing mechanism - it uses Xen-grant-table.
>
> And to give you a quick summary of differences as far as I understand
> between two implementations (please correct me if I am wrong, Oleksandr.)
>
> 1. xen-zcopy is DRM specific - can import only DRM prime buffer
> while hyper_dmabuf can export any dmabuf regardless of originator
Well, this is true. And at the same time this is just a matter
of extending the API: xen-zcopy is a helper driver designed for the
xen-front/back use-case, which is why it only has a DRM PRIME API.
>
> 2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
> while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
> out synchronization message to the exporting VM for synchronization.
This is true. Again, this is because of the use-cases it covers.
But having synchronization for a generic solution seems to be a good idea.
>
> 3. 1-level references - when using grant-table for sharing pages, there will
> be same # of refs (each 8 byte)
To be precise, a grant ref is 4 bytes.
> as # of shared pages, which is passed to
> the userspace to be shared with importing VM in case of xen-zcopy.
The reason for that is that xen-zcopy is a helper driver, e.g.
the grant references come from the display backend [1], which implements
Xen display protocol [2]. So, effectively the backend extracts references
from frontend's requests and passes those to xen-zcopy as an array
of refs.
>   Compared
> to this, hyper_dmabuf does multiple level addressing to generate only one
> reference id that represents all shared pages.
In the protocol [2] only one reference to the gref directory is passed
between VMs (and the gref directory is a singly-linked list of shared
pages containing all of the grefs of the buffer).
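
For readers unfamiliar with the layout, the directory described above can
be modelled in plain userspace C. This is only a sketch under stated
assumptions: a 4096-byte page, a 4-byte grant reference, 0 as the
end-of-chain marker, and a local lookup table standing in for actually
mapping a granted page. The names gref_dir_page, dir_fill, dir_map and
dir_walk are made up for illustration; they are not the identifiers used
by displif.h or xen-zcopy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u      /* assumption: 4 KiB pages */
#define GREF_INVALID    0u         /* assumption: 0 terminates the chain */

typedef uint32_t grant_ref_t;      /* a grant reference is 4 bytes */

/* Buffer grefs that fit in one directory page after the 'next' link. */
#define GREFS_PER_DIR_PAGE \
	((PAGE_SIZE_BYTES - sizeof(grant_ref_t)) / sizeof(grant_ref_t))

/* Hypothetical directory page: link to the next page, then buffer grefs. */
struct gref_dir_page {
	grant_ref_t next;
	grant_ref_t gref[GREFS_PER_DIR_PAGE];
};

_Static_assert(sizeof(struct gref_dir_page) == PAGE_SIZE_BYTES,
	       "a directory page must fill exactly one page");

/* Directory pages needed to describe nr_buf buffer pages (rounds up). */
static size_t dir_pages_needed(size_t nr_buf)
{
	return (nr_buf + GREFS_PER_DIR_PAGE - 1) / GREFS_PER_DIR_PAGE;
}

/* Producer side: chain the directory pages and store the buffer grefs. */
static void dir_fill(struct gref_dir_page *dir, const grant_ref_t *dir_grefs,
		     size_t nr_dir, const grant_ref_t *buf_grefs, size_t nr_buf)
{
	size_t k = 0;

	assert(nr_dir >= dir_pages_needed(nr_buf));
	for (size_t i = 0; i < nr_dir; i++) {
		dir[i].next = (i + 1 < nr_dir) ? dir_grefs[i + 1] : GREF_INVALID;
		for (size_t j = 0; j < GREFS_PER_DIR_PAGE; j++)
			dir[i].gref[j] = (k < nr_buf) ? buf_grefs[k++] : GREF_INVALID;
	}
}

/* Stand-in for mapping a granted page: look the gref up in a local table. */
static const struct gref_dir_page *
dir_map(const struct gref_dir_page *dir, const grant_ref_t *dir_grefs,
	size_t nr_dir, grant_ref_t ref)
{
	for (size_t i = 0; i < nr_dir; i++)
		if (dir_grefs[i] == ref)
			return &dir[i];
	return NULL;
}

/* Consumer side: follow the chain from the single gref that was passed. */
static size_t dir_walk(const struct gref_dir_page *dir,
		       const grant_ref_t *dir_grefs, size_t nr_dir,
		       grant_ref_t head, grant_ref_t *out, size_t nr_buf)
{
	size_t k = 0;

	for (grant_ref_t ref = head; ref != GREF_INVALID && k < nr_buf; ) {
		const struct gref_dir_page *p = dir_map(dir, dir_grefs, nr_dir, ref);

		if (!p)
			break;
		for (size_t j = 0; j < GREFS_PER_DIR_PAGE && k < nr_buf; j++)
			out[k++] = p->gref[j];
		ref = p->next;
	}
	return k;	/* number of buffer grefs recovered */
}
```

With these assumptions one directory page holds 1023 buffer grefs, so a
1920x1080 buffer at 4 bytes per pixel (8294400 bytes, i.e. 2025 pages)
needs two directory pages, and only the gref of the first one has to
cross the VM boundary.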

>
> 4. inter VM messaging (hyper_dmabuf only) - hyper_dmabuf has inter-vm msg
> communication defined for dmabuf synchronization and private data (meta
> info that Matt Roper mentioned) exchange.
This is true, xen-zcopy has no means for inter VM sync and meta-data,
simply because it doesn't have any code for inter VM exchange in it,
e.g. the inter VM protocol is handled by the backend [1].
>
> 5. driver-to-driver notification (hyper_dmabuf only) - importing VM gets
> notified when a new dmabuf is exported from other VM - uevent can be optionally
> generated when this happens.
>
> 6. structure - hyper_dmabuf aims to provide a generic solution for
> inter-domain dmabuf sharing for most hypervisors, which is why it has two
> layers as mattrope mentioned, front-end that contains standard API and backend
> that is specific to hypervisor.
Again, xen-zcopy is decoupled from inter VM communication
>>> No idea, didn't look at it in detail.
>>>
>>> Looks pretty complex from a distant view.  Maybe because it tries to
>>> build a communication framework using dma-bufs instead of a simple
>>> dma-buf passing mechanism.
> we started with simple dma-buf sharing but realized there are many
> things we need to consider in real use-cases, so we added communication,
> notification and dma-buf synchronization, then re-structured it into
> front-end and back-end (this made things more complicated..) since Xen
> was not our only target. Also, we thought passing the reference for the
> buffer (hyper_dmabuf_id) is not secure, so we added the uevent mechanism later.
>
>> Yes, I am looking at it now, trying to figure out the full story
>> and its implementation. BTW, Intel guys were about to share some
>> test application for hyper-dmabuf, maybe I have missed one.
>> It could probably better explain the use-cases and the complexity
>> they have in hyper-dmabuf.
> One example is actually in github. If you want take a look at it, please
> visit:
>
> https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
Thank you, I'll have a look
>>> Like xen-zcopy it seems to depend on the idea that the hypervisor
>>> manages all memory, so it is easy for guests to share pages with the help of
>>> the hypervisor.
>> So, for xen-zcopy we were not trying to make it generic,
>> it just solves display (dumb) zero-copying use-cases for Xen.
>> We implemented it as a DRM helper driver because we can't see any
>> other use-cases as of now.
>> For example, we also have Xen para-virtualized sound driver, but
>> its buffer memory usage is not comparable to what display wants
>> and it works somewhat differently (e.g. there is no "frame done"
>> event, so one can't tell when the sound buffer can be "flipped").
>> At the same time, we do not use virtio-gpu, so this could probably
>> be one more candidate for shared dma-bufs some day.
>>>    Which simply isn't the case on kvm.
>>>
>>> hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
>>> on top of xen-zcopy.
>> Hm, I can imagine that: xen-zcopy could be a library code for hyper-dmabuf
>> in terms of implementing all that page sharing fun in multiple directions,
>> e.g. Host->Guest, Guest->Host, Guest<->Guest.
>> But I'll let Matt and Dongwon to comment on that.
> I think we can definitely collaborate. Especially, maybe we are using some
> outdated sharing mechanism/grant-table mechanism in our Xen backend (thanks
> for bringing that up Oleksandr). However, the question is once we collaborate
> somehow, can xen-zcopy's usecase use the standard API that hyper_dmabuf
> provides? I don't think we need different IOCTLs that do the same in the final
> solution.
>
If you think of xen-zcopy as a library (which implements Xen
grant reference mangling) and a DRM PRIME wrapper on top of that
library, we can probably define a proper API for that library,
so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
about to start upstreaming the Xen para-virtualized sound device driver
soon, which also uses similar code and the same gref passing mechanism [3].
(Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
snd/xen-front and then propose a Xen helper library for sharing big buffers,
so the above drivers can share the common code w/o duplication.)

Thank you,
Oleksandr

P.S. All, is it a good idea to move this out of the udmabuf thread into a
dedicated one?
>>> cheers,
>>>    Gerd
>>>
>> Thank you,
>> Oleksandr
>>
>> P.S. Sorry for making your original mail thread to discuss things much
>> broader than your RFC...
>>
[1] https://github.com/xen-troops/displ_be
[2] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/displif.h#L484
[3] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/sndif.h

Gerd Hoffmann April 10, 2018, 2:22 p.m. UTC | #17

Hi,

> Generally we try to cache mappings as much as possible. And wrt finding a
> slot: Create a sufficiently sized BAR on the virgl device, just for that?

Well.  virtio has no concept of "bars" ...

The most common virtio transport layer happens to be pci, which actually
has bars.  But we also have virtio-mmio (largely unused since arm got
pci) and virtio-ccw (used on s390x).

In any case it would be a layering violation.

Meanwhile I figured out that qemu got memfd support recently, i.e. it can
be configured to back guest memory with memfd.  Which makes the memfd route
quite attractive.  I guess I'll try switching udmabuf to require memfd
storage as a proof-of-concept.
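
To make "memfd storage" concrete, here is a minimal userspace sketch of
what backing a memory region with a memfd can look like on Linux. It is
not qemu's code and not the udmabuf API. The helper names
create_sealed_memfd() and map_shared() are hypothetical, and adding
F_SEAL_SHRINK is an assumption (a consumer that pins pages would
plausibly want the file not to shrink underneath it); which seals, if
any, the driver would require is not settled in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an anonymous memory file of 'size' bytes and forbid shrinking it.
 * Returns the file descriptor, or -1 on failure.
 */
static int create_sealed_memfd(const char *name, off_t size)
{
	int fd;

	assert(size > 0);
	fd = memfd_create(name, MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;
	/* Set the size first, then seal so later truncation to less fails. */
	if (ftruncate(fd, size) < 0 ||
	    fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Map the memfd read/write and shared; returns NULL on failure. */
static void *map_shared(int fd, size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

Every MAP_SHARED mapping of the same fd sees the same pages, and the fd
itself can be handed to another process or to a driver, which is what
makes it a convenient handle for a range of guest memory.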

cheers,
  Gerd

Kim, Dongwon April 10, 2018, 5:26 p.m. UTC | #18

On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
> On 04/06/2018 09:57 PM, Dongwon Kim wrote:
> >On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
> >>On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
> >>>   Hi,
> >>>
> >>>>>I fail to see any common ground for xen-zcopy and udmabuf ...
> >>>>Does the above mean you can assume that xen-zcopy and udmabuf
> >>>>can co-exist as two different solutions?
> >>>Well, udmabuf route isn't fully clear yet, but yes.
> >>>
> >>>See also gvt (intel vgpu), where the hypervisor interface is abstracted
> >>>away into a separate kernel modules even though most of the actual vgpu
> >>>emulation code is common.
> >>Thank you for your input, I'm just trying to figure out
> >>which of the three z-copy solutions intersect and how much
> >>>>And what about hyper-dmabuf?
> >xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
> >in terms of these core sharing features:
> >
> >1. the sharing process - import prime/dmabuf from the producer -> extract
> >underlying pages and get those shared -> return references for shared pages

Another thing is that danvet was kind of against the idea of importing an
existing dmabuf/prime buffer and forwarding it to the other domain, due to
synchronization issues. He proposed making hyper_dmabuf work only as an
exporter so that it can have full control over the buffer. I think we need
to talk about this further as well.

danvet, can you comment on this topic?

> >
> >2. the page sharing mechanism - it uses Xen-grant-table.
> >
> >And to give you a quick summary of differences as far as I understand
> >between two implementations (please correct me if I am wrong, Oleksandr.)
> >
> >1. xen-zcopy is DRM specific - can import only DRM prime buffer
> >while hyper_dmabuf can export any dmabuf regardless of originator
> Well, this is true. And at the same time this is just a matter
> of extending the API: xen-zcopy is a helper driver designed for
> xen-front/back use-case, so this is why it only has DRM PRIME API
> >
> >2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
> >while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
> >out synchronization message to the exporting VM for synchronization.
> This is true. Again, this is because of the use-cases it covers.
> But having synchronization for a generic solution seems to be a good idea.

Yeah, understood xen-zcopy works ok with your use case. But I am just curious
if it is ok not to have any inter-domain synchronization in this sharing model.
The buffer being shared is technically dma-buf and originator needs to be able
to keep track of it.

> >
> >3. 1-level references - when using grant-table for sharing pages, there will
> >be same # of refs (each 8 byte)
> To be precise, grant ref is 4 bytes
You are right. Thanks for the correction. ;)

> >as # of shared pages, which is passed to
> >the userspace to be shared with importing VM in case of xen-zcopy.
> The reason for that is that xen-zcopy is a helper driver, e.g.
> the grant references come from the display backend [1], which implements
> Xen display protocol [2]. So, effectively the backend extracts references
> from frontend's requests and passes those to xen-zcopy as an array
> of refs.
> >  Compared
> >to this, hyper_dmabuf does multiple level addressing to generate only one
> >reference id that represents all shared pages.
> In the protocol [2] only one reference to the gref directory is passed
> between VMs
> (and the gref directory is a single-linked list of shared pages containing
> all
> of the grefs of the buffer).

ok, good to know. I will look into its implementation in more detail, but is
this gref directory (chained grefs) something that can be used for any general
memory sharing use case, or is it just for xen-display (in the current code base)?

> 
> >
> >4. inter VM messaging (hyper_dmabuf only) - hyper_dmabuf has inter-vm msg
> >communication defined for dmabuf synchronization and private data (meta
> >info that Matt Roper mentioned) exchange.
> This is true, xen-zcopy has no means for inter VM sync and meta-data,
> simply because it doesn't have any code for inter VM exchange in it,
> e.g. the inter VM protocol is handled by the backend [1].
> >
> >5. driver-to-driver notification (hyper_dmabuf only) - importing VM gets
> >notified when a new dmabuf is exported from other VM - uevent can be optionally
> >generated when this happens.
> >
> >6. structure - hyper_dmabuf is targetting to provide a generic solution for
> >inter-domain dmabuf sharing for most hypervisors, which is why it has two
> >layers as mattrope mentioned, front-end that contains standard API and backend
> >that is specific to hypervisor.
> Again, xen-zcopy is decoupled from inter VM communication
> >>>No idea, didn't look at it in detail.
> >>>
> >>>Looks pretty complex from a distant view.  Maybe because it tries to
> >>>build a communication framework using dma-bufs instead of a simple
> >>>dma-buf passing mechanism.
> >we started with simple dma-buf sharing but realized there are many
> >things we need to consider in real use-cases, so we added communication,
> >notification and dma-buf synchronization, then re-structured it into
> >front-end and back-end (this made things more complicated..) since Xen
> >was not our only target. Also, we thought passing the reference for the
> >buffer (hyper_dmabuf_id) is not secure, so we added the uevent mechanism later.
> >
> >>Yes, I am looking at it now, trying to figure out the full story
> >>and its implementation. BTW, Intel guys were about to share some
> >>test application for hyper-dmabuf, maybe I have missed one.
> >>It could probably better explain the use-cases and the complexity
> >>they have in hyper-dmabuf.
> >One example is actually in github. If you want take a look at it, please
> >visit:
> >
> >https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
> Thank you, I'll have a look
> >>>Like xen-zcopy it seems to depend on the idea that the hypervisor
> >>>manages all memory, so it is easy for guests to share pages with the help of
> >>>the hypervisor.
> >>So, for xen-zcopy we were not trying to make it generic,
> >>it just solves display (dumb) zero-copying use-cases for Xen.
> >>We implemented it as a DRM helper driver because we can't see any
> >>other use-cases as of now.
> >>For example, we also have Xen para-virtualized sound driver, but
> >>its buffer memory usage is not comparable to what display wants
> >>and it works somewhat differently (e.g. there is no "frame done"
> >>event, so one can't tell when the sound buffer can be "flipped").
> >>At the same time, we do not use virtio-gpu, so this could probably
> >>be one more candidate for shared dma-bufs some day.
> >>>   Which simply isn't the case on kvm.
> >>>
> >>>hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
> >>>on top of xen-zcopy.
> >>Hm, I can imagine that: xen-zcopy could be a library code for hyper-dmabuf
> >>in terms of implementing all that page sharing fun in multiple directions,
> >>e.g. Host->Guest, Guest->Host, Guest<->Guest.
> >>But I'll let Matt and Dongwon to comment on that.
> >I think we can definitely collaborate. Especially, maybe we are using some
> >outdated sharing mechanism/grant-table mechanism in our Xen backend (thanks
> >for bringing that up Oleksandr). However, the question is once we collaborate
> >somehow, can xen-zcopy's usecase use the standard API that hyper_dmabuf
> >provides? I don't think we need different IOCTLs that do the same in the final
> >solution.
> >
> If you think of xen-zcopy as a library (which implements Xen
> grant references mangling) and DRM PRIME wrapper on top of that
> library, we can probably define proper API for that library,
> so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
> about to start upstreaming Xen para-virtualized sound device driver soon,
> which also uses similar code and gref passing mechanism [3].
> (Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
> snd/xen-front and then propose a Xen helper library for sharing big buffers,
> so common code of the above drivers can use the same code w/o code
> duplication)

I think it is possible to use your functions for the memory sharing part in
hyper_dmabuf's backend (this 'backend' means the layer that does page sharing
and inter-vm communication in a Xen-specific way), so why don't we work on the
"Xen helper library for sharing big buffers" first while we continue our
discussion on the common API layer that can cover any dmabuf sharing case?

> 
> Thank you,
> Oleksandr
> 
> P.S. All, is it a good idea to move this out of udmabuf thread into a
> dedicated one?

Either way is fine with me.

> >>>cheers,
> >>>   Gerd
> >>>
> >>Thank you,
> >>Oleksandr
> >>
> >>P.S. Sorry for making your original mail thread to discuss things much
> >>broader than your RFC...
> >>
> [1] https://github.com/xen-troops/displ_be
> [2] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/displif.h#L484
> [3] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/sndif.h
>

Oleksandr Andrushchenko April 11, 2018, 5:59 a.m. UTC | #19

On 04/10/2018 08:26 PM, Dongwon Kim wrote:
> On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
>> On 04/06/2018 09:57 PM, Dongwon Kim wrote:
>>> On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
>>>> On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
>>>>>    Hi,
>>>>>
>>>>>>> I fail to see any common ground for xen-zcopy and udmabuf ...
>>>>>> Does the above mean you can assume that xen-zcopy and udmabuf
>>>>>> can co-exist as two different solutions?
>>>>> Well, udmabuf route isn't fully clear yet, but yes.
>>>>>
>>>>> See also gvt (intel vgpu), where the hypervisor interface is abstracted
>>>>> away into a separate kernel modules even though most of the actual vgpu
>>>>> emulation code is common.
>>>> Thank you for your input, I'm just trying to figure out
>>>> which of the three z-copy solutions intersect and how much
>>>>>> And what about hyper-dmabuf?
>>> xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
>>> in terms of these core sharing features:
>>>
>>> 1. the sharing process - import prime/dmabuf from the producer -> extract
>>> underlying pages and get those shared -> return references for shared pages
> Another thing is danvet was kind of against to the idea of importing existing
> dmabuf/prime buffer and forward it to the other domain due to synchronization
> issues. He proposed to make hyper_dmabuf only work as an exporter so that it
> can have a full control over the buffer. I think we need to talk about this
> further as well.
Yes, I saw this. But this limits the use-cases so much.
For instance, running Android as a Guest (which uses ION to allocate
buffers) means that in the end the HW composer will import a dma-buf into
the DRM driver. Then, in the case of xen-front for example, it needs to be
shared with the backend (Host side). Of course, we can change user-space
to make xen-front allocate the buffers (make it the exporter), but what we
try to avoid is changing user-space which would otherwise remain
unchanged.
So, I do think we have to support this use-case and just have to understand
the complexity.

>
> danvet, can you comment on this topic?
>
>>> 2. the page sharing mechanism - it uses Xen-grant-table.
>>>
>>> And to give you a quick summary of differences as far as I understand
>>> between two implementations (please correct me if I am wrong, Oleksandr.)
>>>
>>> 1. xen-zcopy is DRM specific - can import only DRM prime buffer
>>> while hyper_dmabuf can export any dmabuf regardless of originator
>> Well, this is true. And at the same time this is just a matter
>> of extending the API: xen-zcopy is a helper driver designed for
>> xen-front/back use-case, so this is why it only has DRM PRIME API
>>> 2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
>>> while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
>>> out synchronization message to the exporting VM for synchronization.
>> This is true. Again, this is because of the use-cases it covers.
>> But having synchronization for a generic solution seems to be a good idea.
> Yeah, understood xen-zcopy works ok with your use case. But I am just curious
> if it is ok not to have any inter-domain synchronization in this sharing model.
The synchronization is done with displif protocol [1]
> The buffer being shared is technically dma-buf and originator needs to be able
> to keep track of it.
Since I am working in DRM terms, the tracking is done by the DRM core
for me for free. (This might be one of the reasons Daniel sees a DRM-based
implementation as a very good fit from a code-reuse POV.)
>
>>> 3. 1-level references - when using grant-table for sharing pages, there will
>>> be same # of refs (each 8 byte)
>> To be precise, grant ref is 4 bytes
> You are right. Thanks for correction.;)
>
>>> as # of shared pages, which is passed to
>>> the userspace to be shared with importing VM in case of xen-zcopy.
>> The reason for that is that xen-zcopy is a helper driver, e.g.
>> the grant references come from the display backend [1], which implements
>> Xen display protocol [2]. So, effectively the backend extracts references
>> from frontend's requests and passes those to xen-zcopy as an array
>> of refs.
>>>   Compared
>>> to this, hyper_dmabuf does multiple level addressing to generate only one
>>> reference id that represents all shared pages.
>> In the protocol [2] only one reference to the gref directory is passed
>> between VMs
>> (and the gref directory is a single-linked list of shared pages containing
>> all
>> of the grefs of the buffer).
> ok, good to know. I will look into its implementation in more details but is
> this gref directory (chained grefs) something that can be used for any general
> memory sharing use case or is it jsut for xen-display (in current code base)?
Not to mislead you: one grant ref is passed via displif protocol,
but the page it's referencing contains the rest of the grant refs.
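
To make the gref directory idea concrete, here is a minimal user-space
sketch of walking such a chain, assuming 4 KiB pages, 4-byte grant refs and
a next-page reference of 0 as the end-of-chain marker. All names below
(gref_dir_page, map_dir_fn, map_from_table, collect_grefs) are hypothetical
and only model the layout; real code would map the granted pages through
Xen instead of indexing a table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u
typedef uint32_t gref_t;              /* a grant reference is 4 bytes */
#define GREF_END 0u                   /* assumed end-of-chain marker */

/* Refs that fit in one directory page after the "next page" link. */
#define REFS_PER_DIR_PAGE ((PAGE_SIZE_BYTES - sizeof(gref_t)) / sizeof(gref_t))

/* Hypothetical directory page: one link plus as many grefs as fit. */
struct gref_dir_page {
	gref_t next_page;             /* gref of the next directory page */
	gref_t gref[REFS_PER_DIR_PAGE];
};

/* Hypothetical stand-in for mapping a granted page by its reference. */
typedef const struct gref_dir_page *(*map_dir_fn)(gref_t ref, void *ctx);

/* Mock mapper for experiments: gref N resolves to table[N - 1]. */
static const struct gref_dir_page *map_from_table(gref_t ref, void *ctx)
{
	return &((const struct gref_dir_page *)ctx)[ref - 1];
}

/* Number of pages (and so grant refs) needed for a buffer of this size. */
static size_t pages_for_bytes(size_t bytes)
{
	return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Number of directory pages needed to hold that many grant refs. */
static size_t dir_pages_for(size_t num_pages)
{
	return (num_pages + REFS_PER_DIR_PAGE - 1) / REFS_PER_DIR_PAGE;
}

/* Walk the chain from the single gref passed between VMs and copy out
 * up to num_pages buffer grefs; returns how many were copied. */
static size_t collect_grefs(gref_t first, size_t num_pages,
			    map_dir_fn map, void *ctx, gref_t *out)
{
	size_t done = 0;
	gref_t cur = first;

	assert(map != NULL && out != NULL);
	while (cur != GREF_END && done < num_pages) {
		const struct gref_dir_page *dir = map(cur, ctx);
		size_t n = num_pages - done;

		if (dir == NULL)
			break;
		if (n > REFS_PER_DIR_PAGE)
			n = REFS_PER_DIR_PAGE;
		for (size_t i = 0; i < n; i++)
			out[done + i] = dir->gref[i];
		done += n;
		cur = dir->next_page;
	}
	return done;
}
```

Under these assumptions a 1920x1080 buffer at 4 bytes per pixel spans 2025
pages, so it needs 2025 grant refs, which fit into two chained directory
pages of 1023 refs each.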

As to whether this can be used for any memory: yes. It is the same for
the sndif and displif Xen protocols, but defined twice because, strictly
speaking, sndif and displif are two separate protocols.

While reviewing your RFC v2, one of the comments I had [2] was whether we
can start by defining such a generic protocol for hyper-dmabuf.
It can be a header file, which not only has the description part
(which then becomes a part of a Documentation/...rst file), but also defines
all the required constants for requests and responses, message formats,
state diagrams etc., all in one place. Of course this protocol must not be
Xen specific, but OS/hypervisor agnostic.
Having that will trigger a new round of discussion, so we have it all
designed and discussed before we start implementing.

Besides the protocol, we have to design the UAPI part as well and make sure
that hyper-dmabuf is accessible not only from user-space, since there will
be a number of kernel-space users as well.
>
>>> 4. inter VM messaging (hype_dmabuf only) - hyper_dmabuf has inter-vm msg
>>> communication defined for dmabuf synchronization and private data (meta
>>> info that Matt Roper mentioned) exchange.
>> This is true, xen-zcopy has no means for inter VM sync and meta-data,
>> simply because it doesn't have any code for inter VM exchange in it,
>> e.g. the inter VM protocol is handled by the backend [1].
>>> 5. driver-to-driver notification (hyper_dmabuf only) - importing VM gets
>>> notified when newdmabuf is exported from other VM - uevent can be optionally
>>> generated when this happens.
>>>
>>> 6. structure - hyper_dmabuf is targetting to provide a generic solution for
>>> inter-domain dmabuf sharing for most hypervisors, which is why it has two
>>> layers as mattrope mentioned, front-end that contains standard API and backend
>>> that is specific to hypervisor.
>> Again, xen-zcopy is decoupled from inter VM communication
>>>>> No idea, didn't look at it in detail.
>>>>>
>>>>> Looks pretty complex from a distant view.  Maybe because it tries to
>>>>> build a communication framework using dma-bufs instead of a simple
>>>>> dma-buf passing mechanism.
>>> we started with simple dma-buf sharing but realized there are many
>>> things we need to consider in real use-case, so we added communication
>>> , notification and dma-buf synchronization then re-structured it to
>>> front-end and back-end (this made things more compicated..) since Xen
>>> was not our only target. Also, we thought passing the reference for the
>>> buffer (hyper_dmabuf_id) is not secure so added uvent mechanism later.
>>>
>>>> Yes, I am looking at it now, trying to figure out the full story
>>>> and its implementation. BTW, Intel guys were about to share some
>>>> test application for hyper-dmabuf, maybe I have missed one.
>>>> It could probably better explain the use-cases and the complexity
>>>> they have in hyper-dmabuf.
>>> One example is actually in github. If you want take a look at it, please
>>> visit:
>>>
>>> https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
>> Thank you, I'll have a look
>>>>> Like xen-zcopy it seems to depend on the idea that the hypervisor
>>>>> manages all memory it is easy for guests to share pages with the help of
>>>>> the hypervisor.
>>>> So, for xen-zcopy we were not trying to make it generic,
>>>> it just solves display (dumb) zero-copying use-cases for Xen.
>>>> We implemented it as a DRM helper driver because we can't see any
>>>> other use-cases as of now.
>>>> For example, we also have Xen para-virtualized sound driver, but
>>>> its buffer memory usage is not comparable to what display wants
>>>> and it works somewhat differently (e.g. there is no "frame done"
>>>> event, so one can't tell when the sound buffer can be "flipped").
>>>> At the same time, we do not use virtio-gpu, so this could probably
>>>> be one more candidate for shared dma-bufs some day.
>>>>>    Which simply isn't the case on kvm.
>>>>>
>>>>> hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
>>>>> on top of xen-zcopy.
>>>> Hm, I can imagine that: xen-zcopy could be a library code for hyper-dmabuf
>>>> in terms of implementing all that page sharing fun in multiple directions,
>>>> e.g. Host->Guest, Guest->Host, Guest<->Guest.
>>>> But I'll let Matt and Dongwon to comment on that.
>>> I think we can definitely collaborate. Especially, maybe we are using some
>>> outdated sharing mechanism/grant-table mechanism in our Xen backend (thanks
>>> for bringing that up Oleksandr). However, the question is once we collaborate
>>> somehow, can xen-zcopy's usecase use the standard API that hyper_dmabuf
>>> provides? I don't think we need different IOCTLs that do the same in the final
>>> solution.
>>>
>> If you think of xen-zcopy as a library (which implements Xen
>> grant references mangling) and DRM PRIME wrapper on top of that
>> library, we can probably define proper API for that library,
>> so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
>> about to start upstreaming Xen para-virtualized sound device driver soon,
>> which also uses similar code and gref passing mechanism [3].
>> (Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
>> snd/xen-front and then propose a Xen helper library for sharing big buffers,
>> so common code of the above drivers can use the same code w/o code
>> duplication)
> I think it is possible to use your functions for memory sharing part in
> hyper_dmabuf's backend (this 'backend' means the layer that does page sharing
> and inter-vm communication with xen-specific way.), so why don't we work on
> "Xen helper library for sharing big buffers" first while we continue our
> discussion on the common API layer that can cover any dmabuf sharing cases.
>
Well, I would love for us to reuse the code that I have, but I also
understand that it was limited by my use-cases. So, I do not
insist we have to ;)
If we start designing and discussing the hyper-dmabuf protocol, we can of
course work on this helper library in parallel.
>> Thank you,
>> Oleksandr
>>
>> P.S. All, is it a good idea to move this out of udmabuf thread into a
>> dedicated one?
> Either way is fine with me.
So, if you can start designing the protocol, we may have a dedicated mail
thread for that. I will try to help with the protocol as much as I can.

>>>>> cheers,
>>>>>    Gerd
>>>>>
>>>> Thank you,
>>>> Oleksandr
>>>>
>>>> P.S. Sorry for making your original mail thread to discuss things much
>>>> broader than your RFC...
>>>>
>> [1] https://github.com/xen-troops/displ_be
>> [2] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/displif.h#L484
>> [3] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/sndif.h
>>
[1] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/displif.h
[2] https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg00685.html

Daniel Vetter April 13, 2018, 3:37 p.m. UTC | #20

On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote:
> On 04/10/2018 08:26 PM, Dongwon Kim wrote:
> > On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
> > > On 04/06/2018 09:57 PM, Dongwon Kim wrote:
> > > > On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
> > > > > On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
> > > > > >    Hi,
> > > > > > 
> > > > > > > > I fail to see any common ground for xen-zcopy and udmabuf ...
> > > > > > > Does the above mean you can assume that xen-zcopy and udmabuf
> > > > > > > can co-exist as two different solutions?
> > > > > > Well, udmabuf route isn't fully clear yet, but yes.
> > > > > > 
> > > > > > See also gvt (intel vgpu), where the hypervisor interface is abstracted
> > > > > > away into a separate kernel modules even though most of the actual vgpu
> > > > > > emulation code is common.
> > > > > Thank you for your input, I'm just trying to figure out
> > > > > which of the three z-copy solutions intersect and how much
> > > > > > > And what about hyper-dmabuf?
> > > > xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
> > > > in terms of these core sharing feature:
> > > > 
> > > > 1. the sharing process - import prime/dmabuf from the producer -> extract
> > > > underlying pages and get those shared -> return references for shared pages
> > Another thing is danvet was kind of against to the idea of importing existing
> > dmabuf/prime buffer and forward it to the other domain due to synchronization
> > issues. He proposed to make hyper_dmabuf only work as an exporter so that it
> > can have a full control over the buffer. I think we need to talk about this
> > further as well.
> Yes, I saw this. But this limits the use-cases so much.
> For instance, running Android as a Guest (which uses ION to allocate
> buffers) means that finally HW composer will import dma-buf into
> the DRM driver. Then, in case of xen-front for example, it needs to be
> shared with the backend (Host side). Of course, we can change user-space
> to make xen-front allocate the buffers (make it exporter), but what we try
> to avoid is to change user-space which in normal world would have remain
> unchanged otherwise.
> So, I do think we have to support this use-case and just have to understand
> the complexity.

Erm, why do you need importer capability for this use-case?

guest1 -> ION -> xen-front -> hypervisor -> guest 2 -> xen-zcopy exposes
that dma-buf -> import to the real display hw

Nowhere in this chain do you need xen-zcopy to be able to import a
dma-buf (within linux, it needs to import a bunch of pages from the
hypervisor).

Now if your plan is to use xen-zcopy in the guest1 instead of xen-front,
then you indeed need to import. But that imo doesn't make sense:
- xen-front gives you clearly defined flip events you can forward to the
  hypervisor. xen-zcopy would need to add that again. Same for
  hyperdmabuf (and really we're not going to shuffle struct dma_fence over
  the wire in a generic fashion between hypervisor guests).

- xen-front already has the idea of pixel format for the buffer (and any
  other metadata). Again, xen-zcopy and hyperdmabuf lack that, and it would
  need to be shoehorned in somehow.

Ofc you won't be able to shovel sound or media stream data over to another
guest like this, but that's what you have xen-v4l and xen-sound or
whatever else for. Trying to make a new uapi, which means userspace must
be changed for all the different use-cases, instead of reusing standard
linux driver uapi (which just happens to send the data to another
hypervisor guest instead of real hw) imo just doesn't make much sense.

Also, at least for the gpu subsystem: Any new uapi must have full
userspace available for it, see:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements

Adding more uapi is definitely the most painful way to fix a use-case.
Personally I'd go as far as to also change the xen-zcopy side on the
receiving guest to use some standard linux uapi. E.g. you could write an
output v4l driver to receive the frames from guest1.

> > danvet, can you comment on this topic?
> > 
> > > > 2. the page sharing mechanism - it uses Xen-grant-table.
> > > > 
> > > > And to give you a quick summary of differences as far as I understand
> > > > between two implementations (please correct me if I am wrong, Oleksandr.)
> > > > 
> > > > 1. xen-zcopy is DRM specific - can import only DRM prime buffer
> > > > while hyper_dmabuf can export any dmabuf regardless of originator
> > > Well, this is true. And at the same time this is just a matter
> > > of extending the API: xen-zcopy is a helper driver designed for
> > > xen-front/back use-case, so this is why it only has DRM PRIME API
> > > > 2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
> > > > while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
> > > > out synchronization message to the exporting VM for synchronization.
> > > This is true. Again, this is because of the use-cases it covers.
> > > But having synchronization for a generic solution seems to be a good idea.
> > Yeah, understood xen-zcopy works ok with your use case. But I am just curious
> > if it is ok not to have any inter-domain synchronization in this sharing model.
> The synchronization is done with displif protocol [1]
> > The buffer being shared is technically dma-buf and originator needs to be able
> > to keep track of it.
> As I am working in DRM terms the tracking is done by the DRM core
> for me for free. (This might be one of the reasons Daniel sees DRM
> based implementation fit very good from code-reuse POV).

Hm, not sure what tracking you refer to here at all ... I got lost in all
the replies while catching up.

> > > > 3. 1-level references - when using grant-table for sharing pages, there will
> > > > be same # of refs (each 8 byte)
> > > To be precise, grant ref is 4 bytes
> > You are right. Thanks for correction.;)
> > 
> > > > as # of shared pages, which is passed to
> > > > the userspace to be shared with importing VM in case of xen-zcopy.
> > > The reason for that is that xen-zcopy is a helper driver, e.g.
> > > the grant references come from the display backend [1], which implements
> > > Xen display protocol [2]. So, effectively the backend extracts references
> > > from frontend's requests and passes those to xen-zcopy as an array
> > > of refs.
> > > >   Compared
> > > > to this, hyper_dmabuf does multiple level addressing to generate only one
> > > > reference id that represents all shared pages.
> > > In the protocol [2] only one reference to the gref directory is passed
> > > between VMs
> > > (and the gref directory is a single-linked list of shared pages containing
> > > all
> > > of the grefs of the buffer).
> > ok, good to know. I will look into its implementation in more details but is
> > this gref directory (chained grefs) something that can be used for any general
> > memory sharing use case or is it jsut for xen-display (in current code base)?
> Not to mislead you: one grant ref is passed via displif protocol,
> but the page it's referencing contains the rest of the grant refs.
> 
> As to if this can be used for any memory: yes. It is the same for
> sndif and displif Xen protocols, but defined twice as strictly speaking
> sndif and displif are two separate protocols.
> 
> While reviewing your RFC v2 one of the comments I had [2] was that if we
> can start from defining such a generic protocol for hyper-dmabuf.
> It can be a header file, which not only has the description part
> (which then become a part of Documentation/...rst file), but also defines
> all the required constants for requests, responses, defines message formats,
> state diagrams etc. all at one place. Of course this protocol must not be
> Xen specific, but be OS/hypervisor agnostic.
> Having that will trigger a new round of discussion, so we have it all
> designed
> and discussed before we start implementing.
> 
> Besides the protocol we have to design UAPI part as well and make sure
> the hyper-dmabuf is not only accessible from user-space, but there will be
> number
> of kernel-space users as well.

Again, why do you want to create new uapi for this? Given the very strict
requirements we have for new uapi (see above link), it's the toughest way
to get any kind of support in.

That's why I had essentially zero big questions for xen-front (except some
implementation improvements, and stuff to make sure xen-front actually
implements the real uapi semantics instead of its own), and why I'm asking
many more questions on this stuff here.

> > > > 4. inter VM messaging (hype_dmabuf only) - hyper_dmabuf has inter-vm msg
> > > > communication defined for dmabuf synchronization and private data (meta
> > > > info that Matt Roper mentioned) exchange.
> > > This is true, xen-zcopy has no means for inter VM sync and meta-data,
> > > simply because it doesn't have any code for inter VM exchange in it,
> > > e.g. the inter VM protocol is handled by the backend [1].
> > > > 5. driver-to-driver notification (hyper_dmabuf only) - importing VM gets
> > > > notified when newdmabuf is exported from other VM - uevent can be optionally
> > > > generated when this happens.
> > > > 
> > > > 6. structure - hyper_dmabuf is targetting to provide a generic solution for
> > > > inter-domain dmabuf sharing for most hypervisors, which is why it has two
> > > > layers as mattrope mentioned, front-end that contains standard API and backend
> > > > that is specific to hypervisor.
> > > Again, xen-zcopy is decoupled from inter VM communication
> > > > > > No idea, didn't look at it in detail.
> > > > > > 
> > > > > > Looks pretty complex from a distant view.  Maybe because it tries to
> > > > > > build a communication framework using dma-bufs instead of a simple
> > > > > > dma-buf passing mechanism.
> > > > we started with simple dma-buf sharing but realized there are many
> > > > things we need to consider in real use-case, so we added communication
> > > > , notification and dma-buf synchronization then re-structured it to
> > > > front-end and back-end (this made things more compicated..) since Xen
> > > > was not our only target. Also, we thought passing the reference for the
> > > > buffer (hyper_dmabuf_id) is not secure so added uvent mechanism later.
> > > > 
> > > > > Yes, I am looking at it now, trying to figure out the full story
> > > > > and its implementation. BTW, Intel guys were about to share some
> > > > > test application for hyper-dmabuf, maybe I have missed one.
> > > > > It could probably better explain the use-cases and the complexity
> > > > > they have in hyper-dmabuf.
> > > > One example is actually in github. If you want take a look at it, please
> > > > visit:
> > > > 
> > > > https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
> > > Thank you, I'll have a look
> > > > > > Like xen-zcopy it seems to depend on the idea that the hypervisor
> > > > > > manages all memory it is easy for guests to share pages with the help of
> > > > > > the hypervisor.
> > > > > So, for xen-zcopy we were not trying to make it generic,
> > > > > it just solves display (dumb) zero-copying use-cases for Xen.
> > > > > We implemented it as a DRM helper driver because we can't see any
> > > > > other use-cases as of now.
> > > > > For example, we also have Xen para-virtualized sound driver, but
> > > > > its buffer memory usage is not comparable to what display wants
> > > > > and it works somewhat differently (e.g. there is no "frame done"
> > > > > event, so one can't tell when the sound buffer can be "flipped").
> > > > > At the same time, we do not use virtio-gpu, so this could probably
> > > > > be one more candidate for shared dma-bufs some day.
> > > > > >    Which simply isn't the case on kvm.
> > > > > > 
> > > > > > hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
> > > > > > on top of xen-zcopy.
> > > > > Hm, I can imagine that: xen-zcopy could be a library code for hyper-dmabuf
> > > > > in terms of implementing all that page sharing fun in multiple directions,
> > > > > e.g. Host->Guest, Guest->Host, Guest<->Guest.
> > > > > But I'll let Matt and Dongwon to comment on that.
> > > > I think we can definitely collaborate. Especially, maybe we are using some
> > > > outdated sharing mechanism/grant-table mechanism in our Xen backend (thanks
> > > > for bringing that up Oleksandr). However, the question is once we collaborate
> > > > somehow, can xen-zcopy's usecase use the standard API that hyper_dmabuf
> > > > provides? I don't think we need different IOCTLs that do the same in the final
> > > > solution.
> > > > 
> > > If you think of xen-zcopy as a library (which implements Xen
> > > grant references mangling) and DRM PRIME wrapper on top of that
> > > library, we can probably define proper API for that library,
> > > so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
> > > about to start upstreaming Xen para-virtualized sound device driver soon,
> > > which also uses similar code and gref passing mechanism [3].
> > > (Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
> > > snd/xen-front and then propose a Xen helper library for sharing big buffers,
> > > so common code of the above drivers can use the same code w/o code
> > > duplication)
> > I think it is possible to use your functions for memory sharing part in
> > hyper_dmabuf's backend (this 'backend' means the layer that does page sharing
> > and inter-vm communication with xen-specific way.), so why don't we work on
> > "Xen helper library for sharing big buffers" first while we continue our
> > discussion on the common API layer that can cover any dmabuf sharing cases.
> > 
> Well, I would love we reuse the code that I have, but I also
> understand that it was limited by my use-cases. So, I do not
> insist we have to ;)
> If we start designing and discussing hyper-dmabuf protocol we of course
> can work on this helper library in parallel.

Imo code reuse is overrated. Adding new uapi is what freaks me out here
:-)

If we end up with duplicated implementations, even in upstream, meh, not
great, but also ok. New uapi, and in a similar way, new hypervisor api
like the dma-buf forwarding that hyperdmabuf does is the kind of thing
that will lock us in for 10+ years (if we make a mistake).

> > > Thank you,
> > > Oleksandr
> > > 
> > > P.S. All, is it a good idea to move this out of udmabuf thread into a
> > > dedicated one?
> > Either way is fine with me.
> So, if you can start designing the protocol we may have a dedicated mail
> thread for that. I will try to help with the protocol as much as I can

Please don't start with the protocol. Instead start with the concrete
use-cases, and then figure out why exactly you need new uapi. Once we have
that answered, we can start thinking about fleshing out the details.

Cheers, Daniel

> 
> > > > > > cheers,
> > > > > >    Gerd
> > > > > > 
> > > > > Thank you,
> > > > > Oleksandr
> > > > > 
> > > > > P.S. Sorry for making your original mail thread to discuss things much
> > > > > broader than your RFC...
> > > > > 
> > > [1] https://github.com/xen-troops/displ_be
> > > [2] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/displif.h#L484
> > > [3] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/sndif.h
> > > 
> [1] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/displif.h
> [2]
> https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg00685.html

Oleksandr Andrushchenko April 16, 2018, 7:16 a.m. UTC | #21

On 04/13/2018 06:37 PM, Daniel Vetter wrote:
> On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote:
>> On 04/10/2018 08:26 PM, Dongwon Kim wrote:
>>> On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
>>>> On 04/06/2018 09:57 PM, Dongwon Kim wrote:
>>>>> On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
>>>>>> On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
>>>>>>>     Hi,
>>>>>>>
>>>>>>>>> I fail to see any common ground for xen-zcopy and udmabuf ...
>>>>>>>> Does the above mean you can assume that xen-zcopy and udmabuf
>>>>>>>> can co-exist as two different solutions?
>>>>>>> Well, udmabuf route isn't fully clear yet, but yes.
>>>>>>>
>>>>>>> See also gvt (intel vgpu), where the hypervisor interface is abstracted
>>>>>>> away into a separate kernel modules even though most of the actual vgpu
>>>>>>> emulation code is common.
>>>>>> Thank you for your input, I'm just trying to figure out
>>>>>> which of the three z-copy solutions intersect and how much
>>>>>>>> And what about hyper-dmabuf?
>>>>> xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
>>>>> in terms of these core sharing feature:
>>>>>
>>>>> 1. the sharing process - import prime/dmabuf from the producer -> extract
>>>>> underlying pages and get those shared -> return references for shared pages
>>> Another thing is danvet was kind of against to the idea of importing existing
>>> dmabuf/prime buffer and forward it to the other domain due to synchronization
>>> issues. He proposed to make hyper_dmabuf only work as an exporter so that it
>>> can have a full control over the buffer. I think we need to talk about this
>>> further as well.
>> Yes, I saw this. But this limits the use-cases so much.
>> For instance, running Android as a Guest (which uses ION to allocate
>> buffers) means that finally HW composer will import dma-buf into
>> the DRM driver. Then, in case of xen-front for example, it needs to be
>> shared with the backend (Host side). Of course, we can change user-space
>> to make xen-front allocate the buffers (make it exporter), but what we try
>> to avoid is to change user-space which in normal world would have remain
>> unchanged otherwise.
>> So, I do think we have to support this use-case and just have to understand
>> the complexity.
> Erm, why do you need importer capability for this use-case?
>
> guest1 -> ION -> xen-front -> hypervisor -> guest 2 -> xen-zcopy exposes
> that dma-buf -> import to the real display hw
>
> No where in this chain do you need xen-zcopy to be able to import a
> dma-buf (within linux, it needs to import a bunch of pages from the
> hypervisor).
>
> Now if your plan is to use xen-zcopy in the guest1 instead of xen-front,
> then you indeed need to import.
This is the exact use-case I was referring to when saying
we need to import on the Guest1 side. If hyper-dmabuf is so
generic that there is no xen-front in the picture, then
it needs to import a dma-buf, so it can be exported on the Guest2 side.
>   But that imo doesn't make sense:
> - xen-front gives you clearly defined flip events you can forward to the
>    hypervisor. xen-zcopy would need to add that again.
xen-zcopy is a helper driver which doesn't handle page flips
and is not a KMS driver as one might think: the DRM UAPI it uses is
just to export a dma-buf as a PRIME buffer, and that's it.
Flipping etc. is done by the backend [1], not xen-zcopy.
>   Same for
>    hyperdmabuf (and really we're not going to shuffle struct dma_fence over
>    the wire in a generic fashion between hypervisor guests).
>
> - xen-front already has the idea of pixel format for the buffer (and any
>    other metadata). Again, xen-zcopy and hyperdmabuf lack that, would need
>    to add it shoehorned in somehow.
Again, here you are talking about something which is implemented in the
Xen display backend, not xen-zcopy. The display backend can
implement a para-virtual display w/o xen-zcopy at all, but in that case
there is a memory copy for each frame. With the help of xen-zcopy
the backend feeds xen-front's buffers directly into Guest2 DRM/KMS or
Weston or whatever, as xen-zcopy exports remote buffers as PRIME buffers,
thus no buffer copying is required.
>
> Ofc you won't be able to shovel sound or media stream data over to another
> guest like this, but that's what you have xen-v4l and xen-sound or
> whatever else for. Trying to make a new uapi, which means userspace must
> be changed for all the different use-case, instead of reusing standard
> linux driver uapi (which just happens to send the data to another
> hypervisor guest instead of real hw) imo just doesn't make much sense.
>
> Also, at least for the gpu subsystem: Any new uapi must have full
> userspace available for it, see:
>
> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
>
> Adding more uapi is definitely the most painful way to fix a use-case.
> Personally I'd go as far and also change the xen-zcopy side on the
> receiving guest to use some standard linux uapi. E.g. you could write an
> output v4l driver to receive the frames from guest1.
So, we now know that xen-zcopy was not meant to handle page flips,
but to implement new UAPI to let user-space create buffers either
from Guest2 grant references (so they can be exported to Guest1) or
the other way round (from Guest1 grant references, to be exported to
Guest2). For that reason it adds 2 IOCTLs: create a buffer from grefs,
or produce grefs for a given buffer.
One additional IOCTL is to wait for the buffer to be released by
Guest2 user-space.
That being said, I don't quite see how v4l can be used here to implement
the UAPI I need.
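
To illustrate the shape of such a UAPI, here is a rough sketch of the three
requests as plain ioctl definitions. Every name, field, magic number and
layout below is hypothetical and chosen only for this illustration (the
timeout field in particular is an assumption); it is not the actual
xen-zcopy interface.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical layouts: illustrative only, not the real xen-zcopy UAPI. */

/* Create a local buffer backed by pages the other guest has granted. */
struct zcopy_from_refs {
	uint32_t num_grefs;   /* in: number of grant references */
	uint32_t otherend_id; /* in: domain that granted the pages */
	uint64_t grefs_ptr;   /* in: user pointer to uint32_t[num_grefs] */
	uint32_t handle;      /* out: handle of the new buffer */
	uint32_t pad;         /* keeps the struct 8-byte aligned */
};

/* Produce grant references for the pages of an existing local buffer. */
struct zcopy_to_refs {
	uint32_t handle;      /* in: handle of the buffer to share */
	uint32_t num_grefs;   /* in: capacity of the array at grefs_ptr */
	uint32_t otherend_id; /* in: domain allowed to map the pages */
	uint32_t pad;
	uint64_t grefs_ptr;   /* in: user pointer, filled with grefs */
};

/* Wait until the buffer has been released on the other side. */
struct zcopy_wait_free {
	uint32_t handle;      /* in: buffer to wait for */
	uint32_t timeout_ms;  /* in: assumed timeout, in milliseconds */
};

/* C11 compile-time checks: the layout must not depend on the ABI. */
static_assert(sizeof(struct zcopy_from_refs) == 24, "fixed layout");
static_assert(sizeof(struct zcopy_to_refs) == 24, "fixed layout");
static_assert(sizeof(struct zcopy_wait_free) == 8, "fixed layout");

#define ZCOPY_IOC_MAGIC 'Z' /* hypothetical magic number */
#define ZCOPY_IOC_FROM_REFS _IOWR(ZCOPY_IOC_MAGIC, 0, struct zcopy_from_refs)
#define ZCOPY_IOC_TO_REFS   _IOWR(ZCOPY_IOC_MAGIC, 1, struct zcopy_to_refs)
#define ZCOPY_IOC_WAIT_FREE _IOW(ZCOPY_IOC_MAGIC, 2, struct zcopy_wait_free)
```

The user pointers are carried as 64-bit integers so that the structures
have the same size for 32-bit and 64-bit user-space.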
>
>>> danvet, can you comment on this topic?
>>>
>>>>> 2. the page sharing mechanism - it uses Xen-grant-table.
>>>>>
>>>>> And to give you a quick summary of differences as far as I understand
>>>>> between two implementations (please correct me if I am wrong, Oleksandr.)
>>>>>
>>>>> 1. xen-zcopy is DRM specific - can import only DRM prime buffer
>>>>> while hyper_dmabuf can export any dmabuf regardless of originator
>>>> Well, this is true. And at the same time this is just a matter
>>>> of extending the API: xen-zcopy is a helper driver designed for
>>>> xen-front/back use-case, so this is why it only has DRM PRIME API
>>>>> 2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
>>>>> while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
>>>>> out synchronization message to the exporting VM for synchronization.
>>>> This is true. Again, this is because of the use-cases it covers.
>>>> But having synchronization for a generic solution seems to be a good idea.
>>> Yeah, understood xen-zcopy works ok with your use case. But I am just curious
>>> if it is ok not to have any inter-domain synchronization in this sharing model.
>> The synchronization is done with displif protocol [1]
>>> The buffer being shared is technically dma-buf and originator needs to be able
>>> to keep track of it.
>> As I am working in DRM terms the tracking is done by the DRM core
>> for me for free. (This might be one of the reasons Daniel sees DRM
>> based implementation fit very good from code-reuse POV).
> Hm, not sure what tracking you refer to here all ... I got lost in all the
> replies while catching up.
>
I was just referring to the accounting already implemented in the DRM core,
so I don't have to worry about doing the same for buffers to understand
when they are released etc.
>>>>> 3. 1-level references - when using grant-table for sharing pages, there will
>>>>> be same # of refs (each 8 byte)
>>>> To be precise, grant ref is 4 bytes
>>> You are right. Thanks for correction.;)
>>>
>>>>> as # of shared pages, which is passed to
>>>>> the userspace to be shared with importing VM in case of xen-zcopy.
>>>> The reason for that is that xen-zcopy is a helper driver, e.g.
>>>> the grant references come from the display backend [1], which implements
>>>> Xen display protocol [2]. So, effectively the backend extracts references
>>>> from frontend's requests and passes those to xen-zcopy as an array
>>>> of refs.
>>>>>    Compared
>>>>> to this, hyper_dmabuf does multiple level addressing to generate only one
>>>>> reference id that represents all shared pages.
>>>> In the protocol [2] only one reference to the gref directory is passed
>>>> between VMs
>>>> (and the gref directory is a single-linked list of shared pages containing
>>>> all
>>>> of the grefs of the buffer).
>>> ok, good to know. I will look into its implementation in more details but is
>>> this gref directory (chained grefs) something that can be used for any general
>>> memory sharing use case or is it just for xen-display (in current code base)?
>> Not to mislead you: one grant ref is passed via displif protocol,
>> but the page it's referencing contains the rest of the grant refs.
>>
>> As to if this can be used for any memory: yes. It is the same for
>> sndif and displif Xen protocols, but defined twice as strictly speaking
>> sndif and displif are two separate protocols.
>>
>> While reviewing your RFC v2 one of the comments I had [2] was that if we
>> can start from defining such a generic protocol for hyper-dmabuf.
>> It can be a header file, which not only has the description part
>> (which then become a part of Documentation/...rst file), but also defines
>> all the required constants for requests, responses, defines message formats,
>> state diagrams etc. all at one place. Of course this protocol must not be
>> Xen specific, but be OS/hypervisor agnostic.
>> Having that will trigger a new round of discussion, so we have it all
>> designed
>> and discussed before we start implementing.
>>
>> Besides the protocol we have to design UAPI part as well and make sure
>> the hyper-dmabuf is not only accessible from user-space, but there will be
>> number
>> of kernel-space users as well.
> Again, why do you want to create new uapi for this? Given the very strict
> requirements we have for new uapi (see above link), it's the toughest way
> to get any kind of support in.
I do understand that adding new UAPI is not good for many reasons.
But what I meant here is that the current hyper-dmabuf design is
user-space oriented only, i.e. it provides a number of IOCTLs to do all
the work. But I need a way to access the same from the kernel, so that,
for example, some other para-virtual driver can export/import dma-bufs,
not only user-space.

>
> That's why I had essentially zero big questions for xen-front (except some
> implementation improvements, and stuff to make sure xen-front actually
> implements the real uapi semantics instead of its own), and why I'm asking
> much more questions on this stuff here.
>
>>>>> 4. inter VM messaging (hyper_dmabuf only) - hyper_dmabuf has inter-vm msg
>>>>> communication defined for dmabuf synchronization and private data (meta
>>>>> info that Matt Roper mentioned) exchange.
>>>> This is true, xen-zcopy has no means for inter VM sync and meta-data,
>>>> simply because it doesn't have any code for inter VM exchange in it,
>>>> e.g. the inter VM protocol is handled by the backend [1].
>>>>> 5. driver-to-driver notification (hyper_dmabuf only) - importing VM gets
>>>>> notified when new dmabuf is exported from other VM - uevent can be optionally
>>>>> generated when this happens.
>>>>>
>>>>> 6. structure - hyper_dmabuf is targeting to provide a generic solution for
>>>>> inter-domain dmabuf sharing for most hypervisors, which is why it has two
>>>>> layers as mattrope mentioned, front-end that contains standard API and backend
>>>>> that is specific to hypervisor.
>>>> Again, xen-zcopy is decoupled from inter VM communication
>>>>>>> No idea, didn't look at it in detail.
>>>>>>>
>>>>>>> Looks pretty complex from a distant view.  Maybe because it tries to
>>>>>>> build a communication framework using dma-bufs instead of a simple
>>>>>>> dma-buf passing mechanism.
>>>>> we started with simple dma-buf sharing but realized there are many
>>>>> things we need to consider in real use-case, so we added communication,
>>>>> notification and dma-buf synchronization, then re-structured it to
>>>>> front-end and back-end (this made things more complicated..) since Xen
>>>>> was not our only target. Also, we thought passing the reference for the
>>>>> buffer (hyper_dmabuf_id) is not secure so added uevent mechanism later.
>>>>>
>>>>>> Yes, I am looking at it now, trying to figure out the full story
>>>>>> and its implementation. BTW, Intel guys were about to share some
>>>>>> test application for hyper-dmabuf, maybe I have missed one.
>>>>>> It could probably better explain the use-cases and the complexity
>>>>>> they have in hyper-dmabuf.
>>>>> One example is actually in github. If you want take a look at it, please
>>>>> visit:
>>>>>
>>>>> https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
>>>> Thank you, I'll have a look
>>>>>>> Like xen-zcopy it seems to depend on the idea that the hypervisor
>>>>>>> manages all memory it is easy for guests to share pages with the help of
>>>>>>> the hypervisor.
>>>>>> So, for xen-zcopy we were not trying to make it generic,
>>>>>> it just solves display (dumb) zero-copying use-cases for Xen.
>>>>>> We implemented it as a DRM helper driver because we can't see any
>>>>>> other use-cases as of now.
>>>>>> For example, we also have Xen para-virtualized sound driver, but
>>>>>> its buffer memory usage is not comparable to what display wants
>>>>>> and it works somewhat differently (e.g. there is no "frame done"
>>>>>> event, so one can't tell when the sound buffer can be "flipped").
>>>>>> At the same time, we do not use virtio-gpu, so this could probably
>>>>>> be one more candidate for shared dma-bufs some day.
>>>>>>>     Which simply isn't the case on kvm.
>>>>>>>
>>>>>>> hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
>>>>>>> on top of xen-zcopy.
>>>>>> Hm, I can imagine that: xen-zcopy could be a library code for hyper-dmabuf
>>>>>> in terms of implementing all that page sharing fun in multiple directions,
>>>>>> e.g. Host->Guest, Guest->Host, Guest<->Guest.
>>>>>> But I'll let Matt and Dongwon to comment on that.
>>>>> I think we can definitely collaborate. Especially, maybe we are using some
>>>>> outdated sharing mechanism/grant-table mechanism in our Xen backend (thanks
>>>>> for bringing that up Oleksandr). However, the question is once we collaborate
>>>>> somehow, can xen-zcopy's usecase use the standard API that hyper_dmabuf
>>>>> provides? I don't think we need different IOCTLs that do the same in the final
>>>>> solution.
>>>>>
>>>> If you think of xen-zcopy as a library (which implements Xen
>>>> grant references mangling) and DRM PRIME wrapper on top of that
>>>> library, we can probably define proper API for that library,
>>>> so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
>>>> about to start upstreaming Xen para-virtualized sound device driver soon,
>>>> which also uses similar code and gref passing mechanism [3].
>>>> (Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
>>>> snd/xen-front and then propose a Xen helper library for sharing big buffers,
>>>> so common code of the above drivers can use the same code w/o code
>>>> duplication)
>>> I think it is possible to use your functions for memory sharing part in
>>> hyper_dmabuf's backend (this 'backend' means the layer that does page sharing
>>> and inter-vm communication with xen-specific way.), so why don't we work on
>>> "Xen helper library for sharing big buffers" first while we continue our
>>> discussion on the common API layer that can cover any dmabuf sharing cases.
>>>
>> Well, I would love we reuse the code that I have, but I also
>> understand that it was limited by my use-cases. So, I do not
>> insist we have to ;)
>> If we start designing and discussing hyper-dmabuf protocol we of course
>> can work on this helper library in parallel.
> Imo code reuse is overrated. Adding new uapi is what freaks me out here
> :-)
>
> If we end up with duplicated implementations, even in upstream, meh, not
> great, but also ok. New uapi, and in a similar way, new hypervisor api
> like the dma-buf forwarding that hyperdmabuf does is the kind of thing
> that will lock us in for 10+ years (if we make a mistake).
>
>>>> Thank you,
>>>> Oleksandr
>>>>
>>>> P.S. All, is it a good idea to move this out of udmabuf thread into a
>>>> dedicated one?
>>> Either way is fine with me.
>> So, if you can start designing the protocol we may have a dedicated mail
>> thread for that. I will try to help with the protocol as much as I can
> Please don't start with the protocol. Instead start with the concrete
> use-cases, and then figure out why exactly you need new uapi. Once we have
> that answered, we can start thinking about fleshing out the details.
On my side there are only 2 use-cases, Guest2 only:
1. Create a PRIME (dma-buf) from grant references
2. Create grant references from PRIME (dma-buf)

> Cheers, Daniel
>
Thank you,
Oleksandr
>>>>>>> cheers,
>>>>>>>     Gerd
>>>>>>>
>>>>>> Thank you,
>>>>>> Oleksandr
>>>>>>
>>>>>> P.S. Sorry for making your original mail thread to discuss things much
>>>>>> broader than your RFC...
>>>>>>
>>>> [1] https://github.com/xen-troops/displ_be
>>>> [2] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/displif.h#L484
>>>> [3] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/sndif.h
>>>>
>> [1] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/displif.h
>> [2]
>> https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg00685.html
[1] https://github.com/xen-troops/displ_be
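
To put rough numbers on the gref directory discussed above (a single grant
ref is passed between VMs, and the page it references holds the rest of the
grant refs plus a link to the next directory page), here is a minimal sketch.
It assumes 4 KiB pages and 4-byte grant refs; the struct layout and all
sketch_* names are hypothetical and only for illustration, the authoritative
definition is the one in displif.h [2].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB pages, 32-bit grant refs. */
#define SKETCH_PAGE_SIZE 4096u

typedef uint32_t sketch_gref_t;

/* How many buffer grefs fit in one directory page next to the link field. */
#define SKETCH_GREFS_PER_DIR_PAGE \
	((SKETCH_PAGE_SIZE - sizeof(sketch_gref_t)) / sizeof(sketch_gref_t))

/*
 * Hypothetical directory page: a link to the next directory page
 * (0 ends the chain), followed by as many buffer grefs as fit.
 */
struct sketch_dir_page {
	sketch_gref_t next_dir_gref;
	sketch_gref_t gref[SKETCH_GREFS_PER_DIR_PAGE];
};

/* One directory page is meant to fill exactly one shared page. */
static_assert(sizeof(struct sketch_dir_page) == SKETCH_PAGE_SIZE,
	      "directory page must be exactly one page");

/* Number of data pages (and thus grefs) needed for a buffer. */
static inline size_t sketch_buf_pages(size_t buf_size)
{
	return (buf_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Number of chained directory pages needed to hold all buffer grefs. */
static inline size_t sketch_dir_pages(size_t buf_size)
{
	size_t n = sketch_buf_pages(buf_size);

	return (n + SKETCH_GREFS_PER_DIR_PAGE - 1) / SKETCH_GREFS_PER_DIR_PAGE;
}
```

Under these assumptions a 1920x1080 buffer at 4 bytes per pixel needs 2025
pages, i.e. 2025 grefs (8100 bytes at 4 bytes each), which fit into 2 chained
directory pages, and only the gref of the first directory page has to travel
over the protocol.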

Daniel Vetter April 16, 2018, 7:43 a.m. UTC | #22

On Mon, Apr 16, 2018 at 10:16:31AM +0300, Oleksandr Andrushchenko wrote:
> On 04/13/2018 06:37 PM, Daniel Vetter wrote:
> > On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote:
> > > On 04/10/2018 08:26 PM, Dongwon Kim wrote:
> > > > On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
> > > > > On 04/06/2018 09:57 PM, Dongwon Kim wrote:
> > > > > > On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
> > > > > > > On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
> > > > > > > >     Hi,
> > > > > > > > 
> > > > > > > > > > I fail to see any common ground for xen-zcopy and udmabuf ...
> > > > > > > > > Does the above mean you can assume that xen-zcopy and udmabuf
> > > > > > > > > can co-exist as two different solutions?
> > > > > > > > Well, udmabuf route isn't fully clear yet, but yes.
> > > > > > > > 
> > > > > > > > See also gvt (intel vgpu), where the hypervisor interface is abstracted
> > > > > > > > away into a separate kernel modules even though most of the actual vgpu
> > > > > > > > emulation code is common.
> > > > > > > Thank you for your input, I'm just trying to figure out
> > > > > > > which of the three z-copy solutions intersect and how much
> > > > > > > > > And what about hyper-dmabuf?
> > > > > > xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
> > > > > > in terms of these core sharing feature:
> > > > > > 
> > > > > > 1. the sharing process - import prime/dmabuf from the producer -> extract
> > > > > > underlying pages and get those shared -> return references for shared pages
> > > > Another thing is danvet was kind of against to the idea of importing existing
> > > > dmabuf/prime buffer and forward it to the other domain due to synchronization
> > > > issues. He proposed to make hyper_dmabuf only work as an exporter so that it
> > > > can have a full control over the buffer. I think we need to talk about this
> > > > further as well.
> > > Yes, I saw this. But this limits the use-cases so much.
> > > For instance, running Android as a Guest (which uses ION to allocate
> > > buffers) means that finally HW composer will import dma-buf into
> > > the DRM driver. Then, in case of xen-front for example, it needs to be
> > > shared with the backend (Host side). Of course, we can change user-space
> > > to make xen-front allocate the buffers (make it exporter), but what we try
> > > to avoid is to change user-space which in normal world would have remain
> > > unchanged otherwise.
> > > So, I do think we have to support this use-case and just have to understand
> > > the complexity.
> > Erm, why do you need importer capability for this use-case?
> > 
> > guest1 -> ION -> xen-front -> hypervisor -> guest 2 -> xen-zcopy exposes
> > that dma-buf -> import to the real display hw
> > 
> > No where in this chain do you need xen-zcopy to be able to import a
> > dma-buf (within linux, it needs to import a bunch of pages from the
> > hypervisor).
> > 
> > Now if your plan is to use xen-zcopy in the guest1 instead of xen-front,
> > then you indeed need to import.
> This is the exact use-case I was referring to while saying
> we need to import on Guest1 side. If hyper-dmabuf is so
> generic that there is no xen-front in the picture, then
> it needs to import a dma-buf, so it can be exported at Guest2 side.
> >   But that imo doesn't make sense:
> > - xen-front gives you clearly defined flip events you can forward to the
> >    hypervisor. xen-zcopy would need to add that again.
> xen-zcopy is a helper driver which doesn't handle page flips
> and is not a KMS driver as one might think of: the DRM UAPI it uses is
> just to export a dma-buf as a PRIME buffer, but that's it.
> Flipping etc. is done by the backend [1], not xen-zcopy.
> >   Same for
> >    hyperdmabuf (and really we're not going to shuffle struct dma_fence over
> >    the wire in a generic fashion between hypervisor guests).
> > 
> > - xen-front already has the idea of pixel format for the buffer (and any
> >    other metadata). Again, xen-zcopy and hyperdmabuf lack that, would need
> >    to add it shoehorned in somehow.
> Again, here you are talking of something which is implemented in
> Xen display backend, not xen-zcopy, e.g. display backend can
> implement para-virtual display w/o xen-zcopy at all, but in this case
> there is a memory copying for each frame. With the help of xen-zcopy
> the backend feeds xen-front's buffers directly into Guest2 DRM/KMS or
> Weston or whatever as xen-zcopy exports remote buffers as PRIME buffers,
> thus no buffer copying is required.

Why do you need to copy on every frame for xen-front? In the above
pipeline, using xen-front I see 0 architectural reasons to have a copy
anywhere.

This seems to be the core of the confusion we're having here.

> > Ofc you won't be able to shovel sound or media stream data over to another
> > guest like this, but that's what you have xen-v4l and xen-sound or
> > whatever else for. Trying to make a new uapi, which means userspace must
> > be changed for all the different use-case, instead of reusing standard
> > linux driver uapi (which just happens to send the data to another
> > hypervisor guest instead of real hw) imo just doesn't make much sense.
> > 
> > Also, at least for the gpu subsystem: Any new uapi must have full
> > userspace available for it, see:
> > 
> > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
> > 
> > Adding more uapi is definitely the most painful way to fix a use-case.
> > Personally I'd go as far and also change the xen-zcopy side on the
> > receiving guest to use some standard linux uapi. E.g. you could write an
> > output v4l driver to receive the frames from guest1.
> So, we now know that xen-zcopy was not meant to handle page flips,
> but to implement new UAPI to let user-space create buffers either
> from Guest2 grant references (so it can be exported to Guest1) or
> other way round, e.g. create (from Guest1 grant references to export to
> Guest 2). For that reason it adds 2 IOCTLs: create buffer from grefs
> or produce grefs for the buffer given.
> One additional IOCTL is to wait for the buffer to be released by
> Guest2 user-space.
> That being said, I don't quite see how v4l can be used here to implement
> UAPI I need.

Under the assumption that you can make xen-front do zerocopy for the
kernel->hypervisor path, v4l could be made to work for the
hypervisor->kernel side of the pipeline.

But it sounds like we already have some confusion about whether xen-front
can do zerocopy or not, and why.

> > > > danvet, can you comment on this topic?
> > > > 
> > > > > > 2. the page sharing mechanism - it uses Xen-grant-table.
> > > > > > 
> > > > > > And to give you a quick summary of differences as far as I understand
> > > > > > between two implementations (please correct me if I am wrong, Oleksandr.)
> > > > > > 
> > > > > > 1. xen-zcopy is DRM specific - can import only DRM prime buffer
> > > > > > while hyper_dmabuf can export any dmabuf regardless of originator
> > > > > Well, this is true. And at the same time this is just a matter
> > > > > of extending the API: xen-zcopy is a helper driver designed for
> > > > > xen-front/back use-case, so this is why it only has DRM PRIME API
> > > > > > 2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
> > > > > > while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
> > > > > > out synchronization message to the exporting VM for synchronization.
> > > > > This is true. Again, this is because of the use-cases it covers.
> > > > > But having synchronization for a generic solution seems to be a good idea.
> > > > Yeah, understood xen-zcopy works ok with your use case. But I am just curious
> > > > if it is ok not to have any inter-domain synchronization in this sharing model.
> > > The synchronization is done with displif protocol [1]
> > > > The buffer being shared is technically dma-buf and originator needs to be able
> > > > to keep track of it.
> > > As I am working in DRM terms the tracking is done by the DRM core
> > > for me for free. (This might be one of the reasons Daniel sees DRM
> > > based implementation fit very good from code-reuse POV).
> > Hm, not sure what tracking you refer to here all ... I got lost in all the
> > replies while catching up.
> > 
> I was just referring to accounting stuff already implemented in the DRM
> core,
> so I don't have to worry about doing the same for buffers to understand
> when they are released etc.
> > > > > > 3. 1-level references - when using grant-table for sharing pages, there will
> > > > > > be same # of refs (each 8 byte)
> > > > > To be precise, grant ref is 4 bytes
> > > > You are right. Thanks for correction.;)
> > > > 
> > > > > > as # of shared pages, which is passed to
> > > > > > the userspace to be shared with importing VM in case of xen-zcopy.
> > > > > The reason for that is that xen-zcopy is a helper driver, e.g.
> > > > > the grant references come from the display backend [1], which implements
> > > > > Xen display protocol [2]. So, effectively the backend extracts references
> > > > > from frontend's requests and passes those to xen-zcopy as an array
> > > > > of refs.
> > > > > >    Compared
> > > > > > to this, hyper_dmabuf does multiple level addressing to generate only one
> > > > > > reference id that represents all shared pages.
> > > > > In the protocol [2] only one reference to the gref directory is passed
> > > > > between VMs
> > > > > (and the gref directory is a single-linked list of shared pages containing
> > > > > all
> > > > > of the grefs of the buffer).
> > > > ok, good to know. I will look into its implementation in more details but is
> > > > this gref directory (chained grefs) something that can be used for any general
> > > > memory sharing use case or is it just for xen-display (in current code base)?
> > > Not to mislead you: one grant ref is passed via displif protocol,
> > > but the page it's referencing contains the rest of the grant refs.
> > > 
> > > As to if this can be used for any memory: yes. It is the same for
> > > sndif and displif Xen protocols, but defined twice as strictly speaking
> > > sndif and displif are two separate protocols.
> > > 
> > > While reviewing your RFC v2 one of the comments I had [2] was that if we
> > > can start from defining such a generic protocol for hyper-dmabuf.
> > > It can be a header file, which not only has the description part
> > > (which then become a part of Documentation/...rst file), but also defines
> > > all the required constants for requests, responses, defines message formats,
> > > state diagrams etc. all at one place. Of course this protocol must not be
> > > Xen specific, but be OS/hypervisor agnostic.
> > > Having that will trigger a new round of discussion, so we have it all
> > > designed
> > > and discussed before we start implementing.
> > > 
> > > Besides the protocol we have to design UAPI part as well and make sure
> > > the hyper-dmabuf is not only accessible from user-space, but there will be
> > > number
> > > of kernel-space users as well.
> > Again, why do you want to create new uapi for this? Given the very strict
> > requirements we have for new uapi (see above link), it's the toughest way
> > to get any kind of support in.
> I do understand that adding new UAPI is not good for many reasons.
> But here I was meaning that current hyper-dmabuf design is
> only user-space oriented, e.g. it provides number of IOCTLs to do all
> the work. But I need a way to access the same from the kernel, so, for
> example,
> some other para-virtual driver can export/import dma-buf, not only
> user-space.

If you need an import-export helper library, just merge it. Do not attach
any uapi to it, just the internal helpers.

Much, much, much easier to land.

> > That's why I had essentially zero big questions for xen-front (except some
> > implementation improvements, and stuff to make sure xen-front actually
> > implements the real uapi semantics instead of its own), and why I'm asking
> > much more questions on this stuff here.
> > 
> > > > > > 4. inter VM messaging (hyper_dmabuf only) - hyper_dmabuf has inter-vm msg
> > > > > > communication defined for dmabuf synchronization and private data (meta
> > > > > > info that Matt Roper mentioned) exchange.
> > > > > This is true, xen-zcopy has no means for inter VM sync and meta-data,
> > > > > simply because it doesn't have any code for inter VM exchange in it,
> > > > > e.g. the inter VM protocol is handled by the backend [1].
> > > > > > 5. driver-to-driver notification (hyper_dmabuf only) - importing VM gets
> > > > > > notified when new dmabuf is exported from other VM - uevent can be optionally
> > > > > > generated when this happens.
> > > > > > 
> > > > > > 6. structure - hyper_dmabuf is targeting to provide a generic solution for
> > > > > > inter-domain dmabuf sharing for most hypervisors, which is why it has two
> > > > > > layers as mattrope mentioned, front-end that contains standard API and backend
> > > > > > that is specific to hypervisor.
> > > > > Again, xen-zcopy is decoupled from inter VM communication
> > > > > > > > No idea, didn't look at it in detail.
> > > > > > > > 
> > > > > > > > Looks pretty complex from a distant view.  Maybe because it tries to
> > > > > > > > build a communication framework using dma-bufs instead of a simple
> > > > > > > > dma-buf passing mechanism.
> > > > > > we started with simple dma-buf sharing but realized there are many
> > > > > > things we need to consider in real use-case, so we added communication,
> > > > > > notification and dma-buf synchronization, then re-structured it to
> > > > > > front-end and back-end (this made things more complicated..) since Xen
> > > > > > was not our only target. Also, we thought passing the reference for the
> > > > > > buffer (hyper_dmabuf_id) is not secure so added uevent mechanism later.
> > > > > > 
> > > > > > > Yes, I am looking at it now, trying to figure out the full story
> > > > > > > and its implementation. BTW, Intel guys were about to share some
> > > > > > > test application for hyper-dmabuf, maybe I have missed one.
> > > > > > > It could probably better explain the use-cases and the complexity
> > > > > > > they have in hyper-dmabuf.
> > > > > > One example is actually in github. If you want take a look at it, please
> > > > > > visit:
> > > > > > 
> > > > > > https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
> > > > > Thank you, I'll have a look
> > > > > > > > Like xen-zcopy it seems to depend on the idea that the hypervisor
> > > > > > > > manages all memory it is easy for guests to share pages with the help of
> > > > > > > > the hypervisor.
> > > > > > > So, for xen-zcopy we were not trying to make it generic,
> > > > > > > it just solves display (dumb) zero-copying use-cases for Xen.
> > > > > > > We implemented it as a DRM helper driver because we can't see any
> > > > > > > other use-cases as of now.
> > > > > > > For example, we also have Xen para-virtualized sound driver, but
> > > > > > > its buffer memory usage is not comparable to what display wants
> > > > > > > and it works somewhat differently (e.g. there is no "frame done"
> > > > > > > event, so one can't tell when the sound buffer can be "flipped").
> > > > > > > At the same time, we do not use virtio-gpu, so this could probably
> > > > > > > be one more candidate for shared dma-bufs some day.
> > > > > > > >     Which simply isn't the case on kvm.
> > > > > > > > 
> > > > > > > > hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
> > > > > > > > on top of xen-zcopy.
> > > > > > > Hm, I can imagine that: xen-zcopy could be a library code for hyper-dmabuf
> > > > > > > in terms of implementing all that page sharing fun in multiple directions,
> > > > > > > e.g. Host->Guest, Guest->Host, Guest<->Guest.
> > > > > > > But I'll let Matt and Dongwon to comment on that.
> > > > > > I think we can definitely collaborate. Especially, maybe we are using some
> > > > > > outdated sharing mechanism/grant-table mechanism in our Xen backend (thanks
> > > > > > for bringing that up Oleksandr). However, the question is once we collaborate
> > > > > > somehow, can xen-zcopy's usecase use the standard API that hyper_dmabuf
> > > > > > provides? I don't think we need different IOCTLs that do the same in the final
> > > > > > solution.
> > > > > > 
> > > > > If you think of xen-zcopy as a library (which implements Xen
> > > > > grant references mangling) and DRM PRIME wrapper on top of that
> > > > > library, we can probably define proper API for that library,
> > > > > so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
> > > > > about to start upstreaming Xen para-virtualized sound device driver soon,
> > > > > which also uses similar code and gref passing mechanism [3].
> > > > > (Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
> > > > > snd/xen-front and then propose a Xen helper library for sharing big buffers,
> > > > > so common code of the above drivers can use the same code w/o code
> > > > > duplication)
> > > > I think it is possible to use your functions for memory sharing part in
> > > > hyper_dmabuf's backend (this 'backend' means the layer that does page sharing
> > > > and inter-vm communication with xen-specific way.), so why don't we work on
> > > > "Xen helper library for sharing big buffers" first while we continue our
> > > > discussion on the common API layer that can cover any dmabuf sharing cases.
> > > > 
> > > Well, I would love we reuse the code that I have, but I also
> > > understand that it was limited by my use-cases. So, I do not
> > > insist we have to ;)
> > > If we start designing and discussing hyper-dmabuf protocol we of course
> > > can work on this helper library in parallel.
> > Imo code reuse is overrated. Adding new uapi is what freaks me out here
> > :-)
> > 
> > If we end up with duplicated implementations, even in upstream, meh, not
> > great, but also ok. New uapi, and in a similar way, new hypervisor api
> > like the dma-buf forwarding that hyperdmabuf does is the kind of thing
> > that will lock us in for 10+ years (if we make a mistake).
> > 
> > > > > Thank you,
> > > > > Oleksandr
> > > > > 
> > > > > P.S. All, is it a good idea to move this out of udmabuf thread into a
> > > > > dedicated one?
> > > > Either way is fine with me.
> > > So, if you can start designing the protocol we may have a dedicated mail
> > > thread for that. I will try to help with the protocol as much as I can
> > Please don't start with the protocol. Instead start with the concrete
> > use-cases, and then figure out why exactly you need new uapi. Once we have
> > that answered, we can start thinking about fleshing out the details.
> On my side there are only 2 use-cases, Guest2 only:
> 1. Create a PRIME (dma-buf) from grant references
> 2. Create grant references from PRIME (dma-buf)

So these grant references, are those userspace visible things? I thought
the grant references were just the kernel/hypervisor internal magic to make
this all work?
-Daniel

Oleksandr Andrushchenko April 16, 2018, 8:22 a.m. UTC | #23

On 04/16/2018 10:43 AM, Daniel Vetter wrote:
> On Mon, Apr 16, 2018 at 10:16:31AM +0300, Oleksandr Andrushchenko wrote:
>> On 04/13/2018 06:37 PM, Daniel Vetter wrote:
>>> On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote:
>>>> On 04/10/2018 08:26 PM, Dongwon Kim wrote:
>>>>> On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
>>>>>> On 04/06/2018 09:57 PM, Dongwon Kim wrote:
>>>>>>> On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
>>>>>>>> On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
>>>>>>>>>      Hi,
>>>>>>>>>
>>>>>>>>>>> I fail to see any common ground for xen-zcopy and udmabuf ...
>>>>>>>>>> Does the above mean you can assume that xen-zcopy and udmabuf
>>>>>>>>>> can co-exist as two different solutions?
>>>>>>>>> Well, udmabuf route isn't fully clear yet, but yes.
>>>>>>>>>
>>>>>>>>> See also gvt (intel vgpu), where the hypervisor interface is abstracted
>>>>>>>>> away into a separate kernel modules even though most of the actual vgpu
>>>>>>>>> emulation code is common.
>>>>>>>> Thank you for your input, I'm just trying to figure out
>>>>>>>> which of the three z-copy solutions intersect and how much
>>>>>>>>>> And what about hyper-dmabuf?
>>>>>>> xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
>>>>>>> in terms of these core sharing feature:
>>>>>>>
>>>>>>> 1. the sharing process - import prime/dmabuf from the producer -> extract
>>>>>>> underlying pages and get those shared -> return references for shared pages
>>>>> Another thing is danvet was kind of against to the idea of importing existing
>>>>> dmabuf/prime buffer and forward it to the other domain due to synchronization
>>>>> issues. He proposed to make hyper_dmabuf only work as an exporter so that it
>>>>> can have a full control over the buffer. I think we need to talk about this
>>>>> further as well.
>>>> Yes, I saw this. But this limits the use-cases so much.
>>>> For instance, running Android as a Guest (which uses ION to allocate
>>>> buffers) means that finally HW composer will import dma-buf into
>>>> the DRM driver. Then, in case of xen-front for example, it needs to be
>>>> shared with the backend (Host side). Of course, we can change user-space
>>>> to make xen-front allocate the buffers (make it exporter), but what we try
>>>> to avoid is to change user-space which in normal world would have remained
>>>> unchanged otherwise.
>>>> So, I do think we have to support this use-case and just have to understand
>>>> the complexity.
>>> Erm, why do you need importer capability for this use-case?
>>>
>>> guest1 -> ION -> xen-front -> hypervisor -> guest 2 -> xen-zcopy exposes
>>> that dma-buf -> import to the real display hw
>>>
>>> Nowhere in this chain do you need xen-zcopy to be able to import a
>>> dma-buf (within linux, it needs to import a bunch of pages from the
>>> hypervisor).
>>>
>>> Now if your plan is to use xen-zcopy in the guest1 instead of xen-front,
>>> then you indeed need to import.
>> This is the exact use-case I was referring to while saying
>> we need to import on Guest1 side. If hyper-dmabuf is so
>> generic that there is no xen-front in the picture, then
>> it needs to import a dma-buf, so it can be exported at Guest2 side.
>>>    But that imo doesn't make sense:
>>> - xen-front gives you clearly defined flip events you can forward to the
>>>     hypervisor. xen-zcopy would need to add that again.
>> xen-zcopy is a helper driver which doesn't handle page flips
>> and is not a KMS driver as one might think of: the DRM UAPI it uses is
>> just to export a dma-buf as a PRIME buffer, but that's it.
>> Flipping etc. is done by the backend [1], not xen-zcopy.
>>>    Same for
>>>     hyperdmabuf (and really we're not going to shuffle struct dma_fence over
>>>     the wire in a generic fashion between hypervisor guests).
>>>
>>> - xen-front already has the idea of pixel format for the buffer (and any
>>>     other metadata). Again, xen-zcopy and hyperdmabuf lack that, would need
>>>     to add it shoehorned in somehow.
>> Again, here you are talking of something which is implemented in
>> Xen display backend, not xen-zcopy, e.g. display backend can
>> implement para-virtual display w/o xen-zcopy at all, but in this case
>> there is a memory copying for each frame. With the help of xen-zcopy
>> the backend feeds xen-front's buffers directly into Guest2 DRM/KMS or
>> Weston or whatever as xen-zcopy exports remote buffers as PRIME buffers,
>> thus no buffer copying is required.
> Why do you need to copy on every frame for xen-front? In the above
> pipeline, using xen-front I see 0 architectural reasons to have a copy
> anywhere.
>
> This seems to be the core of the confusion we're having here.
Ok, so I'll try to explain:
1. xen-front - produces a display buffer to be shown at Guest2
by the backend, shares its grant references with the backend
2. xen-front sends page flip event to the backend specifying the
buffer in question
3. Backend takes the shared buffer (which is only a buffer mapped into
backend's memory, it is not a dma-buf/PRIME one) and makes memcpy from
it to a local dumb/surface
4. Backend flips that local dumb buffer/surface
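
To make the cost of step 3 concrete, here is a minimal sketch of the
per-frame copy, assuming both the grant-mapped buffer and the local dumb
buffer are already mapped into the backend and may differ in pitch. The
function name and its parameters are hypothetical and not taken from the
backend sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy one frame from the grant-mapped buffer into a
 * local dumb buffer, row by row because the two pitches may differ.
 * Returns the number of bytes copied, i.e. the per-flip cost of step 3.
 */
size_t copy_frame(uint8_t *dst, size_t dst_pitch,
                  const uint8_t *src, size_t src_pitch,
                  size_t row_bytes, size_t height)
{
    size_t copied = 0;

    for (size_t y = 0; y < height; y++) {
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
        copied += row_bytes;
    }
    return copied;
}
```

For example, a 1920x1080 buffer at 4 bytes per pixel means 8294400 bytes
moved on every page flip.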

If I have a xen-zcopy helper driver then I can avoid doing step 3):
1) 2) remain the same as above
3) Initially for a new display buffer, backend calls xen-zcopy to create
a local PRIME buffer from the grant references provided by the xen-front
via displif protocol [1]: we now have handle_zcopy
4) Backend exports this PRIME with HANDLE_TO_FD from xen-zcopy and imports
it into Weston-KMS/DRM or real HW DRM driver with FD_TO_HANDLE: we now have
handle_local
5) On page flip event backend flips local PRIME: uses handle_local for flips
>
>>> Ofc you won't be able to shovel sound or media stream data over to another
>>> guest like this, but that's what you have xen-v4l and xen-sound or
>>> whatever else for. Trying to make a new uapi, which means userspace must
>>> be changed for all the different use-case, instead of reusing standard
>>> linux driver uapi (which just happens to send the data to another
>>> hypervisor guest instead of real hw) imo just doesn't make much sense.
>>>
>>> Also, at least for the gpu subsystem: Any new uapi must have full
>>> userspace available for it, see:
>>>
>>> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
>>>
>>> Adding more uapi is definitely the most painful way to fix a use-case.
>>> Personally I'd go as far and also change the xen-zcopy side on the
>>> receiving guest to use some standard linux uapi. E.g. you could write an
>>> output v4l driver to receive the frames from guest1.
>> So, we now know that xen-zcopy was not meant to handle page flips,
>> but to implement new UAPI to let user-space create buffers either
>> from Guest2 grant references (so it can be exported to Guest1) or
>> other way round, e.g. create (from Guest1 grant references to export to
>> Guest 2). For that reason it adds 2 IOCTLs: create buffer from grefs
>> or produce grefs for the buffer given.
>> One additional IOCTL is to wait for the buffer to be released by
>> Guest2 user-space.
>> That being said, I don't quite see how v4l can be used here to implement
>> UAPI I need.
> Under the assumption that you can make xen-front to zerocopy for the
> kernel->hypervisor path, v4l could be made to work for the
> hypervisor->kernel side of the pipeline.
>
> But it sounds like we have a confusion already on why or why not xen-front
> can or cannot do zerocopy.
xen-front provides an array of grant references to Guest2 (backend).
It's up to backend what it does with those grant references
which at Guest2 side are not PRIME or dma-buf, but just a set of pages.
It is xen-zcopy that turns these pages into a PRIME buffer. Once this is done
the backend can tell DRM drivers to use the buffer in DRM terms.
>>>>> danvet, can you comment on this topic?
>>>>>
>>>>>>> 2. the page sharing mechanism - it uses Xen-grant-table.
>>>>>>>
>>>>>>> And to give you a quick summary of differences as far as I understand
>>>>>>> between two implementations (please correct me if I am wrong, Oleksandr.)
>>>>>>>
>>>>>>> 1. xen-zcopy is DRM specific - can import only DRM prime buffer
>>>>>>> while hyper_dmabuf can export any dmabuf regardless of originator
>>>>>> Well, this is true. And at the same time this is just a matter
>>>>>> of extending the API: xen-zcopy is a helper driver designed for
>>>>>> xen-front/back use-case, so this is why it only has DRM PRIME API
>>>>>>> 2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
>>>>>>> while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
>>>>>>> out synchronization message to the exporting VM for synchronization.
>>>>>> This is true. Again, this is because of the use-cases it covers.
>>>>>> But having synchronization for a generic solution seems to be a good idea.
>>>>> Yeah, understood xen-zcopy works ok with your use case. But I am just curious
>>>>> if it is ok not to have any inter-domain synchronization in this sharing model.
>>>> The synchronization is done with displif protocol [1]
>>>>> The buffer being shared is technically dma-buf and originator needs to be able
>>>>> to keep track of it.
>>>> As I am working in DRM terms the tracking is done by the DRM core
>>>> for me for free. (This might be one of the reasons Daniel sees DRM
>>>> based implementation fit very good from code-reuse POV).
>>> Hm, not sure what tracking you refer to here all ... I got lost in all the
>>> replies while catching up.
>>>
>> I was just referring to accounting stuff already implemented in the DRM
>> core,
>> so I don't have to worry about doing the same for buffers to understand
>> when they are released etc.
>>>>>>> 3. 1-level references - when using grant-table for sharing pages, there will
>>>>>>> be same # of refs (each 8 byte)
>>>>>> To be precise, grant ref is 4 bytes
>>>>> You are right. Thanks for correction.;)
>>>>>
>>>>>>> as # of shared pages, which is passed to
>>>>>>> the userspace to be shared with importing VM in case of xen-zcopy.
>>>>>> The reason for that is that xen-zcopy is a helper driver, e.g.
>>>>>> the grant references come from the display backend [1], which implements
>>>>>> Xen display protocol [2]. So, effectively the backend extracts references
>>>>>> from frontend's requests and passes those to xen-zcopy as an array
>>>>>> of refs.
>>>>>>>     Compared
>>>>>>> to this, hyper_dmabuf does multiple level addressing to generate only one
>>>>>>> reference id that represents all shared pages.
>>>>>> In the protocol [2] only one reference to the gref directory is passed
>>>>>> between VMs (and the gref directory is a single-linked list of shared
>>>>>> pages containing all of the grefs of the buffer).
>>>>> ok, good to know. I will look into its implementation in more details but is
>>>>> this gref directory (chained grefs) something that can be used for any general
>>>>> memory sharing use case or is it just for xen-display (in current code base)?
>>>> Not to mislead you: one grant ref is passed via displif protocol,
>>>> but the page it's referencing contains the rest of the grant refs.
>>>>
>>>> As to if this can be used for any memory: yes. It is the same for
>>>> sndif and displif Xen protocols, but defined twice as strictly speaking
>>>> sndif and displif are two separate protocols.
>>>>
>>>> While reviewing your RFC v2 one of the comments I had [2] was that if we
>>>> can start from defining such a generic protocol for hyper-dmabuf.
>>>> It can be a header file, which not only has the description part
>>>> (which then become a part of Documentation/...rst file), but also defines
>>>> all the required constants for requests, responses, defines message formats,
>>>> state diagrams etc. all at one place. Of course this protocol must not be
>>>> Xen specific, but be OS/hypervisor agnostic.
>>>> Having that will trigger a new round of discussion, so we have it all
>>>> designed
>>>> and discussed before we start implementing.
>>>>
>>>> Besides the protocol we have to design UAPI part as well and make sure
>>>> the hyper-dmabuf is not only accessible from user-space, but there will be
>>>> number
>>>> of kernel-space users as well.
>>> Again, why do you want to create new uapi for this? Given the very strict
>>> requirements we have for new uapi (see above link), it's the toughest way
>>> to get any kind of support in.
>> I do understand that adding new UAPI is not good for many reasons.
>> But here I was meaning that current hyper-dmabuf design is
>> only user-space oriented, e.g. it provides number of IOCTLs to do all
>> the work. But I need a way to access the same from the kernel, so, for
>> example,
>> some other para-virtual driver can export/import dma-buf, not only
>> user-space.
> If you need an import-export helper library, just merge it. Do not attach
> any uapi to it, just the internal helpers.
>
> Much, much, much easier to land.
This can be done, but again, I will need some entity which
backend may use to convert xen-front's grant references into
a PRIME buffer, hence there is UAPI for that. In other words,
I'll need a thinner xen-zcopy which will implement the same UAPI
and use that library for Xen related stuff.

The confusion may also come from the fact that the backend is
a user-space application, not a kernel module (we have 2 modes
of its operation as of now: DRM master or Weston client), so
it needs a way to talk to the kernel.
>>> That's why I had essentially zero big questions for xen-front (except some
>>> implementation improvements, and stuff to make sure xen-front actually
>>> implements the real uapi semantics instead of its own), and why I'm asking
>>> much more questions on this stuff here.
>>>
>>>>>>> 4. inter VM messaging (hyper_dmabuf only) - hyper_dmabuf has inter-vm msg
>>>>>>> communication defined for dmabuf synchronization and private data (meta
>>>>>>> info that Matt Roper mentioned) exchange.
>>>>>> This is true, xen-zcopy has no means for inter VM sync and meta-data,
>>>>>> simply because it doesn't have any code for inter VM exchange in it,
>>>>>> e.g. the inter VM protocol is handled by the backend [1].
>>>>>>> 5. driver-to-driver notification (hyper_dmabuf only) - importing VM gets
>>>>>>> notified when a new dmabuf is exported from other VM - uevent can be optionally
>>>>>>> generated when this happens.
>>>>>>>
>>>>>>> 6. structure - hyper_dmabuf is targeting to provide a generic solution for
>>>>>>> inter-domain dmabuf sharing for most hypervisors, which is why it has two
>>>>>>> layers as mattrope mentioned, front-end that contains standard API and backend
>>>>>>> that is specific to hypervisor.
>>>>>> Again, xen-zcopy is decoupled from inter VM communication
>>>>>>>>> No idea, didn't look at it in detail.
>>>>>>>>>
>>>>>>>>> Looks pretty complex from a distant view.  Maybe because it tries to
>>>>>>>>> build a communication framework using dma-bufs instead of a simple
>>>>>>>>> dma-buf passing mechanism.
>>>>>>> we started with simple dma-buf sharing but realized there are many
>>>>>>> things we need to consider in real use-case, so we added communication
>>>>>>> , notification and dma-buf synchronization then re-structured it to
>>>>>>> front-end and back-end (this made things more compicated..) since Xen
>>>>>>> was not our only target. Also, we thought passing the reference for the
>>>>>>> buffer (hyper_dmabuf_id) is not secure so added uvent mechanism later.
>>>>>>>
>>>>>>>> Yes, I am looking at it now, trying to figure out the full story
>>>>>>>> and its implementation. BTW, Intel guys were about to share some
>>>>>>>> test application for hyper-dmabuf, maybe I have missed one.
>>>>>>>> It could probably better explain the use-cases and the complexity
>>>>>>>> they have in hyper-dmabuf.
>>>>>>> One example is actually in github. If you want take a look at it, please
>>>>>>> visit:
>>>>>>>
>>>>>>> https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
>>>>>> Thank you, I'll have a look
>>>>>>>>> Like xen-zcopy it seems to depend on the idea that the hypervisor
>>>>>>>>> manages all memory, so it is easy for guests to share pages with the help of
>>>>>>>>> the hypervisor.
>>>>>>>> So, for xen-zcopy we were not trying to make it generic,
>>>>>>>> it just solves display (dumb) zero-copying use-cases for Xen.
>>>>>>>> We implemented it as a DRM helper driver because we can't see any
>>>>>>>> other use-cases as of now.
>>>>>>>> For example, we also have Xen para-virtualized sound driver, but
>>>>>>>> its buffer memory usage is not comparable to what display wants
>>>>>>>> and it works somewhat differently (e.g. there is no "frame done"
>>>>>>>> event, so one can't tell when the sound buffer can be "flipped").
>>>>>>>> At the same time, we do not use virtio-gpu, so this could probably
>>>>>>>> be one more candidate for shared dma-bufs some day.
>>>>>>>>>      Which simply isn't the case on kvm.
>>>>>>>>>
>>>>>>>>> hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
>>>>>>>>> on top of xen-zcopy.
>>>>>>>> Hm, I can imagine that: xen-zcopy could be a library code for hyper-dmabuf
>>>>>>>> in terms of implementing all that page sharing fun in multiple directions,
>>>>>>>> e.g. Host->Guest, Guest->Host, Guest<->Guest.
>>>>>>>> But I'll let Matt and Dongwon to comment on that.
>>>>>>> I think we can definitely collaborate. Especially, maybe we are using some
>>>>>>> outdated sharing mechanism/grant-table mechanism in our Xen backend (thanks
>>>>>>> for bringing that up Oleksandr). However, the question is once we collaborate
>>>>>>> somehow, can xen-zcopy's usecase use the standard API that hyper_dmabuf
>>>>>>> provides? I don't think we need different IOCTLs that do the same in the final
>>>>>>> solution.
>>>>>>>
>>>>>> If you think of xen-zcopy as a library (which implements Xen
>>>>>> grant references mangling) and DRM PRIME wrapper on top of that
>>>>>> library, we can probably define proper API for that library,
>>>>>> so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
>>>>>> about to start upstreaming Xen para-virtualized sound device driver soon,
>>>>>> which also uses similar code and gref passing mechanism [3].
>>>>>> (Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
>>>>>> snd/xen-front and then propose a Xen helper library for sharing big buffers,
>>>>>> so common code of the above drivers can use the same code w/o code
>>>>>> duplication)
>>>>> I think it is possible to use your functions for memory sharing part in
>>>>> hyper_dmabuf's backend (this 'backend' means the layer that does page sharing
>>>>> and inter-vm communication with xen-specific way.), so why don't we work on
>>>>> "Xen helper library for sharing big buffers" first while we continue our
>>>>> discussion on the common API layer that can cover any dmabuf sharing cases.
>>>>>
>>>> Well, I would love we reuse the code that I have, but I also
>>>> understand that it was limited by my use-cases. So, I do not
>>>> insist we have to ;)
>>>> If we start designing and discussing hyper-dmabuf protocol we of course
>>>> can work on this helper library in parallel.
>>> Imo code reuse is overrated. Adding new uapi is what freaks me out here
>>> :-)
>>>
>>> If we end up with duplicated implementations, even in upstream, meh, not
>>> great, but also ok. New uapi, and in a similar way, new hypervisor api
>>> like the dma-buf forwarding that hyperdmabuf does is the kind of thing
>>> that will lock us in for 10+ years (if we make a mistake).
>>>
>>>>>> Thank you,
>>>>>> Oleksandr
>>>>>>
>>>>>> P.S. All, is it a good idea to move this out of udmabuf thread into a
>>>>>> dedicated one?
>>>>> Either way is fine with me.
>>>> So, if you can start designing the protocol we may have a dedicated mail
>>>> thread for that. I will try to help with the protocol as much as I can
>>> Please don't start with the protocol. Instead start with the concrete
>>> use-cases, and then figure out why exactly you need new uapi. Once we have
>>> that answered, we can start thinking about fleshing out the details.
>> On my side there are only 2 use-cases, Guest2 only:
>> 1. Create a PRIME (dma-buf) from grant references
>> 2. Create grant references from PRIME (dma-buf)
> So these grant references, are those userspace visible things?
Yes, the user-space backend receives those from xen-front via [1]

> I thought
> the grant references was just the kernel/hypervisor internal magic to make
> this all work?
So, I can map the grant references from user-space, but I won't
be able to turn those into a PRIME buffer. So, the only use of those
w/o xen-zcopy is to map the grant refs and copy into a real HW dumb buffer on every
page flip.
> -Daniel
[1] https://elixir.bootlin.com/linux/v4.17-rc1/source/include/xen/interface/io/displif.h#L484
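
As a back-of-the-envelope sketch of the gref directory discussed in the
quoted part above: assuming 4 KiB pages, 4-byte grant references, and a
directory page that spends one slot on the link to the next directory page
and the rest on buffer grefs (my reading of the single-linked list
description; please verify against displif.h), the bookkeeping for one
buffer works out as below. The helper names are hypothetical.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4 KiB pages */
#define GREF_SIZE_BYTES 4u      /* a grant ref is 4 bytes, as noted in the thread */

/* One grant reference per shared page of the buffer. */
size_t grefs_for_buffer(size_t buf_bytes)
{
    return (buf_bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* One slot per directory page is used for the link to the next page. */
size_t grefs_per_dir_page(void)
{
    return PAGE_SIZE_BYTES / GREF_SIZE_BYTES - 1;
}

/* Number of chained directory pages needed to list all grefs of the buffer. */
size_t dir_pages_for_buffer(size_t buf_bytes)
{
    size_t per_page = grefs_per_dir_page();

    return (grefs_for_buffer(buf_bytes) + per_page - 1) / per_page;
}
```

For a 1920x1080 buffer at 4 bytes per pixel this gives 2025 grefs spread
over 2 directory pages, while only the gref of the first directory page is
passed in the request.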

Daniel Vetter April 16, 2018, 9:32 a.m. UTC | #24

On Mon, Apr 16, 2018 at 10:22 AM, Oleksandr Andrushchenko
<andr2000@gmail.com> wrote:
> On 04/16/2018 10:43 AM, Daniel Vetter wrote:
>>
>> On Mon, Apr 16, 2018 at 10:16:31AM +0300, Oleksandr Andrushchenko wrote:
>>>
>>> On 04/13/2018 06:37 PM, Daniel Vetter wrote:
>>>>
>>>> On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote:
>>>>>
>>>>> On 04/10/2018 08:26 PM, Dongwon Kim wrote:
>>>>>>
>>>>>> On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
>>>>>>>
>>>>>>> On 04/06/2018 09:57 PM, Dongwon Kim wrote:
>>>>>>>>
>>>>>>>> On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
>>>>>>>>>
>>>>>>>>> On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
>>>>>>>>>>
>>>>>>>>>>      Hi,
>>>>>>>>>>
>>>>>>>>>>>> I fail to see any common ground for xen-zcopy and udmabuf ...
>>>>>>>>>>>
>>>>>>>>>>> Does the above mean you can assume that xen-zcopy and udmabuf
>>>>>>>>>>> can co-exist as two different solutions?
>>>>>>>>>>
>>>>>>>>>> Well, udmabuf route isn't fully clear yet, but yes.
>>>>>>>>>>
>>>>>>>>>> See also gvt (intel vgpu), where the hypervisor interface is abstracted
>>>>>>>>>> away into separate kernel modules even though most of the actual vgpu
>>>>>>>>>> emulation code is common.
>>>>>>>>>
>>>>>>>>> Thank you for your input, I'm just trying to figure out
>>>>>>>>> which of the three z-copy solutions intersect and how much
>>>>>>>>>>>
>>>>>>>>>>> And what about hyper-dmabuf?
>>>>>>>>
>>>>>>>> xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
>>>>>>>> in terms of these core sharing features:
>>>>>>>>
>>>>>>>> 1. the sharing process - import prime/dmabuf from the producer ->
>>>>>>>> extract underlying pages and get those shared -> return references for
>>>>>>>> shared pages
>>>>>>
>>>>>> Another thing is danvet was kind of against the idea of importing existing
>>>>>> dmabuf/prime buffer and forward it to the other domain due to synchronization
>>>>>> issues. He proposed to make hyper_dmabuf only work as an exporter so that it
>>>>>> can have a full control over the buffer. I think we need to talk about this
>>>>>> further as well.
>>>>>
>>>>> Yes, I saw this. But this limits the use-cases so much.
>>>>> For instance, running Android as a Guest (which uses ION to allocate
>>>>> buffers) means that finally HW composer will import dma-buf into
>>>>> the DRM driver. Then, in case of xen-front for example, it needs to be
>>>>> shared with the backend (Host side). Of course, we can change user-space
>>>>> to make xen-front allocate the buffers (make it exporter), but what we try
>>>>> to avoid is to change user-space which in normal world would have remained
>>>>> unchanged otherwise.
>>>>> So, I do think we have to support this use-case and just have to understand
>>>>> the complexity.
>>>>
>>>> Erm, why do you need importer capability for this use-case?
>>>>
>>>> guest1 -> ION -> xen-front -> hypervisor -> guest 2 -> xen-zcopy exposes
>>>> that dma-buf -> import to the real display hw
>>>>
>>>> Nowhere in this chain do you need xen-zcopy to be able to import a
>>>> dma-buf (within linux, it needs to import a bunch of pages from the
>>>> hypervisor).
>>>>
>>>> Now if your plan is to use xen-zcopy in the guest1 instead of xen-front,
>>>> then you indeed need to import.
>>>
>>> This is the exact use-case I was referring to while saying
>>> we need to import on Guest1 side. If hyper-dmabuf is so
>>> generic that there is no xen-front in the picture, then
>>> it needs to import a dma-buf, so it can be exported at Guest2 side.
>>>>
>>>>    But that imo doesn't make sense:
>>>> - xen-front gives you clearly defined flip events you can forward to the
>>>>     hypervisor. xen-zcopy would need to add that again.
>>>
>>> xen-zcopy is a helper driver which doesn't handle page flips
>>> and is not a KMS driver as one might think of: the DRM UAPI it uses is
>>> just to export a dma-buf as a PRIME buffer, but that's it.
>>> Flipping etc. is done by the backend [1], not xen-zcopy.
>>>>
>>>>    Same for
>>>>     hyperdmabuf (and really we're not going to shuffle struct dma_fence
>>>> over
>>>>     the wire in a generic fashion between hypervisor guests).
>>>>
>>>> - xen-front already has the idea of pixel format for the buffer (and any
>>>>     other metadata). Again, xen-zcopy and hyperdmabuf lack that, would
>>>> need
>>>>     to add it shoehorned in somehow.
>>>
>>> Again, here you are talking of something which is implemented in
>>> Xen display backend, not xen-zcopy, e.g. display backend can
>>> implement para-virtual display w/o xen-zcopy at all, but in this case
>>> there is a memory copying for each frame. With the help of xen-zcopy
>>> the backend feeds xen-front's buffers directly into Guest2 DRM/KMS or
>>> Weston or whatever as xen-zcopy exports remote buffers as PRIME buffers,
>>> thus no buffer copying is required.
>>
>> Why do you need to copy on every frame for xen-front? In the above
>> pipeline, using xen-front I see 0 architectural reasons to have a copy
>> anywhere.
>>
>> This seems to be the core of the confusion we're having here.
>
> Ok, so I'll try to explain:
> 1. xen-front - produces a display buffer to be shown at Guest2
> by the backend, shares its grant references with the backend
> 2. xen-front sends page flip event to the backend specifying the
> buffer in question
> 3. Backend takes the shared buffer (which is only a buffer mapped into
> backend's memory, it is not a dma-buf/PRIME one) and makes memcpy from
> it to a local dumb/surface

Why do you even do that? The copying here I mean - why don't you just
directly scan out from the grant references you received through the
hypervisor?

Also I'm not clear in your example which step happens where (guest 1/2
or hypervisor)?

> 4. Backend flips that local dumb buffer/surface
>
> If I have a xen-zcopy helper driver then I can avoid doing step 3):
> 1) 2) remain the same as above
> 3) Initially for a new display buffer, backend calls xen-zcopy to create
> a local PRIME buffer from the grant references provided by the xen-front
> via displif protocol [1]: we now have handle_zcopy
> 4) Backend exports this PRIME with HANDLE_TO_FD from xen-zcopy and imports
> it into Weston-KMS/DRM or real HW DRM driver with FD_TO_HANDLE: we now have
> handle_local
> 5) On page flip event backend flips local PRIME: uses handle_local for flips
>
>>
>>>> Ofc you won't be able to shovel sound or media stream data over to
>>>> another
>>>> guest like this, but that's what you have xen-v4l and xen-sound or
>>>> whatever else for. Trying to make a new uapi, which means userspace must
>>>> be changed for all the different use-case, instead of reusing standard
>>>> linux driver uapi (which just happens to send the data to another
>>>> hypervisor guest instead of real hw) imo just doesn't make much sense.
>>>>
>>>> Also, at least for the gpu subsystem: Any new uapi must have full
>>>> userspace available for it, see:
>>>>
>>>>
>>>> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
>>>>
>>>> Adding more uapi is definitely the most painful way to fix a use-case.
>>>> Personally I'd go as far and also change the xen-zcopy side on the
>>>> receiving guest to use some standard linux uapi. E.g. you could write an
>>>> output v4l driver to receive the frames from guest1.
>>>
>>> So, we now know that xen-zcopy was not meant to handle page flips,
>>> but to implement new UAPI to let user-space create buffers either
>>> from Guest2 grant references (so it can be exported to Guest1) or
>>> other way round, e.g. create (from Guest1 grant references to export to
>>> Guest 2). For that reason it adds 2 IOCTLs: create buffer from grefs
>>> or produce grefs for the buffer given.
>>> One additional IOCTL is to wait for the buffer to be released by
>>> Guest2 user-space.
>>> That being said, I don't quite see how v4l can be used here to implement
>>> UAPI I need.
>>
>> Under the assumption that you can make xen-front to zerocopy for the
>> kernel->hypervisor path, v4l could be made to work for the
>> hypervisor->kernel side of the pipeline.
>>
>> But it sounds like we have a confusion already on why or why not xen-front
>> can or cannot do zerocopy.
>
> xen-front provides an array of grant references to Guest2 (backend).
> It's up to backend what it does with those grant references
> which at Guest2 side are not PRIME or dma-buf, but just a set of pages.
> It is xen-zcopy that turns these pages into a PRIME buffer. Once this is done
> the backend can tell DRM drivers to use the buffer in DRM terms.
>
>>>>>> danvet, can you comment on this topic?
>>>>>>
>>>>>>>> 2. the page sharing mechanism - it uses Xen-grant-table.
>>>>>>>>
>>>>>>>> And to give you a quick summary of differences as far as I
>>>>>>>> understand
>>>>>>>> between two implementations (please correct me if I am wrong,
>>>>>>>> Oleksandr.)
>>>>>>>>
>>>>>>>> 1. xen-zcopy is DRM specific - can import only DRM prime buffer
>>>>>>>> while hyper_dmabuf can export any dmabuf regardless of originator
>>>>>>>
>>>>>>> Well, this is true. And at the same time this is just a matter
>>>>>>> of extending the API: xen-zcopy is a helper driver designed for
>>>>>>> xen-front/back use-case, so this is why it only has DRM PRIME API
>>>>>>>>
>>>>>>>> 2. xen-zcopy doesn't seem to have dma-buf synchronization between
>>>>>>>> two VMs
>>>>>>>> while (as danvet called it as remote dmabuf api sharing)
>>>>>>>> hyper_dmabuf sends
>>>>>>>> out synchronization message to the exporting VM for synchronization.
>>>>>>>
>>>>>>> This is true. Again, this is because of the use-cases it covers.
>>>>>>> But having synchronization for a generic solution seems to be a good
>>>>>>> idea.
>>>>>>
>>>>>> Yeah, understood xen-zcopy works ok with your use case. But I am just
>>>>>> curious
>>>>>> if it is ok not to have any inter-domain synchronization in this
>>>>>> sharing model.
>>>>>
>>>>> The synchronization is done with displif protocol [1]
>>>>>>
>>>>>> The buffer being shared is technically dma-buf and originator needs to
>>>>>> be able
>>>>>> to keep track of it.
>>>>>
>>>>> As I am working in DRM terms the tracking is done by the DRM core
>>>>> for me for free. (This might be one of the reasons Daniel sees DRM
>>>>> based implementation fit very good from code-reuse POV).
>>>>
>>>> Hm, not sure what tracking you refer to here all ... I got lost in all
>>>> the
>>>> replies while catching up.
>>>>
>>> I was just referring to accounting stuff already implemented in the DRM
>>> core,
>>> so I don't have to worry about doing the same for buffers to understand
>>> when they are released etc.
>>>>>>>>
>>>>>>>> 3. 1-level references - when using grant-table for sharing pages,
>>>>>>>> there will
>>>>>>>> be same # of refs (each 8 byte)
>>>>>>>
>>>>>>> To be precise, grant ref is 4 bytes
>>>>>>
>>>>>> You are right. Thanks for correction.;)
>>>>>>
>>>>>>>> as # of shared pages, which is passed to
>>>>>>>> the userspace to be shared with importing VM in case of xen-zcopy.
>>>>>>>
>>>>>>> The reason for that is that xen-zcopy is a helper driver, e.g.
>>>>>>> the grant references come from the display backend [1], which
>>>>>>> implements
>>>>>>> Xen display protocol [2]. So, effectively the backend extracts
>>>>>>> references
>>>>>>> from frontend's requests and passes those to xen-zcopy as an array
>>>>>>> of refs.
>>>>>>>>
>>>>>>>>     Compared
>>>>>>>> to this, hyper_dmabuf does multiple level addressing to generate
>>>>>>>> only one
>>>>>>>> reference id that represents all shared pages.
>>>>>>>
>>>>>>> In the protocol [2] only one reference to the gref directory is
>>>>>>> passed
>>>>>>> between VMs
>>>>>>> (and the gref directory is a single-linked list of shared pages
>>>>>>> containing
>>>>>>> all
>>>>>>> of the grefs of the buffer).
>>>>>>
>>>>>> ok, good to know. I will look into its implementation in more details but is
>>>>>> this gref directory (chained grefs) something that can be used for any general
>>>>>> memory sharing use case or is it just for xen-display (in current code base)?
>>>>>
>>>>> Not to mislead you: one grant ref is passed via displif protocol,
>>>>> but the page it's referencing contains the rest of the grant refs.
>>>>>
>>>>> As to if this can be used for any memory: yes. It is the same for
>>>>> sndif and displif Xen protocols, but defined twice as strictly speaking
>>>>> sndif and displif are two separate protocols.
>>>>>
>>>>> While reviewing your RFC v2 one of the comments I had [2] was that if
>>>>> we
>>>>> can start from defining such a generic protocol for hyper-dmabuf.
>>>>> It can be a header file, which not only has the description part
>>>>> (which then become a part of Documentation/...rst file), but also
>>>>> defines
>>>>> all the required constants for requests, responses, defines message
>>>>> formats,
>>>>> state diagrams etc. all at one place. Of course this protocol must not
>>>>> be
>>>>> Xen specific, but be OS/hypervisor agnostic.
>>>>> Having that will trigger a new round of discussion, so we have it all
>>>>> designed
>>>>> and discussed before we start implementing.
>>>>>
>>>>> Besides the protocol we have to design UAPI part as well and make sure
>>>>> the hyper-dmabuf is not only accessible from user-space, but there will
>>>>> be
>>>>> number
>>>>> of kernel-space users as well.
>>>>
>>>> Again, why do you want to create new uapi for this? Given the very
>>>> strict
>>>> requirements we have for new uapi (see above link), it's the toughest
>>>> way
>>>> to get any kind of support in.
>>>
>>> I do understand that adding new UAPI is not good for many reasons.
>>> But here I was meaning that current hyper-dmabuf design is
>>> only user-space oriented, e.g. it provides number of IOCTLs to do all
>>> the work. But I need a way to access the same from the kernel, so, for
>>> example,
>>> some other para-virtual driver can export/import dma-buf, not only
>>> user-space.
>>
>> If you need an import-export helper library, just merge it. Do not attach
>> any uapi to it, just the internal helpers.
>>
>> Much, much, much easier to land.
>
> This can be done, but again, I will need some entity which
> backend may use to convert xen-front's grant references into
> a PRIME buffer, hence there is UAPI for that. In other words,
> I'll need a thinner xen-zcopy which will implement the same UAPI
> and use that library for Xen related stuff.
>
> The confusion may also come from the fact that the backend is
> a user-space application, not a kernel module (we have 2 modes
> of its operation as of now: DRM master or Weston client), so
> it needs a way to talk to the kernel.

So this is entirely a means to implement the virtual xen device in
dom0 (or whichever guest implements it)?

I'm extremely confused about what you mean with "backend", since
xen-front also has backend code. But that backend code lives in the
same guest os image (afaict at least), since it does direct function
calls.

Please be more specific in what you mean instead of just "backend",
that's really confusing.

But essentially we're talking about the equivalent of what qemu does
for kvm, and that's entirely not my problem. Not really a gpu
subsystem problem I think. Just talk with the xen hypervisor people
about how exactly they want to go about converting grant tables to
dma-buf, so that your virtual hw backend in userspace can make use of
it. And then merge it somewhere in the xen directories. Since the
grant tables and everything is very xen specific, I don't think
there's much point in trying to have a fake generic uapi that pretends
to work on other hypervisors, as long as they're Xen :-)

And you probably have no need for all the caching/general book-keeping
drm_prime does (it's all in userspace I guess, except for the magic
conversion from grant references to a dma_buf). So there's no point
trying to reuse code in drm_prime.c.

Also, this should make it tons easier to reuse xen-zcopy for
sound/wireless/v4l backends.

>>>> That's why I had essentially zero big questions for xen-front (except
>>>> some
>>>> implementation improvements, and stuff to make sure xen-front actually
>>>> implements the real uapi semantics instead of its own), and why I'm
>>>> asking
>>>> much more questions on this stuff here.
>>>>
>>>>>>>> 4. inter VM messaging (hyper_dmabuf only) - hyper_dmabuf has inter-vm
>>>>>>>> msg
>>>>>>>> communication defined for dmabuf synchronization and private data
>>>>>>>> (meta
>>>>>>>> info that Matt Roper mentioned) exchange.
>>>>>>>
>>>>>>> This is true, xen-zcopy has no means for inter VM sync and meta-data,
>>>>>>> simply because it doesn't have any code for inter VM exchange in it,
>>>>>>> e.g. the inter VM protocol is handled by the backend [1].
>>>>>>>>
>>>>>>>> 5. driver-to-driver notification (hyper_dmabuf only) - importing VM
>>>>>>>> gets
>>>>>>>> notified when new dmabuf is exported from other VM - uevent can be
>>>>>>>> optionally
>>>>>>>> generated when this happens.
>>>>>>>>
>>>>>>>> 6. structure - hyper_dmabuf is targeting to provide a generic
>>>>>>>> solution for
>>>>>>>> inter-domain dmabuf sharing for most hypervisors, which is why it
>>>>>>>> has two
>>>>>>>> layers as mattrope mentioned, front-end that contains standard API
>>>>>>>> and backend
>>>>>>>> that is specific to hypervisor.
>>>>>>>
>>>>>>> Again, xen-zcopy is decoupled from inter VM communication
>>>>>>>>>>
>>>>>>>>>> No idea, didn't look at it in detail.
>>>>>>>>>>
>>>>>>>>>> Looks pretty complex from a distant view.  Maybe because it tries
>>>>>>>>>> to
>>>>>>>>>> build a communication framework using dma-bufs instead of a simple
>>>>>>>>>> dma-buf passing mechanism.
>>>>>>>>
>>>>>>>> we started with simple dma-buf sharing but realized there are many
>>>>>>>> things we need to consider in real use-case, so we added
>>>>>>>> communication,
>>>>>>>> notification and dma-buf synchronization then re-structured it to
>>>>>>>> front-end and back-end (this made things more complicated..) since
>>>>>>>> Xen
>>>>>>>> was not our only target. Also, we thought passing the reference for
>>>>>>>> the
>>>>>>>> buffer (hyper_dmabuf_id) is not secure so added uevent mechanism
>>>>>>>> later.
>>>>>>>>
>>>>>>>>> Yes, I am looking at it now, trying to figure out the full story
>>>>>>>>> and its implementation. BTW, Intel guys were about to share some
>>>>>>>>> test application for hyper-dmabuf, maybe I have missed one.
>>>>>>>>> It could probably better explain the use-cases and the complexity
>>>>>>>>> they have in hyper-dmabuf.
>>>>>>>>
>>>>>>>> One example is actually in github. If you want take a look at it,
>>>>>>>> please
>>>>>>>> visit:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
>>>>>>>
>>>>>>> Thank you, I'll have a look
>>>>>>>>>>
>>>>>>>>>> Like xen-zcopy it seems to depend on the idea that the hypervisor
>>>>>>>>>> manages all memory it is easy for guests to share pages with the
>>>>>>>>>> help of
>>>>>>>>>> the hypervisor.
>>>>>>>>>
>>>>>>>>> So, for xen-zcopy we were not trying to make it generic,
>>>>>>>>> it just solves display (dumb) zero-copying use-cases for Xen.
>>>>>>>>> We implemented it as a DRM helper driver because we can't see any
>>>>>>>>> other use-cases as of now.
>>>>>>>>> For example, we also have Xen para-virtualized sound driver, but
>>>>>>>>> its buffer memory usage is not comparable to what display wants
>>>>>>>>> and it works somewhat differently (e.g. there is no "frame done"
>>>>>>>>> event, so one can't tell when the sound buffer can be "flipped").
>>>>>>>>> At the same time, we do not use virtio-gpu, so this could probably
>>>>>>>>> be one more candidate for shared dma-bufs some day.
>>>>>>>>>>
>>>>>>>>>>      Which simply isn't the case on kvm.
>>>>>>>>>>
>>>>>>>>>> hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf
>>>>>>>>>> build
>>>>>>>>>> on top of xen-zcopy.
>>>>>>>>>
>>>>>>>>> Hm, I can imagine that: xen-zcopy could be a library code for
>>>>>>>>> hyper-dmabuf
>>>>>>>>> in terms of implementing all that page sharing fun in multiple
>>>>>>>>> directions,
>>>>>>>>> e.g. Host->Guest, Guest->Host, Guest<->Guest.
>>>>>>>>> But I'll let Matt and Dongwon to comment on that.
>>>>>>>>
>>>>>>>> I think we can definitely collaborate. Especially, maybe we are
>>>>>>>> using some
>>>>>>>> outdated sharing mechanism/grant-table mechanism in our Xen backend
>>>>>>>> (thanks
>>>>>>>> for bringing that up Oleksandr). However, the question is once we
>>>>>>>> collaborate
>>>>>>>> somehow, can xen-zcopy's usecase use the standard API that
>>>>>>>> hyper_dmabuf
>>>>>>>> provides? I don't think we need different IOCTLs that do the same in
>>>>>>>> the final
>>>>>>>> solution.
>>>>>>>>
>>>>>>> If you think of xen-zcopy as a library (which implements Xen
>>>>>>> grant references mangling) and DRM PRIME wrapper on top of that
>>>>>>> library, we can probably define proper API for that library,
>>>>>>> so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
>>>>>>> about to start upstreaming Xen para-virtualized sound device driver
>>>>>>> soon,
>>>>>>> which also uses similar code and gref passing mechanism [3].
>>>>>>> (Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
>>>>>>> snd/xen-front and then propose a Xen helper library for sharing big
>>>>>>> buffers,
>>>>>>> so common code of the above drivers can use the same code w/o code
>>>>>>> duplication)
>>>>>>
>>>>>> I think it is possible to use your functions for memory sharing part
>>>>>> in
>>>>>> hyper_dmabuf's backend (this 'backend' means the layer that does page
>>>>>> sharing
>>>>>> and inter-vm communication with xen-specific way.), so why don't we
>>>>>> work on
>>>>>> "Xen helper library for sharing big buffers" first while we continue
>>>>>> our
>>>>>> discussion on the common API layer that can cover any dmabuf sharing
>>>>>> cases.
>>>>>>
>>>>> Well, I would love we reuse the code that I have, but I also
>>>>> understand that it was limited by my use-cases. So, I do not
>>>>> insist we have to ;)
>>>>> If we start designing and discussing hyper-dmabuf protocol we of course
>>>>> can work on this helper library in parallel.
>>>>
>>>> Imo code reuse is overrated. Adding new uapi is what freaks me out here
>>>> :-)
>>>>
>>>> If we end up with duplicated implementations, even in upstream, meh, not
>>>> great, but also ok. New uapi, and in a similar way, new hypervisor api
>>>> like the dma-buf forwarding that hyperdmabuf does is the kind of thing
>>>> that will lock us in for 10+ years (if we make a mistake).
>>>>
>>>>>>> Thank you,
>>>>>>> Oleksandr
>>>>>>>
>>>>>>> P.S. All, is it a good idea to move this out of udmabuf thread into a
>>>>>>> dedicated one?
>>>>>>
>>>>>> Either way is fine with me.
>>>>>
>>>>> So, if you can start designing the protocol we may have a dedicated
>>>>> mail
>>>>> thread for that. I will try to help with the protocol as much as I can
>>>>
>>>> Please don't start with the protocol. Instead start with the concrete
>>>> use-cases, and then figure out why exactly you need new uapi. Once we
>>>> have
>>>> that answered, we can start thinking about fleshing out the details.
>>>
>>> On my side there are only 2 use-cases, Guest2 only:
>>> 1. Create a PRIME (dma-buf) from grant references
>>> 2. Create grant references from PRIME (dma-buf)
>>
>> So these grant references, are those userspace visible things?
>
> Yes, the user-space backend receives those from xen-front via [1]
>
>> I thought
>> the grant references was just the kernel/hypervisor internal magic to make
>> this all work?
>
> So, I can map the grant references from user-space, but I won't
> be able to turn those into a PRIME buffer. So, the only use of those
> w/o xen-zcopy is to map grant refs and copy into real HW dumb on every page
> flip.

Ok, that explains. I thought your current xen-side implementation for
xen-front is already making all that stuff happen. But I'm still not
sure given all the confusing talk about back-end we have in these
threads (hyperdmabuf people also talked about different backends for
different hypervisors, I guess that's a different kind of backend?).
-Daniel

Oleksandr Andrushchenko April 16, 2018, 10:14 a.m. UTC | #25

On 04/16/2018 12:32 PM, Daniel Vetter wrote:
> On Mon, Apr 16, 2018 at 10:22 AM, Oleksandr Andrushchenko
> <andr2000@gmail.com> wrote:
>> On 04/16/2018 10:43 AM, Daniel Vetter wrote:
>>> On Mon, Apr 16, 2018 at 10:16:31AM +0300, Oleksandr Andrushchenko wrote:
>>>> On 04/13/2018 06:37 PM, Daniel Vetter wrote:
>>>>> On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote:
>>>>>> On 04/10/2018 08:26 PM, Dongwon Kim wrote:
>>>>>>> On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko
>>>>>>> wrote:
>>>>>>>> On 04/06/2018 09:57 PM, Dongwon Kim wrote:
>>>>>>>>> On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko
>>>>>>>>> wrote:
>>>>>>>>>> On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
>>>>>>>>>>>       Hi,
>>>>>>>>>>>
>>>>>>>>>>>>> I fail to see any common ground for xen-zcopy and udmabuf ...
>>>>>>>>>>>> Does the above mean you can assume that xen-zcopy and udmabuf
>>>>>>>>>>>> can co-exist as two different solutions?
>>>>>>>>>>> Well, udmabuf route isn't fully clear yet, but yes.
>>>>>>>>>>>
>>>>>>>>>>> See also gvt (intel vgpu), where the hypervisor interface is
>>>>>>>>>>> abstracted
>>>>>>>>>>> away into a separate kernel modules even though most of the actual
>>>>>>>>>>> vgpu
>>>>>>>>>>> emulation code is common.
>>>>>>>>>> Thank you for your input, I'm just trying to figure out
>>>>>>>>>> which of the three z-copy solutions intersect and how much
>>>>>>>>>>>> And what about hyper-dmabuf?
>>>>>>>>> xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
>>>>>>>>> in terms of these core sharing feature:
>>>>>>>>>
>>>>>>>>> 1. the sharing process - import prime/dmabuf from the producer ->
>>>>>>>>> extract
>>>>>>>>> underlying pages and get those shared -> return references for
>>>>>>>>> shared pages
>>>>>>> Another thing is danvet was kind of against the idea of importing
>>>>>>> existing
>>>>>>> dmabuf/prime buffer and forward it to the other domain due to
>>>>>>> synchronization
>>>>>>> issues. He proposed to make hyper_dmabuf only work as an exporter so
>>>>>>> that it
>>>>>>> can have a full control over the buffer. I think we need to talk about
>>>>>>> this
>>>>>>> further as well.
>>>>>> Yes, I saw this. But this limits the use-cases so much.
>>>>>> For instance, running Android as a Guest (which uses ION to allocate
>>>>>> buffers) means that finally HW composer will import dma-buf into
>>>>>> the DRM driver. Then, in case of xen-front for example, it needs to be
>>>>>> shared with the backend (Host side). Of course, we can change
>>>>>> user-space
>>>>>> to make xen-front allocate the buffers (make it exporter), but what we
>>>>>> try
>>>>>> to avoid is to change user-space which in normal world would have
>>>>>> remained
>>>>>> unchanged otherwise.
>>>>>> So, I do think we have to support this use-case and just have to
>>>>>> understand
>>>>>> the complexity.
>>>>> Erm, why do you need importer capability for this use-case?
>>>>>
>>>>> guest1 -> ION -> xen-front -> hypervisor -> guest 2 -> xen-zcopy exposes
>>>>> that dma-buf -> import to the real display hw
>>>>>
>>>>> Nowhere in this chain do you need xen-zcopy to be able to import a
>>>>> dma-buf (within linux, it needs to import a bunch of pages from the
>>>>> hypervisor).
>>>>>
>>>>> Now if your plan is to use xen-zcopy in the guest1 instead of xen-front,
>>>>> then you indeed need to import.
>>>> This is the exact use-case I was referring to while saying
>>>> we need to import on Guest1 side. If hyper-dmabuf is so
>>>> generic that there is no xen-front in the picture, then
>>>> it needs to import a dma-buf, so it can be exported at Guest2 side.
>>>>>     But that imo doesn't make sense:
>>>>> - xen-front gives you clearly defined flip events you can forward to the
>>>>>      hypervisor. xen-zcopy would need to add that again.
>>>> xen-zcopy is a helper driver which doesn't handle page flips
>>>> and is not a KMS driver as one might think of: the DRM UAPI it uses is
>>>> just to export a dma-buf as a PRIME buffer, but that's it.
>>>> Flipping etc. is done by the backend [1], not xen-zcopy.
>>>>>     Same for
>>>>>      hyperdmabuf (and really we're not going to shuffle struct dma_fence
>>>>> over
>>>>>      the wire in a generic fashion between hypervisor guests).
>>>>>
>>>>> - xen-front already has the idea of pixel format for the buffer (and any
>>>>>      other metadata). Again, xen-zcopy and hyperdmabuf lack that, would
>>>>> need
>>>>>      to add it shoehorned in somehow.
>>>> Again, here you are talking of something which is implemented in
>>>> Xen display backend, not xen-zcopy, e.g. display backend can
>>>> implement para-virtual display w/o xen-zcopy at all, but in this case
>>>> there is a memory copying for each frame. With the help of xen-zcopy
>>>> the backend feeds xen-front's buffers directly into Guest2 DRM/KMS or
>>>> Weston or whatever as xen-zcopy exports remote buffers as PRIME buffers,
>>>> thus no buffer copying is required.
>>> Why do you need to copy on every frame for xen-front? In the above
>>> pipeline, using xen-front I see 0 architectural reasons to have a copy
>>> anywhere.
>>>
>>> This seems to be the core of the confusion we're having here.
>> Ok, so I'll try to explain:
>> 1. xen-front - produces a display buffer to be shown at Guest2
>> by the backend, shares its grant references with the backend
>> 2. xen-front sends page flip event to the backend specifying the
>> buffer in question
>> 3. Backend takes the shared buffer (which is only a buffer mapped into
>> backend's memory, it is not a dma-buf/PRIME one) and makes memcpy from
>> it to a local dumb/surface
> Why do you even do that? The copying here I mean - why don't you just
> directly scan out from the grant references you received through the
> hypervisor?
Probably the confusion comes from the fact that KVM and Xen
implement things differently (for example, on ARM we don't use QEMU at all).
Please see [1] and [2] for Xen frontend/backend placement in the picture.

WRT [2] xen-front is a PV front-end driver running in guest OS
and Xen display backend is a user-space application running in Dom0
(in the picture [2] backend runs as a Dom0 kernel driver).
So, the para-virtualized device is not implemented in the hypervisor
itself, but as user/kernel-space pair in corresponding domains.
Thus, when xen-front shares grant references of the pages of the buffer
with the Xen display backend (user-space) the latter can only map those
references into Dom0 memory to memcpy into some local display buffer/dumb.
Hence, hypervisor is not in the equation while actually implementing
para-virtual display device, e.g. it provides you with API to share/map
pages, but it won't be the entity which will implement actual page flips etc.
So, this is where xen-zcopy comes into play (runs in Dom0):
it not only maps xen-front's grant references into Dom0, but also creates
a PRIME buffer, so this buffer can be used by other DRM devices/Weston
running in Dom0.
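
A minimal sketch of how a backend could collect those grant references,
assuming the gref directory layout described earlier in this thread (one
grant ref passed via displif, pointing at a page that holds further grefs
plus a link to the next directory page). Pages are simulated in plain memory
here; gref_dir_page, fake_grant_table, fake_map_page and collect_grefs are
hypothetical names for illustration, not the displif or xen-zcopy API, and
using 0 as the end-of-chain marker is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t gref_t;                 /* a grant ref is 4 bytes */

#define GREF_INVALID       0u            /* assumption: 0 ends the chain */
#define DIR_PAGE_SIZE      4096u         /* assumption: 4 KiB pages */
#define GREFS_PER_DIR_PAGE ((DIR_PAGE_SIZE - sizeof(gref_t)) / sizeof(gref_t))

/* One directory page: a link to the next page followed by buffer grefs. */
struct gref_dir_page {
	gref_t next_page;
	gref_t gref[GREFS_PER_DIR_PAGE];
};

/* Stand-in for the hypervisor: gref N resolves to pages[N - 1]. */
struct fake_grant_table {
	const struct gref_dir_page *pages;
	size_t count;
};

/* Simulates mapping a granted page; returns NULL for an unknown ref. */
static const struct gref_dir_page *
fake_map_page(const struct fake_grant_table *t, gref_t ref)
{
	if (ref == GREF_INVALID || ref > t->count)
		return NULL;
	return &t->pages[ref - 1];
}

/*
 * Walk the directory starting at dir_start and copy up to num_pages
 * buffer grefs into out. Returns the number of grefs actually collected,
 * which is smaller than num_pages if the chain ends early.
 */
size_t collect_grefs(const struct fake_grant_table *t, gref_t dir_start,
		     gref_t *out, size_t num_pages)
{
	size_t done = 0;
	gref_t cur = dir_start;

	assert(sizeof(struct gref_dir_page) == DIR_PAGE_SIZE);

	while (done < num_pages) {
		const struct gref_dir_page *page = fake_map_page(t, cur);

		if (!page)
			break;
		for (size_t i = 0; i < GREFS_PER_DIR_PAGE && done < num_pages; i++)
			out[done++] = page->gref[i];
		cur = page->next_page;
	}
	return done;
}
```

The resulting flat array of grefs is what either gets mapped and memcpy'ed
on every flip, or handed once to a helper such as xen-zcopy to be wrapped
into a PRIME buffer.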
> Also I'm not clear in your example which step happens where (guest 1/2
> or hypervisor)?
Steps 1,2 - Guest2, kernel space
Steps 3-4 - Guest1, Dom0 user-space
The hypervisor here only provides transport and means to access buffers,
actual display/DRM related code is in xen-front and Dom0's display backend
>> 4. Backend flips that local dumb buffer/surface
>>
>> If I have a xen-zcopy helper driver then I can avoid doing step 3):
>> 1) 2) remain the same as above
>> 3) Initially for a new display buffer, backend calls xen-zcopy to create
>> a local PRIME buffer from the grant references provided by the xen-front
>> via displif protocol [1]: we now have handle_zcopy
>> 4) Backend exports this PRIME with HANDLE_TO_FD from xen-zcopy and imports
>> it into Weston-KMS/DRM or real HW DRM driver with FD_TO_HANDLE: we now have
>> handle_local
>> 5) On page flip event backend flips local PRIME: uses handle_local for flips
>>
>>>>> Ofc you won't be able to shovel sound or media stream data over to
>>>>> another
>>>>> guest like this, but that's what you have xen-v4l and xen-sound or
>>>>> whatever else for. Trying to make a new uapi, which means userspace must
>>>>> be changed for all the different use-case, instead of reusing standard
>>>>> linux driver uapi (which just happens to send the data to another
>>>>> hypervisor guest instead of real hw) imo just doesn't make much sense.
>>>>>
>>>>> Also, at least for the gpu subsystem: Any new uapi must have full
>>>>> userspace available for it, see:
>>>>>
>>>>>
>>>>> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
>>>>>
>>>>> Adding more uapi is definitely the most painful way to fix a use-case.
>>>>> Personally I'd go as far and also change the xen-zcopy side on the
>>>>> receiving guest to use some standard linux uapi. E.g. you could write an
>>>>> output v4l driver to receive the frames from guest1.
>>>> So, we now know that xen-zcopy was not meant to handle page flips,
>>>> but to implement new UAPI to let user-space create buffers either
>>>> from Guest2 grant references (so it can be exported to Guest1) or
>>>> other way round, e.g. create (from Guest1 grant references to export to
>>>> Guest 2). For that reason it adds 2 IOCTLs: create buffer from grefs
>>>> or produce grefs for the buffer given.
>>>> One additional IOCTL is to wait for the buffer to be released by
>>>> Guest2 user-space.
>>>> That being said, I don't quite see how v4l can be used here to implement
>>>> UAPI I need.
>>> Under the assumption that you can make xen-front to zerocopy for the
>>> kernel->hypervisor path, v4l could be made to work for the
>>> hypervisor->kernel side of the pipeline.
>>>
>>> But it sounds like we have a confusion already on why or why not xen-front
>>> can or cannot do zerocopy.
>> xen-front provides an array of grant references to Guest2 (backend).
>> It's up to backend what it does with those grant references
>> which at Guest2 side are not PRIME or dma-buf, but just a set of pages.
>> This is xen-zcopy which turns these pages into a PRIME. When this is done
>> backend can now tell DRM drivers to use the buffer in DRM terms.
>>
>>>>>>> danvet, can you comment on this topic?
>>>>>>>
>>>>>>>>> 2. the page sharing mechanism - it uses Xen-grant-table.
>>>>>>>>>
>>>>>>>>> And to give you a quick summary of differences as far as I
>>>>>>>>> understand
>>>>>>>>> between two implementations (please correct me if I am wrong,
>>>>>>>>> Oleksandr.)
>>>>>>>>>
>>>>>>>>> 1. xen-zcopy is DRM specific - can import only DRM prime buffer
>>>>>>>>> while hyper_dmabuf can export any dmabuf regardless of originator
>>>>>>>> Well, this is true. And at the same time this is just a matter
>>>>>>>> of extending the API: xen-zcopy is a helper driver designed for
>>>>>>>> xen-front/back use-case, so this is why it only has DRM PRIME API
>>>>>>>>> 2. xen-zcopy doesn't seem to have dma-buf synchronization between
>>>>>>>>> two VMs
>>>>>>>>> while (as danvet called it as remote dmabuf api sharing)
>>>>>>>>> hyper_dmabuf sends
>>>>>>>>> out synchronization message to the exporting VM for synchronization.
>>>>>>>> This is true. Again, this is because of the use-cases it covers.
>>>>>>>> But having synchronization for a generic solution seems to be a good
>>>>>>>> idea.
>>>>>>> Yeah, understood xen-zcopy works ok with your use case. But I am just
>>>>>>> curious
>>>>>>> if it is ok not to have any inter-domain synchronization in this
>>>>>>> sharing model.
>>>>>> The synchronization is done with displif protocol [1]
>>>>>>> The buffer being shared is technically dma-buf and originator needs to
>>>>>>> be able
>>>>>>> to keep track of it.
>>>>>> As I am working in DRM terms the tracking is done by the DRM core
>>>>>> for me for free. (This might be one of the reasons Daniel sees DRM
>>>>>> based implementation fit very good from code-reuse POV).
>>>>> Hm, not sure what tracking you refer to here all ... I got lost in all
>>>>> the
>>>>> replies while catching up.
>>>>>
>>>> I was just referring to accounting stuff already implemented in the DRM
>>>> core,
>>>> so I don't have to worry about doing the same for buffers to understand
>>>> when they are released etc.
>>>>>>>>> 3. 1-level references - when using grant-table for sharing pages,
>>>>>>>>> there will
>>>>>>>>> be same # of refs (each 8 byte)
>>>>>>>> To be precise, grant ref is 4 bytes
>>>>>>> You are right. Thanks for correction.;)
>>>>>>>
>>>>>>>>> as # of shared pages, which is passed to
>>>>>>>>> the userspace to be shared with importing VM in case of xen-zcopy.
>>>>>>>> The reason for that is that xen-zcopy is a helper driver, e.g.
>>>>>>>> the grant references come from the display backend [1], which
>>>>>>>> implements
>>>>>>>> Xen display protocol [2]. So, effectively the backend extracts
>>>>>>>> references
>>>>>>>> from frontend's requests and passes those to xen-zcopy as an array
>>>>>>>> of refs.
>>>>>>>>>      Compared
>>>>>>>>> to this, hyper_dmabuf does multiple level addressing to generate
>>>>>>>>> only one
>>>>>>>>> reference id that represents all shared pages.
>>>>>>>> In the protocol [2] only one reference to the gref directory is
>>>>>>>> passed
>>>>>>>> between VMs
>>>>>>>> (and the gref directory is a single-linked list of shared pages
>>>>>>>> containing
>>>>>>>> all
>>>>>>>> of the grefs of the buffer).
>>>>>>> ok, good to know. I will look into its implementation in more details
>>>>>>> but is
>>>>>>> this gref directory (chained grefs) something that can be used for any
>>>>>>> general
>>>>>>> memory sharing use case or is it just for xen-display (in current code
>>>>>>> base)?
>>>>>> Not to mislead you: one grant ref is passed via displif protocol,
>>>>>> but the page it's referencing contains the rest of the grant refs.
>>>>>>
>>>>>> As to if this can be used for any memory: yes. It is the same for
>>>>>> sndif and displif Xen protocols, but defined twice as strictly speaking
>>>>>> sndif and displif are two separate protocols.
>>>>>>
>>>>>> While reviewing your RFC v2 one of the comments I had [2] was that if
>>>>>> we
>>>>>> can start from defining such a generic protocol for hyper-dmabuf.
>>>>>> It can be a header file, which not only has the description part
>>>>>> (which then become a part of Documentation/...rst file), but also
>>>>>> defines
>>>>>> all the required constants for requests, responses, defines message
>>>>>> formats,
>>>>>> state diagrams etc. all at one place. Of course this protocol must not
>>>>>> be
>>>>>> Xen specific, but be OS/hypervisor agnostic.
>>>>>> Having that will trigger a new round of discussion, so we have it all
>>>>>> designed
>>>>>> and discussed before we start implementing.
>>>>>>
>>>>>> Besides the protocol we have to design UAPI part as well and make sure
>>>>>> the hyper-dmabuf is not only accessible from user-space, but there will
>>>>>> be
>>>>>> number
>>>>>> of kernel-space users as well.
>>>>> Again, why do you want to create new uapi for this? Given the very
>>>>> strict
>>>>> requirements we have for new uapi (see above link), it's the toughest
>>>>> way
>>>>> to get any kind of support in.
>>>> I do understand that adding new UAPI is not good for many reasons.
>>>> But here I was meaning that current hyper-dmabuf design is
>>>> only user-space oriented, e.g. it provides number of IOCTLs to do all
>>>> the work. But I need a way to access the same from the kernel, so, for
>>>> example,
>>>> some other para-virtual driver can export/import dma-buf, not only
>>>> user-space.
>>> If you need an import-export helper library, just merge it. Do not attach
>>> any uapi to it, just the internal helpers.
>>>
>>> Much, much, much easier to land.
>> This can be done, but again, I will need some entity which
>> backend may use to convert xen-front's grant references into
>> a PRIME buffer, hence there is UAPI for that. In other words,
>> I'll need a thinner xen-zcopy which will implement the same UAPI
>> and use that library for Xen related stuff.
>>
>> The confusion may also come from the fact that the backend is
>> a user-space application, not a kernel module (we have 2 modes
>> of its operation as of now: DRM master or Weston client), so
>> it needs a way to talk to the kernel.
> So this is entirely a means to implement the virtual xen device in
> dom0 (or whichever guest implements it)?
>
> I'm extremely confused about what you mean with "backend", since
> xen-front also has backend code. But that backend code lives in the
> same guest os image (afaict at least), since it does direct function
> calls.
xen-front has no backend code, but only has code which allows it
to create a dumb buffer from the grant references provided by the
backend.
> Please be more specific in what you mean instead of just "backend",
> that's really confusing.
Hope [2] better explains this
>
> But essentially we're talking about the equivalent of what qemu does
> for kvm, and that's entirely not my problem. Not really a gpu
> subsystem problem I think. Just talk with the xen hypervisor people
> about how exactly they want to go about converting grant tables to
> dma-buf, so that your virtual hw backend in userspace can make use of
> it.
The problem here is that the display backend then will need
to talk to DRM. And what is the UAPI for that? Right, PRIME
buffers.
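
For illustration, the PRIME hand-off the backend needs (HANDLE_TO_FD on the
exporting device, FD_TO_HANDLE on the real display device, as in step 4)
quoted above) could look roughly like the sketch below. It uses only the
generic DRM PRIME ioctls from the kernel uapi header drm/drm.h;
prime_share_handle and its parameter names are hypothetical, and the
xen-zcopy specific step that creates handle_zcopy from grant references is
deliberately left out:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <drm/drm.h>	/* struct drm_prime_handle, DRM_IOCTL_PRIME_* */

/*
 * Export handle_zcopy from the device behind zcopy_fd as a dma-buf fd and
 * import that fd into the device behind display_fd.
 * Returns 0 and stores the importer-side handle in *handle_local on
 * success, -1 on any ioctl failure (errno is left as set by ioctl).
 */
int prime_share_handle(int zcopy_fd, uint32_t handle_zcopy,
		       int display_fd, uint32_t *handle_local)
{
	struct drm_prime_handle args;
	int dmabuf_fd;
	int ret;

	assert(handle_local != NULL);

	/* HANDLE_TO_FD: turn the exporter's handle into a dma-buf fd. */
	memset(&args, 0, sizeof(args));
	args.handle = handle_zcopy;
	if (ioctl(zcopy_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args) < 0)
		return -1;
	dmabuf_fd = args.fd;

	/* FD_TO_HANDLE: import the dma-buf into the display device. */
	memset(&args, 0, sizeof(args));
	args.fd = dmabuf_fd;
	ret = ioctl(display_fd, DRM_IOCTL_PRIME_FD_TO_HANDLE, &args);

	/* The imported handle keeps its own reference to the buffer. */
	close(dmabuf_fd);
	if (ret < 0)
		return -1;

	*handle_local = args.handle;
	return 0;
}
```

The returned handle_local is then what the backend would use for flips on
the display device; error handling is reduced to a bare -1 here.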
> And then merge it somewhere in the xen directories. Since the
> grant tables and everything is very xen specific, I don't think
> there's much point in trying to have a fake generic uapi that pretends
> to work on other hypervisors, as long as they're Xen :-)
>
> And you probably have no need for all the caching/general book-keeping
> drm_prime does (it's all in userspace I guess, except for the magic
> conversion from grant references to a dma_buf). So there's no point
> trying to reuse code in drm_prime.c.
>
> Also, this should make it tons easier to reuse xen-zcopy for
> sound/wireless/v4l backends.
>
>>>>> That's why I had essentially zero big questions for xen-front (except
>>>>> some
>>>>> implementation improvements, and stuff to make sure xen-front actually
>>>>> implements the real uapi semantics instead of its own), and why I'm
>>>>> asking
>>>>> much more questions on this stuff here.
>>>>>
>>>>>>>>> 4. inter VM messaging (hyper_dmabuf only) - hyper_dmabuf has inter-vm
>>>>>>>>> msg
>>>>>>>>> communication defined for dmabuf synchronization and private data
>>>>>>>>> (meta
>>>>>>>>> info that Matt Roper mentioned) exchange.
>>>>>>>> This is true, xen-zcopy has no means for inter VM sync and meta-data,
>>>>>>>> simply because it doesn't have any code for inter VM exchange in it,
>>>>>>>> e.g. the inter VM protocol is handled by the backend [1].
>>>>>>>>> 5. driver-to-driver notification (hyper_dmabuf only) - importing VM
>>>>>>>>> gets
>>>>>>>>> notified when new dmabuf is exported from other VM - uevent can be
>>>>>>>>> optionally
>>>>>>>>> generated when this happens.
>>>>>>>>>
>>>>>>>>> 6. structure - hyper_dmabuf is targeting to provide a generic
>>>>>>>>> solution for
>>>>>>>>> inter-domain dmabuf sharing for most hypervisors, which is why it
>>>>>>>>> has two
>>>>>>>>> layers as mattrope mentioned, front-end that contains standard API
>>>>>>>>> and backend
>>>>>>>>> that is specific to hypervisor.
>>>>>>>> Again, xen-zcopy is decoupled from inter VM communication
>>>>>>>>>>> No idea, didn't look at it in detail.
>>>>>>>>>>>
>>>>>>>>>>> Looks pretty complex from a distant view.  Maybe because it tries
>>>>>>>>>>> to
>>>>>>>>>>> build a communication framework using dma-bufs instead of a simple
>>>>>>>>>>> dma-buf passing mechanism.
>>>>>>>>> we started with simple dma-buf sharing but realized there are many
>>>>>>>>> things we need to consider in real use-case, so we added
>>>>>>>>> communication,
>>>>>>>>> notification and dma-buf synchronization then re-structured it to
>>>>>>>>> front-end and back-end (this made things more complicated..) since
>>>>>>>>> Xen
>>>>>>>>> was not our only target. Also, we thought passing the reference for
>>>>>>>>> the
>>>>>>>>> buffer (hyper_dmabuf_id) is not secure so added uevent mechanism
>>>>>>>>> later.
>>>>>>>>>
>>>>>>>>>> Yes, I am looking at it now, trying to figure out the full story
>>>>>>>>>> and its implementation. BTW, Intel guys were about to share some
>>>>>>>>>> test application for hyper-dmabuf, maybe I have missed one.
>>>>>>>>>> It could probably better explain the use-cases and the complexity
>>>>>>>>>> they have in hyper-dmabuf.
>>>>>>>>> One example is actually in github. If you want take a look at it,
>>>>>>>>> please
>>>>>>>>> visit:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
>>>>>>>> Thank you, I'll have a look
>>>>>>>>>>> Like xen-zcopy it seems to depend on the idea that the hypervisor
>>>>>>>>>>> manages all memory it is easy for guests to share pages with the
>>>>>>>>>>> help of
>>>>>>>>>>> the hypervisor.
>>>>>>>>>> So, for xen-zcopy we were not trying to make it generic,
>>>>>>>>>> it just solves display (dumb) zero-copying use-cases for Xen.
>>>>>>>>>> We implemented it as a DRM helper driver because we can't see any
>>>>>>>>>> other use-cases as of now.
>>>>>>>>>> For example, we also have Xen para-virtualized sound driver, but
>>>>>>>>>> its buffer memory usage is not comparable to what display wants
>>>>>>>>>> and it works somewhat differently (e.g. there is no "frame done"
>>>>>>>>>> event, so one can't tell when the sound buffer can be "flipped").
>>>>>>>>>> At the same time, we do not use virtio-gpu, so this could probably
>>>>>>>>>> be one more candidate for shared dma-bufs some day.
>>>>>>>>>>>       Which simply isn't the case on kvm.
>>>>>>>>>>>
>>>>>>>>>>> hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf
>>>>>>>>>>> build
>>>>>>>>>>> on top of xen-zcopy.
>>>>>>>>>> Hm, I can imagine that: xen-zcopy could be a library code for
>>>>>>>>>> hyper-dmabuf
>>>>>>>>>> in terms of implementing all that page sharing fun in multiple
>>>>>>>>>> directions,
>>>>>>>>>> e.g. Host->Guest, Guest->Host, Guest<->Guest.
>>>>>>>>>> But I'll let Matt and Dongwon to comment on that.
>>>>>>>>> I think we can definitely collaborate. Especially, maybe we are
>>>>>>>>> using some
>>>>>>>>> outdated sharing mechanism/grant-table mechanism in our Xen backend
>>>>>>>>> (thanks
>>>>>>>>> for bringing that up Oleksandr). However, the question is once we
>>>>>>>>> collaborate
>>>>>>>>> somehow, can xen-zcopy's usecase use the standard API that
>>>>>>>>> hyper_dmabuf
>>>>>>>>> provides? I don't think we need different IOCTLs that do the same in
>>>>>>>>> the final
>>>>>>>>> solution.
>>>>>>>>>
>>>>>>>> If you think of xen-zcopy as a library (which implements Xen
>>>>>>>> grant references mangling) and DRM PRIME wrapper on top of that
>>>>>>>> library, we can probably define proper API for that library,
>>>>>>>> so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
>>>>>>>> about to start upstreaming Xen para-virtualized sound device driver
>>>>>>>> soon,
>>>>>>>> which also uses similar code and gref passing mechanism [3].
>>>>>>>> (Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
>>>>>>>> snd/xen-front and then propose a Xen helper library for sharing big
>>>>>>>> buffers,
>>>>>>>> so common code of the above drivers can use the same code w/o code
>>>>>>>> duplication)
>>>>>>> I think it is possible to use your functions for memory sharing part
>>>>>>> in
>>>>>>> hyper_dmabuf's backend (this 'backend' means the layer that does page
>>>>>>> sharing
>>>>>>> and inter-vm communication with xen-specific way.), so why don't we
>>>>>>> work on
>>>>>>> "Xen helper library for sharing big buffers" first while we continue
>>>>>>> our
>>>>>>> discussion on the common API layer that can cover any dmabuf sharing
>>>>>>> cases.
>>>>>>>
>>>>>> Well, I would love we reuse the code that I have, but I also
>>>>>> understand that it was limited by my use-cases. So, I do not
>>>>>> insist we have to ;)
>>>>>> If we start designing and discussing hyper-dmabuf protocol we of course
>>>>>> can work on this helper library in parallel.
>>>>> Imo code reuse is overrated. Adding new uapi is what freaks me out here
>>>>> :-)
>>>>>
>>>>> If we end up with duplicated implementations, even in upstream, meh, not
>>>>> great, but also ok. New uapi, and in a similar way, new hypervisor api
>>>>> like the dma-buf forwarding that hyperdmabuf does is the kind of thing
>>>>> that will lock us in for 10+ years (if we make a mistake).
>>>>>
>>>>>>>> Thank you,
>>>>>>>> Oleksandr
>>>>>>>>
>>>>>>>> P.S. All, is it a good idea to move this out of udmabuf thread into a
>>>>>>>> dedicated one?
>>>>>>> Either way is fine with me.
>>>>>> So, if you can start designing the protocol we may have a dedicated
>>>>>> mail
>>>>>> thread for that. I will try to help with the protocol as much as I can
>>>>> Please don't start with the protocol. Instead start with the concrete
>>>>> use-cases, and then figure out why exactly you need new uapi. Once we
>>>>> have
>>>>> that answered, we can start thinking about fleshing out the details.
>>>> On my side there are only 2 use-cases, Guest2 only:
>>>> 1. Create a PRIME (dma-buf) from grant references
>>>> 2. Create grant references from PRIME (dma-buf)
>>> So these grant references, are those userspace visible things?
>> Yes, the user-space backend receives those from xen-front via [1]
>>
>>> I thought
>>> the grant references was just the kernel/hypervisor internal magic to make
>>> this all work?
>> So, I can map the grant references from user-space, but I won't
>> be able to turn those into a PRIME buffer. So, the only use of those
>> w/o xen-zcopy is to map grant refs and copy into real HW dumb on every page
>> flip.
> Ok, that explains. I thought your current xen-side implementation for
> xen-front is already making all that stuff happen. But I'm still not
> sure given all the confusing talk about back-end we have in these
> threads (hyperdmabuf people also talked about different backends for
> different hypervisors, I guess that's a different kind of backend?).
Hope the explanation above makes it all clearer.
Please let me know if you still want me to elaborate more
> -Daniel
[1] https://wiki.xen.org/wiki/Paravirtualization_(PV)
[2] https://wiki.xen.org/wiki/File:XenPV.png

Daniel Vetter April 16, 2018, 12:08 p.m. UTC | #26

Ok, confusion around backend is I think cleared up. The other
confusion seems to be around dma-buf:

dma-buf is the cross subsystem zerocopy abstraction. PRIME is the
drm-specific support for it, 100% based on top of the generic struct
dma_buf.

You need a dma_buf exporter to convert a xen grant references list
into a dma_buf, which you can then import in your drm driver (using
prime), v4l, or anything else that supports dma-buf. You do _not_ need
a prime implementation, that's only the marketing name we've given to
dma-buf import/export for drm drivers.
-Daniel


On Mon, Apr 16, 2018 at 12:14 PM, Oleksandr Andrushchenko
<andr2000@gmail.com> wrote:
> On 04/16/2018 12:32 PM, Daniel Vetter wrote:
>>
>> On Mon, Apr 16, 2018 at 10:22 AM, Oleksandr Andrushchenko
>> <andr2000@gmail.com> wrote:
>>>
>>> On 04/16/2018 10:43 AM, Daniel Vetter wrote:
>>>>
>>>> On Mon, Apr 16, 2018 at 10:16:31AM +0300, Oleksandr Andrushchenko wrote:
>>>>>
>>>>> On 04/13/2018 06:37 PM, Daniel Vetter wrote:
>>>>>>
>>>>>> On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko
>>>>>> wrote:
>>>>>>>
>>>>>>> On 04/10/2018 08:26 PM, Dongwon Kim wrote:
>>>>>>>>
>>>>>>>> On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On 04/06/2018 09:57 PM, Dongwon Kim wrote:
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>       Hi,
>>>>>>>>>>>>
>>>>>>>>>>>>>> I fail to see any common ground for xen-zcopy and udmabuf ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does the above mean you can assume that xen-zcopy and udmabuf
>>>>>>>>>>>>> can co-exist as two different solutions?
>>>>>>>>>>>>
>>>>>>>>>>>> Well, udmabuf route isn't fully clear yet, but yes.
>>>>>>>>>>>>
>>>>>>>>>>>> See also gvt (intel vgpu), where the hypervisor interface is
>>>>>>>>>>>> abstracted
>>>>>>>>>>>> away into separate kernel modules even though most of the
>>>>>>>>>>>> actual
>>>>>>>>>>>> vgpu
>>>>>>>>>>>> emulation code is common.
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your input, I'm just trying to figure out
>>>>>>>>>>> which of the three z-copy solutions intersect and how much
>>>>>>>>>>>>>
>>>>>>>>>>>>> And what about hyper-dmabuf?
>>>>>>>>>>
>>>>>>>>>> xen z-copy solution is pretty similar fundamentally to
>>>>>>>>>> hyper_dmabuf
>>>>>>>>>> in terms of these core sharing features:
>>>>>>>>>>
>>>>>>>>>> 1. the sharing process - import prime/dmabuf from the producer ->
>>>>>>>>>> extract
>>>>>>>>>> underlying pages and get those shared -> return references for
>>>>>>>>>> shared pages
>>>>>>>>
>>>>>>>> Another thing is danvet was kind of against the idea of importing
>>>>>>>> existing
>>>>>>>> dmabuf/prime buffer and forward it to the other domain due to
>>>>>>>> synchronization
>>>>>>>> issues. He proposed to make hyper_dmabuf only work as an exporter so
>>>>>>>> that it
>>>>>>>> can have a full control over the buffer. I think we need to talk
>>>>>>>> about
>>>>>>>> this
>>>>>>>> further as well.
>>>>>>>
>>>>>>> Yes, I saw this. But this limits the use-cases so much.
>>>>>>> For instance, running Android as a Guest (which uses ION to allocate
>>>>>>> buffers) means that finally HW composer will import dma-buf into
>>>>>>> the DRM driver. Then, in case of xen-front for example, it needs to
>>>>>>> be
>>>>>>> shared with the backend (Host side). Of course, we can change
>>>>>>> user-space
>>>>>>> to make xen-front allocate the buffers (make it exporter), but what
>>>>>>> we
>>>>>>> try
>>>>>>> to avoid is to change user-space which in normal world would have
>>>>>>> remained
>>>>>>> unchanged otherwise.
>>>>>>> So, I do think we have to support this use-case and just have to
>>>>>>> understand
>>>>>>> the complexity.
>>>>>>
>>>>>> Erm, why do you need importer capability for this use-case?
>>>>>>
>>>>>> guest1 -> ION -> xen-front -> hypervisor -> guest 2 -> xen-zcopy
>>>>>> exposes
>>>>>> that dma-buf -> import to the real display hw
>>>>>>
>>>>>> Nowhere in this chain do you need xen-zcopy to be able to import a
>>>>>> dma-buf (within linux, it needs to import a bunch of pages from the
>>>>>> hypervisor).
>>>>>>
>>>>>> Now if your plan is to use xen-zcopy in the guest1 instead of
>>>>>> xen-front,
>>>>>> then you indeed need to import.
>>>>>
>>>>> This is the exact use-case I was referring to while saying
>>>>> we need to import on Guest1 side. If hyper-dmabuf is so
>>>>> generic that there is no xen-front in the picture, then
>>>>> it needs to import a dma-buf, so it can be exported at Guest2 side.
>>>>>>
>>>>>>     But that imo doesn't make sense:
>>>>>> - xen-front gives you clearly defined flip events you can forward to
>>>>>> the
>>>>>>      hypervisor. xen-zcopy would need to add that again.
>>>>>
>>>>> xen-zcopy is a helper driver which doesn't handle page flips
>>>>> and is not a KMS driver as one might think of: the DRM UAPI it uses is
>>>>> just to export a dma-buf as a PRIME buffer, but that's it.
>>>>> Flipping etc. is done by the backend [1], not xen-zcopy.
>>>>>>
>>>>>>     Same for
>>>>>>      hyperdmabuf (and really we're not going to shuffle struct
>>>>>> dma_fence
>>>>>> over
>>>>>>      the wire in a generic fashion between hypervisor guests).
>>>>>>
>>>>>> - xen-front already has the idea of pixel format for the buffer (and
>>>>>> any
>>>>>>      other metadata). Again, xen-zcopy and hyperdmabuf lack that,
>>>>>> would
>>>>>> need
>>>>>>      to add it shoehorned in somehow.
>>>>>
>>>>> Again, here you are talking of something which is implemented in
>>>>> Xen display backend, not xen-zcopy, e.g. display backend can
>>>>> implement para-virtual display w/o xen-zcopy at all, but in this case
>>>>> there is a memory copying for each frame. With the help of xen-zcopy
>>>>> the backend feeds xen-front's buffers directly into Guest2 DRM/KMS or
>>>>> Weston or whatever as xen-zcopy exports remote buffers as PRIME
>>>>> buffers,
>>>>> thus no buffer copying is required.
>>>>
>>>> Why do you need to copy on every frame for xen-front? In the above
>>>> pipeline, using xen-front I see 0 architectural reasons to have a copy
>>>> anywhere.
>>>>
>>>> This seems to be the core of the confusion we're having here.
>>>
>>> Ok, so I'll try to explain:
>>> 1. xen-front - produces a display buffer to be shown at Guest2
>>> by the backend, shares its grant references with the backend
>>> 2. xen-front sends page flip event to the backend specifying the
>>> buffer in question
>>> 3. Backend takes the shared buffer (which is only a buffer mapped into
>>> backend's memory, it is not a dma-buf/PRIME one) and makes memcpy from
>>> it to a local dumb/surface
>>
>> Why do you even do that? The copying here I mean - why don't you just
>> directly scan out from the grant references you received through the
>> hypervisor?
>
> Probably the confusion comes from the fact that KVM and Xen
> implement things differently (for example, on ARM we don't use QEMU at all).
> Please see [1] and [2] for Xen frontend/backend placement in the picture.
>
> WRT [2], xen-front is a PV front-end driver running in guest OS
> and Xen display backend is a user-space application running in Dom0
> (in the picture [2] backend runs as a Dom0 kernel driver).
> So, the para-virtualized device is not implemented in the hypervisor
> itself, but as user/kernel-space pair in corresponding domains.
> Thus, when xen-front shares grant references of the pages of the buffer
> with the Xen display backend (user-space) the latter can only map those
> references into Dom0 memory to memcpy into some local display buffer/dumb.
> Hence, hypervisor is not in the equation while actually implementing
> para-virtual display device, e.g. it provides you with API to share/map
> pages, but it won't be the entity which will implement actual page flips
> etc.
> So, this is where xen-zcopy comes into the play (runs in Dom0):
> it not only maps xen-front's grant references into Dom0, but also creates
> a PRIME buffer, so this buffer can be used by other DRM devices/Weston
> running in Dom0.
>>
>> Also I'm not clear in your example which step happens where (guest 1/2
>> or hypervisor)?
>
> Steps 1,2 - Guest2, kernel space
> Steps 3-4 - Guest1, Dom0 user-space
> The hypervisor here only provides transport and means to access buffers,
> actual display/DRM related code is in xen-front and Dom0's display backend
>
>>> 4. Backend flips that local dumb buffer/surface
>>>
>>> If I have a xen-zcopy helper driver then I can avoid doing step 3):
>>> 1) 2) remain the same as above
>>> 3) Initially for a new display buffer, backend calls xen-zcopy to create
>>> a local PRIME buffer from the grant references provided by the xen-front
>>> via displif protocol [1]: we now have handle_zcopy
>>> 4) Backend exports this PRIME with HANDLE_TO_FD from xen-zcopy and
>>> imports
>>> it into Weston-KMS/DRM or real HW DRM driver with FD_TO_HANDLE: we now
>>> have
>>> handle_local
>>> 5) On page flip event backend flips local PRIME: uses handle_local for
>>> flips
>>>
>>>>>> Ofc you won't be able to shovel sound or media stream data over to
>>>>>> another
>>>>>> guest like this, but that's what you have xen-v4l and xen-sound or
>>>>>> whatever else for. Trying to make a new uapi, which means userspace
>>>>>> must
>>>>>> be changed for all the different use-case, instead of reusing standard
>>>>>> linux driver uapi (which just happens to send the data to another
>>>>>> hypervisor guest instead of real hw) imo just doesn't make much sense.
>>>>>>
>>>>>> Also, at least for the gpu subsystem: Any new uapi must have full
>>>>>> userspace available for it, see:
>>>>>>
>>>>>>
>>>>>>
>>>>>> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
>>>>>>
>>>>>> Adding more uapi is definitely the most painful way to fix a use-case.
>>>>>> Personally I'd go as far and also change the xen-zcopy side on the
>>>>>> receiving guest to use some standard linux uapi. E.g. you could write
>>>>>> an
>>>>>> output v4l driver to receive the frames from guest1.
>>>>>
>>>>> So, we now know that xen-zcopy was not meant to handle page flips,
>>>>> but to implement new UAPI to let user-space create buffers either
>>>>> from Guest2 grant references (so it can be exported to Guest1) or
>>>>> other way round, e.g. create (from Guest1 grant references to export to
>>>>> Guest 2). For that reason it adds 2 IOCTLs: create buffer from grefs
>>>>> or produce grefs for the buffer given.
>>>>> One additional IOCTL is to wait for the buffer to be released by
>>>>> Guest2 user-space.
>>>>> That being said, I don't quite see how v4l can be used here to
>>>>> implement
>>>>> UAPI I need.
>>>>
>>>> Under the assumption that you can make xen-front to zerocopy for the
>>>> kernel->hypervisor path, v4l could be made to work for the
>>>> hypervisor->kernel side of the pipeline.
>>>>
>>>> But it sounds like we have a confusion already on why or why not
>>>> xen-front
>>>> can or cannot do zerocopy.
>>>
>>> xen-front provides an array of grant references to Guest2 (backend).
>>> It's up to backend what it does with those grant references
>>> which at Guest2 side are not PRIME or dma-buf, but just a set of pages.
>>> This is xen-zcopy which turns these pages into a PRIME. When this is done
>>> backend can now tell DRM drivers to use the buffer in DRM terms.
>>>
>>>>>>>> danvet, can you comment on this topic?
>>>>>>>>
>>>>>>>>>> 2. the page sharing mechanism - it uses Xen-grant-table.
>>>>>>>>>>
>>>>>>>>>> And to give you a quick summary of differences as far as I
>>>>>>>>>> understand
>>>>>>>>>> between two implementations (please correct me if I am wrong,
>>>>>>>>>> Oleksandr.)
>>>>>>>>>>
>>>>>>>>>> 1. xen-zcopy is DRM specific - can import only DRM prime buffer
>>>>>>>>>> while hyper_dmabuf can export any dmabuf regardless of originator
>>>>>>>>>
>>>>>>>>> Well, this is true. And at the same time this is just a matter
>>>>>>>>> of extending the API: xen-zcopy is a helper driver designed for
>>>>>>>>> xen-front/back use-case, so this is why it only has DRM PRIME API
>>>>>>>>>>
>>>>>>>>>> 2. xen-zcopy doesn't seem to have dma-buf synchronization between
>>>>>>>>>> two VMs
>>>>>>>>>> while (as danvet called it as remote dmabuf api sharing)
>>>>>>>>>> hyper_dmabuf sends
>>>>>>>>>> out synchronization message to the exporting VM for
>>>>>>>>>> synchronization.
>>>>>>>>>
>>>>>>>>> This is true. Again, this is because of the use-cases it covers.
>>>>>>>>> But having synchronization for a generic solution seems to be a
>>>>>>>>> good
>>>>>>>>> idea.
>>>>>>>>
>>>>>>>> Yeah, understood xen-zcopy works ok with your use case. But I am
>>>>>>>> just
>>>>>>>> curious
>>>>>>>> if it is ok not to have any inter-domain synchronization in this
>>>>>>>> sharing model.
>>>>>>>
>>>>>>> The synchronization is done with displif protocol [1]
>>>>>>>>
>>>>>>>> The buffer being shared is technically dma-buf and originator needs
>>>>>>>> to
>>>>>>>> be able
>>>>>>>> to keep track of it.
>>>>>>>
>>>>>>> As I am working in DRM terms the tracking is done by the DRM core
>>>>>>> for me for free. (This might be one of the reasons Daniel sees DRM
>>>>>>> based implementation fit very good from code-reuse POV).
>>>>>>
>>>>>> Hm, not sure what tracking you refer to here all ... I got lost in all
>>>>>> the
>>>>>> replies while catching up.
>>>>>>
>>>>> I was just referring to accounting stuff already implemented in the DRM
>>>>> core,
>>>>> so I don't have to worry about doing the same for buffers to understand
>>>>> when they are released etc.
>>>>>>>>>>
>>>>>>>>>> 3. 1-level references - when using grant-table for sharing pages,
>>>>>>>>>> there will
>>>>>>>>>> be same # of refs (each 8 byte)
>>>>>>>>>
>>>>>>>>> To be precise, grant ref is 4 bytes
>>>>>>>>
>>>>>>>> You are right. Thanks for correction.;)
>>>>>>>>
>>>>>>>>>> as # of shared pages, which is passed to
>>>>>>>>>> the userspace to be shared with importing VM in case of xen-zcopy.
>>>>>>>>>
>>>>>>>>> The reason for that is that xen-zcopy is a helper driver, e.g.
>>>>>>>>> the grant references come from the display backend [1], which
>>>>>>>>> implements
>>>>>>>>> Xen display protocol [2]. So, effectively the backend extracts
>>>>>>>>> references
>>>>>>>>> from frontend's requests and passes those to xen-zcopy as an array
>>>>>>>>> of refs.
>>>>>>>>>>
>>>>>>>>>>      Compared
>>>>>>>>>> to this, hyper_dmabuf does multiple level addressing to generate
>>>>>>>>>> only one
>>>>>>>>>> reference id that represents all shared pages.
>>>>>>>>>
>>>>>>>>> In the protocol [2] only one reference to the gref directory is
>>>>>>>>> passed
>>>>>>>>> between VMs
>>>>>>>>> (and the gref directory is a single-linked list of shared pages
>>>>>>>>> containing
>>>>>>>>> all
>>>>>>>>> of the grefs of the buffer).
>>>>>>>>
>>>>>>>> ok, good to know. I will look into its implementation in more
>>>>>>>> details
>>>>>>>> but is
>>>>>>>> this gref directory (chained grefs) something that can be used for
>>>>>>>> any
>>>>>>>> general
>>>>>>>> memory sharing use case or is it just for xen-display (in current
>>>>>>>> code
>>>>>>>> base)?
>>>>>>>
>>>>>>> Not to mislead you: one grant ref is passed via displif protocol,
>>>>>>> but the page it's referencing contains the rest of the grant refs.
>>>>>>>
>>>>>>> As to if this can be used for any memory: yes. It is the same for
>>>>>>> sndif and displif Xen protocols, but defined twice as strictly
>>>>>>> speaking
>>>>>>> sndif and displif are two separate protocols.
>>>>>>>
>>>>>>> While reviewing your RFC v2 one of the comments I had [2] was that if
>>>>>>> we
>>>>>>> can start from defining such a generic protocol for hyper-dmabuf.
>>>>>>> It can be a header file, which not only has the description part
>>>>>>> (which then become a part of Documentation/...rst file), but also
>>>>>>> defines
>>>>>>> all the required constants for requests, responses, defines message
>>>>>>> formats,
>>>>>>> state diagrams etc. all at one place. Of course this protocol must
>>>>>>> not
>>>>>>> be
>>>>>>> Xen specific, but be OS/hypervisor agnostic.
>>>>>>> Having that will trigger a new round of discussion, so we have it all
>>>>>>> designed
>>>>>>> and discussed before we start implementing.
>>>>>>>
>>>>>>> Besides the protocol we have to design UAPI part as well and make
>>>>>>> sure
>>>>>>> the hyper-dmabuf is not only accessible from user-space, but there
>>>>>>> will
>>>>>>> be
>>>>>>> number
>>>>>>> of kernel-space users as well.
>>>>>>
>>>>>> Again, why do you want to create new uapi for this? Given the very
>>>>>> strict
>>>>>> requirements we have for new uapi (see above link), it's the toughest
>>>>>> way
>>>>>> to get any kind of support in.
>>>>>
>>>>> I do understand that adding new UAPI is not good for many reasons.
>>>>> But here I was meaning that current hyper-dmabuf design is
>>>>> only user-space oriented, e.g. it provides number of IOCTLs to do all
>>>>> the work. But I need a way to access the same from the kernel, so, for
>>>>> example,
>>>>> some other para-virtual driver can export/import dma-buf, not only
>>>>> user-space.
>>>>
>>>> If you need an import-export helper library, just merge it. Do not
>>>> attach
>>>> any uapi to it, just the internal helpers.
>>>>
>>>> Much, much, much easier to land.
>>>
>>> This can be done, but again, I will need some entity which
>>> backend may use to convert xen-front's grant references into
>>> a PRIME buffer, hence there is UAPI for that. In other words,
>>> I'll need a thinner xen-zcopy which will implement the same UAPI
>>> and use that library for Xen related stuff.
>>>
>>> The confusion may also come from the fact that the backend is
>>> a user-space application, not a kernel module (we have 2 modes
>>> of its operation as of now: DRM master or Weston client), so
>>> it needs a way to talk to the kernel.
>>
>> So this is entirely a means to implement the virtual xen device in
>> dom0 (or whichever guest implements it)?
>>
>> I'm extremely confused about what you mean with "backend", since
>> xen-front also has backend code. But that backend code lives in the
>> same guest os image (afaict at least), since it does direct function
>> calls.
>
> xen-front has no backend code, but only has code which allows it
> to create a dumb buffer from the grant references provided by the
> backend.
>>
>> Please be more specific in what you mean instead of just "backend",
>> that's really confusing.
>
> Hope [2] better explains this
>>
>>
>> But essentially we're talking about the equivalent of what qemu does
>> for kvm, and that's entirely not my problem. Not really a gpu
>> subsystem problem I think. Just talk with the xen hypervisor people
>> about how exactly they want to go about converting grant tables to
>> dma-buf, so that your virtual hw backend in userspace can make use of
>> it.
>
> The problem here is that the display backend then will need
> to talk to DRM. And what is the UAPI for that? Right, PRIME
> buffers.
>
>> And then merge it somewhere in the xen directories. Since the
>> grant tables and everything is very xen specific, I don't think
>> there's much point in trying to have a fake generic uapi that pretends
>> to work on other hypervisors, as long as they're Xen :-)
>>
>> And you probably have no need for all the caching/general book-keeping
>> drm_prime does (it's all in userspace I guess, except for the magic
>> conversion from grant references to a dma_buf). So there's no point
>> trying to reuse code in drm_prime.c.
>>
>> Also, this should make it tons easier to reuse xen-zcopy for
>> sound/wireless/v4l backends.
>>
>>>>>> That's why I had essentially zero big questions for xen-front (except
>>>>>> some
>>>>>> implementation improvements, and stuff to make sure xen-front actually
>>>>>> implements the real uapi semantics instead of its own), and why I'm
>>>>>> asking
>>>>>> much more questions on this stuff here.
>>>>>>
>>>>>>>>>> 4. inter VM messaging (hyper_dmabuf only) - hyper_dmabuf has
>>>>>>>>>> inter-vm
>>>>>>>>>> msg
>>>>>>>>>> communication defined for dmabuf synchronization and private data
>>>>>>>>>> (meta
>>>>>>>>>> info that Matt Roper mentioned) exchange.
>>>>>>>>>
>>>>>>>>> This is true, xen-zcopy has no means for inter VM sync and
>>>>>>>>> meta-data,
>>>>>>>>> simply because it doesn't have any code for inter VM exchange in
>>>>>>>>> it,
>>>>>>>>> e.g. the inter VM protocol is handled by the backend [1].
>>>>>>>>>>
>>>>>>>>>> 5. driver-to-driver notification (hyper_dmabuf only) - importing
>>>>>>>>>> VM
>>>>>>>>>> gets
>>>>>>>>>> notified when new dmabuf is exported from other VM - uevent can be
>>>>>>>>>> optionally
>>>>>>>>>> generated when this happens.
>>>>>>>>>>
>>>>>>>>>> 6. structure - hyper_dmabuf is targeting to provide a generic
>>>>>>>>>> solution for
>>>>>>>>>> inter-domain dmabuf sharing for most hypervisors, which is why it
>>>>>>>>>> has two
>>>>>>>>>> layers as mattrope mentioned, front-end that contains standard API
>>>>>>>>>> and backend
>>>>>>>>>> that is specific to hypervisor.
>>>>>>>>>
>>>>>>>>> Again, xen-zcopy is decoupled from inter VM communication
>>>>>>>>>>>>
>>>>>>>>>>>> No idea, didn't look at it in detail.
>>>>>>>>>>>>
>>>>>>>>>>>> Looks pretty complex from a distant view.  Maybe because it
>>>>>>>>>>>> tries
>>>>>>>>>>>> to
>>>>>>>>>>>> build a communication framework using dma-bufs instead of a
>>>>>>>>>>>> simple
>>>>>>>>>>>> dma-buf passing mechanism.
>>>>>>>>>>
>>>>>>>>>> we started with simple dma-buf sharing but realized there are many
>>>>>>>>>> things we need to consider in real use-case, so we added
>>>>>>>>>> communication
>>>>>>>>>> , notification and dma-buf synchronization then re-structured it
>>>>>>>>>> to
>>>>>>>>>> front-end and back-end (this made things more complicated..) since
>>>>>>>>>> Xen
>>>>>>>>>> was not our only target. Also, we thought passing the reference
>>>>>>>>>> for
>>>>>>>>>> the
>>>>>>>>>> buffer (hyper_dmabuf_id) is not secure so added uevent mechanism
>>>>>>>>>> later.
>>>>>>>>>>
>>>>>>>>>>> Yes, I am looking at it now, trying to figure out the full story
>>>>>>>>>>> and its implementation. BTW, Intel guys were about to share some
>>>>>>>>>>> test application for hyper-dmabuf, maybe I have missed one.
>>>>>>>>>>> It could probably better explain the use-cases and the complexity
>>>>>>>>>>> they have in hyper-dmabuf.
>>>>>>>>>>
>>>>>>>>>> One example is actually in github. If you want take a look at it,
>>>>>>>>>> please
>>>>>>>>>> visit:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
>>>>>>>>>
>>>>>>>>> Thank you, I'll have a look
>>>>>>>>>>>>
>>>>>>>>>>>> Like xen-zcopy it seems to depend on the idea that the
>>>>>>>>>>>> hypervisor
>>>>>>>>>>>> manages all memory it is easy for guests to share pages with the
>>>>>>>>>>>> help of
>>>>>>>>>>>> the hypervisor.
>>>>>>>>>>>
>>>>>>>>>>> So, for xen-zcopy we were not trying to make it generic,
>>>>>>>>>>> it just solves display (dumb) zero-copying use-cases for Xen.
>>>>>>>>>>> We implemented it as a DRM helper driver because we can't see any
>>>>>>>>>>> other use-cases as of now.
>>>>>>>>>>> For example, we also have Xen para-virtualized sound driver, but
>>>>>>>>>>> its buffer memory usage is not comparable to what display wants
>>>>>>>>>>> and it works somewhat differently (e.g. there is no "frame done"
>>>>>>>>>>> event, so one can't tell when the sound buffer can be "flipped").
>>>>>>>>>>> At the same time, we do not use virtio-gpu, so this could
>>>>>>>>>>> probably
>>>>>>>>>>> be one more candidate for shared dma-bufs some day.
>>>>>>>>>>>>
>>>>>>>>>>>>       Which simply isn't the case on kvm.
>>>>>>>>>>>>
>>>>>>>>>>>> hyper-dmabuf and xen-zcopy could maybe share code, or
>>>>>>>>>>>> hyper-dmabuf
>>>>>>>>>>>> build
>>>>>>>>>>>> on top of xen-zcopy.
>>>>>>>>>>>
>>>>>>>>>>> Hm, I can imagine that: xen-zcopy could be a library code for
>>>>>>>>>>> hyper-dmabuf
>>>>>>>>>>> in terms of implementing all that page sharing fun in multiple
>>>>>>>>>>> directions,
>>>>>>>>>>> e.g. Host->Guest, Guest->Host, Guest<->Guest.
>>>>>>>>>>> But I'll let Matt and Dongwon to comment on that.
>>>>>>>>>>
>>>>>>>>>> I think we can definitely collaborate. Especially, maybe we are
>>>>>>>>>> using some
>>>>>>>>>> outdated sharing mechanism/grant-table mechanism in our Xen
>>>>>>>>>> backend
>>>>>>>>>> (thanks
>>>>>>>>>> for bringing that up Oleksandr). However, the question is once we
>>>>>>>>>> collaborate
>>>>>>>>>> somehow, can xen-zcopy's usecase use the standard API that
>>>>>>>>>> hyper_dmabuf
>>>>>>>>>> provides? I don't think we need different IOCTLs that do the same
>>>>>>>>>> in
>>>>>>>>>> the final
>>>>>>>>>> solution.
>>>>>>>>>>
>>>>>>>>> If you think of xen-zcopy as a library (which implements Xen
>>>>>>>>> grant references mangling) and DRM PRIME wrapper on top of that
>>>>>>>>> library, we can probably define proper API for that library,
>>>>>>>>> so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
>>>>>>>>> about to start upstreaming Xen para-virtualized sound device driver
>>>>>>>>> soon,
>>>>>>>>> which also uses similar code and gref passing mechanism [3].
>>>>>>>>> (Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
>>>>>>>>> snd/xen-front and then propose a Xen helper library for sharing big
>>>>>>>>> buffers,
>>>>>>>>> so common code of the above drivers can use the same code w/o code
>>>>>>>>> duplication)
>>>>>>>>
>>>>>>>> I think it is possible to use your functions for memory sharing part
>>>>>>>> in
>>>>>>>> hyper_dmabuf's backend (this 'backend' means the layer that does
>>>>>>>> page
>>>>>>>> sharing
>>>>>>>> and inter-vm communication with xen-specific way.), so why don't we
>>>>>>>> work on
>>>>>>>> "Xen helper library for sharing big buffers" first while we continue
>>>>>>>> our
>>>>>>>> discussion on the common API layer that can cover any dmabuf sharing
>>>>>>>> cases.
>>>>>>>>
>>>>>>> Well, I would love we reuse the code that I have, but I also
>>>>>>> understand that it was limited by my use-cases. So, I do not
>>>>>>> insist we have to ;)
>>>>>>> If we start designing and discussing hyper-dmabuf protocol we of
>>>>>>> course
>>>>>>> can work on this helper library in parallel.
>>>>>>
>>>>>> Imo code reuse is overrated. Adding new uapi is what freaks me out
>>>>>> here :-)
>>>>>>
>>>>>> If we end up with duplicated implementations, even in upstream, meh,
>>>>>> not great, but also ok. New uapi, and in a similar way, new hypervisor
>>>>>> api like the dma-buf forwarding that hyperdmabuf does is the kind of
>>>>>> thing that will lock us in for 10+ years (if we make a mistake).
>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Oleksandr
>>>>>>>>>
>>>>>>>>> P.S. All, is it a good idea to move this out of udmabuf thread into
>>>>>>>>> a dedicated one?
>>>>>>>>
>>>>>>>> Either way is fine with me.
>>>>>>>
>>>>>>> So, if you can start designing the protocol we may have a dedicated
>>>>>>> mail thread for that. I will try to help with the protocol as much
>>>>>>> as I can.
>>>>>>
>>>>>> Please don't start with the protocol. Instead start with the concrete
>>>>>> use-cases, and then figure out why exactly you need new uapi. Once we
>>>>>> have that answered, we can start thinking about fleshing out the
>>>>>> details.
>>>>>
>>>>> On my side there are only 2 use-cases, Guest2 only:
>>>>> 1. Create a PRIME (dma-buf) from grant references
>>>>> 2. Create grant references from PRIME (dma-buf)
>>>>
>>>> So these grant references, are those userspace visible things?
>>>
>>> Yes, the user-space backend receives those from xen-front via [1]
>>>
>>>> I thought the grant references were just the kernel/hypervisor internal
>>>> magic to make this all work?
>>>
>>> So, I can map the grant references from user-space, but I won't
>>> be able to turn those into a PRIME buffer. So, the only use of those
>>> w/o xen-zcopy is to map grant refs and copy into a real HW dumb buffer
>>> on every page flip.
>>
>> Ok, that explains. I thought your current xen-side implementation for
>> xen-front is already making all that stuff happen. But I'm still not
>> sure given all the confusing talk about back-end we have in these
>> threads (hyperdmabuf people also talked about different backends for
>> different hypervisors, I guess that's a different kind of backend?).
>
> Hope the explanation above makes it all clearer.
> Please let me know if you still want me to elaborate more.
>>
>> -Daniel
>
> [1] https://wiki.xen.org/wiki/Paravirtualization_(PV)
> [2] https://wiki.xen.org/wiki/File:XenPV.png
