
[v9] dma-buf: Add ioctls to allow userspace to flush

Message ID 1455228291-29640-1-git-send-email-tiago.vignatti@intel.com (mailing list archive)
State New, archived

Commit Message

Tiago Vignatti Feb. 11, 2016, 10:04 p.m. UTC
From: Daniel Vetter <daniel.vetter@ffwll.ch>

Userspace might need some sort of cache coherency management, e.g. when CPU
and GPU domains access the same dma-buf at the same time. To solve this
problem there are begin/end coherency markers that forward directly to the
existing dma-buf device drivers' vfunc hooks. Userspace can make use of those
markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence is used as follows:
     - mmap the dma-buf fd
     - for each drawing/upload cycle on the CPU: 1. SYNC_START ioctl, 2.
       read/write to the mmap'ed area, 3. SYNC_END ioctl. This can be repeated
       as often as you want (with the new data being consumed by the GPU or,
       say, a scanout device)
     - munmap once you don't need the buffer any more
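The cycle above can be sketched from userspace roughly as follows. This is a
hedged sketch: the struct and flag values mirror the include/uapi/linux/dma-buf.h
added by this patch, and `fd`/`map` are assumed to come from a dma-buf export
and a prior mmap() done elsewhere; `cpu_write_cycle` is an illustrative helper
name, not part of any real API.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Mirrors the uapi header added by this patch. */
struct dma_buf_sync {
	uint64_t flags;
};

#define DMA_BUF_SYNC_READ      (1 << 0)
#define DMA_BUF_SYNC_WRITE     (2 << 0)
#define DMA_BUF_SYNC_RW        (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
#define DMA_BUF_SYNC_START     (0 << 2)
#define DMA_BUF_SYNC_END       (1 << 2)
#define DMA_BUF_BASE           'b'
#define DMA_BUF_IOCTL_SYNC     _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)

/*
 * One CPU upload cycle: SYNC_START, write through the mmap'ed pointer,
 * SYNC_END. 'fd' is the dma-buf file descriptor and 'map' the pointer
 * returned by a prior mmap() on that fd (both obtained elsewhere).
 */
static int cpu_write_cycle(int fd, void *map, const void *src, size_t len)
{
	struct dma_buf_sync sync = { 0 };

	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	if (ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		return -1;

	memcpy(map, src, len);	/* the actual CPU access */

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	if (ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		return -1;

	return 0;
}
```

The cycle can be repeated as often as needed before the final munmap.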

v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
dma-buf functions. Check for overflows in start/length.
v4 (Tiago): use 2d regions for sync.
v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
remove range information from struct dma_buf_sync.
v6 (Tiago): use __u64 structured padded flags instead of an enum. Adjust the
documentation about the recommendation on using the sync ioctls.
v7 (Tiago): address Alex's nit on the flags definition and add even more
wording in the doc about sync usage.
v9 (Tiago): remove the useless is_dma_buf_file check. Fix the sync.flags
conditionals and their mask-order check. Add the <linux/types.h> include in
dma-buf.h.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Tiago Vignatti <tiago.vignatti@intel.com>
---

I left SYNC_START and SYNC_END exclusive, just as the logic was before. If we
see a useful use case, maybe like the one David mentioned of storing two
frames next to each other in the same BO, we can patch it up later fairly
easily.

About the ioctl direction: just as Ville pointed out, we're only doing
copy_from_user at the moment, so it seems that _IOW is all we need. So I
didn't touch anything there either.
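As a sanity check on that choice, the direction bits baked into the ioctl
number can be decoded with the standard _IOC_* macros. A small sketch (the
struct and number mirror the new uapi header; this is illustrative only):

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Mirrors the uapi header added by this patch. */
struct dma_buf_sync {
	uint64_t flags;
};

#define DMA_BUF_BASE       'b'
#define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)

/*
 * _IOW encodes "userspace writes, kernel reads" -- matching the single
 * copy_from_user() in the handler. _IOC_DIR(DMA_BUF_IOCTL_SYNC) therefore
 * decodes to _IOC_WRITE, and _IOC_SIZE to sizeof(struct dma_buf_sync).
 */
```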

David, Ville PTAL. Thank you,

Tiago

 Documentation/dma-buf-sharing.txt | 21 +++++++++++++++++-
 drivers/dma-buf/dma-buf.c         | 45 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/dma-buf.h      | 40 ++++++++++++++++++++++++++++++++++
 3 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/dma-buf.h

Comments

David Herrmann Feb. 12, 2016, 2:50 p.m. UTC | #1
Hi

On Thu, Feb 11, 2016 at 11:04 PM, Tiago Vignatti
<tiago.vignatti@intel.com> wrote:
> From: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> The userspace might need some sort of cache coherency management e.g. when CPU
> and GPU domains are being accessed through dma-buf at the same time. To
> circumvent this problem there are begin/end coherency markers, that forward
> directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
> of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
> used like following:
>      - mmap dma-buf fd
>      - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
>        to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>        want (with the new data being consumed by the GPU or say scanout device)
>      - munmap once you don't need the buffer any more
>
> v2 (Tiago): Fix header file type names (u64 -> __u64)
> v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
> dma-buf functions. Check for overflows in start/length.
> v4 (Tiago): use 2d regions for sync.
> v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
> remove range information from struct dma_buf_sync.
> v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
> documentation about the recommendation on using sync ioctls.
> v7 (Tiago): Alex' nit on flags definition and being even more wording in the
> doc about sync usage.
> v9 (Tiago): remove useless is_dma_buf_file check. Fix sync.flags conditionals
> and its mask order check. Add <linux/types.h> include in dma-buf.h.
>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: David Herrmann <dh.herrmann@gmail.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Signed-off-by: Tiago Vignatti <tiago.vignatti@intel.com>
> ---
>
> I left SYNC_START and SYNC_END exclusive, just how the logic was before. If we
> see an useful use case, maybe like the way David said, to store two frames
> next to each other in the same BO, we can patch up later fairly easily.
>
> About the ioctl direction, just like Ville pointed, we're doing only
> copy_from_user at the moment and seems that _IOW is all we need. So I also
> didn't touch anything on that.

Fair enough.

All I have left are comments on coding-style, and I'll refrain from
verbalizing them.. (gnah, it is so tempting).

Reviewed-by: David Herrmann <dh.herrmann@gmail.com>

Thanks!
David

>  Documentation/dma-buf-sharing.txt | 21 +++++++++++++++++-
>  drivers/dma-buf/dma-buf.c         | 45 +++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/dma-buf.h      | 40 ++++++++++++++++++++++++++++++++++
>  3 files changed, 105 insertions(+), 1 deletion(-)
>  create mode 100644 include/uapi/linux/dma-buf.h
>
> diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
> index 4f4a84b..32ac32e 100644
> --- a/Documentation/dma-buf-sharing.txt
> +++ b/Documentation/dma-buf-sharing.txt
> @@ -350,7 +350,26 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
>     handles, too). So it's beneficial to support this in a similar fashion on
>     dma-buf to have a good transition path for existing Android userspace.
>
> -   No special interfaces, userspace simply calls mmap on the dma-buf fd.
> +   No special interfaces, userspace simply calls mmap on the dma-buf fd, making
> +   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
> +   used when the access happens. This is discussed next paragraphs.
> +
> +   Some systems might need some sort of cache coherency management e.g. when
> +   CPU and GPU domains are being accessed through dma-buf at the same time. To
> +   circumvent this problem there are begin/end coherency markers, that forward
> +   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
> +   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
> +   would be used like following:
> +     - mmap dma-buf fd
> +     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
> +       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
> +       want (with the new data being consumed by the GPU or say scanout device)
> +     - munmap once you don't need the buffer any more
> +
> +    Therefore, for correctness and optimal performance, systems with the memory
> +    cache shared by the GPU and CPU i.e. the "coherent" and also the
> +    "incoherent" are always required to use SYNC_START and SYNC_END before and
> +    after, respectively, when accessing the mapped address.
>
>  2. Supporting existing mmap interfaces in importers
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index b2ac13b..9810d1d 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -34,6 +34,8 @@
>  #include <linux/poll.h>
>  #include <linux/reservation.h>
>
> +#include <uapi/linux/dma-buf.h>
> +
>  static inline int is_dma_buf_file(struct file *);
>
>  struct dma_buf_list {
> @@ -251,11 +253,54 @@ out:
>         return events;
>  }
>
> +static long dma_buf_ioctl(struct file *file,
> +                         unsigned int cmd, unsigned long arg)
> +{
> +       struct dma_buf *dmabuf;
> +       struct dma_buf_sync sync;
> +       enum dma_data_direction direction;
> +
> +       dmabuf = file->private_data;
> +
> +       switch (cmd) {
> +       case DMA_BUF_IOCTL_SYNC:
> +               if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
> +                       return -EFAULT;
> +
> +               if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
> +                       return -EINVAL;
> +
> +               switch (sync.flags & DMA_BUF_SYNC_RW) {
> +               case DMA_BUF_SYNC_READ:
> +                       direction = DMA_FROM_DEVICE;
> +                       break;
> +               case DMA_BUF_SYNC_WRITE:
> +                       direction = DMA_TO_DEVICE;
> +                       break;
> +               case DMA_BUF_SYNC_RW:
> +                       direction = DMA_BIDIRECTIONAL;
> +                       break;
> +               default:
> +                       return -EINVAL;
> +               }
> +
> +               if (sync.flags & DMA_BUF_SYNC_END)
> +                       dma_buf_end_cpu_access(dmabuf, direction);
> +               else
> +                       dma_buf_begin_cpu_access(dmabuf, direction);
> +
> +               return 0;
> +       default:
> +               return -ENOTTY;
> +       }
> +}
> +
>  static const struct file_operations dma_buf_fops = {
>         .release        = dma_buf_release,
>         .mmap           = dma_buf_mmap_internal,
>         .llseek         = dma_buf_llseek,
>         .poll           = dma_buf_poll,
> +       .unlocked_ioctl = dma_buf_ioctl,
>  };
>
>  /*
> diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
> new file mode 100644
> index 0000000..fb0dedb
> --- /dev/null
> +++ b/include/uapi/linux/dma-buf.h
> @@ -0,0 +1,40 @@
> +/*
> + * Framework for buffer objects that can be shared across devices/subsystems.
> + *
> + * Copyright(C) 2015 Intel Ltd
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _DMA_BUF_UAPI_H_
> +#define _DMA_BUF_UAPI_H_
> +
> +#include <linux/types.h>
> +
> +/* begin/end dma-buf functions used for userspace mmap. */
> +struct dma_buf_sync {
> +       __u64 flags;
> +};
> +
> +#define DMA_BUF_SYNC_READ      (1 << 0)
> +#define DMA_BUF_SYNC_WRITE     (2 << 0)
> +#define DMA_BUF_SYNC_RW        (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
> +#define DMA_BUF_SYNC_START     (0 << 2)
> +#define DMA_BUF_SYNC_END       (1 << 2)
> +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
> +       (DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
> +
> +#define DMA_BUF_BASE           'b'
> +#define DMA_BUF_IOCTL_SYNC     _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
> +
> +#endif
> --
> 2.1.4
>
Daniel Vetter Feb. 12, 2016, 3:02 p.m. UTC | #2
On Fri, Feb 12, 2016 at 03:50:02PM +0100, David Herrmann wrote:
> Hi
> 
> On Thu, Feb 11, 2016 at 11:04 PM, Tiago Vignatti
> <tiago.vignatti@intel.com> wrote:
> > From: Daniel Vetter <daniel.vetter@ffwll.ch>
> >
> > The userspace might need some sort of cache coherency management e.g. when CPU
> > and GPU domains are being accessed through dma-buf at the same time. To
> > circumvent this problem there are begin/end coherency markers, that forward
> > directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
> > of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
> > used like following:
> >      - mmap dma-buf fd
> >      - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
> >        to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
> >        want (with the new data being consumed by the GPU or say scanout device)
> >      - munmap once you don't need the buffer any more
> >
> > v2 (Tiago): Fix header file type names (u64 -> __u64)
> > v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
> > dma-buf functions. Check for overflows in start/length.
> > v4 (Tiago): use 2d regions for sync.
> > v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
> > remove range information from struct dma_buf_sync.
> > v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
> > documentation about the recommendation on using sync ioctls.
> > v7 (Tiago): Alex' nit on flags definition and being even more wording in the
> > doc about sync usage.
> > v9 (Tiago): remove useless is_dma_buf_file check. Fix sync.flags conditionals
> > and its mask order check. Add <linux/types.h> include in dma-buf.h.
> >
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Cc: David Herrmann <dh.herrmann@gmail.com>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Signed-off-by: Tiago Vignatti <tiago.vignatti@intel.com>
> > ---
> >
> > I left SYNC_START and SYNC_END exclusive, just how the logic was before. If we
> > see an useful use case, maybe like the way David said, to store two frames
> > next to each other in the same BO, we can patch up later fairly easily.
> >
> > About the ioctl direction, just like Ville pointed, we're doing only
> > copy_from_user at the moment and seems that _IOW is all we need. So I also
> > didn't touch anything on that.
> 
> Fair enough.
> 
> All I have left are comments on coding-style, and I'll refrain from
> verbalizing them.. (gnah, it is so tempting).
> 
> Reviewed-by: David Herrmann <dh.herrmann@gmail.com>

Merged to drm-misc, thanks.
-Daniel
Chris Wilson Feb. 25, 2016, 6:01 p.m. UTC | #3
On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
> +static long dma_buf_ioctl(struct file *file,
> +			  unsigned int cmd, unsigned long arg)
> +{
> +	struct dma_buf *dmabuf;
> +	struct dma_buf_sync sync;
> +	enum dma_data_direction direction;
> +
> +	dmabuf = file->private_data;
> +
> +	switch (cmd) {
> +	case DMA_BUF_IOCTL_SYNC:
> +		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
> +			return -EFAULT;
> +
> +		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
> +			return -EINVAL;
> +
> +		switch (sync.flags & DMA_BUF_SYNC_RW) {
> +		case DMA_BUF_SYNC_READ:
> +			direction = DMA_FROM_DEVICE;
> +			break;
> +		case DMA_BUF_SYNC_WRITE:
> +			direction = DMA_TO_DEVICE;
> +			break;
> +		case DMA_BUF_SYNC_RW:
> +			direction = DMA_BIDIRECTIONAL;
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
> +
> +		if (sync.flags & DMA_BUF_SYNC_END)
> +			dma_buf_end_cpu_access(dmabuf, direction);
> +		else
> +			dma_buf_begin_cpu_access(dmabuf, direction);

We forgot to report the error back to userspace. Might as well fixup the
callchain to propagate error from end-cpu-access as well. Found after
updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
-Chris
Daniel Vetter Feb. 29, 2016, 2:54 p.m. UTC | #4
On Thu, Feb 25, 2016 at 06:01:22PM +0000, Chris Wilson wrote:
> On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
> > +static long dma_buf_ioctl(struct file *file,
> > +			  unsigned int cmd, unsigned long arg)
> > +{
> > +	struct dma_buf *dmabuf;
> > +	struct dma_buf_sync sync;
> > +	enum dma_data_direction direction;
> > +
> > +	dmabuf = file->private_data;
> > +
> > +	switch (cmd) {
> > +	case DMA_BUF_IOCTL_SYNC:
> > +		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
> > +			return -EFAULT;
> > +
> > +		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
> > +			return -EINVAL;
> > +
> > +		switch (sync.flags & DMA_BUF_SYNC_RW) {
> > +		case DMA_BUF_SYNC_READ:
> > +			direction = DMA_FROM_DEVICE;
> > +			break;
> > +		case DMA_BUF_SYNC_WRITE:
> > +			direction = DMA_TO_DEVICE;
> > +			break;
> > +		case DMA_BUF_SYNC_RW:
> > +			direction = DMA_BIDIRECTIONAL;
> > +			break;
> > +		default:
> > +			return -EINVAL;
> > +		}
> > +
> > +		if (sync.flags & DMA_BUF_SYNC_END)
> > +			dma_buf_end_cpu_access(dmabuf, direction);
> > +		else
> > +			dma_buf_begin_cpu_access(dmabuf, direction);
> 
> We forgot to report the error back to userspace. Might as well fixup the
> callchain to propagate error from end-cpu-access as well. Found after
> updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.

EINTR? Do we need to make this ABI - I guess so? Tiago, do you have
patches? See drmIoctl() in libdrm for what's needed on the userspace side
if my guess is right.
-Daniel
Chris Wilson Feb. 29, 2016, 3:02 p.m. UTC | #5
On Mon, Feb 29, 2016 at 03:54:19PM +0100, Daniel Vetter wrote:
> On Thu, Feb 25, 2016 at 06:01:22PM +0000, Chris Wilson wrote:
> > On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
> > > +static long dma_buf_ioctl(struct file *file,
> > > +			  unsigned int cmd, unsigned long arg)
> > > +{
> > > +	struct dma_buf *dmabuf;
> > > +	struct dma_buf_sync sync;
> > > +	enum dma_data_direction direction;
> > > +
> > > +	dmabuf = file->private_data;
> > > +
> > > +	switch (cmd) {
> > > +	case DMA_BUF_IOCTL_SYNC:
> > > +		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
> > > +			return -EFAULT;
> > > +
> > > +		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
> > > +			return -EINVAL;
> > > +
> > > +		switch (sync.flags & DMA_BUF_SYNC_RW) {
> > > +		case DMA_BUF_SYNC_READ:
> > > +			direction = DMA_FROM_DEVICE;
> > > +			break;
> > > +		case DMA_BUF_SYNC_WRITE:
> > > +			direction = DMA_TO_DEVICE;
> > > +			break;
> > > +		case DMA_BUF_SYNC_RW:
> > > +			direction = DMA_BIDIRECTIONAL;
> > > +			break;
> > > +		default:
> > > +			return -EINVAL;
> > > +		}
> > > +
> > > +		if (sync.flags & DMA_BUF_SYNC_END)
> > > +			dma_buf_end_cpu_access(dmabuf, direction);
> > > +		else
> > > +			dma_buf_begin_cpu_access(dmabuf, direction);
> > 
> > We forgot to report the error back to userspace. Might as well fixup the
> > callchain to propagate error from end-cpu-access as well. Found after
> > updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
> 
> EINTR? Do we need to make this ABI - I guess so? Tiago, do you have
> patches? See drmIoctl() in libdrm for what's needed on the userspace side
> if my guess is right.

EINTR is the easiest, but conceivably we could also get EIO and
currently EAGAIN.

I am also seeing some strange timing dependent (i.e. valgrind doesn't
show up anything client side and the tests then pass) failures (SIGSEGV,
SIGBUS) with !llc.
-Chris
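If begin/end-cpu-access do start returning errors, userspace will need the
usual restart loop. A hedged sketch in the style of libdrm's drmIoctl(), which
retries on EINTR and EAGAIN but reports anything else (such as EIO) to the
caller; the names here are illustrative, and the ioctl call is abstracted
behind a function pointer purely so the retry logic can be exercised without a
device:

```c
#include <errno.h>

typedef int (*sync_ioctl_fn)(int fd, unsigned long request, void *arg);

/*
 * Retry while the call is interrupted (EINTR) or asked to back off
 * (EAGAIN), mirroring what drmIoctl() does for DRM ioctls. Other
 * errors (e.g. EIO) break the loop and are returned to the caller.
 */
static int sync_ioctl_restart(sync_ioctl_fn do_ioctl, int fd,
			      unsigned long request, void *arg)
{
	int ret;

	do {
		ret = do_ioctl(fd, request, arg);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/* Mock ioctl for demonstration: fail with EINTR twice, then succeed. */
static int mock_calls;
static int mock_ioctl(int fd, unsigned long request, void *arg)
{
	(void)fd; (void)request; (void)arg;
	if (++mock_calls < 3) {
		errno = EINTR;
		return -1;
	}
	return 0;
}
```

In real code the function pointer would simply be ioctl() and the request
DMA_BUF_IOCTL_SYNC.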
Daniel Vetter March 5, 2016, 9:34 a.m. UTC | #6
On Mon, Feb 29, 2016 at 03:02:09PM +0000, Chris Wilson wrote:
> On Mon, Feb 29, 2016 at 03:54:19PM +0100, Daniel Vetter wrote:
> > On Thu, Feb 25, 2016 at 06:01:22PM +0000, Chris Wilson wrote:
> > > On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
> > > > +static long dma_buf_ioctl(struct file *file,
> > > > +			  unsigned int cmd, unsigned long arg)
> > > > +{
> > > > +	struct dma_buf *dmabuf;
> > > > +	struct dma_buf_sync sync;
> > > > +	enum dma_data_direction direction;
> > > > +
> > > > +	dmabuf = file->private_data;
> > > > +
> > > > +	switch (cmd) {
> > > > +	case DMA_BUF_IOCTL_SYNC:
> > > > +		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
> > > > +			return -EFAULT;
> > > > +
> > > > +		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
> > > > +			return -EINVAL;
> > > > +
> > > > +		switch (sync.flags & DMA_BUF_SYNC_RW) {
> > > > +		case DMA_BUF_SYNC_READ:
> > > > +			direction = DMA_FROM_DEVICE;
> > > > +			break;
> > > > +		case DMA_BUF_SYNC_WRITE:
> > > > +			direction = DMA_TO_DEVICE;
> > > > +			break;
> > > > +		case DMA_BUF_SYNC_RW:
> > > > +			direction = DMA_BIDIRECTIONAL;
> > > > +			break;
> > > > +		default:
> > > > +			return -EINVAL;
> > > > +		}
> > > > +
> > > > +		if (sync.flags & DMA_BUF_SYNC_END)
> > > > +			dma_buf_end_cpu_access(dmabuf, direction);
> > > > +		else
> > > > +			dma_buf_begin_cpu_access(dmabuf, direction);
> > > 
> > > We forgot to report the error back to userspace. Might as well fixup the
> > > callchain to propagate error from end-cpu-access as well. Found after
> > > updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
> > 
> > EINTR? Do we need to make this ABI - I guess so? Tiago, do you have
> > patches? See drmIoctl() in libdrm for what's needed on the userspace side
> > if my guess is right.
> 
> EINTR is the easiest, but conceivably we could also get EIO and
> currently EAGAIN.
> 
> I am also seeing some strange timing dependent (i.e. valgrind doesn't
> show up anything client side and the tests then pass) failures (SIGSEGV,
> SIGBUS) with !llc.

Tiago, ping. Also probably a gap in igt coverage besides the kernel side.
-Daniel
Tiago Vignatti March 14, 2016, 8:21 p.m. UTC | #7
On 03/05/2016 06:34 AM, Daniel Vetter wrote:
> On Mon, Feb 29, 2016 at 03:02:09PM +0000, Chris Wilson wrote:
>> On Mon, Feb 29, 2016 at 03:54:19PM +0100, Daniel Vetter wrote:
>>> On Thu, Feb 25, 2016 at 06:01:22PM +0000, Chris Wilson wrote:
>>>> On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
>>>>> +static long dma_buf_ioctl(struct file *file,
>>>>> +			  unsigned int cmd, unsigned long arg)
>>>>> +{
>>>>> +	struct dma_buf *dmabuf;
>>>>> +	struct dma_buf_sync sync;
>>>>> +	enum dma_data_direction direction;
>>>>> +
>>>>> +	dmabuf = file->private_data;
>>>>> +
>>>>> +	switch (cmd) {
>>>>> +	case DMA_BUF_IOCTL_SYNC:
>>>>> +		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
>>>>> +			return -EFAULT;
>>>>> +
>>>>> +		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
>>>>> +			return -EINVAL;
>>>>> +
>>>>> +		switch (sync.flags & DMA_BUF_SYNC_RW) {
>>>>> +		case DMA_BUF_SYNC_READ:
>>>>> +			direction = DMA_FROM_DEVICE;
>>>>> +			break;
>>>>> +		case DMA_BUF_SYNC_WRITE:
>>>>> +			direction = DMA_TO_DEVICE;
>>>>> +			break;
>>>>> +		case DMA_BUF_SYNC_RW:
>>>>> +			direction = DMA_BIDIRECTIONAL;
>>>>> +			break;
>>>>> +		default:
>>>>> +			return -EINVAL;
>>>>> +		}
>>>>> +
>>>>> +		if (sync.flags & DMA_BUF_SYNC_END)
>>>>> +			dma_buf_end_cpu_access(dmabuf, direction);
>>>>> +		else
>>>>> +			dma_buf_begin_cpu_access(dmabuf, direction);
>>>>
>>>> We forgot to report the error back to userspace. Might as well fixup the
>>>> callchain to propagate error from end-cpu-access as well. Found after
>>>> updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
>>>
>>> EINTR? Do we need to make this ABI - I guess so? Tiago, do you have
>>> patches? See drmIoctl() in libdrm for what's needed on the userspace side
>>> if my guess is right.
>>
>> EINTR is the easiest, but conceivably we could also get EIO and
>> currently EAGAIN.
>>
>> I am also seeing some strange timing dependent (i.e. valgrind doesn't
>> show up anything client side and the tests then pass) failures (SIGSEGV,
>> SIGBUS) with !llc.
>
> Tiago, ping. Also probably a gap in igt coverage besides the kernel side.
> -Daniel

Hey guys! I'm back from vacation now. I'll take a look at it in the next few
days and then get back to you.

Tiago
Chris Wilson March 15, 2016, 8:51 a.m. UTC | #8
On Mon, Mar 14, 2016 at 05:21:10PM -0300, Tiago Vignatti wrote:
> On 03/05/2016 06:34 AM, Daniel Vetter wrote:
> >On Mon, Feb 29, 2016 at 03:02:09PM +0000, Chris Wilson wrote:
> >>On Mon, Feb 29, 2016 at 03:54:19PM +0100, Daniel Vetter wrote:
> >>>On Thu, Feb 25, 2016 at 06:01:22PM +0000, Chris Wilson wrote:
> >>>>On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
> >>>>>+static long dma_buf_ioctl(struct file *file,
> >>>>>+			  unsigned int cmd, unsigned long arg)
> >>>>>+{
> >>>>>+	struct dma_buf *dmabuf;
> >>>>>+	struct dma_buf_sync sync;
> >>>>>+	enum dma_data_direction direction;
> >>>>>+
> >>>>>+	dmabuf = file->private_data;
> >>>>>+
> >>>>>+	switch (cmd) {
> >>>>>+	case DMA_BUF_IOCTL_SYNC:
> >>>>>+		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
> >>>>>+			return -EFAULT;
> >>>>>+
> >>>>>+		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
> >>>>>+			return -EINVAL;
> >>>>>+
> >>>>>+		switch (sync.flags & DMA_BUF_SYNC_RW) {
> >>>>>+		case DMA_BUF_SYNC_READ:
> >>>>>+			direction = DMA_FROM_DEVICE;
> >>>>>+			break;
> >>>>>+		case DMA_BUF_SYNC_WRITE:
> >>>>>+			direction = DMA_TO_DEVICE;
> >>>>>+			break;
> >>>>>+		case DMA_BUF_SYNC_RW:
> >>>>>+			direction = DMA_BIDIRECTIONAL;
> >>>>>+			break;
> >>>>>+		default:
> >>>>>+			return -EINVAL;
> >>>>>+		}
> >>>>>+
> >>>>>+		if (sync.flags & DMA_BUF_SYNC_END)
> >>>>>+			dma_buf_end_cpu_access(dmabuf, direction);
> >>>>>+		else
> >>>>>+			dma_buf_begin_cpu_access(dmabuf, direction);
> >>>>
> >>>>We forgot to report the error back to userspace. Might as well fixup the
> >>>>callchain to propagate error from end-cpu-access as well. Found after
> >>>>updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
> >>>
> >>>EINTR? Do we need to make this ABI - I guess so? Tiago, do you have
> >>>patches? See drmIoctl() in libdrm for what's needed on the userspace side
> >>>if my guess is right.
> >>
> >>EINTR is the easiest, but conceivably we could also get EIO and
> >>currently EAGAIN.
> >>
> >>I am also seeing some strange timing dependent (i.e. valgrind doesn't
> >>show up anything client side and the tests then pass) failures (SIGSEGV,
> >>SIGBUS) with !llc.
> >
> >Tiago, ping. Also probably a gap in igt coverage besides the kernel side.
> >-Daniel
> 
> Hey guys! I'm back from vacation now. I'll take a look on it in the
> next days and then come back to you.

An unpolished version:
https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=breadcrumbs&id=b83483e87438f870d8209a44a1843235555a1d54
-Chris

Patch

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 4f4a84b..32ac32e 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -350,7 +350,26 @@  Being able to mmap an export dma-buf buffer object has 2 main use-cases:
    handles, too). So it's beneficial to support this in a similar fashion on
    dma-buf to have a good transition path for existing Android userspace.
 
-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd, making
+   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
+   used when the access happens. This is discussed next paragraphs.
+
+   Some systems might need some sort of cache coherency management e.g. when
+   CPU and GPU domains are being accessed through dma-buf at the same time. To
+   circumvent this problem there are begin/end coherency markers, that forward
+   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+   would be used like following:
+     - mmap dma-buf fd
+     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+       want (with the new data being consumed by the GPU or say scanout device)
+     - munmap once you don't need the buffer any more
+
+    Therefore, for correctness and optimal performance, systems with the memory
+    cache shared by the GPU and CPU i.e. the "coherent" and also the
+    "incoherent" are always required to use SYNC_START and SYNC_END before and
+    after, respectively, when accessing the mapped address.
 
 2. Supporting existing mmap interfaces in importers
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b2ac13b..9810d1d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@ 
 #include <linux/poll.h>
 #include <linux/reservation.h>
 
+#include <uapi/linux/dma-buf.h>
+
 static inline int is_dma_buf_file(struct file *);
 
 struct dma_buf_list {
@@ -251,11 +253,54 @@  out:
 	return events;
 }
 
+static long dma_buf_ioctl(struct file *file,
+			  unsigned int cmd, unsigned long arg)
+{
+	struct dma_buf *dmabuf;
+	struct dma_buf_sync sync;
+	enum dma_data_direction direction;
+
+	dmabuf = file->private_data;
+
+	switch (cmd) {
+	case DMA_BUF_IOCTL_SYNC:
+		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+			return -EFAULT;
+
+		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+			return -EINVAL;
+
+		switch (sync.flags & DMA_BUF_SYNC_RW) {
+		case DMA_BUF_SYNC_READ:
+			direction = DMA_FROM_DEVICE;
+			break;
+		case DMA_BUF_SYNC_WRITE:
+			direction = DMA_TO_DEVICE;
+			break;
+		case DMA_BUF_SYNC_RW:
+			direction = DMA_BIDIRECTIONAL;
+			break;
+		default:
+			return -EINVAL;
+		}
+
+		if (sync.flags & DMA_BUF_SYNC_END)
+			dma_buf_end_cpu_access(dmabuf, direction);
+		else
+			dma_buf_begin_cpu_access(dmabuf, direction);
+
+		return 0;
+	default:
+		return -ENOTTY;
+	}
+}
+
 static const struct file_operations dma_buf_fops = {
 	.release	= dma_buf_release,
 	.mmap		= dma_buf_mmap_internal,
 	.llseek		= dma_buf_llseek,
 	.poll		= dma_buf_poll,
+	.unlocked_ioctl	= dma_buf_ioctl,
 };
 
 /*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
new file mode 100644
index 0000000..fb0dedb
--- /dev/null
+++ b/include/uapi/linux/dma-buf.h
@@ -0,0 +1,40 @@ 
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2015 Intel Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _DMA_BUF_UAPI_H_
+#define _DMA_BUF_UAPI_H_
+
+#include <linux/types.h>
+
+/* begin/end dma-buf functions used for userspace mmap. */
+struct dma_buf_sync {
+	__u64 flags;
+};
+
+#define DMA_BUF_SYNC_READ      (1 << 0)
+#define DMA_BUF_SYNC_WRITE     (2 << 0)
+#define DMA_BUF_SYNC_RW        (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
+#define DMA_BUF_SYNC_START     (0 << 2)
+#define DMA_BUF_SYNC_END       (1 << 2)
+#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
+	(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+
+#define DMA_BUF_BASE		'b'
+#define DMA_BUF_IOCTL_SYNC	_IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
+
+#endif