[RFC,13/15] videobuf2: do not sync buffers for DMABUF queues
diff mbox series

Message ID 20191217032034.54897-14-senozhatsky@chromium.org
State New
Headers show
Series
  • Implement V4L2_BUF_FLAG_NO_CACHE_* flags
Related show

Commit Message

Sergey Senozhatsky Dec. 17, 2019, 3:20 a.m. UTC
DMA-exporter is supposed to handle cache syncs, so we can
avoid ->prepare()/->finish() syncs from videobuf2 core for
DMABUF buffers.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/media/common/videobuf2/videobuf2-v4l2.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

Comments

Hans Verkuil Jan. 10, 2020, 10:30 a.m. UTC | #1
On 12/17/19 4:20 AM, Sergey Senozhatsky wrote:
> DMA-exporter is supposed to handle cache syncs, so we can
> avoid ->prepare()/->finish() syncs from videobuf2 core for
> DMABUF buffers.
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  drivers/media/common/videobuf2/videobuf2-v4l2.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> index 1762849288ae..2b9d3318e6fb 100644
> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> @@ -341,8 +341,22 @@ static void set_buffer_cache_hints(struct vb2_queue *q,
>  				   struct vb2_buffer *vb,
>  				   struct v4l2_buffer *b)
>  {
> -	vb->need_cache_sync_on_prepare = 1;
> +	/*
> +	 * DMA exporter should take care of cache syncs, so we can avoid
> +	 * explicit ->prepare()/->finish() syncs.
> +	 */
> +	if (q->memory == VB2_MEMORY_DMABUF) {
> +		vb->need_cache_sync_on_finish = 0;
> +		vb->need_cache_sync_on_prepare = 0;
> +		return;
> +	}
>  
> +	/*
> +	 * For other ->memory types we always need ->prepare() cache
> +	 * sync. ->finish() cache sync, however, can be avoided when queue
> +	 * direction is TO_DEVICE.
> +	 */
> +	vb->need_cache_sync_on_prepare = 1;

I'm trying to remember: what needs to be done in prepare()
for a capture buffer? I thought that for capture you only
needed to invalidate the cache in finish(), but nothing needs
to be done in the prepare().

Regards,

	Hans

>  	if (q->dma_dir != DMA_TO_DEVICE)
>  		vb->need_cache_sync_on_finish = 1;
>  	else
>
Sergey Senozhatsky Jan. 22, 2020, 5:05 a.m. UTC | #2
On (20/01/10 11:30), Hans Verkuil wrote:
[..]
> > diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> > index 1762849288ae..2b9d3318e6fb 100644
> > --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
> > +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> > @@ -341,8 +341,22 @@ static void set_buffer_cache_hints(struct vb2_queue *q,
> >  				   struct vb2_buffer *vb,
> >  				   struct v4l2_buffer *b)
> >  {
> > -	vb->need_cache_sync_on_prepare = 1;
> > +	/*
> > +	 * DMA exporter should take care of cache syncs, so we can avoid
> > +	 * explicit ->prepare()/->finish() syncs.
> > +	 */
> > +	if (q->memory == VB2_MEMORY_DMABUF) {
> > +		vb->need_cache_sync_on_finish = 0;
> > +		vb->need_cache_sync_on_prepare = 0;
> > +		return;
> > +	}
> >  
> > +	/*
> > +	 * For other ->memory types we always need ->prepare() cache
> > +	 * sync. ->finish() cache sync, however, can be avoided when queue
> > +	 * direction is TO_DEVICE.
> > +	 */
> > +	vb->need_cache_sync_on_prepare = 1;
> 
> I'm trying to remember: what needs to be done in prepare()
> for a capture buffer? I thought that for capture you only
> needed to invalidate the cache in finish(), but nothing needs
> to be done in the prepare().

Hmm. Not sure. A precaution in case if user-space wrote to that buffer?

+	if (q->dma_dir == DMA_FROM_DEVICE)
+		q->need_cache_sync_on_prepare = 0;

?

	-ss
Hans Verkuil Jan. 23, 2020, 11:35 a.m. UTC | #3
On 1/22/20 6:05 AM, Sergey Senozhatsky wrote:
> On (20/01/10 11:30), Hans Verkuil wrote:
> [..]
>>> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
>>> index 1762849288ae..2b9d3318e6fb 100644
>>> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
>>> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
>>> @@ -341,8 +341,22 @@ static void set_buffer_cache_hints(struct vb2_queue *q,
>>>  				   struct vb2_buffer *vb,
>>>  				   struct v4l2_buffer *b)
>>>  {
>>> -	vb->need_cache_sync_on_prepare = 1;
>>> +	/*
>>> +	 * DMA exporter should take care of cache syncs, so we can avoid
>>> +	 * explicit ->prepare()/->finish() syncs.
>>> +	 */
>>> +	if (q->memory == VB2_MEMORY_DMABUF) {
>>> +		vb->need_cache_sync_on_finish = 0;
>>> +		vb->need_cache_sync_on_prepare = 0;
>>> +		return;
>>> +	}
>>>  
>>> +	/*
>>> +	 * For other ->memory types we always need ->prepare() cache
>>> +	 * sync. ->finish() cache sync, however, can be avoided when queue
>>> +	 * direction is TO_DEVICE.
>>> +	 */
>>> +	vb->need_cache_sync_on_prepare = 1;
>>
>> I'm trying to remember: what needs to be done in prepare()
>> for a capture buffer? I thought that for capture you only
>> needed to invalidate the cache in finish(), but nothing needs
>> to be done in the prepare().
> 
> Hmm. Not sure. A precaution in case if user-space wrote to that buffer?

But whatever was written in the buffer is going to be overwritten anyway.

Unless I am mistaken the current situation is that the cache syncs are done
in both prepare and finish, regardless of the DMA direction.

I would keep that behavior to avoid introducing any unexpected regressions.

Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean)
in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the
finish for CAPTURE buffers.

This also means that any drivers that want to access a buffer in between the
prepare...finish calls will need to do a begin/end_cpu_access. But that's a
separate matter.

Regards,

	Hans

> 
> +	if (q->dma_dir == DMA_FROM_DEVICE)
> +		q->need_cache_sync_on_prepare = 0;
> 
> ?
> 
> 	-ss
>
Sergey Senozhatsky Jan. 24, 2020, 2:25 a.m. UTC | #4
On (20/01/23 12:35), Hans Verkuil wrote:
> On 1/22/20 6:05 AM, Sergey Senozhatsky wrote:
> > On (20/01/10 11:30), Hans Verkuil wrote:

[..]

> But whatever was written in the buffer is going to be overwritten anyway.
> 
> Unless I am mistaken the current situation is that the cache syncs are done
> in both prepare and finish, regardless of the DMA direction.
> 
> I would keep that behavior to avoid introducing any unexpected regressions.

OK.

> Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean)
> in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the
> finish for CAPTURE buffers.

We alter default cache sync behaviour based both on queue ->memory
type and queue ->dma_dir. Shall both of those cases depend on
->allow_cache_hints, or q->memory can be independent?

static void set_buffer_cache_hints(struct vb2_queue *q,
				   struct vb2_buffer *vb,
				   struct v4l2_buffer *b)
{
	if (!q->allow_cache_hints)
		return;

	/*
	 * DMA exporter should take care of cache syncs, so we can avoid
	 * explicit ->prepare()/->finish() syncs. For other ->memory types
	 * we always need ->prepare() or/and ->finish() cache sync.
	 */
	if (q->memory == VB2_MEMORY_DMABUF) {
		vb->need_cache_sync_on_finish = 0;
		vb->need_cache_sync_on_prepare = 0;
		return;
	}

	/*
	 * ->finish() cache sync can be avoided when queue direction is
	 * TO_DEVICE.
	 */
	if (q->dma_dir == DMA_TO_DEVICE)
		vb->need_cache_sync_on_finish = 0;
	else
		vb->need_cache_sync_on_finish = 1;

	/*
	 * ->prepare() cache sync can be avoided when queue direction is
	 * FROM_DEVICE.
	 */
	if (q->dma_dir == DMA_FROM_DEVICE)
		vb->need_cache_sync_on_prepare = 0;
	else
		vb->need_cache_sync_on_prepare = 1;

	if (b->flags & V4L2_BUF_FLAG_NO_CACHE_INVALIDATE)
		vb->need_cache_sync_on_finish = 0;

	if (b->flags & V4L2_BUF_FLAG_NO_CACHE_CLEAN)
		vb->need_cache_sync_on_prepare = 0;
}
Sergey Senozhatsky Jan. 24, 2020, 7:32 a.m. UTC | #5
On (20/01/24 11:25), Sergey Senozhatsky wrote:
[..]
>
> > Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean)
> > in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the
> > finish for CAPTURE buffers.
>
> We alter default cache sync behaviour based both on queue ->memory
> type and queue ->dma_dir. Shall both of those cases depend on
> ->allow_cache_hints, or q->memory can be independent?
>
> static void set_buffer_cache_hints(struct vb2_queue *q,
> 				   struct vb2_buffer *vb,
> 				   struct v4l2_buffer *b)
> {
> 	if (!q->allow_cache_hints)
> 		return;
>
> 	/*
> 	 * DMA exporter should take care of cache syncs, so we can avoid
> 	 * explicit ->prepare()/->finish() syncs. For other ->memory types
> 	 * we always need ->prepare() or/and ->finish() cache sync.
> 	 */
> 	if (q->memory == VB2_MEMORY_DMABUF) {
> 		vb->need_cache_sync_on_finish = 0;
> 		vb->need_cache_sync_on_prepare = 0;
> 		return;
> 	}

I personally would prefer this to be above ->allow_cache_hints check,
IOW to be independent. Because we need to skip flush/invalidation for
DMABUF anyway, that's what allocators do in ->prepare() and ->finish()
currently.

	-ss
Tomasz Figa Jan. 28, 2020, 7:19 a.m. UTC | #6
On Thu, Jan 23, 2020 at 8:35 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 1/22/20 6:05 AM, Sergey Senozhatsky wrote:
> > On (20/01/10 11:30), Hans Verkuil wrote:
> > [..]
> >>> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> >>> index 1762849288ae..2b9d3318e6fb 100644
> >>> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
> >>> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> >>> @@ -341,8 +341,22 @@ static void set_buffer_cache_hints(struct vb2_queue *q,
> >>>                                struct vb2_buffer *vb,
> >>>                                struct v4l2_buffer *b)
> >>>  {
> >>> -   vb->need_cache_sync_on_prepare = 1;
> >>> +   /*
> >>> +    * DMA exporter should take care of cache syncs, so we can avoid
> >>> +    * explicit ->prepare()/->finish() syncs.
> >>> +    */
> >>> +   if (q->memory == VB2_MEMORY_DMABUF) {
> >>> +           vb->need_cache_sync_on_finish = 0;
> >>> +           vb->need_cache_sync_on_prepare = 0;
> >>> +           return;
> >>> +   }
> >>>
> >>> +   /*
> >>> +    * For other ->memory types we always need ->prepare() cache
> >>> +    * sync. ->finish() cache sync, however, can be avoided when queue
> >>> +    * direction is TO_DEVICE.
> >>> +    */
> >>> +   vb->need_cache_sync_on_prepare = 1;
> >>
> >> I'm trying to remember: what needs to be done in prepare()
> >> for a capture buffer? I thought that for capture you only
> >> needed to invalidate the cache in finish(), but nothing needs
> >> to be done in the prepare().
> >
> > Hmm. Not sure. A precaution in case if user-space wrote to that buffer?
>
> But whatever was written in the buffer is going to be overwritten anyway.
>
> Unless I am mistaken the current situation is that the cache syncs are done
> in both prepare and finish, regardless of the DMA direction.
>
> I would keep that behavior to avoid introducing any unexpected regressions.
>

It wouldn't be surprising if the buffer was first filled with default
values (e.g. all zeroes) on the CPU. That would make the cache lines
dirty and they could overwrite what the device writes. So we need to
flush (aka clean) the write-back caches on prepare for CAPTURE
buffers.

> Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean)
> in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the
> finish for CAPTURE buffers.

I'd still default to the existing behavior even if allow_cache_hint is
set, because of what I wrote above. Then if the userspace doesn't ever
write to the buffers, it can request no flush/clean by setting the
V4L2_BUF_FLAG_NO_CACHE_CLEAN flag when queuing the buffer.

>
> This also means that any drivers that want to access a buffer in between the
> prepare...finish calls will need to do a begin/end_cpu_access. But that's a
> separate matter.

AFAIR with current design of the series, the drivers can opt-in for
userspace cache sync hints, so by default even if the userspace
requests sync to be skipped, it wouldn't have any effect unless the
driver allows so. Then I'd imagine migrating all the drivers to
request clean/invalidate explicitly. Note that the DMA-buf
begin/end_cpu_access isn't enough here. We'd need something like
vb2_begin/end_cpu_access() which also takes care of syncing
inconsistent MMAP and USERPTR buffers.

>
> Regards,
>
>         Hans
>
> >
> > +     if (q->dma_dir == DMA_FROM_DEVICE)
> > +             q->need_cache_sync_on_prepare = 0;
> >
> > ?
> >
> >       -ss
> >
>
Tomasz Figa Jan. 28, 2020, 7:22 a.m. UTC | #7
On Fri, Jan 24, 2020 at 4:32 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (20/01/24 11:25), Sergey Senozhatsky wrote:
> [..]
> >
> > > Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean)
> > > in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the
> > > finish for CAPTURE buffers.
> >
> > We alter default cache sync behaviour based both on queue ->memory
> > type and queue ->dma_dir. Shall both of those cases depend on
> > ->allow_cache_hints, or q->memory can be independent?
> >
> > static void set_buffer_cache_hints(struct vb2_queue *q,
> >                                  struct vb2_buffer *vb,
> >                                  struct v4l2_buffer *b)
> > {
> >       if (!q->allow_cache_hints)
> >               return;
> >
> >       /*
> >        * DMA exporter should take care of cache syncs, so we can avoid
> >        * explicit ->prepare()/->finish() syncs. For other ->memory types
> >        * we always need ->prepare() or/and ->finish() cache sync.
> >        */
> >       if (q->memory == VB2_MEMORY_DMABUF) {
> >               vb->need_cache_sync_on_finish = 0;
> >               vb->need_cache_sync_on_prepare = 0;
> >               return;
> >       }
>
> I personally would prefer this to be above ->allow_cache_hints check,
> IOW to be independent. Because we need to skip flush/invalidation for
> DMABUF anyway, that's what allocators do in ->prepare() and ->finish()
> currently.

I think it's even more than personal preference. We shouldn't even
attempt to sync imported DMA-bufs, so the code above may result in
incorrect behavior.
Sergey Senozhatsky Jan. 28, 2020, 7:34 a.m. UTC | #8
On (20/01/28 16:22), Tomasz Figa wrote:
[..]
> > > > Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean)
> > > > in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the
> > > > finish for CAPTURE buffers.
> > >
> > > We alter default cache sync behaviour based both on queue ->memory
> > > type and queue ->dma_dir. Shall both of those cases depend on
> > > ->allow_cache_hints, or q->memory can be independent?
> > >
> > > static void set_buffer_cache_hints(struct vb2_queue *q,
> > >                                  struct vb2_buffer *vb,
> > >                                  struct v4l2_buffer *b)
> > > {
> > >       if (!q->allow_cache_hints)
> > >               return;
> > >
> > >       /*
> > >        * DMA exporter should take care of cache syncs, so we can avoid
> > >        * explicit ->prepare()/->finish() syncs. For other ->memory types
> > >        * we always need ->prepare() or/and ->finish() cache sync.
> > >        */
> > >       if (q->memory == VB2_MEMORY_DMABUF) {
> > >               vb->need_cache_sync_on_finish = 0;
> > >               vb->need_cache_sync_on_prepare = 0;
> > >               return;
> > >       }
> >
> > I personally would prefer this to be above ->allow_cache_hints check,
> > IOW to be independent. Because we need to skip flush/invalidation for
> > DMABUF anyway, that's what allocators do in ->prepare() and ->finish()
> > currently.
>
> I think it's even more than personal preference. We shouldn't even
> attempt to sync imported DMA-bufs, so the code above may result in
> incorrect behavior.

Sure. Allocators test buf->db_attach in prepare()/finish(). This patchset
removes that logic, because we move it to the core code. But if the
conclusion will be to clear ->need_cache_sync_on_finish/prepare only
when ->allow_cache_hints was set, then buf->db_attach bail out must stay
in allocators. And this is where I have preferences :)

	-ss
Hans Verkuil Jan. 28, 2020, 8:47 a.m. UTC | #9
On 1/28/20 8:19 AM, Tomasz Figa wrote:
> On Thu, Jan 23, 2020 at 8:35 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>
>> On 1/22/20 6:05 AM, Sergey Senozhatsky wrote:
>>> On (20/01/10 11:30), Hans Verkuil wrote:
>>> [..]
>>>>> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
>>>>> index 1762849288ae..2b9d3318e6fb 100644
>>>>> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
>>>>> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
>>>>> @@ -341,8 +341,22 @@ static void set_buffer_cache_hints(struct vb2_queue *q,
>>>>>                                struct vb2_buffer *vb,
>>>>>                                struct v4l2_buffer *b)
>>>>>  {
>>>>> -   vb->need_cache_sync_on_prepare = 1;
>>>>> +   /*
>>>>> +    * DMA exporter should take care of cache syncs, so we can avoid
>>>>> +    * explicit ->prepare()/->finish() syncs.
>>>>> +    */
>>>>> +   if (q->memory == VB2_MEMORY_DMABUF) {
>>>>> +           vb->need_cache_sync_on_finish = 0;
>>>>> +           vb->need_cache_sync_on_prepare = 0;
>>>>> +           return;
>>>>> +   }
>>>>>
>>>>> +   /*
>>>>> +    * For other ->memory types we always need ->prepare() cache
>>>>> +    * sync. ->finish() cache sync, however, can be avoided when queue
>>>>> +    * direction is TO_DEVICE.
>>>>> +    */
>>>>> +   vb->need_cache_sync_on_prepare = 1;
>>>>
>>>> I'm trying to remember: what needs to be done in prepare()
>>>> for a capture buffer? I thought that for capture you only
>>>> needed to invalidate the cache in finish(), but nothing needs
>>>> to be done in the prepare().
>>>
>>> Hmm. Not sure. A precaution in case if user-space wrote to that buffer?
>>
>> But whatever was written in the buffer is going to be overwritten anyway.
>>
>> Unless I am mistaken the current situation is that the cache syncs are done
>> in both prepare and finish, regardless of the DMA direction.
>>
>> I would keep that behavior to avoid introducing any unexpected regressions.
>>
> 
> It wouldn't be surprising if the buffer was first filled with default
> values (e.g. all zeroes) on the CPU. That would make the cache lines
> dirty and they could overwrite what the device writes. So we need to
> flush (aka clean) the write-back caches on prepare for CAPTURE
> buffers.

Very true, I'd forgotten about that. This should be documented in the userspace
documentation. And possible in this function as well.

I think these issues are complex enough that there is no such things as too
much documentation :-)

> 
>> Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean)
>> in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the
>> finish for CAPTURE buffers.
> 
> I'd still default to the existing behavior even if allow_cache_hint is
> set, because of what I wrote above. Then if the userspace doesn't ever
> write to the buffers, it can request no flush/clean by setting the
> V4L2_BUF_FLAG_NO_CACHE_CLEAN flag when queuing the buffer.
> 
>>
>> This also means that any drivers that want to access a buffer in between the
>> prepare...finish calls will need to do a begin/end_cpu_access. But that's a
>> separate matter.
> 
> AFAIR with current design of the series, the drivers can opt-in for
> userspace cache sync hints, so by default even if the userspace
> requests sync to be skipped, it wouldn't have any effect unless the
> driver allows so. Then I'd imagine migrating all the drivers to
> request clean/invalidate explicitly. Note that the DMA-buf
> begin/end_cpu_access isn't enough here. We'd need something like
> vb2_begin/end_cpu_access() which also takes care of syncing
> inconsistent MMAP and USERPTR buffers.

I did just that in my old git branch for this:

https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=vb2-cpu-access

Regards,

	Hans

> 
>>
>> Regards,
>>
>>         Hans
>>
>>>
>>> +     if (q->dma_dir == DMA_FROM_DEVICE)
>>> +             q->need_cache_sync_on_prepare = 0;
>>>
>>> ?
>>>
>>>       -ss
>>>
>>

Patch
diff mbox series

diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
index 1762849288ae..2b9d3318e6fb 100644
--- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
+++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
@@ -341,8 +341,22 @@  static void set_buffer_cache_hints(struct vb2_queue *q,
 				   struct vb2_buffer *vb,
 				   struct v4l2_buffer *b)
 {
-	vb->need_cache_sync_on_prepare = 1;
+	/*
+	 * DMA exporter should take care of cache syncs, so we can avoid
+	 * explicit ->prepare()/->finish() syncs.
+	 */
+	if (q->memory == VB2_MEMORY_DMABUF) {
+		vb->need_cache_sync_on_finish = 0;
+		vb->need_cache_sync_on_prepare = 0;
+		return;
+	}
 
+	/*
+	 * For other ->memory types we always need ->prepare() cache
+	 * sync. ->finish() cache sync, however, can be avoided when queue
+	 * direction is TO_DEVICE.
+	 */
+	vb->need_cache_sync_on_prepare = 1;
 	if (q->dma_dir != DMA_TO_DEVICE)
 		vb->need_cache_sync_on_finish = 1;
 	else