Message ID | 20191217032034.54897-14-senozhatsky@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Implement V4L2_BUF_FLAG_NO_CACHE_* flags | expand |
On 12/17/19 4:20 AM, Sergey Senozhatsky wrote: > DMA-exporter is supposed to handle cache syncs, so we can > avoid ->prepare()/->finish() syncs from videobuf2 core for > DMABUF buffers. > > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> > --- > drivers/media/common/videobuf2/videobuf2-v4l2.c | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c > index 1762849288ae..2b9d3318e6fb 100644 > --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c > +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c > @@ -341,8 +341,22 @@ static void set_buffer_cache_hints(struct vb2_queue *q, > struct vb2_buffer *vb, > struct v4l2_buffer *b) > { > - vb->need_cache_sync_on_prepare = 1; > + /* > + * DMA exporter should take care of cache syncs, so we can avoid > + * explicit ->prepare()/->finish() syncs. > + */ > + if (q->memory == VB2_MEMORY_DMABUF) { > + vb->need_cache_sync_on_finish = 0; > + vb->need_cache_sync_on_prepare = 0; > + return; > + } > > + /* > + * For other ->memory types we always need ->prepare() cache > + * sync. ->finish() cache sync, however, can be avoided when queue > + * direction is TO_DEVICE. > + */ > + vb->need_cache_sync_on_prepare = 1; I'm trying to remember: what needs to be done in prepare() for a capture buffer? I thought that for capture you only needed to invalidate the cache in finish(), but nothing needs to be done in the prepare(). Regards, Hans > if (q->dma_dir != DMA_TO_DEVICE) > vb->need_cache_sync_on_finish = 1; > else >
On (20/01/10 11:30), Hans Verkuil wrote: [..] > > diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c > > index 1762849288ae..2b9d3318e6fb 100644 > > --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c > > +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c > > @@ -341,8 +341,22 @@ static void set_buffer_cache_hints(struct vb2_queue *q, > > struct vb2_buffer *vb, > > struct v4l2_buffer *b) > > { > > - vb->need_cache_sync_on_prepare = 1; > > + /* > > + * DMA exporter should take care of cache syncs, so we can avoid > > + * explicit ->prepare()/->finish() syncs. > > + */ > > + if (q->memory == VB2_MEMORY_DMABUF) { > > + vb->need_cache_sync_on_finish = 0; > > + vb->need_cache_sync_on_prepare = 0; > > + return; > > + } > > > > + /* > > + * For other ->memory types we always need ->prepare() cache > > + * sync. ->finish() cache sync, however, can be avoided when queue > > + * direction is TO_DEVICE. > > + */ > > + vb->need_cache_sync_on_prepare = 1; > > I'm trying to remember: what needs to be done in prepare() > for a capture buffer? I thought that for capture you only > needed to invalidate the cache in finish(), but nothing needs > to be done in the prepare(). Hmm. Not sure. A precaution in case if user-space wrote to that buffer? + if (q->dma_dir == DMA_FROM_DEVICE) + q->need_cache_sync_on_prepare = 0; ? -ss
On 1/22/20 6:05 AM, Sergey Senozhatsky wrote: > On (20/01/10 11:30), Hans Verkuil wrote: > [..] >>> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c >>> index 1762849288ae..2b9d3318e6fb 100644 >>> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c >>> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c >>> @@ -341,8 +341,22 @@ static void set_buffer_cache_hints(struct vb2_queue *q, >>> struct vb2_buffer *vb, >>> struct v4l2_buffer *b) >>> { >>> - vb->need_cache_sync_on_prepare = 1; >>> + /* >>> + * DMA exporter should take care of cache syncs, so we can avoid >>> + * explicit ->prepare()/->finish() syncs. >>> + */ >>> + if (q->memory == VB2_MEMORY_DMABUF) { >>> + vb->need_cache_sync_on_finish = 0; >>> + vb->need_cache_sync_on_prepare = 0; >>> + return; >>> + } >>> >>> + /* >>> + * For other ->memory types we always need ->prepare() cache >>> + * sync. ->finish() cache sync, however, can be avoided when queue >>> + * direction is TO_DEVICE. >>> + */ >>> + vb->need_cache_sync_on_prepare = 1; >> >> I'm trying to remember: what needs to be done in prepare() >> for a capture buffer? I thought that for capture you only >> needed to invalidate the cache in finish(), but nothing needs >> to be done in the prepare(). > > Hmm. Not sure. A precaution in case if user-space wrote to that buffer? But whatever was written in the buffer is going to be overwritten anyway. Unless I am mistaken the current situation is that the cache syncs are done in both prepare and finish, regardless of the DMA direction. I would keep that behavior to avoid introducing any unexpected regressions. Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean) in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the finish for CAPTURE buffers. This also means that any drivers that want to access a buffer in between the prepare...finish calls will need to do a begin/end_cpu_access. But that's a separate matter. Regards, Hans > > + if (q->dma_dir == DMA_FROM_DEVICE) > + q->need_cache_sync_on_prepare = 0; > > ? > > -ss >
On (20/01/23 12:35), Hans Verkuil wrote: > On 1/22/20 6:05 AM, Sergey Senozhatsky wrote: > > On (20/01/10 11:30), Hans Verkuil wrote: [..] > But whatever was written in the buffer is going to be overwritten anyway. > > Unless I am mistaken the current situation is that the cache syncs are done > in both prepare and finish, regardless of the DMA direction. > > I would keep that behavior to avoid introducing any unexpected regressions. OK. > Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean) > in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the > finish for CAPTURE buffers. We alter default cache sync behaviour based both on queue ->memory type and queue ->dma_dir. Shall both of those cases depend on ->allow_cache_hints, or q->memory can be independent? static void set_buffer_cache_hints(struct vb2_queue *q, struct vb2_buffer *vb, struct v4l2_buffer *b) { if (!q->allow_cache_hints) return; /* * DMA exporter should take care of cache syncs, so we can avoid * explicit ->prepare()/->finish() syncs. For other ->memory types * we always need ->prepare() or/and ->finish() cache sync. */ if (q->memory == VB2_MEMORY_DMABUF) { vb->need_cache_sync_on_finish = 0; vb->need_cache_sync_on_prepare = 0; return; } /* * ->finish() cache sync can be avoided when queue direction is * TO_DEVICE. */ if (q->dma_dir == DMA_TO_DEVICE) vb->need_cache_sync_on_finish = 0; else vb->need_cache_sync_on_finish = 1; /* * ->prepare() cache sync can be avoided when queue direction is * FROM_DEVICE. */ if (q->dma_dir == DMA_FROM_DEVICE) vb->need_cache_sync_on_prepare = 0; else vb->need_cache_sync_on_prepare = 1; if (b->flags & V4L2_BUF_FLAG_NO_CACHE_INVALIDATE) vb->need_cache_sync_on_finish = 0; if (b->flags & V4L2_BUF_FLAG_NO_CACHE_CLEAN) vb->need_cache_sync_on_prepare = 0; }
On (20/01/24 11:25), Sergey Senozhatsky wrote: [..] > > > Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean) > > in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the > > finish for CAPTURE buffers. > > We alter default cache sync behaviour based both on queue ->memory > type and queue ->dma_dir. Shall both of those cases depend on > ->allow_cache_hints, or q->memory can be independent? > > static void set_buffer_cache_hints(struct vb2_queue *q, > struct vb2_buffer *vb, > struct v4l2_buffer *b) > { > if (!q->allow_cache_hints) > return; > > /* > * DMA exporter should take care of cache syncs, so we can avoid > * explicit ->prepare()/->finish() syncs. For other ->memory types > * we always need ->prepare() or/and ->finish() cache sync. > */ > if (q->memory == VB2_MEMORY_DMABUF) { > vb->need_cache_sync_on_finish = 0; > vb->need_cache_sync_on_prepare = 0; > return; > } I personally would prefer this to be above ->allow_cache_hints check, IOW to be independent. Because we need to skip flush/invalidation for DMABUF anyway, that's what allocators do in ->prepare() and ->finish() currently. -ss
On Thu, Jan 23, 2020 at 8:35 PM Hans Verkuil <hverkuil@xs4all.nl> wrote: > > On 1/22/20 6:05 AM, Sergey Senozhatsky wrote: > > On (20/01/10 11:30), Hans Verkuil wrote: > > [..] > >>> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c > >>> index 1762849288ae..2b9d3318e6fb 100644 > >>> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c > >>> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c > >>> @@ -341,8 +341,22 @@ static void set_buffer_cache_hints(struct vb2_queue *q, > >>> struct vb2_buffer *vb, > >>> struct v4l2_buffer *b) > >>> { > >>> - vb->need_cache_sync_on_prepare = 1; > >>> + /* > >>> + * DMA exporter should take care of cache syncs, so we can avoid > >>> + * explicit ->prepare()/->finish() syncs. > >>> + */ > >>> + if (q->memory == VB2_MEMORY_DMABUF) { > >>> + vb->need_cache_sync_on_finish = 0; > >>> + vb->need_cache_sync_on_prepare = 0; > >>> + return; > >>> + } > >>> > >>> + /* > >>> + * For other ->memory types we always need ->prepare() cache > >>> + * sync. ->finish() cache sync, however, can be avoided when queue > >>> + * direction is TO_DEVICE. > >>> + */ > >>> + vb->need_cache_sync_on_prepare = 1; > >> > >> I'm trying to remember: what needs to be done in prepare() > >> for a capture buffer? I thought that for capture you only > >> needed to invalidate the cache in finish(), but nothing needs > >> to be done in the prepare(). > > > > Hmm. Not sure. A precaution in case if user-space wrote to that buffer? > > But whatever was written in the buffer is going to be overwritten anyway. > > Unless I am mistaken the current situation is that the cache syncs are done > in both prepare and finish, regardless of the DMA direction. > > I would keep that behavior to avoid introducing any unexpected regressions. > It wouldn't be surprising if the buffer was first filled with default values (e.g. all zeroes) on the CPU. That would make the cache lines dirty and they could overwrite what the device writes. So we need to flush (aka clean) the write-back caches on prepare for CAPTURE buffers. > Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean) > in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the > finish for CAPTURE buffers. I'd still default to the existing behavior even if allow_cache_hint is set, because of what I wrote above. Then if the userspace doesn't ever write to the buffers, it can request no flush/clean by setting the V4L2_BUF_FLAG_NO_CACHE_CLEAN flag when queuing the buffer. > > This also means that any drivers that want to access a buffer in between the > prepare...finish calls will need to do a begin/end_cpu_access. But that's a > separate matter. AFAIR with current design of the series, the drivers can opt-in for userspace cache sync hints, so by default even if the userspace requests sync to be skipped, it wouldn't have any effect unless the driver allows so. Then I'd imagine migrating all the drivers to request clean/invalidate explicitly. Note that the DMA-buf begin/end_cpu_access isn't enough here. We'd need something like vb2_begin/end_cpu_access() which also takes care of syncing inconsistent MMAP and USERPTR buffers. > > Regards, > > Hans > > > > > + if (q->dma_dir == DMA_FROM_DEVICE) > > + q->need_cache_sync_on_prepare = 0; > > > > ? > > > > -ss > > >
On Fri, Jan 24, 2020 at 4:32 PM Sergey Senozhatsky <senozhatsky@chromium.org> wrote: > > On (20/01/24 11:25), Sergey Senozhatsky wrote: > [..] > > > > > Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean) > > > in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the > > > finish for CAPTURE buffers. > > > > We alter default cache sync behaviour based both on queue ->memory > > type and queue ->dma_dir. Shall both of those cases depend on > > ->allow_cache_hints, or q->memory can be independent? > > > > static void set_buffer_cache_hints(struct vb2_queue *q, > > struct vb2_buffer *vb, > > struct v4l2_buffer *b) > > { > > if (!q->allow_cache_hints) > > return; > > > > /* > > * DMA exporter should take care of cache syncs, so we can avoid > > * explicit ->prepare()/->finish() syncs. For other ->memory types > > * we always need ->prepare() or/and ->finish() cache sync. > > */ > > if (q->memory == VB2_MEMORY_DMABUF) { > > vb->need_cache_sync_on_finish = 0; > > vb->need_cache_sync_on_prepare = 0; > > return; > > } > > I personally would prefer this to be above ->allow_cache_hints check, > IOW to be independent. Because we need to skip flush/invalidation for > DMABUF anyway, that's what allocators do in ->prepare() and ->finish() > currently. I think it's even more than personal preference. We shouldn't even attempt to sync imported DMA-bufs, so the code above may result in incorrect behavior.
On (20/01/28 16:22), Tomasz Figa wrote: [..] > > > > Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean) > > > > in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the > > > > finish for CAPTURE buffers. > > > > > > We alter default cache sync behaviour based both on queue ->memory > > > type and queue ->dma_dir. Shall both of those cases depend on > > > ->allow_cache_hints, or q->memory can be independent? > > > > > > static void set_buffer_cache_hints(struct vb2_queue *q, > > > struct vb2_buffer *vb, > > > struct v4l2_buffer *b) > > > { > > > if (!q->allow_cache_hints) > > > return; > > > > > > /* > > > * DMA exporter should take care of cache syncs, so we can avoid > > > * explicit ->prepare()/->finish() syncs. For other ->memory types > > > * we always need ->prepare() or/and ->finish() cache sync. > > > */ > > > if (q->memory == VB2_MEMORY_DMABUF) { > > > vb->need_cache_sync_on_finish = 0; > > > vb->need_cache_sync_on_prepare = 0; > > > return; > > > } > > > > I personally would prefer this to be above ->allow_cache_hints check, > > IOW to be independent. Because we need to skip flush/invalidation for > > DMABUF anyway, that's what allocators do in ->prepare() and ->finish() > > currently. > > I think it's even more than personal preference. We shouldn't even > attempt to sync imported DMA-bufs, so the code above may result in > incorrect behavior. Sure. Allocators test buf->db_attach in prepare()/finish(). This patchset removes that logic, because we move it to the core code. But if the conclusion will be to clear ->need_cache_sync_on_finish/prepare only when ->allow_cache_hints was set, then buf->db_attach bail out must stay in allocators. And this is where I have preferences :) -ss
On 1/28/20 8:19 AM, Tomasz Figa wrote: > On Thu, Jan 23, 2020 at 8:35 PM Hans Verkuil <hverkuil@xs4all.nl> wrote: >> >> On 1/22/20 6:05 AM, Sergey Senozhatsky wrote: >>> On (20/01/10 11:30), Hans Verkuil wrote: >>> [..] >>>>> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c >>>>> index 1762849288ae..2b9d3318e6fb 100644 >>>>> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c >>>>> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c >>>>> @@ -341,8 +341,22 @@ static void set_buffer_cache_hints(struct vb2_queue *q, >>>>> struct vb2_buffer *vb, >>>>> struct v4l2_buffer *b) >>>>> { >>>>> - vb->need_cache_sync_on_prepare = 1; >>>>> + /* >>>>> + * DMA exporter should take care of cache syncs, so we can avoid >>>>> + * explicit ->prepare()/->finish() syncs. >>>>> + */ >>>>> + if (q->memory == VB2_MEMORY_DMABUF) { >>>>> + vb->need_cache_sync_on_finish = 0; >>>>> + vb->need_cache_sync_on_prepare = 0; >>>>> + return; >>>>> + } >>>>> >>>>> + /* >>>>> + * For other ->memory types we always need ->prepare() cache >>>>> + * sync. ->finish() cache sync, however, can be avoided when queue >>>>> + * direction is TO_DEVICE. >>>>> + */ >>>>> + vb->need_cache_sync_on_prepare = 1; >>>> >>>> I'm trying to remember: what needs to be done in prepare() >>>> for a capture buffer? I thought that for capture you only >>>> needed to invalidate the cache in finish(), but nothing needs >>>> to be done in the prepare(). >>> >>> Hmm. Not sure. A precaution in case if user-space wrote to that buffer? >> >> But whatever was written in the buffer is going to be overwritten anyway. >> >> Unless I am mistaken the current situation is that the cache syncs are done >> in both prepare and finish, regardless of the DMA direction. >> >> I would keep that behavior to avoid introducing any unexpected regressions. >> > > It wouldn't be surprising if the buffer was first filled with default > values (e.g. all zeroes) on the CPU. That would make the cache lines > dirty and they could overwrite what the device writes. So we need to > flush (aka clean) the write-back caches on prepare for CAPTURE > buffers. Very true, I'd forgotten about that. This should be documented in the userspace documentation. And possible in this function as well. I think these issues are complex enough that there is no such things as too much documentation :-) > >> Then, if q->allow_cache_hint is set, then default to a cache sync (cache clean) >> in the prepare for OUTPUT buffers and a cache sync (cache invalidate) in the >> finish for CAPTURE buffers. > > I'd still default to the existing behavior even if allow_cache_hint is > set, because of what I wrote above. Then if the userspace doesn't ever > write to the buffers, it can request no flush/clean by setting the > V4L2_BUF_FLAG_NO_CACHE_CLEAN flag when queuing the buffer. > >> >> This also means that any drivers that want to access a buffer in between the >> prepare...finish calls will need to do a begin/end_cpu_access. But that's a >> separate matter. > > AFAIR with current design of the series, the drivers can opt-in for > userspace cache sync hints, so by default even if the userspace > requests sync to be skipped, it wouldn't have any effect unless the > driver allows so. Then I'd imagine migrating all the drivers to > request clean/invalidate explicitly. Note that the DMA-buf > begin/end_cpu_access isn't enough here. We'd need something like > vb2_begin/end_cpu_access() which also takes care of syncing > inconsistent MMAP and USERPTR buffers. I did just that in my old git branch for this: https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=vb2-cpu-access Regards, Hans > >> >> Regards, >> >> Hans >> >>> >>> + if (q->dma_dir == DMA_FROM_DEVICE) >>> + q->need_cache_sync_on_prepare = 0; >>> >>> ? >>> >>> -ss >>> >>
diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c index 1762849288ae..2b9d3318e6fb 100644 --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c @@ -341,8 +341,22 @@ static void set_buffer_cache_hints(struct vb2_queue *q, struct vb2_buffer *vb, struct v4l2_buffer *b) { - vb->need_cache_sync_on_prepare = 1; + /* + * DMA exporter should take care of cache syncs, so we can avoid + * explicit ->prepare()/->finish() syncs. + */ + if (q->memory == VB2_MEMORY_DMABUF) { + vb->need_cache_sync_on_finish = 0; + vb->need_cache_sync_on_prepare = 0; + return; + } + /* + * For other ->memory types we always need ->prepare() cache + * sync. ->finish() cache sync, however, can be avoided when queue + * direction is TO_DEVICE. + */ + vb->need_cache_sync_on_prepare = 1; if (q->dma_dir != DMA_TO_DEVICE) vb->need_cache_sync_on_finish = 1; else
DMA-exporter is supposed to handle cache syncs, so we can avoid ->prepare()/->finish() syncs from videobuf2 core for DMABUF buffers. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> --- drivers/media/common/videobuf2/videobuf2-v4l2.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)