diff mbox series

[v4,1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig

Message ID 20250303-b4-rkisp-noncoherent-v4-1-e32e843fb6ef@gmail.com (mailing list archive)
State New
Headers show
Series Allow non-coherent video capture buffers on Rockchip ISP V1 | expand

Commit Message

Mikhail Rudenko March 3, 2025, 11:40 a.m. UTC
When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
commit 129134e5415d ("media: media/v4l2: remove
V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
no-ops, making cache maintenance for non-coherent dmabufs allocated by
dma-contig impossible.

Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
{flush,invalidate}_kernel_vmap_range calls to
vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
buffers.

Fixes: c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag")
Cc: stable@vger.kernel.org
Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
---
 .../media/common/videobuf2/videobuf2-dma-contig.c  | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Nicolas Dufresne March 3, 2025, 3:24 p.m. UTC | #1
Hi Mikhail,

Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit :
> When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
> commit 129134e5415d ("media: media/v4l2: remove
> V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
> no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
> introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
> V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
> no-ops, making cache maintenance for non-coherent dmabufs allocated
> by
> dma-contig impossible.
> 
> Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
> {flush,invalidate}_kernel_vmap_range calls to
> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
> buffers.
> 
> Fixes: c0acf9cfeee0 ("media: videobuf2: handle
> V4L2_MEMORY_FLAG_NON_COHERENT flag")
> Cc: stable@vger.kernel.org
> Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
> ---
>  .../media/common/videobuf2/videobuf2-dma-contig.c  | 22
> ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index
> a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7
> bc888a84a95d5 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -427,6 +427,17 @@ static int
>  vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
>  				   enum dma_data_direction
> direction)
>  {
> +	struct vb2_dc_buf *buf = dbuf->priv;
> +	struct sg_table *sgt = buf->dma_sgt;
> +
> +	if (!buf->non_coherent_mem)
> +		return 0;
> +
> +	if (buf->vaddr)
> +		invalidate_kernel_vmap_range(buf->vaddr, buf->size);

What would make me a lot more confortable with this change is if you
enable kernel mappings for one test. This will ensure you cover the
call to "invalidate" in your testing. I'd like to know about the
performance impact. With this implementation it should be identical to
the VB2 one.

What I was trying to say in previous comments, is that my impression is
that we can skip this for CPU read access, since we don't guaranty
concurrent access anyway. Both address space can keep their cache in
that case. Though, I see RKISP does not use kernel mapping plus I'm not
reporting a bug, but checking if we should leave a comment for possible
users of kernel mapping in the future ?

> +
> +	dma_sync_sgtable_for_cpu(buf->dev, sgt, direction);
> +
>  	return 0;
>  }
>  
> @@ -434,6 +445,17 @@ static int
>  vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
>  				 enum dma_data_direction direction)
>  {
> +	struct vb2_dc_buf *buf = dbuf->priv;
> +	struct sg_table *sgt = buf->dma_sgt;
> +
> +	if (!buf->non_coherent_mem)
> +		return 0;
> +
> +	if (buf->vaddr)
> +		flush_kernel_vmap_range(buf->vaddr, buf->size);
> +
> +	dma_sync_sgtable_for_device(buf->dev, sgt, direction);
> +
>  	return 0;
>  }
>  
>
Mikhail Rudenko March 5, 2025, 7:40 a.m. UTC | #2
Hi Nicolas,

On 2025-03-03 at 10:24 -05, Nicolas Dufresne <nicolas@ndufresne.ca> wrote:

> Hi Mikhail,
>
> Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit :
>> When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
>> commit 129134e5415d ("media: media/v4l2: remove
>> V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
>> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
>> no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
>> introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
>> V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
>> no-ops, making cache maintenance for non-coherent dmabufs allocated
>> by
>> dma-contig impossible.
>>
>> Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
>> {flush,invalidate}_kernel_vmap_range calls to
>> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
>> buffers.
>>
>> Fixes: c0acf9cfeee0 ("media: videobuf2: handle
>> V4L2_MEMORY_FLAG_NON_COHERENT flag")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
>> ---
>>  .../media/common/videobuf2/videobuf2-dma-contig.c  | 22
>> ++++++++++++++++++++++
>>  1 file changed, 22 insertions(+)
>>
>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> index
>> a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7
>> bc888a84a95d5 100644
>> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> @@ -427,6 +427,17 @@ static int
>>  vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
>>  				   enum dma_data_direction
>> direction)
>>  {
>> +	struct vb2_dc_buf *buf = dbuf->priv;
>> +	struct sg_table *sgt = buf->dma_sgt;
>> +
>> +	if (!buf->non_coherent_mem)
>> +		return 0;
>> +
>> +	if (buf->vaddr)
>> +		invalidate_kernel_vmap_range(buf->vaddr, buf->size);
>
> What would make me a lot more confortable with this change is if you
> enable kernel mappings for one test. This will ensure you cover the
> call to "invalidate" in your testing. I'd like to know about the
> performance impact. With this implementation it should be identical to
> the VB2 one.

I'll enable kernel mappings and rerun my tests later this week.

> What I was trying to say in previous comments, is that my impression is
> that we can skip this for CPU read access, since we don't guaranty
> concurrent access anyway. Both address space can keep their cache in
> that case. Though, I see RKISP does not use kernel mapping plus I'm not
> reporting a bug, but checking if we should leave a comment for possible
> users of kernel mapping in the future ?

I trust Tomasz here, I'd wait for his comment on v4.

>> +
>> +	dma_sync_sgtable_for_cpu(buf->dev, sgt, direction);
>> +
>>  	return 0;
>>  }
>>
>> @@ -434,6 +445,17 @@ static int
>>  vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
>>  				 enum dma_data_direction direction)
>>  {
>> +	struct vb2_dc_buf *buf = dbuf->priv;
>> +	struct sg_table *sgt = buf->dma_sgt;
>> +
>> +	if (!buf->non_coherent_mem)
>> +		return 0;
>> +
>> +	if (buf->vaddr)
>> +		flush_kernel_vmap_range(buf->vaddr, buf->size);
>> +
>> +	dma_sync_sgtable_for_device(buf->dev, sgt, direction);
>> +
>>  	return 0;
>>  }
>>
>>


--
Best regards,
Mikhail Rudenko
Tomasz Figa March 5, 2025, 8:12 a.m. UTC | #3
On Tue, Mar 4, 2025 at 12:24 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> Hi Mikhail,
>
> Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit :
> > When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
> > commit 129134e5415d ("media: media/v4l2: remove
> > V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
> > vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
> > no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
> > introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
> > V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
> > no-ops, making cache maintenance for non-coherent dmabufs allocated
> > by
> > dma-contig impossible.
> >
> > Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
> > {flush,invalidate}_kernel_vmap_range calls to
> > vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
> > buffers.
> >
> > Fixes: c0acf9cfeee0 ("media: videobuf2: handle
> > V4L2_MEMORY_FLAG_NON_COHERENT flag")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
> > ---
> >  .../media/common/videobuf2/videobuf2-dma-contig.c  | 22
> > ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > index
> > a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7
> > bc888a84a95d5 100644
> > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > @@ -427,6 +427,17 @@ static int
> >  vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
> >                                  enum dma_data_direction
> > direction)
> >  {
> > +     struct vb2_dc_buf *buf = dbuf->priv;
> > +     struct sg_table *sgt = buf->dma_sgt;
> > +
> > +     if (!buf->non_coherent_mem)
> > +             return 0;
> > +
> > +     if (buf->vaddr)
> > +             invalidate_kernel_vmap_range(buf->vaddr, buf->size);
>
> What would make me a lot more confortable with this change is if you
> enable kernel mappings for one test. This will ensure you cover the
> call to "invalidate" in your testing. I'd like to know about the
> performance impact. With this implementation it should be identical to
> the VB2 one.

I agree that it would be good to test that path as well. I wonder if
we could somehow do it with one of the vi* drivers...

>
> What I was trying to say in previous comments, is that my impression is
> that we can skip this for CPU read access, since we don't guaranty
> concurrent access anyway. Both address space can keep their cache in
> that case. Though, I see RKISP does not use kernel mapping plus I'm not
> reporting a bug, but checking if we should leave a comment for possible
> users of kernel mapping in the future ?

We can't skip it for CPU read access, because it may be the first read
after the DMA writing to the buffer, so we need to invalidate the
caches.

That said, on majority of systems this will be a no-op, because it
only applies to VIVT and VIPT aliasing caches + only when the kernel
mapping is actually used (the buf->vaddr mapping is created on
demand).

>
> > +
> > +     dma_sync_sgtable_for_cpu(buf->dev, sgt, direction);
> > +
> >       return 0;
> >  }
> >
> > @@ -434,6 +445,17 @@ static int
> >  vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
> >                                enum dma_data_direction direction)
> >  {
> > +     struct vb2_dc_buf *buf = dbuf->priv;
> > +     struct sg_table *sgt = buf->dma_sgt;
> > +
> > +     if (!buf->non_coherent_mem)
> > +             return 0;
> > +
> > +     if (buf->vaddr)
> > +             flush_kernel_vmap_range(buf->vaddr, buf->size);
> > +
> > +     dma_sync_sgtable_for_device(buf->dev, sgt, direction);
> > +
> >       return 0;
> >  }
> >
> >
>
Mikhail Rudenko March 9, 2025, 8:18 p.m. UTC | #4
Hi Nicolas, Tomasz,

On 2025-03-03 at 10:24 -05, Nicolas Dufresne <nicolas@ndufresne.ca> wrote:

> Hi Mikhail,
>
> Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit :
>> When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
>> commit 129134e5415d ("media: media/v4l2: remove
>> V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
>> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
>> no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
>> introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
>> V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
>> no-ops, making cache maintenance for non-coherent dmabufs allocated
>> by
>> dma-contig impossible.
>>
>> Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
>> {flush,invalidate}_kernel_vmap_range calls to
>> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
>> buffers.
>>
>> Fixes: c0acf9cfeee0 ("media: videobuf2: handle
>> V4L2_MEMORY_FLAG_NON_COHERENT flag")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
>> ---
>>  .../media/common/videobuf2/videobuf2-dma-contig.c  | 22
>> ++++++++++++++++++++++
>>  1 file changed, 22 insertions(+)
>>
>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> index
>> a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7
>> bc888a84a95d5 100644
>> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> @@ -427,6 +427,17 @@ static int
>>  vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
>>  				   enum dma_data_direction
>> direction)
>>  {
>> +	struct vb2_dc_buf *buf = dbuf->priv;
>> +	struct sg_table *sgt = buf->dma_sgt;
>> +
>> +	if (!buf->non_coherent_mem)
>> +		return 0;
>> +
>> +	if (buf->vaddr)
>> +		invalidate_kernel_vmap_range(buf->vaddr, buf->size);
>
> What would make me a lot more confortable with this change is if you
> enable kernel mappings for one test. This will ensure you cover the
> call to "invalidate" in your testing. I'd like to know about the
> performance impact. With this implementation it should be identical to
> the VB2 one.
>

I have re-run my tests on RK3399, with 1280x720 XRGB capture buffers (1
plane, 3686400 bytes). Capture process was pinned to "big" cores running
at fixed frequency of 1.8 GHz. Libcamera was modified to request buffers
with V4L2_MEMORY_FLAG_NON_COHERENT flag. DMA_BUF_IOCTL_SYNC ioctls were
used as appropriate. For kernel mapping effect test, vb2_plane_vaddr
call was inserted into rkisp1_vb2_buf_init.

The timings are as following:

- memcpy coherent buffer: 7570 +/- 63 us
- memcpy non-coherent buffer: 1120 +/- 34 us

without kernel mapping:

- ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 428 +/- 15 us
- ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 422 +/- 28 us

with kernel mapping:

- ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 526 +/- 13 us
- ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 519 +/- 38 us

So, even with kernel mapping enabled, speedup is 7570 / (1120 + 526 + 519) = 3.5,
and the use of noncoherent buffers is justified -- at least on this platform.

--
Best regards,
Mikhail Rudenko
Tomasz Figa March 10, 2025, 9 a.m. UTC | #5
On Mon, Mar 10, 2025 at 5:52 PM Mikhail Rudenko <mike.rudenko@gmail.com> wrote:
>
>
> Hi Nicolas, Tomasz,
>
> On 2025-03-03 at 10:24 -05, Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> > Hi Mikhail,
> >
> > Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit :
> >> When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
> >> commit 129134e5415d ("media: media/v4l2: remove
> >> V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
> >> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
> >> no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
> >> introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
> >> V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
> >> no-ops, making cache maintenance for non-coherent dmabufs allocated
> >> by
> >> dma-contig impossible.
> >>
> >> Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
> >> {flush,invalidate}_kernel_vmap_range calls to
> >> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
> >> buffers.
> >>
> >> Fixes: c0acf9cfeee0 ("media: videobuf2: handle
> >> V4L2_MEMORY_FLAG_NON_COHERENT flag")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
> >> ---
> >>  .../media/common/videobuf2/videobuf2-dma-contig.c  | 22
> >> ++++++++++++++++++++++
> >>  1 file changed, 22 insertions(+)
> >>
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> index
> >> a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7
> >> bc888a84a95d5 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> @@ -427,6 +427,17 @@ static int
> >>  vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
> >>                                 enum dma_data_direction
> >> direction)
> >>  {
> >> +    struct vb2_dc_buf *buf = dbuf->priv;
> >> +    struct sg_table *sgt = buf->dma_sgt;
> >> +
> >> +    if (!buf->non_coherent_mem)
> >> +            return 0;
> >> +
> >> +    if (buf->vaddr)
> >> +            invalidate_kernel_vmap_range(buf->vaddr, buf->size);
> >
> > What would make me a lot more confortable with this change is if you
> > enable kernel mappings for one test. This will ensure you cover the
> > call to "invalidate" in your testing. I'd like to know about the
> > performance impact. With this implementation it should be identical to
> > the VB2 one.
> >
>
> I have re-run my tests on RK3399, with 1280x720 XRGB capture buffers (1
> plane, 3686400 bytes). Capture process was pinned to "big" cores running
> at fixed frequency of 1.8 GHz. Libcamera was modified to request buffers
> with V4L2_MEMORY_FLAG_NON_COHERENT flag. DMA_BUF_IOCTL_SYNC ioctls were
> used as appropriate. For kernel mapping effect test, vb2_plane_vaddr
> call was inserted into rkisp1_vb2_buf_init.
>
> The timings are as following:
>
> - memcpy coherent buffer: 7570 +/- 63 us
> - memcpy non-coherent buffer: 1120 +/- 34 us
>
> without kernel mapping:
>
> - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 428 +/- 15 us
> - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 422 +/- 28 us
>
> with kernel mapping:
>
> - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 526 +/- 13 us
> - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 519 +/- 38 us
>
> So, even with kernel mapping enabled, speedup is 7570 / (1120 + 526 + 519) = 3.5,
> and the use of noncoherent buffers is justified -- at least on this platform.

Thanks a lot for the additional testing.

Acked-by: Tomasz Figa <tfiga@chromium.org>

Best regards,
Tomasz
diff mbox series

Patch

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7bc888a84a95d5 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -427,6 +427,17 @@  static int
 vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
 				   enum dma_data_direction direction)
 {
+	struct vb2_dc_buf *buf = dbuf->priv;
+	struct sg_table *sgt = buf->dma_sgt;
+
+	if (!buf->non_coherent_mem)
+		return 0;
+
+	if (buf->vaddr)
+		invalidate_kernel_vmap_range(buf->vaddr, buf->size);
+
+	dma_sync_sgtable_for_cpu(buf->dev, sgt, direction);
+
 	return 0;
 }
 
@@ -434,6 +445,17 @@  static int
 vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
 				 enum dma_data_direction direction)
 {
+	struct vb2_dc_buf *buf = dbuf->priv;
+	struct sg_table *sgt = buf->dma_sgt;
+
+	if (!buf->non_coherent_mem)
+		return 0;
+
+	if (buf->vaddr)
+		flush_kernel_vmap_range(buf->vaddr, buf->size);
+
+	dma_sync_sgtable_for_device(buf->dev, sgt, direction);
+
 	return 0;
 }