
[v6,4/5] dma-buf: heaps: Add CMA heap to dmabuf heaps

Message ID 20190624194908.121273-5-john.stultz@linaro.org (mailing list archive)
State New, archived
Series DMA-BUF Heaps (destaging ION)

Commit Message

John Stultz June 24, 2019, 7:49 p.m. UTC
This adds a CMA heap, which allows userspace to allocate
a dma-buf of contiguous memory out of a CMA region.
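
For illustration, allocation from userspace looks roughly like the
sketch below. This assumes the /dev/dma_heap/<name> chardev and the
allocation ioctl introduced earlier in this series; the heap name and
the exact UAPI identifiers are assumptions for illustration, not part
of this patch:

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/dma-heap.h>

    /* Sketch: allocate a physically contiguous dma-buf from a CMA
     * heap. "reserved" is the usual name of the default CMA area;
     * adjust for your platform. Ioctl/struct names follow this
     * series' UAPI patch and may differ in later revisions. */
    static int cma_dmabuf_alloc(size_t len)
    {
            struct dma_heap_allocation_data data = {
                    .len = len,
                    .fd_flags = O_RDWR | O_CLOEXEC,
            };
            int heap_fd, ret;

            heap_fd = open("/dev/dma_heap/reserved", O_RDONLY);
            if (heap_fd < 0)
                    return -1;

            ret = ioctl(heap_fd, DMA_HEAP_IOC_ALLOC, &data);
            close(heap_fd);

            return ret < 0 ? -1 : data.fd; /* dma-buf fd on success */
    }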

This code is an evolution of the Android ION implementation, so
thanks to its original author and maintainers:
  Benjamin Gaignard, Laura Abbott, and others!

Cc: Laura Abbott <labbott@redhat.com>
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Pratik Patel <pratikp@codeaurora.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Vincent Donnefort <Vincent.Donnefort@arm.com>
Cc: Sudipto Paul <Sudipto.Paul@arm.com>
Cc: Andrew F. Davis <afd@ti.com>
Cc: Xu YiPing <xuyiping@hisilicon.com>
Cc: "Chenfeng (puck)" <puck.chen@hisilicon.com>
Cc: butao <butao@hisilicon.com>
Cc: "Xiaqing (A)" <saberlily.xia@hisilicon.com>
Cc: Yudongbin <yudongbin@hisilicon.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chenbo Feng <fengc@google.com>
Cc: Alistair Strachan <astrachan@google.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Change-Id: Ic2b0c5dfc0dbaff5245bd1c50170c64b06c73051
---
v2:
* Switch allocate to return dmabuf fd
* Simplify init code
* Checkpatch fixups
v3:
* Switch to inline function for to_cma_heap()
* Minor cleanups suggested by Brian
* Fold in new registration style from Andrew
* Folded in changes from Andrew to use simplified page list
  from the heap helpers
v4:
* Use the fd_flags when creating dmabuf fd (Suggested by
  Benjamin)
* Use precalculated pagecount (Suggested by Andrew)
v6:
* Changed variable names to improve clarity, as suggested
  by Brian
---
 drivers/dma-buf/heaps/Kconfig    |   8 ++
 drivers/dma-buf/heaps/Makefile   |   1 +
 drivers/dma-buf/heaps/cma_heap.c | 169 +++++++++++++++++++++++++++++++
 3 files changed, 178 insertions(+)
 create mode 100644 drivers/dma-buf/heaps/cma_heap.c

Comments

John Stultz July 23, 2019, 5:04 a.m. UTC | #1
On Thu, Jul 18, 2019 at 3:08 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> This and the previous one seem to be very much duplicated boilerplate
> code.

So yes, there is some duplicated boilerplate between the system and CMA
heaps, particularly in the allocation function, where we allocate and
set up the helper buffer structure, allocate the pages, then create
and export the dmabuf. I took a pass at trying to minimize some of
this earlier, but those efforts ended up adding helper functions that
take a ton of arguments, and I had trouble properly handling error
paths without leaking memory, so I left it as is.

I'll try to take another pass myself but if you have specific
suggestions for improving here, I'd be happy to try them.
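
(For a flavor of what I mean, the consolidation attempt ended up
looking something like the following rough, hypothetical sketch; this
is illustrative only, not actual code from the series:)

    /* Hypothetical sketch of the abandoned consolidation: a single
     * allocate-and-export helper shared by the system and CMA heaps.
     * The argument list balloons because each heap differs in how it
     * obtains and releases its pages, and unwinding partial failures
     * through one generic path gets error-prone. */
    struct dma_buf *heap_helper_alloc_export(struct dma_heap *heap,
                        size_t len,
                        unsigned long fd_flags,
                        unsigned long heap_flags,
                        struct page **pages,
                        pgoff_t pagecount,
                        void (*free_cb)(struct heap_helper_buffer *),
                        void *priv_virt);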

> Why can't we just use normal branches for allocating and freeing
> normal pages vs CMA?  We even have an existing helper for that
> with dma_alloc_contiguous().

Apologies, I'm not sure I'm understanding your suggestion here.
dma_alloc_contiguous() does have some interesting optimizations
(avoiding allocating a single page from CMA), though its focus on
the default area vs a specific device area doesn't quite match up
with the usage model for dma heaps.  Instead of allocating memory
for a single device, we want to allow userland, for a given usage
mode, to allocate a dmabuf from a specific heap of memory which will
satisfy the usage mode for that buffer pipeline (across multiple
devices).

Now, indeed, the system and cma heaps in this patchset are a bit
simple/trivial (though useful with my devices that require contiguous
buffers for the display driver), but more complex ION heaps have
traditionally stayed out of tree in vendor code, in many cases making
incompatible tweaks to the ION core dmabuf exporter logic. That's why
dmabuf heaps is trying to destage ION in a way that allows heaps to
implement their exporter logic themselves, so we can start pulling
those more complicated heaps out of their vendor hidey-holes and get
some proper upstream review.

But I suspect I'm just confused as to what you're suggesting. Maybe
you could expand a bit? Apologies for being a bit dense.

Thanks again for the input!
-john
Benjamin Gaignard July 24, 2019, 8:08 a.m. UTC | #2
On Wed, Jul 24, 2019 at 09:00, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Jul 22, 2019 at 10:04:06PM -0700, John Stultz wrote:
> > Apologies, I'm not sure I'm understanding your suggestion here.
> > dma_alloc_contiguous() does have some interesting optimizations
> > (avoiding allocating a single page from CMA), though its focus on
> > the default area vs a specific device area doesn't quite match up
> > with the usage model for dma heaps.  Instead of allocating memory
> > for a single device, we want to allow userland, for a given usage
> > mode, to allocate a dmabuf from a specific heap of memory which will
> > satisfy the usage mode for that buffer pipeline (across multiple
> > devices).
> >
> > Now, indeed, the system and cma heaps in this patchset are a bit
> > simple/trivial (though useful with my devices that require contiguous
> > buffers for the display driver), but more complex ION heaps have
> > traditionally stayed out of tree in vendor code, in many cases making
> > incompatible tweaks to the ION core dmabuf exporter logic.
>
> So what would the more complicated heaps be?
>
> > That's why
> > dmabuf heaps is trying to destage ION in a way that allows heaps to
> > implement their exporter logic themselves, so we can start pulling
> > those more complicated heaps out of their vendor hidey-holes and get
> > some proper upstream review.
> >
> > But I suspect I'm just confused as to what you're suggesting. Maybe
> > you could expand a bit? Apologies for being a bit dense.
>
> My suggestion is to merge the system and CMA heaps.  CMA (at least
> the system-wide CMA area) is really just an optimization to get
> large contiguous regions more easily.  We should make its use as
> transparent as possible, just like we do in the DMA code.

CMA has made it possible to get large regions of memory and to give
devices allocating pages from it some priority. I don't think that's
possible with the system heap, so I suggest keeping the CMA heap if we
want to be able to port as many applications as possible to dma-buf
heaps.

Benjamin
Laura Abbott July 24, 2019, 11:38 a.m. UTC | #3
On 7/24/19 2:59 AM, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 10:04:06PM -0700, John Stultz wrote:
>> Apologies, I'm not sure I'm understanding your suggestion here.
>> dma_alloc_contiguous() does have some interesting optimizations
>> (avoiding allocating a single page from CMA), though its focus on
>> the default area vs a specific device area doesn't quite match up
>> with the usage model for dma heaps.  Instead of allocating memory
>> for a single device, we want to allow userland, for a given usage
>> mode, to allocate a dmabuf from a specific heap of memory which will
>> satisfy the usage mode for that buffer pipeline (across multiple
>> devices).
>>
>> Now, indeed, the system and cma heaps in this patchset are a bit
>> simple/trivial (though useful with my devices that require contiguous
>> buffers for the display driver), but more complex ION heaps have
>> traditionally stayed out of tree in vendor code, in many cases making
>> incompatible tweaks to the ION core dmabuf exporter logic.
> 
> So what would the more complicated heaps be?
> 
>> That's why
>> dmabuf heaps is trying to destage ION in a way that allows heaps to
>> implement their exporter logic themselves, so we can start pulling
>> those more complicated heaps out of their vendor hidey-holes and get
>> some proper upstream review.
>>
>> But I suspect I'm just confused as to what you're suggesting. Maybe
>> you could expand a bit? Apologies for being a bit dense.
> 
> My suggestion is to merge the system and CMA heaps.  CMA (at least
> the system-wide CMA area) is really just an optimization to get
> large contiguous regions more easily.  We should make its use as
> transparent as possible, just like we do in the DMA code.
> 

It's not just an optimization for Ion though. Ion was designed to
let the callers choose between system and multiple CMA heaps. On other
systems there may be multiple CMA regions dedicated to a specific
purpose or placed at a specific address. The callers need to
be able to choose exactly whether they want a particular CMA region
or discontiguous regions.

Thanks,
Laura
Andrew Davis July 24, 2019, 3:46 p.m. UTC | #4
On 7/24/19 2:59 AM, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 10:04:06PM -0700, John Stultz wrote:
>> Apologies, I'm not sure I'm understanding your suggestion here.
>> dma_alloc_contiguous() does have some interesting optimizations
>> (avoiding allocating a single page from CMA), though its focus on
>> the default area vs a specific device area doesn't quite match up
>> with the usage model for dma heaps.  Instead of allocating memory
>> for a single device, we want to allow userland, for a given usage
>> mode, to allocate a dmabuf from a specific heap of memory which will
>> satisfy the usage mode for that buffer pipeline (across multiple
>> devices).
>>
>> Now, indeed, the system and cma heaps in this patchset are a bit
>> simple/trivial (though useful with my devices that require contiguous
>> buffers for the display driver), but more complex ION heaps have
>> traditionally stayed out of tree in vendor code, in many cases making
>> incompatible tweaks to the ION core dmabuf exporter logic.
> 
> So what would the more complicated heaps be?
> 


https://patchwork.kernel.org/patch/10863957/

It's actually a simpler heap type IMHO, but the logic inside is
incompatible with the system/CMA heaps: if you move any of their code
into the core framework then this heap stops working, leading to
out-of-tree hacks on the core to get it back functional. I see the
same for the "complex" heaps with ION.

Andrew


>> That's why
>> dmabuf heaps is trying to destage ION in a way that allows heaps to
>> implement their exporter logic themselves, so we can start pulling
>> those more complicated heaps out of their vendor hidey-holes and get
>> some proper upstream review.
>>
>> But I suspect I'm just confused as to what you're suggesting. Maybe
>> you could expand a bit? Apologies for being a bit dense.
> 
> My suggestion is to merge the system and CMA heaps.  CMA (at least
> the system-wide CMA area) is really just an optimization to get
> large contiguous regions more easily.  We should make its use as
> transparent as possible, just like we do in the DMA code.
>
John Stultz July 24, 2019, 6:46 p.m. UTC | #5
On Wed, Jul 24, 2019 at 12:00 AM Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Jul 22, 2019 at 10:04:06PM -0700, John Stultz wrote:
> > Apologies, I'm not sure I'm understanding your suggestion here.
> > dma_alloc_contiguous() does have some interesting optimizations
> > (avoiding allocating a single page from CMA), though its focus on
> > the default area vs a specific device area doesn't quite match up
> > with the usage model for dma heaps.  Instead of allocating memory
> > for a single device, we want to allow userland, for a given usage
> > mode, to allocate a dmabuf from a specific heap of memory which will
> > satisfy the usage mode for that buffer pipeline (across multiple
> > devices).
> >
> > Now, indeed, the system and cma heaps in this patchset are a bit
> > simple/trivial (though useful with my devices that require contiguous
> > buffers for the display driver), but more complex ION heaps have
> > traditionally stayed out of tree in vendor code, in many cases making
> > incompatible tweaks to the ION core dmabuf exporter logic.
>
> So what would the more complicated heaps be?

Mostly secure heaps, but there has also been work in the past on
specially aligned chunk heaps. Unfortunately, since every vendor tree
also includes many of its own hacks to the core ION code (usually
without changelogs or comments), it's hard to decipher much of it.

Some examples:
msm:
https://github.com/Panchajanya1999/msm-4.14/blob/Pie/drivers/staging/android/ion/ion_cma_secure_heap.c

exynos:
https://github.com/exynos-linux-stable/starlte/tree/tw90-android-p/drivers/staging/android/ion/exynos

hisi:
https://github.com/hexdebug/android_kernel_huawei_hi3660/blob/master/drivers/staging/android/ion/ion_seccm_heap.c
https://github.com/hexdebug/android_kernel_huawei_hi3660/blob/master/drivers/staging/android/ion/ion_secsg_heap.c
https://github.com/hexdebug/android_kernel_huawei_hi3660/blob/master/drivers/staging/android/ion/ion_fama_misc_heap.c

mediatek:
https://android.googlesource.com/kernel/mediatek/+/android-mtk-3.18/drivers/staging/android/ion/mtk/ion_mm_heap.c

But Andrew's example is probably a better one to test the dmabuf
heaps framework against.


> > That's why
> > dmabuf heaps is trying to destage ION in a way that allows heaps to
> > implement their exporter logic themselves, so we can start pulling
> > those more complicated heaps out of their vendor hidey-holes and get
> > some proper upstream review.
> >
> > But I suspect I'm just confused as to what you're suggesting. Maybe
> > you could expand a bit? Apologies for being a bit dense.
>
> My suggestion is to merge the system and CMA heaps.  CMA (at least
> the system-wide CMA area) is really just an optimization to get
> large contiguous regions more easily.  We should make its use as
> transparent as possible, just like we do in the DMA code.

I'm still not understanding how this would work. Benjamin and Laura
already commented on this point, but for a simple example, with the
HiKey boards, the DRM driver requires contiguous memory for the
framebuffer, but the GPU can handle non-contiguous memory. Thus the
target framebuffers that we pass to the display have to be CMA
allocated, but all the other graphics buffers that the GPU will
render to and composite can be system memory.

Having the separate heaps allows the gralloc code to allocate the
proper buffer depending on the planned usage (GRALLOC_USAGE_HW_FB or
not), and that way we don't waste CMA on buffers that don't have to be
contiguous.
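
(As a rough, hypothetical sketch of the gralloc side; the heap
chardev names here are illustrative, not mandated by this series:)

    #include <stdint.h>
    #include <hardware/gralloc.h>  /* Android gralloc usage flags */

    /* Sketch: pick a heap chardev based on the gralloc usage flags.
     * Only scanout framebuffers need physically contiguous memory. */
    static const char *pick_heap(uint64_t usage)
    {
            if (usage & GRALLOC_USAGE_HW_FB)
                    return "/dev/dma_heap/reserved"; /* CMA heap */
            return "/dev/dma_heap/system_heap"; /* GPU-only buffers */
    }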

Laura already touched on this, but similar logic can be used for
camera buffers, which can make sure we allocate from a specifically
reserved CMA region that is only used for the camera so we can always
be sure to have N free buffers immediately to capture with, etc.

But let me know if I'm still misunderstanding your suggestion.

thanks
-john
Benjamin Gaignard July 25, 2019, 1:20 p.m. UTC | #6
On Thu, Jul 25, 2019 at 14:52, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Jul 24, 2019 at 11:46:24AM -0700, John Stultz wrote:
> > I'm still not understanding how this would work. Benjamin and Laura
> > already commented on this point, but for a simple example, with the
> > HiKey boards, the DRM driver requires contiguous memory for the
> > framebuffer, but the GPU can handle non-contiguous memory. Thus the
> > target framebuffers that we pass to the display have to be CMA
> > allocated, but all the other graphics buffers that the GPU will
> > render to and composite can be system memory.

No, we have use cases where graphics buffers can go directly to the
display without using the GPU. For example we can grab frames from the
camera and send them directly to the display (same for the video
decoder) because we have planes for that.

>
> But that just means we need a flag that memory needs to be contiguous,
> which totally makes sense at the API level.  But CMA is not the only
> source of contiguous memory, so we should not conflate the two.

We have one file descriptor per heap to be able to add access control
on each heap. That wasn't possible with ION, because the heap was
selected by the flags in the ioctl structure and we can't do access
control based on that. If we put a flag to select the allocation
mechanism (system, CMA, other) in the ioctl, we come back to the ION
situation. For me, one allocation mechanism = one heap.
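
(As an illustration, with a hypothetical heap name that is not from
this series: one chardev per heap means ordinary file permissions
already give per-heap access control, which a selection flag inside
the ioctl cannot:)

    #include <errno.h>
    #include <fcntl.h>

    /* Sketch: a process that lacks rights on the camera heap chardev
     * is stopped at open() time, before any allocation ioctl. */
    static int open_camera_heap(void)
    {
            int fd = open("/dev/dma_heap/camera", O_RDONLY);

            if (fd < 0 && errno == EACCES)
                    return -1; /* no access to this particular heap */
            return fd;
    }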

>
> > Laura already touched on this, but similar logic can be used for
> > camera buffers, which can make sure we allocate from a specifically
> > reserved CMA region that is only used for the camera so we can always
> > be sure to have N free buffers immediately to capture with, etc.
>
> And for that we already have per-device CMA areas hanging off struct
> device, which this should reuse instead of adding another duplicate
> CMA area lookup scheme.
Andrew Davis July 25, 2019, 1:31 p.m. UTC | #7
On 7/25/19 8:50 AM, Christoph Hellwig wrote:
> On Wed, Jul 24, 2019 at 11:46:01AM -0400, Andrew F. Davis wrote:
>> https://patchwork.kernel.org/patch/10863957/
>>
>> It's actually a simpler heap type IMHO, but the logic inside is
>> incompatible with the system/CMA heaps: if you move any of their code
>> into the core framework then this heap stops working, leading to
>> out-of-tree hacks on the core to get it back functional. I see the
>> same for the "complex" heaps with ION.
> 
> Well, this mostly is just another allocator (gen_pool).  And given that
> the whole dma-buf infrastucture assumes things are backed by pages we
> really shouldn't need much more than an alloc and a free callback (and
> maybe the pgprot to map it) and handle the rest in common code.
> 

But that's just it, dma-buf does not assume buffers are backed by
normal kernel-managed memory; it is up to the buffer exporter where
and when to allocate the memory. The memory backing this SRAM buffer
does not have the normal struct page backing. So moving the map,
sync, etc. functions to common code would fail for this and many
other heap types. This was a major problem with Ion that prompted
this new design.

Each heap type may need to do something different depending on its
backing memory, so moving everything that is common to the system and
CMA heaps into the core code would lead to those being the only
upstreamable heaps.

Andrew
Andrew Davis July 25, 2019, 1:47 p.m. UTC | #8
On 7/25/19 8:48 AM, Christoph Hellwig wrote:
> On Wed, Jul 24, 2019 at 07:38:07AM -0400, Laura Abbott wrote:
>> It's not just an optimization for Ion though. Ion was designed to
>> let the callers choose between system and multiple CMA heaps.
> 
> Who cares about ION?  That's some out-of-tree Android crap that should
> not be relevant for upstream except as an example of how not to design
> things.
> 

Tell us how you really feel about ION :)

>> On other
>> systems there may be multiple CMA regions dedicated to a specific
>> purpose or placed at a specific address. The callers need to
>> be able to choose exactly whether they want a particular CMA region
>> or discontiguous regions.
> 
> At least in-kernel, CMA is only used either with the global pool or a
> per-device CMA pool.  I think if you want to make this new dma-buf API
> fit in with the rest of the kernel you should follow that model, and
> pass in a struct device to select the particular CMA area, similar to
> how the DMA allocator works.
> 

This is a central allocator; it is not tied to any one device. If we
knew the one device ahead of time we would just use the existing dma_alloc.

We might be able to solve some of that with late mapping after all the
devices attach to the buffer, but even then, which device's CMA area
would we choose to use from all the attached devices?

I can agree that allocating from per-device CMA using heaps doesn't
make much sense, but for global pools I'm not sure I see any way to
allow devices to select which pool is right for a specific use. They
don't have the full use-case information like the application does;
the selection needs to be made from the application.

Andrew
Andrew Davis July 25, 2019, 2:10 p.m. UTC | #9
On 7/25/19 10:04 AM, Christoph Hellwig wrote:
> On Thu, Jul 25, 2019 at 09:31:50AM -0400, Andrew F. Davis wrote:
>> But that's just it, dma-buf does not assume buffers are backed by
>> normal kernel-managed memory; it is up to the buffer exporter where
>> and when to allocate the memory. The memory backing this SRAM buffer
>> does not have the normal struct page backing. So moving the map,
>> sync, etc. functions to common code would fail for this and many
>> other heap types. This was a major problem with Ion that prompted
>> this new design.
> 
> The code clearly shows it has page backing, e.g. this:
> 
> +	sg_set_page(table->sgl, pfn_to_page(PFN_DOWN(buffer->paddr)), buffer->len, 0);
> 
> and the fact that it (and the dma-buf API) uses scatterlists, which
> require pages.
> 

Pages yes, but not "normal" pages from the kernel managed area.
page_to_pfn() will return bad values on the pages returned by this
allocator, and so will any of the kernel sync/map functions. Therefore
those operations cannot be common and need special per-heap handling.

Andrew
Andrew Davis July 25, 2019, 2:25 p.m. UTC | #10
On 7/25/19 10:11 AM, Christoph Hellwig wrote:
> On Thu, Jul 25, 2019 at 10:10:08AM -0400, Andrew F. Davis wrote:
>> Pages yes, but not "normal" pages from the kernel managed area.
>> page_to_pfn() will return bad values on the pages returned by this
>> allocator, and so will any of the kernel sync/map functions. Therefore
>> those operations cannot be common and need special per-heap handling.
> 
> Well, that means this thing is buggy and abuses the scatterlist API
> and we can't merge it anyway, so it is irrelevant.
> 

Since when do scatterlists need to only have kernel virtual backed
memory pages? Device memory is stored in scatterlists and
dma_sync_sg_for_* would fail just the same when the cache ops were
attempted.
Benjamin Gaignard July 25, 2019, 2:46 p.m. UTC | #11
On Thu, Jul 25, 2019 at 16:33, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Jul 25, 2019 at 03:20:11PM +0200, Benjamin Gaignard wrote:
> > > But that just means we need a flag that memory needs to be contiguous,
> > > which totally makes sense at the API level.  But CMA is not the only
> > > source of contiguous memory, so we should not conflate the two.
> >
> > We have one file descriptor per heap to be able to add access control
> > on each heap. That wasn't possible with ION, because the heap was
> > selected by the flags in the ioctl structure and we can't do access
> > control based on that. If we put a flag to select the allocation
> > mechanism (system, CMA, other) in the ioctl, we come back to the ION
> > situation. For me, one allocation mechanism = one heap.
>
> Well, I agree with your split for fundamentally different
> allocators.  But the point is that CMA (at least the system CMA area)
> fundamentally isn't a different allocator.  The per-device CMA areas
> are still kinda the same, but you can just have one fd for each
> per-device CMA area to make your life simple.

From my point of view the default CMA area is a specific allocator,
because it can migrate pages to hand them out in priority for
contiguous allocations. It is also one of the last regions where
system pages are going to be allocated. I don't think the system
allocator has the same features, otherwise we would have used it
rather than developing CMA years ago ;-). In short, CMA has solved a
lot of problems on our side, so it had to have its own heap.
Andrew Davis July 25, 2019, 2:51 p.m. UTC | #12
On 7/25/19 10:30 AM, Christoph Hellwig wrote:
> On Thu, Jul 25, 2019 at 10:25:50AM -0400, Andrew F. Davis wrote:
>> On 7/25/19 10:11 AM, Christoph Hellwig wrote:
>>> On Thu, Jul 25, 2019 at 10:10:08AM -0400, Andrew F. Davis wrote:
>>>> Pages yes, but not "normal" pages from the kernel managed area.
>>>> page_to_pfn() will return bad values on the pages returned by this
>>>> allocator, and so will any of the kernel sync/map functions. Therefore
>>>> those operations cannot be common and need special per-heap handling.
>>>
>>> Well, that means this thing is buggy and abuses the scatterlist API
>>> and we can't merge it anyway, so it is irrelevant.
>>>
>>
>> Since when do scatterlists need to only have kernel virtual backed
>> memory pages? Device memory is stored in scatterlists and
>> dma_sync_sg_for_* would fail just the same when the cache ops were
>> attempted.
> 
> I'm not sure what you mean by virtual backed memory pages, as we
> don't really have that concept.
> 
> But a page in the scatterlist needs to be able to be used everywhere
> we'd normally use a page, e.g. page_to_phys, page_to_pfn, kmap,
> page_address (if !highmem) as consumers including the dma mapping
> interface do all that.
> 
> If you want to dma map memory that does not have page backing you
> need to use dma_map_resource.
> 

I probably should have worded that better.

It does have page backing; what I meant by "page_to_pfn() will return
bad values" is not that it won't give you the correct pfn (it will),
but that the pfn is not part of the normal memory space
(lowmem/highmem): it's device memory, so cache ops won't work. But you
should not be doing that on device memory anyway.

That is a problem with Ion I want to avoid: it assumed all buffers
were in DDR and so would do cache operations on them unconditionally.
Too many assumptions were made because too much was moved into the
common core code, and not enough was left for the heaps themselves to
decide.
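
(As a rough, hypothetical sketch of the kind of per-heap decision I
mean; the names here are illustrative and not from this series:)

    #include <linux/dma-buf.h>

    /* Sketch: begin_cpu_access for a heap backed by uncached device
     * memory. There is nothing to sync, so the op is deliberately
     * empty; a cached system-RAM heap would need real cache
     * maintenance here instead. With Ion's common code, the heap had
     * no say in this. */
    static int sram_heap_begin_cpu_access(struct dma_buf *dmabuf,
                                          enum dma_data_direction dir)
    {
            return 0;
    }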

Patch

diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
index 205052744169..a5eef06c4226 100644
--- a/drivers/dma-buf/heaps/Kconfig
+++ b/drivers/dma-buf/heaps/Kconfig
@@ -4,3 +4,11 @@  config DMABUF_HEAPS_SYSTEM
 	help
 	  Choose this option to enable the system dmabuf heap. The system heap
 	  is backed by pages from the buddy allocator. If in doubt, say Y.
+
+config DMABUF_HEAPS_CMA
+	bool "DMA-BUF CMA Heap"
+	depends on DMABUF_HEAPS && DMA_CMA
+	help
+	  Choose this option to enable dma-buf CMA heap. This heap is backed
+	  by the Contiguous Memory Allocator (CMA). If your system has these
+	  regions, you should say Y here.
diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
index d1808eca2581..6e54cdec3da0 100644
--- a/drivers/dma-buf/heaps/Makefile
+++ b/drivers/dma-buf/heaps/Makefile
@@ -1,3 +1,4 @@ 
 # SPDX-License-Identifier: GPL-2.0
 obj-y					+= heap-helpers.o
 obj-$(CONFIG_DMABUF_HEAPS_SYSTEM)	+= system_heap.o
+obj-$(CONFIG_DMABUF_HEAPS_CMA)		+= cma_heap.o
diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
new file mode 100644
index 000000000000..d2b10878b60b
--- /dev/null
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -0,0 +1,169 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DMABUF CMA heap exporter
+ *
+ * Copyright (C) 2012, 2019 Linaro Ltd.
+ * Author: <benjamin.gaignard@linaro.org> for ST-Ericsson.
+ */
+
+#include <linux/device.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-heap.h>
+#include <linux/slab.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/cma.h>
+#include <linux/scatterlist.h>
+#include <linux/highmem.h>
+
+#include "heap-helpers.h"
+
+struct cma_heap {
+	struct dma_heap *heap;
+	struct cma *cma;
+};
+
+static void cma_heap_free(struct heap_helper_buffer *buffer)
+{
+	struct cma_heap *cma_heap = dma_heap_get_data(buffer->heap_buffer.heap);
+	unsigned long nr_pages = buffer->pagecount;
+	struct page *cma_pages = buffer->priv_virt;
+
+	/* free page list */
+	kfree(buffer->pages);
+	/* release memory */
+	cma_release(cma_heap->cma, cma_pages, nr_pages);
+	kfree(buffer);
+}
+
+/* dmabuf heap CMA operations functions */
+static int cma_heap_allocate(struct dma_heap *heap,
+				unsigned long len,
+				unsigned long fd_flags,
+				unsigned long heap_flags)
+{
+	struct cma_heap *cma_heap = dma_heap_get_data(heap);
+	struct heap_helper_buffer *helper_buffer;
+	struct page *cma_pages;
+	size_t size = PAGE_ALIGN(len);
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	unsigned long align = get_order(size);
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	struct dma_buf *dmabuf;
+	int ret = -ENOMEM;
+	pgoff_t pg;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	helper_buffer = kzalloc(sizeof(*helper_buffer), GFP_KERNEL);
+	if (!helper_buffer)
+		return -ENOMEM;
+
+	INIT_HEAP_HELPER_BUFFER(helper_buffer, cma_heap_free);
+	helper_buffer->heap_buffer.flags = heap_flags;
+	helper_buffer->heap_buffer.heap = heap;
+	helper_buffer->heap_buffer.size = len;
+
+	cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, false);
+	if (!cma_pages)
+		goto free_buf;
+
+	if (PageHighMem(cma_pages)) {
+		unsigned long nr_clear_pages = nr_pages;
+		struct page *page = cma_pages;
+
+		while (nr_clear_pages > 0) {
+			void *vaddr = kmap_atomic(page);
+
+			memset(vaddr, 0, PAGE_SIZE);
+			kunmap_atomic(vaddr);
+			page++;
+			nr_clear_pages--;
+		}
+	} else {
+		memset(page_address(cma_pages), 0, size);
+	}
+
+	helper_buffer->pagecount = nr_pages;
+	helper_buffer->pages = kmalloc_array(helper_buffer->pagecount,
+					     sizeof(*helper_buffer->pages),
+					     GFP_KERNEL);
+	if (!helper_buffer->pages) {
+		ret = -ENOMEM;
+		goto free_cma;
+	}
+
+	for (pg = 0; pg < helper_buffer->pagecount; pg++) {
+		helper_buffer->pages[pg] = &cma_pages[pg];
+		if (!helper_buffer->pages[pg])
+			goto free_pages;
+	}
+
+	/* create the dmabuf */
+	exp_info.ops = &heap_helper_ops;
+	exp_info.size = len;
+	exp_info.flags = fd_flags;
+	exp_info.priv = &helper_buffer->heap_buffer;
+	dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(dmabuf)) {
+		ret = PTR_ERR(dmabuf);
+		goto free_pages;
+	}
+
+	helper_buffer->heap_buffer.dmabuf = dmabuf;
+	helper_buffer->priv_virt = cma_pages;
+
+	ret = dma_buf_fd(dmabuf, fd_flags);
+	if (ret < 0) {
+		dma_buf_put(dmabuf);
+		/* just return, as put will call release and that will free */
+		return ret;
+	}
+
+	return ret;
+
+free_pages:
+	kfree(helper_buffer->pages);
+free_cma:
+	cma_release(cma_heap->cma, cma_pages, nr_pages);
+free_buf:
+	kfree(helper_buffer);
+	return ret;
+}
+
+static struct dma_heap_ops cma_heap_ops = {
+	.allocate = cma_heap_allocate,
+};
+
+static int __add_cma_heap(struct cma *cma, void *data)
+{
+	struct cma_heap *cma_heap;
+	struct dma_heap_export_info exp_info;
+
+	cma_heap = kzalloc(sizeof(*cma_heap), GFP_KERNEL);
+	if (!cma_heap)
+		return -ENOMEM;
+	cma_heap->cma = cma;
+
+	exp_info.name = cma_get_name(cma);
+	exp_info.ops = &cma_heap_ops;
+	exp_info.priv = cma_heap;
+
+	cma_heap->heap = dma_heap_add(&exp_info);
+	if (IS_ERR(cma_heap->heap)) {
+		int ret = PTR_ERR(cma_heap->heap);
+
+		kfree(cma_heap);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int add_cma_heaps(void)
+{
+	cma_for_each_area(__add_cma_heap, NULL);
+	return 0;
+}
+device_initcall(add_cma_heaps);