
[6/8] drivers: add Contiguous Memory Allocator

Message ID 1309851710-3828-7-git-send-email-m.szyprowski@samsung.com (mailing list archive)
State New, archived

Commit Message

Marek Szyprowski July 5, 2011, 7:41 a.m. UTC
The Contiguous Memory Allocator is a set of helper functions for the
DMA mapping framework that improve allocations of contiguous memory
chunks.

CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
gives it back to the system. The kernel is allowed to allocate movable
pages within CMA's managed memory, so it can be used, for example, for
page cache when the DMA mapping framework does not use it. On a
dma_alloc_from_contiguous() request, such pages are migrated out of the
CMA area to free the required contiguous block and fulfill the request.
This makes it possible to allocate large contiguous chunks of memory at
any time, assuming that there is enough free memory available in the
system.

This code is heavily based on earlier work by Michal Nazarewicz.
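
A rough usage sketch of the interface (illustrative only; the exact
declarations live in the include/linux/dma-contiguous.h added by this
patch):

        /* illustrative: grab 256 contiguous pages (1MiB with 4KiB pages)
         * for a device, aligned to 2^8 pages, then release them */
        struct page *pages;

        pages = dma_alloc_from_contiguous(dev, 256, 8);
        if (!pages)
                return -ENOMEM;
        /* ... program the device with page_to_phys(pages) ... */
        dma_release_from_contiguous(dev, pages, 256);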

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 drivers/base/Kconfig           |   77 +++++++++
 drivers/base/Makefile          |    1 +
 drivers/base/dma-contiguous.c  |  367 ++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-contiguous.h |  104 +++++++++++
 4 files changed, 549 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/linux/dma-contiguous.h

Comments

Marek Szyprowski July 5, 2011, 10:24 a.m. UTC | #1
Hello,

On Tuesday, July 05, 2011 9:42 AM Marek Szyprowski wrote:

> The Contiguous Memory Allocator is a set of helper functions for the
> DMA mapping framework that improve allocations of contiguous memory
> chunks.
>
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives it back to the system. The kernel is allowed to allocate movable
> pages within CMA's managed memory, so it can be used, for example, for
> page cache when the DMA mapping framework does not use it. On a
> dma_alloc_from_contiguous() request, such pages are migrated out of the
> CMA area to free the required contiguous block and fulfill the request.
> This makes it possible to allocate large contiguous chunks of memory at
> any time, assuming that there is enough free memory available in the
> system.
>
> This code is heavily based on earlier work by Michal Nazarewicz.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> ---
>  drivers/base/Kconfig           |   77 +++++++++
>  drivers/base/Makefile          |    1 +
>  drivers/base/dma-contiguous.c  |  367 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-contiguous.h |  104 +++++++++++
>  4 files changed, 549 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/dma-contiguous.c
>  create mode 100644 include/linux/dma-contiguous.h
> 
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index d57e8d0..95ae1a7 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -168,4 +168,81 @@ config SYS_HYPERVISOR
>  	bool
>  	default n
> 
> +config CMA
> +	bool "Contiguous Memory Allocator"
> +	depends HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK

The above line should obviously be "depends on HAVE_DMA_CONTIGUOUS &&
HAVE_MEMBLOCK".
I'm sorry for posting a broken version.

(snipped)

Best regards
Russell King - ARM Linux July 5, 2011, 11:33 a.m. UTC | #2
On Tue, Jul 05, 2011 at 09:41:48AM +0200, Marek Szyprowski wrote:
> The Contiguous Memory Allocator is a set of helper functions for the
> DMA mapping framework that improve allocations of contiguous memory
> chunks.
>
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives it back to the system. The kernel is allowed to allocate movable
> pages within CMA's managed memory, so it can be used, for example, for
> page cache when the DMA mapping framework does not use it. On a
> dma_alloc_from_contiguous() request, such pages are migrated out of the
> CMA area to free the required contiguous block and fulfill the request.
> This makes it possible to allocate large contiguous chunks of memory at
> any time, assuming that there is enough free memory available in the
> system.
>
> This code is heavily based on earlier work by Michal Nazarewicz.

And how are you addressing the technical concerns about aliasing of
cache attributes which I keep bringing up with this and you keep
ignoring and telling me that I'm standing in your way.
Marek Szyprowski July 6, 2011, 1:58 p.m. UTC | #3
Hello,

On Tuesday, July 05, 2011 1:34 PM Russell King - ARM Linux wrote:

> On Tue, Jul 05, 2011 at 09:41:48AM +0200, Marek Szyprowski wrote:
> > The Contiguous Memory Allocator is a set of helper functions for the
> > DMA mapping framework that improve allocations of contiguous memory
> > chunks.
> >
> > CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> > gives it back to the system. The kernel is allowed to allocate movable
> > pages within CMA's managed memory, so it can be used, for example, for
> > page cache when the DMA mapping framework does not use it. On a
> > dma_alloc_from_contiguous() request, such pages are migrated out of the
> > CMA area to free the required contiguous block and fulfill the request.
> > This makes it possible to allocate large contiguous chunks of memory at
> > any time, assuming that there is enough free memory available in the
> > system.
> >
> > This code is heavily based on earlier work by Michal Nazarewicz.
> 
> And how are you addressing the technical concerns about aliasing of
> cache attributes which I keep bringing up with this and you keep
> ignoring and telling me that I'm standing in your way.

I'm perfectly aware of the issues with aliasing of cache attributes.

My idea is to change the low-memory linear mapping for all CMA areas at
boot time to use 2-level page tables (4KiB mappings instead of
super-section mappings). This way the page properties for a single page
in a CMA area can be changed/updated at any time to match the required
coherent/writecombine attributes. The linear mapping can even be removed
completely if we want to create it elsewhere in the address space.
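
A very rough sketch of the idea (change_linear_mapping() is a
hypothetical helper, nothing like it exists yet; this is just the shape
of the approach):

        /* sketch: retarget one lowmem page of the linear map to
         * writecombine; only possible because the CMA area is covered
         * by 4KiB PTEs instead of (super)section entries */
        static int cma_set_page_wc(struct page *page)
        {
                unsigned long addr = (unsigned long)page_address(page);

                return change_linear_mapping(addr, PAGE_SIZE,
                                pgprot_writecombine(PAGE_KERNEL));
        }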

The only problem that might need to be resolved is GFP_ATOMIC allocation
(updating page properties probably requires some locking), but it can be
served from a special area which is created on boot without low-memory
mapping at all. No sane driver will call dma_alloc_coherent(GFP_ATOMIC)
for large buffers anyway.

CMA limits the memory area from which coherent pages are taken quite
well, so the change in the linear mapping method should have no
significant impact on system performance.

I haven't implemented such a solution yet, because it is really hard to
handle all the issues at the same time, and creating the allocator was
just the first step.

Best regards
Arnd Bergmann July 6, 2011, 2:09 p.m. UTC | #4
On Wednesday 06 July 2011, Marek Szyprowski wrote:
> The only problem that might need to be resolved is GFP_ATOMIC allocation
> (updating page properties probably requires some locking), but it can be
> served from a special area which is created on boot without low-memory
> mapping at all. No sane driver will call dma_alloc_coherent(GFP_ATOMIC)
> for large buffers anyway.

Would it be easier to start with a version that only allocated from memory
without a low-memory mapping at first?

This would be similar to the approach that Russell's fix for the regular
dma_alloc_coherent has taken, except that you need to also allow the memory
to be used as highmem user pages.

Maybe you can simply adapt the default location of the contiguous memory
area like this (a rough sketch follows the list):
- make CONFIG_CMA depend on CONFIG_HIGHMEM on ARM, at compile time
- if ZONE_HIGHMEM exist during boot, put the CMA area in there
- otherwise, put the CMA area at the top end of lowmem, and change
  the zone sizes so ZONE_HIGHMEM stretches over all of the CMA memory.
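
In boot code that could look roughly like this (a sketch; lowmem_limit,
cma_size and align are placeholders for wherever the arch keeps these
values):

        /* sketch: place the default CMA area depending on highmem */
        phys_addr_t base;

        if (memblock_end_of_DRAM() > lowmem_limit)      /* highmem present */
                base = memblock_alloc_base(cma_size, align,
                                           MEMBLOCK_ALLOC_ANYWHERE);
        else                                            /* top end of lowmem */
                base = memblock_alloc_base(cma_size, align, lowmem_limit);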

	Arnd
Russell King - ARM Linux July 6, 2011, 2:23 p.m. UTC | #5
On Wed, Jul 06, 2011 at 04:09:29PM +0200, Arnd Bergmann wrote:
> Maybe you can simply adapt the default location of the contiguous memory
> area like this:
> - make CONFIG_CMA depend on CONFIG_HIGHMEM on ARM, at compile time
> - if ZONE_HIGHMEM exist during boot, put the CMA area in there
> - otherwise, put the CMA area at the top end of lowmem, and change
>   the zone sizes so ZONE_HIGHMEM stretches over all of the CMA memory.

One of the requirements of the allocator is that the returned memory
should be zero'd (because it can be exposed to userspace via ALSA
and frame buffers.)

Zeroing the memory from all the contexts which dma_alloc_coherent
is called from is a trivial matter if it's in lowmem, but highmem is
harder.

Another issue is that when a platform has restricted DMA regions,
they typically don't fall into the highmem zone.  As the dmabounce
code allocates from the DMA coherent allocator to provide it with
guaranteed DMA-able memory, that would be rather inconvenient.
Nicolas Pitre July 6, 2011, 2:37 p.m. UTC | #6
On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:

> Another issue is that when a platform has restricted DMA regions,
> they typically don't fall into the highmem zone.  As the dmabounce
> code allocates from the DMA coherent allocator to provide it with
> guaranteed DMA-able memory, that would be rather inconvenient.

Do we encounter this in practice i.e. do those platforms requiring large 
contiguous allocations motivating this work have such DMA restrictions?


Nicolas
Arnd Bergmann July 6, 2011, 2:51 p.m. UTC | #7
On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> On Wed, Jul 06, 2011 at 04:09:29PM +0200, Arnd Bergmann wrote:
> > Maybe you can simply adapt the default location of the contiguous memory
> > area like this:
> > - make CONFIG_CMA depend on CONFIG_HIGHMEM on ARM, at compile time
> > - if ZONE_HIGHMEM exist during boot, put the CMA area in there
> > - otherwise, put the CMA area at the top end of lowmem, and change
> >   the zone sizes so ZONE_HIGHMEM stretches over all of the CMA memory.
> 
> One of the requirements of the allocator is that the returned memory
> should be zero'd (because it can be exposed to userspace via ALSA
> and frame buffers.)
> 
> Zeroing the memory from all the contexts which dma_alloc_coherent
> is called from is a trivial matter if it's in lowmem, but highmem is
> harder.

I don't see how. The pages get allocated from an unmapped area
of memory, mapped into the kernel address space as uncached or wc
and then cleared. This should be the same for lowmem or highmem
pages.

What am I missing?

> Another issue is that when a platform has restricted DMA regions,
> they typically don't fall into the highmem zone.  As the dmabounce
> code allocates from the DMA coherent allocator to provide it with
> guaranteed DMA-able memory, that would be rather inconvenient.

True. The dmabounce code would consequently have to allocate
the memory through an internal function that avoids the
contiguous allocation area and goes straight to ZONE_DMA memory
as it does today.

	Arnd
Marek Szyprowski July 6, 2011, 2:56 p.m. UTC | #8
Hello,

On Wednesday, July 06, 2011 4:09 PM Arnd Bergmann wrote:

> On Wednesday 06 July 2011, Marek Szyprowski wrote:
> > The only problem that might need to be resolved is GFP_ATOMIC allocation
> > (updating page properties probably requires some locking), but it can be
> > served from a special area which is created on boot without low-memory
> > mapping at all. No sane driver will call dma_alloc_coherent(GFP_ATOMIC)
> > for large buffers anyway.
> 
> Would it be easier to start with a version that only allocated from memory
> without a low-memory mapping at first?
>
> This would be similar to the approach that Russell's fix for the regular
> dma_alloc_coherent has taken, except that you need to also allow the memory
> to be used as highmem user pages.
> 
> Maybe you can simply adapt the default location of the contiguous memory
> area like this:
> - make CONFIG_CMA depend on CONFIG_HIGHMEM on ARM, at compile time
> - if ZONE_HIGHMEM exist during boot, put the CMA area in there
> - otherwise, put the CMA area at the top end of lowmem, and change
>   the zone sizes so ZONE_HIGHMEM stretches over all of the CMA memory.

This will not solve our problems. We need CMA also to create at least one
device private area that for sure will be in low memory (video codec).

I will rewrite the ARM dma-mapping & CMA integration patch based on the
latest ARM for-next patches and add a proof-of-concept of the solution
presented in my previous mail (2-level page tables and unmapping pages
from low-mem).

Best regards
Arnd Bergmann July 6, 2011, 2:59 p.m. UTC | #9
On Wednesday 06 July 2011, Nicolas Pitre wrote:
> On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> 
> > Another issue is that when a platform has restricted DMA regions,
> > they typically don't fall into the highmem zone.  As the dmabounce
> > code allocates from the DMA coherent allocator to provide it with
> > guaranteed DMA-able memory, that would be rather inconvenient.
> 
> Do we encounter this in practice i.e. do those platforms requiring large 
> contiguous allocations motivating this work have such DMA restrictions?

You can probably find one or two of those, but we don't have to optimize
for that case. I would at least expect the maximum size of the allocation
to be smaller than the DMA limit for these, and consequently mandate that
they define a sufficiently large CONSISTENT_DMA_SIZE for the crazy devices,
or possibly add a hack to unmap some low memory and call
dma_declare_coherent_memory() for the device.

	Arnd
Russell King - ARM Linux July 6, 2011, 3:37 p.m. UTC | #10
On Wed, Jul 06, 2011 at 04:56:23PM +0200, Marek Szyprowski wrote:
> This will not solve our problems. We need CMA also to create at least one
> device private area that for sure will be in low memory (video codec).

You make these statements but you don't say why.  Can you please
explain why the video codec needs low memory - does it have a
restricted number of memory address bits which it can manipulate?
Marek Szyprowski July 6, 2011, 3:47 p.m. UTC | #11
Hello,

On Wednesday, July 06, 2011 5:37 PM Russell King - ARM Linux wrote:

> On Wed, Jul 06, 2011 at 04:56:23PM +0200, Marek Szyprowski wrote:
> > This will not solve our problems. We need CMA also to create at least one
> > device private area that for sure will be in low memory (video codec).
> 
> You make these statements but you don't say why.  Can you please
> explain why the video codec needs low memory - does it have a
> restricted number of memory address bits which it can manipulate?

Nope, it only needs to put some types of memory buffers in the first bank
(effectively in the 30000000-34ffffff area) and the others in the second
bank (the 40000000-57ffffff area). The values are given for the Samsung
GONI board.

Best regards
Russell King - ARM Linux July 6, 2011, 3:48 p.m. UTC | #12
On Wed, Jul 06, 2011 at 04:51:49PM +0200, Arnd Bergmann wrote:
> On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> > On Wed, Jul 06, 2011 at 04:09:29PM +0200, Arnd Bergmann wrote:
> > > Maybe you can simply adapt the default location of the contiguous memory
> > > area like this:
> > > - make CONFIG_CMA depend on CONFIG_HIGHMEM on ARM, at compile time
> > > - if ZONE_HIGHMEM exist during boot, put the CMA area in there
> > > - otherwise, put the CMA area at the top end of lowmem, and change
> > >   the zone sizes so ZONE_HIGHMEM stretches over all of the CMA memory.
> > 
> > One of the requirements of the allocator is that the returned memory
> > should be zero'd (because it can be exposed to userspace via ALSA
> > and frame buffers.)
> > 
> > Zeroing the memory from all the contexts which dma_alloc_coherent
> > is called from is a trivial matter if it's in lowmem, but highmem is
> > harder.
> 
> I don't see how. The pages get allocated from an unmapped area
> > of memory, mapped into the kernel address space as uncached or wc
> and then cleared. This should be the same for lowmem or highmem
> pages.

You don't want to clear them via their uncached or WC mapping, but via
their cached mapping _before_ they get their alternative mapping, and
flush any cached data out of that mapping - both L1 and L2 caches.

For lowmem pages, that's easy.  For highmem pages, they need to be
individually kmap'd to zero them etc.  (alloc_pages() warns on
GFP_HIGHMEM + GFP_ZERO from atomic contexts - and dma_alloc_coherent
must be callable from such contexts.)

That may be easier now that we don't have the explicit indices for
kmap_atomic, but at that time it wasn't easily possible.
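
For reference, the per-page work for highmem would look something like
this (a sketch only, assuming the post-3e4d3af501 kmap_atomic interface
and a context where GFP_ZERO-style zeroing is permitted):

        /* zero 'count' highmem pages via a cached mapping, flushing
         * L1 and L2 before any uncached/WC mapping is created */
        for (i = 0; i < count; i++) {
                void *ptr = kmap_atomic(page + i);

                memset(ptr, 0, PAGE_SIZE);
                dmac_flush_range(ptr, ptr + PAGE_SIZE);         /* L1 */
                kunmap_atomic(ptr);
                outer_flush_range(page_to_phys(page + i),
                        page_to_phys(page + i) + PAGE_SIZE);    /* L2 */
        }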

> > Another issue is that when a platform has restricted DMA regions,
> > they typically don't fall into the highmem zone.  As the dmabounce
> > code allocates from the DMA coherent allocator to provide it with
> > guaranteed DMA-able memory, that would be rather inconvenient.
> 
> True. The dmabounce code would consequently have to allocate
> the memory through an internal function that avoids the
> contiguous allocation area and goes straight to ZONE_DMA memory
> as it does today.

CMA's whole purpose for existing is to provide _dma-able_ contiguous
memory for things like cameras and such like found on crippled non-
scatter-gather hardware.  If that memory is not DMA-able what's the
point?
Christoph Lameter (Ampere) July 6, 2011, 4:05 p.m. UTC | #13
On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:

> > > they typically don't fall into the highmem zone.  As the dmabounce
> > > code allocates from the DMA coherent allocator to provide it with
> > > guaranteed DMA-able memory, that would be rather inconvenient.
> >
> > True. The dmabounce code would consequently have to allocate
> > the memory through an internal function that avoids the
> > contiguous allocation area and goes straight to ZONE_DMA memory
> > as it does today.
>
> CMA's whole purpose for existing is to provide _dma-able_ contiguous
> memory for things like cameras and such like found on crippled non-
> scatter-gather hardware.  If that memory is not DMA-able what's the
> point?

ZONE_DMA is a zone for memory of legacy (crippled) devices that cannot DMA
into all of memory (and so is ZONE_DMA32). Memory from ZONE_NORMAL can be
used for DMA as well and a fully capable device would be expected to
handle any memory in the system for DMA transfers.

"guaranteed" dmaable memory? DMA abilities are device specific. Well maybe
you can call ZONE_DMA memory to be guaranteed if you guarantee that any
device must at minimum be able to perform DMA into ZONE_DMA memory. But
there may not be much of that memory around so you would want to limit
the use of that scarce resource.
MichaƂ Nazarewicz July 6, 2011, 4:09 p.m. UTC | #14
On Wed, 06 Jul 2011 18:05:00 +0200, Christoph Lameter <cl@linux.com> wrote:
> ZONE_DMA is a zone for memory of legacy (crippled) devices that cannot  
> DMA into all of memory (and so is ZONE_DMA32).  Memory from ZONE_NORMAL
> can be used for DMA as well and a fully capable device would be expected
> to handle any memory in the system for DMA transfers.
>
> "guaranteed" dmaable memory? DMA abilities are device specific. Well  
> maybe you can call ZONE_DMA memory to be guaranteed if you guarantee
> that any device must at minimum be able to perform DMA into ZONE_DMA
> memory. But there may not be much of that memory around so you would
> want to limit the use of that scarce resource.

As pointed out in Marek's other mail, this reasoning is not helping in any
way.  In case of video codec on various Samsung devices (and from some
other threads this is not limited to Samsung), the codec needs separate
buffers in separate memory banks.
Christoph Lameter (Ampere) July 6, 2011, 4:19 p.m. UTC | #15
On Wed, 6 Jul 2011, Michal Nazarewicz wrote:

> On Wed, 06 Jul 2011 18:05:00 +0200, Christoph Lameter <cl@linux.com> wrote:
> > ZONE_DMA is a zone for memory of legacy (crippled) devices that cannot DMA
> > into all of memory (and so is ZONE_DMA32).  Memory from ZONE_NORMAL
> > can be used for DMA as well and a fully capable device would be expected
> > to handle any memory in the system for DMA transfers.
> >
> > "guaranteed" dmaable memory? DMA abilities are device specific. Well maybe
> > you can call ZONE_DMA memory to be guaranteed if you guarantee
> > that any device must at minimum be able to perform DMA into ZONE_DMA
> > memory. But there may not be much of that memory around so you would
> > want to limit the use of that scarce resource.
>
> As pointed out in Marek's other mail, this reasoning is not helping in any
> way.  In case of video codec on various Samsung devices (and from some
> other threads this is not limited to Samsung), the codec needs separate
> buffers in separate memory banks.

What I described is the basic memory architecture of Linux. I am not that
familiar with ARM and the issue discussed here. Only got involved because
ZONE_DMA was mentioned. The nature of ZONE_DMA is often misunderstood.

The allocation of the memory banks for the Samsung devices has to fit
somehow into one of these zones. It's probably best to put the memory banks
into ZONE_NORMAL and not have any dependency on ZONE_DMA at all.
Arnd Bergmann July 6, 2011, 4:31 p.m. UTC | #16
On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> On Wed, Jul 06, 2011 at 04:51:49PM +0200, Arnd Bergmann wrote:
> > On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> > 
> > I don't see how. The pages get allocated from an unmapped area
> > of memory, mapped into the kernel address space as uncached or wc
> > and then cleared. This should be the same for lowmem or highmem
> > pages.
> 
> You don't want to clear them via their uncached or WC mapping, but via
> their cached mapping _before_ they get their alternative mapping, and
> flush any cached out of that mapping - both L1 and L2 caches.

But there can't be any other mapping, which is the whole point of
the exercise to use highmem.
Quoting from the new dma_alloc_area() function:

        c = arm_vmregion_alloc(&area->vm, align, size,
                            gfp & ~(__GFP_DMA | __GFP_HIGHMEM));
        if (!c)
                return NULL;
        memset((void *)c->vm_start, 0, size);

area->vm here points to an uncached location, which means that
we already zero the data through the uncached mapping. I don't
see how it's getting worse than it is already.

> > > Another issue is that when a platform has restricted DMA regions,
> > > they typically don't fall into the highmem zone.  As the dmabounce
> > > code allocates from the DMA coherent allocator to provide it with
> > > guaranteed DMA-able memory, that would be rather inconvenient.
> > 
> > True. The dmabounce code would consequently have to allocate
> > the memory through an internal function that avoids the
> > contiguous allocation area and goes straight to ZONE_DMA memory
> > as it does today.
> 
> CMA's whole purpose for existing is to provide _dma-able_ contiguous
> memory for things like cameras and such like found on crippled non-
> scatter-gather hardware.  If that memory is not DMA-able what's the
> point?

I mean not any ZONE_DMA memory, but the memory backing coherent_areas[],
which is by definition DMA-able from any device and is what is currently
being used for the purpose.

	Arnd
Russell King - ARM Linux July 6, 2011, 5:02 p.m. UTC | #17
On Wed, Jul 06, 2011 at 11:05:00AM -0500, Christoph Lameter wrote:
> On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> 
> > > > they typically don't fall into the highmem zone.  As the dmabounce
> > > > code allocates from the DMA coherent allocator to provide it with
> > > > guaranteed DMA-able memory, that would be rather inconvenient.
> > >
> > > True. The dmabounce code would consequently have to allocate
> > > the memory through an internal function that avoids the
> > > contiguous allocation area and goes straight to ZONE_DMA memory
> > > as it does today.
> >
> > CMA's whole purpose for existing is to provide _dma-able_ contiguous
> > memory for things like cameras and such like found on crippled non-
> > scatter-gather hardware.  If that memory is not DMA-able what's the
> > point?
> 
> ZONE_DMA is a zone for memory of legacy (crippled) devices that cannot DMA
> into all of memory (and so is ZONE_DMA32). Memory from ZONE_NORMAL can be
> used for DMA as well and a fully capable device would be expected to
> handle any memory in the system for DMA transfers.
> 
> "guaranteed" dmaable memory? DMA abilities are device specific. Well maybe
> you can call ZONE_DMA memory to be guaranteed if you guarantee that any
> device must at minimum be able to perform DMA into ZONE_DMA memory. But
> there may not be much of that memory around so you would want to limit
> the use of that scarce resource.

Precisely, which is what ZONE_DMA is all about.  I *have* been a Linux
kernel hacker for the last 18 years and do know these things, especially
as ARM has had various issues with DMA memory limitations over those
years - and have successfully had platforms working reliably given that
and ZONE_DMA.
Russell King - ARM Linux July 6, 2011, 5:15 p.m. UTC | #18
On Wed, Jul 06, 2011 at 11:19:00AM -0500, Christoph Lameter wrote:
> What I described is the basic memory architecture of Linux. I am not that
> familiar with ARM and the issue discussed here. Only got involved because
> ZONE_DMA was mentioned. The nature of ZONE_DMA is often misunderstood.
> 
> The allocation of the memory banks for the Samsung devices has to fit
> somehow into one of these zones. Its probably best to put the memory banks
> into ZONE_NORMAL and not have any dependency on ZONE_DMA at all.

Let me teach you about the ARM memory management on Linux.

Firstly, let's go over the structure of zones in Linux.  There are three
zones - ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM.  These zones are filled
in that order.  So, ZONE_DMA starts at zero.  Following on from ZONE_DMA
is ZONE_NORMAL memory, and lastly ZONE_HIGHMEM.

At boot, we pass all memory over to the kernel as follows:

1. If there is no DMA zone, then we pass all low memory over as ZONE_NORMAL.

2. If there is a DMA zone, by default we pass all low memory as ZONE_DMA.
   This is required so drivers which use GFP_DMA can work.

   Platforms with restricted DMA requirements can modify that layout to
   move memory from ZONE_DMA into ZONE_NORMAL, thereby restricting the
   upper address which the kernel allocators will give for GFP_DMA
   allocations.

3. In either case, any high memory is passed as ZONE_HIGHMEM if configured
   (or the memory is truncated if not).

So, when we have (eg) a platform where only the _even_ MBs of memory are
DMA-able, we have a 1MB DMA zone at the beginning of system memory, and
everything else in ZONE_NORMAL.  This means GFP_DMA will return either
memory from the first 1MB or fail if it can't.  This is the behaviour we
desire.

Normal allocations will come from ZONE_NORMAL _first_ and then try ZONE_DMA
if there's no other alternative.  This is the same desired behaviour as
x86.

So, ARM is no different from x86, with the exception that the 16MB DMA
zone due to ISA ends up being different sizes on ARM depending on our
restrictions.
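
From a driver's point of view, the fallback behaviour described above is
simply (illustrative):

        /* only ZONE_DMA is eligible; fails if the zone is exhausted */
        page = alloc_pages(GFP_DMA, order);

        /* ZONE_NORMAL first, falling back to ZONE_DMA as a last resort */
        page = alloc_pages(GFP_KERNEL, order);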
Christoph Lameter (Ampere) July 6, 2011, 7:03 p.m. UTC | #19
On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:

> So, ARM is no different from x86, with the exception that the 16MB DMA
> zone due to ISA ends up being different sizes on ARM depending on our
> restrictions.

Sounds good. Thank you.
Nicolas Pitre July 6, 2011, 7:10 p.m. UTC | #20
On Wed, 6 Jul 2011, Arnd Bergmann wrote:

> On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> > On Wed, Jul 06, 2011 at 04:51:49PM +0200, Arnd Bergmann wrote:
> > > On Wednesday 06 July 2011, Russell King - ARM Linux wrote:
> > > 
> > > I don't see how. The pages get allocated from an unmapped area
> > > of memory, mapped into the kernel address space as uncached or wc
> > > and then cleared. This should be the same for lowmem or highmem
> > > pages.
> > 
> > You don't want to clear them via their uncached or WC mapping, but via
> > their cached mapping _before_ they get their alternative mapping, and
> > flush any cached data out of that mapping - both L1 and L2 caches.
> 
> But there can't be any other mapping, which is the whole point of
> the exercise to use highmem.
> Quoting from the new dma_alloc_area() function:
> 
>         c = arm_vmregion_alloc(&area->vm, align, size,
>                             gfp & ~(__GFP_DMA | __GFP_HIGHMEM));
>         if (!c)
>                 return NULL;
>         memset((void *)c->vm_start, 0, size);
> 
> area->vm here points to an uncached location, which means that
> we already zero the data through the uncached mapping. I don't
> see how it's getting worse than it is already.

If you get a highmem page, because the cache is VIPT, that page might 
still be cached even if it wasn't mapped.  With a VIVT cache we must 
flush the cache whenever a highmem page is unmapped.  There is no such 
restriction with VIPT i.e. ARMv6 and above.  Therefore to make sure the 
highmem page you get doesn't have cache lines associated to it, you must 
first map it cacheable, then perform cache invalidation on it, and 
eventually remap it as non-cacheable.  This is necessary because there 
is no way to perform cache maintenance on L1 cache using physical 
addresses unfortunately.  See commit 7e5a69e83b for an example of what 
this entails (fortunately commit 3e4d3af501 made things much easier and 
therefore commit 39af22a79 greatly simplified things).



Nicolas
Arnd Bergmann July 6, 2011, 8:23 p.m. UTC | #21
On Wednesday 06 July 2011 21:10:07 Nicolas Pitre wrote:
> If you get a highmem page, because the cache is VIPT, that page might 
> still be cached even if it wasn't mapped.  With a VIVT cache we must 
> flush the cache whenever a highmem page is unmapped.  There is no such 
> restriction with VIPT i.e. ARMv6 and above.  Therefore to make sure the 
> highmem page you get doesn't have cache lines associated to it, you must 
> first map it cacheable, then perform cache invalidation on it, and 
> eventually remap it as non-cacheable.  This is necessary because there 
> is no way to perform cache maintenance on L1 cache using physical 
> addresses unfortunately.  See commit 7e5a69e83b for an example of what 
> this entails (fortunately commit 3e4d3af501 made things much easier and 
> therefore commit 39af22a79 greatly simplified things).

Ok, thanks for the explanation. This definitely makes the highmem approach
much harder to get right, and slower. Let's hope then that Marek's approach
of using small pages for the contiguous memory region and changing their
attributes on the fly works out better than this.

	Arnd
Nicolas Pitre July 7, 2011, 5:29 a.m. UTC | #22
On Wed, 6 Jul 2011, Arnd Bergmann wrote:

> On Wednesday 06 July 2011 21:10:07 Nicolas Pitre wrote:
> > If you get a highmem page, because the cache is VIPT, that page might 
> > still be cached even if it wasn't mapped.  With a VIVT cache we must 
> > flush the cache whenever a highmem page is unmapped.  There is no such 
> > restriction with VIPT i.e. ARMv6 and above.  Therefore to make sure the 
> > highmem page you get doesn't have cache lines associated to it, you must 
> > first map it cacheable, then perform cache invalidation on it, and 
> > eventually remap it as non-cacheable.  This is necessary because there 
> > is no way to perform cache maintenance on L1 cache using physical 
> > addresses unfortunately.  See commit 7e5a69e83b for an example of what 
> > this entails (fortunately commit 3e4d3af501 made things much easier and 
> > therefore commit 39af22a79 greatly simplified things).
> 
> Ok, thanks for the explanation. This definitely makes the highmem approach
> much harder to get right, and slower. Let's hope then that Marek's approach
> of using small pages for the contiguous memory region and changing their
> attributes on the fly works out better than this.

I would say that both approaches have fairly equivalent complexity.


Nicolas
Janusz Krzysztofik July 9, 2011, 2:57 p.m. UTC | #23
On Wed, 6 Jul 2011 at 16:59:45 Arnd Bergmann wrote:
> On Wednesday 06 July 2011, Nicolas Pitre wrote:
> > On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> > > Another issue is that when a platform has restricted DMA regions,
> > > they typically don't fall into the highmem zone.  As the
> > > dmabounce code allocates from the DMA coherent allocator to
> > > provide it with guaranteed DMA-able memory, that would be rather
> > > inconvenient.
> > 
> > Do we encounter this in practice i.e. do those platforms requiring
> > large contiguous allocations motivating this work have such DMA
> > restrictions?
> 
> You can probably find one or two of those, but we don't have to
> optimize for that case. I would at least expect the maximum size of
> the allocation to be smaller than the DMA limit for these, and
> consequently mandate that they define a sufficiently large
> CONSISTENT_DMA_SIZE for the crazy devices, or possibly add a hack to
> unmap some low memory and call
> dma_declare_coherent_memory() for the device.

Having found that Russell has dropped his "ARM: DMA: steal memory for DMA
coherent mappings" for now, let me get back to this idea of a hack that
would allow safely calling dma_declare_coherent_memory() in order to
assign a device a block of contiguous memory for exclusive use.
Assuming there should be no problem with successfully allocating a large
contiguous block of coherent memory at boot time with
dma_alloc_coherent(), this block could be reserved for the device. The
only problem is that dma_declare_coherent_memory() calls ioremap(),
which was designed with a device's dedicated physical memory in mind,
but shouldn't be called on memory that is already mapped.

There were three approaches proposed, two of them in August 2010:
http://www.spinics.net/lists/linux-media/msg22179.html,
http://www.spinics.net/lists/arm-kernel/msg96318.html,
and a third one in January 2011:
http://www.spinics.net/lists/linux-arch/msg12637.html.

As far as I can understand the reason why both of the first two were
NAKed, it was suggested that videobuf-dma-contig shouldn't use coherent
memory if all it requires is contiguous memory, and that a new API should
be invented, or the dma_pool API extended, to provide contiguous memory.
The CMA was pointed out as a new work-in-progress contiguous memory API.
Now it turns out it's not; it's only a helper to ensure that
dma_alloc_coherent() always succeeds, and videobuf2-dma-contig is still
going to allocate buffers from coherent memory.

(CCing both authors, Marin Mitov and Guennadi Liakhovetski, and their 
main opponent, FUJITA Tomonori)

The third solution was not discussed much after it was pointed out as
being not very different from those two in terms of the above-mentioned
rationale.

All three solutions were different from the now-suggested method of
unmapping some low memory and then calling dma_declare_coherent_memory(),
which ioremaps it, in that they tried to reserve some boot-time-allocated
coherent memory, already mapped correctly, without (io)remapping it.

If there are still problems with CMA on one hand, and a need for a hack
to handle "crazy devices" is still seen, regardless of CMA being
available and working or not, on the other, maybe we should get back to
the idea of adapting the coherent API to new requirements, review those
three proposals again and select the one which seems most acceptable to
everyone? Being the submitter of the third, I'll be happy to refresh it
if selected.

Thanks,
Janusz
Marek Szyprowski July 11, 2011, 1:47 p.m. UTC | #24
Hello,

On Saturday, July 09, 2011 4:57 PM Janusz Krzysztofik	wrote:

> On Wed, 6 Jul 2011 at 16:59:45 Arnd Bergmann wrote:
> > On Wednesday 06 July 2011, Nicolas Pitre wrote:
> > > On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> > > > Another issue is that when a platform has restricted DMA regions,
> > > > they typically don't fall into the highmem zone.  As the
> > > > dmabounce code allocates from the DMA coherent allocator to
> > > > provide it with guaranteed DMA-able memory, that would be rather
> > > > inconvenient.
> > >
> > > Do we encounter this in practice i.e. do those platforms requiring
> > > large contiguous allocations motivating this work have such DMA
> > > restrictions?
> >
> > You can probably find one or two of those, but we don't have to
> > optimize for that case. I would at least expect the maximum size of
> > the allocation to be smaller than the DMA limit for these, and
> > consequently mandate that they define a sufficiently large
> > CONSISTENT_DMA_SIZE for the crazy devices, or possibly add a hack to
> > unmap some low memory and call
> > dma_declare_coherent_memory() for the device.
> 
> Once found that Russell has dropped his "ARM: DMA: steal memory for DMA
> coherent mappings" for now, let me get back to this idea of a hack that
> would allow for safely calling dma_declare_coherent_memory() in order to
> assign a device with a block of contiguous memory for exclusive use.

We tested such an approach and finally with 3.0-rc1 it works fine. You can
find an example of dma_declare_coherent() together with the required
memblock_remove() calls in the following patch series:
http://www.spinics.net/lists/linux-samsung-soc/msg05026.html
"[PATCH 0/3 v2] ARM: S5P: Add support for MFC device on S5PV210 and EXYNOS4"

> Assuming there should be no problem with successfully allocating a large
> continuous block of coherent memory at boot time with
> dma_alloc_coherent(), this block could be reserved for the device. The
> only problem is with the dma_declare_coherent_memory() calling
> ioremap(), which was designed with a device's dedicated physical memory
> in mind, but shouldn't be called on a memory already mapped.

All these issues with ioremap have finally been resolved in 3.0-rc1. As
Russell pointed out to me in http://www.spinics.net/lists/arm-kernel/msg127644.html,
ioremap can be fixed to work on early reserved memory areas by selecting
the ARCH_HAS_HOLES_MEMORYMODEL Kconfig option.

> There were three approaches proposed, two of them in August 2010:
> http://www.spinics.net/lists/linux-media/msg22179.html,
> http://www.spinics.net/lists/arm-kernel/msg96318.html,
> and a third one in January 2011:
> http://www.spinics.net/lists/linux-arch/msg12637.html.
> 
> As far as I can understand the reason why both of the first two were
> NAKed, it was suggested that videobuf-dma-contig shouldn't use coherent
> if all it requires is a contiguous memory, and a new API should be
> invented, or dma_pool API extended, for providing contiguous memory.

This is another story. The DMA-mapping framework definitely needs some
extensions to allow a more detailed specification of the allocated memory
(currently we have only coherent and the nearly ARM-specific
writecombine). During the Linaro Memory Management summit we agreed that
a dma_alloc_attrs() function might be needed to clean up the API and
provide a nice way of adding new memory parameters. The possibility of
allocating contiguous cached buffers might be one of the new DMA
attributes. Here are some details of my proposal:
http://www.spinics.net/lists/linux-mm/msg21235.html
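
Roughly, the proposed call would look like this (a sketch of the API
under discussion, not something that exists in mainline at this point):

        /* sketch: request a contiguous writecombine buffer via attrs */
        struct dma_attrs attrs;

        init_dma_attrs(&attrs);
        dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
        buf = dma_alloc_attrs(dev, size, &dma_handle, GFP_KERNEL, &attrs);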

> The
> CMA was pointed out as a new work in progress contiguous memory API.

That was probably the biggest mistake at the beginning. We definitely
should have learned the dma-mapping framework and its internals first.

> Now
> it turns out it's not, it's only a helper to ensure that
> dma_alloc_coherent() always succeeds, and videobuf2-dma-contig is still
> going to allocate buffers from coherent memory.

I hope that once the dma_alloc_attrs() API is accepted, I will add
support for memory attributes to the videobuf2-dma-contig allocator.
 
> (CCing both authors, Marin Mitov and Guennadi Liakhovetski, and their
> main opponent, FUJITA Tomonori)
> 
> The third solution was not discussed much after it was pointed out as
> being not very different from those two in terms of the above mentioned
> rationale.
> 
> All three solutions were different from the now-suggested method of
> unmapping some low memory and then calling dma_declare_coherent_memory(),
> which ioremaps it, in that they tried to reserve some boot-time-allocated
> coherent memory, already mapped correctly, without (io)remapping it.
> 
> If there are still problems with the CMA on one hand, and a need for a
> hack to handle "crazy devices" is still seen, regardless of CMA
> available and working or not, on the other, maybe we should get back to
> the idea of adapting the coherent API to new requirements, review those
> three proposals again and select one which seems most acceptable to
> everyone? Being a submitter of the third, I'll be happy to refresh it if
> selected.

I'm open to discussion.

Best regards
Janusz Krzysztofik July 11, 2011, 7:01 p.m. UTC | #25
On Monday, 11 July 2011 at 15:47:32, Marek Szyprowski wrote:
> Hello,
> 
> On Saturday, July 09, 2011 4:57 PM Janusz Krzysztofik	wrote:
> > On Wed, 6 Jul 2011 at 16:59:45 Arnd Bergmann wrote:
> > > On Wednesday 06 July 2011, Nicolas Pitre wrote:
> > > > On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> > > > > Another issue is that when a platform has restricted DMA
> > > > > regions, they typically don't fall into the highmem zone. 
> > > > > As the dmabounce code allocates from the DMA coherent
> > > > > allocator to provide it with guaranteed DMA-able memory,
> > > > > that would be rather inconvenient.
> > > > 
> > > > Do we encounter this in practice i.e. do those platforms
> > > > requiring large contiguous allocations motivating this work
> > > > have such DMA restrictions?
> > > 
> > > You can probably find one or two of those, but we don't have to
> > > optimize for that case. I would at least expect the maximum size
> > > of the allocation to be smaller than the DMA limit for these,
> > > and consequently mandate that they define a sufficiently large
> > > CONSISTENT_DMA_SIZE for the crazy devices, or possibly add a
> > > hack to unmap some low memory and call
> > > dma_declare_coherent_memory() for the device.
> > 
> > Once found that Russell has dropped his "ARM: DMA: steal memory for
> > DMA coherent mappings" for now, let me get back to this idea of a
> > hack that would allow for safely calling
> > dma_declare_coherent_memory() in order to assign a device with a
> > block of contiguous memory for exclusive use.
> 
> We tested such an approach and finally with 3.0-rc1 it works fine. You
> can find an example of dma_declare_coherent() together with the
> required memblock_remove() calls in the following patch series:
> http://www.spinics.net/lists/linux-samsung-soc/msg05026.html
> "[PATCH 0/3 v2] ARM: S5P: Add support for MFC device on S5PV210 and
> EXYNOS4"
> 
> > Assuming there should be no problem with successfully allocating a
> > large continuous block of coherent memory at boot time with
> > dma_alloc_coherent(), this block could be reserved for the device.
> > The only problem is with the dma_declare_coherent_memory() calling
> > ioremap(), which was designed with a device's dedicated physical
> > memory in mind, but shouldn't be called on a memory already
> > mapped.
> 
> All these issues with ioremap have finally been resolved in 3.0-rc1.
> As Russell pointed out to me in
> http://www.spinics.net/lists/arm-kernel/msg127644.html, ioremap can
> be fixed to work on early reserved memory areas by selecting
> the ARCH_HAS_HOLES_MEMORYMODEL Kconfig option.

I'm not sure. Recently I tried to refresh my now 7-month-old patch in
which I used that 'memblock_remove() then dma_declare_coherent_memory()'
method[1]. It was different from your S5P MFC example in that it didn't
punch any holes in the system memory, it only stole a block of SDRAM from
its tail. But Russell reminded me again: "we should not be mapping SDRAM
using device mappings."[2]. Would defining ARCH_HAS_HOLES_MEMORYMODEL
(even if it was justified) make any difference in my case? I don't think
so. What I think, after Russell, is that we still need that obligatory
ioremap() removed from dma_declare_coherent_memory(), or made optional,
or a separate dma_declare_coherent_memory()-like function without the
(obligatory) ioremap() provided by the DMA API, in order to get the
dma_declare_coherent_memory() method accepted without any reservations
when used inside arch/arm, I'm afraid.

Thanks,
Janusz

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-December/034644.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2011-June/052488.html
Marek Szyprowski July 12, 2011, 5:34 a.m. UTC | #26
Hello,

On Monday, July 11, 2011 9:01 PM Janusz Krzysztofik wrote:

> On Monday, 11 July 2011 at 15:47:32, Marek Szyprowski wrote:
> > Hello,
> >
> > On Saturday, July 09, 2011 4:57 PM Janusz Krzysztofik	wrote:
> > > On Wed, 6 Jul 2011 at 16:59:45 Arnd Bergmann wrote:
> > > > On Wednesday 06 July 2011, Nicolas Pitre wrote:
> > > > > On Wed, 6 Jul 2011, Russell King - ARM Linux wrote:
> > > > > > Another issue is that when a platform has restricted DMA
> > > > > > regions, they typically don't fall into the highmem zone.
> > > > > > As the dmabounce code allocates from the DMA coherent
> > > > > > allocator to provide it with guaranteed DMA-able memory,
> > > > > > that would be rather inconvenient.
> > > > >
> > > > > Do we encounter this in practice i.e. do those platforms
> > > > > requiring large contiguous allocations motivating this work
> > > > > have such DMA restrictions?
> > > >
> > > > You can probably find one or two of those, but we don't have to
> > > > optimize for that case. I would at least expect the maximum size
> > > > of the allocation to be smaller than the DMA limit for these,
> > > > and consequently mandate that they define a sufficiently large
> > > > CONSISTENT_DMA_SIZE for the crazy devices, or possibly add a
> > > > hack to unmap some low memory and call
> > > > dma_declare_coherent_memory() for the device.
> > >
> > > Once found that Russell has dropped his "ARM: DMA: steal memory for
> > > DMA coherent mappings" for now, let me get back to this idea of a
> > > hack that would allow for safely calling
> > > dma_declare_coherent_memory() in order to assign a device with a
> > > block of contiguous memory for exclusive use.
> >
> > We tested such an approach and finally with 3.0-rc1 it works fine. You
> > can find an example of dma_declare_coherent() together with the
> > required memblock_remove() calls in the following patch series:
> > http://www.spinics.net/lists/linux-samsung-soc/msg05026.html
> > "[PATCH 0/3 v2] ARM: S5P: Add support for MFC device on S5PV210 and
> > EXYNOS4"
> >
> > > Assuming there should be no problem with successfully allocating a
> > > large continuous block of coherent memory at boot time with
> > > dma_alloc_coherent(), this block could be reserved for the device.
> > > The only problem is with the dma_declare_coherent_memory() calling
> > > ioremap(), which was designed with a device's dedicated physical
> > > memory in mind, but shouldn't be called on a memory already
> > > mapped.
> >
> > All these issues with ioremap have finally been resolved in 3.0-rc1.
> > As Russell pointed out to me in
> > http://www.spinics.net/lists/arm-kernel/msg127644.html, ioremap can
> > be fixed to work on early reserved memory areas by selecting
> > the ARCH_HAS_HOLES_MEMORYMODEL Kconfig option.
> 
> I'm not sure. Recently I tried to refresh my now 7 months old patch in
> which I used that 'memblock_remove() then dma_declare_coherent_memory()'
> method[1]. It was different from your S5P MFC example in that it didn't
> punch any holes in the system memory, only stole a block of SDRAM from
> its tail. But Russell reminded me again: "we should not be mapping SDRAM
> using device mappings."[2]. Would defining ARCH_HAS_HOLES_MEMORYMODEL
> (even if it was justified) make any difference in my case? I don't think
> so.

Defining ARCH_HAS_HOLES_MEMORYMODEL changes the behavior of the
valid_pfn() macro/function, which is used in ioremap(). When defined,
valid_pfn() checks whether the selected pfn is inside system memory or
not (using memblock information). If the area has been removed with
memblock_remove(), the valid_pfn() check fails and ioremap() doesn't
complain about mapping system memory.
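
In essence (simplified from the 3.0-era ARM sources, so treat this as a
paraphrase):

        /* arch/arm/mm/init.c: with memblock-based pfn_valid(), pages
         * removed via memblock_remove() stop counting as system memory */
        int pfn_valid(unsigned long pfn)
        {
                return memblock_is_memory(pfn << PAGE_SHIFT);
        }

        /* arch/arm/mm/ioremap.c then refuses only true system RAM: */
        if (WARN_ON(pfn_valid(pfn)))
                return NULL;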

> What I think, after Russell, is that we still need that obligatory
> ioremap() removed from dma_declare_coherent_memory(), or made it
> optional, or a separate dma_declare_coherent_memory()-like function
> without (obligatory) ioremap() provided by the DMA API, in order to get
> the dma_declare_coherent_memory() method being accepted without any
> reservations when used inside arch/arm, I'm afraid.

Please check again with 3.0-rc1. The ARCH_HAS_HOLES_MEMORYMODEL solution
was suggested by Russell. It looks like this is the correct solution for
this problem, because I don't believe that ioremap() will be removed from
dma_declare_coherent() anytime soon.

Best regards
Marek Szyprowski July 14, 2011, 12:29 p.m. UTC | #27
Hello,

I've just found two nasty bugs in this version of CMA. Sadly, both are
the result of posting the patches in a big hurry. I'm really sorry.

The alignment argument was not passed correctly to the
bitmap_find_next_zero_area() function, and there was an ugly bug in the
dma_release_from_contiguous() function.
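
(For reference, bitmap_find_next_zero_area() takes the alignment as a
mask, so the corrected call presumably needs to look something like the
following sketch, with 'align' being a page order:)

        pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0,
                                            count, (1 << align) - 1);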

On Tuesday, July 05, 2011 9:42 AM Marek Szyprowski wrote:

> The Contiguous Memory Allocator is a set of helper functions for the
> DMA mapping framework that improve allocations of contiguous memory
> chunks.
>
> CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> gives it back to the system. The kernel is allowed to allocate movable
> pages within CMA's managed memory, so it can be used, for example, for
> page cache when the DMA mapping framework does not use it. On a
> dma_alloc_from_contiguous() request, such pages are migrated out of the
> CMA area to free the required contiguous block and fulfill the request.
> This makes it possible to allocate large contiguous chunks of memory at
> any time, assuming that there is enough free memory available in the
> system.
>
> This code is heavily based on earlier work by Michal Nazarewicz.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> ---
>  drivers/base/Kconfig           |   77 +++++++++
>  drivers/base/Makefile          |    1 +
>  drivers/base/dma-contiguous.c  |  367 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-contiguous.h |  104 +++++++++++
>  4 files changed, 549 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/dma-contiguous.c
>  create mode 100644 include/linux/dma-contiguous.h
> 
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index d57e8d0..95ae1a7 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -168,4 +168,81 @@ config SYS_HYPERVISOR
>  	bool
>  	default n
> 
> +config CMA
> +	bool "Contiguous Memory Allocator"
> +	depends HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
> +	select MIGRATION
> +	select CMA_MIGRATE_TYPE
> +	help
> +	  This enables the Contiguous Memory Allocator which allows drivers
> +	  to allocate big physically-contiguous blocks of memory for use with
> +	  hardware components that do not support I/O map nor scatter-gather.
> +
> +	  For more information see <include/linux/dma-contiguous.h>.
> +	  If unsure, say "n".
> +
> +if CMA
> +
> +config CMA_DEBUG
> +	bool "CMA debug messages (DEVELOPEMENT)"
> +	help
> +	  Turns on debug messages in CMA.  This produces KERN_DEBUG
> +	  messages for every CMA call as well as various messages while
> +	  processing calls such as dma_alloc_from_contiguous().
> +	  This option does not affect warning and error messages.
> +
> +comment "Default contiguous memory area size:"
> +
> +config CMA_SIZE_ABSOLUTE
> +	int "Absolute size (in MiB)"
> +	default 16
> +	help
> +	  Defines the size (in MiB) of the default memory area for Contiguous
> +	  Memory Allocator.
> +
> +config CMA_SIZE_PERCENTAGE
> +	int "Percentage of total memory"
> +	default 10
> +	help
> +	  Defines the size of the default memory area for Contiguous Memory
> +	  Allocator as a percentage of the total memory in the system.
> +
> +choice
> +	prompt "Selected region size"
> +	default CMA_SIZE_SEL_ABSOLUTE
> +
> +config CMA_SIZE_SEL_ABSOLUTE
> +	bool "Use absolute value only"
> +
> +config CMA_SIZE_SEL_PERCENTAGE
> +	bool "Use percentage value only"
> +
> +config CMA_SIZE_SEL_MIN
> +	bool "Use lower value (minimum)"
> +
> +config CMA_SIZE_SEL_MAX
> +	bool "Use higher value (maximum)"
> +
> +endchoice
> +
> +config CMA_ALIGNMENT
> +	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
> +	range 4 9
> +	default 8
> +	help
> +	  DMA mapping framework by default aligns all buffers to the smallest
> +	  PAGE_SIZE order which is greater than or equal to the requested buffer
> +	  size. This works well for buffers up to a few hundreds kilobytes, but
> +	  for larger buffers it just a memory waste. With this parameter you can
> +	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
> +	  buffers will be aligned only to this specified order. The order is
> +	  expressed as a power of two multiplied by the PAGE_SIZE.
> +
> +	  For example, if your system defaults to 4KiB pages, the order value
> +	  of 8 means that the buffers will be aligned up to 1MiB only.
> +
> +	  If unsure, leave the default value "8".
> +
> +endif
> +
>  endmenu
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 4c5701c..be6aab4 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -5,6 +5,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
>  			   cpu.o firmware.o init.o map.o devres.o \
>  			   attribute_container.o transport_class.o
>  obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
> +obj-$(CONFIG_CMA) += dma-contiguous.o
>  obj-y			+= power/
>  obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
>  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> new file mode 100644
> index 0000000..707b901
> --- /dev/null
> +++ b/drivers/base/dma-contiguous.c
> @@ -0,0 +1,367 @@
> +/*
> + * Contiguous Memory Allocator for DMA mapping framework
> + * Copyright (c) 2010-2011 by Samsung Electronics.
> + * Written by:
> + *	Marek Szyprowski <m.szyprowski@samsung.com>
> + *	Michal Nazarewicz <mina86@mina86.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License or (at your optional) any later version of the license.
> + */
> +
> +#define pr_fmt(fmt) "cma: " fmt
> +
> +#ifdef CONFIG_CMA_DEBUG
> +#ifndef DEBUG
> +#  define DEBUG
> +#endif
> +#endif
> +
> +#include <asm/page.h>
> +#include <asm/sizes.h>
> +
> +#include <linux/memblock.h>
> +#include <linux/err.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/page-isolation.h>
> +#include <linux/slab.h>
> +#include <linux/swap.h>
> +#include <linux/mm_types.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/dma-contiguous.h>
> +
> +struct cma {
> +	unsigned long	base_pfn;
> +	unsigned long	count;
> +	unsigned long	*bitmap;
> +};
> +
> +struct cma *dma_contiguous_default_area;
> +
> +static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
> +static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
> +static long size_cmdline = -1;
> +
> +static int __init early_cma(char *p)
> +{
> +	pr_debug("%s(%s)\n", __func__, p);
> +	size_cmdline = memparse(p, &p);
> +	return 0;
> +}
> +early_param("cma", early_cma);
> +
> +/**
> + * dma_contiguous_reserve() - reserve area for contiguous memory handling
> + *
> + * This funtion reserves memory from memblock subsystem. It should be
> + * called by arch specific code once a memblock allocator has been activated
> + * and all other subsystems have already allocated/reserved memory.
> + */
> +void __init dma_contiguous_reserve(void)
> +{
> +	struct memblock_region *reg;
> +	unsigned long selected_size = 0;
> +	unsigned long total_pages = 0;
> +
> +	pr_debug("%s()\n", __func__);
> +
> +	/*
> +	 * We cannot use memblock_phys_mem_size() here, because
> +	 * memblock_analyze() has not been called yet.
> +	 */
> +	for_each_memblock(memory, reg)
> +		total_pages += memblock_region_memory_end_pfn(reg) -
> +			       memblock_region_memory_base_pfn(reg);
> +
> +	size_percent *= (total_pages << PAGE_SHIFT) / 100;
> +
> +	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
> +		 (total_pages << PAGE_SHIFT) / SZ_1M);
> +
> +#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
> +	selected_size = size_abs;
> +#endif
> +#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
> +	selected_size = size_percent;
> +#endif
> +#ifdef CONFIG_CMA_SIZE_SEL_MIN
> +	selected_size = min(size_abs, size_percent);
> +#endif
> +#ifdef CONFIG_CMA_SIZE_SEL_MAX
> +	selected_size = max(size_abs, size_percent);
> +#endif
> +
> +	if (size_cmdline != -1)
> +		selected_size = size_cmdline;
> +
> +	if (!selected_size)
> +		return;
> +
> +	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> +		 selected_size / SZ_1M);
> +
> +	dma_declare_contiguous(NULL, selected_size, 0);
> +};
> +
> +static DEFINE_MUTEX(cma_mutex);
> +
> +#ifdef CONFIG_DEBUG_VM
> +
> +static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
> +{
> +	unsigned long pfn = base_pfn;
> +	unsigned i = count;
> +	struct zone *zone;
> +
> +	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
> +
> +	VM_BUG_ON(!pfn_valid(pfn));
> +	zone = page_zone(pfn_to_page(pfn));
> +
> +	do {
> +		VM_BUG_ON(!pfn_valid(pfn));
> +		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
> +		if (!(pfn & (pageblock_nr_pages - 1)))
> +			init_cma_reserved_pageblock(pfn_to_page(pfn));
> +		++pfn;
> +	} while (--i);
> +
> +	return 0;
> +}
> +
> +#else
> +
> +static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
> +{
> +	unsigned i = count >> pageblock_order;
> +	struct page *p = pfn_to_page(base_pfn);
> +
> +	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
> +
> +	do {
> +		init_cma_reserved_pageblock(p);
> +		p += pageblock_nr_pages;
> +	} while (--i);
> +
> +	return 0;
> +}
> +
> +#endif
> +
> +static struct cma *__cma_create_area(unsigned long base_pfn,
> +				     unsigned long count)
> +{
> +	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
> +	struct cma *cma;
> +
> +	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
> +
> +	cma = kmalloc(sizeof *cma, GFP_KERNEL);
> +	if (!cma)
> +		return ERR_PTR(-ENOMEM);
> +
> +	cma->base_pfn = base_pfn;
> +	cma->count = count;
> +	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> +
> +	if (!cma->bitmap)
> +		goto no_mem;
> +
> +	__cma_activate_area(base_pfn, count);
> +
> +	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
> +	return cma;
> +
> +no_mem:
> +	kfree(cma);
> +	return ERR_PTR(-ENOMEM);
> +}
> +
> +static struct cma_reserved {
> +	unsigned long start;
> +	unsigned long size;
> +	struct device *dev;
> +} cma_reserved[8] __initdata;
> +static unsigned cma_reserved_count __initdata;
> +
> +static int __init __cma_init_reserved_areas(void)
> +{
> +	struct cma_reserved *r = cma_reserved;
> +	unsigned i = cma_reserved_count;
> +
> +	pr_debug("%s()\n", __func__);
> +
> +	for (; i; --i, ++r) {
> +		struct cma *cma;
> +		cma = __cma_create_area(page_to_pfn(phys_to_page(r->start)),
> +					r->size >> PAGE_SHIFT);
> +		if (!IS_ERR(cma)) {
> +			pr_debug("%s: created area %p\n", __func__, cma);
> +			if (r->dev)
> +				set_dev_cma_area(r->dev, cma);
> +			else
> +				dma_contiguous_default_area = cma;
> +		}
> +	}
> +	return 0;
> +}
> +core_initcall(__cma_init_reserved_areas);
> +
> +/**
> + * dma_declare_contiguous() - reserve area for contiguous memory handling
> + *			      for particular device
> + * @dev:   Pointer to device structure.
> + * @size:  Size of the reserved memory.
> + * @start: Start address of the reserved memory (optional, 0 for any).
> + *
> + * This function reserves memory for the specified device. It should be
> + * called by board-specific code once the memblock allocator has been activated
> + * and all other subsystems have already allocated/reserved memory.
> + */
> +int __init dma_declare_contiguous(struct device *dev, unsigned long size,
> +				  phys_addr_t start)
> +{
> +	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
> +	unsigned long alignment;
> +
> +	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
> +
> +	/* Sanity checks */
> +	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
> +		return -ENOSPC;
> +
> +	if (!size)
> +		return -EINVAL;
> +
> +	/* Sanitise input arguments */
> +	alignment = PAGE_SIZE << (MAX_ORDER + 1);
> +	start = ALIGN(start, alignment);
> +	size  = ALIGN(size , alignment);
> +
> +	/* Reserve memory */
> +	if (start) {
> +		if (memblock_is_region_reserved(start, size) ||
> +		    memblock_reserve(start, size) < 0)
> +			return -EBUSY;
> +	} else {
> +		/*
> +		 * Use __memblock_alloc_base() since
> +		 * memblock_alloc_base() panic()s.
> +		 */
> +		u64 addr = __memblock_alloc_base(size, alignment, 0);
> +		if (!addr) {
> +			return -ENOMEM;
> +		} else if (addr + size > ~(unsigned long)0) {
> +			memblock_free(addr, size);
> +			return -EOVERFLOW;
> +		} else {
> +			start = addr;
> +		}
> +	}
> +
> +	/*
> +	 * Each reserved area must be initialised later, when more kernel
> +	 * subsystems (like slab allocator) are available.
> +	 */
> +	r->start = start;
> +	r->size = size;
> +	r->dev = dev;
> +	cma_reserved_count++;
> +	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
> +	       size / SZ_1M, (void *)start);
> +	return 0;
> +}
> +
> +/**
> + * dma_alloc_from_contiguous() - allocate pages from contiguous area
> + * @dev:   Pointer to device for which the allocation is performed.
> + * @count: Requested number of pages.
> + * @align: Requested alignment of pages (in PAGE_SIZE order).
> + *
> + * This function allocates a memory buffer for the specified device. It uses
> + * the device-specific contiguous memory area if available, or the default
> + * global one. It requires the architecture-specific get_dev_cma_area()
> + * helper function.
> + */
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int align)
> +{
> +	struct cma *cma = get_dev_cma_area(dev);
> +	unsigned long pfn, pageno;
> +	int ret;
> +
> +	if (!cma)
> +		return NULL;
> +
> +	if (align > CONFIG_CMA_ALIGNMENT)
> +		align = CONFIG_CMA_ALIGNMENT;
> +
> +	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, align);
> +
> +	if (!count)
> +		return NULL;
> +
> +	mutex_lock(&cma_mutex);
> +

> +	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
> +					    align);

Fixed version (the last argument of bitmap_find_next_zero_area() is an
alignment mask, not an order, so the order has to be converted first):

	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
					    (1 << align) - 1);
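
For illustration, the mask semantics in a stand-alone C sketch (hypothetical
demo code, not the kernel implementation):

	#include <stdio.h>

	/*
	 * bitmap_find_next_zero_area() rounds each candidate index as
	 * index = (index + align_mask) & ~align_mask, so the mask must be
	 * 2^order - 1; passing the raw order does not align anything.
	 */
	int main(void)
	{
		unsigned long start = 5, align = 8;          /* order 8 */
		unsigned long mask = (1UL << align) - 1;     /* 0xff */

		printf("%lu\n", (start + mask) & ~mask);     /* 256: aligned to 2^8 */
		printf("%lu\n", (start + align) & ~align);   /* 5: not aligned */
		return 0;
	}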

> +	if (pageno >= cma->count) {
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +	bitmap_set(cma->bitmap, pageno, count);
> +
> +	pfn = cma->base_pfn + pageno;
> +	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
> +	if (ret)
> +		goto free;
> +
> +	mutex_unlock(&cma_mutex);
> +
> +	pr_debug("%s(): returning [%ld]\n", __func__, pfn);
> +	return pfn_to_page(pfn);
> +free:
> +	bitmap_clear(cma->bitmap, pageno, count);
> +error:
> +	mutex_unlock(&cma_mutex);
> +	return NULL;
> +}
> +
> +/**
> + * dma_release_from_contiguous() - release allocated pages
> + * @dev:   Pointer to device for which the pages were allocated.
> + * @pages: Allocated pages.
> + * @count: Number of allocated pages.
> + *
> + * This function releases pages allocated by dma_alloc_from_contiguous().
> + * It returns 0 when the provided pages do not belong to the contiguous
> + * memory area and 1 when they were successfully released.
> + */
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count)
> +{
> +	struct cma *cma = get_dev_cma_area(dev);
> +	unsigned long pfn;
> +
> +	if (!cma || !pages)
> +		return 0;
> +
> +	pr_debug("%s([%p])\n", __func__, (void *)pages);
> +
> +	pfn = page_to_pfn(pages);
> +
> +	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + count)

Fixed version (the upper bound must be the size of the whole CMA area,
cma->count, not the number of pages being released):

	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)

> +		return 0;
> +
> +	mutex_lock(&cma_mutex);
> +
> +	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
> +	free_contig_pages(pages, count);
> +
> +	mutex_unlock(&cma_mutex);
> +	return 1;
> +}
> diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
> new file mode 100644
> index 0000000..98312c9
> --- /dev/null
> +++ b/include/linux/dma-contiguous.h
> @@ -0,0 +1,104 @@
> +#ifndef __LINUX_CMA_H
> +#define __LINUX_CMA_H
> +
> +/*
> + * Contiguous Memory Allocator for DMA mapping framework
> + * Copyright (c) 2010-2011 by Samsung Electronics.
> + * Written by:
> + *	Marek Szyprowski <m.szyprowski@samsung.com>
> + *	Michal Nazarewicz <mina86@mina86.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License, or (at your option) any later version.
> + */
> +
> +/*
> + * Contiguous Memory Allocator
> + *
> + *   The Contiguous Memory Allocator (CMA) makes it possible to
> + *   allocate big contiguous chunks of memory after the system has
> + *   booted.
> + *
> + * Why is it needed?
> + *
> + *   Various devices on embedded systems have no scatter-gather and/or
> + *   IO map support and require contiguous blocks of memory to
> + *   operate.  They include devices such as cameras, hardware video
> + *   coders, etc.
> + *
> + *   Such devices often require big memory buffers (a full HD frame
> + *   is, for instance, more than 2 megapixels, i.e. more than 6
> + *   MB of memory), which makes mechanisms such as kmalloc() or
> + *   alloc_page() ineffective.
> + *
> + *   At the same time, a solution where a big memory region is
> + *   reserved for a device is suboptimal since often more memory is
> + *   reserved than strictly required and, moreover, the memory is
> + *   inaccessible to the page allocator even if device drivers don't use it.
> + *
> + *   CMA tries to solve this issue by operating on memory regions
> + *   from which only movable pages can be allocated.  This way, the kernel
> + *   can use the memory for pagecache and, when a device driver requests
> + *   it, the allocated pages can be migrated.
> + *
> + * Driver usage
> + *
> + *   CMA should not be used by device drivers directly. It is
> + *   only a helper framework for the dma-mapping subsystem.
> + *
> + *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
> + */
> +
> +#ifdef __KERNEL__
> +
> +struct cma;
> +struct page;
> +struct device;
> +
> +#ifdef CONFIG_CMA
> +
> +extern struct cma *dma_contiguous_default_area;
> +
> +void dma_contiguous_reserve(void);
> +int dma_declare_contiguous(struct device *dev, unsigned long size,
> +			   phys_addr_t base);
> +
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int order);
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count);
> +
> +#else
> +
> +#define dma_contiguous_default_area NULL
> +
> +static inline void dma_contiguous_reserve(void) { }
> +
> +static inline
> +int dma_declare_contiguous(struct device *dev, unsigned long size,
> +			   phys_addr_t base)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline
> +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> +				       unsigned int order)
> +{
> +	return NULL;
> +}
> +
> +static inline
> +int dma_release_from_contiguous(struct device *dev, struct page *pages,
> +				int count)
> +{
> +	return 0;
> +}
> +
> +#endif
> +
> +#endif
> +
> +#endif
> --
> 1.7.1.569.g6f426

Best regards
James Bottomley Aug. 3, 2011, 5:43 p.m. UTC | #28
[cc to ks-discuss added, since this may be a relevant topic]

On Tue, 2011-07-05 at 14:27 +0200, Arnd Bergmann wrote:
> On Tuesday 05 July 2011, Russell King - ARM Linux wrote:
> > On Tue, Jul 05, 2011 at 09:41:48AM +0200, Marek Szyprowski wrote:
> > > The Contiguous Memory Allocator is a set of helper functions for DMA
> > > mapping framework that improves allocations of contiguous memory chunks.
> > > 
> > > CMA grabs memory on system boot, marks it with CMA_MIGRATE_TYPE and
> > > gives back to the system. Kernel is allowed to allocate movable pages
> > > within CMA's managed memory so that it can be used for example for page
> > > cache when DMA mapping do not use it. On dma_alloc_from_contiguous()
> > > request such pages are migrated out of CMA area to free required
> > > contiguous block and fulfill the request. This allows to allocate large
> > > contiguous chunks of memory at any time assuming that there is enough
> > > free memory available in the system.
> > > 
> > > This code is heavily based on earlier works by Michal Nazarewicz.
> > 
> > And how are you addressing the technical concerns about aliasing of
> > cache attributes which I keep bringing up with this and you keep
> > ignoring and telling me that I'm standing in your way.

Just to chime in here, parisc has an identical issue.  If the CPU ever
sees an alias with different attributes for the same page, it will HPMC
the box (that is, the firmware will kill the system as being
architecturally inconsistent), so an architecture-neutral solution on
this point is essential to us as well.

> This is of course an important issue, and it's the one item listed as
> TODO in the introductory mail that sent.
> 
> It's also a preexisting problem as far as I can tell, and it needs
> to be solved in __dma_alloc for both cases, dma_alloc_from_contiguous
> and __alloc_system_pages as introduced in patch 7.
> 
> We've discussed this back and forth, and it always comes down to
> one of two ugly solutions:
> 
> 1. Put all of the MIGRATE_CMA pages into highmem and change
> __alloc_system_pages so it also allocates only from highmem pages.
> The consequences of this are that we always need to build kernels
> with highmem enabled and that we have less lowmem on systems that
> are already small, both of which can be fairly expensive unless
> you have lots of highmem already.

So this would require that systems using the API have highmem? (parisc
doesn't today).
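
A minimal sketch of what option 1 boils down to on the allocation side
(hypothetical helper; the real change would go into __alloc_system_pages()
from patch 7, whose actual signature is not shown in this thread):

	/* Allocate the non-CMA fallback buffer from highmem so that the
	 * lowmem linear mapping never covers it and no cacheable alias of
	 * the DMA buffer exists.  Hypothetical, for illustration only. */
	static struct page *alloc_dma_fallback(unsigned int order)
	{
		return alloc_pages(GFP_KERNEL | __GFP_HIGHMEM, order);
	}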

> 2. Add logic to unmap pages from the linear mapping, which is
> very expensive because it forces the use of small pages in the
> linear mapping (or in parts of it), and possibly means walking
> all page tables to remove the PTEs on alloc and put them back
> in on free.
> 
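To make option 2 concrete, a rough sketch of the allocation path under that
scheme (kernel_map_pages_range() is a hypothetical arch hook standing in for
whatever would actually rewrite the linear-map PTEs; it is not an existing
kernel API):

	static struct page *cma_alloc_unmapped(struct device *dev, int count,
					       unsigned int align)
	{
		struct page *pages = dma_alloc_from_contiguous(dev, count, align);

		if (!pages)
			return NULL;
		/* Drop the cacheable linear-map alias before handing the
		 * pages to the device; this is what forces small pages in
		 * (parts of) the linear mapping. */
		kernel_map_pages_range(page_to_pfn(pages), count, false);
		return pages;
	}
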
> I believe that Chunsang Jeong from Linaro is planning to
> implement both variants and post them for review, so we can
> decide which one to merge, or even to merge both and make
> it a configuration option. See also
> https://blueprints.launchpad.net/linaro-mm-sig/+spec/engr-mm-dma-mapping-2011.07
> 
> I don't think we need to make merging the CMA patches depend on
> the other patches; it's clear that both need to be solved, and
> they are independent enough.

I assume from the above that ARM has a hardware page walker?

The way I'd fix this on parisc, because we have a software-based TLB, is
to rely on the fact that a page may only be used either for DMA or for
page cache, so the aliases should never be interleaved.  Since you know
the point at which the page flips from DMA to cache use (and vice versa),
I'd purge the TLB entry and flush the page at that point, and rely on the
usage guarantees to ensure that the alias TLB entry doesn't reappear.
This isn't inexpensive, but the majority of the cost is the cache flush,
which is a requirement for cleaning the aliases anyway (a TLB entry purge is
pretty cheap).
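
In pseudo-C, that ownership-flip discipline might look like this
(arch_flush_dcache_page() and arch_purge_tlb_page() are stand-ins for the
arch-specific primitives, not exact kernel APIs):

	/* Called exactly when a page flips from page-cache use to DMA use
	 * (and, mirrored, on the way back). */
	static void flip_page_to_dma(struct page *page)
	{
		void *va = page_address(page);	/* cacheable linear-map alias */

		arch_flush_dcache_page(va);	/* clean+invalidate the alias */
		arch_purge_tlb_page(va);	/* drop its TLB entry */
		/* The usage guarantees ensure nothing touches the cacheable
		 * alias until the page flips back, so the purged TLB entry
		 * cannot silently reappear. */
	}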

Would this work for the ARM hardware walker as well?  It would require
you to have a TLB entry purge instruction as well as some architectural
guarantees about not speculating the TLB.

James
diff mbox

Patch

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d57e8d0..95ae1a7 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -168,4 +168,81 @@  config SYS_HYPERVISOR
 	bool
 	default n
 
+config CMA
+	bool "Contiguous Memory Allocator"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
+	select MIGRATION
+	select CMA_MIGRATE_TYPE
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O map nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPEMENT)"
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_ABSOLUTE
+	int "Absolute size (in MiB)"
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_ABSOLUTE
+	bool "Use absolute value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  The DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundred kilobytes, but
+	  for larger buffers it is just a waste of memory. With this parameter
+	  you can specify the maximum PAGE_SIZE order for contiguous buffers.
+	  Larger buffers will be aligned only to this specified order. The order
+	  is expressed as a power of two multiplied by PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 4c5701c..be6aab4 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -5,6 +5,7 @@  obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   cpu.o firmware.o init.o map.o devres.o \
 			   attribute_container.o transport_class.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..707b901
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,367 @@ 
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/sizes.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M;
+static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ *
+ * This function reserves memory from the memblock subsystem. It should be
+ * called by arch-specific code once the memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+void __init dma_contiguous_reserve(void)
+{
+	struct memblock_region *reg;
+	unsigned long selected_size = 0;
+	unsigned long total_pages = 0;
+
+	pr_debug("%s()\n", __func__);
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: available phys mem: %ld MiB\n", __func__,
+		 (total_pages << PAGE_SHIFT) / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE
+	selected_size = size_abs;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_PERCENTAGE
+	selected_size = size_percent;
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MIN
+	selected_size = min(size_abs, size_percent);
+#endif
+#ifdef CONFIG_CMA_SIZE_SEL_MAX
+	selected_size = max(size_abs, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count;
+	struct zone *zone;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		VM_BUG_ON(!pfn_valid(pfn));
+		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		if (!(pfn & (pageblock_nr_pages - 1)))
+			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		++pfn;
+	} while (--i);
+
+	return 0;
+}
+
+#else
+
+static int __cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned i = count >> pageblock_order;
+	struct page *p = pfn_to_page(base_pfn);
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	do {
+		init_cma_reserved_pageblock(p);
+		p += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#endif
+
+static struct cma *__cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+
+	pr_debug("%s(0x%08lx+0x%lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	__cma_activate_area(base_pfn, count);
+
+	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+	return cma;
+
+no_mem:
+	kfree(cma);
+	return ERR_PTR(-ENOMEM);
+}
+
+static struct cma_reserved {
+	unsigned long start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[8] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init __cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = __cma_create_area(page_to_pfn(phys_to_page(r->start)),
+					r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma)) {
+			pr_debug("%s: created area %p\n", __func__, cma);
+			if (r->dev)
+				set_dev_cma_area(r->dev, cma);
+			else
+				dma_contiguous_default_area = cma;
+		}
+	}
+	return 0;
+}
+core_initcall(__cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ *
+ * This function reserves memory for the specified device. It should be
+ * called by board-specific code once the memblock allocator has been activated
+ * and all other subsystems have already allocated/reserved memory.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t start)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved))
+		return -ENOSPC;
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << (MAX_ORDER + 1);
+	start = ALIGN(start, alignment);
+	size  = ALIGN(size , alignment);
+
+	/* Reserve memory */
+	if (start) {
+		if (memblock_is_region_reserved(start, size) ||
+		    memblock_reserve(start, size) < 0)
+			return -EBUSY;
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		u64 addr = __memblock_alloc_base(size, alignment, 0);
+		if (!addr) {
+			return -ENOMEM;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			return -EOVERFLOW;
+		} else {
+			start = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = start;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	printk(KERN_INFO "%s: reserved %ld MiB area at 0x%p\n", __func__,
+	       size / SZ_1M, (void *)start);
+	return 0;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This function allocates a memory buffer for the specified device. It uses
+ * the device-specific contiguous memory area if available, or the default
+ * global one. It requires the architecture-specific get_dev_cma_area()
+ * helper function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn, pageno;
+	int ret;
+
+	if (!cma)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(<%p>, %d/%d)\n", __func__, (void *)cma, count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count,
+					    (1 << align) - 1);
+	if (pageno >= cma->count) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	bitmap_set(cma->bitmap, pageno, count);
+
+	pfn = cma->base_pfn + pageno;
+	ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA);
+	if (ret)
+		goto free;
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returning [%ld]\n", __func__, pfn);
+	return pfn_to_page(pfn);
+free:
+	bitmap_clear(cma->bitmap, pageno, count);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This function releases pages allocated by dma_alloc_from_contiguous().
+ * It returns 0 when the provided pages do not belong to the contiguous
+ * memory area and 1 when they were successfully released.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = get_dev_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s([%p])\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_pages(pages, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..98312c9
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,104 @@ 
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-gather and/or
+ *   IO map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more than 2 megapixels, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved than strictly required and, moreover, the memory is
+ *   inaccessible to the page allocator even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   from which only movable pages can be allocated.  This way, the kernel
+ *   can use the memory for pagecache and, when a device driver requests
+ *   it, the allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by device drivers directly. It is
+ *   only a helper framework for the dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+#ifdef CONFIG_CMA
+
+extern struct cma *dma_contiguous_default_area;
+
+void dma_contiguous_reserve(void);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#else
+
+#define dma_contiguous_default_area NULL
+
+static inline void dma_contiguous_reserve(void) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base)
+{
+	return -EINVAL;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#endif
+
+#endif
+
+#endif
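
For completeness, a condensed sketch of how an architecture's dma-mapping
glue might call these helpers (hypothetical code modeled on the kernel-doc
above; it is not part of this patch and ignores the cache-attribute
questions raised in the thread):

	#include <linux/dma-contiguous.h>
	#include <linux/dma-mapping.h>

	static void *arch_dma_alloc(struct device *dev, size_t size,
				    dma_addr_t *handle, gfp_t gfp)
	{
		int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
		struct page *page;

		page = dma_alloc_from_contiguous(dev, count, get_order(size));
		if (!page)
			return NULL;

		*handle = page_to_phys(page);	/* assumes no IOMMU */
		return page_address(page);	/* lowmem pages assumed */
	}

	static void arch_dma_free(struct device *dev, size_t size, void *vaddr)
	{
		int count = PAGE_ALIGN(size) >> PAGE_SHIFT;

		dma_release_from_contiguous(dev, virt_to_page(vaddr), count);
	}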