[kvm-unit-tests,v2,0/7] Rewrite the allocators

Message-Id: <20201002154420.292134-1-imbrenda@linux.ibm.com>
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org, pbonzini@redhat.com
Cc: frankja@linux.ibm.com, david@redhat.com, thuth@redhat.com, cohuck@redhat.com, lvivier@redhat.com
Subject: [kvm-unit-tests PATCH v2 0/7] Rewrite the allocators
Date: Fri, 2 Oct 2020 17:44:13 +0200

Claudio Imbrenda Oct. 2, 2020, 3:44 p.m. UTC

The KVM unit tests are increasingly being used to test more than just
KVM. They are being used to test TCG, qemu I/O device emulation, other
hypervisors, and even actual hardware.

The existing memory allocators are becoming more and more inadequate to
the needs of the upcoming unit tests (but also some existing ones, see
below).

Some important features that are lacking:
* ability to perform a small physical page allocation with a big
  alignment without wasting huge amounts of memory
* ability to allocate physical pages from specific pools/areas (e.g.
  below 16M, or 4G, etc)
* ability to reserve arbitrary pages (if free), removing them from the
  free pool

Some other features that are nice, but not so fundamental:
* no need for the generic allocator to keep track of metadata
  (i.e. allocation size), this is now handled by the lower level
  allocators
* coalescing small blocks into bigger ones, to allow contiguous memory
  freed in small blocks in a random order to be used for large
  allocations again

This is achieved in the following ways:

For the virtual allocator:
* only the virtual allocator needs one extra page of metadata, but only
  for allocations that wouldn't fit in one page

For the page allocator:
* page allocator has up to 6 memory pools, each pool has a metadata
  area; the metadata has a byte for each page in the area, describing
  the order of the block it belongs to, and whether it is free
* if there are no free blocks of the desired size, a bigger block is
  split until we reach the required size; the unused parts of the block
  are put back in the free lists
* if an allocation needs a block with a larger alignment than its size, a
  larger block of (at least) the required order is split; the unused parts
  are put back in the appropriate free lists
* if the allocation could not be satisfied, the next allowed area is
  searched; the allocation fails only when all allowed areas have been
  tried
* new functions to perform allocations from specific areas; the areas
  are arch-dependent and should be set up by the arch code
* for now x86 has a memory area for "lowest" memory under 16MB, one for
  "low" memory under 4GB and one for the rest, while s390x has one for under
  2GB and one for the rest; suggestions for more fine grained areas or for
  the other architectures are welcome
* upon freeing a block, an attempt is made to coalesce it into the
  appropriate neighbour (if it is free), and so on for the resulting
  larger block thus obtained
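
To make the per-page metadata, splitting and coalescing described above more concrete, here is a minimal sketch. It is not the code from the patches: the bit layout (ORDER_MASK, ALLOC_MASK), the toy area size and all function names are assumptions made for illustration, and the free lists are left out entirely so that only the metadata bytes are tracked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: low bits hold the block order, top bit means "allocated". */
#define ORDER_MASK 0x3f
#define ALLOC_MASK 0x80
#define NPAGES     16	/* toy area of 2^4 pages */

static uint8_t page_states[NPAGES];	/* one metadata byte per page */

/* Record order and state in the metadata byte of every page of a block. */
static void set_block(size_t pfn, unsigned order, uint8_t flags)
{
	for (size_t i = 0; i < ((size_t)1 << order); i++)
		page_states[pfn + i] = (uint8_t)(order | flags);
}

/* The buddy of a naturally aligned block differs only in bit 'order' of its index. */
static size_t buddy_of(size_t pfn, unsigned order)
{
	return pfn ^ ((size_t)1 << order);
}

/* Split a free block down to order 'want'; the upper halves stay free. */
static size_t split_to_order(size_t pfn, unsigned want)
{
	unsigned order = page_states[pfn] & ORDER_MASK;

	assert(!(page_states[pfn] & ALLOC_MASK) && order >= want);
	while (order > want) {
		order--;
		set_block(pfn, order, 0);
		set_block(pfn + ((size_t)1 << order), order, 0);
	}
	set_block(pfn, want, ALLOC_MASK);
	return pfn;
}

/* Free a block, then merge with its buddy for as long as the buddy is a
 * free block of the same order. Returns the order of the resulting block. */
static unsigned free_and_coalesce(size_t pfn)
{
	unsigned order = page_states[pfn] & ORDER_MASK;

	assert(page_states[pfn] & ALLOC_MASK);
	set_block(pfn, order, 0);
	while (((size_t)1 << (order + 1)) <= NPAGES) {
		size_t buddy = buddy_of(pfn, order);

		if (page_states[buddy] != order)	/* allocated, or another order */
			break;
		if (buddy < pfn)
			pfn = buddy;
		order++;
		set_block(pfn, order, 0);
	}
	return order;
}
```

Starting from a single free order-4 block, split_to_order(0, 0) hands out page 0 and leaves free blocks of orders 0, 1, 2 and 3 behind; freeing page 0 again merges everything back into one order-4 block.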

For the physical allocator:
* the minimum alignment is now handled manually, since it has been
  removed from the common struct


This patchset addresses some current but otherwise unsolvable issues on
s390x, such as the need to allocate a block under 2GB for each SMP CPU
upon CPU activation.
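
A rough sketch of how an area-restricted allocation with fallback can work is shown below. The area numbering, the AREA_MASK() macro and alloc_from_areas() are made up for this example, and each area uses a trivial bump pointer in place of the real per-area block allocator. The caller passes a mask of allowed areas, and the allocation fails only after every allowed area has been tried.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical area numbering and mask helper, for illustration only. */
#define MAX_AREAS    6
#define AREA_MASK(n) (1u << (n))

struct area {
	uint64_t next;	/* next free address in the area */
	uint64_t top;	/* end of the area (exclusive) */
};

static struct area areas[MAX_AREAS];
static unsigned areas_present;	/* bitmask of initialised areas */

/* Try each allowed area in turn; return 0 only when all of them failed. */
static uint64_t alloc_from_areas(unsigned allowed, uint64_t size)
{
	for (int n = 0; n < MAX_AREAS; n++) {
		struct area *a = &areas[n];

		if (!(allowed & areas_present & AREA_MASK(n)))
			continue;
		if (a->top - a->next >= size) {
			uint64_t addr = a->next;

			a->next += size;
			return addr;
		}
	}
	return 0;
}
```

With one area below 2GB and one above, a caller that must stay below 2GB passes only the mask of the low area and gets a failure rather than a high address once that area is exhausted.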

This patchset has been tested on s390x, amd64 and i386. It has also been
compiled on aarch64.

V1->V2:
* Renamed some functions, as per review comments
* Improved commit messages
* Split the list handling functions into an independent header
* Added arch-specific headers to define the memory areas
* Fixed some minor issues
* The magic value for small allocations in the virtual allocator is now
  put right before the returned pointer, like for large allocations
* Added comments to make the code more readable
* Many minor fixes
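
The idea of placing the magic value right before the returned pointer can be sketched as follows. This is only an illustration on top of malloc(): the magic constants, the header size and the function names are assumptions and do not match lib/vmalloc.c. The free path can tell the two kinds of allocation apart by looking at the same fixed offset before the pointer in both cases.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical markers and header size; the real values are not shown here. */
#define MAGIC_SMALL 0x1234ull
#define MAGIC_LARGE 0x4321ull
#define HDR_SIZE    16	/* keeps the returned pointer 16-byte aligned */

/* Allocate 'size' bytes with the magic stored right before the returned pointer. */
static void *tagged_alloc(size_t size, uint64_t magic)
{
	uint8_t *raw = malloc(HDR_SIZE + size);

	if (!raw)
		return NULL;
	memcpy(raw + HDR_SIZE - sizeof(magic), &magic, sizeof(magic));
	return raw + HDR_SIZE;
}

/* Read back the magic from just before the pointer. */
static uint64_t tag_of(const void *p)
{
	uint64_t magic;

	memcpy(&magic, (const uint8_t *)p - sizeof(magic), sizeof(magic));
	return magic;
}

/* One free path for both kinds: the magic is always at the same offset. */
static void tagged_free(void *p)
{
	uint64_t magic = tag_of(p);

	assert(magic == MAGIC_SMALL || magic == MAGIC_LARGE);
	free((uint8_t *)p - HDR_SIZE);
}
```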

Claudio Imbrenda (7):
  lib/list: Add double linked list management functions
  lib/vmalloc: vmalloc support for handling allocation metadata
  lib/asm: Add definitions of memory areas
  lib/alloc_page: complete rewrite of the page allocator
  lib/alloc: simplify free and malloc
  lib/alloc.h: remove align_min from struct alloc_ops
  lib/alloc_page: allow reserving arbitrary memory ranges

 lib/asm-generic/memory_areas.h |  11 +
 lib/arm/asm/memory_areas.h     |  11 +
 lib/arm64/asm/memory_areas.h   |  11 +
 lib/powerpc/asm/memory_areas.h |  11 +
 lib/ppc64/asm/memory_areas.h   |  11 +
 lib/s390x/asm/memory_areas.h   |  17 ++
 lib/x86/asm/memory_areas.h     |  22 ++
 lib/alloc.h                    |   3 +-
 lib/alloc_page.h               |  80 ++++-
 lib/list.h                     |  53 ++++
 lib/alloc.c                    |  42 +--
 lib/alloc_page.c               | 541 +++++++++++++++++++++++++++------
 lib/alloc_phys.c               |   9 +-
 lib/arm/setup.c                |   2 +-
 lib/s390x/sclp.c               |   6 +-
 lib/s390x/smp.c                |   6 +-
 lib/vmalloc.c                  | 121 ++++++--
 s390x/smp.c                    |   4 +-
 18 files changed, 789 insertions(+), 172 deletions(-)
 create mode 100644 lib/asm-generic/memory_areas.h
 create mode 100644 lib/arm/asm/memory_areas.h
 create mode 100644 lib/arm64/asm/memory_areas.h
 create mode 100644 lib/powerpc/asm/memory_areas.h
 create mode 100644 lib/ppc64/asm/memory_areas.h
 create mode 100644 lib/s390x/asm/memory_areas.h
 create mode 100644 lib/x86/asm/memory_areas.h
 create mode 100644 lib/list.h

Pierre Morel Oct. 5, 2020, 11:54 a.m. UTC | #1

On 2020-10-02 17:44, Claudio Imbrenda wrote:
> The KVM unit tests are increasingly being used to test more than just
> KVM. They are being used to test TCG, qemu I/O device emulation, other
> hypervisors, and even actual hardware.
> 
> The existing memory allocators are becoming more and more inadequate to
> the needs of the upcoming unit tests (but also some existing ones, see
> below).
> 
> Some important features that are lacking:
> * ability to perform a small physical page allocation with a big
>    alignment without wasting huge amounts of memory
> * ability to allocate physical pages from specific pools/areas (e.g.
>    below 16M, or 4G, etc)
> * ability to reserve arbitrary pages (if free), removing them from the
>    free pool
> 
> Some other features that are nice, but not so fundamental:
> * no need for the generic allocator to keep track of metadata
>    (i.e. allocation size), this is now handled by the lower level
>    allocators
> * coalescing small blocks into bigger ones, to allow contiguous memory
>    freed in small blocks in a random order to be used for large
>    allocations again
> 
> This is achieved in the following ways:
> 
> For the virtual allocator:
> * only the virtual allocator needs one extra page of metadata, but only
>    for allocations that wouldn't fit in one page
> 
> For the page allocator:
> * page allocator has up to 6 memory pools, each pool has a metadata
>    area; the metadata has a byte for each page in the area, describing
>    the order of the block it belongs to, and whether it is free
> * if there are no free blocks of the desired size, a bigger block is
>    split until we reach the required size; the unused parts of the block
>    are put back in the free lists
> * if an allocation needs a block with a larger alignment than its size, a
>    larger block of (at least) the required order is split; the unused parts
>    are put back in the appropriate free lists
> * if the allocation could not be satisfied, the next allowed area is
>    searched; the allocation fails only when all allowed areas have been
>    tried
> * new functions to perform allocations from specific areas; the areas
>    are arch-dependent and should be set up by the arch code
> * for now x86 has a memory area for "lowest" memory under 16MB, one for
>    "low" memory under 4GB and one for the rest, while s390x has one for under
>    2GB and one for the rest; suggestions for more fine grained areas or for
>    the other architectures are welcome


While doing a page allocator, the topology is not the only 
characteristic we may need to specify.
Specific page characteristics like rights, access flags, cache behavior 
may be useful when testing I/O for some architectures.
This obviously will need some connection to the MMU handling.

Wouldn't it be interesting to use a bitmap flag as argument to 
page_alloc() to define separate regions, even if the connection with the 
MMU is done in a future series?

Regards,
Pierre

Claudio Imbrenda Oct. 5, 2020, 12:35 p.m. UTC | #2

On Mon, 5 Oct 2020 13:54:42 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

[...]

> While doing a page allocator, the topology is not the only 
> characteristic we may need to specify.
> Specific page characteristics like rights, access flags, cache
> behavior may be useful when testing I/O for some architectures.
> This obviously will need some connection to the MMU handling.
> 
> Wouldn't it be interesting to use a bitmap flag as argument to 
> page_alloc() to define separate regions, even if the connection with
> the MMU is done in a future series?

the physical allocator is only concerned with the physical pages. if
you need special MMU flags to be set, then you should enable the MMU
and fiddle with the flags and settings yourself.

Andrew Jones Oct. 5, 2020, 12:49 p.m. UTC | #3

On Mon, Oct 05, 2020 at 02:35:03PM +0200, Claudio Imbrenda wrote:
> On Mon, 5 Oct 2020 13:54:42 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
> [...]
> 
> > While doing a page allocator, the topology is not the only 
> > characteristic we may need to specify.
> > Specific page characteristics like rights, access flags, cache
> > behavior may be useful when testing I/O for some architectures.
> > This obviously will need some connection to the MMU handling.
> > 
> > Wouldn't it be interesting to use a bitmap flag as argument to 
> > page_alloc() to define separate regions, even if the connection with
> > the MMU is done in a future series?
> 
> the physical allocator is only concerned with the physical pages. if
> you need special MMU flags to be set, then you should enable the MMU
> and fiddle with the flags and settings yourself.
>

Given enough need, we could create a collection of functions like

 alloc_pages_ro()
 alloc_pages_uncached()
 ...

These functions wouldn't have generic implementations, only arch-specific
implementations, and those implementations would simply do a typical
allocation, followed by an iteration of each PTE where the arch-specific
flags get set.
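
A toy version of that pattern might look like the sketch below. Everything in it is hypothetical: the flat identity-mapped table, the flag bits and the toy_* helpers stand in for the real allocator and the arch-specific page table code. The plain allocator knows nothing about the MMU, and the wrapper does a normal allocation followed by a pass over the PTEs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES    8
#define PTE_VALID (1ull << 0)	/* hypothetical flag bits */
#define PTE_RO    (1ull << 1)

static uint64_t ptes[NPAGES];	/* toy identity-mapped table: index == pfn */
static size_t next_free;

/* Mark every page as mapped, with no special attributes. */
static void toy_mmu_init(void)
{
	for (size_t i = 0; i < NPAGES; i++)
		ptes[i] = PTE_VALID;
	next_free = 0;
}

/* Stand-in for the plain page allocator; returns a pfn, or -1 on failure. */
static long toy_alloc_pages(size_t n)
{
	long pfn;

	if (n > NPAGES - next_free)
		return -1;
	pfn = (long)next_free;
	next_free += n;
	return pfn;
}

/* Stand-in for the arch-specific part: set flags on each PTE of the range. */
static void toy_set_pte_flags(size_t pfn, size_t n, uint64_t flags)
{
	assert(pfn + n <= NPAGES);
	for (size_t i = 0; i < n; i++)
		ptes[pfn + i] |= flags;
}

/* The wrapper: a typical allocation, then the per-PTE flag update. */
static long toy_alloc_pages_ro(size_t n)
{
	long pfn = toy_alloc_pages(n);

	if (pfn >= 0)
		toy_set_pte_flags((size_t)pfn, n, PTE_RO);
	return pfn;
}
```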

Thanks,
drew

Pierre Morel Oct. 5, 2020, 12:57 p.m. UTC | #4

On 2020-10-05 14:35, Claudio Imbrenda wrote:
> On Mon, 5 Oct 2020 13:54:42 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
> [...]
> 
>> While doing a page allocator, the topology is not the only
>> characteristic we may need to specify.
>> Specific page characteristics like rights, access flags, cache
>> behavior may be useful when testing I/O for some architectures.
>> This obviously will need some connection to the MMU handling.
>>
>> Wouldn't it be interesting to use a bitmap flag as argument to
>> page_alloc() to define separate regions, even if the connection with
>> the MMU is done in a future series?
> 
> the physical allocator is only concerned with the physical pages. if
> you need special MMU flags to be set, then you should enable the MMU
> and fiddle with the flags and settings yourself.
> 

AFAIU the page_allocator() works on virtual addresses if the MMU has 
been initialized.

Considering that more and more tests will enable the MMU by default, 
eventually with a simple logical mapping, it seems to me that having the 
possibility to give the page allocator more information about the page 
access configuration could be interesting.

I find that using two different interfaces, both related to memory 
handling, to have a proper memory configuration for an I/O page may be 
complicated without some way to link page allocator and MMU tables together.

Claudio Imbrenda Oct. 5, 2020, 2:59 p.m. UTC | #5

On Mon, 5 Oct 2020 14:57:15 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 2020-10-05 14:35, Claudio Imbrenda wrote:
> > On Mon, 5 Oct 2020 13:54:42 +0200
> > Pierre Morel <pmorel@linux.ibm.com> wrote:
> > 
> > [...]
> >   
> >> While doing a page allocator, the topology is not the only
> >> characteristic we may need to specify.
> >> Specific page characteristics like rights, access flags, cache
> >> behavior may be useful when testing I/O for some architectures.
> >> This obviously will need some connection to the MMU handling.
> >>
> >> Wouldn't it be interesting to use a bitmap flag as argument to
> >> page_alloc() to define separate regions, even if the connection
> >> with the MMU is done in a future series?  
> > 
> > the physical allocator is only concerned with the physical pages. if
> > you need special MMU flags to be set, then you should enable the MMU
> > and fiddle with the flags and settings yourself.
> >   
> 
> AFAIU the page_allocator() works on virtual addresses if the MMU has 
> been initialized.

no, it still works on physical addresses, which happen to be identity
mapped by the MMU. don't forget that the page tables are
themselves allocated with the page allocator. 

> Considering that more and more tests will enable the MMU by default, 
> eventually with a simple logical mapping, it seems to me that having
> the possibility to give the page allocator more information about the
> page access configuration could be interesting.

I disagree.

I think we should not violate the layering here.

> I find that using two different interfaces, both related to memory 
> handling, to have a proper memory configuration for an I/O page may
> be complicated without some way to link page allocator and MMU tables
> together.

If you want to allocate an identity mapped page and also change its
properties at the same time, you can always write a wrapper.

keep the page allocator working only on physical addresses, add a
function to change the properties of the mapping, and add a wrapper
for the two.

Paolo Bonzini Nov. 6, 2020, 11:36 a.m. UTC | #6

On 02/10/20 17:44, Claudio Imbrenda wrote:
> The KVM unit tests are increasingly being used to test more than just
> KVM. They are being used to test TCG, qemu I/O device emulation, other
> hypervisors, and even actual hardware.
> 
> The existing memory allocators are becoming more and more inadequate to
> the needs of the upcoming unit tests (but also some existing ones, see
> below).
> 
> Some important features that are lacking:
> * ability to perform a small physical page allocation with a big
>    alignment without wasting huge amounts of memory
> * ability to allocate physical pages from specific pools/areas (e.g.
>    below 16M, or 4G, etc)
> * ability to reserve arbitrary pages (if free), removing them from the
>    free pool
> 
> Some other features that are nice, but not so fundamental:
> * no need for the generic allocator to keep track of metadata
>    (i.e. allocation size), this is now handled by the lower level
>    allocators
> * coalescing small blocks into bigger ones, to allow contiguous memory
>    freed in small blocks in a random order to be used for large
>    allocations again
> 
> This is achieved in the following ways:
> 
> For the virtual allocator:
> * only the virtual allocator needs one extra page of metadata, but only
>    for allocations that wouldn't fit in one page
> 
> For the page allocator:
> * page allocator has up to 6 memory pools, each pool has a metadata
>    area; the metadata has a byte for each page in the area, describing
>    the order of the block it belongs to, and whether it is free
> * if there are no free blocks of the desired size, a bigger block is
>    split until we reach the required size; the unused parts of the block
>    are put back in the free lists
> * if an allocation needs a block with a larger alignment than its size, a
>    larger block of (at least) the required order is split; the unused parts
>    are put back in the appropriate free lists
> * if the allocation could not be satisfied, the next allowed area is
>    searched; the allocation fails only when all allowed areas have been
>    tried
> * new functions to perform allocations from specific areas; the areas
>    are arch-dependent and should be set up by the arch code
> * for now x86 has a memory area for "lowest" memory under 16MB, one for
>    "low" memory under 4GB and one for the rest, while s390x has one for under
>    2GB and one for the rest; suggestions for more fine grained areas or for
>    the other architectures are welcome
> * upon freeing a block, an attempt is made to coalesce it into the
>    appropriate neighbour (if it is free), and so on for the resulting
>    larger block thus obtained
> 
> For the physical allocator:
> * the minimum alignment is now handled manually, since it has been
>    removed from the common struct
> 
> 
> This patchset addresses some current but otherwise unsolvable issues on
> s390x, such as the need to allocate a block under 2GB for each SMP CPU
> upon CPU activation.
> 
> This patchset has been tested on s390x, amd64 and i386. It has also been
> compiled on aarch64.
> 
> V1->V2:
> * Renamed some functions, as per review comments
> * Improved commit messages
> * Split the list handling functions into an independent header
> * Added arch-specific headers to define the memory areas
> * Fixed some minor issues
> * The magic value for small allocations in the virtual allocator is now
>    put right before the returned pointer, like for large allocations
> * Added comments to make the code more readable
> * Many minor fixes

Queued with the exception of patch 6 (still waiting for the CI to 
finish, but still).

Paolo
