Message ID | 20201002154420.292134-1-imbrenda@linux.ibm.com
---|---
Series | Rewrite the allocators
On 2020-10-02 17:44, Claudio Imbrenda wrote:
> The KVM unit tests are increasingly being used to test more than just
> KVM. They are being used to test TCG, qemu I/O device emulation, other
> hypervisors, and even actual hardware.
>
> The existing memory allocators are becoming more and more inadequate to
> the needs of the upcoming unit tests (but also some existing ones, see
> below).
>
> Some important features that are lacking:
> * ability to perform a small physical page allocation with a big
>   alignment without wasting huge amounts of memory
> * ability to allocate physical pages from specific pools/areas (e.g.
>   below 16M, or 4G, etc)
> * ability to reserve arbitrary pages (if free), removing them from the
>   free pool
>
> Some other features that are nice, but not so fundamental:
> * no need for the generic allocator to keep track of metadata
>   (i.e. allocation size), this is now handled by the lower level
>   allocators
> * coalescing small blocks into bigger ones, to allow contiguous memory
>   freed in small blocks in a random order to be used for large
>   allocations again
>
> This is achieved in the following ways:
>
> For the virtual allocator:
> * only the virtual allocator needs one extra page of metadata, but only
>   for allocations that wouldn't fit in one page
>
> For the page allocator:
> * the page allocator has up to 6 memory pools; each pool has a metadata
>   area; the metadata has a byte for each page in the area, describing
>   the order of the block it belongs to, and whether it is free
> * if there are no free blocks of the desired size, a bigger block is
>   split until we reach the required size; the unused parts of the block
>   are put back in the free lists
> * if an allocation needs a block with a larger alignment than its size,
>   a larger block of (at least) the required order is split; the unused
>   parts are put back in the appropriate free lists
> * if the allocation could not be satisfied, the next allowed area is
>   searched; the allocation fails only when all allowed areas have been
>   tried
> * new functions to perform allocations from specific areas; the areas
>   are arch-dependent and should be set up by the arch code
> * for now x86 has a memory area for "lowest" memory under 16MB, one for
>   "low" memory under 4GB and one for the rest, while s390x has one for
>   under 2GB and one for the rest; suggestions for more fine grained
>   areas or for the other architectures are welcome

While doing a page allocator, the topology is not the only
characteristic we may need to specify.
Specific page characteristics like rights, access flags, cache
behavior may be useful when testing I/O for some architectures.
This obviously will need some connection to the MMU handling.

Wouldn't it be interesting to use a bitmap flag as argument to
page_alloc() to define separate regions, even if the connection with
the MMU is done in a future series?

Regards,
Pierre
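A minimal usage sketch of the area-restricted interface described in the
cover letter above. The identifiers (alloc_pages_area(), memalign_pages(),
AREA_LOW, the single-argument free_pages()) are assumptions based on that
description, not necessarily the exact names the series introduces:

    /* Sketch only: the names below are assumptions based on the cover
     * letter, not confirmed kvm-unit-tests API. */
    #include <alloc_page.h>

    static void area_alloc_example(void)
    {
            /* order-0 allocation restricted to an arch-defined area,
             * e.g. "low" memory below 4GB on x86 (AREA_LOW assumed) */
            void *low = alloc_pages_area(AREA_LOW, 0);

            /* a single page with a 64KB alignment, without having to
             * waste a whole 64KB block (memalign_pages() assumed) */
            void *aligned = memalign_pages(64 * 1024, PAGE_SIZE);

            /* free_pages() assumed to take only the pointer, since the
             * allocator now tracks block sizes itself */
            free_pages(aligned);
            free_pages(low);
    }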
On Mon, 5 Oct 2020 13:54:42 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

[...]

> While doing a page allocator, the topology is not the only
> characteristic we may need to specify.
> Specific page characteristics like rights, access flags, cache
> behavior may be useful when testing I/O for some architectures.
> This obviously will need some connection to the MMU handling.
>
> Wouldn't it be interesting to use a bitmap flag as argument to
> page_alloc() to define separate regions, even if the connection with
> the MMU is done in a future series?

the physical allocator is only concerned with the physical pages. if
you need special MMU flags to be set, then you should enable the MMU
and fiddle with the flags and settings yourself.
On Mon, Oct 05, 2020 at 02:35:03PM +0200, Claudio Imbrenda wrote:
> On Mon, 5 Oct 2020 13:54:42 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
>
> [...]
>
> > While doing a page allocator, the topology is not the only
> > characteristic we may need to specify.
> > Specific page characteristics like rights, access flags, cache
> > behavior may be useful when testing I/O for some architectures.
> > This obviously will need some connection to the MMU handling.
> >
> > Wouldn't it be interesting to use a bitmap flag as argument to
> > page_alloc() to define separate regions, even if the connection with
> > the MMU is done in a future series?
>
> the physical allocator is only concerned with the physical pages. if
> you need special MMU flags to be set, then you should enable the MMU
> and fiddle with the flags and settings yourself.
>

Given enough need, we could create a collection of functions like

  alloc_pages_ro()
  alloc_pages_uncached()
  ...

These functions wouldn't have generic implementations, only
arch-specific implementations, and those implementations would simply
do a typical allocation, followed by an iteration of each PTE where the
arch-specific flags get set.

Thanks,
drew
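A sketch of the kind of arch-specific wrapper drew describes: a normal
allocation followed by a walk over each page of the identity mapping to
mark it read-only. alloc_pages() and PAGE_SIZE follow the existing library;
set_pte_flags(), PTE_READ_ONLY and flush_tlb_page() are placeholder names
for whatever the architecture actually provides:

    /* Sketch of an arch-specific wrapper; set_pte_flags(), PTE_READ_ONLY
     * and flush_tlb_page() are placeholders, not existing API. */
    void *alloc_pages_ro(unsigned int order)
    {
            unsigned long i, nr = 1UL << order;
            char *mem = alloc_pages(order);

            if (!mem)
                    return NULL;

            for (i = 0; i < nr; i++) {
                    /* clear the write permission in the PTE of the
                     * identity mapping covering this page */
                    set_pte_flags(mem + i * PAGE_SIZE, PTE_READ_ONLY);
                    flush_tlb_page((unsigned long)mem + i * PAGE_SIZE);
            }

            return mem;
    }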
On 2020-10-05 14:35, Claudio Imbrenda wrote:
> On Mon, 5 Oct 2020 13:54:42 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
>
> [...]
>
>> While doing a page allocator, the topology is not the only
>> characteristic we may need to specify.
>> Specific page characteristics like rights, access flags, cache
>> behavior may be useful when testing I/O for some architectures.
>> This obviously will need some connection to the MMU handling.
>>
>> Wouldn't it be interesting to use a bitmap flag as argument to
>> page_alloc() to define separate regions, even if the connection with
>> the MMU is done in a future series?
>
> the physical allocator is only concerned with the physical pages. if
> you need special MMU flags to be set, then you should enable the MMU
> and fiddle with the flags and settings yourself.
>

AFAIU the page_allocator() works on virtual addresses if the MMU has
been initialized.

Considering that more and more tests will enable the MMU by default,
eventually with a simple logical mapping, it seems to me that having
the possibility to give the page allocator more information about the
page access configuration could be interesting.

I find that using two different interfaces, both related to memory
handling, to have a proper memory configuration for an I/O page may
be complicated without some way to link page allocator and MMU tables
together.
On Mon, 5 Oct 2020 14:57:15 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 2020-10-05 14:35, Claudio Imbrenda wrote:
> > On Mon, 5 Oct 2020 13:54:42 +0200
> > Pierre Morel <pmorel@linux.ibm.com> wrote:
> >
> > [...]
> >
> >> While doing a page allocator, the topology is not the only
> >> characteristic we may need to specify.
> >> Specific page characteristics like rights, access flags, cache
> >> behavior may be useful when testing I/O for some architectures.
> >> This obviously will need some connection to the MMU handling.
> >>
> >> Wouldn't it be interesting to use a bitmap flag as argument to
> >> page_alloc() to define separate regions, even if the connection
> >> with the MMU is done in a future series?
> >
> > the physical allocator is only concerned with the physical pages. if
> > you need special MMU flags to be set, then you should enable the MMU
> > and fiddle with the flags and settings yourself.
> >
>
> AFAIU the page_allocator() works on virtual addresses if the MMU has
> been initialized.

no, it still works on physical addresses, which happen to be identity
mapped by the MMU. don't forget that the page tables are themselves
allocated with the page allocator.

> Considering that more and more tests will enable the MMU by default,
> eventually with a simple logical mapping, it seems to me that having
> the possibility to give the page allocator more information about the
> page access configuration could be interesting.

I disagree. I think we should not violate the layering here.

> I find that using two different interfaces, both related to memory
> handling, to have a proper memory configuration for an I/O page may
> be complicated without some way to link page allocator and MMU tables
> together.

If you want to allocate an identity mapped page and also change its
properties at the same time, you can always write a wrapper. keep the
page allocator working only on physical addresses, add a function to
change the properties of the mapping, and add a wrapper for the two.
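A sketch of the layering Claudio suggests: the page allocator stays purely
physical, a separate MMU-level helper changes the mapping's properties, and
a thin wrapper combines the two. mmu_set_range_caching() and CACHE_NONE are
invented names standing in for an arch/MMU-layer helper:

    /* Sketch only: mmu_set_range_caching() and CACHE_NONE are invented
     * names, not existing kvm-unit-tests API. */
    void *alloc_pages_uncached(unsigned int order)
    {
            /* plain physical allocation, identity-mapped by the MMU */
            void *mem = alloc_pages(order);

            if (mem)
                    /* change the mapping properties in a separate step,
                     * keeping allocator and MMU layers independent */
                    mmu_set_range_caching(mem, PAGE_SIZE << order,
                                          CACHE_NONE);

            return mem;
    }

The allocator itself never touches a PTE in this scheme; only the wrapper
knows that both an allocation and a mapping change are wanted.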
On 02/10/20 17:44, Claudio Imbrenda wrote:
> The KVM unit tests are increasingly being used to test more than just
> KVM. They are being used to test TCG, qemu I/O device emulation, other
> hypervisors, and even actual hardware.
>
> The existing memory allocators are becoming more and more inadequate to
> the needs of the upcoming unit tests (but also some existing ones, see
> below).
>
> Some important features that are lacking:
> * ability to perform a small physical page allocation with a big
>   alignment without wasting huge amounts of memory
> * ability to allocate physical pages from specific pools/areas (e.g.
>   below 16M, or 4G, etc)
> * ability to reserve arbitrary pages (if free), removing them from the
>   free pool
>
> Some other features that are nice, but not so fundamental:
> * no need for the generic allocator to keep track of metadata
>   (i.e. allocation size), this is now handled by the lower level
>   allocators
> * coalescing small blocks into bigger ones, to allow contiguous memory
>   freed in small blocks in a random order to be used for large
>   allocations again
>
> This is achieved in the following ways:
>
> For the virtual allocator:
> * only the virtual allocator needs one extra page of metadata, but only
>   for allocations that wouldn't fit in one page
>
> For the page allocator:
> * the page allocator has up to 6 memory pools; each pool has a metadata
>   area; the metadata has a byte for each page in the area, describing
>   the order of the block it belongs to, and whether it is free
> * if there are no free blocks of the desired size, a bigger block is
>   split until we reach the required size; the unused parts of the block
>   are put back in the free lists
> * if an allocation needs a block with a larger alignment than its size,
>   a larger block of (at least) the required order is split; the unused
>   parts are put back in the appropriate free lists
> * if the allocation could not be satisfied, the next allowed area is
>   searched; the allocation fails only when all allowed areas have been
>   tried
> * new functions to perform allocations from specific areas; the areas
>   are arch-dependent and should be set up by the arch code
> * for now x86 has a memory area for "lowest" memory under 16MB, one for
>   "low" memory under 4GB and one for the rest, while s390x has one for
>   under 2GB and one for the rest; suggestions for more fine grained
>   areas or for the other architectures are welcome
> * upon freeing a block, an attempt is made to coalesce it into the
>   appropriate neighbour (if it is free), and so on for the resulting
>   larger block thus obtained
>
> For the physical allocator:
> * the minimum alignment is now handled manually, since it has been
>   removed from the common struct
>
>
> This patchset addresses some current but otherwise unsolvable issues on
> s390x, such as the need to allocate a block under 2GB for each SMP CPU
> upon CPU activation.
>
> This patchset has been tested on s390x, amd64 and i386. It has also been
> compiled on aarch64.
>
> V1->V2:
> * Renamed some functions, as per review comments
> * Improved commit messages
> * Split the list handling functions into an independent header
> * Added arch-specific headers to define the memory areas
> * Fixed some minor issues
> * The magic value for small allocations in the virtual allocator is now
>   put right before the returned pointer, like for large allocations
> * Added comments to make the code more readable
> * Many minor fixes

Queued with the exception of patch 6 (still waiting for the CI to
finish, but still).
Paolo
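For reference, a compact sketch of the buddy-style coalescing the cover
letter describes: one metadata byte per page holding the order of the block
it belongs to plus a free flag, with neighbouring free blocks merged on
free. The data structures and names are assumptions for illustration, and
free-list bookkeeping is omitted:

    /* Sketch of order-byte metadata and coalescing on free; all names
     * and structures here are assumptions, not the series' code. */
    #define MAX_ORDER   10u
    #define FREE_FLAG   0x80u

    static unsigned char page_meta[1u << 20];   /* one byte per page frame */

    static void coalesce_on_free(unsigned long pfn, unsigned char order)
    {
            while (order < MAX_ORDER) {
                    /* the buddy is the neighbouring block of equal order */
                    unsigned long buddy = pfn ^ (1ul << order);

                    if (buddy >= sizeof(page_meta))
                            break;

                    /* stop if the buddy is not a free block of this order */
                    if (page_meta[buddy] != (FREE_FLAG | order))
                            break;

                    /* merge: the lower pfn heads the order+1 block
                     * (removal of the buddy from its free list omitted) */
                    pfn &= ~(1ul << order);
                    order++;
            }

            /* mark every page of the resulting block as free, tagged with
             * the (possibly increased) order */
            for (unsigned long i = 0; i < (1ul << order); i++)
                    page_meta[pfn + i] = FREE_FLAG | order;
    }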