Message ID: 20250401171754.2686501-1-vitaly.wool@konsulko.se (mailing list archive)
State:      New
Series:     mm: add zblock allocator
On Tue, Apr 1, 2025 at 10:18 AM Vitaly Wool <vitaly.wool@konsulko.se> wrote:
>
> zblock is a special purpose allocator for storing compressed pages.
> It stores integer number of compressed objects per its block. These
> blocks consist of several physical pages (2**n, i. e. 1/2/4/8).

Haven't taken a close look yet, but as a general principle I don't mind
having a separate allocator for a separate use case.

Some quick notes (will do a careful review later):

>
> With zblock, it is possible to densely arrange objects of various sizes
> resulting in low internal fragmentation. Also this allocator tries to
> fill incomplete blocks instead of adding new ones, in many cases
> providing a compression ratio substantially higher than z3fold and zbud
> (though lower than zmalloc's).

Do we have data for comparison here?

>
> zblock does not require MMU to operate and also is superior to zsmalloc

This is not an actual meaningful distinction. CONFIG_SWAP depends on
CONFIG_MMU:

menuconfig SWAP
	bool "Support for paging of anonymous memory (swap)"
	depends on MMU && BLOCK && !ARCH_NO_SWAP

> with regard to average performance and worst execution times, thus
> allowing for better response time and real-time characteristics of the
> whole system.

By performance, do you mean latency or throughput or storage density?

>
> E. g. on a series of stress-ng tests run on a Raspberry Pi 5, we get
> 5-10% higher value for bogo ops/s in zblock/zsmalloc comparison.
>
> Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
> Signed-off-by: Igor Belousov <igor.b@beldev.am>
> ---
>  Documentation/mm/zblock.rst |  22 ++
>  MAINTAINERS                 |   7 +
>  mm/Kconfig                  |   8 +
>  mm/Makefile                 |   1 +
>  mm/zblock.c                 | 492 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 530 insertions(+)
>  create mode 100644 Documentation/mm/zblock.rst
>  create mode 100644 mm/zblock.c
>
> diff --git a/Documentation/mm/zblock.rst b/Documentation/mm/zblock.rst
> new file mode 100644
> index 000000000000..754b3dbb9e94
> --- /dev/null
> +++ b/Documentation/mm/zblock.rst
> @@ -0,0 +1,22 @@
> +======
> +zblock
> +======
> +
> +zblock is a special purpose allocator for storing compressed pages.
> +It stores integer number of compressed objects per its block. These
> +blocks consist of several physical pages (2**n, i. e. 1/2/4/8).
> +
> +With zblock, it is possible to densely arrange objects of various sizes
> +resulting in low internal fragmentation. Also this allocator tries to
> +fill incomplete blocks instead of adding new ones, in many cases
> +providing a compression ratio substantially higher than z3fold and zbud
> +(though lower than zmalloc's).
> +
> +zblock does not require MMU to operate and also is superior to zsmalloc

Same note as above.

> +with regard to average performance and worst execution times, thus
> +allowing for better response time and real-time characteristics of the
> +whole system.
> +
> +E. g. on a series of stress-ng tests run on a Raspberry Pi 5, we get
> +5-10% higher value for bogo ops/s in zblock/zsmalloc comparison.
> + > diff --git a/MAINTAINERS b/MAINTAINERS > index 991a33bad10e..166e9bfa04dc 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -26313,6 +26313,13 @@ F: Documentation/networking/device_drivers/hamradio/z8530drv.rst > F: drivers/net/hamradio/*scc.c > F: drivers/net/hamradio/z8530.h > > +ZBLOCK COMPRESSED SLAB MEMORY ALLOCATOR > +M: Vitaly Wool <vitaly.wool@konsulko.se> > +L: linux-mm@kvack.org > +S: Maintained > +F: Documentation/mm/zblock.rst > +F: mm/zblock.c > + > ZBUD COMPRESSED PAGE ALLOCATOR > M: Seth Jennings <sjenning@redhat.com> > M: Dan Streetman <ddstreet@ieee.org> > diff --git a/mm/Kconfig b/mm/Kconfig > index 1b501db06417..26b79e3c1300 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -193,6 +193,14 @@ config Z3FOLD_DEPRECATED > page. It is a ZBUD derivative so the simplicity and determinism are > still there. > > +config ZBLOCK > + tristate "Fast compression allocator with high density" > + depends on ZPOOL > + help > + A special purpose allocator for storing compressed pages. > + It is designed to store same size compressed pages in blocks of > + physical pages. > + > config Z3FOLD > tristate > default y if Z3FOLD_DEPRECATED=y > diff --git a/mm/Makefile b/mm/Makefile > index 850386a67b3e..2018455b7baa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -116,6 +116,7 @@ obj-$(CONFIG_ZPOOL) += zpool.o > obj-$(CONFIG_ZBUD) += zbud.o > obj-$(CONFIG_ZSMALLOC) += zsmalloc.o > obj-$(CONFIG_Z3FOLD) += z3fold.o > +obj-$(CONFIG_ZBLOCK) += zblock.o > obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o > obj-$(CONFIG_CMA) += cma.o > obj-$(CONFIG_NUMA) += numa.o > diff --git a/mm/zblock.c b/mm/zblock.c > new file mode 100644 > index 000000000000..a6778653c451 > --- /dev/null > +++ b/mm/zblock.c > @@ -0,0 +1,492 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * zblock.c > + * > + * Author: Vitaly Wool <vitaly.wool@konsulko.com> > + * Based on the work from Ananda Badmaev <a.badmaev@clicknet.pro> > + * Copyright (C) 2022-2024, Konsulko AB. > + * > + * Zblock is a small object allocator with the intention to serve as a > + * zpool backend. It operates on page blocks which consist of number > + * of physical pages being a power of 2 and store integer number of > + * compressed pages per block which results in determinism and simplicity. > + * > + * zblock doesn't export any API and is meant to be used via zpool API. > + */ > + > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > +#include <linux/atomic.h> > +#include <linux/mm.h> > +#include <linux/module.h> > +#include <linux/preempt.h> > +#include <linux/slab.h> > +#include <linux/spinlock.h> > +#include <linux/zpool.h> > + > +#define SLOT_FREE 0 > +#define BIT_SLOT_OCCUPIED 0 > +#define BIT_SLOT_MAPPED 1 > + > +#define SLOT_BITS 5 > +#define MAX_SLOTS (1 << SLOT_BITS) > +#define SLOT_MASK ((0x1UL << SLOT_BITS) - 1) > + > +#define ZBLOCK_HEADER_SIZE round_up(sizeof(struct zblock_block), sizeof(long)) > +#define BLOCK_DATA_SIZE(order) ((PAGE_SIZE << order) - ZBLOCK_HEADER_SIZE) > +#define SLOT_SIZE(nslots, order) (round_down((BLOCK_DATA_SIZE(order) / nslots), sizeof(long))) > + > +#define BLOCK_CACHE_SIZE 32 > + > +struct zblock_pool; > + > +/** > + * struct zblock_block - block metadata > + * Block consists of several (1/2/4/8) pages and contains fixed > + * integer number of slots for allocating compressed pages. 
> + * > + * free_slots: number of free slots in the block > + * slot_info: contains data about free/occupied slots > + * cache_idx: index of the block in cache > + */ > +struct zblock_block { > + atomic_t free_slots; > + u64 slot_info[1]; > + int cache_idx; > +}; > + > +/** > + * struct block_desc - general metadata for block lists > + * Each block list stores only blocks of corresponding type which means > + * that all blocks in it have the same number and size of slots. > + * All slots are aligned to size of long. > + * > + * slot_size: size of slot for this list > + * slots_per_block: number of slots per block for this list > + * order: order for __get_free_pages > + */ > +static const struct block_desc { > + const unsigned int slot_size; > + const unsigned short slots_per_block; > + const unsigned short order; > +} block_desc[] = { > + { SLOT_SIZE(32, 0), 32, 0 }, > + { SLOT_SIZE(22, 0), 22, 0 }, > + { SLOT_SIZE(17, 0), 17, 0 }, > + { SLOT_SIZE(13, 0), 13, 0 }, > + { SLOT_SIZE(11, 0), 11, 0 }, > + { SLOT_SIZE(9, 0), 9, 0 }, > + { SLOT_SIZE(8, 0), 8, 0 }, > + { SLOT_SIZE(14, 1), 14, 1 }, > + { SLOT_SIZE(12, 1), 12, 1 }, > + { SLOT_SIZE(11, 1), 11, 1 }, > + { SLOT_SIZE(10, 1), 10, 1 }, > + { SLOT_SIZE(9, 1), 9, 1 }, > + { SLOT_SIZE(8, 1), 8, 1 }, > + { SLOT_SIZE(15, 2), 15, 2 }, > + { SLOT_SIZE(14, 2), 14, 2 }, > + { SLOT_SIZE(13, 2), 13, 2 }, > + { SLOT_SIZE(12, 2), 12, 2 }, > + { SLOT_SIZE(11, 2), 11, 2 }, > + { SLOT_SIZE(10, 2), 10, 2 }, > + { SLOT_SIZE(9, 2), 9, 2 }, > + { SLOT_SIZE(8, 2), 8, 2 }, > + { SLOT_SIZE(15, 3), 15, 3 }, > + { SLOT_SIZE(14, 3), 14, 3 }, > + { SLOT_SIZE(13, 3), 13, 3 }, > + { SLOT_SIZE(12, 3), 12, 3 }, > + { SLOT_SIZE(11, 3), 11, 3 }, > + { SLOT_SIZE(10, 3), 10, 3 }, > + { SLOT_SIZE(9, 3), 9, 3 }, > + { SLOT_SIZE(7, 3), 7, 3 } > +}; > + > +/** > + * struct block_list - stores metadata of particular list > + * lock: protects block_cache > + * block_cache: blocks with free slots > + * block_count: total number of blocks in the list > + */ > +struct block_list { > + spinlock_t lock; > + struct zblock_block *block_cache[BLOCK_CACHE_SIZE]; > + unsigned long block_count; > +}; > + > +/** > + * struct zblock_pool - stores metadata for each zblock pool > + * @block_lists: array of block lists > + * @zpool: zpool driver > + * @alloc_flag: protects block allocation from memory leak > + * > + * This structure is allocated at pool creation time and maintains metadata > + * for a particular zblock pool. 
> + */ > +struct zblock_pool { > + struct block_list block_lists[ARRAY_SIZE(block_desc)]; > + struct zpool *zpool; > + atomic_t alloc_flag; > +}; > + > +/***************** > + * Helpers > + *****************/ > + > +static int cache_insert_block(struct zblock_block *block, struct block_list *list) > +{ > + unsigned int i, min_free_slots = atomic_read(&block->free_slots); > + int min_index = -1; > + > + if (WARN_ON(block->cache_idx != -1)) > + return -EINVAL; > + > + min_free_slots = atomic_read(&block->free_slots); > + for (i = 0; i < BLOCK_CACHE_SIZE; i++) { > + if (!list->block_cache[i] || !atomic_read(&(list->block_cache[i])->free_slots)) { > + min_index = i; > + break; > + } > + if (atomic_read(&(list->block_cache[i])->free_slots) < min_free_slots) { > + min_free_slots = atomic_read(&(list->block_cache[i])->free_slots); > + min_index = i; > + } > + } > + if (min_index >= 0) { > + if (list->block_cache[min_index]) > + (list->block_cache[min_index])->cache_idx = -1; > + list->block_cache[min_index] = block; > + block->cache_idx = min_index; > + } > + return min_index < 0 ? min_index : 0; > +} > + > +static struct zblock_block *cache_find_block(struct block_list *list) > +{ > + int i; > + struct zblock_block *z = NULL; > + > + for (i = 0; i < BLOCK_CACHE_SIZE; i++) { > + if (list->block_cache[i] && > + atomic_dec_if_positive(&list->block_cache[i]->free_slots) >= 0) { > + z = list->block_cache[i]; > + break; > + } > + } > + return z; > +} > + > +static int cache_remove_block(struct block_list *list, struct zblock_block *block) > +{ > + int idx = block->cache_idx; > + > + block->cache_idx = -1; > + if (idx >= 0) > + list->block_cache[idx] = NULL; > + return idx < 0 ? idx : 0; > +} > + > +/* > + * Encodes the handle of a particular slot in the pool using metadata > + */ > +static inline unsigned long metadata_to_handle(struct zblock_block *block, > + unsigned int block_type, unsigned int slot) > +{ > + return (unsigned long)(block) + (block_type << SLOT_BITS) + slot; > +} > + > +/* Returns block, block type and slot in the pool corresponding to handle */ > +static inline struct zblock_block *handle_to_metadata(unsigned long handle, > + unsigned int *block_type, unsigned int *slot) > +{ > + *block_type = (handle & (PAGE_SIZE - 1)) >> SLOT_BITS; > + *slot = handle & SLOT_MASK; > + return (struct zblock_block *)(handle & PAGE_MASK); > +} > + > + > +/* > + * allocate new block and add it to corresponding block list > + */ > +static struct zblock_block *alloc_block(struct zblock_pool *pool, > + int block_type, gfp_t gfp, > + unsigned long *handle) > +{ > + struct zblock_block *block; > + struct block_list *list; > + > + block = (void *)__get_free_pages(gfp, block_desc[block_type].order); > + if (!block) > + return NULL; > + > + list = &(pool->block_lists)[block_type]; > + > + /* init block data */ > + memset(&block->slot_info, 0, sizeof(block->slot_info)); > + atomic_set(&block->free_slots, block_desc[block_type].slots_per_block - 1); > + block->cache_idx = -1; > + set_bit(BIT_SLOT_OCCUPIED, (unsigned long *)block->slot_info); > + *handle = metadata_to_handle(block, block_type, 0); > + > + spin_lock(&list->lock); > + cache_insert_block(block, list); > + list->block_count++; > + spin_unlock(&list->lock); > + return block; > +} > + > +/***************** > + * API Functions > + *****************/ > +/** > + * zblock_create_pool() - create a new zblock pool > + * @gfp: gfp flags when allocating the zblock pool structure > + * @ops: user-defined operations for the zblock pool > + * > + * Return: 
pointer to the new zblock pool or NULL if the metadata allocation > + * failed. > + */ > +static struct zblock_pool *zblock_create_pool(gfp_t gfp) > +{ > + struct zblock_pool *pool; > + struct block_list *list; > + int i, j; > + > + pool = kmalloc(sizeof(struct zblock_pool), gfp); > + if (!pool) > + return NULL; > + > + /* init each block list */ > + for (i = 0; i < ARRAY_SIZE(block_desc); i++) { > + list = &(pool->block_lists)[i]; > + spin_lock_init(&list->lock); > + for (j = 0; j < BLOCK_CACHE_SIZE; j++) > + list->block_cache[j] = NULL; > + list->block_count = 0; > + } > + atomic_set(&pool->alloc_flag, 0); > + return pool; > +} > + > +/** > + * zblock_destroy_pool() - destroys an existing zblock pool > + * @pool: the zblock pool to be destroyed > + * > + */ > +static void zblock_destroy_pool(struct zblock_pool *pool) > +{ > + kfree(pool); > +} > + > + > +/** > + * zblock_alloc() - allocates a slot of appropriate size > + * @pool: zblock pool from which to allocate > + * @size: size in bytes of the desired allocation > + * @gfp: gfp flags used if the pool needs to grow > + * @handle: handle of the new allocation > + * > + * Return: 0 if success and handle is set, otherwise -EINVAL if the size or > + * gfp arguments are invalid or -ENOMEM if the pool was unable to allocate > + * a new slot. > + */ > +static int zblock_alloc(struct zblock_pool *pool, size_t size, gfp_t gfp, > + unsigned long *handle) > +{ > + unsigned int block_type, slot; > + struct zblock_block *block; > + struct block_list *list; > + > + if (!size) > + return -EINVAL; > + > + if (size > PAGE_SIZE) > + return -ENOSPC; > + > + /* find basic block type with suitable slot size */ > + for (block_type = 0; block_type < ARRAY_SIZE(block_desc); block_type++) { > + if (size <= block_desc[block_type].slot_size) > + break; > + } > + list = &(pool->block_lists[block_type]); > + > +check: > + /* check if there are free slots in cache */ > + spin_lock(&list->lock); > + block = cache_find_block(list); > + spin_unlock(&list->lock); > + if (block) > + goto found; > + > + /* not found block with free slots try to allocate new empty block */ > + if (atomic_cmpxchg(&pool->alloc_flag, 0, 1)) > + goto check; > + block = alloc_block(pool, block_type, gfp, handle); > + atomic_set(&pool->alloc_flag, 0); > + if (block) > + return 0; > + return -ENOMEM; > + > +found: > + /* find the first free slot in block */ > + for (slot = 0; slot < block_desc[block_type].slots_per_block; slot++) { > + if (!test_and_set_bit(slot*2 + BIT_SLOT_OCCUPIED, > + (unsigned long *)&block->slot_info)) > + break; > + } > + *handle = metadata_to_handle(block, block_type, slot); > + return 0; > +} > + > +/** > + * zblock_free() - frees the allocation associated with the given handle > + * @pool: pool in which the allocation resided > + * @handle: handle associated with the allocation returned by zblock_alloc() > + * > + */ > +static void zblock_free(struct zblock_pool *pool, unsigned long handle) > +{ > + unsigned int slot, block_type; > + struct zblock_block *block; > + struct block_list *list; > + > + block = handle_to_metadata(handle, &block_type, &slot); > + list = &(pool->block_lists[block_type]); > + > + spin_lock(&list->lock); > + /* if all slots in block are empty delete whole block */ > + if (atomic_inc_return(&block->free_slots) == block_desc[block_type].slots_per_block) { > + list->block_count--; > + cache_remove_block(list, block); > + spin_unlock(&list->lock); > + free_pages((unsigned long)block, block_desc[block_type].order); > + return; > + } > + > + if 
(atomic_read(&block->free_slots) < block_desc[block_type].slots_per_block/2 > + && block->cache_idx == -1) > + cache_insert_block(block, list); > + spin_unlock(&list->lock); > + > + clear_bit(slot*2 + BIT_SLOT_OCCUPIED, (unsigned long *)block->slot_info); > +} > + > +/** > + * zblock_map() - maps the allocation associated with the given handle > + * @pool: pool in which the allocation resides > + * @handle: handle associated with the allocation to be mapped > + * > + * > + * Returns: a pointer to the mapped allocation > + */ > +static void *zblock_map(struct zblock_pool *pool, unsigned long handle) > +{ > + unsigned int block_type, slot; > + struct zblock_block *block; > + void *p; > + > + block = handle_to_metadata(handle, &block_type, &slot); > + p = (void *)block + ZBLOCK_HEADER_SIZE + slot * block_desc[block_type].slot_size; > + return p; > +} > + > +/** > + * zblock_unmap() - unmaps the allocation associated with the given handle > + * @pool: pool in which the allocation resides > + * @handle: handle associated with the allocation to be unmapped > + */ > +static void zblock_unmap(struct zblock_pool *pool, unsigned long handle) > +{ > +} > + > +/** > + * zblock_get_total_pages() - gets the zblock pool size in pages > + * @pool: pool being queried > + * > + * Returns: size in bytes of the given pool. > + */ > +static u64 zblock_get_total_pages(struct zblock_pool *pool) > +{ > + u64 total_size; > + int i; > + > + total_size = 0; > + for (i = 0; i < ARRAY_SIZE(block_desc); i++) > + total_size += pool->block_lists[i].block_count << block_desc[i].order; > + > + return total_size; > +} > + > +/***************** > + * zpool > + ****************/ > + > +static void *zblock_zpool_create(const char *name, gfp_t gfp) > +{ > + return zblock_create_pool(gfp); > +} > + > +static void zblock_zpool_destroy(void *pool) > +{ > + zblock_destroy_pool(pool); > +} > + > +static int zblock_zpool_malloc(void *pool, size_t size, gfp_t gfp, > + unsigned long *handle) > +{ > + return zblock_alloc(pool, size, gfp, handle); > +} > + > +static void zblock_zpool_free(void *pool, unsigned long handle) > +{ > + zblock_free(pool, handle); > +} > + > +static void *zblock_zpool_map(void *pool, unsigned long handle, > + enum zpool_mapmode mm) > +{ > + return zblock_map(pool, handle); > +} > + > +static void zblock_zpool_unmap(void *pool, unsigned long handle) > +{ > + zblock_unmap(pool, handle); > +} > + > +static u64 zblock_zpool_total_pages(void *pool) > +{ > + return zblock_get_total_pages(pool); > +} > + > +static struct zpool_driver zblock_zpool_driver = { > + .type = "zblock", > + .owner = THIS_MODULE, > + .create = zblock_zpool_create, > + .destroy = zblock_zpool_destroy, > + .malloc = zblock_zpool_malloc, > + .free = zblock_zpool_free, > + .map = zblock_zpool_map, > + .unmap = zblock_zpool_unmap, > + .total_pages = zblock_zpool_total_pages, > +}; > + > +MODULE_ALIAS("zpool-zblock"); > + > +static int __init init_zblock(void) > +{ > + pr_info("loaded\n"); > + zpool_register_driver(&zblock_zpool_driver); > + return 0; > +} > + > +static void __exit exit_zblock(void) > +{ > + zpool_unregister_driver(&zblock_zpool_driver); > + pr_info("unloaded\n"); > +} > + > +module_init(init_zblock); > +module_exit(exit_zblock); > + > +MODULE_LICENSE("GPL"); > +MODULE_AUTHOR("Vitaly Wool <vitaly.wool@konsulko.com>"); > +MODULE_DESCRIPTION("Block allocator for compressed pages"); > -- > 2.39.2 > >
Hi Nhat,

On Tuesday, April 1, 2025 at 8:24:15 pm +02:00, Nhat Pham <nphamcs@gmail.com> wrote:

>> With zblock, it is possible to densely arrange objects of various sizes
>> resulting in low internal fragmentation. Also this allocator tries to
>> fill incomplete blocks instead of adding new ones, in many cases
>> providing a compression ratio substantially higher than z3fold and zbud
>> (though lower than zmalloc's).
>
> Do we have data for comparison here?

Yeah, for instance we normally run

*stress-ng --vm 4 --vm-bytes 4G --vm-keep --timeout 10m --metrics-brief*

zblock:
stress-ng: metrc: [499] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [499]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [499] vm             43755994    600.53   2382.34     15.19     72861.92       18250.50

zsmalloc:
stress-ng: metrc: [491] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [491]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [491] vm             41769859    601.37   2381.85     16.56     69457.56       17415.62

which gives just a little short of a 5% advantage for zblock.

>> with regard to average performance and worst execution times, thus
>> allowing for better response time and real-time characteristics of the
>> whole system.
>
> By performance, do you mean latency or throughput or storage density?

Latency and throughput. It's hard to compete with zsmalloc in storage
density indeed.

~Vitaly
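For readers who want to reproduce a comparison like the one above, a minimal
sketch follows. It assumes the allocators are exercised through zswap (as the
later replies in this thread confirm) on a kernel built with CONFIG_ZBLOCK and
CONFIG_ZSMALLOC; the sysfs paths are zswap's standard module parameters and the
stress-ng invocation is the one quoted above:

  # point zswap at the allocator under test (zblock or zsmalloc)
  echo zblock > /sys/module/zswap/parameters/zpool
  echo Y      > /sys/module/zswap/parameters/enabled
  swapon -a                      # zswap still needs a backing swap device

  # run the same workload for each backend and compare bogo ops/s
  stress-ng --vm 4 --vm-bytes 4G --vm-keep --timeout 10m --metrics-brief

Switching the backend to zsmalloc before the next run is just a matter of
writing "zsmalloc" to the same zpool parameter.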
Hi Vitaly,

On Tue, Apr 01, 2025 at 07:17:54PM +0200, Vitaly Wool wrote:
> zblock is a special purpose allocator for storing compressed pages.
> It stores integer number of compressed objects per its block. These
> blocks consist of several physical pages (2**n, i. e. 1/2/4/8).
>
> With zblock, it is possible to densely arrange objects of various sizes
> resulting in low internal fragmentation. Also this allocator tries to
> fill incomplete blocks instead of adding new ones, in many cases
> providing a compression ratio substantially higher than z3fold and zbud
> (though lower than zmalloc's).
>
> zblock does not require MMU

Can you explain why not requiring MMU is important for your use-case?
Also what exactly is your use-case? Are you planning to use zblock
through zram or zswap or something new?

> to operate and also is superior to zsmalloc
> with regard to average performance and worst execution times, thus
> allowing for better response time and real-time characteristics of the
> whole system.
>
> E. g. on a series of stress-ng tests run on a Raspberry Pi 5, we get
> 5-10% higher value for bogo ops/s in zblock/zsmalloc comparison.

Can you explain a bit more on this test? How is this test using zblock?

thanks,
Shakeel
Hi Shakeel,

2025-04-02 03:16 Shakeel Butt wrote:
> Hi Vitaly,
>
> On Tue, Apr 01, 2025 at 07:17:54PM +0200, Vitaly Wool wrote:
>> zblock is a special purpose allocator for storing compressed pages.
>> It stores integer number of compressed objects per its block. These
>> blocks consist of several physical pages (2**n, i. e. 1/2/4/8).
>>
>> With zblock, it is possible to densely arrange objects of various
>> sizes
>> resulting in low internal fragmentation. Also this allocator tries to
>> fill incomplete blocks instead of adding new ones, in many cases
>> providing a compression ratio substantially higher than z3fold and
>> zbud
>> (though lower than zmalloc's).
>>
>> zblock does not require MMU
>
> Can you explain why not requiring MMU is important for your use-case?
> Also what exactly is your use-case? Are you planning to use zblock
> through zram or zswap or something new?

We have 2 use cases for a zpool backend: data centers (zswap) and small
MMU-less devices, which is where zram comes into play. We know that the
zram-over-zpool patch is still out of tree but we also know the story
behind that :)

>> to operate and also is superior to zsmalloc
>> with regard to average performance and worst execution times, thus
>> allowing for better response time and real-time characteristics of the
>> whole system.
>>
>> E. g. on a series of stress-ng tests run on a Raspberry Pi 5, we get
>> 5-10% higher value for bogo ops/s in zblock/zsmalloc comparison.
>
> Can you explain a bit more on this test? How is this test using zblock?

We're running stress-ng on a number of devices. The results published in
the previous message are from the run on Raspberry Pi 5. zpool's settings
used for comparison are lz4/80%.

Thanks,
Igor
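A note for readers: the "lz4/80%" setting Igor mentions most plausibly maps to
zswap's compressor and pool-size-limit module parameters; that reading is an
interpretation, not something confirmed in the thread. A minimal sketch of that
configuration, using the standard zswap parameters:

  echo lz4 > /sys/module/zswap/parameters/compressor
  echo 80  > /sys/module/zswap/parameters/max_pool_percent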
Hi Vitaly, kernel test robot noticed the following build warnings: [auto build test WARNING on linus/master] [also build test WARNING on v6.14] [cannot apply to akpm-mm/mm-everything next-20250402] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Vitaly-Wool/mm-add-zblock-allocator/20250402-011953 base: linus/master patch link: https://lore.kernel.org/r/20250401171754.2686501-1-vitaly.wool%40konsulko.se patch subject: [PATCH] mm: add zblock allocator config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20250402/202504022017.kHAz8bGB-lkp@intel.com/config) compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250402/202504022017.kHAz8bGB-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202504022017.kHAz8bGB-lkp@intel.com/ All warnings (new ones prefixed by >>): >> mm/zblock.c:56: warning: Function parameter or struct member 'free_slots' not described in 'zblock_block' >> mm/zblock.c:56: warning: Function parameter or struct member 'slot_info' not described in 'zblock_block' >> mm/zblock.c:56: warning: Function parameter or struct member 'cache_idx' not described in 'zblock_block' >> mm/zblock.c:102: warning: Function parameter or struct member 'slot_size' not described in 'block_desc' >> mm/zblock.c:102: warning: Function parameter or struct member 'slots_per_block' not described in 'block_desc' >> mm/zblock.c:102: warning: Function parameter or struct member 'order' not described in 'block_desc' >> mm/zblock.c:102: warning: Function parameter or struct member 'block_desc' not described in 'block_desc' >> mm/zblock.c:114: warning: Function parameter or struct member 'lock' not described in 'block_list' >> mm/zblock.c:114: warning: Function parameter or struct member 'block_cache' not described in 'block_list' >> mm/zblock.c:114: warning: Function parameter or struct member 'block_count' not described in 'block_list' >> mm/zblock.c:249: warning: Excess function parameter 'ops' description in 'zblock_create_pool' vim +56 mm/zblock.c 42 43 /** 44 * struct zblock_block - block metadata 45 * Block consists of several (1/2/4/8) pages and contains fixed 46 * integer number of slots for allocating compressed pages. 47 * 48 * free_slots: number of free slots in the block 49 * slot_info: contains data about free/occupied slots 50 * cache_idx: index of the block in cache 51 */ 52 struct zblock_block { 53 atomic_t free_slots; 54 u64 slot_info[1]; 55 int cache_idx; > 56 }; 57 58 /** 59 * struct block_desc - general metadata for block lists 60 * Each block list stores only blocks of corresponding type which means 61 * that all blocks in it have the same number and size of slots. 62 * All slots are aligned to size of long. 
63 * 64 * slot_size: size of slot for this list 65 * slots_per_block: number of slots per block for this list 66 * order: order for __get_free_pages 67 */ 68 static const struct block_desc { 69 const unsigned int slot_size; 70 const unsigned short slots_per_block; 71 const unsigned short order; 72 } block_desc[] = { 73 { SLOT_SIZE(32, 0), 32, 0 }, 74 { SLOT_SIZE(22, 0), 22, 0 }, 75 { SLOT_SIZE(17, 0), 17, 0 }, 76 { SLOT_SIZE(13, 0), 13, 0 }, 77 { SLOT_SIZE(11, 0), 11, 0 }, 78 { SLOT_SIZE(9, 0), 9, 0 }, 79 { SLOT_SIZE(8, 0), 8, 0 }, 80 { SLOT_SIZE(14, 1), 14, 1 }, 81 { SLOT_SIZE(12, 1), 12, 1 }, 82 { SLOT_SIZE(11, 1), 11, 1 }, 83 { SLOT_SIZE(10, 1), 10, 1 }, 84 { SLOT_SIZE(9, 1), 9, 1 }, 85 { SLOT_SIZE(8, 1), 8, 1 }, 86 { SLOT_SIZE(15, 2), 15, 2 }, 87 { SLOT_SIZE(14, 2), 14, 2 }, 88 { SLOT_SIZE(13, 2), 13, 2 }, 89 { SLOT_SIZE(12, 2), 12, 2 }, 90 { SLOT_SIZE(11, 2), 11, 2 }, 91 { SLOT_SIZE(10, 2), 10, 2 }, 92 { SLOT_SIZE(9, 2), 9, 2 }, 93 { SLOT_SIZE(8, 2), 8, 2 }, 94 { SLOT_SIZE(15, 3), 15, 3 }, 95 { SLOT_SIZE(14, 3), 14, 3 }, 96 { SLOT_SIZE(13, 3), 13, 3 }, 97 { SLOT_SIZE(12, 3), 12, 3 }, 98 { SLOT_SIZE(11, 3), 11, 3 }, 99 { SLOT_SIZE(10, 3), 10, 3 }, 100 { SLOT_SIZE(9, 3), 9, 3 }, 101 { SLOT_SIZE(7, 3), 7, 3 } > 102 }; 103 104 /** 105 * struct block_list - stores metadata of particular list 106 * lock: protects block_cache 107 * block_cache: blocks with free slots 108 * block_count: total number of blocks in the list 109 */ 110 struct block_list { 111 spinlock_t lock; 112 struct zblock_block *block_cache[BLOCK_CACHE_SIZE]; 113 unsigned long block_count; > 114 }; 115 116 /** 117 * struct zblock_pool - stores metadata for each zblock pool 118 * @block_lists: array of block lists 119 * @zpool: zpool driver 120 * @alloc_flag: protects block allocation from memory leak 121 * 122 * This structure is allocated at pool creation time and maintains metadata 123 * for a particular zblock pool. 124 */ 125 struct zblock_pool { 126 struct block_list block_lists[ARRAY_SIZE(block_desc)]; 127 struct zpool *zpool; 128 atomic_t alloc_flag; 129 }; 130 131 /***************** 132 * Helpers 133 *****************/ 134 135 static int cache_insert_block(struct zblock_block *block, struct block_list *list) 136 { 137 unsigned int i, min_free_slots = atomic_read(&block->free_slots); 138 int min_index = -1; 139 140 if (WARN_ON(block->cache_idx != -1)) 141 return -EINVAL; 142 143 min_free_slots = atomic_read(&block->free_slots); 144 for (i = 0; i < BLOCK_CACHE_SIZE; i++) { 145 if (!list->block_cache[i] || !atomic_read(&(list->block_cache[i])->free_slots)) { 146 min_index = i; 147 break; 148 } 149 if (atomic_read(&(list->block_cache[i])->free_slots) < min_free_slots) { 150 min_free_slots = atomic_read(&(list->block_cache[i])->free_slots); 151 min_index = i; 152 } 153 } 154 if (min_index >= 0) { 155 if (list->block_cache[min_index]) 156 (list->block_cache[min_index])->cache_idx = -1; 157 list->block_cache[min_index] = block; 158 block->cache_idx = min_index; 159 } 160 return min_index < 0 ? 
min_index : 0; 161 } 162 163 static struct zblock_block *cache_find_block(struct block_list *list) 164 { 165 int i; 166 struct zblock_block *z = NULL; 167 168 for (i = 0; i < BLOCK_CACHE_SIZE; i++) { 169 if (list->block_cache[i] && 170 atomic_dec_if_positive(&list->block_cache[i]->free_slots) >= 0) { 171 z = list->block_cache[i]; 172 break; 173 } 174 } 175 return z; 176 } 177 178 static int cache_remove_block(struct block_list *list, struct zblock_block *block) 179 { 180 int idx = block->cache_idx; 181 182 block->cache_idx = -1; 183 if (idx >= 0) 184 list->block_cache[idx] = NULL; 185 return idx < 0 ? idx : 0; 186 } 187 188 /* 189 * Encodes the handle of a particular slot in the pool using metadata 190 */ 191 static inline unsigned long metadata_to_handle(struct zblock_block *block, 192 unsigned int block_type, unsigned int slot) 193 { 194 return (unsigned long)(block) + (block_type << SLOT_BITS) + slot; 195 } 196 197 /* Returns block, block type and slot in the pool corresponding to handle */ 198 static inline struct zblock_block *handle_to_metadata(unsigned long handle, 199 unsigned int *block_type, unsigned int *slot) 200 { 201 *block_type = (handle & (PAGE_SIZE - 1)) >> SLOT_BITS; 202 *slot = handle & SLOT_MASK; 203 return (struct zblock_block *)(handle & PAGE_MASK); 204 } 205 206 207 /* 208 * allocate new block and add it to corresponding block list 209 */ 210 static struct zblock_block *alloc_block(struct zblock_pool *pool, 211 int block_type, gfp_t gfp, 212 unsigned long *handle) 213 { 214 struct zblock_block *block; 215 struct block_list *list; 216 217 block = (void *)__get_free_pages(gfp, block_desc[block_type].order); 218 if (!block) 219 return NULL; 220 221 list = &(pool->block_lists)[block_type]; 222 223 /* init block data */ 224 memset(&block->slot_info, 0, sizeof(block->slot_info)); 225 atomic_set(&block->free_slots, block_desc[block_type].slots_per_block - 1); 226 block->cache_idx = -1; 227 set_bit(BIT_SLOT_OCCUPIED, (unsigned long *)block->slot_info); 228 *handle = metadata_to_handle(block, block_type, 0); 229 230 spin_lock(&list->lock); 231 cache_insert_block(block, list); 232 list->block_count++; 233 spin_unlock(&list->lock); 234 return block; 235 } 236 237 /***************** 238 * API Functions 239 *****************/ 240 /** 241 * zblock_create_pool() - create a new zblock pool 242 * @gfp: gfp flags when allocating the zblock pool structure 243 * @ops: user-defined operations for the zblock pool 244 * 245 * Return: pointer to the new zblock pool or NULL if the metadata allocation 246 * failed. 247 */ 248 static struct zblock_pool *zblock_create_pool(gfp_t gfp) > 249 { 250 struct zblock_pool *pool; 251 struct block_list *list; 252 int i, j; 253 254 pool = kmalloc(sizeof(struct zblock_pool), gfp); 255 if (!pool) 256 return NULL; 257 258 /* init each block list */ 259 for (i = 0; i < ARRAY_SIZE(block_desc); i++) { 260 list = &(pool->block_lists)[i]; 261 spin_lock_init(&list->lock); 262 for (j = 0; j < BLOCK_CACHE_SIZE; j++) 263 list->block_cache[j] = NULL; 264 list->block_count = 0; 265 } 266 atomic_set(&pool->alloc_flag, 0); 267 return pool; 268 } 269
Hi Igor,

On Wed, Apr 02, 2025 at 10:45:25AM +0400, igor.b@beldev.am wrote:
> Hi Shakeel,
>
> 2025-04-02 03:16 Shakeel Butt wrote:
> > Hi Vitaly,
> >
> > On Tue, Apr 01, 2025 at 07:17:54PM +0200, Vitaly Wool wrote:
> > > zblock is a special purpose allocator for storing compressed pages.
> > > It stores integer number of compressed objects per its block. These
> > > blocks consist of several physical pages (2**n, i. e. 1/2/4/8).
> > >
> > > With zblock, it is possible to densely arrange objects of various
> > > sizes
> > > resulting in low internal fragmentation. Also this allocator tries to
> > > fill incomplete blocks instead of adding new ones, in many cases
> > > providing a compression ratio substantially higher than z3fold and
> > > zbud
> > > (though lower than zmalloc's).
> > >
> > > zblock does not require MMU
> >
> > Can you explain why not requiring MMU is important for your use-case?
> > Also what exactly is your use-case? Are you planning to use zblock
> > through zram or zswap or something new?
>
> We have 2 use cases for a zpool backend: data centers (zswap) and small
> MMU-less devices, which is where zram comes into play. We know that the
> zram-over-zpool patch is still out of tree but we also know the story
> behind that :)
>

So zswap and zram are the use-cases. It is usually frowned upon to add a
general purpose library without an actual user. So, I would recommend
adding a user along with zblock.

> > > to operate and also is superior to zsmalloc
> > > with regard to average performance and worst execution times, thus
> > > allowing for better response time and real-time characteristics of the
> > > whole system.
> > >
> > > E. g. on a series of stress-ng tests run on a Raspberry Pi 5, we get
> > > 5-10% higher value for bogo ops/s in zblock/zsmalloc comparison.
> >
> > Can you explain a bit more on this test? How is this test using zblock?
>
> We're running stress-ng on a number of devices. The results published in
> the previous message are from the run on Raspberry Pi 5. zpool's settings
> used for comparison are lz4/80%.
>

My question was mainly whether this is zswap or zram, and I think it is
zswap based on the stress-ng command. For zswap, I would recommend running
a perf experiment on a server (or beefy desktop) as well. Also pick a
normal server-like workload. I would be interested in looking at zswapin
latencies and zswapout throughput in addition to the metrics the workload
cares about.

> Thanks,
> Igor
>
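For the throughput side of Shakeel's request, a rough sketch of what can be
collected without extra tooling, assuming a recent kernel that exposes the
zswpin/zswpout counters in /proc/vmstat and the zswap debugfs statistics;
per-operation latency would additionally need tracing (for example kprobes on
the zswap store/load paths), which is not shown here:

  # swap activity counters, sampled before and after the workload
  grep -E 'zswpin|zswpout' /proc/vmstat

  # current compressed-pool footprint, for a rough density comparison
  cat /sys/kernel/debug/zswap/pool_total_size
  cat /sys/kernel/debug/zswap/stored_pages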
diff --git a/Documentation/mm/zblock.rst b/Documentation/mm/zblock.rst new file mode 100644 index 000000000000..754b3dbb9e94 --- /dev/null +++ b/Documentation/mm/zblock.rst @@ -0,0 +1,22 @@ +====== +zblock +====== + +zblock is a special purpose allocator for storing compressed pages. +It stores integer number of compressed objects per its block. These +blocks consist of several physical pages (2**n, i. e. 1/2/4/8). + +With zblock, it is possible to densely arrange objects of various sizes +resulting in low internal fragmentation. Also this allocator tries to +fill incomplete blocks instead of adding new ones, in many cases +providing a compression ratio substantially higher than z3fold and zbud +(though lower than zmalloc's). + +zblock does not require MMU to operate and also is superior to zsmalloc +with regard to average performance and worst execution times, thus +allowing for better response time and real-time characteristics of the +whole system. + +E. g. on a series of stress-ng tests run on a Raspberry Pi 5, we get +5-10% higher value for bogo ops/s in zblock/zsmalloc comparison. + diff --git a/MAINTAINERS b/MAINTAINERS index 991a33bad10e..166e9bfa04dc 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -26313,6 +26313,13 @@ F: Documentation/networking/device_drivers/hamradio/z8530drv.rst F: drivers/net/hamradio/*scc.c F: drivers/net/hamradio/z8530.h +ZBLOCK COMPRESSED SLAB MEMORY ALLOCATOR +M: Vitaly Wool <vitaly.wool@konsulko.se> +L: linux-mm@kvack.org +S: Maintained +F: Documentation/mm/zblock.rst +F: mm/zblock.c + ZBUD COMPRESSED PAGE ALLOCATOR M: Seth Jennings <sjenning@redhat.com> M: Dan Streetman <ddstreet@ieee.org> diff --git a/mm/Kconfig b/mm/Kconfig index 1b501db06417..26b79e3c1300 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -193,6 +193,14 @@ config Z3FOLD_DEPRECATED page. It is a ZBUD derivative so the simplicity and determinism are still there. +config ZBLOCK + tristate "Fast compression allocator with high density" + depends on ZPOOL + help + A special purpose allocator for storing compressed pages. + It is designed to store same size compressed pages in blocks of + physical pages. + config Z3FOLD tristate default y if Z3FOLD_DEPRECATED=y diff --git a/mm/Makefile b/mm/Makefile index 850386a67b3e..2018455b7baa 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -116,6 +116,7 @@ obj-$(CONFIG_ZPOOL) += zpool.o obj-$(CONFIG_ZBUD) += zbud.o obj-$(CONFIG_ZSMALLOC) += zsmalloc.o obj-$(CONFIG_Z3FOLD) += z3fold.o +obj-$(CONFIG_ZBLOCK) += zblock.o obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o obj-$(CONFIG_CMA) += cma.o obj-$(CONFIG_NUMA) += numa.o diff --git a/mm/zblock.c b/mm/zblock.c new file mode 100644 index 000000000000..a6778653c451 --- /dev/null +++ b/mm/zblock.c @@ -0,0 +1,492 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * zblock.c + * + * Author: Vitaly Wool <vitaly.wool@konsulko.com> + * Based on the work from Ananda Badmaev <a.badmaev@clicknet.pro> + * Copyright (C) 2022-2024, Konsulko AB. + * + * Zblock is a small object allocator with the intention to serve as a + * zpool backend. It operates on page blocks which consist of number + * of physical pages being a power of 2 and store integer number of + * compressed pages per block which results in determinism and simplicity. + * + * zblock doesn't export any API and is meant to be used via zpool API. 
+ */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/atomic.h> +#include <linux/mm.h> +#include <linux/module.h> +#include <linux/preempt.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/zpool.h> + +#define SLOT_FREE 0 +#define BIT_SLOT_OCCUPIED 0 +#define BIT_SLOT_MAPPED 1 + +#define SLOT_BITS 5 +#define MAX_SLOTS (1 << SLOT_BITS) +#define SLOT_MASK ((0x1UL << SLOT_BITS) - 1) + +#define ZBLOCK_HEADER_SIZE round_up(sizeof(struct zblock_block), sizeof(long)) +#define BLOCK_DATA_SIZE(order) ((PAGE_SIZE << order) - ZBLOCK_HEADER_SIZE) +#define SLOT_SIZE(nslots, order) (round_down((BLOCK_DATA_SIZE(order) / nslots), sizeof(long))) + +#define BLOCK_CACHE_SIZE 32 + +struct zblock_pool; + +/** + * struct zblock_block - block metadata + * Block consists of several (1/2/4/8) pages and contains fixed + * integer number of slots for allocating compressed pages. + * + * free_slots: number of free slots in the block + * slot_info: contains data about free/occupied slots + * cache_idx: index of the block in cache + */ +struct zblock_block { + atomic_t free_slots; + u64 slot_info[1]; + int cache_idx; +}; + +/** + * struct block_desc - general metadata for block lists + * Each block list stores only blocks of corresponding type which means + * that all blocks in it have the same number and size of slots. + * All slots are aligned to size of long. + * + * slot_size: size of slot for this list + * slots_per_block: number of slots per block for this list + * order: order for __get_free_pages + */ +static const struct block_desc { + const unsigned int slot_size; + const unsigned short slots_per_block; + const unsigned short order; +} block_desc[] = { + { SLOT_SIZE(32, 0), 32, 0 }, + { SLOT_SIZE(22, 0), 22, 0 }, + { SLOT_SIZE(17, 0), 17, 0 }, + { SLOT_SIZE(13, 0), 13, 0 }, + { SLOT_SIZE(11, 0), 11, 0 }, + { SLOT_SIZE(9, 0), 9, 0 }, + { SLOT_SIZE(8, 0), 8, 0 }, + { SLOT_SIZE(14, 1), 14, 1 }, + { SLOT_SIZE(12, 1), 12, 1 }, + { SLOT_SIZE(11, 1), 11, 1 }, + { SLOT_SIZE(10, 1), 10, 1 }, + { SLOT_SIZE(9, 1), 9, 1 }, + { SLOT_SIZE(8, 1), 8, 1 }, + { SLOT_SIZE(15, 2), 15, 2 }, + { SLOT_SIZE(14, 2), 14, 2 }, + { SLOT_SIZE(13, 2), 13, 2 }, + { SLOT_SIZE(12, 2), 12, 2 }, + { SLOT_SIZE(11, 2), 11, 2 }, + { SLOT_SIZE(10, 2), 10, 2 }, + { SLOT_SIZE(9, 2), 9, 2 }, + { SLOT_SIZE(8, 2), 8, 2 }, + { SLOT_SIZE(15, 3), 15, 3 }, + { SLOT_SIZE(14, 3), 14, 3 }, + { SLOT_SIZE(13, 3), 13, 3 }, + { SLOT_SIZE(12, 3), 12, 3 }, + { SLOT_SIZE(11, 3), 11, 3 }, + { SLOT_SIZE(10, 3), 10, 3 }, + { SLOT_SIZE(9, 3), 9, 3 }, + { SLOT_SIZE(7, 3), 7, 3 } +}; + +/** + * struct block_list - stores metadata of particular list + * lock: protects block_cache + * block_cache: blocks with free slots + * block_count: total number of blocks in the list + */ +struct block_list { + spinlock_t lock; + struct zblock_block *block_cache[BLOCK_CACHE_SIZE]; + unsigned long block_count; +}; + +/** + * struct zblock_pool - stores metadata for each zblock pool + * @block_lists: array of block lists + * @zpool: zpool driver + * @alloc_flag: protects block allocation from memory leak + * + * This structure is allocated at pool creation time and maintains metadata + * for a particular zblock pool. 
+ */ +struct zblock_pool { + struct block_list block_lists[ARRAY_SIZE(block_desc)]; + struct zpool *zpool; + atomic_t alloc_flag; +}; + +/***************** + * Helpers + *****************/ + +static int cache_insert_block(struct zblock_block *block, struct block_list *list) +{ + unsigned int i, min_free_slots = atomic_read(&block->free_slots); + int min_index = -1; + + if (WARN_ON(block->cache_idx != -1)) + return -EINVAL; + + min_free_slots = atomic_read(&block->free_slots); + for (i = 0; i < BLOCK_CACHE_SIZE; i++) { + if (!list->block_cache[i] || !atomic_read(&(list->block_cache[i])->free_slots)) { + min_index = i; + break; + } + if (atomic_read(&(list->block_cache[i])->free_slots) < min_free_slots) { + min_free_slots = atomic_read(&(list->block_cache[i])->free_slots); + min_index = i; + } + } + if (min_index >= 0) { + if (list->block_cache[min_index]) + (list->block_cache[min_index])->cache_idx = -1; + list->block_cache[min_index] = block; + block->cache_idx = min_index; + } + return min_index < 0 ? min_index : 0; +} + +static struct zblock_block *cache_find_block(struct block_list *list) +{ + int i; + struct zblock_block *z = NULL; + + for (i = 0; i < BLOCK_CACHE_SIZE; i++) { + if (list->block_cache[i] && + atomic_dec_if_positive(&list->block_cache[i]->free_slots) >= 0) { + z = list->block_cache[i]; + break; + } + } + return z; +} + +static int cache_remove_block(struct block_list *list, struct zblock_block *block) +{ + int idx = block->cache_idx; + + block->cache_idx = -1; + if (idx >= 0) + list->block_cache[idx] = NULL; + return idx < 0 ? idx : 0; +} + +/* + * Encodes the handle of a particular slot in the pool using metadata + */ +static inline unsigned long metadata_to_handle(struct zblock_block *block, + unsigned int block_type, unsigned int slot) +{ + return (unsigned long)(block) + (block_type << SLOT_BITS) + slot; +} + +/* Returns block, block type and slot in the pool corresponding to handle */ +static inline struct zblock_block *handle_to_metadata(unsigned long handle, + unsigned int *block_type, unsigned int *slot) +{ + *block_type = (handle & (PAGE_SIZE - 1)) >> SLOT_BITS; + *slot = handle & SLOT_MASK; + return (struct zblock_block *)(handle & PAGE_MASK); +} + + +/* + * allocate new block and add it to corresponding block list + */ +static struct zblock_block *alloc_block(struct zblock_pool *pool, + int block_type, gfp_t gfp, + unsigned long *handle) +{ + struct zblock_block *block; + struct block_list *list; + + block = (void *)__get_free_pages(gfp, block_desc[block_type].order); + if (!block) + return NULL; + + list = &(pool->block_lists)[block_type]; + + /* init block data */ + memset(&block->slot_info, 0, sizeof(block->slot_info)); + atomic_set(&block->free_slots, block_desc[block_type].slots_per_block - 1); + block->cache_idx = -1; + set_bit(BIT_SLOT_OCCUPIED, (unsigned long *)block->slot_info); + *handle = metadata_to_handle(block, block_type, 0); + + spin_lock(&list->lock); + cache_insert_block(block, list); + list->block_count++; + spin_unlock(&list->lock); + return block; +} + +/***************** + * API Functions + *****************/ +/** + * zblock_create_pool() - create a new zblock pool + * @gfp: gfp flags when allocating the zblock pool structure + * @ops: user-defined operations for the zblock pool + * + * Return: pointer to the new zblock pool or NULL if the metadata allocation + * failed. 
+ */ +static struct zblock_pool *zblock_create_pool(gfp_t gfp) +{ + struct zblock_pool *pool; + struct block_list *list; + int i, j; + + pool = kmalloc(sizeof(struct zblock_pool), gfp); + if (!pool) + return NULL; + + /* init each block list */ + for (i = 0; i < ARRAY_SIZE(block_desc); i++) { + list = &(pool->block_lists)[i]; + spin_lock_init(&list->lock); + for (j = 0; j < BLOCK_CACHE_SIZE; j++) + list->block_cache[j] = NULL; + list->block_count = 0; + } + atomic_set(&pool->alloc_flag, 0); + return pool; +} + +/** + * zblock_destroy_pool() - destroys an existing zblock pool + * @pool: the zblock pool to be destroyed + * + */ +static void zblock_destroy_pool(struct zblock_pool *pool) +{ + kfree(pool); +} + + +/** + * zblock_alloc() - allocates a slot of appropriate size + * @pool: zblock pool from which to allocate + * @size: size in bytes of the desired allocation + * @gfp: gfp flags used if the pool needs to grow + * @handle: handle of the new allocation + * + * Return: 0 if success and handle is set, otherwise -EINVAL if the size or + * gfp arguments are invalid or -ENOMEM if the pool was unable to allocate + * a new slot. + */ +static int zblock_alloc(struct zblock_pool *pool, size_t size, gfp_t gfp, + unsigned long *handle) +{ + unsigned int block_type, slot; + struct zblock_block *block; + struct block_list *list; + + if (!size) + return -EINVAL; + + if (size > PAGE_SIZE) + return -ENOSPC; + + /* find basic block type with suitable slot size */ + for (block_type = 0; block_type < ARRAY_SIZE(block_desc); block_type++) { + if (size <= block_desc[block_type].slot_size) + break; + } + list = &(pool->block_lists[block_type]); + +check: + /* check if there are free slots in cache */ + spin_lock(&list->lock); + block = cache_find_block(list); + spin_unlock(&list->lock); + if (block) + goto found; + + /* not found block with free slots try to allocate new empty block */ + if (atomic_cmpxchg(&pool->alloc_flag, 0, 1)) + goto check; + block = alloc_block(pool, block_type, gfp, handle); + atomic_set(&pool->alloc_flag, 0); + if (block) + return 0; + return -ENOMEM; + +found: + /* find the first free slot in block */ + for (slot = 0; slot < block_desc[block_type].slots_per_block; slot++) { + if (!test_and_set_bit(slot*2 + BIT_SLOT_OCCUPIED, + (unsigned long *)&block->slot_info)) + break; + } + *handle = metadata_to_handle(block, block_type, slot); + return 0; +} + +/** + * zblock_free() - frees the allocation associated with the given handle + * @pool: pool in which the allocation resided + * @handle: handle associated with the allocation returned by zblock_alloc() + * + */ +static void zblock_free(struct zblock_pool *pool, unsigned long handle) +{ + unsigned int slot, block_type; + struct zblock_block *block; + struct block_list *list; + + block = handle_to_metadata(handle, &block_type, &slot); + list = &(pool->block_lists[block_type]); + + spin_lock(&list->lock); + /* if all slots in block are empty delete whole block */ + if (atomic_inc_return(&block->free_slots) == block_desc[block_type].slots_per_block) { + list->block_count--; + cache_remove_block(list, block); + spin_unlock(&list->lock); + free_pages((unsigned long)block, block_desc[block_type].order); + return; + } + + if (atomic_read(&block->free_slots) < block_desc[block_type].slots_per_block/2 + && block->cache_idx == -1) + cache_insert_block(block, list); + spin_unlock(&list->lock); + + clear_bit(slot*2 + BIT_SLOT_OCCUPIED, (unsigned long *)block->slot_info); +} + +/** + * zblock_map() - maps the allocation associated with the given 
handle + * @pool: pool in which the allocation resides + * @handle: handle associated with the allocation to be mapped + * + * + * Returns: a pointer to the mapped allocation + */ +static void *zblock_map(struct zblock_pool *pool, unsigned long handle) +{ + unsigned int block_type, slot; + struct zblock_block *block; + void *p; + + block = handle_to_metadata(handle, &block_type, &slot); + p = (void *)block + ZBLOCK_HEADER_SIZE + slot * block_desc[block_type].slot_size; + return p; +} + +/** + * zblock_unmap() - unmaps the allocation associated with the given handle + * @pool: pool in which the allocation resides + * @handle: handle associated with the allocation to be unmapped + */ +static void zblock_unmap(struct zblock_pool *pool, unsigned long handle) +{ +} + +/** + * zblock_get_total_pages() - gets the zblock pool size in pages + * @pool: pool being queried + * + * Returns: size in bytes of the given pool. + */ +static u64 zblock_get_total_pages(struct zblock_pool *pool) +{ + u64 total_size; + int i; + + total_size = 0; + for (i = 0; i < ARRAY_SIZE(block_desc); i++) + total_size += pool->block_lists[i].block_count << block_desc[i].order; + + return total_size; +} + +/***************** + * zpool + ****************/ + +static void *zblock_zpool_create(const char *name, gfp_t gfp) +{ + return zblock_create_pool(gfp); +} + +static void zblock_zpool_destroy(void *pool) +{ + zblock_destroy_pool(pool); +} + +static int zblock_zpool_malloc(void *pool, size_t size, gfp_t gfp, + unsigned long *handle) +{ + return zblock_alloc(pool, size, gfp, handle); +} + +static void zblock_zpool_free(void *pool, unsigned long handle) +{ + zblock_free(pool, handle); +} + +static void *zblock_zpool_map(void *pool, unsigned long handle, + enum zpool_mapmode mm) +{ + return zblock_map(pool, handle); +} + +static void zblock_zpool_unmap(void *pool, unsigned long handle) +{ + zblock_unmap(pool, handle); +} + +static u64 zblock_zpool_total_pages(void *pool) +{ + return zblock_get_total_pages(pool); +} + +static struct zpool_driver zblock_zpool_driver = { + .type = "zblock", + .owner = THIS_MODULE, + .create = zblock_zpool_create, + .destroy = zblock_zpool_destroy, + .malloc = zblock_zpool_malloc, + .free = zblock_zpool_free, + .map = zblock_zpool_map, + .unmap = zblock_zpool_unmap, + .total_pages = zblock_zpool_total_pages, +}; + +MODULE_ALIAS("zpool-zblock"); + +static int __init init_zblock(void) +{ + pr_info("loaded\n"); + zpool_register_driver(&zblock_zpool_driver); + return 0; +} + +static void __exit exit_zblock(void) +{ + zpool_unregister_driver(&zblock_zpool_driver); + pr_info("unloaded\n"); +} + +module_init(init_zblock); +module_exit(exit_zblock); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Vitaly Wool <vitaly.wool@konsulko.com>"); +MODULE_DESCRIPTION("Block allocator for compressed pages");