diff mbox series

[PATCHv2,3/6] zsmalloc: fine-grained inuse ratio based fullness grouping

Message ID 20230223030451.543162-4-senozhatsky@chromium.org (mailing list archive)
State New
Headers show
Series zsmalloc: fine-grained fullness and new compaction algorithm | expand

Commit Message

Sergey Senozhatsky Feb. 23, 2023, 3:04 a.m. UTC
Each zspage maintains ->inuse counter which keeps track of the
number of objects stored in the page. The ->inuse counter also
determines the page's "fullness group" which is calculated as
the ratio of the "inuse" objects to the total number of objects
the page can hold (objs_per_zspage). The closer the ->inuse
counter is to objs_per_zspage, the better.

Each size class maintains several fullness lists, that keep
track of zspages of particular "fullness". Pages within each
fullness list are stored in random order with regard to the
->inuse counter. This is because sorting the pages by ->inuse
counter each time obj_malloc() or obj_free() is called would
be too expensive. However, the ->inuse counter is still a
crucial factor in many situations.

For the two major zsmalloc operations, zs_malloc() and zs_compact(),
we typically select the head page from the corresponding fullness
list as the best candidate page. However, this assumption is not
always accurate.

For the zs_malloc() operation, the optimal candidate page should
have the highest ->inuse counter. This is because the goal is to
maximize the number of ZS_FULL pages and make full use of all
allocated memory.

For the zs_compact() operation, the optimal candidate page should
have the lowest ->inuse counter. This is because compaction needs
to move objects in use to another page before it can release the
zspage and return its physical pages to the buddy allocator. The
fewer objects in use, the quicker compaction can release the page.
Additionally, compaction is measured by the number of pages it
releases.

This patch reworks the fullness grouping mechanism. Instead of
having two groups - ZS_ALMOST_EMPTY (usage ratio below 3/4) and
ZS_ALMOST_FULL (usage ration above 3/4) - that result in too many
pages being included in the ALMOST_EMPTY group for specific
classes, size classes maintain a larger number of fullness lists
that give strict guarantees on the minimum and maximum ->inuse
values within each group. Each group represents a 10% change in the
->inuse ratio compared to neighboring groups. In essence, there
are groups for pages with 0%, 10%, 20% usage ratios, and so on,
up to 100%.

This enhances the selection of candidate pages for both zs_malloc()
and zs_compact(). A printout of the ->inuse counters of the first 7
pages per (random) class fullness group:

 class-768 objs_per_zspage 16:
   fullness 100%:  empty
   fullness  99%:  empty
   fullness  90%:  empty
   fullness  80%:  empty
   fullness  70%:  empty
   fullness  60%:  8  8  9  9  8  8  8
   fullness  50%:  empty
   fullness  40%:  5  5  6  5  5  5  5
   fullness  30%:  4  4  4  4  4  4  4
   fullness  20%:  2  3  2  3  3  2  2
   fullness  10%:  1  1  1  1  1  1  1
   fullness   0%:  empty

The zs_malloc() function searches through the groups of pages
starting with the one having the highest usage ratio. This means
that it always selects a page from the group with the least
internal fragmentation (highest usage ratio) and makes it even
less fragmented by increasing its usage ratio.

The zs_compact() function, on the other hand, begins by scanning
the group with the highest fragmentation (lowest usage ratio) to
locate the source page. The first available page is selected, and
then the function moves downward to find a destination page in
the group with the lowest internal fragmentation (highest usage
ratio).

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 mm/zsmalloc.c | 183 +++++++++++++++++++++++++++++---------------------
 1 file changed, 107 insertions(+), 76 deletions(-)

Comments

Minchan Kim Feb. 23, 2023, 11:27 p.m. UTC | #1
On Thu, Feb 23, 2023 at 12:04:48PM +0900, Sergey Senozhatsky wrote:
> Each zspage maintains ->inuse counter which keeps track of the
> number of objects stored in the page. The ->inuse counter also
> determines the page's "fullness group" which is calculated as

                zspage's

> the ratio of the "inuse" objects to the total number of objects
> the page can hold (objs_per_zspage). The closer the ->inuse

      zspage

> counter is to objs_per_zspage, the better.
> 
> Each size class maintains several fullness lists, that keep
> track of zspages of particular "fullness". Pages within each
> fullness list are stored in random order with regard to the
> ->inuse counter. This is because sorting the pages by ->inuse
> counter each time obj_malloc() or obj_free() is called would
> be too expensive. However, the ->inuse counter is still a
> crucial factor in many situations.
> 
> For the two major zsmalloc operations, zs_malloc() and zs_compact(),
> we typically select the head page from the corresponding fullness
> list as the best candidate page. However, this assumption is not
> always accurate.
> 
> For the zs_malloc() operation, the optimal candidate page should
> have the highest ->inuse counter. This is because the goal is to
> maximize the number of ZS_FULL pages and make full use of all
> allocated memory.
> 
> For the zs_compact() operation, the optimal candidate page should

                                                            as source page

> have the lowest ->inuse counter. This is because compaction needs
> to move objects in use to another page before it can release the
> zspage and return its physical pages to the buddy allocator. The
> fewer objects in use, the quicker compaction can release the page.
> Additionally, compaction is measured by the number of pages it
> releases.
> 
> This patch reworks the fullness grouping mechanism. Instead of
> having two groups - ZS_ALMOST_EMPTY (usage ratio below 3/4) and
> ZS_ALMOST_FULL (usage ration above 3/4) - that result in too many
> pages being included in the ALMOST_EMPTY group for specific
> classes, size classes maintain a larger number of fullness lists
> that give strict guarantees on the minimum and maximum ->inuse
> values within each group. Each group represents a 10% change in the
> ->inuse ratio compared to neighboring groups. In essence, there
> are groups for pages with 0%, 10%, 20% usage ratios, and so on,
> up to 100%.
> 
> This enhances the selection of candidate pages for both zs_malloc()
> and zs_compact(). A printout of the ->inuse counters of the first 7
> pages per (random) class fullness group:
> 
>  class-768 objs_per_zspage 16:
>    fullness 100%:  empty
>    fullness  99%:  empty
>    fullness  90%:  empty
>    fullness  80%:  empty
>    fullness  70%:  empty
>    fullness  60%:  8  8  9  9  8  8  8
>    fullness  50%:  empty
>    fullness  40%:  5  5  6  5  5  5  5
>    fullness  30%:  4  4  4  4  4  4  4
>    fullness  20%:  2  3  2  3  3  2  2
>    fullness  10%:  1  1  1  1  1  1  1
>    fullness   0%:  empty
> 
> The zs_malloc() function searches through the groups of pages
> starting with the one having the highest usage ratio. This means
> that it always selects a page from the group with the least
> internal fragmentation (highest usage ratio) and makes it even
> less fragmented by increasing its usage ratio.
> 
> The zs_compact() function, on the other hand, begins by scanning
> the group with the highest fragmentation (lowest usage ratio) to
> locate the source page. The first available page is selected, and
> then the function moves downward to find a destination page in
> the group with the lowest internal fragmentation (highest usage
> ratio).

That's nice! I just have small nits below.

> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  mm/zsmalloc.c | 183 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 107 insertions(+), 76 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 38ae8963c0eb..1a92ebe338eb 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -127,7 +127,7 @@
>  #define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
>  
>  #define HUGE_BITS	1
> -#define FULLNESS_BITS	2
> +#define FULLNESS_BITS	4
>  #define CLASS_BITS	8
>  #define ISOLATED_BITS	5
>  #define MAGIC_VAL_BITS	8
> @@ -159,15 +159,33 @@
>  #define ZS_SIZE_CLASSES	(DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE, \
>  				      ZS_SIZE_CLASS_DELTA) + 1)
>  
> -#define ZS_EMPTY		0
> -#define ZS_ALMOST_EMPTY		1
> -#define ZS_ALMOST_FULL		2
> -#define ZS_FULL			3
> -#define ZS_OBJS_ALLOCATED	4
> -#define ZS_OBJS_INUSE		5
> -
> -#define NR_ZS_STAT		6
> -#define NR_ZS_FULLNESS		4
> +/*
> + * Pages are distinguished by the ratio of used memory (that is the ratio
> + * of ->inuse objects to all objects that page can store). For example,
> + * INUSE_RATIO_30 means that the ratio of used objects is > 20% and <= 30%.
> + *
> + * The number of fullness groups is not random. It allows us to keep
> + * diffeence between the least busy page in the group (minimum permitted
> + * number of ->inuse objects) and the most busy page (maximum permitted
> + * number of ->inuse objects) at a reasonable value.
> + */
> +#define ZS_INUSE_RATIO_0	0

How about keeping ZS_EMPTY and ZS_FULL since they are used
multiple places in source code? It would have less churning.

> +#define ZS_INUSE_RATIO_10	1
> +#define ZS_INUSE_RATIO_20	2
> +#define ZS_INUSE_RATIO_30	3
> +#define ZS_INUSE_RATIO_40	4
> +#define ZS_INUSE_RATIO_50	5
> +#define ZS_INUSE_RATIO_60	6
> +#define ZS_INUSE_RATIO_70	7
> +#define ZS_INUSE_RATIO_80	8
> +#define ZS_INUSE_RATIO_90	9
> +#define ZS_INUSE_RATIO_99	10

Do we really need all the define macro for the range from 10 to 99?
Can't we do this?

enum class_stat_type {
    ZS_EMPTY,
    /*
     * There are fullness buckets between 10% - 99%.
     */
    ZS_FULL = 11
    NR_ZS_FULLNESS,
    ZS_OBJ_ALLOCATED = NR_ZS_FULLNESS,
    ZS_OBJ_USED,
    NR_ZS_STAT,
}

> +#define ZS_INUSE_RATIO_100	11


> +#define ZS_OBJS_ALLOCATED	12
> +#define ZS_OBJS_INUSE		13
> +
> +#define NR_ZS_INUSE_RATIO	12
> +#define NR_ZS_STAT		14
>  
>  struct zs_size_stat {
>  	unsigned long objs[NR_ZS_STAT];
> @@ -177,25 +195,10 @@ struct zs_size_stat {
>  static struct dentry *zs_stat_root;
>  #endif
>  
> -/*
> - * We assign a page to ZS_ALMOST_EMPTY fullness group when:
> - *	n <= N / f, where
> - * n = number of allocated objects
> - * N = total number of objects zspage can store
> - * f = fullness_threshold_frac
> - *
> - * Similarly, we assign zspage to:
> - *	ZS_ALMOST_FULL	when n > N / f
> - *	ZS_EMPTY	when n == 0
> - *	ZS_FULL		when n == N
> - *
> - * (see: fix_fullness_group())
> - */
> -static const int fullness_threshold_frac = 4;
>  static size_t huge_class_size;
>  
>  struct size_class {
> -	struct list_head fullness_list[NR_ZS_FULLNESS];
> +	struct list_head fullness_list[NR_ZS_INUSE_RATIO];

With the trick with enum, we don't have this change.

>  	/*
>  	 * Size of objects stored in this class. Must be multiple
>  	 * of ZS_ALIGN.
> @@ -641,8 +644,23 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
>  			continue;
>  
>  		spin_lock(&pool->lock);
> -		class_almost_full = zs_stat_get(class, ZS_ALMOST_FULL);
> -		class_almost_empty = zs_stat_get(class, ZS_ALMOST_EMPTY);
> +
> +		/*
> +		 * Replecate old behaviour for almost_full and almost_empty
> +		 * stats.
> +		 */
> +		class_almost_full = zs_stat_get(class, ZS_INUSE_RATIO_99);
> +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_90);
> +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_80);
> +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_70);

> +
> +		class_almost_empty = zs_stat_get(class, ZS_INUSE_RATIO_60);
> +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_50);
> +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_40);
> +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_30);
> +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_20);
> +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_10);

I guess you can use just loop here from 1 to 6

And then from 7 to 10 for class_almost_full.
 
> +
>  		obj_allocated = zs_stat_get(class, ZS_OBJS_ALLOCATED);
>  		obj_used = zs_stat_get(class, ZS_OBJS_INUSE);
>  		freeable = zs_can_compact(class);
> @@ -712,32 +730,30 @@ static inline void zs_pool_stat_destroy(struct zs_pool *pool)
>  }
>  #endif
>  
> -
>  /*
>   * For each size class, zspages are divided into different groups
> - * depending on how "full" they are. This was done so that we could
> - * easily find empty or nearly empty zspages when we try to shrink
> - * the pool (not yet implemented). This function returns fullness
> + * depending on their usage ratio. This function returns fullness
>   * status of the given page.
>   */
>  static int get_fullness_group(struct size_class *class, struct zspage *zspage)
>  {
> -	int inuse, objs_per_zspage;
> -	int fg;
> +	int inuse, objs_per_zspage, ratio;
>  
>  	inuse = get_zspage_inuse(zspage);
>  	objs_per_zspage = class->objs_per_zspage;
>  
>  	if (inuse == 0)
> -		fg = ZS_EMPTY;
> -	else if (inuse == objs_per_zspage)
> -		fg = ZS_FULL;
> -	else if (inuse <= 3 * objs_per_zspage / fullness_threshold_frac)
> -		fg = ZS_ALMOST_EMPTY;
> -	else
> -		fg = ZS_ALMOST_FULL;
> +		return ZS_INUSE_RATIO_0;
> +	if (inuse == objs_per_zspage)
> +		return ZS_INUSE_RATIO_100;
>  
> -	return fg;
> +	ratio = 100 * inuse / objs_per_zspage;
> +	/*
> +	 * Take integer division into consideration: a page with one inuse
> +	 * object out of 127 possible, will endup having 0 usage ratio,
> +	 * which is wrong as it belongs in ZS_INUSE_RATIO_10 fullness group.
> +	 */
> +	return ratio / 10 + 1;
>  }
>  
>  /*
> @@ -769,11 +785,11 @@ static void remove_zspage(struct size_class *class,
>  /*
>   * Each size class maintains zspages in different fullness groups depending
>   * on the number of live objects they contain. When allocating or freeing
> - * objects, the fullness status of the page can change, say, from ALMOST_FULL
> - * to ALMOST_EMPTY when freeing an object. This function checks if such
> - * a status change has occurred for the given page and accordingly moves the
> - * page from the freelist of the old fullness group to that of the new
> - * fullness group.
> + * objects, the fullness status of the page can change, for instance, from
> + * INUSE_RATIO_80 to INUSE_RATIO_70 when freeing an object. This function
> + * checks if such a status change has occurred for the given page and
> + * accordingly moves the page from the list of the old fullness group to that
> + * of the new fullness group.
>   */
>  static int fix_fullness_group(struct size_class *class, struct zspage *zspage)
>  {
> @@ -959,7 +975,7 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
>  	assert_spin_locked(&pool->lock);
>  
>  	VM_BUG_ON(get_zspage_inuse(zspage));
> -	VM_BUG_ON(fg != ZS_EMPTY);
> +	VM_BUG_ON(fg != ZS_INUSE_RATIO_0);
>  
>  	/* Free all deferred handles from zs_free */
>  	free_handles(pool, class, zspage);
> @@ -998,7 +1014,7 @@ static void free_zspage(struct zs_pool *pool, struct size_class *class,
>  		return;
>  	}
>  
> -	remove_zspage(class, zspage, ZS_EMPTY);
> +	remove_zspage(class, zspage, ZS_INUSE_RATIO_0);
>  #ifdef CONFIG_ZPOOL
>  	list_del(&zspage->lru);
>  #endif
> @@ -1134,9 +1150,9 @@ static struct zspage *find_get_zspage(struct size_class *class)
>  	int i;
>  	struct zspage *zspage;
>  
> -	for (i = ZS_ALMOST_FULL; i >= ZS_EMPTY; i--) {
> +	for (i = ZS_INUSE_RATIO_99; i >= ZS_INUSE_RATIO_0; i--) {
>  		zspage = list_first_entry_or_null(&class->fullness_list[i],
> -				struct zspage, list);
> +						  struct zspage, list);
>  		if (zspage)
>  			break;
>  	}
> @@ -1629,7 +1645,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
>  	obj_free(class->size, obj, NULL);
>  
>  	fullness = fix_fullness_group(class, zspage);
> -	if (fullness == ZS_EMPTY)
> +	if (fullness == ZS_INUSE_RATIO_0)
>  		free_zspage(pool, class, zspage);
>  
>  	spin_unlock(&pool->lock);
> @@ -1811,22 +1827,33 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  	return ret;
>  }
>  
> -static struct zspage *isolate_zspage(struct size_class *class, bool source)
> +static struct zspage *isolate_src_zspage(struct size_class *class)
>  {
> -	int i;
>  	struct zspage *zspage;
> -	int fg[2] = {ZS_ALMOST_EMPTY, ZS_ALMOST_FULL};
> +	int fg;
>  
> -	if (!source) {
> -		fg[0] = ZS_ALMOST_FULL;
> -		fg[1] = ZS_ALMOST_EMPTY;
> +	for (fg = ZS_INUSE_RATIO_10; fg <= ZS_INUSE_RATIO_99; fg++) {
> +		zspage = list_first_entry_or_null(&class->fullness_list[fg],
> +						  struct zspage, list);
> +		if (zspage) {
> +			remove_zspage(class, zspage, fg);
> +			return zspage;
> +		}
>  	}
>  
> -	for (i = 0; i < 2; i++) {
> -		zspage = list_first_entry_or_null(&class->fullness_list[fg[i]],
> -							struct zspage, list);
> +	return zspage;
> +}
> +
> +static struct zspage *isolate_dst_zspage(struct size_class *class)
> +{
> +	struct zspage *zspage;
> +	int fg;
> +
> +	for (fg = ZS_INUSE_RATIO_99; fg >= ZS_INUSE_RATIO_10; fg--) {
> +		zspage = list_first_entry_or_null(&class->fullness_list[fg],
> +						  struct zspage, list);
>  		if (zspage) {
> -			remove_zspage(class, zspage, fg[i]);
> +			remove_zspage(class, zspage, fg);
>  			return zspage;
>  		}
>  	}
> @@ -2119,7 +2146,7 @@ static void async_free_zspage(struct work_struct *work)
>  			continue;
>  
>  		spin_lock(&pool->lock);
> -		list_splice_init(&class->fullness_list[ZS_EMPTY], &free_pages);
> +		list_splice_init(&class->fullness_list[ZS_INUSE_RATIO_0], &free_pages);
>  		spin_unlock(&pool->lock);
>  	}
>  
> @@ -2128,7 +2155,7 @@ static void async_free_zspage(struct work_struct *work)
>  		lock_zspage(zspage);
>  
>  		get_zspage_mapping(zspage, &class_idx, &fullness);
> -		VM_BUG_ON(fullness != ZS_EMPTY);
> +		VM_BUG_ON(fullness != ZS_INUSE_RATIO_0);
>  		class = pool->size_class[class_idx];
>  		spin_lock(&pool->lock);
>  #ifdef CONFIG_ZPOOL
> @@ -2201,7 +2228,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
>  	 * as well as zpage allocation/free
>  	 */
>  	spin_lock(&pool->lock);
> -	while ((src_zspage = isolate_zspage(class, true))) {
> +	while ((src_zspage = isolate_src_zspage(class))) {
>  		/* protect someone accessing the zspage(i.e., zs_map_object) */
>  		migrate_write_lock(src_zspage);
>  
> @@ -2211,7 +2238,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
>  		cc.obj_idx = 0;
>  		cc.s_page = get_first_page(src_zspage);
>  
> -		while ((dst_zspage = isolate_zspage(class, false))) {
> +		while ((dst_zspage = isolate_dst_zspage(class))) {
>  			migrate_write_lock_nested(dst_zspage);
>  
>  			cc.d_page = get_first_page(dst_zspage);
> @@ -2236,7 +2263,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
>  		putback_zspage(class, dst_zspage);
>  		migrate_write_unlock(dst_zspage);
>  
> -		if (putback_zspage(class, src_zspage) == ZS_EMPTY) {
> +		if (putback_zspage(class, src_zspage) == ZS_INUSE_RATIO_0) {
>  			migrate_write_unlock(src_zspage);
>  			free_zspage(pool, class, src_zspage);
>  			pages_freed += class->pages_per_zspage;
> @@ -2394,7 +2421,7 @@ struct zs_pool *zs_create_pool(const char *name)
>  		int pages_per_zspage;
>  		int objs_per_zspage;
>  		struct size_class *class;
> -		int fullness = 0;
> +		int fullness;
>  
>  		size = ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA;
>  		if (size > ZS_MAX_ALLOC_SIZE)
> @@ -2448,9 +2475,12 @@ struct zs_pool *zs_create_pool(const char *name)
>  		class->pages_per_zspage = pages_per_zspage;
>  		class->objs_per_zspage = objs_per_zspage;
>  		pool->size_class[i] = class;
> -		for (fullness = ZS_EMPTY; fullness < NR_ZS_FULLNESS;
> -							fullness++)
> +
> +		fullness = ZS_INUSE_RATIO_0;
> +		while (fullness < NR_ZS_INUSE_RATIO) {
>  			INIT_LIST_HEAD(&class->fullness_list[fullness]);
> +			fullness++;
> +		}
>  
>  		prev_class = class;
>  	}
> @@ -2496,11 +2526,12 @@ void zs_destroy_pool(struct zs_pool *pool)
>  		if (class->index != i)
>  			continue;
>  
> -		for (fg = ZS_EMPTY; fg < NR_ZS_FULLNESS; fg++) {
> -			if (!list_empty(&class->fullness_list[fg])) {
> -				pr_info("Freeing non-empty class with size %db, fullness group %d\n",
> -					class->size, fg);
> -			}
> +		for (fg = ZS_INUSE_RATIO_0; fg < NR_ZS_INUSE_RATIO; fg++) {
> +			if (list_empty(&class->fullness_list[fg]))
> +				continue;
> +
> +			pr_err("Class-%d fullness group %d is not empty\n",
> +			       class->size, fg);
>  		}
>  		kfree(class);
>  	}
> @@ -2672,7 +2703,7 @@ static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries)
>  			 * while the page is removed from the pool. Fix it
>  			 * up for the check in __free_zspage().
>  			 */
> -			zspage->fullness = ZS_EMPTY;
> +			zspage->fullness = ZS_INUSE_RATIO_0;
>  
>  			__free_zspage(pool, class, zspage);
>  			spin_unlock(&pool->lock);
> -- 
> 2.39.2.637.g21b0678d19-goog
>
Sergey Senozhatsky Feb. 26, 2023, 4:38 a.m. UTC | #2
On (23/02/23 15:27), Minchan Kim wrote:
> > + * Pages are distinguished by the ratio of used memory (that is the ratio
> > + * of ->inuse objects to all objects that page can store). For example,
> > + * INUSE_RATIO_30 means that the ratio of used objects is > 20% and <= 30%.
> > + *
> > + * The number of fullness groups is not random. It allows us to keep
> > + * diffeence between the least busy page in the group (minimum permitted
> > + * number of ->inuse objects) and the most busy page (maximum permitted
> > + * number of ->inuse objects) at a reasonable value.
> > + */
> > +#define ZS_INUSE_RATIO_0	0
> 
> How about keeping ZS_EMPTY and ZS_FULL since they are used
> multiple places in source code? It would have less churning.

I have to admit that I sort of like the unified naming
	"zspage inuse ratio goes from 0 to 100"

but I can keep ZS_EMPTY / ZS_FULL as two "special" inuse values.

> > +#define ZS_INUSE_RATIO_10	1
> > +#define ZS_INUSE_RATIO_20	2
> > +#define ZS_INUSE_RATIO_30	3
> > +#define ZS_INUSE_RATIO_40	4
> > +#define ZS_INUSE_RATIO_50	5
> > +#define ZS_INUSE_RATIO_60	6
> > +#define ZS_INUSE_RATIO_70	7
> > +#define ZS_INUSE_RATIO_80	8
> > +#define ZS_INUSE_RATIO_90	9
> > +#define ZS_INUSE_RATIO_99	10
> 
> Do we really need all the define macro for the range from 10 to 99?
> Can't we do this?
> 
> enum class_stat_type {
>     ZS_EMPTY,
>     /*
>      * There are fullness buckets between 10% - 99%.
>      */
>     ZS_FULL = 11
>     NR_ZS_FULLNESS,
>     ZS_OBJ_ALLOCATED = NR_ZS_FULLNESS,
>     ZS_OBJ_USED,
>     NR_ZS_STAT,
> }

This creates undocumented secret constats, which are being heavily
used (zspage fullness values, indeces in fullness lists arrays,
stats array offsets, etc.) but have no trace in the code. And this
also forces us to use magic number in the code. So should fullness
grouping change, things like, for instance, zs_stat_get(7), will
compile just fine yet will do something very different and we will
have someone to spot the regression.

So yes, it's 10 lines of defines, it's not even 10 lines of code, but
 1) it is documentation, we keep constats documented
 2) more importantly, it protects us from regressions and bugs

From maintinability point of view, having everything excpliticly
documented / spelled out is a win.

As of why I decided to go with defines, this is because zspage fullness
values and class stats are two conceptually different things, they don't
really fit in one single enum, unless enum's name is "zs_constants".
What do you think?

[..]
> >  	 * Size of objects stored in this class. Must be multiple
> >  	 * of ZS_ALIGN.
> > @@ -641,8 +644,23 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
> >  			continue;
> >  
> >  		spin_lock(&pool->lock);
> > -		class_almost_full = zs_stat_get(class, ZS_ALMOST_FULL);
> > -		class_almost_empty = zs_stat_get(class, ZS_ALMOST_EMPTY);
> > +
> > +		/*
> > +		 * Replecate old behaviour for almost_full and almost_empty
> > +		 * stats.
> > +		 */
> > +		class_almost_full = zs_stat_get(class, ZS_INUSE_RATIO_99);
> > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_90);
> > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_80);
> > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_70);
> 
> > +
> > +		class_almost_empty = zs_stat_get(class, ZS_INUSE_RATIO_60);
> > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_50);
> > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_40);
> > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_30);
> > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_20);
> > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_10);
> 
> I guess you can use just loop here from 1 to 6
> 
> And then from 7 to 10 for class_almost_full.

I can change it to

	for (r = ZS_INUSE_RATIO_10; r <= ZS_INUSE_RATIO_70; r++)
and
	for (r = ZS_INUSE_RATIO_80; r <= ZS_INUSE_RATIO_99; r++)

which would be safer than using hard-coded numbers.

Shall we actually instead report per inuse ratio stats instead? I sort
of don't see too many reasons to keep that below/above 3/4 thing.
Minchan Kim Feb. 28, 2023, 10:53 p.m. UTC | #3
On Sun, Feb 26, 2023 at 01:38:22PM +0900, Sergey Senozhatsky wrote:
> On (23/02/23 15:27), Minchan Kim wrote:
> > > + * Pages are distinguished by the ratio of used memory (that is the ratio
> > > + * of ->inuse objects to all objects that page can store). For example,
> > > + * INUSE_RATIO_30 means that the ratio of used objects is > 20% and <= 30%.
> > > + *
> > > + * The number of fullness groups is not random. It allows us to keep
> > > + * diffeence between the least busy page in the group (minimum permitted
> > > + * number of ->inuse objects) and the most busy page (maximum permitted
> > > + * number of ->inuse objects) at a reasonable value.
> > > + */
> > > +#define ZS_INUSE_RATIO_0	0
> > 
> > How about keeping ZS_EMPTY and ZS_FULL since they are used
> > multiple places in source code? It would have less churning.
> 
> I have to admit that I sort of like the unified naming
> 	"zspage inuse ratio goes from 0 to 100"
> 
> but I can keep ZS_EMPTY / ZS_FULL as two "special" inuse values.
> 
> > > +#define ZS_INUSE_RATIO_10	1
> > > +#define ZS_INUSE_RATIO_20	2
> > > +#define ZS_INUSE_RATIO_30	3
> > > +#define ZS_INUSE_RATIO_40	4
> > > +#define ZS_INUSE_RATIO_50	5
> > > +#define ZS_INUSE_RATIO_60	6
> > > +#define ZS_INUSE_RATIO_70	7
> > > +#define ZS_INUSE_RATIO_80	8
> > > +#define ZS_INUSE_RATIO_90	9
> > > +#define ZS_INUSE_RATIO_99	10
> > 
> > Do we really need all the define macro for the range from 10 to 99?
> > Can't we do this?
> > 
> > enum class_stat_type {
> >     ZS_EMPTY,
> >     /*
> >      * There are fullness buckets between 10% - 99%.
> >      */
> >     ZS_FULL = 11
> >     NR_ZS_FULLNESS,
> >     ZS_OBJ_ALLOCATED = NR_ZS_FULLNESS,
> >     ZS_OBJ_USED,
> >     NR_ZS_STAT,
> > }
> 
> This creates undocumented secret constats, which are being heavily
> used (zspage fullness values, indeces in fullness lists arrays,
> stats array offsets, etc.) but have no trace in the code. And this
> also forces us to use magic number in the code. So should fullness
> grouping change, things like, for instance, zs_stat_get(7), will
> compile just fine yet will do something very different and we will
> have someone to spot the regression.
> 
> So yes, it's 10 lines of defines, it's not even 10 lines of code, but
>  1) it is documentation, we keep constats documented
>  2) more importantly, it protects us from regressions and bugs
> 
> From maintinability point of view, having everything excpliticly
> documented / spelled out is a win.
> 
> As of why I decided to go with defines, this is because zspage fullness
> values and class stats are two conceptually different things, they don't
> really fit in one single enum, unless enum's name is "zs_constants".
> What do you think?

Agree. We don't need to combine them, then. 
BTW, I still prefer the enum instead of 10 define.

enum fullness_group {
    ZS_EMPTY,
    ZS_INUSE_RATIO_MIN,
    ZS_INUSE_RATIO_ALMOST_FULL = 7,
    ZS_INUSE_RATIO_MAX = 10,
    ZS_FULL,
    NR_ZS_FULLNESS,
}

> 
> [..]
> > >  	 * Size of objects stored in this class. Must be multiple
> > >  	 * of ZS_ALIGN.
> > > @@ -641,8 +644,23 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
> > >  			continue;
> > >  
> > >  		spin_lock(&pool->lock);
> > > -		class_almost_full = zs_stat_get(class, ZS_ALMOST_FULL);
> > > -		class_almost_empty = zs_stat_get(class, ZS_ALMOST_EMPTY);
> > > +
> > > +		/*
> > > +		 * Replecate old behaviour for almost_full and almost_empty
> > > +		 * stats.
> > > +		 */
> > > +		class_almost_full = zs_stat_get(class, ZS_INUSE_RATIO_99);
> > > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_90);
> > > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_80);
> > > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_70);
> > 
> > > +
> > > +		class_almost_empty = zs_stat_get(class, ZS_INUSE_RATIO_60);
> > > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_50);
> > > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_40);
> > > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_30);
> > > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_20);
> > > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_10);
> > 
> > I guess you can use just loop here from 1 to 6
> > 
> > And then from 7 to 10 for class_almost_full.
> 
> I can change it to
> 
> 	for (r = ZS_INUSE_RATIO_10; r <= ZS_INUSE_RATIO_70; r++)
> and
> 	for (r = ZS_INUSE_RATIO_80; r <= ZS_INUSE_RATIO_99; r++)
> 
> which would be safer than using hard-coded numbers.

I didn't mean to have hard code either but just wanted to show
the intention to use the loop.

> 
> Shall we actually instead report per inuse ratio stats instead? I sort
> of don't see too many reasons to keep that below/above 3/4 thing.

Oh, yeah. Since it's debugfs, we would get excuse to break.
Sergey Senozhatsky March 1, 2023, 4:05 a.m. UTC | #4
On (23/02/28 14:53), Minchan Kim wrote:
[..]
> > As of why I decided to go with defines, this is because zspage fullness
> > values and class stats are two conceptually different things, they don't
> > really fit in one single enum, unless enum's name is "zs_constants".
> > What do you think?
> 
> Agree. We don't need to combine them, then. 
> BTW, I still prefer the enum instead of 10 define.
> 
> enum fullness_group {
>     ZS_EMPTY,
>     ZS_INUSE_RATIO_MIN,
>     ZS_INUSE_RATIO_ALMOST_FULL = 7,
>     ZS_INUSE_RATIO_MAX = 10,
>     ZS_FULL,
>     NR_ZS_FULLNESS,
> }

So we keep enum nesting? Sorry, I'm not exactly following.

We have fullness values (which we use independently) and stats array
which has overlapping offsets with fullness values.

[..]
> > I can change it to
> > 
> > 	for (r = ZS_INUSE_RATIO_10; r <= ZS_INUSE_RATIO_70; r++)
> > and
> > 	for (r = ZS_INUSE_RATIO_80; r <= ZS_INUSE_RATIO_99; r++)
> > 
> > which would be safer than using hard-coded numbers.
> 
> I didn't mean to have hard code either but just wanted to show
> the intention to use the loop.

Got it. I just wanted to show that being very verbose (having every
constant documented) is nice :)

> > 
> > Shall we actually instead report per inuse ratio stats instead? I sort
> > of don't see too many reasons to keep that below/above 3/4 thing.
> 
> Oh, yeah. Since it's debugfs, we would get excuse to break.

This was in my original patch, but I decided to put a comment and keep
the old  behavior. I probably will switch to a more precise reporting
(per inuse ratio) in a separate patch, so that we can easily revert it
without any impact on new fullness grouping.
Sergey Senozhatsky March 1, 2023, 8:55 a.m. UTC | #5
On (23/02/28 14:53), Minchan Kim wrote:
> BTW, I still prefer the enum instead of 10 define.
> 
> enum fullness_group {
>     ZS_EMPTY,
>     ZS_INUSE_RATIO_MIN,
>     ZS_INUSE_RATIO_ALMOST_FULL = 7,
>     ZS_INUSE_RATIO_MAX = 10,
>     ZS_FULL,
>     NR_ZS_FULLNESS,
> }

For educational purposes, may I ask what do enums give us? We
always use integers - int:4 in zspage fullness, int for arrays
offsets and we cast to plain integers in get/set stats. So those
enums exist only at declaration point, and plain int otherwise.
What are the benefits over #defines?
Minchan Kim March 2, 2023, 12:13 a.m. UTC | #6
On Wed, Mar 01, 2023 at 01:05:20PM +0900, Sergey Senozhatsky wrote:
> On (23/02/28 14:53), Minchan Kim wrote:
> [..]
> > > As of why I decided to go with defines, this is because zspage fullness
> > > values and class stats are two conceptually different things, they don't
> > > really fit in one single enum, unless enum's name is "zs_constants".
> > > What do you think?
> > 
> > Agree. We don't need to combine them, then. 
> > BTW, I still prefer the enum instead of 10 define.
> > 
> > enum fullness_group {
> >     ZS_EMPTY,
> >     ZS_INUSE_RATIO_MIN,
> >     ZS_INUSE_RATIO_ALMOST_FULL = 7,
> >     ZS_INUSE_RATIO_MAX = 10,
> >     ZS_FULL,
> >     NR_ZS_FULLNESS,
> > }
> 
> So we keep enum nesting? Sorry, I'm not exactly following.

Sorry, I meant let's keep separating them since they are different
things conceptually as you mentioned.

> 
> We have fullness values (which we use independently) and stats array
> which has overlapping offsets with fullness values.
> 
> [..]
> > > I can change it to
> > > 
> > > 	for (r = ZS_INUSE_RATIO_10; r <= ZS_INUSE_RATIO_70; r++)
> > > and
> > > 	for (r = ZS_INUSE_RATIO_80; r <= ZS_INUSE_RATIO_99; r++)
> > > 
> > > which would be safer than using hard-coded numbers.
> > 
> > I didn't mean to have hard code either but just wanted to show
> > the intention to use the loop.
> 
> Got it. I just wanted to show that being very verbose (having every
> constant documented) is nice :)
> 
> > > 
> > > Shall we actually instead report per inuse ratio stats instead? I sort
> > > of don't see too many reasons to keep that below/above 3/4 thing.
> > 
> > Oh, yeah. Since it's debugfs, we would get excuse to break.
> 
> This was in my original patch, but I decided to put a comment and keep
> the old  behavior. I probably will switch to a more precise reporting
> (per inuse ratio) in a separate patch, so that we can easily revert it
> without any impact on new fullness grouping.

Sounds good.
Minchan Kim March 2, 2023, 12:28 a.m. UTC | #7
On Wed, Mar 01, 2023 at 05:55:44PM +0900, Sergey Senozhatsky wrote:
> On (23/02/28 14:53), Minchan Kim wrote:
> > BTW, I still prefer the enum instead of 10 define.
> > 
> > enum fullness_group {
> >     ZS_EMPTY,
> >     ZS_INUSE_RATIO_MIN,
> >     ZS_INUSE_RATIO_ALMOST_FULL = 7,
> >     ZS_INUSE_RATIO_MAX = 10,
> >     ZS_FULL,
> >     NR_ZS_FULLNESS,
> > }
> 
> For educational purposes, may I ask what do enums give us? We
> always use integers - int:4 in zspage fullness, int for arrays
> offsets and we cast to plain integers in get/set stats. So those
> enums exist only at declaration point, and plain int otherwise.
> What are the benefits over #defines?

Well, I just didn't like the 12 hard coded define *list* values
and never used other places except zs_stats_size_show since
I thought we could handle zs_stats_size_show in the loop without
the specific each ratio definary.

Furthermore, above example, the special ZS_INUSE_RATIO_MAX will
be definary instead of hard coded 10.

    ZS_INUSE_RATIO_MAX = ZS_INUSE_RATIO_MIN + ZS_INUSER_RATIO_CLASS_SIZE

so, if we want to change the ratio later, we would need minimal
changes all the places instead of changing all the hard codeded
definary.
Sergey Senozhatsky March 2, 2023, 12:53 a.m. UTC | #8
On (23/03/01 16:28), Minchan Kim wrote:
> On Wed, Mar 01, 2023 at 05:55:44PM +0900, Sergey Senozhatsky wrote:
> > On (23/02/28 14:53), Minchan Kim wrote:
> > > BTW, I still prefer the enum instead of 10 define.
> > > 
> > > enum fullness_group {
> > >     ZS_EMPTY,
> > >     ZS_INUSE_RATIO_MIN,
> > >     ZS_INUSE_RATIO_ALMOST_FULL = 7,
> > >     ZS_INUSE_RATIO_MAX = 10,
> > >     ZS_FULL,
> > >     NR_ZS_FULLNESS,
> > > }
> > 
> > For educational purposes, may I ask what do enums give us? We
> > always use integers - int:4 in zspage fullness, int for arrays
> > offsets and we cast to plain integers in get/set stats. So those
> > enums exist only at declaration point, and plain int otherwise.
> > What are the benefits over #defines?
> 
> Well, I just didn't like the 12 hard coded define *list* values
> and never used other places except zs_stats_size_show since

If we have two enums, then we need more lines

enum fullness {
	ZS_INUSE_RATIO_0
	...
	ZS_INUSE_RATIO_100
}

enum stats {
	INUSE_RATIO_0
	...
	INUSE_RATIO_100

	// the rest of stats
}

and then we use int:4 fullness value to access stats.

> I thought we could handle zs_stats_size_show in the loop without
> the specific each ratio definary.

For per inuse ratio zs_stats_size_show() we need to access stats
individually:

	inuse10, inuse20, inuse30, ... inuse99
Minchan Kim March 3, 2023, 12:20 a.m. UTC | #9
On Thu, Mar 02, 2023 at 09:53:03AM +0900, Sergey Senozhatsky wrote:
> On (23/03/01 16:28), Minchan Kim wrote:
> > On Wed, Mar 01, 2023 at 05:55:44PM +0900, Sergey Senozhatsky wrote:
> > > On (23/02/28 14:53), Minchan Kim wrote:
> > > > BTW, I still prefer the enum instead of 10 define.
> > > > 
> > > > enum fullness_group {
> > > >     ZS_EMPTY,
> > > >     ZS_INUSE_RATIO_MIN,
> > > >     ZS_INUSE_RATIO_ALMOST_FULL = 7,
> > > >     ZS_INUSE_RATIO_MAX = 10,
> > > >     ZS_FULL,
> > > >     NR_ZS_FULLNESS,
> > > > }
> > > 
> > > For educational purposes, may I ask what do enums give us? We
> > > always use integers - int:4 in zspage fullness, int for arrays
> > > offsets and we cast to plain integers in get/set stats. So those
> > > enums exist only at declaration point, and plain int otherwise.
> > > What are the benefits over #defines?
> > 
> > Well, I just didn't like the 12 hard coded define *list* values
> > and never used other places except zs_stats_size_show since
> 
> If we have two enums, then we need more lines
> 
> enum fullness {
> 	ZS_INUSE_RATIO_0
> 	...
> 	ZS_INUSE_RATIO_100
> }
> 
> enum stats {
> 	INUSE_RATIO_0
> 	...
> 	INUSE_RATIO_100
> 
> 	// the rest of stats
> }
> 
> and then we use int:4 fullness value to access stats.

Yeah. I don't see any problem unless I miss your point.

> 
> > I thought we could handle zs_stats_size_show in the loop without
> > the specific each ratio definary.
> 
> For per inuse ratio zs_stats_size_show() we need to access stats
> individually:
> 
> 	inuse10, inuse20, inuse30, ... inuse99

Does it need specific index in the enum list?

I don't mind having all the hard coded index if it's *necessary*
but wanted to try we could make the index with base + index
on demand in the loop via simple arithmetic.
Sergey Senozhatsky March 3, 2023, 1:06 a.m. UTC | #10
On (23/03/02 16:20), Minchan Kim wrote:
> On Thu, Mar 02, 2023 at 09:53:03AM +0900, Sergey Senozhatsky wrote:
> > On (23/03/01 16:28), Minchan Kim wrote:
> > > On Wed, Mar 01, 2023 at 05:55:44PM +0900, Sergey Senozhatsky wrote:
> > > > On (23/02/28 14:53), Minchan Kim wrote:
> > > > > BTW, I still prefer the enum instead of 10 define.
> > > > > 
> > > > > enum fullness_group {
> > > > >     ZS_EMPTY,
> > > > >     ZS_INUSE_RATIO_MIN,
> > > > >     ZS_INUSE_RATIO_ALMOST_FULL = 7,
> > > > >     ZS_INUSE_RATIO_MAX = 10,
> > > > >     ZS_FULL,
> > > > >     NR_ZS_FULLNESS,
> > > > > }
> > > > 
> > > > For educational purposes, may I ask what do enums give us? We
> > > > always use integers - int:4 in zspage fullness, int for arrays
> > > > offsets and we cast to plain integers in get/set stats. So those
> > > > enums exist only at declaration point, and plain int otherwise.
> > > > What are the benefits over #defines?
> > > 
> > > Well, I just didn't like the 12 hard coded define *list* values
> > > and never used other places except zs_stats_size_show since
> > 
> > If we have two enums, then we need more lines
> > 
> > enum fullness {
> > 	ZS_INUSE_RATIO_0
> > 	...
> > 	ZS_INUSE_RATIO_100
> > }
> > 
> > enum stats {
> > 	INUSE_RATIO_0
> > 	...
> > 	INUSE_RATIO_100
> > 
> > 	// the rest of stats
> > }
> > 
> > and then we use int:4 fullness value to access stats.
> 
> Yeah. I don't see any problem unless I miss your point.

OK. How about having one enum? E.g. "zs_flags" or something which
will contain all our constants?

Otherwise I can create two big enums for fullness and stats.
What's your preference on inuse_0 and inuse_100 naming? Do we
keep unified naming or should it be INUSE_MIN/INUSE_MAX or
EMPTY/FULL?

> > For per inuse ratio zs_stats_size_show() we need to access stats
> > individually:
> > 
> > 	inuse10, inuse20, inuse30, ... inuse99
> 
> Does it need specific index in the enum list?

If we report per inuse group then yes:

	sprintf("... %lu %lu ..... %lu %lu ...\n",
		...
		get_stat(ZS_INUSE_RATIO_10),
		get_stat(ZS_INUSE_RATIO_20),
		get_stat(ZS_INUSE_RATIO_30),
		...
		get_stat(ZS_INUSE_RATIO_99),
		...);
Minchan Kim March 3, 2023, 1:38 a.m. UTC | #11
On Fri, Mar 03, 2023 at 10:06:43AM +0900, Sergey Senozhatsky wrote:
> On (23/03/02 16:20), Minchan Kim wrote:
> > On Thu, Mar 02, 2023 at 09:53:03AM +0900, Sergey Senozhatsky wrote:
> > > On (23/03/01 16:28), Minchan Kim wrote:
> > > > On Wed, Mar 01, 2023 at 05:55:44PM +0900, Sergey Senozhatsky wrote:
> > > > > On (23/02/28 14:53), Minchan Kim wrote:
> > > > > > BTW, I still prefer the enum instead of 10 define.
> > > > > > 
> > > > > > enum fullness_group {
> > > > > >     ZS_EMPTY,
> > > > > >     ZS_INUSE_RATIO_MIN,
> > > > > >     ZS_INUSE_RATIO_ALMOST_FULL = 7,
> > > > > >     ZS_INUSE_RATIO_MAX = 10,
> > > > > >     ZS_FULL,
> > > > > >     NR_ZS_FULLNESS,
> > > > > > }
> > > > > 
> > > > > For educational purposes, may I ask what do enums give us? We
> > > > > always use integers - int:4 in zspage fullness, int for arrays
> > > > > offsets and we cast to plain integers in get/set stats. So those
> > > > > enums exist only at declaration point, and plain int otherwise.
> > > > > What are the benefits over #defines?
> > > > 
> > > > Well, I just didn't like the 12 hard coded define *list* values
> > > > and never used other places except zs_stats_size_show since
> > > 
> > > If we have two enums, then we need more lines
> > > 
> > > enum fullness {
> > > 	ZS_INUSE_RATIO_0
> > > 	...
> > > 	ZS_INUSE_RATIO_100
> > > }
> > > 
> > > enum stats {
> > > 	INUSE_RATIO_0
> > > 	...
> > > 	INUSE_RATIO_100
> > > 
> > > 	// the rest of stats
> > > }
> > > 
> > > and then we use int:4 fullness value to access stats.
> > 
> > Yeah. I don't see any problem unless I miss your point.
> 
> OK. How about having one enum? E.g. "zs_flags" or something which
> will contain all our constants?
> 
> Otherwise I can create two big enums for fullness and stats.

Let's go with two enums at this moment since your great work is not
tied into the problem. If that becomes really maintaince hole,
we could tidy it up at that time.

> What's your preference on inuse_0 and inuse_100 naming? Do we
> keep unified naming or should it be INUSE_MIN/INUSE_MAX or
> EMPTY/FULL?

I don't have strong opinion about it. I will follow your choice. ;-)

> 
> > > For per inuse ratio zs_stats_size_show() we need to access stats
> > > individually:
> > > 
> > > 	inuse10, inuse20, inuse30, ... inuse99
> > 
> > Does it need specific index in the enum list?
> 
> If we report per inuse group then yes:
> 
> 	sprintf("... %lu %lu ..... %lu %lu ...\n",
> 		...
> 		get_stat(ZS_INUSE_RATIO_10),
> 		get_stat(ZS_INUSE_RATIO_20),
> 		get_stat(ZS_INUSE_RATIO_30),
> 		...
> 		get_stat(ZS_INUSE_RATIO_99),
> 		...);

I thought we could handle it with loop

prologue - seq_printf
for (ratio = min, ratio < max; ratio++ )
    seq_printf(s, "%lu", get_stat(ratio)
epilogue - seq_printf
seq_puts(s, "\n");
Sergey Senozhatsky March 3, 2023, 1:43 a.m. UTC | #12
On (23/03/02 17:38), Minchan Kim wrote:
> > Otherwise I can create two big enums for fullness and stats.
> 
> Let's go with two enums at this moment since your great work is not
> tied into the problem. If that becomes really maintaince hole,
> we could tidy it up at that time.

OK.

> 
> > What's your preference on inuse_0 and inuse_100 naming? Do we
> > keep unified naming or should it be INUSE_MIN/INUSE_MAX or
> > EMPTY/FULL?
> 
> I don't have strong opinion about it. I will follow your choice. ;-)

OK :)

> prologue - seq_printf
> for (ratio = min, ratio < max; ratio++ )
>     seq_printf(s, "%lu", get_stat(ratio)
> epilogue - seq_printf
> seq_puts(s, "\n");

Let me try a loop.
diff mbox series

Patch

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 38ae8963c0eb..1a92ebe338eb 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -127,7 +127,7 @@ 
 #define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
 
 #define HUGE_BITS	1
-#define FULLNESS_BITS	2
+#define FULLNESS_BITS	4
 #define CLASS_BITS	8
 #define ISOLATED_BITS	5
 #define MAGIC_VAL_BITS	8
@@ -159,15 +159,33 @@ 
 #define ZS_SIZE_CLASSES	(DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE, \
 				      ZS_SIZE_CLASS_DELTA) + 1)
 
-#define ZS_EMPTY		0
-#define ZS_ALMOST_EMPTY		1
-#define ZS_ALMOST_FULL		2
-#define ZS_FULL			3
-#define ZS_OBJS_ALLOCATED	4
-#define ZS_OBJS_INUSE		5
-
-#define NR_ZS_STAT		6
-#define NR_ZS_FULLNESS		4
+/*
+ * Pages are distinguished by the ratio of used memory (that is the ratio
+ * of ->inuse objects to all objects that page can store). For example,
+ * INUSE_RATIO_30 means that the ratio of used objects is > 20% and <= 30%.
+ *
+ * The number of fullness groups is not random. It allows us to keep
+ * diffeence between the least busy page in the group (minimum permitted
+ * number of ->inuse objects) and the most busy page (maximum permitted
+ * number of ->inuse objects) at a reasonable value.
+ */
+#define ZS_INUSE_RATIO_0	0
+#define ZS_INUSE_RATIO_10	1
+#define ZS_INUSE_RATIO_20	2
+#define ZS_INUSE_RATIO_30	3
+#define ZS_INUSE_RATIO_40	4
+#define ZS_INUSE_RATIO_50	5
+#define ZS_INUSE_RATIO_60	6
+#define ZS_INUSE_RATIO_70	7
+#define ZS_INUSE_RATIO_80	8
+#define ZS_INUSE_RATIO_90	9
+#define ZS_INUSE_RATIO_99	10
+#define ZS_INUSE_RATIO_100	11
+#define ZS_OBJS_ALLOCATED	12
+#define ZS_OBJS_INUSE		13
+
+#define NR_ZS_INUSE_RATIO	12
+#define NR_ZS_STAT		14
 
 struct zs_size_stat {
 	unsigned long objs[NR_ZS_STAT];
@@ -177,25 +195,10 @@  struct zs_size_stat {
 static struct dentry *zs_stat_root;
 #endif
 
-/*
- * We assign a page to ZS_ALMOST_EMPTY fullness group when:
- *	n <= N / f, where
- * n = number of allocated objects
- * N = total number of objects zspage can store
- * f = fullness_threshold_frac
- *
- * Similarly, we assign zspage to:
- *	ZS_ALMOST_FULL	when n > N / f
- *	ZS_EMPTY	when n == 0
- *	ZS_FULL		when n == N
- *
- * (see: fix_fullness_group())
- */
-static const int fullness_threshold_frac = 4;
 static size_t huge_class_size;
 
 struct size_class {
-	struct list_head fullness_list[NR_ZS_FULLNESS];
+	struct list_head fullness_list[NR_ZS_INUSE_RATIO];
 	/*
 	 * Size of objects stored in this class. Must be multiple
 	 * of ZS_ALIGN.
@@ -641,8 +644,23 @@  static int zs_stats_size_show(struct seq_file *s, void *v)
 			continue;
 
 		spin_lock(&pool->lock);
-		class_almost_full = zs_stat_get(class, ZS_ALMOST_FULL);
-		class_almost_empty = zs_stat_get(class, ZS_ALMOST_EMPTY);
+
+		/*
+		 * Replecate old behaviour for almost_full and almost_empty
+		 * stats.
+		 */
+		class_almost_full = zs_stat_get(class, ZS_INUSE_RATIO_99);
+		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_90);
+		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_80);
+		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_70);
+
+		class_almost_empty = zs_stat_get(class, ZS_INUSE_RATIO_60);
+		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_50);
+		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_40);
+		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_30);
+		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_20);
+		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_10);
+
 		obj_allocated = zs_stat_get(class, ZS_OBJS_ALLOCATED);
 		obj_used = zs_stat_get(class, ZS_OBJS_INUSE);
 		freeable = zs_can_compact(class);
@@ -712,32 +730,30 @@  static inline void zs_pool_stat_destroy(struct zs_pool *pool)
 }
 #endif
 
-
 /*
  * For each size class, zspages are divided into different groups
- * depending on how "full" they are. This was done so that we could
- * easily find empty or nearly empty zspages when we try to shrink
- * the pool (not yet implemented). This function returns fullness
+ * depending on their usage ratio. This function returns fullness
  * status of the given page.
  */
 static int get_fullness_group(struct size_class *class, struct zspage *zspage)
 {
-	int inuse, objs_per_zspage;
-	int fg;
+	int inuse, objs_per_zspage, ratio;
 
 	inuse = get_zspage_inuse(zspage);
 	objs_per_zspage = class->objs_per_zspage;
 
 	if (inuse == 0)
-		fg = ZS_EMPTY;
-	else if (inuse == objs_per_zspage)
-		fg = ZS_FULL;
-	else if (inuse <= 3 * objs_per_zspage / fullness_threshold_frac)
-		fg = ZS_ALMOST_EMPTY;
-	else
-		fg = ZS_ALMOST_FULL;
+		return ZS_INUSE_RATIO_0;
+	if (inuse == objs_per_zspage)
+		return ZS_INUSE_RATIO_100;
 
-	return fg;
+	ratio = 100 * inuse / objs_per_zspage;
+	/*
+	 * Take integer division into consideration: a page with one inuse
+	 * object out of 127 possible, will endup having 0 usage ratio,
+	 * which is wrong as it belongs in ZS_INUSE_RATIO_10 fullness group.
+	 */
+	return ratio / 10 + 1;
 }
 
 /*
@@ -769,11 +785,11 @@  static void remove_zspage(struct size_class *class,
 /*
  * Each size class maintains zspages in different fullness groups depending
  * on the number of live objects they contain. When allocating or freeing
- * objects, the fullness status of the page can change, say, from ALMOST_FULL
- * to ALMOST_EMPTY when freeing an object. This function checks if such
- * a status change has occurred for the given page and accordingly moves the
- * page from the freelist of the old fullness group to that of the new
- * fullness group.
+ * objects, the fullness status of the page can change, for instance, from
+ * INUSE_RATIO_80 to INUSE_RATIO_70 when freeing an object. This function
+ * checks if such a status change has occurred for the given page and
+ * accordingly moves the page from the list of the old fullness group to that
+ * of the new fullness group.
  */
 static int fix_fullness_group(struct size_class *class, struct zspage *zspage)
 {
@@ -959,7 +975,7 @@  static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 	assert_spin_locked(&pool->lock);
 
 	VM_BUG_ON(get_zspage_inuse(zspage));
-	VM_BUG_ON(fg != ZS_EMPTY);
+	VM_BUG_ON(fg != ZS_INUSE_RATIO_0);
 
 	/* Free all deferred handles from zs_free */
 	free_handles(pool, class, zspage);
@@ -998,7 +1014,7 @@  static void free_zspage(struct zs_pool *pool, struct size_class *class,
 		return;
 	}
 
-	remove_zspage(class, zspage, ZS_EMPTY);
+	remove_zspage(class, zspage, ZS_INUSE_RATIO_0);
 #ifdef CONFIG_ZPOOL
 	list_del(&zspage->lru);
 #endif
@@ -1134,9 +1150,9 @@  static struct zspage *find_get_zspage(struct size_class *class)
 	int i;
 	struct zspage *zspage;
 
-	for (i = ZS_ALMOST_FULL; i >= ZS_EMPTY; i--) {
+	for (i = ZS_INUSE_RATIO_99; i >= ZS_INUSE_RATIO_0; i--) {
 		zspage = list_first_entry_or_null(&class->fullness_list[i],
-				struct zspage, list);
+						  struct zspage, list);
 		if (zspage)
 			break;
 	}
@@ -1629,7 +1645,7 @@  void zs_free(struct zs_pool *pool, unsigned long handle)
 	obj_free(class->size, obj, NULL);
 
 	fullness = fix_fullness_group(class, zspage);
-	if (fullness == ZS_EMPTY)
+	if (fullness == ZS_INUSE_RATIO_0)
 		free_zspage(pool, class, zspage);
 
 	spin_unlock(&pool->lock);
@@ -1811,22 +1827,33 @@  static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 	return ret;
 }
 
-static struct zspage *isolate_zspage(struct size_class *class, bool source)
+static struct zspage *isolate_src_zspage(struct size_class *class)
 {
-	int i;
 	struct zspage *zspage;
-	int fg[2] = {ZS_ALMOST_EMPTY, ZS_ALMOST_FULL};
+	int fg;
 
-	if (!source) {
-		fg[0] = ZS_ALMOST_FULL;
-		fg[1] = ZS_ALMOST_EMPTY;
+	for (fg = ZS_INUSE_RATIO_10; fg <= ZS_INUSE_RATIO_99; fg++) {
+		zspage = list_first_entry_or_null(&class->fullness_list[fg],
+						  struct zspage, list);
+		if (zspage) {
+			remove_zspage(class, zspage, fg);
+			return zspage;
+		}
 	}
 
-	for (i = 0; i < 2; i++) {
-		zspage = list_first_entry_or_null(&class->fullness_list[fg[i]],
-							struct zspage, list);
+	return zspage;
+}
+
+static struct zspage *isolate_dst_zspage(struct size_class *class)
+{
+	struct zspage *zspage;
+	int fg;
+
+	for (fg = ZS_INUSE_RATIO_99; fg >= ZS_INUSE_RATIO_10; fg--) {
+		zspage = list_first_entry_or_null(&class->fullness_list[fg],
+						  struct zspage, list);
 		if (zspage) {
-			remove_zspage(class, zspage, fg[i]);
+			remove_zspage(class, zspage, fg);
 			return zspage;
 		}
 	}
@@ -2119,7 +2146,7 @@  static void async_free_zspage(struct work_struct *work)
 			continue;
 
 		spin_lock(&pool->lock);
-		list_splice_init(&class->fullness_list[ZS_EMPTY], &free_pages);
+		list_splice_init(&class->fullness_list[ZS_INUSE_RATIO_0], &free_pages);
 		spin_unlock(&pool->lock);
 	}
 
@@ -2128,7 +2155,7 @@  static void async_free_zspage(struct work_struct *work)
 		lock_zspage(zspage);
 
 		get_zspage_mapping(zspage, &class_idx, &fullness);
-		VM_BUG_ON(fullness != ZS_EMPTY);
+		VM_BUG_ON(fullness != ZS_INUSE_RATIO_0);
 		class = pool->size_class[class_idx];
 		spin_lock(&pool->lock);
 #ifdef CONFIG_ZPOOL
@@ -2201,7 +2228,7 @@  static unsigned long __zs_compact(struct zs_pool *pool,
 	 * as well as zpage allocation/free
 	 */
 	spin_lock(&pool->lock);
-	while ((src_zspage = isolate_zspage(class, true))) {
+	while ((src_zspage = isolate_src_zspage(class))) {
 		/* protect someone accessing the zspage(i.e., zs_map_object) */
 		migrate_write_lock(src_zspage);
 
@@ -2211,7 +2238,7 @@  static unsigned long __zs_compact(struct zs_pool *pool,
 		cc.obj_idx = 0;
 		cc.s_page = get_first_page(src_zspage);
 
-		while ((dst_zspage = isolate_zspage(class, false))) {
+		while ((dst_zspage = isolate_dst_zspage(class))) {
 			migrate_write_lock_nested(dst_zspage);
 
 			cc.d_page = get_first_page(dst_zspage);
@@ -2236,7 +2263,7 @@  static unsigned long __zs_compact(struct zs_pool *pool,
 		putback_zspage(class, dst_zspage);
 		migrate_write_unlock(dst_zspage);
 
-		if (putback_zspage(class, src_zspage) == ZS_EMPTY) {
+		if (putback_zspage(class, src_zspage) == ZS_INUSE_RATIO_0) {
 			migrate_write_unlock(src_zspage);
 			free_zspage(pool, class, src_zspage);
 			pages_freed += class->pages_per_zspage;
@@ -2394,7 +2421,7 @@  struct zs_pool *zs_create_pool(const char *name)
 		int pages_per_zspage;
 		int objs_per_zspage;
 		struct size_class *class;
-		int fullness = 0;
+		int fullness;
 
 		size = ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA;
 		if (size > ZS_MAX_ALLOC_SIZE)
@@ -2448,9 +2475,12 @@  struct zs_pool *zs_create_pool(const char *name)
 		class->pages_per_zspage = pages_per_zspage;
 		class->objs_per_zspage = objs_per_zspage;
 		pool->size_class[i] = class;
-		for (fullness = ZS_EMPTY; fullness < NR_ZS_FULLNESS;
-							fullness++)
+
+		fullness = ZS_INUSE_RATIO_0;
+		while (fullness < NR_ZS_INUSE_RATIO) {
 			INIT_LIST_HEAD(&class->fullness_list[fullness]);
+			fullness++;
+		}
 
 		prev_class = class;
 	}
@@ -2496,11 +2526,12 @@  void zs_destroy_pool(struct zs_pool *pool)
 		if (class->index != i)
 			continue;
 
-		for (fg = ZS_EMPTY; fg < NR_ZS_FULLNESS; fg++) {
-			if (!list_empty(&class->fullness_list[fg])) {
-				pr_info("Freeing non-empty class with size %db, fullness group %d\n",
-					class->size, fg);
-			}
+		for (fg = ZS_INUSE_RATIO_0; fg < NR_ZS_INUSE_RATIO; fg++) {
+			if (list_empty(&class->fullness_list[fg]))
+				continue;
+
+			pr_err("Class-%d fullness group %d is not empty\n",
+			       class->size, fg);
 		}
 		kfree(class);
 	}
@@ -2672,7 +2703,7 @@  static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries)
 			 * while the page is removed from the pool. Fix it
 			 * up for the check in __free_zspage().
 			 */
-			zspage->fullness = ZS_EMPTY;
+			zspage->fullness = ZS_INUSE_RATIO_0;
 
 			__free_zspage(pool, class, zspage);
 			spin_unlock(&pool->lock);