
[19/36] xen/arch: introduce cache-coloring allocator

Message ID 20220304174701.1453977-20-marco.solieri@minervasys.tech (mailing list archive)
State New, archived
Series Arm cache coloring

Commit Message

Marco Solieri March 4, 2022, 5:46 p.m. UTC
From: Luca Miccio <lucmiccio@gmail.com>

Introduce a new memory page allocator that implements the cache coloring
mechanism. The allocation algorithm follows the coloring scheme specified
for each guest, and maximizes contiguity in page selection.

Pages are stored by color in separate, address-ordered lists that are
collectively called the colored heap.  These lists are populated by a
simple initialisation function which, for each available page, computes
its color and inserts it into the corresponding list.  When a domain
requests a page, the allocator takes one from the subset of lists whose
colors match the domain configuration.  It chooses the highest page among
the last elements of these lists.  This ordering guarantees that
contiguous pages are allocated sequentially, provided the color
assignment includes adjacent IDs.
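
For illustration, a page's color is derived from the physical-address bits
that index the last-level cache sets above the page offset. A minimal
sketch of such a computation (name hypothetical; assuming, as in the
earlier patches of this series, that the number of colors is a power of
two):

    /* Sketch only: nr_colors is assumed to be a power of two. */
    static unsigned int addr_to_color(paddr_t maddr, unsigned int nr_colors)
    {
        return (maddr >> PAGE_SHIFT) & (nr_colors - 1);
    }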

The allocator can only handle requests of order 0, since the granularity
of a single color is represented in memory by one page.

A dump function is added to allow inspection of colored heap
information.

Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
---
 xen/common/page_alloc.c | 264 +++++++++++++++++++++++++++++++++++++++-
 xen/include/xen/mm.h    |   5 +
 2 files changed, 268 insertions(+), 1 deletion(-)

Comments

Jan Beulich March 9, 2022, 2:35 p.m. UTC | #1
On 04.03.2022 18:46, Marco Solieri wrote:
> @@ -438,6 +441,263 @@ mfn_t __init alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align)
>  
>  
>  
> +static DEFINE_SPINLOCK(heap_lock);

Please take the opportunity and shrink the number of consecutive blank lines
above here.

> +
> +#ifdef CONFIG_COLORING
> +/*************************
> + * COLORED SIDE-ALLOCATOR
> + *
> + * Pages are stored by their color in separated lists. Each list defines a color
> + * and it is initialized during end_boot_allocator, where each page's color
> + * is calculated and the page itself is put in the correct list.
> + * After initialization we have N list where N is the number of maximum
> + * available colors on the platform.
> + * All the lists' heads are stored as element in an array with size N-1 using
> + * the following schema:
> + * array[X] = head of color X, where X goes from 0 to N-1
> + */
> +typedef struct page_list_head color_list;

color_list_t, to make it easy to recognize this as a type (rather than
a variable name)?

> +static color_list *color_heap;

__ro_after_init or, if Arm still hasn't got support for that, at least
__read_mostly.

> +static long total_avail_col_pages;

Can this go negative? If not, unsigned long please.

> +static u64 col_num_max;

No new uses of u<N> or s<N> please. They're being phased out, and C99
types should be used instead.

> +static bool color_init_state = true;

This variable looks to be local to init_col_heap_pages() - please move
it there (provided it's needed in the first place: it's not clear why
you don't use an initcall instead).

> +#define page_to_head(pg) (&color_heap[color_from_page(pg)])
> +#define color_to_head(col) (&color_heap[col])
> +
> +/* Add page in list in order depending on its physical address. */
> +static void page_list_add_order(struct page_info *pg, struct list_head *head)

I guess you mean struct page_list_head * here?

> +{
> +    struct page_info *pos;
> +
> +    /* Add first page after head */
> +    if ( page_list_empty(head) )
> +    {
> +        page_list_add(pg, head);
> +        return;
> +    }
> +
> +    /* Add non-first page in list in ascending order */
> +    page_list_for_each_reverse(pos, head)
> +    {
> +        /* Get pg position */
> +        if ( page_to_maddr(pos) <= page_to_maddr(pg) )

Wouldn't it be a bug if the two were equal? If so, perhaps better
ASSERT() or even BUG_ON() accordingly?
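
I.e. something along these lines (sketch):

    /* Two distinct struct page_info must never map the same address. */
    ASSERT(page_to_maddr(pos) != page_to_maddr(pg));
    if ( page_to_maddr(pos) < page_to_maddr(pg) )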

> +        {
> +            /* Insert pg between pos and pos->list.next */
> +            page_list_add(pg, &pos->list);

The 2nd parameter of page_list_add() is struct page_list_head *, not
struct page_list_entry *. I guess you won't get away without introducing
a new accessor.
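
For the plain list_head flavour of page lists (as used on Arm), such an
accessor could be as small as (sketch, name hypothetical):

    /* Insert @pg immediately after @pos; plain list_head flavour only. */
    static void page_list_add_after(struct page_info *pg,
                                    struct page_info *pos)
    {
        list_add(&pg->list, &pos->list);
    }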

> +            break;
> +        }
> +
> +        /*
> +         * If pos is the first element it means that pg <= pos so we have
> +         * to insert pg after head.
> +         */
> +        if ( page_list_first(head) == pos )
> +        {
> +            page_list_add(pg, head);
> +            break;
> +        }

The way it's written it's not immediately obvious that the passed in page
would actually be put anywhere on the list by the time the function
returns. Furthermore this if() doesn't look to be necessary to be
evaluated on every loop iteration. Instead it could apparently live
_after_ the loop, requiring use of "return" instead of "break" further up.
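
A sketch of that restructuring, reusing the hypothetical
page_list_add_after() from above:

    page_list_for_each_reverse(pos, head)
    {
        if ( page_to_maddr(pos) <= page_to_maddr(pg) )
        {
            page_list_add_after(pg, pos);
            return;
        }
    }

    /* pg is lower than every list element: it becomes the new first entry. */
    page_list_add(pg, head);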

> +    }
> +}

This function dealing with one page at a time while linearly scanning the
list looks to be pretty inefficient, especially since it'll be a common
case that callers have to deal with multiple pages at a time. Sadly it is
not clear how many colors there may be (without hunting down the origin
of max_col_num, which get_max_colors() returns, in the earlier 18
patches), and hence how long these lists may grow.

> +/* Alloc one page based on domain color configuration */
> +static struct page_info *alloc_col_heap_page(
> +    unsigned int memflags, struct domain *d)
> +{
> +    struct page_info *pg, *tmp;
> +    bool need_tlbflush = false;
> +    uint32_t cur_color;
> +    uint32_t tlbflush_timestamp = 0;
> +    uint32_t *colors = 0;

Please consult ./CODING_STYLE for when it is appropriate to use fixed-
width types.

> +    int max_colors;
> +    int i;

I don't suppose either of these can go negative, so unsigned int please.
(Like other remarks, please consider this applicable to the entire
series.)

> +    colors = d->colors;
> +    max_colors = d->max_colors;

Please make these the initializers of the variables, which will also
avoid using 0 where NULL is meant above. It also looks as if these were
the only two uses of "d" in the function. If so, please consider making
colors and max_colors the function parameters instead. Or else d likely
wants to be pointer to const (and colors as well as it looks; generally
please use const on pointed-to types wherever possible).
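
A reshaped signature along those lines might look like (sketch, with the
element type also adjusted per the fixed-width-type remark above):

    static struct page_info *alloc_col_heap_page(unsigned int memflags,
                                                 const unsigned int *colors,
                                                 unsigned int max_colors);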

> +    spin_lock(&heap_lock);
> +
> +    tmp = pg = NULL;
> +
> +    /* Check for the first pg on non-empty list */
> +    for ( i = 0; i < max_colors; i++ )
> +    {
> +        if ( !page_list_empty(color_to_head(colors[i])) )
> +        {
> +            tmp = pg = page_list_last(color_to_head(colors[i]));
> +            cur_color = d->colors[i];
> +            break;
> +        }
> +    }
> +
> +    /* If all lists are empty, no requests can be satisfied */
> +    if ( !pg )
> +    {
> +        spin_unlock(&heap_lock);
> +        return NULL;
> +    }

I'm not convinced this is a useful thing to have. The identical
construct below the subsequent loop will deal with this case quite fine
afaict.
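
I.e. the two loops could be merged into a single scan (sketch):

    for ( i = 0; i < max_colors; i++ )
    {
        if ( page_list_empty(color_to_head(colors[i])) )
            continue;
        tmp = page_list_last(color_to_head(colors[i]));
        if ( !pg || page_to_maddr(tmp) > page_to_maddr(pg) )
            pg = tmp;
    }

    if ( !pg )
    {
        spin_unlock(&heap_lock);
        return NULL;
    }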

> +    /* Get the highest page from the lists compliant to the domain color(s) */
> +    for ( i += 1; i < max_colors; i++ )

Perhaps easier as

    while ( ++i < max_colors )

?

> +    {
> +        if ( page_list_empty(color_to_head(colors[i])) )
> +        {
> +            printk(XENLOG_INFO "List empty\n");

This is liable to be too noisy even if converted to dprintk(). Please
drop.

> +            continue;
> +        }
> +        tmp = page_list_last(color_to_head(colors[i]));
> +        if ( page_to_maddr(tmp) > page_to_maddr(pg) )
> +        {
> +            pg = tmp;
> +            cur_color = colors[i];

You only ever write this variable - please drop it, or introduce it only
once actually needed (e.g. in a later patch).

> +        }
> +    }
> +
> +    if ( !pg )
> +    {
> +        spin_unlock(&heap_lock);
> +        return NULL;
> +    }
> +
> +    pg->count_info = PGC_state_inuse;
> +
> +    if ( !(memflags & MEMF_no_tlbflush) )
> +        accumulate_tlbflush(&need_tlbflush, pg,
> +                            &tlbflush_timestamp);
> +
> +    /* Initialise fields which have other uses for free pages. */
> +    pg->u.inuse.type_info = 0;
> +    page_set_owner(pg, NULL);

This would now become the 3rd instance - time to consider a small
helper function?
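
Such a helper might look like (sketch, name hypothetical):

    static void prepare_page_inuse(struct page_info *pg)
    {
        pg->count_info = PGC_state_inuse;
        /* Initialise fields which have other uses for free pages. */
        pg->u.inuse.type_info = 0;
        page_set_owner(pg, NULL);
    }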

> +    flush_page_to_ram(mfn_x(page_to_mfn(pg)),
> +                      !(memflags & MEMF_no_icache_flush));
> +
> +    page_list_del(pg, page_to_head(pg));
> +    total_avail_col_pages--;
> +
> +    spin_unlock(&heap_lock);
> +
> +    if ( need_tlbflush )
> +        filtered_flush_tlb_mask(tlbflush_timestamp);
> +
> +    return pg;
> +}
> +
> +struct page_info *alloc_col_domheap_page(
> +    struct domain *d, unsigned int memflags)
> +{
> +    struct page_info *pg;
> +
> +    ASSERT(!in_irq());
> +
> +    /* Get page based on color selection */
> +    pg = alloc_col_heap_page(memflags, d);
> +
> +    if ( !pg )
> +    {
> +        printk(XENLOG_INFO "ERROR: Colored Page is null\n");
> +        return NULL;
> +    }
> +
> +    /* Assign page to domain */
> +    if ( d && !(memflags & MEMF_no_owner) &&
> +        assign_page(pg, 0, d, memflags) )
> +    {
> +        free_col_heap_page(pg);
> +        return NULL;
> +    }
> +
> +    return pg;
> +}

So this is really only providing a single order-0 page. From the cover
letter it didn't sound like you were aiming at such a limited use case.
It's also not listed under "Known limitations" there that no provisions
are made for large pages.

> +void free_col_heap_page(struct page_info *pg)
> +{
> +    /* This page is not a guest frame any more. */
> +    pg->count_info = PGC_state_free;
> +
> +    page_set_owner(pg, NULL);
> +    total_avail_col_pages++;
> +    page_list_add_order( pg, page_to_head(pg) );

Nit: Stray blanks immediately inside the parentheses.

> +}

How does this fit into the get_page() / put_page() machinery? You
don't alter free_domheap_pages(), after all.
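
For reference, one conceivable integration point, with reference-count
and scrubbing concerns elided (sketch only):

    void free_domheap_pages(struct page_info *pg, unsigned int order)
    {
        if ( is_page_colored(pg) )
        {
            ASSERT(order == 0);
            free_col_heap_page(pg);
            return;
        }
        /* ... existing buddy allocator path ... */
    }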

> +static inline void init_col_heap_pages(struct page_info *pg, unsigned long nr_pages)

Why inline? I guess this might be to silence the compiler warning
about the function being unused, but then this only means that you
want to introduce the function once it's needed. Then it would
also be possible to tell whether the function wants to be __init.

Additionally the line is too long.

> +{
> +    int i;
> +
> +    if ( color_init_state )
> +    {
> +        col_num_max = get_max_colors();
> +        color_heap = xmalloc_array(color_list, col_num_max);
> +        BUG_ON(!color_heap);
> +
> +        for ( i = 0; i < col_num_max; i++ )
> +        {
> +            printk(XENLOG_INFO "Init list for color: %u\n", i);

Again too noisy. Such may be okay in an RFC series, but should have been
dropped for a "normal" submission.

> +            INIT_PAGE_LIST_HEAD(&color_heap[i]);
> +        }
> +
> +        color_init_state = false;
> +    }
> +
> +    printk(XENLOG_INFO "Init color heap pages with %lu pages for a given size of 0x%"PRIx64"\n",

While you shouldn't split the format string across lines, you should
take all other available measures to limit line length. Furthermore
please consider using the shorter %#x form here and elsewhere. Overall:

    printk(XENLOG_INFO
           "Init color heap with %lu pages for a given size of 0x%"PRIx64"\n",

And even then the two values logged are redundant with one another,
so things can further be shortened here.

> +            nr_pages, nr_pages * PAGE_SIZE);

Nit: Indentation.

> +    printk(XENLOG_INFO "Paging starting from: 0x%"PRIx64"\n", page_to_maddr(pg));
> +    total_avail_col_pages += nr_pages;
> +
> +    for ( i = 0; i < nr_pages; i++ )
> +    {
> +        pg->colored = true;
> +        page_list_add_order(pg, page_to_head(pg));
> +        pg++;
> +    }
> +}
> +
> +static inline bool is_page_colored(struct page_info *pg)
> +{
> +        return pg->colored;

Nit: Indentation again (and more instance below).

> +}
> +
> +static void dump_col_heap(unsigned char key)
> +{
> +    struct page_info *pg;
> +    unsigned long size;
> +    unsigned int i;
> +
> +    printk("Colored heap info\n");
> +    for ( i = 0; i < col_num_max; i++ )
> +    {
> +        printk("Heap[%u]: ", i);
> +        size = 0;
> +        page_list_for_each( pg, color_to_head(i) )
> +        {
> +            BUG_ON(!(color_from_page(pg) == i));
> +            size++;
> +        }

How long is this going to take on a decently sized system? At the
very least you'll need to call process_pending_softirqs() every
once in a while. But this may be taking too long this way anyway.
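
A common pattern for keeping such a dump preemptible (sketch, also
simplifying the BUG_ON() condition):

    page_list_for_each( pg, color_to_head(i) )
    {
        BUG_ON(color_from_page(pg) != i);
        if ( !(++size & 0xfff) )
            process_pending_softirqs();
    }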

> +        printk("%lu pages -> %lukB free\n", size, size << (PAGE_SHIFT - 10));

Again the same information is being logged twice.

> +    }
> +
> +    printk("Total number of pages: %lu\n", total_avail_col_pages);

Since the value logged isn't calculated locally, may I suggest merging
this into the initial printk()?

> +}
> +#else /* !CONFIG_COLORING */
> +#define init_col_heap_pages(x, y) init_heap_pages(x, y)
> +
> +inline struct page_info *alloc_col_domheap_page(
> +	struct domain *d, unsigned int memflags)
> +{
> +	return NULL;
> +}
> +
> +inline void free_col_heap_page(struct page_info *pg)
> +{
> +	return;
> +}
> +
> +static inline bool is_page_colored(struct page_info *pg)
> +{
> +        return false;
> +}

Why are any of these needed? And if any are needed, please drop
inline once again.

> @@ -2600,6 +2859,9 @@ static void cf_check dump_heap(unsigned char key)
>  static __init int cf_check register_heap_trigger(void)
>  {
>      register_keyhandler('H', dump_heap, "dump heap info", 1);
> +#ifdef CONFIG_COLORING
> +    register_keyhandler('c', dump_col_heap, "dump coloring heap info", 1);

'c' already has a use on x86. Please avoid such collisions, even if
initially you're targeting Arm only. I don't see why a separate key
is needed anyway - can't you just extend dump_heap()?
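
Extending the existing handler could be as simple as (sketch):

    static void cf_check dump_heap(unsigned char key)
    {
        /* ... existing per-node heap dump ... */
    #ifdef CONFIG_COLORING
        dump_col_heap(key);
    #endif
    }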

> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -131,6 +131,11 @@ unsigned int online_page(mfn_t mfn, uint32_t *status);
>  int offline_page(mfn_t mfn, int broken, uint32_t *status);
>  int query_page_offline(mfn_t mfn, uint32_t *status);
>  
> +/* Colored suballocator. */
> +struct page_info *alloc_col_domheap_page(
> +    struct domain *d, unsigned int memflags);
> +void free_col_heap_page(struct page_info *pg);

These two should imo represent a pair, i.e. be named similarly.

Jan

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 4635718237..82f6e8330a 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -150,6 +150,9 @@ 
 #define p2m_pod_offline_or_broken_hit(pg) 0
 #define p2m_pod_offline_or_broken_replace(pg) BUG_ON(pg != NULL)
 #endif
+#ifdef CONFIG_COLORING
+#include <asm/coloring.h>
+#endif
 
 #ifndef PGC_reserved
 #define PGC_reserved 0
@@ -438,6 +441,263 @@  mfn_t __init alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align)
 
 
 
+static DEFINE_SPINLOCK(heap_lock);
+
+#ifdef CONFIG_COLORING
+/*************************
+ * COLORED SIDE-ALLOCATOR
+ *
+ * Pages are stored by their color in separated lists. Each list defines a color
+ * and it is initialized during end_boot_allocator, where each page's color
+ * is calculated and the page itself is put in the correct list.
+ * After initialization we have N list where N is the number of maximum
+ * available colors on the platform.
+ * All the lists' heads are stored as element in an array with size N-1 using
+ * the following schema:
+ * array[X] = head of color X, where X goes from 0 to N-1
+ */
+typedef struct page_list_head color_list;
+static color_list *color_heap;
+static long total_avail_col_pages;
+static u64 col_num_max;
+static bool color_init_state = true;
+
+#define page_to_head(pg) (&color_heap[color_from_page(pg)])
+#define color_to_head(col) (&color_heap[col])
+
+/* Add page in list in order depending on its physical address. */
+static void page_list_add_order(struct page_info *pg, struct list_head *head)
+{
+    struct page_info *pos;
+
+    /* Add first page after head */
+    if ( page_list_empty(head) )
+    {
+        page_list_add(pg, head);
+        return;
+    }
+
+    /* Add non-first page in list in ascending order */
+    page_list_for_each_reverse(pos, head)
+    {
+        /* Get pg position */
+        if ( page_to_maddr(pos) <= page_to_maddr(pg) )
+        {
+            /* Insert pg between pos and pos->list.next */
+            page_list_add(pg, &pos->list);
+            break;
+        }
+
+        /*
+         * If pos is the first element it means that pg <= pos so we have
+         * to insert pg after head.
+         */
+        if ( page_list_first(head) == pos )
+        {
+            page_list_add(pg, head);
+            break;
+        }
+    }
+}
+
+/* Alloc one page based on domain color configuration */
+static struct page_info *alloc_col_heap_page(
+    unsigned int memflags, struct domain *d)
+{
+    struct page_info *pg, *tmp;
+    bool need_tlbflush = false;
+    uint32_t cur_color;
+    uint32_t tlbflush_timestamp = 0;
+    uint32_t *colors = 0;
+    int max_colors;
+    int i;
+
+    colors = d->colors;
+    max_colors = d->max_colors;
+
+    spin_lock(&heap_lock);
+
+    tmp = pg = NULL;
+
+    /* Check for the first pg on non-empty list */
+    for ( i = 0; i < max_colors; i++ )
+    {
+        if ( !page_list_empty(color_to_head(colors[i])) )
+        {
+            tmp = pg = page_list_last(color_to_head(colors[i]));
+            cur_color = d->colors[i];
+            break;
+        }
+    }
+
+    /* If all lists are empty, no requests can be satisfied */
+    if ( !pg )
+    {
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
+
+    /* Get the highest page from the lists compliant to the domain color(s) */
+    for ( i += 1; i < max_colors; i++ )
+    {
+        if ( page_list_empty(color_to_head(colors[i])) )
+        {
+            printk(XENLOG_INFO "List empty\n");
+            continue;
+        }
+        tmp = page_list_last(color_to_head(colors[i]));
+        if ( page_to_maddr(tmp) > page_to_maddr(pg) )
+        {
+            pg = tmp;
+            cur_color = colors[i];
+        }
+    }
+
+    if ( !pg )
+    {
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
+
+    pg->count_info = PGC_state_inuse;
+
+    if ( !(memflags & MEMF_no_tlbflush) )
+        accumulate_tlbflush(&need_tlbflush, pg,
+                            &tlbflush_timestamp);
+
+    /* Initialise fields which have other uses for free pages. */
+    pg->u.inuse.type_info = 0;
+    page_set_owner(pg, NULL);
+
+    flush_page_to_ram(mfn_x(page_to_mfn(pg)),
+                      !(memflags & MEMF_no_icache_flush));
+
+    page_list_del(pg, page_to_head(pg));
+    total_avail_col_pages--;
+
+    spin_unlock(&heap_lock);
+
+    if ( need_tlbflush )
+        filtered_flush_tlb_mask(tlbflush_timestamp);
+
+    return pg;
+}
+
+struct page_info *alloc_col_domheap_page(
+    struct domain *d, unsigned int memflags)
+{
+    struct page_info *pg;
+
+    ASSERT(!in_irq());
+
+    /* Get page based on color selection */
+    pg = alloc_col_heap_page(memflags, d);
+
+    if ( !pg )
+    {
+        printk(XENLOG_INFO "ERROR: Colored Page is null\n");
+        return NULL;
+    }
+
+    /* Assign page to domain */
+    if ( d && !(memflags & MEMF_no_owner) &&
+        assign_page(pg, 0, d, memflags) )
+    {
+        free_col_heap_page(pg);
+        return NULL;
+    }
+
+    return pg;
+}
+
+void free_col_heap_page(struct page_info *pg)
+{
+    /* This page is not a guest frame any more. */
+    pg->count_info = PGC_state_free;
+
+    page_set_owner(pg, NULL);
+    total_avail_col_pages++;
+    page_list_add_order( pg, page_to_head(pg) );
+}
+
+static inline void init_col_heap_pages(struct page_info *pg, unsigned long nr_pages)
+{
+    int i;
+
+    if ( color_init_state )
+    {
+        col_num_max = get_max_colors();
+        color_heap = xmalloc_array(color_list, col_num_max);
+        BUG_ON(!color_heap);
+
+        for ( i = 0; i < col_num_max; i++ )
+        {
+            printk(XENLOG_INFO "Init list for color: %u\n", i);
+            INIT_PAGE_LIST_HEAD(&color_heap[i]);
+        }
+
+        color_init_state = false;
+    }
+
+    printk(XENLOG_INFO "Init color heap pages with %lu pages for a given size of 0x%"PRIx64"\n",
+            nr_pages, nr_pages * PAGE_SIZE);
+    printk(XENLOG_INFO "Paging starting from: 0x%"PRIx64"\n", page_to_maddr(pg));
+    total_avail_col_pages += nr_pages;
+
+    for ( i = 0; i < nr_pages; i++ )
+    {
+        pg->colored = true;
+        page_list_add_order(pg, page_to_head(pg));
+        pg++;
+    }
+}
+
+static inline bool is_page_colored(struct page_info *pg)
+{
+        return pg->colored;
+}
+
+static void dump_col_heap(unsigned char key)
+{
+    struct page_info *pg;
+    unsigned long size;
+    unsigned int i;
+
+    printk("Colored heap info\n");
+    for ( i = 0; i < col_num_max; i++ )
+    {
+        printk("Heap[%u]: ", i);
+        size = 0;
+        page_list_for_each( pg, color_to_head(i) )
+        {
+            BUG_ON(!(color_from_page(pg) == i));
+            size++;
+        }
+        printk("%lu pages -> %lukB free\n", size, size << (PAGE_SHIFT - 10));
+    }
+
+    printk("Total number of pages: %lu\n", total_avail_col_pages);
+}
+#else /* !CONFIG_COLORING */
+#define init_col_heap_pages(x, y) init_heap_pages(x, y)
+
+inline struct page_info *alloc_col_domheap_page(
+	struct domain *d, unsigned int memflags)
+{
+	return NULL;
+}
+
+inline void free_col_heap_page(struct page_info *pg)
+{
+	return;
+}
+
+static inline bool is_page_colored(struct page_info *pg)
+{
+        return false;
+}
+#endif /* CONFIG_COLORING */
+
 /*************************
  * BINARY BUDDY ALLOCATOR
  */
@@ -458,7 +718,6 @@  static unsigned long node_need_scrub[MAX_NUMNODES];
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
-static DEFINE_SPINLOCK(heap_lock);
 static long outstanding_claims; /* total outstanding claims by all domains */
 
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
@@ -2600,6 +2859,9 @@  static void cf_check dump_heap(unsigned char key)
 static __init int cf_check register_heap_trigger(void)
 {
     register_keyhandler('H', dump_heap, "dump heap info", 1);
+#ifdef CONFIG_COLORING
+    register_keyhandler('c', dump_col_heap, "dump coloring heap info", 1);
+#endif
     return 0;
 }
 __initcall(register_heap_trigger);
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index f0861ed5bb..63288e537c 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -131,6 +131,11 @@  unsigned int online_page(mfn_t mfn, uint32_t *status);
 int offline_page(mfn_t mfn, int broken, uint32_t *status);
 int query_page_offline(mfn_t mfn, uint32_t *status);
 
+/* Colored suballocator. */
+struct page_info *alloc_col_domheap_page(
+    struct domain *d, unsigned int memflags);
+void free_col_heap_page(struct page_info *pg);
+
 void heap_init_late(void);
 
 int assign_pages(