diff mbox series

[V4] xen/gnttab: Store frame GFN in struct page_info on Arm

Message ID 1638563610-4419-1-git-send-email-olekstysh@gmail.com (mailing list archive)
State Superseded
Headers show
Series [V4] xen/gnttab: Store frame GFN in struct page_info on Arm | expand

Commit Message

Oleksandr Tyshchenko Dec. 3, 2021, 8:33 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Rework the Arm implementation to store the grant table frame GFN
in struct page_info directly instead of keeping it in
standalone status/shared arrays. This patch is based on
the assumption that a grant table page is a xenheap page.

To cover 64-bit/40-bit IPA on Arm64/Arm32 we need space
to hold a 52-bit/28-bit + extra bit value respectively. In order
not to grow the size of struct page_info, borrow the required
number of bits from type_info's count portion, which the current
code can spare (currently only 1 bit is used on Arm).
Please note, to minimize code changes and avoid introducing
extra #ifdef-s in the header, we keep the same number of
bits on both subarches, although the count portion on Arm64
could be wider, so we waste some bits there.

Introduce corresponding PGT_* constructs and access macros.
Update existing gnttab macros to deal with GFN value according
to new location. Also update the use of count portion on Arm
in share_xen_page_with_guest().

While at it, extend this simplified M2P-like approach to any
xenheap pages which are processed in xenmem_add_to_physmap_one(),
except foreign ones. Update the code to set the GFN portion after
establishing a new mapping for the xenheap page in said function
and to clear the GFN portion when putting a reference on that page
in p2m_put_l3_page().

For everything to work correctly, introduce the arch-specific
macros arch_alloc_xenheap_page()/arch_free_xenheap_page(), to be
called from alloc_xenheap_pages()/free_xenheap_pages() respectively.
The former's purpose on Arm is to clear the GFN portion before use;
the latter is left a no-op for now. On x86 both are just stubs.

This patch is intended to fix a potential issue on Arm
which might happen when remapping a grant-table frame.
A guest (or the toolstack) will unmap the grant-table frame
using XENMEM_remove_physmap. This is a generic hypercall,
so on x86 we rely on the fact that the M2P entry will
be cleared on removal. For architectures without an M2P,
the GFN would still be present in the grant frame/status
array, so on the next call to map the page we would end up
asking the P2M to remove whatever mapping was at the given GFN.
This could well be another mapping.

Besides that, this patch simplifies the arch code on Arm by
removing the arrays and the corresponding management code; as
a result the gnttab_init_arch()/gnttab_destroy_arch() helpers
and struct grant_table_arch become useless and can be
dropped globally.

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Dear @RISC-V maintainers, please note that in the current patch I drop the
arch-specific gnttab_init(destroy)_arch helpers as unneeded for both Arm and
x86. Please let me know if you are going to reuse them in the near future and
I will retain them.

You can find the related discussions at:
https://lore.kernel.org/xen-devel/93d0df14-2c8a-c2e3-8c51-54412190171c@xen.org/
https://lore.kernel.org/xen-devel/1628890077-12545-1-git-send-email-olekstysh@gmail.com/
https://lore.kernel.org/xen-devel/1631652245-30746-1-git-send-email-olekstysh@gmail.com/
https://lore.kernel.org/xen-devel/1632425551-18910-1-git-send-email-olekstysh@gmail.com/

Changes RFC1 -> RFC2:
 - update patch description
 - add/update comments in code
 - clarify check in p2m_put_l3_page()
 - introduce arch_alloc_xenheap_page() and arch_free_xenheap_page()
   and drop page_arch_init()
 - add ASSERT to gnttab_shared_page() and gnttab_status_page()
 - rework changes to Arm's struct page_info: do not split type_info,
   allocate GFN portion by reducing count portion, create corresponding
   PGT_* construct, etc
 - update page_get_frame_gfn() and page_set_frame_gfn()
 - update the use of count portion on Arm
 - drop the leading underscore in the macro parameter names

Changes RFC2 -> RFC3:
 - update patch description
 - drop PGT_count_base and MASK_INSR() in share_xen_page_with_guest()
 - update alloc_xenheap_page() and free_xenheap_page() for SEPARATE_XENHEAP
   case (Arm32)
 - provide an extra bit for GFN portion, to get PGT_INVALID_FRAME_GFN
   one bit more than the maximum number of physical address bits on Arm32

Changes RFC3 -> V4:
 - rebase on Jan's "gnttab: remove guest_physmap_remove_page() call
   from gnttab_map_frame()"
 - finally resolve the locking question per Julien's recent suggestion,
   so drop the RFC tag
 - update comments in Arm's mm.h/p2m.c to not mention grant table
 - convert page_set(get)_frame_gfn to static inline func and
   rename them to page_set(get)_xenheap_gfn()
 - rename PGT_INVALID_FRAME_GFN to PGT_INVALID_XENHEAP_GFN
 - add ASSERT(is_xen_heap_page(...)) in page_set(get)_frame_gfn
 - remove BUG_ON() in arch_free_xenheap_page
 - remove local type_info in share_xen_page_with_guest()
 - remove an extra argument p2m in p2m_put_l3_page()
 - remove #ifdef CONFIG_GRANT_TABLE in p2m_put_l3_page()
 - also cover read-only pages by using p2m_is_ram instead of a check
   against p2m_ram_rw in p2m_put_l3_page() and use "else if" construct
 - call arch_free_xenheap_page() before clearing the PGC_xen_heap in
   free_xenheap_pages()
 - remove ASSERT() in gnttab_shared(status)_page and use simpler
   virt_to_page
 - remove local pg_ in gnttab_shared(status)_gfn
 - update patch description to reflect recent changes
---
 xen/arch/arm/mm.c                 | 24 ++++++++++++++--
 xen/arch/arm/p2m.c                |  7 +++--
 xen/common/grant_table.c          |  9 ------
 xen/common/page_alloc.c           | 20 +++++++++++++-
 xen/include/asm-arm/grant_table.h | 58 ++++++++++++---------------------------
 xen/include/asm-arm/mm.h          | 39 ++++++++++++++++++++++++--
 xen/include/asm-x86/grant_table.h |  5 ----
 xen/include/asm-x86/mm.h          |  4 +++
 8 files changed, 102 insertions(+), 64 deletions(-)

Comments

Oleksandr Tyshchenko Dec. 6, 2021, 4:15 p.m. UTC | #1
Hi Jan


May I please ask, are you OK with the proposed changes?


On 03.12.21 22:33, Oleksandr Tyshchenko wrote:
> [...]
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index eea926d..4f4cab3 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1382,8 +1382,10 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
>       spin_lock(&d->page_alloc_lock);
>   
>       /* The incremented type count pins as writable or read-only. */
> -    page->u.inuse.type_info =
> -        (flags == SHARE_ro ? PGT_none : PGT_writable_page) | 1;
> +    page->u.inuse.type_info &= ~(PGT_type_mask | PGT_count_mask);
> +    page->u.inuse.type_info |= (flags == SHARE_ro ? PGT_none
> +                                                  : PGT_writable_page) |
> +                                MASK_INSR(1, PGT_count_mask);
>   
>       page_set_owner(page, d);
>       smp_wmb(); /* install valid domain ptr before updating refcnt. */
> @@ -1487,7 +1489,23 @@ int xenmem_add_to_physmap_one(
>       }
>   
>       /* Map at new location. */
> -    rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
> +    if ( !p2m_is_ram(t) || !is_xen_heap_mfn(mfn) )
> +        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
> +    else
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +        p2m_write_lock(p2m);
> +        if ( gfn_eq(page_get_xenheap_gfn(mfn_to_page(mfn)), INVALID_GFN) )
> +        {
> +            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, p2m->default_access);
> +            if ( !rc )
> +                page_set_xenheap_gfn(mfn_to_page(mfn), gfn);
> +        }
> +        else
> +            rc = -EBUSY;
> +        p2m_write_unlock(p2m);
> +    }
>   
>       /*
>        * For XENMAPSPACE_gmfn_foreign if we failed to add the mapping, we need
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 8b20b43..fd8aff9 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -720,6 +720,8 @@ static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
>    */
>   static void p2m_put_l3_page(const lpae_t pte)
>   {
> +    mfn_t mfn = lpae_get_mfn(pte);
> +
>       ASSERT(p2m_is_valid(pte));
>   
>       /*
> @@ -731,11 +733,12 @@ static void p2m_put_l3_page(const lpae_t pte)
>        */
>       if ( p2m_is_foreign(pte.p2m.type) )
>       {
> -        mfn_t mfn = lpae_get_mfn(pte);
> -
>           ASSERT(mfn_valid(mfn));
>           put_page(mfn_to_page(mfn));
>       }
> +    /* Detect the xenheap page and mark the stored GFN as invalid. */
> +    else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
> +        page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
>   }
>   
>   /* Free lpae sub-tree behind an entry */
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 0262f2c..01d7a29 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -94,8 +94,6 @@ struct grant_table {
>   
>       /* Domain to which this struct grant_table belongs. */
>       struct domain *domain;
> -
> -    struct grant_table_arch arch;
>   };
>   
>   unsigned int __read_mostly opt_max_grant_frames = 64;
> @@ -1997,14 +1995,9 @@ int grant_table_init(struct domain *d, int max_grant_frames,
>   
>       grant_write_lock(gt);
>   
> -    ret = gnttab_init_arch(gt);
> -    if ( ret )
> -        goto unlock;
> -
>       /* gnttab_grow_table() allocates a min number of frames, so 0 is okay. */
>       ret = gnttab_grow_table(d, 0);
>   
> - unlock:
>       grant_write_unlock(gt);
>   
>    out:
> @@ -3911,8 +3904,6 @@ grant_table_destroy(
>       if ( t == NULL )
>           return;
>   
> -    gnttab_destroy_arch(t);
> -
>       for ( i = 0; i < nr_grant_frames(t); i++ )
>           free_xenheap_page(t->shared_raw[i]);
>       xfree(t->shared_raw);
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index d0baaa2..2306d9a 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2161,6 +2161,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe)
>   void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>   {
>       struct page_info *pg;
> +    unsigned int i;
>   
>       ASSERT(!in_irq());
>   
> @@ -2169,6 +2170,9 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>       if ( unlikely(pg == NULL) )
>           return NULL;
>   
> +    for ( i = 0; i < (1u << order); i++ )
> +        arch_alloc_xenheap_page(&pg[i]);
> +
>       memguard_unguard_range(page_to_virt(pg), 1 << (order + PAGE_SHIFT));
>   
>       return page_to_virt(pg);
> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>   
>   void free_xenheap_pages(void *v, unsigned int order)
>   {
> +    struct page_info *pg;
> +    unsigned int i;
> +
>       ASSERT(!in_irq());
>   
>       if ( v == NULL )
>           return;
>   
> +    pg = virt_to_page(v);
> +
>       memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>   
> -    free_heap_pages(virt_to_page(v), order, false);
> +    for ( i = 0; i < (1u << order); i++ )
> +        arch_free_xenheap_page(&pg[i]);
> +
> +    free_heap_pages(pg, order, false);
>   }
>   
>   #else  /* !CONFIG_SEPARATE_XENHEAP */
> @@ -2220,7 +2232,10 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>           return NULL;
>   
>       for ( i = 0; i < (1u << order); i++ )
> +    {
>           pg[i].count_info |= PGC_xen_heap;
> +        arch_alloc_xenheap_page(&pg[i]);
> +    }
>   
>       return page_to_virt(pg);
>   }
> @@ -2238,7 +2253,10 @@ void free_xenheap_pages(void *v, unsigned int order)
>       pg = virt_to_page(v);
>   
>       for ( i = 0; i < (1u << order); i++ )
> +    {
> +        arch_free_xenheap_page(&pg[i]);
>           pg[i].count_info &= ~PGC_xen_heap;
> +    }
>   
>       free_heap_pages(pg, order, true);
>   }
> diff --git a/xen/include/asm-arm/grant_table.h b/xen/include/asm-arm/grant_table.h
> index d31a4d6..d6fda31 100644
> --- a/xen/include/asm-arm/grant_table.h
> +++ b/xen/include/asm-arm/grant_table.h
> @@ -11,11 +11,6 @@
>   #define INITIAL_NR_GRANT_FRAMES 1U
>   #define GNTTAB_MAX_VERSION 1
>   
> -struct grant_table_arch {
> -    gfn_t *shared_gfn;
> -    gfn_t *status_gfn;
> -};
> -
>   static inline void gnttab_clear_flags(struct domain *d,
>                                         unsigned int mask, uint16_t *addr)
>   {
> @@ -46,41 +41,12 @@ int replace_grant_host_mapping(unsigned long gpaddr, mfn_t mfn,
>   #define gnttab_dom0_frames()                                             \
>       min_t(unsigned int, opt_max_grant_frames, PFN_DOWN(_etext - _stext))
>   
> -#define gnttab_init_arch(gt)                                             \
> -({                                                                       \
> -    unsigned int ngf_ = (gt)->max_grant_frames;                          \
> -    unsigned int nsf_ = grant_to_status_frames(ngf_);                    \
> -                                                                         \
> -    (gt)->arch.shared_gfn = xmalloc_array(gfn_t, ngf_);                  \
> -    (gt)->arch.status_gfn = xmalloc_array(gfn_t, nsf_);                  \
> -    if ( (gt)->arch.shared_gfn && (gt)->arch.status_gfn )                \
> -    {                                                                    \
> -        while ( ngf_-- )                                                 \
> -            (gt)->arch.shared_gfn[ngf_] = INVALID_GFN;                   \
> -        while ( nsf_-- )                                                 \
> -            (gt)->arch.status_gfn[nsf_] = INVALID_GFN;                   \
> -    }                                                                    \
> -    else                                                                 \
> -        gnttab_destroy_arch(gt);                                         \
> -    (gt)->arch.shared_gfn ? 0 : -ENOMEM;                                 \
> -})
> -
> -#define gnttab_destroy_arch(gt)                                          \
> -    do {                                                                 \
> -        XFREE((gt)->arch.shared_gfn);                                    \
> -        XFREE((gt)->arch.status_gfn);                                    \
> -    } while ( 0 )
> -
>   #define gnttab_set_frame_gfn(gt, st, idx, gfn, mfn)                      \
>       ({                                                                   \
> -        int rc_ = 0;                                                     \
>           gfn_t ogfn = gnttab_get_frame_gfn(gt, st, idx);                  \
> -        if ( gfn_eq(ogfn, INVALID_GFN) || gfn_eq(ogfn, gfn) ||           \
> -             (rc_ = guest_physmap_remove_page((gt)->domain, ogfn, mfn,   \
> -                                              0)) == 0 )                 \
> -            ((st) ? (gt)->arch.status_gfn                                \
> -                  : (gt)->arch.shared_gfn)[idx] = (gfn);                 \
> -        rc_;                                                             \
> +        (!gfn_eq(ogfn, INVALID_GFN) && !gfn_eq(ogfn, gfn))               \
> +         ? guest_physmap_remove_page((gt)->domain, ogfn, mfn, 0)         \
> +         : 0;                                                            \
>       })
>   
>   #define gnttab_get_frame_gfn(gt, st, idx) ({                             \
> @@ -88,11 +54,21 @@ int replace_grant_host_mapping(unsigned long gpaddr, mfn_t mfn,
>           : gnttab_shared_gfn(NULL, gt, idx);                              \
>   })
>   
> -#define gnttab_shared_gfn(d, t, i)                                       \
> -    (((i) >= nr_grant_frames(t)) ? INVALID_GFN : (t)->arch.shared_gfn[i])
> +#define gnttab_shared_page(t, i) ({                                      \
> +    virt_to_page((t)->shared_raw[i]);                                    \
> +})
> +
> +#define gnttab_status_page(t, i) ({                                      \
> +    virt_to_page((t)->status[i]);                                        \
> +})
>   
> -#define gnttab_status_gfn(d, t, i)                                       \
> -    (((i) >= nr_status_frames(t)) ? INVALID_GFN : (t)->arch.status_gfn[i])
> +#define gnttab_shared_gfn(d, t, i) ({                                    \
> +    page_get_xenheap_gfn(gnttab_shared_page(t, i));                      \
> +})
> +
> +#define gnttab_status_gfn(d, t, i) ({                                    \
> +    page_get_xenheap_gfn(gnttab_status_page(t, i));                      \
> +})
>   
>   #define gnttab_need_iommu_mapping(d)                    \
>       (is_domain_direct_mapped(d) && is_iommu_enabled(d))
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index 7b5e7b7..74b6485 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -98,9 +98,16 @@ struct page_info
>   #define PGT_writable_page PG_mask(1, 1)  /* has writable mappings?         */
>   #define PGT_type_mask     PG_mask(1, 1)  /* Bits 31 or 63.                 */
>   
> - /* Count of uses of this frame as its current type. */
> -#define PGT_count_width   PG_shift(2)
> -#define PGT_count_mask    ((1UL<<PGT_count_width)-1)
> + /* 2-bit count of uses of this frame as its current type. */
> +#define PGT_count_mask    PG_mask(3, 3)
> +
> +/*
> + * Stored in bits [28:0] or [60:0] GFN if page is xenheap page.
> + */
> +#define PGT_gfn_width     PG_shift(3)
> +#define PGT_gfn_mask      ((1UL<<PGT_gfn_width)-1)
> +
> +#define PGT_INVALID_XENHEAP_GFN   _gfn(PGT_gfn_mask)
>   
>    /* Cleared when the owning guest 'frees' this page. */
>   #define _PGC_allocated    PG_shift(1)
> @@ -166,6 +173,32 @@ extern unsigned long xenheap_base_pdx;
>   
>   #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
>   
> +static inline gfn_t page_get_xenheap_gfn(struct page_info *p)
> +{
> +    gfn_t gfn_ = _gfn(p->u.inuse.type_info & PGT_gfn_mask);
> +
> +    ASSERT(is_xen_heap_page(p));
> +
> +    return gfn_eq(gfn_, PGT_INVALID_XENHEAP_GFN) ? INVALID_GFN : gfn_;
> +}
> +
> +static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
> +{
> +    gfn_t gfn_ = gfn_eq(gfn, INVALID_GFN) ? PGT_INVALID_XENHEAP_GFN : gfn;
> +
> +    ASSERT(is_xen_heap_page(p));
> +
> +    p->u.inuse.type_info &= ~PGT_gfn_mask;
> +    p->u.inuse.type_info |= gfn_x(gfn_);
> +}
> +
> +/*
> + * As the struct page_info representing the xenheap page on Arm can contain
> + * the valid GFN we need to clear it beforehand.
> + */
> +#define arch_alloc_xenheap_page(p)   page_set_xenheap_gfn(p, INVALID_GFN)
> +#define arch_free_xenheap_page(p)    do {} while ( 0 )
> +
>   #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
>   /* PDX of the first page in the frame table. */
>   extern unsigned long frametable_base_pdx;
> diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h
> index a8a2143..5c23cec 100644
> --- a/xen/include/asm-x86/grant_table.h
> +++ b/xen/include/asm-x86/grant_table.h
> @@ -14,9 +14,6 @@
>   
>   #define INITIAL_NR_GRANT_FRAMES 1U
>   
> -struct grant_table_arch {
> -};
> -
>   static inline int create_grant_host_mapping(uint64_t addr, mfn_t frame,
>                                               unsigned int flags,
>                                               unsigned int cache_flags)
> @@ -35,8 +32,6 @@ static inline int replace_grant_host_mapping(uint64_t addr, mfn_t frame,
>       return replace_grant_pv_mapping(addr, frame, new_addr, flags);
>   }
>   
> -#define gnttab_init_arch(gt) 0
> -#define gnttab_destroy_arch(gt) do {} while ( 0 )
>   #define gnttab_set_frame_gfn(gt, st, idx, gfn, mfn)                      \
>       (gfn_eq(gfn, INVALID_GFN)                                            \
>        ? guest_physmap_remove_page((gt)->domain,                           \
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index cb90527..3c153c6 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -327,6 +327,10 @@ struct page_info
>   
>   #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
>   
> +/* No arch-specific actions are needed for the xenheap page */
> +#define arch_alloc_xenheap_page(p)   do {} while ( 0 )
> +#define arch_free_xenheap_page(p)    do {} while ( 0 )
> +
>   #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
>   extern unsigned long max_page;
>   extern unsigned long total_pages;
Jan Beulich Dec. 14, 2021, 1:37 p.m. UTC | #2
On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1382,8 +1382,10 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
>      spin_lock(&d->page_alloc_lock);
>  
>      /* The incremented type count pins as writable or read-only. */
> -    page->u.inuse.type_info =
> -        (flags == SHARE_ro ? PGT_none : PGT_writable_page) | 1;
> +    page->u.inuse.type_info &= ~(PGT_type_mask | PGT_count_mask);
> +    page->u.inuse.type_info |= (flags == SHARE_ro ? PGT_none
> +                                                  : PGT_writable_page) |
> +                                MASK_INSR(1, PGT_count_mask);

It's certainly up to the Arm maintainers to judge, but I would have
deemed it better (less risky going forward) if PGT_count_mask
continued to use the bottom bits. (I guess I may have said so before.)

> @@ -1487,7 +1489,23 @@ int xenmem_add_to_physmap_one(
>      }
>  
>      /* Map at new location. */
> -    rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
> +    if ( !p2m_is_ram(t) || !is_xen_heap_mfn(mfn) )
> +        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
> +    else
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +        p2m_write_lock(p2m);
> +        if ( gfn_eq(page_get_xenheap_gfn(mfn_to_page(mfn)), INVALID_GFN) )
> +        {
> +            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, p2m->default_access);
> +            if ( !rc )
> +                page_set_xenheap_gfn(mfn_to_page(mfn), gfn);
> +        }
> +        else
> +            rc = -EBUSY;

May I suggest to avoid failing here when page_get_xenheap_gfn(mfn_to_page(mfn))
matches the passed in GFN?

> @@ -2169,6 +2170,9 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>      if ( unlikely(pg == NULL) )
>          return NULL;
>  
> +    for ( i = 0; i < (1u << order); i++ )
> +        arch_alloc_xenheap_page(&pg[i]);
> +
>      memguard_unguard_range(page_to_virt(pg), 1 << (order + PAGE_SHIFT));

I think this and ...

> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>  
>  void free_xenheap_pages(void *v, unsigned int order)
>  {
> +    struct page_info *pg;
> +    unsigned int i;
> +
>      ASSERT(!in_irq());
>  
>      if ( v == NULL )
>          return;
>  
> +    pg = virt_to_page(v);
> +
>      memguard_guard_range(v, 1 << (order + PAGE_SHIFT));

... this really want to (logically) move into the new arch hooks.
That'll effectively mean to simply drop the Arm stubs afaict (and I
notice there's some dead code there on x86, which I guess I'll make
a patch to clean up). But first of all this suggests that you want
to call the hooks with base page and order, putting the loops there.

> @@ -166,6 +173,32 @@ extern unsigned long xenheap_base_pdx;
>  
>  #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
>  
> +static inline gfn_t page_get_xenheap_gfn(struct page_info *p)

const please wherever possible.

Jan
Oleksandr Tyshchenko Dec. 14, 2021, 4:26 p.m. UTC | #3
On 14.12.21 15:37, Jan Beulich wrote:

Hi Jan, Julien.

> On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
>> --- a/xen/arch/arm/mm.c
>> +++ b/xen/arch/arm/mm.c
>> @@ -1382,8 +1382,10 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
>>       spin_lock(&d->page_alloc_lock);
>>   
>>       /* The incremented type count pins as writable or read-only. */
>> -    page->u.inuse.type_info =
>> -        (flags == SHARE_ro ? PGT_none : PGT_writable_page) | 1;
>> +    page->u.inuse.type_info &= ~(PGT_type_mask | PGT_count_mask);
>> +    page->u.inuse.type_info |= (flags == SHARE_ro ? PGT_none
>> +                                                  : PGT_writable_page) |
>> +                                MASK_INSR(1, PGT_count_mask);
> It's certainly up to the Arm maintainers to judge, but I would have
> deemed it better (less risky going forward) if PGT_count_mask
> continued to use the bottom bits. (I guess I may have said so before.)

If I am not mistaken, the request was to make sure (re-check) that moving
the count portion up was compatible with all present uses. The code above
is the only place on Arm which needs updating.


>
>> @@ -1487,7 +1489,23 @@ int xenmem_add_to_physmap_one(
>>       }
>>   
>>       /* Map at new location. */
>> -    rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
>> +    if ( !p2m_is_ram(t) || !is_xen_heap_mfn(mfn) )
>> +        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
>> +    else
>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +        p2m_write_lock(p2m);
>> +        if ( gfn_eq(page_get_xenheap_gfn(mfn_to_page(mfn)), INVALID_GFN) )
>> +        {
>> +            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, p2m->default_access);
>> +            if ( !rc )
>> +                page_set_xenheap_gfn(mfn_to_page(mfn), gfn);
>> +        }
>> +        else
>> +            rc = -EBUSY;
> May I suggest to avoid failing here when page_get_xenheap_gfn(mfn_to_page(mfn))
> matches the passed in GFN?


Good question...
There was an explicit request to fail here if page_get_xenheap_gfn()
returns a valid GFN. On the other hand, if the old GFN matches the new
GFN we do not remove the mapping in gnttab_set_frame_gfn(), so we could
probably avoid failing here in that particular case.
@Julien, what do you think?


>
>> @@ -2169,6 +2170,9 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>       if ( unlikely(pg == NULL) )
>>           return NULL;
>>   
>> +    for ( i = 0; i < (1u << order); i++ )
>> +        arch_alloc_xenheap_page(&pg[i]);
>> +
>>       memguard_unguard_range(page_to_virt(pg), 1 << (order + PAGE_SHIFT));
> I think this and ...
>
>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>   
>>   void free_xenheap_pages(void *v, unsigned int order)
>>   {
>> +    struct page_info *pg;
>> +    unsigned int i;
>> +
>>       ASSERT(!in_irq());
>>   
>>       if ( v == NULL )
>>           return;
>>   
>> +    pg = virt_to_page(v);
>> +
>>       memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
> ... this really want to (logically) move into the new arch hooks.
> That'll effectively mean to simply drop the Arm stubs afaict (and I
> notice there's some dead code there on x86, which I guess I'll make
> a patch to clean up). But first of all this suggests that you want
> to call the hooks with base page and order, putting the loops there.

I see your point and agree ... However, I see the on-list patches that
remove the common memguard_* invocations and the x86 bits.
So I assume this request no longer applies, or do I still need to pass
an order to the new arch hooks? Please clarify.


>
>> @@ -166,6 +173,32 @@ extern unsigned long xenheap_base_pdx;
>>   
>>   #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
>>   
>> +static inline gfn_t page_get_xenheap_gfn(struct page_info *p)
> const please wherever possible.

ok, will do.


Thank you.


>
> Jan
>
Jan Beulich Dec. 14, 2021, 4:45 p.m. UTC | #4
On 14.12.2021 17:26, Oleksandr wrote:
> On 14.12.21 15:37, Jan Beulich wrote:
>> On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
>>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>>   
>>>   void free_xenheap_pages(void *v, unsigned int order)
>>>   {
>>> +    struct page_info *pg;
>>> +    unsigned int i;
>>> +
>>>       ASSERT(!in_irq());
>>>   
>>>       if ( v == NULL )
>>>           return;
>>>   
>>> +    pg = virt_to_page(v);
>>> +
>>>       memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>> ... this really want to (logically) move into the new arch hooks.
>> That'll effectively mean to simply drop the Arm stubs afaict (and I
>> notice there's some dead code there on x86, which I guess I'll make
>> a patch to clean up). But first of all this suggests that you want
>> to call the hooks with base page and order, putting the loops there.
> 
> I see your point and agree ... However I see the on-list patches that 
> remove common memguard_* invocations and x86 bits.
> So I assume, this request is not actual anymore, or I still need to pass 
> an order to new arch hooks? Please clarify.

Well, that patch (really just the Arm one) effectively takes care of
part of what I did say above. Irrespective I continue to think that
the hook should take a (page,order) tuple instead of getting invoked
once for every order-0 page. And the hook invocations should be placed
such that they could fulfill the (being removed) memguard function
(iirc that was already the case, at least mostly).

Jan
Oleksandr Tyshchenko Dec. 14, 2021, 4:51 p.m. UTC | #5
On 14.12.21 18:45, Jan Beulich wrote:

Hi Jan.

> On 14.12.2021 17:26, Oleksandr wrote:
>> On 14.12.21 15:37, Jan Beulich wrote:
>>> On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
>>>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>>>    
>>>>    void free_xenheap_pages(void *v, unsigned int order)
>>>>    {
>>>> +    struct page_info *pg;
>>>> +    unsigned int i;
>>>> +
>>>>        ASSERT(!in_irq());
>>>>    
>>>>        if ( v == NULL )
>>>>            return;
>>>>    
>>>> +    pg = virt_to_page(v);
>>>> +
>>>>        memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>>> ... this really want to (logically) move into the new arch hooks.
>>> That'll effectively mean to simply drop the Arm stubs afaict (and I
>>> notice there's some dead code there on x86, which I guess I'll make
>>> a patch to clean up). But first of all this suggests that you want
>>> to call the hooks with base page and order, putting the loops there.
>> I see your point and agree ... However I see the on-list patches that
>> remove common memguard_* invocations and x86 bits.
>> So I assume, this request is not actual anymore, or I still need to pass
>> an order to new arch hooks? Please clarify.
> Well, that patch (really just the Arm one) effectively takes care of
> part of what I did say above. Irrespective I continue to think that
> the hook should take a (page,order) tuple instead of getting invoked
> once for every order-0 page. And the hook invocations should be placed
> such that they could fulfill the (being removed) memguard function
> (iirc that was already the case, at least mostly).

ok, thank you for the clarification, will do.


>
> Jan
>
Julien Grall Dec. 22, 2021, 9:49 a.m. UTC | #6
Hi Oleksandr,

On 14/12/2021 17:26, Oleksandr wrote:
>>> @@ -1487,7 +1489,23 @@ int xenmem_add_to_physmap_one(
>>>       }
>>>       /* Map at new location. */
>>> -    rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
>>> +    if ( !p2m_is_ram(t) || !is_xen_heap_mfn(mfn) )
>>> +        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
>>> +    else
>>> +    {
>>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> +
>>> +        p2m_write_lock(p2m);
>>> +        if ( gfn_eq(page_get_xenheap_gfn(mfn_to_page(mfn)), 
>>> INVALID_GFN) )
>>> +        {
>>> +            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, 
>>> p2m->default_access);
>>> +            if ( !rc )
>>> +                page_set_xenheap_gfn(mfn_to_page(mfn), gfn);
>>> +        }
>>> +        else
>>> +            rc = -EBUSY;
>> May I suggest to avoid failing here when 
>> page_get_xenheap_gfn(mfn_to_page(mfn))
>> matches the passed in GFN?
> 
> 
> Good question...
> There was an explicit request to fail here if page_get_xenheap_gfn() 
> returns a valid GFN.
>  From the other side, if old GFN matches new GFN we do not remove the 
> mapping in gnttab_set_frame_gfn(),
> so probably we could avoid failing here in that particular case. 
> @Julien, what do you think?

I will answer with a question :). Can this situation happen under normal
circumstances (e.g. when there is no concurrent call)?

The recent XSAs in the grant table code made me more cautious, and I would
prefer to fail more often rather than risk introducing yet another security
issue. It is easy to relax this afterwards if there are legitimate use cases.

Cheers,
Julien Grall Dec. 22, 2021, 10:01 a.m. UTC | #7
Hi Jan,

On 14/12/2021 17:45, Jan Beulich wrote:
> On 14.12.2021 17:26, Oleksandr wrote:
>> On 14.12.21 15:37, Jan Beulich wrote:
>>> On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
>>>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>>>    
>>>>    void free_xenheap_pages(void *v, unsigned int order)
>>>>    {
>>>> +    struct page_info *pg;
>>>> +    unsigned int i;
>>>> +
>>>>        ASSERT(!in_irq());
>>>>    
>>>>        if ( v == NULL )
>>>>            return;
>>>>    
>>>> +    pg = virt_to_page(v);
>>>> +
>>>>        memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>>> ... this really want to (logically) move into the new arch hooks.
>>> That'll effectively mean to simply drop the Arm stubs afaict (and I
>>> notice there's some dead code there on x86, which I guess I'll make
>>> a patch to clean up). But first of all this suggests that you want
>>> to call the hooks with base page and order, putting the loops there.
>>
>> I see your point and agree ... However I see the on-list patches that
>> remove common memguard_* invocations and x86 bits.
>> So I assume, this request is not actual anymore, or I still need to pass
>> an order to new arch hooks? Please clarify.
> 
> Well, that patch (really just the Arm one) effectively takes care of
> part of what I did say above. Irrespective I continue to think that
> the hook should take a (page,order) tuple instead of getting invoked
> once for every order-0 page. And the hook invocations should be placed
> such that they could fulfill the (being removed) memguard function
> (iirc that was already the case, at least mostly).

IIUC your suggestion, with your approach, alloc_xenheap_pages() would 
look like:

      for ( i = 0; i < (1u << order); i++ )
          pg[i].count_info |= PGC_xen_heap;

      arch_alloc_xenheap_pages(pg, 1U << order);

The Arm implementation for arch_alloc_xenheap_pages() would also contain 
a loop.

This could turn out to be quite expensive with large allocations (a 1GB
allocation would require 16MB of cache) because the cache may not have
enough space to contain all the pages of that range. So you would have to
pull the page_info into the cache twice.

This is something we noticed when working on reducing the downtime for
Live-Update. We were able to save 100s of ms by reducing the number of
loops over the page_info.

So I would prefer Oleksandr's approach here.

Cheers,
Oleksandr Tyshchenko Dec. 22, 2021, 10:38 a.m. UTC | #8
On 22.12.21 11:49, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien


>
> On 14/12/2021 17:26, Oleksandr wrote:
>>>> @@ -1487,7 +1489,23 @@ int xenmem_add_to_physmap_one(
>>>>       }
>>>>       /* Map at new location. */
>>>> -    rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
>>>> +    if ( !p2m_is_ram(t) || !is_xen_heap_mfn(mfn) )
>>>> +        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
>>>> +    else
>>>> +    {
>>>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> +
>>>> +        p2m_write_lock(p2m);
>>>> +        if ( gfn_eq(page_get_xenheap_gfn(mfn_to_page(mfn)), 
>>>> INVALID_GFN) )
>>>> +        {
>>>> +            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, 
>>>> p2m->default_access);
>>>> +            if ( !rc )
>>>> +                page_set_xenheap_gfn(mfn_to_page(mfn), gfn);
>>>> +        }
>>>> +        else
>>>> +            rc = -EBUSY;
>>> May I suggest to avoid failing here when 
>>> page_get_xenheap_gfn(mfn_to_page(mfn))
>>> matches the passed in GFN?
>>
>>
>> Good question...
>> There was an explicit request to fail here if page_get_xenheap_gfn() 
>> returns a valid GFN.
>>  From the other side, if old GFN matches new GFN we do not remove the 
>> mapping in gnttab_set_frame_gfn(),
>> so probably we could avoid failing here in that particular case. 
>> @Julien, what do you think?
>
> I will answer by a question :). Can this situation happen in normal 
> circumstances (e.g. there is no concurrent call)?

I think no, it can't. Otherwise I would have noticed it while testing under
normal circumstances.


>
>
> The recent XSAs in the grant table code made me more cautious and I 
> would prefer if we fail more often than risking potentially 
> introducing yet another security issue. It is easy to relax afterwards 
> if there are legitimate use cases.

I got it, so your explicit request to fail here still stands. Thank you 
for the clarification.


>
>
> Cheers,
>
Oleksandr Tyshchenko Dec. 22, 2021, 10:59 a.m. UTC | #9
On 22.12.21 12:01, Julien Grall wrote:

Hi Julien

> Hi Jan,
>
> On 14/12/2021 17:45, Jan Beulich wrote:
>> On 14.12.2021 17:26, Oleksandr wrote:
>>> On 14.12.21 15:37, Jan Beulich wrote:
>>>> On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
>>>>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int 
>>>>> order, unsigned int memflags)
>>>>>       void free_xenheap_pages(void *v, unsigned int order)
>>>>>    {
>>>>> +    struct page_info *pg;
>>>>> +    unsigned int i;
>>>>> +
>>>>>        ASSERT(!in_irq());
>>>>>           if ( v == NULL )
>>>>>            return;
>>>>>    +    pg = virt_to_page(v);
>>>>> +
>>>>>        memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>>>> ... this really want to (logically) move into the new arch hooks.
>>>> That'll effectively mean to simply drop the Arm stubs afaict (and I
>>>> notice there's some dead code there on x86, which I guess I'll make
>>>> a patch to clean up). But first of all this suggests that you want
>>>> to call the hooks with base page and order, putting the loops there.
>>>
>>> I see your point and agree ... However I see the on-list patches that
>>> remove common memguard_* invocations and x86 bits.
>>> So I assume, this request is not actual anymore, or I still need to 
>>> pass
>>> an order to new arch hooks? Please clarify.
>>
>> Well, that patch (really just the Arm one) effectively takes care of
>> part of what I did say above. Irrespective I continue to think that
>> the hook should take a (page,order) tuple instead of getting invoked
>> once for every order-0 page. And the hook invocations should be placed
>> such that they could fulfill the (being removed) memguard function
>> (iirc that was already the case, at least mostly).
>
> IIUC your suggestion, with your approach, alloc_xenheap_pages() would 
> look like:
>
>      for ( i = 0; i < (1u << order); i++ )
>          pg[i].count_info |= PGC_xen_heap;
>
>      arch_alloc_xenheap_pages(pg, 1U << order);
>
> The Arm implementation for arch_alloc_xenheap_pages() would also 
> contain a loop.
I have the same understanding (except for passing an order instead of a
number, but that doesn't really matter now).

#define arch_alloc_xenheap_pages(p, order) ({                   \
    unsigned int cnt_ = 1u << (order);                          \
    while ( cnt_-- )                                            \
        page_set_xenheap_gfn(&(p)[cnt_], INVALID_GFN);          \
})

So yes if !CONFIG_SEPARATE_XENHEAP there will be two loops here.



>
>
> This could turn out to be quite expensive with large allocation (1GB 
> allocation would require 16MB of cache) because the cache may not have 
> enough space contain all the pages of that range. So you would have to 
> pull twice the page_info in the cache.
>
> This is something we noticed when working on reducing the downtime for 
> Live-Update. We were able to save 100s ms by reducing the number of 
> loop on the page_info.
>
> So I would prefer Oleksandr's approach here.
>
> Cheers,
>
Jan Beulich Dec. 22, 2021, 12:05 p.m. UTC | #10
On 22.12.2021 11:01, Julien Grall wrote:
> On 14/12/2021 17:45, Jan Beulich wrote:
>> On 14.12.2021 17:26, Oleksandr wrote:
>>> On 14.12.21 15:37, Jan Beulich wrote:
>>>> On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
>>>>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>>>>    
>>>>>    void free_xenheap_pages(void *v, unsigned int order)
>>>>>    {
>>>>> +    struct page_info *pg;
>>>>> +    unsigned int i;
>>>>> +
>>>>>        ASSERT(!in_irq());
>>>>>    
>>>>>        if ( v == NULL )
>>>>>            return;
>>>>>    
>>>>> +    pg = virt_to_page(v);
>>>>> +
>>>>>        memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>>>> ... this really want to (logically) move into the new arch hooks.
>>>> That'll effectively mean to simply drop the Arm stubs afaict (and I
>>>> notice there's some dead code there on x86, which I guess I'll make
>>>> a patch to clean up). But first of all this suggests that you want
>>>> to call the hooks with base page and order, putting the loops there.
>>>
>>> I see your point and agree ... However I see the on-list patches that
>>> remove common memguard_* invocations and x86 bits.
>>> So I assume, this request is not actual anymore, or I still need to pass
>>> an order to new arch hooks? Please clarify.
>>
>> Well, that patch (really just the Arm one) effectively takes care of
>> part of what I did say above. Irrespective I continue to think that
>> the hook should take a (page,order) tuple instead of getting invoked
>> once for every order-0 page. And the hook invocations should be placed
>> such that they could fulfill the (being removed) memguard function
>> (iirc that was already the case, at least mostly).
> 
> IIUC your suggestion, with your approach, alloc_xenheap_pages() would 
> look like:
> 
>       for ( i = 0; i < (1u << order); i++ )
>           pg[i].count_info |= PGC_xen_heap;
> 
>       arch_alloc_xenheap_pages(pg, 1U << order);

Like Oleksandr said, the 2nd argument would be just "order".

> The Arm implementation for arch_alloc_xenheap_pages() would also contain 
> a loop.
> 
> This could turn out to be quite expensive with large allocation (1GB 
> allocation would require 16MB of cache) because the cache may not have 
> enough space contain all the pages of that range. So you would have to 
> pull twice the page_info in the cache.

Hmm, that's a fair point. I assume you realize that a similar issue of
higher overhead would occur when using your approach, and when some
memguard-like thing were to reappear: such mapping operations typically
are more efficient when done on a larger range. Since that's only a
hypothetical use at this point, I'm willing to accept your preference.
I'd like us to consider one more aspect though: All you need on Arm is
the setting of the exact same bits to the exact same pattern for every
struct page_info involved. Can't we simply have an arch hook returning
that pattern, for generic code to then OR it in alongside PGC_xen_heap?

Jan
Julien Grall Dec. 22, 2021, 12:33 p.m. UTC | #11
Hi Jan,

On 22/12/2021 13:05, Jan Beulich wrote:
> On 22.12.2021 11:01, Julien Grall wrote:
>> On 14/12/2021 17:45, Jan Beulich wrote:
>>> On 14.12.2021 17:26, Oleksandr wrote:
>>>> On 14.12.21 15:37, Jan Beulich wrote:
>>>>> On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
>>>>>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>>>>>     
>>>>>>     void free_xenheap_pages(void *v, unsigned int order)
>>>>>>     {
>>>>>> +    struct page_info *pg;
>>>>>> +    unsigned int i;
>>>>>> +
>>>>>>         ASSERT(!in_irq());
>>>>>>     
>>>>>>         if ( v == NULL )
>>>>>>             return;
>>>>>>     
>>>>>> +    pg = virt_to_page(v);
>>>>>> +
>>>>>>         memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>>>>> ... this really want to (logically) move into the new arch hooks.
>>>>> That'll effectively mean to simply drop the Arm stubs afaict (and I
>>>>> notice there's some dead code there on x86, which I guess I'll make
>>>>> a patch to clean up). But first of all this suggests that you want
>>>>> to call the hooks with base page and order, putting the loops there.
>>>>
>>>> I see your point and agree ... However I see the on-list patches that
>>>> remove common memguard_* invocations and x86 bits.
>>>> So I assume, this request is not actual anymore, or I still need to pass
>>>> an order to new arch hooks? Please clarify.
>>>
>>> Well, that patch (really just the Arm one) effectively takes care of
>>> part of what I did say above. Irrespective I continue to think that
>>> the hook should take a (page,order) tuple instead of getting invoked
>>> once for every order-0 page. And the hook invocations should be placed
>>> such that they could fulfill the (being removed) memguard function
>>> (iirc that was already the case, at least mostly).
>>
>> IIUC your suggestion, with your approach, alloc_xenheap_pages() would
>> look like:
>>
>>        for ( i = 0; i < (1u << order); i++ )
>>            pg[i].count_info |= PGC_xen_heap;
>>
>>        arch_alloc_xenheap_pages(pg, 1U << order);
> 
> Like Oleksandr said, the 2nd argument would be just "order".
> 
>> The Arm implementation for arch_alloc_xenheap_pages() would also contain
>> a loop.
>>
>> This could turn out to be quite expensive with large allocation (1GB
>> allocation would require 16MB of cache) because the cache may not have
>> enough space contain all the pages of that range. So you would have to
>> pull twice the page_info in the cache.
> 
> Hmm, that's a fair point. I assume you realize that a similar issue of
> higher overhead would occur when using your approach, and when some
> memguard-like thing was to reappear: Such mapping operations typically
> are more efficient when done on a larger range.

Yes, I was aware of that when I wrote my message. However, they are not 
necessary at the moment. So I think we can defer the discussion.

>  Since that's only a
> hypothetical use at this point, I'm willing to accept your preference.
> I'd like us to consider one more aspect though: All you need on Arm is
> the setting of the exact same bits to the exact same pattern for every
> struct page_info involved. Can't we simply have an arch hook returning
> that pattern, for generic code to then OR it in alongside PGC_xen_heap?

arch_alloc_xenheap_pages() will modify inuse.type_info, so we can't simply
OR the value in alongside PGC_xen_heap.

Cheers,
Oleksandr Tyshchenko Dec. 22, 2021, 12:44 p.m. UTC | #12
On 22.12.21 14:33, Julien Grall wrote:
> Hi Jan,


Hi Julien, Jan



>
> On 22/12/2021 13:05, Jan Beulich wrote:
>> On 22.12.2021 11:01, Julien Grall wrote:
>>> On 14/12/2021 17:45, Jan Beulich wrote:
>>>> On 14.12.2021 17:26, Oleksandr wrote:
>>>>> On 14.12.21 15:37, Jan Beulich wrote:
>>>>>> On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
>>>>>>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int 
>>>>>>> order, unsigned int memflags)
>>>>>>>         void free_xenheap_pages(void *v, unsigned int order)
>>>>>>>     {
>>>>>>> +    struct page_info *pg;
>>>>>>> +    unsigned int i;
>>>>>>> +
>>>>>>>         ASSERT(!in_irq());
>>>>>>>             if ( v == NULL )
>>>>>>>             return;
>>>>>>>     +    pg = virt_to_page(v);
>>>>>>> +
>>>>>>>         memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>>>>>> ... this really want to (logically) move into the new arch hooks.
>>>>>> That'll effectively mean to simply drop the Arm stubs afaict (and I
>>>>>> notice there's some dead code there on x86, which I guess I'll make
>>>>>> a patch to clean up). But first of all this suggests that you want
>>>>>> to call the hooks with base page and order, putting the loops there.
>>>>>
>>>>> I see your point and agree ... However I see the on-list patches that
>>>>> remove common memguard_* invocations and x86 bits.
>>>>> So I assume, this request is not actual anymore, or I still need 
>>>>> to pass
>>>>> an order to new arch hooks? Please clarify.
>>>>
>>>> Well, that patch (really just the Arm one) effectively takes care of
>>>> part of what I did say above. Irrespective I continue to think that
>>>> the hook should take a (page,order) tuple instead of getting invoked
>>>> once for every order-0 page. And the hook invocations should be placed
>>>> such that they could fulfill the (being removed) memguard function
>>>> (iirc that was already the case, at least mostly).
>>>
>>> IIUC your suggestion, with your approach, alloc_xenheap_pages() would
>>> look like:
>>>
>>>        for ( i = 0; i < (1u << order); i++ )
>>>            pg[i].count_info |= PGC_xen_heap;
>>>
>>>        arch_alloc_xenheap_pages(pg, 1U << order);
>>
>> Like Oleksandr said, the 2nd argument would be just "order".
>>
>>> The Arm implementation for arch_alloc_xenheap_pages() would also 
>>> contain
>>> a loop.
>>>
>>> This could turn out to be quite expensive with large allocation (1GB
>>> allocation would require 16MB of cache) because the cache may not have
>>> enough space contain all the pages of that range. So you would have to
>>> pull twice the page_info in the cache.
>>
>> Hmm, that's a fair point. I assume you realize that a similar issue of
>> higher overhead would occur when using your approach, and when some
>> memguard-like thing was to reappear: Such mapping operations typically
>> are more efficient when done on a larger range.
>
> Yes, I was aware of that when I wrote my message. However, they are 
> not necessary at the moment. So I think we can defer the discussion.
>
>>  Since that's only a
>> hypothetical use at this point, I'm willing to accept your preference.
>> I'd like us to consider one more aspect though: All you need on Arm is
>> the setting of the exact same bits to the exact same pattern for every
>> struct page_info involved. Can't we simply have an arch hook returning
>> that pattern, for generic code to then OR it in alongside PGC_xen_heap?
>
> arch_alloc_xenheap_pages() will modify inuse.type_info so we can't or 
> the value to PGC_xen_heap.

I wonder, can we apply the pattern here at alloc_heap_pages() when
initializing type_info?
https://xenbits.xen.org/gitweb/?p=xen.git;f=xen/common/page_alloc.c;hb=refs/heads/master#l1027
If yes, the next question would be what indicator to use here to make
sure that the page is really a xenheap page.
I also wonder, can we apply the pattern for all types of pages here
(without differentiating)?
Jan Beulich Jan. 4, 2022, 8:31 a.m. UTC | #13
On 22.12.2021 13:33, Julien Grall wrote:
> On 22/12/2021 13:05, Jan Beulich wrote:
>> On 22.12.2021 11:01, Julien Grall wrote:
>>> On 14/12/2021 17:45, Jan Beulich wrote:
>>>> On 14.12.2021 17:26, Oleksandr wrote:
>>>>> On 14.12.21 15:37, Jan Beulich wrote:
>>>>>> On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
>>>>>>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>>>>>>     
>>>>>>>     void free_xenheap_pages(void *v, unsigned int order)
>>>>>>>     {
>>>>>>> +    struct page_info *pg;
>>>>>>> +    unsigned int i;
>>>>>>> +
>>>>>>>         ASSERT(!in_irq());
>>>>>>>     
>>>>>>>         if ( v == NULL )
>>>>>>>             return;
>>>>>>>     
>>>>>>> +    pg = virt_to_page(v);
>>>>>>> +
>>>>>>>         memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>>>>>> ... this really want to (logically) move into the new arch hooks.
>>>>>> That'll effectively mean to simply drop the Arm stubs afaict (and I
>>>>>> notice there's some dead code there on x86, which I guess I'll make
>>>>>> a patch to clean up). But first of all this suggests that you want
>>>>>> to call the hooks with base page and order, putting the loops there.
>>>>>
>>>>> I see your point and agree ... However I see the on-list patches that
>>>>> remove common memguard_* invocations and x86 bits.
>>>>> So I assume, this request is not actual anymore, or I still need to pass
>>>>> an order to new arch hooks? Please clarify.
>>>>
>>>> Well, that patch (really just the Arm one) effectively takes care of
>>>> part of what I did say above. Irrespective I continue to think that
>>>> the hook should take a (page,order) tuple instead of getting invoked
>>>> once for every order-0 page. And the hook invocations should be placed
>>>> such that they could fulfill the (being removed) memguard function
>>>> (iirc that was already the case, at least mostly).
>>>
>>> IIUC your suggestion, with your approach, alloc_xenheap_pages() would
>>> look like:
>>>
>>>        for ( i = 0; i < (1u << order); i++ )
>>>            pg[i].count_info |= PGC_xen_heap;
>>>
>>>        arch_alloc_xenheap_pages(pg, 1U << order);
>>
>> Like Oleksandr said, the 2nd argument would be just "order".
>>
>>> The Arm implementation for arch_alloc_xenheap_pages() would also contain
>>> a loop.
>>>
>>> This could turn out to be quite expensive with large allocation (1GB
>>> allocation would require 16MB of cache) because the cache may not have
>>> enough space contain all the pages of that range. So you would have to
>>> pull twice the page_info in the cache.
>>
>> Hmm, that's a fair point. I assume you realize that a similar issue of
>> higher overhead would occur when using your approach, and when some
>> memguard-like thing was to reappear: Such mapping operations typically
>> are more efficient when done on a larger range.
> 
> Yes, I was aware of that when I wrote my message. However, they are not 
> necessary at the moment. So I think we can defer the discussion.
> 
>>  Since that's only a
>> hypothetical use at this point, I'm willing to accept your preference.
>> I'd like us to consider one more aspect though: All you need on Arm is
>> the setting of the exact same bits to the exact same pattern for every
>> struct page_info involved. Can't we simply have an arch hook returning
>> that pattern, for generic code to then OR it in alongside PGC_xen_heap?
> 
> arch_alloc_xenheap_pages() will modify inuse.type_info so we can't or 
> the value to PGC_xen_heap.

Oh, sure - I didn't mean it to be a single OR, but two next to each other
inside the one loop that's already there.

Jan
Jan Beulich Jan. 4, 2022, 8:36 a.m. UTC | #14
On 22.12.2021 13:44, Oleksandr wrote:
> 
> On 22.12.21 14:33, Julien Grall wrote:
>> Hi Jan,
> 
> 
> Hi Julien, Jan
> 
> 
> 
>>
>> On 22/12/2021 13:05, Jan Beulich wrote:
>>> On 22.12.2021 11:01, Julien Grall wrote:
>>>> On 14/12/2021 17:45, Jan Beulich wrote:
>>>>> On 14.12.2021 17:26, Oleksandr wrote:
>>>>>> On 14.12.21 15:37, Jan Beulich wrote:
>>>>>>> On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
>>>>>>>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int 
>>>>>>>> order, unsigned int memflags)
>>>>>>>>         void free_xenheap_pages(void *v, unsigned int order)
>>>>>>>>     {
>>>>>>>> +    struct page_info *pg;
>>>>>>>> +    unsigned int i;
>>>>>>>> +
>>>>>>>>         ASSERT(!in_irq());
>>>>>>>>             if ( v == NULL )
>>>>>>>>             return;
>>>>>>>>     +    pg = virt_to_page(v);
>>>>>>>> +
>>>>>>>>         memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>>>>>>> ... this really want to (logically) move into the new arch hooks.
>>>>>>> That'll effectively mean to simply drop the Arm stubs afaict (and I
>>>>>>> notice there's some dead code there on x86, which I guess I'll make
>>>>>>> a patch to clean up). But first of all this suggests that you want
>>>>>>> to call the hooks with base page and order, putting the loops there.
>>>>>>
>>>>>> I see your point and agree ... However I see the on-list patches that
>>>>>> remove common memguard_* invocations and x86 bits.
>>>>>> So I assume, this request is not actual anymore, or I still need 
>>>>>> to pass
>>>>>> an order to new arch hooks? Please clarify.
>>>>>
>>>>> Well, that patch (really just the Arm one) effectively takes care of
>>>>> part of what I did say above. Irrespective I continue to think that
>>>>> the hook should take a (page,order) tuple instead of getting invoked
>>>>> once for every order-0 page. And the hook invocations should be placed
>>>>> such that they could fulfill the (being removed) memguard function
>>>>> (iirc that was already the case, at least mostly).
>>>>
>>>> IIUC your suggestion, with your approach, alloc_xenheap_pages() would
>>>> look like:
>>>>
>>>>        for ( i = 0; i < (1u << order); i++ )
>>>>            pg[i].count_info |= PGC_xen_heap;
>>>>
>>>>        arch_alloc_xenheap_pages(pg, 1U << order);
>>>
>>> Like Oleksandr said, the 2nd argument would be just "order".
>>>
>>>> The Arm implementation for arch_alloc_xenheap_pages() would also 
>>>> contain
>>>> a loop.
>>>>
>>>> This could turn out to be quite expensive with large allocation (1GB
>>>> allocation would require 16MB of cache) because the cache may not have
>>>> enough space contain all the pages of that range. So you would have to
>>>> pull twice the page_info in the cache.
>>>
>>> Hmm, that's a fair point. I assume you realize that a similar issue of
>>> higher overhead would occur when using your approach, and when some
>>> memguard-like thing was to reappear: Such mapping operations typically
>>> are more efficient when done on a larger range.
>>
>> Yes, I was aware of that when I wrote my message. However, they are 
>> not necessary at the moment. So I think we can defer the discussion.
>>
>>>  Since that's only a
>>> hypothetical use at this point, I'm willing to accept your preference.
>>> I'd like us to consider one more aspect though: All you need on Arm is
>>> the setting of the exact same bits to the exact same pattern for every
>>> struct page_info involved. Can't we simply have an arch hook returning
>>> that pattern, for generic code to then OR it in alongside PGC_xen_heap?
>>
>> arch_alloc_xenheap_pages() will modify inuse.type_info so we can't or 
>> the value to PGC_xen_heap.
> 
> I wonder, can we apply the pattern here in alloc_heap_pages() when 
> initializing type_info?
> https://xenbits.xen.org/gitweb/?p=xen.git;f=xen/common/page_alloc.c;hb=refs/heads/master#l1027
> If yes, the next question would be what indicator to use here to make 
> sure that the page is really a xenheap page.

Technically that would seem to be possible, by way of passing yet another
argument into alloc_heap_pages(). I'm not (yet) convinced, though, of this
being desirable.

> I also wonder, can we apply the pattern to all types of pages here (without 
> differentiating)?

I'm afraid I don't understand this part: How could we get along without
differentiating Xen heap and domain heap pages?

Jan
Oleksandr Tyshchenko Jan. 4, 2022, 9:41 p.m. UTC | #15
On 04.01.22 10:36, Jan Beulich wrote:

Hi Jan


> On 22.12.2021 13:44, Oleksandr wrote:
>> On 22.12.21 14:33, Julien Grall wrote:
>>> Hi Jan,
>>
>> Hi Julien, Jan
>>
>>
>>
>>> On 22/12/2021 13:05, Jan Beulich wrote:
>>>> On 22.12.2021 11:01, Julien Grall wrote:
>>>>> On 14/12/2021 17:45, Jan Beulich wrote:
>>>>>> On 14.12.2021 17:26, Oleksandr wrote:
>>>>>>> On 14.12.21 15:37, Jan Beulich wrote:
>>>>>>>> On 03.12.2021 21:33, Oleksandr Tyshchenko wrote:
>>>>>>>>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int
>>>>>>>>> order, unsigned int memflags)
>>>>>>>>>          void free_xenheap_pages(void *v, unsigned int order)
>>>>>>>>>      {
>>>>>>>>> +    struct page_info *pg;
>>>>>>>>> +    unsigned int i;
>>>>>>>>> +
>>>>>>>>>          ASSERT(!in_irq());
>>>>>>>>>              if ( v == NULL )
>>>>>>>>>              return;
>>>>>>>>>      +    pg = virt_to_page(v);
>>>>>>>>> +
>>>>>>>>>          memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>>>>>>>> ... this really want to (logically) move into the new arch hooks.
>>>>>>>> That'll effectively mean to simply drop the Arm stubs afaict (and I
>>>>>>>> notice there's some dead code there on x86, which I guess I'll make
>>>>>>>> a patch to clean up). But first of all this suggests that you want
>>>>>>>> to call the hooks with base page and order, putting the loops there.
>>>>>>> I see your point and agree ... However I see the on-list patches that
>>>>>>> remove common memguard_* invocations and x86 bits.
>>>>>>> So I assume this request is no longer relevant, or do I still need
>>>>>>> to pass
>>>>>>> an order to the new arch hooks? Please clarify.
>>>>>> Well, that patch (really just the Arm one) effectively takes care of
>>>>>> part of what I did say above. Irrespective I continue to think that
>>>>>> the hook should take a (page,order) tuple instead of getting invoked
>>>>>> once for every order-0 page. And the hook invocations should be placed
>>>>>> such that they could fulfill the (being removed) memguard function
>>>>>> (iirc that was already the case, at least mostly).
>>>>> IIUC your suggestion, with your approach, alloc_xenheap_pages() would
>>>>> look like:
>>>>>
>>>>>         for ( i = 0; i < (1u << order); i++ )
>>>>>             pg[i].count_info |= PGC_xen_heap;
>>>>>
>>>>>         arch_alloc_xenheap_pages(pg, 1U << order);
>>>> Like Oleksandr said, the 2nd argument would be just "order".
>>>>
>>>>> The Arm implementation for arch_alloc_xenheap_pages() would also
>>>>> contain a loop.
>>>>>
>>>>> This could turn out to be quite expensive with large allocation (1GB
>>>>> allocation would require 16MB of cache) because the cache may not have
>>>>> enough space to contain all the pages of that range. So you would have to
>>>>> pull the page_info into the cache twice.
>>>> Hmm, that's a fair point. I assume you realize that a similar issue of
>>>> higher overhead would occur when using your approach, and when some
>>>> memguard-like thing was to reappear: Such mapping operations typically
>>>> are more efficient when done on a larger range.
>>> Yes, I was aware of that when I wrote my message. However, they are
>>> not necessary at the moment. So I think we can defer the discussion.
>>>
>>>>   Since that's only a
>>>> hypothetical use at this point, I'm willing to accept your preference.
>>>> I'd like us to consider one more aspect though: All you need on Arm is
>>>> the setting of the exact same bits to the exact same pattern for every
>>>> struct page_info involved. Can't we simply have an arch hook returning
>>>> that pattern, for generic code to then OR it in alongside PGC_xen_heap?
>>> arch_alloc_xenheap_pages() will modify inuse.type_info so we can't or
>>> the value to PGC_xen_heap.
>> I wonder, can we apply the pattern here in alloc_heap_pages() when
>> initializing type_info?
>> https://xenbits.xen.org/gitweb/?p=xen.git;f=xen/common/page_alloc.c;hb=refs/heads/master#l1027
>> If yes, the next question would be what indicator to use here to make
>> sure that the page is really a xenheap page.
> Technically that would seem to be possible, by way of passing yet another
> argument into alloc_heap_pages(). I'm not (yet) convinced, though, of this
> being desirable.

Yes, adding an extra argument would work, but I also don't like that
idea much. I meant to determine somehow, from the existing arguments
(zone_*, memflags), that a particular page is a xenheap page.


>
>> I also wonder, can we apply the pattern to all types of pages here (without
>> differentiating)?
> I'm afraid I don't understand this part: How could we get along without
> differentiating Xen heap and domain heap pages?

I was wondering what could go wrong if we simply used the following:

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 50334a0..97cf0d8 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1024,7 +1024,7 @@ static struct page_info *alloc_heap_pages(
                                  &tlbflush_timestamp);

          /* Initialise fields which have other uses for free pages. */
-        pg[i].u.inuse.type_info = 0;
+        pg[i].u.inuse.type_info = PGT_TYPE_INFO_INIT_PATTERN;
          page_set_owner(&pg[i], NULL);

      }


on Arm:
#define PGT_TYPE_INFO_INIT_PATTERN   PGT_gfn_mask
or
#define PGT_TYPE_INFO_INIT_PATTERN   gfn_x(PGT_INVALID_XENHEAP_GFN)

on x86:
#define PGT_TYPE_INFO_INIT_PATTERN   0


Yes, we apply this pattern to *all* pages, although the gfn portion is 
only used for xenheap pages.
I might be mistaken, but I think this pattern (which doesn't set any bits
outside of the gfn portion) is harmless for non-xenheap pages, albeit an
extra action.

Well, if that is not appropriate, I will follow the advice to OR the
pattern in alloc_xenheap_pages(). For !CONFIG_SEPARATE_XENHEAP the loop
is already there, but for CONFIG_SEPARATE_XENHEAP a new loop is needed.


Thank you.


>
> Jan
>
Jan Beulich Jan. 5, 2022, 7:30 a.m. UTC | #16
On 04.01.2022 22:41, Oleksandr wrote:
> On 04.01.22 10:36, Jan Beulich wrote:
>> On 22.12.2021 13:44, Oleksandr wrote:
>>> I also wonder, can we apply the pattern to all types of pages here (without
>>> differentiating)?
>> I'm afraid I don't understand this part: How could we get along without
>> differentiating Xen heap and domain heap pages?
> 
> I was wondering what could go wrong if we simply used the following:
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 50334a0..97cf0d8 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1024,7 +1024,7 @@ static struct page_info *alloc_heap_pages(
>                                   &tlbflush_timestamp);
> 
>           /* Initialise fields which have other uses for free pages. */
> -        pg[i].u.inuse.type_info = 0;
> +        pg[i].u.inuse.type_info = PGT_TYPE_INFO_INIT_PATTERN;
>           page_set_owner(&pg[i], NULL);
> 
>       }
> 
> 
> on Arm:
> #define PGT_TYPE_INFO_INIT_PATTERN   PGT_gfn_mask
> or
> #define PGT_TYPE_INFO_INIT_PATTERN   gfn_x(PGT_INVALID_XENHEAP_GFN)
> 
> on x86:
> #define PGT_TYPE_INFO_INIT_PATTERN   0
> 
> 
> Yes, we apply this pattern to *all* pages, although the gfn portion is 
> only used for xenheap pages.
> I might be mistaken, but I think this pattern (which doesn't set any bits
> outside of the gfn portion) is harmless for non-xenheap pages, albeit an
> extra action.

I wouldn't mind this, but it's largely to the Arm maintainers to be
happy with it, i.e. not foreseeing present or latent issues.

Jan
diff mbox series

Patch

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index eea926d..4f4cab3 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1382,8 +1382,10 @@  void share_xen_page_with_guest(struct page_info *page, struct domain *d,
     spin_lock(&d->page_alloc_lock);
 
     /* The incremented type count pins as writable or read-only. */
-    page->u.inuse.type_info =
-        (flags == SHARE_ro ? PGT_none : PGT_writable_page) | 1;
+    page->u.inuse.type_info &= ~(PGT_type_mask | PGT_count_mask);
+    page->u.inuse.type_info |= (flags == SHARE_ro ? PGT_none
+                                                  : PGT_writable_page) |
+                                MASK_INSR(1, PGT_count_mask);
 
     page_set_owner(page, d);
     smp_wmb(); /* install valid domain ptr before updating refcnt. */
@@ -1487,7 +1489,23 @@  int xenmem_add_to_physmap_one(
     }
 
     /* Map at new location. */
-    rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
+    if ( !p2m_is_ram(t) || !is_xen_heap_mfn(mfn) )
+        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
+    else
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        p2m_write_lock(p2m);
+        if ( gfn_eq(page_get_xenheap_gfn(mfn_to_page(mfn)), INVALID_GFN) )
+        {
+            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, p2m->default_access);
+            if ( !rc )
+                page_set_xenheap_gfn(mfn_to_page(mfn), gfn);
+        }
+        else
+            rc = -EBUSY;
+        p2m_write_unlock(p2m);
+    }
 
     /*
      * For XENMAPSPACE_gmfn_foreign if we failed to add the mapping, we need
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 8b20b43..fd8aff9 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -720,6 +720,8 @@  static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
  */
 static void p2m_put_l3_page(const lpae_t pte)
 {
+    mfn_t mfn = lpae_get_mfn(pte);
+
     ASSERT(p2m_is_valid(pte));
 
     /*
@@ -731,11 +733,12 @@  static void p2m_put_l3_page(const lpae_t pte)
      */
     if ( p2m_is_foreign(pte.p2m.type) )
     {
-        mfn_t mfn = lpae_get_mfn(pte);
-
         ASSERT(mfn_valid(mfn));
         put_page(mfn_to_page(mfn));
     }
+    /* Detect the xenheap page and mark the stored GFN as invalid. */
+    else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
+        page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
 }
 
 /* Free lpae sub-tree behind an entry */
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 0262f2c..01d7a29 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -94,8 +94,6 @@  struct grant_table {
 
     /* Domain to which this struct grant_table belongs. */
     struct domain *domain;
-
-    struct grant_table_arch arch;
 };
 
 unsigned int __read_mostly opt_max_grant_frames = 64;
@@ -1997,14 +1995,9 @@  int grant_table_init(struct domain *d, int max_grant_frames,
 
     grant_write_lock(gt);
 
-    ret = gnttab_init_arch(gt);
-    if ( ret )
-        goto unlock;
-
     /* gnttab_grow_table() allocates a min number of frames, so 0 is okay. */
     ret = gnttab_grow_table(d, 0);
 
- unlock:
     grant_write_unlock(gt);
 
  out:
@@ -3911,8 +3904,6 @@  grant_table_destroy(
     if ( t == NULL )
         return;
 
-    gnttab_destroy_arch(t);
-
     for ( i = 0; i < nr_grant_frames(t); i++ )
         free_xenheap_page(t->shared_raw[i]);
     xfree(t->shared_raw);
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index d0baaa2..2306d9a 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2161,6 +2161,7 @@  void init_xenheap_pages(paddr_t ps, paddr_t pe)
 void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 {
     struct page_info *pg;
+    unsigned int i;
 
     ASSERT(!in_irq());
 
@@ -2169,6 +2170,9 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     if ( unlikely(pg == NULL) )
         return NULL;
 
+    for ( i = 0; i < (1u << order); i++ )
+        arch_alloc_xenheap_page(&pg[i]);
+
     memguard_unguard_range(page_to_virt(pg), 1 << (order + PAGE_SHIFT));
 
     return page_to_virt(pg);
@@ -2177,14 +2181,22 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 
 void free_xenheap_pages(void *v, unsigned int order)
 {
+    struct page_info *pg;
+    unsigned int i;
+
     ASSERT(!in_irq());
 
     if ( v == NULL )
         return;
 
+    pg = virt_to_page(v);
+
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order, false);
+    for ( i = 0; i < (1u << order); i++ )
+        arch_free_xenheap_page(&pg[i]);
+
+    free_heap_pages(pg, order, false);
 }
 
 #else  /* !CONFIG_SEPARATE_XENHEAP */
@@ -2220,7 +2232,10 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
         return NULL;
 
     for ( i = 0; i < (1u << order); i++ )
+    {
         pg[i].count_info |= PGC_xen_heap;
+        arch_alloc_xenheap_page(&pg[i]);
+    }
 
     return page_to_virt(pg);
 }
@@ -2238,7 +2253,10 @@  void free_xenheap_pages(void *v, unsigned int order)
     pg = virt_to_page(v);
 
     for ( i = 0; i < (1u << order); i++ )
+    {
+        arch_free_xenheap_page(&pg[i]);
         pg[i].count_info &= ~PGC_xen_heap;
+    }
 
     free_heap_pages(pg, order, true);
 }
diff --git a/xen/include/asm-arm/grant_table.h b/xen/include/asm-arm/grant_table.h
index d31a4d6..d6fda31 100644
--- a/xen/include/asm-arm/grant_table.h
+++ b/xen/include/asm-arm/grant_table.h
@@ -11,11 +11,6 @@ 
 #define INITIAL_NR_GRANT_FRAMES 1U
 #define GNTTAB_MAX_VERSION 1
 
-struct grant_table_arch {
-    gfn_t *shared_gfn;
-    gfn_t *status_gfn;
-};
-
 static inline void gnttab_clear_flags(struct domain *d,
                                       unsigned int mask, uint16_t *addr)
 {
@@ -46,41 +41,12 @@  int replace_grant_host_mapping(unsigned long gpaddr, mfn_t mfn,
 #define gnttab_dom0_frames()                                             \
     min_t(unsigned int, opt_max_grant_frames, PFN_DOWN(_etext - _stext))
 
-#define gnttab_init_arch(gt)                                             \
-({                                                                       \
-    unsigned int ngf_ = (gt)->max_grant_frames;                          \
-    unsigned int nsf_ = grant_to_status_frames(ngf_);                    \
-                                                                         \
-    (gt)->arch.shared_gfn = xmalloc_array(gfn_t, ngf_);                  \
-    (gt)->arch.status_gfn = xmalloc_array(gfn_t, nsf_);                  \
-    if ( (gt)->arch.shared_gfn && (gt)->arch.status_gfn )                \
-    {                                                                    \
-        while ( ngf_-- )                                                 \
-            (gt)->arch.shared_gfn[ngf_] = INVALID_GFN;                   \
-        while ( nsf_-- )                                                 \
-            (gt)->arch.status_gfn[nsf_] = INVALID_GFN;                   \
-    }                                                                    \
-    else                                                                 \
-        gnttab_destroy_arch(gt);                                         \
-    (gt)->arch.shared_gfn ? 0 : -ENOMEM;                                 \
-})
-
-#define gnttab_destroy_arch(gt)                                          \
-    do {                                                                 \
-        XFREE((gt)->arch.shared_gfn);                                    \
-        XFREE((gt)->arch.status_gfn);                                    \
-    } while ( 0 )
-
 #define gnttab_set_frame_gfn(gt, st, idx, gfn, mfn)                      \
     ({                                                                   \
-        int rc_ = 0;                                                     \
         gfn_t ogfn = gnttab_get_frame_gfn(gt, st, idx);                  \
-        if ( gfn_eq(ogfn, INVALID_GFN) || gfn_eq(ogfn, gfn) ||           \
-             (rc_ = guest_physmap_remove_page((gt)->domain, ogfn, mfn,   \
-                                              0)) == 0 )                 \
-            ((st) ? (gt)->arch.status_gfn                                \
-                  : (gt)->arch.shared_gfn)[idx] = (gfn);                 \
-        rc_;                                                             \
+        (!gfn_eq(ogfn, INVALID_GFN) && !gfn_eq(ogfn, gfn))               \
+         ? guest_physmap_remove_page((gt)->domain, ogfn, mfn, 0)         \
+         : 0;                                                            \
     })
 
 #define gnttab_get_frame_gfn(gt, st, idx) ({                             \
@@ -88,11 +54,21 @@  int replace_grant_host_mapping(unsigned long gpaddr, mfn_t mfn,
         : gnttab_shared_gfn(NULL, gt, idx);                              \
 })
 
-#define gnttab_shared_gfn(d, t, i)                                       \
-    (((i) >= nr_grant_frames(t)) ? INVALID_GFN : (t)->arch.shared_gfn[i])
+#define gnttab_shared_page(t, i) ({                                      \
+    virt_to_page((t)->shared_raw[i]);                                    \
+})
+
+#define gnttab_status_page(t, i) ({                                      \
+    virt_to_page((t)->status[i]);                                        \
+})
 
-#define gnttab_status_gfn(d, t, i)                                       \
-    (((i) >= nr_status_frames(t)) ? INVALID_GFN : (t)->arch.status_gfn[i])
+#define gnttab_shared_gfn(d, t, i) ({                                    \
+    page_get_xenheap_gfn(gnttab_shared_page(t, i));                      \
+})
+
+#define gnttab_status_gfn(d, t, i) ({                                    \
+    page_get_xenheap_gfn(gnttab_status_page(t, i));                      \
+})
 
 #define gnttab_need_iommu_mapping(d)                    \
     (is_domain_direct_mapped(d) && is_iommu_enabled(d))
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 7b5e7b7..74b6485 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -98,9 +98,16 @@  struct page_info
 #define PGT_writable_page PG_mask(1, 1)  /* has writable mappings?         */
 #define PGT_type_mask     PG_mask(1, 1)  /* Bits 31 or 63.                 */
 
- /* Count of uses of this frame as its current type. */
-#define PGT_count_width   PG_shift(2)
-#define PGT_count_mask    ((1UL<<PGT_count_width)-1)
+ /* 2-bit count of uses of this frame as its current type. */
+#define PGT_count_mask    PG_mask(3, 3)
+
+/*
+ * Stored in bits [28:0] or [60:0] GFN if page is xenheap page.
+ */
+#define PGT_gfn_width     PG_shift(3)
+#define PGT_gfn_mask      ((1UL<<PGT_gfn_width)-1)
+
+#define PGT_INVALID_XENHEAP_GFN   _gfn(PGT_gfn_mask)
 
  /* Cleared when the owning guest 'frees' this page. */
 #define _PGC_allocated    PG_shift(1)
@@ -166,6 +173,32 @@  extern unsigned long xenheap_base_pdx;
 
 #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
 
+static inline gfn_t page_get_xenheap_gfn(struct page_info *p)
+{
+    gfn_t gfn_ = _gfn(p->u.inuse.type_info & PGT_gfn_mask);
+
+    ASSERT(is_xen_heap_page(p));
+
+    return gfn_eq(gfn_, PGT_INVALID_XENHEAP_GFN) ? INVALID_GFN : gfn_;
+}
+
+static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
+{
+    gfn_t gfn_ = gfn_eq(gfn, INVALID_GFN) ? PGT_INVALID_XENHEAP_GFN : gfn;
+
+    ASSERT(is_xen_heap_page(p));
+
+    p->u.inuse.type_info &= ~PGT_gfn_mask;
+    p->u.inuse.type_info |= gfn_x(gfn_);
+}
+
+/*
+ * As the struct page_info representing the xenheap page on Arm can contain
+ * the valid GFN we need to clear it beforehand.
+ */
+#define arch_alloc_xenheap_page(p)   page_set_xenheap_gfn(p, INVALID_GFN)
+#define arch_free_xenheap_page(p)    do {} while ( 0 )
+
 #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
 /* PDX of the first page in the frame table. */
 extern unsigned long frametable_base_pdx;
diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h
index a8a2143..5c23cec 100644
--- a/xen/include/asm-x86/grant_table.h
+++ b/xen/include/asm-x86/grant_table.h
@@ -14,9 +14,6 @@ 
 
 #define INITIAL_NR_GRANT_FRAMES 1U
 
-struct grant_table_arch {
-};
-
 static inline int create_grant_host_mapping(uint64_t addr, mfn_t frame,
                                             unsigned int flags,
                                             unsigned int cache_flags)
@@ -35,8 +32,6 @@  static inline int replace_grant_host_mapping(uint64_t addr, mfn_t frame,
     return replace_grant_pv_mapping(addr, frame, new_addr, flags);
 }
 
-#define gnttab_init_arch(gt) 0
-#define gnttab_destroy_arch(gt) do {} while ( 0 )
 #define gnttab_set_frame_gfn(gt, st, idx, gfn, mfn)                      \
     (gfn_eq(gfn, INVALID_GFN)                                            \
      ? guest_physmap_remove_page((gt)->domain,                           \
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index cb90527..3c153c6 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -327,6 +327,10 @@  struct page_info
 
 #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
 
+/* No arch-specific actions are needed for the xenheap page */
+#define arch_alloc_xenheap_page(p)   do {} while ( 0 )
+#define arch_free_xenheap_page(p)    do {} while ( 0 )
+
 #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
 extern unsigned long max_page;
 extern unsigned long total_pages;