
[RFC,4/6] xen/arm: mm: Allow other mapping size in xen_pt_update_entry()

Message ID 20201119190751.22345-5-julien@xen.org (mailing list archive)
State New, archived
Series xen/arm: mm: Add limited support for superpages

Commit Message

Julien Grall Nov. 19, 2020, 7:07 p.m. UTC
From: Julien Grall <julien.grall@arm.com>

At the moment, xen_pt_update_entry() only supports mapping at level 3
(i.e. 4KB mappings). While this is fine for most of the runtime helpers,
the boot code will need to use superpage mappings.

We don't want to allow superpage mapping by default as some of the
callers may expect small mappings (e.g. populate_pt_range()) or even
expect to unmap only a part of a superpage.

To keep the code simple, a new flag _PAGE_BLOCK is introduced to
allow the caller to enable superpage mapping.

As the code doesn't support all the combinations, xen_pt_check_entry()
is extended to take into account the cases we don't support when
using block mapping:
    - Replacing a table with a mapping. This may happen if a region was
    first mapped with 4KB mappings and later replaced with a 2MB
    (or 1GB) mapping.
    - Removing/modifying a table. This may happen if a caller tries to remove a
    region with _PAGE_BLOCK set when it was created without it.
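The restrictions above can be summarised with a small, hypothetical model of the checks. It abstracts an lpae_t into three states (invalid, table, mapping) and ignores the level and the contiguous bit. It is a sketch of the rules described here, not the actual xen_pt_check_entry() code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical abstraction of an lpae_t; the real check also uses the level. */
enum entry_kind { ENTRY_INVALID, ENTRY_TABLE, ENTRY_MAPPING };
enum pt_op { OP_INSERT, OP_MODIFY, OP_REMOVE };

/* Returns true if the (modelled) sanity checks accept the update. */
static bool check_entry(enum entry_kind kind, enum pt_op op)
{
    switch ( op )
    {
    case OP_INSERT:
        /* Never replace a valid entry, whether it is a mapping or a table. */
        return kind == ENTRY_INVALID;
    case OP_MODIFY:
        /* Only an existing mapping can have its permissions changed. */
        return kind == ENTRY_MAPPING;
    case OP_REMOVE:
        /* Removing a table (e.g. 4KB pages where a 2MB block is expected). */
        return kind != ENTRY_TABLE;
    }

    assert(!"unreachable");
    return false;
}
```

With this model, a region created without _PAGE_BLOCK (a table at level 2) and later removed with _PAGE_BLOCK set lands on OP_REMOVE with ENTRY_TABLE and is rejected, which is why the flag has to be used consistently.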

Note that the current restrictions mean that the caller must ensure that
_PAGE_BLOCK is consistently set/cleared across all the updates on a
given virtual region. This ought to be fine with the expected use-cases.

More rework will be necessary if we want to remove the restrictions.

Note that nr_mfns is now marked const as it is used for flushing the
TLBs and we don't want it to be modified.

Signed-off-by: Julien Grall <julien.grall@arm.com>

---

This patch is necessary for upcoming changes in the MM code. I would
like to remove most of the open-coded updates of the page tables as they
are not easy to properly fix/extend. For instance, always mapping the
xenheap with 1GB superpages is plain wrong because:
    - RAM regions are not always 1GB aligned (such as on RPI 4) and we
    may end up mapping MMIO with cacheable attributes
    - RAM may contain reserved regions that should not be mapped
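To put a number on the first point, here is a hypothetical helper that computes how much address space outside a RAM bank gets mapped when the bank is rounded out to 1GB boundaries. The bank sizes used with it are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes covered outside [start, end) when the range is rounded out to gran. */
static uint64_t overmapped_bytes(uint64_t start, uint64_t end, uint64_t gran)
{
    uint64_t lo, hi;

    /* gran must be a non-zero power of two. */
    assert(start < end && gran && !(gran & (gran - 1)));

    lo = start & ~(gran - 1);                /* round start down */
    hi = (end + gran - 1) & ~(gran - 1);     /* round end up */

    return (hi - lo) - (end - start);
}
```

For example, a hypothetical bank covering [0, 948MB) mapped with a single 1GB superpage would also map the 76MB above it, whatever lives there.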
---
 xen/arch/arm/mm.c          | 87 ++++++++++++++++++++++++++++++--------
 xen/include/asm-arm/page.h |  4 ++
 2 files changed, 73 insertions(+), 18 deletions(-)

Comments

Stefano Stabellini Nov. 20, 2020, 1:46 a.m. UTC | #1
On Thu, 19 Nov 2020, Julien Grall wrote:
> From: Julien Grall <julien.grall@arm.com>
> 
> At the moment, xen_pt_update_entry() only supports mapping at level 3
> (i.e. 4KB mappings). While this is fine for most of the runtime helpers,
> the boot code will need to use superpage mappings.
> 
> We don't want to allow superpage mapping by default as some of the
> callers may expect small mappings (e.g. populate_pt_range()) or even
> expect to unmap only a part of a superpage.
> 
> To keep the code simple, a new flag _PAGE_BLOCK is introduced to
> allow the caller to enable superpage mapping.
> 
> As the code doesn't support all the combinations, xen_pt_check_entry()
> is extended to take into account the cases we don't support when
> using block mapping:
>     - Replacing a table with a mapping. This may happen if a region was
>     first mapped with 4KB mappings and later replaced with a 2MB
>     (or 1GB) mapping.
>     - Removing/modifying a table. This may happen if a caller tries to remove a
>     region with _PAGE_BLOCK set when it was created without it.
> 
> Note that the current restrictions mean that the caller must ensure that
> _PAGE_BLOCK is consistently set/cleared across all the updates on a
> given virtual region. This ought to be fine with the expected use-cases.
> 
> More rework will be necessary if we want to remove the restrictions.
> 
> Note that nr_mfns is now marked const as it is used for flushing the
> TLBs and we don't want it to be modified.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>

Thanks for the patch, you might want to update the Signed-off-by (even
if you haven't changed the patch)


> ---
> 
> This patch is necessary for upcoming changes in the MM code. I would
> like to remove most of the open-coded updates of the page tables as they
> are not easy to properly fix/extend. For instance, always mapping the
> xenheap with 1GB superpages is plain wrong because:
>     - RAM regions are not always 1GB aligned (such as on RPI 4) and we
>     may end up mapping MMIO with cacheable attributes
>     - RAM may contain reserved regions that should not be mapped
> ---
>  xen/arch/arm/mm.c          | 87 ++++++++++++++++++++++++++++++--------
>  xen/include/asm-arm/page.h |  4 ++
>  2 files changed, 73 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 59f8a3f15fd1..af0f12b6e6d3 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1060,9 +1060,10 @@ static int xen_pt_next_level(bool read_only, unsigned int level,
>  }
>  
>  /* Sanity check of the entry */
> -static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
> +static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
> +                               unsigned int flags)
>  {
> -    /* Sanity check when modifying a page. */
> +    /* Sanity check when modifying an entry. */
>      if ( (flags & _PAGE_PRESENT) && mfn_eq(mfn, INVALID_MFN) )
>      {
>          /* We don't allow modifying an invalid entry. */
> @@ -1072,6 +1073,13 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>              return false;
>          }
>  
> +        /* We don't allow modifying a table entry */
> +        if ( !lpae_is_mapping(entry, level) )
> +        {
> +            mm_printk("Modifying a table entry is not allowed.\n");
> +            return false;
> +        }
> +
>          /* We don't allow changing memory attributes. */
>          if ( entry.pt.ai != PAGE_AI_MASK(flags) )
>          {
> @@ -1087,7 +1095,7 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>              return false;
>          }
>      }
> -    /* Sanity check when inserting a page */
> +    /* Sanity check when inserting a mapping */
>      else if ( flags & _PAGE_PRESENT )
>      {
>          /* We should be here with a valid MFN. */
> @@ -1096,18 +1104,28 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>          /* We don't allow replacing any valid entry. */
>          if ( lpae_is_valid(entry) )
>          {
> -            mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
> -                      mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
> +            if ( lpae_is_mapping(entry, level) )
> +                mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
> +                          mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
> +            else
> +                mm_printk("Trying to replace a table with a mapping.\n");
>              return false;
>          }
>      }
> -    /* Sanity check when removing a page. */
> +    /* Sanity check when removing a mapping. */
>      else if ( (flags & (_PAGE_PRESENT|_PAGE_POPULATE)) == 0 )
>      {
>          /* We should be here with an invalid MFN. */
>          ASSERT(mfn_eq(mfn, INVALID_MFN));
>  
> -        /* We don't allow removing page with contiguous bit set. */
> +        /* We don't allow removing a table */
> +        if ( lpae_is_table(entry, level) )
> +        {
> +            mm_printk("Removing a table is not allowed.\n");
> +            return false;
> +        }
> +
> +        /* We don't allow removing a mapping with contiguous bit set. */
>          if ( entry.pt.contig )
>          {
>              mm_printk("Removing entry with contiguous bit set is not allowed.\n");
> @@ -1126,12 +1144,12 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>  }
>  
>  static int xen_pt_update_entry(mfn_t root, unsigned long virt,
> -                               mfn_t mfn, unsigned int flags)
> +                               mfn_t mfn, unsigned int page_order,
> +                               unsigned int flags)
>  {
>      int rc;
>      unsigned int level;
> -    /* We only support 4KB mapping (i.e level 3) for now */
> -    unsigned int target = 3;
> +    unsigned int target = 3 - (page_order / LPAE_SHIFT);

Given that page_order is not used for anything else in this function,
wouldn't it be easier to just pass the target level to
xen_pt_update_entry()? Calculating target from page_order, when page_order
is otherwise unused, doesn't look like the most straightforward way
to do it.
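For reference, the conversion being discussed fits in one line. The sketch below assumes the 4KB granule values (LPAE_SHIFT = 9, so orders 0, 9 and 18 for 4KB, 2MB and 1GB). The helper name is made up for illustration:

```c
#include <assert.h>

#define LPAE_SHIFT   9                           /* 512 entries per table */
#define THIRD_ORDER  0                           /* 4KB */
#define SECOND_ORDER (THIRD_ORDER + LPAE_SHIFT)  /* 2MB */
#define FIRST_ORDER  (SECOND_ORDER + LPAE_SHIFT) /* 1GB */

/* Same arithmetic as the patch: each level up adds LPAE_SHIFT to the order. */
static unsigned int order_to_level(unsigned int page_order)
{
    assert(page_order % LPAE_SHIFT == 0 && page_order <= FIRST_ORDER);
    return 3 - (page_order / LPAE_SHIFT);
}
```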


>      lpae_t *table;
>      /*
>       * The intermediate page tables are read-only when the MFN is not valid
> @@ -1186,7 +1204,7 @@ static int xen_pt_update_entry(mfn_t root, unsigned long virt,
>      entry = table + offsets[level];
>  
>      rc = -EINVAL;
> -    if ( !xen_pt_check_entry(*entry, mfn, flags) )
> +    if ( !xen_pt_check_entry(*entry, mfn, level, flags) )
>          goto out;
>  
>      /* If we are only populating page-table, then we are done. */
> @@ -1204,8 +1222,11 @@ static int xen_pt_update_entry(mfn_t root, unsigned long virt,
>          {
>              pte = mfn_to_xen_entry(mfn, PAGE_AI_MASK(flags));
>  
> -            /* Third level entries set pte.pt.table = 1 */
> -            pte.pt.table = 1;
> +            /*
> +             * First and second level pages set pte.pt.table = 0, but
> +             * third level entries set pte.pt.table = 1.
> +             */
> +            pte.pt.table = (level == 3);
>          }
>          else /* We are updating the permission => Copy the current pte. */
>              pte = *entry;
> @@ -1229,11 +1250,12 @@ static DEFINE_SPINLOCK(xen_pt_lock);
>  
>  static int xen_pt_update(unsigned long virt,
>                           mfn_t mfn,
> -                         unsigned long nr_mfns,
> +                         const unsigned long nr_mfns,
>                           unsigned int flags)
>  {
>      int rc = 0;
> -    unsigned long addr = virt, addr_end = addr + nr_mfns * PAGE_SIZE;
> +    unsigned long vfn = paddr_to_pfn(virt);
> +    unsigned long left = nr_mfns;

Given that paddr_to_pfn() is meant for physical addresses, I would rather
open-code it using PAGE_SHIFT here. Again, just a suggestion.


>      /*
>       * For arm32, page-tables are different on each CPUs. Yet, they share
> @@ -1265,14 +1287,43 @@ static int xen_pt_update(unsigned long virt,
>  
>      spin_lock(&xen_pt_lock);
>  
> -    for ( ; addr < addr_end; addr += PAGE_SIZE )
> +    while ( left )
>      {
> -        rc = xen_pt_update_entry(root, addr, mfn, flags);
> +        unsigned int order;
> +        unsigned long mask;
> +
> +        /*
> +         * Don't take into account the MFN when removing a mapping (i.e.
> +         * INVALID_MFN) to calculate the correct target order.
> +         *
> +         * XXX: Support superpage mappings if nr is not aligned to a
> +         * superpage size.

It would be good to add another sentence to explain that the checks
below are simply based on masks and rely on the mfn, vfn, and also
nr_mfn to be superpage aligned. (It took me some time to figure it out.)
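A standalone sketch of the order selection in this hunk may help with the mask logic. INVALID_MFN_X stands in for mfn_x(INVALID_MFN), and plain unsigned longs replace the typesafe mfn_t, so this approximates the patch rather than reproducing Xen code:

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_MFN_X (~0UL) /* stand-in for mfn_x(INVALID_MFN) */
#define THIRD_ORDER   0      /* 4KB */
#define SECOND_ORDER  9      /* 2MB */
#define FIRST_ORDER   18     /* 1GB */

/*
 * Mirrors the patch: OR together mfn (ignored when removing), vfn and the
 * number of frames left, then pick the largest order whose low bits are
 * all clear in the combined mask.
 */
static unsigned int pick_order(unsigned long mfn, unsigned long vfn,
                               unsigned long left, bool block)
{
    unsigned long mask = (mfn != INVALID_MFN_X) ? mfn : 0;

    mask |= vfn | left;

    if ( !block )
        return THIRD_ORDER;
    if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
        return FIRST_ORDER;
    if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
        return SECOND_ORDER;
    return THIRD_ORDER;
}
```

Because left is OR-ed into the mask, a superpage is only chosen when the remaining length is also a multiple of that size. This is the case the XXX comment refers to.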


> +         */
> +        mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
> +        mask |= vfn | left;
> +
> +        /*
> +         * Always use level 3 mapping unless the caller requests block
> +         * mapping.
> +         */
> +        if ( likely(!(flags & _PAGE_BLOCK)) )
> +            order = THIRD_ORDER;
> +        else if ( !(mask & (BIT(FIRST_ORDER, UL) - 1)) )
> +            order = FIRST_ORDER;
> +        else if ( !(mask & (BIT(SECOND_ORDER, UL) - 1)) )
> +            order = SECOND_ORDER;
> +        else
> +            order = THIRD_ORDER;
> +
> +        rc = xen_pt_update_entry(root, pfn_to_paddr(vfn), mfn, order, flags);
>          if ( rc )
>              break;
>  
> +        vfn += 1U << order;
>          if ( !mfn_eq(mfn, INVALID_MFN) )
> -            mfn = mfn_add(mfn, 1);
> +            mfn = mfn_add(mfn, 1U << order);
> +
> +        left -= (1U << order);
>
>      }
>  
>      /*
> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> index 4ea8e97247c8..de096b0968e3 100644
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -79,6 +79,7 @@
>   * [3:4] Permission flags
>   * [5]   Page present
>   * [6]   Only populate page tables
> + * [7]   Use any level mapping (i.e. superpages are allowed)
>   */
>  #define PAGE_AI_MASK(x) ((x) & 0x7U)
>  
> @@ -92,6 +93,9 @@
>  #define _PAGE_PRESENT    (1U << 5)
>  #define _PAGE_POPULATE   (1U << 6)
>  
> +#define _PAGE_BLOCK_BIT     7
> +#define _PAGE_BLOCK         (1U << _PAGE_BLOCK_BIT)
> +
>  /*
>   * _PAGE_DEVICE and _PAGE_NORMAL are convenience defines. They are not
>   * meant to be used outside of this header.
Julien Grall Nov. 20, 2020, 4:09 p.m. UTC | #2
Hi Stefano,

On 20/11/2020 01:46, Stefano Stabellini wrote:
> On Thu, 19 Nov 2020, Julien Grall wrote:
>> From: Julien Grall <julien.grall@arm.com>
>>
>> At the moment, xen_pt_update_entry() only supports mapping at level 3
>> (i.e. 4KB mappings). While this is fine for most of the runtime helpers,
>> the boot code will need to use superpage mappings.
>>
>> We don't want to allow superpage mapping by default as some of the
>> callers may expect small mappings (e.g. populate_pt_range()) or even
>> expect to unmap only a part of a superpage.
>>
>> To keep the code simple, a new flag _PAGE_BLOCK is introduced to
>> allow the caller to enable superpage mapping.
>>
>> As the code doesn't support all the combinations, xen_pt_check_entry()
>> is extended to take into account the cases we don't support when
>> using block mapping:
>>      - Replacing a table with a mapping. This may happen if a region was
>>      first mapped with 4KB mappings and later replaced with a 2MB
>>      (or 1GB) mapping.
>>      - Removing/modifying a table. This may happen if a caller tries to remove a
>>      region with _PAGE_BLOCK set when it was created without it.
>>
>> Note that the current restrictions mean that the caller must ensure that
>> _PAGE_BLOCK is consistently set/cleared across all the updates on a
>> given virtual region. This ought to be fine with the expected use-cases.
>>
>> More rework will be necessary if we want to remove the restrictions.
>>
>> Note that nr_mfns is now marked const as it is used for flushing the
>> TLBs and we don't want it to be modified.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> Thanks for the patch, you might want to update the Signed-off-by (even
> if you haven't changed the patch)

Yes, I realized it afterwards. I will update it in the next version.

>>   static int xen_pt_update_entry(mfn_t root, unsigned long virt,
>> -                               mfn_t mfn, unsigned int flags)
>> +                               mfn_t mfn, unsigned int page_order,
>> +                               unsigned int flags)
>>   {
>>       int rc;
>>       unsigned int level;
>> -    /* We only support 4KB mapping (i.e level 3) for now */
>> -    unsigned int target = 3;
>> +    unsigned int target = 3 - (page_order / LPAE_SHIFT);
> 
> Given that page_order is not used for anything else in this function,
> wouldn't it be easier to just pass the target level to
> xen_pt_update_entry()? Calculating target from page_order, when page_order
> is otherwise unused, doesn't look like the most straightforward way
> to do it.

FWIW, this is the same approach we use in __p2m_set_entry()
(xen_pt_update_entry() is derived from it).

Anyway, in the caller, we need to know the size of the mapping. I would
rather avoid having to keep two variables when one can "easily" be
inferred from the other.

One possibility would be to introduce a static array level_orders
(it already exists in the p2m code) that would allow us to easily convert
from a level to an order.
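Here is a sketch of what such a table could look like, modelled on the level_orders array in the p2m code. level_nr_frames() is a hypothetical helper, and the order values assume the 4KB granule:

```c
#include <assert.h>

#define THIRD_ORDER   0    /* 4KB */
#define SECOND_ORDER  9    /* 2MB */
#define FIRST_ORDER   18   /* 1GB */
#define ZEROETH_ORDER 27   /* 512GB */

/* Indexed by level: size of one entry, as an order of 4KB frames. */
static const unsigned int level_orders[] = {
    [0] = ZEROETH_ORDER, [1] = FIRST_ORDER,
    [2] = SECOND_ORDER,  [3] = THIRD_ORDER,
};

/* Hypothetical helper: number of 4KB frames covered by one entry at level. */
static unsigned long level_nr_frames(unsigned int level)
{
    assert(level < sizeof(level_orders) / sizeof(level_orders[0]));
    return 1UL << level_orders[level];
}
```

With a table like this, xen_pt_update_entry() could take the level directly while the caller still derives the mapping size with a single lookup.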

Let me see if that fits with my next plan (I am looking to add support
for the contiguous bit as well).

> 
> 
>>       lpae_t *table;
>>       /*
>>        * The intermediate page tables are read-only when the MFN is not valid
>> @@ -1186,7 +1204,7 @@ static int xen_pt_update_entry(mfn_t root, unsigned long virt,
>>       entry = table + offsets[level];
>>   
>>       rc = -EINVAL;
>> -    if ( !xen_pt_check_entry(*entry, mfn, flags) )
>> +    if ( !xen_pt_check_entry(*entry, mfn, level, flags) )
>>           goto out;
>>   
>>       /* If we are only populating page-table, then we are done. */
>> @@ -1204,8 +1222,11 @@ static int xen_pt_update_entry(mfn_t root, unsigned long virt,
>>           {
>>               pte = mfn_to_xen_entry(mfn, PAGE_AI_MASK(flags));
>>   
>> -            /* Third level entries set pte.pt.table = 1 */
>> -            pte.pt.table = 1;
>> +            /*
>> +             * First and second level pages set pte.pt.table = 0, but
>> +             * third level entries set pte.pt.table = 1.
>> +             */
>> +            pte.pt.table = (level == 3);
>>           }
>>           else /* We are updating the permission => Copy the current pte. */
>>               pte = *entry;
>> @@ -1229,11 +1250,12 @@ static DEFINE_SPINLOCK(xen_pt_lock);
>>   
>>   static int xen_pt_update(unsigned long virt,
>>                            mfn_t mfn,
>> -                         unsigned long nr_mfns,
>> +                         const unsigned long nr_mfns,
>>                            unsigned int flags)
>>   {
>>       int rc = 0;
>> -    unsigned long addr = virt, addr_end = addr + nr_mfns * PAGE_SIZE;
>> +    unsigned long vfn = paddr_to_pfn(virt);
>> +    unsigned long left = nr_mfns;
> 
> Given that paddr_to_pfn() is meant for physical addresses, I would rather
> open-code it using PAGE_SHIFT here. Again, just a suggestion.

paddr_to_pfn() is poorly named. It is meant to take any address and
return the frame.

There are wrappers for machine addresses and guest addresses, but there
is no equivalent for virtual addresses yet.

Long term, I would like to remove the uses of paddr_to_pfn() on Arm in
favor of the typesafe versions. So I should probably not introduce a new
one :).

I will open-code the shift.
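In the loop, open-coding the shift amounts to something like the following. The helper names are hypothetical, and PAGE_SHIFT is assumed to be 12 (4KB pages). Converting back drops any offset within the page:

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* 4KB pages */

/* Hypothetical helpers: virtual address <-> virtual frame number. */
static unsigned long virt_to_vfn(unsigned long virt)
{
    return virt >> PAGE_SHIFT;
}

static unsigned long vfn_to_virt(unsigned long vfn)
{
    return vfn << PAGE_SHIFT;
}
```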

> 
>>       /*
>>        * For arm32, page-tables are different on each CPUs. Yet, they share
>> @@ -1265,14 +1287,43 @@ static int xen_pt_update(unsigned long virt,
>>   
>>       spin_lock(&xen_pt_lock);
>>   
>> -    for ( ; addr < addr_end; addr += PAGE_SIZE )
>> +    while ( left )
>>       {
>> -        rc = xen_pt_update_entry(root, addr, mfn, flags);
>> +        unsigned int order;
>> +        unsigned long mask;
>> +
>> +        /*
>> +         * Don't take into account the MFN when removing a mapping (i.e.
>> +         * INVALID_MFN) to calculate the correct target order.
>> +         *
>> +         * XXX: Support superpage mappings if nr is not aligned to a
>> +         * superpage size.
> 
> It would be good to add another sentence to explain that the checks
> below are simply based on masks and rely on the mfn, vfn, and also
> nr_mfn to be superpage aligned. (It took me some time to figure it out.)

I am not sure I understand what you wrote here. Could you suggest a
sentence?

Regarding the TODO itself, we have the exact same one in the P2M code. I
couldn't find a clever way to deal with it yet. Any idea how this could
be solved?

Cheers,
Stefano Stabellini Nov. 23, 2020, 10:27 p.m. UTC | #3
On Fri, 20 Nov 2020, Julien Grall wrote:
> > >       /*
> > >        * For arm32, page-tables are different on each CPUs. Yet, they
> > > share
> > > @@ -1265,14 +1287,43 @@ static int xen_pt_update(unsigned long virt,
> > >         spin_lock(&xen_pt_lock);
> > >   -    for ( ; addr < addr_end; addr += PAGE_SIZE )
> > > +    while ( left )
> > >       {
> > > -        rc = xen_pt_update_entry(root, addr, mfn, flags);
> > > +        unsigned int order;
> > > +        unsigned long mask;
> > > +
> > > +        /*
> > > +         * Don't take into account the MFN when removing a mapping (i.e.
> > > +         * INVALID_MFN) to calculate the correct target order.
> > > +         *
> > > +         * XXX: Support superpage mappings if nr is not aligned to a
> > > +         * superpage size.
> > 
> > It would be good to add another sentence to explain that the checks
> > below are simply based on masks and rely on the mfn, vfn, and also
> > nr_mfn to be superpage aligned. (It took me some time to figure it out.)
> 
> I am not sure I understand what you wrote here. Could you suggest a sentence?

Something like the following:

/*
 * Don't take into account the MFN when removing a mapping (i.e.
 * INVALID_MFN) to calculate the correct target order.
 *
 * This loop relies on mfn, vfn and nr_mfn all being superpage
 * aligned, and it uses `mask' to check for that.
 *
 * XXX: Support superpage mappings if nr_mfn is not aligned to a
 * superpage size.
 */


> Regarding the TODO itself, we have the exact same one in the P2M code. I
> couldn't find a clever way to deal with it yet. Any idea how this could be
> solved?
 
I was thinking of a loop that starts with the highest possible superpage
size that virt and mfn are aligned to, and also smaller than or equal to
nr_mfn. So rather than using the mask to also make sure nr_mfns is
aligned, I would only use the mask to check that mfn and virt are
aligned. Then, we only need to check that superpage_size <= left.

Concrete example: virt and mfn are 2MB aligned, nr_mfn is 5MB / 1280 4K
pages. We allocate 2MB superpages until only 1MB is left. At that point
superpage_size <= left fails and we go down to 4K allocations.
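One possible reading of this proposal, as a sketch: on every iteration, find the largest size that both vfn and mfn are aligned to, and accept it only if it fits in left. The names and constants are assumptions for illustration (4KB granule, a valid MFN, no removal case):

```c
#include <assert.h>

#define SECOND_ORDER 9    /* 2MB in 4KB frames */
#define FIRST_ORDER  18   /* 1GB in 4KB frames */

struct mapping_count {
    unsigned long nr_1g, nr_2m, nr_4k;
};

/*
 * Hypothetical alternative: the mask only covers vfn and mfn, and the
 * candidate size is compared separately against the frames left.
 */
static struct mapping_count count_aligned_mappings(unsigned long mfn,
                                                   unsigned long vfn,
                                                   unsigned long nr)
{
    struct mapping_count c = { 0, 0, 0 };
    unsigned long left = nr;

    while ( left )
    {
        unsigned long mask = mfn | vfn;
        unsigned int order = 0;

        if ( !(mask & ((1UL << FIRST_ORDER) - 1)) &&
             (1UL << FIRST_ORDER) <= left )
            order = FIRST_ORDER;
        else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) &&
                  (1UL << SECOND_ORDER) <= left )
            order = SECOND_ORDER;

        if ( order == FIRST_ORDER )
            c.nr_1g++;
        else if ( order == SECOND_ORDER )
            c.nr_2m++;
        else
            c.nr_4k++;

        assert(left >= (1UL << order));
        vfn += 1UL << order;
        mfn += 1UL << order;
        left -= 1UL << order;
    }

    return c;
}
```

With vfn = mfn = 0x200 (2MB) and 1280 frames, this produces two 2MB mappings followed by 256 4KB mappings, matching the example above. Because the alignment is recomputed on every iteration, the sketch also switches to 1GB mappings once vfn and mfn reach a 1GB boundary.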

Would that work?
Julien Grall Nov. 23, 2020, 11:23 p.m. UTC | #4
Hi Stefano,

On 23/11/2020 22:27, Stefano Stabellini wrote:
> On Fri, 20 Nov 2020, Julien Grall wrote:
>>>>        /*
>>>>         * For arm32, page-tables are different on each CPUs. Yet, they
>>>> share
>>>> @@ -1265,14 +1287,43 @@ static int xen_pt_update(unsigned long virt,
>>>>          spin_lock(&xen_pt_lock);
>>>>    -    for ( ; addr < addr_end; addr += PAGE_SIZE )
>>>> +    while ( left )
>>>>        {
>>>> -        rc = xen_pt_update_entry(root, addr, mfn, flags);
>>>> +        unsigned int order;
>>>> +        unsigned long mask;
>>>> +
>>>> +        /*
>>>> +         * Don't take into account the MFN when removing a mapping (i.e.
>>>> +         * INVALID_MFN) to calculate the correct target order.
>>>> +         *
>>>> +         * XXX: Support superpage mappings if nr is not aligned to a
>>>> +         * superpage size.
>>>
>>> It would be good to add another sentence to explain that the checks
>>> below are simply based on masks and rely on the mfn, vfn, and also
>>> nr_mfn to be superpage aligned. (It took me some time to figure it out.)
>>
>> I am not sure I understand what you wrote here. Could you suggest a sentence?
> 
> Something like the following:
> 
> /*
>   * Don't take into account the MFN when removing a mapping (i.e.
>   * INVALID_MFN) to calculate the correct target order.
>   *
>   * This loop relies on mfn, vfn and nr_mfn all being superpage
>   * aligned, and it uses `mask' to check for that.

Unfortunately, I still don't understand this comment.
The loop can deal with any (super)page size (4KB, 2MB, 1GB). There is
no assumption about the alignment of mfn, vfn and nr_mfn.

By OR-ing the three components together, we can find out the
maximum size that can be used for the mapping.

So can you clarify what you mean?

>   *
>   * XXX: Support superpage mappings if nr_mfn is not aligned to a
>   * superpage size.
>   */
> 
> 
>> Regarding the TODO itself, we have the exact same one in the P2M code. I
>> couldn't find a clever way to deal with it yet. Any idea how this could be
>> solved?
>   
> I was thinking of a loop that starts with the highest possible superpage
> size that virt and mfn are aligned to, and also smaller than or equal to
> nr_mfn. So rather than using the mask to also make sure nr_mfns is
> aligned, I would only use the mask to check that mfn and virt are
> aligned. Then, we only need to check that superpage_size <= left.
> 
> Concrete example: virt and mfn are 2MB aligned, nr_mfn is 5MB / 1280 4K
> pages. We allocate 2MB superpages until only 1MB is left. At that point
> superpage_size <= left fails and we go down to 4K allocations.
> 
> Would that work?

Unfortunately no. AFAICT, your assumption is that vfn/mfn are originally
aligned to the highest possible superpage size. There are situations where
this is not the case.

To give a concrete example, at the moment the RAM is mapped using 1GB
superpages in Xen. But in the future, we will only want to map RAM
regions in the directmap that haven't been marked as reserved [1].

Those reserved regions don't have architectural alignment or placement.

I will use an over-exaggerated example (or maybe not :)).

Imagine you have 4GB of RAM starting at 0. The HW/Software engineer
decided to place a 2MB reserved region starting at 512MB.

As a result we would want to map two RAM regions:
    1) 0 to 512MB
    2) 514MB to 4GB

I will only focus on 2). In the ideal situation, we would want to map
    a) 514MB to 1GB using 2MB superpages
    b) 1GB to 4GB using 1GB superpages

We don't want to use 2MB superpages for all of 2) because this would
increase TLB pressure (we want to avoid Xen using too many TLB entries)
and also increase the size of the page-tables.

Therefore, we want to select the best size for each iteration. For now,
the only solution I can come up with is to OR vfn/mfn and then use a
series of checks to compare the mask and nr_mfn.
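To make the XXX in the patch concrete, the sketch below replays the patch's selection over region 2), with vfn, mfn and left all OR-ed into the mask, and counts the mappings of each size. It assumes 4KB frames, vfn == mfn and the 4KB granule orders. The function name is hypothetical:

```c
#include <assert.h>

#define SECOND_ORDER 9    /* 2MB in 4KB frames */
#define FIRST_ORDER  18   /* 1GB in 4KB frames */

struct mapping_count {
    unsigned long nr_1g, nr_2m, nr_4k;
};

/* Replays the patch's loop: the mask includes vfn, mfn and left. */
static struct mapping_count count_patch_mappings(unsigned long mfn,
                                                 unsigned long vfn,
                                                 unsigned long nr)
{
    struct mapping_count c = { 0, 0, 0 };
    unsigned long left = nr;

    while ( left )
    {
        unsigned long mask = mfn | vfn | left;
        unsigned int order;

        if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
        {
            order = FIRST_ORDER;
            c.nr_1g++;
        }
        else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
        {
            order = SECOND_ORDER;
            c.nr_2m++;
        }
        else
        {
            order = 0;
            c.nr_4k++;
        }

        /* left is a non-zero multiple of 2^order, so this cannot underflow. */
        assert(left >= (1UL << order));
        vfn += 1UL << order;
        mfn += 1UL << order;
        left -= 1UL << order;
    }

    return c;
}
```

For 514MB to 4GB, this yields 255 2MB mappings followed by 3 1GB mappings, because left happens to become a multiple of 1GB at the 1GB boundary. If the region were one 4KB page longer, left would be odd on the first iteration and every step would fall back to 4KB. That is the case the XXX comment describes.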

In addition to the "classic" mappings (i.e. 4KB, 2MB, 1GB), I would like
to explore contiguous mappings (e.g. 64KB, 32MB) to further reduce TLB
pressure. Note that a processor may or may not take advantage of
contiguous mappings to reduce the number of TLB entries used.

This will unfortunately increase the number of checks. I will try to
come up with a patch and we can discuss from there.
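A table-driven selection is one way to keep the number of explicit checks from growing as sizes are added. The sketch below is hypothetical. It assumes the 4KB granule, where the contiguous hint covers 16 entries (64KB at level 3, 32MB at level 2), and, like the patch, it ORs vfn, mfn and left into the mask:

```c
#include <assert.h>
#include <stddef.h>

/* Candidate sizes as orders of 4KB frames, largest first. */
static const unsigned int candidate_orders[] = {
    18, /* 1GB block */
    13, /* 32MB: 16 contiguous 2MB blocks */
    9,  /* 2MB block */
    4,  /* 64KB: 16 contiguous 4KB pages */
    0,  /* 4KB page */
};

/* Hypothetical: largest candidate order that vfn, mfn and left are aligned to. */
static unsigned int pick_order_contig(unsigned long mfn, unsigned long vfn,
                                      unsigned long left)
{
    unsigned long mask = mfn | vfn | left;
    size_t i;

    for ( i = 0; i < sizeof(candidate_orders) / sizeof(candidate_orders[0]); i++ )
    {
        if ( !(mask & ((1UL << candidate_orders[i]) - 1)) )
            return candidate_orders[i];
    }

    /* Unreachable: order 0 has an empty low-bit mask and always matches. */
    assert(!"no candidate order matched");
    return 0;
}
```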

Cheers,

[1] Reserved regions may be marked as uncacheable, and therefore we
shouldn't map them in the Xen address space, to avoid breaking cache coherency.
Stefano Stabellini Nov. 24, 2020, 12:25 a.m. UTC | #5
On Mon, 23 Nov 2020, Julien Grall wrote:
> Hi Stefano,
> 
> On 23/11/2020 22:27, Stefano Stabellini wrote:
> > On Fri, 20 Nov 2020, Julien Grall wrote:
> > > > >        /*
> > > > >         * For arm32, page-tables are different on each CPUs. Yet, they
> > > > > share
> > > > > @@ -1265,14 +1287,43 @@ static int xen_pt_update(unsigned long virt,
> > > > >          spin_lock(&xen_pt_lock);
> > > > >    -    for ( ; addr < addr_end; addr += PAGE_SIZE )
> > > > > +    while ( left )
> > > > >        {
> > > > > -        rc = xen_pt_update_entry(root, addr, mfn, flags);
> > > > > +        unsigned int order;
> > > > > +        unsigned long mask;
> > > > > +
> > > > > +        /*
> > > > > +         * Don't take into account the MFN when removing a mapping (i.e.
> > > > > +         * INVALID_MFN) to calculate the correct target order.
> > > > > +         *
> > > > > +         * XXX: Support superpage mappings if nr is not aligned to a
> > > > > +         * superpage size.
> > > > 
> > > > It would be good to add another sentence to explain that the checks
> > > > below are simply based on masks and rely on the mfn, vfn, and also
> > > > nr_mfn to be superpage aligned. (It took me some time to figure it out.)
> > > 
> > > I am not sure I understand what you wrote here. Could you suggest a
> > > sentence?
> > 
> > Something like the following:
> > 
> > /*
> >   * Don't take into account the MFN when removing a mapping (i.e.
> >   * INVALID_MFN) to calculate the correct target order.
> >   *
> >   * This loop relies on mfn, vfn and nr_mfn all being superpage
> >   * aligned, and it uses `mask' to check for that.
> 
> Unfortunately, I still don't understand this comment.
> The loop can deal with any (super)page size (4KB, 2MB, 1GB). There is no
> assumption about the alignment of mfn, vfn and nr_mfn.
> 
> By OR-ing the three components together, we can find out the maximum
> size that can be used for the mapping.
> 
> So can you clarify what you mean?

In pseudo-code:

  mask = mfn | vfn | nr_mfns;
  if (mask & ((1<<FIRST_ORDER) - 1))
  if (mask & ((1<<SECOND_ORDER) - 1))
  if (mask & ((1<<THIRD_ORDER) - 1))
  ...

As you wrote the mask is used to find the max size that can be used for
the mapping.

But let's take nr_mfns out of the equation for a moment for clarity:

  mask = mfn | vfn;
  if (mask & ((1<<FIRST_ORDER) - 1))
  if (mask & ((1<<SECOND_ORDER) - 1))
  if (mask & ((1<<THIRD_ORDER) - 1))
  ...

How would you describe this check? I'd call this an alignment check,
wouldn't you?
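The OR trick can be checked exhaustively for small values: testing (a | b) against the low-bit mask gives the same answer as testing a and b separately. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* True if x is a multiple of 2^order (in frames). */
static bool is_aligned(unsigned long x, unsigned int order)
{
    return !(x & ((1UL << order) - 1));
}

/* A low bit is set in (a | b) iff it is set in a or in b. */
static bool both_aligned(unsigned long a, unsigned long b, unsigned int order)
{
    return is_aligned(a | b, order);
}
```

Adding nr_mfns to the OR likewise requires the length to be a multiple of the size, which is where the XXX in the patch comes from.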


> >   *
> >   * XXX: Support superpage mappings if nr_mfn is not aligned to a
> >   * superpage size.
> >   */
> > 
> > 
> > > Regarding the TODO itself, we have the exact same one in the P2M code. I
> > > couldn't find a clever way to deal with it yet. Any idea how this could be
> > > solved?
> > 
> > I was thinking of a loop that starts with the highest possible superpage
> > size that virt and mfn are aligned to, and also smaller than or equal to
> > nr_mfn. So rather than using the mask to also make sure nr_mfns is
> > aligned, I would only use the mask to check that mfn and virt are
> > aligned. Then, we only need to check that superpage_size <= left.
> > 
> > Concrete example: virt and mfn are 2MB aligned, nr_mfn is 5MB / 1280 4K
> > pages. We allocate 2MB superpages until only 1MB is left. At that point
> > superpage_size <= left fails and we go down to 4K allocations.
> > 
> > Would that work?
> 
> Unfortunately no. AFAICT, your assumption is that vfn/mfn are originally
> aligned to the highest possible superpage size. There are situations where
> this is not the case.

Yes, I was assuming that vfn/mfn are originally aligned to the highest
possible superpage size. It is more difficult without that assumption
:-)


> To give a concrete example, at the moment the RAM is mapped using 1GB
> superpages in Xen. But in the future, we will only want to map RAM regions in
> the directmap that haven't been marked as reserved [1].
> 
> Those reserved regions don't have architectural alignment or placement.
> 
> I will use an over-exaggerated example (or maybe not :)).
> 
> Imagine you have 4GB of RAM starting at 0. The HW/Software engineer decided to
> place a 2MB reserved region starting at 512MB.
> 
> As a result we would want to map two RAM regions:
>    1) 0 to 512MB
>    2) 514MB to 4GB
> 
> I will only focus on 2). In the ideal situation, we would want to map
>    a) 514MB to 1GB using 2MB superpages
>    b) 1GB to 4GB using 1GB superpages
> 
> We don't want to use 2MB superpages for all of 2) because this would increase
> TLB pressure (we want to avoid Xen using too many TLB entries) and also
> increase the size of the page-tables.
> 
> Therefore, we want to select the best size for each iteration. For now, the
> only solution I can come up with is to OR vfn/mfn and then use a series of
> checks to compare the mask and nr_mfn.

Yeah, that's more or less what I was imagining too. Maybe we could use
ffs and friends to avoid or simplify some of those checks.
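As a sketch of the ffs idea: the number of trailing zero bits of the OR-ed mask is the largest order everything is aligned to, which then only needs rounding down to a supported mapping size. __builtin_ctzl() is a GCC/Clang builtin that is undefined for 0, hence the guard. Xen's own helpers may differ, and the function names are hypothetical:

```c
#include <assert.h>
#include <limits.h>

#define SECOND_ORDER 9    /* 2MB in 4KB frames */
#define FIRST_ORDER  18   /* 1GB in 4KB frames */

/* Largest order the mask is aligned to, i.e. its number of trailing zeros. */
static unsigned int max_aligned_order(unsigned long mask)
{
    /* __builtin_ctzl(0) is undefined, and 0 is aligned to everything. */
    if ( !mask )
        return sizeof(mask) * CHAR_BIT;
    return (unsigned int)__builtin_ctzl(mask);
}

/* Round the alignment down to a mapping size the page tables support. */
static unsigned int pick_order_ctz(unsigned long mfn, unsigned long vfn,
                                   unsigned long left)
{
    unsigned int order = max_aligned_order(mfn | vfn | left);

    if ( order >= FIRST_ORDER )
        return FIRST_ORDER;
    if ( order >= SECOND_ORDER )
        return SECOND_ORDER;
    return 0;
}
```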


> In addition to the "classic" mappings (i.e. 4KB, 2MB, 1GB), I would like to
> explore contiguous mappings (e.g. 64KB, 32MB) to further reduce TLB
> pressure. Note that a processor may or may not take advantage of contiguous
> mappings to reduce the number of TLB entries used.
> 
> This will unfortunately increase the number of checks. I will try to come up
> with a patch and we can discuss from there.

OK
Bertrand Marquis Nov. 24, 2020, 6:13 p.m. UTC | #6
Hi Julien,

> On 19 Nov 2020, at 19:07, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <julien.grall@arm.com>
> 
> At the moment, xen_pt_update_entry() only supports mapping at level 3
> (i.e. 4KB mappings). While this is fine for most of the runtime helpers,
> the boot code will need to use superpage mappings.
> 
> We don't want to allow superpage mapping by default as some of the
> callers may expect small mappings (i.e populate_pt_range()) or even
> expect to unmap only a part of a superpage.
> 
> To keep the code simple, a new flag _PAGE_BLOCK is introduced to
> allow the caller to enable superpage mapping.
> 
> As the code doesn't support all the combinations, xen_pt_check_entry()
> is extended to take into account the cases we don't support when
> using block mapping:
>    - Replacing a table with a mapping. This may happen if region was
>    first mapped with 4KB mapping and then later on replaced with a 2MB
>    (or 1GB mapping)
>    - Removing/modify a table. This may happen if a caller try to remove a
>    region with _PAGE_BLOCK set when it was created without it
> 
> Note that the current restriction mean that the caller must ensure that
> _PAGE_BLOCK is consistently set/cleared across all the updates on a
> given virtual region. This ought to be fine with the expected use-cases.
> 
> More rework will be necessary if we wanted to remove the restrictions.
> 
> Note that nr_mfns is now marked const as it is used for flushing the
> TLBs and we don't want it to be modified.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 

First, I tested the series on Arm and so far it is working properly.

I only have some remarks because, even if the code is right, I think
some parts of it are not easy to read...

> ---
> 
> This patch is necessary for upcoming changes in the MM code. I would
> like to remove most of the open-coding update of the page-tables as they
> are not easy to properly fix/extend. For instance, always mapping
> xenheap mapping with 1GB superpage is plain wrong because:
>    - RAM regions are not always 1GB aligned (such as on RPI 4) and we
>    may end up to map MMIO with cacheable attributes
>    - RAM may contain reserved regions should either not be mapped
> ---
> xen/arch/arm/mm.c          | 87 ++++++++++++++++++++++++++++++--------
> xen/include/asm-arm/page.h |  4 ++
> 2 files changed, 73 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 59f8a3f15fd1..af0f12b6e6d3 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1060,9 +1060,10 @@ static int xen_pt_next_level(bool read_only, unsigned int level,
> }
> 
> /* Sanity check of the entry */
> -static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
> +static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
> +                               unsigned int flags)
> {
> -    /* Sanity check when modifying a page. */
> +    /* Sanity check when modifying an entry. */
>     if ( (flags & _PAGE_PRESENT) && mfn_eq(mfn, INVALID_MFN) )
>     {
>         /* We don't allow modifying an invalid entry. */
> @@ -1072,6 +1073,13 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>             return false;
>         }
> 
> +        /* We don't allow modifying a table entry */
> +        if ( !lpae_is_mapping(entry, level) )
> +        {
> +            mm_printk("Modifying a table entry is not allowed.\n");
> +            return false;
> +        }
> +
>         /* We don't allow changing memory attributes. */
>         if ( entry.pt.ai != PAGE_AI_MASK(flags) )
>         {
> @@ -1087,7 +1095,7 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>             return false;
>         }
>     }
> -    /* Sanity check when inserting a page */
> +    /* Sanity check when inserting a mapping */
>     else if ( flags & _PAGE_PRESENT )
>     {
>         /* We should be here with a valid MFN. */
> @@ -1096,18 +1104,28 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>         /* We don't allow replacing any valid entry. */
>         if ( lpae_is_valid(entry) )
>         {
> -            mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
> -                      mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
> +            if ( lpae_is_mapping(entry, level) )
> +                mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
> +                          mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
> +            else
> +                mm_printk("Trying to replace a table with a mapping.\n");
>             return false;
>         }
>     }
> -    /* Sanity check when removing a page. */
> +    /* Sanity check when removing a mapping. */
>     else if ( (flags & (_PAGE_PRESENT|_PAGE_POPULATE)) == 0 )
>     {
>         /* We should be here with an invalid MFN. */
>         ASSERT(mfn_eq(mfn, INVALID_MFN));
> 
> -        /* We don't allow removing page with contiguous bit set. */
> +        /* We don't allow removing a table */
> +        if ( lpae_is_table(entry, level) )
> +        {
> +            mm_printk("Removing a table is not allowed.\n");
> +            return false;
> +        }
> +
> +        /* We don't allow removing a mapping with contiguous bit set. */
>         if ( entry.pt.contig )
>         {
>             mm_printk("Removing entry with contiguous bit set is not allowed.\n");
> @@ -1126,12 +1144,12 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
> }
> 
> static int xen_pt_update_entry(mfn_t root, unsigned long virt,
> -                               mfn_t mfn, unsigned int flags)
> +                               mfn_t mfn, unsigned int page_order,
> +                               unsigned int flags)
> {
>     int rc;
>     unsigned int level;
> -    /* We only support 4KB mapping (i.e level 3) for now */
> -    unsigned int target = 3;
> +    unsigned int target = 3 - (page_order / LPAE_SHIFT);

This is not really straightforward and it would be good to actually explain the computation here, or ...
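One way to spell out the computation, as a sketch assuming a 4KB granule (LPAE_SHIFT = 9, so each table resolves 9 bits of the address). `order_to_level()` is a hypothetical helper: moving one level up the walk multiplies the mapping size by 2^LPAE_SHIFT, so an order of 0, 9 or 18 lands at level 3, 2 or 1.

```c
#include <assert.h>

#define LPAE_SHIFT    9U   /* 512 entries per table with a 4KB granule */
#define THIRD_ORDER   0U   /* 4KB */
#define SECOND_ORDER  9U   /* 2MB */
#define FIRST_ORDER  18U   /* 1GB */

/*
 * A mapping of 2^page_order pages sits page_order / LPAE_SHIFT levels
 * above level 3 (the 4KB level).
 */
static unsigned int order_to_level(unsigned int page_order)
{
    /* Only the page/block orders (0, 9, 18) are meaningful here. */
    assert(page_order % LPAE_SHIFT == 0 && page_order <= FIRST_ORDER);
    return 3 - page_order / LPAE_SHIFT;
}
```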

>     lpae_t *table;
>     /*
>      * The intermediate page tables are read-only when the MFN is not valid
> @@ -1186,7 +1204,7 @@ static int xen_pt_update_entry(mfn_t root, unsigned long virt,
>     entry = table + offsets[level];
> 
>     rc = -EINVAL;
> -    if ( !xen_pt_check_entry(*entry, mfn, flags) )
> +    if ( !xen_pt_check_entry(*entry, mfn, level, flags) )
>         goto out;
> 
>     /* If we are only populating page-table, then we are done. */
> @@ -1204,8 +1222,11 @@ static int xen_pt_update_entry(mfn_t root, unsigned long virt,
>         {
>             pte = mfn_to_xen_entry(mfn, PAGE_AI_MASK(flags));
> 
> -            /* Third level entries set pte.pt.table = 1 */
> -            pte.pt.table = 1;
> +            /*
> +             * First and second level pages set pte.pt.table = 0, but
> +             * third level entries set pte.pt.table = 1.
> +             */
> +            pte.pt.table = (level == 3);
>         }
>         else /* We are updating the permission => Copy the current pte. */
>             pte = *entry;
> @@ -1229,11 +1250,12 @@ static DEFINE_SPINLOCK(xen_pt_lock);
> 
> static int xen_pt_update(unsigned long virt,
>                          mfn_t mfn,
> -                         unsigned long nr_mfns,
> +                         const unsigned long nr_mfns,
>                          unsigned int flags)
> {
>     int rc = 0;
> -    unsigned long addr = virt, addr_end = addr + nr_mfns * PAGE_SIZE;
> +    unsigned long vfn = paddr_to_pfn(virt);
> +    unsigned long left = nr_mfns;
> 
>     /*
>      * For arm32, page-tables are different on each CPUs. Yet, they share
> @@ -1265,14 +1287,43 @@ static int xen_pt_update(unsigned long virt,
> 
>     spin_lock(&xen_pt_lock);
> 
> -    for ( ; addr < addr_end; addr += PAGE_SIZE )
> +    while ( left )
>     {
> -        rc = xen_pt_update_entry(root, addr, mfn, flags);
> +        unsigned int order;
> +        unsigned long mask;
> +
> +        /*
> +         * Don't take into account the MFN when removing mapping (i.e
> +         * MFN_INVALID) to calculate the correct target order.
> +         *
> +         * XXX: Support superpage mappings if nr is not aligned to a
> +         * superpage size.
> +         */
> +        mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
> +        mask |= vfn | left;
> +
> +        /*
> +         * Always use level 3 mapping unless the caller request block
> +         * mapping.
> +         */
> +        if ( likely(!(flags & _PAGE_BLOCK)) )
> +            order = THIRD_ORDER;
> +        else if ( !(mask & (BIT(FIRST_ORDER, UL) - 1)) )
> +            order = FIRST_ORDER;
> +        else if ( !(mask & (BIT(SECOND_ORDER, UL) - 1)) )
> +            order = SECOND_ORDER;
> +        else
> +            order = THIRD_ORDER;
> +
> +        rc = xen_pt_update_entry(root, pfn_to_paddr(vfn), mfn, order, flags);

Maybe it would be easier here to pass the target level directly instead of the page order.

>         if ( rc )
>             break;
> 
> +        vfn += 1U << order;
>         if ( !mfn_eq(mfn, INVALID_MFN) )
> -            mfn = mfn_add(mfn, 1);
> +            mfn = mfn_add(mfn, 1U << order);
> +
> +        left -= (1U << order);
>     }
> 
>     /*
> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> index 4ea8e97247c8..de096b0968e3 100644
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -79,6 +79,7 @@
>  * [3:4] Permission flags
>  * [5]   Page present
>  * [6]   Only populate page tables
> + * [7]   Use any level mapping only (i.e. superpages is allowed)

The comment for the bit is not really logical: "any level mapping only".
Wouldn’t it be clearer to name the bit _PAGE_SUPERPAGE_BIT and
comment it by saying that superpages are allowed?

Regards
Bertrand

>  */
> #define PAGE_AI_MASK(x) ((x) & 0x7U)
> 
> @@ -92,6 +93,9 @@
> #define _PAGE_PRESENT    (1U << 5)
> #define _PAGE_POPULATE   (1U << 6)
> 
> +#define _PAGE_BLOCK_BIT     7
> +#define _PAGE_BLOCK         (1U << _PAGE_BLOCK_BIT)
> +
> /*
>  * _PAGE_DEVICE and _PAGE_NORMAL are convenience defines. They are not
>  * meant to be used outside of this header.
> -- 
> 2.17.1
>
Julien Grall Nov. 25, 2020, 6:03 p.m. UTC | #7
On 24/11/2020 18:13, Bertrand Marquis wrote:
> Hi Julien,

Hi Bertrand,

>> On 19 Nov 2020, at 19:07, Julien Grall <julien@xen.org> wrote:
>>
>> From: Julien Grall <julien.grall@arm.com>
>>
>> At the moment, xen_pt_update_entry() only supports mapping at level 3
>> (i.e 4KB mapping). While this is fine for most of the runtime helper,
>> the boot code will require to use superpage mapping.
>>
>> We don't want to allow superpage mapping by default as some of the
>> callers may expect small mappings (i.e populate_pt_range()) or even
>> expect to unmap only a part of a superpage.
>>
>> To keep the code simple, a new flag _PAGE_BLOCK is introduced to
>> allow the caller to enable superpage mapping.
>>
>> As the code doesn't support all the combinations, xen_pt_check_entry()
>> is extended to take into account the cases we don't support when
>> using block mapping:
>>     - Replacing a table with a mapping. This may happen if region was
>>     first mapped with 4KB mapping and then later on replaced with a 2MB
>>     (or 1GB mapping)
>>     - Removing/modify a table. This may happen if a caller try to remove a
>>     region with _PAGE_BLOCK set when it was created without it
>>
>> Note that the current restriction mean that the caller must ensure that
>> _PAGE_BLOCK is consistently set/cleared across all the updates on a
>> given virtual region. This ought to be fine with the expected use-cases.
>>
>> More rework will be necessary if we wanted to remove the restrictions.
>>
>> Note that nr_mfns is now marked const as it is used for flushing the
>> TLBs and we don't want it to be modified.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>
> 
> First, I tested the series on Arm and so far it is working properly.

Thanks for the testing and...

> 
> I only have some remarks because, even if the code is right, I think
> some parts of it are not easy to read...

... I am always open to suggestions :).

>> ---
>>
>> This patch is necessary for upcoming changes in the MM code. I would
>> like to remove most of the open-coding update of the page-tables as they
>> are not easy to properly fix/extend. For instance, always mapping
>> xenheap mapping with 1GB superpage is plain wrong because:
>>     - RAM regions are not always 1GB aligned (such as on RPI 4) and we
>>     may end up to map MMIO with cacheable attributes
>>     - RAM may contain reserved regions should either not be mapped
>> ---
>> xen/arch/arm/mm.c          | 87 ++++++++++++++++++++++++++++++--------
>> xen/include/asm-arm/page.h |  4 ++
>> 2 files changed, 73 insertions(+), 18 deletions(-)
>>
>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
>> index 59f8a3f15fd1..af0f12b6e6d3 100644
>> --- a/xen/arch/arm/mm.c
>> +++ b/xen/arch/arm/mm.c
>> @@ -1060,9 +1060,10 @@ static int xen_pt_next_level(bool read_only, unsigned int level,
>> }
>>
>> /* Sanity check of the entry */
>> -static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>> +static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
>> +                               unsigned int flags)
>> {
>> -    /* Sanity check when modifying a page. */
>> +    /* Sanity check when modifying an entry. */
>>      if ( (flags & _PAGE_PRESENT) && mfn_eq(mfn, INVALID_MFN) )
>>      {
>>          /* We don't allow modifying an invalid entry. */
>> @@ -1072,6 +1073,13 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>>              return false;
>>          }
>>
>> +        /* We don't allow modifying a table entry */
>> +        if ( !lpae_is_mapping(entry, level) )
>> +        {
>> +            mm_printk("Modifying a table entry is not allowed.\n");
>> +            return false;
>> +        }
>> +
>>          /* We don't allow changing memory attributes. */
>>          if ( entry.pt.ai != PAGE_AI_MASK(flags) )
>>          {
>> @@ -1087,7 +1095,7 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>>              return false;
>>          }
>>      }
>> -    /* Sanity check when inserting a page */
>> +    /* Sanity check when inserting a mapping */
>>      else if ( flags & _PAGE_PRESENT )
>>      {
>>          /* We should be here with a valid MFN. */
>> @@ -1096,18 +1104,28 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>>          /* We don't allow replacing any valid entry. */
>>          if ( lpae_is_valid(entry) )
>>          {
>> -            mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
>> -                      mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
>> +            if ( lpae_is_mapping(entry, level) )
>> +                mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
>> +                          mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
>> +            else
>> +                mm_printk("Trying to replace a table with a mapping.\n");
>>              return false;
>>          }
>>      }
>> -    /* Sanity check when removing a page. */
>> +    /* Sanity check when removing a mapping. */
>>      else if ( (flags & (_PAGE_PRESENT|_PAGE_POPULATE)) == 0 )
>>      {
>>          /* We should be here with an invalid MFN. */
>>          ASSERT(mfn_eq(mfn, INVALID_MFN));
>>
>> -        /* We don't allow removing page with contiguous bit set. */
>> +        /* We don't allow removing a table */
>> +        if ( lpae_is_table(entry, level) )
>> +        {
>> +            mm_printk("Removing a table is not allowed.\n");
>> +            return false;
>> +        }
>> +
>> +        /* We don't allow removing a mapping with contiguous bit set. */
>>          if ( entry.pt.contig )
>>          {
>>              mm_printk("Removing entry with contiguous bit set is not allowed.\n");
>> @@ -1126,12 +1144,12 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>> }
>>
>> static int xen_pt_update_entry(mfn_t root, unsigned long virt,
>> -                               mfn_t mfn, unsigned int flags)
>> +                               mfn_t mfn, unsigned int page_order,
>> +                               unsigned int flags)
>> {
>>      int rc;
>>      unsigned int level;
>> -    /* We only support 4KB mapping (i.e level 3) for now */
>> -    unsigned int target = 3;
>> +    unsigned int target = 3 - (page_order / LPAE_SHIFT);
> 
> This is not really straightforward and it would be good to actually explain the computation here, or ...

[...]

>> @@ -1265,14 +1287,43 @@ static int xen_pt_update(unsigned long virt,
>>
>>      spin_lock(&xen_pt_lock);
>>
>> -    for ( ; addr < addr_end; addr += PAGE_SIZE )
>> +    while ( left )
>>      {
>> -        rc = xen_pt_update_entry(root, addr, mfn, flags);
>> +        unsigned int order;
>> +        unsigned long mask;
>> +
>> +        /*
>> +         * Don't take into account the MFN when removing mapping (i.e
>> +         * MFN_INVALID) to calculate the correct target order.
>> +         *
>> +         * XXX: Support superpage mappings if nr is not aligned to a
>> +         * superpage size.
>> +         */
>> +        mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
>> +        mask |= vfn | left;
>> +
>> +        /*
>> +         * Always use level 3 mapping unless the caller request block
>> +         * mapping.
>> +         */
>> +        if ( likely(!(flags & _PAGE_BLOCK)) )
>> +            order = THIRD_ORDER;
>> +        else if ( !(mask & (BIT(FIRST_ORDER, UL) - 1)) )
>> +            order = FIRST_ORDER;
>> +        else if ( !(mask & (BIT(SECOND_ORDER, UL) - 1)) )
>> +            order = SECOND_ORDER;
>> +        else
>> +            order = THIRD_ORDER;
>> +
>> +        rc = xen_pt_update_entry(root, pfn_to_paddr(vfn), mfn, order, flags);
> 
> maybe it would be easier here to pass directly the target instead of the page order.

Stefano suggested the same. For the next version I am planning to
"hardcode" the level in the if/else above and then find the order from
an array similar to level_orders in p2m.c.
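A minimal sketch of that plan, under stated assumptions: the order constants assume a 4KB granule, the array is modelled on the `level_orders` array mentioned above (the exact definition in p2m.c may differ), and `pick_level()`/`level_to_order()` are hypothetical helpers. The loop would pick the level first and derive the order from the table, so `xen_pt_update_entry()` could take the target level directly.

```c
#include <assert.h>
#include <stdbool.h>

#define THIRD_ORDER    0U   /* 4KB */
#define SECOND_ORDER   9U   /* 2MB */
#define FIRST_ORDER   18U   /* 1GB */
#define ZEROETH_ORDER 27U   /* 512GB */

/* Mapping size (order in 4KB pages) indexed by level. */
static const unsigned int level_orders[] = {
    [0] = ZEROETH_ORDER, [1] = FIRST_ORDER, [2] = SECOND_ORDER, [3] = THIRD_ORDER,
};

/* Hypothetical: choose the level directly from mask = mfn | vfn | left. */
static unsigned int pick_level(unsigned long mask, bool allow_block)
{
    if ( !allow_block )
        return 3;                   /* small mappings only */
    if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
        return 1;                   /* 1GB block */
    if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
        return 2;                   /* 2MB block */
    return 3;                       /* 4KB page */
}

/* Derive the order (used to advance vfn/mfn) from the chosen level. */
static unsigned int level_to_order(unsigned int level)
{
    assert(level < sizeof(level_orders) / sizeof(level_orders[0]));
    return level_orders[level];
}
```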

> 
>>          if ( rc )
>>              break;
>>
>> +        vfn += 1U << order;
>>          if ( !mfn_eq(mfn, INVALID_MFN) )
>> -            mfn = mfn_add(mfn, 1);
>> +            mfn = mfn_add(mfn, 1U << order);
>> +
>> +        left -= (1U << order);
>>      }
>>
>>      /*
>> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
>> index 4ea8e97247c8..de096b0968e3 100644
>> --- a/xen/include/asm-arm/page.h
>> +++ b/xen/include/asm-arm/page.h
>> @@ -79,6 +79,7 @@
>>   * [3:4] Permission flags
>>   * [5]   Page present
>>   * [6]   Only populate page tables
>> + * [7]   Use any level mapping only (i.e. superpages is allowed)
> 
> The comment for the bit is not really logical: "any level mapping only".

My original implementation was using the bit the other way around: the
flag set meant we should only use level 3.

But it turned out to be more complicated to implement because runtime
users (e.g. vmap()) should only be mapped using small pages to avoid
trouble.

> Wouldn’t it be clearer to name the bit _PAGE_SUPERPAGE_BIT and
> comment it by saying that superpages are allowed?

I would prefer to keep the name short as the flag will be used in
combination with others. _PAGE_BLOCK is short and also matches the spec :).

In any case, I will update the description of bit 7 with:

"Superpage mappings is allowed".

Cheers,
Julien Grall Nov. 28, 2020, 11:53 a.m. UTC | #8
Hi Stefano,

On 24/11/2020 00:25, Stefano Stabellini wrote:
> On Mon, 23 Nov 2020, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 23/11/2020 22:27, Stefano Stabellini wrote:
>>> On Fri, 20 Nov 2020, Julien Grall wrote:
>>>>>>         /*
>>>>>>          * For arm32, page-tables are different on each CPUs. Yet, they share
>>>>>> @@ -1265,14 +1287,43 @@ static int xen_pt_update(unsigned long virt,
>>>>>>           spin_lock(&xen_pt_lock);
>>>>>>     -    for ( ; addr < addr_end; addr += PAGE_SIZE )
>>>>>> +    while ( left )
>>>>>>         {
>>>>>> -        rc = xen_pt_update_entry(root, addr, mfn, flags);
>>>>>> +        unsigned int order;
>>>>>> +        unsigned long mask;
>>>>>> +
>>>>>> +        /*
>>>>>> +         * Don't take into account the MFN when removing mapping (i.e
>>>>>> +         * MFN_INVALID) to calculate the correct target order.
>>>>>> +         *
>>>>>> +         * XXX: Support superpage mappings if nr is not aligned to a
>>>>>> +         * superpage size.
>>>>>
>>>>> It would be good to add another sentence to explain that the checks
>>>>> below are simply based on masks and rely on the mfn, vfn, and also
>>>>> nr_mfn to be superpage aligned. (It took me some time to figure it out.)
>>>>
>>>> I am not sure I understand what you wrote here. Could you suggest a
>>>> sentence?
>>>
>>> Something like the following:
>>>
>>> /*
>>>    * Don't take into account the MFN when removing mapping (i.e
>>>    * MFN_INVALID) to calculate the correct target order.
>>>    *
>>>    * This loop relies on mfn, vfn, and nr_mfn, to be all superpage
>>>    * aligned, and it uses `mask' to check for that.
>>
>> Unfortunately, I am still not sure I understand this comment.
>> The loop can deal with any (super)page size (4KB, 2MB, 1GB). There is no
>> assumption about the alignment of mfn, vfn and nr_mfn.
>>
>> By OR-ing the 3 components together, we can use it to find out the maximum
>> size that can be used for the mapping.
>>
>> So can you clarify what you mean?
> 
> In pseudo-code:
> 
>    mask = mfn | vfn | nr_mfns;
>    if (mask & ((1<<FIRST_ORDER) - 1))
>    if (mask & ((1<<SECOND_ORDER) - 1))
>    if (mask & ((1<<THIRD_ORDER) - 1))
>    ...
> 
> As you wrote, the mask is used to find the max size that can be used for
> the mapping.
> 
> But let's take nr_mfns out of the equation for a moment for clarity:
> 
>    mask = mfn | vfn;
>    if (mask & ((1<<FIRST_ORDER) - 1))
>    if (mask & ((1<<SECOND_ORDER) - 1))
>    if (mask & ((1<<THIRD_ORDER) - 1))
>    ...
> 
> How would you describe this check? I'd call this an alignment check,
> is it not?

If you take the ``if`` alone, yes, they are alignment checks. But if you
take the overall code, then it just computes which mapping size can
be used.

However, what I am disputing here is "rely", because no assumption is
made about the alignment in the loop (we are able to cater for any
size). In fact, the requirement that mfn and vfn be aligned to the mapping
size comes from the hardware and not from the implementation.
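To illustrate the point about OR-ing: testing the OR of the components against a size mask answers exactly the same question as testing each component separately, because a low bit is set in `a | b | c` if and only if it is set in at least one of them. Both helpers below are hypothetical and only demonstrate that equivalence.

```c
#include <assert.h>
#include <stdbool.h>

/* Check each value against 2^order individually. */
static bool all_aligned(unsigned long a, unsigned long b, unsigned long c,
                        unsigned int order)
{
    unsigned long m;

    assert(order < sizeof(unsigned long) * 8);  /* avoid an undefined shift */
    m = (1UL << order) - 1;
    return !(a & m) && !(b & m) && !(c & m);
}

/* Same question with a single test on the OR of the values. */
static bool or_aligned(unsigned long a, unsigned long b, unsigned long c,
                       unsigned int order)
{
    assert(order < sizeof(unsigned long) * 8);
    return !((a | b | c) & ((1UL << order) - 1));
}
```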

Cheers,
Stefano Stabellini Nov. 30, 2020, 10:05 p.m. UTC | #9
On Sat, 28 Nov 2020, Julien Grall wrote:
> Hi Stefano,
> 
> On 24/11/2020 00:25, Stefano Stabellini wrote:
> > On Mon, 23 Nov 2020, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 23/11/2020 22:27, Stefano Stabellini wrote:
> > > > On Fri, 20 Nov 2020, Julien Grall wrote:
> > > > > > >         /*
> > > > > > >          * For arm32, page-tables are different on each CPUs. Yet, they share
> > > > > > > @@ -1265,14 +1287,43 @@ static int xen_pt_update(unsigned long virt,
> > > > > > >           spin_lock(&xen_pt_lock);
> > > > > > >     -    for ( ; addr < addr_end; addr += PAGE_SIZE )
> > > > > > > +    while ( left )
> > > > > > >         {
> > > > > > > -        rc = xen_pt_update_entry(root, addr, mfn, flags);
> > > > > > > +        unsigned int order;
> > > > > > > +        unsigned long mask;
> > > > > > > +
> > > > > > > +        /*
> > > > > > > +         * Don't take into account the MFN when removing mapping (i.e
> > > > > > > +         * MFN_INVALID) to calculate the correct target order.
> > > > > > > +         *
> > > > > > > +         * XXX: Support superpage mappings if nr is not aligned to a
> > > > > > > +         * superpage size.
> > > > > > 
> > > > > > It would be good to add another sentence to explain that the checks
> > > > > > below are simply based on masks and rely on the mfn, vfn, and also
> > > > > > nr_mfn to be superpage aligned. (It took me some time to figure it out.)
> > > > > 
> > > > > I am not sure I understand what you wrote here. Could you suggest a
> > > > > sentence?
> > > > 
> > > > Something like the following:
> > > > 
> > > > /*
> > > >    * Don't take into account the MFN when removing mapping (i.e
> > > >    * MFN_INVALID) to calculate the correct target order.
> > > >    *
> > > >    * This loop relies on mfn, vfn, and nr_mfn, to be all superpage
> > > >    * aligned, and it uses `mask' to check for that.
> > > 
> > > Unfortunately, I am still not sure I understand this comment.
> > > The loop can deal with any (super)page size (4KB, 2MB, 1GB). There is no
> > > assumption about the alignment of mfn, vfn and nr_mfn.
> > > 
> > > By OR-ing the 3 components together, we can use it to find out the maximum
> > > size that can be used for the mapping.
> > > 
> > > So can you clarify what you mean?
> > 
> > In pseudo-code:
> > 
> >    mask = mfn | vfn | nr_mfns;
> >    if (mask & ((1<<FIRST_ORDER) - 1))
> >    if (mask & ((1<<SECOND_ORDER) - 1))
> >    if (mask & ((1<<THIRD_ORDER) - 1))
> >    ...
> > 
> > As you wrote the mask is used to find the max size that can be used for
> > the mapping.
> > 
> > But let's take nr_mfns out of the equation for a moment for clarity:
> > 
> >    mask = mfn | vfn;
> >    if (mask & ((1<<FIRST_ORDER) - 1))
> >    if (mask & ((1<<SECOND_ORDER) - 1))
> >    if (mask & ((1<<THIRD_ORDER) - 1))
> >    ...
> > 
> > How would you describe this check? I'd call this an alignment check,
> > is it not?
> If you take the ``if`` alone, yes, they are alignment checks. But if you take
> the overall code, then it just computes which mapping size can be used.
> 
> However, what I am disputing here is "rely", because no assumption is
> made about the alignment in the loop (we are able to cater for any size). In
> fact, the requirement that mfn and vfn be aligned to the mapping size comes
> from the hardware and not from the implementation.

OK, maybe the "rely" gives a bad impression. What about:

This loop relies on mfn, vfn, and nr_mfn, to be all superpage aligned
(mfn and vfn have to be architecturally), and it uses `mask' to check
for that.

Feel free to reword it differently if you have a better idea.
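A sketch of why the alignment of nr_mfn matters with the loop as written (the XXX in the patch comment). Both selectors are hypothetical and ignore the MFN for brevity (as for a removal, or a 1:1 mapping where including the MFN would not change the mask). The second variant only requires vfn to be aligned and the remaining count to be large enough; it is one possible way to lift the restriction, not what the patch does, and a real version would also have to check the MFN alignment.

```c
#include <assert.h>
#include <stdbool.h>

#define THIRD_ORDER   0U   /* 4KB */
#define SECOND_ORDER  9U   /* 2MB */
#define FIRST_ORDER  18U   /* 1GB */

static bool aligned(unsigned long x, unsigned int order)
{
    return !(x & ((1UL << order) - 1));
}

/* As in the patch: 'left' is OR-ed into the alignment mask. */
static unsigned int order_with_left_in_mask(unsigned long vfn, unsigned long left)
{
    unsigned long mask = vfn | left;

    if ( aligned(mask, FIRST_ORDER) )
        return FIRST_ORDER;
    if ( aligned(mask, SECOND_ORDER) )
        return SECOND_ORDER;
    return THIRD_ORDER;
}

/* Hypothetical variant: 'left' only has to be large enough. */
static unsigned int order_with_left_as_size(unsigned long vfn, unsigned long left)
{
    if ( aligned(vfn, FIRST_ORDER) && left >= (1UL << FIRST_ORDER) )
        return FIRST_ORDER;
    if ( aligned(vfn, SECOND_ORDER) && left >= (1UL << SECOND_ORDER) )
        return SECOND_ORDER;
    return THIRD_ORDER;
}

/* Number of entries needed to cover [vfn, vfn + nr) with a given selector. */
static unsigned long count_entries(unsigned long vfn, unsigned long nr,
                                   unsigned int (*pick)(unsigned long, unsigned long))
{
    unsigned long n = 0;

    while ( nr )
    {
        unsigned int order = pick(vfn, nr);

        assert((1UL << order) <= nr);  /* never map past the end */
        vfn += 1UL << order;
        nr -= 1UL << order;
        n++;
    }
    return n;
}
```

For 513 pages starting at vfn 512 (a 2MB-aligned address), the patch's selection produces 513 4KB entries, since `vfn | left` stays odd at every step; the variant produces one 2MB entry followed by one 4KB entry.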
Julien Grall April 25, 2021, 3:11 p.m. UTC | #10
Hi Stefano,

On 30/11/2020 22:05, Stefano Stabellini wrote:
> On Sat, 28 Nov 2020, Julien Grall wrote:
>> If you take the ``if`` alone, yes, they are alignment checks. But if you take
>> the overall code, then it just computes which mapping size can be used.
>>
>> However, what I am disputing here is "rely", because no assumption is
>> made about the alignment in the loop (we are able to cater for any size). In
>> fact, the requirement that mfn and vfn be aligned to the mapping size comes
>> from the hardware and not from the implementation.
> 
> OK, maybe the "rely" gives a bad impression. What about:
> 
> This loop relies on mfn, vfn, and nr_mfn, to be all superpage aligned
> (mfn and vfn have to be architecturally), and it uses `mask' to check
> for that.
>
> Feel free to reword it differently if you have a better idea.

I have used your new wording proposal.

Cheers,

Patch

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 59f8a3f15fd1..af0f12b6e6d3 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1060,9 +1060,10 @@  static int xen_pt_next_level(bool read_only, unsigned int level,
 }
 
 /* Sanity check of the entry */
-static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
+static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
+                               unsigned int flags)
 {
-    /* Sanity check when modifying a page. */
+    /* Sanity check when modifying an entry. */
     if ( (flags & _PAGE_PRESENT) && mfn_eq(mfn, INVALID_MFN) )
     {
         /* We don't allow modifying an invalid entry. */
@@ -1072,6 +1073,13 @@  static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
             return false;
         }
 
+        /* We don't allow modifying a table entry */
+        if ( !lpae_is_mapping(entry, level) )
+        {
+            mm_printk("Modifying a table entry is not allowed.\n");
+            return false;
+        }
+
         /* We don't allow changing memory attributes. */
         if ( entry.pt.ai != PAGE_AI_MASK(flags) )
         {
@@ -1087,7 +1095,7 @@  static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
             return false;
         }
     }
-    /* Sanity check when inserting a page */
+    /* Sanity check when inserting a mapping */
     else if ( flags & _PAGE_PRESENT )
     {
         /* We should be here with a valid MFN. */
@@ -1096,18 +1104,28 @@  static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
         /* We don't allow replacing any valid entry. */
         if ( lpae_is_valid(entry) )
         {
-            mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
-                      mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
+            if ( lpae_is_mapping(entry, level) )
+                mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n",
+                          mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
+            else
+                mm_printk("Trying to replace a table with a mapping.\n");
             return false;
         }
     }
-    /* Sanity check when removing a page. */
+    /* Sanity check when removing a mapping. */
     else if ( (flags & (_PAGE_PRESENT|_PAGE_POPULATE)) == 0 )
     {
         /* We should be here with an invalid MFN. */
         ASSERT(mfn_eq(mfn, INVALID_MFN));
 
-        /* We don't allow removing page with contiguous bit set. */
+        /* We don't allow removing a table */
+        if ( lpae_is_table(entry, level) )
+        {
+            mm_printk("Removing a table is not allowed.\n");
+            return false;
+        }
+
+        /* We don't allow removing a mapping with contiguous bit set. */
         if ( entry.pt.contig )
         {
             mm_printk("Removing entry with contiguous bit set is not allowed.\n");
@@ -1126,12 +1144,12 @@  static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
 }
 
 static int xen_pt_update_entry(mfn_t root, unsigned long virt,
-                               mfn_t mfn, unsigned int flags)
+                               mfn_t mfn, unsigned int page_order,
+                               unsigned int flags)
 {
     int rc;
     unsigned int level;
-    /* We only support 4KB mapping (i.e level 3) for now */
-    unsigned int target = 3;
+    unsigned int target = 3 - (page_order / LPAE_SHIFT);
     lpae_t *table;
     /*
      * The intermediate page tables are read-only when the MFN is not valid
@@ -1186,7 +1204,7 @@  static int xen_pt_update_entry(mfn_t root, unsigned long virt,
     entry = table + offsets[level];
 
     rc = -EINVAL;
-    if ( !xen_pt_check_entry(*entry, mfn, flags) )
+    if ( !xen_pt_check_entry(*entry, mfn, level, flags) )
         goto out;
 
     /* If we are only populating page-table, then we are done. */
@@ -1204,8 +1222,11 @@  static int xen_pt_update_entry(mfn_t root, unsigned long virt,
         {
             pte = mfn_to_xen_entry(mfn, PAGE_AI_MASK(flags));
 
-            /* Third level entries set pte.pt.table = 1 */
-            pte.pt.table = 1;
+            /*
+             * First and second level pages set pte.pt.table = 0, but
+             * third level entries set pte.pt.table = 1.
+             */
+            pte.pt.table = (level == 3);
         }
         else /* We are updating the permission => Copy the current pte. */
             pte = *entry;
@@ -1229,11 +1250,12 @@  static DEFINE_SPINLOCK(xen_pt_lock);
 
 static int xen_pt_update(unsigned long virt,
                          mfn_t mfn,
-                         unsigned long nr_mfns,
+                         const unsigned long nr_mfns,
                          unsigned int flags)
 {
     int rc = 0;
-    unsigned long addr = virt, addr_end = addr + nr_mfns * PAGE_SIZE;
+    unsigned long vfn = paddr_to_pfn(virt);
+    unsigned long left = nr_mfns;
 
     /*
      * For arm32, page-tables are different on each CPUs. Yet, they share
@@ -1265,14 +1287,43 @@  static int xen_pt_update(unsigned long virt,
 
     spin_lock(&xen_pt_lock);
 
-    for ( ; addr < addr_end; addr += PAGE_SIZE )
+    while ( left )
     {
-        rc = xen_pt_update_entry(root, addr, mfn, flags);
+        unsigned int order;
+        unsigned long mask;
+
+        /*
+         * Don't take the MFN into account when removing a mapping (i.e.
+         * INVALID_MFN) when calculating the target order.
+         *
+         * XXX: Support superpage mappings when nr_mfns is not aligned
+         * to a superpage size.
+         */
+        mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
+        mask |= vfn | left;
+
+        /*
+         * Always use level 3 mappings unless the caller requests block
+         * mappings.
+         */
+        if ( likely(!(flags & _PAGE_BLOCK)) )
+            order = THIRD_ORDER;
+        else if ( !(mask & (BIT(FIRST_ORDER, UL) - 1)) )
+            order = FIRST_ORDER;
+        else if ( !(mask & (BIT(SECOND_ORDER, UL) - 1)) )
+            order = SECOND_ORDER;
+        else
+            order = THIRD_ORDER;
+
+        rc = xen_pt_update_entry(root, pfn_to_paddr(vfn), mfn, order, flags);
         if ( rc )
             break;
 
+        vfn += 1U << order;
         if ( !mfn_eq(mfn, INVALID_MFN) )
-            mfn = mfn_add(mfn, 1);
+            mfn = mfn_add(mfn, 1U << order);
+
+        left -= (1U << order);
     }
 
     /*
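The new xen_pt_update() loop picks, at each step, the largest order to which the virtual frame, the MFN (unless it is INVALID_MFN) and the remaining page count are all aligned. xen_pt_update_entry() then targets level 3 - (page_order / LPAE_SHIFT). The standalone C sketch below models only that chunking so the arithmetic can be checked in isolation. It is not Xen code: the sketch_* names are hypothetical, the constants assume a 4KB granule resolving 9 bits per level (orders 0, 9 and 18 for 4KB, 2MB and 1GB), and chunks are recorded in an array instead of being written to page tables.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4KB-granule layout: 9 bits per level (not taken from Xen headers). */
#define SKETCH_LPAE_SHIFT   9
#define SKETCH_THIRD_ORDER  0
#define SKETCH_SECOND_ORDER (SKETCH_THIRD_ORDER + SKETCH_LPAE_SHIFT)  /* 2MB */
#define SKETCH_FIRST_ORDER  (SKETCH_SECOND_ORDER + SKETCH_LPAE_SHIFT) /* 1GB */

/* One chunk that would be passed to xen_pt_update_entry(). */
struct sketch_chunk {
    unsigned long vfn;
    unsigned int order;
    unsigned int level;
};

/* Largest order that vfn, mfn (if valid) and left are all aligned to. */
static unsigned int sketch_pick_order(unsigned long vfn, unsigned long mfn,
                                      int mfn_valid, unsigned long left,
                                      int block)
{
    /* An invalid MFN (removal) must not constrain the order. */
    unsigned long mask = (mfn_valid ? mfn : 0) | vfn | left;

    if ( !block )
        return SKETCH_THIRD_ORDER;
    if ( !(mask & ((1UL << SKETCH_FIRST_ORDER) - 1)) )
        return SKETCH_FIRST_ORDER;
    if ( !(mask & ((1UL << SKETCH_SECOND_ORDER) - 1)) )
        return SKETCH_SECOND_ORDER;
    return SKETCH_THIRD_ORDER;
}

/* Same formula as the patch: 3 - (page_order / LPAE_SHIFT). */
static unsigned int sketch_order_to_level(unsigned int order)
{
    return 3 - order / SKETCH_LPAE_SHIFT;
}

/* Split [vfn, vfn + nr) into chunks; returns the number of chunks written. */
static size_t sketch_split(unsigned long vfn, unsigned long mfn, int mfn_valid,
                           unsigned long nr, int block,
                           struct sketch_chunk *out, size_t max)
{
    size_t n = 0;
    unsigned long left = nr;

    while ( left )
    {
        unsigned int order = sketch_pick_order(vfn, mfn, mfn_valid, left,
                                               block);

        assert(n < max);
        out[n].vfn = vfn;
        out[n].order = order;
        out[n].level = sketch_order_to_level(order);
        n++;

        /* Advance all three cursors by the size of the chunk just emitted. */
        vfn += 1UL << order;
        if ( mfn_valid )
            mfn += 1UL << order;
        left -= 1UL << order;
    }

    return n;
}
```

For example, mapping 0x400 pages at vfn 0x200 (MFN 0x80200) with block mappings enabled yields two 2MB chunks at level 2. An unaligned count such as 0x201 degrades every chunk to 4KB, which is the limitation the XXX comment records.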
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 4ea8e97247c8..de096b0968e3 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -79,6 +79,7 @@ 
  * [3:4] Permission flags
  * [5]   Page present
  * [6]   Only populate page tables
+ * [7]   Allow mappings at any level (i.e. superpages are allowed)
  */
 #define PAGE_AI_MASK(x) ((x) & 0x7U)
 
@@ -92,6 +93,9 @@ 
 #define _PAGE_PRESENT    (1U << 5)
 #define _PAGE_POPULATE   (1U << 6)
 
+#define _PAGE_BLOCK_BIT     7
+#define _PAGE_BLOCK         (1U << _PAGE_BLOCK_BIT)
+
 /*
  * _PAGE_DEVICE and _PAGE_NORMAL are convenience defines. They are not
  * meant to be used outside of this header.