diff mbox series

[for,4.20?,v4,3/3] xen/riscv: update mfn calculation in pt_mapping_level()

Message ID 38093d9843afbba9dda7326ee6e8cc3c99343cf6.1739363240.git.oleksii.kurochko@gmail.com (mailing list archive)
State New
Headers show
Series Fixes for vmap_to_mfn() and pt_mapping_level | expand

Commit Message

Oleksii Kurochko Feb. 12, 2025, 4:50 p.m. UTC
When pt_update() is called with arguments (..., INVALID_MFN, ..., 0 or 1),
it indicates that a mapping is being destroyed/modifyed.

In the case when modifying or destroying a mapping, it is necessary to
search until a leaf node is found, instead of searching for a page table
entry based on the precalculated `level` and `order`(look at pt_update()).
This is because when `mfn` == INVALID_MFN, the `mask` (in pt_mapping_level())
will take into account only `vfn`, which could accidentally return an
incorrect level, leading to the discovery of an incorrect page table entry.

For example, if `vfn` is page table level 1 aligned, but it was mapped as
page table level 0, then pt_mapping_level() will return `level` = 1, since
only `vfn` (which is page table level 1 aligned) is taken into account when
`mfn` == INVALID_MFN (look at pt_mapping_level()).

Fixes: c2f1ded524 ("xen/riscv: page table handling")
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v4:
- Move defintion of local variable table inside `else` case as it is
  used only there.
- Change unmap_table(table) to unmap_table(entry) for unifying both
  cases when _pt_walk() is used and when pte is seached on the specified
  level.
- Initialize local variable `entry` to avoid compilation error caused by
  uninitialized variable.
---
Changes in v3:
- Drop ASSERT() for order as it isn't needed anymore.
- Drop PTE_LEAF_SEARCH and use instead level=CONFIG_PAGING_LEVELS;
  refactor connected code correspondingly.
- Calculate order once.
- Drop initializer for local variable order.
- Drop BUG_ON(!pte_is_mapping(*entry)) for the case when leaf searching
  happens as there is a similar check in pt_check_entry(). Look at
  pt.c:41 and pt.c:75.
---
Changes in v2:
 - Introduce PTE_LEAF_SEARCH to tell page table update operation to
   walk down to wherever the leaf entry is.
 - Use introduced PTE_LEAF_SEARCH to not searching pte_t entry twice.
 - Update the commit message.
---
 xen/arch/riscv/pt.c | 97 +++++++++++++++++++++++++++++----------------
 1 file changed, 63 insertions(+), 34 deletions(-)

Comments

Jan Beulich Feb. 19, 2025, 11:28 a.m. UTC | #1
On 12.02.2025 17:50, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/pt.c
> +++ b/xen/arch/riscv/pt.c
> @@ -249,12 +249,10 @@ pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>  
>  /* Update an entry at the level @target. */
>  static int pt_update_entry(mfn_t root, vaddr_t virt,
> -                           mfn_t mfn, unsigned int target,
> +                           mfn_t mfn, unsigned int *target,
>                             unsigned int flags)
>  {
>      int rc;
> -    unsigned int level = HYP_PT_ROOT_LEVEL;
> -    pte_t *table;
>      /*
>       * The intermediate page table shouldn't be allocated when MFN isn't
>       * valid and we are not populating page table.
> @@ -265,41 +263,48 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>       * combinations of (mfn, flags).
>      */
>      bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
> -    pte_t pte, *entry;
> -
> -    /* convenience aliases */
> -    DECLARE_OFFSETS(offsets, virt);
> +    pte_t pte, *entry = NULL;

With there also being "table" below, "entry" isn't quite as bad as in the
other patch. Yet I'd still like to ask that you consider renaming.

> -    table = map_table(root);
> -    for ( ; level > target; level-- )
> +    if ( *target == CONFIG_PAGING_LEVELS )
> +        entry = _pt_walk(virt, target);

Imo it's quite important for the comment ahead of the function to be updated
to mention this special case.

> +    else
>      {
> -        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
> -        if ( rc == XEN_TABLE_MAP_NOMEM )
> +        pte_t *table;
> +        unsigned int level = HYP_PT_ROOT_LEVEL;
> +        /* convenience aliases */

Nit: Style.

> @@ -331,7 +336,8 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>      rc = 0;
>  
>   out:
> -    unmap_table(table);
> +    if ( entry )
> +        unmap_table(entry);

Would it perhaps be worth for unmap_table() to gracefully handle being passed
NULL, to avoid such conditionals (there may be more in the future)?

Jan
Oleksii Kurochko Feb. 19, 2025, 2:46 p.m. UTC | #2
On 2/19/25 12:28 PM, Jan Beulich wrote:
> On 12.02.2025 17:50, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/pt.c
>> +++ b/xen/arch/riscv/pt.c
>> @@ -249,12 +249,10 @@ pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>>   
>>   /* Update an entry at the level @target. */
>>   static int pt_update_entry(mfn_t root, vaddr_t virt,
>> -                           mfn_t mfn, unsigned int target,
>> +                           mfn_t mfn, unsigned int *target,
>>                              unsigned int flags)
>>   {
>>       int rc;
>> -    unsigned int level = HYP_PT_ROOT_LEVEL;
>> -    pte_t *table;
>>       /*
>>        * The intermediate page table shouldn't be allocated when MFN isn't
>>        * valid and we are not populating page table.
>> @@ -265,41 +263,48 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>>        * combinations of (mfn, flags).
>>       */
>>       bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
>> -    pte_t pte, *entry;
>> -
>> -    /* convenience aliases */
>> -    DECLARE_OFFSETS(offsets, virt);
>> +    pte_t pte, *entry = NULL;
> With there also being "table" below, "entry" isn't quite as bad as in the
> other patch. Yet I'd still like to ask that you consider renaming.
>
>> -    table = map_table(root);
>> -    for ( ; level > target; level-- )
>> +    if ( *target == CONFIG_PAGING_LEVELS )
>> +        entry = _pt_walk(virt, target);
> Imo it's quite important for the comment ahead of the function to be updated
> to mention this special case.
>
>> +    else
>>       {
>> -        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
>> -        if ( rc == XEN_TABLE_MAP_NOMEM )
>> +        pte_t *table;
>> +        unsigned int level = HYP_PT_ROOT_LEVEL;
>> +        /* convenience aliases */
> Nit: Style.

 From the 'Comments' section of CODING_STYLE, I see that the comment should start
with capital letter. Do you mean that?

>
>> @@ -331,7 +336,8 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>>       rc = 0;
>>   
>>    out:
>> -    unmap_table(table);
>> +    if ( entry )
>> +        unmap_table(entry);
> Would it perhaps be worth for unmap_table() to gracefully handle being passed
> NULL, to avoid such conditionals (there may be more in the future)?

Agree, it would be more safe to move this check inside unmap_table(). I will update
that.

Thanks.

~ Oleksii
Jan Beulich Feb. 19, 2025, 3:05 p.m. UTC | #3
On 19.02.2025 15:46, Oleksii Kurochko wrote:
> On 2/19/25 12:28 PM, Jan Beulich wrote:
>> On 12.02.2025 17:50, Oleksii Kurochko wrote:
>>> +    else
>>>       {
>>> -        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
>>> -        if ( rc == XEN_TABLE_MAP_NOMEM )
>>> +        pte_t *table;
>>> +        unsigned int level = HYP_PT_ROOT_LEVEL;
>>> +        /* convenience aliases */
>> Nit: Style.
> 
>  From the 'Comments' section of CODING_STYLE, I see that the comment should start
> with capital letter. Do you mean that?

In the (earlier) reply to "xen/riscv: identify specific ISA supported by cpu"
I said precisely that. I didn't think I'd need to repeat it here. So yes.

Jan
Oleksii Kurochko Feb. 19, 2025, 3:09 p.m. UTC | #4
On 2/19/25 4:05 PM, Jan Beulich wrote:
> On 19.02.2025 15:46, Oleksii Kurochko wrote:
>> On 2/19/25 12:28 PM, Jan Beulich wrote:
>>> On 12.02.2025 17:50, Oleksii Kurochko wrote:
>>>> +    else
>>>>        {
>>>> -        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
>>>> -        if ( rc == XEN_TABLE_MAP_NOMEM )
>>>> +        pte_t *table;
>>>> +        unsigned int level = HYP_PT_ROOT_LEVEL;
>>>> +        /* convenience aliases */
>>> Nit: Style.
>>   From the 'Comments' section of CODING_STYLE, I see that the comment should start
>> with capital letter. Do you mean that?
> In the (earlier) reply to "xen/riscv: identify specific ISA supported by cpu"
> I said precisely that. I didn't think I'd need to repeat it here. So yes.

Of course, it was enough. The problem was that I started to read and answer to this patch
series first and went to another (where you wrote that) one after.

Anyway thank you for clarifying.

~ Oleksii
diff mbox series

Patch

diff --git a/xen/arch/riscv/pt.c b/xen/arch/riscv/pt.c
index 260a3a9699..ed0587d58b 100644
--- a/xen/arch/riscv/pt.c
+++ b/xen/arch/riscv/pt.c
@@ -249,12 +249,10 @@  pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
 
 /* Update an entry at the level @target. */
 static int pt_update_entry(mfn_t root, vaddr_t virt,
-                           mfn_t mfn, unsigned int target,
+                           mfn_t mfn, unsigned int *target,
                            unsigned int flags)
 {
     int rc;
-    unsigned int level = HYP_PT_ROOT_LEVEL;
-    pte_t *table;
     /*
      * The intermediate page table shouldn't be allocated when MFN isn't
      * valid and we are not populating page table.
@@ -265,41 +263,48 @@  static int pt_update_entry(mfn_t root, vaddr_t virt,
      * combinations of (mfn, flags).
     */
     bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
-    pte_t pte, *entry;
-
-    /* convenience aliases */
-    DECLARE_OFFSETS(offsets, virt);
+    pte_t pte, *entry = NULL;
 
-    table = map_table(root);
-    for ( ; level > target; level-- )
+    if ( *target == CONFIG_PAGING_LEVELS )
+        entry = _pt_walk(virt, target);
+    else
     {
-        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
-        if ( rc == XEN_TABLE_MAP_NOMEM )
+        pte_t *table;
+        unsigned int level = HYP_PT_ROOT_LEVEL;
+        /* convenience aliases */
+        DECLARE_OFFSETS(offsets, virt);
+
+        table = map_table(root);
+        for ( ; level > *target; level-- )
         {
-            rc = -ENOMEM;
-            goto out;
+            rc = pt_next_level(alloc_tbl, &table, offsets[level]);
+            if ( rc == XEN_TABLE_MAP_NOMEM )
+            {
+                rc = -ENOMEM;
+                goto out;
+            }
+
+            if ( rc == XEN_TABLE_MAP_NONE )
+            {
+                rc = 0;
+                goto out;
+            }
+
+            if ( rc != XEN_TABLE_NORMAL )
+                break;
         }
 
-        if ( rc == XEN_TABLE_MAP_NONE )
+        if ( level != *target )
         {
-            rc = 0;
+            dprintk(XENLOG_ERR,
+                    "%s: Shattering superpage is not supported\n", __func__);
+            rc = -EOPNOTSUPP;
             goto out;
         }
 
-        if ( rc != XEN_TABLE_NORMAL )
-            break;
-    }
-
-    if ( level != target )
-    {
-        dprintk(XENLOG_ERR,
-                "%s: Shattering superpage is not supported\n", __func__);
-        rc = -EOPNOTSUPP;
-        goto out;
+        entry = table + offsets[level];
     }
 
-    entry = table + offsets[level];
-
     rc = -EINVAL;
     if ( !pt_check_entry(*entry, mfn, flags) )
         goto out;
@@ -331,7 +336,8 @@  static int pt_update_entry(mfn_t root, vaddr_t virt,
     rc = 0;
 
  out:
-    unmap_table(table);
+    if ( entry )
+        unmap_table(entry);
 
     return rc;
 }
@@ -424,17 +430,40 @@  static int pt_update(vaddr_t virt, mfn_t mfn,
 
     while ( left )
     {
-        unsigned int order, level;
-
-        level = pt_mapping_level(vfn, mfn, left, flags);
-        order = XEN_PT_LEVEL_ORDER(level);
+        unsigned int order, level = CONFIG_PAGING_LEVELS;
 
-        ASSERT(left >= BIT(order, UL));
+        /*
+         * In the case when modifying or destroying a mapping, it is necessary
+         * to search until a leaf node is found, instead of searching for
+         * a page table entry based on the precalculated `level` and `order`
+         * (look at pt_update()).
+         * This is because when `mfn` == INVALID_MFN, the `mask`(in
+         * pt_mapping_level()) will take into account only `vfn`, which could
+         * accidentally return an incorrect level, leading to the discovery of
+         * an incorrect page table entry.
+         *
+         * For example, if `vfn` is page table level 1 aligned, but it was
+         * mapped as page table level 0, then pt_mapping_level() will return
+         * `level` = 1, since only `vfn` (which is page table level 1 aligned)
+         * is taken into account when `mfn` == INVALID_MFN
+         * (look at pt_mapping_level()).
+         *
+         * To force searching until a leaf node is found is necessary to have
+         * `level` == CONFIG_PAGING_LEVELS which is a default value for
+         * `level`.
+         *
+         * For other cases (when a mapping is not being modified or destroyed),
+         * pt_mapping_level() should be used.
+         */
+        if ( !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE) )
+            level = pt_mapping_level(vfn, mfn, left, flags);
 
-        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level, flags);
+        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, &level, flags);
         if ( rc )
             break;
 
+        order = XEN_PT_LEVEL_ORDER(level);
+
         vfn += 1UL << order;
         if ( !mfn_eq(mfn, INVALID_MFN) )
             mfn = mfn_add(mfn, 1UL << order);