diff mbox series

[for,4.20?,v3,1/3] xen/riscv: implement software page table walking

Message ID e78679b00df63bde40eb3a4d97e3ab9d1faf9382.1738933678.git.oleksii.kurochko@gmail.com (mailing list archive)
State Superseded
Headers show
Series Fixes for vmap_to_mfn() and pt_mapping_level | expand

Commit Message

Oleksii Kurochko Feb. 7, 2025, 1:13 p.m. UTC
RISC-V doesn't have hardware feature to ask MMU to translate
virtual address to physical address ( like Arm has, for example ),
so software page table walking is implemented.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
 - Remove circular dependency.
 - Move declaration of pt_walk() to asm/page.h.
 - Revert other not connected to pt_walk() changes.
 - Update the commit message.
 - Drop unnessary anymore local variables of pt_walk().
 - Refactor pt_walk() to use for() loop instead of while() loop
   as it was suggested by Jan B.
 - Introduce _pt_walk() which returns pte_t * and update prototype
   of pt_walk() to return pte_t by value.
---
Changes in v2:
 - Update the code of pt_walk() to return pte_t instead of paddr_t.
 - Fix typos and drop blankets inside parentheses in the comment.
 - Update the `ret` handling; there is no need for `mfn` calculation
   anymore as pt_walk() returns or pte_t of a leaf node or non-present
   pte_t.
 - Drop the comment before unmap_table().
 - Drop local variable `pa` as pt_walk() is going to return pte_t
   instead of paddr_t.
 - Add the comment above pt_walk() to explain what it does and returns.
 - Add additional explanation about possible return values of pt_next_level()
   used inside while loop in pt_walk().
 - Change `pa` to `table` in the comment before while loop in pt_walk()
   as actually this loop finds a page table where paga table entry for `va`
   is located.
 - After including <asm/page.h> in <asm/mm.h>, the following compilation
   error occurs:
    ./arch/riscv/include/asm/cmpxchg.h:56:9: error: implicit declaration of
    function 'GENMASK'
   To resolve this, <xen/bitops.h> needs to be included at the top of
   <asm/cmpxchg.h>.
 - To avoid an issue with the implicit declaration of map_domain_page() and
   unmap_domain_page() after including <asm/page.h> in <asm/mm.h>, the
   implementation of flush_page_to_ram() has been moved to mm.c. (look for
   more detailed explanation in the commit message) As a result drop
   inclusion of <xen/domain.h> in <asm/page.h>.
 - Update the commit message.
---
 xen/arch/riscv/include/asm/page.h |  2 ++
 xen/arch/riscv/pt.c               | 51 +++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

Comments

Jan Beulich Feb. 10, 2025, 4:32 p.m. UTC | #1
On 07.02.2025 14:13, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/pt.c
> +++ b/xen/arch/riscv/pt.c
> @@ -185,6 +185,57 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>      return XEN_TABLE_NORMAL;
>  }
>  
> +/*
> + * _pt_walk() performs software page table walking and returns the pte_t of
> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
> + * for further analysis.
> + * Additionally, _pt_walk() returns the level of the found pte.

That's optional, which I think wants expressing here.

> + */
> +static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
> +{
> +    const mfn_t root = get_root_page();
> +    unsigned int level;
> +    pte_t *table;
> +
> +    DECLARE_OFFSETS(offsets, va);
> +
> +    table = map_table(root);

This mapping operation doesn't look to have a counterpart. Aiui ...

> +    /*
> +     * Find `table` of an entry which corresponds to `va` by iterating for each
> +     * page level and checking if the entry points to a next page table or
> +     * to a page.
> +     *
> +     * Two cases are possible:
> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
> +     *   pt_next_level() is called for page table level 0, it results in the
> +     *   entry being a pointer to a leaf node, thereby returning
> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
> +     *   mapped.
> +     */
> +    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
> +    {
> +        int ret = pt_next_level(false, &table, offsets[level]);

... the mapping may be replaced here, but a new mapping will then still
be held by this function and ...

> +        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
> +            break;
> +
> +        ASSERT(level);
> +    }
> +
> +    if ( pte_level )
> +        *pte_level = level;
> +
> +    return table + offsets[level];
> +}

... the final one then be transferred to the caller.

> +pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
> +{
> +    return *_pt_walk(va, pte_level);
> +}

Hence aiui there needs to be an unmap operation here.

Jan
Oleksii Kurochko Feb. 11, 2025, 9:15 a.m. UTC | #2
On 2/10/25 5:32 PM, Jan Beulich wrote:
> On 07.02.2025 14:13, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/pt.c
>> +++ b/xen/arch/riscv/pt.c
>> @@ -185,6 +185,57 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>>       return XEN_TABLE_NORMAL;
>>   }
>>   
>> +/*
>> + * _pt_walk() performs software page table walking and returns the pte_t of
>> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
>> + * for further analysis.
>> + * Additionally, _pt_walk() returns the level of the found pte.
> That's optional, which I think wants expressing here.
>
>> + */
>> +static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
>> +{
>> +    const mfn_t root = get_root_page();
>> +    unsigned int level;
>> +    pte_t *table;
>> +
>> +    DECLARE_OFFSETS(offsets, va);
>> +
>> +    table = map_table(root);
> This mapping operation doesn't look to have a counterpart. Aiui ...
>
>> +    /*
>> +     * Find `table` of an entry which corresponds to `va` by iterating for each
>> +     * page level and checking if the entry points to a next page table or
>> +     * to a page.
>> +     *
>> +     * Two cases are possible:
>> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
>> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
>> +     *   pt_next_level() is called for page table level 0, it results in the
>> +     *   entry being a pointer to a leaf node, thereby returning
>> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
>> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
>> +     *   mapped.
>> +     */
>> +    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
>> +    {
>> +        int ret = pt_next_level(false, &table, offsets[level]);
> ... the mapping may be replaced here, but a new mapping will then still
> be held by this function and ...
>
>> +        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
>> +            break;
>> +
>> +        ASSERT(level);
>> +    }
>> +
>> +    if ( pte_level )
>> +        *pte_level = level;
>> +
>> +    return table + offsets[level];
>> +}
> ... the final one then be transferred to the caller.
>
>> +pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>> +{
>> +    return *_pt_walk(va, pte_level);
>> +}
> Hence aiui there needs to be an unmap operation here.

Agree, it should be an unmap here. I will update that in the next patch version.

Thanks.

~ Oleksii
Oleksii Kurochko Feb. 12, 2025, 11:13 a.m. UTC | #3
On 2/10/25 5:32 PM, Jan Beulich wrote:
> On 07.02.2025 14:13, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/pt.c
>> +++ b/xen/arch/riscv/pt.c
>> @@ -185,6 +185,57 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>>       return XEN_TABLE_NORMAL;
>>   }
>>   
>> +/*
>> + * _pt_walk() performs software page table walking and returns the pte_t of
>> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
>> + * for further analysis.
>> + * Additionally, _pt_walk() returns the level of the found pte.
> That's optional, which I think wants expressing here.
>
>> + */
>> +static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
>> +{
>> +    const mfn_t root = get_root_page();
>> +    unsigned int level;
>> +    pte_t *table;
>> +
>> +    DECLARE_OFFSETS(offsets, va);
>> +
>> +    table = map_table(root);
> This mapping operation doesn't look to have a counterpart. Aiui ...
>
>> +    /*
>> +     * Find `table` of an entry which corresponds to `va` by iterating for each
>> +     * page level and checking if the entry points to a next page table or
>> +     * to a page.
>> +     *
>> +     * Two cases are possible:
>> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
>> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
>> +     *   pt_next_level() is called for page table level 0, it results in the
>> +     *   entry being a pointer to a leaf node, thereby returning
>> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
>> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
>> +     *   mapped.
>> +     */
>> +    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
>> +    {
>> +        int ret = pt_next_level(false, &table, offsets[level]);
> ... the mapping may be replaced here, but a new mapping will then still
> be held by this function and ...
>
>> +        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
>> +            break;
>> +
>> +        ASSERT(level);
>> +    }
>> +
>> +    if ( pte_level )
>> +        *pte_level = level;
>> +
>> +    return table + offsets[level];
>> +}
> ... the final one then be transferred to the caller.
>
>> +pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>> +{
>> +    return *_pt_walk(va, pte_level);
>> +}
> Hence aiui there needs to be an unmap operation here.

As _pt_walk() returns page table entry, it is needed to transform entry to table.

Do we have any function in Xen for that?

Or the best I can do is:
   pte_t *table = *_pt_walk(va, pte_level) - TABLE_OFFSET(pt_linear_offset(*pte_level, va)
(of course, it is needed to check if pte_level isn't null and do some extra actions if it is NULL)

As an option, as all page tables are PAGE_SIZE aligned, we could use PAGE_OFFSET():
  pte_t *entry = _pt_walk(va, pte_level);
  pte_t *table = PAGE_OFFSET(entry);

~ Oleksii
Jan Beulich Feb. 12, 2025, 11:27 a.m. UTC | #4
On 12.02.2025 12:13, Oleksii Kurochko wrote:
> 
> On 2/10/25 5:32 PM, Jan Beulich wrote:
>> On 07.02.2025 14:13, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/pt.c
>>> +++ b/xen/arch/riscv/pt.c
>>> @@ -185,6 +185,57 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>>>       return XEN_TABLE_NORMAL;
>>>   }
>>>   
>>> +/*
>>> + * _pt_walk() performs software page table walking and returns the pte_t of
>>> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
>>> + * for further analysis.
>>> + * Additionally, _pt_walk() returns the level of the found pte.
>> That's optional, which I think wants expressing here.
>>
>>> + */
>>> +static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
>>> +{
>>> +    const mfn_t root = get_root_page();
>>> +    unsigned int level;
>>> +    pte_t *table;
>>> +
>>> +    DECLARE_OFFSETS(offsets, va);
>>> +
>>> +    table = map_table(root);
>> This mapping operation doesn't look to have a counterpart. Aiui ...
>>
>>> +    /*
>>> +     * Find `table` of an entry which corresponds to `va` by iterating for each
>>> +     * page level and checking if the entry points to a next page table or
>>> +     * to a page.
>>> +     *
>>> +     * Two cases are possible:
>>> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
>>> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
>>> +     *   pt_next_level() is called for page table level 0, it results in the
>>> +     *   entry being a pointer to a leaf node, thereby returning
>>> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
>>> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
>>> +     *   mapped.
>>> +     */
>>> +    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
>>> +    {
>>> +        int ret = pt_next_level(false, &table, offsets[level]);
>> ... the mapping may be replaced here, but a new mapping will then still
>> be held by this function and ...
>>
>>> +        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
>>> +            break;
>>> +
>>> +        ASSERT(level);
>>> +    }
>>> +
>>> +    if ( pte_level )
>>> +        *pte_level = level;
>>> +
>>> +    return table + offsets[level];
>>> +}
>> ... the final one then be transferred to the caller.
>>
>>> +pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>>> +{
>>> +    return *_pt_walk(va, pte_level);
>>> +}
>> Hence aiui there needs to be an unmap operation here.
> 
> As _pt_walk() returns page table entry, it is needed to transform entry to table.
> 
> Do we have any function in Xen for that?

I don't understand. Both unmap_domain_page() and pmap_unmap() ignore
the low bits of the passed in address.

Jan

> Or the best I can do is:
>    pte_t *table = *_pt_walk(va, pte_level) - TABLE_OFFSET(pt_linear_offset(*pte_level, va)
> (of course, it is needed to check if pte_level isn't null and do some extra actions if it is NULL)
> 
> As an option, as all page tables are PAGE_SIZE aligned, we could use PAGE_OFFSET():
>   pte_t *entry = _pt_walk(va, pte_level);
>   pte_t *table = PAGE_OFFSET(entry);
> 
> ~ Oleksii
> 
>
Oleksii Kurochko Feb. 12, 2025, 12:15 p.m. UTC | #5
On 2/12/25 12:27 PM, Jan Beulich wrote:
> On 12.02.2025 12:13, Oleksii Kurochko wrote:
>> On 2/10/25 5:32 PM, Jan Beulich wrote:
>>> On 07.02.2025 14:13, Oleksii Kurochko wrote:
>>>> --- a/xen/arch/riscv/pt.c
>>>> +++ b/xen/arch/riscv/pt.c
>>>> @@ -185,6 +185,57 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>>>>        return XEN_TABLE_NORMAL;
>>>>    }
>>>>    
>>>> +/*
>>>> + * _pt_walk() performs software page table walking and returns the pte_t of
>>>> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
>>>> + * for further analysis.
>>>> + * Additionally, _pt_walk() returns the level of the found pte.
>>> That's optional, which I think wants expressing here.
>>>
>>>> + */
>>>> +static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
>>>> +{
>>>> +    const mfn_t root = get_root_page();
>>>> +    unsigned int level;
>>>> +    pte_t *table;
>>>> +
>>>> +    DECLARE_OFFSETS(offsets, va);
>>>> +
>>>> +    table = map_table(root);
>>> This mapping operation doesn't look to have a counterpart. Aiui ...
>>>
>>>> +    /*
>>>> +     * Find `table` of an entry which corresponds to `va` by iterating for each
>>>> +     * page level and checking if the entry points to a next page table or
>>>> +     * to a page.
>>>> +     *
>>>> +     * Two cases are possible:
>>>> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
>>>> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
>>>> +     *   pt_next_level() is called for page table level 0, it results in the
>>>> +     *   entry being a pointer to a leaf node, thereby returning
>>>> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
>>>> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
>>>> +     *   mapped.
>>>> +     */
>>>> +    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
>>>> +    {
>>>> +        int ret = pt_next_level(false, &table, offsets[level]);
>>> ... the mapping may be replaced here, but a new mapping will then still
>>> be held by this function and ...
>>>
>>>> +        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
>>>> +            break;
>>>> +
>>>> +        ASSERT(level);
>>>> +    }
>>>> +
>>>> +    if ( pte_level )
>>>> +        *pte_level = level;
>>>> +
>>>> +    return table + offsets[level];
>>>> +}
>>> ... the final one then be transferred to the caller.
>>>
>>>> +pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>>>> +{
>>>> +    return *_pt_walk(va, pte_level);
>>>> +}
>>> Hence aiui there needs to be an unmap operation here.
>> As _pt_walk() returns page table entry, it is needed to transform entry to table.
>>
>> Do we have any function in Xen for that?
> I don't understand. Both unmap_domain_page() and pmap_unmap() ignore
> the low bits of the passed in address.

Missed that. Then it is safe to use unmap_table() with page table entry.

Thanks.

~ Oleksii

>> Or the best I can do is:
>>     pte_t *table = *_pt_walk(va, pte_level) - TABLE_OFFSET(pt_linear_offset(*pte_level, va)
>> (of course, it is needed to check if pte_level isn't null and do some extra actions if it is NULL)
>>
>> As an option, as all page tables are PAGE_SIZE aligned, we could use PAGE_OFFSET():
>>    pte_t *entry = _pt_walk(va, pte_level);
>>    pte_t *table = PAGE_OFFSET(entry);
>>
>> ~ Oleksii
>>
>>
diff mbox series

Patch

diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
index 7a6174a109..0439a1a9ee 100644
--- a/xen/arch/riscv/include/asm/page.h
+++ b/xen/arch/riscv/include/asm/page.h
@@ -208,6 +208,8 @@  static inline pte_t pte_from_mfn(mfn_t mfn, unsigned int flags)
     return (pte_t){ .pte = pte };
 }
 
+pte_t pt_walk(vaddr_t va, unsigned int *pte_level);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* ASM__RISCV__PAGE_H */
diff --git a/xen/arch/riscv/pt.c b/xen/arch/riscv/pt.c
index a703e0f1bd..66cb976b55 100644
--- a/xen/arch/riscv/pt.c
+++ b/xen/arch/riscv/pt.c
@@ -185,6 +185,57 @@  static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
     return XEN_TABLE_NORMAL;
 }
 
+/*
+ * _pt_walk() performs software page table walking and returns the pte_t of
+ * a leaf node or the leaf-most not-present pte_t if no leaf node is found
+ * for further analysis.
+ * Additionally, _pt_walk() returns the level of the found pte.
+ */
+static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
+{
+    const mfn_t root = get_root_page();
+    unsigned int level;
+    pte_t *table;
+
+    DECLARE_OFFSETS(offsets, va);
+
+    table = map_table(root);
+
+    /*
+     * Find `table` of an entry which corresponds to `va` by iterating for each
+     * page level and checking if the entry points to a next page table or
+     * to a page.
+     *
+     * Two cases are possible:
+     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
+     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
+     *   pt_next_level() is called for page table level 0, it results in the
+     *   entry being a pointer to a leaf node, thereby returning
+     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
+     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
+     *   mapped.
+     */
+    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
+    {
+        int ret = pt_next_level(false, &table, offsets[level]);
+
+        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
+            break;
+
+        ASSERT(level);
+    }
+
+    if ( pte_level )
+        *pte_level = level;
+
+    return table + offsets[level];
+}
+
+pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
+{
+    return *_pt_walk(va, pte_level);
+}
+
 /* Update an entry at the level @target. */
 static int pt_update_entry(mfn_t root, vaddr_t virt,
                            mfn_t mfn, unsigned int target,