diff mbox series

[v1,3/5] xen/riscv: introduce setup_mm()

Message ID 300a4a5911766d42ec01c3fcaee664d72b484296.1729068334.git.oleksii.kurochko@gmail.com (mailing list archive)
State New
Headers show
Series Setup memory management | expand

Commit Message

Oleksii Kurochko Oct. 16, 2024, 9:15 a.m. UTC
Introduce the implementation of setup_mm(), which includes:
1. Adding all free regions to the boot allocator, as memory is needed
   to allocate page tables used for frame table mapping.
2. Calculating RAM size and the RAM end address.
3. Setting up direct map mappings from the PFN of physical address 0
   to the PFN of RAM end.
4. Setting up frame table mappings from physical address 0 to ram_end.
5. Setting up total_pages and max_page.
6. Establishing `directmap_virt_end`, which is not currently used but is
   intended for future use in virt_to_page() to ensure that virtual
   addresses do not exceed directmap_virt_end.

In the case of RISC-V 64, which has a large virtual address space
(the minimum supported MMU mode is Sv39, providing TB of VA space),
the direct map and frame table are mapped starting from physical address
0. This simplifies the calculations and thereby improves performance for
page_to_mfn(), mfn_to_page(), and maddr_to_virt(), as there is no
need to account for {directmap, frametable}_base_pdx.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
 xen/arch/riscv/include/asm/mm.h    |  2 +
 xen/arch/riscv/include/asm/setup.h |  2 +
 xen/arch/riscv/mm.c                | 93 ++++++++++++++++++++++++++++++
 xen/arch/riscv/setup.c             |  3 +
 4 files changed, 100 insertions(+)

Comments

Jan Beulich Oct. 17, 2024, 3:15 p.m. UTC | #1
On 16.10.2024 11:15, Oleksii Kurochko wrote:
> @@ -423,3 +424,95 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
>  
>      return fdt_virt;
>  }
> +
> +#ifndef CONFIG_RISCV_32
> +/* Map the region in the directmap area. */
> +static void __init setup_directmap_mappings(unsigned long nr_mfns)
> +{
> +    if ( nr_mfns > PFN_DOWN(DIRECTMAP_SIZE) )
> +        panic("The directmap cannot cover the physical region %#"PRIpaddr" - %#"PRIpaddr"\n",
> +              0UL, nr_mfns << PAGE_SHIFT);

Here and elsewhere can you please use mathematical range expressions
(apparently "[%"PRIpaddr", %#"PRIpaddr")" here), to avoid ambiguity?

> +    if ( map_pages_to_xen((vaddr_t)mfn_to_virt(0),
> +                          _mfn(0), nr_mfns,
> +                          PAGE_HYPERVISOR_RW) )
> +        panic("Unable to setup the directmap mappings.\n");
> +}
> +
> +/* Map a frame table to cover physical addresses ps through pe */
> +static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
> +{
> +    unsigned long nr_mfns = mfn_x(mfn_add(maddr_to_mfn(pe), -1)) -

This looks to be accounting for a partial page at the end.

> +                            mfn_x(maddr_to_mfn(ps)) + 1;

Whereas this doesn't do the same at the start. The sole present caller
passes 0, so that's going to be fine for the time being. Yet it's a
latent pitfall. I'd recommend to either drop the function parameter, or
to deal with it correctly right away.

> +    unsigned long frametable_size = nr_mfns * sizeof(struct page_info);
> +    mfn_t base_mfn;
> +
> +    if ( frametable_size > FRAMETABLE_SIZE )
> +        panic("The frametable cannot cover the physical region %#"PRIpaddr" - %#"PRIpaddr"\n",
> +              ps, pe);
> +
> +    frametable_size = ROUNDUP(frametable_size, MB(2));
> +    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 2<<(20-12));

Nit: Missing blanks in binary expressions. Also please don't open-code
PFN_DOWN() (and again below).

> +    if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
> +                          frametable_size >> PAGE_SHIFT,
> +                          PAGE_HYPERVISOR_RW) )
> +        panic("Unable to setup the frametable mappings.\n");

Nit (as indicated before): No full stop please at the end of log or
panic messages.

> +    memset(&frame_table[0], 0, nr_mfns * sizeof(struct page_info));
> +    memset(&frame_table[nr_mfns], -1,
> +           frametable_size - (nr_mfns * sizeof(struct page_info)));
> +}
> +
> +vaddr_t directmap_virt_end __read_mostly;

__ro_after_init? And moved ahead of the identifier, just like ...

> +/*
> + * Setup memory management
> + *
> + * RISC-V 64 has a large virtual address space (the minimum supported
> + * MMU mode is Sv39, which provides TBs of VA space).
> + * In the case of RISC-V 64, the directmap and frametable are mapped
> + * starting from physical address 0 to simplify the page_to_mfn(),
> + * mfn_to_page(), and maddr_to_virt() calculations, as there is no need
> + * to account for {directmap, frametable}_base_pdx in this setup.
> + */
> +void __init setup_mm(void)

... __init is placed e.g. here.

It's also unclear why the variable needs to be non-static.

> +{
> +    const struct membanks *banks = bootinfo_get_mem();
> +    paddr_t ram_end = 0;
> +    paddr_t ram_size = 0;
> +    unsigned int i;
> +
> +    /*
> +     * We need some memory to allocate the page-tables used for the directmap
> +     * mappings. But some regions may contain memory already allocated
> +     * for other uses (e.g. modules, reserved-memory...).
> +     *
> +     * For simplicity, add all the free regions in the boot allocator.
> +     */
> +    populate_boot_allocator();
> +
> +    total_pages = 0;
> +
> +    for ( i = 0; i < banks->nr_banks; i++ )
> +    {
> +        const struct membank *bank = &banks->bank[i];
> +        paddr_t bank_end = bank->start + bank->size;
> +
> +        ram_size = ram_size + bank->size;
> +        ram_end = max(ram_end, bank_end);
> +    }
> +
> +    setup_directmap_mappings(PFN_DOWN(ram_end));

While you may want to use non-offset-ed mappings, I can't see how you can
validly map just a single huge chunk, no matter whether there are holes
in there. Such holes could hold MMIO regions, which I'm sure would require
more careful mapping (to avoid cacheable accesses, or even speculative
ones).

> +    total_pages = PFN_DOWN(ram_size);

Imo the rounding down to page granularity needs to be done when ram_size
is accumulated, such that partial pages simply won't be counted. Unless
of course there's a guarantee that banks can never have partial pages at
their start/end.

Jan
Oleksii Kurochko Oct. 18, 2024, 2:27 p.m. UTC | #2
On Thu, 2024-10-17 at 17:15 +0200, Jan Beulich wrote:
> On 16.10.2024 11:15, Oleksii Kurochko wrote:
> 
> > +    if ( map_pages_to_xen((vaddr_t)mfn_to_virt(0),
> > +                          _mfn(0), nr_mfns,
> > +                          PAGE_HYPERVISOR_RW) )
> > +        panic("Unable to setup the directmap mappings.\n");
> > +}
> > +
> > +/* Map a frame table to cover physical addresses ps through pe */
> > +static void __init setup_frametable_mappings(paddr_t ps, paddr_t
> > pe)
> > +{
> > +    unsigned long nr_mfns = mfn_x(mfn_add(maddr_to_mfn(pe), -1)) -
> 
> This looks to be accounting for a partial page at the end.
> 
> > +                            mfn_x(maddr_to_mfn(ps)) + 1;
> 
> Whereas this doesn't do the same at the start. The sole present
> caller
> passes 0, so that's going to be fine for the time being. Yet it's a
> latent pitfall. I'd recommend to either drop the function parameter,
> or
> to deal with it correctly right away.
Then it will be better to do in the next way to be sure that everything
is properly aligned:
   unsigned long nr_mfns = PFN_DOWN(ALIGN_UP(pe) - ALIGN_DOWN(ps));

> > +    memset(&frame_table[0], 0, nr_mfns * sizeof(struct
> > page_info));
> > +    memset(&frame_table[nr_mfns], -1,
> > +           frametable_size - (nr_mfns * sizeof(struct
> > page_info)));
> > +}
> > +
> > +vaddr_t directmap_virt_end __read_mostly;
> 
> __ro_after_init? And moved ahead of the identifier, just like ...
> 
> > +/*
> > + * Setup memory management
> > + *
> > + * RISC-V 64 has a large virtual address space (the minimum
> > supported
> > + * MMU mode is Sv39, which provides TBs of VA space).
> > + * In the case of RISC-V 64, the directmap and frametable are
> > mapped
> > + * starting from physical address 0 to simplify the page_to_mfn(),
> > + * mfn_to_page(), and maddr_to_virt() calculations, as there is no
> > need
> > + * to account for {directmap, frametable}_base_pdx in this setup.
> > + */
> > +void __init setup_mm(void)
> 
> ... __init is placed e.g. here.
> 
> It's also unclear why the variable needs to be non-static.
As it could be use then for some ASSERT(), for example, in
virt_to_page() as Arm or x86 do.

> 
> > +{
> > +    const struct membanks *banks = bootinfo_get_mem();
> > +    paddr_t ram_end = 0;
> > +    paddr_t ram_size = 0;
> > +    unsigned int i;
> > +
> > +    /*
> > +     * We need some memory to allocate the page-tables used for
> > the directmap
> > +     * mappings. But some regions may contain memory already
> > allocated
> > +     * for other uses (e.g. modules, reserved-memory...).
> > +     *
> > +     * For simplicity, add all the free regions in the boot
> > allocator.
> > +     */
> > +    populate_boot_allocator();
> > +
> > +    total_pages = 0;
> > +
> > +    for ( i = 0; i < banks->nr_banks; i++ )
> > +    {
> > +        const struct membank *bank = &banks->bank[i];
> > +        paddr_t bank_end = bank->start + bank->size;
> > +
> > +        ram_size = ram_size + bank->size;
> > +        ram_end = max(ram_end, bank_end);
> > +    }
> > +
> > +    setup_directmap_mappings(PFN_DOWN(ram_end));
> 
> While you may want to use non-offset-ed mappings, I can't see how you
> can
> validly map just a single huge chunk, no matter whether there are
> holes
> in there. Such holes could hold MMIO regions, which I'm sure would
> require
> more careful mapping (to avoid cacheable accesses, or even
> speculative
> ones).
My intention was to avoid subraction of directmap_start ( = ram_start )
in maddr_to_virt() to have less  operations during maddr to virt
calculation:
   static inline void *maddr_to_virt(paddr_t ma)
   {
       /* Offset in the direct map, accounting for pdx compression */
       unsigned long va_offset = maddr_to_directmapoff(ma);
   
       ASSERT(va_offset < DIRECTMAP_SIZE);
   
       return (void *)(DIRECTMAP_VIRT_START /* - directmap_start */ +
   va_offset);
   }
But it seems I don't have any chance to avoid that because of mentioned
by you reasons... Except probably to have a config which will hard code
RAM_START for each platform ( what on other hand will make Xen less
flexible in some point ).
Does it make sense to have a config instead of identifying ram_start in
runtime?

So I have to rework this part of code to look as:
    for ( i = 0; i < banks->nr_banks; i++ )
    {
        const struct membank *bank = &banks->bank[i];
        paddr_t bank_end = bank->start + bank->size;

        ram_size = ram_size + bank->size;
        ram_start = min(ram_start, bank->start);
        ram_end = max(ram_end, bank_end);

        setup_directmap_mappings(PFN_DOWN(bank->start),
                                 PFN_DOWN(bank->size));
    }


> 
> > +    total_pages = PFN_DOWN(ram_size);
> 
> Imo the rounding down to page granularity needs to be done when
> ram_size
> is accumulated, such that partial pages simply won't be counted.
> Unless
> of course there's a guarantee that banks can never have partial pages
> at
> their start/end.
Hmm, good point. I thought that start/end is always properly aligned
but I can't find in device tree spec that it is guaranteed for memory
node, so it will really better rounding down after ram_size is
accumulated.

~ Oleksii
diff mbox series

Patch

diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 0396e66f47..7b472220e5 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -12,6 +12,8 @@ 
 
 #include <asm/page-bits.h>
 
+extern vaddr_t directmap_virt_end;
+
 #define pfn_to_paddr(pfn) ((paddr_t)(pfn) << PAGE_SHIFT)
 #define paddr_to_pfn(pa)  ((unsigned long)((pa) >> PAGE_SHIFT))
 
diff --git a/xen/arch/riscv/include/asm/setup.h b/xen/arch/riscv/include/asm/setup.h
index c0214a9bf2..844a2f0ef1 100644
--- a/xen/arch/riscv/include/asm/setup.h
+++ b/xen/arch/riscv/include/asm/setup.h
@@ -5,6 +5,8 @@ 
 
 #define max_init_domid (0)
 
+void setup_mm(void);
+
 #endif /* ASM__RISCV__SETUP_H */
 
 /*
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index 27026d803b..53b7483f75 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -8,6 +8,7 @@ 
 #include <xen/libfdt/libfdt.h>
 #include <xen/macros.h>
 #include <xen/mm.h>
+#include <xen/pdx.h>
 #include <xen/pfn.h>
 #include <xen/sections.h>
 #include <xen/sizes.h>
@@ -423,3 +424,95 @@  void * __init early_fdt_map(paddr_t fdt_paddr)
 
     return fdt_virt;
 }
+
+#ifndef CONFIG_RISCV_32
+/* Map the region in the directmap area. */
+static void __init setup_directmap_mappings(unsigned long nr_mfns)
+{
+    if ( nr_mfns > PFN_DOWN(DIRECTMAP_SIZE) )
+        panic("The directmap cannot cover the physical region %#"PRIpaddr" - %#"PRIpaddr"\n",
+              0UL, nr_mfns << PAGE_SHIFT);
+
+    if ( map_pages_to_xen((vaddr_t)mfn_to_virt(0),
+                          _mfn(0), nr_mfns,
+                          PAGE_HYPERVISOR_RW) )
+        panic("Unable to setup the directmap mappings.\n");
+}
+
+/* Map a frame table to cover physical addresses ps through pe */
+static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
+{
+    unsigned long nr_mfns = mfn_x(mfn_add(maddr_to_mfn(pe), -1)) -
+                            mfn_x(maddr_to_mfn(ps)) + 1;
+    unsigned long frametable_size = nr_mfns * sizeof(struct page_info);
+    mfn_t base_mfn;
+
+    if ( frametable_size > FRAMETABLE_SIZE )
+        panic("The frametable cannot cover the physical region %#"PRIpaddr" - %#"PRIpaddr"\n",
+              ps, pe);
+
+    frametable_size = ROUNDUP(frametable_size, MB(2));
+    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 2<<(20-12));
+
+    if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
+                          frametable_size >> PAGE_SHIFT,
+                          PAGE_HYPERVISOR_RW) )
+        panic("Unable to setup the frametable mappings.\n");
+
+    memset(&frame_table[0], 0, nr_mfns * sizeof(struct page_info));
+    memset(&frame_table[nr_mfns], -1,
+           frametable_size - (nr_mfns * sizeof(struct page_info)));
+}
+
+vaddr_t directmap_virt_end __read_mostly;
+
+/*
+ * Setup memory management
+ *
+ * RISC-V 64 has a large virtual address space (the minimum supported
+ * MMU mode is Sv39, which provides TBs of VA space).
+ * In the case of RISC-V 64, the directmap and frametable are mapped
+ * starting from physical address 0 to simplify the page_to_mfn(),
+ * mfn_to_page(), and maddr_to_virt() calculations, as there is no need
+ * to account for {directmap, frametable}_base_pdx in this setup.
+ */
+void __init setup_mm(void)
+{
+    const struct membanks *banks = bootinfo_get_mem();
+    paddr_t ram_end = 0;
+    paddr_t ram_size = 0;
+    unsigned int i;
+
+    /*
+     * We need some memory to allocate the page-tables used for the directmap
+     * mappings. But some regions may contain memory already allocated
+     * for other uses (e.g. modules, reserved-memory...).
+     *
+     * For simplicity, add all the free regions in the boot allocator.
+     */
+    populate_boot_allocator();
+
+    total_pages = 0;
+
+    for ( i = 0; i < banks->nr_banks; i++ )
+    {
+        const struct membank *bank = &banks->bank[i];
+        paddr_t bank_end = bank->start + bank->size;
+
+        ram_size = ram_size + bank->size;
+        ram_end = max(ram_end, bank_end);
+    }
+
+    setup_directmap_mappings(PFN_DOWN(ram_end));
+
+    total_pages = PFN_DOWN(ram_size);
+
+    directmap_virt_end = DIRECTMAP_VIRT_START + ram_end;
+
+    setup_frametable_mappings(0, ram_end);
+    max_page = PFN_DOWN(ram_end);
+}
+
+#else /* CONFIG_RISCV_32 */
+#error setup_mm(), setup_{directmap,frametable}_mapping() should be implemented for RV_32
+#endif
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index e29bd75d7c..2887a18c0c 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -12,6 +12,7 @@ 
 
 #include <asm/early_printk.h>
 #include <asm/sbi.h>
+#include <asm/setup.h>
 #include <asm/smp.h>
 #include <asm/traps.h>
 
@@ -59,6 +60,8 @@  void __init noreturn start_xen(unsigned long bootcpu_id,
     printk("Command line: %s\n", cmdline);
     cmdline_parse(cmdline);
 
+    setup_mm();
+
     printk("All set up\n");
 
     machine_halt();