diff mbox series

[v3,4/9] xen: introduce an arch helper for default dma zone status

Message ID 20220511014639.197825-5-wei.chen@arm.com (mailing list archive)
State Superseded
Headers show
Series Device tree based NUMA support for Arm - Part#1 | expand

Commit Message

Wei Chen May 11, 2022, 1:46 a.m. UTC
In current code, when Xen is running in a multiple nodes
NUMA system, it will set dma_bitsize in end_boot_allocator
to reserve some low address memory as DMA zone.

There are some x86 implications in the implementation.
Because on x86, memory starts from 0. On a multiple-nodes
NUMA system, if a single node contains the majority or all
of the DMA memory, x86 prefers to give out memory from
non-local allocations rather than exhausting the DMA memory
ranges. Hence x86 uses dma_bitsize to set aside some largely
arbitrary amount of memory for DMA zone. The allocations
from DMA zone would happen only after exhausting all other
nodes' memory.

But the implications are not shared across all architectures.
For example, Arm cannot guarantee the availability of memory
below a certain boundary for DMA limited-capability devices
either. But currently, Arm doesn't need a reserved DMA zone
in Xen. Because there is no DMA device in Xen. And for guests,
Xen Arm only allows Dom0 to have DMA operations without IOMMU.
Xen will try to allocate memory under 4GB or memory range that
is limited by dma_bitsize for Dom0 in boot time. For DomU, even
Xen can passthrough devices to DomU without IOMMU, but Xen Arm
doesn't guarantee their DMA operations. So, Xen Arm doesn't
need a reserved DMA zone to provide DMA memory for guests.

In this patch, we introduce an arch_want_default_dmazone helper
for different architectures to determine whether they need to
set dma_bitsize for DMA zone reservation or not.

At the same time, when x86 Xen is built with CONFIG_PV=n could
probably leverage this new helper to actually not trigger DMA
zone reservation.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
---
v2 -> v3:
1. Add Tb.
2. Rename arch_have_default_dmazone to arch_want_default_dmazone.
v1 -> v2:
1. Extend the description of Arm's workaround for reserve DMA
   allocations to avoid the same discussion every time.
2. Use a macro to define arch_have_default_dmazone, because
   it's little hard to make x86 version to static inline.
   Use a macro will also avoid add __init for this function.
3. Change arch_have_default_dmazone return value from
   unsigned int to bool.
4. Un-addressed comment: make arch_have_default_dmazone
   of x86 to be static inline. Because, if we move
   arch_have_default_dmazone to x86/asm/numa.h, it depends
   on nodemask.h to provide num_online_nodes. But nodemask.h
   needs numa.h to provide MAX_NUMANODES. This will cause a
   loop dependency. And this function can only be used in
   end_boot_allocator, in Xen initialization. So I think,
   compared to the changes introduced by inline, it doesn't
   mean much.
---
 xen/arch/arm/include/asm/numa.h | 1 +
 xen/arch/x86/include/asm/numa.h | 1 +
 xen/common/page_alloc.c         | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

Comments

Stefano Stabellini May 13, 2022, 12:32 a.m. UTC | #1
On Wed, 11 May 2022, Wei Chen wrote:
> In current code, when Xen is running in a multiple nodes
> NUMA system, it will set dma_bitsize in end_boot_allocator
> to reserve some low address memory as DMA zone.
> 
> There are some x86 implications in the implementation.
> Because on x86, memory starts from 0. On a multiple-nodes
> NUMA system, if a single node contains the majority or all
> of the DMA memory, x86 prefers to give out memory from
> non-local allocations rather than exhausting the DMA memory
> ranges. Hence x86 uses dma_bitsize to set aside some largely
> arbitrary amount of memory for DMA zone. The allocations
> from DMA zone would happen only after exhausting all other
> nodes' memory.
> 
> But the implications are not shared across all architectures.
> For example, Arm cannot guarantee the availability of memory
> below a certain boundary for DMA limited-capability devices
> either. But currently, Arm doesn't need a reserved DMA zone
> in Xen. Because there is no DMA device in Xen. And for guests,
> Xen Arm only allows Dom0 to have DMA operations without IOMMU.
> Xen will try to allocate memory under 4GB or memory range that
> is limited by dma_bitsize for Dom0 in boot time. For DomU, even
> Xen can passthrough devices to DomU without IOMMU, but Xen Arm
> doesn't guarantee their DMA operations. So, Xen Arm doesn't
> need a reserved DMA zone to provide DMA memory for guests.
> 
> In this patch, we introduce an arch_want_default_dmazone helper
> for different architectures to determine whether they need to
> set dma_bitsize for DMA zone reservation or not.
> 
> At the same time, when x86 Xen is built with CONFIG_PV=n could
> probably leverage this new helper to actually not trigger DMA
> zone reservation.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Tested-by: Jiamei Xie <jiamei.xie@arm.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
> v2 -> v3:
> 1. Add Tb.
> 2. Rename arch_have_default_dmazone to arch_want_default_dmazone.
> v1 -> v2:
> 1. Extend the description of Arm's workaround for reserve DMA
>    allocations to avoid the same discussion every time.
> 2. Use a macro to define arch_have_default_dmazone, because
>    it's little hard to make x86 version to static inline.
>    Use a macro will also avoid add __init for this function.
> 3. Change arch_have_default_dmazone return value from
>    unsigned int to bool.
> 4. Un-addressed comment: make arch_have_default_dmazone
>    of x86 to be static inline. Because, if we move
>    arch_have_default_dmazone to x86/asm/numa.h, it depends
>    on nodemask.h to provide num_online_nodes. But nodemask.h
>    needs numa.h to provide MAX_NUMANODES. This will cause a
>    loop dependency. And this function can only be used in
>    end_boot_allocator, in Xen initialization. So I think,
>    compared to the changes introduced by inline, it doesn't
>    mean much.
> ---
>  xen/arch/arm/include/asm/numa.h | 1 +
>  xen/arch/x86/include/asm/numa.h | 1 +
>  xen/common/page_alloc.c         | 2 +-
>  3 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/include/asm/numa.h b/xen/arch/arm/include/asm/numa.h
> index 31a6de4e23..e4c4d89192 100644
> --- a/xen/arch/arm/include/asm/numa.h
> +++ b/xen/arch/arm/include/asm/numa.h
> @@ -24,6 +24,7 @@ extern mfn_t first_valid_mfn;
>  #define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
>  #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
>  #define __node_distance(a, b) (20)
> +#define arch_want_default_dmazone() (false)
>  
>  #endif /* __ARCH_ARM_NUMA_H */
>  /*
> diff --git a/xen/arch/x86/include/asm/numa.h b/xen/arch/x86/include/asm/numa.h
> index bada2c0bb9..5d8385f2e1 100644
> --- a/xen/arch/x86/include/asm/numa.h
> +++ b/xen/arch/x86/include/asm/numa.h
> @@ -74,6 +74,7 @@ static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
>  #define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
>  #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
>  				 NODE_DATA(nid)->node_spanned_pages)
> +#define arch_want_default_dmazone() (num_online_nodes() > 1)
>  
>  extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
>  
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 319029140f..b3bddc719b 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1889,7 +1889,7 @@ void __init end_boot_allocator(void)
>      }
>      nr_bootmem_regions = 0;
>  
> -    if ( !dma_bitsize && (num_online_nodes() > 1) )
> +    if ( !dma_bitsize && arch_want_default_dmazone() )
>          dma_bitsize = arch_get_dma_bitsize();
>  
>      printk("Domain heap initialised");
> -- 
> 2.25.1
>
Jan Beulich May 18, 2022, 1:09 p.m. UTC | #2
On 11.05.2022 03:46, Wei Chen wrote:
> In current code, when Xen is running in a multiple nodes
> NUMA system, it will set dma_bitsize in end_boot_allocator
> to reserve some low address memory as DMA zone.
> 
> There are some x86 implications in the implementation.
> Because on x86, memory starts from 0. On a multiple-nodes
> NUMA system, if a single node contains the majority or all
> of the DMA memory, x86 prefers to give out memory from
> non-local allocations rather than exhausting the DMA memory
> ranges. Hence x86 uses dma_bitsize to set aside some largely
> arbitrary amount of memory for DMA zone. The allocations
> from DMA zone would happen only after exhausting all other
> nodes' memory.
> 
> But the implications are not shared across all architectures.
> For example, Arm cannot guarantee the availability of memory
> below a certain boundary for DMA limited-capability devices
> either. But currently, Arm doesn't need a reserved DMA zone
> in Xen. Because there is no DMA device in Xen. And for guests,
> Xen Arm only allows Dom0 to have DMA operations without IOMMU.
> Xen will try to allocate memory under 4GB or memory range that
> is limited by dma_bitsize for Dom0 in boot time. For DomU, even
> Xen can passthrough devices to DomU without IOMMU, but Xen Arm
> doesn't guarantee their DMA operations. So, Xen Arm doesn't
> need a reserved DMA zone to provide DMA memory for guests.
> 
> In this patch, we introduce an arch_want_default_dmazone helper
> for different architectures to determine whether they need to
> set dma_bitsize for DMA zone reservation or not.
> 
> At the same time, when x86 Xen is built with CONFIG_PV=n could
> probably leverage this new helper to actually not trigger DMA
> zone reservation.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Tested-by: Jiamei Xie <jiamei.xie@arm.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
diff mbox series

Patch

diff --git a/xen/arch/arm/include/asm/numa.h b/xen/arch/arm/include/asm/numa.h
index 31a6de4e23..e4c4d89192 100644
--- a/xen/arch/arm/include/asm/numa.h
+++ b/xen/arch/arm/include/asm/numa.h
@@ -24,6 +24,7 @@  extern mfn_t first_valid_mfn;
 #define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
 #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
 #define __node_distance(a, b) (20)
+#define arch_want_default_dmazone() (false)
 
 #endif /* __ARCH_ARM_NUMA_H */
 /*
diff --git a/xen/arch/x86/include/asm/numa.h b/xen/arch/x86/include/asm/numa.h
index bada2c0bb9..5d8385f2e1 100644
--- a/xen/arch/x86/include/asm/numa.h
+++ b/xen/arch/x86/include/asm/numa.h
@@ -74,6 +74,7 @@  static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
 #define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
 #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
 				 NODE_DATA(nid)->node_spanned_pages)
+#define arch_want_default_dmazone() (num_online_nodes() > 1)
 
 extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 319029140f..b3bddc719b 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1889,7 +1889,7 @@  void __init end_boot_allocator(void)
     }
     nr_bootmem_regions = 0;
 
-    if ( !dma_bitsize && (num_online_nodes() > 1) )
+    if ( !dma_bitsize && arch_want_default_dmazone() )
         dma_bitsize = arch_get_dma_bitsize();
 
     printk("Domain heap initialised");