
[04/37] xen: introduce an arch helper for default dma zone status

Message ID 20210923120236.3692135-5-wei.chen@arm.com (mailing list archive)
State New, archived
Series Add device tree based NUMA support to Arm

Commit Message

Wei Chen Sept. 23, 2021, 12:02 p.m. UTC
In the current code, when Xen is running on a multi-node NUMA
system, it will set dma_bitsize in end_boot_allocator to reserve
some low-address memory for DMA.

There are some x86 implications in the current implementation,
because on x86 memory starts from 0. On a multi-node NUMA system,
if a single node contains the majority or all of the DMA memory,
x86 prefers to give out memory from non-local allocations rather
than exhausting the DMA memory ranges. Hence x86 uses dma_bitsize
to set aside some largely arbitrary amount of memory for DMA
memory ranges. Allocations from these ranges would happen only
after exhausting all other nodes' memory.

But these implications are not shared across all architectures.
For example, Arm doesn't have them. So in this patch, we introduce
an arch_have_default_dmazone helper for an arch to determine
whether it needs to set dma_bitsize to reserve memory for DMA
allocations.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/x86/numa.c        | 5 +++++
 xen/common/page_alloc.c    | 2 +-
 xen/include/asm-arm/numa.h | 5 +++++
 xen/include/asm-x86/numa.h | 1 +
 4 files changed, 12 insertions(+), 1 deletion(-)

Comments

Stefano Stabellini Sept. 23, 2021, 11:55 p.m. UTC | #1
On Thu, 23 Sep 2021, Wei Chen wrote:
> In current code, when Xen is running in a multiple nodes NUMA
> system, it will set dma_bitsize in end_boot_allocator to reserve
> some low address memory for DMA.
> 
> There are some x86 implications in current implementation. Becuase
                                    ^ the                    ^Because

> on x86, memory starts from 0. On a multiple nodes NUMA system, if
> a single node contains the majority or all of the DMA memory. x86
                                                              ^,

> prefer to give out memory from non-local allocations rather than
> exhausting the DMA memory ranges. Hence x86 use dma_bitsize to set
> aside some largely arbitrary amount memory for DMA memory ranges.
                                     ^ of memory

> The allocations from these memory ranges would happen only after
> exhausting all other nodes' memory.
> 
> But the implications are not shared across all architectures. For
> example, Arm doesn't have these implications. So in this patch, we
> introduce an arch_have_default_dmazone helper for arch to determine
> that it need to set dma_bitsize for reserve DMA allocations or not.
          ^ needs

> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/x86/numa.c        | 5 +++++
>  xen/common/page_alloc.c    | 2 +-
>  xen/include/asm-arm/numa.h | 5 +++++
>  xen/include/asm-x86/numa.h | 1 +
>  4 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> index ce79ee44ce..1fabbe8281 100644
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -371,6 +371,11 @@ unsigned int __init arch_get_dma_bitsize(void)
>                   + PAGE_SHIFT, 32);
>  }
>  
> +unsigned int arch_have_default_dmazone(void)

Can this function return bool?
Also, can it be a static inline?


> +{
> +    return ( num_online_nodes() > 1 ) ? 1 : 0;
> +}
> +
>  static void dump_numa(unsigned char key)
>  {
>      s_time_t now = NOW();
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 5801358b4b..80916205e5 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1889,7 +1889,7 @@ void __init end_boot_allocator(void)
>      }
>      nr_bootmem_regions = 0;
>  
> -    if ( !dma_bitsize && (num_online_nodes() > 1) )
> +    if ( !dma_bitsize && arch_have_default_dmazone() )
>          dma_bitsize = arch_get_dma_bitsize();
>  
>      printk("Domain heap initialised");
> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> index 31a6de4e23..9d5739542d 100644
> --- a/xen/include/asm-arm/numa.h
> +++ b/xen/include/asm-arm/numa.h
> @@ -25,6 +25,11 @@ extern mfn_t first_valid_mfn;
>  #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
>  #define __node_distance(a, b) (20)
>  
> +static inline unsigned int arch_have_default_dmazone(void)
> +{
> +    return 0;
> +}
> +
>  #endif /* __ARCH_ARM_NUMA_H */
>  /*
>   * Local variables:
> diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> index 3cf26c2def..8060cbf3f4 100644
> --- a/xen/include/asm-x86/numa.h
> +++ b/xen/include/asm-x86/numa.h
> @@ -78,5 +78,6 @@ extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
>  void srat_parse_regions(u64 addr);
>  extern u8 __node_distance(nodeid_t a, nodeid_t b);
>  unsigned int arch_get_dma_bitsize(void);
> +unsigned int arch_have_default_dmazone(void);
>  
>  #endif
> -- 
> 2.25.1
>
Wei Chen Sept. 24, 2021, 1:50 a.m. UTC | #2
Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 2021年9月24日 7:56
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [PATCH 04/37] xen: introduce an arch helper for default dma
> zone status
> 
> On Thu, 23 Sep 2021, Wei Chen wrote:
> > In current code, when Xen is running in a multiple nodes NUMA
> > system, it will set dma_bitsize in end_boot_allocator to reserve
> > some low address memory for DMA.
> >
> > There are some x86 implications in current implementation. Becuase
>                                     ^ the                    ^Because
> 
> > on x86, memory starts from 0. On a multiple nodes NUMA system, if
> > a single node contains the majority or all of the DMA memory. x86
>                                                               ^,
> 
> > prefer to give out memory from non-local allocations rather than
> > exhausting the DMA memory ranges. Hence x86 use dma_bitsize to set
> > aside some largely arbitrary amount memory for DMA memory ranges.
>                                      ^ of memory
> 
> > The allocations from these memory ranges would happen only after
> > exhausting all other nodes' memory.
> >
> > But the implications are not shared across all architectures. For
> > example, Arm doesn't have these implications. So in this patch, we
> > introduce an arch_have_default_dmazone helper for arch to determine
> > that it need to set dma_bitsize for reserve DMA allocations or not.
>           ^ needs
> 

I will fix the above typos in the next version.

> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >  xen/arch/x86/numa.c        | 5 +++++
> >  xen/common/page_alloc.c    | 2 +-
> >  xen/include/asm-arm/numa.h | 5 +++++
> >  xen/include/asm-x86/numa.h | 1 +
> >  4 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> > index ce79ee44ce..1fabbe8281 100644
> > --- a/xen/arch/x86/numa.c
> > +++ b/xen/arch/x86/numa.c
> > @@ -371,6 +371,11 @@ unsigned int __init arch_get_dma_bitsize(void)
> >                   + PAGE_SHIFT, 32);
> >  }
> >
> > +unsigned int arch_have_default_dmazone(void)
> 
> Can this function return bool?
> Also, can it be a static inline?
> 

Yes, bool would be better. I will place a static inline in asm/numa.h.
Because arm will have another static inline implementation.
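
Just to illustrate the direction, a rough sketch (not the code I will
actually post; the header placement and the bool conversion are my
assumptions here):

    /* Sketch: xen/include/asm-x86/numa.h */
    static inline bool arch_have_default_dmazone(void)
    {
        /* Only reserve a low-memory DMA range on multi-node systems. */
        return num_online_nodes() > 1;
    }

    /* Sketch: xen/include/asm-arm/numa.h */
    static inline bool arch_have_default_dmazone(void)
    {
        /* Arm does not reserve a default DMA zone in the boot allocator. */
        return false;
    }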

> 
> > +{
> > +    return ( num_online_nodes() > 1 ) ? 1 : 0;
> > +}
> > +
> >  static void dump_numa(unsigned char key)
> >  {
> >      s_time_t now = NOW();
> > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> > index 5801358b4b..80916205e5 100644
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -1889,7 +1889,7 @@ void __init end_boot_allocator(void)
> >      }
> >      nr_bootmem_regions = 0;
> >
> > -    if ( !dma_bitsize && (num_online_nodes() > 1) )
> > +    if ( !dma_bitsize && arch_have_default_dmazone() )
> >          dma_bitsize = arch_get_dma_bitsize();
> >
> >      printk("Domain heap initialised");
> > diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> > index 31a6de4e23..9d5739542d 100644
> > --- a/xen/include/asm-arm/numa.h
> > +++ b/xen/include/asm-arm/numa.h
> > @@ -25,6 +25,11 @@ extern mfn_t first_valid_mfn;
> >  #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
> >  #define __node_distance(a, b) (20)
> >
> > +static inline unsigned int arch_have_default_dmazone(void)
> > +{
> > +    return 0;
> > +}
> > +
> >  #endif /* __ARCH_ARM_NUMA_H */
> >  /*
> >   * Local variables:
> > diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> > index 3cf26c2def..8060cbf3f4 100644
> > --- a/xen/include/asm-x86/numa.h
> > +++ b/xen/include/asm-x86/numa.h
> > @@ -78,5 +78,6 @@ extern int valid_numa_range(u64 start, u64 end,
> nodeid_t node);
> >  void srat_parse_regions(u64 addr);
> >  extern u8 __node_distance(nodeid_t a, nodeid_t b);
> >  unsigned int arch_get_dma_bitsize(void);
> > +unsigned int arch_have_default_dmazone(void);
> >
> >  #endif
> > --
> > 2.25.1
> >
Jan Beulich Jan. 17, 2022, 4:10 p.m. UTC | #3
I realize this series has been pending for a long time, but I don't
recall any indication that it would have been dropped. Hence as a
first try, a few comments on this relatively simple change. I'm
sorry it has taken so long to get to it.

On 23.09.2021 14:02, Wei Chen wrote:
> In current code, when Xen is running in a multiple nodes NUMA
> system, it will set dma_bitsize in end_boot_allocator to reserve
> some low address memory for DMA.
> 
> There are some x86 implications in current implementation. Becuase
> on x86, memory starts from 0. On a multiple nodes NUMA system, if
> a single node contains the majority or all of the DMA memory. x86
> prefer to give out memory from non-local allocations rather than
> exhausting the DMA memory ranges. Hence x86 use dma_bitsize to set
> aside some largely arbitrary amount memory for DMA memory ranges.
> The allocations from these memory ranges would happen only after
> exhausting all other nodes' memory.
> 
> But the implications are not shared across all architectures. For
> example, Arm doesn't have these implications. So in this patch, we
> introduce an arch_have_default_dmazone helper for arch to determine
> that it need to set dma_bitsize for reserve DMA allocations or not.

How would Arm guarantee availability of memory below a certain
boundary for limited-capability devices? Or is there no need
because there's an assumption that I/O for such devices would
always pass through an IOMMU, lifting address size restrictions?
(I guess in a !PV build on x86 we could also get rid of such a
reservation.)

> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -371,6 +371,11 @@ unsigned int __init arch_get_dma_bitsize(void)
>                   + PAGE_SHIFT, 32);
>  }
>  
> +unsigned int arch_have_default_dmazone(void)
> +{
> +    return ( num_online_nodes() > 1 ) ? 1 : 0;
> +}

According to the expression and ...

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1889,7 +1889,7 @@ void __init end_boot_allocator(void)
>      }
>      nr_bootmem_regions = 0;
>  
> -    if ( !dma_bitsize && (num_online_nodes() > 1) )
> +    if ( !dma_bitsize && arch_have_default_dmazone() )
>          dma_bitsize = arch_get_dma_bitsize();

... the use site, you mean the function to return boolean. Please
indicate so by making it have a return type of "bool". Independent
of that you don't need a conditional expression above, nor
(malformed) use of parentheses. I further wonder whether ...

> --- a/xen/include/asm-arm/numa.h
> +++ b/xen/include/asm-arm/numa.h
> @@ -25,6 +25,11 @@ extern mfn_t first_valid_mfn;
>  #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
>  #define __node_distance(a, b) (20)
>  
> +static inline unsigned int arch_have_default_dmazone(void)
> +{
> +    return 0;
> +}

... like this one, x86'es couldn't be inline as well. If indeed
it can't be, making it a macro may still be better (and avoid a
further comment regarding the lack of __init).
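
Just as an illustration of the macro route (a sketch only, keeping the
name from the patch):

    /* Sketch only: a macro side-steps the inline/__init question. */
    #define arch_have_default_dmazone() (num_online_nodes() > 1)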

Jan
Wei Chen Jan. 18, 2022, 7:51 a.m. UTC | #4
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年1月18日 0:11
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; xen-
> devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org
> Subject: Re: [PATCH 04/37] xen: introduce an arch helper for default dma
> zone status
> 
> I realize this series has been pending for a long time, but I don't
> recall any indication that it would have been dropped. Hence as a
> first try, a few comments on this relatively simple change. I'm
> sorry it to have taken so long to get to it.
> 

Thanks for reviewing this series and bringing it back up. We are
still working on it and will send a new version soon.

> On 23.09.2021 14:02, Wei Chen wrote:
> > In current code, when Xen is running in a multiple nodes NUMA
> > system, it will set dma_bitsize in end_boot_allocator to reserve
> > some low address memory for DMA.
> >
> > There are some x86 implications in current implementation. Becuase
> > on x86, memory starts from 0. On a multiple nodes NUMA system, if
> > a single node contains the majority or all of the DMA memory. x86
> > prefer to give out memory from non-local allocations rather than
> > exhausting the DMA memory ranges. Hence x86 use dma_bitsize to set
> > aside some largely arbitrary amount memory for DMA memory ranges.
> > The allocations from these memory ranges would happen only after
> > exhausting all other nodes' memory.
> >
> > But the implications are not shared across all architectures. For
> > example, Arm doesn't have these implications. So in this patch, we
> > introduce an arch_have_default_dmazone helper for arch to determine
> > that it need to set dma_bitsize for reserve DMA allocations or not.
> 
> How would Arm guarantee availability of memory below a certain
> boundary for limited-capability devices? Or is there no need
> because there's an assumption that I/O for such devices would
> always pass through an IOMMU, lifting address size restrictions?
> (I guess in a !PV build on x86 we could also get rid of such a
> reservation.)

On Arm, we can still have some devices with limited DMA capability,
and we don't force all such devices to use an IOMMU. These devices
affect dma_bitsize. For example, the RPi platform sets its dma_bitsize
to 30. But on a multi-node NUMA system, Arm doesn't have a default
DMA zone; having multiple nodes is not a constraint on dma_bitsize.
Some previous discussion can be found here [1].
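
For context, dma_bitsize is an address width: the reservation covers
physical addresses below 1 << dma_bitsize. A tiny illustration (the
paddr_t variable here is only for the example):

    /* Illustration only: how dma_bitsize maps to an address limit. */
    paddr_t dma_limit = (paddr_t)1 << dma_bitsize;
    /* dma_bitsize = 30  ->  dma_limit = 1 GiB (the RPi case above) */
    /* dma_bitsize = 32  ->  dma_limit = 4 GiB                      */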

> 
> > --- a/xen/arch/x86/numa.c
> > +++ b/xen/arch/x86/numa.c
> > @@ -371,6 +371,11 @@ unsigned int __init arch_get_dma_bitsize(void)
> >                   + PAGE_SHIFT, 32);
> >  }
> >
> > +unsigned int arch_have_default_dmazone(void)
> > +{
> > +    return ( num_online_nodes() > 1 ) ? 1 : 0;
> > +}
> 
> According to the expression and ...
> 
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -1889,7 +1889,7 @@ void __init end_boot_allocator(void)
> >      }
> >      nr_bootmem_regions = 0;
> >
> > -    if ( !dma_bitsize && (num_online_nodes() > 1) )
> > +    if ( !dma_bitsize && arch_have_default_dmazone() )
> >          dma_bitsize = arch_get_dma_bitsize();
> 
> ... the use site, you mean the function to return boolean. Please
> indicate so by making it have a return type of "bool". Independent
> of that you don't need a conditional expression above, nor
> (malformed) use of parentheses. I further wonder whether ...
> 

I will fix them in the next version. But I am not very clear about
this comment: "of that you don't need a conditional expression above".
Does the "above" refer to this line:
"return ( num_online_nodes() > 1 ) ? 1 : 0;"?

> > --- a/xen/include/asm-arm/numa.h
> > +++ b/xen/include/asm-arm/numa.h
> > @@ -25,6 +25,11 @@ extern mfn_t first_valid_mfn;
> >  #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
> >  #define __node_distance(a, b) (20)
> >
> > +static inline unsigned int arch_have_default_dmazone(void)
> > +{
> > +    return 0;
> > +}
> 
> ... like this one, x86'es couldn't be inline as well. If indeed
> it can't be, making it a macro may still be better (and avoid a
> further comment regarding the lack of __init).

OK, that would be better; I will do it in the next version.

> 
> Jan

[1] https://lists.xenproject.org/archives/html/xen-devel/2021-08/msg00772.html
Jan Beulich Jan. 18, 2022, 8:16 a.m. UTC | #5
On 18.01.2022 08:51, Wei Chen wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2022年1月18日 0:11
>> On 23.09.2021 14:02, Wei Chen wrote:
>>> In current code, when Xen is running in a multiple nodes NUMA
>>> system, it will set dma_bitsize in end_boot_allocator to reserve
>>> some low address memory for DMA.
>>>
>>> There are some x86 implications in current implementation. Becuase
>>> on x86, memory starts from 0. On a multiple nodes NUMA system, if
>>> a single node contains the majority or all of the DMA memory. x86
>>> prefer to give out memory from non-local allocations rather than
>>> exhausting the DMA memory ranges. Hence x86 use dma_bitsize to set
>>> aside some largely arbitrary amount memory for DMA memory ranges.
>>> The allocations from these memory ranges would happen only after
>>> exhausting all other nodes' memory.
>>>
>>> But the implications are not shared across all architectures. For
>>> example, Arm doesn't have these implications. So in this patch, we
>>> introduce an arch_have_default_dmazone helper for arch to determine
>>> that it need to set dma_bitsize for reserve DMA allocations or not.
>>
>> How would Arm guarantee availability of memory below a certain
>> boundary for limited-capability devices? Or is there no need
>> because there's an assumption that I/O for such devices would
>> always pass through an IOMMU, lifting address size restrictions?
>> (I guess in a !PV build on x86 we could also get rid of such a
>> reservation.)
> 
> On Arm, we still can have some devices with limited DMA capability.
> And we also don't force all such devices to use IOMMU. This devices
> will affect the dma_bitsize. Like RPi platform, it sets its dma_bitsize
> to 30. But in multiple NUMA nodes system, Arm doesn't have a default
> DMA zone. Multiple nodes is not a constraint on dma_bitsize. And some
> previous discussions are placed here [1].

I'm afraid that doesn't give me more clues. For example, in the mail
being replied to there I find "That means, only first 4GB memory can
be used for DMA." Yet that's not an implication from setting
dma_bitsize. DMA is fine to occur to any address. The special address
range is being held back in case in particular Dom0 is in need of such
a range to perform I/O to _some_ devices.

>>> --- a/xen/arch/x86/numa.c
>>> +++ b/xen/arch/x86/numa.c
>>> @@ -371,6 +371,11 @@ unsigned int __init arch_get_dma_bitsize(void)
>>>                   + PAGE_SHIFT, 32);
>>>  }
>>>
>>> +unsigned int arch_have_default_dmazone(void)
>>> +{
>>> +    return ( num_online_nodes() > 1 ) ? 1 : 0;
>>> +}
>>
>> According to the expression and ...
>>
>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -1889,7 +1889,7 @@ void __init end_boot_allocator(void)
>>>      }
>>>      nr_bootmem_regions = 0;
>>>
>>> -    if ( !dma_bitsize && (num_online_nodes() > 1) )
>>> +    if ( !dma_bitsize && arch_have_default_dmazone() )
>>>          dma_bitsize = arch_get_dma_bitsize();
>>
>> ... the use site, you mean the function to return boolean. Please
>> indicate so by making it have a return type of "bool". Independent
>> of that you don't need a conditional expression above, nor
>> (malformed) use of parentheses. I further wonder whether ...
>>
> 
> I will fix them in next version. But I am not very clear about
> this comment "of that you don't need a conditional expression above",
> The "above" indicates this line:
> "return ( num_online_nodes() > 1 ) ? 1 : 0;"?

Yes. Even without the use of bool such an expression is a more
complicated form of

    return num_online_nodes() > 1;

where we'd prefer to use the simpler variant for being easier to
read / follow.

Jan

> [1] https://lists.xenproject.org/archives/html/xen-devel/2021-08/msg00772.html
> 
>
Wei Chen Jan. 18, 2022, 9:20 a.m. UTC | #6
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年1月18日 16:16
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; xen-
> devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org
> Subject: Re: [PATCH 04/37] xen: introduce an arch helper for default dma
> zone status
> 
> On 18.01.2022 08:51, Wei Chen wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 2022年1月18日 0:11
> >> On 23.09.2021 14:02, Wei Chen wrote:
> >>> In current code, when Xen is running in a multiple nodes NUMA
> >>> system, it will set dma_bitsize in end_boot_allocator to reserve
> >>> some low address memory for DMA.
> >>>
> >>> There are some x86 implications in current implementation. Becuase
> >>> on x86, memory starts from 0. On a multiple nodes NUMA system, if
> >>> a single node contains the majority or all of the DMA memory. x86
> >>> prefer to give out memory from non-local allocations rather than
> >>> exhausting the DMA memory ranges. Hence x86 use dma_bitsize to set
> >>> aside some largely arbitrary amount memory for DMA memory ranges.
> >>> The allocations from these memory ranges would happen only after
> >>> exhausting all other nodes' memory.
> >>>
> >>> But the implications are not shared across all architectures. For
> >>> example, Arm doesn't have these implications. So in this patch, we
> >>> introduce an arch_have_default_dmazone helper for arch to determine
> >>> that it need to set dma_bitsize for reserve DMA allocations or not.
> >>
> >> How would Arm guarantee availability of memory below a certain
> >> boundary for limited-capability devices? Or is there no need
> >> because there's an assumption that I/O for such devices would
> >> always pass through an IOMMU, lifting address size restrictions?
> >> (I guess in a !PV build on x86 we could also get rid of such a
> >> reservation.)
> >
> > On Arm, we still can have some devices with limited DMA capability.
> > And we also don't force all such devices to use IOMMU. This devices
> > will affect the dma_bitsize. Like RPi platform, it sets its dma_bitsize
> > to 30. But in multiple NUMA nodes system, Arm doesn't have a default
> > DMA zone. Multiple nodes is not a constraint on dma_bitsize. And some
> > previous discussions are placed here [1].
> 
> I'm afraid that doesn't give me more clues. For example, in the mail
> being replied to there I find "That means, only first 4GB memory can
> be used for DMA." Yet that's not an implication from setting
> dma_bitsize. DMA is fine to occur to any address. The special address
> range is being held back in case in particular Dom0 is in need of such
> a range to perform I/O to _some_ devices.

I am sorry that my last reply hasn't given you more clues. On Arm, only
Dom0 can do DMA without an IOMMU. So when we allocate memory for Dom0,
we try to allocate memory below 4GB, or within the range indicated by
dma_bitsize. I think these operations match the Dom0 special address
range you describe above. As we have already allocated memory for DMA,
I think we don't need a DMA zone in page allocation. I am not sure
whether that answers your earlier question.
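
Roughly, the dom0-build policy described above looks like this (purely
illustrative pseudo-code; allocate_bank_below() and
allocate_remaining_banks() are made-up names, not Xen's actual
functions):

    /* Hypothetical sketch of the Arm dom0 allocation policy above.  */
    paddr_t limit = dma_bitsize ? ((paddr_t)1 << dma_bitsize) : GB(4);

    /* Place at least one dom0 bank below the DMA limit ...          */
    allocate_bank_below(d, limit);
    /* ... further banks can come from anywhere.                     */
    allocate_remaining_banks(d);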

> 
> >>> --- a/xen/arch/x86/numa.c
> >>> +++ b/xen/arch/x86/numa.c
> >>> @@ -371,6 +371,11 @@ unsigned int __init arch_get_dma_bitsize(void)
> >>>                   + PAGE_SHIFT, 32);
> >>>  }
> >>>
> >>> +unsigned int arch_have_default_dmazone(void)
> >>> +{
> >>> +    return ( num_online_nodes() > 1 ) ? 1 : 0;
> >>> +}
> >>
> >> According to the expression and ...
> >>
> >>> --- a/xen/common/page_alloc.c
> >>> +++ b/xen/common/page_alloc.c
> >>> @@ -1889,7 +1889,7 @@ void __init end_boot_allocator(void)
> >>>      }
> >>>      nr_bootmem_regions = 0;
> >>>
> >>> -    if ( !dma_bitsize && (num_online_nodes() > 1) )
> >>> +    if ( !dma_bitsize && arch_have_default_dmazone() )
> >>>          dma_bitsize = arch_get_dma_bitsize();
> >>
> >> ... the use site, you mean the function to return boolean. Please
> >> indicate so by making it have a return type of "bool". Independent
> >> of that you don't need a conditional expression above, nor
> >> (malformed) use of parentheses. I further wonder whether ...
> >>
> >
> > I will fix them in next version. But I am not very clear about
> > this comment "of that you don't need a conditional expression above",
> > The "above" indicates this line:
> > "return ( num_online_nodes() > 1 ) ? 1 : 0;"?
> 
> Yes. Even without the use of bool such an expression is a more
> complicated form of
> 
>     return num_online_nodes() > 1;
> 
> where we'd prefer to use the simpler variant for being easier to
> read / follow.
> 

Thanks for the clarification; I will fix it.

> Jan
> 
> > [1] https://lists.xenproject.org/archives/html/xen-devel/2021-
> 08/msg00772.html
> >
> >
Jan Beulich Jan. 18, 2022, 2:16 p.m. UTC | #7
On 18.01.2022 10:20, Wei Chen wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2022年1月18日 16:16
>>
>> On 18.01.2022 08:51, Wei Chen wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 2022年1月18日 0:11
>>>> On 23.09.2021 14:02, Wei Chen wrote:
>>>>> In current code, when Xen is running in a multiple nodes NUMA
>>>>> system, it will set dma_bitsize in end_boot_allocator to reserve
>>>>> some low address memory for DMA.
>>>>>
>>>>> There are some x86 implications in current implementation. Becuase
>>>>> on x86, memory starts from 0. On a multiple nodes NUMA system, if
>>>>> a single node contains the majority or all of the DMA memory. x86
>>>>> prefer to give out memory from non-local allocations rather than
>>>>> exhausting the DMA memory ranges. Hence x86 use dma_bitsize to set
>>>>> aside some largely arbitrary amount memory for DMA memory ranges.
>>>>> The allocations from these memory ranges would happen only after
>>>>> exhausting all other nodes' memory.
>>>>>
>>>>> But the implications are not shared across all architectures. For
>>>>> example, Arm doesn't have these implications. So in this patch, we
>>>>> introduce an arch_have_default_dmazone helper for arch to determine
>>>>> that it need to set dma_bitsize for reserve DMA allocations or not.
>>>>
>>>> How would Arm guarantee availability of memory below a certain
>>>> boundary for limited-capability devices? Or is there no need
>>>> because there's an assumption that I/O for such devices would
>>>> always pass through an IOMMU, lifting address size restrictions?
>>>> (I guess in a !PV build on x86 we could also get rid of such a
>>>> reservation.)
>>>
>>> On Arm, we still can have some devices with limited DMA capability.
>>> And we also don't force all such devices to use IOMMU. This devices
>>> will affect the dma_bitsize. Like RPi platform, it sets its dma_bitsize
>>> to 30. But in multiple NUMA nodes system, Arm doesn't have a default
>>> DMA zone. Multiple nodes is not a constraint on dma_bitsize. And some
>>> previous discussions are placed here [1].
>>
>> I'm afraid that doesn't give me more clues. For example, in the mail
>> being replied to there I find "That means, only first 4GB memory can
>> be used for DMA." Yet that's not an implication from setting
>> dma_bitsize. DMA is fine to occur to any address. The special address
>> range is being held back in case in particular Dom0 is in need of such
>> a range to perform I/O to _some_ devices.
> 
> I am sorry that my last reply hasn't given you more clues. On Arm, only
> Dom0 can have DMA without IOMMU. So when we allocate memory for Dom0,
> we're trying to allocate memory under 4GB or in the range of dma_bitsize
> indicated. I think these operations meet your above Dom0 special address
> range description. As we have already allocated memory for DMA, so I
> think we don't need a DMA zone in page allocation. I am not sure is that
> answers your earlier question?

I view all of this as flawed, or as a workaround at best. Xen shouldn't
make assumptions on what Dom0 may need. Instead Dom0 should make
arrangements such that it can do I/O to/from all devices of interest.
This may involve arranging for address restricted buffers. And for this
to be possible, Xen would need to have available some suitable memory.
I understand this is complicated by the fact that despite being HVM-like,
due to the lack of an IOMMU in front of certain devices address
restrictions on Dom0 address space alone (i.e. without any Xen
involvement) won't help ...

Jan
Wei Chen Jan. 19, 2022, 2:49 a.m. UTC | #8
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年1月18日 22:16
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; xen-
> devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org
> Subject: Re: [PATCH 04/37] xen: introduce an arch helper for default dma
> zone status
> 
> On 18.01.2022 10:20, Wei Chen wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 2022年1月18日 16:16
> >>
> >> On 18.01.2022 08:51, Wei Chen wrote:
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: 2022年1月18日 0:11
> >>>> On 23.09.2021 14:02, Wei Chen wrote:
> >>>>> In current code, when Xen is running in a multiple nodes NUMA
> >>>>> system, it will set dma_bitsize in end_boot_allocator to reserve
> >>>>> some low address memory for DMA.
> >>>>>
> >>>>> There are some x86 implications in current implementation. Becuase
> >>>>> on x86, memory starts from 0. On a multiple nodes NUMA system, if
> >>>>> a single node contains the majority or all of the DMA memory. x86
> >>>>> prefer to give out memory from non-local allocations rather than
> >>>>> exhausting the DMA memory ranges. Hence x86 use dma_bitsize to set
> >>>>> aside some largely arbitrary amount memory for DMA memory ranges.
> >>>>> The allocations from these memory ranges would happen only after
> >>>>> exhausting all other nodes' memory.
> >>>>>
> >>>>> But the implications are not shared across all architectures. For
> >>>>> example, Arm doesn't have these implications. So in this patch, we
> >>>>> introduce an arch_have_default_dmazone helper for arch to determine
> >>>>> that it need to set dma_bitsize for reserve DMA allocations or not.
> >>>>
> >>>> How would Arm guarantee availability of memory below a certain
> >>>> boundary for limited-capability devices? Or is there no need
> >>>> because there's an assumption that I/O for such devices would
> >>>> always pass through an IOMMU, lifting address size restrictions?
> >>>> (I guess in a !PV build on x86 we could also get rid of such a
> >>>> reservation.)
> >>>
> >>> On Arm, we still can have some devices with limited DMA capability.
> >>> And we also don't force all such devices to use IOMMU. This devices
> >>> will affect the dma_bitsize. Like RPi platform, it sets its
> dma_bitsize
> >>> to 30. But in multiple NUMA nodes system, Arm doesn't have a default
> >>> DMA zone. Multiple nodes is not a constraint on dma_bitsize. And some
> >>> previous discussions are placed here [1].
> >>
> >> I'm afraid that doesn't give me more clues. For example, in the mail
> >> being replied to there I find "That means, only first 4GB memory can
> >> be used for DMA." Yet that's not an implication from setting
> >> dma_bitsize. DMA is fine to occur to any address. The special address
> >> range is being held back in case in particular Dom0 is in need of such
> >> a range to perform I/O to _some_ devices.
> >
> > I am sorry that my last reply hasn't given you more clues. On Arm, only
> > Dom0 can have DMA without IOMMU. So when we allocate memory for Dom0,
> > we're trying to allocate memory under 4GB or in the range of dma_bitsize
> > indicated. I think these operations meet your above Dom0 special address
> > range description. As we have already allocated memory for DMA, so I
> > think we don't need a DMA zone in page allocation. I am not sure is that
> > answers your earlier question?
> 
> I view all of this as flawed, or as a workaround at best. Xen shouldn't
> make assumptions on what Dom0 may need. Instead Dom0 should make
> arrangements such that it can do I/O to/from all devices of interest.
> This may involve arranging for address restricted buffers. And for this
> to be possible, Xen would need to have available some suitable memory.
> I understand this is complicated by the fact that despite being HVM-like,
> due to the lack of an IOMMU in front of certain devices address
> restrictions on Dom0 address space alone (i.e. without any Xen
> involvement) won't help ...
> 

I agree with you that the current implementation is probably the best
kind of workaround. Do you have any suggestions for this patch to
address the above comments? Or do I just need to modify the commit log
to capture some of our discussion above?

Thanks,
Wei Chen

> Jan
Jan Beulich Jan. 19, 2022, 7:50 a.m. UTC | #9
On 19.01.2022 03:49, Wei Chen wrote:
> Hi Jan,
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2022年1月18日 22:16
>> To: Wei Chen <Wei.Chen@arm.com>
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; xen-
>> devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org
>> Subject: Re: [PATCH 04/37] xen: introduce an arch helper for default dma
>> zone status
>>
>> On 18.01.2022 10:20, Wei Chen wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 2022年1月18日 16:16
>>>>
>>>> On 18.01.2022 08:51, Wei Chen wrote:
>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>> Sent: 2022年1月18日 0:11
>>>>>> On 23.09.2021 14:02, Wei Chen wrote:
>>>>>>> In current code, when Xen is running in a multiple nodes NUMA
>>>>>>> system, it will set dma_bitsize in end_boot_allocator to reserve
>>>>>>> some low address memory for DMA.
>>>>>>>
>>>>>>> There are some x86 implications in current implementation. Becuase
>>>>>>> on x86, memory starts from 0. On a multiple nodes NUMA system, if
>>>>>>> a single node contains the majority or all of the DMA memory. x86
>>>>>>> prefer to give out memory from non-local allocations rather than
>>>>>>> exhausting the DMA memory ranges. Hence x86 use dma_bitsize to set
>>>>>>> aside some largely arbitrary amount memory for DMA memory ranges.
>>>>>>> The allocations from these memory ranges would happen only after
>>>>>>> exhausting all other nodes' memory.
>>>>>>>
>>>>>>> But the implications are not shared across all architectures. For
>>>>>>> example, Arm doesn't have these implications. So in this patch, we
>>>>>>> introduce an arch_have_default_dmazone helper for arch to determine
>>>>>>> that it need to set dma_bitsize for reserve DMA allocations or not.
>>>>>>
>>>>>> How would Arm guarantee availability of memory below a certain
>>>>>> boundary for limited-capability devices? Or is there no need
>>>>>> because there's an assumption that I/O for such devices would
>>>>>> always pass through an IOMMU, lifting address size restrictions?
>>>>>> (I guess in a !PV build on x86 we could also get rid of such a
>>>>>> reservation.)
>>>>>
>>>>> On Arm, we still can have some devices with limited DMA capability.
>>>>> And we also don't force all such devices to use IOMMU. This devices
>>>>> will affect the dma_bitsize. Like RPi platform, it sets its
>> dma_bitsize
>>>>> to 30. But in multiple NUMA nodes system, Arm doesn't have a default
>>>>> DMA zone. Multiple nodes is not a constraint on dma_bitsize. And some
>>>>> previous discussions are placed here [1].
>>>>
>>>> I'm afraid that doesn't give me more clues. For example, in the mail
>>>> being replied to there I find "That means, only first 4GB memory can
>>>> be used for DMA." Yet that's not an implication from setting
>>>> dma_bitsize. DMA is fine to occur to any address. The special address
>>>> range is being held back in case in particular Dom0 is in need of such
>>>> a range to perform I/O to _some_ devices.
>>>
>>> I am sorry that my last reply hasn't given you more clues. On Arm, only
>>> Dom0 can have DMA without IOMMU. So when we allocate memory for Dom0,
>>> we're trying to allocate memory under 4GB or in the range of dma_bitsize
>>> indicated. I think these operations meet your above Dom0 special address
>>> range description. As we have already allocated memory for DMA, so I
>>> think we don't need a DMA zone in page allocation. I am not sure is that
>>> answers your earlier question?
>>
>> I view all of this as flawed, or as a workaround at best. Xen shouldn't
>> make assumptions on what Dom0 may need. Instead Dom0 should make
>> arrangements such that it can do I/O to/from all devices of interest.
>> This may involve arranging for address restricted buffers. And for this
>> to be possible, Xen would need to have available some suitable memory.
>> I understand this is complicated by the fact that despite being HVM-like,
>> due to the lack of an IOMMU in front of certain devices address
>> restrictions on Dom0 address space alone (i.e. without any Xen
>> involvement) won't help ...
>>
> 
> I agree with you that the current implementation is probably the best
> kind of workaround. Do you have some suggestions for this patch to
> address above comments? Or should I just need to modify the commit log
> to contain some of our above discussions?

Extending the description is my primary request, or else we may end up
having the same discussion every time you submit a new version. As to
improving the situation such that preferably per-arch customization
wouldn't be necessary - that's perhaps better to be thought about by
Arm folks. Otoh, as said, an x86 build with CONFIG_PV=n could probably
leverage the new hook to actually not trigger reservation.
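
(E.g., purely as a sketch of that idea, and only if the !PV assumption
really holds, the x86 side could then become something like

    /* Sketch only: skip the DMA reservation in !PV builds. */
    #define arch_have_default_dmazone() \
        (IS_ENABLED(CONFIG_PV) && num_online_nodes() > 1)

but that would be a separate change.)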

Jan
Wei Chen Jan. 19, 2022, 8:33 a.m. UTC | #10
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年1月19日 15:50
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; xen-
> devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org
> Subject: Re: [PATCH 04/37] xen: introduce an arch helper for default dma
> zone status
> 
> On 19.01.2022 03:49, Wei Chen wrote:
> > Hi Jan,
> >
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 2022年1月18日 22:16
> >> To: Wei Chen <Wei.Chen@arm.com>
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; xen-
> >> devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org
> >> Subject: Re: [PATCH 04/37] xen: introduce an arch helper for default
> dma
> >> zone status
> >>
> >> On 18.01.2022 10:20, Wei Chen wrote:
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: 2022年1月18日 16:16
> >>>>
> >>>> On 18.01.2022 08:51, Wei Chen wrote:
> >>>>>> From: Jan Beulich <jbeulich@suse.com>
> >>>>>> Sent: 2022年1月18日 0:11
> >>>>>> On 23.09.2021 14:02, Wei Chen wrote:
> >>>>>>> In current code, when Xen is running in a multiple nodes NUMA
> >>>>>>> system, it will set dma_bitsize in end_boot_allocator to reserve
> >>>>>>> some low address memory for DMA.
> >>>>>>>
> >>>>>>> There are some x86 implications in current implementation. Becuase
> >>>>>>> on x86, memory starts from 0. On a multiple nodes NUMA system, if
> >>>>>>> a single node contains the majority or all of the DMA memory. x86
> >>>>>>> prefer to give out memory from non-local allocations rather than
> >>>>>>> exhausting the DMA memory ranges. Hence x86 use dma_bitsize to set
> >>>>>>> aside some largely arbitrary amount memory for DMA memory ranges.
> >>>>>>> The allocations from these memory ranges would happen only after
> >>>>>>> exhausting all other nodes' memory.
> >>>>>>>
> >>>>>>> But the implications are not shared across all architectures. For
> >>>>>>> example, Arm doesn't have these implications. So in this patch, we
> >>>>>>> introduce an arch_have_default_dmazone helper for arch to
> determine
> >>>>>>> that it need to set dma_bitsize for reserve DMA allocations or not.
> >>>>>>
> >>>>>> How would Arm guarantee availability of memory below a certain
> >>>>>> boundary for limited-capability devices? Or is there no need
> >>>>>> because there's an assumption that I/O for such devices would
> >>>>>> always pass through an IOMMU, lifting address size restrictions?
> >>>>>> (I guess in a !PV build on x86 we could also get rid of such a
> >>>>>> reservation.)
> >>>>>
> >>>>> On Arm, we still can have some devices with limited DMA capability.
> >>>>> And we also don't force all such devices to use IOMMU. This devices
> >>>>> will affect the dma_bitsize. Like RPi platform, it sets its
> >> dma_bitsize
> >>>>> to 30. But in multiple NUMA nodes system, Arm doesn't have a default
> >>>>> DMA zone. Multiple nodes is not a constraint on dma_bitsize. And
> some
> >>>>> previous discussions are placed here [1].
> >>>>
> >>>> I'm afraid that doesn't give me more clues. For example, in the mail
> >>>> being replied to there I find "That means, only first 4GB memory can
> >>>> be used for DMA." Yet that's not an implication from setting
> >>>> dma_bitsize. DMA is fine to occur to any address. The special address
> >>>> range is being held back in case in particular Dom0 is in need of
> such
> >>>> a range to perform I/O to _some_ devices.
> >>>
> >>> I am sorry that my last reply hasn't given you more clues. On Arm,
> only
> >>> Dom0 can have DMA without IOMMU. So when we allocate memory for Dom0,
> >>> we're trying to allocate memory under 4GB or in the range of
> dma_bitsize
> >>> indicated. I think these operations meet your above Dom0 special
> address
> >>> range description. As we have already allocated memory for DMA, so I
> >>> think we don't need a DMA zone in page allocation. I am not sure is
> that
> >>> answers your earlier question?
> >>
> >> I view all of this as flawed, or as a workaround at best. Xen shouldn't
> >> make assumptions on what Dom0 may need. Instead Dom0 should make
> >> arrangements such that it can do I/O to/from all devices of interest.
> >> This may involve arranging for address restricted buffers. And for this
> >> to be possible, Xen would need to have available some suitable memory.
> >> I understand this is complicated by the fact that despite being HVM-
> like,
> >> due to the lack of an IOMMU in front of certain devices address
> >> restrictions on Dom0 address space alone (i.e. without any Xen
> >> involvement) won't help ...
> >>
> >
> > I agree with you that the current implementation is probably the best
> > kind of workaround. Do you have some suggestions for this patch to
> > address above comments? Or should I just need to modify the commit log
> > to contain some of our above discussions?
> 
> Extending the description is my primary request, or else we may end up
> having the same discussion every time you submit a new version. As to
> improving the situation such that preferably per-arch customization
> wouldn't be necessary - that's perhaps better to be thought about by
> Arm folks. Otoh, as said, an x86 build with CONFIG_PV=n could probably
> leverage the new hook to actually not trigger reservation.
> 

OK, I will extend the description in the next version to avoid having
the same discussion again.

> Jan

Patch

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index ce79ee44ce..1fabbe8281 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -371,6 +371,11 @@  unsigned int __init arch_get_dma_bitsize(void)
                  + PAGE_SHIFT, 32);
 }
 
+unsigned int arch_have_default_dmazone(void)
+{
+    return ( num_online_nodes() > 1 ) ? 1 : 0;
+}
+
 static void dump_numa(unsigned char key)
 {
     s_time_t now = NOW();
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 5801358b4b..80916205e5 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1889,7 +1889,7 @@  void __init end_boot_allocator(void)
     }
     nr_bootmem_regions = 0;
 
-    if ( !dma_bitsize && (num_online_nodes() > 1) )
+    if ( !dma_bitsize && arch_have_default_dmazone() )
         dma_bitsize = arch_get_dma_bitsize();
 
     printk("Domain heap initialised");
diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index 31a6de4e23..9d5739542d 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -25,6 +25,11 @@  extern mfn_t first_valid_mfn;
 #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
 #define __node_distance(a, b) (20)
 
+static inline unsigned int arch_have_default_dmazone(void)
+{
+    return 0;
+}
+
 #endif /* __ARCH_ARM_NUMA_H */
 /*
  * Local variables:
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index 3cf26c2def..8060cbf3f4 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -78,5 +78,6 @@  extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
 void srat_parse_regions(u64 addr);
 extern u8 __node_distance(nodeid_t a, nodeid_t b);
 unsigned int arch_get_dma_bitsize(void);
+unsigned int arch_have_default_dmazone(void);
 
 #endif