
[2/2] NUMA: replace phys_to_nid()

Message ID 670b7017-4a6e-fa9e-9d65-65013bd4ad80@suse.com (mailing list archive)
State New, archived
Series NUMA: phys_to_nid() related adjustments

Commit Message

Jan Beulich Dec. 13, 2022, 11:38 a.m. UTC
All callers convert frame numbers (perhaps in turn derived from struct
page_info pointers) to an address, just for the function to convert it
back to a frame number (as the first step of paddr_to_pdx()). Replace
the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
call sites by the respectively most suitable one.

While there also introduce a !NUMA stub, eliminating the need for Arm
(and potentially other ports) to carry one individually.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
At the top of free_heap_pages() mfn_to_nid() could also be used, since
the MFN is calculated immediately ahead. The choice of using
page_to_nid() (for now at least) was with the earlier patch's RFC in
mind, addressing of which may require to make mfn_to_nid() do weaker
checking than page_to_nid().
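The round trip the commit message describes can be illustrated with a minimal, self-contained sketch. Everything prefixed `toy_`, the `MEMNODE_SHIFT` and `MEMNODEMAP_SIZE` values, and the map contents are assumptions made up for illustration; the real Xen helpers go through paddr_to_pdx() / mfn_to_pdx() and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins; the real Xen definitions differ (e.g. PDX compression). */
#define PAGE_SHIFT      12
#define MEMNODE_SHIFT   4   /* hypothetical: 16 frames per map slot */
#define MEMNODEMAP_SIZE 8   /* hypothetical map size */

typedef uint64_t paddr_t;
typedef uint8_t nodeid_t;

/* Hypothetical layout: slots 0-3 belong to node 0, slots 4-7 to node 1. */
static const nodeid_t memnodemap[MEMNODEMAP_SIZE] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Old shape: accept an address, then shift it straight back to a frame number. */
static nodeid_t toy_phys_to_nid(paddr_t addr)
{
    unsigned long idx = (unsigned long)(addr >> PAGE_SHIFT) >> MEMNODE_SHIFT;

    assert(idx < MEMNODEMAP_SIZE);
    return memnodemap[idx];
}

/* New shape: accept the frame number directly, with no address detour. */
static nodeid_t toy_mfn_to_nid(unsigned long mfn)
{
    unsigned long idx = mfn >> MEMNODE_SHIFT;

    assert(idx < MEMNODEMAP_SIZE);
    return memnodemap[idx];
}
```

A caller that already holds a frame number previously had to shift it left by PAGE_SHIFT only for the lookup to shift it right again; both toy functions return the same node for the same frame.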

Comments

Julien Grall Dec. 13, 2022, 12:06 p.m. UTC | #1
Hi Jan,

On 13/12/2022 11:38, Jan Beulich wrote:
> All callers convert frame numbers (perhaps in turn derived from struct
> page_info pointers) to an address, just for the function to convert it
> back to a frame number (as the first step of paddr_to_pdx()). Replace
> the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
> call sites by the respectively most suitable one.
> 
> While there also introduce a !NUMA stub, eliminating the need for Arm
> (and potentially other ports) to carry one individually.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> At the top of free_heap_pages() mfn_to_nid() could also be used, since
> the MFN is calculated immediately ahead. The choice of using
> page_to_nid() (for now at least) was with the earlier patch's RFC in
> mind, addressing of which may require to make mfn_to_nid() do weaker
> checking than page_to_nid().

I haven't looked in detail at the previous patch. However, I don't like
the idea of making mfn_to_nid() do weaker checking because this could
easily confuse the reader/developer.

If you want to use a weaker check, then it would be better if a separate
helper is provided with a name reflecting its purpose.

> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -671,15 +671,15 @@ static void cf_check dump_numa(unsigned
>   
>       for_each_online_node ( i )
>       {
> -        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
> +        mfn_t mfn = _mfn(node_start_pfn(i) + 1);
>   
>           printk("NODE%u start->%lu size->%lu free->%lu\n",
>                  i, node_start_pfn(i), node_spanned_pages(i),
>                  avail_node_heap_pages(i));
> -        /* Sanity check phys_to_nid() */
> -        if ( phys_to_nid(pa) != i )
> -            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
> -                   pa, phys_to_nid(pa), i);
> +        /* Sanity check mfn_to_nid() */
> +        if ( node_spanned_pages(i) && mfn_to_nid(mfn) != i )


From the commit message, I would have expected that we would only
replace phys_to_nid() with either mfn_to_nid() or page_to_nid(). 
However, here you added node_spanned_pages(). Can you explain why?

Cheers,
Jan Beulich Dec. 13, 2022, 12:46 p.m. UTC | #2
On 13.12.2022 13:06, Julien Grall wrote:
> On 13/12/2022 11:38, Jan Beulich wrote:
>> All callers convert frame numbers (perhaps in turn derived from struct
>> page_info pointers) to an address, just for the function to convert it
>> back to a frame number (as the first step of paddr_to_pdx()). Replace
>> the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
>> call sites by the respectively most suitable one.
>>
>> While there also introduce a !NUMA stub, eliminating the need for Arm
>> (and potentially other ports) to carry one individually.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> At the top of free_heap_pages() mfn_to_nid() could also be used, since
>> the MFN is calculated immediately ahead. The choice of using
>> page_to_nid() (for now at least) was with the earlier patch's RFC in
>> mind, addressing of which may require to make mfn_to_nid() do weaker
>> checking than page_to_nid().
> 
> I haven't looked in detail at the previous patch. However, I don't like
> the idea of making mfn_to_nid() do weaker checking because this could
> easily confuse the reader/developer.
>
> If you want to use a weaker check, then it would be better if a separate
> helper is provided with a name reflecting its purpose.

Well, the purpose then still is the very same conversion, so the name
is quite appropriate. I don't view mfn_to_nid_bug_dont_look_very_closely()
(exaggerating) as very sensible a name.

>> --- a/xen/common/numa.c
>> +++ b/xen/common/numa.c
>> @@ -671,15 +671,15 @@ static void cf_check dump_numa(unsigned
>>   
>>       for_each_online_node ( i )
>>       {
>> -        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
>> +        mfn_t mfn = _mfn(node_start_pfn(i) + 1);
>>   
>>           printk("NODE%u start->%lu size->%lu free->%lu\n",
>>                  i, node_start_pfn(i), node_spanned_pages(i),
>>                  avail_node_heap_pages(i));
>> -        /* Sanity check phys_to_nid() */
>> -        if ( phys_to_nid(pa) != i )
>> -            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
>> -                   pa, phys_to_nid(pa), i);
>> +        /* Sanity check mfn_to_nid() */
>> +        if ( node_spanned_pages(i) && mfn_to_nid(mfn) != i )
> 
> 
> From the commit message, I would have expected that we would only
> replace phys_to_nid() with either mfn_to_nid() or page_to_nid(). 
> However, here you added node_spanned_pages(). Can you explain why?

Oh, indeed, I meant to say a word on this but then forgot. This
simply is because the adding of 1 to the start PFN (which by
itself is imo a little funny) makes it so that the printk()
inside the conditional would be certain to be called for an
empty (e.g. CPU-only) node.

Jan
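The reason for the extra node_spanned_pages() guard can be sketched as follows. This is only an illustration under assumed, simplified types: `struct toy_node` and `toy_probe_in_node()` are hypothetical names, not Xen code. The point is that the frame being probed (start PFN plus one) cannot lie inside a node that spans no pages, so a lookup on it has no way of returning that node's ID.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a node's frame range. */
struct toy_node {
    unsigned long start_pfn;
    unsigned long spanned_pages;
};

/* Does the frame used by the sanity check (start + 1) lie inside the node? */
static bool toy_probe_in_node(const struct toy_node *n)
{
    unsigned long probe = n->start_pfn + 1;

    return probe >= n->start_pfn && probe < n->start_pfn + n->spanned_pages;
}
```

For an empty (e.g. CPU-only) node the function returns false, which is why an unguarded comparison against the node ID would always report a mismatch.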
Julien Grall Dec. 13, 2022, 1:48 p.m. UTC | #3
Hi Jan,

On 13/12/2022 12:46, Jan Beulich wrote:
> On 13.12.2022 13:06, Julien Grall wrote:
>> On 13/12/2022 11:38, Jan Beulich wrote:
>>> All callers convert frame numbers (perhaps in turn derived from struct
>>> page_info pointers) to an address, just for the function to convert it
>>> back to a frame number (as the first step of paddr_to_pdx()). Replace
>>> the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
>>> call sites by the respectively most suitable one.
>>>
>>> While there also introduce a !NUMA stub, eliminating the need for Arm
>>> (and potentially other ports) to carry one individually.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> At the top of free_heap_pages() mfn_to_nid() could also be used, since
>>> the MFN is calculated immediately ahead. The choice of using
>>> page_to_nid() (for now at least) was with the earlier patch's RFC in
>>> mind, addressing of which may require to make mfn_to_nid() do weaker
>>> checking than page_to_nid().
>>
>> I haven't looked in detail at the previous patch. However, I don't like
>> the idea of making mfn_to_nid() do weaker checking because this could
>> easily confuse the reader/developer.
>>
>> If you want to use a weaker check, then it would be better if a separate
>> helper is provided with a name reflecting its purpose.
> 
> Well, the purpose then still is the very same conversion, so the name
> is quite appropriate. I don't view mfn_to_nid_bug_dont_look_very_closely()
> (exaggerating) as very sensible a name.

I understand they are both doing the same conversion. But the checks
will be different. With your proposal, we are now going to say: if the
caller is "buggy", use mfn_to_nid(); if not, you can use either.

I think it is wrong to hide the "bug" just because the name is longer.
In fact, it means that any non-buggy caller will still get the relaxed
check. The risk is that we are going to introduce more "buggy" callers
in the future.

So from my perspective there are only two acceptable solutions:
   1. Provide a different helper that will be used for just the "buggy"
callers. This will make it super clear that the helper should only be
used in very limited circumstances.
   2. Fix the "buggy" callers.

From your previous e-mails, it wasn't clear whether 2) is possible. So
that leaves us only with 1).

>>> --- a/xen/common/numa.c
>>> +++ b/xen/common/numa.c
>>> @@ -671,15 +671,15 @@ static void cf_check dump_numa(unsigned
>>>    
>>>        for_each_online_node ( i )
>>>        {
>>> -        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
>>> +        mfn_t mfn = _mfn(node_start_pfn(i) + 1);
>>>    
>>>            printk("NODE%u start->%lu size->%lu free->%lu\n",
>>>                   i, node_start_pfn(i), node_spanned_pages(i),
>>>                   avail_node_heap_pages(i));
>>> -        /* Sanity check phys_to_nid() */
>>> -        if ( phys_to_nid(pa) != i )
>>> -            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
>>> -                   pa, phys_to_nid(pa), i);
>>> +        /* Sanity check mfn_to_nid() */
>>> +        if ( node_spanned_pages(i) && mfn_to_nid(mfn) != i )
>>
>>
>> From the commit message, I would have expected that we would only
>> replace phys_to_nid() with either mfn_to_nid() or page_to_nid().
>> However, here you added node_spanned_pages(). Can you explain why?
> 
> Oh, indeed, I meant to say a word on this but then forgot. This
> simply is because the adding of 1 to the start PFN (which by
> itself is imo a little funny) makes it so that the printk()
> inside the conditional would be certain to be called for an
> empty (e.g. CPU-only) node.

Ok. I think this wants to be a separate patch, as this sounds like a bug
and we should avoid mixing code conversion with a bug fix.

Cheers,
Jan Beulich Dec. 13, 2022, 2:08 p.m. UTC | #4
On 13.12.2022 14:48, Julien Grall wrote:
> On 13/12/2022 12:46, Jan Beulich wrote:
>> On 13.12.2022 13:06, Julien Grall wrote:
>>> On 13/12/2022 11:38, Jan Beulich wrote:
>>>> All callers convert frame numbers (perhaps in turn derived from struct
>>>> page_info pointers) to an address, just for the function to convert it
>>>> back to a frame number (as the first step of paddr_to_pdx()). Replace
>>>> the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
>>>> call sites by the respectively most suitable one.
>>>>
>>>> While there also introduce a !NUMA stub, eliminating the need for Arm
>>>> (and potentially other ports) to carry one individually.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> ---
>>>> At the top of free_heap_pages() mfn_to_nid() could also be used, since
>>>> the MFN is calculated immediately ahead. The choice of using
>>>> page_to_nid() (for now at least) was with the earlier patch's RFC in
>>>> mind, addressing of which may require to make mfn_to_nid() do weaker
>>>> checking than page_to_nid().
>>>
>>> I haven't looked in detail at the previous patch. However, I don't like
>>> the idea of making mfn_to_nid() do weaker checking because this could
>>> easily confuse the reader/developer.
>>>
>>> If you want to use a weaker check, then it would be better if a separate
>>> helper is provided with a name reflecting its purpose.
>>
>> Well, the purpose then still is the very same conversion, so the name
>> is quite appropriate. I don't view mfn_to_nid_bug_dont_look_very_closely()
>> (exaggerating) as very sensible a name.
> 
> I understand they are both doing the same conversion. But the checks
> will be different. With your proposal, we are now going to say: if the
> caller is "buggy", use mfn_to_nid(); if not, you can use either.
>
> I think it is wrong to hide the "bug" just because the name is longer.
> In fact, it means that any non-buggy caller will still get the relaxed
> check. The risk is that we are going to introduce more "buggy" callers
> in the future.

While I, too, have taken your perspective as one possible one, I've
also been considering a slightly different perspective: page_to_nid()
implies the caller to have a struct page_info *, which in turn implies
you pass in something identifying valid memory (which hence should have
a valid node ID associated with it). mfn_to_nid(), otoh, has nothing
to pre-qualify (see patch 1's RFC remark as to mfn_valid() not being
sufficient). Hence less rigid checking there can make sense (and you'll
notice that mfn_to_nid() was also used quite sparingly in the course of
the conversion.)

> So from my perspective there are only two acceptable solutions:
>    1. Provide a different helper that will be used for just the "buggy"
> callers. This will make it super clear that the helper should only be
> used in very limited circumstances.
>    2. Fix the "buggy" callers.
>
> From your previous e-mails, it wasn't clear whether 2) is possible. So
> that leaves us only with 1).

The buggy callers are the ones touched by patch 1; see (again) the RFC
remark there for limitations of that approach.

>>>> --- a/xen/common/numa.c
>>>> +++ b/xen/common/numa.c
>>>> @@ -671,15 +671,15 @@ static void cf_check dump_numa(unsigned
>>>>    
>>>>        for_each_online_node ( i )
>>>>        {
>>>> -        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
>>>> +        mfn_t mfn = _mfn(node_start_pfn(i) + 1);
>>>>    
>>>>            printk("NODE%u start->%lu size->%lu free->%lu\n",
>>>>                   i, node_start_pfn(i), node_spanned_pages(i),
>>>>                   avail_node_heap_pages(i));
>>>> -        /* Sanity check phys_to_nid() */
>>>> -        if ( phys_to_nid(pa) != i )
>>>> -            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
>>>> -                   pa, phys_to_nid(pa), i);
>>>> +        /* Sanity check mfn_to_nid() */
>>>> +        if ( node_spanned_pages(i) && mfn_to_nid(mfn) != i )
>>>
>>>
>>> From the commit message, I would have expected that we would only
>>> replace phys_to_nid() with either mfn_to_nid() or page_to_nid().
>>> However, here you added node_spanned_pages(). Can you explain why?
>>
>> Oh, indeed, I meant to say a word on this but then forgot. This
>> simply is because the adding of 1 to the start PFN (which by
>> itself is imo a little funny) makes it so that the printk()
>> inside the conditional would be certain to be called for an
>> empty (e.g. CPU-only) node.
> 
> Ok. I think this wants to be a separate patch, as this sounds like a bug
> and we should avoid mixing code conversion with a bug fix.

Yet then this is only in a debug key handler. (Else I would have made
it a separate patch, yes.)

Jan
Julien Grall Dec. 13, 2022, 9:33 p.m. UTC | #5
Hi Jan,

On 13/12/2022 14:08, Jan Beulich wrote:
> On 13.12.2022 14:48, Julien Grall wrote:
>> On 13/12/2022 12:46, Jan Beulich wrote:
>>> On 13.12.2022 13:06, Julien Grall wrote:
>>>> On 13/12/2022 11:38, Jan Beulich wrote:
>>>>> All callers convert frame numbers (perhaps in turn derived from struct
>>>>> page_info pointers) to an address, just for the function to convert it
>>>>> back to a frame number (as the first step of paddr_to_pdx()). Replace
>>>>> the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
>>>>> call sites by the respectively most suitable one.
>>>>>
>>>>> While there also introduce a !NUMA stub, eliminating the need for Arm
>>>>> (and potentially other ports) to carry one individually.
>>>>>
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>> ---
>>>>> At the top of free_heap_pages() mfn_to_nid() could also be used, since
>>>>> the MFN is calculated immediately ahead. The choice of using
>>>>> page_to_nid() (for now at least) was with the earlier patch's RFC in
>>>>> mind, addressing of which may require to make mfn_to_nid() do weaker
>>>>> checking than page_to_nid().
>>>>
>>>> I haven't looked in detail at the previous patch. However, I don't like
>>>> the idea of making mfn_to_nid() do weaker checking because this could
>>>> easily confuse the reader/developer.
>>>>
>>>> If you want to use a weaker check, then it would be better if a separate
>>>> helper is provided with a name reflecting its purpose.
>>>
>>> Well, the purpose then still is the very same conversion, so the name
>>> is quite appropriate. I don't view mfn_to_nid_bug_dont_look_very_closely()
>>> (exaggerating) as very sensible a name.
>>
>> I understand they are both doing the same conversion. But the checks
>> will be different. With your proposal, we are now going to say: if the
>> caller is "buggy", use mfn_to_nid(); if not, you can use either.
>>
>> I think it is wrong to hide the "bug" just because the name is longer.
>> In fact, it means that any non-buggy caller will still get the relaxed
>> check. The risk is that we are going to introduce more "buggy" callers
>> in the future.
> 
> While I, too, have taken your perspective as one possible one, I've
> also been considering a slightly different perspective: page_to_nid()
> implies the caller to have a struct page_info *, which in turn implies
> you pass in something identifying valid memory (which hence should have
> a valid node ID associated with it). mfn_to_nid(), otoh, has nothing
> to pre-qualify (see patch 1's RFC remark as to mfn_valid() not being
> sufficient). Hence less rigid checking there can make sense (and you'll
> notice that mfn_to_nid() was also used quite sparingly in the course of
> the conversion.)
> 
>> So from my perspective there are only two acceptable solutions:
>>     1. Provide a different helper that will be used for just the "buggy"
>> callers. This will make it super clear that the helper should only be
>> used in very limited circumstances.
>>     2. Fix the "buggy" callers.
>>
>> From your previous e-mails, it wasn't clear whether 2) is possible. So
>> that leaves us only with 1).
> 
> The buggy callers are the ones touched by patch 1; see (again) the RFC
> remark there for limitations of that approach.

Even with what you wrote above, I still think that relaxing the check 
for everyone is wrong. Anyway, this patch is not changing the helper. So 
I will wait and see a formal proposal.

> 
>>>>> --- a/xen/common/numa.c
>>>>> +++ b/xen/common/numa.c
>>>>> @@ -671,15 +671,15 @@ static void cf_check dump_numa(unsigned
>>>>>     
>>>>>         for_each_online_node ( i )
>>>>>         {
>>>>> -        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
>>>>> +        mfn_t mfn = _mfn(node_start_pfn(i) + 1);
>>>>>     
>>>>>             printk("NODE%u start->%lu size->%lu free->%lu\n",
>>>>>                    i, node_start_pfn(i), node_spanned_pages(i),
>>>>>                    avail_node_heap_pages(i));
>>>>> -        /* Sanity check phys_to_nid() */
>>>>> -        if ( phys_to_nid(pa) != i )
>>>>> -            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
>>>>> -                   pa, phys_to_nid(pa), i);
>>>>> +        /* Sanity check mfn_to_nid() */
>>>>> +        if ( node_spanned_pages(i) && mfn_to_nid(mfn) != i )
>>>>
>>>>
>>>> From the commit message, I would have expected that we would only
>>>> replace phys_to_nid() with either mfn_to_nid() or page_to_nid().
>>>> However, here you added node_spanned_pages(). Can you explain why?
>>>
>>> Oh, indeed, I meant to say a word on this but then forgot. This
>>> simply is because the adding of 1 to the start PFN (which by
>>> itself is imo a little funny) makes it so that the printk()
>>> inside the conditional would be certain to be called for an
>>> empty (e.g. CPU-only) node.
>>
>> Ok. I think this wants to be a separate patch, as this sounds like a bug
>> and we should avoid mixing code conversion with a bug fix.
> 
> Yet then this is only in a debug key handler. (Else I would have made
> it a separate patch, yes.)

IMO, the fact that it is a debug key handler doesn't matter. While I am
generally OK with a patch that modifies behavior also carrying minor
cleanup, I think the other way around is quite confusing. Therefore, I
would prefer the split unless another maintainer thinks otherwise.

Cheers,
Andrew Cooper Dec. 16, 2022, 11:49 a.m. UTC | #6
On 13/12/2022 11:38 am, Jan Beulich wrote:
> All callers convert frame numbers (perhaps in turn derived from struct
> page_info pointers) to an address, just for the function to convert it
> back to a frame number (as the first step of paddr_to_pdx()). Replace
> the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
> call sites by the respectively most suitable one.
>
> While there also introduce a !NUMA stub, eliminating the need for Arm
> (and potentially other ports) to carry one individually.

Thanks.  This will help RISC-V too.

> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>, albeit with one
deletion.

> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -1,6 +1,7 @@
>  #ifndef _XEN_NUMA_H
>  #define _XEN_NUMA_H
>  
> +#include <xen/mm-frame.h>
>  #include <asm/numa.h>
>  
>  #define NUMA_NO_NODE     0xFF
> @@ -68,12 +69,15 @@ struct node_data {
>  
>  extern struct node_data node_data[];
>  
> -static inline nodeid_t __attribute_pure__ phys_to_nid(paddr_t addr)
> +static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
>  {
>      nodeid_t nid;
> -    ASSERT((paddr_to_pdx(addr) >> memnode_shift) < memnodemapsize);
> -    nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
> +    unsigned long pdx = mfn_to_pdx(mfn);
> +
> +    ASSERT((pdx >> memnode_shift) < memnodemapsize);
> +    nid = memnodemap[pdx >> memnode_shift];
>      ASSERT(nid < MAX_NUMNODES && node_data[nid].node_spanned_pages);
> +
>      return nid;
>  }
>  
> @@ -102,6 +106,15 @@ extern bool numa_update_node_memblks(nod
>                                       paddr_t start, paddr_t size, bool hotplug);
>  extern void numa_set_processor_nodes_parsed(nodeid_t node);
>  
> +#else
> +
> +static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
> +{
> +    return 0;
> +}

pure is useless on a stub like this, whereas it's false on the non-stub
form (uses several non-const variables) in a way that the compiler can
prove (because it's static inline), and will discard.

As you're modifying both lines anyway, just drop the attribute.

~Andrew
Jan Beulich Dec. 16, 2022, 11:59 a.m. UTC | #7
On 16.12.2022 12:49, Andrew Cooper wrote:
> On 13/12/2022 11:38 am, Jan Beulich wrote:
>> All callers convert frame numbers (perhaps in turn derived from struct
>> page_info pointers) to an address, just for the function to convert it
>> back to a frame number (as the first step of paddr_to_pdx()). Replace
>> the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
>> call sites by the respectively most suitable one.
>>
>> While there also introduce a !NUMA stub, eliminating the need for Arm
>> (and potentially other ports) to carry one individually.
> 
> Thanks.  This will help RISC-V too.
> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>,

Thanks. You realize though that the patch may change depending on the
verdict on patch 1 (and, if that one's to change, the two likely
flipped with the actual fix moving here in the form of more relaxed
assertions, one way or another)?

> albeit with one deletion.
> 
>> --- a/xen/include/xen/numa.h
>> +++ b/xen/include/xen/numa.h
>> @@ -1,6 +1,7 @@
>>  #ifndef _XEN_NUMA_H
>>  #define _XEN_NUMA_H
>>  
>> +#include <xen/mm-frame.h>
>>  #include <asm/numa.h>
>>  
>>  #define NUMA_NO_NODE     0xFF
>> @@ -68,12 +69,15 @@ struct node_data {
>>  
>>  extern struct node_data node_data[];
>>  
>> -static inline nodeid_t __attribute_pure__ phys_to_nid(paddr_t addr)
>> +static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
>>  {
>>      nodeid_t nid;
>> -    ASSERT((paddr_to_pdx(addr) >> memnode_shift) < memnodemapsize);
>> -    nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
>> +    unsigned long pdx = mfn_to_pdx(mfn);
>> +
>> +    ASSERT((pdx >> memnode_shift) < memnodemapsize);
>> +    nid = memnodemap[pdx >> memnode_shift];
>>      ASSERT(nid < MAX_NUMNODES && node_data[nid].node_spanned_pages);
>> +
>>      return nid;
>>  }
>>  
>> @@ -102,6 +106,15 @@ extern bool numa_update_node_memblks(nod
>>                                       paddr_t start, paddr_t size, bool hotplug);
>>  extern void numa_set_processor_nodes_parsed(nodeid_t node);
>>  
>> +#else
>> +
>> +static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
>> +{
>> +    return 0;
>> +}
> 
> pure is useless on a stub like this, whereas it's false on the non-stub
> form (uses several non-const variables) in a way that the compiler can
> prove (because it's static inline), and will discard.
> 
> As you're modifying both lines anyway, just drop the attribute.

Hmm, yes, I agree for the stub, so I've dropped it there. "Several non-
const variables", however, is only partly true. These are __ro_after_init
and not written anymore once set. Are you sure the compiler will ignore
a "pure" attribute if it finds it (formally) violated? That would be
somewhat odd, as it means differing behavior depending on whether the
same piece of code is in an inline or out-of-line function.

Jan
Andrew Cooper Dec. 16, 2022, 2:27 p.m. UTC | #8
On 16/12/2022 11:59 am, Jan Beulich wrote:
> On 16.12.2022 12:49, Andrew Cooper wrote:
>> On 13/12/2022 11:38 am, Jan Beulich wrote:
>>> All callers convert frame numbers (perhaps in turn derived from struct
>>> page_info pointers) to an address, just for the function to convert it
>>> back to a frame number (as the first step of paddr_to_pdx()). Replace
>>> the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
>>> call sites by the respectively most suitable one.
>>>
>>> While there also introduce a !NUMA stub, eliminating the need for Arm
>>> (and potentially other ports) to carry one individually.
>> Thanks.  This will help RISC-V too.
>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>,
> Thanks. You realize though that the patch may change depending on the
> verdict on patch 1 (and, if that one's to change, the two likely
> flipped with the actual fix moving here in the form of more relaxed
> assertions, one way or another)?

Yeah, the tweak sounded entirely reasonable.

>
>> albeit with one deletion.
>>
>>> --- a/xen/include/xen/numa.h
>>> +++ b/xen/include/xen/numa.h
>>> @@ -1,6 +1,7 @@
>>>  #ifndef _XEN_NUMA_H
>>>  #define _XEN_NUMA_H
>>>  
>>> +#include <xen/mm-frame.h>
>>>  #include <asm/numa.h>
>>>  
>>>  #define NUMA_NO_NODE     0xFF
>>> @@ -68,12 +69,15 @@ struct node_data {
>>>  
>>>  extern struct node_data node_data[];
>>>  
>>> -static inline nodeid_t __attribute_pure__ phys_to_nid(paddr_t addr)
>>> +static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
>>>  {
>>>      nodeid_t nid;
>>> -    ASSERT((paddr_to_pdx(addr) >> memnode_shift) < memnodemapsize);
>>> -    nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
>>> +    unsigned long pdx = mfn_to_pdx(mfn);
>>> +
>>> +    ASSERT((pdx >> memnode_shift) < memnodemapsize);
>>> +    nid = memnodemap[pdx >> memnode_shift];
>>>      ASSERT(nid < MAX_NUMNODES && node_data[nid].node_spanned_pages);
>>> +
>>>      return nid;
>>>  }
>>>  
>>> @@ -102,6 +106,15 @@ extern bool numa_update_node_memblks(nod
>>>                                       paddr_t start, paddr_t size, bool hotplug);
>>>  extern void numa_set_processor_nodes_parsed(nodeid_t node);
>>>  
>>> +#else
>>> +
>>> +static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
>>> +{
>>> +    return 0;
>>> +}
>> pure is useless on a stub like this, whereas it's false on the non-stub
>> form (uses several non-const variables) in a way that the compiler can
>> prove (because it's static inline), and will discard.
>>
>> As you're modifying both lines anyway, just drop the attribute.
> Hmm, yes, I agree for the stub, so I've dropped it there. "Several non-
> const variables", however, is only partly true. These are __ro_after_init
> and not written anymore once set.

They're still read-write as far as C is concerned, and some of these
uses are before modifications finish.

>  Are you sure the compiler will ignore
> a "pure" attribute if it finds it (formally) violated?

Yes, very sure.  It got discussed at length on one of the speculation lists.

When the compiler can prove that the programmer doesn't know the rules
concerning pure/const, the attributes will be discarded.

To abuse the rules, you really do need the operation hidden in a place
that GCC can't see, so either a separate translation unit, or in inline
assembly.

~Andrew
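As background to the attribute discussion: GCC documents `pure` as allowing a function to read global memory while forbidding observable side effects, and `const` as additionally forbidding reads of memory that may change between calls. The sketch below shows a lookup shaped loosely like mfn_to_nid() whose result depends on a writable global. `toy_shift` and `toy_slot()` are hypothetical stand-ins (for memnode_shift and the map index calculation), and the sketch takes no position on whether the compiler discards an attribute it can prove to be violated.

```c
#include <assert.h>

/* Hypothetical stand-in for memnode_shift: writable as far as C is concerned. */
static unsigned int toy_shift = 3;

/*
 * "pure": no observable side effects, but the result may depend on global
 * memory, so it can change once toy_shift is modified between calls.
 */
static unsigned int __attribute__((pure)) toy_slot(unsigned long pdx)
{
    return (unsigned int)(pdx >> toy_shift);
}
```

Because the function reads a non-const global, two calls with the same argument need not agree if the global is written in between, which is the kind of dependency the exchange above is about.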

Patch

--- a/xen/arch/arm/include/asm/numa.h
+++ b/xen/arch/arm/include/asm/numa.h
@@ -11,11 +11,6 @@  typedef u8 nodeid_t;
 #define cpu_to_node(cpu) 0
 #define node_to_cpumask(node)   (cpu_online_map)
 
-static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
-{
-    return 0;
-}
-
 /*
  * TODO: make first_valid_mfn static when NUMA is supported on Arm, this
  * is required because the dummy helpers are using it.
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -492,7 +492,7 @@  p2m_pod_offline_or_broken_replace(struct
 {
     struct domain *d;
     struct p2m_domain *p2m;
-    nodeid_t node = phys_to_nid(page_to_maddr(p));
+    nodeid_t node = page_to_nid(p);
 
     if ( !(d = page_get_owner(p)) || !(p2m = p2m_get_hostp2m(d)) )
         return;
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -565,7 +565,7 @@  void __init paging_init(void)
                 if ( n == CNT )
                     ++holes;
                 else if ( k == holes )
-                    memflags = MEMF_node(phys_to_nid(mfn_to_maddr(mfn)));
+                    memflags = MEMF_node(mfn_to_nid(mfn));
             }
             if ( k == holes )
             {
@@ -600,7 +600,7 @@  void __init paging_init(void)
             mfn = _mfn(MFN(i) + n * PDX_GROUP_COUNT);
             if ( mfn_valid(mfn) )
             {
-                memflags = MEMF_node(phys_to_nid(mfn_to_maddr(mfn)));
+                memflags = MEMF_node(mfn_to_nid(mfn));
                 break;
             }
         }
@@ -677,7 +677,7 @@  void __init paging_init(void)
             mfn = _mfn(MFN(i) + n * PDX_GROUP_COUNT);
             if ( mfn_valid(mfn) )
             {
-                memflags = MEMF_node(phys_to_nid(mfn_to_maddr(mfn)));
+                memflags = MEMF_node(mfn_to_nid(mfn));
                 break;
             }
         }
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -671,15 +671,15 @@  static void cf_check dump_numa(unsigned
 
     for_each_online_node ( i )
     {
-        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
+        mfn_t mfn = _mfn(node_start_pfn(i) + 1);
 
         printk("NODE%u start->%lu size->%lu free->%lu\n",
                i, node_start_pfn(i), node_spanned_pages(i),
                avail_node_heap_pages(i));
-        /* Sanity check phys_to_nid() */
-        if ( phys_to_nid(pa) != i )
-            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
-                   pa, phys_to_nid(pa), i);
+        /* Sanity check mfn_to_nid() */
+        if ( node_spanned_pages(i) && mfn_to_nid(mfn) != i )
+            printk("mfn_to_nid(%"PRI_mfn") -> %d should be %u\n",
+                   mfn_x(mfn), mfn_to_nid(mfn), i);
     }
 
     j = cpumask_first(&cpu_online_map);
@@ -721,7 +721,7 @@  static void cf_check dump_numa(unsigned
         spin_lock(&d->page_alloc_lock);
         page_list_for_each ( page, &d->page_list )
         {
-            i = phys_to_nid(page_to_maddr(page));
+            i = page_to_nid(page);
             page_num_node[i]++;
         }
         spin_unlock(&d->page_alloc_lock);
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -971,7 +971,7 @@  static struct page_info *alloc_heap_page
         return NULL;
     }
 
-    node = phys_to_nid(page_to_maddr(pg));
+    node = page_to_nid(pg);
     zone = page_to_zone(pg);
     buddy_order = PFN_ORDER(pg);
 
@@ -1078,7 +1078,7 @@  static struct page_info *alloc_heap_page
 /* Remove any offlined page in the buddy pointed to by head. */
 static int reserve_offlined_page(struct page_info *head)
 {
-    unsigned int node = phys_to_nid(page_to_maddr(head));
+    unsigned int node = page_to_nid(head);
     int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
     struct page_info *cur_head;
     unsigned int cur_order, first_dirty;
@@ -1443,7 +1443,7 @@  static void free_heap_pages(
 {
     unsigned long mask;
     mfn_t mfn = page_to_mfn(pg);
-    unsigned int i, node = phys_to_nid(mfn_to_maddr(mfn));
+    unsigned int i, node = page_to_nid(pg);
     unsigned int zone = page_to_zone(pg);
     bool pg_offlined = false;
 
@@ -1487,7 +1487,7 @@  static void free_heap_pages(
                  !page_state_is(predecessor, free) ||
                  (predecessor->count_info & PGC_static) ||
                  (PFN_ORDER(predecessor) != order) ||
-                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
+                 (page_to_nid(predecessor) != node) )
                 break;
 
             check_and_stop_scrub(predecessor);
@@ -1511,7 +1511,7 @@  static void free_heap_pages(
                  !page_state_is(successor, free) ||
                  (successor->count_info & PGC_static) ||
                  (PFN_ORDER(successor) != order) ||
-                 (phys_to_nid(page_to_maddr(successor)) != node) )
+                 (page_to_nid(successor) != node) )
                 break;
 
             check_and_stop_scrub(successor);
@@ -1574,7 +1574,7 @@  static unsigned long mark_page_offline(s
 static int reserve_heap_page(struct page_info *pg)
 {
     struct page_info *head = NULL;
-    unsigned int i, node = phys_to_nid(page_to_maddr(pg));
+    unsigned int i, node = page_to_nid(pg);
     unsigned int zone = page_to_zone(pg);
 
     for ( i = 0; i <= MAX_ORDER; i++ )
@@ -1794,7 +1794,7 @@  static void _init_heap_pages(const struc
                              bool need_scrub)
 {
     unsigned long s, e;
-    unsigned int nid = phys_to_nid(page_to_maddr(pg));
+    unsigned int nid = page_to_nid(pg);
 
     s = mfn_x(page_to_mfn(pg));
     e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
@@ -1869,7 +1869,7 @@  static void init_heap_pages(
 #ifdef CONFIG_SEPARATE_XENHEAP
         unsigned int zone = page_to_zone(pg);
 #endif
-        unsigned int nid = phys_to_nid(page_to_maddr(pg));
+        unsigned int nid = page_to_nid(pg);
         unsigned long left = nr_pages - i;
         unsigned long contig_pages;
 
@@ -1893,7 +1893,7 @@  static void init_heap_pages(
                 break;
 #endif
 
-            if ( nid != (phys_to_nid(page_to_maddr(pg + contig_pages))) )
+            if ( nid != (page_to_nid(pg + contig_pages)) )
                 break;
         }
 
@@ -1934,7 +1934,7 @@  void __init end_boot_allocator(void)
     {
         struct bootmem_region *r = &bootmem_region_list[i];
         if ( (r->s < r->e) &&
-             (phys_to_nid(pfn_to_paddr(r->s)) == cpu_to_node(0)) )
+             (mfn_to_nid(_mfn(r->s)) == cpu_to_node(0)) )
         {
             init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s);
             r->e = r->s;
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -1,6 +1,7 @@ 
 #ifndef _XEN_NUMA_H
 #define _XEN_NUMA_H
 
+#include <xen/mm-frame.h>
 #include <asm/numa.h>
 
 #define NUMA_NO_NODE     0xFF
@@ -68,12 +69,15 @@  struct node_data {
 
 extern struct node_data node_data[];
 
-static inline nodeid_t __attribute_pure__ phys_to_nid(paddr_t addr)
+static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
 {
     nodeid_t nid;
-    ASSERT((paddr_to_pdx(addr) >> memnode_shift) < memnodemapsize);
-    nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
+    unsigned long pdx = mfn_to_pdx(mfn);
+
+    ASSERT((pdx >> memnode_shift) < memnodemapsize);
+    nid = memnodemap[pdx >> memnode_shift];
     ASSERT(nid < MAX_NUMNODES && node_data[nid].node_spanned_pages);
+
     return nid;
 }
 
@@ -102,6 +106,15 @@  extern bool numa_update_node_memblks(nod
                                      paddr_t start, paddr_t size, bool hotplug);
 extern void numa_set_processor_nodes_parsed(nodeid_t node);
 
+#else
+
+static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
+{
+    return 0;
+}
+
 #endif
 
+#define page_to_nid(pg) mfn_to_nid(page_to_mfn(pg))
+
 #endif /* _XEN_NUMA_H */
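
For readers following the numa.h hunk, the lookup that mfn_to_nid() performs can be sketched outside of Xen as below. This is only an illustration under stated assumptions, not the hypervisor's code: every sketch_* / SKETCH_* name is hypothetical, the MFN-to-PDX step is taken to be the identity (no PDX compression), and the shift and map contents are made-up toy values. It shows the shape of the change: the frame number indexes the map directly, with no round trip through a physical address, and the page-based variant is a thin wrapper on top.

```c
/*
 * Standalone sketch of the table lookup behind mfn_to_nid(); not Xen code.
 * Assumptions: pdx == mfn (no PDX compression), a toy 4-entry map, and
 * invented values for the shift and the map contents.
 */
#include <assert.h>
#include <stdint.h>

typedef uint8_t nodeid_t;

#define SKETCH_MEMNODE_SHIFT 4      /* each map slot covers 16 frames */
#define SKETCH_MAP_SIZE      4

/* Frames 0-31 belong to node 0, frames 32-63 to node 1. */
static const nodeid_t sketch_memnodemap[SKETCH_MAP_SIZE] = { 0, 0, 1, 1 };

static nodeid_t sketch_mfn_to_nid(unsigned long mfn)
{
    unsigned long pdx = mfn;        /* assumption: identity MFN -> PDX */
    unsigned long idx = pdx >> SKETCH_MEMNODE_SHIFT;

    /* Mirrors the bounds check the real function does via ASSERT(). */
    assert(idx < SKETCH_MAP_SIZE);

    return sketch_memnodemap[idx];
}

/* Hypothetical stand-in for struct page_info, carrying only the frame. */
struct sketch_page {
    unsigned long mfn;
};

/* Same layering as the patch: the page variant wraps the MFN variant. */
#define sketch_page_to_nid(pg) sketch_mfn_to_nid((pg)->mfn)
```

With these toy values, frame 31 falls into map slot 1 (node 0) and frame 32 into slot 2 (node 1). Passing a frame number beyond the map trips the assertion, much as the ASSERT() in the real function would in a debug build.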