Message ID | 670b7017-4a6e-fa9e-9d65-65013bd4ad80@suse.com (mailing list archive) |
---|---|
State | New, archived |
Series | NUMA: phys_to_nid() related adjustments |
Hi Jan,

On 13/12/2022 11:38, Jan Beulich wrote:
> All callers convert frame numbers (perhaps in turn derived from struct
> page_info pointers) to an address, just for the function to convert it
> back to a frame number (as the first step of paddr_to_pdx()). Replace
> the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
> call sites by the respectively most suitable one.
>
> While there also introduce a !NUMA stub, eliminating the need for Arm
> (and potentially other ports) to carry one individually.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> At the top of free_heap_pages() mfn_to_nid() could also be used, since
> the MFN is calculated immediately ahead. The choice of using
> page_to_nid() (for now at least) was with the earlier patch's RFC in
> mind, addressing of which may require to make mfn_to_nid() do weaker
> checking than page_to_nid().

I haven't looked in detail at the previous patch. However, I don't like
the idea of making mfn_to_nid() do weaker checking, because this could
easily confuse the reader/developer.

If you want to use weaker checks, then it would be better if a separate
helper is provided with a name reflecting its purpose.

> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -671,15 +671,15 @@ static void cf_check dump_numa(unsigned
>
>      for_each_online_node ( i )
>      {
> -        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
> +        mfn_t mfn = _mfn(node_start_pfn(i) + 1);
>
>          printk("NODE%u start->%lu size->%lu free->%lu\n",
>                 i, node_start_pfn(i), node_spanned_pages(i),
>                 avail_node_heap_pages(i));
> -        /* Sanity check phys_to_nid() */
> -        if ( phys_to_nid(pa) != i )
> -            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
> -                   pa, phys_to_nid(pa), i);
> +        /* Sanity check mfn_to_nid() */
> +        if ( node_spanned_pages(i) && mfn_to_nid(mfn) != i )

From the commit message, I would have expected that we would only
replace phys_to_nid() with either mfn_to_nid() or page_to_nid().
However, here you added node_spanned_pages(). Can you explain why?

Cheers,
On 13.12.2022 13:06, Julien Grall wrote:
> On 13/12/2022 11:38, Jan Beulich wrote:
>> At the top of free_heap_pages() mfn_to_nid() could also be used, since
>> the MFN is calculated immediately ahead. The choice of using
>> page_to_nid() (for now at least) was with the earlier patch's RFC in
>> mind, addressing of which may require to make mfn_to_nid() do weaker
>> checking than page_to_nid().
>
> I haven't looked in detail at the previous patch. However, I don't like
> the idea of making mfn_to_nid() do weaker checking, because this could
> easily confuse the reader/developer.
>
> If you want to use weaker checks, then it would be better if a separate
> helper is provided with a name reflecting its purpose.

Well, the purpose then still is the very same conversion, so the name
is quite appropriate. I don't view mfn_to_nid_bug_dont_look_very_closely()
(exaggerating) as very sensible a name.

>> --- a/xen/common/numa.c
>> +++ b/xen/common/numa.c
>> [...]
>> +        /* Sanity check mfn_to_nid() */
>> +        if ( node_spanned_pages(i) && mfn_to_nid(mfn) != i )
>
> From the commit message, I would have expected that we would only
> replace phys_to_nid() with either mfn_to_nid() or page_to_nid().
> However, here you added node_spanned_pages(). Can you explain why?

Oh, indeed, I meant to say a word on this but then forgot. This is
simply because adding 1 to the start PFN (which by itself is imo a
little funny) means that the printk() inside the conditional would be
certain to be called for an empty (e.g. CPU-only) node.

Jan
Hi Jan,

On 13/12/2022 12:46, Jan Beulich wrote:
> On 13.12.2022 13:06, Julien Grall wrote:
>> If you want to use weaker checks, then it would be better if a separate
>> helper is provided with a name reflecting its purpose.
>
> Well, the purpose then still is the very same conversion, so the name
> is quite appropriate. I don't view mfn_to_nid_bug_dont_look_very_closely()
> (exaggerating) as very sensible a name.

I understand they are both doing the same conversion. But the checks
will be different. With your proposal, we are now going to say: if the
caller is "buggy", use mfn_to_nid(); if not, you can use either.

I think it is wrong to hide the "bug" just because the name is longer.
In fact, it means that any non-buggy caller will still get the relaxed
check. The risk is that we will introduce more "buggy" callers in the
future.

So from my perspective there are only two acceptable solutions:
  1. Provide a different helper that will be used only for "buggy"
     callers. This will make it very clear that the helper should only
     be used in very limited circumstances.
  2. Fix the "buggy" callers.

From your previous e-mails, it wasn't clear whether 2) is possible. So
that leaves us only with 1).

>> From the commit message, I would have expected that we would only
>> replace phys_to_nid() with either mfn_to_nid() or page_to_nid().
>> However, here you added node_spanned_pages(). Can you explain why?
>
> Oh, indeed, I meant to say a word on this but then forgot. This is
> simply because adding 1 to the start PFN (which by itself is imo a
> little funny) means that the printk() inside the conditional would be
> certain to be called for an empty (e.g. CPU-only) node.

Ok. I think this wants to be a separate patch, as this sounds like a bug
fix and we should avoid mixing code conversion with bug fixes.

Cheers,
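Option 1 could be sketched roughly as follows. Several parts are assumptions for illustration only, since the thread does not settle on concrete relaxed semantics:

- the helper name mfn_to_nid_relaxed();
- the toy memnodemap;
- the choice to return NUMA_NO_NODE instead of asserting.

The strict helper keeps its assertions for every ordinary caller. Only callers that opt in by name get the weaker behaviour.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t nodeid_t;

#define NUMA_NO_NODE 0xFF
#define MAX_NUMNODES 4

/*
 * Toy map with 16 frames per entry; entry 3 (frames 48-63) belongs to no
 * node. MFNs are plain unsigned long and PDX == MFN to keep this short.
 */
static const unsigned int memnode_shift = 4;
static const nodeid_t memnodemap[] = { 0, 0, 1, NUMA_NO_NODE };
static const unsigned long memnodemapsize =
    sizeof(memnodemap) / sizeof(memnodemap[0]);

/* Strict lookup: every caller must pass a frame known to belong to a node. */
static nodeid_t mfn_to_nid(unsigned long mfn)
{
    nodeid_t nid;

    assert((mfn >> memnode_shift) < memnodemapsize);
    nid = memnodemap[mfn >> memnode_shift];
    assert(nid < MAX_NUMNODES);

    return nid;
}

/*
 * Hypothetical relaxed lookup, named so that its use stands out: it reports
 * NUMA_NO_NODE for frames beyond the map or in unassigned entries instead
 * of tripping an assertion.
 */
static nodeid_t mfn_to_nid_relaxed(unsigned long mfn)
{
    if ( (mfn >> memnode_shift) >= memnodemapsize )
        return NUMA_NO_NODE;

    return memnodemap[mfn >> memnode_shift];
}
```

Only callers that cannot guarantee a covered frame would call mfn_to_nid_relaxed() and handle NUMA_NO_NODE. Everyone else keeps the asserting mfn_to_nid().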
On 13.12.2022 14:48, Julien Grall wrote:
> On 13/12/2022 12:46, Jan Beulich wrote:
>> Well, the purpose then still is the very same conversion, so the name
>> is quite appropriate. I don't view mfn_to_nid_bug_dont_look_very_closely()
>> (exaggerating) as very sensible a name.
>
> I understand they are both doing the same conversion. But the checks
> will be different. With your proposal, we are now going to say: if the
> caller is "buggy", use mfn_to_nid(); if not, you can use either.
>
> I think it is wrong to hide the "bug" just because the name is longer.
> In fact, it means that any non-buggy caller will still get the relaxed
> check. The risk is that we will introduce more "buggy" callers in the
> future.

While I, too, have taken your perspective as one possible one, I've
also been considering a slightly different perspective: page_to_nid()
implies that the caller has a struct page_info *, which in turn implies
you pass in something identifying valid memory (which hence should have
a valid node ID associated with it). mfn_to_nid(), otoh, has nothing
to pre-qualify (see patch 1's RFC remark as to mfn_valid() not being
sufficient). Hence less rigid checking there can make sense (and you'll
notice that mfn_to_nid() was also used quite sparingly in the course of
the conversion).

> So from my perspective there are only two acceptable solutions:
>   1. Provide a different helper that will be used only for "buggy"
>      callers. This will make it very clear that the helper should only
>      be used in very limited circumstances.
>   2. Fix the "buggy" callers.
>
> From your previous e-mails, it wasn't clear whether 2) is possible. So
> that leaves us only with 1).

The buggy callers are the ones touched by patch 1; see (again) the RFC
remark there for limitations of that approach.

>> Oh, indeed, I meant to say a word on this but then forgot. This is
>> simply because adding 1 to the start PFN (which by itself is imo a
>> little funny) means that the printk() inside the conditional would be
>> certain to be called for an empty (e.g. CPU-only) node.
>
> Ok. I think this wants to be a separate patch, as this sounds like a bug
> fix and we should avoid mixing code conversion with bug fixes.

Yet this is only in a debug key handler. (Otherwise I would have made
it a separate patch, yes.)

Jan
Hi Jan,

On 13/12/2022 14:08, Jan Beulich wrote:
> On 13.12.2022 14:48, Julien Grall wrote:
>> I think it is wrong to hide the "bug" just because the name is longer.
>> In fact, it means that any non-buggy caller will still get the relaxed
>> check. The risk is that we will introduce more "buggy" callers in the
>> future.
>
> While I, too, have taken your perspective as one possible one, I've
> also been considering a slightly different perspective: page_to_nid()
> implies that the caller has a struct page_info *, which in turn implies
> you pass in something identifying valid memory (which hence should have
> a valid node ID associated with it). mfn_to_nid(), otoh, has nothing
> to pre-qualify (see patch 1's RFC remark as to mfn_valid() not being
> sufficient). Hence less rigid checking there can make sense (and you'll
> notice that mfn_to_nid() was also used quite sparingly in the course of
> the conversion).
>
>> So from my perspective there are only two acceptable solutions:
>>   1. Provide a different helper that will be used only for "buggy"
>>      callers. This will make it very clear that the helper should only
>>      be used in very limited circumstances.
>>   2. Fix the "buggy" callers.
>>
>> From your previous e-mails, it wasn't clear whether 2) is possible. So
>> that leaves us only with 1).
>
> The buggy callers are the ones touched by patch 1; see (again) the RFC
> remark there for limitations of that approach.

Even with what you wrote above, I still think that relaxing the check
for everyone is wrong. Anyway, this patch is not changing the helper,
so I will wait and see a formal proposal.

>> Ok. I think this wants to be a separate patch, as this sounds like a bug
>> fix and we should avoid mixing code conversion with bug fixes.
>
> Yet this is only in a debug key handler. (Otherwise I would have made
> it a separate patch, yes.)

IMO, the fact that it is a debug key handler doesn't matter. While I am
generally OK with a minor conversion being folded into a patch that
modifies the behavior, I think the other way around is quite confusing.
Therefore, I would prefer the split unless another maintainer thinks
otherwise.

Cheers,
On 13/12/2022 11:38 am, Jan Beulich wrote:
> All callers convert frame numbers (perhaps in turn derived from struct
> page_info pointers) to an address, just for the function to convert it
> back to a frame number (as the first step of paddr_to_pdx()). Replace
> the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
> call sites by the respectively most suitable one.
>
> While there also introduce a !NUMA stub, eliminating the need for Arm
> (and potentially other ports) to carry one individually.

Thanks. This will help RISC-V too.

> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>, albeit with one deletion.

> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> [...]
> @@ -68,12 +69,15 @@ struct node_data {
>
>  extern struct node_data node_data[];
>
> -static inline nodeid_t __attribute_pure__ phys_to_nid(paddr_t addr)
> +static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
>  {
>      nodeid_t nid;
> -    ASSERT((paddr_to_pdx(addr) >> memnode_shift) < memnodemapsize);
> -    nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
> +    unsigned long pdx = mfn_to_pdx(mfn);
> +
> +    ASSERT((pdx >> memnode_shift) < memnodemapsize);
> +    nid = memnodemap[pdx >> memnode_shift];
>      ASSERT(nid < MAX_NUMNODES && node_data[nid].node_spanned_pages);
> +
>      return nid;
>  }
>
> @@ -102,6 +106,15 @@ extern bool numa_update_node_memblks(nod
>                                       paddr_t start, paddr_t size, bool hotplug);
>  extern void numa_set_processor_nodes_parsed(nodeid_t node);
>
> +#else
> +
> +static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
> +{
> +    return 0;
> +}

pure is useless on a stub like this, whereas it's false on the non-stub
form (uses several non-const variables) in a way that the compiler can
prove (because it's static inline), and will discard.

As you're modifying both lines anyway, just drop the attribute.

~Andrew
On 16.12.2022 12:49, Andrew Cooper wrote:
> On 13/12/2022 11:38 am, Jan Beulich wrote:
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>,

Thanks. You realize though that the patch may change depending on the
verdict on patch 1 (and, if that one is to change, the two would likely
be flipped, with the actual fix moving here in the form of more relaxed
assertions, one way or another)?

> albeit with one deletion.
>
>> --- a/xen/include/xen/numa.h
>> +++ b/xen/include/xen/numa.h
>> [...]
>> +#else
>> +
>> +static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
>> +{
>> +    return 0;
>> +}
>
> pure is useless on a stub like this, whereas it's false on the non-stub
> form (uses several non-const variables) in a way that the compiler can
> prove (because it's static inline), and will discard.
>
> As you're modifying both lines anyway, just drop the attribute.

Hmm, yes, I agree for the stub, so I've dropped it there. "Several
non-const variables", however, is only partly true. These are
__ro_after_init and not written anymore once set. Are you sure the
compiler will ignore a "pure" attribute if it finds it (formally)
violated? That would be somewhat odd, as it would mean differing behavior
depending on whether the same piece of code is in an inline or
out-of-line function.

Jan
On 16/12/2022 11:59 am, Jan Beulich wrote:
> On 16.12.2022 12:49, Andrew Cooper wrote:
>> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>,
> Thanks. You realize though that the patch may change depending on the
> verdict on patch 1 (and, if that one is to change, the two would likely
> be flipped, with the actual fix moving here in the form of more relaxed
> assertions, one way or another)?

Yeah, the tweak sounded entirely reasonable.

>> pure is useless on a stub like this, whereas it's false on the non-stub
>> form (uses several non-const variables) in a way that the compiler can
>> prove (because it's static inline), and will discard.
>>
>> As you're modifying both lines anyway, just drop the attribute.
> Hmm, yes, I agree for the stub, so I've dropped it there. "Several
> non-const variables", however, is only partly true. These are
> __ro_after_init and not written anymore once set.

They're still read-write as far as C is concerned, and some of these
uses occur before the modifications finish.

> Are you sure the compiler will ignore a "pure" attribute if it finds it
> (formally) violated?

Yes, very sure. It got discussed at length on one of the speculation
lists. When the compiler can prove that the programmer doesn't know the
rules concerning pure/const, the attributes will be discarded.

To abuse the rules, you really do need the operation hidden in a place
that GCC can't see, so either in a separate translation unit or in
inline assembly.

~Andrew
--- a/xen/arch/arm/include/asm/numa.h
+++ b/xen/arch/arm/include/asm/numa.h
@@ -11,11 +11,6 @@ typedef u8 nodeid_t;
 #define cpu_to_node(cpu) 0
 #define node_to_cpumask(node) (cpu_online_map)

-static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
-{
-    return 0;
-}
-
 /*
  * TODO: make first_valid_mfn static when NUMA is supported on Arm, this
  * is required because the dummy helpers are using it.
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -492,7 +492,7 @@ p2m_pod_offline_or_broken_replace(struct
 {
     struct domain *d;
     struct p2m_domain *p2m;
-    nodeid_t node = phys_to_nid(page_to_maddr(p));
+    nodeid_t node = page_to_nid(p);

     if ( !(d = page_get_owner(p)) || !(p2m = p2m_get_hostp2m(d)) )
         return;
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -565,7 +565,7 @@ void __init paging_init(void)
             if ( n == CNT )
                 ++holes;
             else if ( k == holes )
-                memflags = MEMF_node(phys_to_nid(mfn_to_maddr(mfn)));
+                memflags = MEMF_node(mfn_to_nid(mfn));
         }
         if ( k == holes )
         {
@@ -600,7 +600,7 @@ void __init paging_init(void)
             mfn = _mfn(MFN(i) + n * PDX_GROUP_COUNT);
             if ( mfn_valid(mfn) )
             {
-                memflags = MEMF_node(phys_to_nid(mfn_to_maddr(mfn)));
+                memflags = MEMF_node(mfn_to_nid(mfn));
                 break;
             }
         }
@@ -677,7 +677,7 @@ void __init paging_init(void)
             mfn = _mfn(MFN(i) + n * PDX_GROUP_COUNT);
             if ( mfn_valid(mfn) )
             {
-                memflags = MEMF_node(phys_to_nid(mfn_to_maddr(mfn)));
+                memflags = MEMF_node(mfn_to_nid(mfn));
                 break;
             }
         }
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -671,15 +671,15 @@ static void cf_check dump_numa(unsigned

     for_each_online_node ( i )
     {
-        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
+        mfn_t mfn = _mfn(node_start_pfn(i) + 1);

         printk("NODE%u start->%lu size->%lu free->%lu\n",
                i, node_start_pfn(i), node_spanned_pages(i),
                avail_node_heap_pages(i));
-        /* Sanity check phys_to_nid() */
-        if ( phys_to_nid(pa) != i )
-            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
-                   pa, phys_to_nid(pa), i);
+        /* Sanity check mfn_to_nid() */
+        if ( node_spanned_pages(i) && mfn_to_nid(mfn) != i )
+            printk("mfn_to_nid(%"PRI_mfn") -> %d should be %u\n",
+                   mfn_x(mfn), mfn_to_nid(mfn), i);
     }

     j = cpumask_first(&cpu_online_map);
@@ -721,7 +721,7 @@ static void cf_check dump_numa(unsigned
         spin_lock(&d->page_alloc_lock);
         page_list_for_each ( page, &d->page_list )
         {
-            i = phys_to_nid(page_to_maddr(page));
+            i = page_to_nid(page);
             page_num_node[i]++;
         }
         spin_unlock(&d->page_alloc_lock);
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -971,7 +971,7 @@ static struct page_info *alloc_heap_page
         return NULL;
     }

-    node = phys_to_nid(page_to_maddr(pg));
+    node = page_to_nid(pg);
     zone = page_to_zone(pg);
     buddy_order = PFN_ORDER(pg);

@@ -1078,7 +1078,7 @@ static struct page_info *alloc_heap_page
 /* Remove any offlined page in the buddy pointed to by head. */
 static int reserve_offlined_page(struct page_info *head)
 {
-    unsigned int node = phys_to_nid(page_to_maddr(head));
+    unsigned int node = page_to_nid(head);
     int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
     struct page_info *cur_head;
     unsigned int cur_order, first_dirty;
@@ -1443,7 +1443,7 @@ static void free_heap_pages(
 {
     unsigned long mask;
     mfn_t mfn = page_to_mfn(pg);
-    unsigned int i, node = phys_to_nid(mfn_to_maddr(mfn));
+    unsigned int i, node = page_to_nid(pg);
     unsigned int zone = page_to_zone(pg);
     bool pg_offlined = false;

@@ -1487,7 +1487,7 @@ static void free_heap_pages(
                  !page_state_is(predecessor, free) ||
                  (predecessor->count_info & PGC_static) ||
                  (PFN_ORDER(predecessor) != order) ||
-                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
+                 (page_to_nid(predecessor) != node) )
                 break;

             check_and_stop_scrub(predecessor);
@@ -1511,7 +1511,7 @@ static void free_heap_pages(
                  !page_state_is(successor, free) ||
                  (successor->count_info & PGC_static) ||
                  (PFN_ORDER(successor) != order) ||
-                 (phys_to_nid(page_to_maddr(successor)) != node) )
+                 (page_to_nid(successor) != node) )
                 break;

             check_and_stop_scrub(successor);
@@ -1574,7 +1574,7 @@ static unsigned long mark_page_offline(s
 static int reserve_heap_page(struct page_info *pg)
 {
     struct page_info *head = NULL;
-    unsigned int i, node = phys_to_nid(page_to_maddr(pg));
+    unsigned int i, node = page_to_nid(pg);
     unsigned int zone = page_to_zone(pg);

     for ( i = 0; i <= MAX_ORDER; i++ )
@@ -1794,7 +1794,7 @@ static void _init_heap_pages(const struc
                              bool need_scrub)
 {
     unsigned long s, e;
-    unsigned int nid = phys_to_nid(page_to_maddr(pg));
+    unsigned int nid = page_to_nid(pg);

     s = mfn_x(page_to_mfn(pg));
     e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
@@ -1869,7 +1869,7 @@ static void init_heap_pages(
 #ifdef CONFIG_SEPARATE_XENHEAP
         unsigned int zone = page_to_zone(pg);
 #endif
-        unsigned int nid = phys_to_nid(page_to_maddr(pg));
+        unsigned int nid = page_to_nid(pg);
         unsigned long left = nr_pages - i;
         unsigned long contig_pages;

@@ -1893,7 +1893,7 @@ static void init_heap_pages(
                 break;
 #endif

-            if ( nid != (phys_to_nid(page_to_maddr(pg + contig_pages))) )
+            if ( nid != (page_to_nid(pg + contig_pages)) )
                 break;
         }

@@ -1934,7 +1934,7 @@ void __init end_boot_allocator(void)
     {
         struct bootmem_region *r = &bootmem_region_list[i];
         if ( (r->s < r->e) &&
-             (phys_to_nid(pfn_to_paddr(r->s)) == cpu_to_node(0)) )
+             (mfn_to_nid(_mfn(r->s)) == cpu_to_node(0)) )
         {
             init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s);
             r->e = r->s;
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -1,6 +1,7 @@
 #ifndef _XEN_NUMA_H
 #define _XEN_NUMA_H

+#include <xen/mm-frame.h>
 #include <asm/numa.h>

 #define NUMA_NO_NODE 0xFF
@@ -68,12 +69,15 @@ struct node_data {

 extern struct node_data node_data[];

-static inline nodeid_t __attribute_pure__ phys_to_nid(paddr_t addr)
+static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
 {
     nodeid_t nid;
-    ASSERT((paddr_to_pdx(addr) >> memnode_shift) < memnodemapsize);
-    nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
+    unsigned long pdx = mfn_to_pdx(mfn);
+
+    ASSERT((pdx >> memnode_shift) < memnodemapsize);
+    nid = memnodemap[pdx >> memnode_shift];
     ASSERT(nid < MAX_NUMNODES && node_data[nid].node_spanned_pages);
+
     return nid;
 }

@@ -102,6 +106,15 @@ extern bool numa_update_node_memblks(nod
                                      paddr_t start, paddr_t size, bool hotplug);
 extern void numa_set_processor_nodes_parsed(nodeid_t node);

+#else
+
+static inline nodeid_t __attribute_pure__ mfn_to_nid(mfn_t mfn)
+{
+    return 0;
+}
+
 #endif

+#define page_to_nid(pg) mfn_to_nid(page_to_mfn(pg))
+
 #endif /* _XEN_NUMA_H */
All callers convert frame numbers (perhaps in turn derived from struct
page_info pointers) to an address, just for the function to convert it
back to a frame number (as the first step of paddr_to_pdx()). Replace
the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
call sites by the respectively most suitable one.

While there also introduce a !NUMA stub, eliminating the need for Arm
(and potentially other ports) to carry one individually.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
At the top of free_heap_pages() mfn_to_nid() could also be used, since
the MFN is calculated immediately ahead. The choice of using
page_to_nid() (for now at least) was with the earlier patch's RFC in
mind, addressing of which may require to make mfn_to_nid() do weaker
checking than page_to_nid().