| Message ID | 20190729121204.13559-2-andrew.cooper3@citrix.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | xen/nodemask: API cleanup and fixes |
On 29.07.2019 14:11, Andrew Cooper wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -811,11 +811,18 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
>                                          const struct domain *d)
>  {
>      nodeid_t first, node = MEMF_get_node(memflags), req_node = node;
> -    nodemask_t nodemask = d ? d->node_affinity : node_online_map;
> +    nodemask_t nodemask = node_online_map;
>      unsigned int j, zone, nodemask_retry = 0;
>      struct page_info *pg;
>      bool use_unscrubbed = (memflags & MEMF_no_scrub);
>
> +    /*
> +     * d->node_affinity is our preferred allocation set if provided, but it
> +     * may have bit set outside of node_online_map.  Clamp it.

Would you mind adding the apparently missing "s" to "bit"?

> +     */
> +    if ( d )
> +        nodes_and(nodemask, nodemask, d->node_affinity);

Despite my earlier ack: Code further down assumes a non-empty mask,
which is no longer guaranteed afaics.  I think you want to append an
"intersects" check in the if().  With that feel free to promote my
A-b to R-b.

Jan
On 29/07/2019 16:48, Jan Beulich wrote:
> On 29.07.2019 14:11, Andrew Cooper wrote:
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -811,11 +811,18 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
>>                                          const struct domain *d)
>>  {
>>      nodeid_t first, node = MEMF_get_node(memflags), req_node = node;
>> -    nodemask_t nodemask = d ? d->node_affinity : node_online_map;
>> +    nodemask_t nodemask = node_online_map;
>>      unsigned int j, zone, nodemask_retry = 0;
>>      struct page_info *pg;
>>      bool use_unscrubbed = (memflags & MEMF_no_scrub);
>>
>> +    /*
>> +     * d->node_affinity is our preferred allocation set if provided, but it
>> +     * may have bit set outside of node_online_map.  Clamp it.
> Would you mind adding the apparently missing "s" to "bit"?

Oops yes.

>> +     */
>> +    if ( d )
>> +        nodes_and(nodemask, nodemask, d->node_affinity);
> Despite my earlier ack: Code further down assumes a non-empty mask,
> which is no longer guaranteed afaics.

Nothing previous guaranteed that d->node_affinity had any bits set in
it, either.

That said, in practice it is either ALL, or something derived from the
cpu=>node mappings, so I don't think this is a problem in practice.

> I think you want to append an
> "intersects" check in the if().

I think it would be better to assert that callers don't give us complete
junk.

> With that feel free to promote my
> A-b to R-b.

How about:

    if ( d )
    {
        if ( nodes_intersects(nodemask, d->node_affinity) )
            nodes_and(nodemask, nodemask, d->node_affinity);
        else
            ASSERT_UNREACHABLE();
    }

?

This change has passed my normal set of prepush checks (not that there
is anything interesting NUMA-wise in there).

~Andrew
On 29.07.2019 19:26, Andrew Cooper wrote:
> On 29/07/2019 16:48, Jan Beulich wrote:
>> On 29.07.2019 14:11, Andrew Cooper wrote:
>>> +    if ( d )
>>> +        nodes_and(nodemask, nodemask, d->node_affinity);
>> Despite my earlier ack: Code further down assumes a non-empty mask,
>> which is no longer guaranteed afaics.
>
> Nothing previous guaranteed that d->node_affinity had any bits set in
> it, either.
>
> That said, in practice it is either ALL, or something derived from the
> cpu=>node mappings, so I don't think this is a problem in practice.
>
>> I think you want to append an
>> "intersects" check in the if().
>
> I think it would be better to assert that callers don't give us complete
> junk.
>
>> With that feel free to promote my
>> A-b to R-b.
>
> How about:
>
>     if ( d )
>     {
>         if ( nodes_intersects(nodemask, d->node_affinity) )
>             nodes_and(nodemask, nodemask, d->node_affinity);
>         else
>             ASSERT_UNREACHABLE();
>     }
>
> ?
>
> This change has passed my normal set of prepush checks (not that there
> is anything interesting NUMA-wise in there).

domain_update_node_affinity() means to guarantee a non-empty mask (by
way of a similar assertion), when ->auto_node_affinity is set.  Otoh
domain_set_node_affinity() may clear that flag, at which point I can't
see what would guarantee that the intersection would remain non-empty
as CPUs get offlined.  (I don't understand, btw, why the function calls
domain_update_node_affinity() after clearing the flag.)

Therefore I don't think an assertion like you suggest would be
legitimate.  For this there would be a prereq of
domain_update_node_affinity() clearing the flag when it finds the
intersection has become empty.  And even then there would be a window
in time where the intersection might be actively empty, so some
synchronization would be needed in addition.

Jan
On 30/07/2019 09:09, Jan Beulich wrote:
> On 29.07.2019 19:26, Andrew Cooper wrote:
>> On 29/07/2019 16:48, Jan Beulich wrote:
>>> On 29.07.2019 14:11, Andrew Cooper wrote:
>>>> +    if ( d )
>>>> +        nodes_and(nodemask, nodemask, d->node_affinity);
>>> Despite my earlier ack: Code further down assumes a non-empty mask,
>>> which is no longer guaranteed afaics.
>> Nothing previous guaranteed that d->node_affinity had any bits set in
>> it, either.
>>
>> That said, in practice it is either ALL, or something derived from the
>> cpu=>node mappings, so I don't think this is a problem in practice.
>>
>>> I think you want to append an
>>> "intersects" check in the if().
>> I think it would be better to assert that callers don't give us complete
>> junk.
>>
>>> With that feel free to promote my
>>> A-b to R-b.
>> How about:
>>
>>     if ( d )
>>     {
>>         if ( nodes_intersects(nodemask, d->node_affinity) )
>>             nodes_and(nodemask, nodemask, d->node_affinity);
>>         else
>>             ASSERT_UNREACHABLE();
>>     }
>>
>> ?
>>
>> This change has passed my normal set of prepush checks (not that there
>> is anything interesting NUMA-wise in there).
> domain_update_node_affinity() means to guarantee a non-empty mask (by
> way of a similar assertion), when ->auto_node_affinity is set.  Otoh
> domain_set_node_affinity() may clear that flag, at which point I can't
> see what would guarantee that the intersection would remain non-empty
> as CPUs get offlined.

I don't see what CPU offlining has to do with anything.  There is no
such thing as taking a node out of the node_online_map, nor should there
be - even if we offline an entire socket's worth of CPUs, the memory
controller is still active and available for use.

The domain always has non-zero vCPUs, which will always result in an
intersection with node_online_map.

What is a problem is XEN_DOMCTL_setnodeaffinity being called with a node
mask which is disjoint from node_online_map to begin with.

This problematic behaviour already exists today, and I bet there is a
lot of fun to be had with that hypercall.

As a first pass,

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 9aefc2a680..57c84cdc42 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -631,8 +631,9 @@ void domain_update_node_affinity(struct domain *d)
 
 int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity)
 {
-    /* Being affine with no nodes is just wrong */
-    if ( nodes_empty(*affinity) )
+    /* Being affine with no nodes, or disjoint with the system, is wrong. */
+    if ( nodes_empty(*affinity) ||
+         !nodes_intersects(*affinity, node_online_map) )
         return -EINVAL;
 
     spin_lock(&d->node_affinity_lock);

ought to work, and make it safe to retain the ASSERT() in the heap
allocator.

~Andrew
On 30.07.2019 19:32, Andrew Cooper wrote:
> On 30/07/2019 09:09, Jan Beulich wrote:
>> On 29.07.2019 19:26, Andrew Cooper wrote:
>>> On 29/07/2019 16:48, Jan Beulich wrote:
>>>> On 29.07.2019 14:11, Andrew Cooper wrote:
>>>>> +    if ( d )
>>>>> +        nodes_and(nodemask, nodemask, d->node_affinity);
>>>> Despite my earlier ack: Code further down assumes a non-empty mask,
>>>> which is no longer guaranteed afaics.
>>> Nothing previous guaranteed that d->node_affinity had any bits set in
>>> it, either.
>>>
>>> That said, in practice it is either ALL, or something derived from the
>>> cpu=>node mappings, so I don't think this is a problem in practice.
>>>
>>>> I think you want to append an
>>>> "intersects" check in the if().
>>> I think it would be better to assert that callers don't give us complete
>>> junk.
>>>
>>>> With that feel free to promote my
>>>> A-b to R-b.
>>> How about:
>>>
>>>     if ( d )
>>>     {
>>>         if ( nodes_intersects(nodemask, d->node_affinity) )
>>>             nodes_and(nodemask, nodemask, d->node_affinity);
>>>         else
>>>             ASSERT_UNREACHABLE();
>>>     }
>>>
>>> ?
>>>
>>> This change has passed my normal set of prepush checks (not that there
>>> is anything interesting NUMA-wise in there).
>> domain_update_node_affinity() means to guarantee a non-empty mask (by
>> way of a similar assertion), when ->auto_node_affinity is set.  Otoh
>> domain_set_node_affinity() may clear that flag, at which point I can't
>> see what would guarantee that the intersection would remain non-empty
>> as CPUs get offlined.
> I don't see what CPU offlining has to do with anything.  There is no
> such thing as taking a node out of the node_online_map, nor should there
> be - even if we offline an entire socket's worth of CPUs, the memory
> controller is still active and available for use.
>
> The domain always has non-zero vCPUs, which will always result in an
> intersection with node_online_map.

Oh, right - I forgot that we (almost) never clear bits from
node_online_map.  There's one use of node_set_offline() in
memory_add() - I wonder whether we shouldn't ditch
node_set_offline() to make more visible that we don't mean
to ever clear bits there.

> What is a problem is XEN_DOMCTL_setnodeaffinity being called with a node
> mask which is disjoint from node_online_map to begin with.
>
> This problematic behaviour already exists today, and I bet there is a
> lot of fun to be had with that hypercall.
>
> As a first pass,
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 9aefc2a680..57c84cdc42 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -631,8 +631,9 @@ void domain_update_node_affinity(struct domain *d)
>
>  int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity)
>  {
> -    /* Being affine with no nodes is just wrong */
> -    if ( nodes_empty(*affinity) )
> +    /* Being affine with no nodes, or disjoint with the system, is wrong. */
> +    if ( nodes_empty(*affinity) ||
> +         !nodes_intersects(*affinity, node_online_map) )
>          return -EINVAL;

Right, and then you don't need the nodes_empty() part anymore.

With this change folded in (or as a prereq one to allow backporting) you
can add my R-b with the adjustment further up in place.

Jan
On 31/07/2019 09:22, Jan Beulich wrote:
> On 30.07.2019 19:32, Andrew Cooper wrote:
>> On 30/07/2019 09:09, Jan Beulich wrote:
>>> On 29.07.2019 19:26, Andrew Cooper wrote:
>>>> On 29/07/2019 16:48, Jan Beulich wrote:
>>>>> On 29.07.2019 14:11, Andrew Cooper wrote:
>>>>>> +    if ( d )
>>>>>> +        nodes_and(nodemask, nodemask, d->node_affinity);
>>>>> Despite my earlier ack: Code further down assumes a non-empty mask,
>>>>> which is no longer guaranteed afaics.
>>>> Nothing previous guaranteed that d->node_affinity had any bits set in
>>>> it, either.
>>>>
>>>> That said, in practice it is either ALL, or something derived from the
>>>> cpu=>node mappings, so I don't think this is a problem in practice.
>>>>
>>>>> I think you want to append an
>>>>> "intersects" check in the if().
>>>> I think it would be better to assert that callers don't give us complete
>>>> junk.
>>>>
>>>>> With that feel free to promote my
>>>>> A-b to R-b.
>>>> How about:
>>>>
>>>>     if ( d )
>>>>     {
>>>>         if ( nodes_intersects(nodemask, d->node_affinity) )
>>>>             nodes_and(nodemask, nodemask, d->node_affinity);
>>>>         else
>>>>             ASSERT_UNREACHABLE();
>>>>     }
>>>>
>>>> ?
>>>>
>>>> This change has passed my normal set of prepush checks (not that there
>>>> is anything interesting NUMA-wise in there).
>>> domain_update_node_affinity() means to guarantee a non-empty mask (by
>>> way of a similar assertion), when ->auto_node_affinity is set.  Otoh
>>> domain_set_node_affinity() may clear that flag, at which point I can't
>>> see what would guarantee that the intersection would remain non-empty
>>> as CPUs get offlined.
>> I don't see what CPU offlining has to do with anything.  There is no
>> such thing as taking a node out of the node_online_map, nor should there
>> be - even if we offline an entire socket's worth of CPUs, the memory
>> controller is still active and available for use.
>>
>> The domain always has non-zero vCPUs, which will always result in an
>> intersection with node_online_map.
> Oh, right - I forgot that we (almost) never clear bits from
> node_online_map.  There's one use of node_set_offline() in
> memory_add() - I wonder whether we shouldn't ditch
> node_set_offline() to make more visible that we don't mean
> to ever clear bits there.
>
>> What is a problem is XEN_DOMCTL_setnodeaffinity being called with a node
>> mask which is disjoint from node_online_map to begin with.
>>
>> This problematic behaviour already exists today, and I bet there is a
>> lot of fun to be had with that hypercall.
>>
>> As a first pass,
>>
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 9aefc2a680..57c84cdc42 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -631,8 +631,9 @@ void domain_update_node_affinity(struct domain *d)
>>
>>  int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity)
>>  {
>> -    /* Being affine with no nodes is just wrong */
>> -    if ( nodes_empty(*affinity) )
>> +    /* Being affine with no nodes, or disjoint with the system, is wrong. */
>> +    if ( nodes_empty(*affinity) ||
>> +         !nodes_intersects(*affinity, node_online_map) )
>>          return -EINVAL;
> Right, and then you don't need the nodes_empty() part anymore.

Good point.

> With this change folded in (or as a prereq one to allow backporting) you
> can add my R-b with the adjustment further up in place.

I think I'll fold it together into a single change.  It's directly
relevant to the subject.

~Andrew
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 44a72d0b19..efa437c7df 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -811,11 +811,18 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
                                         const struct domain *d)
 {
     nodeid_t first, node = MEMF_get_node(memflags), req_node = node;
-    nodemask_t nodemask = d ? d->node_affinity : node_online_map;
+    nodemask_t nodemask = node_online_map;
     unsigned int j, zone, nodemask_retry = 0;
     struct page_info *pg;
     bool use_unscrubbed = (memflags & MEMF_no_scrub);
 
+    /*
+     * d->node_affinity is our preferred allocation set if provided, but it
+     * may have bit set outside of node_online_map.  Clamp it.
+     */
+    if ( d )
+        nodes_and(nodemask, nodemask, d->node_affinity);
+
     if ( node == NUMA_NO_NODE )
     {
         if ( d != NULL )