Message ID | 20200204133357.32101-1-julien@xen.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xen/mm: Avoid assuming PG_state_inuse == 0 in assign_pages() | expand |
On 04.02.2020 14:33, Julien Grall wrote: > At the moment, assign_pages() relies on PG_state_inuse to be 0. This > makes the code slightly more difficult to understand. I can certainly see where you're coming from, but ... > --- a/xen/common/page_alloc.c > +++ b/xen/common/page_alloc.c > @@ -2286,10 +2286,11 @@ int assign_pages( > for ( i = 0; i < (1 << order); i++ ) > { > ASSERT(page_get_owner(&pg[i]) == NULL); > - ASSERT(!pg[i].count_info); > + ASSERT(page_state_is(&pg[i], inuse)); > + ASSERT(!(pg[i].count_info & (~PGC_state))); ... I think this one is better in its original form. An option might be to put a BUILD_BUG_ON() next to it. (In no case, imo, there should be parentheses around a negation expression.) > page_set_owner(&pg[i], d); > smp_wmb(); /* Domain pointer must be visible before updating refcnt. */ > - pg[i].count_info = PGC_allocated | 1; > + pg[i].count_info = PGC_state_inuse | PGC_allocated | 1; As opposed to the above, I certainly appreciate the change here. Jan
On 04/02/2020 13:40, Jan Beulich wrote: > On 04.02.2020 14:33, Julien Grall wrote: >> At the moment, assign_pages() relies on PG_state_inuse to be 0. This >> makes the code slightly more difficult to understand. > > I can certainly see where you're coming from, but ... > >> --- a/xen/common/page_alloc.c >> +++ b/xen/common/page_alloc.c >> @@ -2286,10 +2286,11 @@ int assign_pages( >> for ( i = 0; i < (1 << order); i++ ) >> { >> ASSERT(page_get_owner(&pg[i]) == NULL); >> - ASSERT(!pg[i].count_info); >> + ASSERT(page_state_is(&pg[i], inuse)); >> + ASSERT(!(pg[i].count_info & (~PGC_state))); > > ... I think this one is better in its original form. An option > might be to put a BUILD_BUG_ON() next to it. I want to avoid a BUILD_BUG_ON() if possible. I just realized, I could simplify to "(pg[i].count_info != PGC_state_inuse)". Would that be more suitable? > (In no case, imo, > there should be parentheses around a negation expression.) While I understand this would be valid, I find it a bit easier to read. Cheers,
On 04.02.2020 14:51, Julien Grall wrote: > > > On 04/02/2020 13:40, Jan Beulich wrote: >> On 04.02.2020 14:33, Julien Grall wrote: >>> At the moment, assign_pages() relies on PG_state_inuse to be 0. This >>> makes the code slightly more difficult to understand. >> >> I can certainly see where you're coming from, but ... >> >>> --- a/xen/common/page_alloc.c >>> +++ b/xen/common/page_alloc.c >>> @@ -2286,10 +2286,11 @@ int assign_pages( >>> for ( i = 0; i < (1 << order); i++ ) >>> { >>> ASSERT(page_get_owner(&pg[i]) == NULL); >>> - ASSERT(!pg[i].count_info); >>> + ASSERT(page_state_is(&pg[i], inuse)); >>> + ASSERT(!(pg[i].count_info & (~PGC_state))); >> >> ... I think this one is better in its original form. An option >> might be to put a BUILD_BUG_ON() next to it. > > I want to avoid a BUILD_BUG_ON() if possible. I just realized, I could > simplify to "(pg[i].count_info != PGC_state_inuse)". > > Would that be more suitable? Yes, certainly. However, isn't the ASSERT() itself wrong? We don't hold the heap lock here, so mark_page_offline() could transition the page from inuse to offlining (and possibly also set PGC_broken on it) at any point in time. This wasn't obvious without the two PGC_inuse uses you add, but becomes pretty apparent with them. Of course the simple assignment that you adjust further down then also can't be a simple assignment anymore. Since this would move you quite far away from simply a cosmetic patch (the problem as a whole looks to be wider than just the one case above), I could understand if you didn't want to fix this at this occasion. Yet then I think the patch description shouldn't give the impression that all else is well, but instead at least outline the issue (for someone else to pick up, possibly me). >> (In no case, imo, >> there should be parentheses around a negation expression.) > > While I understand this would be valid, I find it a bit easier to read. Quite the opposite for me, I'm afraid. Jan
Hi Jan, On 04/02/2020 15:13, Jan Beulich wrote: > On 04.02.2020 14:51, Julien Grall wrote: >> >> >> On 04/02/2020 13:40, Jan Beulich wrote: >>> On 04.02.2020 14:33, Julien Grall wrote: >>>> At the moment, assign_pages() relies on PG_state_inuse to be 0. This >>>> makes the code slightly more difficult to understand. >>> >>> I can certainly see where you're coming from, but ... >>> >>>> --- a/xen/common/page_alloc.c >>>> +++ b/xen/common/page_alloc.c >>>> @@ -2286,10 +2286,11 @@ int assign_pages( >>>> for ( i = 0; i < (1 << order); i++ ) >>>> { >>>> ASSERT(page_get_owner(&pg[i]) == NULL); >>>> - ASSERT(!pg[i].count_info); >>>> + ASSERT(page_state_is(&pg[i], inuse)); >>>> + ASSERT(!(pg[i].count_info & (~PGC_state))); >>> >>> ... I think this one is better in its original form. An option >>> might be to put a BUILD_BUG_ON() next to it. >> >> I want to avoid a BUILD_BUG_ON() if possible. I just realized, I could >> simplify to "(pg[i].count_info != PGC_state_inuse)". >> >> Would that be more suitable? > > Yes, certainly. > > However, isn't the ASSERT() itself wrong? We don't hold the heap lock > here, so mark_page_offline() could transition the page from inuse to > offlining (and possibly also set PGC_broken on it) at any point in > time. This wasn't obvious without the two PGC_inuse uses you add, but > becomes pretty apparent with them. Of course the simple assignment > that you adjust further down then also can't be a simple assignment > anymore. You are right, assign_pages() could race with mark_page_offline(). We would need to use a cmpxchg() loop to change type. If one of the page is getting offlined, then we would need to revert all the changes and return an error. > > Since this would move you quite far away from simply a cosmetic > patch (the problem as a whole looks to be wider than just the one > case above), I could understand if you didn't want to fix this at > this occasion. Yet then I think the patch description shouldn't give > the impression that all else is well, but instead at least outline > the issue (for someone else to pick up, possibly me). I am happy to have a look at it. Cheers,
On 05.02.2020 12:24, Julien Grall wrote: > Hi Jan, > > On 04/02/2020 15:13, Jan Beulich wrote: >> On 04.02.2020 14:51, Julien Grall wrote: >>> >>> >>> On 04/02/2020 13:40, Jan Beulich wrote: >>>> On 04.02.2020 14:33, Julien Grall wrote: >>>>> At the moment, assign_pages() relies on PG_state_inuse to be 0. This >>>>> makes the code slightly more difficult to understand. >>>> >>>> I can certainly see where you're coming from, but ... >>>> >>>>> --- a/xen/common/page_alloc.c >>>>> +++ b/xen/common/page_alloc.c >>>>> @@ -2286,10 +2286,11 @@ int assign_pages( >>>>> for ( i = 0; i < (1 << order); i++ ) >>>>> { >>>>> ASSERT(page_get_owner(&pg[i]) == NULL); >>>>> - ASSERT(!pg[i].count_info); >>>>> + ASSERT(page_state_is(&pg[i], inuse)); >>>>> + ASSERT(!(pg[i].count_info & (~PGC_state))); >>>> >>>> ... I think this one is better in its original form. An option >>>> might be to put a BUILD_BUG_ON() next to it. >>> >>> I want to avoid a BUILD_BUG_ON() if possible. I just realized, I could >>> simplify to "(pg[i].count_info != PGC_state_inuse)". >>> >>> Would that be more suitable? >> >> Yes, certainly. >> >> However, isn't the ASSERT() itself wrong? We don't hold the heap lock >> here, so mark_page_offline() could transition the page from inuse to >> offlining (and possibly also set PGC_broken on it) at any point in >> time. This wasn't obvious without the two PGC_inuse uses you add, but >> becomes pretty apparent with them. Of course the simple assignment >> that you adjust further down then also can't be a simple assignment >> anymore. > > You are right, assign_pages() could race with mark_page_offline(). We > would need to use a cmpxchg() loop to change type. If one of the page is > getting offlined, then we would need to revert all the changes and > return an error. I'm not sure we need to go this far. The change of page state happening behind assign_pages()' back is no different from it happening after assign_pages() is done. All we need to make sure is that we don't clobber the state change. I'm also not sure a cmpxchg loop is needed here. Acquiring and releasing the heap lock may do, too. You'll find an example of this elsewhere in the same file, iirc. Jan
Hi Jan, On 05/02/2020 13:36, Jan Beulich wrote: > On 05.02.2020 12:24, Julien Grall wrote: >> Hi Jan, >> >> On 04/02/2020 15:13, Jan Beulich wrote: >>> On 04.02.2020 14:51, Julien Grall wrote: >>>> >>>> >>>> On 04/02/2020 13:40, Jan Beulich wrote: >>>>> On 04.02.2020 14:33, Julien Grall wrote: >>>>>> At the moment, assign_pages() relies on PG_state_inuse to be 0. This >>>>>> makes the code slightly more difficult to understand. >>>>> >>>>> I can certainly see where you're coming from, but ... >>>>> >>>>>> --- a/xen/common/page_alloc.c >>>>>> +++ b/xen/common/page_alloc.c >>>>>> @@ -2286,10 +2286,11 @@ int assign_pages( >>>>>> for ( i = 0; i < (1 << order); i++ ) >>>>>> { >>>>>> ASSERT(page_get_owner(&pg[i]) == NULL); >>>>>> - ASSERT(!pg[i].count_info); >>>>>> + ASSERT(page_state_is(&pg[i], inuse)); >>>>>> + ASSERT(!(pg[i].count_info & (~PGC_state))); >>>>> >>>>> ... I think this one is better in its original form. An option >>>>> might be to put a BUILD_BUG_ON() next to it. >>>> >>>> I want to avoid a BUILD_BUG_ON() if possible. I just realized, I could >>>> simplify to "(pg[i].count_info != PGC_state_inuse)". >>>> >>>> Would that be more suitable? >>> >>> Yes, certainly. >>> >>> However, isn't the ASSERT() itself wrong? We don't hold the heap lock >>> here, so mark_page_offline() could transition the page from inuse to >>> offlining (and possibly also set PGC_broken on it) at any point in >>> time. This wasn't obvious without the two PGC_inuse uses you add, but >>> becomes pretty apparent with them. Of course the simple assignment >>> that you adjust further down then also can't be a simple assignment >>> anymore. >> >> You are right, assign_pages() could race with mark_page_offline(). We >> would need to use a cmpxchg() loop to change type. If one of the page is >> getting offlined, then we would need to revert all the changes and >> return an error. > > I'm not sure we need to go this far. The change of page state > happening behind assign_pages()' back is no different from it > happening after assign_pages() is done. There will be a slight difference for the offline code. After assign_pages() we would return PG_OFFLINE_OWNED... while in the middle of assign_pages() we would return PG_OFFLINE_ANYMOUS. I was concern of the interaction. But it looks like this was taken care per the comment in offline_pages (though the comment is slightly confusing). Anyway, try to revert the change would be real pain. So we can avoid it then it is much better :). > All we need to make sure is > that we don't clobber the state change. > > I'm also not sure a cmpxchg loop is needed here. Acquiring and > releasing the heap lock may do, too. You'll find an example of this > elsewhere in the same file, iirc. I would have thought the cmpxchg() would be your preference. Anyway, if you are happy with the lock, then this is better and less code. The locking ordering would also be fine as domain_adjust_tot_pages() already request the d->page_alloc_lock to be taken first. Cheers,
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index 97902d42c1..54773bc42f 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -2286,10 +2286,11 @@ int assign_pages( for ( i = 0; i < (1 << order); i++ ) { ASSERT(page_get_owner(&pg[i]) == NULL); - ASSERT(!pg[i].count_info); + ASSERT(page_state_is(&pg[i], inuse)); + ASSERT(!(pg[i].count_info & (~PGC_state))); page_set_owner(&pg[i], d); smp_wmb(); /* Domain pointer must be visible before updating refcnt. */ - pg[i].count_info = PGC_allocated | 1; + pg[i].count_info = PGC_state_inuse | PGC_allocated | 1; page_list_add_tail(&pg[i], &d->page_list); }