diff mbox series

xen/mm: Avoid assuming PG_state_inuse == 0 in assign_pages()

Message ID 20200204133357.32101-1-julien@xen.org (mailing list archive)
State New, archived
Headers show
Series xen/mm: Avoid assuming PG_state_inuse == 0 in assign_pages() | expand

Commit Message

Julien Grall Feb. 4, 2020, 1:33 p.m. UTC
From: Julien Grall <jgrall@amazon.com>

At the moment, assign_pages() relies on PG_state_inuse to be 0. This
makes the code slightly more difficult to understand.

Rework the code to explicitly check against PG_state_inuse.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/common/page_alloc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Jan Beulich Feb. 4, 2020, 1:40 p.m. UTC | #1
On 04.02.2020 14:33, Julien Grall wrote:
> At the moment, assign_pages() relies on PG_state_inuse to be 0. This
> makes the code slightly more difficult to understand.

I can certainly see where you're coming from, but ...

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2286,10 +2286,11 @@ int assign_pages(
>      for ( i = 0; i < (1 << order); i++ )
>      {
>          ASSERT(page_get_owner(&pg[i]) == NULL);
> -        ASSERT(!pg[i].count_info);
> +        ASSERT(page_state_is(&pg[i], inuse));
> +        ASSERT(!(pg[i].count_info & (~PGC_state)));

... I think this one is better in its original form. An option
might be to put a BUILD_BUG_ON() next to it. (In no case, imo,
there should be parentheses around a negation expression.)

>          page_set_owner(&pg[i], d);
>          smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
> -        pg[i].count_info = PGC_allocated | 1;
> +        pg[i].count_info = PGC_state_inuse | PGC_allocated | 1;

As opposed to the above, I certainly appreciate the change here.

Jan
Julien Grall Feb. 4, 2020, 1:51 p.m. UTC | #2
On 04/02/2020 13:40, Jan Beulich wrote:
> On 04.02.2020 14:33, Julien Grall wrote:
>> At the moment, assign_pages() relies on PG_state_inuse to be 0. This
>> makes the code slightly more difficult to understand.
> 
> I can certainly see where you're coming from, but ...
> 
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -2286,10 +2286,11 @@ int assign_pages(
>>       for ( i = 0; i < (1 << order); i++ )
>>       {
>>           ASSERT(page_get_owner(&pg[i]) == NULL);
>> -        ASSERT(!pg[i].count_info);
>> +        ASSERT(page_state_is(&pg[i], inuse));
>> +        ASSERT(!(pg[i].count_info & (~PGC_state)));
> 
> ... I think this one is better in its original form. An option
> might be to put a BUILD_BUG_ON() next to it.

I want to avoid a BUILD_BUG_ON() if possible. I just realized, I could 
simplify to "(pg[i].count_info != PGC_state_inuse)".

Would that be more suitable?

> (In no case, imo,
> there should be parentheses around a negation expression.)

While I understand this would be valid, I find it a bit easier to read.

Cheers,
Jan Beulich Feb. 4, 2020, 3:13 p.m. UTC | #3
On 04.02.2020 14:51, Julien Grall wrote:
> 
> 
> On 04/02/2020 13:40, Jan Beulich wrote:
>> On 04.02.2020 14:33, Julien Grall wrote:
>>> At the moment, assign_pages() relies on PG_state_inuse to be 0. This
>>> makes the code slightly more difficult to understand.
>>
>> I can certainly see where you're coming from, but ...
>>
>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -2286,10 +2286,11 @@ int assign_pages(
>>>       for ( i = 0; i < (1 << order); i++ )
>>>       {
>>>           ASSERT(page_get_owner(&pg[i]) == NULL);
>>> -        ASSERT(!pg[i].count_info);
>>> +        ASSERT(page_state_is(&pg[i], inuse));
>>> +        ASSERT(!(pg[i].count_info & (~PGC_state)));
>>
>> ... I think this one is better in its original form. An option
>> might be to put a BUILD_BUG_ON() next to it.
> 
> I want to avoid a BUILD_BUG_ON() if possible. I just realized, I could 
> simplify to "(pg[i].count_info != PGC_state_inuse)".
> 
> Would that be more suitable?

Yes, certainly.

However, isn't the ASSERT() itself wrong? We don't hold the heap lock
here, so mark_page_offline() could transition the page from inuse to
offlining (and possibly also set PGC_broken on it) at any point in
time. This wasn't obvious without the two PGC_inuse uses you add, but
becomes pretty apparent with them. Of course the simple assignment
that you adjust further down then also can't be a simple assignment
anymore.

Since this would move you quite far away from simply a cosmetic
patch (the problem as a whole looks to be wider than just the one
case above), I could understand if you didn't want to fix this at
this occasion. Yet then I think the patch description shouldn't give
the impression that all else is well, but instead at least outline
the issue (for someone else to pick up, possibly me).

>> (In no case, imo,
>> there should be parentheses around a negation expression.)
> 
> While I understand this would be valid, I find it a bit easier to read.

Quite the opposite for me, I'm afraid.

Jan
Julien Grall Feb. 5, 2020, 11:24 a.m. UTC | #4
Hi Jan,

On 04/02/2020 15:13, Jan Beulich wrote:
> On 04.02.2020 14:51, Julien Grall wrote:
>>
>>
>> On 04/02/2020 13:40, Jan Beulich wrote:
>>> On 04.02.2020 14:33, Julien Grall wrote:
>>>> At the moment, assign_pages() relies on PG_state_inuse to be 0. This
>>>> makes the code slightly more difficult to understand.
>>>
>>> I can certainly see where you're coming from, but ...
>>>
>>>> --- a/xen/common/page_alloc.c
>>>> +++ b/xen/common/page_alloc.c
>>>> @@ -2286,10 +2286,11 @@ int assign_pages(
>>>>        for ( i = 0; i < (1 << order); i++ )
>>>>        {
>>>>            ASSERT(page_get_owner(&pg[i]) == NULL);
>>>> -        ASSERT(!pg[i].count_info);
>>>> +        ASSERT(page_state_is(&pg[i], inuse));
>>>> +        ASSERT(!(pg[i].count_info & (~PGC_state)));
>>>
>>> ... I think this one is better in its original form. An option
>>> might be to put a BUILD_BUG_ON() next to it.
>>
>> I want to avoid a BUILD_BUG_ON() if possible. I just realized, I could
>> simplify to "(pg[i].count_info != PGC_state_inuse)".
>>
>> Would that be more suitable?
> 
> Yes, certainly.
> 
> However, isn't the ASSERT() itself wrong? We don't hold the heap lock
> here, so mark_page_offline() could transition the page from inuse to
> offlining (and possibly also set PGC_broken on it) at any point in
> time. This wasn't obvious without the two PGC_inuse uses you add, but
> becomes pretty apparent with them. Of course the simple assignment
> that you adjust further down then also can't be a simple assignment
> anymore.

You are right, assign_pages() could race with mark_page_offline(). We 
would need to use a cmpxchg() loop to change type. If one of the page is 
getting offlined, then we would need to revert all the changes and 
return an error.

> 
> Since this would move you quite far away from simply a cosmetic
> patch (the problem as a whole looks to be wider than just the one
> case above), I could understand if you didn't want to fix this at
> this occasion. Yet then I think the patch description shouldn't give
> the impression that all else is well, but instead at least outline
> the issue (for someone else to pick up, possibly me).

I am happy to have a look at it.

Cheers,
Jan Beulich Feb. 5, 2020, 1:36 p.m. UTC | #5
On 05.02.2020 12:24, Julien Grall wrote:
> Hi Jan,
> 
> On 04/02/2020 15:13, Jan Beulich wrote:
>> On 04.02.2020 14:51, Julien Grall wrote:
>>>
>>>
>>> On 04/02/2020 13:40, Jan Beulich wrote:
>>>> On 04.02.2020 14:33, Julien Grall wrote:
>>>>> At the moment, assign_pages() relies on PG_state_inuse to be 0. This
>>>>> makes the code slightly more difficult to understand.
>>>>
>>>> I can certainly see where you're coming from, but ...
>>>>
>>>>> --- a/xen/common/page_alloc.c
>>>>> +++ b/xen/common/page_alloc.c
>>>>> @@ -2286,10 +2286,11 @@ int assign_pages(
>>>>>        for ( i = 0; i < (1 << order); i++ )
>>>>>        {
>>>>>            ASSERT(page_get_owner(&pg[i]) == NULL);
>>>>> -        ASSERT(!pg[i].count_info);
>>>>> +        ASSERT(page_state_is(&pg[i], inuse));
>>>>> +        ASSERT(!(pg[i].count_info & (~PGC_state)));
>>>>
>>>> ... I think this one is better in its original form. An option
>>>> might be to put a BUILD_BUG_ON() next to it.
>>>
>>> I want to avoid a BUILD_BUG_ON() if possible. I just realized, I could
>>> simplify to "(pg[i].count_info != PGC_state_inuse)".
>>>
>>> Would that be more suitable?
>>
>> Yes, certainly.
>>
>> However, isn't the ASSERT() itself wrong? We don't hold the heap lock
>> here, so mark_page_offline() could transition the page from inuse to
>> offlining (and possibly also set PGC_broken on it) at any point in
>> time. This wasn't obvious without the two PGC_inuse uses you add, but
>> becomes pretty apparent with them. Of course the simple assignment
>> that you adjust further down then also can't be a simple assignment
>> anymore.
> 
> You are right, assign_pages() could race with mark_page_offline(). We 
> would need to use a cmpxchg() loop to change type. If one of the page is 
> getting offlined, then we would need to revert all the changes and 
> return an error.

I'm not sure we need to go this far. The change of page state
happening behind assign_pages()' back is no different from it
happening after assign_pages() is done. All we need to make sure is
that we don't clobber the state change.

I'm also not sure a cmpxchg loop is needed here. Acquiring and
releasing the heap lock may do, too. You'll find an example of this
elsewhere in the same file, iirc.

Jan
Julien Grall Feb. 5, 2020, 2:26 p.m. UTC | #6
Hi Jan,

On 05/02/2020 13:36, Jan Beulich wrote:
> On 05.02.2020 12:24, Julien Grall wrote:
>> Hi Jan,
>>
>> On 04/02/2020 15:13, Jan Beulich wrote:
>>> On 04.02.2020 14:51, Julien Grall wrote:
>>>>
>>>>
>>>> On 04/02/2020 13:40, Jan Beulich wrote:
>>>>> On 04.02.2020 14:33, Julien Grall wrote:
>>>>>> At the moment, assign_pages() relies on PG_state_inuse to be 0. This
>>>>>> makes the code slightly more difficult to understand.
>>>>>
>>>>> I can certainly see where you're coming from, but ...
>>>>>
>>>>>> --- a/xen/common/page_alloc.c
>>>>>> +++ b/xen/common/page_alloc.c
>>>>>> @@ -2286,10 +2286,11 @@ int assign_pages(
>>>>>>         for ( i = 0; i < (1 << order); i++ )
>>>>>>         {
>>>>>>             ASSERT(page_get_owner(&pg[i]) == NULL);
>>>>>> -        ASSERT(!pg[i].count_info);
>>>>>> +        ASSERT(page_state_is(&pg[i], inuse));
>>>>>> +        ASSERT(!(pg[i].count_info & (~PGC_state)));
>>>>>
>>>>> ... I think this one is better in its original form. An option
>>>>> might be to put a BUILD_BUG_ON() next to it.
>>>>
>>>> I want to avoid a BUILD_BUG_ON() if possible. I just realized, I could
>>>> simplify to "(pg[i].count_info != PGC_state_inuse)".
>>>>
>>>> Would that be more suitable?
>>>
>>> Yes, certainly.
>>>
>>> However, isn't the ASSERT() itself wrong? We don't hold the heap lock
>>> here, so mark_page_offline() could transition the page from inuse to
>>> offlining (and possibly also set PGC_broken on it) at any point in
>>> time. This wasn't obvious without the two PGC_inuse uses you add, but
>>> becomes pretty apparent with them. Of course the simple assignment
>>> that you adjust further down then also can't be a simple assignment
>>> anymore.
>>
>> You are right, assign_pages() could race with mark_page_offline(). We
>> would need to use a cmpxchg() loop to change type. If one of the page is
>> getting offlined, then we would need to revert all the changes and
>> return an error.
> 
> I'm not sure we need to go this far. The change of page state
> happening behind assign_pages()' back is no different from it
> happening after assign_pages() is done.

There will be a slight difference for the offline code. After 
assign_pages() we would return PG_OFFLINE_OWNED... while in the middle 
of assign_pages() we would return PG_OFFLINE_ANYMOUS.

I was concern of the interaction. But it looks like this was taken care 
per the comment in offline_pages (though the comment is slightly confusing).

Anyway, try to revert the change would be real pain. So we can avoid it 
then it is much better :).

> All we need to make sure is
> that we don't clobber the state change.
> 
> I'm also not sure a cmpxchg loop is needed here. Acquiring and
> releasing the heap lock may do, too. You'll find an example of this
> elsewhere in the same file, iirc.

I would have thought the cmpxchg() would be your preference. Anyway, if 
you are happy with the lock, then this is better and less code.

The locking ordering would also be fine as domain_adjust_tot_pages() 
already request the d->page_alloc_lock to be taken first.

Cheers,
diff mbox series

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 97902d42c1..54773bc42f 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2286,10 +2286,11 @@  int assign_pages(
     for ( i = 0; i < (1 << order); i++ )
     {
         ASSERT(page_get_owner(&pg[i]) == NULL);
-        ASSERT(!pg[i].count_info);
+        ASSERT(page_state_is(&pg[i], inuse));
+        ASSERT(!(pg[i].count_info & (~PGC_state)));
         page_set_owner(&pg[i], d);
         smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
-        pg[i].count_info = PGC_allocated | 1;
+        pg[i].count_info = PGC_state_inuse | PGC_allocated | 1;
         page_list_add_tail(&pg[i], &d->page_list);
     }