diff mbox series

[v2] xen/mm: Avoid assuming the page is inuse in assign_pages()

Message ID 20200206103833.15355-1-julien@xen.org (mailing list archive)
State New, archived
Headers show
Series [v2] xen/mm: Avoid assuming the page is inuse in assign_pages() | expand

Commit Message

Julien Grall Feb. 6, 2020, 10:38 a.m. UTC
From: Julien Grall <jgrall@amazon.com>

At the moment, assign_pages() on the page to be inuse (PGC_state_inuse)
and the state value to be 0.

However, the code may race with the page offlining code (see
offline_page()). Depending on the ordering, the page may be in offlining
state (PGC_state_offlining) before it is assigned to a domain.

On debug build, this may result to hit the assert or just clobber the
state. On non-debug build, the state will get clobbered.

Incidentally the flag PGC_broken will get clobbered as well.

Grab the heap_lock to prevent a race with offline_page() and keep the
state and broken flag around.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---
    Changes in v2:
        - Superseed <20200204133357.32101-1-julien@xen.org>
        - Fix the race with offline_page()
---
 xen/common/page_alloc.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Durrant, Paul Feb. 6, 2020, 10:52 a.m. UTC | #1
> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 06 February 2020 10:39
> To: xen-devel@lists.xenproject.org
> Cc: julien@xen.org; Durrant, Paul <pdurrant@amazon.co.uk>; Grall, Julien
> <jgrall@amazon.com>
> Subject: [PATCH v2] xen/mm: Avoid assuming the page is inuse in
> assign_pages()
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment, assign_pages() on the page to be inuse (PGC_state_inuse)
> and the state value to be 0.
> 
> However, the code may race with the page offlining code (see
> offline_page()). Depending on the ordering, the page may be in offlining
> state (PGC_state_offlining) before it is assigned to a domain.
> 
> On debug build, this may result to hit the assert or just clobber the
> state. On non-debug build, the state will get clobbered.
> 
> Incidentally the flag PGC_broken will get clobbered as well.
> 
> Grab the heap_lock to prevent a race with offline_page() and keep the
> state and broken flag around.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

This seems like a reasonable change. I guess having assign_pages() take the global lock is no more problem than its existing call to domain_adjust_tot_pages() which also takes the same lock.

  Paul

> 
> ---
>     Changes in v2:
>         - Superseed <20200204133357.32101-1-julien@xen.org>
>         - Fix the race with offline_page()
> ---
>  xen/common/page_alloc.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 97902d42c1..a684dbf37c 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2283,15 +2283,27 @@ int assign_pages(
>              get_knownalive_domain(d);
>      }
> 
> +    spin_lock(&heap_lock);
>      for ( i = 0; i < (1 << order); i++ )
>      {
> +        /*
> +         * We should only be here if the page is inuse or offlining.
> +         * The latter happen if we race with mark_page_offline() as we
> +         * don't hold the heap_lock.
> +         */
> +        ASSERT(page_state_is(&pg[i], inuse) ||
> +               page_state_is(&pg[i], offlining));
> +        ASSERT(!(pg[i].count_info & ~(PGC_state | PGC_broken)));
>          ASSERT(page_get_owner(&pg[i]) == NULL);
> -        ASSERT(!pg[i].count_info);
>          page_set_owner(&pg[i], d);
>          smp_wmb(); /* Domain pointer must be visible before updating
> refcnt. */
> -        pg[i].count_info = PGC_allocated | 1;
> +
> +        pg[i].count_info &= PGC_state | PGC_broken;
> +        pg[i].count_info |= PGC_allocated | 1;
> +
>          page_list_add_tail(&pg[i], &d->page_list);
>      }
> +    spin_unlock(&heap_lock);
> 
>   out:
>      spin_unlock(&d->page_alloc_lock);
> --
> 2.17.1
Julien Grall Feb. 6, 2020, 11:16 a.m. UTC | #2
Hi Paul,

On 06/02/2020 10:52, Durrant, Paul wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 06 February 2020 10:39
>> To: xen-devel@lists.xenproject.org
>> Cc: julien@xen.org; Durrant, Paul <pdurrant@amazon.co.uk>; Grall, Julien
>> <jgrall@amazon.com>
>> Subject: [PATCH v2] xen/mm: Avoid assuming the page is inuse in
>> assign_pages()
>>
>> From: Julien Grall <jgrall@amazon.com>
>>
>> At the moment, assign_pages() on the page to be inuse (PGC_state_inuse)
>> and the state value to be 0.
>>
>> However, the code may race with the page offlining code (see
>> offline_page()). Depending on the ordering, the page may be in offlining
>> state (PGC_state_offlining) before it is assigned to a domain.
>>
>> On debug build, this may result to hit the assert or just clobber the
>> state. On non-debug build, the state will get clobbered.
>>
>> Incidentally the flag PGC_broken will get clobbered as well.
>>
>> Grab the heap_lock to prevent a race with offline_page() and keep the
>> state and broken flag around.
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> This seems like a reasonable change. I guess having assign_pages() take the global lock is no more problem than its existing call to domain_adjust_tot_pages() which also takes the same lock.

That's my understanding. Summarizing our discussion IRL for the other, 
it is not clear whether the lock is enough here.

 From my understanding the sequence

pg[i].count_info &= ...;
pg[i].count_info |= ...;

could result to multiple read/write from the compiler. We could use a 
single assignment, but I still don't think this prevent the compiler to 
be use multiple read/write.

The concern would be a race with get_page_owner_and_reference(). If 1 is 
set before the rest of the bits, then you may be able to get the page.

So I might want to use write_atomic() below. Any opinion?

Cheers,
Durrant, Paul Feb. 6, 2020, 11:44 a.m. UTC | #3
> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 06 February 2020 11:17
> To: Durrant, Paul <pdurrant@amazon.co.uk>; xen-devel@lists.xenproject.org
> Cc: Grall, Julien <jgrall@amazon.com>
> Subject: Re: [PATCH v2] xen/mm: Avoid assuming the page is inuse in
> assign_pages()
> 
> Hi Paul,
> 
> On 06/02/2020 10:52, Durrant, Paul wrote:
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 06 February 2020 10:39
> >> To: xen-devel@lists.xenproject.org
> >> Cc: julien@xen.org; Durrant, Paul <pdurrant@amazon.co.uk>; Grall,
> Julien
> >> <jgrall@amazon.com>
> >> Subject: [PATCH v2] xen/mm: Avoid assuming the page is inuse in
> >> assign_pages()
> >>
> >> From: Julien Grall <jgrall@amazon.com>
> >>
> >> At the moment, assign_pages() on the page to be inuse (PGC_state_inuse)
> >> and the state value to be 0.
> >>
> >> However, the code may race with the page offlining code (see
> >> offline_page()). Depending on the ordering, the page may be in
> offlining
> >> state (PGC_state_offlining) before it is assigned to a domain.
> >>
> >> On debug build, this may result to hit the assert or just clobber the
> >> state. On non-debug build, the state will get clobbered.
> >>
> >> Incidentally the flag PGC_broken will get clobbered as well.
> >>
> >> Grab the heap_lock to prevent a race with offline_page() and keep the
> >> state and broken flag around.
> >>
> >> Signed-off-by: Julien Grall <jgrall@amazon.com>
> >
> > This seems like a reasonable change. I guess having assign_pages() take
> the global lock is no more problem than its existing call to
> domain_adjust_tot_pages() which also takes the same lock.
> 
> That's my understanding. Summarizing our discussion IRL for the other,
> it is not clear whether the lock is enough here.
> 
>  From my understanding the sequence
> 
> pg[i].count_info &= ...;
> pg[i].count_info |= ...;
> 
> could result to multiple read/write from the compiler. We could use a
> single assignment, but I still don't think this prevent the compiler to
> be use multiple read/write.
> 
> The concern would be a race with get_page_owner_and_reference(). If 1 is
> set before the rest of the bits, then you may be able to get the page.
> 
> So I might want to use write_atomic() below. Any opinion?
> 

TBH I wonder if we ought to say that any update to count_info ought to be done by a write_atomic (where it's not already done by cmpxchg).

  Paul

> Cheers,
> 
> --
> Julien Grall
Jan Beulich Feb. 6, 2020, 12:57 p.m. UTC | #4
On 06.02.2020 12:44, Durrant, Paul wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 06 February 2020 11:17
>> To: Durrant, Paul <pdurrant@amazon.co.uk>; xen-devel@lists.xenproject.org
>> Cc: Grall, Julien <jgrall@amazon.com>
>> Subject: Re: [PATCH v2] xen/mm: Avoid assuming the page is inuse in
>> assign_pages()
>>
>> Hi Paul,
>>
>> On 06/02/2020 10:52, Durrant, Paul wrote:
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 06 February 2020 10:39
>>>> To: xen-devel@lists.xenproject.org
>>>> Cc: julien@xen.org; Durrant, Paul <pdurrant@amazon.co.uk>; Grall,
>> Julien
>>>> <jgrall@amazon.com>
>>>> Subject: [PATCH v2] xen/mm: Avoid assuming the page is inuse in
>>>> assign_pages()
>>>>
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> At the moment, assign_pages() on the page to be inuse (PGC_state_inuse)
>>>> and the state value to be 0.
>>>>
>>>> However, the code may race with the page offlining code (see
>>>> offline_page()). Depending on the ordering, the page may be in
>> offlining
>>>> state (PGC_state_offlining) before it is assigned to a domain.
>>>>
>>>> On debug build, this may result to hit the assert or just clobber the
>>>> state. On non-debug build, the state will get clobbered.
>>>>
>>>> Incidentally the flag PGC_broken will get clobbered as well.
>>>>
>>>> Grab the heap_lock to prevent a race with offline_page() and keep the
>>>> state and broken flag around.
>>>>
>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>
>>> This seems like a reasonable change. I guess having assign_pages() take
>> the global lock is no more problem than its existing call to
>> domain_adjust_tot_pages() which also takes the same lock.
>>
>> That's my understanding. Summarizing our discussion IRL for the other,
>> it is not clear whether the lock is enough here.
>>
>>  From my understanding the sequence
>>
>> pg[i].count_info &= ...;
>> pg[i].count_info |= ...;
>>
>> could result to multiple read/write from the compiler. We could use a
>> single assignment, but I still don't think this prevent the compiler to
>> be use multiple read/write.
>>
>> The concern would be a race with get_page_owner_and_reference(). If 1 is
>> set before the rest of the bits, then you may be able to get the page.
>>
>> So I might want to use write_atomic() below. Any opinion?
>>
> 
> TBH I wonder if we ought to say that any update to count_info ought to
> be done by a write_atomic (where it's not already done by cmpxchg).

I agree.

Jan
Jan Beulich Feb. 6, 2020, 1:01 p.m. UTC | #5
On 06.02.2020 11:38, Julien Grall wrote:
> However, the code may race with the page offlining code (see
> offline_page()). Depending on the ordering, the page may be in offlining
> state (PGC_state_offlining) before it is assigned to a domain.
> 
> On debug build, this may result to hit the assert or just clobber the
> state. On non-debug build, the state will get clobbered.
> 
> Incidentally the flag PGC_broken will get clobbered as well.

As mentioned when I first pointed out this issue, it is wider than
just assign_pages() afaict, which is specifically why I said I
wouldn't expect you to want to deal with it alongside the "implicit
inuse" aspect. Fixing just one instance of it without also
addressing the others isn't going to help. IOW you could leave the
code the way it was in v1 in this regard, and then we (you, me, or
yet someone else) take care of the race aspect globally for the
tree.

Jan
Andrew Cooper Feb. 6, 2020, 2:01 p.m. UTC | #6
On 06/02/2020 12:57, Jan Beulich wrote:
> On 06.02.2020 12:44, Durrant, Paul wrote:
>>> -----Original Message-----
>>> From: Julien Grall <julien@xen.org>
>>> Sent: 06 February 2020 11:17
>>> To: Durrant, Paul <pdurrant@amazon.co.uk>; xen-devel@lists.xenproject.org
>>> Cc: Grall, Julien <jgrall@amazon.com>
>>> Subject: Re: [PATCH v2] xen/mm: Avoid assuming the page is inuse in
>>> assign_pages()
>>>
>>> Hi Paul,
>>>
>>> On 06/02/2020 10:52, Durrant, Paul wrote:
>>>>> -----Original Message-----
>>>>> From: Julien Grall <julien@xen.org>
>>>>> Sent: 06 February 2020 10:39
>>>>> To: xen-devel@lists.xenproject.org
>>>>> Cc: julien@xen.org; Durrant, Paul <pdurrant@amazon.co.uk>; Grall,
>>> Julien
>>>>> <jgrall@amazon.com>
>>>>> Subject: [PATCH v2] xen/mm: Avoid assuming the page is inuse in
>>>>> assign_pages()
>>>>>
>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> At the moment, assign_pages() on the page to be inuse (PGC_state_inuse)
>>>>> and the state value to be 0.
>>>>>
>>>>> However, the code may race with the page offlining code (see
>>>>> offline_page()). Depending on the ordering, the page may be in
>>> offlining
>>>>> state (PGC_state_offlining) before it is assigned to a domain.
>>>>>
>>>>> On debug build, this may result to hit the assert or just clobber the
>>>>> state. On non-debug build, the state will get clobbered.
>>>>>
>>>>> Incidentally the flag PGC_broken will get clobbered as well.
>>>>>
>>>>> Grab the heap_lock to prevent a race with offline_page() and keep the
>>>>> state and broken flag around.
>>>>>
>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>> This seems like a reasonable change. I guess having assign_pages() take
>>> the global lock is no more problem than its existing call to
>>> domain_adjust_tot_pages() which also takes the same lock.
>>>
>>> That's my understanding. Summarizing our discussion IRL for the other,
>>> it is not clear whether the lock is enough here.
>>>
>>>  From my understanding the sequence
>>>
>>> pg[i].count_info &= ...;
>>> pg[i].count_info |= ...;
>>>
>>> could result to multiple read/write from the compiler. We could use a
>>> single assignment, but I still don't think this prevent the compiler to
>>> be use multiple read/write.
>>>
>>> The concern would be a race with get_page_owner_and_reference(). If 1 is
>>> set before the rest of the bits, then you may be able to get the page.
>>>
>>> So I might want to use write_atomic() below. Any opinion?
>>>
>> TBH I wonder if we ought to say that any update to count_info ought to
>> be done by a write_atomic (where it's not already done by cmpxchg).
> I agree.

It won't fix anything, and gives the compiler a harder time.

write_atomic() is a mov instruction.  It prohibits the use of and/or
$imm, mem encodings.

If multiple reads/writes are a concern then the only valid code
generation is for *every* modification of count info to use locked
operations.

Swapping regular C for a single mov instruction here is not going to fix
a bug, if such a bug exists.

~Andrew
diff mbox series

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 97902d42c1..a684dbf37c 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2283,15 +2283,27 @@  int assign_pages(
             get_knownalive_domain(d);
     }
 
+    spin_lock(&heap_lock);
     for ( i = 0; i < (1 << order); i++ )
     {
+        /*
+         * We should only be here if the page is inuse or offlining.
+         * The latter happen if we race with mark_page_offline() as we
+         * don't hold the heap_lock.
+         */
+        ASSERT(page_state_is(&pg[i], inuse) ||
+               page_state_is(&pg[i], offlining));
+        ASSERT(!(pg[i].count_info & ~(PGC_state | PGC_broken)));
         ASSERT(page_get_owner(&pg[i]) == NULL);
-        ASSERT(!pg[i].count_info);
         page_set_owner(&pg[i], d);
         smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
-        pg[i].count_info = PGC_allocated | 1;
+
+        pg[i].count_info &= PGC_state | PGC_broken;
+        pg[i].count_info |= PGC_allocated | 1;
+
         page_list_add_tail(&pg[i], &d->page_list);
     }
+    spin_unlock(&heap_lock);
 
  out:
     spin_unlock(&d->page_alloc_lock);