[v2] xen/x86: clear per cpu stub page information in cpu_smpboot_free()

Message ID 20200108143439.25580-1-jgross@suse.com (mailing list archive)
State New, archived

Commit Message

Jürgen Groß Jan. 8, 2020, 2:34 p.m. UTC
cpu_smpboot_free() removes the stubs for the cpu going offline, but it
isn't clearing the related percpu variables. This will result in
crashes when a stub page is released because all related cpus have
gone offline and one of those cpus comes online again later.

Fix that by clearing stubs.addr and stubs.mfn in order to allocate a
new stub page when needed.

Fixes: 2e6c8f182c9c50 ("x86: distinguish CPU offlining from CPU removal")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
---
 xen/arch/x86/smpboot.c | 2 ++
 1 file changed, 2 insertions(+)
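
The failure mode described in the commit message can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the Xen code: every `toy_*` name, the page pool, and the rule that a non-zero recorded mfn is reused instead of allocating a fresh page are hypothetical simplifications chosen to mirror the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NR_PAGES 8   /* toy page pool; frame 0 is reserved as "no page" */
#define TOY_NR_CPUS  4   /* CPUs sharing one stub page in this model */

struct toy_stubs {
    unsigned long addr;  /* non-zero while the CPU has a stub mapping */
    unsigned long mfn;   /* frame number of the stub page, 0 if none */
};

static bool toy_page_in_use[TOY_NR_PAGES];
static bool toy_cpu_online[TOY_NR_CPUS];
static struct toy_stubs toy_percpu[TOY_NR_CPUS];

/* Return the lowest free frame number, or 0 if the pool is exhausted. */
static unsigned long toy_alloc_page(void)
{
    for (unsigned long mfn = 1; mfn < TOY_NR_PAGES; mfn++) {
        if (!toy_page_in_use[mfn]) {
            toy_page_in_use[mfn] = true;
            return mfn;
        }
    }
    return 0;
}

/* Online a CPU: share an online sibling's page, else keep a recorded
 * (possibly stale) mfn, else allocate a new page. */
static void toy_cpu_up(unsigned int cpu)
{
    struct toy_stubs *s = &toy_percpu[cpu];

    assert(cpu < TOY_NR_CPUS);
    for (unsigned int i = 0; i < TOY_NR_CPUS; i++) {
        if (toy_cpu_online[i]) {
            s->mfn = toy_percpu[i].mfn;
            break;
        }
    }
    if (s->mfn == 0)
        s->mfn = toy_alloc_page();
    s->addr = 0x1000UL * (cpu + 1);
    toy_cpu_online[cpu] = true;
}

/* Offline a CPU: free the page once no sibling uses it; optionally
 * clear the per-CPU record (the behaviour the patch adds). */
static void toy_cpu_down(unsigned int cpu, bool clear_percpu)
{
    struct toy_stubs *s = &toy_percpu[cpu];
    unsigned long mfn = s->mfn;
    bool shared = false;

    assert(cpu < TOY_NR_CPUS);
    toy_cpu_online[cpu] = false;
    for (unsigned int i = 0; i < TOY_NR_CPUS; i++)
        shared = shared || toy_cpu_online[i];
    if (clear_percpu) {
        s->addr = 0;
        s->mfn = 0;
    }
    if (!shared)
        toy_page_in_use[mfn] = false;
}

/* Run the offline/online cycle; report whether CPU 0 ends up on a live page. */
bool toy_cycle_leaves_valid_stub(bool clear_percpu)
{
    memset(toy_page_in_use, 0, sizeof(toy_page_in_use));
    memset(toy_cpu_online, 0, sizeof(toy_cpu_online));
    memset(toy_percpu, 0, sizeof(toy_percpu));

    toy_cpu_up(0);
    toy_cpu_up(1);
    toy_cpu_down(0, clear_percpu);
    toy_cpu_down(1, clear_percpu);   /* last user gone: page is freed */
    toy_cpu_up(0);                   /* the CPU comes back online */

    return toy_page_in_use[toy_percpu[0].mfn];
}
```

In this model, calling `toy_cycle_leaves_valid_stub(false)` leaves CPU 0 pointing at a frame that has already been freed, while passing `true` forces a fresh allocation.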

Comments

Jan Beulich Jan. 8, 2020, 3:21 p.m. UTC | #1
On 08.01.2020 15:34, Juergen Gross wrote:
> cpu_smpboot_free() removes the stubs for the cpu going offline, but it
> isn't clearing the related percpu variables. This will result in
> crashes when a stub page is released because all related cpus have
> gone offline and one of those cpus comes online again later.
> 
> Fix that by clearing stubs.addr and stubs.mfn in order to allocate a
> new stub page when needed.

I was really hoping for you to mention CPU parking here. How about

"Fix that by clearing stubs.mfn (and also stubs.addr just to be on
 the safe side) in order to allocate a new stub page when needed,
 irrespective of whether the CPU gets parked or removed."

> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -945,6 +945,8 @@ static void cpu_smpboot_free(unsigned int cpu, bool remove)
>                               (per_cpu(stubs.addr, cpu) | ~PAGE_MASK) + 1);
>          if ( i == STUBS_PER_PAGE )
>              free_domheap_page(mfn_to_page(mfn));
> +        per_cpu(stubs.addr, cpu) = 0;
> +        per_cpu(stubs.mfn, cpu) = 0;

Looking more closely, I think I'd prefer these two lines (of which
the addr one isn't strictly needed anyway) to move ahead of the
if().

If you agree, I'll be happy to do both while committing.

Jan
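
The reordering suggested above relies on the frame number being held in a local variable (the hunk passes a local `mfn` to `free_domheap_page()`), so the per-CPU fields can be cleared before the conditional free. A minimal sketch of that ordering, using hypothetical `toy_*` names rather than the Xen functions:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_stub_rec {
    unsigned long addr;  /* stand-in for stubs.addr */
    unsigned long mfn;   /* stand-in for stubs.mfn */
};

static unsigned long toy_last_freed;  /* frame most recently handed to the toy free */

static void toy_free_page(unsigned long mfn)
{
    toy_last_freed = mfn;
}

/* Release path with the per-CPU fields cleared ahead of the if().
 * Returns the frame that was freed, or 0 if the page is still shared. */
unsigned long toy_release(struct toy_stub_rec *s, bool last_user)
{
    unsigned long mfn;

    assert(s);
    mfn = s->mfn;          /* local copy taken before the fields are cleared */
    toy_last_freed = 0;

    s->addr = 0;
    s->mfn = 0;
    if (last_user)
        toy_free_page(mfn); /* still frees the right frame */

    return toy_last_freed;
}
```

Because the free operates on the local copy, clearing first does not change which page is released; it only makes the clearing independent of the branch.
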
Jürgen Groß Jan. 8, 2020, 3:29 p.m. UTC | #2
On 08.01.20 16:21, Jan Beulich wrote:
> On 08.01.2020 15:34, Juergen Gross wrote:
>> cpu_smpboot_free() removes the stubs for the cpu going offline, but it
>> isn't clearing the related percpu variables. This will result in
>> crashes when a stub page is released because all related cpus have
>> gone offline and one of those cpus comes online again later.
>>
>> Fix that by clearing stubs.addr and stubs.mfn in order to allocate a
>> new stub page when needed.
> 
> I was really hoping for you to mention CPU parking here. How about
> 
> "Fix that by clearing stubs.mfn (and also stubs.addr just to be on
>   the safe side) in order to allocate a new stub page when needed,
>   irrespective of whether the CPU gets parked or removed."
> 
>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -945,6 +945,8 @@ static void cpu_smpboot_free(unsigned int cpu, bool remove)
>>                                (per_cpu(stubs.addr, cpu) | ~PAGE_MASK) + 1);
>>           if ( i == STUBS_PER_PAGE )
>>               free_domheap_page(mfn_to_page(mfn));
>> +        per_cpu(stubs.addr, cpu) = 0;
>> +        per_cpu(stubs.mfn, cpu) = 0;
> 
> Looking more closely, I think I'd prefer these two lines (of which
> the addr one isn't strictly needed anyway) to move ahead of the
> if().
> 
> If you agree, I'll be happy to do both while committing.

I agree.

I'm not sure the addr clearing can be omitted. Leaving addr set might
cause problems when an early error in cpu_smpboot_alloc() during
onlining skips the call of alloc_stub_page(). The subsequent call of
cpu_smpboot_free() will then overwrite mfn 0.


Juergen
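
The early-error concern can be sketched the same way. The model below assumes, as the argument above implies, that the cleanup is guarded by a non-zero addr and then writes into the page named by mfn; the `toy_*` names, sizes, and the 0xcc fill pattern used here are illustrative assumptions, not the Xen implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 64
#define TOY_STUB_SIZE 16
#define TOY_MEM_PAGES 4

struct toy_stub_state {
    unsigned long addr;  /* stand-in for stubs.addr */
    unsigned long mfn;   /* stand-in for stubs.mfn */
};

static unsigned char toy_mem[TOY_MEM_PAGES][TOY_PAGE_SIZE];

/* Cleanup guarded by addr: poison this CPU's slot in the page named by
 * mfn, then clear mfn and (optionally) addr. */
static void toy_smpboot_free(struct toy_stub_state *s, unsigned int slot,
                             bool clear_addr)
{
    if (s->addr) {
        assert(s->mfn < TOY_MEM_PAGES);
        memset(&toy_mem[s->mfn][slot * TOY_STUB_SIZE], 0xcc, TOY_STUB_SIZE);
        if (clear_addr)
            s->addr = 0;
        s->mfn = 0;
    }
}

/* Offline a CPU, then let onlining fail before any stub page is set up.
 * Returns true if the second cleanup wrote into frame 0. */
bool toy_early_failure_touches_mfn0(bool clear_addr)
{
    struct toy_stub_state s = { 0x1000UL, 2UL };

    memset(toy_mem, 0, sizeof(toy_mem));
    toy_smpboot_free(&s, 1, clear_addr);   /* CPU goes offline */

    memset(toy_mem, 0, sizeof(toy_mem));   /* observe only the second call */
    toy_smpboot_free(&s, 1, clear_addr);   /* error path runs cleanup again */

    return toy_mem[0][1 * TOY_STUB_SIZE] == 0xcc;
}
```

With only mfn cleared, the stale addr lets the second cleanup through and frame 0 is written; clearing addr as well makes the second call a no-op.
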
Jan Beulich Jan. 8, 2020, 4:48 p.m. UTC | #3
On 08.01.2020 16:29, Jürgen Groß wrote:
> I'm not sure the addr clearing can be omitted. Leaving addr set might
> cause problems when an early error in cpu_smpboot_alloc() during
> onlining skips the call of alloc_stub_page(). The subsequent call of
> cpu_smpboot_free() will then overwrite mfn 0.

Oh, good point.

Jan
Tao Xu Jan. 9, 2020, 1:08 a.m. UTC | #4
Thank you, Juergen. This patch fixes the issue reported in

XEN crash and double fault when doing cpu online/offline
https://lists.xenproject.org/archives/html/xen-devel/2020-01/msg00424.html

Tested-by: Tao Xu <tao3.xu@intel.com>

On 1/8/2020 10:34 PM, Juergen Gross wrote:
> cpu_smpboot_free() removes the stubs for the cpu going offline, but it
> isn't clearing the related percpu variables. This will result in
> crashes when a stub page is released because all related cpus have
> gone offline and one of those cpus comes online again later.
> 
> Fix that by clearing stubs.addr and stubs.mfn in order to allocate a
> new stub page when needed.
> 
> Fixes: 2e6c8f182c9c50 ("x86: distinguish CPU offlining from CPU removal")
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Reviewed-by: Wei Liu <wl@xen.org>
> ---
>   xen/arch/x86/smpboot.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index 7e29704080..46c0729214 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -945,6 +945,8 @@ static void cpu_smpboot_free(unsigned int cpu, bool remove)
>                                (per_cpu(stubs.addr, cpu) | ~PAGE_MASK) + 1);
>           if ( i == STUBS_PER_PAGE )
>               free_domheap_page(mfn_to_page(mfn));
> +        per_cpu(stubs.addr, cpu) = 0;
> +        per_cpu(stubs.mfn, cpu) = 0;
>       }
> 
>       FREE_XENHEAP_PAGE(per_cpu(compat_gdt, cpu));
> --
> 2.16.4
Patch

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 7e29704080..46c0729214 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -945,6 +945,8 @@  static void cpu_smpboot_free(unsigned int cpu, bool remove)
                              (per_cpu(stubs.addr, cpu) | ~PAGE_MASK) + 1);
         if ( i == STUBS_PER_PAGE )
             free_domheap_page(mfn_to_page(mfn));
+        per_cpu(stubs.addr, cpu) = 0;
+        per_cpu(stubs.mfn, cpu) = 0;
     }
 
     FREE_XENHEAP_PAGE(per_cpu(compat_gdt, cpu));