diff mbox series

[v1,1/2] x86/cpu: maintain a parked CPU bitmap

Message ID 1574291155-26032-1-git-send-email-chao.gao@intel.com (mailing list archive)
State New, archived
Headers show
Series [v1,1/2] x86/cpu: maintain a parked CPU bitmap | expand

Commit Message

Chao Gao Nov. 20, 2019, 11:05 p.m. UTC
It helps to distinguish parked CPUs from those are really offlined or
hot-added. We need to know the parked CPUs in order to do a special
check against them to ensure that all CPUs, except those are really
offlined or hot-added, have the same ucode version.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Note that changes on ARM side are untested.
---
 xen/arch/arm/smpboot.c    | 1 +
 xen/arch/x86/cpu/common.c | 4 ++++
 xen/arch/x86/smpboot.c    | 1 +
 xen/common/cpu.c          | 4 ++++
 xen/include/asm-arm/smp.h | 1 +
 xen/include/asm-x86/smp.h | 1 +
 xen/include/xen/cpumask.h | 1 +
 7 files changed, 13 insertions(+)

Comments

Julien Grall Nov. 21, 2019, 9:47 a.m. UTC | #1
Hi,

On 20/11/2019 23:05, Chao Gao wrote:
> It helps to distinguish parked CPUs from those are really offlined or
> hot-added. We need to know the parked CPUs in order to do a special
> check against them to ensure that all CPUs, except those are really
> offlined or hot-added, have the same ucode version.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> Note that changes on ARM side are untested.
> ---
>   xen/arch/arm/smpboot.c    | 1 +
>   xen/arch/x86/cpu/common.c | 4 ++++
>   xen/arch/x86/smpboot.c    | 1 +
>   xen/common/cpu.c          | 4 ++++
>   xen/include/asm-arm/smp.h | 1 +
>   xen/include/asm-x86/smp.h | 1 +
>   xen/include/xen/cpumask.h | 1 +
>   7 files changed, 13 insertions(+)
> 
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index 00b64c3..1b57ba4 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -39,6 +39,7 @@
>   cpumask_t cpu_online_map;
>   cpumask_t cpu_present_map;
>   cpumask_t cpu_possible_map;
> +cpumask_var_t cpu_parked_map;

You define cpu_parked_map but AFAIK it will never get allocated. The 
risk here is any access to that variable will result to a fault.

Looking at the changes below, it looks like access in common code will 
be protected by park_offline_cpus. This is always false on Arm, so the 
compiler should remove any access to cpu_parked_map.

With that in mind, I think it would be best to only provide a prototype 
for cpu_parked_map and so the linker can warn if someone used it.


> diff --git a/xen/include/asm-arm/smp.h b/xen/include/asm-arm/smp.h
> index fdbcefa..4b392fa 100644
> --- a/xen/include/asm-arm/smp.h
> +++ b/xen/include/asm-arm/smp.h
> @@ -19,6 +19,7 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
>    * would otherwise prefer them to be off?
>    */
>   #define park_offline_cpus false
> +extern cpumask_var_t cpu_parked_map;

The prototype should be the same for all architectures. So is there any 
reason to duplicate it?

Cheers,
Jan Beulich Nov. 21, 2019, 10:02 a.m. UTC | #2
On 21.11.2019 10:47, Julien Grall wrote:
> On 20/11/2019 23:05, Chao Gao wrote:
>> --- a/xen/arch/arm/smpboot.c
>> +++ b/xen/arch/arm/smpboot.c
>> @@ -39,6 +39,7 @@
>>   cpumask_t cpu_online_map;
>>   cpumask_t cpu_present_map;
>>   cpumask_t cpu_possible_map;
>> +cpumask_var_t cpu_parked_map;
> 
> You define cpu_parked_map but AFAIK it will never get allocated. The 
> risk here is any access to that variable will result to a fault.
> 
> Looking at the changes below, it looks like access in common code will 
> be protected by park_offline_cpus. This is always false on Arm, so the 
> compiler should remove any access to cpu_parked_map.
> 
> With that in mind, I think it would be best to only provide a prototype 
> for cpu_parked_map and so the linker can warn if someone used it.

+1

In fact I wonder whether the maintenance of the map should live in
common code in the first place. While clearing the respective bit
in cpu_up() looks correct (and could be done without any if()),
I'm not convinced the setting of the bit in cpu_down() is going to
be correct in all cases. I'd rather leave this to the relevant
callers of cpu_down(). To deal with cpu_add_remove_lock not being
held perhaps it would be best to set the flag _before_ calling
cpu_down(); consumers of the map then would need to double check
that a CPU is not _also_ (still) online.

Jan
Jan Beulich Nov. 21, 2019, 10:04 a.m. UTC | #3
On 21.11.2019 00:05, Chao Gao wrote:
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -337,7 +337,11 @@ void __init early_cpu_init(void)
>  	}
>  
>  	if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)))
> +	{
>  		park_offline_cpus = opt_mce;
> +		if (park_offline_cpus && !zalloc_cpumask_var(&cpu_parked_map))
> +			panic("No memory for CPU parked map\n");
> +	}

Maybe shorter as

	if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) && opt_mce)
	{
		if (!zalloc_cpumask_var(&cpu_parked_map))
			panic("No memory for CPU parked map\n");
		park_offline_cpus = true;
	}

?

Jan
Chao Gao Nov. 21, 2019, 11:43 a.m. UTC | #4
On Thu, Nov 21, 2019 at 11:02:10AM +0100, Jan Beulich wrote:
>On 21.11.2019 10:47, Julien Grall wrote:
>> On 20/11/2019 23:05, Chao Gao wrote:
>>> --- a/xen/arch/arm/smpboot.c
>>> +++ b/xen/arch/arm/smpboot.c
>>> @@ -39,6 +39,7 @@
>>>   cpumask_t cpu_online_map;
>>>   cpumask_t cpu_present_map;
>>>   cpumask_t cpu_possible_map;
>>> +cpumask_var_t cpu_parked_map;
>> 
>> You define cpu_parked_map but AFAIK it will never get allocated. The 
>> risk here is any access to that variable will result to a fault.
>> 
>> Looking at the changes below, it looks like access in common code will 
>> be protected by park_offline_cpus. This is always false on Arm, so the 
>> compiler should remove any access to cpu_parked_map.
>> 
>> With that in mind, I think it would be best to only provide a prototype 
>> for cpu_parked_map and so the linker can warn if someone used it.
>
>+1

Will do. I added this because I am not sure all compilers would omit
such access.

>
>In fact I wonder whether the maintenance of the map should live in
>common code in the first place. While clearing the respective bit
>in cpu_up() looks correct (and could be done without any if()),

But when park_offline_cpus() is false, the map isn't allocated. I don't
think it is safe to access the map in this case.

>I'm not convinced the setting of the bit in cpu_down() is going to
>be correct in all cases.

Do you mean in some cases, cpu_down() is to really offline a CPU even
park_offline_cpus is set? And in this case, setting the bit isn't
correct.

If yes, one thing confuses me is that cpu_down() would call
cpu_notifier_call_chain() several times unconditionally. And registered
callbacks take actions depending on the value of park_offline_cpus.
I expected that callbacks would do more check to avoid parking a CPU
in some cases.

Thanks
Chao
Jan Beulich Nov. 21, 2019, 5:07 p.m. UTC | #5
On 21.11.2019 12:43, Chao Gao wrote:
> On Thu, Nov 21, 2019 at 11:02:10AM +0100, Jan Beulich wrote:
>> On 21.11.2019 10:47, Julien Grall wrote:
>>> On 20/11/2019 23:05, Chao Gao wrote:
>>>> --- a/xen/arch/arm/smpboot.c
>>>> +++ b/xen/arch/arm/smpboot.c
>>>> @@ -39,6 +39,7 @@
>>>>   cpumask_t cpu_online_map;
>>>>   cpumask_t cpu_present_map;
>>>>   cpumask_t cpu_possible_map;
>>>> +cpumask_var_t cpu_parked_map;
>>>
>>> You define cpu_parked_map but AFAIK it will never get allocated. The 
>>> risk here is any access to that variable will result to a fault.
>>>
>>> Looking at the changes below, it looks like access in common code will 
>>> be protected by park_offline_cpus. This is always false on Arm, so the 
>>> compiler should remove any access to cpu_parked_map.
>>>
>>> With that in mind, I think it would be best to only provide a prototype 
>>> for cpu_parked_map and so the linker can warn if someone used it.
>>
>> +1
> 
> Will do. I added this because I am not sure all compilers would omit
> such access.
> 
>>
>> In fact I wonder whether the maintenance of the map should live in
>> common code in the first place. While clearing the respective bit
>> in cpu_up() looks correct (and could be done without any if()),
> 
> But when park_offline_cpus() is false, the map isn't allocated. I don't
> think it is safe to access the map in this case.

Oh, you're right of course. Unless the map was allocated
unconditionally ...

>> I'm not convinced the setting of the bit in cpu_down() is going to
>> be correct in all cases.
> 
> Do you mean in some cases, cpu_down() is to really offline a CPU even
> park_offline_cpus is set? And in this case, setting the bit isn't
> correct.

The purposes of cpu_down() calls _may_ be different. Plus whether
there's parking wanted/necessary for an architecture should remain
- as much as possible - an architecture thing to deal with. I.e.
despite park_offline_cpus being used in common code, I think we
should strive to avoid adding more there when it can be avoided.

Jan
diff mbox series

Patch

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 00b64c3..1b57ba4 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -39,6 +39,7 @@ 
 cpumask_t cpu_online_map;
 cpumask_t cpu_present_map;
 cpumask_t cpu_possible_map;
+cpumask_var_t cpu_parked_map;
 
 struct cpuinfo_arm cpu_data[NR_CPUS];
 
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 6c6bd63..fbb961d 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -337,7 +337,11 @@  void __init early_cpu_init(void)
 	}
 
 	if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)))
+	{
 		park_offline_cpus = opt_mce;
+		if (park_offline_cpus && !zalloc_cpumask_var(&cpu_parked_map))
+			panic("No memory for CPU parked map\n");
+	}
 
 	initialize_cpu_data(0);
 }
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index fa691b6..f66de8d 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -60,6 +60,7 @@  cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
 bool __read_mostly park_offline_cpus;
+cpumask_var_t cpu_parked_map;
 
 unsigned int __read_mostly nr_sockets;
 cpumask_t **__read_mostly socket_cpumask;
diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index 66c855c..0090a19 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -117,6 +117,8 @@  int cpu_down(unsigned int cpu)
     cpu_notifier_call_chain(cpu, CPU_DEAD, NULL, true);
 
     send_global_virq(VIRQ_PCPU_STATE);
+    if ( park_offline_cpus )
+        cpumask_set_cpu(cpu, cpu_parked_map);
     cpu_hotplug_done();
     return 0;
 
@@ -154,6 +156,8 @@  int cpu_up(unsigned int cpu)
     cpu_notifier_call_chain(cpu, CPU_ONLINE, NULL, true);
 
     send_global_virq(VIRQ_PCPU_STATE);
+    if ( park_offline_cpus )
+        cpumask_clear_cpu(cpu, cpu_parked_map);
 
     cpu_hotplug_done();
     return 0;
diff --git a/xen/include/asm-arm/smp.h b/xen/include/asm-arm/smp.h
index fdbcefa..4b392fa 100644
--- a/xen/include/asm-arm/smp.h
+++ b/xen/include/asm-arm/smp.h
@@ -19,6 +19,7 @@  DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
  * would otherwise prefer them to be off?
  */
 #define park_offline_cpus false
+extern cpumask_var_t cpu_parked_map;
 
 extern void noreturn stop_cpu(void);
 
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index dbeed2f..886737d 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -31,6 +31,7 @@  DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
  * would otherwise prefer them to be off?
  */
 extern bool park_offline_cpus;
+extern cpumask_var_t cpu_parked_map;
 
 void smp_send_nmi_allbutself(void);
 
diff --git a/xen/include/xen/cpumask.h b/xen/include/xen/cpumask.h
index 256b60b..543cec5 100644
--- a/xen/include/xen/cpumask.h
+++ b/xen/include/xen/cpumask.h
@@ -457,6 +457,7 @@  extern cpumask_t cpu_present_map;
 #define for_each_possible_cpu(cpu) for_each_cpu(cpu, &cpu_possible_map)
 #define for_each_online_cpu(cpu)   for_each_cpu(cpu, &cpu_online_map)
 #define for_each_present_cpu(cpu)  for_each_cpu(cpu, &cpu_present_map)
+#define for_each_parked_cpu(cpu)   for_each_cpu(cpu, cpu_parked_map)
 
 /* Copy to/from cpumap provided by control tools. */
 struct xenctl_bitmap;