xen/x86: Update topology map for PV VCPUs

Message ID 1475687373-14589-1-git-send-email-boris.ostrovsky@oracle.com (mailing list archive)
State New, archived

Commit Message

Boris Ostrovsky Oct. 5, 2016, 5:09 p.m. UTC
Early during boot topology_update_package_map() computes
logical_pkg_ids for all present processors.

Later, when processors are brought up, identify_cpu() updates
these values based on phys_pkg_id which is a function of
initial_apicid. On PV guests the latter may point to a
non-existing node, causing logical_pkg_ids to be set to -1.

Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
to index its arrays and therefore in this case will point to index
65535 (since logical_pkg_id is a u16). This could lead either to a
crash or to an access to a random memory location.
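The wrap-around described above can be illustrated with a small standalone sketch. It is not kernel code: MAX_PACKAGES, store_logical_pkg_id() and pkg_id_in_range() are hypothetical names chosen for illustration, and the array size of 4 is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size of a per-package array kept by a consumer. */
#define MAX_PACKAGES 4

/*
 * Models storing a "not found" result of -1 into a 16-bit unsigned
 * field: the conversion wraps modulo 65536, so -1 becomes 65535.
 */
static uint16_t store_logical_pkg_id(int id)
{
	return (uint16_t)id;
}

/* Returns true if id can safely index an array of MAX_PACKAGES entries. */
static bool pkg_id_in_range(uint16_t id)
{
	return id < MAX_PACKAGES;
}
```

A consumer that indexes its array without a check such as pkg_id_in_range() would read or write far past the end of that array once the stored id is 65535.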

As a workaround, we recompute topology during CPU bringup to reset
logical_pkg_id to a valid value.

(initial_apicid is bogus because it is the initial_apicid of the
processor from which the guest is launched. This value is
CPUID(1).EBX[31:24].)
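For reference, the bit field named above can be extracted as in the following sketch. The helper name initial_apicid_from_ebx() is hypothetical, and the function takes an already-read EBX value rather than executing CPUID itself, so it only illustrates the field layout.

```c
#include <stdint.h>

/*
 * Returns bits 31:24 of the EBX value reported by CPUID leaf 1,
 * i.e. the field the commit message refers to as initial_apicid.
 */
static uint8_t initial_apicid_from_ebx(uint32_t ebx)
{
	return (uint8_t)(ebx >> 24);
}
```

On a PV guest the EBX value seen here is whatever the toolstack supplied, so the result need not correspond to any processor the guest actually has.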

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
---

Copying Andrew for the CPUID part.

 arch/x86/xen/smp.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Andrew Cooper Oct. 5, 2016, 5:41 p.m. UTC | #1
On 05/10/16 18:09, Boris Ostrovsky wrote:
> Early during boot topology_update_package_map() computes
> logical_pkg_ids for all present processors.
>
> Later, when processors are brought up, identify_cpu() updates
> these values based on phys_pkg_id which is a function of
> initial_apicid. On PV guests the latter may point to a
> non-existing node, causing logical_pkg_ids to be set to -1.
>
> Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
> to index its arrays and therefore in this case will point to index
> 65535 (since logical_pkg_id is a u16). This could lead to either a
> crash or may actually access random memory location.
>
> As a workaround, we recompute topology during CPU bringup to reset
> logical_pkg_id to a valid value.
>
> (The reason for initial_apicid being bogus is because it is
> initial_apicid of the processor from which the guest is launched.
> This value is CPUID(1).EBX[31:24])
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: stable@vger.kernel.org
> ---
>
> Copying Andrew for the CPUID part.

Yeah - that leaf is usually fiction.  (Specifically, the fiction of
whichever cpu a specific toolstack function happened to sample at the
point in time that it was choosing which cpuid values to fake up for the
guest).

I am currently working on fixing the reported topology information to be
architecturally plausible, but current and previous hypervisors will be
wrong.

~Andrew

>
>  arch/x86/xen/smp.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 311acad..9fa27ce 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -87,6 +87,12 @@ static void cpu_bringup(void)
>  	cpu_data(cpu).x86_max_cores = 1;
>  	set_cpu_sibling_map(cpu);
>  
> +	/*
> +	 * identify_cpu() may have set logical_pkg_id to -1 due
> +	 * to incorrect phys_proc_id. Let's re-compute it.
> +	 */
> +	topology_update_package_map(apic->cpu_present_to_apicid(cpu), cpu);
> +
>  	xen_setup_cpu_clockevents();
>  
>  	notify_cpu_starting(cpu);

Jan Beulich Oct. 6, 2016, 12:14 p.m. UTC | #2
>>> On 05.10.16 at 19:09, <boris.ostrovsky@oracle.com> wrote:
> Early during boot topology_update_package_map() computes
> logical_pkg_ids for all present processors.
> 
> Later, when processors are brought up, identify_cpu() updates
> these values based on phys_pkg_id which is a function of
> initial_apicid. On PV guests the latter may point to a
> non-existing node, causing logical_pkg_ids to be set to -1.
> 
> Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
> to index its arrays and therefore in this case will point to index
> 65535 (since logical_pkg_id is a u16). This could lead to either a
> crash or may actually access random memory location.

Another clear indication that such fields should never be touched
(and hence consumers either be fixed or disabled) when running as
PV guest under Xen.

Jan

David Vrabel Oct. 6, 2016, 2:12 p.m. UTC | #3
On 05/10/16 18:09, Boris Ostrovsky wrote:
> Early during boot topology_update_package_map() computes
> logical_pkg_ids for all present processors.
> 
> Later, when processors are brought up, identify_cpu() updates
> these values based on phys_pkg_id which is a function of
> initial_apicid. On PV guests the latter may point to a
> non-existing node, causing logical_pkg_ids to be set to -1.
> 
> Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
> to index its arrays and therefore in this case will point to index
> 65535 (since logical_pkg_id is a u16). This could lead to either a
> crash or may actually access random memory location.
> 
> As a workaround, we recompute topology during CPU bringup to reset
> logical_pkg_id to a valid value.
> 
> (The reason for initial_apicid being bogus is because it is
> initial_apicid of the processor from which the guest is launched.
> This value is CPUID(1).EBX[31:24])

Applied to for-linus-4.9, thanks.

David

Patch

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 311acad..9fa27ce 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -87,6 +87,12 @@  static void cpu_bringup(void)
 	cpu_data(cpu).x86_max_cores = 1;
 	set_cpu_sibling_map(cpu);
 
+	/*
+	 * identify_cpu() may have set logical_pkg_id to -1 due
+	 * to incorrect phys_proc_id. Let's re-compute it.
+	 */
+	topology_update_package_map(apic->cpu_present_to_apicid(cpu), cpu);
+
 	xen_setup_cpu_clockevents();
 
 	notify_cpu_starting(cpu);