Message ID | 1475687373-14589-1-git-send-email-boris.ostrovsky@oracle.com (mailing list archive) |
---|---|
State | New, archived |
On 05/10/16 18:09, Boris Ostrovsky wrote:
> Early during boot topology_update_package_map() computes
> logical_pkg_ids for all present processors.
>
> Later, when processors are brought up, identify_cpu() updates
> these values based on phys_pkg_id which is a function of
> initial_apicid. On PV guests the latter may point to a
> non-existing node, causing logical_pkg_ids to be set to -1.
>
> Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
> to index its arrays and therefore in this case will point to index
> 65535 (since logical_pkg_id is a u16). This could lead to either a
> crash or may actually access random memory location.
>
> As a workaround, we recompute topology during CPU bringup to reset
> logical_pkg_id to a valid value.
>
> (The reason for initial_apicid being bogus is because it is
> initial_apicid of the processor from which the guest is launched.
> This value is CPUID(1).EBX[31:24])
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: stable@vger.kernel.org
> ---
>
> Copying Andrew for the CPUID part.

Yeah - that leaf is usually fiction. (Specifically, the fiction of
whichever cpu a specific toolstack function happened to sample at the
point in time that it was choosing which cpuid values to fake up for
the guest).

I am currently working on fixing the reported topology information to
be architecturally plausible, but current and previous hypervisors will
be wrong.

~Andrew

>
>  arch/x86/xen/smp.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 311acad..9fa27ce 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -87,6 +87,12 @@ static void cpu_bringup(void)
>  	cpu_data(cpu).x86_max_cores = 1;
>  	set_cpu_sibling_map(cpu);
>
> +	/*
> +	 * identify_cpu() may have set logical_pkg_id to -1 due
> +	 * to incorrect phys_proc_id. Let's re-compute it.
> +	 */
> +	topology_update_package_map(apic->cpu_present_to_apicid(cpu), cpu);
> +
>  	xen_setup_cpu_clockevents();
>
>  	notify_cpu_starting(cpu);

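For readers unfamiliar with the notation CPUID(1).EBX[31:24] used in the
commit message: it refers to the top byte of the EBX register returned by
CPUID leaf 1. The following is a minimal user-space sketch of that bit
extraction only. The helper names and the sample EBX value are made up
for illustration; this is not the kernel's code, and it does not execute
the CPUID instruction.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): return bits 31:24 of an
 * EBX value as reported by CPUID leaf 1, i.e. the initial APIC ID field.
 */
static inline uint8_t initial_apicid_from_ebx(uint32_t ebx)
{
	return (uint8_t)(ebx >> 24);
}

/* Self-check with a made-up EBX value whose top byte is 0x2A. */
static void initial_apicid_self_check(void)
{
	assert(initial_apicid_from_ebx(0x2A000800u) == 0x2A);
}
```

On a PV guest, per the discussion above, the value in this field reflects
whatever the toolstack sampled rather than the guest's own topology, so a
value extracted this way should not be trusted there.
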
>>> On 05.10.16 at 19:09, <boris.ostrovsky@oracle.com> wrote:
> Early during boot topology_update_package_map() computes
> logical_pkg_ids for all present processors.
>
> Later, when processors are brought up, identify_cpu() updates
> these values based on phys_pkg_id which is a function of
> initial_apicid. On PV guests the latter may point to a
> non-existing node, causing logical_pkg_ids to be set to -1.
>
> Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
> to index its arrays and therefore in this case will point to index
> 65535 (since logical_pkg_id is a u16). This could lead to either a
> crash or may actually access random memory location.

Another clear indication that such fields should never be touched
(and hence consumers either be fixed or disabled) when running as
PV guest under Xen.

Jan

On 05/10/16 18:09, Boris Ostrovsky wrote:
> Early during boot topology_update_package_map() computes
> logical_pkg_ids for all present processors.
>
> Later, when processors are brought up, identify_cpu() updates
> these values based on phys_pkg_id which is a function of
> initial_apicid. On PV guests the latter may point to a
> non-existing node, causing logical_pkg_ids to be set to -1.
>
> Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
> to index its arrays and therefore in this case will point to index
> 65535 (since logical_pkg_id is a u16). This could lead to either a
> crash or may actually access random memory location.
>
> As a workaround, we recompute topology during CPU bringup to reset
> logical_pkg_id to a valid value.
>
> (The reason for initial_apicid being bogus is because it is
> initial_apicid of the processor from which the guest is launched.
> This value is CPUID(1).EBX[31:24])

Applied to for-linus-4.9, thanks.

David

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 311acad..9fa27ce 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -87,6 +87,12 @@ static void cpu_bringup(void)
 	cpu_data(cpu).x86_max_cores = 1;
 	set_cpu_sibling_map(cpu);
 
+	/*
+	 * identify_cpu() may have set logical_pkg_id to -1 due
+	 * to incorrect phys_proc_id. Let's re-compute it.
+	 */
+	topology_update_package_map(apic->cpu_present_to_apicid(cpu), cpu);
+
 	xen_setup_cpu_clockevents();
 
 	notify_cpu_starting(cpu);

Early during boot topology_update_package_map() computes
logical_pkg_ids for all present processors.

Later, when processors are brought up, identify_cpu() updates
these values based on phys_pkg_id which is a function of
initial_apicid. On PV guests the latter may point to a
non-existing node, causing logical_pkg_ids to be set to -1.

Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
to index its arrays and therefore in this case will point to index
65535 (since logical_pkg_id is a u16). This could lead to either a
crash or may actually access random memory location.

As a workaround, we recompute topology during CPU bringup to reset
logical_pkg_id to a valid value.

(The reason for initial_apicid being bogus is because it is
initial_apicid of the processor from which the guest is launched.
This value is CPUID(1).EBX[31:24])

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
---

Copying Andrew for the CPUID part.

 arch/x86/xen/smp.c | 6 ++++++
 1 file changed, 6 insertions(+)
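
The index 65535 mentioned in the commit message follows from C's integer
conversion rules: storing -1 in a 16-bit unsigned field wraps to 65535,
which is then far outside any small per-package array. The sketch below
illustrates this in plain user-space C. MAX_PACKAGES and all function
names are hypothetical stand-ins chosen for the example; none of this is
the RAPL driver's or the topology code's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array size standing in for the number of packages. */
#define MAX_PACKAGES 4

/* Mimics storing a "not found" result of -1 in a u16 field. */
static uint16_t store_pkg_id(int pkg_id)
{
	return (uint16_t)pkg_id;	/* -1 converts to 65535 */
}

/* Nonzero when idx can safely index an array of MAX_PACKAGES entries. */
static int pkg_index_is_valid(uint16_t idx)
{
	return idx < MAX_PACKAGES;
}

/*
 * Bounds-checked lookup: returns NULL instead of reading past the end
 * of the table when the stored id is out of range.
 */
static const int *pkg_lookup(const int *table, uint16_t idx)
{
	assert(table != NULL);
	return pkg_index_is_valid(idx) ? &table[idx] : NULL;
}
```

With these helpers, pkg_lookup(table, store_pkg_id(-1)) returns NULL
rather than dereferencing table[65535]. The patch takes a different
route: it recomputes the id during CPU bringup so that consumers see a
valid value in the first place.
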