| Message ID | 1495036625-24071-1-git-send-email-luwei.kang@intel.com (mailing list archive) |
| --- | --- |
| State | New, archived |
>>> On 17.05.17 at 17:57, <luwei.kang@intel.com> wrote:
> @@ -581,9 +582,14 @@ static void vpmu_arch_destroy(struct vcpu *v)
>
>     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
>     {
> -        /* Unload VPMU first. This will stop counters */
> -        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
> -                         vpmu_save_force, v, 1);
> +        /*
> +         * Unload VPMU first if VPMU_CONTEXT_LOADED being set.
> +         * This will stop counters.
> +         */
> +        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
> +            on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
> +                             vpmu_save_force, v, 1);
> +
>         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
>     }
> }

So this is a good step towards what was requested during v1 review,
provided it is correct (I'll let Boris comment). You didn't, however,
do anything about the other unguarded last_pcpu uses (in vpmu_load()
and upwards from the code above in vpmu_arch_destroy()). These
_may_ be implicitly fine, but if so please at least add suitable
ASSERT()s.

Jan
On 05/18/2017 05:07 AM, Jan Beulich wrote:
>>>> On 17.05.17 at 17:57, <luwei.kang@intel.com> wrote:
>> @@ -581,9 +582,14 @@ static void vpmu_arch_destroy(struct vcpu *v)
>>
>>     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
>>     {
>> -        /* Unload VPMU first. This will stop counters */
>> -        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
>> -                         vpmu_save_force, v, 1);
>> +        /*
>> +         * Unload VPMU first if VPMU_CONTEXT_LOADED being set.
>> +         * This will stop counters.
>> +         */
>> +        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>> +            on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
>> +                             vpmu_save_force, v, 1);
>> +
>>         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
>>     }
>> }
> So this is a good step towards what was requested during v1 review,
> provided it is correct (I'll let Boris comment).

From correctness perspective I don't see any problems.

As I said last time, I'd rename cpu_callback() to something less
generic, like vpmu_cpu_callback() (or vpmu_cpuhp_callback()).

> You didn't, however, do
> anything about the other unguarded last_pcpu uses (in vpmu_load()
> and upwards from the code above in vpmu_arch_destroy()). These
> _may_ be implicitly fine, but if so please at least add suitable
> ASSERT()s.

I wonder whether we should have such an ASSERT() in on_selected_cpus()
instead.

-boris
>>> On 18.05.17 at 15:03, <boris.ostrovsky@oracle.com> wrote:
> On 05/18/2017 05:07 AM, Jan Beulich wrote:
>>>>> On 17.05.17 at 17:57, <luwei.kang@intel.com> wrote:
>>> @@ -581,9 +582,14 @@ static void vpmu_arch_destroy(struct vcpu *v)
>>>
>>>     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
>>>     {
>>> -        /* Unload VPMU first. This will stop counters */
>>> -        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
>>> -                         vpmu_save_force, v, 1);
>>> +        /*
>>> +         * Unload VPMU first if VPMU_CONTEXT_LOADED being set.
>>> +         * This will stop counters.
>>> +         */
>>> +        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>>> +            on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
>>> +                             vpmu_save_force, v, 1);
>>> +
>>>         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
>>>     }
>>> }
>> So this is a good step towards what was requested during v1 review,
>> provided it is correct (I'll let Boris comment).
>
> From correctness perspective I don't see any problems.
>
> As I said last time, I'd rename cpu_callback() to something less
> generic, like vpmu_cpu_callback() (or vpmu_cpuhp_callback()).

The vpmu_ prefix is clearly pointless for a static function.

>> You didn't, however, do
>> anything about the other unguarded last_pcpu uses (in vpmu_load()
>> and upwards from the code above in vpmu_arch_destroy()). These
>> _may_ be implicitly fine, but if so please at least add suitable
>> ASSERT()s.
>
> I wonder whether we should have such an ASSERT() in on_selected_cpus()
> instead.

That's a good idea (and I'll queue a patch to that effect for
post-4.9), but it won't deal with all issues here. Namely the use of
last_pcpu in vpmu_arch_destroy() which the v2 patch didn't touch is
a problem already before calling on_selected_cpus():
per_cpu(last_vcpu, vpmu->last_pcpu) is simply invalid if
vpmu->last_pcpu is offline.

Jan
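The guard discussed above can be illustrated outside Xen. The sketch below uses mocked stand-ins (a plain bitmask for a cpumask; `cpumask_subset` and `on_selected_cpus_sketch` are illustrative names, not Xen's real API) to show the shape of the ASSERT Boris proposes for on_selected_cpus(): refuse to IPI any CPU that is not online, rather than hang waiting for an answer that will never come.

```c
#include <assert.h>
#include <stdint.h>

/* Mocked cpumask machinery: one bit per pCPU (illustrative only). */
typedef uint64_t cpumask_t;
static cpumask_t cpu_online_map;

/* Return non-zero iff every CPU in 'sub' is also in 'super'. */
static int cpumask_subset(cpumask_t sub, cpumask_t super)
{
    return (sub & ~super) == 0;
}

/*
 * Sketch of the guard: assert the selected CPUs are all online before
 * sending IPIs, so a call targeting an offlined pCPU fails loudly
 * instead of hanging.
 */
static void on_selected_cpus_sketch(cpumask_t selected)
{
    assert(cpumask_subset(selected, cpu_online_map));
    /* ... send IPIs and wait for acknowledgements ... */
}
```

As Jan notes, such an assertion does not cover the earlier dereference of per_cpu(last_vcpu, vpmu->last_pcpu), which is already invalid once last_pcpu is offline; that access needs its own guard.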
On Thu, 2017-05-18 at 07:16 -0600, Jan Beulich wrote:
> > > > On 18.05.17 at 15:03, <boris.ostrovsky@oracle.com> wrote:
> >
> > As I said last time, I'd rename cpu_callback() to something less
> > generic, like vpmu_cpu_callback() (or vpmu_cpuhp_callback()).
>
> The vpmu_ prefix is clearly pointless for a static function.
>
And "just" using cpu_callback is what we do in a lot of (although, not
everywhere :-( ) other places:

xen/common/timer.c:    .notifier_call = cpu_callback,
xen/common/kexec.c:    .notifier_call = cpu_callback
xen/common/cpupool.c:    .notifier_call = cpu_callback
xen/arch/x86/hvm/hvm.c:    .notifier_call = cpu_callback
xen/arch/x86/cpu/mcheck/mce_intel.c:    .notifier_call = cpu_callback
xen/common/stop_machine.c:    .notifier_call = cpu_callback
xen/common/tmem_xen.c:    .notifier_call = cpu_callback
xen/common/tasklet.c:    .notifier_call = cpu_callback,
xen/common/trace.c:    .notifier_call = cpu_callback
xen/drivers/passthrough/io.c:    .notifier_call = cpu_callback,
xen/drivers/cpufreq/cpufreq.c:    .notifier_call = cpu_callback

Dario
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 03401fd..57a0e9d 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -21,6 +21,7 @@
 #include <xen/xenoprof.h>
 #include <xen/event.h>
 #include <xen/guest_access.h>
+#include <xen/cpu.h>
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/msr.h>
@@ -581,9 +582,14 @@ static void vpmu_arch_destroy(struct vcpu *v)
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
     {
-        /* Unload VPMU first. This will stop counters */
-        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
-                         vpmu_save_force, v, 1);
+        /*
+         * Unload VPMU first if VPMU_CONTEXT_LOADED being set.
+         * This will stop counters.
+         */
+        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+            on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
+                             vpmu_save_force, v, 1);
+
         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
     }
 }
@@ -835,6 +841,33 @@ long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
     return ret;
 }
 
+static int cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned int cpu = (unsigned long)hcpu;
+    struct vcpu *vcpu = per_cpu(last_vcpu, cpu);
+    struct vpmu_struct *vpmu;
+
+    if ( !vcpu )
+        return NOTIFY_DONE;
+
+    vpmu = vcpu_vpmu(vcpu);
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return NOTIFY_DONE;
+
+    if ( action == CPU_DYING )
+    {
+        vpmu_save_force(vcpu);
+        vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+    }
+
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block vpmu_cpu_nfb = {
+    .notifier_call = cpu_callback
+};
+
 static int __init vpmu_init(void)
 {
     int vendor = current_cpu_data.x86_vendor;
@@ -872,8 +905,11 @@ static int __init vpmu_init(void)
     }
 
     if ( vpmu_mode != XENPMU_MODE_OFF )
+    {
+        register_cpu_notifier(&vpmu_cpu_nfb);
         printk(XENLOG_INFO "VPMU: version " __stringify(XENPMU_VER_MAJ) "."
                __stringify(XENPMU_VER_MIN) "\n");
+    }
     else
         opt_vpmu_enabled = 0;
Currently, hot-unplugging a physical CPU with vpmu enabled may hang the
system because a remote call is sent to an offlined pCPU. This patch adds
a CPU hot-unplug notifier to save the vpmu context before the CPU goes
offline.

Consider this scenario: pCPU N is hot-unplugged with vpmu enabled. The
vcpu running on that pCPU is switched to another online CPU. Before the
vpmu context can be loaded on the new pCPU, a remote call is sent to
pCPU N to save it. The system then hangs in on_selected_cpus() because
pCPU N is offline and can never respond.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
---
v2:
 1. Fix some typos and coding style issues.
 2. Change "switch" to "if" in cpu_callback() because there is just one case.
 3. Add VPMU_CONTEXT_LOADED check before sending the remote call in vpmu_arch_destroy().
---
 xen/arch/x86/cpu/vpmu.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)
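The mechanism the patch relies on can be modelled in a few lines. This sketch is not Xen code: `toy_cpu_callback`, `toy_vpmu`, and `last_vcpu_on` are made-up stand-ins for cpu_callback(), the vpmu state, and the per-CPU last_vcpu bookkeeping; only the constants mirror Xen's notifier values. On CPU_DYING, any context still loaded on the dying CPU is saved and the loaded flag cleared, so no later code path needs to IPI an offline CPU to fetch it.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants mirroring Xen's notifier action/return values. */
#define CPU_DYING   1
#define NOTIFY_DONE 0

/* Toy model of a vcpu's vpmu state and the per-pCPU last_vcpu slots. */
struct toy_vpmu { int loaded; int saved; };
static struct toy_vpmu *last_vcpu_on[4];   /* one slot per mock pCPU */

/*
 * Shape of the fix: when a CPU dies, save any vpmu context still loaded
 * on it (stand-in for vpmu_save_force()) and clear the loaded flag
 * (stand-in for vpmu_reset(vpmu, VPMU_CONTEXT_LOADED)).
 */
static int toy_cpu_callback(unsigned long action, unsigned int cpu)
{
    struct toy_vpmu *vpmu = last_vcpu_on[cpu];

    if ( vpmu && action == CPU_DYING && vpmu->loaded )
    {
        vpmu->saved = 1;
        vpmu->loaded = 0;
    }
    return NOTIFY_DONE;
}
```

After the notifier runs, vpmu_arch_destroy() finds VPMU_CONTEXT_LOADED clear and skips the on_selected_cpus() call entirely, which is the second half of the v2 patch.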