
[v3] x86/vpmu: add cpu hot unplug notifier for vpmu

Message ID 1495372182-12928-1-git-send-email-luwei.kang@intel.com (mailing list archive)
State New, archived

Commit Message

Luwei Kang May 21, 2017, 1:09 p.m. UTC
Currently, hot-unplugging a physical CPU with vpmu enabled may hang
the system, because a remote call is sent to an offlined pCPU. This
patch adds a CPU hot-unplug notifier to save the vpmu context before
the CPU goes offline.

Consider this scenario: hot unplug pCPU N with vpmu enabled. The
vCPU running on this pCPU will be switched to another online pCPU.
Before the vpmu context is loaded on the new pCPU, a remote call is
sent to pCPU N to save the old context. The system then hangs in
on_selected_cpus(), because pCPU N is offline and can never respond.
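
As an illustration, a condensed sketch of the failing path in
vpmu_load() (simplified; the full logic is in the patch below):

    /* Condensed sketch of the pre-patch vpmu_load() path. */
    if ( vpmu->last_pcpu != pcpu )       /* vCPU migrated to a new pCPU */
    {
        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
            /*
             * Ask the previous pCPU to save the context.  Nothing clears
             * VPMU_CONTEXT_LOADED when a pCPU is hot-unplugged, so this
             * IPI can target a dead pCPU, and on_selected_cpus() then
             * waits forever for a response.
             */
            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
                             vpmu_save_force, v, 1);
    }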

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
---
v3:
 1.add cpu_online() check in vpmu_load() and vpmu_arch_destroy();
 2.add vpmu_ prefix. rename cpu_callback() to vpmu_cpu_callback();
v2:
 1.fix some typos and coding style;
 2.change "switch" to "if" in cpu_callback() because there is just one case;
 3.add VPMU_CONTEXT_LOADED check before sending the remote call in vpmu_arch_destroy();
---
 xen/arch/x86/cpu/vpmu.c | 52 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 6 deletions(-)

Comments

Jan Beulich May 22, 2017, 1:29 p.m. UTC | #1
>>> On 21.05.17 at 15:09, <luwei.kang@intel.com> wrote:
> v3:
>  1.add cpu_online() check in vpmu_load() and vpmu_arch_destroy();
>  2.add vpmu_ prefix. rename cpu_callback() to vpmu_cpu_callback();

I had specifically objected to the latter.

> @@ -394,8 +395,11 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
>      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>          return 0;
>  
> -    /* First time this VCPU is running here */
> -    if ( vpmu->last_pcpu != pcpu )
> +    /*
> +     * The last pCPU is still online and this is the first time this vCPU
> +     * is running here.
> +     */
> +    if ( cpu_online(vpmu->last_pcpu) && vpmu->last_pcpu != pcpu )

Adding a cpu_online() check here is unlikely to be helpful. Actually I
may have misguided you with prior comments (and if so, I'm sorry) -
the LOADED check following this one makes sure on_selected_cpus()
won't be called with an offline CPU here. IOW I think the code can
be left untouched, but the reason why should be spelled out in the
commit message (matching the reasoning why adding the LOADED
check to vpmu_arch_destroy() is sufficient for the second use of
last_pcpu there).
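
Put differently (a condensed sketch; the notifier is the one your patch
adds, and vpmu_load() is simplified here):

    /* In the CPU_DYING notifier (added by this patch): */
    vpmu_save_force(vcpu);
    vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);  /* context no longer loaded */

    /* In vpmu_load(), simplified - the existing guard already suffices: */
    if ( vpmu->last_pcpu != pcpu &&
         vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )  /* false after CPU_DYING */
        on_selected_cpus(cpumask_of(vpmu->last_pcpu),
                         vpmu_save_force, v, 1);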

> @@ -575,15 +579,21 @@ static void vpmu_arch_destroy(struct vcpu *v)
>       * We will test it again in vpmu_clear_last() with interrupts
>       * disabled to make sure we don't clear someone else.
>       */
> -    if ( per_cpu(last_vcpu, vpmu->last_pcpu) == v )
> +    if ( cpu_online(vpmu->last_pcpu) &&
> +            per_cpu(last_vcpu, vpmu->last_pcpu) == v )

Indentation.

Jan
Luwei Kang May 23, 2017, 1:47 a.m. UTC | #2
> >>> On 21.05.17 at 15:09, <luwei.kang@intel.com> wrote:
> > v3:
> >  1.add cpu_online() check in vpmu_load() and vpmu_arch_destroy();
> >  2.add vpmu_ prefix. rename cpu_callback() to vpmu_cpu_callback();
> 
> I had specifically objected to the latter.

Sorry, I will roll it back.

> 
> > @@ -394,8 +395,11 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
> >      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
> >          return 0;
> >
> > -    /* First time this VCPU is running here */
> > -    if ( vpmu->last_pcpu != pcpu )
> > +    /*
> > +     * The last pCPU is still online and this is the first time this vCPU
> > +     * is running here.
> > +     */
> > +    if ( cpu_online(vpmu->last_pcpu) && vpmu->last_pcpu != pcpu )
> 
> Adding a cpu_online() check here is unlikely to be helpful. Actually I may have misguided you with prior comments (and if so, I'm
> sorry) - the LOADED check following this one makes sure on_selected_cpus() won't be called with an offline CPU here. IOW I think
> the code can be left untouched, but the reason why should be spelled out in the commit message (matching the reasoning why
> adding the LOADED check to vpmu_arch_destroy() is sufficient for the second use of last_pcpu there).
> 

So, remove the cpu_online() check here, because the LOADED check ensures we
don't send a remote call to an offline CPU (cpu_callback() will reset this
flag).
The cpu_online() check in vpmu_arch_destroy() should be kept, because
per_cpu(last_vcpu, vpmu->last_pcpu) may still hold a stale (non-NULL) value
there.
Is that right?

Thanks,
Luwei Kang

> > @@ -575,15 +579,21 @@ static void vpmu_arch_destroy(struct vcpu *v)
> >       * We will test it again in vpmu_clear_last() with interrupts
> >       * disabled to make sure we don't clear someone else.
> >       */
> > -    if ( per_cpu(last_vcpu, vpmu->last_pcpu) == v )
> > +    if ( cpu_online(vpmu->last_pcpu) &&
> > +            per_cpu(last_vcpu, vpmu->last_pcpu) == v )
> 
> Indentation.
> 
> Jan
Jan Beulich May 23, 2017, 7:31 a.m. UTC | #3
>>> On 23.05.17 at 03:47, <luwei.kang@intel.com> wrote:
>> >>> On 21.05.17 at 15:09, <luwei.kang@intel.com> wrote:
>> > @@ -394,8 +395,11 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
>> >      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>> >          return 0;
>> >
>> > -    /* First time this VCPU is running here */
>> > -    if ( vpmu->last_pcpu != pcpu )
>> > +    /*
>> > +     * The last pCPU is still online and this is the first time this vCPU
>> > +     * is running here.
>> > +     */
>> > +    if ( cpu_online(vpmu->last_pcpu) && vpmu->last_pcpu != pcpu )
>> 
>> Adding a cpu_online() check here is unlikely to be helpful. Actually I may
>> have misguided you with prior comments (and if so, I'm sorry) - the LOADED
>> check following this one makes sure on_selected_cpus() won't be called with
>> an offline CPU here. IOW I think the code can be left untouched, but the
>> reason why should be spelled out in the commit message (matching the
>> reasoning why adding the LOADED check to vpmu_arch_destroy() is sufficient
>> for the second use of last_pcpu there).
>> 
> 
> So, remove the cpu_online() check here, because the LOADED check ensures we
> don't send a remote call to an offline CPU (cpu_callback() will reset this
> flag).
> The cpu_online() check in vpmu_arch_destroy() should be kept, because
> per_cpu(last_vcpu, vpmu->last_pcpu) may still hold a stale (non-NULL) value
> there.
> Is that right?

Yes.

Jan
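
To summarize the agreed outcome, a condensed sketch of the two guards in
vpmu_arch_destroy() (the full hunks are in the patch below):

    /*
     * The notifier resets VPMU_CONTEXT_LOADED but does not clear the dead
     * pCPU's last_vcpu slot, so that per-CPU variable can still hold a
     * stale non-NULL pointer: a liveness check is needed before using it.
     */
    if ( cpu_online(vpmu->last_pcpu) &&
         per_cpu(last_vcpu, vpmu->last_pcpu) == v )
        on_selected_cpus(cpumask_of(vpmu->last_pcpu),
                         vpmu_clear_last, v, 1);

    /*
     * The remote save, by contrast, is sufficiently guarded by the flag
     * alone, since the notifier clears it when a pCPU goes down.
     */
    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
        on_selected_cpus(cpumask_of(vpmu->last_pcpu),
                         vpmu_save_force, v, 1);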

Patch

diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 03401fd..486af12 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -21,6 +21,7 @@ 
 #include <xen/xenoprof.h>
 #include <xen/event.h>
 #include <xen/guest_access.h>
+#include <xen/cpu.h>
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/msr.h>
@@ -394,8 +395,11 @@  int vpmu_load(struct vcpu *v, bool_t from_guest)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return 0;
 
-    /* First time this VCPU is running here */
-    if ( vpmu->last_pcpu != pcpu )
+    /*
+     * The last pCPU is still online and this is the first time this vCPU
+     * is running here.
+     */
+    if ( cpu_online(vpmu->last_pcpu) && vpmu->last_pcpu != pcpu )
     {
         /*
          * Get the context from last pcpu that we ran on. Note that if another
@@ -575,15 +579,21 @@  static void vpmu_arch_destroy(struct vcpu *v)
      * We will test it again in vpmu_clear_last() with interrupts
      * disabled to make sure we don't clear someone else.
      */
-    if ( per_cpu(last_vcpu, vpmu->last_pcpu) == v )
+    if ( cpu_online(vpmu->last_pcpu) &&
+            per_cpu(last_vcpu, vpmu->last_pcpu) == v )
         on_selected_cpus(cpumask_of(vpmu->last_pcpu),
                          vpmu_clear_last, v, 1);
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
     {
-        /* Unload VPMU first. This will stop counters */
-        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
-                         vpmu_save_force, v, 1);
+        /*
+         * Unload VPMU first if VPMU_CONTEXT_LOADED is set.
+         * This will stop counters.
+         */
+        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+            on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
+                             vpmu_save_force, v, 1);
+
          vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
     }
 }
@@ -835,6 +845,33 @@  long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
     return ret;
 }
 
+static int vpmu_cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned int cpu = (unsigned long)hcpu;
+    struct vcpu *vcpu = per_cpu(last_vcpu, cpu);
+    struct vpmu_struct *vpmu;
+
+    if ( !vcpu )
+        return NOTIFY_DONE;
+
+    vpmu = vcpu_vpmu(vcpu);
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return NOTIFY_DONE;
+
+    if ( action == CPU_DYING )
+    {
+        vpmu_save_force(vcpu);
+        vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+    }
+
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block vpmu_cpu_nfb = {
+    .notifier_call = vpmu_cpu_callback
+};
+
 static int __init vpmu_init(void)
 {
     int vendor = current_cpu_data.x86_vendor;
@@ -872,8 +909,11 @@  static int __init vpmu_init(void)
     }
 
     if ( vpmu_mode != XENPMU_MODE_OFF )
+    {
+        register_cpu_notifier(&vpmu_cpu_nfb);
         printk(XENLOG_INFO "VPMU: version " __stringify(XENPMU_VER_MAJ) "."
                __stringify(XENPMU_VER_MIN) "\n");
+    }
     else
         opt_vpmu_enabled = 0;