diff mbox series

[v9,13/15] x86/microcode: Synchronize late microcode loading

Message ID 1566177928-19114-14-git-send-email-chao.gao@intel.com (mailing list archive)
State Superseded
Headers show
Series improve late microcode loading | expand

Commit Message

Chao Gao Aug. 19, 2019, 1:25 a.m. UTC
This patch ports microcode improvement patches from linux kernel.

Before you read any further: the early loading method is still the
preferred one and you should always do that. The following patch is
improving the late loading mechanism for long running jobs and cloud use
cases.

Gather all cores and serialize the microcode update on them by doing it
one-by-one to make the late update process as reliable as possible and
avoid potential issues caused by the microcode update.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Tested-by: Chao Gao <chao.gao@intel.com>
[linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
[linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
Changes in v9:
 - log __buildin_return_address(0) when timeout
 - divide CPUs into three logical sets and they will call different
 functions during ucode loading. The 'control thread' is chosen to
 coordinate ucode loading on all CPUs. Since only control thread would
 set 'loading_state', we can get rid of 'cmpxchg' stuff in v8.
 - s/rep_nop/cpu_relax
 - each thread updates its revision number itself
 - add XENLOG_ERR prefix for each line of multi-line log messages

Changes in v8:
 - to support blocking #NMI handling during loading ucode
   * introduce a flag, 'loading_state', to mark the start or end of
     ucode loading.
   * use a bitmap for cpu callin since if cpu may stay in #NMI handling,
     there are two places for a cpu to call in. bitmap won't be counted
     twice.
   * don't wait for all CPUs callout, just wait for CPUs that perform the
     update. We have to do this because some threads may be stuck in NMI
     handling (where cannot reach the rendezvous).
 - emit a warning if the system stays in stop_machine context for more
 than 1s
 - comment that rdtsc is fine while loading an update
 - use cmpxchg() to avoid panic being called on multiple CPUs
 - Propagate revision number to other threads
 - refine comments and prompt messages

Changes in v7:
 - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int.
 - reword the comment above microcode_update_cpu() to clearly state that
 one thread per core should do the update.
---
 xen/arch/x86/microcode.c | 289 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 267 insertions(+), 22 deletions(-)

Comments

Sergey Dyasli Aug. 19, 2019, 10:27 a.m. UTC | #1
On 19/08/2019 02:25, Chao Gao wrote:
> This patch ports microcode improvement patches from linux kernel.
> 
> Before you read any further: the early loading method is still the
> preferred one and you should always do that. The following patch is
> improving the late loading mechanism for long running jobs and cloud use
> cases.
> 
> Gather all cores and serialize the microcode update on them by doing it
> one-by-one to make the late update process as reliable as possible and
> avoid potential issues caused by the microcode update.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Tested-by: Chao Gao <chao.gao@intel.com>
> [linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
> [linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
> Changes in v9:
>  - log __buildin_return_address(0) when timeout
>  - divide CPUs into three logical sets and they will call different
>  functions during ucode loading. The 'control thread' is chosen to
>  coordinate ucode loading on all CPUs. Since only control thread would
>  set 'loading_state', we can get rid of 'cmpxchg' stuff in v8.
>  - s/rep_nop/cpu_relax
>  - each thread updates its revision number itself
>  - add XENLOG_ERR prefix for each line of multi-line log messages
> 
> Changes in v8:
>  - to support blocking #NMI handling during loading ucode
>    * introduce a flag, 'loading_state', to mark the start or end of
>      ucode loading.
>    * use a bitmap for cpu callin since if cpu may stay in #NMI handling,
>      there are two places for a cpu to call in. bitmap won't be counted
>      twice.
>    * don't wait for all CPUs callout, just wait for CPUs that perform the
>      update. We have to do this because some threads may be stuck in NMI
>      handling (where cannot reach the rendezvous).
>  - emit a warning if the system stays in stop_machine context for more
>  than 1s
>  - comment that rdtsc is fine while loading an update
>  - use cmpxchg() to avoid panic being called on multiple CPUs
>  - Propagate revision number to other threads
>  - refine comments and prompt messages
> 
> Changes in v7:
>  - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int.
>  - reword the comment above microcode_update_cpu() to clearly state that
>  one thread per core should do the update.
> ---
>  xen/arch/x86/microcode.c | 289 +++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 267 insertions(+), 22 deletions(-)
> 
> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
> index bdd9c9f..91f9e81 100644
> --- a/xen/arch/x86/microcode.c
> +++ b/xen/arch/x86/microcode.c
> @@ -30,18 +30,52 @@
>  #include <xen/smp.h>
>  #include <xen/softirq.h>
>  #include <xen/spinlock.h>
> +#include <xen/stop_machine.h>
>  #include <xen/tasklet.h>
>  #include <xen/guest_access.h>
>  #include <xen/earlycpio.h>
> +#include <xen/watchdog.h>
>  
> +#include <asm/delay.h>
>  #include <asm/msr.h>
>  #include <asm/processor.h>
>  #include <asm/setup.h>
>  #include <asm/microcode.h>
>  
> +/*
> + * Before performing a late microcode update on any thread, we
> + * rendezvous all cpus in stop_machine context. The timeout for
> + * waiting for cpu rendezvous is 30ms. It is the timeout used by
> + * live patching
> + */
> +#define MICROCODE_CALLIN_TIMEOUT_US 30000
> +
> +/*
> + * Timeout for each thread to complete update is set to 1s. It is a
> + * conservative choice considering all possible interference.
> + */
> +#define MICROCODE_UPDATE_TIMEOUT_US 1000000
> +
>  static module_t __initdata ucode_mod;
>  static signed int __initdata ucode_mod_idx;
>  static bool_t __initdata ucode_mod_forced;
> +static unsigned int nr_cores;
> +
> +/*
> + * These states help to coordinate CPUs during loading an update.
> + *
> + * The semantics of each state is as follow:
> + *  - LOADING_PREPARE: initial state of 'loading_state'.
> + *  - LOADING_CALLIN: CPUs are allowed to callin.
> + *  - LOADING_ENTER: all CPUs have called in. Initiate ucode loading.
> + *  - LOADING_EXIT: ucode loading is done or aborted.
> + */
> +static enum {
> +    LOADING_PREPARE,
> +    LOADING_CALLIN,
> +    LOADING_ENTER,
> +    LOADING_EXIT,
> +} loading_state;
>  
>  /*
>   * If we scan the initramfs.cpio for the early microcode code
> @@ -190,6 +224,16 @@ static DEFINE_SPINLOCK(microcode_mutex);
>  DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
>  
>  /*
> + * Count the CPUs that have entered, exited the rendezvous and succeeded in
> + * microcode update during late microcode update respectively.
> + *
> + * Note that a bitmap is used for callin to allow cpu to set a bit multiple
> + * times. It is required to do busy-loop in #NMI handling.
> + */
> +static cpumask_t cpu_callin_map;
> +static atomic_t cpu_out, cpu_updated;
> +
> +/*
>   * Return a patch that covers current CPU. If there are multiple patches,
>   * return the one with the highest revision number. Return error If no
>   * patch is found and an error occurs during the parsing process. Otherwise
> @@ -232,6 +276,34 @@ bool microcode_update_cache(struct microcode_patch *patch)
>      return true;
>  }
>  
> +/* Wait for a condition to be met with a timeout (us). */
> +static int wait_for_condition(int (*func)(void *data), void *data,
> +                         unsigned int timeout)
> +{
> +    while ( !func(data) )
> +    {
> +        if ( !timeout-- )
> +        {
> +            printk("CPU%u: Timeout in %pS\n",
> +                   smp_processor_id(), __builtin_return_address(0));
> +            return -EBUSY;
> +        }
> +        udelay(1);
> +    }
> +
> +    return 0;
> +}
> +
> +static int wait_cpu_callin(void *nr)
> +{
> +    return cpumask_weight(&cpu_callin_map) >= (unsigned long)nr;
> +}
> +
> +static int wait_cpu_callout(void *nr)
> +{
> +    return atomic_read(&cpu_out) >= (unsigned long)nr;
> +}
> +
>  /*
>   * Load a microcode update to current CPU.
>   *
> @@ -265,37 +337,155 @@ static int microcode_update_cpu(const struct microcode_patch *patch)
>      return err;
>  }
>  
> -static long do_microcode_update(void *patch)
> +static int slave_thread_fn(void)
> +{
> +    unsigned int cpu = smp_processor_id();
> +    unsigned int master = cpumask_first(this_cpu(cpu_sibling_mask));
> +
> +    while ( loading_state != LOADING_CALLIN )
> +        cpu_relax();
> +
> +    cpumask_set_cpu(cpu, &cpu_callin_map);
> +
> +    while ( loading_state != LOADING_EXIT )
> +        cpu_relax();
> +
> +    /* Copy update revision from the "master" thread. */
> +    this_cpu(cpu_sig).rev = per_cpu(cpu_sig, master).rev;
> +
> +    return 0;
> +}
> +
> +static int master_thread_fn(const struct microcode_patch *patch)
> +{
> +    unsigned int cpu = smp_processor_id();
> +    int ret = 0;
> +
> +    while ( loading_state != LOADING_CALLIN )
> +        cpu_relax();
> +
> +    cpumask_set_cpu(cpu, &cpu_callin_map);
> +
> +    while ( loading_state != LOADING_ENTER )
> +        cpu_relax();

If I'm reading it right, this will wait forever in case when...

> +
> +    /*
> +     * If an error happened, control thread would set 'loading_state'
> +     * to LOADING_EXIT. Don't perform ucode loading for this case
> +     */
> +    if ( loading_state == LOADING_EXIT )
> +        return ret;
> +
> +    ret = microcode_ops->apply_microcode(patch);
> +    if ( !ret )
> +        atomic_inc(&cpu_updated);
> +    atomic_inc(&cpu_out);
> +
> +    while ( loading_state != LOADING_EXIT )
> +        cpu_relax();
> +
> +    return ret;
> +}
> +
> +static int control_thread_fn(const struct microcode_patch *patch)
>  {
> -    unsigned int cpu;
> +    unsigned int cpu = smp_processor_id(), done;
> +    unsigned long tick;
> +    int ret;
>  
> -    /* Store the patch after a successful loading */
> -    if ( !microcode_update_cpu(patch) && patch )
> +    /* Allow threads to call in */
> +    loading_state = LOADING_CALLIN;
> +    smp_mb();
> +
> +    cpumask_set_cpu(cpu, &cpu_callin_map);
> +
> +    /* Waiting for all threads calling in */
> +    ret = wait_for_condition(wait_cpu_callin,
> +                             (void *)(unsigned long)num_online_cpus(),
> +                             MICROCODE_CALLIN_TIMEOUT_US);
> +    if ( ret ) {
> +        loading_state = LOADING_EXIT;
> +        return ret;
> +    }

...this condition holds. Have you actually tested this case?

> +
> +    /* Let master threads load the given ucode update */
> +    loading_state = LOADING_ENTER;
> +    smp_mb();
> +
> +    ret = microcode_ops->apply_microcode(patch);
> +    if ( !ret )
> +        atomic_inc(&cpu_updated);
> +    atomic_inc(&cpu_out);
> +
> +    tick = rdtsc_ordered();
> +    /* Waiting for master threads finishing update */
> +    done = atomic_read(&cpu_out);
> +    while ( done != nr_cores )
>      {
> -        spin_lock(&microcode_mutex);
> -        microcode_update_cache(patch);
> -        spin_unlock(&microcode_mutex);
> -        patch = NULL;
> +        /*
> +         * During each timeout interval, at least a CPU is expected to
> +         * finish its update. Otherwise, something goes wrong.
> +         *
> +         * Note that RDTSC (in wait_for_condition()) is safe for threads to
> +         * execute while waiting for completion of loading an update.
> +         */
> +        if ( wait_for_condition(wait_cpu_callout,
> +                                (void *)(unsigned long)(done + 1),
> +                                MICROCODE_UPDATE_TIMEOUT_US) )
> +            panic("Timeout when finished updating microcode (finished %u/%u)",
> +                  done, nr_cores);
> +
> +        /* Print warning message once if long time is spent here */
> +        if ( tick && rdtsc_ordered() - tick >= cpu_khz * 1000 )
> +        {
> +            printk(XENLOG_WARNING
> +                   "WARNING: UPDATING MICROCODE HAS CONSUMED MORE THAN 1 SECOND!\n");
> +            tick = 0;
> +        }
> +        done = atomic_read(&cpu_out);
>      }
>  
> -    if ( microcode_ops->end_update )
> -        microcode_ops->end_update();
> +    /* Mark loading is done to unblock other threads */
> +    loading_state = LOADING_EXIT;
> +    smp_mb();
>  
> -    cpu = cpumask_next(smp_processor_id(), &cpu_online_map);
> -    if ( cpu < nr_cpu_ids )
> -        return continue_hypercall_on_cpu(cpu, do_microcode_update, patch);
> +    return ret;
> +}
>  
> -    /* Free the patch if no CPU has loaded it successfully. */
> -    if ( patch )
> -        microcode_free_patch(patch);
> +static int do_microcode_update(void *patch)
> +{
> +    unsigned int cpu = smp_processor_id();
> +    /*
> +     * "master" thread is the one with the lowest thread id among all siblings
> +     * thread in a core or a compute unit. It is chosen to load a microcode
> +     * update.
> +     */
> +    unsigned int master = cpumask_first(this_cpu(cpu_sibling_mask));
> +    int ret;
>  
> -    return 0;
> +    /*
> +     * The control thread set state to coordinate ucode loading. Master threads
> +     * load the given ucode patch. Slave threads just wait for the completion
> +     * of the ucode loading process.
> +     */
> +    if ( cpu == cpumask_first(&cpu_online_map) )
> +        ret = control_thread_fn(patch);
> +    else if ( cpu == master )
> +        ret = master_thread_fn(patch);
> +    else
> +        ret = slave_thread_fn();
> +
> +    if ( microcode_ops->end_update )
> +        microcode_ops->end_update();
> +
> +    return ret;
>  }
>  
>  int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>  {
>      int ret;
>      void *buffer;
> +    unsigned int cpu, updated;
>      struct microcode_patch *patch;
>  
>      if ( len != (uint32_t)len )
> @@ -314,11 +504,18 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>          goto free;
>      }
>  
> +    /* cpu_online_map must not change during update */
> +    if ( !get_cpu_maps() )
> +    {
> +        ret = -EBUSY;
> +        goto free;
> +    }
> +
>      if ( microcode_ops->start_update )
>      {
>          ret = microcode_ops->start_update();
>          if ( ret != 0 )
> -            goto free;
> +            goto put;
>      }
>  
>      patch = parse_blob(buffer, len);
> @@ -326,19 +523,67 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>      {
>          ret = PTR_ERR(patch);
>          printk(XENLOG_INFO "Parsing microcode blob error %d\n", ret);
> -        goto free;
> +        goto put;
>      }
>  
>      if ( !patch )
>      {
>          printk(XENLOG_INFO "No ucode found. Update aborted!\n");
>          ret = -EINVAL;
> -        goto free;
> +        goto put;
> +    }
> +
> +    cpumask_clear(&cpu_callin_map);
> +    atomic_set(&cpu_out, 0);
> +    atomic_set(&cpu_updated, 0);
> +    loading_state = LOADING_PREPARE;
> +
> +    /* Calculate the number of online CPU core */
> +    nr_cores = 0;
> +    for_each_online_cpu(cpu)
> +        if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
> +            nr_cores++;
> +
> +    printk(XENLOG_INFO "%u cores are to update their microcode\n", nr_cores);
> +
> +    /*
> +     * We intend to disable interrupt for long time, which may lead to
> +     * watchdog timeout.
> +     */
> +    watchdog_disable();
> +    /*
> +     * Late loading dance. Why the heavy-handed stop_machine effort?
> +     *
> +     * - HT siblings must be idle and not execute other code while the other
> +     *   sibling is loading microcode in order to avoid any negative
> +     *   interactions cause by the loading.
> +     *
> +     * - In addition, microcode update on the cores must be serialized until
> +     *   this requirement can be relaxed in the future. Right now, this is
> +     *   conservative and good.
> +     */
> +    ret = stop_machine_run(do_microcode_update, patch, NR_CPUS);
> +    watchdog_enable();
> +
> +    updated = atomic_read(&cpu_updated);
> +    if ( updated > 0 )
> +    {
> +        spin_lock(&microcode_mutex);
> +        microcode_update_cache(patch);
> +        spin_unlock(&microcode_mutex);
>      }
> +    else
> +        microcode_free_patch(patch);
>  
> -    ret = continue_hypercall_on_cpu(cpumask_first(&cpu_online_map),
> -                                    do_microcode_update, patch);
> +    if ( updated && updated != nr_cores )
> +        printk(XENLOG_ERR "ERROR: Updating microcode succeeded on %u cores and failed\n"
> +               XENLOG_ERR "on other %u cores. A system with differing microcode \n"
> +               XENLOG_ERR "revisions is considered unstable. Please reboot and do not\n"
> +               XENLOG_ERR "load the microcode that triggersthis warning!\n",
> +               updated, nr_cores - updated);
>  
> + put:
> +    put_cpu_maps();
>   free:
>      xfree(buffer);
>      return ret;
>
Chao Gao Aug. 19, 2019, 2:49 p.m. UTC | #2
On Mon, Aug 19, 2019 at 11:27:36AM +0100, Sergey Dyasli wrote:
>> +static int master_thread_fn(const struct microcode_patch *patch)
>> +{
>> +    unsigned int cpu = smp_processor_id();
>> +    int ret = 0;
>> +
>> +    while ( loading_state != LOADING_CALLIN )
>> +        cpu_relax();
>> +
>> +    cpumask_set_cpu(cpu, &cpu_callin_map);
>> +
>> +    while ( loading_state != LOADING_ENTER )
>> +        cpu_relax();
>
>If I'm reading it right, this will wait forever in case when...
>
>> +
>> +    /*
>> +     * If an error happened, control thread would set 'loading_state'
>> +     * to LOADING_EXIT. Don't perform ucode loading for this case
>> +     */
>> +    if ( loading_state == LOADING_EXIT )
>> +        return ret;

I tried to check whether there was an error here. But as you said, we
cannot reach here if 'control thread' set loading_state from LOADING_CALLIN
to LOADING_EXIT. I will do this check in the while-loop right above.

>> +
>> +    ret = microcode_ops->apply_microcode(patch);
>> +    if ( !ret )
>> +        atomic_inc(&cpu_updated);
>> +    atomic_inc(&cpu_out);
>> +
>> +    while ( loading_state != LOADING_EXIT )
>> +        cpu_relax();
>> +
>> +    return ret;
>> +}
>> +
>> +static int control_thread_fn(const struct microcode_patch *patch)
>>  {
>> -    unsigned int cpu;
>> +    unsigned int cpu = smp_processor_id(), done;
>> +    unsigned long tick;
>> +    int ret;
>>  
>> -    /* Store the patch after a successful loading */
>> -    if ( !microcode_update_cpu(patch) && patch )
>> +    /* Allow threads to call in */
>> +    loading_state = LOADING_CALLIN;
>> +    smp_mb();
>> +
>> +    cpumask_set_cpu(cpu, &cpu_callin_map);
>> +
>> +    /* Waiting for all threads calling in */
>> +    ret = wait_for_condition(wait_cpu_callin,
>> +                             (void *)(unsigned long)num_online_cpus(),
>> +                             MICROCODE_CALLIN_TIMEOUT_US);
>> +    if ( ret ) {
>> +        loading_state = LOADING_EXIT;
>> +        return ret;
>> +    }
>
>...this condition holds. Have you actually tested this case?

I didn't craft a case to verify the error-handling path. And I believe
that you are right. 

Thanks
Chao
Jan Beulich Aug. 29, 2019, 12:06 p.m. UTC | #3
On 19.08.2019 03:25, Chao Gao wrote:
> @@ -232,6 +276,34 @@ bool microcode_update_cache(struct microcode_patch *patch)
>      return true;
>  }
>  
> +/* Wait for a condition to be met with a timeout (us). */
> +static int wait_for_condition(int (*func)(void *data), void *data,
> +                         unsigned int timeout)
> +{
> +    while ( !func(data) )
> +    {
> +        if ( !timeout-- )
> +        {
> +            printk("CPU%u: Timeout in %pS\n",
> +                   smp_processor_id(), __builtin_return_address(0));
> +            return -EBUSY;
> +        }
> +        udelay(1);
> +    }
> +
> +    return 0;
> +}
> +
> +static int wait_cpu_callin(void *nr)
> +{
> +    return cpumask_weight(&cpu_callin_map) >= (unsigned long)nr;
> +}
> +
> +static int wait_cpu_callout(void *nr)
> +{
> +    return atomic_read(&cpu_out) >= (unsigned long)nr;
> +}

Since wait_for_condition() is used with only these two functions
as callbacks, they should imo return bool and take const void *.

> @@ -265,37 +337,155 @@ static int microcode_update_cpu(const struct microcode_patch *patch)
>      return err;
>  }
>  
> -static long do_microcode_update(void *patch)
> +static int slave_thread_fn(void)
> +{
> +    unsigned int cpu = smp_processor_id();
> +    unsigned int master = cpumask_first(this_cpu(cpu_sibling_mask));
> +
> +    while ( loading_state != LOADING_CALLIN )
> +        cpu_relax();
> +
> +    cpumask_set_cpu(cpu, &cpu_callin_map);
> +
> +    while ( loading_state != LOADING_EXIT )
> +        cpu_relax();
> +
> +    /* Copy update revision from the "master" thread. */
> +    this_cpu(cpu_sig).rev = per_cpu(cpu_sig, master).rev;
> +
> +    return 0;
> +}
> +
> +static int master_thread_fn(const struct microcode_patch *patch)
> +{
> +    unsigned int cpu = smp_processor_id();
> +    int ret = 0;
> +
> +    while ( loading_state != LOADING_CALLIN )
> +        cpu_relax();
> +
> +    cpumask_set_cpu(cpu, &cpu_callin_map);
> +
> +    while ( loading_state != LOADING_ENTER )
> +        cpu_relax();
> +
> +    /*
> +     * If an error happened, control thread would set 'loading_state'
> +     * to LOADING_EXIT. Don't perform ucode loading for this case
> +     */
> +    if ( loading_state == LOADING_EXIT )
> +        return ret;

Even if the producer transitions this through ENTER to EXIT, the
observer here may never get to see the ENTER state, and hence
never exit the loop above. You want either < ENTER or == CALLIN.

> +    ret = microcode_ops->apply_microcode(patch);
> +    if ( !ret )
> +        atomic_inc(&cpu_updated);
> +    atomic_inc(&cpu_out);
> +
> +    while ( loading_state != LOADING_EXIT )
> +        cpu_relax();
> +
> +    return ret;
> +}

As a cosmetic remark, I don't think "master" and "slave" are
suitable terms here. "primary" and "secondary" would imo come
closer to what the threads' relationship is.

> +static int control_thread_fn(const struct microcode_patch *patch)
>  {
> -    unsigned int cpu;
> +    unsigned int cpu = smp_processor_id(), done;
> +    unsigned long tick;
> +    int ret;
>  
> -    /* Store the patch after a successful loading */
> -    if ( !microcode_update_cpu(patch) && patch )
> +    /* Allow threads to call in */
> +    loading_state = LOADING_CALLIN;
> +    smp_mb();

Why not just smp_wmb()? (Same further down then.)

> +    cpumask_set_cpu(cpu, &cpu_callin_map);
> +
> +    /* Waiting for all threads calling in */
> +    ret = wait_for_condition(wait_cpu_callin,
> +                             (void *)(unsigned long)num_online_cpus(),
> +                             MICROCODE_CALLIN_TIMEOUT_US);
> +    if ( ret ) {

Misplaced brace.

> +static int do_microcode_update(void *patch)

const?

> @@ -326,19 +523,67 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>      {
>          ret = PTR_ERR(patch);
>          printk(XENLOG_INFO "Parsing microcode blob error %d\n", ret);
> -        goto free;
> +        goto put;
>      }
>  
>      if ( !patch )
>      {
>          printk(XENLOG_INFO "No ucode found. Update aborted!\n");
>          ret = -EINVAL;
> -        goto free;
> +        goto put;
> +    }
> +
> +    cpumask_clear(&cpu_callin_map);
> +    atomic_set(&cpu_out, 0);
> +    atomic_set(&cpu_updated, 0);
> +    loading_state = LOADING_PREPARE;
> +
> +    /* Calculate the number of online CPU core */
> +    nr_cores = 0;
> +    for_each_online_cpu(cpu)
> +        if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
> +            nr_cores++;
> +
> +    printk(XENLOG_INFO "%u cores are to update their microcode\n", nr_cores);
> +
> +    /*
> +     * We intend to disable interrupt for long time, which may lead to
> +     * watchdog timeout.
> +     */
> +    watchdog_disable();
> +    /*
> +     * Late loading dance. Why the heavy-handed stop_machine effort?
> +     *
> +     * - HT siblings must be idle and not execute other code while the other
> +     *   sibling is loading microcode in order to avoid any negative
> +     *   interactions cause by the loading.
> +     *
> +     * - In addition, microcode update on the cores must be serialized until
> +     *   this requirement can be relaxed in the future. Right now, this is
> +     *   conservative and good.
> +     */
> +    ret = stop_machine_run(do_microcode_update, patch, NR_CPUS);
> +    watchdog_enable();

Considering that stop_machine_run() doesn't itself disable the watchdog,
did you consider having the control thread disable/enable the watchdog,
thus shortening the period where it's not active?

> +    updated = atomic_read(&cpu_updated);
> +    if ( updated > 0 )
> +    {
> +        spin_lock(&microcode_mutex);
> +        microcode_update_cache(patch);
> +        spin_unlock(&microcode_mutex);
>      }
> +    else
> +        microcode_free_patch(patch);
>  
> -    ret = continue_hypercall_on_cpu(cpumask_first(&cpu_online_map),
> -                                    do_microcode_update, patch);
> +    if ( updated && updated != nr_cores )
> +        printk(XENLOG_ERR "ERROR: Updating microcode succeeded on %u cores and failed\n"
> +               XENLOG_ERR "on other %u cores. A system with differing microcode \n"

Stray blank before newline.

> +               XENLOG_ERR "revisions is considered unstable. Please reboot and do not\n"
> +               XENLOG_ERR "load the microcode that triggersthis warning!\n",

Missing blank before "this".

Jan
Chao Gao Aug. 30, 2019, 3:30 a.m. UTC | #4
On Thu, Aug 29, 2019 at 02:06:39PM +0200, Jan Beulich wrote:
>On 19.08.2019 03:25, Chao Gao wrote:
>> +
>> +static int master_thread_fn(const struct microcode_patch *patch)
>> +{
>> +    unsigned int cpu = smp_processor_id();
>> +    int ret = 0;
>> +
>> +    while ( loading_state != LOADING_CALLIN )
>> +        cpu_relax();
>> +
>> +    cpumask_set_cpu(cpu, &cpu_callin_map);
>> +
>> +    while ( loading_state != LOADING_ENTER )
>> +        cpu_relax();
>> +
>> +    /*
>> +     * If an error happened, control thread would set 'loading_state'
>> +     * to LOADING_EXIT. Don't perform ucode loading for this case
>> +     */
>> +    if ( loading_state == LOADING_EXIT )
>> +        return ret;
>
>Even if the producer transitions this through ENTER to EXIT, the
>observer here may never get to see the ENTER state, and hence
>never exit the loop above. You want either < ENTER or == CALLIN.

Yes. I find stopmachine_action() is a good example to implement
a state machine. I will follow it.

>
>> +    ret = microcode_ops->apply_microcode(patch);
>> +    if ( !ret )
>> +        atomic_inc(&cpu_updated);
>> +    atomic_inc(&cpu_out);
>> +
>> +    while ( loading_state != LOADING_EXIT )
>> +        cpu_relax();
>> +
>> +    return ret;
>> +}
>
>As a cosmetic remark, I don't think "master" and "slave" are
>suitable terms here. "primary" and "secondary" would imo come
>closer to what the threads' relationship is.

Will do.

>> +
>> +    /*
>> +     * We intend to disable interrupt for long time, which may lead to
>> +     * watchdog timeout.
>> +     */
>> +    watchdog_disable();
>> +    /*
>> +     * Late loading dance. Why the heavy-handed stop_machine effort?
>> +     *
>> +     * - HT siblings must be idle and not execute other code while the other
>> +     *   sibling is loading microcode in order to avoid any negative
>> +     *   interactions cause by the loading.
>> +     *
>> +     * - In addition, microcode update on the cores must be serialized until
>> +     *   this requirement can be relaxed in the future. Right now, this is
>> +     *   conservative and good.
>> +     */
>> +    ret = stop_machine_run(do_microcode_update, patch, NR_CPUS);
>> +    watchdog_enable();
>
>Considering that stop_machine_run() doesn't itself disable the watchdog,
>did you consider having the control thread disable/enable the watchdog,
>thus shortening the period where it's not active?

Good idea. It helps to keep the code here clean. I think maybe
microcode_nmi_callback can be registered by control thread as well.

Thanks
Chao
diff mbox series

Patch

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index bdd9c9f..91f9e81 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -30,18 +30,52 @@ 
 #include <xen/smp.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/stop_machine.h>
 #include <xen/tasklet.h>
 #include <xen/guest_access.h>
 #include <xen/earlycpio.h>
+#include <xen/watchdog.h>
 
+#include <asm/delay.h>
 #include <asm/msr.h>
 #include <asm/processor.h>
 #include <asm/setup.h>
 #include <asm/microcode.h>
 
+/*
+ * Before performing a late microcode update on any thread, we
+ * rendezvous all cpus in stop_machine context. The timeout for
+ * waiting for cpu rendezvous is 30ms. It is the timeout used by
+ * live patching
+ */
+#define MICROCODE_CALLIN_TIMEOUT_US 30000
+
+/*
+ * Timeout for each thread to complete update is set to 1s. It is a
+ * conservative choice considering all possible interference.
+ */
+#define MICROCODE_UPDATE_TIMEOUT_US 1000000
+
 static module_t __initdata ucode_mod;
 static signed int __initdata ucode_mod_idx;
 static bool_t __initdata ucode_mod_forced;
+static unsigned int nr_cores;
+
+/*
+ * These states help to coordinate CPUs during loading an update.
+ *
+ * The semantics of each state is as follow:
+ *  - LOADING_PREPARE: initial state of 'loading_state'.
+ *  - LOADING_CALLIN: CPUs are allowed to callin.
+ *  - LOADING_ENTER: all CPUs have called in. Initiate ucode loading.
+ *  - LOADING_EXIT: ucode loading is done or aborted.
+ */
+static enum {
+    LOADING_PREPARE,
+    LOADING_CALLIN,
+    LOADING_ENTER,
+    LOADING_EXIT,
+} loading_state;
 
 /*
  * If we scan the initramfs.cpio for the early microcode code
@@ -190,6 +224,16 @@  static DEFINE_SPINLOCK(microcode_mutex);
 DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
 
 /*
+ * Count the CPUs that have entered, exited the rendezvous and succeeded in
+ * microcode update during late microcode update respectively.
+ *
+ * Note that a bitmap is used for callin to allow cpu to set a bit multiple
+ * times. It is required to do busy-loop in #NMI handling.
+ */
+static cpumask_t cpu_callin_map;
+static atomic_t cpu_out, cpu_updated;
+
+/*
  * Return a patch that covers current CPU. If there are multiple patches,
  * return the one with the highest revision number. Return error If no
  * patch is found and an error occurs during the parsing process. Otherwise
@@ -232,6 +276,34 @@  bool microcode_update_cache(struct microcode_patch *patch)
     return true;
 }
 
+/* Wait for a condition to be met with a timeout (us). */
+static int wait_for_condition(int (*func)(void *data), void *data,
+                         unsigned int timeout)
+{
+    while ( !func(data) )
+    {
+        if ( !timeout-- )
+        {
+            printk("CPU%u: Timeout in %pS\n",
+                   smp_processor_id(), __builtin_return_address(0));
+            return -EBUSY;
+        }
+        udelay(1);
+    }
+
+    return 0;
+}
+
+static int wait_cpu_callin(void *nr)
+{
+    return cpumask_weight(&cpu_callin_map) >= (unsigned long)nr;
+}
+
+static int wait_cpu_callout(void *nr)
+{
+    return atomic_read(&cpu_out) >= (unsigned long)nr;
+}
+
 /*
  * Load a microcode update to current CPU.
  *
@@ -265,37 +337,155 @@  static int microcode_update_cpu(const struct microcode_patch *patch)
     return err;
 }
 
-static long do_microcode_update(void *patch)
+static int slave_thread_fn(void)
+{
+    unsigned int cpu = smp_processor_id();
+    unsigned int master = cpumask_first(this_cpu(cpu_sibling_mask));
+
+    while ( loading_state != LOADING_CALLIN )
+        cpu_relax();
+
+    cpumask_set_cpu(cpu, &cpu_callin_map);
+
+    while ( loading_state != LOADING_EXIT )
+        cpu_relax();
+
+    /* Copy update revision from the "master" thread. */
+    this_cpu(cpu_sig).rev = per_cpu(cpu_sig, master).rev;
+
+    return 0;
+}
+
+static int master_thread_fn(const struct microcode_patch *patch)
+{
+    unsigned int cpu = smp_processor_id();
+    int ret = 0;
+
+    while ( loading_state != LOADING_CALLIN )
+        cpu_relax();
+
+    cpumask_set_cpu(cpu, &cpu_callin_map);
+
+    while ( loading_state != LOADING_ENTER )
+        cpu_relax();
+
+    /*
+     * If an error happened, control thread would set 'loading_state'
+     * to LOADING_EXIT. Don't perform ucode loading for this case
+     */
+    if ( loading_state == LOADING_EXIT )
+        return ret;
+
+    ret = microcode_ops->apply_microcode(patch);
+    if ( !ret )
+        atomic_inc(&cpu_updated);
+    atomic_inc(&cpu_out);
+
+    while ( loading_state != LOADING_EXIT )
+        cpu_relax();
+
+    return ret;
+}
+
+static int control_thread_fn(const struct microcode_patch *patch)
 {
-    unsigned int cpu;
+    unsigned int cpu = smp_processor_id(), done;
+    unsigned long tick;
+    int ret;
 
-    /* Store the patch after a successful loading */
-    if ( !microcode_update_cpu(patch) && patch )
+    /* Allow threads to call in */
+    loading_state = LOADING_CALLIN;
+    smp_mb();
+
+    cpumask_set_cpu(cpu, &cpu_callin_map);
+
+    /* Waiting for all threads calling in */
+    ret = wait_for_condition(wait_cpu_callin,
+                             (void *)(unsigned long)num_online_cpus(),
+                             MICROCODE_CALLIN_TIMEOUT_US);
+    if ( ret ) {
+        loading_state = LOADING_EXIT;
+        return ret;
+    }
+
+    /* Let master threads load the given ucode update */
+    loading_state = LOADING_ENTER;
+    smp_mb();
+
+    ret = microcode_ops->apply_microcode(patch);
+    if ( !ret )
+        atomic_inc(&cpu_updated);
+    atomic_inc(&cpu_out);
+
+    tick = rdtsc_ordered();
+    /* Waiting for master threads finishing update */
+    done = atomic_read(&cpu_out);
+    while ( done != nr_cores )
     {
-        spin_lock(&microcode_mutex);
-        microcode_update_cache(patch);
-        spin_unlock(&microcode_mutex);
-        patch = NULL;
+        /*
+         * During each timeout interval, at least a CPU is expected to
+         * finish its update. Otherwise, something goes wrong.
+         *
+         * Note that RDTSC (in wait_for_condition()) is safe for threads to
+         * execute while waiting for completion of loading an update.
+         */
+        if ( wait_for_condition(wait_cpu_callout,
+                                (void *)(unsigned long)(done + 1),
+                                MICROCODE_UPDATE_TIMEOUT_US) )
+            panic("Timeout when finished updating microcode (finished %u/%u)",
+                  done, nr_cores);
+
+        /* Print warning message once if long time is spent here */
+        if ( tick && rdtsc_ordered() - tick >= cpu_khz * 1000 )
+        {
+            printk(XENLOG_WARNING
+                   "WARNING: UPDATING MICROCODE HAS CONSUMED MORE THAN 1 SECOND!\n");
+            tick = 0;
+        }
+        done = atomic_read(&cpu_out);
     }
 
-    if ( microcode_ops->end_update )
-        microcode_ops->end_update();
+    /* Mark loading is done to unblock other threads */
+    loading_state = LOADING_EXIT;
+    smp_mb();
 
-    cpu = cpumask_next(smp_processor_id(), &cpu_online_map);
-    if ( cpu < nr_cpu_ids )
-        return continue_hypercall_on_cpu(cpu, do_microcode_update, patch);
+    return ret;
+}
 
-    /* Free the patch if no CPU has loaded it successfully. */
-    if ( patch )
-        microcode_free_patch(patch);
+static int do_microcode_update(void *patch)
+{
+    unsigned int cpu = smp_processor_id();
+    /*
+     * "master" thread is the one with the lowest thread id among all siblings
+     * thread in a core or a compute unit. It is chosen to load a microcode
+     * update.
+     */
+    unsigned int master = cpumask_first(this_cpu(cpu_sibling_mask));
+    int ret;
 
-    return 0;
+    /*
+     * The control thread set state to coordinate ucode loading. Master threads
+     * load the given ucode patch. Slave threads just wait for the completion
+     * of the ucode loading process.
+     */
+    if ( cpu == cpumask_first(&cpu_online_map) )
+        ret = control_thread_fn(patch);
+    else if ( cpu == master )
+        ret = master_thread_fn(patch);
+    else
+        ret = slave_thread_fn();
+
+    if ( microcode_ops->end_update )
+        microcode_ops->end_update();
+
+    return ret;
 }
 
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
 {
     int ret;
     void *buffer;
+    unsigned int cpu, updated;
     struct microcode_patch *patch;
 
     if ( len != (uint32_t)len )
@@ -314,11 +504,18 @@  int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
         goto free;
     }
 
+    /* cpu_online_map must not change during update */
+    if ( !get_cpu_maps() )
+    {
+        ret = -EBUSY;
+        goto free;
+    }
+
     if ( microcode_ops->start_update )
     {
         ret = microcode_ops->start_update();
         if ( ret != 0 )
-            goto free;
+            goto put;
     }
 
     patch = parse_blob(buffer, len);
@@ -326,19 +523,67 @@  int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
     {
         ret = PTR_ERR(patch);
         printk(XENLOG_INFO "Parsing microcode blob error %d\n", ret);
-        goto free;
+        goto put;
     }
 
     if ( !patch )
     {
         printk(XENLOG_INFO "No ucode found. Update aborted!\n");
         ret = -EINVAL;
-        goto free;
+        goto put;
+    }
+
+    cpumask_clear(&cpu_callin_map);
+    atomic_set(&cpu_out, 0);
+    atomic_set(&cpu_updated, 0);
+    loading_state = LOADING_PREPARE;
+
+    /* Calculate the number of online CPU core */
+    nr_cores = 0;
+    for_each_online_cpu(cpu)
+        if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
+            nr_cores++;
+
+    printk(XENLOG_INFO "%u cores are to update their microcode\n", nr_cores);
+
+    /*
+     * We intend to disable interrupt for long time, which may lead to
+     * watchdog timeout.
+     */
+    watchdog_disable();
+    /*
+     * Late loading dance. Why the heavy-handed stop_machine effort?
+     *
+     * - HT siblings must be idle and not execute other code while the other
+     *   sibling is loading microcode in order to avoid any negative
+     *   interactions cause by the loading.
+     *
+     * - In addition, microcode update on the cores must be serialized until
+     *   this requirement can be relaxed in the future. Right now, this is
+     *   conservative and good.
+     */
+    ret = stop_machine_run(do_microcode_update, patch, NR_CPUS);
+    watchdog_enable();
+
+    updated = atomic_read(&cpu_updated);
+    if ( updated > 0 )
+    {
+        spin_lock(&microcode_mutex);
+        microcode_update_cache(patch);
+        spin_unlock(&microcode_mutex);
     }
+    else
+        microcode_free_patch(patch);
 
-    ret = continue_hypercall_on_cpu(cpumask_first(&cpu_online_map),
-                                    do_microcode_update, patch);
+    if ( updated && updated != nr_cores )
+        printk(XENLOG_ERR "ERROR: Updating microcode succeeded on %u cores and failed\n"
+               XENLOG_ERR "on other %u cores. A system with differing microcode \n"
+               XENLOG_ERR "revisions is considered unstable. Please reboot and do not\n"
+               XENLOG_ERR "load the microcode that triggersthis warning!\n",
+               updated, nr_cores - updated);
 
+ put:
+    put_cpu_maps();
  free:
     xfree(buffer);
     return ret;