diff mbox series

[v3,2/2] x86/smp: use APIC ALLBUT destination shorthand when possible

Message ID 20200117095251.42668-3-roger.pau@citrix.com (mailing list archive)
State New, archived
Headers show
Series x86/smp: add support for APIC ALLBUT IPI shorthand | expand

Commit Message

Roger Pau Monné Jan. 17, 2020, 9:52 a.m. UTC
If the IPI destination mask matches the mask of online CPUs use the
APIC ALLBUT destination shorthand in order to send an IPI to all CPUs
on the system except the current one. This can only be safely used
when no CPU hotplug or unplug operations are taking place, no
offline CPUs or those have been onlined and parked, all CPUs in the
system have been accounted for (ie: the number of CPUs doesn't exceed
NR_CPUS and APIC IDs are below MAX_APICS) and there's no possibility
of CPU hotplug (ie: no disabled CPUs have been reported by the
firmware tables).

This is specially beneficial when using the PV shim, since using the
shorthand avoids performing an APIC register write (or multiple ones
if using xAPIC mode) for each destination when doing a global TLB
flush.

The lock time of flush_lock on a 32 vCPU guest using the shim in
x2APIC mode without the shorthand is:

Global lock flush_lock: addr=ffff82d0804b21c0, lockval=f602f602, not locked
  lock:228455938(79406065573135), block:205908580(556416605761539)

Average lock time: 347577ns

While the same guest using the shorthand:

Global lock flush_lock: addr=ffff82d0804b41c0, lockval=d9c4d9bc, cpu=12
  lock:1890775(416719148054), block:1663958(2500161282949)

Average lock time: 220395ns

Approximately a 1/3 improvement in the lock time.

Note that this requires locking the CPU maps (get_cpu_maps) which uses
a trylock. This is currently safe as all users of cpu_add_remove_lock
do a trylock, but will need reevaluating if non-trylock users appear.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Place the code moment in a pre-patch.
 - Rename cpu_overflow to unaccounted_cpus (leaving it as a bool).
 - Make disabled_cpus non-init and non-static (and mark it
   read_mostly).
 - Prevent using the shorthand if there are disabled CPU slots.
 - Cope correctly with send_IPI_mask getting passed an empty mask.
 - Reword comment about shorthand usage in send_IPI_mask.

Changes since v1:
 - Move the shorthand logic to send_IPI_mask.
 - Check interrupts are enabled before trying to get the cpu maps
   lock.
 - Move __prepare_ICR and __default_send_IPI_shortcut.
---
 xen/arch/x86/acpi/boot.c  |  1 +
 xen/arch/x86/mpparse.c    |  7 ++++++-
 xen/arch/x86/smp.c        | 35 ++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/smp.h |  3 +++
 4 files changed, 44 insertions(+), 2 deletions(-)

Comments

Jan Beulich Jan. 20, 2020, 3:49 p.m. UTC | #1
On 17.01.2020 10:52, Roger Pau Monne wrote:
> If the IPI destination mask matches the mask of online CPUs use the
> APIC ALLBUT destination shorthand in order to send an IPI to all CPUs
> on the system except the current one. This can only be safely used
> when no CPU hotplug or unplug operations are taking place, no
> offline CPUs or those have been onlined and parked, all CPUs in the
> system have been accounted for (ie: the number of CPUs doesn't exceed
> NR_CPUS and APIC IDs are below MAX_APICS) and there's no possibility
> of CPU hotplug (ie: no disabled CPUs have been reported by the
> firmware tables).
> 
> This is specially beneficial when using the PV shim, since using the
> shorthand avoids performing an APIC register write (or multiple ones
> if using xAPIC mode) for each destination when doing a global TLB
> flush.
> 
> The lock time of flush_lock on a 32 vCPU guest using the shim in
> x2APIC mode without the shorthand is:
> 
> Global lock flush_lock: addr=ffff82d0804b21c0, lockval=f602f602, not locked
>   lock:228455938(79406065573135), block:205908580(556416605761539)
> 
> Average lock time: 347577ns
> 
> While the same guest using the shorthand:
> 
> Global lock flush_lock: addr=ffff82d0804b41c0, lockval=d9c4d9bc, cpu=12
>   lock:1890775(416719148054), block:1663958(2500161282949)
> 
> Average lock time: 220395ns
> 
> Approximately a 1/3 improvement in the lock time.
> 
> Note that this requires locking the CPU maps (get_cpu_maps) which uses
> a trylock. This is currently safe as all users of cpu_add_remove_lock
> do a trylock, but will need reevaluating if non-trylock users appear.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
diff mbox series

Patch

diff --git a/xen/arch/x86/acpi/boot.c b/xen/arch/x86/acpi/boot.c
index 15542a9bdf..afc6ed9d99 100644
--- a/xen/arch/x86/acpi/boot.c
+++ b/xen/arch/x86/acpi/boot.c
@@ -103,6 +103,7 @@  acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end)
 			       processor->lapic_flags & ACPI_MADT_ENABLED
 			       ? KERN_WARNING "WARNING: " : KERN_INFO,
 			       processor->local_apic_id, processor->uid);
+		unaccounted_cpus = true;
 		/*
 		 * Must not return an error here, to prevent
 		 * acpi_table_parse_entries() from terminating early.
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index f057d9162f..d532575fee 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -61,11 +61,14 @@  unsigned int __read_mostly boot_cpu_physical_apicid = BAD_APICID;
 
 /* Internal processor count */
 static unsigned int num_processors;
-static unsigned int __initdata disabled_cpus;
+unsigned int __read_mostly disabled_cpus;
 
 /* Bitmask of physically existing CPUs */
 physid_mask_t phys_cpu_present_map;
 
+/* Record whether CPUs haven't been added due to overflows. */
+bool __read_mostly unaccounted_cpus;
+
 void __init set_nr_cpu_ids(unsigned int max_cpus)
 {
 	unsigned int tot_cpus = num_processors + disabled_cpus;
@@ -160,6 +163,7 @@  static int MP_processor_info_x(struct mpc_config_processor *m,
 		printk_once(XENLOG_WARNING
 			    "WARNING: NR_CPUS limit of %u reached - ignoring further processors\n",
 			    nr_cpu_ids);
+		unaccounted_cpus = true;
 		return -ENOSPC;
 	}
 
@@ -167,6 +171,7 @@  static int MP_processor_info_x(struct mpc_config_processor *m,
 	    && genapic.name == apic_default.name) {
 		printk_once(XENLOG_WARNING
 			    "WARNING: CPUs limit of 8 reached - ignoring futher processors\n");
+		unaccounted_cpus = true;
 		return -ENOSPC;
 	}
 
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index c14f304c09..65eb7cbda8 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -8,6 +8,7 @@ 
  *	later.
  */
 
+#include <xen/cpu.h>
 #include <xen/irq.h>
 #include <xen/sched.h>
 #include <xen/delay.h>
@@ -64,7 +65,39 @@  static void send_IPI_shortcut(unsigned int shortcut, int vector,
 
 void send_IPI_mask(const cpumask_t *mask, int vector)
 {
-    alternative_vcall(genapic.send_IPI_mask, mask, vector);
+    bool cpus_locked = false;
+    cpumask_t *scratch = this_cpu(scratch_cpumask);
+
+    /*
+     * This can only be safely used when no CPU hotplug or unplug operations
+     * are taking place, there are no offline CPUs (unless those have been
+     * onlined and parked), there are no disabled CPUs and all possible CPUs in
+     * the system have been accounted for.
+     */
+    if ( system_state > SYS_STATE_smp_boot &&
+         !unaccounted_cpus && !disabled_cpus &&
+         /* NB: get_cpu_maps lock requires enabled interrupts. */
+         local_irq_is_enabled() && (cpus_locked = get_cpu_maps()) &&
+         (park_offline_cpus ||
+          cpumask_equal(&cpu_online_map, &cpu_present_map)) )
+        cpumask_or(scratch, mask, cpumask_of(smp_processor_id()));
+    else
+    {
+        if ( cpus_locked )
+        {
+            put_cpu_maps();
+            cpus_locked = false;
+        }
+        cpumask_clear(scratch);
+    }
+
+    if ( cpumask_equal(scratch, &cpu_online_map) )
+        send_IPI_shortcut(APIC_DEST_ALLBUT, vector, APIC_DEST_PHYSICAL);
+    else
+        alternative_vcall(genapic.send_IPI_mask, mask, vector);
+
+    if ( cpus_locked )
+        put_cpu_maps();
 }
 
 void send_IPI_self(int vector)
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index dbeed2fd41..1aa55d41e1 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -84,6 +84,9 @@  extern cpumask_t **socket_cpumask;
 #define get_cpu_current(cpu) \
     (get_cpu_info_from_stack((unsigned long)stack_base[cpu])->current_vcpu)
 
+extern unsigned int disabled_cpus;
+extern bool unaccounted_cpus;
+
 #endif /* !__ASSEMBLY__ */
 
 #endif