From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: xen-devel@lists.xen.org
Cc: andrew.cooper3@citrix.com, boris.ostrovsky@oracle.com, jbeulich@suse.com
Date: Mon, 31 Jul 2017 16:05:45 -0400
Message-Id: <1501531546-23548-1-git-send-email-boris.ostrovsky@oracle.com>
Subject: [Xen-devel] [PATCH] x86/apic/x2apic: Share IRQ vector between cluster members only when no cpumask is specified

We have a limited number (slightly under NR_DYNAMIC_VECTORS=192) of IRQ
vectors available to each processor. Currently, when x2apic cluster mode
is used (which is the default), each vector is shared among all processors
in the cluster. With many IRQs (as is the case on systems with multiple
SR-IOV cards) and few clusters (e.g. a single socket) there is a good
chance that we will run out of vectors.

This patch tries to decrease vector sharing between processors by
assigning a vector to a single processor if the assignment request (via
__assign_irq_vector()) comes without explicitly specifying which
processors are expected to share the interrupt. This typically happens at
boot time (or possibly during PCI hotplug) when create_irq(NUMA_NO_NODE)
is called. When __assign_irq_vector() is called from set_desc_affinity(),
which provides a sharing mask, vector sharing continues to be performed
as before.

This patch to some extent mirrors Linux commit d872818dbbee
("x86/apic/x2apic: Use multiple cluster members for the irq destination
only with the explicit affinity").

Note that this change still does not guarantee that we never run out of
vectors. For example, on a single-core system we are effectively back to
the single cluster/socket case of the original code.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/acpi/boot.c                     |  2 +-
 xen/arch/x86/apic.c                          |  2 +-
 xen/arch/x86/genapic/delivery.c              |  4 ++--
 xen/arch/x86/genapic/x2apic.c                |  9 +++++++--
 xen/arch/x86/io_apic.c                       |  8 ++++----
 xen/arch/x86/irq.c                           |  4 ++--
 xen/arch/x86/mpparse.c                       |  2 +-
 xen/arch/x86/msi.c                           |  2 +-
 xen/include/asm-x86/genapic.h                |  9 ++++++---
 xen/include/asm-x86/mach-generic/mach_apic.h |  9 +++++----
 10 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/acpi/boot.c b/xen/arch/x86/acpi/boot.c
index 8e6c96d..c16b14a 100644
--- a/xen/arch/x86/acpi/boot.c
+++ b/xen/arch/x86/acpi/boot.c
@@ -645,7 +645,7 @@ static void __init acpi_process_madt(void)
 			acpi_ioapic = true;
 
 			smp_found_config = true;
-			clustered_apic_check();
+			CLUSTERED_APIC_CHECK();
 		}
 	}
 	if (error == -EINVAL) {
diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 851a6cc..a0e1798 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -544,7 +544,7 @@ void setup_local_APIC(void)
      * an APIC. See e.g. "AP-388 82489DX User's Manual" (Intel
      * document number 292116). So here it goes...
      */
-    init_apic_ldr();
+    INIT_APIC_LDR();
 
     /*
      * Set Task Priority to reject any interrupts below FIRST_DYNAMIC_VECTOR.
diff --git a/xen/arch/x86/genapic/delivery.c b/xen/arch/x86/genapic/delivery.c
index ced92a1..d71c01c 100644
--- a/xen/arch/x86/genapic/delivery.c
+++ b/xen/arch/x86/genapic/delivery.c
@@ -30,7 +30,7 @@ void __init clustered_apic_check_flat(void)
 	printk("Enabling APIC mode: Flat. Using %d I/O APICs\n", nr_ioapics);
 }
 
-const cpumask_t *vector_allocation_cpumask_flat(int cpu)
+const cpumask_t *vector_allocation_cpumask_flat(int cpu, const cpumask_t *cpumask)
 {
 	return &cpu_online_map;
 }
@@ -58,7 +58,7 @@ void __init clustered_apic_check_phys(void)
 	printk("Enabling APIC mode: Phys. Using %d I/O APICs\n", nr_ioapics);
 }
 
-const cpumask_t *vector_allocation_cpumask_phys(int cpu)
+const cpumask_t *vector_allocation_cpumask_phys(int cpu, const cpumask_t *cpumask)
 {
 	return cpumask_of(cpu);
 }
diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
index 5fffb31..c0b97c9 100644
--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 
 static DEFINE_PER_CPU_READ_MOSTLY(u32, cpu_2_logical_apicid);
 static DEFINE_PER_CPU_READ_MOSTLY(cpumask_t *, cluster_cpus);
@@ -72,9 +73,13 @@ static void __init clustered_apic_check_x2apic(void)
 {
 }
 
-static const cpumask_t *vector_allocation_cpumask_x2apic_cluster(int cpu)
+static const cpumask_t *vector_allocation_cpumask_x2apic_cluster(int cpu,
+                                                   const cpumask_t *cpumask)
 {
-    return per_cpu(cluster_cpus, cpu);
+    if ( cpumask != TARGET_CPUS )
+        return per_cpu(cluster_cpus, cpu);
+    else
+        return cpumask_of(cpu);
 }
 
 static unsigned int cpu_mask_to_apicid_x2apic_cluster(const cpumask_t *cpumask)
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 2838f6b..3eefcfc 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1038,7 +1038,7 @@ static void __init setup_IO_APIC_irqs(void)
             disable_8259A_irq(irq_to_desc(irq));
 
         desc = irq_to_desc(irq);
-        SET_DEST(entry, logical, cpu_mask_to_apicid(TARGET_CPUS));
+        SET_DEST(entry, logical, CPU_MASK_TO_APICID(TARGET_CPUS));
         spin_lock_irqsave(&ioapic_lock, flags);
         __ioapic_write_entry(apic, pin, 0, entry);
         set_native_irq_info(irq, TARGET_CPUS);
@@ -1070,7 +1070,7 @@ static void __init setup_ExtINT_IRQ0_pin(unsigned int apic, unsigned int pin, in
      */
     entry.dest_mode = INT_DEST_MODE;
     entry.mask = 0;		/* unmask IRQ now */
-    SET_DEST(entry, logical, cpu_mask_to_apicid(TARGET_CPUS));
+    SET_DEST(entry, logical, CPU_MASK_TO_APICID(TARGET_CPUS));
     entry.delivery_mode = INT_DELIVERY_MODE;
     entry.polarity = 0;
     entry.trigger = 0;
@@ -2236,7 +2236,7 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
         /* Don't chance ending up with an empty mask. */
         if (cpumask_intersects(&mask, desc->arch.cpu_mask))
             cpumask_and(&mask, &mask, desc->arch.cpu_mask);
-        SET_DEST(entry, logical, cpu_mask_to_apicid(&mask));
+        SET_DEST(entry, logical, CPU_MASK_TO_APICID(&mask));
         apic_printk(APIC_DEBUG, KERN_DEBUG "IOAPIC[%d]: Set PCI routing entry "
                     "(%d-%d -> %#x -> IRQ %d Mode:%i Active:%i)\n", ioapic,
@@ -2423,7 +2423,7 @@ int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
     /* Set the vector field to the real vector! */
     rte.vector = desc->arch.vector;
-    SET_DEST(rte, logical, cpu_mask_to_apicid(desc->arch.cpu_mask));
+    SET_DEST(rte, logical, CPU_MASK_TO_APICID(desc->arch.cpu_mask));
 
     __ioapic_write_entry(apic, pin, 0, rte);
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 57e6c18..227a549 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -484,7 +484,7 @@ static int __assign_irq_vector(
         if (!cpu_online(cpu))
             continue;
 
-        cpumask_and(&tmp_mask, vector_allocation_cpumask(cpu),
+        cpumask_and(&tmp_mask, VECTOR_ALLOCATION_CPUMASK(cpu, mask),
                     &cpu_online_map);
 
         vector = current_vector;
@@ -748,7 +748,7 @@ unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask)
 
     cpumask_copy(desc->affinity, mask);
     cpumask_and(&dest_mask, mask, desc->arch.cpu_mask);
-    return cpu_mask_to_apicid(&dest_mask);
+    return CPU_MASK_TO_APICID(&dest_mask);
 }
 
 /* For re-setting irq interrupt affinity for specific irq */
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index a1a0738..4c44f4b 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -396,7 +396,7 @@ static int __init smp_read_mpc(struct mp_config_table *mpc)
 			}
 		}
 	}
-	clustered_apic_check();
+	CLUSTERED_APIC_CHECK();
 	if (!num_processors)
 		printk(KERN_ERR "SMP mptable: no processors registered!\n");
 	return num_processors;
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 77998f4..1f55c98 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -171,7 +171,7 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg
             return;
 
         cpumask_and(mask, cpu_mask, &cpu_online_map);
-        msg->dest32 = cpu_mask_to_apicid(mask);
+        msg->dest32 = CPU_MASK_TO_APICID(mask);
     }
 
     msg->address_hi = MSI_ADDR_BASE_HI;
diff --git a/xen/include/asm-x86/genapic.h b/xen/include/asm-x86/genapic.h
index 5496ab0..799bed8 100644
--- a/xen/include/asm-x86/genapic.h
+++ b/xen/include/asm-x86/genapic.h
@@ -34,7 +34,8 @@ struct genapic {
     void (*init_apic_ldr)(void);
     void (*clustered_apic_check)(void);
     const cpumask_t *(*target_cpus)(void);
-    const cpumask_t *(*vector_allocation_cpumask)(int cpu);
+    const cpumask_t *(*vector_allocation_cpumask)(int cpu,
+                                                  const cpumask_t *mask);
     unsigned int (*cpu_mask_to_apicid)(const cpumask_t *cpumask);
     void (*send_IPI_mask)(const cpumask_t *mask, int vector);
     void (*send_IPI_self)(uint8_t vector);
@@ -58,7 +59,8 @@ void init_apic_ldr_flat(void);
 void clustered_apic_check_flat(void);
 unsigned int cpu_mask_to_apicid_flat(const cpumask_t *cpumask);
 void send_IPI_mask_flat(const cpumask_t *mask, int vector);
-const cpumask_t *vector_allocation_cpumask_flat(int cpu);
+const cpumask_t *vector_allocation_cpumask_flat(int cpu,
+                                                const cpumask_t *cpumask);
 #define GENAPIC_FLAT \
     .int_delivery_mode = dest_LowestPrio, \
     .int_dest_mode = 1 /* logical delivery */, \
@@ -74,7 +76,8 @@ void init_apic_ldr_phys(void);
 void clustered_apic_check_phys(void);
 unsigned int cpu_mask_to_apicid_phys(const cpumask_t *cpumask);
 void send_IPI_mask_phys(const cpumask_t *mask, int vector);
-const cpumask_t *vector_allocation_cpumask_phys(int cpu);
+const cpumask_t *vector_allocation_cpumask_phys(int cpu,
+                                                const cpumask_t *cpumask);
 #define GENAPIC_PHYS \
     .int_delivery_mode = dest_Fixed, \
     .int_dest_mode = 0 /* physical delivery */, \
diff --git a/xen/include/asm-x86/mach-generic/mach_apic.h b/xen/include/asm-x86/mach-generic/mach_apic.h
index 03e9e8a..999ad77 100644
--- a/xen/include/asm-x86/mach-generic/mach_apic.h
+++ b/xen/include/asm-x86/mach-generic/mach_apic.h
@@ -13,10 +13,11 @@
 #define INT_DELIVERY_MODE (genapic->int_delivery_mode)
 #define INT_DEST_MODE (genapic->int_dest_mode)
 #define TARGET_CPUS (genapic->target_cpus())
-#define init_apic_ldr (genapic->init_apic_ldr)
-#define clustered_apic_check (genapic->clustered_apic_check)
-#define cpu_mask_to_apicid (genapic->cpu_mask_to_apicid)
-#define vector_allocation_cpumask(cpu) (genapic->vector_allocation_cpumask(cpu))
+#define INIT_APIC_LDR (genapic->init_apic_ldr)
+#define CLUSTERED_APIC_CHECK (genapic->clustered_apic_check)
+#define CPU_MASK_TO_APICID (genapic->cpu_mask_to_apicid)
+#define VECTOR_ALLOCATION_CPUMASK(cpu, mask) \
+    (genapic->vector_allocation_cpumask(cpu, mask))
 
 static inline void enable_apic_mode(void)
 {
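
For illustration only (not part of the patch): a minimal, self-contained sketch of the allocation-mask selection the patch introduces for x2apic cluster mode. The cpumask type, the online/cluster masks, and target_cpus() below are simplified stand-ins assumed solely for this example; they model, rather than reuse, Xen's cpumask API.

/*
 * Standalone sketch (toy types, not Xen code): when no explicit affinity
 * mask is given, i.e. the caller passes the default TARGET_CPUS mask, the
 * vector is confined to a single CPU; an explicit mask keeps the old
 * behaviour of sharing the vector across the whole cluster.
 */
#include <stdio.h>

typedef unsigned long cpumask_t;             /* one bit per CPU (toy model) */

static cpumask_t cpu_online_map = 0xffUL;    /* assume CPUs 0-7 are online  */
static cpumask_t cluster0_cpus  = 0x0fUL;    /* assume CPUs 0-3 in cluster  */

static cpumask_t cpumask_of(int cpu)
{
    return 1UL << cpu;
}

/* Stand-in for genapic->target_cpus(): the default "any online CPU" mask. */
static const cpumask_t *target_cpus(void)
{
    return &cpu_online_map;
}

/* Models vector_allocation_cpumask_x2apic_cluster() after the patch. */
static cpumask_t vector_allocation_cpumask(int cpu, const cpumask_t *mask)
{
    if (mask != target_cpus())   /* explicit affinity: share across cluster */
        return cluster0_cpus;
    return cpumask_of(cpu);      /* default request: keep vector on one CPU */
}

int main(void)
{
    cpumask_t explicit_mask = 0x05UL;        /* caller asked for CPUs 0, 2  */

    /* create_irq(NUMA_NO_NODE)-style request: no explicit mask given. */
    printf("default mask  -> allocate on %#lx\n",
           vector_allocation_cpumask(0, target_cpus()));

    /* set_desc_affinity()-style request: explicit mask given. */
    printf("explicit mask -> allocate on %#lx\n",
           vector_allocation_cpumask(0, &explicit_mask));
    return 0;
}

The pointer comparison mirrors the patch's "cpumask != TARGET_CPUS" check; in the real code __assign_irq_vector() then intersects the returned mask with cpu_online_map, as in the irq.c hunk above.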