[-v2,RESENDING] x86, acpi: Handle xapic/x2apic entries in MADT at same time

Message ID 4DC0514F.9070901@kernel.org (mailing list archive)
State New, archived

Commit Message

Yinghai Lu May 3, 2011, 7:02 p.m. UTC
One system has a mix of xapic and x2apic entries in its MADT and SRAT.
The BIOS developers insist that the ACPI 4.0 spec requires this: if the
apic id is < 255, xapic entries must be used instead of x2apic entries,
even when the cpus have x2apic mode pre-enabled.

On an 8 socket system with x2apic pre-enabled, we get an out of order sequence:
CPU0: socket0, core0, thread0.
CPU1 - CPU 40: socket 4 - socket 7, thread 0
CPU41 - CPU 80: socket 4 - socket 7, thread 1
CPU81 - CPU 119: socket 0 - socket 3, thread 0
CPU120 - CPU 159: socket 0 - socket 3, thread 1

so maxcpus=80 no longer gets thread 0 of every core.

We need to handle the xapic and x2apic entries in the MADT in the same
pass, so that we honor the sequence in the MADT.

Then the maxcpus= command line option can be used to run only thread 0 of
every core, because recent MADTs always list all thread 0 entries first.
It could also make the cpu to node mapping more sane.

After the patch we get:
CPU0 - CPU 79: socket 0 - socket 7, thread 0
CPU80 - CPU 159: socket 0 - socket 7, thread 1

-v2: update some comments, and change to pass array pointer.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/acpi/boot.c |   30 ++++++++++++++++-------
 drivers/acpi/numa.c         |   16 +++++++++---
 drivers/acpi/tables.c       |   57 +++++++++++++++++++++++++++++++++-----------
 include/linux/acpi.h        |    9 ++++++
 4 files changed, 86 insertions(+), 26 deletions(-)
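
A minimal userspace sketch of why a single walk over the table matters.
It is not kernel code: struct entry, struct subtable_proc, register_cpu(),
the four-entry demo_madt table and its APIC IDs are all hypothetical
stand-ins, and it ignores details such as the boot cpu always being CPU0.
Only the dispatch loop is modelled on acpi_table_parse_entries_array().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 16

/* Simplified stand-ins for the MADT subtable types (0 and 9 in ACPI). */
enum { TYPE_LAPIC = 0, TYPE_X2APIC = 9 };

struct entry {
	int type;
	unsigned int apic_id;
};

typedef int (*entry_handler)(const struct entry *e);

/* Same shape as struct acpi_subtable_proc in this patch. */
struct subtable_proc {
	int id;
	entry_handler handler;
	int count;
};

static unsigned int cpu_to_apicid[MAX_CPUS];	/* logical cpu -> apic id */
static int nr_cpus;

/* Handler: hand out logical cpu numbers in the order it is called. */
static int register_cpu(const struct entry *e)
{
	assert(nr_cpus < MAX_CPUS);
	cpu_to_apicid[nr_cpus++] = e->apic_id;
	return 0;
}

/* One walk over the table; each entry goes to the first matching proc. */
static int parse_entries_array(const struct entry *tbl, size_t n,
			       struct subtable_proc *proc, int proc_num)
{
	size_t i;
	int j;

	for (i = 0; i < n; i++) {
		for (j = 0; j < proc_num; j++) {
			if (tbl[i].type != proc[j].id)
				continue;
			if (proc[j].handler(&tbl[i]))
				return -1;
			proc[j].count++;
			break;
		}
	}
	return 0;
}

/* Table order: thread 0 of every socket first, then thread 1. */
static const struct entry demo_madt[] = {
	{ TYPE_LAPIC, 0x00 }, { TYPE_X2APIC, 0x100 },	/* thread 0 */
	{ TYPE_LAPIC, 0x01 }, { TYPE_X2APIC, 0x101 },	/* thread 1 */
};
#define DEMO_N (sizeof(demo_madt) / sizeof(demo_madt[0]))

/* Old behaviour: one full walk per type, x2apic entries first. */
void enumerate_two_pass(void)
{
	struct subtable_proc x2 = { TYPE_X2APIC, register_cpu, 0 };
	struct subtable_proc la = { TYPE_LAPIC, register_cpu, 0 };

	nr_cpus = 0;
	parse_entries_array(demo_madt, DEMO_N, &x2, 1);
	parse_entries_array(demo_madt, DEMO_N, &la, 1);
}

/* New behaviour: a single walk that dispatches on both types. */
void enumerate_single_pass(void)
{
	struct subtable_proc procs[2] = {
		{ TYPE_LAPIC, register_cpu, 0 },
		{ TYPE_X2APIC, register_cpu, 0 },
	};

	nr_cpus = 0;
	parse_entries_array(demo_madt, DEMO_N, procs, 2);
}
```

After enumerate_two_pass() the logical cpus map to apic ids 0x100, 0x101,
0x00, 0x01, so the first two cpus are both threads of one socket. After
enumerate_single_pass() they follow the table order 0x00, 0x100, 0x01,
0x101, so limiting the cpu count to 2 picks thread 0 of each socket.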

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jack Steiner May 5, 2011, 4:21 p.m. UTC | #1
FYI

We also hit the same problem. Our BIOS folks tried to make the BIOS
ACPI compliant but we hit the same problem & broke cpu numbering.

For now, we've reverted the BIOS change but are concerned that another OS
may require the change.


From the boot log:
     891.184006 (    0.001022)| [    0.000000] Setting APIC routing to UV large system.
     891.197920 (    0.013914)| [    0.000000] ACPI: X2APIC (apic_id[0x100] uid[0x20] enabled)
     891.218507 (    0.020587)| [    0.000000] ACPI: X2APIC (apic_id[0x102] uid[0x21] enabled)
    ...
     904.009046 (    0.006237)| [    0.000000] ACPI: X2APIC (apic_id[0x3ff0] uid[0x7fe] enabled)
     904.020501 (    0.011455)| [    0.000000] ACPI: X2APIC (apic_id[0x3ff2] uid[0x7ff] enabled)
     904.024500 (    0.003999)| [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
     904.024684 (    0.000184)| [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)

    ...

    1089.050351 (    0.008981)| [   30.960025] Booting Node   4, Processors  #1  #2 #3 #4 #5 #6 #7 #8 Ok.
    1089.847951 (    0.669131)| [   31.473073] Booting Node   5, Processors  #9 #10 #11 #12 #13 #14 #15 #16 Ok.
    ....
    1283.195893 (    0.772824)| [  224.864248] Booting Node 255, Processors  #2009 #2010 #2011 #2012 #2013 #2014 #2015 #2016 Ok.
    1283.970697 (    0.774804)| [  225.637077] Booting Node   0, Processors  #2017 #2018 #2019 #2020 #2021 #2022 #2023 Ok.
    1284.652065 (    0.681368)| [  226.316121] Booting Node   1, Processors  #2024 #2025 #2026 #2027 #2028 #2029 #2030 #2031 Ok.
    1285.422635 (    0.770570)| [  227.088808] Booting Node   2, Processors  #2032 #2033 #2034 #2035 #2036 #2037 #2038 #2039 Ok.
    1286.205456 (    0.782821)| [  227.864175] Booting Node   3, Processors  #2040 #2041 #2042 #2043 #2044 #2045 #2046 #2047 Ok.

--- jack
--
Yinghai Lu May 5, 2011, 5:38 p.m. UTC | #2
On 05/05/2011 09:21 AM, Jack Steiner wrote:
> FYI
> 
> We also hit the same problem. Our BIOS folks tried to make the BIOS
> ACPI compliant but we hit the same problem & broke cpu numbering.
> 
> For now, we've reverted the BIOS change but are concerned that another OS
> may require the change.
> 
> 
> From the boot log:
>      891.184006 (    0.001022)| [    0.000000] Setting APIC routing to UV large system.
>      891.197920 (    0.013914)| [    0.000000] ACPI: X2APIC (apic_id[0x100] uid[0x20] enabled)
>      891.218507 (    0.020587)| [    0.000000] ACPI: X2APIC (apic_id[0x102] uid[0x21] enabled)
>     ...
>      904.009046 (    0.006237)| [    0.000000] ACPI: X2APIC (apic_id[0x3ff0] uid[0x7fe] enabled)
>      904.020501 (    0.011455)| [    0.000000] ACPI: X2APIC (apic_id[0x3ff2] uid[0x7ff] enabled)
>      904.024500 (    0.003999)| [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
>      904.024684 (    0.000184)| [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
> 
>     ...
> 
>     1089.050351 (    0.008981)| [   30.960025] Booting Node   4, Processors  #1  #2 #3 #4 #5 #6 #7 #8 Ok.
>     1089.847951 (    0.669131)| [   31.473073] Booting Node   5, Processors  #9 #10 #11 #12 #13 #14 #15 #16 Ok.
>     ....
>     1283.195893 (    0.772824)| [  224.864248] Booting Node 255, Processors  #2009 #2010 #2011 #2012 #2013 #2014 #2015 #2016 Ok.
>     1283.970697 (    0.774804)| [  225.637077] Booting Node   0, Processors  #2017 #2018 #2019 #2020 #2021 #2022 #2023 Ok.
>     1284.652065 (    0.681368)| [  226.316121] Booting Node   1, Processors  #2024 #2025 #2026 #2027 #2028 #2029 #2030 #2031 Ok.
>     1285.422635 (    0.770570)| [  227.088808] Booting Node   2, Processors  #2032 #2033 #2034 #2035 #2036 #2037 #2038 #2039 Ok.
>     1286.205456 (    0.782821)| [  227.864175] Booting Node   3, Processors  #2040 #2041 #2042 #2043 #2044 #2045 #2046 #2047 Ok.
> 

Yes, your new BIOS needs this patch to keep the cpu numbering in order.

Thanks

Yinghai
--
Suresh Siddha May 6, 2011, 6:38 p.m. UTC | #3
On Tue, 2011-05-03 at 12:02 -0700, Yinghai Lu wrote:
> One system have mixing xapic and x2apic entries in MADT and SRAT.
> BIOS guys insist that ACPI 4.0 SPEC said so, if apic id < 255, even
> the cpus are with x2apic mode pre-enabled, still need to use xapic entries
> instead of x2apic entries.
> 
> on 8 socket system with x2apic pre-enabled, will get out of order sequence:
> CPU0: socket0, core0, thread0.
> CPU1 - CPU 40: socket 4 - socket 7, thread 0
> CPU41 - CPU 80: socket 4 - socket 7, thread 1
> CPU81 - CPU 119: socket 0 - socket 3, thread 0
> CPU120 - CPU 159: socket 0 - socket 3, thread 1
> 
> so max_cpus=80 will not get all thread0 now.
> 
> Need to handle every entry in MADT at same time with xapic and x2apic.
> so we can honor sequence in MADT.
> 
> We can use max_cpus= command line to use thread0 in every core,
> because recent MADT always have all thread0 at first.
> Also it could make the cpu to node mapping more sane.
> 
> after patch will get
> CPU0 - CPU 79: socket 0 - socket 7, thread 0
> CPU80 - CPU 159: socket 0 - socket 7, thread 1
> 
> -v2: update some comments, and change to pass array pointer.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  arch/x86/kernel/acpi/boot.c |   30 ++++++++++++++++-------
>  drivers/acpi/numa.c         |   16 +++++++++---
>  drivers/acpi/tables.c       |   57 +++++++++++++++++++++++++++++++++-----------
>  include/linux/acpi.h        |    9 ++++++
>  4 files changed, 86 insertions(+), 26 deletions(-)

Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>

Len, if you are OK with this, can you please queue it up?

thanks,
suresh

Patch

Index: linux-2.6/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
+++ linux-2.6/arch/x86/kernel/acpi/boot.c
@@ -883,6 +883,7 @@  static int __init acpi_parse_madt_lapic_
 {
 	int count;
 	int x2count = 0;
+	struct acpi_subtable_proc madt_proc[2];
 
 	if (!cpu_has_apic)
 		return -ENODEV;
@@ -907,10 +908,16 @@  static int __init acpi_parse_madt_lapic_
 				      acpi_parse_sapic, MAX_LOCAL_APIC);
 
 	if (!count) {
-		x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC,
-					acpi_parse_x2apic, MAX_LOCAL_APIC);
-		count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC,
-					acpi_parse_lapic, MAX_LOCAL_APIC);
+		memset(madt_proc, 0, sizeof(madt_proc));
+		madt_proc[0].id = ACPI_MADT_TYPE_LOCAL_APIC;
+		madt_proc[0].handler = acpi_parse_lapic;
+		madt_proc[1].id = ACPI_MADT_TYPE_LOCAL_X2APIC;
+		madt_proc[1].handler = acpi_parse_x2apic;
+		acpi_table_parse_entries_array(ACPI_SIG_MADT,
+					    sizeof(struct acpi_table_madt),
+			    madt_proc, ARRAY_SIZE(madt_proc), MAX_LOCAL_APIC);
+		count = madt_proc[0].count;
+		x2count = madt_proc[1].count;
 	}
 	if (!count && !x2count) {
 		printk(KERN_ERR PREFIX "No LAPIC entries present\n");
@@ -922,11 +929,16 @@  static int __init acpi_parse_madt_lapic_
 		return count;
 	}
 
-	x2count =
-	    acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC_NMI,
-				  acpi_parse_x2apic_nmi, 0);
-	count =
-	    acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC_NMI, acpi_parse_lapic_nmi, 0);
+	memset(madt_proc, 0, sizeof(madt_proc));
+	madt_proc[0].id = ACPI_MADT_TYPE_LOCAL_APIC_NMI;
+	madt_proc[0].handler = acpi_parse_lapic_nmi;
+	madt_proc[1].id = ACPI_MADT_TYPE_LOCAL_X2APIC_NMI;
+	madt_proc[1].handler = acpi_parse_x2apic_nmi;
+	acpi_table_parse_entries_array(ACPI_SIG_MADT,
+				    sizeof(struct acpi_table_madt),
+				    madt_proc, ARRAY_SIZE(madt_proc), 0);
+	count = madt_proc[0].count;
+	x2count = madt_proc[1].count;
 	if (count < 0 || x2count < 0) {
 		printk(KERN_ERR PREFIX "Error parsing LAPIC NMI entry\n");
 		/* TBD: Cleanup to allow fallback to MPS */
Index: linux-2.6/drivers/acpi/numa.c
===================================================================
--- linux-2.6.orig/drivers/acpi/numa.c
+++ linux-2.6/drivers/acpi/numa.c
@@ -284,10 +284,18 @@  int __init acpi_numa_init(void)
 
 	/* SRAT: Static Resource Affinity Table */
 	if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
-		acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
-				     acpi_parse_x2apic_affinity, 0);
-		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
-				     acpi_parse_processor_affinity, 0);
+		struct acpi_subtable_proc srat_proc[2];
+
+		memset(srat_proc, 0, sizeof(srat_proc));
+		srat_proc[0].id = ACPI_SRAT_TYPE_CPU_AFFINITY;
+		srat_proc[0].handler = acpi_parse_processor_affinity;
+		srat_proc[1].id = ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY;
+		srat_proc[1].handler = acpi_parse_x2apic_affinity;
+
+		acpi_table_parse_entries_array(ACPI_SIG_SRAT,
+					   sizeof(struct acpi_table_srat),
+					   srat_proc, ARRAY_SIZE(srat_proc), 0);
+
 		ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
 					    acpi_parse_memory_affinity,
 					    NR_NODE_MEMBLKS);
Index: linux-2.6/drivers/acpi/tables.c
===================================================================
--- linux-2.6.orig/drivers/acpi/tables.c
+++ linux-2.6/drivers/acpi/tables.c
@@ -201,10 +201,9 @@  void acpi_table_print_madt_entry(struct
 
 
 int __init
-acpi_table_parse_entries(char *id,
+acpi_table_parse_entries_array(char *id,
 			     unsigned long table_size,
-			     int entry_id,
-			     acpi_table_entry_handler handler,
+			     struct acpi_subtable_proc *proc, int proc_num,
 			     unsigned int max_entries)
 {
 	struct acpi_table_header *table_header = NULL;
@@ -212,12 +211,12 @@  acpi_table_parse_entries(char *id,
 	unsigned int count = 0;
 	unsigned long table_end;
 	acpi_size tbl_size;
+	int i;
 
-	if (acpi_disabled)
+	if (acpi_disabled) {
+		proc[0].count = -ENODEV;
 		return -ENODEV;
-
-	if (!handler)
-		return -EINVAL;
+	}
 
 	if (strncmp(id, ACPI_SIG_MADT, 4) == 0)
 		acpi_get_table_with_size(id, acpi_apic_instance, &table_header, &tbl_size);
@@ -226,6 +225,7 @@  acpi_table_parse_entries(char *id,
 
 	if (!table_header) {
 		printk(KERN_WARNING PREFIX "%4.4s not present\n", id);
+		proc[0].count = -ENODEV;
 		return -ENODEV;
 	}
 
@@ -238,19 +238,30 @@  acpi_table_parse_entries(char *id,
 
 	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) <
 	       table_end) {
-		if (entry->type == entry_id
-		    && (!max_entries || count++ < max_entries))
-			if (handler(entry, table_end)) {
-				early_acpi_os_unmap_memory((char *)table_header, tbl_size);
+		for (i = 0; i < proc_num; i++) {
+			if (entry->type != proc[i].id)
+				continue;
+			if (max_entries && count++ >= max_entries)
+				continue;
+			if (proc[i].handler(entry, table_end)) {
+				early_acpi_os_unmap_memory((char *)table_header,
+								 tbl_size);
+				proc[i].count = -EINVAL;
 				return -EINVAL;
 			}
+			proc[i].count++;
+			break;
+		}
 
 		entry = (struct acpi_subtable_header *)
 		    ((unsigned long)entry + entry->length);
 	}
 	if (max_entries && count > max_entries) {
-		printk(KERN_WARNING PREFIX "[%4.4s:0x%02x] ignored %i entries of "
-		       "%i found\n", id, entry_id, count - max_entries, count);
+		printk(KERN_WARNING PREFIX "[%4.4s:0x%02x ", id, proc[0].id);
+		for (i = 1; i < proc_num; i++)
+			printk(KERN_CONT " 0x%02x", proc[i].id);
+		printk(KERN_CONT "] ignored %i entries of %i found\n",
+		       count-max_entries, count);
 	}
 
 	early_acpi_os_unmap_memory((char *)table_header, tbl_size);
@@ -258,6 +269,26 @@  acpi_table_parse_entries(char *id,
 }
 
 int __init
+acpi_table_parse_entries(char *id,
+			     unsigned long table_size,
+			     int entry_id,
+			     acpi_table_entry_handler handler,
+			     unsigned int max_entries)
+{
+	struct acpi_subtable_proc proc[1];
+
+	if (!handler)
+		return -EINVAL;
+
+	memset(proc, 0, sizeof(proc));
+	proc[0].id = entry_id;
+	proc[0].handler = handler;
+
+	return acpi_table_parse_entries_array(id, table_size, proc, 1,
+						 max_entries);
+}
+
+int __init
 acpi_table_parse_madt(enum acpi_madt_type id,
 		      acpi_table_entry_handler handler, unsigned int max_entries)
 {
Index: linux-2.6/include/linux/acpi.h
===================================================================
--- linux-2.6.orig/include/linux/acpi.h
+++ linux-2.6/include/linux/acpi.h
@@ -76,6 +76,12 @@  typedef int (*acpi_table_handler) (struc
 
 typedef int (*acpi_table_entry_handler) (struct acpi_subtable_header *header, const unsigned long end);
 
+struct acpi_subtable_proc {
+	int id;
+	acpi_table_entry_handler handler;
+	int count;
+};
+
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
@@ -86,6 +92,9 @@  int acpi_numa_init (void);
 
 int acpi_table_init (void);
 int acpi_table_parse (char *id, acpi_table_handler handler);
+int acpi_table_parse_entries_array(char *id, unsigned long table_size,
+	struct acpi_subtable_proc *proc, int proc_num,
+	unsigned int max_entries);
 int __init acpi_table_parse_entries(char *id, unsigned long table_size,
 	int entry_id, acpi_table_entry_handler handler, unsigned int max_entries);
 int acpi_table_parse_madt (enum acpi_madt_type id, acpi_table_entry_handler handler, unsigned int max_entries);