Patchworkβ [v2] x86: reduce srat verbosity in the kernel log

login
register
about
Submitter David Rientjes
Date 2009-10-28 04:53:27
Message ID <alpine.DEB.2.00.0910272129250.17260@chino.kir.corp.google.com>
Download mbox | patch
Permalink /patch/56211/
State New
Headers show

Comments

David Rientjes - 2009-10-28 04:53:27
On Wed, 28 Oct 2009, Andi Kleen wrote:

> For now at least seems reasonable to limit to 128 or so yes (and go
> back to the stack). if we ever have sparse apic ids for nodes 
> then that might change; but in this case could still just do
> a acpidump or teach the printer to be more clever and support
> strides.
> 

It'll support sparse apicids, which was shown in Mike's example, although 
it only becomes a problem when they cannot be represented in a 128 
character buffer.  I doubt there are many machines where that happens 
given the way they are formed.

> It would be just good to have some indication in the output
> if there was a overflow.
> 

Agreed, a trailing "..." would be appropriate.



x86: reduce srat verbosity in the kernel log

It's possible to reduce the number of SRAT messages emitted to the kernel
log by printing each valid pxm once and then creating bitmaps to represent
the apic ids that map to the same node.

This reduces lines such as

	SRAT: PXM 0 -> APIC 0 -> Node 0
	SRAT: PXM 0 -> APIC 1 -> Node 0
	SRAT: PXM 1 -> APIC 2 -> Node 1
	SRAT: PXM 1 -> APIC 3 -> Node 1

to

	SRAT: PXM 0 -> APIC {0-1} -> Node 0
	SRAT: PXM 1 -> APIC {2-3} -> Node 1

The buffer used to store the apic id list is 128 characters in length.
If that is too small to represent all the apic id ranges that are bound
to a single pxm, a trailing "..." is added.  APICID_LIST_LEN should be
manually increased for such configurations.

Acked-by: Mike Travis <travis@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/x86/mm/srat_64.c |   41 +++++++++++++++++++++++++++++++++++++----
 drivers/acpi/numa.c   |    5 +++++
 include/linux/acpi.h  |    3 ++-
 3 files changed, 44 insertions(+), 5 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andi Kleen - 2009-10-28 05:19:25
> +		/*
> +		 * If the bitmap cannot be listed in a buffer of length
> +		 * APICID_LIST_LEN, then it is suffixed with "...".
> +		 */
> +		len = bitmap_scnlistprintf(apicid_list, APICID_LIST_LEN,
> +					   apicid_map, MAX_LOCAL_APIC);
> +		pr_info("SRAT: PXM %u -> APIC {%s%s} -> Node %u\n",
> +			i, apicid_list,
> +			(len == APICID_LIST_LEN - 1) ? "..." : "",

Is the - 1 really correct? If scnlistprintf follows snprintf semantics then it would not
be and my understanding is it is supposed to.

Other than that it looks good.

-Andi
David Rientjes - 2009-10-28 05:24:39
On Wed, 28 Oct 2009, Andi Kleen wrote:

> > +		/*
> > +		 * If the bitmap cannot be listed in a buffer of length
> > +		 * APICID_LIST_LEN, then it is suffixed with "...".
> > +		 */
> > +		len = bitmap_scnlistprintf(apicid_list, APICID_LIST_LEN,
> > +					   apicid_map, MAX_LOCAL_APIC);
> > +		pr_info("SRAT: PXM %u -> APIC {%s%s} -> Node %u\n",
> > +			i, apicid_list,
> > +			(len == APICID_LIST_LEN - 1) ? "..." : "",
> 
> Is the - 1 really correct? If scnlistprintf follows snprintf semantics then it would not
> be and my understanding is it is supposed to.
> 

It is, bitmap_scnlistprintf() returns the number of characters printed to 
the buffer minus the trailing '\0', which is different from snprintf().  
APICID_LIST_LEN-1 then identifies when the buffer was max'd out.  It still 
adds a trailing "..." if the list is exactly 128 characters long, but this 
isn't addressed to avoid added complexity.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Rientjes - 2009-11-10 21:08:23
On Tue, 27 Oct 2009, David Rientjes wrote:

> x86: reduce srat verbosity in the kernel log
> 
> It's possible to reduce the number of SRAT messages emitted to the kernel
> log by printing each valid pxm once and then creating bitmaps to represent
> the apic ids that map to the same node.
> 
> This reduces lines such as
> 
> 	SRAT: PXM 0 -> APIC 0 -> Node 0
> 	SRAT: PXM 0 -> APIC 1 -> Node 0
> 	SRAT: PXM 1 -> APIC 2 -> Node 1
> 	SRAT: PXM 1 -> APIC 3 -> Node 1
> 
> to
> 
> 	SRAT: PXM 0 -> APIC {0-1} -> Node 0
> 	SRAT: PXM 1 -> APIC {2-3} -> Node 1
> 
> The buffer used to store the apic id list is 128 characters in length.
> If that is too small to represent all the apic id ranges that are bound
> to a single pxm, a trailing "..." is added.  APICID_LIST_LEN should be
> manually increased for such configurations.
> 
> Acked-by: Mike Travis <travis@sgi.com>
> Signed-off-by: David Rientjes <rientjes@google.com>

Ingo, have you had a chance to look at merging this yet?
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar - 2009-11-10 21:33:12
* David Rientjes <rientjes@google.com> wrote:

> On Tue, 27 Oct 2009, David Rientjes wrote:
> 
> > x86: reduce srat verbosity in the kernel log
> > 
> > It's possible to reduce the number of SRAT messages emitted to the kernel
> > log by printing each valid pxm once and then creating bitmaps to represent
> > the apic ids that map to the same node.
> > 
> > This reduces lines such as
> > 
> > 	SRAT: PXM 0 -> APIC 0 -> Node 0
> > 	SRAT: PXM 0 -> APIC 1 -> Node 0
> > 	SRAT: PXM 1 -> APIC 2 -> Node 1
> > 	SRAT: PXM 1 -> APIC 3 -> Node 1
> > 
> > to
> > 
> > 	SRAT: PXM 0 -> APIC {0-1} -> Node 0
> > 	SRAT: PXM 1 -> APIC {2-3} -> Node 1
> > 
> > The buffer used to store the apic id list is 128 characters in length.
> > If that is too small to represent all the apic id ranges that are bound
> > to a single pxm, a trailing "..." is added.  APICID_LIST_LEN should be
> > manually increased for such configurations.
> > 
> > Acked-by: Mike Travis <travis@sgi.com>
> > Signed-off-by: David Rientjes <rientjes@google.com>
> 
> Ingo, have you had a chance to look at merging this yet?

I'm waiting for Mike to test them (and other patches) and send a new 
series out with bits to pick up.

But i really dont like such type of buffering - in the past they tended 
to be problematic. Why print this info at all in the default bootup?  
It's not needed on a correctly functioning system.

For failure analysis make it opt-in available via a boot parameter (if 
it's needed for bootup analysis) - but otherwise just dont print it.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - 2009-11-10 21:42:40
Ingo Molnar wrote:
> * David Rientjes <rientjes@google.com> wrote:
> 
>> On Tue, 27 Oct 2009, David Rientjes wrote:
>>
>>> x86: reduce srat verbosity in the kernel log
>>>
>>> It's possible to reduce the number of SRAT messages emitted to the kernel
>>> log by printing each valid pxm once and then creating bitmaps to represent
>>> the apic ids that map to the same node.
>>>
>>> This reduces lines such as
>>>
>>> 	SRAT: PXM 0 -> APIC 0 -> Node 0
>>> 	SRAT: PXM 0 -> APIC 1 -> Node 0
>>> 	SRAT: PXM 1 -> APIC 2 -> Node 1
>>> 	SRAT: PXM 1 -> APIC 3 -> Node 1
>>>
>>> to
>>>
>>> 	SRAT: PXM 0 -> APIC {0-1} -> Node 0
>>> 	SRAT: PXM 1 -> APIC {2-3} -> Node 1
>>>
>>> The buffer used to store the apic id list is 128 characters in length.
>>> If that is too small to represent all the apic id ranges that are bound
>>> to a single pxm, a trailing "..." is added.  APICID_LIST_LEN should be
>>> manually increased for such configurations.
>>>
>>> Acked-by: Mike Travis <travis@sgi.com>
>>> Signed-off-by: David Rientjes <rientjes@google.com>
>> Ingo, have you had a chance to look at merging this yet?
> 
> I'm waiting for Mike to test them (and other patches) and send a new 
> series out with bits to pick up.
> 
> But i really dont like such type of buffering - in the past they tended 
> to be problematic. Why print this info at all in the default bootup?  
> It's not needed on a correctly functioning system.
> 
> For failure analysis make it opt-in available via a boot parameter (if 
> it's needed for bootup analysis) - but otherwise just dont print it.
> 
make them to depend on apic=debug or apic=verbose?

YH
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar - 2009-11-10 21:57:59
* Yinghai Lu <yinghai@kernel.org> wrote:

> Ingo Molnar wrote:
> > * David Rientjes <rientjes@google.com> wrote:
> > 
> >> On Tue, 27 Oct 2009, David Rientjes wrote:
> >>
> >>> x86: reduce srat verbosity in the kernel log
> >>>
> >>> It's possible to reduce the number of SRAT messages emitted to the kernel
> >>> log by printing each valid pxm once and then creating bitmaps to represent
> >>> the apic ids that map to the same node.
> >>>
> >>> This reduces lines such as
> >>>
> >>> 	SRAT: PXM 0 -> APIC 0 -> Node 0
> >>> 	SRAT: PXM 0 -> APIC 1 -> Node 0
> >>> 	SRAT: PXM 1 -> APIC 2 -> Node 1
> >>> 	SRAT: PXM 1 -> APIC 3 -> Node 1
> >>>
> >>> to
> >>>
> >>> 	SRAT: PXM 0 -> APIC {0-1} -> Node 0
> >>> 	SRAT: PXM 1 -> APIC {2-3} -> Node 1
> >>>
> >>> The buffer used to store the apic id list is 128 characters in length.
> >>> If that is too small to represent all the apic id ranges that are bound
> >>> to a single pxm, a trailing "..." is added.  APICID_LIST_LEN should be
> >>> manually increased for such configurations.
> >>>
> >>> Acked-by: Mike Travis <travis@sgi.com>
> >>> Signed-off-by: David Rientjes <rientjes@google.com>
> >> Ingo, have you had a chance to look at merging this yet?
> > 
> > I'm waiting for Mike to test them (and other patches) and send a new 
> > series out with bits to pick up.
> > 
> > But i really dont like such type of buffering - in the past they tended 
> > to be problematic. Why print this info at all in the default bootup?  
> > It's not needed on a correctly functioning system.
> > 
> > For failure analysis make it opt-in available via a boot parameter (if 
> > it's needed for bootup analysis) - but otherwise just dont print it.
> > 
> make them to depend on apic=debug or apic=verbose?

Yeah - i'd definitely suggest to not splinter the boot flag space too 
much - users wont know what to enable in case of trouble.

But that is a detail (and it can be improved later on as well).

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike Travis - 2009-11-10 23:09:15
Ingo Molnar wrote:
> * David Rientjes <rientjes@google.com> wrote:
> 
>> On Tue, 27 Oct 2009, David Rientjes wrote:
>>
>>> x86: reduce srat verbosity in the kernel log
>>>
>>> It's possible to reduce the number of SRAT messages emitted to the kernel
>>> log by printing each valid pxm once and then creating bitmaps to represent
>>> the apic ids that map to the same node.
>>>
>>> This reduces lines such as
>>>
>>> 	SRAT: PXM 0 -> APIC 0 -> Node 0
>>> 	SRAT: PXM 0 -> APIC 1 -> Node 0
>>> 	SRAT: PXM 1 -> APIC 2 -> Node 1
>>> 	SRAT: PXM 1 -> APIC 3 -> Node 1
>>>
>>> to
>>>
>>> 	SRAT: PXM 0 -> APIC {0-1} -> Node 0
>>> 	SRAT: PXM 1 -> APIC {2-3} -> Node 1
>>>
>>> The buffer used to store the apic id list is 128 characters in length.
>>> If that is too small to represent all the apic id ranges that are bound
>>> to a single pxm, a trailing "..." is added.  APICID_LIST_LEN should be
>>> manually increased for such configurations.
>>>
>>> Acked-by: Mike Travis <travis@sgi.com>
>>> Signed-off-by: David Rientjes <rientjes@google.com>
>> Ingo, have you had a chance to look at merging this yet?
> 
> I'm waiting for Mike to test them (and other patches) and send a new 
> series out with bits to pick up.
> 
> But i really dont like such type of buffering - in the past they tended 
> to be problematic. Why print this info at all in the default bootup?  
> It's not needed on a correctly functioning system.
> 
> For failure analysis make it opt-in available via a boot parameter (if 
> it's needed for bootup analysis) - but otherwise just dont print it.
> 
> 	Ingo

Hi,

Sorry, it's been time consuming getting this checked out as our test
systems are much more in demand right now (SC09 is here.)

I'm very close to submitting another version, just picking up
everyone's comments now.  One more test run this afternoon and
I should be able to submit the patches.  I believe I've got a
good compromise between informative messages and compactness,
without any additional overhead.

I've also tested David's patch in every run and it hasn't shown any
problems at all.  (In fact, a recent merge of ACPI 4.0 code and it
still works flawlessly.)

Thanks,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Rientjes - 2009-11-12 20:56:55
On Tue, 10 Nov 2009, Ingo Molnar wrote:

> I'm waiting for Mike to test them (and other patches) and send a new 
> series out with bits to pick up.
> 

Mike posted his series today without including my patch, so I've replied 
to it.

> But i really dont like such type of buffering - in the past they tended 
> to be problematic.

I'm not sure that I'd call it buffering when iterating through all apic 
id's and setting appropriate bits in a bitmap when they map to a node id.  
It's apparently not been problematic either on my machines, Mike's 
machines, or his merge with ACPI 4.0 code.  I think the code is pretty 
straight forward.

> Why print this info at all in the default bootup?  
> It's not needed on a correctly functioning system.
> 

We have no other export of the apic id to to node mappings in the kernel.  
We already show each pxm's address range, each node's address range, and 
the pxm to node map.  The only other way to map apic ids to nodes is by 
looking for the lines "CPU 0/0 -> Node 0," which I believe are being 
removed.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike Travis - 2009-11-12 21:14:20
David Rientjes wrote:
> On Tue, 10 Nov 2009, Ingo Molnar wrote:
> 
>> I'm waiting for Mike to test them (and other patches) and send a new 
>> series out with bits to pick up.
>>
> 
> Mike posted his series today without including my patch, so I've replied 
> to it.

Sorry, I wasn't aware I should have.
> 
>> But i really dont like such type of buffering - in the past they tended 
>> to be problematic.
> 
> I'm not sure that I'd call it buffering when iterating through all apic 
> id's and setting appropriate bits in a bitmap when they map to a node id.  
> It's apparently not been problematic either on my machines, Mike's 
> machines, or his merge with ACPI 4.0 code.  I think the code is pretty 
> straight forward.
> 
>> Why print this info at all in the default bootup?  
>> It's not needed on a correctly functioning system.
>>
> 
> We have no other export of the apic id to to node mappings in the kernel.  
> We already show each pxm's address range, each node's address range, and 
> the pxm to node map.  The only other way to map apic ids to nodes is by 
> looking for the lines "CPU 0/0 -> Node 0," which I believe are being 
> removed.

The bootup messages in my patch 1/7 list nodes and their processors as each
boots.  And this is easily found under /sysfs.

Also, I think in general that all the apic messages, unless they represent
"system boot progress" should be displayed only when asked for, like with
apic=debug or verbose?   Something more like:

BIOS-provided physical RAM map processed.
EFI: memory allocated.
SRAT: table interpreted.
Bootmem setups complete.
ACPI: APIC's enabled.
PM: Registered all nosave memory.

Removing the above tables remove about 3400 lines of console output on a 1k
thread machine.  There are 20,000+ lines of output before you get to the
login prompt (even with the removal of cpu bootup messages).

But you are right, the apic info should be available via /sysfs or /procfs.

The next BIG output is from devices.  Listing all the pci busses available
is an overkill as that info is also readily available when the system is
running.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Rientjes - 2009-11-12 21:20:40
On Thu, 12 Nov 2009, Mike Travis wrote:

> Also, I think in general that all the apic messages, unless they represent
> "system boot progress" should be displayed only when asked for, like with
> apic=debug or verbose?   Something more like:
> 

That's outside the scope of my patch.  My patch does what the title says, 
it reduces srat verbosity in the kernel log.  If an additional change 
would like to suppress that output with a kernel parameter, that's fine, 
but it's an additional change and not what I was addressing.

When posting a patchset like this where all patches are related for a 
common goal and one patch (mine) was proposed during the development of 
the set, it's normal to include that patch in future postings with proper 
attribution given by indicating an author other than yourself in the very 
first line of the email:

	From: David Rientjes <rientjes@google.com>

and retaining your acked-by line, my signed-off-by line, and then adding 
your own signed-off-by line.

If a subsequent patch were to suppress this for kernels not using a 
certain parameter, I certainly wouldn't object to it.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -36,6 +36,9 @@  static int num_node_memblks __initdata;
 static struct bootnode node_memblk_range[NR_NODE_MEMBLKS] __initdata;
 static int memblk_nodeid[NR_NODE_MEMBLKS] __initdata;
 
+static DECLARE_BITMAP(apicid_map, MAX_LOCAL_APIC) __initdata;
+#define APICID_LIST_LEN	(128)
+
 static __init int setup_node(int pxm)
 {
 	return acpi_map_pxm_to_node(pxm);
@@ -136,8 +139,6 @@  acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
 	apicid_to_node[apic_id] = node;
 	node_set(node, cpu_nodes_parsed);
 	acpi_numa = 1;
-	printk(KERN_INFO "SRAT: PXM %u -> APIC %u -> Node %u\n",
-	       pxm, apic_id, node);
 }
 
 /* Callback for Proximity Domain -> LAPIC mapping */
@@ -170,8 +171,40 @@  acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
 	apicid_to_node[apic_id] = node;
 	node_set(node, cpu_nodes_parsed);
 	acpi_numa = 1;
-	printk(KERN_INFO "SRAT: PXM %u -> APIC %u -> Node %u\n",
-	       pxm, apic_id, node);
+}
+
+void __init acpi_numa_print_srat_mapping(void)
+{
+	char apicid_list[APICID_LIST_LEN];
+	int i, j;
+
+	for (i = 0; i < MAX_PXM_DOMAINS; i++) {
+		int len;
+		int nid;
+
+		nid = pxm_to_node(i);
+		if (nid == NUMA_NO_NODE)
+			continue;
+
+		bitmap_zero(apicid_map, MAX_LOCAL_APIC);
+		for (j = 0; j < MAX_LOCAL_APIC; j++)
+			if (apicid_to_node[j] == nid)
+				set_bit(j, apicid_map);
+
+		if (bitmap_empty(apicid_map, MAX_LOCAL_APIC))
+			continue;
+
+		/*
+		 * If the bitmap cannot be listed in a buffer of length
+		 * APICID_LIST_LEN, then it is suffixed with "...".
+		 */
+		len = bitmap_scnlistprintf(apicid_list, APICID_LIST_LEN,
+					   apicid_map, MAX_LOCAL_APIC);
+		pr_info("SRAT: PXM %u -> APIC {%s%s} -> Node %u\n",
+			i, apicid_list,
+			(len == APICID_LIST_LEN - 1) ? "..." : "",
+			nid);
+	}
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -281,6 +281,10 @@  acpi_table_parse_srat(enum acpi_srat_type id,
 					    handler, max_entries);
 }
 
+void __init __attribute__((weak)) acpi_numa_print_srat_mapping(void)
+{
+}
+
 int __init acpi_numa_init(void)
 {
 	/* SRAT: Static Resource Affinity Table */
@@ -292,6 +296,7 @@  int __init acpi_numa_init(void)
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
 				      acpi_parse_memory_affinity,
 				      NR_NODE_MEMBLKS);
+		acpi_numa_print_srat_mapping();
 	}
 
 	/* SLIT: System Locality Information Table */
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -92,12 +92,13 @@  int acpi_table_parse_madt (enum acpi_madt_type id, acpi_table_entry_handler hand
 int acpi_parse_mcfg (struct acpi_table_header *header);
 void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
 
-/* the following four functions are architecture-dependent */
+/* the following six functions are architecture-dependent */
 void acpi_numa_slit_init (struct acpi_table_slit *slit);
 void acpi_numa_processor_affinity_init (struct acpi_srat_cpu_affinity *pa);
 void acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa);
 void acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma);
 void acpi_numa_arch_fixup(void);
+void acpi_numa_print_srat_mapping(void);
 
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */