parisc: Fix TLB related boot crash on SMP machines

Message ID	20161208200046.GA5248@ls3530 (mailing list archive)
State	Accepted, archived
Headers	show Return-Path: <linux-parisc-owner@kernel.org> Date: Thu, 8 Dec 2016 21:00:46 +0100 From: Helge Deller <deller@gmx.de> To: linux-parisc@vger.kernel.org, James Bottomley <James.Bottomley@HansenPartnership.com>, John David Anglin <dave.anglin@bell.net> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Subject: Re: [PATCH] parisc: Fix TLB related boot crash on SMP machines Message-ID: <20161208200046.GA5248@ls3530> References: <20161207205240.GA5917@ls3530> <20161207213414.hliciockser6d5da@raspberrypi-2.musicnaut.iki.fi> <53d038e2-a697-8e20-4225-48034af5aaf4@gmx.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <53d038e2-a697-8e20-4225-48034af5aaf4@gmx.de> User-Agent: Mutt/1.7.1 (2016-10-04) Sender: linux-parisc-owner@vger.kernel.org Precedence: bulk

Message ID

20161208200046.GA5248@ls3530 (mailing list archive)

State

Accepted, archived

Headers

Date: Thu, 8 Dec 2016 21:00:46 +0100
From: Helge Deller <deller@gmx.de>
To: linux-parisc@vger.kernel.org,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	John David Anglin <dave.anglin@bell.net>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Subject: Re: [PATCH] parisc: Fix TLB related boot crash on SMP machines
Message-ID: <20161208200046.GA5248@ls3530>
References: <20161207205240.GA5917@ls3530>
	<20161207213414.hliciockser6d5da@raspberrypi-2.musicnaut.iki.fi>
	<53d038e2-a697-8e20-4225-48034af5aaf4@gmx.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <53d038e2-a697-8e20-4225-48034af5aaf4@gmx.de>
User-Agent: Mutt/1.7.1 (2016-10-04)
Sender: linux-parisc-owner@vger.kernel.org
Precedence: bulk

Commit Message

Helge Deller Dec. 8, 2016, 8 p.m. UTC

At bootup we run measurements to calculate the best threshold for when we
should be using full TLB flushes instead of just flushing a specific amount of
TLB entries.  This performance test is run over the kernel text segment.

But running this TLB performance test on the kernel text segment turned out to
crash some SMP machines when the kernel text pages were mapped as huge pages.

To avoid those crashes this patch simply skips this test on some SMP machines
and calculates an optimal threshold based on the maximum number of available
TLB entries and number of online CPUs.

On a technical side, this seems to happen:
The TLB measurement code uses flush_tlb_kernel_range() to flush specific TLB
entries with a page size of 4k (pdtlb 0(sr1,addr)). On UP systems this purge
instruction seems to work without problems even if the pages were mapped as
huge pages.  But on SMP systems the TLB purge instruction is broadcasted to
other CPUs. Those CPUs then crash the machine because the page size is not as
expected.  C8000 machines with PA8800/PA8900 CPUs were not affected by this
problem, because the required cache coherency prohibits to use huge pages at
all.  Sadly I didn't found any documentation about this behaviour, so this
finding is purely based on testing with phyiscal SMP machines (A500-44 and
J5000, both were 2-way boxes).

Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Helge Deller <deller@gmx.de>

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

John David Anglin Dec. 8, 2016, 8:49 p.m. UTC | #1

On 2016-12-08 3:00 PM, Helge Deller wrote:
> On a technical side, this seems to happen:
> The TLB measurement code uses flush_tlb_kernel_range() to flush specific TLB
> entries with a page size of 4k (pdtlb 0(sr1,addr)). On UP systems this purge
> instruction seems to work without problems even if the pages were mapped as
> huge pages.  But on SMP systems the TLB purge instruction is broadcasted to
> other CPUs. Those CPUs then crash the machine because the page size is not as
> expected.  C8000 machines with PA8800/PA8900 CPUs were not affected by this
> problem, because the required cache coherency prohibits to use huge pages at
> all.  Sadly I didn't found any documentation about this behaviour, so this
> finding is purely based on testing with phyiscal SMP machines (A500-44 and
> J5000, both were 2-way boxes).
I doubt the problem is the 4k iteration using pdtlb 0(sr1,addr).  I 
think the issue is the huge
page size for the kernel.  Each pdtlb instruction knocks out the same 
tlb entry including the
entry used for tlb interruptions.  This likely leads to stack overflow.  
In any case, it probably
doesn't provide accurate timing because each pdtlb knocks out the entry 
for the interruption
handler on systems with combined tlb.

Dave

Helge Deller Dec. 8, 2016, 9:15 p.m. UTC | #2

On 08.12.2016 21:49, John David Anglin wrote:
> On 2016-12-08 3:00 PM, Helge Deller wrote:
>> On a technical side, this seems to happen:
>> The TLB measurement code uses flush_tlb_kernel_range() to flush specific TLB
>> entries with a page size of 4k (pdtlb 0(sr1,addr)). On UP systems this purge
>> instruction seems to work without problems even if the pages were mapped as
>> huge pages.  But on SMP systems the TLB purge instruction is broadcasted to
>> other CPUs. Those CPUs then crash the machine because the page size is not as
>> expected.  C8000 machines with PA8800/PA8900 CPUs were not affected by this
>> problem, because the required cache coherency prohibits to use huge pages at
>> all.  Sadly I didn't found any documentation about this behaviour, so this
>> finding is purely based on testing with phyiscal SMP machines (A500-44 and
>> J5000, both were 2-way boxes).

> I doubt the problem is the 4k iteration using pdtlb 0(sr1,addr). I
> think the issue is the huge page size for the kernel. Each pdtlb
> instruction knocks out the same tlb entry including the entry used
> for tlb interruptions. This likely leads to stack overflow. 

Yes, likely.

> In any
> case, it probably doesn't provide accurate timing because each pdtlb
> knocks out the entry for the interruption handler on systems with
> combined tlb.

True.

So, how to continue?
I see two options:
a) skip the TLB measuring code as my patch does.
b) kmalloc() another region and do measurement there.

I'd like to submit some fix-patch for 4.9, else the machines won't boot 4.9.
That's why I'd prefer option a).
Opinions?

Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

John David Anglin Dec. 8, 2016, 9:21 p.m. UTC | #3

On 2016-12-08 4:15 PM, Helge Deller wrote:
> On 08.12.2016 21:49, John David Anglin wrote:
>> On 2016-12-08 3:00 PM, Helge Deller wrote:
>>> On a technical side, this seems to happen:
>>> The TLB measurement code uses flush_tlb_kernel_range() to flush specific TLB
>>> entries with a page size of 4k (pdtlb 0(sr1,addr)). On UP systems this purge
>>> instruction seems to work without problems even if the pages were mapped as
>>> huge pages.  But on SMP systems the TLB purge instruction is broadcasted to
>>> other CPUs. Those CPUs then crash the machine because the page size is not as
>>> expected.  C8000 machines with PA8800/PA8900 CPUs were not affected by this
>>> problem, because the required cache coherency prohibits to use huge pages at
>>> all.  Sadly I didn't found any documentation about this behaviour, so this
>>> finding is purely based on testing with phyiscal SMP machines (A500-44 and
>>> J5000, both were 2-way boxes).
>> I doubt the problem is the 4k iteration using pdtlb 0(sr1,addr). I
>> think the issue is the huge page size for the kernel. Each pdtlb
>> instruction knocks out the same tlb entry including the entry used
>> for tlb interruptions. This likely leads to stack overflow.
> Yes, likely.
>
>> In any
>> case, it probably doesn't provide accurate timing because each pdtlb
>> knocks out the entry for the interruption handler on systems with
>> combined tlb.
> True.
>
> So, how to continue?
> I see two options:
> a) skip the TLB measuring code as my patch does.
> b) kmalloc() another region and do measurement there.
>
> I'd like to submit some fix-patch for 4.9, else the machines won't boot 4.9.
> That's why I'd prefer option a).
> Opinions?
Go with option a) for 4.9

Dave

diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index c263301..977f0a4f 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -393,6 +393,15 @@  void __init parisc_setup_cache_timing(void)
 
 	/* calculate TLB flush threshold */
 
+	/* On SMP machines, skip the TLB measure of kernel text which
+	 * has been mapped as huge pages. */
+	if (num_online_cpus() > 1 && !parisc_requires_coherency()) {
+		threshold = max(cache_info.it_size, cache_info.dt_size);
+		threshold *= PAGE_SIZE;
+		threshold /= num_online_cpus();
+		goto set_tlb_threshold;
+	}
+
 	alltime = mfctl(16);
 	flush_tlb_all();
 	alltime = mfctl(16) - alltime;
@@ -411,6 +420,8 @@  void __init parisc_setup_cache_timing(void)
 		alltime, size, rangetime);
 
 	threshold = PAGE_ALIGN(num_online_cpus() * size * alltime / rangetime);
+
+set_tlb_threshold:
 	if (threshold)
 		parisc_tlb_flush_threshold = threshold;
 	printk(KERN_INFO "TLB flush threshold set to %lu KiB\n",

parisc: Fix TLB related boot crash on SMP machines

Commit Message

Comments

Patch