parisc: fix race conditions flushing user cache pages

Message ID BLU0-SMTP84F3DE8CE6D8B9F60C5C2597970@phx.gbl (mailing list archive)
State Accepted, archived

Commit Message

John David Anglin May 28, 2013, 12:09 p.m. UTC
There are two issues addressed by this patch:

1) When we flush user instruction and data pages using the kernel tmpalias
region, we need to ensure that the local TLB entry for the tmpalias page is
never replaced with a different mapping.  Previously, we purged the entry
before and after the cache flush loop.  Although preemption was disabled, it
seemed possible that the value might change during interrupt processing.  The
patch removes the trailing purge and instead disables interrupts during the
initial TLB entry purge and the cache flush.

2) In a number of places, we flush the TLB for the page and then flush the
page.  Previously, we disabled preemption only around the cache flush.  This
change disables preemption around both the TLB and cache flushes, as it
seemed the effect of the purge might otherwise be lost.

Without this change, I saw four random segmentation faults in about 1.5 days
of intensive package building last weekend.  With the change, I haven't seen
a single random segmentation fault in about one week of building Debian
packages on a 4-way rp3440.  So, there is a significant improvement in
system stability.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
---

--
John David Anglin	dave.anglin@bell.net
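
To make the second fix concrete, here is a simplified before/after sketch of
the reordering (illustrative only; the real hunks, e.g. the flush_anon_page()
change in cacheflush.h, are in the patch at the end of this page):

	/* Before: the TLB purge sits outside the preemption-disabled
	 * region, so the task can be preempted and migrated between the
	 * purge and the cache flush, and the flush then runs on a CPU
	 * where the purge never happened. */
	flush_tlb_page(vma, vmaddr);
	preempt_disable();
	flush_dcache_page_asm(page_to_phys(page), vmaddr);
	preempt_enable();

	/* After: purge and flush execute back to back on the same CPU. */
	preempt_disable();
	flush_tlb_page(vma, vmaddr);
	flush_dcache_page_asm(page_to_phys(page), vmaddr);
	preempt_enable();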

Comments

James Bottomley May 30, 2013, 2:55 p.m. UTC | #1
On Tue, 2013-05-28 at 08:09 -0400, John David Anglin wrote:
> [...]
> Without this change, I saw four random segmentation faults in about 1.5
> days of intensive package building last weekend.  With the change, I
> haven't seen a single random segmentation fault in about one week of
> building Debian packages on a 4-way rp3440.  So, there is a significant
> improvement in system stability.

An rp3440 is PA2.0, so you weren't really testing any of the tlb purge
locking changes.

Also, I don't know what happened, but the actual tmpalias theory requires
a TLB purge before and after, and I thought we used to have them.  The
reason is twofold:

     1. You don't want the caches to speculate in the tmpalias region.
     2. A flush after makes the routines interrupt safe (because you can
        interrupt in a tmpalias operation, do another tmpalias operation,
        purge the cache, and restart within the non-interrupt tmpalias
        and expect everything to work).

Trying to disable interrupts sounds like a workaround for problem 2.  Can
we return to the proper tmpalias operations rather than trying to hack
around them?

James
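
For reference, the tmpalias discipline James describes, a purge on both
sides of the flush loop, looks roughly like the sketch below.  The helper
names here are hypothetical shorthand; the real routines are hand-written
assembly in arch/parisc/kernel/pacache.S.

	/* Hypothetical C-level sketch of the "classic" tmpalias sequence. */
	static void tmpalias_flush_sketch(unsigned long tmpalias_addr)
	{
		purge_tlb_entry(tmpalias_addr);  /* (1) remove any stale or
						    speculated translation */
		flush_cache_loop(tmpalias_addr); /* (2) the fdc/fic flush loop */
		purge_tlb_entry(tmpalias_addr);  /* (3) trailing purge: no
						    tmpalias mapping survives
						    the operation, so a nested
						    use from an interrupt
						    neither sees nor leaves a
						    stale entry */
	}
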
Helge Deller May 30, 2013, 3:01 p.m. UTC | #2
On 05/30/2013 04:55 PM, James Bottomley wrote:
> On Tue, 2013-05-28 at 08:09 -0400, John David Anglin wrote:
>> [...]
> 
> An rp3440 is PA2.0, so you weren't really testing any of the tlb purge
> locking changes.

Which kind of system do we need to test those "tlb purge locking changes" (PAx.y)?
At least I can confirm that Dave's patches have made all my systems
absolutely stable.
 

Helge Deller May 30, 2013, 3:05 p.m. UTC | #3
On 05/30/2013 05:01 PM, Helge Deller wrote:
> On 05/30/2013 04:55 PM, James Bottomley wrote:
>> On Tue, 2013-05-28 at 08:09 -0400, John David Anglin wrote:
>>> [...]
>>
>> An rp3440 is PA2.0, so you weren't really testing any of the tlb purge
>> locking changes.
> 
> Which kind of system do we need to test those "tlb purge locking changes" (PAx.y)?
> At least I can confirm that Dave's patches have made all my systems
> absolutely stable.

For those who want to test right now, you can check out the "for-next" branch of:
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git
in which I've queued up the outstanding patches for 3.10...

(web: http://git.kernel.org/cgit/linux/kernel/git/deller/parisc-linux.git/log/?h=for-next)

Helge
James Bottomley May 30, 2013, 3:11 p.m. UTC | #4
On Thu, 2013-05-30 at 17:01 +0200, Helge Deller wrote:
> On 05/30/2013 04:55 PM, James Bottomley wrote:
> > On Tue, 2013-05-28 at 08:09 -0400, John David Anglin wrote:
> >> [...]
> > 
> > An rp3440 is PA2.0, so you weren't really testing any of the tlb purge
> > locking changes.
> 
> Which kind of system do we need to test those "tlb purge locking changes" (PAx.y)?

The #ifdef is CONFIG_PA20, so any 1.x system should work.  The actual
locks are only needed for the Merced systems (the N class), because they
have an architectural bug where there can only be one purge active on the
CPU bus at any one time.
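
As an illustration of that split, a hypothetical C-level helper (pdtlb_l
here stands in for the pdtlb,l instruction; cf. the tlb_lock/tlb_unlock
macros in pacache.S and purge_tlb_start()/purge_tlb_end() in the patch
below):

	static void purge_kernel_tlb_entry_sketch(void *addr)
	{
		unsigned long flags;
	#ifdef CONFIG_PA20
		pdtlb_l(addr);           /* PA2.0: local purge, no lock needed */
	#else
		purge_tlb_start(flags);  /* pre-PA2.0 (really just Merced):
					    serialize, since only one purge
					    may be active on the CPU bus at
					    a time */
		pdtlb_kernel(addr);
		purge_tlb_end(flags);
	#endif
	}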

> At least I can confirm that Dave's patches have made all my systems
> absolutely stable.

Can we try the proper fix before things like disabling interrupts?
copy_page_asm and clear_page_asm don't have the necessary purges at the
end.  I don't think this ever mattered for clear_page_asm, since it's
only used in the memory backend for pages going to userspace (so not at
interrupt), but for some reason we've suddenly started using
copy_page_asm, which could be used at interrupt.

I honestly don't think we should be using copy_page_asm at all.  The
speed-up we get from it isn't realised, because we have to flush the
cache after we've finished to try and maintain I/D coherence.  This makes
the call the same cost as the kmapped memcpy, but requires twice the
tmpalias space to be allocated because of the from and to mappings.

James
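
The kmapped memcpy James refers to is essentially what copy_user_highpage()
already does; a rough sketch, assuming the usual highmem helpers and the
parisc flush_kernel_dcache_page_addr() interface:

	/* Sketch: copy through the kernel mapping, then pay one coherence
	 * flush on the destination (cf. copy_user_highpage() in cache.c). */
	static void copy_highpage_kmapped(struct page *to, struct page *from)
	{
		void *vfrom = kmap_atomic(from);
		void *vto = kmap_atomic(to);

		copy_page(vto, vfrom);
		flush_kernel_dcache_page_addr(vto);

		kunmap_atomic(vto);
		kunmap_atomic(vfrom);
	}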

John David Anglin May 30, 2013, 5:23 p.m. UTC | #5
On 5/30/2013 11:11 AM, James Bottomley wrote:
> On Thu, 2013-05-30 at 17:01 +0200, Helge Deller wrote:
>> On 05/30/2013 04:55 PM, James Bottomley wrote:
>>> On Tue, 2013-05-28 at 08:09 -0400, John David Anglin wrote:
>>>> [...]
>>> An rp3440 is PA2.0, so you weren't really testing any of the tlb purge
>>> locking changes.
>> Which kind of system do we need to test those "tlb purge locking changes" (PAx.y)?
> The #ifdef is CONFIG_PA20, so any 1.x system should work.  The actual
> locks are only needed for the Merced systems (the N class), because they
> have an architectural bug where there can only be one purge active on the
> CPU bus at any one time.
With respect to the functions doing copies and flushes via the tmpalias
region, it's my theory that the locks aren't needed for local purges on
PA 2.0.  This was implemented previously and isn't relevant to the current
change.

For global TLB purges, I think the problem is also present on rp3440
(i.e., disabling the locks causes problems).  I'm fairly sure that I
tried this.
>
>> At least I can confirm that Dave's patches have made all my systems
>> absolutely stable.
> Can we try the proper fix before things like disabling interrupts?
> copy_page_asm and clear_page_asm don't have the necessary purges at the
> end.  I don't think this ever mattered for clear_page_asm, since it's
> only used in the memory backend for pages going to userspace (so not at
> interrupt), but for some reason we've suddenly started using
> copy_page_asm, which could be used at interrupt.
>
> I honestly don't think we should be using copy_page_asm at all.  The
> speed-up we get from it isn't realised, because we have to flush the
> cache after we've finished to try and maintain I/D coherence.  This makes
> the call the same cost as the kmapped memcpy, but requires twice the
> tmpalias space to be allocated because of the from and to mappings.
>
> James
>
>>> Also, I don't know what happened, but the actual tmpalias theory
>>> requires a TLB purge before and after, and I thought we used to have
>>> them.  The reason is twofold:
>>>
>>>       1. You don't want the caches to speculate in the tmpalias region.
>>>       2. A flush after makes the routines interrupt safe (because you
>>>          can interrupt in a tmpalias operation, do another tmpalias
>>>          operation, purge the cache, and restart within the
>>>          non-interrupt tmpalias and expect everything to work).
I had questioned whether this is true.  In any case, if an interrupt
occurs, the TLB entry needs a reload.  So, if latency isn't an issue, it's
likely better to disable interrupts.  One also doesn't need the TLB purge
afterwards.

The copy and clear routines never had TLB purges after the loops.  That
needs fixing if we follow your suggestion.  At the moment, these routines
are unused.

>>>
>>> Trying to disable interrupts sounds like a workaround for problem 2.
>>> Can we return to the proper tmpalias operations rather than trying to
>>> hack around them?
I'm willing to test to see if adding the purges to clear_user_page and
copy_user_page helps.  They just need to call flush_kernel_dcache_page_addr
instead of flush_kernel_dcache_page_asm.

Dave
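
Concretely, the experiment Dave proposes amounts to something like the
sketch below, based on the copy_user_page() body visible in the patch that
follows.  The only change is the final flush call; flush_kernel_dcache_page_addr()
flushes and then (in the parisc tree) also purges the kernel TLB entry,
supplying the trailing purge James asked for.

	/* Hypothetical variant of copy_user_page() with Dave's change
	 * applied. */
	void copy_user_page(void *vto, void *vfrom, unsigned long vaddr,
			    struct page *pg)
	{
		preempt_disable();
		flush_dcache_page_asm(__pa(vfrom), vaddr);
		copy_page_asm(vto, vfrom);
		if (!parisc_requires_coherency())
			/* was: flush_kernel_dcache_page_asm(vto) */
			flush_kernel_dcache_page_addr(vto);
		preempt_enable();
	}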

Patch

diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index f0e2784..772dc27 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -114,8 +114,8 @@  static inline void
 flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr)
 {
 	if (PageAnon(page)) {
-		flush_tlb_page(vma, vmaddr);
 		preempt_disable();
+		flush_tlb_page(vma, vmaddr);
 		flush_dcache_page_asm(page_to_phys(page), vmaddr);
 		preempt_enable();
 	}
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 65fb4cb..2bbdada 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -397,10 +397,10 @@  void copy_user_page(void *vto, void *vfrom, unsigned long vaddr,
 	   the kernel mapping. */
 	preempt_disable();
 	flush_dcache_page_asm(__pa(vfrom), vaddr);
-	preempt_enable();
 	copy_page_asm(vto, vfrom);
 	if (!parisc_requires_coherency())
 		flush_kernel_dcache_page_asm(vto);
+	preempt_enable();
 }
 EXPORT_SYMBOL(copy_user_page);
 
@@ -619,11 +619,11 @@  void clear_user_highpage(struct page *page, unsigned long vaddr)
 	   and the write-capable translation removed from the page
 	   table and purged from the TLB."  */
 
+	preempt_disable();
 	purge_kernel_dcache_page_asm((unsigned long)vto);
 	purge_tlb_start(flags);
 	pdtlb_kernel(vto);
 	purge_tlb_end(flags);
-	preempt_disable();
 	clear_user_page_asm(vto, vaddr);
 	preempt_enable();
 
@@ -644,12 +644,12 @@  void copy_user_highpage(struct page *to, struct page *from,
 	vfrom = kmap_atomic(from);
 	vto = kmap_atomic(to);
 
+	preempt_disable();
 	purge_kernel_dcache_page_asm((unsigned long)vto);
 	purge_tlb_start(flags);
 	pdtlb_kernel(vto);
 	pdtlb_kernel(vfrom);
 	purge_tlb_end(flags);
-	preempt_disable();
 	copy_user_page_asm(vto, vfrom, vaddr);
 	flush_dcache_page_asm(__pa(vto), vaddr);
 	preempt_enable();
diff --git a/arch/parisc/kernel/pacache.S b/arch/parisc/kernel/pacache.S
index 36d7f40..46a83be 100644
--- a/arch/parisc/kernel/pacache.S
+++ b/arch/parisc/kernel/pacache.S
@@ -331,11 +331,10 @@  ENDPROC(flush_data_cache_local)
 
 /* Macros to serialize TLB purge operations on SMP.  */
 
-	.macro	tlb_lock	la,flags,tmp
+	.macro	tlb_lock	la,tmp
 #ifdef CONFIG_SMP
 	ldil		L%pa_tlb_lock,%r1
 	ldo		R%pa_tlb_lock(%r1),\la
-	rsm		PSW_SM_I,\flags
 1:	LDCW		0(\la),\tmp
 	cmpib,<>,n	0,\tmp,3f
 2:	ldw		0(\la),\tmp
@@ -346,11 +345,10 @@  ENDPROC(flush_data_cache_local)
 #endif
 	.endm
 
-	.macro	tlb_unlock	la,flags,tmp
+	.macro	tlb_unlock	la,tmp
 #ifdef CONFIG_SMP
 	ldi		1,\tmp
 	stw		\tmp,0(\la)
-	mtsm		\flags
 #endif
 	.endm
 
@@ -617,16 +615,22 @@  ENTRY(copy_user_page_asm)
 	depwi		1, 9,1, %r29		/* Form aliased virtual address 'from' */
 #endif
 
+	/* We need to ensure that the TLB entry is not replaced with
+	   a different value while the clear/copy loop executes.
+	   If this results in too much latency, interrupts could
+	   be re-enabled periodically.  */
+	rsm		PSW_SM_I, %r19		/* save I-bit state */
+
 	/* Purge any old translations */
 
 #ifdef CONFIG_PA20
 	pdtlb,l		0(%r28)
 	pdtlb,l		0(%r29)
 #else
-	tlb_lock	%r20,%r21,%r22
+	tlb_lock	%r20,%r22
 	pdtlb		0(%r28)
 	pdtlb		0(%r29)
-	tlb_unlock	%r20,%r21,%r22
+	tlb_unlock	%r20,%r22
 #endif
 
 #ifdef CONFIG_64BIT
@@ -736,7 +740,7 @@  ENTRY(copy_user_page_asm)
 	addib,COND(>)		-1, %r1,1b
 	ldo		64(%r29), %r29
 #endif
-
+	mtsm		%r19
 	bv		%r0(%r2)
 	nop
 	.exit
@@ -765,14 +769,20 @@  ENTRY(clear_user_page_asm)
 	depwi		0, 31,PAGE_SHIFT, %r28	/* Clear any offset bits */
 #endif
 
+	/* We need to ensure that the TLB entry is not replaced with
+	   a different value while the clear/copy loop executes.
+	   If this results in too much latency, interrupts could
+	   be re-enabled periodically.  */
+	rsm		PSW_SM_I, %r19		/* save I-bit state */
+
 	/* Purge any old translation */
 
 #ifdef CONFIG_PA20
 	pdtlb,l		0(%r28)
 #else
-	tlb_lock	%r20,%r21,%r22
+	tlb_lock	%r20,%r22
 	pdtlb		0(%r28)
-	tlb_unlock	%r20,%r21,%r22
+	tlb_unlock	%r20,%r22
 #endif
 
 #ifdef CONFIG_64BIT
@@ -823,6 +833,7 @@  ENTRY(clear_user_page_asm)
 	ldo		64(%r28), %r28
 #endif	/* CONFIG_64BIT */
 
+	mtsm		%r19
 	bv		%r0(%r2)
 	nop
 	.exit
@@ -849,18 +860,20 @@  ENTRY(flush_dcache_page_asm)
 	depwi		0, 31,PAGE_SHIFT, %r28	/* Clear any offset bits */
 #endif
 
+	rsm		PSW_SM_I, %r19		/* save I-bit state */
+
 	/* Purge any old translation */
 
 #ifdef CONFIG_PA20
 	pdtlb,l		0(%r28)
 #else
-	tlb_lock	%r20,%r21,%r22
+	tlb_lock	%r20,%r22
 	pdtlb		0(%r28)
-	tlb_unlock	%r20,%r21,%r22
+	tlb_unlock	%r20,%r22
 #endif
 
 	ldil		L%dcache_stride, %r1
-	ldw		R%dcache_stride(%r1), %r1
+	ldw		R%dcache_stride(%r1), %r23
 
 #ifdef CONFIG_64BIT
 	depdi,z		1, 63-PAGE_SHIFT,1, %r25
@@ -868,37 +881,30 @@  ENTRY(flush_dcache_page_asm)
 	depwi,z		1, 31-PAGE_SHIFT,1, %r25
 #endif
 	add		%r28, %r25, %r25
-	sub		%r25, %r1, %r25
-
-
-1:      fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
-	fdc,m		%r1(%r28)
+	sub		%r25, %r23, %r25
+
+
+1:      fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
+	fdc,m		%r23(%r28)
 	cmpb,COND(<<)		%r28, %r25,1b
-	fdc,m		%r1(%r28)
+	fdc,m		%r23(%r28)
 
+	mtsm		%r19
 	sync
 
-#ifdef CONFIG_PA20
-	pdtlb,l		0(%r25)
-#else
-	tlb_lock	%r20,%r21,%r22
-	pdtlb		0(%r25)
-	tlb_unlock	%r20,%r21,%r22
-#endif
-
 	bv		%r0(%r2)
 	nop
 	.exit
@@ -925,18 +931,20 @@  ENTRY(flush_icache_page_asm)
 	depwi		0, 31,PAGE_SHIFT, %r28	/* Clear any offset bits */
 #endif
 
+	rsm		PSW_SM_I, %r19		/* save I-bit state */
+
 	/* Purge any old translation */
 
 #ifdef CONFIG_PA20
 	pitlb,l         %r0(%sr4,%r28)
 #else
-	tlb_lock        %r20,%r21,%r22
+	tlb_lock        %r20,%r22
 	pitlb           (%sr4,%r28)
-	tlb_unlock      %r20,%r21,%r22
+	tlb_unlock      %r20,%r22
 #endif
 
 	ldil		L%icache_stride, %r1
-	ldw		R%icache_stride(%r1), %r1
+	ldw		R%icache_stride(%r1), %r23
 
 #ifdef CONFIG_64BIT
 	depdi,z		1, 63-PAGE_SHIFT,1, %r25
@@ -944,39 +952,32 @@  ENTRY(flush_icache_page_asm)
 	depwi,z		1, 31-PAGE_SHIFT,1, %r25
 #endif
 	add		%r28, %r25, %r25
-	sub		%r25, %r1, %r25
+	sub		%r25, %r23, %r25
 
 
 	/* fic only has the type 26 form on PA1.1, requiring an
 	 * explicit space specification, so use %sr4 */
-1:      fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
-	fic,m		%r1(%sr4,%r28)
+1:      fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
 	cmpb,COND(<<)	%r28, %r25,1b
-	fic,m		%r1(%sr4,%r28)
+	fic,m		%r23(%sr4,%r28)
 
+	mtsm		%r19
 	sync
 
-#ifdef CONFIG_PA20
-	pitlb,l         %r0(%sr4,%r25)
-#else
-	tlb_lock        %r20,%r21,%r22
-	pitlb           (%sr4,%r25)
-	tlb_unlock      %r20,%r21,%r22
-#endif
-
 	bv		%r0(%r2)
 	nop
 	.exit