
parisc: Purge TLB entries after updating page table entry and set page accessed flag in TLB handler

Message ID 655175aa-8b5c-0cc0-f675-5ce64feae0be@bell.net (mailing list archive)
State Accepted, archived
Series parisc: Purge TLB entries after updating page table entry and set page accessed flag in TLB handler

Commit Message

John David Anglin Sept. 21, 2018, 2:31 a.m. UTC
This patch may resolve some races in TLB handling.  Hopefully, TLB inserts
count as accesses and are protected by the spin lock.

If not, we may need to use IPI calls and do local purges on PA 2.0.

Dave

Comments

Mikulas Patocka Sept. 21, 2018, 4:44 p.m. UTC | #1
On Thu, 20 Sep 2018, John David Anglin wrote:

> This patch may resolve some races in TLB handling.  Hopefully, TLB inserts
> count as accesses and are protected by the spin lock.
> 
> If not, we may need to use IPI calls and do local purges on PA 2.0.
> 
> Dave
> 
> -- 
> John David Anglin  dave.anglin@bell.net


> -       purge_tlb_entries(vma->vm_mm, addr);
>         set_pte(ptep, pte_mkold(pte));
> +       purge_tlb_entries(vma->vm_mm, addr);

I think you don't need to swap set_pte and purge_tlb_entries. This piece 
of code can't race with the TLB handlers because they hold the lock. If 
you drop the lock from the TLB handlers, it would cause more problems and 
it couldn't be saved by swapping set_pte and purge_tlb_entries.
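
For reference, a minimal sketch of the two critical sections involved
(paraphrased from the set_pte_at() hunk below and the tlb_lock/tlb_unlock0
macros in entry.S; not the literal kernel code):

	/* updater side, e.g. set_pte_at() after the patch */
	spin_lock_irqsave(&pa_tlb_lock, flags);
	old_pte = *ptep;
	set_pte(ptep, pteval);
	if (pte_inserted(old_pte))
		purge_tlb_entries(mm, addr);
	spin_unlock_irqrestore(&pa_tlb_lock, flags);

	/* TLB miss handler side (conceptual): tlb_lock ... tlb_unlock0 in
	 * entry.S wrap "load PTE, set accessed bit, insert translation" in
	 * the same pa_tlb_lock, so the insert sees either the state before
	 * the update or the state after both set_pte() and the purge. */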

>  #ifdef CONFIG_SMP
>         or,COND(=)      %r0,\spc,%r0
> -       sync
> -       or,COND(=)      %r0,\spc,%r0
> -       stw             \spc,0(\tmp)
> +       stw,ma          \spc,0(\tmp)

Does the standard somewhere say that the "ma" completer works like a 
memory barrier? I didn't know about this.

Mikulas
John David Anglin Sept. 21, 2018, 5:28 p.m. UTC | #2
On 2018-09-21 12:44 PM, Mikulas Patocka wrote:
>
> On Thu, 20 Sep 2018, John David Anglin wrote:
>
>> This patch may resolve some races in TLB handling.  Hopefully, TLB inserts
>> count as accesses and are protected by the spin lock.
>>
>> If not, we may need to use IPI calls and do local purges on PA 2.0.
>>
>> Dave
>>
>> -- 
>> John David Anglin  dave.anglin@bell.net
>
>> -       purge_tlb_entries(vma->vm_mm, addr);
>>          set_pte(ptep, pte_mkold(pte));
>> +       purge_tlb_entries(vma->vm_mm, addr);
> I think you don't need to swap set_pte and purge_tlb_entries. This piece
> of code can't race with the TLB handlers because they hold the lock. If
> you drop the lock from the TLB handlers, it would cause more problems and
> it couldn't be saved by swapping set_pte and purge_tlb_entries.
Yes, but it gives me a comfortable feeling that installed TLB entries are
removed after the new page table entry is installed.  Thus, code that tries
to access the page will fault and install the new PTE value.  This assumes
PTE inserts count as accesses, or at least that they don't get delayed too
much in the handler after the lock is released.
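
Roughly, the intended sequence is (a sketch, not the exact kernel code):

	spin_lock_irqsave(&pa_tlb_lock, flags);
	set_pte(ptep, pte_mkold(pte));		/* 1: new PTE value hits memory  */
	purge_tlb_entries(vma->vm_mm, addr);	/* 2: stale TLB entry is removed */
	spin_unlock_irqrestore(&pa_tlb_lock, flags);
	/* 3: the next access to the page faults, and the miss handler (which
	 *    takes the same lock) can only load the value written in step 1. */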
>
>>   #ifdef CONFIG_SMP
>>          or,COND(=)      %r0,\spc,%r0
>> -       sync
>> -       or,COND(=)      %r0,\spc,%r0
>> -       stw             \spc,0(\tmp)
>> +       stw,ma          \spc,0(\tmp)
> Does the standard somewhere say that the "ma" completer works like a
> memory barrier? I didn't know about this.
Yes.  On PA 2.0, a store with a zero offset and the "ma" completer is
supposed to be an ordered store.  This is the "o" completer in PA 2.0
mnemonics.  According to page G-2, all prior accesses are guaranteed to be
performed before an ordered store is performed.  So, it's supposed to be a
store with release semantics.  On PA 1.X, the completer doesn't do anything
special (it just adds 0 to the \tmp register).  I used "ma" so the code
stays PA 1.X compatible.  On PA 1.X, all accesses are supposed to be
strongly ordered anyway.

The intent is to ensure the release is done after all the "accesses" inside
the lock region.
The ldcw is the barrier at the start.
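
As a rough portable model (illustration only, not the real parisc code; the
names here are made up), the pair behaves like an acquire on lock and a
release on unlock:

#include <stdatomic.h>

/* Hypothetical model of the TLB spinlock.
 * ldcw (load and clear word)  ~ acquire: the lock word is swapped to zero
 *                               and later accesses cannot move before it.
 * stw,ma with zero offset     ~ ordered/release store on PA 2.0: all prior
 *                               accesses are performed before the lock word
 *                               is set back to "free".
 */
static atomic_uint tlb_lock_word = 1;	/* nonzero = free, zero = held */

static void model_tlb_lock(void)
{
	/* spin until the old value was nonzero, i.e. the lock was free */
	while (atomic_exchange_explicit(&tlb_lock_word, 0,
					memory_order_acquire) == 0)
		;
}

static void model_tlb_unlock(void)
{
	atomic_store_explicit(&tlb_lock_word, 1, memory_order_release);
}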

The "sync" barrier degraded performance noticeably.

Dave
Mikulas Patocka Sept. 21, 2018, 10:48 p.m. UTC | #3
On Thu, 20 Sep 2018, John David Anglin wrote:

> This patch may resolve some races in TLB handling.  Hopefully, TLB inserts
> count as accesses and are protected by the spin lock.
> 
> If not, we may need to use IPI calls and do local purges on PA 2.0.
> 
> Dave

I tested this patch and it reduces kernel compile time from one hour to 30 
minutes.

Mikulas
John David Anglin Sept. 23, 2018, 3:33 p.m. UTC | #4
On 2018-09-21 6:48 PM, Mikulas Patocka wrote:
> I tested this patch and it reduces kernel compile time from one hour to 30
> minutes.
I'm surprised by the magnitude of the improvement.  There are two reasons
for it.  Firstly, the "sync" barrier is very slow.  Secondly, now that we
track TLB inserts with the accessed bit, we only flush TLB entries that
have actually been inserted.  The "pdtlb" and "pitlb" instructions take
several hundred cycles, as I recall.
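
Roughly, the check that makes this possible looks like this (a sketch of
the idea, not necessarily the exact macro in pgtable.h):

/* A translation can only be in the TLB if the miss handler inserted it,
 * and the handler marks the PTE accessed when it does, so "present and
 * accessed" is the only case where a purge is actually needed. */
#define pte_inserted(x)						\
	((pte_val(x) & (_PAGE_PRESENT | _PAGE_ACCESSED))	\
		== (_PAGE_PRESENT | _PAGE_ACCESSED))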

Dave

Patch

diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index fa6b7c78f18a..b86c31291f0a 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -66,9 +66,9 @@  static inline void purge_tlb_entries(struct mm_struct *mm, unsigned long addr)
 		unsigned long flags;				\
 		spin_lock_irqsave(&pa_tlb_lock, flags);		\
 		old_pte = *ptep;				\
+		set_pte(ptep, pteval);				\
 		if (pte_inserted(old_pte))			\
 			purge_tlb_entries(mm, addr);		\
-		set_pte(ptep, pteval);				\
 		spin_unlock_irqrestore(&pa_tlb_lock, flags);	\
 	} while (0)
 
@@ -202,7 +202,7 @@  static inline void purge_tlb_entries(struct mm_struct *mm, unsigned long addr)
 #define _PAGE_HUGE     (1 << xlate_pabit(_PAGE_HPAGE_BIT))
 #define _PAGE_USER     (1 << xlate_pabit(_PAGE_USER_BIT))
 
-#define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_READ | _PAGE_WRITE |  _PAGE_DIRTY | _PAGE_ACCESSED)
+#define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY | _PAGE_ACCESSED)
 #define _PAGE_CHG_MASK	(PAGE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
 #define _PAGE_KERNEL_RO	(_PAGE_PRESENT | _PAGE_READ | _PAGE_DIRTY | _PAGE_ACCESSED)
 #define _PAGE_KERNEL_EXEC	(_PAGE_KERNEL_RO | _PAGE_EXEC)
@@ -227,22 +227,22 @@  static inline void purge_tlb_entries(struct mm_struct *mm, unsigned long addr)
 
 #ifndef __ASSEMBLY__
 
-#define PAGE_NONE	__pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED)
-#define PAGE_SHARED	__pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_READ | _PAGE_WRITE | _PAGE_ACCESSED)
+#define PAGE_NONE	__pgprot(_PAGE_PRESENT | _PAGE_USER)
+#define PAGE_SHARED	__pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_READ | _PAGE_WRITE)
 /* Others seem to make this executable, I don't know if that's correct
    or not.  The stack is mapped this way though so this is necessary
    in the short term - dhd@linuxcare.com, 2000-08-08 */
-#define PAGE_READONLY	__pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_READ | _PAGE_ACCESSED)
-#define PAGE_WRITEONLY  __pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_WRITE | _PAGE_ACCESSED)
-#define PAGE_EXECREAD   __pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_READ | _PAGE_EXEC |_PAGE_ACCESSED)
+#define PAGE_READONLY	__pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_READ)
+#define PAGE_WRITEONLY  __pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_WRITE)
+#define PAGE_EXECREAD   __pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_READ | _PAGE_EXEC)
 #define PAGE_COPY       PAGE_EXECREAD
-#define PAGE_RWX        __pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_READ | _PAGE_WRITE | _PAGE_EXEC |_PAGE_ACCESSED)
+#define PAGE_RWX        __pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
 #define PAGE_KERNEL	__pgprot(_PAGE_KERNEL)
 #define PAGE_KERNEL_EXEC	__pgprot(_PAGE_KERNEL_EXEC)
 #define PAGE_KERNEL_RWX	__pgprot(_PAGE_KERNEL_RWX)
 #define PAGE_KERNEL_RO	__pgprot(_PAGE_KERNEL_RO)
 #define PAGE_KERNEL_UNC	__pgprot(_PAGE_KERNEL | _PAGE_NO_CACHE)
-#define PAGE_GATEWAY    __pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED | _PAGE_GATEWAY| _PAGE_READ)
+#define PAGE_GATEWAY    __pgprot(_PAGE_PRESENT | _PAGE_USER | _PAGE_GATEWAY| _PAGE_READ)
 
 
 /*
@@ -479,8 +479,8 @@  static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned
 		spin_unlock_irqrestore(&pa_tlb_lock, flags);
 		return 0;
 	}
-	purge_tlb_entries(vma->vm_mm, addr);
 	set_pte(ptep, pte_mkold(pte));
+	purge_tlb_entries(vma->vm_mm, addr);
 	spin_unlock_irqrestore(&pa_tlb_lock, flags);
 	return 1;
 }
@@ -493,9 +493,9 @@  static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 
 	spin_lock_irqsave(&pa_tlb_lock, flags);
 	old_pte = *ptep;
+	set_pte(ptep, __pte(0));
 	if (pte_inserted(old_pte))
 		purge_tlb_entries(mm, addr);
-	set_pte(ptep, __pte(0));
 	spin_unlock_irqrestore(&pa_tlb_lock, flags);
 
 	return old_pte;
@@ -505,8 +505,8 @@  static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
 {
 	unsigned long flags;
 	spin_lock_irqsave(&pa_tlb_lock, flags);
-	purge_tlb_entries(mm, addr);
 	set_pte(ptep, pte_wrprotect(*ptep));
+	purge_tlb_entries(mm, addr);
 	spin_unlock_irqrestore(&pa_tlb_lock, flags);
 }
 
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 1b4732e20137..27944a7ebfeb 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -482,9 +482,7 @@ 
 	.macro		tlb_unlock0	spc,tmp
 #ifdef CONFIG_SMP
 	or,COND(=)	%r0,\spc,%r0
-	sync
-	or,COND(=)	%r0,\spc,%r0
-	stw             \spc,0(\tmp)
+	stw,ma		\spc,0(\tmp)
 #endif
 	.endm