Message ID | 1589882610-7291-2-git-send-email-maobibo@loongson.cn (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v4,1/4] MIPS: Do not flush tlb page when updating PTE entry | expand |
On Tue, 19 May 2020 18:03:28 +0800 Bibo Mao <maobibo@loongson.cn> wrote: > If two threads concurrently fault at the same address, the thread that > won the race updates the PTE and its local TLB. For now, the other > thread gives up, simply does nothing, and continues. > > It could happen that this second thread triggers another fault, whereby > it only updates its local TLB while handling the fault. Instead of > triggering another fault, let's directly update the local TLB of the > second thread. > > It is only useful to architectures where software can update TLB, it may > bring out some negative effect if update_mmu_cache is used for other > purpose also. It seldom happens where multiple threads access the same > page at the same time, so the negative effect is limited on other arches. I'm still worried about the impact on other architectures. The additional update_mmu_cache() calls won't occur only when multiple threads are racing against the same page, I think? For example, insert_pfn() will do this when making a read-only page a writable one. Would you have time to add some instrumentation into update_mmu_cache() (maybe a tracepoint) and see what effect this change has upon the frequency at which update_mmu_cache() is called for a selection of workloads? And add this info to the changelog to set minds at ease?
On 05/20/2020 09:26 AM, Andrew Morton wrote: > On Tue, 19 May 2020 18:03:28 +0800 Bibo Mao <maobibo@loongson.cn> wrote: > >> If two threads concurrently fault at the same address, the thread that >> won the race updates the PTE and its local TLB. For now, the other >> thread gives up, simply does nothing, and continues. >> >> It could happen that this second thread triggers another fault, whereby >> it only updates its local TLB while handling the fault. Instead of >> triggering another fault, let's directly update the local TLB of the >> second thread. >> >> It is only useful to architectures where software can update TLB, it may >> bring out some negative effect if update_mmu_cache is used for other >> purpose also. It seldom happens where multiple threads access the same >> page at the same time, so the negative effect is limited on other arches. > > I'm still worried about the impact on other architectures. The > additional update_mmu_cache() calls won't occur only when multiple > threads are racing against the same page, I think? For example, > insert_pfn() will do this when making a read-only page a writable one. How about defining ptep_set_access_flags function like this on mips system? which is the same on riscv platform. static inline int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address, pte_t *ptep, pte_t entry, int dirty) { if (!pte_same(*ptep, entry)) set_pte_at(vma->vm_mm, address, ptep, entry); /* * update_mmu_cache will unconditionally execute, handling both * the case that the PTE changed and the spurious fault case. */ return true; } And keep the following piece of code unchanged, the change will be smaller. @@ -1770,8 +1770,8 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr, } entry = pte_mkyoung(*pte); entry = maybe_mkwrite(pte_mkdirty(entry), vma); - if (ptep_set_access_flags(vma, addr, pte, entry, 1)) - update_mmu_cache(vma, addr, pte); + ptep_set_access_flags(vma, addr, pte, entry, 1); + update_mmu_cache(vma, addr, pte); } @@ -2436,17 +2436,16 @@ static inline bool cow_user_page(struct page *dst, struct page *src, entry = pte_mkyoung(vmf->orig_pte); - if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0)) - update_mmu_cache(vma, addr, vmf->pte); + ptep_set_access_flags(vma, addr, vmf->pte, entry, 0); + update_mmu_cache(vma, addr, vmf->pte); } @@ -2618,8 +2618,8 @@ static inline void wp_page_reuse(struct vm_fault *vmf) flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte)); entry = pte_mkyoung(vmf->orig_pte); entry = maybe_mkwrite(pte_mkdirty(entry), vma); - if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1)) - update_mmu_cache(vma, vmf->address, vmf->pte); + ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1); + update_mmu_cache(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); } > > Would you have time to add some instrumentation into update_mmu_cache() > (maybe a tracepoint) and see what effect this change has upon the > frequency at which update_mmu_cache() is called for a selection of > workloads? And add this info to the changelog to set minds at ease? OK, I will add some instrumentation data in the changelog.
On Wed, 20 May 2020 14:39:13 +0800 maobibo <maobibo@loongson.cn> wrote: > > I'm still worried about the impact on other architectures. The > > additional update_mmu_cache() calls won't occur only when multiple > > threads are racing against the same page, I think? For example, > > insert_pfn() will do this when making a read-only page a writable one. > How about defining ptep_set_access_flags function like this on mips system? > which is the same on riscv platform. > > static inline int ptep_set_access_flags(struct vm_area_struct *vma, > unsigned long address, pte_t *ptep, > pte_t entry, int dirty) > { > if (!pte_same(*ptep, entry)) > set_pte_at(vma->vm_mm, address, ptep, entry); > /* > * update_mmu_cache will unconditionally execute, handling both > * the case that the PTE changed and the spurious fault case. > */ > return true; > } > hm, it seems a bit abusive - ptep_set_access_flags() is supposed to return true if the pte changed, and that isn't the case here. I suppose we could run update_mmu_cache() directly from ptep_set_access_flags() if we're about to return false, but that doesn't seem a lot nicer? > > Would you have time to add some instrumentation into update_mmu_cache() > > (maybe a tracepoint) and see what effect this change has upon the > > frequency at which update_mmu_cache() is called for a selection of > > workloads? And add this info to the changelog to set minds at ease? > > OK, I will add some instrumentation data in the changelog. Well, if this testing shows no effect as you expect, perhaps we can leave the code as-is.
diff --git a/mm/memory.c b/mm/memory.c index f703fe8..2eb59a9 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1770,8 +1770,8 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr, } entry = pte_mkyoung(*pte); entry = maybe_mkwrite(pte_mkdirty(entry), vma); - if (ptep_set_access_flags(vma, addr, pte, entry, 1)) - update_mmu_cache(vma, addr, pte); + ptep_set_access_flags(vma, addr, pte, entry, 1); + update_mmu_cache(vma, addr, pte); } goto out_unlock; } @@ -2436,17 +2436,16 @@ static inline bool cow_user_page(struct page *dst, struct page *src, if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) { /* * Other thread has already handled the fault - * and we don't need to do anything. If it's - * not the case, the fault will be triggered - * again on the same address. + * and update local tlb only */ + update_mmu_cache(vma, addr, vmf->pte); ret = false; goto pte_unlock; } entry = pte_mkyoung(vmf->orig_pte); - if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0)) - update_mmu_cache(vma, addr, vmf->pte); + ptep_set_access_flags(vma, addr, vmf->pte, entry, 0); + update_mmu_cache(vma, addr, vmf->pte); } /* @@ -2463,7 +2462,8 @@ static inline bool cow_user_page(struct page *dst, struct page *src, vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl); locked = true; if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) { - /* The PTE changed under us. Retry page fault. */ + /* The PTE changed under us, update local tlb */ + update_mmu_cache(vma, addr, vmf->pte); ret = false; goto pte_unlock; } @@ -2618,8 +2618,8 @@ static inline void wp_page_reuse(struct vm_fault *vmf) flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte)); entry = pte_mkyoung(vmf->orig_pte); entry = maybe_mkwrite(pte_mkdirty(entry), vma); - if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1)) - update_mmu_cache(vma, vmf->address, vmf->pte); + ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1); + update_mmu_cache(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); } @@ -2752,6 +2752,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) new_page = old_page; page_copied = 1; } else { + update_mmu_cache(vma, vmf->address, vmf->pte); mem_cgroup_cancel_charge(new_page, memcg, false); } @@ -2812,6 +2813,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf) * pte_offset_map_lock. */ if (!pte_same(*vmf->pte, vmf->orig_pte)) { + update_mmu_cache(vmf->vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); return VM_FAULT_NOPAGE; } @@ -2936,6 +2938,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); if (!pte_same(*vmf->pte, vmf->orig_pte)) { + update_mmu_cache(vma, vmf->address, vmf->pte); unlock_page(vmf->page); pte_unmap_unlock(vmf->pte, vmf->ptl); put_page(vmf->page); @@ -3341,8 +3344,10 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) vma->vm_page_prot)); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (!pte_none(*vmf->pte)) + if (!pte_none(*vmf->pte)) { + update_mmu_cache(vma, vmf->address, vmf->pte); goto unlock; + } ret = check_stable_address_space(vma->vm_mm); if (ret) goto unlock; @@ -3378,8 +3383,10 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (!pte_none(*vmf->pte)) + if (!pte_none(*vmf->pte)) { + update_mmu_cache(vma, vmf->address, vmf->pte); goto release; + } ret = check_stable_address_space(vma->vm_mm); if (ret) @@ -3646,8 +3653,10 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg, } /* Re-check under ptl */ - if (unlikely(!pte_none(*vmf->pte))) + if (unlikely(!pte_none(*vmf->pte))) { + update_mmu_cache(vma, vmf->address, vmf->pte); return VM_FAULT_NOPAGE; + } flush_icache_page(vma, page); entry = mk_pte(page, vma->vm_page_prot); @@ -4224,18 +4233,18 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf) vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd); spin_lock(vmf->ptl); entry = vmf->orig_pte; - if (unlikely(!pte_same(*vmf->pte, entry))) + if (unlikely(!pte_same(*vmf->pte, entry))) { + update_mmu_cache(vmf->vma, vmf->address, vmf->pte); goto unlock; + } if (vmf->flags & FAULT_FLAG_WRITE) { if (!pte_write(entry)) return do_wp_page(vmf); entry = pte_mkdirty(entry); } entry = pte_mkyoung(entry); - if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry, + if (!ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry, vmf->flags & FAULT_FLAG_WRITE)) { - update_mmu_cache(vmf->vma, vmf->address, vmf->pte); - } else { /* * This is needed only for protection faults but the arch code * is not yet telling us if this is a protection fault or not. @@ -4245,6 +4254,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf) if (vmf->flags & FAULT_FLAG_WRITE) flush_tlb_fix_spurious_fault(vmf->vma, vmf->address); } + update_mmu_cache(vmf->vma, vmf->address, vmf->pte); unlock: pte_unmap_unlock(vmf->pte, vmf->ptl); return 0;
If two threads concurrently fault at the same address, the thread that won the race updates the PTE and its local TLB. For now, the other thread gives up, simply does nothing, and continues. It could happen that this second thread triggers another fault, whereby it only updates its local TLB while handling the fault. Instead of triggering another fault, let's directly update the local TLB of the second thread. It is only useful to architectures where software can update TLB, it may bring out some negative effect if update_mmu_cache is used for other purpose also. It seldom happens where multiple threads access the same page at the same time, so the negative effect is limited on other arches. Signed-off-by: Bibo Mao <maobibo@loongson.cn> --- mm/memory.c | 44 +++++++++++++++++++++++++++----------------- 1 file changed, 27 insertions(+), 17 deletions(-)