[RFC,1/3] Revert "mm: always flush VMA ranges affected by zap_page_range"

Message ID 20180612071621.26775-2-npiggin@gmail.com (mailing list archive)
State New, archived

Commit Message

Nicholas Piggin June 12, 2018, 7:16 a.m. UTC
This reverts commit 4647706ebeee6e50f7b9f922b095f4ec94d581c3.

Patch 99baac21e4585 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
problem") provides a superset of the TLB flush coverage of this
commit, and even includes in the changelog "this patch supersedes
'mm: Always flush VMA ranges affected by zap_page_range v2'".

Reverting this avoids double flushing the TLB range, and the less
efficient flush_tlb_range() call (the mmu_gather API is more precise
about what ranges it invalidates).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/memory.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)
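
For context, the superseding fix (99baac21e4585) closes the race at
tlb_finish_mmu() time: if another thread had a deferred flush pending on
the same mm, the final flush is forced. Paraphrased from mm/memory.c of
that era; treat this as an illustrative sketch, not the verbatim commit:

	/*
	 * Sketch of the mechanism added by 99baac21e4585: a zap that
	 * raced with another deferred-flush operation on the same mm
	 * cannot trust its own (possibly empty) gather range, so the
	 * final flush is forced.
	 */
	bool mm_tlb_flush_nested(struct mm_struct *mm)
	{
		/* more than one concurrent tlb_gather on this mm */
		return atomic_read(&mm->tlb_flush_pending) > 1;
	}

	void tlb_finish_mmu(struct mmu_gather *tlb,
			unsigned long start, unsigned long end)
	{
		/* force a flush of the whole range if we raced */
		bool force = mm_tlb_flush_nested(tlb->mm);

		arch_tlb_finish_mmu(tlb, start, end, force);
		dec_tlb_flush_pending(tlb->mm);
	}

With that in place, the per-VMA flush_tlb_range() calls removed below are
redundant: the single flush at tlb_finish_mmu() already covers them.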

Comments

Aneesh Kumar K.V June 12, 2018, 1:53 p.m. UTC | #1
On 06/12/2018 12:46 PM, Nicholas Piggin wrote:
> This reverts commit 4647706ebeee6e50f7b9f922b095f4ec94d581c3.
> 
> Patch 99baac21e4585 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
> problem") provides a superset of the TLB flush coverage of this
> commit, and even includes in the changelog "this patch supersedes
> 'mm: Always flush VMA ranges affected by zap_page_range v2'".
> 
> Reverting this avoids double flushing the TLB range, and the less
> efficient flush_tlb_range() call (the mmu_gather API is more precise
> about what ranges it invalidates).
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   mm/memory.c | 14 +-------------
>   1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 7206a634270b..9d472e00fc2d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1603,20 +1603,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
>   	tlb_gather_mmu(&tlb, mm, start, end);
>   	update_hiwater_rss(mm);
>   	mmu_notifier_invalidate_range_start(mm, start, end);
> -	for ( ; vma && vma->vm_start < end; vma = vma->vm_next) {
> +	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
>   		unmap_single_vma(&tlb, vma, start, end, NULL);
> -
> -		/*
> -		 * zap_page_range does not specify whether mmap_sem should be
> -		 * held for read or write. That allows parallel zap_page_range
> -		 * operations to unmap a PTE and defer a flush meaning that
> -		 * this call observes pte_none and fails to flush the TLB.
> -		 * Rather than adding a complex API, ensure that no stale
> -		 * TLB entries exist when this call returns.
> -		 */
> -		flush_tlb_range(vma, start, end);
> -	}
> -
>   	mmu_notifier_invalidate_range_end(mm, start, end);
>   	tlb_finish_mmu(&tlb, start, end);
>   }
> 

Not really related to this patch, but does 99baac21e4585 do the right 
thing if the range start..end covers pages with multiple page sizes?

-aneesh
Nadav Amit June 12, 2018, 6:52 p.m. UTC | #2
at 12:16 AM, Nicholas Piggin <npiggin@gmail.com> wrote:

> This reverts commit 4647706ebeee6e50f7b9f922b095f4ec94d581c3.
> 
> Patch 99baac21e4585 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
> problem") provides a superset of the TLB flush coverage of this
> commit, and even includes in the changelog "this patch supersedes
> 'mm: Always flush VMA ranges affected by zap_page_range v2'".
> 
> Reverting this avoids double flushing the TLB range, and the less
> efficient flush_tlb_range() call (the mmu_gather API is more precise
> about what ranges it invalidates).
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> mm/memory.c | 14 +-------------
> 1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 7206a634270b..9d472e00fc2d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1603,20 +1603,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
> 	tlb_gather_mmu(&tlb, mm, start, end);
> 	update_hiwater_rss(mm);
> 	mmu_notifier_invalidate_range_start(mm, start, end);
> -	for ( ; vma && vma->vm_start < end; vma = vma->vm_next) {
> +	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
> 		unmap_single_vma(&tlb, vma, start, end, NULL);
> -
> -		/*
> -		 * zap_page_range does not specify whether mmap_sem should be
> -		 * held for read or write. That allows parallel zap_page_range
> -		 * operations to unmap a PTE and defer a flush meaning that
> -		 * this call observes pte_none and fails to flush the TLB.
> -		 * Rather than adding a complex API, ensure that no stale
> -		 * TLB entries exist when this call returns.
> -		 */
> -		flush_tlb_range(vma, start, end);
> -	}
> -
> 	mmu_notifier_invalidate_range_end(mm, start, end);
> 	tlb_finish_mmu(&tlb, start, end);
> }

Yes, this was on my “to check when I have time” todo list, especially since
the flush was from start to end, not even vma->vm_start to vma->vm_end.

The revert seems correct.

Reviewed-by: Nadav Amit <namit@vmware.com>
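
To make the over-broad flush Nadav points out concrete: each loop
iteration flushed the caller's full start..end span, not just the part
intersecting the current VMA. A clipped flush would have looked roughly
like this (hypothetical helper, for illustration only):

	/*
	 * Hypothetical: flush only the part of [start, end) that the
	 * VMA actually covers, instead of the whole zap range.
	 */
	static void flush_vma_clipped(struct vm_area_struct *vma,
				      unsigned long start, unsigned long end)
	{
		unsigned long s = max(start, vma->vm_start);
		unsigned long e = min(end, vma->vm_end);

		flush_tlb_range(vma, s, e);
	}

Even clipped, this would still be a second flush on top of the one
tlb_finish_mmu() already performs, which is the point of the revert.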

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 7206a634270b..9d472e00fc2d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1603,20 +1603,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	tlb_gather_mmu(&tlb, mm, start, end);
 	update_hiwater_rss(mm);
 	mmu_notifier_invalidate_range_start(mm, start, end);
-	for ( ; vma && vma->vm_start < end; vma = vma->vm_next) {
+	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
 		unmap_single_vma(&tlb, vma, start, end, NULL);
-
-		/*
-		 * zap_page_range does not specify whether mmap_sem should be
-		 * held for read or write. That allows parallel zap_page_range
-		 * operations to unmap a PTE and defer a flush meaning that
-		 * this call observes pte_none and fails to flush the TLB.
-		 * Rather than adding a complex API, ensure that no stale
-		 * TLB entries exist when this call returns.
-		 */
-		flush_tlb_range(vma, start, end);
-	}
-
 	mmu_notifier_invalidate_range_end(mm, start, end);
 	tlb_finish_mmu(&tlb, start, end);
 }
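
After the revert, zap_page_range() reads roughly as follows, reconstructed
from the hunk above plus the surrounding mm/memory.c context of that era
(illustrative, not a verbatim copy of the tree):

	void zap_page_range(struct vm_area_struct *vma, unsigned long start,
			unsigned long size)
	{
		struct mm_struct *mm = vma->vm_mm;
		struct mmu_gather tlb;
		unsigned long end = start + size;

		lru_add_drain();
		tlb_gather_mmu(&tlb, mm, start, end);
		update_hiwater_rss(mm);
		mmu_notifier_invalidate_range_start(mm, start, end);
		/* unmap each VMA overlapping [start, end); the gather
		 * tracks the precise ranges to invalidate */
		for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
			unmap_single_vma(&tlb, vma, start, end, NULL);
		mmu_notifier_invalidate_range_end(mm, start, end);
		/* single flush of the gathered range, forced to cover
		 * everything if a concurrent deferred flush was pending */
		tlb_finish_mmu(&tlb, start, end);
	}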