[2/3] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem

Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
problem and Mel fixed it[1] and found same problem on MADV_FREE[2].

Quote from Mel Gorman

"The race in question is CPU 0 running madv_free and updating some PTEs
while CPU 1 is also running madv_free and looking at the same PTEs.
CPU 1 may have writable TLB entries for a page but fail the pte_dirty
check (because CPU 0 has updated it already) and potentially fail to flush.
Hence, when madv_free on CPU 1 returns, there are still potentially writable
TLB entries and the underlying PTE is still present so that a subsequent write
does not necessarily propagate the dirty bit to the underlying PTE any more.
Reclaim at some unknown time at the future may then see that the PTE is still
clean and discard the page even though a write has happened in the meantime.
I think this is possible but I could have missed some protection in madv_free
that prevents it happening."

This patch aims for solving both problems all at once and is ready for
other problem with KSM, MADV_FREE and soft-dirty story[3].

TLB batch API(tlb_[gather|finish]_mmu] uses [set|clear]_tlb_flush_pending
and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can catch
there are parallel threads going on. In that case, flush TLB to prevent
for user to access memory via stale TLB entry although it fail to gather
pte entry.

I confiremd this patch works with [4] test program Nadav gave so this patch
supersedes "mm: Always flush VMA ranges affected by zap_page_range v2"
in current mmotm.

NOTE:
This patch modifies arch-specific TLB gathering interface(x86, ia64,
s390, sh, um). It seems most of architecture are straightforward but s390
need to be careful because tlb_flush_mmu works only if mm->context.flush_mm
is set to non-zero which happens only a pte entry really is cleared by
ptep_get_and_clear and friends. However, this problem never changes the
pte entries but need to flush to prevent memory access from stale tlb.

Any thoughts?

[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
[2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
[3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
[4] https://patchwork.kernel.org/patch/9861621/

Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-sh@vger.kernel.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: linux-arch@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 arch/arm/include/asm/tlb.h  | 15 ++++++++++++++-
 arch/ia64/include/asm/tlb.h | 12 ++++++++++++
 arch/s390/include/asm/tlb.h | 15 +++++++++++++++
 arch/sh/include/asm/tlb.h   |  4 +++-
 arch/um/include/asm/tlb.h   |  8 ++++++++
 include/linux/mm_types.h    |  7 +++++--
 mm/memory.c                 | 24 ++++++++++++------------
 7 files changed, 69 insertions(+), 16 deletions(-)

[2/3] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem

Commit Message

Comments

Patch