mm: Avoid data corruption on CoW fault into PFN-mapped VMA
diff mbox series

Message ID 20200218154151.13349-1-kirill.shutemov@linux.intel.com
State New
Headers show
Series
  • mm: Avoid data corruption on CoW fault into PFN-mapped VMA
Related show

Commit Message

Kirill A. Shutemov Feb. 18, 2020, 3:41 p.m. UTC
Jeff Moyer has reported that one of xfstests triggers a warning when run
on DAX-enabled filesystem:

	WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317 wp_page_copy+0xc40/0xd50
	...
	wp_page_copy+0x98c/0xd50 (unreliable)
	do_wp_page+0xd8/0xad0
	__handle_mm_fault+0x748/0x1b90
	handle_mm_fault+0x120/0x1f0
	__do_page_fault+0x240/0xd70
	do_page_fault+0x38/0xd0
	handle_page_fault+0x10/0x30

The warning happens on failed __copy_from_user_inatomic() which tries to
copy data into a CoW page.

This happens because of race between MADV_DONTNEED and CoW page fault:

	CPU0					CPU1
 handle_mm_fault()
   do_wp_page()
     wp_page_copy()
       do_wp_page()
					madvise(MADV_DONTNEED)
					  zap_page_range()
					    zap_pte_range()
					      ptep_get_and_clear_full()
					      <TLB flush>
	 __copy_from_user_inatomic()
	 sees empty PTE and fails
	 WARN_ON_ONCE(1)
	 clear_page()

The solution is to re-try __copy_from_user_inatomic() under PTL after
checking that PTE is matches the orig_pte.

The second copy attempt can still fail, like due to non-readable PTE,
but there's nothing reasonable we can do about, except clearing the CoW
page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-and-tested-by: Jeff Moyer <jmoyer@redhat.com>
---
 mm/memory.c | 35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

Comments

Andrew Morton Feb. 19, 2020, 9:22 p.m. UTC | #1
On Tue, 18 Feb 2020 18:41:51 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> Jeff Moyer has reported that one of xfstests triggers a warning when run
> on DAX-enabled filesystem:
> 
> 	WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317 wp_page_copy+0xc40/0xd50
> 	...
> 	wp_page_copy+0x98c/0xd50 (unreliable)
> 	do_wp_page+0xd8/0xad0
> 	__handle_mm_fault+0x748/0x1b90
> 	handle_mm_fault+0x120/0x1f0
> 	__do_page_fault+0x240/0xd70
> 	do_page_fault+0x38/0xd0
> 	handle_page_fault+0x10/0x30
> 
> The warning happens on failed __copy_from_user_inatomic() which tries to
> copy data into a CoW page.
> 
> This happens because of race between MADV_DONTNEED and CoW page fault:
> 
> 	CPU0					CPU1
>  handle_mm_fault()
>    do_wp_page()
>      wp_page_copy()
>        do_wp_page()
> 					madvise(MADV_DONTNEED)
> 					  zap_page_range()
> 					    zap_pte_range()
> 					      ptep_get_and_clear_full()
> 					      <TLB flush>
> 	 __copy_from_user_inatomic()
> 	 sees empty PTE and fails
> 	 WARN_ON_ONCE(1)
> 	 clear_page()
> 
> The solution is to re-try __copy_from_user_inatomic() under PTL after
> checking that PTE is matches the orig_pte.
> 
> The second copy attempt can still fail, like due to non-readable PTE,
> but there's nothing reasonable we can do about, except clearing the CoW
> page.

You don't think this is worthy of a cc:stable?
Kirill A. Shutemov Feb. 20, 2020, 12:06 p.m. UTC | #2
On Wed, Feb 19, 2020 at 01:22:39PM -0800, Andrew Morton wrote:
> On Tue, 18 Feb 2020 18:41:51 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > Jeff Moyer has reported that one of xfstests triggers a warning when run
> > on DAX-enabled filesystem:
> > 
> > 	WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317 wp_page_copy+0xc40/0xd50
> > 	...
> > 	wp_page_copy+0x98c/0xd50 (unreliable)
> > 	do_wp_page+0xd8/0xad0
> > 	__handle_mm_fault+0x748/0x1b90
> > 	handle_mm_fault+0x120/0x1f0
> > 	__do_page_fault+0x240/0xd70
> > 	do_page_fault+0x38/0xd0
> > 	handle_page_fault+0x10/0x30
> > 
> > The warning happens on failed __copy_from_user_inatomic() which tries to
> > copy data into a CoW page.
> > 
> > This happens because of race between MADV_DONTNEED and CoW page fault:
> > 
> > 	CPU0					CPU1
> >  handle_mm_fault()
> >    do_wp_page()
> >      wp_page_copy()
> >        do_wp_page()
> > 					madvise(MADV_DONTNEED)
> > 					  zap_page_range()
> > 					    zap_pte_range()
> > 					      ptep_get_and_clear_full()
> > 					      <TLB flush>
> > 	 __copy_from_user_inatomic()
> > 	 sees empty PTE and fails
> > 	 WARN_ON_ONCE(1)
> > 	 clear_page()
> > 
> > The solution is to re-try __copy_from_user_inatomic() under PTL after
> > checking that PTE is matches the orig_pte.
> > 
> > The second copy attempt can still fail, like due to non-readable PTE,
> > but there's nothing reasonable we can do about, except clearing the CoW
> > page.
> 
> You don't think this is worthy of a cc:stable?

Please, add it.

Although, if I read history correctly, it is 15 year old bug that nobody
noticed until we added WARN() there :/

Patch
diff mbox series

diff --git a/mm/memory.c b/mm/memory.c
index 0bccc622e482..e8bfdf0d9d1d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2257,7 +2257,7 @@  static inline bool cow_user_page(struct page *dst, struct page *src,
 	bool ret;
 	void *kaddr;
 	void __user *uaddr;
-	bool force_mkyoung;
+	bool locked = false;
 	struct vm_area_struct *vma = vmf->vma;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long addr = vmf->address;
@@ -2282,11 +2282,11 @@  static inline bool cow_user_page(struct page *dst, struct page *src,
 	 * On architectures with software "accessed" bits, we would
 	 * take a double page fault, so mark it accessed here.
 	 */
-	force_mkyoung = arch_faults_on_old_pte() && !pte_young(vmf->orig_pte);
-	if (force_mkyoung) {
+	if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
 		pte_t entry;
 
 		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
+		locked = true;
 		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
 			/*
 			 * Other thread has already handled the fault
@@ -2310,18 +2310,37 @@  static inline bool cow_user_page(struct page *dst, struct page *src,
 	 * zeroes.
 	 */
 	if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
+		if (locked)
+			goto warn;
+
+		/* Re-validate under PTL if the page is still mapped */
+		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
+		locked = true;
+		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+			/* The PTE changed under us. Retry page fault. */
+			ret = false;
+			goto pte_unlock;
+		}
+
 		/*
-		 * Give a warn in case there can be some obscure
-		 * use-case
+		 * The same page can be mapped back since last copy attampt.
+		 * Try to copy again under PTL.
 		 */
-		WARN_ON_ONCE(1);
-		clear_page(kaddr);
+		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
+			/*
+			 * Give a warn in case there can be some obscure
+			 * use-case
+			 */
+warn:
+			WARN_ON_ONCE(1);
+			clear_page(kaddr);
+		}
 	}
 
 	ret = true;
 
 pte_unlock:
-	if (force_mkyoung)
+	if (locked)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	kunmap_atomic(kaddr);
 	flush_dcache_page(dst);