
[v5,3/3] mm: fix double page fault on arm64 if PTE_AF is cleared

Message ID 20190919161204.142796-4-justin.he@arm.com (mailing list archive)
State New, archived
Series: fix double page fault on arm64

Commit Message

Justin He Sept. 19, 2019, 4:12 p.m. UTC
When we tested the pmdk unit test [1] vmmalloc_fork TEST1 in an arm64 guest,
there was a double page fault in __copy_from_user_inatomic() of cow_user_page().

The call trace below is from arm64 do_page_fault, for debugging purposes:
[  110.016195] Call trace:
[  110.016826]  do_page_fault+0x5a4/0x690
[  110.017812]  do_mem_abort+0x50/0xb0
[  110.018726]  el1_da+0x20/0xc4
[  110.019492]  __arch_copy_from_user+0x180/0x280
[  110.020646]  do_wp_page+0xb0/0x860
[  110.021517]  __handle_mm_fault+0x994/0x1338
[  110.022606]  handle_mm_fault+0xe8/0x180
[  110.023584]  do_page_fault+0x240/0x690
[  110.024535]  do_mem_abort+0x50/0xb0
[  110.025423]  el0_da+0x20/0x24

The pte info before __copy_from_user_inatomic() is as follows (PTE_AF is cleared):
[ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
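(Assuming the standard arm64 descriptor layout where PTE_AF is bit 10: the
low 12 bits of the pte value are 0xbd3 = 0b1011_1101_0011, so bit 10 is 0,
i.e. the access flag is indeed clear.)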

As Catalin explained: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. We
don't always have a hardware-managed access flag on arm64."
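
For reference, the arm64 override of the arch_faults_on_old_pte() hook is
added by the other patches in this series; below is only a minimal sketch of
what it looks like, assuming the cpu_has_hw_af() helper introduced earlier
in the series (the exact code is not part of this patch):

/*
 * Without hardware Access Flag management, a write to an old pte cannot be
 * resolved by the hardware walker, so the core mm code must mark the pte
 * young itself before copying from the user address.
 */
static inline bool arch_faults_on_old_pte(void)
{
	/* cpu_has_hw_af() is a per-CPU check, so we must not be preempted */
	WARN_ON(preemptible());

	return !cpu_has_hw_af();
}
#define arch_faults_on_old_pte		arch_faults_on_old_pte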

This patch fixes it by calling pte_mkyoung(). Also, the parameters are
changed because vmf should be passed to cow_user_page().

Add a WARN_ON_ONCE() when __copy_from_user_inatomic() returns an error,
in case there is some obscure use-case (suggested by Kirill).

[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork

Reported-by: Yibo Cai <Yibo.Cai@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 mm/memory.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 54 insertions(+), 5 deletions(-)

Comments

Catalin Marinas Sept. 19, 2019, 4:42 p.m. UTC | #1
On Fri, Sep 20, 2019 at 12:12:04AM +0800, Jia He wrote:
> @@ -2152,7 +2163,29 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
>  	 */
>  	if (unlikely(!src)) {
>  		void *kaddr = kmap_atomic(dst);
> -		void __user *uaddr = (void __user *)(va & PAGE_MASK);
> +		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> +		pte_t entry;
> +
> +		/* On architectures with software "accessed" bits, we would
> +		 * take a double page fault, so mark it accessed here.
> +		 */
> +		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> +			spin_lock(vmf->ptl);
> +			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> +				entry = pte_mkyoung(vmf->orig_pte);
> +				if (ptep_set_access_flags(vma, addr,
> +							  vmf->pte, entry, 0))
> +					update_mmu_cache(vma, addr, vmf->pte);
> +			} else {
> +				/* Other thread has already handled the fault
> +				 * and we don't need to do anything. If it's
> +				 * not the case, the fault will be triggered
> +				 * again on the same address.
> +				 */
> +				return -1;
> +			}
> +			spin_unlock(vmf->ptl);

Returning with the spinlock held doesn't normally go very well ;).
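
(For what it's worth, one possible way to restructure the error path so the
lock is dropped before bailing out; this is only an illustrative sketch, not
code from any later revision of the patch:)

		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
			spin_lock(vmf->ptl);
			if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) {
				/*
				 * Another thread already handled the fault;
				 * drop the lock before returning.
				 */
				spin_unlock(vmf->ptl);
				return -1;
			}
			entry = pte_mkyoung(vmf->orig_pte);
			if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0))
				update_mmu_cache(vma, addr, vmf->pte);
			spin_unlock(vmf->ptl);
		}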
Justin He Sept. 20, 2019, 1:13 a.m. UTC | #2
Hi Catalin

> -----Original Message-----
> From: Catalin Marinas <catalin.marinas@arm.com>
> Sent: 20 September 2019 00:42
> To: Justin He (Arm Technology China) <Justin.He@arm.com>
> Subject: Re: [PATCH v5 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
>
> On Fri, Sep 20, 2019 at 12:12:04AM +0800, Jia He wrote:
> > @@ -2152,7 +2163,29 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> >      */
> >     if (unlikely(!src)) {
> >             void *kaddr = kmap_atomic(dst);
> > -           void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > +           void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > +           pte_t entry;
> > +
> > +           /* On architectures with software "accessed" bits, we would
> > +            * take a double page fault, so mark it accessed here.
> > +            */
> > +           if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > +                   spin_lock(vmf->ptl);
> > +                   if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > +                           entry = pte_mkyoung(vmf->orig_pte);
> > +                           if (ptep_set_access_flags(vma, addr,
> > +                                                     vmf->pte, entry, 0))
> > +                                   update_mmu_cache(vma, addr, vmf->pte);
> > +                   } else {
> > +                           /* Other thread has already handled the fault
> > +                            * and we don't need to do anything. If it's
> > +                            * not the case, the fault will be triggered
> > +                            * again on the same address.
> > +                            */
> > +                           return -1;
> > +                   }
> > +                   spin_unlock(vmf->ptl);
>
> Returning with the spinlock held doesn't normally go very well ;).
Yes, my bad. Will fix asap

--
Cheers,
Justin (Jia He)



Patch

diff --git a/mm/memory.c b/mm/memory.c
index e2bb51b6242e..cf681963b2f5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -118,6 +118,13 @@  int randomize_va_space __read_mostly =
 					2;
 #endif
 
+#ifndef arch_faults_on_old_pte
+static inline bool arch_faults_on_old_pte(void)
+{
+	return false;
+}
+#endif
+
 static int __init disable_randmaps(char *s)
 {
 	randomize_va_space = 0;
@@ -2140,8 +2147,12 @@  static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
 	return same;
 }
 
-static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
+static inline int cow_user_page(struct page *dst, struct page *src,
+				struct vm_fault *vmf)
 {
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long addr = vmf->address;
+
 	debug_dma_assert_idle(src);
 
 	/*
@@ -2152,7 +2163,29 @@  static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
 	 */
 	if (unlikely(!src)) {
 		void *kaddr = kmap_atomic(dst);
-		void __user *uaddr = (void __user *)(va & PAGE_MASK);
+		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
+		pte_t entry;
+
+		/* On architectures with software "accessed" bits, we would
+		 * take a double page fault, so mark it accessed here.
+		 */
+		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
+			spin_lock(vmf->ptl);
+			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+				entry = pte_mkyoung(vmf->orig_pte);
+				if (ptep_set_access_flags(vma, addr,
+							  vmf->pte, entry, 0))
+					update_mmu_cache(vma, addr, vmf->pte);
+			} else {
+				/* Other thread has already handled the fault
+				 * and we don't need to do anything. If it's
+				 * not the case, the fault will be triggered
+				 * again on the same address.
+				 */
+				return -1;
+			}
+			spin_unlock(vmf->ptl);
+		}
 
 		/*
 		 * This really shouldn't fail, because the page is there
@@ -2160,12 +2193,17 @@  static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
 		 * in which case we just give up and fill the result with
 		 * zeroes.
 		 */
-		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
+		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
+			/* In case there can be some obscure use-case */
+			WARN_ON_ONCE(1);
 			clear_page(kaddr);
+		}
 		kunmap_atomic(kaddr);
 		flush_dcache_page(dst);
 	} else
-		copy_user_highpage(dst, src, va, vma);
+		copy_user_highpage(dst, src, addr, vma);
+
+	return 0;
 }
 
 static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
@@ -2318,7 +2356,16 @@  static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 				vmf->address);
 		if (!new_page)
 			goto oom;
-		cow_user_page(new_page, old_page, vmf->address, vma);
+
+		if (cow_user_page(new_page, old_page, vmf)) {
+			/* COW failed, if the fault was solved by other,
+			 * it's fine. If not, userspace would re-fault on
+			 * the same address and we will handle the fault
+			 * from the second attempt.
+			 */
+			put_page(new_page);
+			goto normal;
+		}
 	}
 
 	if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
@@ -2420,6 +2467,8 @@  static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		}
 		put_page(old_page);
 	}
+
+normal:
 	return page_copied ? VM_FAULT_WRITE : 0;
 oom_free_new:
 	put_page(new_page);