
[RFC,09/10] mm/hugetlb: Make hugetlb_fault() RCU-safe

Message ID 20221030213043.335669-1-peterx@redhat.com (mailing list archive)
State New
Series mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare

Commit Message

Peter Xu Oct. 30, 2022, 9:30 p.m. UTC
RCU makes sure the pte_t* won't go away from under us.  Please refer to the
comment above huge_pte_offset() for more information.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
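
For context, a minimal sketch of the read-side pattern this patch relies on. It assumes the earlier patches in this series make the pmd-unshare path defer freeing of the shared page table page until after an RCU grace period; the helpers shown are existing kernel ones, but the fragment is illustrative, not part of the posted diff:

	pte_t *ptep;
	pte_t entry;

	rcu_read_lock();	/* pins the pgtable page against concurrent pmd unshare */
	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
	if (ptep)
		entry = huge_ptep_get(ptep);	/* safe while inside the RCU read section */
	rcu_read_unlock();	/* after this, ptep may point into a freed page */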

Comments

James Houghton Nov. 2, 2022, 6:04 p.m. UTC | #1
On Sun, Oct 30, 2022 at 2:30 PM Peter Xu <peterx@redhat.com> wrote:
>
> RCU makes sure the pte_t* won't go away from under us.  Please refer to the
> comment above huge_pte_offset() for more information.

Thanks for this series, Peter! :)

>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/hugetlb.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5dc87e4e6780..6d336d286394 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5822,6 +5822,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>         int need_wait_lock = 0;
>         unsigned long haddr = address & huge_page_mask(h);
>
> +       /* For huge_pte_offset() */
> +       rcu_read_lock();
>         ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
>         if (ptep) {
>                 /*
> @@ -5830,13 +5832,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>                  * not actually modifying content here.
>                  */
>                 entry = huge_ptep_get(ptep);
> +               rcu_read_unlock();
>                 if (unlikely(is_hugetlb_entry_migration(entry))) {
>                         migration_entry_wait_huge(vma, ptep);

ptep is used here (and we dereference it in
`__migration_entry_wait_huge`), so this looks unsafe to me. A simple
way to fix this would be to move the migration entry check after the
huge_pte_alloc call.

- James
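
To make the window concrete, a rough sketch of the problematic path; the internals of __migration_entry_wait_huge() are paraphrased here and may differ between kernel versions:

	entry = huge_ptep_get(ptep);
	rcu_read_unlock();		/* pgtable page no longer pinned */
	if (unlikely(is_hugetlb_entry_migration(entry))) {
		/*
		 * migration_entry_wait_huge() ends up re-reading *ptep
		 * (roughly: spin_lock(ptl); pte = huge_ptep_get(ptep); ...),
		 * but a concurrent pmd unshare may already have freed the
		 * page that ptep points into.
		 */
		migration_entry_wait_huge(vma, ptep);
		return 0;
	}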

>                         return 0;
>                 } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
>                         return VM_FAULT_HWPOISON_LARGE |
>                                 VM_FAULT_SET_HINDEX(hstate_index(h));
> -       }
> +       } else
> +               rcu_read_unlock();
>
>         /*
>          * Serialize hugepage allocation and instantiation, so that we don't
> --
> 2.37.3
>
Peter Xu Nov. 3, 2022, 3:39 p.m. UTC | #2
On Wed, Nov 02, 2022 at 11:04:01AM -0700, James Houghton wrote:
> On Sun, Oct 30, 2022 at 2:30 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > RCU makes sure the pte_t* won't go away from under us.  Please refer to the
> > comment above huge_pte_offset() for more information.
> 
> Thanks for this series, Peter! :)

Thanks for reviewing, James!

> 
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  mm/hugetlb.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 5dc87e4e6780..6d336d286394 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -5822,6 +5822,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >         int need_wait_lock = 0;
> >         unsigned long haddr = address & huge_page_mask(h);
> >
> > +       /* For huge_pte_offset() */
> > +       rcu_read_lock();
> >         ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
> >         if (ptep) {
> >                 /*
> > @@ -5830,13 +5832,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >                  * not actually modifying content here.
> >                  */
> >                 entry = huge_ptep_get(ptep);
> > +               rcu_read_unlock();
> >                 if (unlikely(is_hugetlb_entry_migration(entry))) {
> >                         migration_entry_wait_huge(vma, ptep);
> 
> ptep is used here (and we dereference it in
> `__migration_entry_wait_huge`), so this looks unsafe to me. A simple
> way to fix this would be to move the migration entry check after the
> huge_pte_alloc call.

Right, I definitely overlooked the migration entries in both patches
(including the previous one that you commented on), thanks for pointing
that out.

Though moving that after huge_pte_alloc() may have a similar problem, iiuc.
The thing is we need either the vma lock or RCU to protect access to the
pte*, while the pte* page and its pgtable lock can be accessed very deep
in the migration core (e.g., migration_entry_wait_on_locked()), since the
lock cannot be released before the thread queues itself on the waitqueue.

So far I don't see a good way to achieve this other than adding a hook to
migration_entry_wait_on_locked() so that any lock held for hugetlb
migrations can be properly released after the pgtable lock is released but
before the thread yields itself.
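
A hypothetical shape for such a hook, only to make the idea concrete; the extra parameters below are an assumption for illustration, not something posted in this series:

	/*
	 * Hypothetical: let the caller pass a callback that is invoked after
	 * the pgtable lock (ptl) has been dropped but before the waiting
	 * thread schedules out, so hugetlb_fault() could do rcu_read_unlock()
	 * (or drop the vma lock) at exactly that point.
	 */
	void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
					    spinlock_t *ptl,
					    void (*post_unlock)(void *arg),
					    void *arg);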

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5dc87e4e6780..6d336d286394 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5822,6 +5822,8 @@  vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
 
+	/* For huge_pte_offset() */
+	rcu_read_lock();
 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
 		/*
@@ -5830,13 +5832,15 @@  vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * not actually modifying content here.
 		 */
 		entry = huge_ptep_get(ptep);
+		rcu_read_unlock();
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
 			migration_entry_wait_huge(vma, ptep);
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
-	}
+	} else
+		rcu_read_unlock();
 
 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't