
[RFC,07/10] mm/hugetlb: Make hugetlb_follow_page_mask() RCU-safe

Message ID 20221030212929.335473-8-peterx@redhat.com
State New
Series mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare

Commit Message

Peter Xu Oct. 30, 2022, 9:29 p.m. UTC
RCU makes sure the pte_t* won't go away from under us.  Please refer to the
comment above huge_pte_offset() for more information.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
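
For context, the scheme this relies on: earlier patches in the series arrange
for the page table page that huge_pte_offset() walks to be freed only after an
RCU grace period once the pmd is unshared, so a reader holding the RCU read
lock can keep dereferencing the returned pte_t*.  A minimal reader-side sketch
(illustrative only, not the exact kernel code):

	#include <linux/hugetlb.h>
	#include <linux/rcupdate.h>

	/*
	 * Sketch: with the page table page freed only after an RCU grace
	 * period, the pte_t* from huge_pte_offset() stays valid for as
	 * long as we remain in the RCU read-side critical section.
	 */
	static pte_t hugetlb_read_pte_sketch(struct mm_struct *mm,
					     unsigned long haddr,
					     struct hstate *h)
	{
		pte_t *pte, entry = __pte(0);

		rcu_read_lock();
		pte = huge_pte_offset(mm, haddr, huge_page_size(h));
		if (pte)
			entry = huge_ptep_get(pte); /* page pinned by RCU */
		rcu_read_unlock();
		/* pte must not be dereferenced past this point */

		return entry;
	}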

Comments

James Houghton Nov. 2, 2022, 6:24 p.m. UTC | #1
On Sun, Oct 30, 2022 at 2:29 PM Peter Xu <peterx@redhat.com> wrote:
>
> RCU makes sure the pte_t* won't go away from under us.  Please refer to the
> comment above huge_pte_offset() for more information.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/hugetlb.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 9869c12e6460..85214095fb85 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6229,10 +6229,12 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
>         if (WARN_ON_ONCE(flags & FOLL_PIN))
>                 return NULL;
>
> +       /* For huge_pte_offset() */
> +       rcu_read_lock();
>  retry:
>         pte = huge_pte_offset(mm, haddr, huge_page_size(h));
>         if (!pte)
> -               return NULL;
> +               goto out_rcu;
>
>         ptl = huge_pte_lock(h, mm, pte);

Just to make sure -- this huge_pte_lock doesn't count as "blocking"
(for the purposes of what is allowed in an RCU read-side critical
section), right? If so, great! But I think we need to call
`rcu_read_unlock` before entering `__migration_entry_wait_huge`, as
that function really can block.

- James

>         entry = huge_ptep_get(pte);
> @@ -6266,6 +6268,8 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
>         }
>  out:
>         spin_unlock(ptl);
> +out_rcu:
> +       rcu_read_unlock();
>         return page;
>  }
>
> --
> 2.37.3
>
Peter Xu Nov. 3, 2022, 3:50 p.m. UTC | #2
On Wed, Nov 02, 2022 at 11:24:57AM -0700, James Houghton wrote:
> On Sun, Oct 30, 2022 at 2:29 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > RCU makes sure the pte_t* won't go away from under us.  Please refer to the
> > comment above huge_pte_offset() for more information.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  mm/hugetlb.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 9869c12e6460..85214095fb85 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6229,10 +6229,12 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> >         if (WARN_ON_ONCE(flags & FOLL_PIN))
> >                 return NULL;
> >
> > +       /* For huge_pte_offset() */
> > +       rcu_read_lock();
> >  retry:
> >         pte = huge_pte_offset(mm, haddr, huge_page_size(h));
> >         if (!pte)
> > -               return NULL;
> > +               goto out_rcu;
> >
> >         ptl = huge_pte_lock(h, mm, pte);
> 
> Just to make sure -- this huge_pte_lock doesn't count as "blocking"
> (for the purposes of what is allowed in an RCU read-side critical
> section), right? If so, great!

Yeah, I think the spinlock should be fine; IIUC it's safe as long as we
don't proactively yield with any form of sleeping lock.

For RT, sleepable spinlocks should also be fine in this case, as explicitly
mentioned in the RCU docs:

b.	What about the -rt patchset?  If readers would need to block
	in a non-rt kernel, you need SRCU.  If readers would block
	in a -rt kernel, but not in a non-rt kernel, SRCU is not
	necessary.  (The -rt patchset turns spinlocks into sleeplocks,
	hence this distinction.)
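
In other words, the rule being applied looks roughly like this (an
illustrative sketch, not the patched function):

	#include <linux/rcupdate.h>
	#include <linux/spinlock.h>

	static void rcu_vs_locks_sketch(spinlock_t *ptl)
	{
		rcu_read_lock();

		spin_lock(ptl);		/* OK: spins, doesn't sleep (non-rt) */
		/* ... examine the pte under the lock ... */
		spin_unlock(ptl);

		rcu_read_unlock();

		/*
		 * Only past this point may we block, e.g. wait for a
		 * migration entry to resolve.  On -rt, where spinlocks
		 * become sleeplocks, the checklist item above still
		 * permits them inside the read section.
		 */
	}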

> But I think we need to call `rcu_read_unlock` before entering
> `__migration_entry_wait_huge`, as that function really can block.

Right, let me revisit this after I figure out what to do with the
hugetlb_fault() path first, as you commented on the other patch.
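
If the migration wait were kept, the fix would presumably take roughly this
shape (an untested sketch, not something posted in the thread); note that it
also shows the awkward part, namely that once we leave the RCU section the
pte pointer passed to __migration_entry_wait_huge() is no longer protected:

	if (is_hugetlb_entry_migration(entry)) {
		spin_unlock(ptl);
		/* about to sleep: leave the RCU read section first */
		rcu_read_unlock();
		__migration_entry_wait_huge(pte, ptl);
		/* re-enter before walking the page table again */
		rcu_read_lock();
		goto retry;
	}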

Actually, here I really think we should just remove the migration chunk and
return with page==NULL, since I really don't think follow_page_mask() should
block at all.  Then for !sleep cases (FOLL_NOWAIT) or follow_page() we'll
return NULL upwards early, while for generic GUP (__get_user_pages()) we'll
just wait in the upcoming faultin_page().  That's AFAICT what we do with
non-hugetlb memory too (after the recent removal of FOLL_MIGRATION in
4a0499782a).
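
Concretely, that proposal would shrink the non-present handling to something
like the following (a sketch of the idea only, not a posted patch):

	ptl = huge_pte_lock(h, mm, pte);
	entry = huge_ptep_get(pte);
	if (!pte_present(entry)) {
		/*
		 * Never block in follow-page: return NULL for migration
		 * (and hwpoison) entries and let generic GUP wait in the
		 * subsequent faultin_page() instead, matching the
		 * non-hugetlb path after the FOLL_MIGRATION removal.
		 */
		goto out;	/* page stays NULL */
	}
	page = pte_page(entry) + ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
out:
	spin_unlock(ptl);
	rcu_read_unlock();
	return page;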

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9869c12e6460..85214095fb85 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6229,10 +6229,12 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	if (WARN_ON_ONCE(flags & FOLL_PIN))
 		return NULL;
 
+	/* For huge_pte_offset() */
+	rcu_read_lock();
 retry:
 	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (!pte)
-		return NULL;
+		goto out_rcu;
 
 	ptl = huge_pte_lock(h, mm, pte);
 	entry = huge_ptep_get(pte);
@@ -6266,6 +6268,8 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	}
 out:
 	spin_unlock(ptl);
+out_rcu:
+	rcu_read_unlock();
 	return page;
 }