| Message ID | 20221030213043.335669-1-peterx@redhat.com (mailing list archive) |
|---|---|
| State | New |
| Series | mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare |
On Sun, Oct 30, 2022 at 2:30 PM Peter Xu <peterx@redhat.com> wrote:
>
> RCU makes sure the pte_t* won't go away from under us. Please refer to the
> comment above huge_pte_offset() for more information.

Thanks for this series, Peter! :)

>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/hugetlb.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5dc87e4e6780..6d336d286394 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5822,6 +5822,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	int need_wait_lock = 0;
>  	unsigned long haddr = address & huge_page_mask(h);
>
> +	/* For huge_pte_offset() */
> +	rcu_read_lock();
>  	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
>  	if (ptep) {
>  		/*
> @@ -5830,13 +5832,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		 * not actually modifying content here.
>  		 */
>  		entry = huge_ptep_get(ptep);
> +		rcu_read_unlock();
>  		if (unlikely(is_hugetlb_entry_migration(entry))) {
>  			migration_entry_wait_huge(vma, ptep);

ptep is used here (and we dereference it in
`__migration_entry_wait_huge`), so this looks unsafe to me. A simple
way to fix this would be to move the migration entry check after the
huge_pte_alloc call.

- James

>  			return 0;
>  		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
>  			return VM_FAULT_HWPOISON_LARGE |
>  				VM_FAULT_SET_HINDEX(hstate_index(h));
> -	}
> +	} else
> +		rcu_read_unlock();
>
>  	/*
>  	 * Serialize hugepage allocation and instantiation, so that we don't
> --
> 2.37.3
>
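[Editorial note: to make the suggested reordering concrete, here is a
minimal sketch, not a tested patch. It assumes hugetlb_fault()'s
existing locals and the huge_pte_alloc() call that already follows the
serialization comment; the migration check simply moves after it, so
ptep is never dereferenced outside the RCU read section.]

	/*
	 * Illustrative sketch of the suggestion above, not a tested
	 * patch: redo the migration check only after huge_pte_alloc(),
	 * when ptep points into a pgtable page that cannot be freed by
	 * pmd unshare under us.
	 */
	ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
	if (!ptep)
		return VM_FAULT_OOM;

	entry = huge_ptep_get(ptep);
	if (unlikely(is_hugetlb_entry_migration(entry))) {
		migration_entry_wait_huge(vma, ptep);
		return 0;
	}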
On Wed, Nov 02, 2022 at 11:04:01AM -0700, James Houghton wrote:
> On Sun, Oct 30, 2022 at 2:30 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > RCU makes sure the pte_t* won't go away from under us. Please refer to the
> > comment above huge_pte_offset() for more information.
>
> Thanks for this series, Peter! :)

Thanks for reviewing, James!

> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  mm/hugetlb.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 5dc87e4e6780..6d336d286394 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -5822,6 +5822,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >  	int need_wait_lock = 0;
> >  	unsigned long haddr = address & huge_page_mask(h);
> >
> > +	/* For huge_pte_offset() */
> > +	rcu_read_lock();
> >  	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
> >  	if (ptep) {
> >  		/*
> > @@ -5830,13 +5832,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >  		 * not actually modifying content here.
> >  		 */
> >  		entry = huge_ptep_get(ptep);
> > +		rcu_read_unlock();
> >  		if (unlikely(is_hugetlb_entry_migration(entry))) {
> >  			migration_entry_wait_huge(vma, ptep);
>
> ptep is used here (and we dereference it in
> `__migration_entry_wait_huge`), so this looks unsafe to me. A simple
> way to fix this would be to move the migration entry check after the
> huge_pte_alloc call.

Right, I definitely overlooked the migration entries in both patches
(including the previous one that you commented on), thanks for pointing
that out.

Though moving that check after huge_pte_alloc() may have a similar
problem, iiuc. The thing is we need either the vma lock or RCU to
protect accessing the pte*, while the pte* page and its pgtable lock
can be accessed very deep in the migration core (e.g.,
migration_entry_wait_on_locked()), as the lock cannot be released
before the thread queues itself onto the waitqueue.

So far I don't see a good way to achieve this other than adding a hook
to migration_entry_wait_on_locked(), so that any lock held for huge
migrations can be properly released after the pgtable lock is released
but before the thread yields itself.
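[Editorial note: for illustration only, the hook described above might
take a shape like the following. The callback parameters are invented
here; the real migration_entry_wait_on_locked() does not take them, and
the body is elided down to the two steps that matter for the ordering
argument.]

	/*
	 * Hypothetical sketch, names invented for illustration: let the
	 * caller pass a callback that drops whatever protects the
	 * pgtable page (RCU, or the hugetlb vma lock), invoked after
	 * the pgtable lock is released but before the thread schedules
	 * out on the waitqueue.
	 */
	void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
					    spinlock_t *ptl,
					    void (*unlock_hook)(void *),
					    void *hook_arg)
	{
		/* ... queue ourselves on the folio waitqueue ... */
		spin_unlock(ptl);
		if (unlock_hook)
			unlock_hook(hook_arg);	/* e.g. rcu_read_unlock() */
		/* ... schedule() and wait for migration to finish ... */
	}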
RCU makes sure the pte_t* won't go away from under us. Please refer to the
comment above huge_pte_offset() for more information.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5dc87e4e6780..6d336d286394 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5822,6 +5822,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
 
+	/* For huge_pte_offset() */
+	rcu_read_lock();
 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
 		/*
@@ -5830,13 +5832,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * not actually modifying content here.
 		 */
 		entry = huge_ptep_get(ptep);
+		rcu_read_unlock();
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
 			migration_entry_wait_huge(vma, ptep);
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
-	}
+	} else
+		rcu_read_unlock();
 
 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
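[Editorial note: as a reading aid, here is a sketch of the read-side
pattern the commit message relies on, written for a generic lockless
caller; the authoritative contract is the comment above
huge_pte_offset() that the series adds.]

	/*
	 * Sketch of the read-side pattern: the pmd page that ptep
	 * points into can be freed by pmd unsharing, so the RCU read
	 * section must cover both the lookup and the last dereference
	 * of ptep.
	 */
	pte_t *ptep;
	pte_t entry;

	rcu_read_lock();
	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
	if (ptep)
		entry = huge_ptep_get(ptep);	/* last use of *ptep */
	rcu_read_unlock();
	/* ptep must not be dereferenced past this point. */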