[v2,2/2] arm64: hugetlb: Fix set_huge_pte_at() to work with all swap entries

Message ID: 20230922115804.2043771-3-ryan.roberts@arm.com
State: New, archived
Series: Fix set_huge_pte_at() panic on arm64

Commit Message

Ryan Roberts Sept. 22, 2023, 11:58 a.m. UTC
When called with a swap entry that does not embed a PFN (e.g.
PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation
of set_huge_pte_at() would either cause a BUG() to fire (if
CONFIG_DEBUG_VM is enabled) or cause a dereference of an invalid address
and subsequent panic.

arm64's huge pte implementation supports multiple huge page sizes, some
of which are implemented in the page table with multiple contiguous
entries. So set_huge_pte_at() needs to work out how big the logical pte
is, so that it can also work out how many physical ptes (or pmds) need
to be written. It previously did this by grabbing the folio out of the
pte and querying its size.
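
As a rough illustration of the size-to-entry-count mapping described above,
here is a minimal userspace sketch. It assumes a 4K granule (so 16 contiguous
ptes cover 64K and 16 contiguous pmds cover 32M); every SKETCH_/sketch_ name
is hypothetical and this is not the kernel's own helper.

```c
#include <stddef.h>

/* Assumed sizes for a 4K-granule configuration; illustrative only. */
#define SKETCH_PAGE_SIZE      (4UL << 10)
#define SKETCH_CONT_PTES      16
#define SKETCH_CONT_PTE_SIZE  (SKETCH_CONT_PTES * SKETCH_PAGE_SIZE)  /* 64K */
#define SKETCH_PMD_SIZE       (2UL << 20)
#define SKETCH_CONT_PMDS      16
#define SKETCH_CONT_PMD_SIZE  (SKETCH_CONT_PMDS * SKETCH_PMD_SIZE)   /* 32M */
#define SKETCH_PUD_SIZE       (1UL << 30)

/*
 * Map a logical huge page size to the number of physical entries that
 * back it, and report the size covered by each of those entries.
 * Returns 0 for a size this sketch does not recognise.
 */
static int sketch_num_contig(unsigned long size, size_t *pgsize)
{
	*pgsize = size;

	switch (size) {
	case SKETCH_PUD_SIZE:
	case SKETCH_PMD_SIZE:
		return 1;			/* one block entry */
	case SKETCH_CONT_PMD_SIZE:
		*pgsize = SKETCH_PMD_SIZE;
		return SKETCH_CONT_PMDS;	/* 16 contiguous pmds */
	case SKETCH_CONT_PTE_SIZE:
		*pgsize = SKETCH_PAGE_SIZE;
		return SKETCH_CONT_PTES;	/* 16 contiguous ptes */
	}

	return 0;
}
```

The point of the sketch is that the only input needed is the huge page size
itself; nothing has to be read back out of the pte value.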

However, there are cases when the pte being set is actually a swap
entry. But this also used to work fine, because for huge ptes, we only
ever saw migration entries and hwpoison entries. And both of these types
of swap entries have a PFN embedded, so the code would grab that and
everything still worked out.

But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN. And this causes the code to go
bang. The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"),
which causes a PTE_MARKER_POISONED swap entry to be set, courtesy of
commit 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for
hugetlbfs") - added in v6.5-rc7. Although review shows that there are
other call sites that set PTE_MARKER_UFFD_WP (which also has no PFN),
these don't trigger on arm64 because arm64 doesn't support UFFD WP.
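
To make the failure mode above concrete, here is a toy userspace model. The
entry layout, the type names and the sketch_* helpers are all hypothetical
and do not reflect the kernel's real swap entry encoding; the only property
modelled is that marker entries carry marker bits where migration and
hwpoison entries carry a PFN.

```c
#include <stdbool.h>

/* Hypothetical swap entry kinds; not the kernel's encoding. */
enum sketch_swp_type {
	SKETCH_SWP_MIGRATION,
	SKETCH_SWP_HWPOISON,
	SKETCH_SWP_PTE_MARKER,
};

struct sketch_swp_entry {
	enum sketch_swp_type type;
	unsigned long offset;	/* PFN, or marker bits for marker entries */
};

/* Hypothetical lookup: size of the folio that owns a given PFN. */
typedef unsigned long (*sketch_folio_size_fn)(unsigned long pfn);

/* Trivial lookup used for illustration: every folio is 2M. */
static unsigned long sketch_fixed_2m(unsigned long pfn)
{
	(void)pfn;
	return 2UL << 20;
}

static bool sketch_entry_has_pfn(struct sketch_swp_entry e)
{
	return e.type == SKETCH_SWP_MIGRATION || e.type == SKETCH_SWP_HWPOISON;
}

/*
 * Old approach: infer the size from the folio behind the embedded PFN.
 * Returns false where the real code would have hit the BUG or chased a
 * bogus PFN, because a marker entry has no PFN to look up.
 */
static bool sketch_old_get_size(struct sketch_swp_entry e,
				sketch_folio_size_fn folio_size,
				unsigned long *sz)
{
	if (!sketch_entry_has_pfn(e))
		return false;

	*sz = folio_size(e.offset);
	return true;
}

/* New approach: the caller passes the size, so the entry is never inspected. */
static unsigned long sketch_new_get_size(struct sketch_swp_entry e,
					 unsigned long sz)
{
	(void)e;
	return sz;
}
```

In this model sketch_old_get_size() only succeeds for the two entry kinds
that embed a PFN, whereas sketch_new_get_size() works for any entry.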

Arguably, the root cause is really due to commit 18f3962953e4 ("mm:
hugetlb: kill set_huge_swap_pte_at()"), which aimed to simplify the
interface to the core code by removing set_huge_swap_pte_at() (which
took a page size parameter) and replacing it with calls to
set_huge_pte_at() where the size was inferred from the folio, as
described above. While that commit didn't break anything at the time, it
did break the interface because it couldn't handle swap entries without
PFNs. And since then new callers have come along which rely on this
working. But given the brokenness is only observable after commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs"),
that one gets the Fixes tag.

Now that we have modified the set_huge_pte_at() interface to pass the
huge page size in the previous patch, we can trivially fix this issue.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Cc: <stable@vger.kernel.org> # 6.5+
---
 arch/arm64/mm/hugetlbpage.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

Comments

Will Deacon Sept. 22, 2023, 4:14 p.m. UTC | #1
On Fri, Sep 22, 2023 at 12:58:04PM +0100, Ryan Roberts wrote:
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index a7f8c8db3425..13fd592228b1 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -241,13 +241,6 @@ static void clear_flush(struct mm_struct *mm,
>  	flush_tlb_range(&vma, saddr, addr);
>  }
>  
> -static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
> -{
> -	VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
> -
> -	return page_folio(pfn_to_page(swp_offset_pfn(entry)));
> -}
> -
>  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>  			    pte_t *ptep, pte_t pte, unsigned long sz)
>  {
> @@ -257,13 +250,10 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>  	unsigned long pfn, dpfn;
>  	pgprot_t hugeprot;
>  
> -	if (!pte_present(pte)) {
> -		struct folio *folio;
> -
> -		folio = hugetlb_swap_entry_to_folio(pte_to_swp_entry(pte));
> -		ncontig = num_contig_ptes(folio_size(folio), &pgsize);
> +	ncontig = num_contig_ptes(sz, &pgsize);
>  
> -		for (i = 0; i < ncontig; i++, ptep++)
> +	if (!pte_present(pte)) {
> +		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>  			set_pte_at(mm, addr, ptep, pte);

Our set_pte_at() doesn't use 'addr' for anything and the old code didn't
even bother to increment it here! I'm fine adding that, but it feels
unrelated to the issue which this patch is actually fixing.

Either way:

Acked-by: Will Deacon <will@kernel.org>

Will
Ryan Roberts Sept. 22, 2023, 4:40 p.m. UTC | #2
On 22/09/2023 17:14, Will Deacon wrote:
> On Fri, Sep 22, 2023 at 12:58:04PM +0100, Ryan Roberts wrote:
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index a7f8c8db3425..13fd592228b1 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -241,13 +241,6 @@ static void clear_flush(struct mm_struct *mm,
>>  	flush_tlb_range(&vma, saddr, addr);
>>  }
>>  
>> -static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
>> -{
>> -	VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
>> -
>> -	return page_folio(pfn_to_page(swp_offset_pfn(entry)));
>> -}
>> -
>>  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>  			    pte_t *ptep, pte_t pte, unsigned long sz)
>>  {
>> @@ -257,13 +250,10 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>  	unsigned long pfn, dpfn;
>>  	pgprot_t hugeprot;
>>  
>> -	if (!pte_present(pte)) {
>> -		struct folio *folio;
>> -
>> -		folio = hugetlb_swap_entry_to_folio(pte_to_swp_entry(pte));
>> -		ncontig = num_contig_ptes(folio_size(folio), &pgsize);
>> +	ncontig = num_contig_ptes(sz, &pgsize);
>>  
>> -		for (i = 0; i < ncontig; i++, ptep++)
>> +	if (!pte_present(pte)) {
>> +		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>>  			set_pte_at(mm, addr, ptep, pte);
> 
> Our set_pte_at() doesn't use 'addr' for anything and the old code didn't
> even bother to increment it here! I'm fine adding that, but it feels
> unrelated to the issue which this patch is actually fixing.

True. I agree it's not strictly necessary and will presumably be optimized out.
But I'm not sure that having knowledge that the implementation doesn't use it is
a good reason not to call the interface correctly. I'll leave it as I've done it
if that's ok.
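
For illustration, a minimal userspace sketch of that loop shape, where a
hypothetical sketch_set_pte_at() records the address it is handed so the
effect of the increment is visible. None of these names are kernel APIs, and
the slot array is only a stand-in for a run of page table entries.

```c
#include <stddef.h>

/* Stand-in for one page table slot: records what was written and where. */
struct sketch_write {
	unsigned long addr;
	unsigned long val;
};

/* Hypothetical stand-in for set_pte_at(); unlike arm64's, it uses 'addr'. */
static void sketch_set_pte_at(struct sketch_write *slot, unsigned long addr,
			      unsigned long val)
{
	slot->addr = addr;
	slot->val = val;
}

/*
 * Write the same non-present value into ncontig consecutive slots,
 * advancing the virtual address by pgsize for each one so that every
 * call sees the address its slot actually maps.
 */
static void sketch_set_nonpresent(struct sketch_write *slots,
				  unsigned long addr, unsigned long val,
				  int ncontig, size_t pgsize)
{
	for (int i = 0; i < ncontig; i++, slots++, addr += pgsize)
		sketch_set_pte_at(slots, addr, val);
}
```

With an implementation that ignores 'addr' the increment has no observable
effect, but an implementation that does use it would see a distinct,
correctly stepped address per slot.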

> 
> Either way:
> 
> Acked-by: Will Deacon <will@kernel.org>

Thanks!

> 
> Will
Axel Rasmussen Sept. 22, 2023, 5:09 p.m. UTC | #3
Looks correct to me - thanks for the fix!

Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>


Patch

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index a7f8c8db3425..13fd592228b1 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -241,13 +241,6 @@  static void clear_flush(struct mm_struct *mm,
 	flush_tlb_range(&vma, saddr, addr);
 }
 
-static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
-{
-	VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
-
-	return page_folio(pfn_to_page(swp_offset_pfn(entry)));
-}
-
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 			    pte_t *ptep, pte_t pte, unsigned long sz)
 {
@@ -257,13 +250,10 @@  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	unsigned long pfn, dpfn;
 	pgprot_t hugeprot;
 
-	if (!pte_present(pte)) {
-		struct folio *folio;
-
-		folio = hugetlb_swap_entry_to_folio(pte_to_swp_entry(pte));
-		ncontig = num_contig_ptes(folio_size(folio), &pgsize);
+	ncontig = num_contig_ptes(sz, &pgsize);
 
-		for (i = 0; i < ncontig; i++, ptep++)
+	if (!pte_present(pte)) {
+		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
 			set_pte_at(mm, addr, ptep, pte);
 		return;
 	}
@@ -273,7 +263,6 @@  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 		return;
 	}
 
-	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
 	pfn = pte_pfn(pte);
 	dpfn = pgsize >> PAGE_SHIFT;
 	hugeprot = pte_pgprot(pte);