
mm/mprotect: Only reference swap pfn page if type match

Message ID 20220823221138.45602-1-peterx@redhat.com (mailing list archive)
State New
Series mm/mprotect: Only reference swap pfn page if type match

Commit Message

Peter Xu Aug. 23, 2022, 10:11 p.m. UTC
Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to
fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]:

  kernel BUG at include/linux/swapops.h:117!
  CPU: 46 PID: 5245 Comm: EventManager_De Tainted: G S         O L 6.0.0-dbg-DEV #2
  RIP: 0010:pfn_swap_entry_to_page+0x72/0xf0
  Code: c6 48 8b 36 48 83 fe ff 74 53 48 01 d1 48 83 c1 08 48 8b 09 f6
  c1 01 75 7b 66 90 48 89 c1 48 8b 09 f6 c1 01 74 74 5d c3 eb 9e <0f> 0b
  48 ba ff ff ff ff 03 00 00 00 eb ae a9 ff 0f 00 00 75 13 48
  RSP: 0018:ffffa59e73fabb80 EFLAGS: 00010282
  RAX: 00000000ffffffe8 RBX: 0c00000000000000 RCX: ffffcd5440000000
  RDX: 1ffffffffff7a80a RSI: 0000000000000000 RDI: 0c0000000000042b
  RBP: ffffa59e73fabb80 R08: ffff9965ca6e8bb8 R09: 0000000000000000
  R10: ffffffffa5a2f62d R11: 0000030b372e9fff R12: ffff997b79db5738
  R13: 000000000000042b R14: 0c0000000000042b R15: 1ffffffffff7a80a
  FS:  00007f549d1bb700(0000) GS:ffff99d3cf680000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000440d035b3180 CR3: 0000002243176004 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   change_pte_range+0x36e/0x880
   change_p4d_range+0x2e8/0x670
   change_protection_range+0x14e/0x2c0
   mprotect_fixup+0x1ee/0x330
   do_mprotect_pkey+0x34c/0x440
   __x64_sys_mprotect+0x1d/0x30

It triggers because pfn_swap_entry_to_page() could be called on, e.g., a
genuine swap entry.
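
For reference, the check that fires is roughly the following (simplified
sketch; the real definitions live in include/linux/swapops.h and may differ
in detail):

  static inline unsigned long swp_offset_pfn(swp_entry_t entry)
  {
          /* Only pfn-typed entries (e.g. migration entries) encode a PFN */
          VM_BUG_ON(!is_pfn_swap_entry(entry));
          return swp_offset(entry) & SWP_PFN_MASK;
  }

  static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
  {
          return pfn_to_page(swp_offset_pfn(entry));
  }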

Fix it by only calling it when it's a writable migration entry, where the
struct page pointer is actually used.

[1] https://lore.kernel.org/lkml/CAOUHufaVC2Za-p8m0aiHw6YkheDcrO-C3wRGixwDS32VTS+k1w@mail.gmail.com/

Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mprotect.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Yu Zhao Aug. 23, 2022, 10:44 p.m. UTC | #1
On Tue, Aug 23, 2022 at 4:11 PM Peter Xu <peterx@redhat.com> wrote:
>
> Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to
> fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]:
>
> [...]

Thanks for the quick turnaround!

Tested-by: Yu Zhao <yuzhao@google.com>
David Hildenbrand Aug. 24, 2022, 6:58 a.m. UTC | #2
On 24.08.22 00:11, Peter Xu wrote:
> Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to
> fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]:
> 
>   kernel BUG at include/linux/swapops.h:117!

Hi Peter,

Note that Linus recently complained to me that we should not add any new
BUG_ONs (not even VM_BUG_ONs), and should even convert all existing ones to
WARN_ON_ONCE. So we might want to change that check to a WARN_ON_ONCE before
sending it upstream.
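
Something along these lines, I mean (illustrative only, not a proper patch;
assuming the swp_offset_pfn() helper from your series):

  static inline unsigned long swp_offset_pfn(swp_entry_t entry)
  {
          /* Complain once instead of killing the machine on a bad entry type */
          VM_WARN_ON_ONCE(!is_pfn_swap_entry(entry));
          return swp_offset(entry) & SWP_PFN_MASK;
  }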

> [...]
> 
> It triggers because pfn_swap_entry_to_page() could be called on, e.g., a
> genuine swap entry.

Right, but the page is not touched in that case and it would simply be
an unused garbage pointer. So the real issue is that we might call
pfn_to_page() on a wrong PFN.

For FLATMEM I think we just don't care. For SPARSEMEM, however, we
could end up getting a NULL pointer from __pfn_to_section() and end up
de-referencing it when looking up the memmap address inside the section
via __section_mem_map_addr().
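
For reference, with classic (non-vmemmap) SPARSEMEM the lookup is roughly
the following (simplified from include/asm-generic/memory_model.h):

  #define __pfn_to_page(pfn)                                        \
  ({      unsigned long __pfn = (pfn);                              \
          struct mem_section *__sec = __pfn_to_section(__pfn);      \
          /* a NULL __sec gets dereferenced when reading the memmap base */ \
          __section_mem_map_addr(__sec) + __pfn;                    \
  })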

I guess that could happen before your changes, for example, when we'd
have a swap offset that's larger than the biggest PFN in the system.

Reviewed-by: David Hildenbrand <david@redhat.com>

David Hildenbrand Aug. 26, 2022, 10:49 a.m. UTC | #3
On 24.08.22 00:11, Peter Xu wrote:
> Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to
> fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]:
> 
> [...]
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index f2b9b1da9083..4549f5945ebe 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -203,10 +203,11 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>  			pages++;
>  		} else if (is_swap_pte(oldpte)) {
>  			swp_entry_t entry = pte_to_swp_entry(oldpte);
> -			struct page *page = pfn_swap_entry_to_page(entry);
>  			pte_t newpte;
>  
>  			if (is_writable_migration_entry(entry)) {
> +				struct page *page = pfn_swap_entry_to_page(entry);
> +
>  				/*
>  				 * A protection check is difficult so
>  				 * just be safe and disable write


Stumbling over the THP code, I was wondering if we also want to adjust change_huge_pmd()
and hugetlb_change_protection. There are no actual swap entries, so I assume we're fine.


diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 482c1826e723..466364e7fc5f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1798,10 +1798,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
        if (is_swap_pmd(*pmd)) {
                swp_entry_t entry = pmd_to_swp_entry(*pmd);
-               struct page *page = pfn_swap_entry_to_page(entry);
 
                VM_BUG_ON(!is_pmd_migration_entry(*pmd));
                if (is_writable_migration_entry(entry)) {
+                       struct page *page = pfn_swap_entry_to_page(entry);
                        pmd_t newpmd;
                        /*
                         * A protection check is difficult so
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2480ba627aa5..559465fae5cd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6370,9 +6370,9 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
                }
                if (unlikely(is_hugetlb_entry_migration(pte))) {
                        swp_entry_t entry = pte_to_swp_entry(pte);
-                       struct page *page = pfn_swap_entry_to_page(entry);
 
                        if (!is_readable_migration_entry(entry)) {
+                               struct page *page = pfn_swap_entry_to_page(entry);
                                pte_t newpte;
 
                                if (PageAnon(page))


@Peter, what's your thought?
Peter Xu Aug. 26, 2022, 2:25 p.m. UTC | #4
On Fri, Aug 26, 2022 at 12:49:37PM +0200, David Hildenbrand wrote:
> On 24.08.22 00:11, Peter Xu wrote:
> > [...]
> 
> 
> Stumbling over the THP code, I was wondering if we also want to adjust change_huge_pmd()
> and hugetlb_change_protection. There are no actual swap entries, so I assume we're fine.
> 
> [...]
> @Peter, what's your thought?

IMHO they're not needed?

The rule is simple in my mind: we should only pass in a pfn-typed swap
entry into pfn_swap_entry_to_page() (or the new swp_offset_pfn()), or it's
a violation of the API.  In these two cases they do not violate the API and
they're always safe because they're guaranteed to be pfn swap entries when
calling.
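
For reference, "pfn-typed" here means an entry for which is_pfn_swap_entry()
returns true; roughly the following sketch (the exact set of entry types,
e.g. whether hwpoison entries are included, depends on the tree):

  static inline bool is_pfn_swap_entry(swp_entry_t entry)
  {
          /* Entry types whose offset field encodes a PFN, not a swap slot */
          return is_migration_entry(entry) || is_device_private_entry(entry) ||
                 is_device_exclusive_entry(entry);
  }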

Thanks,
David Hildenbrand Aug. 26, 2022, 2:39 p.m. UTC | #5
On 26.08.22 16:25, Peter Xu wrote:
> On Fri, Aug 26, 2022 at 12:49:37PM +0200, David Hildenbrand wrote:
>> On 24.08.22 00:11, Peter Xu wrote:
>>> [...]
>>
>> Stumbling over the THP code, I was wondering if we also want to adjust change_huge_pmd()
>> and hugetlb_change_protection. There are no actual swap entries, so I assume we're fine.
>>
>> [...]
>>
>> @Peter, what's your thought?
> 
> IMHO they're not needed?
> 
> The rule is simple in my mind: we should only pass in a pfn-typed swap
> entry into pfn_swap_entry_to_page() (or the new swp_offset_pfn()), or it's
> a violation of the API.  In these two cases they do not violate the API and
> they're always safe because they're guaranteed to be pfn swap entries when
> calling.

I was wondering about extreme corner cases regarding the struct page.

Assume we have a hwpoison_entry that pointed at a valid struct page. We
can succeed in offlining+removing the section it's located on (I was
recently challenging if we want to keep that behavior as it's really
shaky already), freeing the relevant memmap entry and the memory section.

pfn_swap_entry_to_page() -> pfn_to_page() would be problematic if there
is no memmap anymore.


I assume it's ok to always call it for is_pfn_swap_entry(), but in the
PMD case we only check for is_swap_pmd()? Isn't that problematic?


I was confused by the hugetlb case, it's indeed fine as we check for
is_hugetlb_entry_migration().
Peter Xu Aug. 26, 2022, 4:04 p.m. UTC | #6
On Fri, Aug 26, 2022 at 04:39:08PM +0200, David Hildenbrand wrote:
> On 26.08.22 16:25, Peter Xu wrote:
> > On Fri, Aug 26, 2022 at 12:49:37PM +0200, David Hildenbrand wrote:
> >> On 24.08.22 00:11, Peter Xu wrote:
> >>> [...]
> >>
> >> Stumbling over the THP code, I was wondering if we also want to adjust change_huge_pmd()
> >> and hugetlb_change_protection. There are no actual swap entries, so I assume we're fine.
> >>
> >> [...]
> >>
> >> @Peter, what's your thought?
> > 
> > IMHO they're not needed?
> > 
> > The rule is simple in my mind: we should only pass in a pfn-typed swap
> > entry into pfn_swap_entry_to_page() (or the new swp_offset_pfn()), or it's
> > a violation of the API.  In these two cases they do not violate the API and
> > they're always safe because they're guaranteed to be pfn swap entries when
> > calling.
> 
> I was wondering about extreme corner cases regarding the struct page.
> 
> Assume we have a hwpoison_entry that pointed at a valid struct page. We
> can succeed in offlining+removing the section it's located on (I was
> recently challenging if we want to keep that behavior as it's really
> shaky already), freeing the relevant memmap entry and the memory section.
> 
> pfn_swap_entry_to_page() -> pfn_to_page() would be problematic if there
> is no memmap anymore.
> 
> 
> I assume it's ok to always call it for is_pfn_swap_entry(), but in the
> PMD case we only check for is_swap_pmd()? Isn't that problematic?

I don't know the hwpoison side well enough to judge the validity of
fetching the page from the pfn across online/offline ops, but if the only
concern here is the existence of hwpoison entries, I think it's fine:
iirc we'll split the thp when any of its subpages gets poisoned, so we
should never hit a hwpoison entry in the thp path.

> 
> 
> I was confused by the hugetlb case, it's indeed fine as we check for
> is_hugetlb_entry_migration().

Right, it's more straightforward in the hugetlb case.
David Hildenbrand Aug. 26, 2022, 4:35 p.m. UTC | #7
On 26.08.22 18:04, Peter Xu wrote:
> On Fri, Aug 26, 2022 at 04:39:08PM +0200, David Hildenbrand wrote:
>> [...]
>>
>> I assume it's ok to always call it for is_pfn_swap_entry(), but in the
>> PMD case we only check for is_swap_pmd()? Isn't that problematic?
> 
> I don't know the hwpoison side well enough to judge the validity of
> fetching the page from the pfn across online/offline ops, but if the only
> concern here is the existence of hwpoison entries, I think it's fine:
> iirc we'll split the thp when any of its subpages gets poisoned, so we
> should never hit a hwpoison entry in the thp path.

Ah right, so we're good. Thanks!

Patch

diff --git a/mm/mprotect.c b/mm/mprotect.c
index f2b9b1da9083..4549f5945ebe 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -203,10 +203,11 @@  static unsigned long change_pte_range(struct mmu_gather *tlb,
 			pages++;
 		} else if (is_swap_pte(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
-			struct page *page = pfn_swap_entry_to_page(entry);
 			pte_t newpte;
 
 			if (is_writable_migration_entry(entry)) {
+				struct page *page = pfn_swap_entry_to_page(entry);
+
 				/*
 				 * A protection check is difficult so
 				 * just be safe and disable write