
[1/4] mm: filemap: check if any subpage is hwpoisoned for PMD page fault

Message ID 20210914183718.4236-2-shy828301@gmail.com (mailing list archive)
State New
Series Solve silent data loss caused by poisoned page cache (shmem/tmpfs)

Commit Message

Yang Shi Sept. 14, 2021, 6:37 p.m. UTC
When handling a shmem page fault, a THP with a corrupted subpage could be
PMD mapped if certain conditions are met.  But the kernel is supposed to
send SIGBUS when trying to map a hwpoisoned page.

There are two paths which may do the PMD map: fault around and regular fault.

Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault()
codepaths") the problem was even worse in the fault-around path: the THP
could be PMD mapped as long as the VMA fit, regardless of which subpage
was accessed and corrupted.  After this commit the THP can still be PMD
mapped as long as the head page is not corrupted.

In the regular fault path the THP could be PMD mapped as long as the
corrupted page is not accessed and the VMA fits.

Fix the loophole by iterating over all subpages to check for a hwpoisoned
one when doing the PMD map; if any is found, just fall back to a PTE map.
Such a THP can only be PTE mapped.  Do the check in the icache flush loop
in order to avoid iterating over all subpages twice; the icache flush is
actually a no-op on most architectures.

Cc: <stable@vger.kernel.org>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/filemap.c | 15 +++++++++------
 mm/memory.c  | 11 ++++++++++-
 2 files changed, 19 insertions(+), 7 deletions(-)
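The fallback logic of the patch can be sketched as a plain-C toy model (not kernel code: `toy_page`, `try_set_pmd`, and the result enum are simplified stand-ins for `struct page`, `do_set_pmd()`, and `VM_FAULT_FALLBACK`):

```c
#include <stdbool.h>
#include <stddef.h>

#define HPAGE_PMD_NR 512            /* subpages per PMD-sized THP on x86-64 */

enum fault_result { MAP_PMD, FALLBACK_PTE };

struct toy_page {
    bool hwpoison;                  /* stands in for PageHWPoison() */
};

/*
 * Toy model of the check added to do_set_pmd(): scan every subpage of
 * the THP and refuse the PMD mapping if any of them is hwpoisoned, so
 * the fault falls back to PTE mappings and a later access to the
 * poisoned subpage can raise SIGBUS.
 */
enum fault_result try_set_pmd(const struct toy_page *head)
{
    for (size_t i = 0; i < HPAGE_PMD_NR; i++) {
        if (head[i].hwpoison)
            return FALLBACK_PTE;    /* VM_FAULT_FALLBACK in the patch */
    }
    return MAP_PMD;
}
```

Note the cost the review below points out: the healthy-THP fast path now touches the flags of all 512 subpages on every PMD fault.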

Comments

Kirill A. Shutemov Sept. 15, 2021, 11:46 a.m. UTC | #1
On Tue, Sep 14, 2021 at 11:37:15AM -0700, Yang Shi wrote:
> diff --git a/mm/memory.c b/mm/memory.c
> index 25fc46e87214..1765bf72ed16 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3920,8 +3920,17 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>  	if (unlikely(!pmd_none(*vmf->pmd)))
>  		goto out;
>  
> -	for (i = 0; i < HPAGE_PMD_NR; i++)
> +	for (i = 0; i < HPAGE_PMD_NR; i++) {
> +		/*
> +		 * Just back off if any subpage of a THP is corrupted; otherwise
> +		 * the corrupted page may be PMD mapped silently and escape the
> +		 * check.  Such a THP can only be PTE mapped.  Access to the
> +		 * corrupted subpage should trigger SIGBUS as expected.
> +		 */
> +		if (PageHWPoison(page + i))
> +			goto out;
>  		flush_icache_page(vma, page + i);
> +	}

This is somewhat costly.

flush_icache_page() is empty on most archs, so the compiler made the loop
go away before the change. Also, page->flags for most of the pages will not
necessarily be hot.

I wonder if we should consider making PG_hwpoison cover the full compound
page: on marking a page hwpoison we try to split it and mark the relevant
base page; if the split fails, mark the full compound page.

As alternative we can have one more flag that indicates that the compound
page contains at least one hwpoisoned base page. We should have enough
space in the first tail page.
Yang Shi Sept. 15, 2021, 5:28 p.m. UTC | #2
On Wed, Sep 15, 2021 at 4:46 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Tue, Sep 14, 2021 at 11:37:15AM -0700, Yang Shi wrote:
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 25fc46e87214..1765bf72ed16 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3920,8 +3920,17 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> >       if (unlikely(!pmd_none(*vmf->pmd)))
> >               goto out;
> >
> > -     for (i = 0; i < HPAGE_PMD_NR; i++)
> > +     for (i = 0; i < HPAGE_PMD_NR; i++) {
> > +             /*
> > +              * Just back off if any subpage of a THP is corrupted; otherwise
> > +              * the corrupted page may be PMD mapped silently and escape the
> > +              * check.  Such a THP can only be PTE mapped.  Access to the
> > +              * corrupted subpage should trigger SIGBUS as expected.
> > +              */
> > +             if (PageHWPoison(page + i))
> > +                     goto out;
> >               flush_icache_page(vma, page + i);
> > +     }
>
> This is somewhat costly.
>
> flush_icache_page() is empty on most archs, so the compiler made the loop
> go away before the change. Also, page->flags for most of the pages will not
> necessarily be hot.

Yeah, good point.

>
> I wonder if we should consider making PG_hwpoison cover the full compound
> page: on marking a page hwpoison we try to split it and mark the relevant
> base page; if the split fails, mark the full compound page.

We need extra bits to record exactly which subpage(s) are poisoned so
that the right page can be isolated when splitting.

>
> As alternative we can have one more flag that indicates that the compound
> page contains at least one hwpoisoned base page. We should have enough
> space in the first tail page.

Yes, actually I was thinking about the same thing when debugging this
problem, and I think this approach is more feasible. We could add a new
flag in the first tail page, just like the double-map flag, which
indicates that there is at least one poisoned subpage. It could be
cleared when the THP is split.

I will try to implement this in the next version. Thanks a lot for the
suggestion.

>
> --
>  Kirill A. Shutemov
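For illustration, the flag-based alternative discussed above might look like this toy model (not kernel code: `toy_thp`, its fields, and the helpers are hypothetical stand-ins; the real flag would live in the page flags of the first tail page, and splitting would transfer poison state to the relevant base pages):

```c
#include <stdbool.h>
#include <stddef.h>

#define HPAGE_PMD_NR 512

struct toy_thp {
    bool has_hwpoisoned;                   /* summary bit, first tail page */
    bool subpage_poisoned[HPAGE_PMD_NR];   /* per-subpage PageHWPoison()  */
};

/* Memory-failure path: mark the subpage and set the summary flag. */
void poison_subpage(struct toy_thp *thp, size_t idx)
{
    thp->subpage_poisoned[idx] = true;
    thp->has_hwpoisoned = true;
}

/* Fault path: O(1) check instead of scanning all 512 subpages. */
bool can_map_pmd(const struct toy_thp *thp)
{
    return !thp->has_hwpoisoned;
}

/* Split path: the summary flag is cleared; per-subpage state remains. */
void split_thp(struct toy_thp *thp)
{
    thp->has_hwpoisoned = false;
}
```

The design point is that the fault path only reads one (hot) cacheline of the head/first-tail page, restoring the optimized-away loop behavior the review objected to.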

Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index dae481293b5d..740b7afe159a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3195,12 +3195,14 @@  static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
 	}
 
 	if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
-	    vm_fault_t ret = do_set_pmd(vmf, page);
-	    if (!ret) {
-		    /* The page is mapped successfully, reference consumed. */
-		    unlock_page(page);
-		    return true;
-	    }
+		vm_fault_t ret = do_set_pmd(vmf, page);
+		if (ret == VM_FAULT_FALLBACK)
+			goto out;
+		if (!ret) {
+			/* The page is mapped successfully, reference consumed. */
+			unlock_page(page);
+			return true;
+		}
 	}
 
 	if (pmd_none(*vmf->pmd)) {
@@ -3220,6 +3222,7 @@  static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
 		return true;
 	}
 
+out:
 	return false;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 25fc46e87214..1765bf72ed16 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3920,8 +3920,17 @@  vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	if (unlikely(!pmd_none(*vmf->pmd)))
 		goto out;
 
-	for (i = 0; i < HPAGE_PMD_NR; i++)
+	for (i = 0; i < HPAGE_PMD_NR; i++) {
+		/*
 +		 * Just back off if any subpage of a THP is corrupted; otherwise
 +		 * the corrupted page may be PMD mapped silently and escape the
 +		 * check.  Such a THP can only be PTE mapped.  Access to the
 +		 * corrupted subpage should trigger SIGBUS as expected.
+		 */
+		if (PageHWPoison(page + i))
+			goto out;
 		flush_icache_page(vma, page + i);
+	}
 
 	entry = mk_huge_pmd(page, vma->vm_page_prot);
 	if (write)