diff mbox series

mm/gup: check page posion status for coredump.

Message ID 20210317163714.328a038d@alex-virtual-machine (mailing list archive)
State New, archived
Headers show
Series mm/gup: check page posion status for coredump. | expand

Commit Message

yaoaili [么爱利] March 17, 2021, 8:37 a.m. UTC
When we do coredump for user process signal, this may be an SIGBUS signal
with BUS_MCEERR_AR or BUS_MCEERR_AO code, which means this signal is
resulted from ECC memory fail like SRAR or SRAO, we expect the memory
recovery work is finished correctly, then the get_dump_page() will not
return the error page as its process pte is set invalid by
memory_failure().

But memory_failure() may fail, and the process's related pte may not be
correctly set invalid, for current code, we will return the poison page
and get it dumped and lead to system panic as its in kernel code.

So check the poison status in get_dump_page(), and if TRUE, return NULL.

Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
---
 mm/gup.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

David Hildenbrand March 17, 2021, 9:12 a.m. UTC | #1
On 17.03.21 09:37, Aili Yao wrote:
> When we do coredump for user process signal, this may be an SIGBUS signal
> with BUS_MCEERR_AR or BUS_MCEERR_AO code, which means this signal is
> resulted from ECC memory fail like SRAR or SRAO, we expect the memory
> recovery work is finished correctly, then the get_dump_page() will not
> return the error page as its process pte is set invalid by
> memory_failure().
> 
> But memory_failure() may fail, and the process's related pte may not be
> correctly set invalid, for current code, we will return the poison page
> and get it dumped and lead to system panic as its in kernel code.
> 
> So check the poison status in get_dump_page(), and if TRUE, return NULL.
> 
> Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
> ---
>   mm/gup.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index e4c224c..499a496 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1536,6 +1536,14 @@ struct page *get_dump_page(unsigned long addr)
>   				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
>   	if (locked)
>   		mmap_read_unlock(mm);
> +
> +	if (IS_ENABLED(CONFIG_MEMORY_FAILURE) && ret == 1) {
> +		if (unlikely(PageHuge(page) && PageHWPoison(compound_head(page))))
> +			ret = 0;
> +		else if (unlikely(PageHWPoison(page)))
> +			ret = 0;
> +	}

I wonder if a simple

if (PageHWPoison(compound_head(page)))
	ret = 0;

won't suffice. But I guess the "issue" is compound pages that are not 
huge pages or transparent huge pages.

If not, we certainly want a wrapper for that magic, otherwise we have to 
replicate the same logic all over the place.

> +
>   	return (ret == 1) ? page : NULL;
>   }
>   #endif /* CONFIG_ELF_CORE */
>
yaoaili [么爱利] March 18, 2021, 3:15 a.m. UTC | #2
On Wed, 17 Mar 2021 10:12:02 +0100
David Hildenbrand <david@redhat.com> wrote:

> 
> I wonder if a simple
> 
> if (PageHWPoison(compound_head(page)))
> 	ret = 0;
> 
> won't suffice. But I guess the "issue" is compound pages that are not 
> huge pages or transparent huge pages.

Yes, the simple case won't suffice, as we mark the hugetlb page poison in head, and
other cases in the specific single page struct.

> If not, we certainly want a wrapper for that magic, otherwise we have to 
> replicate the same logic all over the place.
> 
> > +
> >   	return (ret == 1) ? page : NULL;
> >   }
> >   #endif /* CONFIG_ELF_CORE */
> >   
> 
> 

Yes, May other places meet the requirements as the coredump meets, it's better to make a
wrapper for this. But i am not familiar with the specific scenario, so this patch only cover
the coredump case.

I will post a v2 patch for this.
Matthew Wilcox March 18, 2021, 4:46 a.m. UTC | #3
On Wed, Mar 17, 2021 at 10:12:02AM +0100, David Hildenbrand wrote:
> > +	if (IS_ENABLED(CONFIG_MEMORY_FAILURE) && ret == 1) {
> > +		if (unlikely(PageHuge(page) && PageHWPoison(compound_head(page))))
> > +			ret = 0;
> > +		else if (unlikely(PageHWPoison(page)))
> > +			ret = 0;
> > +	}
> 
> I wonder if a simple
> 
> if (PageHWPoison(compound_head(page)))
> 	ret = 0;
> 
> won't suffice. But I guess the "issue" is compound pages that are not huge
> pages or transparent huge pages.

THPs don't set the HWPoison bit on the head page.

https://lore.kernel.org/linux-mm/20210316140947.GA3420@casper.infradead.org/

(and PAGEFLAG(HWPoison, hwpoison, PF_ANY))

By the way,

#ifdef CONFIG_MEMORY_FAILURE
PAGEFLAG(HWPoison, hwpoison, PF_ANY)
TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
#define __PG_HWPOISON (1UL << PG_hwpoison)
extern bool take_page_off_buddy(struct page *page);
#else
PAGEFLAG_FALSE(HWPoison)
#define __PG_HWPOISON 0
#endif

so there's no need for this 
	if (IS_ENABLED(CONFIG_MEMORY_FAILURE)
check, as it simply turns into

	if (PageHuge(page) && 0)
	else if (0)

and the compiler can optimise it all away.
yaoaili [么爱利] March 18, 2021, 5:34 a.m. UTC | #4
On Thu, 18 Mar 2021 04:46:00 +0000
Matthew Wilcox <willy@infradead.org> wrote:

> On Wed, Mar 17, 2021 at 10:12:02AM +0100, David Hildenbrand wrote:
> > > +	if (IS_ENABLED(CONFIG_MEMORY_FAILURE) && ret == 1) {
> > > +		if (unlikely(PageHuge(page) && PageHWPoison(compound_head(page))))
> > > +			ret = 0;
> > > +		else if (unlikely(PageHWPoison(page)))
> > > +			ret = 0;
> > > +	}  
> > 
> > I wonder if a simple
> > 
> > if (PageHWPoison(compound_head(page)))
> > 	ret = 0;
> > 
> > won't suffice. But I guess the "issue" is compound pages that are not huge
> > pages or transparent huge pages.  
> 
> THPs don't set the HWPoison bit on the head page.
> 
> https://lore.kernel.org/linux-mm/20210316140947.GA3420@casper.infradead.org/
> 
> (and PAGEFLAG(HWPoison, hwpoison, PF_ANY))
> 
> By the way,
> 
> #ifdef CONFIG_MEMORY_FAILURE
> PAGEFLAG(HWPoison, hwpoison, PF_ANY)
> TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
> #define __PG_HWPOISON (1UL << PG_hwpoison)
> extern bool take_page_off_buddy(struct page *page);
> #else
> PAGEFLAG_FALSE(HWPoison)
> #define __PG_HWPOISON 0
> #endif
> 
> so there's no need for this 
> 	if (IS_ENABLED(CONFIG_MEMORY_FAILURE)
> check, as it simply turns into
> 
> 	if (PageHuge(page) && 0)
> 	else if (0)
> 
> and the compiler can optimise it all away.

Yes, You are right, I will modify this later.
Thanks for correction
David Hildenbrand March 18, 2021, 8:14 a.m. UTC | #5
On 18.03.21 05:46, Matthew Wilcox wrote:
> On Wed, Mar 17, 2021 at 10:12:02AM +0100, David Hildenbrand wrote:
>>> +	if (IS_ENABLED(CONFIG_MEMORY_FAILURE) && ret == 1) {
>>> +		if (unlikely(PageHuge(page) && PageHWPoison(compound_head(page))))
>>> +			ret = 0;
>>> +		else if (unlikely(PageHWPoison(page)))
>>> +			ret = 0;
>>> +	}
>>
>> I wonder if a simple
>>
>> if (PageHWPoison(compound_head(page)))
>> 	ret = 0;
>>
>> won't suffice. But I guess the "issue" is compound pages that are not huge
>> pages or transparent huge pages.
> 
> THPs don't set the HWPoison bit on the head page.
> 
> https://lore.kernel.org/linux-mm/20210316140947.GA3420@casper.infradead.org/

Oh, okay -- I was missing that we actually already set the HWPoison bit 
before trying to split via TestSetPageHWPoison(). I thought for a second 
that if splitting fails, we don't set any HWPoison bit.
diff mbox series

Patch

diff --git a/mm/gup.c b/mm/gup.c
index e4c224c..499a496 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1536,6 +1536,14 @@  struct page *get_dump_page(unsigned long addr)
 				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
 	if (locked)
 		mmap_read_unlock(mm);
+
+	if (IS_ENABLED(CONFIG_MEMORY_FAILURE) && ret == 1) {
+		if (unlikely(PageHuge(page) && PageHWPoison(compound_head(page))))
+			ret = 0;
+		else if (unlikely(PageHWPoison(page)))
+			ret = 0;
+	}
+
 	return (ret == 1) ? page : NULL;
 }
 #endif /* CONFIG_ELF_CORE */