diff mbox series

[3/3] mm/memory-failure: send SIGBUS in the event of thp split fail

Message ID 20240501232458.3919593-4-jane.chu@oracle.com (mailing list archive)
State New
Headers show
Series Enhance soft hwpoison handling and injection | expand

Commit Message

Jane Chu May 1, 2024, 11:24 p.m. UTC
When handle hwpoison in a GUP longterm pin'ed thp page,
try_to_split_thp_page() will fail. And at this point, there is little else
the kernel could do except sending a SIGBUS to the user process, thus
give it a chance to recover.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

Comments

Miaohe Lin May 5, 2024, 7 a.m. UTC | #1
On 2024/5/2 7:24, Jane Chu wrote:
> When handle hwpoison in a GUP longterm pin'ed thp page,
> try_to_split_thp_page() will fail. And at this point, there is little else
> the kernel could do except sending a SIGBUS to the user process, thus
> give it a chance to recover.
> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>

Thanks for your patch. Some comments below.

> ---
>  mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 7fcf182abb96..67f4d24a98e7 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  	return rc;
>  }
>  
> +/*
> + * The calling condition is as such: thp split failed, page might have
> + * been GUP longterm pinned, not much can be done for recovery.
> + * But a SIGBUS should be delivered with vaddr provided so that the user
> + * application has a chance to recover. Also, application processes'
> + * election for MCE early killed will be honored.
> + */
> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
> +			struct page *hpage)
> +{
> +	struct folio *folio = page_folio(hpage);
> +	LIST_HEAD(tokill);
> +	int res = -EHWPOISON;
> +
> +	/* deal with user pages only */
> +	if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
> +		res = -EBUSY;
> +	if (!(PageLRU(hpage) || PageHuge(p)))
> +		res = -EBUSY;

Above checks seems unneeded. We already know it's thp?

> +
> +	if (res == -EHWPOISON) {
> +		collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
> +		kill_procs(&tokill, true, pfn, flags);
> +	}
> +
> +	if (flags & MF_COUNT_INCREASED)
> +		put_page(p);

This if block is broken. put_page() has been done when try_to_split_thp_page() fails?

> +

action_result is missing?

> +	return res;
> +}
> +
>  /**
>   * memory_failure - Handle memory failure of a page.
>   * @pfn: Page Number of the corrupted page
> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>  		 */
>  		SetPageHasHWPoisoned(hpage);
>  		if (try_to_split_thp_page(p) < 0) {

Should hwpoison_filter() be called in this case?

> +			if (flags & MF_ACTION_REQUIRED) {
> +				pr_err("%#lx: thp split failed\n", pfn);
> +				res = kill_procs_now(p, pfn, flags, hpage);

Can we use hwpoison_user_mappings() directly here?

Thanks.
.

> +				goto unlock_mutex;
> +			}
>  			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>  			goto unlock_mutex;
>  		}
>
Jane Chu May 6, 2024, 8:26 p.m. UTC | #2
On 5/5/2024 12:00 AM, Miaohe Lin wrote:

> On 2024/5/2 7:24, Jane Chu wrote:
>> When handle hwpoison in a GUP longterm pin'ed thp page,
>> try_to_split_thp_page() will fail. And at this point, there is little else
>> the kernel could do except sending a SIGBUS to the user process, thus
>> give it a chance to recover.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> Thanks for your patch. Some comments below.
>
>> ---
>>   mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>   1 file changed, 36 insertions(+)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 7fcf182abb96..67f4d24a98e7 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>   	return rc;
>>   }
>>   
>> +/*
>> + * The calling condition is as such: thp split failed, page might have
>> + * been GUP longterm pinned, not much can be done for recovery.
>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>> + * application has a chance to recover. Also, application processes'
>> + * election for MCE early killed will be honored.
>> + */
>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>> +			struct page *hpage)
>> +{
>> +	struct folio *folio = page_folio(hpage);
>> +	LIST_HEAD(tokill);
>> +	int res = -EHWPOISON;
>> +
>> +	/* deal with user pages only */
>> +	if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>> +		res = -EBUSY;
>> +	if (!(PageLRU(hpage) || PageHuge(p)))
>> +		res = -EBUSY;
> Above checks seems unneeded. We already know it's thp?

Agreed.

I  lifted these checks from hwpoison_user_mapping() with a hope to make 
kill_procs_now() more generic,

such as, potentially replacing kill_accessing_processes() for 
re-accessing hwpoisoned page.

But I backed out at last, due to concerns that my tests might not have 
covered sufficient number of scenarios.

>
>> +
>> +	if (res == -EHWPOISON) {
>> +		collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>> +		kill_procs(&tokill, true, pfn, flags);
>> +	}
>> +
>> +	if (flags & MF_COUNT_INCREASED)
>> +		put_page(p);
> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?

put_page() has not been done if try_to_split_thp_page() fails, and I 
think it should.

I will revise the code so that put_page() is called regardless 
MF_ACTION_REQUIRED is set or not.

>
>> +
> action_result is missing?

Indeed,  action_result() isn't always called, referring to the 
re-accessing hwpoison scenarios.

In this case, I think the reason  is that, we just killed the process 
and there is nothing

else to do or to report.

>
>> +	return res;
>> +}
>> +
>>   /**
>>    * memory_failure - Handle memory failure of a page.
>>    * @pfn: Page Number of the corrupted page
>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>>   		 */
>>   		SetPageHasHWPoisoned(hpage);
>>   		if (try_to_split_thp_page(p) < 0) {
> Should hwpoison_filter() be called in this case?
Yes, it should. I will add the hwpoison_filter check.
>
>> +			if (flags & MF_ACTION_REQUIRED) {
>> +				pr_err("%#lx: thp split failed\n", pfn);
>> +				res = kill_procs_now(p, pfn, flags, hpage);
> Can we use hwpoison_user_mappings() directly here?

I thought about using hwpoison_user_mappings() with an extra flag, but 
gave up in the end.

Because most of the code there are not needed, such as the checks you 
mentioned above,

and umapping etc.

thanks!

-jane

>
> Thanks.
> .
>
>> +				goto unlock_mutex;
>> +			}
>>   			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>>   			goto unlock_mutex;
>>   		}
>>
Miaohe Lin May 8, 2024, 8:08 a.m. UTC | #3
On 2024/5/7 4:26, Jane Chu wrote:
> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
> 
>> On 2024/5/2 7:24, Jane Chu wrote:
>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>> the kernel could do except sending a SIGBUS to the user process, thus
>>> give it a chance to recover.
>>>
>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> Thanks for your patch. Some comments below.
>>
>>> ---
>>>   mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 36 insertions(+)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 7fcf182abb96..67f4d24a98e7 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>       return rc;
>>>   }
>>>   +/*
>>> + * The calling condition is as such: thp split failed, page might have
>>> + * been GUP longterm pinned, not much can be done for recovery.
>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>> + * application has a chance to recover. Also, application processes'
>>> + * election for MCE early killed will be honored.
>>> + */
>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>> +            struct page *hpage)
>>> +{
>>> +    struct folio *folio = page_folio(hpage);
>>> +    LIST_HEAD(tokill);
>>> +    int res = -EHWPOISON;
>>> +
>>> +    /* deal with user pages only */
>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>> +        res = -EBUSY;
>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>> +        res = -EBUSY;
>> Above checks seems unneeded. We already know it's thp?
> 
> Agreed.
> 
> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
> 
> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
> 
> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
> 
>>
>>> +
>>> +    if (res == -EHWPOISON) {
>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>> +        kill_procs(&tokill, true, pfn, flags);
>>> +    }
>>> +
>>> +    if (flags & MF_COUNT_INCREASED)
>>> +        put_page(p);
>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
> 
> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.

In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:

static int try_to_split_thp_page(struct page *page)
{
	int ret;

	lock_page(page);
	ret = split_huge_page(page);
	unlock_page(page);

	if (unlikely(ret))
		put_page(page);
	^^^^^^^^^^^^^^^^^^^^^^^
	return ret;
}

Or am I miss something?

> 
> I will revise the code so that put_page() is called regardless MF_ACTION_REQUIRED is set or not.
> 
>>
>>> +
>> action_result is missing?
> 
> Indeed,  action_result() isn't always called, referring to the re-accessing hwpoison scenarios.
> 
> In this case, I think the reason  is that, we just killed the process and there is nothing
> 
> else to do or to report.
> 
>>
>>> +    return res;
>>> +}
>>> +
>>>   /**
>>>    * memory_failure - Handle memory failure of a page.
>>>    * @pfn: Page Number of the corrupted page
>>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>>>            */
>>>           SetPageHasHWPoisoned(hpage);
>>>           if (try_to_split_thp_page(p) < 0) {
>> Should hwpoison_filter() be called in this case?
> Yes, it should. I will add the hwpoison_filter check.
>>
>>> +            if (flags & MF_ACTION_REQUIRED) {

Only in MF_ACTION_REQUIRED case, SIGBUS is sent to processes when thp split failed. Any reson under it?

Thanks.
.
Miaohe Lin May 8, 2024, 9:03 a.m. UTC | #4
On 2024/5/2 7:24, Jane Chu wrote:
> When handle hwpoison in a GUP longterm pin'ed thp page,
> try_to_split_thp_page() will fail. And at this point, there is little else
> the kernel could do except sending a SIGBUS to the user process, thus
> give it a chance to recover.

It seems the user process will still receive SIGBUS via kill_accessing_process()
when (re-)access thp later. So they should have a chance to recover already.
Or am I miss something?

Thanks.
.
Jane Chu May 8, 2024, 4:56 p.m. UTC | #5
On 5/8/2024 2:03 AM, Miaohe Lin wrote:

> On 2024/5/2 7:24, Jane Chu wrote:
>> When handle hwpoison in a GUP longterm pin'ed thp page,
>> try_to_split_thp_page() will fail. And at this point, there is little else
>> the kernel could do except sending a SIGBUS to the user process, thus
>> give it a chance to recover.
> It seems the user process will still receive SIGBUS via kill_accessing_process()
> when (re-)access thp later. So they should have a chance to recover already.
> Or am I miss something?

The concern is about real UE consumption in which case, it's desirable 
to kill the process ASAP without having to relying on subsequent 
access.  Also to honor processes' MCE-early-kill request. 
kill_accessing_process() is very conservative in that, it doesn't check 
other processes that have the poisoned page mapped.

thanks,

-jane

>
> Thanks.
> .
>
>
Jane Chu May 8, 2024, 5:45 p.m. UTC | #6
On 5/8/2024 1:08 AM, Miaohe Lin wrote:

> On 2024/5/7 4:26, Jane Chu wrote:
>> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
>>
>>> On 2024/5/2 7:24, Jane Chu wrote:
>>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>>> the kernel could do except sending a SIGBUS to the user process, thus
>>>> give it a chance to recover.
>>>>
>>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>>> Thanks for your patch. Some comments below.
>>>
>>>> ---
>>>>    mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 36 insertions(+)
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index 7fcf182abb96..67f4d24a98e7 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>>        return rc;
>>>>    }
>>>>    +/*
>>>> + * The calling condition is as such: thp split failed, page might have
>>>> + * been GUP longterm pinned, not much can be done for recovery.
>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>>> + * application has a chance to recover. Also, application processes'
>>>> + * election for MCE early killed will be honored.
>>>> + */
>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>>> +            struct page *hpage)
>>>> +{
>>>> +    struct folio *folio = page_folio(hpage);
>>>> +    LIST_HEAD(tokill);
>>>> +    int res = -EHWPOISON;
>>>> +
>>>> +    /* deal with user pages only */
>>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>>> +        res = -EBUSY;
>>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>>> +        res = -EBUSY;
>>> Above checks seems unneeded. We already know it's thp?
>> Agreed.
>>
>> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
>>
>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
>>
>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
>>
>>>> +
>>>> +    if (res == -EHWPOISON) {
>>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>>> +        kill_procs(&tokill, true, pfn, flags);
>>>> +    }
>>>> +
>>>> +    if (flags & MF_COUNT_INCREASED)
>>>> +        put_page(p);
>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.
> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:
>
> static int try_to_split_thp_page(struct page *page)
> {
> 	int ret;
>
> 	lock_page(page);
> 	ret = split_huge_page(page);
> 	unlock_page(page);
>
> 	if (unlikely(ret))
> 		put_page(page);
> 	^^^^^^^^^^^^^^^^^^^^^^^
> 	return ret;
> }
>
> Or am I miss something?

I think you caught a bug in my code, thanks!

How about moving put_page() outside try_to_split_thp_page() ?

>
>> I will revise the code so that put_page() is called regardless MF_ACTION_REQUIRED is set or not.
>>
>>>> +
>>> action_result is missing?
>> Indeed,  action_result() isn't always called, referring to the re-accessing hwpoison scenarios.
>>
>> In this case, I think the reason  is that, we just killed the process and there is nothing
>>
>> else to do or to report.
>>
>>>> +    return res;
>>>> +}
>>>> +
>>>>    /**
>>>>     * memory_failure - Handle memory failure of a page.
>>>>     * @pfn: Page Number of the corrupted page
>>>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>>>>             */
>>>>            SetPageHasHWPoisoned(hpage);
>>>>            if (try_to_split_thp_page(p) < 0) {
>>> Should hwpoison_filter() be called in this case?
>> Yes, it should. I will add the hwpoison_filter check.
>>>> +            if (flags & MF_ACTION_REQUIRED) {
> Only in MF_ACTION_REQUIRED case, SIGBUS is sent to processes when thp split failed. Any reson under it?

I took a clue from kill_accessing_process() which is invoked only if 
MF_ACTION_REQUIRED is set.

The usual code path for delivery signal is

if page-is-dirty or MF_MUST_KILL-is-set or umap-failed, then

- send SIGKILL if vaddr is -EFAULT

- send SIGBUS with BUS_MCEERR_AR if MF_ACTION_REQUIRED is set

- send SIGBUS with BUS_MCEERR_AO if MF_ACTION_REQUIRED is not set and 
process elected for MCE-early-kill

So, if kill_procs_now() is invoked only if MF_ACTION_REQUIRED (as it is 
in the patch), one can argue that

the MCE-early-kill request is not honored which deviates from the 
existing behavior.

Perhaps I should remove the

+ if (flags & MF_ACTION_REQUIRED) {

check.

thanks!

-jane



>
> Thanks.
> .
Miaohe Lin May 9, 2024, 8:30 a.m. UTC | #7
On 2024/5/9 1:45, Jane Chu wrote:
> On 5/8/2024 1:08 AM, Miaohe Lin wrote:
> 
>> On 2024/5/7 4:26, Jane Chu wrote:
>>> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
>>>
>>>> On 2024/5/2 7:24, Jane Chu wrote:
>>>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>>>> the kernel could do except sending a SIGBUS to the user process, thus
>>>>> give it a chance to recover.
>>>>>
>>>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>>>> Thanks for your patch. Some comments below.
>>>>
>>>>> ---
>>>>>    mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>>    1 file changed, 36 insertions(+)
>>>>>
>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>> index 7fcf182abb96..67f4d24a98e7 100644
>>>>> --- a/mm/memory-failure.c
>>>>> +++ b/mm/memory-failure.c
>>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>>>        return rc;
>>>>>    }
>>>>>    +/*
>>>>> + * The calling condition is as such: thp split failed, page might have
>>>>> + * been GUP longterm pinned, not much can be done for recovery.
>>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>>>> + * application has a chance to recover. Also, application processes'
>>>>> + * election for MCE early killed will be honored.
>>>>> + */
>>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>>>> +            struct page *hpage)
>>>>> +{
>>>>> +    struct folio *folio = page_folio(hpage);
>>>>> +    LIST_HEAD(tokill);
>>>>> +    int res = -EHWPOISON;
>>>>> +
>>>>> +    /* deal with user pages only */
>>>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>>>> +        res = -EBUSY;
>>>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>>>> +        res = -EBUSY;
>>>> Above checks seems unneeded. We already know it's thp?
>>> Agreed.
>>>
>>> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
>>>
>>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
>>>
>>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
>>>
>>>>> +
>>>>> +    if (res == -EHWPOISON) {
>>>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>>>> +        kill_procs(&tokill, true, pfn, flags);
>>>>> +    }
>>>>> +
>>>>> +    if (flags & MF_COUNT_INCREASED)
>>>>> +        put_page(p);
>>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
>>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.
>> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:
>>
>> static int try_to_split_thp_page(struct page *page)
>> {
>>     int ret;
>>
>>     lock_page(page);
>>     ret = split_huge_page(page);
>>     unlock_page(page);
>>
>>     if (unlikely(ret))
>>         put_page(page);
>>     ^^^^^^^^^^^^^^^^^^^^^^^
>>     return ret;
>> }
>>
>> Or am I miss something?
> 
> I think you caught a bug in my code, thanks!
> 
> How about moving put_page() outside try_to_split_thp_page() ?

If you want to send SIGBUS in the event of thp split fail, it might be required to do so.
I think kill_procs_now() needs extra thp refcnt to do its work.

> 
>>
>>> I will revise the code so that put_page() is called regardless MF_ACTION_REQUIRED is set or not.
>>>
>>>>> +
>>>> action_result is missing?
>>> Indeed,  action_result() isn't always called, referring to the re-accessing hwpoison scenarios.
>>>
>>> In this case, I think the reason  is that, we just killed the process and there is nothing
>>>
>>> else to do or to report.
>>>
>>>>> +    return res;
>>>>> +}
>>>>> +
>>>>>    /**
>>>>>     * memory_failure - Handle memory failure of a page.
>>>>>     * @pfn: Page Number of the corrupted page
>>>>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>>>>>             */
>>>>>            SetPageHasHWPoisoned(hpage);
>>>>>            if (try_to_split_thp_page(p) < 0) {
>>>> Should hwpoison_filter() be called in this case?
>>> Yes, it should. I will add the hwpoison_filter check.
>>>>> +            if (flags & MF_ACTION_REQUIRED) {
>> Only in MF_ACTION_REQUIRED case, SIGBUS is sent to processes when thp split failed. Any reson under it?
> 
> I took a clue from kill_accessing_process() which is invoked only if MF_ACTION_REQUIRED is set.
> 
> The usual code path for delivery signal is
> 
> if page-is-dirty or MF_MUST_KILL-is-set or umap-failed, then
> 
> - send SIGKILL if vaddr is -EFAULT
> 
> - send SIGBUS with BUS_MCEERR_AR if MF_ACTION_REQUIRED is set
> 
> - send SIGBUS with BUS_MCEERR_AO if MF_ACTION_REQUIRED is not set and process elected for MCE-early-kill
> 
> So, if kill_procs_now() is invoked only if MF_ACTION_REQUIRED (as it is in the patch), one can argue that
> 
> the MCE-early-kill request is not honored which deviates from the existing behavior.
> 
> Perhaps I should remove the
> 
> + if (flags & MF_ACTION_REQUIRED) {

I tend to agree MCE-early-kill request should be honored when try to kill process.
Thanks.
.
Miaohe Lin May 9, 2024, 8:52 a.m. UTC | #8
On 2024/5/9 0:56, Jane Chu wrote:
> On 5/8/2024 2:03 AM, Miaohe Lin wrote:
> 
>> On 2024/5/2 7:24, Jane Chu wrote:
>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>> the kernel could do except sending a SIGBUS to the user process, thus
>>> give it a chance to recover.
>> It seems the user process will still receive SIGBUS via kill_accessing_process()
>> when (re-)access thp later. So they should have a chance to recover already.
>> Or am I miss something?
> 
> The concern is about real UE consumption in which case, it's desirable to kill the process ASAP without having to relying on subsequent access.  Also to honor processes' MCE-early-kill request. kill_accessing_process() is very conservative in that, it doesn't check other processes that have the poisoned page mapped.

I see. Thanks for your explanation.
Thanks.
.

> 
> thanks,
> 
> -jane
> 
>>
>> Thanks.
>> .
>>
>>
> .
Jane Chu May 9, 2024, 3:34 p.m. UTC | #9
On 5/9/2024 1:30 AM, Miaohe Lin wrote:
> On 2024/5/9 1:45, Jane Chu wrote:
>> On 5/8/2024 1:08 AM, Miaohe Lin wrote:
>>
>>> On 2024/5/7 4:26, Jane Chu wrote:
>>>> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
>>>>
>>>>> On 2024/5/2 7:24, Jane Chu wrote:
>>>>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>>>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>>>>> the kernel could do except sending a SIGBUS to the user process, thus
>>>>>> give it a chance to recover.
>>>>>>
>>>>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>>>>> Thanks for your patch. Some comments below.
>>>>>
>>>>>> ---
>>>>>>     mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>>>     1 file changed, 36 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>> index 7fcf182abb96..67f4d24a98e7 100644
>>>>>> --- a/mm/memory-failure.c
>>>>>> +++ b/mm/memory-failure.c
>>>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>>>>         return rc;
>>>>>>     }
>>>>>>     +/*
>>>>>> + * The calling condition is as such: thp split failed, page might have
>>>>>> + * been GUP longterm pinned, not much can be done for recovery.
>>>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>>>>> + * application has a chance to recover. Also, application processes'
>>>>>> + * election for MCE early killed will be honored.
>>>>>> + */
>>>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>>>>> +            struct page *hpage)
>>>>>> +{
>>>>>> +    struct folio *folio = page_folio(hpage);
>>>>>> +    LIST_HEAD(tokill);
>>>>>> +    int res = -EHWPOISON;
>>>>>> +
>>>>>> +    /* deal with user pages only */
>>>>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>>>>> +        res = -EBUSY;
>>>>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>>>>> +        res = -EBUSY;
>>>>> Above checks seems unneeded. We already know it's thp?
>>>> Agreed.
>>>>
>>>> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
>>>>
>>>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
>>>>
>>>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
>>>>
>>>>>> +
>>>>>> +    if (res == -EHWPOISON) {
>>>>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>>>>> +        kill_procs(&tokill, true, pfn, flags);
>>>>>> +    }
>>>>>> +
>>>>>> +    if (flags & MF_COUNT_INCREASED)
>>>>>> +        put_page(p);
>>>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
>>>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.
>>> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:
>>>
>>> static int try_to_split_thp_page(struct page *page)
>>> {
>>>      int ret;
>>>
>>>      lock_page(page);
>>>      ret = split_huge_page(page);
>>>      unlock_page(page);
>>>
>>>      if (unlikely(ret))
>>>          put_page(page);
>>>      ^^^^^^^^^^^^^^^^^^^^^^^
>>>      return ret;
>>> }
>>>
>>> Or am I miss something?
>> I think you caught a bug in my code, thanks!
>>
>> How about moving put_page() outside try_to_split_thp_page() ?
> If you want to send SIGBUS in the event of thp split fail, it might be required to do so.
> I think kill_procs_now() needs extra thp refcnt to do its work.

Agreed.  I added an boolean to try_to_split_thp_page(),the boolean 
indicates whether to put_page().

In case of kill_procs_now(), put_page() is called afterwards.

>
>>>> I will revise the code so that put_page() is called regardless MF_ACTION_REQUIRED is set or not.
>>>>
>>>>>> +
>>>>> action_result is missing?
>>>> Indeed,  action_result() isn't always called, referring to the re-accessing hwpoison scenarios.
>>>>
>>>> In this case, I think the reason  is that, we just killed the process and there is nothing
>>>>
>>>> else to do or to report.
>>>>
>>>>>> +    return res;
>>>>>> +}
>>>>>> +
>>>>>>     /**
>>>>>>      * memory_failure - Handle memory failure of a page.
>>>>>>      * @pfn: Page Number of the corrupted page
>>>>>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>>>>>>              */
>>>>>>             SetPageHasHWPoisoned(hpage);
>>>>>>             if (try_to_split_thp_page(p) < 0) {
>>>>> Should hwpoison_filter() be called in this case?
>>>> Yes, it should. I will add the hwpoison_filter check.
>>>>>> +            if (flags & MF_ACTION_REQUIRED) {
>>> Only in MF_ACTION_REQUIRED case, SIGBUS is sent to processes when thp split failed. Any reson under it?
>> I took a clue from kill_accessing_process() which is invoked only if MF_ACTION_REQUIRED is set.
>>
>> The usual code path for delivery signal is
>>
>> if page-is-dirty or MF_MUST_KILL-is-set or umap-failed, then
>>
>> - send SIGKILL if vaddr is -EFAULT
>>
>> - send SIGBUS with BUS_MCEERR_AR if MF_ACTION_REQUIRED is set
>>
>> - send SIGBUS with BUS_MCEERR_AO if MF_ACTION_REQUIRED is not set and process elected for MCE-early-kill
>>
>> So, if kill_procs_now() is invoked only if MF_ACTION_REQUIRED (as it is in the patch), one can argue that
>>
>> the MCE-early-kill request is not honored which deviates from the existing behavior.
>>
>> Perhaps I should remove the
>>
>> + if (flags & MF_ACTION_REQUIRED) {
> I tend to agree MCE-early-kill request should be honored when try to kill process.
> Thanks.
> .

Thanks,

-jane

>
Miaohe Lin May 10, 2024, 2:59 a.m. UTC | #10
On 2024/5/9 23:34, Jane Chu wrote:
> 
> On 5/9/2024 1:30 AM, Miaohe Lin wrote:
>> On 2024/5/9 1:45, Jane Chu wrote:
>>> On 5/8/2024 1:08 AM, Miaohe Lin wrote:
>>>
>>>> On 2024/5/7 4:26, Jane Chu wrote:
>>>>> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
>>>>>
>>>>>> On 2024/5/2 7:24, Jane Chu wrote:
>>>>>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>>>>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>>>>>> the kernel could do except sending a SIGBUS to the user process, thus
>>>>>>> give it a chance to recover.
>>>>>>>
>>>>>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>>>>>> Thanks for your patch. Some comments below.
>>>>>>
>>>>>>> ---
>>>>>>>     mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>>>>     1 file changed, 36 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>>> index 7fcf182abb96..67f4d24a98e7 100644
>>>>>>> --- a/mm/memory-failure.c
>>>>>>> +++ b/mm/memory-failure.c
>>>>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>>>>>         return rc;
>>>>>>>     }
>>>>>>>     +/*
>>>>>>> + * The calling condition is as such: thp split failed, page might have
>>>>>>> + * been GUP longterm pinned, not much can be done for recovery.
>>>>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>>>>>> + * application has a chance to recover. Also, application processes'
>>>>>>> + * election for MCE early killed will be honored.
>>>>>>> + */
>>>>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>>>>>> +            struct page *hpage)
>>>>>>> +{
>>>>>>> +    struct folio *folio = page_folio(hpage);
>>>>>>> +    LIST_HEAD(tokill);
>>>>>>> +    int res = -EHWPOISON;
>>>>>>> +
>>>>>>> +    /* deal with user pages only */
>>>>>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>>>>>> +        res = -EBUSY;
>>>>>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>>>>>> +        res = -EBUSY;
>>>>>> Above checks seems unneeded. We already know it's thp?
>>>>> Agreed.
>>>>>
>>>>> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
>>>>>
>>>>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
>>>>>
>>>>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
>>>>>
>>>>>>> +
>>>>>>> +    if (res == -EHWPOISON) {
>>>>>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>>>>>> +        kill_procs(&tokill, true, pfn, flags);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (flags & MF_COUNT_INCREASED)
>>>>>>> +        put_page(p);
>>>>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
>>>>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.
>>>> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:
>>>>
>>>> static int try_to_split_thp_page(struct page *page)
>>>> {
>>>>      int ret;
>>>>
>>>>      lock_page(page);
>>>>      ret = split_huge_page(page);
>>>>      unlock_page(page);
>>>>
>>>>      if (unlikely(ret))
>>>>          put_page(page);
>>>>      ^^^^^^^^^^^^^^^^^^^^^^^
>>>>      return ret;
>>>> }
>>>>
>>>> Or am I miss something?
>>> I think you caught a bug in my code, thanks!
>>>
>>> How about moving put_page() outside try_to_split_thp_page() ?
>> If you want to send SIGBUS in the event of thp split fail, it might be required to do so.
>> I think kill_procs_now() needs extra thp refcnt to do its work.
> 
> Agreed.  I added an boolean to try_to_split_thp_page(),the boolean indicates whether to put_page().

IMHO, it might be too complicated to add an extra boolean to indicate whether to put_page(). It might be
more straightforward to always put_page outside try_to_split_thp_page?
Thanks.
.
Jane Chu May 10, 2024, 3:18 a.m. UTC | #11
On 5/9/2024 7:59 PM, Miaohe Lin wrote:
> On 2024/5/9 23:34, Jane Chu wrote:
>> On 5/9/2024 1:30 AM, Miaohe Lin wrote:
>>> On 2024/5/9 1:45, Jane Chu wrote:
>>>> On 5/8/2024 1:08 AM, Miaohe Lin wrote:
>>>>
>>>>> On 2024/5/7 4:26, Jane Chu wrote:
>>>>>> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
>>>>>>
>>>>>>> On 2024/5/2 7:24, Jane Chu wrote:
>>>>>>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>>>>>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>>>>>>> the kernel could do except sending a SIGBUS to the user process, thus
>>>>>>>> give it a chance to recover.
>>>>>>>>
>>>>>>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>>>>>>> Thanks for your patch. Some comments below.
>>>>>>>
>>>>>>>> ---
>>>>>>>>      mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>>>>>      1 file changed, 36 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>>>> index 7fcf182abb96..67f4d24a98e7 100644
>>>>>>>> --- a/mm/memory-failure.c
>>>>>>>> +++ b/mm/memory-failure.c
>>>>>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>>>>>>          return rc;
>>>>>>>>      }
>>>>>>>>      +/*
>>>>>>>> + * The calling condition is as such: thp split failed, page might have
>>>>>>>> + * been GUP longterm pinned, not much can be done for recovery.
>>>>>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>>>>>>> + * application has a chance to recover. Also, application processes'
>>>>>>>> + * election for MCE early killed will be honored.
>>>>>>>> + */
>>>>>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>>>>>>> +            struct page *hpage)
>>>>>>>> +{
>>>>>>>> +    struct folio *folio = page_folio(hpage);
>>>>>>>> +    LIST_HEAD(tokill);
>>>>>>>> +    int res = -EHWPOISON;
>>>>>>>> +
>>>>>>>> +    /* deal with user pages only */
>>>>>>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>>>>>>> +        res = -EBUSY;
>>>>>>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>>>>>>> +        res = -EBUSY;
>>>>>>> Above checks seems unneeded. We already know it's thp?
>>>>>> Agreed.
>>>>>>
>>>>>> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
>>>>>>
>>>>>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
>>>>>>
>>>>>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
>>>>>>
>>>>>>>> +
>>>>>>>> +    if (res == -EHWPOISON) {
>>>>>>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>>>>>>> +        kill_procs(&tokill, true, pfn, flags);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (flags & MF_COUNT_INCREASED)
>>>>>>>> +        put_page(p);
>>>>>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
>>>>>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.
>>>>> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:
>>>>>
>>>>> static int try_to_split_thp_page(struct page *page)
>>>>> {
>>>>>       int ret;
>>>>>
>>>>>       lock_page(page);
>>>>>       ret = split_huge_page(page);
>>>>>       unlock_page(page);
>>>>>
>>>>>       if (unlikely(ret))
>>>>>           put_page(page);
>>>>>       ^^^^^^^^^^^^^^^^^^^^^^^
>>>>>       return ret;
>>>>> }
>>>>>
>>>>> Or am I miss something?
>>>> I think you caught a bug in my code, thanks!
>>>>
>>>> How about moving put_page() outside try_to_split_thp_page() ?
>>> If you want to send SIGBUS in the event of thp split fail, it might be required to do so.
>>> I think kill_procs_now() needs extra thp refcnt to do its work.
>> Agreed.  I added an boolean to try_to_split_thp_page(),the boolean indicates whether to put_page().
> IMHO, it might be too complicated to add an extra boolean to indicate whether to put_page(). It might be
> more straightforward to always put_page outside try_to_split_thp_page?

Looks okay to me, let's see.  Will send out v2 in a while.

thanks,

-jane

> Thanks.
> .
>
diff mbox series

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7fcf182abb96..67f4d24a98e7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2168,6 +2168,37 @@  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	return rc;
 }
 
+/*
+ * The calling condition is as such: thp split failed, page might have
+ * been GUP longterm pinned, not much can be done for recovery.
+ * But a SIGBUS should be delivered with vaddr provided so that the user
+ * application has a chance to recover. Also, application processes'
+ * election for MCE early killed will be honored.
+ */
+static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
+			struct page *hpage)
+{
+	struct folio *folio = page_folio(hpage);
+	LIST_HEAD(tokill);
+	int res = -EHWPOISON;
+
+	/* deal with user pages only */
+	if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
+		res = -EBUSY;
+	if (!(PageLRU(hpage) || PageHuge(p)))
+		res = -EBUSY;
+
+	if (res == -EHWPOISON) {
+		collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
+		kill_procs(&tokill, true, pfn, flags);
+	}
+
+	if (flags & MF_COUNT_INCREASED)
+		put_page(p);
+
+	return res;
+}
+
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -2297,6 +2328,11 @@  int memory_failure(unsigned long pfn, int flags)
 		 */
 		SetPageHasHWPoisoned(hpage);
 		if (try_to_split_thp_page(p) < 0) {
+			if (flags & MF_ACTION_REQUIRED) {
+				pr_err("%#lx: thp split failed\n", pfn);
+				res = kill_procs_now(p, pfn, flags, hpage);
+				goto unlock_mutex;
+			}
 			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
 			goto unlock_mutex;
 		}