x86/mce: Check that memory address is usable for recovery

Message ID	20230322005131.174499-1-tony.luck@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-edac-owner@vger.kernel.org> From: Tony Luck <tony.luck@intel.com> To: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, patches@lists.linux.dev, Tony Luck <tony.luck@intel.com> Subject: [PATCH] x86/mce: Check that memory address is usable for recovery Date: Tue, 21 Mar 2023 17:51:31 -0700 Message-Id: <20230322005131.174499-1-tony.luck@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	x86/mce: Check that memory address is usable for recovery \| expand x86/mce: Check that memory address is usable for recovery

Message ID

20230322005131.174499-1-tony.luck@intel.com (mailing list archive)

State

New, archived

Headers

From: Tony Luck <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
        linux-edac@vger.kernel.org, patches@lists.linux.dev,
        Tony Luck <tony.luck@intel.com>
Subject: [PATCH] x86/mce: Check that memory address is usable for recovery
Date: Tue, 21 Mar 2023 17:51:31 -0700
Message-Id: <20230322005131.174499-1-tony.luck@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

x86/mce: Check that memory address is usable for recovery | expand

Commit Message

Tony Luck March 22, 2023, 12:51 a.m. UTC

uc_decode_notifier() includes a check that "struct mce"
contains a valid address for recovery. But the machine
check recovery code does not include a similar check.

Use mce_usable_address() to check that there is a valid
address.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Yazen Ghannam April 18, 2023, 4:41 p.m. UTC | #1

On 3/21/23 20:51, Tony Luck wrote:
> uc_decode_notifier() includes a check that "struct mce"
> contains a valid address for recovery. But the machine
> check recovery code does not include a similar check.
> 
> Use mce_usable_address() to check that there is a valid
> address.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 2eec60f50057..fa28b3f7d945 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>  		/* If this triggers there is no way to recover. Die hard. */
>  		BUG_ON(!on_thread_stack() || !user_mode(regs));
>  
> -		if (kill_current_task)
> +		if (kill_current_task || !mce_usable_address(&m))
>  			queue_task_work(&m, msg, kill_me_now);
>  		else
>  			queue_task_work(&m, msg, kill_me_maybe);

I think it should be like this:

	if (mce_usable_address(&m))
		queue_task_work(&m, msg, kill_me_maybe);
	else
		queue_task_work(&m, msg, kill_me_now);

A usable address should always go through memory_failure() so that the page is
marked as poison. If !RIPV, then memory_failure() will get the MF_MUST_KILL
flag and try to kill all processes after the page is poisoned.

I had a similar patch a while back:
https://lore.kernel.org/linux-edac/20210504174712.27675-3-Yazen.Ghannam@amd.com/

We could also get rid of kill_me_now() like you had suggested.

Thanks,
Yazen

Tony Luck April 18, 2023, 5:27 p.m. UTC | #2

On Tue, Apr 18, 2023 at 12:41:17PM -0400, Yazen Ghannam wrote:
> On 3/21/23 20:51, Tony Luck wrote:
> > uc_decode_notifier() includes a check that "struct mce"
> > contains a valid address for recovery. But the machine
> > check recovery code does not include a similar check.
> > 
> > Use mce_usable_address() to check that there is a valid
> > address.
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/kernel/cpu/mce/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index 2eec60f50057..fa28b3f7d945 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
> >  		/* If this triggers there is no way to recover. Die hard. */
> >  		BUG_ON(!on_thread_stack() || !user_mode(regs));
> >  
> > -		if (kill_current_task)
> > +		if (kill_current_task || !mce_usable_address(&m))
> >  			queue_task_work(&m, msg, kill_me_now);
> >  		else
> >  			queue_task_work(&m, msg, kill_me_maybe);
> 
> I think it should be like this:
> 
> 	if (mce_usable_address(&m))
> 		queue_task_work(&m, msg, kill_me_maybe);
> 	else
> 		queue_task_work(&m, msg, kill_me_now);
> 
> A usable address should always go through memory_failure() so that the page is
> marked as poison. If !RIPV, then memory_failure() will get the MF_MUST_KILL
> flag and try to kill all processes after the page is poisoned.
> 
> I had a similar patch a while back:
> https://lore.kernel.org/linux-edac/20210504174712.27675-3-Yazen.Ghannam@amd.com/
> 
> We could also get rid of kill_me_now() like you had suggested.

Can we also get rid of "kill_current_task"? It is only set when there is
no valid return address:

        if (!(m.mcgstatus & MCG_STATUS_RIPV))
                kill_current_task = 1;

kill_me_maybe() does an equivalent check:

	if (!p->mce_ripv)
                flags |= MF_MUST_KILL;

Which only leaves this check to resolve in some way:

        if (worst != MCE_AR_SEVERITY && !kill_current_task)
                goto out;

-Tony

Yazen Ghannam April 18, 2023, 5:51 p.m. UTC | #3

On 4/18/23 13:27, Tony Luck wrote:
> On Tue, Apr 18, 2023 at 12:41:17PM -0400, Yazen Ghannam wrote:
>> On 3/21/23 20:51, Tony Luck wrote:
>>> uc_decode_notifier() includes a check that "struct mce"
>>> contains a valid address for recovery. But the machine
>>> check recovery code does not include a similar check.
>>>
>>> Use mce_usable_address() to check that there is a valid
>>> address.
>>>
>>> Signed-off-by: Tony Luck <tony.luck@intel.com>
>>> ---
>>>  arch/x86/kernel/cpu/mce/core.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>>> index 2eec60f50057..fa28b3f7d945 100644
>>> --- a/arch/x86/kernel/cpu/mce/core.c
>>> +++ b/arch/x86/kernel/cpu/mce/core.c
>>> @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>>>  		/* If this triggers there is no way to recover. Die hard. */
>>>  		BUG_ON(!on_thread_stack() || !user_mode(regs));
>>>  
>>> -		if (kill_current_task)
>>> +		if (kill_current_task || !mce_usable_address(&m))
>>>  			queue_task_work(&m, msg, kill_me_now);
>>>  		else
>>>  			queue_task_work(&m, msg, kill_me_maybe);
>>
>> I think it should be like this:
>>
>> 	if (mce_usable_address(&m))
>> 		queue_task_work(&m, msg, kill_me_maybe);
>> 	else
>> 		queue_task_work(&m, msg, kill_me_now);
>>
>> A usable address should always go through memory_failure() so that the page is
>> marked as poison. If !RIPV, then memory_failure() will get the MF_MUST_KILL
>> flag and try to kill all processes after the page is poisoned.
>>
>> I had a similar patch a while back:
>> https://lore.kernel.org/linux-edac/20210504174712.27675-3-Yazen.Ghannam@amd.com/
>>
>> We could also get rid of kill_me_now() like you had suggested.
> 
> Can we also get rid of "kill_current_task"? It is only set when there is
> no valid return address:
> 
>         if (!(m.mcgstatus & MCG_STATUS_RIPV))
>                 kill_current_task = 1;
> 
> kill_me_maybe() does an equivalent check:
> 
> 	if (!p->mce_ripv)
>                 flags |= MF_MUST_KILL;
> 
> Which only leaves this check to resolve in some way:
> 
>         if (worst != MCE_AR_SEVERITY && !kill_current_task)
>                 goto out;
> 

I agree. And I think all these checks should be baked into the severity.
We'll need additional, fine-grained severity levels though.

The "m.cs" and "m.kflags" checks could also be baked in.

Instead of just one AR severity:
	...
	MCE_AR_SEVERITY,
	MCE_PANIC_SEVERITY,

replace it with specific cases:
	...
	MCE_AR_USER_RECOV,
	MCE_AR_USER_KILL,
	MCE_AR_KERNEL_COPYIN,
	MCE_AR_KERNEL_RECOV,
	MCE_PANIC_SEVERITY,

Then the #MC handler can look like this:

	if (worst < MCE_AR_USER_RECOV)
		goto out;

	if (severity == MCE_AR_USER_RECOV)
		queue_task_work(&m, msg, kill_me_maybe);

	if (severity == MCE_AR_USER_KILL)
		force_sig(SIGBUS);

	if (severity == MCE_AR_KERNEL_COPYIN)
		queue_task_work(&m, msg, kill_me_never);
	
	if (severity == MCE_AR_KERNEL_RECOV) {
		if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
			mce_panic("Failed kernel mode recovery");
	}

I can take a shot at this if it seems reasonable.

What do you think?

Thanks,
Yazen

Tony Luck April 18, 2023, 6:23 p.m. UTC | #4

On Tue, Apr 18, 2023 at 01:51:37PM -0400, Yazen Ghannam wrote:
> I agree. And I think all these checks should be baked into the severity.
> We'll need additional, fine-grained severity levels though.
> 
> The "m.cs" and "m.kflags" checks could also be baked in.
> 
> Instead of just one AR severity:
> 	...
> 	MCE_AR_SEVERITY,
> 	MCE_PANIC_SEVERITY,
> 
> replace it with specific cases:
> 	...
> 	MCE_AR_USER_RECOV,
> 	MCE_AR_USER_KILL,
> 	MCE_AR_KERNEL_COPYIN,
> 	MCE_AR_KERNEL_RECOV,
> 	MCE_PANIC_SEVERITY,
> 
> Then the #MC handler can look like this:
> 
> 	if (worst < MCE_AR_USER_RECOV)
> 		goto out;
> 
> 	if (severity == MCE_AR_USER_RECOV)
> 		queue_task_work(&m, msg, kill_me_maybe);
> 
> 	if (severity == MCE_AR_USER_KILL)
> 		force_sig(SIGBUS);
> 
> 	if (severity == MCE_AR_KERNEL_COPYIN)
> 		queue_task_work(&m, msg, kill_me_never);
> 	
> 	if (severity == MCE_AR_KERNEL_RECOV) {
> 		if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
> 			mce_panic("Failed kernel mode recovery");
> 	}
> 
> I can take a shot at this if it seems reasonable.

That looks much cleaner. There may be some extra MCE_AR_KERNEL* options
in the future (I'd like someday to address COPYOUT when the corrupt
kernel data is in the page cache). But I don't think the number of cases
is going to explode into dozens of cases.

> What do you think?

Brave person ... you are going to have to tinker with
arch/x86/kernel/cpu/mce/severity.c !  Good luck.

-Tony

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2eec60f50057..fa28b3f7d945 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1533,7 +1533,7 @@  noinstr void do_machine_check(struct pt_regs *regs)
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-		if (kill_current_task)
+		if (kill_current_task || !mce_usable_address(&m))
 			queue_task_work(&m, msg, kill_me_now);
 		else
 			queue_task_work(&m, msg, kill_me_maybe);

x86/mce: Check that memory address is usable for recovery

Commit Message

Comments

Patch