Message ID | 20230322005131.174499-1-tony.luck@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/mce: Check that memory address is usable for recovery | expand |
On 3/21/23 20:51, Tony Luck wrote: > uc_decode_notifier() includes a check that "struct mce" > contains a valid address for recovery. But the machine > check recovery code does not include a similar check. > > Use mce_usable_address() to check that there is a valid > address. > > Signed-off-by: Tony Luck <tony.luck@intel.com> > --- > arch/x86/kernel/cpu/mce/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index 2eec60f50057..fa28b3f7d945 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs) > /* If this triggers there is no way to recover. Die hard. */ > BUG_ON(!on_thread_stack() || !user_mode(regs)); > > - if (kill_current_task) > + if (kill_current_task || !mce_usable_address(&m)) > queue_task_work(&m, msg, kill_me_now); > else > queue_task_work(&m, msg, kill_me_maybe); I think it should be like this: if (mce_usable_address(&m)) queue_task_work(&m, msg, kill_me_maybe); else queue_task_work(&m, msg, kill_me_now); A usable address should always go through memory_failure() so that the page is marked as poison. If !RIPV, then memory_failure() will get the MF_MUST_KILL flag and try to kill all processes after the page is poisoned. I had a similar patch a while back: https://lore.kernel.org/linux-edac/20210504174712.27675-3-Yazen.Ghannam@amd.com/ We could also get rid of kill_me_now() like you had suggested. Thanks, Yazen
On Tue, Apr 18, 2023 at 12:41:17PM -0400, Yazen Ghannam wrote: > On 3/21/23 20:51, Tony Luck wrote: > > uc_decode_notifier() includes a check that "struct mce" > > contains a valid address for recovery. But the machine > > check recovery code does not include a similar check. > > > > Use mce_usable_address() to check that there is a valid > > address. > > > > Signed-off-by: Tony Luck <tony.luck@intel.com> > > --- > > arch/x86/kernel/cpu/mce/core.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > > index 2eec60f50057..fa28b3f7d945 100644 > > --- a/arch/x86/kernel/cpu/mce/core.c > > +++ b/arch/x86/kernel/cpu/mce/core.c > > @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs) > > /* If this triggers there is no way to recover. Die hard. */ > > BUG_ON(!on_thread_stack() || !user_mode(regs)); > > > > - if (kill_current_task) > > + if (kill_current_task || !mce_usable_address(&m)) > > queue_task_work(&m, msg, kill_me_now); > > else > > queue_task_work(&m, msg, kill_me_maybe); > > I think it should be like this: > > if (mce_usable_address(&m)) > queue_task_work(&m, msg, kill_me_maybe); > else > queue_task_work(&m, msg, kill_me_now); > > A usable address should always go through memory_failure() so that the page is > marked as poison. If !RIPV, then memory_failure() will get the MF_MUST_KILL > flag and try to kill all processes after the page is poisoned. > > I had a similar patch a while back: > https://lore.kernel.org/linux-edac/20210504174712.27675-3-Yazen.Ghannam@amd.com/ > > We could also get rid of kill_me_now() like you had suggested. Can we also get rid of "kill_current_task"? It is only set when there is no valid return address: if (!(m.mcgstatus & MCG_STATUS_RIPV)) kill_current_task = 1; kill_me_maybe() does an equivalent check: if (!p->mce_ripv) flags |= MF_MUST_KILL; Which only leaves this check to resolve in some way: if (worst != MCE_AR_SEVERITY && !kill_current_task) goto out; -Tony
On 4/18/23 13:27, Tony Luck wrote: > On Tue, Apr 18, 2023 at 12:41:17PM -0400, Yazen Ghannam wrote: >> On 3/21/23 20:51, Tony Luck wrote: >>> uc_decode_notifier() includes a check that "struct mce" >>> contains a valid address for recovery. But the machine >>> check recovery code does not include a similar check. >>> >>> Use mce_usable_address() to check that there is a valid >>> address. >>> >>> Signed-off-by: Tony Luck <tony.luck@intel.com> >>> --- >>> arch/x86/kernel/cpu/mce/core.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c >>> index 2eec60f50057..fa28b3f7d945 100644 >>> --- a/arch/x86/kernel/cpu/mce/core.c >>> +++ b/arch/x86/kernel/cpu/mce/core.c >>> @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs) >>> /* If this triggers there is no way to recover. Die hard. */ >>> BUG_ON(!on_thread_stack() || !user_mode(regs)); >>> >>> - if (kill_current_task) >>> + if (kill_current_task || !mce_usable_address(&m)) >>> queue_task_work(&m, msg, kill_me_now); >>> else >>> queue_task_work(&m, msg, kill_me_maybe); >> >> I think it should be like this: >> >> if (mce_usable_address(&m)) >> queue_task_work(&m, msg, kill_me_maybe); >> else >> queue_task_work(&m, msg, kill_me_now); >> >> A usable address should always go through memory_failure() so that the page is >> marked as poison. If !RIPV, then memory_failure() will get the MF_MUST_KILL >> flag and try to kill all processes after the page is poisoned. >> >> I had a similar patch a while back: >> https://lore.kernel.org/linux-edac/20210504174712.27675-3-Yazen.Ghannam@amd.com/ >> >> We could also get rid of kill_me_now() like you had suggested. > > Can we also get rid of "kill_current_task"? It is only set when there is > no valid return address: > > if (!(m.mcgstatus & MCG_STATUS_RIPV)) > kill_current_task = 1; > > kill_me_maybe() does an equivalent check: > > if (!p->mce_ripv) > flags |= MF_MUST_KILL; > > Which only leaves this check to resolve in some way: > > if (worst != MCE_AR_SEVERITY && !kill_current_task) > goto out; > I agree. And I think all these checks should be baked into the severity. We'll need additional, fine-grained severity levels though. The "m.cs" and "m.kflags" checks could also be baked in. Instead of just one AR severity: ... MCE_AR_SEVERITY, MCE_PANIC_SEVERITY, replace it with specific cases: ... MCE_AR_USER_RECOV, MCE_AR_USER_KILL, MCE_AR_KERNEL_COPYIN, MCE_AR_KERNEL_RECOV, MCE_PANIC_SEVERITY, Then the #MC handler can look like this: if (worst < MCE_AR_USER_RECOV) goto out; if (severity == MCE_AR_USER_RECOV) queue_task_work(&m, msg, kill_me_maybe); if (severity == MCE_AR_USER_KILL) force_sig(SIGBUS); if (severity == MCE_AR_KERNEL_COPYIN) queue_task_work(&m, msg, kill_me_never); if (severity == MCE_AR_KERNEL_RECOV) { if (!fixup_exception(regs, X86_TRAP_MC, 0, 0)) mce_panic("Failed kernel mode recovery"); } I can take a shot at this if it seems reasonable. What do you think? Thanks, Yazen
On Tue, Apr 18, 2023 at 01:51:37PM -0400, Yazen Ghannam wrote: > I agree. And I think all these checks should be baked into the severity. > We'll need additional, fine-grained severity levels though. > > The "m.cs" and "m.kflags" checks could also be baked in. > > Instead of just one AR severity: > ... > MCE_AR_SEVERITY, > MCE_PANIC_SEVERITY, > > replace it with specific cases: > ... > MCE_AR_USER_RECOV, > MCE_AR_USER_KILL, > MCE_AR_KERNEL_COPYIN, > MCE_AR_KERNEL_RECOV, > MCE_PANIC_SEVERITY, > > Then the #MC handler can look like this: > > if (worst < MCE_AR_USER_RECOV) > goto out; > > if (severity == MCE_AR_USER_RECOV) > queue_task_work(&m, msg, kill_me_maybe); > > if (severity == MCE_AR_USER_KILL) > force_sig(SIGBUS); > > if (severity == MCE_AR_KERNEL_COPYIN) > queue_task_work(&m, msg, kill_me_never); > > if (severity == MCE_AR_KERNEL_RECOV) { > if (!fixup_exception(regs, X86_TRAP_MC, 0, 0)) > mce_panic("Failed kernel mode recovery"); > } > > I can take a shot at this if it seems reasonable. That looks much cleaner. There may be some extra MCE_AR_KERNEL* options in the future (I'd like someday to address COPYOUT when the corrupt kernel data is in the page cache). But I don't think the number of cases is going to explode into dozens of cases. > What do you think? Brave person ... you are going to have to tinker with arch/x86/kernel/cpu/mce/severity.c ! Good luck. -Tony
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 2eec60f50057..fa28b3f7d945 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs) /* If this triggers there is no way to recover. Die hard. */ BUG_ON(!on_thread_stack() || !user_mode(regs)); - if (kill_current_task) + if (kill_current_task || !mce_usable_address(&m)) queue_task_work(&m, msg, kill_me_now); else queue_task_work(&m, msg, kill_me_maybe);
uc_decode_notifier() includes a check that "struct mce" contains a valid address for recovery. But the machine check recovery code does not include a similar check. Use mce_usable_address() to check that there is a valid address. Signed-off-by: Tony Luck <tony.luck@intel.com> --- arch/x86/kernel/cpu/mce/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)