Message ID | 20230726204157.3604531-1-john.allen@amd.com |
---|---|
Series | Fix MCE handling on AMD hosts |
Hello John,

I was able to test your fixes, and I can confirm that BUS_MCEERR_AR is now
working on AMD:

Before the fix, the VM panics with:

qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7f89573ce000 and GUEST addr 0x10b5ce000 of type BUS_MCEERR_AR injected
[ 83.562579] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 1: a000000000000000
[ 83.562585] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81e8f6ff> {pv_native_safe_halt+0xf/0x20}
[ 83.562592] mce: [Hardware Error]: TSC 3d39402bdc
[ 83.562593] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693515449 SOCKET 0 APIC 0 microcode 800126e
[ 83.562596] mce: [Hardware Error]: Machine check: Uncorrected error without MCA Recovery
[ 83.562597] Kernel panic - not syncing: Fatal local machine check
[ 83.563401] Kernel Offset: disabled

With the fix, the same error injection doesn't kill the VM, but generates
the following console messages:

qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7fa430ab9000 and GUEST addr 0x118cb9000 of type BUS_MCEERR_AR injected
[ 250.851996] Disabling lock debugging due to kernel taint
[ 250.852928] mce: Uncorrected hardware memory error in user-access at 118cb9000
[ 250.853261] Memory failure: 0x118cb9: Sending SIGBUS to mce_process_rea:1227 due to hardware memory corruption
[ 250.854933] mce: [Hardware Error]: Machine check events logged
[ 250.855800] Memory failure: 0x118cb9: recovery action for dirty LRU page: Recovered
[ 250.856661] mce: [Hardware Error]: CPU 2: Machine Check Exception: 7 Bank 9: bc00000000000000
[ 250.860552] mce: [Hardware Error]: RIP 33:<00007f56b9ecbee5>
[ 250.861405] mce: [Hardware Error]: TSC 8c2c664410 ADDR 118cb9000 MISC 8c
[ 250.862679] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693508937 SOCKET 0 APIC 2 microcode 800126e

But a problem still exists with BUS_MCEERR_AO, which kills the VM with:

qemu-system-x86_64: warning: Guest MCE Memory Error at QEMU addr 0x7f1d108e5000 and GUEST addr 0x114ae5000 of type BUS_MCEERR_AO injected
[ 157.392905] mce: [Hardware Error]: CPU 0: Machine Check Exception: 7 Bank 9: bc00000000000000
[ 157.392912] mce: [Hardware Error]: RIP 10:<ffffffff81e8f6ff> {pv_native_safe_halt+0xf/0x20}
[ 157.392919] mce: [Hardware Error]: TSC 60b92a54d0 ADDR 114ae5000 MISC 8c
[ 157.392921] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693500765 SOCKET 0 APIC 0 microcode 800126e
[ 157.392924] mce: [Hardware Error]: Machine check: Uncorrected unrecoverable error in kernel context
[ 157.392925] Kernel panic - not syncing: Fatal local machine check
[ 157.402582] Kernel Offset: disabled

As AMD guests can't currently deal with BUS_MCEERR_AO MCE injection, in my
opinion the fix is not complete: the 'AO' case must be handled. The
simplest way is probably to filter it at the qemu level, to only inject the
'AR' case -- and it also gives the possibility to let qemu provide a message
about an ignored 'AO' error.

I would suggest adding a 3rd patch implementing this AMD-specific filter:


commit bf8cc74df3fcc7bf958a7c42b876e9c059fe4d06
Author: William Roche <william.roche@oracle.com>
Date:   Thu Aug 31 18:54:57 2023 +0000

    i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

    AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
    as it panics the VM kernel. We filter this event and provide a
    warning message.

    Signed-off-by: William Roche <william.roche@oracle.com>

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9ca7187628..bd60d5697b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -606,6 +606,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
             mcg_status |= MCG_STATUS_RIPV;
         }
     } else {
+        if (code == BUS_MCEERR_AO) {
+            /* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
+            return;
+        }
         mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
     }

@@ -657,7 +661,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
         if (ram_addr != RAM_ADDR_INVALID &&
             kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
             kvm_hwpoison_page_add(ram_addr);
-            kvm_mce_inject(cpu, paddr, code);
+            if (!IS_AMD_CPU(env) || code != BUS_MCEERR_AO)
+                kvm_mce_inject(cpu, paddr, code);

             /*
              * Use different logging severity based on error type.
@@ -670,8 +675,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
                     addr, paddr, "BUS_MCEERR_AR");
             } else {
                 warn_report("Guest MCE Memory Error at QEMU addr %p and "
-                    "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
-                    addr, paddr, "BUS_MCEERR_AO");
+                    "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
+                    addr, paddr, "BUS_MCEERR_AO",
+                    IS_AMD_CPU(env) ? "ignored on AMD guest" : "injected");
             }

             return;
---


I hope this can help.

William.


On 7/26/23 22:41, John Allen wrote:
> In the event that a guest process attempts to access memory that has
> been poisoned in response to a deferred uncorrected MCE, an AMD system
> will currently generate a SIGBUS error which will result in the entire
> guest being shut down. Ideally, we only want to kill the guest process
> that accessed poisoned memory in this case.
>
> This support has been included in qemu for Intel hosts for a long time,
> but there are a couple of changes needed for AMD hosts. First, we will
> need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
> the MCE injection code to avoid Intel-specific behavior when we are
> running on an AMD host.
>
> v2:
>   - Add "succor" feature word.
>   - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
>
> John Allen (2):
>   i386: Add support for SUCCOR feature
>   i386: Fix MCE support for AMD hosts
>
>  target/i386/cpu.c     | 18 +++++++++++++++++-
>  target/i386/cpu.h     |  4 ++++
>  target/i386/helper.c  |  4 ++++
>  target/i386/kvm/kvm.c | 19 +++++++++++++------
>  4 files changed, 38 insertions(+), 7 deletions(-)
>
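
For readers comparing the bank status words in the console logs of this message (a000000000000000 before the fix, bc00000000000000 after it), the following is a minimal sketch of decoding the top flag bits of an MCi_STATUS value. The bit positions are the architectural ones as I understand them (the same positions QEMU names MCI_STATUS_VAL, MCI_STATUS_UC, and so on); the table `sketch_mci_flags` and the helper `sketch_decode_mci_status` are hypothetical names made up for this illustration and are not part of QEMU or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One MCi_STATUS flag: its bit mask and a short name. */
struct sketch_mci_flag {
    uint64_t mask;
    const char *name;
};

/* Assumed architectural bit positions, highest bit first. */
static const struct sketch_mci_flag sketch_mci_flags[] = {
    { 1ULL << 63, "VAL"   },  /* register contents are valid */
    { 1ULL << 62, "OVER"  },  /* an earlier error was overwritten */
    { 1ULL << 61, "UC"    },  /* uncorrected error */
    { 1ULL << 60, "EN"    },  /* error reporting was enabled */
    { 1ULL << 59, "MISCV" },  /* MISC register is valid */
    { 1ULL << 58, "ADDRV" },  /* ADDR register is valid */
    { 1ULL << 57, "PCC"   },  /* processor context corrupt */
};

/*
 * Hypothetical helper: write the names of all flags set in `status`
 * into `buf`, separated by single spaces. Stops early if `buf` is full.
 */
static void sketch_decode_mci_status(uint64_t status, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(sketch_mci_flags) / sizeof(sketch_mci_flags[0]); i++) {
        if (!(status & sketch_mci_flags[i].mask)) {
            continue;
        }
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", sketch_mci_flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            break;  /* output truncated, stop appending */
        }
        used += (size_t)n;
    }
}
```

With this decoding, a000000000000000 has only VAL and UC set, while bc00000000000000 additionally has EN, MISCV and ADDRV set, which matches the ADDR and MISC fields that appear only in the later log lines.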
On Thu, Aug 31, 2023 at 11:40:08PM +0200, William Roche wrote:
> As AMD guests can't currently deal with BUS_MCEERR_AO MCE injection, in my
> opinion the fix is not complete: the 'AO' case must be handled. The
> simplest way is probably to filter it at the qemu level, to only inject the
> 'AR' case -- and it also gives the possibility to let qemu provide a message
> about an ignored 'AO' error.
>
> I would suggest adding a 3rd patch implementing this AMD-specific filter:
>
>     i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest
>
>     AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
>     as it panics the VM kernel. We filter this event and provide a
>     warning message.

Thanks, I think this will be a good solution for now while we can't fully
support AO errors. I will test this and include it in the next version of
the series.

Thanks,
John
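
The filtering rule discussed in this thread (inject 'AR' always, skip 'AO' only for AMD guests, and adjust the log text accordingly) can be sketched in isolation as below. This is an illustration under stated assumptions, not the QEMU code: `sketch_should_inject`, `sketch_report_suffix` and the `SKETCH_MCEERR_*` constants are hypothetical names, the boolean `guest_is_amd` stands in for QEMU's `IS_AMD_CPU(env)` check, and the numeric values 4 and 5 mirror the Linux `si_code` values for `BUS_MCEERR_AR` and `BUS_MCEERR_AO`.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the Linux SIGBUS si_code values (assumed 4 and 5). */
enum sketch_mce_code {
    SKETCH_MCEERR_AR = 4,  /* action required: poisoned memory was consumed */
    SKETCH_MCEERR_AO = 5,  /* action optional: poison detected in the background */
};

/*
 * Hypothetical predicate mirroring the condition in the proposed patch:
 * everything is injected except 'AO' errors on an AMD guest.
 */
static bool sketch_should_inject(bool guest_is_amd, int code)
{
    assert(code == SKETCH_MCEERR_AR || code == SKETCH_MCEERR_AO);
    return !guest_is_amd || code != SKETCH_MCEERR_AO;
}

/* Hypothetical helper: the trailing word(s) of the log message. */
static const char *sketch_report_suffix(bool guest_is_amd, int code)
{
    return sketch_should_inject(guest_is_amd, code)
        ? "injected"
        : "ignored on AMD guest";
}
```

Keeping the decision in one predicate means the injection call and the log message cannot disagree about whether an error was delivered to the guest, which is the property the proposed third patch aims for with its two matching conditions.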