Message ID | 20200411121740.37615-1-gengdongjiu@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: handle the right RAS SEA(Synchronous External Abort) type | expand |
Hi Geng, On 11/04/2020 13:17, Dongjiu Geng wrote: > When the RAS Extension is implemented, b0b011000, 0b011100, > 0b011101, 0b011110, and 0b011111, are not used and reserved > to the DFSC[5:0] of ESR_ELx, but the code still checks these > unused bits, so remove them. They aren't unused: CPUs without the RAS extensions may still generate these. kvm_handle_guest_abort() wants to know if this is an external abort. KVM doesn't really care if the CPU has the RAS extensions or not, its the arch code's job to sort all that out. > If the handling of guest ras data error fails, it should > inject data instead of SError to let the guest recover as > much as possible. (I don't quite follow your point here). If KVM injected a synchronous external abort due to a RAS error here, then you wouldn't be able to support firmware-first RAS with Qemu. I don't think this is what you want. The handling is (and should be) decoupled. KVM guests aren't special. Whatever happens for a normal user-space process is what should happen here. KVM is just doing the plumbing: When the hypervisor takes an external abort due to the guest, it should plumb the error into the arch code to be handled. This is what would happen for a normal EL0 process. This is what do_sea() and kvm_handle_guest_sea() do with apei_claim_sea(). If the RAS code says it handled this error, then we can continue. For user-space, we return to user-space. For a guest, we return to the guest. (for user-space this piece is not quite complete in mainline, see: https://lore.kernel.org/linux-acpi/20200228174817.74278-4-james.morse@arm.com/ ) This first part happens even if the errors are notified by IRQs, or found in a polled buffer. The RAS code may have 'handled' the memory by unmapping it, and marking the corresponding page as HWPOISONed. If user-space tries to access this, it will be give an SIGBUS:MCEERR_AR. If a guest tries to do this, the same things happens. (The signal goes to Qemu). (See do_page_fault()s use of the MCEERR si_code's, and kvm_send_hwpoison_signal) This second part is the same regardless of how the kernel discovered the RAS error in the first place. If the RAS code says it did not handle this error, it means it wasn't a RAS error, or your platform doesn't support RAS. For an external-abort there is very little the hypervisor can do in this situation. It does what KVM has always done: inject an asynchronous external abort. This should only happen if the host has failed to handle the error. KVM's use of asynchronous abort is the simplest one size fits all. Are you seeing this happen? If so, what are the circumstances. Did the host handle the error? (if not: why not!) Thanks, James
Hi James On 2020/4/14 20:18, James Morse wrote: > Hi Geng, > > On 11/04/2020 13:17, Dongjiu Geng wrote: >> When the RAS Extension is implemented, b0b011000, 0b011100, >> 0b011101, 0b011110, and 0b011111, are not used and reserved >> to the DFSC[5:0] of ESR_ELx, but the code still checks these >> unused bits, so remove them. > > They aren't unused: CPUs without the RAS extensions may still generate these. > > kvm_handle_guest_abort() wants to know if this is an external abort. > KVM doesn't really care if the CPU has the RAS extensions or not, its the arch code's job > to sort all that out. No, handle_guest_sea() ---> ghes_notify_sea ---> apei driver If it is an external abort, it will call apei driver to handle it, but it should be only SEA will call the apei driver. other type of external abort should not call apei driver. I am not see arch code sort all that out. /* Synchronous External Abort? */ if (kvm_vcpu_dabt_isextabt(vcpu)) { /* * For RAS the host kernel may handle this abort. * There is no need to pass the error into the guest. */ if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu))) return 1; } > > >> If the handling of guest ras data error fails, it should >> inject data instead of SError to let the guest recover as >> much as possible. In some hardware platform, it supports RAS, but the RAS error address will be not recorded, so it is better to inject a data abort instead of SError for thtat platform. because guest will try to do recovery for the Synchronous data abort, such as kill the error application. But for SError, guest will be panic. > > (I don't quite follow your point here). > > If KVM injected a synchronous external abort due to a RAS error here, then you wouldn't be > able to support firmware-first RAS with Qemu. I don't think this is what you want. > > > The handling is (and should be) decoupled. > > KVM guests aren't special. Whatever happens for a normal user-space process is what should > happen here. KVM is just doing the plumbing: > > When the hypervisor takes an external abort due to the guest, it should plumb the error > into the arch code to be handled. This is what would happen for a normal EL0 process. > This is what do_sea() and kvm_handle_guest_sea() do with apei_claim_sea(). > > If the RAS code says it handled this error, then we can continue. For user-space, we > return to user-space. For a guest, we return to the guest. (for user-space this piece is > not quite complete in mainline, see: > https://lore.kernel.org/linux-acpi/20200228174817.74278-4-james.morse@arm.com/ ) > > This first part happens even if the errors are notified by IRQs, or found in a polled buffer. > > The RAS code may have 'handled' the memory by unmapping it, and marking the corresponding > page as HWPOISONed. If user-space tries to access this, it will be give an > SIGBUS:MCEERR_AR. If a guest tries to do this, the same things happens. (The signal goes > to Qemu). > (See do_page_fault()s use of the MCEERR si_code's, and kvm_send_hwpoison_signal) > > This second part is the same regardless of how the kernel discovered the RAS error in the > first place. > > > If the RAS code says it did not handle this error, it means it wasn't a RAS error, or your > platform doesn't support RAS. For an external-abort there is very little the hypervisor > can do in this situation. It does what KVM has always done: inject an asynchronous > external abort. > This should only happen if the host has failed to handle the error. KVM's use of > asynchronous abort is the simplest one size fits all. > > Are you seeing this happen? If so, what are the circumstances. Did the host handle the > error? (if not: why not!) Yes, some platform supports RAS but will not record the error address, so the host has failed to handle the error. > > > Thanks, > > James > . >
Hi Geng, On 16/04/2020 13:07, gengdongjiu wrote: > On 2020/4/14 20:18, James Morse wrote: >> On 11/04/2020 13:17, Dongjiu Geng wrote: >>> When the RAS Extension is implemented, b0b011000, 0b011100, >>> 0b011101, 0b011110, and 0b011111, are not used and reserved >>> to the DFSC[5:0] of ESR_ELx, but the code still checks these >>> unused bits, so remove them. >> >> They aren't unused: CPUs without the RAS extensions may still generate these. >> >> kvm_handle_guest_abort() wants to know if this is an external abort. >> KVM doesn't really care if the CPU has the RAS extensions or not, its the arch code's job >> to sort all that out. > > No, handle_guest_sea() ---> ghes_notify_sea ---> apei driver You missed the arch code's apei_claim_sea(). This is where kvm washes its hands of the exception, its up to the arch code to deal with, in the same way it deals with errors for user-space. > If it is an external abort, it will call apei driver to handle it, but it should be only SEA will call the apei driver. > other type of external abort should not call apei driver. > I am not see arch code sort all that out. The other kind is _asynchronous_ external abort, which doesn't get handled in here. Do you mean 'Synchronous external abort on translation table walk, nth level'? KVM shouldn't care, this is still up to the arch code to deal with. Do you mean 'Synchronous parity or ECC error on memory access, not on translation table walk'? (and all its translation table walk friends...) These really are still external abort! See shared/functions/aborts/IsExternalAbort() in the psuedo code. (J12-7735 of DDI0487F.a) This means they are trapped by SCR_EL3.EA. On a firmware-first system if we see one of these, its because firmware re-injected it. The arch code needs to get APEI to check for CPER records when it sees one. (KVM ... doesn't care) (I agree not having 'external' in the name is counter-intuitive. I think this is because the component that triggers them is 'in' the CPU, (e.g. the L1 cache). This is how you can know it was a parity or ECC error, instead of 'something outside' (i.e. external). An OS that is deeply tied to the CPU micro-architecture could use the difference to read imp-def sysregs for one, and poke around in imp-def mmio for the external case. Linux isn't tied to the micro-architecture, so ignores this 'internal/external' hint.) >>> If the handling of guest ras data error fails, it should >>> inject data instead of SError to let the guest recover as >>> much as possible. > > In some hardware platform, it supports RAS, but the RAS error address will be not recorded, In what circumstances does this happen? Wouldn't these errors all be uncontained? (in which case the host should panic()). e.g. A write went to the wrong address, we don't know where it went... ... Is this because your platform describes memory-errors that happen inside the CPU as processor errors, instead of memory-errors? Linux's APEI code ignores them, because it doesn't know what they are. This means you are missing the whole memory_failure(), page-unmapping, user-space signalling ... and guest error injection ... part of the RAS handling. > so it is better to inject a data abort instead of SError for thtat platform. Not at all. The SError here is KVM's response to having nothing else it can do. Having the host handle the error is the best way of solving the problem. > because guest will try to do recovery for the Synchronous data abort, such as kill the error > application. But for SError, guest will be panic. I'd rather we fix this by having the host handle the error. Botching in a synchronous exception here would mean Qemu can't properly emulate a machine that has SCR_EL3.EA set ... which you need to provide firmware-first for the guest. [...] > Yes, some platform supports RAS but will not record the error address, so the host has failed > to handle the error. I don't think its reasonable to expect KVM to do anything useful in this case. We should fix the host error handling. In what circumstances does this happen? Thanks, James
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h index a30b4eec7cb4..857fbc79d678 100644 --- a/arch/arm64/include/asm/kvm_emulate.h +++ b/arch/arm64/include/asm/kvm_emulate.h @@ -380,11 +380,6 @@ static __always_inline bool kvm_vcpu_dabt_isextabt(const struct kvm_vcpu *vcpu) case FSC_SEA_TTW1: case FSC_SEA_TTW2: case FSC_SEA_TTW3: - case FSC_SECC: - case FSC_SECC_TTW0: - case FSC_SECC_TTW1: - case FSC_SECC_TTW2: - case FSC_SECC_TTW3: return true; default: return false; diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index e3b9ee268823..3c7972ed7fc5 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1926,7 +1926,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run) return 1; if (unlikely(!is_iabt)) { - kvm_inject_vabt(vcpu); + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); return 1; } }
When the RAS Extension is implemented, b0b011000, 0b011100, 0b011101, 0b011110, and 0b011111, are not used and reserved to the DFSC[5:0] of ESR_ELx, but the code still checks these unused bits, so remove them. If the handling of guest ras data error fails, it should inject data instead of SError to let the guest recover as much as possible. CC: Xiang Zheng <zhengxiang9@huawei.com> CC: Xiaofei Tan <tanxiaofei@huawei.com> CC: James Morse <james.morse@arm.com> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> --- Abort DFSC of ESR_EL2, below web site[1] has clarified: "When the RAS Extension is implemented, 0b011000, 0b011100, 0b011101, 0b011110, and 0b011111, are reserved." [1]: https://developer.arm.com/docs/ddi0595/b/aarch64-system-registers/esr_el2 --- arch/arm64/include/asm/kvm_emulate.h | 5 ----- virt/kvm/arm/mmu.c | 2 +- 2 files changed, 1 insertion(+), 6 deletions(-)