Message ID | 20230922112508.1774352-2-kristina.martsenko@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: arm64: Support for Arm v8.8 memcpy instructions in KVM guests | expand |
Hi Kristina,

On Fri, 22 Sep 2023 12:25:07 +0100,
Kristina Martsenko <kristina.martsenko@arm.com> wrote:
>
> An Armv8.8 FEAT_MOPS main or epilogue instruction will take an exception
> if executed on a CPU with a different MOPS implementation option (A or
> B) than the CPU where the preceding prologue instruction ran. In this
> case the OS exception handler is expected to reset the registers and
> restart execution from the prologue instruction.
>
> A KVM guest may use the instructions at EL1 at times when the guest is
> not able to handle the exception, expecting that the instructions will
> only run on one CPU (e.g. when running UEFI boot services in the guest).
> As KVM may reschedule the guest between different types of CPUs at any
> time (on an asymmetric system), it needs to also handle the resulting
> exception itself in case the guest is not able to. A similar situation
> will also occur in the future when live migrating a guest from one type
> of CPU to another.
>
> Add handling for the MOPS exception to KVM. The handling can be shared
> with the EL0 exception handler, as the logic and register layouts are
> the same. The exception can be handled right after exiting a guest,
> which avoids the cost of returning to the host exit handler.
>
> Similarly to the EL0 exception handler, in case the main or epilogue
> instruction is being single stepped, it makes sense to finish the step
> before executing the prologue instruction, so advance the single step
> state machine.

What is the rationale for advancing the state machine? Shouldn't we
instead return to the guest and immediately get the SS exception,
which in turn gets reported to userspace? Is it because we roll back
the PC to a previous instruction?

In the latter case, won't userspace see multiple SS exceptions for the
middle instruction if trapping from the epilogue? This would be a bit
surprising, to say the least.

Thanks,

	M.
On 24/09/2023 15:48, Marc Zyngier wrote:
> Hi Kristina,

Hi Marc,

> On Fri, 22 Sep 2023 12:25:07 +0100,
> Kristina Martsenko <kristina.martsenko@arm.com> wrote:
>>
>> An Armv8.8 FEAT_MOPS main or epilogue instruction will take an exception
>> if executed on a CPU with a different MOPS implementation option (A or
>> B) than the CPU where the preceding prologue instruction ran. In this
>> case the OS exception handler is expected to reset the registers and
>> restart execution from the prologue instruction.
>>
>> A KVM guest may use the instructions at EL1 at times when the guest is
>> not able to handle the exception, expecting that the instructions will
>> only run on one CPU (e.g. when running UEFI boot services in the guest).
>> As KVM may reschedule the guest between different types of CPUs at any
>> time (on an asymmetric system), it needs to also handle the resulting
>> exception itself in case the guest is not able to. A similar situation
>> will also occur in the future when live migrating a guest from one type
>> of CPU to another.
>>
>> Add handling for the MOPS exception to KVM. The handling can be shared
>> with the EL0 exception handler, as the logic and register layouts are
>> the same. The exception can be handled right after exiting a guest,
>> which avoids the cost of returning to the host exit handler.
>>
>> Similarly to the EL0 exception handler, in case the main or epilogue
>> instruction is being single stepped, it makes sense to finish the step
>> before executing the prologue instruction, so advance the single step
>> state machine.
>
> What is the rationale for advancing the state machine? Shouldn't we
> instead return to the guest and immediately get the SS exception,
> which in turn gets reported to userspace? Is it because we roll back
> the PC to a previous instruction?

Yes, because we roll back the PC to the prologue instruction. We advance the
state machine so that the SS exception is taken immediately upon returning to
the guest at the prologue instruction. If we didn't advance it then we would
return to the guest, execute the prologue instruction, and then take the SS
exception on the middle instruction. Which would be surprising as userspace
would see the middle and epilogue instructions executed multiple times but not
the prologue.

> In the latter case, won't userspace see multiple SS exceptions for the
> middle instruction if trapping from the epilogue? This would be a bit
> surprising, to say the least.

Not sure I follow. Do you mean multiple in a row or multiple in total? Not in
a row (we step the prologue instruction in between), but yes in total (every
time we start executing the middle instruction). And this happens when
trapping from the middle instruction too, not just the epilogue. Do you see a
better way of handling it?

Here is an example of what GDB sees when single stepping a guest while the
guest executes these instructions ("mops ex" are debug prints in kvm; I've
added prologue/main/epilogue comments):

Breakpoint 2, 0xffff80008051b6a4 in ?? ()
0xffff80008051b6a8 in ?? ()	# prologue
0xffff80008051b6ac in ?? ()	# main
[ 33.615305] mops ex: memcpy: B->A: fwd: main
0xffff80008051b6a8 in ?? ()	# prologue
0xffff80008051b6ac in ?? ()	# main
0xffff80008051b6b0 in ?? ()	# epilogue
[ 34.141251] mops ex: memcpy: A->B: fwd: epi
0xffff80008051b6a8 in ?? ()	# prologue
0xffff80008051b6ac in ?? ()	# main
0xffff80008051b6b0 in ?? ()	# epilogue
[ 34.209822] mops ex: memcpy: B->A: fwd: epi
0xffff80008051b6a8 in ?? ()	# prologue
0xffff80008051b6ac in ?? ()	# main
0xffff80008051b6b0 in ?? ()	# epilogue
0xffff80008051b6b4 in ?? ()
[...]

Thanks,
Kristina
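The effect on the reported step sequence can be modelled in a few lines of C (a toy simulation of the trace above, not kernel code; the `step` function and the PC values are purely illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Low bits of the PCs in the GDB trace above: prologue, main, epilogue
 * of a MOPS memcpy sequence, and the following instruction. */
enum { PROLOGUE = 0x6a8, MAIN = 0x6ac, EPILOGUE = 0x6b0, NEXT = 0x6b4 };

/* One single step: execute the instruction at pc and return where the
 * debugger stops next.  mops_fault models a migration between CPUs with
 * different MOPS options: the handler rolls PC back to the prologue and
 * advances the SS state machine, so the step finishes at the prologue
 * rather than only after re-executing it. */
static int step(int pc, bool mops_fault)
{
	if (mops_fault && (pc == MAIN || pc == EPILOGUE))
		return PROLOGUE;	/* rollback + step reported here */
	return pc + 4;			/* sequential execution */
}
```

Stepping through then reads main -> (fault) prologue -> main -> epilogue -> next, matching the GDB output above.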
On Mon, Sep 25, 2023 at 04:16:06PM +0100, Kristina Martsenko wrote:

[...]

> > What is the rationale for advancing the state machine? Shouldn't we
> > instead return to the guest and immediately get the SS exception,
> > which in turn gets reported to userspace? Is it because we roll back
> > the PC to a previous instruction?
>
> Yes, because we roll back the PC to the prologue instruction. We advance the
> state machine so that the SS exception is taken immediately upon returning to
> the guest at the prologue instruction. If we didn't advance it then we would
> return to the guest, execute the prologue instruction, and then take the SS
> exception on the middle instruction. Which would be surprising as userspace
> would see the middle and epilogue instructions executed multiple times but not
> the prologue.

I agree with Kristina that taking the SS exception on the prologue is
likely the best course of action. Especially since it matches the
behavior of single-stepping an EL0 MOPS sequence with an intervening CPU
migration.

This behavior might throw an EL1 that single-steps itself for a loop,
but I think it is impossible for a hypervisor to hide the consequences
of vCPU migration with MOPS in the first place.

Marc, I'm guessing you were most concerned about the former case where
the VMM was debugging the guest. Is there something you're concerned
about I missed?
On Wed, 27 Sep 2023 09:28:20 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Mon, Sep 25, 2023 at 04:16:06PM +0100, Kristina Martsenko wrote:
>
> [...]
>
> > > What is the rationale for advancing the state machine? Shouldn't we
> > > instead return to the guest and immediately get the SS exception,
> > > which in turn gets reported to userspace? Is it because we roll back
> > > the PC to a previous instruction?
> >
> > Yes, because we roll back the PC to the prologue instruction. We advance the
> > state machine so that the SS exception is taken immediately upon returning to
> > the guest at the prologue instruction. If we didn't advance it then we would
> > return to the guest, execute the prologue instruction, and then take the SS
> > exception on the middle instruction. Which would be surprising as userspace
> > would see the middle and epilogue instructions executed multiple times but not
> > the prologue.
>
> I agree with Kristina that taking the SS exception on the prologue is
> likely the best course of action. Especially since it matches the
> behavior of single-stepping an EL0 MOPS sequence with an intervening CPU
> migration.
>
> This behavior might throw an EL1 that single-steps itself for a loop,
> but I think it is impossible for a hypervisor to hide the consequences
> of vCPU migration with MOPS in the first place.
>
> Marc, I'm guessing you were most concerned about the former case where
> the VMM was debugging the guest. Is there something you're concerned
> about I missed?

My concern is not only the VMM, but any userspace that performs
single-stepping. Imagine the debugger tracks PC by itself, and simply
increments it by 4 on a non-branch, non-fault instruction.

Move the vcpu or the userspace around, rewind PC, and now the debugger
is out of whack with what is executing. While I agree that there is
not much a hypervisor can do about that, I'm a bit worried that we are
going to break existing SW with this.

Now the obvious solution is "don't do that"...

	M.
On 29/09/2023 10:23, Marc Zyngier wrote:
> On Wed, 27 Sep 2023 09:28:20 +0100,
> Oliver Upton <oliver.upton@linux.dev> wrote:
>>
>> On Mon, Sep 25, 2023 at 04:16:06PM +0100, Kristina Martsenko wrote:
>>
>> [...]
>>
>>>> What is the rationale for advancing the state machine? Shouldn't we
>>>> instead return to the guest and immediately get the SS exception,
>>>> which in turn gets reported to userspace? Is it because we roll back
>>>> the PC to a previous instruction?
>>>
>>> Yes, because we roll back the PC to the prologue instruction. We advance the
>>> state machine so that the SS exception is taken immediately upon returning to
>>> the guest at the prologue instruction. If we didn't advance it then we would
>>> return to the guest, execute the prologue instruction, and then take the SS
>>> exception on the middle instruction. Which would be surprising as userspace
>>> would see the middle and epilogue instructions executed multiple times but not
>>> the prologue.
>>
>> I agree with Kristina that taking the SS exception on the prologue is
>> likely the best course of action. Especially since it matches the
>> behavior of single-stepping an EL0 MOPS sequence with an intervening CPU
>> migration.
>>
>> This behavior might throw an EL1 that single-steps itself for a loop,
>> but I think it is impossible for a hypervisor to hide the consequences
>> of vCPU migration with MOPS in the first place.
>>
>> Marc, I'm guessing you were most concerned about the former case where
>> the VMM was debugging the guest. Is there something you're concerned
>> about I missed?
>
> My concern is not only the VMM, but any userspace that performs
> single-stepping. Imagine the debugger tracks PC by itself, and simply
> increments it by 4 on a non-branch, non-fault instruction.
>
> Move the vcpu or the userspace around, rewind PC, and now the debugger
> is out of whack with what is executing. While I agree that there is
> not much a hypervisor can do about that, I'm a bit worried that we are
> going to break existing SW with this.
>
> Now the obvious solution is "don't do that"...

If the debugger can handle the PC changing on branching or faulting
instructions, then why can't it handle it on MOPS instructions? Wouldn't
such a debugger need to be updated any time the architecture adds new
branching or faulting instructions? What's different here?

Confused,
Kristina
On Mon, 02 Oct 2023 15:06:33 +0100,
Kristina Martsenko <kristina.martsenko@arm.com> wrote:
>
> On 29/09/2023 10:23, Marc Zyngier wrote:
> > On Wed, 27 Sep 2023 09:28:20 +0100,
> > Oliver Upton <oliver.upton@linux.dev> wrote:
> >>
> >> On Mon, Sep 25, 2023 at 04:16:06PM +0100, Kristina Martsenko wrote:
> >>
> >> [...]
> >>
> >>>> What is the rationale for advancing the state machine? Shouldn't we
> >>>> instead return to the guest and immediately get the SS exception,
> >>>> which in turn gets reported to userspace? Is it because we roll back
> >>>> the PC to a previous instruction?
> >>>
> >>> Yes, because we roll back the PC to the prologue instruction. We advance the
> >>> state machine so that the SS exception is taken immediately upon returning to
> >>> the guest at the prologue instruction. If we didn't advance it then we would
> >>> return to the guest, execute the prologue instruction, and then take the SS
> >>> exception on the middle instruction. Which would be surprising as userspace
> >>> would see the middle and epilogue instructions executed multiple times but not
> >>> the prologue.
> >>
> >> I agree with Kristina that taking the SS exception on the prologue is
> >> likely the best course of action. Especially since it matches the
> >> behavior of single-stepping an EL0 MOPS sequence with an intervening CPU
> >> migration.
> >>
> >> This behavior might throw an EL1 that single-steps itself for a loop,
> >> but I think it is impossible for a hypervisor to hide the consequences
> >> of vCPU migration with MOPS in the first place.
> >>
> >> Marc, I'm guessing you were most concerned about the former case where
> >> the VMM was debugging the guest. Is there something you're concerned
> >> about I missed?
> >
> > My concern is not only the VMM, but any userspace that performs
> > single-stepping. Imagine the debugger tracks PC by itself, and simply
> > increments it by 4 on a non-branch, non-fault instruction.
> >
> > Move the vcpu or the userspace around, rewind PC, and now the debugger
> > is out of whack with what is executing. While I agree that there is
> > not much a hypervisor can do about that, I'm a bit worried that we are
> > going to break existing SW with this.
> >
> > Now the obvious solution is "don't do that"...
>
> If the debugger can handle the PC changing on branching or faulting
> instructions, then why can't it handle it on MOPS instructions? Wouldn't
> such a debugger need to be updated any time the architecture adds new
> branching or faulting instructions? What's different here?

What is different is that we *go back* in the instruction stream,
which is a first. I'm not saying that the debugger I describe above
would be a very clever piece of SW, quite the opposite. But the way
the architecture works results in some interesting side-effects, and
I'm willing to bet that some SW will break (rr?).

But again, asymmetric systems are such a bad idea that I can't say I
care.

Thanks,

	M.
On Mon, Oct 02, 2023 at 03:55:33PM +0100, Marc Zyngier wrote:
> On Mon, 02 Oct 2023 15:06:33 +0100,
> Kristina Martsenko <kristina.martsenko@arm.com> wrote:
> > On 29/09/2023 10:23, Marc Zyngier wrote:
> > > On Wed, 27 Sep 2023 09:28:20 +0100,
> > > Oliver Upton <oliver.upton@linux.dev> wrote:
> > >> On Mon, Sep 25, 2023 at 04:16:06PM +0100, Kristina Martsenko wrote:
> > >>>> What is the rationale for advancing the state machine? Shouldn't we
> > >>>> instead return to the guest and immediately get the SS exception,
> > >>>> which in turn gets reported to userspace? Is it because we roll back
> > >>>> the PC to a previous instruction?
> > >>>
> > >>> Yes, because we roll back the PC to the prologue instruction. We advance the
> > >>> state machine so that the SS exception is taken immediately upon returning to
> > >>> the guest at the prologue instruction. If we didn't advance it then we would
> > >>> return to the guest, execute the prologue instruction, and then take the SS
> > >>> exception on the middle instruction. Which would be surprising as userspace
> > >>> would see the middle and epilogue instructions executed multiple times but not
> > >>> the prologue.
> > >>
> > >> I agree with Kristina that taking the SS exception on the prologue is
> > >> likely the best course of action. Especially since it matches the
> > >> behavior of single-stepping an EL0 MOPS sequence with an intervening CPU
> > >> migration.
> > >>
> > >> This behavior might throw an EL1 that single-steps itself for a loop,
> > >> but I think it is impossible for a hypervisor to hide the consequences
> > >> of vCPU migration with MOPS in the first place.
> > >>
> > >> Marc, I'm guessing you were most concerned about the former case where
> > >> the VMM was debugging the guest. Is there something you're concerned
> > >> about I missed?
> > >
> > > My concern is not only the VMM, but any userspace that performs
> > > single-stepping. Imagine the debugger tracks PC by itself, and simply
> > > increments it by 4 on a non-branch, non-fault instruction.
> > >
> > > Move the vcpu or the userspace around, rewind PC, and now the debugger
> > > is out of whack with what is executing. While I agree that there is
> > > not much a hypervisor can do about that, I'm a bit worried that we are
> > > going to break existing SW with this.
> > >
> > > Now the obvious solution is "don't do that"...
> >
> > If the debugger can handle the PC changing on branching or faulting
> > instructions, then why can't it handle it on MOPS instructions? Wouldn't
> > such a debugger need to be updated any time the architecture adds new
> > branching or faulting instructions? What's different here?
>
> What is different is that we *go back* in the instruction stream,
> which is a first. I'm not saying that the debugger I describe above
> would be a very clever piece of SW, quite the opposite. But the way
> the architecture works results in some interesting side-effects, and
> I'm willing to bet that some SW will break (rr?).

The way the architecture works, either with or without Kristina's
single-step change, a debugger would get confused. At least for EL0, I
find the proposed (well, upstreamed) approach more predictable - it
always restarts from the prologue in case of migration between CPUs with
different MOPS implementations (which is not just theoretical AFAIK).
It's more like these three instructions are a bigger CISC one ;) (though
the CPU can step through its parts).

A more transparent approach would have been to fully emulate the
instructions in the kernel and advance the PC as expected, but I don't
think that's even possible. An implementation may decide to leave some
bytes to be copied by the epilogue, but we can't know that in software;
it's a microarchitecture thing.

There is the case of EL1 debugging itself (kgdb), which triggers a MOPS
exception to EL2. It would look weird for the guest, but I guess the only
other option is to disable MCE2 and let EL1 handle the mismatched MOPS
option itself (assuming it knows how to; it should be fine for Linux). I
think I still prefer Kristina's proposal for KVM as more generic, with
the downside of breaking less usual cases like the kernel
single-stepping itself.
On Tue, 03 Oct 2023 15:29:42 +0100,
Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> The way the architecture works, either with or without Kristina's
> single-step change, a debugger would get confused. At least for EL0, I
> find the proposed (well, upstreamed) approach more predictable - it
> always restarts from the prologue in case of migration between CPUs with
> different MOPS implementations (which is not just theoretical AFAIK).
> It's more like these three instructions are a bigger CISC one ;) (though
> the CPU can step through its parts).
>
> A more transparent approach would have been to fully emulate the
> instructions in the kernel and advance the PC as expected, but I don't
> think that's even possible. An implementation may decide to leave some
> bytes to be copied by the epilogue, but we can't know that in software;
> it's a microarchitecture thing.
>
> There is the case of EL1 debugging itself (kgdb), which triggers a MOPS
> exception to EL2. It would look weird for the guest, but I guess the only
> other option is to disable MCE2 and let EL1 handle the mismatched MOPS
> option itself (assuming it knows how to; it should be fine for Linux). I
> think I still prefer Kristina's proposal for KVM as more generic, with
> the downside of breaking less usual cases like the kernel
> single-stepping itself.

I don't disagree at all. My issue isn't with Kristina's patches, which
are absolutely fine. It has more to do with the shape of the FEAT_MOPS
extension itself, which exposes uarch details to SW instead of
abstracting them.

But I've now ranted about it for close to two weeks, and it is time
for me to move on... ;-)

	M.
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index d66dfb3a72dd..eefe766d6161 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -9,10 +9,9 @@
 #include <linux/list.h>
 #include <asm/esr.h>
+#include <asm/ptrace.h>
 #include <asm/sections.h>
 
-struct pt_regs;
-
 #ifdef CONFIG_ARMV8_DEPRECATED
 bool try_emulate_armv8_deprecated(struct pt_regs *regs, u32 insn);
 #else
@@ -101,4 +100,55 @@ static inline unsigned long arm64_ras_serror_get_severity(unsigned long esr)
 bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned long esr);
 void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr);
 
+static inline void arm64_mops_reset_regs(struct user_pt_regs *regs, unsigned long esr)
+{
+	bool wrong_option = esr & ESR_ELx_MOPS_ISS_WRONG_OPTION;
+	bool option_a = esr & ESR_ELx_MOPS_ISS_OPTION_A;
+	int dstreg = ESR_ELx_MOPS_ISS_DESTREG(esr);
+	int srcreg = ESR_ELx_MOPS_ISS_SRCREG(esr);
+	int sizereg = ESR_ELx_MOPS_ISS_SIZEREG(esr);
+	unsigned long dst, src, size;
+
+	dst = regs->regs[dstreg];
+	src = regs->regs[srcreg];
+	size = regs->regs[sizereg];
+
+	/*
+	 * Put the registers back in the original format suitable for a
+	 * prologue instruction, using the generic return routine from the
+	 * Arm ARM (DDI 0487I.a) rules CNTMJ and MWFQH.
+	 */
+	if (esr & ESR_ELx_MOPS_ISS_MEM_INST) {
+		/* SET* instruction */
+		if (option_a ^ wrong_option) {
+			/* Format is from Option A; forward set */
+			regs->regs[dstreg] = dst + size;
+			regs->regs[sizereg] = -size;
+		}
+	} else {
+		/* CPY* instruction */
+		if (!(option_a ^ wrong_option)) {
+			/* Format is from Option B */
+			if (regs->pstate & PSR_N_BIT) {
+				/* Backward copy */
+				regs->regs[dstreg] = dst - size;
+				regs->regs[srcreg] = src - size;
+			}
+		} else {
+			/* Format is from Option A */
+			if (size & BIT(63)) {
+				/* Forward copy */
+				regs->regs[dstreg] = dst + size;
+				regs->regs[srcreg] = src + size;
+				regs->regs[sizereg] = -size;
+			}
+		}
+	}
+
+	if (esr & ESR_ELx_MOPS_ISS_FROM_EPILOGUE)
+		regs->pc -= 8;
+	else
+		regs->pc -= 4;
+}
 #endif
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 8b70759cdbb9..ede65a20e7dc 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -516,53 +516,7 @@ void do_el1_fpac(struct pt_regs *regs, unsigned long esr)
 
 void do_el0_mops(struct pt_regs *regs, unsigned long esr)
 {
-	bool wrong_option = esr & ESR_ELx_MOPS_ISS_WRONG_OPTION;
-	bool option_a = esr & ESR_ELx_MOPS_ISS_OPTION_A;
-	int dstreg = ESR_ELx_MOPS_ISS_DESTREG(esr);
-	int srcreg = ESR_ELx_MOPS_ISS_SRCREG(esr);
-	int sizereg = ESR_ELx_MOPS_ISS_SIZEREG(esr);
-	unsigned long dst, src, size;
-
-	dst = pt_regs_read_reg(regs, dstreg);
-	src = pt_regs_read_reg(regs, srcreg);
-	size = pt_regs_read_reg(regs, sizereg);
-
-	/*
-	 * Put the registers back in the original format suitable for a
-	 * prologue instruction, using the generic return routine from the
-	 * Arm ARM (DDI 0487I.a) rules CNTMJ and MWFQH.
-	 */
-	if (esr & ESR_ELx_MOPS_ISS_MEM_INST) {
-		/* SET* instruction */
-		if (option_a ^ wrong_option) {
-			/* Format is from Option A; forward set */
-			pt_regs_write_reg(regs, dstreg, dst + size);
-			pt_regs_write_reg(regs, sizereg, -size);
-		}
-	} else {
-		/* CPY* instruction */
-		if (!(option_a ^ wrong_option)) {
-			/* Format is from Option B */
-			if (regs->pstate & PSR_N_BIT) {
-				/* Backward copy */
-				pt_regs_write_reg(regs, dstreg, dst - size);
-				pt_regs_write_reg(regs, srcreg, src - size);
-			}
-		} else {
-			/* Format is from Option A */
-			if (size & BIT(63)) {
-				/* Forward copy */
-				pt_regs_write_reg(regs, dstreg, dst + size);
-				pt_regs_write_reg(regs, srcreg, src + size);
-				pt_regs_write_reg(regs, sizereg, -size);
-			}
-		}
-	}
-
-	if (esr & ESR_ELx_MOPS_ISS_FROM_EPILOGUE)
-		regs->pc -= 8;
-	else
-		regs->pc -= 4;
+	arm64_mops_reset_regs(&regs->user_regs, esr);
 
 	/*
 	 * If single stepping then finish the step before executing the
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 9cfe6bd1dbe4..f99d8af0b9af 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -30,6 +30,7 @@
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
 #include <asm/processor.h>
+#include <asm/traps.h>
 
 struct kvm_exception_table_entry {
 	int insn, fixup;
@@ -265,6 +266,22 @@ static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
 	return __get_fault_info(vcpu->arch.fault.esr_el2, &vcpu->arch.fault);
 }
 
+static bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
+	arm64_mops_reset_regs(vcpu_gp_regs(vcpu), vcpu->arch.fault.esr_el2);
+	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
+
+	/*
+	 * Finish potential single step before executing the prologue
+	 * instruction.
+	 */
+	*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
+	write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
+
+	return true;
+}
+
 static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu)
 {
 	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index c353a06ee7e6..c50f8459e4fc 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -192,6 +192,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
 	[ESR_ELx_EC_DABT_LOW]		= kvm_hyp_handle_dabt_low,
 	[ESR_ELx_EC_WATCHPT_LOW]	= kvm_hyp_handle_watchpt_low,
 	[ESR_ELx_EC_PAC]		= kvm_hyp_handle_ptrauth,
+	[ESR_ELx_EC_MOPS]		= kvm_hyp_handle_mops,
 };
 
 static const exit_handler_fn pvm_exit_handlers[] = {
@@ -203,6 +204,7 @@ static const exit_handler_fn pvm_exit_handlers[] = {
 	[ESR_ELx_EC_DABT_LOW]		= kvm_hyp_handle_dabt_low,
 	[ESR_ELx_EC_WATCHPT_LOW]	= kvm_hyp_handle_watchpt_low,
 	[ESR_ELx_EC_PAC]		= kvm_hyp_handle_ptrauth,
+	[ESR_ELx_EC_MOPS]		= kvm_hyp_handle_mops,
 };
 
 static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 6537f58b1a8c..796202f2e08f 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -126,6 +126,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
 	[ESR_ELx_EC_DABT_LOW]		= kvm_hyp_handle_dabt_low,
 	[ESR_ELx_EC_WATCHPT_LOW]	= kvm_hyp_handle_watchpt_low,
 	[ESR_ELx_EC_PAC]		= kvm_hyp_handle_ptrauth,
+	[ESR_ELx_EC_MOPS]		= kvm_hyp_handle_mops,
 };
 
 static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
An Armv8.8 FEAT_MOPS main or epilogue instruction will take an exception
if executed on a CPU with a different MOPS implementation option (A or
B) than the CPU where the preceding prologue instruction ran. In this
case the OS exception handler is expected to reset the registers and
restart execution from the prologue instruction.

A KVM guest may use the instructions at EL1 at times when the guest is
not able to handle the exception, expecting that the instructions will
only run on one CPU (e.g. when running UEFI boot services in the guest).
As KVM may reschedule the guest between different types of CPUs at any
time (on an asymmetric system), it needs to also handle the resulting
exception itself in case the guest is not able to. A similar situation
will also occur in the future when live migrating a guest from one type
of CPU to another.

Add handling for the MOPS exception to KVM. The handling can be shared
with the EL0 exception handler, as the logic and register layouts are
the same. The exception can be handled right after exiting a guest,
which avoids the cost of returning to the host exit handler.

Similarly to the EL0 exception handler, in case the main or epilogue
instruction is being single stepped, it makes sense to finish the step
before executing the prologue instruction, so advance the single step
state machine.

Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
---
 arch/arm64/include/asm/traps.h          | 54 ++++++++++++++++++++++++-
 arch/arm64/kernel/traps.c               | 48 +---------------------
 arch/arm64/kvm/hyp/include/hyp/switch.h | 17 ++++++++
 arch/arm64/kvm/hyp/nvhe/switch.c        |  2 +
 arch/arm64/kvm/hyp/vhe/switch.c         |  1 +
 5 files changed, 73 insertions(+), 49 deletions(-)
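For reference, the Option A forward-copy arithmetic in arm64_mops_reset_regs() can be sanity-checked with a small user-space model (a hypothetical sketch; the struct and field names are invented, and the kernel version operates on pt_regs and decodes the ESR instead):

```c
#include <assert.h>
#include <stdint.h>

/* Invented stand-ins for the three MOPS registers Xd, Xs, Xn. */
struct mops_state {
	uint64_t dst, src, size;
};

/* In the Option A mid-copy format, Xd/Xs hold the end addresses of the
 * buffers and Xn holds the negated number of bytes still to copy.  The
 * reset returns the registers to prologue form, mirroring the
 * "Forward copy" branch of arm64_mops_reset_regs(). */
static void reset_cpy_forward_option_a(struct mops_state *s)
{
	if (s->size & (1ULL << 63)) {	/* negative Xn => forward copy */
		s->dst += s->size;	/* end address - remaining = restart address */
		s->src += s->size;
		s->size = -s->size;	/* remaining byte count, now positive */
	}
}
```

For example, with 0x30 bytes left of a copy into 0x1010..0x1040, the mid-copy state {0x1040, 0x2040, -0x30} resets to {0x1010, 0x2010, 0x30}, i.e. the prologue restarts on exactly the remaining bytes.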