Message ID | 20220803155011.43721-1-mlevitsk@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | SMM emulation and interrupt shadow fixes | expand |
On 03/08/2022 17:49, Maxim Levitsky wrote: > This patch series is a result of long debug work to find out why > sometimes guests with win11 secure boot > were failing during boot. > > During writing a unit test I found another bug, turns out > that on rsm emulation, if the rsm instruction was done in real > or 32 bit mode, KVM would truncate the restored RIP to 32 bit. > > I also refactored the way we write SMRAM so it is easier > now to understand what is going on. > > The main bug in this series which I fixed is that we > allowed #SMI to happen during the STI interrupt shadow, > and we did nothing to both reset it on #SMI handler > entry and restore it on RSM. > > V3: addressed most of the review feedback from Sean (thanks!) > > Best regards, > Maxim Levitsky > > Maxim Levitsky (13): > bug: introduce ASSERT_STRUCT_OFFSET > KVM: x86: emulator: em_sysexit should update ctxt->mode > KVM: x86: emulator: introduce emulator_recalc_and_set_mode > KVM: x86: emulator: update the emulation mode after rsm > KVM: x86: emulator: update the emulation mode after CR0 write > KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on > the image format > KVM: x86: emulator/smm: add structs for KVM's smram layout > KVM: x86: emulator/smm: use smram structs in the common code > KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore > KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore > KVM: x86: SVM: use smram structs > KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode > capable > KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM > > arch/x86/include/asm/kvm_host.h | 11 +- > arch/x86/kvm/emulate.c | 305 +++++++++++++++++--------------- > arch/x86/kvm/kvm_emulate.h | 223 ++++++++++++++++++++++- > arch/x86/kvm/svm/svm.c | 30 ++-- > arch/x86/kvm/vmx/vmcs12.h | 5 +- > arch/x86/kvm/vmx/vmx.c | 4 +- > arch/x86/kvm/x86.c | 175 +++++++++--------- > include/linux/build_bug.h | 9 + > 8 files changed, 497 insertions(+), 265 deletions(-) > FWIW, we tested the v2 on 5.19 and backported it to 5.15 with minimal adaption required (mostly unrelated context change) and now also updated that backport to the v3 of this patch series. Our reproducer got fixed with either, but v3 now also avoids triggering logs like: Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle Jul 29 07:15:46 mits4 kernel: kvm: vcpu 1: requested 191999 ns lapic timer period limited to 200000 ns Jul 29 11:06:31 mits4 kernel: kvm: vcpu 1: requested 105786 ns lapic timer period limited to 200000 ns which happened earlier (not sure how deep that correlates with the v2 vs. v3, but it stuck out, so mentioning for sake of completeness). For the backport to 5.15 we skipped "KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on the image format", as that constant was there yet and the actual values stayed the same for our case FWICT and adapted to slight context changes for the others. So, the approach seems to fix our issue and we are already rolling out a kernel to users for testing and got positive feedback there too. With above in mind: Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> It would be also great to see this backported to still supported upstream stable kernels from 5.15 onwards, as there the TDP MMU got by default enabled, and that is at least increasing the chance of our reproducer to trigger dramatically. thx & cheers Thomas
On Wed, 2022-08-10 at 14:00 +0200, Thomas Lamprecht wrote: > On 03/08/2022 17:49, Maxim Levitsky wrote: > > This patch series is a result of long debug work to find out why > > sometimes guests with win11 secure boot > > were failing during boot. > > > > During writing a unit test I found another bug, turns out > > that on rsm emulation, if the rsm instruction was done in real > > or 32 bit mode, KVM would truncate the restored RIP to 32 bit. > > > > I also refactored the way we write SMRAM so it is easier > > now to understand what is going on. > > > > The main bug in this series which I fixed is that we > > allowed #SMI to happen during the STI interrupt shadow, > > and we did nothing to both reset it on #SMI handler > > entry and restore it on RSM. > > > > V3: addressed most of the review feedback from Sean (thanks!) > > > > Best regards, > > Maxim Levitsky > > > > Maxim Levitsky (13): > > bug: introduce ASSERT_STRUCT_OFFSET > > KVM: x86: emulator: em_sysexit should update ctxt->mode > > KVM: x86: emulator: introduce emulator_recalc_and_set_mode > > KVM: x86: emulator: update the emulation mode after rsm > > KVM: x86: emulator: update the emulation mode after CR0 write > > KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on > > the image format > > KVM: x86: emulator/smm: add structs for KVM's smram layout > > KVM: x86: emulator/smm: use smram structs in the common code > > KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore > > KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore > > KVM: x86: SVM: use smram structs > > KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode > > capable > > KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM > > > > arch/x86/include/asm/kvm_host.h | 11 +- > > arch/x86/kvm/emulate.c | 305 +++++++++++++++++--------------- > > arch/x86/kvm/kvm_emulate.h | 223 ++++++++++++++++++++++- > > arch/x86/kvm/svm/svm.c | 30 ++-- > > arch/x86/kvm/vmx/vmcs12.h | 5 +- > > arch/x86/kvm/vmx/vmx.c | 4 +- > > arch/x86/kvm/x86.c | 175 +++++++++--------- > > include/linux/build_bug.h | 9 + > > 8 files changed, 497 insertions(+), 265 deletions(-) > > > > FWIW, we tested the v2 on 5.19 and backported it to 5.15 with minimal adaption > required (mostly unrelated context change) and now also updated that backport > to the v3 of this patch series. > > Our reproducer got fixed with either, but v3 now also avoids triggering logs like: > > Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle > Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle > Jul 29 07:15:46 mits4 kernel: kvm: vcpu 1: requested 191999 ns lapic timer period limited to 200000 ns > Jul 29 11:06:31 mits4 kernel: kvm: vcpu 1: requested 105786 ns lapic timer period limited to 200000 ns > > which happened earlier (not sure how deep that correlates with the v2 vs. v3, but > it stuck out, so mentioning for sake of completeness). This is likely just a coincidence because V3 should not contain any functional changes vs v2. (If I remember correctly) > > For the backport to 5.15 we skipped "KVM: x86: emulator/smm: number of GPRs in > the SMRAM image depends on the image format", as that constant was there yet and > the actual values stayed the same for our case FWICT and adapted to slight context > changes for the others. > > So, the approach seems to fix our issue and we are already rolling out a kernel > to users for testing and got positive feedback there too. > > With above in mind: > > Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Thank you very much for testing! > > It would be also great to see this backported to still supported upstream stable kernels > from 5.15 onwards, as there the TDP MMU got by default enabled, and that is at least > increasing the chance of our reproducer to trigger dramatically. Best regards, Maxim Levitsky > > thx & cheers > Thomas >