Message ID: 20240226190344.787149-11-pbonzini@redhat.com (mailing list archive)
State: New, archived
Series: KVM: SEV: allow customizing VMSA features
On Mon, Feb 26, 2024 at 02:03:39PM -0500, Paolo Bonzini wrote: > Some VM types have characteristics in common; in fact, the only use > of VM types right now is kvm_arch_has_private_mem and it assumes that > _all_ nonzero VM types have private memory. > > We will soon introduce a VM type for SEV and SEV-ES VMs, and at that > point we will have two special characteristics of confidential VMs > that depend on the VM type: not just if memory is private, but > also whether guest state is protected. For the latter we have > kvm->arch.guest_state_protected, which is only set on a fully initialized > VM. > > For VM types with protected guest state, we can actually fix a problem in > the SEV-ES implementation, where ioctls to set registers do not cause an > error even if the VM has been initialized and the guest state encrypted. > Make sure that when using VM types that will become an error. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > Message-Id: <20240209183743.22030-7-pbonzini@redhat.com> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > arch/x86/include/asm/kvm_host.h | 7 ++- > arch/x86/kvm/x86.c | 95 +++++++++++++++++++++++++++------ > 2 files changed, 84 insertions(+), 18 deletions(-) > > @@ -5552,9 +5561,13 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu, > } > > > -static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu, > - u8 *state, unsigned int size) > +static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu, > + u8 *state, unsigned int size) > { > + if (vcpu->kvm->arch.has_protected_state && > + fpstate_is_confidential(&vcpu->arch.guest_fpu)) > + return -EINVAL; > + > /* > * Only copy state for features that are enabled for the guest. The > * state itself isn't problematic, but setting bits in the header for > @@ -5571,22 +5584,27 @@ static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu, > XFEATURE_MASK_FPSSE; > > if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) > - return; > + return 0; > > fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, state, size, > supported_xcr0, vcpu->arch.pkru); > + return 0; > } > > -static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, > - struct kvm_xsave *guest_xsave) > +static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, > + struct kvm_xsave *guest_xsave) > { > - kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region, > - sizeof(guest_xsave->region)); > + return kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region, > + sizeof(guest_xsave->region)); > } > > static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, > struct kvm_xsave *guest_xsave) > { > + if (vcpu->kvm->arch.has_protected_state && > + fpstate_is_confidential(&vcpu->arch.guest_fpu)) > + return -EINVAL; > + > if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) > return 0; I've been trying to get SNP running on top of these patches and hit and issue with these due to fpstate_set_confidential() being done during svm_vcpu_create(), so when QEMU tries to sync FPU state prior to calling SNP_LAUNCH_FINISH it errors out. I think the same would happen with SEV-ES as well. Maybe fpstate_set_confidential() should be relocated to SEV_LAUNCH_FINISH site as part of these patches? Also, do you happen to have a pointer to the WIP QEMU patches? Happy to help with posting/testing those since we'll need similar for SEV_INIT2-based SNP patches. Thanks, Mike
On Wed, Mar 13, 2024 at 09:49:52PM -0500, Michael Roth wrote: > On Mon, Feb 26, 2024 at 02:03:39PM -0500, Paolo Bonzini wrote: > > Some VM types have characteristics in common; in fact, the only use > > of VM types right now is kvm_arch_has_private_mem and it assumes that > > _all_ nonzero VM types have private memory. > > > > We will soon introduce a VM type for SEV and SEV-ES VMs, and at that > > point we will have two special characteristics of confidential VMs > > that depend on the VM type: not just if memory is private, but > > also whether guest state is protected. For the latter we have > > kvm->arch.guest_state_protected, which is only set on a fully initialized > > VM. > > > > For VM types with protected guest state, we can actually fix a problem in > > the SEV-ES implementation, where ioctls to set registers do not cause an > > error even if the VM has been initialized and the guest state encrypted. > > Make sure that when using VM types that will become an error. > > > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > > Message-Id: <20240209183743.22030-7-pbonzini@redhat.com> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > > --- > > arch/x86/include/asm/kvm_host.h | 7 ++- > > arch/x86/kvm/x86.c | 95 +++++++++++++++++++++++++++------ > > 2 files changed, 84 insertions(+), 18 deletions(-) > > > > @@ -5552,9 +5561,13 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu, > > } > > > > > > -static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu, > > - u8 *state, unsigned int size) > > +static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu, > > + u8 *state, unsigned int size) > > { > > + if (vcpu->kvm->arch.has_protected_state && > > + fpstate_is_confidential(&vcpu->arch.guest_fpu)) > > + return -EINVAL; > > + > > /* > > * Only copy state for features that are enabled for the guest. The > > * state itself isn't problematic, but setting bits in the header for > > @@ -5571,22 +5584,27 @@ static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu, > > XFEATURE_MASK_FPSSE; > > > > if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) > > - return; > > + return 0; > > > > fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, state, size, > > supported_xcr0, vcpu->arch.pkru); > > + return 0; > > } > > > > -static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, > > - struct kvm_xsave *guest_xsave) > > +static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, > > + struct kvm_xsave *guest_xsave) > > { > > - kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region, > > - sizeof(guest_xsave->region)); > > + return kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region, > > + sizeof(guest_xsave->region)); > > } > > > > static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, > > struct kvm_xsave *guest_xsave) > > { > > + if (vcpu->kvm->arch.has_protected_state && > > + fpstate_is_confidential(&vcpu->arch.guest_fpu)) > > + return -EINVAL; > > + > > if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) > > return 0; > > I've been trying to get SNP running on top of these patches and hit and > issue with these due to fpstate_set_confidential() being done during > svm_vcpu_create(), so when QEMU tries to sync FPU state prior to calling > SNP_LAUNCH_FINISH it errors out. I think the same would happen with > SEV-ES as well. > > Maybe fpstate_set_confidential() should be relocated to SEV_LAUNCH_FINISH > site as part of these patches? 
Talked to Tom a bit about this and that might not make much sense unless we actually want to add some code to sync that FPU state into the VMSA prior to encryption/measurement. Otherwise, it might as well be set to confidential as soon as vCPU is created. And if userspace wants to write FPU register state that will not actually become part of the guest state, it probably does make sense to return an error for new VM types and leave it to userspace to deal with special-casing that vs. the other ioctls like SET_REGS/SREGS/etc. -Mike > > Also, do you happen to have a pointer to the WIP QEMU patches? Happy to > help with posting/testing those since we'll need similar for > SEV_INIT2-based SNP patches. > > Thanks, > > Mike >
On Thu, Mar 14, 2024, Michael Roth wrote: > On Wed, Mar 13, 2024 at 09:49:52PM -0500, Michael Roth wrote: > > I've been trying to get SNP running on top of these patches and hit and > > issue with these due to fpstate_set_confidential() being done during > > svm_vcpu_create(), so when QEMU tries to sync FPU state prior to calling > > SNP_LAUNCH_FINISH it errors out. I think the same would happen with > > SEV-ES as well. > > > > Maybe fpstate_set_confidential() should be relocated to SEV_LAUNCH_FINISH > > site as part of these patches? > > Talked to Tom a bit about this and that might not make much sense unless > we actually want to add some code to sync that FPU state into the VMSA > prior to encryption/measurement. Otherwise, it might as well be set to > confidential as soon as vCPU is created. > > And if userspace wants to write FPU register state that will not actually > become part of the guest state, it probably does make sense to return an > error for new VM types and leave it to userspace to deal with > special-casing that vs. the other ioctls like SET_REGS/SREGS/etc. Won't regs and sregs suffer the same fate? That might not matter _today_ for "real" VMs, but it would be a blocking issue for selftests, which need to stuff state to jumpstart vCPUs. And maybe someday real VMs will catch up to the times and stop starting at the RESET vector...
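For context on the selftests concern Sean raises, here is a minimal sketch (not from the thread) of the pre-launch register stuffing that selftests depend on, assuming the current tools/testing/selftests/kvm helper API where vcpu_regs_get()/vcpu_regs_set() wrap KVM_GET_REGS/KVM_SET_REGS:

/*
 * Illustrative only: how a selftest typically stuffs initial register
 * state before running a vCPU.  If KVM_GET_REGS/KVM_SET_REGS returned
 * -EINVAL unconditionally for the new VM types, this common pattern
 * would break even before the guest state is finalized.
 */
#include "kvm_util.h"

static void stuff_initial_state(struct kvm_vcpu *vcpu, uint64_t entry_rip)
{
	struct kvm_regs regs;

	vcpu_regs_get(vcpu, &regs);	/* KVM_GET_REGS */
	regs.rip = entry_rip;		/* jump straight to the test's guest code */
	vcpu_regs_set(vcpu, &regs);	/* KVM_SET_REGS */
}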
On Thu, Mar 14, 2024 at 03:56:27PM -0700, Sean Christopherson wrote: > On Thu, Mar 14, 2024, Michael Roth wrote: > > On Wed, Mar 13, 2024 at 09:49:52PM -0500, Michael Roth wrote: > > > I've been trying to get SNP running on top of these patches and hit and > > > issue with these due to fpstate_set_confidential() being done during > > > svm_vcpu_create(), so when QEMU tries to sync FPU state prior to calling > > > SNP_LAUNCH_FINISH it errors out. I think the same would happen with > > > SEV-ES as well. > > > > > > Maybe fpstate_set_confidential() should be relocated to SEV_LAUNCH_FINISH > > > site as part of these patches? > > > > Talked to Tom a bit about this and that might not make much sense unless > > we actually want to add some code to sync that FPU state into the VMSA > > prior to encryption/measurement. Otherwise, it might as well be set to > > confidential as soon as vCPU is created. > > > > And if userspace wants to write FPU register state that will not actually > > become part of the guest state, it probably does make sense to return an > > error for new VM types and leave it to userspace to deal with > > special-casing that vs. the other ioctls like SET_REGS/SREGS/etc. > > Won't regs and sregs suffer the same fate? That might not matter _today_ for > "real" VMs, but it would be a blocking issue for selftests, which need to stuff > state to jumpstart vCPUs. SET_REGS/SREGS and the others only throw an error when vcpu->arch.guest_state_protected gets set, which doesn't happen until sev_launch_update_vmsa(). So in those cases userspace is still able to sync additional/non-reset state prior initial launch. It's just XSAVE/XSAVE2 that are a bit more restrictive because they check fpstate_is_confidential() instead, which gets set during vCPU creation. Somewhat related, but just noticed that KVM_SET_FPU also relies on fpstate_is_confidential() but still silently returns 0 with this series. Seems like it should be handled the same way as XSAVE/XSAVE2, whatever we end up doing. -Mike > > And maybe someday real VMs will catch up to the times and stop starting at the > RESET vector... >
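As an illustration of the KVM_SET_FPU gap Mike points out, a sketch (not part of the posted series) of the x86 handler kvm_arch_vcpu_ioctl_set_fpu() given the same treatment the series gives KVM_SET_XSAVE; the actual state copy is elided:

/*
 * Sketch only: reject KVM_SET_FPU for VM types with protected state once
 * the FPU state belongs to the (to-be-)encrypted VMSA, instead of
 * silently dropping the write as the current code does.
 */
int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
	if (vcpu->kvm->arch.has_protected_state &&
	    fpstate_is_confidential(&vcpu->arch.guest_fpu))
		return -EINVAL;

	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
		return 0;

	/* ... existing copy of *fpu into vcpu->arch.guest_fpu.fpstate ... */
	return 0;
}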
On Thu, Mar 14, 2024, Michael Roth wrote:
> On Thu, Mar 14, 2024 at 03:56:27PM -0700, Sean Christopherson wrote:
> > On Thu, Mar 14, 2024, Michael Roth wrote:
> > > On Wed, Mar 13, 2024 at 09:49:52PM -0500, Michael Roth wrote:
> > > > I've been trying to get SNP running on top of these patches and hit and
> > > > issue with these due to fpstate_set_confidential() being done during
> > > > svm_vcpu_create(), so when QEMU tries to sync FPU state prior to calling
> > > > SNP_LAUNCH_FINISH it errors out. I think the same would happen with
> > > > SEV-ES as well.
> > > > Maybe fpstate_set_confidential() should be relocated to SEV_LAUNCH_FINISH
> > > > site as part of these patches?
> > >
> > > Talked to Tom a bit about this and that might not make much sense unless
> > > we actually want to add some code to sync that FPU state into the VMSA

Is manually copying required for register state? If so, manually copying everything
seems like the way to go, otherwise we'll end up with a confusing ABI where a
rather arbitrary set of bits are (not) configurable by userspace.

> > > prior to encryption/measurement. Otherwise, it might as well be set to
> > > confidential as soon as vCPU is created.
> > >
> > > And if userspace wants to write FPU register state that will not actually
> > > become part of the guest state, it probably does make sense to return an
> > > error for new VM types and leave it to userspace to deal with
> > > special-casing that vs. the other ioctls like SET_REGS/SREGS/etc.
> >
> > Won't regs and sregs suffer the same fate? That might not matter _today_ for
> > "real" VMs, but it would be a blocking issue for selftests, which need to stuff
> > state to jumpstart vCPUs.
>
> SET_REGS/SREGS and the others only throw an error when
> vcpu->arch.guest_state_protected gets set, which doesn't happen until

Ah, I misread the diff and didn't see the existing check on fpstate_is_confidential().

Side topic, I could have sworn KVM didn't allocate the guest fpstate for SEV-ES,
but git blame says otherwise. Avoiding that allocation would have been an argument
for immediately marking the fpstate confidential.

That said, any reason not to free the state when the fpstate is marked confidential?

> sev_launch_update_vmsa(). So in those cases userspace is still able to sync
> additional/non-reset state prior initial launch. It's just XSAVE/XSAVE2 that
> are a bit more restrictive because they check fpstate_is_confidential()
> instead, which gets set during vCPU creation.
>
> Somewhat related, but just noticed that KVM_SET_FPU also relies on
> fpstate_is_confidential() but still silently returns 0 with this series.
> Seems like it should be handled the same way as XSAVE/XSAVE2, whatever we
> end up doing.

+1

Also, I think a less confusing and more robust way to deal with the new VM types
would be to condition only the return code on whether or not the VM has protected
state, e.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9d670a45aea4..0e245738d4c5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5606,10 +5606,6 @@ static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
 					 u8 *state, unsigned int size)
 {
-	if (vcpu->kvm->arch.has_protected_state &&
-	    fpstate_is_confidential(&vcpu->arch.guest_fpu))
-		return -EINVAL;
-
 	/*
 	 * Only copy state for features that are enabled for the guest.  The
 	 * state itself isn't problematic, but setting bits in the header for
@@ -5626,7 +5622,7 @@ static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
 		XFEATURE_MASK_FPSSE;
 
 	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
-		return 0;
+		return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
 
 	fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, state, size,
 				       supported_xcr0, vcpu->arch.pkru);
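For illustration of the pattern Sean is suggesting applied to the setter side (not code from the thread, and assuming the current kvm_caps.supported_xcr0 / fpu_copy_uabi_to_guest_fpstate() call in the existing handler), a single confidentiality check whose return value alone depends on the VM type might look like:

static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
					struct kvm_xsave *guest_xsave)
{
	/* One check; only the error code varies with the VM type. */
	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
		return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;

	return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
					      guest_xsave->region,
					      kvm_caps.supported_xcr0,
					      &vcpu->arch.pkru);
}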
On Fri, Mar 15, 2024 at 3:57 PM Sean Christopherson <seanjc@google.com> wrote: > > On Thu, Mar 14, 2024, Michael Roth wrote: > > On Thu, Mar 14, 2024 at 03:56:27PM -0700, Sean Christopherson wrote: > > > On Thu, Mar 14, 2024, Michael Roth wrote: > > > > On Wed, Mar 13, 2024 at 09:49:52PM -0500, Michael Roth wrote: > > > > > I've been trying to get SNP running on top of these patches and hit and > > > > > issue with these due to fpstate_set_confidential() being done during > > > > > svm_vcpu_create(), so when QEMU tries to sync FPU state prior to calling > > > > > SNP_LAUNCH_FINISH it errors out. I think the same would happen with > > > > > SEV-ES as well. > > > > > Maybe fpstate_set_confidential() should be relocated to SEV_LAUNCH_FINISH > > > > > site as part of these patches? > > > > > > > > Talked to Tom a bit about this and that might not make much sense unless > > > > we actually want to add some code to sync that FPU state into the VMSA > > Is manually copying required for register state? If so, manually copying everything > seems like the way to go, otherwise we'll end up with a confusing ABI where a > rather arbitrary set of bits are (not) configurable by userspace. Yes, see sev_es_sync_vmsa. I'll add FPU as well. > > SET_REGS/SREGS and the others only throw an error when > > vcpu->arch.guest_state_protected gets set, which doesn't happen until > > Ah, I misread the diff and didn't see the existing check on fpstate_is_confidential(). > > Side topic, I could have sworn KVM didn't allocate the guest fpstate for SEV-ES, > but git blame says otherwise. Avoiding that allocation would have been an argument > for immediately marking the fpstate confidential. > > That said, any reason not to free the state when the fpstate is marked confidential? No reason not to do it, except not wanting to add more cases to code that's already pretty hairy. > > sev_launch_update_vmsa(). So in those cases userspace is still able to sync > > additional/non-reset state prior initial launch. It's just XSAVE/XSAVE2 that > > are a bit more restrictive because they check fpstate_is_confidential() > > instead, which gets set during vCPU creation. > > > > Somewhat related, but just noticed that KVM_SET_FPU also relies on > > fpstate_is_confidential() but still silently returns 0 with this series. > > Seems like it should be handled the same way as XSAVE/XSAVE2, whatever we > > end up doing. > > +1 > > Also, I think a less confusing and more robust way to deal with the new VM types > would be to condition only the return code on whether or not the VM has protected > state Makes sense (I found KVM_GET/SET_FPU independently and will fix that as well in the next submission). Paolo
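To make the sev_es_sync_vmsa() direction concrete, a rough sketch of syncing FPU state into the VMSA before it is encrypted at KVM_SEV_LAUNCH_UPDATE_VMSA time. This is not Paolo's actual patch; the fpreg_xmm and xcr0 field names are assumptions about struct sev_es_save_area's layout and may not match the final code:

/* Called from sev_es_sync_vmsa() after the existing GPR/segment sync. */
static void sev_es_sync_fpu_to_vmsa(struct vcpu_svm *svm,
				    struct sev_es_save_area *save)
{
	struct fpstate *fpstate = svm->vcpu.arch.guest_fpu.fpstate;

	/* Capture the userspace-visible SSE state and XCR0 as well. */
	memcpy(save->fpreg_xmm, fpstate->regs.fxsave.xmm_space,
	       sizeof(save->fpreg_xmm));
	save->xcr0 = svm->vcpu.arch.xcr0;
}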
On Thu, Mar 14, 2024 at 3:50 AM Michael Roth <michael.roth@amd.com> wrote:
> I've been trying to get SNP running on top of these patches and hit and
> issue with these due to fpstate_set_confidential() being done during
> svm_vcpu_create(), so when QEMU tries to sync FPU state prior to calling
> SNP_LAUNCH_FINISH it errors out. I think the same would happen with
> SEV-ES as well.
>
> Maybe fpstate_set_confidential() should be relocated to SEV_LAUNCH_FINISH
> site as part of these patches?

To SEV_LAUNCH_UPDATE_VMSA, I think, since that's where the last opportunity
lies to sync the contents of struct kvm_vcpu.

> Also, do you happen to have a pointer to the WIP QEMU patches? Happy to
> help with posting/testing those since we'll need similar for
> SEV_INIT2-based SNP patches.

Pushed to https://gitlab.com/bonzini/qemu, branch sevinit2.

There is a hackish commit "runstate: skip initial CPU reset if reset is not
actually possible" that needs some auditing, because I'd like to replace

-    cpu_synchronize_all_post_reset();
+    if (cpus_are_resettable()) {
+        cpu_synchronize_all_post_reset();
+    } else {
+        /* Assume that cpu_synchronize_all_post_init() was enough. */
+        assert(runstate_check(RUN_STATE_PRELAUNCH));
+    }

with

-    cpu_synchronize_all_post_reset();
+    /*
+     * cpu_synchronize_all_post_init() has already happened if the VM hasn't
+     * launched.
+     */
+    if (!runstate_check(RUN_STATE_PRELAUNCH)) {
+        cpu_synchronize_all_post_reset();
+    }

Paolo
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 651ce10cc152..4c05b3001475 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1279,12 +1279,14 @@ enum kvm_apicv_inhibit {
 };
 
 struct kvm_arch {
-	unsigned long vm_type;
 	unsigned long n_used_mmu_pages;
 	unsigned long n_requested_mmu_pages;
 	unsigned long n_max_mmu_pages;
 	unsigned int indirect_shadow_pages;
 	u8 mmu_valid_gen;
+	u8 vm_type;
+	bool has_private_mem;
+	bool has_protected_state;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
@@ -2136,8 +2138,9 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
 void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 		       int tdp_max_root_level, int tdp_huge_page_level);
 
+
 #ifdef CONFIG_KVM_PRIVATE_MEM
-#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.vm_type != KVM_X86_DEFAULT_VM)
+#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
 #else
 #define kvm_arch_has_private_mem(kvm) false
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a12af5042d82..9d91eb1b3080 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5510,12 +5510,16 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
-					     struct kvm_debugregs *dbgregs)
+static int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
+					    struct kvm_debugregs *dbgregs)
 {
 	unsigned long val;
 	unsigned int i;
 
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	memset(dbgregs, 0, sizeof(*dbgregs));
 
 	BUILD_BUG_ON(ARRAY_SIZE(vcpu->arch.db) != ARRAY_SIZE(dbgregs->db));
@@ -5525,6 +5529,7 @@ static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
 	kvm_get_dr(vcpu, 6, &val);
 	dbgregs->dr6 = val;
 	dbgregs->dr7 = vcpu->arch.dr7;
+	return 0;
 }
 
 static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
@@ -5532,6 +5537,10 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 {
 	unsigned int i;
 
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	if (dbgregs->flags)
 		return -EINVAL;
 
@@ -5552,9 +5561,13 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 }
 
 
-static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
-					  u8 *state, unsigned int size)
+static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
+					 u8 *state, unsigned int size)
 {
+	if (vcpu->kvm->arch.has_protected_state &&
+	    fpstate_is_confidential(&vcpu->arch.guest_fpu))
+		return -EINVAL;
+
 	/*
 	 * Only copy state for features that are enabled for the guest.  The
 	 * state itself isn't problematic, but setting bits in the header for
@@ -5571,22 +5584,27 @@ static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
 		XFEATURE_MASK_FPSSE;
 
 	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
-		return;
+		return 0;
 
 	fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, state, size,
 				       supported_xcr0, vcpu->arch.pkru);
+	return 0;
 }
 
-static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
-					 struct kvm_xsave *guest_xsave)
+static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
+					struct kvm_xsave *guest_xsave)
 {
-	kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region,
-				      sizeof(guest_xsave->region));
+	return kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region,
+					     sizeof(guest_xsave->region));
 }
 
 static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
 					struct kvm_xsave *guest_xsave)
 {
+	if (vcpu->kvm->arch.has_protected_state &&
+	    fpstate_is_confidential(&vcpu->arch.guest_fpu))
+		return -EINVAL;
+
 	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
 		return 0;
 
@@ -5596,18 +5614,23 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
 					&vcpu->arch.pkru);
 }
 
-static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,
-					struct kvm_xcrs *guest_xcrs)
+static int kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,
+				       struct kvm_xcrs *guest_xcrs)
 {
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	if (!boot_cpu_has(X86_FEATURE_XSAVE)) {
 		guest_xcrs->nr_xcrs = 0;
-		return;
+		return 0;
 	}
 
 	guest_xcrs->nr_xcrs = 1;
 	guest_xcrs->flags = 0;
 	guest_xcrs->xcrs[0].xcr = XCR_XFEATURE_ENABLED_MASK;
 	guest_xcrs->xcrs[0].value = vcpu->arch.xcr0;
+	return 0;
 }
 
 static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
@@ -5615,6 +5638,10 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
 {
 	int i, r = 0;
 
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	if (!boot_cpu_has(X86_FEATURE_XSAVE))
 		return -EINVAL;
 
@@ -5997,7 +6024,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	case KVM_GET_DEBUGREGS: {
 		struct kvm_debugregs dbgregs;
 
-		kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
+		r = kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
+		if (r < 0)
+			break;
 
 		r = -EFAULT;
 		if (copy_to_user(argp, &dbgregs,
@@ -6027,7 +6056,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		if (!u.xsave)
 			break;
 
-		kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave);
+		r = kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave);
+		if (r < 0)
+			break;
 
 		r = -EFAULT;
 		if (copy_to_user(argp, u.xsave, sizeof(struct kvm_xsave)))
@@ -6056,7 +6087,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		if (!u.xsave)
 			break;
 
-		kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, size);
+		r = kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, size);
+		if (r < 0)
+			break;
 
 		r = -EFAULT;
 		if (copy_to_user(argp, u.xsave, size))
@@ -6072,7 +6105,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		if (!u.xcrs)
 			break;
 
-		kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs);
+		r = kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs);
+		if (r < 0)
+			break;
 
 		r = -EFAULT;
 		if (copy_to_user(argp, u.xcrs,
@@ -6216,6 +6251,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	}
 #endif
 	case KVM_GET_SREGS2: {
+		r = -EINVAL;
+		if (vcpu->kvm->arch.has_protected_state &&
+		    vcpu->arch.guest_state_protected)
+			goto out;
+
 		u.sregs2 = kzalloc(sizeof(struct kvm_sregs2), GFP_KERNEL);
 		r = -ENOMEM;
 		if (!u.sregs2)
@@ -6228,6 +6268,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		break;
 	}
 	case KVM_SET_SREGS2: {
+		r = -EINVAL;
+		if (vcpu->kvm->arch.has_protected_state &&
+		    vcpu->arch.guest_state_protected)
+			goto out;
+
 		u.sregs2 = memdup_user(argp, sizeof(struct kvm_sregs2));
 		if (IS_ERR(u.sregs2)) {
 			r = PTR_ERR(u.sregs2);
@@ -11444,6 +11489,10 @@ static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 
 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	vcpu_load(vcpu);
 	__get_regs(vcpu, regs);
 	vcpu_put(vcpu);
@@ -11485,6 +11534,10 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 
 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	vcpu_load(vcpu);
 	__set_regs(vcpu, regs);
 	vcpu_put(vcpu);
@@ -11557,6 +11610,10 @@ static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
 				  struct kvm_sregs *sregs)
 {
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	vcpu_load(vcpu);
 	__get_sregs(vcpu, sregs);
 	vcpu_put(vcpu);
@@ -11824,6 +11881,10 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 {
 	int ret;
 
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	vcpu_load(vcpu);
 	ret = __set_sregs(vcpu, sregs);
 	vcpu_put(vcpu);
@@ -12513,6 +12574,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		return -EINVAL;
 
 	kvm->arch.vm_type = type;
+	kvm->arch.has_private_mem =
+		(type == KVM_X86_SW_PROTECTED_VM);
 
 	ret = kvm_page_track_init(kvm);
 	if (ret)
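From the userspace side, the behavioral change under discussion looks roughly like the following illustration (not from the patch): once a VM type with protected state has finalized and encrypted its guest state, the register accessors fail with EINVAL instead of silently succeeding. VM/vCPU setup and error handling are omitted.

#include <errno.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void try_get_regs(int vcpu_fd)
{
	struct kvm_regs regs;

	/* On a protected-state VM type, this now reports an error. */
	if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) < 0 && errno == EINVAL)
		printf("guest state is protected; KVM_GET_REGS rejected\n");
}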