Message ID: 20211210164620.11636-1-michael.roth@amd.com
Series: KVM: selftests: Add support for test-selectable ucall implementations
On 12/10/21 17:46, Michael Roth wrote: > These patches are based on kvm/next, and are also available at: > > https://github.com/mdroth/linux/commits/sev-selftests-ucall-rfc1 Thanks, this is a nice implementation of the concept. I'll check s390 before the next merge window, as I intend to merge this and the SEV tests (which I have already confirmed to work; I haven't yet checked SEV-ES because it's a bit harder for me to install new kernels on the SEV-ES machine I have access to). Paolo > == BACKGROUND == > > These patches are a prerequisite for adding selftest support for SEV guests > and possibly other confidential computing implementations in the future. > They were motivated by a suggestion Paolo made in response to the initial > SEV selftest RFC: > > https://lore.kernel.org/lkml/20211025035833.yqphcnf5u3lk4zgg@amd.com/T/#m959b56f9fb4ae6ab973f6ab50fe3ddfacd7c5617 > > Since the changes touch multiple archs and ended up creating a bit more churn > than expected, I thought it would be a good idea to carve this out into a > separate standalone series for reviewers who may be more interested in the > ucall changes than anything SEV-related. > > To summarize, x86 relies on a ucall based on using PIO instructions to generate > an exit to userspace and provide the GVA of a dynamically-allocated ucall > struct that resides in guest memory and contains information about how to > handle/interpret the exit. This doesn't work for SEV guests for 3 main reasons: > > 1) The guest memory is generally encrypted during run-time, so the guest > needs to ensure the ucall struct is allocated in shared memory. > 2) The guest page table is also encrypted, so the address would need to be a > GPA instead of a GVA. > 3) The guest vCPU registers may also be encrypted in the case of > SEV-ES/SEV-SNP, so the approach of examining vCPU register state has > additional requirements such as requiring guest code to implement a #VC > handler that can provide the appropriate registers via a vmgexit. > > To address these issues, the SEV selftest RFC1 patchset introduced a set of new > SEV-specific interfaces that closely mirrored the functionality of > ucall()/get_ucall(), but relied on a pre-allocated/static ucall buffer in > shared guest memory so that guest code could pass messages/state to the host > by simply writing to this pre-arranged shared memory region and then generating > an exit to userspace (via a halt instruction). > > Paolo suggested instead implementing support for test/guest-specific ucall > implementations that could be used as an alternative to the default PIO-based > ucall implementations as-needed based on test/guest requirements, while still > allowing for tests to use a common set of interfaces like ucall()/get_ucall(). > > == OVERVIEW == > > This series implements the above functionality by introducing a new ucall_ops > struct that can be used to register a particular ucall implementation as needed, > then re-implements x86/arm64/s390x in terms of the ucall_ops.
> > But for the purposes of introducing a new ucall_ops implementation appropriate > for SEV, there are a couple of issues that resulted in the need for some additional > ucall interfaces as well: > > a) ucall() doesn't take a pointer to the ucall struct it modifies, so to make > it work in the case of an implementation that relies on a pre-allocated ucall > struct in shared guest memory some sort of global lookup functionality > would be needed to locate the appropriate ucall struct for a particular > VM/vcpu combination, and this would need to be made accessible for use by > the guest as well. Guests would then need some way of determining what > VM/vcpu identifiers they need to use to do the lookup, which to do reliably > would likely require seeding the guest with those identifiers in advance, > which is possible, but much more easily achievable by simply adding a > ucall() alternative that accepts a pointer to the ucall struct for that > particular VM/vcpu. > > b) get_ucall() *does* take a pointer to a ucall struct, but currently zeroes > it out and uses it to copy the guest's ucall struct into. It *could* be > re-purposed to handle the case where the pointer is an actual pointer to > the ucall struct in shared guest memory, but that could cause problems > since callers would need some idea of what the underlying ucall > implementation expects. Ideally the interfaces would be agnostic to the > ucall implementation. > > So to address those issues, this series also allows ucall implementations to > optionally be extended to support a set of 'shared' ops that are used in the > following manner: > > host: > uc_gva = ucall_shared_alloc() > setup_vm_args(vm, uc_gva) > > guest: > ucall_shared(uc_gva, ...) > > host: > uget_ucall_shared(uc_gva, ...) > > and then implements a new ucall implementation, ucall_ops_halt, based around > these shared interfaces and halt instructions. > > While this doesn't really meet the initial goal of re-using the existing > ucall interfaces as-is, the hope is that these *_shared interfaces are > general enough to be re-usable for things other than SEV, or at least improve on > code readability over the initial SEV-specific interfaces. > > Any review/comments are greatly appreciated!
> > ---------------------------------------------------------------- > Michael Roth (10): > kvm: selftests: move base kvm_util.h declarations to kvm_util_base.h > kvm: selftests: move ucall declarations into ucall_common.h > kvm: selftests: introduce ucall_ops for test/arch-specific ucall implementations > kvm: arm64: selftests: use ucall_ops to define default ucall implementation > (COMPILE-TESTED ONLY) kvm: s390: selftests: use ucall_ops to define default ucall implementation > kvm: selftests: add ucall interfaces based around shared memory > kvm: selftests: add ucall_shared ops for PIO > kvm: selftests: introduce ucall implementation based on halt instructions > kvm: selftests: add GUEST_SHARED_* macros for shared ucall implementations > kvm: selftests: add ucall_test to test various ucall functionality > > tools/testing/selftests/kvm/.gitignore | 1 + > tools/testing/selftests/kvm/Makefile | 5 +- > .../testing/selftests/kvm/include/aarch64/ucall.h | 18 + > tools/testing/selftests/kvm/include/kvm_util.h | 408 +-------------------- > .../testing/selftests/kvm/include/kvm_util_base.h | 368 +++++++++++++++++++ > tools/testing/selftests/kvm/include/s390x/ucall.h | 18 + > tools/testing/selftests/kvm/include/ucall_common.h | 147 ++++++++ > tools/testing/selftests/kvm/include/x86_64/ucall.h | 19 + > tools/testing/selftests/kvm/lib/aarch64/ucall.c | 43 +-- > tools/testing/selftests/kvm/lib/s390x/ucall.c | 45 +-- > tools/testing/selftests/kvm/lib/ucall_common.c | 133 +++++++ > tools/testing/selftests/kvm/lib/x86_64/ucall.c | 82 +++-- > tools/testing/selftests/kvm/ucall_test.c | 182 +++++++++ > 13 files changed, 982 insertions(+), 487 deletions(-) > create mode 100644 tools/testing/selftests/kvm/include/aarch64/ucall.h > create mode 100644 tools/testing/selftests/kvm/include/kvm_util_base.h > create mode 100644 tools/testing/selftests/kvm/include/s390x/ucall.h > create mode 100644 tools/testing/selftests/kvm/include/ucall_common.h > create mode 100644 tools/testing/selftests/kvm/include/x86_64/ucall.h > create mode 100644 tools/testing/selftests/kvm/lib/ucall_common.c > create mode 100644 tools/testing/selftests/kvm/ucall_test.c > >
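For readers following the discussion below, a rough sketch of the kind of ops table and shared-buffer flow the cover letter describes may help; the struct layout and names here are illustrative guesses based on the summary above, not the actual contents of the patches:

struct ucall_ops {
	const char *name;
	void (*init)(struct kvm_vm *vm, void *arg);
	void (*uninit)(struct kvm_vm *vm);
	/* default path: guest passes the GVA of an on-stack ucall struct
	 * via register state (e.g. x86 PIO) */
	void (*send_cmd)(struct ucall *uc);
	uint64_t (*recv_cmd)(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
	/* optional "shared" path for guests whose registers/page tables the
	 * host cannot read (e.g. SEV): the guest fills in a pre-allocated
	 * ucall struct in shared memory and then triggers an exit (e.g. HLT) */
	void (*send_cmd_shared)(struct ucall *uc);
	uint64_t (*recv_cmd_shared)(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
};

/* e.g. an SEV test would register the halt-based implementation up front */
ucall_init_ops(vm, ucall_ops_halt);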
On Fri, Dec 10, 2021, Michael Roth wrote: > To summarize, x86 relies on a ucall based on using PIO instructions to generate > an exit to userspace and provide the GVA of a dynamically-allocated ucall > struct that resides in guest memory and contains information about how to > handle/interpret the exit. This doesn't work for SEV guests for 3 main reasons: > > 1) The guest memory is generally encrypted during run-time, so the guest > needs to ensure the ucall struct is allocated in shared memory. > 2) The guest page table is also encrypted, so the address would need to be a > GPA instead of a GVA. > 3) The guest vCPU register may also be encrypted in the case of > SEV-ES/SEV-SNP, so the approach of examining vCPU register state has > additional requirements such as requiring guest code to implement a #VC > handler that can provide the appropriate registers via a vmgexit. > > To address these issues, the SEV selftest RFC1 patchset introduced a set of new > SEV-specific interfaces that closely mirrored the functionality of > ucall()/get_ucall(), but relied on a pre-allocated/static ucall buffer in > shared guest memory so that guest code could pass messages/state to the host > by simply writing to this pre-arranged shared memory region and then generating > an exit to userspace (via a halt instruction). > > Paolo suggested instead implementing support for test/guest-specific ucall > implementations that could be used as an alternative to the default PIO-based > ucall implementations as-needed based on test/guest requirements, while still > allowing for tests to use a common set of interfaces like ucall()/get_ucall(). This all seems way more complicated than it needs to be. HLT is _worse_ than PIO on x86 because it triggers a userspace exit if and only if the local APIC is not in-kernel. That is bound to bite someone. The only issue with SEV is the address, not the VM-Exit mechanism. That doesn't change with SEV-ES, SEV-SNP, or TDX, as PIO and HLT will both get reflected as #VC/#VE, i.e. the guest side needs to be updated to use VMGEXIT/TDCALL no matter what, at which point having the hypercall request PIO emulation is just as easy as requesting HLT. I also don't like having to differentiate between a "shared" and "regular" ucall. I kind of like having to explicitly pass the ucall object being used, but that puts undue burden on simple single-vCPU tests. The inability to read guest private memory is really the only issue, and that can be easily solved without completely revamping the ucall framework, and without having to update a huge pile of tests to make them play nice with private memory. This would also be a good opportunity to clean up the stupidity of tests having to manually call ucall_init(), drop the unused/pointless @arg from ucall_init(), and maybe even fix arm64's lurking landmine of not being SMP safe (the address is shared by all vCPUs). To reduce the burden on tests and avoid ordering issues with creating vCPUs, allocate a ucall struct for every possible vCPU when the VM is created and stuff the GPA of the struct in the struct itself so that the guest can communicate the GPA instead of the GVA. Then confidential VMs just need to make all structs shared. If all architectures have a way to access a vCPU ID, the ucall structs could be stored as a simple array. If not, a list based allocator would probably suffice. E.g. something like this, except the list management is in common code instead of x86, and also delete all the per-test ucall_init() calls.
diff --git a/tools/testing/selftests/kvm/lib/x86_64/ucall.c b/tools/testing/selftests/kvm/lib/x86_64/ucall.c index a3489973e290..9aab6407bd42 100644 --- a/tools/testing/selftests/kvm/lib/x86_64/ucall.c +++ b/tools/testing/selftests/kvm/lib/x86_64/ucall.c @@ -8,19 +8,59 @@ #define UCALL_PIO_PORT ((uint16_t)0x1000) -void ucall_init(struct kvm_vm *vm, void *arg) +static struct list_head *ucall_list; + +void ucall_init(struct kvm_vm *vm) { + struct ucall *ucalls; + int nr_cpus = kvm_check_cap(KVM_CAP_MAX_VCPUS); + int i; + + TEST_ASSERT(!ucall_list, "ucall() can only be used by one VM at a time"); + + INIT_LIST_HEAD(&vm->ucall_list); + + ucalls = vm_vaddr_alloc(nr_cpus * sizeof(struct ucall)); + ucall_make_shared(ucall_list, <size>); + + for (i = 0; i < nr_cpus; i++) { + ucalls[i].gpa = addr_gva2gpa(vm, &ucalls[i]); + + list_add(&vm->ucall_list, &ucalls[i].list) + } + + ucall_list = &vm->ucall_list; + sync_global_to_guest(vm, ucall_list); } void ucall_uninit(struct kvm_vm *vm) { + ucall_list = NULL; + sync_global_to_guest(vm, ucall_list); +} + +static struct ucall *ucall_alloc(void) +{ + struct ucall *uc; + + /* Is there a lock primitive for the guest? */ + lock_something(&ucall_lock); + uc = list_first_entry(ucall_list, struct ucall, list); + + list_del(&uc->list); + unlock_something(&ucall_lock); +} + +static void ucall_free(struct ucall *uc) +{ + lock_something(&ucall_lock); + list_add(&uc->list, ucall_list); + unlock_something(&ucall_lock); } void ucall(uint64_t cmd, int nargs, ...) { - struct ucall uc = { - .cmd = cmd, - }; + struct ucall *uc = ucall_alloc(); va_list va; int i; @@ -32,7 +72,9 @@ void ucall(uint64_t cmd, int nargs, ...) va_end(va); asm volatile("in %[port], %%al" - : : [port] "d" (UCALL_PIO_PORT), "D" (&uc) : "rax", "memory"); + : : [port] "d" (UCALL_PIO_PORT), "D" (uc->gpa) : "rax", "memory"); + + ucall_free(uc); } uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc) @@ -47,7 +89,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc) struct kvm_regs regs; vcpu_regs_get(vm, vcpu_id, ®s); - memcpy(&ucall, addr_gva2hva(vm, (vm_vaddr_t)regs.rdi), + memcpy(&ucall, addr_gpa2hva(vm, (vm_paddr_t)regs.rdi), sizeof(ucall)); vcpu_run_complete_io(vm, vcpu_id);
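On the "Is there a lock primitive for the guest?" question in the sketch above: nothing in the selftest library is pointed to in this thread, but a minimal test-and-set spinlock built on compiler atomics would be enough for guest code. A possible sketch, purely illustrative:

static char ucall_lock;

static void ucall_lock_acquire(void)
{
	/* __atomic_test_and_set() returns true if the flag was already set,
	 * so spin until this CPU is the one that sets it. */
	while (__atomic_test_and_set(&ucall_lock, __ATOMIC_ACQUIRE))
		asm volatile("pause");
}

static void ucall_lock_release(void)
{
	__atomic_clear(&ucall_lock, __ATOMIC_RELEASE);
}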
On Thu, Dec 30, 2021 at 09:11:12PM +0000, Sean Christopherson wrote: > On Fri, Dec 10, 2021, Michael Roth wrote: > > To summarize, x86 relies on a ucall based on using PIO instructions to generate > > an exit to userspace and provide the GVA of a dynamically-allocated ucall > > struct that resides in guest memory and contains information about how to > > handle/interpret the exit. This doesn't work for SEV guests for 3 main reasons: > > > > 1) The guest memory is generally encrypted during run-time, so the guest > > needs to ensure the ucall struct is allocated in shared memory. > > 2) The guest page table is also encrypted, so the address would need to be a > > GPA instead of a GVA. > > 3) The guest vCPU register may also be encrypted in the case of > > SEV-ES/SEV-SNP, so the approach of examining vCPU register state has > > additional requirements such as requiring guest code to implement a #VC > > handler that can provide the appropriate registers via a vmgexit. > > > > To address these issues, the SEV selftest RFC1 patchset introduced a set of new > > SEV-specific interfaces that closely mirrored the functionality of > > ucall()/get_ucall(), but relied on a pre-allocated/static ucall buffer in > > shared guest memory so that guest code could pass messages/state to the host > > by simply writing to this pre-arranged shared memory region and then generating > > an exit to userspace (via a halt instruction). > > > > Paolo suggested instead implementing support for test/guest-specific ucall > > implementations that could be used as an alternative to the default PIO-based > > ucall implementations as-needed based on test/guest requirements, while still > > allowing for tests to use a common set of interfaces like ucall()/get_ucall(). > > This all seems way more complicated than it needs to be. HLT is _worse_ than > PIO on x86 because it triggers a userspace exit if and only if the local APIC is > not in-kernel. That is bound to bite someone. Hmm, fair point. It's easy for me to just not use in-kernel APIC in the current SEV tests to avoid the issue, but HLT is being made available as a ucall implementation for other tests as well, and given that the in-kernel APIC is set up automatically, maybe it's not robust enough. > not in-kernel. That is bound to bite someone. The only issue with SEV is the > address, not the VM-Exit mechanism. That doesn't change with SEV-ES, SEV-SNP, > or TDX, as PIO and HLT will both get reflected as #VC/#VE, i.e. the guest side > needs to be updated to use VMGEXIT/TDCALL no matter what, at which point having > the hypercall request PIO emulation is just as easy as requesting HLT. I'm not aware of any #VC handling needed for HLT in the case of SEV-ES/SEV-SNP. That was one of the reasons for the SEV tests using this ucall implementation. Of course, at some point, we'd want full support for PIO/MMIO/etc. in the #VC handler, but it's not something I'd planned on adding until after the SEV-SNP tests, since it seems like we'd need to import a bunch of instruction decoding code from elsewhere in the kernel, which is a lot of churn that's not immediately necessary for getting at least some basic tests in place. Since the HLT implementation is only 20 lines of code it seemed like a reasonable stop-gap until we start getting more CoCo tests in place. But the in-kernel APIC issue probably needs more consideration...
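For reference, the guest half of the halt-based implementation being discussed is conceptually just "fill in the pre-arranged shared ucall struct, then HLT"; a minimal sketch, with names that are illustrative rather than taken from the patches:

/*
 * Guest side: 'uc' points at a ucall struct that the guest has mapped
 * shared/unencrypted and whose GPA the host already knows.
 */
static void ucall_shared_halt(struct ucall *uc, uint64_t cmd)
{
	uc->cmd = cmd;
	/*
	 * Without an in-kernel APIC this surfaces in userspace as
	 * KVM_EXIT_HLT; with an in-kernel APIC, KVM handles HLT itself and
	 * the exit never reaches the test, which is the robustness concern
	 * raised above.
	 */
	asm volatile("hlt" : : : "memory");
}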
Perhaps for *just* PIO, the instruction decoding can be open-coded so it can be added to the initial #VC handler implementation, which would avoid the need for a HLT implementation. I'll take a look at that. > > I also don't like having to differentiate between a "shared" and "regular" ucall. > I kind of like having to explicitly pass the ucall object being used, but that > puts undue burden on simple single-vCPU tests. I tried to avoid it, but I got hung up on the fact that pre-allocating arrays/lists of ucall structs needs to be done for each VM, and so we'd end up needing some way for a guest to identify which pool its ucall struct should be allocated from. But you've gotten around that by just sync_global_to_guest()'ing for each pool at the time ucall_init() is called, so the guest only ever sees its particular pool. Then the switch from writing GVA to writing GPA solves the translation problem. Nice. > > The inability to read guest private memory is really the only issue, and that can > be easily solved without completely revamping the ucall framework, and without > having to update a huge pile of tests to make them play nice with private memory. I think the first 5 patches in this series are still relevant cleanups vs. having a complete standalone ucall implementation for each arch, and Andrew has also already started looking at other header cleanups related to patch #1, so maybe Paolo would still like to queue those. Would also provide a better starting point for having a centralized allocator for the ucall structs, which you hinted at wanting below. But the subsequent patches that add the ucall_shared() interfaces should probably be set aside for now in favor of your proposal. > > This would also be a good opportunity to clean up the stupidity of tests having to > manually call ucall_init(), drop the unused/pointless @arg from ucall_init(), and > maybe even fix arm64's lurking landmine of not being SMP safe (the address is shared > by all vCPUs). I thought you *didn't* want to update a huge pile of tests :) I suppose it's unavoidable, since with your proposal, having something like ucall_init() being called at some point is required, as opposed to the current implementation where it is optional. Are you intending to have it be called automatically by vm_create*()? > > To reduce the burden on tests and avoid ordering issues with creating vCPUs, > allocate a ucall struct for every possible vCPU when the VM is created and stuff > the GPA of the struct in the struct itself so that the guest can communicate the > GPA instead of the GVA. Then confidential VMs just need to make all structs shared. So a separate call like: ucall_make_shared(vm->ucall_list) ? Might need some good documentation/assertions to make sure it gets called at the right place for confidential VMs, and may need some extra hooks in SEV selftest implementation for switching from private to shared after the memory has already been allocated, but seems reasonable. > > If all architectures have a way to access a vCPU ID, the ucall structs could be > stored as a simple array. If not, a list based allocator would probably suffice. I think a list allocator is nicer; generating #VCs for both the PIO and the cpuid checks for vCPU lookup seems like a lot of extra noise to sift through while debugging where an errant test is failing, and doesn't seem to have any disadvantage vs. an array. Thanks, Mike
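The per-VM pool trick described above works because sync_global_to_guest() copies the host's current value of a global into that particular VM's copy of the global; roughly, simplified from the selftest headers:

#define sync_global_to_guest(vm, g) ({				\
	typeof(g) *_p = addr_gva2hva(vm, (vm_vaddr_t)&(g));	\
	memcpy(_p, &(g), sizeof(g));				\
})

So after ucall_init(vm) does "ucall_list = vm->ucall_list; sync_global_to_guest(vm, ucall_list);", each guest sees a ucall_list pointing at its own VM's pool, regardless of what the host-side global is later changed to for other VMs.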
On Tue, Jan 04, 2022, Michael Roth wrote: > On Thu, Dec 30, 2021 at 09:11:12PM +0000, Sean Christopherson wrote: > > not in-kernel. That is bound to bite someone. The only issue with SEV is the > > address, not the VM-Exit mechanism. That doesn't change with SEV-ES, SEV-SNP, > > or TDX, as PIO and HLT will both get reflected as #VC/#VE, i.e. the guest side > > needs to be updated to use VMGEXIT/TDCALL no matter what, at which point having > > the hypercall request PIO emulation is just as easy as requesting HLT. > > I'm not aware of any #VC handling needed for HLT in the case of > SEV-ES/SEV-SNP. That was one of the reasons for the SEV tests using > this ucall implementation. Ah, you're right, HLT is an "automatic" exit and the CPU takes care of adjusting RIP. TDX is the only one that requires a hypercall. > Of course, at some point, we'd want full support for PIO/MMIO/etc. in the #VC > handler, but it's not something I'd planned on adding until after the SEV-SNP > tests, since it seems like we'd need to import a bunch of intruction decoding > code from elsewhere in the kernel, which is a lot of churn that's not > immediately necessary for getting at least some basic tests in place. Since > the HLT implementation is only 20 lines of code it seemed like a reasonable > stop-gap until we start getting more CoCo tests in place. But the in-kernel > APIC issue probably needs more consideration... > > Perhaps for *just* PIO, the intruction decoding can be open-coded so it > can be added to the initial #VC handler implementation, which would avoid the > need for HLT implementation. I'll take a look at that. PIO shouldn't require instruction decoding or a #VC handler. What I was thinking is that the guest in the selftest would make a direct #VMGEXIT/TDCALL to request PIO instead of executing an OUT. > > I also don't like having to differentiate between a "shared" and "regular" ucall. > > I kind of like having to explicitly pass the ucall object being used, but that > > puts undue burden on simple single-vCPU tests. > > I tried to avoid it, but I got hung up on that fact that pre-allocating > arrays/lists of ucall structs needs to be done for each VM, and so we'd > end up needing some way for a guest to identify which pool it's ucall > struct should be allocated from. But you've gotten around that by just > sync_global_to_guest()'ing for each pool at the time ucall_init() is > called, so the guest only ever sees it's particular pool. Then the switch > from writing GVA to writing GPA solves the translation problem. Nice. > > > > > The inability to read guest private memory is really the only issue, and that can > > be easily solved without completely revamping the ucall framework, and without > > having to update a huge pile of tests to make them place nice with private memory. > > I think the first 5 patches in this series are still relevant cleanups > vs. having a complete standalone ucall implementation for each arch, and Andrew > has also already started looking at other header cleanups related to > patch #1, so maybe Paolo would still like to queue those. Would also > provide a better starting point for having a centralized allocator for > the ucall structs, which you hinted at wanting below. > > But the subsequent patches that add the ucall_shared() interfaces should > probably be set aside for now in favor of your proposal. 
> > > > > This would also be a good opportunity to clean up the stupidity of tests having to > > manually call ucall_init(), drop the unused/pointless @arg from ucall_init(), and > > maybe even fix arm64's lurking landmine of not being SMP safe (the address is shared > > by all vCPUs). > > I thought you *didn't* want to update a huge pile of tests :) I suppose > it's unavoidable, since with your proposal, having something like ucall_init() > being called at some point is required, as opposed to the current > implementation where it is optional. Are you intending to have it be > called automatically by vm_create*()? Yeah, I was thinking it could be done at the lowest level vm_create() helper. We'll need to expand vm_create() (or add yet another layer to avoid modifying a pile of tests) to allow opting out of initializing ucall, e.g. sev_migrate_tests.c needs to create multiple concurrent VMs, but happily doesn't need ucall support. > > To reduce the burden on tests and avoid ordering issues with creating vCPUs, > > allocate a ucall struct for every possible vCPU when the VM is created and stuff > > the GPA of the struct in the struct itself so that the guest can communicate the > > GPA instead of the GVA. Then confidential VMs just need to make all structs shared. > > So a separate call like: > > ucall_make_shared(vm->ucall_list) > > ? Might need some good documentation/assertions to make sure it gets > called at the right place for confidential VMs, and may need some extra > hooks in SEV selftest implementation for switching from private to shared > after the memory has already been allocated, but seems reasonable. Again, I was thinking that it would be done unconditionally by ucall_init(), i.e. would be automatically handled by the selftest framework and would Just Work for individual tests. > > If all architectures have a way to access a vCPU ID, the ucall structs could be > > stored as a simple array. If not, a list based allocator would probably suffice. > > I think list allocator is nicer, generating #VCs for both the PIO and the > cpuid checks for vCPU lookup seems like a lot of extra noise to sift > through while debugging where an errant test is failing, and doesn't seem to > have any disadvantage vs. an array. Ah, right, I forgot that querying the vCPU ID would require a hypercall.
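To illustrate the "direct #VMGEXIT to request PIO" idea: under SEV-ES/SEV-SNP the guest can build the I/O request in the GHCB itself and hand it to the hypervisor, rather than executing OUT and fielding a #VC. A very rough guest-side sketch, assuming a GHCB page has already been set up and registered with the hypervisor, and with accessors and IOIO encodings modeled on the kernel's GHCB code; none of this exists in the selftest library, so it is illustrative only:

static void ucall_pio_vmgexit(struct ghcb *ghcb, uint64_t ucall_gpa)
{
	/* Ask the host to emulate "out %al, $UCALL_PIO_PORT" with the ucall
	 * struct's GPA in RAX, instead of executing OUT and taking a #VC. */
	ghcb_set_sw_exit_code(ghcb, SVM_EXIT_IOIO);
	ghcb_set_sw_exit_info_1(ghcb, (UCALL_PIO_PORT << 16) | IOIO_TYPE_OUT | IOIO_DATA_8);
	ghcb_set_sw_exit_info_2(ghcb, 0);
	ghcb_set_rax(ghcb, ucall_gpa);

	/* VMGEXIT is encoded as "rep; vmmcall" */
	asm volatile("rep; vmmcall" : : : "memory");
}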
On Wed, Jan 05, 2022 at 12:17:33AM +0000, Sean Christopherson wrote: > On Tue, Jan 04, 2022, Michael Roth wrote: > > On Thu, Dec 30, 2021 at 09:11:12PM +0000, Sean Christopherson wrote: > > > not in-kernel. That is bound to bite someone. The only issue with SEV is the > > > address, not the VM-Exit mechanism. That doesn't change with SEV-ES, SEV-SNP, > > > or TDX, as PIO and HLT will both get reflected as #VC/#VE, i.e. the guest side > > > needs to be updated to use VMGEXIT/TDCALL no matter what, at which point having > > > the hypercall request PIO emulation is just as easy as requesting HLT. > > > > I'm not aware of any #VC handling needed for HLT in the case of > > SEV-ES/SEV-SNP. That was one of the reasons for the SEV tests using > > this ucall implementation. > > Ah, you're right, HLT is an "automatic" exit and the CPU takes care of adjusting > RIP. TDX is the only one that requires a hypercall. > > > Of course, at some point, we'd want full support for PIO/MMIO/etc. in the #VC > > handler, but it's not something I'd planned on adding until after the SEV-SNP > > tests, since it seems like we'd need to import a bunch of intruction decoding > > code from elsewhere in the kernel, which is a lot of churn that's not > > immediately necessary for getting at least some basic tests in place. Since > > the HLT implementation is only 20 lines of code it seemed like a reasonable > > stop-gap until we start getting more CoCo tests in place. But the in-kernel > > APIC issue probably needs more consideration... > > > > Perhaps for *just* PIO, the intruction decoding can be open-coded so it > > can be added to the initial #VC handler implementation, which would avoid the > > need for HLT implementation. I'll take a look at that. > > PIO shouldn't require instruction decoding or a #VC handler. What I was thinking > is that the guest in the selftest would make a direct #VMGEXIT/TDCALL to request > PIO instead of executing an OUT. That seems like a nicer approach. But it sort of lends itself to having test-specific ucall implementations in some form. How are you thinking vm_create() should decide what implementation to use? With this series in place it could be something like: vm_create(..., struct ucall_ops *ops) ucall_init_ops(ops) and with the SEV selftests in their current form it would look something like: sev_vm_create(...) vm_create_with_ucall(..., ops=ucall_ops_pio_vmgexit) ucall_init_ops(ops) is that sort of what you're thinking, or something else? > > > > I also don't like having to differentiate between a "shared" and "regular" ucall. > > > I kind of like having to explicitly pass the ucall object being used, but that > > > puts undue burden on simple single-vCPU tests. > > > > I tried to avoid it, but I got hung up on that fact that pre-allocating > > arrays/lists of ucall structs needs to be done for each VM, and so we'd > > end up needing some way for a guest to identify which pool it's ucall > > struct should be allocated from. But you've gotten around that by just > > sync_global_to_guest()'ing for each pool at the time ucall_init() is > > called, so the guest only ever sees it's particular pool. Then the switch > > from writing GVA to writing GPA solves the translation problem. Nice. > > > > > > > > The inability to read guest private memory is really the only issue, and that can > > > be easily solved without completely revamping the ucall framework, and without > > > having to update a huge pile of tests to make them place nice with private memory. 
> > > > I think the first 5 patches in this series are still relevant cleanups > > vs. having a complete standalone ucall implementation for each arch, and Andrew > > has also already started looking at other header cleanups related to > > patch #1, so maybe Paolo would still like to queue those. Would also > > provide a better starting point for having a centralized allocator for > > the ucall structs, which you hinted at wanting below. > > > > But the subsequent patches that add the ucall_shared() interfaces should > > probably be set aside for now in favor of your proposal. > > > > > > > > This would also be a good opportunity to clean up the stupidity of tests having to > > > manually call ucall_init(), drop the unused/pointless @arg from ucall_init(), and > > > maybe even fix arm64's lurking landmine of not being SMP safe (the address is shared > > > by all vCPUs). > > > > I thought you *didn't* want to update a huge pile of tests :) I suppose > > it's unavoidable, since with your proposal, having something like ucall_init() > > being called at some point is required, as opposed to the current > > implementation where it is optional. Are you intending to have it be > > called automatically by vm_create*()? > > Yeah, I was thinking it could be done at the lowest level vm_create() helper. > We'll need to expand vm_create() (or add yet another layer to avoid modifying a > pile of tests) to allow opting out of initializing ucall, e.g. sev_migrate_tests.c > needs to create multiple concurrent VMs, but happily doesn't need ucall support. Why does sev_migrate_tests need to opt out? Couldn't it use ucall_ops_pio_vmgexit like the SEV case above? I ask because there is a ucall() in the exception handling code where some unhandled exceptions result in the guest automatically issuing a ucall(UCALL_UNHANDLED), so even when tests don't use ucall() they might still rely on it if they enable exception handling. So that might be an argument for always setting up at least the default ucall_ops_pio implementation and creating a pool just in case. (or an argument for dropping the UCALL_UNHANDLED handling). > > > > To reduce the burden on tests and avoid ordering issues with creating vCPUs, > > > allocate a ucall struct for every possible vCPU when the VM is created and stuff > > > the GPA of the struct in the struct itself so that the guest can communicate the > > > GPA instead of the GVA. Then confidential VMs just need to make all structs shared. > > > > So a separate call like: > > > > ucall_make_shared(vm->ucall_list) > > > > ? Might need some good documentation/assertions to make sure it gets > > called at the right place for confidential VMs, and may need some extra > > hooks in SEV selftest implementation for switching from private to shared > > after the memory has already been allocated, but seems reasonable. > > Again, I was thinking that it would be done unconditionally by ucall_init(), i.e. > would be automatically handled by the selftest framework and would Just Work for > individual tests. Ok, I'll have to think that through more. Currently with the SEV selftests as they are, we have: sev_vm_create(policy, npages) vm = vm_create(...) vm_set_memory_encryption(vm, encrypt_by_default, enc_bit) //vm_vaddr_alloc_shared() can be used now The ucall struct allocations would need to go through vm_vaddr_alloc_shared() to make sure the selftest library tracks/maps the pages as shared, but that vm_set_memory_encryption() happens too late if the ucall_init() stuff is done in vm_create().
It should be possible to pass the vm_set_memory_encryption() arguments directly to vm_create() to allow for what you're proposing, but I guess we'd need a new vm_create() wrapper that handles both the vm_set_memory_encryption() args and the ucall_ops above, something like: sev_vm_create(policy, npages) vm = vm_create_coco(..., encrypt_by_default, enc_bit/shared_bit, ucall_ops) Or were you thinking something else? Just trying to get an idea of how this will all need to tie in with the SEV selftests and what needs to change on that end.
On Wed, Jan 05, 2022, Michael Roth wrote: > On Wed, Jan 05, 2022 at 12:17:33AM +0000, Sean Christopherson wrote: > > PIO shouldn't require instruction decoding or a #VC handler. What I was thinking > > is that the guest in the selftest would make a direct #VMGEXIT/TDCALL to request > > PIO instead of executing an OUT. > > That seems like a nicer approach. But it sort of lends itself to having > test-specific ucall implementations in some form. How are you thinking > vm_create() should decide what implementation to use? With this series > in place it could be something like: > > vm_create(..., struct ucall_ops *ops) > ucall_init_ops(ops) > > and with the SEV selftests in their current form it would look something > like: > > sev_vm_create(...) > vm_create_with_ucall(..., ops=ucall_ops_pio_vmgexit) > ucall_init_ops(ops) > > is that sort of what you're thinking, or something else? I keep forgetting ucall() doesn't have access to the VM. But, since we're restricting ucall() to a single VM, we can have a global that sets ucall_ops during ucall_init() based on the VM type, or skip an ops and just open code the behavior in x86's ucall() by snapshotting the VM type. Either way, the goal is to avoid having to pass in ucall_ops at the test level. > > Yeah, I was thinking it could be done at the lowest level vm_create() helper. > > We'll need to expand vm_create() (or add yet another layer to avoid modifying a > > pile of tests) to allow opting out of initializing ucall, e.g. sev_migrate_tests.c > > needs to create multiple concurrent VMs, but happily doesn't need ucall support. > > Why does sev_migrate_tests need to opt out? Couldn't it use > ucall_ops_pio_vmgexit like that SEV case above? Because it uses multiple VMs, and my rough sketch only allows for a single VM to use ucall. Though I suppose we could simply keep appending to the ucall list for every VM. The requirement would then be that all VMs are of the same type, i.e. utilize the same ucall_ops. > I ask because there is a ucall() in the exception handling code where > some unhandled exceptions result in the guest automatically issuing a > ucall(UCALL_UNHANDLED), so even when tests don't use ucall() they > might still rely on it if they enable exception handling. So that might > be an argument for always setting up at least the default ucall_ops_pio > implementation and creating a pool just in case. (or an argument for > dropping the UCALL_HANDLED handling). The sev_migrate_tests don't even run a guest, hence the quick-and-dirty "solution". Though thinking toward the future, that may be too dirty as it would prevent tests from having multiple "real" VMs. > > > > To reduce the burden on tests and avoid ordering issues with creating vCPUs, > > > > allocate a ucall struct for every possible vCPU when the VM is created and stuff > > > > the GPA of the struct in the struct itself so that the guest can communicate the > > > > GPA instead of the GVA. Then confidential VMs just need to make all structs shared. > > > > > > So a separate call like: > > > > > > ucall_make_shared(vm->ucall_list) > > > > > > ? Might need some good documentation/assertions to make sure it gets > > > called at the right place for confidential VMs, and may need some extra > > > hooks in SEV selftest implementation for switching from private to shared > > > after the memory has already been allocated, but seems reasonable. > > > > Again, I was thinking that it would be done unconditionally by ucall_init(), i.e.
> > would be automatically handled by the selftest framework and would Just Work for > > individual tests. > > Ok, I'll have to think that through more. Currently with the SEV > selftests as they we have: > > sev_vm_create(policy, npages) > vm = vm_create(...) > vm_set_memory_encryption(vm, encrypt_by_default, enc_bit) > //vm_vaddr_alloc_shared() can be used now > > The ucall struct allocations would need to go through > vm_vaddr_alloc_shared() to make sure the selftest library tracks/maps > the pages as shared, but that vm_set_memory_encryption() happens too > late if the ucall_init() stuff is done in vm_create(). It should be > possible to pass the vm_set_memory_encryption() arguments directly to > vm_create() to allow for what you're proposing, but I guess we'd need > a new vm_create() wrapper that handles both the > vm_set_memory_encryption() args, along with the ucall_ops above, > something like: > > sev_vm_create(policy, npages) > vm = vm_create_coco(..., encrypt_by_default, enc_bit/shared_bit, ucall_ops) > > Or were you thinking something else? Just trying to get an idea of how > this will all need to tie in with the SEV selftests and what needs to > change on that end. Hmm, I was thinking the selftest framework would only need to be told the VM type, e.g. DEFAULT, SEV, SEV-ES, SEV-SNP, or TDX, and would then handle setting everything up, e.g. enumerating the C-bit location and encrypting memory as needed. One thought would be to extend "enum vm_guest_mode" with flags above NUM_VM_MODES to specify the VM type. That way tests that use VM_MODE_DEFAULT would continue to work without any updates.
On Wed, Jan 05, 2022 at 05:43:21PM +0000, Sean Christopherson wrote: > On Wed, Jan 05, 2022, Michael Roth wrote: > > On Wed, Jan 05, 2022 at 12:17:33AM +0000, Sean Christopherson wrote: > > > PIO shouldn't require instruction decoding or a #VC handler. What I was thinking > > > is that the guest in the selftest would make a direct #VMGEXIT/TDCALL to request > > > PIO instead of executing an OUT. > > > > That seems like a nicer approach. But it sort of lends itself to having > > test-specific ucall implementations in some form. How are you thinking > > vm_create() should decide what implementation to use? With this series > > in place it could be something like: > > > > vm_create(..., struct ucall_ops *ops) > > ucall_init_ops(ops) > > > > and with the SEV selftests in their current form it would look something > > like: > > > > sev_vm_create(...) > > vm_create_with_ucall(..., ops=ucall_ops_pio_vmgexit) > > ucall_init_ops(ops) > > > > is that sort of what you're thinking, or something else? > > I keep forgetting ucall() doesn't have access to the VM. But, since we're > restricting ucall() to a single VM, we can have a global that sets ucall_ops during > ucall_init() based on the VM type, or skip an ops and just open code the behavior > in x86's ucall() by snapshotting the VM type. Either way, the goal is to avoid > having to pass in ucall_ops at the test level. > > > > Yeah, I was thinking it could be done at the lowest level vm_create() helper. > > > We'll need to expand vm_create() (or add yet another layer to avoid modifying a > > > pile of tests) to allow opting out of initializing ucall, e.g. sev_migrate_tests.c > > > needs to create multiple concurrent VMs, but happily doesn't need ucall support. > > > > Why does sev_migrate_tests need to opt out? Couldn't it use > > ucall_ops_pio_vmgexit like that SEV case above? > > Because it uses multiple VMs, and my rough sketch only allows for a single VM to > use ucall. Though I suppose we could simply keep appending to the ucall list for > every VM. The requirement would then be that all VMs are of the same type, i.e. > utilize the same ucall_ops. Hmm, maybe I misread your patch. Not supporting multiple VMs was the reason I gave up on having the ucall structs allocated on-demand and went with requiring them to be passed as arguments to ucall(). I thought with your patch you had solved that by having each VM have its own pool, via vm->ucall_list, and then mapping each pool into each guest separately via: ucall_init(vm): ucall_list = vm->ucall_list sync_global_to_guest(ucall_list). then as long as that ucall_init() is done *after* the test calls kvm_vm_elf_load(), it will end up with a 'ucall_list' global that points to its own specific vm->ucall_list. Then on the test side it doesn't matter what the 'ucall_list' global is currently set to since you have the GPA and know which VM exited. Or am I missing something there? Although even if that is the case, now that we're proposing doing the ucall_init() inside vm_create(), then we run the risk of a test calling kvm_vm_elf_load() after, which might clobber the guest's copy of the ucall_list global if ucall_init() had since been called for another VM. But that could maybe be worked around by having whatever vm_create() variant we use also do the kvm_vm_elf_load() unconditionally as part of creation.
> > > I ask because there is a ucall() in the exception handling code where > > some unhandled exceptions result in the guest automatically issuing a > > ucall(UCALL_UNHANDLED), so even when tests don't use ucall() they > > might still rely on it if they enable exception handling. So that might > > be an argument for always setting up at least the default ucall_ops_pio > > implementation and creating a pool just in case. (or an argument for > > dropping the UCALL_HANDLED handling). > > The sev_migrate_tests don't even run a guest, hence the quick-and-dirty "solution". > Though thinking toward the future, that may be too dirty as it would prevent tests > from having multiple "real" VMs. > > > > > > To reduce the burden on tests and avoid ordering issues with creating vCPUs, > > > > > allocate a ucall struct for every possible vCPU when the VM is created and stuff > > > > > the GPA of the struct in the struct itself so that the guest can communicate the > > > > > GPA instead of the GVA. Then confidential VMs just need to make all structs shared. > > > > > > > > So a separate call like: > > > > > > > > ucall_make_shared(vm->ucall_list) > > > > > > > > ? Might need some good documentation/assertions to make sure it gets > > > > called at the right place for confidential VMs, and may need some extra > > > > hooks in SEV selftest implementation for switching from private to shared > > > > after the memory has already been allocated, but seems reasonable. > > > > > > Again, I was thinking that it would be done unconditionally by ucall_init(), i.e. > > > would be automatically handled by the selftest framework and would Just Work for > > > individual tests. > > > > Ok, I'll have to think that through more. Currently with the SEV > > selftests as they we have: > > > > sev_vm_create(policy, npages) > > vm = vm_create(...) > > vm_set_memory_encryption(vm, encrypt_by_default, enc_bit) > > //vm_vaddr_alloc_shared() can be used now > > > > The ucall struct allocations would need to go through > > vm_vaddr_alloc_shared() to make sure the selftest library tracks/maps > > the pages as shared, but that vm_set_memory_encryption() happens too > > late if the ucall_init() stuff is done in vm_create(). It should be > > possible to pass the vm_set_memory_encryption() arguments directly to > > vm_create() to allow for what you're proposing, but I guess we'd need > > a new vm_create() wrapper that handles both the > > vm_set_memory_encryption() args, along with the ucall_ops above, > > something like: > > > > sev_vm_create(policy, npages) > > vm = vm_create_coco(..., encrypt_by_default, enc_bit/shared_bit, ucall_ops) > > > > Or were you thinking something else? Just trying to get an idea of how > > this will all need to tie in with the SEV selftests and what needs to > > change on that end. > > Hmm, I was thinking the selftest framework would only need to be told the VM type, > e.g. DEFAULT, SEV, SEV-ES, SEV-SNP, or TDX, and would then handle setting everything > up, e.g. enumerating the C-bit location and encrypting memory as needed. > > One thought would be to extend "enum vm_guest_mode" with flags above NUM_VM_MODES > to specify the VM type. That way tests that use VM_MODE_DEFAULT would continue to > work without any updates. Ok, let me see what that approach looks like on the SEV selftest side.
On Wed, Jan 05, 2022, Michael Roth wrote: > On Wed, Jan 05, 2022 at 05:43:21PM +0000, Sean Christopherson wrote: > > Because it uses multiple VMs, and my rough sketch only allows for a single VM to > > use ucall. Though I suppose we could simply keep appending to the ucall list for > > every VM. The requirement would then be that all VMs are of the same type, i.e. > > utilize the same ucall_ops. > > Hmm, maybe I misread your patch. Not supporting multiple VMs was the > reason I gave up on having the ucall structs allocated on-demand and > went with requiring them to be passed as arguments to ucall(). > > I thought with your patch you had solved that by having each vm have its > own pool, via vm->ucall_list, and then mapping each pool into each guest > separately via: > > ucall_init(vm): > ucall_list = vm->ucall_list > sync_global_to_guest(ucall_list). > > then as long as that ucall_init() is done *after* the guest calls > kvm_vm_elf_load(), it will end up with a 'ucall_list' global that points > to its own specific vm->ucall_list. Then on the test side it doesn't > matter what the 'ucall_list' global is currently set to since you have > the GPA and know what vm exited. > > Or am I missing something there? Ha, that was not at all intended. But yes, it should work. I'd rather be lucky than good? > Although even if that is the case, now that we're proposing doing the > ucall_init() inside vm_create(), then we run the risk of a test calling > kvm_vm_elf_load() after, which might clobber the guest's copy of > ucall_list global if ucall_init() had since been called for another VM. > But that could maybe be worked around by having whatever vm_create() > variant we use also do the kvm_vm_elf_load() unconditionally as part of > creation. Will sync_global_to_guest() even work as intended if kvm_vm_elf_load() hasn't been called? If not, then sync_global_{to,from}_guest() should really assert if the test hasn't been loaded. As for ucall_init(), I think the best approach would be to make kvm_vm_elf_load() a static and replace all calls with: kvm_vm_load_guest(vm); where its implementation is: void kvm_vm_load_guest(struct kvm_vm *vm) { kvm_vm_elf_load(vm, program_invocation_name); ucall_init(vm); } The logic being that if a test creates a VM but never loads any code into the guest, e.g. kvm_create_max_vcpus, then it _can't_ make ucalls.
On Wed, Jan 05, 2022 at 07:40:57PM +0000, Sean Christopherson wrote: > On Wed, Jan 05, 2022, Michael Roth wrote: > > On Wed, Jan 05, 2022 at 05:43:21PM +0000, Sean Christopherson wrote: > > > Because it uses multiple VMs, and my rough sketch only allows for a single VM to > > > use ucall. Though I suppose we could simply keep appending to the ucall list for > > > every VM. The requirement would then be that all VMs are of the same type, i.e. > > > utilize the same ucall_ops. > > > > Hmm, maybe I misread your patch. Not supporting multiple VMs was the > > reason I gave up on having the ucall structs allocated on-demand and > > went with requiring them to be passed as arguments to ucall(). > > > > I thought with your patch you had solved that by having each vm have its > > own pool, via vm->ucall_list, and then mapping each pool into each guest > > separately via: > > > > ucall_init(vm): > > ucall_list = vm->ucall_list > > sync_global_to_guest(ucall_list). > > > > then as long as that ucall_init() is done *after* the guest calls > > kvm_vm_elf_load(), it will end up with a 'ucall_list' global that points > > to its own specific vm->ucall_list. Then on the test side it doesn't > > matter what the 'ucall_list' global is currently set to since you have > > the GPA and know what vm exited. > > > > Or am I missing something there? > > Ha, that was not at all intended. But yes, it should work. I'd rather be lucky > than good? :) > > > Although even if that is the case, now that we're proposing doing the > > ucall_init() inside vm_create(), then we run the risk of a test calling > > kvm_vm_elf_load() after, which might clobber the guest's copy of > > ucall_list global if ucall_init() had since been called for another VM. > > But that could maybe be worked around by having whatever vm_create() > > variant we use also do the kvm_vm_elf_load() unconditionally as part of > > creation. > > Will sync_global_to_guest() even work as intended if kvm_vm_elf_load() hasn't > been called? If not, then sync_global_{to,from}_guest() should really assert if > the test hasn't been loaded. Yah, seems like it would get clobbered by kvm_vm_elf_load() later. And I can't think of any good reason to use sync_global_to_guest() without also needing kvm_vm_elf_load() at some point, so makes sense to enforce it. > > As for ucall_init(), I think the best approach would be to make kvm_vm_elf_load() > a static and replace all calls with: > > kvm_vm_load_guest(vm); > > where its implementation is: > > void kvm_vm_load_guest(struct kvm_vm *vm) > { > kvm_vm_elf_load(vm, program_invocation_name); > > ucall_init(vm); > } > > The logic being that if a test creates a VM but never loads any code into the guest, > e.g. kvm_create_max_vcpus, then it _can't_ make ucalls. Makes sense. And if different ops are needed for vmgexit()/tdcall() it could be something like (if based on patches 1-5 of this series, and extending vm_guest_mode as you suggested earlier): void kvm_vm_load_guest(struct kvm_vm *vm) { kvm_vm_elf_load(vm, program_invocation_name); if (vm->mode == VM_MODE_SEV) ucall_init_ops(vm, ucall_ops_pio_vmgexit); else if (vm->mode == VM_MODE_TDX) ucall_init_ops(vm, ucall_ops_pio_tdcall); else ucall_init_ops(vm, ucall_ops_pio); } Shame we have to update all the kvm_vm_elf_load() call-sites, but they'd end up potentially breaking things if left as-is anyway. Were you planning on sending patches for these changes, or should I incorporate your prototype and take a stab at the other changes as part of v2 of this series?
On Wed, Jan 05, 2022, Michael Roth wrote: > On Wed, Jan 05, 2022 at 07:40:57PM +0000, Sean Christopherson wrote: > > As for ucall_init(), I think the best approach would be to make kvm_vm_elf_load() > > a static and replace all calls with: > > > > kvm_vm_load_guest(vm); > > > > where its implementation is: > > > > void kvm_vm_load_guest(struct kvm_vm *vm) > > { > > kvm_vm_elf_load(vm, program_invocation_name); > > > > ucall_init(vm); > > } > > > > The logic being that if a test creates a VM but never loads any code into the guest, > > e.g. kvm_create_max_vcpus, then it _can't_ make ucalls. > > Makes sense. And if different ops are needed for vmgexit()/tdcall() it > could be something like (if based on patches 1-5 of this series, and > extending vm_guest_mode as you suggested earlier): > > void kvm_vm_load_guest(struct kvm_vm *vm) > { > > kvm_vm_elf_load(vm, program_invocation_name); > > if (vm->mode == VM_MODE_SEV) > ucall_init_ops(vm, ucall_ops_pio_vmgexit); > else (vm->vm_type == VM_MODE_TDX) I don't think we want to do this here, but instead down in the arch-specific ucall_init(). Also, not sure if I was clear before (can't tell what you interpreted based on the above snippet), but I think we'll want VM_MODE_SEV etc... to be modifiers on top of the VA/PA stuff. > ucall_init_ops(vm, ucall_ops_pio_tdcall); > else > ucall_init_ops(vm, ucall_ops_pio); > > Shame we have to update all the kvm_vm_elf_load() call-sites, but > they'd end up potentially breaking things if left as-is anyway. > > Were you planning on sending patches for these changes, or should I incorporate > your prototype and take a stab at the other changes as part of v2 of this > series? Nope, all yours. Thanks!
On Wed, Jan 05, 2022 at 10:02:53PM +0000, Sean Christopherson wrote: > On Wed, Jan 05, 2022, Michael Roth wrote: > > On Wed, Jan 05, 2022 at 07:40:57PM +0000, Sean Christopherson wrote: > > > As for ucall_init(), I think the best approach would be to make kvm_vm_elf_load() > > > a static and replace all calls with: > > > > > > kvm_vm_load_guest(vm); > > > > > > where its implementation is: > > > > > > void kvm_vm_load_guest(struct kvm_vm *vm) > > > { > > > kvm_vm_elf_load(vm, program_invocation_name); > > > > > > ucall_init(vm); > > > } > > > > > > The logic being that if a test creates a VM but never loads any code into the guest, > > > e.g. kvm_create_max_vcpus, then it _can't_ make ucalls. > > > > Makes sense. And if different ops are needed for vmgexit()/tdcall() it > > could be something like (if based on patches 1-5 of this series, and > > extending vm_guest_mode as you suggested earlier): > > > > void kvm_vm_load_guest(struct kvm_vm *vm) > > { > > > > kvm_vm_elf_load(vm, program_invocation_name); > > > > if (vm->mode == VM_MODE_SEV) > > ucall_init_ops(vm, ucall_ops_pio_vmgexit); > > else (vm->vm_type == VM_MODE_TDX) > > I don't think we want to do this here, but instead down in the arch-specific > ucall_init(). Also, not sure if I was clear before (can't tell what you interpreted > based on the above snippet), but I think we'll want VM_MODE_SEV etc... to be > modifiers on top of the VA/PA stuff. Ok, something like this (with additional ones added as-needed)? #define VM_MODE_DEFAULT VM_MODE_PXXV48_4K +#define SEV_VM_MODE_DEFAULT SEV_VM_MODE_PXXV48_4K enum vm_guest_mode { ... VM_MODE_PXXV48_4K, ... NUM_VM_MODES, + SEV_VM_MODE_PXXV48_4K, + NUM_VM_MODES_EXTENDED, } > > > ucall_init_ops(vm, ucall_ops_pio_tdcall); > > else > > ucall_init_ops(vm, ucall_ops_pio); > > > > Shame we have to update all the kvm_vm_elf_load() call-sites, but > > they'd end up potentially breaking things if left as-is anyway. > > > > Were you planning on sending patches for these changes, or should I incorporate > > your prototype and take a stab at the other changes as part of v2 of this > > series? > > Nope, all yours. Thanks! Thanks for the suggestions!
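One way the existing mode handling could stay untouched with the extended values sketched above is to strip the VM-type modifier back down to a base addressing mode early on; a hypothetical helper, illustrative only:

static inline enum vm_guest_mode vm_mode_base(enum vm_guest_mode mode)
{
	switch (mode) {
	case SEV_VM_MODE_PXXV48_4K:
		return VM_MODE_PXXV48_4K;
	default:
		return mode;
	}
}

Code that only cares about paging/address-width behavior would key off vm_mode_base(mode), while VM-type-specific setup (C-bit enumeration, memory encryption, ucall_ops selection) would key off the full extended value.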