[v3,0/9] Generalize memory encryption models
mbox series

Message ID 20200619020602.118306-1-david@gibson.dropbear.id.au
Headers show
Series
  • Generalize memory encryption models
Related show

Message

David Gibson June 19, 2020, 2:05 a.m. UTC
A number of hardware platforms are implementing mechanisms whereby the
hypervisor does not have unfettered access to guest memory, in order
to mitigate the security impact of a compromised hypervisor.

AMD's SEV implements this with in-cpu memory encryption, and Intel has
its own memory encryption mechanism.  POWER has an upcoming mechanism
to accomplish this in a different way, using a new memory protection
level plus a small trusted ultravisor.  s390 also has a protected
execution environment.

The current code (committed or draft) for these features has each
platform's version configured entirely differently.  That doesn't seem
ideal for users, or particularly for management layers.

AMD SEV introduces a notionally generic machine option
"machine-encryption", but it doesn't actually cover any cases other
than SEV.

This series is a proposal to at least partially unify configuration
for these mechanisms, by renaming and generalizing AMD's
"memory-encryption" property.  It is replaced by a
"host-trust-limitation" property pointing to a platform specific
object which configures and manages the specific details.

For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
can be extended to cover the Intel and s390 mechanisms as well,
though.

Please apply.

Changes since RFCv2:
 * Rebased
 * Removed preliminary SEV cleanups (they've been merged)
 * Changed name to "host trust limitation"
 * Added migration blocker to the PEF code (based on SEV's version)
Changes since RFCv1:
 * Rebased
 * Fixed some errors pointed out by Dave Gilbert

David Gibson (9):
  host trust limitation: Introduce new host trust limitation interface
  host trust limitation: Handle memory encryption via interface
  host trust limitation: Move side effect out of
    machine_set_memory_encryption()
  host trust limitation: Rework the "memory-encryption" property
  host trust limitation: Decouple kvm_memcrypt_*() helpers from KVM
  host trust limitation: Add Error ** to HostTrustLimitation::kvm_init
  spapr: Add PEF based host trust limitation
  spapr: PEF: block migration
  host trust limitation: Alter virtio default properties for protected
    guests

 accel/kvm/kvm-all.c                  |  40 ++------
 accel/kvm/sev-stub.c                 |   7 +-
 accel/stubs/kvm-stub.c               |  10 --
 backends/Makefile.objs               |   2 +
 backends/host-trust-limitation.c     |  29 ++++++
 hw/core/machine.c                    |  61 +++++++++--
 hw/i386/pc_sysfw.c                   |   6 +-
 include/exec/host-trust-limitation.h |  72 +++++++++++++
 include/hw/boards.h                  |   2 +-
 include/qemu/typedefs.h              |   1 +
 include/sysemu/kvm.h                 |  17 ----
 include/sysemu/sev.h                 |   4 +-
 target/i386/sev.c                    | 146 ++++++++++++---------------
 target/ppc/Makefile.objs             |   2 +-
 target/ppc/pef.c                     |  89 ++++++++++++++++
 15 files changed, 325 insertions(+), 163 deletions(-)
 create mode 100644 backends/host-trust-limitation.c
 create mode 100644 include/exec/host-trust-limitation.h
 create mode 100644 target/ppc/pef.c

Comments

no-reply@patchew.org June 19, 2020, 2:42 a.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/20200619020602.118306-1-david@gibson.dropbear.id.au/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      contrib/vhost-user-input/main.o
  LINK    tests/qemu-iotests/socket_scm_helper
  GEN     docs/interop/qemu-qmp-ref.html
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  GEN     docs/interop/qemu-qmp-ref.txt
  GEN     docs/interop/qemu-qmp-ref.7
  CC      qga/commands.o
---
  SIGN    pc-bios/optionrom/pvh.bin
  LINK    elf2dmp
  CC      qemu-img.o
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-io
  LINK    qemu-edid
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    fsdev/virtfs-proxy-helper
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    scsi/qemu-pr-helper
  LINK    qemu-bridge-helper
  AR      libvhost-user.a
---
  GEN     docs/interop/qemu-ga-ref.txt
  GEN     docs/interop/qemu-ga-ref.7
  LINK    qemu-keymap
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    ivshmem-client
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    ivshmem-server
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-nbd
  LINK    qemu-storage-daemon
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    virtiofsd
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    vhost-user-input
  LINK    qemu-ga
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-img
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  GEN     x86_64-softmmu/hmp-commands.h
  GEN     x86_64-softmmu/config-devices.h
  GEN     x86_64-softmmu/hmp-commands-info.h
---
  CC      x86_64-softmmu/gdbstub-xml.o
  CC      x86_64-softmmu/trace/generated-helpers.o
  LINK    x86_64-softmmu/qemu-system-x86_64
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
common.rc: line 50: test: check: binary operator expected
(printf '#define QEMU_PKGVERSION ""\n'; printf '#define QEMU_FULL_VERSION "5.0.50"\n'; ) > qemu-version.h.tmp
make -C /tmp/qemu-test/src/slirp BUILD_DIR="/tmp/qemu-test/build/slirp" PKG_CONFIG="pkg-config" CC="clang" AR="ar"      LD="ld" RANLIB="ranlib" CFLAGS="-I/usr/include/pixman-1   -Werror -fsanitize=undefined -fsanitize=address  -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -fstack-protector-strong   -I/usr/include/p11-kit-1   -DSTRUCT_IOVEC_DEFINED  -I/usr/include/libpng16  -I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 -I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -I/tmp/qemu-test/src/tests/qtest -g " LDFLAGS="-Wl,--warn-common -fsanitize=undefined -fsanitize=address -Wl,-z,relro -Wl,-z,now -pie -m64  -fstack-protector-strong"
---
clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote /tmp/qemu-test/src/tcg/i386 -isystem /tmp/qemu-test/src/linux-headers -isystem /tmp/qemu-test/build/linux-headers -iquote . -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote /tmp/qemu-test/src/include -iquote /tmp/qemu-test/src/disas/libvixl -I/usr/include/pixman-1   -Werror -fsanitize=undefined -fsanitize=address  -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -fstack-protector-strong   -I/usr/include/p11-kit-1   -DSTRUCT_IOVEC_DEFINED  -I/usr/include/libpng16  -I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 -I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -I/tmp/qemu-test/src/tests/qtest -MMD -MP -MT tests/test-crypto-cipher.o -MF tests/test-crypto-cipher.d -g   -c -o tests/test-crypto-cipher.o /tmp/qemu-test/src/tests/test-crypto-cipher.c
clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote /tmp/qemu-test/src/tcg/i386 -isystem /tmp/qemu-test/src/linux-headers -isystem /tmp/qemu-test/build/linux-headers -iquote . -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote /tmp/qemu-test/src/include -iquote /tmp/qemu-test/src/disas/libvixl -I/usr/include/pixman-1   -Werror -fsanitize=undefined -fsanitize=address  -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -fstack-protector-strong   -I/usr/include/p11-kit-1   -DSTRUCT_IOVEC_DEFINED  -I/usr/include/libpng16  -I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 -I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -I/tmp/qemu-test/src/tests/qtest -MMD -MP -MT tests/test-crypto-secret.o -MF tests/test-crypto-secret.d -g   -c -o tests/test-crypto-secret.o /tmp/qemu-test/src/tests/test-crypto-secret.c
clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote /tmp/qemu-test/src/tcg/i386 -isystem /tmp/qemu-test/src/linux-headers -isystem /tmp/qemu-test/build/linux-headers -iquote . -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote /tmp/qemu-test/src/include -iquote /tmp/qemu-test/src/disas/libvixl -I/usr/include/pixman-1   -Werror -fsanitize=undefined -fsanitize=address  -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -fstack-protector-strong   -I/usr/include/p11-kit-1   -DSTRUCT_IOVEC_DEFINED  -I/usr/include/libpng16  -I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 -I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -I/tmp/qemu-test/src/tests/qtest -MMD -MP -MT tests/test-crypto-tlscredsx509.o -MF tests/test-crypto-tlscredsx509.d -g   -c -o tests/test-crypto-tlscredsx509.o /tmp/qemu-test/src/tests/test-crypto-tlscredsx509.c
/tmp/qemu-test/src/tests/qht-bench.c:287:29: error: implicit conversion from 'unsigned long' to 'double' changes value from 18446744073709551615 to 18446744073709551616 [-Werror,-Wimplicit-int-float-conversion]
        *threshold = rate * UINT64_MAX;
                          ~ ^~~~~~~~~~
/usr/include/stdint.h:130:23: note: expanded from macro 'UINT64_MAX'
---
18446744073709551615UL
^~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: tests/qht-bench.o] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 669, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=0f30ada1843f4b7a8b54df19919b5462', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-wdzkmbob/src/docker-src.2020-06-18-22.37.15.1263:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=0f30ada1843f4b7a8b54df19919b5462
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-wdzkmbob/src'
make: *** [docker-run-test-debug@fedora] Error 2

real    5m41.189s
user    0m7.985s


The full log is available at
http://patchew.org/logs/20200619020602.118306-1-david@gibson.dropbear.id.au/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
David Hildenbrand June 19, 2020, 8:28 a.m. UTC | #2
On 19.06.20 04:05, David Gibson wrote:
> A number of hardware platforms are implementing mechanisms whereby the
> hypervisor does not have unfettered access to guest memory, in order
> to mitigate the security impact of a compromised hypervisor.
> 
> AMD's SEV implements this with in-cpu memory encryption, and Intel has
> its own memory encryption mechanism.  POWER has an upcoming mechanism
> to accomplish this in a different way, using a new memory protection
> level plus a small trusted ultravisor.  s390 also has a protected
> execution environment.

Each architecture finds its own way to vandalize the original
architecture, some in more extreme/obscure ways than others. I guess in
the long term we'll regret most of that, but what do I know :)

> 
> The current code (committed or draft) for these features has each
> platform's version configured entirely differently.  That doesn't seem
> ideal for users, or particularly for management layers.
> 
> AMD SEV introduces a notionally generic machine option
> "machine-encryption", but it doesn't actually cover any cases other
> than SEV.
> 
> This series is a proposal to at least partially unify configuration
> for these mechanisms, by renaming and generalizing AMD's
> "memory-encryption" property.  It is replaced by a
> "host-trust-limitation" property pointing to a platform specific
> object which configures and manages the specific details.
> 

I consider the property name sub-optimal. Yes, I am aware that there are
other approaches being discussed on the KVM list to disallow access to
guest memory without memory encryption. (most of them sound like people
are trying to convert KVM into XEN, but again, what do I know ... :)  )

"host-trust-limitation"  sounds like "I am the hypervisor, I configure
limited trust into myself". Also, "untrusted-host" would be a little bit
nicer (I think trust is a black/white thing).

However, once we have multiple options to protect a guest (memory
encryption, unmapping guest pages ,...) the name will no longer really
suffice to configure QEMU, no?

> For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> can be extended to cover the Intel and s390 mechanisms as well,
> though.

The only approach on s390x to not glue command line properties to the
cpu model would be to remove the CPU model feature and replace it by the
command line parameter. But that would, of course, be an incompatible break.

How do upper layers actually figure out if memory encryption etc is
available? on s390x, it's simply via the expanded host CPU model.
Cornelia Huck June 19, 2020, 9:45 a.m. UTC | #3
On Fri, 19 Jun 2020 10:28:22 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 19.06.20 04:05, David Gibson wrote:
> > A number of hardware platforms are implementing mechanisms whereby the
> > hypervisor does not have unfettered access to guest memory, in order
> > to mitigate the security impact of a compromised hypervisor.
> > 
> > AMD's SEV implements this with in-cpu memory encryption, and Intel has
> > its own memory encryption mechanism.  POWER has an upcoming mechanism
> > to accomplish this in a different way, using a new memory protection
> > level plus a small trusted ultravisor.  s390 also has a protected
> > execution environment.  
> 
> Each architecture finds its own way to vandalize the original
> architecture, some in more extreme/obscure ways than others. I guess in
> the long term we'll regret most of that, but what do I know :)
> 
> > 
> > The current code (committed or draft) for these features has each
> > platform's version configured entirely differently.  That doesn't seem
> > ideal for users, or particularly for management layers.
> > 
> > AMD SEV introduces a notionally generic machine option
> > "machine-encryption", but it doesn't actually cover any cases other
> > than SEV.
> > 
> > This series is a proposal to at least partially unify configuration
> > for these mechanisms, by renaming and generalizing AMD's
> > "memory-encryption" property.  It is replaced by a
> > "host-trust-limitation" property pointing to a platform specific
> > object which configures and manages the specific details.
> >   
> 
> I consider the property name sub-optimal. Yes, I am aware that there are
> other approaches being discussed on the KVM list to disallow access to
> guest memory without memory encryption. (most of them sound like people
> are trying to convert KVM into XEN, but again, what do I know ... :)  )
> 
> "host-trust-limitation"  sounds like "I am the hypervisor, I configure
> limited trust into myself". Also, "untrusted-host" would be a little bit
> nicer (I think trust is a black/white thing).
> 
> However, once we have multiple options to protect a guest (memory
> encryption, unmapping guest pages ,...) the name will no longer really
> suffice to configure QEMU, no?

Hm... we could have a property that accepts bits indicating where the
actual limitation lies. Different parts of the code could then make
more fine-grained decisions of what needs to be done. Feels a bit
overengineered today; but maybe there's already stuff with different
semantics in the pipeline somewhere?

> 
> > For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> > can be extended to cover the Intel and s390 mechanisms as well,
> > though.  
> 
> The only approach on s390x to not glue command line properties to the
> cpu model would be to remove the CPU model feature and replace it by the
> command line parameter. But that would, of course, be an incompatible break.

Yuck.

We still need to provide the cpu feature to the *guest* in any case, no?

> 
> How do upper layers actually figure out if memory encryption etc is
> available? on s390x, it's simply via the expanded host CPU model.
>
David Gibson June 19, 2020, 9:48 a.m. UTC | #4
On Fri, Jun 19, 2020 at 10:28:22AM +0200, David Hildenbrand wrote:
> On 19.06.20 04:05, David Gibson wrote:
> > A number of hardware platforms are implementing mechanisms whereby the
> > hypervisor does not have unfettered access to guest memory, in order
> > to mitigate the security impact of a compromised hypervisor.
> > 
> > AMD's SEV implements this with in-cpu memory encryption, and Intel has
> > its own memory encryption mechanism.  POWER has an upcoming mechanism
> > to accomplish this in a different way, using a new memory protection
> > level plus a small trusted ultravisor.  s390 also has a protected
> > execution environment.
> 
> Each architecture finds its own way to vandalize the original
> architecture, some in more extreme/obscure ways than others. I guess in
> the long term we'll regret most of that, but what do I know :)

Well, sure, but that's no *more* true if we start from a common point.

> > The current code (committed or draft) for these features has each
> > platform's version configured entirely differently.  That doesn't seem
> > ideal for users, or particularly for management layers.
> > 
> > AMD SEV introduces a notionally generic machine option
> > "machine-encryption", but it doesn't actually cover any cases other
> > than SEV.
> > 
> > This series is a proposal to at least partially unify configuration
> > for these mechanisms, by renaming and generalizing AMD's
> > "memory-encryption" property.  It is replaced by a
> > "host-trust-limitation" property pointing to a platform specific
> > object which configures and manages the specific details.
> 
> I consider the property name sub-optimal. Yes, I am aware that there are
> other approaches being discussed on the KVM list to disallow access to
> guest memory without memory encryption. (most of them sound like people
> are trying to convert KVM into XEN, but again, what do I know ... :)  )

I don't love the name, it's just the best I've thought of so far.

> "host-trust-limitation"  sounds like "I am the hypervisor, I configure
> limited trust into myself".

Which is kind of... accurate...

> Also, "untrusted-host" would be a little bit
> nicer (I think trust is a black/white thing).

> However, once we have multiple options to protect a guest (memory
> encryption, unmapping guest pages ,...) the name will no longer really
> suffice to configure QEMU, no?

That's why it takes a parameter.  It points to an object which can
itself have more properties to configure the details.  SEV already
needs that set up, though for both PEF and s390 PV we could pre-create
a standard htl object.

> > For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> > can be extended to cover the Intel and s390 mechanisms as well,
> > though.
> 
> The only approach on s390x to not glue command line properties to the
> cpu model would be to remove the CPU model feature and replace it by the
> command line parameter. But that would, of course, be an incompatible break.

I don't really understand why you're so against setting the cpu
default parameters from the machine.  The machine already sets basic
configuration for all sorts of devices in the VM, that's kind of what
it's for.

> How do upper layers actually figure out if memory encryption etc is
> available? on s390x, it's simply via the expanded host CPU model.

Haven't really tackled that yet.  But one way that works for multiple
systems has got to be better than a separate one for each, right?
David Hildenbrand June 19, 2020, 9:56 a.m. UTC | #5
>> "host-trust-limitation"  sounds like "I am the hypervisor, I configure
>> limited trust into myself". Also, "untrusted-host" would be a little bit
>> nicer (I think trust is a black/white thing).
>>
>> However, once we have multiple options to protect a guest (memory
>> encryption, unmapping guest pages ,...) the name will no longer really
>> suffice to configure QEMU, no?
> 
> Hm... we could have a property that accepts bits indicating where the
> actual limitation lies. Different parts of the code could then make
> more fine-grained decisions of what needs to be done. Feels a bit
> overengineered today; but maybe there's already stuff with different
> semantics in the pipeline somewhere?
> 
>>
>>> For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
>>> can be extended to cover the Intel and s390 mechanisms as well,
>>> though.  
>>
>> The only approach on s390x to not glue command line properties to the
>> cpu model would be to remove the CPU model feature and replace it by the
>> command line parameter. But that would, of course, be an incompatible break.
> 
> Yuck.
> 
> We still need to provide the cpu feature to the *guest* in any case, no?

Yeah, but that could be wired up internally. Wouldn't consider it clean,
though (I second the "overengineered" above).
David Hildenbrand June 19, 2020, 10:04 a.m. UTC | #6
>> However, once we have multiple options to protect a guest (memory
>> encryption, unmapping guest pages ,...) the name will no longer really
>> suffice to configure QEMU, no?
> 
> That's why it takes a parameter.  It points to an object which can
> itself have more properties to configure the details.  SEV already
> needs that set up, though for both PEF and s390 PV we could pre-create
> a standard htl object.

Ah, okay, that's the "platform specific object which configures and
manages the specific details". It would have been nice in the cover
letter to show some examples of how that would look like.

So it's wrapping architecture-specific data in a common parameter. Hmm.

> 
>>> For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
>>> can be extended to cover the Intel and s390 mechanisms as well,
>>> though.
>>
>> The only approach on s390x to not glue command line properties to the
>> cpu model would be to remove the CPU model feature and replace it by the
>> command line parameter. But that would, of course, be an incompatible break.
> 
> I don't really understand why you're so against setting the cpu
> default parameters from the machine.  The machine already sets basic
> configuration for all sorts of devices in the VM, that's kind of what
> it's for.

It's a general design philosophy that the CPU model (especially the host
CPU model) does not depend on other command line parameters (except the
accelerator, and I think in corner cases on the machine). Necessary for
reliable host model probing by libvirt, for example.

We also don't have similar things for nested virt.

> 
>> How do upper layers actually figure out if memory encryption etc is
>> available? on s390x, it's simply via the expanded host CPU model.
> 
> Haven't really tackled that yet.  But one way that works for multiple
> systems has got to be better than a separate one for each, right?

I think that's an important piece. Especially once multiple different
approaches are theoretically available one wants to sense from upper layers.

At least on s390x, it really is like just another CPU-visible feature
that tells the guest that it can switch to protected mode.
Cornelia Huck June 19, 2020, 10:05 a.m. UTC | #7
On Fri, 19 Jun 2020 11:56:49 +0200
David Hildenbrand <david@redhat.com> wrote:

> >>> For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> >>> can be extended to cover the Intel and s390 mechanisms as well,
> >>> though.    
> >>
> >> The only approach on s390x to not glue command line properties to the
> >> cpu model would be to remove the CPU model feature and replace it by the
> >> command line parameter. But that would, of course, be an incompatible break.  
> > 
> > Yuck.
> > 
> > We still need to provide the cpu feature to the *guest* in any case, no?  
> 
> Yeah, but that could be wired up internally. Wouldn't consider it clean,
> though (I second the "overengineered" above).

Could an internally wired-up cpu feature be introspected? Also, what
happens if new cpu features are introduced that have a dependency on or
a conflict with this one?
David Hildenbrand June 19, 2020, 10:10 a.m. UTC | #8
On 19.06.20 12:05, Cornelia Huck wrote:
> On Fri, 19 Jun 2020 11:56:49 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>>>>> For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
>>>>> can be extended to cover the Intel and s390 mechanisms as well,
>>>>> though.    
>>>>
>>>> The only approach on s390x to not glue command line properties to the
>>>> cpu model would be to remove the CPU model feature and replace it by the
>>>> command line parameter. But that would, of course, be an incompatible break.  
>>>
>>> Yuck.
>>>
>>> We still need to provide the cpu feature to the *guest* in any case, no?  
>>
>> Yeah, but that could be wired up internally. Wouldn't consider it clean,
>> though (I second the "overengineered" above).
> 
> Could an internally wired-up cpu feature be introspected? Also, what

Nope. It would just be e.g., a "machine feature" indicated to the guest
via the STFL interface/instruction. I was tackling the introspect part
when asking David how to sense from upper layers. It would have to be
sense via a different interface as it would not longer be modeled as
part of CPU features in QEMU.

> happens if new cpu features are introduced that have a dependency on or
> a conflict with this one?

Conflict: bail out in QEMU when incompatible options are specified.
Dependency: warn and continue/fixup (e.g., mask off?)

Not clean I think.
Cornelia Huck June 22, 2020, 12:02 p.m. UTC | #9
On Fri, 19 Jun 2020 12:10:13 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 19.06.20 12:05, Cornelia Huck wrote:
> > On Fri, 19 Jun 2020 11:56:49 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >>>>> For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> >>>>> can be extended to cover the Intel and s390 mechanisms as well,
> >>>>> though.      
> >>>>
> >>>> The only approach on s390x to not glue command line properties to the
> >>>> cpu model would be to remove the CPU model feature and replace it by the
> >>>> command line parameter. But that would, of course, be an incompatible break.    
> >>>
> >>> Yuck.
> >>>
> >>> We still need to provide the cpu feature to the *guest* in any case, no?    
> >>
> >> Yeah, but that could be wired up internally. Wouldn't consider it clean,
> >> though (I second the "overengineered" above).  
> > 
> > Could an internally wired-up cpu feature be introspected? Also, what  
> 
> Nope. It would just be e.g., a "machine feature" indicated to the guest
> via the STFL interface/instruction. I was tackling the introspect part
> when asking David how to sense from upper layers. It would have to be
> sense via a different interface as it would not longer be modeled as
> part of CPU features in QEMU.
> 
> > happens if new cpu features are introduced that have a dependency on or
> > a conflict with this one?  
> 
> Conflict: bail out in QEMU when incompatible options are specified.
> Dependency: warn and continue/fixup (e.g., mask off?)

Masking off would likely be surprising to the user.

> Not clean I think.

I agree.

Still unsure how to bring this new machine property and the cpu feature
together. Would be great to have the same interface everywhere, but
having two distinct command line objects depend on each other sucks.
Automatically setting the feature bit if pv is supported complicates
things further.

(Is there any requirement that the machine object has been already set
up before the cpu features are processed? Or the other way around?)

Does this have any implications when probing with the 'none' machine?
Christian Borntraeger June 22, 2020, 2:27 p.m. UTC | #10
On 19.06.20 04:05, David Gibson wrote:
> A number of hardware platforms are implementing mechanisms whereby the
> hypervisor does not have unfettered access to guest memory, in order
> to mitigate the security impact of a compromised hypervisor.
> 
> AMD's SEV implements this with in-cpu memory encryption, and Intel has
> its own memory encryption mechanism.  POWER has an upcoming mechanism
> to accomplish this in a different way, using a new memory protection
> level plus a small trusted ultravisor.  s390 also has a protected
> execution environment.
> 
> The current code (committed or draft) for these features has each
> platform's version configured entirely differently.  That doesn't seem
> ideal for users, or particularly for management layers.
> 
> AMD SEV introduces a notionally generic machine option
> "machine-encryption", but it doesn't actually cover any cases other
> than SEV.
> 
> This series is a proposal to at least partially unify configuration
> for these mechanisms, by renaming and generalizing AMD's
> "memory-encryption" property.  It is replaced by a
> "host-trust-limitation" property pointing to a platform specific
> object which configures and manages the specific details.
> 
> For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> can be extended to cover the Intel and s390 mechanisms as well,
> though.

Let me try to summarize what I understand what you try to achieve:
one command line parameter for all platforms that 

common across all platforms:
- disable KSM
- by default enables iommu_platform


per platform:
- setup the necessary encryption scheme when appropriate
- block migration
-....


The tricky part is certainly the per platform thing. For example on
s390 we just have a cpumodel flag that provides interfaces to the guest
to switch into protected mode via the ultravisor. This works perfectly
fine with the host model, so no need to configure anything.  The platform
code then disables KSM _on_switchover_ and not in general. Because the 
guest CAN switch into protected, but it does not have to.

So this feels really hard to do right. Would a virtual BoF on KVM forum
be too late? We had a BoF on protected guests last year and that was
valuable.
Cornelia Huck June 24, 2020, 7:06 a.m. UTC | #11
On Mon, 22 Jun 2020 16:27:28 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 19.06.20 04:05, David Gibson wrote:
> > A number of hardware platforms are implementing mechanisms whereby the
> > hypervisor does not have unfettered access to guest memory, in order
> > to mitigate the security impact of a compromised hypervisor.
> > 
> > AMD's SEV implements this with in-cpu memory encryption, and Intel has
> > its own memory encryption mechanism.  POWER has an upcoming mechanism
> > to accomplish this in a different way, using a new memory protection
> > level plus a small trusted ultravisor.  s390 also has a protected
> > execution environment.
> > 
> > The current code (committed or draft) for these features has each
> > platform's version configured entirely differently.  That doesn't seem
> > ideal for users, or particularly for management layers.
> > 
> > AMD SEV introduces a notionally generic machine option
> > "machine-encryption", but it doesn't actually cover any cases other
> > than SEV.
> > 
> > This series is a proposal to at least partially unify configuration
> > for these mechanisms, by renaming and generalizing AMD's
> > "memory-encryption" property.  It is replaced by a
> > "host-trust-limitation" property pointing to a platform specific
> > object which configures and manages the specific details.
> > 
> > For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> > can be extended to cover the Intel and s390 mechanisms as well,
> > though.  
> 
> Let me try to summarize what I understand what you try to achieve:
> one command line parameter for all platforms that 
> 
> common across all platforms:
> - disable KSM
> - by default enables iommu_platform
> 
> 
> per platform:
> - setup the necessary encryption scheme when appropriate
> - block migration
> -....
> 
> 
> The tricky part is certainly the per platform thing. For example on
> s390 we just have a cpumodel flag that provides interfaces to the guest
> to switch into protected mode via the ultravisor. This works perfectly
> fine with the host model, so no need to configure anything.  The platform
> code then disables KSM _on_switchover_ and not in general. Because the 
> guest CAN switch into protected, but it does not have to.
> 
> So this feels really hard to do right. Would a virtual BoF on KVM forum
> be too late? We had a BoF on protected guests last year and that was
> valuable.

Maybe we can do some kind of call to discuss this earlier? (Maybe in
the KVM call slot on Tuesdays?) I think it would be really helpful if
everybody would have at least a general understanding about how
encryption/protection works on the different architectures.
David Gibson June 25, 2020, 5:25 a.m. UTC | #12
On Mon, Jun 22, 2020 at 02:02:54PM +0200, Cornelia Huck wrote:
> On Fri, 19 Jun 2020 12:10:13 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
> > On 19.06.20 12:05, Cornelia Huck wrote:
> > > On Fri, 19 Jun 2020 11:56:49 +0200
> > > David Hildenbrand <david@redhat.com> wrote:
> > >   
> > >>>>> For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> > >>>>> can be extended to cover the Intel and s390 mechanisms as well,
> > >>>>> though.      
> > >>>>
> > >>>> The only approach on s390x to not glue command line properties to the
> > >>>> cpu model would be to remove the CPU model feature and replace it by the
> > >>>> command line parameter. But that would, of course, be an incompatible break.    
> > >>>
> > >>> Yuck.
> > >>>
> > >>> We still need to provide the cpu feature to the *guest* in any case, no?    
> > >>
> > >> Yeah, but that could be wired up internally. Wouldn't consider it clean,
> > >> though (I second the "overengineered" above).  
> > > 
> > > Could an internally wired-up cpu feature be introspected? Also, what  
> > 
> > Nope. It would just be e.g., a "machine feature" indicated to the guest
> > via the STFL interface/instruction. I was tackling the introspect part
> > when asking David how to sense from upper layers. It would have to be
> > sense via a different interface as it would not longer be modeled as
> > part of CPU features in QEMU.
> > 
> > > happens if new cpu features are introduced that have a dependency on or
> > > a conflict with this one?  
> > 
> > Conflict: bail out in QEMU when incompatible options are specified.
> > Dependency: warn and continue/fixup (e.g., mask off?)
> 
> Masking off would likely be surprising to the user.
> 
> > Not clean I think.
> 
> I agree.
> 
> Still unsure how to bring this new machine property and the cpu feature
> together. Would be great to have the same interface everywhere, but
> having two distinct command line objects depend on each other sucks.

Kinda, but the reality is that hardware - virtual and otherwise -
frequently doesn't have entirely orthogonal configuration for each of
its components.  This is by no means new in that regard.

> Automatically setting the feature bit if pv is supported complicates
> things further.

AIUI, on s390 the "unpack" feature is available by default on recent
models.  In that case you could do this:

 * Don't modify either cpu or HTL options based on each other
 * Bail out if the user specifies a non "unpack" secure CPU along with
   the HTL option

Cases of note:
 - User specifies an old CPU model + htl
   or explicitly sets unpack=off + htl
	=> fails with an error, correctly
 - User specifies modern/default cpu + htl, with secure aware guest
 	=> works as a secure guest
 - User specifies modern/default cpu + htl, with non secure aware guest
	=> works, though not secure (and maybe slower than neccessary)
 - User specifies modern/default cpu, no htl, with non-secure guest
 	=> works, "unpack" feature is present but unused
 - User specifies modern/default cpu, no htl, secure guest
  	=> this is the worst one.  It kind of works by accident if
	   you've also  manually specified whatever virtio (and
	   anything else) options are necessary. Ugly, but no
	   different from the situation right now, IIUC

> (Is there any requirement that the machine object has been already set
> up before the cpu features are processed? Or the other way around?)

CPUs are usually created by the machine, so I believe we can count on
the machine object being there first.

> Does this have any implications when probing with the 'none' machine?

I'm not sure.  In your case, I guess the cpu bit would still show up
as before, so it would tell you base feature availability, but not
whether you can use the new configuration option.

Since the HTL option is generic, you could still set it on the "none"
machine, though it wouldn't really have any effect.  That is, if you
could create a suitable object to point it at, which would depend on
... details.
David Gibson June 25, 2020, 5:42 a.m. UTC | #13
On Fri, Jun 19, 2020 at 12:04:25PM +0200, David Hildenbrand wrote:
> >> However, once we have multiple options to protect a guest (memory
> >> encryption, unmapping guest pages ,...) the name will no longer really
> >> suffice to configure QEMU, no?
> > 
> > That's why it takes a parameter.  It points to an object which can
> > itself have more properties to configure the details.  SEV already
> > needs that set up, though for both PEF and s390 PV we could pre-create
> > a standard htl object.
> 
> Ah, okay, that's the "platform specific object which configures and
> manages the specific details". It would have been nice in the cover
> letter to show some examples of how that would look like.

Ok, I can try to add some.

> So it's wrapping architecture-specific data in a common
> parameter. Hmm.

Well, I don't know I'd say "wrapping".  You have a common parameter
that points to an object with a well defined interface.  The available
implementations of that object will tend to be either zero or one per
architecture, but there's no theoretical reason it has to be.  Indeed
we expect at least 2 for x86 (SEV and the Intel one who's name I never
remember).  Extra ones are entirely plausible for POWER and maybe s390
too, when an updated version of PEF or PV inevitably rolls around.

Some sort of new HTL scheme which could work across multiple archs is
much less likely, but it's not totally impossible either.

> >>> For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> >>> can be extended to cover the Intel and s390 mechanisms as well,
> >>> though.
> >>
> >> The only approach on s390x to not glue command line properties to the
> >> cpu model would be to remove the CPU model feature and replace it by the
> >> command line parameter. But that would, of course, be an incompatible break.
> > 
> > I don't really understand why you're so against setting the cpu
> > default parameters from the machine.  The machine already sets basic
> > configuration for all sorts of devices in the VM, that's kind of what
> > it's for.
> 
> It's a general design philosophy that the CPU model (especially the host
> CPU model) does not depend on other command line parameters (except the
> accelerator, and I think in corner cases on the machine). Necessary for
> reliable host model probing by libvirt, for example.

Ok, I've proposed a revision which doesn't require altering the CPU
model elsewhere in this thread.

> We also don't have similar things for nested virt.

I'm not sure what you're getting at there.

> >> How do upper layers actually figure out if memory encryption etc is
> >> available? on s390x, it's simply via the expanded host CPU model.
> > 
> > Haven't really tackled that yet.  But one way that works for multiple
> > systems has got to be better than a separate one for each, right?
> 
> I think that's an important piece. Especially once multiple different
> approaches are theoretically available one wants to sense from upper layers.

Fair point.

So... IIRC there's a general way of looking at available properties
for any object, including the machine.  So we can probe for
availability of the "host-trust-limitation" property itself easily
enough.

I guess we do need a way of probing for what implementations of the
htl interface are available.  And, if we go down that path, if there
are any pre-generated htl objects available.

> At least on s390x, it really is like just another CPU-visible feature
> that tells the guest that it can switch to protected mode.

Right.. which is great for you, since you already have a nice
orthogonal interface for that.   On POWER, (a) CPU model isn't enough
since you need a running ultravisor as well and (b) CPU feature
detection is already a real mess for.. reasons.
David Gibson June 25, 2020, 5:44 a.m. UTC | #14
On Mon, Jun 22, 2020 at 04:27:28PM +0200, Christian Borntraeger wrote:
> On 19.06.20 04:05, David Gibson wrote:
> > A number of hardware platforms are implementing mechanisms whereby the
> > hypervisor does not have unfettered access to guest memory, in order
> > to mitigate the security impact of a compromised hypervisor.
> > 
> > AMD's SEV implements this with in-cpu memory encryption, and Intel has
> > its own memory encryption mechanism.  POWER has an upcoming mechanism
> > to accomplish this in a different way, using a new memory protection
> > level plus a small trusted ultravisor.  s390 also has a protected
> > execution environment.
> > 
> > The current code (committed or draft) for these features has each
> > platform's version configured entirely differently.  That doesn't seem
> > ideal for users, or particularly for management layers.
> > 
> > AMD SEV introduces a notionally generic machine option
> > "machine-encryption", but it doesn't actually cover any cases other
> > than SEV.
> > 
> > This series is a proposal to at least partially unify configuration
> > for these mechanisms, by renaming and generalizing AMD's
> > "memory-encryption" property.  It is replaced by a
> > "host-trust-limitation" property pointing to a platform specific
> > object which configures and manages the specific details.
> > 
> > For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> > can be extended to cover the Intel and s390 mechanisms as well,
> > though.
> 
> Let me try to summarize what I understand what you try to achieve:
> one command line parameter for all platforms that 
> 
> common across all platforms:
> - disable KSM
> - by default enables iommu_platform

Pretty much, yes.  Plus, in future if we discover other things that
don't make sense in the context of a guest whose memory we can't
freely access, it can check for those and set sane defaults
accordingly.

> per platform:
> - setup the necessary encryption scheme when appropriate
> - block migration

That's true for now, but I believe there are plans to make secure
guests migratable, so that's not an inherent property.

> -....
> 
> 
> The tricky part is certainly the per platform thing. For example on
> s390 we just have a cpumodel flag that provides interfaces to the guest
> to switch into protected mode via the ultravisor. This works perfectly
> fine with the host model, so no need to configure anything.  The platform
> code then disables KSM _on_switchover_ and not in general.

Right, because your platform code is aware of the switchover.  On
POWER, we aren't.

> Because the 
> guest CAN switch into protected, but it does not have to.
> 
> So this feels really hard to do right. Would a virtual BoF on KVM forum
> be too late? We had a BoF on protected guests last year and that was
> valuable.
David Gibson June 25, 2020, 5:47 a.m. UTC | #15
On Wed, Jun 24, 2020 at 09:06:48AM +0200, Cornelia Huck wrote:
> On Mon, 22 Jun 2020 16:27:28 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
> > On 19.06.20 04:05, David Gibson wrote:
> > > A number of hardware platforms are implementing mechanisms whereby the
> > > hypervisor does not have unfettered access to guest memory, in order
> > > to mitigate the security impact of a compromised hypervisor.
> > > 
> > > AMD's SEV implements this with in-cpu memory encryption, and Intel has
> > > its own memory encryption mechanism.  POWER has an upcoming mechanism
> > > to accomplish this in a different way, using a new memory protection
> > > level plus a small trusted ultravisor.  s390 also has a protected
> > > execution environment.
> > > 
> > > The current code (committed or draft) for these features has each
> > > platform's version configured entirely differently.  That doesn't seem
> > > ideal for users, or particularly for management layers.
> > > 
> > > AMD SEV introduces a notionally generic machine option
> > > "machine-encryption", but it doesn't actually cover any cases other
> > > than SEV.
> > > 
> > > This series is a proposal to at least partially unify configuration
> > > for these mechanisms, by renaming and generalizing AMD's
> > > "memory-encryption" property.  It is replaced by a
> > > "host-trust-limitation" property pointing to a platform specific
> > > object which configures and manages the specific details.
> > > 
> > > For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> > > can be extended to cover the Intel and s390 mechanisms as well,
> > > though.  
> > 
> > Let me try to summarize what I understand what you try to achieve:
> > one command line parameter for all platforms that 
> > 
> > common across all platforms:
> > - disable KSM
> > - by default enables iommu_platform
> > 
> > 
> > per platform:
> > - setup the necessary encryption scheme when appropriate
> > - block migration
> > -....
> > 
> > 
> > The tricky part is certainly the per platform thing. For example on
> > s390 we just have a cpumodel flag that provides interfaces to the guest
> > to switch into protected mode via the ultravisor. This works perfectly
> > fine with the host model, so no need to configure anything.  The platform
> > code then disables KSM _on_switchover_ and not in general. Because the 
> > guest CAN switch into protected, but it does not have to.
> > 
> > So this feels really hard to do right. Would a virtual BoF on KVM forum
> > be too late? We had a BoF on protected guests last year and that was
> > valuable.
> 
> Maybe we can do some kind of call to discuss this earlier? (Maybe in
> the KVM call slot on Tuesdays?) I think it would be really helpful if
> everybody would have at least a general understanding about how
> encryption/protection works on the different architectures.

Yes, I think this would be a good idea.  KVM Forum is probably later
than we want, plus since it is virtual, I probably won't be shifting
into the right timezone to attend much of it.

I don't know when that Tuesday KVM call is.  Generally the best
available time for Australia/Europe meetings this time of year is 9am
CET / 5pm AEST.  As a once off I could go somewhat later into my
evening, though.
David Gibson June 25, 2020, 5:48 a.m. UTC | #16
On Thu, Jun 25, 2020 at 03:47:23PM +1000, David Gibson wrote:
> On Wed, Jun 24, 2020 at 09:06:48AM +0200, Cornelia Huck wrote:
> > On Mon, 22 Jun 2020 16:27:28 +0200
> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> > 
> > > On 19.06.20 04:05, David Gibson wrote:
> > > > A number of hardware platforms are implementing mechanisms whereby the
> > > > hypervisor does not have unfettered access to guest memory, in order
> > > > to mitigate the security impact of a compromised hypervisor.
> > > > 
> > > > AMD's SEV implements this with in-cpu memory encryption, and Intel has
> > > > its own memory encryption mechanism.  POWER has an upcoming mechanism
> > > > to accomplish this in a different way, using a new memory protection
> > > > level plus a small trusted ultravisor.  s390 also has a protected
> > > > execution environment.
> > > > 
> > > > The current code (committed or draft) for these features has each
> > > > platform's version configured entirely differently.  That doesn't seem
> > > > ideal for users, or particularly for management layers.
> > > > 
> > > > AMD SEV introduces a notionally generic machine option
> > > > "machine-encryption", but it doesn't actually cover any cases other
> > > > than SEV.
> > > > 
> > > > This series is a proposal to at least partially unify configuration
> > > > for these mechanisms, by renaming and generalizing AMD's
> > > > "memory-encryption" property.  It is replaced by a
> > > > "host-trust-limitation" property pointing to a platform specific
> > > > object which configures and manages the specific details.
> > > > 
> > > > For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
> > > > can be extended to cover the Intel and s390 mechanisms as well,
> > > > though.  
> > > 
> > > Let me try to summarize what I understand what you try to achieve:
> > > one command line parameter for all platforms that 
> > > 
> > > common across all platforms:
> > > - disable KSM
> > > - by default enables iommu_platform
> > > 
> > > 
> > > per platform:
> > > - setup the necessary encryption scheme when appropriate
> > > - block migration
> > > -....
> > > 
> > > 
> > > The tricky part is certainly the per platform thing. For example on
> > > s390 we just have a cpumodel flag that provides interfaces to the guest
> > > to switch into protected mode via the ultravisor. This works perfectly
> > > fine with the host model, so no need to configure anything.  The platform
> > > code then disables KSM _on_switchover_ and not in general. Because the 
> > > guest CAN switch into protected, but it does not have to.
> > > 
> > > So this feels really hard to do right. Would a virtual BoF on KVM forum
> > > be too late? We had a BoF on protected guests last year and that was
> > > valuable.
> > 
> > Maybe we can do some kind of call to discuss this earlier? (Maybe in
> > the KVM call slot on Tuesdays?) I think it would be really helpful if
> > everybody would have at least a general understanding about how
> > encryption/protection works on the different architectures.
> 
> Yes, I think this would be a good idea.  KVM Forum is probably later
> than we want, plus since it is virtual, I probably won't be shifting
> into the right timezone to attend much of it.
> 
> I don't know when that Tuesday KVM call is.  Generally the best
> available time for Australia/Europe meetings this time of year is 9am
> CET / 5pm AEST.  As a once off I could go somewhat later into my
> evening, though.

Oh.. plus I'm on holiday next week and the one after (27 Jun - 12 Jul).
David Hildenbrand June 25, 2020, 6:59 a.m. UTC | #17
> 
>> So it's wrapping architecture-specific data in a common
>> parameter. Hmm.
> 
> Well, I don't know I'd say "wrapping".  You have a common parameter
> that points to an object with a well defined interface.  The available
> implementations of that object will tend to be either zero or one per
> architecture, but there's no theoretical reason it has to be.  Indeed
> we expect at least 2 for x86 (SEV and the Intel one who's name I never
> remember).  Extra ones are entirely plausible for POWER and maybe s390
> too, when an updated version of PEF or PV inevitably rolls around.
> 
> Some sort of new HTL scheme which could work across multiple archs is
> much less likely, but it's not totally impossible either.
> 
>>>>> For now this series covers just AMD SEV and POWER PEF.  I'm hoping it
>>>>> can be extended to cover the Intel and s390 mechanisms as well,
>>>>> though.
>>>>
>>>> The only approach on s390x to not glue command line properties to the
>>>> cpu model would be to remove the CPU model feature and replace it by the
>>>> command line parameter. But that would, of course, be an incompatible break.
>>>
>>> I don't really understand why you're so against setting the cpu
>>> default parameters from the machine.  The machine already sets basic
>>> configuration for all sorts of devices in the VM, that's kind of what
>>> it's for.
>>
>> It's a general design philosophy that the CPU model (especially the host
>> CPU model) does not depend on other command line parameters (except the
>> accelerator, and I think in corner cases on the machine). Necessary for
>> reliable host model probing by libvirt, for example.
> 
> Ok, I've proposed a revision which doesn't require altering the CPU
> model elsewhere in this thread.
> 
>> We also don't have similar things for nested virt.
> 
> I'm not sure what you're getting at there.

Sorry, back when we introduced nested virt there was a similar
(internal?) discussion, to enable/disable it via a machine flag and not
via 1..X CPU features. We went for the latter, because it matches the
actual architecture and allows for easy migration checks etc. Nested
virt also collides with some features currently (e.g., huge page backing
for the guest), but not as severe as encrypted virtualization.

> 
>>>> How do upper layers actually figure out if memory encryption etc is
>>>> available? on s390x, it's simply via the expanded host CPU model.
>>>
>>> Haven't really tackled that yet.  But one way that works for multiple
>>> systems has got to be better than a separate one for each, right?
>>
>> I think that's an important piece. Especially once multiple different
>> approaches are theoretically available one wants to sense from upper layers.
> 
> Fair point.
> 
> So... IIRC there's a general way of looking at available properties
> for any object, including the machine.  So we can probe for
> availability of the "host-trust-limitation" property itself easily
> enough.

You can have a look at how it's currently probed by libvirt in

https://www.redhat.com/archives/libvir-list/2020-June/msg00518.html

For now, the s390x check consists of
- checking if /sys/firmware/uv is available
- checking if the kernel cmdline contains 'prot_virt=1'

The sev check is
- checking if /sys/module/kvm_amd/parameters/sev contains the
   value '1'
- checking if /dev/sev

So at least libvirt does not sense via the CPU model on s390x yet.

> 
> I guess we do need a way of probing for what implementations of the
> htl interface are available.  And, if we go down that path, if there
> are any pre-generated htl objects available.
> 
>> At least on s390x, it really is like just another CPU-visible feature
>> that tells the guest that it can switch to protected mode.
> 
> Right.. which is great for you, since you already have a nice
> orthogonal interface for that.   On POWER, (a) CPU model isn't enough
> since you need a running ultravisor as well and (b) CPU feature
> detection is already a real mess for.. reasons.

I can understand the pain of the latter ... :)
David Hildenbrand June 25, 2020, 7:06 a.m. UTC | #18
>> Still unsure how to bring this new machine property and the cpu feature
>> together. Would be great to have the same interface everywhere, but
>> having two distinct command line objects depend on each other sucks.
> 
> Kinda, but the reality is that hardware - virtual and otherwise -
> frequently doesn't have entirely orthogonal configuration for each of
> its components.  This is by no means new in that regard.
> 
>> Automatically setting the feature bit if pv is supported complicates
>> things further.
> 
> AIUI, on s390 the "unpack" feature is available by default on recent
> models.  In that case you could do this:
> 
>  * Don't modify either cpu or HTL options based on each other
>  * Bail out if the user specifies a non "unpack" secure CPU along with
>    the HTL option
> 
> Cases of note:
>  - User specifies an old CPU model + htl
>    or explicitly sets unpack=off + htl
> 	=> fails with an error, correctly
>  - User specifies modern/default cpu + htl, with secure aware guest
>  	=> works as a secure guest
>  - User specifies modern/default cpu + htl, with non secure aware guest
> 	=> works, though not secure (and maybe slower than neccessary)
>  - User specifies modern/default cpu, no htl, with non-secure guest
>  	=> works, "unpack" feature is present but unused
>  - User specifies modern/default cpu, no htl, secure guest
>   	=> this is the worst one.  It kind of works by accident if
> 	   you've also  manually specified whatever virtio (and
> 	   anything else) options are necessary. Ugly, but no
> 	   different from the situation right now, IIUC
> 
>> (Is there any requirement that the machine object has been already set
>> up before the cpu features are processed? Or the other way around?)
> 
> CPUs are usually created by the machine, so I believe we can count on
> the machine object being there first.

CPU model initialization is one of the first things machine
initialization code does on s390x.

static void ccw_init(MachineState *machine)
{
    [... memory init ...]
    s390_sclp_init();
    s390_memory_init(machine->ram);
    /* init CPUs (incl. CPU model) early so s390_has_feature() works */
    s390_init_cpus(machine);
    [...]
}

> 
>> Does this have any implications when probing with the 'none' machine?
> 
> I'm not sure.  In your case, I guess the cpu bit would still show up
> as before, so it would tell you base feature availability, but not
> whether you can use the new configuration option.
> 
> Since the HTL option is generic, you could still set it on the "none"
> machine, though it wouldn't really have any effect.  That is, if you
> could create a suitable object to point it at, which would depend on
> ... details.
> 

The important point is that we never want the (expanded) host cpu model
look different when either specifying or not specifying the HTL
property. We don't want to run into issues where libvirt probes and gets
host model X, but when using that probed model (automatically) for a
guest domain, we suddenly cannot run X anymore.
Cornelia Huck June 25, 2020, 9:49 a.m. UTC | #19
On Thu, 25 Jun 2020 08:59:00 +0200
David Hildenbrand <david@redhat.com> wrote:

> >>>> How do upper layers actually figure out if memory encryption etc is
> >>>> available? on s390x, it's simply via the expanded host CPU model.  
> >>>
> >>> Haven't really tackled that yet.  But one way that works for multiple
> >>> systems has got to be better than a separate one for each, right?  
> >>
> >> I think that's an important piece. Especially once multiple different
> >> approaches are theoretically available one wants to sense from upper layers.  
> > 
> > Fair point.
> > 
> > So... IIRC there's a general way of looking at available properties
> > for any object, including the machine.  So we can probe for
> > availability of the "host-trust-limitation" property itself easily
> > enough.  
> 
> You can have a look at how it's currently probed by libvirt in
> 
> https://www.redhat.com/archives/libvir-list/2020-June/msg00518.html
> 
> For now, the s390x check consists of
> - checking if /sys/firmware/uv is available
> - checking if the kernel cmdline contains 'prot_virt=1'
> 
> The sev check is
> - checking if /sys/module/kvm_amd/parameters/sev contains the
>    value '1'
> - checking if /dev/sev
> 
> So at least libvirt does not sense via the CPU model on s390x yet.

It checks for 158 (which is apparently 'host supports secure
execution'). IIUC, only 161 ('unpack facility') is relevant for the
guest... does that also show up on the host? (I guess it does, as it
describes an ultravisor feature, IIUC.) If it is always implied,
libvirt probably does not need an extra check.
David Gibson June 26, 2020, 4:42 a.m. UTC | #20
On Thu, Jun 25, 2020 at 09:06:05AM +0200, David Hildenbrand wrote:
> >> Still unsure how to bring this new machine property and the cpu feature
> >> together. Would be great to have the same interface everywhere, but
> >> having two distinct command line objects depend on each other sucks.
> > 
> > Kinda, but the reality is that hardware - virtual and otherwise -
> > frequently doesn't have entirely orthogonal configuration for each of
> > its components.  This is by no means new in that regard.
> > 
> >> Automatically setting the feature bit if pv is supported complicates
> >> things further.
> > 
> > AIUI, on s390 the "unpack" feature is available by default on recent
> > models.  In that case you could do this:
> > 
> >  * Don't modify either cpu or HTL options based on each other
> >  * Bail out if the user specifies a non "unpack" secure CPU along with
> >    the HTL option
> > 
> > Cases of note:
> >  - User specifies an old CPU model + htl
> >    or explicitly sets unpack=off + htl
> > 	=> fails with an error, correctly
> >  - User specifies modern/default cpu + htl, with secure aware guest
> >  	=> works as a secure guest
> >  - User specifies modern/default cpu + htl, with non secure aware guest
> > 	=> works, though not secure (and maybe slower than neccessary)
> >  - User specifies modern/default cpu, no htl, with non-secure guest
> >  	=> works, "unpack" feature is present but unused
> >  - User specifies modern/default cpu, no htl, secure guest
> >   	=> this is the worst one.  It kind of works by accident if
> > 	   you've also  manually specified whatever virtio (and
> > 	   anything else) options are necessary. Ugly, but no
> > 	   different from the situation right now, IIUC
> > 
> >> (Is there any requirement that the machine object has been already set
> >> up before the cpu features are processed? Or the other way around?)
> > 
> > CPUs are usually created by the machine, so I believe we can count on
> > the machine object being there first.
> 
> CPU model initialization is one of the first things machine
> initialization code does on s390x.

As it is for most platforms, but still, the values of machine
properties are available to you at this point.

> static void ccw_init(MachineState *machine)
> {
>     [... memory init ...]
>     s390_sclp_init();
>     s390_memory_init(machine->ram);
>     /* init CPUs (incl. CPU model) early so s390_has_feature() works */
>     s390_init_cpus(machine);
>     [...]
> }
> 
> > 
> >> Does this have any implications when probing with the 'none' machine?
> > 
> > I'm not sure.  In your case, I guess the cpu bit would still show up
> > as before, so it would tell you base feature availability, but not
> > whether you can use the new configuration option.
> > 
> > Since the HTL option is generic, you could still set it on the "none"
> > machine, though it wouldn't really have any effect.  That is, if you
> > could create a suitable object to point it at, which would depend on
> > ... details.
> > 
> 
> The important point is that we never want the (expanded) host cpu model
> look different when either specifying or not specifying the HTL
> property.

Ah, yes, I see your point.  So my current suggestion will satisfy
that, basically it is:

cpu has unpack (inc. by default) && htl specified
	=> works (allowing secure), as expected

!cpu has unpack && htl specified
	=> bails out with an error

!cpu has unpack && !htl specified
	=> works for a non-secure guest, as expected
	=> guest will fail if it attempts to go secure

cpu has unpack && !htl specified
	=> works as expected for a non-secure guest (unpack feature is
	   present, but unused)
	=> secure guest may work "by accident", but only if all virtio
	   properties have the right values, which is the user's
	   problem

That last case is kinda ugly, but I think it's tolerable.

> We don't want to run into issues where libvirt probes and gets
> host model X, but when using that probed model (automatically) for a
> guest domain, we suddenly cannot run X anymore.
>
David Hildenbrand June 26, 2020, 6:53 a.m. UTC | #21
>>>> Does this have any implications when probing with the 'none' machine?
>>>
>>> I'm not sure.  In your case, I guess the cpu bit would still show up
>>> as before, so it would tell you base feature availability, but not
>>> whether you can use the new configuration option.
>>>
>>> Since the HTL option is generic, you could still set it on the "none"
>>> machine, though it wouldn't really have any effect.  That is, if you
>>> could create a suitable object to point it at, which would depend on
>>> ... details.
>>>
>>
>> The important point is that we never want the (expanded) host cpu model
>> look different when either specifying or not specifying the HTL
>> property.
> 
> Ah, yes, I see your point.  So my current suggestion will satisfy
> that, basically it is:
> 
> cpu has unpack (inc. by default) && htl specified
> 	=> works (allowing secure), as expected

ack

> 
> !cpu has unpack && htl specified
> 	=> bails out with an error

ack

> 
> !cpu has unpack && !htl specified
> 	=> works for a non-secure guest, as expected
> 	=> guest will fail if it attempts to go secure

ack, behavior just like running on older hw without unpack

> 
> cpu has unpack && !htl specified
> 	=> works as expected for a non-secure guest (unpack feature is
> 	   present, but unused)
> 	=> secure guest may work "by accident", but only if all virtio
> 	   properties have the right values, which is the user's
> 	   problem
> 
> That last case is kinda ugly, but I think it's tolerable.

Right, we must not affect non-secure guests, and existing secure setups
(e.g., older qemu machines). Will have to think about this some more,
but does not sound too crazy.

Thanks!
Janosch Frank June 26, 2020, 9:01 a.m. UTC | #22
On 6/26/20 8:53 AM, David Hildenbrand wrote:
>>>>> Does this have any implications when probing with the 'none' machine?
>>>>
>>>> I'm not sure.  In your case, I guess the cpu bit would still show up
>>>> as before, so it would tell you base feature availability, but not
>>>> whether you can use the new configuration option.
>>>>
>>>> Since the HTL option is generic, you could still set it on the "none"
>>>> machine, though it wouldn't really have any effect.  That is, if you
>>>> could create a suitable object to point it at, which would depend on
>>>> ... details.
>>>>
>>>
>>> The important point is that we never want the (expanded) host cpu model
>>> look different when either specifying or not specifying the HTL
>>> property.
>>
>> Ah, yes, I see your point.  So my current suggestion will satisfy
>> that, basically it is:
>>
>> cpu has unpack (inc. by default) && htl specified
>> 	=> works (allowing secure), as expected
> 
> ack
> 
>>
>> !cpu has unpack && htl specified
>> 	=> bails out with an error
> 
> ack
> 
>>
>> !cpu has unpack && !htl specified
>> 	=> works for a non-secure guest, as expected
>> 	=> guest will fail if it attempts to go secure
> 
> ack, behavior just like running on older hw without unpack
> 
>>
>> cpu has unpack && !htl specified
>> 	=> works as expected for a non-secure guest (unpack feature is
>> 	   present, but unused)
>> 	=> secure guest may work "by accident", but only if all virtio
>> 	   properties have the right values, which is the user's
>> 	   problem
>>
>> That last case is kinda ugly, but I think it's tolerable.
> 
> Right, we must not affect non-secure guests, and existing secure setups
> (e.g., older qemu machines). Will have to think about this some more,
> but does not sound too crazy.

I severely dislike having to specify things to make PV work.
The IOMMU is already a thorn in our side and we're working on making the
whole ordeal completely transparent so the only requirement to make this
work is the right machine, kernel, qemu and kernel cmd line option
"prot_virt=1". That's why we do the reboot into PV mode in the first place.

I.e. the goal is that if customers convert compatible guests into
protected ones and start them up on a z15 on a distro with PV support
they can just use the guest without having to change XML or command line
parameters.

Internal customers have already created bugs because they did not follow
the documentation and the more cmd options we bring the more bugzillas
we'll get.

PV is already in the field/GA and can be ordered, as is our documentation.

@Christian: Please chime in here

> 
> Thanks!
>
Daniel P. Berrangé June 26, 2020, 9:32 a.m. UTC | #23
On Fri, Jun 26, 2020 at 11:01:58AM +0200, Janosch Frank wrote:
> On 6/26/20 8:53 AM, David Hildenbrand wrote:
> >>>>> Does this have any implications when probing with the 'none' machine?
> >>>>
> >>>> I'm not sure.  In your case, I guess the cpu bit would still show up
> >>>> as before, so it would tell you base feature availability, but not
> >>>> whether you can use the new configuration option.
> >>>>
> >>>> Since the HTL option is generic, you could still set it on the "none"
> >>>> machine, though it wouldn't really have any effect.  That is, if you
> >>>> could create a suitable object to point it at, which would depend on
> >>>> ... details.
> >>>>
> >>>
> >>> The important point is that we never want the (expanded) host cpu model
> >>> look different when either specifying or not specifying the HTL
> >>> property.
> >>
> >> Ah, yes, I see your point.  So my current suggestion will satisfy
> >> that, basically it is:
> >>
> >> cpu has unpack (inc. by default) && htl specified
> >> 	=> works (allowing secure), as expected
> > 
> > ack
> > 
> >>
> >> !cpu has unpack && htl specified
> >> 	=> bails out with an error
> > 
> > ack
> > 
> >>
> >> !cpu has unpack && !htl specified
> >> 	=> works for a non-secure guest, as expected
> >> 	=> guest will fail if it attempts to go secure
> > 
> > ack, behavior just like running on older hw without unpack
> > 
> >>
> >> cpu has unpack && !htl specified
> >> 	=> works as expected for a non-secure guest (unpack feature is
> >> 	   present, but unused)
> >> 	=> secure guest may work "by accident", but only if all virtio
> >> 	   properties have the right values, which is the user's
> >> 	   problem
> >>
> >> That last case is kinda ugly, but I think it's tolerable.
> > 
> > Right, we must not affect non-secure guests, and existing secure setups
> > (e.g., older qemu machines). Will have to think about this some more,
> > but does not sound too crazy.
> 
> I severely dislike having to specify things to make PV work.
> The IOMMU is already a thorn in our side and we're working on making the
> whole ordeal completely transparent so the only requirement to make this
> work is the right machine, kernel, qemu and kernel cmd line option
> "prot_virt=1". That's why we do the reboot into PV mode in the first place.
> 
> I.e. the goal is that if customers convert compatible guests into
> protected ones and start them up on a z15 on a distro with PV support
> they can just use the guest without having to change XML or command line
> parameters.

If you're exposing new features to the guest machine, then it is usually
to be expected that XML and QEMU command line will change. Some simple
things might be hidable behind a new QEMU machine type or CPU model, but
there's a limit to how much should be hidden that way while staying sane.

I'd really expect the configuration to change when switching a guest to
a new hardware platform and wanting major new functionality to be enabled.
The XML / QEMU config is a low level instantiation of a particular feature
set, optimized for a specific machine, rather than a high level description
of ideal "best" config independent of host machine.

Regards,
Daniel
Janosch Frank June 26, 2020, 9:49 a.m. UTC | #24
On 6/26/20 11:32 AM, Daniel P. Berrangé wrote:
> On Fri, Jun 26, 2020 at 11:01:58AM +0200, Janosch Frank wrote:
>> On 6/26/20 8:53 AM, David Hildenbrand wrote:
>>>>>>> Does this have any implications when probing with the 'none' machine?
>>>>>>
>>>>>> I'm not sure.  In your case, I guess the cpu bit would still show up
>>>>>> as before, so it would tell you base feature availability, but not
>>>>>> whether you can use the new configuration option.
>>>>>>
>>>>>> Since the HTL option is generic, you could still set it on the "none"
>>>>>> machine, though it wouldn't really have any effect.  That is, if you
>>>>>> could create a suitable object to point it at, which would depend on
>>>>>> ... details.
>>>>>>
>>>>>
>>>>> The important point is that we never want the (expanded) host cpu model
>>>>> look different when either specifying or not specifying the HTL
>>>>> property.
>>>>
>>>> Ah, yes, I see your point.  So my current suggestion will satisfy
>>>> that, basically it is:
>>>>
>>>> cpu has unpack (inc. by default) && htl specified
>>>> 	=> works (allowing secure), as expected
>>>
>>> ack
>>>
>>>>
>>>> !cpu has unpack && htl specified
>>>> 	=> bails out with an error
>>>
>>> ack
>>>
>>>>
>>>> !cpu has unpack && !htl specified
>>>> 	=> works for a non-secure guest, as expected
>>>> 	=> guest will fail if it attempts to go secure
>>>
>>> ack, behavior just like running on older hw without unpack
>>>
>>>>
>>>> cpu has unpack && !htl specified
>>>> 	=> works as expected for a non-secure guest (unpack feature is
>>>> 	   present, but unused)
>>>> 	=> secure guest may work "by accident", but only if all virtio
>>>> 	   properties have the right values, which is the user's
>>>> 	   problem
>>>>
>>>> That last case is kinda ugly, but I think it's tolerable.
>>>
>>> Right, we must not affect non-secure guests, and existing secure setups
>>> (e.g., older qemu machines). Will have to think about this some more,
>>> but does not sound too crazy.
>>
>> I severely dislike having to specify things to make PV work.
>> The IOMMU is already a thorn in our side and we're working on making the
>> whole ordeal completely transparent so the only requirement to make this
>> work is the right machine, kernel, qemu and kernel cmd line option
>> "prot_virt=1". That's why we do the reboot into PV mode in the first place.
>>
>> I.e. the goal is that if customers convert compatible guests into
>> protected ones and start them up on a z15 on a distro with PV support
>> they can just use the guest without having to change XML or command line
>> parameters.
> 
> If you're exposing new features to the guest machine, then it is usually
> to be expected that XML and QEMU command line will change. Some simple
> things might be hidable behind a new QEMU machine type or CPU model, but
> there's a limit to how much should be hidden that way while staying sane.
> 
> I'd really expect the configuration to change when switching a guest to
> a new hardware platform and wanting major new functionality to be enabled.
> The XML / QEMU config is a low level instantiation of a particular feature
> set, optimized for a specific machine, rather than a high level description
> of ideal "best" config independent of host machine.

You still have to set the host command line and make sure that unpack is
available. Currently you also have to specify the IOMMU which we like to
drop as a requirement. Everything else is dependent on runtime
information which tells us if we need to take a PV or non-PV branch.
Having the unpack facility should be enough to use the unpack facility.

Keep in mind that we have no real concept of a special protected VM to
begin with. If the VM never boots into a protected kernel it will never
be protected. On a reboot it drops from protected into unprotected mode
to execute the bios and boot loader and then may or may not move back
into a protected state.

> 
> Regards,
> Daniel
>
Dr. David Alan Gilbert June 26, 2020, 10:29 a.m. UTC | #25
* Janosch Frank (frankja@linux.ibm.com) wrote:
> On 6/26/20 11:32 AM, Daniel P. Berrangé wrote:
> > On Fri, Jun 26, 2020 at 11:01:58AM +0200, Janosch Frank wrote:
> >> On 6/26/20 8:53 AM, David Hildenbrand wrote:
> >>>>>>> Does this have any implications when probing with the 'none' machine?
> >>>>>>
> >>>>>> I'm not sure.  In your case, I guess the cpu bit would still show up
> >>>>>> as before, so it would tell you base feature availability, but not
> >>>>>> whether you can use the new configuration option.
> >>>>>>
> >>>>>> Since the HTL option is generic, you could still set it on the "none"
> >>>>>> machine, though it wouldn't really have any effect.  That is, if you
> >>>>>> could create a suitable object to point it at, which would depend on
> >>>>>> ... details.
> >>>>>>
> >>>>>
> >>>>> The important point is that we never want the (expanded) host cpu model
> >>>>> look different when either specifying or not specifying the HTL
> >>>>> property.
> >>>>
> >>>> Ah, yes, I see your point.  So my current suggestion will satisfy
> >>>> that, basically it is:
> >>>>
> >>>> cpu has unpack (inc. by default) && htl specified
> >>>> 	=> works (allowing secure), as expected
> >>>
> >>> ack
> >>>
> >>>>
> >>>> !cpu has unpack && htl specified
> >>>> 	=> bails out with an error
> >>>
> >>> ack
> >>>
> >>>>
> >>>> !cpu has unpack && !htl specified
> >>>> 	=> works for a non-secure guest, as expected
> >>>> 	=> guest will fail if it attempts to go secure
> >>>
> >>> ack, behavior just like running on older hw without unpack
> >>>
> >>>>
> >>>> cpu has unpack && !htl specified
> >>>> 	=> works as expected for a non-secure guest (unpack feature is
> >>>> 	   present, but unused)
> >>>> 	=> secure guest may work "by accident", but only if all virtio
> >>>> 	   properties have the right values, which is the user's
> >>>> 	   problem
> >>>>
> >>>> That last case is kinda ugly, but I think it's tolerable.
> >>>
> >>> Right, we must not affect non-secure guests, and existing secure setups
> >>> (e.g., older qemu machines). Will have to think about this some more,
> >>> but does not sound too crazy.
> >>
> >> I severely dislike having to specify things to make PV work.
> >> The IOMMU is already a thorn in our side and we're working on making the
> >> whole ordeal completely transparent so the only requirement to make this
> >> work is the right machine, kernel, qemu and kernel cmd line option
> >> "prot_virt=1". That's why we do the reboot into PV mode in the first place.
> >>
> >> I.e. the goal is that if customers convert compatible guests into
> >> protected ones and start them up on a z15 on a distro with PV support
> >> they can just use the guest without having to change XML or command line
> >> parameters.
> > 
> > If you're exposing new features to the guest machine, then it is usually
> > to be expected that XML and QEMU command line will change. Some simple
> > things might be hidable behind a new QEMU machine type or CPU model, but
> > there's a limit to how much should be hidden that way while staying sane.
> > 
> > I'd really expect the configuration to change when switching a guest to
> > a new hardware platform and wanting major new functionality to be enabled.
> > The XML / QEMU config is a low level instantiation of a particular feature
> > set, optimized for a specific machine, rather than a high level description
> > of ideal "best" config independent of host machine.
> 
> You still have to set the host command line and make sure that unpack is
> available. Currently you also have to specify the IOMMU which we like to
> drop as a requirement. Everything else is dependent on runtime
> information which tells us if we need to take a PV or non-PV branch.
> Having the unpack facility should be enough to use the unpack facility.
> 
> Keep in mind that we have no real concept of a special protected VM to
> begin with. If the VM never boots into a protected kernel it will never
> be protected. On a reboot it drops from protected into unprotected mode
> to execute the bios and boot loader and then may or may not move back
> into a protected state.

My worry isn't actually how painful adding all the iommu glue is, but
what happens when users forget; especially if they forget for one
device.

I could appreciate having a machine option to cause iommu to then get
turned on with all other devices; but I think also we could do with
something that failed with a nice error if an iommu flag was missing.
For SEV this could be done pretty early, but for power/s390 I guess
you'd have to do this when someone tried to enable secure mode, but
I'm not sure you can tell.

Dave


> > 
> > Regards,
> > Daniel
> > 
> 
> 



--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Daniel P. Berrangé June 26, 2020, 10:58 a.m. UTC | #26
On Fri, Jun 26, 2020 at 11:29:03AM +0100, Dr. David Alan Gilbert wrote:
> * Janosch Frank (frankja@linux.ibm.com) wrote:
> > On 6/26/20 11:32 AM, Daniel P. Berrangé wrote:
> > > On Fri, Jun 26, 2020 at 11:01:58AM +0200, Janosch Frank wrote:
> > >> On 6/26/20 8:53 AM, David Hildenbrand wrote:
> > >>>>>>> Does this have any implications when probing with the 'none' machine?
> > >>>>>>
> > >>>>>> I'm not sure.  In your case, I guess the cpu bit would still show up
> > >>>>>> as before, so it would tell you base feature availability, but not
> > >>>>>> whether you can use the new configuration option.
> > >>>>>>
> > >>>>>> Since the HTL option is generic, you could still set it on the "none"
> > >>>>>> machine, though it wouldn't really have any effect.  That is, if you
> > >>>>>> could create a suitable object to point it at, which would depend on
> > >>>>>> ... details.
> > >>>>>>
> > >>>>>
> > >>>>> The important point is that we never want the (expanded) host cpu model
> > >>>>> look different when either specifying or not specifying the HTL
> > >>>>> property.
> > >>>>
> > >>>> Ah, yes, I see your point.  So my current suggestion will satisfy
> > >>>> that, basically it is:
> > >>>>
> > >>>> cpu has unpack (inc. by default) && htl specified
> > >>>> 	=> works (allowing secure), as expected
> > >>>
> > >>> ack
> > >>>
> > >>>>
> > >>>> !cpu has unpack && htl specified
> > >>>> 	=> bails out with an error
> > >>>
> > >>> ack
> > >>>
> > >>>>
> > >>>> !cpu has unpack && !htl specified
> > >>>> 	=> works for a non-secure guest, as expected
> > >>>> 	=> guest will fail if it attempts to go secure
> > >>>
> > >>> ack, behavior just like running on older hw without unpack
> > >>>
> > >>>>
> > >>>> cpu has unpack && !htl specified
> > >>>> 	=> works as expected for a non-secure guest (unpack feature is
> > >>>> 	   present, but unused)
> > >>>> 	=> secure guest may work "by accident", but only if all virtio
> > >>>> 	   properties have the right values, which is the user's
> > >>>> 	   problem
> > >>>>
> > >>>> That last case is kinda ugly, but I think it's tolerable.
> > >>>
> > >>> Right, we must not affect non-secure guests, and existing secure setups
> > >>> (e.g., older qemu machines). Will have to think about this some more,
> > >>> but does not sound too crazy.
> > >>
> > >> I severely dislike having to specify things to make PV work.
> > >> The IOMMU is already a thorn in our side and we're working on making the
> > >> whole ordeal completely transparent so the only requirement to make this
> > >> work is the right machine, kernel, qemu and kernel cmd line option
> > >> "prot_virt=1". That's why we do the reboot into PV mode in the first place.
> > >>
> > >> I.e. the goal is that if customers convert compatible guests into
> > >> protected ones and start them up on a z15 on a distro with PV support
> > >> they can just use the guest without having to change XML or command line
> > >> parameters.
> > > 
> > > If you're exposing new features to the guest machine, then it is usually
> > > to be expected that XML and QEMU command line will change. Some simple
> > > things might be hidable behind a new QEMU machine type or CPU model, but
> > > there's a limit to how much should be hidden that way while staying sane.
> > > 
> > > I'd really expect the configuration to change when switching a guest to
> > > a new hardware platform and wanting major new functionality to be enabled.
> > > The XML / QEMU config is a low level instantiation of a particular feature
> > > set, optimized for a specific machine, rather than a high level description
> > > of ideal "best" config independent of host machine.
> > 
> > You still have to set the host command line and make sure that unpack is
> > available. Currently you also have to specify the IOMMU which we like to
> > drop as a requirement. Everything else is dependent on runtime
> > information which tells us if we need to take a PV or non-PV branch.
> > Having the unpack facility should be enough to use the unpack facility.
> > 
> > Keep in mind that we have no real concept of a special protected VM to
> > begin with. If the VM never boots into a protected kernel it will never
> > be protected. On a reboot it drops from protected into unprotected mode
> > to execute the bios and boot loader and then may or may not move back
> > into a protected state.
> 
> My worry isn't actually how painful adding all the iommu glue is, but
> what happens when users forget; especially if they forget for one
> device.
> 
> I could appreciate having a machine option to cause iommu to then get
> turned on with all other devices; but I think also we could do with
> something that failed with a nice error if an iommu flag was missing.
> For SEV this could be done pretty early, but for power/s390 I guess
> you'd have to do this when someone tried to enable secure mode, but
> I'm not sure you can tell.

What is the cost / downside of turning on the iommu option for virtio
devices ? Is it something that is reasonable for a mgmt app todo
unconditionally, regardless of whether memory encryption is in use,
or will that have a negative impact on things ?

Regards,
Daniel
Janosch Frank June 26, 2020, 12:49 p.m. UTC | #27
On 6/26/20 12:58 PM, Daniel P. Berrangé wrote:
> On Fri, Jun 26, 2020 at 11:29:03AM +0100, Dr. David Alan Gilbert wrote:
>> * Janosch Frank (frankja@linux.ibm.com) wrote:
>>> On 6/26/20 11:32 AM, Daniel P. Berrangé wrote:
>>>> On Fri, Jun 26, 2020 at 11:01:58AM +0200, Janosch Frank wrote:
>>>>> On 6/26/20 8:53 AM, David Hildenbrand wrote:
>>>>>>>>>> Does this have any implications when probing with the 'none' machine?
>>>>>>>>>
>>>>>>>>> I'm not sure.  In your case, I guess the cpu bit would still show up
>>>>>>>>> as before, so it would tell you base feature availability, but not
>>>>>>>>> whether you can use the new configuration option.
>>>>>>>>>
>>>>>>>>> Since the HTL option is generic, you could still set it on the "none"
>>>>>>>>> machine, though it wouldn't really have any effect.  That is, if you
>>>>>>>>> could create a suitable object to point it at, which would depend on
>>>>>>>>> ... details.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The important point is that we never want the (expanded) host cpu model
>>>>>>>> look different when either specifying or not specifying the HTL
>>>>>>>> property.
>>>>>>>
>>>>>>> Ah, yes, I see your point.  So my current suggestion will satisfy
>>>>>>> that, basically it is:
>>>>>>>
>>>>>>> cpu has unpack (inc. by default) && htl specified
>>>>>>> 	=> works (allowing secure), as expected
>>>>>>
>>>>>> ack
>>>>>>
>>>>>>>
>>>>>>> !cpu has unpack && htl specified
>>>>>>> 	=> bails out with an error
>>>>>>
>>>>>> ack
>>>>>>
>>>>>>>
>>>>>>> !cpu has unpack && !htl specified
>>>>>>> 	=> works for a non-secure guest, as expected
>>>>>>> 	=> guest will fail if it attempts to go secure
>>>>>>
>>>>>> ack, behavior just like running on older hw without unpack
>>>>>>
>>>>>>>
>>>>>>> cpu has unpack && !htl specified
>>>>>>> 	=> works as expected for a non-secure guest (unpack feature is
>>>>>>> 	   present, but unused)
>>>>>>> 	=> secure guest may work "by accident", but only if all virtio
>>>>>>> 	   properties have the right values, which is the user's
>>>>>>> 	   problem
>>>>>>>
>>>>>>> That last case is kinda ugly, but I think it's tolerable.
>>>>>>
>>>>>> Right, we must not affect non-secure guests, and existing secure setups
>>>>>> (e.g., older qemu machines). Will have to think about this some more,
>>>>>> but does not sound too crazy.
>>>>>
>>>>> I severely dislike having to specify things to make PV work.
>>>>> The IOMMU is already a thorn in our side and we're working on making the
>>>>> whole ordeal completely transparent so the only requirement to make this
>>>>> work is the right machine, kernel, qemu and kernel cmd line option
>>>>> "prot_virt=1". That's why we do the reboot into PV mode in the first place.
>>>>>
>>>>> I.e. the goal is that if customers convert compatible guests into
>>>>> protected ones and start them up on a z15 on a distro with PV support
>>>>> they can just use the guest without having to change XML or command line
>>>>> parameters.
>>>>
>>>> If you're exposing new features to the guest machine, then it is usually
>>>> to be expected that XML and QEMU command line will change. Some simple
>>>> things might be hidable behind a new QEMU machine type or CPU model, but
>>>> there's a limit to how much should be hidden that way while staying sane.
>>>>
>>>> I'd really expect the configuration to change when switching a guest to
>>>> a new hardware platform and wanting major new functionality to be enabled.
>>>> The XML / QEMU config is a low level instantiation of a particular feature
>>>> set, optimized for a specific machine, rather than a high level description
>>>> of ideal "best" config independent of host machine.
>>>
>>> You still have to set the host command line and make sure that unpack is
>>> available. Currently you also have to specify the IOMMU which we like to
>>> drop as a requirement. Everything else is dependent on runtime
>>> information which tells us if we need to take a PV or non-PV branch.
>>> Having the unpack facility should be enough to use the unpack facility.
>>>
>>> Keep in mind that we have no real concept of a special protected VM to
>>> begin with. If the VM never boots into a protected kernel it will never
>>> be protected. On a reboot it drops from protected into unprotected mode
>>> to execute the bios and boot loader and then may or may not move back
>>> into a protected state.
>>
>> My worry isn't actually how painful adding all the iommu glue is, but
>> what happens when users forget; especially if they forget for one
>> device.
>>
>> I could appreciate having a machine option to cause iommu to then get
>> turned on with all other devices; but I think also we could do with
>> something that failed with a nice error if an iommu flag was missing.
>> For SEV this could be done pretty early, but for power/s390 I guess
>> you'd have to do this when someone tried to enable secure mode, but
>> I'm not sure you can tell.
> 
> What is the cost / downside of turning on the iommu option for virtio
> devices ? Is it something that is reasonable for a mgmt app todo
> unconditionally, regardless of whether memory encryption is in use,
> or will that have a negative impact on things ?

speed, memory usage and compatibility problems.
There might also be a problem with s390 having to use <=2GB iommu areas
in the guest, I need to check with Halil if this is still true.

Also, if the default or specified IOMMU buffer size isn't big enough for
your IO workload the guest is gonna have a very bad time. I.e. if
somebody has an alternative implementation of bounce buffers we'd be
happy to take it :)

> 
> Regards,
> Daniel
>
Halil Pasic July 1, 2020, 11:59 a.m. UTC | #28
On Fri, 26 Jun 2020 14:49:37 +0200
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 6/26/20 12:58 PM, Daniel P. Berrangé wrote:
> > On Fri, Jun 26, 2020 at 11:29:03AM +0100, Dr. David Alan Gilbert wrote:
> >> * Janosch Frank (frankja@linux.ibm.com) wrote:
> >>> On 6/26/20 11:32 AM, Daniel P. Berrangé wrote:
> >>>> On Fri, Jun 26, 2020 at 11:01:58AM +0200, Janosch Frank wrote:
> >>>>> On 6/26/20 8:53 AM, David Hildenbrand wrote:
> >>>>>>>>>> Does this have any implications when probing with the 'none' machine?
> >>>>>>>>>
> >>>>>>>>> I'm not sure.  In your case, I guess the cpu bit would still show up
> >>>>>>>>> as before, so it would tell you base feature availability, but not
> >>>>>>>>> whether you can use the new configuration option.
> >>>>>>>>>
> >>>>>>>>> Since the HTL option is generic, you could still set it on the "none"
> >>>>>>>>> machine, though it wouldn't really have any effect.  That is, if you
> >>>>>>>>> could create a suitable object to point it at, which would depend on
> >>>>>>>>> ... details.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> The important point is that we never want the (expanded) host cpu model
> >>>>>>>> look different when either specifying or not specifying the HTL
> >>>>>>>> property.
> >>>>>>>
> >>>>>>> Ah, yes, I see your point.  So my current suggestion will satisfy
> >>>>>>> that, basically it is:
> >>>>>>>
> >>>>>>> cpu has unpack (inc. by default) && htl specified
> >>>>>>> 	=> works (allowing secure), as expected
> >>>>>>
> >>>>>> ack
> >>>>>>
> >>>>>>>
> >>>>>>> !cpu has unpack && htl specified
> >>>>>>> 	=> bails out with an error
> >>>>>>
> >>>>>> ack
> >>>>>>
> >>>>>>>
> >>>>>>> !cpu has unpack && !htl specified
> >>>>>>> 	=> works for a non-secure guest, as expected
> >>>>>>> 	=> guest will fail if it attempts to go secure
> >>>>>>
> >>>>>> ack, behavior just like running on older hw without unpack
> >>>>>>
> >>>>>>>
> >>>>>>> cpu has unpack && !htl specified
> >>>>>>> 	=> works as expected for a non-secure guest (unpack feature is
> >>>>>>> 	   present, but unused)
> >>>>>>> 	=> secure guest may work "by accident", but only if all virtio
> >>>>>>> 	   properties have the right values, which is the user's
> >>>>>>> 	   problem
> >>>>>>>
> >>>>>>> That last case is kinda ugly, but I think it's tolerable.
> >>>>>>
> >>>>>> Right, we must not affect non-secure guests, and existing secure setups
> >>>>>> (e.g., older qemu machines). Will have to think about this some more,
> >>>>>> but does not sound too crazy.
> >>>>>
> >>>>> I severely dislike having to specify things to make PV work.
> >>>>> The IOMMU is already a thorn in our side and we're working on making the
> >>>>> whole ordeal completely transparent so the only requirement to make this
> >>>>> work is the right machine, kernel, qemu and kernel cmd line option
> >>>>> "prot_virt=1". That's why we do the reboot into PV mode in the first place.
> >>>>>
> >>>>> I.e. the goal is that if customers convert compatible guests into
> >>>>> protected ones and start them up on a z15 on a distro with PV support
> >>>>> they can just use the guest without having to change XML or command line
> >>>>> parameters.
> >>>>
> >>>> If you're exposing new features to the guest machine, then it is usually
> >>>> to be expected that XML and QEMU command line will change. Some simple
> >>>> things might be hidable behind a new QEMU machine type or CPU model, but
> >>>> there's a limit to how much should be hidden that way while staying sane.
> >>>>
> >>>> I'd really expect the configuration to change when switching a guest to
> >>>> a new hardware platform and wanting major new functionality to be enabled.
> >>>> The XML / QEMU config is a low level instantiation of a particular feature
> >>>> set, optimized for a specific machine, rather than a high level description
> >>>> of ideal "best" config independent of host machine.
> >>>
> >>> You still have to set the host command line and make sure that unpack is
> >>> available. Currently you also have to specify the IOMMU which we like to
> >>> drop as a requirement. Everything else is dependent on runtime
> >>> information which tells us if we need to take a PV or non-PV branch.
> >>> Having the unpack facility should be enough to use the unpack facility.
> >>>
> >>> Keep in mind that we have no real concept of a special protected VM to
> >>> begin with. If the VM never boots into a protected kernel it will never
> >>> be protected. On a reboot it drops from protected into unprotected mode
> >>> to execute the bios and boot loader and then may or may not move back
> >>> into a protected state.
> >>
> >> My worry isn't actually how painful adding all the iommu glue is, but
> >> what happens when users forget; especially if they forget for one
> >> device.
> >>
> >> I could appreciate having a machine option to cause iommu to then get
> >> turned on with all other devices; but I think also we could do with
> >> something that failed with a nice error if an iommu flag was missing.
> >> For SEV this could be done pretty early, but for power/s390 I guess
> >> you'd have to do this when someone tried to enable secure mode, but
> >> I'm not sure you can tell.
> > 
> > What is the cost / downside of turning on the iommu option for virtio
> > devices ? Is it something that is reasonable for a mgmt app todo
> > unconditionally, regardless of whether memory encryption is in use,
> > or will that have a negative impact on things ?
> 
> speed, memory usage and compatibility problems.
> There might also be a problem with s390 having to use <=2GB iommu areas
> in the guest, I need to check with Halil if this is still true.

It is partially true. The coherent_dma_mask is 31 bit and the dma_mask
is 64. That means if iommu=on but !PV the coherent stuff will use <= 2GB
(that stuff allocated by virtio core, like virtqueues, CCWs, etc.) but
there will be no bounce buffering. We don't even initialize swiotlb if
!PV.

I agree with Janosch, we want iommu='on' only when really needed. I've
tried to make that point several times.

Regards,
Halil

> 
> Also, if the default or specified IOMMU buffer size isn't big enough for
> your IO workload the guest is gonna have a very bad time. I.e. if
> somebody has an alternative implementation of bounce buffers we'd be
> happy to take it :)
> 
> > 
> > Regards,
> > Daniel
> > 
> 
>