
[v3,00/37] KVM: s390: Add support for protected VMs

Message ID 20200220104020.5343-1-borntraeger@de.ibm.com

Message

Christian Borntraeger Feb. 20, 2020, 10:39 a.m. UTC
mm people: This series contains a "pretty small" common code memory
management change that will allow paging, guest backing with files, etc.,
almost just like for normal VMs. It should be a no-op for all architectures
not opting in, and it should be usable by others that also want to be
notified when "the pages are in the process of being used for things like
I/O". This time I included error handling and an ACK from Will Deacon.
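To illustrate the idea behind the common-code hook, here is a userspace toy
model in C. The names struct toy_page, toy_make_page_accessible() and
toy_start_io() are hypothetical and are not the kernel interface from the
patch. The sketch only assumes that pinning and writeback call an arch hook
before handing a page to I/O, that the hook is a no-op on architectures that
do not opt in, and that an error from the hook aborts the operation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: 'secure' models a page currently owned by the protected guest. */
struct toy_page {
    bool secure;       /* true: a host access would fault */
    int export_count;  /* how often the hook had to export this page */
};

/* Hypothetical arch hook. An architecture that does not opt in would make
 * this a plain "return 0;". Here it models exporting a secure page so the
 * host can access its (encrypted) content. */
static int toy_make_page_accessible(struct toy_page *page)
{
    if (page->secure) {
        page->secure = false;
        page->export_count++;
    }
    return 0;
}

/* Hypothetical stand-in for GUP pinning or writeback: the hook runs before
 * the page is used for I/O, and an error means the I/O is not started. */
static int toy_start_io(struct toy_page *page)
{
    int rc = toy_make_page_accessible(page);

    if (rc)
        return rc;
    assert(!page->secure);  /* the hook guarantees host accessibility */
    /* ... the I/O would be submitted here ... */
    return 0;
}
```

A page that is already accessible passes through the hook unchanged, which
is why the hook costs nothing for architectures and pages that never become
secure.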

The mm-related patches are CCed to linux-mm; the complete series can be
found on the KVM and linux-s390 lists.

Andrew, any chance to either take "mm:gup/writeback: add callbacks for
inaccessible pages" or ACK it so that I can take it?

Overview
--------
Protected VMs (PVMs) are KVM VMs whose state, such as guest memory and
guest registers, KVM can no longer access. Instead, PVMs are mostly
managed by a new entity called the Ultravisor (UV), which provides an
API through which KVM and the PV guest can request management actions.

PVMs are encrypted at rest and protected from hypervisor access while
running. They start in normal operation and then switch into protected
mode, so we can still use the standard boot process to load an encrypted
blob and then move it into protected mode.

Rebooting is only possible by passing through the unprotected/normal
mode and switching to protected again.
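A toy C model of this lifecycle (hypothetical names, not the KVM API): the
encrypted image is loaded through the normal boot path, the VM then switches
to protected mode, and a reboot always drops back to normal mode before
protected mode can be entered again. That the image has to be loaded again
after a reboot is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_vm_mode { TOY_VM_NORMAL, TOY_VM_PROTECTED };

struct toy_vm {
    enum toy_vm_mode mode;
    bool image_loaded;  /* encrypted blob loaded by the standard boot path */
};

/* Loading only works in normal mode, i.e. via the standard boot process. */
static int toy_load_encrypted_image(struct toy_vm *vm)
{
    if (vm->mode != TOY_VM_NORMAL)
        return -1;
    vm->image_loaded = true;
    return 0;
}

/* Switching is only allowed from normal mode with an image loaded. */
static int toy_enter_protected(struct toy_vm *vm)
{
    if (vm->mode != TOY_VM_NORMAL || !vm->image_loaded)
        return -1;
    vm->mode = TOY_VM_PROTECTED;
    return 0;
}

/* Full protected boot: normal-mode load, then switch. */
static int toy_boot_protected(struct toy_vm *vm)
{
    int rc = toy_load_encrypted_image(vm);

    if (rc)
        return rc;
    rc = toy_enter_protected(vm);
    assert(rc == 0);  /* cannot fail: normal mode and image loaded */
    return rc;
}

/* Reboot: always pass through normal mode (assumed: image must be reloaded). */
static void toy_reboot(struct toy_vm *vm)
{
    vm->mode = TOY_VM_NORMAL;
    vm->image_loaded = false;
}
```

There is deliberately no protected-to-protected transition in the model.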

All patches are in the protvirtv5 branch of the s390 KVM git tree on kernel.org:
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git/log/?h=protvirtv5

Claudio presented the technology at KVM Forum 2019:

https://static.sched.com/hosted_files/kvmforum2019/3b/ibm_protected_vms_s390x.pdf


v2 -> v3
- rebase against v5.6-rc2
- move some checks into the callers
- typo fixes
- extend UV query size
- do a tlb flush when entering/exiting protected mode
- more comments
- change interface to PV_ENABLE/DISABLE instead of vcpu/vm
  create/destroy
- lockdep checks for *is_protected calls
- locking improvements
- move facility 161 to qemu
- checkpatch fixes
- merged error handling in mm patch
- removed vcpu pv commands
- use mp_state for setting the IPL PSW


v1 -> v2
- rebase on top of kvm/master
- pipe through rc and rrc. This might have created some churn here and
  there
- turn off sclp masking when rebooting into "unsecure"
- memory management simplification
- prefix page handling now via intercept 112
- io interrupt intervention request fix (do not use GISA)
- api.txt conversion to rst
- sample patches on top of mm/gup/writeback
- tons of review feedback
- kvm_uv debug feature fixes and unifications
- ultravisor information for /sys/firmware

RFCv2 -> v1 (you can diff the protvirtv2 and protvirtv3 branches)
- tons of review feedback integrated (see mail thread)
- memory management now complete and working
- Documentation patches merged
- interrupt patches merged
- CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST removed
- SIDA interface integrated into memop
- for merged patches, I removed review tags that were not present on all
  of the original patches


Christian Borntraeger (5):
  KVM: s390/mm: Make pages accessible before destroying the guest
  KVM: s390: protvirt: Add SCLP interrupt handling
  KVM: s390: protvirt: do not inject interrupts after start
  KVM: s390: rstify new ioctls in api.rst
  KVM: s390: protvirt: introduce and enable KVM_CAP_S390_PROTECTED

Claudio Imbrenda (3):
  mm:gup/writeback: add callbacks for inaccessible pages
  s390/mm: provide memory management functions for protected KVM guests
  KVM: s390/mm: handle guest unpin events

Janosch Frank (24):
  KVM: s390: protvirt: Add UV debug trace
  KVM: s390: add new variants of UV CALL
  KVM: s390: protvirt: Add initial vm and cpu lifecycle handling
  KVM: s390: protvirt: Add KVM api documentation
  KVM: s390: protvirt: Secure memory is not mergeable
  KVM: s390: protvirt: Handle SE notification interceptions
  KVM: s390: protvirt: Instruction emulation
  KVM: s390: protvirt: Handle spec exception loops
  KVM: s390: protvirt: Add new gprs location handling
  KVM: S390: protvirt: Introduce instruction data area bounce buffer
  KVM: s390: protvirt: handle secure guest prefix pages
  KVM: s390: protvirt: Write sthyi data to instruction data area
  KVM: s390: protvirt: STSI handling
  KVM: s390: protvirt: disallow one_reg
  KVM: s390: protvirt: Do only reset registers that are accessible
  KVM: s390: protvirt: Only sync fmt4 registers
  KVM: s390: protvirt: Add program exception injection
  KVM: s390: protvirt: UV calls in support of diag308 0, 1
  KVM: s390: protvirt: Report CPU state to Ultravisor
  KVM: s390: protvirt: Support cmd 5 operation state
  KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and
    112
  KVM: s390: protvirt: Add UV cpu reset calls
  DOCUMENTATION: Protected virtual machine introduction and IPL
  s390: protvirt: Add sysfs firmware interface for Ultravisor
    information

Michael Mueller (1):
  KVM: s390: protvirt: Implement interrupt injection

Ulrich Weigand (1):
  KVM: s390/interrupt: do not pin adapter interrupt pages

Vasily Gorbik (3):
  s390/protvirt: introduce host side setup
  s390/protvirt: add ultravisor initialization
  s390/mm: add (non)secure page access exceptions handlers

 .../admin-guide/kernel-parameters.txt         |   5 +
 Documentation/virt/kvm/api.rst                |  91 +++-
 Documentation/virt/kvm/devices/s390_flic.rst  |  11 +-
 Documentation/virt/kvm/index.rst              |   2 +
 Documentation/virt/kvm/s390-pv-boot.rst       |  83 +++
 Documentation/virt/kvm/s390-pv.rst            | 116 ++++
 MAINTAINERS                                   |   1 +
 arch/s390/boot/Makefile                       |   2 +-
 arch/s390/boot/uv.c                           |  21 +-
 arch/s390/include/asm/gmap.h                  |   6 +
 arch/s390/include/asm/kvm_host.h              | 113 +++-
 arch/s390/include/asm/mmu.h                   |   2 +
 arch/s390/include/asm/mmu_context.h           |   1 +
 arch/s390/include/asm/page.h                  |   5 +
 arch/s390/include/asm/pgtable.h               |  35 +-
 arch/s390/include/asm/uv.h                    | 252 ++++++++-
 arch/s390/kernel/Makefile                     |   1 +
 arch/s390/kernel/pgm_check.S                  |   4 +-
 arch/s390/kernel/setup.c                      |   9 +-
 arch/s390/kernel/uv.c                         | 413 ++++++++++++++
 arch/s390/kvm/Makefile                        |   2 +-
 arch/s390/kvm/diag.c                          |   4 +
 arch/s390/kvm/intercept.c                     | 115 +++-
 arch/s390/kvm/interrupt.c                     | 399 ++++++++------
 arch/s390/kvm/kvm-s390.c                      | 509 +++++++++++++++---
 arch/s390/kvm/kvm-s390.h                      |  51 +-
 arch/s390/kvm/priv.c                          |  11 +-
 arch/s390/kvm/pv.c                            | 286 ++++++++++
 arch/s390/mm/fault.c                          |  78 +++
 arch/s390/mm/gmap.c                           |  65 ++-
 include/linux/gfp.h                           |   6 +
 include/uapi/linux/kvm.h                      |  43 +-
 mm/gup.c                                      |  15 +-
 mm/page-writeback.c                           |   5 +
 34 files changed, 2442 insertions(+), 320 deletions(-)
 create mode 100644 Documentation/virt/kvm/s390-pv-boot.rst
 create mode 100644 Documentation/virt/kvm/s390-pv.rst
 create mode 100644 arch/s390/kernel/uv.c
 create mode 100644 arch/s390/kvm/pv.c

Comments

David Hildenbrand Feb. 21, 2020, 10:54 a.m. UTC | #1
On 20.02.20 11:39, Christian Borntraeger wrote:
> mm people: This series contains a "pretty small" common code memory
> management change that will allow paging, guest backing with files etc
> almost just like normal VMs. It should be a no-op for all architectures
> not opting in. And it should be usable for others that also try to get
> notified on "the pages are in the process of being used for things like
> I/O". This time I included error handling and an ACK from Will Deacon.
> 
> mm-related patches CCed on linux-mm, the complete list can be found on
> the KVM and linux-s390 list. 
> 
> Andrew, any chance to either take " mm:gup/writeback: add callbacks for
> inaccessible pages" or ACK so that I can take it?

Summary: Mostly LGTM. Especially
- UAPI interface is minimal and clean
- Core MM changes are minimal and clean (and AFAICT Andrea was involved
  when discussing the approach, so it can't be wrong ;) )
- Is no longer prototype quality ;)

There are still some things I want to double-check (esp. how KVM memory
slots are handled - I somewhat dislike that we cannot replace/add new
ones while in PV. One would have to fence that somehow in QEMU as long
as the guest is in PV mode, e.g., once we support memory hotplug
... but it looks like this is what the HW requires us to enforce for now),
certain races, etc., but I assume these things could be fixed later on.

Can we get a new version once the other reviewers are done, so at least
I can have a final look?

Thanks!
Christian Borntraeger Feb. 21, 2020, 11:26 a.m. UTC | #2
On 21.02.20 11:54, David Hildenbrand wrote:
> On 20.02.20 11:39, Christian Borntraeger wrote:
>> mm people: This series contains a "pretty small" common code memory
>> management change that will allow paging, guest backing with files etc
>> almost just like normal VMs. It should be a no-op for all architectures
>> not opting in. And it should be usable for others that also try to get
>> notified on "the pages are in the process of being used for things like
>> I/O". This time I included error handling and an ACK from Will Deacon.
>>
>> mm-related patches CCed on linux-mm, the complete list can be found on
>> the KVM and linux-s390 list. 
>>
>> Andrew, any chance to either take " mm:gup/writeback: add callbacks for
>> inaccessible pages" or ACK so that I can take it?
> 
> Summary: Mostly LGTM. Especially
> - UAPI interface is minimal and clean
> - Core MM changes are minimal and clean (and AFAIKT Andrea was involved
>   when discussing the approach, so it can't be wrong ;) )
> - Is no longer prototype quality ;)
> 
> There are still some things I want to double check (esp. how KVM memory
> slots are handled - I somewhat dislike that we cannot replace/add new
> ones while in PV. One would have to fence that somehow in QEMU as long
> as the guest is in PV mode e.g., once we would support memory hotplug
> ... but looks like this is what the HW requires us to enforce for now),
> certain races, etc. but I assume these things could be fixed later on.


In fact you can do that. The hardware checks integrity based on the guest
physical address, so it is perfectly possible to remap a KVM memory slot
as long as the encrypted content matches what the counter, guest content
hash, and guest address say. (It is like swapping: you move the encrypted
content from one host page to another.)
For new pages (not unpacked and never touched by the guest), the Ultravisor
will provide a zeroed-out page on first import.

What does not work is changing the user space address of a guest
virtio indicator, but that was already broken before (we never did an
adapter unmap/remap).
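A toy C model of the integrity argument above. FNV-1a stands in for the
real, cryptographic check, and all names (struct toy_uv_record, toy_tag(),
toy_import_ok()) are hypothetical; the actual Ultravisor algorithm is not
described in this thread. The point is only that the tag is computed from
the guest address, a counter and the content, while the host page that holds
the encrypted content is not an input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 16  /* tiny "page" to keep the model small */

/* What the toy UV remembers per guest page: keyed by guest address. */
struct toy_uv_record {
    uint64_t gaddr;
    uint64_t counter;
    uint64_t tag;
};

/* Toy digest over (guest address, counter, content). NOT cryptographic. */
static uint64_t toy_tag(uint64_t gaddr, uint64_t counter,
                        const unsigned char *data)
{
    const unsigned char *parts[3] = { (const unsigned char *)&gaddr,
                                      (const unsigned char *)&counter, data };
    size_t lens[3] = { sizeof(gaddr), sizeof(counter), TOY_PAGE_SIZE };
    uint64_t h = 1469598103934665603ULL;  /* FNV-1a offset basis */

    assert(data != NULL);
    for (int p = 0; p < 3; p++)
        for (size_t i = 0; i < lens[p]; i++) {
            h ^= parts[p][i];
            h *= 1099511628211ULL;        /* FNV-1a 64-bit prime */
        }
    return h;
}

/* Import check: which host page holds the content does not matter. */
static bool toy_import_ok(const struct toy_uv_record *rec, uint64_t gaddr,
                          const unsigned char *host_page)
{
    return rec->gaddr == gaddr &&
           toy_tag(gaddr, rec->counter, host_page) == rec->tag;
}
```

Copying the encrypted content into a different host page and importing it at
the same guest address verifies, while importing it at another guest address
or after modifying it does not, which matches the swapping analogy.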

> 
> Can we get a new version once the other reviewers are done, so at least
> I can have a final look?

Just the updated patches as reply (e.g. a 3.2 for patch 9) or the full monty?
David Hildenbrand Feb. 21, 2020, 11:28 a.m. UTC | #3
> In fact you can do that. The hardware checks the integrity on guest physical
> address. So it is perfectly possible to remap a kvm slot as long as the
> encrypted content matches what counter, guest content hash and guest address
> tell. (It is like swapping, you move the encrypted content from one host 
> page to another). 
> For new pages (not unpacked and never touched by the guest) the ultravisor
> will bring a zeroed out page on first import.
> 
> What does not work is when the user space address changes for a guest
> virtio indicator. But this was already broken before (we never did an
> adapter unmap/remap).
> 

Thanks for the info!

>>
>> Can we get a new version once the other reviewers are done, so at least
>> I can have a final look?
> 
> Just the updated patches as reply (e.g. a 3.2 for patch 9) or the full monty?

Whatever the others prefer. I can see that Conny has some feedback;
maybe wait for that and then (selectively) resend.
Cornelia Huck Feb. 21, 2020, 1:45 p.m. UTC | #4
On Fri, 21 Feb 2020 12:28:51 +0100
David Hildenbrand <david@redhat.com> wrote:

> >> Can we get a new version once the other reviewers are done, so at least
> >> I can have a final look?  
> > 
> > Just the updated patches as reply (e.g. a 3.2 for patch 9) or the full monty?  
> 
> Whatever you others prefer. I can see that Conny has some feedback.
> maybe wait for that and then (selectively) resend.

I was still looking over this version. And to be honest, I find those
3.x patches utterly confusing, especially as this is all coming in so
quickly that it is hard to keep up.

That said, I'll probably not be able to do much (or any) reviewing
today or on Monday, so feel free to send a new version next week.