
[v4,00/13] Add PCI pass-thru support to Hyper-V Confidential VMs

Message ID: 1669951831-4180-1-git-send-email-mikelley@microsoft.com

Message

Michael Kelley (LINUX) Dec. 2, 2022, 3:30 a.m. UTC
This patch series adds support for PCI pass-thru devices to Hyper-V
Confidential VMs (also called "Isolation VMs"). But in preparation, it
first changes how private (encrypted) vs. shared (decrypted) memory is
handled in Hyper-V SEV-SNP guest VMs. The new approach builds on the
confidential computing (coco) mechanisms introduced in the 5.19 kernel
for TDX support and significantly reduces the amount of Hyper-V specific
code. Furthermore, with this new approach a proposed RFC patch set for
generic DMA layer functionality[1] is no longer necessary.

Background
==========
Hyper-V guests on AMD SEV-SNP hardware have the option of using the
"virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP
architecture. With vTOM, shared vs. private memory accesses are
controlled by splitting the guest physical address space into two
halves.  vTOM is the dividing line where the uppermost bit of the
physical address space is set; e.g., with 47 bits of guest physical
address space, vTOM is 0x400000000000 (bit 46 is set).  Guest physical
memory is accessible at two parallel physical addresses -- one below
vTOM and one above vTOM.  Accesses below vTOM are private (encrypted)
while accesses above vTOM are shared (decrypted). In this sense, vTOM
is like the GPA.SHARED bit in Intel TDX.
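
As a minimal illustration (the 47-bit width and the example address
are assumptions of this sketch; a real guest learns the boundary from
the hypervisor), the two aliases of a given page differ only in the
vTOM bit:

#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative only: with 47 bits of guest physical address space,
 * vTOM sits at bit 46. A real guest gets the boundary (the "shared
 * GPA boundary") from the hypervisor rather than hard-coding it.
 */
#define VTOM (1ULL << 46)

int main(void)
{
	uint64_t pa = 0x1000;

	printf("private (encrypted) alias: 0x%llx\n",
	       (unsigned long long)pa);
	printf("shared  (decrypted) alias: 0x%llx\n",
	       (unsigned long long)(pa | VTOM));
	return 0;
}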

In Hyper-V's use of vTOM, the normal guest OS runs at VMPL2, while
a Hyper-V provided "paravisor" runs at VMPL0 in the guest VM. (VMPL is
Virtual Machine Privilege Level. See AMD's SEV-SNP spec for more
details.) The paravisor provides emulation for various system devices
like the IO-APIC as part of the guest VM.  Accesses to such devices
made by the normal guest OS trap to the paravisor and are emulated in
the guest VM context instead of in the Hyper-V host. This emulation is
invisible to the normal guest OS, but with one quirk: memory-mapped
I/O accesses to these devices must be treated as private, not shared
as would be the case for other device accesses.
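
As a purely illustrative sketch of that quirk (the attribute name
below is an assumption and may not match what patch 1 finally uses,
and the real IO-APIC code uses a fixmap rather than ioremap), the
gating looks like:

#include <linux/cc_platform.h>
#include <linux/io.h>
#include <linux/pgtable.h>

/*
 * Sketch only: map the paravisor-emulated IO-APIC encrypted when the
 * platform says encrypted access is required; otherwise strip the
 * encryption bit as is done for ordinary MMIO.
 */
static void __iomem *ioapic_map_sketch(phys_addr_t phys)
{
	pgprot_t prot = PAGE_KERNEL_NOCACHE;

	if (!cc_platform_has(CC_ATTR_ACCESS_IOAPIC_ENCRYPTED))
		prot = pgprot_decrypted(prot);

	return ioremap_prot(phys, PAGE_SIZE, pgprot_val(prot));
}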

Support for Hyper-V guests using vTOM was added to the Linux kernel
in two patch sets[2][3]. This support treats the vTOM bit as part of
the physical address.  For accessing shared (decrypted) memory, the core
approach is to create a second kernel virtual mapping that maps to
parallel physical addresses above vTOM, while leaving the original
mapping unchanged.  Most of the code for creating that second virtual
mapping is confined to Hyper-V specific areas, but there are also
changes to generic swiotlb code.
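
For reference, a simplified sketch of that second-mapping scheme
(assuming a vmalloc'ed buffer; the actual code in drivers/hv and
hv_netvsc differs in detail) is shown below -- it is this machinery
that later patches in the series remove:

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Sketch of the pre-existing scheme: map an already-shared vmalloc
 * buffer a second time at its above-vTOM physical alias, yielding a
 * separate virtual address for host-visible accesses.
 */
static void *hv_map_above_vtom_sketch(void *buf, unsigned int npages,
				      u64 shared_gpa_boundary)
{
	unsigned long *pfns;
	unsigned int i;
	void *vaddr;

	pfns = kcalloc(npages, sizeof(*pfns), GFP_KERNEL);
	if (!pfns)
		return NULL;

	for (i = 0; i < npages; i++)
		pfns[i] = vmalloc_to_pfn(buf + i * PAGE_SIZE) +
			  (shared_gpa_boundary >> PAGE_SHIFT);

	vaddr = vmap_pfn(pfns, npages, PAGE_KERNEL);  /* the second mapping */
	kfree(pfns);
	return vaddr;
}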

Changes in this patch set
=========================
In preparation for supporting PCI pass-thru devices, this patch set
changes the core approach for handling vTOM. In the new approach,
the vTOM bit is treated as a protection flag, and not as part of
the physical address. This new approach is like the approach for
the GPA.SHARED bit in Intel TDX.  Furthermore, there's no need to
create a second kernel virtual mapping.  When memory is changed
between private and shared using set_memory_decrypted() and
set_memory_encrypted(), the PTEs for the existing kernel mapping
are changed to add or remove the vTOM bit just as with TDX. The
hypercalls to change the memory status on the host side are made
using the existing callback mechanism. Everything just works, with
a minor tweak to map the IO-APIC to use private accesses as mentioned
above.
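
A hedged sketch of what a driver does under the new scheme
(hv_share_buffer() is an illustrative name, not a function this
series adds; the buffer is assumed to be page-aligned):

#include <linux/set_memory.h>

/*
 * Sketch only: make an existing, page-aligned kernel buffer visible
 * to the host. On a vTOM guest, set_memory_decrypted() sets the vTOM
 * bit in the PTEs of the original mapping, and the registered coco
 * callback issues the Hyper-V visibility hypercall. No second
 * virtual mapping is created.
 */
static int hv_share_buffer(void *buf, unsigned int npages)
{
	return set_memory_decrypted((unsigned long)buf, npages);
}

static int hv_unshare_buffer(void *buf, unsigned int npages)
{
	return set_memory_encrypted((unsigned long)buf, npages);
}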

With the new handling of vTOM in place, existing Hyper-V code that
creates the second kernel virtual mapping still works, but it is now
redundant as the original kernel virtual mapping (as updated) maps
to the same physical address. To simplify things going forward, this
patch set removes the code that creates the second kernel virtual
mapping. And since a second kernel virtual mapping is no longer
needed, changes to the DMA layer proposed as an RFC[1] are no
longer needed.

Finally, to support PCI pass-thru in a Confidential VM, Hyper-V
requires that all accesses to PCI config space be emulated using
a hypercall.  This patch set adds functions to invoke those
hypercalls and uses them in the config space access functions
in the Hyper-V PCI driver. Lastly, the Hyper-V PCI driver is
marked as allowed to be used in a Confidential VM.  The Hyper-V
PCI driver has been hardened against a malicious Hyper-V in a
previous patch set.[4]
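
To illustrate the shape of such an access (the hypercall code, its
value, and the input layout below are placeholders for this sketch;
patches 12 and 13 define the real interface), a 32-bit config space
read ends up looking roughly like:

#include <linux/types.h>
#include <linux/irqflags.h>
#include <asm/mshyperv.h>

/* Placeholder hypercall code for this sketch; see patch 12 for the real one. */
#define HVCALL_MMIO_READ_SKETCH 0x106

struct hv_mmio_read_input_sketch {
	u64 gpa;	/* guest physical address within config space */
	u32 size;	/* access width in bytes */
	u32 reserved;
} __packed;

static u32 hv_cfg_read32_sketch(u64 cfg_gpa)
{
	struct hv_mmio_read_input_sketch *in;
	u32 *out;
	unsigned long flags;
	u32 val = ~0U;

	/* Per-cpu hypercall argument pages must not be preempted away. */
	local_irq_save(flags);
	in  = *this_cpu_ptr(hyperv_pcpu_input_arg);
	out = *this_cpu_ptr(hyperv_pcpu_output_arg);

	in->gpa  = cfg_gpa;
	in->size = sizeof(u32);

	if (hv_result_success(hv_do_hypercall(HVCALL_MMIO_READ_SKETCH, in, out)))
		val = *out;

	local_irq_restore(flags);
	return val;
}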

Patch Organization
==================
Patches 1 thru 5 are preparatory patches that account for
slightly different assumptions when running in a Hyper-V VM
with vTOM, fix some minor bugs, and make temporary tweaks
to avoid needing a single large patch to make the transition
from the old approach to the new approach.

Patch 6 enables the new approach to handling vTOM for Hyper-V
guest VMs. This is the core patch after which the new approach
is in effect.

Patches 7 thru 10 remove existing code for creating the second
kernel virtual mapping that is no longer necessary with the
new approach.

Patch 11 updates existing code so that it no longer assumes that
the vTOM bit is part of the physical address.

Patches 12 and 13 add new hypercalls for accessing MMIO space
and use those hypercalls for PCI config space. They also enable
the Hyper-V vPCI driver to be used in a Confidential VM.

[1] https://lore.kernel.org/lkml/20220706195027.76026-1-parri.andrea@gmail.com/
[2] https://lore.kernel.org/all/20211025122116.264793-1-ltykernel@gmail.com/
[3] https://lore.kernel.org/all/20211213071407.314309-1-ltykernel@gmail.com/
[4] https://lore.kernel.org/all/20220511223207.3386-1-parri.andrea@gmail.com/

---

Changes in v4:
* Remove previous Patch 1 from this series and submit separately
  [Dave Hansen & Boris Petkov]

* Patch 1: Change the name of the new CC_ATTR that controls
  whether the IO-APIC is mapped decrypted [Boris Petkov]

* Patch 4: Use sme_me_mask directly instead of calling the
  getter function. Add Fixes: tag. [Tom Lendacky]

* Patch 6: Remove CC_VENDOR_HYPERV and merge associated
  vTOM functionality under CC_VENDOR_AMD. [Boris Petkov]

* Patch 8: Use bitwise OR to pick up the vTOM bit in
  shared_gpa_boundary rather than adding it

Changes in v3:
* Patch 1: Tweak the code fix to cleanly separate the page
  alignment and physical address masking [Dave Hansen]

* Patch 2: Change the name of the new CC_ATTR that controls
  whether the IO-APIC is mapped decrypted [Dave Hansen]

* Patch 5 (now patch 7): Add CC_ATTR_MEM_ENCRYPT to what
  Hyper-V vTOM reports as 'true'. With the addition, Patches
  5 and 6 are new to accommodate working correctly with Hyper-V
  VMs using vTOM. [Tom Lendacky]

Changes in v2:
* Patch 11: Include more detail in the error message if an MMIO
  hypercall fails. [Bjorn Helgaas]

* Patch 12: Restore removed memory barriers. It seems like these
  barriers should not be needed because of the spin_unlock() calls,
  but commit bdd74440d9e8 indicates that they are. This patch series
  will leave the barriers unchanged; whether they are really needed
  can be sorted out separately. [Boqun Feng]

Michael Kelley (13):
  x86/ioapic: Gate decrypted mapping on cc_platform_has() attribute
  x86/hyperv: Reorder code in prep for subsequent patch
  Drivers: hv: Explicitly request decrypted in vmap_pfn() calls
  x86/mm: Handle decryption/re-encryption of bss_decrypted consistently
  init: Call mem_encrypt_init() after Hyper-V hypercall init is done
  x86/hyperv: Change vTOM handling to use standard coco mechanisms
  swiotlb: Remove bounce buffer remapping for Hyper-V
  Drivers: hv: vmbus: Remove second mapping of VMBus monitor pages
  Drivers: hv: vmbus: Remove second way of mapping ring buffers
  hv_netvsc: Remove second mapping of send and recv buffers
  Drivers: hv: Don't remap addresses that are above shared_gpa_boundary
  PCI: hv: Add hypercalls to read/write MMIO space
  PCI: hv: Enable PCI pass-thru devices in Confidential VMs

 arch/x86/coco/core.c                |  37 ++++--
 arch/x86/hyperv/hv_init.c           |  18 +--
 arch/x86/hyperv/ivm.c               | 128 ++++++++++----------
 arch/x86/include/asm/coco.h         |   1 -
 arch/x86/include/asm/hyperv-tlfs.h  |   3 +
 arch/x86/include/asm/mshyperv.h     |   8 +-
 arch/x86/include/asm/msr-index.h    |   1 +
 arch/x86/kernel/apic/io_apic.c      |   3 +-
 arch/x86/kernel/cpu/mshyperv.c      |  22 ++--
 arch/x86/mm/mem_encrypt_amd.c       |  10 +-
 arch/x86/mm/pat/set_memory.c        |   3 -
 drivers/hv/Kconfig                  |   1 -
 drivers/hv/channel_mgmt.c           |   2 +-
 drivers/hv/connection.c             | 113 +++++-------------
 drivers/hv/hv.c                     |  23 ++--
 drivers/hv/hv_common.c              |  11 --
 drivers/hv/hyperv_vmbus.h           |   2 -
 drivers/hv/ring_buffer.c            |  62 ++++------
 drivers/net/hyperv/hyperv_net.h     |   2 -
 drivers/net/hyperv/netvsc.c         |  48 +-------
 drivers/pci/controller/pci-hyperv.c | 232 ++++++++++++++++++++++++++----------
 include/asm-generic/hyperv-tlfs.h   |  22 ++++
 include/asm-generic/mshyperv.h      |   2 -
 include/linux/cc_platform.h         |  12 ++
 include/linux/swiotlb.h             |   2 -
 init/main.c                         |  19 +--
 kernel/dma/swiotlb.c                |  45 +------
 27 files changed, 398 insertions(+), 434 deletions(-)

Comments

Borislav Petkov Jan. 9, 2023, 6:47 p.m. UTC | #1
On Thu, Dec 01, 2022 at 07:30:18PM -0800, Michael Kelley wrote:
> This patch series adds support for PCI pass-thru devices to Hyper-V
> Confidential VMs (also called "Isolation VMs"). But in preparation, it
> first changes how private (encrypted) vs. shared (decrypted) memory is
> handled in Hyper-V SEV-SNP guest VMs. The new approach builds on the
> confidential computing (coco) mechanisms introduced in the 5.19 kernel
> for TDX support and significantly reduces the amount of Hyper-V specific
> code. Furthermore, with this new approach a proposed RFC patch set for
> generic DMA layer functionality[1] is no longer necessary.

In any case, this is starting to get ready - how do we merge this?

I apply the x86 bits and give Wei an immutable branch to add the rest of the
HyperV stuff ontop?
Michael Kelley (LINUX) Jan. 9, 2023, 7:35 p.m. UTC | #2
From: Borislav Petkov <bp@alien8.de> Sent: Monday, January 9, 2023 10:47 AM
> 
> On Thu, Dec 01, 2022 at 07:30:18PM -0800, Michael Kelley wrote:
> > This patch series adds support for PCI pass-thru devices to Hyper-V
> > Confidential VMs (also called "Isolation VMs"). But in preparation, it
> > first changes how private (encrypted) vs. shared (decrypted) memory is
> > handled in Hyper-V SEV-SNP guest VMs. The new approach builds on the
> > confidential computing (coco) mechanisms introduced in the 5.19 kernel
> > for TDX support and significantly reduces the amount of Hyper-V specific
> > code. Furthermore, with this new approach a proposed RFC patch set for
> > generic DMA layer functionality[1] is no longer necessary.
> 
> In any case, this is starting to get ready - how do we merge this?
> 
> I apply the x86 bits and give Wei an immutable branch to add the rest of the
> HyperV stuff ontop?
> 
> --
> Regards/Gruss,
>     Boris.
> 

I'll let Wei respond on handling the merging.

I'll spin a v5 in a few days.  Changes will be:
* Address your comments

* Use PAGE_KERNEL in the arch independent Hyper-V code instead of
   PAGE_KERNEL_NOENC.  PAGE_KERNEL_NOENC doesn't exist for ARM64, so
   it causes compile errors when building for ARM64.  Using PAGE_KERNEL means
   getting sme_me_mask when on x86, but that value will be zero for vTOM VMs.

* Fix a problem with the virtual TPM device getting mapped decrypted.  Like
   the IOAPIC, the vTPM is provided by the paravisor, and needs to be mapped
   encrypted.   My thinking is to allow hypervisor initialization code to specify
   a guest physical address range to be treated as encrypted, and add a check against
   that range in __ioremap_check_other(), similar to what is done for EFI memory.
   Thoughts?  I don't want to change the vTPM driver, and the devm_* interfaces
   it uses don't provide an option to map encrypted anyway.  But I'm open to
   other ideas.
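
   A rough sketch of that idea (names here are hypothetical, just to show
   the shape):

#include <linux/types.h>

/*
 * Hypothetical sketch: hypervisor init code registers one GPA range
 * that must stay encrypted, and an __ioremap_check_other()-style
 * helper consults it, much like the existing EFI memory checks.
 */
static u64 encrypted_mmio_start;
static u64 encrypted_mmio_end;

void hv_register_encrypted_mmio_range(u64 start, u64 size)
{
	encrypted_mmio_start = start;
	encrypted_mmio_end = start + size;
}

bool hv_gpa_range_is_encrypted(u64 addr, u64 size)
{
	return addr >= encrypted_mmio_start &&
	       addr + size <= encrypted_mmio_end;
}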

Thanks for the review!

Michael
Wei Liu Jan. 12, 2023, 2:03 p.m. UTC | #3
On Mon, Jan 09, 2023 at 07:47:08PM +0100, Borislav Petkov wrote:
> On Thu, Dec 01, 2022 at 07:30:18PM -0800, Michael Kelley wrote:
> > This patch series adds support for PCI pass-thru devices to Hyper-V
> > Confidential VMs (also called "Isolation VMs"). But in preparation, it
> > first changes how private (encrypted) vs. shared (decrypted) memory is
> > handled in Hyper-V SEV-SNP guest VMs. The new approach builds on the
> > confidential computing (coco) mechanisms introduced in the 5.19 kernel
> > for TDX support and significantly reduces the amount of Hyper-V specific
> > code. Furthermore, with this new approach a proposed RFC patch set for
> > generic DMA layer functionality[1] is no longer necessary.
> 
> In any case, this is starting to get ready - how do we merge this?
> 
> I apply the x86 bits and give Wei an immutable branch to add the rest of the
> HyperV stuff ontop?

I can take all the patches if that's easier for you. I don't think
anyone else is depending on the x86 patches in this series.

Giving me an immutable branch works too.

Thanks,
Wei.

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Dexuan Cui Jan. 19, 2023, 5:58 p.m. UTC | #4
> From: Wei Liu <wei.liu@kernel.org>
> Sent: Thursday, January 12, 2023 6:04 AM
> To: Borislav Petkov <bp@alien8.de>
> [...]
> On Mon, Jan 09, 2023 at 07:47:08PM +0100, Borislav Petkov wrote:
> > On Thu, Dec 01, 2022 at 07:30:18PM -0800, Michael Kelley wrote:
> > > This patch series adds support for PCI pass-thru devices to Hyper-V
> > > Confidential VMs (also called "Isolation VMs"). But in preparation, it
> > > first changes how private (encrypted) vs. shared (decrypted) memory is
> > > handled in Hyper-V SEV-SNP guest VMs. The new approach builds on the
> > > confidential computing (coco) mechanisms introduced in the 5.19 kernel
> > > for TDX support and significantly reduces the amount of Hyper-V specific
> > > code. Furthermore, with this new approach a proposed RFC patch set for
> > > generic DMA layer functionality[1] is no longer necessary.
> >
> > In any case, this is starting to get ready - how do we merge this?
> >
> > I apply the x86 bits and give Wei an immutable branch to add the rest of the
> > HyperV stuff ontop?
> 
> I can take all the patches if that's easier for you. I don't think
> anyone else is depending on the x86 patches in this series.
> 
> Giving me an immutable branch works too.
> 
> Thanks,
> Wei.
> > --
> > Regards/Gruss,
> >     Boris.

Hi Boris, Wei, any news on this?
Borislav Petkov Jan. 20, 2023, 11:58 a.m. UTC | #5
On Thu, Jan 12, 2023 at 02:03:35PM +0000, Wei Liu wrote:
> I can take all the patches if that's easier for you. I don't think
> anyone else is depending on the x86 patches in this series.

But we have a bunch of changes in tip so I'd prefer if all were in one place.

> Giving me an immutable branch works too.

Yap, lemme do that after applying.

Thx.
Wei Liu Jan. 20, 2023, 12:42 p.m. UTC | #6
On Fri, Jan 20, 2023 at 12:58:29PM +0100, Borislav Petkov wrote:
> On Thu, Jan 12, 2023 at 02:03:35PM +0000, Wei Liu wrote:
> > I can take all the patches if that's easier for you. I don't think
> > anyone else is depending on the x86 patches in this series.
> 
> But we have a bunch of changes in tip so I'd prefer if all were in one place.
> 
> > Giving me an immutable branch works too.
> 
> Yap, lemme do that after applying.
> 
Ack. Thanks!

Wei.