[RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM

Message ID: 20230201125328.2186498-1-jean-philippe@linaro.org

Message

Jean-Philippe Brucker Feb. 1, 2023, 12:52 p.m. UTC
The pKVM hypervisor, recently introduced on arm64, provides a separation
of privileges between the host and hypervisor parts of KVM, where the
hypervisor is trusted by guests but the host is not [1]. The host is
initially trusted during boot, but its privileges are reduced after KVM
is initialized so that, if an adversary later gains access to the large
attack surface of the host, it cannot access guest data.

Currently with pKVM, the host can still instruct DMA-capable devices
like the GPU to access guest and hypervisor memory, which undermines
this isolation. Preventing DMA attacks requires an IOMMU, owned by the
hypervisor.

This series adds a hypervisor driver for the Arm SMMUv3 IOMMU. Since the
hypervisor part of pKVM (called nVHE here) is minimal, moving the whole
host SMMU driver into nVHE isn't really an option. It is too large and
complex and requires infrastructure from all over the kernel. We add a
reduced nVHE driver that deals with populating the SMMU tables and the
command queue, and the host driver still deals with probing and some
initialization.


Patch overview
==============

A significant portion of this series just moves and refactors code to
avoid duplication. Things get interesting only around patch 15, which
adds two helpers that track pages mapped in the IOMMU and ensure that
those pages are not donated to guests. Then patches 16-27 add the hypervisor
IOMMU driver, split into a generic part that can be reused by other
drivers, and code specific to SMMUv3.
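
To illustrate the invariant that these helpers enforce, here is a
minimal, self-contained sketch using made-up state names. The series
implements this as __pkvm_host_share_dma()/__pkvm_host_unshare_dma() on
top of the existing page ownership tracking, so the real code looks
quite different:

#include <linux/errno.h>

/* Toy model of the page states involved, for illustration only */
enum sketch_page_state {
	SKETCH_PAGE_OWNED,	/* plain host-owned memory */
	SKETCH_PAGE_SHARED_DMA,	/* currently mapped in an IOMMU */
	SKETCH_PAGE_DONATED,	/* handed to the hypervisor or a guest */
};

/* map_pages() path: only host-owned pages may be mapped for DMA */
static int sketch_share_dma(enum sketch_page_state *state)
{
	if (*state != SKETCH_PAGE_OWNED)
		return -EPERM;
	*state = SKETCH_PAGE_SHARED_DMA;
	return 0;
}

/* Donation path: refuse pages that a device could still be accessing */
static int sketch_donate(enum sketch_page_state *state)
{
	if (*state != SKETCH_PAGE_OWNED)
		return -EBUSY;
	*state = SKETCH_PAGE_DONATED;
	return 0;
}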

Patches 34-40 introduce the host component of the pKVM SMMUv3 driver,
which initializes the configuration and forwards mapping requests to the
hypervisor. Ideally there would be a single host driver with two sets of
IOMMU ops, and while I believe more code can still be shared, the
initialization is very different and having separate driver entry points
seems clearer.
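
For example, the host's map_pages() callback essentially becomes a
hypercall. A rough sketch, where the hypercall name, the domain ID
lookup and the argument list are illustrative rather than the exact ABI
used by this series:

#include <linux/iommu.h>
#include <asm/kvm_host.h>

/*
 * Host-side sketch: a standard iommu_domain_ops callback that forwards
 * the request to the hypervisor instead of writing page tables itself.
 * to_kvm_smmu_domain_id() and __pkvm_host_iommu_map_pages are
 * placeholders.
 */
static int kvm_arm_smmu_map_pages(struct iommu_domain *domain,
				  unsigned long iova, phys_addr_t paddr,
				  size_t pgsize, size_t pgcount,
				  int prot, gfp_t gfp, size_t *mapped)
{
	u32 domain_id = to_kvm_smmu_domain_id(domain);
	int ret;

	ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_map_pages, domain_id,
				iova, paddr, pgsize, pgcount, prot);
	if (ret)
		return ret;

	*mapped = pgsize * pgcount;
	return 0;
}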

Patches 41-45 provide a rough example of power management through SCMI.
Although the host decides on power management policies, the hypervisor
must at least be aware of power changes, so that it doesn't access
powered-down interfaces. We expect the platform controller to enforce
dependencies so that DMA doesn't bypass a powered-down IOMMU. But these
things are unfortunately platform dependent and the SCMI patches are
only illustrative.
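
For instance, MMIO accesses in the hypervisor driver can simply be
gated on a per-SMMU flag updated by the power notification. A sketch
with hypothetical structure and function names, not the ones used in
the series:

/* Hypothetical per-SMMU state kept by the hypervisor */
struct hyp_smmu_sketch {
	void __iomem	*base;
	bool		powered_on;	/* updated by the power hypercall */
};

/* While the SMMU is off there is nothing to invalidate or sync */
static int hyp_smmu_issue_sync(struct hyp_smmu_sketch *smmu)
{
	if (!smmu->powered_on)
		return 0;

	/* ...the actual CMD_SYNC submission and polling would go here... */
	return 0;
}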

These patches in particular are best reviewed with git's --color-moved:
1,2	iommu/io-pgtable-arm: Split*
7,29-32	iommu/arm-smmu-v3: Move*

A development branch is available at
https://jpbrucker.net/git/linux (branch pkvm/smmu)


Design
======

We've explored three solutions so far. This posting implements the third
one, which is slightly more invasive in the hypervisor but also the most
flexible.

1. Sharing stage-2 page tables

This is the simplest solution: share the stage-2 page tables (which
translate host physical address -> system physical address) between the
CPU and the SMMU. Whatever the host can access on the CPU, it can also
access with DMA. Memory that is not accessible to the host, because it
has been donated to the hypervisor or to guests, cannot be accessed
through DMA either.

pKVM normally populates the host stage-2 page tables lazily, when the host
first accesses memory. However, this relies on CPU page faults, and DMA
generally cannot fault. The whole stage-2 must therefore be populated at
boot. That's easy to do because the HPA->PA mapping for the host is an
identity mapping.

It gets more complicated when donating some pages to guests, which
involves removing those pages from the host stage-2. To save memory and be
TLB efficient, the stage-2 is mapped with block mappings (1GB or 2MB
contiguous ranges, rather than individual 4KB units). When donating a page
from that range, the hypervisor must remove the block mapping, and replace
it with a table that excludes the donated page. Since a device may be
simultaneously performing DMA on other pages in the range, this
replacement operation must be atomic. Otherwise DMA may reach the SMMU
during a small period of time where the mapping is invalid, and fatally
abort.

The Arm architecture supports atomic replacement of block mappings only
since version 8.4 (FEAT_BBM), and it is optional. So this solution, while
tempting, is not sufficient.

2. Pinning DMA mappings in the shared stage-2

Building on the first solution, we can let the host notify the hypervisor
about pages used for DMA. This way block mappings are broken into tables
when the host sets up DMA, and donating neighbouring pages to guests won't
cause block replacement.

This solution adds runtime overhead, because calls to the DMA API are now
forwarded to the hypervisor, which needs to update the stage-2 mappings.

All in all, I believe this is a good solution if the hardware is up to the
task. But sharing page tables requires matching capabilities between the
stage-2 MMU and SMMU, and we don't expect all platforms to support the
required features, especially on mobile platforms where chip area is
costly.

3. Private I/O page tables

A flexible alternative uses private page tables in the SMMU, entirely
disconnected from the CPU page tables. With this the SMMU can implement a
reduced set of features, even shed a stage of translation. This also
provides a virtual I/O address space to the host, which allows more
efficient memory allocation for large buffers, and for devices with
limited addressing abilities.

This is the solution implemented in this series. The host creates
IOVA->HPA mappings with two hypercalls map_pages() and unmap_pages(), and
the hypervisor populates the page tables. Page tables are abstracted into
IOMMU domains, which allow multiple devices to share the same address
space. Another four hypercalls, alloc_domain(), attach_dev(), detach_dev()
and free_domain(), manage the domains.
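
As a usage example, the host-side sequence to set up DMA for one device
looks roughly like this. The kvm_iommu_* wrappers, IDs and parameters
are illustrative; in practice these calls are issued from the host
driver's IOMMU ops:

#include <linux/iommu.h>
#include <linux/sizes.h>

static int example_dma_setup(u32 domain_id, u32 endpoint_id,
			     unsigned long iova, phys_addr_t paddr)
{
	int ret;

	ret = kvm_iommu_alloc_domain(domain_id);	/* new IOVA space */
	if (ret)
		return ret;

	ret = kvm_iommu_attach_dev(domain_id, endpoint_id);
	if (ret)
		goto out_free;

	/* IOVA -> HPA mapping: 16 pages of 4KB, read/write */
	ret = kvm_iommu_map_pages(domain_id, iova, paddr, SZ_4K, 16,
				  IOMMU_READ | IOMMU_WRITE);
	if (ret)
		goto out_detach;

	return 0;

out_detach:
	kvm_iommu_detach_dev(domain_id, endpoint_id);
out_free:
	kvm_iommu_free_domain(domain_id);
	return ret;
}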

Although the hypervisor already has pgtable.c to populate CPU page tables,
we import the io-pgtable library because it is more suited to IOMMU page
tables. It supports arbitrary page and address sizes, non-coherent page
walks, quirks and errata workarounds specific to IOMMU implementations,
and atomically switching between tables and blocks without lazy remapping.
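
For reference, this is roughly what configuring such a table looks like
with the current mainline io-pgtable API. The series reworks this
interface (fmt moves into io_pgtable_cfg and a configure() operation is
added), so treat this only as an illustration:

#include <linux/io-pgtable.h>
#include <linux/sizes.h>

/* flush_ops points at the caller's TLB invalidation callbacks */
static struct io_pgtable_ops *
sketch_alloc_pgtable(const struct iommu_flush_ops *flush_ops, void *cookie)
{
	struct io_pgtable_cfg cfg = {
		.pgsize_bitmap	= SZ_4K | SZ_2M | SZ_1G,
		.ias		= 48,		/* input (IOVA) size, bits */
		.oas		= 48,		/* output (PA) size, bits */
		.coherent_walk	= false,	/* walks may not snoop caches */
		.tlb		= flush_ops,
	};

	/* Stage-2 format here as an example; stage 1 works similarly */
	return alloc_io_pgtable_ops(ARM_64_LPAE_S2, &cfg, cookie);
}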


Performance
===========

Both solutions 2 and 3 add overhead to DMA mappings, and since the
hypervisor relies on global locks at the moment, they scale poorly.
Interestingly solution 3 can be optimized to scale really well on the
map() path. We can remove the hypervisor IOMMU lock in map()/unmap() by
holding domain references, and then use the hyp vmemmap to track DMA state
of pages atomically, without updating the CPU stage-2 tables. Donation and
sharing would then need to inspect the vmemmap. On the unmap() path, the
single command queue for TLB invalidations still requires locking.
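
To sketch that optimization: with a per-page atomic counter in the hyp
vmemmap, the map() path can pin pages against donation without taking a
global lock. Everything below (the field, its encoding, the names) is
hypothetical and not implemented by this series:

#include <linux/atomic.h>

/*
 * dma_state lives in the hyp vmemmap: >= 0 is the number of IOMMU
 * mappings covering the page, -1 means the page left host ownership.
 */

/* map() path: pin the page unless it has already been donated */
static bool sketch_pin_for_dma(atomic_t *dma_state)
{
	int old = atomic_read(dma_state);

	do {
		if (old < 0)
			return false;
	} while (!atomic_try_cmpxchg(dma_state, &old, old + 1));

	return true;
}

/* unmap() path: drop the pin once the IOVA mapping is gone */
static void sketch_unpin_after_dma(atomic_t *dma_state)
{
	atomic_dec(dma_state);
}

/* Donation path: only a page with zero pins may leave the host */
static bool sketch_start_donation(atomic_t *dma_state)
{
	int old = 0;

	return atomic_try_cmpxchg(dma_state, &old, -1);
}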

To give a rough idea, these are dma_map_benchmark results on a 96-core
server (4 NUMA nodes, SMMU on node 0). I'm adding these because I found
the magnitudes interesting, but do take them with a grain of salt: my
methodology wasn't particularly thorough (although the numbers seem
repeatable). Numbers represent the average time needed for one
dma_map/dma_unmap call in μs; lower is better.

			1 thread	16 threads (node 0)	96 threads
host only		0.2/0.7		0.4/3.5			1.7/81
pkvm (this series)	0.5/2.2		28/51			291/542
pkvm (+optimizations)	0.3/1.9		0.4/38			0.8/304


[1] https://lore.kernel.org/kvmarm/20220519134204.5379-1-will@kernel.org/


David Brazdil (1):
  KVM: arm64: Introduce IOMMU driver infrastructure

Jean-Philippe Brucker (44):
  iommu/io-pgtable-arm: Split the page table driver
  iommu/io-pgtable-arm: Split initialization
  iommu/io-pgtable: Move fmt into io_pgtable_cfg
  iommu/io-pgtable: Add configure() operation
  iommu/io-pgtable: Split io_pgtable structure
  iommu/io-pgtable-arm: Extend __arm_lpae_free_pgtable() to only free
    child tables
  iommu/arm-smmu-v3: Move some definitions to arm64 include/
  KVM: arm64: pkvm: Add pkvm_udelay()
  KVM: arm64: pkvm: Add pkvm_create_hyp_device_mapping()
  KVM: arm64: pkvm: Expose pkvm_map/unmap_donated_memory()
  KVM: arm64: pkvm: Expose pkvm_admit_host_page()
  KVM: arm64: pkvm: Unify pkvm_pkvm_teardown_donated_memory()
  KVM: arm64: pkvm: Add hyp_page_ref_inc_return()
  KVM: arm64: pkvm: Prevent host donation of device memory
  KVM: arm64: pkvm: Add __pkvm_host_share/unshare_dma()
  KVM: arm64: pkvm: Add IOMMU hypercalls
  KVM: arm64: iommu: Add per-cpu page queue
  KVM: arm64: iommu: Add domains
  KVM: arm64: iommu: Add map() and unmap() operations
  KVM: arm64: iommu: Add SMMUv3 driver
  KVM: arm64: smmu-v3: Initialize registers
  KVM: arm64: smmu-v3: Setup command queue
  KVM: arm64: smmu-v3: Setup stream table
  KVM: arm64: smmu-v3: Reset the device
  KVM: arm64: smmu-v3: Support io-pgtable
  KVM: arm64: smmu-v3: Setup domains and page table configuration
  iommu/arm-smmu-v3: Extract driver-specific bits from probe function
  iommu/arm-smmu-v3: Move some functions to arm-smmu-v3-common.c
  iommu/arm-smmu-v3: Move queue and table allocation to
    arm-smmu-v3-common.c
  iommu/arm-smmu-v3: Move firmware probe to arm-smmu-v3-common
  iommu/arm-smmu-v3: Move IOMMU registration to arm-smmu-v3-common.c
  iommu/arm-smmu-v3: Use single pages for level-2 stream tables
  iommu/arm-smmu-v3: Add host driver for pKVM
  iommu/arm-smmu-v3-kvm: Pass a list of SMMU devices to the hypervisor
  iommu/arm-smmu-v3-kvm: Validate device features
  iommu/arm-smmu-v3-kvm: Allocate structures and reset device
  iommu/arm-smmu-v3-kvm: Add per-cpu page queue
  iommu/arm-smmu-v3-kvm: Initialize page table configuration
  iommu/arm-smmu-v3-kvm: Add IOMMU ops
  KVM: arm64: pkvm: Add __pkvm_host_add_remove_page()
  KVM: arm64: pkvm: Support SCMI power domain
  KVM: arm64: smmu-v3: Support power management
  iommu/arm-smmu-v3-kvm: Support power management with SCMI SMC
  iommu/arm-smmu-v3-kvm: Enable runtime PM

 drivers/iommu/Kconfig                         |   10 +
 virt/kvm/Kconfig                              |    3 +
 arch/arm64/kvm/hyp/nvhe/Makefile              |    6 +
 drivers/iommu/Makefile                        |    2 +-
 drivers/iommu/arm/arm-smmu-v3/Makefile        |    6 +
 arch/arm64/include/asm/arm-smmu-v3-regs.h     |  478 ++++++++
 arch/arm64/include/asm/kvm_asm.h              |    7 +
 arch/arm64/include/asm/kvm_host.h             |    5 +
 arch/arm64/include/asm/kvm_hyp.h              |    4 +-
 arch/arm64/kvm/hyp/include/nvhe/iommu.h       |  115 ++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   11 +-
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |   15 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h          |    2 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |   29 +
 .../arm64/kvm/hyp/include/nvhe/trap_handler.h |    2 +
 drivers/gpu/drm/panfrost/panfrost_device.h    |    2 +-
 drivers/iommu/amd/amd_iommu_types.h           |   17 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  510 +-------
 drivers/iommu/arm/arm-smmu/arm-smmu.h         |    2 +-
 drivers/iommu/io-pgtable-arm.h                |   30 -
 include/kvm/arm_smmu_v3.h                     |   61 +
 include/kvm/iommu.h                           |   74 ++
 include/kvm/power_domain.h                    |   22 +
 include/linux/io-pgtable-arm.h                |  190 +++
 include/linux/io-pgtable.h                    |  114 +-
 arch/arm64/kvm/arm.c                          |   41 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  101 +-
 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c   |  625 ++++++++++
 .../arm64/kvm/hyp/nvhe/iommu/io-pgtable-arm.c |   97 ++
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c         |  393 ++++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  209 +++-
 arch/arm64/kvm/hyp/nvhe/mm.c                  |   27 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c                |   66 +-
 arch/arm64/kvm/hyp/nvhe/power/scmi.c          |  233 ++++
 arch/arm64/kvm/hyp/nvhe/setup.c               |   47 +-
 arch/arm64/kvm/hyp/nvhe/timer-sr.c            |   43 +
 drivers/gpu/drm/msm/msm_iommu.c               |   22 +-
 drivers/gpu/drm/panfrost/panfrost_mmu.c       |   22 +-
 drivers/iommu/amd/io_pgtable.c                |   26 +-
 drivers/iommu/amd/io_pgtable_v2.c             |   43 +-
 drivers/iommu/amd/iommu.c                     |   29 +-
 drivers/iommu/apple-dart.c                    |   38 +-
 .../arm/arm-smmu-v3/arm-smmu-v3-common.c      |  632 ++++++++++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   |  864 +++++++++++++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |    2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  679 +----------
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c    |    7 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c         |   41 +-
 drivers/iommu/arm/arm-smmu/qcom_iommu.c       |   41 +-
 drivers/iommu/io-pgtable-arm-common.c         |  766 ++++++++++++
 drivers/iommu/io-pgtable-arm-v7s.c            |  190 +--
 drivers/iommu/io-pgtable-arm.c                | 1082 ++---------------
 drivers/iommu/io-pgtable-dart.c               |  105 +-
 drivers/iommu/io-pgtable.c                    |   57 +-
 drivers/iommu/ipmmu-vmsa.c                    |   20 +-
 drivers/iommu/msm_iommu.c                     |   18 +-
 drivers/iommu/mtk_iommu.c                     |   14 +-
 57 files changed, 5743 insertions(+), 2554 deletions(-)
 create mode 100644 arch/arm64/include/asm/arm-smmu-v3-regs.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/iommu.h
 delete mode 100644 drivers/iommu/io-pgtable-arm.h
 create mode 100644 include/kvm/arm_smmu_v3.h
 create mode 100644 include/kvm/iommu.h
 create mode 100644 include/kvm/power_domain.h
 create mode 100644 include/linux/io-pgtable-arm.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/io-pgtable-arm.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/power/scmi.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
 create mode 100644 drivers/iommu/io-pgtable-arm-common.c

Comments

Tian, Kevin Feb. 2, 2023, 7:07 a.m. UTC | #1
> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Wednesday, February 1, 2023 8:53 PM
> 
> 3. Private I/O page tables
> 
> A flexible alternative uses private page tables in the SMMU, entirely
> disconnected from the CPU page tables. With this the SMMU can implement
> a
> reduced set of features, even shed a stage of translation. This also
> provides a virtual I/O address space to the host, which allows more
> efficient memory allocation for large buffers, and for devices with
> limited addressing abilities.
> 
> This is the solution implemented in this series. The host creates
> IOVA->HPA mappings with two hypercalls map_pages() and unmap_pages(),
> and
> the hypervisor populates the page tables. Page tables are abstracted into
> IOMMU domains, which allow multiple devices to share the same address
> space. Another four hypercalls, alloc_domain(), attach_dev(), detach_dev()
> and free_domain(), manage the domains.
> 

Out of curiosity. Does virtio-iommu fit in this usage? If yes then there is
no need to add specific enlightenment in existing iommu drivers. If no 
probably because as mentioned in the start a full-fledged iommu driver 
doesn't fit nVHE so lots of smmu driver logic has to be kept in the host?
anyway just want to check your thoughts on the possibility.

btw some of my colleagues are porting pKVM to Intel platform. I believe
they will post their work shortly and there might require some common
framework in pKVM hypervisor like iommu domain, hypercalls, etc. like 
what we have in the host iommu subsystem. CC them in case of any early
thought they want to throw in. 
Jean-Philippe Brucker Feb. 2, 2023, 10:05 a.m. UTC | #2
On Thu, Feb 02, 2023 at 07:07:55AM +0000, Tian, Kevin wrote:
> > From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Sent: Wednesday, February 1, 2023 8:53 PM
> > 
> > 3. Private I/O page tables
> > 
> > A flexible alternative uses private page tables in the SMMU, entirely
> > disconnected from the CPU page tables. With this the SMMU can implement
> > a
> > reduced set of features, even shed a stage of translation. This also
> > provides a virtual I/O address space to the host, which allows more
> > efficient memory allocation for large buffers, and for devices with
> > limited addressing abilities.
> > 
> > This is the solution implemented in this series. The host creates
> > IOVA->HPA mappings with two hypercalls map_pages() and unmap_pages(),
> > and
> > the hypervisor populates the page tables. Page tables are abstracted into
> > IOMMU domains, which allow multiple devices to share the same address
> > space. Another four hypercalls, alloc_domain(), attach_dev(), detach_dev()
> > and free_domain(), manage the domains.
> > 
> 
> Out of curiosity. Does virtio-iommu fit in this usage?

I don't think so, because you still need a driver for the physical IOMMU
in the hypervisor. virtio-iommu would only replace the hypercall interface
with queues, and I don't think that buys us anything.

Maybe virtio on the guest side could be advantageous, because that
interface has to be stable and virtio comes with stable APIs for several
classes of devices. But implementing virtio in pkvm means a lot of extra
code so it needs to be considered carefully.

> If yes then there is
> no need to add specific enlightenment in existing iommu drivers. If no 
> probably because as mentioned in the start a full-fledged iommu driver 
> doesn't fit nVHE so lots of smmu driver logic has to be kept in the host?

To minimize the attack surface of the hypervisor, we don't want to load
any superfluous code, so the hypervisor part of the SMMUv3 driver only
contains code to populate tables and send commands (which is still too
much for my taste but seems unavoidable to isolate host DMA). Left in the
host are things like ACPI/DT parser, interrupts, possibly the event queue
(which informs of DMA errors), extra features and complex optimizations.
The host also has to implement IOMMU ops to liaise between the DMA API and
the hypervisor.

> anyway just want to check your thoughts on the possibility.
> 
> btw some of my colleagues are porting pKVM to Intel platform. I believe
> they will post their work shortly and there might require some common
> framework in pKVM hypervisor like iommu domain, hypercalls, etc. like 
> what we have in the host iommu subsystem. CC them in case of any early
> thought they want to throw in. 
Tian, Kevin Feb. 3, 2023, 2:04 a.m. UTC | #3
> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Thursday, February 2, 2023 6:05 PM
> 
> On Thu, Feb 02, 2023 at 07:07:55AM +0000, Tian, Kevin wrote:
> > > From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > Sent: Wednesday, February 1, 2023 8:53 PM
> > >
> > > 3. Private I/O page tables
> > >
> > > A flexible alternative uses private page tables in the SMMU, entirely
> > > disconnected from the CPU page tables. With this the SMMU can
> implement
> > > a
> > > reduced set of features, even shed a stage of translation. This also
> > > provides a virtual I/O address space to the host, which allows more
> > > efficient memory allocation for large buffers, and for devices with
> > > limited addressing abilities.
> > >
> > > This is the solution implemented in this series. The host creates
> > > IOVA->HPA mappings with two hypercalls map_pages() and
> unmap_pages(),
> > > and
> > > the hypervisor populates the page tables. Page tables are abstracted into
> > > IOMMU domains, which allow multiple devices to share the same
> address
> > > space. Another four hypercalls, alloc_domain(), attach_dev(),
> detach_dev()
> > > and free_domain(), manage the domains.
> > >
> >
> > Out of curiosity. Does virtio-iommu fit in this usage?
> 
> I don't think so, because you still need a driver for the physical IOMMU
> in the hypervisor. virtio-iommu would only replace the hypercall interface
> with queues, and I don't think that buys us anything.
> 
> Maybe virtio on the guest side could be advantageous, because that
> interface has to be stable and virtio comes with stable APIs for several
> classes of devices. But implementing virtio in pkvm means a lot of extra
> code so it needs to be considered carefully.
> 

this makes sense.

> > If yes then there is
> > no need to add specific enlightenment in existing iommu drivers. If no
> > probably because as mentioned in the start a full-fledged iommu driver
> > doesn't fit nVHE so lots of smmu driver logic has to be kept in the host?
> 
> To minimize the attack surface of the hypervisor, we don't want to load
> any superfluous code, so the hypervisor part of the SMMUv3 driver only
> contains code to populate tables and send commands (which is still too
> much for my taste but seems unavoidable to isolate host DMA). Left in the
> host are things like ACPI/DT parser, interrupts, possibly the event queue
> (which informs of DMA errors), extra features and complex optimizations.
> The host also has to implement IOMMU ops to liaise between the DMA API
> and
> the hypervisor.
> 
> > anyway just want to check your thoughts on the possibility.
> >
> > btw some of my colleagues are porting pKVM to Intel platform. I believe
> > they will post their work shortly and there might require some common
> > framework in pKVM hypervisor like iommu domain, hypercalls, etc. like
> > what we have in the host iommu subsystem. CC them in case of any early
> > thought they want to throw in. 
Jason Chen CJ Feb. 3, 2023, 8:39 a.m. UTC | #4
> -----Original Message-----
> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Friday, February 3, 2023 10:05 AM
> To: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: maz@kernel.org; catalin.marinas@arm.com; will@kernel.org;
> joro@8bytes.org; robin.murphy@arm.com; james.morse@arm.com;
> suzuki.poulose@arm.com; oliver.upton@linux.dev; yuzenghui@huawei.com;
> smostafa@google.com; dbrazdil@google.com; ryan.roberts@arm.com; linux-
> arm-kernel@lists.infradead.org; kvmarm@lists.linux.dev;
> iommu@lists.linux.dev; Chen, Jason CJ <jason.cj.chen@intel.com>; Zhang,
> Tina <tina.zhang@intel.com>
> Subject: RE: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
> 
> > From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Sent: Thursday, February 2, 2023 6:05 PM
> >
> > On Thu, Feb 02, 2023 at 07:07:55AM +0000, Tian, Kevin wrote:
> > > > From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > > Sent: Wednesday, February 1, 2023 8:53 PM
> > > >
> > > > 3. Private I/O page tables
> > > >
> > > > A flexible alternative uses private page tables in the SMMU,
> > > > entirely disconnected from the CPU page tables. With this the SMMU
> > > > can
> > implement
> > > > a
> > > > reduced set of features, even shed a stage of translation. This
> > > > also provides a virtual I/O address space to the host, which
> > > > allows more efficient memory allocation for large buffers, and for
> > > > devices with limited addressing abilities.
> > > >
> > > > This is the solution implemented in this series. The host creates
> > > > IOVA->HPA mappings with two hypercalls map_pages() and
> > unmap_pages(),
> > > > and
> > > > the hypervisor populates the page tables. Page tables are
> > > > abstracted into IOMMU domains, which allow multiple devices to
> > > > share the same
> > address
> > > > space. Another four hypercalls, alloc_domain(), attach_dev(),
> > detach_dev()
> > > > and free_domain(), manage the domains.
> > > >
> > >
> > > Out of curiosity. Does virtio-iommu fit in this usage?
> >
> > I don't think so, because you still need a driver for the physical
> > IOMMU in the hypervisor. virtio-iommu would only replace the hypercall
> > interface with queues, and I don't think that buys us anything.
> >
> > Maybe virtio on the guest side could be advantageous, because that
> > interface has to be stable and virtio comes with stable APIs for
> > several classes of devices. But implementing virtio in pkvm means a
> > lot of extra code so it needs to be considered carefully.
> >
> 
> this makes sense.
> 
> > > If yes then there is
> > > no need to add specific enlightenment in existing iommu drivers. If
> > > no probably because as mentioned in the start a full-fledged iommu
> > > driver doesn't fit nVHE so lots of smmu driver logic has to be kept in the
> host?
> >
> > To minimize the attack surface of the hypervisor, we don't want to
> > load any superfluous code, so the hypervisor part of the SMMUv3 driver
> > only contains code to populate tables and send commands (which is
> > still too much for my taste but seems unavoidable to isolate host
> > DMA). Left in the host are things like ACPI/DT parser, interrupts,
> > possibly the event queue (which informs of DMA errors), extra features
> and complex optimizations.
> > The host also has to implement IOMMU ops to liaise between the DMA API
> > and the hypervisor.
> >
> > > anyway just want to check your thoughts on the possibility.
> > >
> > > btw some of my colleagues are porting pKVM to Intel platform. I
> > > believe they will post their work shortly and there might require
> > > some common framework in pKVM hypervisor like iommu domain,
> > > hypercalls, etc. like what we have in the host iommu subsystem. CC
> > > them in case of any early thought they want to throw in. 
Jean-Philippe Brucker Feb. 3, 2023, 11:23 a.m. UTC | #5
Hi Jason,

On Fri, Feb 03, 2023 at 08:39:41AM +0000, Chen, Jason CJ wrote:
> > > > btw some of my colleagues are porting pKVM to Intel platform. I
> > > > believe they will post their work shortly and there might require
> > > > some common framework in pKVM hypervisor like iommu domain,
> > > > hypercalls, etc. like what we have in the host iommu subsystem. CC
> > > > them in case of any early thought they want to throw in. 
Jason Chen CJ Feb. 4, 2023, 8:19 a.m. UTC | #6
Hi, Jean,

Thanks for the information! Let's do more investigation.

Yes, if using the enlightened method, we may skip nested translation.
Meanwhile, we shall ensure the host does not touch this capability. We may
also need a trade-off to support SVM-like features.

Thanks

Jason

> -----Original Message-----
> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Friday, February 3, 2023 7:24 PM
> To: Chen, Jason CJ <jason.cj.chen@intel.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; maz@kernel.org;
> catalin.marinas@arm.com; will@kernel.org; joro@8bytes.org;
> robin.murphy@arm.com; james.morse@arm.com;
> suzuki.poulose@arm.com; oliver.upton@linux.dev; yuzenghui@huawei.com;
> smostafa@google.com; dbrazdil@google.com; ryan.roberts@arm.com;
> linux-arm-kernel@lists.infradead.org; kvmarm@lists.linux.dev;
> iommu@lists.linux.dev; Zhang, Tina <tina.zhang@intel.com>
> Subject: Re: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
> 
> Hi Jason,
> 
> On Fri, Feb 03, 2023 at 08:39:41AM +0000, Chen, Jason CJ wrote:
> > > > > btw some of my colleagues are porting pKVM to Intel platform. I
> > > > > believe they will post their work shortly and there might
> > > > > require some common framework in pKVM hypervisor like iommu
> > > > > domain, hypercalls, etc. like what we have in the host iommu
> > > > > subsystem. CC them in case of any early thought they want to
> > > > > throw in. 
Zhang, Tina Feb. 4, 2023, 12:30 p.m. UTC | #7
On 2/4/23 16:19, Chen, Jason CJ wrote:
> Hi, Jean,
> 
> Thanks for the information! Let's do more investigation.
> 
> Yes, if using the enlightened method, we may skip nested translation.
> Meanwhile, we shall ensure the host does not touch this capability. We may
> also need a trade-off to support SVM-like features.
Hi Jason,

Nested translation is also optional for VT-d. Not all IA platforms have
VT-d with nested translation support. For those legacy platforms (e.g. on
which VT-d doesn't support scalable mode), providing an enlightened way
for pKVM to isolate DMA seems reasonable. Otherwise, pKVM may need to
shadow the I/O page tables, which could introduce performance overhead.


Regards,
-Tina
> 
> Thanks
> 
> Jason
> 
>> -----Original Message-----
>> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
>> Sent: Friday, February 3, 2023 7:24 PM
>> To: Chen, Jason CJ <jason.cj.chen@intel.com>
>> Cc: Tian, Kevin <kevin.tian@intel.com>; maz@kernel.org;
>> catalin.marinas@arm.com; will@kernel.org; joro@8bytes.org;
>> robin.murphy@arm.com; james.morse@arm.com;
>> suzuki.poulose@arm.com; oliver.upton@linux.dev; yuzenghui@huawei.com;
>> smostafa@google.com; dbrazdil@google.com; ryan.roberts@arm.com;
>> linux-arm-kernel@lists.infradead.org; kvmarm@lists.linux.dev;
>> iommu@lists.linux.dev; Zhang, Tina <tina.zhang@intel.com>
>> Subject: Re: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
>>
>> Hi Jason,
>>
>> On Fri, Feb 03, 2023 at 08:39:41AM +0000, Chen, Jason CJ wrote:
>>>>>> btw some of my colleagues are porting pKVM to Intel platform. I
>>>>>> believe they will post their work shortly and there might
>>>>>> require some common framework in pKVM hypervisor like iommu
>>>>>> domain, hypercalls, etc. like what we have in the host iommu
>>>>>> subsystem. CC them in case of any early thought they want to
>>>>>> throw in.