[RFC,19/45] KVM: arm64: iommu: Add domains

Message ID	20230201125328.2186498-20-jean-philippe@linaro.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org> From: Jean-Philippe Brucker <jean-philippe@linaro.org> To: maz@kernel.org, catalin.marinas@arm.com, will@kernel.org, joro@8bytes.org Cc: robin.murphy@arm.com, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, yuzenghui@huawei.com, smostafa@google.com, dbrazdil@google.com, ryan.roberts@arm.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, iommu@lists.linux.dev, Jean-Philippe Brucker <jean-philippe@linaro.org> Subject: [RFC PATCH 19/45] KVM: arm64: iommu: Add domains Date: Wed, 1 Feb 2023 12:53:03 +0000 Message-Id: <20230201125328.2186498-20-jean-philippe@linaro.org> In-Reply-To: <20230201125328.2186498-1-jean-philippe@linaro.org> References: <20230201125328.2186498-1-jean-philippe@linaro.org> MIME-Version: 1.0 Precedence: list Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org
Series	KVM: Arm SMMUv3 driver for pKVM \| expand [RFC,00/45] KVM: Arm SMMUv3 driver for pKVM [RFC,01/45] iommu/io-pgtable-arm: Split the page table driver [RFC,02/45] iommu/io-pgtable-arm: Split initialization [RFC,03/45] iommu/io-pgtable: Move fmt into io_pgtable_cfg [RFC,04/45] iommu/io-pgtable: Add configure() operation [RFC,05/45] iommu/io-pgtable: Split io_pgtable structure [RFC,06/45] iommu/io-pgtable-arm: Extend __arm_lpae_free_pgtable() to only free child tables [RFC,07/45] iommu/arm-smmu-v3: Move some definitions to arm64 include/ [RFC,08/45] KVM: arm64: pkvm: Add pkvm_udelay() [RFC,09/45] KVM: arm64: pkvm: Add pkvm_create_hyp_device_mapping() [RFC,10/45] KVM: arm64: pkvm: Expose pkvm_map/unmap_donated_memory() [RFC,11/45] KVM: arm64: pkvm: Expose pkvm_admit_host_page() [RFC,12/45] KVM: arm64: pkvm: Unify pkvm_pkvm_teardown_donated_memory() [RFC,13/45] KVM: arm64: pkvm: Add hyp_page_ref_inc_return() [RFC,14/45] KVM: arm64: pkvm: Prevent host donation of device memory [RFC,15/45] KVM: arm64: pkvm: Add __pkvm_host_share/unshare_dma() [RFC,16/45] KVM: arm64: Introduce IOMMU driver infrastructure [RFC,17/45] KVM: arm64: pkvm: Add IOMMU hypercalls [RFC,18/45] KVM: arm64: iommu: Add per-cpu page queue [RFC,19/45] KVM: arm64: iommu: Add domains [RFC,20/45] KVM: arm64: iommu: Add map() and unmap() operations [RFC,21/45] KVM: arm64: iommu: Add SMMUv3 driver [RFC,22/45] KVM: arm64: smmu-v3: Initialize registers [RFC,23/45] KVM: arm64: smmu-v3: Setup command queue [RFC,24/45] KVM: arm64: smmu-v3: Setup stream table [RFC,25/45] KVM: arm64: smmu-v3: Reset the device [RFC,26/45] KVM: arm64: smmu-v3: Support io-pgtable [RFC,27/45] KVM: arm64: smmu-v3: Setup domains and page table configuration [RFC,28/45] iommu/arm-smmu-v3: Extract driver-specific bits from probe function [RFC,29/45] iommu/arm-smmu-v3: Move some functions to arm-smmu-v3-common.c [RFC,30/45] iommu/arm-smmu-v3: Move queue and table allocation to arm-smmu-v3-common.c [RFC,31/45] iommu/arm-smmu-v3: Move firmware probe to arm-smmu-v3-common [RFC,32/45] iommu/arm-smmu-v3: Move IOMMU registration to arm-smmu-v3-common.c [RFC,33/45] iommu/arm-smmu-v3: Use single pages for level-2 stream tables [RFC,34/45] iommu/arm-smmu-v3: Add host driver for pKVM [RFC,35/45] iommu/arm-smmu-v3-kvm: Pass a list of SMMU devices to the hypervisor [RFC,36/45] iommu/arm-smmu-v3-kvm: Validate device features [RFC,37/45] iommu/arm-smmu-v3-kvm: Allocate structures and reset device [RFC,38/45] iommu/arm-smmu-v3-kvm: Add per-cpu page queue [RFC,39/45] iommu/arm-smmu-v3-kvm: Initialize page table configuration [RFC,40/45] iommu/arm-smmu-v3-kvm: Add IOMMU ops [RFC,41/45] KVM: arm64: pkvm: Add __pkvm_host_add_remove_page() [RFC,42/45] KVM: arm64: pkvm: Support SCMI power domain [RFC,43/45] KVM: arm64: smmu-v3: Support power management [RFC,44/45] iommu/arm-smmu-v3-kvm: Support power management with SCMI SMC [RFC,45/45] iommu/arm-smmu-v3-kvm: Enable runtime PM

Jean-Philippe Brucker Feb. 1, 2023, 12:53 p.m. UTC

The IOMMU domain abstraction allows to share the same page tables
between multiple devices. That may be necessary due to hardware
constraints, if multiple devices cannot be isolated by the IOMMU
(conventional PCI bus for example). It may also help with optimizing
resource or TLB use. For pKVM in particular, it may be useful to reduce
the amount of memory required for page tables. All devices owned by the
host kernel could be attached to the same domain (though that requires
host changes).

Each IOMMU device holds an array of domains, and the host allocates
domain IDs that index this array. The alloc() operation initializes the
domain and prepares the page tables. The attach() operation initializes
the device table that holds the PGD and its configuration.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/iommu.h |  16 +++
 include/kvm/iommu.h                     |  55 ++++++++
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c   | 161 ++++++++++++++++++++++++
 3 files changed, 232 insertions(+)

Mostafa Saleh Feb. 7, 2023, 1:13 p.m. UTC | #1

Hi Jean,

On Wed, Feb 01, 2023 at 12:53:03PM +0000, Jean-Philippe Brucker wrote:
> The IOMMU domain abstraction allows to share the same page tables
> between multiple devices. That may be necessary due to hardware
> constraints, if multiple devices cannot be isolated by the IOMMU
> (conventional PCI bus for example). It may also help with optimizing
> resource or TLB use. For pKVM in particular, it may be useful to reduce
> the amount of memory required for page tables. All devices owned by the
> host kernel could be attached to the same domain (though that requires
> host changes).
> 
> Each IOMMU device holds an array of domains, and the host allocates
> domain IDs that index this array. The alloc() operation initializes the
> domain and prepares the page tables. The attach() operation initializes
> the device table that holds the PGD and its configuration.

I was wondering about the need for pre-allocation of the domain array.

An alternative way I see:
- We don’t pre-allocate any domain.

- When the EL1 driver has a request to domain_alloc, it will allocate
both kernel(iommu_domain) and hypervisor domains(kvm_hyp_iommu_domain).

- In __pkvm_host_iommu_alloc_domain, it will take over the hyp struct
from the kernel (via donation).

- In all other hypercalls, the kernel address of kvm_hyp_iommu_domain will
be used as domain ID, which guarantees uniqueness and O(1) access.

- The hypervisor would just need to transform the address(kern_hyp_va)
to get the domain pointer.

I believe that would save some memory as we allocate domains when needed.

Please let me know what you think about this?

Thanks,
Mostafa

Mostafa Saleh Feb. 8, 2023, 12:31 p.m. UTC | #2

On Tue, Feb 7, 2023 at 1:13 PM Mostafa Saleh <smostafa@google.com> wrote:

> I was wondering about the need for pre-allocation of the domain array.
>
> An alternative way I see:
> - We don’t pre-allocate any domain.
>
> - When the EL1 driver has a request to domain_alloc, it will allocate
> both kernel(iommu_domain) and hypervisor domains(kvm_hyp_iommu_domain).
>
> - In __pkvm_host_iommu_alloc_domain, it will take over the hyp struct
> from the kernel (via donation).
>
> - In all other hypercalls, the kernel address of kvm_hyp_iommu_domain will
> be used as domain ID, which guarantees uniqueness and O(1) access.
>
> - The hypervisor would just need to transform the address(kern_hyp_va)
> to get the domain pointer.

This actually will not work with the current sequence, as we can't
guarantee that the domain_id sent later from the host is trusted, and as
the domain points to the page table this can be dangerous, I will have a
closer look to see if we can make this work somehow.

Jean-Philippe Brucker Feb. 8, 2023, 6:05 p.m. UTC | #3

On Wed, Feb 08, 2023 at 12:31:15PM +0000, Mostafa Saleh wrote:
> On Tue, Feb 7, 2023 at 1:13 PM Mostafa Saleh <smostafa@google.com> wrote:
> 
> > I was wondering about the need for pre-allocation of the domain array.
> >
> > An alternative way I see:
> > - We don’t pre-allocate any domain.
> >
> > - When the EL1 driver has a request to domain_alloc, it will allocate
> > both kernel(iommu_domain) and hypervisor domains(kvm_hyp_iommu_domain).
> >
> > - In __pkvm_host_iommu_alloc_domain, it will take over the hyp struct
> > from the kernel (via donation).

That also requires an entire page for each domain no?  I guess this domain
table would only be worse in memory use if we have fewer than 2 domains,
since it costs one page for the root table, and then stores 256 domains
per leaf page.

What I've been trying to avoid with this table is introducing a malloc in
the hypervisor, but we might have to bite the bullet eventually (although
with a malloc, access will probably worse than O(1)).

Thanks,
Jean

> >
> > - In all other hypercalls, the kernel address of kvm_hyp_iommu_domain will
> > be used as domain ID, which guarantees uniqueness and O(1) access.
> >
> > - The hypervisor would just need to transform the address(kern_hyp_va)
> > to get the domain pointer.
> 
> 
> This actually will not work with the current sequence, as we can't
> guarantee that the domain_id sent later from the host is trusted, and as
> the domain points to the page table this can be dangerous, I will have a
> closer look to see if we can make this work somehow.

Mostafa Saleh Feb. 10, 2023, 10:03 p.m. UTC | #4

On Wed, Feb 8, 2023 at 6:05 PM Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
>
> On Wed, Feb 08, 2023 at 12:31:15PM +0000, Mostafa Saleh wrote:
> > On Tue, Feb 7, 2023 at 1:13 PM Mostafa Saleh <smostafa@google.com> wrote:
> >
> > > I was wondering about the need for pre-allocation of the domain array.
> > >
> > > An alternative way I see:
> > > - We don’t pre-allocate any domain.
> > >
> > > - When the EL1 driver has a request to domain_alloc, it will allocate
> > > both kernel(iommu_domain) and hypervisor domains(kvm_hyp_iommu_domain).
> > >
> > > - In __pkvm_host_iommu_alloc_domain, it will take over the hyp struct
> > > from the kernel (via donation).
>
> That also requires an entire page for each domain no?  I guess this domain
> table would only be worse in memory use if we have fewer than 2 domains,
> since it costs one page for the root table, and then stores 256 domains
> per leaf page.

Yes, that would require a page for a domain also, which is inefficient.

> What I've been trying to avoid with this table is introducing a malloc in
> the hypervisor, but we might have to bite the bullet eventually (although
> with a malloc, access will probably worse than O(1)).

An alternative approach,

1- At SMMU init, it will alloc va range which is not backed by any memory
(via pkvm_alloc_private_va_range) that is contiguous with the max size
of domains.
2- This will be like a large array indexed by domain id, and it would
be filled on demand
from memcache.
3-alloc_domain will make sure that the new domain_id has a page and
then any other
access from map and unmap would just index this memory.

This can save the extra page in the root table, handle_to_domain would
slightly be
more efficient.
But this can cause page faults in EL2 if domain_id was not correct (allocated in
EL2 before). So I am not sure if it is worth it.

Thanks,
Mostafa

Mostafa Saleh May 19, 2023, 3:33 p.m. UTC | #5

Hi Jean,

On Wed, Feb 01, 2023 at 12:53:03PM +0000, Jean-Philippe Brucker wrote:
> +/*
> + * Serialize access to domains and IOMMU driver internal structures (command
> + * queue, device tables)
> + */
> +static hyp_spinlock_t iommu_lock;
> +
I was looking more into this lock and I think we can make it per IOMMU instead
of having one big lock to avoid congestion, as I see it is only used to
protect per-IOMMU resources.
Some special handling needed as hyp_spinlock_t is not exposed to EL1.
Maybe something like this:

diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
index b257a3b4bfc5..96d30a37f9e6 100644
--- a/arch/arm64/kvm/hyp/hyp-constants.c
+++ b/arch/arm64/kvm/hyp/hyp-constants.c
@@ -3,11 +3,13 @@
 #include <linux/kbuild.h>
 #include <nvhe/memory.h>
 #include <nvhe/pkvm.h>
+#include <nvhe/spinlock.h>

 int main(void)
 {
 	DEFINE(STRUCT_HYP_PAGE_SIZE,	sizeof(struct hyp_page));
 	DEFINE(PKVM_HYP_VM_SIZE,	sizeof(struct pkvm_hyp_vm));
 	DEFINE(PKVM_HYP_VCPU_SIZE,	sizeof(struct pkvm_hyp_vcpu));
+	DEFINE(HYP_SPINLOCK_SIZE,	sizeof(hyp_spinlock_t));
 	return 0;
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index a90b97d8bae3..cf9195e24a08 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -6,6 +6,7 @@ arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
 
 obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
+ccflags-$(CONFIG_ARM_SMMU_V3_PKVM) += -Iarch/arm64/kvm/
 arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
 arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
 arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index ab888da731bc..82827b99b1ed 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -5,6 +5,12 @@
 #include <asm/kvm_host.h>
 #include <kvm/power_domain.h>
 #include <linux/io-pgtable.h>
+#ifdef __KVM_NVHE_HYPERVISOR__
+#include <nvhe/spinlock.h>
+#else
+#include "hyp_constants.h"
+#endif
 
 /*
  * Parameters from the trusted host:
@@ -23,6 +29,11 @@ struct kvm_hyp_iommu {
 
 	struct io_pgtable_params	*pgtable;
 	bool				power_is_off;
+#ifdef __KVM_NVHE_HYPERVISOR__
+	hyp_spinlock_t			iommu_lock;
+#else
+	u8 unused[HYP_SPINLOCK_SIZE];
+#endif
 };
 
 struct kvm_hyp_iommu_memcache {
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index 1f4d5fcc1386..afaf173e65ed 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -14,12 +14,6 @@
 
 struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
 
-/*
- * Serialize access to domains and IOMMU driver internal structures (command
- * queue, device tables)
- */
-static hyp_spinlock_t iommu_lock;
-
 #define domain_to_iopt(_iommu, _domain, _domain_id)		\
 	(struct io_pgtable) {					\
 		.ops = &(_iommu)->pgtable->ops,			\
@@ -93,10 +87,10 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 {
 	int ret = -EINVAL;
 	struct io_pgtable iopt;
-	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
 	struct kvm_hyp_iommu_domain *domain;
 
-	hyp_spin_lock(&iommu_lock);
+	hyp_spin_lock(&iommu->iommu_lock);
 	domain = handle_to_domain(iommu_id, domain_id, &iommu);
 	if (!domain)
 		goto out_unlock;
@@ -112,7 +106,7 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	domain->refs = 1;
 	domain->pgd = iopt.pgd;
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }
 
@@ -120,10 +114,10 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
 {
 	int ret = -EINVAL;
 	struct io_pgtable iopt;
-	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
 	struct kvm_hyp_iommu_domain *domain;
 
-	hyp_spin_lock(&iommu_lock);
+	hyp_spin_lock(&iommu->iommu_lock);
 	domain = handle_to_domain(iommu_id, domain_id, &iommu);
 	if (!domain)
 		goto out_unlock;
@@ -137,7 +131,7 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
 	memset(domain, 0, sizeof(*domain));
 
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }

Jean-Philippe Brucker June 2, 2023, 3:29 p.m. UTC | #6

Hi Mostafa,

On Fri, 19 May 2023 at 16:33, Mostafa Saleh <smostafa@google.com> wrote:
>
> Hi Jean,
>
> On Wed, Feb 01, 2023 at 12:53:03PM +0000, Jean-Philippe Brucker wrote:
> > +/*
> > + * Serialize access to domains and IOMMU driver internal structures (command
> > + * queue, device tables)
> > + */
> > +static hyp_spinlock_t iommu_lock;
> > +
> I was looking more into this lock and I think we can make it per IOMMU instead
> of having one big lock to avoid congestion, as I see it is only used to
> protect per-IOMMU resources.

Yes it's a major bottleneck, thanks for looking into this. I think we'll
eventually want to improve scalability within an IOMMU and even a domain:
a multi-queue network device will share a single domain between multiple
CPUs, each issuing lots of map/unmap calls. Or just two devices drivers
working on different CPUs and sharing one IOMMU. In those cases the
per-IOMMU lock won't be good enough and we'll be back to this problem, but
a per-IOMMU lock should already improve scalability for some systems.

Currently the global lock protects:

(1) The domain array (per IOMMU)
(2) The io-pgtables (per domain)
(3) The command queue (per SMMU)
(4) Power state (per SMMU)

I ran some experiments with refcounting the domains to avoid the lock for
(1) and (2), which improves map() scalability. See
https://jpbrucker.net/git/linux/commit/?h=pkvm/smmu-wip&id=5ad3bc6fd589b6f11fe31ccee5b8777ba8739167
(and another experiment to optimize the DMA refcount on branch pkvm/smmu-wip)

For (2), the io-pgtable-arm code already supports concurrent accesses
(2c3d273eabe8 ("iommu/io-pgtable-arm: Support lockless operation")). But
this one needs careful consideration because in the host, the io-pgtable
code trusts the device drivers. For example it expects that only buggy
drivers call map()/unmap() to the same IOVA concurrently. We need to make
sure a compromised host drivers can't exploit these assumptions to do
anything nasty.

There are options to improve (3) as well. The host SMMU driver supports
lockless command-queue (587e6c10a7ce ("iommu/arm-smmu-v3: Reduce
contention during command-queue insertion")) but it may be too complex to
put in the hypervisor (or rather, I haven't tried to understand it yet).
We could also wait for systems with ECMDQ, which enables per-CPU queues.

> Some special handling needed as hyp_spinlock_t is not exposed to EL1.
> Maybe something like this:
>
> diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
> index b257a3b4bfc5..96d30a37f9e6 100644
> --- a/arch/arm64/kvm/hyp/hyp-constants.c
> +++ b/arch/arm64/kvm/hyp/hyp-constants.c
> @@ -3,11 +3,13 @@
>  #include <linux/kbuild.h>
>  #include <nvhe/memory.h>
>  #include <nvhe/pkvm.h>
> +#include <nvhe/spinlock.h>
>
>  int main(void)
>  {
>         DEFINE(STRUCT_HYP_PAGE_SIZE,    sizeof(struct hyp_page));
>         DEFINE(PKVM_HYP_VM_SIZE,        sizeof(struct pkvm_hyp_vm));
>         DEFINE(PKVM_HYP_VCPU_SIZE,      sizeof(struct pkvm_hyp_vcpu));
> +       DEFINE(HYP_SPINLOCK_SIZE,       sizeof(hyp_spinlock_t));
>         return 0;
>  }
> diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
> index a90b97d8bae3..cf9195e24a08 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/Makefile
> +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
> @@ -6,6 +6,7 @@ arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
>  arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
>
>  obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
> +ccflags-$(CONFIG_ARM_SMMU_V3_PKVM) += -Iarch/arm64/kvm/
>  arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
>  arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
>  arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
> diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
> index ab888da731bc..82827b99b1ed 100644
> --- a/include/kvm/iommu.h
> +++ b/include/kvm/iommu.h
> @@ -5,6 +5,12 @@
>  #include <asm/kvm_host.h>
>  #include <kvm/power_domain.h>
>  #include <linux/io-pgtable.h>
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +#include <nvhe/spinlock.h>
> +#else
> +#include "hyp_constants.h"
> +#endif
>
>  /*
>   * Parameters from the trusted host:
> @@ -23,6 +29,11 @@ struct kvm_hyp_iommu {
>
>         struct io_pgtable_params        *pgtable;
>         bool                            power_is_off;
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +       hyp_spinlock_t                  iommu_lock;
> +#else
> +       u8 unused[HYP_SPINLOCK_SIZE];
> +#endif
>  };
>
>  struct kvm_hyp_iommu_memcache {
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> index 1f4d5fcc1386..afaf173e65ed 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> @@ -14,12 +14,6 @@
>
>  struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
>
> -/*
> - * Serialize access to domains and IOMMU driver internal structures (command
> - * queue, device tables)
> - */
> -static hyp_spinlock_t iommu_lock;
> -
>  #define domain_to_iopt(_iommu, _domain, _domain_id)            \
>         (struct io_pgtable) {                                   \
>                 .ops = &(_iommu)->pgtable->ops,                 \
> @@ -93,10 +87,10 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
>  {
>         int ret = -EINVAL;
>         struct io_pgtable iopt;
> -       struct kvm_hyp_iommu *iommu;
> +       struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
>         struct kvm_hyp_iommu_domain *domain;
>
> -       hyp_spin_lock(&iommu_lock);
> +       hyp_spin_lock(&iommu->iommu_lock);
>         domain = handle_to_domain(iommu_id, domain_id, &iommu);
>         if (!domain)
>                 goto out_unlock;
> @@ -112,7 +106,7 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
>         domain->refs = 1;
>         domain->pgd = iopt.pgd;
>  out_unlock:
> -       hyp_spin_unlock(&iommu_lock);
> +       hyp_spin_unlock(&iommu->iommu_lock);
>         return ret;
>  }
>
> @@ -120,10 +114,10 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
>  {
>         int ret = -EINVAL;
>         struct io_pgtable iopt;
> -       struct kvm_hyp_iommu *iommu;
> +       struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
>         struct kvm_hyp_iommu_domain *domain;
>
> -       hyp_spin_lock(&iommu_lock);
> +       hyp_spin_lock(&iommu->iommu_lock);
>         domain = handle_to_domain(iommu_id, domain_id, &iommu);
>         if (!domain)
>                 goto out_unlock;
> @@ -137,7 +131,7 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
>         memset(domain, 0, sizeof(*domain));
>
>  out_unlock:
> -       hyp_spin_unlock(&iommu_lock);
> +       hyp_spin_unlock(&iommu->iommu_lock);
>         return ret;
>  }
> --
>
> (I didn't include full patch as it is too long, but mainly the
> rest is s/&iommu_lock/&iommu->iommu_lock)
>
> Please let me know what do you think?

It makes sense, I can append it to my tree or squash it (with your S-o-B)
if you want to send the full patch

Thanks,
Jean

Mostafa Saleh June 15, 2023, 1:32 p.m. UTC | #7

Hi Jean,

On Fri, Jun 02, 2023 at 04:29:07PM +0100, Jean-Philippe Brucker wrote:
> Hi Mostafa,
> 
> On Fri, 19 May 2023 at 16:33, Mostafa Saleh <smostafa@google.com> wrote:
> >
> > Hi Jean,
> >
> > On Wed, Feb 01, 2023 at 12:53:03PM +0000, Jean-Philippe Brucker wrote:
> > > +/*
> > > + * Serialize access to domains and IOMMU driver internal structures (command
> > > + * queue, device tables)
> > > + */
> > > +static hyp_spinlock_t iommu_lock;
> > > +
> > I was looking more into this lock and I think we can make it per IOMMU instead
> > of having one big lock to avoid congestion, as I see it is only used to
> > protect per-IOMMU resources.
> 
> Yes it's a major bottleneck, thanks for looking into this. I think we'll
> eventually want to improve scalability within an IOMMU and even a domain:
> a multi-queue network device will share a single domain between multiple
> CPUs, each issuing lots of map/unmap calls. Or just two devices drivers
> working on different CPUs and sharing one IOMMU. In those cases the
> per-IOMMU lock won't be good enough and we'll be back to this problem, but
> a per-IOMMU lock should already improve scalability for some systems.
> 
> Currently the global lock protects:
> 
> (1) The domain array (per IOMMU)
> (2) The io-pgtables (per domain)
> (3) The command queue (per SMMU)
> (4) Power state (per SMMU)
> 
> I ran some experiments with refcounting the domains to avoid the lock for
> (1) and (2), which improves map() scalability. See
> https://jpbrucker.net/git/linux/commit/?h=pkvm/smmu-wip&id=5ad3bc6fd589b6f11fe31ccee5b8777ba8739167
> (and another experiment to optimize the DMA refcount on branch pkvm/smmu-wip)

Thanks for the detailed explanation! I will give it a try.

> For (2), the io-pgtable-arm code already supports concurrent accesses
> (2c3d273eabe8 ("iommu/io-pgtable-arm: Support lockless operation")). But
> this one needs careful consideration because in the host, the io-pgtable
> code trusts the device drivers. For example it expects that only buggy
> drivers call map()/unmap() to the same IOVA concurrently. We need to make
> sure a compromised host drivers can't exploit these assumptions to do
> anything nasty.
> 
> There are options to improve (3) as well. The host SMMU driver supports
> lockless command-queue (587e6c10a7ce ("iommu/arm-smmu-v3: Reduce
> contention during command-queue insertion")) but it may be too complex to
> put in the hypervisor (or rather, I haven't tried to understand it yet).
> We could also wait for systems with ECMDQ, which enables per-CPU queues.
I was thinking about the same thing, in the futrue we should aim to have a
similar lockless approach for the cmdq as the host in EL2. I am not sure
how feasable it is, I will have a closer look. But having a per-IOMMU
shouldbe be a step in the right direction.

> > Some special handling needed as hyp_spinlock_t is not exposed to EL1.
> > Maybe something like this:
> >
> > diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
> > index b257a3b4bfc5..96d30a37f9e6 100644
> > --- a/arch/arm64/kvm/hyp/hyp-constants.c
> > +++ b/arch/arm64/kvm/hyp/hyp-constants.c
> > @@ -3,11 +3,13 @@
> >  #include <linux/kbuild.h>
> >  #include <nvhe/memory.h>
> >  #include <nvhe/pkvm.h>
> > +#include <nvhe/spinlock.h>
> >
> >  int main(void)
> >  {
> >         DEFINE(STRUCT_HYP_PAGE_SIZE,    sizeof(struct hyp_page));
> >         DEFINE(PKVM_HYP_VM_SIZE,        sizeof(struct pkvm_hyp_vm));
> >         DEFINE(PKVM_HYP_VCPU_SIZE,      sizeof(struct pkvm_hyp_vcpu));
> > +       DEFINE(HYP_SPINLOCK_SIZE,       sizeof(hyp_spinlock_t));
> >         return 0;
> >  }
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
> > index a90b97d8bae3..cf9195e24a08 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/Makefile
> > +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
> > @@ -6,6 +6,7 @@ arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
> >  arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
> >
> >  obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
> > +ccflags-$(CONFIG_ARM_SMMU_V3_PKVM) += -Iarch/arm64/kvm/
> >  arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
> >  arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
> >  arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
> > diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
> > index ab888da731bc..82827b99b1ed 100644
> > --- a/include/kvm/iommu.h
> > +++ b/include/kvm/iommu.h
> > @@ -5,6 +5,12 @@
> >  #include <asm/kvm_host.h>
> >  #include <kvm/power_domain.h>
> >  #include <linux/io-pgtable.h>
> > +#ifdef __KVM_NVHE_HYPERVISOR__
> > +#include <nvhe/spinlock.h>
> > +#else
> > +#include "hyp_constants.h"
> > +#endif
> >
> >  /*
> >   * Parameters from the trusted host:
> > @@ -23,6 +29,11 @@ struct kvm_hyp_iommu {
> >
> >         struct io_pgtable_params        *pgtable;
> >         bool                            power_is_off;
> > +#ifdef __KVM_NVHE_HYPERVISOR__
> > +       hyp_spinlock_t                  iommu_lock;
> > +#else
> > +       u8 unused[HYP_SPINLOCK_SIZE];
> > +#endif
> >  };
> >
> >  struct kvm_hyp_iommu_memcache {
> > diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > index 1f4d5fcc1386..afaf173e65ed 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > @@ -14,12 +14,6 @@
> >
> >  struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
> >
> > -/*
> > - * Serialize access to domains and IOMMU driver internal structures (command
> > - * queue, device tables)
> > - */
> > -static hyp_spinlock_t iommu_lock;
> > -
> >  #define domain_to_iopt(_iommu, _domain, _domain_id)            \
> >         (struct io_pgtable) {                                   \
> >                 .ops = &(_iommu)->pgtable->ops,                 \
> > @@ -93,10 +87,10 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
> >  {
> >         int ret = -EINVAL;
> >         struct io_pgtable iopt;
> > -       struct kvm_hyp_iommu *iommu;
> > +       struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
> >         struct kvm_hyp_iommu_domain *domain;
> >
> > -       hyp_spin_lock(&iommu_lock);
> > +       hyp_spin_lock(&iommu->iommu_lock);
> >         domain = handle_to_domain(iommu_id, domain_id, &iommu);
> >         if (!domain)
> >                 goto out_unlock;
> > @@ -112,7 +106,7 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
> >         domain->refs = 1;
> >         domain->pgd = iopt.pgd;
> >  out_unlock:
> > -       hyp_spin_unlock(&iommu_lock);
> > +       hyp_spin_unlock(&iommu->iommu_lock);
> >         return ret;
> >  }
> >
> > @@ -120,10 +114,10 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
> >  {
> >         int ret = -EINVAL;
> >         struct io_pgtable iopt;
> > -       struct kvm_hyp_iommu *iommu;
> > +       struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
> >         struct kvm_hyp_iommu_domain *domain;
> >
> > -       hyp_spin_lock(&iommu_lock);
> > +       hyp_spin_lock(&iommu->iommu_lock);
> >         domain = handle_to_domain(iommu_id, domain_id, &iommu);
> >         if (!domain)
> >                 goto out_unlock;
> > @@ -137,7 +131,7 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
> >         memset(domain, 0, sizeof(*domain));
> >
> >  out_unlock:
> > -       hyp_spin_unlock(&iommu_lock);
> > +       hyp_spin_unlock(&iommu->iommu_lock);
> >         return ret;
> >  }
> > --
> >
> > (I didn't include full patch as it is too long, but mainly the
> > rest is s/&iommu_lock/&iommu->iommu_lock)
> >
> > Please let me know what do you think?
> 
> It makes sense, I can append it to my tree or squash it (with your S-o-B)
> if you want to send the full patch
Sure, that is the full patch based on latest
https://jpbrucker.net/git/linux/log/?h=pkvm/smmu

Thanks,
Mostafa

--

From 5b43b97ca2486d52aa2ad475e4a34357dea57bca Mon Sep 17 00:00:00 2001
From: Mostafa Saleh <smostafa@google.com>
Date: Tue, 13 Jun 2023 14:59:37 +0000
Subject: [PATCH] KVM: arm: iommu: Use per-iommu lock instead of one global
 lock

Currently, there is one big lock for all of the IOMMU operations
which is used to protect per-IOMMU resources or domains (which are
currently per-IOMMU also).

This can be improved for setups with many IOMMUs by having a per
IOMMU lock.

As now we always get the iommu before calling handle_to_domain, we
can remove the out_iommu arg and iommu_id and pass iommu instead.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 arch/arm64/kvm/hyp/hyp-constants.c     |  1 +
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c  | 75 +++++++++++++-------------
 drivers/iommu/arm/arm-smmu-v3/Makefile |  1 +
 include/kvm/iommu.h                    | 10 ++++
 4 files changed, 48 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
index b257a3b4bfc5..d84cc8ec1e46 100644
--- a/arch/arm64/kvm/hyp/hyp-constants.c
+++ b/arch/arm64/kvm/hyp/hyp-constants.c
@@ -9,5 +9,6 @@ int main(void)
 	DEFINE(STRUCT_HYP_PAGE_SIZE,	sizeof(struct hyp_page));
 	DEFINE(PKVM_HYP_VM_SIZE,	sizeof(struct pkvm_hyp_vm));
 	DEFINE(PKVM_HYP_VCPU_SIZE,	sizeof(struct pkvm_hyp_vcpu));
+	DEFINE(HYP_SPINLOCK_SIZE,       sizeof(hyp_spinlock_t));
 	return 0;
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index c70d3b6934eb..1dcba39d1d41 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -14,12 +14,6 @@

 struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;

-/*
- * Serialize access to domains and IOMMU driver internal structures (command
- * queue, device tables)
- */
-static hyp_spinlock_t iommu_lock;
-
 #define domain_to_iopt(_iommu, _domain, _domain_id)		\
 	(struct io_pgtable) {					\
 		.ops = &(_iommu)->pgtable->ops,			\
@@ -59,14 +53,11 @@ void kvm_iommu_reclaim_page(void *p)
 }

 static struct kvm_hyp_iommu_domain *
-handle_to_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
-		 struct kvm_hyp_iommu **out_iommu)
+handle_to_domain(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id)
 {
 	int idx;
-	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domains;

-	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
 	if (!iommu)
 		return NULL;

@@ -83,7 +74,6 @@ handle_to_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 		iommu->domains[idx] = domains;
 	}

-	*out_iommu = iommu;
 	return &domains[domain_id & KVM_IOMMU_DOMAIN_ID_LEAF_MASK];
 }

@@ -95,8 +85,9 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domain;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain)
 		goto out_unlock;

@@ -111,7 +102,7 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	domain->refs = 1;
 	domain->pgd = iopt.pgd;
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }

@@ -122,8 +113,9 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
 	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domain;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain)
 		goto out_unlock;

@@ -136,7 +128,7 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
 	memset(domain, 0, sizeof(*domain));

 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }

@@ -147,8 +139,9 @@ int kvm_iommu_attach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domain;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain || !domain->refs || domain->refs == UINT_MAX)
 		goto out_unlock;

@@ -158,7 +151,7 @@ int kvm_iommu_attach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,

 	domain->refs++;
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }

@@ -169,8 +162,9 @@ int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domain;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain || domain->refs <= 1)
 		goto out_unlock;

@@ -180,7 +174,7 @@ int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,

 	domain->refs--;
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }

@@ -209,9 +203,10 @@ int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	    iova + size < iova || paddr + size < paddr)
 		return -EOVERFLOW;

-	hyp_spin_lock(&iommu_lock);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);

-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain)
 		goto err_unlock;

@@ -236,7 +231,7 @@ int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 		paddr += mapped;
 	}

-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return 0;

 err_unmap:
@@ -245,7 +240,7 @@ int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 		iopt_unmap_pages(&iopt, iova_orig, pgsize, pgcount, NULL);
 	__pkvm_host_unshare_dma(paddr_orig, size);
 err_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }

@@ -269,8 +264,9 @@ size_t kvm_iommu_unmap_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	    iova + size < iova)
 		return 0;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain)
 		goto out_unlock;

@@ -299,7 +295,7 @@ size_t kvm_iommu_unmap_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	}

 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return total_unmapped;
 }

@@ -311,14 +307,15 @@ phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
 	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domain;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (domain) {
 		iopt = domain_to_iopt(iommu, domain, domain_id);

 		phys = iopt_iova_to_phys(&iopt, iova);
 	}
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return phys;
 }

@@ -333,9 +330,9 @@ static int iommu_power_on(struct kvm_power_domain *pd)
 	 * We currently assume that the device retains its architectural state
 	 * across power off, hence no save/restore.
 	 */
-	hyp_spin_lock(&iommu_lock);
+	hyp_spin_lock(&iommu->iommu_lock);
 	iommu->power_is_off = false;
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return 0;
 }

@@ -346,9 +343,9 @@ static int iommu_power_off(struct kvm_power_domain *pd)

 	pkvm_debug("%s\n", __func__);

-	hyp_spin_lock(&iommu_lock);
+	hyp_spin_lock(&iommu->iommu_lock);
 	iommu->power_is_off = true;
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return 0;
 }

@@ -362,6 +359,8 @@ int kvm_iommu_init_device(struct kvm_hyp_iommu *iommu)
 	int ret;
 	void *domains;

+	hyp_spin_lock_init(&iommu->iommu_lock);
+
 	ret = pkvm_init_power_domain(&iommu->power_domain, &iommu_power_ops);
 	if (ret)
 		return ret;
@@ -376,8 +375,6 @@ int kvm_iommu_init(void)
 {
 	enum kvm_pgtable_prot prot;

-	hyp_spin_lock_init(&iommu_lock);
-
 	if (WARN_ON(!kvm_iommu_ops.get_iommu_by_id ||
 		    !kvm_iommu_ops.alloc_iopt ||
 		    !kvm_iommu_ops.free_iopt ||
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index a90b97d8bae3..cf9195e24a08 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -6,6 +6,7 @@ arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)

 obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
+ccflags-$(CONFIG_ARM_SMMU_V3_PKVM) += -Iarch/arm64/kvm/
 arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
 arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
 arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index ab888da731bc..b7cda7540156 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -5,6 +5,11 @@
 #include <asm/kvm_host.h>
 #include <kvm/power_domain.h>
 #include <linux/io-pgtable.h>
+#ifdef __KVM_NVHE_HYPERVISOR__
+#include <nvhe/spinlock.h>
+#else
+#include "hyp_constants.h"
+#endif

 /*
  * Parameters from the trusted host:
@@ -23,6 +28,11 @@ struct kvm_hyp_iommu {

 	struct io_pgtable_params	*pgtable;
 	bool				power_is_off;
+#ifdef __KVM_NVHE_HYPERVISOR__
+	hyp_spinlock_t			iommu_lock;
+#else
+	u8 unused[HYP_SPINLOCK_SIZE];
+#endif
 };

 struct kvm_hyp_iommu_memcache {
--
2.41.0.162.gfafddb0af9-goog

[RFC,19/45] KVM: arm64: iommu: Add domains

Commit Message

Comments

Patch