diff mbox series

[v7,13/24] iommu/arm-smmu-v3: Enable broadcast TLB maintenance

Message ID 20200519175502.2504091-14-jean-philippe@linaro.org (mailing list archive)
State Not Applicable, archived
Series iommu: Shared Virtual Addressing for SMMUv3

Commit Message

Jean-Philippe Brucker May 19, 2020, 5:54 p.m. UTC
The SMMUv3 can handle invalidation targeted at TLB entries with shared
ASIDs. If the implementation supports broadcast TLB maintenance, enable it
and keep track of it in a feature bit. The SMMU will then be affected by
inner-shareable TLB invalidations from other agents.

A major side-effect of this change is that stage-2 translation contexts
are now affected by all invalidations by VMID. VMIDs are all shared and
the only ways to prevent over-invalidation, since the stage-2 page tables
are not shared between CPU and SMMU, are to either disable BTM or allocate
different VMIDs. This patch does not address the problem.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm-smmu-v3.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
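
For readers skimming the thread, the policy this patch introduces can be
sketched as a standalone truth table. This is only an illustration, not the
driver code: the function names are hypothetical, the IDR0 bit positions are
the ones used in the patch, and the CR2 bit positions are assumed from the
driver rather than shown in this diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* IDR0 bits as they appear in the patch. */
#define IDR0_HYP	(1u << 9)
#define IDR0_BTM	(1u << 5)

/* CR2 bits: assumed values, they are not part of this diff. */
#define CR2_PTM		(1u << 2)
#define CR2_RECINVSID	(1u << 1)
#define CR2_E2H		(1u << 0)

/*
 * Hypothetical helper: BTM is usable only if the SMMU implements it and,
 * when the CPU runs with VHE, the SMMU also supports the hypervisor
 * (and therefore E2H) translation regime.
 */
bool smmu_can_use_btm(uint32_t idr0, bool cpu_vhe)
{
	bool btm = (idr0 & IDR0_BTM) != 0;
	bool hyp = (idr0 & IDR0_HYP) != 0;

	return btm && (!cpu_vhe || hyp);
}

/* Hypothetical helper: the CR2 value the reset path would program. */
uint32_t smmu_cr2_value(uint32_t idr0, bool cpu_vhe)
{
	uint32_t reg = CR2_RECINVSID;

	if ((idr0 & IDR0_HYP) && cpu_vhe)
		reg |= CR2_E2H;

	/* PTM makes the SMMU ignore broadcast TLB maintenance. */
	if (!smmu_can_use_btm(idr0, cpu_vhe))
		reg |= CR2_PTM;

	return reg;
}
```

The case the new comment in arm_smmu_device_hw_probe() guards against is
smmu_can_use_btm(IDR0_BTM, true): the SMMU has BTM but no HYP support while
the CPU uses VHE, so PTM stays set.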

Comments

Will Deacon May 21, 2020, 2:17 p.m. UTC | #1
[+Marc]

On Tue, May 19, 2020 at 07:54:51PM +0200, Jean-Philippe Brucker wrote:
> The SMMUv3 can handle invalidation targeted at TLB entries with shared
> ASIDs. If the implementation supports broadcast TLB maintenance, enable it
> and keep track of it in a feature bit. The SMMU will then be affected by
> inner-shareable TLB invalidations from other agents.
> 
> A major side-effect of this change is that stage-2 translation contexts
> are now affected by all invalidations by VMID. VMIDs are all shared and
> the only ways to prevent over-invalidation, since the stage-2 page tables
> are not shared between CPU and SMMU, are to either disable BTM or allocate
> different VMIDs. This patch does not address the problem.

This sounds like a potential performance issue, particularly as we expose
stage-2 contexts via VFIO directly. Maybe we could reserve some portion of
VMID space for the SMMU? Marc, what do you reckon?

Will
Marc Zyngier May 21, 2020, 2:38 p.m. UTC | #2
On 2020-05-21 15:17, Will Deacon wrote:
> [+Marc]
> 
> On Tue, May 19, 2020 at 07:54:51PM +0200, Jean-Philippe Brucker wrote:
>> The SMMUv3 can handle invalidation targeted at TLB entries with shared
>> ASIDs. If the implementation supports broadcast TLB maintenance, enable it
>> and keep track of it in a feature bit. The SMMU will then be affected by
>> inner-shareable TLB invalidations from other agents.
>> 
>> A major side-effect of this change is that stage-2 translation contexts
>> are now affected by all invalidations by VMID. VMIDs are all shared and
>> the only ways to prevent over-invalidation, since the stage-2 page tables
>> are not shared between CPU and SMMU, are to either disable BTM or allocate
>> different VMIDs. This patch does not address the problem.
> 
> This sounds like a potential performance issue, particularly as we expose
> stage-2 contexts via VFIO directly. Maybe we could reserve some portion of
> VMID space for the SMMU? Marc, what do you reckon?

Certainly doable when we have 16-bit VMIDs. With smaller VMID spaces
(like on v8.0), this is a bit more difficult (we do have pretty large
v8.0 systems around). How many VMID bits are we talking about?

         M.
Jean-Philippe Brucker May 22, 2020, 10:17 a.m. UTC | #3
[+Eric]

On Thu, May 21, 2020 at 03:38:35PM +0100, Marc Zyngier wrote:
> On 2020-05-21 15:17, Will Deacon wrote:
> > [+Marc]
> > 
> > On Tue, May 19, 2020 at 07:54:51PM +0200, Jean-Philippe Brucker wrote:
> > > The SMMUv3 can handle invalidation targeted at TLB entries with shared
> > > ASIDs. If the implementation supports broadcast TLB maintenance, enable it
> > > and keep track of it in a feature bit. The SMMU will then be affected by
> > > inner-shareable TLB invalidations from other agents.
> > > 
> > > A major side-effect of this change is that stage-2 translation contexts
> > > are now affected by all invalidations by VMID. VMIDs are all shared and
> > > the only ways to prevent over-invalidation, since the stage-2 page tables
> > > are not shared between CPU and SMMU, are to either disable BTM or allocate
> > > different VMIDs. This patch does not address the problem.
> > 
> > This sounds like a potential performance issue, particularly as we expose
> > stage-2 contexts via VFIO directly.

Yes, it's certainly going to affect SMMU performance, though I haven't
measured it. QEMU and kvmtool currently use stage-1 translations instead
of stage-2, so it won't be a problem until they start using nested
translation (and unless the SMMU only supports stage-2).

In the coming month I'd like to have a look at coordinating VMID
allocation between KVM and SMMU, for guest SVA. If the guest wants to
share page tables with the SMMU, the SMMU has to use the same VMIDs as the
VM to receive broadcast TLBI.

Similarly to patch 06 ("arm64: mm: Pin down ASIDs for sharing mm with
devices") the SMMU would request a VMID allocated by KVM, when setting up
a nesting VFIO container. One major downside is that the VMID is pinned
and cannot be recycled on rollover while it's being used for DMA.
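
The pinning downside can be illustrated with a toy allocator. This is a
minimal sketch under loud assumptions: struct vmid_pool and every function
below are hypothetical, the pool is deliberately tiny, and none of this is
KVM's actual VMID allocator. It only shows that a pinned VMID survives a
rollover while unpinned ones are recycled.

```c
#include <stdbool.h>

#define NR_VMIDS 8	/* deliberately tiny, for illustration only */

/* Hypothetical pool, not KVM's VMID allocator. */
struct vmid_pool {
	bool in_use[NR_VMIDS];
	unsigned int pins[NR_VMIDS];	/* users doing DMA with this VMID */
};

/* Return the lowest free VMID, or -1 if the pool is exhausted. */
int vmid_alloc(struct vmid_pool *p)
{
	for (int i = 0; i < NR_VMIDS; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Take a pin on an allocated VMID, e.g. while a device uses it. */
void vmid_pin(struct vmid_pool *p, int vmid)
{
	if (vmid >= 0 && vmid < NR_VMIDS && p->in_use[vmid])
		p->pins[vmid]++;
}

/* Drop one pin, if any is held. */
void vmid_unpin(struct vmid_pool *p, int vmid)
{
	if (vmid >= 0 && vmid < NR_VMIDS && p->pins[vmid] > 0)
		p->pins[vmid]--;
}

/* Rollover: recycle every unpinned VMID and return how many were freed. */
unsigned int vmid_rollover(struct vmid_pool *p)
{
	unsigned int freed = 0;

	for (int i = 0; i < NR_VMIDS; i++) {
		if (p->in_use[i] && p->pins[i] == 0) {
			p->in_use[i] = false;
			freed++;
		}
	}
	return freed;
}
```

Every VMID that stays pinned shrinks what a rollover can reclaim, which is
why long-lived DMA mappings make this costly on small VMID spaces.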

I wonder if we could use this even when page tables aren't shared between
CPU and SMMU, to avoid splitting the VMID space.

> > Maybe we could reserve some portion of VMID space for the SMMU? Marc, what
> > do you reckon?
> 
> Certainly doable when we have 16-bit VMIDs. With smaller VMID spaces (like
> on v8.0), this is a bit more difficult (we do have pretty large v8.0 systems
> around).

It's only an issue if those systems have an SMMUv3 supporting DVM. With
any luck that doesn't exist?

> How many VMID bits are we talking about?

That's anyone's guess... One passed-through device per VM would halve the
VMID space. But the SMMU allocates one VMID for each device assigned to a
guest, not one per VM (well one per domain, or VFIO container, but I think
it boils down to one per device with QEMU). So with SR-IOV for example it
should be pretty easy to reach 256 VMIDs in the SMMU.
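
To put rough numbers on that, here is a back-of-the-envelope sketch. It
assumes the "different VMIDs" scheme, where each VM consumes one VMID on the
CPU side plus one per assigned device on the SMMU side, and it ignores any
VMIDs reserved by the hypervisor. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical estimate: how many VMs fit in a VMID space of the given
 * width if every VM needs one CPU-side VMID plus one SMMU-side VMID per
 * passed-through device, all drawn from the same space.
 */
unsigned int max_vms_with_passthrough(unsigned int vmid_bits,
				      unsigned int devices_per_vm)
{
	assert(vmid_bits <= 16);	/* architectural VMIDs are 8 or 16 bits */

	unsigned int total = 1u << vmid_bits;

	return total / (1 + devices_per_vm);
}
```

With 8-bit VMIDs and one device per VM this gives 128 VMs, half of the 256
available without passthrough. With three virtual functions per VM it drops
to 64.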

Thanks,
Jean

Patch

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7e1933e7e35f..9332253e3608 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -56,6 +56,7 @@ 
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_BTM			(1 << 5)
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF			GENMASK(3, 2)
 #define IDR0_TTF_AARCH64		2
@@ -658,6 +659,7 @@  struct arm_smmu_device {
 #define ARM_SMMU_FEAT_VAX		(1 << 14)
 #define ARM_SMMU_FEAT_RANGE_INV		(1 << 15)
 #define ARM_SMMU_FEAT_E2H		(1 << 16)
+#define ARM_SMMU_FEAT_BTM		(1 << 17)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -3819,11 +3821,14 @@  static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);
 
 	/* CR2 (random crap) */
-	reg = CR2_PTM | CR2_RECINVSID;
+	reg = CR2_RECINVSID;
 
 	if (smmu->features & ARM_SMMU_FEAT_E2H)
 		reg |= CR2_E2H;
 
+	if (!(smmu->features & ARM_SMMU_FEAT_BTM))
+		reg |= CR2_PTM;
+
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
 
 	/* Stream table */
@@ -3934,6 +3939,7 @@  static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
 	bool coherent = smmu->features & ARM_SMMU_FEAT_COHERENCY;
+	bool vhe = cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN);
 
 	/* IDR0 */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0);
@@ -3983,10 +3989,19 @@  static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	if (reg & IDR0_HYP) {
 		smmu->features |= ARM_SMMU_FEAT_HYP;
-		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+		if (vhe)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
+	/*
+	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
+	 * will create TLB entries for NH-EL1 world and will miss the
+	 * broadcasted TLB invalidations that target EL2-E2H world. Don't enable
+	 * BTM in that case.
+	 */
+	if (reg & IDR0_BTM && (!vhe || reg & IDR0_HYP))
+		smmu->features |= ARM_SMMU_FEAT_BTM;
+
 	/*
 	 * The coherency feature as set by FW is used in preference to the ID
 	 * register, but warn on mismatch.