diff mbox series

[v5,04/17] iommu/arm-smmu-v3: Move the STE generation for S1 and S2 domains into functions

Message ID 4-v5-cd1be8dd9c71+3fa-smmuv3_newapi_p1_jgg@nvidia.com (mailing list archive)
State New, archived
Series [v5,01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers

Commit Message

Jason Gunthorpe Feb. 6, 2024, 3:12 p.m. UTC
This is preparation to move the STE calculation higher up into the call
chain and remove arm_smmu_write_strtab_ent(). These new functions will be
called directly from attach_dev.

Reviewed-by: Moritz Fischer <mdf@kernel.org>
Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 136 ++++++++++++--------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   1 +
 2 files changed, 84 insertions(+), 53 deletions(-)

Comments

Jason Gunthorpe Feb. 16, 2024, 5:12 p.m. UTC | #1
On Tue, Feb 06, 2024 at 11:12:41AM -0400, Jason Gunthorpe wrote:
> +static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> +					struct arm_smmu_master *master,
> +					struct arm_smmu_domain *smmu_domain)
> +{
> +	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
> +
> +	memset(target, 0, sizeof(*target));
> +	target->data[0] = cpu_to_le64(
> +		STRTAB_STE_0_V |
> +		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
> +
> +	target->data[1] = cpu_to_le64(
> +		FIELD_PREP(STRTAB_STE_1_EATS,
> +			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
> +		FIELD_PREP(STRTAB_STE_1_SHCFG,
> +			   STRTAB_STE_1_SHCFG_NON_SHARABLE));

Just so we are on the same page.. The above NON_SHARABLE is a mistake
here since v1.

It is hard to follow arm_smmu_write_strtab_ent() so we all missed that
the S2 ends up re-using the qword[1] that was installed by the
bypass/abort STE that has to be in place prior to installing the S2.

Only the S1 path sets SHCFG to 0 because the HW doesn't use it due to
the current driver not using S1DSS.

Jason
Will Deacon Feb. 16, 2024, 5:39 p.m. UTC | #2
On Fri, Feb 16, 2024 at 01:12:17PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 06, 2024 at 11:12:41AM -0400, Jason Gunthorpe wrote:
> > +static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> > +					struct arm_smmu_master *master,
> > +					struct arm_smmu_domain *smmu_domain)
> > +{
> > +	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
> > +
> > +	memset(target, 0, sizeof(*target));
> > +	target->data[0] = cpu_to_le64(
> > +		STRTAB_STE_0_V |
> > +		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
> > +
> > +	target->data[1] = cpu_to_le64(
> > +		FIELD_PREP(STRTAB_STE_1_EATS,
> > +			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
> > +		FIELD_PREP(STRTAB_STE_1_SHCFG,
> > +			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
> 
> Just so we are on the same page.. The above NON_SHARABLE is a mistake
> here since v1.
> 
> It is hard to follow arm_smmu_write_strtab_ent() so we all missed that
> the S2 ends up re-using the qword[1] that was installed by the
> bypass/abort STE that has to be in place prior to installing the S2.

Ah! I thought you were inheriting the existing behaviour, but yeah, it's
a straight-up bug which I think just makes life a little more difficult
than it needs to be. If we can keep SHCFG as "use incoming" in all
configurations, then I do think we can move to a per-qword rather than a
per-field approach, as mentioned in the other part of the thread. I'll
try to make some time next week to play with it.

Will
Jason Gunthorpe Feb. 16, 2024, 5:58 p.m. UTC | #3
On Fri, Feb 16, 2024 at 05:39:22PM +0000, Will Deacon wrote:
> On Fri, Feb 16, 2024 at 01:12:17PM -0400, Jason Gunthorpe wrote:
> > On Tue, Feb 06, 2024 at 11:12:41AM -0400, Jason Gunthorpe wrote:
> > > +static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> > > +					struct arm_smmu_master *master,
> > > +					struct arm_smmu_domain *smmu_domain)
> > > +{
> > > +	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
> > > +
> > > +	memset(target, 0, sizeof(*target));
> > > +	target->data[0] = cpu_to_le64(
> > > +		STRTAB_STE_0_V |
> > > +		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
> > > +
> > > +	target->data[1] = cpu_to_le64(
> > > +		FIELD_PREP(STRTAB_STE_1_EATS,
> > > +			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
> > > +		FIELD_PREP(STRTAB_STE_1_SHCFG,
> > > +			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
> > 
> > Just so we are on the same page.. The above NON_SHARABLE is a mistake
> > here since v1.
> > 
> > It is hard to follow arm_smmu_write_strtab_ent() so we all missed that
> > the S2 ends up re-using the qword[1] that was installed by the
> > bypass/abort STE that has to be in place prior to installing the S2.
> 
> Ah! I thought you were inheriting the existing behaviour, but yeah,

Yeah, so did I..

> it's a straight-up bug which I think just makes life a little more
> difficult than it needs to be. If we can keep SHCFG as "use
> incoming" in all configurations, then I do think we can move to a
> per-qword rather than a per-field approach, as mentioned in the
> other part of the thread. I'll try to make some time next week to
> play with it.

I'm sure you can make per-qword work. I think it will be worse code
though because doing so will have to compromise some of the
underpinning logical principles:

 - Used bits reflects actual HW behavior and flows from the
   spec's IGNORED/etc language
 - Make STE functions set the bits that the HW uses and no extra bits
 - The make STE functions create a complete STE

I already tried the naive version where none of the above are
compromised and it does not work. Someone else may have an idea. IMHO
this is really not a valuable avenue to use all of our limited time
on.

I'm getting the existing remarks typed in, and it is already turning
into some work, but if you feel strongly please come next week with
exactly what you will accept and I will ensure whatever it is gets done
if you will commit to merge it. Even if we have to toss out Michael's
version too.

I can always bring back this version for some future shared code
thing.

Thanks,
Jason

Patch

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2ab36dcf7c61f5..893df3e76400ec 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1518,13 +1518,89 @@  static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target)
 		FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
 }
 
+static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
+				      struct arm_smmu_master *master)
+{
+	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
+	struct arm_smmu_device *smmu = master->smmu;
+
+	memset(target, 0, sizeof(*target));
+	target->data[0] = cpu_to_le64(
+		STRTAB_STE_0_V |
+		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
+		FIELD_PREP(STRTAB_STE_0_S1FMT, cd_table->s1fmt) |
+		(cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
+		FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax));
+
+	target->data[1] = cpu_to_le64(
+		FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
+		FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
+		FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
+		FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
+		((smmu->features & ARM_SMMU_FEAT_STALLS &&
+		  !master->stall_enabled) ?
+			 STRTAB_STE_1_S1STALLD :
+			 0) |
+		FIELD_PREP(STRTAB_STE_1_EATS,
+			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
+
+	if (smmu->features & ARM_SMMU_FEAT_E2H) {
+		/*
+		 * To support BTM the streamworld needs to match the
+		 * configuration of the CPU so that the ASID broadcasts are
+		 * properly matched. This means either S/NS-EL2-E2H (hypervisor)
+		 * or NS-EL1 (guest). Since an SVA domain can be installed in a
+		 * PASID this should always use a BTM compatible configuration
+		 * if the HW supports it.
+		 */
+		target->data[1] |= cpu_to_le64(
+			FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_EL2));
+	} else {
+		target->data[1] |= cpu_to_le64(
+			FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_NSEL1));
+
+		/*
+		 * VMID 0 is reserved for stage-2 bypass EL1 STEs, see
+		 * arm_smmu_domain_alloc_id()
+		 */
+		target->data[2] =
+			cpu_to_le64(FIELD_PREP(STRTAB_STE_2_S2VMID, 0));
+	}
+}
+
+static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
+					struct arm_smmu_master *master,
+					struct arm_smmu_domain *smmu_domain)
+{
+	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
+
+	memset(target, 0, sizeof(*target));
+	target->data[0] = cpu_to_le64(
+		STRTAB_STE_0_V |
+		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
+
+	target->data[1] = cpu_to_le64(
+		FIELD_PREP(STRTAB_STE_1_EATS,
+			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
+		FIELD_PREP(STRTAB_STE_1_SHCFG,
+			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
+
+	target->data[2] = cpu_to_le64(
+		FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
+		FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
+		STRTAB_STE_2_S2AA64 |
+#ifdef __BIG_ENDIAN
+		STRTAB_STE_2_S2ENDI |
+#endif
+		STRTAB_STE_2_S2PTW |
+		STRTAB_STE_2_S2R);
+
+	target->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
+}
+
 static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 				      struct arm_smmu_ste *dst)
 {
-	u64 val;
-	struct arm_smmu_device *smmu = master->smmu;
-	struct arm_smmu_ctx_desc_cfg *cd_table = NULL;
-	struct arm_smmu_s2_cfg *s2_cfg = NULL;
 	struct arm_smmu_domain *smmu_domain = master->domain;
 	struct arm_smmu_ste target = {};
 
@@ -1539,61 +1615,15 @@  static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 
 	switch (smmu_domain->stage) {
 	case ARM_SMMU_DOMAIN_S1:
-		cd_table = &master->cd_table;
+		arm_smmu_make_cdtable_ste(&target, master);
 		break;
 	case ARM_SMMU_DOMAIN_S2:
-		s2_cfg = &smmu_domain->s2_cfg;
+		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
 		break;
 	case ARM_SMMU_DOMAIN_BYPASS:
 		arm_smmu_make_bypass_ste(&target);
-		arm_smmu_write_ste(master, sid, dst, &target);
-		return;
+		break;
 	}
-
-	/* Nuke the existing STE_0 value, as we're going to rewrite it */
-	val = STRTAB_STE_0_V;
-
-	if (cd_table) {
-		u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ?
-			STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
-
-		target.data[1] = cpu_to_le64(
-			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
-			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
-			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
-			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
-			 FIELD_PREP(STRTAB_STE_1_STRW, strw));
-
-		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
-		    !master->stall_enabled)
-			target.data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
-
-		val |= (cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
-			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
-			FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax) |
-			FIELD_PREP(STRTAB_STE_0_S1FMT, cd_table->s1fmt);
-	}
-
-	if (s2_cfg) {
-		target.data[2] = cpu_to_le64(
-			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
-			 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
-#ifdef __BIG_ENDIAN
-			 STRTAB_STE_2_S2ENDI |
-#endif
-			 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
-			 STRTAB_STE_2_S2R);
-
-		target.data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
-
-		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
-	}
-
-	if (master->ats_enabled)
-		target.data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
-						 STRTAB_STE_1_EATS_TRANS));
-
-	target.data[0] = cpu_to_le64(val);
 	arm_smmu_write_ste(master, sid, dst, &target);
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 65fb388d51734d..53695dbc9b33f3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -249,6 +249,7 @@  struct arm_smmu_ste {
 #define STRTAB_STE_1_STRW_EL2		2UL
 
 #define STRTAB_STE_1_SHCFG		GENMASK_ULL(45, 44)
+#define STRTAB_STE_1_SHCFG_NON_SHARABLE	0UL
 #define STRTAB_STE_1_SHCFG_INCOMING	1UL
 
 #define STRTAB_STE_2_S2VMID		GENMASK_ULL(15, 0)