Message ID: 4-v5-cd1be8dd9c71+3fa-smmuv3_newapi_p1_jgg@nvidia.com (mailing list archive)
State: New, archived
Series: [v5,01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
On Tue, Feb 06, 2024 at 11:12:41AM -0400, Jason Gunthorpe wrote:
> +static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> +					struct arm_smmu_master *master,
> +					struct arm_smmu_domain *smmu_domain)
> +{
> +	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
> +
> +	memset(target, 0, sizeof(*target));
> +	target->data[0] = cpu_to_le64(
> +		STRTAB_STE_0_V |
> +		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
> +
> +	target->data[1] = cpu_to_le64(
> +		FIELD_PREP(STRTAB_STE_1_EATS,
> +			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
> +		FIELD_PREP(STRTAB_STE_1_SHCFG,
> +			   STRTAB_STE_1_SHCFG_NON_SHARABLE));

Just so we are on the same page.. The above NON_SHARABLE is a mistake
here since v1.

It is hard to follow arm_smmu_write_strtab_ent() so we all missed that
the S2 ends up re-using the qword[1] that was installed by the
bypass/abort STE that has to be in place prior to installing the S2.

Only the S1 path sets SHCFG to 0, because the HW doesn't use it there
due to the current driver not using S1DSS.

Jason
On Fri, Feb 16, 2024 at 01:12:17PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 06, 2024 at 11:12:41AM -0400, Jason Gunthorpe wrote:
> > +static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> > +					struct arm_smmu_master *master,
> > +					struct arm_smmu_domain *smmu_domain)
> > +{
> > +	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
> > +
> > +	memset(target, 0, sizeof(*target));
> > +	target->data[0] = cpu_to_le64(
> > +		STRTAB_STE_0_V |
> > +		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
> > +
> > +	target->data[1] = cpu_to_le64(
> > +		FIELD_PREP(STRTAB_STE_1_EATS,
> > +			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
> > +		FIELD_PREP(STRTAB_STE_1_SHCFG,
> > +			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
>
> Just so we are on the same page.. The above NON_SHARABLE is a mistake
> here since v1.
>
> It is hard to follow arm_smmu_write_strtab_ent() so we all missed that
> the S2 ends up re-using the qword[1] that was installed by the
> bypass/abort STE that has to be in place prior to installing the S2.

Ah! I thought you were inheriting the existing behaviour, but yeah, it's
a straight-up bug which I think just makes life a little more difficult
than it needs to be. If we can keep SHCFG as "use incoming" in all
configurations, then I do think we can move to a per-qword rather than a
per-field approach, as mentioned in the other part of the thread. I'll
try to make some time next week to play with it.

Will
On Fri, Feb 16, 2024 at 05:39:22PM +0000, Will Deacon wrote:
> On Fri, Feb 16, 2024 at 01:12:17PM -0400, Jason Gunthorpe wrote:
> > On Tue, Feb 06, 2024 at 11:12:41AM -0400, Jason Gunthorpe wrote:
> > > +static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> > > +					struct arm_smmu_master *master,
> > > +					struct arm_smmu_domain *smmu_domain)
> > > +{
> > > +	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
> > > +
> > > +	memset(target, 0, sizeof(*target));
> > > +	target->data[0] = cpu_to_le64(
> > > +		STRTAB_STE_0_V |
> > > +		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
> > > +
> > > +	target->data[1] = cpu_to_le64(
> > > +		FIELD_PREP(STRTAB_STE_1_EATS,
> > > +			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
> > > +		FIELD_PREP(STRTAB_STE_1_SHCFG,
> > > +			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
> >
> > Just so we are on the same page.. The above NON_SHARABLE is a mistake
> > here since v1.
> >
> > It is hard to follow arm_smmu_write_strtab_ent() so we all missed that
> > the S2 ends up re-using the qword[1] that was installed by the
> > bypass/abort STE that has to be in place prior to installing the S2.
>
> Ah! I thought you were inheriting the existing behaviour, but yeah,

Yeah, so did I..

> it's a straight-up bug which I think just makes life a little more
> difficult than it needs to be. If we can keep SHCFG as "use
> incoming" in all configurations, then I do think we can move to a
> per-qword rather than a per-field approach, as mentioned in the
> other part of the thread. I'll try to make some time next week to
> play with it.

I'm sure you can make per-qword work. I think it will be worse code,
though, because doing so will have to compromise some of the
underpinning logical principles:

 - The used bits reflect actual HW behavior and flow from the spec's
   IGNORED/etc language
 - The make-STE functions set the bits that the HW uses, and no extra
   bits
 - The make-STE functions create a complete STE

I already tried the naive version where none of the above are
compromised, and it does not work. Someone else may have an idea.

IMHO this is really not a valuable avenue to spend our limited time on.
I'm getting the existing remarks typed in, and it is already turning
into some work, but if you feel strongly please come next week with
exactly what you will accept, and I will ensure whatever it is gets done
if you will commit to merge it. Even if we have to toss out Michael's
version too. I can always bring back this version for some future
shared-code thing.

Thanks,
Jason
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2ab36dcf7c61f5..893df3e76400ec 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1518,13 +1518,89 @@ static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target)
 			 FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
 }
 
+static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
+				      struct arm_smmu_master *master)
+{
+	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
+	struct arm_smmu_device *smmu = master->smmu;
+
+	memset(target, 0, sizeof(*target));
+	target->data[0] = cpu_to_le64(
+		STRTAB_STE_0_V |
+		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
+		FIELD_PREP(STRTAB_STE_0_S1FMT, cd_table->s1fmt) |
+		(cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
+		FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax));
+
+	target->data[1] = cpu_to_le64(
+		FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
+		FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
+		FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
+		FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
+		((smmu->features & ARM_SMMU_FEAT_STALLS &&
+		  !master->stall_enabled) ?
+			 STRTAB_STE_1_S1STALLD :
+			 0) |
+		FIELD_PREP(STRTAB_STE_1_EATS,
+			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
+
+	if (smmu->features & ARM_SMMU_FEAT_E2H) {
+		/*
+		 * To support BTM the streamworld needs to match the
+		 * configuration of the CPU so that the ASID broadcasts are
+		 * properly matched. This means either S/NS-EL2-E2H (hypervisor)
+		 * or NS-EL1 (guest). Since an SVA domain can be installed in a
+		 * PASID this should always use a BTM compatible configuration
+		 * if the HW supports it.
+		 */
+		target->data[1] |= cpu_to_le64(
+			FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_EL2));
+	} else {
+		target->data[1] |= cpu_to_le64(
+			FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_NSEL1));
+
+		/*
+		 * VMID 0 is reserved for stage-2 bypass EL1 STEs, see
+		 * arm_smmu_domain_alloc_id()
+		 */
+		target->data[2] =
+			cpu_to_le64(FIELD_PREP(STRTAB_STE_2_S2VMID, 0));
+	}
+}
+
+static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
+					struct arm_smmu_master *master,
+					struct arm_smmu_domain *smmu_domain)
+{
+	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
+
+	memset(target, 0, sizeof(*target));
+	target->data[0] = cpu_to_le64(
+		STRTAB_STE_0_V |
+		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
+
+	target->data[1] = cpu_to_le64(
+		FIELD_PREP(STRTAB_STE_1_EATS,
+			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
+		FIELD_PREP(STRTAB_STE_1_SHCFG,
+			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
+
+	target->data[2] = cpu_to_le64(
+		FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
+		FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
+		STRTAB_STE_2_S2AA64 |
+#ifdef __BIG_ENDIAN
+		STRTAB_STE_2_S2ENDI |
+#endif
+		STRTAB_STE_2_S2PTW |
+		STRTAB_STE_2_S2R);
+
+	target->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
+}
+
 static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 				      struct arm_smmu_ste *dst)
 {
-	u64 val;
-	struct arm_smmu_device *smmu = master->smmu;
-	struct arm_smmu_ctx_desc_cfg *cd_table = NULL;
-	struct arm_smmu_s2_cfg *s2_cfg = NULL;
 	struct arm_smmu_domain *smmu_domain = master->domain;
 	struct arm_smmu_ste target = {};
 
@@ -1539,61 +1615,15 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 
 	switch (smmu_domain->stage) {
 	case ARM_SMMU_DOMAIN_S1:
-		cd_table = &master->cd_table;
+		arm_smmu_make_cdtable_ste(&target, master);
 		break;
 	case ARM_SMMU_DOMAIN_S2:
-		s2_cfg = &smmu_domain->s2_cfg;
+		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
 		break;
 	case ARM_SMMU_DOMAIN_BYPASS:
 		arm_smmu_make_bypass_ste(&target);
-		arm_smmu_write_ste(master, sid, dst, &target);
-		return;
+		break;
 	}
-
-	/* Nuke the existing STE_0 value, as we're going to rewrite it */
-	val = STRTAB_STE_0_V;
-
-	if (cd_table) {
-		u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ?
-				   STRTAB_STE_1_STRW_EL2 :
-				   STRTAB_STE_1_STRW_NSEL1;
-
-		target.data[1] = cpu_to_le64(
-			FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
-			FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
-			FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
-			FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
-			FIELD_PREP(STRTAB_STE_1_STRW, strw));
-
-		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
-		    !master->stall_enabled)
-			target.data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
-
-		val |= (cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
-			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
-			FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax) |
-			FIELD_PREP(STRTAB_STE_0_S1FMT, cd_table->s1fmt);
-	}
-
-	if (s2_cfg) {
-		target.data[2] = cpu_to_le64(
-			FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
-			FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
-#ifdef __BIG_ENDIAN
-			STRTAB_STE_2_S2ENDI |
-#endif
-			STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
-			STRTAB_STE_2_S2R);
-
-		target.data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
-
-		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
-	}
-
-	if (master->ats_enabled)
-		target.data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
-							 STRTAB_STE_1_EATS_TRANS));
-
-	target.data[0] = cpu_to_le64(val);
 
 	arm_smmu_write_ste(master, sid, dst, &target);
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 65fb388d51734d..53695dbc9b33f3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -249,6 +249,7 @@ struct arm_smmu_ste {
 #define STRTAB_STE_1_STRW_EL2		2UL
 
 #define STRTAB_STE_1_SHCFG		GENMASK_ULL(45, 44)
+#define STRTAB_STE_1_SHCFG_NON_SHARABLE	0UL
 #define STRTAB_STE_1_SHCFG_INCOMING	1UL
 
 #define STRTAB_STE_2_S2VMID		GENMASK_ULL(15, 0)