diff mbox series

[v5,8/9] iommu/arm-smmu-v3: Skip cd sync if CD table isn't active

Message ID 20230809011204.v5.8.Idedc0f496231e2faab3df057219c5e2d937bbfe4@changeid (mailing list archive)
State New, archived
Series Refactor the SMMU's CD table ownership

Commit Message

Michael Shavit Aug. 8, 2023, 5:12 p.m. UTC
This commit explicitly keeps track of whether a CD table is installed in
an STE so that arm_smmu_sync_cd can skip the sync when unnecessary. This
was previously achieved through the domain->devices list, but we are
moving to a model where arm_smmu_sync_cd directly operates on a master
and the master's CD table instead of a domain.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Michael Shavit <mshavit@google.com>
---

Changes in v5:
- Fix an issue where cd_table.installed wasn't correctly updated.

Changes in v3:
- Flip the cd_table.installed bit back off when table is detached
- re-order the commit later in the series since flipping the installed
  bit to off isn't obvious when the cd_table is still shared by multiple
  masters.

Changes in v2:
- Store field as a bit instead of a bool. Fix comment about STE being
  live before the sync in write_ctx_desc().

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 ++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

Comments

Will Deacon Aug. 9, 2023, 1:50 p.m. UTC | #1
On Wed, Aug 09, 2023 at 01:12:04AM +0800, Michael Shavit wrote:
> This commit explicitly keeps track of whether a CD table is installed in
> an STE so that arm_smmu_sync_cd can skip the sync when unnecessary. This
> was previously achieved through the domain->devices list, but we are
> moving to a model where arm_smmu_sync_cd directly operates on a master
> and the master's CD table instead of a domain.

Why is this path worth optimising?

> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index f5ad386cc8760..488d12dd2d4aa 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -985,6 +985,9 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
>  		},
>  	};
>  
> +	if (!master->cd_table.installed)
> +		return;

Doesn't this interact badly with the sync in arm_smmu_detach_dev(), which I
think happens after zapping the STE?

>  	cmds.num = 0;
>  	for (i = 0; i < master->num_streams; i++) {
>  		cmd.cfgi.sid = master->streams[i].id;
> @@ -1091,7 +1094,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
>  		cdptr[3] = cpu_to_le64(cd->mair);
>  
>  		/*
> -		 * STE is live, and the SMMU might read dwords of this CD in any
> +		 * STE may be live, and the SMMU might read dwords of this CD in any
>  		 * order. Ensure that it observes valid values before reading
>  		 * V=1.
>  		 */

Why does this patch need to update this comment?

Will

Michael Shavit Aug. 10, 2023, 8:34 a.m. UTC | #2
On Wed, Aug 9, 2023 at 9:50 PM Will Deacon <will@kernel.org> wrote:
>
> On Wed, Aug 09, 2023 at 01:12:04AM +0800, Michael Shavit wrote:
> > This commit explicitly keeps track of whether a CD table is installed in
> > an STE so that arm_smmu_sync_cd can skip the sync when unnecessary. This
> > was previously achieved through the domain->devices list, but we are
> > moving to a model where arm_smmu_sync_cd directly operates on a master
> > and the master's CD table instead of a domain.
>
> Why is this path worth optimising?

I have no idea what the practical impact of this optimization is, but
the motivation here was to make the overall series as close to a nop
as possible. This optimization existed before but is "broken" by the
previous patch. This patch restores it.

> Doesn't this interact badly with the sync in arm_smmu_detach_dev(), which I
> think happens after zapping the STE?

The arm_smmu_write_ctx_desc call added in arm_smmu_detach_dev() was
inserted after zapping the STE precisely so that we could skip the
sync. Is there a concern that a stale CD could be used when the
CD table is re-inserted into the STE?

> >               /*
> > -              * STE is live, and the SMMU might read dwords of this CD in any
> > +              * STE may be live, and the SMMU might read dwords of this CD in any
> >                * order. Ensure that it observes valid values before reading
> >                * V=1.
> >                */
>
> Why does this patch need to update this comment?

This is a drive-by to make this comment more accurate. Note how
(before this patch series) arm_smmu_domain_finalise_s1 explicitly
mentions that it calls arm_smmu_write_ctx_desc while the STE isn't
installed yet. Yet this comment asserts the STE *is* live.
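
To make the "may be live" case concrete, here is a small userspace model of
the write sequence. It is only a sketch under stated assumptions, not the
driver code: the model_* names, the syncs counter and the bit position used
for V are hypothetical, and the two-sync sequence (payload dwords, sync,
V=1, sync) is my reading of arm_smmu_write_ctx_desc(). The installed flag
mirrors cd_table.installed from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Valid bit in dword 0 of the CD; the position is illustrative only. */
#define MODEL_CD_V (UINT64_C(1) << 31)

/* Hypothetical stand-in for a CD table holding a single descriptor. */
struct model_cd_table {
	uint64_t cd[4];      /* one context descriptor, four dwords */
	bool installed;      /* mirrors cd_table.installed from the patch */
	unsigned int syncs;  /* number of CD invalidations actually issued */
};

/* Model of arm_smmu_sync_cd(): nothing to invalidate if no STE points here. */
void model_sync_cd(struct model_cd_table *t)
{
	if (!t->installed)
		return;
	t->syncs++;
}

/* Model of the "write payload, sync, set V=1, sync" sequence. */
void model_write_ctx_desc(struct model_cd_table *t, const uint64_t payload[3])
{
	/* Payload dwords first, while V is still clear. */
	t->cd[1] = payload[0];
	t->cd[2] = payload[1];
	t->cd[3] = payload[2];

	/* STE may be live: make sure the payload is observed before V=1. */
	model_sync_cd(t);

	t->cd[0] |= MODEL_CD_V;

	/* Drop any cached copy of the CD from before it became valid. */
	model_sync_cd(t);
}
```

With installed clear, model_write_ctx_desc() issues no syncs at all; once it
is set, the same call issues two. That is the only behaviour the flag is
meant to change.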

Will Deacon Aug. 10, 2023, 4:27 p.m. UTC | #3
On Thu, Aug 10, 2023 at 04:34:39PM +0800, Michael Shavit wrote:
> On Wed, Aug 9, 2023 at 9:50 PM Will Deacon <will@kernel.org> wrote:
> >
> > On Wed, Aug 09, 2023 at 01:12:04AM +0800, Michael Shavit wrote:
> > > This commit explicitly keeps track of whether a CD table is installed in
> > > an STE so that arm_smmu_sync_cd can skip the sync when unnecessary. This
> > > was previously achieved through the domain->devices list, but we are
> > > moving to a model where arm_smmu_sync_cd directly operates on a master
> > > and the master's CD table instead of a domain.
> >
> > Why is this path worth optimising?
> 
> I have no idea what the practical impact of this optimization is, but
> the motivation here was to make the overall series as close to a nop
> as possible. This optimization existed before but is "broken" by the
> previous patch. This patch restores it.

I'm not sure it's necessary, tbh. It's not like we're calling
arm_smmu_sync_cd() all over the place -- it's used when we're actually
working with the CD.

> > Doesn't this interact badly with the sync in arm_smmu_detach_dev(), which I
> > think happens after zapping the STE?
> 
> The arm_smmu_write_ctx_desc call added in arm_smmu_detach_dev() was
> inserted after zapping the STE precisely so that we could skip the
> sync. Is there a concern that a stale CD could be used when the
> CD table is re-inserted into the STE?

Ah, sorry, I went and looked at the architecture and it says for
CMD_CFGI_STE:

  | This command invalidates all Context descriptors (including L1CD)
  | that were cached using the given StreamID.

so as long as we make the CD unreachable in the STE before the STE
invalidation (which I think we do by setting the Config field to bypass or
abort), then I agree that we don't need the subsequent CD invalidation.
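
As a rough illustration of that ordering, here is a toy userspace model. It
is a sketch only, not driver code: every model_* name is hypothetical, and
the caching behaviour is reduced to two flags. The one property it takes
from the architecture is the one quoted above, that CMD_CFGI_STE also drops
CDs cached using that StreamID.

```c
#include <stdbool.h>

/* Hypothetical STE configurations; only S1 makes the CD table reachable. */
enum model_ste_cfg { MODEL_STE_ABORT, MODEL_STE_BYPASS, MODEL_STE_S1 };

struct model_smmu {
	enum model_ste_cfg ste_mem;    /* STE as written in memory */
	enum model_ste_cfg ste_cache;  /* STE as cached by the SMMU */
	bool ste_cached;
	bool cd_cached;
};

/* Model of a transaction: fetch the STE if needed, then maybe a CD. */
void model_translate(struct model_smmu *s)
{
	if (!s->ste_cached) {
		s->ste_cache = s->ste_mem;
		s->ste_cached = true;
	}
	if (s->ste_cache == MODEL_STE_S1)
		s->cd_cached = true;
}

/* Model of CMD_CFGI_STE: drops the cached STE and the CDs cached via it. */
void model_cfgi_ste(struct model_smmu *s)
{
	s->ste_cached = false;
	s->cd_cached = false;
}

/* Detach: make the CD unreachable first, then invalidate the STE. */
void model_detach(struct model_smmu *s)
{
	s->ste_mem = MODEL_STE_ABORT;
	model_cfgi_ste(s);
	/* No CD invalidation here: nothing can re-cache the CD any more. */
}
```

In this model, traffic that arrives after model_detach() re-fetches the STE,
sees abort, and never touches the CD table again, so a later CD update has
no cached copy to chase.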

> > >               /*
> > > -              * STE is live, and the SMMU might read dwords of this CD in any
> > > +              * STE may be live, and the SMMU might read dwords of this CD in any
> > >                * order. Ensure that it observes valid values before reading
> > >                * V=1.
> > >                */
> >
> > Why does this patch need to update this comment?
> 
> This is a drive-by to make this comment more accurate. Note how
> (before this patch series) arm_smmu_domain_finalise_s1 explicitly
> mentions that it calls arm_smmu_write_ctx_desc while the STE isn't
> installed yet. Yet this comment asserts the STE *is* live.

Can you do it as its own patch then, please?

Will

Patch

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f5ad386cc8760..488d12dd2d4aa 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -985,6 +985,9 @@  static void arm_smmu_sync_cd(struct arm_smmu_master *master,
 		},
 	};
 
+	if (!master->cd_table.installed)
+		return;
+
 	cmds.num = 0;
 	for (i = 0; i < master->num_streams; i++) {
 		cmd.cfgi.sid = master->streams[i].id;
@@ -1091,7 +1094,7 @@  int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
 		cdptr[3] = cpu_to_le64(cd->mair);
 
 		/*
-		 * STE is live, and the SMMU might read dwords of this CD in any
+		 * STE may be live, and the SMMU might read dwords of this CD in any
 		 * order. Ensure that it observes valid values before reading
 		 * V=1.
 		 */
@@ -1333,6 +1336,7 @@  static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		 */
 		if (smmu)
 			arm_smmu_sync_ste_for_sid(smmu, sid);
+		master->cd_table.installed = false;
 		return;
 	}
 
@@ -1360,6 +1364,9 @@  static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 				  cd_table->l1_desc ?
 					  STRTAB_STE_0_S1FMT_64K_L2 :
 					  STRTAB_STE_0_S1FMT_LINEAR);
+		cd_table->installed = true;
+	} else {
+		master->cd_table.installed = false;
 	}
 
 	if (s2_cfg) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1f3b370257779..e76452e735a04 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -599,6 +599,8 @@  struct arm_smmu_ctx_desc_cfg {
 	u8				max_cds_bits;
 	/* Whether CD entries in this table have the stall bit set. */
 	u8				stall_enabled:1;
+	/* Whether this CD table is installed in any STE */
+	u8				installed:1;
 };
 
 struct arm_smmu_s2_cfg {