
[RFC,v1,2/8] iommu/arm-smmu-v3: Perform invalidations over installed_smmus

Message ID: 20230818021629.RFC.v1.2.I782000a264a60e00ecad1bee06fd1413685f9253@changeid (mailing list archive)
State: New, archived
Series: Install domain onto multiple smmus

Commit Message

Michael Shavit Aug. 17, 2023, 6:16 p.m. UTC
Prepare and batch invalidation commands for each SMMU that a domain is
installed onto.
Move SVA's check against the smmu's ARM_SMMU_FEAT_BTM bit into
arm_smmu_tlb_inv_range_asid so that it can be checked against each
installed SMMU.

Signed-off-by: Michael Shavit <mshavit@google.com>
---
It's not obvious to me whether skipping the tlb_inv_range_asid call when
ARM_SMMU_FEAT_BTM is set is somehow specific to SVA. Is moving the check
into arm_smmu_tlb_inv_range_asid still valid if that function were called
outside of SVA?

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  11 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 103 +++++++++++++-----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   2 +-
 3 files changed, 80 insertions(+), 36 deletions(-)

Comments

Jason Gunthorpe Aug. 17, 2023, 7:20 p.m. UTC | #1
On Fri, Aug 18, 2023 at 02:16:24AM +0800, Michael Shavit wrote:
> Prepare and batch invalidation commands for each SMMU that a domain is
> installed onto.
> Move SVA's check against the smmu's ARM_SMMU_FEAT_BTM bit into
> arm_smmu_tlb_inv_range_asid so that it can be checked against each
> installed SMMU.
> 
> Signed-off-by: Michael Shavit <mshavit@google.com>
> ---
> It's not obvious to me whether skipping the tlb_inv_range_asid call when
> ARM_SMMU_FEAT_BTM is set is somehow specific to SVA. Is moving the check
> into arm_smmu_tlb_inv_range_asid still valid if that function were called
> outside of SVA?

Logically it should be linked to SVA, and specifically to the mmu
notifier callback. The mmu notifier callback is done whenever the CPU
did an invalidation and BTM means the SMMU tracks exactly those
automatically. Thus we don't need to duplicate it. Indeed, we should
probably not even register a mmu notifier on BTM capable devices.

It is certainly wrong to skip invalidations generated for any other
reason.

From what I can tell SVA domains should have their CD table entry
programmed with "ASET=0" and normal paging domains should be
programmed with "ASET=1". This causes only the SVA domains to listen
to the BTM invalidations.
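
For reference, the current driver already encodes this distinction when
building CD word 0; a condensed sketch of the relevant lines from
arm_smmu_write_ctx_desc() (simplified, field names as in arm-smmu-v3.h):

	/* cd->mm is only set for SVA contexts, so SVA CDs get ASET=0 and
	 * honour broadcast TLB invalidations, while paging domains get
	 * ASET=1 and ignore them.
	 */
	u64 val = cd->tcr |
		  CTXDESC_CD_0_R | CTXDESC_CD_0_A |
		  (cd->mm ? 0 : CTXDESC_CD_0_ASET) |
		  CTXDESC_CD_0_AA64 |
		  FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
		  CTXDESC_CD_0_V;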

Jason
Robin Murphy Aug. 17, 2023, 7:41 p.m. UTC | #2
On 2023-08-17 20:20, Jason Gunthorpe wrote:
> On Fri, Aug 18, 2023 at 02:16:24AM +0800, Michael Shavit wrote:
>> Prepare and batch invalidation commands for each SMMU that a domain is
>> installed onto.
>> Move SVA's check against the smmu's ARM_SMMU_FEAT_BTM bit into
>> arm_smmu_tlb_inv_range_asid so that it can be checked against each
>> installed SMMU.
>>
>> Signed-off-by: Michael Shavit <mshavit@google.com>
>> ---
>> It's not obvious to me whether skipping the tlb_inv_range_asid call when
>> ARM_SMMU_FEAT_BTM is set is somehow specific to SVA. Is moving the check
>> into arm_smmu_tlb_inv_range_asid still valid if that function were called
>> outside of SVA?
> 
> Logically it should be linked to SVA, and specifically to the mmu
> notifier callback. The mmu notifier callback is done whenever the CPU
> did an invalidation and BTM means the SMMU tracks exactly those
> automatically. Thus we don't need to duplicate it. Indeed, we should
> probably not even register a mmu notifier on BTM capable devices.

Almost - broadcast invalidates from the CPU only apply to SMMU TLBs; we 
still need the notifier for the sake of issuing ATC invalidate commands 
to endpoints.
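
In code terms this is the shape arm_smmu_mm_invalidate_range() already
has today (condensed from the existing driver): only the SMMU TLB step
is conditional on BTM, the ATC step is not.

	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
		arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
					    PAGE_SIZE, false, smmu_domain);
	/* The ATC invalidation is always issued, even with BTM */
	arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);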

> It is certainly wrong to skip invalidations generated for any other
> reason.
> 
> From what I can tell SVA domains should have their CD table entry
> programmed with "ASET=0" and normal paging domains should be
> programmed with "ASET=1". This causes only the SVA domains to listen
> to the BTM invalidations.

Correct.

Thanks,
Robin.
Michael Shavit Aug. 18, 2023, 3:44 a.m. UTC | #3
On Fri, Aug 18, 2023 at 3:41 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2023-08-17 20:20, Jason Gunthorpe wrote:
> > It is certainly wrong to skip invalidations generated for any other
> > reason.
> >
> > From what I can tell SVA domains should have their CD table entry
> > programmed with "ASET=0" and normal paging domains should be
> > programmed with "ASET=1". This causes only the SVA domains to listen
> > to the BTM invalidations.
>
> Correct.
>
> Thanks,
> Robin.

Would it be fair to rename arm_smmu_tlb_inv_asid (or move into
arm-smmu-v3-sva) to make it explicit that it shouldn't be used outside
of SVA then? Or add a parameter such as skip_btm_capable_devices.
Jason Gunthorpe Aug. 18, 2023, 1:51 p.m. UTC | #4
On Fri, Aug 18, 2023 at 11:44:55AM +0800, Michael Shavit wrote:
> On Fri, Aug 18, 2023 at 3:41 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > On 2023-08-17 20:20, Jason Gunthorpe wrote:
> > > It is certainly wrong to skip invalidations generated for any other
> > > reason.
> > >
> > > From what I can tell SVA domains should have their CD table entry
> > > programmed with "ASET=0" and normal paging domains should be
> > > programmed with "ASET=1". This causes only the SVA domains to listen
> > > to the BTM invalidations.
> >
> > Correct.
> >
> > Thanks,
> > Robin.
> 
> Would it be fair to rename arm_smmu_tlb_inv_asid (or move into
> arm-smmu-v3-sva) to make it explicit that it shouldn't be used outside
> of SVA then? Or add a parameter such as skip_btm_capable_devices.

???

arm_smmu_tlb_inv_asid() is generally used in many places and has
nothing to do with BTM..

Did you mean arm_smmu_tlb_inv_range_asid ?

Broadly, invalidation is not SVA specific..

Notice that arm_smmu_tlb_inv_range_asid() already duplicates
arm_smmu_tlb_inv_range_domain().

IMHO I would split the ATC step out of arm_smmu_mm_invalidate_range(),
get rid of arm_smmu_tlb_inv_range_domain(), and have the mmu notifier
just do as it already does:

	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
		arm_smmu_tlb_inv_range_domain_no_atc(start, size, smmu_mn->cd->asid,
					    PAGE_SIZE, false, smmu_domain);
	arm_smmu_atc_inv_domain(smmu_domain, start, size);

And make arm_smmu_tlb_inv_range_domain() just call
   arm_smmu_tlb_inv_range_domain_no_atc();
   arm_smmu_atc_inv_domain();
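
i.e. a sketch of that wrapper (the _no_atc helper is the hypothetical
split named above; ssid 0 covers the non-PASID case):

	static void arm_smmu_tlb_inv_range_domain(unsigned long iova,
						  size_t size, size_t granule,
						  bool leaf,
						  struct arm_smmu_domain *smmu_domain)
	{
		/* TLB side only; the ATC step follows unconditionally */
		arm_smmu_tlb_inv_range_domain_no_atc(iova, size, granule, leaf,
						     smmu_domain);
		arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size);
	}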

Jason
Michael Shavit Aug. 21, 2023, 8:33 a.m. UTC | #5
On Fri, Aug 18, 2023 at 9:51 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Fri, Aug 18, 2023 at 11:44:55AM +0800, Michael Shavit wrote:
> > On Fri, Aug 18, 2023 at 3:41 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > >
> > > On 2023-08-17 20:20, Jason Gunthorpe wrote:
> > > > It is certainly wrong to skip invalidations generated for any other
> > > > reason.
> > > >
> > > > From what I can tell SVA domains should have their CD table entry
> > > > programmed with "ASET=0" and normal paging domains should be
> > > > programmed with "ASET=1". This causes only the SVA domains to listen
> > > > to the BTM invalidations.
> > >
> > > Correct.
> > >
> > > Thanks,
> > > Robin.
> >
> > Would it be fair to rename arm_smmu_tlb_inv_asid (or move into
> > arm-smmu-v3-sva) to make it explicit that it shouldn't be used outside
> > of SVA then? Or add a parameter such as skip_btm_capable_devices.
>
> ???
>
> arm_smmu_tlb_inv_asid() is generally used in many places and has
> nothing to do with BTM..
>
> Did you mean arm_smmu_tlb_inv_range_asid ?

Whoops yes that's what I meant.


>
> Broadly, invalidation is not SVA specific..
>
> Notice that arm_smmu_tlb_inv_range_asid() already duplicates
> arm_smmu_tlb_inv_range_domain().
>
> IMHO I would split the ATC step out of arm_smmu_mm_invalidate_range(),
> get rid of arm_smmu_tlb_inv_range_domain(), and have the mmu notifier
> just do as it already does:
>
>         if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
>                 arm_smmu_tlb_inv_range_domain_no_atc(start, size, smmu_mn->cd->asid,
>                                             PAGE_SIZE, false, smmu_domain);
>         arm_smmu_atc_inv_domain(smmu_domain, start, size);
>
> And make arm_smmu_tlb_inv_range_domain() just call
>    arm_smmu_tlb_inv_range_domain_no_atc();
>    arm_smmu_atc_inv_domain();

That's a nice clean-up but doesn't really solve the problem faced by this patch.

This patch series eliminates the smmu_domain->smmu handle, replacing
it with a list of SMMUs. So SVA can no longer optimize the
arm_smmu_tlb_inv_range_asid call away by checking whether the SMMU BTM
feature is enabled since there's now a list of SMMUs with possibly
heterogeneous support for the feature. Since there's now a loop over a
series of SMMUs inside arm_smmu_tlb_inv_range_asid, it makes sense to
move the check into that loop. This technically works because only SVA
is calling arm_smmu_tlb_inv_range_asid but can (IMO) risk introducing
bugs in the future since it's not obvious from the function name.

The suggestion was then to introduce a parameter to
arm_smmu_tlb_inv_range_asid (or arm_smmu_tlb_inv_range_domain_no_atc)
to make this behavior explicit in the API.
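
As a sketch, based on the loop this patch adds (the extra parameter and
its name are hypothetical):

	void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size,
					 int asid, size_t granule, bool leaf,
					 bool skip_btm_capable_smmus,
					 struct arm_smmu_domain *smmu_domain)
	{
		struct arm_smmu_installed_smmu *installed_smmu;
		struct arm_smmu_device *smmu;
		unsigned long flags;
		struct arm_smmu_cmdq_ent cmd = {
			.tlbi = {
				.asid	= asid,
				.leaf	= leaf,
			},
		};

		spin_lock_irqsave(&smmu_domain->installed_smmus_lock, flags);
		list_for_each_entry(installed_smmu,
				    &smmu_domain->installed_smmus, list) {
			smmu = installed_smmu->smmu;
			/* The skip is now explicit, opt-in behavior */
			if (skip_btm_capable_smmus &&
			    (smmu->features & ARM_SMMU_FEAT_BTM))
				continue;
			cmd.opcode = smmu->features & ARM_SMMU_FEAT_E2H ?
					CMDQ_OP_TLBI_EL2_VA :
					CMDQ_OP_TLBI_NH_VA;
			__arm_smmu_tlb_inv_range(&cmd, iova, size, granule,
						 smmu_domain, smmu);
		}
		spin_unlock_irqrestore(&smmu_domain->installed_smmus_lock,
				       flags);
	}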
Jason Gunthorpe Aug. 21, 2023, 11:57 a.m. UTC | #6
On Mon, Aug 21, 2023 at 04:33:36PM +0800, Michael Shavit wrote:
> > Notice that arm_smmu_tlb_inv_range_asid() already duplicates
> > arm_smmu_tlb_inv_range_domain().
> >
> > IMHO I would split the ATC step out of arm_smmu_mm_invalidate_range(),
> > get rid of arm_smmu_tlb_inv_range_domain(), and have the mmu notifier
> > just do as it already does:
> >
> >         if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
> >                 arm_smmu_tlb_inv_range_domain_no_atc(start, size, smmu_mn->cd->asid,
> >                                             PAGE_SIZE, false, smmu_domain);
> >         arm_smmu_atc_inv_domain(smmu_domain, start, size);
> >
> > And make arm_smmu_tlb_inv_range_domain() just call
> >    arm_smmu_tlb_inv_range_domain_no_atc();
> >    arm_smmu_atc_inv_domain();
> 
> That's a nice clean-up but doesn't really solve the problem faced by this patch.
>
> This patch series eliminates the smmu_domain->smmu handle, replacing
> it with a list of SMMUs. So SVA can no longer optimize the
> arm_smmu_tlb_inv_range_asid call away by checking whether the SMMU BTM
> feature is enabled since there's now a list of SMMUs with possibly
> heterogeneous support for the feature. 

You could also go in the direction of making an SVA BTM and an SVA
non-BTM domain type and then you know what to do immediately in the
notifier.

> Since there's now a loop over a series of SMMUs inside
> arm_smmu_tlb_inv_range_asid, it makes sense to move the check into
> that loop. This technically works because only SVA is calling
> arm_smmu_tlb_inv_range_asid but can (IMO) risk introducing bugs in
> the future since it's not obvious from the function name.

Well, I would remove the duplication and add an argument if you intend
to share the function that loops

Jason
Michael Shavit Aug. 22, 2023, 8:17 a.m. UTC | #7
On Mon, Aug 21, 2023 at 7:58 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > Since there's now a loop over a series of SMMUs inside
> > arm_smmu_tlb_inv_range_asid, it makes sense to move the check into
> > that loop. This technically works because only SVA is calling
> > arm_smmu_tlb_inv_range_asid but can (IMO) risk introducing bugs in
> > the future since it's not obvious from the function name.
>
> Well, I would remove the duplication and add an argument if you intend
> to share the function that loops

What do you think about this as a final stage:
Once the set_dev_pasid and sva refactor lands, SVA could call a common
arm_smmu_inv_range_domain implementation which would:
1. Skip the TLB invalidation on a per-smmu basis if it detects that
the domain type is SVA, or based on a passed-in parameter that is only
set True by SVA.
2. Issue ATC invalidations with SSIDs found in the arm_smmu_domain.
This common function would be used for all use-cases: invalidations of
domains attached on RIDs, on PASIDs (SVA and non SVA).

Then we have two options for the intermediate stage with this series:
1. Non-SVA code uses arm_smmu_inv_range_domain which calls
arm_smmu_tlb_inv_range_domain(_no_atc) and arm_smmu_atc_range_domain,
SVA code individually calls those two functions.
arm_smmu_tlb_inv_range_domain(_no_atc) accepts a parameter to skip the
invalidation if BTM feature is set.
2. Same as option 1, but SVA also calls arm_smmu_inv_range_domain.
arm_smmu_inv_range_domain accepts both a parameter to skip TLB inv
when BTM is set, as well as an SSID for the atc invalidation. SSID
would be 0 in non-sva callers, and mm->pasid for SVA.
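
A rough shape for option 2 (all names hypothetical; the _no_atc helper
here takes the skip flag and contains the per-SMMU loop):

	static void arm_smmu_inv_range_domain(unsigned long iova, size_t size,
					      size_t granule, bool leaf, int ssid,
					      bool skip_btm_capable_smmus,
					      struct arm_smmu_domain *smmu_domain)
	{
		/* TLB side: may skip BTM-capable SMMUs when asked to */
		arm_smmu_tlb_inv_range_domain_no_atc(iova, size, granule, leaf,
						     skip_btm_capable_smmus,
						     smmu_domain);
		/* ATC side: ssid == 0 for RID attachments, mm->pasid for SVA */
		arm_smmu_atc_inv_domain(smmu_domain, ssid, iova, size);
	}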
Michael Shavit Aug. 22, 2023, 8:21 a.m. UTC | #8
On Tue, Aug 22, 2023 at 4:17 PM Michael Shavit <mshavit@google.com> wrote:
>
> On Mon, Aug 21, 2023 at 7:58 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > Since there's now a loop over a series of SMMUs inside
> > > arm_smmu_tlb_inv_range_asid, it makes sense to move the check into
> > > that loop. This technically works because only SVA is calling
> > > arm_smmu_tlb_inv_range_asid but can (IMO) risk introducing bugs in
> > > the future since it's not obvious from the function name.
> >
> > Well, I would remove the duplication and add an argument if you intend
> > to share the function that loops
>
> What do you think about this as a final stage:
> Once the set_dev_pasid and sva refactor lands, SVA could call a common
> arm_smmu_inv_range_domain implementation which would:
> 1. Skip the TLB invalidation on a per-smmu basis if it detects that
> the domain type is SVA, or based on a passed-in parameter that is only
> set True by SVA.
> 2. Issue ATC invalidations with SSIDs found in the arm_smmu_domain.
> This common function would be used for all use-cases: invalidations of
> domains attached on RIDs, on PASIDs (SVA and non SVA).
>
> Then we have two options for the intermediate stage with this series:
> 1. Non-SVA code uses arm_smmu_inv_range_domain which calls
> arm_smmu_tlb_inv_range_domain(_no_atc) and arm_smmu_atc_range_domain,
> SVA code individually calls those two functions.
> arm_smmu_tlb_inv_range_domain(_no_atc) accepts a parameter to skip the
> invalidation if BTM feature is set.
> 2. Same as option 1, but SVA also calls arm_smmu_inv_range_domain.
> arm_smmu_inv_range_domain accepts both a parameter to skip TLB inv
> when BTM is set, as well as an SSID for the atc invalidation. SSID
> would be 0 in non-sva callers, and mm->pasid for SVA.

(Something I sneaked in there is renaming operations that invalidate
both TLBs and ATC to remove the tlb/atc infix, but if people aren't
keen on such renames then yeah I suppose we'd need awkward names like
arm_smmu_tlb_inv_range_domain_no_atc)
Michael Shavit Aug. 22, 2023, 10:10 a.m. UTC | #9
On Tue, Aug 22, 2023 at 4:17 PM Michael Shavit <mshavit@google.com> wrote:
>
> On Mon, Aug 21, 2023 at 7:58 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > Since there's now a loop over a series of SMMUs inside
> > > arm_smmu_tlb_inv_range_asid, it makes sense to move the check into
> > > that loop. This technically works because only SVA is calling
> > > arm_smmu_tlb_inv_range_asid but can (IMO) risk introducing bugs in
> > > the future since it's not obvious from the function name.
> >
> > Well, I would remove the duplication and add an argument if you intend
> > to share the function that loops
>
> What do you think about this as a final stage:
> Once the set_dev_pasid and sva refactor lands, SVA could call a common
> arm_smmu_inv_range_domain implementation which would:
> 1. Skip the TLB invalidation on a per-smmu basis if it detects that
> the domain type is SVA, or based on a passed-in parameter that is only
> set True by SVA.
> 2. Issue ATC invalidations with SSIDs found in the arm_smmu_domain.
> This common function would be used for all use-cases: invalidations of
> domains attached on RIDs, on PASIDs (SVA and non SVA).
>
> Then we have two options for the intermediate stage with this series:
> 1. Non-SVA code uses arm_smmu_inv_range_domain which calls
> arm_smmu_tlb_inv_range_domain(_no_atc) and arm_smmu_atc_range_domain,
> SVA code individually calls those two functions.
> arm_smmu_tlb_inv_range_domain(_no_atc) accepts a parameter to skip the
> invalidation if BTM feature is set.
> 2. Same as option 1, but SVA also calls arm_smmu_inv_range_domain.
> arm_smmu_inv_range_domain accepts both a parameter to skip TLB inv
> when BTM is set, as well as an SSID for the atc invalidation. SSID
> would be 0 in non-sva callers, and mm->pasid for SVA.

Ugh, ok, there's a reason arm_smmu_tlb_inv_range_asid duplicates
arm_smmu_tlb_inv_range_domain!
The invalidation isn't performed with the smmu_domain's ASID... but
with the CD's ASID (because the domain is the RID domain.... again).

I think we'd better leave this out of scope for this series as well.
Let's keep the redundancy for now, and add a parameter to make things
explicit. We can clean things up when the above is fixed.
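
For reference, the two call paths in this series pass different ASIDs,
which is why the helpers can't simply be merged yet:

	/* SVA notifier path: ASID comes from the mm's context descriptor */
	arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
				    PAGE_SIZE, false, smmu_domain);

	/* Paging-domain path: reads smmu_domain->cd.asid internally */
	arm_smmu_tlb_inv_range_domain(iova, size, granule, leaf, smmu_domain);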
Jason Gunthorpe Aug. 22, 2023, 2:15 p.m. UTC | #10
On Tue, Aug 22, 2023 at 04:17:31PM +0800, Michael Shavit wrote:
> On Mon, Aug 21, 2023 at 7:58 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > Since there's now a loop over a series of SMMUs inside
> > > arm_smmu_tlb_inv_range_asid, it makes sense to move the check into
> > > that loop. This technically works because only SVA is calling
> > > arm_smmu_tlb_inv_range_asid but can (IMO) risk introducing bugs in
> > > the future since it's not obvious from the function name.
> >
> > Well, I would remove the duplication and add an argument if you intend
> > to share the function that loops
> 
> What do you think about this as a final stage:
> Once the set_dev_pasid and sva refactor lands, SVA could call a common
> arm_smmu_inv_range_domain implementation which would:
> 1. Skip the TLB invalidation on a per-smmu basis if it detects that
> the domain type is SVA, or based on a passed-in parameter that is only
> set True by SVA.
> 2. Issue ATC invalidations with SSIDs found in the arm_smmu_domain.
> This common function would be used for all use-cases: invalidations of
> domains attached on RIDs, on PASIDs (SVA and non SVA).

That seems like a good place to aim for

Jason

Patch

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a4e235b4f1c4b..58def59c36004 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -128,7 +128,7 @@  arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
 	arm_smmu_write_ctx_desc_devices(smmu_domain, 0, cd);
 
 	/* Invalidate TLB entries previously associated with that context */
-	arm_smmu_tlb_inv_asid(smmu, asid);
+	arm_smmu_tlb_inv_asid(smmu_domain, asid);
 
 	xa_erase(&arm_smmu_asid_xa, asid);
 	return NULL;
@@ -246,9 +246,8 @@  static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
 	 */
 	size = end - start;
 
-	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
-		arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
-					    PAGE_SIZE, false, smmu_domain);
+	arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
+				    PAGE_SIZE, false, smmu_domain);
 	arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
 }
 
@@ -269,7 +268,7 @@  static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	 */
 	arm_smmu_write_ctx_desc_devices(smmu_domain, mm->pasid, &quiet_cd);
 
-	arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
+	arm_smmu_tlb_inv_asid(smmu_domain, smmu_mn->cd->asid);
 	arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, 0, 0);
 
 	smmu_mn->cleared = true;
@@ -357,7 +356,7 @@  static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn)
 	 * new TLB entry can have been formed.
 	 */
 	if (!smmu_mn->cleared) {
-		arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid);
+		arm_smmu_tlb_inv_asid(smmu_domain, cd->asid);
 		arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, 0, 0);
 	}
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index cb4bf0c7c3dd6..447af74dbe280 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -960,15 +960,24 @@  static int arm_smmu_page_response(struct device *dev,
 }
 
 /* Context descriptor manipulation functions */
-void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
+void arm_smmu_tlb_inv_asid(struct arm_smmu_domain *smmu_domain, u16 asid)
 {
+	struct arm_smmu_installed_smmu *installed_smmu;
+	struct arm_smmu_device *smmu;
 	struct arm_smmu_cmdq_ent cmd = {
-		.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
-			CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID,
 		.tlbi.asid = asid,
 	};
+	unsigned long flags;
 
-	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
+	spin_lock_irqsave(&smmu_domain->installed_smmus_lock, flags);
+	list_for_each_entry(installed_smmu, &smmu_domain->installed_smmus,
+			    list) {
+		smmu = installed_smmu->smmu;
+		cmd.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
+			CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID;
+		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
+	}
+	spin_unlock_irqrestore(&smmu_domain->installed_smmus_lock, flags);
 }
 
 static void arm_smmu_sync_cd(struct arm_smmu_master *master,
@@ -1818,9 +1827,6 @@  int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 	struct arm_smmu_cmdq_batch cmds;
 	struct arm_smmu_installed_smmu *installed_smmu;
 
-	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
-		return 0;
-
 	/*
 	 * Ensure that we've completed prior invalidation of the main TLBs
 	 * before we read 'nr_ats_masters' in case of a concurrent call to
@@ -1862,12 +1868,29 @@  int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 	return ret;
 }
 
+static void arm_smmu_tlb_inv_vmid(struct arm_smmu_domain *smmu_domain)
+{
+	struct arm_smmu_installed_smmu *installed_smmu;
+	struct arm_smmu_device *smmu;
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode = CMDQ_OP_TLBI_S12_VMALL,
+		.tlbi.vmid = smmu_domain->s2_cfg.vmid,
+	};
+	unsigned long flags;
+
+	spin_lock_irqsave(&smmu_domain->installed_smmus_lock, flags);
+	list_for_each_entry(installed_smmu, &smmu_domain->installed_smmus,
+			    list) {
+		smmu = installed_smmu->smmu;
+		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
+	}
+	spin_unlock_irqrestore(&smmu_domain->installed_smmus_lock, flags);
+}
+
 /* IO_PGTABLE API */
 static void arm_smmu_tlb_inv_context(void *cookie)
 {
 	struct arm_smmu_domain *smmu_domain = cookie;
-	struct arm_smmu_device *smmu = smmu_domain->smmu;
-	struct arm_smmu_cmdq_ent cmd;
 
 	/*
 	 * NOTE: when io-pgtable is in non-strict mode, we may get here with
@@ -1877,11 +1900,9 @@  static void arm_smmu_tlb_inv_context(void *cookie)
 	 * careful, 007.
 	 */
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		arm_smmu_tlb_inv_asid(smmu, smmu_domain->cd.asid);
+		arm_smmu_tlb_inv_asid(smmu_domain, smmu_domain->cd.asid);
 	} else {
-		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
-		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
-		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
+		arm_smmu_tlb_inv_vmid(smmu_domain);
 	}
 	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
 }
@@ -1889,9 +1910,9 @@  static void arm_smmu_tlb_inv_context(void *cookie)
 static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
 				     unsigned long iova, size_t size,
 				     size_t granule,
-				     struct arm_smmu_domain *smmu_domain)
+				     struct arm_smmu_domain *smmu_domain,
+				     struct arm_smmu_device *smmu)
 {
-	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	unsigned long end = iova + size, num_pages = 0, tg = 0;
 	size_t inv_range = granule;
 	struct arm_smmu_cmdq_batch cmds;
@@ -1956,21 +1977,32 @@  static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
 					  size_t granule, bool leaf,
 					  struct arm_smmu_domain *smmu_domain)
 {
+	struct arm_smmu_installed_smmu *installed_smmu;
+	struct arm_smmu_device *smmu;
 	struct arm_smmu_cmdq_ent cmd = {
 		.tlbi = {
 			.leaf	= leaf,
 		},
 	};
+	unsigned long flags;
 
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		cmd.opcode	= smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ?
-				  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
-		cmd.tlbi.asid	= smmu_domain->cd.asid;
-	} else {
-		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
-		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
+	spin_lock_irqsave(&smmu_domain->installed_smmus_lock, flags);
+	list_for_each_entry(installed_smmu, &smmu_domain->installed_smmus,
+			    list) {
+		smmu = installed_smmu->smmu;
+		if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+			cmd.opcode = smmu->features & ARM_SMMU_FEAT_E2H ?
+					     CMDQ_OP_TLBI_EL2_VA :
+					     CMDQ_OP_TLBI_NH_VA;
+			cmd.tlbi.asid = smmu_domain->cd.asid;
+		} else {
+			cmd.opcode = CMDQ_OP_TLBI_S2_IPA;
+			cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid;
+		}
+		__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain,
+					 smmu);
 	}
-	__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
+	spin_unlock_irqrestore(&smmu_domain->installed_smmus_lock, flags);
 
 	/*
 	 * Unfortunately, this can't be leaf-only since we may have
@@ -1983,16 +2015,30 @@  void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid,
 				 size_t granule, bool leaf,
 				 struct arm_smmu_domain *smmu_domain)
 {
+
+	struct arm_smmu_installed_smmu *installed_smmu;
+	struct arm_smmu_device *smmu;
+	unsigned long flags;
 	struct arm_smmu_cmdq_ent cmd = {
-		.opcode	= smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ?
-			  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA,
 		.tlbi = {
 			.asid	= asid,
 			.leaf	= leaf,
 		},
 	};
-
-	__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
+	spin_lock_irqsave(&smmu_domain->installed_smmus_lock, flags);
+	list_for_each_entry(installed_smmu, &smmu_domain->installed_smmus,
+			    list) {
+		smmu = installed_smmu->smmu;
+		if (smmu->features & ARM_SMMU_FEAT_BTM)
+			continue;
+		cmd.opcode = smmu->features &
+					     ARM_SMMU_FEAT_E2H ?
+				     CMDQ_OP_TLBI_EL2_VA :
+				     CMDQ_OP_TLBI_NH_VA;
+		__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain,
+					 smmu);
+	}
+	spin_unlock_irqrestore(&smmu_domain->installed_smmus_lock, flags);
 }
 
 static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather,
@@ -2564,8 +2610,7 @@  static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-	if (smmu_domain->smmu)
-		arm_smmu_tlb_inv_context(smmu_domain);
+	arm_smmu_tlb_inv_context(smmu_domain);
 }
 
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index a9202d2045537..2ab23139c796e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -756,7 +756,7 @@  extern struct arm_smmu_ctx_desc quiet_cd;
 
 int arm_smmu_write_ctx_desc(struct arm_smmu_master *smmu_master, int ssid,
 			    struct arm_smmu_ctx_desc *cd);
-void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
+void arm_smmu_tlb_inv_asid(struct arm_smmu_domain *smmu_domain, u16 asid);
 void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid,
 				 size_t granule, bool leaf,
 				 struct arm_smmu_domain *smmu_domain);