
iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_range

Message ID 20230901203904.4073-1-nicolinc@nvidia.com (mailing list archive)
State New, archived
Series iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_range

Commit Message

Nicolin Chen Sept. 1, 2023, 8:39 p.m. UTC
When running an SVA case, the following soft lockup is triggered:
--------------------------------------------------------------------
watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
sp : ffff8000d83ef290
x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000
x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000
x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0
x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0
x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a
x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001
Call trace:
 arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
 __arm_smmu_tlb_inv_range+0x118/0x254
 arm_smmu_tlb_inv_range_asid+0x6c/0x130
 arm_smmu_mm_invalidate_range+0xa0/0xa4
 __mmu_notifier_invalidate_range_end+0x88/0x120
 unmap_vmas+0x194/0x1e0
 unmap_region+0xb4/0x144
 do_mas_align_munmap+0x290/0x490
 do_mas_munmap+0xbc/0x124
 __vm_munmap+0xa8/0x19c
 __arm64_sys_munmap+0x28/0x50
 invoke_syscall+0x78/0x11c
 el0_svc_common.constprop.0+0x58/0x1c0
 do_el0_svc+0x34/0x60
 el0_svc+0x2c/0xd4
 el0t_64_sync_handler+0x114/0x140
 el0t_64_sync+0x1a4/0x1a8
--------------------------------------------------------------------

Commit 06ff87bae8d3 ("arm64: mm: remove unused functions and variable
protoypes") fixed a similar lockup on the CPU MMU side. Yet it can still
occur on the SMMU side, since arm_smmu_mm_invalidate_range() is typically
called alongside the CPU TLB flush function, e.g.
	tlb_flush_mmu_tlbonly {
		tlb_flush {
			__flush_tlb_range {
				// check MAX_TLBI_OPS
			}
		}
		mmu_notifier_invalidate_range {
			arm_smmu_mm_invalidate_range {
				// does not check MAX_TLBI_OPS
			}
		}
	}

Clone a CMDQ_MAX_TLBI_OPS threshold from MAX_TLBI_OPS in tlbflush.h: in
the SVA case the SMMU shares the CPU page table, so it makes sense to
align with the tlbflush code. Then, if the request size reaches this
threshold, replace the per-page TLBI commands with a single per-ASID
TLBI command.
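
For reference, the arm64 guard being mirrored looks roughly like the
sketch below (paraphrased from arch/arm64/include/asm/tlbflush.h and
__flush_tlb_range(); not the verbatim code, and the exact condition
differs between kernel versions):

	/* Sketch of the CPU-side fallback that caps per-page TLBIs */
	if ((end - start) >> PAGE_SHIFT > MAX_TLBI_OPS) {
		flush_tlb_mm(vma->vm_mm);	/* one ASID-wide flush instead */
		return;
	}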

Fixes: 51d113c3be09 ("iommu/arm-smmu-v3: Make BTM optional for SVA")
Cc: stable@vger.kernel.org
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 11 ++++++++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     |  9 +++++++++
 2 files changed, 17 insertions(+), 3 deletions(-)

Comments

Nicolin Chen Sept. 15, 2023, 7:43 a.m. UTC | #1
I found that this patch cannot be applied to v6.6-rc1, due to conflicts
with new commits that were merged during the previous cycle.

I can redo a version rebased on v6.6-rc1, yet that new version won't
apply to the earlier stable kernel trees. Is there a way to get this
fix into both mainline and the earlier trees?

Thanks
Nicolin

Baolu Lu Sept. 18, 2023, 5:33 a.m. UTC | #2
On 9/15/23 3:43 PM, Nicolin Chen wrote:
> I found that this patch cannot be applied to v6.6-rc1, due to conflicts
> with new commits that were merged during the previous cycle.
> 
> I can redo a version rebased on v6.6-rc1, yet that new version won't
> apply to the earlier stable kernel trees. Is there a way to get this
> fix into both mainline and the earlier trees?

Normally, bug fixes should first be submitted to the mainline kernel
(also known as Linus's tree). If you use the "Fixes" and "CC-stable"
tags, the patch will be automatically picked up for the appropriate
stable kernels.

If the patch does not apply to any stable kernel that you want it to be
there, you can then post a back-ported patch to the stable mailing list.

When doing so, it's better to include the following information:

- The mainline commit ID of the back-ported patch.
- The versions of the stable kernel(s) to which you want the back-ported
   patch to be applied.

Hope this helps.

Best regards,
baolu
Will Deacon Sept. 18, 2023, 9:22 a.m. UTC | #3
On Fri, Sep 01, 2023 at 01:39:04PM -0700, Nicolin Chen wrote:
> When running an SVA case, the following soft lockup is triggered:
> --------------------------------------------------------------------
> watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
> pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
> lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
> sp : ffff8000d83ef290
> x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000
> x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000
> x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0
> x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0
> x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000
> x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc
> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa
> x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a
> x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001
> Call trace:
>  arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
>  __arm_smmu_tlb_inv_range+0x118/0x254
>  arm_smmu_tlb_inv_range_asid+0x6c/0x130
>  arm_smmu_mm_invalidate_range+0xa0/0xa4
>  __mmu_notifier_invalidate_range_end+0x88/0x120
>  unmap_vmas+0x194/0x1e0
>  unmap_region+0xb4/0x144
>  do_mas_align_munmap+0x290/0x490
>  do_mas_munmap+0xbc/0x124
>  __vm_munmap+0xa8/0x19c
>  __arm64_sys_munmap+0x28/0x50
>  invoke_syscall+0x78/0x11c
>  el0_svc_common.constprop.0+0x58/0x1c0
>  do_el0_svc+0x34/0x60
>  el0_svc+0x2c/0xd4
>  el0t_64_sync_handler+0x114/0x140
>  el0t_64_sync+0x1a4/0x1a8
> --------------------------------------------------------------------
> 
> Commit 06ff87bae8d3 ("arm64: mm: remove unused functions and variable
> protoypes") fixed a similar lockup on the CPU MMU side. Yet it can still
> occur on the SMMU side, since arm_smmu_mm_invalidate_range() is typically
> called alongside the CPU TLB flush function, e.g.
> 	tlb_flush_mmu_tlbonly {
> 		tlb_flush {
> 			__flush_tlb_range {
> 				// check MAX_TLBI_OPS
> 			}
> 		}
> 		mmu_notifier_invalidate_range {
> 			arm_smmu_mm_invalidate_range {
> 				// does not check MAX_TLBI_OPS
> 			}
> 		}
> 	}
> 
> Clone a CMDQ_MAX_TLBI_OPS threshold from MAX_TLBI_OPS in tlbflush.h: in
> the SVA case the SMMU shares the CPU page table, so it makes sense to
> align with the tlbflush code. Then, if the request size reaches this
> threshold, replace the per-page TLBI commands with a single per-ASID
> TLBI command.
> 
> Fixes: 51d113c3be09 ("iommu/arm-smmu-v3: Make BTM optional for SVA")
> Cc: stable@vger.kernel.org
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 11 ++++++++---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     |  9 +++++++++
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index a5a63b1c947e..7ec3f5219250 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -201,9 +201,14 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
>  	 */
>  	size = end - start;
>  
> -	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
> -		arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
> -					    PAGE_SIZE, false, smmu_domain);
> +	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) {
> +		if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
> +		    size >= CMDQ_MAX_TLBI_OPS * PAGE_SIZE)
> +			arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
> +		else
> +			arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
> +						    PAGE_SIZE, false, smmu_domain);

cosmetic nit: Please use braces for the multi-line conditionals.

> +	}
>  	arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
>  }
>  
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index dcab85698a4e..79a81eed1dcc 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -326,6 +326,15 @@
>   */
>  #define CMDQ_BATCH_ENTRIES		BITS_PER_LONG
>  
> +/*
> + * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, this
> + * is used as a threshold to replace per-page TLBI commands to issue in the
> + * command queue with an address-space TLBI command, when SMMU w/o a range
> + * invalidation feature handles too many per-page TLBI commands, which will
> + * otherwise result in a soft lockup.
> + */
> +#define CMDQ_MAX_TLBI_OPS		(1 << (PAGE_SHIFT - 3))

Maybe stick "SVA" in the name of this somewhere, since that's the reason why
looking at PAGE_SHIFT is relevant?

Will
Will Deacon Sept. 18, 2023, 9:23 a.m. UTC | #4
On Mon, Sep 18, 2023 at 01:33:31PM +0800, Baolu Lu wrote:
> On 9/15/23 3:43 PM, Nicolin Chen wrote:
> > I found that this patch cannot be applied to v6.6-rc1, due to conflicts
> > with new commits that were merged during the previous cycle.
> > 
> > I can redo a version rebased on v6.6-rc1, yet that new version won't
> > apply to the earlier stable kernel trees. Is there a way to get this
> > fix into both mainline and the earlier trees?
> 
> Normally, bug fixes should first be submitted to the mainline kernel
> (also known as Linus's tree). If you use the "Fixes" and "CC-stable"
> tags, the patch will be automatically picked up for the appropriate
> stable kernels.
> 
> If the patch does not apply to any stable kernel that you want it to be
> there, you can then post a back-ported patch to the stable mailing list.
> 
> When doing so, it's better to include the following information:
> 
> - The mainline commit ID of the back-ported patch.
> - The versions of the stable kernel(s) to which you want the back-ported
>   patch to be applied.
> 
> Hope this helps.

Yup, please send a version against -rc1 so we can land that first.

Will
Nicolin Chen Sept. 18, 2023, 2:58 p.m. UTC | #5
On Mon, Sep 18, 2023 at 10:23:42AM +0100, Will Deacon wrote:

> > If the patch does not apply to any stable kernel that you want it to be
> > there, you can then post a back-ported patch to the stable mailing list.
> >
> > When doing so, it's better to include the following information:
> >
> > - The mainline commit ID of the back-ported patch.
> > - The versions of the stable kernel(s) to which you want the back-ported
> >   patch to be applied.
> >
> > Hope this helps.

Yes! Thanks :)

> Yup, please send a version against -rc1 so we can land that first.

Yea, will send a new version.

Thank you
Nicolin
Nicolin Chen Sept. 18, 2023, 3:09 p.m. UTC | #6
On Mon, Sep 18, 2023 at 10:22:36AM +0100, Will Deacon wrote:
> > @@ -201,9 +201,14 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
> >        */
> >       size = end - start;
> >
> > -     if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
> > -             arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
> > -                                         PAGE_SIZE, false, smmu_domain);
> > +     if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) {
> > +             if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
> > +                 size >= CMDQ_MAX_TLBI_OPS * PAGE_SIZE)
> > +                     arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
> > +             else
> > +                     arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
> > +                                                 PAGE_SIZE, false, smmu_domain);
> 
> cosmetic nit: Please use braces for the multi-line conditionals.

Ack.
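
(i.e. something along these lines for v2, with braces on both arms;
just a sketch of the style change, the logic stays the same:)

	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
	    size >= CMDQ_MAX_TLBI_OPS * PAGE_SIZE) {
		arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
	} else {
		arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
					    PAGE_SIZE, false, smmu_domain);
	}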

> > +     }
> >       arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
> >  }
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > index dcab85698a4e..79a81eed1dcc 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > @@ -326,6 +326,15 @@
> >   */
> >  #define CMDQ_BATCH_ENTRIES           BITS_PER_LONG
> >
> > +/*
> > + * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, this
> > + * is used as a threshold to replace per-page TLBI commands to issue in the
> > + * command queue with an address-space TLBI command, when SMMU w/o a range
> > + * invalidation feature handles too many per-page TLBI commands, which will
> > + * otherwise result in a soft lockup.
> > + */
> > +#define CMDQ_MAX_TLBI_OPS            (1 << (PAGE_SHIFT - 3))
> 
> Maybe stick "SVA" in the name of this somewhere, since that's the reason why
> looking at PAGE_SHIFT is relevant?

Hmm, that does make sense, yet it wouldn't apply to the non-SVA pathway,
which would make putting it in the common header meaningless. Perhaps I
should have just left it in the arm-smmu-v3-sva.c file...

Meanwhile, we'd need to figure out another definition for the non-SVA
pathway.
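
(Just to illustrate the renaming idea, purely as a hypothetical sketch;
the name below is only a placeholder, not a final proposal:)

	/*
	 * Hypothetical SVA-local threshold, kept next to the SVA code since
	 * it relies on the CPU PAGE_SHIFT (the SVA domain shares the CPU
	 * page table).
	 */
	#define CMDQ_MAX_TLBI_OPS_SVA	(1 << (PAGE_SHIFT - 3))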

Thanks
Nicolin

Patch

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a5a63b1c947e..7ec3f5219250 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -201,9 +201,14 @@  static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
 	 */
 	size = end - start;
 
-	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
-		arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
-					    PAGE_SIZE, false, smmu_domain);
+	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) {
+		if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
+		    size >= CMDQ_MAX_TLBI_OPS * PAGE_SIZE)
+			arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
+		else
+			arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
+						    PAGE_SIZE, false, smmu_domain);
+	}
 	arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index dcab85698a4e..79a81eed1dcc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -326,6 +326,15 @@ 
  */
 #define CMDQ_BATCH_ENTRIES		BITS_PER_LONG
 
+/*
+ * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, this
+ * is used as a threshold to replace per-page TLBI commands to issue in the
+ * command queue with an address-space TLBI command, when SMMU w/o a range
+ * invalidation feature handles too many per-page TLBI commands, which will
+ * otherwise result in a soft lockup.
+ */
+#define CMDQ_MAX_TLBI_OPS		(1 << (PAGE_SHIFT - 3))
+
 #define CMDQ_0_OP			GENMASK_ULL(7, 0)
 #define CMDQ_0_SSV			(1UL << 11)