diff mbox series

[v3] iommu/arm-smmu: Avoid constant zero in TLBI writes

Message ID 09a290f1-27a0-5ee3-16b9-659ef2ba99dc@free.fr (mailing list archive)
State Mainlined, archived
Commit 4e4abae311e4b44aaf61f18a826fd7136037f199
Headers show
Series [v3] iommu/arm-smmu: Avoid constant zero in TLBI writes | expand

Commit Message

Marc Gonzalez June 3, 2019, 12:15 p.m. UTC
From: Robin Murphy <robin.murphy@arm.com>

Apparently, some Qualcomm arm64 platforms which appear to expose their
SMMU global register space are still, in fact, using a hypervisor to
mediate it by trapping and emulating register accesses. Sadly, some
deployed versions of said trapping code have bugs wherein they go
horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
register.

While this can be mitigated for GCC today by tweaking the constraints
for the implementation of writel_relaxed(), to avoid any potential
arms race with future compilers more aggressively optimising register
allocation, the simple way is to just remove all the problematic
constant zeros. For the write-only TLB operations, the actual value is
irrelevant anyway and any old nearby variable will provide a suitable
GPR to encode. The one point at which we really do need a zero to clear
a context bank happens before any of the TLB maintenance where crashes
have been reported, so is apparently not a problem... :/

Reported-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
---
Changes from v2:
- Define and use QCOM_DUMMY_VAL for the 3 problematic mmio writes
- Drop previous Reviewed-by and Tested-by tags (rationale: we are now writing a different value)

Boot log:
REMAP: PA=01680000 VA=ffffff80111b0000 SIZE=10000
arm-smmu 1680000.arm,smmu: probing hardware configuration...
arm-smmu 1680000.arm,smmu: SMMUv2 with:
arm-smmu 1680000.arm,smmu:      stage 1 translation
arm-smmu 1680000.arm,smmu:      address translation ops
arm-smmu 1680000.arm,smmu:      non-coherent table walk
arm-smmu 1680000.arm,smmu:      (IDR0.CTTW overridden by FW configuration)
arm-smmu 1680000.arm,smmu:      stream matching with 16 register groups
arm-smmu 1680000.arm,smmu:      6 context banks (0 stage-2 only)
arm-smmu 1680000.arm,smmu:      Supported page sizes: 0x63315000
arm-smmu 1680000.arm,smmu:      Stage-1: 36-bit VA -> 36-bit IPA
[        SMMU + 000048] = 00000000
[        SMMU + 000c00] = 00020000
[        SMMU + 000800] = 00000000
[        SMMU + 000c04] = 00020000
[        SMMU + 000804] = 00000000
[        SMMU + 000c08] = 00020000
[        SMMU + 000808] = 00000000
[        SMMU + 000c0c] = 00020000
[        SMMU + 00080c] = 00000000
[        SMMU + 000c10] = 00020000
[        SMMU + 000810] = 00000000
[        SMMU + 000c14] = 00020000
[        SMMU + 000814] = 00000000
[        SMMU + 000c18] = 00020000
[        SMMU + 000818] = 00000000
[        SMMU + 000c1c] = 00020000
[        SMMU + 00081c] = 00000000
[        SMMU + 000c20] = 00020000
[        SMMU + 000820] = 00000000
[        SMMU + 000c24] = 00020000
[        SMMU + 000824] = 00000000
[        SMMU + 000c28] = 00020000
[        SMMU + 000828] = 00000000
[        SMMU + 000c2c] = 00020000
[        SMMU + 00082c] = 00000000
[        SMMU + 000c30] = 00020000
[        SMMU + 000830] = 00000000
[        SMMU + 000c34] = 00020000
[        SMMU + 000834] = 00000000
[        SMMU + 000c38] = 00020000
[        SMMU + 000838] = 00000000
[        SMMU + 000c3c] = 00020000
[        SMMU + 00083c] = 00000000
[        SMMU + 008000] = 00000000
[        SMMU + 008058] = c00001fe
[        SMMU + 009000] = 00000000
[        SMMU + 009058] = c00001fe
[        SMMU + 00a000] = 00000000
[        SMMU + 00a058] = c00001fe
[        SMMU + 00b000] = 00000000
[        SMMU + 00b058] = c00001fe
[        SMMU + 00c000] = 00000000
[        SMMU + 00c058] = c00001fe
[        SMMU + 00d000] = 00000000
[        SMMU + 00d058] = c00001fe
[        SMMU + 00006c] = ffffffff
[        SMMU + 000068] = ffffffff
[        SMMU + 000070] = ffffffff
[        SMMU + 000000] = 00201e36
[        SMMU + 000800] = 00001fff
[        SMMU + 000800] = 1fff0000
[        SMMU + 001800] = 00000001
[        SMMU + 001000] = 0001f300
[        SMMU + 008010] = 00038011
[        SMMU + 008030] = 0080351c
[        SMMU + 008020] = 00000000785d5000
[        SMMU + 008028] = 0000000000000000
[        SMMU + 008038] = 0004ff44
[        SMMU + 00803c] = 00000000
[        SMMU + 008000] = 00001067
[        SMMU + 000c00] = 00000000
atl1c 0000:01:00.0: Adding to iommu group 0
[        SMMU + 000c00] = 00000000
[        SMMU + 000800] = 80001480
---
 drivers/iommu/arm-smmu.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Will Deacon June 5, 2019, 12:19 p.m. UTC | #1
[+Joerg on To:]

On Mon, Jun 03, 2019 at 02:15:37PM +0200, Marc Gonzalez wrote:
> From: Robin Murphy <robin.murphy@arm.com>
> 
> Apparently, some Qualcomm arm64 platforms which appear to expose their
> SMMU global register space are still, in fact, using a hypervisor to
> mediate it by trapping and emulating register accesses. Sadly, some
> deployed versions of said trapping code have bugs wherein they go
> horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
> register.
> 
> While this can be mitigated for GCC today by tweaking the constraints
> for the implementation of writel_relaxed(), to avoid any potential
> arms race with future compilers more aggressively optimising register
> allocation, the simple way is to just remove all the problematic
> constant zeros. For the write-only TLB operations, the actual value is
> irrelevant anyway and any old nearby variable will provide a suitable
> GPR to encode. The one point at which we really do need a zero to clear
> a context bank happens before any of the TLB maintenance where crashes
> have been reported, so is apparently not a problem... :/
> 
> Reported-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
> Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr>

Acked-by: Will Deacon <will.deacon@arm.com>

Joerg -- Please can you take this as a fix for 5.2, with a Cc stable?

Cheers,

Will
Marc Gonzalez June 7, 2019, 10:40 a.m. UTC | #2
On 05/06/2019 14:19, Will Deacon wrote:

> On Mon, Jun 03, 2019 at 02:15:37PM +0200, Marc Gonzalez wrote:
>
>> From: Robin Murphy <robin.murphy@arm.com>
>>
>> Apparently, some Qualcomm arm64 platforms which appear to expose their
>> SMMU global register space are still, in fact, using a hypervisor to
>> mediate it by trapping and emulating register accesses. Sadly, some
>> deployed versions of said trapping code have bugs wherein they go
>> horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
>> register.
>>
>> While this can be mitigated for GCC today by tweaking the constraints
>> for the implementation of writel_relaxed(), to avoid any potential
>> arms race with future compilers more aggressively optimising register
>> allocation, the simple way is to just remove all the problematic
>> constant zeros. For the write-only TLB operations, the actual value is
>> irrelevant anyway and any old nearby variable will provide a suitable
>> GPR to encode. The one point at which we really do need a zero to clear
>> a context bank happens before any of the TLB maintenance where crashes
>> have been reported, so is apparently not a problem... :/
>>
>> Reported-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
>> Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
> 
> Acked-by: Will Deacon <will.deacon@arm.com>
> 
> Joerg -- Please can you take this as a fix for 5.2, with a Cc stable?

Hello Joerg,

Can you ping this thread once this patch hits linux-next, so I can
ask Bjorn to pick up the 8998 ANOC1 DT node, and the PCIe DT node
that requires ANOC1.

Bjorn: for ANOC1, a small fixup: s/arm,smmu/iommu/

https://patchwork.kernel.org/project/linux-arm-msm/list/?series=99701
https://patchwork.kernel.org/patch/10895341/

Regards.
Joerg Roedel June 12, 2019, 8:10 a.m. UTC | #3
On Mon, Jun 03, 2019 at 02:15:37PM +0200, Marc Gonzalez wrote:
>  drivers/iommu/arm-smmu.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)

Applied, thanks. It should show up in linux-next in the next days.
Marc Gonzalez June 14, 2019, 11:24 a.m. UTC | #4
[ Trimming recipients to minimize inconvenience ]

On 12/06/2019 10:10, Joerg Roedel wrote:

> On Mon, Jun 03, 2019 at 02:15:37PM +0200, Marc Gonzalez wrote:
> 
>>  drivers/iommu/arm-smmu.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> Applied, thanks. It should show up in linux-next in the next days.

Almost there... Should be in tomorrow's next.

https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/log/?h=next
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=next&id=4e4abae311e4b44aaf61f18a826fd7136037f199

Regards.
diff mbox series

Patch

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 930c07635956..9435e4a7759f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -59,6 +59,15 @@ 
 
 #include "arm-smmu-regs.h"
 
+/*
+ * Apparently, some Qualcomm arm64 platforms which appear to expose their SMMU
+ * global register space are still, in fact, using a hypervisor to mediate it
+ * by trapping and emulating register accesses. Sadly, some deployed versions
+ * of said trapping code have bugs wherein they go horribly wrong for stores
+ * using r31 (i.e. XZR/WZR) as the source register.
+ */
+#define QCOM_DUMMY_VAL -1
+
 #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
 
 #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
@@ -423,7 +432,7 @@  static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
 {
 	unsigned int spin_cnt, delay;
 
-	writel_relaxed(0, sync);
+	writel_relaxed(QCOM_DUMMY_VAL, sync);
 	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
 		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
 			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
@@ -1761,8 +1770,8 @@  static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	}
 
 	/* Invalidate the TLB, just in case */
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
+	writel_relaxed(QCOM_DUMMY_VAL, gr0_base + ARM_SMMU_GR0_TLBIALLH);
+	writel_relaxed(QCOM_DUMMY_VAL, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
 
 	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);